Electronic Journal of Statistics
ISSN: 1935-7524
arXiv:1601.02177v3 [math.ST] 14 Dec 2016

Optimal-order uniform and nonuniform bounds on the rate of convergence to normality for maximum likelihood estimators∗

Iosif Pinelis
Department of Mathematical Sciences
Michigan Technological University
Houghton, Michigan 49931
e-mail: [email protected]

Abstract: It is well known that, under general regularity conditions, the distribution of the maximum likelihood estimator (MLE) is asymptotically normal. Very recently, bounds of the optimal order $O(1/\sqrt n)$ on the closeness of the distribution of the MLE to normality in the so-called bounded Wasserstein distance were obtained [2, 1], where $n$ is the sample size. However, the corresponding bounds on the Kolmogorov distance were only of the order $O(1/n^{1/4})$. In this paper, bounds of the optimal order $O(1/\sqrt n)$ on the closeness of the distribution of the MLE to normality in the Kolmogorov distance are given, as well as their nonuniform counterparts, which work better in tail zones of the distribution of the MLE. These results are based in part on previously obtained general optimal-order bounds on the rate of convergence to normality in the multivariate delta method. The crucial observation is that, under natural conditions, the MLE can be tightly enough bracketed between two smooth enough functions of the sum of independent random vectors, which makes the delta method applicable. It appears that the nonuniform bounds for MLEs in general have no precedents in the existing literature; a special case was recently treated by Pinelis and Molzon [20]. The results can be extended to M-estimators.

AMS 2010 subject classifications: 62F10, 62F12, 60F05, 60E15.
Keywords and phrases: maximum likelihood estimators, Berry–Esseen bounds, delta method, rates of convergence.

Contents

1 Introduction
2 General setting
3 Tight bracketing of the MLE between two functions of the sum of independent random vectors
4 General uniform and nonuniform bounds from [20] on the rate of convergence to normality for smooth nonlinear functions of sums of independent random vectors
5 Making the bracketing work: Applying the general bounds of [20]
6 Exponentially small bounds on the remainder term $\mathsf{P}(|\hat\theta-\theta_0|>\delta)$
  6.1 Bounding the remainder: Log-concave case
  6.2 Bounding the remainder: General case
7 Conclusion
A On condition (1.1)
References

∗December 15, 2016

I. Pinelis/Rate of convergence to normality for maximum likelihood estimators

1. Introduction

Let us begin with the following quote from Kiefer [10] of 1968:

a second area of what seem to me important problems to work on has to do with the fact that we do have, in many settings, quite a good large sample theory, but we don't know how large the sample sizes have to be for that theory to take hold. Now, I'm sure most of you are familiar with the error estimate one can give for the classical central-limit theorem, which goes by the name of the Berry-Esseen estimate, and which tells you that under certain assumptions one can actually give an explicit bound on the departure from the normal distribution of the sample mean for a given sample size, the error term being of order $1/\sqrt n$. For most other statistical problems, in fact for almost anything other than the use of the sample mean, we have nothing. The most obvious example of this (and this is not original with me; many people have been concerned with this), is the maximum likelihood estimator in the case of regular estimation. We all know what the asymptotic distribution is.
Can you give explicitly some useful bound on the departure from the asymptotic normal distribution as a function of the sample size $n$? It seems to be a terrifically difficult problem.

Since then, there has been some significant progress in this direction, especially rather recently. For instance, Berry–Esseen-type bounds of order $1/\sqrt n$ were obtained for U-statistics (see e.g. [11]); for the Student statistic [4, 3]; and, even more recently, for rather broad classes of other statistics that depend on the observations in a nonlinear fashion [6, 20].

As Kiefer pointed out, it is well known that, under general regularity conditions, the distribution of the maximum likelihood estimator (MLE) is asymptotically normal. In this paper, we shall consider Berry–Esseen-type bounds of order $1/\sqrt n$ for the MLE. The first such bounds were apparently obtained in the paper [15], followed by [17, 18]. Very recently, bounds on the closeness of the distribution of the MLE to normality in the so-called bounded Wasserstein distance, $d_{bW}$, were obtained in [2]. In the rather common special case when the MLE $\hat\theta$ is expressible as a smooth enough function of a linear statistic of independent identically distributed (i.i.d.) observations, the bounds obtained in [2] were sharpened and simplified in [1] by using a version of the delta method. More specifically, it was assumed in [1] that
$$q(\hat\theta)=\frac1n\sum_{i=1}^n g(X_i), \tag{1.1}$$
where $q\colon\Theta\to\mathbb R$ is a twice continuously differentiable one-to-one mapping, $g\colon\mathbb R\to\mathbb R$ is a Borel-measurable function, and the $X_i$'s are i.i.d. real-valued r.v.'s.

It was noted in [2, Proposition 2.1] that for any r.v. $Y$ and a standard normal r.v. $Z$ one has $d_{Ko}(Y,Z)\le 2\sqrt{d_{bW}(Y,Z)}$, where $d_{Ko}$ denotes the Kolmogorov distance. This bound on $d_{Ko}$ in terms of $d_{bW}$ is the best possible one, up to a constant factor, as shown in [20].
Therefore, even though the bounds on the bounded Wasserstein distance $d_{bW}$ obtained in [2, 1] are of the optimal order $O(1/\sqrt n)$, the resulting bounds on the Kolmogorov distance are only of the order $O(1/n^{1/4})$. (That the order $O(1/\sqrt n)$ is optimal for MLEs is well known; for instance, see the example of the Bernoulli family of distributions given in [15].)

In [20], optimal-order bounds of the form $O(1/\sqrt n)$ on the rate of convergence to normality in the general multivariate delta method were given. Those results are applicable when the statistic of interest can be expressed as a smooth enough function of the sum of independent random vectors. Accordingly, various kinds of applications were presented in [20]. In particular, uniform and nonuniform bounds of the optimal order on the closeness of the distribution of the MLE to normality were obtained in [20] under conditions similar to the mentioned conditions assumed in [1].

In this paper we present a way to extend those results in [20] to the general case, without an assumption of the form (1.1), made in [1, 20]. Of course, in general the MLE cannot be represented as a function of the sum of independent random vectors (see Appendix A for details). However, the crucial observation here is that, under natural conditions, the MLE can be tightly enough bracketed between two such smooth enough functions, which makes the delta method applicable. Thus, the present paper is methodologically different from the preceding work on Berry–Esseen-type bounds for the MLE, in that it relies on the general result developed in [20], rather than on methods specially designed to deal with the MLE.

Perhaps more importantly, the new method yields not only uniform bounds (that is, in the Kolmogorov metric) of the optimal order $O(1/\sqrt n)$ on the closeness of the distribution of the MLE to normality but also their so-called nonuniform counterparts, which work much better for large deviations, that is, in tail zones of the distribution of the MLE, which are usually of foremost interest in statistical tests.
Such nonuniform bounds for MLEs in general appear to have no precedents in the existing literature (except that, as stated above, a special case of nonuniform bounds for MLEs was recently treated in [20]).

The paper is organized as follows. The general setting of the problem is described in Section 2. The key step of tight enough bracketing of the MLE between two functions of the sum of independent random vectors is made in Section 3. General uniform and nonuniform optimal-order bounds from [20] on the convergence rate in the multivariate delta method are presented in Section 4. In Section 5, we make the bracketing work by applying the general bounds in the multivariate delta method. Yet, this leaves out the problem of bounding a remainder, which is a probability of large deviations of the MLE from the true value of the parameter. It is shown in Section 6 that under natural conditions this remainder is exponentially fast decreasing (in $n$) and thus asymptotically negligible as compared to the main term on the order of $1/\sqrt n$. All these findings are summarized in Section 7, where the main result of this paper is presented, along with corresponding discussion. In Appendix A, it is shown that, under general regularity conditions, (1.1) (or even a relaxed version of it) implies that the family of densities is a one-parameter exponential one; in particular, this allows one to give any number of examples where the main result of the present paper is applicable, whereas the corresponding result in [20] is not.

2. General setting

Let $X,X_1,X_2,\dots$ be random variables (r.v.'s) mapping a measurable space $(\Omega,\mathcal A)$ to another measurable space $(\mathcal X,\mathcal B)$ and let $(\mathsf{P}_\theta)_{\theta\in\Theta}$ be a parametric family of probability measures on $(\Omega,\mathcal A)$ such that the r.v.'s $X,X_1,X_2,\dots$ are i.i.d. with respect to each of the probability measures $\mathsf{P}_\theta$ with $\theta\in\Theta$; here the parameter space $\Theta$ is assumed to be a subset of the real line $\mathbb R$.
As usual, let $\mathsf{E}_\theta$ denote the expectation with respect to the probability measure $\mathsf{P}_\theta$. Suppose that for each $\theta\in\Theta$ the distribution $\mathsf{P}_\theta X^{-1}$ of $X$ has a density $p_\theta$ with respect to a measure $\mu$ on $\mathcal B$. Because the extended real line $[-\infty,\infty]$ is compact, for each $n\in\mathbb N$ and each point $\mathbf x=\mathbf x_n=(x_1,\dots,x_n)\in\mathcal X^n$ the likelihood function $\Theta\ni\theta\mapsto L_{\mathbf x}(\theta):=\prod_{i=1}^n p_\theta(x_i)$ has at least one generalized maximizer $\hat\theta_n(\mathbf x)$ in the closure of the set $\Theta$ in $[-\infty,\infty]$, in the sense that $\sup_{\theta\in\Theta}L_{\mathbf x}(\theta)=\limsup_{\theta\to\hat\theta_n(\mathbf x)}L_{\mathbf x}(\theta)$. Picking, for each $\mathbf x=(x_1,\dots,x_n)\in\mathcal X^n$, any one of such generalized maximizers $\hat\theta_n(\mathbf x)$, one obtains a map $\Omega\ni\omega\mapsto\hat\theta_n(\mathbf X(\omega))$, where $\mathbf X:=\mathbf X_n:=(X_1,\dots,X_n)$; any such map will be denoted here by $\hat\theta_n(\mathbf X)$ (or simply by $\hat\theta_n$ or $\hat\theta$) and referred to as a maximum likelihood estimator (MLE) of $\theta$. This is a somewhat more general definition of the MLE than usual, and in general an MLE $\hat\theta$ will not have to be a r.v.; that is, it can be non-measurable with respect to the sigma-algebra $\mathcal A$. However, to simplify the presentation, we shall still refer to sets of the form $\{\hat\theta\in J\}:=\{\omega\in\Omega\colon\hat\theta_n(\mathbf X(\omega))\in J\}$ for Borel sets $J\subseteq\Theta$ as events and write $\mathsf{P}_\theta(\hat\theta\in J)$, implying that the latter expression may and should be understood as either one of the expressions $(\mathsf{P}_\theta)^*(\hat\theta\in J)$ or $(\mathsf{P}_\theta)_*(\hat\theta\in J)$, where ${}^*$ and ${}_*$ stand for the corresponding outer and inner measures. Of course, when the map $\hat\theta$ is measurable, then one can use the bona fide expressions of the mentioned form $\mathsf{P}_\theta(\hat\theta\in J)$.

Let $\theta_0\in\Theta$ be the "true" value of the unknown parameter $\theta$, such that
$$[\theta_0-\delta,\theta_0+\delta]\subseteq\Theta^\circ \tag{2.1}$$
for some real $\delta>0$, where $\Theta^\circ$ denotes the interior of the subset $\Theta$ of $\mathbb R$. For brevity, let $\mathsf{P}:=\mathsf{P}_{\theta_0}$ and $\mathsf{E}:=\mathsf{E}_{\theta_0}$.

For $x\in\mathcal X$ and $\theta\in\Theta$, consider the log-likelihood
$$\ell_x(\theta):=\ln p_\theta(x)$$
and assume the following:

(I) The set $\mathcal X_{>0}:=\{x\in\mathcal X\colon p_\theta(x)>0\}$ is the same for all $\theta\in[\theta_0-\delta,\theta_0+\delta]$, and for each $x\in\mathcal X_{>0}$ the density $p_\theta(x)$ and hence the log-likelihood $\ell_x(\theta)$ are thrice differentiable in $\theta$ at each point $\theta\in[\theta_0-\delta,\theta_0+\delta]$.
(II) Standard regularity conditions hold so that $\mathsf{E}\,\ell'_X(\theta_0)=0$ and $\mathsf{E}\,\ell'_X(\theta_0)^2=-\mathsf{E}\,\ell''_X(\theta_0)=I(\theta_0)\in(0,\infty)$, where $I(\theta)$ is the Fisher information at $\theta$.
(III) $\mathsf{E}|\ell'_X(\theta_0)|^3+\mathsf{E}|\ell''_X(\theta_0)|^3<\infty$.
(IV) $\mathsf{E}\sup_{\theta\in[\theta_0-\delta,\theta_0+\delta]}|\ell'''_X(\theta)|^3<\infty$.

Remark 2.1. The expectation $\mathsf{E}\,\ell'_X(\theta_0)$, mentioned in condition (II), may be understood as $\int_{\mathcal X_{>0}}p'_x(\theta_0)\,\mu(dx)$, where $p_x(\theta):=p_\theta(x)$; similarly, for the other expectations mentioned in conditions (II)–(IV). Of course, all the derivatives here are with respect to $\theta$.

Concerning the "standard regularity conditions" mentioned in condition (II), it will be enough to assume that $\mathsf{P}\big(\frac{\partial}{\partial\theta}p_\theta(X)\ne0\big)>0$ and for some measurable function $g\colon\mathcal X_{>0}\to[0,\infty)$ such that $\int_{\mathcal X_{>0}}g\,d\mu<\infty$ and all $\theta\in[\theta_0-\delta,\theta_0+\delta]$ and $x\in\mathcal X_{>0}$ we have $\big|\frac{\partial}{\partial\theta}p_\theta(x)\big|+\big|\frac{\partial^2}{\partial\theta^2}p_\theta(x)\big|\le g(x)$; see e.g. [13, Lemma 5.3, page 116] and [19, Lemma 2.4] (more general conditions can be given using [19, Lemma 2.3]). Then $I(\theta)$ will also be continuous in $\theta\in[\theta_0-\delta,\theta_0+\delta]$.

Conditions (I)–(IV) are rather similar to regularity conditions used in related literature; see Remark 7.3 for details. It appears that these conditions will be generally satisfied provided that $\ell_x(\theta)$ is smooth enough in $\theta$.

For instance, let us briefly consider the case when the family of densities $(p_\theta)$ is a location family, so that $\ell_x(\theta)=\lambda(x-\theta)$ for all $(x,\theta)\in\mathcal X\times\Theta=\mathbb R^2$, where $\lambda$ is a smooth enough function. If the densities $p_\theta$ have power-like tails, then for some positive real constants $c_+$ and $c_-$ one has $\lambda(x)\sim-c_\pm\ln|x|$ as $x\to\pm\infty$, in which case typically $|\lambda^{(k)}(x)|=O(|x|^{-k}\ln|x|)$ for $k=0,1,\dots$ as $x\to\pm\infty$. So, conditions (III) and (IV) will hold, since $|\ell_x^{(k)}(\theta)|=|\lambda^{(k)}(x-\theta)|$. If the tails of the densities $p_\theta$ are lighter than power-like tails, so that (say) $\lambda(x)\sim-c_\pm|x|^\alpha$ for some real $\alpha>0$ as $x\to\pm\infty$, then typically $|\lambda^{(k)}(x)|=O(|x|^{\alpha-k})$ for $k=0,1,\dots$ as $x\to\pm\infty$, so that conditions (III) and (IV) will again hold.

The case of a scale family is quite similar to that of a location family.
Alternatively, the "scale" case can be reduced to the "location" one by logarithmic rescaling in both $x$ and $\theta$.

At this point, consider also the case when the family of densities $(p_\theta)$ is an exponential family, so that $\ell_x(\theta)=w(\theta)T(x)+d(\theta)$ for some functions $w$, $T$, and $d$ and for all $(x,\theta)\in\mathcal X\times\Theta=\mathbb R^2$, where the functions $w$ and $d$ are smooth enough, with $w'(\theta_0)\ne0$. Then $\ell_x^{(k)}(\theta)=w^{(k)}(\theta)T(x)+d^{(k)}(\theta)$. So, conditions (III) and (IV) will hold in this case as well, since $\mathsf{E}|T(X)|^\alpha=\int_{\mathcal X}|T(x)|^\alpha\exp\{w(\theta_0)T(x)+d(\theta_0)\}\,\mu(dx)$ for $\alpha>0$, $|T(x)|^\alpha=O(e^{hT(x)}+e^{-hT(x)})$ for any given real $\alpha>0$ and any given nonzero real $h$, and the conditions $\theta_0\in\Theta^\circ$ and $w'(\theta_0)\ne0$ imply that $\int_{\mathcal X}\exp\{[w(\theta_0)+h]T(x)+d(\theta_0)\}\,\mu(dx)<\infty$ for all real $h$ close enough to 0.

Let
$$\ell_{\mathbf X}(\theta):=\sum_{i=1}^n\ell_{X_i}(\theta) \tag{2.2}$$
for $\theta\in\Theta$, the log-likelihood of the sample $\mathbf X=(X_1,\dots,X_n)$.

3. Tight bracketing of the MLE between two functions of the sum of independent random vectors

Without loss of generality (w.l.o.g.), $\mathcal X_{>0}=\mathcal X$. Then on the event
$$G:=\{\hat\theta\in[\theta_0-\delta,\theta_0+\delta]\} \tag{3.1}$$
($G$ for "good event") one must have
$$0=\ell'_{\mathbf X}(\hat\theta)=\ell'_{\mathbf X}(\theta_0)+(\hat\theta-\theta_0)\,\ell''_{\mathbf X}(\theta_0)+\frac{(\hat\theta-\theta_0)^2}{2}\,\ell'''_{\mathbf X}\big(\theta_0+\xi(\hat\theta-\theta_0)\big) \tag{3.2}$$
$$=n\Big(\bar Z-(\hat\theta-\theta_0)\,\bar U+\frac{(\hat\theta-\theta_0)^2}{2}\,\bar R\Big) \tag{3.3}$$
for some $\xi\in(0,1)$, depending on the values of the $X_i$'s, where $\bar Z:=\frac1n\sum_{i=1}^nZ_i$, $\bar U:=\frac1n\sum_{i=1}^nU_i$, $\bar R:=\frac1n\sum_{i=1}^nR_i$, $\bar R^*:=\frac1n\sum_{i=1}^nR_i^*$,
$$Z_i:=\ell'_{X_i}(\theta_0),\qquad U_i:=-\ell''_{X_i}(\theta_0),$$
$$R_i:=\ell'''_{X_i}\big(\theta_0+\xi(\hat\theta-\theta_0)\big)\in[-R_i^*,R_i^*],\qquad R_i^*:=\sup_{\theta\in[\theta_0-\delta,\theta_0+\delta]}|\ell'''_{X_i}(\theta)|. \tag{3.4}$$
Note that the $Z_i$'s are i.i.d. r.v.'s, and so are the $U_i$'s and the $R_i^*$'s (but not necessarily the $R_i$'s).

Equalities (3.2) and (3.3) provide a quadratic equation for $\hat\theta$. So, on the event $G$ one has
$$\hat\theta-\theta_0=\frac{\bar Z}{\bar U}\ \text{ if } \bar R=0\ \&\ \bar U\ne0,\qquad\hat\theta-\theta_0\in\{d_+,d_-\}\ \text{ if } \bar R\ne0, \tag{3.5}$$
where
$$d_\pm:=\frac{\bar U\pm\sqrt{\bar U^2-2\bar Z\bar R}}{\bar R}.$$
Letting $B:=B_1\cup B_2$, where
$$B_1:=\{\bar R\ne0,\ \hat\theta-\theta_0=d_+\}\cup\{\bar U\le0\}\quad\text{and}\quad B_2:=\{\bar U^2\le2|\bar Z|\,\bar R^*\} \tag{3.6}$$
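($B$ for "bad event"), on the event $B_1\cap\{\bar U>0\}$ one has $|\hat\theta-\theta_0|=|d_+|\ge\bar U/|\bar R|\ge\bar U/\bar R^*$, whence, by (3.1),
$$\mathsf{P}(G\cap B_1)\le\mathsf{P}\Big(\bar U\le0\ \text{or}\ \frac{\bar U}{\bar R^*}\le\delta\Big)=\mathsf{P}\Big(\frac{\bar U}{\bar R^*}\le\delta\Big)=\mathsf{P}\Big(\sum_{i=1}^n(U_i-\delta R_i^*)\le0\Big). \tag{3.7}$$
By definitions (3.4) and conditions (II), (III), and (IV),
$$\mathsf{E}U_1>0,\quad\mathsf{E}|Z_1|^3<\infty,\quad\mathsf{E}|U_1|^3<\infty,\quad\mathsf{E}(R_1^*)^3<\infty, \tag{3.8}$$
and hence $\mathsf{E}R_1^*<\infty$. So, w.l.o.g. one may choose $\delta>0$ to be small enough so that
$$\delta_1:=\mathsf{E}(U_i-\delta R_i^*)>0.$$
Then, letting $Y_i:=(U_i-\delta R_i^*)-\mathsf{E}(U_i-\delta R_i^*)$ and using (3.7), Markov's inequality, and a Rosenthal-type inequality (see e.g. [19, Theorem 1.5]), one has
$$\begin{aligned}
\mathsf{P}(G\cap B_1)&\le\mathsf{P}\Big(\sum_{i=1}^nY_i\le-n\delta_1\Big)\le\frac{1}{(n\delta_1)^3}\,\mathsf{E}\Big|\sum_{i=1}^nY_i\Big|^3\\
&\le\frac{n\,\mathsf{E}|Y_1|^3+\sqrt{8/\pi}\,(n\,\mathsf{E}Y_1^2)^{3/2}}{(n\delta_1)^3}\le\frac{C}{n^{3/2}},
\end{aligned} \tag{3.9}$$
where $C:=\big(\mathsf{E}|Y_1|^3+\sqrt{8/\pi}\,(\mathsf{E}Y_1^2)^{3/2}\big)/\delta_1^3$, which depends on $\delta_1>0$, $\mathsf{E}Y_1^2<\infty$, and $\mathsf{E}|Y_1|^3<\infty$, but not on $n$.

Next, the occurrence of $B_2$ implies that of at least one of the following events: $B_{21}:=\{\bar U\le\frac12\mathsf{E}U_1\}$, $B_{22}:=\{\bar R^*\ge1+\mathsf{E}R_1^*\}$, or $B_{23}:=\{|\bar Z|\ge\frac18(\mathsf{E}U_1)^2/(1+\mathsf{E}R_1^*)\}$. So,
$$\mathsf{P}(B_2)\le\mathsf{P}(B_{21})+\mathsf{P}(B_{22})+\mathsf{P}(B_{23}). \tag{3.10}$$
In view of (3.8), the bounding of each of the probabilities $\mathsf{P}(B_{21})$, $\mathsf{P}(B_{22})$, $\mathsf{P}(B_{23})$ is quite similar to the bounding of $\mathsf{P}(G\cap B_1)$ in (3.9), because $\mathsf{P}(B_{21})=\mathsf{P}\big(\sum_{i=1}^nY_{i,21}\le-n\delta_{21}\big)$, $\mathsf{P}(B_{22})=\mathsf{P}\big(\sum_{i=1}^nY_{i,22}\ge n\delta_{22}\big)$, and $\mathsf{P}(B_{23})=\mathsf{P}\big(\big|\sum_{i=1}^nY_{i,23}\big|\ge n\delta_{23}\big)$, where $Y_{i,21}:=U_i-\mathsf{E}U_i$, $\delta_{21}:=\frac12\mathsf{E}U_1>0$, $Y_{i,22}:=R_i^*-\mathsf{E}R_1^*$, $\delta_{22}:=1>0$, $Y_{i,23}:=Z_i-\mathsf{E}Z_i=Z_i$, $\delta_{23}:=\frac18(\mathsf{E}U_1)^2/(1+\mathsf{E}R_1^*)>0$.

Thus, by (3.6), (3.9), and (3.10),
$$\mathsf{P}(G\cap B)\le\mathsf{P}(G\cap B_1)+\mathsf{P}(B_2)\le\frac{C}{n^{3/2}}. \tag{3.11}$$
On the other hand, if $\bar R\ne0$ and $\bar U>0$, then $d_-=\dfrac{2\bar Z}{\bar U+\sqrt{\bar U^2-2\bar Z\bar R}}$; here, the condition $\bar U>0$ was used only to ensure that the denominator of the latter ratio is nonzero. Hence, on the event $G\setminus B$ one has
$$\bar U>0\quad\text{and}\quad\hat\theta-\theta_0=\frac{2\bar Z}{\bar U+\sqrt{\bar U^2-2\bar Z\bar R}}\in[T_-,T_+], \tag{3.12}$$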
where
$$T_\pm:=\frac{2\bar Z}{\bar U+\sqrt{\bar U^2\mp2|\bar Z|\,\bar R^*}}; \tag{3.13}$$
note that, when $\bar R=0$ and $\bar U>0$, the expression of $\hat\theta-\theta_0$ in (3.12) is in agreement with the corresponding expression in (3.5).

Now that the desired bracketing of $\hat\theta-\theta_0$ between $T_-$ and $T_+$ is obtained in (3.12), we are ready to apply some of the mentioned general results of [20], presented in the next section.

4. General uniform and nonuniform bounds from [20] on the rate of convergence to normality for smooth nonlinear functions of sums of independent random vectors

The standard normal distribution function (d.f.) will be denoted by $\Phi$. For any $\mathbb R^d$-valued random vector $\zeta$, we use the norm notation
$$\|\zeta\|_p:=\big(\mathsf{E}\|\zeta\|^p\big)^{1/p}\quad\text{for any real } p\ge1,$$
where $\|\cdot\|$ denotes the Euclidean norm on $\mathbb R^d$.

Take any Borel-measurable functional $f\colon\mathbb R^d\to\mathbb R$ satisfying the following smoothness condition: there exist $\epsilon\in(0,\infty)$, $M_\epsilon\in(0,\infty)$, and a continuous linear functional $L\colon\mathbb R^d\to\mathbb R$ such that
$$|f(x)-L(x)|\le\frac{M_\epsilon}{2}\,\|x\|^2\quad\text{for all } x\in\mathbb R^d \text{ with } \|x\|\le\epsilon. \tag{4.1}$$
Thus, $f(0)=0$ and $L$ necessarily coincides with the first Fréchet derivative, $f'(0)$, of the function $f$ at 0. Moreover, for the smoothness condition (4.1) to hold, it is enough that
$$M_\epsilon\ge M_\epsilon^*:=\sup\Big\{\frac{1}{\|x\|^2}\,\Big|\frac{d^2}{dt^2}f(x+tx)\Big|_{t=0}\Big|\colon x\in\mathbb R^d,\ 0<\|x\|\le\epsilon\Big\}; \tag{4.2}$$
it is not necessary that $f$ be twice differentiable at 0. E.g., if $d=1$ and $f(x)=\frac{x}{1+|x|}$ for $x\in\mathbb R$, then $f(0)=0$, $f'(0)=1$, and $f''(x)=-\frac{2\operatorname{sign}x}{(1+|x|)^3}$ for real $x\ne0$; so, (4.1) holds for any real $\epsilon>0$ with $L(x)\equiv x$ and $M_\epsilon=2$, whereas $f''(0)$ does not exist.

Let $V,V_1,\dots,V_n$ be i.i.d. random vectors in $\mathbb R^d$, with $\mathsf{E}V=0$ and
$$\bar V:=\frac1n\sum_{i=1}^nV_i.$$
Further let
$$\tilde\sigma:=\|L(V)\|_2,\quad v_3:=\|V\|_3,\quad\text{and}\quad\varsigma_3:=\frac{\|L(V)\|_3}{\tilde\sigma}. \tag{4.3}$$
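The one-dimensional example $f(x)=x/(1+|x|)$ given after (4.2) can also be checked against (4.1) directly, without using second derivatives. The short computation below is only an illustrative check and is not part of the results quoted from [20]:

```latex
\[
  |f(x)-x|
  = |x|\,\Bigl|\frac{1}{1+|x|}-1\Bigr|
  = \frac{x^2}{1+|x|}
  \le x^2
  = \frac{2}{2}\,x^2
  \quad\text{for all real } x,
\]
```

so that (4.1) indeed holds with $L(x)\equiv x$ and $M_\epsilon=2$ for every real $\epsilon>0$.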
Theorem 4.1. [20] Suppose that (4.1) holds, and that $\tilde\sigma>0$ and $v_3<\infty$. Then for all $z\in\mathbb R$
$$\Big|\mathsf{P}\Big(\frac{f(\bar V)}{\tilde\sigma/\sqrt n}\le z\Big)-\Phi(z)\Big|\le\frac{C}{\sqrt n}, \tag{4.4}$$
where $C$ is a finite positive expression that depends only on the function $f$ (through (4.1)) and the moments $\tilde\sigma$, $\varsigma_3$, and $v_3$. Moreover, for any $\omega\in(0,\infty)$ and for all
$$z\in\big(0,\omega\sqrt n\,\big] \tag{4.5}$$
one has
$$\Big|\mathsf{P}\Big(\frac{f(\bar V)}{\tilde\sigma/\sqrt n}\le z\Big)-\Phi(z)\Big|\le\frac{C_\omega}{z^3\sqrt n}, \tag{4.6}$$
where $C_\omega$ is a finite positive expression that depends only on the function $f$ (through (4.1)), the moments $\tilde\sigma$, $\varsigma_3$, and $v_3$, and also on $\omega$.

The restriction (4.5) cannot be relaxed in general; see [20].

To simplify the presentation, in what follows let $C$ stand for various finite positive expressions whose values do not depend on $n$ or $z$; that is, $C$ will denote various positive real constants with respect to $n$ and $z$. However, $C$ may depend on other attributes of the setting, including the model $(\mathsf{P}_\theta)_{\theta\in\Theta}$ under consideration, the $\mathsf{P}_{\theta_0}$-distribution of $X_1$, and the values of parameters freely chosen in a given range (such as $\omega$ in (4.5) and $\epsilon$ in (4.1)).

5. Making the bracketing work: Applying the general bounds of [20]

Now let $d=3$ and then let
$$\mathcal D:=\big\{x=(x_1,x_2,x_3)\in\mathbb R^d=\mathbb R^3\colon x_2+\mathsf{E}U_1>0,\ (x_2+\mathsf{E}U_1)^2>2|x_1|\,|x_3+\mathsf{E}R_1^*|\big\}.$$
By (3.4) and conditions (II) and (IV), $\mathsf{E}U_1=I(\theta_0)\in(0,\infty)$ and $\mathsf{E}R_1^*\in[0,\infty)$. So, for some real $\epsilon>0$, the set $\mathcal D$ contains the $\epsilon$-neighborhood of the origin 0 of $\mathbb R^3$.

Define functions $f_\pm\colon\mathbb R^3\to\mathbb R$ by the formula
$$f_\pm(x)=f_\pm(x_1,x_2,x_3)=\frac{2x_1}{x_2+\mathsf{E}U_1+\sqrt{(x_2+\mathsf{E}U_1)^2\mp2|x_1|\,|x_3+\mathsf{E}R_1^*|}} \tag{5.1}$$
for $x=(x_1,x_2,x_3)\in\mathcal D$, and let $f_\pm(x):=0$ if $x\in\mathbb R^3\setminus\mathcal D$.
Clearly, $f_\pm(0)=0$,
$$L_\pm(x):=f'_\pm(0)(x)=\frac{x_1}{\mathsf{E}U_1}=\frac{x_1}{I(\theta_0)} \tag{5.2}$$
for $x=(x_1,x_2,x_3)\in\mathbb R^3$, and, in accordance with (4.2), the smoothness condition (4.1) holds for some $\epsilon$ and $M_\epsilon$ in $(0,\infty)$, because, as was noted above, $\mathsf{E}U_1=I(\theta_0)\in(0,\infty)$ and $\mathsf{E}R_1^*\in[0,\infty)$, and hence the denominator of the ratio in (5.1) is bounded away from 0 for $x=(x_1,x_2,x_3)$ in a neighborhood of 0.

Next, let
$$V_i:=(Z_i,\ U_i-\mathsf{E}U_i,\ R_i^*-\mathsf{E}R_i^*) \tag{5.3}$$
for $i=1,\dots,n$, with $Z_i,U_i,R_i^*$ as defined in (3.4). Then, by (4.3), (5.2), and condition (II), for $f=f_\pm$,
$$\tilde\sigma=\sqrt{\frac{\mathsf{E}Z_1^2}{I(\theta_0)^2}}=\frac{1}{\sqrt{I(\theta_0)}}>0 \tag{5.4}$$
and $v_3^3=\mathsf{E}\|V\|^3<\infty$ by conditions (III) and (IV). So, all the conditions of Theorem 4.1 are satisfied for $f=f_\pm$.

Moreover, by (3.13), (5.1), and (5.3),
$$T_\pm=f_\pm(\bar V)$$
on the event $G\setminus B$. So, by the inclusion relation in (3.12) (which holds on the event $G\setminus B=(G^c\cup B)^c$, where ${}^c$ denotes the complement) and (5.4), inequality (4.4) in Theorem 4.1 implies
$$\mathsf{P}\big(\sqrt{nI(\theta_0)}\,(\hat\theta-\theta_0)\le z\big)\le\mathsf{P}\big(\sqrt{nI(\theta_0)}\,f_-(\bar V)\le z\big)+\mathsf{P}(G^c\cup B)\le\Phi(z)+\frac{C}{\sqrt n}+\mathsf{P}(G^c\cup B)$$
and, quite similarly,
$$\mathsf{P}\big(\sqrt{nI(\theta_0)}\,(\hat\theta-\theta_0)\le z\big)\ge\mathsf{P}\big(\sqrt{nI(\theta_0)}\,f_+(\bar V)\le z\big)-\mathsf{P}(G^c\cup B)\ge\Phi(z)-\frac{C}{\sqrt n}-\mathsf{P}(G^c\cup B),$$
for all real $z$. Note that $\mathsf{P}(G^c\cup B)=\mathsf{P}(G^c)+\mathsf{P}(G\cap B)$. It follows now by (3.1) and (3.11) that
$$\Big|\mathsf{P}\big(\sqrt{nI(\theta_0)}\,(\hat\theta-\theta_0)\le z\big)-\Phi(z)\Big|\le\frac{C}{\sqrt n}+\mathsf{P}(|\hat\theta-\theta_0|>\delta) \tag{5.5}$$
for all real $z$. Quite similarly, but using (4.6) instead of (4.4), one has
$$\Big|\mathsf{P}\big(\sqrt{nI(\theta_0)}\,(\hat\theta-\theta_0)\le z\big)-\Phi(z)\Big|\le\frac{C}{z^3\sqrt n}+\mathsf{P}(|\hat\theta-\theta_0|>\delta) \tag{5.6}$$
for $z$ as in (4.5).
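The passage to the uniform bound (5.5) can be spelled out as follows. This is just a sketch of the bookkeeping, with $C$ denoting (as agreed in Section 4) possibly different constants in different places:

```latex
\begin{align*}
  \mathsf{P}(G^c\cup B)
    &= \mathsf{P}(G^c)+\mathsf{P}(G\cap B) \\
    &\le \mathsf{P}(|\hat\theta-\theta_0|>\delta)+\frac{C}{n^{3/2}}
       && \text{by (3.1) and (3.11)} \\
    &\le \mathsf{P}(|\hat\theta-\theta_0|>\delta)+\frac{C}{\sqrt n}
       && \text{since } n\ge 1 .
\end{align*}
```

Substituting this into the upper and lower bounds on $\mathsf{P}\big(\sqrt{nI(\theta_0)}\,(\hat\theta-\theta_0)\le z\big)$ displayed before (5.5), and merging the two terms of the form $C/\sqrt n$ into one, gives (5.5).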
Typically, given rather standard regularity conditions, the remainder term $\mathsf{P}(|\hat\theta-\theta_0|>\delta)$ decreases exponentially fast in $n$ and thus is negligible as compared with the "error" term $\frac{C}{\sqrt n}$, and even with the "error" term $\frac{C}{z^3\sqrt n}$ under condition (4.5). Some details on this can be found in the following section.
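As a rough numerical illustration of the $1/\sqrt n$ rate in (5.5), here is a minimal Python sketch for one concrete model, chosen here for convenience and not taken from the paper: i.i.d. exponential observations with rate $\theta_0=1$. In that model $\hat\theta=n/\sum_i X_i$, $I(\theta)=1/\theta^2$, and $\sum_i X_i$ has the Gamma distribution with shape $n$, so the d.f. of $\sqrt{nI(\theta_0)}\,(\hat\theta-\theta_0)$ can be computed exactly. All function names are hypothetical, and the supremum over $z$ is only approximated on a finite grid.

```python
import math


def std_normal_cdf(z):
    """Standard normal d.f. Phi(z)."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def gamma_sf_int_shape(x, n):
    """P(S >= x) for S ~ Gamma(shape=n, rate=1) with integer n >= 1.

    Uses the identity P(S >= x) = P(Poisson(x) <= n - 1); the Poisson
    probabilities are evaluated in log-space to avoid underflow.
    """
    if x <= 0.0:
        return 1.0
    log_x = math.log(x)
    return sum(math.exp(-x + k * log_x - math.lgamma(k + 1)) for k in range(n))


def mle_cdf(z, n):
    """Exact P(sqrt(n I(theta0)) (theta_hat - theta0) <= z) in the assumed
    exponential model with theta0 = 1, where theta_hat = n / S."""
    scale = 1.0 + z / math.sqrt(n)
    if scale <= 0.0:
        return 0.0  # theta_hat is always positive
    # theta_hat <= scale  <=>  S >= n / scale
    return gamma_sf_int_shape(n / scale, n)


def kolmogorov_distance(n, z_max=4.0, steps=400):
    """Grid approximation to sup_z |P(... <= z) - Phi(z)| on [-z_max, z_max]."""
    grid = (-z_max + 2.0 * z_max * j / steps for j in range(steps + 1))
    return max(abs(mle_cdf(z, n) - std_normal_cdf(z)) for z in grid)


if __name__ == "__main__":
    for n in (100, 400, 1600):
        d = kolmogorov_distance(n)
        # If the rate is of order 1/sqrt(n), the last column should be
        # roughly constant in n.
        print(n, d, d * math.sqrt(n))
```

Note that this particular model satisfies (1.1) with $q(\theta)=1/\theta$ and $g(x)=x$, so it is also covered by the earlier results in [1, 20]; it is used here only because the exact distribution of the MLE is available in closed form.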