Maximum likelihood estimators for the extreme value index based on the block maxima method

Clément Dombry∗

January 24, 2013

∗Université de Poitiers, Laboratoire de Mathématiques et Applications, UMR CNRS 7348, Téléport 2, BP 30179, F-86962 Futuroscope-Chasseneuil cedex, France. Email: [email protected]

Abstract

The maximum likelihood method offers a standard way to estimate the three parameters of a generalized extreme value (GEV) distribution. Combined with the block maxima method, it is often used in practice to assess the extreme value index and the normalization constants of a distribution satisfying a first order extreme value condition, assuming implicitly that the block maxima are exactly GEV distributed. This is unsatisfactory, since the GEV distribution is a good approximation of the block maxima distribution only for blocks of large size. The purpose of this paper is to provide a theoretical basis for this methodology. Under a first order extreme value condition only, we prove the existence and consistency of the maximum likelihood estimators for the extreme value index and the normalization constants within the framework of the block maxima method.

Key words: extreme value index, maximum likelihood estimator, block maxima method, consistency.

AMS Subject classification: 62G32.

1 Introduction and results

Estimation of the extreme value index is a central problem in extreme value theory. A variety of estimators are available in the literature, for example the Hill estimator [8], the Pickands estimator [13], the probability weighted moment estimator introduced by Hosking et al. [10] and the moment estimator suggested by Dekkers et al. [5]. The monographs by Embrechts et al. [7], Beirlant et al. [2] and de Haan and Ferreira [4] provide good reviews of this estimation problem.

In this paper, we are interested in estimators based on the maximum likelihood method. Two different types of maximum likelihood estimators (MLEs) have been introduced, based on the peaks over threshold method and on the block maxima method respectively. The peaks over threshold method relies on the fact that, under the extreme value condition, exceedances over a high threshold converge to a generalized Pareto distribution (GPD) (see Balkema and de Haan [1]). An MLE within the GPD model has been proposed by Smith [18]. Its theoretical properties under the extreme value condition are quite difficult to analyze, due to the absence of an explicit expression for the solution of the likelihood equations: existence and consistency have been proven by Zhou [21], asymptotic normality by Drees et al. [6]. The block maxima method relies on the approximation of the distribution of maxima by a generalized extreme value (GEV) distribution. Computational issues for ML estimation within the GEV model have been considered by Prescott and Walden [14, 15], Hosking [9] and Macleod [11]. Since the support of the GEV distribution depends on the unknown extreme value index $\gamma$, the usual regularity conditions ensuring good asymptotic properties are not satisfied. This problem was studied by Smith [17]: asymptotic normality is proven for $\gamma > -1/2$ and consistency for $\gamma > -1$.

It should be stressed that the block maxima method is based on the assumption that the observations come from a distribution satisfying the extreme value condition, so that the maximum of a large number of observations approximately follows a GEV distribution.
On the contrary, the good properties of the maximum likelihood estimator rely implicitly on the assumption that the block maxima are exactly GEV distributed. In many situations, this strong assumption is unsatisfactory, and we shall only suppose that the underlying distribution is in the domain of attraction of an extreme value distribution. The purpose of the present paper is to justify the maximum likelihood method within the block maxima framework under an extreme value condition only.

We first recall some basic notions of univariate extreme value theory. The extreme value distribution with index $\gamma$ is denoted by $G_\gamma$ and has distribution function
\[
F_\gamma(x) = \exp\big(-(1+\gamma x)^{-1/\gamma}\big), \qquad 1+\gamma x > 0.
\]
We say that a distribution function $F$ satisfies the extreme value condition with index $\gamma$, or equivalently that $F$ belongs to the domain of attraction of $G_\gamma$, if there exist constants $a_m > 0$ and $b_m$ such that
\[
\lim_{m\to+\infty} F^m(a_m x + b_m) = F_\gamma(x), \qquad x \in \mathbb{R}. \tag{1}
\]
This is commonly denoted by $F \in D(G_\gamma)$. The necessary and sufficient conditions for $F \in D(G_\gamma)$ can be presented in different ways, see e.g. de Haan [3] or de Haan and Ferreira [4, Chapter 1]. We recall the following simple criterion and choice of normalization constants.

Theorem 1. Let $U = \big(\frac{1}{1-F}\big)^{\leftarrow}$ be the left-continuous inverse function of $1/(1-F)$. Then $F \in D(G_\gamma)$ if and only if there exists a function $a(t) > 0$ such that
\[
\lim_{t\to+\infty} \frac{U(tx)-U(t)}{a(t)} = \frac{x^\gamma - 1}{\gamma} \qquad \text{for all } x > 0.
\]
A possible choice for the function $a(t)$ is given by
\[
a(t) = \begin{cases} \gamma\, U(t), & \gamma > 0, \\ -\gamma\,\big(U(\infty)-U(t)\big), & \gamma < 0, \\ U(t) - t^{-1}\int_0^t U(s)\,ds, & \gamma = 0, \end{cases}
\]
and a possible choice for the normalization constants in (1) is $a_m = a(m)$ and $b_m = U(m)$.

In the sequel, we will always use the normalization constants $(a_m)$ and $(b_m)$ given in Theorem 1. Note that they are unique up to asymptotic equivalence in the following sense: if $(a'_m)$ and $(b'_m)$ are such that $F^m(a'_m x + b'_m) \to F_\gamma(x)$ for all $x \in \mathbb{R}$, then
\[
\lim_{m\to+\infty} \frac{a'_m}{a_m} = 1 \quad\text{and}\quad \lim_{m\to+\infty} \frac{b'_m - b_m}{a_m} = 0. \tag{2}
\]

The log-likelihood of the extreme value distribution $G_\gamma$ is given by
\[
\ell_\gamma(x) = -(1+1/\gamma)\log(1+\gamma x) - (1+\gamma x)^{-1/\gamma}
\]
if $1+\gamma x > 0$, and $-\infty$ otherwise. For $\gamma = 0$, the formula is interpreted as $\ell_0(x) = -x - \exp(-x)$. The three-parameter extreme value distribution with shape $\gamma$, location $\mu$ and scale $\sigma > 0$ has distribution function $x \mapsto F_\gamma\big(\frac{x-\mu}{\sigma}\big)$. The corresponding log-likelihood is
\[
\ell_{(\gamma,\mu,\sigma)}(x) = \ell_\gamma\Big(\frac{x-\mu}{\sigma}\Big) - \log\sigma.
\]

The set-up of the block maxima method is the following. We consider independent and identically distributed (i.i.d.) random variables $(X_i)_{i\ge 1}$ with common distribution function $F \in D(G_{\gamma_0})$ and corresponding normalization sequences $(a_m)$ and $(b_m)$ as in Theorem 1. We divide the sequence $(X_i)_{i\ge 1}$ into blocks of length $m \ge 1$ and define the $k$-th block maximum by
\[
M_{k,m} = \max(X_{(k-1)m+1},\ldots,X_{km}), \qquad k \ge 1.
\]
For fixed $m \ge 1$, the variables $(M_{k,m})_{k\ge 1}$ are i.i.d. with distribution function $F^m$ and
\[
\frac{M_{k,m}-b_m}{a_m} \Longrightarrow G_{\gamma_0} \qquad \text{as } m \to +\infty. \tag{3}
\]
Equation (3) suggests that the distribution of $M_{k,m}$ is approximately a GEV distribution with parameters $(\gamma_0, b_m, a_m)$, and it is standard to estimate these parameters by the maximum likelihood method. The log-likelihood of the $n$-sample $(M_{1,m},\ldots,M_{n,m})$ is
\[
L_n(\gamma,\mu,\sigma) = \frac{1}{n}\sum_{k=1}^n \ell_{(\gamma,\mu,\sigma)}(M_{k,m}).
\]
In general, $L_n$ has no global maximum, which leads us to the following weak notion: we say that $(\hat\gamma_n,\hat\mu_n,\hat\sigma_n)$ is an MLE if $L_n$ has a local maximum at $(\hat\gamma_n,\hat\mu_n,\hat\sigma_n)$. Clearly, an MLE solves the likelihood equations
\[
\nabla L_n = 0 \quad\text{with}\quad \nabla L_n = \Big(\frac{\partial L_n}{\partial\gamma}, \frac{\partial L_n}{\partial\mu}, \frac{\partial L_n}{\partial\sigma}\Big). \tag{4}
\]
Conversely, any solution of the likelihood equations with a negative definite Hessian matrix is an MLE.
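To fix ideas, here is a minimal numerical sketch of the procedure just described; it is not part of the paper. We simulate i.i.d. Pareto data, which belong to $D(G_{\gamma_0})$ with $\gamma_0 = 1/\alpha$, form block maxima and fit the three-parameter GEV by maximum likelihood. The sketch relies on SciPy's `genextreme`, whose shape parameter $c$ equals $-\gamma$ in the parameterization used here; the values of $\alpha$, $n$, $m$ and the seed are arbitrary illustrative choices.

```python
import numpy as np
from scipy.stats import genextreme

rng = np.random.default_rng(0)

# i.i.d. sample from the Pareto distribution F(x) = 1 - x^(-alpha), x >= 1,
# which belongs to D(G_{gamma0}) with gamma0 = 1/alpha (inverse transform:
# if V is uniform on (0,1), then V**(-1/alpha) has distribution F).
alpha, gamma0 = 2.0, 0.5
n, m = 500, 200                                # n blocks of length m
X = rng.uniform(size=n * m) ** (-1.0 / alpha)

# Block maxima M_{k,m} = max(X_{(k-1)m+1}, ..., X_{km}), k = 1, ..., n.
M = X.reshape(n, m).max(axis=1)

# Three-parameter GEV fit by maximum likelihood.  SciPy's genextreme uses
# the shape parameter c = -gamma relative to the convention of this paper.
c_hat, mu_hat, sigma_hat = genextreme.fit(M)
print("gamma_hat =", -c_hat)          # close to gamma0 = 0.5
print("mu_hat    =", mu_hat)          # close to b_m = m**gamma0
print("sigma_hat =", sigma_hat)       # close to a_m = gamma0 * m**gamma0
```

For this Pareto distribution, $U(t) = t^{\gamma_0}$, so Theorem 1 gives $b_m = m^{\gamma_0}$ and $a_m = \gamma_0 m^{\gamma_0}$; by (3), the fitted location and scale should be close to these values.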
For the asymptotic analysis, we let the length of the blocks $m = m(n)$ depend on the sample size $n$. Our main result is the following theorem, stating the existence of consistent MLEs.

Theorem 2. Suppose $F \in D(G_{\gamma_0})$ with $\gamma_0 > -1$ and assume that
\[
\lim_{n\to+\infty} \frac{m(n)}{\log n} = +\infty. \tag{5}
\]
Then there exist a sequence of estimators $(\hat\gamma_n,\hat\mu_n,\hat\sigma_n)$ and a random integer $N \ge 1$ such that
\[
P\big[(\hat\gamma_n,\hat\mu_n,\hat\sigma_n) \text{ is an MLE for all } n \ge N\big] = 1 \tag{6}
\]
and
\[
\hat\gamma_n \xrightarrow{a.s.} \gamma_0, \qquad \frac{\hat\mu_n - b_m}{a_m} \xrightarrow{a.s.} 0 \qquad\text{and}\qquad \frac{\hat\sigma_n}{a_m} \xrightarrow{a.s.} 1 \qquad \text{as } n \to +\infty. \tag{7}
\]

The condition $\gamma_0 > -1$ is natural and agrees with Smith [17]: it is easy to see that the likelihood equations (4) have no solution with $\gamma \le -1$, so that no consistent MLE exists when $\gamma_0 < -1$ (see Remark 3 below). Condition (5) states that the block length $m(n)$ grows faster than logarithmically in the sample size $n$, which is not very restrictive. Let us mention a few further remarks on this condition.

Remark 1. Some control of the block size is needed, as the following simple example shows. Consider a distribution $F \in D(G_{\gamma_0})$ with $\gamma_0 > 0$ and such that the left endpoint of $F$ is equal to $-\infty$. Then for each $m \ge 1$, the distribution of the block maximum $M_{k,m}$ has left endpoint equal to $-\infty$, and there exists a sequence $m(n)$ (growing slowly to $+\infty$) such that
\[
\lim_{n\to+\infty} \min_{1\le k\le n} \frac{M_{k,m}-b_m}{a_m} = -\infty \qquad \text{almost surely.} \tag{8}
\]
The log-likelihood $L_n(\gamma,\mu,\sigma)$ is finite if and only if
\[
\min_{1\le k\le n} \Big(1+\gamma\,\frac{M_{k,m}-\mu}{\sigma}\Big) > 0,
\]
so that any MLE $(\hat\gamma_n,\hat\mu_n,\hat\sigma_n)$ must satisfy
\[
\min_{1\le k\le n} \Big(1+\hat\gamma_n\,\frac{M_{k,m}-\hat\mu_n}{\hat\sigma_n}\Big) > 0.
\]
Using this observation, one easily shows that Equation (8) is an obstruction to the consistency (7) of the MLE. Of course, this phenomenon cannot happen under condition (5).

Remark 2. It should be stressed that condition (5) appears only in the proof of Lemma 4 below. One can prove that, under stronger assumptions on the distribution $F \in D(G_{\gamma_0})$, condition (5) can be relaxed. This is for example the case if $F$ is a Pareto distribution function: one checks easily that the proof of Lemma 4 goes through under the weaker condition $\lim_{n\to+\infty} m(n) = +\infty$. Hence Theorem 2 holds under this weaker condition in the Pareto case. In order to avoid technical conditions that are hard to check in practice when $F$ is unknown, we do not develop this direction any further.

The structure of the paper is the following. We gather in Section 2 some preliminaries on the properties of the GEV log-likelihood and of the empirical distribution associated to normalized block maxima. Section 3 is devoted to the proof of Theorem 2, which relies on an adaptation of Wald's method for proving the consistency of M-estimators. Some technical computations (the proof of Lemma 4) involving regular variation theory are postponed to an Appendix.
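As a quick numerical illustration of Theorem 2 (again an illustrative sketch, not taken from the paper), one can let the block length grow as $m(n) = \lceil (\log n)^2 \rceil$, which satisfies condition (5), and track the MLE along increasing sample sizes for the Pareto example of Remark 2; the rescaled estimates should approach $(\gamma_0, 0, 1)$ as in (7). The choice of $m(n)$ and the seed are arbitrary.

```python
import numpy as np
from scipy.stats import genextreme

rng = np.random.default_rng(1)

alpha, gamma0 = 2.0, 0.5                       # Pareto example, gamma0 = 1/alpha

for n in [100, 1000, 10000]:
    m = int(np.ceil(np.log(n) ** 2))           # m(n)/log(n) -> infinity: condition (5)
    X = rng.uniform(size=n * m) ** (-1.0 / alpha)
    M = X.reshape(n, m).max(axis=1)            # n block maxima of length m
    c, mu, sigma = genextreme.fit(M)           # MLE, SciPy shape c = -gamma
    b_m, a_m = m ** gamma0, gamma0 * m ** gamma0   # Theorem 1 constants
    print(f"n={n:6d} m={m:3d}  gamma_hat={-c:+.3f}  "
          f"(mu-b_m)/a_m={(mu - b_m) / a_m:+.3f}  sigma/a_m={sigma / a_m:.3f}")
```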
2 Preliminaries

2.1 Properties of the GEV log-likelihood

We gather in the following proposition some basic properties of the GEV log-likelihood. We denote by $x_\gamma^-$ and $x_\gamma^+$ the left and right endpoints of the domain of $\ell_\gamma$, i.e.
\[
(x_\gamma^-, x_\gamma^+) = \{x \in \mathbb{R};\ 1+\gamma x > 0\}.
\]
This interval is equal to $(-\infty,-1/\gamma)$, $\mathbb{R}$ and $(-1/\gamma,+\infty)$ when $\gamma < 0$, $\gamma = 0$ and $\gamma > 0$ respectively.

Proposition 1. The function $\ell_\gamma$ is smooth on its domain.
1. If $\gamma \le -1$, $\ell_\gamma$ is strictly increasing on its domain and
\[
\lim_{x\to x_\gamma^-} \ell_\gamma(x) = -\infty, \qquad \lim_{x\to x_\gamma^+} \ell_\gamma(x) = \begin{cases} +\infty & \text{if } \gamma < -1, \\ 0 & \text{if } \gamma = -1. \end{cases}
\]
2. If $\gamma > -1$, $\ell_\gamma$ is increasing on $(x_\gamma^-, x_\gamma^*]$ and decreasing on $[x_\gamma^*, x_\gamma^+)$, where
\[
x_\gamma^* = \frac{(1+\gamma)^{-\gamma}-1}{\gamma}.
\]
Furthermore,
\[
\lim_{x\to x_\gamma^-} \ell_\gamma(x) = \lim_{x\to x_\gamma^+} \ell_\gamma(x) = -\infty
\]
and $\ell_\gamma$ attains its maximum $\ell_\gamma(x_\gamma^*) = (1+\gamma)\big(\log(1+\gamma)-1\big)$ uniquely at $x_\gamma^*$.

Remark 3. According to Proposition 1, the log-likelihood $\ell_\gamma$ has no local maximum in the case $\gamma \le -1$. This entails that the log-likelihood $L_n$ has no local maximum on $(-\infty,-1]\times\mathbb{R}\times(0,+\infty)$, and that any MLE $(\hat\gamma_n,\hat\mu_n,\hat\sigma_n)$ satisfies $\hat\gamma_n > -1$. Hence, no consistent MLE exists if $\gamma_0 < -1$. The limit case $\gamma_0 = -1$ is more difficult to analyze and is disregarded in this paper.

2.2 Normalized block maxima

In view of Equation (3), we define the normalized block maxima
\[
\widetilde M_{k,m} = \frac{M_{k,m}-b_m}{a_m}, \qquad k \ge 1,
\]
and the corresponding log-likelihood
\[
\widetilde L_n(\gamma,\mu,\sigma) = \frac{1}{n}\sum_{k=1}^n \ell_{(\gamma,\mu,\sigma)}(\widetilde M_{k,m(n)}).
\]
It should be stressed that the normalization sequences $(a_m)$ and $(b_m)$ are unknown, so that the normalized block maxima $\widetilde M_{k,m}$ and the likelihood $\widetilde L_n$ cannot be computed from the data only. However, they will be useful in our theoretical analysis since they have good asymptotic properties. The following simple observation will be useful.

Lemma 1. $(\hat\gamma_n,\hat\mu_n,\hat\sigma_n)$ is an MLE if and only if $\widetilde L_n$ has a local maximum at $(\hat\gamma_n, (\hat\mu_n-b_m)/a_m, \hat\sigma_n/a_m)$.

Proof. The GEV log-likelihood satisfies the scaling property
\[
\ell_{(\gamma,\mu,\sigma)}\Big(\frac{x-b}{a}\Big) = \ell_{(\gamma,\, a\mu+b,\, a\sigma)}(x) + \log a,
\]
so that
\[
L_n(\gamma,\mu,\sigma) = \widetilde L_n\Big(\gamma, \frac{\mu-b_m}{a_m}, \frac{\sigma}{a_m}\Big) - \log a_m.
\]
Hence the local maximizers of $L_n$ and $\widetilde L_n$ are in direct correspondence and the lemma follows.

2.3 Empirical distributions

The likelihood $\widetilde L_n$ can be seen as a functional of the empirical distribution
\[
P_n = \frac{1}{n}\sum_{k=1}^n \delta_{\widetilde M_{k,m}},
\]
where $\delta_x$ denotes the Dirac mass at the point $x \in \mathbb{R}$. For any measurable $f : \mathbb{R} \to [-\infty,+\infty)$, we write $P_n[f]$ for the integral of $f$ with respect to $P_n$, i.e.
\[
P_n[f] = \frac{1}{n}\sum_{k=1}^n f(\widetilde M_{k,m}).
\]
With these notations, it holds that
\[
\widetilde L_n(\gamma,\mu,\sigma) = P_n[\ell_{(\gamma,\mu,\sigma)}].
\]
The empirical distribution function is defined by
\[
F_n(t) = P_n\big((-\infty,t]\big) = \frac{1}{n}\sum_{k=1}^n \mathbf{1}_{\{\widetilde M_{k,m}\le t\}}, \qquad t \in \mathbb{R}.
\]
In the case of an i.i.d. sequence, the Glivenko–Cantelli Theorem states that the empirical distribution function converges almost surely uniformly to the underlying distribution function. According to the general theory of empirical processes (see e.g. Shorack and Wellner [16], Theorem 1, p. 106), this result can be extended to triangular arrays of i.i.d. random variables. Equation (3) then entails the following result.

Lemma 2. Suppose $F \in D(G_{\gamma_0})$ and $\lim_{n\to+\infty} m(n) = +\infty$. Then,
\[
\sup_{t\in\mathbb{R}} |F_n(t) - F_{\gamma_0}(t)| \xrightarrow{a.s.} 0 \qquad \text{as } n \to +\infty.
\]
This entails the almost sure weak convergence $P_n \Rightarrow G_{\gamma_0}$, whence
\[
P_n[f] \xrightarrow{a.s.} G_{\gamma_0}[f] \qquad \text{as } n \to +\infty
\]
for every bounded and continuous function $f : \mathbb{R} \to \mathbb{R}$. The following lemma, dealing with more general functions, will be useful.

Lemma 3. Suppose $F \in D(G_{\gamma_0})$ and $\lim_{n\to+\infty} m(n) = +\infty$. Then, for every upper semi-continuous function $f : \mathbb{R} \to [-\infty,+\infty)$ bounded from above,
\[
\limsup_{n\to+\infty} P_n[f] \le G_{\gamma_0}[f] \qquad \text{a.s.}
\]

Proof of Lemma 3. Let $M$ be an upper bound for $f$. The function $\tilde f = M - f$ is non-negative and lower semi-continuous. Clearly,
\[
P_n[f] = M - P_n[\tilde f] \quad\text{and}\quad G_{\gamma_0}[f] = M - G_{\gamma_0}[\tilde f],
\]
whence it is enough to prove that
\[
\liminf_{n\to+\infty} P_n[\tilde f] \ge G_{\gamma_0}[\tilde f] \qquad \text{a.s.}
\]
To see this, we use the relation
\[
P_n[\tilde f] = \int_0^1 \tilde f\big(F_n^{\leftarrow}(u)\big)\,du,
\]
where $F_n^{\leftarrow}$ is the left-continuous inverse function
\[
F_n^{\leftarrow}(u) = \inf\{x \in \mathbb{R};\ F_n(x) \ge u\}, \qquad u \in (0,1).
\]
Lemma 2, together with the continuity of the distribution function $F_{\gamma_0}$, entails that almost surely $F_n^{\leftarrow}(u) \to F_{\gamma_0}^{\leftarrow}(u)$ for all $u \in (0,1)$ as $n \to +\infty$. Using the fact that $\tilde f$ is lower semi-continuous, we obtain
\[
\liminf_{n\to+\infty} \tilde f\big(F_n^{\leftarrow}(u)\big) \ge \tilde f\big(F_{\gamma_0}^{\leftarrow}(u)\big), \qquad u \in (0,1).
\]
On the other hand, according to Fatou's lemma,
\[
\liminf_{n\to+\infty} \int_0^1 \tilde f\big(F_n^{\leftarrow}(u)\big)\,du \ge \int_0^1 \liminf_{n\to+\infty} \tilde f\big(F_n^{\leftarrow}(u)\big)\,du.
\]
Combining the two inequalities, we obtain
\[
\liminf_{n\to+\infty} P_n[\tilde f] \ge \int_0^1 \tilde f\big(F_{\gamma_0}^{\leftarrow}(u)\big)\,du = G_{\gamma_0}[\tilde f] \qquad \text{a.s.},
\]
which proves the lemma.
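Lemma 2 lends itself to a direct numerical check (an illustrative sketch under the same Pareto assumptions as before, not part of the paper): the Kolmogorov–Smirnov distance between the empirical distribution function of the normalized block maxima and $F_{\gamma_0}$ should shrink as $n$ grows.

```python
import numpy as np
from scipy.stats import genextreme, kstest

rng = np.random.default_rng(2)

alpha, gamma0 = 2.0, 0.5                       # Pareto example, gamma0 = 1/alpha

for n in [100, 1000, 10000]:
    m = int(np.ceil(np.log(n) ** 2))
    b_m, a_m = m ** gamma0, gamma0 * m ** gamma0          # Theorem 1 constants
    X = rng.uniform(size=n * m) ** (-1.0 / alpha)
    M_tilde = (X.reshape(n, m).max(axis=1) - b_m) / a_m   # normalized maxima
    # Kolmogorov-Smirnov distance sup_t |F_n(t) - F_{gamma0}(t)|,
    # with F_{gamma0} the GEV distribution of SciPy shape c = -gamma0.
    d = kstest(M_tilde, genextreme.cdf, args=(-gamma0,)).statistic
    print(f"n={n:6d} m={m:3d}  sup|F_n - F_gamma0| = {d:.4f}")
```

Note that Lemma 2 only requires $m(n) \to +\infty$; the $(\log n)^2$ block length is reused here merely for convenience.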
The next lemma plays a crucial role in our proof of Theorem 2. Its proof is quite technical and is postponed to the Appendix.

Lemma 4. Suppose $F \in D(G_{\gamma_0})$ with $\gamma_0 > -1$ and assume condition (5) is satisfied. Then,
\[
\lim_{n\to+\infty} P_n[\ell_{\gamma_0}] = G_{\gamma_0}[\ell_{\gamma_0}] \qquad \text{a.s.} \tag{9}
\]
It should be stressed that Lemma 4 is the only part of the proof of Theorem 2 where condition (5) is needed (see Remark 2).

3 Proof of Theorem 2

We introduce the shorthand notation $\Theta = (-1,+\infty)\times\mathbb{R}\times(0,+\infty)$. A generic point of $\Theta$ is denoted by $\theta = (\gamma,\mu,\sigma)$. The restriction $\widetilde L_n : \Theta \to [-\infty,+\infty)$ is continuous, so that for any compact $K \subset \Theta$, $\widetilde L_n$ is bounded on $K$ and attains its maximum there. We can thus define $\widetilde\theta_n^K = (\widetilde\gamma_n^K, \widetilde\mu_n^K, \widetilde\sigma_n^K)$ such that
\[
\widetilde\theta_n^K = \operatorname*{argmax}_{\theta\in K} \widetilde L_n(\theta). \tag{10}
\]
The following proposition is the key step in the proof of Theorem 2.

Proposition 2. Let $\theta_0 = (\gamma_0,0,1)$ and let $K \subset \Theta$ be a compact neighborhood of $\theta_0$. Under the assumptions of Theorem 2,
\[
\lim_{n\to+\infty} \widetilde\theta_n^K = \theta_0 \qquad \text{a.s.}
\]

The proof of Proposition 2 relies on an adaptation of Wald's method for proving the consistency of M-estimators (see Wald [20] or van der Vaart [19], Theorem 5.14). The standard theory of M-estimation is designed for i.i.d. samples, while we have to deal with the triangular array $\{(\widetilde M_{k,m})_{1\le k\le n},\ n \ge 1\}$. We first state two lemmas.

Lemma 5. For all $\theta \in \Theta$, $G_{\gamma_0}[\ell_\theta] \le G_{\gamma_0}[\ell_{\theta_0}]$, and equality holds if and only if $\theta = \theta_0$.

Proof of Lemma 5. The quantity $G_{\gamma_0}[\ell_{\theta_0} - \ell_\theta]$ is the Kullback–Leibler divergence of the GEV distributions with parameters $\theta_0$ and $\theta$, and is known to be non-negative (see van der Vaart [19], Section 5.5). It vanishes if and only if the two distributions agree. This occurs if and only if $\theta = \theta_0$, because the GEV model is identifiable.

Lemma 6. For $B \subset \Theta$, define
\[
\ell_B(x) = \sup_{\theta\in B} \ell_\theta(x), \qquad x \in \mathbb{R}.
\]
Let $\theta \in \Theta$ and let $B(\theta,\varepsilon)$ be the open ball in $\Theta$ with center $\theta$ and radius $\varepsilon > 0$. Then,
\[
\lim_{\varepsilon\to 0} G_{\gamma_0}[\ell_{B(\theta,\varepsilon)}] = G_{\gamma_0}[\ell_\theta].
\]

Proof of Lemma 6. Proposition 1 implies
\[
\ell_\theta(x) = \ell_\gamma\Big(\frac{x-\mu}{\sigma}\Big) - \log\sigma \le m_\gamma - \log\sigma,
\]
where $m_\gamma = (1+\gamma)(\log(1+\gamma)-1)$ is the maximum of $\ell_\gamma$ given in Proposition 1. One deduces that if $B$ is contained in $(-1,\bar\gamma]\times\mathbb{R}\times[\bar\sigma,+\infty)$ for some $\bar\gamma > -1$ and $\bar\sigma > 0$, then there exists $M(\bar\gamma,\bar\sigma)$ such that
\[
\ell_\theta(x) \le M(\bar\gamma,\bar\sigma) \qquad \text{for all } \theta \in B,\ x \in \mathbb{R}.
\]
Hence there exists $M > 0$ such that the function $M - \ell_{B(\theta,\varepsilon)}$ is non-negative for $\varepsilon$ small enough. The continuity of $\theta \mapsto \ell_\theta(x)$ on $\Theta$ implies
\[
\lim_{\varepsilon\to 0} \ell_{B(\theta,\varepsilon)}(x) = \ell_\theta(x) \qquad \text{for all } x \in \mathbb{R}.
\]
Then, Fatou's lemma entails
\[
G_{\gamma_0}\Big[\liminf_{\varepsilon\to 0}\big(M - \ell_{B(\theta,\varepsilon)}\big)\Big] \le \liminf_{\varepsilon\to 0} G_{\gamma_0}\big[M - \ell_{B(\theta,\varepsilon)}\big],
\]
whence we obtain
\[
\limsup_{\varepsilon\to 0} G_{\gamma_0}[\ell_{B(\theta,\varepsilon)}] \le G_{\gamma_0}[\ell_\theta].
\]
On the other hand, $\theta \in B(\theta,\varepsilon)$ implies $G_{\gamma_0}[\ell_{B(\theta,\varepsilon)}] \ge G_{\gamma_0}[\ell_\theta]$. We deduce
\[
\lim_{\varepsilon\to 0} G_{\gamma_0}[\ell_{B(\theta,\varepsilon)}] = G_{\gamma_0}[\ell_\theta].
\]
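Lemma 5 can also be illustrated numerically (a Monte Carlo sketch, not from the paper): sampling from $G_{\gamma_0}$ and averaging $\ell_\theta$ over the sample estimates $G_{\gamma_0}[\ell_\theta]$, which should be maximal at $\theta_0 = (\gamma_0,0,1)$. SciPy's `genextreme.logpdf` returns $-\infty$ outside the support, matching the convention for $\ell_\theta$; the parameter values below are arbitrary and chosen so that the support of the perturbed model contains that of $G_{\gamma_0}$.

```python
import numpy as np
from scipy.stats import genextreme

rng = np.random.default_rng(3)

gamma0 = 0.5
# Sample from G_{gamma0} = GEV(gamma0, 0, 1); SciPy shape is c = -gamma0.
x = genextreme.rvs(-gamma0, size=200_000, random_state=rng)

def G_ell(gamma, mu, sigma):
    """Monte Carlo estimate of G_{gamma0}[l_{(gamma,mu,sigma)}]."""
    return genextreme.logpdf(x, -gamma, loc=mu, scale=sigma).mean()

ref = G_ell(gamma0, 0.0, 1.0)                 # value at theta_0 = (gamma0, 0, 1)
for theta in [(0.3, 0.0, 1.0), (0.5, -0.2, 1.0), (0.5, 0.0, 1.3)]:
    # Each difference is a Monte Carlo estimate of minus the Kullback-Leibler
    # divergence, hence negative for theta != theta_0 (Lemma 5).
    print(theta, round(G_ell(*theta) - ref, 4))
```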