Statistical inference for expectile-based risk measures

Volker Krätschmer∗ Henryk Zähle†

arXiv:1601.05261v3 [math.ST] 20 Sep 2016

Abstract

Expectiles were introduced by Newey and Powell [43] in the context of linear regression models. Recently, Bellini et al. [6] revealed that expectiles can also be seen as reasonable law-invariant risk measures. In this article, we show that the corresponding statistical functionals are continuous w.r.t. the 1-weak topology and suitably functionally differentiable. By means of these regularity results we can derive several properties such as consistency, asymptotic normality, bootstrap consistency, and qualitative robustness of the corresponding estimators in nonparametric and parametric statistical models.

Keywords: Expectile-based risk measure; 1-weak continuity; Quasi-Hadamard differentiability; Statistical estimation; Weak dependence; Strong consistency; Asymptotic normality; Bootstrap consistency; Qualitative robustness; Functional delta-method

∗Faculty of Mathematics, University of Duisburg–Essen
†Department of Mathematics, Saarland University

1. Introduction

Let $(\Omega, \mathcal{F}, P)$ be an atomless probability space and use $L^p = L^p(\Omega, \mathcal{F}, P)$ to denote the usual $L^p$-space. The α-expectile of $X \in L^2$, with $\alpha \in (0,1)$, can uniquely be defined by

$$\rho_\alpha(X) := \operatorname*{argmin}_{m \in \mathbb{R}} \big\{ \alpha E[((X-m)^+)^2] + (1-\alpha) E[((m-X)^+)^2] \big\} = \operatorname*{argmin}_{m \in \mathbb{R}} E[V_\alpha(X-m)] \tag{1}$$

(see Proposition 1 and Example 4 in [6]), where

$$V_\alpha(x) := \begin{cases} \alpha x^2, & x \ge 0, \\ (1-\alpha) x^2, & x < 0, \end{cases} \qquad x \in \mathbb{R}.$$

Expectiles were introduced by Newey and Powell [43] in the context of linear regression models. On the one hand, (1) generalizes the expectation of $X$, which coincides with $\rho_\alpha(X)$ when specifically $\alpha = 1/2$. On the other hand, (1) is similar to the α-quantile of $X$, which can be obtained by replacing $x^2$ by $|x|$ in the definition of $V_\alpha$. This motivates the name α-expectile.
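As a rough numerical illustration of definition (1), the following sketch computes the α-expectile of an equally weighted finite sample. Setting the derivative of the objective in (1) to zero shows that the minimizer $m$ is a weighted mean of the data, with weight α on points above $m$ and weight $1-\alpha$ on points at or below $m$; the code simply iterates this fixed-point relation. The function name `expectile`, the tolerance, and the iteration cap are illustrative assumptions and do not come from the article.

```python
def expectile(xs, alpha, tol=1e-12, max_iter=1000):
    """Alpha-expectile of the equally weighted sample xs (hypothetical helper)."""
    if not 0.0 < alpha < 1.0:
        raise ValueError("alpha must lie in (0, 1)")
    xs = [float(x) for x in xs]
    if not xs:
        raise ValueError("xs must be non-empty")
    m = sum(xs) / len(xs)  # for alpha = 1/2 the expectile is the mean
    for _ in range(max_iter):
        # Weight alpha above the current candidate, 1 - alpha at or below it.
        ws = [alpha if x > m else 1.0 - alpha for x in xs]
        m_new = sum(w * x for w, x in zip(ws, xs)) / sum(ws)
        if abs(m_new - m) <= tol:
            return m_new
        m = m_new
    return m
```

For instance, for the two-point sample `[0, 1]` and α = 3/4 the first-order condition reads $0.75(1-m) - 0.25m = 0$, so the sketch should return a value close to 0.75.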
For every $X \in L^2$ the mapping $m \mapsto E[V_\alpha(X-m)]$ is convex and differentiable with derivative given by $m \mapsto -2\,U_\alpha(X)(m)$, where

$$U_\alpha(X)(m) := E[U_\alpha(X-m)], \qquad m \in \mathbb{R}, \tag{2}$$

with

$$U_\alpha(x) := \begin{cases} \alpha x, & x \ge 0, \\ (1-\alpha) x, & x < 0, \end{cases} \qquad x \in \mathbb{R}.$$

Moreover, for $X \in L^1$ the mapping $m \mapsto U_\alpha(X)(m)$ is well defined and bijective; cf. Lemma A.1 (Appendix A). These observations together imply that for $X \in L^2$ the α-expectile admits the representation

$$\rho_\alpha(X) = U_\alpha(X)^{-1}(0), \tag{3}$$

where $U_\alpha(X)^{-1}$ denotes the inverse function of $U_\alpha(X)$. In particular, (3) can be used to define a map $\rho_\alpha : L^1 \to \mathbb{R}$ which is compatible with (1). For every $X \in L^1$ the value in (3) will be called the corresponding α-expectile.

Recently, Bellini et al. [6] revealed that expectiles can also be seen as reasonable risk measures when $1/2 \le \alpha < 1$. In Proposition 6 in [6], they prove that the map $\rho_\alpha : L^2 \to \mathbb{R}$ provides a coherent risk measure if (and only if) $1/2 \le \alpha < 1$. Recall that a map $\rho : \mathcal{X} \to \mathbb{R}$, with $\mathcal{X}$ a subspace of $L^0$, is said to be a coherent risk measure if it is

• monotone: $\rho(X_1) \le \rho(X_2)$ for all $X_1, X_2 \in \mathcal{X}$ with $X_1 \le X_2$,
• cash-invariant: $\rho(X+m) = \rho(X) + m$ for all $X \in \mathcal{X}$ and $m \in \mathbb{R}$,
• subadditive: $\rho(X_1+X_2) \le \rho(X_1) + \rho(X_2)$ for all $X_1, X_2 \in \mathcal{X}$,
• positively homogeneous: $\rho(\lambda X) = \lambda \rho(X)$ for all $X \in \mathcal{X}$ and $\lambda \ge 0$.

It is shown in Appendix A (Proposition A.2) that even the map $\rho_\alpha : L^1 \to \mathbb{R}$ provides a coherent risk measure if (and only if) $1/2 \le \alpha < 1$. For $0 < \alpha < 1/2$ the map $\rho_\alpha : L^1 \to \mathbb{R}$ is at least monotone, cash-invariant, and positively homogeneous. For this reason we will henceforth refer to $\rho_\alpha : L^1 \to \mathbb{R}$ as the expectile-based risk measure at level $\alpha \in (0,1)$. It is worth mentioning that $\rho_\alpha$ already appeared implicitly in an earlier paper by Weber [49]. Since Ziegel [51] pointed out that $\rho_\alpha$ satisfies a particularly desirable property of risk measures in the context of backtesting, $\rho_\alpha$ has attracted special attention in the field of monetary risk measurement in the last few years [1, 5, 6, 22, 24, 51].
For pros and cons of expectile-based risk measures and of other standard risk measures see, for instance, the discussions by Acerbi and Szekely [1], Bellini and Di Bernardino [5], and Emmer et al. [24].

This article is concerned with the statistical estimation of expectile-based risk measures. The goal is the estimation of $\rho_\alpha(X)$ for some $X \in L^1$ with unknown distribution function $F$. Let $\mathbf{F}_1$ be the class of all distribution functions $F$ on $\mathbb{R}$ satisfying $\int |x| \, dF(x) < \infty$. Note that $\mathbf{F}_1$ coincides with the set of the distribution functions of all elements of $L^1$, because the underlying probability space was assumed to be atomless. Also note that $F \in \mathbf{F}_1$ if and only if $\int_{-\infty}^{0} F(x)\,dx < \infty$ and $\int_{0}^{\infty} (1-F(x))\,dx < \infty$ hold. Since $\rho_\alpha$ is law-invariant (i.e. $\rho_\alpha(X_1) = \rho_\alpha(X_2)$ when $P \circ X_1^{-1} = P \circ X_2^{-1}$), we may associate with $\rho_\alpha$ a statistical functional $\mathcal{R}_\alpha : \mathbf{F}_1 \to \mathbb{R}$ via

$$\mathcal{R}_\alpha(F_X) := \rho_\alpha(X), \qquad X \in L^1, \tag{4}$$

where $F_X$ denotes the distribution function of $X$. That is,

$$\mathcal{R}_\alpha(F) = \mathcal{U}_\alpha(F)^{-1}(0) \quad \text{for all } F \in \mathbf{F}_1, \tag{5}$$

where

$$\mathcal{U}_\alpha(F)(m) := \int U_\alpha(x-m)\,dF(x), \qquad m \in \mathbb{R}. \tag{6}$$

Then, if $\hat F_n$ is a reasonable estimator for $F$, the plug-in estimator $\mathcal{R}_\alpha(\hat F_n)$ is typically a reasonable estimator for $\rho_\alpha(X) = \mathcal{R}_\alpha(F)$.

In a nonparametric framework, a canonical example for $\hat F_n$ is the empirical distribution function

$$\hat F_n := \frac{1}{n} \sum_{i=1}^{n} \mathbb{1}_{[X_i,\infty)} \tag{7}$$

of $n$ identically distributed random variables $X_1, \ldots, X_n$ drawn according to $F$. In this case we have

$$\mathcal{R}_\alpha(\hat F_n) = \mathcal{U}_\alpha(\hat F_n)^{-1}(0) = \text{unique solution in } m \text{ of } \sum_{i=1}^{n} U_\alpha(X_i - m) = 0. \tag{8}$$

That is, the plug-in estimator is nothing but a simple Z-estimator (M-estimator). For Z-estimators (M-estimators) there are several results concerning consistency and the asymptotic distribution in the literature. A classical reference is Huber's seminal paper [30]; see also standard textbooks such as [31, 46, 47, 48]. Recently Holzmann and Klar [29] used results of Arcones [3] and Van der Vaart [47] to derive asymptotic properties of the Z-estimator in (8).
They restricted their attention to i.i.d. observations but allowed for observations without finite second moment.

On the other hand, even in the nonparametric setting the estimator $\hat F_n$ may differ from the empirical distribution function, so that the plug-in estimator need not be a Z-estimator. See, for instance, Section 3 in [7] for estimators $\hat F_n$ that differ from the empirical distribution function. Also, in a parametric setting the estimator $\hat F_n$ will hardly be the empirical distribution function. For these reasons, we will consider a suitable linearization of the functional $\mathcal{R}_\alpha$ in order to be in a position to derive several asymptotic properties of the plug-in estimator $\mathcal{R}_\alpha(\hat F_n)$ in as many situations as possible.

Linearizations of Z-functionals have been considered before, for instance, by Clarke [17, 18]. However, these results do not cover the particular Z-functional $\mathcal{R}_\alpha$, because the function $U_\alpha$ is unbounded. By using the concept of quasi-Hadamard differentiability as well as the corresponding functional delta-method introduced by Beutner and Zähle [7, 8] we will overcome the difficulties with the unboundedness of $U_\alpha$. Quasi-Hadamard differentiability of the functional $\mathcal{R}_\alpha$ will in particular admit some bootstrap results for the plug-in estimator $\mathcal{R}_\alpha(\hat F_n)$ when $\hat F_n$ is the empirical distribution function of $X_1, \ldots, X_n$.

It is worth mentioning that Heesterman and Gill [28] also considered a linearization approach to Z-estimators. However (boiled down to our setting), they did not consider a linearization of the functional $\mathcal{R}_\alpha$ (to be evaluated at the estimator $\hat F_n$ of $F$) but only of the functional that provides the unique zero of a strictly decreasing and continuous function tending to $\pm\infty$ as its argument tends to $\mp\infty$ (such as the function $\mathcal{U}_\alpha(\hat F_n)$). To some extent this approach is less flexible than our approach. In particular, parametric estimators cannot be handled by this approach without further ado.

The rest of this article is organized as follows.
In Section 2 we will establish a certain continuity and the above-mentioned differentiability of the functional $\mathcal{R}_\alpha$. In Sections 3–4 we will apply the results of Section 2 to the nonparametric and parametric estimation of $\mathcal{R}_\alpha(F)$. In Section 5 we will prove the main result of Section 2, and in Section 6 we will verify two examples and a lemma presented in Sections 3–4. The Appendix provides some auxiliary results. In particular, in Section B of the Appendix we formulate a slight generalization of the functional delta-method in the form of Beutner and Zähle [8].

2. Regularity of the functional $\mathcal{R}_\alpha$

In this section we investigate the functional $\mathcal{R}_\alpha : \mathbf{F}_1 \to \mathbb{R}$ defined in (5) for continuity and differentiability. We equip $\mathbf{F}_1$ with the 1-weak topology. This topology is defined to be the coarsest topology for which the mappings $F \mapsto \int f \, dF$, $f \in \mathcal{C}_1$, are continuous, where $\mathcal{C}_1$ is the set of all continuous functions $f : \mathbb{R} \to \mathbb{R}$ with $|f(x)| \le C_f (1 + |x|)$ for all $x \in \mathbb{R}$ and some finite constant $C_f > 0$. A sequence $(F_n) \subseteq \mathbf{F}_1$ converges 1-weakly to some $F_0 \in \mathbf{F}_1$ if and only if $\int f \, dF_n \to \int f \, dF_0$ for all $f \in \mathcal{C}_1$; cf. Lemma 3.4 in [33]. The set $\mathbf{F}_1$ can obviously be identified with the set of all Borel probability measures $\mu$ on $\mathbb{R}$ satisfying $\int |x| \, \mu(dx) < \infty$. In this context the 1-weak topology is sometimes referred to as the $\psi_1$-weak topology; see, for instance, [34]. But for our purposes it is more convenient to work with the $\mathbf{F}_1$-terminology.

Let $\mathbf{L}_0$ be the space of all Borel measurable functions $v : \mathbb{R} \to \mathbb{R}$ modulo the equivalence relation of ℓ-almost sure identity. Note that $\mathbf{F}_1 \subseteq \mathbf{L}_0$, and let $\mathbf{L}_1 \subseteq \mathbf{L}_0$ be the subspace of all $v \in \mathbf{L}_0$ for which

$$\|v\|_{1,\ell} := \int |v(x)| \, \ell(dx) \tag{9}$$

is finite. Here, and henceforth, ℓ stands for the Borel Lebesgue measure on $\mathbb{R}$. Note that $F_1 - F_2 \in \mathbf{L}_1$ for $F_1, F_2 \in \mathbf{F}_1$.
It is well-known that $\|\cdot\|_{1,\ell} : \mathbf{L}_1 \to \mathbb{R}_+$ provides a complete and separable norm on $\mathbf{L}_1$ and that

$$d_{W,1}(F_1, F_2) := \|F_1 - F_2\|_{1,\ell}$$

defines the Wasserstein-1 metric $d_{W,1} : \mathbf{F}_1 \times \mathbf{F}_1 \to \mathbb{R}_+$ on $\mathbf{F}_1$. Also note that $d_{W,1}$ metrizes the 1-weak topology on $\mathbf{F}_1$; cf. Remark 2.9 in [34].

2.1. Continuity

Since the Wasserstein-1 metric $d_{W,1}$ metrizes the 1-weak topology on $\mathbf{F}_1$, the following theorem is an immediate consequence of a recent result by Bellini et al. [6, Theorem 10].

Theorem 2.1 The functional $\mathcal{R}_\alpha : \mathbf{F}_1 \to \mathbb{R}$ is continuous for the 1-weak topology.

Theorem 2.1 can also be obtained by combining Theorem 4.1 in [16] with the Representation theorem 3.5 in [34]. Indeed, these two theorems together imply that the risk functional associated with any law-invariant coherent risk measure on $L^1$ is 1-weakly continuous. For $1/2 \le \alpha < 1$ the functional $\mathcal{R}_\alpha$ itself is derived from a law-invariant coherent risk measure (see Proposition A.2 in Appendix A). So it is 1-weakly continuous. For $0 < \alpha < 1/2$ the map $\check\rho_\alpha : L^1 \to \mathbb{R}$ defined by $\check\rho_\alpha(X) := -\rho_\alpha(-X)$ provides a law-invariant coherent risk measure (cf. Proposition A.2 in Appendix A), so that the associated statistical functional $\check{\mathcal{R}}_\alpha : \mathbf{F}_1 \to \mathbb{R}$, $\check{\mathcal{R}}_\alpha(F) = -\mathcal{R}_\alpha(\check F)$, is 1-weakly continuous. Here $\check F$ stands for the distribution function derived from $F$ via $\check F(x) := 1 - F((-x)-)$. Since for any sequence $(F_n)_{n \in \mathbb{N}_0} \subseteq \mathbf{F}_1$, $F_n \to F_0$ 1-weakly if and only if $\check F_n \to \check F_0$ 1-weakly, it follows that also the functional $\mathcal{R}_\alpha$ is 1-weakly continuous.

By the 1-weak continuity of $\mathcal{R}_\alpha$ we are in a position to easily derive strong consistency of the plug-in estimator $\mathcal{R}_\alpha(\hat F_n)$ for $\mathcal{R}_\alpha(F)$ in several situations; see Sections 3.1 and 4.1.

2.2. Differentiability

We will use the notion of quasi-Hadamard differentiability introduced in [7, 8]. Quasi-Hadamard differentiability is a slight (but useful) generalization of the conventional tangential Hadamard differentiability.
The latter is commonly acknowledged to be a suitable notion of differentiability in the context of the functional delta-method (see e.g. the bottom of p. 166 in [28]), and it was shown in [7, 8] that the former is still strong enough to obtain a functional delta-method. Let $\mathbf{L}_1$ be equipped with the norm $\|\cdot\|_{1,\ell}$.

Definition 2.2 Let $\mathcal{R} : \mathbf{F}_1 \to \mathbb{R}$ be a map and $\mathbf{L}_1^0$ be a subset of $\mathbf{L}_1$. Then $\mathcal{R}$ is said to be quasi-Hadamard differentiable at $F \in \mathbf{F}_1$ tangentially to $\mathbf{L}_1^0\langle\mathbf{L}_1\rangle$ if there exists a continuous map $\dot{\mathcal{R}}_F : \mathbf{L}_1^0 \to \mathbb{R}$ such that

$$\lim_{n\to\infty} \left| \dot{\mathcal{R}}_F(v) - \frac{\mathcal{R}(F + \varepsilon_n v_n) - \mathcal{R}(F)}{\varepsilon_n} \right| = 0 \tag{10}$$

holds for each triplet $(v, (v_n), (\varepsilon_n))$, with $v \in \mathbf{L}_1^0$, $(\varepsilon_n) \subseteq (0,\infty)$ satisfying $\varepsilon_n \to 0$, and $(v_n) \subseteq \mathbf{L}_1$ satisfying $\|v_n - v\|_{1,\ell} \to 0$ as well as $(F + \varepsilon_n v_n) \subseteq \mathbf{F}_1$. In this case the map $\dot{\mathcal{R}}_F$ is called the quasi-Hadamard derivative of $\mathcal{R}$ at $F$ tangentially to $\mathbf{L}_1^0\langle\mathbf{L}_1\rangle$.

Note that even when $\mathbf{L}_1^0 = \mathbf{L}_1$, quasi-Hadamard differentiability of $\mathcal{R}$ at $F$ tangentially to $\mathbf{L}_1\langle\mathbf{L}_1\rangle$ is not the same as Hadamard differentiability of $\mathcal{R}$ at $F$ tangentially to $\mathbf{L}_1$ (with $\mathbf{L}_0$ regarded as the basic linear space containing both $\mathbf{F}_1$ and $\mathbf{L}_1$). Indeed, $\|\cdot\|_{1,\ell}$ does not impose a norm on all of $\mathbf{L}_0$ (but only on $\mathbf{L}_1$), so that Hadamard differentiability w.r.t. the norm $\|\cdot\|_{1,\ell}$ is not defined.

Theorem 2.3 Let $F \in \mathbf{F}_1$ and assume that it is continuous at $\mathcal{R}_\alpha(F)$. Then the functional $\mathcal{R}_\alpha : \mathbf{F}_1 \to \mathbb{R}$ is quasi-Hadamard differentiable at $F$ tangentially to $\mathbf{L}_1\langle\mathbf{L}_1\rangle$ with linear quasi-Hadamard derivative $\dot{\mathcal{R}}_{\alpha;F} : \mathbf{L}_1 \to \mathbb{R}$ given by

$$\dot{\mathcal{R}}_{\alpha;F}(v) := -\,\frac{(1-\alpha) \int_{(-\infty,0)} v(x + \mathcal{R}_\alpha(F)) \, \ell(dx) + \alpha \int_{(0,\infty)} v(x + \mathcal{R}_\alpha(F)) \, \ell(dx)}{(1-2\alpha) F(\mathcal{R}_\alpha(F)) + \alpha}. \tag{11}$$

Note that $(1-2\alpha) F(\mathcal{R}_\alpha(F)) + \alpha = (1-\alpha) F(\mathcal{R}_\alpha(F)) + \alpha (1 - F(\mathcal{R}_\alpha(F))) > 0$ holds, so that the denominator in (11) is strictly positive. Also note that quasi-Hadamard differentiability is already known from Theorem 2.4 in [35]. However, in [35] the derivative was not specified explicitly.
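To see what formula (11) yields in a concrete case, one may evaluate it at the direction $v = \mathbb{1}_{[y,\infty)} - F$ for a fixed $y \in \mathbb{R}$, which lies in $\mathbf{L}_1$ whenever $F \in \mathbf{F}_1$. The computation below is only an illustrative sketch added here, not a statement taken from the article; the shorthand $r$, the symbol $d_F(\alpha)$ for the denominator of (11), and the random variable $X$ with distribution function $F$ are notational assumptions of this sketch.

```latex
% Shorthand (assumptions of this sketch): r := \mathcal{R}_\alpha(F),
% d_F(\alpha) := (1-2\alpha)F(r) + \alpha, v := 1_{[y,\infty)} - F, and X \sim F.
\begin{align*}
\int_{(-\infty,0)} v(x+r)\,\ell(dx)
  &= (r-y)^+ - \int_{-\infty}^{r} F(t)\,dt
   = (r-y)^+ - E[(r-X)^+], \\
\int_{(0,\infty)} v(x+r)\,\ell(dx)
  &= \int_{r}^{\infty} (1-F(t))\,dt - (y-r)^+
   = E[(X-r)^+] - (y-r)^+ .
\end{align*}
% By (5)-(6), r solves \alpha E[(X-r)^+] - (1-\alpha) E[(r-X)^+] = 0,
% so the expectation terms cancel in the numerator of (11):
\begin{align*}
\dot{\mathcal{R}}_{\alpha;F}(v)
  &= -\,\frac{(1-\alpha)(r-y)^+ - \alpha\,(y-r)^+}{d_F(\alpha)}
   = \frac{U_\alpha(y-r)}{d_F(\alpha)} .
\end{align*}
```

Under these assumptions the derivative in direction of a point mass at $y$ is $U_\alpha(y - \mathcal{R}_\alpha(F))$ divided by the denominator of (11), which is the form one would expect for the influence function of a Z-functional.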
The proof of Theorem 2.3 can be found in Section 5.

Remark 2.4 As a direct consequence of Theorem 2.3 we obtain that the functional $\mathcal{R}_\alpha$ is also quasi-Hadamard differentiable at $F$ (being continuous at $\mathcal{R}_\alpha(F)$) tangentially to any subspace of $\mathbf{L}_1$ that is equipped with a norm being at least as strict as the norm $\|\cdot\|_{1,\ell}$. ✸

Example 2.5 To illustrate Remark 2.4, let $\phi : \mathbb{R} \to [1,\infty)$ be a continuous function that is non-increasing on $(-\infty,0]$ and non-decreasing on $[0,\infty)$. Let $\mathbf{F}_\phi$ be the set of all distribution functions $F$ on $\mathbb{R}$ for which $\|F - \mathbb{1}_{[0,\infty)}\|_\phi < \infty$, where $\|v\|_\phi := \sup_{x \in \mathbb{R}} |v(x)| \phi(x)$. Let $\mathbf{D}$ be the space of all bounded càdlàg functions on $\mathbb{R}$ and $\mathbf{D}_\phi$ be the subspace of all $v \in \mathbf{D}$ satisfying $\|v\|_\phi < \infty$ and $\lim_{|x|\to\infty} |v(x)| = 0$. If $C_\phi := \int 1/\phi \, d\ell < \infty$, then $\mathbf{D}_\phi \subseteq \mathbf{L}_1$ and $\mathbf{F}_\phi \subseteq \mathbf{F}_1$. On the space $\mathbf{D}_\phi$ the norm $\|\cdot\|_\phi$ is stricter than $\|\cdot\|_{1,\ell}$, because

$$\|v\|_{1,\ell} = \int |v(x)| \, \ell(dx) \le C_\phi \|v\|_\phi \quad \text{for every } v \in \mathbf{D}_\phi. \tag{12}$$

Therefore $\mathcal{R}_\alpha$ is also quasi-Hadamard differentiable at $F$ tangentially to $\mathbf{D}_\phi\langle\mathbf{D}_\phi\rangle$ with linear quasi-Hadamard derivative $\dot{\mathcal{R}}_{\alpha;F} : \mathbf{D}_\phi \to \mathbb{R}$ given by (11) restricted to $v \in \mathbf{D}_\phi$, where $\mathbf{D}_\phi$ is equipped with the norm $\|\cdot\|_\phi$. ✸

The established quasi-Hadamard differentiability of $\mathcal{R}_\alpha$ puts us in a position to easily derive results on the asymptotics of $\mathcal{R}_\alpha(\hat F_n)$; see Sections 3.2–3.3 and 4.2. In Section 3.2 we combine Theorem 2.3 with a central limit theorem (by Dede [19]; cf. Theorem C.3 below) for the empirical process in the space $(\mathbf{L}_1, \|\cdot\|_{1,\ell})$ in order to obtain the asymptotic distribution of $\mathcal{R}_\alpha(\hat F_n)$ in a rather general nonparametric setting. In view of Example 2.5 one can alternatively use central limit theorems for the empirical process in the space $(\mathbf{D}_\phi, \|\cdot\|_\phi)$ to obtain the asymptotic distribution of $\mathcal{R}_\alpha(\hat F_n)$. See, for instance, Examples 4.4–4.5 in [8] as well as references cited there.

3. Nonparametric estimation of $\mathcal{R}_\alpha(F)$

In this section we consider nonparametric statistical models.
We will always assume that the sequence of observations $(X_i)$ is a strictly stationary sequence of real-valued random variables. In addition we will mostly assume that $(X_i)$ is ergodic; see Sections 6.1 and 6.7 in [14] for the definition of a strictly stationary and ergodic sequence. Recall that every sequence of i.i.d. random variables is strictly stationary and ergodic. Moreover, a strictly stationary sequence is ergodic when it is mixing in the ergodic sense, and it is mixing in the ergodic sense when it is α-mixing; see Section 2.5 in [13]. For illustration, also note that many GARCH processes are strictly stationary and ergodic; cf. [11, 42].

Throughout this section the estimator for the marginal distribution function $F$ of $(X_i)$ is assumed to be the empirical distribution function $\hat F_n$ of $X_1, \ldots, X_n$ as defined in (7). Note that the mapping $\Omega \to \mathbf{F}_1$, $\omega \mapsto \hat F_n(\omega, \cdot)$, is $(\mathcal{F}, \mathcal{B}(\mathbf{F}_1))$-measurable for the Borel σ-algebra $\mathcal{B}(\mathbf{F}_1)$ on $(\mathbf{F}_1, d_{W,1})$, because the mapping $\mathbb{R}^n \to \mathbf{F}_1$, $(x_1, \ldots, x_n) \mapsto \frac{1}{n} \sum_{i=1}^{n} \mathbb{1}_{[x_i,\infty)}$, is $(\|\cdot\|, d_{W,1})$-continuous. Hence by continuity of $\mathcal{R}_\alpha$ w.r.t. $d_{W,1}$, it follows that $\mathcal{R}_\alpha(\hat F_n)$ is a real-valued random variable on $(\Omega, \mathcal{F}, P)$.

3.1. Strong consistency

For $1/2 \le \alpha < 1$ the following theorem is a direct consequence of Theorem 2.6 in [34]. In the general case, Theorem 2.1 ensures that one can follow the lines in the proof of Theorem 2.6 in [34] to obtain the assertion of Theorem 3.1; we omit the details.

Theorem 3.1 Let $(X_i)$ be a strictly stationary and ergodic sequence of $L^1$-random variables on some probability space $(\Omega, \mathcal{F}, P)$, and denote by $F$ the distribution function of the $X_i$. Let $\hat F_n$ be the empirical distribution function of $X_1, \ldots, X_n$ as defined in (7). Then the plug-in estimator $\mathcal{R}_\alpha(\hat F_n)$ is strongly consistent for $\mathcal{R}_\alpha(F)$ in the sense that

$$\mathcal{R}_\alpha(\hat F_n) \to \mathcal{R}_\alpha(F) \quad P\text{-a.s.}$$

If $X_1, X_2, \ldots$ are i.i.d.
random variables, then strong consistency can also be obtained from classical results on Z-estimators as, for example, Lemma A in Section 7.2.1 of [46]. Moreover, it was shown recently by Holzmann and Klar [29, Theorem 2] that in the i.i.d. case one even has $\sup_{\alpha \in [\alpha_\ell, \alpha_u]} |\mathcal{R}_\alpha(\hat F_n) - \mathcal{R}_\alpha(F)| \to 0$ P-a.s. for any $\alpha_\ell, \alpha_u \in (0,1)$ with $\alpha_\ell < \alpha_u$.

3.2. Asymptotic distribution

Dedecker and Prieur [20] introduced the following dependence coefficients for a strictly stationary sequence of real-valued random variables $(X_i) \equiv (X_i)_{i \in \mathbb{N}}$ on some probability space $(\Omega, \mathcal{F}, P)$:

$$\tilde\phi(n) := \sup_{k \in \mathbb{N}} \sup_{x \in \mathbb{R}} \big\| P[X_{n+k} \in (-\infty,x] \,|\, \mathcal{F}_1^k](\cdot) - P[X_{n+k} \in (-\infty,x]] \big\|_\infty, \tag{13}$$

$$\tilde\alpha(n) := \sup_{k \in \mathbb{N}} \sup_{x \in \mathbb{R}} \big\| P[X_{n+k} \in (-\infty,x] \,|\, \mathcal{F}_1^k](\cdot) - P[X_{n+k} \in (-\infty,x]] \big\|_1. \tag{14}$$

Here $\mathcal{F}_1^k := \sigma(X_1, \ldots, X_k)$ and $\|\cdot\|_p$ denotes the usual $L^p$-norm on $L^p = L^p(\Omega, \mathcal{F}, P)$, $p \in [1,\infty]$. Note that by Proposition 3.22 in [12] the usual φ- and α-mixing coefficients $\phi(n)$ and $\alpha(n)$ can be represented as in (13)–(14) with $\sup_{x \in \mathbb{R}}$ and $(-\infty,x]$ replaced by $\sup_{A \in \mathcal{B}(\mathbb{R})}$ and $A$, respectively. In particular, $\tilde\phi(n) \le \phi(n)$ and $\tilde\alpha(n) \le \alpha(n)$. It is worth mentioning that in [20] the starting point is actually a strictly stationary sequence of random variables indexed by $\mathbb{Z}$ (rather than $\mathbb{N}$) and that therefore the definitions of the above dependence coefficients are slightly different. However, it is discussed in detail in Appendix D that any strictly stationary sequence $(X_i) \equiv (X_i)_{i \in \mathbb{N}}$ can be extended to a strictly stationary sequence $(Y_i)_{i \in \mathbb{Z}}$ whose dependence coefficients, as originally introduced in [20], coincide with $\tilde\phi(n)$ and $\tilde\alpha(n)$. It is also discussed in Appendix D that if $(X_i)$ is in addition ergodic, then $(Y_i)_{i \in \mathbb{Z}}$ is ergodic too.

Let us denote by $Q_{|X_1|}$ the càdlàg inverse of the tail function $x \mapsto P[|X_1| > x]$. Let us write $N_{0,s^2}$ for the centered normal distribution with variance $s^2$. Moreover, let us use ❀ to denote convergence in distribution.
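Theorem 3.2 Let $(X_i)$ be a strictly stationary and ergodic sequence of real-valued random variables on some probability space $(\Omega, \mathcal{F}, P)$. Denote by $F$ the distribution function of the $X_i$, and assume that $F$ is continuous at $\mathcal{R}_\alpha(F)$ and that $\int \sqrt{F(1-F)} \, d\ell < \infty$ (in particular $F \in \mathbf{F}_1$). Let $\hat F_n$ be the empirical distribution function of $X_1, \ldots, X_n$ as defined in (7). Finally assume that one of the following two conditions holds:

$$\sum_{n \in \mathbb{N}} n^{-1/2} \, \tilde\phi(n)^{1/2} < \infty, \tag{15}$$

$$\sum_{n \in \mathbb{N}} n^{-1/2} \int_{(0, \tilde\alpha(n))} Q_{|X_1|}(u) \, u^{-1/2} \, \ell(du) < \infty. \tag{16}$$

Then

$$\sqrt{n} \, \big( \mathcal{R}_\alpha(\hat F_n) - \mathcal{R}_\alpha(F) \big) \;❀\; Z_F \quad \text{in } (\mathbb{R}, \mathcal{B}(\mathbb{R}))$$

for $Z_F \sim N_{0,s^2}$ with

$$s^2 = s^2_{\alpha,F} := \int_{\mathbb{R}^2} f_{\alpha,F}(t_0) \, C_F(t_0,t_1) \, f_{\alpha,F}(t_1) \, (\ell \otimes \ell)(d(t_0,t_1)), \tag{17}$$

where

$$f_{\alpha,F}(t) := \frac{1}{(1-2\alpha) F(\mathcal{R}_\alpha(F)) + \alpha} \Big( (1-\alpha) \mathbb{1}_{(-\infty, \mathcal{R}_\alpha(F)]}(t) + \alpha \mathbb{1}_{(\mathcal{R}_\alpha(F), \infty)}(t) \Big), \tag{18}$$

$$C_F(t_0,t_1) := F(t_0 \wedge t_1)(1 - F(t_0 \vee t_1)) + \sum_{i=0}^{1} \sum_{k=2}^{\infty} \mathrm{Cov}\big( \mathbb{1}_{\{X_1 \le t_i\}}, \mathbb{1}_{\{X_k \le t_{1-i}\}} \big). \tag{19}$$

Proof Theorem 2.3 shows that $\mathcal{R}_\alpha$ is quasi-Hadamard differentiable at $F$ tangentially to $\mathbf{L}_1\langle\mathbf{L}_1\rangle$ (w.r.t. the norm $\|\cdot\|_{1,\ell}$) with quasi-Hadamard derivative $\dot{\mathcal{R}}_{\alpha;F}$ given by (11). The functional delta-method in the form of Theorem B.3(i) and Theorem C.3 then imply that $\sqrt{n}(\mathcal{R}_\alpha(\hat F_n) - \mathcal{R}_\alpha(F))$ converges in distribution to $\dot{\mathcal{R}}_{\alpha;F}(B_F)$, where $B_F$ is an $\mathbf{L}_1$-valued centered Gaussian random variable with covariance operator $\Phi_{B_F}$ given by (60). Now, $\dot{\mathcal{R}}_{\alpha;F}(B_F) = -\int f_{\alpha,F}(x) B_F(x) \, \ell(dx)$ for the $\mathbf{L}_\infty$-function $f_{\alpha,F}$ given by (18). Since $B_F$ is a centered Gaussian random element of $\mathbf{L}_1$, and since $f_{\alpha,F}$ represents a continuous linear functional on $\mathbf{L}_1$, the random variable $\dot{\mathcal{R}}_{\alpha;F}(B_F)$ is normally distributed with zero mean and variance $\mathrm{Var}[\dot{\mathcal{R}}_{\alpha;F}(B_F)] = E[\dot{\mathcal{R}}_{\alpha;F}(B_F)^2] = \Phi_{B_F}(f_{\alpha,F}, f_{\alpha,F})$, and the latter expression is equal to the right-hand side in (17). ✷

Note that when $X_1, X_2, \ldots$ are i.i.d.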
random variables, then (15) and (16) are clearly satisfied and the expression for the variance $s^2$ in (17) simplifies insofar as the sum $\sum_{i=0}^{1} \sum_{k=2}^{\infty} (\cdots)$ in (19) vanishes, so that $s^2 = E[U_\alpha(X_1 - \mathcal{R}_\alpha(F))^2] / d_F(\alpha)^2$ with $d_F(\alpha) := (1-2\alpha) F(\mathcal{R}_\alpha(F)) + \alpha$. The latter may be seen by applying Hoeffding's variance formula (cf., e.g., Lemma 5.24 in [40]) to calculate $\mathrm{Var}[(X_1 - \mathcal{R}_\alpha(F))^+]$ and $\mathrm{Var}[(X_1 - \mathcal{R}_\alpha(F))^-]$ (take into account that by (3) we have $E[U_\alpha(X_1 - \mathcal{R}_\alpha(F))^2] = \mathrm{Var}[U_\alpha(X_1 - \mathcal{R}_\alpha(F))]$, and obviously $\mathrm{Cov}((X_1 - \mathcal{R}_\alpha(F))^+, (X_1 - \mathcal{R}_\alpha(F))^-) = 0$). But even in this case the variance $s^2$ depends on the unknown distribution function $F$ in a fairly complex way. So, for the derivation of asymptotic confidence intervals the bootstrap results of Section 3.3 are expected to lead to a more efficient method than the method that is based on the nonparametric estimation of $s^2 = s^2_{\alpha,F}$.

Remark 3.3 In the i.i.d. case Theorem 3.2 can also be obtained from classical results on Z-estimators as, for example, Theorem A in Section 7.2.2 of [46]. Recently Holzmann and Klar [29, Theorem 7] showed that, still in the i.i.d. case, continuity of $F$ at $\mathcal{R}_\alpha(F)$ is even necessary in order to obtain a normal limit. It is also worth mentioning that the integrability condition on $F$ in Theorem 3.2 is slightly stronger than needed, at least in the i.i.d. case. Holzmann and Klar [29, Corollary 4] only assumed that $F$ possesses a finite second absolute moment, which is slightly weaker than assuming our integrability condition. ✸

Remark 3.4 The following assertions illustrate the assumptions of Theorem 3.2.

(i) The integrability condition $\int \sqrt{F(1-F)} \, d\ell < \infty$ holds if $\int \phi^2 \, dF < \infty$ for some continuous function $\phi : \mathbb{R} \to [1,\infty)$ satisfying $\int 1/\phi \, d\ell < \infty$ and being strictly decreasing and strictly increasing on $\mathbb{R}_-$ and $\mathbb{R}_+$, respectively.

(ii) Condition (15) holds if $\tilde\phi(n) = \mathcal{O}(n^{-b})$ for some $b > 1$.

(iii) Condition (16) implies condition (15) with $\tilde\phi(n)$ replaced by $\tilde\alpha(n)$.
(iv) Condition (16) is equivalent to

$$\sum_{n \in \mathbb{N}} n^{-1/2} \int_{(0,\infty)} \tilde\alpha(n)^{1/2} \wedge P[|X_1| > x]^{1/2} \, \ell(dx) < \infty.$$

(v) Condition (16) holds if $\int \sqrt{F(1-F)} \, d\ell < \infty$ and $\tilde\alpha(n) = \mathcal{O}(n^{-b})$ for some $b > 1$. ✸

See Section 6.1 for the proofs of these assertions.

3.3. Bootstrap consistency

In this section we present two results on bootstrap consistency in the setting of Theorem 3.2. In the following Theorem 3.5 we will assume that the random variables $X_1, X_2, \ldots$ are i.i.d. In Theorem 3.6 ahead we will assume that the sequence $(X_i)$ is β-mixing. We will use $\varrho_{BL}$ to denote the bounded Lipschitz metric on the set of all Borel probability measures on $\mathbb{R}$; see Appendix B for the definition of the bounded Lipschitz metric. By $P'_\xi$ we will mean the law of a random variable $\xi$ under $P'$, and as before $N_{0,s^2}$ refers to the centered normal distribution with variance $s^2$.
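As a point of comparison for the bootstrap approach of this section, the i.i.d. simplification of (17) noted after Theorem 3.2, $s^2 = E[U_\alpha(X_1 - \mathcal{R}_\alpha(F))^2]/d_F(\alpha)^2$, suggests a direct plug-in estimate in which every unknown quantity is replaced by its empirical counterpart. The sketch below is illustrative only: the helper names, the fixed-point solver, and the choice to plug the empirical distribution function into $d_F(\alpha)$ are assumptions made here and are not taken from the article; in particular, the empirical distribution function is not continuous, so this is a heuristic rather than an application of Theorem 3.2.

```python
import math
from statistics import NormalDist


def sample_expectile(xs, alpha, tol=1e-12, max_iter=1000):
    """Sample alpha-expectile via a weighted-mean fixed point (hypothetical helper)."""
    m = sum(xs) / len(xs)
    for _ in range(max_iter):
        ws = [alpha if x > m else 1.0 - alpha for x in xs]
        m_new = sum(w * x for w, x in zip(ws, xs)) / sum(ws)
        if abs(m_new - m) <= tol:
            return m_new
        m = m_new
    return m


def u_alpha(x, alpha):
    """The function U_alpha from (2)."""
    return alpha * x if x >= 0 else (1.0 - alpha) * x


def plugin_variance(xs, alpha):
    """Plug-in estimate of s^2 in the i.i.d. case (an assumption of this sketch)."""
    n = len(xs)
    r = sample_expectile(xs, alpha)
    f_at_r = sum(1 for x in xs if x <= r) / n        # empirical F at r
    d = (1.0 - 2.0 * alpha) * f_at_r + alpha         # empirical d_F(alpha)
    second_moment = sum(u_alpha(x - r, alpha) ** 2 for x in xs) / n
    return second_moment / d ** 2


def normal_ci(xs, alpha, level=0.95):
    """Normal-approximation confidence interval for the alpha-expectile."""
    r = sample_expectile(xs, alpha)
    s = math.sqrt(plugin_variance(xs, alpha))
    z = NormalDist().inv_cdf(0.5 + level / 2.0)
    half = z * s / math.sqrt(len(xs))
    return r - half, r + half
```

For α = 1/2 the estimate reduces to the (population-style) sample variance, in line with the fact that the 1/2-expectile is the mean.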