Some asymptotic results on density estimators by wavelet projections 2 ⋆ 1Davit Varron, Université Catholique de Louvain 0 2 n Abstract a J 6Let (Xi)i≥1 be an i.i.d. sample on Rd having density f. Given a real function φ on Rd with finite 2 variation and given an integer valued sequence (jn), let fˆn denote the estimator of f by wavelet ] Tprojection based on φ and with multiresolution level equal to j . We provide exact rates of almost n Ssure convergence to 0 of the quantity sup | fˆ (x)−E(fˆ )(x) |, when n2−djn/logn → ∞ and H . x∈H n n h tis a given hypercube of Rd. We then show that, if n2−djn/logn → c for a constant c > 0, then the a mquantity sup |fˆ (x)−f | almost surely fails to converge to 0. x∈H n [Keywords: Empirical processes, Wavelets, Density estimation, Laws of the iterated logarithm. 1 AMS classification: 62G30, 62G30, 62G07, 42C40. v 0 2 5 5 1.1 Introduction and statement of the results 0 2 1 : vThe well known wavelet theory (see, e.g., Mallat (1989)) has proven useful in i Xmay branches of applied mathematics and functional estimation in the field of r astatitics. In this paper, we are interested in estimating the Lebesgue density f of an independent, identically distributed (i.i.d.) sample (X ) on Rd. Let φ be a i i≥1 (mother wavelet) real function on Rd. There exists N = 2d−1 associated (father wavelet functions) Ψ , 1 ≤ j ≤ N, so as any function f ∈ L2(Rd,λ) has the j ⋆ Institute of Statistics, voie du Roman Pays 20, 1348 Louvain la Neve, Belgium [email protected] 1 following orthogonal representation (for more details, see, e.g., Masry (1997)). N f := α φ + β Ψ , where (1.1) j ,k j ,k i,j,k i,j,k 0 0 kX∈Zd jX≥j0iX=1kX∈Zd φ :=2dj/2φ(2j · −k), Ψ := 2dj/2Ψ (2j · −k), i ≤ N, j ≥ 1, k ∈ Zd. (1.2) j,k i,j,k i 1.1 The linear wavelet projection estimator The linear wavelet projection estimator (see, e.g., Masry (1997)) of f is con- structed by estimating the coefficients α , β by their empirical analogues: j ,k i,j,k 0 1 n 1 n ˆ αˆ := φ (X ), β := Ψ (X ), (1.3) j ,k j ,k i i,j,k i,j,k i 0 n 0 n iX=1 iX=1 and stopping the expansion (1.1) at a deterministic (multiresolution) level j , n which will be assumed to grow with the sample size n. jn−1 N ˆ ˆ f (x) := αˆ φ + β Ψ . (1.4) n j j ,k i,j,k i,j,k 0 0 kX∈Zd jX=j0 iX=1kX∈Zd The aim of this paper is to describe the almost sure asymptotic behaviour of the ˆ quantity | f (x)−f(x) |, uniformly in x ∈ H, where H is a given an hypercube of n Rd. Obviously, the asymptotic behaviour of (j ) plays a crucial role, and 2−djn n n≥1 can be intuitively compared to the bandwidth when estimating f by usual kernel ˆ methods. Massiani (2003) has given an asymptotic result of f when the sample n (X ) takes values in R and under the following conditions, with h := 2−djn : i i≥1 n h ↓ 0, nh ↑ ∞, nh /logn → ∞, log(1/h )/loglogn → ∞, (1.5) n n n n f is continuous and strictly positive on an open subset O and H ⊂ O, (1.6) φ has finite variation on Rd and has a compact support. (1.7) 2 Conditions (1.5) are called the Csörgő-Révész-Stute conditions. Massiani proved that, under (1.5), (1.6) and (1.7) we have, almost surely, 1/2 n2djn ˆ E ˆ lim sup± f (x) − f (x) = 1. (1.8) n n n→∞ x∈H (cid:18)2f(x)log(2djn)(cid:19) (cid:18) (cid:16) (cid:17)(cid:19) We also refer to Masry(1997) for related results when (X ) is a stationary i i≥1 strongly mixing sequence. To prove (1.8), the author made use of the following ˆ expression of f (see, e.g, Masry (1997)) n 1 n ˆ f (x) := K (x,X ), where (1.9) n j i n n iX=1 K(x,y) := φ(x − k)φ(y − k), x,y ∈ Rd, (1.10) kX∈Zd K (x,y) :=2djnK 2jnx,2jny . (1.11) j n (cid:16) (cid:17) Then, the author showed that (1.9) can be expressed quite simply with the func- tional increments of the empirical distribution function, and made extensively use of related results established by Deheuvels and Mason (1992). We point out the fact that the just mentioned pioneering results do not cover the case where the sample is multivariate (d > 1), as this result relies on the strong approximation theorem of Komlós et al. (1977). As a consequence, Massiani could only prove (1.8) when d = 1. However, Mason (2004) recently made a skillful use of some recent tools in empirical processes theory to extend the results of Deheuvels and Mason (1992) to a more general framework, which covers the case where d > 1. As a consequence, we are now able to prove the following result. 3 Theorem 1 Under assertions (1.5), (1.6), (1.7) we have almost surely: (i) For each ǫ > 0, there exists n(ǫ) such that, for each n ≥ n(ǫ) and x ∈ H, 1/2 n2djn ˆ E ˆ f (x) − f (x) ∈ [−1 − ǫ,1 + ǫ], n n (cid:18)2f(x)log(2djn)(cid:19) (cid:18) (cid:16) (cid:17)(cid:19) (ii) For each v ∈ [−1,1] and ǫ > 0, there exists n(ǫ,v) such that, for each n ≥ n(ǫ,v), 1/2 n2djn ˆ E ˆ inf f (x) − f (x) − v < ǫ. n n x∈H (cid:12)(cid:12)(cid:18)2f(x)log(2djn)(cid:19) (cid:18) (cid:16) (cid:17)(cid:19) (cid:12)(cid:12) (cid:12) (cid:12) (cid:12) (cid:12) As mentioned above, the uniform behaviour of the increments of the empirical ˆ process shows up to rule that of f (x). Moreover, it is well known (see Deheuvels n and Mason (1992)) that this behaviour changes abruptly when conditions (1.5) are replaced by the following Erdös-Rényi conditions: h ↓ 0, nh ↑ ∞, nh /logn → c. (1.12) n n n Here, c > 0 is a finite constant. Since the pioneering result of Deheuvels and Ma- son (1992), several extensions have been made. In Varron (2007), Varron recently showed that this nonstandard UFLL still holds when d > 1. Our next result shows that, under (1.12), the nonstandard behaviour of the empirical increments ˆ implies that the uniform strong consistency of f on a hypercube H fails to hold. n Theorem 2 Under (1.6), (1.7), (1.12), the following event has probability 1: ˆ f (x ) n n ∃ ǫ > 0, ∀n , ∃n ≥ n , ∃ x ∈ H fulfilling − 1 > ǫ. (1.13) 0 0 n (cid:12) f(x ) (cid:12) (cid:12) n (cid:12) (cid:12) (cid:12) (cid:12) (cid:12) 2 Proofs of Theorem 1 Recall that h := 2−djn, n ≥ 1. To prove Theorem 1, we shall require some n more notations. Given s := (s ,...,s ) and v := (v ,...,v ), we shall write 1 d 1 d 4 s ≺ v whenever s ≤ v for each 1 ≤ i ≤ d and we shall write [s,v] for the set i i {u ∈ Rd, s ≺ u ≺ v}. The increments of the empirical process based on (X ) i i≥1 are defined as follows (C denoting a Borel subset of Rd) 1 n X − z X − z ∆α (x,h ,C) := n1/2 1 i − E 1 i . (2.1) n n C C n iX=1 (cid:18) h1n/d (cid:19) (cid:18) (cid:18) h1n/d (cid:19)(cid:19) A standard argument of homothety shows that we can make the following as- sumption with no loss of generality: φ has finite variation on Rd and has a support included in [−1/4,1/4]d. (2.2) Set Id := [−1/2,1/2]d, and consider the space B(Id) of real, bounded, Borel functions on Id. We endow B(Id) with the usual supremum norm, namely || g ||:= sup{| g(s) |, s ∈ Id}. The proof of Theorem 1 strongly relies on the following fact, which is due to Mason (2004). Call H the set of points x ∈ H n such that 2jnx ∈ Zd, and define the following Strassen-type set: S := g ∈ B(Id), ∃g˙ Borel, g˙2dλ ≤ 1, ∀s ∈ Id, g(s) = g˙dλ . Id (cid:26) Z Z (cid:27) Id [s,1/2] Fact 1 (Mason, 2004): Set ∆α (x,h ,[s,1/2]) g (s) := n n , x ∈ H, n ≥ 1, s ∈ Id. (2.3) n,x 2f(x)h log(1/h ) n n q Under assumption (1.5) and (1.6), we have almost surely: (a) ∀ǫ > 0, ∃n(ǫ), ∀n ≥ n(ǫ) and x ∈ H, inf || g − g ||< ǫ n,x g∈S Id (b) ∀g ∈ S and ǫ > 0, ∃n(ǫ,g), ∀n ≥ n(ǫ,g), inf || g − g ||< ǫ. Id n,x x∈H n This fact is a nearly direct consequence of Theorem 1 of Mason (2004), considering the class F := {1 , s ∈ Id}, by Remark F.2 in Mason (2004). [s,1/2] 5 Remark 1 We point out the fact that Theorem 1 in Mason (2004) cannot lead to Fact 1 directly, because (b) involves the quantity inf{|| g − g ||, x ∈ H } n,x n instead of inf{|| g − g ||, x ∈ H}. However, looking closely at the proof of n,x point (b) of Theorem 1 in Mason (2004), we can see that H can be replaced by H , as we can choose {z ,...,z } := H in his proof of Lemma 2. n 1,n m ,n n n Set, for fixed x ∈ Rd and n ≥ 1 (recall (1.10)), K (s) :=K(2jnx,2jnx + s), n ∈ Rd, s ∈ Rd (2.4) n,x f σ2 := K2 (s)ds (2.5) n,x Z n,x Rd f By assumption (2.2), each K has support included in Id. Now, we consider n,x the following continuous linefar applications, from B(Id),|| · || to R. For fixed x ∈ Rd and n ≥ 1, set (cid:16) (cid:17) Θ g := σ −1 g(s)dK (s), g ∈ B(Id). (2.6) n,x n,x n,x (cid:16) (cid:17) Z Id f With these notations, we obviously have for each n ≥ 1 and x ∈ Rd, almost surely, nh 1/2 n ˆ E ˆ f (x) − (f (x)) = Θ (g ), (2.7) n n n,x n,x (cid:18)2f(x)log(1/h )(cid:19) (cid:18) (cid:19) n so as the random objects involved in Theorem 1 show up to be correctly chosen functions of the increments of the empirical process. We first focus on proving point (i) of Theorem 1. Standard analysis shows that Θ S = [−1,1] for each x ∈ Rd and n ≥ 1. (2.8) n,x Id (cid:16) (cid:17) 6 Moreover, by definition of the K and by (2.2) we have, Υ denoting the total n,x variation of a function, f sup Υ K < ∞, (2.9) n,x n≥1,x∈H (cid:16) (cid:17) f inf σ > 0, whence (2.10) n,x n≥1,x∈H sup sup | Θ (g) |< ∞. (2.11) n,x n≥1,x∈H g∈B(Id),||g||=1 Note that(2.10)is a consequenceof the Cauchy-Schwartz inequality, as K (s)ds = n,x 1 (see, e.g., Meyer (1990), p. 33). Now, combining (2.11) and point (Raf) of Fact 1, we conclude that point (a) of Theorem 1 is true, by routine topology. We shall now prove point (b) of Theorem 1. Recall that x ∈ H if an only if n 2jnx ∈ Zd. Hence, by definitions (1.10) and (2.4) we have K = K , Θ = Θ , n ≥ 1, x ∈ H . (2.12) n,x 0,0 n,x 0,0 n f f Now fix ǫ > 0 and v ∈ [−1,1]. Recalling (2.8) we choose g ∈ S fulfilling Id Θ (g) = v. Now, as Θ is Lipschitz, and by point (b) of Fact 1, we conclude 0,0 0,0 that, almost surely, there exists n(ǫ,g) such that, for each n ≥ n(ǫ,g), there exists x ∈ H fulfilling | Θ (g ) − v |=| Θ (g ) − Θ (g) |< ǫ. The end of the n n n,x n,x 0,0 n,x 0,0 ✷ proof follows readily, as [−1,1] is compact. 3 Proof of Theorem 2 In this section, condition (1.5) is replaced by condition (1.12). We first define 1 n X − x i ∆F (x,h ,C) := 1 , C Borel, x ∈ H, n ≥ 1, (3.1) n n C cf(x)nhn iX=1 (cid:18) h1n/d (cid:19) g (s) :=∆F (x,h ,[s,1/2]), s ∈ Id, x ∈ H, n ≥ 1. (3.2) n,x n n e 7 Set h(x) = (xlogx−x+1)1 (x)+1 (x) for x ≥ 0 and h(x) = ∞ otherwise. (0,∞) {0} Now consider the following limit sets depending on a real parameter v > 0 : Γ := g, ∃g˙ Borel, h g˙ dλ ≤ 1/v, ∀s ∈ Id, g(s) = g˙dλ . (3.3) v,Id (cid:26) Z (cid:16) (cid:17) Z (cid:27) Id [s,1/2] We shall make use of the following result, which is a consequence of Theorem 1 of Varron (2007). Recall that x ∈ H if and only if 2jnx ∈ Zd. n Fact 2 (Varron) Under assumptions (1.6) and (1.12), the following assertions hold with probability one. (a) ∀ǫ > 0, ∃n(ǫ), ∀n ≥ n(ǫ) and x ∈ H, inf || g − g ||< ǫ; n,x g∈Γ cf(x),Id e (b) ∀ x ∈ H , g ∈ Γ , ǫ > 0, ∃n(ǫ,g), ∀n ≥ n(ǫ,g), inf || g − g ||< ǫ. cf(x) n,x x∈H n e Remark 2 Note that Fact 2 differs from Theorem 1 in Varron (2007) by two aspects. First, the involved class of set is F := {1 , s ∈ Id} instead of 1 [s,1/2] F := {1 , s ∈ [0,1]d}. However, by a standard translation argument, one 2 [0,s] can trivially transpose Theorem 1 in Varron (2007) from F to F . Second, the 2 1 cube H is replaced by H in point (b). As in Remark 1, we underline that this n replacement can be made by a close look at the arguments of Varron (2007) To prove Theorem 2, we shall make use of point (b) of Fact 2. Similarly to what was done in §2, we introduce the following linear applications Θ′ (g) := g(s)dK (s), g ∈ B(Id). (3.4) n,x Z n,x Id f Notice that, for any n ≥ 1 and x ∈ H we have Θ′ = Θ′ since K = K . n n,x 0,0 n,x 0,0 Now, as f f ˆ f (x) n := Θ′ (g ) = Θ′ (g ), n ≥ 1, x ∈ H , (3.5) n,x n,x 0,0 n,x n f(x) e e 8 the proof of point (b) of Theorem 2 would be a direct consequence of point (b) of Fact 2, provided that the following statement is true for some δ > 0 : Θ′ Γ ⊃ [1 − δ,1 + δ]. (3.6) 0,0 cf(x),Id x\∈H (cid:16) (cid:17) Now, by definition of Γ , v > 0 we obviously have v,Id Θ′ Γ = Θ′ (Γ ) =: J, x\∈H 0,0(cid:16) cf(x),Id(cid:17) 0,0 cf(x0),Id where f(x ) = sup{f(x), x ∈ H}. Note that, when d = 1, the set J can be 0 described by making use of the optimisation techniques of Deheuvels and Mason (see Deheuvels and Mason (1991), Theorem 3 and 4 and Deheuvels and Mason (1992), Theorem 4.2). To conclude the proof of Theorem 2, we shall now show that J has a nonempty interior. Define the following function for Id to R: d 1 g : (s ,...,s ) → ( − s ). (3.7) 0 1 d i 2 iY=1 Obviously, g belongs to Γ , as g˙ ≡ 1 fulfills the requirements stated in 0 cf(x ),Id 0 0 (3.3). Moreover, an integration by parts leads to the conclusion that T′ (g ) = K (s)ds = K(s)ds = 1. (3.8) 0,0 0 Z n,x Z Rd f Rd As Γ is convex and T′ is linear, the set J is an interval that contains cf(x0),Id 0,0 T′ (g ) = 1. Moreover, as h is continuous at x = 1, we have ρg ∈ Γ and 0,0 0 0 cf(x0),Id ρ−1g ∈ Γ for ρ > 1 small enough, which entails, by linearity of Θ′ , 0 cf(x0),Id 0,0 inf Θ (g) ≤ ρ−1 < 1 < ρ ≤ sup Θ (g)✷ (3.9) 0,0 0,0 g∈Γcf(x0),Id g∈Γcf(x0),Id References [1] Deheuvels, P., Mason, D. (1991). A tail empirical process approach to some nostandard laws of the iterated logarithm. J. Theoret. Probab. 4, 53–85. 9 [2] Deheuvels, P., Mason, D. (1992). Functional laws of the iterated logarithm for the increments of empirical and quantile processes. Ann. Probab. 20, 1248–1287. [3] Komlós, J., Major, P., Tusnády, G. (1977). An approximation of partial sums of independent r.v.’s and the sample d.f.II. Z. Wahrsch. Verv. Gebiete 34, 33–58. [4] Mallat, S. G.(1989). A theory for multiresolution signal decomposition: the wavelet representation. IEEE Transactions on Pattern Analysis and Machine Intelligence 11, 674–693. [5] Mason,D.(2004). Auniformfunctional lawoftheiterated logarithmforthelocalempirical process. Ann. Probab. 32 (2), 1391–1418. [6] Masry, E. (1997). Multivariate probability density estimation by wavelet methods: Strong consistency and rates for stationary time series. Stochastic processes and their applications 67, 177–193. [7] Massiani, A. (2003). Vitesse de convergence uniforme presque sûre de l’estimateur linéaire par méthode d’ondelettes. C. R. Math. Acad. Sci. Paris 337 (1), 67–70. [8] Meyer, Y. (1990). Ondelettes; Ondelettes et Opérateurs I. Hermann Paris. [9] Varron,D.(2007).Anonstandarduniformfunctionallimitlawfortheincrementsofthemutlivariate empirical distribution function. Preprint. 10