Convergence rates for density estimators of weakly dependent time series

Nicolas Ragache (1) and Olivier Wintenberger (2)

(1) MAP5, Université René Descartes, 45 rue des Saints-Pères, 75270 Paris, France. [email protected]
(2) SAMOS, Statistique Appliquée et MOdélisation Stochastique, Université Paris 1, Centre Pierre Mendès France, 90 rue de Tolbiac, F-75634 Paris Cedex 13, France. [email protected]

1 Introduction

Assume that $(X_n)_{n\in\mathbb{Z}}$ is a sequence of $\mathbb{R}^d$-valued random variables with common distribution which is absolutely continuous with respect to Lebesgue's measure, with density $f$. Stationarity is not assumed, so that the case of a sampled process $\{X_{i,n} = x_{h_n(i)}\}_{1\le i\le n}$, for any sequence of monotonic functions $(h_n(\cdot))_{n\in\mathbb{Z}}$ and any stationary process $(x_n)_{n\in\mathbb{Z}}$ that admits a marginal density, is included. This paper investigates convergence rates for density estimation in different cases. First, we consider two concepts of weak dependence:

• Non-causal $\eta$-dependence, introduced in [DL99] by Doukhan & Louhichi,
• Dedecker & Prieur's $\tilde\phi$-dependence (see [DP04]).

These two notions of dependence cover a large number of examples of time series (see Section 3). Next, following Doukhan (see [Dou90]), we propose a unified study of linear density estimators $\hat f_n$ of the form
\[
\hat f_n(x) = \frac{1}{n}\sum_{i=1}^{n} K_{m_n}(x, X_i), \tag{1}
\]
where $\{K_{m_n}\}$ is a sequence of kernels. Under classical assumptions on $\{K_{m_n}\}$ (see Section 2.2), the results in the case of independent and identically distributed (i.i.d. in short) observations $X_i$ are well known (see for instance [Tsy04]). At a fixed point $x \in \mathbb{R}^d$, the sequence $m_n$ can be chosen such that
\[
\|\hat f_n(x) - f(x)\|_q = O\left(n^{-\rho/(2\rho+d)}\right), \tag{2}
\]
where $\|X\|_q^q = E|X|^q$. The coefficient $\rho > 0$ measures the regularity of $f$ (see Section 2.2 for the definition of the notion of regularity).
The same rate of convergence also holds for the Mean Integrated Square Error (MISE), defined as $\int \|\hat f_n(x) - f(x)\|_2^2\, p(x)\,dx$ for some nonnegative and integrable function $p$. The rate of uniform convergence on a compact set incurs a logarithmic loss. For all $M > 0$ and for a suitable choice of the sequence $m_n$,
\[
E \sup_{\|x\|\le M} |\hat f_n(x) - f(x)|^q = O\left(\left(\frac{\log n}{n}\right)^{q\rho/(d+2\rho)}\right), \tag{3}
\]
and
\[
\sup_{\|x\|\le M} |\hat f_n(x) - f(x)| =_{a.s.} O\left(\left(\frac{\log n}{n}\right)^{\rho/(d+2\rho)}\right). \tag{4}
\]
These rates are optimal in the minimax sense. We thus have no hope to improve on them in the dependent setting. A wide literature deals with density estimation for absolutely regular or $\beta$-mixing processes (for a definition of mixing coefficients, see [Dou94]). For instance, under the assumption $\beta_r = o\left(r^{-3-2d/\rho}\right)$, Ango Nze & Doukhan prove in [AD98] that (2), (3) and (4) still hold. The sharper condition $\sum_r |\beta_r| < \infty$ entails the optimal rate of convergence for the MISE (see [Vie97]). Results for the MISE have been extended to the more general $\tilde\phi$- and $\eta$-dependence contexts by Dedecker & Prieur ([DP04]) and Doukhan & Louhichi in [DL01].

In this paper, our aim is to extend the bounds (2), (3) and (4) to the $\eta$- and $\tilde\phi$-weak dependence contexts. We use the same method as in [DL99], based on the following moment inequality for weakly dependent and centered sequences $(Z_n)_{n\in\mathbb{Z}}$. For each even integer $q$ and for each integer $n \ge 2$:
\[
\left\| \sum_{i=1}^{n} Z_i \right\|_q^q \le \frac{(2q-2)!}{(q-1)!} \left\{ V_{2,n}^{q/2} \vee V_{q,n} \right\}, \tag{5}
\]
where $\|X\|_q^q = E|X|^q$ and, for $k = 2, \ldots, q$,
\[
V_{k,n} = n \sum_{r=0}^{n-1} (r+1)^{k-2}\, C_k(r),
\]
with
\[
C_k(r) := \sup \left\{ \left| \mathrm{cov}\left(Z_{t_1}\cdots Z_{t_p},\, Z_{t_{p+1}}\cdots Z_{t_k}\right) \right| \right\}, \tag{6}
\]
where the supremum is over all the ordered $k$-tuples $t_1 \le \cdots \le t_k$ such that $\sup_{1\le i\le k-1} (t_{i+1} - t_i) = r$.

We will apply this bound when the $Z_i$'s are defined in such a way that $\sum_{i=1}^{n} Z_i$ is proportional to the fluctuation term $\hat f_n(x) - E\hat f_n(x)$. The inequality (5) gives a bound for this part of the deviation of the estimator, which depends on the covariance bounds $C_k(r)$. The other part of the deviation is the bias, which is treated by deterministic methods. In order to obtain suitable controls of the fluctuation term, we need two different types of bounds for $C_k(r)$. Conditions on the decay of the weak dependence coefficients give a first bound. Another type of condition is also required to bound $C_k(r)$ for the smaller values of $r$; this is classically achieved with a regularity condition on the joint law of the pairs $(X_j, X_k)$ for all $j \ne k$. In Doukhan & Louhichi (see [DL01]), rates of convergence are obtained when the coefficient $\eta$ decays geometrically fast and the joint densities are bounded. We relax these conditions to cover the case when the joint distributions are not absolutely continuous and when the $\eta$- and $\tilde\phi$-dependence coefficients decrease slowly (sub-geometric and Riemannian decays are considered).

Under our assumptions, we prove that (2) still holds (see Theorem 1). Unfortunately, additional losses appear for the uniform bounds. When $\eta_r$ or $\tilde\phi(r) = O(e^{-ar^b})$ with $a > 0$ and $b > 0$, we prove in Theorem 2 that (3) and (4) hold with $\log(n)$ replaced by $\log^{2(b+1)/b}(n)$. If $\eta_r$ or $\tilde\phi(r) = O(r^{-a})$ with $a > 1$, Theorem 3 gives bounds similar to (3) and (4) with the right-hand side replaced by $O\left(n^{-q\rho/\{d+2\rho+2d/(q_0+d)\}}\right)$ and $O\left(\{\log^{q_0+d}(n)/n^{q_0-2}\}^{\rho/\{2\rho q_0 + d(q_0+2)\}}\right)$, respectively, and with $q_0 = 2\lceil (a-1)/2 \rceil$ (by definition $\lceil x \rceil$ is the smallest integer larger than or equal to the real number $x$). As already noticed in [DL01], the loss with respect to the i.i.d. case highly depends on the decay of the dependence coefficients. In the case of geometric decay, the loss is logarithmic, while it is polynomial in the case of polynomial decays.

The paper is organized as follows. In Section 2.1, we introduce the notions of $\eta$- and $\tilde\phi$-dependence. We give the notation and hypotheses in Section 2.2.
The main results are presented in Section 2.3. We then apply these results to particular cases of weakly dependent processes, and we provide examples of kernels $K_m$ in Section 3. Section 4 contains the proofs of the theorems and three important lemmas.

2 Main results

We first describe the notions of dependence considered in this paper, then we introduce assumptions and formulate the main results of the paper (convergence rates).

2.1 Weak dependence

We consider a sequence $(X_i)_{i\in\mathbb{Z}}$ of $\mathbb{R}^d$-valued random variables, and we fix a norm $\|\cdot\|$ on $\mathbb{R}^d$. Moreover, if $h : \mathbb{R}^{du} \to \mathbb{R}$ for some $u \ge 1$, we define
\[
\mathrm{Lip}(h) = \sup_{(a_1,\ldots,a_u) \ne (b_1,\ldots,b_u)} \frac{|h(a_1,\ldots,a_u) - h(b_1,\ldots,b_u)|}{\|a_1 - b_1\| + \cdots + \|a_u - b_u\|}.
\]

Definition 1 ($\eta$-dependence, Doukhan & Louhichi (1999)). The process $(X_i)_{i\in\mathbb{Z}}$ is $\eta$-weakly dependent if there exists a sequence of non-negative real numbers $(\eta_r)_{r\ge 0}$ satisfying $\eta_r \to 0$ when $r \to \infty$ and
\[
\left| \mathrm{cov}\left( h(X_{i_1},\ldots,X_{i_u}),\, k(X_{i_{u+1}},\ldots,X_{i_{u+v}}) \right) \right| \le \left( u\,\mathrm{Lip}(h) + v\,\mathrm{Lip}(k) \right) \eta_r,
\]
for all $(u+v)$-tuples $(i_1,\ldots,i_{u+v})$ with $i_1 \le \cdots \le i_u \le i_u + r \le i_{u+1} \le \cdots \le i_{u+v}$, and $h, k \in \Lambda^{(1)}$, where
\[
\Lambda^{(1)} = \left\{ h : \exists u \ge 0,\ h : \mathbb{R}^{du} \to \mathbb{R},\ \mathrm{Lip}(h) < \infty,\ \|h\|_\infty = \sup_{x\in\mathbb{R}^{du}} |h(x)| \le 1 \right\}.
\]

Remark. The $\eta$-dependence condition can be applied to non-causal sequences because information "from the future" (i.e. on the right of the covariance) contributes to the dependence coefficient in the same way as information "from the past" (i.e. on the left). It is the non-causal alternative to the $\theta$ condition in [DD03] and [DL99].

Definition 2 ($\tilde\phi$-dependence, Dedecker & Prieur (2004)). Let $(\Omega, \mathcal{A}, P)$ be a probability space and $\mathcal{M}$ a $\sigma$-algebra of $\mathcal{A}$. For any $l \in \mathbb{N}^*$ and any random variable $X \in \mathbb{R}^{dl}$, we define:
\[
\tilde\phi(\mathcal{M}, X) = \sup \left\{ \| E(g(X) \mid \mathcal{M}) - E(g(X)) \|_\infty,\ g \in \Lambda_{1,l} \right\},
\]
where $\Lambda_{1,l} = \{ h : \mathbb{R}^{dl} \to \mathbb{R},\ \mathrm{Lip}(h) < 1 \}$. The sequence of coefficients $\tilde\phi_k(r)$ is then defined by
\[
\tilde\phi_k(r) = \max_{l \le k} \frac{1}{l} \sup_{i + r \le j_1 < j_2 < \cdots < j_l} \tilde\phi\left( \sigma(\{X_j;\ j \le i\}),\ (X_{j_1},\ldots,X_{j_l}) \right).
\]
The process is $\tilde\phi$-dependent if $\tilde\phi(r) = \sup_{k>0} \tilde\phi_k(r)$ tends to 0 with $r$.

Remark. The $\tilde\phi$-dependence coefficients provide covariance bounds. For a Lipschitz function $k$ and a bounded function $h$,
\[
\left| \mathrm{cov}\left( h(X_{i_1},\ldots,X_{i_u}),\, k(X_{i_{u+1}},\ldots,X_{i_{u+v}}) \right) \right| \le v\, E|h(X_{i_1},\ldots,X_{i_u})|\ \mathrm{Lip}(k)\, \tilde\phi(r). \tag{7}
\]

2.2 Notations and definitions

Assume that $(X_n)_{n\in\mathbb{Z}}$ is an $\eta$- or $\tilde\phi$-dependent sequence of $\mathbb{R}^d$-valued random variables. We consider two types of decay for the coefficients. The geometric case is the case when Assumption [H1] or [H1'] holds.

[H1]: $\eta_r = O\left(e^{-ar^b}\right)$ with $a > 0$ and $b > 0$,
[H1']: $\tilde\phi(r) = O\left(e^{-ar^b}\right)$ with $a > 0$ and $b > 0$.

The Riemannian case is the case when Assumption [H2] or [H2'] holds.

[H2]: $\eta_r = O(r^{-a})$ with $a > 1$,
[H2']: $\tilde\phi(r) = O(r^{-a})$ with $a > 1$.

As usual in density estimation, we shall assume:

[H3]: The common marginal distribution of the random variables $X_n$, $n \in \mathbb{Z}$, is absolutely continuous with respect to Lebesgue's measure, with common bounded density $f$.

The next assumption is on the density with respect to Lebesgue's measure (if it exists) of the joint distribution of the pairs $(X_j, X_k)$, $j \ne k$.

[H4]: The density $f_{j,k}$ of the joint distribution of the pair $(X_j, X_k)$ is uniformly bounded with respect to $j \ne k$.

Unfortunately, for some processes, these densities may not even exist. For example, the joint distributions of Markov chains $X_n = G(X_{n-1}, \epsilon_n)$ may not be absolutely continuous. One of the simplest examples is
\[
X_k = \frac{1}{2}\left( X_{k-1} + \epsilon_k \right), \tag{8}
\]
where $\{\epsilon_k\}$ is an i.i.d. sequence of Bernoulli random variables and $X_0$ is uniformly distributed on $[0,1]$. The process $\{X_n\}$ is strictly stationary but the joint distributions of the pairs $(X_0, X_k)$ are degenerate for any $k$. This Markov chain can also be represented (through an inversion of the time) as a dynamical system $(T_{-n}, \ldots, T_{-1}, T_0)$ which has the same law as $(X_0, X_1, \ldots, X_n)$ ($T_0$ and $X_0$ are random variables distributed according to the invariant measure, see [BGR00] for more details). Let us recall the definition of a dynamical system.

Definition 3 (dynamical system). A one-dimensional dynamical system is defined by
\[
\forall k \in \mathbb{N}, \quad T_k := F^k(T_0), \tag{9}
\]
where $F : I \to I$, $I$ is a compact subset of $\mathbb{R}$ and, in this context, $F^k$ denotes the $k$-th iterate of the map $F$: $F^1 = F$, $F^{k+1} = F \circ F^k$, $k \ge 1$. We assume that there exists an invariant probability measure $\mu_0$, i.e. $F(\mu_0) = \mu_0$, absolutely continuous with respect to Lebesgue's measure, and that $T_0$ is a random variable with distribution $\mu_0$.

We restrict our study to one-dimensional dynamical systems $T$ in the class $\mathcal{F}$ of dynamical systems defined by a transformation $F$ that satisfies the following assumptions (see [Pri01]).

• $\forall k \in \mathbb{N}$, $\forall x \in \mathrm{int}(I)$, $\lim_{t\to 0^+} F^k(x+t) = F^k(x^+)$ and $\lim_{t\to 0^-} F^k(x+t) = F^k(x^-)$ exist;
• $\forall k \in \mathbb{N}^*$, denoting $D_+^k = \{x \in \mathrm{int}(I),\ F^k(x^+) = x\}$ and $D_-^k = \{x \in \mathrm{int}(I),\ F^k(x^-) = x\}$, we assume $\lambda\left( \bigcup_{k\in\mathbb{N}^*} \left( D_+^k \cup D_-^k \right) \right) = 0$, where $\lambda$ is the Lebesgue measure.

When the joint distributions of the pairs $(X_j, X_k)$ are not assumed absolutely continuous (and then [H4] is not satisfied), we shall instead assume:

[H5]: The dynamical system $(X_n)_{n\in\mathbb{Z}}$ belongs to $\mathcal{F}$.

We consider in this paper linear estimators as in (1). The sequence of kernels $K_m$ is assumed to satisfy the following assumptions.

(a) The support of $K_m$ is a compact set with diameter $O(1/m^{1/d})$;
(b) The functions $x \mapsto K_m(x,y)$ and $x \mapsto K_m(y,x)$ are Lipschitz functions with Lipschitz constant $O\left(m^{1+1/d}\right)$;
(c) For all $x$ in the support of $K_m$, $\int K_m(x,y)\,dy = 1$;
(d) The bias of the estimator $\hat f_n$ defined in (1) is of order $m_n^{-\rho/d}$, uniformly on compact sets:
\[
\sup_{\|x\|\le M} \left| E[\hat f_n(x)] - f(x) \right| = O\left(m_n^{-\rho/d}\right). \tag{10}
\]

2.3 Results

In all our results we consider kernels $K_m$ and a density estimator of the form (1) such that assumptions (a), (b), (c) and (d) hold.

Theorem 1 ($L^q$-convergence).
Geometric case. Under Assumptions [H4] or [H5] and [H1] or [H1'], the sequence $m_n$ can be chosen such that inequality (2) holds for all $0 < q < +\infty$.
Riemannian case. Under Assumptions [H4] or [H5], if additionally

• [H2] holds with $a > \max\left(1 + 2/d + (d+1)/\rho,\ 2 + 1/d\right)$ ($\eta$-dependence),
• or [H2'] holds with $a > 1 + 2/d + 1/\rho$ ($\tilde\phi$-dependence),

then the sequence $m_n$ can be chosen such that inequality (2) holds for all $0 < q \le q_0 = 2\lceil (a-1)/2 \rceil$.

Theorem 2 (Uniform rates, geometric decays). For any $M > 0$, under Assumptions [H4] or [H5] and [H1] or [H1'] we have, for all $0 < q < +\infty$, and for a suitable choice of the sequence $m_n$,
\[
E \sup_{\|x\|\le M} |\hat f_n(x) - f(x)|^q = O\left(\left(\frac{\log^{2(b+1)/b}(n)}{n}\right)^{q\rho/(d+2\rho)}\right),
\]
\[
\sup_{\|x\|\le M} |\hat f_n(x) - f(x)| =_{a.s.} O\left(\left(\frac{\log^{2(b+1)/b}(n)}{n}\right)^{\rho/(d+2\rho)}\right).
\]

Theorem 3 (Uniform rates, Riemannian decays). For any $M > 0$, under Assumptions [H4] or [H5], [H2] or [H2'] with $a \ge 4$ and $\rho > 2d$, for $q_0 = 2\lceil (a-1)/2 \rceil$ and $q \le q_0$, the sequence $m_n$ can be chosen such that
\[
E \sup_{\|x\|\le M} |\hat f_n(x) - f(x)|^q = O\left(n^{-\frac{q\rho}{d+2\rho+2d/(q_0+d)}}\right),
\]
or such that
\[
\sup_{\|x\|\le M} |\hat f_n(x) - f(x)| =_{a.s.} O\left(\left(\frac{\log^{q_0+d}(n)}{n^{q_0-2}}\right)^{\frac{\rho}{d(q_0+2)+\rho(q_0+d)}}\right).
\]

Remarks.

• Theorem 1 shows that the optimal convergence rate of (2) still holds in the weak dependence context. In the Riemannian case, when $a \ge 4$, the conditions are satisfied if the density function $f$ is sufficiently regular, namely, if $\rho > d+1$.
• The loss with respect to the i.i.d. case in the uniform convergence rates (Theorems 2 and 3) is due to the fact that the probability inequalities for dependent observations are not as good as Bernstein's inequality for i.i.d. random variables (Bernstein inequalities in the weak dependence context are proved in [KN05]). The convergence rates depend on the decay of the weak dependence coefficients. This is in contrast to the case of independent observations.
• In Theorem 2 the loss is a power of the logarithm of the number of observations. Let us remark that this loss is reduced when $b$ tends to infinity. In the case of $\eta$-dependence and geometric decay, the same result is in [DL99] for the special case $b = 1$. In the framework of $\tilde\phi$-dependence, Theorem 2 seems to provide the first result on uniform rates of convergence for density estimators.
• In Theorem 3, the rate of convergence in the mean is better than the almost sure rate for technical reasons. Contrary to the geometric case, the loss is no longer logarithmic but is a power of $n$. The rate gets closer to the optimal rate as $q_0 \to \infty$, or equivalently $a \to \infty$.
• These results are new under the assumption of Riemannian decay of the weak dependence coefficients. The condition on $a$ is similar to the condition on $\beta$ in [AD03]. Even if the rates are better than in [DL01], there is a huge loss with respect to the mixing case. It would be interesting to know the minimax rates of convergence in this framework.

3 Models, applications and extensions

The class of weakly dependent processes is very large. We apply our results to three examples: two-sided moving averages, bilinear models and expanding maps. The first two will be handled with the help of the coefficients $\eta$, the third one with the coefficients $\tilde\phi$.

3.1 Examples of $\eta$-dependent time series.

It is of course possible to define $\eta$-dependent random fields (see [DDLLLP04] for further details); for simplicity, we only consider processes indexed by $\mathbb{Z}$.

Definition 4 (Bernoulli shifts). Let $H : \mathbb{R}^{\mathbb{Z}} \to \mathbb{R}$ be a measurable function.
A Bernoulli shift is defined as $X_n = H(\xi_{n-i},\ i \in \mathbb{Z})$, where $(\xi_i)_{i\in\mathbb{Z}}$ is a sequence of i.i.d. random variables called the innovation process.

In order to obtain a bound for the coefficients $\{\eta_r\}$, we introduce the following regularity condition on $H$. There exists a sequence $\{\delta_r\}$ such that
\[
\sup_{i\in\mathbb{Z}} E\left| H(\xi_{i-j},\ j \in \mathbb{Z}) - H\left(\xi_{i-j}\,\mathbf{1}_{|j|<r},\ j \in \mathbb{Z}\right) \right| \le \delta_r.
\]
Bernoulli shifts are $\eta$-dependent with $\eta_r = 2\delta_{r/2}$ (see [DL99]). In the following, we consider two special cases of Bernoulli shifts.

1. Non-causal linear processes. A real-valued sequence $(a_i)_{i\in\mathbb{Z}}$ such that $\sum_{j\in\mathbb{Z}} a_j^2 < \infty$ and the innovation process $\{\xi_n\}$ define a non-causal linear process $X_n = \sum_{i=-\infty}^{+\infty} a_i \xi_{n-i}$. If we control a moment of the innovations, the linear process $(X_n)$ is $\eta$-dependent. The sequence $\{\eta_r\}_{r\in\mathbb{N}}$ is directly linked to the coefficients $\{a_i\}_{i\in\mathbb{Z}}$ and various types of decay may occur. We consider only Riemannian decays $a_i = O\left(|i|^{-A}\right)$ with $A \ge 5$, since results for geometric decays are already known. Here $\eta_r = O\left(\sum_{|i|>r/2} |a_i|\right) = O(r^{1-A})$ and [H2] holds. Furthermore, we assume that the sequence $(\xi_i)_{i\in\mathbb{Z}}$ is i.i.d. and satisfies the condition $|E e^{iu\xi_0}| \le C(1+|u|)^{-\delta}$, for all $u \in \mathbb{R}$ and for some $\delta > 0$ and $C < \infty$. Then, the densities $f$ and $f_{j,k}$ exist for all $j \ne k$ and they are uniformly bounded (see the proof in the causal case in Lemma 1 and Lemma 2 in [GKS96]); hence [H4] holds. If the density $f$ of $X_0$ is $\rho$-regular with $\rho > 2$, our estimators converge to the density with the rates:

• $n^{-\rho/(2\rho+1)}$ in $L^q$-norm ($q \le 4$) at each point $x$,
• $n^{-\rho/(2\rho+3/2)}$ in $L^q$-norm ($q \le 4$) uniformly on an interval,
• $\left(\log^4(n)/n\right)^{\rho/(4\rho+3)}$ almost surely on an interval.

In the first case, the rate we obtain is the same as in the i.i.d. case. For such linear models, the density estimator also satisfies the Central Limit Theorem (see [HLT01] and [Ded98]).

2. Bilinear model.
The process $\{X_t\}$ is a bilinear model if there exist two sequences $(a_i)_{i\in\mathbb{N}^*}$ and $(b_i)_{i\in\mathbb{N}^*}$ of real numbers and real numbers $a$ and $b$ such that:
\[
X_t = \xi_t \left( a + \sum_{j=1}^{\infty} a_j X_{t-j} \right) + b + \sum_{j=1}^{\infty} b_j X_{t-j}. \tag{11}
\]
Squared ARCH($\infty$) or GARCH($p,q$) processes satisfy such an equation, with $b = b_j = 0$ for all $j \ge 1$. Define
\[
\lambda = \|\xi_0\|_p \sum_{j=1}^{\infty} a_j + \sum_{j=1}^{\infty} b_j.
\]
If $\lambda < 1$, then equation (11) has a strictly stationary solution in $L^p$ (see [DMR05]). This solution is a Bernoulli shift for which the coefficient $\eta$ behaves as follows:

• $\eta_r = O\left(e^{-\lambda r}\right)$ for some $\lambda > 0$ if there exists an integer $N$ such that $a_i = b_i = 0$ for $i \ge N$.
• $\eta_r = O\left(e^{-\lambda\sqrt{r}}\right)$ for some $\lambda > 0$ if $a_i = O(e^{-Ai})$ and $b_i = O(e^{-Bi})$ with $A > 0$ and $B > 0$.
• $\eta_r = O\left(\{r/\log(r)\}^{-\lambda}\right)$ for some $\lambda > 0$ if $a_i = O(i^{-A})$ and $b_i = O(i^{-B})$ with $A > 1$ and $B > 1$.

Let us assume that the i.i.d. sequence $\{\xi_t\}$ has a marginal density $f_\xi \in C_\rho$, for some $\rho > 2$. The density of $X_t$ conditionally on the past can be written as a function of $f_\xi$. We then check recursively that the common density of $X_t$ for all $t$, say $f$, also belongs to $C_\rho$. Furthermore, the regularity of $f_\xi$ ensures that $f$ and the joint densities $f_{j,k}$ for all $j \ne k$ are bounded (see [DMR05]) and [H4] holds. The assumptions of Theorem 1 are satisfied, and the estimator $\hat f_n$ achieves the minimax bound (2) if either:

• There exists an integer $N$ such that $a_i = b_i = 0$ for $i \ge N$;
• There exist $A > 0$ and $B > 0$ such that $a_i = O(e^{-Ai})$ and $b_i = O(e^{-Bi})$;
• There exist $A \ge 4$ and $B \ge 5$ such that $a_i = O(i^{-A})$ and $b_i = O(i^{-B})$. Then, this optimal bound holds only for $2 \le q < q(A,B)$, where $q(A,B) = 2[((B-1) \wedge A)/2]$.

Note finally that the rates of uniform convergence provided by Theorems 2 and 3 are sub-optimal.

3.2 Examples of $\tilde\phi$-dependent time series.

Let us introduce an important class of dynamical systems:

Example 1. $(T_i = F^i(T_0))_{i\in\mathbb{N}}$ is an expanding map, or equivalently $F$ is a Lasota-Yorke function, if it satisfies the three following criteria.
• (Regularity) There exists a grid $0 = a_0 \le a_1 \le \cdots \le a_n = 1$ such that $F \in C^1$ and $|F'(x)| > 0$ on $]a_{i-1}, a_i[$ for each $i = 1, \ldots, n$.
• (Expansivity) Let $I_n$ be the set on which $(F^n)'$ is defined. There exist $A > 0$ and $s > 1$ such that $\inf_{x\in I_n} |(F^n)'(x)| > A s^n$.
• (Topological mixing) For any nonempty open sets $U$, $V$, there exists $n_0 \ge 1$ such that $F^{-n}(U) \cap V \ne \emptyset$ for all $n \ge n_0$.

Examples of Markov chains $X_n = G(X_{n+1}, \epsilon_n)$ associated to an expanding map $\{T_n\}$ belonging to $\mathcal{F}$ are given in [BGR00] and [DP04]. The simplest one is $X_k = (X_{k-1} + \epsilon_k)/2$, where the $\epsilon_k$ follow a binomial law and $X_0$ is uniformly distributed on $[0,1]$. We easily check that $F(x) = 2x \bmod 1$, the transformation of the associated dynamical system $T_n$, satisfies all the assumptions, so that $T_n$ is an expanding map belonging to $\mathcal{F}$.

The coefficients of $\tilde\phi$-dependence of such a Markov chain satisfy $\tilde\phi(r) = O(e^{-ar})$ for some $a > 0$ (see [DP04]). Theorems 1 and 2 give the $L^q$ rate $n^{-\rho/(2\rho+1)}$, the uniform $L^q$ rate and the almost sure rate $\left(\log^4(n)/n\right)^{\rho/(2\rho+1)}$ of the estimators of the density of $\mu_0$.

3.3 Sampled process

Since we do not assume stationarity of the observed process, the following observation scheme is covered by our results. Let $(x_n)_{n\in\mathbb{Z}}$ be a stationary process whose marginal distribution is absolutely continuous, let $(h_n)_{n\in\mathbb{Z}}$ be a sequence of monotone functions and consider the sampled process $\{X_{i,n}\}_{1\le i\le n}$ defined by $X_{i,n} = x_{h_n(i)}$. The dependence coefficients of the sampled process may decay to zero faster than those of the underlying unobserved process. For instance, if the dependence coefficients of the process $(x_n)_{n\in\mathbb{Z}}$ have a Riemannian decay, those of the sampled process $\{x_{h_n(i)}\}$ with $h_n(i) = i2^n$ decay geometrically fast. The observation scheme is thus a crucial factor that determines the rate of convergence of density estimators.
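The effect of the observation scheme on the dependence coefficients can be illustrated with a short Python sketch. It rests on assumptions that are not part of the text: the underlying coefficients are taken to be nonincreasing and bounded by $r^{-a}$, the sampling map is increasing, and the map $h(i) = 2^i$ is an illustrative choice made here, not the scheme of the paper. Under these assumptions, two sampled observations at lag $r$ are separated by at least $\min_i \{h(i+r) - h(i)\}$ steps of the underlying process, which gives the bound computed below. The function names are hypothetical.

```python
def min_index_gap(h, n_obs, lag):
    """Smallest gap h(i + lag) - h(i) over 1 <= i <= n_obs - lag.

    For an increasing sampling map h, this is the minimal distance, in the
    time scale of the underlying process, between sampled observations
    that are `lag` apart.
    """
    return min(h(i + lag) - h(i) for i in range(1, n_obs - lag + 1))


def sampled_coefficient_bound(h, n_obs, lag, a):
    """Bound gap**(-a) on the lag-`lag` coefficient of the sampled process.

    Assumption (illustrative): the underlying coefficients are nonincreasing
    and bounded by r**(-a), i.e. a Riemannian decay with constant 1.
    """
    return float(min_index_gap(h, n_obs, lag)) ** (-a)


def h_identity(i):
    # No subsampling: the observed process is the underlying one.
    return i


def h_dyadic(i):
    # Hypothetical sampling map with exponentially growing gaps.
    return 2 ** i


if __name__ == "__main__":
    for lag in (1, 2, 4, 8):
        plain = sampled_coefficient_bound(h_identity, 50, lag, a=2.0)
        sampled = sampled_coefficient_bound(h_dyadic, 50, lag, a=2.0)
        print(lag, plain, sampled)
```

With the identity map the bound decays like $r^{-a}$, whereas with the dyadic map the minimal gap at lag $r$ is $2(2^r - 1)$, so the bound decays geometrically in $r$.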
3.4 Density estimators and bias

In this section, we provide examples of kernels $K_m$ and smoothness assumptions on the density $f$ such that assumptions (a), (b), (c) and (d) of subsection 2.2 are satisfied.

Kernel estimators. The kernel estimator associated to the bandwidth parameter $m_n$ is defined by:
\[
\hat f_n(x) = \frac{m_n}{n} \sum_{i=1}^{n} K\left( m_n^{1/d} (x - X_i) \right).
\]
We briefly recall the classical analysis for the deterministic part $R_n$ in this case (see [Tsy04]). Since the sequence $\{X_n\}$ has a constant marginal distribution, we have $E[\hat f_n(x)] = f_n(x)$ with $f_n(x) = \int_D K(s)\, f\left(x - s/m_n^{1/d}\right) ds$. Let us assume that $K$ is a Lipschitz function compactly supported in $D \subset \mathbb{R}^d$. For $\rho > 0$, let $K$ satisfy, for all $j = j_1 + \cdots + j_d$ with $(j_1, \ldots, j_d) \in \mathbb{N}^d$:
\[
\int x_1^{j_1} \cdots x_d^{j_d}\, K(x_1, \ldots, x_d)\, dx_1 \cdots dx_d
\begin{cases}
= 1 & \text{if } j = 0, \\
= 0 & \text{for } j \in \{1, \ldots, \lceil \rho - 1 \rceil - 1\}, \\
\ne 0 & \text{if } j = \lceil \rho - 1 \rceil.
\end{cases}
\]
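The kernel estimator defined above can be sketched numerically in dimension $d = 1$. The following Python code is a minimal illustration under stated assumptions, not the authors' implementation: the Epanechnikov kernel is one possible Lipschitz, compactly supported kernel integrating to one, the function names are hypothetical, and the choice $m_n = n^{d/(2\rho+d)}$ is the usual balance between a bias of order $m_n^{-\rho/d}$ and a variance assumed to be of order $m_n/n$, which is not spelled out in this section.

```python
import numpy as np


def epanechnikov(u):
    """Epanechnikov kernel: Lipschitz, supported on [-1, 1], integrates to 1."""
    u = np.asarray(u, dtype=float)
    return np.where(np.abs(u) <= 1.0, 0.75 * (1.0 - u ** 2), 0.0)


def kernel_density_estimate(x, sample, m, kernel=epanechnikov):
    """Evaluate (m / n) * sum_i K(m * (x - X_i)) for d = 1.

    x      : scalar or 1-d array of evaluation points
    sample : 1-d array of observations X_1, ..., X_n
    m      : bandwidth parameter m_n (larger m means a narrower kernel)
    """
    x = np.atleast_1d(np.asarray(x, dtype=float))
    sample = np.asarray(sample, dtype=float)
    # Pairwise scaled differences, shape (len(x), n).
    u = m * (x[:, None] - sample[None, :])
    return m * kernel(u).mean(axis=1)


def bandwidth_parameter(n, rho, d=1):
    """Assumed choice m_n = n^{d / (2 rho + d)}, giving m_n^{-rho/d} = n^{-rho/(2 rho + d)}."""
    return n ** (d / (2.0 * rho + d))


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    sample = rng.uniform(0.0, 1.0, size=2000)  # i.i.d. stand-in for the data
    m_n = bandwidth_parameter(len(sample), rho=2.0)
    print(kernel_density_estimate([0.25, 0.5, 0.75], sample, m_n))
```

The same function can be applied to a dependent sample, for instance a simulated trajectory of one of the models of Section 3; only the choice of $m_n$ would then follow the theorems of Section 2.3 rather than the i.i.d. balance assumed here.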