Limit Theorems in Mallows Distance for Processes with Gibbsian Dependence

L. Cioletti
Departamento de Matemática - UnB
70910-900, Brasília, Brazil
[email protected]

C. C. Y. Dorea
Departamento de Matemática - UnB
70910-900, Brasília, Brazil
[email protected]

R. Vila
Departamentos de Matemática e Estatística - UnB
70910-900, Brasília, Brazil
[email protected]

arXiv:1701.03747v1 [math.PR] 13 Jan 2017

Abstract

In this paper, we explore the connection between convergence in distribution and Mallows distance in the context of positively associated random variables. Our results extend some known invariance principles for sequences with the FKG property. Applications to processes with Gibbsian dependence structures are included.

2010 Mathematics Subject Classification: 60B10, 60F05, 60G10, 60K35.
Keywords: Mallows Distance; Positive Association; Gibbs Measures.

1 Introduction

Positive association for a random vector $(X_1, X_2, \cdots, X_n)$ requires that
\[ \mathrm{cov}\big( g(X_1,\cdots,X_n),\, h(X_1,\cdots,X_n) \big) \geq 0, \tag{1} \]
whenever $g$ and $h$ are two real-valued coordinatewise nondecreasing functions and whenever the covariance exists. This dependence structure has been widely used in the study of reliability theory, see Barlow and Proschan [2]. The basic concept actually appeared in Harris [15] in the context of percolation models and it was subsequently generalized to a large class of Statistical Mechanics models in the seminal work by Fortuin, Kasteleyn and Ginibre [11]. In the Statistical Mechanics literature this notion was developed independently from reliability theory, and variables are said to satisfy the FKG inequality if they are associated (see, e.g., [11, 18]). In fact, we say that a process $\mathbf{X} \equiv \{X_i : i \in \mathbb{Z}\}$ satisfies the FKG property if (1) holds for any finite subvector $(X_{i_1}, X_{i_2}, \cdots, X_{i_n})$.

We will make use of the Mallows distance to analyse the asymptotic behavior of positively associated processes.
Mallows distance $d_r(F,G)$, also known as Wasserstein or Kantorovich distance, measures the discrepancy between two (cumulative) distribution functions (d.f.'s) $F$ and $G$. The upper Fréchet bound $H(x,y) = F(x) \wedge G(y)$ illustrates its connection with positive associativity. Let $X \stackrel{d}{=} F$ and $Y \stackrel{d}{=} G$, where $\stackrel{d}{=}$ denotes equality in distribution. Then, when $(X,Y)$ has joint d.f. $H$, the classical Hoeffding's formula gives
\[ \mathrm{cov}(X,Y) = \int_{\mathbb{R}^2} \big( H(x,y) - F(x)G(y) \big)\, dx\, dy. \]
On the other hand, the representation theorem from Dorea and Ferreira [7] allows us to write
\[ d_r^r(F,G) = \int_{\mathbb{R}^2} |x-y|^r \, dH(x,y), \quad \text{if } r \geq 1. \]

Besides its extensive applications in a wide variety of fields, this metric has been successfully used to derive Central Limit Theorem (CLT) type results for heavy-tailed stable distributions (see, e.g., Johnson and Samworth [17] or Dorea and Oliveira [8]). A key property to achieve these results is provided by its close relation to convergence in distribution ($\stackrel{d}{\to}$), as established by Bickel and Freedman [3],
\[ d_r(F_n, G) \to 0 \iff F_n \stackrel{d}{\to} G \ \text{ and } \int_{\mathbb{R}} |x|^r \, dF_n(x) \to \int_{\mathbb{R}} |x|^r \, dG(x). \tag{2} \]

For stabilized partial sums of positively associated random variables (r.v.'s) we will show convergence in Mallows distance and hence the asymptotic normality. Theorems 4 and 5 generalize Newman and Wright's [22] CLT for stationary processes. By making use of asymptotic normality we strengthen the result to Mallows $d_r$ convergence. As for the non-stationary case, our Theorem 6 extends Cox and Grimmett's [6] results. Its proof is conceptually different from Cox and Grimmett's proof and, in particular, characteristic functions do not play a prominent role in it.

As an application we exhibit the $d_r$ convergence for ferromagnetic Ising type models with discrete and continuous spins. The results apply to both short and long-range potentials and also to non-translation invariant systems.
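As a rough numerical companion to the representation of the Mallows distance through the upper Fréchet bound recalled above, the following Python sketch computes the distance between two empirical d.f.'s built from samples of equal size. It assumes $r \geq 1$, in which case pairing the order statistics realizes the coupling $H = F \wedge G$ of the two empirical d.f.'s. The function name `mallows_distance` is ours (hypothetical) and does not come from the paper.

```python
def mallows_distance(xs, ys, r=2.0):
    """Empirical Mallows r-distance between two equal-size samples.

    Assumption: r >= 1, so that the sorted (quantile) coupling is the
    one given by the upper Frechet bound of the two empirical d.f.'s.
    """
    if len(xs) == 0 or len(xs) != len(ys):
        raise ValueError("samples must be non-empty and of equal size")
    if r < 1:
        raise ValueError("this sketch only covers r >= 1")
    # Pair the i-th smallest x with the i-th smallest y.
    pairs = zip(sorted(xs), sorted(ys))
    mean_cost = sum(abs(x - y) ** r for x, y in pairs) / len(xs)
    # d_r is the r-th root of the expected cost under the coupling.
    return mean_cost ** (1.0 / r)
```

For instance, `mallows_distance([3.0, 1.0], [2.0, 4.0], r=2)` pairs 1 with 2 and 3 with 4, so both differences equal 1 and the distance is 1.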
For finite range potentials the convergence in the Mallows distance of stabilized sums is obtained for any $r \geq 2$. Proving similar results for long-range potentials near the critical temperature seems to be a very challenging problem. Here we are able to show that the convergence in the Mallows distance still occurs, but some strong restrictions on the order $r$ have to be imposed.

2 Positive Association and Mallows Distance

Let $\mathbb{Z}$ be the set of integers. We will be considering processes $\mathbf{X} \equiv \{X_j : j \in \mathbb{Z}\}$ defined on some probability space $(\Omega, \mathcal{F}, P)$ and that are positively associated.

Definition 1. A process $\mathbf{X}$ is said to be positively associated if, given two coordinatewise non-decreasing functions $f, g : \mathbb{R}^n \to \mathbb{R}$ and $j_1, \cdots, j_n \in \mathbb{Z}$, we have
\[ \mathrm{cov}\big( f(X_{j_1},\ldots,X_{j_n}),\, g(X_{j_1},\ldots,X_{j_n}) \big) \geq 0, \]
provided the covariance exists.

We say that a function $f : \mathbb{R}^n \to \mathbb{R}$ is non-decreasing if $f(x_1,\ldots,x_n) \leq f(y_1,\ldots,y_n)$ whenever $x_j \leq y_j$ for all $j = 1,\ldots,n$. For the sake of notation, if a different probability measure $\mu$ is to be associated with the measurable space $(\Omega, \mathcal{F})$ we shall write $\mathrm{cov}_\mu$ and similarly $E_\mu$ for the expectation. Below we gather a few properties needed for our proofs, see Newman and Wright [22] or Oliveira [23].

Lemma 1. Let $\mathbf{X}$ be positively associated.

(a) For $m_j \geq 1$, if $f_j : \mathbb{R}^{m_j} \to \mathbb{R}$ are coordinatewise non-decreasing functions then $\{ f_j(X_{i_1}, \cdots, X_{i_{m_j}}) : i_1, \cdots, i_{m_j} \in \mathbb{Z} \}$ is also positively associated.

(b) If all $X_j$'s possess finite second moment then the characteristic functions $\phi_j(r_j) = E(\exp\{ i r_j X_j \})$ and $\phi(r_1, \cdots, r_n) = E\big(\exp\{ i \sum_{j=1}^{n} r_j X_j \}\big)$ satisfy
\[ \Big| \phi(r_1,\cdots,r_n) - \prod_{j=1}^{n} \phi_j(r_j) \Big| \leq \frac{1}{2} \sum_{1 \leq j \neq k \leq n} |r_j r_k| \, \mathrm{cov}(X_j, X_k). \]

Definition 2.
(Mallows [20]) For $r > 0$, the Mallows $r$-distance between d.f.'s $F$ and $G$ is given by
\[ d_r(F,G) = \inf_{(X,Y)} \big\{ E(|X-Y|^r) \big\}^{1/r}, \quad X \stackrel{d}{=} F, \ Y \stackrel{d}{=} G, \tag{3} \]
where the infimum is taken over all random vectors $(X,Y)$ with marginal distributions $F$ and $G$, respectively.

For $r \geq 1$ the Mallows distance represents a metric on the space of d.f.'s and bears a close connection with weak convergence given by (2). Let
\[ \mathcal{L}_r = \Big\{ F : F \text{ a d.f.}, \ \int_{\mathbb{R}} |x|^r \, dF(x) < +\infty \Big\}. \]

Theorem 1. (Bickel and Freedman [3]) Let $r \geq 1$ and let the d.f.'s $G$ and $\{F_n\}_{n \geq 1}$ be in $\mathcal{L}_r$. Then $d_r(F_n, G) \to 0$ if and only if (2) holds or, equivalently, for every bounded continuous function $g : \mathbb{R} \to \mathbb{R}$ we have
\[ \int_{\mathbb{R}} g(x) \, dF_n(x) \to \int_{\mathbb{R}} g(x) \, dG(x) \quad \text{and} \quad \int_{\mathbb{R}} |x|^r \, dF_n(x) \to \int_{\mathbb{R}} |x|^r \, dG(x). \]

Assume $X \stackrel{d}{=} F$, $Y \stackrel{d}{=} G$ and $(X,Y) \stackrel{d}{=} H$, where $H(x,y) = F(x) \wedge G(y)$. Then the following representation result will be helpful to evaluate $d_r(F,G)$.

Theorem 2. (Dorea and Ferreira [7]) For $r \geq 1$ we have
\[ d_r^r(F,G) = E\big\{ |F^{-1}(U) - G^{-1}(U)|^r \big\} = \int_0^1 |F^{-1}(u) - G^{-1}(u)|^r \, du = E_H\big\{ |X-Y|^r \big\} = \int_{\mathbb{R}^2} |x-y|^r \, dH(x,y), \]
where $U$ is uniformly distributed on the interval $(0,1)$ and
\[ F^{-1}(u) = \inf\{ x \in \mathbb{R} : F(x) \geq u \}, \quad 0 < u < 1, \]
denotes the generalized inverse.

3 Asymptotics for Positively Associated and Stationary Sequences

Let $\mathbf{X} \equiv \{X_j : j \in \mathbb{Z}\}$ be a stationary sequence in the sense that for all $m \geq 1$ and $l \in \mathbb{Z}$,
\[ (X_{i_1}, \cdots, X_{i_m}) \stackrel{d}{=} (X_{i_1 + l}, \cdots, X_{i_m + l}). \]
For a stochastic process $\mathbf{X}$ it is natural, when dealing with limit theorems, to consider blocks of $n$ consecutive variables,
\[ S_n = \sum_{j=1}^{n} X_j \quad \text{and} \quad S_{[k,k+n)} = \sum_{j=k}^{k+n-1} X_j. \]
Clearly, under the stationarity assumption we have $S_{[k,k+n)} \stackrel{d}{=} S_n$ for all $k \in \mathbb{Z}$. Our first result follows from Newman's CLT:

Theorem 3. (Newman [21]) Let $\mathbf{X}$ be a stationary and positively associated process.
Assume that the variance is finite and strictly positive, $0 < \mathrm{var}\, X_1 < +\infty$, and that
\[ \sigma^2 \equiv \mathrm{var}(X_1) + 2 \sum_{j \geq 2} \mathrm{cov}(X_1, X_j) < +\infty. \tag{4} \]
Then
\[ \frac{S_{[k,k+n)} - n E(X_1)}{\sqrt{n}\, \sigma} \stackrel{d}{\to} N(0,1), \quad \forall k \in \mathbb{Z}. \tag{5} \]

It is worth mentioning that positive associativity and stationarity ensure that
\[ \sigma^2 = \chi \equiv \sup_{k \in \mathbb{Z}} \sum_{j \in \mathbb{Z}} \mathrm{cov}(X_k, X_j) \]
is well-defined; the latter is known as the susceptibility associated to $\mathbf{X}$.

Define
\[ V_{[k,k+n)} = \frac{S_{[k,k+n)} - n E(X_1)}{\sqrt{n}\, \sigma} \stackrel{d}{=} F_{[k,k+n)} \tag{6} \]
and let $\Phi$ be the d.f. of $N(0,1)$.

Theorem 4. Under the hypotheses of Theorem 3 we have, for $0 < r \leq 2$,
\[ \lim_{n \to \infty} d_r(F_{[k,k+n)}, \Phi) = 0. \]

Proof. First, note that by stationarity we have
\[ \mathrm{var}(S_{[k,k+n)}) = \mathrm{var}(S_n) = n\, \mathrm{var}(X_1) + 2 \sum_{j=2}^{n} (n-j+1)\, \mathrm{cov}(X_1, X_j). \]
Since the covariances are non-negative and summable by (4), dominated convergence gives $\mathrm{var}(S_{[k,k+n)})/n \to \sigma^2$. Thus
\[ E(V_{[k,k+n)}^2) = E\Big\{ \Big( \frac{S_{[k,k+n)} - n E(X_1)}{\sqrt{n}\, \sigma} \Big)^2 \Big\} \to 1 = E(Z^2), \tag{7} \]
where $Z \stackrel{d}{=} \Phi$. Clearly $V_{[k,k+n)} \in \mathcal{L}_2$. Since the convergence (5) holds we conclude from Theorem 1 that $d_2(F_{[k,k+n)}, \Phi) \to 0$.

Next, to extend the convergence to $0 < r < 2$ we make use of the representation Theorem 2. There exists a r.v. $Z^* \stackrel{d}{=} \Phi$ such that the joint distribution of $(V_{[k,k+n)}, Z^*)$ is given by $H(x,y) = F_{[k,k+n)}(x) \wedge \Phi(y)$ and
\[ d_2^2(F_{[k,k+n)}, \Phi) = E\big\{ (V_{[k,k+n)} - Z^*)^2 \big\} \to 0. \]
By (3) and Liapounov's inequality we have, for $0 < r \leq 2$,
\[ d_r^r(F_{[k,k+n)}, \Phi) \leq E\big\{ |V_{[k,k+n)} - Z^*|^r \big\} \leq \big( E\big\{ (V_{[k,k+n)} - Z^*)^2 \big\} \big)^{r/2} \to 0. \]

To derive convergence for higher order $d_r$, further moment conditions on the $X_j$'s will be required. For $k \in \mathbb{Z}$ let $u_k(\cdot)$ denote the Cox-Grimmett coefficient defined by
\[ u_k(n) = \sum_{j \in \mathbb{Z} : |k-j| \geq n} \mathrm{cov}(X_k, X_j), \quad n \geq 0. \tag{8} \]
Since we are assuming stationarity we may take $u(n) = u_k(n) = \sum_{j \in \mathbb{Z} : |j| \geq n} \mathrm{cov}(X_0, X_j)$. Note that, by Lemma 1, the process $\{X_j - E(X_j) : j \in \mathbb{Z}\}$ is also stationary and positively associated. This allows us to state a moment inequality from Birkel [5] adapted to our needs.

Lemma 2.
Let $2 < r < r^*$ and let $\mathbf{X}$ be a stationary and positively associated process. Assume that $E\{|X_1|^{r^*}\} < +\infty$ and that for some constants $C_1 > 0$ and $\theta \geq \dfrac{r^*(r-2)}{2(r^*-r)}$ we have $u(n) \leq C_1 n^{-\theta}$. Then there exists a constant $C_2 = C_2(r, r^*)$ such that
\[ \sup_{k \in \mathbb{Z}} E\big\{ |S_{[k,k+n)} - n E(X_1)|^r \big\} \leq C_2\, n^{r/2}. \tag{9} \]

Note that, under Theorem 4, we have the above conditions satisfied for $r = 2$. Indeed, by (4) we have $u(n) \leq C_1$ and (9) follows from (7).

Theorem 5. Let $2 < r < r^*$ and assume that $\mathbf{X}$ satisfies the hypotheses of Lemma 2 with $\theta > \dfrac{r^*(r-2)}{2(r^*-r)}$. Then if $\sigma^2$, given by (4), is such that $0 < \sigma^2 < +\infty$ we have
\[ d_r(F_{[k,k+n)}, \Phi) \to 0 \quad \text{and} \quad E(|V_{[k,k+n)}|^r) \to E(|Z|^r), \]
where $F_{[k,k+n)}$ and $V_{[k,k+n)}$ are defined by (6) and $Z \stackrel{d}{=} \Phi = N(0,1)$.

Proof. (i) Since $r^* > 2$, by Theorem 3 we have $V_{[k,k+n)} \stackrel{d}{\to} Z$. Next, we show that
\[ V_{[k,k+n)} \in \mathcal{L}_r \quad \text{and} \quad E\{|V_{[k,k+n)}|^r\} \to E\{|Z|^r\}. \tag{10} \]
Then $d_r(F_{[k,k+n)}, \Phi) \to 0$ follows immediately from (2).

(ii) For (10) we will show that
\[ \sup_{n \geq 1} \sup_{k \in \mathbb{Z}} E(|V_{[k,k+n)}|^{r'}) < +\infty \quad \text{for some } r < r' < r^*. \]
Thus $V_{[k,k+n)} \in \mathcal{L}_{r'} \subset \mathcal{L}_r$ and the convergence of moments follows from the fact that $\{|V_{[k,k+n)}|^r\}_{n \geq 1}$ is uniformly integrable (cf. Billingsley [4], Theorem 5.4).

Now let $\psi(r) = \dfrac{r^*(r-2)}{2(r^*-r)}$. Then $\psi'(\cdot) > 0$ on $(2, r^*)$. It follows that there exists $r'$ with $r < r' < r^*$ such that $\theta \geq \psi(r')$, as required by Lemma 2. Just take $r' = \dfrac{2r^*(1+\theta)}{2\theta + r^*}$, for which $\psi(r') = \theta > \psi(r)$. From Lemma 2 we have, for $C_2 = C_2(r', r^*) > 0$,
\[ \sup_{k \in \mathbb{Z}} E\big\{ |S_{[k,k+n)} - n E(X_1)|^{r'} \big\} \leq C_2\, n^{r'/2}. \]
It follows that
\[ \sup_{k \in \mathbb{Z}} E\big\{ |V_{[k,k+n)}|^{r'} \big\} = \sup_{k \in \mathbb{Z}} E\Big\{ \frac{|S_{[k,k+n)} - n E(X_1)|^{r'}}{(\sqrt{n}\, \sigma)^{r'}} \Big\} \leq C_2 \frac{n^{r'/2}}{(\sqrt{n}\, \sigma)^{r'}} = \frac{C_2}{\sigma^{r'}} < +\infty. \]

4 The Non-Stationary Case

When stationarity is relaxed a more refined treatment needs to be carried out.
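Before leaving the stationary setting, the convergence stated in Theorem 4 can be illustrated by a small Monte Carlo sketch in Python. Everything in it is an assumption made for illustration and not part of the paper: the process is $X_j = \varepsilon_j + \varepsilon_{j+1}$ with $\varepsilon_j$ i.i.d. Uniform(0,1) (non-decreasing functions of independent variables, hence positively associated), for which $E(X_1) = 1$ and $\sigma^2 = 1/6 + 2 \cdot 1/12 = 1/3$ in (4). The d.f. $\Phi$ is replaced by its midpoint quantiles, so the result is only an approximation of $d_2(F_{[k,k+n)}, \Phi)$.

```python
import random
from statistics import NormalDist

def empirical_mallows_to_normal(sample, r=2.0):
    """Approximate d_r between the empirical d.f. of `sample` and Phi."""
    m = len(sample)
    std_normal = NormalDist()
    # Midpoint quantiles of N(0,1) stand in for Phi^{-1}(U).
    normal_q = [std_normal.inv_cdf((i + 0.5) / m) for i in range(m)]
    pairs = zip(sorted(sample), normal_q)
    return (sum(abs(v - z) ** r for v, z in pairs) / m) ** (1.0 / r)

def normalized_sum(n, rng):
    """One draw of V_[0,n) for the assumed process X_j = e_j + e_{j+1}."""
    eps = [rng.random() for _ in range(n + 1)]
    s_n = sum(eps[j] + eps[j + 1] for j in range(n))
    mean_x = 1.0                  # E(X_1) = 2 * 1/2
    sigma = (1.0 / 3.0) ** 0.5    # sigma^2 from (4) for this process
    return (s_n - n * mean_x) / (n ** 0.5 * sigma)

def simulate(n, replicates=1000, seed=0):
    """Monte Carlo estimate of d_2(F_[0,n), Phi)."""
    rng = random.Random(seed)
    draws = [normalized_sum(n, rng) for _ in range(replicates)]
    return empirical_mallows_to_normal(draws, r=2.0)
```

Calling `simulate(n)` for increasing `n` should give small values, up to Monte Carlo and discretization error that does not vanish for a fixed number of replicates.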
The basic idea is to subdivide the partial sum $S_{[k,k+n)} = \sum_{j=k}^{k+n-1} X_j$ into blocks
\[ S_{[k,k+l_n)}, \ S_{[k+l_n,k+2l_n)}, \ \ldots, \ S_{[k+(m_n-1)l_n,\,k+m_n l_n)}, \ S_{[k+m_n l_n,\,k+n)} \tag{11} \]
where the first $m_n = [n/l_n]$ (the largest integer contained in $n/l_n$) blocks have size $l_n$. Note that the last sum in (11), $S_{[k+m_n l_n,\,k+n)}$, has $n - m_n l_n$ terms, which is non-trivial in case $l_n$ is not a divisor of $n$. As will be shown, by suitably choosing $l_n$ (see (18)) and by assuming boundedness conditions on the Cox-Grimmett coefficient (8), the blocks can be made asymptotically independent. The following arguments suggest the required conditions. Let
\[ \sigma^2_{[k,k+n)} = \mathrm{var}\{ S_{[k,k+n)} \} \quad \text{and} \quad s^2_{m_n} = \sum_{j=1}^{m_n} \mathrm{var}\{ S_{[k+(j-1)l_n,\,k+j l_n)} \}. \]
From positive associativity we have $\mathrm{cov}(X_r, X_s) \geq 0$ and
\[
\begin{aligned}
\mathrm{cov}\Big( S_{[k,k+l_n)}, \sum_{1 < s \leq m_n} S_{[k+(s-1)l_n,\,k+s l_n)} \Big)
&= \sum_{k \leq r < k+l_n} \ \sum_{k+l_n \leq s < k+m_n l_n} \mathrm{cov}(X_r, X_s) \\
&\leq \sum_{k \leq r < k+l_n} \ \sum_{|s-r| \geq k+l_n-r} \mathrm{cov}(X_r, X_s) \\
&= \sum_{k \leq r < k+l_n} u_r(k+l_n-r).
\end{aligned}
\]
The non-stationarity can be bypassed if $u_r(\cdot)$ can be bounded by a stationary sequence $v : \mathbb{Z}^+ \to \mathbb{R}$ such that $u_r(\cdot) \leq v(\cdot)$. In this case, the same arguments show that for $r < s$
\[ \mathrm{cov}\Big( S_{[k+(r-1)l_n,\,k+r l_n)}, \sum_{r < s \leq m_n} S_{[k+(s-1)l_n,\,k+s l_n)} \Big) \leq \sum_{j=1}^{l_n} v(j) \]
and
\[ \sum_{1 \leq r \neq s \leq m_n} \mathrm{cov}\big( S_{[k+(r-1)l_n,\,k+r l_n)},\, S_{[k+(s-1)l_n,\,k+s l_n)} \big) \leq 2 m_n \sum_{j=1}^{l_n} v(j). \tag{12} \]
It follows that
\[ s^2_{m_n} \leq \sigma^2_{[k,k+m_n l_n)} \leq s^2_{m_n} + 2 m_n \sum_{j=1}^{l_n} v(j). \tag{13} \]
Note that $\sigma^2_{[k,k+m_n l_n)} = \mathrm{var}\big\{ \sum_{j=1}^{m_n} S_{[k+(j-1)l_n,\,k+j l_n)} \big\}$. Thus, we get "near independence", $\sigma^2_{[k,k+m_n l_n)} \approx s^2_{m_n}$, provided the last term can be properly controlled. This leads to:

Hypothesis 1.
Let $\mathbf{X}$ be a positively associated process satisfying:

(a) there exists a constant $c > 0$ such that $\mathrm{var}\{X_j\} > c$;

(b) there exists a function $v : \mathbb{Z}^+ \to \mathbb{R}$ such that
\[ \sum_{n \geq 0} v(n) < \infty \quad \text{and} \quad u_j(n) \leq v(n), \quad \forall j \in \mathbb{Z}, \ \forall n \geq 0. \tag{14} \]

Remark 1. For the stationary case condition (a) is a necessary assumption, or else all the variables would be constants. A weaker condition
\[ \lim_{n \to \infty} \frac{1}{n} \sum_{j=k}^{k+n} \mathrm{var}\{X_j\} > c, \quad \forall k \in \mathbb{Z} \]
could have been assumed. Also, condition (b) could have been replaced by: $\sum_{n \geq 0} u_j(n) < \infty$ uniformly in $j \in \mathbb{Z}$.

Lemma 3. Assume that Hypothesis 1 holds. Then for $k_1 < k_2$,
\[ (k_2 - k_1)\, c \leq \sigma^2_{[k_1,k_2)} \leq (k_2 - k_1)\, v(0) \quad \text{and} \quad m_n l_n c \leq s^2_{m_n} \leq m_n l_n v(0). \tag{15} \]
Moreover, if $l_n \to \infty$ and $n/l_n \to \infty$ then
\[ \frac{\sigma^2_{[k,k+n)}}{\sigma^2_{[k,k+m_n l_n)}} \to 1 \quad \text{and} \quad \frac{\sigma^2_{[k,k+m_n l_n)}}{s^2_{m_n}} \to 1. \tag{16} \]

Proof. (i) Note that, from the positivity of the covariances and since the inner sum below equals $u_r(0) \leq v(0)$, we have
\[ \sigma^2_{[k_1,k_2)} = \sum_{k_1 \leq r, s < k_2} \mathrm{cov}(X_r, X_s) \leq \sum_{r=k_1}^{k_2-1} \ \sum_{s \in \mathbb{Z} : |r-s| \geq 0} \mathrm{cov}(X_r, X_s) \leq (k_2 - k_1)\, v(0). \]
On the other hand,
\[ \sigma^2_{[k_1,k_2)} = \sum_{r=k_1}^{k_2-1} \mathrm{var}\{X_r\} + \sum_{k_1 \leq r \neq s < k_2} \mathrm{cov}(X_r, X_s) \geq (k_2 - k_1)\, c. \]
It follows that for $j = 1, \cdots, m_n$ we have
\[ l_n c \leq \sigma^2_{[k+(j-1)l_n,\,k+j l_n)} \leq l_n v(0) \]
and
\[ m_n l_n c \leq s^2_{m_n} = \sum_{j=1}^{m_n} \sigma^2_{[k+(j-1)l_n,\,k+j l_n)} \leq m_n l_n v(0). \]

(ii) The positive association also ensures that
\[ s^2_{m_n} \leq \sigma^2_{[k,k+m_n l_n)} \leq \sigma^2_{[k,k+n)} \quad \text{and} \quad \sigma^2_{[k,k+n)} \leq \sigma^2_{[k,k+m_n l_n)} + \sigma^2_{[k+m_n l_n,\,k+n)}. \]
From (15) we have
\[ \frac{\sigma^2_{[k+m_n l_n,\,k+n)}}{\sigma^2_{[k,k+m_n l_n)}} \leq \frac{(n - m_n l_n)\, v(0)}{m_n l_n c}. \]
Since $\dfrac{n}{m_n l_n} \to 1$ we get
\[ 1 \leq \frac{\sigma^2_{[k,k+n)}}{\sigma^2_{[k,k+m_n l_n)}} \leq 1 + \frac{(n - m_n l_n)\, v(0)}{m_n l_n c} \to 1. \]
Similarly, from (13) and (14) we have
\[ 1 \leq \frac{\sigma^2_{[k,k+m_n l_n)}}{s^2_{m_n}} \leq 1 + 2\, \frac{m_n \sum_{j=1}^{l_n} v(j)}{m_n l_n c} \to 1. \]

To handle the weak convergence in the non-stationary setup we will make use of the Berry-Esseen inequality (cf.
Feller, vol II, [10]): if $\xi_1, \xi_2, \ldots$ are zero-mean and independent r.v.'s such that $E\{|\xi_j|^3\} < +\infty$ for $j = 1, 2, \ldots$, then
\[ \sup_x \Big| P\Big( \frac{\sum_{j=1}^{n} \xi_j}{\sqrt{\mathrm{var}\{ \sum_{j=1}^{n} \xi_j \}}} \leq x \Big) - \Phi(x) \Big| \leq 6\, \frac{\sum_{j=1}^{n} E(|\xi_j|^3)}{\big( \sum_{j=1}^{n} \mathrm{var}(\xi_j) \big)^{3/2}}. \tag{17} \]
This will require a more restrictive choice of the block size $l_n$,
\[ l_n \to \infty, \quad \frac{n}{l_n} \to \infty \quad \text{and} \quad \frac{l_n^3}{m_n} \to 0. \tag{18} \]
Just take, for example, $l_n = n^\delta$ with $0 < \delta < 1/4$.

Theorem 6. Assume $\mathbf{X}$ satisfies Hypothesis 1 and that for some constant $C_*$ we have $E\{|X_j|^3\} < C_* < +\infty$ for all $j \in \mathbb{Z}$. Then for $0 < r \leq 2$ we have
\[ d_r\big( F_{[k,k+n)}, \Phi \big) \to 0, \quad F_{[k,k+n)} \stackrel{d}{=} \frac{S_{[k,k+n)} - E\big( S_{[k,k+n)} \big)}{\sigma_{[k,k+n)}} \quad \text{and} \quad \Phi \stackrel{d}{=} N(0,1). \tag{19} \]

Proof. (i) Without loss of generality we may assume $E\{X_j\} = 0$ for all $j$. If not, let $X'_j = X_j - E\{X_j\}$; then the process $\{X'_j : j \in \mathbb{Z}\}$ satisfies the same hypotheses. Consider the blocks (11) and assume that the block size $l_n$ satisfies (18). We will show that
\[ d_2\big( F_{m_n}, \Phi \big) \to 0 \quad \text{with} \quad F_{m_n} \stackrel{d}{=} \frac{S_{[k,k+m_n l_n)}}{s_{m_n}}. \tag{20} \]
Assuming (20) holds then, by Theorem 2, there exists $Z^* \stackrel{d}{=} \Phi$ such that
\[ d_2^2\big( F_{m_n}, \Phi \big) = E\Big\{ \Big( \frac{S_{[k,k+m_n l_n)}}{s_{m_n}} - Z^* \Big)^2 \Big\}. \]
From the definition of Mallows distance (3) we have
\[ d_2^2\big( F_{[k,k+n)}, \Phi \big) \leq E\Big\{ \Big( \frac{S_{[k,k+n)}}{\sigma_{[k,k+n)}} - Z^* \Big)^2 \Big\}. \]
Using Minkowski's inequality we have $d_2\big( F_{[k,k+n)}, \Phi \big) \to 0$ provided
\[ A_n = E\Big\{ \Big( \frac{S_{[k,k+m_n l_n)}}{\sigma_{[k,k+m_n l_n)}} - \frac{S_{[k,k+m_n l_n)}}{s_{m_n}} \Big)^2 \Big\} \to 0 \tag{21} \]
and
\[ B_n = E\Big\{ \Big( \frac{S_{[k,k+n)}}{\sigma_{[k,k+n)}} - \frac{S_{[k,k+m_n l_n)}}{\sigma_{[k,k+m_n l_n)}} \Big)^2 \Big\} \to 0. \tag{22} \]
As in the proof of Theorem 4, Liapounov's inequality completes the proof for $0 < r < 2$.

(ii) To show (21) note that $E\Big\{ \Big( \dfrac{S_{[k,k+m_n l_n)}}{\sigma_{[k,k+m_n l_n)}} \Big)^2 \Big\} = 1$.
By (16) we have
\[ A_n = E\Big\{ \Big( \frac{S_{[k,k+m_n l_n)}}{\sigma_{[k,k+m_n l_n)}} \Big)^2 \Big( 1 - \frac{\sigma_{[k,k+m_n l_n)}}{s_{m_n}} \Big)^2 \Big\} = \Big( 1 - \frac{\sigma_{[k,k+m_n l_n)}}{s_{m_n}} \Big)^2 \to 0. \]
For (22) write $S_{[k,k+m_n l_n)} = S_{[k,k+n)} - S_{[k+m_n l_n,\,k+n)}$. The same arguments as above show that
\[ E\Big\{ \Big( \frac{S_{[k,k+n)}}{\sigma_{[k,k+n)}} - \frac{S_{[k,k+n)}}{\sigma_{[k,k+m_n l_n)}} \Big)^2 \Big\} \to 0. \]
Since $\dfrac{n}{m_n l_n} \to 1$ we have by (15)
\[ E\Big\{ \Big( \frac{S_{[k+m_n l_n,\,k+n)}}{\sigma_{[k,k+m_n l_n)}} \Big)^2 \Big\} \leq \frac{(n - m_n l_n)\, v(0)}{m_n l_n c} \to 0. \]
Hence $B_n \to 0$.

(iii) Since $E\{|X_j|^3\} < C_* < +\infty$ the results from Lemma 1 can be applied. Taking $r_1 = \cdots = r_{m_n} = \dfrac{t}{s_{m_n}}$ we get
\[ \Big| E\Big( \exp\Big\{ i \frac{t}{s_{m_n}} \sum_{j=1}^{m_n} S_{[k+(j-1)l_n,\,k+j l_n)} \Big\} \Big) - \prod_{j=1}^{m_n} E\Big( \exp\Big\{ i \frac{t}{s_{m_n}} S_{[k+(j-1)l_n,\,k+j l_n)} \Big\} \Big) \Big| \leq A(t,k,m_n) \]
where
\[
\begin{aligned}
A(t,k,m_n) &= \frac{t^2}{2 s^2_{m_n}} \sum_{1 \leq r \neq s \leq m_n} \mathrm{cov}\big( S_{[k+(r-1)l_n,\,k+r l_n)},\, S_{[k+(s-1)l_n,\,k+s l_n)} \big) \\
&= \frac{t^2}{s^2_{m_n}} \sum_{r=1}^{m_n} \mathrm{cov}\Big( S_{[k+(r-1)l_n,\,k+r l_n)}, \sum_{r < s \leq m_n} S_{[k+(s-1)l_n,\,k+s l_n)} \Big) \\
&\leq \frac{t^2}{s^2_{m_n}}\, m_n \sum_{j=1}^{l_n} v(j) \leq \frac{t^2 m_n}{m_n l_n c} \sum_{j=1}^{l_n} v(j) \to 0,
\end{aligned}
\]
the last limit holding because $\sum_{j \geq 0} v(j) < \infty$ by (14) and $l_n \to \infty$.
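To make the block construction (11) and the growth conditions (18) used in this proof concrete, here is a minimal Python sketch. It assumes the integer block size $l_n = \lfloor n^\delta \rfloor$ (the text writes $l_n = n^\delta$, so the rounding is our choice), and the helper names `block_layout` and `growth_ratios` are hypothetical.

```python
def block_layout(k, n, delta=0.2):
    """Split [k, k+n) into m_n blocks of size l_n plus a remainder, as in (11).

    Assumption: l_n = floor(n**delta) with 0 < delta < 1/4.
    """
    if n < 1 or not 0.0 < delta < 0.25:
        raise ValueError("need n >= 1 and 0 < delta < 1/4")
    l_n = max(1, int(n ** delta))   # integer block size
    m_n = n // l_n                  # m_n = [n / l_n]
    # Half-open index intervals of the m_n full blocks.
    full = [(k + j * l_n, k + (j + 1) * l_n) for j in range(m_n)]
    # Last block of (11); it is empty when l_n divides n.
    remainder = (k + m_n * l_n, k + n)
    return l_n, m_n, full, remainder

def growth_ratios(n, delta=0.2):
    """Quantities appearing in (18) and in the proof of Lemma 3."""
    l_n, m_n, _, _ = block_layout(0, n, delta)
    return {
        "n_over_l": n / l_n,               # should grow without bound
        "l_cubed_over_m": l_n ** 3 / m_n,  # should tend to 0
        "n_over_ml": n / (m_n * l_n),      # should tend to 1
    }
```

With `n = 10**6` and `delta = 0.2` this gives `l_n = 15` and `m_n = 66666`, so the remainder block has 10 terms. Evaluating `growth_ratios` along a sequence of `n` gives a quick sanity check of a candidate `delta` against (18).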