ON RATES OF CONVERGENCE IN THE CURIE-WEISS-POTTS MODEL WITH AN EXTERNAL FIELD

Peter Eichelsbacher¹ and Bastian Martschink²

Abstract: Stein's method provides a way of bounding the distance of a probability distribution to a target distribution. In the present paper we obtain rates of convergence for limit theorems via Stein's method of exchangeable pairs in the context of the Curie-Weiss-Potts model. Our results include Kolmogorov bounds for multivariate normal approximation in the whole domain $\beta \ge 0$ and $h \ge 0$, as well as rates for non-Gaussian approximations at the extremity of the critical line of the model.

AMS 2000 Subject Classification: Primary 60F05; Secondary 82B20, 82B26.

Key words: Stein's method, exchangeable pairs, Curie-Weiss-Potts models, critical temperature.

arXiv:1011.0319v2 [math.PR] 19 Sep 2011

¹ Ruhr-Universität Bochum, Fakultät für Mathematik, NA 3/66, D-44780 Bochum, Germany, [email protected]
² Ruhr-Universität Bochum, Fakultät für Mathematik, NA 3/68, D-44780 Bochum, Germany, [email protected]
Both authors have been supported by Deutsche Forschungsgemeinschaft via SFB/TR 12.

1. Introduction

1.1 The Curie-Weiss-Potts model

The Curie-Weiss-Potts model is a mean-field approximation of the well-known Potts model, a famous model in equilibrium statistical mechanics. It is defined in terms of a mean interaction averaged over all sites in the model, more precisely, by sequences of probability measures of $n$ spin random variables that may occupy one of $q$ different states. For $q = 2$ the model reduces to the simpler Curie-Weiss model. Two ways in which the Curie-Weiss-Potts model approximates the Potts model are discussed in [18] and [17]. Probability limit theorems for the Curie-Weiss-Potts model were first proved in [13].
One reason for interest in this model is that it explicitly exhibits a number of properties of real substances, such as multiple phase transitions and metastable states. Compared with the Curie-Weiss model it has a more intricate phase transition structure: for example, at the critical inverse temperature it has a first-order phase transition rather than a second-order one as in the Curie-Weiss model. In order to carry out the analysis of the model, detailed information about the structure of the set of canonical equilibrium macro-states is required. The probability of observing a configuration $\sigma \in \{1,\dots,q\}^n$ in an exterior field $h$ equals
\[
P_{\beta,h,n}(\sigma) = \frac{1}{Z_{\beta,h,n}} \exp\Big( \frac{\beta}{2n} \sum_{1 \le i \le j \le n} \delta_{\sigma_i,\sigma_j} + h \sum_{i=1}^{n} \delta_{\sigma_i,1} \Big) \tag{1.1}
\]
where $\delta$ is the Kronecker symbol, $\beta := T^{-1}$ is the inverse temperature and $Z_{\beta,h,n}$ is the normalization constant known as the partition function. More precisely:
\[
Z_{\beta,h,n} = \sum_{\sigma \in \{1,\dots,q\}^n} \exp\Big( \frac{\beta}{2n} \sum_{1 \le i \le j \le n} \delta_{\sigma_i,\sigma_j} + h \sum_{i=1}^{n} \delta_{\sigma_i,1} \Big).
\]
For $\beta$ small, the spin random variables are weakly dependent while for $\beta$ large they are strongly dependent. It was shown in [27] that at $h = 0$ the model undergoes a phase transition at the critical inverse temperature
\[
\beta_c = \begin{cases} q, & \text{if } q \le 2, \\ 2\,\dfrac{q-1}{q-2}\,\log(q-1), & \text{if } q > 2; \end{cases} \tag{1.2}
\]
and that this transition is first order if $q > 2$. Our interest is in the limit distribution of the empirical vector of the spin variables
\[
N = (N_1,\dots,N_q) = \Big( \sum_{i=1}^{n} \delta_{\sigma_i,1}, \dots, \sum_{i=1}^{n} \delta_{\sigma_i,q} \Big) \tag{1.3}
\]
which counts the number of spins of each color for a configuration $\sigma$. Note that the normalized empirical vector $L_n := N/n$ belongs to the set of probability vectors
\[
\mathcal{H} = \{ x \in \mathbb{R}^q : x_1 + \dots + x_q = 1 \text{ and } x_i \ge 0 \ \forall i \}.
\]
For $q > 2$ and $\beta < \beta_c$, $L_n$ satisfies the law of large numbers $P_{\beta,0,n}(L_n \in d\nu) \Rightarrow \delta_{\nu_0}(d\nu)$ as $n \to \infty$, where $\nu_0 = (1/q,\dots,1/q) \in \mathbb{R}^q$. For $\beta > \beta_c$ the law of large numbers breaks down and is replaced by the limit
\[
P_{\beta,0,n}(L_n \in d\nu) \Rightarrow \frac{1}{q} \sum_{i=1}^{q} \delta_{\nu_i(\beta)}(d\nu),
\]
where $\{\nu_i(\beta),\ i = 1,\dots,q\}$ are $q$ distinct probability vectors in $\mathbb{R}^q$, distinct from $\nu_0$. The first-order phase transition is the fact that for $i = 1,\dots,q$
\[
\lim_{\beta \to \beta_c^+} \nu_i(\beta) \ne \nu_0,
\]
see [13]. The case of non-zero external field $h \ne 0$ was considered in [2] and it turned out that the first-order transition remains on a critical line. The line was computed explicitly in [3], see (1.4).

In the present work we will obtain certain known probabilistic limit theorems for the Curie-Weiss-Potts model, especially for the empirical vector of the spin variables $N$, but at the same time we present rates of convergence for all the limit theorems. We consider the fluctuations of the empirical vector $N$ around its typical value outside the critical line and we describe the fluctuations and rates of convergence at an extremity of the critical line. This extends previous results on the Curie-Weiss-Potts model with no external field [13] as well as with external field [15]. The method of proof will be an application of Stein's method of so-called exchangeable pairs in the case of multivariate normal approximation as well as the application of Stein's method in the case of non-Gaussian approximation. Stein's method will be explained later.

We turn to the description of the set of canonical equilibrium macro-states of the Curie-Weiss-Potts model. These states are solutions of an unconstrained minimization problem involving probability vectors on $\mathbb{R}^q$. The macro-states describe equilibrium configurations of the model in the thermodynamic limit $n \to \infty$. For each $i \in \{1,\dots,q\}$ the $i$'th component of an equilibrium macro-state gives the asymptotic relative frequency of spins taking the spin value $i$. We appeal to the theory of large deviations to define the canonical equilibrium macro-states.
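As a concrete illustration of (1.2) and (1.3), the following minimal sketch evaluates the critical inverse temperature and the empirical vector of a configuration. The helper names `beta_c`, `empirical_vector` and `normalized_empirical_vector` are hypothetical and introduced only for this example; spins are assumed to be encoded as the integers 1,...,q.

```python
import math


def beta_c(q):
    """Critical inverse temperature at h = 0, following formula (1.2)."""
    if q < 1:
        raise ValueError("q must be a positive integer")
    if q <= 2:
        return float(q)
    return 2.0 * (q - 1) / (q - 2) * math.log(q - 1)


def empirical_vector(sigma, q):
    """N = (N_1, ..., N_q): number of spins of each colour, as in (1.3)."""
    counts = [0] * q
    for spin in sigma:
        counts[spin - 1] += 1  # spins are assumed to take values 1..q
    return counts


def normalized_empirical_vector(sigma, q):
    """L_n = N / n, a probability vector with q entries."""
    n = len(sigma)
    return [c / n for c in empirical_vector(sigma, q)]
```

For instance, `empirical_vector([1, 3, 3, 2, 1, 1], 3)` counts three spins of colour 1, one of colour 2 and two of colour 3, and the entries of the normalized vector sum to one.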
Sanov's theorem states that with respect to the product measures $P_n(\omega) = 1/q^n$ for $\omega \in \{1,\dots,q\}^n$ the empirical vectors $L_n$ satisfy a large deviations principle (LDP) on $\mathcal{H}$ with speed $n$ and rate function given by the relative entropy
\[
I(x) = \sum_{i=1}^{q} x_i \log(q x_i), \quad x \in \mathcal{H}.
\]
We use the formal notation $P_n(L_n \in dx) \approx \exp(-n I(x))$ (precise definition: [9]). The LDP for $L_n$ with respect to $P_{\beta,h,n}$ can be proved as in [11]. Let
\[
f_{\beta,h}(x) = \sum_{i=1}^{q} x_i \log(q x_i) - \frac{\beta}{2} \sum_{i=1}^{q} x_i^2 - h x_1, \quad x \in \mathcal{H}.
\]
Then $P_{\beta,h,n}(L_n \in dx) \approx \exp(-n J_{\beta,h}(x))$ with
\[
J_{\beta,h}(x) := f_{\beta,h}(x) - \inf_{x \in \mathcal{H}} f_{\beta,h}(x),
\]
see also [8]. Now if $J_{\beta,h}(\nu) > 0$, then $\nu$ has an exponentially small probability of being observed. Hence the corresponding set of canonical equilibrium macro-states is naturally defined by
\[
\mathcal{E}_{\beta,h} := \{ \nu \in \mathcal{H} : \nu \text{ minimizes } f_{\beta,h}(\nu) \}.
\]
Remark that the specific Gibbs free energy for the Curie-Weiss-Potts model is the quantity $\psi(\beta,h)$ defined by the limit
\[
-\beta\psi(\beta,h) = \lim_{n \to \infty} \frac{1}{n} \log Z_{\beta,h,n}.
\]
From the large deviations result it follows that
\[
-\beta\psi(\beta,h) = -\inf_{x \in \mathcal{H}} f_{\beta,h}(x).
\]
In the case $h = 0$ and $q > 2$, it has been known since [27] (for detailed proofs see [8, Theorem 3.1]) that $\mathcal{E}_{\beta,0}$ consists of one element for any $0 < \beta < \beta_c$, where $\beta_c$ is the critical inverse temperature given in (1.2). For any $\beta > \beta_c$, the set consists of $q$ elements and at $\beta_c$ it consists of $q+1$ elements. In the case with an external field $h \ge 0$ the global minimizers of $f_{\beta,h}$ can be described as follows. In [3] the following critical line was computed:
\[
h_T := \Big\{ (\beta,h) : 0 \le h < h_0 \text{ and } h = \log(q-1) - \beta\,\frac{q-2}{2(q-1)} \Big\} \tag{1.4}
\]
with extremities $(\beta_c, 0)$ and $(\beta_0, h_0)$, where
\[
\beta_0 = 4\,\frac{q-1}{q} \quad \text{and} \quad h_0 = \log(q-1) - 2\,\frac{q-2}{q}
\]
($(\beta_0,h_0)$ were already determined in [2]). Now consider the parametrization
\[
x_z := \Big( \frac{1+z}{2}, \frac{1-z}{2(q-1)}, \dots, \frac{1-z}{2(q-1)} \Big), \quad z \in [-1,1].
\]
Depending on the parameters $(\beta,h)$ the function $f_{\beta,h}$ has one or several global minimizers. The following statement summarizes the results of [27], [8] in the case $h = 0$ and of [3] for $h > 0$.

Theorem 1.1. Let $\beta, h \ge 0$.
(1) If $h > 0$ and $(\beta,h) \notin h_T$, the function $f_{\beta,h}$ has a unique global minimum point in $\mathcal{H}$. This minimizer is analytic in $\beta$ and $h$ outside of $h_T \cup \{(\beta_0,h_0)\}$.
(2) If $h > 0$ and $(\beta,h) \in h_T$, the function $f_{\beta,h}$ has two global minimum points in $\mathcal{H}$. More precisely, for any $z \in (0,(q-2)/q)$, the two global minimum points of $f_{\beta_z,h_z}$ at
\[
\beta_z = 2\,\frac{q-1}{zq}\,\log\Big(\frac{1+z}{1-z}\Big) \quad \text{and} \quad h_z = \log(q-1) - \frac{q-2}{2(q-1)}\,\beta_z
\]
are the points $x_{\pm z}$.
(3) If $h = 0$ and $\beta < \beta_c$, the unique global minimum point of $f_{\beta,0}$ is $(1/q,\dots,1/q)$.
(4) If $h = 0$ and $\beta > \beta_c$, there are $q$ global minimum points of $f_{\beta,0}$, which all equal $x_z$ up to a permutation of the coordinates for some $z \in ((q-2)/q, 1)$.
(5) If $h = 0$ and $\beta = \beta_c$, there are $q+1$ global minimum points of $f_{\beta,0}$: the symmetric one $(1/q,\dots,1/q)$ together with the permutations of
\[
\Big( \frac{q-1}{q}, \frac{1}{q(q-1)}, \dots, \frac{1}{q(q-1)} \Big).
\]

Interestingly enough, the very first results on probabilistic limit theorems ([12] for the Curie-Weiss model and [13] for the Curie-Weiss-Potts model) used the structure of the global minimum points of another function $G_{\beta,h}$. For the Curie-Weiss-Potts model with $h = 0$ this function is given by
\[
G_{\beta,0}(u) := \frac{1}{2}\,\beta \langle u,u \rangle - \log \sum_{i=1}^{q} e^{\beta u_i}, \quad u \in \mathbb{R}^q.
\]
With convex duality one obtains the alternative representation of the specific Gibbs free energy:
\[
\beta\psi(\beta,0) = \min_{u \in \mathbb{R}^q} G_{\beta,0}(u) + \log q.
\]
Actually $f_{\beta,0}$ and $G_{\beta,0}$ have the same global minimum points, see [18] for $\beta \ne \beta_c$. A proof of this result for any $\beta > 0$ can be found in [8, Theorem 3.1]. The main reason to use $G_{\beta,0}$
The main reason to use G β,0 instead of f is the usefulness of a representation of the distribution of L in terms of G , β,0 n β,h called Hubbard-Stratonovich transform (see [13, Lemma 3.2] and the proof of Lemma 4.5 in this paper). This is a famous tool since the work of Ellis and Newman [12]. For β > 0 and h real the global minimum points of f coincide with the global minimum points of the function β,h q 1 G (u) := β u,u log exp(βu +hδ ) , u Rq (1.5) β,h i i,1 2 h i− ∈ i=1 (cid:0)X (cid:1) (for a proof see [14, Theorem B.1]; or apply a general result on minimum points of certain functions related by convex duality, [8, Theorem A.1], see also [26]). Hence we know that all statements of Theorem 1.1 hold true for G . β,h Corollary 1.2. The statements in Theorem 1.1 for the global minimum points of f hold true β,h one to one for G , defined in (1.5). β,h The detour first describing the canonical equilibrium macro-states of the Curie-Weiss-Potts model using large deviation theory and second using convex duality has the following reason. Applying Stein’s method we will automatically meet the function G and the limit theorems β,h and the proof of certain rates of convergence will depend on the location of the global minimum points of G (as in [12], [13] and [14] and a lot of other papers). But for h > 0 only f and β,h β,h its minimizers were completely characterized in the literature, see Theorem 1.1. So we had to argue that we also know the phase diagram of G . β,h 6 PETER EICHELSBACHER ANDBASTIAN MARTSCHINK 1.2 Stein’s method of exchangeable pairs Starting with a bound for the distance between univariate random variables and the normal distribution Stein’s method was first published in [23] (1972). Being particularly powerful in the presence of both local dependence and weak global dependence his method proved to be very successful. In [24] Stein introduced his ex- changeable pair approach. 
At the heart of the method is a coupling of a random variable $W$ with another random variable $W'$ such that $(W,W')$ is exchangeable, i.e. their joint distribution is symmetric. Central in his approach is the fact that for all antisymmetric measurable functions $F(x,y)$ we have $E[F(W,W')] = 0$ if the expectation exists. Stein further proved that a measure of proximity of $W$ to normality may be provided by the exchangeable pair if $W' - W$ is sufficiently small. He assumed the property that there is a number $\lambda > 0$ such that the conditional expectation of $W' - W$ given $W$ satisfies
\[
E[W' - W \mid W] = -\lambda W.
\]
Heuristically, this condition can be understood as a linear regression condition: if $(W,W')$ were bivariate normal with correlation $\varrho$, then $E(W' \mid W) = \varrho W$ and the condition would be satisfied with $\lambda = 1 - \varrho$. Stein proved that for any uniformly Lipschitz function $h$, $|Eh(W) - Eh(Z)| \le \delta \|h'\|$ with $Z$ denoting a standard normally distributed random variable and
\[
\delta = 4E\Big| 1 - \frac{1}{2\lambda} E\big( (W' - W)^2 \mid W \big) \Big| + \frac{1}{2\lambda} E|W - W'|^3. \tag{1.6}
\]
Stein's approach has been successfully applied in many models, see e.g. [24] or [25] and references therein. In [21], the range of application was extended by replacing the linear regression property by a weaker condition assuming that there is also a random variable $R = R(W)$ such that
\[
E[W' - W \mid W] = -\lambda W + R.
\]
While the approach has also proved successful in non-normal contexts (see [4], [6] and [10]), it remained restricted to the one-dimensional setting for a long time. The problem was that it was not obvious how to transfer the linearity condition to the multivariate case. However, in [5] this issue was finally addressed.
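The linear regression condition can be verified exactly in a standard textbook example that is not taken from this paper: for independent symmetric signs $X_1,\dots,X_n$ and $W = n^{-1/2}\sum_i X_i$, replacing a uniformly chosen coordinate by an independent copy yields an exchangeable pair with $\lambda = 1/n$. The sketch below, with the hypothetical helper `regression_residual`, enumerates all sign configurations for a small $n$ and measures how far $E[W' - W \mid X]$ is from $-W/n$.

```python
import itertools
import math


def regression_residual(n):
    """Largest |E[W' - W | X] + W/n| over all sign configurations X in {-1, +1}^n."""
    worst = 0.0
    for x in itertools.product((-1, 1), repeat=n):
        w = sum(x) / math.sqrt(n)
        # Average over the resampled index I (uniform on the n sites)
        # and over the fresh sign (uniform on {-1, +1}).
        cond_exp = 0.0
        for i in range(n):
            for new_sign in (-1, 1):
                cond_exp += (new_sign - x[i]) / math.sqrt(n) / (2 * n)
        # The conditional expectation given X depends on X only through W,
        # so it coincides with the conditional expectation given W.
        worst = max(worst, abs(cond_exp + w / n))
    return worst
```

Up to floating-point rounding the residual vanishes, which is the statement $E[W' - W \mid W] = -\lambda W$ with $\lambda = 1/n$ in this toy setting.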
They extended the linearity condition to the multivariate setting such that, for all $i \in \{1,\dots,d\}$, $E[W_i' - W_i \mid W] = -\lambda W_i$ for a fixed number $\lambda$, where now $W = (W_1,\dots,W_d)$ and $W' = (W_1',\dots,W_d')$ are identically distributed $d$-vectors with uncorrelated components. As in the univariate case an extension to the additional remainder term $R$ would be straightforward. The purpose of this coupling is to estimate the distance to the standard multivariate normal distribution. Applying the linear regression heuristic in the multivariate case leads to a new condition due to [19]:
\[
E[W' - W \mid W] = -\Lambda W + R \tag{1.7}
\]
for an invertible $d \times d$ matrix $\Lambda$ and a remainder term $R = R(W)$. This linearity condition is more natural than the one of [5]. Different exchangeable pairs, obviously, will yield different $\Lambda$ and $R$.

Interestingly enough, the Curie-Weiss-Potts model will be an example to demonstrate the power of the approach in [19]. Constructing an exchangeable pair in the Curie-Weiss-Potts model to obtain an approximate linear regression property (1.7) leads us to the function $G_{\beta,h}$. This will be sketched now. Let $q > 2$, $h = 0$ and $\beta < \beta_c$, and let $x_0$ denote the unique global minimum point of $G_{\beta,0}$, see Theorem 1.1. We consider
\[
W := \sqrt{n}\Big( \frac{N}{n} - x_0 \Big) = \sqrt{n}\,(L_n - x_0).
\]
We produce a spin collection $\sigma' = (\sigma_i')_{i \ge 1}$ via a Gibbs sampling procedure: Let $I$ be uniformly distributed over $\{1,\dots,n\}$ and independent from all other random variables involved. We now replace the spin $\sigma_i$, for $i = I$, by $\sigma_i'$ drawn from the conditional distribution of the $i$'th coordinate given $(\sigma_j)_{j \ne i}$, independently of $\sigma_i$. We define
\[
Y_i := (Y_{i,1},\dots,Y_{i,q})^t := (\delta_{\sigma_i,1},\dots,\delta_{\sigma_i,q})^t
\]
and consider
\[
W' := W - \frac{Y_I}{\sqrt{n}} + \frac{Y_I'}{\sqrt{n}}. \tag{1.8}
\]
Hence it is not hard to see that $(W,W')$ is an exchangeable pair. This construction will also be used in all the proofs in this paper. Let $\mathcal{F} := \sigma(\sigma_1,\dots,\sigma_n)$.
We obtain
\begin{align*}
E[W_i' - W_i \mid \mathcal{F}] &= \frac{1}{\sqrt{n}}\, E\big[ Y_{I,i}' - Y_{I,i} \mid \mathcal{F} \big] \\
&= \frac{1}{\sqrt{n}} \frac{1}{n} \sum_{j=1}^{n} E\big[ Y_{j,i}' - Y_{j,i} \mid \sigma_1,\dots,\sigma_n \big] \\
&= -\frac{1}{\sqrt{n}} \frac{1}{n} \sum_{j=1}^{n} Y_{j,i} + \frac{1}{\sqrt{n}} \frac{1}{n} \sum_{j=1}^{n} E\big[ \delta_{\sigma_j',i} \mid \sigma_1,\dots,\sigma_n \big].
\end{align*}
Using our construction we obtain
\[
E\big[ \delta_{\sigma_j',i} \mid \sigma_1,\dots,\sigma_n \big] = E\big[ \delta_{\sigma_j,i} \mid (\sigma_t)_{t \ne j} \big] = P_{\beta,0,n}\big( \sigma_j = i \mid (\sigma_t)_{t \ne j} \big).
\]
By a straightforward calculation (see Lemma 4.3) we get that
\[
P_{\beta,0,n}\big( \sigma_j = i \mid (\sigma_t)_{t \ne j} \big) = \frac{\exp(\beta m_{i,j}(\sigma))}{\sum_{k=1}^{q} \exp(\beta m_{k,j}(\sigma))}
\]
with $m_{i,j}(\sigma) := \frac{1}{n} \sum_{l \ne j} \delta_{\sigma_l,i}$. Using the notation $m_i(\sigma) := \frac{1}{n} \sum_{l=1}^{n} \delta_{\sigma_l,i}$ one obtains
\[
\frac{1}{\sqrt{n}} \frac{1}{n} \sum_{j=1}^{n} \frac{\exp(\beta m_{i,j}(\sigma))}{\sum_{k=1}^{q} \exp(\beta m_{k,j}(\sigma))} = \frac{1}{\sqrt{n}} \frac{\exp(\beta m_i(\sigma))}{\sum_{k=1}^{q} \exp(\beta m_k(\sigma))} + R_n(i). \tag{1.9}
\]
Moreover by definition of $G_{\beta,h}$ (see Lemma 4.2) we obtain
\[
\frac{\exp(\beta u_m)}{\sum_{k=1}^{q} \exp(\beta u_k)} = u_m - \frac{1}{\beta} \frac{\partial}{\partial u_m} G_{\beta,0}(u).
\]
Summarizing we obtain
\[
E[W_i' - W_i \mid \mathcal{F}] = -\frac{1}{n} W_i - \frac{x_{0,i}}{\sqrt{n}} + R_n(i) + \frac{1}{\sqrt{n}} \Big( m_i(\sigma) - \frac{1}{\beta} \frac{\partial}{\partial u_i} G_{\beta,0}(m(\sigma)) \Big) \tag{1.10}
\]
with $m(\sigma) = (m_1(\sigma),\dots,m_q(\sigma))$. Hence, using all the information on $G_{\beta,h}$ (Taylor expansion and the results on the global minimum points, Theorem 1.1, Corollary 1.2), it seems possible to calculate $\Lambda$ and $R$ in the regression condition (1.7). Indeed we are able to prove that this condition is satisfied for any $(\beta,h)$.

In Section 2 of the present paper, the limit theorems and the rates of convergence are stated. They include a central limit theorem and a bound on the distance to a multivariate normal distribution for $L_n$ outside the critical line. When $G_{\beta,h}$ has several global minimizers, that is when $(\beta,h) \in h_T$ or $\beta \ge \beta_c$ and $h = 0$, the empirical vector $L_n$ is close to either one or the other of the minimizers. In this case we determine a central limit theorem with conditioning for $L_n$. Next we describe the fluctuations at the extremity $(\beta_0,h_0)$ of the critical line, again combined with a rate of convergence.
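The Gibbs sampling construction behind (1.8) can be sketched in a few lines. This is a minimal illustration, not the authors' code: the function names are hypothetical, spins are assumed to be encoded as the integers 1,...,q, and the field term $h\,\delta_{i,1}$ in the conditional law is an assumption for $h \ne 0$ (the text above states the conditional law only for $h = 0$).

```python
import math
import random


def conditional_law(sigma, j, beta, h, q):
    """P(sigma_j = i | other spins) for i = 1..q, proportional to exp(beta * m_{i,j})."""
    n = len(sigma)
    counts = [0] * q
    for l, spin in enumerate(sigma):
        if l != j:
            counts[spin - 1] += 1
    # m_{i,j}(sigma) = counts[i] / n; adding h for colour 1 is an assumption for h != 0.
    logits = [beta * counts[i] / n + (h if i == 0 else 0.0) for i in range(q)]
    top = max(logits)  # subtract the maximum for numerical stability
    weights = [math.exp(v - top) for v in logits]
    total = sum(weights)
    return [w / total for w in weights]


def gibbs_step(sigma, beta, h, q, rng):
    """Resample one uniformly chosen spin from its conditional law; returns sigma'."""
    j = rng.randrange(len(sigma))
    probs = conditional_law(sigma, j, beta, h, q)
    new_spin = rng.choices(range(1, q + 1), weights=probs)[0]
    sigma_prime = list(sigma)
    sigma_prime[j] = new_spin
    return sigma_prime


def rescaled_vector(sigma, x0):
    """W = sqrt(n) * (N / n - x0) for a reference point x0 with q entries."""
    n = len(sigma)
    counts = [0] * len(x0)
    for spin in sigma:
        counts[spin - 1] += 1
    return [math.sqrt(n) * (counts[i] / n - x0[i]) for i in range(len(x0))]
```

Applying `rescaled_vector` to `sigma` and to `gibbs_step(sigma, ...)` gives a pair $(W, W')$ whose difference has at most two non-zero coordinates, $\pm 1/\sqrt{n}$, matching (1.8).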
In Section 3 we state an abstract nonsingular multivariate normal approximation theorem for smooth test functions from [19]. Moreover we present a Kolmogorov bound (nonsmooth test functions) for bounded random vectors $W' - W$ under exchangeability. Finally we state an abstract non-Gaussian univariate approximation theorem for the Kolmogorov distance from [10]. Section 4 contains some auxiliary results which will be necessary for the proofs given in Section 5.

2. Statement of results

Let us fix some notation. From now on we will write random vectors in $\mathbb{R}^d$ in the form $w = (w_1,\dots,w_d)^t$, where $w_i$ are $\mathbb{R}$-valued variables for $i = 1,\dots,d$. If a matrix $\Sigma$ is symmetric and nonnegative definite, we denote by $\Sigma^{1/2}$ the unique symmetric, nonnegative definite square root of $\Sigma$. $\mathrm{Id}$ denotes the identity matrix and from now on $Z$ will denote a random vector having standard multivariate normal distribution. The expectation with respect to the measure $P_{\beta,h,n}$ will be denoted by $E := E_{P_{\beta,h,n}}$.

Let $q > 2$. We first consider the fluctuations of the empirical vector $N$ defined in (1.3) around its typical value. The case of the Curie-Weiss model ($q = 2$) was considered in [22] and [12]. A Berry-Esseen bound was proved in [10] (and independently in [6]). The Curie-Weiss-Potts model was treated in [13] and for non-zero external field in [15, Theorem 3.1]. To the best of our knowledge there are no Kolmogorov bounds known.

Theorem 2.1. Let $\beta > 0$ and $h \ge 0$ with $(\beta,h) \ne (\beta_0,h_0)$. Assume that there is a unique minimizer $x_0$ of $G_{\beta,h}$. Let $W$ be the following random variable:
\[
W := \sqrt{n}\Big( \frac{N}{n} - x_0 \Big).
\]
If $Z$ has the $q$-dimensional standard normal distribution, we have for every three times differentiable function $g$,
\[
\big| Eg(W) - Eg\big( \Sigma^{1/2} Z \big) \big| \le C \cdot n^{-1/2},
\]
for a constant $C$ and $\Sigma := E[W W^t]$. Indeed we obtain that $C = O(q^6)$.

Remark that we compare the distribution of the rescaled vector $N$ with a multivariate normal distribution with covariance matrix $E[W W^t]$. This is an advantage of Stein's method: for any fixed number of particles/spins $n$, we are able to compare the distribution of $W$ with a distribution with the same $n$-dependent covariance structure.

In order to state our next result we introduce conditions on the function classes $\mathcal{G}$ we consider. Following [20], let $\Phi$ denote the standard normal distribution in $\mathbb{R}^q$. We define for $g : \mathbb{R}^q \to \mathbb{R}$
\begin{align}
g_\delta^+(x) &= \sup\{ g(x+y) : |y| \le \delta \}, \tag{2.1} \\
g_\delta^-(x) &= \inf\{ g(x+y) : |y| \le \delta \}, \tag{2.2} \\
\tilde{g}(x,\delta) &= g_\delta^+(x) - g_\delta^-(x). \tag{2.3}
\end{align}
Let $\mathcal{G}$ be a class of real measurable functions on $\mathbb{R}^q$ such that
(1) The functions $g \in \mathcal{G}$ are uniformly bounded in absolute value by a constant, which we take to be 1 without loss of generality.
(2) For any $q \times q$ matrix $A$ and any vector $b \in \mathbb{R}^q$, $g(Ax+b) \in \mathcal{G}$.
(3) For any $\delta > 0$ and any $g \in \mathcal{G}$, $g_\delta^+(x)$ and $g_\delta^-(x)$ are in $\mathcal{G}$.
(4) For some constant $a = a(\mathcal{G},q)$, $\sup_{g \in \mathcal{G}} \big\{ \int_{\mathbb{R}^q} \tilde{g}(x,\delta)\,\Phi(dx) \big\} \le a\delta$. Obviously we may assume $a \ge 1$.

Considering the one-dimensional case, we notice that the collection of indicators of all half lines, and the collection of indicators of all intervals, form classes $\mathcal{G}$ that satisfy these conditions with $a = \sqrt{2/\pi}$ and $a = 2\sqrt{2/\pi}$ respectively. This was shown for example in [20]. In dimension $q \ge 1$ the class of indicators of convex sets is known to be such a class. Using this notation we are able to present an analogue of Theorem 2.1 for our function classes $\mathcal{G}$.

Theorem 2.2. Let $\beta > 0$ and $h \ge 0$ with $(\beta,h) \ne (\beta_0,h_0)$. Assume that there is a unique minimizer $x_0$ of $G_{\beta,h}$. Let $W$ and $Z$ be as in Theorem 2.1. Then, for all $g \in \mathcal{G}$ with $|g| \le 1$, we have
\[
\big| Eg(W) - Eg\big( \Sigma^{1/2} Z \big) \big| \le C \log(n) \cdot n^{-1/2},
\]
for a constant $C$ and $\Sigma := E[W W^t]$.
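For half-line indicators in dimension one, condition (4) can be made explicit: if $g$ is the indicator of $(-\infty, t]$, then $\tilde{g}(x,\delta)$ is the indicator of $(t-\delta, t+\delta]$, so the integral in (4) equals $\Phi(t+\delta) - \Phi(t-\delta)$, which is at most $\delta\sqrt{2/\pi}$. The following minimal sketch checks this bound numerically on a grid; the helper names are hypothetical.

```python
import math


def std_normal_cdf(x):
    """Distribution function of the one-dimensional standard normal law."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))


def smoothing_integral_half_line(t, delta):
    """Integral of g~(x, delta) against Phi for g = indicator of (-inf, t]."""
    return std_normal_cdf(t + delta) - std_normal_cdf(t - delta)


def worst_ratio(delta, grid):
    """Largest value of the integral divided by delta over thresholds t in grid."""
    return max(smoothing_integral_half_line(t, delta) / delta for t in grid)
```

The ratio is largest at $t = 0$ and approaches $\sqrt{2/\pi} \approx 0.798$ as $\delta \to 0$, which is the constant $a$ quoted for half lines.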
Letting $\mathcal{G}$ be the collection of indicators of lower quadrants, the distance above specializes to the Kolmogorov distance.

When the function $G_{\beta,h}$ has several global minimizers, the empirical vector $N/n$ is close to either one or the other of these minima. We determine the conditional fluctuations and a rate of convergence:

Theorem 2.3. Assume that $\beta, h \ge 0$ and that $G_{\beta,h}$ has multiple global minimum points $x_1,\dots,x_l$ with $l \in \{2,q,q+1\}$ and let $\epsilon > 0$ be smaller than the distance between any two global minimizers of $G_{\beta,h}$. Furthermore, let $W^{(i)} := \sqrt{n}\big( \frac{N}{n} - x_i \big)$. Then, if $Z$ has the $q$-dimensional standard normal distribution, under the conditional measure
\[
P_{\beta,h,n}\Big( \,\cdot \;\Big|\; \frac{N}{n} \in B(x_i,\epsilon) \Big),
\]
we have for every three times differentiable function $g$,
\[
\big| Eg(W^{(i)}) - Eg\big( \Sigma^{1/2} Z \big) \big| \le C \cdot n^{-1/2},
\]
for a constant $C$ and $\Sigma := E\big[ W^{(i)} (W^{(i)})^t \big]$. $B(x_i,\epsilon)$ denotes the open ball of radius $\epsilon$ around $x_i$.

We note that we get a similar result as in Theorem 2.2 for the function classes $\mathcal{G}$ in the case of several global minimizers. Finally we will take a look at the extremity $(\beta_0,h_0)$ of the critical line $h_T$. Given a vector $u \in \mathbb{R}^q$, we denote by $u^\perp$ the vector space made of all vectors orthogonal to $u$ in the Euclidean space $\mathbb{R}^q$. Consider the hyperplane
\[
\mathcal{M} := \Big\{ x \in \mathbb{R}^q : \sum_{i=1}^{q} x_i = 0 \Big\}, \tag{2.4}
\]
which is parallel to $\mathcal{H}$. The fluctuations belong to $\mathcal{M}$, since all global minimizers are in $\mathcal{H}$. The following result extends [12, Theorem 3.9], which applies to the case of the Curie-Weiss model at the critical inverse temperature. We recall that at $(\beta_0,h_0)$ the function $G_{\beta_0,h_0}$ has the unique minimizer $x = \big( 1/2, 1/(2(q-1)), \dots, 1/(2(q-1)) \big) \in \mathbb{R}^q$. Now we take $u = (1-q,1,\dots,1) \in \mathcal{M} \subset \mathbb{R}^q$ and define a real valued random variable $T$ and a random vector $V \in \mathcal{M} \cap u^\perp$ such that
\[
N = nx + n^{3/4}\, T u + n^{1/2}\, V. \tag{2.5}
\]
Since $N - nx \in \mathcal{M}$, the implicit definition of $T$ and $V$ presents a partition into a vector in (the subspace spanned by) $u$ and a vector in $u^\perp$. The main interest is the limiting behaviour of $T$. The new scaling of $W$ is given by
\[
\frac{N_j - \frac{n}{2(q-1)}}{n^{3/4}} = T + \frac{V_j}{n^{1/4}}, \quad j = 2,\dots,q,
\]
and the limit we observe is reminiscent of [12], see also [7]. The following theorem gives a Kolmogorov bound for Theorem 3.7 in [15].

Theorem 2.4. For $(\beta,h) = (\beta_0,h_0)$ let $x = \big( 1/2, 1/(2(q-1)), \dots, 1/(2(q-1)) \big)$ be the unique minimizer of $G_{\beta_0,h_0}$ and $u = (1-q,1,\dots,1)$. Furthermore, let $Z_{q,T}$ be a random variable distributed according to the probability measure on $\mathbb{R}$ with the density
\[
f_{q,T}(t) := f_{q,T,n} := C \cdot \exp\Big( -\frac{1}{4E(T^4)}\, t^4 \Big),
\]