Occupation laws for some time-nonhomogeneous Markov chains

Zach Dietz$^1$ and Sunder Sethuraman$^2$

February 2, 2008

Abstract

We consider finite-state time-nonhomogeneous Markov chains whose transition matrix at time $n$ is $I + G/n^\zeta$ where $G$ is a "generator" matrix, that is $G(i,j) > 0$ for $i,j$ distinct and $G(i,i) = -\sum_{k\neq i} G(i,k)$, and $\zeta > 0$ is a strength parameter. In these chains, as time grows, the positions are less and less likely to change, and so they form simple models of age-dependent time-reinforcing schemes. These chains, however, exhibit some different, perhaps unexpected, occupation behaviors depending on parameters. Although it is shown, on the one hand, that the position at time $n$ converges to a point-mixture for all $\zeta > 0$, on the other hand, the average occupation vector up to time $n$, when variously $0 < \zeta < 1$, $\zeta > 1$ or $\zeta = 1$, is seen to converge to a constant, a point-mixture, or a distribution $\mu_G$ with no atoms and full support on a simplex, respectively, as $n \uparrow \infty$. This last type of limit can be interpreted as a sort of "spreading" between the cases $0 < \zeta < 1$ and $\zeta > 1$. In particular, when $G$ is appropriately chosen, intriguingly, $\mu_G$ is a Dirichlet distribution, reminiscent of results in Pólya urns.

Research supported in part by NSA-H982300510041 and NSF-DMS-0504193.
Key words and phrases: laws of large numbers, nonhomogeneous, Markov, occupation, reinforcement, Dirichlet distribution.
Abbreviated title: Occupation laws for nonhomogeneous Markov chains.
AMS (2000) subject classifications: Primary 60J10; secondary 60F10.
$^1$Department of Mathematics, Tulane University, 6823 St. Charles Ave., New Orleans, LA 70118; [email protected].
$^2$Department of Mathematics, Iowa State University, 396 Carver Hall, Ames, IA 50011; [email protected].

1 Introduction and Results

In this article, we study laws of large numbers (LLN) for a class of finite-space time-nonhomogeneous Markov chains where, as time increases, positions are less likely to change. Although these chains feature simple age-dependent time-reinforcing dynamics, some different, perhaps unexpected, LLN occupation behaviors emerge depending on parameters. A specific case, as in Example 1.1, was first introduced in Gantert [8] in connection with analysis of certain simulated annealing LLN phenomena.

Example 1.1 Suppose there are only two states 1 and 2, and that the chain moves between the two locations in the following way: at large times $n$, the chain switches places with probability $c/n$, and stays put with complementary probability $1 - c/n$, for $c > 0$. The chain, as it ages, is less inclined to leave its spot, but nonetheless switches infinitely often. One can see that the probability of being in state 1 tends to $1/2$ regardless of the initial distribution. One may ask, however, how the average location, or frequency, of state 1 behaves asymptotically. For this example, it was shown in [8] and Ex. 7.1.1 [28], perhaps surprisingly, that any LLN limit could not be a constant, or even converge in probability, without further identification. However, a quick consequence of our results is that the average occupation of state 1 converges weakly to the Beta$(c,c)$ distribution (Theorem 1.4).

More specifically, we consider a general version of this scheme with $m \geq 2$ possible locations, and moving and staying probabilities $G(i,j)/n^\zeta$ and $1 - \sum_{k\neq i} G(i,k)/n^\zeta$ for the transitions $i \to j \neq i$ and $i \to i$ respectively at time $n$, where $G = \{G(i,j)\}$ is an $m \times m$ matrix and $\zeta > 0$ is a strength parameter.
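To make the scheme concrete, the following is a minimal Monte Carlo sketch, not from the paper, of the two-state chain of Example 1.1 (the general $m$-state model is simulated the same way with switch probabilities $G(i,j)/n^\zeta$); the function names and run lengths here are our own choices, and numpy is assumed:

```python
import numpy as np

rng = np.random.default_rng(0)

def average_occupation(c, n_steps=10_000):
    """Ex. 1.1: switch 1 <-> 2 with probability c/n at time n (starting the
    clock at the first n with c/n <= 1) and return the average occupation
    of state 1 over n_steps steps."""
    n0 = int(np.ceil(c)) + 1
    state = rng.integers(1, 3)            # uniform start in {1, 2}
    time_in_1 = 0
    for n in range(n0, n0 + n_steps):
        if rng.random() < c / n:
            state = 3 - state             # switch states
        time_in_1 += (state == 1)
    return time_in_1 / n_steps

c = 1.0
samples = np.array([average_occupation(c) for _ in range(1000)])
# Per Theorem 1.4 below (the m = 2 case), the samples should look Beta(c,c):
# mean ~ 1/2 and variance ~ 1/(4(2c+1)).
print(samples.mean(), samples.var(), 1 / (4 * (2 * c + 1)))
```

The sample histogram flattens as $c$ decreases and peaks near $1/2$ as $c$ grows, in line with Fig. 1 below.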
After observing that the location probabilities tend to a distribution which depends on $G$, $\zeta$, and the initial probability $\pi$ when $\zeta > 1$, but does not depend on $\zeta$ and $\pi$ when $\zeta \leq 1$ (Theorem 1.1), the results on the average occupation vector limit separate roughly into three cases depending on whether $0 < \zeta < 1$, $\zeta = 1$, or $\zeta > 1$. When $0 < \zeta < 1$, following [8], the average occupation is seen to converge to a constant in probability; and when more specifically $0 < \zeta < 1/2$, this convergence is proved to be a.s. When $\zeta > 1$, as there are only a finite number of switches, the position eventually stabilizes and the average occupation converges to a mixture of point masses (Theorem 1.2).

Our main results are when $\zeta = 1$. In this case, we show the average occupation converges to a non-atomic distribution $\mu_G$, with full support on a simplex, identified by its moments (Theorems 1.3 and 1.5). When, in particular, $G$ takes the form $G(i,j) = \theta_j$ for all $i \neq j$, that is when the transitions into a state $j$ are constant, $\mu_G$ takes the form of a Dirichlet distribution with parameters $\{\theta_j\}$ (Theorem 1.4). The proofs of these statements follow by the method of moments, and some surgeries of the paths.

The heuristic is that when $0 < \zeta < 1$ the chance of switching is strong and sufficient mixing leads to constant limits, but when $\zeta > 1$ there is little movement, giving point-mixture limits. The case $\zeta = 1$ is the intermediate "spreading" situation leading to non-atomic limits. For example, with respect to Ex. 1.1, when the switching probability at time $n$ is $c/n^\zeta$, the Beta$(c,c)$ limit when $\zeta = 1$ interpolates, as $c$ varies on $(0,\infty)$, between the point-mass at $1/2$, the frequency limit of state 1 when $0 < \zeta < 1$, and the fair mixture of point-masses at 0 and 1, the limit when $\zeta > 1$ and starting at random (cf. Fig. 1, and the moment check following this discussion).

[Figure 1: Beta$(c,c)$ occupation law of state 1 in Ex. 1.1; densities plotted for $c = 0.1$, $1.0$, and $10.0$.]

In the literature, there are only a few results on LLN's for time-nonhomogeneous Markov chains, often related to simulated annealing and Metropolis algorithms, which can be viewed in terms of a generalized model where $\zeta = \zeta(i,j)$ is a non-negative function. These results relate to the case "$\max\zeta(i,j) < 1$" when the LLN limit is a constant [8], Ch. 7 [28], [9]. See also Ch. 1 [16], [19], [20]; and the texts [6], [14], [15] for more on nonhomogeneous Markov chains. In this light, the non-degenerate limits $\mu_G$ found here seem to be novel objects. In terms of simulated annealing, these limits suggest a more complicated LLN picture at the "critical" cooling schedule when $\zeta(i,j) = 1$ for some pairs $i,j$ in the state space.

The advent of Dirichlet limits, when $G$ is chosen appropriately, seems of particular interest, given similar results for limit color-frequencies in Pólya urns [4], [10], as it hints at an even larger role for Dirichlet measures in related but different "reinforcement"-type models (see [17], [23], [22], and references therein, for more on urn and reinforcement schemes). In this context, the set of "spreading" limits $\mu_G$ in Theorem 1.3, in which Dirichlet measures are but a subset, appears intriguing as well (cf. Remarks 1.4, 1.5 and Fig. 2).

In another vein, although different, Ex. 1.1 seems not so far from the case of independent Bernoulli trials with success probability $1/n$ at the $n$th trial. For such trials much is known about the spacings between successes, and connections to GEM random allocation models and Poisson-Dirichlet measures [27], [1], [2], [3], [24], [25].
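The interpolation just described can be checked quickly from the Beta moments; the following short computation is ours, not the paper's, and is only a heuristic for the weak limits at the two endpoints:

```latex
% Mean and variance of Beta(c,c):
\[
  E[x] \;=\; \frac{1}{2}, \qquad
  \mathrm{Var}(x) \;=\; \frac{c \cdot c}{(2c)^2 (2c+1)} \;=\; \frac{1}{4(2c+1)}.
\]
% As c -> infinity, Var(x) -> 0, so Beta(c,c) -> \delta_{1/2}, the constant
% limit of the regime 0 < \zeta < 1.  As c -> 0, Var(x) -> 1/4, the largest
% possible variance of a [0,1]-valued variable with mean 1/2, attained only
% by the fair mixture (\delta_0 + \delta_1)/2 -- the point-mixture limit of
% the regime \zeta > 1 started at random.
```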
We also mention, in a different, neighboring setting, that some interesting but distinct LLN's have been shown for arrays of time-homogeneous Markov sequences where the transition matrix $P_n$ for the $n$th row converges to a limit matrix $P$ [7], [11], Section 5.3 [15]; see also [21], which comments on some "metastability" concerns.

We now develop some notation to state the results. Let $\Sigma = \{1,2,\ldots,m\}$ be a finite set of $m \geq 2$ points. We say a matrix $M = \{M(i,j) : 1 \leq i,j \leq m\}$ on $\Sigma$ is a generator matrix if $M(i,j) \geq 0$ for all distinct $1 \leq i,j \leq m$, and $M(i,i) = -\sum_{j\neq i} M(i,j)$ for $1 \leq i \leq m$. In particular, $M$ is a generator with nonzero entries if $M(i,j) > 0$ for $1 \leq i,j \leq m$ distinct, and $M(i,i) < 0$ for $1 \leq i \leq m$.

To avoid technicalities, e.g. with reducibility, we work with the following matrices,

$$\mathcal{G} \;=\; \Big\{ G \in \mathbb{R}^{m\times m} : G \text{ is a generator matrix with nonzero entries} \Big\},$$

although extensions should be possible for a larger class. For $G \in \mathcal{G}$, let $n_0(G,\zeta) = \max_{1\leq i\leq m} \big\lceil |G(i,i)|^{1/\zeta} \big\rceil$, and define for $\zeta > 0$

$$P_n^{G,\zeta} \;=\; \begin{cases} I & \text{for } 1 \leq n \leq n_0(G,\zeta)\\ I + G/n^\zeta & \text{for } n \geq n_0(G,\zeta)+1 \end{cases}$$

where $I$ is the $m\times m$ identity matrix. Then, for all $n \geq 1$, $P_n^{G,\zeta}$ is ensured to be a stochastic matrix.

Let $\pi$ be a distribution on $\Sigma$, and let $\mathbf{P}_\pi^{G,\zeta}$ be the (nonhomogeneous) Markov measure on the sequence space $\Sigma^{\mathbb{N}}$ with Borel sets $\mathcal{B}(\Sigma^{\mathbb{N}})$ corresponding to initial distribution $\pi$ and transition kernels $\{P_n^{G,\zeta}\}$. That is, with respect to the coordinate process $X = \langle X_0, X_1, \ldots\rangle$, we have $\mathbf{P}_\pi^{G,\zeta}(X_0 = i) = \pi(i)$ and the Markov property

$$\mathbf{P}_\pi^{G,\zeta}(X_{n+1} = j \mid X_0, X_1, \ldots, X_{n-1}, X_n = i) \;=\; P_{n+1}^{G,\zeta}(i,j)$$

for all $i,j \in \Sigma$ and $n \geq 0$. Our convention then is that $P_{n+1}^{G,\zeta}$ controls "transitions" between times $n$ and $n+1$. Let also $E_\pi^{G,\zeta}$ be expectation with respect to $\mathbf{P}_\pi^{G,\zeta}$. More generally, $E_\mu$ denotes expectation with respect to a measure $\mu$.

Define the occupation statistic $Z_n = \langle Z_{1,n},\cdots,Z_{m,n}\rangle$ for $n \geq 1$ where

$$Z_{i,n} \;=\; \frac{1}{n}\sum_{k=1}^n 1_i(X_k)$$

for $1 \leq i \leq m$. Then, $Z_n$ is an element of the $(m-1)$-dimensional simplex

$$\Delta_m \;=\; \Big\{ x : \sum_{i=1}^m x_i = 1,\ 0 \leq x_i \leq 1 \text{ for } 1 \leq i \leq m \Big\}.$$

The first result is on convergence of the position of the process. For $G \in \mathcal{G}$, let $\nu_G$ be the stationary distribution corresponding to $G$ (of the associated continuous-time homogeneous Markov chain), that is, the unique left eigenvector of the eigenvalue 0, with positive entries, normalized to unit sum.

Theorem 1.1 For $G \in \mathcal{G}$, $\zeta > 0$, and initial distribution $\pi$, under $\mathbf{P}_\pi^{G,\zeta}$,

$$X_n \;\stackrel{d}{\longrightarrow}\; \nu_{G,\pi,\zeta}$$

where $\nu_{G,\pi,\zeta}$ is a probability vector on $\Sigma$ depending in general on $\zeta$, $G$, and $\pi$. When $0 < \zeta \leq 1$, $\nu_{G,\pi,\zeta}$ does not depend on $\pi$ and $\zeta$ and reduces to $\nu_{G,\pi,\zeta} = \nu_G$.

Remark 1.1 For $\zeta > 1$, with only finitely many moves, the convergence is a.s., and $\nu_{G,\pi,\zeta}$ is explicit when $G = V_G D_G V_G^{-1}$ is diagonalizable with $D_G$ diagonal and $D_G(i,i) = \lambda_i^G$, the $i$th eigenvalue of $G$, for $1 \leq i \leq m$. By calculation, $\nu_{G,\pi,\zeta} = \pi^t \prod_{n\geq1} P_n^{G,\zeta} = \pi^t V_G D' V_G^{-1}$ with $D'$ diagonal and $D'(i,i) = \prod_{n\geq n_0(G,\zeta)+1}\big(1 + \lambda_i^G/n^\zeta\big)$.

We now consider the cases $\zeta \neq 1$ with respect to average occupation limits. Let $\mathbf{i}$ be the basis vector $\mathbf{i} = \langle 0,\ldots,0,1,0,\ldots,0\rangle \in \Delta_m$ with a 1 in the $i$th component, and let $\delta_{\mathbf{i}}$ be the point mass at $\mathbf{i}$, for $1 \leq i \leq m$.

Theorem 1.2 Let $G \in \mathcal{G}$, and let $\pi$ be an initial distribution. Under $\mathbf{P}_\pi^{G,\zeta}$, we have that

$$Z_n \;\longrightarrow\; \nu_G$$

in probability when $0 < \zeta < 1$; when more specifically $0 < \zeta < 1/2$, this convergence is $\mathbf{P}_\pi^{G,\zeta}$-a.s. However, when $\zeta > 1$, under $\mathbf{P}_\pi^{G,\zeta}$,

$$Z_n \;\stackrel{d}{\longrightarrow}\; \sum_{i=1}^m \nu_{G,\pi,\zeta}(i)\,\delta_{\mathbf{i}}.$$
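Since the marginal law of $X_n$ is just $\pi^t P_1^{G,\zeta}\cdots P_n^{G,\zeta}$, Theorem 1.1 can be watched numerically by exact matrix iteration; the sketch below is ours, not the paper's (numpy assumed), and computes $\nu_G$ as the normalized left null vector of $G$:

```python
import numpy as np

def transition(G, zeta, n):
    """P_n^{G,zeta} as defined above: the identity up to time n_0(G,zeta),
    and I + G/n^zeta afterwards."""
    m = G.shape[0]
    n0 = int(np.ceil(np.max(np.abs(np.diag(G))) ** (1.0 / zeta)))
    return np.eye(m) if n <= n0 else np.eye(m) + G / n ** zeta

def nu(G):
    """Stationary vector nu_G: the left eigenvector of G for eigenvalue 0,
    with positive entries, normalized to unit sum."""
    w, v = np.linalg.eig(G.T)
    x = np.real(v[:, np.argmin(np.abs(w))])
    return x / x.sum()

# the generator used for the left panel of Fig. 2 below
G = np.array([[-3., 1., 2.], [2., -3., 1.], [1., 2., -3.]])
p = np.array([1., 0., 0.])                  # pi = point mass at state 1
for n in range(1, 100_001):
    p = p @ transition(G, zeta=1.0, n=n)
print(p, nu(G))    # both should be close to (1/3, 1/3, 1/3) here
```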
Remark 1.2 Simulations suggest that actually a.s. convergence might hold also on the range $1/2 \leq \zeta < 1$ (with worse convergence rates as $\zeta \uparrow 1$).

Let now $\gamma_1,\ldots,\gamma_m \geq 0$ be integers such that $\bar\gamma = \sum_{i=1}^m \gamma_i \geq 1$. Define the list $A = \{a_i : 1\leq i\leq\bar\gamma\} = \{1,\ldots,1,\,2,\ldots,2,\,\ldots,\,m,\ldots,m\}$, in which each symbol $j$ appears $\gamma_j$ times. Let $S(\gamma_1,\ldots,\gamma_m)$ be the $\bar\gamma!$ permutations of $A$, although there are only $\binom{\bar\gamma}{\gamma_1,\gamma_2,\cdots,\gamma_m}$ distinct permutations; that is, each distinct permutation appears $\prod_{k=1}^m \gamma_k!$ times.

Note also, for $G \in \mathcal{G}$, being a generator matrix, all eigenvalues of $G$ have non-positive real parts (indeed, $I + G/k$ is a stochastic matrix for $k$ large; then, by Perron-Frobenius, the real parts of its eigenvalues satisfy $-1 \leq 1 + \mathrm{Re}(\lambda_i^G)/k \leq 1$, yielding the non-positivity), and so the resolvent $(xI - G)^{-1}$ is well defined for $x \geq 1$.

Theorem 1.3 For $\zeta = 1$, $G \in \mathcal{G}$, and initial distribution $\pi$, we have under $\mathbf{P}_\pi^{G,\zeta}$ that

$$Z_n \;\stackrel{d}{\longrightarrow}\; \mu_G$$

where $\mu_G$ is a measure on the simplex $\Delta_m$ characterized by its moments: for $1 \leq i \leq m$,

$$E_{\mu_G}(x_i) \;=\; \lim_{n\to\infty} E_\pi^{G,\zeta}(Z_{i,n}) \;=\; \nu_G(i),$$

and for integers $\gamma_1,\ldots,\gamma_m \geq 0$ when $\bar\gamma \geq 2$,

$$E_{\mu_G}\big(x_1^{\gamma_1}\cdots x_m^{\gamma_m}\big) \;=\; \lim_{n\to\infty} E_\pi^{G,\zeta}\big(Z_{1,n}^{\gamma_1}\cdots Z_{m,n}^{\gamma_m}\big) \;=\; \frac{1}{\bar\gamma}\sum_{\sigma\in S(\gamma_1,\ldots,\gamma_m)} \nu_G(\sigma_1)\prod_{i=1}^{\bar\gamma-1}\big(iI - G\big)^{-1}(\sigma_i,\sigma_{i+1}).$$

Remark 1.3 However, as in Ex. 1.1 and [8], when $\zeta = 1$ as above, $Z_n$ cannot converge in probability (the tail field $\cap_n\,\sigma\{X_n, X_{n+1},\ldots\}$ is trivial by Theorem 1.2.13 and Proposition 1.2.4 [16] and (2.3), but the limit distribution $\mu_G$ is not a point-mass by, say, Theorem 1.5 below). This is in contrast to Pólya urns, where the color frequencies converge a.s.

We now consider a particular matrix under which $\mu_G$ is a Dirichlet distribution. For $\theta_1,\ldots,\theta_m > 0$, define

$$\Theta \;=\; \begin{pmatrix} \theta_1 - \bar\theta & \theta_2 & \theta_3 & \cdots & \theta_m\\ \theta_1 & \theta_2 - \bar\theta & \theta_3 & \cdots & \theta_m\\ \vdots & \vdots & \vdots & & \vdots\\ \theta_1 & \theta_2 & \theta_3 & \cdots & \theta_m - \bar\theta \end{pmatrix}$$

where $\bar\theta = \sum_{l=1}^m \theta_l$. It is clear $\Theta \in \mathcal{G}$. Recall the identification of the Dirichlet distribution by its density and moments; see [18], [26] for more on these distributions. Namely, the Dirichlet distribution on the simplex $\Delta_m$ with parameters $\theta_1,\ldots,\theta_m$ (abbreviated as Dir$(\theta_1,\ldots,\theta_m)$) has density

$$\frac{\Gamma(\bar\theta)}{\Gamma(\theta_1)\cdots\Gamma(\theta_m)}\; x_1^{\theta_1-1}\cdots x_m^{\theta_m-1}.$$

The moments with respect to integers $\gamma_1,\ldots,\gamma_m \geq 0$ with $\bar\gamma \geq 1$ are

$$E\big(x_1^{\gamma_1}\cdots x_m^{\gamma_m}\big) \;=\; \frac{\prod_{i=1}^m \theta_i(\theta_i+1)\cdots(\theta_i+\gamma_i-1)}{\prod_{i=0}^{\bar\gamma-1}(\bar\theta+i)}, \qquad (1.1)$$

where we take $\theta_i(\theta_i+1)\cdots(\theta_i+\gamma_i-1) = 1$ when $\gamma_i = 0$.

Theorem 1.4 We have $\mu_\Theta = \mathrm{Dir}(\theta_1,\ldots,\theta_m)$.

Remark 1.4 Moreover, by comparing the first few moments in Theorem 1.3 with (1.1), one can check $\mu_G$ is not a Dirichlet measure for many $G$'s with $m \geq 3$. However, when $m = 2$, then any $G$ takes the form of $\Theta$ with $\theta_1 = G(2,1)$ and $\theta_2 = G(1,2)$, and so $\mu_G = \mathrm{Dir}(G(2,1), G(1,2))$.

[Figure 2: Empirical $\mu_G$ densities under $G_{\mathrm{left}}$ and $G_{\mathrm{right}}$ respectively.]

We now characterize the measures $\{\mu_G : G \in \mathcal{G}\}$ as "spreading" measures different from the limits when $0 < \zeta < 1$ and $\zeta > 1$.

Theorem 1.5 Let $G \in \mathcal{G}$. Then, (1) $\mu_G(U) > 0$ for any non-empty open set $U \subset \Delta_m$. Also, (2) $\mu_G$ has no atoms.

Remark 1.5 We suspect better estimates in the proof of Theorem 1.5 will show $\mu_G$ is in fact mutually absolutely continuous with respect to Lebesgue measure on $\Delta_m$. Of course, in this case, it would be of interest to find the density of $\mu_G$. Meanwhile, we give two histograms, found by calculating 1000 averages, each on a run of time-length 10000 starting at random on $\Sigma$ at time $n_0(G,1)$ ($= 3, 1$ respectively), in Figure 2 of the empirical density when $m = 3$ and $G$ takes the forms

$$G_{\mathrm{left}} = \begin{pmatrix} -3 & 1 & 2\\ 2 & -3 & 1\\ 1 & 2 & -3 \end{pmatrix}, \qquad\text{and}\qquad G_{\mathrm{right}} = \begin{pmatrix} -.4 & .2 & .2\\ .3 & -.6 & .3\\ .5 & .5 & -1 \end{pmatrix}.$$
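As a numeric sanity check of Theorem 1.4, the moment formula of Theorem 1.3 can be evaluated directly for $G = \Theta$ and compared with (1.1); the sketch below is ours, not the paper's, with numpy assumed and the parameter values arbitrary:

```python
import numpy as np
from itertools import permutations

def moment_thm13(G, nu, gammas):
    """(1/gbar) * sum over the gbar! permutations sigma of the list A of
    nu(sigma_1) * prod_{i=1}^{gbar-1} (iI - G)^{-1}(sigma_i, sigma_{i+1})."""
    m = G.shape[0]
    A = [j for j in range(m) for _ in range(gammas[j])]   # 0-based states
    gbar = len(A)
    R = [np.linalg.inv(i * np.eye(m) - G) for i in range(1, gbar)]
    total = 0.0
    for sigma in permutations(A):        # gbar! terms, repeats included
        term = nu[sigma[0]]
        for i in range(gbar - 1):
            term *= R[i][sigma[i], sigma[i + 1]]
        total += term
    return total / gbar

def moment_dirichlet(thetas, gammas):
    """Formula (1.1), with rising factorials in numerator and denominator."""
    tbar, gbar = sum(thetas), sum(gammas)
    num = np.prod([np.prod([t + k for k in range(g)])
                   for t, g in zip(thetas, gammas)])
    return num / np.prod([tbar + i for i in range(gbar)])

thetas = [0.5, 1.0, 2.0]
Theta = np.tile(thetas, (3, 1)) - sum(thetas) * np.eye(3)  # Theta(i,j) = theta_j
nu_Theta = np.array(thetas) / sum(thetas)                  # nu for G = Theta
for gammas in [(2, 1, 0), (1, 1, 1), (3, 0, 2)]:
    print(moment_thm13(Theta, nu_Theta, gammas), moment_dirichlet(thetas, gammas))
```

Replacing $\Theta$ by a generic $G \in \mathcal{G}$, with $\nu_G$ computed as its normalized left null vector, exhibits the failure of (1.1) noted in Remark 1.4.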
To help visualize the plots, $\Delta_3$ is mapped to the plane by the linear transformation $f(x) = x_1 f(\langle 1,0,0\rangle) + x_2 f(\langle 0,1,0\rangle) + x_3 f(\langle 0,0,1\rangle)$ where $f(\langle 1,0,0\rangle) = \langle\sqrt2, 0\rangle$, $f(\langle 0,1,0\rangle) = \langle 0,0\rangle$ and $f(\langle 0,0,1\rangle) = \sqrt2\,\langle 1/2, \sqrt3/2\rangle$. The map maintains a distance $\sqrt2$ between the transformed vertices.

We now comment on the plan of the paper. The proofs of Theorems 1.1 and 1.2, 1.3, 1.4, and 1.5 (1) and (2) are in Sections 2, 3, 4, 5, and 6 respectively. These sections do not depend structurally on each other.

2 Proofs of Theorems 1.1 and 1.2

We first recall some results for nonhomogeneous Markov chains in the literature. For a stochastic matrix $P$ on $\Sigma$, define the "contraction coefficient"

$$c(P) \;=\; \frac{1}{2}\max_{x,y}\sum_z \big|P(x,z) - P(y,z)\big| \;=\; 1 - \min_{x,y}\sum_z \min\big\{P(x,z), P(y,z)\big\}. \qquad (2.1)$$

The following is, for instance, Theorem 4.5.1 [28].

Proposition 2.1 Let $X_n$ be a time-nonhomogeneous Markov chain on $\Sigma$ connected by transition matrices $\{P_n\}$ with corresponding stationary distributions $\{\nu_n\}$. Suppose

$$\prod_{n=1}^\infty c(P_n) = 0 \qquad\text{and}\qquad \sum_{n=1}^\infty \|\nu_n - \nu_{n+1}\|_{\mathrm{Var}} < \infty. \qquad (2.2)$$

Then, $\nu = \lim_{n\to\infty}\nu_n$ exists, and, starting from any initial distribution $\pi$, we have for each $k \in \Sigma$ that

$$\lim_{n\to\infty} P(X_n = k) \;=\; \nu(k).$$

The following is stated in Section 2 [8] as a consequence of results (1.2.22) and Theorem 1.2.23 in [16].

Proposition 2.2 Given the setting of Proposition 2.1, suppose (2.2) is satisfied, and $c_n = \max_{n_0\leq i\leq n} c(P_i) < 1$ for all $n \geq n_0$ for some $n_0 \geq 1$. Let $\pi$ be any initial distribution, and let $f : \Sigma \to \mathbb{R}$ be any function. Then, we have the convergence

$$\frac{1}{n}\sum_{i=1}^n f(X_i) \;\longrightarrow\; E_\nu[f]$$

in the following senses:
(i) in probability, when $\lim_{n\to\infty} n(1-c_n) = \infty$;
(ii) a.s., when $\sum_{n\geq n_0} 2^{-n}(1-c_{2^n})^{-2} < \infty$.

Proof of Theorem 1.1. We first consider when $\zeta > 1$. In this case there are only a finite number of movements by Borel-Cantelli, since $\sum_{n\geq1}\mathbf{P}_\pi^{G,\zeta}(X_n \neq X_{n+1}) \leq C\sum_{n\geq1} n^{-\zeta} < \infty$. Hence there is a time of last movement $N < \infty$ a.s. Then, $\lim X_n = X_N$ a.s., and, for $k \in \Sigma$, the limit distribution $\nu_{G,\pi,\zeta}$ is defined and given by $\mathbf{P}_\pi^{G,\zeta}(X_N = k) = \nu_{G,\pi,\zeta}(k)$.

When $0 < \zeta \leq 1$, as $G \in \mathcal{G}$, by calculation with (2.1), $c(P_n^{G,\zeta}) = 1 - C_G/n^\zeta$ for all $n \geq n_0(G,\zeta)$ large enough and a constant $C_G > 0$. Then,

$$\prod_{n\geq1} c(P_n^{G,\zeta}) \;=\; \prod_{n\geq n_0(G,\zeta)}\Big(1 - \frac{C_G}{n^\zeta}\Big) \;=\; 0. \qquad (2.3)$$

Since for $n > n_0(G,\zeta)$, $\nu_G^t P_n^{G,\zeta} = \nu_G^t(I + G/n^\zeta) = \nu_G^t$, the second condition of Proposition 2.1 is trivially satisfied, and hence the result follows. $\square$

Proof of Theorem 1.2. When $\zeta > 1$, as mentioned in the proof of Theorem 1.1, there are only a finite number of moves a.s., and so a.s. $\lim Z_n = \sum_{k=1}^m 1_{[X_N = k]}\,\mathbf{k}$ concentrates on the basis vectors $\{\mathbf{k}\}$. Hence, with $N$ as defined in the proof of Theorem 1.1, $\mathbf{P}_\pi^{G,\zeta}(X_N = k) = \nu_{G,\pi,\zeta}(k)$, and the result follows.

When $0 < \zeta < 1$, we apply Proposition 2.2 and follow the method in [8]. First, as in the proof of Theorem 1.1, (2.2) holds, and $c(P_n^{G,\zeta}) = 1 - C_G/n^\zeta$ for a constant $C_G > 0$ and all $n \geq n_0(G,\zeta)$. Then, $c_n = \max_{n_0(G,\zeta)\leq i\leq n} c(P_i^{G,\zeta}) = 1 - C_G/n^\zeta < 1$. Now, $n(1-c_n) = C_G\,n^{1-\zeta} \uparrow \infty$, giving the probability convergence in part (i). For the a.s. convergence in part (ii) when $0 < \zeta < 1/2$, note

$$\sum_n \frac{1}{2^n(1-c_{2^n})^2} \;=\; \sum_n \frac{1}{2^n\big(C_G/(2^n)^\zeta\big)^2} \;=\; \sum_n \frac{1}{C_G^2\,(2^{1-2\zeta})^n} \;<\; \infty. \qquad\square$$
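The contraction coefficient (2.1) and the quantity $n(1-c_n)$ driving Proposition 2.2 are easy to inspect numerically; a small sketch, ours rather than the paper's, with numpy assumed:

```python
import numpy as np

def contraction(P):
    """c(P) = (1/2) max_{x,y} sum_z |P(x,z) - P(y,z)|, as in (2.1)."""
    m = P.shape[0]
    return 0.5 * max(np.abs(P[x] - P[y]).sum()
                     for x in range(m) for y in range(m))

G = np.array([[-3., 1., 2.], [2., -3., 1.], [1., 2., -3.]])
for n in [10, 100, 1000]:
    P = np.eye(3) + G / n            # P_n^{G,1} for n > n_0(G,1) = 3
    print(n, n * (1 - contraction(P)))
# n(1 - c(P_n)) stabilizes at the constant C_G (here 4), matching
# c(P_n^{G,1}) = 1 - C_G/n as used in the proofs above.
```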
3 Proof of Theorem 1.3

In this section, as $\zeta = 1$ is fixed, we suppress the notational dependence on $\zeta$. Also, as $Z_n$ takes values on the compact set $\Delta_m$, the weak convergence in Theorem 1.3 follows by convergence of the moments. The next lemma establishes convergence of the first moments.

Lemma 3.1 For $G \in \mathcal{G}$, $1 \leq k \leq m$, and initial distribution $\pi$,

$$\lim_{n\to\infty} E_\pi^G\big(Z_{k,n}\big) \;=\; \nu_G(k).$$

Proof. From Theorem 1.1, and Cesàro convergence,

$$\lim_n E_\pi^G\big(Z_{k,n}\big) \;=\; \lim_n \frac{1}{n}\sum_{i=1}^n E_\pi^G\big(1_k(X_i)\big) \;=\; \lim_n \frac{1}{n}\sum_{i=1}^n \mathbf{P}_\pi^G(X_i = k) \;=\; \nu_G(k). \qquad\square$$

We now turn to the joint moment limits in several steps, and will assume in the following that $\gamma_1,\ldots,\gamma_m \geq 0$ with $\bar\gamma \geq 2$. The first step is an "ordering of terms."

Lemma 3.2 For $G \in \mathcal{G}$, and initial distribution $\pi$, we have

$$\lim_{n\to\infty}\bigg| E_\pi^G\big(Z_{1,n}^{\gamma_1}\cdots Z_{m,n}^{\gamma_m}\big) \;-\; \frac{1}{n^{\bar\gamma}}\sum_{\sigma\in S(\gamma_1,\ldots,\gamma_m)}\ \sum_{i_1=1}^{n-\bar\gamma+1}\ \sum_{i_2>i_1}^{n-\bar\gamma+2}\cdots\sum_{i_{\bar\gamma}>i_{\bar\gamma-1}}^{n} E_\pi^G\Big(\prod_{l=1}^{\bar\gamma} 1_{\sigma_l}(X_{i_l})\Big)\bigg| \;=\; 0.$$

Proof. By definition of $S(\gamma_1,\ldots,\gamma_m)$,

$$E_\pi^G\big(Z_{1,n}^{\gamma_1}\cdots Z_{m,n}^{\gamma_m}\big) \;=\; \frac{1}{\bar\gamma!\,n^{\bar\gamma}}\sum_{\substack{\sigma\in S(\gamma_1,\ldots,\gamma_m)\\ 1\leq i_1,\ldots,i_{\bar\gamma}\leq n}} E_\pi^G\Big(1_{\sigma_1}(X_{i_1})\,1_{\sigma_2}(X_{i_2})\cdots 1_{\sigma_{\bar\gamma}}(X_{i_{\bar\gamma}})\Big).$$

Note now

$$\sum_{\substack{\sigma\in S(\gamma_1,\ldots,\gamma_m)\\ 1\leq i_1,\ldots,i_{\bar\gamma}\leq n}} 1 \;=\; \bar\gamma!\,n^{\bar\gamma}, \qquad\text{and}\qquad \sum_{\substack{\sigma\in S(\gamma_1,\ldots,\gamma_m)\\ 1\leq i_1,\ldots,i_{\bar\gamma}\leq n,\ \mathrm{distinct}}} 1 \;=\; \bar\gamma!\,\bar\gamma!\binom{n}{\bar\gamma}.$$

Let $\mathcal{K}$ be those indices $\langle i_1,\ldots,i_{\bar\gamma}\rangle$, $1 \leq i_1,\ldots,i_{\bar\gamma} \leq n$, which are not distinct, that is, $i_j = i_k$ for some $j \neq k$. Then,

$$\bigg|\frac{1}{\bar\gamma!\,n^{\bar\gamma}}\sum_{\substack{\sigma\in S(\gamma_1,\ldots,\gamma_m)\\ 1\leq i_1,\ldots,i_{\bar\gamma}\leq n}} E_\pi^G\Big(\prod_{l=1}^{\bar\gamma} 1_{\sigma_l}(X_{i_l})\Big) \;-\; \frac{1}{\bar\gamma!\,n^{\bar\gamma}}\sum_{\substack{\sigma\in S(\gamma_1,\ldots,\gamma_m)\\ 1\leq i_1,\ldots,i_{\bar\gamma}\leq n,\ \mathrm{distinct}}} E_\pi^G\Big(\prod_{l=1}^{\bar\gamma} 1_{\sigma_l}(X_{i_l})\Big)\bigg|$$
$$=\; \frac{1}{\bar\gamma!\,n^{\bar\gamma}}\sum_{\substack{\sigma\in S(\gamma_1,\ldots,\gamma_m)\\ \langle i_1,\ldots,i_{\bar\gamma}\rangle\in\mathcal{K}}} E_\pi^G\Big(1_{\sigma_1}(X_{i_1})\cdots 1_{\sigma_{\bar\gamma}}(X_{i_{\bar\gamma}})\Big) \;\leq\; \frac{1}{\bar\gamma!\,n^{\bar\gamma}}\Big(\bar\gamma!\,n^{\bar\gamma} - \bar\gamma!\,\bar\gamma!\binom{n}{\bar\gamma}\Big) \;=\; o(1).$$

But,

$$\sum_{\substack{\sigma\in S(\gamma_1,\ldots,\gamma_m)\\ 1\leq i_1,\ldots,i_{\bar\gamma}\leq n,\ \mathrm{distinct}}} E_\pi^G\Big(\prod_{l=1}^{\bar\gamma} 1_{\sigma_l}(X_{i_l})\Big) \;=\; \bar\gamma!\sum_{\substack{\sigma\in S(\gamma_1,\ldots,\gamma_m)\\ 1\leq i_1<\cdots<i_{\bar\gamma}\leq n}} E_\pi^G\Big(\prod_{l=1}^{\bar\gamma} 1_{\sigma_l}(X_{i_l})\Big). \qquad\square$$

The next lemma replaces the initial measure with $\nu_G$. Let $P_{i,j}^G = \prod_{l=i}^{j} P_l^G$ for $1 \leq i \leq j$.

Lemma 3.3 For $G \in \mathcal{G}$ and initial distribution $\pi$, we have

$$\lim_{n\to\infty}\bigg| \frac{1}{n^{\bar\gamma}}\sum_{\sigma\in S(\gamma_1,\ldots,\gamma_m)}\ \sum_{i_1=1}^{n-\bar\gamma+1}\ \sum_{i_2>i_1}^{n-\bar\gamma+2}\cdots\sum_{i_{\bar\gamma}>i_{\bar\gamma-1}}^{n} E_\pi^G\Big(\prod_{l=1}^{\bar\gamma} 1_{\sigma_l}(X_{i_l})\Big) \qquad\qquad (3.1)$$
$$\qquad -\; \frac{1}{n^{\bar\gamma}}\sum_{\sigma\in S(\gamma_1,\ldots,\gamma_m)}\ \sum_{i_1=1}^{n-\bar\gamma+1}\ \sum_{i_2>i_1}^{n-\bar\gamma+2}\cdots\sum_{i_{\bar\gamma}>i_{\bar\gamma-1}}^{n} \nu_G(\sigma_1)\prod_{l=1}^{\bar\gamma-1} P_{i_l+1,\,i_{l+1}}^G(\sigma_l,\sigma_{l+1})\bigg| \;=\; 0.$$

Proof. As $\mathbf{P}_\pi^G(X_j = t \mid X_i = s) = P_{i+1,j}^G(s,t)$ for $1 \leq i < j$ and $s,t \in \Sigma$, we have

$$\frac{1}{n^{\bar\gamma}}\sum_{\sigma\in S(\gamma_1,\ldots,\gamma_m)}\ \sum_{i_1=1}^{n-\bar\gamma+1}\ \sum_{i_2>i_1}^{n-\bar\gamma+2}\cdots\sum_{i_{\bar\gamma}>i_{\bar\gamma-1}}^{n} E_\pi^G\Big(\prod_{l=1}^{\bar\gamma} 1_{\sigma_l}(X_{i_l})\Big)$$
$$=\; \frac{1}{n^{\bar\gamma}}\sum_{\sigma\in S(\gamma_1,\ldots,\gamma_m)}\ \sum_{i_1=1}^{n-\bar\gamma+1}\ \sum_{i_2>i_1}^{n-\bar\gamma+2}\cdots\sum_{i_{\bar\gamma}>i_{\bar\gamma-1}}^{n} \mathbf{P}_\pi^G(X_{i_1} = \sigma_1)\prod_{l=1}^{\bar\gamma-1} P_{i_l+1,\,i_{l+1}}^G(\sigma_l,\sigma_{l+1})$$