ANALYSIS OF CONVERGENCE RATES OF SOME GIBBS SAMPLERS ON CONTINUOUS STATE SPACES

AARON SMITH

arXiv:1108.5451v2 [math.PR] 12 Jun 2012. Date: June 13, 2012.

1. Abstract

We use a non-Markovian coupling and small modifications of techniques from the theory of finite Markov chains to analyze some Markov chains on continuous state spaces. The first is a generalization of a sampler introduced by Randall and Winkler, the second a Gibbs sampler on narrow contingency tables.

2. Introduction

The problem of sampling from a given distribution on high-dimensional continuous spaces arises in the computational sciences and Bayesian statistics, and a frequently-used solution is Markov chain Monte Carlo (MCMC); see [16] for many examples. Because MCMC methods produce good samples only after a lengthy mixing period, a long-standing mathematical question is to analyze the mixing times of the MCMC algorithms which are in common use. Although there are many mixing conditions, the most commonly used is called the mixing time, and is based on the total variation distance.

For measures $\nu$, $\mu$ with common measurable $\sigma$-algebra $\mathcal{A}$, the total variation distance between $\mu$ and $\nu$ is
$$||\mu - \nu||_{TV} = \sup_{A \in \mathcal{A}} (\mu(A) - \nu(A)).$$
For an ergodic discrete-time Markov chain $X_t$ with unique stationary distribution $\pi$, the mixing time is
$$\tau(\epsilon) = \inf\{t : ||\mathcal{L}(X_t) - \pi||_{TV} < \epsilon\}.$$

Although most scientific and statistical uses of MCMC methods occur in continuous state spaces, much of the mathematical mixing analysis has been in the discrete setting. The methods that have been developed for discrete chains often break down when used to analyze continuous chains, though there are some efforts, such as [28] [24] [18], to create general techniques. This paper extends the author's previous work in [26] and work of Randall and Winkler [22], and attempts to provide some more examples of relatively sharp analyses of continuous chains similar to those used to develop the discrete theory.

The first process that we analyze is a Gibbs sampler on the simplex with a very restricted set of allowed moves. Fix a finite group $G$ of size $n$ with symmetric generating set $R$ of size $m$. For unity of notation, label the group elements with the integers from 1 to $n$. We consider the process $X_t[g]$ on the simplex $\Delta_G = \{X \in \mathbb{R}^n \,|\, \sum_{g \in G} X[g] = 1;\ X[g] \geq 0\}$. At each step, choose $g \in G$, $r \in R$ and $\lambda \in [0,1]$ uniformly, and set
$$X_{t+1}[g] = \lambda(X_t[g] + X_t[gr]),$$
$$(1)\qquad X_{t+1}[gr] = (1-\lambda)(X_t[g] + X_t[gr]).$$
For all other $h \in G$ set $X_{t+1}[h] = X_t[h]$. Let $U_G$ be the uniform distribution on $\Delta_G$; this is also the stationary distribution of $X_t$. Also consider a random walk $Z_t$ on $G$, where in each stage we choose $g \in G$ and $r \in R$ uniformly at random and set $Z_{t+1} = gr$ if $Z_t = g$, set $Z_{t+1} = g$ if $Z_t = gr$, and $Z_{t+1} = Z_t$ otherwise. This is the standard simple random walk on the Cayley graph, slowed down by a factor of about $n$. Let $\widehat{\gamma}$ be the spectral gap of the walk $Z_t$, and follow the notation that $\mathcal{L}(X)$ denotes the distribution of a random variable $X$.

Theorem 1 (Convergence Rate for Gibbs Sampler with Geometry). For $T > \frac{56k+178}{\widehat{\gamma}} \log(n)$,
$$||\mathcal{L}(X_T) - U_G||_{TV} \leq 14 n^{-k},$$
and conversely for $T < \frac{k}{\widehat{\gamma}}$,
$$||\mathcal{L}(X_T) - U_G||_{TV} \geq \frac{1}{3} e^{-k} - 3n^{-\frac{1}{2}}.$$

This substantially generalizes [22] and [26], from samplers corresponding to $G = \mathbb{Z}_n$ and $R = \{1,-1\}$ or $R = \mathbb{Z}_n \setminus \{0\}$ respectively, to general Cayley graphs.
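To make the update rule (1) concrete, here is a minimal simulation sketch (our illustration, not code from the paper; the helper names are ours, and the example at the end specializes to the Randall-Winkler case $G = \mathbb{Z}_n$, $R = \{+1,-1\}$):

```python
import numpy as np

def gibbs_step(X, group_mult, R, rng):
    """One step of the simplex Gibbs sampler of equation (1).

    X          -- length-n array indexed by group elements 0..n-1
    group_mult -- function (g, r) -> g*r implementing the group operation
    R          -- symmetric generating set (assumed not to contain the identity)
    rng        -- a numpy random Generator
    """
    n = len(X)
    g = rng.integers(n)                  # uniform group element
    r = R[rng.integers(len(R))]          # uniform generator
    gr = group_mult(g, r)
    lam = rng.uniform()                  # uniform lambda in [0, 1]
    total = X[g] + X[gr]
    X[g], X[gr] = lam * total, (1.0 - lam) * total

# The Randall-Winkler case: G = Z_n with R = {+1, -1}.
n, rng = 10, np.random.default_rng(0)
X = rng.dirichlet(np.ones(n))            # a point of the simplex Delta_G
for _ in range(1000):
    gibbs_step(X, lambda g, r: (g + r) % n, [1, -1], rng)
assert abs(X.sum() - 1.0) < 1e-9         # the update preserves the simplex
```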
In addition to being of mathematical interest, this process is an example of a gossip process with some geometry, studied by electrical engineers and sociologists interested in how information propagates through networks; see [25] for a survey. The proof of the upper bound will use an auxiliary chain similar to that found in [22], a coupling argument improved from [26], and an unusual use of comparison theory from [9]. The proof of the lower bound is elementary.

The next example concerns narrow contingency tables. Beginning with the work of Diaconis and Efron [6] on independence tests, there has been interest in finding efficient ways to sample uniformly from the collection of integer-valued matrices with given row and column sums. A great deal of this effort has been based on Markov chain Monte Carlo methods. While some of the efforts have dealt directly with Markov chains on these integer-valued matrices, much recent success, including [11] [21], has involved using knowledge of Gibbs samplers on convex sets in $\mathbb{R}^n$ and clever ways to project from the continuous chain to the desired matrices [20]. Unfortunately, while the general bounds are polynomial in the number of entries of the desired matrix, they often have a large degree and leading coefficient; see [17]. In this paper, we find some better bounds for very specific cases.

Like the paper [26], this is part of an attempt to make further use of non-Markovian coupling techniques [13] [2] [5] [19] and also to expand the small set of carefully analyzed Gibbs samplers [22] [23] [7] [8]. In this case, the new techniques are two slight modifications of the path-coupling method introduced in [4]. In many path-coupling arguments, a burn-in argument is used to show that for most pairs of points in a metric space, there is a path along which the Markov transition kernel is contractive acting on any pair of points along the path. In this argument, we show that for all paths, the kernel is contractive acting on most pairs of points along the path. This type of modification seems likely to be useful only on continuous spaces.

We consider the following Gibbs sampler $X_t[i,j]$ on the space $M_n = \{X \in \mathbb{R}^{2n} : \sum_{i=1}^n X[i,j] = n \ \forall\ 1 \leq j \leq 2,\ \sum_{j=1}^2 X[i,j] = 2 \ \forall\ 1 \leq i \leq n,\ X[i,j] \geq 0\}$ of nonnegative $n$ by 2 matrices with column sums fixed to be $n$ and row sums fixed to be 2. To make a step of the Gibbs sampler, choose two distinct integers $1 \leq i < j \leq n$ and update the four entries $X_{t+1}[i,1]$, $X_{t+1}[i,2]$, $X_{t+1}[j,1]$ and $X_{t+1}[j,2]$ to be uniform conditional on all other entries of $X_t$. Let $U_n$ be the uniform distribution on $M_n$ inherited from Lebesgue measure. Then we find the following reasonable bound on the mixing time of this sampler:

Theorem 2 (Convergence Rate for Narrow Matrices). For $T > (31k+81)\, n \log(n)$,
$$||\mathcal{L}(X_T) - U_n||_{TV} \leq 13 n^{-k},$$
and conversely for $T < (1-k)\, n \log(n)$, and $n$ sufficiently large,
$$||\mathcal{L}(X_T) - U_n||_{TV} \geq 1 - 2n^{-k}.$$
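A step of this sampler reduces to a single one-dimensional uniform draw: fixing every entry outside rows $i$ and $j$ pins down the two row sums and the quantity $X[i,1] + X[j,1]$, so one entry, uniform on its feasible interval, determines the other three. Here is a sketch of this reading (the function name and test scaffold are ours, not the paper's):

```python
import numpy as np

def narrow_gibbs_step(X, rng):
    """One step of the Gibbs sampler on M_n (n-by-2, row sums 2, column sums n).

    Conditioning on all entries outside rows i and j leaves one degree of
    freedom, X[i,0], which is uniform on the interval keeping all four
    updated entries nonnegative.
    """
    n = X.shape[0]
    i, j = rng.choice(n, size=2, replace=False)
    s = X[i, 0] + X[j, 0]                    # preserved: column-0 mass of rows i, j
    lo, hi = max(0.0, s - 2.0), min(2.0, s)  # feasible range for X[i,0]
    X[i, 0] = rng.uniform(lo, hi)
    X[j, 0] = s - X[i, 0]
    X[i, 1] = 2.0 - X[i, 0]
    X[j, 1] = 2.0 - X[j, 0]

# Start from the constant matrix, which lies in M_n.
n, rng = 8, np.random.default_rng(0)
X = np.ones((n, 2))
for _ in range(100):
    narrow_gibbs_step(X, rng)
assert np.allclose(X.sum(axis=0), n) and np.allclose(X.sum(axis=1), 2)
```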
3. General Strategy and the Partition Process

Both of our bounds will be obtained using a similar strategy, ultimately built on the classical coupling lemma. We recall that a coupling of Markov chains with transition kernel $K$ is a process $(X_t, Y_t)$ so that marginally both $X_t$ and $Y_t$ are Markov chains with transition kernel $K$. Although we always couple entire paths $\{X_t\}_{t=0}^T$ and $\{Y_t\}_{t=0}^T$, we often use the shorthand notation of saying that we are coupling $X_t$ and $Y_t$.

In order to describe a coupling, note that for both walks being studied, the evolution of the Markov chain $X_t$ can be represented by $X_{t+1} = f(X_t, i(t), j(t), \lambda(t))$, where $f$ is a deterministic function, $i(t), j(t)$ are random coordinates (either elements of $[n]$ or of a group $G$), and $\lambda(t)$ is drawn from Lebesgue measure on $[0,1]$. These representations are given in equations (8) and (1) respectively. To couple $X_t$ and $Y_t$, it is thus enough to couple the update variables $i(t)^\alpha, j(t)^\alpha, \lambda(t)^\alpha$, with $\alpha \in \{x,y\}$, used to construct $X_t$ and $Y_t$ respectively.

Our couplings will provide bounds on mixing times through the following lemma (see [15], Theorem 5.2; they work in discrete space, but their proof doesn't rely on this assumption):

Lemma 3 (Fundamental Coupling Lemma). If $(X_t, Y_t)$ is a coupling of Markov chains, $Y_0$ is distributed according to the stationary distribution of $K$, and $\tau$ is the first time at which $X_t = Y_t$, then
$$||\mathcal{L}(X_t) - \mathcal{L}(Y_t)||_{TV} \leq P[\tau > t].$$

In each chain, then, we begin with $X_t$ started at a distribution of our choice, and $Y_t$ started at stationarity. For any fixed (large) $T$, we will then couple $X_t$ and $Y_t$ so that they will have coupled by time $T$ with high probability. Each coupling will have two phases: an initial phase from time 0 to time $T_1$ in which $X_t$ and $Y_t$ get close with high probability, and a non-Markovian coupling phase from time $T_1$ to time $T = T_1 + T_2$ in which they are forced to collide. Unlike many coupling proofs, the time of interest $T$ must be specified before constructing the coupling.

While the initial contraction phases are quite different for the two chains, the final coupling phase can be described in a unified way. The unifying device is the partition process $P_t$ on set partitions of $[n] = \{1,2,\ldots,n\}$, introduced in [26] for a special case of the first sampler treated here (see that paper for details). This partition process contains some information about the coordinates $\{i(t), j(t)\}_{t=T_1}^T$ used by $Y_t$ throughout the entire process, and is the only source of information from the future that is used to construct the non-Markovian coupling. Critically, we don't use any information about the random variables $\lambda(t)$ used at each step, which makes it trivial to check that the couplings constructed in this paper have the correct marginal distributions.

The process $\{P_t\}_{t=T_1}^T$ will consist of a set of nested partitions of $[n]$, $P_{T_1} \leq P_{T_1+1} \leq \ldots \leq P_T = \{\{1\},\{2\},\ldots,\{n\}\}$, where we say partition $A$ is less than partition $B$ if every element of partition $B$ is a subset of an element of partition $A$. To construct $P_t$, we first look at the sequence of graphs $G_t$ with vertex set $[n]$ and edge set $\{(i(s), j(s)) : s > t\}$. Then let $P_t$ consist of the connected components of $G_t$. While constructing $P_t$, we will also record a series of 'marked times' $T = t_1 > t_2 > \ldots > t_m$ and associated special subsets $S(t_j, 1)$ and $S(t_j, 2)$ of $[n]$. We will set $t_1 = T$, and then inductively set $t_j = \sup\{t : t < t_{j-1},\ P_{t-1} \neq P_t\}$. Finally, note that if $P_{t-1} \neq P_t$, the only difference between them is that two elements of $P_t$ have been merged into a single element in $P_{t-1}$. Of the two sets merged at time $t_j$, label the one with fewer elements $S(t_j, 1)$, and label the other one $S(t_j, 2)$. If both sets have the same number of elements, set $S(t_j, 1)$ to be the one containing the smallest element (this is, of course, quite arbitrary).

We will be interested in the smallest time $\tau$ such that $P_{T-\tau} = \{[n]\}$, a single block.
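The partition process is easy to compute from the update coordinates. The following sketch (our illustration, with 0-indexed times) scans the pairs backwards from $T$ and records the marked times together with the smaller merged sets $S(t_j, 1)$:

```python
def partition_process(pairs):
    """Compute the marked times and sets S(t_j, 1) of the partition process.

    pairs[t] = (i(t), j(t)) for t = 0, ..., T-1.  Scanning backwards, P_t
    has one block per connected component of the graph with edge set
    {(i(s), j(s)) : s > t}; a marked time is a step at which two blocks
    merge, and S(t_j, 1) is the smaller of the two merged blocks.
    """
    n = 1 + max(max(p) for p in pairs)
    block = {v: frozenset([v]) for v in range(n)}   # P_T: all singletons
    marked = []                                     # list of (t_j, S(t_j, 1))
    for t in range(len(pairs) - 1, -1, -1):
        i, j = pairs[t]
        if block[i] != block[j]:
            # Ties in size are broken by the smallest element, as in the text.
            small = min(block[i], block[j], key=lambda b: (len(b), min(b)))
            marked.append((t, small))
            merged = block[i] | block[j]
            for v in merged:
                block[v] = merged
    return marked

# Example with n = 4: the last pair to merge two blocks is a marked time.
print(partition_process([(0, 1), (2, 3), (1, 2), (0, 3)]))
```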
From classical arguments (see e.g. chapter 7 of [3]), we have:

Lemma 4 (Connectedness). For the Gibbs sampler on narrow matrices,
$$P\left[\tau > \left(\tfrac{1}{2} + \epsilon\right) n \log(n)\right] \leq 2n^{-\epsilon}.$$

The analogous lemma for the other example will be proved in Section 5, Lemma 7.

For both of our walks, we will use two types of coupling, the 'proportional' coupling and the 'subset' coupling. In both cases, we will set $i(t)^x = i(t)^y$ and $j(t)^x = j(t)^y$ at each step. In the proportional coupling, we will also set $\lambda(t)^x = \lambda(t)^y$. To discuss the subset coupling, we must define the weight of $X_t$ on a subset $S \subset [n]$, which we call $w(X_t, S)$. For the simplex walk, we define $w(X_t, S) = \sum_{s \in S} X_t[s]$. For narrow matrices, we define $w(X_t, S) = \sum_{s \in S} X_t[s, 1]$. The subset couplings associated with subset $S \subset [n]$, which are defined immediately prior to Lemmas 18 and 8, will often set $w(X_{t+1}, S) = w(Y_{t+1}, S)$. We say that a subset coupling of subset $S$ at time $t$ succeeds if that equality holds; otherwise, we say it fails.

In each case, the coupling of $X_t$ and $Y_t$ during the non-Markovian coupling phase will be as follows. At marked times $t_j$, we will perform a subset coupling of $X_{t_j}$, $Y_{t_j}$ with respect to $S(t_j, 1)$. At all other times, we will perform a proportional coupling. This leads to:

Lemma 5 (Final Coupling). Assume the non-Markovian coupling phase lasts from time $T_1$ to $T$, that $P_{T_1} = \{[n]\}$, and that all subset couplings succeed. Then $X_T = Y_T$.

Proof. Let $F_t$ denote the collection of equations $w(X_t, S) = w(Y_t, S)$ for all $S \in P_t$. We will show by induction that the equations $F_t$ hold for all $T_1 \leq t \leq T$. At time $T_1$, we have $w(X_{T_1}, [n]) = w(Y_{T_1}, [n]) = 1$. By definition of the partition process, if $t$ is not a marked time and all equations $F_t$ hold, then all equations $F_{t+1}$ also hold. In fact, this is true for any coupling of $\lambda(t)^x, \lambda(t)^y$ at that step, not just the proportional coupling.

Assume $t = t_j$ is a marked time, and that the equations $F_{t_j}$ hold. Then if the equations $F_{t_j+1}$ don't all hold, we must have that either $w(X_{t_j+1}, S(t_j, 1)) \neq w(Y_{t_j+1}, S(t_j, 1))$ or $w(X_{t_j+1}, S(t_j, 2)) \neq w(Y_{t_j+1}, S(t_j, 2))$, since none of the terms in the other equations change. By assumption, all subset couplings have succeeded, so $w(X_{t_j+1}, S(t_j, 1)) = w(Y_{t_j+1}, S(t_j, 1))$. By construction, $w(X_{t_j+1}, S(t_j, 2)) = w(X_{t_j+1}, S(t_j, 1) \cup S(t_j, 2)) - w(X_{t_j+1}, S(t_j, 1))$ and similarly for $Y_{t_j+1}$, so $w(X_{t_j+1}, S(t_j, 2)) = w(Y_{t_j+1}, S(t_j, 2))$. Thus, the inductive claim has been proved.

Finally, we note that if $w(X_t, \{i\}) = w(Y_t, \{i\})$ for any singleton $\{i\}$, then $X_t[i] = Y_t[i]$ for the sampler on the simplex (respectively $X_t[i,j] = Y_t[i,j]$ for $j \in \{1,2\}$ for the other sampler). Since $P_T = \{\{1\},\{2\},\ldots,\{n\}\}$, this proves the lemma. $\square$

So, in both cases, showing that all subset couplings succeed with high probability is sufficient to show that coupling has succeeded.
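Before turning to the contraction estimate, here is a quick numerical illustration of the proportional coupling (our own sketch, again specializing to $G = \mathbb{Z}_n$, $R = \{+1,-1\}$); the squared distance $\sum_g (X_t[g] - Y_t[g])^2$, which is the quantity $S_t^{id}$ controlled by Lemma 6 below, decays rapidly:

```python
import numpy as np

def coupled_step(X, Y, n, rng):
    """One proportionally coupled step: shared i(t), j(t) and shared lambda."""
    g = rng.integers(n)
    r = 1 if rng.uniform() < 0.5 else -1
    gr = (g + r) % n
    lam = rng.uniform()                      # the shared lambda(t)
    for Z in (X, Y):
        total = Z[g] + Z[gr]
        Z[g], Z[gr] = lam * total, (1.0 - lam) * total

n, rng = 20, np.random.default_rng(1)
X = np.zeros(n); X[0] = 1.0                  # X started at a point mass
Y = rng.dirichlet(np.ones(n))                # Y started at stationarity
for t in range(5001):
    if t % 1000 == 0:
        print(t, np.sum((X - Y) ** 2))       # S_t^id, decaying roughly geometrically
    coupled_step(X, Y, n, rng)
```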
4. Contraction for Gibbs Samplers on the Simplex with Geometry

In this section, we prove a contraction lemma for Gibbs samplers on the simplex associated with a group $G$ and symmetric generating set $R$ of $G$ (that is, $R^{-1} = R$), where $|G| = n$, $|R| = m$, and $id$ is the identity element of $G$. We recall briefly some definitions. We write $\Delta_G = \{X \in \mathbb{R}^G \,|\, X[g] \geq 0,\ \sum_{g \in G} X[g] = 1\}$. If $X_t \in \Delta_G$ is a copy of the Markov chain, we take a step by choosing $g \in G$, $r \in R$ and $\lambda \in [0,1]$ uniformly and setting $X_{t+1}[g] = \lambda(X_t[g] + X_t[gr])$, $X_{t+1}[gr] = (1-\lambda)(X_t[g] + X_t[gr])$, and for all other entries $X_{t+1}[h] = X_t[h]$.

This walk is closely related to a slow simple random walk on the group. In particular, we let $Z_t \in G$ be the random walk that evolves by choosing at each time step a group element $g \in G$ and generator $r \in R$ uniformly at random, and setting $Z_{t+1} = Z_t r$ if $Z_t = g$, and $Z_{t+1} = Z_t$ otherwise. Let $\widehat{K}$ be the transition kernel associated with the random walk $Z_t$. Since $R$ is symmetric, the random walk is reversible, so $\widehat{K}$ can be written in a basis of orthogonal eigenvectors with real eigenvalues $1 = \widehat{\lambda}_1 > \widehat{\lambda}_2 \geq \ldots \geq \widehat{\lambda}_n \geq -1$. Since it is $\frac{1}{2}$-lazy, all eigenvalues are in fact nonnegative. Let $\widehat{\gamma} = 1 - \widehat{\lambda}_2$ be the spectral gap of $\widehat{K}$. In this section we will show:

Lemma 6 (Contraction Estimate for Gibbs Sampler on Cayley Graphs). Let $X_t$, $Y_t$ be two copies of the Gibbs sampler on the simplex associated with $G$ and $R$, with joint distribution given by a proportional coupling at each step. Then
$$E[||X_t - Y_t||_2^2] \leq 4n e^{-\lfloor t\widehat{\gamma}/8 \rfloor}.$$

Proof. We will construct an auxiliary Markov chain on $G$ associated with $X_t$, and compare it to the standard random walk $Z_t$. Let $X_t$, $Y_t$ be two copies of the walk, and couple them at each step with the proportional coupling. For $h \in G$, let $S_t^h = \sum_{g \in G} (X_t[g] - Y_t[g])(X_t[hg] - Y_t[hg])$. We will analyze the evolution of the vector $S_t = (S_t^{id}, \ldots)$.

There are three cases to analyze: $h \in R$; $h \notin R$ and $h \neq id$; and $h = id$. Let $\mathcal{F}_t$ be the $\sigma$-algebra generated by $X_s$ and $Y_s$, $0 \leq s \leq t$. For case 1, we have
$$\begin{aligned}
E[S_{t+1}^h \,|\, \mathcal{F}_t] ={}& \left(1 - \frac{4}{n} + \frac{2}{mn}\right) S_t^h \\
&+ \frac{1}{2mn} \sum_{i \in G} \sum_{\substack{r \in R \\ r \neq h, h^{-1}}} \big[(X_t[i] + X_t[ri] - Y_t[i] - Y_t[ri])(X_t[hi] - Y_t[hi]) \\
&\qquad + (X_t[ri] + X_t[i] - Y_t[ri] - Y_t[i])(X_t[hri] - Y_t[hri]) \\
&\qquad + (X_t[h^{-1}i] - Y_t[h^{-1}i])(X_t[i] + X_t[ri] - Y_t[i] - Y_t[ri]) \\
&\qquad + (X_t[h^{-1}ri] - Y_t[h^{-1}ri])(X_t[i] + X_t[ri] - Y_t[i] - Y_t[ri])\big] \\
&+ \frac{2}{mn} \sum_{i \in G} \Big[\frac{1}{6}(X_t[i] + X_t[hi] - Y_t[i] - Y_t[hi])^2 \\
&\qquad + (X_t[i] - Y_t[i])(X_t[hi] - Y_t[hi]) + (X_t[i] - Y_t[i])(X_t[h^2 i] - Y_t[h^2 i])\Big] \\
={}& \left(1 - \frac{2}{n} + \frac{2}{3mn}\right) S_t^h + \frac{2}{3mn} S_t^{id} + \frac{2}{mn} S_t^{h^2} + \frac{1}{2mn} \sum_{\substack{r \in R \\ r \neq h, h^{-1}}} \left(S_t^{hr^{-1}} + S_t^{hr} + S_t^{rh} + S_t^{rh^{-1}}\right),
\end{aligned}$$
and we note that the sum of the coefficients is $1 - \frac{2}{3mn}$. For case 2, we have
$$\begin{aligned}
E[S_{t+1}^h \,|\, \mathcal{F}_t] &= \left(1 - \frac{4}{n}\right) S_t^h + \frac{2m}{mn} S_t^h + \frac{1}{2mn} \sum_{r \in R} \left(S_t^{hr^{-1}} + S_t^{hr} + S_t^{rh} + S_t^{rh^{-1}}\right) \\
&= \left(1 - \frac{2}{n}\right) S_t^h + \frac{1}{2mn} \sum_{r \in R} \left(S_t^{hr^{-1}} + S_t^{hr} + S_t^{rh} + S_t^{rh^{-1}}\right),
\end{aligned}$$
where the sum of the coefficients is 1. Finally, in case 3, we have
$$\begin{aligned}
E[S_{t+1}^{id} \,|\, \mathcal{F}_t] &= \left(1 - \frac{2}{n}\right) S_t^{id} + \frac{2}{3mn} \sum_{r \in R} \sum_{i \in G} (X_t[i] + X_t[ri] - Y_t[i] - Y_t[ri])^2 \\
&= \left(1 - \frac{2}{3n}\right) S_t^{id} + \frac{4}{3mn} \sum_{r \in R} S_t^r,
\end{aligned}$$
and here the sum of the coefficients is $1 + \frac{2}{3n}$. If we rewrite $U_t^{id} = \frac{1}{2} S_t^{id}$, and otherwise $U_t^g = S_t^g$, then we find the following transformations. For case 1, we have
$$(2)\qquad E[U_{t+1}^h \,|\, \mathcal{F}_t] = \left(1 - \frac{2}{n} + \frac{2}{3mn}\right) U_t^h + \frac{4}{3mn} U_t^{id} + \frac{2}{mn} U_t^{h^2} + \frac{1}{2mn} \sum_{\substack{r \in R \\ r \neq h, h^{-1}}} \left(U_t^{hr^{-1}} + U_t^{hr} + U_t^{rh} + U_t^{rh^{-1}}\right).$$
For case 2, we have
$$(3)\qquad E[U_{t+1}^h \,|\, \mathcal{F}_t] = \left(1 - \frac{2}{n}\right) U_t^h + \frac{1}{2mn} \sum_{r \in R} \left(U_t^{hr^{-1}} + U_t^{hr} + U_t^{rh} + U_t^{rh^{-1}}\right).$$
Finally, in case 3, we have
$$(4)\qquad E[U_{t+1}^{id} \,|\, \mathcal{F}_t] = \left(1 - \frac{2}{3n}\right) U_t^{id} + \frac{2}{3mn} \sum_{r \in R} U_t^r,$$
where the sum of the coefficients is now 1 in all three cases. In particular, the equations (2) to (4) now define a Markov chain on $G$. From equation (4), this random walk sends the identity to itself with probability $1 - \frac{2}{3n}$, and to a uniformly chosen element of $R$ with the remaining probability; equations (2) and (3) describe the transitions for $h \in R$ and $h \notin R$ respectively. Call the transition kernel $K$.
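As a sanity check on this bookkeeping, the sketch below (ours, not the paper's) assembles $K$ from equations (2)-(4) and $\widehat{K}$ for $G = \mathbb{Z}_n$ with $R = \{+1,-1\}$, verifies that the rows of $K$ sum to one, and compares the two spectral gaps numerically, anticipating the bound $\gamma \geq \frac{1}{8}\widehat{\gamma}$ proved below:

```python
import numpy as np

n, m = 12, 2
R = [1, n - 1]                                  # generators +1, -1 of Z_n, written mod n

# The auxiliary kernel K of equations (2)-(4); the element 0 plays the role of id.
K = np.zeros((n, n))
for h in range(n):
    if h == 0:                                  # equation (4): h = id
        K[h, h] = 1 - 2 / (3 * n)
        for r in R:
            K[h, r] += 2 / (3 * m * n)
    elif h in R:                                # equation (2): h in R
        K[h, h] += 1 - 2 / n + 2 / (3 * m * n)
        K[h, 0] += 4 / (3 * m * n)
        K[h, (2 * h) % n] += 2 / (m * n)
        for r in R:
            if r == h or r == (n - h) % n:      # skip r = h and r = h^{-1}
                continue
            for g in ((h - r) % n, (h + r) % n, (r + h) % n, (r - h) % n):
                K[h, g] += 1 / (2 * m * n)      # h r^{-1}, h r, r h, r h^{-1}
    else:                                       # equation (3): h not in R, h != id
        K[h, h] += 1 - 2 / n
        for r in R:
            for g in ((h - r) % n, (h + r) % n, (r + h) % n, (r - h) % n):
                K[h, g] += 1 / (2 * m * n)

assert np.allclose(K.sum(axis=1), 1.0)          # the coefficients sum to one

# The slowed simple random walk Z_t on the Cayley graph.
Khat = np.zeros((n, n))
for z in range(n):
    Khat[z, z] += 1 - 1 / n
    for r in R:
        Khat[z, (z + r) % n] += 1 / (m * n)

gap = 1 - np.sort(np.linalg.eigvals(K).real)[-2]
gap_hat = 1 - np.sort(np.linalg.eigvals(Khat).real)[-2]
print(gap, gap_hat / 8, gap >= gap_hat / 8)     # consistent with gamma >= gamma_hat / 8
```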
Before analyzing the chain, we note that $\sum_{i \in G} (X_t[i] - Y_t[i]) = 0$, and so
$$0 = \left(\sum_{i \in G} (X_t[i] - Y_t[i])\right)^2 = \sum_{i \in G} (X_t[i] - Y_t[i])^2 + \sum_{i \neq j} (X_t[i] - Y_t[i])(X_t[j] - Y_t[j]) = S_t^{id} + \sum_{h \neq id} S_t^h.$$
From this calculation, if $\langle v, (2,1,1,\ldots,1) \rangle = 0$, then $\langle Kv, (2,1,1,\ldots,1) \rangle = 0$ as well. By direct computation, $\pi = \frac{1}{n+1}(2,1,1,\ldots,1)$ is a reversible measure for $K$. It is also clear that the distribution $\widehat{\pi} = \frac{1}{n}(1,1,\ldots,1)$ is the reversible measure for $\widehat{K}$.

We are now ready to compare the chains. Recall from [10] that the Dirichlet form associated to a Markov chain with transition kernel $Q$ and stationary distribution $\nu$ is given by
$$\mathcal{E}(\phi) = \frac{1}{2} \sum_{h,g \in G} \nu(g)\, Q(g,h)\, (\phi(g) - \phi(h))^2.$$
Let $\mathcal{E}$ and $\widehat{\mathcal{E}}$ be the Dirichlet forms associated with $K$ and $\widehat{K}$ respectively. Then by comparing terms, it is clear that $\mathcal{E}(\phi) \geq \frac{1}{4}\widehat{\mathcal{E}}(\phi)$ for any $\phi$, and that $\frac{\pi}{\widehat{\pi}}, \frac{\widehat{\pi}}{\pi} \leq 2$. By e.g. Lemma 13.12 of [15], this implies $\gamma \geq \frac{1}{8}\widehat{\gamma}$.

Recall that if $\langle v, \pi \rangle = 0$, then $\langle K^m v, \pi \rangle = 0$ as well. In particular, $K$ applied to the subspace orthogonal to $\pi$ has $L^2 \to L^2$ operator norm at most $1 - \gamma$. Thus, we have for any $v$ in that subspace
$$||K^m v||_2 \leq e^{-\lfloor \gamma m \rfloor} ||v||_2.$$
Going back to our original situation, we are interested in the vector $(S_t^g)$. At time 0, $S_0^{id} \leq 2$, and by Cauchy-Schwarz $|S_0^h| \leq 4$. Thus $||U_0^g||_2 \leq 4n$, and of course $|S_t^{id}| \leq ||S_t^g||_2 \leq ||U_t^g||_2$. So we find that
$$E[|S_t^{id}|] \leq 4n e^{-\lfloor t\widehat{\gamma}/8 \rfloor},$$
which is the contraction estimate in Lemma 6. $\square$

5. Coupling for Gibbs Samplers on the Simplex with Geometry

Having shown contraction, we must now show convergence in total variation distance. First, the analogue of Lemma 4:

Lemma 7 (Connectedness for Gibbs Sampler on Cayley Graphs). Let $\tau$ be as defined immediately before Lemma 4 and let $\widehat{\gamma}$ be as defined immediately before Theorem 1. Then for $t > \frac{8(C+3)}{\widehat{\gamma}} \log(n)$, we have
$$P[\tau > t] \leq 2n^{-C}.$$

Proof. We consider a graph-valued process $G_t$, where $G_0$ is a graph with no edges, and vertex set equal to the group $G$. To construct $G_{t+1}$ from $G_t$, choose elements $g \in G$ and $r \in R$ uniformly at random, and add the edge $(g, gr)$ if it isn't already in $G_t$. We note that $\tau > t$ if and only if $G_t$ is not connected, so we would like to estimate the time at which $G_t$ becomes connected.

First, fix two elements $x, y \in G$. We'd like to see if $x$, $y$ are in the same component of $G_t$. To do so, let $X_t$, $Y_t$ be two copies of the Gibbs sampler described in the last section, with $X_0 = \delta_x$ and $Y_0 = \delta_y$ (point masses at $x$ and $y$). Couple $X_t$, $Y_t$ and $G_t$ by the proportional coupling. Then assume $x$, $y$ are in different components $C_x$, $C_y$ at time $t$. We would have
$$\sum_{g \in G} |X_t[g] - Y_t[g]|^2 \geq \sum_{g \in C_x} \frac{1}{|C_x|^2} + \sum_{g \in C_y} \frac{1}{|C_y|^2} \geq \frac{4}{n}.$$
By Markov's inequality, then,
$$P[C_x \neq C_y] \leq \frac{n}{4}\, E\Big[\sum_g |X_t - Y_t|^2\Big],$$
and so, by a standard union bound for fixed $x$ over all $y$, if $A_t$ is the event that $G_t$ is disconnected,
$$P[A_t] \leq \frac{n^2}{4} \sup_{\mu,\nu} E\Big[\sum_g |X_t - Y_t|^2 \,\Big|\, X_0 = \mu,\ Y_0 = \nu\Big] \leq 2n^3 e^{-\lfloor t\widehat{\gamma}/8 \rfloor},$$
where the last inequality is due to Lemma 6. $\square$
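The graph-valued process in this proof is simple to simulate. The sketch below (ours) estimates the connection time for $G = \mathbb{Z}_n$ with $R = \{+1,-1\}$; for the cycle the observed time is on the coupon-collector scale $n \log(n)$, comfortably inside the general bound of Lemma 7, which for this group is of order $n^3 \log(n)$ since $\widehat{\gamma} \approx 2\pi^2/n^3$:

```python
import numpy as np

def connect_time(n, rng):
    """Steps until the graph process G_t on Z_n with R = {+1, -1} is connected.

    Each step adds the edge (g, g+r) for uniform g and r; the cycle is
    connected once at most one of its n edges is missing.
    """
    seen = set()                        # undirected edges (g, g+1), keyed by g
    t = 0
    while len(seen) < n - 1:
        g = rng.integers(n)
        r = 1 if rng.uniform() < 0.5 else -1
        seen.add(g if r == 1 else (g - 1) % n)
        t += 1
    return t

rng = np.random.default_rng(2)
n = 50
times = [connect_time(n, rng) for _ in range(200)]
print(np.mean(times), n * np.log(n))    # coupon-collector scale, well inside the bound
```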
Next, we define subset couplings and discuss success probabilities for this walk. Fix points $X_t$, $Y_t$, a subset $S \subset [n]$, and updated coordinates $i = i(t) \in S$, $j = j(t) \notin S$. The next step is to construct a pair of uniform random variables $\lambda_x = \lambda(t)^x$ and $\lambda_y = \lambda(t)^y$ with which to update the chains $X_t$ and $Y_t$ respectively. Assume first that $\frac{X_t[i]+X_t[j]}{Y_t[i]+Y_t[j]} < 1$, and choose $\lambda_y$ uniformly in $[0,1]$. Then set
$$(5)\qquad \lambda_x = \lambda_y\, \frac{Y_t[i]+Y_t[j]}{X_t[i]+X_t[j]} + \frac{1}{X_t[i]+X_t[j]} \sum_{s \in S \setminus \{i\}} (Y_t[s] - X_t[s])$$
if that results in a value between 0 and 1. Otherwise, choose $\lambda_x$ independently of $\lambda_y$, according to the density
$$(6)\qquad f(\lambda) = C \left(1 - \frac{X_t[i]+X_t[j]}{Y_t[i]+Y_t[j]}\, 1_{g(\lambda) \in [0,1]}(\lambda)\right),$$
where $C$ is the normalizing constant that makes $\int_0^1 f(\lambda)\, d\lambda = 1$, and
$$g(\lambda) = \lambda\, \frac{Y_t[i]+Y_t[j]}{X_t[i]+X_t[j]} + \frac{1}{X_t[i]+X_t[j]} \sum_{s \in S \setminus \{i\}} (Y_t[s] - X_t[s]).$$
From the assumption that $\frac{X_t[i]+X_t[j]}{Y_t[i]+Y_t[j]} < 1$, it is easy to see that $f$ really is a density on $[0,1]$. From its construction as a remainder density, it is easy to check that under this coupling, $\lambda_x$ is uniformly distributed on $[0,1]$. If $\frac{X_t[i]+X_t[j]}{Y_t[i]+Y_t[j]} > 1$, an analogous construction will work. More precisely, in this case choose $\lambda_x$ first, and then choose $\lambda_y$ to satisfy equation (5) if the result is in $[0,1]$, rather than choosing $\lambda_y$ first. If the result is not in $[0,1]$, then choose $\lambda_y$ according to its remainder measure, given by equation (6) with $X_t$ and $Y_t$ flipped and $g$ replaced by $g^{-1}$. Note that if equation (5) is satisfied, then $w(X_{t+1}, S) = w(Y_{t+1}, S)$.
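In code, the construction reads as follows (our sketch; it interprets the indicator in (6) as restricting $\lambda$ to the image of the affine map $g$ from equation (5), and samples the remainder density by rejection):

```python
import numpy as np

def subset_coupling(X, Y, S, i, j, rng):
    """Draw (lambda_x, lambda_y) as in equations (5)-(6).

    Assumes (X[i]+X[j])/(Y[i]+Y[j]) < 1; the flipped case is symmetric.
    Both outputs are marginally Uniform[0,1]; the coupling succeeds when
    equation (5) holds, which forces w(X_{t+1}, S) = w(Y_{t+1}, S).
    """
    ratio = (X[i] + X[j]) / (Y[i] + Y[j])
    shift = sum(Y[s] - X[s] for s in S if s != i) / (X[i] + X[j])
    lam_y = rng.uniform()
    lam_x = lam_y / ratio + shift              # equation (5)
    if 0.0 <= lam_x <= 1.0:
        return lam_x, lam_y, True              # success
    # Remainder density (6), sampled by rejection from Uniform[0,1]:
    # accept lambda with probability 1 - ratio when it lies in the image
    # g([0,1]) of the map in (5), and with probability 1 otherwise.
    while True:
        lam = rng.uniform()
        in_image = 0.0 <= (lam - shift) * ratio <= 1.0
        if rng.uniform() < 1.0 - ratio * in_image:
            return lam, lam_y, False           # failure
```

The returned flag records whether equation (5) held, i.e. whether this subset coupling succeeded.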
For a pair of points $(x,y)$ in the simplex, a pair of update entries $(i,j)$, and a subset $S \subset [n]$ of interest such that $i \in S$ and $j \notin S$, we define $p(x,y,i,j,S)$ to be the probability that the associated subset coupling succeeds. Then the following lemma from [26] gives a lower bound on this probability: