Rapid Mixing of Gibbs Sampling on Graphs that are Sparse on Average

Elchanan Mossel*    Allan Sly†

January 29, 2009

arXiv:0704.3603v3 [math.PR]

Abstract

Gibbs sampling, also known as Glauber dynamics, is a popular technique for sampling high-dimensional distributions defined on graphs. Of special interest is the behavior of Gibbs sampling on the Erdős–Rényi random graph G(n,d/n), where each edge is chosen independently with probability d/n and d is fixed. While the average degree in G(n,d/n) is d(1−o(1)), it contains many nodes of degree of order log n / log log n.

The existence of nodes of almost logarithmic degree implies that for many natural distributions defined on G(n,p), such as uniform coloring (with a constant number of colors) or the Ising model at any fixed inverse temperature β, the mixing time of Gibbs sampling is at least n^{1+Ω(1/log log n)}. Recall that the Ising model with inverse temperature β defined on a graph G = (V,E) is the distribution over {±}^V given by P(σ) = (1/Z) exp(β Σ_{(v,u)∈E} σ(v)σ(u)). High-degree nodes pose a technical challenge in proving polynomial-time mixing of the dynamics for many models, including the Ising model and coloring. Almost all known sufficient conditions, in terms of β or the number of colors needed for rapid mixing of Gibbs samplers, are stated in terms of the maximum degree of the underlying graph.

In this work we show that for every d < ∞ and the Ising model defined on G(n,d/n), there exists a β_d > 0 such that for all β < β_d, with probability going to 1 as n → ∞, the mixing time of the dynamics on G(n,d/n) is polynomial in n. Our results are the first polynomial-time mixing results proven for a natural model on G(n,d/n) for d > 1 where the parameters of the model do not depend on n. They also provide a rare example where one can prove polynomial-time mixing of a Gibbs sampler in a situation where the actual mixing time is slower than n·polylog(n). Our proof exploits in novel ways the local tree-like structure of Erdős–Rényi random graphs, comparison and block dynamics arguments, and a recent result of Weitz.
Our results extend to much more general families of graphs which are sparse in some average sense and to much more general interactions. In particular, they apply to any graph for which every vertex v of the graph has a neighborhood N(v) of radius O(log n) in which the induced sub-graph is a tree union at most O(log n) edges, and where for each simple path in N(v) the sum of the vertex degrees along the path is O(log n). Moreover, our results apply also in the case of arbitrary external fields and provide the first FPRAS for sampling the Ising distribution in this case. We finally present a non-Markov-chain algorithm for sampling the distribution which is effective for a wider range of parameters. In particular, for G(n,d/n) it applies for all external fields and β < β̄_d, where d·tanh(β̄_d) = 1 is the critical point for decay of correlation for the Ising model on G(n,d/n).

Keywords: Erdős–Rényi Random Graphs, Gibbs Samplers, Glauber Dynamics, Mixing Time, Ising model.

*Email: [email protected], U.C. Berkeley. Supported by an Alfred Sloan Fellowship in Mathematics and by NSF grants DMS-0528488, DMS-0548249 (CAREER) and by DOD ONR grant N0014-07-1-05-06.
†Email: [email protected], U.C. Berkeley.

1 Introduction

Efficient approximate sampling from Gibbs distributions is a central challenge of randomized algorithms. Examples include sampling from the uniform distribution over independent sets of a graph [27, 26, 6, 8], sampling from the uniform distribution of perfect matchings in a graph [17], or sampling from the uniform distribution of colorings [12, 4, 5] of a graph. A natural family of approximate sampling techniques is given by Gibbs samplers, also known as Glauber dynamics. These are reversible Markov chains that have the desired distribution as their stationary distribution and where at each step the status of one vertex is updated. It is typically easy to establish that the chains will eventually converge to the desired distribution. Studying the convergence rate of the dynamics is interesting from both the theoretical computer science and the statistical physics perspectives.
Approximate convergence in time polynomial in the size of the system, sometimes called rapid mixing, is essential in computer science applications. The convergence rate is also of natural interest in physics, where the dynamical properties of such distributions are extensively studied; see e.g. [20]. Much recent work has been devoted to determining sufficient and necessary conditions for rapid convergence of Gibbs samplers. A common feature of most of this work [27, 26, 6, 8, 12, 4, 18, 22] is that the conditions for convergence are stated in terms of the maximal degree of the underlying graph. In particular, these results do not allow for the analysis of the mixing rate of Gibbs samplers on the Erdős–Rényi random graph, which is sparse on average but has rare denser sub-graphs. Recent work has been directed at showing how to relax statements so that they do not involve maximal degrees [5, 13], but the results are not strong enough to imply rapid mixing of Gibbs sampling for the Ising model on G(n,d/n) for d > 1 and any β > 0, or for sampling uniform colorings from G(n,d/n) for d > 1 and 1000d colors. The second challenge is presented as the major open problem of [5].

In this paper we give the first rapid convergence result of Gibbs samplers for the Ising model on Erdős–Rényi random graphs in terms of the average degree and β only. Our results hold for the Ising model allowing different interactions and arbitrary external fields. We note that there is an FPRAS that samples from the Ising model on any graph [16] as long as all the interactions are positive and the external field is the same for all vertices. However, these results do not provide an FPRAS in the case where different nodes have different external fields, as we do here. Our results are further extended to much more general families of graphs that are "tree-like" and "sparse on average".
These are graphs where every vertex has a radius-O(log n) neighborhood which is a tree with at most O(log n) edges added, and where for each simple path in the neighborhood, the sum of degrees along the path is O(log n). An important open problem [5] is to establish similar conditions for other models defined on graphs, such as the uniform distribution over colorings.

Below we define the Ising model and Gibbs samplers and state our main result. Some related work and a sketch of the proof are also given in the introduction. Section 2 gives a more detailed proof, though we have not tried to optimize any of the parameters in the proofs below.

1.1 The Ising Model

The Ising model is perhaps the simplest model defined on graphs. This model defines a distribution on labelings of the vertices of the graph by + and −. The Ising model has various natural generalizations, including the uniform distribution over colorings. The Ising model with varying parameters is of use in a variety of areas of machine learning, most notably in vision; see e.g. [9].

Definition 1.1 The (homogeneous) Ising model on a (weighted) graph G with inverse temperature β is a distribution on configurations {±}^V such that

    P(σ) = (1/Z(β)) exp(β Σ_{{v,u}∈E} σ(v)σ(u))    (1)

where Z(β) is a normalizing constant. More generally, we will be interested in (inhomogeneous) Ising models defined by:

    P(σ) = (1/Z(β)) exp(Σ_{{v,u}∈E} β_{u,v} σ(v)σ(u) + Σ_v h_v σ(v)),    (2)

where the h_v are arbitrary and where β_{u,v} ≥ 0 for all u and v. In the more general case we will write β = max_{u,v} β_{u,v}.

1.2 Gibbs Sampling

The Gibbs sampler is a Markov chain on configurations where a configuration σ is updated by choosing a vertex v uniformly at random and assigning it a spin according to the Gibbs distribution conditional on the spins on G − {v}.

Definition 1.2 Given a graph G = (V,E) and an inverse temperature β, the Gibbs sampler is the discrete-time Markov chain on {±}^V where, given the current configuration σ, the next configuration σ′ is obtained by choosing a vertex v in V uniformly at random and

• Letting σ′(w) = σ(w) for all w ≠ v.
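As a concrete illustration of definitions (1) and (2), on a small graph the distribution can be tabulated by brute force over all 2^n configurations. The sketch below is not from the paper; the triangle graph and the value β = 0.5 are illustrative choices:

```python
import itertools
import math

def ising_probs(n, edges, beta, h=None):
    """Exact Ising distribution P(sigma) on n vertices, models (1)/(2),
    by brute-force enumeration of all 2^n configurations."""
    h = h or [0.0] * n
    weights = {}
    for sigma in itertools.product([-1, 1], repeat=n):
        energy = beta * sum(sigma[u] * sigma[v] for u, v in edges)
        energy += sum(h[v] * sigma[v] for v in range(n))
        weights[sigma] = math.exp(energy)
    Z = sum(weights.values())          # normalizing constant Z(beta)
    return {s: w / Z for s, w in weights.items()}

# Triangle graph with beta = 0.5 and no external field.
P = ising_probs(3, [(0, 1), (1, 2), (0, 2)], 0.5)
assert abs(sum(P.values()) - 1) < 1e-12
assert P[(1, 1, 1)] == max(P.values())  # aligned states are most likely
```

With β > 0 and no field the two fully aligned states carry the largest probability, matching the ferromagnetic intuition behind (1).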
• Assigning σ′(v) the spin + with probability

    exp(h_v + Σ_{u:(v,u)∈E} β_{u,v} σ(u)) / [exp(h_v + Σ_{u:(v,u)∈E} β_{u,v} σ(u)) + exp(−h_v − Σ_{u:(v,u)∈E} β_{u,v} σ(u))].

We will be interested in the time it takes the dynamics to get close to the distributions (1) and (2). The mixing time τ_mix of the chain is defined as the number of steps needed in order to guarantee that the chain, starting from an arbitrary state, is within total variation distance 1/2e from the stationary distribution. We will bound the mixing time by the relaxation time defined below.

It is well known that Gibbs sampling is a reversible Markov chain with stationary distribution P. Let 1 = λ_1 > λ_2 ≥ ... ≥ λ_m ≥ −1 denote the eigenvalues of the transition matrix of Gibbs sampling. The spectral gap is min{1−λ_2, 1−|λ_m|} and the relaxation time τ is the inverse of the spectral gap. The relaxation time can be given in terms of the Dirichlet form of the Markov chain by the equation

    τ = sup_f { Σ_σ P(σ)(f(σ))² / Σ_{σ≠τ} Q(σ,τ)(f(σ)−f(τ))² : Σ_σ P(σ)f(σ) = 0 }    (3)

where f : {±}^V → R is any nonzero function on configurations, Q(σ,τ) = P(σ)P(σ → τ) and P(σ → τ) is the transition probability from σ to τ. We use the result that for reversible Markov chains the relaxation time satisfies

    τ ≤ τ_mix ≤ τ (1 + (1/2) log (min_σ P(σ))^{−1})    (4)

where τ_mix is the mixing time (see e.g. [2]), and so by bounding the relaxation time we can bound the mixing time up to a polynomial factor.

While our results are given for the discrete-time Gibbs sampler described above, it will at times be convenient to consider the continuous-time version of the model. Here sites are updated at rate 1 by independent Poisson clocks. The two chains are closely related: the relaxation time of the discrete-time Markov chain is n times the relaxation time of the continuous-time chain (see e.g. [2]).

For our proofs it will be useful to use the notion of block dynamics. The Gibbs sampler can be generalized to update blocks of vertices rather than individual vertices.
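The single-site update rule of Definition 1.2 translates directly into code. A minimal sketch of one discrete-time Glauber step, with an illustrative adjacency-list graph representation not taken from the paper:

```python
import math
import random

def glauber_step(sigma, adj, beta, h, rng):
    """One step of the discrete-time Gibbs sampler (Definition 1.2):
    pick a uniform vertex v and resample its spin conditional on
    its neighbors, with external field h[v]."""
    v = rng.randrange(len(sigma))
    field = h[v] + beta * sum(sigma[u] for u in adj[v])
    p_plus = math.exp(field) / (math.exp(field) + math.exp(-field))
    sigma[v] = 1 if rng.random() < p_plus else -1
    return sigma

# Run the chain for a while on a 4-cycle.
rng = random.Random(0)
adj = {0: [1, 3], 1: [0, 2], 2: [1, 3], 3: [2, 0]}
sigma = [1, -1, 1, -1]
for _ in range(1000):
    glauber_step(sigma, adj, beta=0.3, h=[0.0] * 4, rng=rng)
assert all(s in (-1, 1) for s in sigma)
```

Each step touches a single vertex, which is why n single-site steps correspond to one unit of continuous time in the rate-1 Poisson-clock version.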
For blocks V_1, V_2, ..., V_k ⊆ V with V = ∪_i V_i, the block dynamics of the Gibbs sampler updates a configuration σ by choosing a block V_i uniformly at random and assigning the spins in V_i according to the Gibbs distribution conditional on the spins on G − {V_i}. There is also a continuous analog in which the blocks each update at rate 1. In continuous time, the relaxation time of the Gibbs sampler can be given in terms of the relaxation time of the block dynamics and the relaxation times of the Gibbs sampler on the blocks.

Proposition 1.3 In continuous time, if τ_block is the relaxation time of the block dynamics and τ_i is the maximum of the relaxation times on V_i given any boundary condition from G − {V_i}, then by Proposition 3.4 of [20]

    τ ≤ τ_block (max_i τ_i) max_{v∈V} #{j : v ∈ V_j}.    (5)

1.2.1 Monotone Coupling

For two configurations X, Y ∈ {−,+}^V we let X ≥ Y denote that X is greater than or equal to Y pointwise. When all the interactions β_{ij} are positive, it is well known that the Ising model is a monotone system under this partial ordering, that is, if X ≥ Y then

    P(σ_v = + | σ_{V∖{v}} = X_{V∖{v}}) ≥ P(σ_v = + | σ_{V∖{v}} = Y_{V∖{v}}).

As it is a monotone system, there exists a coupling of Markov chains {X_t^x}_{x∈{−,+}^V} such that marginally each has the law of the Gibbs sampler with starting configuration X_0^x = x, and further, if x ≥ y then for all t, X_t^x ≥ X_t^y. This is referred to as the monotone coupling and can be constructed as follows: let v_1, ... be a random sequence of vertices updated by the Gibbs sampler, and associate with them i.i.d. random variables U_1, ... distributed as U[0,1] which determine how each site is updated. At the i-th update the site v_i is updated to + if

    U_i ≤ exp(h_v + Σ_{u:(v,u)∈E} β_{u,v} σ(u)) / [exp(h_v + Σ_{u:(v,u)∈E} β_{u,v} σ(u)) + exp(−h_v − Σ_{u:(v,u)∈E} β_{u,v} σ(u))]

and to − otherwise. It is well known that such transitions preserve the partial ordering, which guarantees that if x ≥ y then X_t^x ≥ X_t^y by the monotonicity of the system.
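The monotone coupling can be simulated by driving every chain with the same vertex choice and the same uniform variable U_i. A minimal sketch (the path graph and parameter values are illustrative, not from the paper) that checks the ordering is preserved between the all-+ and all-− chains:

```python
import math
import random

def coupled_step(chains, adj, beta, h, rng):
    """Update every chain with the SAME vertex v and uniform U, so the
    monotone coupling keeps chains ordered when interactions are positive."""
    v = rng.randrange(len(adj))
    u_rand = rng.random()
    for sigma in chains:
        field = h[v] + beta * sum(sigma[w] for w in adj[v])
        p_plus = math.exp(field) / (math.exp(field) + math.exp(-field))
        sigma[v] = 1 if u_rand <= p_plus else -1

rng = random.Random(1)
adj = {0: [1], 1: [0, 2], 2: [1]}  # a path on 3 vertices (illustrative)
top, bottom = [1, 1, 1], [-1, -1, -1]
for _ in range(500):
    coupled_step([top, bottom], adj, beta=0.4, h=[0.0] * 3, rng=rng)
    assert all(t >= b for t, b in zip(top, bottom))  # ordering preserved
```

The higher chain always sees a field at least as large, so its +-probability is at least as large; a shared U therefore never lets the lower chain flip to + while the upper flips to −. Once `top == bottom` all intermediate starting states have coupled as well.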
In particular, this implies that it is enough to bound the time taken to couple from the all-+ and all-− starting configurations.

1.3 Erdős–Rényi Random Graphs and Other Models of Graphs

The Erdős–Rényi random graph G(n,p) is the graph with n vertices V and random edges E, where each potential edge (u,v) ∈ V × V is chosen independently with probability p. We take p = d/n where d ≥ 1 is fixed. In the case d < 1, it is well known that with high probability all components of G(n,p) are of logarithmic size, which immediately implies that the dynamics mix in polynomial time for all β.

For a vertex v in G(n,d/n), let V(v,l) = {u ∈ G : d(u,v) ≤ l}, the set of vertices within distance l of v, let S(v,l) = {u ∈ G : d(u,v) = l}, let E(v,l) = {(u,w) ∈ G : u,w ∈ V(v,l)} and let B(v,l) be the graph (V(v,l), E(v,l)). Our results only require some simple features of the neighborhoods of all vertices in the graph.

Definition 1.4 Let G = (V,E) be a graph and v a vertex in G. Let t(G) denote the tree excess of G, i.e.,

    t(G) = |E| − |V| + 1.

We call a path v_1, v_2, ... self-avoiding if for all i ≠ j it holds that v_i ≠ v_j. We let the maximal path density m be defined by

    m(G,v,l) = max_Γ Σ_{u∈Γ} d_u

where the maximum is taken over all self-avoiding paths Γ starting at v with length at most l, and d_u is the degree of node u. We write t(v,l) for t(B(v,l)) and m(v,l) for m(B(v,l),v,l).

1.4 Our Results

Throughout we will be using the term with high probability to mean with probability 1 − o(1) as n goes to ∞.

Theorem 1.5 Let G be a random graph distributed as G(n,d/n). When

    tanh(β) < 1/(e²d),

there exists a constant C = C(d) such that the mixing time of the Glauber dynamics is O(n^C) with high probability (probability 1 − o(1)) over the graph as n goes to ∞. The result holds for the homogeneous model (1) and for the inhomogeneous model (2) provided |h_v| ≤ 100βn for all v.

Note that in the theorem above the O(·) bound depends on β. It may be viewed as a special case of the following more general result.

Theorem 1.6 Let G = (V,E) be any graph on n vertices satisfying the following properties.
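The two neighborhood statistics of Definition 1.4 can be computed directly: a BFS collects B(v,l), the tree excess is |E|−|V|+1, and a DFS over self-avoiding paths maximizes the degree sum. A minimal sketch (the 4-cycle test graph is an illustrative choice; the exhaustive path search is only meant for the small, tree-like neighborhoods the paper considers):

```python
import collections

def neighborhood_stats(adj, v, l):
    """Compute t(v,l) = |E|-|V|+1 (tree excess of B(v,l)) and m(v,l),
    the maximal sum of degrees over self-avoiding paths from v with
    at most l edges, as in Definition 1.4."""
    dist = {v: 0}                      # BFS to collect V(v,l)
    queue = collections.deque([v])
    while queue:
        u = queue.popleft()
        if dist[u] == l:
            continue
        for w in adj[u]:
            if w not in dist:
                dist[w] = dist[u] + 1
                queue.append(w)
    verts = set(dist)
    edges = {frozenset((u, w)) for u in verts for w in adj[u] if w in verts}
    t = len(edges) - len(verts) + 1
    deg = {u: sum(1 for w in adj[u] if w in verts) for u in verts}
    best = [deg[v]]
    def dfs(u, visited, total, steps):  # exhaustive self-avoiding paths
        best[0] = max(best[0], total)
        if steps == l:
            return
        for w in adj[u]:
            if w in verts and w not in visited:
                dfs(w, visited | {w}, total + deg[w], steps + 1)
    dfs(v, {v}, deg[v], 0)
    return t, best[0]

# 4-cycle: the radius-2 ball around any vertex is the whole cycle.
adj = {0: [1, 3], 1: [0, 2], 2: [1, 3], 3: [2, 0]}
t, m = neighborhood_stats(adj, 0, 2)
assert t == 1   # one independent cycle: |E|-|V|+1 = 4-4+1
assert m == 6   # three vertices of degree 2 on a longest path
```

Theorems 1.5 and 1.6 below only ever need these two quantities for balls of radius a log n.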
There exist a > 0, 0 < b < ∞ and 0 < c < ∞ such that for all v ∈ V it holds that

    t(v, a log n) ≤ b log n,    m(v, a log n) ≤ c log n.

Then if

    tanh(β) < a / (e^{1/a}(c−a)),

there exists a constant C = C(a,b,c,β) such that the mixing time of the Glauber dynamics is O(n^C). The result holds for the homogeneous model (1) and for the inhomogeneous model (2) provided |h_v| ≤ 100βn for all v.

Remark 1.7 The condition that |h_v| ≤ 100βn for all v will be needed in the proof of the result in the general case (2). However, we note that given Theorem 1.6 as a black box, it is easy to extend the result and provide an efficient sampling algorithm in the general case without any bounds on the h_v. In the case where some of the vertices v satisfy |h_v| ≥ 10βn, it is easy to see that, except with exponentially small probability, the target distribution satisfies σ_v = + for all v with h_v > 10βn and σ_v = − for all v with h_v < −10βn. Thus we may set σ_v = + when h_v > 10βn and σ_v = − when h_v < −10βn, and consider the dynamics where these values are fixed. Doing so will effectively restrict the dynamics to the graph spanned by the remaining vertices and will modify the values of h_v for the remaining vertices; however, it is easy to see that all remaining vertices will have |h_v| ≤ 100βn. It is also easy to verify that if the original graph satisfied the hypothesis of Theorem 1.6 then so does the restricted one. Therefore we obtain an efficient sampling procedure for the desired distribution.

1.5 Related Work and Open Problems

Much work has been focused on the problem of understanding the mixing time of the Ising model in various contexts. In a series of results [14, 1, 28] culminating in [25], it was shown that the Gibbs sampler on the integer lattice mixes rapidly when the model has the strong spatial mixing property. In Z², strong spatial mixing, and therefore rapid mixing, holds in the entire uniqueness regime (see e.g. [21]). On the regular tree the mixing time is always polynomial, but is O(n log n) only up to the threshold for extremality [3].
For completely general graphs the best known results are given by the Dobrushin condition, which establishes rapid mixing when d tanh(β) < 1, where d is the maximum degree.

Most results for mixing rates of Gibbs samplers are stated in terms of the maximal degree. For example, many results have focused on sampling uniform colorings; the results are of the form: for every graph where all degrees are at most d, if the number of colors q satisfies q ≥ q(d) then Gibbs sampling is rapidly mixing [27, 26, 6, 8, 12, 4, 18, 22]. For example, Jerrum [15] showed that one can take q(d) = 2d. The novelty of the result presented here is that it allows for the study of graphs where the average degree is small while some degrees may be large.

Previous attempts at studying this problem, with bounded average degree but some large degrees, for sampling uniform colorings yielded weaker results. In [5] it is shown that Gibbs sampling rapidly mixes on G(n,d/n) if q = Ω_d((log n)^α) where α < 1, and that a variant of the algorithm rapidly mixes if q ≥ Ω_d(log log n / log log log n). Indeed, the main open problem of [5] is to determine if one can take q to be a function of d only. Our results here provide a positive answer to the analogous question for the Ising model. We further note that other results where the conditions on degree are relaxed [13] do not apply in our setting.

The following propositions, which are easy and well known, establish that for d > 1 and large β the mixing time is exponential in n, and that for all d > 0 and β > 0 the mixing time is more than n polylog(n).

Proposition 1.8 If d > 0 and β > 0 then with high probability the mixing time of the dynamics on G(n,d/n) is at least n^{1+Ω(1/log log n)}.

Proof: The proof follows from the fact that G(n,d/n) contains an isolated star with s = Ω(log n / log log n) vertices with high probability, and that the mixing time of the star is s·exp(Ω(s)). Since the star is updated with frequency s/n, it follows that the mixing time is at least

    (n/s)·s·exp(Ω(s)) = n·exp(Ω(s)) = n^{1+Ω(1/log log n)}.
□

Proposition 1.9 If d > 1 then there exists β′_d such that if β > β′_d then, with probability going to 1, the mixing time of the dynamics on G(n,d/n) is exp(Ω(n)).

Proof: The claim follows from expansion properties of G(n,d/n). It is well known that if d > 1 then with high probability G(n,d/n) contains a core C of size at least α_d n such that every S ⊂ C of size at least α_d n/4 has at least γ_d n edges between S and C ∖ S. Let A be the set of configurations σ such that σ restricted to C has at least α_d n/4 +'s and at least α_d n/4 −'s. Then P(A) ≤ 2^n exp(β|E| − 2βγ_d n)/Z. On the other hand, if + denotes the all-+ state then P(+) = P(−) = exp(β|E|)/Z. Thus by standard conductance arguments, the mixing time is exponential in n when 2exp(−2βγ_d) < 1. □

It is natural to conjecture that properties of the Ising model on the branching process with Poisson(d) offspring distribution determine the mixing time of the dynamics on G(n,d/n). In particular, it is natural to conjecture that the critical point for uniqueness of Gibbs measures plays a fundamental role [10, 24], as results of similar flavor were recently obtained for the hard-core model on random bipartite d-regular graphs [23].

Conjecture 1.10 If d tanh(β) > 1 then with high probability over G(n,d/n) the mixing time of the Gibbs sampler is exp(Ω(n)). If d > 1 and d tanh(β) < 1 then with high probability over G(n,d/n) the mixing time of the Gibbs sampler is polynomial in n.

After proposing the conjecture we have recently learned that Antoine Gerschenfeld and Andrea Montanari have found an elegant proof for estimating the partition function (that is, the normalizing constant Z(β)) for the Ising model on random d-regular graphs [11]. Their result, together with a standard conductance argument, shows exponentially slow mixing above the uniqueness threshold, which in the context of random regular graphs is (d+1)tanh(β) = 1.

1.6 Proof Technique

Our proof follows the following main steps.

• Analysis of the mixing time for Gibbs sampling on trees of varying degrees.
We find a bound on the mixing time on trees in terms of the maximal sum of degrees along any simple path from the root. This implies that for all β, if we consider a tree where each node has a number of descendants that has Poisson distribution with parameter d − 1, then with high probability the mixing time of Gibbs sampling on the tree is polynomial in its size. The motivation for this step is that we are looking at tree-like graphs. Note, however, that the results established here hold for all β, while rapid mixing for G(n,d/n) does not hold for all β. Our analysis here holds for all boundary conditions and all external fields on the tree.

• We next use standard comparison arguments to extend the result above to the case where the graph is a tree with a few edges added. Note that with high probability, for all v ∈ G(n,d/n) the induced subgraph B(v, (1/2) log_d n) on all vertices of distance at most (1/2) log_d n from v is a tree with at most a few edges added. (Note this still holds for all β.)

• We next consider the effect of the boundary on the root of the tree. We show that for a tree of a log n levels, the total variation distance between the conditional distributions at the root given all-+ boundary conditions and all-− boundary conditions is n^{−1−Ω(1)} with probability 1 − n^{−1−Ω(1)}, provided β < β_d is sufficiently small (this is the only step where the fact that β is small is used).

• Using the construction of Weitz [27] and a lemma from [18, 3], we show that the spatial decay established in the previous step also holds with probability 1 − o(1) for all neighborhoods B(v, a log n) in the graph.
• The remaining steps use the fact that a strong enough decay of correlation inside blocks, each of which is rapidly mixing, implies that the dynamics on the full graph is rapidly mixing. This idea is taken from [7].

• In order to show rapid mixing it suffices to exhibit a coupling of the dynamics starting at all + and all − that couples with probability at least 1/2 in polynomial time. We show that the monotone coupling (where the configuration started at − is always "below" the configuration started at +) satisfies this by showing that for each v, in polynomial time, the two configurations at v couple except with probability n^{−1}/(2e).

• In order to establish the latter fact, it suffices to show that running the dynamics on B(v, a log n) starting at all + with all-+ boundary conditions, and the dynamics starting at all − with all-− boundary conditions, will couple at v except with probability n^{−1}/(2e) within polynomial time.

• The final fact then follows from the fact that the dynamics inside B(v, a log n) have polynomial mixing time and that the stationary distributions in B(v, (1/2) log_d n) given + and − boundary conditions agree at v with probability at least 1 − n^{−1}/(4e).

We note that the decay of correlation on the self-avoiding tree defined by Weitz that we prove here allows a different sampling scheme from the target distribution. Indeed, this decay of correlation implies that given any assignment to a subset of the vertices S and any v ∉ S, we may calculate in polynomial time, using the Weitz tree of radius a log n, the conditional probability that σ(v) = + up to an additive error of n^{−1/100}. It is easy to see that this allows sampling the distribution in polynomial time. More specifically, consider the following algorithm from [27].

Algorithm 1.11 Fix a radius parameter L and label the vertices v_1, ..., v_n. The algorithm approximately samples from P(σ) by assigning the spins of the v_i sequentially. Repeating for 1 ≤ i ≤ n:

• In step i, construct T^L_SAW(v_i), the tree of self-avoiding walks truncated at distance L from v_i.

• Calculate p_i = P_{T^L_SAW}(σ_{v_i} = + | σ_{{v_1,...,v_{i−1}}}). (The boundary conditions at the tree can be chosen arbitrarily; in particular, one may calculate p_i with no boundary conditions.)

• Fix σ_{v_i} = X_{v_i}, where X_{v_i} is a random variable with p_i = P(X_{v_i} = +) = 1 − P(X_{v_i} = −).

Then we prove that:

Theorem 1.12 Let G be a random graph distributed as G(n,d/n).
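The first step of Algorithm 1.11, enumerating the truncated tree of self-avoiding walks, can be sketched as follows. This is only the combinatorial skeleton: it omits the fixed-spin boundary conditions that Weitz's full construction places at vertices closing a cycle, and the 4-cycle test graph is an illustrative choice:

```python
def saw_tree(adj, root, L):
    """Build the tree of self-avoiding walks from `root`, truncated at
    distance L: nodes are paths in G that never revisit a vertex."""
    tree = {}  # path (tuple of vertices) -> list of child paths
    stack = [(root,)]
    while stack:
        path = stack.pop()
        children = []
        if len(path) <= L:  # paths of up to L edges
            for w in adj[path[-1]]:
                if w not in path:  # self-avoiding: no revisits
                    children.append(path + (w,))
        tree[path] = children
        stack.extend(children)
    return tree

# On the 4-cycle, the depth-3 SAW tree from vertex 0 has two branches,
# one going clockwise and one counterclockwise.
adj = {0: [1, 3], 1: [0, 2], 2: [1, 3], 3: [2, 0]}
T = saw_tree(adj, 0, 3)
assert max(len(p) for p in T) == 4        # longest paths have 3 edges
assert (0, 1, 2, 3) in T and (0, 3, 2, 1) in T
```

Computing p_i then amounts to running the standard tree recursion for Ising marginals on this tree, which is where the decay of correlation at radius L = r log n enters.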
When

    tanh(β) < 1/d,

for any γ > 0 there exist constants r = r(d,β,γ) and C = C(d,β,γ) such that, with high probability, Algorithm 1.11 with parameter r log n has running time O(n^C) and output distribution Q with d_TV(P,Q) < n^{−γ}. The result holds for the homogeneous model (1) and for the inhomogeneous model (2).

Theorem 1.13 Let G = (V,E) be any graph on n vertices satisfying the following properties. There exist a > 0, 0 < b < ∞ such that for all v ∈ V,

    |V_{T_SAW(v)}(v, a log n)| ≤ b^{a log n}    (6)

where V_{T_SAW(v)}(v,r) = {u ∈ T_SAW(v) : d(u,v) ≤ r}. When

    tanh(β) < 1/b,

for any γ > 0 there exist constants r = r(a,b,β,γ) and C = C(a,b,β,γ) such that Algorithm 1.11, with parameter r log n, has running time O(n^C) and output distribution Q with d_TV(P,Q) < n^{−γ}. The result holds for the homogeneous model (1) and for the inhomogeneous model (2).

1.7 Acknowledgment

E.M. thanks Andrea Montanari and Alistair Sinclair for interesting related discussions. The authors thank Jinshan Zhang for pointing out an error in a previous draft.

2 Proofs

2.1 Relaxation Time on Sparse and Galton-Watson Trees

Recall that the local neighborhood of a vertex in G(n,d/n) looks like a branching process tree. In the first step of the proof we bound the relaxation time on a tree generated by a Galton-Watson branching process. More generally, we show that trees that are not too dense have polynomial mixing time.

Definition 2.1 Let T be a finite rooted tree. We define m(T) = max_Γ Σ_{v∈Γ} d_v, where the maximum is taken over all simple paths Γ emanating from the root and d_v is the degree of node v.

Theorem 2.2 Let τ be the relaxation time of the continuous-time Gibbs sampler on T, where 0 ≤ β_{u,v} ≤ β for all u and v, given arbitrary boundary conditions and external field. Then

    τ ≤ exp(4βm(T)).

Proof: We proceed by induction on m, with a similar argument to the one used in [18] for a regular tree. Note that if m = 0 the claim holds true since τ = 1. For the general case, let v be the root of T, denote its children by u_1, ..., u_k, and denote the subtree of the descendants of u_i by T^i.
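The quantity m(T) of Definition 2.1 maximizes over root-to-descendant paths, so it satisfies a one-line recursion: the root contributes its degree, plus the best value among its children. A minimal sketch (the small example tree and β = 0.2 are illustrative), together with the Theorem 2.2 bound exp(4βm(T)):

```python
import math

def m_of_tree(children, deg, root):
    """m(T) = max over simple paths from the root of the sum of the
    degrees along the path (Definition 2.1), via the natural recursion."""
    best_child = max((m_of_tree(children, deg, c) for c in children[root]),
                     default=0)
    return deg[root] + best_child

# Root 0 with children 1 and 2; vertex 1 has one child, 3.
children = {0: [1, 2], 1: [3], 2: [], 3: []}
deg = {0: 2, 1: 2, 2: 1, 3: 1}
m = m_of_tree(children, deg, 0)
assert m == 5  # path 0 -> 1 -> 3 with degrees 2 + 2 + 1
beta = 0.2
relaxation_bound = math.exp(4 * beta * m)  # Theorem 2.2 upper bound
```

Because m(T) grows like the depth times a typical degree rather than like the tree size, the bound exp(4βm(T)) stays polynomial for the O(log n)-depth, sparse-on-average trees the paper works with.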
Now let T′ be the tree obtained by removing the k edges from v to the u_i, let P′ be the Ising model on T′, and let τ′ be the relaxation time on T′. By equation (3) we have that

    τ/τ′ ≤ (max_σ P(σ)/P′(σ)) / (min_{σ,τ} Q(σ,τ)/Q′(σ,τ)) ≤ exp(4βk).    (7)

Now we divide T′ into k + 1 blocks {{v}, {T¹}, ..., {T^k}}. Since these blocks are not connected to each other, the block dynamics is simply the product chain. Each block updates at rate 1, and therefore the relaxation time of the block dynamics is simply 1. By applying Proposition 1.3 we get that the relaxation time on T′ is simply the maximum of the relaxation times on the blocks,

    τ′ ≤ max{1, τ^i},

where τ^i is the relaxation time on T^i. Note that by the definition of m, the value of m for each of the subtrees T^i satisfies m(T^i) ≤ m − k, and therefore for all i it holds that τ^i ≤ exp(4β(m−k)). This then implies by (7) that τ ≤ exp(4βm), as needed. □

2.2 Some Properties of Galton-Watson Trees

Here we prove a couple of useful properties of Galton-Watson trees that will be used below. We let T be the tree generated by a Galton-Watson branching process with offspring distribution N such that for all t, E exp(tN) < ∞, and such that E(N) = d. Of particular interest to us will be the Poisson distribution with mean d, which has

    E exp(tN) = exp(d(e^t − 1)).

We let T_r denote the first r levels of T. We let M(r) denote the value of m for T(r), and τ(r) the supremum of the relaxation times of the continuous-time Gibbs sampler on T(r) over any boundary conditions and external fields, assuming that β = sup β_{u,v}. We denote by Z_r the number of descendants at level r.

Theorem 2.3 Under the assumptions above we have:

• There exists a positive function c(t) such that for all t and all r: E[exp(tM(r))] ≤ exp(c(t)r).

• E τ(r) ≤ C(β)^r for some C(β) < ∞ depending on β = sup β_{u,v} only.

• If N is the Poisson distribution with mean d, then for all t > 0, sup_r E[exp(tZ_r d^{−r})] < ∞.

Proof: Let K denote the degree of the root of T_r and, for 1 ≤ i ≤ K, let M_i(r−1) denote the value of m for the sub-tree of T_r rooted at the i-th child.
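The quantity M(r) of Section 2.2 can be sampled by Monte Carlo, generating the Poisson(d) Galton-Watson tree level by level. This sketch is an illustration, not part of the paper's argument; the convention that a non-root node's degree is its offspring count plus one (for the parent edge) is our assumption about the degree bookkeeping:

```python
import math
import random

def poisson(lam, rng):
    """Knuth's multiplicative method for Poisson sampling (fine for small lam)."""
    L, k, p = math.exp(-lam), 0, 1.0
    while True:
        p *= rng.random()
        if p <= L:
            return k
        k += 1

def sample_M(r, d, rng):
    """Sample M(r): the maximal degree sum over root paths of the first
    r levels of a Galton-Watson tree with Poisson(d) offspring."""
    def rec(level):
        if level == r:
            return 0
        k = poisson(d, rng)
        deg = k + (1 if level > 0 else 0)  # children + parent edge (assumed)
        return deg + max((rec(level + 1) for _ in range(k)), default=0)
    return rec(0)

rng = random.Random(7)
samples = [sample_M(4, 2.0, rng) for _ in range(200)]
assert all(s >= 0 for s in samples)
assert max(samples) > 0  # with d = 2 some trees are nonempty
```

The empirical distribution of such samples is what the first bullet of Theorem 2.3 controls: the exponential moments of M(r) grow at most exponentially in the number of levels r, not in the tree size.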
Then: r E[exp(tM(r))] = E[max(1, max exp(t(M (r−1)+K)))] i 1≤i≤K K ≤ E[(1+exp(tK)) exp(tM (r−1))] i i=1 X = E[(1+Kexp(tK))]E[exp(tM(r−1))]. andsotheresultfollowsbyinduction provided thatc(t)islargeenough sothat exp(c(t)) ≥ E(1+Kexp(tK)). Forthesecond statementofthetheorem,notethatbytheprevious theoremwehavethat Eτ(r) ≤ E[exp(4βM(r))], 10
