
Entropic Ricci curvature bounds for discrete interacting systems


The Annals of Applied Probability
2016, Vol. 26, No. 3, 1774–1806
DOI: 10.1214/15-AAP1133
© Institute of Mathematical Statistics, 2016
arXiv:1501.00562v3 [math.PR] 27 Jun 2016

ENTROPIC RICCI CURVATURE BOUNDS FOR DISCRETE INTERACTING SYSTEMS¹

By Max Fathi and Jan Maas

Université Paris 6 and Institute of Science and Technology Austria

We develop a new and systematic method for proving entropic Ricci curvature lower bounds for Markov chains on discrete sets. Using different methods, such bounds have recently been obtained in several examples (e.g., 1-dimensional birth and death chains, product chains, Bernoulli–Laplace models, and random transposition models). However, a general method to obtain discrete Ricci bounds had been lacking. Our method covers all of the examples above. In addition, we obtain new Ricci curvature bounds for zero-range processes on the complete graph. The method is inspired by recent work of Caputo, Dai Pra and Posta on discrete functional inequalities.

Received January 2015; revised July 2015.
¹Supported by the German Research Foundation through the Collaborative Research Center 1060 "The Mathematics of Emergent Effects" and the Hausdorff Center for Mathematics.
AMS 2000 subject classifications. 60J10, 60K35.
Key words and phrases. Discrete Ricci curvature, transport metrics, functional inequalities, birth-death processes, zero-range processes, Bernoulli–Laplace model, random transposition model.
This is an electronic reprint of the original article published by the Institute of Mathematical Statistics in The Annals of Applied Probability, 2016, Vol. 26, No. 3, 1774–1806. This reprint differs from the original in pagination and typographic detail.

1. Introduction. Ricci curvature lower bounds for Riemannian manifolds play a crucial role in differential geometry, but also in probability theory and analysis. In particular, such bounds are known to imply important properties of Brownian motion on the manifold. For example, positive curvature bounds imply several useful functional inequalities (logarithmic Sobolev inequality, Poincaré inequality), as well as exponential rates of convergence to equilibrium. Such results were first obtained using the celebrated Bakry–Émery approach developed in [1].

1.1. Discrete entropic Ricci curvature. In [18] and [9] a notion of Ricci curvature bounds for Markov chains was introduced, based on convexity properties of the relative entropy along geodesics in the space of probability measures over the state space, for a well-chosen metric structure. It is the discrete analogue of the synthetic notion of Ricci curvature bounds in geodesic metric-measure spaces obtained in the contributions of Lott–Villani [17] and Sturm [24].

The main observation behind the Lott–Villani–Sturm definition of Ricci curvature bounds is that, on a Riemannian manifold $M$, the Ricci curvature is bounded from below by some constant $\kappa\in\mathbb{R}$ if and only if the Boltzmann–Shannon entropy $H(\rho)=\int_M \rho\log\rho\,d\mathrm{vol}$ is $\kappa$-convex along geodesics in the $L^2$-Wasserstein space of probability measures on $M$. However, this definition is not well adapted to discrete spaces, because the $L^2$-Wasserstein space over a discrete space does not contain any geodesic.

To circumvent this issue, [18] and [20] introduced a different metric $W$. This metric is built from a Markov kernel in such a way that the heat flow associated to this kernel is the gradient flow of the entropy with respect to the distance $W$. This is an analogue of the continuous situation, where it has been proven by Jordan, Kinderlehrer and Otto in [15] that the heat flow on $\mathbb{R}^d$ is the gradient flow of the entropy with respect to the Wasserstein metric. A Markov chain is then said to have Ricci curvature bounded from below by some constant $\kappa$ if the relative entropy (with respect to the invariant probability measure of the chain) is $\kappa$-convex along geodesics for the metric $W$.
It has been shown in [9] that a Ricci curvature lower bound has significant consequences for the associated Markov chain, such as a (modified) logarithmic Sobolev inequality, a Talagrand transportation inequality and a Poincaré inequality. Therefore, it is desirable to obtain sharp Ricci curvature bounds in concrete discrete examples. To this day, there are very few results of this type.

Mielke [20] obtained Ricci curvature bounds for one-dimensional birth and death processes, and applied these bounds to discretizations of 1-dimensional Fokker–Planck equations. Erbar and Maas [9] proved a tensorization principle: the Ricci curvature lower bound for a product system is the minimum of the Ricci curvature bounds of the components. This property is crucial for applications to high-dimensional problems. As a special case, the result yields the sharp Ricci bound for the discrete hypercube $\{0,1\}^n$. The first high-dimensional results beyond the product setting have been recently obtained by Erbar, Maas and Tetali [11], who obtained Ricci curvature bounds for the Bernoulli–Laplace model and for the random transposition model on the symmetric group.

In spite of this progress, a systematic method for obtaining discrete Ricci curvature bounds has been lacking. In this paper, we present such a method, which allows us to obtain curvature bounds for several interacting particle systems on the complete graph. The method allows us to recover all of the above-mentioned results, and yields new curvature bounds for zero-range processes on the complete graph.

1.2. The Bochner-approach. To explain our method, let us recall that the discrete Ricci curvature bounds considered in this paper correspond to bounds on the second derivative of the entropy along $W$-geodesics.
In the continuous setting, on a Riemannian manifold $M$, the second derivative of the entropy along a 2-Wasserstein geodesic $(\rho_t)$ is formally given by

$$\frac{d^2}{dt^2}H(\rho_t)=\int_M\Big(-\langle\nabla\psi_t,\nabla\Delta\psi_t\rangle+\frac12\Delta(|\nabla\psi_t|^2)\Big)\,d\rho_t, \tag{1}$$

where $\psi:[0,1]\times M\to\mathbb{R}$ satisfies the continuity equation $\partial_t\rho+\nabla\cdot(\rho\nabla\psi)=0$ and the Hamilton–Jacobi equation $\partial_t\psi+\frac12|\nabla\psi|^2=0$. Note that the integrand above can be reformulated using Bochner's formula, which yields

$$-\langle\nabla\psi_t,\nabla\Delta\psi_t\rangle+\frac12\Delta(|\nabla\psi_t|^2)=|D^2\psi_t|^2+\mathrm{Ric}(\nabla\psi_t,\nabla\psi_t). \tag{2}$$

Therefore, it follows at least formally that a lower bound $\mathrm{Ric}\ge\kappa$ implies $\kappa$-convexity of the entropy along geodesics.

In the discrete case, the second derivative of the entropy along geodesics is given by an expression $B(\rho,\psi)$ which somewhat resembles the right-hand side of (1), but the dependence on $\rho$ is more complicated. Unfortunately, there does not seem to be a suitable discrete analogue of Bochner's formula.

In this paper, we present an approach to get around this difficulty. We obtain a convenient lower bound for the second derivative of the entropy, which turns out to be very useful; see Theorem 3.5 below. This expression has the same features as the right-hand side of Bochner's formula (2): one of the terms is nonnegative and contains only second-order (discrete) derivatives, while the other terms contain only first-order derivatives. The expression is remarkably flexible, since it allows us to derive sharp Ricci bounds in the rather diverse examples mentioned above.

The work of Caputo, Dai Pra and Posta [5], which inspired this paper, fits very naturally into this framework. In order to obtain modified logarithmic Sobolev inequalities, these authors followed the Bakry–Émery strategy, which consists of computing the second derivative of the entropy along the “heat equation” $\partial_t\rho=L\rho$, where $L$ is the generator of a reversible Markov chain. It can be checked that this second derivative can be expressed as $2B(\rho,\log\rho)$.
Therefore, the bounds obtained in [5] and [8] are a special case of the bounds on $B(\rho,\psi)$, which we obtain for arbitrary $\psi$. We refer to Section 2.3 for more details.

1.3. Organization and notation. In Section 2, we collect preliminaries on discrete transport metrics, the associated Riemannian structure, and the notion of discrete entropic Ricci curvature. Section 3 contains the general criterion for Ricci curvature bounds. Various examples are studied in Section 4. The Appendix states a few useful properties of the logarithmic mean that are used in the proofs.

Throughout the paper, we use the probabilistic notation

$$\pi[F]=\pi[F(\eta)]=\sum_{\eta\in X}F(\eta)\pi(\eta).$$

2. Transport metrics and discrete Ricci curvature. In this section, we briefly recall some basic facts on the discrete transportation metric $W$, which plays a crucial role in the paper. This metric has been introduced in [18] in the setting of finite Markov chains, and (independently) in [19] for reaction–diffusion systems. Closely related metrics have been considered in [7] in the setting of Fokker–Planck equations on graphs. Our discussion follows [18] with a slightly different normalisation convention (see also [10]).

We work with a discrete, finite space $X$ and a Markov generator $L$, acting on functions $\psi:X\to\mathbb{R}$ by

$$L\psi(\eta)=\sum_{\tilde\eta\in X}Q(\eta,\tilde\eta)\,(\psi(\tilde\eta)-\psi(\eta)).$$

The transition rates $Q(\eta,\tilde\eta)$ are nonnegative for all distinct $\eta,\tilde\eta\in X$, and we use the convention that $Q(\eta,\eta)=0$ for all $\eta\in X$. We assume that $Q$ is irreducible, that is, for all $\eta,\tilde\eta\in X$, there exists a sequence $\{\eta_i\}_{i=0}^n\subseteq X$ such that $\eta_0=\eta$, $\eta_n=\tilde\eta$, and $Q(\eta_i,\eta_{i+1})>0$ for all $i=0,\ldots,n-1$. It is a basic result of Markov chain theory that there is a unique stationary probability measure $\pi$ on $X$, which means that

$$\sum_{\eta\in X}\pi(\eta)=1\quad\text{and}\quad\pi(\tilde\eta)=\sum_{\eta\in X}Q(\eta,\tilde\eta)\pi(\eta).$$

We shall always assume that $\pi$ is reversible for $Q$, that is, the detailed balance equations

$$Q(\eta,\tilde\eta)\pi(\eta)=Q(\tilde\eta,\eta)\pi(\tilde\eta)$$

hold for all $\eta,\tilde\eta\in X$. We shall refer to $(X,Q,\pi)$ as a Markov triple.

2.1. The nonlocal transport metric W.
Let

$$P(X):=\Big\{\rho:X\to\mathbb{R}_+\ \Big|\ \sum_{\eta\in X}\rho(\eta)\pi(\eta)=1\Big\}$$

be the set of probability densities on $X$ (with respect to the stationary measure $\pi$). We consider the nonlocal transport metric $W$ defined for $\rho_0,\rho_1\in P(X)$ by

$$W(\rho_0,\rho_1)^2:=\inf_{\rho,\psi}\Big\{\frac12\int_0^1\sum_{\eta,\tilde\eta\in X}(\psi_t(\eta)-\psi_t(\tilde\eta))^2\,\hat\rho_t(\eta,\tilde\eta)\,Q(\eta,\tilde\eta)\,\pi(\eta)\,dt\Big\},$$

where the infimum runs over all sufficiently smooth curves $\rho:[0,1]\to P(X)$ and $\psi:[0,1]\to\mathbb{R}^X$ satisfying the continuity equation

$$\frac{d}{dt}\rho_t(\eta)+\sum_{\tilde\eta\in X}(\psi_t(\tilde\eta)-\psi_t(\eta))\,\hat\rho_t(\eta,\tilde\eta)\,Q(\eta,\tilde\eta)=0\qquad\forall\eta\in X, \tag{3}$$

with boundary conditions $\rho|_{t=0}=\rho_0$ and $\rho|_{t=1}=\rho_1$. It has been shown in [18] that $W$ defines a distance on the set of probability densities $P(X)$. Moreover, this distance is induced by a Riemannian structure on its interior

$$P_*(X):=\{\rho\in P(X):\rho(x)>0\text{ for all }x\in X\}.$$

This definition is a natural analogue of the Benamou–Brenier formulation of the Wasserstein metric, with one crucial additional feature in the discrete setting: the logarithmic mean

$$\hat\rho(\eta,\tilde\eta):=\theta(\rho(\eta),\rho(\tilde\eta))\quad\text{with}\quad\theta(r,s)=\int_0^1 r^{1-p}s^p\,dp=\frac{r-s}{\log r-\log s} \tag{4}$$

is used to define, loosely speaking, the value of $\rho$ along the edge between $\eta$ and $\tilde\eta$. Note that the integral representation of $\theta$ holds for all $r,s\ge0$, while the alternative expression requires $r$ and $s$ to be strictly positive and distinct. The main reason for the appearance of this particular mean is the fact that it allows one to formulate a discrete chain rule $\hat\rho\,\nabla\log\rho=\nabla\rho$, where $\nabla\psi(\eta,\tilde\eta):=\psi(\tilde\eta)-\psi(\eta)$ denotes the discrete gradient. This identity takes over the role of the usual chain rule $\nabla\log\rho=\nabla\rho/\rho$ that conveniently holds in the continuous setting.

In [18, 19], it is shown that the “heat equation” $\partial_t\rho=L\rho$ is the gradient flow equation for the relative entropy

$$H(\rho):=\sum_{\eta\in X}\pi(\eta)\rho(\eta)\log\rho(\eta),$$

with respect to the Riemannian structure associated with $W$. In this sense, the metric $W$ is a natural discrete analogue of the $L^2$-Wasserstein distance.
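The logarithmic mean (4) and the discrete chain rule can be tried out numerically. The following is a minimal Python sketch, not code from the paper: the function names `log_mean` and `log_mean_quadrature` and the sample values are illustrative assumptions. It evaluates the closed form of $\theta$ with its continuous extension to the boundary and the diagonal, and compares it with a midpoint-rule approximation of the integral representation.

```python
import math

def log_mean(r: float, s: float) -> float:
    """Logarithmic mean theta(r, s), extended continuously to r = s and to zero arguments."""
    if r < 0 or s < 0:
        raise ValueError("arguments must be nonnegative")
    if r == 0.0 or s == 0.0:
        return 0.0  # the integrand r^(1-p) s^p vanishes for p in (0, 1)
    if math.isclose(r, s, rel_tol=1e-12):
        return 0.5 * (r + s)  # limit r -> s, avoids dividing 0 by 0
    return (r - s) / (math.log(r) - math.log(s))

def log_mean_quadrature(r: float, s: float, n: int = 2000) -> float:
    """Midpoint-rule approximation of the integral of r^(1-p) s^p over p in [0, 1]."""
    h = 1.0 / n
    return h * sum(r ** (1.0 - (k + 0.5) * h) * s ** ((k + 0.5) * h) for k in range(n))

if __name__ == "__main__":
    a, b = 0.4, 1.7  # two hypothetical density values rho(eta), rho(eta~)
    # Discrete chain rule: rho_hat * grad(log rho) equals grad(rho) on the edge.
    print(log_mean(a, b) * (math.log(b) - math.log(a)), b - a)
    print(log_mean(2.0, 1.0), log_mean_quadrature(2.0, 1.0))
```

The branch for nearly equal arguments is a numerical design choice: the quotient form suffers from cancellation there, while the arithmetic mean agrees with $\theta$ up to second order in $r-s$.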
Moreover, at least in some situations [13], it can be shown that the metric $W$ converges to the $L^2$-Wasserstein metric in the continuous limit, in the sense of Gromov–Hausdorff.

Following the Lott–Villani–Sturm approach, [9, 18] gave the following definition of lower Ricci curvature boundedness.

Definition 2.1 (Ricci curvature lower boundedness). We say that a Markov triple $(X,Q,\pi)$ has Ricci curvature bounded from below by $\kappa\in\mathbb{R}$ if for any constant speed geodesic $(\rho_t)_{t\in[0,1]}$ in $(P(X),W)$ we have

$$H(\rho_t)\le(1-t)H(\rho_0)+tH(\rho_1)-\frac{\kappa}{2}t(1-t)W(\rho_0,\rho_1)^2.$$

We shall use the notation $\mathrm{Ric}(X,Q,\pi)\ge\kappa$ [or briefly $\mathrm{Ric}(Q)\ge\kappa$].

Since $(P(X),W)$ is a geodesic space, this definition is nontrivial. This notion of curvature has several interesting properties, obtained in [9]:

• A Markov chain that has $\mathrm{Ric}(Q)\ge\kappa>0$ satisfies various functional inequalities, such as the modified logarithmic Sobolev inequality

$$H(\rho)\le\frac{1}{\alpha}E(\rho,\log\rho)\qquad\forall\rho\in P(X),$$

with constant $\alpha=2\kappa$, as well as a Poincaré inequality

$$\mathrm{Var}_\pi(\psi)\le\frac{1}{\lambda}E(\psi,\psi)\qquad\forall\psi:X\to\mathbb{R},$$

with constant $\lambda=\kappa$. Here, $\mathrm{Var}_\pi(\psi)=\pi[\psi^2]-\pi[\psi]^2$, and $E$ denotes the discrete Dirichlet form given by

$$E(\varphi,\psi):=\frac12\pi\Big[\sum_{\tilde\eta}(\varphi(\eta)-\varphi(\tilde\eta))(\psi(\eta)-\psi(\tilde\eta))\,Q(\eta,\tilde\eta)\Big].$$

These functional inequalities imply the exponential convergence estimates

$$H(e^{tL}\rho)\le e^{-\alpha t}H(\rho)\quad\text{and}\quad\|e^{tL}\psi\|_{L^2(X,\pi)}\le e^{-\lambda t}\|\psi\|_{L^2(X,\pi)}$$

along the heat equation $\partial_t\rho=L\rho$ (the second estimate being understood for $\psi$ with $\pi[\psi]=0$).

• Ricci curvature bounds tensorize: if $\mathrm{Ric}(X_i,Q_i,\pi_i)\ge\kappa_i$ for $i=1,2$, then the product chain $(X_1\times X_2,Q,\pi_1\otimes\pi_2)$ defined by

$$Q((x_1,x_2),(y_1,y_2))=\begin{cases}Q_1(x_1,y_1),&\text{if }x_2=y_2,\\ Q_2(x_2,y_2),&\text{if }x_1=y_1,\\ 0,&\text{otherwise,}\end{cases}$$

satisfies $\mathrm{Ric}(X_1\times X_2,Q,\pi_1\otimes\pi_2)\ge\min\{\kappa_1,\kappa_2\}$.

Remark 2.2. Besides the entropic Ricci curvature studied in this paper, several other notions of discrete Ricci curvature have been studied in the literature, including Ollivier's coarse Ricci curvature [22], rough Ricci curvature by Bonciocat and Sturm [3] and Bakry–Émery-type conditions, for example, [16]. We refer to [23] for a survey covering some of these concepts.

2.2. The Riemannian structure induced by W. It will be useful to describe the Riemannian structure associated to $W$ in more detail. Let $E:=\{(x,y)\in X\times X:Q(x,y)>0\}$ be the set of edges in the graph induced by $Q$, and let $G$ be the set of discrete gradients, that is, all functions $\Psi:E\to\mathbb{R}$ of the form $\Psi(\eta,\tilde\eta)=\nabla\psi(\eta,\tilde\eta):=\psi(\tilde\eta)-\psi(\eta)$ for some function $\psi:X\to\mathbb{R}$.

Note that, at each $\rho\in P_*(X)$, the tangent space of $P_*(X)$ is naturally given by $T:=\{\sigma:X\to\mathbb{R}\mid\sum_{\eta\in X}\sigma(\eta)\pi(\eta)=0\}$. It can be proved (see [18], Section 3) that, for each $\rho\in P_*(X)$, the mapping

$$K_\rho:\nabla\psi\mapsto\sum_{\tilde\eta\in X}(\psi(\tilde\eta)-\psi(\eta))\,\hat\rho(\eta,\tilde\eta)\,Q(\eta,\tilde\eta)$$

defines a bijection between $G$ and $T$. At each $\rho\in P_*(X)$, this map allows us to identify the tangent space with $G$. In other words, the continuity equation (3) provides an identification between the “vertical tangent vector” $\frac{d\rho}{dt}$ and the “horizontal tangent vector” $\nabla\psi$. A Riemannian structure is then defined by the $\rho$-dependent scalar product $\langle\cdot,\cdot\rangle_\rho:G\times G\to\mathbb{R}$ given by

$$\langle\Phi,\Psi\rangle_\rho:=\frac12\pi\Big[\sum_{\tilde\eta\in X}\Phi(\eta,\tilde\eta)\Psi(\eta,\tilde\eta)\,\hat\rho(\eta,\tilde\eta)\,Q(\eta,\tilde\eta)\Big]$$

for $\Phi,\Psi:E\to\mathbb{R}$. We shall frequently use the notation

$$A(\rho,\psi):=\|\nabla\psi\|_\rho^2,$$

where $\|\cdot\|_\rho$ denotes the norm induced by the scalar product $\langle\cdot,\cdot\rangle_\rho$.

The geodesic equations for this Riemannian structure are given by

$$\begin{cases}\partial_s\rho_s(\eta)+\displaystyle\sum_{\tilde\eta\in X}(\psi_s(\tilde\eta)-\psi_s(\eta))\,\hat\rho_s(\eta,\tilde\eta)\,Q(\eta,\tilde\eta)=0,\\[2ex] \partial_s\psi_s(\eta)+\dfrac12\displaystyle\sum_{\tilde\eta\in X}(\psi_s(\eta)-\psi_s(\tilde\eta))^2\,(\hat\rho_s)_1(\eta,\tilde\eta)\,Q(\eta,\tilde\eta)=0,\end{cases}$$

where, by abuse of notation, $\hat\rho_i$ denotes the partial derivative of the logarithmic mean defined in (4), that is, for $i=1,2$,

$$\hat\rho_i(\eta,\tilde\eta):=\partial_i\theta(\rho(\eta),\rho(\tilde\eta)).$$

These equations can be regarded as discrete analogues of the continuity equation and the Hamilton–Jacobi equation, respectively.

The Hessian of the entropy can then be calculated using the expression

$$\langle\mathrm{Hess}\,H(\rho)\nabla\psi,\nabla\psi\rangle_\rho=\frac{d^2}{ds^2}\Big|_{s=0}H(\rho_s),$$

where $(\rho_s)$ solves the geodesic equations with initial conditions $\rho_s|_{s=0}=\rho$ and $\psi_s|_{s=0}=\psi$.
The following result provides an explicit formula for the Hessian which resembles its continuous counterpart (1); we refer to [9] for a proof. With a slight abuse of notation, we write

$$\langle\Phi,\Psi\rangle_\pi=\frac12\pi\Big[\sum_{\tilde\eta\in X}\Phi(\eta,\tilde\eta)\Psi(\eta,\tilde\eta)\,Q(\eta,\tilde\eta)\Big].$$

Proposition 2.3 (Identification of the Hessian). For any $\rho\in P_*(X)$ and $\psi:X\to\mathbb{R}$, we have

$$B(\rho,\psi):=\langle\mathrm{Hess}\,H(\rho)\nabla\psi,\nabla\psi\rangle_\rho=\frac12\langle\hat L\rho\cdot\nabla\psi,\nabla\psi\rangle_\pi-\langle\hat\rho\cdot\nabla\psi,\nabla L\psi\rangle_\pi,$$

where

$$\hat L\rho(\eta,\tilde\eta):=\hat\rho_1(\eta,\tilde\eta)\,L\rho(\eta)+\hat\rho_2(\eta,\tilde\eta)\,L\rho(\tilde\eta).$$

Since the metric structure we consider is Riemannian in $P_*(X)$, one expects that $\kappa$-geodesic convexity of $H$ is equivalent to

$$\langle\mathrm{Hess}\,H(\rho)\nabla\psi,\nabla\psi\rangle_\rho\ge\kappa\|\nabla\psi\|_\rho^2$$

for any function $\psi$ and any probability density $\rho\in P_*(X)$. As the Riemannian structure is degenerate on the boundary of $P(X)$, this is not an immediate result, but this issue has been solved in [9], Theorem 4.4. We thus have the following result.

Proposition 2.4 (Characterization of Ricci boundedness). Let $\kappa\in\mathbb{R}$. A Markov triple $(X,Q,\pi)$ satisfies $\mathrm{Ric}(X,Q,\pi)\ge\kappa$ if and only if for all $\rho\in P_*(X)$ and all functions $\psi:X\to\mathbb{R}$, we have

$$B(\rho,\psi)\ge\kappa A(\rho,\psi). \tag{5}$$

In this paper, we provide a systematic method to prove such an inequality in concrete situations. A very useful criterion will be given in Section 3.

2.3. The Bakry–Émery approach to the logarithmic Sobolev inequality. Let us now compare our method with the Bakry–Émery approach to the (modified) logarithmic Sobolev inequality (MLSI), which has been developed in the discrete setting in [4] and [5]. In order to prove the MLSI $H(\rho)\le\frac1\alpha E(\rho,\log\rho)$, in the Bakry–Émery method one considers the behaviour of the entropy $h(t):=H(\rho_t)$ for solutions to the heat equation $\partial_t\rho_t=L\rho_t$. Since $h'(t)=-E(\rho_t,\log\rho_t)$, the MLSI asserts that $h(t)\le-\frac1\alpha h'(t)$. Rather than approaching this inequality directly, the idea is to investigate the second derivative of the entropy. Suppose that one could prove the “convex entropy decay inequality” $h''(t)\ge-\alpha h'(t)$. Since $h(t),h'(t)\to0$ as $t\to\infty$, this inequality implies the MLSI after integration.
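For completeness, the integration step can be written out as follows. This is a short sketch of the standard argument, assuming $h$ is twice continuously differentiable on $[0,\infty)$ so that the fundamental theorem of calculus applies on $[t,\infty)$:

```latex
\begin{align*}
-h'(t) &= \int_t^\infty h''(s)\,ds
  && \text{since } h'(s)\to 0 \text{ as } s\to\infty,\\
&\ge -\alpha\int_t^\infty h'(s)\,ds
  && \text{by } h''(s)\ge -\alpha h'(s),\\
&= \alpha\,h(t)
  && \text{since } h(s)\to 0 \text{ as } s\to\infty.
\end{align*}
```

Evaluating at $t=0$ and using $h(0)=H(\rho)$ together with $-h'(0)=E(\rho,\log\rho)$ gives $H(\rho)\le\frac1\alpha E(\rho,\log\rho)$.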
Taking into account that

$$h''(t)=\pi[L\rho_t\,L\log\rho_t]+\pi\Big[\frac{(L\rho_t)^2}{\rho_t}\Big],$$

the following lemma (taken from [5]) summarises this discussion.

Lemma 2.5. Let $\alpha>0$ and suppose that the convex entropy decay inequality

$$\pi[L\rho\,L\log\rho]+\pi\Big[\frac{(L\rho)^2}{\rho}\Big]\ge\alpha E(\rho,\log\rho)$$

holds for all $\rho\in P_*(X)$. Then the modified logarithmic Sobolev inequality $H(\rho)\le\frac1\alpha E(\rho,\log\rho)$ holds as well. Moreover, (2.5) implies the exponential convergence bound

$$E(e^{tL}\rho,\log e^{tL}\rho)\le e^{-\alpha t}E(\rho,\log\rho)$$

for all $\rho\in P_*(X)$.

The last assertion follows readily from Gronwall's lemma and is actually equivalent to (2.5). A direct calculation shows that

$$A(\rho,\log\rho)=E(\rho,\log\rho),\qquad B(\rho,\log\rho)=\frac12\pi[L\rho\,L\log\rho]+\frac12\pi\Big[\frac{(L\rho)^2}{\rho}\Big],$$

hence, in view of Proposition 2.4, the following lemma follows immediately.

Lemma 2.6. Let $\kappa\in\mathbb{R}$ and suppose that $\mathrm{Ric}(X,Q,\pi)\ge\kappa$. Then the convex entropy decay inequality (2.5) holds with $\alpha=2\kappa$.

The condition $\mathrm{Ric}(X,Q,\pi)\ge\kappa$ is in principle strictly stronger than the convex entropy decay inequality (2.5): we need to check that the inequality $B(\rho,\psi)\ge\kappa A(\rho,\psi)$ holds for all functions $\psi$, and not just for $\psi=\log\rho$. There seems to be no reason why $\psi=\log\rho$ should always be the extremal case in this inequality, but we are unaware of an explicit example which demonstrates this.

In general, the constant we shall obtain here for the lower bound on Ricci curvature is worse (i.e., smaller) than the best known constant for the modified logarithmic Sobolev inequality. This was to be expected, since even in the continuous setting lower bounds on the Ricci curvature have no reason to yield the optimal constant for the logarithmic Sobolev inequality in general.

3. The Bochner method. To explain and apply our method, it will be convenient to use the following point of view. Given an irreducible and reversible Markov triple $(X,Q,\pi)$, we consider a set $G$ of maps of $X$ onto itself, that represents the set of allowed moves, and a function $c:X\times G\to\mathbb{R}_+$, that represents the jump rates.
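The direct calculation behind Lemma 2.6, namely the two identities for $A(\rho,\log\rho)$ and $B(\rho,\log\rho)$ stated in Section 2.3, can be checked numerically on a small example. The sketch below is illustrative only: the 4-state chain, the weights `w`, the density `rho` and all function names are assumptions made for the example, and $B$ is evaluated directly from the formula of Proposition 2.3.

```python
import math

def theta(r, s):
    """Logarithmic mean of r, s > 0."""
    return r if r == s else (r - s) / (math.log(r) - math.log(s))

def d1_theta(r, s):
    """Partial derivative of theta in its first argument, for r, s > 0."""
    if r == s:
        return 0.5
    d = math.log(r) - math.log(s)
    return (d - (r - s) / r) / (d * d)

def generator(Q, f):
    """(L f)(x) = sum_y Q[x][y] * (f[y] - f[x])."""
    n = len(f)
    return [sum(Q[x][y] * (f[y] - f[x]) for y in range(n)) for x in range(n)]

def dirichlet(Q, pi, f, g):
    """Discrete Dirichlet form E(f, g)."""
    n = len(pi)
    return 0.5 * sum(pi[x] * Q[x][y] * (f[x] - f[y]) * (g[x] - g[y])
                     for x in range(n) for y in range(n))

def A(Q, pi, rho, psi):
    """Action A(rho, psi): squared norm of grad(psi) in the rho-dependent scalar product."""
    n = len(pi)
    return 0.5 * sum(pi[x] * Q[x][y] * (psi[y] - psi[x]) ** 2 * theta(rho[x], rho[y])
                     for x in range(n) for y in range(n))

def B(Q, pi, rho, psi):
    """Hessian form B(rho, psi), evaluated via the formula of Proposition 2.3."""
    n = len(pi)
    L_rho, L_psi = generator(Q, rho), generator(Q, psi)
    total = 0.0
    for x in range(n):
        for y in range(n):
            grad = psi[y] - psi[x]
            # d2_theta(r, s) = d1_theta(s, r) because theta is symmetric.
            hat_L = d1_theta(rho[x], rho[y]) * L_rho[x] + d1_theta(rho[y], rho[x]) * L_rho[y]
            first = 0.5 * hat_L * grad * grad
            second = theta(rho[x], rho[y]) * grad * (L_psi[y] - L_psi[x])
            total += 0.5 * pi[x] * Q[x][y] * (first - second)
    return total

# Hypothetical reversible chain: pi[x] * Q[x][y] = w[x][y] is symmetric (detailed balance).
pi = [0.1, 0.2, 0.3, 0.4]
w = [[0.0, 1.0, 0.0, 0.3],
     [1.0, 0.0, 2.0, 0.0],
     [0.0, 2.0, 0.0, 0.5],
     [0.3, 0.0, 0.5, 0.0]]
Q = [[w[x][y] / pi[x] for y in range(4)] for x in range(4)]

raw = [0.5, 1.5, 0.7, 1.2]
Z = sum(p * v for p, v in zip(pi, raw))
rho = [v / Z for v in raw]                  # probability density with respect to pi
log_rho = [math.log(v) for v in rho]

L_rho, L_log = generator(Q, rho), generator(Q, log_rho)
closed_form_B = (0.5 * sum(p * a * b for p, a, b in zip(pi, L_rho, L_log))
                 + 0.5 * sum(p * a * a / r for p, a, r in zip(pi, L_rho, rho)))

if __name__ == "__main__":
    print(A(Q, pi, rho, log_rho), dirichlet(Q, pi, rho, log_rho))
    print(B(Q, pi, rho, log_rho), closed_form_B)
```

Both printed pairs should agree up to floating-point rounding. The check only concerns the special choice $\psi=\log\rho$; it says nothing about the curvature bound (5), which has to hold for all $\psi$.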
Definition 3.1. The pair $(G,c)$ is called a mapping representation of $L$ if the following properties are satisfied:

1. The generator $L$ can be written as

$$L\psi(\eta)=\sum_{\delta\in G}\nabla_\delta\psi(\eta)\,c(\eta,\delta),$$

where $\nabla_\delta\psi(\eta):=\psi(\delta\eta)-\psi(\eta)$.

2. For any $\delta\in G$, there exists a unique $\delta^{-1}\in G$ satisfying $\delta^{-1}(\delta\eta)=\eta$ for all $\eta$ with $c(\eta,\delta)>0$.

3. For every $F:X\times G\to\mathbb{R}$, we have

$$\pi\Big[\sum_{\delta\in G}F(\eta,\delta)\,c(\eta,\delta)\Big]=\pi\Big[\sum_{\delta\in G}F(\delta\eta,\delta^{-1})\,c(\eta,\delta)\Big]. \tag{6}$$

Every irreducible reversible Markov chain has a mapping representation. It is always possible to explicitly build one, by considering the set of bijections $t_{\eta,\tilde\eta}:X\to X$ that exchange $\eta$ and $\tilde\eta$ and leave all other points unchanged (see [9]), but in general it is more natural to work with a different, smaller set $G$, as we shall see in the examples of Section 4. Property 3 expresses the reversibility of the Markov chain.

With this notation, we can rewrite the action functional $A$ as

$$A(\rho,\psi)=\frac12\pi\Big[\sum_{\delta\in G}c(\eta,\delta)\,(\nabla_\delta\psi(\eta))^2\,\hat\rho(\eta,\delta\eta)\Big].$$

As discussed in the Introduction, a key element of the optimal transport approach to curvature in the continuous setting is the Bochner identity $\frac12\Delta(|\nabla\psi|^2)-\langle\nabla\psi,\nabla\Delta\psi\rangle=|D^2\psi|^2+\mathrm{Ric}(\nabla\psi,\nabla\psi)$. This identity allows one to reformulate

$$\mathcal{B}(\rho,\psi):=\int_M\Big(\frac12\Delta(|\nabla\psi|^2)-\langle\nabla\psi,\nabla\Delta\psi\rangle\Big)\,d\rho,$$

the continuous analogue of $B$, in the equivalent form

$$\mathcal{B}(\rho,\psi)=\int_M\Big(|D^2\psi|^2+\mathrm{Ric}(\nabla\psi,\nabla\psi)\Big)\,d\rho.$$

Since the first term in this expression is nonnegative, lower bounds on the Ricci curvature immediately translate into lower bounds on $\mathcal{B}(\rho,\psi)$.

In the discrete setting, as far as we know, there is no direct analogue of Bochner's identity. To get around this issue (in the context of the Bakry–Émery approach for functional inequalities), it was suggested in [4] and [5] to introduce a function $R$ that satisfies some properties of spatial invariance, in such a way that a one-sided Bochner inequality holds.
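To make Definition 3.1 concrete, here is a minimal Python sketch of a mapping representation for a simple example chosen for illustration (it is an assumption of this sketch, not an example worked out in the text above): the hypercube $\{0,1\}^n$ with product Bernoulli($p$) stationary measure, moves that flip one coordinate, and rates $p$ for a flip $0\to1$ and $1-p$ for a flip $1\to0$. All names (`apply_move`, `inverse`, `c`, `pi`, `lhs`, `rhs`) are hypothetical. The code checks that each move is undone by its inverse and that identity (6) holds for a randomly chosen $F$.

```python
import itertools
import random

n, p = 3, 0.3
X = list(itertools.product((0, 1), repeat=n))   # state space {0,1}^n
G = list(range(n))                              # move i flips coordinate i

def apply_move(delta, eta):
    """Return delta(eta): the configuration eta with coordinate delta flipped."""
    return eta[:delta] + (1 - eta[delta],) + eta[delta + 1:]

def inverse(delta):
    """Each flip is its own inverse."""
    return delta

def c(eta, delta):
    """Jump rate of move delta at eta: p for a flip 0 -> 1, and 1 - p for a flip 1 -> 0."""
    return p if eta[delta] == 0 else 1.0 - p

def pi(eta):
    """Product Bernoulli(p) measure, reversible for the rates above."""
    k = sum(eta)
    return p ** k * (1.0 - p) ** (n - k)

def L(psi, eta):
    """Generator in mapping form: sum over moves of (psi(delta eta) - psi(eta)) * c(eta, delta)."""
    return sum((psi[apply_move(d, eta)] - psi[eta]) * c(eta, d) for d in G)

def lhs(F):
    """Left-hand side of (6)."""
    return sum(pi(e) * sum(F[(e, d)] * c(e, d) for d in G) for e in X)

def rhs(F):
    """Right-hand side of (6)."""
    return sum(pi(e) * sum(F[(apply_move(d, e), inverse(d))] * c(e, d) for d in G) for e in X)

random.seed(0)
F = {(e, d): random.random() for e in X for d in G}   # arbitrary test function on X x G
psi = {e: random.random() for e in X}                 # arbitrary function on X

if __name__ == "__main__":
    print(lhs(F), rhs(F))
    print(sum(pi(e) * L(psi, e) for e in X))   # pi[L psi], which vanishes by stationarity
```

In this example the set $G$ has only $n$ elements, far fewer than the set of all transpositions of pairs of states, which illustrates why a smaller, structured set of moves is often the more natural choice.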
