Displacement Convexity in Spatially Coupled Scalar Recursions

Rafah El-Khatib, Nicolas Macris, Tom Richardson, Ruediger Urbanke
EPFL Switzerland, and Qualcomm USA
Emails: {rafah.el-khatib,nicolas.macris,ruediger.urbanke}@epfl.ch, [email protected]

arXiv:1701.04651v1 [cs.IT] 17 Jan 2017

Abstract

We introduce a technique for the analysis of general spatially coupled systems that are governed by scalar recursions. Such systems can be expressed in variational form in terms of a potential functional. We show, under mild conditions, that the potential functional is displacement convex and that the minimizers are given by the fixed points of the recursions. Furthermore, we give the conditions on the system such that the minimizing fixed point is unique up to translation along the spatial direction. These conditions match those in [1] for the existence of spatial fixed points. Displacement convexity applies to a wide range of spatially coupled recursions appearing in coding theory, compressive sensing, random constraint satisfaction problems, as well as statistical mechanical models. We illustrate it with applications to Low-Density Parity-Check (LDPC) and generalized LDPC codes used for transmission on the binary erasure channel, or general binary memoryless symmetric channels within the Gaussian reciprocal channel approximation, as well as compressive sensing.

I. INTRODUCTION

Spatially coupled systems have been used recently in various frameworks such as coding [2], [3], [4], [5] (for a review of applications in the context of communications see [5] and references therein), compressive sensing [6], [7], statistical physics [8], [9], and random constraint satisfaction problems [10], [11]. These systems exhibit excellent performance, often optimal, under low complexity message passing algorithms, due to the threshold saturation phenomenon [5], [12], [13]. For example, spatially coupled high-degree regular LDPC codes achieve the Shannon capacity under belief propagation [5], [13].
Another line of research has used spatially coupled constructions to prove results about the original uncoupled underlying model. For example, this idea was used to obtain proofs of replica-symmetric formulas for the mutual information in coding [14], in rank-one matrix factorization [15], and to improve provable algorithmic lower bounds on phase transition thresholds of random constraint satisfaction problems [11].

Given the success of spatial coupling in a wide variety of problems, it should hardly come as a surprise that there are fundamental mathematical structures behind spatial coupling. This paper is concerned with a somewhat hidden convexity structure called displacement convexity. Some of our preliminary work on this matter appeared in [16], [17], [18].

January 18, 2017 DRAFT

The large-system asymptotic performance of spatially coupled systems is assessed by the solutions of coupled density evolution (DE) type update equations. In general, the fixed points of these equations can be viewed as the stationary point equations of a functional that is typically called the "potential functional" and is an "average form" of the Bethe free energy [19] of the underlying graphical model.¹ It has already been recognized that this variational formulation is a powerful tool to analyze DE updates under suitable initial conditions [1], [8], [12], [13]. There are various possible formulations of this potential functional; in this paper, we will use the representation from [1] for scalar systems.

In a previous contribution [16], we showed that the potential, in the form given in [12], associated to a spatially coupled low-density parity-check (LDPC) code whose single system is the $(\ell,r)$-regular Gallager ensemble, with transmission over the binary erasure channel with parameter $\epsilon$, denoted BEC($\epsilon$), has a convex structure called displacement convexity. This structure is well known in the theory of optimal transport [21].
In fact, the potential we consider in [16] is not convex in the usual sense, but it is convex in the sense of displacement convexity. This, in itself, is an interesting property. Although the formalism in [16] can be extended to more general scalar recursions, for example, those pertaining to irregular LDPC codes, it does not appear to extend to a very wide class of general scalar recursions.

The main purpose of the present paper is to prove that a rather general class of scalar systems also exhibits the property of displacement convexity, and even strict displacement convexity under rather mild assumptions. Although the analysis of the present paper is similar in spirit to [16], it is also significantly different and more far-reaching in its range of applications. We use the potential in the representation of [1], which allows us to obtain much more general proofs that hold under quite mild conditions. The results are applicable to recursions appearing not only in coding, but also in compressive sensing and random constraint satisfaction problems.

The main propositions of this paper are: Proposition 5.1, which states that the potential functional has the displacement convexity property; Proposition 6.1, which asserts that monotonic minimizers of the potential functional are fixed point solutions of the spatially coupled DE equations (in a generalized sense); and Proposition 7.4, which gives the condition for the uniqueness of the minimizers up to translations along the spatial axis. It is also of interest that the potential functional satisfies a rearrangement inequality, namely Proposition 3.4, which ensures that one can find minimizers among monotonic spatial fixed points. The conditions for our results to hold are rather mild and essentially match those in [1] for the existence of spatial fixed points.

This manuscript is organized as follows.
Section II introduces spatially coupled recursions and the variational formulation. In Section III, we prove rearrangement inequalities that allow us to reduce the search for minima of the potential to a space of monotonic functions, and, in Section IV, we discuss the existence question using the direct method from functional analysis. The potential is shown to be displacement convex in Section V. In Section VI, we generalize the notion of fixed point solutions to the DE equations and show that such generalized solutions are minimizers of the potential. Unicity of the minimizer is addressed in Section VII. In Section VIII, we illustrate displacement convexity with applications to coding and compressive sensing.

¹ In the context of statistical mechanics, the potential functional is the "replica free energy functional" [20]. The precise connection between the Bethe free energy and the potential functional in the case of coding can be found in [13].

II. SETUP AND VARIATIONAL FORMULATION

In this section, we explain the set-up for general spatially coupled scalar recursions and give a variational formulation of these recursions. The fixed point equations of the scalar recursions will be generically called "density evolution" (DE) equations. The case of regular $(\ell,r)$-LDPC code ensembles with transmission over the BEC($\epsilon$) will serve as a concrete running example for the setting.

Consider the pair of DE fixed point equations
\[
\begin{cases}
u = h_g(v), \\
v = h_f(u),
\end{cases}
\tag{1}
\]
where $u,v\in[0,1]$. The update functions $h_f$, $h_g$ are assumed to be non-decreasing from $[0,1]$ to $[0,1]$, and normalized such that $h_f(0)=h_g(0)=0$ and $h_f(1)=h_g(1)=1$. We will think of them as EXIT-like curves of DE $(u,h_f(u))$ and $(h_g(v),v)$ for $u,v\in[0,1]$ (see Fig. 1). It is always possible to adopt this normalization in specific applications.

Example: Take an $(\ell,r)$-regular Gallager ensemble, with transmission over the BEC($\epsilon$). Let $y$ (resp. $x$) be the erasure probability emitted by the check (resp. variable) nodes.
The DE fixed point equations are $y = 1-(1-x)^{r-1}$ and $x = \epsilon y^{\ell-1}$. In this paper, we are interested in the specific value $\epsilon=\epsilon_{\text{MAP}}$, which is the MAP threshold of the ensemble. Let $x_{\text{MAP}}$, $y_{\text{MAP}}$ be the non-trivial stable fixed point when $\epsilon=\epsilon_{\text{MAP}}$. To achieve the normalization of (1) we make the change of variables $y = y_{\text{MAP}}u$ and $x = x_{\text{MAP}}v$, so that the DE equations become $u = y_{\text{MAP}}^{-1}\bigl(1-(1-x_{\text{MAP}}v)^{r-1}\bigr)$ and $v = \epsilon_{\text{MAP}}\,x_{\text{MAP}}^{-1}\,y_{\text{MAP}}^{\ell-1}\,u^{\ell-1}$. Note that we must have $1 = y_{\text{MAP}}^{-1}\bigl(1-(1-x_{\text{MAP}})^{r-1}\bigr)$ and $1 = \epsilon_{\text{MAP}}\,x_{\text{MAP}}^{-1}\,y_{\text{MAP}}^{\ell-1}$. We then set
\[
\begin{cases}
h_g(v) = y_{\text{MAP}}^{-1}\bigl(1-(1-x_{\text{MAP}}v)^{r-1}\bigr), \\
h_f(u) = u^{\ell-1},
\end{cases}
\tag{2}
\]
which satisfy the required normalizations $h_f(0)=h_g(0)=0$ and $h_f(1)=h_g(1)=1$. The corresponding EXIT curves have three intersections. The one at $(0,0)$ corresponds to the trivial fixed point of DE, the one at $(1,1)$ corresponds to the stable non-trivial fixed point of DE, and the third one at a middle point corresponds to the unstable fixed point.

The natural setting for displacement convexity, at least in the context of spatial coupling, is the continuum setting, which can be thought of as an approximation of the corresponding discrete system in the regime of large spatial length and coupling window size. The continuum limit has already been introduced in the literature as a convenient means to analyze the behavior of an originally discrete model [1], [6], [8].

Consider a spatially coupled system with an averaging window $w:\mathbb{R}\to\mathbb{R}$, which is always assumed to be bounded, non-negative, even, integrable, and normalized such that $\int_{\mathbb{R}} dx\, w(x) = 1$. The averaging window is the means for the "coupling" in "spatial coupling". Let us define the constant
\[
C_w := \int_{\mathbb{R}} dx\, |x|\, w(x). \tag{3}
\]
We assume throughout the paper that $C_w$ is finite. As we shall see, this is directly related to finiteness of the potential.
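As a quick numerical illustration of the window conditions, the following sketch checks the normalization and evaluates $C_w$ from (3) for a uniform window on $[-W,W]$. The uniform window, the half-width `W`, and the function names are illustrative assumptions rather than choices made in the paper; for this particular window one expects unit mass and $C_w = W/2$.

```python
def uniform_window(x, W):
    """Hypothetical averaging window: w(x) = 1/(2W) on [-W, W], zero elsewhere."""
    return 1.0 / (2.0 * W) if abs(x) <= W else 0.0


def window_moments(w, W, n=20000):
    """Midpoint-rule estimates of int w(x) dx and C_w = int |x| w(x) dx over [-W, W]."""
    dx = 2.0 * W / n
    mass = 0.0
    c_w = 0.0
    for i in range(n):
        x = -W + (i + 0.5) * dx  # midpoint of the i-th cell
        mass += w(x, W) * dx
        c_w += abs(x) * w(x, W) * dx
    return mass, c_w


# The window is bounded, non-negative, even, and compactly supported,
# so both integrals are finite.
mass, c_w = window_moments(uniform_window, W=2.0)
print(mass, c_w)
```

Any other bounded, even, non-negative window with a finite first absolute moment could be substituted for `uniform_window` in the same way.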
Let $f,g:\mathbb{R}\to[0,1]$ be two functions and denote by $f^w = f\otimes w$ and $g^w = g\otimes w$ their usual convolutions with $w$, i.e., $f^w(x)=\int_{\mathbb{R}} dy\, f(y)\,w(x-y)$ and $g^w(x)=\int_{\mathbb{R}} dy\, g(y)\,w(x-y)$. The pair of fixed point DE equations of a spatially coupled scalar continuous system is
\[
\begin{cases}
g(x) = h_g(f^w(x)), \\
f(x) = h_f(g^w(x)),
\end{cases}
\tag{4}
\]
where $x\in\mathbb{R}$ is the spatial position. We will often refer to the functions $f$, $g$ as profiles and to $h_f$, $h_g$ as update functions. A pair of profiles $f,g:\mathbb{R}\to[0,1]$ that solves the above equations almost everywhere will be called a fixed point, FP for short. Note that (4) are non-local equations because of the coupling through $w$.

In this paper, we are interested in profiles $p:\mathbb{R}\to[0,1]$ ($p$ denotes a generic profile like $f$ and $g$) that satisfy the limit conditions
\[
\lim_{x\to-\infty} p(x) = 0, \qquad \lim_{x\to+\infty} p(x) = 1. \tag{5}
\]
We note that these two limit values are the extreme fixed points of (1). We will refer to such profiles as interpolating profiles. A pair $f,g$ of interpolating profiles that solves (4) is called an interpolating FP.

Definition: A function $p:\mathbb{R}\to[0,1]$ satisfying (5) is called an interpolating profile. A pair $f,g$ of interpolating profiles that solves (4) almost everywhere, i.e., up to a set of measure zero, is called an interpolating fixed point (FP).

In Section III, we show that when minimizing the potential functional over the space of interpolating profiles we can focus on monotonic (non-decreasing) profiles.

A. Potential function associated to (1)

In [1] the following potential function is introduced,
\[
\phi(h_f,h_g;u,v) = \int_0^u du'\, h_g^{-1}(u') + \int_0^v dv'\, h_f^{-1}(v') - uv. \tag{6}
\]
Often, when they are clear from context or irrelevant, we will drop the update functions $h_f$ and $h_g$ as arguments from the notation and denote this potential function by $\phi(u,v)$. Since $h_g^{-1}$ and $h_f^{-1}$ are non-decreasing, the potential $\phi(u,v)$ is convex in $u$ for fixed $v$ and convex in $v$ for fixed $u$. It is minimized over $v$ by setting $v = h_f(u)$ and over $u$ by setting $u = h_g(v)$.
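The coordinate-wise minimization property of (6) can be checked numerically. The sketch below uses a toy pair of update functions, $h_f(u)=u^2$ and $h_g(v)=v^2$, chosen only because their inverses integrate in closed form; this pair is an assumption for illustration, it is not one of the paper's examples, and it is not claimed to satisfy any gap condition. The helper names are likewise hypothetical.

```python
def phi(u, v):
    """Potential (6) for the toy pair h_f(u) = u**2, h_g(v) = v**2.

    Here h_g^{-1}(u) = sqrt(u) and h_f^{-1}(v) = sqrt(v), so each integral
    in (6) evaluates to (2/3) * t**1.5.
    """
    return (2.0 / 3.0) * u ** 1.5 + (2.0 / 3.0) * v ** 1.5 - u * v


def argmin_over_v(u, n=1000):
    """Grid minimizer of v -> phi(u, v) on [0, 1]."""
    grid = [i / n for i in range(n + 1)]
    return min(grid, key=lambda v: phi(u, v))


def argmin_over_u(v, n=1000):
    """Grid minimizer of u -> phi(u, v) on [0, 1]."""
    grid = [i / n for i in range(n + 1)]
    return min(grid, key=lambda u: phi(u, v))


u0, v0 = 0.6, 0.5
v_star = argmin_over_v(u0)  # should sit next to h_f(u0) = u0**2
u_star = argmin_over_u(v0)  # should sit next to h_g(v0) = v0**2
print(v_star, u0 ** 2)
print(u_star, v0 ** 2)
```

Because $\phi$ is convex in each argument separately, the grid minimizer lies within one grid step of the exact minimizer, which is what the comparison printed above reflects.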
[Fig. 1: plot of $h_f(u)$ and $h_g^{-1}(u)$ against $u$.] Fig. 1. A generic example of the systems we consider. The EXIT-like curves are $h_f$ (in red) and $h_g^{-1}$ (in blue). The signed area $A(h_f,h_g;1)$ from (7) is the sum of the light gray areas (positively signed) and the dark gray areas (negatively signed), and it is equal to 0.

Substituting $v = h_f(u)$ in (6), we obtain the signed area between the two EXIT curves (see Fig. 1) as
\[
A(h_f,h_g;u) = \phi(h_f,h_g;u,h_f(u)) = \int_0^u du'\,\bigl(h_g^{-1}(u') - h_f(u')\bigr). \tag{7}
\]
Note that this is the signed area bounded by the two curves and the region between the vertical axis at the origin and a vertical axis at $u$.

In [1], the following key result was shown. It states that for an interpolating FP to exist the potential $\phi$ must be minimal at both limit points.

Lemma 2.1: If there exists an interpolating FP solution to (4), then $\phi(h_f,h_g;u,v)\ge 0$ for all $u,v\in[0,1]$ and $A(h_f,h_g;1)=\phi(h_f,h_g;1,1)=0$.

The result applies not only to interpolating FPs but also to a relaxed definition of interpolating "consistent" FPs (CFPs) that we define in Section VI. In [1], when the assumption $\phi(h_f,h_g;1,1)=0$ is made, the condition $\phi(h_f,h_g;u,v)\ge 0$ for all $u,v\in[0,1]$ is termed the positive gap condition (PGC). In this paper we will additionally assume $\phi(h_f,h_g;1,1)=0$ throughout, so the term positive gap condition will be used to imply both this equality and the inequality in Lemma 2.1.

When the inequality in Lemma 2.1 is strict, i.e., $\phi(h_f,h_g;u,v)>0$ for $(u,v)\notin\{(0,0),(1,1)\}$, the condition is termed the strictly positive gap condition (SPGC) in [1]. In this case, it was shown that an interpolating fixed point profile exists provided $w$ is strictly positive on the interior of some interval $[-W,W]$ and zero off of the interval. This support condition on $w$ can be relaxed under various other conditions (see [1]).
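For the running $(\ell,r)$-regular example, the equality $A(h_f,h_g;1)=0$ combined with the two DE fixed point equations can be solved numerically, and the gap condition can then be inspected on a grid. The sketch below is a minimal illustration under stated assumptions: the integrals in (6) and (7) are evaluated in closed form for the update functions (2), the root is found by plain bisection on $x$, and the bracket `[0.1, 1.0]`, the grid resolution, and all function names are hypothetical choices that were only checked by hand for $(\ell,r)=(3,6)$, where the threshold comes out close to 0.488.

```python
def map_threshold(ell, r, lo=0.1, hi=1.0, iters=100):
    """Bisection on x for A(h_f, h_g; 1) = 0 with y = 1-(1-x)^(r-1), x = eps*y^(ell-1)."""

    def y_of(x):
        return 1.0 - (1.0 - x) ** (r - 1)

    def area_at_one(x):
        # A(h_f, h_g; 1) from (7) for the update functions (2):
        # int_0^1 h_g^{-1} minus int_0^1 h_f, using (1 - y)^(r/(r-1)) = (1 - x)^r.
        y = y_of(x)
        return (1.0 + (r - 1) / (r * y) * ((1.0 - x) ** r - 1.0)) / x - 1.0 / ell

    a_lo = area_at_one(lo)
    if a_lo * area_at_one(hi) > 0:
        raise ValueError("bracket [lo, hi] does not contain a sign change")
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        if area_at_one(mid) * a_lo > 0:
            lo = mid
        else:
            hi = mid
    x = 0.5 * (lo + hi)
    y = y_of(x)
    return x / y ** (ell - 1), x, y


def phi(u, v, ell, r, x, y):
    """Potential (6) for the update functions (2), with both integrals in closed form."""
    term_u = (u - (r - 1) / (r * y) * (1.0 - (1.0 - y * u) ** (r / (r - 1)))) / x
    term_v = (ell - 1) / ell * v ** (ell / (ell - 1))
    return term_u + term_v - u * v


eps_map, x_map, y_map = map_threshold(3, 6)
n = 100
gap_min = min(
    phi(i / n, j / n, 3, 6, x_map, y_map)
    for i in range(n + 1)
    for j in range(n + 1)
)
print(eps_map, x_map, y_map, gap_min)
```

A grid check of this kind is only a sanity test: it can expose a violation of the gap condition but cannot prove that the condition holds.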
Definition: We say that the positive gap condition (PGC) is satisfied when $\phi(h_f,h_g;1,1)=0$ and $\phi(h_f,h_g;u,v)\ge 0$ for all $u,v\in[0,1]$. The strictly positive gap condition (SPGC) is satisfied when $\phi(h_f,h_g;1,1)=0$ and $\phi(h_f,h_g;u,v)>0$ for $(u,v)\notin\{(0,0),(1,1)\}$.

Example: For the $(\ell,r)$-regular Gallager ensemble, with transmission over the BEC($\epsilon$) with $\epsilon=\epsilon_{\text{MAP}}$, we have the potential function
\[
\phi(u,v) = \frac{1}{x_{\text{MAP}}}\Bigl\{ u - \frac{r-1}{r\,y_{\text{MAP}}}\bigl(1-(1-y_{\text{MAP}}u)^{\frac{r}{r-1}}\bigr)\Bigr\} + \frac{\ell-1}{\ell}\, v^{\frac{\ell}{\ell-1}} - uv,
\]
and the signed area
\[
A(h_f,h_g;u) = \frac{1}{x_{\text{MAP}}}\Bigl\{ u + \frac{r-1}{r\,y_{\text{MAP}}}\bigl((1-y_{\text{MAP}}u)^{\frac{r}{r-1}} - 1\bigr)\Bigr\} - \frac{u^{\ell}}{\ell}.
\]
Moreover, we have $A(h_f,h_g;1)=0$. In fact, this last constraint together with the two fixed point equations $y_{\text{MAP}} = 1-(1-x_{\text{MAP}})^{r-1}$ and $x_{\text{MAP}} = \epsilon_{\text{MAP}}\,y_{\text{MAP}}^{\ell-1}$ completely determines $\epsilon_{\text{MAP}}$, $x_{\text{MAP}}$ and $y_{\text{MAP}}$. The SPGC holds for this example (see Section VIII for further illustration).

B. Potential functional of the spatially coupled system (4)

The solutions of the spatially coupled DE equations (4) are given by the stationary points of a potential functional $W$ of $f$ and $g$ defined below. This can be checked by setting the functional derivatives of this potential functional with respect to each of $f$ and $g$ to zero. We set
\[
W(f,g) = \int_{\mathbb{R}} dx\, I_{f,g,w}(x), \tag{8}
\]
where we have introduced the notation
\[
I_{f,g,w}(x) = \int_0^{g(x)} du\, h_g^{-1}(u) + \int_0^{f(x)} dv\, h_f^{-1}(v) - f^w(x)\,g(x). \tag{9}
\]

Example: For the $(\ell,r)$-regular LDPC code and transmission over the BEC($\epsilon$), the potential (8) is
\[
W(f,g) = \int_{\mathbb{R}} dx\,\Bigl[ \frac{1}{x_{\text{MAP}}}\Bigl\{ g(x) - \frac{r-1}{r\,y_{\text{MAP}}}\bigl(1-(1-y_{\text{MAP}}\,g(x))^{\frac{r}{r-1}}\bigr)\Bigr\} + \frac{\ell-1}{\ell}\, f(x)^{\frac{\ell}{\ell-1}} - f^w(x)\,g(x) \Bigr].
\]

Note that the limit of the integrand in (8) (and in the example) vanishes when $x\to-\infty$ because of the condition (5) on the profiles. It also vanishes when $x\to+\infty$ because of (5) and $A(h_f,h_g;1)=0$. However, this does not suffice for the existence of the integral, essentially due to the fact that $f^w - f$ may not be Lebesgue integrable (for monotonic profiles this difficulty does not arise).
So it is possible that $W(f,g)$ fails to be well-defined as a Lebesgue integral for some choices of the interpolating profiles.

Once we consider interpolating profiles and assume the PGC and that $C_w<\infty$, we can circumvent this technical issue by defining the potential functional as
\[
W(f,g) = \lim_{A,B\to\infty} \int_{-A}^{B} dx\, I_{f,g,w}(x). \tag{10}
\]
We show below that the limit always exists (it is possibly $+\infty$).

Lemma 2.2: Assuming the PGC, we have for any interpolating profile pair $f,g$ that
\[
W(f,g) \ge \int_{\mathbb{R}} dx\, \phi(f^w(x),g(x)), \tag{11}
\]
and given a sequence of interpolating pairs $f_i,g_i$ converging pointwise almost everywhere to an interpolating pair $f,g$ we have
\[
\liminf_{i\to\infty} W(f_i,g_i) \ge W(f,g). \tag{12}
\]

Proof: Define $H_f(f)=\int_0^{f} dv\, h_f^{-1}(v)$ and $H_g(g)=\int_0^{g} du\, h_g^{-1}(u)$. Note that
\[
I_{f,g,w} = (H_g\circ g) + (H_f\circ f) - f^w g.
\]
Now, if we define
\[
\tilde I_{f,g,w} = (H_g\circ g) + (H_f\circ f)^w - f^w g,
\]
then
\[
\int_{-A}^{B} dx\, I_{f,g,w}(x) = \int_{-A}^{B} dx\, \tilde I_{f,g,w}(x) + \int_{-A}^{B} dx\,\bigl(I_{f,g,w}(x) - \tilde I_{f,g,w}(x)\bigr)
= \int_{-A}^{B} dx\, \tilde I_{f,g,w}(x) + \int_{-A}^{B} dx\,\bigl((H_f\circ f)(x) - (H_f\circ f)^w(x)\bigr).
\]
Taking the limits $A,B\to+\infty$, by definition (10) and Lemma A.1 we obtain
\[
W(f,g) = \lim_{A,B\to+\infty} \int_{-A}^{B} dx\, \tilde I_{f,g,w}(x).
\]
We will shortly see that the PGC implies that $\tilde I_{f,g,w}(x)$ is non-negative, so that $W(f,g)$ is well defined (it is possibly $+\infty$). This also means that it is possible to adopt
\[
W(f,g) = \int_{\mathbb{R}} dx\, \tilde I_{f,g,w}(x) \tag{13}
\]
as an alternative expression for $W(f,g)$.

Now, note that $H_f(f)$ and $H_g(g)$ are convex functions because $h_f^{-1}$ and $h_g^{-1}$ are non-decreasing. Indeed,
\[
H_f(f+a) - H_f(f) = \int_f^{f+a} dv\, h_f^{-1}(v) \ge a\,h_f^{-1}(f) = a\,H_f'(f).
\]
By Jensen's inequality we have
\[
(H_f\circ f)^w \ge (H_f\circ f^w),
\]
and we therefore obtain
\[
\tilde I_{f,g,w}(x) \ge \phi(f^w(x),g(x)), \tag{14}
\]
which proves the non-negativity of $\tilde I_{f,g,w}(x)$ since $\phi(f^w(x),g(x))$ is non-negative by the PGC.

Integrating (14) and using (13), we obtain the first claim (11) of the lemma. Furthermore, we get the second claim (12) directly by applying Fatou's lemma to (13) (we can apply Fatou's lemma since by (14) $\tilde I_{f_i,g_i,w}$ is a non-negative sequence, and it converges to $\tilde I_{f,g,w}$).
Let us remark that in the process of proving this lemma, we have seen that $W(f,g)$ can be defined as (10) or equivalently as (13), as long as we assume the PGC, interpolating profiles and $C_w<+\infty$.

C. Discussion

In Section VI, we show that among all interpolating profiles, monotonic interpolating CFPs yield minimizers of $W$. To do that, we use rearrangement properties that are summarized in Section III. For a fixed $f$, we always have $W(f,g)\ge W(f,h_g\circ f^w)$. This is because $I_{f,g,w}(x)$ is convex in $g(x)$ for fixed $f(x)$, and setting $g(x)=h_g(f^w(x))$ minimizes $I_{f,g,w}(x)$ over $g(x)$ for fixed $f(x)$.

One of the main results of this paper is to show the displacement convexity of $W$ in its two arguments. More precisely, we can think of interpolating between two pairs $(f_0,g_0)$ and $(f_1,g_1)$ of monotonic profiles by interpolating their inverse functions. Hence, we consider
\[
\begin{cases}
f_\lambda^{-1} = (1-\lambda)\,f_0^{-1} + \lambda\,f_1^{-1}, \\
g_\lambda^{-1} = (1-\lambda)\,g_0^{-1} + \lambda\,g_1^{-1},
\end{cases}
\]
and show that $W(f_\lambda,g_\lambda)$ is a convex function of $\lambda$. Note that for a monotonic interpolating profile $p$ the inverse function $p^{-1}(u)$ is uniquely defined for almost all $u\in(0,1)$, and the right and left limits $p^{-1}(u^+)$ and $p^{-1}(u^-)$, respectively, are uniquely determined. Displacement convexity is explained in more detail in Section V.

Displacement convexity applies only to monotonic profiles. In the next section, we address the conditions under which one can conclude that minimizers of $W$ satisfying (5) can be taken to be monotonic.

The following quantities will play a crucial role in the remainder of this work,
\[
\Omega(x) = \int_{-\infty}^{x} dz\, w(z), \qquad V(x) = \int_{-\infty}^{x} dz\, \Omega(z). \tag{15}
\]
Here, $V$ is called the kernel for reasons that will become clear. As will be seen, displacement convexity arises from the convexity of $V$.

Lemma 2.3: Assume that $C_w<\infty$. Then, $V$ is well defined and convex.

Proof: Using integration by parts, we can write
\[
V(x) = \int_{-\infty}^{x} dz\, \Omega(z) = z\,\Omega(z)\Big|_{-\infty}^{x} - \int_{-\infty}^{x} dz\, z\,w(z). \tag{16}
\]
For $z\le 0$, we have
\[
\int_{-\infty}^{z} dx\, |x|\,w(x) \ge \int_{-\infty}^{z} dx\, |z|\,w(x) = |z|\,\Omega(z) \ge 0,
\]
so taking $z\to-\infty$ shows $\lim_{z\to-\infty} z\,\Omega(z) = 0$.
Using (16) we conclude that
\[
V(x) = x\,\Omega(x) - \int_{-\infty}^{x} dz\, z\,w(z). \tag{17}
\]
Thus, $V$ is finite and well-defined. Convexity follows because $V''(x)=w(x)\ge 0$.

Much of the analysis in this paper proceeds relatively simply under the assumption that
\[
\int_{\mathbb{R}} dx\,\bigl(1-f^w(x)\bigr)\,g(x) < \infty.
\]
Most of our results will first be established under this assumption. In general, however, this assumption is not needed and it is sufficient only that $C_w<\infty$. We typically generalize our results to this case by taking limits. Let us discuss this issue.

Fig. 2. A profile $f$ and its saturated version $\lfloor f\rfloor_K$.

Definition: We say that a function $f$ is saturated off of the finite interval $[-K,K]$ if $f(x)=0$ for $x\in(-\infty,-K)$ and $f(x)=1$ for $x\in(K,\infty)$. Given a profile $f$ let us define $\lfloor f\rfloor_K$ by
\[
\lfloor f\rfloor_K(x) = \mathbb{1}_{\{|x|\le K\}}\,f(x) + \mathbb{1}_{\{x>K\}}.
\]
By definition, $\lfloor f\rfloor_K$ is saturated off of $[-K,K]$ (see Fig. 2).

Lemma 2.4: Let $f,g$ be interpolating profiles and assume the PGC and that $C_w<\infty$. Then
\[
\lim_{K\to\infty} W(\lfloor f\rfloor_K,\lfloor g\rfloor_K) = W(f,g).
\]
Proof: See Appendix B.

We end this section with another useful definition.

Definition: Assuming it exists, we define
\[
L(f,g) = W(f,g) - \int_{\mathbb{R}} \bigl(1-f^w(x)\bigr)\,g(x)\,dx = \int_{\mathbb{R}} dx\,\Bigl( \int_0^{f(x)} dv\, h_f^{-1}(v) - \int_0^{g(x)} du\,\bigl(1-h_g^{-1}(u)\bigr) \Bigr). \tag{18}
\]
As we will see, the functional $L(f,g)$ captures the "simple" (uncoupled) part of $W$: it is invariant under increasing rearrangements and linear under displacement interpolation.

III. REARRANGEMENTS

Displacement convexity is usually defined on a space of probability measures. For measures on the real line, it is most convenient to view displacement convexity on a space of cumulative distribution functions (cdf's). It is therefore fortunate that the search for the global minimum of the potential functional (8) can be reduced to the space of profiles $f$ and $g$ that are non-decreasing. In this section, we use the tool of increasing rearrangements to show that such rearrangements of $f$ and $g$ can only decrease the potential.
Symmetric decreasing rearrangements are a classical tool in analysis, see [22]. Here we will use a closely related cousin, namely increasing rearrangements (see [23]). Our presentation is self-contained and no previous exposure to rearrangements is needed.

Consider a profile $p:\mathbb{R}\to[0,1]$ that satisfies (5). The increasing rearrangement² of $p$ is the increasing function $\bar p$ that has the same limits, and where the mass of each level set is in some sense preserved (here the mass of a level set is infinite). More formally, let us represent $p$ in layer cake form as
\[
p(x) = \int_0^{p(x)} dt = \int_0^1 dt\, \mathbb{1}_{E_t}(x), \tag{19}
\]
where $\mathbb{1}_{E_t}$ is the indicator function of the level set $E_t=\{x \mid p(x)>t\}$. For each value $t\in[0,1)$, the level set $E_t$ can be written as the disjoint union of a bounded set $A_t$ and a half line $(a_t,+\infty)$. We define the rearranged set $\bar E_t = (a_t-|A_t|,+\infty)$, and then
\[
\bar p(x) = \int_0^1 dt\, \mathbb{1}_{\bar E_t}(x). \tag{20}
\]
A simple example capturing the notion of increasing rearrangement is shown in Fig. 3.

² Note that an increasing rearrangement is not necessarily strictly increasing.

Fig. 3. A simple example of an increasing rearrangement for step functions.

Lemma 3.1: Let $p$ and $q$ be two profiles satisfying (5). Then, assuming the left integral exists, we have
\[
\int_{\mathbb{R}} dx\,\bigl(p(x)-q(x)\bigr) = \int_{\mathbb{R}} dx\,\bigl(\bar p(x)-\bar q(x)\bigr).
\]

Proof: For each $t\in(0,1)$ there exists a minimal $a_t$ such that $(a_t,\infty)\subset\{x: p(x)>t\}\cap\{x: q(x)>t\}$. Define $B_{p,t}=\{x: p(x)>t\}\setminus(a_t,\infty)$ and $B_{q,t}=\{x: q(x)>t\}\setminus(a_t,\infty)$. We also define the same quantities for the rearranged profiles $\bar p$ and $\bar q$, namely $\bar a_t$, $B_{\bar p,t}$ and $B_{\bar q,t}$. We show below that
\[
|B_{p,t}| - |B_{q,t}| = |B_{\bar p,t}| - |B_{\bar q,t}|. \tag{21}
\]
Equation (21) gives the result since, using the layer cake representation, it follows that
\[
\int_{\mathbb{R}} dx\,\bigl(p(x)-q(x)\bigr) = \int_0^1 dt\,\bigl(|B_{p,t}|-|B_{q,t}|\bigr) = \int_0^1 dt\,\bigl(|B_{\bar p,t}|-|B_{\bar q,t}|\bigr) = \int_{\mathbb{R}} dx\,\bigl(\bar p(x)-\bar q(x)\bigr).
\]
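On a finite uniform grid, the construction (19)-(20) and the identity of Lemma 3.1 have a simple discrete analogue, sketched below. The sketch rests on an assumption that is not part of the paper: a profile sampled on a window where it is already saturated (0 on the left, 1 on the right) is rearranged by sorting its samples, which keeps the size of every level set within the window and pushes it to the right. The sample values, the grid spacing `dx`, and the function names are hypothetical.

```python
def increasing_rearrangement(samples):
    """Discrete analogue of (20): sort the sampled values in non-decreasing order."""
    return sorted(samples)


def level_set_size(samples, t, dx):
    """Grid measure of the level set {x : p(x) > t} restricted to the window."""
    return dx * sum(1 for value in samples if value > t)


# Hypothetical sampled profiles, saturated at 0 on the left and 1 on the right.
dx = 0.5
p = [0.0, 0.0, 0.4, 0.2, 0.7, 0.5, 1.0, 1.0]
q = [0.0, 0.3, 0.1, 0.6, 0.6, 0.9, 1.0, 1.0]

p_bar = increasing_rearrangement(p)
q_bar = increasing_rearrangement(q)

# Discrete version of Lemma 3.1: the integral of p - q is unchanged.
lhs = dx * sum(a - b for a, b in zip(p, q))
rhs = dx * sum(a - b for a, b in zip(p_bar, q_bar))
print(lhs, rhs)

# Level sets keep their size under the rearrangement.
for t in (0.1, 0.45, 0.8):
    print(t, level_set_size(p, t, dx), level_set_size(p_bar, t, dx))
```

In this discrete setting the identity is immediate because sorting only permutes the samples; the continuum statement needs the level-set argument of the proof because the individual integrals of $p$ and $q$ over $\mathbb{R}$ are infinite.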