S-AMP for Non-linear Observation Models

Burak Çakmak, Department of Electronic Systems, Aalborg University, 9220 Aalborg, Denmark. Email: [email protected]
Ole Winther, DTU Compute, Technical University of Denmark, 2800 Lyngby, Denmark. Email: [email protected]
Bernard H. Fleury, Department of Electronic Systems, Aalborg University, 9220 Aalborg, Denmark. Email: [email protected]

Abstract—Recently we extended the approximate message passing (AMP) algorithm to handle general invariant matrix ensembles. In this contribution we extend our S-AMP approach to non-linear observation models. We obtain the generalized AMP (GAMP) algorithm as the special case when the measurement matrix has zero-mean iid Gaussian entries. Our derivation is based upon 1) deriving expectation propagation (EP)-like algorithms from the stationary-point equations of the Gibbs free energy under first- and second-moment constraints and 2) applying additive free convolution in free probability theory to obtain low-complexity updates for the second-moment quantities.

Index Terms—Approximate Message Passing, Variational Inference, Expectation Propagation, Free Probability

I. INTRODUCTION

Approximate message passing techniques, e.g. [1]–[3], have recently received significant attention from the signal processing community. Essentially, these methods are based on taking the large system limit of loopy belief propagation, where the central limit theorem can be applied when the underlying measurement matrix has independent and zero-mean entries.

Variational inference techniques are well-established in the fields of information theory, e.g. [4], [5], and machine learning, e.g. [6], [7]. For example, it is well known that exact inference can be formulated as the solution to a minimization problem of the Gibbs free energy of the underlying probabilistic model under certain marginalization consistency constraints [4]. We have recently shown in [8] that for a zero-mean independent identically distributed (iid) measurement matrix, the approximate message passing (AMP) algorithm [1] can also be obtained from the stationary-point equations of the Gibbs free energy under first- and second-moment consistency constraints. Furthermore, AMP can be extended to general invariant¹ matrix ensembles by means of the asymptotic spectrum of the measurement matrix. We call this approach S-AMP (where S comes from the fact that the derivation uses the S-transform). AMP is an estimation algorithm for linear observation models. However, many interesting cases occur in practice where the observation model is non-linear, e.g. non-linear forms of compressed sensing, or Gaussian processes for classification. In this article we extend the S-AMP approach [8] to general observation models. Specifically we address the sum-product generalized AMP (GAMP for short) algorithm [3].

The derivation of GAMP is based on certain approximations (mainly Gaussian and quadratic approximations) of loopy belief propagation. If the measurement matrix is large and has zero-mean iid entries, GAMP provides excellent performance, e.g. [3], [9]. Furthermore, for general matrix ensembles it can show quite reasonable accuracy [10]. However, the algorithm itself and its derivation are not well understood.

To better understand GAMP, the authors of [11] characterize its fixed points. Specifically, they show that GAMP can be obtained from the stationary-point equations of some implicit approximations of the naive mean-field approximation [11]. These implicit approximations only provide limited insight. Furthermore, the naive mean-field interpretation is misleading, because the fixed points of AMP-type algorithms are typically known as TAP-like equations, i.e. they include a correction term to the naive mean-field solution. In fact GAMP can also be obtained from the stationary-point equations of the Bethe free energy (BFE) of the underlying loopy graph under first- and second-moment constraints. However, this approach also limits our understanding, because the BFE formulation of a loopy graph is suitable for sparsely connected systems.

In this work we focus on the BFE formulation of a tree graph, i.e. an exact Gibbs free energy formulation. We note that our approach coincides with expectation propagation (EP) [12]–[14], since the fixed points of EP are the stationary points of the BFE of the underlying probabilistic graph under a set of moment consistency constraints [6].

Notations: The entries of the $N\times K$ matrix $X$ are denoted by either $X_{nk}$ or $[X]_{nk}$, $n \in \mathcal{N} \triangleq \{n : 1 \le n \le N\}$ and $k \in \mathcal{K} \triangleq \{k : 1 \le k \le K\}$. Without loss of generality we assume that $\mathcal{K}$ and $\mathcal{N}$ are disjoint. $(\cdot)^\dagger$ denotes transposition. We denote by $\Re z$ and $\Im z$ the real and imaginary parts of $z \in \mathbb{C}$, respectively. The entries of a vector $u \in \mathbb{R}^{T\times 1}$ are indicated by either $u_t$ or $[u]_t$, $t \in [1,T]$. Furthermore $\langle u \rangle \triangleq \sum_{t=1}^{T} u_t / T$. Moreover, $\mathrm{diag}(u)$ is a diagonal matrix with the elements of the vector $u$ on the main diagonal. For a square matrix $X$, $\mathrm{diag}(X)$ is a column vector containing the diagonal elements of $X$. Furthermore $\mathrm{Diag}(X) \triangleq \mathrm{diag}(\mathrm{diag}(X))$. The Gaussian probability density function (pdf) with mean $\mu$ and covariance $\Sigma$ is denoted by $\mathrm{N}(\cdot;\mu,\Sigma)$. Throughout the paper, when referring to "the large system limit" we imply that $N, K$ tend to infinity with the ratio $\alpha \triangleq N/K$ fixed. All large system limits are assumed to hold in the almost sure sense, unless explicitly stated.

¹Note that we omit to mention the invariance property in [8]. It is however crucial for the derivation.
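The vector and matrix conventions above map directly onto array operations. A minimal NumPy sketch (the helper names are ours, chosen to mirror the notation; not code from the paper):

```python
import numpy as np

def avg(u):
    # <u> = sum_{t=1}^T u_t / T
    return np.mean(u)

def diag_vec(X):
    # diag(X) for a square matrix X: column vector of its diagonal
    return np.diag(X)

def Diag(X):
    # Diag(X) = diag(diag(X)): X with its off-diagonal entries zeroed
    return np.diag(np.diag(X))

X = np.arange(9.0).reshape(3, 3)
print(avg(np.array([1.0, 2.0, 3.0])))  # 2.0
print(Diag(X))                         # keeps only the diagonal entries 0, 4, 8
```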
II. SYSTEM MODEL AND REVIEW OF GAMP

Consider the estimation of a random vector $x \in \mathbb{R}^{K\times 1}$ which is linearly transformed by $A \in \mathbb{R}^{N\times K}$ as $z \triangleq Ax$, then passed through a noisy channel whose output is given by $y \in \mathbb{R}^{N\times 1}$. We assume that the conditional pdf of the channel factorizes according to

  $p(y|z) = \prod_{n\in\mathcal{N}} p(y_n|z_n)$.  (1)

Furthermore, for the Bayesian setting we assign a prior pdf for $x$ that is assumed to factorize as

  $p(x) = \prod_{k\in\mathcal{K}} p(x_k)$.  (2)

A. GAMP summarized

We summarize GAMP here for the sake of streamlining and making the connection to the derivation of S-AMP. We separate the GAMP iteration rules [3] into two parts: (i) GAMP-1st order, which initializes $\hat{x}^t$, $\tau_x^t$ and $m^t$ from tabula rasa at $t \le 0$ and proceeds iteratively as

  $\kappa_z^t = A\hat{x}^t - (\Lambda_z^t)^{-1} m^{t-1}$  (3)
  $\hat{z}^t = \mu_z(\kappa_z^t; \Lambda_z^t)$  (4)
  $\tau_z^t = \sigma_z(\kappa_z^t; \Lambda_z^t)$  (5)
  $m^t = \Lambda_z^t(\hat{z}^t - \kappa_z^t)$  (6)
  $\kappa_x^t = (\Lambda_x^t)^{-1} A^\dagger m^t + \hat{x}^t$  (7)
  $\hat{x}^{t+1} = \mu_x(\kappa_x^t; \Lambda_x^t)$  (8)
  $\tau_x^{t+1} = \sigma_x(\kappa_x^t; \Lambda_x^t)$.  (9)

(ii) GAMP-2nd order comprises the update rules for $\Lambda_z^t$ and $\Lambda_x^t$:

  $\Lambda_z^t = (\mathrm{diag}((A\circ A)\tau_x^t))^{-1}$  (10)
  $\tau_m^t = \Lambda_z^t(1 - \Lambda_z^t \tau_z^t)$  (11)
  $\Lambda_x^t = \mathrm{diag}((A\circ A)^\dagger \tau_m^t)$.  (12)

In these expressions $1$ is the all-ones vector of appropriate dimension and $\mu_x$ and $\sigma_x$ are scalar functions. Specifically, if $\Lambda$ is a $K\times K$ diagonal matrix and $\kappa$ is a $K\times 1$ vector, then for $k \in \mathcal{K}$, $[\mu_x(\kappa;\Lambda)]_k$ and $[\sigma_x(\kappa;\Lambda)]_k$ are respectively the mean and the variance taken over the pdf

  $q_k(x_k) \propto p_k(x_k)\exp\!\left(-\tfrac{\Lambda_{kk}}{2}(x_k-\kappa_k)^2\right)$.  (13)

Similarly, $\mu_z$ and $\sigma_z$ are scalar functions such that if $\Lambda$ is an $N\times N$ diagonal matrix and $\kappa$ is an $N\times 1$ vector, then for $n\in\mathcal{N}$, $[\mu_z(\kappa;\Lambda)]_n$ and $[\sigma_z(\kappa;\Lambda)]_n$ are respectively the mean and the variance taken over the pdf

  $q_n(z_n) \propto p(y_n|z_n)\exp\!\left(-\tfrac{\Lambda_{nn}}{2}(z_n-\kappa_n)^2\right)$.  (14)

If the entries of $A$ are iid with zero mean and variance $1/N$, the iteration steps of GAMP-2nd order simplify to

  $\Lambda_z^t = \frac{\alpha}{\langle\tau_x^t\rangle} I, \qquad \Lambda_x^t = \langle\tau_m^t\rangle I$,  (15)

where $I$ is the identity matrix of appropriate dimension. We note that if in addition $p(y|z) = \mathrm{N}(y; z, \sigma^2 I)$, GAMP yields AMP, see e.g. [2, Appendix C].
III. GIBBS FREE ENERGY WITH MOMENT CONSTRAINTS

For the sake of notational compactness, consider $s = (x, z)$. Furthermore we introduce the set $V \triangleq \mathcal{K}\cup\mathcal{N}$ and recall that $\mathcal{K}$ and $\mathcal{N}$ are disjoint. Moreover we define

  $f_A(s) \triangleq \delta(z - Ax)$  (16)
  $f_v(s_v) \triangleq \begin{cases} p_v(x_v) & v \in \mathcal{K} \\ p(y_v|z_v) & v \in \mathcal{N}. \end{cases}$  (17)

With these definitions, the posterior pdf of $s$ reads

  $p(s|y,A) = \frac{1}{Z} f_A(s) \prod_{v\in V} f_v(s_v)$  (18)

with $Z$ denoting a normalization constant. The factor graph representing (18) is a tree. Thus the BFE of (18) is equal to its Gibbs free energy [4]:

  $G(\{\tilde{b}_v, b_A, b_v\}) \triangleq -\sum_{v\in V}\int \tilde{b}_v(s_v)\log\tilde{b}_v(s_v)\,ds_v - \int b_A(s)\log\frac{f_A(s)}{b_A(s)}\,ds - \sum_{v\in V}\int b_v(s_v)\log\frac{f_v(s_v)}{b_v(s_v)}\,ds_v.$  (19)

Here $b_A$ and $b_v$, $v \in V$, denote the beliefs of the factors in (18), while $\tilde{b}_v$, $v \in V$, denote the beliefs of the unknown variables in (18). Without loss of generality we assume that the expressions $f_A(s)/b_A(s)$ and $f_v(s_v)/b_v(s_v)$ in (19) are strictly continuous, so that the Gibbs free energy is well-defined. Indeed this is what we will end up with in the analysis.

If we define a Lagrangian for (19) that accounts for certain marginalization consistency constraints, then at its stationary point the belief $\tilde{b}_v(s_v)$ is equal to $p(s_v|y,A)$ for all $v \in V$ [4]. Instead, following the arguments of [6], we define the Lagrangian on the basis of a set of moment consistency constraints as

  $L(\{\tilde{b}_v, b_A, b_v\}) \triangleq G(\{\tilde{b}_v, b_A, b_v\}) + Z - \sum_{v\in V}\int \nu_v^\dagger \phi(s_v)\left\{b_A(s) - \tilde{b}_v(s_v)\right\}ds - \sum_{v\in V}\int \bar{\nu}_v^\dagger \phi(s_v)\left\{b_v(s_v) - \tilde{b}_v(s_v)\right\}ds_v.$  (20)

Here we consider constraints on the mean and variance, i.e. $\phi(s_v) = (s_v, s_v^2)$, $v \in V$. For convenience we write the Lagrange multipliers as

  $\nu_v \triangleq \left(\gamma_v, -\frac{\bar{\Lambda}_{vv}}{2}\right), \qquad \bar{\nu}_v \triangleq \left(\rho_v, -\frac{\Lambda_{vv}}{2}\right), \qquad v \in V.$

The term $Z$ accounts for the normalization constraints:

  $Z \triangleq -\beta_A\left(1 - \int b_A(s)\,ds\right) - \sum_{v\in V}\left[\tilde{\beta}_v\left(1 - \int \tilde{b}_v(s_v)\,ds_v\right) + \beta_v\left(1 - \int b_v(s_v)\,ds_v\right)\right]$

where $\beta_A$, $\beta_v$, $\tilde{\beta}_v$ are the associated Lagrange multipliers. We formulate the estimate of $s_v$, $v \in V$, as

  $\hat{s}_v \triangleq \int s_v\,\tilde{b}_v^\star(s_v)\,ds_v,$  (21)

where $\tilde{b}_v^\star(s_v)$ represents $\tilde{b}_v(s_v)$ at a stationary point of (20).

A. The Stationary Points of the Lagrangian

For notational convenience we first introduce the $(K+N)\times(K+N)$ diagonal matrices $\Lambda$ and $\bar{\Lambda}$ as well as the $(K+N)\times 1$ vectors $\gamma$ and $\rho$, whose entries are respectively $\Lambda_{vv}$, $\bar{\Lambda}_{vv}$, $\gamma_v$ and $\rho_v$, $v \in V$. In connection with the variables $x$ and $z$ we write

  $\bar{\Lambda} = \begin{pmatrix} \bar{\Lambda}_x & 0 \\ 0 & \bar{\Lambda}_z \end{pmatrix}, \qquad \gamma = (\gamma_x, \gamma_z)$  (22)
  $\Lambda = \begin{pmatrix} \Lambda_x & 0 \\ 0 & \Lambda_z \end{pmatrix}, \qquad \rho = (\rho_x, \rho_z).$  (23)

The dimensions of $\Lambda_x$ and $\bar{\Lambda}_x$ are $K\times K$; the vectors $\gamma_x$ and $\rho_x$ have dimension $K\times 1$.

Following the arguments of [6], the stationary points of the Lagrangian (20) have the form

  $\tilde{b}_v^\star(s_v) = \frac{1}{\tilde{Z}_v}\exp\!\left((\nu_v + \bar{\nu}_v)^\dagger\phi(s_v)\right), \quad v \in V$  (24)
  $b_v^\star(s_v) = \frac{1}{Z_v} f_v(s_v)\exp\!\left(\bar{\nu}_v^\dagger\phi(s_v)\right), \quad v \in V$  (25)
  $b_A^\star(s) = \frac{1}{Z_A} f_A(s)\exp\!\left(-\frac{1}{2}s^\dagger\bar{\Lambda}s + s^\dagger\gamma\right)$  (26)

where $Z_A$, $Z_v$, $\tilde{Z}_v$ are the associated normalization constants.

Let us first consider the marginalization of the belief $b_A^\star(s)$ with respect to $z$:

  $b_A^\star(x) = \int b_A^\star(x,z)\,dz = \mathrm{N}(x; \hat{x}, \Sigma_x)$  (27)

where

  $\Sigma_x \triangleq (\bar{\Lambda}_x + A^\dagger\bar{\Lambda}_z A)^{-1}, \qquad \hat{x} \triangleq \Sigma_x(\gamma_x + A^\dagger\gamma_z).$  (28)

Here we note that $\Sigma_x$ is positive definite since $b_A^\star(x)$ is a well-defined pdf. Second, let us consider the marginalization over $x$, which basically follows from the linear transformation property of a Gaussian random vector:

  $b_A^\star(z) = \frac{1}{Z_A}\,e^{-\frac{1}{2}z^\dagger\bar{\Lambda}_z z + z^\dagger\gamma_z}\int\delta(z-Ax)\,e^{-\frac{1}{2}x^\dagger\bar{\Lambda}_x x + x^\dagger\gamma_x}\,dx = \int\delta(z-Ax)\,\mathrm{N}(x;\hat{x},\Sigma_x)\,dx = \mathrm{N}(z; \hat{z}, \Sigma_z)$  (29)

where $\hat{z} \triangleq A\hat{x}$ and $\Sigma_z \triangleq A\Sigma_x A^\dagger$.

At this stage it is convenient to define

  $\kappa \triangleq (\kappa_x, \kappa_z) = (\Lambda_x^{-1}\rho_x, \Lambda_z^{-1}\rho_z)$  (30)

with $\kappa_x \in \mathbb{R}^K$. In this way we can write the belief in (25) as

  $b_v^\star(s_v) \propto f_v(s_v)\exp\!\left(-\frac{\Lambda_{vv}}{2}(s_v - \kappa_v)^2\right).$  (31)

Thereby (31) has a form identical to (13) and (14) for $v \in \mathcal{K}$ and for $v \in \mathcal{N}$, respectively. Then let us define

  $\mu(\kappa;\Lambda) \triangleq (\mu_x(\kappa_x;\Lambda_x), \mu_z(\kappa_z;\Lambda_z))$  (32)
  $\sigma(\kappa;\Lambda) \triangleq (\sigma_x(\kappa_x;\Lambda_x), \sigma_z(\kappa_z;\Lambda_z)).$  (33)

The entries $[\mu(\kappa;\Lambda)]_v$ and $[\sigma(\kappa;\Lambda)]_v$ are respectively the mean and variance of the belief (25). Moreover we introduce

  $\Sigma \triangleq \begin{pmatrix}\Sigma_x & 0 \\ 0 & \Sigma_z\end{pmatrix}, \qquad \hat{s} = (\hat{x}, \hat{z}).$  (34)

With these definitions, the identities resulting from the moment consistency constraints are given by

  $\hat{s} = \mathrm{Diag}(\Sigma)(\gamma + \rho), \qquad \hat{s} = \mu(\kappa;\Lambda)$  (35)
  $\mathrm{Diag}(\Sigma) = (\Lambda + \bar{\Lambda})^{-1}, \qquad \mathrm{diag}(\Sigma) = \sigma(\kappa;\Lambda).$  (36)

B. The TAP-like Equations and GAMP-1st Order

By using the fixed-point identities presented in Section III-A, one can introduce numerous fixed-point algorithms. In this work we restrict our attention to TAP-like algorithms, e.g. [3], [12], [14]. To that end we start with the definitions in (28) and write

  $\gamma_x = -A^\dagger\gamma_z + (\bar{\Lambda}_x + A^\dagger\bar{\Lambda}_z A)\hat{x}.$  (37)

Furthermore, by making use of the identities in (35) and (36) we have

  $\rho_x = A^\dagger\gamma_z - (\bar{\Lambda}_x + A^\dagger\bar{\Lambda}_z A)\hat{x} + (\Lambda_x + \bar{\Lambda}_x)\hat{x}$  (38)
  $\phantom{\rho_x} = A^\dagger(\gamma_z - \bar{\Lambda}_z A\hat{x}) + \Lambda_x\hat{x}$  (39)
  $\phantom{\rho_x} = A^\dagger m + \Lambda_x\hat{x} \quad \text{with} \quad m \triangleq \gamma_z - \bar{\Lambda}_z A\hat{x}.$  (40)

Moreover, by the definition of $m$, and using $\hat{z} = A\hat{x}$, we also point out that

  $m = (\Lambda_z + \bar{\Lambda}_z)\hat{z} - \rho_z - \bar{\Lambda}_z A\hat{x}$  (41)
  $\phantom{m} = \Lambda_z\hat{z} - \rho_z = \Lambda_z(\hat{z} - \kappa_z).$  (42)

Thereby we exactly obtain the fixed points of GAMP-1st order, i.e. (3)–(9). Now let us keep the iteration steps of GAMP-1st order but define the update rules for $\Lambda_x^t$ and $\Lambda_z^t$ on the basis of the fixed-point identities in (36). For example:

  $\bar{\Lambda}_z^t = (\mathrm{diag}(\tau_z^{t-1}))^{-1} - \Lambda_z^{t-1}$  (43)
  $\Sigma_x^t = (\bar{\Lambda}_x^{t-1} + A^\dagger\bar{\Lambda}_z^t A)^{-1}$  (44)
  $\Lambda_z^t = \mathrm{Diag}\!\left(A\Sigma_x^t A^\dagger\right)^{-1} - \bar{\Lambda}_z^t$  (45)
  $\bar{\Lambda}_x^t = (\mathrm{diag}(\tau_x^t))^{-1} - \Lambda_x^{t-1}$  (46)
  $\Lambda_x^t = \mathrm{Diag}\!\left(\Sigma_x^t\right)^{-1} - \bar{\Lambda}_x^t.$  (47)

In this way we obtain a new fixed-point algorithm whose fixed points are the stationary points of the Lagrangian (20). However, from the complexity point of view these updates are problematic due to the matrix inversion in (44). In the sequel we will address how to bypass (44) as $K, N$ become large.
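Before any simplification, the exact second-order sweep (43)–(47) can be written down directly; its cost is dominated by the $K\times K$ inversion in (44). A small self-contained sketch of one sweep, with our own toy values, diagonal matrices stored as vectors, and the bar/no-bar split of Section III-A:

```python
import numpy as np

rng = np.random.default_rng(1)
N, K = 40, 20
A = rng.standard_normal((N, K)) / np.sqrt(N)

# Current first-order variances and previous second-order parameters
# (illustrative values chosen so all precisions stay positive).
tau_z, tau_x = np.full(N, 0.4), np.full(K, 0.8)
lam_z, lam_x = np.full(N, 1.5), np.full(K, 0.5)

lam_bar_z = 1.0 / tau_z - lam_z                            # (43)
lam_bar_x = 1.0 / tau_x - lam_x                            # (46)
# (44): the K x K inversion that dominates the cost, O(K^3) per sweep
Sigma_x = np.linalg.inv(np.diag(lam_bar_x) + A.T @ (lam_bar_z[:, None] * A))
lam_z_new = 1.0 / np.diag(A @ Sigma_x @ A.T) - lam_bar_z   # (45)
lam_x_new = 1.0 / np.diag(Sigma_x) - lam_bar_x             # (47)

# The x-part of the moment-matching identity (36) holds by construction:
assert np.allclose(1.0 / (lam_x_new + lam_bar_x), np.diag(Sigma_x))
```

The R-transform identities (53) and (62) derived next exist precisely to avoid this $O(K^3)$ step.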
C. The Large-System Simplifications

To circumvent the complexity problem in (44), we utilize the so-called additive free convolution in free probability theory [15]. The reduction that we obtain in this way can also be obtained by means of the self-averaging ansatz in [14, Section 3.1].

In order to make use of additive free convolution we need to restrict our consideration to invariant matrix ensembles:

ASSUMPTION 1. Consider the singular value decomposition $A = UDV$ where $U$ ($N\times N$) and $V$ ($K\times K$) are orthogonal matrices and $D$ is an $N\times K$ non-negative diagonal matrix. We distinguish between the invariance assumptions on $A$ from the right and from the left: a) $A$ is invariant from the right, i.e. $V$ is Haar distributed; b) $A$ is invariant from the left, i.e. $U$ is Haar distributed.

It indeed makes sense to distinguish between invariance from the right and invariance from the left. For example, once we consider the classical linear observation model such as $p(y|z) = \mathrm{N}(y; z, \sigma^2 I)$, then $\bar{\Lambda}_z = I/\sigma^2$. In this case we do not need to consider Assumption 1-b).

Second, we make the following technical assumption on the limiting spectra of the respective matrices:

ASSUMPTION 2. As $N, K \to \infty$ with the ratio $\alpha = N/K$ fixed, let the spectra of $\bar{\Lambda}_x$, $\bar{\Lambda}_z$ and $A^\dagger A$ converge almost surely to some limiting spectra whose supports are compact.

Due to the lack of an explicit definition of the "Lagrangian" matrix $\bar{\Lambda}$, Assumption 2 is rather implicit. Nevertheless it can be considered in the same vein as the so-called thermodynamic limit in statistical physics: all microscopic variables converge to deterministic values in the thermodynamic limit [16].

For example, under Assumption 1-a) and Assumption 2, it turns out that $\bar{\Lambda}_x$ and $J_z \triangleq A^\dagger\bar{\Lambda}_z A$ are asymptotically free [17], and from [15, Lemma 3.3.4] we have that²

  $R^K_{\bar{\Lambda}_x + J_z}(\omega) \simeq R^K_{\bar{\Lambda}_x}(\omega) + R^K_{J_z}(\omega), \qquad \Im\omega < 0.$  (48)

Here, for a $T\times T$ symmetric matrix $X$, $R^T_X$ denotes the R-transform of the spectrum of $X$ (see Appendix A) and $\simeq$ stands for the large-system approximation that turns into an almost-sure equality in the large system limit. Furthermore we introduce

  $R^T_X(r) \triangleq \lim_{\omega\to r} \Re R^T_X(\omega), \qquad \Im r = 0$  (49)

whenever the limit exists.

It turns out that by solely invoking "additive free convolution", e.g. (48), we can easily resolve the complexity issue via fixed-point identities for $\Lambda_x$ and $\Lambda_z$ that do not require matrix inversion. First we consider the simplification for $\Lambda_x$. To that end let us first define the auxiliary variable

  $q \triangleq \frac{1}{K}\,\mathrm{tr}\{(\bar{\Lambda}_x + J_z)^{-1}\} = \frac{1}{K}\sum_{k\in\mathcal{K}}\frac{1}{[\Lambda_x]_{kk} + [\bar{\Lambda}_x]_{kk}}.$  (50)

Then, by invoking (48), we easily obtain (see Appendix B) that

  $q \simeq \frac{1}{K}\sum_{k\in\mathcal{K}}\frac{1}{[\bar{\Lambda}_x]_{kk} + R^K_{J_z}(-q)}.$  (51)

Thereby, we conclude that

  $[\Lambda_x]_{kk} \simeq R^K_{J_z}(-q), \qquad k \in \mathcal{K}.$  (52)

The average of (52) over the random matrix $A$ agrees with [14, Eq. (50)]. Note that the simplification in (52) is still implicit due to the definition of $q$ in (51). Subsequently we present an explicit complexity simplification for $[\Lambda_x]_{kk}$. First we note that (52) states that we can replace all the elements $[\Lambda_x]_{kk}$, $k \in \mathcal{K}$, by a single scalar quantity, say $\Lambda_x$. This allows us to write $q \simeq \langle\sigma_x(\kappa_x; \Lambda_x I)\rangle$ with $\kappa_x = \Lambda_x^{-1}A^\dagger m + \hat{x}$. Then, from (52) we write an explicit fixed-point identity for $\Lambda_x$ as

  $\Lambda_x = R^K_{J_z}(-\langle\sigma_x(\kappa_x; \Lambda_x I)\rangle).$  (53)

As a second part we address a similar complexity simplification for $[\Lambda_z]_{nn}$, $n \in \mathcal{N}$. To that end let us introduce an auxiliary $N\times 1$ vector $\tilde{\tau}_m$ whose entries are defined as

  $[\tilde{\tau}_m]_n \triangleq [\bar{\Lambda}_z]_{nn} - [\bar{\Lambda}_z]_{nn}^2\,[A(\bar{\Lambda}_x + A^\dagger\bar{\Lambda}_z A)^{-1}A^\dagger]_{nn}$  (54)
  $\phantom{[\tilde{\tau}_m]_n} = [(\bar{\Lambda}_z^{-1} + A\bar{\Lambda}_x^{-1}A^\dagger)^{-1}]_{nn},$  (55)

where (55) follows directly from Woodbury's matrix inversion lemma. Furthermore, by making use of (36) in (54) we can write the following fixed-point identity

  $[(\bar{\Lambda}_z^{-1} + A\bar{\Lambda}_x^{-1}A^\dagger)^{-1}]_{nn} = [\bar{\Lambda}_z]_{nn} - \frac{[\bar{\Lambda}_z]_{nn}^2}{[\Lambda_z]_{nn} + [\bar{\Lambda}_z]_{nn}}$  (56)
  $\phantom{[(\bar{\Lambda}_z^{-1} + A\bar{\Lambda}_x^{-1}A^\dagger)^{-1}]_{nn}} = \frac{1}{[\Lambda_z^{-1}]_{nn} + [\bar{\Lambda}_z^{-1}]_{nn}}.$  (57)

Thus, we can invoke arguments identical to the additive-free-convolution approximation above for $[\Lambda_z]_{nn}$ as well. Specifically, under Assumption 1-b) and Assumption 2, for large $N, K$ we have

  $[\Lambda_z]_{nn} \simeq \frac{1}{R^N_{J_x}(-\langle\tilde{\tau}_m\rangle)}, \qquad n \in \mathcal{N}$  (58)

with $J_x \triangleq A\bar{\Lambda}_x^{-1}A^\dagger$. The complexity simplification (58) is still implicit due to the definition of $\tilde{\tau}_m$. To present an explicit form of it, consider first (56) and (57), such that we can write

  $[\tilde{\tau}_m]_n = [\bar{\Lambda}_z]_{nn} - \frac{[\bar{\Lambda}_z]_{nn}^2}{[\Lambda_z]_{nn} + [\bar{\Lambda}_z]_{nn}}$  (59)
  $\phantom{[\tilde{\tau}_m]_n} = [\Lambda_z]_{nn}\left(1 - [\Lambda_z]_{nn}[\sigma_z(\kappa_z;\Lambda_z)]_n\right).$  (60)

On the other hand, (58) implies that we can replace all the elements $[\Lambda_z]_{nn}$, $n \in \mathcal{N}$, by a single scalar quantity, say $\Lambda_z$. Now for convenience let us define the $N\times 1$ vector $\tau_m$ whose entries are given by

  $[\tau_m]_n \triangleq \Lambda_z\left(1 - \Lambda_z[\sigma_z(\kappa_z;\Lambda_z I)]_n\right), \qquad n \in \mathcal{N}.$  (61)

Then, following (58), we introduce an explicit fixed-point identity for $\Lambda_z$ as

  $\Lambda_z = \frac{1}{R^N_{J_x}(-\langle\tau_m\rangle)}.$  (62)

So far we have shown in (53) and (62) how to bypass the need for matrix inversion to "update" $\Lambda_x$ and $\Lambda_z$, respectively. However, this treatment requires solving a highly non-trivial random matrix problem, i.e. deriving the closed-form solutions of $R^K_{J_z}$ and $R^N_{J_x}$. This is usually, though not always, not possible. On the other hand, deriving the solution of, e.g., $R_{J_z}$ in the limiting case is rather simpler. Due to the uniform convergence property of the R-transform [15, Lemma 3.3.4], this approach allows us to accurately predict, for example, $R^K_{J_z}$ for large $N, K$. This is what we show in the next subsection for the zero-mean iid Gaussian matrix ensemble.

²In fact we can define the R-transform on the negative real line. However, in the exposition it requires an implicit assumption that $\bar{\Lambda}$ is positive-definite.
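In the Gaussian iid case the quality of the free-convolution step can be checked numerically: with $R^K_{J_z}$ given by the explicit form (63), the fixed point of (51) should track the exact trace in (50). A self-contained check on a toy instance (the spectra below are our own arbitrary choices):

```python
import numpy as np

rng = np.random.default_rng(2)
K, N = 250, 500
alpha = N / K
A = rng.standard_normal((N, K)) / np.sqrt(N)   # iid Gaussian, variance 1/N
lam_bar_x = rng.uniform(0.5, 2.0, K)           # diagonal of \bar{Lambda}_x
lam_bar_z = rng.uniform(0.5, 2.0, N)           # diagonal of \bar{Lambda}_z

# Exact q from (50): (1/K) tr{(\bar{Lambda}_x + A' \bar{Lambda}_z A)^{-1}}
Jz = A.T @ (lam_bar_z[:, None] * A)
q_exact = np.trace(np.linalg.inv(np.diag(lam_bar_x) + Jz)) / K

def R_Jz(omega):
    # R-transform of J_z in the Marchenko-Pastur case, cf. (63)
    return np.mean(1.0 / (1.0 / lam_bar_z - omega / alpha))

# Fixed point of (51): q = (1/K) sum_k 1/([lam_bar_x]_k + R_Jz(-q))
q = 1.0
for _ in range(200):
    q = np.mean(1.0 / (lam_bar_x + R_Jz(-q)))

print(abs(q - q_exact) / q_exact)  # small: the prediction tracks the exact trace
```

The relative gap is of the order of the finite-size corrections and shrinks as $K, N$ grow.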
Example: The zero-mean iid case, i.e. GAMP. In this subsection we provide the explicit solutions for $\Lambda_x$ and $\Lambda_z$ when the entries of $A$ are assumed to be iid Gaussian with zero mean and variance $1/N$. From the well-known Marčenko–Pastur theorem we obtain (see Appendix C) that

  $R^K_{J_z}(\omega) \simeq \frac{1}{N}\sum_{n\in\mathcal{N}}\frac{1}{[\bar{\Lambda}_z^{-1}]_{nn} - \omega/\alpha}$  (63)
  $R^N_{J_x}(\omega) \simeq \frac{1}{\alpha K}\sum_{k\in\mathcal{K}}\frac{1}{[\bar{\Lambda}_x]_{kk} - \omega}.$  (64)

Then we obtain the following expressions for $\Lambda_x$ and $\Lambda_z$:

  $\Lambda_x \simeq \frac{1}{N}\sum_{n\in\mathcal{N}}\frac{1}{[\bar{\Lambda}_z^{-1}]_{nn} + \langle\sigma_x(\kappa_x;\Lambda_x I)\rangle/\alpha}$  (65)
  $\frac{1}{\Lambda_z} \simeq \frac{1}{\alpha K}\sum_{k\in\mathcal{K}}\frac{1}{[\bar{\Lambda}_x]_{kk} + \langle\tau_m\rangle}.$  (66)

From these equations one can conclude that

  $\Lambda_z \simeq \frac{\alpha}{\langle\sigma_x(\kappa_x;\Lambda_x I)\rangle}, \qquad \Lambda_x \simeq \langle\tau_m\rangle.$  (67)

Thus we recover the fixed points of the GAMP-2nd order updates for the zero-mean iid matrix ensemble as in (15).

IV. CONCLUSION

For the given zero-mean iid Gaussian matrix ensemble, the fixed points of GAMP "asymptotically" coincide with the stationary points of the Gibbs free energy under first- and second-moment constraints. It turns out that the only critical issue for GAMP is the update rules for the "variance" parameters $\Lambda_x$ and $\Lambda_z$. These parameters play a central role. Specifically, a crude update rule for a given measurement matrix ensemble would completely spoil the optimality of the algorithm. If, for general invariant matrix ensembles, $\Lambda_x$ and $\Lambda_z$ can be updated based on the R-transform formulation in (53) and (62), the algorithm "asymptotically" fulfills the stationary-point identities of the Gibbs free energy formulation. Once the closed-form expressions of (53) and (62) are obtained, the resulting algorithm requires solely O(N) operations. But the computation of the solutions to these identities is not trivial. Nevertheless it is sometimes doable, e.g. for random row-orthogonal matrix ensembles. Furthermore, once either the prior or the likelihood is expressed in terms of a Gaussian function, the R-transform formulation becomes rather trivial. In general, updating $\Lambda_x$ and $\Lambda_z$ requires a matrix inversion at each iteration, e.g. see (43)–(47). An alternative, but sub-optimal, method would be the Swept-AMP algorithm [10], which is based on the GAMP methodology and requires O(N²) operations.

REFERENCES

[1] D. L. Donoho, A. Maleki, and A. Montanari, "Message-passing algorithms for compressed sensing," Proceedings of the National Academy of Sciences, vol. 106, pp. 18914–18919, September 2009.
[2] F. Krzakala, M. Mézard, F. Sausset, Y. Sun, and L. Zdeborová, "Probabilistic reconstruction in compressed sensing: algorithms, phase diagrams, and threshold achieving matrices," Journal of Statistical Mechanics: Theory and Experiment, no. P08009, 2012.
[3] S. Rangan, "Generalized approximate message passing for estimation with random linear mixing," in IEEE International Symposium on Information Theory (ISIT), Saint-Petersburg, Russia, July 2011.
[4] J. S. Yedidia, W. Freeman, and Y. Weiss, "Constructing free-energy approximations and generalized belief propagation algorithms," IEEE Transactions on Information Theory, vol. 51, no. 7, pp. 2282–2312, 2005.
[5] E. Riegler, G. E. Kirkelund, C. N. Manchón, M. Badiu, and B. H. Fleury, "Merging belief propagation and the mean field approximation: A free energy approach," IEEE Transactions on Information Theory, vol. 59, no. 1, pp. 588–602, 2013.
[6] T. Heskes, M. Opper, W. Wiegerinck, O. Winther, and O. Zoeter, "Approximate inference techniques with expectation constraints," Journal of Statistical Mechanics: Theory and Experiment, vol. 2005, 2005.
[7] M. Opper and O. Winther, "Expectation consistent approximate inference," Journal of Machine Learning Research, vol. 6, pp. 2177–2204, 2005.
[8] B. Çakmak, O. Winther, and B. H. Fleury, "S-AMP: Approximate message passing for general matrix ensembles," in IEEE Information Theory Workshop (ITW), Hobart, Tasmania, Australia, November 2014.
[9] Y. Xu, Y. Kabashima, and L. Zdeborová, "Bayesian signal reconstruction for 1-bit compressed sensing," Journal of Statistical Mechanics: Theory and Experiment, vol. 2014, 2014.
[10] A. Manoel, F. Krzakala, E. W. Tramel, and L. Zdeborová, "Sparse estimation with the swept approximated message-passing algorithm," arXiv:1406.4311, June 2014.
[11] S. Rangan, P. Schniter, E. Riegler, A. Fletcher, and V. Cevher, "Fixed points of generalized approximate message passing with arbitrary matrices," in IEEE International Symposium on Information Theory (ISIT), Istanbul, Turkey, July 2013.
[12] M. Opper and O. Winther, "Gaussian processes for classification: Mean-field algorithms," Neural Computation, pp. 2655–2684, 2000.
[13] T. P. Minka, "Expectation propagation for approximate Bayesian inference," in Seventeenth Conference on Uncertainty in Artificial Intelligence (UAI), 2001.
[14] M. Opper and O. Winther, "Adaptive and self-averaging Thouless-Anderson-Palmer mean field theory for probabilistic modeling," Physical Review E, vol. 64, pp. 056131-(1–14), October 2001.
[15] F. Hiai and D. Petz, The Semicircle Law, Free Random Variables and Entropy. American Mathematical Society, 2006.
[16] M. Mézard, G. Parisi, and M. Virasoro, Spin Glass Theory and Beyond. World Scientific Lecture Notes in Physics, Vol. 9, 1987.
[17] B. Collins and P. Śniady, "Integration with respect to the Haar measure on unitary, orthogonal and symplectic group," Communications in Mathematical Physics, vol. 264, no. 3, pp. 773–795, 2006.
[18] W. Hachem, P. Loubaton, and J. Najim, "Deterministic equivalents for certain functionals of large random matrices," The Annals of Applied Probability, vol. 17, no. 3, pp. 875–930, 2007.
[19] N. R. Rao and R. Speicher, "Multiplication of free random variables and the S-transform: the case of vanishing mean," Electronic Communications in Probability.
[20] J. W. Silverstein and Z. Bai, "On the empirical distribution of eigenvalues of a class of large dimensional random matrices," Journal of Multivariate Analysis.
APPENDIX A
PRELIMINARIES

Let $P_X$ be a probability distribution on the real line. We denote the Stieltjes transform of $P_X$ as

  $G_X(s) \triangleq \int \frac{dP_X(x)}{x - s}, \qquad \Im s > 0$  (68)

where $\Im G_X(s) > 0$ [18]. The R-transform of $P_X$ is defined as [15]

  $R_X(\omega) \triangleq G_X^{-1}(-\omega) - \frac{1}{\omega}, \qquad \Im\omega < 0$  (69)

with $G_X^{-1}$ denoting the inverse of $G_X$. Equivalently,

  $R_X(-G_X(s)) = s + \frac{1}{G_X(s)}.$  (70)

Here we draw the attention of the reader to the fact that $\Im R_X(\omega) < 0$ for $\Im\omega < 0$ unless $P_X$ is a Dirac distribution. This fact follows from the following property of the Stieltjes transform [18, Proposition 2.2]: for $\Im s > 0$, $\Im\{\frac{1}{G_X(s)} + s\} \le 0$, where equality holds if, and only if, $P_X$ is a Dirac distribution.

REMARK 1. Let $P_X$ have a pdf $p_X$. Furthermore let $p_X(0) = 0$, so that $\lim_{\epsilon\to 0^+}\Im G_X(j\epsilon) = 0$. Moreover let

  $q \triangleq \lim_{\epsilon\to 0^+} G_X(j\epsilon) = \int x^{-1}\,dP_X(x) < \infty.$  (71)

Then we have the following identity

  $\frac{1}{q} = \lim_{\omega\to q} R_X(-\omega).$  (72)

Consider a $T\times T$ symmetric matrix $X$. Let $L$ be the set containing the eigenvalues of $X$. The spectrum of $X$ is denoted by

  $P^T_X(x) \triangleq \frac{1}{T}\,|\{\lambda \in L : \lambda < x\}|.$  (73)

We denote the Stieltjes transform and the R-transform of $P^T_X$ by $G^T_X$ and $R^T_X$, respectively. Furthermore, if for $T \to \infty$ the matrix $X$ has a limiting spectrum almost surely, it is denoted by $P_X$. Moreover, the Stieltjes transform and the R-transform of $P_X$ are denoted by $G_X$ and $R_X$, respectively.

APPENDIX B
PROOF OF (51)

Note that from the definition in (50) we have

  $q = \int x^{-1}\,dP^K_{\bar{\Lambda}_x + J_z}(x).$  (74)

By invoking Remark 1 and (48) (under Assumption 1-a) and Assumption 2), successively we can write

  $\lim_{\epsilon\to 0^+} R^K_{\bar{\Lambda}_x}(-q_\epsilon - j\epsilon) + R^K_{J_z}(-q_\epsilon - j\epsilon) - \frac{1}{q_\epsilon + j\epsilon} \simeq 0.$  (75)

Here, without loss of generality, we can define $q_\epsilon \triangleq q + \epsilon$. On the other hand, from the definition of the R-transform in (70) we have

  $R^K_{\bar{\Lambda}_x}(-G^K_{\bar{\Lambda}_x}(-s)) + s - \frac{1}{G^K_{\bar{\Lambda}_x}(-s)} = 0, \qquad \Im s < 0.$  (76)

Hence we can write

  $q_\epsilon + j\epsilon \simeq G^K_{\bar{\Lambda}_x}\!\left(-R^K_{J_z}(-q_\epsilon - j\epsilon)\right)$  (77)
  $\phantom{q_\epsilon + j\epsilon} = \frac{1}{K}\sum_{k\in\mathcal{K}}\frac{1}{[\bar{\Lambda}_x]_{kk} + R^K_{J_z}(-q_\epsilon - j\epsilon)}.$  (78)

This completes the proof.

APPENDIX C
PROOF OF (63) & (64)

Let us first consider $J_z = A^\dagger\bar{\Lambda}_z A$. Note that we do not assume that $A$ and $\bar{\Lambda}_z$ are independent. On the other hand, Assumption 2 results in $A^\dagger A$ and $\bar{\Lambda}_z$ being asymptotically free of each other. In this way we can find $R_{J_z}$ by means of the so-called multiplicative free convolution [19]. However, this requires the reader to be familiar with the S-transform in free probability. In fact, by invoking standard random matrix results we can bypass the need for using the S-transform. Specifically, from the well-known Marčenko–Pastur theorem, we can write

  $G_{J_z}(s) = \frac{1}{-s + \int \frac{dP_{\bar{\Lambda}_z}(x)}{1/x + G_{J_z}(s)/\alpha}}.$  (79)

The result (79) is proven under the assumptions that the entries of $A$ are iid (not necessarily Gaussian) with zero mean and that $A$ is independent of $\bar{\Lambda}_z$ [20]. Due to the asymptotic freeness, this result holds when the entries are restricted to be Gaussian but without the restriction that $A$ and $\bar{\Lambda}_z$ are independent. Now by letting $s = G_{J_z}^{-1}(-\omega)$ in (79) and from the definition of the R-transform in (69) we have

  $R_{J_z}(\omega) = \int \frac{dP_{\bar{\Lambda}_z}(x)}{1/x - \omega/\alpha}.$  (80)

Furthermore, following identical arguments for $J_x$ we find

  $R_{J_x}(\omega) = \frac{1}{\alpha}\int \frac{dP_{\bar{\Lambda}_x}(x)}{x - \omega}.$  (81)

Due to [15, Lemma 3.3.4], the right-hand sides of the expressions in (63) and (64) converge uniformly to (80) and (81), respectively. This completes the proof.
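The transform identities of Appendix A are easy to sanity-check numerically. For a Wigner matrix, the limiting R-transform in the convention (69) is $R(\omega) = \omega$, so (70) reduces to $-G(s) = s + 1/G(s)$; the check below (our own, not from the paper) tests this against the empirical Stieltjes transform (68):

```python
import numpy as np

rng = np.random.default_rng(3)
T = 1500
B = rng.standard_normal((T, T))
X = (B + B.T) / np.sqrt(2 * T)         # Wigner matrix; semicircle spectrum on [-2, 2]
lam = np.linalg.eigvalsh(X)

def G(s):
    # Empirical Stieltjes transform, eq. (68)
    return np.mean(1.0 / (lam - s))

# With R(omega) = omega, identity (70) becomes -G(s) = s + 1/G(s);
# evaluate at a point away from the real axis.
s = 0.3 + 1.0j
lhs, rhs = -G(s), s + 1.0 / G(s)
print(abs(lhs - rhs))                  # close to 0 for large T
```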
