Approximate Message Passing with Nearest Neighbor Sparsity Pattern Learning

Xiangming Meng, Sheng Wu, Linling Kuang, Defeng (David) Huang, and Jianhua Lu, Fellow, IEEE

arXiv:1601.00543v1 [cs.IT] 4 Jan 2016

Abstract: We consider the problem of recovering clustered sparse signals with no prior knowledge of the sparsity pattern. Beyond simple sparsity, signals of interest often exhibit an underlying sparsity pattern which, if leveraged, can improve the reconstruction performance. However, the sparsity pattern is usually unknown a priori. Inspired by the idea of the k-nearest neighbor (k-NN) algorithm, we propose an efficient algorithm termed approximate message passing with nearest neighbor sparsity pattern learning (AMP-NNSPL), which learns the sparsity pattern adaptively. AMP-NNSPL specifies a flexible spike and slab prior on the unknown signal and, after each AMP iteration, sets the sparse ratios as the average of the nearest neighbor estimates via expectation maximization (EM). Experimental results on both synthetic and real data demonstrate the superiority of our proposed algorithm both in terms of reconstruction performance and computational complexity.

Index Terms: Compressed sensing, structured sparsity, approximate message passing, k-nearest neighbor.

This work was partially supported by the National Nature Science Foundation of China (Grant Nos. 91338101, 91438206, and 61231011) and the National Basic Research Program of China (Grant No. 2013CB329001).
X. Meng and J. Lu are with the Department of Electronic Engineering, Tsinghua University, Beijing, China (e-mail: [email protected]; [email protected]).
S. Wu and L. Kuang are with the Tsinghua Space Center, Tsinghua University, Beijing, China (e-mail: [email protected]; [email protected]).
Defeng (David) Huang is with the School of Electrical, Electronic and Computer Engineering, The University of Western Australia, Australia (e-mail: [email protected]).

I. INTRODUCTION

Compressed sensing (CS) aims to accurately reconstruct sparse signals from undersampled linear measurements [1]–[3]. To this end, a plethora of methods have been studied in the past years. Among others, approximate message passing (AMP) [4], proposed by Donoho et al., is one state-of-the-art algorithm for sparse signal reconstruction in CS. Moreover, AMP has been extended to Bayesian AMP (B-AMP) [5], [6] and general linear mixing problems [7]–[9]. While many practical signals can be described as sparse, they often exhibit an underlying structure, e.g., the nonzero coefficients occur in clusters [10]–[16]. Exploiting such intrinsic structure beyond simple sparsity can significantly boost the reconstruction performance [14]–[16]. To this end, various algorithms have been proposed, e.g., group LASSO [10], StructOMP [17], Graph-CoSaMP [18], and block sparse Bayesian learning (B-SBL) [19]–[21], etc. However, these algorithms require knowledge of the sparsity pattern, which is usually unknown a priori.

To reconstruct sparse signals with unknown structure, a number of methods [22]–[28] have been developed that use various structured priors to encourage both sparsity and cluster patterns simultaneously. The main effort of these algorithms lies in constructing a hierarchical prior model, e.g., Markov tree [23], structured spike and slab [24], [25], or hierarchical Gamma-Gaussian [26]–[28], to encode the structured sparsity pattern.

In this letter, we take an alternative approach and propose an efficient message passing algorithm, termed AMP with nearest neighbor sparsity pattern learning (AMP-NNSPL), to recover clustered sparse signals adaptively, i.e., without any prior knowledge of the sparsity pattern. For clustered sparse signals, if the nearest neighbors of one element are zeros (nonzeros), it will tend to be zero (nonzero) with high probability. This is similar to the idea of the k-nearest neighbor (k-NN) algorithm, which assumes that data close together are more likely to belong to the same category [29], [30]. Therefore, instead of explicitly modeling the sophisticated sparsity pattern, AMP-NNSPL specifies a flexible spike and slab prior on the unknown signal and, after each AMP iteration, updates the sparse ratios as the average of their nearest neighbor estimates via expectation maximization (EM) [31]. In this way, the sparsity pattern is learned adaptively. Simulation results on both synthetic and real data demonstrate the superiority of our proposed algorithm both in terms of reconstruction performance and computational efficiency.

II. SYSTEM MODEL

Consider the following linear Gaussian model

$$\mathbf{y} = \mathbf{A}\mathbf{x} + \mathbf{w}, \tag{1}$$

where $\mathbf{x} \in \mathbb{R}^N$ is the unknown signal, $\mathbf{y} \in \mathbb{R}^M$ is the available measurements, $\mathbf{A} \in \mathbb{R}^{M \times N}$ is the known measurement matrix, and $\mathbf{w} \in \mathbb{R}^M \sim \mathcal{N}(\mathbf{w}; \mathbf{0}, \Delta_0 \mathbf{I})$ is the additive noise. $\mathcal{N}(\mathbf{x}; \mathbf{m}, \mathbf{C})$ denotes a Gaussian distribution of $\mathbf{x}$ with mean $\mathbf{m}$ and covariance $\mathbf{C}$, and $\mathbf{I}$ denotes the identity matrix. Our goal is to estimate $\mathbf{x}$ from $\mathbf{y}$ when $M \ll N$ and $\mathbf{x}$ is clustered sparse while its specific sparsity pattern is unknown a priori.

To enforce sparsity, from a Bayesian perspective, the signals are assumed to follow sparsity-promoting prior distributions, e.g., Laplace prior [32], automatic relevance determination [33], and spike and slab prior [6], [34]. In this letter we consider a flexible spike and slab prior of the form

$$p_0(\mathbf{x}) = \prod_{i=1}^{N} p_0(x_i) = \prod_{i=1}^{N} \left[ (1 - \lambda_i)\,\delta(x_i) + \lambda_i f(x_i) \right], \tag{2}$$

where $\lambda_i \in (0, 1)$ is the sparse ratio, i.e., the probability of $x_i$ being nonzero, $\delta(x_i)$ is the Dirac delta function, and $f(x_i)$ is the distribution of the nonzero entries in $\mathbf{x}$, e.g., $f(x_i) = \mathcal{N}(x_i; \mu_0, \tau_0)$ for sparse Gaussian signals and $f(x_i) = \delta(x_i - 1)$ for sparse binary signals.

It is important to note that in (2) we specify an individual $\lambda_i$ for each entry, as opposed to a common value in [6], [34]. This is a key feature that will be exploited by the proposed algorithm for reconstruction of structured sparse signals. Up to now, no structure has been introduced to enforce the underlying sparsity pattern. Indeed, if the sparse ratios $\lambda_i$, $i = 1, \ldots, N$, are learned independently, we will not benefit from the potential structure. The main contribution of this letter is a novel adaptive learning method which encourages clustered sparsity, as described in Section III.

III. PROPOSED ALGORITHM

In this section, inspired by the idea of k-NN, we propose an adaptive reconstruction algorithm to recover clustered sparse signals without any prior knowledge of the sparsity pattern, e.g., structure and sparse ratio.

Before proceeding, we first give a brief description of AMP. Generally, AMP decouples the vector estimation problem (1) into $N$ scalar problems in the asymptotic regime [35], [36]

$$\mathbf{y} = \mathbf{A}\mathbf{x} + \mathbf{w} \;\longrightarrow\; \begin{cases} R_1 = x_1 + \tilde{w}_1 \\ \quad\vdots \\ R_N = x_N + \tilde{w}_N \end{cases}, \tag{3}$$

where the effective noise $\tilde{w}_i$ asymptotically follows $\mathcal{N}(\tilde{w}_i; 0, \Sigma_i)$. The values of $R_i$, $\Sigma_i$ are updated iteratively in each AMP iteration (see Algorithm 1) and the posterior distribution of $x_i$ is estimated as

$$q(x_i \mid R_i, \Sigma_i) = \frac{1}{Z(R_i, \Sigma_i)}\, p_0(x_i)\, \mathcal{N}(x_i; R_i, \Sigma_i), \tag{4}$$

where $Z(R_i, \Sigma_i)$ is the normalization constant. From (4), the estimates of the mean and variance of $x_i$ are

$$g_a(R_i, \Sigma_i) = \int x_i\, q(x_i \mid R_i, \Sigma_i)\, dx_i, \tag{5}$$

$$g_c(R_i, \Sigma_i) = \int x_i^2\, q(x_i \mid R_i, \Sigma_i)\, dx_i - g_a^2(R_i, \Sigma_i). \tag{6}$$

For more details of AMP and its extensions, readers are referred to [4]–[6], [35]. Two problems arise in traditional AMP. First, it assumes full knowledge of the prior distribution and noise variance, which is an impractical assumption. Second, it does not account for the potential structure of sparsity. In the sequel, we resort to expectation maximization (EM) to learn the unknown hyperparameters. Further, to encourage structured sparsity, we develop a nearest neighbor sparsity pattern learning rule motivated by the idea of the k-NN algorithm. For lack of space, we only consider the sparse Gaussian case, $f(x_i) = \mathcal{N}(x_i; \mu_0, \tau_0)$, while generalization to other settings is possible.

The hidden variables are chosen as the unknown signal vector $\mathbf{x}$ and the hyperparameters are denoted by $\boldsymbol{\theta}$. The specific definition of $\boldsymbol{\theta}$ depends on the choice of distribution $f(x_i)$ in (2). In the Gaussian case, $\boldsymbol{\theta} = \{\mu_0, \tau_0, \Delta_0, \lambda_i, i = 1, \ldots, N\}$, while in the binary case, $\boldsymbol{\theta} = \{\Delta_0, \lambda_i, i = 1, \ldots, N\}$. Denote by $\boldsymbol{\theta}^t$ the estimate of the hyperparameters at the $t$th EM iteration; then EM alternates between the following two steps [31]

$$Q(\boldsymbol{\theta}, \boldsymbol{\theta}^t) = \mathrm{E}\left\{ \ln p(\mathbf{x}, \mathbf{y}) \mid \mathbf{y}; \boldsymbol{\theta}^t \right\}, \tag{7}$$

$$\boldsymbol{\theta}^{t+1} = \arg\max_{\boldsymbol{\theta}} Q(\boldsymbol{\theta}, \boldsymbol{\theta}^t), \tag{8}$$

where $\mathrm{E}\{\cdot \mid \mathbf{y}; \boldsymbol{\theta}^t\}$ denotes expectation conditioned on observations $\mathbf{y}$ with parameters $\boldsymbol{\theta}^t$, i.e., the expectation is with respect to the posterior distribution $p(\mathbf{x} \mid \mathbf{y}; \boldsymbol{\theta}^t)$. From (1), (2), the joint distribution $p(\mathbf{x}, \mathbf{y})$ in (7) is defined as

$$p(\mathbf{x}, \mathbf{y}) = p(\mathbf{y} \mid \mathbf{x}) \prod_i \left[ (1 - \lambda_i)\,\delta(x_i) + \lambda_i f(x_i) \right], \tag{9}$$

where $p(\mathbf{y} \mid \mathbf{x}) = \mathcal{N}(\mathbf{y}; \mathbf{A}\mathbf{x}, \Delta_0 \mathbf{I})$. AMP offers an efficient approximation of $p(\mathbf{x} \mid \mathbf{y}; \boldsymbol{\theta}^t)$, denoted as $q(\mathbf{x} \mid \mathbf{y}; \boldsymbol{\theta}^t) = \prod_i q(x_i \mid R_i, \Sigma_i)$, whereby the E step (7) can be efficiently calculated. Since joint optimization of $\boldsymbol{\theta}$ is difficult, we adopt the incremental EM update rule proposed in [37], i.e., we update one or a subset of the elements at a time while holding the other parameters fixed.

After some algebra, the marginal posterior distribution of $x_i$ in (4) can be written as

$$q(x_i \mid R_i, \Sigma_i) = (1 - \pi_i)\,\delta(x_i) + \pi_i\, \mathcal{N}(x_i; m_i, V_i), \tag{10}$$

where

$$V_i = \frac{\tau_0 \Sigma_i}{\Sigma_i + \tau_0}, \tag{11}$$

$$m_i = \frac{\tau_0 R_i + \Sigma_i \mu_0}{\Sigma_i + \tau_0}, \tag{12}$$

$$\pi_i = \frac{\lambda_i}{\lambda_i + (1 - \lambda_i)\exp(-L)}, \tag{13}$$

$$L = \frac{1}{2} \ln \frac{\Sigma_i}{\Sigma_i + \tau_0} + \frac{R_i^2}{2\Sigma_i} - \frac{(R_i - \mu_0)^2}{2(\Sigma_i + \tau_0)}. \tag{14}$$

Note that for notational brevity, we have omitted the iteration index $t$. The mean and variance defined in (5) and (6) can now be explicitly calculated as

$$g_a(R_i, \Sigma_i) = \pi_i m_i, \tag{15}$$

$$g_c(R_i, \Sigma_i) = \pi_i \left( m_i^2 + V_i \right) - g_a^2(R_i, \Sigma_i). \tag{16}$$

To learn the sparse ratios $\lambda_i$, $i = 1, \ldots, N$, we need to maximize $Q(\boldsymbol{\theta}, \boldsymbol{\theta}^t)$ with respect to $\lambda_i$. After some algebra, we obtain the standard EM update equation as $\lambda_i^{t+1} = \pi_i^t$, which, albeit simple, fails to capture the inherent structure in the sparsity pattern. To address this problem, a novel learning rule is proposed as follows

$$\lambda_i^{t+1} = \frac{1}{|\mathcal{N}(i)|} \sum_{j \in \mathcal{N}(i)} \pi_j^t, \tag{17}$$

where $\mathcal{N}(i)$ denotes the set of nearest neighbor indexes of element $x_i$ in $\mathbf{x}$ of (1) and $|\mathcal{N}(i)|$ denotes the cardinality of $\mathcal{N}(i)$. For one dimensional (1D) data, $\mathcal{N}(i) = \{i - 1, i + 1\}$¹ and $|\mathcal{N}(i)| = 2$, while for two dimensional (2D) data, $\mathcal{N}(i) = \{(q, l - 1), (q, l + 1), (q - 1, l), (q + 1, l)\}$ and $|\mathcal{N}(i)| = 4$, where $(q, l)$ indicates the coordinates of $x_i$ in the 2D space. Generalizations to other cases can be made.

¹ For end points of 1D data, the nearest neighbor set has only one element. For edge points of 2D data, the nearest neighbor set has only two or three elements.

Note that in (17), we have chosen the nearest neighbors of each element, excluding the element itself, as the neighboring set. The estimate of one sparse ratio is not determined by its own estimate, but rather by the average of its nearest neighbor estimates. The insight for this choice is that, for clustered sparse signals, if the nearest neighbors of one element are zero (nonzero), it will be zero (nonzero) with high probability, an idea similar to k-NN. If the neighboring set is chosen as the whole set of elements, the proposed algorithm reduces to EM-BG-GAMP [6], [34].

The learning of the other hyperparameters follows the standard rule of the EM algorithm. Maximizing $Q(\boldsymbol{\theta}, \boldsymbol{\theta}^t)$ with respect to $\Delta_0$, after some algebra, we obtain

$$\Delta_0^{t+1} = \frac{1}{M} \sum_a \left[ \frac{\left( y_a - Z_a^t \right)^2}{\left( 1 + V_a^t / \Delta_0^t \right)^2} + \frac{\Delta_0^t V_a^t}{\Delta_0^t + V_a^t} \right], \tag{18}$$

where $Z_a^t$ and $V_a^t$ are obtained within the AMP iteration and are defined in Algorithm 1. Similarly, maximizing $Q(\boldsymbol{\theta}, \boldsymbol{\theta}^t)$ with respect to $\mu_0$ and $\tau_0$ results in the update equations

$$\mu_0^{t+1} = \frac{\sum_i \pi_i^t m_i^t}{\sum_i \pi_i^t}, \tag{19}$$

$$\tau_0^{t+1} = \frac{1}{\sum_i \pi_i^t} \sum_i \pi_i^t \left[ \left( \mu_0^t - m_i^t \right)^2 + V_i^t \right]. \tag{20}$$

Valid initialization of the unknown hyperparameters is essential since the EM algorithm may converge to a local maximum or a saddle point of the likelihood function [31]. The sparse ratios $\lambda_i$ and noise variance $\Delta_0$ are initialized as $\lambda_i^1 = 0.5$ and $\Delta_0^1 = \|\mathbf{y}\|_2^2 / \left( M (\mathrm{SNR}^0 + 1) \right)$, respectively, where $\mathrm{SNR}^0$ is suggested to be 100 [34]. For the sparse Gaussian case, the active mean $\mu_0$ and variance $\tau_0$ are initialized as $\mu_0^1 = 0$ and $\tau_0^1 = \left( \|\mathbf{y}\|_2^2 - M \Delta_0^1 \right) / \left( \lambda_i^1 \|\mathbf{A}\|_F^2 \right)$, respectively, where $\|\cdot\|_2$ and $\|\cdot\|_F$ are the $\ell_2$ norm and Frobenius norm, respectively.

The proposed approximate message passing with nearest neighbor sparsity pattern learning (AMP-NNSPL) is summarized in Algorithm 1. The complexity of AMP-NNSPL is dominated by matrix-vector multiplications in the original AMP and thus only scales as $O(MN)$, i.e., the proposed algorithm is computationally efficient.

Algorithm 1 AMP-NNSPL Algorithm

Input: $\mathbf{y}$, $\mathbf{A}$.

Initialization: Set $t = 1$ and $T_{\max}$, $\epsilon_{\mathrm{toc}}$. Initialize $\mu_0$, $\tau_0$, $\Delta_0$ and $\lambda_i$, $i = 1, \ldots, N$, as in Section III. Set $\hat{x}_i^1 = \int x_i\, p_0(x_i)\, dx_i$, $\nu_i^1 = \int |x_i - \hat{x}_i^1|^2\, p_0(x_i)\, dx_i$, $i = 1, \ldots, N$, and $V_a^0 = 1$, $Z_a^0 = y_a$, $a = 1, \ldots, M$.

1) Factor node update: For $a = 1, \ldots, M$

$$V_a^t = \sum_i |A_{ai}|^2 \nu_i^t,$$

$$Z_a^t = \sum_i A_{ai} \hat{x}_i^t - \frac{V_a^t}{\Delta_0^t + V_a^{t-1}} \left( y_a - Z_a^{t-1} \right).$$

2) Variable node update: For $i = 1, \ldots, N$

$$\Sigma_i^t = \left[ \sum_a \frac{|A_{ai}|^2}{\Delta_0^t + V_a^t} \right]^{-1},$$

$$R_i^t = \hat{x}_i^t + \Sigma_i^t \sum_a \frac{A_{ai} \left( y_a - Z_a^t \right)}{\Delta_0^t + V_a^t},$$

$$\hat{x}_i^{t+1} = g_a(R_i^t, \Sigma_i^t),$$

$$\nu_i^{t+1} = g_c(R_i^t, \Sigma_i^t).$$

3) Update $\lambda_i^{t+1}$, $i = 1, \ldots, N$, as (17);

4) Update $\mu_0^{t+1}$, $\tau_0^{t+1}$, $\Delta_0^{t+1}$ as (19), (20), and (18);

5) Set $t \leftarrow t + 1$ and proceed to step 1) until $T_{\max}$ iterations or $\|\hat{\mathbf{x}}^{t+1} - \hat{\mathbf{x}}^t\|_2 < \epsilon_{\mathrm{toc}} \|\hat{\mathbf{x}}^t\|_2$.

IV. NUMERICAL EXPERIMENTS

In this section, a series of numerical experiments are performed to demonstrate the performance of the proposed algorithm under various settings. Comparisons are made to some state-of-the-art methods which need no prior information of the sparsity pattern, e.g., PC-SBL [26] and its AMP version PCSBL-GAMP [27], MBCS-LBP [28], and EM-BG-GAMP [34]. The performance of Basis Pursuit (BP) [38]–[40] is also evaluated. Throughout the experiments, we set the maximum number of iterations for AMP-NNSPL, PCSBL-GAMP, and EM-BG-GAMP to be $T_{\max} = 200$, and the tolerance value of termination to be $\epsilon_{\mathrm{toc}} = 10^{-6}$. For other algorithms, we use their default settings. The elements of the measurement matrix $\mathbf{A} \in \mathbb{R}^{M \times N}$ are independently generated following the standard Gaussian distribution and the columns are normalized to unit norm. The success rate is defined as the ratio of the number of successful trials to the total number of experiments, where a trial is successful if the normalized mean square error (NMSE) is less than -60 dB, with $\mathrm{NMSE} = 20 \log_{10} \left( \|\hat{\mathbf{x}} - \mathbf{x}\|_2 / \|\mathbf{x}\|_2 \right)$. The pattern recovery success rate is defined as the ratio of the number of successful trials to the total number of experiments, where a trial is successful if the support is exactly recovered. A coefficient whose magnitude is less than $10^{-4}$ is deemed a zero coefficient.

A. Synthetic Data

We generate synthetic block-sparse signals in a similar way as [21], [26], where $K$ nonzero elements are partitioned into $L$ blocks with random sizes and random locations. We set $N = 100$, $K = 25$, $L = 4$, and the nonzero elements are generated independently following a Gaussian distribution with mean $\mu_0 = 3$ and variance $\tau_0 = 1$. The results are averaged over 1000 independent runs. Fig. 1 depicts the success rate and pattern recovery success rate. It can be seen that AMP-NNSPL achieves the highest success rate and pattern recovery rate at various measurement ratios. In the noisy setting, Fig. 2 shows the average NMSE and runtime of different algorithms when the signal to noise ratio (SNR) is 50 dB, where $\mathrm{SNR} = 20 \log_{10} \left( \|\mathbf{A}\mathbf{x}\|_2 / \|\mathbf{w}\|_2 \right)$. We see that AMP-NNSPL outperforms other methods both in terms of NMSE and computational efficiency.

Figure 1. Success rate (left) and pattern success rate (right) vs. measurement ratio M/N for block-sparse signals, N = 100, K = 25, L = 4, noiseless case.

Figure 2. NMSE in dB (left) and recovery time in seconds (right) vs. measurement ratio M/N for block-sparse signals, N = 100, K = 25, L = 4, SNR = 50 dB.

B. Real Data

To evaluate the performance on real data, we consider a real angiogram image [18] of 100×100 pixels with sparsity around 0.12. Fig. 3 depicts the success rate in the noiseless case and the NMSE at SNR = 50 dB, respectively. The MBCS-LBP and PC-SBL algorithms are not included due to their high computational complexity. It can be seen that AMP-NNSPL significantly outperforms other methods both in terms of success rate and NMSE. In particular, when M/N = 0.12 and SNR = 50 dB, typical recovery results are illustrated in Fig. 4, which shows that AMP-NNSPL achieves the best reconstruction performance.

Figure 3. Success rate (left) in the noiseless case and NMSE in dB (right) at SNR = 50 dB vs. measurement ratio M/N for the real 2D angiogram image.

Figure 4. Recovery results of the real 2D angiogram image in the noisy setting when M/N = 0.12 and SNR = 50 dB. Panels: (a) Original, (b) BP, (c) EM-BG-GAMP, (d) PCSBL-GAMP, (e) AMP-NNSPL, (f) NMSE: BP -0.401, EM-BG-GAMP -1.219, PCSBL-GAMP -7.146, AMP-NNSPL -17.908.

V. CONCLUSION

In this letter, we propose an efficient algorithm termed AMP-NNSPL to recover clustered sparse signals when the sparsity pattern is unknown. Inspired by the k-NN algorithm, AMP-NNSPL learns the sparse ratios in each AMP iteration as the average of their nearest neighbor estimates using EM, so that the sparsity pattern is learned adaptively. Experimental results on both synthetic and real data demonstrate the state-of-the-art performance of AMP-NNSPL.

REFERENCES

[1] D. L. Donoho, “Compressed sensing,” IEEE Trans. Inf. Theory, vol. 52, no. 4, pp. 1289–1306, Apr. 2006.
[2] E. J. Candès and M. B. Wakin, “An introduction to compressive sampling,” IEEE Signal Process. Mag., vol. 25, no. 2, pp. 21–30, Mar. 2008.
[3] Y. C. Eldar and G. Kutyniok, Eds., Compressed sensing: theory and applications. Cambridge Univ. Press, 2012.
[4] D. L. Donoho, A. Maleki, and A. Montanari, “Message-passing algorithms for compressed sensing,” in Proc. Nat. Acad. Sci., vol. 106, no. 45, Nov. 2009, pp. 18914–18919.
[5] D. L. Donoho, A. Maleki, and A. Montanari, “Message passing algorithms for compressed sensing: I. motivation and construction,” in IEEE Information Theory Workshop (ITW), Jan. 2010, pp. 1–5.
[6] F. Krzakala, M. Mézard, F. Sausset, Y. Sun, and L. Zdeborová, “Probabilistic reconstruction in compressed sensing: algorithms, phase diagrams, and threshold achieving matrices,” Journal of Statistical Mechanics: Theory and Experiment, vol. 2012, no. 08, p. P08009, 2012.
[7] S. Rangan, “Generalized approximate message passing for estimation with random linear mixing,” in Proc. IEEE Int. Symp. Inf. Theory, 2011, pp. 2168–2172.
[8] P. Schniter, “A message-passing receiver for BICM-OFDM over unknown clustered-sparse channels,” IEEE J. Sel. Topics Signal Process., vol. 5, no. 8, pp. 1462–1474, Dec. 2011.
[9] U. S. Kamilov, S. Rangan, A. K. Fletcher, and M. Unser, “Approximate message passing with consistent parameter estimation and applications to sparse learning,” IEEE Trans. Inf. Theory, vol. 60, no. 5, pp. 2969–2985, May 2014.
[10] M. Yuan and Y. Lin, “Model selection and estimation in regression with grouped variables,” Journal of the Royal Statistical Society: Series B (Statistical Methodology), vol. 68, no. 1, pp. 49–67, 2006.
[11] V. Cevher, P. Indyk, C. Hegde, and R. G. Baraniuk, “Recovery of clustered sparse signals from compressive measurements,” DTIC Document, Tech. Rep., 2009.
[12] M. Stojnic, F. Parvaresh, and B. Hassibi, “On the reconstruction of block-sparse signals with an optimal number of measurements,” IEEE Trans. Signal Process., vol. 57, no. 8, pp. 3075–3085, 2009.
[13] R. G. Baraniuk, V. Cevher, M. F. Duarte, and C. Hegde, “Model-based compressive sensing,” IEEE Trans. Inf. Theory, vol. 56, no. 4, pp. 1982–2001, 2010.
[14] J. Huang, T. Zhang et al., “The benefit of group sparsity,” The Annals of Statistics, vol. 38, no. 4, pp. 1978–2004, 2010.
[15] Y. C. Eldar and M. Mishali, “Robust recovery of signals from a structured union of subspaces,” IEEE Trans. Inf. Theory, vol. 55, no. 11, pp. 5302–5316, 2009.
[16] Y. C. Eldar, P. Kuppinger, and H. Bölcskei, “Block-sparse signals: Uncertainty relations and efficient recovery,” IEEE Trans. Signal Process., vol. 58, no. 6, pp. 3042–3054, 2010.
[17] J. Huang, T. Zhang, and D. Metaxas, “Learning with structured sparsity,” The Journal of Machine Learning Research, vol. 12, pp. 3371–3412, 2011.
[18] C. Hegde, P. Indyk, and L. Schmidt, “A nearly-linear time framework for graph-structured sparsity,” in Proceedings of The 32nd International Conference on Machine Learning, 2015, pp. 928–937.
[19] D. P. Wipf and B. D. Rao, “An empirical Bayesian strategy for solving the simultaneous sparse approximation problem,” IEEE Trans. Signal Process., vol. 55, no. 7, pp. 3704–3716, 2007.
[20] Z. Zhang and B. D. Rao, “Sparse signal recovery with temporally correlated source vectors using sparse Bayesian learning,” IEEE J. Sel. Topics Signal Process., vol. 5, no. 5, pp. 912–926, 2011.
[21] Z. Zhang and B. Rao, “Extension of SBL algorithms for the recovery of block sparse signals with intra-block correlation,” IEEE Trans. Signal Process., vol. 61, no. 8, pp. 2009–2015, 2013.
[22] L. He and L. Carin, “Exploiting structure in wavelet-based Bayesian compressive sensing,” IEEE Trans. Signal Process., vol. 57, no. 9, pp. 3488–3497, 2009.
[23] S. Som and P. Schniter, “Compressive imaging using approximate message passing and a Markov-tree prior,” IEEE Trans. Signal Process., vol. 60, no. 7, pp. 3439–3448, 2012.
[24] L. Yu, H. Sun, J.-P. Barbot, and G. Zheng, “Bayesian compressive sensing for cluster structured sparse signals,” Signal Processing, vol. 92, no. 1, pp. 259–269, 2012.
[25] M. R. Andersen, O. Winther, and L. K. Hansen, “Spatio-temporal spike and slab priors for multiple measurement vector problems,” arXiv preprint arXiv:1508.04556, 2015.
[26] J. Fang, Y. Shen, H. Li, and P. Wang, “Pattern-coupled sparse Bayesian learning for recovery of block-sparse signals,” IEEE Trans. Signal Process., vol. 63, no. 2, pp. 360–372, 2015.
[27] J. Fang, L. Zhang, and H. Li, “Two-dimensional pattern-coupled sparse Bayesian learning via generalized approximate message passing,” arXiv preprint arXiv:1505.06270, 2015.
[28] L. Yu, H. Sun, G. Zheng, and J. P. Barbot, “Model based Bayesian compressive sensing via local beta process,” Signal Processing, vol. 108, pp. 259–271, 2015.
[29] E. Fix and J. L. Hodges Jr, “Discriminatory analysis-nonparametric discrimination: consistency properties,” DTIC Document, Tech. Rep., 1951.
[30] T. M. Cover and P. E. Hart, “Nearest neighbor pattern classification,” IEEE Trans. Inf. Theory, vol. 13, no. 1, pp. 21–27, 1967.
[31] A. P. Dempster, N. M. Laird, and D. B. Rubin, “Maximum likelihood from incomplete data via the EM algorithm,” Journal of the Royal Statistical Society. Series B (Methodological), pp. 1–38, 1977.
[32] T. Park and G. Casella, “The Bayesian lasso,” Journal of the American Statistical Association, vol. 103, no. 482, pp. 681–686, 2008.
[33] M. E. Tipping, “Sparse Bayesian learning and the relevance vector machine,” The Journal of Machine Learning Research, vol. 1, pp. 211–244, 2001.
[34] J. P. Vila and P. Schniter, “Expectation-maximization Gaussian-mixture approximate message passing,” IEEE Trans. Signal Process., vol. 61, no. 19, pp. 4658–4672, 2013.
[35] A. Montanari, “Graphical models concepts in compressed sensing,” arXiv preprint arXiv:1011.4328, 2010.
[36] M. Bayati and A. Montanari, “The dynamics of message passing on dense graphs, with applications to compressed sensing,” IEEE Trans. Inf. Theory, vol. 57, no. 2, pp. 764–785, Feb. 2011.
[37] R. M. Neal and G. E. Hinton, “A view of the EM algorithm that justifies incremental, sparse, and other variants,” in Learning in graphical models. Springer, 1998, pp. 355–368.
[38] S. S. Chen, D. L. Donoho, and M. A. Saunders, “Atomic decomposition by basis pursuit,” SIAM Journal on Scientific Computing, vol. 20, no. 1, pp. 33–61, 1998.
[39] E. J. Candès and T. Tao, “Decoding by linear programming,” IEEE Trans. Inf. Theory, vol. 51, no. 12, pp. 4203–4215, 2005.
[40] E. J. Candès, J. Romberg, and T. Tao, “Robust uncertainty principles: Exact signal reconstruction from highly incomplete frequency information,” IEEE Trans. Inf. Theory, vol. 52, no. 2, pp. 489–509, 2006.
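
As an illustration of the scalar posterior in (10)–(16), the following NumPy sketch evaluates $\pi_i$, $m_i$, $V_i$, $g_a$ and $g_c$ for the sparse Gaussian case. The function name `bg_posterior` and its argument names are hypothetical and not taken from the letter. Writing (13) as a logistic function of $L + \ln(\lambda_i / (1 - \lambda_i))$ is an implementation choice assumed here for numerical stability; it is algebraically identical to (13).

```python
import numpy as np

def bg_posterior(R, Sigma, lam, mu0, tau0):
    """Posterior quantities of x_i under the sparse Gaussian prior, per (10)-(16).

    R, Sigma, lam may be scalars or arrays of equal shape; mu0, tau0 are scalars.
    (Hypothetical helper, written for illustration only.)
    """
    R = np.asarray(R, dtype=float)
    Sigma = np.asarray(Sigma, dtype=float)
    lam = np.asarray(lam, dtype=float)

    V = tau0 * Sigma / (Sigma + tau0)                    # (11)
    m = (tau0 * R + Sigma * mu0) / (Sigma + tau0)        # (12)
    L = (0.5 * np.log(Sigma / (Sigma + tau0))
         + R ** 2 / (2.0 * Sigma)
         - (R - mu0) ** 2 / (2.0 * (Sigma + tau0)))      # (14)

    # (13): lam / (lam + (1 - lam) * exp(-L)) equals sigmoid(L + logit(lam));
    # the tanh form avoids overflow of exp(-L) for very negative L.
    z = L + np.log(lam) - np.log1p(-lam)
    pi = 0.5 * (1.0 + np.tanh(0.5 * z))

    g_a = pi * m                                         # (15) posterior mean
    g_c = pi * (m ** 2 + V) - g_a ** 2                   # (16) posterior variance
    return pi, m, V, g_a, g_c
```

In Algorithm 1 these values would be computed from $R_i^t$, $\Sigma_i^t$ in step 2), with `g_a` and `g_c` giving $\hat{x}_i^{t+1}$ and $\nu_i^{t+1}$.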
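
The nearest neighbor learning rule (17), including the boundary cases of footnote 1, can be sketched in NumPy as below. The function names `nn_sparse_ratio_1d` and `nn_sparse_ratio_2d` are hypothetical, and the zero-padding approach is just one convenient way to express the rule, not necessarily how the authors implemented it. The sketch assumes the signal has at least two elements, so that every entry has at least one neighbor.

```python
import numpy as np

def nn_sparse_ratio_1d(pi):
    """Rule (17) for 1D data: average of pi over {i-1, i+1}, excluding i itself."""
    pi = np.asarray(pi, dtype=float)
    padded = np.pad(pi, 1)                      # zeros outside the signal
    total = padded[:-2] + padded[2:]            # pi[i-1] + pi[i+1]
    count = np.full(pi.shape, 2.0)
    count[0] = count[-1] = 1.0                  # end points have one neighbor
    return total / count

def nn_sparse_ratio_2d(pi):
    """Rule (17) for 2D data: average over the up/down/left/right neighbors."""
    pi = np.asarray(pi, dtype=float)
    padded = np.pad(pi, 1)
    total = (padded[:-2, 1:-1] + padded[2:, 1:-1]
             + padded[1:-1, :-2] + padded[1:-1, 2:])
    # Count the neighbors that actually exist (2 at corners, 3 on edges, 4 inside).
    ones = np.pad(np.ones_like(pi), 1)
    count = (ones[:-2, 1:-1] + ones[2:, 1:-1]
             + ones[1:-1, :-2] + ones[1:-1, 2:])
    return total / count
```

For example, `nn_sparse_ratio_1d([0, 1, 1, 0])` averages each entry's neighbors only, so an isolated estimate does not influence its own sparse ratio.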
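
The experimental setup of Section IV can be approximated with the following NumPy sketch, which generates a block-sparse signal, forms measurements with a column-normalized Gaussian matrix, and evaluates the NMSE and support recovery criteria. All function names are hypothetical. Two choices here are assumptions rather than details stated in the letter: the way random block sizes and locations are drawn, and the rescaling of the noise vector so that the realized SNR equals the target exactly.

```python
import numpy as np

def block_sparse_signal(N, K, L, mu0, tau0, rng):
    """K Gaussian nonzeros split into L blocks of random sizes and locations."""
    # L positive block sizes summing to K (assumed sampling scheme).
    cuts = np.sort(rng.choice(np.arange(1, K), size=L - 1, replace=False))
    sizes = np.diff(np.concatenate(([0], cuts, [K])))
    # L + 1 nonnegative gaps summing to N - K; adjacent blocks may touch.
    gap_cuts = np.sort(rng.integers(0, N - K + 1, size=L))
    gaps = np.diff(np.concatenate(([0], gap_cuts, [N - K])))
    x = np.zeros(N)
    pos = 0
    for size, gap in zip(sizes, gaps[:-1]):
        pos += gap
        x[pos:pos + size] = mu0 + np.sqrt(tau0) * rng.standard_normal(size)
        pos += size
    return x

def measure(x, M, snr_db, rng):
    """y = A x + w with unit-norm Gaussian columns and SNR = 20 log10(|Ax|/|w|)."""
    A = rng.standard_normal((M, x.size))
    A /= np.linalg.norm(A, axis=0)
    w = rng.standard_normal(M)
    w *= np.linalg.norm(A @ x) / np.linalg.norm(w) * 10 ** (-snr_db / 20)
    return A, A @ x + w

def nmse_db(x_hat, x):
    """NMSE = 20 log10(|x_hat - x|_2 / |x|_2), as defined in Section IV."""
    return 20 * np.log10(np.linalg.norm(x_hat - x) / np.linalg.norm(x))

def support_recovered(x_hat, x, thresh=1e-4):
    """Exact support match; magnitudes below thresh count as zero."""
    return np.array_equal(np.abs(x_hat) >= thresh, np.abs(x) >= thresh)
```

A trial would then count as successful when `nmse_db(x_hat, x) < -60`, and as a pattern recovery success when `support_recovered(x_hat, x)` is true.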