Group Sparse Recovery via the $\ell^0(\ell^2)$ Penalty: Theory and Algorithm

Yuling Jiao, Bangti Jin, Xiliang Lu

arXiv:1601.04174v2 [cs.IT] 22 Nov 2016

School of Statistics and Mathematics, Zhongnan University of Economics and Law, Wuhan 430063, P.R. China ([email protected]).
Department of Computer Science, University College London, Gower Street, London WC1E 6BT, UK ([email protected], [email protected]).
Corresponding author: School of Mathematics and Statistics and Hubei Key Laboratory of Computational Science, Wuhan University, Wuhan 430072, P.R. China ([email protected]).

Abstract—In this work we propose and analyze a novel approach for group sparse recovery. It is based on regularized least squares with an $\ell^0(\ell^2)$ penalty, which penalizes the number of nonzero groups. One distinct feature of the approach is that it has a built-in decorrelation mechanism within each group, and thus can handle challenging strong inner-group correlation. We provide a complete analysis of the regularized model, e.g., existence of a global minimizer, invariance property, support recovery, and properties of block coordinatewise minimizers. Further, the regularized problem admits an efficient primal dual active set algorithm with a provable finite-step global convergence. At each iteration, it involves solving a least-squares problem on the active set only, and exhibits a fast local convergence, which makes the method extremely efficient for recovering group sparse signals. Extensive numerical experiments are presented to illustrate salient features of the model and the efficiency and accuracy of the algorithm. A comparative study indicates its competitiveness with existing approaches.

Index Terms—group sparsity, block sparsity, blockwise mutual incoherence, global minimizer, block coordinatewise minimizer, primal dual active set algorithm, $\ell^0(\ell^2)$ penalty

I. INTRODUCTION

Sparse recovery has recently received much attention in many areas, e.g., signal processing, statistics, and machine learning. The key assumption is that the data $y\in\mathbb{R}^n$ is generated by a linear combination of a few atoms of a given dictionary $\Psi\in\mathbb{R}^{n\times p}$, $p\gg n$, where each column represents an atom. In the presence of noise $\eta\in\mathbb{R}^n$ (with a noise level $\epsilon=\|\eta\|$), it is formulated as

$$y=\Psi x^\dagger+\eta, \qquad (1)$$

where the vector $x^\dagger\in\mathbb{R}^p$ denotes the signal to be recovered. The most natural formulation of the problem of finding the sparsest solution is the following $\ell^0$ optimization

$$\min_{x\in\mathbb{R}^p}\ \tfrac{1}{2}\|\Psi x-y\|^2+\lambda\|x\|_{\ell^0}, \qquad (2)$$

where $\|\cdot\|$ denotes the Euclidean norm of a vector, $\|\cdot\|_{\ell^0}$ denotes the number of nonzero entries, and $\lambda>0$ is a regularization parameter. Due to the discontinuity of the $\ell^0$ penalty, it is challenging to find a global minimizer of problem (2). In practice, lasso / basis pursuit [1], [2], which replaces the $\ell^0$ penalty with its convex relaxation, the $\ell^1$ penalty, has been very popular. Many deep results on the equivalence between the $\ell^0$ and $\ell^1$ problems and error estimates have been obtained [3], [4], based on the concepts of mutual coherence (MC) and restricted isometry property (RIP).

A. Group sparse recovery

In practice, in addition to sparsity, signals may exhibit additional structure, e.g., nonzero coefficients occur in clusters/groups, which is commonly known as block- / group-sparsity. In electroencephalography (EEG), each group encodes the information about the direction and strength of the dipoles of each discrete voxel representing the dipole approximation [5]. Other applications include multi-task learning [6], wavelet image analysis [7], [8], gene analysis [9], [10] and multichannel image analysis [11], [12], to name a few. The multiple measurement vector problem is also one special case [13]. In these applications, the focus is to recover all contributing groups, instead of one entry from each group. The group structure is an important piece of a priori knowledge about the problem, and should be properly accounted for in the recovery method in order to improve the interpretability and accuracy of the recovered signal.

There have been many important developments in group sparse recovery. One popular approach is group lasso, which extends lasso using an $\ell^1(\ell^2)$ penalty [14]–[17]. A number of theoretical studies have shown many desirable properties of group lasso, and its advantages over lasso for recovering group sparse signals [18]–[23], under suitable MC or RIP type conditions. To remedy the drawbacks of group lasso, e.g., biasedness and lack of the oracle property [24], [25], nonconvex penalties have been extended to the group case, e.g., bridge, smoothly clipped absolute deviation (SCAD), and minimax concave penalty (MCP) [17], [26], [27]. A number of efficient algorithms [16], [28]–[34] have been proposed for convex and nonconvex group sparse recovery models. As in the sparse case, several group greedy methods have also been developed and analyzed in depth [20], [35], [36].

However, in these interesting works, the submatrices of $\Psi$ are assumed to be well conditioned in order to obtain estimation errors. While this assumption is reasonable in some applications, it excludes the practically important case of strong correlation within groups. For example, in microarray gene analysis, it was observed that genes in the same pathway produce highly correlated values [37]; in genome-wide association studies, SNPs are highly correlated or even linearly dependent within segments of the DNA sequence [38]; in functional neuroimaging, identifying the brain regions involved in the cognitive processing of an external stimulus is formalized as identifying the nonzero coefficients of a linear model predicting the external stimulus from the neuroimaging data, where strong correlation occurs between neighboring voxels [39]; just to name a few.

In the presence of strong inner-group correlation, an inadvertent application of standard sparse recovery techniques is unsuitable. Numerically, one often can only recover one predictor within each contributing group, which is undesirable when seeking the whole group [40]. Theoretically, the correlation leads to bad RIP or MC conditions, and thus many sparse recovery techniques may perform poorly.
B. The $\ell^0(\ell^2)$ approach and our contributions

In this work, we develop and analyze a nonconvex model and algorithm for recovering group sparse signals with potentially strong inner-group correlation. Our approach is based on the following $\ell^0(\ell^2)$ optimization

$$\min_{x\in\mathbb{R}^p}\ \Big\{J_\lambda(x)=\tfrac{1}{2}\|\Psi x-y\|^2+\lambda\|x\|_{\ell^0(\ell^2)}\Big\}, \qquad (3)$$

where the $\ell^0(\ell^2)$ penalty $\|\cdot\|_{\ell^0(\ell^2)}$ (with respect to a given partition $\{G_i\}_{i=1}^N$) is defined below in (6), and the regularization parameter $\lambda>0$ controls the group sparsity level of the solution. The $\ell^0(\ell^2)$ penalty penalizes the number of nonzero groups. To the best of our knowledge, this model has not been systematically studied in the literature, even though the $\ell^0(\ell^2)$ penalty was used in several prior works; see Section I-C below. We shall provide both a theoretical analysis and an efficient solver for the model.

The model (3) has several distinct features. The regularized solution is invariant under full rank column transformations, and does not depend on the specific parametrization within the groups. Thus, it allows strong inner-group correlation and merits a built-in decorrelation effect, and it admits theoretical results under very weak conditions. Further, both global minimizers and block coordinatewise minimizers have desirable properties, e.g., support recovery and the oracle property.

The main contributions of this work are threefold. First, we establish fundamental properties of the model (3), e.g., existence of a global minimizer, local optimality, necessary optimality condition, and transformation invariance, which theoretically substantiate (3). For example, the invariance implies that it can be equivalently transformed into a problem with orthonormal columns within each group, and thus it is independent of the conditioning of the inner-group columns, which contrasts sharply with most existing group sparse recovery models. Second, we develop an efficient algorithm for solving the model (3), which is of primal dual active set (PDAS) type. It is based on a careful analysis of the necessary optimality system, and represents a nontrivial extension of the PDAS algorithm for the $\ell^1$ and $\ell^0$ penalties [41], [42]. It is very efficient when coupled with a continuation strategy, due to its Newton nature [41]. Numerically, each inner iteration involves only solving a least-squares problem on the active set. The whole algorithm converges globally in finite steps to the oracle solution. Third, we present extensive numerical experiments to illustrate the features of our approach, and to show its competitiveness with state-of-the-art group sparse recovery methods, including group lasso and greedy methods.

C. Connections with existing works and organization

The proposed model (3) is closely related to the following constrained nonconvex optimization

$$\min_{x\in\mathbb{R}^p}\ \|x\|_{\ell^0(\ell^q)}\quad\text{subject to}\quad y=\Psi x, \qquad (P_q)$$

in the absence of noise $\eta$. This model was studied in [20], [36], [43], [44]. In the case of $q=2$, Eldar and Mishali [43] discussed unique group sparse recovery, and Eldar et al. [20] developed an orthogonal matching pursuit algorithm for recovering group sparse signals and established a recovery condition in terms of block coherence. See also [36] for related results on subspace signal separation. Elhamifar and Vidal [44] derived the necessary and sufficient conditions for the equivalence of problem $(P_q)$ with a convex $\ell^1(\ell^q)$ relaxation, and gave sufficient conditions using the concept of cumulative subspace coherence. Further, under even weaker conditions, they extended these results to the $\Psi$-weighted formulation

$$\min_{x\in\mathbb{R}^p}\ \sum_{i=1}^N\|\Psi_{G_i}x_{G_i}\|_{\ell^q}^0\quad\text{subject to}\quad y=\Psi x, \qquad (P_q')$$

which is especially suitable for redundant dictionaries. The models $(P_q)$ and $(P_q')$ are equivalent if the columns within each group are of full column rank. Our approach (3) can be viewed as a natural extension of $(P_q)$ with $q=2$ to the case of noisy data using a Lagrangian formulation, which, due to the nonconvexity of the $\ell^0(\ell^2)$ penalty, is generally not equivalent to the constrained formulation. In this work, we provide many new insights into analytical properties and algorithm developments for the model (3), which have not been discussed in these prior works. Surprisingly, we shall show that the model (3) has a built-in decorrelation effect for redundant dictionaries, similar to the model $(P_q')$.

The rest of the paper is organized as follows. In Section II, we describe the problem setting and derive useful estimates. In Section III, we provide analytical properties, e.g., the existence of a global minimizer, invariance property, and optimality condition. In Section IV, we develop an efficient group primal dual active set with continuation algorithm, and analyze its convergence and computational complexity. Finally, in Section V, several numerical examples are provided to illustrate the mathematical theory and the efficiency of the algorithm. All the technical proofs are given in the appendices.

II. PRELIMINARIES

In this section, we describe the problem setting and derive useful estimates.

A. Problem setting and notations

Throughout, we assume that the sensing matrix $\Psi\in\mathbb{R}^{n\times p}$ with $n\ll p$ has normalized columns $\|\psi_i\|=1$ for $i=1,\ldots,p$, and that the index set $S=\{1,\ldots,p\}$ is divided into $N$ non-overlapping groups $\{G_i\}_{i=1}^N$ such that $1\le s_i=|G_i|\le s$ and $\sum_{i=1}^N|G_i|=p$. For any index set $B\subseteq S$, we denote by $x_B$ (respectively $\Psi_B$) the subvector of $x$ (respectively the submatrix of $\Psi$) which consists of the entries (respectively columns) whose indices are listed in $B$. All submatrices $\Psi_{G_i}$, $i=1,2,\ldots,N$, are assumed to have full column rank. The true signal $x^\dagger$ is assumed to be group sparse with respect to the partition $\{G_i\}_{i=1}^N$, i.e., $x^\dagger=(x^\dagger_{G_1},\ldots,x^\dagger_{G_N})$, with $T$ nonzero groups. Accordingly, the group index set $\{1,\ldots,N\}$ is divided into the active set $A^\dagger$ and the inactive set $I^\dagger$ by

$$A^\dagger=\{i:\|x^\dagger_{G_i}\|\neq 0\}\quad\text{and}\quad I^\dagger=(A^\dagger)^c. \qquad (4)$$

The data vector $y$ in (1), possibly contaminated by noise, can be recast as $y=\Psi x^\dagger+\eta=\sum_{i\in A^\dagger}\Psi_{G_i}x^\dagger_{G_i}+\eta$. Given the true active set $A^\dagger$ (as if it were provided by an oracle), we define the oracle solution $x^o$ as the least-squares solution on $A^\dagger$ to (1), i.e.,

$$x^o=\arg\min_{\mathrm{supp}(x)\subseteq\cup_{i\in A^\dagger}G_i}\|\Psi x-y\|^2. \qquad (5)$$

The oracle solution $x^o$ is uniquely defined provided that $\Psi_{\cup_{i\in A^\dagger}G_i}$ has full column rank. It is the best approximation for problem (1), and will be used as the benchmark.

For any vector $x\in\mathbb{R}^p$, we define an $\ell^r(\ell^q)$ penalty (with respect to the partition $\{G_i\}_{i=1}^N$) for $r\ge 0$ and $q>0$ by

$$\|x\|_{\ell^r(\ell^q)}=\begin{cases}\big(\sum_{i=1}^N\|x_{G_i}\|_{\ell^q}^r\big)^{1/r}, & r>0,\\ \sharp\{i:\|x_{G_i}\|_{\ell^q}\neq 0\}, & r=0,\\ \max_i\{\|x_{G_i}\|_{\ell^q}\}, & r=\infty.\end{cases} \qquad (6)$$

When $r=q>0$, the $\ell^r(\ell^q)$ penalty reduces to the usual $\ell^r$ penalty. The choice $r=0$ (or $r=\infty$) and $q=2$ is frequently used below. Further, we shall abuse the notation $\|\cdot\|_{\ell^r(\ell^q)}$ for any vector that is only defined on some subgroups (equivalently, its zero extension).

For any $r,q\ge 1$, the $\ell^r(\ell^q)$ penalty defines a proper norm, and was studied in [45]. For any $r,q>0$, the $\ell^r(\ell^q)$ penalty is continuous. The $\ell^0(\ell^2)$ penalty, which is of major interest in this work, is discontinuous, but still lower semicontinuous.

Proposition 1: The $\ell^0(\ell^2)$ penalty is lower semicontinuous.

Proof: Let $\{x^n\}\subset\mathbb{R}^p$ be a sequence convergent to some $x^*\in\mathbb{R}^p$. By the continuity of the $\ell^2$ norm, $\|x^n_{G_i}\|$ converges to $\|x^*_{G_i}\|$ for $i=1,\ldots,N$. Now the assertion follows from $\|x^*_{G_i}\|_{\ell^0}\le\liminf\|x^n_{G_i}\|_{\ell^0}$ [46, Lemma 2.2].
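To fix ideas, the following NumPy sketch evaluates the $\ell^r(\ell^q)$ penalty of (6) and the objective $J_\lambda$ of (3) for a partition represented as a list of index arrays. The function names and this representation of the partition are illustrative choices, not part of the paper.

```python
import numpy as np

def group_norms(x, groups, q=2):
    """Vector of per-group norms (||x_{G_1}||_q, ..., ||x_{G_N}||_q)."""
    return np.array([np.linalg.norm(x[g], ord=q) for g in groups])

def lr_lq_penalty(x, groups, r=0, q=2):
    """The l^r(l^q) penalty of (6); r=0 counts the nonzero groups."""
    norms = group_norms(x, groups, q)
    if r == 0:
        return np.count_nonzero(norms)
    if np.isinf(r):
        return norms.max()
    return (norms ** r).sum() ** (1.0 / r)

def objective(x, Psi, y, groups, lam):
    """J_lambda(x) = 0.5 * ||Psi x - y||^2 + lam * ||x||_{l0(l2)}, cf. (3)."""
    res = Psi @ x - y
    return 0.5 * res @ res + lam * lr_lq_penalty(x, groups, r=0, q=2)
```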
Now we derive the hard-thresholding operator $x^*\in H_\lambda(g)$ for one single group, i.e., for an $s$-dimensional vector $g\in\mathbb{R}^s$,

$$x^*\in\arg\min_{x\in\mathbb{R}^s}\ \tfrac{1}{2}\|x-g\|^2+\lambda\|x\|_{\ell^0(\ell^2)},$$

where the $\|\cdot\|_{\ell^0(\ell^2)}$ penalty is given by $\|x\|_{\ell^0(\ell^2)}=1$ if $x\neq 0$, and $\|x\|_{\ell^0(\ell^2)}=0$ otherwise. Then it can be verified directly that

$$x^*=\begin{cases}g, & \text{if }\|g\|>\sqrt{2\lambda},\\ 0, & \text{if }\|g\|<\sqrt{2\lambda},\\ 0\text{ or }g, & \text{if }\|g\|=\sqrt{2\lambda}.\end{cases}$$

For a vector $x\in\mathbb{R}^p$, the hard thresholding operator $H_\lambda$ (with respect to the partition $\{G_i\}_{i=1}^N$) is defined groupwise. For $s=1$, it recovers the usual hard thresholding operator, and hence it is called a group hard thresholding operator.

B. Blockwise mutual coherence

We shall analyze the model (3) using the concept of blockwise mutual coherence (BMC). We first introduce some notation:

$$\bar\Psi_{G_i}=(\Psi_{G_i}^t\Psi_{G_i})^{\frac{1}{2}}\quad\text{and}\quad D_{i,j}=\bar\Psi_{G_i}^{-1}\Psi_{G_i}^t\Psi_{G_j}\bar\Psi_{G_j}^{-1}. \qquad (7)$$

Since $\Psi_{G_i}$ has full column rank, $\bar\Psi_{G_i}$ is symmetric positive definite and invertible. The main tool in our analysis is the BMC $\mu$ of the matrix $\Psi$ with respect to the partition $\{G_i\}_{i=1}^N$, which is defined by

$$\mu=\max_{i\neq j}\mu_{i,j},\quad\text{where}\quad \mu_{i,j}=\sup_{u\in N_i\setminus\{0\},\,v\in N_j\setminus\{0\}}\frac{\langle u,v\rangle}{\|u\|\|v\|}, \qquad (8)$$

where $N_i$ is the subspace spanned by the columns of $\Psi_{G_i}$, i.e., $N_i=\mathrm{span}\{\psi_l,\,l\in G_i\}\subseteq\mathbb{R}^n$. The quantity $\mu_{i,j}$ is the cosine of the minimum angle between the two subspaces $N_i$ and $N_j$. Thus the BMC $\mu$ generalizes the concept of mutual coherence (MC) $\nu$, which is defined by $\nu=\max_{i\neq j}|\langle\psi_i,\psi_j\rangle|$ [47], and is widely used in the analysis of sparse recovery algorithms [42], [48], [49]. The concept BMC was already introduced in [36] for separating subspace signals, and in [44] for analyzing convex block sparse recovery. In linear algebra, one often uses principal angles to quantify the angles between two subspaces [50], i.e., given $U,V\subseteq\mathbb{R}^n$, the principal angles $\theta_l$ for $l=1,2,\ldots,\min(\dim U,\dim V)$ are defined recursively by

$$\cos(\theta_l)=\max_{\substack{u\in U,\ \|u\|=1,\ u\perp\mathrm{span}\{u_i\}_{i=1}^{l-1}\\ v\in V,\ \|v\|=1,\ v\perp\mathrm{span}\{v_j\}_{j=1}^{l-1}}}\langle u,v\rangle.$$

By the definition of principal angles, $\mu_{i,j}=\cos(\theta_1)$ for $(U,V)=(N_i,N_j)$; see Lemma 2 below and [50, pp. 603–604] for the proof. Principal angles (and hence the BMC) can be computed efficiently by QR and SVD [50], unlike RIP or its variants [51].

Lemma 2: Let $U_i\in\mathbb{R}^{n\times s_i}$ and $V_j\in\mathbb{R}^{n\times s_j}$ be two matrices whose columns are orthonormal bases of $N_i$ and $N_j$, respectively, and let $\{\theta_l\}_{l=1}^{\min(s_i,s_j)}$ be the principal angles between $N_i$ and $N_j$. Then $\mu_{i,j}=\cos(\theta_1)=\sigma_{\max}(U_i^tV_j)$.

The next result shows that the BMC $\mu$ can be bounded from above by the MC $\nu$; see Appendix A for the proof. Hence, the BMC is sharper than the MC, since it does not depend on the inner-group correlation.

Proposition 3: Let the MC $\nu$ of $\Psi$ satisfy $(s-1)\nu<1$. Then for the BMC $\mu$ of $\Psi$, there holds $\mu\le\frac{\nu s}{1-\nu(s-1)}$.

Below we always assume the following condition.

Assumption 2.1: The BMC $\mu$ of $\Psi$ satisfies $\mu\in(0,1/(3T))$.

We have a few comments on Assumption 2.1.

Remark 2.1: First, if the group sizes do not vary much, then the condition $\mu<1/(3T)$ holds if $\nu<1/(C\|x^\dagger\|_{\ell^0})$. The latter condition with $C\in(2,7)$ is widely used for analyzing lasso [52] and OMP [49], [53]. Hence, the condition in Assumption 2.1 generalizes the classical one. Second, it allows strong inner-group correlations (i.e., ill-conditioning of $\Psi_{G_i}$), for which the MC $\nu$ can be very close to one, and thus it has a built-in mechanism to tackle inner-group correlation. This differs essentially from existing approaches, which rely on certain pre-processing techniques [54], [55].

Remark 2.2: A similar block MC, defined by $\mu_B=\max_{i\neq j}\|\Psi_{G_i}^t\Psi_{G_j}\|/s$, was used for analyzing group greedy algorithms [20], [35] and group lasso [22] (without the scaling $s$). If every submatrix $\Psi_{G_i}$ is column orthonormal, i.e., $\Psi_{G_i}^t\Psi_{G_i}=I$, then $\mu$ and $\mu_B$ are identical. However, to obtain the error estimates in [20], [35], the MC $\nu$ within each group is still needed, which excludes inner-group correlations. The estimates in [22] were obtained under the assumption $\max_i\|\Psi_{G_i}^t\Psi_{G_i}-I\|\le 1/2$, which again implies that the $\Psi_{G_i}$ are well conditioned [22, Theorem 1]. Group restricted eigenvalue conditions [18], [21] and group RIP [23] were adopted for analyzing the group lasso. Under these conditions, strong correlation within groups is also not allowed.

Now we give a few useful estimates. The proofs can be found in Appendix B.

Lemma 4: For any $i,j$, there hold

$$\|\bar\Psi_{G_i}^{-1}\Psi_{G_i}^ty\|\le\|y\|,\qquad \|\Psi_{G_i}\bar\Psi_{G_i}^{-1}x_{G_i}\|=\|x_{G_i}\|,\qquad \|D_{i,j}x_{G_j}\|\begin{cases}\le\mu\|x_{G_j}\|, & i\neq j,\\ =\|x_{G_j}\|, & i=j.\end{cases}$$

Lemma 5: For any distinct groups $G_{i_1},\ldots,G_{i_M}$, $1\le M\le T$, let

$$D=\begin{pmatrix}D_{i_1,i_1} & \cdots & D_{i_1,i_M}\\ \vdots & \ddots & \vdots\\ D_{i_M,i_1} & \cdots & D_{i_M,i_M}\end{pmatrix}\quad\text{and}\quad x=\begin{pmatrix}x_{G_{i_1}}\\ \vdots\\ x_{G_{i_M}}\end{pmatrix}.$$

There holds $\|Dx\|_{\ell^\infty(\ell^2)}\in[(1-(M-1)\mu)\|x\|_{\ell^\infty(\ell^2)},\,(1+(M-1)\mu)\|x\|_{\ell^\infty(\ell^2)}]$.

Lemma 5 directly implies the uniqueness of the oracle solution $x^o$; see Appendix C for the proof.

Corollary 6: If Assumption 2.1 holds, then $x^o$ is unique.
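Since Lemma 2 identifies $\mu_{i,j}$ with the largest singular value of $U_i^tV_j$ for orthonormal bases $U_i,V_j$, the BMC can be computed by a thin QR factorization of each block followed by SVDs of the cross products, as noted above. The sketch below follows this recipe under the standing assumption that each $\Psi_{G_i}$ has full column rank; it is an illustration with our own naming, not the authors' implementation.

```python
import numpy as np

def blockwise_mutual_coherence(Psi, groups):
    """BMC mu of (8) via principal angles (Lemma 2):
    mu_{i,j} = sigma_max(U_i^t U_j), with U_i an orthonormal basis of span(Psi_{G_i})."""
    # thin QR gives an orthonormal basis of each column span
    bases = [np.linalg.qr(Psi[:, g], mode='reduced')[0] for g in groups]
    mu = 0.0
    for i in range(len(bases)):
        for j in range(i + 1, len(bases)):
            sv = np.linalg.svd(bases[i].T @ bases[j], compute_uv=False)
            mu = max(mu, sv[0])
    return mu
```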
III. THEORY OF THE $\ell^0(\ell^2)$ OPTIMIZATION PROBLEM

Now we analyze the model (3), e.g., existence of a global minimizer, invariance property, support recovery, and block coordinatewise minimizers.

A. Existence and property of a global minimizer

First we show the existence of a global minimizer to problem (3); see Appendix D for the proof.

Theorem 7: There exists a global minimizer to problem (3).

It can be verified directly that the $\ell^0(\ell^2)$ penalty is invariant under group full-rank column transformations, i.e., $\|\bar\Psi_{G_i}x_{G_i}\|_{\ell^0}=\|x_{G_i}\|_{\ell^0}$ for nonsingular $\bar\Psi_{G_i}$, $i=1,2,\ldots,N$. Thus problem (3) can be equivalently transformed into

$$\min_{\bar x\in\mathbb{R}^p}\ \tfrac{1}{2}\Big\|\sum_{i=1}^N\Psi_{G_i}\bar\Psi_{G_i}^{-1}\bar x_{G_i}-y\Big\|^2+\lambda\|\bar x\|_{\ell^0(\ell^2)}, \qquad (9)$$

with $\bar x_{G_i}=\bar\Psi_{G_i}x_{G_i}$. This invariance does not hold for other group sparse penalties, e.g., group lasso and group MCP. Further, the BMC $\mu$ is invariant under the transformation, since $\mathrm{span}(\{\psi_l:l\in G_i\})=\mathrm{span}(\{(\Psi_{G_i}\bar\Psi_{G_i}^{-1})_l\})$.

Remark 3.1: Most existing approaches do not distinguish inner- and inter-group columns, and thus require incoherence between the columns within each group in the theoretical analysis. For strong inner-group correlation, a clustering step is often employed to decorrelate $\Psi$ [54], [55]. In contrast, our approach has a built-in decorrelation mechanism: it is independent of the conditioning of the submatrices $\{\Psi_{G_i}\}_{i=1}^N$.

For a properly chosen $\lambda$, a global minimizer has nice properties, e.g., exact support recovery for small noise and the oracle property; the proof is given in Appendix E.

Theorem 8: Let Assumption 2.1 hold, let $x$ be a global minimizer of (3) with an active set $A$, and let $\bar x^\dagger_{G_i}=\bar\Psi_{G_i}x^\dagger_{G_i}$.
(i) Let $\Lambda=|\{i\in A^\dagger:\|\bar x^\dagger_{G_i}\|<2\sqrt{2\lambda}+3\epsilon\}|$. If $\lambda>\epsilon^2/2$, then $|A\setminus A^\dagger|+|A^\dagger\setminus A|\le 2\Lambda$.
(ii) If $\eta$ is small, i.e., $\epsilon<\min_{i\in A^\dagger}\{\|\bar x^\dagger_{G_i}\|\}/5$, then for any $\lambda\in(\epsilon^2/2,(\min_{i\in A^\dagger}\{\|\bar x^\dagger_{G_i}\|\}-2\epsilon)^2/8)$, the oracle solution $x^o$ is the only global minimizer of $J_\lambda$.

B. Necessary optimality condition

Since problem (3) is highly nonconvex, there seems to be no convenient characterization of a global minimizer that is amenable to numerical treatment. Hence, we resort to the concept of a block coordinatewise minimizer (BCWM) with respect to the group partition $\{G_i\}_{i=1}^N$, which is a minimizer along each group coordinate $x_{G_i}$ [56]. Specifically, a BCWM $x^*$ of the functional $J_\lambda$ satisfies, for $i=1,2,\ldots,N$,

$$x^*_{G_i}\in\arg\min_{x_{G_i}\in\mathbb{R}^{s_i}}J_\lambda(x^*_{G_1},\cdots,x^*_{G_{i-1}},x_{G_i},x^*_{G_{i+1}},\cdots,x^*_{G_N}).$$

We have the following necessary and sufficient condition for a BCWM $x^*$; see Appendix F for the proof. It is also the necessary optimality condition of a global minimizer $x^*$.

Theorem 9: The necessary and sufficient optimality condition for a BCWM $x^*\in\mathbb{R}^p$ of problem (3) is given by

$$\bar x^*_{G_i}\in H_\lambda(\bar x^*_{G_i}+\bar d^*_{G_i}),\quad i=1,\ldots,N, \qquad (10)$$

where $\bar x^*_{G_i}=\bar\Psi_{G_i}x^*_{G_i}$ and $\bar d^*_{G_i}=\bar\Psi_{G_i}^{-1}d^*_{G_i}$, and the dual variable $d^*$ is $d^*=\Psi^t(y-\Psi x^*)$.

Remark 3.2: The optimality system is expressed in terms of the transformed variables $\bar x$ and $\bar d$ only, instead of the primary variables $x$ and $d$. This has important consequences for the analysis and algorithm of the $\ell^0(\ell^2)$ model: both should be carried out in the transformed domain. Clearly, (10) is also the optimality system of a BCWM $\bar x^*$ for problem (9), concurring with the invariance property.

Notation. In the discussions below, given a primal variable $x$ and a dual variable $d$, we will use $(\bar x,\bar d)$ for the transformed variables, i.e., $\bar x_{G_i}=\bar\Psi_{G_i}x_{G_i}$ and $\bar d_{G_i}=\bar\Psi_{G_i}^{-1}d_{G_i}$, $i=1,\ldots,N$.

Using the group hard-thresholding operator $H_\lambda$, we deduce

$$\|\bar x^*_{G_i}+\bar d^*_{G_i}\|<\sqrt{2\lambda}\ \Rightarrow\ \bar x^*_{G_i}=0\ (\Leftrightarrow x^*_{G_i}=0),\qquad \|\bar x^*_{G_i}+\bar d^*_{G_i}\|>\sqrt{2\lambda}\ \Rightarrow\ \bar d^*_{G_i}=0\ (\Leftrightarrow d^*_{G_i}=0).$$

Combining these two relations gives a simple observation

$$\|\bar x_{G_i}\|\ge\sqrt{2\lambda}\ge\|\bar d_{G_i}\|. \qquad (11)$$

Next we discuss interesting properties of a BCWM $x^*$. First, it is always a local minimizer, i.e., $J_\lambda(x^*+h)\ge J_\lambda(x^*)$ for all small $h\in\mathbb{R}^p$; see Appendix G for the proof.

Theorem 10: A BCWM $x^*$ of the functional $J_\lambda$ is a local minimizer. Further, with its active set $A$, if $\Psi_{\cup_{i\in A}G_i}$ has full column rank, then it is a strict local minimizer.

Given the active set $A$ of a BCWM $x^*$, if $|A|$ is controlled, then $A$ provides information about $A^\dagger$; see Theorem 11 below and Appendix H for the proof. In particular, if the noise $\eta$ is small, with a proper choice of $\lambda$, then $A\subseteq A^\dagger$.

Theorem 11: Let Assumption 2.1 hold, and let $x^*$ be a BCWM of the model (3) with a support $A$ and $|A|\le T$. Then the following statements hold.
(i) The inclusion $\{i:\|\bar x^\dagger_{G_i}\|\ge 2\sqrt{2\lambda}+3\epsilon\}\subseteq A$ holds.
(ii) The inclusion $A\subseteq A^\dagger$ holds if $\epsilon$ is small:
$$\epsilon\le t\min_{i\in A^\dagger}\{\|\bar x^\dagger_{G_i}\|\}\quad\text{for some}\quad 0\le t<\tfrac{1-3\mu T}{2}. \qquad (12)$$
(iii) If the set $\{i\in A^\dagger:\|\bar x^\dagger_{G_i}\|\in[2\sqrt{2\lambda}-3\epsilon,\,2\sqrt{2\lambda}+3\epsilon]\}$ is empty, then $A\subseteq A^\dagger$.

IV. GROUP PRIMAL-DUAL ACTIVE SET ALGORITHM

Now we develop an efficient, accurate and globally convergent group primal dual active set with continuation (GPDASC) algorithm for problem (3). It generalizes the algorithm for the $\ell^1$ and $\ell^0$ regularized problems [41], [42] to the group case.

A. GPDASC algorithm

The starting point is the necessary and sufficient optimality condition (10) for a BCWM $x^*$, cf. Theorem 9. The following two observations from (10) form the basis of the derivation. First, given a BCWM $x^*$ (and its dual variable $d^*=\Psi^t(y-\Psi x^*)$), one can determine the active set $A^*$ by

$$A^*=\{i:\|\bar x^*_{G_i}+\bar d^*_{G_i}\|>\sqrt{2\lambda}\}$$

and the inactive set $I^*$ as its complement, provided that the set $\{i:\|\bar x^*_{G_i}+\bar d^*_{G_i}\|=\sqrt{2\lambda}\}$ is empty. Second, given the active set $A^*$, one can determine uniquely the primal and dual variables $x^*$ and $d^*$ by (with $B=\cup_{i\in A^*}G_i$)

$$x^*_{G_i}=0\ \forall i\in I^*\quad\text{and}\quad \Psi_B^t\Psi_Bx^*_B=\Psi_B^ty,$$
$$d^*_{G_j}=0\ \forall j\in A^*\quad\text{and}\quad d^*_{G_i}=\Psi_{G_i}^t(y-\Psi x^*)\ \forall i\in I^*.$$

By iterating these two steps alternatingly, with the current estimates $(x,d)$ and $(A,I)$ in place of $(x^*,d^*)$ and $(A^*,I^*)$, we arrive at an algorithm for problem (3).

The complete procedure is listed in Algorithm 1. Here $K_{\max}\in\mathbb{N}$ is the maximum number of inner iterations, and $\lambda_0$ is the initial guess of $\lambda$. The choice $\lambda_0=\frac{1}{2}\|y\|^2$ ensures that $x_0=0$ is the only global minimizer, cf. Proposition 12 below, with a dual variable $d_0=\Psi^ty$. The scalar $\rho\in(0,1)$ is the decreasing factor for $\lambda$, which essentially determines the length of the continuation path.

Algorithm 1 GPDASC algorithm
1: Input: $\Psi\in\mathbb{R}^{n\times p}$, $\{G_i\}_{i=1}^N$, $K_{\max}$, $\lambda_0=\frac{1}{2}\|y\|^2$, and $\rho\in(0,1)$.
2: Compute $\bar\Psi_{G_i}=(\Psi_{G_i}^t\Psi_{G_i})^{1/2}$.
3: Set $x(\lambda_0)=0$, $d(\lambda_0)=\Psi^ty$, $A(\lambda_0)=\emptyset$.
4: for $s=1,2,\ldots$ do
5:   Set $\lambda_s=\rho\lambda_{s-1}$, $x^0=x(\lambda_{s-1})$, $d^0=d(\lambda_{s-1})$, $A_{-1}=A(\lambda_{s-1})$.
6:   for $k=0,1,\ldots,K_{\max}$ do
7:     Let $\bar x^k_{G_i}=\bar\Psi_{G_i}x^k_{G_i}$ and $\bar d^k_{G_i}=\bar\Psi_{G_i}^{-1}d^k_{G_i}$, and define $A_k=\{i:\|\bar x^k_{G_i}+\bar d^k_{G_i}\|>\sqrt{2\lambda_s}\}$.
8:     Check the stopping criterion $A_k=A_{k-1}$.
9:     Update the primal variable by $x^{k+1}=\arg\min_{\mathrm{supp}(x)\subseteq\cup_{i\in A_k}G_i}\|\Psi x-y\|$.
10:    Update the dual variable by $d^{k+1}=\Psi^t(y-\Psi x^{k+1})$.
11:  end for
12:  Set the output $x(\lambda_s)$, $d(\lambda_s)$ and $A(\lambda_s)$.
13:  Check the stopping criterion
$$\|\Psi x(\lambda_s)-y\|\le\epsilon. \qquad (13)$$
14: end for

The algorithm consists of two loops: an inner loop solving problem (3) with a fixed $\lambda$ using a GPDAS iteration (lines 6–10), and an outer loop of continuation along the parameter $\lambda$ by gradually decreasing its value.

The inner loop involves a least-squares problem

$$x^{k+1}=\arg\min_{\mathrm{supp}(x)\subseteq\cup_{i\in A_k}G_i}\|\Psi x-y\|,$$

which amounts to solving a (normal) linear system of size $|\cup_{i\in A_k}G_i|\le|A_k|s$. Hence, this step is very efficient if the active set $A_k$ is small, which is the case for group sparse signals. Further, since the inner iterates are of Newton type [41], the local convergence should be fast. However, in order to fully exploit this nice feature, a good initial guess of the primal and dual variables $(x,d)$ is required. To this end, we apply a continuation strategy along $\lambda$. Specifically, given a large $\lambda_0$, we gradually decrease its value by $\lambda_s=\rho\lambda_{s-1}$, for some decreasing factor $\rho\in(0,1)$, and take the solution $(x(\lambda_{s-1}),d(\lambda_{s-1}))$ of the $\lambda_{s-1}$-problem $J_{\lambda_{s-1}}$ to warm start the $\lambda_s$-problem $J_{\lambda_s}$.

There are two stopping criteria in the algorithm, at steps 8 and 13, respectively. In the inner loop, one may terminate the iteration if the active set $A_k$ does not change, or once a maximum number $K_{\max}$ of inner iterations is reached. Since the stopping criterion $A_k=A_{k-1}$ from convex optimization may never be reached in the nonconvex context [42], the loop has to be terminated after a maximum number $K_{\max}$ of iterations. Our convergence analysis holds for any $K_{\max}\in\mathbb{N}$, including $K_{\max}=1$, and we recommend $K_{\max}\le 5$ in practice.
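For concreteness, the sketch below mirrors the structure of Algorithm 1 in NumPy: continuation in $\lambda$, active set selection from the transformed variables, a least-squares solve on the active set, a dual update, and the discrepancy stopping rule (13). It assumes full column rank blocks, uses our own helper for $\bar\Psi_{G_i}=(\Psi_{G_i}^t\Psi_{G_i})^{1/2}$, and makes no attempt at the low-rank Cholesky up/down-dates discussed in Section IV-C; it is an illustration, not the authors' MATLAB implementation.

```python
import numpy as np

def _block_sqrt(PsiG):
    """Psi_bar = (Psi_G^t Psi_G)^{1/2} via a symmetric eigendecomposition."""
    w, V = np.linalg.eigh(PsiG.T @ PsiG)
    return (V * np.sqrt(np.maximum(w, 0.0))) @ V.T

def gpdasc(Psi, y, groups, eps, rho=0.7, Kmax=5, max_outer=100):
    """Sketch of Algorithm 1 (GPDASC); returns the final primal iterate x."""
    n, p = Psi.shape
    Psi_bar = [_block_sqrt(Psi[:, g]) for g in groups]        # Psi_bar_{G_i}
    Psi_bar_inv = [np.linalg.inv(B) for B in Psi_bar]          # Psi_bar_{G_i}^{-1}
    x, d = np.zeros(p), Psi.T @ y                              # x(lambda_0) = 0, d(lambda_0) = Psi^t y
    lam = 0.5 * np.dot(y, y)                                   # lambda_0 = ||y||^2 / 2
    A_old = set()                                              # A(lambda_0) is empty
    for _ in range(max_outer):
        lam *= rho                                             # lambda_s = rho * lambda_{s-1}
        for _ in range(Kmax):                                  # at most Kmax inner (GPDAS) steps
            # active set from the transformed variables (step 7)
            A = set()
            for i, g in enumerate(groups):
                xbar = Psi_bar[i] @ x[g]
                dbar = Psi_bar_inv[i] @ d[g]
                if np.linalg.norm(xbar + dbar) > np.sqrt(2.0 * lam):
                    A.add(i)
            if A == A_old:                                     # step 8: inner stopping criterion
                break
            A_old = A
            # step 9: least squares on the active set, then step 10: dual update
            x = np.zeros(p)
            if A:
                idx = np.concatenate([groups[i] for i in sorted(A)])
                x[idx], *_ = np.linalg.lstsq(Psi[:, idx], y, rcond=None)
            d = Psi.T @ (y - Psi @ x)
        if np.linalg.norm(Psi @ x - y) <= eps:                 # step 13: discrepancy principle
            break
    return x
```

A typical call would pass the noise level $\epsilon$ for the discrepancy principle and the decreasing factor $\rho=0.7$ recommended in Remark 4.1 below.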
The stopping criterion at step 13 is essentially concerned with the proper choice of $\lambda$. The choice of $\lambda$ lies at the very heart of the model (3). Many rules, e.g., the discrepancy principle, the balancing principle and information criteria, have been developed for variational regularization [57]. In Algorithm 1, we give only the discrepancy principle (13), assuming that a reliable estimate of the noise level $\epsilon$ is available. The rationale behind the principle is that the reconstruction accuracy should be comparable with the data accuracy. Note that the use of (13) (and other rules) does not incur any extra computational overhead, since the sequence of solutions $\{x(\lambda_s)\}$ is already generated along the continuation path.

Now we justify the choice of $\lambda_0$: for large $\lambda$, $0$ is the only global minimizer of $J_\lambda$. The proof is given in Appendix I.

Proposition 12: The following statements hold.
(i) For any $\lambda>0$, $x^*=0$ is a strict local minimizer of $J_\lambda$;
(ii) For any $\lambda>\lambda_0:=\frac{1}{2}\|y\|^2$, $x^*=0$ is the only global minimizer of problem (3).

B. Convergence analysis

Now we state the global convergence of Algorithm 1.

Theorem 13: Let Assumption 2.1 and (12) hold. Then for a proper choice of $\rho\in(0,1)$, and for any $K_{\max}\ge 1$, Algorithm 1 converges to $x^o$ in a finite number of iterations.

We only sketch the main ideas, and defer the lengthy proof to Appendix J. The most crucial ingredient of the proof is to characterize a monotone decreasing property of the "energy" during the iteration by means of an auxiliary set $\Gamma_s$ defined by

$$\Gamma_s=\Big\{i:\|\bar x^\dagger_{G_i}\|\ge\sqrt{2s}\Big\}. \qquad (14)$$

The inclusion $\Gamma_{s_1}\subseteq\Gamma_{s_2}$ holds trivially for $s_1>s_2$. If $A_k$ is the active set at the $k$th iteration, the corresponding energy $E_k$ is defined by $E_k=E(A_k)=\max_{i\in I_k}\|\bar x^\dagger_{G_i}\|$. Then with properly chosen $s_1>s_2$, there holds $\Gamma_{s_1^2\lambda}\subseteq A_k\subseteq A^\dagger\ \Rightarrow\ \Gamma_{s_2^2\lambda}\subseteq A_{k+1}\subseteq A^\dagger$. This relation characterizes the evolution of the active set $A_k$, and provides a crucial strict monotonicity of the energy $E_k$.
This observation is sufficient to show the i i convergence of the algorithm to the oracle solution xo in a The sensing matrix Ψ is constructed as follows. First we finite number of steps; see Appendix J for details. generate a random Gaussian matrix Ψ(cid:101) ∈ Rn×p, n (cid:28) p, withitsentriesfollowinganindependentidenticallydistributed Remark 4.1: TheconvergenceinTheorem13holdsforany K ∈ N, including K = 1. According to the proof in (i.i.d.) standard Gaussian distribution with a zero mean and max max unit variance. Then for any i ∈ {1,2...,N}, we introduce Appendix J, the smaller are the factor µT and the noise level (cid:15), the smaller is the decreasing factor ρ that one can choose correlation within the ith group Gi by: given ΨGi ∈Rn×|Gi| and thus Algorithm 1 takes fewer outer iterations to reach by setting ψ1 =ψ(cid:101)1, ψ|Gi| =ψ(cid:101)|Gi| and convergenceonthecontinuationpath.Weoftentakenρ=0.7. ψj =ψ(cid:101)j +θ(ψ(cid:101)j−1+ψ(cid:101)j+1), j =2,...,|Gi|−1, where the parameter θ ≥0 controls the degree of inner-group C. Computational complexity correlation: The larger is θ, the stronger is the correlation. Now we comment on the computational complexity of Finally, we normalize the matrix Ψ to obtain Ψ such that all Algorithm 1. First, we consider one inner iteration. Steps 7- columns are of unit length. The data y is formed by adding 8 take O(p) flops. At Step 9, explicitly forming the matrix noise η to the exact data y† = Ψx† componentwise, where oΨftBfkoΨrmBikn,gBΨk t=y∪isi∈nAekgGligi,ibtalekessinOc(eni|tBiks|2o)fflteonpsp,raencdomthpeucteodst. the entries ηi follow an i.i.d. Gaussian distribution N(0,σ2). Below we shall denote by the tuple (n,p,N,T,s,DR,θ,σ) The Cholesky factorization costs O(|B |3) flops and the k the data generation parameters, and the notation N : d : N 1 2 back-substitution needs O(|B |2) flops. Hence step 9 takes k denotes the sequence of numbers starting with N and less 1 O(max(|B |3,n|B |2)) flops. At step 10, the matrix-vector k k than N with a spacing d. 2 product takes O(np) flops. Hence, the the overall cost of one inner iteration is O(max(|B |3,pn,n|B |2)). Since the k k B. Comparison with existing group sparse models GPDAS is of Newton type, a few iterations suffice conver- First we compare our (cid:96)0((cid:96)2) model (3) (and Algorithm 1) gence, which is numerically confirmed in Section V. So with with three state-of-the-art group sparse recovery models and a good initial guess, for each fixed λ, the overall cost is O(max(|B |3,pn,|B |2n)). In particular, if the true solution algorithms, i.e., group lasso model k k √ x† is sufficiently sparse, i.e., |Bk|(cid:28)min(n, p), the cost of min(cid:107)x(cid:107)(cid:96)1((cid:96)2) subject to (cid:107)Ψx−y(cid:107)≤(cid:15) per inner iteration is O(np). x∈Rp Generally, one can apply the well-know low-rank Cholesky (solved by the group SPGl1 method [29], available at http: up/down-dateformulas[58]tofurtherreducethecost.Specif- //www.cs.ubc.ca/∼mpf/spgl1/, last accessed on December 23, ically, with B = ∪ G , we down-date by removing the 2015), group MCP (GMCP) model [17], [26], [27] (solved k i∈Ak i columns in B but not in B at the cost of O(|B \ by a group coordinate descent (GCD) method [34]), and k−1 k k−1 B ||B |2) flops, and update by appending the columns in group OMP (GOMP) [20], [35]. We refer to these refer- k k−1 B but not in B in O(|B \B |(|B |2+n|B |)) ences for their implementation details. Since the algorithm k k−1 k k−1 k−1 k−1 IEEETRANSECTIONONSIGNALPROCESSING,VOL. ,NO. 
B. Comparison with existing group sparse models

First we compare our $\ell^0(\ell^2)$ model (3) (and Algorithm 1) with three state-of-the-art group sparse recovery models and algorithms, i.e., the group lasso model

$$\min_{x\in\mathbb{R}^p}\ \|x\|_{\ell^1(\ell^2)}\quad\text{subject to}\quad\|\Psi x-y\|\le\epsilon$$

(solved by the group SPGl1 method [29], available at http://www.cs.ubc.ca/~mpf/spgl1/, last accessed on December 23, 2015), the group MCP (GMCP) model [17], [26], [27] (solved by a group coordinate descent (GCD) method [34]), and group OMP (GOMP) [20], [35]. We refer to these references for their implementation details. Since the algorithm essentially determines the performance of each method, we shall indicate the methods by the respective algorithms, i.e., SPGl1, GCD, GOMP and GPDASC. In the comparison, we examine separately support recovery, and computing time and reconstruction error. All the reported results are the average of 100 independent simulations of the experimental setting.

First, to show exact support recovery, we consider the following two problem settings: $(800,2\times10^3,500,10{:}10{:}100,4,10,0,10^{-3})$ and $(800,2\times10^3,500,10{:}10{:}100,4,10,3,10^{-3})$, for which the condition numbers of the submatrices $\Psi_{G_i}$ are $O(1)$ and $O(10^2)$ for the cases $\theta=0$ and $\theta=3$, respectively. Given the group size $s=4$, the condition number $O(10^2)$ is fairly large, and thus the latter setting is numerically far more challenging than the former. The numerical results are presented in Fig. 1, where exact recovery is measured by $A^*=A^\dagger$, with $A^\dagger$ and $A^*$ being the true and recovered active sets, respectively.

[Fig. 1: The probability of exact support recovery. (a) $(800,2\times10^3,500,10{:}10{:}100,4,10,0,10^{-3})$; (b) $(800,2\times10^3,500,10{:}10{:}100,4,10,3,10^{-3})$.]

Numerically, it is observed that as the (group) sparsity level $T$ and the correlation parameter $\theta$ increase, the $\ell^0(\ell^2)$ model and GMCP are the best performers in the test. Theoretically, this is not surprising: the $\ell^0(\ell^2)$ model represents the gold standard for group sparse recovery, like the $\ell^0$ model for the usual sparsity, and GMCP is a close nonconvex proxy to the $\ell^0(\ell^2)$ model. Note that GMCP as implemented in [17] is robust with respect to the inner-group correlation, since it performs a preprocessing step to decorrelate $\Psi$ by reorthonormalizing the columns within each group. However, unlike the $\ell^0(\ell^2)$ penalty, this step generally changes the GMCP objective function, due to a lack of transform invariance, and thus may complicate the theoretical analysis of the resulting recovery method. Meanwhile, as a greedy approximation, GOMP does a fairly good job overall: for small $\theta$, it can perform almost as well as the $\ell^0(\ell^2)$ model, but deteriorates greatly for large $\theta$. By its very construction, GOMP from [20] does not take care of the inner-group correlation directly. Surprisingly, group lasso fails most of the time. A closer look at the recovered signals shows that it tends to choose a slightly larger active set than $A^\dagger$ in the noisy case, and this explains its relatively poor performance in terms of the exact recovery probability, although the relative error is not too large. Intuitively, this concurs with the fact that the convex relaxation often gains computational efficiency at the price of reconstruction accuracy.
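The exact-recovery criterion $A^*=A^\dagger$ used in Fig. 1 can be checked directly from a reconstruction; the small helper below (our naming, with a tolerance for numerically tiny entries) illustrates how the recovered active set is read off from the group norms.

```python
import numpy as np

def exact_support_recovery(x_rec, x_true, groups, tol=1e-8):
    """True if the recovered and true group active sets coincide (A* = A^dagger)."""
    rec = {i for i, g in enumerate(groups) if np.linalg.norm(x_rec[g]) > tol}
    true_set = {i for i, g in enumerate(groups) if np.linalg.norm(x_true[g]) > 0}
    return rec == true_set
```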
Next we compare their computing time and reconstruction error on the following two problem settings: $(2\times10^3,1\times10^4,2.5\times10^3,200{:}25{:}400,4,100,1,10^{-2})$ and $(5\times10^3,2\times10^4,5\times10^3,500{:}50{:}800,4,100,10,10^{-3})$, for which the condition number of the submatrices $\Psi_{G_i}$ is of $O(10)$ and $O(10^3)$, respectively. The case $\theta=10$ involves a very strong inner-group correlation, and it is very challenging. The numerical results are presented in Figs. 2 and 3.

[Fig. 2: Computing time (in seconds) and relative error for GPDASC, GOMP, SPGl1, and GCD for the problem setting $(2\times10^3,1\times10^4,2.5\times10^3,150{:}50{:}350,4,100,1,10^{-2})$. All computations were performed with the same continuation path.]

[Fig. 3: Computing time (in seconds) and relative error for GPDASC, GOMP, SPGl1, and GCD for the problem setting $(5\times10^3,2\times10^4,5\times10^3,500{:}50{:}800,4,100,10,10^{-3})$. All computations were performed with the same continuation path.]

For $\theta=1$, the proposed GPDASC for the $\ell^0(\ell^2)$ model is at least three to four times faster than GCD and GOMP, cf. Fig. 2. The efficiency of GPDASC stems from its Newton nature and the continuation strategy, apart from solving least-squares problems only on the active set. We shall examine its convergence more closely below. Group lasso is also computationally attractive, since, due to its convexity, it admits an efficient solver, SPGl1. The coupling with a continuation strategy is beneficial to the efficiency of SPGl1 [41]. Meanwhile, the reconstruction errors of the $\ell^0(\ell^2)$ and GMCP models are comparable, slightly better than that of GOMP, and much more accurate than that of group lasso, as observed earlier. In the case of strong inner-group correlation (i.e., $\theta=10$), the computing time of GPDASC does not change much, but that of the other algorithms has doubled. Further, the relative error of the $\ell^0(\ell^2)$ model does not deteriorate as the correlation parameter $\theta$ increases, due to its inherent built-in decorrelation mechanism, cf. Section III, and thus it is far smaller than that of the other methods, especially when the group sparsity level $T$ is large. In summary, these experiments show clearly that the proposed $\ell^0(\ell^2)$ model is very competitive in terms of computing time, reconstruction error and exact support recovery.

C. Superlinear local convergence of Algorithm 1

We illustrate the convergence behavior of Algorithm 1 with two problem settings: $(500,10^3,250,50,4,100,0,10^{-3})$ and $(500,10^3,250,50,4,100,3,10^{-3})$. To examine the local convergence, we show the number of iterations for each fixed $\lambda_s$ along the continuation path in Fig. 4. It is observed that the stopping criterion of the inner iteration, i.e., step 8 of Algorithm 1, is usually reached within one or two iterations, irrespective of the inner-group correlation strength or the regularization parameter $\lambda_s$.

[Fig. 4: The number of iterations along the continuation path, for each fixed regularization parameter $\lambda_s$. (a) $(500,10^3,250,50,4,100,0,10^{-3})$; (b) $(500,10^3,250,50,4,100,3,10^{-3})$.]
Hence, Algorithm 1 converges locally superlinearly, like its counterpart for the convex $\ell^1$ penalty [41], and the continuation strategy provides a good initial guess for each inner iteration, such that the fast local convergence of the GPDAS is fully exploited. This confirms the complexity analysis in Section IV-C. The highly desirable $\theta$-independence of the convergence is attributed to the built-in decorrelation effect of the $\ell^0(\ell^2)$ model.

To gain further insight, we present in Fig. 5 the variation of the active set along the continuation path, using the same setting as that of Fig. 4. It is observed that the interesting monotonicity relation $A_s\subset A_{s+1}$ holds along the continuation path. The difference between the active sets at two neighboring regularization parameters $\lambda_s$ is generally small (less than five, and mostly one or two), and thus each GPDAS update is efficient, with a cost comparable to that of one step of gradient descent if the low-rank Cholesky up/down-date [58] is used, cf. Section IV-C. Further, the empirical observation that each inner loop often takes only one iteration corroborates the convergence theory in Theorem 13, i.e., the algorithm converges globally even if each inner loop takes only one iteration.

[Fig. 5: The variation of the active set size, measured by $|A_{s+1}\setminus A_s|$ and $|A_s\setminus A_{s+1}|$, along the continuation path, where $A_s$ denotes the active set at the regularization parameter $\lambda_s$. (a) $(500,10^3,250,50,4,100,0,10^{-3})$; (b) $(500,10^3,250,50,4,100,1,10^{-3})$.]

Correspondingly, the variation of the relative $\ell^2$ error with respect to the oracle solution $x^o$ along the continuation path is given in Fig. 6.

[Fig. 6: The relative $\ell^2$ error of the iterates along the continuation path, for each fixed regularization parameter $\lambda_s$, with respect to the oracle solution $x^o$. (a) $(500,10^3,250,15,4,100,0,10^{-3})$; (b) $(500,10^3,250,15,4,100,1,10^{-3})$.]
For large regularization parameters λ , T = 152, s = 3, θ = 10, σ = 1e-3. The condition number s the regularized solution is zero, and thus the relative error is within each group is O(102). IEEETRANSECTIONONSIGNALPROCESSING,VOL. ,NO. , ,2015 10 The numerical results are presented in Fig. 7 and Table I, where the PSNR is defined by Original GPDASC GOMP V2 PSNR=10·log , MSE where V and MSE is the maximum absolute value and the mean squared error, respectively, of the reconstruction, It is observed that GPDASC, GOMP and GCD produce visually equallyappealingresults,andtheyaremuchbetterthanthatof SPGl1.ThisobservationisalsoconfirmedbythePSNRvalues in Table I: the PSNR of GPDASC is slightly higher than that SPGl1 GCD of GOMP and GCD. The convergence of GPDASC is much faster than GOMP and GCD. The SPGl1 is the most efficient one, but greatly compromises the reconstruction quality. (cid:1)(cid:2)(cid:3)(cid:4)(cid:3)(cid:5)(cid:6)(cid:7) (cid:8)(cid:9)(cid:10)(cid:11)(cid:12)(cid:13) (cid:8)(cid:1)(cid:14)(cid:9) Fig. 8: Reconstructions for the 2D MRI phantom image. TABLEII:Numericalresultsforthe2DMRIphantomimage: n = 3771, p = 12288, N = 4096, T = 724, s = 3, θ = 0, σ =1e-2. algorithm CPUtime(s) PSNR (cid:12)(cid:9)(cid:8)(cid:7)(cid:15) (cid:8)(cid:13)(cid:10) GPDASC 48.5 38.7 GOMP 203 37.3 SPGl1 14.3 20.1 GCD 212 38.2 The observations from the preceding example remain largely valid: the reconstructions by GPDASC, GOMP and GCD are close to each other visually and have comparable Fig. 7: Reconstruction results of the two-dimensional image. PSNR values, and all are much better than that by SPGl1. However, GPDASC is a few times faster than that by GOMP and GCD. TABLE I: Numerical results for the two-dimensional image: n = 1152, p = 6912, N = 2304, T = 152, s = 3, θ = 10, VI. CONCLUSIONS σ =1e-3. In this work we have proposed and analyzed a novel algorithm CPUtime(s) PSNR approach for recovering group sparse signals based on the GPDASC 5.70 48.2 regularized least-squares problem with an (cid:96)0((cid:96)2) penalty. We GOMP 10.9 47.9 SPGl1 2.85 22.2 provided a complete theoretical analysis on the model, e.g., GCD 33.9 48.1 existence of global minimizers, invariance property, support recovery, and properties of block coordinatewise minimizers. Last, we consider multichannel MRI reconstruction. The One salient feature of the approach is that it has built-in samplingmatrixΨisthecompositionofapartialFFTwithan decorrelation mechanism, and can handle very strong inner- inversewavelettransform,withasize3771×12288,wherewe groupcorrelation.Further,thesenicepropertiescanbenumer- haveused6levelsofDaubechies1wavelet.Thethreechannels ically realized efficiently by a primal dual active set solver, for each wavelet expansion are organized into one group, and for which a finite-step global convergence was also proven. the underlying image I = (I ;I ;I ) has 724 nonzero group Extensive numerical experiments were presented to illustrate r g b coefficients(eachofgroupsize3)underthewavelettransform. the salient features of the (cid:96)0((cid:96)2) model, and the efficiency Hence,thedataisformedasy =Ψc+η,wherecisthetarget and accuracy of the algorithm, and the comparative study coefficientwithagroupsparsestructureandη istheGaussian with existing approaches show its competitiveness in terms of noise with a noise level σ = 1e-2. The recovered image I is support recovery, reconstruction errors and computing time. thenobtainedbyapplyingtheinversewavelettransformtothe There are several avenues deserving further study. 
First, estimated coefficient c. The numerical results are presented in when the column vectors in each group are ill-posed in the Fig. 8 and Table II. sense that they are highly correlated / nearly parallel to each
