Necessary and Sufficient Conditions for Sparsity Pattern Recovery

Alyson K. Fletcher, Member, IEEE, Sundeep Rangan, and Vivek K Goyal, Senior Member, IEEE

Manuscript received April 11, 2008; revised February 03, 2009. Current version published November 20, 2009. This work was supported in part by a University of California President's Postdoctoral Fellowship, the National Science Foundation (NSF) under CAREER Grant CCF-643836, and the Centre Bernoulli at École Polytechnique Fédérale de Lausanne. The material in this paper was presented in part at the Conference on Neural Information Processing Systems, Vancouver, BC, Canada, December 2008. A. K. Fletcher is with the Department of Electrical Engineering and Computer Sciences, University of California, Berkeley, CA 94709 USA. S. Rangan is with Qualcomm Technologies, Bedminster, NJ 07302 USA. V. K. Goyal is with the Department of Electrical Engineering and Computer Science and the Research Laboratory of Electronics, Massachusetts Institute of Technology, Cambridge, MA 02139 USA. Communicated by J. Romberg, Associate Editor for Signal Processing. Digital Object Identifier 10.1109/TIT.2009.2032726. © 2009 IEEE. As published: IEEE Transactions on Information Theory, vol. 55, no. 12, pp. 5758-5772, December 2009.

Abstract—The paper considers the problem of detecting the sparsity pattern of a k-sparse vector in R^n from m random noisy measurements. A new necessary condition on the number of measurements for asymptotically reliable detection with maximum-likelihood (ML) estimation and Gaussian measurement matrices is derived. This necessary condition for ML detection is compared against a sufficient condition for simple maximum correlation (MC) or thresholding algorithms. The analysis shows that the gap between thresholding and ML can be described by a simple expression in terms of the total signal-to-noise ratio (SNR), with the gap growing with increasing SNR. Thresholding is also compared against the more sophisticated Lasso and orthogonal matching pursuit (OMP) methods. At high SNRs, it is shown that the advantage of Lasso and OMP over thresholding is described by the range of powers of the nonzero component values of the unknown signals. Specifically, the key benefit of Lasso and OMP over thresholding is the ability of Lasso and OMP to detect signals with relatively small components.

Index Terms—Compressed sensing, convex optimization, Lasso, maximum-likelihood (ML) estimation, orthogonal matching pursuit (OMP), random matrices, random projections, sparse approximation, subset selection, thresholding.

I. INTRODUCTION

A common problem in signal processing is to estimate an unknown sparse vector x from linear observations y of the form y = Ax + d. Here, the measurement matrix A is known and d is an additive noise vector with a known distribution. The vector x is said to be sparse in that it is known a priori to have a relatively small number of nonzero components, but the locations of those components are not known and must be detected as part of the signal estimation.

Sparse signal estimation arises in a number of applications, notably subset selection in linear regression [1]. In this case, determining the locations of the nonzero components of x corresponds to finding a small subset of features which linearly influence the observed data y. In digital communication over a channel with additive noise d, the locations of nonzero components in x could convey information [2], [3].

In this paper, we are concerned with establishing necessary and sufficient conditions for the recovery of the positions of the nonzero entries of x, which we call the sparsity pattern. The ability to detect the sparsity pattern of a vector from noisy measurements depends on a number of factors, including the signal dimension, level of sparsity, number of measurements, and noise level. The broad goal of this paper is to explain the influence of these factors on the performances of various detection algorithms.
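For readers who want to experiment with the model just described, the following sketch draws one random instance of y = Ax + d with a k-sparse x. The Gaussian variance choices, the random sign pattern, and the way the smallest component is sized to hit a target minimum-to-average ratio are illustrative assumptions for this sketch only; the paper fixes its own normalization and the formal definitions of SNR and MAR in Section II.

```python
import numpy as np

def make_instance(n, k, m, snr=10.0, mar=1.0, seed=0):
    """Draw one instance of y = A x + d with a k-sparse x.

    Assumed normalization (for this sketch only): A_ij ~ N(0, 1/m) and
    d_j ~ N(0, 1/m), so that E||Ax||^2 = ||x||^2, E||d||^2 = 1, and the
    total SNR equals ||x||^2.
    """
    rng = np.random.default_rng(seed)
    support = np.sort(rng.choice(n, size=k, replace=False))

    # One small entry and k-1 equal larger entries, sized so that
    # min_j |x_j|^2 / (||x||^2 / k) equals the requested MAR.
    mags = np.ones(k)
    if k > 1 and mar < 1.0:
        mags[0] = np.sqrt(mar * (k - 1) / (k - mar))

    x = np.zeros(n)
    x[support] = mags * rng.choice([-1.0, 1.0], size=k)
    x *= np.sqrt(snr) / np.linalg.norm(x)          # sets ||x||^2 = SNR

    A = rng.normal(0.0, np.sqrt(1.0 / m), size=(m, n))
    d = rng.normal(0.0, np.sqrt(1.0 / m), size=m)
    y = A @ x + d
    return A, x, y, set(support.tolist())
```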
A. Related Work

Sparsity pattern recovery (or more simply, sparsity recovery) has received considerable attention in a variety of guises. Most transparent from our formulation is the connection to sparse approximation. In a typical sparse approximation problem, one is given data y, a dictionary¹ A, and a tolerance ε. The aim is to find the vector x̂ with the fewest number of nonzero entries among those satisfying ||y − Ax̂|| ≤ ε. This problem is NP-hard [5], but greedy heuristics (matching pursuit [4] and its variants) and convex relaxations (basis pursuit [6], Lasso [7], and others) can be effective under certain conditions on A and y [8]–[10]. In our formulation, y without additive Gaussian noise would have an exact sparse approximation with k terms.

¹The term seems to have originated in [4] and may apply to A or to the columns of A as a set.

More recently, the concept of "sensing" sparse or compressible x through multiplication by a suitable random matrix A has been termed compressed sensing [11]–[13]. This has popularized the study of sparse approximation with respect to random dictionaries, which was considered also in [14], [15].

The principal results in compressed sensing bound the error of a reconstruction computed from y relative to the error of an optimal k-term nonlinear approximation of x with respect to a suitable fixed basis. These results show that for m only moderately larger than k, these errors are similar. For the case where the k-term representation of x is exact, these results thus establish exact recovery of x. For example, if A has independent and identically distributed (i.i.d.) Gaussian entries and x has k nonzero entries, then the scaling (1), which is on the order of k log(n/k), dictates the minimum scaling of m at which basis pursuit (the minimization of ||x̂||₁ subject to y = Ax̂) succeeds at recovering x exactly from y with high probability [16].

Extension of the exact recovery of the sparsity pattern to the noisy case is our object of study. One key previous result in this area, reviewed in more detail in Section II-B, is that the scaling

m > 2k log(n − k) + k + 1    (2)
is necessary and sufficient for the Lasso technique to succeed in recovering the sparsity pattern, under certain assumptions on the signal-to-noise ratio (SNR) [17], [18]. While the scalings (1) and (2) are superficially similar, the presence of noise can greatly increase the number of measurements required. For example, if the sparsity ratio k/n is held fixed, then m = O(k) measurements is sufficient in the noiseless case, but m = Θ(k log(n − k)) measurements are needed in the noisy case.

It should be stressed that this work is concerned only with sparsity pattern detection, not with bounding or estimating the ℓ2 error when estimating x in noise. Recovering the sparsity pattern certainly results in well-controlled ℓ2 error, but it is not necessary for low ℓ2 error. Two important works in this regard are the Dantzig selector of Candès and Tao [19] and the risk minimization estimator of Haupt and Nowak [20]. Mean-squared error and other formulations are beyond the scope of this paper.

B. Preview of Main Results

The condition (2) applies to Lasso, which is a computationally tractable but suboptimal estimation method. A natural question then is: What are the limits on sparsity recovery if computational constraints are removed? We address this in our first main result, Theorem 1 in Section III, which considers maximum-likelihood (ML) estimation. This result is a necessary condition for ML to asymptotically recover the sparsity pattern correctly when A has i.i.d. Gaussian entries. It shows that ML requires a scaling of the number of measurements that differs from the Lasso scaling law (2) by a simple factor that depends only on the SNR and what we call the minimum-to-average ratio (MAR) of the component magnitudes. This expression shows that, at high SNRs, there is a potentially large gap in performance between what is achievable with ML detection and current practical algorithms such as Lasso and orthogonal matching pursuit (OMP). Finding alternative practical algorithms that can close this gap is an open research area.

Previous necessary conditions had been based on information-theoretic capacity arguments in [21], [22] and a use of Fano's inequality in [23]. More recent publications with necessary conditions include [24]–[27]. As described in Section III, our new necessary condition is stronger than the previous results in certain important regimes.

In contrast to removing all computational strictures, it is also interesting to understand what performance can be achieved by algorithms even simpler than Lasso and OMP. To this end, we consider a computationally trivial maximum correlation (MC) algorithm and a closely related thresholding estimator that has been recently studied in [28]. Similar to Lasso and OMP, thresholding may also perform significantly worse than ML at high SNRs. In fact, we provide a precise bound on this performance gap and show that thresholding may require a factor, growing with the SNR, more measurements than ML.

However, at high SNRs, the gap between thresholding and other practical methods such as Lasso and OMP is not as large. In particular, that gap does not grow with SNR. We show that the gap between thresholding on the one hand and Lasso and OMP on the other hand is instead described precisely by the MAR. In particular, the Lasso and OMP algorithms perform significantly better than thresholding when the spread of nonzero component magnitudes is large and the estimator must detect relatively small values. On the other hand, when the spread is bounded, their performances (in terms of number of measurements for success) can be matched within a constant factor by a computationally trivial algorithm.

Table I previews our main results in the context of previous results for Lasso and OMP. The measurement model and the parameters SNR and MAR are defined in Section II. Arbitrarily small constants have been omitted to simplify the table entries.

[TABLE I. Summary of results on measurement scaling for reliable sparsity recovery (see body for definitions and technical limitations).]

C. Organization of the Paper

The setting is formalized in Section II. In particular, we define our concepts of SNR and MAR; our results clarify the roles of these quantities in the sparsity recovery problem. Necessary conditions for success of any algorithm are considered in Section III.
There we present a new necessary condition and compare it to previous results and numerical experiments. Section IV introduces and analyzes a very simple thresholding algorithm. Conclusions are given in Section V, and proofs appear in the Appendices.

II. PROBLEM STATEMENT

Consider estimating a k-sparse vector x ∈ R^n through a vector of observations

y = Ax + d,    (3)

where A ∈ R^{m×n} is a random matrix with i.i.d. Gaussian entries and the vector d ∈ R^m represents additive noise, also with i.i.d. Gaussian components. Denote the sparsity pattern of x (positions of nonzero entries) by the set I_true, which is a k-element subset of the set of indices {1, 2, ..., n}. Estimates of the sparsity pattern will be denoted by Î with subscripts indicating the type of estimator. We seek conditions under which there exists an estimator such that Î = I_true with high probability.

The success or failure of any detection algorithm will depend on the unknown deterministic vector x and the realizations of the measurement matrix A and noise d. Of course, the analysis handles A and d probabilistically. In addition, we would like to reduce the dependence on x to one or two scalar parameters so that our results are easy to interpret. A necessary condition for ML can be given using only the magnitude of the smallest nonzero entry of x.² For other results and to compare algorithms, we need more than only this magnitude. We thus parameterize the dependence on x through two quantities: the SNR, and what we will call the minimum-to-average ratio (MAR).

²The magnitude of the smallest nonzero entry of the sparse vector was first highlighted as a key parameter in sparsity pattern recovery in the work of Wainwright [17], [18], [23].

The SNR is defined by

SNR = E[||Ax||²] / E[||d||²].    (4)

Since we are considering x as an unknown deterministic vector, and the matrix A and vector d have i.i.d. components, the expectation in (4) is easily evaluated in closed form (5).

The MAR of x is defined as

MAR = min_{j ∈ I_true} |x_j|² / (||x||²/k).    (6)

Since ||x||²/k is the average of {|x_j|² : j ∈ I_true}, MAR ∈ (0, 1], with the upper limit occurring when all the nonzero entries of x have the same magnitude. MAR can be interpreted as a reciprocal of dynamic range.

Another quantity of interest is the minimum component SNR, defined as

SNR_min = min_{j ∈ I_true} E[||a_j x_j||²] / E[||d||²],    (7)

where a_j is the j-th column of A. The quantity SNR_min has a natural interpretation: the numerator is the signal energy due to the smallest nonzero component in x, while the denominator is the total noise energy. The ratio SNR_min thus represents the contribution to the SNR from the smallest nonzero component of the unknown vector x. Our choice of variances for the elements of A and d gives these quantities particularly simple closed forms. Also, observe that (5) and (6) show

SNR_min = (MAR · SNR) / k.    (8)

A. Normalizations

Other works use a variety of normalizations: the entries of A are given a different variance in [13], [25]; the entries of A have unit variance and the variance of d is a free parameter in [17], [18], [23], [26], [27]; and our scaling of A with a different noise variance is used in [20]. This necessitates great care in comparing results. We have expressed all our results in terms of SNR, MAR, and SNR_min as defined above. All of these quantities are dimensionless; if either x and d or A and d are scaled together, these ratios will not change. Thus, the results can be applied to any scaling of A, d, and x, provided that the quantities SNR, MAR, and SNR_min are computed via their ratio definitions.

To aid some readers in interpreting the results and comparing to other results in the literature, some expressions are also given in equivalent forms using the smallest nonzero component magnitude and the noise variance directly. It should be noted, however, that any condition stated in those terms has an implicit dependence on the normalizations of A and d.
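As a quick illustration of these definitions, the sketch below computes MAR and SNR_min for a given sparse vector. It relies only on the dimensionless definition (6) and the identity (8), so no particular normalization of A or d is assumed; the total SNR is passed in rather than derived from a specific variance choice.

```python
import numpy as np

def sparsity_metrics(x, snr):
    """Compute MAR and SNR_min for a deterministic sparse vector x, given the
    total SNR defined in (4).  MAR follows (6); SNR_min then follows from the
    identity SNR_min = MAR * SNR / k in (8)."""
    support = np.flatnonzero(x)
    k = support.size
    avg_energy = np.sum(x[support] ** 2) / k          # ||x||^2 / k
    mar = np.min(x[support] ** 2) / avg_energy        # (6): lies in (0, 1]
    snr_min = mar * snr / k                           # (8)
    return mar, snr_min

# Example: k-1 unit-magnitude entries and one half-magnitude entry.
x = np.zeros(50)
x[[3, 7, 19, 42]] = [1.0, 1.0, 1.0, 0.5]
print(sparsity_metrics(x, snr=10.0))
```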
B. Review of Lasso and OMP Performance

As discussed above, one common approach for detecting the sparsity pattern of x is the so-called Lasso method of [7], also known as basis pursuit denoising [6]. The Lasso method first finds an estimate x̂ of x via the optimization

x̂ = argmin_x ||y − Ax||₂² + μ||x||₁,    (9)

where μ > 0 is an algorithm parameter. The Lasso estimate is essentially a least-squares minimization with an additional regularization term μ||x||₁, which encourages the solution to be sparse. The sparsity pattern of x̂ can then be used as an estimate of the sparsity pattern of x. We will denote this estimate by Î_lasso.

Necessary and sufficient conditions for sparsity recovery via Lasso were determined by Wainwright [17], [18]. He showed that

m > 2k log(n − k) + k + 1    (10)

is necessary for Lasso to asymptotically detect the correct sparsity pattern for any μ at any SNR level. Conversely, if the minimum component SNR scales as in (11), then the condition (10) is also sufficient; i.e., there exists a sequence of regularization levels μ = μ(n) that guarantees asymptotically reliable recovery.³ Using (8), the condition (11) can be restated in terms of MAR and SNR: for the measurement scaling given precisely by (10), we need that MAR · SNR → ∞. The condition (10) is included in Table I.

³Sufficient conditions under weaker conditions on the minimum component SNR are given in [18]. Interpreting these is more subtle: the scaling of the minimum component SNR with n determines the sequences of regularization parameters μ(n) for which asymptotic almost-sure success is achieved, and μ(n) affects the sufficient number of measurements.

Another common approach to sparsity pattern detection is the greedy OMP algorithm [30]–[32]. This was analyzed by Tropp and Gilbert [33] in a setting with no noise. More recently, Fletcher and Rangan [29] improved this analysis, lowering the number of measurements sufficient for recovery while also allowing noise satisfying (11) and bounded uncertainty in a priori knowledge of k. They show that, when A has Gaussian entries, a sufficient condition for asymptotic reliable recovery is the scaling in (12), similar to the condition (10) for Lasso. This condition is also supported by numerical experiments reported in [33]. The sufficient condition (12) appears in Table I.
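The following sketch shows how the two estimators reviewed above can be run on a given (A, y). It uses scikit-learn for the Lasso step (whose objective rescales the data-fidelity term, so its alpha plays the role of μ in (9) only up to a constant) and a plain hand-rolled OMP loop. Reporting the k largest-magnitude Lasso coefficients rather than the raw support of x̂ is a convenience for when k is known; both estimators here are illustrative, not the exact procedures analyzed in [17], [18], [29], [33].

```python
import numpy as np
from sklearn.linear_model import Lasso

def lasso_support(A, y, k, alpha=0.05):
    """Sparsity-pattern estimate from the Lasso solution (9).

    Note: sklearn minimizes (1/(2m)) ||y - Ax||^2 + alpha ||x||_1, so alpha
    corresponds to mu in (9) only after rescaling by the number of rows."""
    fit = Lasso(alpha=alpha, fit_intercept=False, max_iter=10000).fit(A, y)
    # With k known, report the k largest-magnitude coefficients; one could
    # equally report the support of fit.coef_ directly.
    return set(np.argsort(np.abs(fit.coef_))[-k:].tolist())

def omp_support(A, y, k):
    """Greedy OMP: repeatedly pick the column most correlated with the
    current residual, then re-fit by least squares on the selected columns."""
    residual = y.copy()
    support = []
    for _ in range(k):
        j = int(np.argmax(np.abs(A.T @ residual)))
        support.append(j)
        coef, *_ = np.linalg.lstsq(A[:, support], y, rcond=None)
        residual = y - A[:, support] @ coef
    return set(support)
```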
III. NECESSARY CONDITION FOR SPARSITY RECOVERY

We first consider sparsity recovery without being concerned with the computational complexity of the estimation algorithm. Since our formulation is non-Bayesian, we consider the ML estimate, which is optimal when there is no prior on x other than it being k-sparse.

The vector x is k-sparse, so the vector Ax belongs to one of the (n choose k) subspaces spanned by k of the columns of A. Estimation of the sparsity pattern is the selection of one of these subspaces, and since the noise d is Gaussian, the ML estimate minimizes the Euclidean norm of the residual between y and its projection to the subspace. More simply, the ML estimate chooses the subspace closest to y.

Mathematically, the ML estimator can be described as follows. Given a subset J ⊆ {1, 2, ..., n}, let P_J y denote the orthogonal projection of the vector y onto the subspace spanned by the vectors {a_j : j ∈ J}. The ML estimate of the sparsity pattern is

Î_ML = argmax_{J : |J| = k} ||P_J y||²,

where |J| denotes the cardinality of J. That is, the ML estimate is the set of k indices such that the subspace spanned by the corresponding columns of A contains the maximum signal energy of y.
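A direct, brute-force rendering of this estimator is shown below. It is a sketch for small problem sizes only, since it enumerates all (n choose k) subsets; the least-squares projection it uses is simply one convenient way to evaluate ||P_J y||².

```python
import numpy as np
from itertools import combinations

def ml_support(A, y, k):
    """Exhaustive ML sparsity-pattern estimate: among all k-element column
    subsets J, pick the one whose span captures the most energy of y,
    i.e. maximizes ||P_J y||^2 (equivalently, minimizes the residual)."""
    n = A.shape[1]
    best, best_energy = None, -np.inf
    for J in combinations(range(n), k):
        AJ = A[:, J]
        coef, *_ = np.linalg.lstsq(AJ, y, rcond=None)   # projection onto span(AJ)
        energy = float(np.linalg.norm(AJ @ coef) ** 2)  # ||P_J y||^2
        if energy > best_energy:
            best, best_energy = set(J), energy
    return best
```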
ML estimation for sparsity recovery was first examined by Wainwright [23]. He showed in [23, Theorem 1] that there exists a constant C > 0 such that the condition (13a)–(13c), which requires m to exceed C times the maximum of two terms, is sufficient for ML to asymptotically reliably recover the sparsity pattern. Note that the equality in (13c) is a consequence of (8). Our first theorem provides a corresponding necessary condition.

Theorem 1: Let k = k(n), m = m(n), SNR = SNR(n), MAR = MAR(n), and SNR_min = SNR_min(n) be deterministic sequences in n such that k → ∞ and the conditions (14a)–(14c) hold for some δ > 0. Then even the ML estimator asymptotically cannot detect the sparsity pattern, i.e.,

lim_{n→∞} Pr(Î_ML = I_true) = 0.

Proof: See Appendix B.

The theorem provides a simple lower bound on the minimum number of measurements required to recover the sparsity pattern in terms of n, k, and the minimum component SNR, SNR_min. Note again that the equality in (14c) is due to (8).

Remarks:
1) The theorem strengthens an earlier necessary condition in [24], which showed that there exists a constant C such that a bound of the same general form is necessary for asymptotic reliable recovery. Theorem 1 strengthens the result to reflect the dependence on MAR and makes the constant explicit.
2) When the first of the two terms in the maximum in the sufficient condition (13) dominates, the sufficient condition matches the necessary condition (14) within a constant factor. The fact that the two conditions match is not surprising since the proofs of the two use similar methods: the necessary condition is proven by considering the tail probability of all error events with a single incorrect vector, while the sufficient condition is proven by bounding the sum probability of all error events. However, the first term in (13c) is not always dominant; for some scalings of k, n, and the SNR, the second term may be larger. In this case, there is a gap between the necessary and sufficient conditions. The exact scalings for reliable sparsity recovery with ML detection in these regimes are not known.
3) The bound (14) strengthens earlier results in the regime where MAR · SNR is bounded and the sparsity ratio k/n is fixed.⁴ To see this point, we can compare (14) against previous information-theoretic bounds in [21]–[23], [26], [27]. As one example, consider the bound in [21], which uses a simple capacity argument to show that the condition (15) is necessary for sparsity pattern detection. When k/n and the SNR are both fixed, m can satisfy (15) while growing only linearly with k. The other capacity-based bounds have the same property. In contrast, Theorem 1 shows that with the sparsity ratio fixed and the SNR bounded, m = Ω(k log(n − k)) is necessary for reliable sparsity recovery. That is, the number of measurements must grow superlinearly in k in the linear sparsity regime with bounded SNR.

⁴We will sometimes use linear sparsity to mean that k/n is fixed.

4) In the regime of sublinear sparsity (where k/n → 0) or in the regime where MAR · SNR grows without bound, information-theoretic bounds such as (15) may be stronger than (14), depending on the precise scaling of k, n, and other terms.
5) For a set of sparse signals specified by a given smallest component magnitude, sparsity pattern recovery is most difficult when the magnitudes of all nonzero entries equal that smallest magnitude. (This makes the total SNR as small as possible for the given minimum.) Thus, minimax rate conditions for this set are determined by analysis of the case with all nonzero entries of x equal to the smallest magnitude. In [23, Theorem 2] Wainwright gives a necessary condition for asymptotic reliable sparsity pattern recovery of such signals.⁵ When the relevant quantities are of the same order, Wainwright's earlier necessary condition is within a constant factor of that of Theorem 1. While Wainwright correctly makes the point that the smallest magnitude is the most important quantity for sparsity pattern recovery, one should be careful to understand that all the nonzero entries of x affect detection performance; the remaining nonzero entries disappear from Wainwright's analysis because he gives a minimax result.

⁵While [23, Theorem 2] is stated without specifying a leading constant, the constant can be extracted from its proof.

6) Results more similar to Theorem 1, based on direct analyses of error events rather than information-theoretic arguments, appeared in [24], [25]. The previous results showed that with fixed SNR as defined here, sparsity recovery with m growing only linearly in k must fail. The more refined analysis in this paper gives the additional logarithmic factor and the precise dependence on MAR and SNR.
7) Theorem 1 is not contradicted by the relevant sufficient condition of [26], [27]. That sufficient condition holds for a scaling that gives linear sparsity together with a particular growth of MAR · SNR. For such scalings, the number of measurements below which Theorem 1 shows that ML decoding must fail is smaller than the number of measurements with which [27, Theorem 3.1] shows that a typicality-based decoder will succeed.
8) Our definition of the ML estimator requires that the number of nonzero components k must be known a priori. Of course, in many practical applications, k may not be known. If k must be estimated, we would expect the number of measurements to be even higher, so the lower bound (14) should apply in this case as well.
9) The condition (14) can be rearranged to be a necessary condition on a parameter other than m. For example, rearranging to obtain a necessary condition on the SNR gives an improvement over [34, Corollary 4.1] by a constant factor in the restricted scenario considered therein.
10) After the posting of [35], Theorem 1 was generalized to i.i.d. non-Gaussian distributions for the entries of A in [36]. An additional contribution of [36] is the study of a specific distribution for A obtained by multiplying i.i.d. Gaussian entries by i.i.d. Bernoulli variables.

Comparison to Lasso and OMP: Comparing the necessary condition for ML in (14c) against the sufficient conditions (10) and (12) for Lasso and OMP, we see that ML may require dramatically fewer measurements than Lasso or OMP; specifically, a factor on the order of MAR · SNR fewer. This gap grows as the SNR increases. Of course, since the ML scaling (14) is only necessary, the actual performance gap between ML and Lasso may be smaller. However, we do know that in the noiseless case, for any fixed k-sparse x, it is sufficient to have m = k + 1 for ML to succeed with probability one over the choice of A. This is a scaling at which Lasso fails, even in the noiseless case [17], [18]. It is an open question to characterize the precise gap and to determine if there are practical algorithms that close it.

Numerical Validation: The theorem is asymptotic, so it cannot be confirmed computationally. Even qualitative support is hard to obtain because of the high complexity of ML detection. Nevertheless, we present some evidence obtained through Monte Carlo simulation.

Fig. 1 shows the probability of success of ML detection as m, k, SNR, and MAR are varied, with each point representing at least 500 independent trials. Each subpanel gives simulation results over a range of k and m for one (SNR, MAR) pair. Signals with MAR < 1 are created by having one small nonzero component and k − 1 equal, larger nonzero components. Overlaid on the intensity plots is a white curve representing (14).⁶

⁶Color versions of Figs. 1, 2, and 4 are available online in [35].

[Fig. 1. Simulated success probability of ML detection for a small signal dimension and many values of m, k, SNR, and MAR. Each subgraph gives simulation results over a range of k and m for one (SNR, MAR) pair, given in the subgraph heading; each point represents at least 500 independent trials. Overlaid on the intensity plots is a white curve representing (14).]

Taking any one column of one subpanel from bottom to top shows that as m is increased, there is a transition from ML failing to ML succeeding. One can see that (14) follows the failure–success transition qualitatively. In particular, the empirical dependence on MAR and SNR approximately follows (14c). Empirically, for the small signal dimension simulated, it seems that with MAR · SNR held fixed, sparsity recovery becomes easier as SNR increases (and MAR decreases).

Less extensive Monte Carlo simulations for a larger signal dimension are reported in Fig. 2. The results are qualitatively similar. As might be expected, the transition from low to high probability of successful recovery as a function of m appears sharper at the larger dimension.

[Fig. 2. Simulated success probability of ML detection for a larger signal dimension, two values of MAR, and many values of m and k, with each point representing at least 1000 independent trials. Overlaid on the intensity plots (with scale as in Fig. 1) is a white curve representing (14).]
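A compact version of this experiment, for one (n, k, m, SNR, MAR) cell of such a plot, might look as follows. It reuses make_instance and ml_support, the illustrative helpers defined in the earlier sketches (not routines from the paper), and is practical only for the small dimensions at which exhaustive ML search is feasible.

```python
def ml_success_probability(n, k, m, snr, mar, trials=500):
    """Monte Carlo estimate of P(I_ML = I_true) for one parameter setting,
    mirroring the experiment described above: signals with MAR < 1 have one
    small and k-1 equal larger nonzero components."""
    hits = 0
    for t in range(trials):
        A, x, y, true_support = make_instance(n, k, m, snr=snr, mar=mar, seed=t)
        hits += (ml_support(A, y, k) == true_support)
    return hits / trials

# Example cell of a success-probability plot (small n only).
print(ml_success_probability(n=20, k=3, m=12, snr=10.0, mar=0.5, trials=200))
```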
IV. SUFFICIENT CONDITION WITH THRESHOLDING

The ML estimator analyzed in the previous section becomes computationally infeasible quickly as problem size increases. We now consider a sparsity recovery algorithm even simpler than Lasso and OMP. It is not meant as a competitive alternative. Rather, it serves to illustrate the precise benefits of Lasso and OMP.

As before, let a_j be the j-th column of the random matrix A. Define the thresholding estimate as

Î_thresh = { j : ρ(j) > μ },    (16)

where ρ(j) is the correlation energy between a_j and y defined in (17), and μ > 0 is a threshold level. This algorithm simply correlates the observed signal y with all the frame vectors a_j and selects the indices j where the correlation energy exceeds a certain level μ.

A closely related algorithm is to compute the maximum correlation (MC) estimate

Î_MC = { the k indices j with the largest values of ρ(j) }.    (18)

This has slightly higher complexity because it requires the sorting of the correlations ρ(j), but it also has better performance in principle. It is straightforward to show that Î_MC = I_true if and only if there exists a threshold μ such that Î_thresh = I_true. Using MC instead of thresholding requires a priori knowledge of k but eliminates the need to pick a "correct" threshold value. Our sufficient condition is for the weaker thresholding estimate and thus applies also to the MC estimate.
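Both estimators reduce to a few lines of array arithmetic, as the sketch below shows. The column normalization inside correlations() is an assumption standing in for the exact definition in (17), which is not reproduced here.

```python
import numpy as np

def correlations(A, y):
    # rho(j): squared correlation between column a_j and y, computed here with
    # unit-normalized columns as one natural choice of normalization.
    cols = A / np.linalg.norm(A, axis=0)
    return (cols.T @ y) ** 2

def thresholding_support(A, y, mu):
    """Thresholding estimate (16): keep every index whose correlation energy
    exceeds the threshold level mu."""
    return set(np.flatnonzero(correlations(A, y) > mu).tolist())

def mc_support(A, y, k):
    """Maximum correlation (MC) estimate (18): the k indices with the largest
    correlation energies; needs k but avoids choosing a threshold."""
    rho = correlations(A, y)
    return set(np.argsort(rho)[-k:].tolist())
```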
A variant of these algorithms, called the "one-step greedy algorithm (OSGA)," was proposed by Duarte et al. [37] for detecting jointly sparse signals. For our application, the most closely related previous study was by Rauhut et al. [28]. They proved a sufficient condition for asymptotic reliability of the thresholding estimate when there is no noise. The following theorem tightens the previous result in the noiseless case and generalizes to the noisy case.

Theorem 2: Let k = k(n), m = m(n), SNR = SNR(n), MAR = MAR(n), and SNR_min = SNR_min(n) be deterministic sequences in n such that k → ∞ and m satisfies the condition (19) for some δ > 0, where the factor appearing in (19) is defined in (20). Then there exists a sequence of threshold levels μ = μ(n) such that thresholding asymptotically detects the sparsity pattern, i.e.,

lim_{n→∞} Pr(Î_thresh = I_true) = 1.

Proof: See Appendix C.

Remarks:
1) The factor defined in (20) can be bounded to express (19) in a form more easily comparable to (14c). When k ≤ n − k, the bound yields the more restrictive condition (21), which is sufficient for asymptotic reliability of thresholding; this expression is shown in Table I, where the infinitesimal quantity δ has been omitted for simplicity. From this bounding we can draw further conclusions: (a) the factor in (20) never exceeds its bound; (b) when k grows slowly relative to n, (21) is asymptotically equivalent to (19); and (c) when the sparsity ratio k/n is constant, the simpler form (21) is pessimistic. Fig. 3 plots the ratio of the two expressions for a few sparsity scalings k(n).

[Fig. 3. Ratio of the bounds (21) and (19) as a function of the problem size for a few sparsity regimes k(n).]

2) Comparing (14c) and (19), we see that thresholding requires at most a factor, growing with the SNR, more measurements than ML estimation; the factor is largest at high SNR and smallest in the low-SNR regime. The factor has a natural interpretation: the lower bound for ML estimation in (14c) is proven by essentially assuming that, when detecting each component of the unknown vector x, the other k − 1 components are known. Thus, the detection only sees interference from the additive noise d. In contrast, thresholding treats the other k − 1 vectors as noise, resulting in a total increase in effective noise by a factor of 1 + SNR; a short calculation following these remarks makes this factor explicit. To compensate for this increase in effective noise, the number of measurements m must be scaled proportionally. We can think of this additional noise as self-noise, by which we mean the interference caused by different components of the signal interfering with one another in the observed signal y through the measurement matrix A. This self-noise is distinct from the additive noise d.
3) The gap between thresholding and ML can be large at high SNRs. As one extreme, consider the case where MAR · SNR → ∞. For ML estimation, the lower bound on the number of measurements in (14c) decreases to roughly k.⁷ In contrast, with thresholding, increasing the SNR has diminishing returns: as SNR → ∞, the bound on the number of measurements in (19) approaches the limit (22), where the approximation holds for linear sparsity. Thus, even with SNR → ∞, the minimum number of measurements is not improved from the scaling m = Θ(k log(n − k)). By the discussion in Remark 2, we can think of this problem as a self-noise limit: as the additive noise is reduced, thresholding becomes limited by signal-dependent noise. This self-noise limit is also exhibited by more sophisticated methods such as Lasso. For example, as discussed earlier, the analysis of [17] shows that when MAR · SNR → ∞, Lasso requires the scaling (23) for reliable recovery. Therefore, like thresholding, Lasso does not achieve a scaling better than m = Θ(k log(n − k)), even at infinite SNR.

⁷Of course, at least k + 1 measurements are necessary.

4) There is an important advantage of Lasso over thresholding. Comparing (22) and (23), we see that, at high SNR, thresholding requires a factor of up to the order of 1/MAR more measurements than Lasso. This factor is largest when MAR is small, which occurs when there are relatively small nonzero components. This gap reveals the key benefit of Lasso over thresholding: its ability to detect small coefficients, even when they are much below the average energy. At high SNRs, the gap can be arbitrarily large. As an extreme example, suppose the unknown vector x has k − 1 components of equal magnitude and one component that is much smaller. Then MAR is approximately the ratio of the smallest component energy to the average component energy, which can be made arbitrarily small (the approximation is valid for large k). Now if m satisfies (10), then it can be verified that (11) is also satisfied; therefore, the scaling (10) will be sufficient for Lasso. In comparison, thresholding could need as much as a factor 1/MAR more measurements, which grows without bound as the smallest component shrinks. So, at high SNRs, Lasso can significantly outperform thresholding because we require the estimator to recover all the nonzero components, even the very small ones.
5) The high-SNR limit (22) matches the sufficient condition in [28] for the noise-free case, except that the constant in (22) is tighter.
6) We have emphasized dimensionless quantities so that the normalizations of A and d are immaterial. An equivalent to (19) that depends on the normalizations defined herein can also be written directly in terms of the smallest component magnitude and the noise variance.
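The factor 1 + SNR mentioned in Remark 2 can be obtained from a back-of-the-envelope calculation under the model of Section II; the display below is a sketch of that reasoning, not a statement taken from the paper's proofs. When thresholding tests an index j, everything in y except a_j x_j acts as noise, and since the columns of A and the noise d are independent and zero mean,

```latex
\[
  E\Big\| \sum_{i \neq j} a_i x_i + d \Big\|^2
    \;=\; E\|Ax\|^2 - E\|a_j x_j\|^2 + E\|d\|^2
    \;\le\; \big(1 + \mathrm{SNR}\big)\, E\|d\|^2 ,
\]
```

using SNR = E||Ax||² / E||d||² from (4). The ML lower bound, by contrast, is derived as if the other k − 1 components were known, so it faces only the additive noise energy E||d||²; the ratio of the two effective noise levels is therefore at most 1 + SNR.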
Threshold Selection: Typically, the threshold level μ in (16) is set by trading off the false alarm and missed detection probabilities. Optimal selection depends on a variety of factors including the noise, component magnitudes, and the statistics on the number of nonzero components. The proof of Theorem 2 (see Appendix C) sets the threshold level to a value that depends on k and δ. Thus, the threshold selection explicitly requires knowledge of k.

If k is not known, but has a known statistical distribution, one can use the threshold obtained by replacing k with its expected value. A straightforward modification to the proof of Theorem 2 shows that if the distribution of k concentrates around its expected value in the sense required by the modified proof, then this threshold will work as well. Of course, if k is actually known a priori, one can do slightly better than thresholding with the MC algorithm, obviating the need for threshold selection.

Numerical Validation: Thresholding is extremely simple and can thus be simulated easily for large problem sizes. Fig. 4 reports the results of a large number of Monte Carlo simulations of the thresholding method at a signal dimension much larger than is feasible for ML. The sufficient condition predicted by (19) matches well the parameter combinations at which the empirical probability of success begins to drop. To avoid the issue of threshold selection, we have used the maximum correlation estimator (18) instead of (16).

[Fig. 4. Simulated success probability of thresholding detection for a large signal dimension and many values of m, k, SNR, and MAR. Each subgraph gives simulation results over a range of k and m for one (SNR, MAR) pair, given in the subgraph heading, with each point representing 1000 independent trials. Overlaid on the intensity plots (with scale as in Fig. 1) is a black curve representing (19).]

V. CONCLUSION

We have considered the problem of detecting the sparsity pattern of a sparse vector from noisy random linear measurements. Necessary and sufficient scaling laws for the number of measurements to recover the sparsity pattern for different detection algorithms were derived. The analysis reveals the effect of two key factors: the total SNR, as well as the MAR, which is a measure of the spread of component magnitudes. The product of these factors is k times the SNR contribution from the smallest nonzero component; this product often appears in the results.

Our main conclusions are as follows.
• Necessary and sufficient scaling laws. As a necessary condition for sparsity pattern detection, we have proven a lower bound on the minimum number of measurements for ML estimation to work. We also derived a sufficient condition for a trivially simple thresholding estimator. With fixed SNR and MAR, both the necessary and sufficient scaling laws have the form m = Θ(k log(n − k)). However, the effect of the SNR and MAR can be dramatic and is what primarily differentiates the performance between different algorithms.
• Self-noise limits at high SNR. Thresholding may require a factor, growing with the SNR, more measurements than ML estimation, which is significant at high SNRs.
The factor has an interpretation as a self-noise effect. As a result there is a self-noise limit: as SNR → ∞, thresholding achieves a fundamentally worse scaling than ML. Specifically, ML may in principle be able to detect the sparsity pattern with m = k + 1 measurements. In contrast, due to the self-noise effect, thresholding requires at least m = Θ(k log(n − k)). Unfortunately, the more sophisticated Lasso and OMP methods also require m = Θ(k log(n − k)) scaling.
• Lasso, OMP, and dependence on MAR. Lasso and OMP, however, have an important advantage over thresholding at high SNRs, which is their ability to deal with a large range in component magnitudes. Specifically, thresholding may require up to a factor on the order of 1/MAR more measurements than Lasso. Thus, when there are nonzero components that are much below the average component energy, thresholding will perform significantly worse. However, when the components of the unknown vector have equal magnitudes (MAR = 1), Lasso and thresholding have asymptotic scaling within a constant factor.

While the results provide insight into the limits of detection, there are clearly many open issues. Most importantly, the well-known "practical" algorithms (Lasso, OMP, and thresholding) all appear to have a scaling of at least m = Ω(k log(n − k)) measurements as SNR → ∞. In contrast, ML may be able to achieve a scaling of m = O(k) with sufficient SNR. An open question is whether there is any practical algorithm that can achieve a similar scaling at high SNR.

A second open issue is to determine conditions for partial sparsity recovery. The above results define conditions for recovering all the positions in the sparsity pattern. However, in many practical applications, obtaining some large fraction of these positions would be adequate. Neither the limits of partial sparsity recovery nor the performance of practical algorithms are completely understood, though some results have been reported in [25]–[27], [38].
APPENDIX A
DETERMINISTIC NECESSARY CONDITION

The proof of Theorem 1 is based on the following deterministic necessary condition for sparsity recovery. Recall the notation that for any J ⊆ {1, 2, ..., n}, P_J denotes the orthogonal projection onto the span of the vectors {a_j : j ∈ J}. Additionally, let P_J^⊥ denote the orthogonal projection onto the orthogonal complement of that span.

Lemma 1: A necessary condition for ML detection to succeed (i.e., Î_ML = I_true) is the inequality (24), where the quantities appearing in (24) are built from the projections defined above.

Proof: Note that P_J y and P_J^⊥ y give an orthogonal decomposition of y into the portions inside and outside the subspace spanned by the columns indexed by J. An approximation of y in that subspace leaves residual P_J^⊥ y. Intuitively, the condition (24) requires that the residual be at least as highly correlated with the remaining "correct" vector as it is with any of the "incorrect" vectors.

Fix any i ∉ I_true and j ∈ I_true, and let J be the index set obtained from I_true by replacing j with i. That is, J is equal to the true sparsity pattern I_true, except that a single "correct" index j has been replaced by an "incorrect" index i. If the ML estimator is to select I_true, then the energy of the noisy vector y must be larger on the true subspace than on the incorrect subspace; therefore (25) must hold. Now, a simple application of the matrix inversion lemma yields (26a), and a similar argument yields (26b). Substituting (26a)–(26b) into (25) and canceling shows (24).

APPENDIX B
PROOF OF THEOREM 1

To simplify notation, assume without loss of generality that I_true = {1, 2, ..., k}. Also, assume that the minimization defining the minimum component SNR is achieved at the last index of I_true, with the corresponding value as in (27). Finally, since adding measurements (i.e., increasing m) can only improve the chances that ML detection will work, we can assume that, in addition to satisfying (14c), the numbers of measurements satisfy the lower bound (28); this assumption implies (29). Here and in the remainder of the proof, the limits are as n → ∞ with m = m(n) and k = k(n) for all n, subject to (14c) and (28). With these requirements on m, we need to show that Pr(Î_ML = I_true) → 0.

From Lemma 1 in Appendix A, Î_ML = I_true implies (24). Thus the probability of success is bounded by the probability of the event in (24), which leads to (30).