Sparse Methods for Direction-of-Arrival Estimation ZaiYang∗†, JianLi‡, PetreStoica§, andLihuaXie† October3,2016 6 1 0 Contents 2 p e 1 Introduction 3 S 0 2 DataModel 4 3 2.1 DataModel . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 4 2.2 TheRoleofArrayGeometry . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 4 ] T 2.3 ParameterIdentifiability . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 6 I . s 3 SparseRepresentationandDOAestimation 7 c [ 3.1 SparseRepresentationandCompressedSensing . . . . . . . . . . . . . . . . . . . . . . . . 7 3.1.1 ProblemFormulation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 7 1 v 3.1.2 ConvexRelaxation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 8 6 3.1.3 (cid:96) Optimization . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 10 q 9 3.1.4 MaximumLikelihoodEstimation(MLE) . . . . . . . . . . . . . . . . . . . . . . . 11 5 9 3.2 SparseRepresentationandDOAEstimation: theLinkandtheGap . . . . . . . . . . . . . . 12 0 . 9 4 On-GridSparseMethods 13 0 4.1 DataModel . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 13 6 4.2 (cid:96) Optimization . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 14 1 2,0 : 4.3 ConvexRelaxation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 14 v i 4.3.1 (cid:96)2,1 Optimization . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 14 X 4.3.2 DimensionalityReductionvia(cid:96) -SVD . . . . . . . . . . . . . . . . . . . . . . . . 15 2,1 r a 4.3.3 AnotherDimensionalityReductionTechnique . . . . . . . . . . . . . . . . . . . . . 16 4.4 (cid:96) Optimization . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 17 2,q 4.5 SparseIterativeCovariance-basedEstimation(SPICE) . . . . . . . . . . . . . . . . . . . . 17 4.5.1 GeneralizedLeastSquares . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 17 4.5.2 SPICE . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 18 4.6 MaximumLikelihoodEstimation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 21 4.7 RemarksonGridSelection . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 21 ∗SchoolofAutomation,NanjingUniversityofScienceandTechnology,Nanjing210094,China †SchoolofElectricalandElectronicEngineering,NanyangTechnologicalUniversity,Singapore639798 ‡DepartmentofElectricalandComputerEngineering,UniversityofFlorida,Gainesville,FL32611,USA §DepartmentofInformationTechnology,UppsalaUniversity,Uppsala,SE75105,Sweden 1 5 Off-GridSparseMethods 22 5.1 FixedGrid . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 22 5.1.1 DataModel . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 22 5.1.2 (cid:96) Optimization . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 23 1 5.1.3 SparseBayesianLearning . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 24 5.2 DynamicGrid . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 24 5.2.1 DataModel . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 24 5.2.2 Algorithms . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 24 6 GridlessSparseMethods 25 6.1 DataModel . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 26 6.2 VandermondeDecompositionofToeplitzCovarianceMatrices . . . . . . . . . . . . . . . . 26 6.3 TheSingleSnapshotCase . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 29 6.3.1 AGeneralFrameworkforDeterministicMethods . . . . . . . . . . . . . . . . . . . 29 6.3.2 Atomic(cid:96) Norm . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 29 0 6.3.3 AtomicNorm . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 30 6.3.4 Hankel-basedNuclearNorm . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 33 6.3.5 ConnectionbetweenANMandEMaC . . . . . . . . . . . . . . . . . . . . . . . . . 34 6.3.6 CovarianceFittingMethod: GridlessSPICE(GLS) . . . . . . . . . . . . . . . . . . 35 6.3.7 ConnectionbetweenANMandGLS . . . . . . . . . . . . . . . . . . . . . . . . . . 37 6.4 TheMultipleSnapshotCase: CovarianceFittingMethods . . . . . . . . . . . . . . . . . . . 37 6.4.1 GridlessSPICE(GLS) . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 38 6.4.2 SMV-basedAtomicNormMinimization(ANM-SMV) . . . . . . . . . . . . . . . . 40 6.4.3 NuclearNormMinimizationfollowedbyMUSIC(NNM-MUSIC) . . . . . . . . . . 40 6.4.4 ComparisonofGLS,ANM-SMVandNNM-MUSIC . . . . . . . . . . . . . . . . . 41 6.5 TheMultipleSnapshotCase: DeterministicMethods . . . . . . . . . . . . . . . . . . . . . 41 6.5.1 AGeneralFramework . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 41 6.5.2 Atomic(cid:96) Norm . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 42 0 6.5.3 AtomicNorm . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 43 6.5.4 Hankel-basedNuclearNorm . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 45 6.6 ReweightedAtomicNormMinimization . . . . . . . . . . . . . . . . . . . . . . . . . . . . 46 6.6.1 ASmoothSurrogatefor(cid:107)Z(cid:107) . . . . . . . . . . . . . . . . . . . . . . . . . . . . 46 A,0 6.6.2 ALocallyConvergentIterativeAlgorithm . . . . . . . . . . . . . . . . . . . . . . . 47 6.6.3 InterpretationasRAM . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 48 6.7 ConnectionsbetweenANMandGLS . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 49 6.7.1 TheCaseofL < M . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 49 6.7.2 TheCaseofL ≥ M . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 50 6.8 ComputationalIssuesandSolutions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 51 6.8.1 DimensionalityReduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 51 6.8.2 AlternatingDirectionMethodofMultipliers(ADMM) . . . . . . . . . . . . . . . . 52 7 FutureResearchChallenges 53 8 Conclusions 54 2 1 Introduction Directionofarrival(DOA)estimationreferstotheprocessofretrievingthedirectioninformationofseveral electromagneticwaves/sourcesfromtheoutputsofanumberofreceivingantennasthatformasensorarray. DOA estimation is a major problem in array signal processing and has wide applications in radar, sonar, wirelesscommunications,etc. The study of DOA estimation methods has a long history. For example, the conventional (Bartlett) beamformer, which dates back to the World War II, simply uses Fourier-based spectral analysis of the spatially sampled data. Capon’s beamformer was later proposed to improve the estimation performance of closelyspacedsources[1]. Sincethe1970swhenPisarenkofoundthattheDOAscanberetrievedfromdata second order statistics [2], a prominent class of methods designated as subspace-based methods have been developed, e.g., the multiple signal classification (MUSIC) and the estimation of parameters by rotational invarianttechniques(ESPRIT)alongwiththeirvariants[3–7]. Anothercommonapproachisthenonlinear leastsquares(NLS)methodthatisalsoknownasthe(deterministic)maximumlikelihoodestimation. Fora completereviewofthesemethods,readersarereferredto[8,9]. Notethatthesemethodssufferfromcertain well-known limitations. For example, the subspace-based methods and the NLS need a priori knowledge on the source number that may be difficult to obtain. Additionally, Capon’s beamformer, MUSIC and ESPRIT are covariance-based and require a sufficient number of data snapshots to accurately estimate the data covariance matrix. Moreover, they can be sensitive to source correlations that tend to cause a rank deficiencyinthesampledatacovariancematrix. Also,averyaccurateinitializationisrequiredfortheNLS sinceitsobjectivefunctionhasacomplicatedmultimodalshapewithasharpglobalminimum. The purpose of this article is to provide an overview of the recent work on sparse DOA estimation methods. Thesenewmethodsaremotivatedbytechniquesinsparserepresentationandcompressedsensing methodology[10–14],andmostofthemhavebeenproposedduringthelastdecade. Thesparseestimation (or optimization) methods can be applied in several demanding scenarios, including cases with no knowl- edgeofthesourcenumber,limitednumberofsnapshots(evenasinglesnapshot),andhighlyorcompletely correlatedsources. Duetotheseattractivepropertiestheyhavebeenextensivelystudiedandtheirpopularity isreflectedbythelargenumberofpublicationsaboutthem. Itisimportanttonotethatthereisakeydifferencebetweenthecommonsparserepresentationframework andDOAestimation. Tobespecific,thestudiesofsparserepresentationhavebeenfocusedondiscretelinear systems. Incontrasttothis,theDOAparametersarecontinuousvaluedandtheobserveddataarenonlinear intheDOAs. Dependingonthemodeladopted,wecanclassifythesparsemethodsforDOAestimationinto three categories, namely, on-grid, off-grid and gridless, which also corresponds to the chronological order in which they have been developed. For on-grid sparse methods, the data model is obtained by assuming that the true DOAs lie on a set of fixed grid points in order to straightforwardly apply the existing sparse representation techniques. While a grid is still required by off-grid sparse methods, the DOAs are not restricted to be on the grid. Finally, the recent gridless sparse methods do not need a grid, as their name suggests,andtheyoperatedirectlyinthecontinuousdomain. Theorganizationofthisarticleisasfollows. ThedatamodelforDOAestimationisintroducedinSection 2 for far-field, narrowband sources that are the focus of this article. Its dependence on the array geometry and the parameter identifiability problem are discussed. In Section 3 the concepts of sparse representation and compressed sensing are introduced and several sparse estimation techniques are discussed. Moreover, we discuss the feasibility of using sparse representation techniques for DOA estimation and highlight the key differences between sparse representation and DOA estimation. The on-grid sparse methods for DOA estimation are introduced in Section 4. Since they are straightforward to obtain in the case of a single data snapshot, we focus on showing how the temporal redundancy of multiple snapshots can be utilized to improve the DOA estimation performance. Then, the off-grid sparse methods are presented in Section 5. Section 6 is the main highlight of this article in which the recently developed gridless sparse methods are 3 presented. Thesemethodsareofgreatinterestsincetheyoperatedirectlyinthecontinuousdomainandhave strongtheoreticalguarantees. SomefutureresearchdirectionswillbediscussedinSection7andconclusions willbedrawninSection8. Notations used in this article are as follows. R and C denote the sets of real and complex numbers respectively. Boldface lettersare reservedfor vectors andmatrices. |·|denotes theamplitude of ascalar or the cardinality of a set. (cid:107)·(cid:107) , (cid:107)·(cid:107) and (cid:107)·(cid:107) denote the (cid:96) , (cid:96) and Frobenius norms respectively. AT, A∗ 1 2 F 1 2 and AH are the matrix transpose, complex conjugate and conjugate transpose of A respectively. x is the j jthentryofavectorx,andA denotesthejthrowofamatrixA. Unlessotherwisestated,x andA are j Ω Ω the subvector and submatrix of x and A obtained by retaining the entries of x and the rows of A indexed by the set Ω. For a vector x, diag(x) is a diagonal matrix with x on the diagonal. x (cid:23) 0 means x ≥ 0 j for all j. rank(A) denotes the rank of a matrix A and Tr(A) denotes the trace. For positive semidefinite matricesAandB,A ≥ B meansthatA−B ispositivesemidefinite. Finally,E[·]denotestheexpectation of a random variable, and for notational simplicity a random variable and its numerical value will not be distinguished. 2 Data Model 2.1 DataModel In this section, the DOA estimation problem is stated. Consider K narrowband far-field source signals s , k = 1,...,K, impinging on an array of omnidirectional sensors from directions θ , k = 1,...,K. k k Accordingto[8,9],thetimedelaysatdifferentsensorscanberepresentedbysimplephaseshifts,resulting inthefollowingdatamodel: K (cid:88) y(t) = a(θ )s (t)+e(t) = A(θ)s(t)+e(t), t = 1,...,L, (1) k k k=1 where t indexes the snapshot and L is the number of snapshots, y(t) ∈ CM, s(t) ∈ CK and e(t) ∈ CM denote the array output, the vector of source signals and the vector of measurement noise at snapshot t, respectively,whereM isthenumberofsensors. a(θ )istheso-calledsteeringvectorofthekthsourcethat k isdeterminedbythegeometryofthesensorarrayandwillbegivenlater. Thesteeringvectorscomposethe arraymanifoldmatrixA(θ) := [a(θ ),...,a(θ )]. Morecompactly,(1)canbewrittenas 1 K Y = A(θ)S +E, (2) whereY = [y(1),...,y(L)],andS andE aresimilarlydefined. GiventhedatamatrixY andthemapping θ → a(θ),theobjectiveistoestimatetheparametersθ ,k = 1,...,K thatarereferredtoastheDOAs. Itis k worthnotingthatthesourcenumberK isusuallyunknowninpractice;typically,K isassumedtobesmaller thanM,asotherwisetheDOAscannotbeuniquelyidentifiedfromthedata(seedetailsinSubsection2.3). 2.2 TheRoleofArrayGeometry Wenowdiscusshowthemappingθ → a(θ)isdeterminedbythearraygeometry. Wefirstconsiderageneral 2-DarraywiththeM sensorslocatedatpoints(cid:0)r ,θˇ (cid:1),expressedinpolarcoordinates. Forconvenience, m m theunitofdistanceistakenashalfthewavelengthofthewaves. Thena(θ)willbegivenby a (θ) = eiπrmcos(θ−θˇm), m = 1,...,M. (3) m 4 Figure1: ThesetupoftheDOAestimationproblemwithageneral2-Darray. Figure2: ThesetupoftheDOAestimationproblemwithaULA. Intheparticularlyinterestingcaseofalineararray,assumingthatthesensorsarelocatedonthenonneg- ativex-axis,wehavethatθˇ = 0◦,m = 1,...,M. Therefore,a(θ)willbegivenby m a (θ) = eiπrmcosθ, m = 1,...,M. (4) m Wecanreplacethevariableθbyf = 1 cosθanddefinewithoutambiguitya(f) := a(arccos(2f)) = a(θ). 2 Then,themappinga(f)isgivenby a (f) = ei2πrmf, m = 1,...,M. (5) m In the case of a single snapshot, obviously, the spatial DOA estimation problem becomes a temporal fre- quencyestimationproblem(a.k.a. linespectralestimation)giventhesamplesy ,m = 1,...,M measured m attimeinstantsr ,m = 1,...,M. m Ifwefurtherassumethatthesensorsofthelineararrayareequallyspaced,thenthearrayisknownasa uniformlineararray(ULA).Weconsiderthecasewhentwoadjacentantennasofthearrayarespacedbya 5 unitdistance(halfthewavelength). Then,wehavethatr = m−1and m (cid:104) (cid:105)T a(f) = 1,ei2πf,...,ei2π(M−1)f . (6) IfalineararrayisobtainedfromaULAbyretainingonlyapartofthesensors,thenitisknownasasparse lineararray(SLA). It is worth noting that for a 2-D array, it is possible to estimate the DOAs in the entire 360◦ range, while for a linear array we can only resolve the DOAs in a 180◦ range: θ ∈ [0◦,180◦). In the latter k case, correspondingly, the “frequencies” are: f = 1 cosθ ∈ (cid:0)−1, 1(cid:3). Throughout this article, we let D k 2 k 2 2 θ denote the domain of the DOAs that can be [0◦,360◦) or [0◦,180◦), depending on the context. Also, let T = (cid:0)−1, 1(cid:3) be the frequency interval for linear arrays. Finally, we close this section by noting that the 2 2 grid-based(on-gridandoff-grid)sparsemethodscanbeappliedtoarbitrarysensorarrays,whiletheexisting gridlesssparsemethodsaretypicallylimitedtoULAsorSLAs. 2.3 ParameterIdentifiability TheDOAs{θ }K canbeuniquelyidentifiedfromY = A(θ)S iftheredonotexist{θ(cid:48)}K (cid:54)= {θ }K and S(cid:48) such thkatk=Y1 = A(θ)S = A(cid:0)θ(cid:48)(cid:1)S(cid:48). Guaranteeing that the parameters can be ukniqku=e1ly idenktifik=ed1 in the noiseless case is usually a prerequisite for their accurate estimation. The parameter identifiability problem for DOA estimation was studied in [15] for ULAs and in [16,17] for general arrays. The results in[18–20]arealsocloselyrelatedtothisproblem. Forageneralarray,definetheset A := {a(θ) : θ ∈ D }, (7) θ θ anddefinethesparkofA ,denotedbyspark(A ),asthesmallestnumberofelementsinA thatarelinearly θ θ θ dependent[21]. ForanyM-elementarray,itholdsthat spark(A ) ≤ M +1. (8) θ Note that it is generally difficult to compute spark(A ), except in the ULA case in which spark(A ) = θ θ M +1bythefactthatanyM steeringvectorsinA arelinearlyindependent. θ Thepaper[16]showedthatanyK sourcescanbeuniquelyidentifiedfromY = A(θ)S providedthat spark(A )−1+rank(S) θ K < . (9) 2 NotethattheaboveconditioncannotbeeasilyusedinpracticesinceitrequiresknowledgeonS. Toresolve thisproblem,itwasshownin[19]thattheconditionin(9)isequivalentto spark(A )−1+rank(Y) θ K < . (10) 2 Moreover,theconditionin(9)or(10)isalsonecessary[19]. Combiningtheseresults,wehavethefollowing theorem. Theorem2.1. AnyK sourcescanbeuniquelyidentifiedfromY = A(θ)S ifandonlyiftheconditionin (10)holds. Theorem 2.1 provides a necessary and sufficient condition for unique identifiability of the parameters. Inthesinglesnapshotcase,theconditionin(10)reducesto spark(A ) θ K < . (11) 2 6 Therefore,Theorem2.1impliesthatmoresourcescanbedeterminedifmoresnapshotsarecollected,except inthetrivialcasewhenthesourcesignalsareidenticaluptoscalingfactors. IntheULAcase,thecondition in(10)canbesimplifiedas M +rank(Y) K < . (12) 2 Usingtheinequalityrank(S) ≤ K and(8),theconditionin(9)or(10)impliesthat K < spark(A )−1 ≤ M. (13) θ Theorem2.1specifiestheconditionrequiredtoguaranteeuniqueidentifiabilityforanyKsourcesignals. It was shown in [16] that if {θ } are fixed and S is randomly drawn from some absolutely continuous k distribution,thentheK sourcescanbeuniquelyidentifiedwithprobabilityone,providedthat 2rank(S) K < (spark(A )−1). (14) θ 2rank(S)+1 Moreover,thefollowingcondition,whichisslightlydifferentfromthatin(14),isnecessary: 2rank(S) K ≤ (spark(A )−1). (15) θ 2rank(S)+1 Theconditionin(14)isweakerthanthatin(9)or(10). Asanexample,inthesinglesnapshotcase,theupper boundsonK in(10)and(14)areapproximately 1spark(A )and 2spark(A ), respectively. However, the 2 θ 3 θ paper [17] pointed out that the condition in (14) has a relatively limited practical relevance in finite-SNR applications, since under (14), with a strictly positive probability, false DOA estimates far from the true DOAsmayoccur. 3 Sparse Representation and DOA estimation In this section we will introduce the basics of sparse representation that has been an active research topic especially in the past decade. More importantly, we will discuss its connections to and the key differences fromDOAestimation. 3.1 SparseRepresentationandCompressedSensing 3.1.1 ProblemFormulation We first introduce the topic of sparse representation and the closely related concept of compressed sens- ing that have found broad applications in image, audio and signal processing, communications, medical imaging,andcomputationalbiology,tonamejustafew(see,e.g.,thevariousspecialissuesinseveraljour- nals[22–25]). Lety ∈ CM bethesignalthatweobserve. Wewanttosparselyrepresentyviathefollowing model: y = Ax+e, (16) where A ∈ CM×N is a given matrix, with M (cid:28) N, that is referred as a dictionary and whose columns are called atoms, x ∈ CN is a sparse coefficient vector (note that the notation N is reserved for later use), and e accounts for the representation error. By sparsity we mean that only a few entries, say K (cid:28) N, of x are nonzero and the rest are zero. This together with (16) implies that y can be well approximated by a linear combination of K atoms in A. The underlying motivation for the sparse representation is that even though the observed data y lies in a high-dimensional space, it can actually be well approximated in some 7 lower-dimensionalsubspace(K < M). Giveny andA,theproblemofsparserepresentationistofindthe sparsevectorxsubjecttodataconsistency. The concept of sparse representation was later extended within the framework of compressed sensing [12–14]. In compressed sensing, x is the sparse signal of interest that is acquired via the underdetermined linearsystemin(16). GivenM (cid:28) N,theacquiredsignaly isreferredtoasthecompressivedata,Aisthe given sensing matrix, and e denotes the measurement noise. Given y and A, the problem of sparse signal recovery in compressed sensing is also to solve for the sparse vector x subject to data consistency. With no rise for ambiguity we will not distinguish between the terminologies used for sparse representation and compressedsensing,asthesetwoproblemsareverymuchalike. Tosolveforthesparsesignal, intuitively, weshouldfindthesparsestsolution. Intheabsenceofnoise, therefore,weshouldsolvethefollowingoptimizationproblem: min(cid:107)x(cid:107) subjecttoy = Ax, (17) 0 x where (cid:107)x(cid:107) := #{j : x (cid:54)= 0} counts the nonzero entries of x and is referred to as the (cid:96) (pseudo-)norm 0 j 0 orthesparsityofx. ViewAasthesetofitscolumnvectors,anddefineitsspark,denotedbyspark(A)as inSubsection2.3. Itcanbeshownthatthetruesparsesignalxcanbeuniquelydeterminedby(17)ifxhas asparsityof spark(A) K < . (18) 2 To see this, suppose there exists x(cid:48) of sparsity K(cid:48) ≤ K satisfying y = Ax(cid:48) as well. Then it holds that A(x−x(cid:48)) = 0. Sincex−x(cid:48) hasasparsityofatmostK+K(cid:48) ≤ 2K < spark(A),whichholdsfollowing (18), it can be concluded that x − x(cid:48) = 0 and thus x = x(cid:48) since any spark(A) − 1 columns of A are linearlyindependent. Itisinterestingtonotethattheconditionin(18)isverysimilartothatin(11)required toguaranteeidentifiabilityforDOAestimationinthesinglesnapshotcase. Unfortunately, the (cid:96) optimization problem in (17) is NP hard to solve. Therefore, more efficient ap- 0 proachesareneeded. Wenotethatmanymethodsandalgorithmshavebeenproposedforsparsesignalrecov- ery,e.g.,convexrelaxationor(cid:96) optimization[10,11],(cid:96) ,0 < q < 1(pseudo-)normoptimization[26–32], 1 q greedy algorithms such as orthogonal matching pursuit (OMP), compressive sampling matching pursuit (CoSaMP)andsubspacepursuit(SP)[33–38],iterativehardthresholding(IHT)[39],maximumlikelihood estimation (MLE), etc. Readers can consult [40] for a review. Here we introduce convex relaxation, (cid:96) q optimizationandMLEintheensuingsubsections. 3.1.2 ConvexRelaxation Thefirstpracticalapproachtosparsesignalrecoverythatwewillintroduceisbasedontheconvexrelaxation, whichreplacesthe(cid:96) normbyitstightestconvexrelaxation—the(cid:96) norm. Therefore,wesolvethefollowing 0 1 optimizationprobleminlieuof(17): min(cid:107)x(cid:107) , subjecttoy = Ax, (19) 1 which is sometimes referred to as basis pursuit (BP) [10]. Since the (cid:96) norm is convex, (19) can be solved 1 in a polynomial time. In fact, the use of (cid:96) optimization for obtaining a sparse solution dates back to the 1 paper[41]aboutseismicdatarecovery. WhiletheBPwasempiricallyobservedtogivegoodperformance, arigorousanalysishadnotbeenprovidedfordecades. To introduce the existing theoretical guarantees for the BP in (19), we define a metric of the matrix A calledmutualcoherencethatquantifiesthecorrelationsbetweentheatomsinA[11]. 8 Definition3.1. ThemutualcoherenceofamatrixA,µ(A),isthelargestabsolutecorrelationbetweenany twocolumnsofA,i.e., |(cid:104)a , a (cid:105)| i j µ(A) = max , (20) i(cid:54)=j (cid:107)ai(cid:107)2(cid:107)aj(cid:107)2 where(cid:104)·,·(cid:105)denotestheinnerproduct. Intuitively, if two atoms in A are highly correlated, then it will be difficult to distinguish their contri- butions to the measurements y. In the extreme case when two atoms are completely coherent, it will be impossibletodistinguishtheircontributionsandthusimpossibletorecoverthesparsesignalx. Therefore, toguaranteesuccessfulsignalrecovery,themutualcoherenceµ(A)shouldbesmall. Thisistrue,according tothefollowingtheorem. Theorem 3.1 ( [11]). Assume that (cid:107)x(cid:107) ≤ K for the true signal x and µ < 1 . Then, x is the unique 0 2K−1 solutionofthe(cid:96) optimizationandtheBPproblem. 0 Anothertheoreticalguaranteeisbasedontherestrictedisometryproperty(RIP)thatquantifiesthecor- relations of the atoms in A in a different manner and has been popular in the development of compressed sensing. Definition 3.2 ( [42]). The K-restricted isometry constant (RIC) of a matrix A, δ (A), is the smallest K numbersuchthattheinequality (1−δ (A))(cid:107)v(cid:107)2 ≤ (cid:107)Av(cid:107)2 ≤ (1+δ (A))(cid:107)v(cid:107)2 K 2 2 K 2 holdsforallK-sparsevectorsv. AissaidtosatisfytheK-RIPwithconstantδ (A)ifδ (A) < 1. K K Thefollowingtheoreticalguaranteeisprovidedin[43]. √ Theorem3.2([43]). Assumethat(cid:107)x(cid:107) ≤ K forthetruesignalxandδ < 2−1. Thenxistheunique 0 2K solutionofthe(cid:96) optimizationandtheBPproblem. 0 After the work [43], the RIP condition has been improved, e.g., to δ < 3√ [44]. Other types of 2K 4+ 6 RIPconditionsarealsoavailable,e.g.,δ < 0.307in[45]. Itisknownthatstrongerresultscanbeprovided K byusingRIPascomparedtothemutualcoherence. Butitisworthnotingthat,unlikethemutualcoherence that can be easily computed given the matrix A, the complexity of computing the RIC of A may increase dramaticallywiththesparsityK. Inthepresenceofnoisewecansolvethefollowingregularizedoptimizationproblem,usuallyknownas theleastabsoluteshrinkageandselectionoperator(LASSO)[46]: 1 minλ(cid:107)x(cid:107) + (cid:107)Ax−y(cid:107)2, (21) x 1 2 2 whereλ > 0isaregularizationparameter,tobespecified,orthebasispursuitdenoising(BPDN)problem: min(cid:107)x(cid:107) , subjectto (cid:107)Ax−y(cid:107) ≤ η, (22) 1 2 x whereη ≥ (cid:107)e(cid:107) isanupperboundonthenoiseenergy. Notethat(21)and(22)areequivalentforappropriate 2 choices of λ and η, and that both degenerate to BP in the noiseless case by letting η,λ → 0. Under RIP conditionssimilartotheaboveones,ithasbeenshownthatthesparsesignalxcanbestablyreconstructed withthereconstructionerrorbeingproportionaltothenoiselevel[43]. Besides (21) and (22), another (cid:96) optimization method for sparse recovery is the so-called square-root 1 LASSO[47]: minτ (cid:107)x(cid:107) +(cid:107)y−Ax(cid:107) , (23) 1 2 x 9 whereτ > 0isaregularizationparameter. ComparedtotheLASSO,forwhichthenoiseisusuallyassumed to be Gaussian and the regularization parameter λ is chosen proportional to the standard deviation of the noise,SR-LASSOrequiresaweakerassumptiononthenoisedistributionandτ canbechosenasaconstant thatisindependentofthenoiselevel[47]. The(cid:96) optimizationproblemsin(19),(21),(22)and(23)areconvexandareguaranteedtobesolvablein 1 apolynomialtime;however,itisnoteasytoefficientlysolvetheminthecasewhentheproblemdimensionis highsincethe(cid:96) normisnotasmoothfunction. Significantprogresshasbeenmadeoverthepastdecadeto 1 acceleratethecomputation. Examplesinclude(cid:96) -magic[48],interior-pointmethod[49],conjugategradient 1 method [50], fixed-point continuation [51], Nesterov’s smoothing technique with continuation (NESTA) [52,53],ONE-L1algorithms[54],alternatingdirectionmethodofmultipliers(ADMM)[55,56]andsoon. 3.1.3 (cid:96) Optimization q Foravectorx,the(cid:96) ,0 < q < 1(pseudo-)normisdefinedas: q (cid:32) (cid:33)1 (cid:88) q (cid:107)x(cid:107) := |x |q , (24) q n n which is a nonconvex relaxation of the (cid:96) norm. Compared to (17) and (19), in the noiseless case, the (cid:96) 0 q optimizationproblemisgivenby: min(cid:107)x(cid:107)q, subjecttoy = Ax, (25) q x where (cid:107)x(cid:107)q, instead of (cid:107)x(cid:107) , is used for the convenience of algorithm development. Since the (cid:96) norm is q q q a closer approximation to the (cid:96) norm, compared to the (cid:96) norm, it is expected that the (cid:96) optimization in 0 1 q (25) results in better performance than the BP. This is true, according to [29,31]. Indeed, (cid:96) , 0 < q < 1 q optimizationcanexactlydeterminethetruesparsesignalunderweakerRIPconditionsthanthatfortheBP. Note that the results above are applicable to the globally optimal solution to (25), whereas we can only guaranteeconvergencetoalocallyoptimalsolutioninpractice. A well-known algorithm for (cid:96) optimization is the focal underdetermined system solver (FOCUSS) q [26,27]. FOCUSS is an iterative reweighted least squares method. In each iteration, FOCUSS solves the followingweightedleastsquaresproblem: (cid:88) min w |x |2, subjecttoAx = y, (26) n n x n where the weight coefficients w := |x |q−2 are updated using the latest solution x. Note that (26) can be n n solvedinclosedformandhenceaniterativealgorithmcanbeimplementedwithaproperinitialization. This algorithmcanbeinterpretedasamajorization-minimization(MM)algorithmthatisguaranteedtoconverge toalocalminimum. Inthepresenceofnoise,thefollowingregularizedproblemisconsideredinlieuof(21): 1 minλ(cid:107)x(cid:107)q + (cid:107)Ax−y(cid:107)2, (27) x q 2 2 whereλ > 0isaregularizationparameter. AregularizedFOCUSSalgorithmfor(27)wasdevelopedin[28] byusingthesamemainideaasinFOCUSS.Adifficultproblemregarding(27)isthechoiceoftheparameter λ. Although several heuristic methods for tuning this parameter were introduced in [28], to the best of our knowledgetherehavebeennotheoreticalresultsonthisaspect. 10
Description: