ebook img

Cadzow Denoising Upgraded PDF

31 Pages·2017·0.84 MB·English
by  
Save to my drive
Quick download
Download
Most books are stored in the elastic cloud where traffic is expensive. For this reason, we have a limit on daily download.

Preview Cadzow Denoising Upgraded

Cadzow Denoising Upgraded: A New Projection Method for the Recovery of Dirac Pulses from Noisy Linear Measurements Laurent Condat, Akira Hirabayashi To cite this version: Laurent Condat, Akira Hirabayashi. Cadzow Denoising Upgraded: A New Projection Method for the Recovery of Dirac Pulses from Noisy Linear Measurements. 2014. ￿hal-00759253v5￿ HAL Id: hal-00759253 https://hal.archives-ouvertes.fr/hal-00759253v5 Preprint submitted on 18 Mar 2014 (v5), last revised 20 Dec 2014 (v6) HAL is a multi-disciplinary open access L’archive ouverte pluridisciplinaire HAL, est archive for the deposit and dissemination of sci- destinée au dépôt et à la diffusion de documents entific research documents, whether they are pub- scientifiques de niveau recherche, publiés ou non, lished or not. The documents may come from émanant des établissements d’enseignement et de teaching and research institutions in France or recherche français ou étrangers, des laboratoires abroad, or from public or private research centers. publics ou privés. Cadzow Denoising Upgraded: A New Projection Method for the Recovery of Dirac Pulses from Noisy Linear Measurements Laurent Condat GIPSA-lab, CNRS & University of Grenoble St Martin d’H`eres, F-38402,France [email protected] Akira Hirabayashi College of Information Science and Engineering, Ritsumeikan University Shiga, 525-8577,Japan [email protected] March 18, 2014 Abstract WeconsidertherecoveryofafinitestreamofDiracpulsesatnonuniform locations, from noisy lowpass-filtered samples. We show that maximum- likelihood estimation of the unknown parameters amounts to a difficult, evenbelievedNP-hard,matrix problemofstructuredlow rankapproxima- tion. To solve it, we propose a new heuristic iterative algorithm, based on a recently proposed splitting method for convex nonsmooth optimization. Althoughthealgorithmcomes,inabsenceofconvexity,withnoconvergence proof, it converges in practice to a local solution, and even to the global solution of the problem, when the noise level is not too high. Thus, our method improves upon the classical Cadzow denoising method, for same ease of implementation and speed. Key words and phrases : Diracpulses,spiketrain,finite rateofinnovation, super-resolution, sparse recovery, structured low rank approximation, al- ternating projections, Cadzow denoising 2010 AMS Mathematics Subject Classification — 94A20, 94A12, 62M15 1 Introduction and Problem Formulation Reconstruction of signals lying in linear spaces, including bandlimited signals and splines, has long been the dominant paradigm in sampling theory, rooted 2 L. CONDATAND A.HIRABAYASHI in Shannon’s work [97]. Recently, analog reconstruction from discrete samples has been enlarged to a broader class of signals, with so-called finite rate of innovation, i.e. ruled by parsimonious models [12,67,98,101]. This emerging theory predates and parallels the framework of sparse recovery and compressed sensing [89]. In this paper, we focus on the prototypical problem of estimating a finite stream of Dirac pulses, a.k.a. spike train, from uniform, noisy, lowpass-filtered samples [12,36,50,60,95,101]. More precisely, the sought-after unknown signal sˇconsists of K Dirac pulses in the finite interval [0,τ[, where the real τ > 0 and the integer K ≥ 1 are known; that is, K sˇ(t)= aˇ δ(t−tˇ ), ∀t∈ [0,τ[, (1) k k k=1 X where δ(t) is the Dirac mass distribution, {tˇ }K are the unknown distinct k k=1 locations in [0,τ[, and {aˇ }K are the unknown real nonzero amplitudes1. The k k=1 goal is to obtain estimates of these 2K values, which forms a deterministic (non- Bayesian) parametric estimation problem. The available data consists of linear, uniform, noisy measurements {v }N−1 on sˇ, of the form n n=0 τ nτ v = sˇ(t)ϕ −t dt+ε (2) n n N Z0 (cid:16) (cid:17) K nτ = aˇ ϕ −tˇ +ε , ∀n= 0,...,N −1, (3) k k n N Xk=1 (cid:16) (cid:17) where ϕ(t) is the sampling function and the ε ∼ N(0,σ2) are independent n random realizations of Gaussian noise. Note that other noise models could be considered as well, by changing the cost function in eqns. (5), (10), (18) below. ThequestionsofthechoiceofthefunctionϕandofthenumberN ofmeasure- ments allowing perfect reconstruction, in absence of noise, has been addressed in the literature [9,28,36,95]. In a nutshell, the condition N ≥ 2K +1, which we hereafter assume to be true, is necessary and sufficient, provided that ϕ satisfies some constraints in Fourier domain. Additionally, we assume, without loss of generality and only to simplify the notations, that N is odd, of the form N = 2M + 1. Since the emphasis in this paper is on appropriately handling the presence of noise and not on being the most general, we adopt the simplest choice of the Dirichlet sampling function [95], which amounts to periodizing the signal s on the real line before sampling it with the sinc function: M sin(Nπt/τ) 1 ϕ(t) = = ej2πmt/τ, ∀t ∈ R. (4) N sin(πt/τ) N m=−M X 1Thechecksymbolˇisusedtodistinguishthetrueparametervaluesfrom otherquantities, like theirestimates denoted with a tilde˜. Cadzow Denoising Upgraded: A New Projection Method for theRecovery of Dirac Pulses 3 The extension of the setting to the reconstruction of pulses with real shape, instead of the ideal Dirac distribution, is of practical interest for a wide range of applications, including ultrawideband communications [67] and the detection of impulsive signals in biomedical applications [95]. This generalization, or equiva- lently the choice of another sampling function ϕ, can be addressed without any difficulty, as described in details in [95]; therefore, to keep the derivations and notations simple, we do not consider this general case in the paper. We note that the considered sparse deconvolution problem is sometimes called super- resolution [4,18], becauseit consists in locating thepulseswith precision beyond the resolution limit of τ/N suggested by the classical sampling theory. The paper is organized as follows. In Sect. 2, we formulate the maximum likelihood estimation problem and in Sect. 3, we show that it amounts to a low rank matrix approximation problem. The new algorithm to solve it is presented in Sect. 4. Experimental results illustrate the effectiveness of the approach in Sect. 5. 2 Maximum Likelihood Estimation of the Parame- ters A natural approach to solve parametric estimation problem is maximum likeli- hood (ML) estimation; it consists in selecting the model which is the most likely to explain the observed noisy data. Since we have assumed Gaussian noise, this corresponds to solving the nonlinear least-squares problem [41,42]: 2 N−1 K nτ Find (˜t,a˜) = argmin v − a ϕ −t . (5) n k k t∈[0,τ[K,a∈RK nX=0(cid:12)(cid:12) Xk=1 (cid:16)N (cid:17)(cid:12)(cid:12) (cid:12) (cid:12) (cid:12) (cid:12) (cid:12) (cid:12) We supposein the following that thesolution to this problem is unique, that the obtained amplitudes a˜ = {a˜ }K are nonzero, and that the obtained locations k k=1 ˜t = {t˜ }K aredistinct. Thisis thecase almost surely, with respecttothe noise k k=1 randomness. Now, applying the discrete Fourier transform to the vector of measurements yields the Fourier coefficients defined by N−1 vˆ = v e−j2πmn/N, ∀m = −M,...,M. (6) m n n=0 X We define the Fourier coefficients {εˆ }M similarly. Combining (3) and (4), m m=−M 4 L. CONDATAND A.HIRABAYASHI we get, for every n = 0,...,N −1, K M 1 v −ε = aˇ ej2πm(n/N−tˇk/τ) (7) n n k N k=1 m=−M X X M K 1 = ej2πmn/N aˇ e−j2πmtˇk/τ . (8) k N ! m=−M k=1 X X We recognize the form of the inverse discrete Fourier transform. Thus, by iden- tification, we obtain K vˆ = aˇ e−j2πmtˇk/τ +εˆ , ∀m= −M,...,M. (9) m k m k=1 X Since the inverse discrete Fourier transform is unitary, up to a constant, the problem (9) can be rewritten as [42]: Maximum likelihood (ML) estimation problem 2 M K Find (˜t,a˜) = argmin vˆm− ake−j2πmtτk . (10) t∈[0,τ[K,a∈RKmX=−M(cid:12)(cid:12) Xk=1 (cid:12)(cid:12) (cid:12) (cid:12) (cid:12) (cid:12) (cid:12) (cid:12) We remark that (10) takes the form of a spectral estimation problem, which consists in retrieving the parameters of a sum of complex exponentials or sinu- soidsfromnoisysamples[84–86]. Thisclassical problem,whichhasawiderange of applications, e.g. in communications, radar, sonar, and geophysical seismol- ogy [86], has been studied extensively [15,20,59,70,83,87,94]. When N ≫ K and the locations t are not too close to each other, classical spectral estimation k techniques like MUSIC [78] and ESPRIT [77], or greedy strategies [69], can be used; they are fast butstatistically suboptimal. Also, we can mention recent de- velopments inspiredbythetheoryofcompressedsensing,aimingatspectralesti- mation from random samples [14,30,44,54,92]; this is notably different from our setting, where the Fourier coefficients are successive and not chosen randomly. In this work, we investigate the ML estimation problem (10) in its whole gener- ality, without any simplifying assumption. The optimal statistical properties of ML estimation for spectral estimation are well known [15,22,56,71,87,88,94]. However, solving the problem(10) exactly is acomputationally intractable task, since the cost function has a multimodal shape with a number of local minima of order of magnitude NK [68] and a narrow trough around the global mini- mum [21,37,51,72,74], see the example of Fig. 1. Thus, algebraic [29,68] or stochastic [33,41,42,90] approaches can be used, but only for small values of K and N, due to the combinatorial nature of the problem. Several methods have Cadzow Denoising Upgraded: A New Projection Method for theRecovery of Dirac Pulses 5 beenproposedtofindalocalminimumofthecostfunctionin(10)[48,55,96,102]. These approaches proceed in two steps: 1) a method is used to obtain a good initialestimate; 2)thisestimateisrefinediteratively andconverges totheclosest local minimizer of the nonconvex cost function in (10). Thus, the quality of the method used for the first step is crucial. A major advantage of the approach developed in the next section is to get rid of this initialization problem. 3 The Annihilation Property: Reformulation of eqn. (10) as a Matrix Denoising Problem Let us assume temporarily that there is no noise, i.e. εˆ = 0 in (9). Then, m the sequence of Fourier coefficients {vˆ }M satisfies a linear annihilating m m=−M difference equation [11], a known property which dates back to de Prony’s work intheeighteenthcentury[73]. Thatis,theconvolutionofthesequenceofFourier coefficients with the annihilating filter {h }K is identically zero: k k=0 K h vˆ = 0, ∀m= −M +K,...,M, (11) k m−k k=0 X with the (reversed) Z-transform of the annihilating filter defined, up to a con- stant, as K K H(z−1)= h zk = (z−ej2πtˇk/τ). (12) k k=0 k=1 X Y In matrix form, the annihilation property is vˆ ··· vˆ −M+K −M  ... ... ...  h.0 0.  ... ... ...  h..  =  0.. . (13)   K  vˆM ··· vˆM−K       T K | {z } Let us define, for any integer P with K ≤ P ≤ M, the Toeplitz—i.e. with constant values along its diagonals—matrix T , of size N−P×P+1, obtained P by arranging the values {vˆ }M in its first row and column; T is depicted m m=−M P in (19) and in (13) for P = K. We define the linear operator toeplitz which P maps the vector vˆ = [vˆ ··· vˆ ]T to T , i.e. T = toeplitz (vˆ). Then, −M M P P P the existence of a nonzero annihilating filter of size K + 1 for the sequence {vˆ }M is completely equivalent to the property that T has rank at most m m=−M P K; that is, rank(toeplitz (vˆ)) ≤ K. P 6 L. CONDATAND A.HIRABAYASHI Now, we turn back to the case when noise is present in the data. Because of noise, the matrix T = toeplitz (vˆ) has full rank. Let us define the vec- P P tor v˜ˆ = [v˜ˆ ··· v˜ˆ ]T from the parameters {t˜ }K and {a˜ }K solution to −M M k k=1 k k=1 (10): v˜m = Kk=1a˜ke−j2πmt˜τk. Then, the annihilating property holds for these denoised Fourier coefficients, so that rank(toeplitz (v˜ˆ)) ≤ K. Therefore, the P P ML estimation problem (10) can be rewritten as: Find v˜ˆ = argminkvˆ′−vˆk2 s.t. rank(toeplitz (vˆ′)) ≤ K. (14) P vˆ′∈CN So, theprocess amounts to denoise the vector vˆ by projecting it on the manifold defined by the composition of the low rank and Toeplitz properties, but this formulation does not yield a constructive method. To circumvent this difficulty, one can further rewrite the problem, so that the unknown becomes the matrix T = toeplitz (v˜ˆ). For this, let us define the weighted Frobenius norm of a P P matrix A = {a }∈ CN−P×P+1 by i,j e N−PP+1 kAk2 = w |a |2, (15) w i,j i,j i=1 j=1 X X where w is the inverse of the size of the diagonal going through the position i,j (i,j): 1/(i−j +P +1) if i−j ≤ 0, w = 1/(P +1) if 1 ≤ i−j ≤ N −2P −1, (16) i,j  1/(j −i+N −P) if i−j ≥ N −2P.  In fact,  kv˜ˆ −vˆk2 = ktoeplitz (v˜ˆ)−toeplitz (vˆ)k2 = kT −T k2. (17) P P w P P w Hence, we can reformulate the estimation problem (10), or equivalently (14), e as the following matrix approximation problem, named structured low rank ap- proximation (SLRA) in the literature [38,61,64]: Structured low rank approximation (SLRA) problem Find T ∈ argmin kT−T k2 (18) P P w T∈CN−P×P+1 s.et. T is Toeplitz and rank(T) ≤ K, for some chosen integer P ∈ K,...,M. The SLRA problem (18), which consists in projecting a matrix in the inter- section of a linear subspace and a nonconvex manifold, is believed to be NP- hard [39,64,81]. So, at first glance, we just have replaced the difficult problem (10) by the SLRA problem of same difficulty. However, the parametric problem Cadzow Denoising Upgraded: A New Projection Method for theRecovery of Dirac Pulses 7 (10) has been reformulated as a matrix denoising problem: the matrix T , or P equivalently the data {v }N−1, is denoised by finding the closest matrix consis- n n=0 tent with the model’s structure. The main advantage is that the initialization problem disappears: an iterative algorithm to solve theSLRAproblem proceeds directly, with the noisy matrix T as initial estimate of the solution T . More- P P over, for a low noise level, an algorithm converging to a local solution will find the global solution T , as T , T and the true noiseless matrix will ceorrespond P P P to the same catchment area of the cost function in (18). Since the desireed parameteres {t˜ }K and {a˜ }K are somehow encoded in k k=1 k k=1 the matrices, we have to describe the extraction process. The whole reconstruc- tion procedure, also given in [12], is detailed in Fig. 2. In absence of noise, it yields perfect reconstruction of the parameters. We note that the estimation of the locations is decoupled from that of the amplitudes; since finding the ampli- tudes given the locations is a simple linear regression problem, the emphasis is on the estimation quality of the locations. Wemustremarkthattheestimates {(t˜ ,a˜ )}K ,obtainedbytheprocedure, k k k=1 coincide with the ML estimates, solution to (10), only if the roots {z˜ }K are k k=1 distinct and all on the complex unit circle. This is guaranteed to be the case if the noise level kεk is below some threshold, which depends on the separation min |tˇ − tˇ |. Indeed, the matrices T and T are centro-Hermitian, k16=k2 k1 k2 P P i.e. their entries satisfy vˆ = vˆ∗ and v˜ˆ = v˜ˆ∗ , for every m in −M,...,M, −m m −m m where·∗ denotescomplexconjugation. Consequently, tehepolynomial K h˜ zk k=0 k is self-inversive [82], so that its roots {z˜ }K are either on the complex unit k k=1 P circle or come by pairs with same complex phase and opposite amplitudes. The perturbation due to noise on the coefficients vˆ yields a proportionally bounded m perturbation on the estimated roots z˜k = ej2πt˜k/τ, with respect to the true roots zˇk = ej2πtˇk/τ, but the roots remain on the complex unit circle. Only if the perturbationislargeenough,twodistinctroots(z˜ ,z˜ )willpossiblymergeand k1 k2 then split in a pair (z˜ ,z˜ = 1/z˜∗ ) on both sides of the unit circle, yielding k1 k2 k1 t˜ = t˜ . k1 k2 3.1 State of the Art for Structured Low Rank Approximation (SLRA) The problem of reconstructing a low-rank matrix from limited, noisy, linear measurements arises in many areas of engineering and applied sciences, such as signal processing, machine learning, control, and computer vision. The wide range of applications includes low-order system identification, dynamic MRI, quantum state tomography, phase retrieval, and global positioning from local distances, to mention a few; see references in [35,61,64]. In many of these problems, the matrix to be recovered is constrained to have some structure, like beingToeplitz, Hankel, Sylvester, orpositive definite. TheSLRAproblemshave 8 L. CONDATAND A.HIRABAYASHI been studied in the literature under different names; they are equivalent or have tight connections with constrained or structured total least-squares, total least- norm, and errors-in-variables [27,47,61,64,65,75,99,100]. Several methods have beenproposed,inthecommunity ofnumericalalgebra, toobtain alocalsolution of a SLRA problem, using high-level optimization routines [13,21,43,47,49,52, 79,80]. For instance, the iterative approach in [21] uses the Matlab function fminunc, which is a BFGS quasi-Newton method with line search. Besides the difficulty of implementation, the algorithm is very costly, since many SVDs are required at each iteration. So, to our knowledge, the only efficient publicly available software package for SLRA is the one currently in development by I. Markovsky [66]. However, itonlyhandlesreal-valued data, whereasthematrices in (18) are complex-valued with centro-Hermitian symmetry. Due to the complexity of dedicated solvers for SLRA, a simple alternative is to use the heuristic method of Cadzow denoising [16,17], a.k.a. alternating projections. This method is promoted in [12,95] for the reconstruction of Dirac pulses, and represents the state of the art for this problem. To describe the method, let us introduce some notations. • We place ourselves in the real Hilbert space H = CN−P×P+1 of complex- valued matrices of size N −P ×P +1, having centro-Hermitian symme- try. The (Frobenius) inner product between two matrices X, X′ of H is hX,X′i = x x′∗ ∈ R. i,j i,j i,j • We denotePby P the projection onto a subset Ω of H, which maps an Ω element of H to the closest element in Ω, with respect to the Frobenius norm. • We define T, the linear subspace of H of Toeplitz matrices. P consists in T averaging along the diagonals; that is, P maps a matrix X with entries T {x }j=1,...,P+1 to a matrix X′ with entries i,j i=1,...,N−P 1 i−j+P+1x if i−j ≤ 0, i−j+P+1 k=1 k,k+j−i x′ = 1 P+1x if 1 ≤ i−j ≤ N −2P −1, (24) i,j  P+1 k=P1 k+i−j,k  j−i+1NP−P jk−=i1+N−P xk+i−j,k if i−j ≥ N −2P.  P The computational cost of this operation is O(P(N −P)). • We define R , the closed butnonconvex manifold of matrices of H having K rank at most K. P consists in truncating the SVD of the matrix, ac- R K cording to the classical Schmidt–Eckart–Young theorem [31] [40, theorem 2.5.3]: if a matrix X has SVD X = LΣRH, then P (X) is obtained R K by setting to zero all except the K largest diagonal elements of Σ, which are the singular values of X. For a square or close to square matrix, i.e. P ∼N/2, the computational cost of the SVD is O(N3). Cadzow Denoising Upgraded: A New Projection Method for theRecovery of Dirac Pulses 9 Then, Cadzow denoising simply consists in applying alternately P and P to R T K the matrix T : P (0) Cadzow denoising algorithm. SetT = T ,depictedin(19). Theniterate, P P for every l ≥ 0, (l+1) (l) T = P P (T ) . P T RK P (cid:12) (cid:12) (cid:0) (cid:1) (cid:12)This algorithm seems to always converge in practice to a matrix T ∈ T ∩R , P K i.e. a Toeplitz matrix of rank at most K. But contrary to a common belief, this convergencehasnotbeenproved[8,24,46,53], althoughsomepartiealconvergence results have been derived recently [1,2,58]. Moreover, even if Cadzow denoising converges to a Toeplitz matrix of rank at most K, this matrix is not a local minimizer of the Frobenius distance or of the weighted Frobenius distance k· −T k in the SLRA problem (18), as shown by examples in [21,27,37]. In the P w next section, we propose a new algorithm to compute a local solution of the SLRA problem, which thus improves uponCadzow denoising, for essentially the same complexity of one SVD per iteration. 4 A New Optimization Algorithm for SLRA Before addressing the particular problem (18), let us consider the generic opti- mization problem: Find x˜ ∈ argminF(x) s.t. x ∈ Ω ∩Ω , (25) 1 2 x∈H where H is a real Hilbert space of finite dimension, Ω and Ω are two closed 1 2 subsets of H, and F : H → R is a convex and differentiable function with β- Lipschitz continuous gradient, for some β > 0; that is, k∇F(x) −∇F(x′)k ≤ βkx−x′k, ∀x,x′ ∈ H. This problem can be solved by the following algorithm: Optimization algorithm. Choose the parameters µ > 0, γ ∈ ]0,1[, and the initial estimates x(0),s(0) ∈H. Then iterate, for every l ≥ 0, x(l+1) = P s(l)+γ(x(l) −s(l))−µ∇F(x(l)) , Ω1 s(l+1) = s(l)−x(l+1) +P (2x(l+1) −s(l)). (cid:12) (cid:0) Ω2 (cid:1) (cid:12) (cid:12) T(cid:12) his algorithm is a particular instance of a more general proximal splitting algorithm derived recently by the first author [25]. Consequently, as corollaries of the results derived in [25], we get the following convergence results. Theorem 1. In (25), suppose that (i) a solution exists; (ii) the sets Ω and Ω 1 2 are convex; (iii) ri(Ω )∩ri(Ω ) 6= ∅, where ri denotes the relative interior [6]. In 1 2 the algorithm, suppose that (iv) 2γ > βµ. Then, the sequence (x(l))l∈N converges to some element x˜ solution to the problem (25).

Description:
Akira Hirabayashi. College of Information Science and Engineering, Ritsumeikan University. Shiga, 525-8577, Japan [email protected]. March 18 Key words and phrases : Dirac pulses, spike train, finite rate of innovation, The results obtained with this Matlab code3 are illustrated.
See more

The list of books you might like

Most books are stored in the elastic cloud where traffic is expensive. For this reason, we have a limit on daily download.