Fixed-point algorithms for frequency estimation and structured low rank approximation

Fredrik Andersson and Marcus Carlsson
Center for Mathematical Sciences, Lund University, Box 118, 22100 Lund, Sweden

Keywords: Fixed-point algorithms, Frequency estimation, General domain Hankel matrices, Fenchel conjugate, Convex envelope.

Mathematics Subject Classification: 15B05, 65K10, 41A29, 41A63.

Abstract

We develop fixed-point algorithms for the approximation of structured matrices with rank penalties. In particular, we use these fixed-point algorithms for making approximations by sums of exponentials, that is, for frequency estimation. For the basic formulation of the fixed-point algorithm we show that it converges to the minimum of the convex envelope of the original objective function along with its structured matrix constraint. It often happens that this solution agrees with the solution to the original minimization problem, and we provide a simple criterion for when this is true. We also provide more general fixed-point algorithms that can be used to treat the problems of making weighted approximations by sums of exponentials given equally or unequally spaced sampling. We apply the method to the case of missing data; although optimal convergence of the fixed-point algorithm is not guaranteed in this case, it turns out that the method often gives perfect reconstruction (up to machine precision) in such cases. We also discuss multidimensional extensions, and illustrate how the proposed algorithms can be used to recover sums of exponentials in several variables when samples are available only along a curve.

1 Introduction

We consider the non-convex problem

$$ \mathop{\mathrm{argmin}}_{A\in\mathcal{H}} J_{F,\sigma_0}(A) = \mathop{\mathrm{argmin}}_{A\in\mathcal{H}} \; \sigma_0^2\,\mathrm{rank}(A) + \|A-F\|^2, \qquad (1) $$

where $A$ is an $M\times N$ matrix, $\mathcal{H}$ is any linear subspace, $\sigma_0$ is a parameter, and the norm refers to the Frobenius norm. The convex envelope of $J_{F,\sigma_0}$ is then given by

$$ R_{\sigma_0}(A) + \|A-F\|^2, \qquad (2) $$

where

$$ R_{\sigma_0}(A) = \sum_j \Big( \sigma_0^2 - \max\big(\sigma_0-\sigma_j(A),\,0\big)^2 \Big); $$

see [19, 20] for details. We provide a fixed-point algorithm which is guaranteed to solve the problem

$$ \mathop{\mathrm{argmin}}_{A\in\mathcal{H}} \; R_\tau(A) + q\|A-F\|^2, \qquad (3) $$

for any $1<q<\infty$ and $\tau>0$. It turns out that the solution to (3) often coincides with the solution to the non-convex problem (1) for $\sigma_0=\tau/\sqrt{q}$, and we provide a simple condition to verify whether this has happened. To be more precise, the following is a compilation of Theorem 5.1 and Theorem 5.2 in the particular case $q=2$, for the non-linear map $\mathcal{B}_{F,\tau,q}$ defined by (23).

Theorem 1.1. Given any starting point $W^1$, the Picard iteration $W^{n+1}=\mathcal{B}_{F,\tau,2}(W^n)$ converges to a fixed point $W^\circ$. Moreover, the orthogonal projection $P_{\mathcal{H}}(W^\circ)$ is unique and

$$ A^\circ = 2F - P_{\mathcal{H}}(W^\circ) $$

is the unique solution to (3). Moreover, $W^\circ$ has the property that $A^\circ$ also solves

$$ \mathop{\mathrm{argmin}}_{A} \; R_\tau(A) + \|A-W^\circ\|^2, $$

and if $W^\circ$ has no singular values equal to $\tau$, then $A^\circ$ is the solution to the non-convex problem (1) with $\sigma_0=\tau/\sqrt{2}$.

To provide more flexibility in choosing the objective functional, we also consider the problem of minimizing

$$ R_\tau(A) + q\|MA-h\|_V^2, \qquad A\in\mathcal{H}, \qquad (4) $$

where $M:\mathcal{H}\to V$ is a linear operator into any given linear space. We give conditions under which the fixed points of a certain non-linear operator coincide with the global minimum of this problem.

We apply these algorithms to the problem of approximation by sparse sums of exponentials. Let $f:\mathbb{R}\to\mathbb{C}$ be the function that we wish to approximate, and assume that it is sampled at points $x=x_j$.
The problem can then be phrased as

$$ \mathop{\mathrm{argmin}}_{\{c_k\},\{\zeta_k\},K} \; \sigma_0^2 K + \sum_{j=1}^{J} \mu_j \Big| \sum_{k=1}^{K} c_k e^{2\pi i x_j\zeta_k} - f(x_j) \Big|^2, \qquad c_k,\zeta_k\in\mathbb{C},\ \mu_j\in\mathbb{R}^+. \qquad (5) $$

Here, $\sigma_0$ is a parameter that penalizes the number of exponential functions. The weights $\mu_j$ can be used to control the confidence levels in the samples $f(x_j)$.

This approximation problem is closely related to the problem of frequency estimation, i.e., the detection of the (complex) frequencies $\zeta_k$ above. Once the frequencies are known, the remaining approximation problem becomes linear and easy to solve by means of least squares. The literature on frequency estimation is vast. The first approach was made already in 1795 [16]. Notable methods that are commonly used are ESPRIT [25], MUSIC [27] and matrix pencil methods [17]. We refer to [28] for an overview from a signal processing perspective. From an analysis point of view, the AAK theory developed in [1, 2, 3] deals with closely related problems, and approximation algorithms based on these results were developed in [9, 14]. A competitive algorithm for the case of fixed $K$ was recently developed in [11].

To approach this problem, we let $\mathcal{H}$ denote the set of Hankel matrices, i.e., if $A\in\mathcal{H}$, then there is a generating function $f$ such that $A(j,k)=f(j+k)$. Hankel matrices are thus constant along the anti-diagonals. By the Kronecker theorem there is a connection between sums of $K$ exponential functions and Hankel matrices of rank $K$, namely that if $f$ is a linear combination of $K$ exponential functions sampled at equally spaced points, then the Hankel matrix generated by these samples will generically have rank $K$. For more details, see e.g. [4, 5, 18, 23]. In the special case of equally spaced sampling, i.e., $x_j=j$, and

$$ \mu_j = \begin{cases} j & \text{if } 1\le j\le N+1,\\ 2N+2-j & \text{if } N+1<j\le 2N+1, \end{cases} \qquad (6) $$

the problem (5) (with $F\in\mathcal{H}$ being the Hankel matrix generated from $f$) is equivalent to (1). The triangular weight above comes from counting the number of elements of the Hankel matrix along the anti-diagonals. Theorem 1.1 thus provides approximate solutions to this problem, which often turn out to solve the original problem (5) as well, which in this setting is the equivalent of (1).

Clearly, for many purposes it would be more natural to use the regular $\ell^2$ norm in the approximation, i.e., to use constant values of $\mu_j$. More general setups of weights can also be of interest (for instance in the case of missing data). We show how the fixed-point algorithm designed for (4) can be used in this case.

The basic algorithms for frequency estimation (or for approximation by sums of exponentials) require that the data is provided (completely) at equally spaced samples. In many situations, this assumption does not hold (in particular concerning sampling in two or more variables). A traditional method designed for computing periodograms for unequally spaced sampling is the Lomb-Scargle method [21, 26]. The results are typically far from satisfactory, cf. [11]. For a review of frequency estimation methods for unequally spaced sampling, see [12]. In this work, we will use ideas similar to those in [11] for treating unequally spaced sampling. Given a set of sample points $X=\{x_j\}$, let $I_X$ be a matrix that interpolates between equally spaced points and the points $x_j$. This means, for instance, that the rows of $I_X$ have sum one. The action of its adjoint $I_X^*$ is sometimes referred to as anterpolation. We study the problem

$$ \mathop{\mathrm{argmin}}_{A(j,k)=a(j+k)} \; \sigma_0^2\,\mathrm{rank}(A) + \|f - I_X(a)\|^2, \qquad (7) $$

using the algorithm for (4), and we also modify (7) to deal with the corresponding weighted case.
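To make the role of $I_X$ concrete, the following is a minimal sketch (ours, not from the paper) of a piecewise-linear interpolation matrix whose rows sum to one; the function name and the assumption $0\le x_j\le L-1$ on the sample points are illustrative only.

function I = interpmatrix(x, L)
% Hypothetical sketch: piecewise-linear interpolation matrix I_X that maps
% values on the equally spaced grid 0, 1, ..., L-1 to the points x(j).
% Each row holds the two barycentric weights, so every row sums to one.
J = numel(x);
I = zeros(J, L);
for j = 1:J
    k = floor(x(j));        % left grid neighbour (0-based)
    t = x(j) - k;           % fractional offset in [0, 1)
    if t == 0
        I(j, k+1) = 1;
    else
        I(j, k+1) = 1 - t;
        I(j, k+2) = t;
    end
end

Higher-order interpolation (e.g. cubic or spline) yields a banded $I_X$ in the same way; only the row weights change.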
Finally, we discuss how the methods can be used in the multidimensional case. Let $\Omega\subset\mathbb{R}^d$ be an open bounded connected set and $f$ a function on $\Omega$ which is assumed to be of the form

$$ f(x) = \sum_{k=1}^{K} c_k e^{\zeta_k\cdot x}, \qquad (8) $$

where $\zeta_k\in\mathbb{C}^d$, $\zeta_k\cdot x=\sum_{j=1}^{d}\zeta_{k,j}x_j$ and $K$ is a low number. In [4] we introduce a new class of operators called general domain truncated correlation operators, whose "symbols" are functions on $\Omega$, and prove a Kronecker-type theorem for these operators. In particular, it follows that their rank is $K$ if the symbol is of the form (8), and the converse situation is also completely clarified. In [5] it is explained how these operators can be discretized, giving rise to a class of structured matrices called "general domain Hankel matrices" (see Section 7 for more details). Letting $\mathcal{H}$ be such a class of structured matrices, it turns out that (3) is a natural setting for solving the following problem: given noisy measurements of $f$ at (possibly unequally spaced) points in $\Omega$, find the function $f$. The connected problem of finding the values $\{c_k\}$ and $\{\zeta_k\}$ is addressed e.g. in [6].

The paper is organized as follows. In Section 2 we provide some basic ingredients for the coming sections; in particular we introduce the singular value functional calculus and the connected Lipschitz estimates. Our solution of (3) involves an intermediate, more intricate objective functional, which is introduced in Section 3, where we also investigate the structure of its Fenchel conjugates. Moreover, our solution to (3) first solves a dual problem, which is introduced and studied in Section 4; consequently, the key operator $\mathcal{B}_{F,\tau,q}$ is also introduced in Section 4. At this point, we have the ingredients necessary for addressing the main problem (3). In Section 5.1 we recapitulate our main objective and discuss how various choices of $q$ and $\tau$ affect the objective functional, and also its relation to the original problem (1). An extended version of Theorem 1.1 is finally given in Section 5.2, where we also present our algorithm for solving (3). The more general version of the algorithm and the corresponding theory are given in Section 5.3. The remainder of the paper is devoted to frequency estimation and numerical examples. Section 6 considers the one-dimensional problem (7), whereas the problem of retrieving functions of the form (8) is considered in Section 7; this section also contains a more detailed description of the general domain Hankel matrices which underlie the method. Numerical examples are finally given in Section 8, and we end with a summary of our conclusions in Section 9.

2 Preliminaries

2.1 Fenchel conjugates and convex optimization

We recall that the Fenchel conjugate $f^*$ of a function $f$ is given by

$$ f^*(y) = \max_x \; \langle x,y\rangle - f(x). $$

For positive functions $f$, $f^*$ is always convex, and if $f$ is also convex, then $f=f^{**}$; see Theorem 11.1 in [24]. If $f$ is not convex, then $f^{**}$ equals the convex envelope of $f$. In particular, note that we always have $f^*=f^{***}$.

Suppose now that $f$ is a convex function on some finite-dimensional linear space $V$, and let $\mathcal{M}$ be a linear subspace. Consider the problem of finding

$$ \min f(x) \quad \text{subj. to}\quad x\in\mathcal{M}. \qquad (9) $$

If $y\in\mathcal{M}^\perp$, then the value in (9) is clearly larger than or equal to the value

$$ \min_x \; f(x) - \langle y,x\rangle = -f^*(y). \qquad (10) $$

Suppose now that we can find a value $y^\circ$ (not necessarily unique) which minimizes

$$ \min f^*(y) \quad \text{subj. to}\quad y\in\mathcal{M}^\perp. \qquad (11) $$

Since $f$ is convex, it is easy to see that the corresponding value in (10) equals that of (9).
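As a concrete illustration of the definition (our example, not the paper's), take $f(x)=\|x\|^2$. The maximizer in $\max_x\,\langle x,y\rangle - \|x\|^2$ is $x=y/2$, so

$$ f^*(y) = \frac{\|y\|^2}{4}, \qquad f^{**}(x) = \max_y \; \langle x,y\rangle - \frac{\|y\|^2}{4} = \|x\|^2 = f(x), $$

in accordance with $f=f^{**}$ for convex $f$.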
2.2 The singular value functional calculus

Let $M,N\in\mathbb{N}$ be given and let $\mathbb{M}_{M,N}$ denote the Euclidean space of $M\times N$ matrices, equipped with the Frobenius norm. We will need the singular value functional calculus [10], defined as follows. Let $f$ be a real-valued function on $\mathbb{R}^+$ such that $f(0)=0$, and let $A\in\mathbb{M}_{M,N}$ be a matrix with singular value decomposition $A=U\,\mathrm{diag}(\{\sigma_j(A)\})\,V^*$. We then define

$$ S_f(A) = U\,\mathrm{diag}(\{f(\sigma_j(A))\})\,V^*. $$

The key property we will use is that

$$ \|S_f(A) - S_f(B)\| \le \|f\|_{\mathrm{Lip}}\,\|A-B\|, \qquad (12) $$

where $\|f\|_{\mathrm{Lip}}$ refers to the Lipschitz constant of $f$. In other words, Lipschitz functions give rise to a Lipschitz functional calculus, with the same bound.
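Numerically, $S_f$ is a one-line construction on top of the SVD. The following is a minimal sketch (ours; the name svfc is hypothetical, and the handle f is assumed to act elementwise with f(0) = 0):

function B = svfc(A, f)
% Singular value functional calculus S_f(A): apply f to the singular
% values while keeping the singular vectors.
[U, S, V] = svd(A, 'econ');     % economy SVD: S is min(M,N) x min(M,N)
B = U * diag(f(diag(S))) * V';

For instance, svfc(A, @(s) max(s - 0.5, 0)) soft-thresholds the singular values at 0.5, and the choice $f=s_{\tau,q}$ from (21) below is what appears in Definition 4.2.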
3 Convex envelopes

For a given matrix $A$, let

$$ T_{\sigma_0}(A) = \sum_j \max\big(\sigma_j^2(A)-\sigma_0^2,\,0\big), $$

and

$$ R_{\sigma_0}(A) = \sum_j \Big( \sigma_0^2 - \max\big(\sigma_0-\sigma_j(A),\,0\big)^2 \Big). $$

For a proof of the following theorem, see [19, 20].

Theorem 3.1. Let $F$ be a fixed matrix, and let

$$ J_{F,\sigma_0}(A) = \sigma_0^2\,\mathrm{rank}(A) + \|A-F\|^2. \qquad (13) $$

The Fenchel conjugates of $J_{F,\sigma_0}$ are then

$$ J_{F,\sigma_0}^*(X) = T_{\sigma_0}\Big(\frac{X}{2}+F\Big) - \|F\|^2, \qquad (14) $$

$$ J_{F,\sigma_0}^{**}(A) = R_{\sigma_0}(A) + \|A-F\|^2. \qquad (15) $$

Let $\tau>0$ and let $1<p,q<\infty$ be conjugate exponents, i.e., such that $1/p+1/q=1$.

Theorem 3.2. Let

$$ K_{F,\tau,q}(A,B) = \tau^2\,\mathrm{rank}(A) + p\|A-B\|^2 + q\|B-F\|^2. $$

Its Fenchel conjugate is then given by

$$ K_{F,\tau,q}^*(X,Y) = T_\tau\Big(\frac{X}{2}+\frac{Y}{2q}+F\Big) + (q-1)\Big\|\frac{Y}{2q}+F\Big\|^2 - q\|F\|^2, \qquad (16) $$

and its convex envelope (Fenchel bi-conjugate) is given by

$$ K_{F,\tau,q}^{**}(A,B) = R_\tau(A) + p\|A-B\|^2 + q\|B-F\|^2. \qquad (17) $$

Proof. We write simply $K$ when $F,\tau,q$ are clear from the context. Consider the Fenchel conjugate of $K$:

$$ K^*(X,Y) = \max_{A,B} \; \langle A,X\rangle + \langle B,Y\rangle - K(A,B) = \max_A \Big( \langle A,X\rangle - \tau^2\,\mathrm{rank}(A) + \max_B \big( \langle B,Y\rangle - p\|A-B\|^2 - q\|B-F\|^2 \big) \Big). $$

The right maximum is attained for

$$ B = \frac{p}{p+q}A + \frac{q}{p+q}F + \frac{1}{2(p+q)}Y. $$

After substituting this $B$ in the Fenchel conjugate and some algebraic simplifications (using that $pq/(p+q)=1$ for conjugate exponents), we obtain

$$ K^*(X,Y) = \max_A \; \langle A,X\rangle - \tau^2\,\mathrm{rank}(A) - \frac{pq}{p+q}\Big\|A-\Big(\frac{Y}{2q}+F\Big)\Big\|^2 + q\Big\|\frac{Y}{2q}+F\Big\|^2 - q\|F\|^2 $$
$$ = \Big( \max_A \; \langle A,X\rangle - \tau^2\,\mathrm{rank}(A) - \Big\|A-\Big(\frac{Y}{2q}+F\Big)\Big\|^2 \Big) + q\Big\|\frac{Y}{2q}+F\Big\|^2 - q\|F\|^2. $$

From (14) we see that the expression inside the first parenthesis can be written as

$$ T_\tau\Big(\frac{X}{2}+\frac{Y}{2q}+F\Big) - \Big\|\frac{Y}{2q}+F\Big\|^2, $$

from which (16) follows.

Next we study the bi-conjugate

$$ K^{**}(A,B) = \max_{X,Y} \; \langle A,X\rangle + \langle B,Y\rangle - K^*(X,Y). $$

To simplify the computations we change coordinates

$$ \begin{cases} C = \dfrac{X}{2}+\dfrac{Y}{2q}+F,\\[1ex] D = \dfrac{Y}{2q}+F, \end{cases} \qquad (18) $$

for which

$$ \begin{cases} X = 2(C-D),\\ Y = 2q(D-F). \end{cases} $$

The bi-conjugate can then be written as

$$ K^{**}(A,B) = \max_{C,D} \; 2\langle A,C-D\rangle + 2q\langle B,D-F\rangle - K^*\big(2(C-D),\,2q(D-F)\big) $$
$$ = \max_{C,D} \; 2\langle A,C-D\rangle + 2q\langle B,D-F\rangle - T_\tau(C) - (q-1)\|D\|^2 + q\|F\|^2 $$
$$ = \max_C \big( \langle A,2C\rangle - T_\tau(C) \big) + \max_D \big( 2\langle D,qB-A\rangle - (q-1)\|D\|^2 \big) + q\|F\|^2 - 2q\langle B,F\rangle. $$

The maximization over $D$ is straightforward: it gives the contribution $\frac{1}{q-1}\|qB-A\|^2$, attained for $D=\frac{qB-A}{q-1}$. Hence $C=\frac{X}{2}+\frac{qB-A}{q-1}$ by (18), and

$$ K^{**}(A,B) = \max_X \Big( \langle A,X\rangle + \frac{2}{q-1}\langle A,qB-A\rangle - T_\tau\Big(\frac{X}{2}+\frac{qB-A}{q-1}\Big) \Big) + \frac{\|qB-A\|^2}{q-1} + q\|F\|^2 - 2q\langle B,F\rangle $$
$$ = \max_X \Big( \langle A,X\rangle - J^*_{\frac{qB-A}{q-1},\tau}(X) \Big) + \frac{q-2}{(q-1)^2}\|qB-A\|^2 + \frac{2}{q-1}\langle A,qB-A\rangle + q\|F\|^2 - 2q\langle B,F\rangle. $$

We recognize the maximization part as a Fenchel conjugate, and hence

$$ K^{**}(A,B) = J^{**}_{\frac{qB-A}{q-1},\tau}(A) + \frac{q-2}{(q-1)^2}\|qB-A\|^2 + \frac{2}{q-1}\langle A,qB-A\rangle + q\|F\|^2 - 2q\langle B,F\rangle $$
$$ = R_\tau(A) + \Big\|A-\frac{qB-A}{q-1}\Big\|^2 + \frac{q-2}{(q-1)^2}\|qB-A\|^2 + \frac{2}{q-1}\langle A,qB-A\rangle + q\|F\|^2 - 2q\langle B,F\rangle $$
$$ = R_\tau(A) + \frac{q}{q-1}\|A-B\|^2 + q\|B-F\|^2. $$

Since $p=q/(q-1)$, this is (17).

Figure 1: An illustration of the function $s_{\tau,2}(\sigma)$ used in Lemma 4.1.

4 The dual optimization problem

Let $\mathcal{H}\subset\mathbb{M}_{M,N}$ be any fixed linear subspace. We consider the problem

$$ \mathop{\mathrm{argmin}} \; K^{**}(A,B) = R_\tau(A) + p\|A-B\|^2 + q\|B-F\|^2 \quad \text{subj. to}\quad A=B,\ B\in\mathcal{H}, \qquad (19) $$

where $F\in\mathcal{H}$. Let $\mathcal{M}$ be the linear subspace of $\mathbb{M}_{M,N}\times\mathbb{M}_{M,N}$ defined by the equations $A=B$ and $B\in\mathcal{H}$. It is easy to see that $\mathcal{M}^\perp$ is defined by

$$ (X,Y)\in\mathcal{M}^\perp \;\Leftrightarrow\; X+Y\in\mathcal{H}^\perp. $$

Indeed, $\Leftarrow$ is obvious, and the other part can be established by noting that $\dim\mathcal{M}=\dim\mathcal{H}$, whereas the subspace defined by $X+Y\in\mathcal{H}^\perp$ has dimension $\dim\mathcal{H}^\perp+\dim\mathbb{M}_{M,N}$ (which thus equals $\dim(\mathbb{M}_{M,N}\times\mathbb{M}_{M,N})-\dim\mathcal{M}$). Motivated by the results in Section 2.1, we now focus on the "dual problem" to (19), i.e.,

$$ \mathop{\mathrm{argmin}} \; K^*(X,Y) \quad \text{subj. to}\quad X+Y\in\mathcal{H}^\perp. \qquad (20) $$

Set

$$ s_{\tau,q}(\sigma) = \max\Big( \min(\sigma,\tau),\,\frac{\sigma}{q} \Big). \qquad (21) $$

When there is no risk of confusion we will omit the subindices. Recall the singular value functional calculus introduced in Section 2.2.

Lemma 4.1.

$$ S_s(A) = \mathop{\mathrm{argmin}}_W \; T_\tau(W) + \frac{1}{q-1}\|W-A\|^2. $$

Proof. Since $T_\tau(W)$ depends only on the singular values of $W$, von Neumann's inequality [29] shows that the solution to the above problem will have the same singular vectors as $A$ (see the proof of Theorem 3.1 in [10] for more details). Hence it will be of the form $S_s(A)$, where $s$ is defined by

$$ s(\sigma_j(A)) = \mathop{\mathrm{argmin}}_\omega \Big( \max(\omega^2-\tau^2,\,0) + \frac{1}{q-1}\big(\omega-\sigma_j(A)\big)^2 \Big), \qquad (22) $$

and by equating the subgradient of the above expression with respect to $\omega$ with zero, it is easily verified that this is solved by the function (21).

Let $P_{\mathcal{H}}$ denote the operator of orthogonal projection onto $\mathcal{H}$. We now introduce the (non-linear) operator $\mathcal{B}_{F,\tau,q}:\mathbb{M}_{M,N}\to\mathbb{M}_{M,N}$, which will play a key role in our solution of the original problem (3).

Definition 4.2. Set

$$ \mathcal{B}_{F,\tau,q}(W) = S_{s_{\tau,q}}\big(qF + P_{\mathcal{H}^\perp}(W)\big), \qquad (23) $$

which we abbreviate $\mathcal{B}$ when $F,\tau,q$ are clear from the context.
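For general $q$, the operator (23) is straightforward to realize numerically. The following is a minimal sketch of ours (compare the $q=2$ implementation in Table 1 below; PHperp is assumed to implement $P_{\mathcal{H}^\perp}$):

function W = Bmap(W, F, tau, q, PHperp)
% Hypothetical sketch of B_{F,tau,q}(W) = S_{s_{tau,q}}(q*F + P_{H-perp}(W)).
[U, S, V] = svd(q*F + PHperp(W), 'econ');
s = max(min(diag(S), tau), diag(S)/q);   % s_{tau,q} from (21), elementwise
W = U * diag(s) * V';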
Theorem 4.3. There exists a fixed point $W^\circ$ of $\mathcal{B}$ such that the problem

$$ \text{minimize}\; K^*(X,Y) \quad \text{subject to}\quad X+Y\in\mathcal{H}^\perp \qquad (24) $$

is solved by

$$ X^\circ = 2p\big(P_{\mathcal{H}}(W^\circ)-F\big) + 2P_{\mathcal{H}^\perp}(W^\circ), \qquad Y^\circ = -2p\big(P_{\mathcal{H}}(W^\circ)-F\big). \qquad (25) $$

Proof. Since $X+Y\in\mathcal{H}^\perp$, we can write

$$ X = 2qH + H_1^\perp, \qquad Y = -2qH + H_2^\perp, $$

with $H\in\mathcal{H}$ and $H_1^\perp,H_2^\perp\in\mathcal{H}^\perp$. By (16) the problem (24) becomes

$$ \mathop{\mathrm{argmin}}_{H,H_1^\perp,H_2^\perp} K^*\big(2qH+H_1^\perp,\,-2qH+H_2^\perp\big) = \mathop{\mathrm{argmin}}_{H,H_1^\perp,H_2^\perp} T_\tau\Big(\frac{2qH+H_1^\perp}{2}+\frac{-2qH+H_2^\perp}{2q}+F\Big) + (q-1)\Big\|\frac{-2qH+H_2^\perp}{2q}+F\Big\|^2. $$

Note that

$$ \frac{2qH+H_1^\perp}{2}+\frac{-2qH+H_2^\perp}{2q} = (q-1)H + \frac{H_1^\perp}{2}+\frac{H_2^\perp}{2q} = (q-1)H + H^\perp, $$

where $H^\perp=\frac{H_1^\perp}{2}+\frac{H_2^\perp}{2q}$. The problem can thus be rewritten

$$ \mathop{\mathrm{argmin}}_{H,H^\perp,H_2^\perp} T_\tau\big(F+(q-1)H+H^\perp\big) + (q-1)\Big\|F-H+\frac{H_2^\perp}{2q}\Big\|^2. $$

Since $F,H\in\mathcal{H}$ and $H_2^\perp\in\mathcal{H}^\perp$, it follows that

$$ \Big\|F-H+\frac{H_2^\perp}{2q}\Big\|^2 = \|F-H\|^2 + \Big\|\frac{H_2^\perp}{2q}\Big\|^2, $$

and hence we conclude that the optimal $H_2^\perp$ is the zero matrix. Thus

$$ H^\perp = \frac{H_1^\perp}{2}, $$

and the objective functional can be replaced by

$$ \mathop{\mathrm{argmin}}_{H,H^\perp} T_\tau\big(F+(q-1)H+H^\perp\big) + (q-1)\|H-F\|^2. $$

Using the substitution $W=F+(q-1)H+H^\perp$ it then becomes

$$ T_\tau(W) + (q-1)\Big\|\frac{W}{q-1}-\frac{qF+H^\perp}{q-1}\Big\|^2 = T_\tau(W) + \frac{1}{q-1}\big\|W-\big(qF+P_{\mathcal{H}^\perp}(W)\big)\big\|^2. $$

The minimization problem will thus be solved by

$$ W = \mathop{\mathrm{argmin}}_W \; \mathcal{S}\big(W,P_{\mathcal{H}^\perp}W\big), \qquad (26) $$

where $\mathcal{S}:\mathbb{M}_{M,N}\times\mathcal{H}^\perp\to\mathbb{R}$ is the functional defined by

$$ \mathcal{S}(W,V) = T_\tau(W) + \frac{1}{q-1}\|W-(qF+V)\|^2. $$

Note that $\mathcal{S}$ is convex, since it is obtained from the convex function $K^*$ by affine changes of coordinates. It is also easy to see that

$$ \lim_{R\to\infty} \inf_{\|W\|+\|V\|>R} \mathcal{S}(W,V) = \infty, $$

so it has at least one global minimum. Let $(W^\circ,V^\circ)$ be one such. Since

$$ \mathcal{S}(W^\circ,V^\circ) = \min_V \; \mathcal{S}(W^\circ,V), $$

it is easy to see that $V^\circ=P_{\mathcal{H}^\perp}W^\circ$. But then

$$ W^\circ = \mathop{\mathrm{argmin}}_W \; \mathcal{S}(W,V^\circ) = \mathop{\mathrm{argmin}}_W \; \mathcal{S}\big(W,P_{\mathcal{H}^\perp}W^\circ\big), $$

which by Lemma 4.1 reads

$$ W^\circ = S_s\big(qF+P_{\mathcal{H}^\perp}(W^\circ)\big) = \mathcal{B}(W^\circ). $$

It is also evident that $W^\circ$ is a solution to (26). The formulas in (25) now follow by tracing the changes of variables backwards.

5 A fixed-point algorithm

5.1 Discussion of the objective functional

To recapitulate our main objective, we wish to minimize

$$ \sigma_0^2\,\mathrm{rank}(A) + \|A-F\|^2, \qquad A\in\mathcal{H}, \qquad (27) $$

which we replace by its convex envelope

$$ R_{\sigma_0}(A) + \|A-F\|^2, \qquad A\in\mathcal{H}, \qquad (28) $$

to achieve convexity. We will in this section show how to minimize

$$ \mathop{\mathrm{argmin}}_{A\in\mathcal{H}} \; R_\tau(A) + q\|A-F\|^2, \qquad (29) $$

for any $1<q<\infty$ and $\tau>0$. Note that the corresponding modification of the original objective functional (27), i.e.,

$$ \mathop{\mathrm{argmin}}_{A\in\mathcal{H}} \; \tau^2\,\mathrm{rank}(A) + q\|A-F\|^2, \qquad (30) $$

does not change the problem, since we may equivalently solve (27) with $\sigma_0=\tau/\sqrt{q}$ (to see this, multiply (30) by $1/q$). However, this is not the case with (29), for it is easy to show that

$$ \frac{1}{q}R_\tau(A) = R_{\tau/\sqrt{q}}\big(A/\sqrt{q}\big). $$

Thus (29) is equivalent to

$$ \mathop{\mathrm{argmin}}_{A\in\mathcal{H}} \; R_{\sigma_0}\big(A/\sqrt{q}\big) + \|A-F\|^2, \qquad (31) $$

where $\sigma_0=\tau/\sqrt{q}$; compare with (27) and (28). Note as well that

$$ R_\tau(A) + q\|A-F\|^2 = R_\tau(A) + \|A-F\|^2 + (q-1)\|A-F\|^2, $$

by which we conclude that the objective functional in (29) is strictly convex with a unique minimizer $A^\circ$ (the first two terms form the convex envelope (2) with $\sigma_0=\tau$, and the last term is strictly convex). Moreover, the above expression is clearly less than or equal to

$$ \tau^2\,\mathrm{rank}(A) + \|A-F\|^2 + (q-1)\|A-F\|^2 = \tau^2\,\mathrm{rank}(A) + q\|A-F\|^2, $$

so if it happens that

$$ R_\tau(A^\circ) + q\|A^\circ-F\|^2 = \tau^2\,\mathrm{rank}(A^\circ) + q\|A^\circ-F\|^2, \qquad (32) $$

then $A^\circ$ is a solution to the original problem (30). However, by the definition of $R_\tau$ it follows that (32) is satisfied if and only if $A^\circ$ has no singular values in the interval $(0,\tau)$.
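This criterion is easy to check a posteriori. A hypothetical sketch (the tolerance is our ad hoc choice, with tau as in (29)):

% Check (32): the minimizer A of the convex problem (29) also solves the
% non-convex problem (30) iff A has no singular values in (0, tau).
sv = svd(A);
tol = 1e-9 * max(sv);
solvesOriginal = ~any(sv > tol & sv < tau - tol);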
function A = fixedpoint(F, tau_0, N_iter)
% Fixed-point iteration of Theorem 5.1 for q = 2.
W = 0*F;
for n = 1:N_iter
    [u, s, v] = svd(2*F + PHp(W));        % B_{F,tau,2}(W), cf. (23)
    W = u*max(min(s, tau_0), s/2)*v';     % apply s_{tau,2} to the singular values
end
A = 2*F - PH(W);

% Subroutines for Hankel operations: Lambda generates a Hankel matrix from a
% vector, iLambda averages along the anti-diagonals, and PH and PHp are the
% orthogonal projections onto H and its orthogonal complement.
Lambda = @(a) hankel(a(1:(end+1)/2), a((end+1)/2:end));
function a = iLambda(A)
N = size(A,1) - 1;
for j = -N:N
    a(N+1+j, 1) = mean(diag(flipud(A), j));
end
PH = @(A) Lambda(iLambda(A));
PHp = @(A) A - PH(A);

Table 1: MATLAB implementation of the fixed-point algorithm of Theorem 5.1 for q = 2, along with subroutines for Hankel operations.

5.2 The basic fixed-point algorithm

We now solve (29), using fixed points of the operator $\mathcal{B}_{F,\tau,q}$ from the previous section. It is easy to see that these may not be unique. Despite that, we have the following.

Theorem 5.1. The Picard iteration $W^{n+1}=\mathcal{B}_{F,\tau,q}(W^n)$ converges to a fixed point $W^\circ$. Moreover, $P_{\mathcal{H}}(W^\circ)$ is unique and

$$ A^\circ = \frac{1}{q-1}\big(qF - P_{\mathcal{H}}(W^\circ)\big) $$

is the unique solution to (29).

Before presenting the proof, let us give a few more comments on the equivalent problem formulation (31). Note that

$$ \lim_{q\to 0^+} R_\tau\big(A/\sqrt{q}\big) = \tau^2\,\mathrm{rank}(A), $$

but that our method requires $q>1$ (in fact, the objective functional is no longer convex once $q$ drops below 1). However, choosing $q\approx 1$ will yield slow convergence of the Picard iteration, since then $s_{\tau,q}(\sigma)\approx\sigma$. On the other hand, the effect of choosing a large $q$ (for a fixed $\tau$) is that the term $R_\tau(A/\sqrt{q})$ deviates further from $\tau^2\,\mathrm{rank}(A)$. We have found that $q=2$ works well in practice.
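Before turning to the proof, we note that the code in Table 1 can be tried directly. The following usage sketch is ours (ad hoc parameter choices, with the subroutines of Table 1 assumed to be in scope):

% Approximate noisy samples of two exponentials by a low-rank Hankel model.
N = 50;
x = (0:2*N)';
f = exp(2i*pi*0.10*x) + exp(2i*pi*0.23*x) + 0.1*randn(2*N+1, 1);
F = hankel(f(1:N+1), f(N+1:end));   % Hankel matrix generated by the samples
A = fixedpoint(F, 1.0, 200);        % tau_0 = 1.0, 200 Picard iterations
r = sum(svd(A) > 1e-6)              % generically the number of exponentials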
Proof. We first prove that the Picard iteration converges. Note that the sequence $V^n=P_{\mathcal{H}^\perp}W^n$, $n\in\mathbb{N}$, is generated by the Picard iteration of the operator $P_{\mathcal{H}^\perp}\mathcal{B}$ with the starting point $V^0=P_{\mathcal{H}^\perp}W^0$. Moreover, $W^{n+1}=\mathcal{B}(V^n)$. Since $\mathcal{B}$ is continuous (in fact, it is non-expansive by (12)), it suffices to show that $(V^n)_{n=0}^\infty$ is convergent.

It is well known [15, 22] that for firmly non-expansive operators (on a finite-dimensional space) the Picard iteration converges to a fixed point, as long as the set of fixed points of the operator is non-empty. By Theorem 4.3 there exists a fixed point $W^\circ$ of $\mathcal{B}$, and it clearly follows that $V^\circ=P_{\mathcal{H}^\perp}W^\circ$ is a fixed point of $P_{\mathcal{H}^\perp}\mathcal{B}$. It remains to show that $P_{\mathcal{H}^\perp}\mathcal{B}$ is firmly non-expansive as an operator on $\mathcal{H}^\perp$. An equivalent condition to firm non-expansiveness is that $2P_{\mathcal{H}^\perp}\mathcal{B}-I$ is non-expansive [13], i.e.,

$$ \big\|\big(2P_{\mathcal{H}^\perp}\mathcal{B}(V)-V\big)-\big(2P_{\mathcal{H}^\perp}\mathcal{B}(W)-W\big)\big\| \le \|V-W\|, \qquad \forall V,W\in\mathcal{H}^\perp, \qquad (33) $$

which we will verify shortly. To this end, note that on $\mathcal{H}^\perp$ we have

$$ 2P_{\mathcal{H}^\perp}\mathcal{B}(V) = 2P_{\mathcal{H}^\perp}S_s(qF+V), \qquad (34) $$

and set

$$ s^\dagger(\sigma) = 2s(\sigma)-\sigma = \max\Big( \min(2\sigma,2\tau),\,\frac{2}{q}\sigma \Big) - \sigma = \max\Big( \min(\sigma,\,2\tau-\sigma),\,\Big(\frac{2}{q}-1\Big)\sigma \Big). $$

This function $s^\dagger$ is clearly Lipschitz continuous with Lipschitz constant equal to one. Setting $\tilde V=qF+V$ and $\tilde W=qF+W$, (12) and (34) imply that

$$ \big\|\big(2P_{\mathcal{H}^\perp}\mathcal{B}(V)-V\big)-\big(2P_{\mathcal{H}^\perp}\mathcal{B}(W)-W\big)\big\|^2 = \big\|P_{\mathcal{H}^\perp}\big(\big(2S_s(\tilde V)-V\big)-\big(2S_s(\tilde W)-W\big)\big)\big\|^2 $$
$$ = \big\|P_{\mathcal{H}^\perp}\big(\big(2S_s(\tilde V)-\tilde V\big)-\big(2S_s(\tilde W)-\tilde W\big)\big)\big\|^2 = \big\|P_{\mathcal{H}^\perp}\big(S_{s^\dagger}(\tilde V)-S_{s^\dagger}(\tilde W)\big)\big\|^2 $$
$$ \le \big\|S_{s^\dagger}(\tilde V)-S_{s^\dagger}(\tilde W)\big\|^2 \le \big\|\tilde V-\tilde W\big\|^2 = \|V-W\|^2, $$

which establishes (33), and hence the original Picard iteration converges to a fixed point $W^\circ$ of $\mathcal{B}$.

For the uniqueness of $P_{\mathcal{H}}(W^\circ)$, suppose that $\tilde W^\circ$ is another fixed point of $\mathcal{B}$ and write $W^\circ=H+T$, $\tilde W^\circ=\tilde H+\tilde T$, where $H,\tilde H\in\mathcal{H}$ and $T,\tilde T\in\mathcal{H}^\perp$. Then

$$ \|H-\tilde H\|^2 + \|T-\tilde T\|^2 = \|W^\circ-\tilde W^\circ\|^2 = \|\mathcal{B}(W^\circ)-\mathcal{B}(\tilde W^\circ)\|^2 = \big\|S_s(qF+T)-S_s(qF+\tilde T)\big\|^2 \le \big\|(qF+T)-(qF+\tilde T)\big\|^2 = \|T-\tilde T\|^2, $$

where the last inequality is due to (12) and the fact that $s_{\tau,q}$ has Lipschitz constant equal to one. But then

$$ \|H-\tilde H\|^2 \le \|T-\tilde T\|^2 - \|T-\tilde T\|^2 = 0, $$

implying that $H=\tilde H$.

Finally, if $(A^\circ,B^\circ)$ denotes the solution to (19), then it is clear that $A^\circ$ solves (29). By the theory in Section 2.1 and Theorem 4.3, it follows that $(A^\circ,B^\circ)$ also is a solution to

$$ \mathop{\mathrm{argmin}}_{A,B} \; R_\tau(A) + p\|A-B\|^2 + q\|B-F\|^2 - \big\langle A,\,2p(P_{\mathcal{H}}(W^\circ)-F)+2P_{\mathcal{H}^\perp}(W^\circ)\big\rangle - \big\langle B,\,-2p(P_{\mathcal{H}}(W^\circ)-F)\big\rangle, \qquad (35) $$

where $W^\circ$ is as above. By fixing $A^\circ$ we find that

$$ B^\circ = \mathop{\mathrm{argmin}}_B \; p\|A^\circ-B\|^2 + q\|B-F\|^2 - \big\langle B,\,-2p(P_{\mathcal{H}}(W^\circ)-F)\big\rangle = F + \frac{1}{q}\big(A^\circ - P_{\mathcal{H}}(W^\circ)\big). \qquad (36) $$

But inserting the constraint $A^\circ=B^\circ$ (from (19)) in the above equation yields

$$ (q-1)A^\circ = qF - P_{\mathcal{H}}(W^\circ), \qquad (37) $$

as desired. The uniqueness of $A^\circ$ was argued before the proof, so the proof is complete.

The matrix $W^\circ$ has several peculiar properties, as shown by the next theorem (compare with (29), and note the absence of the condition $A\in\mathcal{H}$ below).

Theorem 5.2. Let $W^\circ$ and $A^\circ$ be as in Theorem 5.1. Then $A^\circ$ solves

$$ \mathop{\mathrm{argmin}}_{A} \; R_\tau(A) + \|A-W^\circ\|^2. $$

Moreover, if $W^\circ=U\,\mathrm{diag}\big(\sigma_j(W^\circ)\big)\,V^*$, then $A^\circ=U\,\mathrm{diag}\big(\sigma_j(A^\circ)\big)\,V^*$, where

$$ \begin{cases} \sigma_j(A^\circ)=\sigma_j(W^\circ) & \text{if } \sigma_j(W^\circ)>\tau,\\ 0\le\sigma_j(A^\circ)\le\sigma_j(W^\circ) & \text{if } \sigma_j(W^\circ)=\tau,\\ \sigma_j(A^\circ)=0 & \text{if } \sigma_j(W^\circ)<\tau. \end{cases} \qquad (38) $$

In particular, if $W^\circ$ has no singular values equal to $\tau$, then $A^\circ$ is the solution to the non-convex problem (30).

Proof. By (36), $A^\circ$ minimizes (35) restricted to the subspace $B=F+\frac{1}{q}\big(A-P_{\mathcal{H}}(W^\circ)\big)$. By inserting this expression for $B$ in (35) and by a completion of squares, we get

$$ A^\circ = \mathop{\mathrm{argmin}}_{A} \; R_\tau(A) + \|A-W^\circ\|^2. $$