A Penalty Method for Rank Minimization Problems in Symmetric Matrices∗

Xin Shen†   John E. Mitchell‡

January 13, 2017

Abstract

The problem of minimizing the rank of a symmetric positive semidefinite matrix subject to constraints can be cast equivalently as a semidefinite program with complementarity constraints (SDCMPCC). The formulation requires two positive semidefinite matrices to be complementary. We investigate calmness of locally optimal solutions to the SDCMPCC formulation and hence show that any locally optimal solution is a KKT point.

We develop a penalty formulation of the problem. We present calmness results for locally optimal solutions to the penalty formulation. We also develop a proximal alternating linearized minimization (PALM) scheme for the penalty formulation, and investigate the incorporation of a momentum term into the algorithm. Computational results are presented.

Keywords: Rank minimization, penalty methods, alternating minimization
AMS Classification: 90C33, 90C53

1 Introduction to the Rank Minimization Problem

Recently rank constrained optimization problems have received increasing interest because of their wide application in many fields including statistics, communication and signal processing [10, 36]. In this paper we mainly consider one genre of these problems, whose objective is to minimize the rank of a matrix subject to a given set of constraints. The problem has the following form:

    minimize_{X ∈ R^{m×n}}  rank(X)
    subject to              X ∈ C                                          (1)

∗This work was supported in part by the Air Force Office of Sponsored Research under grants FA9550-08-1-0081 and FA9550-11-1-0260 and by the National Science Foundation under Grant Number CMMI-1334327.
†Monsanto, St. Louis, MO.
‡Department of Mathematical Sciences, Rensselaer Polytechnic Institute, Troy, NY 12180 ([email protected], http://www.rpi.edu/~mitchj).

where R^{m×n} is the space of m×n matrices, and C is the feasible region for X.
This class of problems is considered computationally challenging because of its nonconvex nature. The rank function is also highly discontinuous, which makes rank minimization problems hard to solve. Many methods have been developed to attack the problem, including nuclear norm approximation [10, 24, 26, 5]. The nuclear norm of a matrix X ∈ R^{m×n} is defined as the sum of its singular values:

    ‖X‖_* = ∑_i σ_i = trace(√(XᵀX)).

In the approximated problem, the objective is to find a matrix with the minimal nuclear norm:

    minimize_{X ∈ R^{m×n}}  ‖X‖_*
    subject to              X ∈ C                                          (2)

The nuclear norm is convex and continuous. Many algorithms have been developed to find the optimal solution to the nuclear norm minimization problem, including interior point methods [24], singular value thresholding [5], an augmented Lagrangian method [22], a proximal gradient method [23], a subspace selection method [15], reweighting methods [28], and so on. These methods have been shown to be efficient and robust in solving large scale nuclear norm minimization problems in some applications. Previous work provided some explanation for the good performance of the convex approximation by showing that nuclear norm minimization and rank minimization are equivalent under certain assumptions. Recht [33] presented a version of the restricted isometry property for a rank minimization problem; under such a property the solution to the original rank minimization problem can be exactly recovered by solving the nuclear norm minimization problem. However, these properties are strong and hard to validate, and the equivalence result cannot be extended to the general case. Zhang et al. [46] gave a counterexample in which the nuclear norm fails to find the matrix with the minimal rank.

In this paper, we focus on the case of symmetric matrices X. Let S^n denote the set of symmetric n×n matrices, and S^n_+ denote the cone of n×n symmetric positive semidefinite matrices.
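As an aside (our addition, not part of the paper), the two expressions above for the nuclear norm — the sum of singular values and trace(√(XᵀX)) — can be checked against each other numerically. A minimal NumPy sketch:

```python
import numpy as np

def nuclear_norm(X):
    """Nuclear norm ||X||_* = sum of the singular values of X."""
    return np.linalg.svd(X, compute_uv=False).sum()

# Check agreement with trace(sqrt(X^T X)) on a random matrix.
rng = np.random.default_rng(0)
X = rng.standard_normal((5, 3))
# eigenvalues of X^T X are the squared singular values of X
w = np.linalg.eigvalsh(X.T @ X)
trace_sqrt = np.sqrt(np.clip(w, 0.0, None)).sum()
assert abs(nuclear_norm(X) - trace_sqrt) < 1e-8
```

The `clip` guards against tiny negative eigenvalues produced by floating-point roundoff before the square root is taken.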
The set C is taken to be the intersection of S^n_+ with another convex set, taken to be an affine manifold in our computational testing.

To improve the performance of the nuclear norm minimization scheme in the case of symmetric positive semidefinite matrices, a reweighted nuclear norm heuristic was put forward by Mohan et al. [27]. In each iteration of the heuristic a reweighted nuclear norm minimization problem is solved, which takes the form:

    minimize_{X ∈ S^n}  ⟨W, X⟩
    subject to          X ∈ C ∩ S^n_+                                      (3)

where W is a positive semidefinite matrix, with W based upon the result of the last iteration. As with standard nuclear norm minimization, the method only applies to problems with special structure. The lack of theoretical guarantees for these convex approximations in general problems motivates us to turn to the exact formulation of the rank function, which can be constructed as a mathematical program with semidefinite cone complementarity constraints (SDCMPCC).

Methods using nonconvex optimization to solve rank minimization problems include [17, 20, 37, 38]. In contrast to the method in this paper, these references work with an explicit low rank factorization of the matrix of interest. Other methods based on a low-rank factorization include the thresholding methods [5, 6, 40, 41].

Similar to the LPCC formulation for the ℓ0 minimization problem [4, 11], the advantage of the SDCMPCC formulation is that it can be expressed as a smooth nonlinear program, so it can be solved by general nonlinear programming algorithms. The purpose of this paper is to investigate whether nonlinear semidefinite programming algorithms can be applied to solve the SDCMPCC formulation and to examine the quality of the solutions returned by the nonlinear algorithms. We are faced with two challenges. The first is the nonconvexity of the SDCMPCC formulation, which means that we can only assure that the solutions we find are locally optimal.
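To make the reweighting step concrete (our sketch, not taken from Mohan et al. [27] — the particular weight formula W = (X_prev + γI)⁻¹ is an assumption here, one common choice that down-weights directions where X_prev already has large eigenvalues):

```python
import numpy as np

def reweight(X_prev, gamma=1e-2):
    """One plausible weight update for the reweighted nuclear norm
    heuristic (assumed form, not verbatim from [27]):
    W = (X_prev + gamma*I)^{-1}, a PSD weight when X_prev is PSD."""
    n = X_prev.shape[0]
    return np.linalg.inv(X_prev + gamma * np.eye(n))
```

Each iteration would then solve the weighted SDP (3) with this W by any semidefinite programming solver; the smoothing parameter γ keeps the inverse well defined when X_prev is singular.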
The second is that most nonlinear algorithms use the KKT conditions as their termination criterion. Since a general SDCMPCC formulation is not well-posed because of the complementarity constraints — i.e., KKT stationarity may not hold at local optima — there might be some difficulties with the convergence of these algorithms. We show in Theorem 2 that any locally optimal point for the SDCMPCC formulation of the rank minimization problem does indeed satisfy the KKT conditions.

2 Semidefinite Cone Complementarity Formulation for Rank Minimization Problems

A mathematical program with semidefinite cone complementarity constraints (SDCMPCC) is a special case of a mathematical program with complementarity constraints (MPCC). In SDCMPCC problems the constraints include complementarity between matrices rather than vectors. When the complementarity between matrices is replaced by complementarity between vectors, the problem turns into a standard MPCC. The general SDCMPCC program takes the following form:

    minimize_{x ∈ R^q}  f(x)
    subject to          g(x) ≤ 0
                        h(x) = 0                                           (4)
                        S^n_+ ∋ G(x) ⊥ H(x) ∈ S^n_+

where f : R^q → R, h : R^q → R^p, g : R^q → R^m, G : R^q → S^n and H : R^q → S^n. The requirement G(x) ⊥ H(x) for G(x), H(x) ∈ S^n_+ is that the Frobenius inner product of G(x) and H(x) is equal to 0, where the Frobenius inner product of two matrices A ∈ R^{m×n} and B ∈ R^{m×n} is defined as ⟨A, B⟩ = trace(AᵀB).

We define

    c(x) := ⟨G(x), H(x)⟩.                                                  (5)

It is shown in Bai et al. [2] that (4) can be reformulated as a convex conic completely positive optimization problem. However, the cone in the completely positive formulation does not have a polynomial-time separation oracle.

An SDCMPCC can be written as a nonlinear semidefinite program. Nonlinear semidefinite programming recently received much attention because of its wide applicability.
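For concreteness (our addition, not from the paper), the Frobenius inner product and the complementarity residual c(x) = ⟨G(x), H(x)⟩ are a few lines of NumPy; for PSD matrices, a zero inner product forces the matrix product itself to vanish:

```python
import numpy as np

def frobenius_inner(A, B):
    """Frobenius inner product <A, B> = trace(A^T B)."""
    return float(np.trace(A.T @ B))

# For two PSD matrices, <G, H> = 0 implies G H = 0 (complementarity).
G = np.diag([1.0, 0.0])
H = np.diag([0.0, 2.0])
assert frobenius_inner(G, H) == 0.0
assert np.allclose(G @ H, 0.0)
```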
Yamashita [43] surveyed numerical methods for solving nonlinear SDP programs, including augmented Lagrangian methods, sequential SDP methods and primal-dual interior point methods. However, there is still much room for research in both the theory and practice of solution methods, especially when the size of the problem becomes large.

An SDCMPCC is a special case of a nonlinear SDP program. It is hard to solve in general. In addition to the difficulties in general nonlinear semidefinite programming, the complementarity constraints pose challenges to finding locally optimal solutions, since the KKT conditions may not hold at local optima. Previous work showed that optimality conditions for MPCC, such as M-stationarity, C-stationarity and strong stationarity, can be generalized to the class of SDCMPCC. Ding et al. [8] discussed various kinds of first order optimality conditions of an SDCMPCC and their relationships with each other.

An exact reformulation of the rank minimization problem using semidefinite cone constraints is due to Ding et al. [8]. We begin with a special case of (1), in which the matrix variable X ∈ R^{n×n} is restricted to be symmetric and positive semidefinite. The special case takes the form:

    minimize_{X ∈ S^n}  rank(X)
    subject to          X ∈ C̃                                              (6)
                        X ⪰ 0

By introducing an auxiliary variable U ∈ R^{n×n}, we can model Problem (6) as a mathematical program with semidefinite cone complementarity constraints:

    minimize_{X ∈ S^n}  n − ⟨I, U⟩
    subject to          X ∈ C̃
                        0 ⪯ X ⊥ U ⪰ 0                                      (7)
                        0 ⪯ I − U
                        X ⪰ 0,  U ⪰ 0

The equivalence between Problem (6) and Problem (7) can be verified by a proper assignment of U for given feasible X. Suppose X has the eigenvalue decomposition:

    X = PᵀΣP                                                               (8)

Let P₀ be the matrix composed of the columns of P corresponding to zero eigenvalues. We can set:

    U = P₀P₀ᵀ                                                              (9)

It is obvious that

    rank(X) = n − ⟨I, U⟩                                                   (10)

It follows that Opt(6) ≥ Opt(7). The opposite direction of the above inequality can easily be validated from the complementarity constraints.
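The construction (9) and the identity (10) are easy to verify numerically (our sketch, not part of the paper): U is the orthogonal projector onto the null space of X, so it satisfies the complementarity and bound constraints of (7) and counts the zero eigenvalues of X.

```python
import numpy as np

def complementary_U(X, tol=1e-8):
    """Given symmetric PSD X, build U = P0 P0^T from the eigenvectors of X
    with (numerically) zero eigenvalues, as in (9)."""
    w, P = np.linalg.eigh(X)           # columns of P are eigenvectors
    P0 = P[:, np.abs(w) <= tol]
    return P0 @ P0.T

# Example: a rank-2 PSD matrix in S^4.
n = 4
B = np.array([[1.0, 0.0], [2.0, 1.0], [0.0, 3.0], [1.0, 1.0]])
X = B @ B.T                            # PSD, rank 2
U = complementary_U(X)
assert abs((n - np.trace(U)) - np.linalg.matrix_rank(X)) < 1e-8   # (10)
assert abs(np.trace(X @ U)) < 1e-8                                # X ⊥ U
evals = np.linalg.eigvalsh(U)
assert evals.min() > -1e-10 and evals.max() < 1 + 1e-10           # 0 ⪯ U ⪯ I
```

The tolerance `tol` separating zero from nonzero eigenvalues is a numerical device; in the paper the split is exact.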
If there existed a feasible U with rank(U) greater than n − Opt(6), the complementarity constraints would be violated.

The complementarity formulation can be extended to cases where the matrix variable X ∈ R^{m×n} is neither positive semidefinite nor symmetric. One way to deal with nonsymmetric X is to introduce an auxiliary variable Z:

    Z = [ G   Xᵀ
          X   B  ] ⪰ 0

Liu et al. [24] have shown that for any matrix X, we can find matrices G and B such that Z ⪰ 0 and rank(Z) = rank(X). The objective is then to minimize the rank of the matrix Z instead of X. A drawback of the above extension is that it might introduce many additional variables. An alternative is to modify the complementarity constraint. If m > n, the rank of the matrix X is bounded by n and equals the rank of the matrix XᵀX ∈ S^n_+. XᵀX is both symmetric and positive semidefinite, and we impose the following constraint:

    ⟨U, XᵀX⟩ = 0

where U ∈ S^n. The objective is to minimize the rank of XᵀX instead, or equivalently to minimize n − ⟨I, U⟩.

3 Constraint Qualification of the SDCMPCC Formulation

SDCMPCC problems are generally hard to solve and there have been discussions of potential methods to solve them [42, 47], including relaxation and penalty methods. The original SDCMPCC formulation and all its variations fall into the genre of nonlinear semidefinite programming. There has been some previous work on algorithms for solving nonlinear semidefinite programming problems, including an augmented Lagrangian method and a proximal gradient method. Most existing algorithms use the KKT conditions as criteria for checking local optimality, and they terminate at KKT stationary points. The validity of the KKT conditions at local optima can be guaranteed by a constraint qualification. However, as pointed out in [8], common constraint qualifications such as the LICQ and the Robinson CQ are violated for SDCMPCC.
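Returning briefly to the nonsymmetric embedding above: the existence result of Liu et al. [24] can be realized concretely. The particular choice below (G = VSVᵀ and B = USUᵀ from the SVD X = USVᵀ) is one standard construction, assumed here rather than quoted from [24]; with it, Z = [V; U] S [V; U]ᵀ is PSD with the same rank as X.

```python
import numpy as np

def psd_embedding(X):
    """Build Z = [[G, X^T], [X, B]] PSD with rank(Z) = rank(X),
    using G = V S V^T, B = U S U^T from the thin SVD X = U S V^T
    (an assumed, standard choice)."""
    U, s, Vt = np.linalg.svd(X, full_matrices=False)
    W = np.vstack([Vt.T, U])           # stack so Z = W diag(s) W^T
    return W @ np.diag(s) @ W.T

X = np.array([[1.0, 2.0], [0.0, 1.0], [3.0, 0.0]])   # 3x2, rank 2
Z = psd_embedding(X)
assert np.linalg.matrix_rank(Z) == np.linalg.matrix_rank(X)
assert np.linalg.eigvalsh(Z).min() > -1e-10           # Z is PSD
assert np.allclose(Z[2:, :2], X)                      # lower-left block is X
```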
The question arises as to whether any constraint qualification holds for the SDCMPCC formulation of a rank minimization problem. In this section we show that a constraint qualification called calmness holds at any local optimum of the SDCMPCC formulation.

3.1 Introduction of Calmness

Calmness was first defined by Clarke [7]. If calmness holds then a first order KKT necessary condition holds at a local minimizer. Thus, calmness plays the role of a constraint qualification, although it involves both the objective function and the constraints. It has been discussed in the context of conic optimization problems in [16, 44, 45], in addition to Ding et al. [8]. Here, we give the definition from [7], adapted to our setting.

Definition 1. Let K ⊆ R^n be a convex cone. Let f : R^q → R, h : R^q → R^p, g : R^q → R^m, and G : R^q → R^n be continuous functions. A feasible point x̄ of the conic optimization problem

    minimize_{x ∈ R^q}  f(x)
    subject to          g(x) ≤ 0
                        h(x) = 0
                        G(x) ∈ K

is Clarke calm if there exist positive ε and μ such that

    f(x) − f(x̄) + μ‖(r, s, P)‖ ≥ 0

whenever ‖(r, s, P)‖ ≤ ε, ‖x − x̄‖ ≤ ε, and x satisfies the following conditions:

    h(x) + r = 0,   g(x) + s ≤ 0,   G(x) + P ∈ K.

The idea of calmness is that when there is a small perturbation of the constraints, the improvement in the objective value in a neighborhood of x̄ must be bounded by some constant times the magnitude of the perturbation.

Theorem 1. [8] If calmness holds at a local minimizer x̄ of (4) then the following first order necessary KKT conditions hold at x̄: there exist multipliers λ^h ∈ R^p, λ^g ∈ R^m, Ω^G ∈ S^n_+, Ω^H ∈ S^n_+, and λ^c ∈ R such that the subdifferentials of the constraints and objective function of (4) satisfy

    0 ∈ ∂f(x̄) + ∂⟨h, λ^h⟩(x̄) + ∂⟨g, λ^g⟩(x̄) + ∂⟨G, Ω^G⟩(x̄) + ∂⟨H, Ω^H⟩(x̄) + λ^c ∂c(x̄),
    λ^g ≥ 0,   ⟨g(x̄), λ^g⟩ = 0,
    Ω^G ∈ S^n_+,   Ω^H ∈ S^n_+,
    ⟨Ω^G, G(x̄)⟩ = 0,   ⟨Ω^H, H(x̄)⟩ = 0.
In the framework of general nonlinear programming, previous results [25] show that the Mangasarian-Fromowitz constraint qualification (MFCQ) and the constant-rank constraint qualification (CRCQ) imply local calmness. When all the constraints are linear, the CRCQ holds. However, in the case of SDCMPCC, calmness may not hold at locally optimal points. Linear semidefinite programs are a special case of SDCMPCC: take H(x) identically equal to the zero matrix. Even in this case, calmness may not hold. For linear SDP, the optimality conditions in Theorem 1 correspond to primal and dual feasibility together with complementary slackness, so for example any linear SDP which has a duality gap will not satisfy calmness. Consider the example below, where we show explicitly that calmness does not hold:

    minimize_{x₁,x₂}  x₂

    s.t.  G(x) = [ x₂+1   0    0
                    0     x₁   x₂                                          (11)
                    0     x₂   0  ] ⪰ 0

It is trivial to see that any point (x₁, 0) with x₁ ≥ 0 is a globally optimal point for the problem. However:

Proposition 1. Calmness does not hold at any point (x₁, 0) with x₁ ≥ 0.

Proof. We omit the case x₁ > 0 and only show the proof for the case x₁ = 0. Take

    x₁ = δ  and  x₂ = −δ².

As δ → 0, we can find a matrix

    M = [ 1−δ²   0     0
           0     δ    −δ²
           0    −δ²    δ³ ] ⪰ 0

in the semidefinite cone with ‖G(δ, −δ²) − M‖ = δ³. However, the objective value at (δ, −δ²) is −δ². Thus we have:

    ( f(x₁, x₂) − f(0, 0) ) / ‖G(δ, −δ²) − M‖ = −δ² / δ³ → −∞

as δ → 0. It follows that calmness does not hold at the point (0, 0), since μ would have to be unbounded.

3.2 Calmness of the SDCMPCC Formulation

In this part, we show that in Problem (7), calmness holds for each pair (X, U) with X feasible and U given by (9).

Proposition 2. Each (X, U) with X feasible and U given by (9) is a locally optimal solution of Problem (7).

Proof. The proposition follows from the fact that rank(X′) ≥ rank(X) for all X′ close enough to X.

Proposition 3.
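The blow-up in the proof of Proposition 1 can be observed numerically (our sketch, not part of the paper): along x = (δ, −δ²) the objective drops by δ² while the constraint perturbation used in the proof has size δ³, so the ratio grows like 1/δ.

```python
import numpy as np

def G_mat(x1, x2):
    """The constraint matrix G(x) from example (11)."""
    return np.array([[x2 + 1.0, 0.0, 0.0],
                     [0.0,      x1,  x2 ],
                     [0.0,      x2,  0.0]])

for delta in [1e-1, 1e-2, 1e-3]:
    # M from the proof: PSD and within delta^3 of G(delta, -delta^2).
    M = np.array([[1 - delta**2, 0.0,        0.0      ],
                  [0.0,          delta,     -delta**2 ],
                  [0.0,         -delta**2,   delta**3 ]])
    assert np.linalg.eigvalsh(M).min() > -1e-12          # M is PSD
    pert = np.linalg.norm(G_mat(delta, -delta**2) - M)   # equals delta^3
    assert abs(pert - delta**3) < 1e-12
    ratio = -delta**2 / pert                             # = -1/delta
    assert ratio <= -1.0 / (2 * delta)                   # unbounded below
```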
For any feasible (X, U) in Problem (7) with X feasible and U given by (9), let (X̂, Û) be a feasible point of the optimization problem below:

    minimize_{X̂,Û ∈ S^n}  n − ⟨I, Û⟩
    subject to             X̂ + p ∈ C̃
                           |⟨X̂, Û⟩| ≤ q                                    (12)
                           λ_min(I − Û) ≥ −r
                           λ_min(X̂) ≥ −h₁
                           λ_min(Û) ≥ −h₂

where p, q, r, h₁ and h₂ are perturbations to the constraints and λ_min(M) denotes the minimum eigenvalue of the matrix M. Assume X has at least one positive eigenvalue. For ‖(p, q, r, h₁, h₂)‖, ‖X̂ − X‖, and ‖Û − U‖ all sufficiently small, we have

    ⟨I, U⟩ − ⟨I, Û⟩ ≥ −(2q)/λ̃ − (n − rank(X)) ( r + (1 + r)(2h₁)/λ̃ )
                      − h₂ ( (4‖X‖_*)/λ̃ − rank(X) )                        (13)

where ‖X‖_* is the nuclear norm of X and λ̃ is the smallest positive eigenvalue of X.

Proof. The general scheme is to determine a lower bound for ⟨I, U⟩ and an upper bound for ⟨I, Û⟩. A lower bound on ⟨I, U⟩ can easily be found by exploiting the complementarity constraints, and its value is n − rank(X). To find the upper bound on ⟨I, Û⟩, the approach we take is to fix X̂ in Problem (12) and estimate a lower bound for the objective value of the following problem:

    minimize_{Ũ ∈ S^n}  n − ⟨I, Ũ⟩
    subject to          −⟨X̂, Ũ⟩ ≤ q,        (y₁)
                         ⟨X̂, Ũ⟩ ≤ q,        (y₂)                           (14)
                         I − Ũ ⪰ −rI,       (Ω₁)
                         Ũ ⪰ −h₂I,          (Ω₂)

where y₁, y₂, Ω₁ and Ω₂ are the Lagrangian multipliers for the corresponding constraints. It is obvious that Û must be feasible for Problem (14).
We find an upper bound for ⟨I, Û⟩ by finding a feasible solution to the dual problem of Problem (14), which is:

    maximize_{y₁,y₂ ∈ R, Ω₁,Ω₂ ∈ S^n}  n + q y₁ + q y₂ − (1 + r) trace(Ω₁) − h₂ trace(Ω₂)
    subject to   −y₁ X̂ + y₂ X̂ − Ω₁ + Ω₂ = −I                               (15)
                 y₁, y₂ ≤ 0
                 Ω₁, Ω₂ ⪰ 0

We can find a lower bound on the dual objective value by looking at a tightened version, which is established by diagonalizing X̂ by a linear transformation and restricting the off-diagonal terms of Ω₁ and Ω₂ to be 0. Let {f_i}, {g_i} be the entries on the diagonals of Ω₁ and Ω₂ after the transformation, respectively, and let {λ̂_i} be the eigenvalues of X̂. The tightened problem is:

    maximize_{y₁,y₂ ∈ R, f,g ∈ R^n}  n + q y₁ + q y₂ − (1 + r) ∑_i f_i − h₂ ∑_i g_i
    subject to   −y₁ λ̂_i + y₂ λ̂_i − f_i + g_i = −1,  ∀i = 1, …, n           (16)
                 y₁, y₂ ≤ 0
                 f_i, g_i ≥ 0,  ∀i = 1, …, n

By a proper assignment of the values of y₁, y₂, f, g, we can construct a feasible solution to the tightened problem and give a lower bound for the optimal objective of the dual problem. Let {λ_i} be the set of eigenvalues of X, with λ̃ the smallest positive eigenvalue, and set:

    y₁ = 0  and  y₂ = −2/λ̃.

For f and g:

• if λ̂_i < λ̃/2, take f_i = 1 + y₂ λ̂_i and g_i = 0;
• if λ̂_i ≥ λ̃/2, take f_i = 0 and g_i = (2/λ̃) λ̂_i − 1.

It is trivial to see that the above assignment yields a feasible solution to Problem (16), and hence a lower bound for the dual objective is:

    n − (2q)/λ̃ − (1 + r) ∑_{λ̂_i < λ̃/2} (1 + y₂ λ̂_i) − h₂ ∑_{λ̂_i ≥ λ̃/2} ( (2/λ̃) λ̂_i − 1 )    (17)

By weak duality the primal objective value must be greater than or equal to the dual objective value, thus:

    n − ⟨I, Û⟩ ≥ n − (2q)/λ̃ − (1 + r) ∑_{λ̂_i < λ̃/2} (1 + y₂ λ̂_i) − h₂ ∑_{λ̂_i ≥ λ̃/2} ( (2/λ̃) λ̂_i − 1 ).

Since we can write

    n − ⟨I, U⟩ = n − ∑_{λ_i = 0} 1,

it follows that for ‖Û − U‖ sufficiently small we have

    (n − ⟨I, Û⟩) − (n − ⟨I, U⟩)
        ≥ n − (2q)/λ̃ − ∑_{λ̂_i < λ̃/2} (1 + r)(1 + y₂ λ̂_i)
          − h₂ ∑_{λ̂_i ≥ λ̃/2} ( (2/λ̃) λ̂_i − 1 ) − ( n − ∑_{λ_i = 0} 1 )
        = −(2q)/λ̃ − ∑_{λ̂_i < λ̃/2} ( r + (1 + r) y₂ λ̂_i )                   (18)
          − h₂ ∑_{λ̂_i ≥ λ̃/2} ( (2/λ̃) λ̂_i − 1 ).

For λ̂_i < λ̃/2, by the constraint λ̂_i ≥ −h₁ and the setting y₂ = −2/λ̃, we have:

    r + (1 + r) y₂ λ̂_i ≤ r + (1 + r)(2h₁)/λ̃.

For λ̂_i ≥ λ̃/2, recall the definition of the nuclear norm; we have

    ∑_{λ̂_i ≥ λ̃/2} λ̂_i ≤ 2‖X‖_*

for ‖X̂ − X‖ sufficiently small. Since there are exactly n − rank(X) eigenvalues of X̂ that converge to 0, we can simplify inequality (18) and obtain:

    (n − ⟨I, Û⟩) − (n − ⟨I, U⟩) ≥ −(2q)/λ̃ − (n − rank(X)) ( r + (1 + r)(2h₁)/λ̃ )
                                  − h₂ ( (4‖X‖_*)/λ̃ − rank(X) ).           (19)

Thus we can prove the inequality.

There is one case that is not covered by Proposition 3, namely X = 0. This point is also calm, as we show in the next lemma.

Lemma 1. Assume X = 0 is feasible in (7), with U given by (9). Let (X̂, Û) be a feasible point of (12). We have

    (n − ⟨I, Û⟩) − (n − ⟨I, U⟩) ≥ −nr.

Proof. Note that ⟨I, U⟩ = n, since X = 0 and U satisfies (9). In addition, each eigenvalue of Û is no larger than 1 + r, so the result follows.
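As a numerical sanity check (ours, not part of the paper), the dual assignment used in the proof of Proposition 3 — y₁ = 0, y₂ = −2/λ̃, with the two cases for f_i and g_i — can be verified to satisfy the constraints of the tightened problem (16):

```python
import numpy as np

def tightened_feasible(lam_hat, lam_tilde):
    """Dual assignment from the proof of Proposition 3:
    y1 = 0, y2 = -2/lam_tilde,
    f_i = 1 + y2*lam_hat_i  when lam_hat_i <  lam_tilde/2 (else 0),
    g_i = -(1 + y2*lam_hat_i) when lam_hat_i >= lam_tilde/2 (else 0)."""
    y1, y2 = 0.0, -2.0 / lam_tilde
    f = np.where(lam_hat < lam_tilde / 2, 1.0 + y2 * lam_hat, 0.0)
    g = np.where(lam_hat >= lam_tilde / 2, -(1.0 + y2 * lam_hat), 0.0)
    # feasibility for (16): -y1*l + y2*l - f + g = -1 with f, g >= 0
    assert np.allclose(-y1 * lam_hat + y2 * lam_hat - f + g, -1.0)
    assert (f >= -1e-12).all() and (g >= -1e-12).all()
    return y1, y2, f, g

# Perturbed eigenvalues lam_hat of X_hat; lam_tilde is the smallest
# positive eigenvalue of X (values here are illustrative).
lam_hat = np.array([1e-4, 0.0, 0.7, 2.0])
lam_tilde = 1.0
y1, y2, f, g = tightened_feasible(lam_hat, lam_tilde)
```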