Dense Error Correction for Low-Rank Matrices via Principal Component Pursuit

Arvind Ganesh†, John Wright∗, Xiaodong Li‡, Emmanuel J. Candès‡,§ and Yi Ma∗,†
∗ Microsoft Research Asia, Beijing, P.R.C.
† Dept. of Electrical and Computer Engineering, UIUC, Urbana, IL 61801
‡ Dept. of Mathematics, Stanford University, Stanford, CA 94305
§ Dept. of Statistics, Stanford University, Stanford, CA 94305

arXiv:1001.2362v2 [cs.IT] 22 Jan 2010

Abstract—We consider the problem of recovering a low-rank matrix when some of its entries, whose locations are not known a priori, are corrupted by errors of arbitrarily large magnitude. It has recently been shown that this problem can be solved efficiently and effectively by a convex program named Principal Component Pursuit (PCP), provided that the fraction of corrupted entries and the rank of the matrix are both sufficiently small. In this paper, we extend that result to show that the same convex program, with a slightly improved weighting parameter, exactly recovers the low-rank matrix even if "almost all" of its entries are arbitrarily corrupted, provided the signs of the errors are random. We corroborate our result with simulations on randomly generated matrices and errors.

I. INTRODUCTION

The problem at hand is to recover a low-rank matrix $L_0$ (the principal components) from a corrupted data matrix

$$D = L_0 + S_0,$$

where the entries of $S_0$ can have arbitrary magnitude. Although this problem is intractable (NP-hard) to solve under general conditions, recent studies have discovered that certain convex programs can effectively solve it under surprisingly broad conditions. The work of [6], [7] proposed a convex program to recover low-rank matrices when a fraction of their entries have been corrupted by errors of arbitrary magnitude, i.e., when the matrix $S_0$ is sufficiently sparse.
Low-rank matrix recovery and approximation have been extensively studied lately for their great importance in theory and practice. Low-rank matrices arise in many real data analysis problems when the high-dimensional data of interest lie on a low-dimensional linear subspace. This model has been extensively and successfully used in many diverse areas, including face recognition [1], system identification [2], and information retrieval [3], just to name a few.

Principal Component Analysis (PCA) [4] is arguably the most popular algorithm for computing low-rank approximations to a high-dimensional data matrix. Essentially, PCA solves the following optimization problem:

$$\min_L \|D - L\| \quad \text{s.t.} \quad \mathrm{rank}(L) \le r, \qquad (1)$$

where $D \in \mathbb{R}^{m \times n}$ is the given data matrix and $\|\cdot\|$ denotes the matrix spectral norm. The optimal solution to the above problem is the best rank-$r$ approximation (in an $\ell^2$ sense) to $D$ [5]. Furthermore, PCA offers the optimal solution when the matrix $D$ is corrupted by i.i.d. Gaussian noise.

The approach of [6], dubbed Principal Component Pursuit (PCP), suggests solving the following convex optimization problem:

$$\min_{L,S} \|L\|_* + \lambda \|S\|_1 \quad \text{s.t.} \quad D = L + S, \qquad (2)$$

where $\|\cdot\|_*$ and $\|\cdot\|_1$ denote the matrix nuclear norm (sum of singular values) and 1-norm (sum of absolute values of matrix entries), respectively, and $\lambda > 0$ is a weighting parameter. For square matrices of size $n \times n$, the main result of [6] can be summarized as follows:

    If the singular vectors of $L_0$ are not too coherent with the standard basis, and the support of $S_0$ is random, then solving the convex program (2) with $\lambda = n^{-1/2}$ exactly recovers $L_0$ of rank $O(n/\log^2 n)$ from errors $S_0$ affecting $\rho n^2$ of the entries, where $\rho > 0$ is a sufficiently small positive constant.

In this work, we extend the above result to show that under the same assumptions, (2) recovers low-rank matrices even if the fraction of corrupted entries $\rho$ is arbitrarily close to one.
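As a brief computational aside (our own illustration, not part of the paper; the function name and test matrix are our choices), the classical PCA problem (1) is solved by truncating the SVD:

```python
import numpy as np

def best_rank_r(D, r):
    # Best rank-r approximation of D in the spectral norm (Eckart-Young [5]):
    # keep the r largest singular values and the corresponding singular vectors.
    U, s, Vt = np.linalg.svd(D, full_matrices=False)
    return (U[:, :r] * s[:r]) @ Vt[:r, :]

# A matrix of exact rank 2 is reproduced (up to round-off) at r = 2.
rng = np.random.default_rng(0)
L0 = rng.standard_normal((40, 2)) @ rng.standard_normal((2, 40))
assert np.allclose(best_rank_r(L0, 2), L0)
```

Note that a single large outlier in $D$ can perturb every entry of this estimate, which is exactly the brittleness discussed next.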
This holds provided the signs of the errors are random; equivalently speaking, almost all of the matrix entries can be badly corrupted by random errors. The analysis in this paper is a nontrivial modification of the arguments of [6], and it leads to a better estimate of the weighting parameter $\lambda$ that enables this dense error-correction performance. We verify our result with simulations on randomly generated matrices.

PCA itself can be computed stably and efficiently via the Singular Value Decomposition (SVD). Its major drawback, however, is its brittleness to errors of large magnitude, even if such errors affect only a few entries of the matrix $D$. In fact, a single corrupted entry can throw the low-rank matrix $\hat L$ estimated by PCA arbitrarily far from the true solution. Unfortunately, these kinds of non-Gaussian, gross errors and corruptions are prevalent in modern data. For example, shadows in a face image corrupt only a small part of the image, but the corrupted pixels can be arbitrarily far from their true values in magnitude.

Relations to Existing Results. While [6] has proved that PCP succeeds, with high probability, in recovering $L_0$ and $S_0$ exactly with $\lambda = n^{-1/2}$, that analysis required the fraction of corrupted entries $\rho$ to be small.

II. ASSUMPTIONS AND MAIN RESULT

For convenience of notation, we consider square matrices of size $n \times n$; the results stated here easily extend to non-square matrices.

Assumption A: Incoherence Model for $L_0$. It is clear that for some low-rank and sparse pairs $(L_0, S_0)$, the problem of separating $M = L_0 + S_0$ into the components that generated it is not well-posed, e.g., if $L_0$ is itself a sparse matrix. In both matrix completion and matrix recovery, it has proved fruitful to restrict attention to matrices whose singular vectors are not aligned with the canonical basis. This can be formalized via
the notion of incoherence introduced in [8]. If $L_0 = U \Sigma V^*$ denotes a reduced singular value decomposition of $L_0$, with $U, V \in \mathbb{R}^{n \times r}$ and $\Sigma \in \mathbb{R}^{r \times r}$, then $L_0$ is $\mu$-incoherent if

$$\max_i \|U^* e_i\|^2 \le \frac{\mu r}{n}, \qquad \max_i \|V^* e_i\|^2 \le \frac{\mu r}{n}, \qquad \|U V^*\|_\infty \le \sqrt{\frac{\mu r}{n^2}}, \qquad (3)$$

where the $e_i$'s are the canonical basis vectors in $\mathbb{R}^n$. Here, $\|\cdot\|_\infty$ denotes the matrix $\infty$-norm (maximum absolute value of the matrix entries).

Assumption B: Random Signs and Support for $S_0$. Similarly, it is clear that for some very sparse patterns of corruption, exact recovery is not possible, e.g., if $S_0$ affects an entire row or column of the observation. In [6], such ambiguities are avoided by placing a random model on $\Omega = \mathrm{supp}(S_0)$, which we also adopt. In this model, each entry $(i,j)$ is included in $\Omega$ independently with probability $\rho$.

The new result of this paper shows that, with random error signs, PCP succeeds with $\rho$ arbitrarily close to one. This result also suggests using a slightly modified weighting parameter $\lambda$. Although the new $\lambda$ is of the same order as $n^{-1/2}$, we identify a dependence on $\rho$ that is crucial for correctly recovering $L_0$ when $\rho$ is large.

This dense error correction result is not an isolated phenomenon when dealing with high-dimensional, highly correlated signals. In a sense, this work is inspired by a conceptually similar result for recovering a sparse signal via $\ell^1$ minimization [9]. To summarize: to recover a sparse signal $x$ from corrupted linear measurements $y = Ax + e$, one can solve the convex program $\min \|x\|_1 + \|e\|_1$ s.t. $y = Ax + e$. It has been shown in [9] that if $A$ is sufficiently coherent and $x$ sufficiently sparse, this convex program exactly recovers $x$ even if the fraction of nonzero entries in $e$ approaches one.

The result is also similar in spirit to results on matrix completion [8], [10], [11], which show that under similar incoherence assumptions, low-rank matrices can be recovered from vanishing fractions of their entries.
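As a quick numerical aside (our own illustration, not part of the paper; the function name is hypothetical), the smallest $\mu$ satisfying the three conditions in (3) can be computed directly from a reduced SVD:

```python
import numpy as np

def incoherence_mu(L, r):
    # Smallest mu for which the three incoherence conditions in (3) hold
    # for a rank-r n x n matrix L, computed from a reduced SVD of L.
    n = L.shape[0]
    U, s, Vt = np.linalg.svd(L, full_matrices=False)
    U, V = U[:, :r], Vt[:r, :].T
    mu_u = (n / r) * np.max(np.sum(U * U, axis=1))       # max_i ||U* e_i||^2 <= mu r / n
    mu_v = (n / r) * np.max(np.sum(V * V, axis=1))       # max_i ||V* e_i||^2 <= mu r / n
    mu_uv = (n ** 2 / r) * np.max(np.abs(U @ V.T)) ** 2  # ||U V*||_inf <= sqrt(mu r)/n
    return max(mu_u, mu_v, mu_uv)

# The all-ones matrix is maximally incoherent (mu = 1, the smallest possible
# value), while random low-rank matrices have small mu with high probability.
assert abs(incoherence_mu(np.ones((10, 10)), 1) - 1.0) < 1e-8
rng = np.random.default_rng(1)
M = rng.standard_normal((200, 2)) @ rng.standard_normal((2, 200))
assert 1.0 <= incoherence_mu(M, 2) < 200.0
```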
Recall from Assumption B that each entry $(i,j)$ is included in $\Omega$ independently with probability $\rho$; we say $\Omega \sim \mathrm{Ber}(\rho)$ whenever $\Omega$ is sampled from this distribution. We further introduce a random model for the signs of $S_0$: we assume that for $(i,j) \in \Omega$, $\mathrm{sgn}((S_0)_{ij})$ is an independent random variable taking values $\pm 1$ with probability $1/2$ each. Equivalently, under this model, if $E = \mathrm{sgn}(S_0)$, then

$$E_{ij} = \begin{cases} 1 & \text{w.p. } \rho/2, \\ 0 & \text{w.p. } 1-\rho, \\ -1 & \text{w.p. } \rho/2. \end{cases} \qquad (4)$$

This error model differs from the one assumed in [6], in which the error signs may come from any fixed (even adversarial) $n \times n$ sign pattern. The stronger assumption that the signs are random is necessary for dense error correction.

Our main result states that under the above assumptions and models, PCP corrects large fractions of errors. In fact, provided the dimension $n$ is high enough and the matrix $L_0$ is sufficiently low-rank, $\rho$ can be any constant less than one:

Theorem 1 (Dense Error Correction via PCP). Fix any $\rho < 1$. Suppose that $L_0$ is an $n \times n$ matrix of rank $r$ obeying (3) with incoherence parameter $\mu$, and that the entries of $\mathrm{sgn}(S_0)$ are sampled i.i.d. according to (4). Then as $n$ becomes large,¹ Principal Component Pursuit (2) exactly recovers $(L_0, S_0)$ with high probability, provided

$$\lambda = C_1 \Big( 4\sqrt{1-\rho} + \frac{9}{4} \Big)^{-1} \sqrt{\frac{1-\rho}{n\rho}}, \qquad r < \frac{C_2\, n}{\mu \log^2 n}, \qquad (5)$$

where $0 < C_1 \le 4/5$ and $C_2 > 0$ are certain constants.

¹ For $\rho$ closer to one, the dimension $n$ must be larger; formally, $n > n_0(\rho)$. By "high probability", we mean with probability at least $1 - c\, n^{-\beta}$ for some fixed $\beta > 0$.

In other words, provided the rank of the matrix is of the order of $n/(\mu \log^2 n)$, PCP can recover the matrix exactly even when an arbitrarily large fraction of its entries are corrupted by errors of arbitrary magnitude and the locations of the uncorrupted entries are unknown.

III. MAIN IDEAS OF THE PROOF

The proof of Theorem 1 follows a line of argument similar to that of [6], and is based on the idea of constructing a dual certificate $W$ whose existence certifies the optimality of $(L_0, S_0)$. As in [6], the dual certificate is constructed in two parts via a combination of the "golfing scheme" of David Gross [11] and the method of least squares. However, several details of the construction must be modified to accommodate a large $\rho$.

Before continuing, we fix some notation. Given the compact SVD of $L_0 = U \Sigma V^*$, we let $T \subset \mathbb{R}^{n \times n}$ denote the linear subspace $\{U X^* + Y V^* \mid X, Y \in \mathbb{R}^{n \times r}\}$. By a slight abuse of notation, we also denote by $\Omega$ the linear subspace of matrices whose support is a subset of $\Omega$. We let $P_T$ and $P_\Omega$ denote the projection operators onto $T$ and $\Omega$, respectively.

The following lemma introduces a dual vector that, in turn, ensures that $(L_0, S_0)$ is the unique optimal solution to (2).

Lemma 1 (Dual Certificate). Assume $\lambda < 1 - \alpha$ and $\|P_\Omega P_T\| \le 1 - \epsilon$ for some $\alpha, \epsilon \in (0, 1)$. Then $(L_0, S_0)$ is the unique solution to (2) if there is a pair $(W, F)$ obeying

$$UV^* + W = \lambda\,(\mathrm{sgn}(S_0) + F + P_\Omega D)$$

with $P_T W = 0$ and $\|W\| \le \alpha$, $P_\Omega F = 0$ and $\|F\|_\infty \le \frac{1}{2}$, and $\|P_\Omega D\|_F \le \epsilon^2$.

We prove this lemma in the appendix. Lemma 1 generalizes Lemma 2.5 of [6] as follows:

1) [6] assumes that $\|P_\Omega P_T\| \le 1/2$, whereas we only require that $\|P_\Omega P_T\|$ be bounded away from one. By Lemma 2, the former assumption is justified only for small values of $\rho$ (or for small amounts of corruption).

2) While [6] requires that $\|W\| \le 1/2$, we impose a more general bound on $\|W\|$. We find that a value of $\alpha$ closer to 1 gives a better estimate of $\lambda$.
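For concreteness (a small helper of our own, not from the paper), the weighting parameter in (5) is easy to evaluate; like the $n^{-1/2}$ choice of [6] it decays with dimension, but it also carries the crucial dependence on $\rho$:

```python
import math

def pcp_lambda(n, rho, C1=0.8):
    # lambda = C1 * (4*sqrt(1 - rho) + 9/4)**(-1) * sqrt((1 - rho)/(n*rho)),
    # the weighting parameter of Theorem 1 (with 0 < C1 <= 4/5).
    return C1 * math.sqrt((1 - rho) / (n * rho)) / (4 * math.sqrt(1 - rho) + 9 / 4)

# For fixed n, the parameter shrinks as the corruption fraction rho grows,
# down-weighting the l1 term when almost all entries are corrupted.
assert pcp_lambda(1600, 0.9) < pcp_lambda(1600, 0.5) < pcp_lambda(1600, 0.1)
# For fixed rho it scales like n**-0.5, the same order as in [6].
assert abs(pcp_lambda(6400, 0.5) / pcp_lambda(1600, 0.5) - 0.5) < 1e-12
```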
The proof of Lemma 3 below follows that of Lemma 2.8 of [6] exactly; the only difference is that here we need to use tighter constants that hold for larger $n$. The main tools needed are bounds on the operator norm of $P_{\Omega_j} P_T$ (which follow from Lemma 2), as well as bounds on

$$\|Q - q^{-1} P_T P_{\Omega_j} Q\|_\infty / \|Q\|_\infty, \qquad \|Q - q^{-1} P_{\Omega_j} Q\| / \|Q\|_\infty,$$

for any fixed nonzero $Q$ (which are given by Lemmas 3.1 and 3.2 of [6]). These bounds can be invoked thanks to the independence between the $\Omega_j$'s in the golfing scheme described below. We omit the details here due to limited space and invite the interested reader to consult [6].

Lemma 4. Assume that $\Omega \sim \mathrm{Ber}(\rho)$, and that the signs of $S_0$ are i.i.d. symmetric (and independent of $\Omega$). Then, under the assumptions of Theorem 1, the matrix $W^S$ constructed below obeys, with high probability,
(a) $\|W^S\| < 8/10$,
(b) $\|P_{\Omega^\perp} W^S\|_\infty < \lambda/4$.

See the appendix for the proof details. The proof of this lemma makes heavy use of the randomness in $\mathrm{sgn}(S_0)$, and of the fact that these signs are independent of $\Omega$.

For example, by setting $\alpha = 9/10$, to prove that $(L_0, S_0)$ is the unique optimal solution to (2) it is sufficient to find a dual vector $W$ satisfying

$$P_T W = 0, \qquad \|W\| < \tfrac{9}{10}, \qquad \|P_\Omega(UV^* + W - \lambda\,\mathrm{sgn}(S_0))\|_F \le \lambda \epsilon^2, \qquad \|P_{\Omega^\perp}(UV^* + W)\|_\infty < \tfrac{\lambda}{2}, \qquad (6)$$

assuming that $\|P_\Omega P_T\| \le 1 - \epsilon$ and $\lambda < 1/10$.

We construct such a dual certificate in two parts, $W = W^L + W^S$, using a variation of the golfing scheme [11] presented in [6].

1) Construction of $W^L$ using the golfing scheme. The golfing scheme writes $\Omega^c = \cup_{j=1}^{j_0} \Omega_j$, where the $\Omega_j \subseteq [n] \times [n]$ are independent $\mathrm{Ber}(q)$, with $q$ chosen so that $(1-q)^{j_0} = \rho$.² The choice of $q$ ensures that indeed $\Omega \sim \mathrm{Ber}(\rho)$, while the independence of the $\Omega_j$'s allows a simple analysis of the following iterative construction: starting with $Y_0 = 0$, we iteratively define

$$Y_j = Y_{j-1} + q^{-1} P_{\Omega_j} P_T (UV^* - Y_{j-1}),$$

and set

$$W^L = P_{T^\perp} Y_{j_0}. \qquad (7)$$

² The value of $j_0$ is specified in Lemma 3.
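As a toy numerical illustration of (7) (our own sketch, not the authors' code: the random tangent space, dimensions, and tolerances are all our choices), $P_T Y_{j_0}$ converges toward $UV^*$ over the $j_0$ rounds, while $W^L = P_{T^\perp} Y_{j_0}$ lies in $T^\perp$ by construction:

```python
import numpy as np

rng = np.random.default_rng(2)
n, r, rho = 120, 2, 0.3
U = np.linalg.qr(rng.standard_normal((n, r)))[0]
V = np.linalg.qr(rng.standard_normal((n, r)))[0]

def P_T(M):
    # Projection onto T = {U X* + Y V*}: U U^T M + M V V^T - U U^T M V V^T.
    UUtM = U @ (U.T @ M)
    return UUtM + (M @ V) @ V.T - (UUtM @ V) @ V.T

j0 = 2 * int(np.ceil(np.log(n)))     # as in Lemma 3
q = 1.0 - rho ** (1.0 / j0)          # so that (1 - q)**j0 = rho
Y = np.zeros((n, n))
for _ in range(j0):
    Omega_j = rng.random((n, n)) < q  # Omega_j ~ Ber(q), independent across j
    Y = Y + (1.0 / q) * Omega_j * P_T(U @ V.T - Y)

W_L = Y - P_T(Y)                      # W^L = P_{T-perp} Y_{j0}
assert np.linalg.norm(P_T(W_L)) < 1e-8   # W^L indeed lies in T-perp
# On T, Y_{j0} tracks U V^*: the residual shrinks markedly over j0 rounds.
assert np.linalg.norm(P_T(Y) - U @ V.T) < 0.5 * np.linalg.norm(U @ V.T)
```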
2) Construction of $W^S$ using least squares. We set

$$W^S = \arg\min_Q \|Q\|_F \quad \text{s.t.} \quad P_\Omega Q = \lambda\,\mathrm{sgn}(S_0), \quad P_T Q = 0.$$

Since $\|P_\Omega P_T P_\Omega\| = \|P_\Omega P_T\|^2 < 1$, it is not difficult to show that the solution is given by the Neumann series

$$W^S = \lambda\, P_{T^\perp} \sum_{k \ge 0} (P_\Omega P_T P_\Omega)^k\, \mathrm{sgn}(S_0). \qquad (8)$$

The idea behind the proof of Lemma 4 is to first bound the norm of the linear operator $R = P_{T^\perp} \sum_{k \ge 1} (P_\Omega P_T P_\Omega)^k$ and then, conditioning on $\Omega$, to use Hoeffding's inequality to obtain a tail bound for $x^* R(\mathrm{sgn}(S_0)) y$ for any fixed $x, y$. This extends to a bound on $\|R(\mathrm{sgn}(S_0))\| = \sup_{\|x\| \le 1, \|y\| \le 1} x^* R(\mathrm{sgn}(S_0)) y$ via a union bound across an appropriately chosen net. We state this argument formally in the appendix. Although the line of argument is similar to the proof of Lemma 2.9 in [6], there are some important differences, since that work assumed that $\rho$ (and hence $\|P_\Omega P_T\|$) is small. Our analysis gives a tighter probabilistic bound for $\|P_{T^\perp} \sum_{k \ge 1}(P_\Omega P_T P_\Omega)^k E\|$, which in turn yields a better estimate of the weighting parameter $\lambda$ as a function of $\rho$.

In the remainder of this section, we complete the lemmas that establish the desired main result, Theorem 1. Lemma 2 below validates the principal assumption of Lemma 1, namely that $\|P_\Omega P_T\|$ is bounded away from one; Lemmas 3 and 4 collectively prove that the dual certificate $W = W^L + W^S$ generated by the procedure outlined above satisfies (6) with high probability, and thereby prove Theorem 1 by virtue of Lemma 1.

Lemma 2 (Corollary 2.7 in [6]). Suppose that $\Omega \sim \mathrm{Ber}(\rho)$ and $L_0$ obeys the incoherence model (3). Then, with high probability, $\|P_\Omega P_T\|^2 \le \rho + \delta$, provided that $1 - \rho \ge C_0\, \delta^{-2}\, \mu r \log n / n$ for some numerical constant $C_0 > 0$.

This result plays a key role in establishing the bounds on $W^L$ and $W^S$ in Lemmas 3 and 4, respectively.

Lemma 3. Assume that $\Omega \sim \mathrm{Ber}(\rho)$ and $\|P_\Omega P_T\| \le \sigma = \sqrt{\rho} + \delta < 1$. Set $j_0 = 2\lceil \log n \rceil$. Then, under the assumptions of Theorem 1, the matrix $W^L$ obeys, with high probability,
(a) $\|W^L\| < 1/10$,
(b) $\|P_\Omega(UV^* + W^L)\|_F < \lambda (1-\sigma)^2$,
(c) $\|P_{\Omega^\perp}(UV^* + W^L)\|_\infty < \lambda/4$.

IV. SIMULATIONS

In this section, we provide simulation results on randomly generated matrices to support our main result, and we suggest potential improvements to the value of $\lambda$ predicted by our analysis. For a given dimension $n$, rank $r$, and sparsity parameter $\rho$, we generate $L_0$ and $S_0$ as follows:

1) $L_0 = R_1 R_2^*$, where $R_1, R_2 \in \mathbb{R}^{n \times r}$ are random matrices whose entries are i.i.d. according to a normal distribution with mean zero and variance $100/n$.

2) $S_0$ is a sparse matrix with exactly $\rho n^2$ non-zero entries, whose support is chosen uniformly at random from all possible supports of size $\rho n^2$.³ The non-zero entries of $S_0$ take values $\pm 1$ with probability $1/2$ each.

³ As argued in Appendix 7.1 of [6], from the perspective of the success of the algorithm, this uniform model is essentially equivalent to the Bernoulli model.

Fig. 1. Dense error correction for varying dimension. Given $n$, $r$, and $\rho$, we generate $L_0 = R_1 R_2^*$ as the product of two independent $n \times r$ i.i.d. $N(0, 100/n)$ matrices, and $S_0$ as a sparse matrix with $\rho n^2$ non-zero entries taking values $\pm 1$ with probability $1/2$. For each pair $(n, \rho)$, the plots show the fraction of successful recoveries over a total of 10 independent trials: white denotes reliable recovery in all trials, black denotes failure in all trials, with a linear scale for intermediate fractions. (a) $r = 1$, $C_1 = 0.8$; (b) $r = 1$, $C_1 = 4$.

We use the augmented Lagrange multiplier (ALM) method [12] to solve (2). This algorithm exhibits good convergence behavior, and since each of its iterations has the same complexity as an SVD, it is scalable to reasonably large matrices. Let $(\hat L, \hat S)$ be the optimal solution to (2).
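To make the pipeline concrete, the sketch below (our own simplified variant: the solver is a basic fixed-penalty ALM loop in the spirit of [12], not the authors' implementation, and all names and parameter choices are ours) generates $(L_0, S_0)$ as in 1)–2) above, solves (2), and checks recovery. Parameters are scaled down so it runs in seconds:

```python
import numpy as np

def shrink(M, tau):
    # Entrywise soft-thresholding: the proximal operator of tau * ||.||_1.
    return np.sign(M) * np.maximum(np.abs(M) - tau, 0.0)

def svt(M, tau):
    # Singular value thresholding: the proximal operator of tau * ||.||_*.
    U, s, Vt = np.linalg.svd(M, full_matrices=False)
    return (U * np.maximum(s - tau, 0.0)) @ Vt

def pcp_alm(D, lam, iters=500, tol=1e-7):
    # Simplified inexact augmented-Lagrange-multiplier iteration for (2).
    mu = 0.25 * D.size / np.abs(D).sum()   # heuristic fixed penalty parameter
    L = np.zeros_like(D); S = np.zeros_like(D); Y = np.zeros_like(D)
    for _ in range(iters):
        L = svt(D - S + Y / mu, 1.0 / mu)
        S = shrink(D - L + Y / mu, lam / mu)
        R = D - L - S
        Y = Y + mu * R
        if np.linalg.norm(R) <= tol * np.linalg.norm(D):
            break
    return L, S

rng = np.random.default_rng(5)
n, r, rho = 60, 1, 0.1
# 1) L0 = R1 R2^T with i.i.d. N(0, 100/n) factor entries.
R1 = rng.normal(0.0, np.sqrt(100.0 / n), (n, r))
R2 = rng.normal(0.0, np.sqrt(100.0 / n), (n, r))
L0 = R1 @ R2.T
# 2) S0 with exactly rho * n^2 entries equal to +/-1, support uniform at random.
S0 = np.zeros(n * n)
idx = rng.choice(n * n, size=int(rho * n * n), replace=False)
S0[idx] = rng.choice([-1.0, 1.0], size=idx.size)
S0 = S0.reshape(n, n)

L_hat, S_hat = pcp_alm(L0 + S0, lam=1.0 / np.sqrt(n))
assert np.linalg.norm(L_hat - L0) / np.linalg.norm(L0) < 0.01  # 1% criterion
```

Here we simply use $\lambda = n^{-1/2}$ from [6], which suffices at this small $\rho$; in the dense-error regime with $\rho$ near one, the $\rho$-dependent choice (9) becomes essential.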
The recovery is considered successful if $\|L_0 - \hat L\|_F / \|L_0\|_F < 0.01$, i.e., if the relative error in the recovered low-rank matrix is less than 1%.

For our first experiment, we fix $\mathrm{rank}(L_0) = 1$. This case demonstrates the best possible error-correction behavior for any given dimension $n$. We vary $n$ from 400 up to 1600, and for each $n$ we consider varying $\rho \in (0, 1)$. For each $(n, \rho)$ pair, we choose

$$\lambda = C_1 \cdot \Big( 4\sqrt{1-\rho} + \frac{9}{4} \Big)^{-1} \sqrt{\frac{1-\rho}{n\rho}} \qquad (9)$$

with $C_1 = 0.8$, as suggested by Theorem 1. Figure 1(a) plots the fraction of successes across 10 independent trials. Notice that the amount of corruption that PCP can handle increases monotonically with the dimension $n$.

We have found that the $\lambda$ given by our analysis is actually somewhat pessimistic for moderate $n$: better error-correction behavior in relatively low dimensions can be observed by choosing $\lambda$ according to (9), but with a larger constant $C_1 = 4$. Figure 1(b) verifies this by repeating the same experiment as in Figure 1(a), but with the modified $\lambda$.
Indeed, we see larger fractions of error successfully corrected. For instance, we observe that for $n = 1600$, choosing $C_1 = 0.8$ enables reliable recovery when up to 35% of the matrix entries are corrupted, whereas with $C_1 = 4$, PCP can handle up to 75% corrupted entries. As discussed below, this suggests there is still room for improving our bounds, either by a tighter analysis of the current construction or by constructing dual certificates $W^S$ of smaller norm.

V. DISCUSSION

This work showed that PCP in fact corrects large fractions of random errors, provided the matrix to be recovered satisfies the incoherence condition and the corruptions are random in both sign and support. The fact that a higher value of the constant $C_1$ offers better error-correction performance in moderate dimensions suggests that the analysis in this work can be further strengthened. In our analysis, the value of $\lambda$ is essentially determined by the spectral norm of $W^S$; it is reasonable to believe that dual certificates of smaller spectral norm can be constructed by methods other than least squares. Finally, while we have stated our results for the case of square matrices, similar results can be obtained for non-square matrices with minimal modification to the proof.

APPENDIX: PROOF OF LEMMA 1 AND LEMMA 4

Proof of Lemma 1.

Proof: Let $UV^* + W_0$ be a subgradient of the nuclear norm at $L_0$, and $\mathrm{sgn}(S_0) + F_0$ a subgradient of the $\ell^1$-norm at $S_0$. For any feasible solution $(L_0 + H, S_0 - H)$ to (2),

$$\|L_0+H\|_* + \lambda\|S_0-H\|_1 \ge \|L_0\|_* + \lambda\|S_0\|_1 + \langle UV^* + W_0, H\rangle - \lambda\langle \mathrm{sgn}(S_0) + F_0, H\rangle.$$

Choosing $W_0$ such that $\langle W_0, H\rangle = \|P_{T^\perp}H\|_*$ and $F_0$ such that $\langle F_0, H\rangle = -\|P_{\Omega^\perp}H\|_1$⁴ gives

$$\|L_0+H\|_* + \lambda\|S_0-H\|_1 \ge \|L_0\|_* + \lambda\|S_0\|_1 + \|P_{T^\perp}H\|_* + \lambda\|P_{\Omega^\perp}H\|_1 + \langle UV^* - \lambda\,\mathrm{sgn}(S_0), H\rangle.$$

⁴ For instance, $F_0 = -\mathrm{sgn}(P_{\Omega^\perp}H)$ and $W_0 = P_{T^\perp}\bar W$, where $\|\bar W\| = 1$ and $\langle \bar W, P_{T^\perp}H\rangle = \|P_{T^\perp}H\|_*$. Such a $\bar W$ exists due to the duality between $\|\cdot\|$ and $\|\cdot\|_*$.

By assumption, $UV^* - \lambda\,\mathrm{sgn}(S_0) = \lambda F - W + \lambda P_\Omega D$. Since $\|W\| \le \alpha$ and $\|F\|_\infty \le \frac{1}{2}$, we have

$$|\langle UV^* - \lambda\,\mathrm{sgn}(S_0), H\rangle| \le \alpha\|P_{T^\perp}H\|_* + \frac{\lambda}{2}\|P_{\Omega^\perp}H\|_1 + \lambda|\langle P_\Omega D, H\rangle|.$$

Substituting the above relation, we get

$$\|L_0+H\|_* + \lambda\|S_0-H\|_1 \ge \|L_0\|_* + \lambda\|S_0\|_1 + (1-\alpha)\|P_{T^\perp}H\|_* + \frac{\lambda}{2}\|P_{\Omega^\perp}H\|_1 - \lambda|\langle P_\Omega D, H\rangle|$$
$$\ge \|L_0\|_* + \lambda\|S_0\|_1 + (1-\alpha)\|P_{T^\perp}H\|_* + \frac{\lambda}{2}\|P_{\Omega^\perp}H\|_1 - \lambda\epsilon^2\|P_\Omega H\|_F.$$
We note that

$$\|P_\Omega H\|_F \le \|P_\Omega P_T H\|_F + \|P_\Omega P_{T^\perp} H\|_F \le (1-\epsilon)\|H\|_F + \|P_{T^\perp}H\|_F \le (1-\epsilon)\big(\|P_\Omega H\|_F + \|P_{\Omega^\perp}H\|_F\big) + \|P_{T^\perp}H\|_F$$

and, therefore,

$$\|P_\Omega H\|_F \le \frac{1-\epsilon}{\epsilon}\|P_{\Omega^\perp}H\|_F + \frac{1}{\epsilon}\|P_{T^\perp}H\|_F \le \frac{1-\epsilon}{\epsilon}\|P_{\Omega^\perp}H\|_1 + \frac{1}{\epsilon}\|P_{T^\perp}H\|_*.$$

In conclusion, we have

$$\|L_0+H\|_* + \lambda\|S_0-H\|_1 \ge \|L_0\|_* + \lambda\|S_0\|_1 + \big((1-\alpha) - \lambda\epsilon\big)\|P_{T^\perp}H\|_* + \lambda\Big(\frac{1}{2} - (1-\epsilon)\epsilon\Big)\|P_{\Omega^\perp}H\|_1.$$

Because $\|P_\Omega P_T\| < 1$, the intersection $\Omega \cap T = \{0\}$, and hence, for any nonzero $H$, at least one of the above terms involving $H$ is strictly positive.

Proof of Lemma 4.

Proof of (a): Let $E = \mathrm{sgn}(S_0)$. By assumption, the distribution of each entry of $E$ is given by (4). Using (8), we can express $W^S$ as

$$W^S = \lambda P_{T^\perp} E + \lambda P_{T^\perp} \sum_{k \ge 1} (P_\Omega P_T P_\Omega)^k E =: P_{T^\perp} W^S_0 + P_{T^\perp} W^S_1.$$

For the first term, we have $\|P_{T^\perp} W^S_0\| \le \lambda \|E\|$. Using standard arguments on the norm of a matrix with i.i.d. entries, we have $\|E\| \le 4\sqrt{n\rho}$ with overwhelming probability [13].

For the second term, notice that whenever $\|P_\Omega P_T\| < 1$, $W^S_1 = \lambda R(E)$, where $R = P_{T^\perp} \sum_{k \ge 1}(P_\Omega P_T P_\Omega)^k$. Then

$$\|R\| = \Big\|P_{T^\perp} \sum_{k \ge 1}(P_\Omega P_T P_\Omega)^k\Big\| \le \|P_{T^\perp}P_\Omega P_T P_\Omega\| \cdot \Big\|\sum_{k \ge 0}(P_\Omega P_T P_\Omega)^k\Big\| \le \|P_{T^\perp}P_\Omega P_T\| \cdot \|P_T P_\Omega\| \cdot \sum_{k \ge 0}\|P_\Omega P_T P_\Omega\|^k$$
$$= \|P_{T^\perp}P_{\Omega^\perp} P_T\| \cdot \|P_T P_\Omega\| \cdot \sum_{k \ge 0}\|P_\Omega P_T\|^{2k} \le \frac{\|P_{\Omega^\perp}P_T\| \cdot \|P_\Omega P_T\|}{1 - \|P_T P_\Omega\|^2}. \qquad (10)$$

Consider the two events

$$E_1 := \{\|P_\Omega P_T\| \le \sqrt{\rho} + \delta\}, \qquad E_2 := \{\|P_{\Omega^\perp} P_T\| \le \sqrt{1-\rho} + \delta\}.$$

For any fixed $\eta > 0$, we can choose $\delta(\eta, \rho) > 0$ such that on $E_1 \cap E_2$,

$$\|R\| \le (1+\eta)\sqrt{\frac{\rho}{1-\rho}}. \qquad (11)$$

Since $\Omega \sim \mathrm{Ber}(\rho)$ and $\Omega^c \sim \mathrm{Ber}(1-\rho)$, by Lemma 2, $E_1 \cap E_2$ occurs with high probability provided

$$r \le \delta(\eta,\rho)^2 \min(\rho, 1-\rho)\, n / (\mu \log n). \qquad (12)$$

Since by assumption $r \le C_2 n/(\mu \log^2 n)$, (12) holds for $n$ sufficiently large.

For any $\tau \in (0,1)$, let $N_\tau$ denote a $\tau$-net for $S^{n-1}$ of size at most $(3/\tau)^n$ (see [14], Lemma 3.18). Then it can be shown that

$$\|R(E)\| = \sup_{x,y \in S^{n-1}} \langle y, R(E)x\rangle \le (1-\tau)^{-2} \sup_{x,y \in N_\tau} \langle y, R(E)x\rangle.$$

For a fixed pair $(x,y) \in N_\tau \times N_\tau$, we define $X(x,y) = \langle y, R(E)x\rangle = \langle R^*(yx^*), E\rangle$. Conditional on $\Omega = \mathrm{supp}(E)$, the signs of $E$ are i.i.d. symmetric and, by Hoeffding's inequality, we have

$$P\big(|X(x,y)| > t \mid \Omega\big) \le 2\exp\Big(-\frac{2t^2}{\|R^*(yx^*)\|_F^2}\Big).$$

Since $\|yx^*\|_F = 1$, we have $\|R^*(yx^*)\|_F \le \|R\|$, so

$$P\Big(\sup_{x,y \in N_\tau}|X(x,y)| > t \mid \Omega\Big) \le 2|N_\tau|^2 \exp\Big(-\frac{2t^2}{\|R\|^2}\Big),$$

and for any fixed $\Omega \in E_1 \cap E_2$,

$$P\big(\|R(E)\| > t \mid \Omega\big) \le 2\Big(\frac{3}{\tau}\Big)^{2n} \exp\Big(-\frac{2(1-\tau)^4(1-\rho)t^2}{(1+\eta)^2\rho}\Big).$$

In particular, for any $C > (1+\eta)(1-\tau)^{-2}\sqrt{\log(3/\tau)}$ and $\Omega \in E_1 \cap E_2$,

$$P\Big(\|R(E)\| > C\sqrt{\frac{\rho n}{1-\rho}} \,\Big|\, \Omega\Big) < \exp(-C' n),$$

where $C'(C) > 0$. Since $\inf_{0 < \tau < 1}(1-\tau)^{-2}\sqrt{\log(3/\tau)} < 9/4$, by an appropriate choice of $\tau$ and $\eta > 0$, we have

$$P\Big(\|R(E)\| > \frac{9}{4}\sqrt{\frac{\rho n}{1-\rho}}\Big) < \exp(-C' n) + P\big((E_1 \cap E_2)^c\big).$$

Thus,

$$\|W^S\| < \lambda\Big(4\sqrt{\rho} + \frac{9}{4}\sqrt{\frac{\rho}{1-\rho}}\Big)\sqrt{n} \le 8/10$$

with high probability, provided $n$ is sufficiently large.

Proof of (b) follows the proof of Lemma 2.9(b) of [6].

REFERENCES

[1] J. Wright, A. Yang, A. Ganesh, Y. Ma, and S. Sastry, "Robust face recognition via sparse representation," IEEE Trans. Pattern Analysis and Machine Intelligence, vol. 31, no. 2, Feb. 2009.
[2] M. Fazel, H. Hindi, and S. Boyd, "Rank minimization and applications in system theory," in American Control Conference, June 2004.
[3] C. Papadimitriou, P. Raghavan, H. Tamaki, and S. Vempala, "Latent semantic indexing: A probabilistic analysis," Journal of Computer and System Sciences, vol. 61, no. 2, Oct. 2000.
[4] I. Jolliffe, "Principal component analysis," 1986.
[5] C. Eckart and G. Young, "The approximation of one matrix by another of lower rank," Psychometrika, vol. 1, pp. 211–218, 1936.
[6] E. Candès, X. Li, Y. Ma, and J. Wright, "Robust principal component analysis?" preprint submitted to Journal of the ACM, 2009.
[7] V. Chandrasekaran, S. Sanghavi, P. Parrilo, and A. Willsky, "Sparse and low-rank matrix decompositions," in IFAC Symposium on System Identification, 2009.
[8] E. Candès and B. Recht, "Exact matrix completion via convex optimization," Found. of Comput. Math., 2008.
[9] J. Wright and Y. Ma, "Dense error correction via ℓ1-minimization," to appear in IEEE Transactions on Information Theory, 2008.
[10] E. Candès and T. Tao, "The power of convex relaxation: Near-optimal matrix completion," to appear in IEEE Transactions on Information Theory, 2009.
[11] D. Gross, "Recovering low-rank matrices from few coefficients in any basis," preprint, 2009.
[12] Z. Lin, M. Chen, L. Wu, and Y. Ma, "The augmented Lagrange multiplier method for exact recovery of corrupted low-rank matrices," UIUC Technical Report UILU-ENG-09-2215, Oct. 2009.
[13] R. Vershynin, "Math 280 lecture notes," 2007, available at http://www-stat.stanford.edu/~dneedell/280.html.
[14] M. Ledoux, The Concentration of Measure Phenomenon. American Mathematical Society, 2001.
