
High Dimensional Low Rank plus Sparse Matrix Decomposition



A Subspace Learning Approach to High-Dimensional Matrix Decomposition with Efficient Information Sampling

Mostafa Rahmani, Student Member, IEEE, and George K. Atia, Member, IEEE

arXiv:1502.00182v2 [cs.NA] 13 Feb 2016

This material is based upon work supported by the National Science Foundation under NSF grant No. CCF-1320547 and NSF CAREER Award CCF-1552497. The authors are with the Department of Electrical Engineering and Computer Science, University of Central Florida, Orlando, FL 32816 USA (e-mail: mostafa@knights.ucf.edu, george.atia@ucf.edu).

Abstract: This paper is concerned with the problem of low-rank plus sparse matrix decomposition for big data. Conventional algorithms for matrix decomposition use the entire data to extract the low-rank and sparse components, and are based on optimization problems that scale with the dimension of the data, which limits their scalability. Furthermore, the existing randomized approaches mostly rely on uniform random sampling, which can be quite inefficient for many real-world data matrices that exhibit additional structures (e.g., clustering). In this paper, a scalable subspace-pursuit approach that transforms the decomposition problem into a subspace learning problem is proposed. The decomposition is carried out using a small data sketch formed from sampled columns/rows. Even when the data is sampled uniformly at random, it is shown that the sufficient number of sampled columns/rows is roughly $O(r\mu)$, where $\mu$ is the coherency parameter and $r$ the rank of the low-rank component. In addition, efficient sampling algorithms are proposed to address the problem of column/row sampling from structured data. The proposed sampling algorithms can be independently used for feature selection from high-dimensional data. The proposed approach is amenable to online implementation and an online scheme is proposed.

Index Terms: Low-Rank Matrix, Subspace Learning, Big Data, Matrix Decomposition, Column Sampling, Sketching

I. INTRODUCTION

Suppose we are given a data matrix $D \in \mathbb{R}^{N_1 \times N_2}$, which can be expressed as
$$D = L + S, \qquad (1)$$
where L is a low rank (LR) matrix and S is a sparse matrix with arbitrary unknown support, whose entries can have arbitrarily large magnitude. Many important applications in which the data under study can be naturally modeled using (1) were discussed in [1]. The cutting-edge Principal Component Pursuit approach developed in [1], [2] directly decomposes D into its LR and sparse components by solving the convex program
$$\min_{\dot{L},\dot{S}} \ \lambda\|\dot{S}\|_1 + \|\dot{L}\|_* \quad \text{subject to} \quad \dot{L} + \dot{S} = D, \qquad (2)$$
where $\|\cdot\|_1$ is the $\ell_1$-norm, $\|\cdot\|_*$ is the nuclear norm and $\lambda$ determines the trade-off between the sparse and LR components [2]. The convex program (2) can precisely recover both the LR and sparse components if the column and row subspaces of L are sufficiently incoherent with the standard basis and the non-zero elements of S are sufficiently diffused [2]. Although the problem in (2) is convex, its computational complexity is intolerable with large volumes of high-dimensional data. Even the efficient iterative algorithms proposed in [3], [4] have prohibitive computational and memory requirements in high-dimensional settings.

Contributions: This paper proposes a new randomized decomposition approach, which extracts the LR component in two consecutive steps. First, the column space (CS) of L is learned from a small subset of the columns of the data matrix. Second, the row space (RS) of L is obtained using a small subset of the rows of D. Unlike conventional decomposition that uses the entire data, we only utilize a small data sketch, and solve two low-dimensional optimization problems in lieu of one high-dimensional matrix decomposition problem (2), resulting in significant running-time speed-ups.

To the best of our knowledge, it is shown here for the first time that the sufficient number of randomly sampled columns/rows scales linearly with the rank r and the coherency parameter of L even with uniform random sampling. Also, in contrast to the existing randomized approaches [5]–[7], which use blind uniform random sampling, we propose a new methodology for efficient column/row sampling. When the columns/rows of L are not distributed uniformly in the CS/RS of L, which is the case for much of real-world data, the proposed sampling approach is shown to achieve significant savings in data usage compared to uniform random sampling-based methods that require remarkable portions of the data. The proposed sampling algorithms can be independently used for feature selection from high-dimensional data.

In the presented approach, once the CS is learned, each column is decomposed efficiently and independently using the proposed randomized vector decomposition method. Unlike most existing approaches, which are batch-based, this unique feature enables applicability to online settings. The presented vector decomposition method can be independently used in many applications as an efficient vector decomposition algorithm or for efficient linear decoding [8]–[10].

A. Notation and definitions

We use bold-face upper-case letters to denote matrices and bold-face lower-case letters to denote vectors. Given a matrix L, $\|L\|$ denotes its spectral norm, $\|L\|_F$ its Frobenius norm, and $\|L\|_\infty$ the infinity norm, which is equal to the maximum absolute value of its elements. In an N-dimensional space, $e_i$ is the ith vector of the standard basis (i.e., the ith element of $e_i$ is equal to one and all the other elements are equal to zero). The notation A = [] denotes an empty matrix and the matrix
$$A = [A_1 \ A_2 \ \ldots \ A_n]$$
is the column-wise concatenation of the matrices $\{A_i\}_{i=1}^n$. Random sampling refers to sampling without replacement.
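The norms defined above, together with the entrywise $\ell_1$-norm and nuclear norm in (2), map onto NumPy calls as sketched below. The helper names `matrix_norms` and `pcp_objective` are our own illustrative choices, not from the paper. Note that `np.linalg.norm(A, np.inf)` returns the maximum absolute row sum for a matrix, so the paper's $\|L\|_\infty$ (largest absolute entry) is computed here with `np.abs(A).max()` instead.

```python
import numpy as np

def matrix_norms(A):
    """Norms used in Section I-A and in program (2), for a real matrix A."""
    return {
        "spectral": np.linalg.norm(A, 2),       # ||A||: largest singular value
        "frobenius": np.linalg.norm(A, "fro"),  # ||A||_F
        "infinity": np.abs(A).max(),            # ||A||_inf as defined above: max absolute entry
        "nuclear": np.linalg.norm(A, "nuc"),    # ||A||_*: sum of singular values
        "l1": np.abs(A).sum(),                  # ||A||_1: entrywise l1-norm used in (2)
    }

def pcp_objective(L, S, lam):
    """Value of lam * ||S||_1 + ||L||_*, the objective of program (2) (hypothetical helper)."""
    return lam * np.abs(S).sum() + np.linalg.norm(L, "nuc")

A = np.array([[3.0, 0.0], [0.0, -4.0]])
norms = matrix_norms(A)
```

For this diagonal A the singular values are 4 and 3, so the spectral, Frobenius and nuclear norms are 4, 5 and 7. Both the largest absolute entry and the entrywise $\ell_1$-norm follow directly from the entries (4 and 7).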
II. BACKGROUND AND RELATED WORK

A. Exact LR plus sparse matrix decomposition

The incoherence of the CS and RS of L is an important requirement for the identifiability of the decomposition problem in (1) [1], [2]. For the LR matrix L with rank r and compact SVD $L = U\Sigma V^T$ (where $U \in \mathbb{R}^{N_1 \times r}$, $\Sigma \in \mathbb{R}^{r \times r}$ and $V \in \mathbb{R}^{N_2 \times r}$), the incoherence condition is typically defined through the requirements [1], [2]
$$\max_i \|U^T e_i\|_2^2 \le \frac{\mu r}{N_1}, \qquad \max_i \|V^T e_i\|_2^2 \le \frac{\mu r}{N_2} \qquad \text{and} \qquad \|UV^T\|_\infty \le \sqrt{\frac{\mu r}{N_1 N_2}} \qquad (3)$$
for some parameter $\mu$ that bounds the projection of the standard basis $\{e_i\}$ onto the CS and RS. Other useful measures for the coherency of subspaces are given in [11] as
$$\gamma(U) = \sqrt{N_1}\,\max_{i,j}|U(i,j)|, \qquad \gamma(V) = \sqrt{N_2}\,\max_{i,j}|V(i,j)|, \qquad (4)$$
where $\gamma(U)$ and $\gamma(V)$ bound the coherency of the CS and the RS, respectively. When some of the elements of the orthonormal basis of a subspace are too large, the subspace is coherent with the standard vectors. Actually, it is not hard to show that $\max(\gamma(V), \gamma(U)) \le \sqrt{\mu}$.

The decomposition of a data matrix into its LR and sparse components was analyzed in [1], [2], and sufficient conditions for exact recovery using the convex minimization (2) were derived. In [1], the sparsity pattern of the sparse matrix is selected uniformly at random following the so-called Bernoulli model to ensure that the sparse matrix is not LR with overwhelming probability. In this model, which is also used in this paper, each element of the sparse matrix can be non-zero independently with a constant probability. Without loss of generality (w.l.o.g.), suppose that $N_2 \le N_1$. The following lemma states the main result of [1].

Lemma 1 (Adapted from [1]). Suppose that the support set of S follows the Bernoulli model with parameter $\rho$. The convex program (2) with $\lambda = 1/\sqrt{N_1}$ yields the exact decomposition with probability at least $1 - c_1 N_1^{-10}$ provided that
$$r \le \rho_r N_2\, \mu^{-1} (\log N_1)^{-2}, \qquad \rho \le \rho_s, \qquad (5)$$
where $\rho_s$, $c_1$ and $\rho_r$ are numerical constants.

The optimization problem in (2) is convex and can be solved using standard techniques such as interior point methods [2]. Although these methods have fast convergence rates, their usage is limited to small-size problems due to the high complexity of computing a step direction. Similar to the iterative shrinking algorithms for $\ell_1$-norm and nuclear norm minimization, a family of iterative algorithms for solving the optimization problem (2) was proposed in [3], [4]. However, they also require working with the entire data. For example, the algorithm in [4] requires computing the SVD of an $N_1 \times N_2$ matrix in every iteration.

B. Randomized approaches

Owing to their inherent low-dimensional structures, the robust principal component analysis (PCA) and matrix decomposition problems can conceivably be solved using small data sketches, i.e., a small set of random observations of the data [6], [7], [12]–[15]. In [12], it was shown based on a simple degrees-of-freedom analysis that the LR and the sparse components can be precisely recovered using a small set of random linear measurements of D. A convex program was proposed in [12] to recover these components using random matrix embedding with a polylogarithmic penalty factor in sample complexity, albeit the formulation also requires solving a high-dimensional optimization problem.

The iterative algorithms which solve (2) have complexity $O(N_1 N_2 r)$ per iteration since they compute the partial SVD of $N_1 \times N_2$ dimensional matrices [4]. To reduce complexity, GoDec [16] uses a randomized method to efficiently compute the SVD, and the decomposition algorithm in [17] minimizes the rank of $\Phi L$ instead of L, where $\Phi$ is a random projection matrix. However, these approaches do not have provable performance guarantees and their memory requirements scale with the full data dimensions. Another limitation of the algorithm in [17] is its instability, since different random projections may yield different results.

The divide-and-conquer approach in [5] (and a similar algorithm in [18]) can achieve super-linear speedups over full-scale matrix decomposition. This approach forms an estimate of L by combining two low-rank approximations obtained from submatrices formed from sampled rows and columns of D using the generalized Nyström method [19]. Our approach also achieves super-linear speedups in decomposition, yet is fundamentally different from [5] and offers several advantages for the following reasons. First, our approach is a subspace-pursuit approach that focuses on subspace learning in a structure-preserving data sketch. Once the CS is learned, each column of the data is decomposed independently using a proposed randomized vector decomposition algorithm. Second, unlike [5], which is a batch approach that requires storing the entire data, the structure of the proposed approach naturally lends itself to online implementation (c.f. Section IV-E), which could be very beneficial for settings where the data comes in on the fly. Third, while the analysis provided in [5] requires roughly $O(r^2\mu^2\max(N_1,N_2))$ random observations to ensure exact decomposition with high probability (whp), we show that the order of the sufficient number of random observations depends linearly on the rank and the coherency parameter even if uniform random sampling is used. Fourth, the structure of the proposed approach enables us to leverage efficient sampling strategies for challenging and realistic scenarios in which the columns and rows of L are not uniformly distributed in their respective subspaces, or when the data exhibits additional structures (e.g., clustering structures) (c.f. Sections IV-B, IV-C). In such settings, the uniform random sampling used in [5] requires significantly larger amounts of data to carry out the decomposition.
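To make the per-iteration cost discussed above concrete, here is a minimal sketch of one standard iterative solver for (2): an inexact augmented Lagrange multiplier (ALM) scheme that alternates singular value thresholding (for L) with entrywise soft-thresholding (for S). It is our own compact version and not necessarily the exact algorithm of [3] or [4]. The function name `pcp_ialm`, the initial penalty `1.25 / ||D||`, the growth factor 1.5 and the stopping tolerance are assumed common defaults. Each iteration takes a full SVD of an $N_1 \times N_2$ matrix, which is precisely the cost the sketching approach avoids.

```python
import numpy as np

def pcp_ialm(D, lam=None, tol=1e-7, max_iter=500):
    """Approximately solve program (2): min lam*||S||_1 + ||L||_*  s.t.  L + S = D.

    Minimal inexact-ALM sketch; step sizes and tolerances are assumed defaults.
    """
    N1, N2 = D.shape
    if lam is None:
        lam = 1.0 / np.sqrt(max(N1, N2))       # Lemma 1's 1/sqrt(N1) when N1 >= N2
    spec = np.linalg.norm(D, 2)
    Y = D / max(spec, np.abs(D).max() / lam)   # dual variable, scaled to be dual-feasible
    mu, growth = 1.25 / spec, 1.5
    mu_max = mu * 1e7
    L = np.zeros_like(D)
    S = np.zeros_like(D)
    d_norm = np.linalg.norm(D, "fro")
    for _ in range(max_iter):
        # L-step: singular value thresholding at level 1/mu
        U, sig, Vt = np.linalg.svd(D - S + Y / mu, full_matrices=False)
        L = (U * np.maximum(sig - 1.0 / mu, 0.0)) @ Vt
        # S-step: entrywise soft-thresholding at level lam/mu
        T = D - L + Y / mu
        S = np.sign(T) * np.maximum(np.abs(T) - lam / mu, 0.0)
        # Dual ascent on the constraint L + S = D, then increase the penalty
        Z = D - L - S
        Y = Y + mu * Z
        mu = min(mu * growth, mu_max)
        if np.linalg.norm(Z, "fro") / d_norm < tol:
            break
    return L, S

# Usage on a synthetic instance of model (1) with a Bernoulli support for S
rng = np.random.default_rng(0)
N, r, rho = 200, 5, 0.05
L0 = rng.standard_normal((N, r)) @ rng.standard_normal((r, N))
S0 = np.where(rng.random((N, N)) < rho, rng.uniform(-10, 10, (N, N)), 0.0)
L_hat, S_hat = pcp_ialm(L0 + S0)
rel_err = np.linalg.norm(L_hat - L0, "fro") / np.linalg.norm(L0, "fro")
```

On this small, well-conditioned instance the relative error of $\hat{L}$ is expected to be tiny. The same routine, applied with $\lambda = 1/\sqrt{N_1}$ to a column sketch instead of the full matrix, is one way to solve the smaller program used in the next section.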
III. STRUCTURE OF THE PROPOSED APPROACH AND THEORETICAL RESULT

In this section, the structure of the proposed randomized decomposition method is presented. A step-by-step analysis of the proposed approach is provided and sufficient conditions for exact decomposition are derived. Theorem 5, stating the main theoretical result of the paper, is presented at the end of this section. The proofs of the lemmas and the theorem are deferred to the appendix. Let us rewrite (1) as
$$D = UQ + S, \qquad (6)$$
where $Q = \Sigma V^T$. The representation matrix $Q \in \mathbb{R}^{r \times N_2}$ is a full row rank matrix that contains the expansion of the columns of L in the orthonormal basis U. The first step of the proposed approach aims to learn the CS of L using a subset of the columns of D, and in the second step the representation matrix is obtained using a subset of the rows of D.

Let U denote the CS of L. Fundamentally, U can be obtained from a small subset of the columns of L. However, since we do not have direct access to the LR matrix, a random subset of the columns of D is first selected. Hence, the matrix of sampled columns $D_{s1}$ can be written as $D_{s1} = D S_1$, where $S_1 \in \mathbb{R}^{N_2 \times m_1}$ is the column sampling matrix and $m_1$ is the number of selected columns. The matrix of selected columns can be written as
$$D_{s1} = L_{s1} + S_{s1}, \qquad (7)$$
where $L_{s1}$ and $S_{s1}$ are its LR and sparse components, respectively. The idea is to decompose the sketch $D_{s1}$ into its LR and sparse components to learn the CS of L from the CS of $L_{s1}$. Note that the columns of $L_{s1}$ are a subset of the columns of L since $L_{s1} = L S_1$. Should we be able to decompose $D_{s1}$ into its exact LR and sparse components (c.f. Lemma 3), we also need to ensure that the columns of $L_{s1}$ span U. The following lemma establishes that a small subset of the columns of D sampled uniformly at random contains sufficient information (i.e., the columns of the LR component of the sampled data span U) if the RS is incoherent.

Lemma 2. Suppose $m_1$ columns are sampled uniformly at random from the matrix L with rank r. If
$$m_1 \ge r\,\gamma^2(V)\max\left(c_2 \log r,\ c_3 \log\frac{3}{\delta}\right), \qquad (8)$$
then the selected columns of the matrix L span the column subspace of L with probability at least $(1-\delta)$, where $c_2$ and $c_3$ are numerical constants.

Thus, if $\gamma(V)$ is small (i.e., the RS is not coherent), a small set of randomly sampled columns can span U. According to Lemma 2, if $m_1$ satisfies (8), then L and $L_{s1}$ have the same CS whp. The following optimization problem (of dimensionality $N_1 m_1$) is solved to decompose $D_{s1}$ into its LR and sparse components:
$$\min_{\dot{L}_{s1},\dot{S}_{s1}} \ \frac{1}{\sqrt{N_1}}\|\dot{S}_{s1}\|_1 + \|\dot{L}_{s1}\|_* \quad \text{subject to} \quad \dot{L}_{s1} + \dot{S}_{s1} = D_{s1}. \qquad (9)$$
Thus, the column subspace of the LR matrix can be recovered by finding the column subspace of $L_{s1}$. Our next lemma establishes that (9) yields the exact decomposition using roughly $m_1 = O(\mu r)$ randomly sampled columns. To simplify the analysis, in the following lemma it is assumed that the CS of the LR matrix is sampled from the random orthogonal model [20], i.e., the columns of U are selected uniformly at random among all families of r orthonormal vectors.

Lemma 3. Suppose the column subspace of L is sampled from the random orthogonal model, $L_{s1}$ has the same column subspace as L, and the support set of S follows the Bernoulli model with parameter $\rho$. In addition, assume that the columns of $D_{s1}$ were sampled uniformly at random. If
$$m_1 \ge \frac{r}{\rho_r}\,\mu'(\log N_1)^2 \quad \text{and} \quad \rho \le \rho_s, \qquad (10)$$
then (9) yields the exact decomposition with probability at least $1 - c_8 N_1^{-3}$, where
$$\mu' = \max\left(\frac{c_7\max(r, \log N_1)}{r},\ 6\gamma^2(V),\ \left(c_9\,\gamma(V)\log N_1\right)^2\right) \qquad (11)$$
and $c_7$, $c_8$ and $c_9$ are constant numbers, provided that $N_1$ is greater than the RHS of the first inequality of (10).

Therefore, according to Lemma 3 and Lemma 2, the CS of L can be obtained using roughly $O(r\mu)$ uniformly sampled data columns. Note that $m_1 \ll N_1$ for high-dimensional data as $m_1$ scales linearly with r. Hence, the requirement that $N_1$ is also greater than the RHS of the first inequality of (10) is by no means restrictive and is naturally satisfied.

Suppose that (9) decomposes $D_{s1}$ into its exact components and assume that U has been correctly identified. W.l.o.g., we can use U as an orthonormal basis for the learned CS. An arbitrary column $d_i$ of D can be written as $d_i = U q_i + s_i$, where $q_i$ and $s_i$ are the corresponding columns of Q and S, respectively. Thus, $d_i - U q_i$ is a sparse vector. This suggests that $q_i$ can be learned using the minimization
$$\min_{\hat{q}_i} \|d_i - U\hat{q}_i\|_1, \qquad (12)$$
where the $\ell_1$-norm is used as a surrogate for the $\ell_0$-norm to promote a sparse solution [8], [11]. The optimization problem (12) is similar to a system of linear equations with r unknown variables and $N_1$ equations. Since $r \ll N_1$, the idea is to learn $q_i$ using only a small subset of the equations. Thus, we propose the following vector decomposition program
$$\min_{\hat{q}_i} \|S_2^T d_i - S_2^T U\hat{q}_i\|_1, \qquad (13)$$
where $S_2 \in \mathbb{R}^{N_1 \times m_2}$ selects $m_2$ rows of U (and the corresponding $m_2$ elements of $d_i$).

First, we have to ensure that the rank of $S_2^T U$ is equal to the rank of U, for if $q^*$ is the optimal point of (13), then $Uq^*$ will be the LR component of $d_i$. According to Lemma 2, $m_2 = O(r\gamma(U))$ is sufficient to preserve the rank of U when the rows are sampled uniformly at random. In addition, the following lemma establishes that if the rank of U is equal to the rank of $S_2^T U$, then the sufficient value of $m_2$ for (13) to yield the correct columns of Q whp is linear in r.

Lemma 4. Suppose that the rank of $S_2^T U$ is equal to the rank of L and assume that the CS of L is sampled from the random orthogonal model. The optimal point of (13) is equal to $q_i$ with probability at least $(1-3\delta)$ provided that
$$\rho \le \frac{0.5}{r\beta\left(c_6\,\kappa\log\frac{N_1}{\delta} + 1\right)},$$
$$m_2 \ge \max\left(\frac{2r\beta(\beta-2)\log\frac{1}{\delta}}{3(\beta-1)^2}\left(c_6\,\kappa\log\frac{N_1}{\delta} + 1\right),\ c_5\left(\log\frac{N_1}{\delta}\right)^2\right), \qquad (14)$$
where $\kappa = \frac{\log N_1}{r}$, $c_5$ and $c_6$ are constant numbers and $\beta$ can be any real number greater than one.

Therefore, we can obtain the LR component of each column using a random subset of its elements. Since (12) is an $\ell_1$-norm minimization, we can write the representation matrix learning problem as
$$\min_{\hat{Q}} \|S_2^T D - S_2^T U\hat{Q}\|_1. \qquad (15)$$
Thus, (15) learns Q using a subset of the rows of D, as $S_2^T D$ is the matrix formed from $m_2$ sampled rows of D. As such, we solve two low-dimensional subspace pursuit problems (9) and (15) of dimensions $N_1 m_1$ and $N_2 m_2$, respectively, instead of an $N_1 N_2$-dimensional decomposition problem (2), and use a small random subset of the data to learn U and Q. The table of Algorithm 1 explains the structure of the proposed approach.

We can readily state the following theorem, which establishes sufficient conditions for Algorithm 1 to yield exact decomposition. In this theorem, it is assumed that the columns and rows are sampled uniformly at random. In the next section, an efficient method for column and row sampling is presented. In addition, a scheme for online implementation is proposed.

Theorem 5. Suppose the column subspace of the LR matrix is sampled from the random orthogonal model and the support set of S follows the Bernoulli model with parameter $\rho$. In addition, it is assumed that Algorithm 1 samples the columns and rows uniformly at random. If for any small $\delta > 0$,
$$m_1 \ge \max\left(r\gamma^2(V)\max\left(c_2\log r,\ c_3\log\frac{3}{\delta}\right),\ \frac{r}{\rho_r}\,\mu'(\log N_1)^2\right),$$
$$m_2 \ge \max\left(r\log N_1\max\left(c_2'\log r,\ c_3'\log\frac{3}{\delta}\right),\ \frac{2r\beta(\beta-2)\log\frac{N_2}{\delta}}{3(\beta-1)^2}\left(c_6\,\kappa\log\frac{N_1N_2}{\delta} + 1\right),\ c_5\left(\log\frac{N_1N_2}{\delta}\right)^2\right), \qquad (16)$$
$$\rho \le \min\left(\rho_s,\ \frac{0.5}{r\beta\left(c_6\,\kappa\log\frac{N_1N_2}{\delta} + 1\right)}\right),$$
where
$$\mu' = \max\left(\frac{c_7\max(r,\log N_1)}{r},\ 6\gamma^2(V),\ \left(c_9\,\gamma(V)\log N_1\right)^2\right), \qquad \kappa = \frac{\log N_1}{r},$$
$\{c_i\}_{i=1}^9$, $c_2'$ and $c_3'$ are constant numbers and $\beta$ is any real number greater than one, then the proposed approach (Algorithm 1) yields the exact decomposition with probability at least $(1 - 5\delta - 3rN_1^{-7})$, provided that $N_1$ is greater than the RHS of the first inequality of (10).

Theorem 5 guarantees that the LR component can be obtained using a small subset of the data. The randomized approach has two main advantages. First, it significantly reduces the memory/storage requirements, since it only uses a small data sketch and solves two low-dimensional optimization problems versus one large problem. Second, the proposed approach has $O(\max(N_1,N_2) \times \max(m_1,m_2) \times r)$ per-iteration running time complexity, which is significantly lower than the $O(N_1N_2r)$ per iteration of full-scale decomposition (2) [3], [4], implying remarkable speedups for big data. For instance, consider U and Q sampled from N(0,1), r = 5, and S following the Bernoulli model with $\rho = 0.02$. For values of $N_1 = N_2$ equal to 500, 1000, 5000, $10^4$ and $2\times10^4$, if $m_1 = m_2 = 10r$, the proposed approach yields the correct decomposition with 90-, 300-, 680-, 1520- and 4800-fold speedup, respectively, over directly solving (2).

Algorithm 1 Structure of Proposed Approach
Input: Data matrix $D \in \mathbb{R}^{N_1 \times N_2}$
1. Initialization: Form column sampling matrix $S_1 \in \mathbb{R}^{N_2 \times m_1}$ and row sampling matrix $S_2 \in \mathbb{R}^{N_1 \times m_2}$.
2. CS Learning
2.1 Column sampling: Matrix $S_1$ samples $m_1$ columns of the given data matrix, $D_{s1} = D S_1$.
2.2 CS learning: The convex program (9) is applied to the sampled columns $D_{s1}$.
2.3 CS calculation: The CS is found as the column subspace of the calculated LR component.
3. Representation Matrix Learning
3.1 Row sampling: Matrix $S_2$ samples $m_2$ rows of the given data matrix, $D_{s2} = S_2^T D$.
3.2 Representation matrix learning: The convex problem (15) is applied to the sampled rows to find the representation matrix.
Output: If $\hat{U}$ is an orthonormal basis for the learned CS and $\hat{Q}$ is the obtained representation matrix, then $\hat{L} = \hat{U}\hat{Q}$ is the obtained LR component.
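Step 3 of Algorithm 1 only needs an orthonormal basis U for the learned CS and $m_2$ sampled rows. Below is a minimal sketch that solves (15) one column at a time, which is the same as solving (13) for every column. The paper states the convex program but does not prescribe a solver. Here we substitute iteratively reweighted least squares (IRLS), a common approximate method for $\ell_1$ regression. The helper names `l1_regression` and `learn_representation`, the weight floor `eps` and the iteration count are our assumptions. For brevity, the usage assumes step 2 already returned the exact basis U.

```python
import numpy as np

def l1_regression(A, b, n_iter=300, eps=1e-9):
    """Approximately solve min_x ||b - A x||_1 by IRLS (hypothetical helper)."""
    x = np.linalg.lstsq(A, b, rcond=None)[0]           # least-squares warm start
    for _ in range(n_iter):
        w = 1.0 / np.maximum(np.abs(b - A @ x), eps)   # weights ~ 1/|residual|, floored at 1/eps
        sw = np.sqrt(w)
        x_new = np.linalg.lstsq(A * sw[:, None], b * sw, rcond=None)[0]
        if np.linalg.norm(x_new - x) <= 1e-12 * (1.0 + np.linalg.norm(x)):
            return x_new
        x = x_new
    return x

def learn_representation(D, U, m2, rng):
    """Program (15): learn Q from m2 uniformly sampled rows of D, given a CS basis U."""
    rows = rng.choice(D.shape[0], size=m2, replace=False)  # S_2 stored as an index set
    Ds2, Us2 = D[rows, :], U[rows, :]                       # S_2^T D and S_2^T U
    cols = [l1_regression(Us2, Ds2[:, j]) for j in range(D.shape[1])]
    return np.column_stack(cols)                            # one program (13) per column

# Usage: assume step 2 of Algorithm 1 already returned the exact basis U
rng = np.random.default_rng(1)
N1, N2, r, rho = 400, 100, 5, 0.02
U, _ = np.linalg.qr(rng.standard_normal((N1, r)))   # random orthonormal CS basis
Q = rng.standard_normal((r, N2))
S = np.where(rng.random((N1, N2)) < rho, rng.uniform(-10, 10, (N1, N2)), 0.0)
D = U @ Q + S
Q_hat = learn_representation(D, U, m2=10 * r, rng=rng)
L_hat = U @ Q_hat
```

Each column touches only $m_2 = 10r$ entries of D, so the columns can be processed independently as they arrive. This is the property the text relies on for online operation. An exact linear-programming solver could replace IRLS without changing the structure.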
IV. EFFICIENT COLUMN/ROW SAMPLING

In sharp contrast to randomized algorithms for matrix approximations rooted in numerical linear algebra (NLA) [21], [22], which seek to compute matrix approximations from sampled data using importance sampling, in matrix decomposition and robust PCA we do not have direct access to the LR matrix to measure how informative particular columns/rows are. As such, the existing randomized algorithms for matrix decomposition and robust PCA [5]–[7], [13] have predominantly relied upon uniform random sampling of columns/rows.

In Section IV-A, we briefly describe the implications of non-uniform data distribution and show that uniform random sampling may not be favorable for data matrices exhibiting structures that prevail in many real datasets. In Section IV-B, we demonstrate an efficient column sampling strategy which will be integrated with the proposed decomposition method. The decomposition method with efficient column/row sampling is presented in Sections IV-C and IV-D.

A. Non-uniform data distribution

When data points lie in a low-dimensional subspace, a small subset of the points can span the subspace. However, uniform random sampling is only effective when the data points are distributed uniformly in the subspace. To clarify, Fig. 1 shows two scenarios for a set of data points in a two-dimensional subspace. In the left plot, the data points are distributed uniformly at random. In this case, two randomly sampled data points can span the subspace whp. In the right plot, 95 percent of the data lie on a one-dimensional subspace, thus we may not be able to capture the two-dimensional subspace from a small random subset of the data points.

Fig. 1. Data distributions in a two-dimensional subspace. The red points are the normalized data points.

In practice, the data points in a low-dimensional subspace may not be uniformly distributed, but rather exhibit some additional structures. A prevailing structure in many modern applications is clustered data [23], [24]. For example, user ratings for certain products (e.g., movies) in recommender systems are not only LR due to their inherent correlations, but also exhibit additional clustering structures owing to the similarity of the preferences of individuals from similar backgrounds (e.g., education, culture, or gender) [23], [25].

To further show that uniform random sampling falls short when the data points are not distributed uniformly in the subspace, consider a matrix $G \in \mathbb{R}^{2000 \times 6150}$ generated as $G = [G_1 \ G_2 \ \ldots \ G_n]$. For $1 \le i \le n/2$, $G_i = U_i Q_i$, where $U_i \in \mathbb{R}^{2000 \times r/n}$ and $Q_i \in \mathbb{R}^{(r/n) \times (200r/n)}$. For $n/2 + 1 \le i \le n$, $G_i = U_i Q_i$, where $U_i \in \mathbb{R}^{2000 \times r/n}$ and $Q_i \in \mathbb{R}^{(r/n) \times (5r/n)}$. The elements of $U_i$ and $Q_i$ are sampled independently from a normal N(0,1) distribution. The parameter r is set equal to 60; thus, the rank of G is equal to 60 whp. Fig. 2 illustrates the rank of the randomly sampled columns versus the number of sampled columns for different numbers of clusters n. As n increases, so does the required number of uniformly sampled columns. When n = 60, it turns out that we need to sample more than half of the columns to span the CS. As such, we cannot evade high dimensionality with uniform random column/row sampling. In [23], it was shown that the RS coherency increases when the columns follow a more clustered distribution, and vice versa. These observations match our analysis of Lemma 2.

Fig. 2. The rank of a set of uniformly random sampled columns for different numbers of clusters. [Plot: rank of the sampled columns versus the number of randomly sampled columns, one curve for each n = 1, 2, 4, 10, 20, 30, 60.]

B. Efficient column sampling method

Column sampling is widely used for dimensionality reduction and feature selection [15], [26], [27]. In the column sampling problem, the LR matrix (or the matrix whose span is to be approximated with a small set of its columns) is available. Thus, the columns are sampled based on their importance, measured by the so-called leverage scores [21], as opposed to blind uniform sampling. We refer the reader to [15], [27] and references therein for more information about efficient column sampling methods.

Next, we present a sampling approach which will be used in Section IV-C, where the proposed decomposition algorithm with efficient sampling is presented. The proposed sampling strategy is inspired by the approach in [27] in the context of volume sampling. The table of Algorithm 2 details the presented column sampling procedure. Given a matrix A with rank $r_A$, the algorithm aims to sample a small subset of the columns of A that span its CS. The first column is sampled uniformly at random or based on a judiciously chosen probability distribution [21]. The next columns are selected sequentially so as to maximize the novelty to the span of the selected columns. As shown in step 2.2 of Algorithm 2, a design threshold $\tau$ is used to decide whether a given column brings sufficient novelty to the sampled columns by thresholding the $\ell_2$-norm of its projection on the complement of the span of the sampled columns. The threshold $\tau$ is naturally set to zero in a noise-free setting. Once the selected columns are believed to span the CS of A, they are removed from A. This procedure is repeated C times (using the remaining columns). Each time, the algorithm finds $r_A$ columns which span the CS of A. After every iteration, the rank of the matrix of remaining columns is bounded above by $r_A$. As such, the algorithm samples approximately $m_1 \approx C r_A$ columns in total. In the proposed decomposition method with efficient column/row sampling (presented in Section IV-C), we set C large enough to ensure that the selected columns form a low rank matrix.

Algorithm 2 Efficient Sampling from LR Matrices
Input: Matrix A.
1. Initialize
1.1 The parameter C is chosen as an integer greater than or equal to one. The algorithm finds C sets of linearly dependent columns.
1.2 Set I = ∅ as the index set of the sampled columns and set v = $\tau$, B = A and C = [].
2. Repeat C Times
2.1 Let b be a non-zero randomly sampled column from B with index $i_b$. Update C and I as C = [C b], I = {I, $i_b$}.
2.2 While v ≥ $\tau$
2.2.1 Set E = $P_c$B, where $P_c$ is the projection matrix onto the complement space of span(C).
2.2.2 Define f as the column of E with the maximum $\ell_2$-norm, with index $i_f$. Update C, I and v as C = [C f], I = {I, $i_f$} and v = $\|f\|_2$.
2.2 End While
2.3 Set C = [] and set B equal to A with the columns indexed by I set to zero.
2. End Repeat
Output: The set I contains the indices of the selected columns.
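The steps of Algorithm 2 can be sketched in a few lines of NumPy. This is our illustration under stated assumptions, not the authors' implementation. The function name `efficient_column_sampling` is hypothetical. The threshold `tau` is taken relative to the largest column norm. A candidate column is appended only when its novelty exceeds the threshold: the table appends f first and then tests $\|f\|_2$, which with $\tau = 0$ would not terminate in floating point. The first column of each round is drawn uniformly from the remaining non-zero columns. The usage builds data resembling the right plot of Fig. 1, where 95 percent of the columns lie on a one-dimensional subspace.

```python
import numpy as np

def efficient_column_sampling(A, C=1, tau=1e-8, rng=None):
    """Sketch of Algorithm 2: C rounds, each picking columns that span the CS of A.

    Assumptions (ours): tau is relative to the largest column norm, and a column
    is appended only while its novelty exceeds that tolerance.
    """
    rng = np.random.default_rng() if rng is None else rng
    A = np.asarray(A, dtype=float)
    tol = tau * np.linalg.norm(A, axis=0).max()
    selected = []                                   # index set I
    B = A.copy()
    for _ in range(C):                              # step 2: repeat C times
        candidates = np.flatnonzero(np.linalg.norm(B, axis=0) > tol)
        if candidates.size == 0:                    # nothing left to sample
            break
        i_b = int(rng.choice(candidates))           # step 2.1: random non-zero column
        picked = [B[:, i_b]]                        # the matrix called C in Algorithm 2
        selected.append(i_b)
        while True:                                 # step 2.2: greedy novelty search
            Qb, _ = np.linalg.qr(np.column_stack(picked))
            E = B - Qb @ (Qb.T @ B)                 # E = P_c B
            norms = np.linalg.norm(E, axis=0)
            i_f = int(np.argmax(norms))
            if norms[i_f] <= tol:                   # no remaining column adds novelty
                break
            picked.append(E[:, i_f])
            selected.append(i_f)
        B = A.copy()                                # step 2.3: zero out sampled columns
        B[:, selected] = 0.0
    return selected

# Usage on data resembling the right plot of Fig. 1
rng = np.random.default_rng(0)
basis = rng.standard_normal((50, 2))
coef = rng.standard_normal((2, 1000))
coef[1, :950] = 0.0            # 95% of the columns lie on a one-dimensional subspace
A = basis @ coef
idx = efficient_column_sampling(A, C=3, rng=rng)
```

With C = 3 and $r_A = 2$, the sketch returns six distinct columns that span the two-dimensional CS, matching $m_1 \approx C r_A$. Uniform sampling of a handful of columns, by contrast, would most likely return columns from the dominant one-dimensional cluster only.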
C. Proposed decomposition algorithm with efficient sampling

In this section, we develop a modified decomposition algorithm that replaces uniform random sampling with the efficient column/row sampling method (Algorithm 2). In Section V, it is shown that the proposed technique can remarkably reduce the sampling requirement. We consider a setting wherein the data points (the columns of L) are not uniformly distributed; rather, they admit an additional structure (such as clustering), wherefore a small subset of uniformly sampled columns is not likely to span the CS. However, we assume that the rows of L are distributed well enough, in the sense that they do not much align along any specific directions, such that $C_r r$ rows of L sampled uniformly at random span its RS whp, for some constant $C_r$. In Section IV-D, we dispense with this assumption. The proposed decomposition algorithm rests on three key ideas detailed next.

1) Informative column sampling: The first important idea underlying the proposed sampling approach is to start sampling along the dimension that has the better distribution. For instance, in the example considered in Section IV-A, the columns of G admit a clustering structure. However, the CS of G is a random r-dimensional subspace, which means that the rows of G are distributed uniformly at random in the RS of G. Thus, in this case we start with row sampling. The main intuition is that while almost 60 randomly sampled rows of G span the RS, a considerable portion of the columns (almost 4000) should be sampled to capture the CS, as shown in Fig. 2. As another example, consider an extreme scenario where only two columns of $G \in \mathbb{R}^{1000 \times 1000}$ are non-zero. In this case, with random sampling we need to sample almost all the columns to ensure that the sampled columns span the CS of G. But if the non-zero columns are non-sparse, a small subset of randomly chosen rows of G will span its row space.

Let $\hat{r}$ denote a known upper bound on r. Such knowledge is often available as side information depending on the particular application. For instance, facial images under varying illumination and facial expressions are known to lie on a special low-dimensional subspace [28]. For visualization, Fig. 3 provides a simplified illustration of the matrices defined in this section. We sample $C_r\hat{r}$ rows of D uniformly at random. Let $D_w \in \mathbb{R}^{(C_r\hat{r}) \times N_2}$ denote the matrix of sampled rows. We choose $C_r$ sufficiently large to ensure that the non-sparse component of $D_w$ is a LR matrix. Define $L_w$, assumed to have rank r, as the LR component of $D_w$. If we locate a subset of the columns of $L_w$ that span its CS, the corresponding columns of L would span its CS. To this end, the convex program (2) is applied to $D_w$ to extract its LR component, denoted $\hat{L}_w$. Then, Algorithm 2 is applied to $\hat{L}_w$ to find a set of informative columns by sampling $m_1 \approx C\hat{r}$ columns. In Remark 1, we discuss how to choose C in the algorithm. Define $\hat{L}_w^s$ as the matrix of columns selected from $\hat{L}_w$. The matrix $D_{s1}$ is formed using the columns of D corresponding to the sampled columns of $\hat{L}_w$.

Fig. 3. Visualization of the matrices defined in Section IV-C. Matrix $D_w$ is selected randomly or using Algorithm 3 described in Section IV-D.

2) CS learning: Similar to the CS learning step of Algorithm 1, we can obtain the CS of L by decomposing $D_{s1}$. However, we propose to leverage valuable information in the matrix $\hat{L}_w^s$ in decomposing $D_{s1}$. In particular, if $D_w$ is decomposed correctly, the RS of $\hat{L}_w^s$ would be the same as that of $L_{s1}$, given that the rank of $L_w$ is equal to r. Let $V_{s1}$ be an orthonormal basis for the RS of $\hat{L}_w^s$. Thus, in order to learn the CS of $D_{s1}$, we only need to solve
$$\min_{\hat{U}} \|D_{s1} - \hat{U}V_{s1}^T\|_1. \qquad (17)$$

Remark 1. Define $d_{s1}^i$ as the ith row of $D_{s1}$. According to (17), $U^i$ (the ith row of U) is obtained as the optimal point of
$$\min_{\hat{U}^i} \|(d_{s1}^i)^T - V_{s1}(\hat{U}^i)^T\|_1. \qquad (18)$$
Based on the analysis provided for the proof of Lemma 4, the optimal point of (18) is equal to $U^i$ if $m_1 \ge r + \eta\|S_{s1}^i\|_0$, where $S_{s1}^i$ is the ith row of $S_{s1}$, $\|S_{s1}^i\|_0$ the number of non-zero elements of $S_{s1}^i$, and $\eta$ a real number which depends on the coherency of the subspace spanned by $V_{s1}$. Thus, here C is determined based on the rank of L and the sparsity of S, i.e., $Cr - r$ has to be sufficiently greater than the expected value for the number of non-zero elements of the rows of $S_{s1}$.
Let Ds w 7 be the matrix consisting of the columns of D corresponding w tothecolumnsselectedfromLˆ toformLˆs.Accordingtoour w w investigations, an improved V can be obtained by applying s1 thedecompositionalgorithmpresentedin[29]toDs thenuse w the RS of Lˆs as an initial guess for the RS of the non-sparse w component of Ds. Since Ds is low-dimensional (roughly w w O(r) × O(r) dimensional matrix), this extra step is a low complexity operation. 3) Representation matrix learning: Suppose that the CS of Fig. 4. Visualization of Algorithm 3. We run few cycles of the algorithm L was learned correctly, i.e., the span of the optimal point of and stop when the rank of the LR component does not change over T (17) is equal to the span of U. Thus, we use U as a basis for consecutive steps. One cycle of the algorithm starts from the point marked the learned CS. Now we leverage the information embedded “I” and proceeds as follows. I: Matrix Dw is decomposed and Lˆw is the obtained LR component of Dw. II: Algorithm 2 is applied to Lˆw to select in U to select the informative rows. Algorithm 2 is applied to theinformativecolumnsofLˆw.Lˆsw isthematrixofcolumnsselectedfrom UT to locate m2 ≈Cr rows of U. Thus, we form the matrix Lˆw. III: Matrix Dc is formed from the columns of D that correspond to DofsU2 .frTomhust,hethreowresproefseDntactioornremspaotnridxinigs lteoartnheedsealsected rows tchoemcpoolnuemnntsooffDLˆcsw..2:1:AMlgaotrriitxhmDc2iissdaepcpolimedpotoseLdˆTcandtoLˆsecleisctththeeoibntfaoinrmedatLivRe rows of Lˆc. Lˆsc is the matrix of rows selected from Lˆc. 3: Matrix Dw is min(cid:107)D −U Qˆ(cid:107) , (19) formedastherowsofDcorrespondingtotherowsusedtoformLˆsc. s2 s2 1 Qˆ where Us2 ∈Rm2×r is the matrix of the selected rows of U. quite well and the RS of Lw converges to the RS of L in few Subsequently,theLRmatrixcanbeobtainedfromthelearned steps.Wehavealsofoundthataddingsomerandomlysampled CS and the representation matrix. 
D. Column/Row sampling from sparsely corrupted data
In Section IV-C, we assumed that the LR component of $D_w$ has rank $r$. However, if the rows are not well distributed, a reasonably sized random subset of the rows may not span the RS of $L$. Here, we present a sampling approach which can find the informative columns/rows even when both the columns and the rows exhibit clustering structures such that a small random subset of the columns/rows of $L$ cannot span its CS/RS. The algorithm presented in this section (Algorithm 3) can be used independently as an efficient approach for sampling from big data. In this paper, we use Algorithm 3 to form $D_w$ if both the columns and rows exhibit clustering structures.

The table of Algorithm 3, Fig. 4 and its caption provide the details of the proposed sampling approach and the definitions of the matrices used. We start the cycle from the position marked "I" in Fig. 4 with $D_w$ formed according to the initialization step of Algorithm 3. For ease of exposition, assume that $\hat{L}_w = L_w$ and $\hat{L}_c = L_c$, i.e., $D_w$ and $D_c$ are decomposed correctly. The matrix $\hat{L}_w^s$ consists of the informative columns of $\hat{L}_w$. Thus, the rank of $\hat{L}_w^s$ is equal to the rank of $\hat{L}_w$. Since $\hat{L}_w = L_w$, $\hat{L}_w^s$ is a subset of the rows of $L_c$. If the rows of $L$ exhibit a clustering structure, it is likely that $\operatorname{rank}(\hat{L}_w^s) < \operatorname{rank}(L_c)$. Thus, $\operatorname{rank}(L_w) < \operatorname{rank}(L_c)$. We continue one cycle of the algorithm by going through steps 1, 2 and 3 of Fig. 4 to update $D_w$. Using a similar argument, we see that the rank of an updated $L_w$ will be greater than the rank of $L_c$. Thus, if we run more cycles of the algorithm, each time updating $D_w$ and $D_c$, the ranks of $L_w$ and $L_c$ will increase. As detailed in the table of Algorithm 3, we stop if the dimension of the span of the obtained LR component does not change in $T$ consecutive iterations. While there is no guarantee that the rank of $L_w$ will converge to $r$ (it can converge to a value smaller than $r$), our investigations have shown that Algorithm 3 performs quite well and the RS of $L_w$ converges to the RS of $L$ in a few steps. We have also found that adding some randomly sampled columns (rows) to $D_c$ ($D_w$) can effectively avert converging to a lower-dimensional subspace. For instance, some randomly sampled columns can be added to $D_c$, which was obtained by applying Algorithm 2 to $\hat{L}_w$.

Algorithm 3 Efficient Column/Row Sampling from Sparsely Corrupted LR Matrices
1. Initialization
Form $D_w \in \mathbb{R}^{C_r \hat{r} \times N_2}$ by randomly choosing $C_r \hat{r}$ rows of $D$. Initialize $k = 1$ and set $T$ equal to an integer greater than 1.
2. While $k > 0$
2.1 Sample the most informative columns
2.1.1 Obtain $\hat{L}_w$ via (2) as the LR component of $D_w$.
2.1.2 Apply Algorithm 2 to $\hat{L}_w$ with $C = C_r$.
2.1.3 Form the matrix $D_c$ from the columns of $D$ corresponding to the sampled columns of $\hat{L}_w$.
2.2 Sample the most informative rows
2.2.1 Obtain $\hat{L}_c$ via (2) as the LR component of $D_c$.
2.2.2 Apply Algorithm 2 to $\hat{L}_c^T$ with $C = C_r$.
2.2.3 Form the matrix $D_w$ from the rows of $D$ corresponding to the sampled rows of $\hat{L}_c$.
2.3 If the dimension of the RS of $\hat{L}_w$ does not increase in $T$ consecutive iterations, set $k = 0$ to stop the algorithm.
2. End While
Output: The matrices $D_w$ and $\hat{L}_w$ can be used for column sampling in the first step of the algorithm presented in Section IV-C.

Algorithm 3 was found to converge in a very small number of iterations (typically fewer than 4). Thus, even when Algorithm 3 is used to form the matrix $D_w$, the order of complexity of the proposed decomposition method with efficient column/row sampling (presented in Section IV-C) is roughly $O(\max(N_1, N_2)\, r^2)$.
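The alternating structure of Algorithm 3 can be sketched in the noise-free setting later used in Section V-D. There, the sparse component is zero, so the decomposition steps 2.1.1 and 2.2.1 reduce to using the sampled data directly. This is a minimal sketch under assumptions. Algorithm 2 is not reproduced here, so column-pivoted QR (`scipy.linalg.qr` with `pivoting=True`) is swapped in as a hypothetical stand-in whose leading pivots span the column space. The rank $r$ is assumed known, and a fixed number of cycles replaces the stopping rule based on $T$. The names `select_columns` and `alternating_sampling`, and the toy matrix, are illustrative.

```python
import numpy as np
from scipy.linalg import qr


def select_columns(A, k):
    """Hypothetical stand-in for Algorithm 2: indices of k columns of A chosen
    by column-pivoted QR; the leading pivots span the column space of A."""
    _, _, piv = qr(A, mode="economic", pivoting=True)
    return np.sort(piv[:k])


def alternating_sampling(D, r, n_init, C=3, n_iter=4, seed=0):
    """Alternate row and column selection with S = 0 (as in Section V-D).

    Assumes C * r <= n_init. Returns the final column indices and the rank of
    D[:, cols] before the first cycle and after every cycle.
    """
    rng = np.random.default_rng(seed)
    cols = rng.choice(D.shape[1], n_init, replace=False)   # uniform start for D_c
    ranks = [np.linalg.matrix_rank(D[:, cols])]
    for _ in range(n_iter):
        D_c = D[:, cols]
        rows = select_columns(D_c.T, C * r)   # informative rows of D_c
        D_w = D[rows, :]
        cols = select_columns(D_w, C * r)     # informative columns of D_w
        ranks.append(np.linalg.matrix_rank(D[:, cols]))
    return cols, ranks


# Toy matrix: generic rows, but clustered columns (994 columns in a 2-D
# subspace plus three tiny 1-D clusters of 2 columns each), so a uniform
# sample of 20 columns usually misses part of the column space.
rng = np.random.default_rng(1)
N1, r = 300, 5
A = rng.standard_normal((N1, r))
dirs = rng.standard_normal((r, r))
big = rng.standard_normal((994, 2)) @ dirs[:2]
small = [rng.standard_normal((2, 1)) @ dirs[k:k + 1] for k in range(2, r)]
B = np.vstack([big, *small])
D = A @ B.T                                   # 300 x 1000, rank r

cols, ranks = alternating_sampling(D, r, n_init=20)
print(ranks)
```

With this pivoted-QR stand-in, a cycle never lowers the rank of the sampled columns in exact arithmetic, which mirrors the argument above. Because the rows in this toy are generic, the rank reaches $r$ after one cycle. The text targets the harder case in which the rows are clustered as well.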
E. Online Implementation
The proposed decomposition approach consists of two main steps, namely, learning the CS of the LR component and then decomposing the columns independently. This structure lends itself to online implementation, which could be very beneficial in settings where the data arrives on the fly. The idea is to first learn the CS of the LR component from a small batch of the data and keep tracking the CS. Since the CS is being tracked, any new data column can be decomposed based on the updated subspace. The table of Algorithm 4 details the proposed online matrix decomposition algorithm, where $d_t$ denotes the $t$th received data column.

Algorithm 4 uses a parameter $n_u$ which determines the rate at which the algorithm updates the CS of the LR component. For instance, if $n_u = 20$, then the CS is updated every 20 new data columns (step 2.2 of Algorithm 4). The parameter $n_u$ has to be set in accordance with the rate of change of the subspace of the LR component; a small value for $n_u$ is used if the subspace is changing rapidly. The parameter $n_s$ determines the number of most recently received columns that are used to update the CS. If the subspace changes rapidly, the older columns may be less relevant to the current subspace, hence a small value for $n_s$ is used. On the other hand, when the data is noisy and the subspace changes at a slower rate, choosing a larger value for $n_s$ can lead to more accurate estimation of the CS.

Algorithm 4 Online Implementation
1. Initialization
1.1 Set the parameters $n_u$ and $n_s$ equal to integers greater than or equal to one.
1.2 Form $D_0 \in \mathbb{R}^{N_1 \times (C_r \hat{r})}$ as $D_0 = [d_1\ d_2\ \cdots\ d_{C_r \hat{r}}]$. Decompose $D_0$ using (2) and obtain the CS of its LR component. Define $U_o$ as the learned CS, $Q_o$ the appropriate representation matrix and $\hat{S}$ the obtained sparse component of $D_0$.
1.3 Apply Algorithm 2 to $U_o^T$ to construct the row sampling matrix $S_2$.
2. For any new data column $d_t$ do
2.1 Decompose $d_t$ as
$$\min_{\hat{q}_t} \|S_2^T d_t - S_2^T U_o \hat{q}_t\|_1, \tag{20}$$
and update $Q_o \leftarrow [Q_o\ \ q_t^*]$, $\hat{S} \leftarrow [\hat{S}\ \ (d_t - U_o q_t^*)]$, where $q_t^*$ is the optimal point of (20).
2.2 If the remainder of $t/n_u$ is equal to zero, update $U_o$ as
$$\min_{\hat{U}_o} \|D_t - \hat{U}_o Q_o^t\|_1, \tag{21}$$
where $Q_o^t$ is the last $n_s \hat{r}$ columns of $Q_o$ and $D_t$ is the matrix formed from the last $n_s \hat{r}$ received data columns. Apply Algorithm 2 to the new $U_o^T$ to update the row sampling matrix $S_2$.
2. End For
Output: The matrix $\hat{S}$ as the obtained sparse matrix, $\hat{L} = D - \hat{S}$ as the obtained LR matrix and $U_o$ as the current basis for the CS of the LR component.

F. Noisy Data
In practice, noisy data can be modeled as
$$D = L + S + N, \tag{22}$$
where $N$ is an additive noise component. In [30], it was shown that the program
$$\min_{\hat{L}, \hat{S}} \ \lambda \|\hat{S}\|_1 + \|\hat{L}\|_* \quad \text{subject to} \quad \|\hat{L} + \hat{S} - D\|_F \leq \epsilon_n \tag{23}$$
can recover the LR and sparse components with an error bound that is proportional to the noise level. The parameter $\epsilon_n$ has to be chosen based on the noise level. This modified version can be used in the proposed algorithms to account for the noise. Similarly, to account for the noise in the representation learning problem (15), the $\ell_1$-norm minimization problem can be modified as follows:
$$\min_{\hat{Q}, \hat{E}} \|S_2^T D - S_2^T U \hat{Q} - \hat{E}\|_1 \quad \text{subject to} \quad \|\hat{E}\|_F \leq \delta_n. \tag{24}$$
The matrix $\hat{E} \in \mathbb{R}^{m_2 \times N_2}$ is used to cancel out the effect of the noise and the parameter $\delta_n$ is chosen based on the noise level [31].

V. NUMERICAL SIMULATIONS
In this section, we present some numerical simulations to study the performance of the proposed randomized decomposition method. First, we present a set of simulations confirming our analysis, which established that the sufficient number of sampled columns/rows is linear in $r$. Then, we compare the proposed approach to the state-of-the-art randomized algorithm [5] and demonstrate that the proposed sampling strategy can lead to notable improvement in performance. We then provide an illustrative example to showcase the effectiveness of our approach on real video frames for background subtraction and activity detection. Given the structure of the proposed approach, it is shown that side information can be leveraged to further simplify the decomposition task. In addition, a numerical example is provided to examine the performance of Algorithm 3. Finally, we investigate the performance of the online algorithm and show that the proposed online method can successfully track the underlying subspace.

In all simulations, the Augmented Lagrange Multiplier (ALM) algorithm [1], [4] is used to solve the optimization problem (2). In addition, the $\ell_1$-magic routine [32] is used to solve the $\ell_1$-norm minimization problems. It is important to note that in all the provided simulations (except in Section V-D), the convex program (2) that operates on the entire data can yield correct decomposition with respect to the considered criteria. Thus, if the randomized methods cannot yield correct decomposition, it is because they fall short of acquiring the essential information through sampling.

A. Phase transition plots
In this section, we investigate the required number of randomly sampled columns/rows. The LR matrix is generated as a product $L = U_r Q_r$, where $U_r \in \mathbb{R}^{N_1 \times r}$ and $Q_r \in \mathbb{R}^{r \times N_2}$. The elements of $U_r$ and $Q_r$ are sampled independently from a standard normal $\mathcal{N}(0,1)$ distribution. The sparse matrix $S$ follows the Bernoulli model with $\rho = 0.02$. In this experiment, Algorithm 1 is used and the columns/rows are sampled uniformly at random.

Fig. 5. Phase transition plots for various rank and sparsity levels. White designates successful decomposition and black designates incorrect decomposition.

Fig. 6. Phase transition plots for various data matrix dimensions.
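The data model of this experiment, the uniform column/row sampling of Algorithm 1, and the success test $\|L - \hat{L}\|_F / \|L\|_F \leq 5 \times 10^{-3}$ used for Fig. 5 can be sketched as follows. This is an illustrative sketch only. The values of the non-zero entries of $S$ are an assumption (standard normal here), since this section specifies only the Bernoulli support model with parameter $\rho$. The function names `phase_transition_instance`, `sample_uniform` and `is_success`, and the small matrix sizes, are hypothetical.

```python
import numpy as np


def phase_transition_instance(N1, N2, r, rho, rng):
    """Return (L, S, D) with L = U_r Q_r and a Bernoulli(rho) sparse S."""
    U_r = rng.standard_normal((N1, r))
    Q_r = rng.standard_normal((r, N2))
    L = U_r @ Q_r
    support = rng.random((N1, N2)) < rho          # each entry non-zero w.p. rho
    # Assumption: non-zero values drawn from N(0, 1); only the support model is given.
    S = np.where(support, rng.standard_normal((N1, N2)), 0.0)
    return L, S, L + S


def sample_uniform(n, m, rng):
    """Indices of m out of n columns (or rows), drawn uniformly without replacement."""
    return np.sort(rng.choice(n, size=m, replace=False))


def is_success(L, L_hat, tol=5e-3):
    """Success criterion: relative Frobenius error of the LR estimate at most tol."""
    return np.linalg.norm(L - L_hat) / np.linalg.norm(L) <= tol


rng = np.random.default_rng(0)
L, S, D = phase_transition_instance(200, 300, 5, 0.02, rng)
cols = sample_uniform(D.shape[1], 30, rng)   # m_1 sampled columns
rows = sample_uniform(D.shape[0], 30, rng)   # m_2 sampled rows
D_s1, D_s2 = D[:, cols], D[rows, :]
```

Each point of a phase transition plot would repeat this for 10 realizations at a given $(m_1, m_2)$ and record the fraction of successful trials.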
Fig. 5 shows the phase transition plots for different numbers of randomly sampled rows/columns. In this simulation, the data is a $1000 \times 1000$ matrix. For each $(m_1, m_2)$, we generate 10 random realizations. A trial is considered successful if the recovered LR matrix $\hat{L}$ satisfies $\|L - \hat{L}\|_F / \|L\|_F \leq 5 \times 10^{-3}$. It is clear that the required number of sampled columns/rows increases as the rank or the sparsity parameter $\rho$ is increased. When the sparsity parameter is increased to 0.3, the proposed algorithm can hardly yield correct decomposition; in this case, the matrix $S$ is no longer a sparse matrix.

The top row of Fig. 5 confirms that the sufficient values for $m_1$ and $m_2$ are roughly linear in $r$. For instance, when the rank is increased from 5 to 25, the required value for $m_1$ increases from 30 to 140. In this experiment, the CS and RS of $L$ are sampled from the random orthogonal model. Thus, the CS and RS have small coherency whp [20]. Therefore, the important factor governing the sample complexity is the rank of $L$. Indeed, Fig. 6 shows the phase transition for different sizes of the data matrix when the rank of $L$ is fixed. One can see that the required values for $m_1$ and $m_2$ are almost independent of the size of the data, confirming our analysis.

B. Efficient column/row sampling
In this experiment, the algorithm presented in Section IV-C is compared to the randomized decomposition algorithm in [5]. It is shown that the proposed sampling strategy can effectively reduce the required number of sampled columns/rows, and makes the proposed method remarkably robust to structured data. In this experiment, $D$ is a $2000 \times 4200$ matrix. The LR component is generated as
$$L = [G_1\ G_2\ \cdots\ G_n].$$
For $1 \leq i \leq n/2$,
$$G_i = U_i Q_i,$$
where $U_i \in \mathbb{R}^{2000 \times r/n}$, $Q_i \in \mathbb{R}^{r/n \times 130r/n}$, and the elements of $U_i$ and $Q_i$ are sampled independently from a normal distribution $\mathcal{N}(0,1)$. For $n/2 + 1 \leq i \leq n$,
$$G_i = 13\, U_i Q_i,$$
where $U_i \in \mathbb{R}^{2000 \times r/n}$, $Q_i \in \mathbb{R}^{r/n \times 10r/n}$, and the elements of $U_i$ and $Q_i$ are sampled independently from an $\mathcal{N}(0,1)$ distribution. We set $r$ equal to 60; thus, the rank of $L$ is equal to 60 whp. The sparse matrix $S$ follows the Bernoulli model and each element of $S$ is non-zero with probability 0.02. In this simulation, we do not use Algorithm 3 to form $D_w$. The matrix $D_w$ is formed from 300 uniformly sampled rows of $D$.

We evaluate the performance of the algorithm for different values of $n$, i.e., different numbers of clusters. Fig. 7 shows the performance of the proposed approach and the approach in [5] for different values of $m_1$ and $m_2$. For each value of $m_1 = m_2$, we compute the error in LR matrix recovery, $\|L - \hat{L}\|_F / \|L\|_F$, averaged over 10 independent runs, and conclude that the algorithm yields correct decomposition if the average error is less than 0.01. In Fig. 7, the values 0 and 1 designate incorrect and correct decomposition, respectively. It can be seen that the presented approach requires a significantly smaller number of samples to yield the correct decomposition. This is due to the fact that the randomized algorithm [5] samples both the columns and rows uniformly at random and independently. In sharp contrast, we use $\hat{L}_w$ to find the most informative columns to form $D_{s1}$, and also leverage the information embedded in the CS to find the informative rows to form $D_{s2}$. One can see that when $n = 60$, [5] cannot yield correct decomposition even when $m_1 = m_2 = 1800$.

Fig. 7. Performance of the proposed approach and the randomized algorithm in [5]. A value 1 indicates correct decomposition and a value 0 indicates incorrect decomposition.

C. Vector decomposition for background subtraction
The LR plus sparse matrix decomposition can be effectively used to detect a moving object in a stationary background [1], [33]. The background is modeled as a LR matrix and the moving object as a sparse matrix. Since videos are typically high-dimensional objects, standard algorithms can be quite slow for such applications. Our algorithm is a good candidate for such a problem as it reduces the dimensionality significantly. The decomposition problem can be further simplified by leveraging prior information about the stationary background. In particular, we know that the background does not change, or we can construct it with some pre-known dictionary. For example, consider the video from [34], which was also used in [1]. A few frames of the stationary background are illustrated in Fig. 8. Thus, we can simply form the CS of the LR matrix using these frames, which can describe the stationary background in different states. Accordingly, we just need to learn the representation matrix. As such, the background subtraction problem is simplified to a vector decomposition problem. Fig. 9 shows that the proposed method successfully separates the background and the moving objects. In this experiment, 500 randomly sampled rows (i.e., 500 randomly sampled pixels) are used for the representation matrix learning (15). While the running time of our approach is just a few milliseconds, it takes almost half an hour if we use (2) to decompose the video file [1].

Fig. 8. Stationary background.

Fig. 9. Two frames of a video taken in a lobby. The first column displays the original frames. The second and third columns display the LR and sparse components recovered using the proposed approach.

D. Alternating algorithm for column sampling
In this section, we investigate the performance of Algorithm 3 for column sampling. The rank of the selected columns is shown to converge to the rank of $L$ even when both the rows and columns of $L$ exhibit a highly structured distribution. To generate the LR matrix $L$, we first generate a matrix $G$ as in Section IV-A but with $r = 100$. Then, we construct the matrix $U_g$ from the first $r$ right singular vectors of $G$. We then generate $G$ in a similar way and set $V_g$ equal to the first $r$ right singular vectors of $G$. Let the matrix $L = U_g V_g^T$. For example, for $n = 100$, $L \in \mathbb{R}^{10250 \times 10250}$. Note that the resulting LR matrix is nearly sparse, since in this simulation we consider a very challenging scenario in which both the columns and rows of $L$ are highly structured and coherent. Thus, in this simulation we set the sparse matrix equal to zero and use Algorithm 3 as follows. The matrix $D_c$ is formed using 300 columns sampled uniformly at random and the following steps are performed iteratively:
1. Apply Algorithm 2 to $D_c^T$ with $C = 3$ to sample approximately $3r$ columns of $D_c^T$, and form $D_w$ from the rows of $D$ corresponding to the selected rows of $D_c$.
2. Apply Algorithm 2 to $D_w$ with $C = 3$ to sample approximately $3r$ columns of $D_w$, and form $D_c$ from the columns of $D$ corresponding to the selected columns of $D_w$.
Fig. 10 shows the rank of $D_c$ after each iteration. It is evident that the algorithm converges to the rank of $L$ in fewer than 3 iterations, even for $n = 100$ clusters. For all values of $n$, i.e., $n \in \{2, 50, 100\}$, the data is a $10250 \times 10250$ matrix.

Fig. 10. The rank of the matrix of sampled columns. (Plot: rank of the sampled columns/rows versus iteration number, iterations 1 to 6, for 2, 50 and 100 clusters.)

E. Online Implementation
In this section, the proposed online method is examined. It is shown that the proposed scalable online algorithm tracks the underlying subspace successfully. The matrix $S$ follows the Bernoulli model with $\rho = 0.01$. Assume that the orthonormal matrix $U \in \mathbb{R}^{N_1 \times r}$ spans a random $r$-dimensional subspace. The matrix $L$ is generated as follows.
For $k$ from 1 to $N_2$
  1. Generate $E \in \mathbb{R}^{N_1 \times r}$ and $q \in \mathbb{R}^{r \times 1}$ randomly.
  2. $L = [L\ \ Uq]$.
  3. If $\operatorname{mod}(k, n) = 0$
       $U = \text{approx-}r(U + \alpha E)$.
     End If
End For
The elements of $q$ and $E$ are sampled from standard normal distributions. The output of the function approx-$r$ is the matrix of the first $r$ left singular vectors of the input matrix, and $\operatorname{mod}(k, n)$ is the remainder of $k/n$. The parameters $\alpha$ and $n$ control the rate of change of the underlying subspace: the subspace changes at a higher rate if $\alpha$ is increased or $n$ is decreased. In this simulation, $n = 10$ (i.e., the CS is randomly rotated every 10 new data columns), $r = 5$ and $N_1 = 400$. We compare the performance of the proposed online approach to the online algorithm in [35]. For our proposed method, we set $C = 20$ when Algorithm 2 is applied to $U^T$, i.e., $20r$ rows of $U$ are sampled. The method presented in [35] is initialized with the exact CS and its tuning parameter is set equal to $1/\sqrt{N_1}$. The algorithm [35] updates the CS with every new data column. The parameter $n_u$ of the proposed online method is set equal to 4 (i.e., the CS is updated every 4 new data columns) and the parameter $n_s$ is set equal to $5r$. Define $\hat{L}$ as the recovered LR matrix. Fig. 11 shows the $\ell_2$-norm of the columns of
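The subspace-drift model of Section V-E translates directly into NumPy. In this minimal sketch, `approx_r` implements the stated definition (the first $r$ left singular vectors of its input). $N_1 = 400$, $r = 5$ and $n = 10$ follow the text. The value of $\alpha$ and the number of columns $N_2$ are illustrative assumptions because they are not given in this excerpt. The sparse component is omitted, and the function names are hypothetical.

```python
import numpy as np


def approx_r(M, r):
    """First r left singular vectors of M (the approx-r function of the text)."""
    U, _, _ = np.linalg.svd(M, full_matrices=False)
    return U[:, :r]


def drifting_low_rank(N1, N2, r, n, alpha, rng):
    """Generate L column by column; the CS is perturbed every n columns."""
    U = approx_r(rng.standard_normal((N1, r)), r)   # random r-dim subspace
    columns = []
    for k in range(1, N2 + 1):
        E = rng.standard_normal((N1, r))
        q = rng.standard_normal(r)
        columns.append(U @ q)                       # L = [L Uq]
        if k % n == 0:                              # mod(k, n) = 0
            U = approx_r(U + alpha * E, r)          # rotate the subspace
    return np.column_stack(columns), U


rng = np.random.default_rng(0)
# alpha = 0.1 and N2 = 200 are illustrative choices, not values from the text.
L, U_final = drifting_low_rank(N1=400, N2=200, r=5, n=10, alpha=0.1, rng=rng)
```

Each block of $n$ consecutive columns lies in a single $r$-dimensional subspace, while the full matrix has a higher rank once $\alpha > 0$. With $\alpha = 0$, every column stays in the initial subspace.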

The list of books you might like

Believe Me

Tahereh Mafi
·177 Pages
·2021
·2.19 MB

The Silent Patient

Alex Michaelides
·0.52 MB

Can’t Hurt Me: Master Your Mind and Defy the Odds

David Goggins
·364 Pages
·2018
·2.99 MB

The Sweetest Oblivion (Made Book 1)

Danielle Lori
·360 Pages
·2018
·1.72 MB

Greek Government Gazette: Part 2, 1993 no. 615

The Government of the Hellenic Republic
·8 Pages
·1993
·0.51 MB

समग्र शणै गोंयबाब -3

शांताराम वर्दे वालावलीकार
·22 MB

Chhattisgarh Gazette, 2006-01-20, No. 3, Pt. 3 (2)

Government of Chhattisgarh
·0.28 MB

Ships Company Book 1

18 Pages
·2021
·0.59 MB

Greek Government Gazette: Part 2, 2006 no. 1777

The Government of the Hellenic Republic
·2006
·0.13 MB

Greek Government Gazette: Part 2, 2006 no. 1923

The Government of the Hellenic Republic
·2006
·0.13 MB

Greek Government Gazette: Part 2, 2006 no. 1675

The Government of the Hellenic Republic
·2006
·3 MB

Greek Government Gazette: Part 2, 2006 no. 1674

The Government of the Hellenic Republic
·2006
·0.12 MB

Cage Aquaculture

172 Pages
·2014
·3.16 MB