Deterministic and Probabilistic Conditions for Finite Completability of Low-rank Multi-View Data

Morteza Ashraphijuo, Xiaodong Wang and Vaneet Aggarwal

arXiv:1701.00737v1 [cs.IT] 3 Jan 2017

Abstract

We consider the multi-view data completion problem, i.e., to complete a matrix $U = [U_1|U_2]$ where the ranks of $U$, $U_1$, and $U_2$ are given. In particular, we investigate the fundamental conditions on the sampling pattern, i.e., locations of the sampled entries, for finite completability of such multi-view data given the corresponding rank constraints. In contrast with the existing analysis on the Grassmannian manifold for a single-view matrix, i.e., conventional matrix completion, we propose a geometric analysis on the manifold structure for multi-view data to incorporate more than one rank constraint. We provide a deterministic necessary and sufficient condition on the sampling pattern for finite completability. We also give a probabilistic condition in terms of the number of samples per column that guarantees finite completability with high probability. Finally, using the developed tools, we derive the deterministic and probabilistic guarantees for unique completability.

Index Terms

Multi-view learning, low-rank matrix completion, finite completion, unique completion, algebraic geometry.

Morteza Ashraphijuo and Xiaodong Wang are with the Department of Electrical Engineering, Columbia University, NY, email: {ashraphijuo,wangx}@ee.columbia.edu. Vaneet Aggarwal is with the School of Industrial Engineering, Purdue University, West Lafayette, IN, email: [email protected].

I. INTRODUCTION

High-dimensional data analysis has received significant recent attention due to the ubiquitous big data, including images and videos, product ranking datasets, gene expression database, etc.
Many real-world high-dimensional datasets exhibit low-rank structures, i.e., the data can be represented in a much lower dimensional form [1,2]. Efficiently exploiting such low-rank structure for analyzing large high-dimensional datasets is one of the most active research areas in machine learning and data mining. In this paper, we consider the multi-view low-rank data completion problem, where the ranks of the first and second views, as well as the rank of the whole data consisting of both views together, are given.

The single-view learning problem has plenty of applications in various areas including signal processing [3], network coding [4], etc. The multi-view learning problem also finds applications in signal processing [5], multi-label image classification [6–8], image retrieval [9], image synthesis [10,11], data classification [12], multi-lingual text categorization [13], etc.

Given a sampled matrix and a rank constraint, any matrix that agrees with the sampled entries and the rank constraint is called a completion. A sampled matrix with a rank constraint is finitely completable if and only if there exist only finitely many completions of it. Most of the literature on matrix completion focuses on developing optimization methods to obtain a completion. For single-view learning, methods including alternating minimization [14,15], convex relaxation of rank [16–18], etc., have been proposed. One generalization of the matrix completion problem is tensor completion, where the number of orders can be more than two, and for which alternating minimization methods [19,20] and other optimization-based methods [21,22], etc., have been proposed. Moreover, for multi-view learning, many optimization-based algorithms have been proposed recently [23–33].

The optimization-based matrix completion algorithms typically require incoherence conditions, which constrain the values of the entries (sampled and non-sampled), to obtain a completion with high probability.
Moreover, the fundamental completability conditions that are independent of the specific completion algorithms have also been investigated. Specifically, deterministic conditions on the locations of the sampled entries (sampling pattern) that lead to finite/unique solutions to the matrix completion problem are obtained through algebraic geometry analyses on the Grassmannian manifold [34,35]. In particular, in [34] a deterministic condition on the sampling pattern is proposed that is necessary and sufficient for finite completability of the sampled matrix of the given rank. Such an algorithm-independent condition can lead to a much lower sampling rate than the one that is required by the optimization-based completion algorithms. In [36], we proposed a geometric analysis on the Tucker manifold for the low-rank tensor completion problem and provided the necessary and sufficient conditions on the sampling pattern for finite completability of a tensor. However, the analysis on the Grassmannian manifold in [34] is not capable of incorporating more than one rank constraint, and the analysis on the Tucker manifold in [36] is not capable of incorporating rank constraints for different views. In this paper, we investigate the finite completability problem for multi-view data by proposing an analysis on the manifold structure for such data.

Consider a sampled data matrix $U$ that is partitioned as $U = [U_1|U_2]$, where $U_1$ and $U_2$ are the first and second views of $U$. The multi-view matrix completion problem is to complete $U$ given the ranks of $U$, $U_1$, and $U_2$. Let $\Omega$ be the sampling pattern matrix of $U$, where $\Omega(x,y) = 1$ if $U(x,y)$ is sampled and $\Omega(x,y) = 0$ otherwise. This paper is mainly concerned with the following three problems.

• Problem (i): Characterize the necessary and sufficient conditions on $\Omega$, under which there exist only finitely many completions of $U$ that satisfy all three rank constraints.
• Problem (ii): Characterize sufficient conditions on $\Omega$, under which there exists only one completion of $U$ that satisfies all three rank constraints.

• Problem (iii): Give lower bounds on the number of sampled entries per column such that the proposed conditions on $\Omega$ for finite/unique completability are satisfied with high probability.

The remainder of this paper is organized as follows. In Section II, the notations and problem statement are provided. In Section III, we characterize the deterministic necessary and sufficient condition on the sampling pattern for finite completability. In Section IV, a probabilistic guarantee for finite completability is proposed where the condition is in terms of the number of samples per column, in contrast with the geometric structure given in Section III. In Section V, deterministic and probabilistic guarantees for unique completability are provided. Numerical results are provided in Section VI to compare the number of samples per column for finite and unique completions based on our proposed analysis versus the existing method. Finally, Section VII concludes the paper.

II. PROBLEM STATEMENT AND A MOTIVATING EXAMPLE

Let $U$ be the sampled matrix to be completed. Denote $\Omega$ as the sampling pattern matrix that is of the same size as $U$, with $\Omega(x_1,x_2) = 1$ if $U(x_1,x_2)$ is observed and $\Omega(x_1,x_2) = 0$ otherwise. For each subset of columns $U'$ of $U$, define $N_\Omega(U')$ as the number of observed entries in $U'$ according to the sampling pattern $\Omega$. For any real number $x$, define $x^+ = \max\{0,x\}$. Also, $I_n$ denotes an $n \times n$ identity matrix and $0_{n \times m}$ denotes an $n \times m$ all-zero matrix.

The generically chosen matrix $U \in \mathbb{R}^{n \times (m_1+m_2)}$ is sampled. Denote a partition of $U$ as $U = [U_1|U_2]$ where $U_1 \in \mathbb{R}^{n \times m_1}$ and $U_2 \in \mathbb{R}^{n \times m_2}$ represent the first and second views of data, respectively.
Given the rank constraints $\mathrm{rank}(U_1) = r_1$, $\mathrm{rank}(U_2) = r_2$ and $\mathrm{rank}(U) = r$, we are interested in characterizing the conditions on the sampling pattern matrix $\Omega$ under which there are infinitely many, finitely many, or a unique completion of the sampled matrix $U$.

In [34] a necessary and sufficient condition on the sampling pattern is given for the finite completability of a matrix $U$ given $\mathrm{rank}(U) = r$, based on an algebraic geometry analysis on the Grassmannian manifold. However, such analysis cannot be used to treat the above multi-view problem since it is not capable of incorporating the three rank constraints simultaneously. In particular, if we obtain the conditions in [34] corresponding to $U$, $U_1$ and $U_2$ respectively and then take their intersection, the result is a sufficient (but not necessary) condition on the sampling pattern matrix $\Omega$ under which there are finitely many completions of $U$.

Next, we provide an example such that: (i) given $r_1$, $U_1$ is infinitely completable; (ii) given $r_2$, $U_2$ is infinitely completable;$^1$ (iii) given $r$, $U$ is infinitely completable; and (iv) given $r_1$, $r_2$ and $r$, $U$ is finitely completable. In other words, if $S_1$ denotes the set of completions of $U_1$ given $\mathrm{rank}(U_1) = r_1$, $S_2$ denotes the set of completions of $U_2$ given $\mathrm{rank}(U_2) = r_2$ and $S$ denotes the set of completions of $U$ given $\mathrm{rank}(U) = r$, then in the following example $|S_1| = \infty$, $|S_2| = \infty$, $|S| = \infty$ and $|S_1 \cap S_2 \cap S| < \infty$.

Consider a generically chosen matrix $U \in \mathbb{R}^{4 \times 4}$, where $U = [U_1|U_2]$, $U_1 \in \mathbb{R}^{4 \times 2}$ (the first two columns) and $U_2 \in \mathbb{R}^{4 \times 2}$ (the last two columns). Assume that $r_1 = 1$, $r_2 = 2$ and $r = 2$. Moreover, suppose that the sampled entries of $U$ are as shown below, where $\times$ marks a sampled entry and $-$ marks a non-sampled entry.

\[
U = [\,U_1 \mid U_2\,] =
\left[\begin{array}{cc|cc}
\times & \times & \times & \times \\
\times & - & \times & \times \\
\times & - & \times & - \\
- & - & \times & \times
\end{array}\right]
\]

$^1$ (i) and (ii) together imply that, given $r_1$ and $r_2$, $U_1$, $U_2$ and $U$ are infinitely completable.
We have the following observations about the number of completions of each matrix.

• Given $r_1 = 1$, $U_1$ is infinitely completable: For any value of the $(4,1)$-th entry of $U_1$, there exists exactly one completion of $U_1$. Hence, there exist infinitely many completions of $U_1$.

• Given $r_2 = 2$, $U_2$ is infinitely completable: Observe that each value of the $(3,2)$-th entry of $U_2$ corresponds to one completion of $U_2$. As a result, there are infinitely many completions of $U_2$.

• Given $r = 2$, $U$ is infinitely completable: Note that for any value of the $(2,2)$-th entry of $U$, there exists at least one completion of $U$ (as the second column of $U$ is a linear combination of two vectors and only one entry of this column is known), and therefore $U$ is infinitely completable.

• For almost every matrix $U$, given $r_1 = 1$, $r_2 = 2$ and $r = 2$, $U$ is finitely completable: We prove this statement in Appendix A by applying Theorem 2, which takes advantage of a geometric analysis on the manifold structure for multi-view data (which is not the Grassmannian manifold) to incorporate all three rank constraints simultaneously.

III. FINITE COMPLETABILITY

In Section III-A, we define an equivalence relation among all bases of the sampled matrix $U$, where a basis is a set of $r$ vectors ($r = \mathrm{rank}(U)$) that spans the column space of $U$. This equivalence relation leads to the manifold structure for multi-view data that incorporates all three rank constraints. We introduce a set of polynomials according to the sampled entries and analyze finite completability through the algebraic independence of these polynomials. In Section III-B, we determine the maximum number of algebraically independent polynomials that is necessary and sufficient for finite completability. Then, a relationship between the maximum number of algebraically independent polynomials (among the defined polynomials) and the sampling pattern (locations of the sampled entries) is characterized.
Consequently, we obtain the necessary and sufficient condition on the sampling pattern for finite completability.

A. Geometry of the Basis

Let us define $r'_1 = r - r_2$, $r'_2 = r - r_1$ and $r' = r - r'_1 - r'_2 = r_1 + r_2 - r$. Observe that $r_1 \leq r$, $r_2 \leq r$ and $r \leq r_1 + r_2$. Suppose that the basis $V \in \mathbb{R}^{n \times r}$ is such that its first $r_1$ columns are a basis for the first view $U_1$, its last $r_2$ columns are a basis for the second view $U_2$, and all columns of $V$ are a basis for $U = [U_1|U_2]$, as shown in Figure 1. Assume that $V = [V_1|V_2|V_3]$, where $V_1 \in \mathbb{R}^{n \times r'_1}$, $V_2 \in \mathbb{R}^{n \times r'}$ and $V_3 \in \mathbb{R}^{n \times r'_2}$. Then, $[V_1|V_2]$ is a basis for $U_1$ and $[V_2|V_3]$ is a basis for $U_2$. As a result, there exist matrices $T_1 \in \mathbb{R}^{r_1 \times m_1}$ and $T_2 \in \mathbb{R}^{r_2 \times m_2}$ such that

\[ U_1 = [V_1|V_2] \cdot T_1, \tag{1a} \]
\[ U_2 = [V_2|V_3] \cdot T_2. \tag{1b} \]

Fig. 1: A basis $V$ for the sampled matrix $U$ (column blocks $V_1$, $V_2$, $V_3$ of widths $r'_1$, $r'$, $r'_2$; the first $r_1$ columns, the last $r_2$ columns and all $r$ columns are marked).

For any $i_1 \in \{1,\dots,m_1\}$ and $i_2 \in \{1,\dots,m_2\}$, (1) can be written as

\[ U_1(:,i_1) = [V_1|V_2] \cdot T_1(:,i_1), \tag{2a} \]
\[ U_2(:,i_2) = [V_2|V_3] \cdot T_2(:,i_2). \tag{2b} \]

In the following, we list some useful facts that are instrumental to the subsequent analysis.

• Fact 1: Observe that each observed entry in $U_1$ results in a scalar equation from (1a) or (2a) that involves only all $r_1$ entries of the corresponding row of $[V_1|V_2]$ and all $r_1$ entries of the corresponding column of $T_1$ in (1a). Similarly, each observed entry in $U_2$ results in a scalar equation from (1b) or (2b) that involves only all $r_2$ entries of the corresponding row of $[V_2|V_3]$ and all $r_2$ entries of the corresponding column of $T_2$ in (1b). Treating the entries of $V$, $T_1$ and $T_2$ as variables (right-hand sides of (1a) and (1b)), each observed entry results in a polynomial in terms of these variables.

• Fact 2: For any observed entry $U_i(x_1,x_2)$, $x_1$ and $x_2$ specify the row index of $V$ and the column index of $T_i$, respectively, that are involved in the corresponding polynomial, $i = 1,2$.
• Fact 3: It can be concluded from Bernstein's theorem [37] that in a system of $n$ polynomials in $n$ variables such that the coefficients are chosen generically, the $n$ polynomials are algebraically independent with probability one, and therefore there exist only finitely many solutions. Moreover, in a system of $n$ polynomials in $n-1$ variables (or fewer), the $n$ polynomials are algebraically dependent with probability one. Also, given that a system of $n$ polynomials (with generically chosen coefficients) in $n-1$ variables (or fewer) has one solution, it can be concluded that it has a unique solution with probability one.

Given all observed entries $\{U(x_1,x_2) : \Omega(x_1,x_2) = 1\}$, we are interested in finding the number of possible solutions in terms of entries of $(V,T_1,T_2)$ (infinite, finite or unique) via investigating the algebraic independence among the polynomials. Throughout this paper, we make the following assumption.

Assumption 1: Any column of $U_1$ includes at least $r_1$ observed entries and any column of $U_2$ includes at least $r_2$ observed entries.

Observe that Assumption 1 leads to a total of at least $m_1 r_1 + m_2 r_2$ sampled entries of $U$.

Lemma 1. Given a basis $V = [V_1|V_2|V_3]$ in (1), if Assumption 1 holds, then there exists a unique solution $(T_1,T_2)$, with probability one. Moreover, if Assumption 1 does not hold, then there are infinitely many solutions $(T_1,T_2)$, with probability one.

Proof: Note that only observed entries in the $i$-th column of $U_1$ result in degree-1 polynomials in terms of the entries of $T_1(:,i) \in \mathbb{R}^{r_1 \times 1}$. As a result, exactly $r_1$ generically chosen degree-1 polynomials in terms of $r_1$ variables are needed to ensure there is a unique solution $T_1$ in (1), with probability one. Moreover, having fewer than $r_1$ polynomials in terms of $r_1$ variables results in infinitely many solutions of $T_1$ in (1), with probability one. The same arguments apply to $T_2$.

Definition 1.
Let $P(\Omega)$ denote all polynomials in terms of the entries of $V$ obtained through the observed entries, excluding the $m_1 r_1 + m_2 r_2$ polynomials that were used to obtain $(T_1,T_2)$. Note that since $(T_1,T_2)$ is already solved in terms of $V$, each polynomial in $P(\Omega)$ is in terms of elements of $V$.

Consider two bases $V$ and $V'$ for the matrix $U$ with the structure in (1). We say that $V$ and $V'$ span the same space if and only if: (i) the spans of the first $r_1$ columns of $V$ and $V'$ are the same, (ii) the spans of the last $r_2$ columns of $V$ and $V'$ are the same, (iii) the spans of all columns of $V$ and $V'$ are the same.

Therefore, $V$ and $V'$ span the same space if and only if: (i) each column of $V_1$ is a linear combination of the columns of $[V'_1|V'_2]$, (ii) each column of $V_2$ is a linear combination of the columns of $V'_2$, and (iii) each column of $V_3$ is a linear combination of the columns of $[V'_2|V'_3]$. The following equivalence class partitions all possible bases such that any two bases in a class span the same space, i.e., the above-mentioned properties (i), (ii) and (iii) hold.

Definition 2. Define an equivalence class for all bases $V \in \mathbb{R}^{n \times r}$ of the sampled matrix $U$ such that two bases $V$ and $V'$ belong to the same class if there exist full rank matrices $A_1 \in \mathbb{R}^{r_1 \times r'_1}$, $A_2 \in \mathbb{R}^{r' \times r'}$ and $A_3 \in \mathbb{R}^{r_2 \times r'_2}$ such that

\[ V_1 = [V'_1|V'_2] \cdot A_1, \tag{3a} \]
\[ V_2 = V'_2 \cdot A_2, \tag{3b} \]
\[ V_3 = [V'_2|V'_3] \cdot A_3, \tag{3c} \]

where $V = [V_1|V_2|V_3]$, $V' = [V'_1|V'_2|V'_3]$, $V_1, V'_1 \in \mathbb{R}^{n \times r'_1}$, $V_2, V'_2 \in \mathbb{R}^{n \times r'}$ and $V_3, V'_3 \in \mathbb{R}^{n \times r'_2}$.

Note that (3) leads to the fact that the dimension of all bases $V$ is equal to

\[ nr - r_1 r'_1 - r' r' - r_2 r'_2 = nr - r^2 - r_1^2 - r_2^2 + r(r_1 + r_2). \]

Definition 3.
(Canonical basis) As shown in Figure 2, denote

\[ B_1 = V(1 : r'_1,\; 1 : r'_1) \in \mathbb{R}^{r'_1 \times r'_1}, \tag{4a} \]
\[ B_2 = V(1 : r'_2,\; 1 + r_1 : r'_2 + r_1) \in \mathbb{R}^{r'_2 \times r'_2}, \tag{4b} \]
\[ B_3 = V(1 + \max(r'_1,r'_2) : r' + \max(r'_1,r'_2),\; 1 + r'_1 : r' + r'_1) \in \mathbb{R}^{r' \times r'}, \tag{4c} \]
\[ B_4 = V(1 + \max(r'_1,r'_2) : r' + \max(r'_1,r'_2),\; 1 : r'_1) \in \mathbb{R}^{r' \times r'_1}, \tag{4d} \]
\[ B_5 = V(1 + \max(r'_1,r'_2) : r' + \max(r'_1,r'_2),\; 1 + r_1 : r'_2 + r_1) \in \mathbb{R}^{r' \times r'_2}. \tag{4e} \]

Then, we call $V$ a canonical basis if $B_1 = I_{r'_1}$, $B_2 = I_{r'_2}$, $B_3 = I_{r'}$, $B_4 = 0_{r' \times r'_1}$ and $B_5 = 0_{r' \times r'_2}$.

Fig. 2: A canonical basis (blocks $B_1 = I$ and $B_2 = I$ in the top rows; blocks $B_4 = 0$, $B_3 = I$ and $B_5 = 0$ in the rows below).

Example 1. Consider an example in which $U = [U_1|U_2] \in \mathbb{R}^{4 \times 7}$, where $U_1 \in \mathbb{R}^{4 \times 3}$ is the first view and $U_2 \in \mathbb{R}^{4 \times 4}$ is the second view. Assume that $r_1 = 2$, $r_2 = 3$ and $r = 4$. Then, the corresponding canonical basis is as follows, where $-$ marks an entry that is not fixed by the canonical structure.

\[
V = \begin{bmatrix}
1 & - & 1 & 0 \\
- & - & 0 & 1 \\
0 & 1 & 0 & 0 \\
- & - & - & -
\end{bmatrix}
\]

Observe that $r^2 + r_1^2 + r_2^2 - r(r_1 + r_2) = 9$ of the entries are known.

Remark 1. In order to prove there are finitely many completions for the matrix $U$, it suffices to prove there are finitely many canonical bases that fit in $U$.

Note that patterns $B_1$ and $B_4$ are in $V_1$, patterns $B_2$ and $B_5$ are in $V_3$, and pattern $B_3$ is in $V_2$. It can be easily seen that, according to the definition of the equivalence class in (3), any permutation of the rows of any of these patterns satisfies the property that in each class there exists exactly one basis with the permuted pattern.

B. Algebraic Independence

The following theorem provides a condition on the polynomials in $P(\Omega)$ that is equivalent (necessary and sufficient) to finite completability of $U$.

Theorem 1. Assume that Assumption 1 holds.
For almost every sampled matrix $U$, there are at most finitely many bases that fit in $U$ if and only if there exist $nr - r^2 - r_1^2 - r_2^2 + r(r_1 + r_2)$ algebraically independent polynomials in $P(\Omega)$.

Proof: According to Lemma 1, Assumption 1 implies that $(T_1,T_2)$ can be determined uniquely (and hence finitely). Let $P(\Omega) = \{p_1,\dots,p_t\}$ and define $S_i$ as the set of all bases $V$ that satisfy the polynomials $\{p_1,\dots,p_i\}$, $i = 0,1,\dots,t$, where $S_0$ is the set of all bases $V$ without any polynomial restriction. Each polynomial in terms of the entries of $V$ reduces the degrees of freedom, i.e., the dimension of the set of solutions, by at most one. Specifically, $\dim(S_i) = \dim(S_{i-1})$ if the maximum number of algebraically independent polynomials in the sets $\{p_1,\dots,p_i\}$ and $\{p_1,\dots,p_{i-1}\}$ is the same, and $\dim(S_i) = \dim(S_{i-1}) - 1$ otherwise. Observe that the number of variables is $\dim(S_0) = nr - r^2 - r_1^2 - r_2^2 + r(r_1 + r_2)$ and the number of solutions of the system of polynomials $P(\Omega)$ is $|S_t|$. Therefore, using Fact 3, with probability one $|S_t|$ is a finite number if and only if $\dim(S_t) = 0$. As mentioned earlier, the dimension of the set of all bases without any polynomial restriction is $\dim(S_0) = nr - r^2 - r_1^2 - r_2^2 + r(r_1 + r_2)$. Hence, we conclude that the existence of exactly $nr - r^2 - r_1^2 - r_2^2 + r(r_1 + r_2)$ algebraically independent polynomials in $P(\Omega)$ is equivalent to having finitely many bases, i.e., finite completability of $U$ with probability one.

In this subsection, we are interested in characterizing a relationship between the sampling pattern $\Omega$ and the maximum number of algebraically independent polynomials in $P(\Omega)$. To this end, we construct a constraint matrix $\breve{\Omega}$ based on $\Omega$ such that each column of $\breve{\Omega}$ represents exactly one of the polynomials in $P(\Omega)$.

Consider an arbitrary column of the first view $U_1(:,i)$, where $i \in \{1,\dots,m_1\}$. Let $l_i = N_\Omega(U_1(:,i))$ denote the number of observed entries in the $i$-th column of the first view. Assumption 1 implies that $l_i \geq r_1$.
We construct $l_i - r_1$ columns with binary entries based on the locations of the observed entries in $U_1(:,i)$ such that each column has exactly $r_1 + 1$ entries equal to one. Let $x_1,\dots,x_{l_i}$ be the row indices of all observed entries in this column. Let $\Omega_1^i$ be the $n \times (l_i - r_1)$ matrix corresponding to this column, which is defined as follows: for any $j \in \{1,\dots,l_i - r_1\}$, the $j$-th column has the value 1 in rows $\{x_1,\dots,x_{r_1},x_{r_1+j}\}$ and zeros elsewhere. Define the binary constraint matrix of the first view as $\breve{\Omega}_1 = [\Omega_1^1|\Omega_1^2|\dots|\Omega_1^{m_1}] \in \mathbb{R}^{n \times K_1}$ [34], where $K_1 = N_\Omega(U_1) - m_1 r_1$.

Similarly, we construct the binary constraint matrix $\breve{\Omega}_2 \in \mathbb{R}^{n \times K_2}$ for the second view, where $K_2 = N_\Omega(U_2) - m_2 r_2$. Define the constraint matrix of $U$ as $\breve{\Omega} = [\breve{\Omega}_1|\breve{\Omega}_2] \in \mathbb{R}^{n \times (K_1+K_2)}$. For any subset of columns $\breve{\Omega}'$ of $\breve{\Omega}$, $P(\breve{\Omega}')$ denotes the subset of $P(\Omega)$ that corresponds to $\breve{\Omega}'$.$^2$

$^2$ Note that only $r_1$ and $r_2$ appear in the structure of the constraint matrix, and $r$ only appears in the structure of the basis as the basis has $r$ columns.
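
The construction of the constraint matrix can be illustrated with a minimal sketch in plain Python, applied to the $4 \times 4$ sampling pattern of the example in Section II. The function names `view_constraint_columns` and `basis_dimension` are hypothetical helpers introduced here for illustration, and the constraint matrix is stored as a list of columns rather than as a matrix. The final comparison between the number of columns $K_1 + K_2$ and the dimension in Theorem 1 is only a counting check that is necessary for finite completability; it is not the sampling-pattern condition itself.

```python
def view_constraint_columns(pattern, rank):
    """Build the binary constraint columns of one view.

    pattern: list of n rows, each a list of m flags (1 = observed, 0 = not).
    rank:    rank constraint of this view (r_1 or r_2).
    Returns a list of length-n 0/1 columns, one per polynomial of the view.
    """
    n = len(pattern)
    m = len(pattern[0])
    columns = []
    for col in range(m):
        observed = [row for row in range(n) if pattern[row][col] == 1]
        if len(observed) < rank:
            raise ValueError("Assumption 1 is violated in column %d" % col)
        base = observed[:rank]           # rows x_1, ..., x_r
        for extra in observed[rank:]:    # rows x_{r+j}, j = 1, ..., l_i - r
            ones = set(base + [extra])
            columns.append([1 if row in ones else 0 for row in range(n)])
    return columns


def basis_dimension(n, r, r1, r2):
    """Dimension nr - r^2 - r_1^2 - r_2^2 + r(r_1 + r_2) from Theorem 1."""
    return n * r - r**2 - r1**2 - r2**2 + r * (r1 + r2)


# Sampling pattern of the 4x4 example in Section II (0-based rows and columns).
omega_1 = [[1, 1], [1, 0], [1, 0], [0, 0]]   # first view
omega_2 = [[1, 1], [1, 1], [1, 0], [1, 1]]   # second view
r1, r2, r = 1, 2, 2

cols_1 = view_constraint_columns(omega_1, r1)   # K_1 columns, r_1 + 1 ones each
cols_2 = view_constraint_columns(omega_2, r2)   # K_2 columns, r_2 + 1 ones each
required = basis_dimension(4, r, r1, r2)

print(len(cols_1), len(cols_2), required)   # prints: 2 3 5
```

Here $K_1 = N_\Omega(U_1) - m_1 r_1 = 4 - 2 = 2$ and $K_2 = N_\Omega(U_2) - m_2 r_2 = 7 - 4 = 3$, so the example has exactly as many polynomials in $P(\Omega)$ as the dimension $5$ required by Theorem 1.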