BLOCK TENSORS AND SYMMETRIC EMBEDDINGS

STEFAN RAGNARSSON∗ AND CHARLES F. VAN LOAN†

arXiv:1010.0707v2 [math.NA] 11 Jan 2011

Abstract. Well-known connections exist between the singular value decomposition of a matrix $A$ and the Schur decomposition of its symmetric embedding $\mathrm{sym}(A) = \begin{bmatrix} 0 & A \\ A^T & 0 \end{bmatrix}$. In particular, if $\sigma$ is a singular value of $A$, then $+\sigma$ and $-\sigma$ are eigenvalues of the symmetric embedding. The top and bottom halves of $\mathrm{sym}(A)$'s eigenvectors are singular vectors for $A$. Power methods applied to $A$ can be related to power methods applied to $\mathrm{sym}(A)$. The rank of $\mathrm{sym}(A)$ is twice the rank of $A$. In this paper we develop similar connections for tensors by building on L-H. Lim's variational approach to tensor singular values and vectors. We show how to embed a general order-$d$ tensor $\mathcal{A}$ into an order-$d$ symmetric tensor $\mathrm{sym}(\mathcal{A})$. Through the embedding we relate power methods for $\mathcal{A}$'s singular values to power methods for $\mathrm{sym}(\mathcal{A})$'s eigenvalues. Finally, we connect the multilinear and outer product rank of $\mathcal{A}$ to the multilinear and outer product rank of $\mathrm{sym}(\mathcal{A})$.

Key words. tensor, block tensor, symmetric tensor, tensor rank

AMS subject classifications. 15A18, 15A69, 65F15

1. Introduction. If $A \in \mathbb{R}^{n_1\times n_2}$, then there are well-known connections between its singular value decomposition (SVD) and the eigenvalue and eigenvector properties of the symmetric matrix

$$\mathrm{sym}(A) = \begin{bmatrix} 0 & A \\ A^T & 0 \end{bmatrix} \in \mathbb{R}^{(n_1+n_2)\times(n_1+n_2)}. \tag{1.1}$$

If $A = U\Sigma V^T$ is the SVD of $A$, then for $k = 1{:}\mathrm{rank}(A)$

$$\begin{bmatrix} 0 & A \\ A^T & 0 \end{bmatrix}\begin{bmatrix} u_k \\ \pm v_k \end{bmatrix} = \pm\sigma_k\begin{bmatrix} u_k \\ \pm v_k \end{bmatrix} \tag{1.2}$$

where $u_k = U(:,k)$, $v_k = V(:,k)$, and $\sigma_k = \Sigma(k,k)$. Another way to connect $A$ and $\mathrm{sym}(A)$ is through the Rayleigh quotients

$$\phi_A(u,v) = \frac{u^TAv}{\|u\|_2\|v\|_2} = \left(\sum_{i_1=1}^{n_1}\sum_{i_2=1}^{n_2} A(i_1,i_2)\,u(i_1)\,v(i_2)\right)\Big/\left(\|u\|_2\|v\|_2\right) \tag{1.3}$$

and

$$\phi_A^{(\mathrm{sym})}(x) = \frac{1}{2}\,\frac{x^TCx}{x^Tx} = \frac{1}{2}\left(\sum_{i_1=1}^{N}\sum_{i_2=1}^{N} C(i_1,i_2)\,x(i_1)\,x(i_2)\right)\Big/\|x\|_2^2 \tag{1.4}$$

where $u\in\mathbb{R}^{n_1}$, $v\in\mathbb{R}^{n_2}$, $N = n_1+n_2$, $x\in\mathbb{R}^N$, and $C = \mathrm{sym}(A)$. If $x$ is a stationary vector for $\phi_A^{(\mathrm{sym})}$, then $u = x(1{:}n_1)$ and $v = x(n_1+1{:}n_1+n_2)$ render a stationary value for $\phi_A$. See [8, p. 448].
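The matrix facts (1.1)-(1.2) and the rank-doubling claim are easy to confirm numerically. The following is a minimal sketch that assumes NumPy as the tool (the paper itself uses only Matlab-style notation); the helper name `sym_matrix` is hypothetical. It checks that $[u_k;\pm v_k]$ is an eigenvector of $\mathrm{sym}(A)$ with eigenvalue $\pm\sigma_k$ and that the remaining eigenvalues are zero.

```python
import numpy as np

def sym_matrix(A):
    """Symmetric embedding (1.1): the block matrix [[0, A], [A^T, 0]]."""
    n1, n2 = A.shape
    return np.block([[np.zeros((n1, n1)), A],
                     [A.T, np.zeros((n2, n2))]])

rng = np.random.default_rng(0)
A = rng.standard_normal((4, 3))
C = sym_matrix(A)
U, s, Vt = np.linalg.svd(A, full_matrices=False)

# (1.2): [u_k; +v_k] and [u_k; -v_k] are eigenvectors for +sigma_k and -sigma_k
for k in range(len(s)):
    for sign in (1.0, -1.0):
        x = np.concatenate([U[:, k], sign * Vt[k, :]])
        assert np.allclose(C @ x, sign * s[k] * x)

# Spectrum of sym(A): {+sigma_k, -sigma_k} plus n1 + n2 - 2*rank(A) zeros
eigs = np.sort(np.linalg.eigvalsh(C))
n_zero = sum(A.shape) - 2 * len(s)
expected = np.sort(np.concatenate([s, -s, np.zeros(n_zero)]))
print(np.allclose(eigs, expected))                                # True
print(np.linalg.matrix_rank(C) == 2 * np.linalg.matrix_rank(A))   # True
```

A random $4\times 3$ matrix has full rank with probability one, so here $\mathrm{sym}(A)$ has six nonzero eigenvalues and a single zero eigenvalue.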
In this paper we discuss these notions as they apply to tensors. An order-$d$ tensor $\mathcal{A}\in\mathbb{R}^{n_1\times\cdots\times n_d}$ is a real $d$-dimensional array $\mathcal{A}(1{:}n_1,\ldots,1{:}n_d)$ where the index range in the $k$-th mode is from 1 to $n_k$. The idea of embedding a general tensor into a larger symmetric tensor having the same order is developed in §2. This requires having a facility with block tensors. Fundamental orderings, unfoldings, and multilinear summations are discussed in §3 and used in §4, where we characterize various multilinear Rayleigh quotients and their stationary values and vectors. This builds on the variational approach to tensor singular values developed in [15]. In §5 we provide a symmetric embedding analysis of several higher-order power methods for tensors that have recently been proposed [10, 11, 5, 6, 13]. Results that relate the multilinear and outer product ranks of a tensor to the corresponding ranks of its symmetric embedding are presented in §6. A brief conclusion section follows.

∗Center for Applied Mathematics, Cornell University, Ithaca, NY 14853, [email protected]
†Department of Computer Science, Cornell University, Ithaca, NY 14853, [email protected]. Both authors are supported in part by NSF contract DMS-1016284.

Before we proceed with the rest of the paper, we use the case of third-order tensors to preview some of the main ideas and to establish notation. (The busy reader already familiar with basic tensor computations and notation may safely skip to §2.) The starting point is to define the trilinear Rayleigh quotient

$$\phi_{\mathcal{A}}(u,v,w) = \left(\sum_{i_1=1}^{n_1}\sum_{i_2=1}^{n_2}\sum_{i_3=1}^{n_3}\mathcal{A}(i_1,i_2,i_3)\,u(i_1)\,v(i_2)\,w(i_3)\right)\Big/\left(\|u\|_2\|v\|_2\|w\|_2\right) \tag{1.5}$$

where $\mathcal{A}\in\mathbb{R}^{n_1\times n_2\times n_3}$, $u\in\mathbb{R}^{n_1}$, $v\in\mathbb{R}^{n_2}$, and $w\in\mathbb{R}^{n_3}$. Calligraphic characters are used for tensors: $\mathcal{A}(i_1,i_2,i_3)$ is entry $(i_1,i_2,i_3)$ of $\mathcal{A}$.

The singular values and vectors of $\mathcal{A}$ are the critical values and vectors of $\phi_{\mathcal{A}}$ as formulated in [15].
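As a concrete reading of (1.5), here is a minimal NumPy sketch (NumPy and the function name `phi` are our assumptions, not the paper's) that evaluates $\phi_{\mathcal{A}}$ with `np.einsum` and compares it against the triple sum written out literally. Each argument appears once in the numerator and once in the denominator, so a positive rescaling of $u$, $v$, or $w$ leaves $\phi_{\mathcal{A}}$ unchanged and a negative factor flips its sign; it therefore suffices to look for critical points among unit vectors.

```python
import numpy as np

def phi(A, u, v, w):
    """Trilinear Rayleigh quotient (1.5) of an order-3 array A."""
    num = np.einsum("ijk,i,j,k->", A, u, v, w)
    return num / (np.linalg.norm(u) * np.linalg.norm(v) * np.linalg.norm(w))

rng = np.random.default_rng(1)
A = rng.standard_normal((4, 3, 2))
u, v, w = (rng.standard_normal(n) for n in A.shape)

# The triple sum of (1.5), written out literally
total = 0.0
for i1 in range(A.shape[0]):
    for i2 in range(A.shape[1]):
        for i3 in range(A.shape[2]):
            total += A[i1, i2, i3] * u[i1] * v[i2] * w[i3]
literal = total / (np.linalg.norm(u) * np.linalg.norm(v) * np.linalg.norm(w))

print(np.isclose(phi(A, u, v, w), literal))                          # True
# The factors 2 and 0.5 cancel out; the factor -3 only flips the sign
print(np.isclose(phi(A, 2 * u, -3 * v, 0.5 * w), -phi(A, u, v, w)))  # True
```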
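A simple expression for the gradient $\nabla\phi_{\mathcal{A}}$ is made possible by unfolding $\mathcal{A} = (a_{ijk})$ in each of its three modes and aggregating the $u$, $v$, and $w$ vectors with the Kronecker product. To illustrate, suppose $n_1 = 4$, $n_2 = 3$, and $n_3 = 2$ and define the modal unfoldings $\mathcal{A}_{(1)}$, $\mathcal{A}_{(2)}$, and $\mathcal{A}_{(3)}$ by

$$\begin{aligned}
\mathcal{A}_{(1)} &= \begin{bmatrix} a_{111} & a_{121} & a_{131} & a_{112} & a_{122} & a_{132}\\ a_{211} & a_{221} & a_{231} & a_{212} & a_{222} & a_{232}\\ a_{311} & a_{321} & a_{331} & a_{312} & a_{322} & a_{332}\\ a_{411} & a_{421} & a_{431} & a_{412} & a_{422} & a_{432}\end{bmatrix}\\
\mathcal{A}_{(2)} &= \begin{bmatrix} a_{111} & a_{211} & a_{311} & a_{411} & a_{112} & a_{212} & a_{312} & a_{412}\\ a_{121} & a_{221} & a_{321} & a_{421} & a_{122} & a_{222} & a_{322} & a_{422}\\ a_{131} & a_{231} & a_{331} & a_{431} & a_{132} & a_{232} & a_{332} & a_{432}\end{bmatrix}\\
\mathcal{A}_{(3)} &= \begin{bmatrix} a_{111} & a_{211} & a_{311} & a_{411} & a_{121} & a_{221} & a_{321} & a_{421} & a_{131} & a_{231} & a_{331} & a_{431}\\ a_{112} & a_{212} & a_{312} & a_{412} & a_{122} & a_{222} & a_{322} & a_{422} & a_{132} & a_{232} & a_{332} & a_{432}\end{bmatrix}.
\end{aligned} \tag{1.6}$$

The columns of these matrices are fibers. A fiber of a tensor is obtained by fixing all but one of the indices. For example, the third column of the unfolding

$$\mathcal{A}_{(1)} = \left[\ \mathcal{A}(:,1,1)\ \ \mathcal{A}(:,2,1)\ \ \mathcal{A}(:,3,1)\ \ \mathcal{A}(:,1,2)\ \ \mathcal{A}(:,2,2)\ \ \mathcal{A}(:,3,2)\ \right]$$

is the fiber

$$\mathcal{A}(:,3,1) = \begin{bmatrix}\mathcal{A}(1,3,1)\\ \mathcal{A}(2,3,1)\\ \mathcal{A}(3,3,1)\\ \mathcal{A}(4,3,1)\end{bmatrix}$$

obtained by fixing the 2-mode index at 3 and the 3-mode index at 1. It is necessary to specify the order in which the fibers appear in a modal unfolding. The choice exhibited in (1.6) has the property that

$$\sum_{i_1=1}^{n_1}\sum_{i_2=1}^{n_2}\sum_{i_3=1}^{n_3}\mathcal{A}(i_1,i_2,i_3)\,u(i_1)\,v(i_2)\,w(i_3) = u^T\mathcal{A}_{(1)}(w\otimes v) = v^T\mathcal{A}_{(2)}(w\otimes u) = w^T\mathcal{A}_{(3)}(v\otimes u) \tag{1.7}$$

which makes it easy to specify the stationary vectors of $\phi_{\mathcal{A}}$. If $u$, $v$, and $w$ are unit vectors, then the gradient of $\phi_{\mathcal{A}}$ is given by

$$\nabla\phi_{\mathcal{A}}(u,v,w) = \begin{bmatrix}\mathcal{A}_{(1)}(w\otimes v)\\ \mathcal{A}_{(2)}(w\otimes u)\\ \mathcal{A}_{(3)}(v\otimes u)\end{bmatrix} - \phi_{\mathcal{A}}(u,v,w)\begin{bmatrix}u\\ v\\ w\end{bmatrix}. \tag{1.8}$$

We remark that if $\mathcal{A}$ is an order-2 tensor, then (1.8) collapses to the familiar matrix-SVD equations $Av = \sigma u$ and $A^Tu = \sigma v$.

A central contribution of this paper revolves around the tensor version of the sym matrix (1.1) and the associated Rayleigh quotient $\phi_A^{(\mathrm{sym})}$ that is defined in (1.4).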
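Just as sym-of-a-matrix sets up a symmetric block matrix whose entries are either zero or matrix transpositions, sym-of-a-tensor sets up a symmetric block tensor whose entries are either zero or tensor transpositions. If $\mathcal{A}\in\mathbb{R}^{n_1\times n_2\times n_3}$, then there are $6 = 3!$ possible transpositions identified by the notation $\mathcal{A}^{<[ijk]>}$ where $[ijk]$ is a permutation of $[123]$:

$$\begin{aligned}
\mathcal{B} = \mathcal{A}^{<[123]>} &\Longrightarrow \mathcal{B}(i,j,k) = \mathcal{A}(i,j,k)\\
\mathcal{B} = \mathcal{A}^{<[132]>} &\Longrightarrow \mathcal{B}(i,k,j) = \mathcal{A}(i,j,k)\\
\mathcal{B} = \mathcal{A}^{<[213]>} &\Longrightarrow \mathcal{B}(j,i,k) = \mathcal{A}(i,j,k)\\
\mathcal{B} = \mathcal{A}^{<[231]>} &\Longrightarrow \mathcal{B}(j,k,i) = \mathcal{A}(i,j,k)\\
\mathcal{B} = \mathcal{A}^{<[312]>} &\Longrightarrow \mathcal{B}(k,i,j) = \mathcal{A}(i,j,k)\\
\mathcal{B} = \mathcal{A}^{<[321]>} &\Longrightarrow \mathcal{B}(k,j,i) = \mathcal{A}(i,j,k)
\end{aligned} \tag{1.9}$$

for $i = 1{:}n_1$, $j = 1{:}n_2$, $k = 1{:}n_3$.

The symmetric embedding of a 3rd-order tensor results in a 3-by-3-by-3 block tensor, a kind of Rubik's cube built from 27 (possibly non-cubical) boxes. If $\mathcal{A}\in\mathbb{R}^{n_1\times n_2\times n_3}$ and $N = n_1+n_2+n_3$, then $\mathrm{sym}(\mathcal{A}) = \mathcal{C}\in\mathbb{R}^{N\times N\times N}$ is the 3-by-3-by-3 block tensor whose $ijk$ block is specified by

$$\mathcal{C}_{[ijk]} = \begin{cases}\mathcal{A}^{<[ijk]>} & \text{if } [ijk] \text{ is a permutation of } [123]\\ 0\in\mathbb{R}^{n_i\times n_j\times n_k} & \text{otherwise.}\end{cases} \tag{1.10}$$

See Fig. 1.1.

Fig. 1.1. The Symmetric Embedding of an Order-3 Tensor. (Figure: panels labeled $\mathcal{C}(:,:,1)$, $\mathcal{C}(:,:,2)$, $\mathcal{C}(:,:,3)$, showing the positions of the blocks $\mathcal{A}^{<[123]>}$, $\mathcal{A}^{<[132]>}$, $\mathcal{A}^{<[213]>}$, $\mathcal{A}^{<[231]>}$, $\mathcal{A}^{<[312]>}$, $\mathcal{A}^{<[321]>}$.)

The blocks in a block tensor such as $\mathcal{C}$ can be specified using the colon notation. For example, if $n_1 = 4$, $n_2 = 3$, and $n_3 = 2$, then

$$\begin{aligned}
\mathcal{C}_{[123]} &= \mathcal{C}(1{:}4,\,5{:}7,\,8{:}9) = \mathcal{A}^{<[123]>}\in\mathbb{R}^{n_1\times n_2\times n_3}\\
\mathcal{C}_{[132]} &= \mathcal{C}(1{:}4,\,8{:}9,\,5{:}7) = \mathcal{A}^{<[132]>}\in\mathbb{R}^{n_1\times n_3\times n_2}\\
\mathcal{C}_{[213]} &= \mathcal{C}(5{:}7,\,1{:}4,\,8{:}9) = \mathcal{A}^{<[213]>}\in\mathbb{R}^{n_2\times n_1\times n_3}\\
\mathcal{C}_{[231]} &= \mathcal{C}(5{:}7,\,8{:}9,\,1{:}4) = \mathcal{A}^{<[231]>}\in\mathbb{R}^{n_2\times n_3\times n_1}\\
\mathcal{C}_{[312]} &= \mathcal{C}(8{:}9,\,1{:}4,\,5{:}7) = \mathcal{A}^{<[312]>}\in\mathbb{R}^{n_3\times n_1\times n_2}\\
\mathcal{C}_{[321]} &= \mathcal{C}(8{:}9,\,5{:}7,\,1{:}4) = \mathcal{A}^{<[321]>}\in\mathbb{R}^{n_3\times n_2\times n_1}.
\end{aligned} \tag{1.11}$$

We will prove in section 2.3 that the tensor $\mathcal{C}$ is in fact symmetric.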
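The construction (1.10) can be sketched in a few lines, again assuming NumPy; `p_transpose` and `sym3` are hypothetical helper names. The key observation is that `np.transpose(A, axes)` places axis `axes[k]` of `A` in position `k`, so with 0-based indices it computes exactly the transposition $\mathcal{A}^{<[ijk]>}$ of (1.9). The sketch fills the six nonzero blocks, checks one line of (1.11), and confirms numerically the symmetry that section 2.3 proves.

```python
import itertools
import numpy as np

def p_transpose(A, p):
    """Transpose of A for a 1-based permutation p, following (1.9):
    B(j_{p1}, ..., j_{pd}) = A(j_1, ..., j_d)."""
    return np.transpose(A, [q - 1 for q in p])

def sym3(A):
    """3-by-3-by-3 block tensor sym(A) of (1.10) for an order-3 array A."""
    n = A.shape
    offsets = np.concatenate([[0], np.cumsum(n)])   # block boundaries in 1:N
    N = int(offsets[-1])
    C = np.zeros((N, N, N))
    for p in itertools.permutations((1, 2, 3)):
        # Block [p1 p2 p3] uses range r_{p1} in mode 1, r_{p2} in mode 2, r_{p3} in mode 3
        idx = tuple(slice(offsets[q - 1], offsets[q]) for q in p)
        C[idx] = p_transpose(A, p)
    return C   # blocks whose index is not a permutation stay zero

rng = np.random.default_rng(2)
A = rng.standard_normal((4, 3, 2))   # n1 = 4, n2 = 3, n3 = 2 as in (1.11)
C = sym3(A)

# (1.11): C(5:7, 8:9, 1:4) is the [231]-transpose (0-based slices 4:7, 7:9, 0:4)
print(np.array_equal(C[4:7, 7:9, 0:4], p_transpose(A, (2, 3, 1))))   # True
# C equals each of its 3! transposes, i.e. C is symmetric
print(all(np.array_equal(p_transpose(C, p), C)
          for p in itertools.permutations((1, 2, 3))))                # True
# A block whose index is not a permutation, e.g. C_[112] = C(1:4, 1:4, 5:7), is zero
print(not C[0:4, 0:4, 4:7].any())                                     # True
```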
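The last topic to cover in our order-3 preview is the generalization of the Rayleigh quotient $\phi_A^{(\mathrm{sym})}$ defined in (1.4). If $\mathcal{A}\in\mathbb{R}^{n_1\times n_2\times n_3}$, $\mathcal{C} = \mathrm{sym}(\mathcal{A})$, $N = n_1+n_2+n_3$, and $x\in\mathbb{R}^N$, then $\phi_{\mathcal{A}}^{(\mathrm{sym})}$ is defined by

$$\phi_{\mathcal{A}}^{(\mathrm{sym})}(x) = \frac{1}{3!}\left(\sum_{i_1=1}^{N}\sum_{i_2=1}^{N}\sum_{i_3=1}^{N}\mathcal{C}(i_1,i_2,i_3)\,x(i_1)\,x(i_2)\,x(i_3)\right)\Big/\|x\|_2^3. \tag{1.12}$$

It will be shown in section 4.3 that if

$$x = \begin{bmatrix}u\\ v\\ w\end{bmatrix}, \qquad u\in\mathbb{R}^{n_1},\ v\in\mathbb{R}^{n_2},\ w\in\mathbb{R}^{n_3},$$

satisfies $\nabla\phi_{\mathcal{A}}^{(\mathrm{sym})}(x) = 0$, then

$$\nabla_u\phi_{\mathcal{A}}(u,v,w) = 0, \qquad \nabla_v\phi_{\mathcal{A}}(u,v,w) = 0, \qquad \nabla_w\phi_{\mathcal{A}}(u,v,w) = 0,$$

where $\nabla_z$ refers to the gradient with respect to the components in vector $z$. Moreover, it will be shown that

$$x_{+-} = \begin{bmatrix}u\\ v\\ -w\end{bmatrix}, \qquad x_{-+} = \begin{bmatrix}u\\ -v\\ w\end{bmatrix}, \qquad x_{--} = \begin{bmatrix}u\\ -v\\ -w\end{bmatrix}$$

are also stationary vectors for $\phi_{\mathcal{A}}^{(\mathrm{sym})}$ and

$$\phi_{\mathcal{A}}(u,v,w) = \phi_{\mathcal{A}}^{(\mathrm{sym})}(x) = \phi_{\mathcal{A}}^{(\mathrm{sym})}(x_{--}) = -\phi_{\mathcal{A}}^{(\mathrm{sym})}(x_{-+}) = -\phi_{\mathcal{A}}^{(\mathrm{sym})}(x_{+-}).$$

2. The Symmetric Embedding. Block matrix manipulation is such a fixture in numerical linear algebra that we take for granted the correctness of facts like

$$\begin{bmatrix}A_{11} & A_{12}\\ A_{21} & A_{22}\end{bmatrix}^T = \begin{bmatrix}A_{11}^T & A_{21}^T\\ A_{12}^T & A_{22}^T\end{bmatrix}. \tag{2.1}$$

Formal verification requires showing that the $(i,j)$ entries on both sides of the equation are equal for all valid $(i,j)$ pairs.

The symmetric embedding of a tensor involves generalizations of both transposition and blocking, so this section begins by discussing these notions and establishing the tensor analog of (2.1). Since vectors of subscripts are prominent in the presentation, we elevate their notational status with boldface font, e.g., $\mathbf{p} = [4\ 1\ 2\ 3]$. We let $\mathbf{1}$ denote the vector of ones and assume that its dimension is clear from context. More generally, if $N$ is an integer, then $\mathbf{N}$ is the vector of all $N$'s. Finally, if $\mathbf{i}$ and $\mathbf{j}$ have equal length, then $\mathbf{i}\le\mathbf{j}$ means that $i_k\le j_k$ for all $k$.

2.1. Blocking. If $s$ and $t$ are integers with $s\le t$, then (as in Matlab) let $s{:}t$ denote the row vector $[s, s+1, \ldots, t]$. We refer to a vector with this form as an index range vector.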
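The act of blocking an $m_1$-by-$m_2$ matrix $C$ is the act of partitioning the index range vectors $1{:}m_1$ and $1{:}m_2$:

$$\mathbf{r}^{(1)} = 1{:}m_1 = \left[\,\mathbf{r}^{(1)}_1\ \cdots\ \mathbf{r}^{(1)}_{b_1}\right], \qquad \mathbf{r}^{(2)} = 1{:}m_2 = \left[\,\mathbf{r}^{(2)}_1\ \cdots\ \mathbf{r}^{(2)}_{b_2}\right]. \tag{2.2}$$

Given (2.2), we are able to regard $C$ as a $b_1\times b_2$ block matrix $(C_{i_1,i_2})$ where block $C_{i_1,i_2}$ has $\mathrm{length}(\mathbf{r}^{(1)}_{i_1})$ rows and $\mathrm{length}(\mathbf{r}^{(2)}_{i_2})$ columns. It is easy (although messy) to "locate" a particular entry of a particular block. Indeed,

$$C_{i_1,i_2}(j_1,j_2) = C(\rho^{(1)}_{i_1}+j_1,\ \rho^{(2)}_{i_2}+j_2)$$

where

$$\rho^{(k)}_{i_k} = \mathrm{length}(\mathbf{r}^{(k)}_1) + \mathrm{length}(\mathbf{r}^{(k)}_2) + \cdots + \mathrm{length}(\mathbf{r}^{(k)}_{i_k-1}) \tag{2.3}$$

for $k = 1{:}2$.

To block an order-$d$ tensor $\mathcal{C}\in\mathbb{R}^{m_1\times\cdots\times m_d}$ we proceed analogously. The index range vectors $1{:}m_1,\ldots,1{:}m_d$ are partitioned

$$\mathbf{r}^{(k)} = 1{:}m_k = \left[\,\mathbf{r}^{(k)}_1\ \cdots\ \mathbf{r}^{(k)}_{b_k}\right], \qquad k = 1{:}d, \tag{2.4}$$

and this permits us to regard $\mathcal{C}$ as a $b_1\times\cdots\times b_d$ block tensor. If $\mathbf{i} = [i_1,\ldots,i_d]$, then the $\mathbf{i}$-th block is the subtensor

$$\mathcal{C}_{\mathbf{i}} = \mathcal{C}_{i_1,\ldots,i_d} = \mathcal{C}(\mathbf{r}^{(1)}_{i_1},\ldots,\mathbf{r}^{(d)}_{i_d}).$$

If $\mathbf{j} = [j_1,\ldots,j_d]$, then the $\mathbf{j}$-th entry of this subtensor is given by

$$\mathcal{C}_{\mathbf{i}}(\mathbf{j}) = \mathcal{C}(\rho^{(1)}_{i_1}+j_1,\ \ldots,\ \rho^{(d)}_{i_d}+j_d)\in\mathbb{R} \tag{2.5}$$

where $\rho^{(k)}_{i_k}$ is specified by (2.3) for $k = 1{:}d$.

To illustrate equations (2.3)-(2.5), if $\mathcal{C}\in\mathbb{R}^{9\times7\times5\times6}$ and

$$\begin{aligned}
1{:}9 &= [\,1\ 2\ 3\ 4\ 5\ 6\ 7\ 8\ 9\,] && (b_1 = 3)\\
1{:}7 &= [\,1\ 2\ 3\ 4\ 5\ 6\ 7\,] && (b_2 = 2)\\
1{:}5 &= [\,1\ 2\ 3\ 4\ 5\,] && (b_3 = 2)\\
1{:}6 &= [\,1\ 2\ 3\ 4\ 5\ 6\,] && (b_4 = 3)
\end{aligned}$$

then we are choosing to regard $\mathcal{C}$ as a $3\times2\times2\times3$ block tensor. Thus, if $\mathbf{i} = [3\ 1\ 2\ 1]$, then $\mathcal{C}_{\mathbf{i}} = \mathcal{C}(7{:}9,\,1{:}5,\,5{:}5,\,1{:}2)$ and

$$\mathcal{C}_{\mathbf{i}}(\mathbf{j}) = \mathcal{C}(6+j_1,\ j_2,\ 4+j_3,\ j_4)$$

where $\mathbf{1}\le\mathbf{j}\le[3\ 5\ 1\ 2]$.

2.2. Tensor Transposition. If $\mathcal{A}\in\mathbb{R}^{n_1\times\cdots\times n_d}$ and $\mathbf{p} = [p_1,\ldots,p_d]$ is a permutation of $1{:}d$, then $\mathcal{A}^{<\mathbf{p}>}\in\mathbb{R}^{n_{p_1}\times\cdots\times n_{p_d}}$ denotes the $\mathbf{p}$-transpose of $\mathcal{A}$ defined by

$$\mathcal{A}^{<\mathbf{p}>}(j_{p_1},\ldots,j_{p_d}) = \mathcal{A}(j_1,\ldots,j_d)$$

where $1\le j_k\le n_k$ for $k = 1{:}d$. A more succinct way of saying the same thing is

$$\mathcal{A}^{<\mathbf{p}>}(\mathbf{j}(\mathbf{p})) = \mathcal{A}(\mathbf{j}), \qquad \mathbf{1}\le\mathbf{j}\le\mathbf{n}.$$

If $\mathcal{A}$ is an order-2 tensor, then $\mathcal{A}^{<[21]>}(j_2,j_1) = \mathcal{A}(j_1,j_2)$.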
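The offset formula (2.3), the block extraction (2.5), and the $\mathbf{p}$-transpose can be checked with a short NumPy sketch. The helper names are hypothetical and indices inside the code are 0-based. The example only uses the ranges $7{:}9$, $1{:}5$, $5{:}5$, and $1{:}2$, so the lengths of the other ranges of $1{:}9$ and $1{:}6$ are an assumed equal-length choice.

```python
import numpy as np

def block_offsets(lengths):
    """rho_i = length(r_1) + ... + length(r_{i-1}) for i = 1..b, as in (2.3)."""
    return np.concatenate([[0], np.cumsum(lengths)[:-1]])

def get_block(C, lengths_per_mode, i):
    """Block C_i for a 1-based block index vector i, via (2.4)-(2.5)."""
    idx = []
    for lengths, ik in zip(lengths_per_mode, i):
        rho = int(block_offsets(lengths)[ik - 1])
        idx.append(slice(rho, rho + lengths[ik - 1]))
    return C[tuple(idx)]

# The 9x7x5x6 example viewed as a 3x2x2x3 block tensor; the block lengths
# of 1:9 and 1:6 that the example does not use are an assumed choice.
lengths_per_mode = [[3, 3, 3], [5, 2], [4, 1], [2, 2, 2]]
rng = np.random.default_rng(3)
C = rng.standard_normal((9, 7, 5, 6))

Ci = get_block(C, lengths_per_mode, [3, 1, 2, 1])
print(Ci.shape)                                    # (3, 5, 1, 2)
print(np.array_equal(Ci, C[6:9, 0:5, 4:5, 0:2]))   # True: C(7:9, 1:5, 5:5, 1:2)
# (2.5) at the 1-based index j = [2 4 1 1]: C_i(j) = C(6+j1, j2, 4+j3, j4)
j = (2, 4, 1, 1)
print(Ci[j[0]-1, j[1]-1, j[2]-1, j[3]-1] == C[6+j[0]-1, j[1]-1, 4+j[2]-1, j[3]-1])  # True

# p-transpose: B(j_{p1}, ..., j_{pd}) = C(j_1, ..., j_d), computed by np.transpose
p = [3, 1, 4, 2]
B = np.transpose(C, [q - 1 for q in p])
print(B.shape == tuple(C.shape[q - 1] for q in p))  # True: n_{p1} x ... x n_{pd}
jj = (4, 2, 3, 5)                                   # an arbitrary 0-based index
print(B[tuple(jj[q - 1] for q in p)] == C[jj])      # True
M = rng.standard_normal((3, 4))
print(np.array_equal(np.transpose(M, [1, 0]), M.T))  # True: order 2 is the matrix transpose
```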
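It is also easy to verify that if $\mathbf{f}$ and $\mathbf{g}$ are both permutations of $1{:}d$, then

$$(\mathcal{A}^{<\mathbf{f}>})^{<\mathbf{g}>} = \mathcal{A}^{<\mathbf{f}(\mathbf{g})>}. \tag{2.6}$$

A transposition of a block tensor renders another block tensor. The following lemma makes this precise and generalizes (2.1).

Lemma 2.1. Suppose $\mathcal{C}\in\mathbb{R}^{m_1\times\cdots\times m_d}$ is a $b_1\times\cdots\times b_d$ block tensor with block dimensions defined by the partitioning (2.4). Let $\mathcal{C}_{\mathbf{i}}$ denote its $\mathbf{i}$-th block where $\mathbf{i} = [i_1,\ldots,i_d]$. If $\mathbf{p} = [p_1,\ldots,p_d]$ is a permutation of $1{:}d$ and $\mathcal{B} = \mathcal{C}^{<\mathbf{p}>}$, then the tensor $\mathcal{B}\in\mathbb{R}^{m_{p_1}\times\cdots\times m_{p_d}}$ is a $b_{p_1}\times\cdots\times b_{p_d}$ block tensor where each block $\mathcal{B}_{\mathbf{i}(\mathbf{p})}$ is defined by $\mathcal{B}_{\mathbf{i}(\mathbf{p})} = \mathcal{C}_{\mathbf{i}}^{<\mathbf{p}>}$.

Proof. If $1\le j_k\le\mathrm{length}(\mathbf{r}^{(k)}_{i_k})$ for $k = 1{:}d$, then from (2.4) and (2.5) we have

$$\mathcal{C}_{\mathbf{i}}^{<\mathbf{p}>}(j_{p_1},\ldots,j_{p_d}) = \mathcal{C}_{\mathbf{i}}(j_1,\ldots,j_d) = \mathcal{C}(\rho^{(1)}_{i_1}+j_1,\ldots,\rho^{(d)}_{i_d}+j_d).$$

On the other hand, $\mathcal{B} = \mathcal{C}^{<\mathbf{p}>}$ and so

$$\mathcal{C}(\rho^{(1)}_{i_1}+j_1,\ldots,\rho^{(d)}_{i_d}+j_d) = \mathcal{B}(\rho^{(p_1)}_{i_{p_1}}+j_{p_1},\ldots,\rho^{(p_d)}_{i_{p_d}}+j_{p_d}) = \mathcal{B}_{\mathbf{i}(\mathbf{p})}(j_{p_1},\ldots,j_{p_d}).$$

Thus, $\mathcal{B}_{\mathbf{i}(\mathbf{p})}(\mathbf{j}(\mathbf{p})) = \mathcal{C}_{\mathbf{i}}^{<\mathbf{p}>}(\mathbf{j}(\mathbf{p}))$ for all $\mathbf{j}$, i.e., $\mathcal{B}_{\mathbf{i}(\mathbf{p})} = \mathcal{C}_{\mathbf{i}}^{<\mathbf{p}>}$.

2.3. The $\mathrm{sym}(\cdot)$ Operation. An order-$d$ tensor $\mathcal{C}\in\mathbb{R}^{N\times\cdots\times N}$ is symmetric if $\mathcal{C} = \mathcal{C}^{<\mathbf{p}>}$ for any permutation $\mathbf{p}$ of $1{:}d$. The tensor analog of (1.1) involves constructing an order-$d$ symmetric tensor $\mathrm{sym}(\mathcal{A})$ whose blocks are either zero or carefully chosen transposes of $\mathcal{A}$. In particular, if $\mathcal{A}\in\mathbb{R}^{n_1\times\cdots\times n_d}$, then

$$\mathrm{sym}(\mathcal{A})\in\mathbb{R}^{N\times\cdots\times N}, \qquad N = n_1+\cdots+n_d,$$

is a block tensor defined by the partitioning $1{:}N = [\,\mathbf{r}_1\,|\,\cdots\,|\,\mathbf{r}_d\,]$ where

$$\mathbf{r}_k = (1+n_1+\cdots+n_{k-1}){:}(n_1+\cdots+n_k), \qquad k = 1{:}d. \tag{2.7}$$

The $\mathbf{i}$-th block of $\mathcal{C} = \mathrm{sym}(\mathcal{A})$ is given by

$$\mathcal{C}_{\mathbf{i}} = \begin{cases}\mathcal{A}^{<\mathbf{i}>} & \text{if } \mathbf{i} \text{ is a permutation of } 1{:}d\\ 0 & \text{otherwise}\end{cases}$$

for all $\mathbf{i}$ that satisfy $\mathbf{1}\le\mathbf{i}\le\mathbf{d}$. Note that $\mathcal{C}_{\mathbf{i}}$ is $n_{i_1}\times n_{i_2}\times\cdots\times n_{i_d}$. We confirm that $\mathrm{sym}(\mathcal{A})$ is symmetric.

Lemma 2.2. If $\mathcal{A}\in\mathbb{R}^{n_1\times\cdots\times n_d}$ and $\mathcal{C} = \mathrm{sym}(\mathcal{A})$, then $\mathcal{C}$ is symmetric.

Proof. Let $\mathbf{p}$ be an arbitrary permutation of $1{:}d$. We must show that if $\mathcal{B} = \mathcal{C}^{<\mathbf{p}>}$, then $\mathcal{B} = \mathcal{C}$. Since $\mathcal{C}$ as a block tensor is $d\times d\times\cdots\times d$, it follows from Lemma 2.1 that $\mathcal{B}$ has the same block structure and

$$\mathcal{B}_{\mathbf{i}(\mathbf{p})} = \mathcal{C}_{\mathbf{i}}^{<\mathbf{p}>}$$

for all $\mathbf{i}$ that satisfy $\mathbf{1}\le\mathbf{i}\le\mathbf{d}$.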
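If $\mathbf{i}$ is a permutation of $1{:}d$, then $\mathcal{C}_{\mathbf{i}} = \mathcal{A}^{<\mathbf{i}>}$ and by using (2.6) we conclude that

$$\mathcal{B}_{\mathbf{i}(\mathbf{p})} = (\mathcal{A}^{<\mathbf{i}>})^{<\mathbf{p}>} = \mathcal{A}^{<\mathbf{i}(\mathbf{p})>} = \mathcal{C}_{\mathbf{i}(\mathbf{p})}.$$

If $\mathbf{i}$ is not a permutation of $1{:}d$, then both $\mathcal{C}_{\mathbf{i}}$ and $\mathcal{C}_{\mathbf{i}(\mathbf{p})}$ are zero and so

$$\mathcal{B}_{\mathbf{i}(\mathbf{p})} = \mathcal{C}_{\mathbf{i}}^{<\mathbf{p}>} = 0 = \mathcal{C}_{\mathbf{i}(\mathbf{p})}.$$

Since $\mathcal{B}$ and $\mathcal{C}$ agree block-by-block, they are the same.

3. Orderings, Unfoldings, and Summations. In numerical multilinear algebra it is frequently necessary to reshape a given tensor into a vector or a matrix and vice versa. In this section we collect results that make these maneuvers precise.

3.1. The col Ordering. If $\mathbf{i}$ and $\mathbf{s}$ are length-$e$ index vectors and $\mathbf{1}\le\mathbf{i}\le\mathbf{s}$, then we define the integer-valued function ivec by

$$\mathrm{ivec}(\mathbf{i},\mathbf{s}) = i_1 + (i_2-1)s_1 + \cdots + (i_e-1)(s_1\cdots s_{e-1}).$$

If $\mathcal{F}\in\mathbb{R}^{s_1\times\cdots\times s_e}$, then $v = \mathrm{vec}(\mathcal{F})\in\mathbb{R}^{s_1\cdots s_e}$ is the column vector defined by

$$v(\mathrm{ivec}(\mathbf{i},\mathbf{s})) = \mathcal{F}(\mathbf{i}), \qquad \mathbf{1}\le\mathbf{i}\le\mathbf{s}.$$

Note that if $e = 2$, then $\mathcal{F}$ is a matrix and $\mathrm{vec}(\mathcal{F})$ stacks its columns. We also observe that if $w_k\in\mathbb{R}^{s_k}$ for $k = 1{:}e$, then

$$w = w_e\otimes\cdots\otimes w_1 \iff w(\mathrm{ivec}(\mathbf{i},\mathbf{s})) = w_1(i_1)\cdots w_e(i_e). \tag{3.1}$$

3.2. Modal Unfoldings. In the gradient calculations that follow, it is particularly convenient to "flatten" the given tensor $\mathcal{A}\in\mathbb{R}^{n_1\times\cdots\times n_d}$ into a matrix. If

$$\tilde{\mathbf{n}} = [\,\mathbf{n}(1{:}k-1)\ \ \mathbf{n}(k+1{:}d)\,], \tag{3.2}$$

$$\tilde{\mathbf{i}} = [\,\mathbf{i}(1{:}k-1)\ \ \mathbf{i}(k+1{:}d)\,], \tag{3.3}$$

then the mode-$k$ unfolding $\mathcal{A}_{(k)}$ is defined by

$$\mathcal{A}_{(k)}(i_k,\ \mathrm{ivec}(\tilde{\mathbf{i}},\tilde{\mathbf{n}})) = \mathcal{A}(\mathbf{i}), \qquad \mathbf{1}\le\mathbf{i}\le\mathbf{n}. \tag{3.4}$$

This matrix has $n_k$ rows and $n_1\cdots n_{k-1}n_{k+1}\cdots n_d$ columns. A third-order instance of this important concept is displayed in equation (1.6). We mention that there are other ways to order the columns in $\mathcal{A}_{(k)}$. See [14].

While the columns of $\mathcal{A}_{(k)}$ are mode-$k$ fibers, its rows are reshapings of its mode-$k$ subtensors. In particular, if $1\le r\le n_k$, then

$$\mathcal{A}_{(k)}(r,:) = \mathrm{vec}(\mathcal{B}^{(r)})^T$$

where the mode-$k$ subtensor $\mathcal{B}^{(r)}$ has order $d-1$ and is defined by

$$\mathcal{B}^{(r)}(i_1,\ldots,i_{k-1},i_{k+1},\ldots,i_d) = \mathcal{A}(i_1,\ldots,i_{k-1},r,i_{k+1},\ldots,i_d).$$

The partitioning of an order-$d$ tensor into order-$(d-1)$ tensors is just a generalization of partitioning a matrix into its columns.

3.3. Summations.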
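It is handy to have a multi-index summation notation in order to describe general versions of the summations that appear in (1.5) and (1.12). If $\mathbf{n}$ is a length-$d$ index vector, then

$$\sum_{\mathbf{i}=\mathbf{1}}^{\mathbf{n}} \equiv \sum_{i_1=1}^{n_1}\cdots\sum_{i_d=1}^{n_d}.$$

The summation that defines the multilinear Rayleigh quotient (1.5) can be written in matrix-vector terms.

Lemma 3.1. If $\mathcal{A}\in\mathbb{R}^{n_1\times\cdots\times n_d}$ and $u_k\in\mathbb{R}^{n_k}$ for $k = 1{:}d$, then

$$\sum_{\mathbf{i}=\mathbf{1}}^{\mathbf{n}}\mathcal{A}(\mathbf{i})\,u_1(i_1)\cdots u_d(i_d) = \mathrm{vec}(\mathcal{A})^T(u_d\otimes\cdots\otimes u_1). \tag{3.5}$$

Moreover, for $k = 1{:}d$ we have

$$\sum_{\mathbf{i}=\mathbf{1}}^{\mathbf{n}}\mathcal{A}(\mathbf{i})\,u_1(i_1)\cdots u_d(i_d) = u_k^T\mathcal{A}_{(k)}\tilde u_k \tag{3.6}$$

where

$$\tilde u_k = (u_d\otimes\cdots\otimes u_{k+1}\otimes u_{k-1}\otimes\cdots\otimes u_1). \tag{3.7}$$

Proof. If $a = \mathrm{vec}(\mathcal{A})$ and $b = u_d\otimes\cdots\otimes u_1$, then using the definition of vec and equations (3.1)-(3.4), we have

$$\sum_{\mathbf{i}=\mathbf{1}}^{\mathbf{n}}\mathcal{A}(\mathbf{i})\,u_1(i_1)\cdots u_d(i_d) = \sum_{\mathbf{i}=\mathbf{1}}^{\mathbf{n}} a(\mathrm{ivec}(\mathbf{i},\mathbf{n}))\cdot b(\mathrm{ivec}(\mathbf{i},\mathbf{n})) = \sum_{i=1}^{n_1\cdots n_d} a(i)\,b(i) = a^Tb.$$

This proves (3.5). Using the modal subtensor interpretation of $\mathcal{A}_{(k)}$ that we discussed in §3.2 and definitions (3.2) and (3.3), we have

$$\begin{aligned}
\sum_{\mathbf{i}=\mathbf{1}}^{\mathbf{n}}\mathcal{A}(\mathbf{i})\,u_1(i_1)\cdots u_d(i_d) &= \sum_{i_k=1}^{n_k}u_k(i_k)\sum_{\tilde{\mathbf{i}}=\mathbf{1}}^{\tilde{\mathbf{n}}}\mathcal{B}^{(i_k)}(\tilde{\mathbf{i}})\,\tilde u_k(\mathrm{ivec}(\tilde{\mathbf{i}},\tilde{\mathbf{n}}))\\
&= \sum_{i_k=1}^{n_k}u_k(i_k)\left(\mathcal{A}_{(k)}(i_k,:)\,\tilde u_k\right) = u_k^T\mathcal{A}_{(k)}\tilde u_k
\end{aligned}$$

which establishes (3.6).

Summations that involve symmetric tensors are important in later sections. The following notation for the multiple Kronecker product of a single vector is handy:

$$x^{\otimes d} = \underbrace{x\otimes\cdots\otimes x}_{d\ \text{times}}.$$

Note that if $x\in\mathbb{R}^N$, then $x^{\otimes d}\in\mathbb{R}^{N^d}$.

Lemma 3.2. If $\mathcal{C}\in\mathbb{R}^{N\times\cdots\times N}$ is a symmetric order-$d$ tensor and $x\in\mathbb{R}^N$, then

$$\sum_{\mathbf{i}=\mathbf{1}}^{\mathbf{N}}\mathcal{C}(\mathbf{i})\,x(i_1)\cdots x(i_d) = x^T\mathcal{C}_{(1)}\,x^{\otimes(d-1)}. \tag{3.8}$$

Proof. This follows from Lemma 3.1 by setting $n_k = N$ and $u_k = x$ for $k = 1{:}d$. Note that because $\mathcal{C}$ is symmetric, $\mathcal{C}_{(1)} = \cdots = \mathcal{C}_{(d)}$.

The summation (3.8) has a special characterization if $\mathcal{C} = \mathrm{sym}(\mathcal{A})$. To pursue this we will have to navigate $\mathcal{C}$'s block structure, and to that end we define the index vectors $\mathbf{L}$ and $\mathbf{R}$ as follows:

$$\mathbf{L} = \begin{bmatrix}1\\ n_1+1\\ \vdots\\ n_1+\cdots+n_{d-1}+1\end{bmatrix}, \qquad \mathbf{R} = \begin{bmatrix}n_1\\ n_1+n_2\\ \vdots\\ n_1+\cdots+n_d\end{bmatrix}. \tag{3.9}$$

Note that if $\mathbf{1}\le\mathbf{p}\le\mathbf{d}$, then

$$\mathcal{C}_{\mathbf{p}} = \mathcal{C}(\mathbf{L}(p_1){:}\mathbf{R}(p_1),\ \ldots,\ \mathbf{L}(p_d){:}\mathbf{R}(p_d))$$

is $\mathcal{C}$'s $\mathbf{p}$-th block.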
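The orderings of §3.1-§3.2 and Lemmas 3.1-3.2 translate directly into array code. In the NumPy sketch below (helper names hypothetical; index vectors passed to `ivec` are 1-based as in the text), `vec` is a column-major (Fortran-order) flattening, which realizes the ivec ordering, and `unfold` builds $\mathcal{A}_{(k)}$ of (3.4) by moving mode $k$ to the front and flattening the remaining modes in column-major order. The sketch reproduces the fiber example that follows (1.6), checks (3.1) entry by entry, and confirms (3.5), (3.6), and (3.8) on random data.

```python
import itertools
import numpy as np

def ivec(i, s):
    """ivec(i, s) = i1 + (i2-1)s1 + ... + (ie-1)(s1...s_{e-1}), 1-based."""
    k, stride = 1, 1
    for ik, sk in zip(i, s):
        k += (ik - 1) * stride
        stride *= sk
    return k

def vec(F):
    """vec(F)(ivec(i, s)) = F(i): column-major (Fortran-order) flattening."""
    return F.reshape(-1, order="F")

def unfold(A, k):
    """Mode-k unfolding (3.4), k 1-based; columns ordered by ivec(i~, n~)."""
    return np.moveaxis(A, k - 1, 0).reshape(A.shape[k - 1], -1, order="F")

def kron_all(ws):
    """w_e (x) ... (x) w_1 for the list ws = [w_1, ..., w_e]."""
    out = np.ones(1)
    for w in ws:
        out = np.kron(w, out)
    return out

rng = np.random.default_rng(4)

# Fiber example after (1.6): column 3 of A_(1) is A(:,3,1) when n = [4 3 2]
A3 = rng.standard_normal((4, 3, 2))
assert np.array_equal(unfold(A3, 1)[:, 2], A3[:, 2, 0])

n = (4, 3, 2, 3)
A = rng.standard_normal(n)
us = [rng.standard_normal(nk) for nk in n]

# vec and (3.1), checked entry by entry over all 1 <= i <= n
a, b = vec(A), kron_all(us)
for i in itertools.product(*(range(1, nk + 1) for nk in n)):
    pos = ivec(i, n) - 1                          # 0-based position
    assert a[pos] == A[tuple(ik - 1 for ik in i)]
    assert np.isclose(b[pos], np.prod([u[ik - 1] for u, ik in zip(us, i)]))

total = np.einsum("abcd,a,b,c,d->", A, *us)       # the multilinear sum
print(np.isclose(total, a @ b))                   # (3.5): True
for k in range(1, len(n) + 1):
    # u~_k of (3.7): all u's except u_k, Kronecker-multiplied in reverse order
    u_tilde = kron_all([u for m, u in enumerate(us, start=1) if m != k])
    print(k, np.isclose(total, us[k - 1] @ unfold(A, k) @ u_tilde))   # (3.6): True

# Lemma 3.2 on a symmetric order-3 tensor obtained by averaging all transposes
G = rng.standard_normal((3, 3, 3))
S = sum(np.transpose(G, p) for p in itertools.permutations(range(3))) / 6
x = rng.standard_normal(3)
lhs = np.einsum("ijk,i,j,k->", S, x, x, x)
print(np.isclose(lhs, x @ unfold(S, 1) @ kron_all([x, x])))           # (3.8): True
```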
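Lemma 3.3. Suppose $\mathcal{A}\in\mathbb{R}^{n_1\times\cdots\times n_d}$, $\mathcal{C} = \mathrm{sym}(\mathcal{A})$, and $N = n_1+\cdots+n_d$. If $x\in\mathbb{R}^N$ is partitioned as follows

$$x = \begin{bmatrix}u_1\\ \vdots\\ u_d\end{bmatrix}, \qquad u_k\in\mathbb{R}^{n_k},$$

and $\tilde u_1,\ldots,\tilde u_d$ are defined by (3.7), then

$$\mathcal{C}_{(1)}\,x^{\otimes(d-1)} = (d-1)!\begin{bmatrix}\mathcal{A}_{(1)}\tilde u_1\\ \vdots\\ \mathcal{A}_{(d)}\tilde u_d\end{bmatrix} \tag{3.10}$$

and

$$\sum_{\mathbf{j}=\mathbf{1}}^{\mathbf{N}}\mathcal{C}(\mathbf{j})\,x(j_1)\cdots x(j_d) = d!\sum_{\mathbf{i}=\mathbf{1}}^{\mathbf{n}}\mathcal{A}(\mathbf{i})\,u_1(i_1)\cdots u_d(i_d). \tag{3.11}$$

Proof. If $v = \mathcal{C}_{(1)}\,x^{\otimes(d-1)}$ and

$$e_j = I_N(:,j) = \begin{bmatrix}w_1\\ \vdots\\ w_d\end{bmatrix} \tag{3.12}$$

is partitioned conformally with $x$, then for $j = 1{:}N$ we have

$$\begin{aligned}
v(j) &= \sum_{\mathbf{i}(2:d)=\mathbf{1}}^{\mathbf{N}}\mathcal{C}(j,i_2,\ldots,i_d)\,x(i_2)\cdots x(i_d)\\
&= \sum_{\mathbf{i}=\mathbf{1}}^{\mathbf{N}}\mathcal{C}(\mathbf{i})\,e_j(i_1)\,x(i_2)\cdots x(i_d)\\
&= \sum_{\mathbf{p}=\mathbf{1}}^{\mathbf{d}}\ \sum_{\mathbf{i}=\mathbf{L}(\mathbf{p})}^{\mathbf{R}(\mathbf{p})}\mathcal{C}(\mathbf{i})\,e_j(i_1)\,x(i_2)\cdots x(i_d)\\
&= \sum_{\mathbf{p}=\mathbf{1}}^{\mathbf{d}}\ \sum_{\mathbf{k}=\mathbf{1}}^{\mathbf{n}(\mathbf{p})}\mathcal{C}_{\mathbf{p}}(\mathbf{k})\,w_{p_1}(k_1)\,u_{p_2}(k_2)\cdots u_{p_d}(k_d).
\end{aligned}$$

Now suppose that $\mathbf{L}(q)\le j\le\mathbf{R}(q)$ and $j = \mathbf{L}(q)+r-1$. From (3.10) we must show that $v(j)$ is $(d-1)!$ times the $r$-th component of $\mathcal{A}_{(q)}\tilde u_q$.

To that end observe that $\mathcal{C}_{\mathbf{p}}(\mathbf{k})\,w_{p_1}(k_1)$ is necessarily zero unless $p_1 = q$, $k_1 = r$, and $\mathbf{p}$ is a permutation of $1{:}d$. Assuming this to be the case and defining the vectors $v_1,\ldots,v_d$ by

$$v_i = \begin{cases}u_i & \text{if } i\ne q\\ w_q & \text{otherwise,}\end{cases}$$

we see using (3.6) that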