Theory and Computation of Tensors
Multi-Dimensional Arrays

Weiyang Ding
Yimin Wei

Academic Press is an imprint of Elsevier

Copyright © 2016 Elsevier Ltd. All rights reserved.

ISBN 978-0-12-803953-3 (print)
ISBN 978-0-12-803980-9 (online)

Preface

This book is devoted to the theory and computation of tensors, also called hypermatrices. Our investigation includes theories on generalized tensor eigenvalue problems and on two kinds of structured tensors, Hankel tensors and M-tensors. Both theoretical analyses and computational aspects are discussed.

We begin with generalized tensor eigenvalue problems, which are regarded as a unified framework for the various kinds of tensor eigenvalue problems arising from applications. We focus on the perturbation theory and the error analysis of regular tensor pairs. Employing various techniques, we extend several classical results from matrices or matrix pairs to tensor pairs, such as the Gershgorin circle theorem, the Collatz-Wielandt formula, the Bauer-Fike theorem, the Rayleigh-Ritz theorem, backward error analysis, and the componentwise distance of a nonsingular tensor to singularity.

In the second part, we focus on Hankel tensors.
We first propose a fast algorithm for Hankel tensor-vector products by introducing a special class of Hankel tensors, called anti-circulant tensors, that can be diagonalized by Fourier matrices. The fast algorithm for general Hankel tensor-vector products is then obtained by embedding a Hankel tensor into a larger anti-circulant tensor. The computational complexity is reduced from $O(n^m)$ to $O(m^2 n \log(mn))$. Next, we investigate the spectral inheritance properties of Hankel tensors by applying the convolution formula of the fast algorithm and an augmented Vandermonde decomposition of strong Hankel tensors. We prove that if a lower-order Hankel tensor is positive semidefinite, then a higher-order Hankel tensor with the same generating vector has no negative H-eigenvalues, when (i) the lower order is 2, or (ii) the lower order is even and the higher order is a multiple of it.

The third part is devoted to M-tensors. We attempt to extend the equivalent definitions of nonsingular M-matrices, such as semi-positivity, monotonicity, the nonnegative inverse property, etc., to the tensor case. Our results show that semi-positivity is still an equivalent definition of nonsingular M-tensors, while monotonicity is not. Furthermore, the generalization of the "nonnegative inverse" property inspires the study of multilinear systems of equations. We prove the existence and uniqueness of the positive solutions of nonsingular M-equations with positive right-hand sides, and we also propose several iterative methods for computing these positive solutions.

We would like to thank our collaborator Prof. Liqun Qi of The Hong Kong Polytechnic University, who led us into the research on tensor spectral theory and has always encouraged us to explore the topic. We would also like to thank Prof. Eric King-wah Chu of Monash University and Prof. Sanzheng Qiao of McMaster University, who read this book carefully and provided feedback during the writing process.

This work was supported by the National Natural Science Foundation of China under Grant 11271084, and by the School of Mathematical Sciences and the Key Laboratory of Mathematics for Nonlinear Sciences, Fudan University.

Chapter 1
Introduction and Preliminaries

We first introduce the concepts and sources of tensors in this chapter. Several essential and frequently used operations involving tensors are also included. Furthermore, two basic topics, tensor decompositions and tensor eigenvalue problems, are briefly discussed at the end of the chapter.

1.1 What Are Tensors?

The term tensor, or hypermatrix, in this book refers to a multiway array. The number of dimensions of a tensor is called its order; that is, $\mathcal{A} = (a_{i_1 i_2 \ldots i_m})$ is an $m$th-order tensor. In particular, a scalar is a 0th-order tensor, a vector is a 1st-order tensor, and a matrix is a 2nd-order tensor. Like other mathematical concepts, the tensor or hypermatrix is abstracted from real-world phenomena and other scientific theories. Where do tensors arise? Which of their properties do we care about most? How many different types of tensors are there? We briefly answer these questions with several illustrative examples in this section.

Example 1.1. As we know, a table is one of the most common realizations of a matrix. We can also understand tensors or hypermatrices as complex tables with several variables. For instance, if we record the scores of 4 students on 3 subjects for both the midterm and final exams, then we can design a 3rd-order tensor $\mathcal{S}$ of size $4 \times 3 \times 2$ whose $(i,j,k)$ entry $s_{ijk}$ denotes the score of the $i$-th student on the $j$-th subject in the $k$-th exam.
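For illustration, the following minimal NumPy sketch stores such a score tensor; the score values are hypothetical and chosen only for the example.

```python
import numpy as np

# Hypothetical scores: 4 students x 3 subjects x 2 exams (midterm, final).
S = np.array([
    [[85, 90], [78, 82], [92, 88]],  # student 1
    [[70, 75], [88, 85], [60, 72]],  # student 2
    [[95, 93], [81, 79], [77, 80]],  # student 3
    [[66, 71], [74, 77], [83, 86]],  # student 4
], dtype=float)
assert S.shape == (4, 3, 2)

# The (i, j, k) entry is the score of student i on subject j in exam k
# (indices are 0-based in NumPy, 1-based in the text).
print(S[0, 2, 1])  # student 1, subject 3, final exam -> 88.0
```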
This representation is natural and easily understood, and thus it is a convenient data structure for construction and query. However, when we need to print the information on a piece of paper, the 3D structure is clearly not suitable for 2D visualization, so we need to unfold the cubic tensor into a matrix. The following two different unfoldings of the same tensor both include all the information in the original complex table. We can see from the two tables that their entries are the same up to a permutation. Actually, there are many different ways to unfold a higher-order tensor into a matrix, and the linkages between them are permutations of indices.

               Sub. 1            Sub. 2            Sub. 3
             Mid    Final      Mid    Final      Mid    Final
  Std. 1    s_111  s_112      s_121  s_122      s_131  s_132
  Std. 2    s_211  s_212      s_221  s_222      s_231  s_232
  Std. 3    s_311  s_312      s_321  s_322      s_331  s_332
  Std. 4    s_411  s_412      s_421  s_422      s_431  s_432

Table 1.1: The first way to print S.

                     Mid                       Final
             Sub. 1  Sub. 2  Sub. 3    Sub. 1  Sub. 2  Sub. 3
  Std. 1     s_111   s_121   s_131     s_112   s_122   s_132
  Std. 2     s_211   s_221   s_231     s_212   s_222   s_232
  Std. 3     s_311   s_321   s_331     s_312   s_322   s_332
  Std. 4     s_411   s_421   s_431     s_412   s_422   s_432

Table 1.2: The second way to print S.

Example 1.2. Another important realization of tensors is the storage of color images and videos. A black-and-white image can be stored as a grayscale matrix, whose entries are the grayscale values of the corresponding pixels. Color images are often built from several stacked color channels, each of which represents the value levels of the given channel. For example, RGB images are composed of three independent channels for the red, green, and blue primary color components. We can apply a 3rd-order tensor $\mathcal{P}$ to store an RGB image, whose $(i,j,k)$ entry denotes the value of the $k$-th channel at the $(i,j)$ position ($k = 1,2,3$ represent the red, green, and blue channels, respectively). In order to store a color video, we may need an extra index for the time axis. That is, we employ a 4th-order tensor $\mathcal{M} = (m_{ijkt})$, where $\mathcal{M}(:,:,:,t)$ stores the $t$-th frame of the video as a color image.

Example 1.3. Denote $x = (x_1, x_2, \ldots, x_n)^\top \in \mathbb{R}^n$. As we know, a degree-1 polynomial $p_1(x) = c_1 x_1 + c_2 x_2 + \cdots + c_n x_n$ can be rewritten as $p_1(x) = x^\top c$, where the vector $c = (c_1, c_2, \ldots, c_n)^\top$. Similarly, a degree-2 polynomial $p_2(x) = \sum_{i,j=1}^{n} c_{ij} x_i x_j$, that is, a quadratic form, can be simplified into $p_2(x) = x^\top C x$, where the matrix $C = (c_{ij})$.
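Before generalizing this pattern to higher degrees, a minimal NumPy sketch (with arbitrary, made-up coefficients) confirms that the explicit double sum and the compact form $x^\top C x$ agree:

```python
import numpy as np

rng = np.random.default_rng(0)
n = 4
C = rng.standard_normal((n, n))  # arbitrary coefficient matrix
x = rng.standard_normal(n)

# Degree-2 polynomial as an explicit double sum ...
p2_sum = sum(C[i, j] * x[i] * x[j] for i in range(n) for j in range(n))
# ... and as the compact quadratic form x^T C x.
p2_form = x @ C @ x

assert np.isclose(p2_sum, p2_form)
```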
By analogy, if we denote an $m$th-order tensor $\mathcal{C} = (c_{i_1 i_2 \ldots i_m})$ and apply a notation that will be introduced in the next section, then the degree-$m$ homogeneous polynomial
\[
p_m(x) = \sum_{i_1=1}^{n} \sum_{i_2=1}^{n} \cdots \sum_{i_m=1}^{n} c_{i_1 i_2 \ldots i_m} x_{i_1} x_{i_2} \cdots x_{i_m}
\]
can be rewritten as $p_m(x) = \mathcal{C} x^m$.

Moreover, $x^\top c = 0$ is often used to denote a hyperplane in $\mathbb{R}^n$. Similarly, $\mathcal{C} x^m = 0$ can stand for a degree-$m$ hypersurface in $\mathbb{R}^n$. We shall see in Section 1.2 that the normal vector at a point $x_0$ on this hypersurface is $n_{x_0} = \mathcal{C} x_0^{m-1}$.

Example 1.4. The Taylor expansion is a well-known mathematical tool. The Taylor series of a real or complex-valued function $f(x)$ that is infinitely differentiable at a real or complex number $a$ is the power series
\[
f(x) = \sum_{m=0}^{\infty} \frac{1}{m!} f^{(m)}(a) (x-a)^m.
\]
A multivariate function $f(x_1, x_2, \ldots, x_n)$ that is infinitely differentiable at a point $(a_1, a_2, \ldots, a_n)$ also has its Taylor expansion
\[
f(x_1, \ldots, x_n) = \sum_{i_1=0}^{\infty} \cdots \sum_{i_n=0}^{\infty} \frac{(x_1-a_1)^{i_1} \cdots (x_n-a_n)^{i_n}}{i_1! \cdots i_n!} \, \frac{\partial^{i_1+\cdots+i_n} f(a_1, \ldots, a_n)}{\partial x_1^{i_1} \cdots \partial x_n^{i_n}},
\]
which is equivalent to
\[
\begin{aligned}
f(x_1, \ldots, x_n) = f(a_1, \ldots, a_n) &+ \sum_{i=1}^{n} \frac{\partial f(a_1, \ldots, a_n)}{\partial x_i} (x_i - a_i) \\
&+ \frac{1}{2!} \sum_{i=1}^{n} \sum_{j=1}^{n} \frac{\partial^2 f(a_1, \ldots, a_n)}{\partial x_i \partial x_j} (x_i - a_i)(x_j - a_j) \\
&+ \frac{1}{3!} \sum_{i=1}^{n} \sum_{j=1}^{n} \sum_{k=1}^{n} \frac{\partial^3 f(a_1, \ldots, a_n)}{\partial x_i \partial x_j \partial x_k} (x_i - a_i)(x_j - a_j)(x_k - a_k) + \cdots.
\end{aligned}
\]
Denoting $x = (x_1, x_2, \ldots, x_n)^\top$ and $a = (a_1, a_2, \ldots, a_n)^\top$, we can rewrite the second and the third terms in the above equation as
\[
(x-a)^\top \nabla f(a) \quad \text{and} \quad (x-a)^\top \nabla^2 f(a) (x-a),
\]
where $\nabla f(a)$ and $\nabla^2 f(a)$ are the gradient and the Hessian of $f(x)$ at $a$, respectively. If we define the $m$th-order gradient tensor $\nabla^m f(a)$ of $f(x)$ at $a$ by
\[
\big(\nabla^m f(a)\big)_{i_1 i_2 \ldots i_m} = \frac{\partial^m f(a_1, a_2, \ldots, a_n)}{\partial x_{i_1} \partial x_{i_2} \cdots \partial x_{i_m}},
\]
then the Taylor expansion of a multivariate function can also be expressed as
\[
f(x) = \sum_{m=0}^{\infty} \frac{1}{m!} \nabla^m f(a) (x-a)^m.
\]

Example 1.5. The discrete-time Markov chain is one of the most important models of random processes [10, 28, 81, 114], which assumes that the future depends solely on the finite past. This model is so simple and natural that Markov chains are widely employed in many disciplines, such as thermodynamics, statistical mechanics, queueing theory, web analysis, economics, and finance. An $s$th-order Markov chain is a stochastic process with the Markov property, that is, a sequence of variables $\{Y_t\}_{t=1}^{\infty}$ satisfying
\[
\Pr(Y_t = i_1 \mid Y_{t-1} = i_2, \ldots, Y_1 = i_t) = \Pr(Y_t = i_1 \mid Y_{t-1} = i_2, \ldots, Y_{t-s} = i_{s+1})
\]
for all $t > s$. That is, any state depends solely on the immediately preceding $s$ states. In particular, when the step length $s = 1$, the sequence $\{Y_t\}_{t=1}^{\infty}$ is a standard first-order Markov chain. Define the transition probability matrix $P$ of a first-order Markov chain by
\[
p_{ij} = \Pr(Y_t = i \mid Y_{t-1} = j),
\]
which is a stochastic matrix, that is, $p_{ij} \geq 0$ for all $i,j = 1,2,\ldots,n$ and $\sum_{i=1}^{n} p_{ij} = 1$ for all $j = 1,2,\ldots,n$. The probability distribution of $Y_t$ is denoted by a vector $x_t$ with
\[
(x_t)_i = \Pr(Y_t = i).
\]
Then the Markov chain is modeled by $x_{t+1} = P x_t$; thus the stationary probability distribution $x$ satisfies $x = Px$ and is exactly an eigenvector of the transition probability matrix corresponding to the eigenvalue 1.

For higher-order Markov chains, we have similar formulations. Take a second-order Markov chain, that is, $s = 2$, as an example. Define the transition probability tensor $\mathcal{P}$ of a second-order Markov chain by
\[
p_{ijk} = \Pr(Y_t = i \mid Y_{t-1} = j, \, Y_{t-2} = k),
\]
which is a stochastic tensor, that is, $\mathcal{P} \geq 0$ and $\sum_{i=1}^{n} p_{ijk} = 1$ for all $j,k = 1,2,\ldots,n$.
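As an illustration, one way to obtain such a stochastic tensor is to normalize nonnegative transition counts along the first index; the following is a minimal NumPy sketch with hypothetical count data.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 3

# Hypothetical nonnegative counts: counts[i, j, k] records how often state i
# follows the pair (Y_{t-1} = j, Y_{t-2} = k) in some observed sequence.
counts = rng.integers(1, 10, size=(n, n, n)).astype(float)

# Normalize along the first index so that sum_i p_ijk = 1 for every (j, k),
# which makes P a stochastic tensor in the sense defined above.
P = counts / counts.sum(axis=0, keepdims=True)

assert np.all(P >= 0) and np.allclose(P.sum(axis=0), 1.0)
```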
The probability distribution of the pair $(Y_t, Y_{t-1})$ in the product space can be reshaped into a matrix
\[
(X_t)_{i,j} = \Pr(Y_t = i, \, Y_{t-1} = j).
\]
Then the stationary probability distribution $X$ in the product space satisfies
\[
X(i,j) = \sum_{k=1}^{n} \mathcal{P}(i,j,k) \cdot X(j,k)
\]
for all $i,j = 1,2,\ldots,n$. If we further assume that
\[
\Pr(Y_t = i, \, Y_{t-1} = j) = \Pr(Y_t = i) \cdot \Pr(Y_{t-1} = j)
\]
for all $i,j = 1,2,\ldots,n$ and denote $(x_t)_i = \Pr(Y_t = i)$, that is, $X_t = x_t x_t^\top$, then the stationary probability distribution $x$ satisfies $x = \mathcal{P} x^2$ [74]. We shall see in Section 1.2 that $x$ is a special eigenvector of the tensor $\mathcal{P}$.

From the above examples, we can gain some basic ideas about what tensors are and where they come from. Generally speaking, there are two kinds of tensors: the first kind is a data structure, which admits different dimensions according to the complexity of the data; the second kind is an operator, which possesses different meanings in different situations.

1.2 Basic Operations

We first introduce several basic tensor operations that will be frequently referred to in this book. One of the difficulties of tensor research is the complicated indices, so in this section we often use small-size examples rather than exact definitions to describe these essential concepts more clearly. For more detailed definitions, we refer to Chapter 12 in [49] and the references [18, 66, 98].

• To treat or visualize multidimensional structures, we often reshape a higher-order tensor into a vector or a matrix, which are more familiar objects. The vectorization operator $\mathrm{vec}(\cdot)$ turns tensors into column vectors. Take a $2 \times 2 \times 2$ tensor $\mathcal{A} = (a_{ijk})_{i,j,k=1}^{2}$ for example; then
\[
\mathrm{vec}(\mathcal{A}) = (a_{111}, a_{211}, a_{121}, a_{221}, a_{112}, a_{212}, a_{122}, a_{222})^\top.
\]
There are many different ways to reshape tensors into matrices, which are often referred to as "unfoldings." The most frequently applied one is called the modal unfolding. The mode-$k$ unfolding $A_{(k)}$ of an $m$th-order tensor $\mathcal{A}$ of size $n_1 \times n_2 \times \cdots \times n_m$ is an $n_k$-by-$(N/n_k)$ matrix, where $N = n_1 n_2 \cdots n_m$. Again using the above $2 \times 2 \times 2$ example, its mode-1, mode-2, and mode-3 unfoldings are
\[
A_{(1)} = \begin{pmatrix} a_{111} & a_{121} & a_{112} & a_{122} \\ a_{211} & a_{221} & a_{212} & a_{222} \end{pmatrix}, \quad
A_{(2)} = \begin{pmatrix} a_{111} & a_{211} & a_{112} & a_{212} \\ a_{121} & a_{221} & a_{122} & a_{222} \end{pmatrix}, \quad
A_{(3)} = \begin{pmatrix} a_{111} & a_{211} & a_{121} & a_{221} \\ a_{112} & a_{212} & a_{122} & a_{222} \end{pmatrix},
\]
respectively. Sometimes the mode-$k$ unfolding is also denoted as $\mathrm{Unfold}_k(\cdot)$.

• The transposition of a matrix is understood as the exchange of its two indices. Higher-order tensors have more indices, so there are many more transpositions of tensors. If $\mathcal{A}$ is a 3rd-order tensor, then there are six possible transpositions, denoted $\mathcal{A}^{<[\sigma(1),\sigma(2),\sigma(3)]>}$, where $(\sigma(1),\sigma(2),\sigma(3))$ is any of the six permutations of $(1,2,3)$. When $\mathcal{B} = \mathcal{A}^{<[\sigma(1),\sigma(2),\sigma(3)]>}$, it means
\[
b_{i_{\sigma(1)} i_{\sigma(2)} i_{\sigma(3)}} = a_{i_1 i_2 i_3}.
\]
If all the entries of a tensor are invariant under any permutation of the indices, then we call it a symmetric tensor. For example, a 3rd-order tensor is symmetric if and only if
\[
\mathcal{A}^{<[1,2,3]>} = \mathcal{A}^{<[1,3,2]>} = \mathcal{A}^{<[2,1,3]>} = \mathcal{A}^{<[2,3,1]>} = \mathcal{A}^{<[3,1,2]>} = \mathcal{A}^{<[3,2,1]>}.
\]

• Modal tensor-matrix multiplications are essential in this book; they are generalizations of matrix-matrix multiplications. Let $\mathcal{A}$ be an $m$th-order tensor of size $n_1 \times n_2 \times \cdots \times n_m$ and let $M$ be a matrix of size $n_k \times n_k'$. Then the mode-$k$ product $\mathcal{A} \times_k M$ of the tensor $\mathcal{A}$ and the matrix $M$ is another $m$th-order tensor of size $n_1 \times \cdots \times n_k' \times \cdots \times n_m$ with
\[
(\mathcal{A} \times_k M)_{i_1 \ldots i_{k-1} j_k i_{k+1} \ldots i_m} = \sum_{i_k=1}^{n_k} a_{i_1 \ldots i_{k-1} i_k i_{k+1} \ldots i_m} \cdot m_{i_k j_k}.
\]
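A minimal NumPy sketch of this definition (an illustration, not an optimized implementation): `np.tensordot` contracts the $k$-th index of the tensor with the first index of the matrix, and `np.moveaxis` returns the new index to position $k$.

```python
import numpy as np

def mode_k_product(A, M, k):
    """Compute the mode-k product A x_k M, where A has size n_k along
    axis k and M has shape (n_k, n_k')."""
    T = np.tensordot(A, M, axes=([k], [0]))  # contracted mode ends up last
    return np.moveaxis(T, -1, k)             # move it back to position k

A = np.arange(2 * 3 * 4, dtype=float).reshape(2, 3, 4)
M = np.ones((3, 5))
print(mode_k_product(A, M, 1).shape)  # (2, 5, 4)
```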
In particular, if $A$, $M_1$, and $M_2$ are all matrices, then $A \times_1 M_1 \times_2 M_2 = M_1^\top A M_2$. It is easily verified that tensor-matrix multiplications satisfy

1. $\mathcal{A} \times_k M_k \times_l M_l = \mathcal{A} \times_l M_l \times_k M_k$, if $k \neq l$;
2. $\mathcal{A} \times_k M_1 \times_k M_2 = \mathcal{A} \times_k (M_1 M_2)$;
3. $\mathcal{A} \times_k (\alpha_1 M_1 + \alpha_2 M_2) = \alpha_1 \mathcal{A} \times_k M_1 + \alpha_2 \mathcal{A} \times_k M_2$;
4. $\mathrm{Unfold}_k(\mathcal{A} \times_k M) = M^\top A_{(k)}$;
5. $\mathrm{vec}(\mathcal{A} \times_1 M_1 \times_2 M_2 \cdots \times_m M_m) = (M_m \otimes \cdots \otimes M_2 \otimes M_1)^\top \mathrm{vec}(\mathcal{A})$;

where $\mathcal{A}$ is a tensor, the $M$'s are matrices, and $\alpha_1, \alpha_2$ are scalars.

• If the matrices degrade into column vectors, then we obtain another cluster of important notations for tensor spectral theory. Let $\mathcal{A}$ be an $m$th-order $n$-dimensional tensor, that is, of size $n \times n \times \cdots \times n$, and let $x$ be a vector of length $n$. Then, for simplicity,
\[
\begin{aligned}
\mathcal{A} x^m &= \mathcal{A} \times_1 x \times_2 x \times_3 x \cdots \times_m x && \text{is a scalar}, \\
\mathcal{A} x^{m-1} &= \mathcal{A} \times_2 x \times_3 x \cdots \times_m x && \text{is a vector}, \\
\mathcal{A} x^{m-2} &= \mathcal{A} \times_3 x \cdots \times_m x && \text{is a matrix}.
\end{aligned}
\]

• As in the vector case, an inner product of two tensors $\mathcal{A}$ and $\mathcal{B}$ of the same size is defined by
\[
\langle \mathcal{A}, \mathcal{B} \rangle = \sum_{i_1=1}^{n_1} \cdots \sum_{i_m=1}^{n_m} a_{i_1 i_2 \ldots i_m} \cdot b_{i_1 i_2 \ldots i_m},
\]
which is exactly the usual inner product of the two vectors $\mathrm{vec}(\mathcal{A})$ and $\mathrm{vec}(\mathcal{B})$.

• The outer product of two tensors is a higher-order tensor. Let $\mathcal{A}$ and $\mathcal{B}$ be $m$th-order and $m'$th-order tensors, respectively. Then their outer product $\mathcal{A} \circ \mathcal{B}$ is an $(m+m')$th-order tensor with
\[
(\mathcal{A} \circ \mathcal{B})_{i_1 \ldots i_m j_1 \ldots j_{m'}} = a_{i_1 i_2 \ldots i_m} \cdot b_{j_1 j_2 \ldots j_{m'}}.
\]
If $a$ and $b$ are vectors, then $a \circ b = a b^\top$.

• We sometimes refer to the Hadamard product of two tensors of the same size,
\[
(\mathcal{A} \otimes_{\mathrm{HAD}} \mathcal{B})_{i_1 i_2 \ldots i_m} = a_{i_1 i_2 \ldots i_m} \cdot b_{i_1 i_2 \ldots i_m}.
\]
The Hadamard product will also be denoted as $\mathcal{A} \mathbin{.*} \mathcal{B}$ in the descriptions of some algorithms, which is a MATLAB-type notation.

1.3 Tensor Decompositions

Given a tensor, how can we retrieve the information hidden inside? One reasonable answer is the tensor decomposition approach. Existing tensor decompositions include the Tucker-type decompositions, the CANDECOMP/PARAFAC (CP) decomposition, the tensor train representation, etc. For readers interested in tensor decompositions, we recommend the survey papers [50, 66]. Some tensor decompositions are generalizations of the singular value decomposition (SVD) [49].

The SVD is one of the most important tools for matrix analysis and computation. Any matrix $A \in \mathbb{R}^{m \times n}$ with rank $r$ has the decomposition $A = U \Sigma V^\top$,
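As a quick numerical illustration of the matrix SVD, here is a minimal NumPy sketch (with an arbitrary random matrix):

```python
import numpy as np

rng = np.random.default_rng(2)
A = rng.standard_normal((5, 3))

# Thin SVD: A = U @ diag(s) @ Vt, where U has orthonormal columns, Vt has
# orthonormal rows, and s holds the singular values in decreasing order.
U, s, Vt = np.linalg.svd(A, full_matrices=False)

assert np.allclose(A, U @ np.diag(s) @ Vt)
print(np.linalg.matrix_rank(A), s)  # rank and singular values
```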