Theory and Computation of Tensors
Multi-Dimensional Arrays
WEIYANG DING
YIMIN WEI
AMSTERDAM • BOSTON • HEIDELBERG • LONDON
NEW YORK • OXFORD • PARIS • SAN DIEGO
SAN FRANCISCO • SINGAPORE • SYDNEY • TOKYO
Academic Press is an imprint of Elsevier
125 London Wall, London EC2Y 5AS, UK
525 B Street, Suite 1800, San Diego, CA 92101-4495, USA
50 Hampshire Street, 5th Floor, Cambridge, MA 02139, USA
The Boulevard, Langford Lane, Kidlington, Oxford OX5 1GB, UK
Copyright © 2016 Elsevier Ltd. All rights reserved.
No part of this publication may be reproduced or transmitted in any form or by any means, electronic or mechanical, including photocopying, recording, or any information storage and retrieval system, without permission in writing from the publisher. Details on how to seek permission, further information about the Publisher's permissions policies and our arrangements with organizations such as the Copyright Clearance Center and the Copyright Licensing Agency, can be found at our website: www.elsevier.com/permissions.
This book and the individual contributions contained in it are protected under copyright by the Publisher (other than as may be noted herein).
Notices
Knowledge and best practice in this field are constantly changing. As new research and experience broaden our understanding, changes in research methods, professional practices, or medical treatment may become necessary.
Practitioners and researchers must always rely on their own experience and knowledge in evaluating and using any information, methods, compounds, or experiments described herein. In using such information or methods they should be mindful of their own safety and the safety of others, including parties for whom they have a professional responsibility.
To the fullest extent of the law, neither the Publisher nor the authors, contributors, or editors, assume any liability for any injury and/or damage to persons or property as a matter of products liability, negligence or otherwise, or from any use or operation of any methods, products, instructions, or ideas contained in the material herein.
Library of Congress Cataloging-in-Publication Data
A catalog record for this book is available from the Library of Congress
British Library Cataloguing-in-Publication Data
A catalogue record for this book is available from the British Library
ISBN 978-0-12-803953-3 (print)
ISBN 978-0-12-803980-9 (online)
For information on all Academic Press publications visit our website at https://www.elsevier.com/
Publisher: Glyn Jones
Acquisition Editor: Glyn Jones
Editorial Project Manager: Anna Valutkevich
Production Project Manager: Debasish Ghosh
Designer: Nikki Levy
Typeset by SPi Global, India
Preface
This book is devoted to the theory and computation of tensors, also called hypermatrices. Our investigation includes theories on generalized tensor eigenvalue problems and two kinds of structured tensors, Hankel tensors and M-tensors. Both theoretical analyses and computational aspects are discussed.
We begin with the generalized tensor eigenvalue problems, which are re-
garded as a unified framework of different kinds of tensor eigenvalue problems
arising from applications. We focus on the perturbation theory and the error
analysis of regular tensor pairs. Employing various techniques, we extend sev-
eral classical results from matrices or matrix pairs to tensor pairs, such as the
Gershgorin circle theorem, the Collatz-Wielandt formula, the Bauer-Fike the-
orem, the Rayleigh-Ritz theorem, backward error analysis, the componentwise
distance of a nonsingular tensor to singularity, etc.
In the second part, we focus on Hankel tensors. We first introduce a special class of Hankel tensors that can be diagonalized by Fourier matrices, called anti-circulant tensors; embedding a Hankel tensor into a larger anti-circulant tensor then yields a fast algorithm for Hankel tensor-vector products. The computational complexity is reduced from $O(n^m)$ to $O(m^2 n \log mn)$. Next, we investigate the spectral inheritance properties of Hankel tensors by applying the convolution formula of the fast algorithm and an augmented Vandermonde decomposition of strong Hankel tensors. We prove that if a lower-order Hankel tensor is positive semidefinite, then a higher-order Hankel tensor with the same generating vector has no negative H-eigenvalues, when (i) the lower order is 2, or (ii) the lower order is even and the higher order is its multiple.
The third part is devoted to M-tensors. We attempt to extend the equivalent definitions of nonsingular M-matrices, such as semi-positivity, monotonicity, the nonnegative inverse, etc., to the tensor case. Our results show that semi-positivity is still an equivalent definition of nonsingular M-tensors, while monotonicity is not. Furthermore, the generalization of the "nonnegative inverse" property inspires the study of multilinear systems of equations. We prove the existence and uniqueness of the positive solutions of nonsingular M-equations with positive right-hand sides, and also propose several iterative methods for computing the positive solutions.
We would like to thank our collaborator Prof. Liqun Qi of the Hong Kong Polytechnic University, who led us into the research of tensor spectral theory and always encourages us to explore the topic. We would also like to thank Prof. Eric King-wah Chu of Monash University and Prof. Sanzheng Qiao of McMaster University, who read this book carefully and provided feedback during the writing process.
This work was supported by the National Natural Science Foundation of
China under Grant 11271084, School of Mathematical Sciences and Key Labo-
ratory of Mathematics for Nonlinear Sciences, Fudan University.
Chapter 1
Introduction and
Preliminaries
We first introduce the concepts and sources of tensors in this chapter. Several essential and frequently used operations involving tensors are also included. Furthermore, two basic topics, tensor decompositions and tensor eigenvalue problems, are briefly discussed at the end of this chapter.
1.1 What Are Tensors?
The term tensor or hypermatrix in this book refers to a multiway array. The number of dimensions of a tensor is called its order; that is, $\mathcal{A} = (a_{i_1 i_2 \ldots i_m})$ is an $m$th-order tensor. In particular, a scalar is a 0th-order tensor, a vector is a 1st-order tensor, and a matrix is a 2nd-order tensor. Like other mathematical concepts, the tensor or hypermatrix is abstracted from real-world phenomena and other scientific theories. Where do tensors arise? Which properties do we care about most? How many different types of tensors do we have? We will briefly answer these questions with several illustrative examples in this section.
Example 1.1. As we know, a table is one of the most common realizations of a matrix. We can also understand tensors or hypermatrices as complex tables with multiple variables. For instance, if we record the scores of 4 students on 3 subjects for both the midterm and final exams, then we can design a 3rd-order tensor $\mathcal{S}$ of size $4 \times 3 \times 2$ whose $(i,j,k)$ entry $s_{ijk}$ denotes the score of the $i$-th student on the $j$-th subject in the $k$-th exam. This representation is natural and easily understood; thus it is a convenient data structure for construction and query. However, when we need to print the information on a piece of paper, the 3D structure is apparently not suitable for 2D visualization. Thus we need to unfold the cubic tensor into a matrix. The following two different unfoldings of the same tensor both include all the information in the original complex table. We can see from the two tables that their entries are the same up to a permutation. Actually, there are many different ways to unfold a higher-order tensor into a matrix, and the linkages between them are permutations of indices.
           Sub. 1            Sub. 2            Sub. 3
         Mid     Final     Mid     Final     Mid     Final
Std. 1   s_111   s_112     s_121   s_122     s_131   s_132
Std. 2   s_211   s_212     s_221   s_222     s_231   s_232
Std. 3   s_311   s_312     s_321   s_322     s_331   s_332
Std. 4   s_411   s_412     s_421   s_422     s_431   s_432

Table 1.1: The first way to print S.
           Mid                       Final
         Sub. 1  Sub. 2  Sub. 3   Sub. 1  Sub. 2  Sub. 3
Std. 1   s_111   s_121   s_131    s_112   s_122   s_132
Std. 2   s_211   s_221   s_231    s_212   s_222   s_232
Std. 3   s_311   s_321   s_331    s_312   s_322   s_332
Std. 4   s_411   s_421   s_431    s_412   s_422   s_432

Table 1.2: The second way to print S.
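To make the two unfoldings concrete, here is a minimal sketch in Python/NumPy (our own illustration, with 0-based indices; the book itself uses MATLAB-style notation):

```python
import numpy as np

# A hypothetical 4 x 3 x 2 score tensor S: S[i, j, k] is the score of
# student i on subject j in exam k (0-based indices here).
S = np.arange(24).reshape(4, 3, 2)

# Table 1.1: rows are students, columns ordered subject-major, exam-minor,
# i.e., (Sub.1 Mid, Sub.1 Final, Sub.2 Mid, ...).
table1 = S.reshape(4, 6)

# Table 1.2: rows are students, columns ordered exam-major, subject-minor,
# i.e., (Mid Sub.1, Mid Sub.2, Mid Sub.3, Final Sub.1, ...).
table2 = S.transpose(0, 2, 1).reshape(4, 6)

# The two tables contain the same entries up to a permutation of columns.
assert np.array_equal(np.sort(table1, axis=None), np.sort(table2, axis=None))
```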
Example 1.2. Another important realization of tensors is the storage of color images and videos. A black-and-white image can be stored as a grayscale matrix, whose entries are the grayscale values of the corresponding pixels. Color images are often built from several stacked color channels, each of which represents value levels of the given channel. For example, RGB images are composed of three independent channels for red, green, and blue primary color components. We can apply a 3rd-order tensor $\mathcal{P}$ to store an RGB image, whose $(i,j,k)$ entry denotes the value of the $k$-th channel in the $(i,j)$ position ($k=1,2,3$ represent the red, green, and blue channels, respectively). In order to store a color video, we may need an extra index for the time axis. That is, we employ a 4th-order tensor $\mathcal{M} = (m_{ijkt})$, where $\mathcal{M}(:,:,:,t)$ stores the $t$-th frame of the video as a color image.
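A brief NumPy illustration of this storage scheme (the image and video sizes are assumptions for the example):

```python
import numpy as np

# A hypothetical 480 x 640 RGB image: entry (i, j, k) is the value of
# channel k (0 = red, 1 = green, 2 = blue) at pixel (i, j).
image = np.zeros((480, 640, 3), dtype=np.uint8)

# A hypothetical 100-frame color video: one extra index for the time axis.
video = np.zeros((480, 640, 3, 100), dtype=np.uint8)

frame_7 = video[:, :, :, 7]   # one frame, stored as a color image
red_channel = image[:, :, 0]  # one channel, a grayscale-like matrix
```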
Example 1.3. Denote $x = (x_1, x_2, \ldots, x_n)^\top \in \mathbb{R}^n$. As we know, a degree-1 polynomial $p_1(x) = c_1 x_1 + c_2 x_2 + \cdots + c_n x_n$ can be rewritten as $p_1(x) = x^\top c$, where the vector $c = (c_1, c_2, \ldots, c_n)^\top$. Similarly, a degree-2 polynomial $p_2(x) = \sum_{i,j=1}^{n} c_{ij} x_i x_j$, that is, a quadratic form, can be simplified into $p_2(x) = x^\top C x$, where the matrix $C = (c_{ij})$. By analogy, if we denote an $m$th-order tensor $\mathcal{C} = (c_{i_1 i_2 \ldots i_m})$ and apply a notation, which will be introduced in the next section, then the degree-$m$ homogeneous polynomial
$$p_m(x) = \sum_{i_1=1}^{n} \sum_{i_2=1}^{n} \cdots \sum_{i_m=1}^{n} c_{i_1 i_2 \ldots i_m} x_{i_1} x_{i_2} \cdots x_{i_m}$$
can be rewritten as
$$p_m(x) = \mathcal{C} x^m.$$
Moreover, $x^\top c = 0$ is often used to denote a hyperplane in $\mathbb{R}^n$. Similarly, $\mathcal{C} x^m = 0$ can stand for a degree-$m$ hypersurface in $\mathbb{R}^n$. We shall see in Section 1.2 that the normal vector at a point $x_0$ on this hypersurface is $n_{x_0} = \mathcal{C} x_0^{m-1}$.
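For concreteness, a small Python/NumPy sketch (our own illustration, not code from the book) evaluating a degree-3 homogeneous polynomial as a tensor-vector contraction:

```python
import numpy as np

def poly_eval(C, x):
    """Evaluate p_m(x) = C x^m by contracting every mode of C with x."""
    v = C
    for _ in range(C.ndim):
        v = v @ x        # contracts the last remaining mode with x
    return v

n = 4
rng = np.random.default_rng(0)
C = rng.standard_normal((n, n, n))   # a 3rd-order coefficient tensor
x = rng.standard_normal(n)

# Compare with the explicit triple sum  sum_{i,j,k} c_ijk x_i x_j x_k.
brute = np.einsum('ijk,i,j,k->', C, x, x, x)
assert np.isclose(poly_eval(C, x), brute)
```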
Example 1.4. The Taylor expansion is a well-known mathematical tool. The Taylor series of a real or complex-valued function $f(x)$ that is infinitely differentiable at a real or complex number $a$ is the power series
$$f(x) = \sum_{m=0}^{\infty} \frac{1}{m!} f^{(m)}(a)(x-a)^m.$$
A multivariate function $f(x_1, x_2, \ldots, x_n)$ that is infinitely differentiable at a point $(a_1, a_2, \ldots, a_n)$ also has its Taylor expansion
$$f(x_1, \ldots, x_n) = \sum_{i_1=0}^{\infty} \cdots \sum_{i_n=0}^{\infty} \frac{(x_1-a_1)^{i_1} \cdots (x_n-a_n)^{i_n}}{i_1! \cdots i_n!} \, \frac{\partial^{i_1+\cdots+i_n} f(a_1,\ldots,a_n)}{\partial x_1^{i_1} \cdots \partial x_n^{i_n}},$$
which is equivalent to
$$f(x_1,\ldots,x_n) = f(a_1,\ldots,a_n) + \sum_{i=1}^{n} \frac{\partial f(a_1,\ldots,a_n)}{\partial x_i}(x_i - a_i) + \frac{1}{2!} \sum_{i=1}^{n} \sum_{j=1}^{n} \frac{\partial^2 f(a_1,\ldots,a_n)}{\partial x_i \partial x_j}(x_i - a_i)(x_j - a_j) + \frac{1}{3!} \sum_{i=1}^{n} \sum_{j=1}^{n} \sum_{k=1}^{n} \frac{\partial^3 f(a_1,\ldots,a_n)}{\partial x_i \partial x_j \partial x_k}(x_i - a_i)(x_j - a_j)(x_k - a_k) + \cdots.$$
Denoting $x = (x_1, x_2, \ldots, x_n)^\top$ and $a = (a_1, a_2, \ldots, a_n)^\top$, we can rewrite the second and the third terms in the above equation as
$$(x-a)^\top \nabla f(a) \quad \text{and} \quad (x-a)^\top \nabla^2 f(a)(x-a),$$
where $\nabla f(a)$ and $\nabla^2 f(a)$ are the gradient and the Hessian of $f(x)$ at $a$, respectively. If we define the $m$th-order gradient tensor $\nabla^m f(a)$ of $f(x)$ at $a$ as
$$\left(\nabla^m f(a)\right)_{i_1 i_2 \ldots i_m} = \frac{\partial^m f(a_1, a_2, \ldots, a_n)}{\partial x_{i_1} \partial x_{i_2} \cdots \partial x_{i_m}},$$
then the Taylor expansion of a multivariate function can also be expressed by
$$f(x) = \sum_{m=0}^{\infty} \frac{1}{m!} \nabla^m f(a)(x-a)^m.$$
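The last formula can be checked numerically. A minimal sketch in Python/NumPy, under the assumption $f(x) = \exp(c^\top x)$ (our own illustrative choice, for which $\nabla^m f(a)$ is $e^{c^\top a}$ times the $m$-fold outer product of $c$):

```python
import numpy as np
from math import factorial

# Assumed test function: f(x) = exp(c^T x), for which the m-th order
# gradient tensor at a is exp(c^T a) times the m-fold outer product of c.
rng = np.random.default_rng(1)
n = 3
c = 0.5 * rng.standard_normal(n)
a = rng.standard_normal(n)
x = a + 0.1 * rng.standard_normal(n)

def grad_tensor(m):
    """Build (nabla^m f)(a) explicitly as an m-th order tensor."""
    T = np.exp(c @ a)
    for _ in range(m):
        T = np.multiply.outer(T, c)
    return T

def contract(T, v, m):
    """Contract the m-th order tensor T with v in every mode: T v^m."""
    for _ in range(m):
        T = T @ v
    return T

# Partial sum of  f(x) = sum_m (1/m!) nabla^m f(a) (x - a)^m.
approx = sum(contract(grad_tensor(m), x - a, m) / factorial(m)
             for m in range(10))
assert np.isclose(approx, np.exp(c @ x))
```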
Example 1.5. The discrete-time Markov chain is one of the most important models of random processes [10, 28, 81, 114], which assumes that the future depends solely on the finite past. This model is so simple and natural that Markov chains are widely employed in many disciplines, such as thermodynamics, statistical mechanics, queueing theory, web analysis, economics, and finance. An $s$th-order Markov chain is a stochastic process with the Markov property, that is, a sequence of variables $\{Y_t\}_{t=1}^{\infty}$ satisfying
$$\Pr(Y_t = i_1 \mid Y_{t-1} = i_2, \ldots, Y_1 = i_t) = \Pr(Y_t = i_1 \mid Y_{t-1} = i_2, \ldots, Y_{t-s} = i_{s+1})$$
for all $t > s$. That is, any state depends solely on the immediate past $s$ states. In particular, when the step length $s = 1$, the sequence $\{Y_t\}_{t=1}^{\infty}$ is a standard first-order Markov chain. Define the transition probability matrix $P$ of a first-order Markov chain as
$$p_{ij} = \Pr(Y_t = i \mid Y_{t-1} = j),$$
which is a stochastic matrix, that is, $p_{ij} \geq 0$ for all $i,j = 1,2,\ldots,n$ and $\sum_{i=1}^{n} p_{ij} = 1$ for all $j = 1,2,\ldots,n$. The probability distribution of $Y_t$ is denoted by a vector
$$(x_t)_i = \Pr(Y_t = i).$$
Then the Markov chain is modeled by $x_{t+1} = P x_t$; thus the stationary probability distribution $x$ satisfies $x = Px$ and is exactly an eigenvector of the transition probability matrix corresponding to the eigenvalue 1.
For higher-order Markov chains, we have similar formulations. Take a second-order Markov chain, that is, $s = 2$, as an example. Define the transition probability tensor $\mathcal{P}$ of a second-order Markov chain as
$$p_{ijk} = \Pr(Y_t = i \mid Y_{t-1} = j, Y_{t-2} = k),$$
which is a stochastic tensor, that is, $\mathcal{P} \geq 0$ and $\sum_{i=1}^{n} p_{ijk} = 1$ for all $j,k = 1,2,\ldots,n$. The probability distribution of $(Y_t, Y_{t-1})$ in the product space can be reshaped into a matrix
$$(X_t)_{i,j} = \Pr(Y_t = i, Y_{t-1} = j).$$
Then the stationary probability distribution $X$ in the product space satisfies
$$X(i,j) = \sum_{k=1}^{n} \mathcal{P}(i,j,k) \cdot X(j,k)$$
for all $i,j = 1,2,\ldots,n$. If we further assume that
$$\Pr(Y_t = i, Y_{t-1} = j) = \Pr(Y_t = i) \cdot \Pr(Y_{t-1} = j)$$
for all $i,j = 1,2,\ldots,n$ and denote $(x_t)_i = \Pr(Y_t = i)$, that is, $X_t = x_t x_t^\top$, then the stationary probability distribution $x$ satisfies $x = \mathcal{P}x^2$ [74]. We shall see in Section 1.2 that $x$ is a special eigenvector of the tensor $\mathcal{P}$.
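As an illustration only (not an algorithm from the book), a minimal fixed-point sketch in Python/NumPy that seeks $x = \mathcal{P}x^2$; convergence of the iteration is assumed for the randomly generated stochastic tensor:

```python
import numpy as np

n = 4
rng = np.random.default_rng(2)

# A random stochastic tensor: P[i, j, k] plays the role of
# Pr(Y_t = i | Y_{t-1} = j, Y_{t-2} = k), so each fiber P[:, j, k] sums to 1.
P = rng.random((n, n, n))
P /= P.sum(axis=0, keepdims=True)

# Fixed-point iteration x <- P x^2, starting from the uniform distribution.
x = np.full(n, 1.0 / n)
for _ in range(1000):
    x_new = np.einsum('ijk,j,k->i', P, x, x)   # (P x^2)_i
    if np.linalg.norm(x_new - x, 1) < 1e-12:
        break
    x = x_new

# Each iterate remains a probability vector, so x = P x^2 up to tolerance.
assert np.isclose(x.sum(), 1.0)
```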
From the above examples, we can gain some basic ideas about what tensors are and where they come from. Generally speaking, there are two kinds of tensors: the first kind is a data structure, which admits different dimensions according to the complexity of the data; the second kind is an operator, which possesses different meanings in different situations.
1.2 Basic Operations
Wefirstintroduceseveralbasictensoroperationsthatwillbefrequentlyreferred
to in the book. One of the difficulties of tensor research is the complicated
indices. Therefore we often use some small-size examples rather than exact
definitions in this section to describe those essential concepts more clearly. For
more detailed definitions, we refer to Chapter 12 in [49] and the references
[18, 66, 98].
• To treat or visualize the multidimensional structures, we often reshape a higher-order tensor into a vector or a matrix, which are more familiar to us. The vectorization operator $\mathrm{vec}(\cdot)$ turns tensors into column vectors. Take a $2 \times 2 \times 2$ tensor $\mathcal{A} = (a_{ijk})_{i,j,k=1}^{2}$ for example; then
$$\mathrm{vec}(\mathcal{A}) = (a_{111}, a_{211}, a_{121}, a_{221}, a_{112}, a_{212}, a_{122}, a_{222})^\top.$$
There are a lot of different ways to reshape tensors into matrices, which are often referred to as "unfoldings." The most frequently applied one is called the modal unfolding. The mode-$k$ unfolding $A_{(k)}$ of an $m$th-order tensor $\mathcal{A}$ of size $n_1 \times n_2 \times \cdots \times n_m$ is an $n_k$-by-$(N/n_k)$ matrix, where $N = n_1 n_2 \cdots n_m$. Again, using the above $2 \times 2 \times 2$ example, its mode-1, mode-2, and mode-3 unfoldings are
$$A_{(1)} = \begin{pmatrix} a_{111} & a_{121} & a_{112} & a_{122} \\ a_{211} & a_{221} & a_{212} & a_{222} \end{pmatrix},$$
$$A_{(2)} = \begin{pmatrix} a_{111} & a_{211} & a_{112} & a_{212} \\ a_{121} & a_{221} & a_{122} & a_{222} \end{pmatrix},$$
$$A_{(3)} = \begin{pmatrix} a_{111} & a_{211} & a_{121} & a_{221} \\ a_{112} & a_{212} & a_{122} & a_{222} \end{pmatrix},$$
respectively. Sometimes the mode-$k$ unfolding is also denoted as $\mathrm{Unfold}_k(\cdot)$. (A code sketch of these operations appears after this list.)
• The transposition operation of a matrix is understood as the exchange of the two indices. But higher-order tensors have more indices; thus we have many more transpositions of tensors. If $\mathcal{A}$ is a 3rd-order tensor, then there are six possible transpositions, denoted as $\mathcal{A}^{<[\sigma(1),\sigma(2),\sigma(3)]>}$, where $\left(\sigma(1),\sigma(2),\sigma(3)\right)$ is any of the six permutations of $(1,2,3)$. When $\mathcal{B} = \mathcal{A}^{<[\sigma(1),\sigma(2),\sigma(3)]>}$, it means
$$b_{i_{\sigma(1)} i_{\sigma(2)} i_{\sigma(3)}} = a_{i_1 i_2 i_3}.$$
If all the entries of a tensor are invariant under any permutation of the indices, then we call it a symmetric tensor. For example, a 3rd-order tensor is said to be symmetric if and only if
$$\mathcal{A}^{<[1,2,3]>} = \mathcal{A}^{<[1,3,2]>} = \mathcal{A}^{<[2,1,3]>} = \mathcal{A}^{<[2,3,1]>} = \mathcal{A}^{<[3,1,2]>} = \mathcal{A}^{<[3,2,1]>}.$$
• Modal tensor-matrix multiplications are essential in this book; they are generalizations of matrix-matrix multiplications. Let $\mathcal{A}$ be an $m$th-order tensor of size $n_1 \times n_2 \times \cdots \times n_m$ and $M$ be a matrix of size $n_k \times n_k'$; then the mode-$k$ product $\mathcal{A} \times_k M$ of the tensor $\mathcal{A}$ and the matrix $M$ is another $m$th-order tensor of size $n_1 \times \cdots \times n_k' \times \cdots \times n_m$ with
$$(\mathcal{A} \times_k M)_{i_1 \ldots i_{k-1} j_k i_{k+1} \ldots i_m} = \sum_{i_k=1}^{n_k} a_{i_1 \ldots i_{k-1} i_k i_{k+1} \ldots i_m} \cdot m_{i_k j_k}.$$
In particular, if $A$, $M_1$, and $M_2$ are all matrices, then $A \times_1 M_1 \times_2 M_2 = M_1^\top A M_2$. As is easily verified, the tensor-matrix multiplications satisfy
1. $\mathcal{A} \times_k M_k \times_l M_l = \mathcal{A} \times_l M_l \times_k M_k$, if $k \neq l$,
2. $\mathcal{A} \times_k M_1 \times_k M_2 = \mathcal{A} \times_k (M_1 M_2)$,
3. $\mathcal{A} \times_k (\alpha_1 M_1 + \alpha_2 M_2) = \alpha_1 \mathcal{A} \times_k M_1 + \alpha_2 \mathcal{A} \times_k M_2$,
4. $\mathrm{Unfold}_k(\mathcal{A} \times_k M) = M^\top A_{(k)}$,
5. $\mathrm{vec}(\mathcal{A} \times_1 M_1 \times_2 M_2 \cdots \times_m M_m) = (M_m \otimes \cdots \otimes M_2 \otimes M_1)^\top \mathrm{vec}(\mathcal{A})$,
where $\mathcal{A}$ is a tensor, the $M_k$ are matrices, and $\alpha_1, \alpha_2$ are scalars.
• If the matrices degrade into column vectors, then we obtain another cluster of important notations for tensor spectral theory. Let $\mathcal{A}$ be an $m$th-order $n$-dimensional tensor, that is, of size $n \times n \times \cdots \times n$, and let $x$ be a vector of length $n$; then for simplicity:
$$\mathcal{A}x^{m} = \mathcal{A} \times_1 x \times_2 x \times_3 x \cdots \times_m x \quad \text{is a scalar},$$
$$\mathcal{A}x^{m-1} = \mathcal{A} \times_2 x \times_3 x \cdots \times_m x \quad \text{is a vector},$$
$$\mathcal{A}x^{m-2} = \mathcal{A} \times_3 x \cdots \times_m x \quad \text{is a matrix}.$$
• Like the vector case, an inner product of two tensors $\mathcal{A}$ and $\mathcal{B}$ of the same size is defined by
$$\langle \mathcal{A}, \mathcal{B} \rangle = \sum_{i_1=1}^{n_1} \cdots \sum_{i_m=1}^{n_m} a_{i_1 i_2 \ldots i_m} \cdot b_{i_1 i_2 \ldots i_m},$$
which is exactly the usual inner product of the two vectors $\mathrm{vec}(\mathcal{A})$ and $\mathrm{vec}(\mathcal{B})$.
• The outer product of two tensors is a higher-order tensor. Let $\mathcal{A}$ and $\mathcal{B}$ be $m$th-order and $m'$th-order tensors, respectively. Then their outer product $\mathcal{A} \circ \mathcal{B}$ is an $(m+m')$th-order tensor with
$$(\mathcal{A} \circ \mathcal{B})_{i_1 \ldots i_m j_1 \ldots j_{m'}} = a_{i_1 i_2 \ldots i_m} \cdot b_{j_1 j_2 \ldots j_{m'}}.$$
If $a$ and $b$ are vectors, then $a \circ b = ab^\top$.
• We sometimes refer to the Hadamard product of two tensors with the same size as
$$(\mathcal{A} \otimes_{\mathrm{HAD}} \mathcal{B})_{i_1 i_2 \ldots i_m} = a_{i_1 i_2 \ldots i_m} \cdot b_{i_1 i_2 \ldots i_m}.$$
The Hadamard product will also be denoted as $\mathcal{A} \,.\!*\, \mathcal{B}$ in the descriptions of some algorithms, which is a MATLAB-type notation.
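A loose Python/NumPy rendering of these operations (our own illustration; the helper names vec, unfold, and mode_k_product are not from the book, and indices are 0-based):

```python
import numpy as np

def vec(A):
    """Vectorize with the first index varying fastest (column-major)."""
    return A.flatten(order='F')

def unfold(A, k):
    """Mode-k unfolding A_(k): mode k indexes the rows (k is 0-based)."""
    return np.moveaxis(A, k, 0).reshape(A.shape[k], -1, order='F')

def mode_k_product(A, M, k):
    """Mode-k product A x_k M: contract mode k of A with the rows of M."""
    out = np.tensordot(A, M, axes=([k], [0]))  # contracted mode moves last
    return np.moveaxis(out, -1, k)

rng = np.random.default_rng(3)
A = rng.standard_normal((2, 3, 4))
M = rng.standard_normal((3, 5))               # acts on mode 1 (3 -> 5)

B = mode_k_product(A, M, 1)
assert B.shape == (2, 5, 4)

# Property 4 above: Unfold_k(A x_k M) = M^T A_(k).
assert np.allclose(unfold(B, 1), M.T @ unfold(A, 1))

# A transposition with b_{i3 i1 i2} = a_{i1 i2 i3}, i.e., A^<[3,1,2]>.
assert np.transpose(A, (2, 0, 1)).shape == (4, 2, 3)

# Inner and Hadamard products of same-sized tensors.
C = rng.standard_normal(A.shape)
assert np.isclose(np.sum(A * C), vec(A) @ vec(C))  # <A,C> = vec(A)^T vec(C)
```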
1.3 Tensor Decompositions
Given a tensor, how can we retrieve the information hidden inside? One reasonable answer is the tensor decomposition approach. Existing tensor decompositions include the Tucker-type decompositions, the CANDECOMP/PARAFAC (CP) decomposition, the tensor train representation, etc. For those readers interested in tensor decompositions, we recommend the survey papers [50, 66]. Some tensor decompositions are generalizations of the singular value decomposition (SVD) [49].
The SVD is one of the most important tools for matrix analysis and computation. Any matrix $A \in \mathbb{R}^{m \times n}$ with rank $r$ has the decomposition
$$A = U \Sigma V^\top,$$
where $U \in \mathbb{R}^{m \times m}$ and $V \in \mathbb{R}^{n \times n}$ are orthogonal matrices and $\Sigma \in \mathbb{R}^{m \times n}$ is diagonal with nonnegative diagonal entries $\sigma_1 \geq \sigma_2 \geq \cdots \geq \sigma_r > 0$.
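A quick numerical check with NumPy (illustrative, using a random $5 \times 3$ matrix):

```python
import numpy as np

A = np.random.default_rng(4).standard_normal((5, 3))
U, s, Vt = np.linalg.svd(A)            # full SVD: U (5x5), s (3,), Vt (3x3)
Sigma = np.zeros((5, 3))
Sigma[:3, :3] = np.diag(s)             # embed the singular values
assert np.allclose(A, U @ Sigma @ Vt)  # A = U Sigma V^T
```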