Alternating minimal energy methods for linear systems in higher dimensions. Part I: SPD systems∗

Sergey V. Dolgov† and Dmitry V. Savostyanov‡

January 25, 2013
Abstract
We introduce a family of numerical algorithms for the solution of linear systems in higher dimensions with the matrix and right-hand side given and the solution sought in the tensor train format. The proposed methods are rank-adaptive and follow the alternating directions framework, but in contrast to ALS methods, in each iteration a tensor subspace is enlarged by a set of vectors chosen similarly to the steepest descent algorithm. The convergence is analysed in the presence of approximation errors and the geometrical convergence rate is estimated and related to the one of the steepest descent. The complexity of the presented algorithms is linear in the mode size and dimension and the convergence demonstrated in the numerical experiments is comparable to the one of the DMRG-type algorithm.

Keywords: high-dimensional problems, tensor train format, ALS, DMRG, steepest descent, convergence rate, superfast algorithms.
1 Introduction
Linear systems arising from high-dimensional problems usually cannot be solved by standard numerical algorithms. If the equation is considered in d dimensions on an n_1 × n_2 × ... × n_d grid, the number of unknowns n_1 ... n_d scales exponentially with d, and even for moderate dimension d and mode sizes n_k the numerical complexity lies far beyond the technical possibilities of modern workstations and parallel systems. To make the problem tractable, different approximations are proposed, including sparse grids [38, 3] and tensor product methods [24, 22, 23, 14]. In this paper we consider the linear system Ax = y, where the matrix A and the right-hand side y are given and an approximate solution x is sought in the tensor train (TT) format. Methods based on the TT format, also known as a linear
∗Partially supported by RFBR grants 12-01-00546-a, 11-01-12137-ofi-m-2011, 11-01-00549-a, 12-01-33013, 12-01-31056, Russian Fed. Gov. contracts No. Π1112, 14.740.11.0345, 16.740.12.0727 at Institute of Numerical Mathematics, Russian Academy of Sciences, and EPSRC grant EP/H003789/1 at the University of Southampton.
†Max-Planck-Institut für Mathematik in den Naturwissenschaften, Inselstr. 22-26, D-04103 Leipzig, Germany (sergey.v.dolgov@gmail.com).
‡University of Southampton, Department of Chemistry, Highfield Campus, Southampton SO17 1BJ, United Kingdom (dmitry.savostyanov@gmail.com)
tensor network, are novel and particularly interesting among all tensor product methods
due to their robustness and simplicity.
The numerical optimization on tensor networks was first considered in the quantum physics community by S. White [42], who introduced the matrix product states (MPS) formalism to represent the ground state of a spin system together with the density matrix renormalization group (DMRG) optimization scheme. The tensor train format and some computational methods were independently re-discovered in the papers of Oseledets and Tyrtyshnikov (see [30] and references therein) until the results of White et al. were popularized in the numerical mathematics community by R. Schneider [18]. The questions concerning the convergence properties of alternating schemes for different tensor product formats were immediately raised and studied. The experimental results from quantum physics show the notably fast convergence of DMRG for the ground state problem, i.e., finding the minimal eigenstate of a system, but give no theoretical justification for this observation. The alternating least squares (ALS) algorithm was used in multilinear analysis for the computation of canonical tensor decompositions since the early results of Hitchcock [17] and was known for its monotone but very slow convergence. For ALS there is also a lack of convergence estimates both in the classical papers [16, 4] and in the recent ones, where ALS was applied to the Tucker model [5, 33], tensor trains [29], the hierarchical Tucker format [25] and high-dimensional interpolation [34].
In recent papers by Uschmajew [41, 35] the local convergence of ALS is proven for the canonical and tensor train decompositions. This is a major theoretical breakthrough, which unfortunately does not immediately lead to practical algorithms due to the local character of the convergence studied, unjustified assumptions on the structure of the Hessian, and very strong requirements on the accuracy of the initial guess. The convergence rate of ALS is difficult to estimate partly due to the complex geometrical structure of manifolds defined by tensor networks. This problem is now approached from several directions, and we might expect new results soon [12, 11].
In contrast to ALS schemes which operate on manifolds of fixed dimension, the DMRG algorithm changes the ranks of a tensor format. This allows one to choose the ranks adaptively to the desired error threshold or the accuracy of the result and to develop more practical algorithms which do not rely on an a priori choice of ranks. The DMRG was adopted for novel tensor formats (see references above) and new problems, including adaptive high-dimensional interpolation [37] and the solution of linear systems [9, 18]. The geometrical analysis, e.g. the convergence of the nonlinear Gauss–Seidel method, is however even more difficult when the dimensions of the underlying manifolds are not fixed.
Apart from working with the tensor format structure directly, like ALS and DMRG do, standard algorithms from numerical linear algebra can be applied with tensor approximations and other tensor arithmetic. Following this paradigm, the solution of linear problems in tensor product formats was addressed in [36, 1, 6]. The usual considerations of linear algebra can be used in this case to analyze the convergence. A first notable example is the method of conjugate-gradient type for the Rayleigh quotient minimization in higher dimensions, for which the global convergence was proven by O. Lebedeva [27].
We develop a framework which combines the ALS optimization steps (ranks are fixed, convergence estimates not yet possible) with the steps when the tensor subspaces are increased and the ranks of a tensor format grow. Choosing the new vectors in accordance with standard linear algebra algorithms, we recast the classical convergence estimates for the proposed algorithm in higher dimensions. In this paper we consider the case of symmetric positive definite (SPD) matrices and analyze the convergence in the A-norm, i.e. minimize the energy function. The basis enrichment choice follows the steepest descent (SD) algorithm and the convergence of the resulting method is analyzed with respect to the one of steepest descent. We show that the basis enrichment step combined with the ALS step can be seen as a certain computationally cheap approximation of the DMRG step. The complexity of the resulting method is equal to the one of ALS and is linear in the mode size and dimension. Our choice of the basis enrichment appears to be very good for practical computations, and for the considered numerical examples the proposed methods converge almost as fast as the DMRG algorithm.
Summarizing the above, the proposed algorithms have (1) proven geometrical convergence with the estimated rate, (2) practical convergence comparable to the one of DMRG, (3) numerical complexity comparable to the one of ALS.
The paper is organized as follows.
In Section 2 we introduce the tensor train notation and necessary definitions.
In Section 3 we introduce the basic notation for ALS and DMRG schemes. We also study how the modification of one TT-block affects the ALS problem for its neighbor and describe this in terms of the Galerkin correction method.
In Section 4 we develop the family of steepest descent methods for the problems in one, two and many dimensions. The proposed methods have an inner-outer structure, i.e., a steepest descent step in d dimensions is followed by a steepest descent step in d−1 dimensions, etc., cf. the interpolation algorithms [32, 13]. The convergence of the recursive algorithms in higher dimensions is analyzed using the Galerkin correction framework. The effect of roundoff/approximation errors is also studied.
Since we make no assumptions on the TT-ranks of the solution, the ranks of the vectors in the proposed algorithms can grow at each iteration and make the algorithm inefficient. In Section 5 we discuss the implementation details, in particular the steps when the tensor approximation is required to reduce the ranks.
In Section 6 the model numerical experiments demonstrate the efficiency of the proposed method and compare it with other algorithms mentioned in the paper.
2 Tensor train notation and definitions
The tensor train (TT) representation of a d-dimensional tensor x = [x(i_1, ..., i_d)] is written as the following multilinear map (cf. [35])

    x = τ(X̄) = τ(X^{(1)}, ..., X^{(d)}),
    x(i_1, ..., i_d) = X^{(1)}_{α_0,α_1}(i_1) X^{(2)}_{α_1,α_2}(i_2) ... X^{(d−1)}_{α_{d−2},α_{d−1}}(i_{d−1}) X^{(d)}_{α_{d−1},α_d}(i_d),      (1)

where i_k = 1, ..., n_k are the mode (physical) indices, α_k = 1, ..., r_k are the rank indices, X^{(k)} are the tensor train cores (TT-cores) and X̄ = (X^{(1)}, ..., X^{(d)}) denotes the whole tensor train. Here and later we use the Einstein summation convention [10], which assumes a summation over every pair of repeated indices. Therefore, in Eq. (1) we assume the summation over all rank indices α_k, k = 1, ..., d−1. We also imply the closed boundary conditions r_0 = r_d = 1 to make the right-hand side a scalar for each (i_1, ..., i_d). Eq. (1) is written in the elementwise form, i.e., the equation is assumed over all free (unpaired) indices. It is often convenient in higher dimensions and will be used throughout the paper.
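To make the chain of small matrix products in (1) concrete, the following minimal NumPy sketch (an illustration added here, not part of the paper's software) evaluates a single entry of a TT tensor; the storage convention cores[k] of shape (r_{k−1}, n_k, r_k) with zero-based indices is an assumption of this example.

```python
import numpy as np

def tt_entry(cores, idx):
    """Evaluate x(i_1,...,i_d) for a tensor train given by eq. (1).

    cores : list of d arrays, cores[k] has shape (r_{k-1}, n_k, r_k),
            with r_0 = r_d = 1 (closed boundary conditions).
    idx   : tuple (i_1, ..., i_d) of zero-based mode indices.
    """
    v = np.ones((1, 1))                  # 1x1 start, since r_0 = 1
    for X, i in zip(cores, idx):
        v = v @ X[:, i, :]               # multiply by the slice X^{(k)}(i_k)
    return v[0, 0]                       # r_d = 1, so the result is a scalar

# Example: a random 4-dimensional TT with mode size 5 and TT-ranks 3
d, n, r = 4, 5, 3
ranks = [1] + [r] * (d - 1) + [1]
cores = [np.random.randn(ranks[k], n, ranks[k + 1]) for k in range(d)]
print(tt_entry(cores, (0, 1, 2, 3)))
```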
The indices can be written either in the subscript x_j or in brackets x(j). For the summation, there is no difference. The subscripted indices are usually considered as row and column indices of a matrix, while the indices in brackets are seen as parameters. For example, each TT-core X^{(k)} is considered as a matrix depending on the parameter i_k with the row index α_{k−1} and the column index α_k as follows

    X^{(k)} = [X^{(k)}_{α_{k−1},α_k}(i_k)] ∈ C^{r_{k−1} × n_k × r_k},      X^{(k)}(i_k) ∈ C^{r_{k−1} × r_k}.

In our notation X^{(k)}(i_k) is a matrix, to which standard algorithms like orthogonalization (QR) and singular value decomposition (SVD) can be applied. We will freely transfer indices from subscripts to brackets in order to make the equations easier to read or to emphasize a certain transposition of elements in tensors. This brings the notation into consistency with previous papers on the numerical tensor methods, e.g. [18, 9, 8, 35] and others.
We will reshape arrays into matrices and vectors by using the index grouping, i.e., combining two or more indices α, ..., ζ in a single multi-index α...ζ. Following [35] we define interface matrices X^{≤k} ∈ C^{n_1...n_k × r_k} and X^{>k} ∈ C^{r_k × n_{k+1}...n_d} as follows

    X^{≤k}(i_1 i_2 ... i_k, α_k) = X^{(1)}_{α_1}(i_1) X^{(2)}_{α_1 α_2}(i_2) ... X^{(k)}_{α_{k−1},α_k}(i_k),
    X^{>k}(α_k, i_{k+1} ... i_{d−1} i_d) = X^{(k+1)}_{α_k,α_{k+1}}(i_{k+1}) ... X^{(d−1)}_{α_{d−2},α_{d−1}}(i_{d−1}) X^{(d)}_{α_{d−1}}(i_d),      (2)

and similarly for the symbols X^{<k} and X^{≥k}. Using the τ notation defined in (1) we can write x = τ(X^{≤k}, X^{>k}). For a tensor x = [x(i_1, ..., i_d)] we also define the unfolding matrix, which consists of the entries of the original tensor as follows

    X^{{k}}(i_1 ... i_k, i_{k+1} ... i_d) = x(i_1, ..., i_d),      X^{{k}} ∈ C^{n_1...n_k × n_{k+1}...n_d}.
For x in the TT-format (1) it holds X^{{k}} = X^{≤k} X^{>k} and therefore rank X^{{k}} ⩽ r_k. In [30] the reverse is proven: for any tensor x there exists the representation (1) with TT-ranks r_k = rank X^{{k}}. This gives the term TT-rank a definite algebraic meaning. As a result, the tensor train representation of fixed TT-ranks yields a closed manifold, and the rank-(r_1, ..., r_{d−1}) approximation problem is well-posed. We can also approximate a given tensor by a tensor train with quasi-optimal ranks using a simple and robust approximation (rank truncation, or tensor rounding) algorithm [30]. This is the case for all tensor networks without cycles, e.g. Tucker [40], HT [15], QTT-Tucker [8], etc. In contrast, the MPS formalism originally assumes the periodic boundary conditions α_0 = α_d and a sum over these indices, which leads to Tr(X^{(1)} ... X^{(d)}), where all matrices can be shifted in a cycle under the trace. The optimization in such a type of tensor networks is difficult, because they form unclosed manifolds and the best approximation does not always exist.
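The existence statement r_k = rank X^{{k}} is constructive. The sketch below, under the simplifying assumption that the full array fits in memory, mimics the sequential-SVD construction underlying the rounding algorithm of [30]; the truncation parameter eps and the function name full_to_tt are hypothetical choices of this illustration, not the paper's exact procedure.

```python
import numpy as np

def full_to_tt(x, eps=1e-10):
    """Decompose a full d-dimensional array into TT-cores by sequential
    truncated SVDs of the unfolding matrices X^{k} (a sketch only)."""
    dims = x.shape
    d = len(dims)
    cores, r_prev = [], 1
    mat = x.reshape(r_prev * dims[0], -1)        # first unfolding
    for k in range(d - 1):
        U, s, Vt = np.linalg.svd(mat, full_matrices=False)
        r = max(1, int(np.sum(s > eps * s[0])))  # drop small singular values
        cores.append(U[:, :r].reshape(r_prev, dims[k], r))
        mat = (s[:r, None] * Vt[:r]).reshape(r * dims[k + 1], -1)
        r_prev = r
    cores.append(mat.reshape(r_prev, dims[-1], 1))
    return cores

x = np.random.randn(4, 3, 5, 2)
cores = full_to_tt(x)
print([c.shape for c in cores])   # the TT-ranks r_k appear as the bond dimensions
```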
The tensor train representation of the matrix is made similarly with the TT-cores depending on two parameters i_k, j_k. Hence, x = τ(X̄) is sought in the form (1) and A and y are given in the TT-format as follows

    A(i_1, ..., i_d; j_1, ..., j_d) = A^{(1)}(i_1, j_1) ... A^{(d)}(i_d, j_d),
    y(i_1, ..., i_d) = Y^{(1)}(i_1) ... Y^{(d)}(i_d).      (3)

For A and x given in the TT-format, the matrix-vector product c = Ax is also a TT-format computed as follows

    c(i_1, ..., i_d) = (A^{(1)}(i_1, j_1) ⊗ X^{(1)}(j_1)) ... (A^{(d)}(i_d, j_d) ⊗ X^{(d)}(j_d)),
where ⊗ denotes the tensor (Kronecker) product of two matrices defined as follows

    A = [A(i, j)],      B = [B(p, q)],      C = A ⊗ B = [C(ip, jq)] = [A(i, j) B(p, q)].

We refer to [30] for more details on basic tensor operations in the TT-format.
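Since each slice of the product core is a Kronecker product of a matrix-core slice and a vector-core slice (summed over j_k), the TT-ranks of c = Ax multiply. A hedged NumPy sketch of this core-wise product; the storage layout of the matrix cores as (R_{k−1}, n_k, m_k, R_k) arrays is an assumption of this example.

```python
import numpy as np

def tt_matvec(a_cores, x_cores):
    """TT-cores of c = A x: each slice C^{(k)}(i_k) equals
    sum_{j_k} A^{(k)}(i_k, j_k) (x) X^{(k)}(j_k), so the ranks multiply.

    a_cores[k] : array (R_{k-1}, n_k, m_k, R_k)  -- matrix TT-core
    x_cores[k] : array (r_{k-1}, m_k, r_k)       -- vector TT-core
    """
    c_cores = []
    for A, X in zip(a_cores, x_cores):
        R0, n, m, R1 = A.shape
        r0, m2, r1 = X.shape
        assert m == m2, "mode sizes of A and x must match"
        # contract over the column index j_k and interleave the rank indices
        C = np.einsum('pijq,ajb->paiqb', A, X).reshape(R0 * r0, n, R1 * r1)
        c_cores.append(C)
    return c_cores
```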
In this paper we will use the standard l_2 scalar product (·, ·) and the A-scalar product (·, ·)_A defined by a symmetric positive definite (SPD) matrix A as follows

    (u, v)_A = u*Av,      ‖u‖²_A = (u, u)_A.

For a given nonsingular matrix U we define the A-orthogonal projector R_U as follows: for all v and all w ∈ span U it holds

    R_U v ∈ span U,      (w, R_U v)_A = (w, v)_A,      R_U = U(U*AU)^{−1} U*A.
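The projector R_U reappears in the Galerkin-type arguments below, so a small dense reference sketch may be useful; it assumes a dense SPD matrix A and a full-rank basis U, purely for illustration.

```python
import numpy as np

def a_projector_apply(U, A, v):
    """Apply the A-orthogonal projector R_U = U (U* A U)^{-1} U* A to v.

    U : (N, m) full-rank basis matrix, A : (N, N) SPD matrix, v : (N,).
    """
    AU = A @ U
    coeffs = np.linalg.solve(U.conj().T @ AU, U.conj().T @ (A @ v))
    return U @ coeffs

# sanity check: (w, R_U v)_A = (w, v)_A for every w in span(U)
N, m = 8, 3
M = np.random.randn(N, N); A = M @ M.T + N * np.eye(N)   # SPD
U = np.random.randn(N, m)
v = np.random.randn(N)
w = U @ np.random.randn(m)
Rv = a_projector_apply(U, A, v)
print(np.allclose(w @ A @ Rv, w @ A @ v))                 # True
```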
We will use vector notations for mode indices i = (i_1, ..., i_d) and rank indices r = (r_0, ..., r_d). We also denote the subspace of tensor trains X̄ = (X^{(1)}, ..., X^{(d)}) with tensor ranks r as

    T_r = ⨉_{i=1}^{d} C^{r_{i−1} × n_i × r_i}.
3 Alternating minimization methods
3.1 ALS–like minimization with fixed TT–ranks
The MPS formalism was proposed in quantum physics, where the representation (1) was used for the minimization of the Rayleigh quotient (x, Ax)/(x, x). Similarly, the solution of a linear system Ax = y with A = A* can be sought through the minimization of an energy function

    J(x) = ‖x − x_*‖²_A = (x, Ax) − 2ℜ(x, y) + const,      (4)

where x_* denotes the exact solution. We consider the Hermitian matrix A = A* and the right-hand side y given in the TT-format (3), and solve the minimization problem with x sought in the TT-format (1) with fixed TT-ranks r, i.e., X̄_* = argmin_{X̄ ∈ T_r} J(τ(X̄)). This heavy nonlinear minimization problem can hardly be solved unless a (very) accurate initial guess is available (see, e.g., [35]). To make it tractable, we can use the alternating linear optimization framework and substitute the global minimization over the tensor train X̄ ∈ T_r by the linear minimization over all cores X^{(1)}, ..., X^{(d)} subsequently in a cycle. Solving the local problem we assume that all cores but the k-th of the current tensor train X̄ = (X^{(1)}, ..., X^{(d)}) are ‘frozen’, and the minimization is done over X^{(k)} as follows

    X̄_new = (X^{(1)}, ..., X^{(k−1)}, X^{(k)}_new, X^{(k+1)}, ..., X^{(d)}),   where
    X^{(k)}_new = argmin_{X^{(k)}} J(τ(X̄)),   s.t.  X^{(k)} ∈ C^{r_{k−1} × n_k × r_k}.      (5)

Clearly, the energy function does not grow during the sequence of ALS updates and the solution will converge to a local minimum.
To write each ALS step as a linear problem, let us stretch all entries of the TT-core X^{(k)} in the vector x_k(α_{k−1} i_k α_k) = X^{(k)}_{α_{k−1},α_k}(i_k). From (1) we see that x = X_{≠k} x_k, where X_{≠k} = P_{≠k}(X̄) is the n_1...n_d × r_{k−1} n_k r_k matrix defined as follows

    X_{≠k}(i_1 ... i_d, α_{k−1} j_k α_k) = X^{(1)}_{α_1}(i_1) ... X^{(k−1)}_{α_{k−2},α_{k−1}}(i_{k−1}) δ(i_k, j_k) X^{(k+1)}_{α_k,α_{k+1}}(i_{k+1}) ... X^{(d)}_{α_{d−1}}(i_d),
    X_{≠k} = P_{≠k}(X̄) = X^{<k} ⊗ I_{n_k} ⊗ (X^{>k})^⊤,      (6)
Figure 1: Tensor network corresponding to the quadratic form (Ax, x) with matrix A and vector x given in the tensor train format. The boxes are tensors with lines (legs) denoting indices. Each bond between two tensors assumes a summation over the join index.
where δ(i, j) is the Kronecker symbol, i.e., δ(i, j) = 1 if i = j and δ(i, j) = 0 elsewhere. If J(τ(X̄)) is considered as a function of x_k, it is also the second-order energy function

    J(τ(X̄)) = (A X_{≠k} x_k, X_{≠k} x_k) − 2(y, X_{≠k} x_k) = (X*_{≠k} A X_{≠k} x_k, x_k) − 2(X*_{≠k} y, x_k),

where the gradient w.r.t. x_k is zero when¹

    (X*_{≠k} A X_{≠k}) x_k = X*_{≠k} y.      (7)

The solution of the local minimization problem (5) is therefore equivalent to the solution of the original system Ax = y in the reduced basis X_{≠k} = P_{≠k}(X̄), defined by (6).
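For small problems the reduced system (7) can be formed explicitly, which is useful for testing. The sketch below assembles the frame matrix X_{≠k} of (6) densely and performs one ALS update; practical TT solvers never build X_{≠k} explicitly, and the dense arguments A_full, y_full and the helper names are assumptions of this illustration.

```python
import numpy as np

def interfaces(cores, k):
    """Dense interface matrices X^{<k} (n_1...n_{k-1} x r_{k-1}) and
    X^{>k} (r_k x n_{k+1}...n_d), built by accumulating core slices."""
    left = np.ones((1, 1))
    for X in cores[:k]:
        r0, n, r1 = X.shape
        left = (left @ X.reshape(r0, n * r1)).reshape(-1, r1)
    right = np.ones((1, 1))
    for X in reversed(cores[k + 1:]):
        r0, n, r1 = X.shape
        right = (X.reshape(r0 * n, r1) @ right).reshape(r0, -1)
    return left, right

def als_local_solve(A_full, y_full, cores, k):
    """One ALS update (7): solve (X_{!=k}* A X_{!=k}) x_k = X_{!=k}* y by
    assembling the frame matrix X_{!=k} of (6) explicitly (testing only)."""
    r0, n, r1 = cores[k].shape
    left, right = interfaces(cores, k)
    frame = np.kron(np.kron(left, np.eye(n)), right.T)   # X_{!=k} from (6)
    B = frame.conj().T @ A_full @ frame                   # reduced matrix
    rhs = frame.conj().T @ y_full                         # reduced right-hand side
    cores[k] = np.linalg.solve(B, rhs).reshape(r0, n, r1)
    return cores
```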
The tensor train representation (1) is non-unique. Indeed, two representations X̄ and Ȳ map to one tensor τ(X̄) = τ(Ȳ) as soon as

    Y^{(k)}(i_k) = H^{−1}_{k−1} X^{(k)}(i_k) H_k,      k = 1, ..., d,

where H_0 = H_d = 1 and H_k ∈ C^{r_k × r_k}, k = 1, ..., d−1, are arbitrary nonsingular matrices. Given a vector in the TT-format x = τ(X̄), any transformation H = (H_0, ..., H_d) does not change the energy level since J(τ(X̄)) = J(τ(Ȳ)) but gives us some flexibility for the choice of the reduced basis since P_{≠k}(X̄) ≠ P_{≠k}(Ȳ). The proper choice of the representation X̄ essentially defines the reduced basis and affects the properties of the local problem (7). A prominent transformation H is the TT-orthogonalization algorithm proposed in [30]. It chooses the matrices H_k applying the QR factorization to the reshaped TT-cores, i.e., matrices of size r_{k−1}n_k × r_k and/or r_{k−1} × n_k r_k. The transformation H given by the TT-orthogonalization implies the left-orthogonality constraints on TT-cores Y^{(1)}, ..., Y^{(k−1)} and right-orthogonality on Y^{(k+1)}, ..., Y^{(d)}, which results in the orthogonality of the interfaces Y^{<k} and Y^{>k} and hence the reduced basis Y_{≠k} = P_{≠k}(Ȳ). Such a normalization step will be assumed in many algorithms throughout the paper; in most cases we will do this without introduction of a new representation Ȳ, just by ‘claiming’ the necessary orthogonalization pattern of the TT representation we use. If the reduced basis method is applied and such a representation
¹For illustration see Fig. 1, for the detailed derivation see [18, 9].
X̄ is chosen so that X_{≠k} = P_{≠k}(X̄) = P is orthogonal, the spectrum of the reduced matrix P*AP lies between the minimum and maximum eigenvalues of the matrix A. Indeed, using the Rayleigh quotient [19], we write

    λ_min(P*AP) = min_{‖v‖=1} (Pv, APv) = min_{u ∈ span P, ‖u‖=1} (u, Au) ⩾ min_{‖u‖=1} (u, Au) = λ_min(A),

and similarly for the maximum values. It follows that the reduced matrix is conditioned not worse than the original, cond(X*_{≠k} A X_{≠k}) ⩽ cond(A). Therefore, the orthogonality of TT-cores ensures the stability of local problems and we will silently assume this for all reduced problems in this paper.
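The enclosure of the reduced spectrum is easy to verify numerically; a small sketch with a random SPD matrix and a random orthonormal basis (purely illustrative):

```python
import numpy as np

# For an orthonormal reduced basis P the spectrum of P* A P is enclosed by
# the spectrum of A, hence cond(P* A P) <= cond(A).
N, m = 20, 6
M = np.random.randn(N, N); A = M @ M.T + np.eye(N)        # SPD matrix
P, _ = np.linalg.qr(np.random.randn(N, m))                 # orthonormal basis
eig_A = np.linalg.eigvalsh(A)
eig_red = np.linalg.eigvalsh(P.T @ A @ P)
print(eig_A[0] <= eig_red[0], eig_red[-1] <= eig_A[-1])    # True True
```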
To conclude this part, let us calculate the complexity of the local problem (7). As was pointed out in [9], either a direct elimination or an iterative linear solver with fast matrix-by-vector products (matvecs) may be applied. If the direct solution method is used, the costs which are required to form the r_{k−1}n_k r_k × r_{k−1}n_k r_k matrix of the local problem (7) are smaller than the complexity of the Gaussian elimination, i.e., the overall cost is O(n³r⁶).² If an iterative method is used to solve the local problem, one multiplication X*_{≠k} A X_{≠k} requires O(n r_A r³ + n² r² r_A²) operations, where r and r_A denote the TT-rank of the current solution x and the matrix A, respectively. Careful implementation of the matvec is essential to reach this complexity, see [9] for details. The complexity of the normalization step is only O(dnr³) operations and can be neglected.
3.2 DMRG–like minimization and adaptivity of TT–ranks
In practical numerical work the TT-ranks of the solution are usually not known in advance, which puts a restriction on the use of the methods with fixed TT-ranks. The underestimation of TT-ranks leads to a low accuracy of the solution, while the overestimation results in a large computational overhead. This motivates the development of methods which can choose and modify the TT-ranks on-the-fly adaptively to the desired accuracy level. A prominent example of such a method is the Density Matrix Renormalization Group (DMRG) algorithm [42], developed in the quantum physics community for the solution of a ground state problem. DMRG performs similarly to the ALS but at each step combines two succeeding blocks X^{(k)} and X^{(k+1)} into one superblock

    w_k(α_{k−1} i_k i_{k+1} α_{k+1}) = W^{(k)}_{α_{k−1},α_{k+1}}(i_k i_{k+1}),      W^{(k)}(i_k i_{k+1}) = X^{(k)}(i_k) X^{(k+1)}(i_{k+1}),      (8)

and makes the minimization over w_k. Classical DMRG minimizes the Rayleigh quotient, our version minimizes the energy function J(x), see [18, 9]. Similarly to (6), (7) we write the local DMRG problem B w_k = g as follows

    P = P_{∉{k,k+1}}(X̄) = X^{<k} ⊗ I_{n_k} ⊗ I_{n_{k+1}} ⊗ (X^{>k+1})^⊤ ∈ C^{n_1...n_d × r_{k−1} n_k n_{k+1} r_{k+1}},
    B = P*AP,      g = P*y ∈ C^{r_{k−1} n_k n_{k+1} r_{k+1}}.      (9)

When w_k is computed, new TT-blocks are obtained by the low-rank decomposition, i.e., the right-hand side of (8) is computed and the k-th rank is updated adaptively to the chosen accuracy. The minimization over O(n²r²) components of w_k leads to complexity O(n³), and seriously increases the computational time for systems with large mode sizes.
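The rank-adaptive splitting of the computed superblock can be sketched as a truncated SVD of the reshaped w_k; the relative threshold eps and the function name are assumptions of this illustration, not the paper's exact truncation criterion.

```python
import numpy as np

def dmrg_split(W, r0, n1, n2, r2, eps=1e-8):
    """Split an updated superblock w_k back into two TT-cores, choosing the
    new rank r_k adaptively from the singular values (cf. eq. (8)).

    W : array with r0*n1*n2*r2 entries holding W^{(k)}(i_k i_{k+1}).
    """
    U, s, Vt = np.linalg.svd(W.reshape(r0 * n1, n2 * r2), full_matrices=False)
    r_new = max(1, int(np.sum(s > eps * s[0])))          # adaptive rank choice
    X_k  = U[:, :r_new].reshape(r0, n1, r_new)
    X_k1 = (s[:r_new, None] * Vt[:r_new]).reshape(r_new, n2, r2)
    return X_k, X_k1
```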
²We will always assume that n_1 = ... = n_d = n and r_1 = ... = r_{d−1} = r in the complexity estimates.
3.3 One-block enrichment as a Galerkin reduction of the two-dimensional system
Suppose that we have just solved (7) and updated the TT-block X^{(k)}. Before we move to the next step, we would like to improve the reduced basis P_{≠k+1}(X̄) by adding a few vectors to it. Denote the current solution vector by t = τ(T̄) and suppose we add a step s = τ(S̄). Then the updated solution x = t + s has the TT-representation x = τ(X̄) defined as follows

    X^{(1)}(i_1) := [ T^{(1)}(i_1)   S^{(1)}(i_1) ],
    X^{(p)}(i_p) := [ T^{(p)}(i_p)   0 ; 0   S^{(p)}(i_p) ],      X^{(d)}(i_d) := [ T^{(d)}(i_d) ; S^{(d)}(i_d) ],      (10)

where p = 2, ..., d − 1. We will denote this tensor train as X̄ = T̄ + S̄.³ The considered update affects the solution process in two ways: first, naturally, it adds a certain correction to the solution, and second, it enlarges the reduced basis that we will use at the next step of the ALS minimization. Indeed, it can easily be seen from definition (2) that

    X^{<k} = [ T^{<k}   S^{<k} ],      (X^{>k})^⊤ = [ (T^{>k})^⊤   (S^{>k})^⊤ ].

From (6) we conclude that P_{≠k}(T̄ + S̄) = [ T^{<k}   S^{<k} ] ⊗ I ⊗ [ (T^{>k})^⊤   (S^{>k})^⊤ ] and hence

    X_{≠k} = P_{≠k}(T̄ + S̄) = [ P_{≠k}(T̄)   S^{<k} ⊗ I ⊗ (T^{>k})^⊤   T^{<k} ⊗ I ⊗ (S^{>k})^⊤   P_{≠k}(S̄) ].      (11)
A clever choice of s = τ(S̄) allows us to add the essential vectors to span P_{≠k}(T̄ + S̄) and therefore improve the convergence of ALS.
A random choice of s ∈ T_r with some small TT-ranks r (cf. the random kick proposed in [37, 29]) may lead to a slow convergence. It also introduces an unwanted perturbation of the solution. A more robust idea is to choose s in accordance with some one-step iterative method, for instance, take s ≈ z = y − At and construct a steepest descent or minimal residual method with approximations. This choice allows us to derive the convergence estimate similarly to the classical one and will be discussed in Sec. 4.
To stay within methods of linear complexity, we restrict ourselves to zero shifts s = 0
with a simple TT-structure S̄ = (0, ..., 0, S^{(k)}, 0, ..., 0). The tensor train X̄ = T̄ + S̄ has the following structure⁴

    X^{(k)}(i_k) = [ T^{(k)}(i_k)   S^{(k)}(i_k) ],      X^{(k+1)}(i_{k+1}) = [ T^{(k+1)}(i_{k+1}) ; 0 ],      (12)

and X^{(p)} = T^{(p)} for other p. Note that since s = 0, the enrichment step does not affect the energy, J(τ(X̄)) = J(τ(T̄)). Therefore, we can choose S^{(k)} freely and develop (probably heuristic) approaches to improve the convergence of our scheme.
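In array form the enrichment (12) amounts to concatenating the enrichment block along the k-th bond and zero-padding its neighbour; a minimal sketch (the core layout (r_{k−1}, n_k, r_k) is assumed as before):

```python
import numpy as np

def enrich_core(T_k, T_k1, S_k):
    """Basis enrichment (12): append S^{(k)} to the k-th core and zero-pad the
    (k+1)-th core, so that t is unchanged but the reduced basis grows.

    T_k  : (r_{k-1}, n_k, r_k)      current k-th core
    T_k1 : (r_k, n_{k+1}, r_{k+1})  current (k+1)-th core
    S_k  : (r_{k-1}, n_k, rho)      enrichment block
    """
    X_k = np.concatenate([T_k, S_k], axis=2)              # [T^{(k)}  S^{(k)}]
    pad = np.zeros((S_k.shape[2], T_k1.shape[1], T_k1.shape[2]))
    X_k1 = np.concatenate([T_k1, pad], axis=0)            # stack zeros below
    return X_k, X_k1
```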
³Due to the non-uniqueness of the TT-format other representations (probably with smaller TT-ranks) can exist for x = t + s.
⁴We give a description for the forward sweep, i.e. the one with increasing k = 1, ..., d. For the backward sweep the construction is done analogously.
Figure 2: Linear system Ax = y in the reduced basis P_{≠k+1}(X̄) shown by tensor networks. The reduced system has r_k n_{k+1} r_{k+1} unknowns, shown by the dark box. Gray boxes show the X^{(k)} which is updated by S^{(k)} to improve the convergence. White boxes contribute to the local matrix B and right-hand side g of the 2D system (9).
The reduced basis X_{≠k+1} = P_{≠k+1}(T̄ + S̄) depends on the choice of S^{(k)} as follows (cf. (6))

    X_{≠k+1}(i_1 ... i_d, α_k j_{k+1} α_{k+1}) = X^{<k}(i_1 ... i_{k−1}, α_{k−1}) X^{(k)}_{α_{k−1},α_k}(i_k) δ(i_{k+1}, j_{k+1}) X^{>k+1}(α_{k+1}, i_{k+2} ... i_d),
    X_{≠k+1} = X^{<k}_{α_{k−1}} ⊗ X^{(k)}_{α_{k−1}} ⊗ I_{n_{k+1}} ⊗ (X^{>k+1})^⊤,
    X^{(k)}_{α_{k−1}} = [ T^{(k)}_{α_{k−1}}   S^{(k)}_{α_{k−1}} ],      (13)

where X^{<k}_{α_{k−1}} is a column of X^{<k} and X^{(k)}_{α_{k−1}} is the n_k × r_k matrix which is the slice of the 3-tensor X^{(k)} = [X^{(k)}(α_{k−1}, i_k, α_k)] corresponding to the fixed α_{k−1}, and similarly for S^{(k)}(α_{k−1}) and T^{(k)}(α_{k−1}). Below we will write the local system (7) at the step k + 1 and see how it is affected by the choice of S^{(k)}.
The two-dimensional system defined by (9) is shown by gray boxes in Fig. 2. It appears here as the local problem in the DMRG method, but in the same framework we may consider the whole initial system with d = 2 and k = 1, depending on what type of analysis we would like to perform.
Now the reduced system for the elements of x_{k+1}(β_k j_{k+1} β_{k+1}) = X^{(k+1)}(β_k, j_{k+1}, β_{k+1}) writes

    X^{(k)}_{α_k,a} B_{ab,a′b′} X^{(k)}_{a′,β_k} X^{(k+1)}_{β_k,b′} = X^{(k)}_{α_k,a} G_{a,b},
    (X^{(k)} ⊗ I)* B (X^{(k)} ⊗ I) x_{k+1} = (X^{(k)} ⊗ I)* g,      (14)

where the following multi-indices are introduced for brevity of notation

    α_{k−1} i_k = a,      β_{k−1} j_k = a′,      a, a′ = 1, ..., r_{k−1} n_k,
    i_{k+1} α_{k+1} = b,      j_{k+1} β_{k+1} = b′,      b, b′ = 1, ..., r_{k+1} n_{k+1},

and X^{(k)} ∈ C^{r_{k−1}n_k × r_k}, I = I_{r_{k+1}n_{k+1}}. The system (14) has r_k n_{k+1} r_{k+1} unknowns. At the same time it is the reduction of a 2D system Bw = g which has r_{k−1} n_k n_{k+1} r_{k+1} unknowns. Therefore, the choice of the enrichment S^{(k)} (as a part of X^{(k)}) can be considered as a cheaper approximation of the 2D system solution. Taking into account the structure of X^{(k)} from (13) we rewrite (14) as follows

    [ T   S ]* B [ T   S ] x_{k+1} = [ T   S ]* g,      T = T^{(k)} ⊗ I,   S = S^{(k)} ⊗ I.      (15)
The system (15) is difficult to analyze. However, we may propose a certain approximation to its solution, and estimate the quality of the solution to the whole system (15) via the properties of the approximation. Namely, let us consider the zero-padded TT-core X^{(k+1)} in (12) as the initial guess, i.e., some information about the solution x_{k+1} that we want to use. For instance, we can apply the block Gauss–Seidel step, restricting the unknown block to the form

    X^{(k+1)}(i_{k+1}) = [ T^{(k+1)}(i_{k+1}) ; V(i_{k+1}) ],      t(α′_k i_{k+1} α_{k+1}) = T^{(k+1)}_{α′_k,α_{k+1}}(i_{k+1}),      v(α″_k i_{k+1} α_{k+1}) = V_{α″_k,α_{k+1}}(i_{k+1}).      (16)

Then (15) writes as the following overdetermined system

    [ T* ; S* ] B (T t + S v) = [ T* ; S* ] g,

and following the Gauss–Seidel step we solve it considering only the lower part

    (S* B S) v = S* (g − B T t).      (17)
Equation (17) is a Galerkin reduction method with the basis S applied to the system Bw = g with the initial guess (8), and TT-cores X^{(k)} and X^{(k+1)} defined by (12). After (17) is solved, the updated superblock W^{(k)}_new writes as follows

    X^{(k)}(i_k) = [ T^{(k)}(i_k)   S^{(k)}(i_k) ],      X^{(k+1)}_new(i_{k+1}) = [ T^{(k+1)}(i_{k+1}) ; V(i_{k+1}) ],
    W^{(k)}_new(i_k i_{k+1}) = T^{(k)}(i_k) T^{(k+1)}(i_{k+1}) + S^{(k)}(i_k) V(i_{k+1})      (18)
                             = W^{(k)}(i_k i_{k+1}) + S^{(k)}(i_k) V(i_{k+1}),

which allows us to consider the proposed method as a solver for the 2D system, which performs the low-rank correction for the superblock rather than recomputing it from scratch.
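A dense sketch of the correction step (17), assuming the local matrix B and right-hand side g of (9) are available explicitly (affordable only for small local problems; the function name and argument layout are assumptions of this illustration):

```python
import numpy as np

def galerkin_correction(B, g, T_k, S_k, t):
    """Galerkin correction (17): solve (S* B S) v = S* (g - B T t) with the
    reduced bases T = T^(k) (x) I and S = S^(k) (x) I (dense illustration).

    B   : (r0*n1*n2*r2, r0*n1*n2*r2) local 2D matrix from (9)
    g   : matching right-hand side
    T_k : (r0*n1, r_k) reshaped current core,  S_k : (r0*n1, rho) enrichment
    t   : (r_k*n2*r2,) current block x_{k+1} from the zero-padded guess
    """
    m = B.shape[0] // T_k.shape[0]                 # m = n_{k+1} r_{k+1}
    T = np.kron(T_k, np.eye(m))
    S = np.kron(S_k, np.eye(m))
    v = np.linalg.solve(S.conj().T @ B @ S, S.conj().T @ (g - B @ (T @ t)))
    return v                                        # stack under t to form (18)
```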
Equations (14) and (17) can be considered as certain approximate approaches to the
solution of the 2D system (9). Different such approaches can be collected into Table 1,
sorted from the highest to the lowest accuracy.
4 Steepest descent schemes
4.1 Steepest descent with perturbation
Given the initial guess t, the steepest descent (SD) step minimizes the energy function (4) over vectors x = t + sα, where the step is chosen as follows

    s = −grad J(t) = y − At = z,
    α = argmin_α J(t + zα) = (z, z) / (z, Az).
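For reference, the classical SD step in ordinary dense linear algebra reads as follows; the tensor schemes discussed in this section replace the exact residual z by a structured approximation.

```python
import numpy as np

def sd_step(A, y, t):
    """One steepest descent step for J(x) = ||x - x_*||_A^2:
    x = t + alpha*z with z = y - A t and alpha = (z, z) / (z, A z)."""
    z = y - A @ t
    alpha = (z @ z) / (z @ (A @ z))
    return t + alpha * z
```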