THE ADAPTIVE S-STEP CONJUGATE GRADIENT METHOD
ERIN CARSON∗
Abstract. On modern large-scale parallel computers, the performance of Krylov subspace iterative methods is limited by global synchronization. This has inspired the development of s-step (or communication-avoiding) Krylov subspace method variants, in which iterations are computed in blocks of $s$. This reformulation can reduce the number of global synchronizations per iteration by a factor of $O(s)$, and has been shown to produce speedups in practical settings.

Although the s-step variants are mathematically equivalent to their classical counterparts, they can behave quite differently in finite precision depending on the parameters. If $s$ is chosen too large, the s-step method can suffer a convergence delay and a decrease in attainable accuracy relative to the classical method. This makes it difficult for a potential user of such methods: the $s$ value that minimizes the time per iteration may not be the best $s$ for minimizing the overall time-to-solution, and further may cause an unacceptable decrease in accuracy.

Toward improving the reliability and usability of s-step Krylov subspace methods, in this work we derive the adaptive s-step CG method, a variable s-step CG method where in block $k$, the parameter $s_k$ is determined automatically such that a user-specified accuracy is attainable. The method for determining $s_k$ is based on a bound on the growth of the residual gap within block $k$, from which we derive a constraint on the condition numbers of the computed $O(s_k)$-dimensional Krylov subspace bases. The computations required for determining the block size $s_k$ can be performed without increasing the number of global synchronizations per block. Our numerical experiments demonstrate that the adaptive s-step CG method is able to attain up to the same accuracy as classical CG while still significantly reducing the total number of global synchronizations.
1. Introduction. In this paper, we consider the use of Krylov subspace methods for solving large, sparse linear systems $Ax = b$, where $A \in \mathbb{R}^{n \times n}$. We will focus on the conjugate gradient (CG) method [25], which is used when $A$ is symmetric positive definite. Given an initial approximate solution $x_1$ and corresponding residual $r_1 = b - Ax_1$, the CG method iteratively updates the approximate solution using a sequence of nested Krylov subspaces $\mathcal{K}_1(A,r_1) \subset \mathcal{K}_2(A,r_1) \subset \cdots$, where
$$\mathcal{K}_i(A,r_1) = \operatorname{span}(r_1, Ar_1, \ldots, A^{i-1}r_1)$$
denotes the $i$-dimensional Krylov subspace with matrix $A$ and starting vector $r_1$. In iteration $i$, the updated approximate solution $x_{i+1} \in x_1 + \mathcal{K}_i(A,r_1)$ is chosen by imposing the Galerkin condition $r_{i+1} = b - Ax_{i+1} \perp \mathcal{K}_i$.

Thus each iteration of CG requires a matrix-vector product with $A$ in order to expand the dimension of the Krylov subspace and a number of inner products to perform the orthogonalization. On modern computer architectures, the speed with which the matrix-vector products and inner products can be computed is limited by communication (i.e., the movement of data). This limits the potential speed of individual iterations attainable by an implementation of CG. To perform a sparse matrix-vector product in parallel, each processor must communicate entries of the source vector and/or the destination vector that it owns to neighboring processors. Inner products require a global synchronization, i.e., the computation cannot proceed until all processors have finished their local computation and communicated the result to other processors. For large-scale sparse problems on large-scale machines, the cost of synchronization between parallel processors can dominate the run-time (see, e.g., the exascale computing report [19, pp. 28]).
∗Courant Institute of Mathematical Sciences, New York University, New York, NY.
(erinc@cims.nyu.edu, http://math.nyu.edu/~erinc/).
Research efforts toward removing the performance bottlenecks caused by communication in CG and other Krylov subspace methods have produced various approaches. One such approach is the class of s-step Krylov subspace methods (also called "communication-avoiding" Krylov subspace methods); for a thorough treatment of background, related work, and performance experiments, see, e.g., the theses [5, 27]. In s-step Krylov subspace methods, instead of performing one iteration at a time, the iterations are performed in blocks of $s$; i.e., in each iteration, the Krylov subspace is expanded by $O(s)$ dimensions by computing $O(s)$ new basis vectors, and then all inner products between the new basis vectors needed for the next $s$ iterations are computed in one block operation. In this way, computing the inner products for $s$ iterations requires only a single global synchronization, decreasing the synchronization cost per iteration by a factor of $O(s)$. This approach has been shown to lead to significant speedups for a number of problems and real-world applications (see, e.g., [35, 43]). In the remainder of the paper, we will refer to the matrices whose columns consist of the $O(s)$ basis vectors computed in each block as s-step basis matrices. Further details of the s-step CG method are discussed in section 2. We emphasize that our use of the overloaded term "s-step methods" here differs from other works, e.g., [14, 30] and [21, §9.2.7], in which 's-step method' refers to a type of restarted Lanczos procedure.
In exact arithmetic the s-step CG method produces the exact same iterates as the classical CG method, but their behavior can differ significantly in finite precision. In both s-step and classical Krylov subspace methods, rounding errors due to finite precision arithmetic have two basic effects: a decrease in attainable accuracy and a delay of convergence. It has long been known that for s-step Krylov subspace methods, as $s$ is increased (and so the condition numbers of the s-step basis matrices increase), the attainable accuracy decreases and the convergence delay increases relative to the classical CG method (see, e.g., [10]). At the extreme, if the parameter $s$ is chosen to be too large, the $O(s)$-dimensional bases computed for each block can be numerically rank deficient and the s-step method can fail to converge.

This sensitive numerical behavior poses a practical obstacle to optimizing the performance of s-step methods, and diminishes their usability and reliability. In a setting where the performance of CG is communication-bound, we expect that up to some point, increasing $s$ will decrease the time per iteration. If we pick $s$ only based on minimizing the time per iteration, however, we can run into problems. First, the finite precision error may cause a large convergence delay, negating any potential performance gain with respect to the overall runtime. Since the number of iterations required for convergence for a given $s$ value is not known a priori, choosing the $s$ value that results in the fastest time-to-solution is a difficult problem. Second, the chosen $s$ parameter may cause s-step CG to fail to converge to the user-specified accuracy. In this case, the particular problem is unsolvable by the s-step CG method.
Requiring the user to choose the parameter $s$ thus diminishes the practical usability of s-step Krylov subspace methods. It is therefore imperative that we develop a better understanding of the convergence rate and accuracy in finite precision s-step CG and other s-step Krylov subspace methods. Our hope is that by studying the theoretical properties of methods designed for large-scale computations in finite precision, we can develop methods and techniques that are efficient, capable of meeting application-dependent accuracy constraints, and which do not require that the user have extensive knowledge of numerical linear algebra.

Toward this goal, in this paper we develop a variable s-step CG method in which the $s$ values can vary between blocks and are chosen automatically by the algorithm such that a user-specified accuracy is attained. We call this the adaptive s-step CG method. The condition for determining $s$ in each block is derived using a bound on the size of the gap between the true residuals $b - Ax_i$ and the updated residuals $r_i$ computed in finite precision. We show that in order to prevent a loss of attainable accuracy, in a given block of iterations, the condition number of the s-step basis matrix should be no larger than a quantity related to the inverse of the 2-norms of the residuals computed within that block. In other words, as the method converges, more ill-conditioned bases (i.e., larger $s$ values) can be used without a loss of attainable accuracy. Furthermore, we show that the adaptive s-step CG method can be implemented in a way that does not increase the synchronization cost per block versus (fixed) s-step CG.
Whereas the fixed s-step CG method may not attain the same accuracy as classical CG, our numerical experiments on a set of example matrices demonstrate that our adaptive s-step CG method can achieve the same accuracy as classical CG, while still reducing the number of synchronizations required, in some cases by more than a factor of 4.
In section 2 we give a brief background on s-step Krylov subspace methods and detail the s-step CG method, and in section 3 we discuss related work. Section 4 presents our main contribution: in section 4.1, we derive a theoretical bound on how large the s-step basis matrix condition number can be, in terms of the norms of the updated residuals, without affecting the attainable accuracy, and in section 4.2, we detail the adaptive s-step CG algorithm that makes use of this bound in determining the number of iterations to perform in each block. Numerical experiments for a small set of test matrices are presented in section 5. We discuss possible extensions and future work and briefly conclude in section 6.
2. Background: s-step Krylov subspace methods. The basic idea behind s-step Krylov subspace methods is to grow the underlying Krylov subspace $O(s)$ dimensions at a time and perform the necessary orthogonalization with a single global synchronization. Within a block of $s$ iterations, the vector updates are performed implicitly by updating the coordinates of the iteration vectors in the $O(s)$-dimensional basis spanned by the columns of the s-step basis matrix. The algorithmic details vary depending on the particular Krylov subspace method considered, but the general approach is the same. There is a wealth of literature related to s-step Krylov subspace methods. We give a brief summary of early related work below; a thorough bibliography can be found in [27, Table 1.1].
The s-step approach was first used in the context of Krylov subspace methods by Van Rosendale [41], who developed a variant of the parallel conjugate gradient method which minimizes inner product data dependencies with the goal of exposing more parallelism. The term "s-step Krylov subspace methods" was first used by Chronopoulos and Gear, who developed an s-step CG method [9, 10]. This work was followed in subsequent years by the development of other s-step variants of Orthomin and GMRES [11], Arnoldi and symmetric Lanczos [31, 32], MINRES, GCR, and Orthomin [8, 13], nonsymmetric Lanczos [33], and Orthodir [12]. Walker similarly used s-step bases, but with the goal of improving stability in GMRES by replacing the modified Gram-Schmidt orthogonalization process with Householder QR [42].
Many of the earliest s-step Krylov subspace methods used monomial bases (i.e., $[v, Av, A^2v, \ldots]$) for the computed s-step basis matrices, but it was found that convergence often could not be guaranteed for $s > 5$ (see, e.g., [10]). This motivated research into the use of other more well-conditioned bases for the Krylov subspaces, including scaled monomial bases [26], Chebyshev bases [29, 16, 17], and Newton bases [1, 20]. The growing cost of communication in large-scale sparse problems has created a recent resurgence of interest in the implementation, optimization, and development of s-step Krylov subspace methods; see, e.g., the recent works [18, 27, 35, 46, 23, 34, 44, 45, 43].
2.1. The s-step conjugate gradient method. In this work, we focus on the s-step conjugate gradient method (Algorithm 2), which is equivalent to the classical CG method (Algorithm 1) in exact arithmetic. In the remainder of this section, we give a brief explanation of the s-step CG method; further details on the derivation, implementation, and performance of the s-step CG method can be found in, e.g., [5, 27].
Algorithm 1 Conjugate gradient (CG)
Input: $n \times n$ symmetric positive definite matrix $A$, length-$n$ vector $b$, and initial approximation $x_1$ to $Ax = b$
Output: Approximate solution $x_{i+1}$ to $Ax = b$ with updated residual $r_{i+1}$
1: $r_1 = b - Ax_1$, $p_1 = r_1$
2: for $i = 1, 2, \ldots$, until convergence do
3:   $\alpha_i = r_i^T r_i / p_i^T A p_i$
4:   $q_i = \alpha_i p_i$
5:   $x_{i+1} = x_i + q_i$
6:   $r_{i+1} = r_i - A q_i$
7:   $\beta_i = r_{i+1}^T r_{i+1} / r_i^T r_i$
8:   $p_{i+1} = r_{i+1} + \beta_i p_i$
9: end for
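As a concrete reference point, a minimal NumPy transcription of Algorithm 1 might look as follows; this is a sketch, and the function name `cg`, the random test problem, and the stopping test on the updated residual norm are our own illustrative choices rather than part of the original presentation.

```python
import numpy as np

def cg(A, b, x1, tol=1e-10, maxiter=1000):
    """Minimal conjugate gradient following Algorithm 1.

    Iterates until the updated residual norm falls below tol * ||b||.
    """
    x = x1.copy()
    r = b - A @ x              # line 1: initial residual
    p = r.copy()
    rTr = r @ r
    for i in range(maxiter):
        Ap = A @ p
        alpha = rTr / (p @ Ap)     # line 3
        x += alpha * p             # lines 4-5
        r -= alpha * Ap            # line 6
        rTr_new = r @ r
        if np.sqrt(rTr_new) <= tol * np.linalg.norm(b):
            break
        beta = rTr_new / rTr       # line 7
        p = r + beta * p           # line 8
        rTr = rTr_new
    return x, r

# Example usage on a random SPD system (illustrative)
rng = np.random.default_rng(0)
n = 100
M = rng.standard_normal((n, n))
A = M @ M.T + n * np.eye(n)        # SPD by construction
b = rng.standard_normal(n)
x, r = cg(A, b, np.zeros(n))
print(np.linalg.norm(b - A @ x))   # true residual norm
```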
The s-step CG method consists of an outer loop, indexed by $k$, which iterates over the blocks of iterations, and an inner loop, which iterates over iterations $j \in \{1,\ldots,s\}$ within a block. For clarity, we globally index iterations by $i \equiv sk+j$. It follows from the properties of CG that for $j \in \{0,\ldots,s\}$,
$$p_{sk+j+1},\, r_{sk+j+1} \in \mathcal{K}_{s+1}(A,p_{sk+1}) + \mathcal{K}_{s}(A,r_{sk+1}) \quad\text{and}\quad x_{sk+j+1} - x_{sk+1} \in \mathcal{K}_{s}(A,p_{sk+1}) + \mathcal{K}_{s-1}(A,r_{sk+1}). \quad (2.1)$$
Then the CG vectors for the next $s$ iterations lie in the sum of the column spaces of the matrices
$$\mathcal{P}_{k,s} = [\rho_0(A)p_{sk+1}, \ldots, \rho_s(A)p_{sk+1}], \quad\text{with } \operatorname{span}(\mathcal{P}_{k,s}) = \mathcal{K}_{s+1}(A,p_{sk+1}), \quad\text{and}$$
$$\mathcal{R}_{k,s} = [\rho_0(A)r_{sk+1}, \ldots, \rho_{s-1}(A)r_{sk+1}], \quad\text{with } \operatorname{span}(\mathcal{R}_{k,s}) = \mathcal{K}_{s}(A,r_{sk+1}), \quad (2.2)$$
where $\rho_j(z)$ is a polynomial of degree $j$ satisfying the three-term recurrence
$$\rho_0(z) = 1, \quad \rho_1(z) = (z - \theta_0)\rho_0(z)/\gamma_0, \quad\text{and}\quad \rho_j(z) = ((z - \theta_{j-1})\rho_{j-1}(z) - \sigma_{j-2}\rho_{j-2}(z))/\gamma_{j-1}. \quad (2.3)$$
We define the s-step basis matrix $\mathcal{Y}_{k,s} = [\mathcal{P}_{k,s}, \mathcal{R}_{k,s}]$ and define $\underline{\mathcal{Y}}_{k,s}$ to be the same as $\mathcal{Y}_{k,s}$ except with all zeros in columns $s+1$ and $2s+1$. Note that the $s$'s in the subscripts are unnecessary here, but will be useful in describing the variable s-step CG method later on. Under certain assumptions on the nonzero structure of $A$, $\mathcal{Y}_{k,s}$ can be computed in parallel in each outer loop for the same asymptotic latency cost as a single matrix-vector product using the so-called "matrix powers kernel" (see [18] for details). Since the columns of $\mathcal{Y}_{k,s}$ satisfy (2.3), we can write
$$A\underline{\mathcal{Y}}_{k,s} = \mathcal{Y}_{k,s}\mathcal{B}_{k,s}, \quad (2.4)$$
where $\mathcal{B}_{k,s}$ is a $(2s+1) \times (2s+1)$ block tridiagonal matrix defined by the 3-term recurrence coefficients in (2.3).
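To illustrate (2.2)-(2.4) concretely in the simplest case, the sketch below (our own illustrative code, not part of the paper) builds $\mathcal{Y}_{k,s}$ with the monomial basis $\rho_j(z) = z^j$ (i.e., $\theta_j = \sigma_j = 0$, $\gamma_j = 1$ in (2.3)) and the corresponding $\mathcal{B}_{k,s}$, then checks that (2.4) holds:

```python
import numpy as np

def monomial_basis(A, p, r, s):
    """Build Y = [P, R] with the monomial basis rho_j(z) = z^j, and the
    (2s+1) x (2s+1) matrix B such that A @ Y_ = Y @ B, where Y_ is Y
    with columns s+1 and 2s+1 (1-indexed) zeroed out; see (2.4)."""
    n = len(p)
    P = np.empty((n, s + 1)); R = np.empty((n, s))
    P[:, 0] = p
    for j in range(1, s + 1):
        P[:, j] = A @ P[:, j - 1]        # A^j p
    R[:, 0] = r
    for j in range(1, s):
        R[:, j] = A @ R[:, j - 1]        # A^j r
    Y = np.hstack([P, R])
    B = np.zeros((2 * s + 1, 2 * s + 1))
    for j in range(s):                   # A * (A^j p) = A^(j+1) p
        B[j + 1, j] = 1.0
    for j in range(s - 1):               # A * (A^j r) = A^(j+1) r
        B[s + 2 + j, s + 1 + j] = 1.0
    return Y, B

# Sanity check of (2.4) on a random SPD matrix
rng = np.random.default_rng(1)
n, s = 50, 4
M = rng.standard_normal((n, n)); A = M @ M.T + n * np.eye(n)
p = rng.standard_normal(n); r = rng.standard_normal(n)
Y, B = monomial_basis(A, p, r, s)
Y_ = Y.copy(); Y_[:, s] = 0; Y_[:, 2 * s] = 0    # 0-indexed columns s, 2s
print(np.linalg.norm(A @ Y_ - Y @ B))            # ~ machine-epsilon scale
```

For the monomial basis, $\mathcal{B}_{k,s}$ has a single unit entry per (nonzero) column, consistent with the remark $\|\mathcal{B}_{k,s}\| = 1$ made in section 4.1.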
By (2.1), there exist vectors $p'_{k,j+1}$, $r'_{k,j+1}$, and $x'_{k,j+1}$ that represent the coordinates of the CG iterates $p_{sk+j+1}$, $r_{sk+j+1}$, and $x_{sk+j+1} - x_{sk+1}$, respectively, in $\mathcal{Y}_{k,s}$ for $j \in \{0,\ldots,s\}$. That is,
$$p_{sk+j+1} = \mathcal{Y}_{k,s}\,p'_{k,j+1}, \quad r_{sk+j+1} = \mathcal{Y}_{k,s}\,r'_{k,j+1}, \quad\text{and}\quad x_{sk+j+1} - x_{sk+1} = \mathcal{Y}_{k,s}\,x'_{k,j+1}. \quad (2.5)$$
Therefore, in the inner loop of s-step CG, rather than update the CG vectors explicitly, we instead update their coordinates in $\mathcal{Y}_{k,s}$, i.e., for $j \in \{1,\ldots,s\}$,
$$x'_{k,j+1} = x'_{k,j} + \alpha_{sk+j}\,p'_{k,j}, \quad r'_{k,j+1} = r'_{k,j} - \alpha_{sk+j}\mathcal{B}_{k,s}\,p'_{k,j}, \quad\text{and}\quad p'_{k,j+1} = r'_{k,j+1} + \beta_{sk+j}\,p'_{k,j}.$$
Thus the matrix-vector product with $A$ in each iteration becomes a matrix-vector product with the much smaller matrix $\mathcal{B}_{k,s}$. This, along with the length-$(2s+1)$ vector updates, can be performed locally by each processor in each inner loop iteration.
We can also reduce communication in computing the inner products. Letting $G_{k,s} = \mathcal{Y}_{k,s}^T\mathcal{Y}_{k,s}$, and using (2.5) and (2.4), $\alpha_{sk+j}$ and $\beta_{sk+j}$ can be computed by
$$\alpha_{sk+j} = \big(r'^T_{k,j} G_{k,s}\, r'_{k,j}\big) / \big(p'^T_{k,j} G_{k,s} \mathcal{B}_{k,s}\, p'_{k,j}\big) \quad\text{and}\quad \beta_{sk+j} = \big(r'^T_{k,j+1} G_{k,s}\, r'_{k,j+1}\big) / \big(r'^T_{k,j} G_{k,s}\, r'_{k,j}\big).$$
The matrix $G_{k,s}$ can be computed with one global synchronization per outer loop iteration, which, in terms of asymptotic parallel latency, costs the same as a single inner product. As $\mathcal{B}_{k,s}$ and $G_{k,s}$ are of dimension $(2s+1) \times (2s+1)$, $\alpha_{sk+j}$ and $\beta_{sk+j}$ can be computed locally by each processor within the inner loop.
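Putting these pieces together, the following self-contained sketch (ours, for illustration; it uses the monomial basis and dense NumPy arrays rather than a distributed implementation, with the line forming `G = Y.T @ Y` standing in for the single per-block global synchronization) runs the outer and inner loops of Algorithm 2:

```python
import numpy as np

def s_step_cg(A, b, x, s, n_blocks):
    """Illustrative s-step CG (Algorithm 2) with the monomial basis."""
    n = len(b)
    r = b - A @ x
    p = r.copy()
    for k in range(n_blocks):
        # Basis Y = [p, Ap, ..., A^s p, r, Ar, ..., A^(s-1) r], as in (2.2)
        Y = np.empty((n, 2 * s + 1))
        Y[:, 0] = p
        for j in range(1, s + 1):
            Y[:, j] = A @ Y[:, j - 1]
        Y[:, s + 1] = r
        for j in range(1, s):
            Y[:, s + 1 + j] = A @ Y[:, s + j]
        # B from the monomial recurrence, so that (2.4) holds
        B = np.zeros((2 * s + 1, 2 * s + 1))
        for j in range(s):
            B[j + 1, j] = 1.0
        for j in range(s - 1):
            B[s + 2 + j, s + 1 + j] = 1.0
        G = Y.T @ Y                      # the one global synchronization
        # Coordinates of p, r, and x - x_block_start in Y (Alg. 2, line 6)
        pc = np.zeros(2 * s + 1); pc[0] = 1.0
        rc = np.zeros(2 * s + 1); rc[s + 1] = 1.0
        xc = np.zeros(2 * s + 1)
        rGr = rc @ G @ rc
        for j in range(s):               # inner loop: no communication
            Bp = B @ pc
            alpha = rGr / (pc @ G @ Bp)
            xc += alpha * pc
            rc -= alpha * Bp
            rGr_new = rc @ G @ rc
            pc = rc + (rGr_new / rGr) * pc
            rGr = rGr_new
        # Recover the iterates via (2.5)
        x = x + Y @ xc
        r = Y @ rc
        p = Y @ pc
    return x, r

# Example usage on a small SPD system (illustrative)
rng = np.random.default_rng(2)
n = 200
Q, _ = np.linalg.qr(rng.standard_normal((n, n)))
A = Q @ np.diag(np.linspace(1, 100, n)) @ Q.T
b = rng.standard_normal(n)
x, r = s_step_cg(A, b, np.zeros(n), s=4, n_blocks=20)
print(np.linalg.norm(b - A @ x) / np.linalg.norm(b))  # relative true residual
```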
3. Related work. In this section, we discuss related work in the areas of variable s-step Krylov subspace methods, the analysis of maximum attainable accuracy in finite precision CG, and inexact Krylov subspace methods.
3.1. Variable s-step Krylov subspace methods. Williams et al. [43] use a variable s-step BICGSTAB as the coarse grid solver in a multigrid method. The technique used here, termed "telescoping", is motivated by the observation that some coarse grid solves converge after only a few iterations, whereas other solves take significantly longer. By starting with a small $s$ value and gradually increasing it, they ensure that s-step BICGSTAB with larger $s$ is only used when the solve takes enough iterations to amortize the additional costs associated with the s-step approach.
Imberti and Erhel [28] have recently developed an s-step GMRES method that allows variable $s$ sizes. They recommend choosing $s_k$ values according to the Fibonacci sequence up to some maximum value $s_{\max}$, i.e., starting with $s_0 = s_1 = 1$ and for $k \geq 2$, $s_k = \min(s_{\max}, s_{k-1} + s_{k-2})$. In this way, the sequence $\{s_i\}$ used by Imberti and Erhel is predetermined.
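For concreteness, this predetermined schedule can be generated as follows (a small sketch; the function name is ours):

```python
def imberti_erhel_s_values(s_max, n_blocks):
    """Predetermined block sizes s_k = min(s_max, s_{k-1} + s_{k-2}),
    with s_0 = s_1 = 1, as recommended by Imberti and Erhel [28]."""
    s = [1, 1]
    while len(s) < n_blocks:
        s.append(min(s_max, s[-1] + s[-2]))
    return s[:n_blocks]

print(imberti_erhel_s_values(s_max=10, n_blocks=8))  # [1, 1, 2, 3, 5, 8, 10, 10]
```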
Algorithm 2 s-step conjugate gradient
Input: $n \times n$ symmetric positive definite matrix $A$, length-$n$ vector $b$, and initial approximation $x_1$ to $Ax = b$
Output: Approximate solution $x_{sk+s+1}$ to $Ax = b$ with updated residual $r_{sk+s+1}$
1: $r_1 = b - Ax_1$, $p_1 = r_1$
2: for $k = 0, 1, \ldots$, until convergence do
3:   Compute $\mathcal{Y}_{k,s} = [\mathcal{P}_{k,s}, \mathcal{R}_{k,s}]$ according to (2.2)
4:   Compute $G_{k,s} = \mathcal{Y}_{k,s}^T\mathcal{Y}_{k,s}$
5:   Assemble $\mathcal{B}_{k,s}$ such that (2.4) holds
6:   $p'_{k,1} = [1, 0_{1,2s}]^T$, $r'_{k,1} = [0_{1,s+1}, 1, 0_{1,s-1}]^T$, $x'_{k,1} = [0_{1,2s+1}]^T$
7:   for $j = 1$ to $s$ do
8:     $\alpha_{sk+j} = \big(r'^T_{k,j} G_{k,s}\, r'_{k,j}\big) / \big(p'^T_{k,j} G_{k,s} \mathcal{B}_{k,s}\, p'_{k,j}\big)$
9:     $q'_{k,j} = \alpha_{sk+j}\, p'_{k,j}$
10:    $x'_{k,j+1} = x'_{k,j} + q'_{k,j}$
11:    $r'_{k,j+1} = r'_{k,j} - \mathcal{B}_{k,s}\, q'_{k,j}$
12:    $\beta_{sk+j} = \big(r'^T_{k,j+1} G_{k,s}\, r'_{k,j+1}\big) / \big(r'^T_{k,j} G_{k,s}\, r'_{k,j}\big)$
13:    $p'_{k,j+1} = r'_{k,j+1} + \beta_{sk+j}\, p'_{k,j}$
14:  end for
15:  Recover iterates $\{p_{sk+s+1}, r_{sk+s+1}, x_{sk+s+1}\}$ according to (2.5)
16: end for
In contrast, in our approach, the sequence $\{s_i\}$ is dynamically chosen based on the 2-norm of the updated residual (which is not necessarily monotonically decreasing in CG). In addition, our method is designed such that a user-specified accuracy $\varepsilon^*$ can be attained when $\varepsilon^* \geq \varepsilon_{CG}$, where $\varepsilon_{CG}$ is the accuracy attained by classical CG for the same problem.
3.2. Maximum attainable accuracy in Krylov subspace methods. In both s-step and classical variants of Krylov subspace methods, finite precision roundoff error in updates to the approximate solution $x_i$ and the residual $r_i$ in each iteration can cause the updated residual $r_i$ and the true residual $b - Ax_i$ to grow further and further apart as the iterations proceed. If this deviation grows large, it can limit the maximum attainable accuracy, i.e., the accuracy with which we can solve $Ax = b$ on a computer with unit round-off $\varepsilon$. Analyses of maximum attainable accuracy in CG and other classical KSMs are given by Greenbaum [22], van der Vorst and Ye [40], Sleijpen, van der Vorst, and Fokkema [37], Sleijpen, van der Vorst, and Modersitzki [38], Björck, Elfving, and Strakoš [2], and Gutknecht and Strakoš [24]. One important result of these analyses is the insight that loss of accuracy can be caused at a very early stage of the computation, which cannot be corrected in later iterations. Analyses of the maximum attainable accuracy in s-step CG and the s-step biconjugate gradient method (BICG) can be found in [6, 5].
3.3. Inexact Krylov subspace methods. The term "inexact Krylov subspace methods" refers to Krylov subspace methods in which the matrix-vector products $Av$ are computed inexactly as $(A+E)v$, where $E$ represents some error matrix (due to either finite precision computation or intentional approximation). It is assumed that all other computations (orthogonalization and vector updates) are performed exactly. Using an analysis of the deviation of the true and updated residuals, it is shown that under these assumptions the size of the perturbation term $E$ can be allowed to grow inversely proportional to the norm of the updated residual without adversely affecting the attainable accuracy. This theory has potential application in mixed-precision Krylov subspace methods, as well as in situations where the operator $A$ is expensive to apply and can be approximated in order to gain performance.
Much work has been dedicated to inexact Krylov subspace methods. In [36], Simoncini and Szyld provide a general theory for inexact variants applicable to both symmetric and nonsymmetric linear systems and eigenvalue problems and use this to derive criteria for bounding the size of $E$ in such a way that does not diminish the attainable accuracy of the method. A similar analysis was given by van den Eshof and Sleijpen [39]. These analyses confirm and improve upon the earlier empirical results for the inexact GMRES method of Bouras and Frayssé [3] and the inexact conjugate gradient method of Bouras, Frayssé, and Giraud [4].

Our work is very closely related to the theory of inexact Krylov subspace methods in that our analysis results in a somewhat analogous "relaxation strategy": instead of relaxing the accuracy of the matrix-vector products, we relax a constraint on the condition numbers of the computed s-step basis matrices. In addition, our analysis assumes that all parts of the computation are performed in a fixed precision $\varepsilon$.
Algorithm 3 Variable s-step conjugate gradient
Input: $n \times n$ symmetric positive definite matrix $A$, length-$n$ vector $b$, initial approximation $x_1$ to $Ax = b$, sequence of $s$ values $(s_0, s_1, \ldots)$
Output: Approximate solution $x_{m+s_k+1}$ to $Ax = b$ with updated residual $r_{m+s_k+1}$
1: $r_1 = b - Ax_1$, $p_1 = r_1$
2: $m = 0$
3: for $k = 0, 1, \ldots$, until convergence do
4:   Compute $\mathcal{Y}_{k,s_k} = [\mathcal{P}_{k,s_k}, \mathcal{R}_{k,s_k}]$ according to (2.2)
5:   Compute $G_{k,s_k} = \mathcal{Y}_{k,s_k}^T\mathcal{Y}_{k,s_k}$
6:   Assemble $\mathcal{B}_{k,s_k}$ such that (2.4) holds
7:   $p'_{k,1} = [1, 0_{1,2s_k}]^T$, $r'_{k,1} = [0_{1,s_k+1}, 1, 0_{1,s_k-1}]^T$, $x'_{k,1} = [0_{1,2s_k+1}]^T$
8:   for $j = 1$ to $s_k$ do
9:     $\alpha_{m+j} = \big(r'^T_{k,j} G_{k,s_k}\, r'_{k,j}\big) / \big(p'^T_{k,j} G_{k,s_k} \mathcal{B}_{k,s_k}\, p'_{k,j}\big)$
10:    $q'_{k,j} = \alpha_{m+j}\, p'_{k,j}$
11:    $x'_{k,j+1} = x'_{k,j} + q'_{k,j}$
12:    $r'_{k,j+1} = r'_{k,j} - \mathcal{B}_{k,s_k}\, q'_{k,j}$
13:    $\beta_{m+j} = \big(r'^T_{k,j+1} G_{k,s_k}\, r'_{k,j+1}\big) / \big(r'^T_{k,j} G_{k,s_k}\, r'_{k,j}\big)$
14:    $p'_{k,j+1} = r'_{k,j+1} + \beta_{m+j}\, p'_{k,j}$
15:  end for
16:  Recover iterates $\{p_{m+s_k+1}, r_{m+s_k+1}, x_{m+s_k+1}\}$ according to (2.5)
17:  $m = m + s_k$
18: end for
4. Variable s-step CG. As discussed, the conditioning of the s-step basis matrices in s-step Krylov subspace methods affects the rate of convergence and attainable accuracy. In the s-step CG method (Algorithm 2), the Krylov subspace basis is computed in blocks of size $s$ in each outer loop. This method can be generalized to allow the blocks to be of varying size; in other words, a different $s$ value can be used in each outer loop iteration. We denote the block size in outer iteration $k$ by $s_k$, where $1 \leq s_k \leq s_{\max}$ for some maximum $s$ value $s_{\max}$. We call this a variable s-step Krylov subspace method. A general variable s-step CG method is shown in Algorithm 3.

In this section, we derive a variable s-step CG method which automatically determines $s_k$ in each outer loop such that a user-specified accuracy can be attained, which we call the adaptive s-step CG method. Our analysis is based on the maximum attainable accuracy analysis for s-step CG in [6], which shows that the attainable accuracy of s-step CG depends on the condition numbers of the s-step basis matrices computed in each outer loop. In summary, we show that if in outer loop $k$, $s_k$ is selected such that the condition number of the basis matrix $\mathcal{Y}_{k,s_k}$ is inversely proportional to the maximum 2-norm of the residual vectors computed within outer loop $k$, then the approximate solution produced by the variable s-step CG method can be as accurate as the approximate solution produced by the classical CG method. Further, our method allows the user to specify the desired accuracy, so in the case that a less accurate solution is required, our method automatically adjusts to allow for higher $s_k$ values. This effectively exploits the tradeoff between accuracy and performance in s-step Krylov subspace methods.

In this section we derive our adaptive s-step approach, which stems from a bound on the growth in outer loop $k$ of the gap between the true and updated residuals in finite precision.
4.1. A constraint on basis matrix condition number. In the remainder of the paper, hats denote quantities computed in finite precision, $\varepsilon$ denotes the unit round-off, $\|\cdot\|$ denotes the 2-norm, and $\kappa(A)$ denotes the 2-norm condition number $\|A^{-1}\|\|A\|$. To simplify the analysis, we drop all $O(\varepsilon^2)$ terms (and higher order terms in $\varepsilon$). Writing the true residual $b - A\hat{x}_i = b - A\hat{x}_i - \hat{r}_i + \hat{r}_i$, we can write the bound
$$\|b - A\hat{x}_i\| \leq \|b - A\hat{x}_i - \hat{r}_i\| + \|\hat{r}_i\|.$$
Then as $\|\hat{r}_i\| \to 0$ (i.e., as the method converges), we have $\|b - A\hat{x}_i\| \approx \|b - A\hat{x}_i - \hat{r}_i\|$. The size of the residual gap, i.e., $\|b - A\hat{x}_i - \hat{r}_i\|$, therefore determines the attainable accuracy of the method (see the related references in section 3.2).
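As a small illustration of this effect (our own experiment, not from the paper), one can run classical CG in single precision and observe that the updated residual $\hat{r}_i$ continues to decrease while the true residual $b - A\hat{x}_i$ stagnates at roughly the size of the gap:

```python
import numpy as np

# Classical CG in float32 on a moderately ill-conditioned SPD system.
# The updated residual r keeps shrinking; the true residual b - A x
# stagnates near the size of the residual gap.
rng = np.random.default_rng(3)
n = 100
Q, _ = np.linalg.qr(rng.standard_normal((n, n)))
A = (Q @ np.diag(np.linspace(1, 1e4, n)) @ Q.T).astype(np.float32)
b = rng.standard_normal(n).astype(np.float32)

x = np.zeros(n, dtype=np.float32)
r = b - A @ x
p = r.copy()
rTr = r @ r
for i in range(500):
    Ap = A @ p
    alpha = rTr / (p @ Ap)
    x += alpha * p
    r -= alpha * Ap                  # updated residual
    rTr_new = r @ r
    p = r + (rTr_new / rTr) * p
    rTr = rTr_new

print("true residual   :", np.linalg.norm(b - A @ x))  # stagnates
print("updated residual:", np.linalg.norm(r))          # keeps decreasing
print("gap             :", np.linalg.norm((b - A @ x) - r))
```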
We begin by considering the rounding errors made in the variable s-step CG method (Algorithm 3). In outer loop $k$ and inner loops $j \in \{1,\ldots,s_k\}$ of finite precision variable s-step CG, using standard models of floating point error (e.g., [21, §2.4]) we have
$$A\underline{\hat{\mathcal{Y}}}_{k,s_k} = \hat{\mathcal{Y}}_{k,s_k}\mathcal{B}_{k,s_k} - E_k, \quad\text{where } \|E_k\| \leq \varepsilon\big((3+N_A)\|A\|\|\underline{\hat{\mathcal{Y}}}_{k,s_k}\| + 4\|\hat{\mathcal{Y}}_{k,s_k}\|\|\mathcal{B}_{k,s_k}\|\big), \quad (4.1)$$
$$\hat{x}'_{k,j+1} = \hat{x}'_{k,j} + \hat{q}'_{k,j} + \xi_{k,j+1}, \quad\text{where } \|\xi_{k,j+1}\| \leq \varepsilon\|\hat{x}'_{k,j+1}\|, \quad (4.2)$$
$$\hat{r}'_{k,j+1} = \hat{r}'_{k,j} - \mathcal{B}_{k,s_k}\hat{q}'_{k,j} + \eta_{k,j+1}, \quad\text{where } \|\eta_{k,j+1}\| \leq \varepsilon\big(\|\hat{r}'_{k,j+1}\| + 3\|\mathcal{B}_{k,s_k}\|\|\hat{q}'_{k,j}\|\big), \quad (4.3)$$
$$\hat{x}_{m+j+1} = \hat{\mathcal{Y}}_{k,s_k}\hat{x}'_{k,j+1} + \hat{x}_{m+1} + \phi_{m+j+1}, \quad\text{where } \|\phi_{m+j+1}\| \leq \varepsilon\big(\|\hat{x}_{m+j+1}\| + 3j\|\hat{\mathcal{Y}}_{k,s_k}\|\|\hat{x}'_{k,j+1}\|\big), \quad\text{and} \quad (4.4)$$
$$\hat{r}_{m+j+1} = \hat{\mathcal{Y}}_{k,s_k}\hat{r}'_{k,j+1} + \psi_{m+j+1}, \quad\text{where } \|\psi_{m+j+1}\| \leq 3j\varepsilon\|\hat{\mathcal{Y}}_{k,s_k}\|\|\hat{r}'_{k,j+1}\|, \quad (4.5)$$
where $N_A$ denotes the maximum number of nonzeros per row in $A$ and $m = \sum_{i=0}^{k-1} s_i$.

Let $\delta_i \equiv b - A\hat{x}_i - \hat{r}_i$ denote the residual gap in iteration $i$. Similar to the analysis in [6], for variable s-step CG we can write the growth of the residual gap in iteration $m+j+1$ in outer loop $k$ as
$$\delta_{m+j+1} - \delta_{m+1} = -\sum_{i=1}^{j}\big(A\hat{\mathcal{Y}}_{k,s_k}\xi_{k,i+1} + \hat{\mathcal{Y}}_{k,s_k}\eta_{k,i+1} - E_k\hat{q}'_{k,i}\big) - A\phi_{m+j+1} - \psi_{m+j+1}.$$
Our goal is now to manipulate a bound on this quantity in order to derive a method for determining the largest $s_k$ value we can use in outer loop $k$ such that the desired accuracy is attained. Using standard techniques, we have the norm-wise bound
$$\begin{aligned}
\|\delta_{m+j+1} - \delta_{m+1}\| &\leq \varepsilon\|A\|\|\hat{\mathcal{Y}}_{k,s_k}\|\sum_{i=1}^{j}\|\hat{x}'_{k,i+1}\| + \varepsilon\|\hat{\mathcal{Y}}_{k,s_k}\|\sum_{i=1}^{j}\|\hat{r}'_{k,i+1}\| \\
&\quad + \varepsilon\big((3+N_A)\|A\|\|\underline{\hat{\mathcal{Y}}}_{k,s_k}\| + 7\|\hat{\mathcal{Y}}_{k,s_k}\|\|\mathcal{B}_{k,s_k}\|\big)\sum_{i=1}^{j}\|\hat{q}'_{k,i}\| \\
&\quad + \varepsilon\|A\|\|\hat{x}_{m+j+1}\| + 3j\varepsilon\|A\|\|\hat{\mathcal{Y}}_{k,s_k}\|\|\hat{x}'_{k,j+1}\| + 3j\varepsilon\|\hat{\mathcal{Y}}_{k,s_k}\|\|\hat{r}'_{k,j+1}\| \\
&\leq \varepsilon\big((7+2N_A)\|A\|\|\hat{\mathcal{Y}}_{k,s_k}\| + 14\|\hat{\mathcal{Y}}_{k,s_k}\|\|\mathcal{B}_{k,s_k}\|\big)\sum_{i=1}^{j}\|\hat{x}'_{k,i+1}\| \\
&\quad + \varepsilon\|\hat{\mathcal{Y}}_{k,s_k}\|\sum_{i=1}^{j}\|\hat{r}'_{k,i+1}\| \\
&\quad + \varepsilon\|A\|\|\hat{x}_{m+j+1}\| + 3j\varepsilon\|A\|\|\hat{\mathcal{Y}}_{k,s_k}\|\|\hat{x}'_{k,j+1}\| + 3j\varepsilon\|\hat{\mathcal{Y}}_{k,s_k}\|\|\hat{r}'_{k,j+1}\|,
\end{aligned}$$
where we have used $\sum_{i=1}^{j}\|\hat{q}'_{k,i}\| \leq (2+\varepsilon)\sum_{i=1}^{j}\|\hat{x}'_{k,i+1}\|$. To simplify the notation, we let $\tau_k = \|\mathcal{B}_{k,s_k}\|/\|A\|$ and $N_k = 7 + 2N_A + 14\tau_k$. We note that $\|\mathcal{B}_{k,s_k}\|$ depends on the polynomial basis used and is not expected to be too large; e.g., for the monomial basis, $\|\mathcal{B}_{k,s_k}\| = 1$. The above bound can then be written
$$\begin{aligned}
\|\delta_{m+j+1} - \delta_{m+1}\| &\leq \varepsilon N_k\|A\|\|\hat{\mathcal{Y}}_{k,s_k}\|\sum_{i=1}^{j}\|\hat{x}'_{k,i+1}\| + \varepsilon\|\hat{\mathcal{Y}}_{k,s_k}\|\sum_{i=1}^{j}\|\hat{r}'_{k,i+1}\| \\
&\quad + \varepsilon\|A\|\|\hat{x}_{m+j+1}\| + 3j\varepsilon\|A\|\|\hat{\mathcal{Y}}_{k,s_k}\|\|\hat{x}'_{k,j+1}\| + 3j\varepsilon\|\hat{\mathcal{Y}}_{k,s_k}\|\|\hat{r}'_{k,j+1}\|. \quad (4.6)
\end{aligned}$$
Note that assuming $\hat{\mathcal{Y}}_{k,s_k}$ is full rank, we have
$$\begin{aligned}
\|\hat{\mathcal{Y}}_{k,s_k}\|\|\hat{r}'_{k,i+1}\| &= \|\hat{\mathcal{Y}}_{k,s_k}\|\|\hat{\mathcal{Y}}_{k,s_k}^{+}\hat{\mathcal{Y}}_{k,s_k}\hat{r}'_{k,i+1}\| \leq \kappa(\hat{\mathcal{Y}}_{k,s_k})\|\hat{\mathcal{Y}}_{k,s_k}\hat{r}'_{k,i+1}\| \\
&= \kappa(\hat{\mathcal{Y}}_{k,s_k})\|\hat{r}_{m+i+1} - \psi_{m+i+1}\| = \kappa(\hat{\mathcal{Y}}_{k,s_k})\|\hat{r}_{m+i+1}\| + O(\varepsilon), \quad\text{and} \quad (4.7)
\end{aligned}$$
$$\|\hat{\mathcal{Y}}_{k,s_k}\|\|\hat{x}'_{k,i+1}\| = \|\hat{\mathcal{Y}}_{k,s_k}\|\|\hat{\mathcal{Y}}_{k,s_k}^{+}\hat{\mathcal{Y}}_{k,s_k}\hat{x}'_{k,i+1}\| \leq \kappa(\hat{\mathcal{Y}}_{k,s_k})\|\hat{\mathcal{Y}}_{k,s_k}\hat{x}'_{k,i+1}\|. \quad (4.8)$$

To bound $\|\hat{\mathcal{Y}}_{k,s_k}\hat{x}'_{k,i+1}\|$ in terms of the size of the residuals, notice that by (4.4),
$$\begin{aligned}
\hat{\mathcal{Y}}_{k,s_k}\hat{x}'_{k,i+1} &= \hat{x}_{m+i+1} - \hat{x}_{m+1} - \phi_{m+i+1} \\
&= (x - \hat{x}_{m+1}) - (x - \hat{x}_{m+i+1}) - \phi_{m+i+1} \\
&= A^{-1}(b - A\hat{x}_{m+1}) - A^{-1}(b - A\hat{x}_{m+i+1}) - \phi_{m+i+1} \\
&= A^{-1}(\hat{r}_{m+1} + \delta_{m+1}) - A^{-1}(\hat{r}_{m+i+1} + \delta_{m+i+1}) - \phi_{m+i+1} \\
&= A^{-1}(\hat{r}_{m+1} - \hat{r}_{m+i+1}) + A^{-1}(\delta_{m+1} - \delta_{m+i+1}) - \phi_{m+i+1},
\end{aligned}$$
so
$$\|\hat{\mathcal{Y}}_{k,s_k}\hat{x}'_{k,i+1}\| \leq \|A^{-1}\|\big(\|\hat{r}_{m+1}\| + \|\hat{r}_{m+i+1}\|\big) + O(\varepsilon).$$
Thus together with (4.8) we can write
$$\|\hat{\mathcal{Y}}_{k,s_k}\|\|\hat{x}'_{k,i+1}\| \leq \kappa(\hat{\mathcal{Y}}_{k,s_k})\|A^{-1}\|\big(\|\hat{r}_{m+1}\| + \|\hat{r}_{m+i+1}\|\big) + O(\varepsilon). \quad (4.9)$$
This allows us to write the bound (4.6) in the form
$$\begin{aligned}
\|\delta_{m+j+1} - \delta_{m+1}\| &\leq 2\varepsilon N_k\kappa(A)\kappa(\hat{\mathcal{Y}}_{k,s_k})\sum_{i=1}^{j}\|\hat{r}_{m+i+1}\| \\
&\quad + \varepsilon\big(N_k j\kappa(A)\kappa(\hat{\mathcal{Y}}_{k,s_k}) + 3j\kappa(A)\kappa(\hat{\mathcal{Y}}_{k,s_k})\big)\|\hat{r}_{m+1}\| \\
&\quad + \varepsilon\big(3j\kappa(A)\kappa(\hat{\mathcal{Y}}_{k,s_k}) + 3j\kappa(\hat{\mathcal{Y}}_{k,s_k})\big)\|\hat{r}_{m+j+1}\| \\
&\quad + \varepsilon\|A\|\|\hat{x}_{m+j+1}\|.
\end{aligned}$$
Using $x - \hat{x}_{m+j+1} = A^{-1}\hat{r}_{m+j+1} + A^{-1}\delta_{m+j+1}$, we have
$$\|\hat{x}_{m+j+1} - x\| \leq \|A^{-1}\|\|\hat{r}_{m+j+1}\| + O(\varepsilon),$$
and writing $\hat{x}_{m+j+1} = \hat{x}_{m+j+1} - x + x$, we have
$$\|\delta_{m+j+1} - \delta_{m+1}\| \leq 2N_k\varepsilon\kappa(A)\kappa(\hat{\mathcal{Y}}_{k,s_k})\Big(j\big(\|\hat{r}_{m+1}\| + \|\hat{r}_{m+j+1}\|\big) + \sum_{i=1}^{j}\|\hat{r}_{m+i+1}\|\Big) + \varepsilon\|A\|\|x\|.$$
Then
$$\|\delta_{m+j+1} - \delta_{m+1}\| \leq 6N_k j\varepsilon\kappa(A)\kappa(\hat{\mathcal{Y}}_{k,s_k})\Big(\max_{\ell\in\{0,\ldots,j\}}\|\hat{r}_{m+\ell+1}\|\Big) + \varepsilon\|A\|\|x\|,$$
and letting $c_{k,j} = 6N_k j\kappa(A)$, this bound can be written
$$\|\delta_{m+j+1} - \delta_{m+1}\| \leq c_{k,j}\,\varepsilon\,\kappa(\hat{\mathcal{Y}}_{k,s_k})\Big(\max_{\ell\in\{0,\ldots,j\}}\|\hat{r}_{m+\ell+1}\|\Big) + \varepsilon\|A\|\|x\|. \quad (4.10)$$
This gives a norm-wise bound on the growth of the residual gap due to finite precision errors made in outer iteration $k$.
Letting $m_\ell = \sum_{i=0}^{\ell-1} s_i$, notice that we have
$$\|\delta_{m_k+s_k+1}\| \leq \|\delta_1\| + \sum_{\ell=0}^{k}\|\delta_{m_\ell+s_\ell+1} - \delta_{m_\ell+1}\|.$$
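Although the selection rule itself is developed in section 4.2, bound (4.10) already suggests how $s_k$ might be chosen: rearranging it, the gap growth in block $k$ stays below a tolerance $\varepsilon^*\|A\|\|x\|$ provided $\kappa(\hat{\mathcal{Y}}_{k,s_k}) \lesssim \varepsilon^*\|A\|\|x\| / \big(c_{k,s_k}\,\varepsilon\,\max_\ell\|\hat{r}_{m+\ell+1}\|\big)$. The sketch below is our own illustrative reading of this constraint, not the paper's algorithm; the helper `basis_for`, the supplied norm estimates, and the use of `np.linalg.cond` (in a parallel code the conditioning would instead be estimated from the already-computed Gram matrix, without an extra synchronization) are all assumptions.

```python
import numpy as np

def largest_admissible_s(basis_for, s_max, res_norm_max, eps_target,
                         normA, normx, kappaA, N_A,
                         eps=np.finfo(float).eps):
    """Pick the largest s in {1, ..., s_max} whose s-step basis satisfies
    the constraint suggested by rearranging bound (4.10):
        c_{k,s} * eps * kappa(Y) * max_l ||r|| <= eps_target * ||A|| * ||x||,
    with c_{k,s} = 6 * N_k * s * kappa(A) and, for the monomial basis
    (||B|| = 1), N_k = 7 + 2*N_A + 14/||A||.
    Illustrative only: basis_for(s) returns a candidate basis matrix, and
    res_norm_max over-approximates the residual 2-norms within the block
    (e.g., by the norm of the residual entering the block)."""
    chosen = 1
    for s in range(1, s_max + 1):
        Y = basis_for(s)
        N_k = 7 + 2 * N_A + 14.0 / normA
        c_ks = 6 * N_k * s * kappaA
        # As ||r|| shrinks, a larger kappa(Y), hence a larger s, is allowed.
        if c_ks * eps * np.linalg.cond(Y) * res_norm_max <= eps_target * normA * normx:
            chosen = s
        else:
            break
    return chosen
```

Note the qualitative behavior this encodes: as the method converges and the residual norms decrease, the admissible condition number grows, so more ill-conditioned (larger) bases become acceptable, which is exactly the relaxation described at the start of section 4.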