The regularization theory of the Krylov iterative solvers LSQR and CGLS for linear discrete ill-posed problems, part I: the simple singular value case

Zhongxiao Jia

Received: date / Accepted: date

Abstract  For the large-scale linear discrete ill-posed problem $\min\|Ax-b\|$ or $Ax=b$ with $b$ contaminated by a white noise, the Lanczos bidiagonalization based LSQR method and its mathematically equivalent Conjugate Gradient (CG) method for $A^TAx=A^Tb$ are most commonly used. They have intrinsic regularizing effects, where the number $k$ of iterations plays the role of the regularization parameter. However, there has been no answer to the long-standing fundamental concern raised by Björck and Eldén in 1979: for which kinds of problems can LSQR and CGLS find best possible regularized solutions? Here a best possible regularized solution means one that is at least as accurate as the best regularized solution obtained by the truncated singular value decomposition (TSVD) method or standard-form Tikhonov regularization. In this paper, assuming that the singular values of $A$ are simple, we analyze the regularization of LSQR for severely, moderately and mildly ill-posed problems. We establish accurate estimates for the 2-norm distance between the underlying $k$-dimensional Krylov subspace and the $k$-dimensional dominant right singular subspace of $A$. For the first two kinds of problems, we then prove that LSQR finds a best possible regularized solution at semi-convergence, which occurs at iteration $k_0$, and that, for $k=1,2,\ldots,k_0$, (i) the $k$-step Lanczos bidiagonalization always generates a near best rank $k$ approximation to $A$; (ii) the $k$ Ritz values always approximate the first $k$ large singular values in natural order; (iii) the $k$-step LSQR always captures the $k$ dominant SVD components of $A$. For the third kind of problem, we prove that LSQR generally cannot find a best possible regularized solution.
We derive estimates for the entries of the bidiagonal matrices generated by Lanczos bidiagonalization, which can be practically exploited to identify whether LSQR finds a best possible regularized solution at semi-convergence. Numerical experiments confirm our theory.

This work was supported in part by the National Science Foundation of China (No. 11371219).

Zhongxiao Jia
Department of Mathematical Sciences, Tsinghua University, 100084 Beijing, China.
E-mail: [email protected]

Keywords  Discrete ill-posed · full or partial regularization · best or near best rank $k$ approximation · TSVD solution · semi-convergence · Lanczos bidiagonalization · LSQR · CGLS

Mathematics Subject Classification (2000)  65F22 · 65R32 · 15A18 · 65J20 · 65R30 · 65F10 · 65F20

1 Introduction and Preliminaries

Consider the linear discrete ill-posed problem
$$\min_{x\in\mathbb{R}^n}\|Ax-b\| \quad\text{or}\quad Ax=b, \qquad A\in\mathbb{R}^{m\times n},\ b\in\mathbb{R}^m, \qquad (1)$$
where the norm $\|\cdot\|$ is the 2-norm of a vector or matrix, and $A$ is extremely ill conditioned with its singular values decaying to zero without a noticeable gap. Problem (1) mainly arises from the discretization of the first kind Fredholm integral equation
$$Kx=(Kx)(s)=\int_{\Omega}k(s,t)x(t)\,dt=g(s)=g, \qquad s\in\Omega\subset\mathbb{R}^q, \qquad (2)$$
where the kernel $k(s,t)\in L^2(\Omega\times\Omega)$ and $g(s)$ are known functions, while $x(t)$ is the unknown function to be sought. If $k(s,t)$ is non-degenerate and $g(s)$ satisfies the Picard condition, there exists a unique square integrable solution $x(t)$; see [23,44,47,69,76]. Here for brevity we assume that $s$ and $t$ belong to the same set $\Omega\subset\mathbb{R}^q$ with $q\geq 1$. Applications include image deblurring, signal processing, geophysics, computerized tomography, heat propagation, biomedical and optical imaging, groundwater modeling, and many others; see, e.g., [1,22,23,47,57,63,64,69,76,77,104]. The theory and numerical treatments of integral equations can be found in [69,70].
The right-hand side $b=\hat{b}+e$ is noisy and assumed to be contaminated by a white noise $e$, caused by measurement, modeling or discretization errors, where $\hat{b}$ is noise-free and $\|e\|<\|\hat{b}\|$. Because of the presence of the noise $e$ and the extreme ill-conditioning of $A$, the naive solution $x_{naive}=A^{\dagger}b$ of (1) bears no relation to the true solution $x_{true}=A^{\dagger}\hat{b}$, where $\dagger$ denotes the Moore-Penrose inverse of a matrix. Therefore, one has to use regularization to extract a best possible approximation to $x_{true}$.

In principle, regularizing an ill-posed problem is to replace it by a well-posed one, such that the error is compensated by the gain in stability. In other words, regularization is to balance the error and the stability as well as possible. For a white noise $e$, throughout the paper we always assume that $\hat{b}$ satisfies the discrete Picard condition $\|A^{\dagger}\hat{b}\|\leq C$ with some constant $C$ for $n$ arbitrarily large [1,27,41,42,44,47,64]. It is an analog of the Picard condition in the finite dimensional case; see, e.g., [41], [44, p.9], [47, p.12] and [64, p.63]. The two dominating regularization approaches are to solve the following two essentially equivalent problems
$$\min_{x\in\mathbb{R}^n}\|Lx\| \quad\text{subject to}\quad \|Ax-b\|=\min \qquad (3)$$
and general-form Tikhonov regularization (cf. [88,97,98])
$$\min_{x\in\mathbb{R}^n}\{\|Ax-b\|^2+\lambda^2\|Lx\|^2\} \qquad (4)$$
with $\lambda>0$ the regularization parameter for regularized solutions [44,47]. A suitable choice of the matrix $L$ is based on a-priori information on $x_{true}$; typically $L$ is either the identity matrix, a diagonal weighting matrix, or a $p\times n$ discrete approximation of a first or second order derivative operator. Particularly, if $L=I$, the identity matrix, (4) is standard-form Tikhonov regularization.
The case $L=I$ is of most common interest and is our concern in this paper. From now on, we always assume $L=I$, for which the solutions to (1), (3) and (4) can be fully analyzed by the singular value decomposition (SVD) of $A$. Let
$$A=U\begin{pmatrix}\Sigma\\ \mathbf{0}\end{pmatrix}V^T \qquad (5)$$
be the SVD of $A$, where $U=(u_1,u_2,\ldots,u_m)\in\mathbb{R}^{m\times m}$ and $V=(v_1,v_2,\ldots,v_n)\in\mathbb{R}^{n\times n}$ are orthogonal, $\Sigma=\mathrm{diag}(\sigma_1,\sigma_2,\ldots,\sigma_n)\in\mathbb{R}^{n\times n}$ with the singular values $\sigma_1>\sigma_2>\cdots>\sigma_n>0$ assumed to be simple throughout the paper, and the superscript $T$ denotes the transpose of a matrix or vector. Then
$$x_{naive}=\sum_{i=1}^{n}\frac{u_i^Tb}{\sigma_i}v_i=\sum_{i=1}^{n}\frac{u_i^T\hat{b}}{\sigma_i}v_i+\sum_{i=1}^{n}\frac{u_i^Te}{\sigma_i}v_i=x_{true}+\sum_{i=1}^{n}\frac{u_i^Te}{\sigma_i}v_i \qquad (6)$$
with $\|x_{true}\|=\|A^{\dagger}\hat{b}\|=\bigl(\sum_{i=1}^{n}|u_i^T\hat{b}|^2/\sigma_i^2\bigr)^{1/2}$.

The discrete Picard condition means that, on average, the Fourier coefficients $|u_i^T\hat{b}|$ decay faster than the $\sigma_i$; it enables regularization to compute useful approximations to $x_{true}$, which results in the following popular model that is used throughout Hansen's books [44,47] and the current paper:
$$|u_i^T\hat{b}|=\sigma_i^{1+\beta},\qquad \beta>0,\quad i=1,2,\ldots,n, \qquad (7)$$
where $\beta$ is a model parameter that controls the decay rate of $|u_i^T\hat{b}|$. Hansen [47, p.68] points out, "while this is a crude model, it reflects the overall behavior often found in real problems." One precise definition of the discrete Picard condition is $|u_i^T\hat{b}|=\tau_i\sigma_i^{1+\zeta_i}$ with certain constants $\tau_i\geq 0$, $\zeta_i>0$, $i=1,2,\ldots,n$. We remark that once the $\tau_i>0$ and the $\zeta_i$ do not differ greatly, such a discrete Picard condition does not affect our claims; rather, it complicates the derivations and the forms of the results.

The white noise $e$ has a number of attractive properties which play a critical role in the regularization analysis: its covariance matrix is $\eta^2I$, the expected values $\mathcal{E}(\|e\|^2)=m\eta^2$ and $\mathcal{E}(|u_i^Te|^2)=\eta^2$, $i=1,2,\ldots,n$, and $\|e\|\approx\sqrt{m}\,\eta$ and $|u_i^Te|\approx\eta$, $i=1,2,\ldots,n$; see, e.g., [44, p.70-1] and [47, p.41-2]. The noise $e$ thus affects $u_i^Tb$, $i=1,2,\ldots,n$, more or less equally. With (7), relation (6) shows that for large singular values $|u_i^T\hat{b}|/\sigma_i$ is dominant relative to $|u_i^Te|/\sigma_i$.
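The amplification described by (6) under the model (7) is easy to reproduce numerically. The following sketch uses a hypothetical diagonal test problem with $U=V=I$, a severely ill-posed spectrum $\sigma_i=\rho^{-i}$, and a white noise of level $\eta$; all numerical values are chosen only for illustration and are not taken from the paper:

```python
import numpy as np

rng = np.random.default_rng(0)
n = 64
rho, beta = 1.5, 0.5                 # hypothetical decay rate and model parameter
sigma = rho ** -np.arange(1, n + 1)  # severely ill-posed spectrum sigma_i = rho^{-i}

# Diagonal test problem with U = V = I obeying the Picard model (7)
b_hat = sigma ** (1 + beta)          # |u_i^T b_hat| = sigma_i^{1+beta}
x_true = b_hat / sigma               # coefficients sigma_i^beta, square-summable

eta = 1e-4                           # white-noise level
b = b_hat + rng.normal(0.0, eta, n)

# Naive solution (6): each noise component is amplified by 1/sigma_i
x_naive = b / sigma
err_naive = np.linalg.norm(x_naive - x_true)
```

With these values `err_naive` exceeds $\|x_{true}\|$ by many orders of magnitude, since the trailing noise components are amplified by factors up to $\rho^{\,n}$, exactly the effect that regularization must suppress.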
Once $|u_i^T\hat{b}|\leq |u_i^Te|$ from some $i$ onwards, the small singular values magnify $|u_i^Te|/\sigma_i$, and the noise $e$ dominates $|u_i^Tb|/\sigma_i$ and must be suppressed. The transition point $k_0$ is such that
$$|u_{k_0}^Tb|\approx|u_{k_0}^T\hat{b}|>|u_{k_0}^Te|\approx\eta,\qquad |u_{k_0+1}^Tb|\approx|u_{k_0+1}^Te|\approx\eta; \qquad (8)$$
see [47, p.42, 98] and a similar description [44, p.70-1]. The $\sigma_k$ are then divided into the $k_0$ large ones and the $n-k_0$ small ones.

The truncated SVD (TSVD) method [41,44,47] deals with (3) by solving
$$\min\|x\| \quad\text{subject to}\quad \|A_kx-b\|=\min,\qquad k=1,2,\ldots,n, \qquad (9)$$
where $A_k=U_k\Sigma_kV_k^T$ is the best rank $k$ approximation to $A$ with respect to the 2-norm with $U_k=(u_1,\ldots,u_k)$, $V_k=(v_1,\ldots,v_k)$ and $\Sigma_k=\mathrm{diag}(\sigma_1,\ldots,\sigma_k)$; it holds that $\|A-A_k\|=\sigma_{k+1}$ (cf. [10, p.12]). The TSVD solution $x_k^{tsvd}=A_k^{\dagger}b$ solves (9). A crucial observation is that $x_k^{tsvd}$ is the minimum-norm least squares solution to $\min_{x\in\mathbb{R}^n}\|A_kx-b\|$, which perturbs $A$ to $A_k$ in (1), and we will frequently exploit this interpretation later.

Based on the above properties of the white noise $e$, it is known from [44, p.70-1] and [47, p.71, 86-8, 95] that the TSVD solutions satisfy
$$x_k^{tsvd}=A_k^{\dagger}b=\begin{cases}\displaystyle\sum_{i=1}^{k}\frac{u_i^Tb}{\sigma_i}v_i\approx\sum_{i=1}^{k}\frac{u_i^T\hat{b}}{\sigma_i}v_i, & k\leq k_0;\\[2mm] \displaystyle\sum_{i=1}^{k}\frac{u_i^Tb}{\sigma_i}v_i\approx\sum_{i=1}^{k_0}\frac{u_i^T\hat{b}}{\sigma_i}v_i+\sum_{i=k_0+1}^{k}\frac{u_i^Te}{\sigma_i}v_i, & k>k_0,\end{cases} \qquad (10)$$
and $x_{k_0}^{tsvd}$ is the best TSVD regularized solution to (1), which balances the regularization and perturbation errors optimally; the residual norms $\|Ax_k^{tsvd}-b\|$ stabilize after $k>k_0$ for $k$ not close to $n$. The index $k$ plays the role of the regularization parameter that determines how many large SVD components of $A$ are used to compute a regularized solution $x_k^{tsvd}$ to (1).

The solution $x_{\lambda}$ of the Tikhonov regularization has a filtered SVD expansion
$$x_{\lambda}=\sum_{i=1}^{n}f_i\frac{u_i^Tb}{\sigma_i}v_i, \qquad (11)$$
where the $f_i=\frac{\sigma_i^2}{\sigma_i^2+\lambda^2}$ are called filters. The TSVD method is a special parameter filtered method, where, in $x_k^{tsvd}$, we take $f_i=1$, $i=1,2,\ldots,k$ and $f_i=0$, $i=k+1,\ldots,n$.
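As an illustration of (10), the following sketch computes the TSVD solutions for all truncation levels on a hypothetical diagonal problem (with $U=V=I$, so truncating the SVD is a plain coordinate truncation) obeying the model (7); the spectrum and noise level are assumptions for illustration only. The error is minimized near the transition index $k_0$ of (8) and explodes as $k\to n$:

```python
import numpy as np

rng = np.random.default_rng(1)
n = 64
sigma = 1.5 ** -np.arange(1, n + 1)   # hypothetical severely ill-posed spectrum
b_hat = sigma ** 1.5                  # Picard model (7) with beta = 0.5
x_true = b_hat / sigma
b = b_hat + rng.normal(0.0, 1e-4, n)  # white noise, eta = 1e-4

# For this diagonal problem (U = V = I) the TSVD solution (10) is a truncation
errs = []
for k in range(1, n + 1):
    x_k = np.zeros(n)
    x_k[:k] = b[:k] / sigma[:k]       # keep only the k dominant SVD components
    errs.append(np.linalg.norm(x_k - x_true))

k_best = int(np.argmin(errs)) + 1     # error minimized near the transition index k_0
```

Truncating too early leaves a large regularization error, truncating too late admits amplified noise; the minimizer `k_best` sits where the two error contributions balance, as the text describes for $x_{k_0}^{tsvd}$.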
The error $x_{\lambda}-x_{true}$ can be written as the sum of the regularization and perturbation errors, and an optimal $\lambda_{opt}$ aims to balance these two errors and minimize the sum of their norms [44,47,69,104]. The best possible regularized solution $x_{\lambda_{opt}}$ retains the $k_0$ dominant SVD components and dampens the other $n-k_0$ small SVD components as much as possible [44,47]. Apparently, the ability to acquire only the largest SVD components of $A$ is fundamental in solving (1).

A number of parameter-choice methods have been developed for finding $\lambda_{opt}$ or $k_0$, such as the discrepancy principle [75], the L-curve criterion, whose use goes back to Miller [74] and Lawson and Hanson [72] and which was termed much later and studied in detail in [43,49], and the generalized cross validation (GCV) [33,105]; see, e.g., [5,44,47,64,66,68,79,89,104] for numerous comparisons. All parameter-choice methods aim to make $f_i/\sigma_i$ not small for $i=1,2,\ldots,k_0$ and $f_i/\sigma_i\approx 0$ for $i=k_0+1,\ldots,n$. Each of these methods has its own merits and disadvantages, and none is absolutely reliable for all ill-posed problems. For example, some of the mentioned parameter-choice methods may fail to find accurate approximations to $\lambda_{opt}$; see [37,103] for an analysis of the L-curve method and [44] for some other parameter-choice methods. A further investigation of parameter-choice methods is not our concern in this paper.

The TSVD method and the standard-form Tikhonov regularization produce very similar solutions with essentially the minimum 2-norm error, i.e., the worst-case error [69, p.13]; see [102], [42], [44, p.109-11] and [47, Sections 4.2 and 4.4].
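One way to see the similarity of the two methods is to compare the Tikhonov filters $f_i=\sigma_i^2/(\sigma_i^2+\lambda^2)$ in (11) with the sharp 0/1 filters of the TSVD method: both keep the leading components and suppress the trailing ones, with Tikhonov rolling off smoothly around $\sigma_i\approx\lambda$. A minimal sketch (the spectrum and the placement of $\lambda$ are hypothetical choices for illustration):

```python
import numpy as np

n = 20
sigma = 1.5 ** -np.arange(1, n + 1)  # hypothetical decaying singular values
lam = sigma[9]                       # hypothetical choice: lambda at the 10th singular value

f_tik = sigma**2 / (sigma**2 + lam**2)              # Tikhonov filters in (11)
f_tsvd = (np.arange(1, n + 1) <= 10).astype(float)  # TSVD filters: 1 for i <= k, else 0

# f_tik ~ 1 where sigma_i >> lambda, ~ (sigma_i/lambda)^2 ~ 0 where sigma_i << lambda,
# and exactly 1/2 at sigma_i = lambda, so the two filter families nearly coincide.
```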
Indeed, for an underlying linear compact equation $Kx=g$, e.g., (2), with noisy $g$ and true solution $x_{true}(t)$, under the source condition that $x_{true}(t)\in\mathcal{R}(K^*)$ or $x_{true}(t)\in\mathcal{R}(K^*K)$, the range of the adjoint $K^*$ of $K$ or that of $K^*K$, which amounts to assuming that $x_{true}(t)$ or its derivative is square integrable, the errors of the best regularized solutions by the TSVD method and the Tikhonov regularization are order optimal, i.e., of the same order as the worst-case error [69, p.13,18,20,32-40], [77, p.90] and [104, p.7-12]. These conclusions carry over to (1) [104, p.8]. Therefore, both $x_{\lambda_{opt}}$ and $x_{k_0}^{tsvd}$ are best possible solutions to (1) under the above assumptions, and either of them can be taken as the reference standard when assessing the regularizing effects of an iterative solver. For the sake of clarity and analysis, we will take $x_{k_0}^{tsvd}$ as the standard reference.

For (1) large, the TSVD method and the Tikhonov regularization method are generally too demanding, and only iterative regularization methods are computationally viable. A major class of such methods has been Krylov iterative solvers, which project (1) onto a sequence of low dimensional Krylov subspaces and compute iterates to approximate $x_{true}$ [1,23,32,36,44,47,69]. Of the Krylov iterative solvers, the CGLS (or CGNR) method, which implicitly applies the CG method [34,51] to $A^TAx=A^Tb$, and its mathematically equivalent LSQR algorithm [85] have been most commonly used.
The Krylov solvers CGME (or CGNE) [10,11,19,36,38] and LSMR [11,25] are also choices; they amount to the CG method applied to $\min\|AA^Ty-b\|$ or $AA^Ty=b$ with $x=A^Ty$ and to MINRES [84] applied to $A^TAx=A^Tb$, respectively. These Krylov solvers have been intensively studied and are known to have general regularizing effects [1,21,32,36,38,44,47,52,53] and to exhibit semi-convergence [77, p.89]; see also [10, p.314], [11, p.733], [44, p.135] and [47, p.110]: The iterates converge to $x_{true}$ and their norms increase steadily, and the residual norms decrease, in an initial stage; afterwards the noise $e$ starts to deteriorate the iterates, so that they start to diverge from $x_{true}$ and instead converge to $x_{naive}$, while their norms increase considerably and the residual norms stabilize. If we stop at the right time, then, in principle, we have a regularization method, where the iteration number plays the role of the regularization parameter. Semi-convergence is due to the fact that the projected problem starts to inherit the ill-conditioning of (1) from some iteration onwards, and a small singular value of the projected problem amplifies the noise considerably.

The regularizing effects of CG type methods were noticed by Lanczos [71] and were rediscovered in [62,92,96]. Based on these works, motivated by a heuristic explanation of the good numerical results obtained with very few CGLS iterations in [62], and realizing that such an excellent performance can only be expected if convergence to the regular part of the solution, i.e., $x_{k_0}^{tsvd}$, takes place before the effects of ill-posedness show up, Björck and Eldén in 1979, on page 13 of [12], foresightedly expressed a fundamental concern on CGLS (and LSQR): More research is needed to tell for which problems this approach will work, and what stopping criterion to choose. See also [44, p.145]. As remarked by Hanke and Hansen [39], the paper [12] was the only extensive survey on algorithmic details until that time, and a strict proof of the regularizing properties of conjugate gradients is extremely difficult. An enormous effort has long been devoted to the study of the regularizing effects of LSQR and CGLS (cf.
[24,30,31,36,38,44,47,52,53,56,78,81,86,91,100]), but hitherto there has been no definitive answer to the above long-standing fundamental question; the same is true for CGME and LSMR.

For $A$ symmetric, MINRES and MR-II applied to $Ax=b$ directly are alternatives and have been shown to have regularizing effects [14,36,40,47,58,67], but MR-II seems preferable since the noisy $b$ is excluded from the underlying subspace [55,58]. For $A$ nonsymmetric or when multiplication with $A^T$ is difficult to compute, GMRES and RRGMRES are candidate methods [3,15,16,80], and the latter may be better [58]. The hybrid approaches based on the Arnoldi process were first proposed in [17] and studied in [14,18,73,82]. Gazzola and her coauthors [26,27,28,29] have described a general framework of the hybrid methods and presented various Krylov-Tikhonov methods with different parameter-choice strategies. The regularizing effects of these methods are highly problem dependent, and it appears that they require that the mixing of the left and right singular vectors of $A$ be weak, that is, $V^TU$ be close to a diagonal matrix; for more details, see, e.g., [58] and [47, p.126].

The behavior of ill-posed problems critically depends on the decay rate of the $\sigma_j$. The following characterization of the degree of ill-posedness of (1) was introduced in [54] and has been widely used [1,23,44,47,76]: If $\sigma_j=\mathcal{O}(j^{-\alpha})$, then (1) is mildly or moderately ill-posed for $\frac{1}{2}<\alpha\leq 1$ or $\alpha>1$, respectively. If $\sigma_j=\mathcal{O}(\rho^{-j})$ with $\rho>1$, $j=1,2,\ldots,n$, then (1) is severely ill-posed. Here for mildly ill-posed problems we add the requirement $\alpha>\frac{1}{2}$, which does not appear in [54] but must be met for $k(s,t)\in L^2(\Omega\times\Omega)$ in (1) [39,44].
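The three decay regimes just defined can be told apart by the successive ratios $\sigma_{j+1}/\sigma_j$: a constant ratio $\rho^{-1}$ for severe ill-posedness, versus a ratio $\bigl(j/(j+1)\bigr)^{\alpha}$ tending to one for polynomial decay. A small sketch with the hypothetical choices $\rho=2$, $\alpha=2$ (moderate) and $\alpha=0.8$ (mild), chosen only for illustration:

```python
import numpy as np

j = np.arange(1, 101)
sev = 2.0 ** -j       # severely ill-posed: sigma_j = rho^{-j} with rho = 2
mod = j ** -2.0       # moderately ill-posed: alpha = 2 > 1
mild = j ** -0.8      # mildly ill-posed: 1/2 < alpha <= 1

r_sev = sev[1:] / sev[:-1]     # constant ratio 1/rho
r_mod = mod[1:] / mod[:-1]     # (j/(j+1))^alpha, increases toward one
r_mild = mild[1:] / mild[:-1]  # approaches one faster (smaller alpha)
```

The ratios `r_mild` exceed `r_mod` for every $j$, matching the remark below that the polynomial decay rate approaches one more quickly for mildly than for moderately ill-posed problems.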
In the one-dimensional case, i.e., $q=1$, (1) is severely ill-posed when $k(s,t)$ is sufficiently smooth, and it is moderately ill-posed with $\sigma_j=\mathcal{O}(j^{-p-1/2})$, where $p$ is the highest order of continuous derivatives of $k(s,t)$; see, e.g., [44, p.8] and [47, p.10-11]. Clearly, the singular values $\sigma_j$ of a severely ill-posed problem decay at the constant rate $\rho^{-1}$, while those of a moderately or mildly ill-posed problem decay at the rate $\bigl(\frac{j}{j+1}\bigr)^{\alpha}$, which approaches one more quickly with $j$ for the mildly ill-posed problem than for the moderately ill-posed problem.

If a regularized solution to (1) is at least as accurate as $x_{k_0}^{tsvd}$, then it is called a best possible regularized solution. Given (1), if the regularized solution of an iterative regularization solver at semi-convergence is such a best possible one, then, in the words of Björck and Eldén, the solver works for the problem and is said to have the full regularization. Otherwise, the solver is said to have only the partial regularization.

Because it has long been unknown whether or not LSQR, CGLS, LSMR and CGME have the full regularization for a given (1), one commonly combines them with some explicit regularization, hoping that the resulting hybrid variants find best possible regularized solutions [1,44,47]. A hybrid CGLS runs CGLS for several trial regularization parameters $\lambda$ and picks the best one among the candidates [1]. Its disadvantages are that regularized solutions cannot be updated for different $\lambda$ and that there is no guarantee that the selected regularized solution is a best possible one. The hybrid LSQR variants have been advocated by Björck and Eldén [12] and O'Leary and Simmons [83], and improved and developed by Björck [9] and Björck, Grimme and van Dooren [13]. A hybrid LSQR first projects (1) onto Krylov subspaces and then regularizes the projected problems explicitly.
It aims to remove the effects of small Ritz values and expands the Krylov subspaces until they capture the $k_0$ dominant SVD components of $A$ [9,13,39,83]. The explicit regularization for the projected problems should be introduced and take effect only after semi-convergence rather than from the very first iteration. If it works, the error norms of the regularized solutions and the residual norms further decrease until they ultimately stabilize. The hybrid LSQR and CGME have been intensively studied in, e.g., [6,7,8,20,38,39,73,80,90] and [1,47,50]. Within the framework of such hybrid solvers, however, it is hard to find a near-optimal regularization parameter [13,90].

In contrast, if an iterative solver is theoretically proved and practically identified to have the full regularization, one simply stops it after semi-convergence, and no complicated hybrid variant and further iterations are needed. Obviously, we cannot emphasize too much the importance of proving the full or partial regularization of LSQR, CGLS, LSMR and CGME. By the definition of the full or partial regularization, we now restate the concern of Björck and Eldén as: Do LSQR, CGLS, LSMR and CGME have the full or partial regularization for severely, moderately and mildly ill-posed problems? How can their full or partial regularization be identified in practice?

In this paper, we focus on LSQR and analyze its regularization for severely, moderately and mildly ill-posed problems. Due to the mathematical equivalence of CGLS and LSQR, the assertions on the full or partial regularization of LSQR apply to CGLS as well. We prove that LSQR has the full regularization for severely and moderately ill-posed problems once $\rho>1$ and $\alpha>1$ suitably, and that it generally has only the partial regularization for mildly ill-posed problems.
In Section 2, we describe the Lanczos bidiagonalization process and LSQR, and make an introductory analysis. In Section 3, we establish $\sin\Theta$ theorems for the 2-norm distance between the underlying $k$-dimensional Krylov subspace and the $k$-dimensional dominant right singular subspace of $A$. We then derive some follow-up results that play a central role in analyzing the regularization of LSQR. In Section 4, for the first two kinds of problems, we prove that the $k$-step Lanczos bidiagonalization always generates a near best rank $k$ approximation to $A$, that the $k$ Ritz values always approximate the first $k$ large singular values in natural order, and that no small Ritz value appears for $k=1,2,\ldots,k_0$. This will show that LSQR has the full regularization. For mildly ill-posed problems, we prove that, for some $k\leq k_0$, the $k$ Ritz values generally do not approximate the first $k$ large singular values in natural order and LSQR generally has only the partial regularization. In Section 5, we derive bounds for the entries of the bidiagonal matrices generated by Lanczos bidiagonalization, showing how fast they decay and how to use them to identify whether LSQR has the full regularization when the degree of ill-posedness of (1) is unknown in advance. In Section 6, we report numerical experiments that confirm our theory on LSQR. Finally, we summarize the paper with further remarks in Section 7.

Throughout the paper, we denote by $\mathcal{K}_k(C,w)=\mathrm{span}\{w,Cw,\ldots,C^{k-1}w\}$ the $k$-dimensional Krylov subspace generated by the matrix $C$ and the vector $w$, and by $I$ and the bold letter $\mathbf{0}$ the identity matrix and the zero matrix with orders clear from the context, respectively. For the matrix $B=(b_{ij})$, we define $|B|=(|b_{ij}|)$, and for $C=(c_{ij})$, $|B|\leq|C|$ means $|b_{ij}|\leq|c_{ij}|$ componentwise.

2 The LSQR algorithm

The LSQR algorithm is based on the Lanczos bidiagonalization process, Algorithm 1, which computes two orthonormal bases $\{q_1,q_2,\ldots,q_k\}$ and $\{p_1,p_2,\ldots,p_{k+1}\}$ of $\mathcal{K}_k(A^TA,A^Tb)$ and $\mathcal{K}_{k+1}(AA^T,b)$ for $k=1,2,\ldots,n$, respectively.

Algorithm 1  $k$-step Lanczos bidiagonalization process
1. Take $p_1=b/\|b\|\in\mathbb{R}^m$, and define $\beta_1q_0=0$.
2.
For $j=1,2,\ldots,k$:
   (i) $r=A^Tp_j-\beta_jq_{j-1}$
   (ii) $\alpha_j=\|r\|$; $q_j=r/\alpha_j$
   (iii) $z=Aq_j-\alpha_jp_j$
   (iv) $\beta_{j+1}=\|z\|$; $p_{j+1}=z/\beta_{j+1}$.

Algorithm 1 can be written in the matrix form
$$AQ_k=P_{k+1}B_k, \qquad (12)$$
$$A^TP_{k+1}=Q_kB_k^T+\alpha_{k+1}q_{k+1}e_{k+1}^T, \qquad (13)$$
where $e_{k+1}$ is the $(k+1)$-th canonical basis vector of $\mathbb{R}^{k+1}$, $P_{k+1}=(p_1,p_2,\ldots,p_{k+1})$, $Q_k=(q_1,q_2,\ldots,q_k)$ and
$$B_k=\begin{pmatrix}\alpha_1 & & & \\ \beta_2 & \alpha_2 & & \\ & \beta_3 & \ddots & \\ & & \ddots & \alpha_k \\ & & & \beta_{k+1}\end{pmatrix}\in\mathbb{R}^{(k+1)\times k}. \qquad (14)$$
It is known from (12) that
$$B_k=P_{k+1}^TAQ_k. \qquad (15)$$
We remind the reader that the singular values of $B_k$, called the Ritz values of $A$ with respect to the left and right subspaces $\mathrm{span}\{P_{k+1}\}$ and $\mathrm{span}\{Q_k\}$, are all simple.

At iteration $k$, LSQR solves the problem $\|Ax^{(k)}-b\|=\min_{x\in\mathcal{K}_k(A^TA,A^Tb)}\|Ax-b\|$ and computes the iterate $x^{(k)}=Q_ky^{(k)}$ with
$$y^{(k)}=\arg\min_{y\in\mathbb{R}^k}\bigl\|B_ky-\|b\|e_1^{(k+1)}\bigr\|=\|b\|B_k^{\dagger}e_1^{(k+1)}, \qquad (16)$$
where $e_1^{(k+1)}$ is the first canonical basis vector of $\mathbb{R}^{k+1}$, and the residual norm $\|Ax^{(k)}-b\|$ decreases monotonically with respect to $k$. We have $\|Ax^{(k)}-b\|=\bigl\|B_ky^{(k)}-\|b\|e_1^{(k+1)}\bigr\|$ and $\|x^{(k)}\|=\|y^{(k)}\|$, both of which can be cheaply computed.

Note that $\|b\|e_1^{(k+1)}=P_{k+1}^Tb$. We have
$$x^{(k)}=Q_kB_k^{\dagger}P_{k+1}^Tb, \qquad (17)$$
that is, the iterate $x^{(k)}$ by LSQR is the minimum-norm least squares solution to the perturbed problem that replaces $A$ in (1) by its rank $k$ approximation $P_{k+1}B_kQ_k^T$. Recall that the best rank $k$ approximation $A_k$ to $A$ satisfies $\|A-A_k\|=\sigma_{k+1}$. Furthermore, analogous to (9), LSQR now solves
$$\min\|x\| \quad\text{subject to}\quad \|P_{k+1}B_kQ_k^Tx-b\|=\min,\qquad k=1,2,\ldots,n \qquad (18)$$
for the regularized solutions $x^{(k)}$ to (1).
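A direct transcription of Algorithm 1 (without the reorthogonalization that a robust implementation would add) can be used to check the relations (12) and (15) numerically. The sketch below uses a small random, well-conditioned matrix, chosen only for illustration:

```python
import numpy as np

def lanczos_bidiag(A, b, k):
    """Algorithm 1: k-step Lanczos bidiagonalization, returning P_{k+1}, Q_k, B_k."""
    m, n = A.shape
    P = np.zeros((m, k + 1))
    Q = np.zeros((n, k))
    B = np.zeros((k + 1, k))         # lower bidiagonal matrix of (14)
    P[:, 0] = b / np.linalg.norm(b)  # p_1 = b / ||b||
    q_prev, beta = np.zeros(n), 0.0  # beta_1 q_0 = 0
    for j in range(k):
        r = A.T @ P[:, j] - beta * q_prev
        alpha = np.linalg.norm(r)
        Q[:, j] = r / alpha          # q_j, with alpha_j on the diagonal
        B[j, j] = alpha
        z = A @ Q[:, j] - alpha * P[:, j]
        beta = np.linalg.norm(z)
        P[:, j + 1] = z / beta       # p_{j+1}, with beta_{j+1} on the subdiagonal
        B[j + 1, j] = beta
        q_prev = Q[:, j]
    return P, Q, B

# Check (12) and (15) after 5 steps on a 12 x 8 random matrix
rng = np.random.default_rng(2)
A = rng.normal(size=(12, 8))
b = rng.normal(size=12)
P, Q, B = lanczos_bidiag(A, b, 5)
```

Up to rounding, `A @ Q` equals `P @ B` as in (12), `B` equals `P.T @ A @ Q` as in (15), and the columns of `P` and `Q` are orthonormal.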
If $P_{k+1}B_kQ_k^T$ is a near best rank $k$ approximation to $A$ with an approximation accuracy of order $\sigma_{k+1}$ and the $k$ singular values of $B_k$ approximate the first $k$ large ones of $A$ in natural order for $k=1,2,\ldots,k_0$, these two facts relate LSQR and the TSVD method naturally and closely in two ways: (i) $x_k^{tsvd}$ and $x^{(k)}$ are the regularized solutions to the two perturbed problems of (1) that replace $A$ by its two rank $k$ approximations of the same quality, respectively; (ii) $x_k^{tsvd}$ and $x^{(k)}$ solve almost the same two regularization problems (9) and (18), respectively. As a consequence, the LSQR iterate $x^{(k_0)}$ is as accurate as $x_{k_0}^{tsvd}$, and LSQR has the full regularization. Otherwise, as will be clear later, under the discrete Picard condition (7), $x^{(k_0)}$ cannot be as accurate as $x_{k_0}^{tsvd}$ if either $P_{k+1}B_kQ_k^T$ is not a near best rank $k$ approximation to $A$, $k=1,2,\ldots,k_0$, or $B_k$ has at least one singular value smaller than $\sigma_{k_0+1}$ for some $k\leq k_0$. Precisely, if either of them is violated for some $k\leq k_0$ and $\theta_k^{(k)}<\sigma_{k_0+1}$, then $x^{(k)}$ has been deteriorated by the noise $e$, and LSQR has only the partial regularization. We will give a precise definition of a near best rank $k$ approximation to $A$ soon.

3 $\sin\Theta$ theorems for the distances between $\mathcal{K}_k(A^TA,A^Tb)$ and $\mathrm{span}\{V_k\}$ as well as others related

van der Sluis and van der Vorst [99] prove the following result, which has been used in Hansen [44] and the references therein to illustrate the regularizing effects of LSQR and CGLS. We will also investigate it further in our paper.

Proposition 1  LSQR with the starting vector $p_1=b/\|b\|$ and CGLS applied to $A^TAx=A^Tb$ with the starting vector $x^{(0)}=0$ generate the same iterates
$$x^{(k)}=\sum_{i=1}^{n}f_i^{(k)}\frac{u_i^Tb}{\sigma_i}v_i,\qquad k=1,2,\ldots,n, \qquad (19)$$
where
$$f_i^{(k)}=1-\prod_{j=1}^{k}\frac{(\theta_j^{(k)})^2-\sigma_i^2}{(\theta_j^{(k)})^2},\qquad i=1,2,\ldots,n, \qquad (20)$$
and the $\theta_j^{(k)}$ are the singular values of $B_k$ labeled as $\theta_1^{(k)}>\theta_2^{(k)}>\cdots>\theta_k^{(k)}$.

(19) shows that $x^{(k)}$ has a filtered SVD expansion of the form (11).
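Proposition 1 can be checked numerically: the sketch below runs a plain transcription of Algorithm 1 on a small hypothetical problem with a known SVD (all sizes and the spectrum are illustrative assumptions), forms the filters (20) from the singular values of $B_k$, and verifies that the filtered expansion (19) reproduces the LSQR iterate (17). No reorthogonalization is used, so only a modest $k$ is taken:

```python
import numpy as np

def lanczos_bidiag(A, b, k):
    # Plain transcription of Algorithm 1 (no reorthogonalization)
    m, n = A.shape
    P = np.zeros((m, k + 1)); Q = np.zeros((n, k)); B = np.zeros((k + 1, k))
    P[:, 0] = b / np.linalg.norm(b)
    q_prev, beta = np.zeros(n), 0.0
    for j in range(k):
        r = A.T @ P[:, j] - beta * q_prev
        alpha = np.linalg.norm(r)
        Q[:, j], B[j, j] = r / alpha, alpha
        z = A @ Q[:, j] - alpha * P[:, j]
        beta = np.linalg.norm(z)
        P[:, j + 1], B[j + 1, j] = z / beta, beta
        q_prev = Q[:, j]
    return P, Q, B

# Hypothetical 10 x 10 problem with prescribed simple singular values
rng = np.random.default_rng(3)
n, k = 10, 4
U, _ = np.linalg.qr(rng.normal(size=(n, n)))
V, _ = np.linalg.qr(rng.normal(size=(n, n)))
sigma = 1.5 ** -np.arange(1, n + 1)
A = U @ np.diag(sigma) @ V.T
b = A @ rng.normal(size=n)

P, Q, B = lanczos_bidiag(A, b, k)
x_lsqr = Q @ (np.linalg.pinv(B) @ (P.T @ b))    # LSQR iterate, eq. (17)

theta = np.linalg.svd(B, compute_uv=False)      # Ritz values theta_j^{(k)}
f = 1.0 - np.prod(1.0 - (sigma[:, None] / theta[None, :]) ** 2, axis=1)  # eq. (20)
x_filtered = V @ (f * (U.T @ b) / sigma)        # filtered expansion, eq. (19)
```

Up to rounding, `x_lsqr` and `x_filtered` coincide, as Proposition 1 asserts.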
If all the Ritz values $\theta_j^{(k)}$ approximate the first $k$ singular values $\sigma_j$ of $A$ in natural order, the filters satisfy $f_i^{(k)}\approx 1$, $i=1,2,\ldots,k$, and the other $f_i^{(k)}$ monotonically approach zero for $i=k+1,\ldots,n$. This indicates that if the $\theta_j^{(k)}$ approximate the first $k$ singular values $\sigma_j$ of $A$ in natural order for $k=1,2,\ldots,k_0$, then the $k_0$-step LSQR has the full regularization.

However, if a small Ritz value appears before some $k\leq k_0$, i.e., $\theta_{k-1}^{(k)}>\sigma_{k_0+1}$ and $\sigma_{j^*}<\theta_k^{(k)}\leq\sigma_{k_0+1}$ with the smallest integer $j^*>k_0+1$, then $f_i^{(k)}\in(0,1)$ tends to zero monotonically for $i=j^*,j^*+1,\ldots,n$; on the other hand, we have
$$\prod_{j=1}^{k}\frac{(\theta_j^{(k)})^2-\sigma_i^2}{(\theta_j^{(k)})^2}=\frac{(\theta_k^{(k)})^2-\sigma_i^2}{(\theta_k^{(k)})^2}\prod_{j=1}^{k-1}\frac{(\theta_j^{(k)})^2-\sigma_i^2}{(\theta_j^{(k)})^2}\leq 0,\qquad i=k_0+1,\ldots,j^*-1,$$
since the first factor is non-positive and the second factor is positive. Then we get $f_i^{(k)}\geq 1$, $i=k_0+1,\ldots,j^*-1$, indicating that $x^{(k)}$ is already deteriorated and LSQR has only the partial regularization.

The standard $k$-step Lanczos bidiagonalization method computes the $k$ Ritz values $\theta_j^{(k)}$, which are used to approximate some singular values of $A$. It is mathematically equivalent to the symmetric Lanczos method for the eigenvalue problem of $A^TA$ starting with $q_1=A^Tb/\|A^Tb\|$; see [4,10,11,87,101], or [2,60,61] for several variations that are based on standard, harmonic, and refined projection [4,94,101] or a combination of them [59]. It is known that, for a general singular value distribution and $b$, some Ritz values become good approximations to the extreme singular values of $A$ as $k$ increases. If large singular values are well separated but small singular values are clustered, large Ritz values converge fast but small Ritz values converge slowly.

For (1), $A^Tb$ contains more information on the dominant right singular vectors than on the ones corresponding to small singular values. Therefore, $\mathcal{K}_k(A^TA,A^Tb)$ hopefully contains richer information on the first $k$ right singular vectors $v_i$ than on the other $n-k$ ones, at least for $k$ small. Furthermore, note that $A$ has many small singular values clustered at zero.
Due to these two basic facts, all the Ritz values are expected to approximate the large singular values of $A$ in natural order until some iteration $k$, at which a small Ritz value shows up. In this case, the iterates $x^{(k)}$ by LSQR capture only the largest $k$ dominant SVD components of $A$, and they are deteriorated by the noise $e$ dramatically after that iteration. This is why LSQR and CGLS have general regularizing effects; see, e.g., [1,44,46,47,50] and the references therein. Unfortunately, these arguments cannot help us draw any definitive conclusion on the full or partial regularization of LSQR, because there has been no quantitative result on the size of such $k$ for any kind of ill-posed problem and noise $e$. For a severely ill-posed example from seismic tomography, it is reported in [100] that the desired convergence of the Ritz values actually holds as long as the discrete Picard condition is satisfied and there is a good separation among the large singular values of $A$. Yet there has been no mathematical justification of these observations.

A complete understanding of the regularization of LSQR includes accurate solutions of the following problems: How accurately does $\mathcal{K}_k(A^TA,A^Tb)$ approximate the $k$-dimensional dominant right singular subspace of $A$? How accurate is the rank $k$ approximation $P_{k+1}B_kQ_k^T$ to $A$? Can it be a near best rank $k$ approximation to $A$? How does the noise level $\|e\|$ affect the approximation accuracy of $\mathcal{K}_k(A^TA,A^Tb)$ and $P_{k+1}B_kQ_k^T$ for $k\leq k_0$ and $k>k_0$, respectively? What sufficient conditions on $\rho$ and $\alpha$ are needed to guarantee that $P_{k+1}B_kQ_k^T$ is a near best rank $k$ approximation to $A$?