RAYLEIGH-RITZ MAJORIZATION ERROR BOUNDS OF THE MIXED TYPE∗

PEIZHEN ZHU† AND ANDREW V. KNYAZEV§

Abstract. The absolute change in the Rayleigh quotient (RQ) for a Hermitian matrix with respect to vectors is bounded in terms of the norms of the residual vectors and the angle between vectors in [doi:10.1137/120884468]. We substitute multidimensional subspaces for the vectors and derive new bounds of absolute changes of eigenvalues of the matrix RQ in terms of singular values of residual matrices and principal angles between subspaces, using majorization. We show how our results relate to bounds for eigenvalues after discarding off-diagonal blocks or additive perturbations.

Key words. Rayleigh, Ritz, majorization, angles, subspaces, eigenvalue, Hermitian, matrix

AMS subject classifications. 15A03, 15A18, 15A42, 15B57

1. Introduction. In this work, we continue our decades-long investigation, see [1, 12, 13, 14, 18, 19, 20, 21, 22, 39], of the sensitivity of the Rayleigh quotient (RQ) and of the Ritz values (the eigenvalues of the matrix RQ) with respect to changes in vectors or subspaces, and of the closely related error bounds for the accuracy of eigenvalue approximations by the Rayleigh-Ritz (RR) method, for Hermitian matrices and operators.

There are two main types of bounds of absolute changes of the Ritz values: a priori and a posteriori bounds, as classified, e.g., in [39]. The a priori bounds, e.g., presented in [20], are in terms of principal angles between subspaces (PABS), which may not always be readily available in practice. The a posteriori bounds, e.g., presented in [2, 27, 31, 34], are based on easily computable singular values of residual matrices. Our bounds in this work use both PABS and the singular values of the residual matrices, and are thus called mixed bounds, following [39].

Different vectors/subspaces may have the same RQ/Ritz values, but if the values are different, the vectors/subspaces evidently cannot be the same.
A priori and mixed bounds can be used to differentiate vectors/subspaces, providing guaranteed lower bounds for the angles. In [19], this idea is illustrated for graph matching, by comparing spectra of graph Laplacians of the graphs to be matched. Our bounds can be similarly applied for signal distinction in signal processing, where the Ritz values serve as a harmonic signature of the subspace; cf. the Star Trek subspace signature. Furthermore, the RQ/Ritz values are computed independently for every vector/subspace in a pair, and are thus also suitable for distributed or privacy-preserving data mining, while determining the angles requires both vectors/subspaces in a pair to be available.

The rest of this paper is organized as follows. Section 2 motivates our work and contains our conjecture. In Section 3, we formally introduce the notation, define majorization, and formulate PABS. Section 4 contains our main results: mixed bounds of the absolute changes of the Ritz values in terms of PABS and the singular values of the residual matrices, obtained using weak majorization. In Section 5, we compare our mixed majorization bounds with those known and relate them to eigenvalue perturbations.

∗ A preliminary version is posted at arXiv [23].
† Department of Mathematics and Statistics, Missouri University of Science and Technology, 202 Rolla Building, 400 W. 12th St., Rolla, MO 65409-0020; zhupe[at]mst.edu
§ Mitsubishi Electric Research Laboratories, 201 Broadway, Cambridge, MA 02139; knyazev[at]merl.com; http://www.merl.com/people/knyazev

2. Motivation and Conjectures. For a nonzero vector x and a Hermitian matrix A, the RQ, defined by ρ(x) = ⟨x, Ax⟩/⟨x, x⟩, where ⟨·, ·⟩ is an inner product associated with a norm by ‖·‖² = ⟨·, ·⟩, is typically used as an approximation to some eigenvalue of A. The corresponding residual vector is denoted by r(x) = Ax − ρ(x)x.

Let x and y be unit vectors. The angle θ(x, y) between them is defined by cos θ(x, y) = |⟨x, y⟩|.
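For illustration only (not part of the paper), these definitions translate directly into a few lines of NumPy; the matrix A and the vector x below are made up:

```python
import numpy as np

rng = np.random.default_rng(0)

# A random Hermitian (here real symmetric) matrix, for illustration.
n = 6
M = rng.standard_normal((n, n))
A = (M + M.T) / 2

def rq(x, A):
    """Rayleigh quotient rho(x) = <x, Ax> / <x, x>."""
    return np.dot(x, A @ x) / np.dot(x, x)

def residual(x, A):
    """Residual vector r(x) = Ax - rho(x) x."""
    return A @ x - rq(x, A) * x

x = rng.standard_normal(n)
r = residual(x, A)

# The residual is orthogonal to x, since <x, r(x)> = <x, Ax> - rho(x)<x, x> = 0.
print(abs(np.dot(x, r)))  # numerically zero
```

In particular, the RQ of an eigenvector of A is the corresponding eigenvalue, and its residual vanishes.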
In [39, Theorem 3.7 and Remark 3.6], the absolute change in the RQ with respect to the vectors x and y, not orthogonal to each other, is bounded as follows,

(2.1)  |ρ(x) − ρ(y)| ≤ (‖P_𝒴 r(x)‖ + ‖P_𝒳 r(y)‖) / cos θ(x, y) = (‖P_{𝒳+𝒴} r(x)‖ + ‖P_{𝒳+𝒴} r(y)‖) tan θ(x, y),

where P_𝒳 and P_𝒴 are the orthogonal projectors on the one-dimensional subspaces 𝒳 and 𝒴 spanned by the vectors x and y, correspondingly, and P_{𝒳+𝒴} is the orthogonal projector on the subspace spanned by the vectors x and y. If the vector x is an eigenvector of A, the RQ of x is an eigenvalue of A, and (2.1) turns into the following equalities,

(2.2)  |ρ(x) − ρ(y)| = ‖P_𝒳 r(y)‖ / cos θ(x, y) = ‖P_{𝒳+𝒴} r(y)‖ tan θ(x, y),

since r(x) = 0; i.e., the absolute change |ρ(x) − ρ(y)| of the RQ becomes in (2.2) the absolute error in the eigenvalue ρ(x) of A; cf. [15, 35, 39].

It is elucidative to examine bounds (2.1) and (2.2) in the context of an asymptotic Taylor expansion of the RQ at the vector y with the expansion center at the vector x. If x is an eigenvector of A, then r(x), the gradient of the RQ at x, vanishes, and thus the first order term in the Taylor expansion vanishes. This implies that the RQ behaves as a quadratic function in a vicinity of the eigenvector x, e.g., giving a second order bound for the absolute change |ρ(x) − ρ(y)| of the RQ, as captured by (2.2) as well as by the following a priori bound from [18, Theorem 4],

(2.3)  |ρ(x) − ρ(y)| ≤ [λ_max − λ_min] sin² θ(x, y),

where λ_max ≥ λ_min are the two eigenvalues of the projected matrix P_{span(x)+span(y)} A restricted to its invariant two-dimensional subspace span(x) + span(y), which is the range of the orthogonal projector P_{span(x)+span(y)}.

If x is not an eigenvector of A, then the gradient of the RQ at x is not zero, and one can obtain only a first order bound, e.g., as in [18, Theorem 1 and Remark 3],

(2.4)  |ρ(x) − ρ(y)| ≤ [λ_max − λ_min] sin θ(x, y).
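The first inequality in (2.1) is easy to check numerically. The following sketch (our own illustration with made-up random data, not from the paper) draws a Hermitian matrix and two non-orthogonal unit vectors:

```python
import numpy as np

rng = np.random.default_rng(1)
n = 8
M = rng.standard_normal((n, n))
A = (M + M.T) / 2

x = rng.standard_normal(n); x /= np.linalg.norm(x)
y = rng.standard_normal(n); y /= np.linalg.norm(y)

rho = lambda v: v @ A @ v            # RQ of a unit vector
res = lambda v: A @ v - rho(v) * v   # residual r(v)

cos_theta = abs(np.dot(x, y))        # cos of the angle between x and y

# For unit y, ||P_Y r(x)|| is |<y, r(x)>|, and symmetrically for P_X r(y).
lhs = abs(rho(x) - rho(y))
rhs = (abs(np.dot(y, res(x))) + abs(np.dot(x, res(y)))) / cos_theta
print(lhs <= rhs + 1e-12)  # True
```

Random vectors are almost surely not orthogonal, so the assumption of (2.1) holds for this data.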
If the vector x moves toward an eigenvector of A, the first order term in the Taylor expansion gets smaller, so a desired first order bound can be expected to gradually turn into a second order one, which bound (2.1) demonstrates by turning into (2.2), in contrast to the autonomous a priori bounds (2.3) and (2.4).

In this paper, we substitute finite dimensional subspaces 𝒳 and 𝒴 of the same dimension for the one-dimensional subspaces spanned by the vectors x and y and thus generalize bounds (2.1) and (2.2) to the multi-dimensional case. An extension of the RQ to the multi-dimensional case is provided by the Rayleigh-Ritz (RR) method; see, e.g., [31, 34]. Specifically, let the columns of the matrix X form an orthonormal basis for the subspace 𝒳. The matrix RQ associated with the matrix X is defined by ρ(X) = X^H A X, and the corresponding residual matrix is R_X = AX − Xρ(X).

Since the matrix A is Hermitian, the matrix RQ ρ(X) is also Hermitian. The eigenvalues of ρ(X) do not depend on the particular choice of the basis X of the subspace 𝒳, and are commonly called "Ritz values" of the matrix A corresponding to the subspace 𝒳. If the subspace 𝒳 is A-invariant, i.e., A𝒳 ⊂ 𝒳, then R_X = 0 and all eigenvalues of ρ(X) are also eigenvalues of A.

Our goal is bounding changes in the Ritz values when the subspace varies. Particularly, we bound the differences of eigenvalues of the Hermitian matrices ρ(X) = X^H A X and ρ(Y) = Y^H A Y, where the columns of the matrices X and Y form orthonormal bases for the subspaces 𝒳 and 𝒴, correspondingly. In particular, if the subspace 𝒳 is an invariant subspace of A, then the changes of eigenvalues of the Hermitian matrices X^H A X and Y^H A Y represent approximation errors of eigenvalues of the Hermitian matrix A. We generalize (2.1) and (2.2) to the multidimensional setting using majorization, see [2, 26], as stated in the following conjecture.

Conjecture 2.1.
Let the columns of matrices X and Y form orthonormal bases for the subspaces 𝒳 and 𝒴 with dim(𝒳) = dim(𝒴), correspondingly. Let A be a Hermitian matrix and Θ(𝒳, 𝒴) < π/2. Then

(2.5)  |Λ(X^H A X) − Λ(Y^H A Y)| ≺_w (S(P_𝒴 R_X) + S(P_𝒳 R_Y)) / cos(Θ(𝒳, 𝒴)),

(2.6)  |Λ(X^H A X) − Λ(Y^H A Y)| ≺_w {S(P_{𝒳+𝒴} R_X) + S(P_{𝒳+𝒴} R_Y)} tan(Θ(𝒳, 𝒴)).

If the subspace 𝒳 is A-invariant, then

(2.7)  |Λ(X^H A X) − Λ(Y^H A Y)| ≺_w S(P_𝒳 R_Y) / cos(Θ(𝒳, 𝒴)),

(2.8)  |Λ(X^H A X) − Λ(Y^H A Y)| ≺_w S(P_{𝒳+𝒴} R_Y) tan(Θ(𝒳, 𝒴)).

P_𝒳 and P_𝒴 are the orthogonal projectors on the subspaces 𝒳 and 𝒴; Λ(·) denotes the vector of decreasing eigenvalues; S(·) denotes the vector of decreasing singular values; Θ(𝒳, 𝒴) denotes the vector of decreasing angles between the subspaces 𝒳 and 𝒴; ≺_w denotes the weak majorization relation; see the formal definitions in Section 3. All arithmetic operations with vectors in Conjecture 2.1 are performed element-wise. Let us note that the eigenvalues and the singular values appearing in Conjecture 2.1 do not depend on the particular choice of the bases X and Y of the subspaces 𝒳 and 𝒴, correspondingly.

To highlight advantages of the mixed bounds of Conjecture 2.1, compared to a priori majorization bounds, we formulate here one known result as follows, cf. [24].

Theorem 2.2 ([20, Theorem 2.1]). Under the assumptions of Conjecture 2.1,

(2.9)  |Λ(X^H A X) − Λ(Y^H A Y)| ≺_w [λ_max − λ_min] sin(Θ(𝒳, 𝒴)).

If in addition one of the subspaces is A-invariant, then

(2.10)  |Λ(X^H A X) − Λ(Y^H A Y)| ≺_w [λ_max − λ_min] sin²(Θ(𝒳, 𝒴)),

where λ_max ≥ λ_min are the end points of the spectrum of the matrix P_{𝒳+𝒴} A restricted to its invariant subspace 𝒳 + 𝒴.
Theorem 2.2 generalizes bounds (2.3) and (2.4) to the multidimensional case, but also inherits their deficiencies. Similar to bound (2.1) being compared to bounds (2.3) and (2.4), Conjecture 2.1 is more mathematically elegant, compared to Theorem 2.2. Indeed, bound (2.9) cannot imply bound (2.10) in Theorem 2.2, while in Conjecture 2.1 the bounds (2.7) and (2.8) for the case of the A-invariant subspace 𝒳 follow directly from bounds (2.5) and (2.6), since R_X = 0 and some terms vanish.

Conjecture 2.1 is particularly advantageous in a case where both 𝒳 and 𝒴 approximate the same A-invariant subspace, so that the principal angles between the subspaces 𝒳 and 𝒴 are small and the singular values of both residual matrices are also small, e.g., leading to bound (2.6) of nearly the second order, while bound (2.9) is of the first order. For example, let the subspace 𝒳 be obtained off-line to approximate an A-invariant subspace, while the subspace 𝒴 is computed from the subspace 𝒳 by rounding, in a low-precision computer arithmetic, the components of the basis vectors spanning 𝒳, for the purpose of efficient storing, fast transmitting, or quick analyzing in real time. Bounds (2.5) and (2.6) allow estimating the effect of the rounding on the change in the Ritz values and choosing an optimal rounding precision.

Another example is deriving convergence rate bounds of first order iterative minimization methods related to the Ritz values; e.g., subspace iterations like the Locally Optimal Block Preconditioned Conjugate Gradient (LOBPCG) method [16]. The first order methods typically converge linearly. If the iterative subspace, which approximates an invariant subspace, is slightly perturbed for whatever purpose, that results in even smaller changes in the Ritz values, e.g., according to (2.6), preserving essentially the same rate of convergence.
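Conjecture 2.1 can at least be exercised numerically. The following sketch (our own illustration with made-up random data, not from the paper) checks bound (2.5) on one random instance, with a small helper implementing the weak majorization test of Definition 3.1:

```python
import numpy as np

def weakly_majorized(x, y, tol=1e-10):
    """Definition 3.1: every partial sum of the decreasing rearrangement
    of x is bounded by the corresponding partial sum for y."""
    xs, ys = np.sort(x)[::-1], np.sort(y)[::-1]
    return bool(np.all(np.cumsum(xs) <= np.cumsum(ys) + tol))

rng = np.random.default_rng(3)
n, p = 12, 4
M = rng.standard_normal((n, n))
A = (M + M.T) / 2                            # real symmetric stand-in for Hermitian

X, _ = np.linalg.qr(rng.standard_normal((n, p)))   # orthonormal basis of X
Y, _ = np.linalg.qr(rng.standard_normal((n, p)))   # orthonormal basis of Y

P = lambda Z: Z @ Z.T                        # orthogonal projector onto span(Z)
R = lambda Z: A @ Z - Z @ (Z.T @ A @ Z)      # residual matrix
S = lambda B: np.linalg.svd(B, compute_uv=False)                 # S(.), decreasing
lam = lambda Z: np.sort(np.linalg.eigvalsh(Z.T @ A @ Z))[::-1]   # Lambda(.), decreasing

cosines = S(X.T @ Y)                         # cos Theta, decreasing values
lhs = np.abs(lam(X) - lam(Y))
# Right side of (2.5): decreasing numerator over the increasingly sorted cosines.
rhs = (S(P(Y) @ R(X)) + S(P(X) @ R(Y))) / np.sort(cosines)
print(weakly_majorized(lhs, rhs))
```

Replacing the element-wise cosines by the single smallest cosine gives the proven, slightly weaker bound (4.1) of Theorem 4.1 below.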
Such arguments appear in trace minimization [38], where [38, Lemma 5.1] presents an inequality for the trace of the difference Λ(X^H A X) − Λ(Y^H A Y) that gradually changes, as (2.6) does, from the first to the second order error bound. Much earlier examples can be found, e.g., in [14, Theorem 4.2], where the Ritz vectors are substituted by their surrogate approximations, which are easier to deal with, slightly perturbing the subspace. Similarly, in [7, Lemma 4.1], the actual nonlinear subspace iterations are approximated by a linear scheme. Availability of bounds like (2.6) is expected to greatly simplify proofs and lead to a better convergence theory of subspace iterations.

We are unable to prove Conjecture 2.1, although it holds in all our numerical tests. Instead, we prove here slightly weaker results, which still generalize and improve the bounds obtained in [15, 35], even for the case where one subspace is A-invariant. Our bounds, originating from [41, Section 6.3], exhibit the desired behavior, balancing between first and second order error terms.

We finally motivate our results of §5.5. The Ritz values are fundamentally related to PABS, as discovered in [19]. For example, simply taking the matrix A to be an orthogonal projector P_𝒵 onto a subspace 𝒵 of the same dimension as 𝒳 and 𝒴 turns the Ritz values into the cosines squared of PABS, e.g.,

Λ(X^H A X) = Λ(X^H P_𝒵 X) = cos²(Θ(𝒳, 𝒵)).

Thus, in this example, Conjecture 2.1 bounds changes in the PABS Θ(𝒳, 𝒵) compared to Θ(𝒴, 𝒵), extending and improving some known results, e.g., from our earlier works [17, 18, 19], even in the particular case 𝒳 = 𝒵, where 𝒵 is A = P_𝒵-invariant.

It is interesting to investigate a geometric meaning, in terms of PABS, of the singular values that appear in the bounds, such as S(P_{𝒳+𝒴} R_Y).
We leave such an investigation to future research, except for one simplified case, where the projector P_{𝒳+𝒴} is dropped, in §5.5, which discusses new majorization bounds for changes in matrix eigenvalues under additive perturbations.

3. Notation and Definitions. Throughout this paper, S(A) denotes the vector of decreasing singular values of a matrix A, such that S(A) = [s_1(A), ..., s_n(A)]. S²(A) denotes the entry-wise square, i.e., S²(A) = [s_1²(A), ..., s_n²(A)]. Λ(A) denotes a vector of decreasing eigenvalues of A. |||·||| denotes a unitarily invariant norm. s_min(A) and s_max(A) = ‖A‖ denote the smallest and largest, correspondingly, singular values of A. λ_min(A) and λ_max(A) denote the smallest and largest eigenvalues of a Hermitian matrix A. P_𝒳 denotes the orthogonal projector on the subspace 𝒳. 𝒳^⊥ denotes the orthogonal complement of the subspace 𝒳. Θ(𝒳, 𝒴) denotes the vector of decreasing principal angles between the subspaces 𝒳 and 𝒴. θ_max(𝒳, 𝒴) and θ_min(𝒳, 𝒴) denote the largest and smallest angles between the subspaces 𝒳 and 𝒴, correspondingly.

We use the symbols "↓" and "↑" to arrange the components of a vector in decreasing and increasing order, correspondingly. For example, the equality x↓ = [x_1, ..., x_n] implies x_1 ≥ x_2 ≥ ··· ≥ x_n. All arithmetic operations with vectors are performed entry-wise, without introducing a special notation. The vectors S(·), Λ(·), and Θ(·) are by definition decreasing, i.e., S(·) = S↓(·), Λ(·) = Λ↓(·), and Θ(·) = Θ↓(·).

We now define the concepts of weak majorization and (strong) majorization, which are comparison relations between two real vectors. For detailed information, we refer the reader to [2, 26].

Definition 3.1. Let x↓ and y↓ ∈ R^n be the vectors obtained by rearranging the coordinates of the vectors x and y in algebraically decreasing order, denoted by x_1, ..., x_n and y_1, ..., y_n, such that x_1 ≥ x_2 ≥ ··· ≥ x_n and y_1 ≥ y_2 ≥ ··· ≥ y_n. We say that x is
We say that x is 1 n 1 2 n 1 2 n ≥ ··· ≥ ≥ ··· ≥ weakly majorized by y, using the notation x y, if w ≺ k k x y , 1 k n. i i ≤ ≤ ≤ i=1 i=1 X X If in addition n x = n y , we say that x is majorized or strongly majorized by i=1 i i=1 i y, using the notation x y. P ≺P The inequality x y means x y for i = 1,...,n. Therefore, x y implies i i ≤ ≤ ≤ x y,butx y doesnotimplyx y. Therelations and arebothreflective w w w ≺ ≺ ≤ ≺ ≺ and transitive [2]. Theorem 3.2 ([32]). If f is a increasing convex function, then x y implies w ≺ f(x) f(y). w ≺ From Theorem 3.2, we see that increasing convex functions preserve the weak majorization. The following two results also provide ways to preserve majorization. Let nonnegative vectors x,y,u, and v be decreasing and of the same size. If x y, w ≺ then xu yu; see, e.g., [20]. Moreover, if x y and u v, then xu yv. w w w w ≺ ≺ ≺ ≺ The proof is simple, x y implies xu yu and u v implies yu yv, so we w w w w ≺ ≺ ≺ ≺ have xu yv. w ≺ Majorization is one of the most powerful techniques that can be used to derive inequalities in a concise way. Majorization relations among eigenvalues and singular values ofmatrices produce a varietyofinequalities in matrix theory. We review some existing majorization inequalities and prove necessary new majorization inequalities for singular values and eigenvalues in the Appendix. We define PABS using singular values; see, e.g., [5, 17, 40]. Definition 3.3. Let columns of the matrices X Cn×p and Y Cn×q form ∈ ∈ orthonormal bases for the subspaces and , correspondingly, and m = min(p,q). X Y Then cos Θ↑( , ) =S XHY = s XHY ,...,s XHY . 1 m X Y (cid:0) (cid:1) (cid:0) (cid:1) (cid:2) (cid:0) (cid:1) (cid:0) (cid:1)(cid:3) 6 PEIZHENZHUandANDREWV.KNYAZEV 4. Majorization-type mixed bounds. In this section, we derive several dif- ferent majorization-type mixed bounds of the absolute changes of eigenvalues of the matrix RQ for Hermitian matrices in terms of singular values of residual matrix and PABS. 
One of our main results is contained in the following theorem.

Theorem 4.1. Under the assumptions of Conjecture 2.1, we have

(4.1)  |Λ(X^H A X) − Λ(Y^H A Y)| ≺_w {S(P_𝒴 R_X) + S(P_𝒳 R_Y)} / cos(θ_max(𝒳, 𝒴)).

If the subspace 𝒳 is A-invariant, then

(4.2)  |Λ(X^H A X) − Λ(Y^H A Y)| ≺_w S(P_𝒳 R_Y) / cos(θ_max(𝒳, 𝒴)).

Proof. Since Θ(𝒳, 𝒴) < π/2, the singular values of X^H Y are positive, which implies that the matrix X^H Y is invertible. We apply the first statement of Lemma A.4 with A := X^H A X, B := Y^H A Y, and T := X^H Y, obtaining

(4.3)  |Λ(X^H A X) − Λ(Y^H A Y)| ≺_w (1 / s_min(X^H Y)) S((X^H A X)(X^H Y) − (X^H Y)(Y^H A Y)).

By Definition 3.3, the singular values of X^H Y are the cosines of the principal angles between the two subspaces 𝒳 and 𝒴. So, we have

(4.4)  s_min(X^H Y) = cos(θ_max(𝒳, 𝒴)).

Additionally, the expression (X^H A X)(X^H Y) − (X^H Y)(Y^H A Y) in the right side of (4.3) can be written as

(4.5)  (X^H A X)(X^H Y) − (X^H Y)(Y^H A Y) = X^H A (I − X_⊥ X_⊥^H) Y − X^H (I − Y_⊥ Y_⊥^H) A Y = −X^H A X_⊥ X_⊥^H Y + X^H Y_⊥ Y_⊥^H A Y,

where [X, X_⊥] and [Y, Y_⊥] are unitary matrices. Since the singular values are invariant under conjugate transpose and orthonormal transforms, we have

(4.6)  S(X^H A X_⊥ X_⊥^H Y) = S(Y^H X_⊥ X_⊥^H A X) = S(P_𝒴 P_{𝒳^⊥} A X) = S(P_𝒴 R_X).

Similarly, S(X^H Y_⊥ Y_⊥^H A Y) = S(P_𝒳 R_Y). From Theorem A.2, and taking into account the equalities (4.5) and (4.6), we establish that

(4.7)  S((X^H A X)(X^H Y) − (X^H Y)(Y^H A Y)) = S(−X^H A X_⊥ X_⊥^H Y + X^H Y_⊥ Y_⊥^H A Y) ≺_w S(X^H A X_⊥ X_⊥^H Y) + S(X^H Y_⊥ Y_⊥^H A Y) = S(P_𝒴 R_X) + S(P_𝒳 R_Y).

Substituting (4.7) and (4.4) into (4.3), we obtain (4.1). If the subspace 𝒳 is A-invariant, then R_X = 0, and (4.1) turns into (4.2).
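Theorem 4.1 admits a direct numerical check via partial sums of the decreasing rearrangements, equivalently to the weak majorization relation. A sketch with made-up data (our own illustration, real symmetric A, random subspaces):

```python
import numpy as np

rng = np.random.default_rng(4)
n, p = 9, 3
M = rng.standard_normal((n, n))
A = (M + M.T) / 2
X, _ = np.linalg.qr(rng.standard_normal((n, p)))
Y, _ = np.linalg.qr(rng.standard_normal((n, p)))

S = lambda B: np.linalg.svd(B, compute_uv=False)
R = lambda Z: A @ Z - Z @ (Z.T @ A @ Z)

alpha = np.sort(np.linalg.eigvalsh(X.T @ A @ X))[::-1]   # Ritz values for X
beta = np.sort(np.linalg.eigvalsh(Y.T @ A @ Y))[::-1]    # Ritz values for Y

cos_theta_max = S(X.T @ Y).min()                         # cos of the largest angle
lhs = np.sort(np.abs(alpha - beta))[::-1]                # |alpha - beta|, rearranged
rhs = (S(Y @ (Y.T @ R(X))) + S(X @ (X.T @ R(Y)))) / cos_theta_max

ok = bool(np.all(np.cumsum(lhs) <= np.cumsum(rhs) + 1e-10))
print(ok)  # True: all partial-sum inequalities hold, as Theorem 4.1 guarantees
```

Multiplying an orthonormal basis Z by Z^H implements the corresponding orthogonal projector, so S(Y (Y^H R_X)) above is S(P_Y R_X).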
Let us clarify the implications of the weak majorization inequalities in Theorem 4.1. The components of both vectors Λ(X^H A X) and Λ(Y^H A Y) are ordered decreasing. Let us denote Λ(X^H A X) = [α_1, α_2, ..., α_p], where α_1 ≥ α_2 ≥ ··· ≥ α_p, and Λ(Y^H A Y) = [β_1, β_2, ..., β_p], where β_1 ≥ β_2 ≥ ··· ≥ β_p. For k = 1, ..., p, the weak majorization inequalities (4.1) and (4.2) in Theorem 4.1 are equivalent to

(4.8)  ∑_{i=1}^{k} |α_i − β_i|↓ ≤ (1 / cos(θ_max(𝒳, 𝒴))) ∑_{i=1}^{k} (s_i↓(P_𝒴 R_X) + s_i↓(P_𝒳 R_Y)),

and

(4.9)  ∑_{i=1}^{k} |α_i − β_i|↓ ≤ (1 / cos(θ_max(𝒳, 𝒴))) ∑_{i=1}^{k} s_i↓(P_𝒳 R_Y).

If p = 1, the results in (4.8) and (4.9) are the same as (2.1) and (2.2).

We use only the largest angle in Theorem 4.1. We now prove two majorization mixed bounds involving all principal angles, although not as strong as (2.5). The first one is for the squares.

Theorem 4.2. Under the assumptions of Conjecture 2.1, we have

|Λ(X^H A X) − Λ(Y^H A Y)|² ≺_w {S(P_𝒴 R_X) + S(P_𝒳 R_Y)}² / cos²(Θ↓(𝒳, 𝒴)).

In addition, if the subspace 𝒳 is A-invariant, then

|Λ(X^H A X) − Λ(Y^H A Y)|² ≺_w S²(P_𝒳 R_Y) / cos²(Θ↓(𝒳, 𝒴)).

Proof. We substitute X^H A X, Y^H A Y, and X^H Y for A, B, and T in the third result of Lemma A.4, and get

|Λ(X^H A X) − Λ(Y^H A Y)|² ≺_w S²((X^H Y)^{-1}) S²((X^H A X)(X^H Y) − (X^H Y)(Y^H A Y)).

Definition 3.3 gives S(X^H Y) = cos(Θ↑(𝒳, 𝒴)), thus,

(4.10)  S((X^H Y)^{-1}) = 1 / cos(Θ↓(𝒳, 𝒴)).

From (4.7), we already have

S((X^H A X)(X^H Y) − (X^H Y)(Y^H A Y)) ≺_w {S(P_𝒴 R_X) + S(P_𝒳 R_Y)}.
Since increasing convex functions preserve weak majorization by Theorem 3.2, we take the function f(x) = x² for nonnegative x. Squaring both sides of the weak majorization inequality above yields

S²((X^H A X)(X^H Y) − (X^H Y)(Y^H A Y)) ≺_w {S(P_𝒴 R_X) + S(P_𝒳 R_Y)}².

Together with (4.10), this proves the first statement of Theorem 4.2. If the subspace 𝒳 is A-invariant, then R_X = 0, which completes the proof.

Let us highlight that one cannot take the square root of both sides of the weak majorization inequalities in Theorem 4.2. Without the squares, we can prove bound (2.5) of Conjecture 2.1, but with an extra multiplier.

Theorem 4.3. Under the assumptions of Conjecture 2.1, we have

|Λ(X^H A X) − Λ(Y^H A Y)| ≺_w √c {S(P_𝒴 R_X) + S(P_𝒳 R_Y)} / cos(Θ↓(𝒳, 𝒴)),

where c = cos(θ_min(𝒳, 𝒴))/cos(θ_max(𝒳, 𝒴)). If the subspace 𝒳 is A-invariant, then

|Λ(X^H A X) − Λ(Y^H A Y)| ≺_w √c S(P_𝒳 R_Y) / cos(Θ↓(𝒳, 𝒴)).

Proof. The assumption Θ(𝒳, 𝒴) < π/2 implies that X^H Y is invertible, so Y^H A Y is similar to the matrix (X^H Y)(Y^H A Y)(X^H Y)^{-1}. The matrix A is Hermitian, and so are X^H A X and Y^H A Y. From the spectral decomposition, we have X^H A X = U_1 D_1 U_1^H, where U_1 is unitary and D_1 is diagonal. Similarly, Y^H A Y = U_2 D_2 U_2^H, where U_2 is unitary and D_2 is diagonal. As a consequence, we have

Λ(X^H A X) − Λ(Y^H A Y) = Λ(X^H A X) − Λ((X^H Y)(Y^H A Y)(X^H Y)^{-1}) = Λ(U_1 D_1 U_1^H) − Λ((X^H Y) U_2 D_2 U_2^H (X^H Y)^{-1}).

Applying Theorem A.5, we obtain

|Λ(X^H A X) − Λ(Y^H A Y)| ≺_w [κ(U_1) κ(X^H Y U_2)]^{1/2} S(X^H A X − (X^H Y)(Y^H A Y)(X^H Y)^{-1}).
Furthermore, the condition number of U_1 is 1, and the condition number of X^H Y U_2 is equal to the condition number of X^H Y, i.e., κ(U_1) = 1 and κ(X^H Y U_2) = κ(X^H Y). Moreover, we have

X^H A X − (X^H Y)(Y^H A Y)(X^H Y)^{-1} = [(X^H A X)(X^H Y) − (X^H Y)(Y^H A Y)] (X^H Y)^{-1}.

Consequently,

(4.11)  |Λ(X^H A X) − Λ(Y^H A Y)| ≺_w [κ(X^H Y)]^{1/2} S((X^H A X)(X^H Y) − (X^H Y)(Y^H A Y)) S((X^H Y)^{-1}).

Substituting (4.7) and (4.10) into (4.11) completes the proof.

The bounds in Theorems 4.1, 4.2, and 4.3 are different, but comparable and, in some particular cases, actually the same. For example, the bounds for the largest component max|Λ(X^H A X) − Λ(Y^H A Y)| are the same in both Theorem 4.1 and Theorem 4.2. If dim(𝒳) = dim(𝒴) = 1, then the bounds in all three theorems are the same as the first inequality in (2.1).

Theorem 4.3 may be tighter for the sum of |Λ(X^H A X) − Λ(Y^H A Y)|, compared to that of Theorems 4.1 and 4.2, since

1 / cos(Θ↓(𝒳, 𝒴)) ≤ [1/cos(θ_max(𝒳, 𝒴)), ..., 1/cos(θ_max(𝒳, 𝒴))].

Theorems 4.1, 4.2, and 4.3 provide various alternatives to conjecture (2.5), all involving the singular values of P_𝒳 R_Y and P_𝒴 R_X. Our second conjecture (2.6) relies instead on the singular values of P_{𝒳+𝒴} R_Y and P_{𝒳+𝒴} R_X. We next clarify the relationship between these singular values.

Lemma 4.1. We have S(P_𝒳 R_Y) ≺_w S(P_{𝒳+𝒴} R_Y) sin(Θ(𝒳, 𝒴)) and, similarly, S(P_𝒴 R_X) ≺_w S(P_{𝒳+𝒴} R_X) sin(Θ(𝒳, 𝒴)).

Proof.
Since the singular values are invariant under unitary transforms and the matrix conjugate transpose, we get S(P_𝒳 R_Y) = S(X^H P_{𝒴^⊥} A Y) = S(Y^H A P_{𝒴^⊥} X). The identities Y^H A P_{𝒴^⊥} X = Y^H A P_{𝒴^⊥} P_{𝒴^⊥} X = Y^H A P_{𝒴^⊥} P_{𝒳+𝒴} P_{𝒴^⊥} X hold, since

P_{𝒴^⊥} X = X − P_𝒴 X = P_{𝒳+𝒴}(X − P_𝒴 X) = P_{𝒳+𝒴} P_{𝒴^⊥} X,

where every column of the matrix X − P_𝒴 X evidently belongs to the subspace 𝒳 + 𝒴. Thus, Theorem A.3 gives

S(Y^H A P_{𝒴^⊥} X) ≺_w S(Y^H A P_{𝒴^⊥} P_{𝒳+𝒴}) S(P_{𝒴^⊥} X) = S(P_{𝒳+𝒴} P_{𝒴^⊥} A Y) S(P_{𝒴^⊥} X) = S(P_{𝒳+𝒴} R_Y) S(P_{𝒴^⊥} X).

The singular values of P_{𝒴^⊥} X coincide with the sines of the principal angles between 𝒳 and 𝒴, e.g., see [5, 17], since dim(𝒳) = dim(𝒴), i.e., S(P_{𝒴^⊥} X) = sin(Θ(𝒳, 𝒴)). This proves S(P_𝒳 R_Y) ≺_w S(P_{𝒳+𝒴} R_Y) sin(Θ(𝒳, 𝒴)). The second bound similarly follows from S(P_𝒴 R_X) ≺_w S(P_{𝒳+𝒴} R_X) S(P_{𝒳^⊥} Y), since S(P_{𝒳^⊥} Y) = sin(Θ(𝒳, 𝒴)) due to the symmetry of PABS and dim(𝒳) = dim(𝒴).

We note that (2.5) implies (2.6) by using Lemma 4.1. We can also combine Theorems 4.1, 4.2, and 4.3 with Lemma 4.1 to easily obtain several tangent-based bounds.

Corollary 4.4. Under the assumptions of Conjecture 2.1, we have

|Λ(X^H A X) − Λ(Y^H A Y)| ≺_w {S(P_{𝒳+𝒴} R_X) + S(P_{𝒳+𝒴} R_Y)} sin(Θ(𝒳, 𝒴)) / cos(θ_max(𝒳, 𝒴)),

|Λ(X^H A X) − Λ(Y^H A Y)|² ≺_w {S(P_{𝒳+𝒴} R_X) + S(P_{𝒳+𝒴} R_Y)}² tan²(Θ(𝒳, 𝒴)),

|Λ(X^H A X) − Λ(Y^H A Y)| ≺_w √c {S(P_{𝒳+𝒴} R_X) + S(P_{𝒳+𝒴} R_Y)} tan(Θ(𝒳, 𝒴)).

In addition, if the subspace 𝒳 is A-invariant, then correspondingly we have

(4.12)  |Λ(X^H A X) − Λ(Y^H A Y)| ≺_w S(P_{𝒳+𝒴} R_Y) sin(Θ(𝒳, 𝒴)) / cos(θ_max(𝒳, 𝒴)),

(4.13)  |Λ(X^H A X) − Λ(Y^H A Y)|² ≺_w S²(P_{𝒳+𝒴} R_Y) tan²(Θ(𝒳, 𝒴)),

(4.14)  |Λ(X^H A X) − Λ(Y^H A Y)| ≺_w √c S(P_{𝒳+𝒴} R_Y) tan(Θ(𝒳, 𝒴)),

where c = cos(θ_min(𝒳, 𝒴))/cos(θ_max(𝒳, 𝒴)).
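Lemma 4.1 is also easy to exercise numerically; the projector onto 𝒳 + 𝒴 can be built from an orthonormal basis of [X, Y]. A sketch with made-up data (our own illustration, assuming the generic case where 𝒳 + 𝒴 has dimension 2p):

```python
import numpy as np

def weakly_majorized(x, y, tol=1e-10):
    """Partial-sum test for weak majorization of x by y."""
    xs, ys = np.sort(x)[::-1], np.sort(y)[::-1]
    return bool(np.all(np.cumsum(xs) <= np.cumsum(ys) + tol))

rng = np.random.default_rng(5)
n, p = 10, 3
M = rng.standard_normal((n, n))
A = (M + M.T) / 2                       # real symmetric stand-in for Hermitian
X, _ = np.linalg.qr(rng.standard_normal((n, p)))
Y, _ = np.linalg.qr(rng.standard_normal((n, p)))

S = lambda B: np.linalg.svd(B, compute_uv=False)
R_Y = A @ Y - Y @ (Y.T @ A @ Y)         # residual matrix R_Y

# Orthogonal projector onto X + Y, generically of dimension 2p for random data.
U, _ = np.linalg.qr(np.hstack([X, Y]))
P_sum = U @ U.T

# sin(Theta), arranged decreasingly, from cos(Theta) = S(X^H Y).
sines = np.sort(np.sqrt(np.clip(1.0 - S(X.T @ Y) ** 2, 0.0, 1.0)))[::-1]

lhs = S(X @ (X.T @ R_Y))                # S(P_X R_Y)
rhs = S(P_sum @ R_Y) * sines            # S(P_{X+Y} R_Y) sin(Theta), element-wise
print(weakly_majorized(lhs, rhs))       # True, as Lemma 4.1 guarantees
```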
5. Discussion. In this section, we briefly discuss and compare our new mixed majorization bounds of the absolute changes of eigenvalues, e.g., where the subspace 𝒳 is A-invariant, with some known results, formulated here in our notation.

Our bounds (4.12) and (4.13) are stronger than the following particular cases, where dim(𝒳) = dim(𝒴), of [30, Theorem 1],

cos(θ_max(𝒳, 𝒴)) max|Λ(X^H A X) − Λ(Y^H A Y)| ≤ sin(θ_max(𝒳, 𝒴)) ‖R_Y‖,

and of [30, Remark 3], which uses the Frobenius norm ‖·‖_F,

cos(θ_max(𝒳, 𝒴)) ‖Λ(X^H A X) − Λ(Y^H A Y)‖_F ≤ sin(θ_max(𝒳, 𝒴)) ‖R_Y‖_F.

5.1. Sun's 1991 majorization bound. Substituting the 2-norm in [35, Theorem 3.3] in our notation as follows, ‖I − X^H Y Y^H X‖_2 = sin²(θ_max(𝒳, 𝒴)), one obtains the bound |Λ(X^H A X) − Λ(Y^H A Y)| ≺_w S(R_Y) tan(θ_max(𝒳, 𝒴)), which is weaker compared to our bound (4.12), since

(5.1)  S(P_{𝒳+𝒴} R_Y) ≤ s_max(P_{𝒳+𝒴}) S(R_Y) = S(R_Y)

and sin(Θ(𝒳, 𝒴)) ≤ sin(θ_max(𝒳, 𝒴)).

5.2. First order a posteriori majorization bounds. A posteriori majorization bounds, in terms of norms of residuals, for the Ritz values approximating eigenvalues have been known for decades. We quote here one of the best such bounds of the first order, i.e., involving the norm, rather than the norm squared, of the residual.

Theorem 5.1 ([2, 34, 37]). Let Y be an orthonormal n by p matrix and let the matrix A be Hermitian. Then there exist a set of indices 1 ≤ i_1 < i_2 < ··· < i_p ≤ n and some p eigenvalues of A, Λ_I(A) = [λ_{i_1}, ..., λ_{i_p}], such that

|Λ_I(A) − Λ(Y^H A Y)| ≺_w [s_1, s_1, s_2, s_2, ...] ≺_w 2 S(R_Y),

where S(R_Y) = [s_1, s_2, ..., s_p]. The multiplier 2 cannot be removed; see [2, p. 188].
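A much simpler classical consequence of such residual bounds, convenient for a direct numerical check, is that every Ritz value lies within ‖R_Y‖_2 of the spectrum of A. This is a standard a posteriori fact, stated here as our own illustrative sketch, not as part of Theorem 5.1:

```python
import numpy as np

rng = np.random.default_rng(7)
n, p = 12, 4
M = rng.standard_normal((n, n))
A = (M + M.T) / 2
Y, _ = np.linalg.qr(rng.standard_normal((n, p)))

rho_Y = Y.T @ A @ Y
R_Y = A @ Y - Y @ rho_Y
res_norm = np.linalg.norm(R_Y, 2)       # largest singular value of R_Y

eigs = np.linalg.eigvalsh(A)
ritz = np.linalg.eigvalsh(rho_Y)

# For a Ritz pair (beta, Yv) with a unit v, A(Yv) - beta(Yv) = R_Y v,
# so the distance from beta to the spectrum of A is at most ||R_Y||_2.
gaps = [np.min(np.abs(eigs - b)) for b in ritz]
print(max(gaps) <= res_norm + 1e-12)  # True
```

Theorem 5.1 is stronger: it matches the p Ritz values to p distinct eigenvalues simultaneously, at the price of the multiplier 2.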
It is important to realize that in our bounds the choice of the subspaces 𝒳 and 𝒴 is arbitrary, while in Theorem 5.1 one cannot choose the subspace 𝒳. The implication of this fact is that we can choose 𝒳 in our bounds such that θ_max(𝒳, 𝒴) ≤ π/4, to make our bounds sharper. Next, we describe some situations where the principal angles are less than π/4.

1. [9] Let A be a Hermitian quasi-definite matrix, i.e.,

(5.2)  A = [ H  B^H ; B  −G ],

where H ∈ C^{k×k} and G ∈ C^{(n−k)×(n−k)} are Hermitian positive definite matrices. Let the subspace 𝒳 be spanned by the eigenvectors of A corresponding to p eigenvalues which have the same sign. Let the subspace 𝒴 be spanned by e_1, ..., e_p and 𝒵 be spanned by e_{n−p+1}, ..., e_n, where e_i is the i-th coordinate vector. Then, if the eigenvalues corresponding to the eigenspace 𝒳 are positive, we have θ_max(𝒳, 𝒴) < π/4. If the eigenvalues corresponding to the eigenspace 𝒳 are negative, we have θ_max(𝒳, 𝒵) < π/4.

2. [6, p. 64] Let [Y, Y_⊥] be unitary. Suppose λ_max(Y^H A Y) < λ_min(Y_⊥^H A Y_⊥) and 𝒳 is the space spanned by the eigenvectors corresponding to the p smallest eigenvalues of A. Then θ_max(𝒳, 𝒴) < π/4.

3. [8, sin(2θ) and Theorem 8.2] Let A be Hermitian and let [X, X_⊥] be unitary with X ∈ C^{n×p}, such that [X, X_⊥]^H A [X, X_⊥] = diag(L_1, L_2). Let Y ∈ C^{n×p} have orthonormal columns and let H_Y = Y^H A Y. Let there be δ > 0 such that Λ(L_1) ⊂ [α, β] and Λ(L_2) ⊂ R \ [α − δ, β + δ]. Let Λ(H_Y) ⊂ [α − δ/2, β + δ/2]. Then θ_max(𝒳, 𝒴) < π/4.

Theorem 5.1 gives a first order error bound using S(R_Y). Under additional assumptions, there are similar, but second order, also called quadratic, i.e., involving the square S²(R_Y), a posteriori error bounds; e.g., [27, 30]. Next, we check how some known bounds (see, e.g., [8, 28, 29, 36] and references there) for the angles Θ(𝒳, 𝒴) in terms of S(R_Y) can be combined with our tangent-based results, leading to various second order a posteriori error bounds, comparable to those in [27, 30].
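Situation 1 above can be illustrated numerically: build a random Hermitian quasi-definite matrix of the form (5.2) and measure the largest principal angle between its positive eigenspace and span{e_1, ..., e_k}. A sketch with made-up data (our own illustration, real case, taking p = k):

```python
import numpy as np

rng = np.random.default_rng(6)
k, m = 3, 4                     # block sizes; n = k + m
n = k + m

def spd(sz):
    """A random symmetric positive definite block."""
    F = rng.standard_normal((sz, sz))
    return F @ F.T + sz * np.eye(sz)

H, G = spd(k), spd(m)
B = rng.standard_normal((m, k))
# Hermitian quasi-definite matrix as in (5.2); real case, so B^H = B^T.
A = np.block([[H, B.T], [B, -G]])

w, V = np.linalg.eigh(A)        # eigenvalues in increasing order
X = V[:, w > 0]                 # quasi-definiteness gives exactly k positive ones
Y = np.eye(n)[:, :k]            # span{e_1, ..., e_k}

cos_min = np.linalg.svd(X.T @ Y, compute_uv=False).min()
theta_max = np.arccos(np.clip(cos_min, -1.0, 1.0))
print(theta_max < np.pi / 4)    # True, as guaranteed for quasi-definite A by [9]
```

The inertia argument (the Schur complement −G − B H^{-1} B^H is negative definite) explains why exactly k eigenvalues are positive, so X above indeed has k columns.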
