
Newton Method on Riemannian Manifolds: Covariant Alpha-Theory PDF



Newton's Method on Riemannian Manifolds: Covariant Alpha-Theory.

Jean-Pierre Dedieu∗, Pierre Priouret†, Gregorio Malajovich‡

January 15, 2003

arXiv:math/0209096v2 [math.NA] 15 Jan 2003

∗MIP. Département de Mathématique, Université Paul Sabatier, 31062 Toulouse cedex 04, France ([email protected]).
†MIP. Département de Mathématique, Université Paul Sabatier, 31062 Toulouse cedex 04, France ([email protected]).
‡Departamento de Matemática Aplicada, Universidade Federal de Rio de Janeiro, Caixa Postal 68530, CEP 21945-970, Rio de Janeiro, RJ, Brazil ([email protected]).

Abstract

In this paper, Smale's $\alpha$-theory is generalized to the context of intrinsic Newton iteration on geodesically complete analytic Riemannian and Hermitian manifolds. Results are valid for analytic mappings from a manifold to a linear space of the same dimension, or for analytic vector fields on the manifold. The invariant $\gamma$ is defined by means of high order covariant derivatives. Bounds on the size of the basin of quadratic convergence are given. If the ambient manifold has negative sectional curvature, those bounds depend on the curvature. A criterion of quadratic convergence for Newton iteration from the information available at a point is also given.

1 Introduction and main results.

Numerical problems posed in manifolds arise in many natural contexts. Classical examples are given by the eigenvalue problem, the symmetric eigenvalue problem, invariant subspace computations, minimization problems with orthogonality constraints, optimization problems with equality constraints, etc. In the first example, $Ax = \lambda x$, the unknowns are the eigenvalue $\lambda \in \mathbb{C}$ and the eigenvector $x \in \mathbb{P}_{n-1}(\mathbb{C})$, the complex projective space consisting of complex vector lines through the origin in $\mathbb{C}^n$. In the second example, $Ax = \lambda x$ with $A$ real and symmetric, the unknowns are $\lambda \in \mathbb{R}$ and $x \in S^{n-1}$, the unit sphere in $\mathbb{R}^n$. In the third example the unknown is a $k$-dimensional subspace contained in $\mathbb{C}^n$, that is, an element of the Grassmann manifold $G_{n,k}(\mathbb{C})$. The fourth example involves the orthogonal group, the special orthogonal group or the Stiefel manifold ($n \times k$ matrices with orthonormal columns). The last example leads to problems posed on submanifolds in $\mathbb{R}^n$.

For such or similar problems our objective is to design algorithms which respect their geometrical structure. We follow here the lines of the Geometric Integration Interest Group (http://www.focm.net/gi/) who showed the interest of such an approach.

The first author's original motivation came from homogeneous and multihomogeneous polynomial systems (Dedieu-Shub [6]) and also from a model for the human spine (Adler-Dedieu-Margulies-Martens-Shub [1]) with configuration space $SO(3)^{18}$. A second motivation, for the second author, came from sparse polynomial systems of equations where the solutions belong to a certain toric variety (Malajovich-Rojas [19]).

For such problems one often has to compute the solutions of a system of equations or to find the zeros of a vector field. For this reason we investigate here one of the most famous methods to approximately solve these problems: the Newton method.

In this paper, we investigate the local behavior of Newton's iteration close to a solution. While a lot is known about Newton's iteration in linear spaces [2], little is known about intrinsic Newton's iteration in more general manifolds. Our main results here (Theorems 1.3 to 1.6 below) extend Smale's $\alpha$-theory to analytic Riemannian manifolds. $\alpha$-theory provides a criterion for the quadratic convergence of Newton's iteration in a neighborhood of a solution. This criterion depends on available data at the approximate solution. One important application (out of the scope of this paper) is the construction of rigorous homotopy algorithms for the solution of non-linear equations.
More precisely, we will study quantitative aspects of Newton's method for finding zeros of mappings $f : M_n \to \mathbb{R}^n$ and vector fields $X : M_n \to TM_n$. Here $M_n$ denotes a real complete analytic Riemannian manifold, $TM_n$ its tangent bundle, and $f$ and $X$ are analytic. We denote by $T_zM_n$ the tangent space at $z$ to $M_n$, by $\langle \cdot,\cdot \rangle_z$ the scalar product on $T_zM_n$ with associated norm $\|\cdot\|_z$, by $d$ the Riemannian distance on $M_n$ and by $\exp_z : T_zM_n \to M_n$ the exponential map. This map is defined on the whole tangent bundle $TM_n$ because $M_n$ is assumed to be complete. We denote by $r_z > 0$ the radius of injectivity of the exponential map at $z$. Thus, $\exp_z : B_{T_z}(0, r_z) \to B_{M_n}(z, r_z)$ is one to one ($B(u,r)$ is the open ball about $u$ with radius $r$, $\bar{B}(u,r)$ is the closed ball).

When $M_n = \mathbb{R}^n$ the Newton operator associated with $f$ is defined by

$$N_f(z) = z - Df(z)^{-1} f(z).$$

In this context $T_z\mathbb{R}^n$ may be identified with $\mathbb{R}^n$ and $\exp_z(u) = z + u$, so that

$$N_f(z) = \exp_z\left(-Df(z)^{-1} f(z)\right).$$

This formula makes sense in the context of Riemannian manifolds and we define the Newton operator $N_f : M_n \to M_n$ in this way.

When, instead of a mapping $M_n \to \mathbb{R}^n$, we consider a vector field $X : M_n \to TM_n$, in order to define Newton's method we resort to an object studied in differential geometry, namely the covariant derivative of vector fields. Let $\nabla$ denote the Levi-Civita connection on $M_n$. For any vector fields $X$ and $Y$ on $M_n$, $\nabla_X(Y)$ is called the covariant derivative of $Y$ with respect to $X$. Since $\nabla$ is tensorial in $X$, the value of $\nabla_X(Y)$ at $z \in M_n$ depends only on the tangent vector $u = X(z) \in T_zM_n$. For this reason we denote it

$$(\nabla_X(Y))(z) = DY(z)(u).$$

It is a linear map $DY(z) : T_zM_n \to T_zM_n$.

The Newton operator for the vector field $X$ is defined by

$$N_X(z) = \exp_z\left(-DX(z)^{-1} X(z)\right).$$

Notice that this definition coincides with the usual one when $X$ is a vector field in $\mathbb{R}^n$ because the covariant derivative is just the usual derivative.
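As an illustration of the Newton operator $N_X(z) = \exp_z(-DX(z)^{-1}X(z))$, the following is a minimal sketch on the unit sphere $S^{n-1} \subset \mathbb{R}^n$. Everything specific in it is an assumption made for the example and is not taken from the paper: the vector field is $X(x) = Ax - (x^{T}Ax)\,x$ for a symmetric matrix $A$ (its zeros are unit eigenvectors of $A$), the covariant derivative is computed as the tangential projection of the Euclidean derivative (valid for a submanifold with the induced metric), and the function names `exp_sphere`, `newton_step` and `newton_on_sphere` are hypothetical.

```python
import numpy as np

def exp_sphere(x, u):
    """Exponential map on the unit sphere: follow the great circle from x along u."""
    t = np.linalg.norm(u)
    if t < 1e-16:
        return x
    return np.cos(t) * x + np.sin(t) * (u / t)

def newton_step(A, x):
    """One step N_X(x) = exp_x(-DX(x)^{-1} X(x)) for X(x) = A x - (x.A x) x.

    Assumes A is symmetric and x has unit norm.
    """
    rho = x @ A @ x                      # Rayleigh quotient at x
    X = A @ x - rho * x                  # value of the vector field (tangent at x)
    # Orthonormal basis Q of the tangent space {u : x.u = 0}:
    # the first column of the complete QR factor is +-x, the others span its complement.
    Q_full, _ = np.linalg.qr(x[:, None], mode="complete")
    Q = Q_full[:, 1:]
    # Covariant derivative DX(x) expressed in the basis Q
    # (tangential projection of the Euclidean derivative).
    H = Q.T @ A @ Q - rho * np.eye(Q.shape[1])
    c = np.linalg.solve(H, -(Q.T @ X))   # Newton direction, tangent coordinates
    return exp_sphere(x, Q @ c)

def newton_on_sphere(A, x0, steps=6):
    """Iterate the Newton operator starting from the normalized x0."""
    x = x0 / np.linalg.norm(x0)
    for _ in range(steps):
        x = newton_step(A, x)
    return x

A = np.diag([1.0, 2.0, 3.0])
x = newton_on_sphere(A, np.array([1.0, 0.1, 0.1]))
residual = np.linalg.norm(A @ x - (x @ A @ x) * x)
```

Here the starting point is chosen close to the eigenvector $(1,0,0)$, so the iterates stay on the sphere and `residual` measures how close the final point is to a zero of the vector field. Using the exponential map rather than a general retraction mirrors the choice made in the text.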
In a vector space framework, Newton's method makes zeros of $f$ with non-singular derivative correspond to fixed points of $N_f$, and Newton sequences $x_{k+1} = N_f(x_k)$, for an initial point $x_0$ taken close to such a fixed point $\zeta$, converge quadratically to $\zeta$. In this paper, our aim is to make these statements precise in our new geometric framework and to investigate quantitative aspects. We have in mind the following two theorems, which are valid when $M_n$ is equal to $\mathbb{R}^n$ or in the more general context of an analytic mapping $f : E \to F$ between two real or complex Banach spaces:

Theorem 1.1 ($\gamma$-Theorem, Smale, 1986) Suppose that $f(\zeta) = 0$ and $Df(\zeta)$ is an isomorphism. Let

$$\gamma(f,z) = \sup_{k \ge 2} \left\| Df(z)^{-1} \frac{D^k f(z)}{k!} \right\|^{1/(k-1)}.$$

If

$$\|z - \zeta\| \le \frac{3 - \sqrt{7}}{2\gamma(f,\zeta)}$$

then the Newton sequence $z_k = N_f^{(k)}(z)$ is defined for all $k \ge 0$ and

$$\|z_k - \zeta\| \le \left(\frac{1}{2}\right)^{2^k - 1} \|z - \zeta\|.$$

For a proof see Blum-Cucker-Shub-Smale [2] Chap. 8, Theorem 1. The second theorem we want to extend to the context of Riemannian manifolds is the following:

Theorem 1.2 ($\alpha$-Theorem, Smale, 1986) Let

$$\beta(f,z) = \left\| Df(z)^{-1} f(z) \right\|$$

and

$$\alpha(f,z) = \beta(f,z)\,\gamma(f,z).$$

We also let $\alpha(f,z) = \infty$ when $Df(z)$ is not invertible. There is a universal constant $\alpha_0 > 0$ with the following property: if $\alpha(f,z) < \alpha_0$ then there is a zero $\zeta$ of $f$ such that $Df(\zeta)$ is an isomorphism and such that the Newton sequence $z_k = N_f^{(k)}(z)$ is defined for all $k \ge 0$ and satisfies

$$\|z_k - \zeta\| \le \left(\frac{1}{2}\right)^{2^k - 1} \|z - \zeta\|.$$

Moreover, the distance from $z$ to the zero $\zeta$ is at most $2\beta(f,z)$.

This second theorem is proved in Smale [35] with the constant $\alpha_0 = 0.13071\ldots$ and Kim [16] and [17] for a one-dimensional version.

1.1 Definitions and notations.

In order to generalize these two results we have to define the corresponding invariants in the context of Riemannian manifolds. The material contained in this section is classical in Riemannian geometry.
The reader is referred to a textbook on this subject, for example: Dieudonné [7], Do Carmo [8], Gallot-Hulin-Lafontaine [11], Helgason [12], O'Neill [21].

Definition 1.1 (Tensors.) The space of $p$-contravariant and $q$-covariant analytic tensor fields

$$T : T(M_n)^p \times T^*(M_n)^q \to \mathcal{F}(M_n)$$

is denoted by $\mathcal{T}^p_q(M_n)$. An $m$-tuple of such tensor fields is called a vectorial tensor field and the space of vectorial tensor fields is denoted by $\mathcal{T}^p_q(M_n, \mathbb{R}^m)$.

Here $T^*(M_n)$ is the cotangent bundle on $M_n$ (the space of 1-forms) and $\mathcal{F}(M_n)$ the space of scalar analytic functions defined on $M_n$. We let $\mathcal{F}(M_n) = \mathcal{T}^0_0(M_n)$. Let $\nabla$ denote the Levi-Civita connection on $M_n$. For any vector fields $X$ and $Y$ on $M_n$, $\nabla_X(Y)$ is called the covariant derivative of $Y$ with respect to $X$.

Definition 1.2 (Covariant derivative for tensor fields.) Let $X$ be a vector field on $M_n$. For any integers $p, q \ge 0$ and any tensor field $T \in \mathcal{T}^p_q(M_n)$ the covariant derivative is defined by:

• $\nabla_X(g) = X(g) = Dg(X)$, the derivative of $g$ along the vector field $X$, when $g$ is a function: $g \in \mathcal{T}^0_0(M_n)$.

• $\nabla_X(Y)$ is given by the connection when $Y$ is a vector field, i.e. $Y \in \mathcal{T}^1_0(M_n)$.

• For a 1-form $\omega \in \mathcal{T}^0_1(M_n)$ its covariant derivative is the 1-form defined by
$$\nabla_X(\omega)(Y) = X(\omega(Y)) - \omega(\nabla_X(Y))$$
for any vector field $Y$.

• For a tensor field $T \in \mathcal{T}^p_q(M_n)$ the covariant derivative is the tensor field $\nabla_X T \in \mathcal{T}^p_q(M_n)$ defined by
$$\nabla_X T(\omega^1, \ldots, \omega^p, Y_1, \ldots, Y_q) = X\left(T(\omega^1, \ldots, \omega^p, Y_1, \ldots, Y_q)\right) - T(\nabla_X(\omega^1), \ldots, \omega^p, Y_1, \ldots, Y_q) - \cdots - T(\omega^1, \ldots, \omega^p, Y_1, \ldots, \nabla_X(Y_q))$$
for any 1-forms $\omega^i$ and vector fields $Y_j$.

• For a vectorial tensor field $T \in \mathcal{T}^p_q(M_n, \mathbb{R}^m)$
$$\nabla_X \begin{pmatrix} T_1 \\ \vdots \\ T_m \end{pmatrix} = \begin{pmatrix} \nabla_X T_1 \\ \vdots \\ \nabla_X T_m \end{pmatrix}.$$

Definition 1.3 (Covariant $k$-th derivative for tensor fields.) Let $X$ be a vector field on $M_n$. For any integers $p, q \ge 0$ and any tensor field $T \in \mathcal{T}^p_q(M_n, \mathbb{R}^m)$ the $k$-th covariant derivative is defined inductively by

$$\nabla^k_X T = \nabla_X\left(\nabla^{k-1}_X T\right).$$

Since the covariant derivative is tensorial in $X$, its value at a given point $z \in M_n$ depends only on the vector $X(z)$. For this reason, the following definition makes sense:

Definition 1.4 (Covariant $k$-th derivative for tensor fields at a point.) Let a point $z \in M_n$ and a vector $u \in T_z(M_n)$ be given. Let $X$ be a vector field such that $X(z) = u$. For any integers $p, q \ge 0$ and any tensor field $T \in \mathcal{T}^p_q(M_n, \mathbb{R}^m)$ the value at $z$ of the $k$-th covariant derivative is denoted by:

$$D^kT(z)(u, \ldots, u) = D^kT(z)u^k = (\nabla^k_X T)(z).$$

It defines a $k$-multilinear map

$$D^kT(z) : \left((T_zM_n)^p \times (T_z^*M_n)^q\right)^k \to \mathbb{R}^m.$$

Definition 1.5 (Norm of a multilinear map.) Let $M : (T_zM_n)^k \to \mathbb{R}^m$ be a $k$-multilinear map. Its norm is defined by

$$\|M\|_z = \sup \|M(u_1, \ldots, u_k)\|_{\mathbb{R}^m}$$

where the supremum is taken over all the vectors $u_j \in T_zM_n$ such that $\|u_j\|_z = 1$.

The following definition extends the definition of $\gamma(f,z)$ to a Riemannian context.

Definition 1.6 (Gamma.) Let a map $f : M_n \to \mathbb{R}^n$ and a vector field $X : M_n \to TM_n$ be given. For any point $z \in M_n$ we let

$$\gamma(f,z) = \sup_{k \ge 2} \left\| Df(z)^{-1} \frac{D^k f(z)}{k!} \right\|_z^{1/(k-1)},$$

$$\gamma(X,z) = \sup_{k \ge 2} \left\| DX(z)^{-1} \frac{D^k X(z)}{k!} \right\|_z^{1/(k-1)}.$$

We also let $\gamma(f,z) = \infty$ when $Df(z)$ is not invertible, and likewise for $\gamma(X,z)$.

This definition is justified by Definitions 1.4 and 1.5. When $Df(z)$ is invertible then, by analyticity, $\gamma(f,z)$ is finite.

We also have to consider the following number $K_\zeta$ related to the sectional curvature at $\zeta \in M_n$.

Definition 1.7 For any $\zeta \in M_n$

$$K_\zeta = \sup \frac{d(\exp_z(u), \exp_z(v))}{\|u - v\|_z}$$

where the supremum is taken over all $z \in B_{M_n}(\zeta, r_\zeta)$ and $u, v \in T_zM_n$ with $\|u\|_z$ and $\|v\|_z \le r_\zeta$, with $r_\zeta$ the radius of injectivity at $\zeta$.

Remark 1.1

• $K_\zeta$ measures how fast the geodesics spread apart in $M_n$. When $u = 0$, or more generally when $u$ and $v$ are on the same line through $0$,
$$d(\exp_z(u), \exp_z(v)) = \|u - v\|_z.$$
Therefore, we always have $K_\zeta \ge 1$.

• When $M_n$ has non-negative sectional curvature, the geodesics spread apart less than the rays (Do Carmo [8], Chap. V-2) so that
$$d(\exp_z(u), \exp_z(v)) \le \|u - v\|_z$$
and consequently $K_\zeta = 1$.

• Examples of manifolds with non-negative curvature are given by $\mathbb{R}^n$, $S^n$ the unit sphere in $\mathbb{R}^{n+1}$, $\mathbb{P}^n(\mathbb{R})$ the real projective space, i.e. the space of real vector lines in $\mathbb{R}^{n+1}$ ([8], Chap. 8, Prop. 4.4), $\mathbb{P}^n(\mathbb{C})$ the complex projective space, i.e. the space of complex vector lines in $\mathbb{C}^{n+1}$ ([8], Chap. 8, Exerc. 11), a Lie group with a bi-invariant metric ([8], Chap. 4, Exerc. 1), $O_n$ and $SO_n$ the orthogonal and special orthogonal groups (Lie groups) ...

1.2 Main results for mappings.

Our first main theorem relates the size of the quadratic attraction basin of a zero $\zeta$ of $f$ to the invariants $\gamma(f,\zeta)$ and $K_\zeta$.

Theorem 1.3 ($R$-$\gamma$ theorem) Let $f : M_n \to \mathbb{R}^n$ be analytic. Suppose that $f(\zeta) = 0$ and $Df(\zeta)$ is an isomorphism. Let

$$R(f,\zeta) = \min\left( r_\zeta,\ \frac{K_\zeta + 2 - \sqrt{K_\zeta^2 + 4K_\zeta + 2}}{2\gamma(f,\zeta)} \right).$$

If $d(z,\zeta) \le R(f,\zeta)$ then the Newton sequence $z_k = N_f^{(k)}(z)$ is defined for all $k \ge 0$, and

$$d(z_k, \zeta) \le \left(\frac{1}{2}\right)^{2^k - 1} d(z,\zeta).$$

Remark 1.2 When $M_n = \mathbb{R}^n$ is equipped with the usual metric structure, the radius of injectivity is $r_\zeta = \infty$ and $K_\zeta = 1$. Thus, $R(f,\zeta) = (3 - \sqrt{7})/(2\gamma(f,\zeta))$ as in Theorem 1.1.

When $M_n$ has non-negative sectional curvature, according to Remark 1.1 one has $K_\zeta = 1$ and Theorem 1.3 becomes

Corollary 1.1 When $M_n$ has non-negative sectional curvature, let $f : M_n \to \mathbb{R}^n$ be analytic. Suppose that $f(\zeta) = 0$ and $Df(\zeta)$ is an isomorphism. Let

$$R(f,\zeta) = \min\left( r_\zeta,\ \frac{3 - \sqrt{7}}{2\gamma(f,\zeta)} \right).$$

If $d(z,\zeta) \le R(f,\zeta)$ then the Newton sequence $z_k = N_f^{(k)}(z)$ is defined for all $k \ge 0$, and

$$d(z_k, \zeta) \le \left(\frac{1}{2}\right)^{2^k - 1} d(z,\zeta).$$

Theorem 1.3 has two interesting and immediate consequences: a lower estimate for the distance from other zeros and a lower estimate for the distance from the singular locus $\Sigma_f = \{ z \in M_n : \det Df(z) = 0 \}$.

Corollary 1.2 Suppose that $f(\zeta) = 0$ and $Df(\zeta)$ is an isomorphism. Then, for any other zero $\zeta' \ne \zeta$ one has

$$d(\zeta', \zeta) > R(f,\zeta).$$

Moreover, for any $z \in \Sigma_f$ the same inequality holds:

$$d(z, \zeta) > R(f,\zeta).$$

Our second main theorem generalizes Theorem 1.2. We give sufficient conditions for $z \in M_n$ to be the starting point of a quadratically convergent Newton sequence. These conditions are given in terms of $f$ at $z$, not in terms of the behaviour of $f$ in a neighborhood of $z$ as in Kantorovich theory. We first need three definitions.

Definition 1.8 The function $\psi(u) = 1 - 4u + 2u^2$ is decreasing from 1 to 0 when $0 \le u \le 1 - \sqrt{2}/2$. We denote by $\alpha_0 = 0.130716944\ldots$ the unique root of the equation $2u = \psi(u)^2$ in this interval.

Definition 1.9 $\sigma$ is the sum of the following series:

$$\sigma = \sum_{k \ge 0} \left(\frac{1}{2}\right)^{2^k - 1} = 1.632843018\ldots$$

Definition 1.10

$$s_0 = \frac{1}{\sigma + \dfrac{(1 - \sigma\alpha_0)^2}{\psi(\sigma\alpha_0)} \left( 1 + \dfrac{\sigma}{1 - \sigma\alpha_0} \right)} = 0.103621842\ldots$$

Definition 1.11 We let $\beta(f,z) = \left\| Df(z)^{-1} f(z) \right\|_z$ and $\alpha(f,z) = \beta(f,z)\,\gamma(f,z)$. We give to $\beta(f,z)$ and $\alpha(f,z)$ the value $\infty$ when $Df(z)$ is singular.

Theorem 1.4 ($R$-$\alpha$ Theorem) Let $f : M_n \to \mathbb{R}^n$ be analytic. Let $z \in M_n$ be such that

$$\beta(f,z) \le s_0 r_z \quad \text{and} \quad \alpha(f,z) < \alpha_0.$$

Then the Newton sequence $z_0 = z$, $z_{k+1} = N_f(z_k)$ is defined for all integers $k \ge 0$ and converges to a zero $\zeta$ of $f$. Moreover,

$$d(z_{k+1}, z_k) \le \left(\frac{1}{2}\right)^{2^k - 1} \beta(f,z)$$

and

$$d(\zeta, z) \le \sigma\beta(f,z).$$

Remark 1.3 When $M_n = \mathbb{R}^n$ is equipped with the usual metric structure, the radius of injectivity is $r_\zeta = \infty$ and the first condition in Theorem 1.4 is automatically satisfied. In this context Theorems 1.2 and 1.4 coincide.

1.3 Main results for vector fields.

The case of vector fields is treated similarly. As in Theorem 1.3 we have:

Theorem 1.5 ($R$-$\gamma$ Theorem) Let $X : M_n \to TM_n$ be an analytic vector field. Suppose that $X(\zeta) = 0$ and $DX(\zeta)$ is an isomorphism. Let

$$R(X,\zeta) = \min\left( r_\zeta,\ \frac{K_\zeta + 2 - \sqrt{K_\zeta^2 + 4K_\zeta + 2}}{2\gamma(X,\zeta)} \right).$$

If $d(z,\zeta) \le R(X,\zeta)$ then the Newton sequence $z_k = N_X^{(k)}(z)$ is defined for all $k \ge 0$, and

$$d(z_k, \zeta) \le \left(\frac{1}{2}\right)^{2^k - 1} d(z,\zeta).$$

As for mappings, Theorem 1.5 gives estimates for the distance from other zeros and a lower estimate for the distance from the singular locus $\Sigma_X = \{ z \in M_n : \det DX(z) = 0 \}$.

Corollary 1.3 Suppose that $X(\zeta) = 0$ and $DX(\zeta)$ is an isomorphism. Then, for any other zero $\zeta' \ne \zeta$ one has

$$d(\zeta', \zeta) > R(X,\zeta).$$

Moreover, for any $z \in \Sigma_X$ the same inequality holds:

$$d(z, \zeta) > R(X,\zeta).$$

The invariants $\beta$ and $\alpha$ are defined similarly:

Definition 1.12 We let

$$\beta(X,z) = \left\| DX(z)^{-1} X(z) \right\|_z$$

and

$$\alpha(X,z) = \beta(X,z)\,\gamma(X,z).$$

We give to $\beta(X,z)$ and $\alpha(X,z)$ the value $\infty$ when $DX(z)$ is singular.

Theorem 1.6 ($R$-$\alpha$ Theorem) Let $X : M_n \to TM_n$ be an analytic vector field. Let $z \in M_n$ be such that

$$\beta(X,z) \le s_0 r_z \quad \text{and} \quad \alpha(X,z) < \alpha_0.$$

Then the Newton sequence $z_0 = z$, $z_{k+1} = N_X(z_k)$ is defined for all integers $k \ge 0$ and converges to a zero $\zeta$ of $X$. Moreover,

$$d(z_{k+1}, z_k) \le \left(\frac{1}{2}\right)^{2^k - 1} \beta(X,z)$$

and

$$d(\zeta, z) \le \sigma\beta(X,z).$$

1.4 Previous work.

There is quite a bit of previous work on such questions. The first to consider Newton's method on a manifold is Rayleigh 1899 [26], who defined what we call today "Rayleigh Quotient Iteration", which is in fact a Newton iteration for a vector field on the sphere. Then, Shub 1986 [27] defined Newton's method for the problem of finding the zeros of a vector field on a manifold and used retractions to send a neighborhood of the origin in the tangent space onto the manifold itself. In our paper we do not use general retractions but exponential maps. Independently of [27], Smith 1994 [37] developed an intrinsic Newton's method and a conjugate gradient algorithm on a manifold using the exponential map.
Also independently, Udriste 1994 [38] studied Newton's method to find the zeros of a gradient vector field defined on a Riemannian manifold; Owren and Welfert 1996 [24] defined Newton's iteration for solving the equation $F(x) = 0$ where $F$ is a map from a Lie group to its corresponding Lie algebra; Edelman-Arias-Smith 1998 [9] developed Newton's and conjugate gradient algorithms on the Grassmann and Stiefel manifolds. These authors define Newton's method via the exponential map as we do here. Shub 1993 [28], Shub and Smale 1993-1996 [29], [30], [31], [32], [33], see also Blum-Cucker-Shub-Smale 1998 [2], Malajovich 1994 [18], Dedieu and Shub 2000 [6] introduce and study Newton's method on projective spaces
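Returning to the constants of Definitions 1.8 to 1.10 and the test of Theorem 1.2, the following minimal sketch recomputes $\alpha_0$, $\sigma$ and $s_0$ numerically and then evaluates $\beta$, $\gamma$ and $\alpha$ for a scalar polynomial. Several things here are assumptions made for illustration only: the expression used for $s_0$ is a reading of Definition 1.10, namely $s_0 = 1/\big(\sigma + \frac{(1-\sigma\alpha_0)^2}{\psi(\sigma\alpha_0)}(1 + \frac{\sigma}{1-\sigma\alpha_0})\big)$, the test polynomial $f(z) = z^2 - 2$ and the point $z = 1.5$ are arbitrary choices, and the helper names `psi`, `bisect` and `alpha_beta_gamma` are hypothetical.

```python
from math import factorial, sqrt

def psi(u):
    """psi(u) = 1 - 4u + 2u^2 from Definition 1.8."""
    return 1.0 - 4.0 * u + 2.0 * u * u

def bisect(g, lo, hi, iters=100):
    """Root of g on [lo, hi] by bisection, assuming g(lo) < 0 < g(hi)."""
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        if g(mid) < 0.0:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

# Definition 1.8: alpha_0 solves 2u = psi(u)^2 on [0, 1 - sqrt(2)/2].
alpha0 = bisect(lambda u: 2.0 * u - psi(u) ** 2, 0.0, 1.0 - sqrt(2.0) / 2.0)

# Definition 1.9: the terms (1/2)^(2^k - 1) decay doubly exponentially,
# so eight terms already exceed double precision.
sigma = sum(0.5 ** (2 ** k - 1) for k in range(8))

# Definition 1.10 (formula as read above; an assumption of this sketch).
t = 1.0 - sigma * alpha0
s0 = 1.0 / (sigma + (t * t / psi(sigma * alpha0)) * (1.0 + sigma / t))

def alpha_beta_gamma(coeffs, z):
    """beta, gamma, alpha of Theorem 1.2 for f(z) = sum(coeffs[i] * z**i), degree >= 2."""
    derivs, c = [], list(coeffs)
    while c:
        # Evaluate the current derivative at z, then differentiate the coefficients.
        derivs.append(sum(a * z ** i for i, a in enumerate(c)))
        c = [i * a for i, a in enumerate(c)][1:]
    f, df = derivs[0], derivs[1]
    if df == 0:
        inf = float("inf")
        return inf, inf, inf
    beta = abs(f / df)
    gamma = max(abs(derivs[k] / (factorial(k) * df)) ** (1.0 / (k - 1))
                for k in range(2, len(derivs)))
    return beta, gamma, beta * gamma

# f(z) = z^2 - 2 at z = 1.5 (coefficients in ascending order).
beta, gamma, alpha = alpha_beta_gamma([-2.0, 0.0, 1.0], 1.5)
certified = alpha < alpha0   # hypothesis of Theorem 1.2 at this point
```

In this one-dimensional setting the operator norms reduce to absolute values. When `certified` is true, Theorem 1.2 guarantees a zero within distance $2\beta(f,z)$ of the starting point, which can be compared with the known zero $\sqrt{2}$ of this particular polynomial.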
