The Jordan Normal Form

Erik Wahlén, ODE Spring 2011

Introduction

The purpose of these notes is to present a proof of the Jordan normal form (also called the Jordan canonical form) for a square matrix. Even if a matrix is real, its Jordan normal form might be complex, and we shall therefore allow all matrices to be complex. For real matrices there is, however, a variant of the Jordan normal form which is real – see the remarks in Teschl, p. 60. The result we want to prove is the following.

Theorem 1. Let $A$ be an $n\times n$ matrix. There exists an invertible $n\times n$ matrix $T$ such that
$$T^{-1}AT = J,$$
where $J$ is a block matrix,
$$J = \begin{pmatrix} J_1 & & \\ & \ddots & \\ & & J_m \end{pmatrix},$$
and each block $J_i$ is a square matrix of the form
$$J_i = \lambda I + N = \begin{pmatrix} \lambda & 1 & & 0 \\ & \ddots & \ddots & \\ & & \ddots & 1 \\ 0 & & & \lambda \end{pmatrix}, \qquad (1)$$
where $\lambda$ is an eigenvalue of $A$, $I$ is a unit matrix and $N$ has ones on the line directly above the diagonal and zeros everywhere else.

If we identify $A$ with the linear operator $x \mapsto Ax$ on $\mathbb{C}^n$, the relation $J = T^{-1}AT$ means that $J$ is the matrix for $A$ in the basis consisting of the columns of $T$. The theorem says that there exists a basis for $\mathbb{C}^n$ in which the linear operator $A$ has the matrix $J$.

When proving a result in linear algebra it is often more convenient to work with linear operators on vector spaces rather than with their matrix representations in some basis. Throughout the rest of the notes we shall therefore assume that $V$ is an $n$-dimensional complex vector space and that $A\colon V \to V$ is a linear operator on $V$.

Recall that the kernel (or null space) of $A$ is defined by $\ker A = \{x \in V : Ax = 0\}$ and that the range of $A$ is defined by $\operatorname{range} A = \{Ax : x \in V\}$. The kernel and the range are both linear subspaces of $V$, and the dimension theorem says that
$$\dim\ker A + \dim\operatorname{range} A = n.$$
Recall also that $\lambda \in \mathbb{C}$ is called an eigenvalue of $A$ if there exists some vector $x \neq 0$ in $V$ such that $Ax = \lambda x$. The vector $x$ is called an eigenvector of $A$ corresponding to the eigenvalue $\lambda$. The subspace $\ker(A - \lambda I)$ of $V$, that is, the subspace spanned by the eigenvectors belonging to $\lambda$, is called the eigenspace corresponding to $\lambda$. The number $\dim\ker(A - \lambda I)$ is called the geometric multiplicity of $\lambda$. Note that $\lambda \in \mathbb{C}$ is an eigenvalue if and only if it is a root of the characteristic polynomial $p_{\mathrm{char}}(z) = \det(A - zI)$. By the fundamental theorem of algebra we can write $p_{\mathrm{char}}(z)$ as a product of first degree polynomials,
$$p_{\mathrm{char}}(z) = (-1)^n (z-\lambda_1)^{a_1}(z-\lambda_2)^{a_2}\cdots(z-\lambda_k)^{a_k},$$
where $\lambda_1,\dots,\lambda_k$ are the distinct eigenvalues of $A$. The positive integer $a_j$ is called the algebraic multiplicity of the eigenvalue $\lambda_j$. The corresponding geometric multiplicity will be denoted $g_j$.

Decomposition into Invariant Subspaces

We begin with some definitions. Let $V_1,\dots,V_k$ be subspaces of $V$. We say that $V$ is the direct sum of $V_1,\dots,V_k$ if each vector $x \in V$ can be written in a unique way as $x = x_1 + x_2 + \cdots + x_k$, where $x_j \in V_j$, $j = 1,\dots,k$. If this is the case we use the notation
$$V = V_1 \oplus V_2 \oplus \cdots \oplus V_k.$$
We say that a subspace $W$ of $V$ is invariant under $A$ if $x \in W \Rightarrow Ax \in W$.

Example 1. Suppose that $A$ has $n$ distinct eigenvalues $\lambda_1,\dots,\lambda_n$ with corresponding eigenvectors $u_1,\dots,u_n$. It then follows that the vectors $u_1,\dots,u_n$ are linearly independent and thus form a basis for $V$. Let
$$\ker(A - \lambda_k I) = \{z u_k : z \in \mathbb{C}\}, \qquad k = 1,\dots,n,$$
be the corresponding eigenspaces. Each eigenspace is invariant under $A$ since
$$(A - \lambda_k I)u = 0 \;\Rightarrow\; (A - \lambda_k I)Au = A(A - \lambda_k I)u = 0.$$
Moreover,
$$V = \ker(A - \lambda_1 I) \oplus \ker(A - \lambda_2 I) \oplus \cdots \oplus \ker(A - \lambda_n I)$$
by the definition of a basis.
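Theorem 1 and Example 1 can be checked on a concrete matrix. The sketch below assumes the SymPy library; its `jordan_form` method returns a pair $(P, J)$ with $A = PJP^{-1}$, so $P$ plays the role of $T$ in Theorem 1.

```python
# A minimal sketch, assuming SymPy is available.
from sympy import Matrix

A = Matrix([[2, 1],
            [0, 3]])                 # two distinct eigenvalues, 2 and 3
P, J = A.jordan_form()               # A = P * J * P**(-1)
print(J)                             # diagonal: two 1x1 Jordan blocks
assert (P * J * P.inv() - A).is_zero_matrix
```

With distinct eigenvalues every Jordan block is $1\times 1$, so $J$ comes out diagonal, exactly as in Example 1.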
More generally, suppose that $A$ has $k$ distinct eigenvalues $\lambda_1,\dots,\lambda_k$ and that the geometric multiplicity $g_j$ of each $\lambda_j$ equals the algebraic multiplicity $a_j$. Let $\ker(A - \lambda_j I)$, $j = 1,\dots,k$, be the corresponding eigenspaces. We can then find a basis for each eigenspace consisting of $g_j$ eigenvectors. The union of these bases consists of $g_1 + \cdots + g_k = a_1 + \cdots + a_k = n$ elements and is linearly independent, since eigenvectors belonging to different eigenvalues are linearly independent. We thus obtain a basis for $V$ and it follows that
$$V = \ker(A - \lambda_1 I) \oplus \ker(A - \lambda_2 I) \oplus \cdots \oplus \ker(A - \lambda_k I).$$
In this basis, $A$ has the matrix
$$D = \begin{pmatrix} \lambda_1 I_1 & & \\ & \ddots & \\ & & \lambda_k I_k \end{pmatrix},$$
where each $I_j$ is a $g_j \times g_j$ unit matrix. In other words, $D$ is a diagonal matrix with the eigenvalues on the diagonal, each repeated $g_j$ times. One says that $A$ is diagonalized in the new basis. Unfortunately, not all matrices can be diagonalized.

Example 2. Consider the matrix
$$A = \begin{pmatrix} 2 & 1 \\ 0 & 2 \end{pmatrix}.$$
The characteristic polynomial is $(\lambda - 2)^2$, so the only eigenvalue is $\lambda = 2$ with algebraic multiplicity $a = 2$. On the other hand,
$$(A - 2I)x = \begin{pmatrix} 0 & 1 \\ 0 & 0 \end{pmatrix}\begin{pmatrix} x_1 \\ x_2 \end{pmatrix} = \begin{pmatrix} x_2 \\ 0 \end{pmatrix},$$
so that $g = 1$ and the eigenspace is $\ker(A - 2I) = \{(z,0) : z \in \mathbb{C}\}$. Clearly we cannot write $\mathbb{C}^2$ as a direct sum of the eigenspaces in this case. Note, however, that
$$(A - 2I)^2 = \begin{pmatrix} 0 & 0 \\ 0 & 0 \end{pmatrix},$$
so that $\mathbb{C}^2 = \ker(A - 2I)^2$.

Given a polynomial $p(z) = \alpha_m z^m + \alpha_{m-1} z^{m-1} + \cdots + \alpha_1 z + \alpha_0$, we define
$$p(A) = \alpha_m A^m + \alpha_{m-1} A^{m-1} + \cdots + \alpha_1 A + \alpha_0 I.$$

Lemma 1. There exists a non-zero polynomial $p$ such that $p(A) = 0$.

Proof. Here it is convenient to identify $A$ with its matrix in some basis. Note that $\mathbb{C}^{n\times n}$ is an $n^2$-dimensional vector space. It follows that the $n^2 + 1$ matrices $I, A, A^2, \dots, A^{n^2}$ are linearly dependent. But this means that there exist numbers $\alpha_0,\dots,\alpha_{n^2}$, not all zero, such that
$$\alpha_{n^2} A^{n^2} + \alpha_{n^2-1} A^{n^2-1} + \cdots + \alpha_1 A + \alpha_0 I = 0,$$
that is, $p(A) = 0$, where $p(z) = \alpha_{n^2} z^{n^2} + \cdots + \alpha_1 z + \alpha_0$.

Let $p_{\min}(z)$ be a monic polynomial (with leading coefficient 1) of minimal degree such that $p_{\min}(A) = 0$. If $p(z)$ is any polynomial such that $p(A) = 0$, it follows that $p(z) = q(z)p_{\min}(z)$ for some polynomial $q$. To see this, use the division algorithm on $p$ and $p_{\min}$:
$$p(z) = q(z)p_{\min}(z) + r(z),$$
where $r = 0$ or $\deg r < \deg p_{\min}$. Thus $r(A) = p(A) - q(A)p_{\min}(A) = 0$. But this implies that $r(z) = 0$, since $p_{\min}$ has minimal degree. This shows that the polynomial $p_{\min}$ is unique. It is called the minimal polynomial for $A$. By the fundamental theorem of algebra, we can write the minimal polynomial as a product of first degree polynomials,
$$p_{\min}(z) = (z-\lambda_1)^{m_1}(z-\lambda_2)^{m_2}\cdots(z-\lambda_k)^{m_k}, \qquad (2)$$
where the numbers $\lambda_j$ are distinct and each $m_j \geq 1$. Note that we don't know yet that the roots $\lambda_j$ of the minimal polynomial coincide with the eigenvalues of $A$. This will be shown in Theorem 2 below.
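The minimal polynomial can be computed mechanically. The sketch below assumes SymPy; the helper `minimal_polynomial_of` is ad hoc (not a SymPy API), and it relies on the standard fact, justified by Theorem 2 and the remarks following it, that the exponent of $(z-\lambda_j)$ in $p_{\min}$ is the smallest $m$ for which $\ker(A-\lambda_j I)^m$ stops growing.

```python
# A sketch, assuming SymPy; minimal_polynomial_of is a hypothetical helper.
from sympy import Matrix, eye, symbols

z = symbols('z')

def minimal_polynomial_of(A):
    n = A.rows
    p_min = 1
    for lam in A.eigenvals():              # the distinct eigenvalues lam_j
        N = A - lam * eye(n)
        m, M = 1, N
        while M.rank() != (M * N).rank():  # kernel of N**m still growing
            m, M = m + 1, M * N
        p_min *= (z - lam) ** m
    return p_min

A = Matrix([[2, 1],
            [0, 2]])                       # the matrix from Example 2
print(minimal_polynomial_of(A))            # (z - 2)**2
```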
Lemma 2. Suppose that $p(z) = p_1(z)p_2(z)$ where $p_1$ and $p_2$ are relatively prime. If $p(A) = 0$ we have that
$$V = \ker p_1(A) \oplus \ker p_2(A)$$
and each subspace $\ker p_j(A)$ is invariant under $A$.

Proof. The invariance follows from $p_j(A)Ax = Ap_j(A)x = 0$, $x \in \ker p_j(A)$. Since $p_1$ and $p_2$ are relatively prime, it follows by Euclid's algorithm that there exist polynomials $q_1, q_2$ such that
$$p_1(z)q_1(z) + p_2(z)q_2(z) = 1.$$
Thus
$$p_1(A)q_1(A) + p_2(A)q_2(A) = I.$$
Applying this identity to the vector $x \in V$, we obtain
$$x = \underbrace{p_1(A)q_1(A)x}_{x_2} + \underbrace{p_2(A)q_2(A)x}_{x_1},$$
where
$$p_2(A)x_2 = p_2(A)p_1(A)q_1(A)x = p(A)q_1(A)x = 0,$$
so that $x_2 \in \ker p_2(A)$. Similarly $x_1 \in \ker p_1(A)$. Thus $V = \ker p_1(A) + \ker p_2(A)$. On the other hand, if
$$x_1 + x_2 = x_1' + x_2', \qquad x_j, x_j' \in \ker p_j(A),\ j = 1,2,$$
we obtain that
$$u = x_1 - x_1' = x_2' - x_2 \in \ker p_1(A) \cap \ker p_2(A),$$
so that
$$u = q_1(A)p_1(A)u + q_2(A)p_2(A)u = 0.$$
It follows that the representation $x = x_1 + x_2$ is unique and therefore
$$V = \ker p_1(A) \oplus \ker p_2(A).$$

Theorem 2. With $\lambda_1,\dots,\lambda_k$ and $m_1,\dots,m_k$ as in (2) we have
$$V = \ker(A-\lambda_1 I)^{m_1} \oplus \cdots \oplus \ker(A-\lambda_k I)^{m_k},$$
where each $\ker(A-\lambda_j I)^{m_j}$ is invariant under $A$. The numbers $\lambda_1,\dots,\lambda_k$ are the eigenvalues of $A$.

Proof. We begin by noting that the polynomials $(z-\lambda_j)^{m_j}$, $j=1,\dots,k$, are relatively prime. Repeated application of Lemma 2 therefore shows that
$$V = \ker(A-\lambda_1 I)^{m_1} \oplus \cdots \oplus \ker(A-\lambda_k I)^{m_k},$$
with each $\ker(A-\lambda_j I)^{m_j}$ invariant.

Consider the linear operator $A\colon \ker(A-\lambda_j I)^{m_j} \to \ker(A-\lambda_j I)^{m_j}$. It is clear that $\ker(A-\lambda_j I)^{m_j} \neq \{0\}$, for otherwise $p_{\min}$ would not be minimal. Since every linear operator on a (non-trivial) finite dimensional complex vector space has an eigenvalue, it follows that there is some non-zero element $u \in \ker(A-\lambda_j I)^{m_j}$ with $Au = \lambda u$, $\lambda \in \mathbb{C}$. But then
$$0 = (A-\lambda_j I)^{m_j}u = (\lambda-\lambda_j)^{m_j}u,$$
so $\lambda = \lambda_j$. This shows that the roots $\lambda_j$ of the minimal polynomial are eigenvalues of $A$. On the other hand, if $u$ is an eigenvector of $A$ corresponding to the eigenvalue $\lambda$, we have
$$0 = p_{\min}(A)u = (A-\lambda_1 I)^{m_1}\cdots(A-\lambda_k I)^{m_k}u = (\lambda-\lambda_1)^{m_1}\cdots(\lambda-\lambda_k)^{m_k}u,$$
so $\lambda = \lambda_j$ for some $j$, that is, every eigenvalue is a root of the minimal polynomial.

The subspace $\ker(A-\lambda_j I)^{m_j}$ is called the generalized eigenspace corresponding to $\lambda_j$ and a non-zero vector $x \in \ker(A-\lambda_j I)^{m_j}$ is called a generalized eigenvector. The number $m_j$ is the smallest exponent $m$ such that $(A-\lambda_j I)^m$ vanishes on $\ker(A-\lambda_j I)^{m_j}$. Suppose for a contradiction that e.g. $(A-\lambda_1 I)^{m_1-1}u = 0$ for all $u \in \ker(A-\lambda_1 I)^{m_1}$. Writing $x \in V$ as $x = x_1 + \tilde{x}$ according to the decomposition
$$V = \ker(A-\lambda_1 I)^{m_1} \oplus \ker\tilde{p}(A),$$
where $\tilde{p}(z) = (z-\lambda_2)^{m_2}\cdots(z-\lambda_k)^{m_k}$, we would then obtain that
$$(A-\lambda_1 I)^{m_1-1}\tilde{p}(A)x = \tilde{p}(A)(A-\lambda_1 I)^{m_1-1}x_1 + (A-\lambda_1 I)^{m_1-1}\tilde{p}(A)\tilde{x} = 0,$$
contradicting the definition of the minimal polynomial.

If we select a basis $\{u_{j,1},\dots,u_{j,n_j}\}$ for each generalized eigenspace, then the union $\{u_{1,1},\dots,u_{1,n_1},u_{2,1},\dots,u_{2,n_2},\dots,u_{k,1},\dots,u_{k,n_k}\}$ will be a basis for $V$. Since each generalized eigenspace is invariant under the linear operator $A$, the matrix for $A$ in this basis will have the block form
$$\begin{pmatrix} A_1 & & \\ & \ddots & \\ & & A_k \end{pmatrix},$$
where each $A_j$ is an $n_j \times n_j$ square matrix. What remains in order to prove Theorem 1 is to show that we can select a basis for each generalized eigenspace so that each block $A_j$ takes the form (1) or possibly consists of multiple blocks of the form (1).

Proof of Theorem 1

By restricting $A$ to a generalized eigenspace $\ker(A-\lambda_j I)^{m_j}$, we can assume that $A$ only has one eigenvalue, which we call $\lambda$. Set $N = A - \lambda I$ and let $m$ be the smallest integer for which $N^m = 0$ (so that $p_{\min}(z) = (z-\lambda)^m$ for $A$). A linear operator $N$ with the property that $N^m = 0$ for some $m$ is called nilpotent.

Suppose that $m = n$ (the dimension of $V$). This means that there is some vector $u$ such that $N^{n-1}u \neq 0$. It follows that the vectors $u, Nu, \dots, N^{n-1}u$ are linearly independent. Indeed, suppose that
$$\alpha_1 u + \alpha_2 Nu + \cdots + \alpha_n N^{n-1}u = 0.$$
Applying $N^{n-1}$ to this equation we obtain that $\alpha_1 N^{n-1}u = 0$. Proceeding inductively we find that $\alpha_j = 0$ for each $j$. Thus $\{N^{n-1}u,\dots,Nu,u\}$ is a basis for $V$. The matrix for $N$ in this basis is
$$\begin{pmatrix} 0 & 1 & & 0 \\ & \ddots & \ddots & \\ & & \ddots & 1 \\ 0 & & & 0 \end{pmatrix},$$
which means that we are done.
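The $m = n$ case can be carried out explicitly: pick $u$ with $N^{n-1}u \neq 0$, take the chain as columns, and the change of basis produces a single Jordan block. A small sketch, assuming SymPy, with a hypothetical nilpotent matrix chosen for illustration:

```python
# A sketch, assuming SymPy: for nilpotent N with N**(n-1) != 0, the chain
# N**(n-1)*u, ..., N*u, u (in that column order) is a basis in which N
# becomes a single Jordan block with eigenvalue 0.
from sympy import Matrix

N = Matrix([[0, 2, 1],
            [0, 0, 3],
            [0, 0, 0]])              # nilpotent: N**3 = 0 but N**2 != 0
u = Matrix([0, 0, 1])                # chosen so that N**2 * u != 0
T = Matrix.hstack(N**2 * u, N * u, u)
print(T.inv() * N * T)               # [[0, 1, 0], [0, 0, 1], [0, 0, 0]]
```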
In general, a set of non-zero vectors $u, \dots, N^{l-1}u$, with $N^l u = 0$, is called a Jordan chain. We will prove the theorem in general by showing that there is a basis for $V$ consisting of Jordan chains.

We prove the theorem by induction on the dimension of $V$. Clearly the theorem holds if $V$ has dimension 1. Suppose now that the theorem holds for all complex vector spaces of dimension less than $n$, where $n \geq 2$, and assume that $\dim V = n$. Since $N$ is nilpotent it is not injective and therefore $\dim\operatorname{range} N < n$ (by the dimension theorem). By the induction hypothesis, we can therefore find a basis of Jordan chains
$$u_i, Nu_i, \dots, N^{l_i-1}u_i, \qquad i = 1,\dots,k,$$
for $\operatorname{range} N$. For each $u_i$ we can find a $v_i \in V$ such that $Nv_i = u_i$ (since $u_i \in \operatorname{range} N$). That is, each Jordan chain in the basis for $\operatorname{range} N$ can be extended by one element. We claim that the vectors
$$v_i, Nv_i, N^2 v_i, \dots, N^{l_i}v_i, \qquad i = 1,\dots,k, \qquad (3)$$
are linearly independent. Indeed, suppose that
$$\sum_{i=1}^{k}\sum_{j=0}^{l_i} \alpha_{i,j} N^j v_i = 0. \qquad (4)$$
Applying $N$ to this equality, we find that
$$\sum_{i=1}^{k}\sum_{j=0}^{l_i-1} \alpha_{i,j} N^j u_i = \sum_{i=1}^{k}\sum_{j=0}^{l_i} \alpha_{i,j} N^{j+1} v_i = 0,$$
which, by hypothesis, implies that $\alpha_{i,j} = 0$, $1 \leq i \leq k$, $0 \leq j \leq l_i - 1$. Looking at (4) this means that
$$\sum_{i=1}^{k} \alpha_{i,l_i} N^{l_i-1}u_i = \sum_{i=1}^{k} \alpha_{i,l_i} N^{l_i}v_i = 0,$$
which again implies that $\alpha_{i,l_i} = 0$, $1 \leq i \leq k$, by our induction hypothesis.

Extend the vectors in (3) to a basis for $V$ by possibly adding vectors $\{\tilde{w}_1,\dots,\tilde{w}_K\}$. For each $i$ we have $N\tilde{w}_i \in \operatorname{range} N$, so we can find an element $\hat{w}_i$ in the span of the vectors in (3) such that $N\tilde{w}_i = N\hat{w}_i$. But then $w_i = \tilde{w}_i - \hat{w}_i \in \ker N$ and the vectors
$$v_i, Nv_i, N^2 v_i, \dots, N^{l_i}v_i, \quad i = 1,\dots,k, \qquad w_1,\dots,w_K$$
constitute a basis for $V$ consisting of Jordan chains (the elements $w_i$ are chains of length 1).

Some Further Remarks

The matrix $J$ is not completely unique, since we can e.g. change the order of the Jordan blocks. It turns out that this is the only thing which is not unique. In other words, both the number of blocks and their sizes are uniquely determined. Let us prove this. As in the previous section, it suffices to consider a nilpotent operator $N\colon V \to V$. Let $\beta$ be the total number of blocks and $\beta(k)$ the number of blocks of size $k \times k$. Then $\dim\ker N = \beta$, and $\dim\ker N^2$ differs from $\dim\ker N$ by $\beta - \beta(1)$. In the same manner, we find that
$$\dim\ker N = \beta,$$
$$\dim\ker N^2 = \dim\ker N + \beta - \beta(1),$$
$$\vdots$$
$$\dim\ker N^{k+1} = \dim\ker N^k + \beta - \beta(1) - \cdots - \beta(k).$$
It follows by induction that each $\beta(k)$ is uniquely determined by $N$, as illustrated in the sketch below.

Note that the number of Jordan blocks in the matrix $J$ equals the number of Jordan chains, so that there may be several Jordan blocks corresponding to the same eigenvalue. The sum of the lengths of the Jordan chains equals the dimension of the generalized eigenspace.

Let $p_{\mathrm{char}}(z) = \det(A-zI)$ be the characteristic polynomial of $A$. Recall that $p_{\mathrm{char}}$ is independent of basis, so that $p_{\mathrm{char}}(z) = \det(J-zI)$. Expanding repeatedly along the first column we find that $p_{\mathrm{char}}(z) = (-1)^n(z-\lambda_1)^{n_1}\cdots(z-\lambda_k)^{n_k}$, where $n_j = \dim\ker(A-\lambda_j I)^{m_j}$ is the dimension of the generalized eigenspace corresponding to $\lambda_j$. Thus $n_j = a_j$, the algebraic multiplicity of $\lambda_j$. By the remarks above about the uniqueness of $J$, it follows that the geometric multiplicity $g_j$ of each eigenvalue equals the number of Jordan chains for that eigenvalue.
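A sketch, assuming SymPy, mirroring the uniqueness argument above: with $d_k = \dim\ker N^k$, the recursion says $d_{k+1} - d_k$ equals the number of blocks of size greater than $k$, so the block sizes $\beta(k)$ can be read off from kernel dimensions alone. The example matrix is hypothetical, chosen to have two $2\times 2$ blocks.

```python
# A sketch, assuming SymPy: recover Jordan block sizes of a nilpotent N
# from d_k = dim ker N**k, computed here via the rank-nullity theorem.
from sympy import Matrix

N = Matrix([[0, 1, 0, 0],
            [0, 0, 0, 0],
            [0, 0, 0, 1],
            [0, 0, 0, 0]])           # nilpotent, two 2x2 blocks

n = N.rows
d = [0] + [n - (N**k).rank() for k in range(1, n + 2)]  # d[k] = dim ker N**k
gt = [d[k + 1] - d[k] for k in range(n + 1)]            # blocks of size > k
beta = {k: gt[k - 1] - gt[k] for k in range(1, n + 1)}  # blocks of size k
print({k: v for k, v in beta.items() if v})             # {2: 2}
```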
The exponent $m_j$ of the factor $(z-\lambda_j)^{m_j}$ in the minimal polynomial is the smallest exponent $m$ such that $N^m = 0$, where $N = (A-\lambda_j I)|_{\ker(A-\lambda_j I)^{m_j}}$. Thus $m_j$ is the length of the longest Jordan chain and $m_j \times m_j$ the size of the largest Jordan block. Clearly, $m_j \leq \dim\ker(A-\lambda_j I)^{m_j} = a_j$. Thus the minimal polynomial divides the characteristic polynomial. Since $p_{\min}(A) = 0$ we have proved the following result.

Theorem 3 (Cayley–Hamilton). Let $p_{\mathrm{char}}(z) = \det(A-zI)$ be the characteristic polynomial of $A$. Then $p_{\mathrm{char}}(A) = 0$.

Example 3. Let
$$A = \begin{pmatrix} 1 & 0 & 1 \\ 0 & 2 & 0 \\ -1 & 0 & -1 \end{pmatrix}.$$
The characteristic polynomial of $A$ is $p_{\mathrm{char}}(z) = -z^2(z-2)$. Thus, $A$ has the only eigenvalues $\lambda_1 = 0$ and $\lambda_2 = 2$ with algebraic multiplicities $a_1 = 2$ and $a_2 = 1$, respectively. The minimal polynomial must be $z(z-2)$ or $z^2(z-2)$, since it divides $p_{\mathrm{char}}(z)$ and is divisible by $z-\lambda_j$ for each $j$. We find that
$$A - 2I = \begin{pmatrix} -1 & 0 & 1 \\ 0 & 0 & 0 \\ -1 & 0 & -3 \end{pmatrix}, \qquad A(A-2I) = \begin{pmatrix} -2 & 0 & -2 \\ 0 & 0 & 0 \\ 2 & 0 & 2 \end{pmatrix}$$
and $A^2(A-2I) = 0$, so that $p_{\min}(z) = -p_{\mathrm{char}}(z) = z^2(z-2)$. This means that a basis of generalized eigenvectors must consist of one Jordan chain of length 2 corresponding to the eigenvalue $\lambda_1$ and one of length 1 corresponding to $\lambda_2$. We can also conclude that the Jordan normal form is
$$J = \begin{pmatrix} 0 & 1 & 0 \\ 0 & 0 & 0 \\ 0 & 0 & 2 \end{pmatrix}$$
and that $g_1 = g_2 = 1$. This can also be seen from the computations
$$Ax = 0 \iff x = z(1,0,-1), \qquad Ax = 2x \iff x = z(0,1,0), \qquad z \in \mathbb{C}.$$
Thus $u_1 = (1,0,-1)$ and $u_3 = (0,1,0)$ are eigenvectors corresponding to $\lambda_1$ and $\lambda_2$, respectively. We obtain a basis of generalized eigenvectors by solving the equation $Au_2 = u_1$. Note that this equation must be solvable, since there has to be a Jordan chain of length 2 corresponding to $u_1$. We find that $u_2 = (1,0,0)$ is a solution. We therefore find that $T^{-1}AT = J$, where
$$T = \begin{pmatrix} 1 & 1 & 0 \\ 0 & 0 & 1 \\ -1 & 0 & 0 \end{pmatrix}.$$

Example 4. Let
$$A = \begin{pmatrix} 3 & 1 & -1 \\ 0 & 2 & 0 \\ 1 & 1 & 1 \end{pmatrix}.$$
The characteristic polynomial of $A$ is $p_{\mathrm{char}}(z) = -(z-2)^3$. Thus, $A$ has the only eigenvalue 2 with algebraic multiplicity 3. The generalized eigenspace is the whole of $\mathbb{C}^3$. Moreover, the minimal polynomial must be $z-2$, $(z-2)^2$ or $(z-2)^3$. We see that
$$A - 2I = \begin{pmatrix} 1 & 1 & -1 \\ 0 & 0 & 0 \\ 1 & 1 & -1 \end{pmatrix}, \qquad (A-2I)^2 = \begin{pmatrix} 0 & 0 & 0 \\ 0 & 0 & 0 \\ 0 & 0 & 0 \end{pmatrix},$$
so that $p_{\min}(z) = (z-2)^2$. This means that a basis of generalized eigenvectors must consist of one Jordan chain of length 2 and one of length 1 (an eigenvector). We can also conclude that the Jordan normal form is
$$J = \begin{pmatrix} 2 & 1 & 0 \\ 0 & 2 & 0 \\ 0 & 0 & 2 \end{pmatrix}$$
and that the geometric multiplicity is 2. This can also be seen from the computation
$$Ax = 2x \iff x_1 + x_2 - x_3 = 0.$$
Contrary to the previous example, we cannot find a basis of generalized eigenvectors by starting with an arbitrary basis of $\ker(A-2I)$. Instead, we first proceed as in the proof of the Jordan normal form. Notice that $\operatorname{range}(A-2I)$ is spanned by the vector $u_1 = (1,0,1)$. By the form of the minimal polynomial, we conclude that $u_1$ is an eigenvector. Next, we find a solution of the equation $(A-2I)u_2 = u_1$, e.g. $u_2 = (1,0,0)$. Finally, we add an eigenvector which is not parallel to $u_1$, e.g. $u_3 = (0,1,1)$. Setting
$$T = \begin{pmatrix} 1 & 1 & 0 \\ 0 & 0 & 1 \\ 1 & 0 & 1 \end{pmatrix},$$
we have $T^{-1}AT = J$.

The Matrix Exponential

Recall that the unique solution of the initial value problem
$$\begin{cases} x' = Ax, \\ x(0) = x_0, \end{cases}$$
is given by $x(t) = e^{tA}x_0$. If $J$ is the normal form of $A$ and $A = TJT^{-1}$, we obtain that
$$e^{tA} = Te^{tJ}T^{-1}, \qquad (5)$$
where
$$e^{tJ} = \begin{pmatrix} e^{tJ_1} & & \\ & \ddots & \\ & & e^{tJ_k} \end{pmatrix}$$
and
$$e^{tJ_i} = e^{\lambda_i t}\left(I + tN + \cdots + \frac{t^{m_i-1}}{(m_i-1)!}N^{m_i-1}\right).$$
Here we don't require that the $\lambda_i$ are distinct. In general, the solution of the initial value problem will be a sum of terms of the form $t^j e^{\lambda_i t}$. If $A$ has a basis of eigenvectors, there will only be terms of the form $e^{\lambda_i t}$.
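Identity (5) can be checked symbolically on the matrix of Example 3. A sketch, assuming SymPy, whose `Matrix.exp` computes the matrix exponential directly:

```python
# A sketch, assuming SymPy: exp(tA) computed directly agrees with
# T exp(tJ) T**(-1), i.e. identity (5), for the matrix of Example 3.
from sympy import Matrix, symbols, simplify

t = symbols('t')
A = Matrix([[ 1, 0,  1],
            [ 0, 2,  0],
            [-1, 0, -1]])
P, J = A.jordan_form()               # P plays the role of T
lhs = (A * t).exp()                  # exp(tA), computed directly
rhs = P * (J * t).exp() * P.inv()    # T exp(tJ) T**(-1)
assert simplify(lhs - rhs).is_zero_matrix
print(lhs)                           # entries combine t**j and exp(2*t) terms
```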
While we now have an algorithm for computing the matrix exponential, it involves finding the generalized eigenvectors of $A$, and in the end one also has to invert the matrix $T$. There are a number of alternative ways of computing the matrix exponential which avoid the Jordan normal form, most of which are based on the Cayley–Hamilton theorem. Note that we should be able to express $e^{tA} = \sum_{j=0}^{\infty} \frac{t^j}{j!}A^j$ for each $t$ as a linear combination of $I,\dots,A^{n-1}$, since the Cayley–Hamilton theorem allows us to express any higher power of $A$ in terms of these matrices. We leave the proof of the following theorem as an exercise.
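The theorem itself is not included in this excerpt, but the Cayley–Hamilton idea can be illustrated on the matrix of Example 2. There $p_{\mathrm{char}}(A) = (A-2I)^2 = 0$ gives $A^2 = 4A - 4I$, and summing the exponential series collapses $e^{tA}$ to a combination of $I$ and $A$: $e^{tA} = e^{2t}(I + t(A-2I))$. A sketch, assuming SymPy:

```python
# A sketch, assuming SymPy: for Example 2's matrix, Cayley-Hamilton
# reduces exp(tA) to the closed form exp(2t) * (I + t*(A - 2I)).
from sympy import Matrix, symbols, exp, eye, simplify

t = symbols('t')
A = Matrix([[2, 1],
            [0, 2]])
closed_form = exp(2 * t) * (eye(2) + t * (A - 2 * eye(2)))
assert simplify((A * t).exp() - closed_form).is_zero_matrix
print(closed_form)                   # [[exp(2t), t*exp(2t)], [0, exp(2t)]]
```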
