ebook img

Simultaneous computation of the row and column rank profiles PDF

0.51 MB·English
Save to my drive
Quick download
Download
Most books are stored in the elastic cloud where traffic is expensive. For this reason, we have a limit on daily download.

Preview Simultaneous computation of the row and column rank profiles

Simultaneous computation of the row and column rank profiles 3 Jean-Guillaume Dumas∗‡, Cl´ement Pernet†‡and Ziad Sultan∗†‡ 1 0 January 21, 2013 2 n a J Abstract 8 1 Gaussian elimination with full pivoting generates a PLUQ matrix de- composition. Dependingonthestrategyusedinthesearchforpivots,the ] permutation matrices can reveal some information about the row or the A column rank profiles of the matrix. We propose a new pivoting strategy N that makes it possible to recover at the same time both row and column . rank profiles of the input matrix and of any of its leading sub-matrices. s c Weproposearank-sensitiveandquad-recursivealgorithmthatcomputes [ the latter PLUQ triangular decomposition of an m×n matrix of rank r in O(cid:0)mnrω−2(cid:1) field operations, with ω the exponent of matrix multipli- 1 cation. Compared to the LEU decomposition by Malashonock, sharing a v similar recursive structure, its time complexity is rank sensitive and has 8 alowerleadingconstant. Overawordsizefinitefield,thisalgorithmalso 3 4 improveLsthepracticalefficiencyofpreviouslyknownimplementations. 4 . 1 1 Introduction 0 3 Triangularmatrixdecompositionisafundamentalbuildingblockincom- 1 putational linear algebra. It is used to solve linear systems, compute the : v rank,thedeterminant,thenullspaceortherowandcolumnrankprofiles i of a matrix. The LU decomposition, defined for matrices whose leading X principal minors are all nonsingular, can be generalized to arbitrary di- r mensions and ranks by introducing pivoting on sides, leading e.g. to the a LQUP decomposition of [6] or the PLUQ decomposition [5, 8]. Many al- gorithmic variants exist allowing fraction free computations [8], in-place computations [2, 7] or sub-cubic rank-sensitive time complexity [11, 7]. More precisely, the pivoting strategy reflected by the permutation matri- ces P and Q is the key difference between these PLUQ decompositions. ∗Universit´e de Grenoble. Laboratoire LJK, umr CNRS, INRIA, UJF, UPMF, GINP. 51, av. desMath´ematiques,F38041Grenoble,France. †Universit´e de Grenoble. Laboratoire LIG, umr CNRS, INRIA, UJF, UPMF, GINP. 51, av. J.Kuntzmann,F38330MontbonnotSt-Martin,France. ‡[email protected],[email protected],[email protected]. 1 In numerical linear algebra [5], pivoting is used to ensure a good numer- ical stability, good data locality, and reduce the fill-in. In the context of exactlinearalgebra,theroleofpivotingdiffers. Indeed,onlycertainpiv- otingstrategiesforthesedecompositionswillrevealtherankprofileofthe matrix. The latter is crucial in many applications using exact Gaussian elimination, such as Gro¨bner basis computations [4] and computational number theory [10]. Therow rank profileofanm×nmatrixwithrankr isalexicographi- callysmallestsequenceofr rowindicessuchthatthecorrespondingrows of the matrix are linearly independent. Similarly the column rank profile isalexicographicallysmallestsequenceofr columnindicessuchthatthe corresponding rows of the matrix are linearly independent. Thecommonstrategytocomputetherowrankprofileistosearchfor pivotsinarow-majorfashion: exploringthecurrentrow,thenmovingto thenextrowonlyifthecurrentrowiszero. SuchaPLUQdecomposition can be transformed into a CUP decomposition (where C =PL is in col- umnechelonform)andthefirstr valuesofthepermutationassociatedto P areexactlytherowrankprofile[7]. Ablockrecursivealgorithmcanbe derivedfromthisschemebysplittingtherowdimension[6]. Similarly,the columnrankprofilecanbeobtainedinacolumnmajorsearch: exploring the current column, and moving to the next column only if the current one is zero. The PLUQ decomposition can be transformed into a PLE decomposition (where E = UQ is in row echelon form) and the first r values of Q are exactly the column rank profile [7]. The corresponding block recursive algorithm uses a splitting of the column dimension. Recursive elimination algorithms splitting both row and column di- mensions include the TURBO algorithm [3] and the LEU decomposi- tion [9]. No connection is made to the computation of the rank profiles in any of them. The TURBO algorithm does not compute the lower tri- angular matrix L and performs five recursive calls. It therefore implies an arithmetic overhead compared to classic Gaussian elimination. The LEU decomposition aims at reducing the amount of permutations and therefore also uses many additional matrix products. As a consequence its time complexity is not rank-sensitive. WeproposehereapivotingstrategyfollowingaZ-curvestructureand workingonanincrementallygrowingleadingsub-matrix. Thisstrategyis firstusedinarecursivealgorithmsplittingbothrowsandcolumnswhich recoverssimultaneouslybothrowandcolumnrankprofiles. Moreover,the row and column rank profiles of any leading sub-matrix can be deduced from the P and Q permutations. We show that the arithmetic cost of this algorithm remains rank sensitive of the form O(mnrω−2) where ω is the exponent of matrix multiplication. The best currently known upper bound for ω is 2.3727 [12]. As for the CUP and PLE decompositions, thisPLUQdecompositioncanbecomputedin-place. Wealsoproposean iterative variant, to be used as a base-case. Compared to the CUP and PLE decompositions, this new algorithm has the following new salient features: • it computes simultaneously both rank profiles at the cost of one, • itpreservesthesquarenessofthematrixpassedtotherecursivecalls, 2 thusallowingmoreefficientuseofthematrixmultiplicationbuilding block, • it reduces the number of modular reductions in a finite field, • aCUPandaPLEdecompositionscanbeobtainedfromit,withrow and column permutations only. Compared to the LEU decomposition, • it is in-place, • itstimecomplexityboundisranksensitiveandhasabetterleading constant, • aLEUdecompositioncanbeobtainedfromit,withrowandcolumn permutations. In Section 2 we present the new block recursive algorithm. Section 3 shows the connection with the LEU decomposition and section 4 states themainpropertyaboutrankprofiles. Wethenanalyzethecomplexityof thenewalgorithmintermsofarithmeticoperations: firstweprovethatit isranksensitiveinSection5andsecondweshowinsection6that,overa finite field, it reduces the number of modular reductions when compared to state of the art techniques. We then propose an iterative variant in Section 7 to be used as a base-case to terminate the recursion before the dimensionsgettoosmall. Experimentscomparingcomputationtimeand cache efficiency are presented in section 8. 2 A recursive PLUQ algorithm We first recall the name of the main sub-routines being used: MM stands for matrix multiplication, TRSM for triangular system solving with matrix unknown (left and right variants are implicitly indicated by the param- eter list), PermC for matrix column permutation, PermR for matrix row permutation, etc. For instance, we will use: MM(C,A,B) to denote C ←C−AB, TRSM(U,B) for B ←U−1B with U upper triangular, TRSM(B,L) for B ←BL−1 with L lower triangular. We also denote by T the transposition of indices k and l and by L\U, k,l the storage of the two triangular matrices L and U one above the other. Furtherdetailsonthesesubroutinesandnotationscanbefoundin[7]. In block decompositions, we allow for zero dimensions. By convention, the productofanym×0matrixbyan0×nmatrixisthem×nzeromatrix. We now present the block recursive algorithm 1, computing a PLUQ decomposition. 3 Algorithm 1 PLUQ Input: A=(a ) a m×n matrix over a field ij Output: P,Q: m×m and n×n permutation matrices Output: r: the rank of A (cid:20) (cid:21) L\U V Output: A ← where L is r ×r unit lower triangular, U is r ×r M 0 upper triangular, and (cid:20) (cid:21) L (cid:2) (cid:3) A=P U V Q. M if m=1 then (cid:2) (cid:3) (cid:2) (cid:3) if A= 0 ... 0 then P ← 1 ,Q←I ,r ←0 n else i← the col. index of the first non zero elt. of A (cid:2) (cid:3) P ← 1 ;Q←T ,r ←1 1,i Swap a and a 1,i 1,1 end if Return (P,Q,r,A) end if if n=1 then (cid:2) (cid:3)T (cid:2) (cid:3) if A= 0 ... 0 then P ←I ;Q← 1 ,r ←0 m else i← the row index of the first non zero elt. of A (cid:2) (cid:3) P ← 1 ,Q←T ,r ←1 1,i Swap a and a i,1 1,1 for j =i+1...m do a ←a a−1 j,1 j,1 1,1 end for end if Return (P,Q,r,A) end if (cid:46) the trailing parts of the algorithm are shown on next pages Itisbasedonasplittingofthematrixinfourquadrants. Afirstrecur- sivecallisdoneontheupperleftquadrantfollowedbyaseriesofupdates. Then two recursive calls can be made on the anti-diagonal quadrants if the first quadrant exposed some rank deficiency. After a last series of updates, a fourth recursive call is done on the bottom right quadrant. Figure 1 illustrates the position of the blocks computed in the course of algorithm1,beforeandafterthefinalpermutationwithmatricesSandT. This framework differs from the one in [3] by the order in which the quadrants are treated, leading to only four recursive calls in this case insteadoffivein[3]. Wewillshowinsection4thatthisfacttogetherwith the special form of the block permutations S and T makes it possible to recoverrankprofileinformation. Thecorrectnessofalgorithm1isproven in appendix A. Remark 1. Algorithm 1 is in-place (as defined in [7, Definition 1]): all 4 (cid:20) (cid:21) A A (cid:46) Splitting A= 1 2 where A is (cid:98)m(cid:99)×(cid:98)n(cid:99). A A 1 2 2 3 4 (cid:20) (cid:21) Decompose A =P L1 (cid:2)U V (cid:3)Q (cid:46) PLUQ(A ) 1 1 M 1 1 1 1 1 (cid:20) (cid:21) B 1 ←PTA (cid:46) PermR(A ,PT) B 1 2 2 1 2 (cid:2)C C (cid:3)←A QT (cid:46) PermC(A ,QT) 1 2 3 1 3 1   L \U V B 1 1 1 1 Here A= M1 0 B2 . C C A 1 2 4 D ←L−1B (cid:46) TRSM(L ,B ) 1 1 1 1 E ←C U−1 (cid:46) TRSM(C ,U ) 1 1 1 1 F ←B −M D (cid:46) MM(B ,M ,D) 2 1 2 1 G←C −EV (cid:46) MM(C ,E,V ) 2 1 2 1 H ←A −ED (cid:46) MM(A ,E,D) 4 4   L \U V D 1 1 1 Here A= M1 0 F . E G H (cid:20) (cid:21) Decompose F =P L2 (cid:2)U V (cid:3)Q (cid:46) PLUQ(F) 2 M 2 2 2 2 (cid:20) (cid:21) Decompose G=P L3 (cid:2)U V (cid:3)Q (cid:46) PLUQ(G) 3 M 3 3 3 3 (cid:20) (cid:21) H H 1 2 ←PTHQT (cid:46) PermR(H,PT);PermC(H,QT) H H 3 2 3 2 3 4 (cid:20) (cid:21) E 1 ←PTE (cid:46) PermR(E,PT) E 3 3 2 (cid:20) (cid:21) M 11 ←PTM (cid:46) PermR(M ,PT) M 2 1 1 2 12 (cid:2)D D (cid:3)←DQT (cid:46) PermR(D,QT) 1 2 2 2 (cid:2)V V (cid:3)←V QT (cid:46) PermR(V ,QT) 11 12 1 3 1 3 5   L \U V V D D 1 1 11 12 1 2  M11 0 0 L2\U2 V2    Here A= M12 0 0 M2 0 .    E1 L3\U3 V3 H1 H2  E M 0 H H 2 3 3 4 I ←H U−1 (cid:46) TRSM(H ,U ) 1 2 1 2 J ←L−1I (cid:46) TRSM(L ,I) 3 3 K ←H U−1 (cid:46) TRSM(H ,U ) 3 2 3 2 N ←L−1H (cid:46) TRSM(L ,H ) 3 2 3 2 O ←N −JV (cid:46) MM(N,J,V ) 2 2 R←H −KV −M O (cid:46) MM(H ,K,V );MM(H ,M ,O) 4 2 3 4 2 4 3 (cid:20) (cid:21) Decompose R=P L4 (cid:2)U V (cid:3)Q (cid:46) PLUQ(R) 4 M 4 4 4 4 (cid:20) (cid:21) E21 M31 0 K1 ←PT (cid:2)E M 0 K(cid:3) (cid:46) PermR E M 0 K 4 2 3 22 32 2     D D D 21 22 2 V21 V22←V2QT (cid:46) PermC  0 0   0  4 O O O 1 2   L \U V V D D D 1 1 11 12 1 21 22  M11 0 0 L2\U2 V21 V22    Here A= M12 0 0 M2 0 0 .  E1 L3\U3 V3 I O1 O2     E21 M31 0 K1 L4\U4 V4  E M 0 K M 0 22 32 2 4   I r1+r2 S ← Ik−r1−r2   Ir3+r4  I  m−k−r3−r4  I r1  Ir2  T ← Ir3   Ir4   Ik−r1−r3  I (cid:20) (cid:21) (cid:20) (cid:21)n−k−r2−r4 I I P ←Diag(P r1 ,P r3 )S 1 P 3 P 2 4 (cid:20) (cid:21) (cid:20) (cid:21) I I Q←TDiag( r1 Q , r2 Q ) Q 1 Q 2 3 4 A←STATT (cid:46) PermR(A,ST);PermC(A,TT)   L \U D V D V D 1 1 1 11 21 12 22  M11 L2\U2 0 V21 0 V22   Here A= E1 I L3\U3 O1 V3 O2  E21 K1 M31 L4\U4 0 V4     M12 M2 0 0 0 0  E K M M 0 0 22 2 32 4 Return (P,Q,r +r +r +r ,A) 1 2 3 4 6 Figure 1: Block recursive Z-curve PLUQ decomposition and final block permu- tation. operationsoftheTRSM,MM,PermC,PermRsubroutinesworkwithO(1)extra memory allocations except possibly in the course of fast matrix multipli- cations. The only constraint is for the computation of J ← L−1I which 3 wouldoverwritethematrixI thatshouldbekeptforthefinaloutput. Hence acopyofI hastobestoredforthecomputationofJ. ThematrixI hasdi- mensionr ×r andcanbestoredtransposedinthezeroblockoftheupper 3 2 left quadrant (of dimension (m −r )×(n −r ), as shown on Figure 1). 2 1 2 1 3 From PLUQ to LEU We now show how to compute the LEU decomposition of [9] from the PLUQ decomposition. The idea is to write (cid:20) (cid:21) (cid:20) (cid:21) (cid:20) (cid:21) (cid:20) (cid:21) P L (cid:2)UV(cid:3)Q=P L 0 PT P Ir QQT U V Q M MI 0 I m−r n−r (cid:124) (cid:123)(cid:122) (cid:125)(cid:124) (cid:123)(cid:122) (cid:125)(cid:124) (cid:123)(cid:122) (cid:125) L E U andshowthatLandU arerespectivelyloweranduppertriangular. This is not true in general, but turns out to be satisfied by the P,L,M,U,V and Q obtained in algorithm 1. (cid:20) (cid:21) L (cid:2) (cid:3) Theorem 1. Let A = P UV Q be the PLUQ decomposition com- M puted by algorithm 1. Then for any unit lower triangular matrix Y and (cid:20) (cid:21) L any upper triangular matrix Z, the matrix P PT is unit lower tri- MY (cid:20) (cid:21) UV angular and QT Q is upper triangular. Z Proof. Proceedingbyinduction,weassumethatthetheoremistrueonall four recursive calls, and show that it is true for the matrices P[L ]PT (cid:20) (cid:21) M Y Y and QT [U V ]Q. Let Y = 1 where Y is unit lower triangular of Z Y Y 1 2 3 7 dimension k − r − r . From the correctness of algorithm 1 (see e.g. 1 2  L  1 M L  11 2  (cid:20)L (cid:21) M M Y  Equation A), S ST = 12 2 1  MY E I L   1 3  E K M L  21 1 31 4 E K Y M M Y 22 2 2 32 4 3 (cid:20) (cid:21) L Hence P PT equals MY  L  1 I  M L (cid:20)P (cid:21) r1P M11M2Y  1  2  12 2 1 × P  I E I L  3 r3  1 3  P4 E21K1 M31L4  E K Y M M Y 22 2 2 32 4 3 I  r1PT (cid:20)PT (cid:21)  2  1  Ir3  P3T PT 4 (cid:20) (cid:21) (cid:20) (cid:21) L L Byinductionhypothesis,thematricesL =P 2 PT,L =P 4 PT 2 2 M Y 2 4 4 M Y 4 2 1 4 3 (cid:20) (cid:21) (cid:20) (cid:21) L L , P 1 PT and P 3 PT are unit lower triangular. Therefore 1 M L 1 3 M L 3 1 2 3 4 the matrix P[L ]PT is also unit lower triangular. M Y (cid:20) (cid:21) Z Z Similarly, let Z = 1 2 where Z is upper triangular of dimension Z 1 3 (cid:20) (cid:21) UV k−r −r . The matrix TT T equals 1 2 Z U V V D D D  U V V D D D  1 11 12 1 21 22 1 11 12 1 21 22 0 0 U V V U V O O  2 21 22  3 3 1 2  U V 0 O O   Z Z  TT  3 3 1 2= 1 2   0 U V   0 0 U V V   4 4   2 21 22  Z1 Z2   0 U4 V4  Z Z 3 3 (cid:20) (cid:21) UV Hence QT Q equals Z U V V D D D  1 11 12 1 21 22 (cid:20)QT (cid:21)Ir1QT  U3 ZV3 O1 OZ2 1  3  1 2 × QT2  Ir2  0 0 U2V21V22 QT4  0 U4 V4  Z 3 I  r1 (cid:20) (cid:21) Q Q  3  1 .  I  Q r2 2 Q 4 8 (cid:20) (cid:21) (cid:20) (cid:21) U V U V Byinductionhypothesis,thematricesU =QT 3 3 Q ,U =QT 4 4 PT 3 3 Z 3 4 4 Z 4 1 3 (cid:20) (cid:21) (cid:20) (cid:21) U V U V , QT 1 1 Q and QT 2 2 Q are upper triangular. Consequently 1 U 1 2 U 2 3 4 the matrix QT [U V ]Q is upper triangular. Z Forthebasecasewithm=1. ThematrixLhasdimension1×1and is unit lower triangular. If r = 0, then U = ITZI is upper triangular. n n If r = 1, then Q = T where i is the column index of the pivot and is 1,i (cid:2) (cid:3) therefore the column index of the leading coefficient of the row UV Q. Applying QT on the left only swaps rows 1 and i, hence row (cid:2)UV(cid:3)Q is (cid:20) (cid:21) UV the ith row of QT Q. The latter is therefore upper triangular. The Z same reasoning can be applied to the case n=1. (cid:20) (cid:21) (cid:20) (cid:21) (cid:20) (cid:21) L I UV Corrolary1. LetL=P PT,E =P r QandU =QT Q. MI 0 0 m−r Then A=LEU is a LEU decomposition of A. Remark2. Theconverseisnotalwayspossible: givenA=L,E,U,there are several ways to choose the last m−r columns of P and the last n−r rows of Q. The LEU algorithm does not keep track of these parts of the permutations. 4 Computing the rank profiles We prove here the main feature of the PLUQ decomposition computed by algorithm 1: it reveals the row and column rank profiles of all leading sub-matrices of the input matrix. We recall in Lemma 1 basic properties verified by the rank profiles. Lemma 1. For any matrix, 1. the row rank profile is preserved by right multiplication with an in- vertible matrix and by left multiplication with an invertible upper triangular matrix. 2. the column rank profile is preserved by left multiplication with an invertiblematrixand byrightmultiplicationwithaninvertible lower triangular matrix. Lemma 2. Let A = PLUQ be the PLUQ decomposition computed by al- gorithm 1. Then the row (resp. column) rank profile of any leading (k,t) submatrix of A is the row (resp. column) rank profile of the leading (k,t) (cid:20) (cid:21) I submatrix of P r Q. 0 Proof. With the notations of corollary 1, we have: (cid:20) (cid:21)(cid:20) (cid:21)(cid:20) (cid:21) (cid:20) (cid:21) L I U V I A=P r Q=LP r QU MI 0 I 0 m−r n−r Hence (cid:20) (cid:21) (cid:20) (cid:21) (cid:2)I 0(cid:3)A It =L (cid:2)I 0(cid:3)P Ir QU , k 0 1 k 0 1 9 where L is the k×k leading submatrix of L (hence it is an invertible 1 lowertriangularmatrix)andU isthet×tleadingsubmatrixofU (hence 1 it is an invertible upper triangular matrix). Now, Lemma 1 implies that (cid:20) (cid:21) (cid:20) (cid:21) (cid:20) (cid:21) the rank profile of (cid:2)I 0(cid:3)A It is that of (cid:2)I 0(cid:3)P Ir Q It . k 0 k 0 0 Fromthislemmawededucehowtocomputetherowandcolumnrank profilesofany(k,t)leadingsubmatrixandmoreparticularlyofthematrix A itself. Corrolary 2. Let A = PLUQ be the PLUQ decomposition of a m×n matrix computed by algorithm 1. The row (resp. column) rank profile of any(k,t)-leadingsubmatrixofaAisthesortedsequenceoftherow(resp. column) indices of the non zero rows (resp. columns) in the matrix (cid:20) (cid:21) (cid:20) (cid:21) R=(cid:2)I 0(cid:3)P Ir Q It k 0 0 Corrolary 3. The row (resp. column) rank profile of A is the sorted se- quenceofrow(resp. column)indicesofthenonzerorows(resp. columns) of the first r columns of P (resp. first r rows of Q). 5 Complexity analysis Westudyherethetimecomplexityofalgorihtm1bycountingthenumber of field operations. For the sake of simplicity, we will assume here that the dimensions m and n are powers of two. The analysis can easily be extended to the general case for arbitrary m and n. For i=1,2,3,4 we denote by T the cost of the i-th recursive call to i PLUQ, on a m × n matrix of rank r . We also denote by T (m,n) the 2 2 i TRSM cost of a call TRSM on a rectangular matrix of dimensions m×n, and by T (m,k,n) the cost of multiplying an m×k by an k×n matrix. MM Theorem 2. Algorithm 1, run on an m×n matrix of rank r, performs O(cid:0)mnrω−2(cid:1) field operations. Proof. Let T = T (m,n,r) be the cost of algorithm 1 run on a m×n PLUQ matrix of rank r. From the complexities of the subroutines given, e.g., 10

See more

The list of books you might like

Most books are stored in the elastic cloud where traffic is expensive. For this reason, we have a limit on daily download.