Faster Inversion and Other Black Box Matrix Computations Using Efficient Block Projections*

Wayne Eberly (1), Mark Giesbrecht (2), Pascal Giorgi (2,4), Arne Storjohann (2), Gilles Villard (3)

(1) Department of Computer Science, U. Calgary — http://pages.cpsc.ucalgary.ca/~eberly
(2) David R. Cheriton School of Computer Science, U. Waterloo — http://www.uwaterloo.ca/~{mwg,pgiorgi,astorjoh}
(3) CNRS, LIP, École Normale Supérieure de Lyon — http://perso.ens-lyon.fr/gilles.villard
(4) IUT — Université de Perpignan — http://webdali.univ-perp.fr/~pgiorgi

ABSTRACT

Efficient block projections of non-singular matrices have recently been used, in [11], to obtain an efficient algorithm to find rational solutions for sparse systems of linear equations. In particular, a bound of O~(n^2.5) machine operations is obtained for this computation, assuming that the input matrix can be multiplied by a vector with constant-sized entries using O~(n) machine operations. Somewhat more general bounds for black-box matrix computations are also derived. Unfortunately, the correctness of that algorithm depends on the existence of efficient block projections of non-singular matrices, which had been conjectured but not proved.

In this paper we establish the correctness of the algorithm from [11] by proving the existence of efficient block projections for arbitrary non-singular matrices over sufficiently large fields. We further demonstrate the usefulness of these projections by incorporating them into existing black-box matrix algorithms to derive improved bounds for the cost of several matrix problems — considering, in particular, "sparse" matrices that can be multiplied by a vector using O~(n) field operations. We show how to compute the dense inverse of a sparse matrix over any field using an expected number of O~(n^2.27) operations in that field. A basis for the nullspace of a sparse matrix — and a certification of its rank — are obtained at the same cost. An application of this technique to Kaltofen and Villard's baby-steps/giant-steps algorithms for the determinant and Smith form of an integer matrix is also sketched, yielding algorithms requiring O~(n^2.66) machine operations. More general bounds involving the number of black-box matrix operations to be used are also obtained.

The derived algorithms are all probabilistic of the Las Vegas type. That is, they are assumed to be able to generate random elements from the field at unit cost, and they always output the correct answer in the expected time given.

1. INTRODUCTION

In our paper [11] we presented an algorithm which purportedly solved a sparse system of rational equations considerably more efficiently than standard linear system solving. Unfortunately, its effectiveness in all cases was conjectural, even though its complexity and actual performance were very appealing. This effectiveness relied on a conjecture regarding the existence of so-called efficient block projections. Given a matrix A ∈ F^{n×n} over any field F, these projections should be block vectors u ∈ F^{n×s} (where s is a blocking factor dividing n, so n = ms) such that we can compute u^T v or v^T u quickly for any v ∈ F^{n×s}, and such that the sequence of block vectors u, Au, ..., A^{m−1}u has rank n.

In this paper, we prove the existence of a class of such efficient block projections for non-singular n×n matrices over sufficiently large fields — we require that the size of the field F exceed n(n+1). This implies that our algorithm from [11] for finding the solution A^{-1}b of a "sparse" system of equations, with A ∈ Z^{n×n} and b ∈ Z^{n×1}, works as stated using these projections, and requires fewer bit operations than any previously known method when A is sparse, at least using "standard" (i.e., cubic) matrix arithmetic.

* This material is based on work supported in part by the French National Research Agency (ANR Gecko, Villard), and by the Natural Sciences and Engineering Research Council (NSERC) of Canada (Eberly, Giesbrecht, Storjohann).
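As an illustration of the definition just given (this is our own sketch, not code from the paper, and all helper names and the choice p = 97 are ours), the defining property of a block projection — that the block Krylov matrix [u, Au, ..., A^{m−1}u] has full rank n — can be checked directly for small matrices over a prime field:

```python
# Illustrative sketch (not from the paper): test whether u is a block
# projection for A over GF(p), i.e. whether the block Krylov matrix
# [u, Au, ..., A^{m-1}u] has full rank n.  All names here are ours.

p = 97  # small prime field for the demo; the paper's existence proof
        # needs #F > n(n+1), which 97 satisfies for n = 4

def mat_mul(A, B, p):
    """(len(A) x len(B)) times (len(B) x len(B[0])) product mod p."""
    return [[sum(A[i][t] * B[t][j] for t in range(len(B))) % p
             for j in range(len(B[0]))] for i in range(len(A))]

def rank_mod_p(M, p):
    """Rank of M over GF(p) by Gaussian elimination."""
    M = [row[:] for row in M]
    r = 0
    for c in range(len(M[0])):
        piv = next((i for i in range(r, len(M)) if M[i][c] % p != 0), None)
        if piv is None:
            continue
        M[r], M[piv] = M[piv], M[r]
        inv = pow(M[r][c], p - 2, p)          # modular inverse of the pivot
        M[r] = [x * inv % p for x in M[r]]
        for i in range(len(M)):
            if i != r and M[i][c] % p != 0:
                f = M[i][c]
                M[i] = [(x - f * y) % p for x, y in zip(M[i], M[r])]
        r += 1
    return r

def is_block_projection(A, u, p):
    n, s = len(A), len(u[0])
    m = n // s
    blocks, blk = [], u
    for _ in range(m):                        # u, Au, ..., A^{m-1}u
        blocks.append(blk)
        blk = mat_mul(A, blk, p)
    # paste the m n-by-s blocks side by side into an n-by-n matrix
    krylov = [sum((b[i] for b in blocks), []) for i in range(n)]
    return rank_mod_p(krylov, p) == n

# s = 1, m = 4: for the cyclic shift matrix, e1 generates everything ...
shift = [[0, 0, 0, 1], [1, 0, 0, 0], [0, 1, 0, 0], [0, 0, 1, 0]]
e1 = [[1], [0], [0], [0]]
print(is_block_projection(shift, e1, p))      # True
# ... but for the identity matrix the Krylov blocks all repeat:
ident = [[int(i == j) for j in range(4)] for i in range(4)]
print(is_block_projection(ident, e1, p))      # False
```

The contrast between the two calls shows why the property is non-trivial: it depends jointly on A and on the chosen u.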
Here, by A being sparse, we mean that it has a fast matrix-vector product modulo any small (machine-word size) prime p. In particular, our algorithm requires an expected O~(n^1.5 (log(||A|| + ||b||))) matrix-vector products v ↦ Av mod p for v ∈ Z_p^{n×1}, plus an additional O~(n^2.5 (log ||A|| + log ||b||)) bit operations. The algorithm is probabilistic of the Las Vegas type. That is, it assumes the ability to generate random bits at unit cost, and it always returns the correct answer with controllably high probability. When φ(n) = O~(n), the implied cost of O~(n^2.5) bit operations improves upon the p-adic lifting method of [7], which requires O~(n^3) bit operations for sparse or dense matrices. This theoretical efficiency was reflected in practice in [11], at least for large matrices.

We present several other rather surprising applications of this technique. Each incorporates the technique into an existing algorithm in order to reduce the asymptotic complexity for the sparse matrix problem to be solved. In particular, given a matrix A ∈ F^{n×n} over an arbitrary field F, we are able to compute the complete inverse of A with O~(n^{3−1/(ω−1)}) operations in F plus O~(n^{2−1/(ω−1)}) matrix-vector products by A. Here ω is such that we can multiply two n×n matrices with O(n^ω) operations in F. Standard matrix multiplication gives ω = 3, while the best known matrix multiplication of Coppersmith & Winograd [6] has ω = 2.376. If again we can compute v ↦ Av with O~(n) operations in F, this implies an algorithm to compute the inverse with O~(n^{3−1/(ω−1)}) operations in F. This is always less than O(n^ω), and in particular equals O~(n^2.27) operations in F for the best known ω of [6]. Other relatively straightforward applications of these techniques yield algorithms for the full nullspace and (certified) rank with this same cost. Finally, we sketch how these methods can be employed in the algorithms of Kaltofen and Villard [20] and [14] to compute the determinant and Smith form of sparse matrices more efficiently.

There has certainly been much important work done on finding exact solutions to sparse rational systems prior to [11]. Dixon's p-adic lifting algorithm [7] performs extremely well in practice for dense and sparse linear systems, and is implemented efficiently in LinBox [8], Maple and Magma (see [11] for a comparison). Kaltofen & Saunders [18] were the first to propose using Krylov-type algorithms for these problems. Krylov-type methods are used to find Smith forms of sparse matrices and to solve Diophantine systems in parallel in [13, 14], and this is further developed in [20, 9]. See the references in these papers for a more complete history. For sparse systems over a field, the seminal work is that of Wiedemann [23], who shows how to solve sparse n×n systems over a field with O(n) matrix-vector products and O(n^2) other operations. This research is further developed in [18, 5, 17] and many other works. The bit complexity of similar operations for various families of structured matrices is examined in [12].

2. SPARSE BLOCK GENERATORS FOR THE KRYLOV SPACE

For now we will consider an arbitrary invertible matrix A ∈ F^{n×n} over a field F, and s an integer — the blocking factor — which divides n exactly. Let m = n/s. For a so-called block projection u ∈ F^{n×s} and 1 ≤ k ≤ m, we denote by K_k(A, u) the block Krylov matrix [u, Au, ..., A^{k−1}u] ∈ F^{n×ks}. Our goal is to show that K_m(A, u) ∈ F^{n×n} is non-singular, for a particularly simple and sparse u, assuming some properties of A.

Our factorization uses the special projection (which we will refer to as an efficient block projection)

    u = [ I_s
          ...
          I_s ]  ∈ F^{n×s},    (2.1)

which is comprised of m copies of I_s and thus has exactly n non-zero entries. A similar projection has been suggested in [11] without proof of its reliability (i.e., that the corresponding block Krylov matrix is non-singular). We establish here that it does yield a block Krylov matrix of full rank, and hence can be used for an efficient inverse of a sparse A.

Let D = diag(δ_1, ..., δ_1, δ_2, ..., δ_2, ..., δ_m, ..., δ_m) be an n×n diagonal matrix whose entries consist of m distinct indeterminates δ_i, each δ_i occurring s times.

Theorem 2.1. If the leading ks×ks minor of A is non-zero for 1 ≤ k ≤ m, then K_m(DAD, u) ∈ F^{n×n} is invertible.

Proof. Let B = DAD. For 1 ≤ k ≤ m, define B_k as the specialization of B obtained by setting δ_{k+1}, δ_{k+2}, ..., δ_m to zero. Then B_k is the matrix constructed by setting to zero the last n−ks rows and columns of B. Similarly, for 1 ≤ k ≤ m we define u_k ∈ F^{n×s} to be the matrix constructed from u by setting to zero the last n−ks rows. In particular we have B_m = B and u_m = u.

We proceed by induction on k and show that

    rank K_k(B_k, u_k) = ks,    (2.2)

for 1 ≤ k ≤ m. For the base case k = 1 we have K_1(B_1, u_1) = u_1, and thus rank K_1(B_1, u_1) = rank u_1 = s.

Now assume that (2.2) holds for some k with 1 ≤ k < m. By the definition of B_k and u_k, only the first ks rows of B_k and u_k will be involved in the left-hand side of (2.2). Similarly, only the first ks columns of B_k will be involved. Since by assumption on B_k the leading ks×ks minor is non-zero, we have rank K_k(B_k, B_k u_k) = ks, which is equivalent to rank B_k K_k(B_k, u_k) = ks. By the fact that the first ks rows of u_{k+1} − u_k are zero, we have B_k(u_{k+1} − u_k) = 0, or equivalently B_k u_k = B_k u_{k+1}, and hence

    rank K_k(B_k, B_k u_{k+1}) = ks.    (2.3)

The matrix in (2.3) can be written as

    K_k(B_k, B_k u_{k+1}) = [ M_k
                              0   ]  ∈ F^{n×ks},

where M_k ∈ F^{ks×ks} is non-singular. Introducing the block u_{k+1}, we obtain the following matrix:

    [u_{k+1}, K_k(B_k, B_k u_{k+1})] = [ *    M_k
                                         I_s  0
                                         0    0   ],    (2.4)

whose rank is (k+1)s. Noticing that

    [u_{k+1}, K_k(B_k, B_k u_{k+1})] = K_{k+1}(B_k, u_{k+1}),

we are led to rank K_{k+1}(B_k, u_{k+1}) = (k+1)s.

Finally, using the fact that B_k is the specialization of B_{k+1} obtained by setting δ_{k+1} to zero, we obtain

    rank K_{k+1}(B_{k+1}, u_{k+1}) = (k+1)s,

which is (2.2) for k+1, and thus establishes the theorem by induction.

If the leading ks×ks minor of A is non-zero, then the leading ks×ks minor of A^T is non-zero as well, for any integer k. This yields the following corollary.

Corollary 2.2. If the leading ks×ks minor of A is non-zero for 1 ≤ k ≤ m and B = DAD, then K_m(B^T, u) ∈ F^{n×n} is invertible.

Suppose now that A ∈ F^{n×n} is an arbitrary non-singular matrix and the size of F exceeds n(n+1). It follows from Theorem 2 of Kaltofen and Saunders [18] that there exists a lower triangular Toeplitz matrix L ∈ F^{n×n} and an upper triangular Toeplitz matrix U ∈ F^{n×n} such that each of the leading minors of Â = UAL is non-zero. Let B̂ = DÂD; the product of the determinants of the matrices K_m(B̂, u) and K_m(B̂^T, u) (mentioned in the above theorem and corollary) is a polynomial with total degree less than 2n(m−1) < n(n+1) (if m ≠ 1). In this case it follows that there is also a non-singular diagonal matrix D ∈ F^{n×n} such that K_m(B, u) and K_m(B^T, u) are non-singular, for

    B = DÂD = DUALD.

Now let R = LD²U ∈ F^{n×n}, û ∈ F^{s×n} and v̂ ∈ F^{n×s} be such that

    û^T = (L^T)^{-1} D^{-1} u   and   v̂ = LDu.

Then

    K_m(RA, v̂) = LD · K_m(B, u)   and   L^T D · K_m((RA)^T, û^T) = K_m(B^T, u),

so that K_m(RA, v̂) and K_m((RA)^T, û^T) are each non-singular as well. Since D is diagonal and U and L are triangular Toeplitz matrices, it is now easily established that (R, û, v̂) is an "efficient block projection" for the given matrix A, as defined in [11]. This proves Conjecture 2.1 of [11] for the case that the size of F exceeds n(n+1):

Corollary 2.3. For any non-singular A ∈ F^{n×n} and s | n (over a field of size greater than n(n+1)) there exists an efficient block projection (R, u, v) ∈ F^{n×n} × F^{s×n} × F^{n×s}.
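A small numerical check of Theorem 2.1 (our own sketch, not the paper's code): take an A whose leading ks×ks minors are all non-zero (a unit upper triangular A has every leading minor equal to 1), a diagonal D = diag(d_1, d_1, d_2, d_2) with non-zero d_i, and the projection u of (2.1); then K_m(DAD, u) should be non-singular over GF(p) for a generic choice of the d_i. The concrete values p = 97, d = (2, 3) are ours:

```python
# Sketch (ours, not the paper's): Theorem 2.1 for n = 4, s = 2, m = 2
# over GF(97).  A is unit upper triangular, so every leading minor is 1.

p = 97
n, s = 4, 2
m = n // s

A = [[1, 1, 0, 0],
     [0, 1, 1, 0],
     [0, 0, 1, 1],
     [0, 0, 0, 1]]

d = [2, 3]                      # one non-zero d_i per s-block
D = [[d[i // s] if i == j else 0 for j in range(n)] for i in range(n)]

def mat_mul(X, Y):
    return [[sum(X[i][t] * Y[t][j] for t in range(len(Y))) % p
             for j in range(len(Y[0]))] for i in range(len(X))]

B = mat_mul(mat_mul(D, A), D)   # B = DAD

# u = m stacked copies of I_s: exactly n non-zero entries  (eq. (2.1))
u = [[int(i % s == j) for j in range(s)] for i in range(n)]

# K_m(B, u) = [u, Bu]
Bu = mat_mul(B, u)
K = [u[i] + Bu[i] for i in range(n)]

def rank_mod_p(M):
    M = [row[:] for row in M]
    r = 0
    for c in range(len(M[0])):
        piv = next((i for i in range(r, len(M)) if M[i][c] % p), None)
        if piv is None:
            continue
        M[r], M[piv] = M[piv], M[r]
        inv = pow(M[r][c], p - 2, p)
        M[r] = [x * inv % p for x in M[r]]
        for i in range(len(M)):
            if i != r and M[i][c] % p:
                f = M[i][c]
                M[i] = [(x - f * y) % p for x, y in zip(M[i], M[r])]
        r += 1
    return r

print(rank_mod_p(K))   # 4: the block Krylov matrix is invertible
```

Here the Krylov matrix K has determinant 55 mod 97, so the specialization d = (2, 3) is already generic enough; the theorem guarantees that only specializations on a low-degree hypersurface can fail.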
3. FACTORIZATION OF THE MATRIX INVERSE

The existence of the efficient block projection established in the previous section allows us to define a useful factorization of the inverse of a matrix. This was used to obtain faster heuristics for solving sparse integer systems in [11]. The basis is the following factorization of the matrix inverse.

Let B = DAD, where D is an n×n diagonal matrix whose diagonal entries consist of m distinct indeterminates, each occurring s times contiguously, as previously. Define K_u^(r) = K_m(B, u) ∈ F^{n×n} and K_u^(ℓ) = K_m(B^T, u)^T ∈ F^{n×n} (where u is as in (2.1), and K_u^(r) and K_u^(ℓ) refer to projection on the right and left, respectively). For any 0 ≤ k ≤ m−1 and any two indices l and r such that l + r = k, we have u^T B^l · B^r u = u^T B^k u. Hence the matrix H_u = K_u^(ℓ) · B · K_u^(r) is block-Hankel with blocks of dimension s×s:

    H_u = [ u^T B u       u^T B^2 u   ...   u^T B^m u
            u^T B^2 u     u^T B^3 u   ...   u^T B^{m+1} u
            ...                       ...   ...
            u^T B^m u     ...   u^T B^{2m-2} u   u^T B^{2m-1} u ]  ∈ F^{n×n}.

From H_u = K_u^(ℓ) · B · K_u^(r) = K_u^(ℓ) · DAD · K_u^(r), we have the following corollary to Theorem 2.1, which (together with Corollary 2.2) implies that K_u^(r) and K_u^(ℓ), and thus H_u, are invertible.

Corollary 3.1. If A ∈ F^{n×n} is such that all leading ks×ks minors are non-singular, D is a diagonal matrix of indeterminates, and B = DAD, then B^{-1} and A^{-1} may be factored as

    B^{-1} = K_u^(r) H_u^{-1} K_u^(ℓ),
    A^{-1} = D K_u^(r) H_u^{-1} K_u^(ℓ) D,    (3.1)

where K_u^(ℓ) and K_u^(r) are block-Krylov as defined above, and H_u is block-Hankel (and invertible) with s×s blocks, as above.

We note that for any specialization of the indeterminates in D to field elements in F such that det H_u ≠ 0, we get a formula similar to (3.1) completely over F. A similar factorization in the non-blocked case is used in [10, (4.5)] for fast parallel matrix inversion.

4. BLACK-BOX MATRIX INVERSION OVER A FIELD

Let A ∈ F^{n×n} be invertible and such that for any v ∈ F^{n×1} the matrix-vector products Av or A^T v can be computed in φ(n) operations in F (where φ(n) ≥ n). Following Kaltofen, we call such matrix-vector and vector-matrix products black-box evaluations of A. In this section we show how to compute A^{-1} with O~(n^{2−1/(ω−1)}) black-box evaluations and an additional O~(n^{3−1/(ω−1)}) operations in F. Note that when φ(n) = O~(n) — a common characterization of "sparse" matrices — the exponent in n of this cost is smaller than ω, and is O~(n^2.273) with the currently best-known matrix multiplication.

We assume for the moment that all principal ks×ks minors of A are non-zero, 1 ≤ k ≤ m. Let δ_1, ..., δ_m be the indeterminates which form the diagonal entries of D, and let B = DAD. By Theorem 2.1 and Corollary 2.2, the matrices K_m(B, u) and K_m(B^T, u) are each invertible. If m ≥ 2, then the product of the determinants of these matrices is a non-zero polynomial ∆ ∈ F[δ_1, ..., δ_m] with total degree less than 2n(m−1). Suppose that F has at least 2n(m−1) elements. Then ∆ cannot be zero at all points in (F \ {0})^n. Let d_1, ..., d_m be non-zero elements of F such that ∆(d_1, ..., d_m) ≠ 0, let D = diag(d_1, ..., d_1, ..., d_m, ..., d_m), and let B = DAD. Then K_u^(r) = K_m(B, u) ∈ F^{n×n} and K_u^(ℓ) = K_m(B^T, u)^T ∈ F^{n×n} are invertible, since ∆(d_1, ..., d_m) ≠ 0; B is invertible, since A is and d_1, ..., d_m are all non-zero; and thus H_u = K_u^(ℓ) · B · K_u^(r) ∈ F^{n×n} is invertible as well. Correspondingly, (3.1) suggests

    B^{-1} = K_u^(r) H_u^{-1} K_u^(ℓ)   and   A^{-1} = D K_u^(r) H_u^{-1} K_u^(ℓ) D

for computing the matrix inverse.

1. Computation of u^T, u^T B, ..., u^T B^{2m−1}, and hence K_u^(ℓ). We can compute this sequence with 2m−1 applications of B to blocks of s vectors, using O(nφ(n)) operations in F.

2. Computation of H_u. Due to the special form (2.1) of u, one may compute wu for any w ∈ F^{s×n} with O(sn) operations. Hence we can now compute u^T B^i u for 0 ≤ i ≤ 2m−1 with O(n^2) operations in F.

3. Computation of H_u^{-1}. The off-diagonal inverse representation of H_u^{-1} as in (A.4) in the Appendix can be found with O~(s^ω m) operations by Proposition A.1.

4. Computation of H_u^{-1} K_u^(ℓ). From Corollary A.2 in the Appendix, we can compute the product H_u^{-1} M for any matrix M ∈ F^{n×n} with O~(s^ω m^2) operations.

5. Computation of K_u^(r) · (H_u^{-1} K_u^(ℓ)). We can compute K_u^(r) M = [u, Bu, ..., B^{m−1}u] M for any M ∈ F^{n×n} by splitting M into m blocks of s consecutive rows M_i, for 0 ≤ i ≤ m−1:

    K_u^(r) M = Σ_{i=0}^{m−1} B^i (u M_i)
              = u M_0 + B(u M_1 + B(u M_2 + ... + B(u M_{m−2} + B u M_{m−1}) ... )).    (4.1)

Because of the special form (2.1) of u, each product u M_i ∈ F^{n×n} requires O(n^2) operations, and hence all such products involved in (4.1) can be computed in O(mn^2) operations. Since applying B to an n×n matrix costs nφ(n) operations, K_u^(r) M is computed in O(mnφ(n) + mn^2) operations using the iterative form of (4.1).

In total, the above process requires O(mn) applications of A to a vector (the same as for B), and O(s^ω m^2 + mn^2) additional operations. If φ(n) = O~(n), the overall number of field operations is minimized with the blocking factor s = n^{1/(ω−1)}.

Theorem 4.1. Let A ∈ F^{n×n}, where n = ms, be such that all leading ks×ks minors are non-singular for 1 ≤ k ≤ m. Let B = DAD for D = diag(d_1, ..., d_1, ..., d_m, ..., d_m) such that d_1, ..., d_m are non-zero and each of the matrices K_m(DAD, u) and K_m((DAD)^T, u) is invertible. Then the inverse matrix A^{-1} can be computed using O~(n^{2−1/(ω−1)}) applications of A to a vector and O~(n^{3−1/(ω−1)}) additional operations in F.

Table 4.1: Exponents of matrix inversion with a matrix-vector cost φ(n) = O~(n).

    ω                   Black-box applications   Blocking factor s   Inversion cost
    3      (Standard)   n^1.5                    n^0.5               O~(n^2.5)
    2.807  (Strassen)   n^1.446                  n^0.553             O~(n^2.446)
    2.3755 (Cop/Win)    n^1.273                  n^0.728             O~(n^2.273)

The above discussion makes a number of assumptions. First, it assumes that the blocking factor s exactly divides n. This is easily accommodated by simply extending n to the nearest multiple of s, placing A in the top left corner of the augmented matrix, and adding diagonal ones in the bottom right corner.

Theorem 4.1 also makes the assumptions that all the leading ks×ks minors of A are non-singular and that the determinants of K_m(DAD, u) and K_m((DAD)^T, u) are each non-zero. While we know of no way to ensure this deterministically in the times given, standard techniques can be used to obtain these properties probabilistically if F is sufficiently large. Suppose, in particular, that n ≥ 16 and that #F > 2(m+1)n⌈log_2 n⌉. Fix a set S of at least 2(m+1)n⌈log_2 n⌉ non-zero elements of F. We can ensure that the leading ks×ks minors of Ã = UA are non-zero by pre-multiplying by a butterfly network preconditioner U with parameters chosen uniformly and randomly from S. If U is constructed using the generic exchange matrix of [5, §6.2], then it will use at most n⌈log_2 n⌉/2 random elements from S, and from [5, Theorem 6.3] it follows that all leading ks×ks minors of Ã = UA will be non-zero simultaneously with probability at least 3/4. This probability of success can be made arbitrarily close to 1 with a choice from a larger S. We note that A^{-1} = Ã^{-1}U. Thus, once we have computed Ã^{-1}, we can compute A^{-1} with an additional O~(n^2) operations in F, using the fact that multiplication of an arbitrary n×n matrix by an n×n butterfly preconditioner can be done with O~(n^2) operations.

Once again let ∆ be the product of the determinants of the matrices K_m(DÃD, u) and K_m((DÃD)^T, u), so that ∆ is non-zero with total degree less than 2n(m−1). If we choose randomly selected values from S for δ_1, ..., δ_m, since #S ≥ 2(m+1)n⌈log_2 n⌉ > 4 deg ∆, the probability that ∆ is zero at this point is at most 1/4 by the Schwartz-Zippel Lemma [22, 24].

In summary, for a randomly selected butterfly preconditioner U, and independently and randomly chosen values d_1, d_2, ..., d_m, the probability that Ã = UA has non-singular leading ks×ks minors for 1 ≤ k ≤ m and that ∆(d_1, d_2, ..., d_m) is non-zero is at least 9/16 > 1/2, when random choices are made uniformly and independently from a finite subset S of F \ {0} with size at least 2(m+1)n⌈log_2 n⌉.

When #F ≤ 2(m+1)n⌈log_2 n⌉, we can easily construct a field extension E of F which has size greater than 2(m+1)n⌈log_2 n⌉ and perform the computation in that extension. Since this extension will have degree O(log_{#F} n) over F, it will add only a logarithmic factor to the final cost. While we certainly do not claim that this is not of practical concern, it does not affect the asymptotic complexity.

Finally, we note that it is easily checked whether the matrix returned by this algorithm is the inverse of the input, using n multiplications by A of the columns of the output matrix and comparing each to the corresponding column of the identity matrix. This results in a Las Vegas algorithm for computing the inverse of a black-box matrix with the cost given above.

Theorem 4.2. Let A ∈ F^{n×n} be non-singular. Then A^{-1} can be computed by a Las Vegas algorithm whose expected cost is O~(n^{2−1/(ω−1)}) applications of A to a vector and O~(n^{3−1/(ω−1)}) additional operations in F.

Remark 4.3. The structure (2.1) of the projection u plays a central role in computing the product of the block Krylov matrix by an n×n matrix. For a general projection u ∈ F^{n×s}, how to do better than a general matrix multiplication — i.e., how to take advantage of the Krylov structure for computing K_u M — appears to be unknown.
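The five steps above can be mimicked densely on a toy example. This is our own sketch: the real algorithm never forms K_u^(r) explicitly and inverts H_u via the fast block-Hankel machinery of the Appendix, whereas here plain Gauss-Jordan over GF(p) stands in for both, and the concrete A, d and p = 97 are our choices:

```python
# Toy dense check (ours) of the factorization A^{-1} = D K^(r) H^{-1} K^(l) D
# of (3.1) over GF(97), with n = 4, s = 2, m = 2.  Plain Gauss-Jordan
# replaces the fast block-Hankel inversion of the Appendix.

p = 97
n, s = 4, 2
m = n // s

def mul(X, Y):
    return [[sum(X[i][t] * Y[t][j] for t in range(len(Y))) % p
             for j in range(len(Y[0]))] for i in range(len(X))]

def inv(M):
    """Gauss-Jordan inverse over GF(p); raises if M is singular."""
    k = len(M)
    W = [row[:] + [int(i == j) for j in range(k)] for i, row in enumerate(M)]
    for c in range(k):
        piv = next(i for i in range(c, k) if W[i][c] % p)
        W[c], W[piv] = W[piv], W[c]
        f = pow(W[c][c], p - 2, p)
        W[c] = [x * f % p for x in W[c]]
        for i in range(k):
            if i != c and W[i][c] % p:
                g = W[i][c]
                W[i] = [(x - g * y) % p for x, y in zip(W[i], W[c])]
    return [row[k:] for row in W]

def transpose(M):
    return [list(col) for col in zip(*M)]

A = [[1, 1, 0, 0], [0, 1, 1, 0], [0, 0, 1, 1], [0, 0, 0, 1]]
d = [2, 3]
D = [[d[i // s] if i == j else 0 for j in range(n)] for i in range(n)]
B = mul(mul(D, A), D)                       # B = DAD
u = [[int(i % s == j) for j in range(s)] for i in range(n)]   # (2.1)

def krylov(M):
    """[u, Mu, ..., M^{m-1}u] pasted into an n x n matrix."""
    blocks, blk = [], u
    for _ in range(m):
        blocks.append(blk)
        blk = mul(M, blk)
    return [sum((b[i] for b in blocks), []) for i in range(n)]

Kr = krylov(B)                              # K^(r) = K_m(B, u)
Kl = transpose(krylov(transpose(B)))        # K^(l) = K_m(B^T, u)^T

H = mul(mul(Kl, B), Kr)                     # block-Hankel H_u = K^(l) B K^(r)
Ainv = mul(mul(D, mul(Kr, mul(inv(H), Kl))), D)   # formula (3.1)

I4 = [[int(i == j) for j in range(n)] for i in range(n)]
print(mul(A, Ainv) == I4)                   # True
```

The final check is exactly the Las Vegas certification step described above: multiplying A by the columns of the candidate inverse and comparing with the identity.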
Applying a Black-Box Matrix Inverse to a Matrix

The above method can also be used to compute A^{-1}M for any matrix M ∈ F^{n×n}, with the same cost as in Theorem 4.2. Consider the new step 1.5:

1.5. Computation of K_u^(ℓ) · M. Split M into m blocks of s columns, so that M = [M_0, ..., M_{m−1}], where M_k ∈ F^{n×s} for k ∈ {0, ..., m−1}. This can be accomplished by computing B^i M_k for 0 ≤ i ≤ m−1 in sequence, and then multiplying on the left by u^T to compute u^T B^i M_k for each iterate. The cost of computing K_u^(ℓ) M_k for a single k by the above process is n−s multiplications of A by vectors and O(ns) additional operations in F. The cost of doing this for all k with 0 ≤ k ≤ m−1 is thus m(n−s) < nm multiplications of A by vectors and O(n^2) additional operations. Since applying A (and hence B) to an n×n matrix is assumed to cost nφ(n) operations in F, K_u^(ℓ) · M is computed in O(mnφ(n) + mn^2) operations in F by the process described here.

Note that this is the same as the cost of Step 5, so the overall cost estimate is not affected. Since Step 4 does not rely on any special form for K_u^(ℓ), we can replace it with a computation of H_u^{-1} · (K_u^(ℓ) M) with the same cost. We obtain the following:

Corollary 4.4. Let A ∈ F^{n×n} be non-singular and let M ∈ F^{n×n} be any matrix. We can compute A^{-1}M using a Las Vegas algorithm whose expected cost is O~(n^{2−1/(ω−1)}) multiplications of A by vectors and O~(n^{3−1/(ω−1)}) additional operations in F. The estimates in Table 4.1 apply to this computation as well.

5. APPLICATIONS TO BLACK-BOX MATRICES OVER A FIELD

The algorithms of the previous section have applications in some important computations with black-box matrices over an arbitrary field F. In particular, we consider the problems of computing the nullspace and rank of a black-box matrix. Each of these algorithms is probabilistic of the Las Vegas type; that is, the output is certified to be correct.

Kaltofen & Saunders [18] present algorithms for computing the rank of a matrix, with O(n) applications of the black box for A and O(n^2) additional operations in F, building upon work of Wiedemann [23]. However, no certificate is provided that the rank is correct within this cost (and we do not know of one, or provide one here). Kaltofen & Saunders [18] also show how to generate a vector uniformly and randomly from the nullspace of A with this cost (and, of course, this is certifiable with a single evaluation of the black box for A). We also note that the algorithms of Wiedemann and Kaltofen & Saunders require only a linear amount of extra space, which will not be the case for our algorithms.

We first employ the random preconditioning of [18]: Ã = UALD as above. We will assume in what follows that all leading i×i minors of Ã are non-singular for 1 ≤ i ≤ r. While this may not be the case, it will happen with high probability, and failure will be identified in our method. Also assume that we have computed the rank r of A with high probability; again, this will be certified in what follows.

1. Inverting the leading minor. Let A_0 be the leading r×r minor of A, and partition A as

    A = [ A_0  A_1
          A_2  A_3 ].

Using the algorithm of the previous section, compute A_0^{-1}. If this fails, then the randomized conditioning or the rank estimate has failed, and we either report this failure or try again with a different randomized pre-conditioning. If we can compute A_0^{-1}, then the rank of A is at least the estimated r.

2. Applying the inverted leading minor. Compute A_0^{-1}A_1 ∈ F^{r×(n−r)} using the algorithm of the previous section (this could in fact be merged into the first step).

3. Confirming the nullspace. Note that

    [ A_0  A_1 ]   [ −A_0^{-1}A_1 ]   [ 0                      ]
    [ A_2  A_3 ] · [  I           ] = [ A_3 − A_2 A_0^{-1} A_1 ],

and the Schur complement A_3 − A_2 A_0^{-1} A_1 must be zero if the rank r is correct. This can be checked with n−r evaluations of the black box for A. We note that, because of its structure, N = [−A_0^{-1}A_1; I] has rank n−r.

4. Output rank and nullspace basis. If the Schur complement is zero, then output the rank r and N, whose columns give a basis for the nullspace of A. Otherwise, output "fail" (and possibly retry with a different randomized pre-conditioning).

Theorem 5.1. Let A ∈ F^{n×n} have rank r. Then a basis for the nullspace of A, and the rank r of A, can be computed with an expected number of O~(n^{2−1/(ω−1)}) applications of A to a vector, plus an additional expected number of O~(n^{3−1/(ω−1)}) operations in F. The algorithm is probabilistic of the Las Vegas type.
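Steps 1 through 4 can be traced on a tiny rank-deficient example (our sketch: dense GF(p) arithmetic stands in for the black-box inversion of the previous section, and the concrete matrix and p = 97 are our choices):

```python
# Tiny dense trace (ours) of the nullspace procedure over GF(97):
# invert the leading r x r minor, form N = [-A0^{-1} A1; I], and certify
# the rank by checking that the Schur complement vanishes.

p = 97
r = 2
# rank-2 matrix: third row = row1 + row2
A = [[1, 2, 1],
     [3, 4, 1],
     [4, 6, 2]]
n = len(A)

def mul(X, Y):
    return [[sum(X[i][t] * Y[t][j] for t in range(len(Y))) % p
             for j in range(len(Y[0]))] for i in range(len(X))]

def inv(M):
    """Gauss-Jordan inverse over GF(p); raises if M is singular."""
    k = len(M)
    W = [row[:] + [int(i == j) for j in range(k)] for i, row in enumerate(M)]
    for c in range(k):
        piv = next(i for i in range(c, k) if W[i][c] % p)
        W[c], W[piv] = W[piv], W[c]
        f = pow(W[c][c], p - 2, p)
        W[c] = [x * f % p for x in W[c]]
        for i in range(k):
            if i != c and W[i][c] % p:
                g = W[i][c]
                W[i] = [(x - g * y) % p for x, y in zip(W[i], W[c])]
    return [row[k:] for row in W]

A0 = [row[:r] for row in A[:r]]            # leading r x r minor
A1 = [row[r:] for row in A[:r]]
A2 = [row[:r] for row in A[r:]]
A3 = [row[r:] for row in A[r:]]

A0invA1 = mul(inv(A0), A1)                 # steps 1 and 2
N = [[(-x) % p for x in row] for row in A0invA1] + \
    [[int(i == j) for j in range(n - r)] for i in range(n - r)]

S = mul(A2, A0invA1)                       # step 3: Schur complement
schur = [[(A3[i][j] - S[i][j]) % p for j in range(n - r)]
         for i in range(n - r)]

print(schur)                               # [[0]]: rank r = 2 is certified
print(mul(A, N))                           # [[0], [0], [0]]: A N = 0
```

A zero Schur complement certifies the rank (step 4), and the columns of N give a nullspace basis; a non-zero entry would trigger the "fail"/retry branch instead.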
In par- forthe nullspace ofA andrank r ofA canbe computed with ticular, they show that for random lower upper and lower anexpected numberofO˜(n2−1/(ω−1))applicationsofAtoa triangular Toeplitz matrices U,L Fn×n, and random di- vector,plusanadditionalexpectednumberofO˜(n3−1/(ω−1)) ∈ operations in F. The algorithm is probabilistic of the Las agonal D, that all leading k k minors of A = UALD are non-singularfor1 k r=×rankA,andthatiffAe F[x]is Vegas type. ≤ ≤ e ∈ theminimalpolynomialofA,thenithasdegreer+1ifAis 6. APPLICATIONSTOSPARSERATIONAL singular (and degree n if A is non-singular). This is proven tobetrueforanyinputA eFn×n,andforrandomchoiceof LINEAR SYSTEMS ∈ U, L and D, with high probability. The cost of computing Given a non-singular A Zn×n and b Zn×1, in [11] fAe (and hence rankA) is shown to be O(n) applications of we presented an algorithm a∈nd implementat∈ion to compute A−1b with O˜(n1.5(log( A + b ))) matrix-vectorproducts Computing the minimal matrix generator. k k k k v Amodpforamachine-wordsizedprimepandanyv The minimal matrix generator F(λ) modulo p can be AZnps7→s×u1mpilnugsOth˜a(nt2A.5a(lnodg(bkhAakd+ckobnks)t)a)natdsdizietidonenatlrbieits-,oapnedratthioant∈sa. Scoemep[2u0t,ed4f]r.oTmhtihseciannitbiaelsaecqcuoemnpcleissheegdmewnitthα0O,˜.(.m.,sαω2m−1. § · matrix-vectorproductbyAmodpcouldbeperformedwith nlog A ) bit operations. k k O˜(n) operations modulo p, the algorithm presented could solve a system with O˜(n2.5) bit operations. Unfortunately, Extracting the determinant. thisresultwasconditionalupontheunprovenConjecture2.1 Following the algorithm in [20, 4], we first check if § of [11]: the existence of an efficient block projection. This its degree is less than n and if so, return “failure”. conjecture was established in Corollary 2.3 of the current Otherwise,weknowdetFA(λ)=det(λI A). Return − paper. Wecan now unconditionally state thefollowing: detA=detF(0)modp. Theorem 6.1. 
Given any invertible A Zn×n and b Zn×1, we can compute A−1b using a Las ∈Vegas algorithm∈. proTjhecetcioonrrse,cftonleloswsosfftrhoemalfgaocrtitthhmat,a[un,dAsup,e.c.ifi.c,aAllmy−t1hue]bilsocokf Theexpected numberofmatrix-vector productsv Avmod p is in O˜(n1.5(log( A + b ))), and the expe7→cted num- full rank with high probability by Theorem 2.1. Since the k k k k projectionvisdense,theanalysisof[20,(2.6)]isapplicable, ber of additional bit-operations used by this algorithm is in andtheminimalgeneratingpolynomialwillhavefulldegree O˜(n2.5(log( A + b ))). k k k k mwithhighprobability,andhenceitsdeterminantatλ=0 Sparse integer determinant and Smithform will be thedeterminant of A. ThetotalcostisO˜((nφ(n)m+n2m+nmsω)log A )bit The efficient block projection of Theorem 2.1 can also be operations, which is minimized when s=n1/ω. Thkis ykields employedrelativelydirectlyintotheblockbaby-steps/giant- analgorithmforthedeterminantwhichrequiresO˜((n2−1/ωφ(n)+ steps methods of [20] for computing the determinant of an n3−1/ω)log A ) bit operations. This is probably most in- integer matrix. This will yield improved algorithms for the k k terestingwhenω=3,whereityieldsanalgorithmfordeter- determinantandSmithformofasparseintegermatrix. Un- minant which requiresO˜(n2.66log A ) bit operations on a fortunately, the new techniques do not obviously improve k k matrix with pseudo-linear cost matrix-vectorproduct. theasymptoticcostoftheiralgorithmsinthecaseforwhich Wealso notethatasimilar approach allows ustousethe they were designed, namely, for computations of the deter- Monte Carlo Smith form algorithm of [14], which is com- minants of dense integer matrices. putedbymeansofcomputingthecharacteristicpolynomial Weonlysketchthemethodforcomputingthedeterminant of random preconditionings of a matrix. This reduction is here following the algorithm in Section 4 of [20], and esti- mateitscomplexity. ThroughoutweassumethatA Zn×n explored in [20] in the dense matrix setting. 
The upshot ∈ is that we obtain the Smith form with the same order of is non-singular and assume that we can compute v Av 7→ complexity, to within a poly-logarithmic factor, as we have with φ(n)integer operations, wherethebit-lengthsofthese obtained the determinant using the above techniques. See integers are boundedby O˜(log(n+ v + A )). k k k k [20, 7.1] and [14] for details. We make no claim that this § is practical in its present form. Preconditioning and setup. Precondition A B = D UAD , where D , D are 1 2 1 2 ← randomdiagonalmatrices,andU isaunimodularpre- APPENDIX conditioner from [23, 5]. While we will not do the detailed analysis here,§selecting coefficients for these A. APPLYINGTHEINVERSEOFABLOCK- randomly from a set S1 of size n3 is sufficient to en- HANKELMATRIX sure a high probability of success. This precondition- Inthisappendixweaddressasymptoticallyfasttechniques ingwillensurethatallleadingminorsarenon-singular forcomputingarepresentationoftheinverseofablockHan- and that the characteristic polynomial is squarefree kelmatrix,for applyingthisinversetoan arbitrary matrix. with high probability (see [5] Theorem 4.3 for a proof Thefundamentaltechniquewewillemployistousetheoff- of the latter condition). From Theorem 2.1, we also diagonalinversionformulaofBeckermann&Labahn[1]and see that (B,u) hasfull rank with high probability. m K its fast variants [15]. An alternative to using the inversion Letpbeaprimewhichislargerthantheaprioribound formula wouldbetousethegeneralization oftheLevinson- on the coefficients of the characteristic polynomial of Durbin algorithm in [19]. A; this is easily determined to be (nlog A )n+o(1). Foranintegermthatdividesnexactlywiths=n/m,let k k Fix a blocking factor s to be optimized later, and as- sume n=ms. Choopobsfrlioonscjigkezcepptairrotoonjljeeeacacstsittoiinno2nn(w22si...t1h)Laceotneduffiv∈ci∈ZennZt×sns×cbhseoasaernnanefffirdoocmmiena(tdsbeentlosScek2) H =2666 αα...01 αα12 ... ...... αα2mm...−−12 3777∈Fn×n 64 αm−1 ... 
α2m−2 α2m−1 75 Forming the sequence α =uAiv Zs×s. (A.1) i ∈ Compute this sequence for i = 0...2m. Computing beanon-singularblock-Hankelmatrixwhoseblocksares s alltheAiv takesO˜(nφ(n) mlog A )bitoperations. matricesoverF,andletα bearbitraryinFs×s. Wefoll×ow 2m Computing all the uAiv ta·kes O˜k(n2k mlog A ) bit thelinesof[21]forcomputingtheinversematrixH−1. Since · k k operations. H is invertible, the following four linear systems (see [21, (3.8)-(3.11)]) with Q,P,Q∗,P∗ Fs×s[x] that satisfy the degree con- ∈ ∗ straints degQ m 1,degQ m 1, and degP HH[[vqmm−,·1·,···,·v1,]qt0=]t =−[[0α,m··,···,·0α,2Im]∈−1Fαn2×ms],∈Fn×s, (A.2) mFs×−s2a,rdeegnoPn∗-s≤in≤∗gmul−ar−,2.heTnhceerQesRid−≤u1eamnad−tr(iRce∗s)−R1Qan∗darRe∗soi≤n- lutions Q and Q for applying the inversion formula (A.4). and For (A.6), the σ-basis algorithm with σ = s(2m+1) leads [qm∗−1 ... q0∗]H =[0 ... 0 I]∈Fs×n, to [vm∗ ... v1∗]H =−[αm ... α2m−1 α2m]∈Fs×n, [A −I]» VU –= modx2m+1, (A.3) khavemuniq1u)e, saonlduttiohnesvgi,vve∗n byFst×hse(qfko,rq1k∗ ∈kFs×sm, ()f.orT0he≤n [V∗ U∗]» AI –= modx2m+1 ≤ − k k ∈ ≤ ≤ − we have(see [21, Theorem3.1]): ∗ ∗ with degV m,degV m,and degU m 1,degU H−1=26666 vmvI...1−1 ......... ..v.1 I 37777264 qm∗−1 ...... qm∗q...0∗−1 375 matesiosrnteg−giemsut1ohla.aleturTert,.ihwoheniet≤scnhocfnoetsrhtVeaapnatp=bltoyeviVrnem(g≤mVs(a(VA0t.(e)40)r)−i)a.1alUnwadsneindVgg∗eVT(t0≤∗ht)he=ioenrfe−Fo(msVll×o∗2sw(.04ain)r)iegn−nc1[o1oVn5s≤∗t-] 4 5 (A.4) −26666 qmq0...0−2 ......... ..q.0 0 37777264 vm∗ ...... vvm...∗1∗ 375. diinnogvMnPeemrrusowlaetptiirotpohiflxsytOiihtpn˜eoig(olsbynaωlnomocbAkml)o-.Hi1coak.palsenCtrrkoaoiefatmlindompgenuguastrltiearinenrigxOTFt(ho.(Aeme.p1e)lx)itpirznreedoFsurssci×Hoesnsa,nt(aokAnem.dl4um)clatoainftprtlbhyixe-e 4 5 inFn×n withblocksofsizes sbyamatrixinFn×n reduces The linearsystems (A.2) and(A.3) may also beformulated to the product of two matri×x polynomials of degree O(m), in termsof matrix Pad´eapproximation problems. Weasso- andofdimensionss sands n. 
We associate to H the matrix polynomial A = Σ_{i=0}^{2m} α_i x^i ∈ F^{s×s}[x]. The s×s matrix polynomials Q, P, Q*, P* in F^{s×s}[x] that satisfy

    A(x)Q(x) ≡ P(x) + x^{2m−1} mod x^{2m},   where deg Q ≤ m−1 and deg P ≤ m−2,
    Q*(x)A(x) ≡ P*(x) + x^{2m−1} mod x^{2m}, where deg Q* ≤ m−1 and deg P* ≤ m−2,    (A.5)

are unique and provide the coefficients Q = Σ_{i=0}^{m−1} q_i x^i and Q* = Σ_{i=0}^{m−1} q*_i x^i for constructing H^{−1} using (A.4) (see [21, Theorem 3.1]). The notation "mod x^i" for i ≥ 0 denotes that the terms of degree i or higher are forgotten. The s×s matrix polynomials V, U, V*, U* in F^{s×s}[x] that satisfy

    A(x)V(x) ≡ U(x) mod x^{2m+1},   V(0) = I,  where deg V ≤ m and deg U ≤ m−1,
    V*(x)A(x) ≡ U*(x) mod x^{2m+1}, V*(0) = I, where deg V* ≤ m and deg U* ≤ m−1,    (A.6)

are unique and provide the coefficients V = 1 + Σ_{i=1}^{m} v_i x^i and V* = 1 + Σ_{i=1}^{m} v*_i x^i for (A.4).

Using the matrix Padé formulation, the matrices Q, Q*, V, and V* may be computed using the σ-basis algorithm in [2], or its fast counterpart in [15, §2.2] that uses fast matrix multiplication. For solving (A.5), the σ-basis algorithm with σ = s(2m−1) solves

    [A  −I] [Q; P] = R x^{2m−1} mod x^{2m},
    [Q*  P*] [A; −I] = R* x^{2m−1} mod x^{2m},

with Q, P, Q*, P* ∈ F^{s×s}[x] that satisfy the degree constraints deg Q ≤ m−1, deg Q* ≤ m−1, and deg P ≤ m−2, deg P* ≤ m−2. The residue matrices R and R* in F^{s×s} are non-singular, hence QR^{−1} and (R*)^{−1}Q* are solutions Q and Q* for applying the inversion formula (A.4). For (A.6), the σ-basis algorithm with σ = s(2m+1) leads to

    [A  −I] [V; U] ≡ 0 mod x^{2m+1},
    [V*  U*] [A; −I] ≡ 0 mod x^{2m+1},

with deg V ≤ m, deg V* ≤ m, and deg U ≤ m−1, deg U* ≤ m−1. The constant terms V(0) and V*(0) in F^{s×s} are non-singular, hence V(V(0))^{−1} and (V*(0))^{−1}V* give the normalized solutions of (A.6) for applying (A.4). Following [15], each of these σ-basis computations can be done in O˜(s^ω m) operations in F.

Corollary A.1. Let H be the block-Hankel matrix of (A.1). Computing the representation (A.4) of H^{−1} can be done with O˜(s^ω m) operations in F.

Applying (A.4) amounts to multiplying block triangular Hankel or Toeplitz matrices by a matrix. Multiplying such a matrix in F^{n×n}, with blocks of size s×s, by a matrix in F^{n×n} reduces to the product of two matrix polynomials of degree O(m), of dimensions s×s and s×n. Using the fast algorithms in [4] or [3], and splitting the s×n matrix into s×s blocks, the s×s by s×n product can thus be done in O˜(m · s^ω m) = O˜(s^ω m²) operations.

For n = s^ν, let ω(1,1,ν) be the exponent of the problem of s×s by s×n matrix multiplication over F. The splitting considered just above of the s×n matrix into s×s blocks corresponds to taking ω(1,1,ν) = ω + ν − 1 < ν + 1.376 (ω < 2.376 due to [6]), with the total cost O˜(s^{ω(1,1,ν)} m). Depending on ν ≥ 1, a slightly smaller bound than ν + 1.376 for ω(1,1,ν) may be used, due to the matrix multiplication techniques specifically designed for rectangular matrices in [16]. This is true as soon as ν ≥ 1.171, and gives for example ω(1,1,ν) < ν + 1.334 for ν = 2, i.e., for s = √n.

Corollary A.2. Let H be the block-Hankel matrix of (A.1). If the representation (A.4) of H^{−1} is given, then computing H^{−1}M for an arbitrary M ∈ F^{n×n} reduces to four s×s by s×n products of polynomial matrices of degree O(m). This can be done with O˜(s^{ω(1,1,ν)} m) or O˜(s^ω m²) operations in F (n = s^ν = ms).
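The reduction from a block triangular Toeplitz matrix times a matrix to a single product of matrix polynomials can be made concrete. The Python sketch below is our own illustration (naive coefficient-wise products standing in for the fast polynomial multiplication algorithms of [4] or [3]): applying the upper triangular block-Toeplitz matrix built from s×s blocks p_0, …, p_{m−1} to an n×s matrix B equals reading off m coefficient blocks of the product of P(x) = Σ p_k x^k with the block-reversed polynomial of B:

```python
import random

def mat_mul(A, B):
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)] for row in A]

def mat_add(A, B):
    return [[a + b for a, b in zip(r1, r2)] for r1, r2 in zip(A, B)]

def poly_mat_mul(P, Q):
    # product of matrix polynomials given as lists of coefficient matrices
    s, t = len(P[0]), len(Q[0][0])
    out = [[[0] * t for _ in range(s)] for _ in range(len(P) + len(Q) - 1)]
    for i, p in enumerate(P):
        for j, q in enumerate(Q):
            out[i + j] = mat_add(out[i + j], mat_mul(p, q))
    return out

rng = random.Random(3)
s, m = 2, 3
n = s * m
rand = lambda: [[rng.randint(-3, 3) for _ in range(s)] for _ in range(s)]
p = [rand() for _ in range(m)]     # coefficients p_0..p_{m-1}
b = [rand() for _ in range(m)]     # blocks b_0..b_{m-1} of the n x s matrix B

# explicit route: T has block (i, j) = p_{j-i} for j >= i (upper triangular block Toeplitz)
Tmat = []
for i in range(m):
    for r in range(s):
        row = []
        for j in range(m):
            block = p[j - i] if j >= i else [[0] * s for _ in range(s)]
            row.extend(block[r])
        Tmat.append(row)
B = [row for block in b for row in block]
TB = mat_mul(Tmat, B)

# polynomial route: block i of T*B is coefficient m-1-i of P(x) * Brev(x),
# where Brev(x) = sum_k b_{m-1-k} x^k
C = poly_mat_mul(p, list(reversed(b)))
TB_poly = [row for i in range(m) for row in C[m - 1 - i]]
assert TB == TB_poly
```

Each of the four factors of (A.4) is of this triangular block-Toeplitz or block-Hankel shape, which is how applying H^{−1} reduces to the four polynomial products counted in Corollary A.2.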
B. REFERENCES

[1] B. Beckermann and G. Labahn. A uniform approach for the fast, reliable computation of matrix-type Padé approximants. SIAM J. Matrix Anal. Appl., 15:804–823, 1994.
[2] Bernhard Beckermann and George Labahn. A uniform approach for the fast computation of matrix-type Padé approximants. SIAM Journal on Matrix Analysis and Applications, 15(3):804–823, July 1994.
[3] Alin Bostan and Eric Schost. Polynomial evaluation and interpolation on special sets of points. J. Complexity, 21(4):420–446, 2005.
[4] D. Cantor and E. Kaltofen. Fast multiplication of polynomials over arbitrary algebras. Acta Informatica, 28:693–701, 1991.
[5] L. Chen, W. Eberly, E. Kaltofen, B. D. Saunders, W. J. Turner, and G. Villard. Efficient matrix preconditioners for black box linear algebra. Linear Algebra and its Applications, 343–344:119–146, 2002.
[6] D. Coppersmith and S. Winograd. Matrix multiplication via arithmetic progressions. J. Symb. Comp., 9:251–280, 1990.
[7] John D. Dixon. Exact solution of linear equations using p-adic expansions. Numerische Mathematik, 40:137–141, 1982.
[8] Jean-Guillaume Dumas, Thierry Gautier, Mark Giesbrecht, Pascal Giorgi, Bradford Hovinen, Erich Kaltofen, B. David Saunders, Will J. Turner, and Gilles Villard. LinBox: A generic library for exact linear algebra. In Arjeh M. Cohen, Xiao-Shan Gao, and Nobuki Takayama, editors, Proceedings of the 2002 International Congress of Mathematical Software, Beijing, China, pages 40–50. World Scientific, August 2002.
[9] Jean-Guillaume Dumas, B. David Saunders, and Gilles Villard. Integer Smith form via the valence: experience with large sparse matrices from homology. In ISSAC '00: Proceedings of the 2000 International Symposium on Symbolic and Algebraic Computation, pages 95–105, New York, NY, USA, 2000. ACM Press.
[10] Wayne Eberly. Processor-efficient parallel matrix inversion over abstract fields: two extensions. In Second Int'l Symp. on Parallel Symbolic Computation (PASCO '97), pages 38–45, New York, NY, USA, 1997. ACM Press.
[11] Wayne Eberly, Mark Giesbrecht, Pascal Giorgi, Arne Storjohann, and Gilles Villard. Solving sparse rational linear systems. In ISSAC '06: Proceedings of the 2006 International Symposium on Symbolic and Algebraic Computation, pages 63–70, New York, NY, USA, 2006. ACM Press.
[12] Ioannis Z. Emiris and Victor Y. Pan. Improved algorithms for computing determinants and resultants. J. Complexity, 21(1):43–71, 2005.
[13] M. Giesbrecht. Efficient parallel solution of sparse systems of linear diophantine equations. In Proceedings of PASCO '97, pages 1–10, 1997.
[14] M. Giesbrecht. Fast computation of the Smith form of a sparse integer matrix. Computational Complexity, 10(1):41–69, 2004.
[15] Pascal Giorgi, Claude-Pierre Jeannerod, and Gilles Villard. On the complexity of polynomial matrix computations. In Rafael Sendra, editor, Proceedings of the 2003 International Symposium on Symbolic and Algebraic Computation, Philadelphia, Pennsylvania, USA, pages 135–142. ACM Press, New York, August 2003.
[16] Xiaohan Huang and Victor Y. Pan. Fast rectangular matrix multiplication and applications. J. Complexity, 14(2):257–299, 1998.
[17] E. Kaltofen. Analysis of Coppersmith's block Wiedemann algorithm for the parallel solution of sparse linear systems. Mathematics of Computation, 64(210):777–806, 1995.
[18] E. Kaltofen and B. D. Saunders. On Wiedemann's method of solving sparse linear systems. In Proc. AAECC-9, volume 539 of Springer Lecture Notes in Comp. Sci., pages 29–38, 1991.
[19] Erich Kaltofen. Analysis of Coppersmith's block Wiedemann algorithm for the parallel solution of sparse linear systems. Mathematics of Computation, 64(210):777–806, April 1995.
[20] Erich Kaltofen and Gilles Villard. On the complexity of computing determinants. Computational Complexity, 13(3-4):91–130, 2004.
[21] George Labahn, Dong Koo Choi, and Stan Cabay. The inverses of block Hankel and block Toeplitz matrices. SIAM J. Comput., 19(1):98–123, 1990.
[22] J. T. Schwartz. Fast probabilistic algorithms for verification of polynomial identities. J. Assoc. Computing Machinery, 27:701–717, 1980.
[23] Douglas H. Wiedemann. Solving sparse linear equations over finite fields. IEEE Transactions on Information Theory, 32(1):54–62, January 1986.
[24] R. Zippel. Probabilistic algorithms for sparse polynomials. In Proc. EUROSAM 79, pages 216–226, Marseille, 1979.
