Table Of Content

A Fast Fourier Transforms [ReadChapters0and1first.] Status:Beta A.1 Polynomials Polynomialsarefunctionsofonevariablebuiltfromadditions,subtractions,andmultipli- cations(butnodivisions). Themostcommonrepresentationforapolynomial p(x)isas asumofweightedpowersofthevariable x: n (cid:88) p(x)= a xj. j j=0 Thenumbersa arecalledthecoefficientsofthepolynomial. Thedegreeofthepolynomial j isthelargestpowerof x whosecoefficientisnotequaltozero;intheexampleabove,the degreeisatmost n. Anypolynomialofdegree ncanberepresentedbyanarray P[0..n] of n+1coefficients,where P[j]isthecoefficientofthe xj term,andwhere P[n](cid:54)=0. Herearethreeofthemostcommonoperationsperformedonpolynomials: ©2018JeffErickson http://algorithms.wtf 1 A. FASTFOURIERTRANSFORMS • Evaluate: Giveapolynomial p andanumber x,computethenumber p(x). • Add: Give two polynomials p and q, compute a polynomial r = p+q, so that r(x)= p(x)+q(x) for all x. If p and q both have degree n, then their sum p+q alsohasdegree n. • Multiply: Give two polynomials p and q, compute a polynomial r = p·q, so that r(x)=p(x)·q(x)forall x. If p andq bothhavedegree n,thentheirproduct p·q hasdegree2n. We learned simple algorithms for all three of these operations in high-school algebra. Theadditionandmultiplicationalgorithmsarestraightforwardgeneralizationsofthe standardalgorithmsforintegerarithmetic. Evaluate(P[0..n],x): X ←1 〈〈X =xj〉〉 Add(P[0..n],Q[0..n]): y ←0 for j←0ton for j←0ton R[j]←P[j]+Q[j] y ← y+P[j]·X returnR[0..n] X ←X ·x return y Multiply(P[0..n],Q[0..m]): for j←0ton+m R[j]←0 for j←0ton fork←0tom R[j+k]←R[j+k]+P[j]·Q[k] returnR[0..n+m] EvaluateusesO(n)arithmeticoperations.1 Thisisthebestwecanhopefor,butwecan cutthenumberofmultiplicationsinhalfusingHorner’srule: p(x)=a +x(a +x(a +...+xa )). 0 1 2 n Horner(P[0..n],x): y ←P[n] fori←n−1downto0 y ←x·y+P[i] return y TheadditionalgorithmalsorunsinO(n)time,andthisisclearlythebestwecando. The multiplication algorithm, however, runs in O(n2) time. In Chapter 1, we saw adivide-and-conqueralgorithm(duetoKaratsuba)formultiplyingtwo n-bitintegers 1AlltimeanalysisinthislectureassumesthateacharithmeticoperationtakesO(1)time. Thismaynot betrueinpractice;infact,oneofthemostpowerfulapplicationsoffastFouriertransformsisfastinteger multiplication. Thefastestalgorithmcurrentlyknownformultiplyingtwo n-bitintegers, publishedby MartinFürerin2007,usesO(nlogn2O(log∗n))bitoperationsandisbasedonfastFouriertransforms. 2 A.2. AlternateRepresentations inonlyO(nlg3)steps;preciselythesameapproachcanbeappliedhere. Evencleverer divide-and-conquerstrategiesleadtomultiplicationalgorithmswhoserunningtimesare arbitrarilyclosetolinear—O(n1+(cid:34))foryourfavoritevalue e>0—butexceptforafew simplecases,thesealgorithmsnotworththetroubleinpractice,thankstolargehidden constants. A.2 Alternate Representations Part of what makes multiplication so much harder than the other two operations is ourinputrepresentation. Coefficientvectorsarethemostcommonrepresentationfor polynomials,butthereareatleasttwootherusefulrepresentations. Roots The Fundamental Theorem of Algebra states that every polynomial p of degree n has exactly n roots r ,r ,...r such that p(r ) = 0 for all j. Some of these roots may 1 2 n j be irrational; some of these roots may by complex; and some of these roots may be repeated. Despitethesecomplications,thistheoremimpliesauniquerepresentationof anypolynomialoftheform n (cid:89) p(x)=s (x−r ) j j=1 wherethe r ’saretherootsands isascalefactor. Onceagain,torepresentapolynomial j ofdegree n,weneedalistof n+1numbers: onescalefactorand nroots. Given a polynomial in this root representation, we can clearly evaluate it in O(n) time. Giventwopolynomialsinrootrepresentation,wecaneasilymultiplytheminO(n) timebymultiplyingtheirscalefactorsandjustconcatenatingthetworootsequences. Unfortunately,ifwewanttoaddtwopolynomialsinrootrepresentation,we’reout of luck. There’s essentially no correlation between the roots of p, the roots of q, and the roots of p+q. We could convert the polynomials to the more familiar coefficient representationfirst—thistakesO(n2)timeusingthehigh-schoolalgorithms—butthere’s no easy way to convert the answer back. In fact, for most polynomials of degree 5 or moreincoefficientform,it’simpossibletocomputerootsexactly.2 Samples Our third representation for polynomials comes from a different consequence of the Fundamental Theorem of Algebra. Given a list of n+1 pairs {(x ,y ),(x ,y ),..., 0 0 1 1 (x ,y )},thereisexactlyonepolynomial pofdegreensuchthat p(x )= y forall j. This n n j j isjustageneralizationofthefactthatanytwopointsdetermineauniqueline,becausea 2Thisiswherenumericalanalysiscomesfrom. 3 A. FASTFOURIERTRANSFORMS lineisthegraphofapolynomialofdegree1. Wesaythatthepolynomial p interpolates thepoints(x ,y ). Aslongasweagreeonthesamplelocations x inadvance,weonce j j j againneedexactly n+1numberstorepresentapolynomialofdegree n. Addingormultiplyingtwopolynomialsinthissamplerepresentationiseasy,aslong astheyusethesamesamplelocations x . Toaddthepolynomials,justaddtheirsample j values. Tomultiplytwopolynomials,justmultiplytheirsamplevalues;however,ifwe’re multiplyingtwopolynomialsofdegree n,wemuststartwith2n+1samplevaluesfor eachpolynomial,becausethat’showmanyweneedtouniquelyrepresenttheirproduct. BothalgorithmsruninO(n)time. Unfortunately,evaluatingapolynomialinthisrepresentationisnolongerstraightfor- ward. Thefollowingformula,duetoLagrange,allowsustocomputethevalueofany polynomialofdegree natanypoint,givenasetof n+1samples. p(x) = (cid:88)n−1(cid:32) yj (cid:89)(x−x )(cid:33) (cid:81) (x −x ) k j=0 k(cid:54)=j j k k(cid:54)=j Hopefully it’s clear that formula actually describes a polynomial function of x, since eachterminthesumisascaledproductofmonomials. It’salsonothardtoverifythat p(x ) = y for every index j; most of the terms of the sum vanish. As I mentioned j j earlier,theFundamentalTheoremofAlgebraimpliesthat p istheonlypolynomialthat interpolates the points {(x ,y )}. Lagrange’s formula can be translated mechanically j j intoanO(n2)-timealgorithm. Summary Wefindourselvesinthefollowingfrustratingsituation. Wehavethreerepresentations for polynomials and three basic operations. Each representation allows us to almost triviallyperformadifferentpairofoperationsinlineartime,butthethirdtakesatleast quadratictime,ifitcanbedoneatall! representation evaluate add multiply coefficients O(n) O(n) O(n2) roots+scale O(n) ∞ O(n) samples O(n2) O(n) O(n) A.3 Converting Between Representations Whatweneedarefastalgorithmstoconvertquicklyfromonerepresentationtoanother. Thenifweneedtoperformanoperationthat’shardforourdefaultrepresentation,we can switch to a different representation that makes the operation easy, perform the desiredoperation,andthenswitchback. Thisstrategyimmediatelyrulesouttheroot 4 A.3. ConvertingBetweenRepresentations representation,since(asImentionedearlier)findingrootsofpolynomialsisimpossible ingeneral,atleastifwe’reinterestedinexactresults. So how do we convert from coefficients to samples and back? Clearly, once we chooseoursamplepositions x ,wecancomputeeachsamplevalue y =p(x )inO(n) j j j timefromthecoefficientsusingHorner’srule. Sowecanconvertapolynomialofdegree nfromcoefficientstosamplesinO(n2)time. Lagrange’sformulacanbeusedtoconvert thesamplerepresentationbacktothemorefamiliarcoefficientform. Ifweusethenaïve algorithmsforaddingandmultiplyingpolynomials(incoefficientform),thisconversion takesO(n3)time. Wecanimprovethecubicrunningtimebyobservingthatbothconversionproblems boildowntocomputingtheproductofamatrixandavector. Theexplanationwillbe slightlysimplerifweassumethepolynomialhasdegree n−1,sothat nisthenumberof coefficientsorsamples. Fixasequence x0,x1,...,xn−1 ofsamplepositions,andlet V be the n×nmatrixwhere v = xj (indexingrowsandcolumnsfrom0to n−1): ij i 1 x x2 ··· xn−1 0 0 0 1 x x2 ··· xn−1  1 1 1  V =1 x2 x22 ··· x2n−1. ... ... ... ... ...    1 xn−1 xn2−1 ··· xnn−−11 The matrix V is called a Vandermonde matrix. The vector of coefficients a(cid:126) = (a0,a1,...,an−1) and the vector of sample values (cid:126)y = (y0,y1,...,yn−1) are related bythematrixequation Va(cid:126)= (cid:126)y, orinmoredetail, 1 x x2 ··· xn−1 a   y  0 0 0 0 0 1 x x2 ··· xn−1 a   y   1 1 1  1   1  1 x2 x22 ··· x2n−1 a2 = y2 . ... ... ... ... ...  ...   ...       1 xn−1 xn2−1 ··· xnn−−11 an−1 yn−1 Given this formulation, we can clearly transform any coefficient vector a(cid:126) into the correspondingsamplevector (cid:126)y inO(n2)time. Conversely,ifweknowthesamplevalues (cid:126)y,wecanrecoverthecoefficientsbysolving asystemof nlinearequationsin nunknowns,whichcanbedoneinO(n3)timeusing Gaussianelimination.3 Butwecanspeedthisupbyimplicitlyhard-codingthesample 3Infact,Lagrange’sformulaisjustaspecialcaseofCramer’sruleforsolvinglinearsystems. 5 A. FASTFOURIERTRANSFORMS positions into the algorithm, To convert from samples to coefficients, we can simply multiplythesamplevectorbytheinverseof V,againinO(n2)time: a(cid:126)=V−1(cid:126)y. Computing V−1 wouldtakeO(n3)timeifwehadtodoitfromscratchusingGaussian elimination,butbecausewefixedthesetofsamplepositionsinadvance,thematrix V−1 canbehard-codeddirectlyintothealgorithm.4 SowecanconvertfromcoefficientstosamplesandbackinO(n2)time. Atfirstlance, thisresultseemspointless;wecanalreadyadd,multiply,orevaluatedirectlyineither representationinO(n2)time,sowhybother? Butthere’sadegreeoffreedomwehaven’t exploitedyet: Wegettochoosethesamplepositions! Ourconversionalgorithmisslow onlybecausewe’retryingtobetoogeneral. Ifwechooseasetofsamplepositionswith therightrecursivestructure,wecanperformthisconversionmorequickly. A.4 Divide and Conquer Any polynomial of degree at most n−1 can be expressed as a combination of two polynomialsofdegreeatmost(n/2)−1asfollows: p(x)=peven(x2)+x·podd(x2). Thecoefficientsof peven arejusttheeven-degreecoefficientsof p,andthecoefficientsof podd arejusttheodd-degreecoefficientsof p. Thus,wecanevaluate p(x)byrecursively evaluating peven(x2)and podd(x2)andperformingO(1)additionalarithmeticoperations. Nowcallaset X of nvaluescollapsing ifeitherofthefollowingconditionsholds: • X hasoneelement. • Theset X2=(cid:8)x2(cid:12)(cid:12) x ∈X(cid:9)hasexactly n/2elementsandis(recursively)collapsing. Clearly the size of any collapsing set is a power of 2. Given a polynomial p of degree n−1,andacollapsingset X ofsize n,wecancomputetheset{p(x)| x ∈X}ofsample valuesasfollows: 1. Recursivelycompute(cid:8)peven(x2)(cid:12)(cid:12) x ∈X(cid:9)=(cid:8)peven(y)(cid:12)(cid:12) y ∈X2(cid:9). 2. Recursivelycompute(cid:8)podd(x2)(cid:12)(cid:12) x ∈X(cid:9)=(cid:8)podd(y)(cid:12)(cid:12) y ∈X2(cid:9). 3. Foreach x ∈X,compute p(x)=peven(x2)+x·podd(x2). Therunningtimeofthisalgorithmsatisfiesthefamiliarrecurrence T(n)=2T(n/2)+ Θ(n),whichasweallknowsolvesto T(n)=Θ(nlogn). 4Actually,itispossibletoinvertann×nmatrixino(n3)time,usingfastmatrixmultiplicationalgorithms that closely resemble Karatsuba’s sub-quadratic divide-and-conquer algorithm for integer/polynomial multiplication. Ontheotherhand,mynumerical-analysiscolleagueshavereasonablecausetoshootmein thefacefordaringtosuggest,eveninpassing,thatanyoneactuallyinvertamatrixatall,ever. 6 A.5. TheDiscreteFourierTransform Great! Nowallweneedisasequenceofarbitrarilylargecollapsingsets. Thesimplest methodtoconstructsuchsetsisjusttoinverttherec(cid:112)ursivedefi(cid:112)nition: IfX isacollapsible setofsizenthatdoesnotcontainthenumber0,then X ={± x | x ∈X}isacollapsible setofsize2n. Thisobservationgivesusaninfinitesequenceofcollapsiblesets,starting asfollows:5 X :={1} 1 X :={1, −1} 2 X :={1, −1, i, −i} 4 (cid:112) (cid:112) (cid:112) (cid:112) (cid:112) (cid:112) (cid:112) (cid:112) (cid:26) (cid:27) 2 2 2 2 2 2 2 2 X := 1, −1, i, −i, + i, − − i, − i, − + i 8 2 2 2 2 2 2 2 2 A.5 The Discrete Fourier Transform Forany n,theelementsof X arecalledthecomplex nth roots of unity;thesearethe n rootsofthepolynomial xn−1=0. These ncomplexvaluesarespacedexactlyevenly around the unit circle in the complex plane. Every nth root of unity is a power of the primitive nthroot 2π 2π ω =e2πi/n=cos +isin . n n n Atypical nthrootofunityhastheform (cid:129)2π (cid:139) (cid:129)2π (cid:139) ωk =e(2πi/n)k =cos k +isin k . n n n Thesecomplexnumbershaveseveralusefulpropertiesforanyintegers nand k: • Thereareexactly ndifferent nthrootsofunity: ωk =ωkmodn. n n • If niseven,thenωk+n/2=−ωk;inparticular,ωn/2=−ω0 =−1. n n n n • 1/ωk =ω−k =ωk =(ω )k,wherethebarrepresentscomplexconjugation: a+bi= n n n n a−bi • ω =ωk . Thus,every nthrootofunityisalsoa(kn)throotofunity. n kn Thesepropertiesimplyimmediatelythatif nisapowerof2,thenthesetofall nthroots ofunityiscollapsible! Ifwesampleapolynomialofdegree n−1atthe nthrootsofunity,theresultinglist of sample values is called the discrete Fourier transform of the polynomial (or more 5Inthischapter,lowercaseitalicialwaysrepresentsthesquarerootof−1. Computerscientistsare usedtothinkingofi asanintegerindexintoas(cid:112)equence,anarray,orafor-loop,butweobviouslycan’t dothathere. Theengineers’habitofusing j= −1justdelaystheproblem—Howdoengineerswrite quaternions?—andtypographicalhackslikeI oriorιorMathematica’sıı◦arejuststupid. 7 A. FASTFOURIERTRANSFORMS formally, ofitscoefficientvector). Thus,givenanarray P[0..n−1]ofcoefficients,its discreteFouriertransformisthevector P∗[0..n−1]definedasfollows: n−1 (cid:88) P∗[j] := p(ωj) = P[k]·ωjk n n k=0 Aswealreadyobserved,thefactthatsetsofrootsofunityarecollapsibleimpliesthatwe cancomputethediscreteFouriertransforminO(nlogn)time. Theresultingalgorithm, calledthefast Fourier transform,waspopularizedbyCooleyandTukeyin1965.6 The algorithmassumesthat nisapoweroftwo;ifnecessary,wecanjustpadthecoefficient vectorwithzeros. Radix2FFT(P[0..n−1]): ifn=1 return P for j←0ton/2−1 U[j]←P[2j] V[j]←P[2j+1] U∗←Radix2FFT(U[0..n/2−1]) V∗←Radix2FFT(V[0..n/2−1]) ω ←cos(2π)+isin(2π) n n n ω←1 for j←0ton/2−1 P∗[j] ←U∗[j]+ω·V∗[j] P∗[j+n/2]←U∗[j]−ω·V∗[j] ω←ω·ω n return P∗[0..n−1] FigureA.1.TheCooley-Tukeyradix-2fastFouriertransformalgorithm. Variants of this divide-and-conquer algorithm were previously described by Good in 1958, by Thomas in 1948, by Danielson and Lánczos in 1942, by Stumpf in 1937, by Yates in 1932, and by Runge in 1903; some special cases were published even earlier by Everett in 1860, by Smith in 1846, and by Carlini in 1828. But the algorithm, in its full modern recursive generality, was first described and used by Gauss around 1805 for calculating the periodic orbits of asteroids from a finite number of observations. In fact, Gauss’s recursive algorithm predates even Fourier’s introduction of harmonic analysisbytwoyears. So,ofcourse,thealgorithmisuniversallycalledtheCooley-Tukey 6TukeyapparentlydevelopedthisalgorithmtohelpdetectSovietnucleartestswithoutactuallyvisiting Sovietnuclearfacilities,byinterpolatingoff-shoreseismicreadings. Withouthisrediscovery,thenucleartest bantreatymightneverhavebeenratified,andwemightallbespeakingRussian,ormorelikely,whatever languageradioactiveglassspeaks. 8 A.6. MoreGeneralFactoring algorithm. Gauss’s work built on earlier research on trigonometric interpolation by Bernoulli,Lagrange,Clairaut,andEuler;inparticular,thefirstexplicitdescriptionofthe discrete“Fourier”transformwaspublishedbyClairautin1754,morethanhalfacentury beforeFourier’swork. Alas,nobodywillunderstandyouifyoutalkaboutGauss’sfast Clairauttransformalgorithms. HoorayforStigler’sLaw!7 A.6 More General Factoring The algorithm in Figure A.1 is often called the radix-2 fast Fourier transform to distinguishitfromotherFFTalgorithms. Infact,bothGaussand(muchlater)Cooley andTukeydescribedamoregeneraldivide-and-conquerstrategythatdoesnotassumen isapoweroftwo,butcanbeappliedtoanycompositeorder n. Specifically,if n=pq, wecandecomposethediscreteFouriertransformoforder nintosimplerdiscreteFourier transformsasasfollows. Forallindices0≤a<p and0≤ b<q,wehave pq−1 (cid:88) xa∗q+b = xjp+k(ωpjpq+k)(cid:96) (cid:96)=0 p−1q−1 (cid:88)(cid:88) = xjp+kω(paqq+b)(jp+k) k=0j=0 p−1q−1 (cid:88)(cid:88) = xjp+kωqbjωpbqkωapk k=0j=0 p−1(cid:32)(cid:32)q−1 (cid:33) (cid:33) (cid:88) (cid:88) = xjp+k(ωqb)j ωpbqk (ωap)k, k=0 j=0 TheinnermostsuminthisexpressionisonecoefficientofadiscreteFouriertransform oforderq,andtheoutermostsumisonecoefficientofadiscreteFouriertransformof order q. The intermediate factors ωbk are now formally known as “twiddle factors”. pq 7LestanyonebelievethatStigler’sLawhastreatedGaussunfairly,rememberthat“Gaussianelimination” wasnotdiscoveredbyGauss; thealgorithmwasnotevengiventhatnameuntilthemid-20thcentury! EliminationbecamethestandardmethodforsolvingsystemsoflinearequationsinEuropeintheearly 1700s,whenitappearedinIsaacNewton’sinfluentialtextbookArithmeticauniversalis.8AlthoughNewton apparently(andperhapsevencorrectly)believedhehadinventedtheeliminationalgorithm,itactually appearsinseveralearlierworks,includingtheeighthchapteroftheChinesemanuscriptTheNineChapters oftheMathematicalArt. TheauthorsandpreciseageoftheNineChaptersareunknown,butcommentary writtenbyLiuHuiin263ceclaimsthatthetextwasalreadyseveralcenturiesold. Itwasalmostcertainly notinventedbyaChineseemperornamedFast. 8Arithmetica universalis was compiled from Newton’s lecture notes and published over Newton’s strenuousobjections. Herefusedtohavehisnameassociatedwiththebook,andheevenconsideredbuying upeverycopyofthefirstprintingtodestroythem. Apparentlyhedidn’twantanyonetothinkitwashis latestresearch. ThefirsteditioncreditingNewtonastheauthordidnotappearuntil25yearsafterhisdeath. 9 A. FASTFOURIERTRANSFORMS No,seriously,that’sactuallywhatthey’recalled. Thiswallofsymbolsimpliesthatthe discreteFouriertransformoforder ncanbeevaluatedasfollows: 1. Writetheinputvectorintoa p×q arrayinrow-majororder. 2. ApplyadiscreteFouriertransformoforder p toeachcolumnofthe2darray. 3. Multiplyeachentryofthe2darraybytheappropriatetwiddlefactor. 4. ApplyadiscreteFouriertransformoforderq toeachrowofthe2darray. 5. Extracttheoutputvectorfromthe p×q arrayincolumn-majororder. ThealgorithmisdescribedinmoredetailinFigureA.2. FactorFFT(P[0..pq−1]): 〈〈Copy/typecastto2darrayinrow-majororder〉〉 for j←0to p−1 fork←0toq−1 A[j,k]←P[jp+k] 〈〈Recursivelyapplyorder-pFFTstocolumns〉〉 fork←0toq−1 B[·,k]←FFT(A[·,k]) 〈〈Multiplybytwiddlefactors〉〉 for j←0to p−1 fork←0toq−1 B[·,k]←B[·,k]·ωjk pq 〈〈Recursivelyapplyorder-qFFTstorows〉〉 for j←0to p−1 C[j,·]←FFT(C[j,·]) 〈〈Copy/typecastto1darrayincolumn-majororder〉〉 for j←0to p−1 fork←0toq−1 P∗[j+kq]←C[j,k] return P∗[0..pq−1] FigureA.2.TheGauss-Cooley-TukeyFFTalgorithm Wecanrecovertheoriginalradix-2FFTalgorithminFigureA.1bysetting p=n/2 andq=2. Thelines P∗[j]←U∗[j]+ω·V∗[j]and P∗[j+n/2]←U∗[j]−ω·V∗[j]are applyinganorder-2discreteFouriertransform;inparticular,themultipliersωand−ω arethe“twiddlefactors”. Both Gauss9 and Cooley and Tukey recommended applying this factorization ap- proachrecursively. CooleyandTukeyobservedfurtherthatifallprimefactorsof nare 9Gausswrote: “Nullaiamampliusexplicationeopuserit,quomodoillapartitioadhuculteriusextendi etadeumcasumapplicaripossit,ubimultitudoomniumvalorumpropositorumnumerusetribuspluribusve factoribuscompositusest,e.g.sinumerusµrursusessetcompositus,inquocasumanifestoquaevisperiodusµ terminoruminpluresperiodosminoressubdividipotest.”[“Thereisnoneedtoexplainhowthatdivisioncan beextendedfurtherandappliedtothecasewheremostoftheproposedvaluesarecomposedofthreeor morefactors,forexample,ifthenumberµisagaincomposite,inwhichcaseeachperiodofµtermscan 10

Algorithms Appendix PDF

151 Pages·2018·4.46 MB·English

by Jeff Erickson

Checking for file health...

Save to my drive

Quick download

Download

Upgrade Premium

Most books are stored in the elastic cloud where traffic is expensive. For this reason, we have a limit on daily download.

Preview Algorithms Appendix

See more

The list of books you might like

Upgrade Premium

Most books are stored in the elastic cloud where traffic is expensive. For this reason, we have a limit on daily download.