ebook img

A new Truncated Fourier Transform algorithm PDF

0.22 MB·English
Save to my drive
Quick download
Download
Most books are stored in the elastic cloud where traffic is expensive. For this reason, we have a limit on daily download.

Preview A new Truncated Fourier Transform algorithm

A new in-place truncated Fourier transform algorithm 2 1 Andrew Arnold ([email protected]) 0 2 October 25, 2012 t c O Abstract 4 2 Let R be a ring containing a principal N = (2k)th root of unity. We present an algorithm which, given a polynomial f(z) over R of degree ] less than n ≤ N, gives a vector of n weighted evaluations of f(z), using C 1nlogn+O(n) ring multiplications. This algorithm requires only the 2 S spacefortheinputitselfandadditionalO(1)ringelementsandbounded- . s precision integers. The algorithm uses a linear-time method of breaking c f(z) into reduced images modulo polynomials of the form zm +1, m a [ poweroftwo,thenusestheFastFourierTransform(FFT)togetavector 2 of evaluation points of each image. The result is an in-place truncated v Fourier transform with complexity comparable to the truncated Fourier 0 transform of Van der Hoeven. Using this algorithm we give an in-place 6 algorithmforpolynomialmultiplicationrequiringspaceforonlytheinputs 9 and output and an additional O(1) bounded-precision integers and ring 4 elements. . 0 1 1 Introduction 2 1 : The Fast Fourier Transform (FFT) is an algorithm which maps a polynomial v i f(z)overaringRtoavectorcomprisedofnevaluationsoff(z)usingO(nlogn) X arithmetic ring operations. This algorithmlends itself to fast integer and poly- r nomial multiplication. The radix-2 Cooley-Tukey FFT [1] requires that the in- a puthaveapower-of-twosize. Thisresultsinjumpsinspaceandtimecomplexity at power-of-two sized inputs. Truncated Fourier Transform (TFT) algorithms byVanderHoeven[6]andbyHarveyandRoche[4]haveaddressedthesejumps in time and space complexity, respectively. Inthispaperwepresentanewin-placeTFTalgorithmthatimprovesonthe complexity of the Harvey-Roche TFT, in terms of the number of ring multipli- cations, by a constant factor. By in-place, we mean that the algorithm, acting on a size-n input, writes its result in place of its input and only requires space fortheoutputitselfplusanadditionalO(1)ringelementsandintegersbounded by cn for a fixed constant c. We then show how this algorithm may be used towards polynomial multiplication. In this section we describe the FFT and the previous TFT algorithms. 1 1.1 The discrete Fourier transform and its inverse The discrete Fourier transform (DFT) of a polynomial f(z) is its vector of evaluations at the distinct powers of a root of unity. Specifically, if f(z) = n−1a zi and R contains an nth primitive root of unity ω, then we define the i=0 i Pdiscrete Fourier transform of f(z) as DFTn(f)= f(ωj) . (1) ω 0≤j<n (cid:0) (cid:1) Wetreatthevectora=(a ,a ,...,a )andf asequivalentanduseDFTn(a) 0 1 n−1 ω andDFTn(f)interchangeably. The map DFTn :R[z]/(zn−1)→Rn forms an ω ω additive group isomorphism. If ω is a principal root of unity, that is, for j not divisible by n, n−1ωij =0, then we can obtain f mod(zn−1) by way of its i=0 inverse map, IDPFTn :Rn →R[z]/(zn−1), defined by ω IDFTnω(aˆ0,...,aˆn)= n1DFTω−1(aˆ0,...,aˆn), (2) where we againwe treat a polynomialas equivalent to its vector of coefficients. The DFT gives a polynomial multiplication algorithm: Theorem 1 (The Convolution Theorem). fg mod(zn−1)=IDFTn(DFT (f)·DFT (g)), (3) ω ω ω where “·”is thevector dot-product, and, given twopolynomial g(z),h(z)∈R[z], g(z)modh(z) denotes the unique polynomial r(z) such that h(z) divides g(z)− r(z) and deg(r)<deg(h) throughout. 1.2 The Fast Fourier Transform WecancomputethediscreteFouriertransformoff(z),f reducedmodulo(zn− 1), by way of the Fast Fourier Transform (FFT), which recursively reduces a DFT into smaller DFTs based on the factorization of n. The radix-2 FFT of Cooley and Tukey [1], computes a DFT for n a power-of-two by a divide-and- conquer approach. We break f(z) into two polynomials f (z) and f (z), where 0 1 n/2−1 n/2−1 f (z)= a zk, f (z)= a zk. (4) 0 2k 1 2k+1 Xk=0 Xk=0 Notethatf(z)=f (z2)+zf (z2). Thus,asωj+n/2 =−ωj,andω2j =ω2(j+n/2) 0 1 for an nth root of unity ω, we have that f(ωj)=f (ω2j)+ωjf (ω2j), f(ωj+n/2)=f (ω2j)−ωjf (ω2j), (5) 0 1 0 1 for 0 ≤ j < n/2. The pair of computations (5) are known as a butterfly op- eration. We perform these butterfly operations in place. Note that the values f (ω2j) comprise DFT (f ) for i = 0,1, two DFTs of size n/2. Thus, to com- i ω2 i pute the discrete Fourier transform of f, we split f into polynomials f and 0 2 f , compute their size-n/2 DFTs with root-of-unity ω2, and then compute the 1 butterfly operations (5), for 0 ≤ j < n, to obtain the evaluations of f at the powers of ω. If the butterfly operations are performed in place, the resulting evaluations f(ωj) will be written in bit-reversed order. More precisely, if we let rev (j) c denote the integer resulting from reversing the first c bits of j, we have that f(ωj) will be written in place of a , where k = rev (j) and log is taken to k log(n) be base-2 throughout. As an example, rev (11)=rev (01011 )=11010 =16+8+2=26. (6) 5 5 2 2 Procedure FFT(a, n, ω), an in-place implementation of the radix-2 Cooley-Tukey FFT Input: • a=(a(0),...,a(n−1)), an array containing a vector a∈Rn. • ω, a root of zn−1, for n a power of two. Result: DFTn(a) is written to the array a in bit-reversed order. ω 1 for i←−1 to log (n) do 2 2 m←−n/2i // Complete m DFTs of size n/m 3 (γ,ζ)←−(1,ωm) 4 for j ←−0 to n −1 do 2m 5 l←−rev (j)m i 6 for k ←−0 to m−1 do 7 (l ,l )←−(l+k,l+k+m) 1 2 // Butterfly operations (5) 8 a(l ),a(l ) ←− a(l )+γa(l ),a(l )−γa(l ) 1 2 1 2 1 2 (cid:16) (cid:17) (cid:16) (cid:17) 9 γ ←−γζ ProceduresFFTandIFFTdescribeimplementationsofthe in-placeFFTand its inverse, made non-recursive so as to make them in-place. Normally one would pre-compute powers of ω and store them in an array in bit-reversed order. To avoid this space overhead we perform the butterfly operations in an order such that we can compute the required powers of ωm sequentially. For better memory performance, one can produce the powers of ω in an order that allows us to traverse the array in sequential order, at the cost of an additional O(n) ring multiplications. The elements γ, ω, ζ, and array a comprise all the ring elements in the al- gorithm. Our only ring operations are due to the butterfly operations (line 8) and computing the powers of ω (lines 3, 9). ωm can be computed by a square- and-multiply approach using O(logm) multiplications. Enumerating these op- erations gives us the following cost: 3 ProcedureIFFT(a,n,ω),aninplaceimplementationoftheinverse-FFT Input: a=(a(0),...,a(n−1)), an array containing DFTn(a), for a ω vector a∈Rn, for n a power of two, in bit-reversed order. Result: The vector a is written to array a in place of its DFT. 1 for i←−1 to log (n) do 2 2 m←−2i−1 // Complete n DFTs of length 2m with root of unity ω−1 2m 3 (γ,ζ)←−(1,ω−m) 4 for j ←−0 to n −1 do 2m 5 for k ←−0 to m do 6 l←−2km 7 (l ,l )←−(j+l,j+l+m) 1 2 // Butterfly operations (5) 8 a(l ),a(l ) ←− a(l )+γa(l ),a(l )−γa(l ) 1 2 1 2 1 2 (cid:16) (cid:17) (cid:16) (cid:17) 9 γ ←−γζ 10 for i←−0 to n−1 do a(i)←− 1a(i) n Lemma 2. Let n be a power of two and let a be an array containing the co- efficients of a polynomial f(z) reduced modulo (zn−1). Computing DFTn(a) ω by way of procedure FFT, requires nlogn+O(1) additions and 1nlogn+O(n) 2 multiplications in R. Procedure IFFT has similar complexity. Theapproachdescribedfornapoweroftwogeneralizesfornaprimepower or a product of small primes, though in practise n is almost always chosen to be a power-of-2. Using the radix-2FFT, if d=deg(fg), we choosen to be the least powerof 2exceedingd. Thisentailsappendingzerostotheinputarraysaandb. Bythis method, computing a product of degree 2k requires the same time and space cost as computing a product of degree 2k +m, for 0 ≤ m < 2k. It would be preferable to have a radix-2 FFT algorithm whose time cost grows smoothly relative to n, and which requires n+O(1) space for arbitrary integers n > 0, and not just for n a power of two. Crandall’s devil’s convolution algorithm [2] somewhat flattens these jumps in complexity, though not entirely. It works by reducing a discrete convolutionofarbitrarylength into more easily computable convolutions. More recently, truncated Fourier transform (TFT) algorithms, described hereafter, have addressed this issue. 1.3 Truncated Fourier Transform algorithms The Truncated Fourier Transform (TFT) algorithm due to Van der Hoeven [6] returns a pruned DFT such that the result contains as few evaluations as is necessary. Suppose deg(fg)<nandN =2k where k=⌈log n⌉. Ifω isanNth 2 4 primitive root of unity, we define the TFT of f(z) with parameters ω and n to be the vector of evaluation points TFTn(f)= f(ωrevk(0)),f(ωrevk(1)),...,f(ωrevk(n−1)) . (7) ω (cid:16) (cid:17) It comprises the first n terms of DFTn(f), written in bit-reversed order. The ω TFT algorithmworksby discardingunnecessarybutterfly operations(5) where at least one of its inputs are 0. The inverse transform is more involved. Given alength-N vectora,the inverseTFT takesTFTn((a ,...,a ))andthe non- ω 0 n−1 transformed vector coefficients (a ,...,a ) and returns the entire vector n+1 N−1 a. For our purposes a is the vector of coefficients of f and a is presumed to be l zero for l ≥ n. The algorithm relies on the fact that one can obtain any two values in a butterfly relation (5) given the other two values by solving a 2×2 linear system. This method still requires space for N +O(1) ring elements. Theorem 3 (Van der Hoeven). The TFT algorithm computes TFTn(f) using ω n⌈logn⌉+O(n)ringadditions and 1(n⌈logn⌉)+O(n)multiplications bypowers 2 of ω. The inverse TFT algorithm requires similarly many multiplications by powers of ω, and n⌈logn⌉+O(n) shifted ring additions. A shifted addition is merely the combined operation of an addition plus, possibly, a multiplication by 2±1. For floating-point numbers and x ∈ Z/pZ, multiplication by 2±1, using bit shifts and bit operations, has bit-complexity comparable to addition. We note that Van der Hoeven computed more precise bounds on the number of ring operations (i.e. without “big-Oh”notation). We chose to write the cost as is to simplify the expressions in terms of n. Van der Hoeven’s algorithmalso generalizes to allow us to compute arbitrarysubsets of DFTs. We define, for a set I ⊂{0,1,...,N −1}, TFTI(f)= f(ωrevk(i)) . (8) ω (cid:16) (cid:17)i∈I By this notation TFTn =TFTI for I ={0,1,...,n−1}. ω ω More recently, Harvey and Roche [4] developed an in-place variant of the TFT which writes TFTn in place of (a ) . This algorithm was used to- ω i 0≤i<n wards fast in-place polynomial multiplication. This algorithm is implemented iteratively to avoid using an additional O(logn) space for the stack, but the algorithm effectively has a recursive structure. We first recursively compute TFT⌈n/2⌉(f )=f (ω2revk(i)), 0≤i≤⌈n/2⌉, (9) ω2 0 0 in place of the subset of coefficients of f, (a ,a ,...,a ). Then, if n is odd, 0 2 ⌈n/2⌉ wecomputeγ =f (ω2l),wherel=rev (⌈n/2⌉). SuchanevaluationtakesO(n) 1 k ring operations. We then compute TFT⌊n/2⌋(f )=f (ω2revk(i)), 0≤i≤⌊n/2⌋, (10) ω2 1 1 in place of the remaining coefficients of f , (a ,a ,...,a )). These two 1 1 3 ⌊n/2⌋ smaller TFTs, and γ if n is odd, gives us the information requried to complete 5 the butterfly relations needed to compute TFTn(f). These additional linear- ω time evaluations make the Harvey-Roche truncated Fourier transform require, at worst, a constant factor additional ring operations. Theorem 4 (Roche [5], Chapter 3). The in-place truncated Fourier transform requires at most 5n⌈logn⌉+ n−1 ring multiplications to compute TFTn(f). 6 3 ω 1.4 The Half-DFT In the TFT algorithm of this paper we will use the half-DFT (HDFT). For ω, a root of zn+1, the half-DFT is the map defined by HDFTn(f)= f(ω1),f(ω3),...,f(ω2n−1) . (11) ω (cid:0) (cid:1) It is the evaluation of f at the roots of zn+1. The half-DFT can be computed by way of a DFT [3]. Namely, if we let a′ be the vector defined by a′ = a ωj j j for 0≤j <n, then, up to re-ordering, HDFTn(a)=DFTn (a′). (12) ω ω2 Procedure HalfDFTdescribes an implementation of this approach. Procedure HalfDFT(a, n, ω), an in-place algorithm for computing the half-DFT Input: • a=(a(0),...,a(n−1)), an array containing a vector a∈Rn. • ω, a root of zn+1, where n is a power of two. Result: The half-DFT HDFTn(a) is written in place of a ω 1 γ ←−1 2 for i←−0 to n−1 do 3 a(i)←−γa(i) 4 γ ←−γω 5 FFT(a, n, ω2) ProcedureInverseHalfDFT(a,n,ω),anin-placealgorithmforinverting the half-DFT Input: • a=(a(0),...,a(n−1)), an array containing HalfDFTn(a) for some ω a∈Rn. • ω, a root of zn+1, where n is a power of two. Result: a is written in place of HDFTn(a) ω 1 IFFT(a, n, ω2) 2 γ ←−1 3 for i←−0 to n−1 do 4 a(i)←−γa(i) 5 γ ←−γω−1 Inverting the half-DFT is straightforward: we apply an inverse FFT and then multiply the ith coefficient by ω−i. We have the following complexities. 6 Lemma5. Letnbeapoweroftwo. HalfDFT(a,n,ω)andInverseHalfDFT(a,n,ω) both require nlogn+O(1) additions, and 1nlog n+O(n) multiplications. 2 2 The costofHalfDFT(a,n,γ)ismerelythe costofanFFT plusanadditional 2n ring multiplications. 2 An in-place TFT algorithm 2.1 Description of the result of the algorithm and notation We use the following notation throughout section 2 and thereafter. Suppose now that we have a polynomial f(z) = n−1a zj ∈ R[z] of degree at most j=0 j n−1, where n can be any positive integPer. Write n as n = s n , where i=1 i ni =2n(i), n(i)∈Z≥0 and ni >nj for 1≤i<j ≤s. For 1≤i≤Ps, we let i Φ =zni +1 and Γ = Φ . (13) i i i jY=1 Note that Φ mod Φ = 2 and Γ mod Φ = 2i for j ≥ i. Our transform i j i j algorithmwillcomputethe evaluationsoff(z)atthe rootsofΦ . Namely,ifwe i fix a canonicalrootω =ω of Φ , and then let, for 2≤i≤s, ω =ωn(1)/n(s), a 1 1 i 1 root of Φ , we will compute the vector of evaluations i f(ωj) for 0≤j <n , 1≤i≤s. (14) i i If we let I(n)={0≤k <n:for some i,1≤i≤s,n ≤k<2n }, (15) i i then the set of evaluations we compute is exactly TFTI(n)(f). ω The roots of unity used as evaluation points in TFTI(n)(f) will differ from ω those used in TFTn(f). For instance, TFTn will always use all roots-of-unity ω ω oforderdividingn ,whereasTFTI(n) willinsteaduseallrootsofunityoforder 1 ω 2n . 1 We define, for 1≤i≤s, the images f =f modΦ and f∗ =2−i+1f . (16) i i i i We call the polynomials f∗ the weighted images of f. We will first compute f∗ i i as they are less costly to compute than the images f . Once we have all the i weighted images f∗ we will reweigh them to obtain the unweighted images f . i i We also define, for 1≤i≤s, the images C =f modΓ . (17) i i We call the polynomials C the combined images of f. We also use i f if i=0 q = (18) i (cid:26) f quo Γi if 1≤i≤s 7 the quotient produced by dividing f by Γ and i n∗ =nmodn . (19) i i It is straightforward to obtain q , given f. Note that the degrees of any two i distinct terms of Γ differ by at least n , and that deg(q ) < n . Thus, as Γ is i i i i i monic,wehavethatthe coefficientsofq merelycomprisethe coefficientsofthe i higher-degree terms of f. More precisely, qi =an−1z(n∗i−1)+an−2z(n∗i−2)+···+an−n∗+1z+an−n∗. (20) i i By a similar argument, we also have that, for 1<i≤s, q =q quo Φ . (21) i i−1 i We note that q =0. We will also use the remainder resulting from dividing q s i by Φ , i r =q modΦ , 1≤i≤s. (22) i i−1 i Our transform has three steps: we first iteratively break f in-place into the remainders r ; we then iteratively write f∗ in place of r for i = 1,2,...,s; i i i lastly we reweigh the weighted images f∗ to get f and perform a half-DFT on i i each image f separately to give us the weighted evaluation points. i 2.2 Breaking f into the sequence of remainders r ,...,r 1 s We first break f =q0 into its quotient and remainder dividing by zn1 +1, n1−1 r =f mod(zn1 +1)= (a −a )zi, (23) 1 i i+n1 Xi=0 n∗ 1 q =f quo (zn1 +1)= a zi, (24) 1 i+n1 Xi=0 where a = 0 for i ≥ n. This can be done in place with n∗ subtractions in R. i 1 We then similarly break q into r and q , then q into r and q , and continue 1 2 2 2 3 3 until we have r ,...,r and q . Since deg(q ) < n = deg(Φ ), r is 1 s−1 s−1 s−1 s s s exactly q . s−1 This process is completely reversible. From r = q modΦ and q = i i−1 i i q quo Φ we can easily obtain q . If we write r = ni−1b zi and q = i−1 i i−1 i j=1 i i n∗i−1c zi, then P j=0 i P qi−1 =(b0+c0)+(b1+c1)z+···+(bn∗−1+cn∗−1)zn∗i−1+ (25) i i bn∗izn∗i +···+bni−1zni−1+ c0zni +···+cn∗zn∗i−1. i As r = q modΦ = q , we can obtain q from r and r . We then s s−1 s s−1 s−1 s−1 s can obtain q from q and r until we have q =f. j j+1 j+1 0 8 2.3 Writing the weighted images f∗ in place of the remain- i ders r i We first note that f∗ is precisely r . We will iteratively produce the remaining 1 1 weighted images. Suppose,atthestartoftheithiteration,wehavef∗,...,f∗,andr ,...,r , 1 i i+1 s and we want to write f∗ in place of r . We have i+1 i+1 f∗ =2−if modΦ , (26) i+1 i+1 =2−i(Γ q +C )modΦ , (27) i i i i+1 = q +2−iC modΦ , (28) i i i+1 =(cid:0)r + 2−(cid:1)iC modΦ . (29) i+1 i i+1 (cid:0) (cid:0) (cid:1)(cid:1) Unfortunately,wedon’thavethecombinedimageC ,butrathertheweighted i images f∗, 1 ≤ j ≤ i, from which we can reconstruct C . We would like to be j i able to compute C modΦ in place from the weighted images f∗. To that i i+1 j end, the following lemma gives us a basis for C modΦ . i i+1 Given an integer e, we will let e[i] refer to the ith bit of e, i.e. ⌊log(e)⌋ e= e[i]2i, e[i]∈{0,1}. (30) Xi=0 Lemma 6. Fix i, j and k, 1 ≤ j ≤ i < k ≤ s. Suppose that f∗ = ze and j f∗ = 0 for l 6= j, 0 ≤ l ≤i. Then C modΦ is nonzero only if e[n(l)]= 1 for l i k j <l ≤i, in which case C modΦ =2i−1ze modΦ . (31) i k k Given that znk modΦk =−1, we have that 2i−1ze modΦ =(−1)e[n(k)]2i−1z(emodnk), (32) k where emodn is the integer e∗ such that n |(e−e∗) and 0 ≤ e∗ < n . The k k k valuese[n(l)]canbe determinedfromn =2n(l) andebywayofabitwiseAND l operation. Example 7. Suppose n=86=64+16+4+2, and suppose that f∗ =f modz64+1=ze, f∗ =0, f∗ =0, 1 2 3 and deg(f) < 64+16+4. Then one can check that C mod(z2 +1), where 3 C =f rem (z64+1)(z16+1)(z4+1) , satisfies 3 (cid:2) (cid:3) 0 if e∈[0,20)∪[24,28)∪[32,52)∪[56,60),  4 if e=20,28,52, or 56, C3 mod(z2+1)= 4−z4 iiff ee==2212,,2390,,5534,, oorr 5578,, −4z if e=23,31,55, or 59.  9 Our lemma is stated more generally than what is needed for the algorithm, as we only need a result for C modΦ for 1 ≤ i < s. However, in doing so i i+1 this allows us to prove the lemma inductively. Proof of lemma 6. We fix e and j, and prove by induction on i. Base case: First consider the base case i = j, in which case the non-zero criterionof the lemma always holds and we need only show (31). We have that C ≡2i−1f∗ ≡2i−1ze (mod Φ ), (33) i i i and, by Chinese remaindering, C =C +Γ Γ−1 2i−1ze−C modΦ . (34) i i−1 i−1 i−1 i−1 i (cid:0) (cid:0) (cid:1) (cid:1) Since f modΦ =0 for 1≤l<i, it follows that C =0, and l i−1 C =Γ ze modΓ . (35) i i−1 i As e< n , Γ ze is already necessarily reduced modulo Γ , and we have C is i i−1 i i exactly Γ ze. Reducing this modulo Φ , we have i−1 k C modΦ =2i−1ze modΦ (36) i k k as desired. Inductive step: Suppose now that the lemma holds for a fixed i ≥ j, and consider C modΦ , k >i+1. We suppose that f∗ =0 for 1≤l6=j ≤i+1 i+1 k l and f∗ =ze. As f modΦ =0, Chinese remaindering gives us j i+1 C =C −Γ Γ−1C modΦ . (37) i+1 i i i i i+1 (cid:0) (cid:1) We prove the inductive step by cases. Case 1: Ife[n(l)]=0forsomel,j <l ≤i,thenbytheinductionhypothesis, C modΦ =0. Thus,given(37),C =C andC modΦ =C modΦ . i i+1 i+1 i i+1 k i k By the induction hypothesis again, C modΦ = 0, completing the proof for i k this case. Case 2: If e[n(l)] = 1 for j < l ≤ i, then by the induction hypothesis and (32), C modΦ =2i−1ze modΦ =2i−1(−1)e[n(i+1)]z(emodni+1), (38) i i+1 i+1 applying this to (37) gives C =C −Γ (−1)e[n(i+1)]z(emodni+1) modΓ . (39) i+1 i i i+1 (cid:16) (cid:17) By inspection of the degrees of the polynomials appearing in (39), we have equality: C =C −Γ (−1)e[n(i+1)]z(emodni+1) . (40) i+1 i i (cid:16) (cid:17) 10

See more

The list of books you might like

Most books are stored in the elastic cloud where traffic is expensive. For this reason, we have a limit on daily download.