Optimizing the Constant-Q Transform in Octave Hans Fugal New Mexico State University P.O. Box 30001, MSC CS Las Cruces, New Mexico 88003 United States of America [email protected] Abstract 4000 The constant-Q transform calculates the log- 3500 frequency spectrum and is therefore better suited 3000 to many computer music tasks than the dis- crete Fourier transform, which computes the linear- 2500 z frequency spectrum. The direct calculation of H2000 theconstant-Qtransformiscomputationallyexpen- 1500 sive, and so the Brown-Puckette efficient constant- 1000 Q transform algorithm is often used. The origi- nal constant-Q transform and the Brown-Puckette 500 constant-Q transform algorithm were implemented 0 2 4 6 8 10 seconds and benchmarked in Octave. The Brown-Puckette algorithm was faster than the direct algorithm with Figure 1: A portion of a normal (linear- a high minimum frequency, but it was not faster for frequency) spectrogram of a monophonic a low minimum frequency. Additional vectorization ofthealgorithmisslowerundermostcircumstances, melody recorded on a pipe organ. Observe the and using sparse matrices gives marginal speedup. harmonic is not equidistant from the fundamen- tal frequency because of the log-frequency na- Keywords ture of harmonics. Octave, spectrum, spectrogram, constant-Q, opti- mization 1 Introduction 2093 Due to the log-frequency nature of the musi- 1047 cal scale and human hearing, and the linear- z H frequency nature of the discrete Fourier trans- 523 form(DFT),theDFTissometimesinconvenient to work with. For example, one semitone in a 262 lowregisterspansafewHz,butinahighregister one semitone spans over 100 Hz. When work- 131 0 2 4 6 8 10 Time ing with pitches, the log-frequency scale is de- sirable. Harmonic partials are integer multiples Figure 2: Constant-Q (log-frequency) spectro- of the fundamental frequency. The spacing be- gram of the same recording portrayed in figure tween harmonics is therefore not uniform in the 1. Observe the harmonics move with the fun- linear-frequency scale, which is inconvenient for damental frequency, and the distance between algorithms which try to recognize the harmonic semitones is constant. The vertical range is signatureofinstruments. Theconstant-Qtrans- more effectively used (the same limits are used form uses a varying time window to calculate on both plots). the log-frequency spectrum, and is better suited for these and many other problems in computer music. Figures1and2visuallydemonstratethe stant for each frequency bin. The Q of a filter is advantages of the constant-Q transform. definedasf/∆f,wheref isthecenterfrequency Theconstant-Qfilterissonamedbecausethe and ∆f is the bandwidth. The bandwidth is in- ratio of frequency to bandwidth (Q) is held con- versely proportional to the number of samples evaluated, and the constant-Q transform varies the number of samples according to the center frequency in order to keep Q constant. Brown [1] defines the constant-Q transform n as: bi y c n e u q e N −1 fr 1 k(cid:88)cq Xcq(kcq) = N wkcq(n)x(n)e−j2πQn/Nkcq kcq n=0 (1) N Q = 1/(21/b − 1) and b is the number of frequency bins per octave (often 12, for semi- Figure 3: The spectral kernel. Most of the ker- tone spacing). N = Qf /f and f = nel is very close to 0, and can be converted into kcq s kcq kcq 2kcq/bfmin. wkcq is a window function of length a sparse matrix. N . I will use B to refer to the total number kcq of constant-Q bins calculated. 2 Methods Brown and Puckette [2] introduced an ef- There were five implementations: the original ficient algorithm for calculating the constant- constant-Q algorithm (cq), the Brown-Puckette Q transform, which leverages the Fast Fourier constant-Q algorithm (ecq), the vectorized Transform and a precomputed sparse spectral Brown-Puckette constant-Q algorithm (vecq), kernel to compute the transform. The kernel the Brown-Puckette constant-Q algorithm calculationisexpensivebutdependsonlyonthe with a sparse-matrix kernel (secq), and the sample rate, number of bins per octave, and vectorized Brown-Puckette constant-Q algo- minimum and maximum frequencies. It is de- rithm with a sparse-matrix kernel (svecq). fined as: The code for each implementation along with the benchmark code can be found online at N(cid:88)0−1 http://hans.fugal.net/research/cq-octave/. K(kcq,k) = wkcq(n)ejωkcqe−j2πkn/N0 (2) Relevant snippets (edited for line length and n=0 brevity) are included below. cqisastraightforwardimplementationof (1), The precomputed kernel and FFT of the sig- usingoneforloopwithavectorizeddotproduct. nal are used to calculate the constant-Q trans- form: for kcq = 0:B-1 f = f_min * r^kcq; Nkcq = round(Q*sr/f); 1 N(cid:88)0−1 Xcq(kcq+1) = \ X (k ) = X(k)K(k ,k) (3) cq cq N cq sum(hamming(Nkcq) .* (x(1:Nkcq) .* \ 0 k=0 exp(j2piQn(1:Nkcq)/Nkcq))) / Nkcq; end As an optimization, the product may only be evaluatedwheretheabsolutevalueofanelement ecq is a straightforward implementation of (3), ofthespectralkernelisgreaterthanMINVAL = using one for loop with a vectorized dot prod- 0.15. Only 1–2% of the spectral kernel elements uct. The kernel used for ecq is the full kernel are larger than MINVAL (see figure 3). without truncation below MINVAL. It was ex- This paper explores the performance and op- pected that because of the precomputation of timization of Octave1 code to calculate the the spectral kernel, ecq would outperform cq. constant-Q transform using the original and Brown-Puckette algorithms described above. X = fft(x, N0); TheBrown-Puckettealgorithmisnotalwaysthe for kcq = 1:B fastest, and some would-be optimizations actu- Xcq(kcq) = sum(X .* K(kcq, :)) / N0; ally harm performance. end 1Octaveisanopen-sourcehigh-levellanguagefornu- This code is already somewhat vectorized using merical computation, and is mostly compatible with MATLAB. the .* and sum operations, in place of another for loop, and is natural for an Octave or MAT- 7 MacBook LAB programmer. The cardinal rule of opti- Athlon 6 mization in Octave and MATLAB is to avoid for loops in favor of vectorization wherever pos- 5 sible. By employing an additional vectorization ds 4 technique using the repmat function, the outer on for loop can also be eliminated, and this is done sec 3 in vecq. B rows of X are replicated to give a 2 B×N matrix, which is the size of the kernel K. 1 These two matrices are then element-wise mul- tiplied, and the sum is taken across the rows. It 0 cqkern sparse cq ecq vecq secq svecq was expected that this optimization would im- prove performance over ecq. Figure 4: Mean execution time and standard deviation over 100 runs with f = 16.35 Hz X = fft(x, N0); min (C0, MIDI note 12), on two computers. Xcq = sum((repmat(X,B,1) .* K), 2)/N0; The motivation behind the Brown-Puckette 0.55 MacBook constant-Q algorithm is that the spectral ker- 0.5 Athlon nel is sparse and few multiplications per bin are 0.45 needed. The efficient way to implement this in 0.4 Octave is to use a sparse matrix, which allows ds 0.35 Octave to avoid doing expensive operations on on 0.3 ec zeroswithoutexpensiveforloopsandcondition- s 0.25 als in the Octave code. secq is ecq executed 0.2 with a sparse kernel. Likewise, svecq is vecq 0.15 with a sparse kernel. secq and svecq were ex- 0.1 0.05 pected to perform better than ecq and vecq re- cqkern sparse cq ecq vecq secq svecq spectively because of the sparse kernel. cqkern is the code to calculate the spectral Figure 5: Mean execution time and standard kernel and sparse is the code to set values of deviation over 100 runs with f = 130.81 Hz min K less than MINVAL to 0 and then convert K (C3, MIDI note 48), on two computers. into a sparse matrix. Their execution time is theone-timeoverheadneededtousetheBrown- Forthefirstexperiment(f = 16.35Hz),cq Puckette algorithm family. min is the fastest by far. vecq and svecq are slower The parameters used for the first experiment thanecqandsecqrespectively. secqandsvecq are a sample rate of 44100 Hz, 12 bins per oc- are faster then ecq and vecq respectively. tave, f = 16.35 Hz (C0, MIDI note 12), and min For the second experiment (f = 130.81 f at the Nyquist frequency. min max Hz), cq is the slowest by far. On the Athlon Each algorithm ran 100 times in succession, vecq and svecq are still slower than ecq and timedusing cputime. Theexperimentwasdone secqrespectively,butontheMacBookvecqand on two systems. The first is a MacBook with svecqarefasterthanecqandsecqrespectively. a 2GHz Intel Core Duo processor and 2G of Inthisexperimenttheeffectofthesparsekernel RAM, running OS X 10.5 with Octave 3.0.3 is negligible on the Athlon. from MacPorts. The second is an Athlon 64 2800+ (1.8GHz) desktop with 1G of RAM, run- 4 Discussion ning32-bitUbuntu8.10withOctave3.0.1stock. The measure FFTW planner was used. The Brown-Puckette algorithm as implemented The experiment was then repeated with the was not always more efficient. For lower f min same parameters except f = 130.81 Hz (C3, the direct algorithm is considerably faster. But min MIDI note 48). for higher f the Brown-Puckette algorithm min is considerably faster, and the overhead for cal- 3 Results culating the spectral kernel is paid for almost Figures 4 and 5 show the mean execution times immediately. An analysis of the algorithms with standard deviation for each experiment. reveals that they are all of the same order: O(N logN ). (N depends primarily on f in 0 0 0 min practice, when the sample rate and Q are held constat.) It may be that when N is large the 0 speed of memory access and memory manage- ment issues dominate, but when N is small the 0 inefficiencies of for loops in Octave dominate. Each algorithm is of the same order but the im- plementations depend on the size of the input in complex ways. Conventional wisdom says that the more vec- torized the code is, the faster Octave can exe- cute it. However, the repmat vectorization of the Brown-Puckette algorithm was slower than the straightforward implementation, except on theMacBookforhigherf . Theworking-copy min space complexity of the straightforward imple- mentation is O(N ), and that of the repmat im- 0 plementation is O(N logN ). Care was taken 0 0 to avoid swapping to disk during the experi- ments, but we may be seeing a slowdown due to more memory accesses and memory manage- ment. Thespeedupof vecqontheMacBookfor higher f may be due to parallelization on the min dual cores. Sparse matrices improve speed (or at least do not slow it down), and the overhead of making the kernel sparse is not great, so when using the Brown-Puckette algorithm sparse matrices should be used. But the sparseness does not make an order-of-magnitude difference, and if higher accuracy is desired it can be omitted. 5 Future Work It would be instructive to run the same exper- iments on MATLAB and compare the perfor- mance to that of Octave, both in relative and absolute terms. Would the repmat optimization be slower on MATLAB as well? Therecouldbefurtherinvestigationofthein- herentdifferencesinthetwoalgorithms,perhaps comparing implementations in C, where mem- ory management is more transparent and loops are not inefficient, making a direct implemen- tation of the Brown-Puckette constant-Q algo- rithm more true in spirit to the original paper. References [1] Judith C. Brown. Calculation of a constant Q spectral transform. J. Acoust. Soc. Am, 89(1):425–434, January 1991. [2] Judith.C.BrownandMillerS.Puckette. An efficient algorithm for the calculation of a constant Q transform. J. Acoust. Soc. Am, 92(5):2698–2701, November 1992.