ebook img

LAC 2012 - Paper: Controlling adaptive resampling PDF

2012·0.19 MB·English
Save to my drive
Quick download
Download
Most books are stored in the elastic cloud where traffic is expensive. For this reason, we have a limit on daily download.

Preview LAC 2012 - Paper: Controlling adaptive resampling

Controlling adaptive resampling Fons ADRIAENSEN, Casa della Musica, Pzle. San Francesco 1, 43000 Parma (PR), Italy, [email protected] Abstract equipment includes such processing on selected digital inputs. Combiningaudiocomponentsthatuseincoherentsam- For a sofware solution the main problem is not ple clocks requires adaptive resampling - the exact ra- tioofthesamplefrequenciesisnotknownaprioriand the variable ratio resampling itself, but how to may also drift slowly over time. This situation arises control it. Audio data is handled in blocks of whenusingtwoaudiocardsthatdon’thaveacommon typically a few milliseconds lenght, and the only word clock, or when exchanging audio signals over a information available to control the resampling network. Controllingtheresamplingalgorithminsoft- algorithm are the timestamps for these blocks, ware can be difficult as the available information (e.g. and in some cases data provided by e.g. ALSA’s timestamps on blocks of audio samples) is usually in- snd pcm avail() and similar functions. All this exactandverynoisy. Thispaperanalysestheproblem informationhasconsiderablerandomandsystem- and presents a possible solution. atic errors, and it is by no means evident how to turn it into the required smoothly changing con- Keywords trol signal for the resampling algorithm. Jack, ALSA, network audio, resampling None of the currently available solutions such as the alsa in and alsa out clients that come with 1 Introduction Jack really gets this right. The purpose of this Adaptive resampling is required when combin- paper is to analyse the problem and present a so- ing audio hardware running at incoherent sam- lution. The algorithms discussed in the follow- ple rates into a single system. Incoherent here ing sections have been implemented in two new means that the clocks are not derived from the applications, zita a2j and zita j2a wich allow to same source. Although the nominal sample rates add ALSA soundcards running at arbitrary sam- are known, their ratio is not exact and may even ple rates as clients to a Jack server. Other imple- drift over time. A fixed resampling ratio, e.g. mentations, e.g. for networked audio or allowing 44100/48000, orevenonethattakesknownerrors to link two Jack servers running on the same ma- into account, will sooner or later require samples chine will follow. to be inserted or dropped and this is in general not acceptable. 2 Requirements The problem arises when adding a second sound Resampling itself, even with a variable ratio, will cardtoaJackserver, orwhentwomachines, each not lead to any signficant loss of audio quality if having their own audio hardware but no common implemented correctly. The main consequences clock, need to exchange audio signals via a net- when using fixed-bandwidth resampling (as e.g. work connection. in libsamplerate and zita-resampler) will be a In hardware this is a relatively simple problem small loss of bandwidth and some additional de- to solve if both sample clocks are available or can lay. Both depend on the length of the multiphase be extracted from the data streams. A PLL is filter used by this algorithm, and the tradeoff be- used to track the ratio error, and its output con- tween the two can be made by the user. trolsavariableratioresampler. Someprofessional The effect of a non-constant resampling ratio de- pends on the magnitude of the variation and on A2J DLL LF data queue Control DLL its spectrum. Very slow and small changes are equivalent to small movements of a listener w.r.t the speakers, or a performer w.r.t. the micro- bHufWfer auLdoicok q-fureeuee Resampler bHufWfer phone. To put this in context, one sample at 48 ALSA thread Jack client Jack server kHz corresponds to about 7 millimeters in air. So provided such changes are limited to a few sam- J2A DLL Control LF data queue DLL ples and very gradual they won’t be noticed1. Larger variations, even when quite slow, may bHufWfer Resampler auLdoicok q-fureeuee bHufWfer leadtoperceptibledelayandpitchchangeswhich, Jack server Jack client ALSA thread while not really degrading the audio signal qual- ity, may not be acceptable from a musical point Figure 1: The structure of A2J and J2A of view. But the really nasty effect of delay modula- tionoccurswhenthevariationscontainhigherfre- 3 Problem analysis quency components, even if these are quite small. The result no longer appears as a modulation of Anyone trying to build an abstract picture of this the signal (e.g. vibrato) but as parasitic signals, sort of algorithm and to reason about it will find noise or distortion. Some types of sound are very that it stretches his or her powers of imagination sensitive to such phase modulation. This is really to some degree. It helps to have a particular im- the equivalent of jitter on the sample clock of an plementation in mind. In this paper we will use A/D or D/A converter, and sadly, some of the the structure of zita a2j and zita j2a as a frame- existing implementations of adaptive resampling work. However, the analysis is more general and produce this at level that is some orders of mag- applies in other cases as well. nitude higher than the worst hardware. Figure 1 shows the structure of those applica- tions. In this figure both are shown with the Jack Apart from audio quality considerations there are client connected directly to the sound card used someotheraspectswhichareimportant. Boththe by the Jack server — this is the setup we would resampling,andmovingaudiosignalsbetweendo- use to measure the latency (using e.g. jack delay) mains with asynchronous periods will introduce in a round-trip involving both Jack’s sound card latency. It depends on the application if this is and the additional one. acceptable or not. But at least the delay should Taking zita a2j as an example, the ALSA be stable and repeatable. Stability means that it thread, as regards audio processing, does little must not depend on e.g. whether a Jack client more than transfer audio frames from the ALSA implementing the resampling runs at the start or device to a lock-free buffer. It also provides some near the end of a Jack cycle. Again, existing im- extra information which will be discussed later. plementations fail to meet this requirement. KeepingtheALSAthreadassimpleasthisallows Applications implementing adaptive resam- it to run at a higher real-time priority than the pling can take up to a few seconds to stabilize Jack server, and with a shorter period time, and theircontrolloops, andneedtorestartifsynchro- this in turn helps to obtain accurate timing infor- nisation is lost, e.g. when a misbehaving Jack mation. The actual resampling and the control client results in a timeout of the server. This logic is performed in the process callback of the is quite accecptable, but minor incidents such as Jack client. The structure of zita j2a is virually Jack1 skipping one or a few cycles should not the mirror image of zita a2j. result in a change of the latency nor require a restart. 3.1 Building a model What needs to do be done is to control the re- sampling ratio in such a way that on average the 1 Unless you are e.g. sending one channel of a stereo same number of samples enter and leave the lock- pairviatheresamplerandtheothernot,butthatisreally asking for trouble. free buffer. We also want to do this using only very small and smooth changes of the resampling t_J Jack period ratio. A(t_J) A(t) Monitoring the actual state of the buffer (the k_0A k_1A J(t) number of frames stored in it) won’t work for sev- eral reasons. The most fundamental one is that J(t_J) we actually can’t observe the buffer state in any time reliable way from either side, as the other side could be modifying it at the same time. At best t_0A t_1A wecouldhaveanupperorlowerbound. Also,this Alsa period value doesn’t change in a smooth way, but jumps Figure 2: Delay calculation parameters each time a period is processed on either side. And these changes occur when the code of the JackprocesscallbackortheALSAthreadactually tentothelock-freebufferink . Wedon’tactually runs, and this moment is not at all representative J needthefunctiononJack’sside, sinceweareonly oftherealtimingoftheaudiosamples. Forexam- interested in its value at the start of the current ple the process callback of the Jack client can run period. at any time between the start of the current cycle A similar DLL can be provided at the ALSA andthestartofthenextone, thisjustdependson side. This requires reading the wakeup time of the position of the client in the connection graph the ALSA device using Jack’s microseconds timer and on the CPU load of other clients. and applying the DLL algorithm which is quite Thekeytocreatingaworkingmodelistotakeab- simple. To transfer the data to the Jack side a stractionoftheperiodbasedprocessing,including second lock-free queue is used. For each period therandomtimingerrors,andimaginetheresam- the ALSA thread sends a message containing it pling algorithm as a continuous process. current status, the computed timestamp for the Suppose we would have two continuous functions next period, and the number of frames written to oftime,W(t)andR(t)thatprovidethenumberof or read from the audio buffer. At the Jack side, samplesthathavebeenwrittentoresp. readfrom during each process callback these messages are thebufferatanytimet. Thenif∆istherequired read and the frame counts are accumulated into delay of the whole process, we could evaluate the a variable k . The most recent data (t ,k ) A1 A1 A1 error W(t) − R(t) − ∆ in each process callback is used, along with the same from the previous andusethistocontroltheresamplingratio. This, period,(t ,k ). SincetheALSAthreadreports A0 A0 with some refinements and extra functionality, is the wakeup time of its next cycle, the interval the basic algorithm used in zita a2j and zita j2a. (t ,t ) includes the current wakeup time at the A0 A1 Remains to create those two functions. Let J(t) Jackside(orintheworstcaseoneoftheendpoints be the function on Jack’s side and A(t) the one will be close), so a simple interpolation is all that on the ALSA side. Then for zita a2j J(t) = R(t) is needed to compute A(t ). Figure 2 shows the J and A(t) = W(t), and for zita j2a J(t) = W(t) relevant parameters for the case zita a2j. and A(t) = R(t). 3.2 Resampler delay On the Jack side, part of the solution is already available in the server. The DLL (Delay Locked The analysis so far has ignored the delay intro- Loop) thathasbeenpartofJacksincemanyyears duced by the resampling process itself. There are computes in each cycle a prediction of the start two aspects to this. First, this latency can be sig- timeofthenextcycle, whileremovingmostofthe nificant and it must be taken into account when jitter due to random wakeup latency. This pro- defining the target value. Second, when using k , J vides a smooth and continuous mapping between the number of frames transferred between the re- time (as measured by Jack’s microsecond timer) sampler and the audio queue, as the value of J(t) and frame counts. we are actually making an error. J(t) should in- So we can implement J(t) by just reading the crease by exactly the same delta in each period, timestamp provided by Jack’s DLL into t , and the Jack period size either multiplied or divided J summing the number of frames read from or writ- by the resampling ratio, but it doesn’t because it outdist = 4.5 samples turns the current value of this delay at the input sample rate, and this is used to correct the value of k . One may ask if such a small error actually J matters. Thisdependsonthenominalresampling ratio. If this is not the quotient of two small inte- gers then the fractional error is a pseudo-random valueanditseffectwilllberemovedbytheloopfil- ter that controls the resampling ratio. The worst case results if the two sample rates are nominally inpdist = 2.7 samples the same. The error will be a sawtooth function Figure 3: Resampler latency with a frequency equal to the difference between the two actual sample rates. In this case the loop filter may not completely remove it. The effect was visible in test results of early implementa- is constrained to be integer. The sum will be ex- tions that did not include the correction in the actonaverage,thereisnolongtermaccumulating error calculation. error, but we are missing the fractional part. That fractional part is actually represented by 3.3 Closing the loop the internal state of the resampler. To see this we Combining the elements presented above we can will use zita-resampler as an example. Constant nowformulatetheequationsgivingthedelayerror bandwidth resampling works by evaluating a FIR forbothcases. Letγ betheresamplingratio, d res filter (a near ’brickwall’ lowpass wich corresponds be the value returned by the inpdist() member of toawindowedsinc()functioninthetimedomain) theresampler,∆thetargetdelayvalueandt = t J for each output sample. The central peak of the the start time of the current cycle, then impulse response corresponds to the current out- put sample, and the actual coefficients used de- E = W(t)−R(t)+d −∆ A2J res pend on the position of the filter w.r.t. the input E = W(t)−R(t)+d ∗γ −∆ J2A res samples. This is shown in Fig. 3, using a very short filter. Actual resampling filters are much Using the definitions of W() and R(), and setting longer. t −t Whenzita resampler finishesprocessingablock d = A(t ) = [k −k ] j A0 (1) A J A1 A0 of frames it remains in a state ready to compute tA1−tA0 the next output sample, except that it may have these become to read one or more input samples before it can proceed. InFig.3thebottom(red)dotsrepresent EA2J = [kA0−kJ]+dA+dres−∆ (2) input samples and the top (blue) ones the out- E = [k −k ]−d +d ∗γ −∆ (3) J2A J A0 A res put. Solid dots are samples already used or com- puted. The filter is aligned with the next output This error is first processed by a second order sample. In this example one more input sample lowpass filter and then becomes the input to the is required to compute the next output, the first secondorderloopfilter. Thisisverysimilartothe non-solid one in the figure. This will not always one used in Jack’s DLL, see [Adriaensen, 2005]. bethecase, forexampleifthepreviouscalltermi- The first filter is added to further reduce phase nated because the output buffer was full it could modulation by high frequency noise on the error be that the next output doesn’t require a new in- value. Its bandwidth is 20 times that of the loop put sample. But in any case the distance between filter, so it does not affect stability. The resam- the next input sample and the next output one is pler code adds another lowpass to smooth ratio well-defined and it can be expressed in either the changes. Also this one must be dimensioned so it input or output sample rate. This value includes does not affect operation of the loop. the fractional part that is missing from k . Thevaluesk andk areobtainedbyaccumulat- J J A1 The Vresampler class used in zita a2j and ing differences in each cycle. To make the equa- zita j2a provides a function inpdist() which re- tions above represent the actual round trip delay error we need to initialise them with the correct can’tsetthenumberofframesinthequeuetoany values. Sincek isjustk fromthepreviouscy- specific value in a safe way (as the other side may A0 A1 cle it needs no initialisation. A bit of arithmetic be accessing the queue at the same time). The (which is left as an exercise for the reader and only thing we can do is force a relative change by wich may tickle his or her powers of imagination either reading or writing a number of frames, and a bit) will show that the correct initial values are that is fact all we need. If N is the delay error rounded to the nearest kJ = −PJ/γ integer, then for the A2J case we set kJ = kJ+N k = P and read N frames from the queue, and for the A1 A J2A case we set k = k − N and write −N J J if the ALSA device is the input, and frames to the queue. Adjusting k removes most J oftheerrorfromthemodel, andapplyingthecor- kJ = PJ ∗γ responding change to the queue ensures that the k = −P model remains in sync with the actual situation. A1 A Both example applications do remove the ini- in the other case, with PJ and PA being the Jack tial error in this way, then run the loop at higher and ALSA period sizes, and γ the resampling ra- bandwidth for the first 4 seconds. tio. The key to understand this is that at any time 4 Implementation details thedelaymustbethesumofthenumberofframes 4.1 Delay error calculation inbothhardwarebuffers, thelock-freequeue, and The k , k and k values used in (1,2,3) are the resampler, while taking the different sample J A0 A1 integers that are incremented by frame counts in rates into consideration. each iteration, so they will overflow at some time. A related matter is to determine the target Since we only ever take differences of those, the round-tripdelayvalue. Somemore(simple)arith- overflow doesn’t matter. But this part of the cal- metic will show that culationmust bedoneasasubtractionofintegers ∆ = T +2T +(1+c)T without conversion to a floating point value, as min,A2J res J A suggestedbythesquarebracketsintheequations. ∆ = T +2T +2T min,J2A res J A Theremainingpartscanbedonesafelyinfloating where T is the delay of the resampler, T and point since these terms are recomputed each time res J T the Jack and ALSA side period times, and c and there will be no accumulating roundoff error. A is the maximum expected wakeup latency of the Jack represent its microseconds timer as a 64- ALSAthread,e.g. 1/2whenthisthreadisallowed bit integer type. These are difficult to use in to run half a cycle late. These values allow for calculations so they are converted to double. To worst case conditions, e.g. a Jack client running avoid loss of precision, the actual value of tJ, tA0 near the end of the cycle. and tA1 is the microseconds time masked to 28 bits, and divided by 106 to obtain seconds. This 3.4 Improving the settling time representation will wrap around every 268.4 sec- The system discussed so far will provide a con- onds or so. Again since we are always taking dif- stant processing delay, but with the loop running ferences this is easy to detect and correct. at is normal bandwidth (around 0.05 Hz in the 4.2 Error recovery and reporting current implementation) it could take a long time toreachthetargetvalue∆. Wecandotwothings In case of a fatal error in the ALSA device or a tospeedupconvergenceoftheloop. Oneistorun timeout of the Jack server there is no alternative theloopfilteratahigherbandwidthinitially. The but to restart the loop initialisation. The two ap- other is to use the first delay error measurement plications discussed here contain some state man- to force a situation close to the required one, and agement code to allow this without having to then let the loop take care of the remaining error. restart the actual processes. This uses a third This requires modifying the state of the lock-free lock-freequeue(notshowninFig.1)tosendcom- queue. Note that once the system is running we mands from the Jack side to the ALSA thread. When restarting the loop will settle quite fast, as the previous resample rate correction can be re- used. The Jack1 server will occasionally skip some cycles while creating or removing clients or when making port connections. This can be detected and handled quite easily and without introducing any errors in the loop. The particular implemen- tation of the lock-free audio queue allows it to recover from overflow or underflow conditions by just reading or writing the number of frames that were missed before. When doing this, the same adjustmentismadetok toremovetheerrorfrom J the delay calculation. A fourth lock-free queue is used to convey sta- Figure 4: zita-j2a, 48.0 to 44.1 kHz tus and optional monitoring information for out- put in the main thread. 4.3 The lock-free queue Whenrecoveringfromskippedcyclesthelock-free audio queue may be in an overflow or underflow condition. The same is true when removing the initial delay error as discussed in 3.4. Also, the value of N used there can be positive or negative. Thelock-freequeuemustimplementedinaway that maintains correct read and write counters and the corresponding data pointers at all times. It must also allow logical read or write operations of any number of items, including negative ones. Suchanimplementationneednotbemorecom- plicatedthansomeexistingones,andinfactitcan bemuchsimpler. TheC++classusedinzita a2j and zita j2a uses a 2N buffer size as would most. Figure 5: zita-j2a, 48.0 to 48.0 kHz It maintains read and write counters as 32-bit in- tegers. These are just incremented by the num- ber of items the user claims to have read or writ- being48kHzattheJacksideand44.1kHzforthe ten, without enforcing any limits. The read and ALSA device. The Y axis is the phase of the out- write indices are the respective counters modulo putsignal(w.r.t. tothegenerator)indegrees, the the buffer size. Since the buffer size divides the X-axis is in centiseconds. The loop stabilises in 232 range of the counters, these can be allowed about 15 seconds. After that time, variations are to overflow without any consequence. It is the less than 0.5 degrees peak-to-peak. One degree user’s responsability to avoid buffer overflow or at 1 kHz corresponds to 2.78 microseconds. The underflow if that matters, and the class provides small bump at around 145 seconds is the result of thenecessaryinformationtomakethiseasy,sono switching the desktop workspace. This measure- functionality is lost by this particular implemen- ment was done one a single CPU machine, run- tation. ningastandard(unpatched)kernelandhavingno HW video acceleration. 5 Results Figure 5 is the result of the same test but with Figure 4 shows the result of a 1 kHz sinewave sig- both ends running at 48.0 kHz. There is a pe- nal processed by zita j2a, with the sample rates riodic dip every 83 seconds, which corresponds to a frequency of around 0.012 Hz. This is the difference between the period frequencies of both soundcards. Every 83 seconds the interrupts of the two cards occur at almost the same time and compete for the CPU. This effect would probably not be visible on an SMP machine. It’s harmless in practice as the delay changes are very small and smooth. 6 Acknowledgements The results reported in this paper build on the work done by the Jack and ALSA developers. A particular thanks to the members of the respec- tivemailinglistsforthepromptanswerstoallmy questions. References Fons Adriaensen. 2005. Using a DLL to fil- ter time. Available at http://kokkinizita. linuxaudio.org/papers/index.html.

See more

The list of books you might like

Most books are stored in the elastic cloud where traffic is expensive. For this reason, we have a limit on daily download.