ebook img

LAC 2012 - Paper: A Toolkit for the Design of Ambisonic Decoders PDF

2012·0.82 MB·English
Save to my drive
Quick download
Download
Most books are stored in the elastic cloud where traffic is expensive. For this reason, we have a limit on daily download.

Preview LAC 2012 - Paper: A Toolkit for the Design of Ambisonic Decoders

A Toolkit for the Design of Ambisonic Decoders Aaron J. HELLER Eric M. BENJAMIN Richard LEE AI Center, SRI International Surround Research Pandit Litoral Menlo Park, CA, US Pacifica, CA, US Cooktown, QLD, AU [email protected] [email protected] [email protected] Abstract arrays [Heller et al., 2010]. In this paper, we ex- tendthatworktofullperiphonic(3-D)arraysand Implementation of Ambisonic reproduction systems is higher-order Ambisonics (HOA). The techniques limitedbythenumberandplacementoftheloudspeak- ers. In practice, real-world systems tend to have in- are implemented as a MATLAB [2011] and GNU sufficient loudspeaker coverage above and below the Octave [Gnu, 2011] toolkit that makes use of the listening position. Because the localization experi- NLOpt library [Johnson, 2011] to perform the op- enced by the listener is a nonlinear function of the timization. loudspeaker signals it is difficult to derive suitable de- We use the term decoder to mean the configu- coders analytically. As an alternative, it is possible to ration for a decoding engine that does the actual derive decoders via a search process in which analytic signal processing. Examples are Ambdec [Adri- estimators of the localization quality are evaluated at aensen, 2011] that operates in real time, as well each search position. We discuss the issues involved and describe a set of tools for generating optimized as an offline decoder we have implementedas part decoder solutions for irregular loudspeaker arrays and of this toolkit.1 demonstrate those tools with practical examples. Inthispaper,weuseboldromantypetodenote vectors,italictypetodenotescalars,andsansserif Keywords type to denote signals. A scalar with the same name as a vector denotes the magnitude of the Ambisonic decoder, HOA, periphonic, nonlinear opti- mization vector. A vector with a circumflex (“hat”) is a unit vector, so, for example, r^ = r =r . E E E 1 Introduction We start with a discussion of the design process andthetradeoffsinvolved,thenthespecificsofthe Ambisonics is a versatile surround sound record- optimizationprocess, andfinallyresultsoftwoar- ing and reproduction system. One of the attrac- rays, a third-order decoder for the 22-loudspeaker tions is that the transmission format is indepen- CCRMAarray,andasecond-orderdecoderforthe dent of the loudspeaker layout. However, this ◦ 12-loudspeaker 30 tri-rectangle array. means that each playback system needs a custom decoder that is matched to the loudspeaker ar- 2 Designing Ambisonic Decoders ray. The decoder creates the loudspeaker signals Ambisonics represents a sound field with a group from the transmission signals. Ambisonics theory of signals that are proportional to spherical har- provides simple encapsulations of high- and low- monics. TheoriginalAmbisonicsystemswerefirst frequency auditory localization that can be used order only, but more recently, higher-order sys- to design decoders, as well as theorems that ease the design of decoders for regular polygonal and 1AnotherimportantfunctionofanAmbisonicdecoderis polyhedral loudspeaker arrays. to provide near-field compensation. This compensates for In earlier papers, the present authors have dis- the curvature of the wavefronts due to the finite distance cussed the design and testing of first-order de- to the loudspeaker and is strictly a function of distance of the speaker from the center of the array and the order coders for regular horizontal loudspeaker layouts of reproduction. Ambdec and the offline decoder in this [Heller et al., 2008] as well as the use of nonlin- toolkit provide such filters and they will not be discussed ear optimization to design decoders for ITU 5.1 further in this paper. tems have been implemented. In first-order Am- to create realistic atmosphere even if more precise bisonics the zeroth-order component represents methods like HOA are used for special sounds. the sound pressure, and the three first-order com- To preserve this advantage requires the preser- ponents represent the acoustic particle velocity. vation of a good facsimile of the diffuse field. En- If these components are reproduced exactly, then ergy gain that varies with direction and “bunch- the sound will be correct at the center. How- ing” of directions, particularly in the horizontal ever, it is not possible to get the first-order com- plane, are all detrimental, as is “speaker detent” ponents correct except at a single point and not where individual loudspeakers draw attention to practical to get them correct at higher frequen- themselves. cies, where the wavelengths become smaller than 2.1 Auditory Localization the size of the human head. The task of the de- Due to the range of wavelengths involved, the coder is to create the best perceptual impression human auditory localization mechanism utilizes thatthesoundfieldisbeingreproducedaccurately different directional cues over different frequency given the loudspeaker array being used. regimes. At low frequencies, localization depends In practical terms, the following are necessary: on the detection of Interaural Time Differences (cid:15) Constant amplitude gain for all source direc- (ITDs), but at high frequencies there is an am- tions biguity because a human head is multiple wave- lengths across above about 1 kHz. For this rea- (cid:15) Constant energy gain for all source directions son, localization switches abruptly, depending on (cid:15) At low frequencies, correct reproduced wave- InterauralLevelDifferences(ILDs)abovethatfre- front direction and velocity quency. One way to predict localization would be to use Head Related Transfer Functions (HRTFs) (cid:15) At high frequencies, maximum concentration tocalculatetheactualearsignalsofalistener,but of energy in the source direction this turns out to be computationally difficult and (cid:15) Matching high- and low-frequency perceived would vary from listener to listener. directions Gerzon developed a series of metrics for pre- dicting localization that are simpler than using These criteria may, themselves, have different the HRTFs [Gerzon, 1992]. The simplest of these interpretation or importance depending on the metrics are the velocity localization vector, r , V source material and the intended use. We can and the energy localization vector, r . The direc- E identify three distinct types of program: tionofeachindicatesthedirectionoftheexpected (cid:15) Natural recordings made with a first-order localization perception, while the magnitude indi- cates the quality of the localization. In natural soundfield microphone. hearing from a single source, the magnitude of (cid:15) Natural recordings made with higher order each vector should be exactly 1, and the direc- microphones. As of this writing, such mi- tion of the vectors is the direction to the source. crophones are just becoming available com- It should be noted that, while r is proportional V mercially, but practical constraints will mean to the physical quantity of the acoustic particle that these are still first order at lower fre- velocity, r is an abstract construct.2 E quencies. Following Gerzon [1992], the pressure (ampli- (cid:15) Artificial recordings. First order as well as tude gain), P, and total energy gain, E, are HigherOrderAmbisonic(HOA)programma- ∑n terial. P = G (1) i i=1 The first case is Ambisonic’s greatest strength. Good first-order Ambisonic reproduction is prob- 2NotethatthesemetricsarenotspecifictoAmbisonics; ablytheclosesttorecreatingavirtualsoundenvi- they can be used to predict the quality of the phantom imagesproducedbyanymultispeakerreproductionsystem, ronment, whether the buzz of a busy Asian mar- regardless of the panning laws used, including plain old ketplace or the sound of a concert in a good hall two-channel stereo. Gerzon shows this for several well- in your living room. It will most likely be used known stereo phenomena [Gerzon, 1992]. ∑n ∗ E = (G G ) (2) i i i=1 The magnitude and direction of the velocity vec- tor, r and r^ , at the center of an array with n V V loudspeakers is ∑n 1 r r^ = Re G u^ (3) V V i i P i=1 whereasthemagnitudeanddirectionoftheenergy vector, r and r^, are computed by E E ∑n 1 ∗ r r^ = (G G )u^ (4) E E i i i E i=1 Figure 1: Maximum average r depending on or- where the G are the (possibly complex) gains E i der and type. “matching” and “max r ” refer to from the source to the i-th loudspeaker, u^ is a E i thedecodermatricesdescribedinSections2.2and unit vector in the direction of the loudspeaker, ∗ 2.3, respectively. and G is the complex conjugate of G . i i Thevelocityvectorpointsinthesamedirection and is proportional to the acoustic particle veloc- to each loudspeaker that are needed to optimize ity. Ithasbeenshownthatthevelocityvectorpre- localization as predicted by the velocity localiza- dicts the ITDs very accurately [Benjamin et al., tion vector, r . Numerous authors have provided V 2010]. The energy vector predicts the ILDs, but derivations of the low-frequency solution for a in practice it is not possible to get r = 1 unless E givenloudspeakerarray,andthusanumberofdif- the sound is coming from just one loudspeaker. ferent terms are used to refer to it, including “ve- This is representative of a pervasive problem in locity”, “matching”, “basic”, “exact”, “mode match- multichannel sound reproduction. The maximum ing”, “re-encoding” and so forth. average value of r that can be obtained for a E In practice, these reduce to projecting (or en- given Ambisonic order is shown in Figure 1. The coding) the loudspeaker directions onto the se- formulas to compute these are given in Appendix lected spherical harmonic basis set,4 assembling A. these vectors into an array, and computing the Because different sets of gains are needed to Moore-Penrose pseudoinverse of the array [Weis- satisfy the low- and high-frequency models, many stein, 2008]. Examples of this can be found in ambisonicdecoderssplittheaudiointotwobands, Appendix A of [Heller et al., 2008]. In general, apply different decoder matrices, and then recom- there are an infinite number of solutions and this bine to produce the loudspeaker signals.3 Daniel procedureprovidesthesolutionwiththeminimum hassuggestedthatathree-banddecodermaypro- L2-norm (i.e., the least-squares fit), which has the videbetterreproductionundersomelisteningcon- desirable property of requiring the minimum total ditions[Daniel,2001]. Thisremainsanopenques- radiated power.5 tion at this time. 2.2 Computing the Low-Frequency 4The toolkit is neutral as to the conventions for com- ponentorderingandnormalization. Theseconventionsare Matrix encapsulated in a single function. The current implemen- The low-frequency matrix provides gains from tationsupportstheFurse-Malhamset[Malham,2003],but each channel of the ambisonic program material others can be added easily. 5Recently, some authors, drawing upon compressive 3This places certain constraints on the phase response sending theory, have suggested that the L1-norm may be of the band splitting filters. We discuss the design and moresuitable[Wabnitzetal.,2011;Zotteretal.,2012]. L1- implementationofsuitablefiltersinAppendixBof[Heller norm minimizes the sum of absolute errors. Compared to etal.,2008]andnotethatthefiltersinAmbdecmeetthese least-squares,itallowslargermaximumerrorsinexchange requirements. for more zero errors. Exceptinthecaseofdegenerateconfigurations, teria and due to the nonlinear nature of the crite- where all the loudspeakers lie in the null of one or ria, numerical optimization is needed to compute more of the spherical harmonics, this procedure the solutions, which will be discussed in Section will result in a decoder matrix that satisfies the 3. low-frequency localization criteria exactly; how- 2.4 Merging the LF and HF Matrix ever, it may utilize a great deal of power to get The existence of different optimum decoder coef- them correct in directions where there is a large ficients for optimum r and r would typically angular separation between the loudspeakers in V E mean having to make a choice or compromise be- the array. This will result in low r values in E tween the two. In this case, however, both can be those directions. As we shall see, except in the had. The decoder that optimizes r can be used case of regular polyhedra and polygons, it is im- V atlowfrequenciesandthedecoderthatmaximizes possible to fully satisfy all the ambisonic criteria r can be used at high frequencies, by the simple simultaneously. This implies that while the am- E expedientofusingfilterstocrossoverbetweenthe bisonic transmission format is independent of the two. This is typically done at around 400 Hz. loudspeaker array, not all loudspeaker array ge- Because the higher-order components are re- ometries perform equally well. duced in order to maximize r , this causes a E 2.3 Computing the High-Frequency reduction in the total signal level of the high- Matrix frequency decoder outputs, and thus a reduction in the high frequencies heard by the listener. The The high-frequency matrix provides gains from gains that maximize r specify the relationship E each channel of the ambisonic program material among the signals of different order, but not how to each loudspeakers that are needed to optimize that gain should be apportioned between high- localizationaspredictedbytheenergylocalization frequency cuts and low-frequency boosts. There vector, r . Gerzon proved two theorems for first- E are three possibilities: order reproduction, the polygonal decoder theo- rem and the diametric decoder theorem. They 1) Preservation of the amplitude. That is, sim- state that in an array with a minimum of four ply use the gains produced by the optimizer or loudspeakers for 2-D and six speakers for 3-D, those given in Tables 1 and 2. where the loudspeakers are spaced in equal angles 2) Preservation of the root-mean-square (RMS) or in diametrically opposed pairs, r is guaran- E level. This is what Gerzon [1980] suggests teed to point in the same direction as r . The V and is what is implemented in older analog de- polygonal decoder theorem also holds for higher- coders. order Ambisonic reproduction, provided there is 3) Conservation of the total energy. Daniel [2001] an adequate number of loudspeakers in the ar- suggests this, and the configuration files in- ray to support the desired order. This simplifies cluded with Ambdec follow this recommenda- thetaskofdesigningthehigh-frequencymatrixto tion. This method results in more high fre- that of selecting the gain for each order such that quencies with more speakers. the overall magnitude of r is maximized. For E first-order decoders, Gerzon provided the values √ √ ThecalculationsinvolvedaregiveninAppendix of 2 for horizontal arrays and 3 for periphonic B. In listening tests, we have found that preserva- 2 3 arrays. Daniel derived general formulas for these tion of the RMS level works well for small arrays. gains [Daniel, 2001], which are given in Tables 1 We have also found that using the conservation and 2. (See Appendix A for programs that com- of energy approach on large 3-D arrays results in pute the values in these tables.) overemphasizing high frequencies and near-head As we will see in the example in Section 2.5, imagingartifactsandnulls. Inpractice,wetheset once the array deviates from having equal angles, theLF/HFbalancebyear,comparingthebalance there is no longer a guarantee that r and r of the two-band decoder to that of a single-band E V pointinthesamedirectionorthatthereisasingle r -max decoder. More work is needed to find a E set of gains that maximize r in every direction. procedure for this that does not involve tuning by E Because of this, we must trade off the various cri- ear. Order Max r Gains E 1 0.707107 1, 0.707107 2 0.866025 1, 0.866025, 0.5 3 0.92388 1, 0.92388, 0.707107, 0.382683 4 0.951057 1, 0.951057, 0.809017, 0.587785, 0.309017 5 0.965926 1, 0.965926, 0.866025, 0.707107, 0.5, 0.258819 Table 1: Per-order gains for max-r decoding with 2-D regular polygonal arrays. E Order Max r Gains E 1 0.57735 1, 0.57735 2 0.774597 1, 0.774597, 0.4 3 0.861136 1, 0.861136, 0.612334, 0.304747 4 0.90618 1, 0.90618, 0.731743, 0.501031, 0.245735 5 0.93247 1, 0.93247, 0.804249, 0.62825, 0.422005, 0.205712 Table 2: Per-order gains for max-r decoding with periphonic regular polyhedral arrays. E 2.5 Selection of a speaker array      LF p 1 1 1 0 W Duetosymmetry,regularloudspeakerarrayshave RF 2 1 1 (cid:0)1 0X the advantage of uniformity in the localization   =    (5) RR 4 1 (cid:0)1 (cid:0)1 0 Y predictors r and r . As noted above, practi- V E LR 1 (cid:0)1 1 0 Z cal difficulties usually prevent the attainment of a completelyregulararray. Therewillbeatendency This gives exact recovery of the pressure and forrE tobegreaterinthedirectionswherethean- velocity at the center of the array: jr j = 1 and V gular density of loudspeakers is greater, and less points in the intended direction. But because in the directions where there are few loudspeak- ◦ the angular separation of the loudspeakers is 90 , ers and rE will tend to point in the directions of r = 2. However, if we investigate what happens concentrations of loudspeakers. E 3 as the ratio of pressure (W) and velocity (X, Y It should be noted at this point that it is im- and Z) is varied, it develops that rE is maximum possible to get rE to be larger in the direction be- for the case√where the first-order components are tween loudspeakers than the value achieved sim- reduced to 2 of their original value. This gives 2 √ plybydrivingtheloudspeakersnearesttothegap a magnitude of r of 2. equally. This means that the best that can be E 2 If the square ispreplaced with a rectangle with achieved by an Ambisonic decoder is to have a an aspect ratio of 3 : 1, the front and rear loud- smoothtransitionbetweenareaswheretheperfor- ◦ speakersnowsubtendanangleof60 andtheside manceisgood(largenumberofloudspeakers,high ◦ loudspeakers subtend an angle of 120 . This re- magnitude of r and r points in the intended di- E E duces r at the sides but increases it in the front, E rection) and areas where the performance is less relativetoasquare. Ifthesamegainasderivedfor √ good(fewerloudspeakers,r hassmallmagnitude E the square ( 2 for the first-order components) is and points in an incorrect direction). As such, we 2 applied, then there is a substantial improvement must be careful in choosing the decoder parame- in r to the sides, and a very tiny decrease in r terssothattheperformanceinthegooddirections E E in the front. This is shown in Figure 2. is good enough, and the performance in the poor But is this the “optimum”? Figure 3 shows that directions is not too bad. if we further vary the ratio of the zero- and first- A simple example of a four-speaker array will order components it develops that r , evaluated E illustrate these difficulties. A square horizontal at the sides, is a maximum at a different ratio. array has a basic decoder solution of It is thus possible to maximize r in front or at E For tphe previous example of a rectangular array with a 3 : 1 aspect ratio, the ratio of the energy in the forward direction to the energy at the sides was calculated and is also plotted in Figure 3. r V obtains its correct value of 1 in all directions and the pressure response is also omnidirectional. At higher frequencies, where the “max-r ” decoder is E ineffect,r ismaximizedforfrontandbackdirec- E tions (where the speakers are closer together). At thesefrequencies,soundsfromthesides(perceived as “energy”) are 2.4 dB louder. This is a pervasive problem for irregular arrays and willpbe addressed in greater detail below. The simple 3 : 1 rectan- gle as above is used widely and is known to give goodresults. Ontheotherhand,listeningtestsin- Figure 2: Locus of r for a rectangular array, dicate that the 5.5 dB energy imbalanceexhibited E matching and “max r ” decoders. by some first-order decoders for ITU 5.1 arrays E is too large. From this, we propose 3 dB varia- tion in “energy” with horizontal direction as the maximum imbalance acceptable. 2.5.1 Discussion of compromises of speaker arrays TheselectionofaloudspeakerarrayforAmbisonic reproduction is subject to a number of compro- mises, notably the space available to house the arrayandthebudgetforpurchasingloudspeakers. It may be that the array already exists, in which case the decoder design task is one of selecting a decoder design that provides the best audible performance. In other situations, however, the design of the array has not been fixed although the number of loudspeakers may have been. In that case, there is substantial latitude to trade off Figure 3: r and the energy, E, as a function of E between high-order performance horizontally and the ratio of the first-order to zero-order scaling. periphonic performance. 3 Optimizer the sides, but not both at once. One might wish to use a different decoder depending on whether AsnotedinSection 2.5, inanirregulararray, sim- sound images are expected at the front, or on the ply scaling the LF and HF matrices does not re- sides, or a decoder that gives a compromise be- sult in r and r pointing in the same direction; V E tween the two. hence, the design procedure becomes somewhat Thus far, only the quality and direction of the more complex. localization have been discussed. There is also an Becausethekeypsychoacousticcriteriaforgood effect on the loudness of the sound, depending on decoder performance are nonlinear functions of direction. If the loudspeaker array is irregular, thespeakersignals,weutilizenumericaloptimiza- then the solution to recover pressure and velocity tion techniques. To do this, a single objective results in an increase in energy in the directions function is formulated that takes as input the wheretheangularspacingisgreatest. Thisresults decoder matrix and produces a single figure of in an increase in reproduced loudness in those di- merit that decreases as the decoder performance rections. improves. The nonlinear optimization algorithm will then try different sets of matrix elements, at- 3.1 Optimization Criteria tempting to arrive at the lowest value possible. For each test direction, the following are com- Because there are a number of criteria, we use puted: amplitude gain, P, energy gain, E, the the weighted sum to provide an overall figure of velocity localization vector, r , and the energy V merit. A user can adjust the weights to set the localization vector, r . From these, we compute E relative importance of the different criteria, say thepairwiseanglesbetweenthetestdirection, r^ , V uniform energy gain (loudness) versus angular ac- and r^. These are summarized with the following E curacy. In addition, each test direction can have figures of merit: deviation of amplitude gain from its own set of weights, so that, for example, an- 1alongthex-axis,minimum,maximum,andRMS gular accuracy can be emphasized for the front, values of amplitude gain, energy gain, magnitude while uniform energy gain is emphasized in other of r , magnitude of r , and the pairwise angular V E directions. This might be the preferred configura- deviations. tion for classical music recorded in a reverberant It is important that the criteria are “well be- performance hall. On the other hand, environ- haved” near zero, so as not to trigger oscillating mental recordings made outdoors have very lit- behaviorintheoptimizer. Theyshouldbecontin- tle diffuse content, so overall angular accuracy is uousandhavefirstderivatives. Inpracticalterms, more important. Another application of direction absolute value and thresholds should not be used; weightings is in highly asymmetrical arrays, such squaring can be used for the former and the expo- as a dome, where few speakers are below the lis- nential function for the latter cases. tener. Inthiscase, weexpectpoorperformancein Finally, directional weightings are applied to those directions, so they are deemphasized when each criteria and then an overall weighted sum computing the objective function. produces the single figure of merit for that partic- ular configuration. We have employed the NLOpt library for non- 3.2 Test Directions linear optimization [Johnson, 2011]. NLOpt pro- As mentioned in the previous section, each candi- vides a common application programming inter- datesetofparametersisevaluatedfromanumber face (API) for a collection of nonlinear optimiza- of directions. For 2-D speaker arrays, 180 or 360 tion techniques. In particular, it supports a num- evenly spaced directions are often used [Wiggins ber of “derivative free” optimization algorithms, et al., 2003; Moore and Wakefield, 2008]. For 3-D which are well suited to the current application arrays, the situation is more complex because no wheretheobjectivefunctionistheresultofacom- more than 20 points can be distributed uniformly putation, rather than an analytic function. on a sphere (a dodecahedron). Lebedev-Laikov quadrature defines sets of Anearlierversionoftheoptimizerthatwaslim- points on the unit sphere and weights with the ited to first-order horizontal arrays was written property that they provide exact results for inte- in C++ [Heller et al., 2010]. To extend that to gration of the spherical harmonics [Lebedev and higher-order and periphonic arrays required a sig- Laikov, 1999]. The current implementation pro- nificant rewrite, so an initial prototype was writ- vides a function that returns Lebedev grids of ten in MATLAB, with plans to recode in C++. points and corresponding weights for as many as Because the bulk of the computation is matrix 5810 directions. Our current experiments have multiplication, which is handled by highly opti- used a grid with 2702 points, which corresponds mized code in MATLAB, it turned out that the ◦ roughly to 3 . The toolkit also has functions pro- execution speed was almost as fast as the orig- viding 2-D and 3-D grids that are sampled in uni- inal C++ version, so we abandoned plans for form azimuth and elevation increments, which are the rewrite. To make the code widely usable, useful for visualization of the results. it was kept compatible with GNU Octave. The key change is that GNU Octave does not support 3.3 Optimization Behavior nested functions, so a number of variables need to As part of the optimization setup, the user sup- be declared global to make them accessible to the plies a set of stopping criteria. This can be spec- objective function. ified as a threshold on the absolute and relative changes in the parameters and/or the objective the upper and lower loudspeakers at (cid:6)30◦ with function, as well as a maximum running time and respect to horizontal. maximum number of iterations. The default val- uesinthecurrentimplementationare 1(cid:2)10−7 for 4.1 CCRMA Listening Room the parameters and objective function. Thedescribedsoftwarewasappliedtoderivingde- Forsmall2-Darrays(say, 12to24parameters), codersfortheListeningRoomatCCRMA(Center the optimizer typically converges in less than 1 for Computer Research in Music and Acoustics) minute, examining 40,000 to 1,500,000 configura- at Stanford. This facility consists of 22 identical tions. For large high-order arrays (say, 200 to loudspeakers arranged in five rings. There is a 400+ parameters), it typically converges in less horizontal ring of eight loudspeakers, two rings of ◦ ◦ than an hour. These timings were done with Oc- sixloudspeakers,one50 belowandone40 above taveversion3.2.4-atlasona2.66GHzIntelCorei7 horizontal, and one loudspeaker directly above with 8 GB of memory. The bulk of the computa- and one directly below the listening position. The tioncomprisesmatrixmultiplicationsandisthere- two hexagonal rings are thus not exactly horizon- fore suitable for parallel implementations. The tally opposed. A schematic of the array is shown timingsinMATLABwereapproximately2xfaster in Figure 4a. than Octave since it can make use of the multiple An initial solution was derived by calculating cores in the i7 processor. the pseudoinverse of the loudspeaker projection With large optimization problems, using a lo- matrixasdescribedabove. Thedecoderwasmod- cal optimization algorithm and providing an ini- ified to optimize the magnitude of rE at high fre- tial solution that is near to the optimum is im- quencies by applying the weighting factors given portant for reliable convergence. The toolkit cur- inTable2tothegainsofthesignalcomponentsof rently supports three strategies: each order. Given that the theoretical maximum average value for r can be no greater than 0.866 E (cid:15) Using the low-frequency solution modified at third order, the average value of 0.850 for the with the per-order gains that would provide third order decoder given here does not leave a the max-r solution for a uniform array. great deal of margin for improvement. Nonethe- E less, the optimization software was applied to the (cid:15) “Musil Design” where additional “virtual” problem. Figure 5 shows the performance of the loudspeakers are inserted into the array to initial solution and the optimized result. Aver- make the spacing more uniform, and hence age r was increased slightly, and maximum di- E more suited to a pseudo-inverse solution. Af- rectional error reduced by a factor of 5. ter the optimization is complete, the signals An informal listening test comparing this de- for the virtual speakers are either ignored or coder to the existing one was conducted using distributed to the adjacent speakers [Zotter third-order test signals and studio recordings, as et al., 2010]. wellasfirst-orderacousticrecordings. Thegeneral (cid:15) Ahierarchicalapproach,decomposingtheop- impression was that the new decoder did a better timization by establishing a solution for each job of keeping horizontal sources in the horizontal order consecutively, freezing the individual plane, whereas the existing decoder rendered such coefficients for orders below the current one, sources above the horizontal plane. but allowing an overall adjustment on the ◦ 4.2 The 30 Tri-Rectangle gain of the lower orders. As discussed elsewhere, a dodecahedron or other 4 Examples large regular array is difficult to fit into nor- mally dimensioned spaces. One large array that The software tools described above were applied does fit into normal spaces is the so-called tri- to the derivation of decoders for several real- rectangle, patterned after a suggestion by Gerzon. world systems. The examples given here are the A schematic is shown in Figure 4b. It consists of CCRMA listening room6 and a tri-rectangle with three interlocking rectangles of loudspeakers, one 6See https://ccrma.stanford.edu/room-guides/ in the horizontal plane, one in the XZ plane, and listening-room/ one in the YZ plane. The projection of the loud- (a) The 22-loudspeaker array at CCRMA (b) The 12-loudspeaker tri-rectangle. Figure 4: Schematics of the loudspeaker arrays used in the examples. (a) The initial solution calculated by pseudoinverse and (b) The optimized solution. max-r gains. E Figure 5: r in the vertical plane for the CCRMA array before and after optimization. The arrows show E the directional error between the low- and high-frequency matrices. In this case, average r was increased E slightly, from 0.85 to 0.86 and the maximum directional error reduced by a factor of 5. speakers into any plane is an octagon, which hints Performance of the initial solution by inversion atitsutilityforreproducingsecond-orderprogram is shown in Figure 6. The magnitude of r is in E material. However, to enable it to fit into typical red, with both the horizontal and vertical (in the spacestheverticalrectanglesmustbesquashedto XZ plane) shown. The horizontal shape is essen- an approximate (cid:6)30◦ vertical angle. This gives a tiallycircular,withperfectdirection(notshownin ◦ solid angle of 120 above and below the listening thefigure),butthemagnitudeofr decreasesdra- E position with no loudspeakers. Naturally, this has matically for sources above or below about (cid:6)30◦ a profound effect on the localization for sources of elevation. Furthermore, there is an increasing above and below the listening position. error in the direction of r indicating that high- E Figure 6: r in the horizontal and vertical planes E for the initial second-order decoder. Figure 7: 3-D plot of r from an unconstrained E optimization of the second-order decoder for the ◦ 30 tri-rectangle. frequencysoundswillbeperceivedascomingfrom nearthepoles. Theextremeerrorsinthedirection of r are compounded by the low values, making rately. This resulted in a slightly lower r , but E E localization in the up and down directions vague significantly reduced angular error in the vertical in any case. plane. Figure 8b shows the optimized solution. The large angle subtended by the loudspeaker 5 Conclusions placement with respect to the vertical axis makes itimpossibletogetpreciselocalizationforsources An open source package for the design of am- directly above or below the listening position. It bisonic decoders has been presented. The soft- may, however, be possible to improve the localiza- ware allows the derivation of decoders for arbi- tion for sources near horizontal by correcting the trary loudspeaker arrays, 2-D or 3-D. The soft- direction of rE. ware operates under Octave or MATLAB, with Running the optimizer with this configuration thenonlinearoptimizationperformedbytheopen as an initial solution resulted in a highly distorted source package NLOPT. Auditory localization at solution where the sounds are drawn strongly to middle and high frequencies is a nonlinear func- the loudspeakers. A 3-D plot of rE for this so- tion of the loudspeaker signals, which necessitates lution is shown in Figure 7. As can be seen, the the finding of solutions that work well for those performance is very non-uniform (a sphere would frequencies via an optimization process. be ideal) and the maximum angular error is over Two example systems were solved. The first ◦ 30 . was a third-order decoder for the 22-loudspeaker Next, a Musil design was attempted. Virtual CCRMA listening room. That system is nearly loudspeakers were inserted into the array directly regular, and it was found that a solution obtained above and below the center. This was optimized by inversion of the loudspeaker matrix, with per- and then the signals for the virtual loudspeakers ordergains,wasnearlyasgoodasoneobtainedby reassigned to the nearest real loudspeakers. This the nonlinear optimization process. Nonetheless, resulted in improved r in the horizontal plane, the magnitude of r was improved and the angle E E as well as elevations as high as (cid:6)30◦; however, error was reduced. ◦ it suffers from directional errors as large as 31 . The second system was a 12-loudspeaker tri- Figure 8a shows the optimized solution. rectangle, with the upper and lower loudspeakers ◦ Finally, a hierarchical design was attempted, at 30 above and below the horizontal plane. A where each subsequent order is optimized sepa- decoder derived for that system via the technique

See more

The list of books you might like

Most books are stored in the elastic cloud where traffic is expensive. For this reason, we have a limit on daily download.