
BSTJ 60: 7. September 1981: A Comparison of Three Speech Coders to be Implemented on the Digital Signal Processor. (Cox, R.V.) PDF



A Comparison of Three Speech Coders to be Implemented on the Digital Signal Processor

By R. V. COX

(Manuscript received June 24, 1980)

The recently developed digital signal processor is a device used for implementing low- to medium-complexity speech coders. It is currently being used in implementing adaptive differential pulse-code modulation (ADPCM) coding, two-band sub-band coding, and four-band sub-band coding. This study was performed to determine optimal parameter values for the two sub-band coders in preparation for their implementation on the digital signal processor and to determine their performance relative to ADPCM. (The actual implementations of the ADPCM and two-band sub-band algorithms are discussed in other papers in Part 2 of this issue of the Bell System Technical Journal.) Performance was judged on the basis of segmental signal-to-noise ratio and a forced-choice, subjective comparison test of the coders. All three coders were simulated at bit rates of 16, 20, 24, 28, and 32 kb/s. The simulations were performed on a laboratory computer.

I. INTRODUCTION

The recently developed DSP is a device for implementing low- to medium-complexity speech coders. Three coders are currently being implemented. The simplest coder is adaptive differential pulse-code modulation (ADPCM) and is discussed in Ref. 1. The other two are in the sub-band coder (SBC) family. Of these, the simpler one is two-band sub-band coding (2B-SBC), featuring quadrature mirror filtering and two equal bands. It is discussed in Ref. 2. The other coder, the most complicated, is four-band sub-band coding (4B-SBC), featuring four equal bands. Its implementation is still in progress.

This report discusses the initial design parameters for the two SBC coders and the relative performance of all three. Segmental signal-to-noise ratio (SNR) measurements were made on all three coders via computer simulation at five different bit rates: 16, 20, 24, 28, and 32 kb/s. In addition, 12 subjects rated the coders in a comparison test.
The simulations reported here were carried out on a laboratory computer in preparation for the implementation of the SBC coders on the DSP.

Section II reviews the design of ADPCM and discusses the design of the two sub-band coders, Section III discusses the results of the subjective testing experiment, and Section IV gives the conclusions of this study.

II. DESIGN OF THE CODERS

2.1 Design of ADPCM

The ADPCM design simulated here is based on the design of Cummiskey et al. A block diagram of the ADPCM coder described below is shown in Fig. 1. The most significant change from that design is that only two multiplier values are used in changing the step-size, regardless of the bit rate. This is based on the ADPCM implemented by Johnston and Goodman. This version of ADPCM has already been implemented on the DSP and is described in Ref. 1. The ADPCM design was also used for quantizing the sub-band signals in the other coders.

Fig. 1. Adaptive differential pulse-code modulation coder used in simulation.

To simulate 20-kb/s and 28-kb/s coders, alternating quantizers were used. For 20 kb/s, the two quantizers used are for 2 and 3 bits. Since the step-size is adapted based only on the most significant magnitude bit, the same step-size adaptation algorithm is used for all samples. The ratio of 2- to 3-bit quantizer step-size is held constant. This requires one additional multiplication to convert from the 2- to 3-bit step-size.

2.2 Two-band SBC

All sub-band coders are made from a few fundamental building blocks. The first is linear filtering to divide the signal into two or more sub-bands. These sub-bands can then be decimated to a lower sampling rate than the original signal. Some form of quantization must be used to encode and quantize each band. Interpolation and additional linear filtering are used to bring each band back to the original sampling rate and to its original location in the frequency spectrum.
At this point they can be added together to produce an output signal.

The quadrature mirror filtering technique is well known for its use with sub-band coders. Each pair of quadrature mirror filters (QMF) produces two sub-bands of equal width in frequency. Johnston has compiled a collection of different length QMFs. The possible quantizers which can be used are adaptive delta modulation (ADM), ADPCM, and adaptive pulse-code modulation (APCM). Each of these techniques is fairly well known and understood. Likewise, interpolation and decimation are also well understood. So what remains is the task of combining these building blocks in such a way as to fit on the DSP and, still, give the best possible performance. One of the uses of this study was to choose good candidates for implementation.

The 2B-SBC design is based on the 2B-SBC commentary-grade coder of Johnston and Crochiere. That coder was developed with the object of maintaining a high-quality AM radio signal. Its parameters were tuned to music rather than to speech. This section describes parameters for a speech-bandwidth version. There are five possible bit rates envisioned. For a more detailed discussion of sub-band coding in general and the exact implementation of this coder, refer to Ref. 2.

Figure 2 is a block diagram of the 2B-SBC. The input speech has been bandlimited from 200 to 3200 Hz by a sharp bandpass filter and sampled at 8000 Hz. A 32-tap QMF designed by Johnston is used for separating the digitized speech into the two sub-bands. After 2-to-1 decimation on both bands, we found average correlations for speech of 0.7 and -0.45 for the low and high bands, respectively.

The two bands are then coded using either ADPCM or ADM. The ADM is used only for the higher band at low bit rates. It is based on the ADM of Jayant. The prediction coefficients used for these coders were the correlation values mentioned above.
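The correlation values quoted above are normalized lag-1 autocorrelations of each decimated band, and they double as the fixed first-order prediction coefficients. The following is a minimal sketch of such a measurement, assuming the common estimate R(1)/R(0); the paper does not say which estimator was used, and the function name and toy signals are illustrative only.

```python
def lag1_correlation(samples):
    """Estimate the normalized lag-1 autocorrelation R(1)/R(0).

    Assumption: the paper does not specify its estimator; this is the
    plain (biased) ratio of the lag-1 sum to the lag-0 sum.
    """
    energy = sum(x * x for x in samples)
    if energy == 0:
        return 0.0  # silent input: define the correlation as zero
    lag1 = sum(a * b for a, b in zip(samples[1:], samples[:-1]))
    return lag1 / energy


# Toy signals, not speech: a slowly varying ramp and a sign-alternating one.
smooth = [1.0, 2.0, 3.0, 4.0]
alternating = [1.0, -2.0, 3.0, -4.0]

print(lag1_correlation(smooth))       # positive, as for a low band
print(lag1_correlation(alternating))  # negative, as for a high band
```

With a coefficient a obtained this way, a first-order predictor would estimate each sample as a times the previous sample.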
Since the high band has a negative correlation, it was frequency inverted before quantization by the ADM, because this ADM requires a positive correlation for its adaptation mechanism to work properly. Since frequency inversion just means changing the sign of every other sample, this is a very minor operation.

The next step was to determine optimal bit allocations for the low and high bands. After experimenting with different bit allocations and evaluating them on the basis of segmental SNR measurements and informal listening, the following bit allocations were adopted for the five bit rates:

16 kb/s    low: 3 bits    high: 1 bit (ADM)
20 kb/s    low: 4 bits    high: 1 bit (ADM)
24 kb/s    low: 5 bits    high: 1 bit (ADM)
28 kb/s    low: 5 bits    high: 2 bits
32 kb/s    low: 5 bits    high: 3 bits

Some alternative designs were very close. For instance, a (4, 2) allocation for 24 kb/s is almost as good as (5, 1) for the speech it was tested on. Perhaps if the speech were less sharply bandpass-filtered and if there were more high-frequency content (such as in telephone speech), the better allocation would be (4, 2) for 24 kb/s.

2.3 Four-band SBC design

The 4B-SBC design described here is new, although it is a logical extension of the 2B-SBC mentioned above. It starts with the same two sub-bands as the two-band design. Both of these bands are then divided into two new bands, yielding a total of four equally spaced bands. The filter used for the additional division in each band is the 16-tap filter of Johnston designated C in Ref. 5. Figure 3 shows a block diagram for this coder.

Once more the bands are quantized using ADPCM or ADM. Our measurements of average correlation for speech data showed correlations of 0.4, 0, 0, and a fourth-band value that is illegible in this copy, for the four bands going from lowest to highest in frequency. The fourth band (3000 to 4000 Hz) has actually been bandpass-filtered to cut off at 3200 Hz.
As a result, it contains little power and can be ignored for low-bit-rate coders. The correlations of the two middle bands are zero, reflecting that the long-term average of the speech spectrum from 1000 to 3000 Hz is flat. If a prediction coefficient of zero is used with ADPCM, the result is APCM. Thus, the two middle bands are APCM-encoded. The largest amount of power is in the first band; therefore, it receives the most bits.

The bit allocations found to be the best by the same segmental SNR measurements and informal listening tests were as follows (only partially legible in this copy; the entries for 28 and 32 kb/s are missing):

16 kb/s    4, 2, 2, 0 (bands 1 to 4)
20 kb/s    5, 2, 2, 1 (ADM on band 4)
24 kb/s    5, 4, 2, 0

The greatest amount of error occurs in the lowest band. Even at the high rates (28 and 32 kb/s) this error is still perceptible as a low rumbling noise. However, it was found that a high-pass filter with a cutoff of 200 Hz eliminated this problem. The filter used was a 121-tap FIR filter. Table I gives the coefficients, and Fig. 4 shows the frequency response. A much smaller IIR filter could also be used to do the same job. Note that the above bit assignments were made without using the FIR filter. With a high-pass filter, fewer bits could be allocated to the lowest band and more to bands two and three.

Table I. Coefficients for symmetric FIR high-pass filter (table contents not recoverable).

Fig. 4. Frequency response of the high-pass filter.

2.4 Relative complexity of designs

The ADPCM designs for 16, 24, and 32 kb/s have already been implemented on the DSP. The combined encoder and decoder algorithms use 48 percent of the DSP real-time capability for a sampling rate of 8 kHz. An even lower percentage of RAM and ROM memory is used. The 2B-SBC based on the design parameters reported here has also been implemented on the DSP chip. It uses 88 percent of the real-time capability and 78 percent of the RAM memory. It includes an IIR bandpass filter for the input. The 4B-SBC algorithm is planned for implementation in the near future.
Since all of the major portions of the 4B-SBC have been programmed already for the 2B-SBC implementation, it is possible to project how much of the DSP will be used. Both the transmitter and the receiver will require a DSP, and each will use about the same fractions of real-time capability and RAM as the complete 2B-SBC algorithm. Therefore, we might classify the three coders as having complexities of 0.5, 1, and 2, respectively.

III. RELATIVE PERFORMANCE OF THE CODERS

Since the SBC designs are more complex, a demonstration of their improved performance over ADPCM was needed to justify their implementation on the DSP. To demonstrate their relative performance, all three coders were simulated on a laboratory computer. Each processed speech from a stored file. The results were evaluated by both an objective and a subjective measure. The objective measure was segmental SNR, while the subjective measure was a forced-choice, subjective (A-B) comparison test in which all possible coders were compared.

Six phonetically balanced sentences were used for evaluating the coders. Three were spoken by male speakers and three by females. They were recorded using a linear microphone. They were bandlimited from 200 to 3200 Hz and sampled at 8000 Hz using a 16-bit linear quantizer.

3.1 Segmental signal-to-noise ratio results

In computing segmental SNR measurements, blocks of speech of 32 ms were used. The ADPCM coder was compared with the original input speech. The SBC coders were compared with reassembled speech which had been processed by the appropriate QMF filtering, but with no quantization. These slightly modified speech signals cannot be distinguished from the original in casual listening. Without them it would be difficult to make a fair comparison of the three coders on the basis of SNR. The measurements on 4B-SBC were made before the 121-tap FIR high-pass filtering.

The results of these measurements are summarized in Fig. 5. They show that the more complex SBC coders have a definite advantage over ADPCM at the lower bit rates. Interestingly, at 32 kb/s, ADPCM beats both of the more complex coders. The 4B-SBC maintains a fairly constant 2-dB advantage over 2B-SBC. In terms of bit rate this translates to 4 kb/s. At the low rates, the 4B-SBC has about an 8-kb/s advantage over ADPCM.

Fig. 5. Segmental SNR measurements for the coders.

3.2 Subjective testing of the three coders

An A-B comparison test was performed to rank the three coders. Each coder at each rate was compared twice against every other coder at every rate, as well as against the original. In the two comparisons of any two coders, each one was played in first position once. There were 12 participants in the test and altogether there were 240 comparisons. The test was broken down into two parts, one with 110 comparisons, the other with 130. The participants listened over headphones in a soundproof booth. The participants were also broken down into two groups of six. If one group listened to a particular A-B comparison with a female speaker, the other group heard a sentence with a male speaker, and vice versa. Thus, we attempted to make a totally balanced and unbiased test.

Table II. Individual coder versus coder comparison results (table contents not recoverable).

Fig. 6. Overall preference ranking of the coders.

Table II gives the individual coder versus coder comparisons. In addition, an overall preference ranking was computed based on the total number of votes received by each coder. In all, a total of 360 votes could be received by any coder. Figure 6 shows the percentage of the 360 possible votes received for each coder. These results are in good agreement with the results of Fig. 5. For example, both sub-band coders show an advantage over ADPCM at the low rates, and ADPCM catches up or passes them at the high rates.
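For reference, the segmental SNR measure of Section 3.1 averages per-block SNRs expressed in decibels, rather than computing one SNR over the whole utterance. The sketch below is illustrative and rests on stated assumptions: a default block length of 256 samples (32 ms at 8000 Hz), and skipping any block whose signal or noise energy is zero, a detail the paper does not describe. For the sub-band coders, `reference` would be the QMF-processed but unquantized speech, as the text explains.

```python
import math


def segmental_snr_db(reference, coded, block_len=256):
    """Mean of per-block SNRs in dB.

    Assumptions: block_len=256 corresponds to 32 ms at 8000 Hz; blocks
    with zero signal or zero noise energy are skipped, a choice the
    paper does not describe.
    """
    block_snrs = []
    usable = min(len(reference), len(coded))
    for start in range(0, usable - block_len + 1, block_len):
        ref = reference[start:start + block_len]
        out = coded[start:start + block_len]
        signal = sum(r * r for r in ref)
        noise = sum((r - o) ** 2 for r, o in zip(ref, out))
        if signal > 0 and noise > 0:
            block_snrs.append(10.0 * math.log10(signal / noise))
    if not block_snrs:
        raise ValueError("no usable blocks")
    return sum(block_snrs) / len(block_snrs)


# Toy data with 2-sample blocks: per-block SNRs are 20 dB and about 6.02 dB.
reference = [1.0, 1.0, 2.0, 2.0]
coded = [0.9, 0.9, 1.0, 1.0]
print(segmental_snr_db(reference, coded, block_len=2))
```

Averaging in the dB domain keeps loud blocks from dominating the result, so quiet passages count as much as loud ones.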
Some of the more significant results are the following:

(i) The 4B-SBC has an 8-kb/s perceptual advantage over ADPCM at the low rates. The 24-kb/s ADPCM has been used for voice storage and playback systems. This result indicates that 16-kb/s 4B-SBC could be substituted at a 33 percent savings in storage or, equivalently, a 50 percent increase in message storage capability. Moreover, at 20 kb/s, 2B-SBC has a 4-kb/s perceptual advantage over ADPCM.

(ii) Although 4B-SBC lost to ADPCM at 32 kb/s in SNR measurements, it beat ADPCM in the subjective tests. In addition, indirect ...
