
BSTJ 60: 4. April 1981: Subsampling of a DPCM Speech Channel to Provide Two "Self-Contained" Half-Rate Channels. (Jayant, N.S.) PDF



THE BELL SYSTEM TECHNICAL JOURNAL, APRIL 1981

Subsampling of a DPCM Speech Channel to Provide Two "Self-Contained" Half-Rate Channels

By N. S. JAYANT

(Manuscript received November 12, 1980)

We consider the following channel-splitting problem: it is required to split a B-bits/s speech-code sequence into two "self-contained" B/2-bits/s components, either of which can be used to reproduce acceptable speech; also, if both components are available at a receiver, it must be possible to reproduce speech with full B-bits/s quality. We propose a solution where, for interesting values of B, the speech quality resulting from half-rate reception approximately equals that from conventional full-rate reception at B/2 bits/s. In the proposed solution, 3.2-kHz speech is sampled at 12 kHz and coded using PCM or differential PCM. The output sequence of code words is split into odd- and even-word sequences. A full-rate receiver with access to both of the subchannels simply reconstitutes the output sequence prior to decoding, while a half-rate receiver with only the odd (or even) subchannel estimates the even (or odd) components by nearest-neighbor interpolation.

I. INTRODUCTION

The channel-splitting problem described in the abstract is redefined in Fig. 1. The receiving end of a speech communication system is supposed to operate in either a full-rate or a half-rate mode, depending on whether it has available to it both or only one of the speech subchannels. Respective qualities of speech reproduction are denoted by $Q_F(B)$ and $Q_H(B/2)$. The unavailability of one subchannel is a good model for certain types of transmission failure, examples of which are signal fading in mobile radio and speech segment losses in packet switching. With appropriate forms of diversity reception, the second subchannel will be available with probability close to unity when the first subchannel is not. The channel-splitting problem has been recently analyzed for communication systems operating at rate-distortion limits with binary and Gaussian input signals [1, 2].

[Fig. 1. Caption illegible in the source scan.]

The nontrivial nature of the channel-splitting problem can be appreciated through the simple example of a uniform quantizer. By combining two appropriately staggered R-bit quantizers (one of them a midrise, the other a midtread), one can realize an (R + 1)-bit quantizer, but not a 2R-bit quantizer. For full-rate speech quality corresponding to 8-bit quantization, component quantizers would each need 7-bit (not 8/2 = 4 bit) resolution for the combination to yield 8-bit quality. Thus, if the subchannels in Fig. 1 were simply uniform quantizers, and if speech were sampled at 8 kHz, one would need two 56-kbit/s quantizer systems so that a full-rate combination with 64-kbit/s quality can be realized. By contrast, in the differential pulse code modulation (DPCM) system proposed in this paper, the component receivers that combine to give 64-kbit/s quality are indeed half-bit-rate, 32-kbit/s systems. Moreover, with an illustrative sentence-length speech input, the half-rate quality $Q_H(B/2)$ will be shown to exceed the full-rate quality $Q_F(B/2)$ of a conventional DPCM system operating at B/2 bits/s for interesting values of B. The quality $Q_H$(32 kb/s) is quite acceptable for speech communications, although somewhat short of toll quality. Note once again that two uniform quantizers operating at 32 kbit/s (8 kHz x 4 bits) each can only give, in combination, 40-kbit/s (8 kHz x 5 bits) quality, and not the desired 64-kbit/s quality. Similar arguments apply to lower bit rates as well.

The system of Fig. 1 is a special case of a communication scenario that can be generalized to include more than two subchannels, and/or subchannels that are non-equal-rate. The system of Fig.
1 can also be regarded as a special symmetrical case of embedded coding with a hierarchy consisting of two equally significant subcodes, viz., the half-rate sequences.

II. SUBSAMPLING AND INTERPOLATION

The utility of subsampling and interpolation has been demonstrated recently in the context of speech packet losses; speech-encoder outputs are partitioned into odd-sample and even-sample streams which are transmitted as separate packets. In the event of a lost odd (or even) packet, the lost samples are estimated using nearest-neighbor interpolations involving available even (or odd) samples. With the usual assumption of 3.2-kHz speech and 8-kHz sampling, the 1:2 subsampling (at 4 kHz) implies serious aliasing effects. These effects are mitigated by an adaptive interpolation procedure where nearest-neighbor weighting coefficients are varied to follow speech statistics, as reflected by appropriate extra information in a packet header. The system realizes improvements with packet loss probabilities up to about 10 percent; but as the loss probability approaches 100 percent, as in the channel-splitting problem, residual aliasing effects are quite unacceptable, even with adaptive interpolation.

The above observation has led us to the notion of 12-kHz sampling for the problem of Fig. 1. With 1:2 subsampling, the half-sampling rate will now be 6 kHz, which turns out to be just adequate for the 3.2-kHz speech inputs in telephony. We also considered 16-kHz sampling, but this is less preferable from the point of view of quantization noise.
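The trade-off between candidate sampling rates can be made concrete with a little bookkeeping. The following is only an illustrative sketch: the function name `subsampling_summary` and the "margin" ratio are assumptions introduced here, and the 64-kbit/s budget is taken from the full-rate example of Section I. A margin below 1 means the subsampled rate falls short of twice the 3.2-kHz speech bandwidth.

```python
# Illustrative bookkeeping only; names below are hypothetical, not from the paper.
SPEECH_BANDWIDTH_KHZ = 3.2   # telephone-speech bandwidth quoted in the text
FULL_BIT_RATE_KBPS = 64      # full-rate example from Section I


def subsampling_summary(fs_khz):
    """Return (half-rate sampling frequency in kHz,
    ratio of that frequency to twice the speech bandwidth,
    whole bits per sample that fit in the full-rate bit budget)."""
    half_fs = fs_khz / 2
    margin = half_fs / (2 * SPEECH_BANDWIDTH_KHZ)
    bits_per_sample = FULL_BIT_RATE_KBPS // fs_khz
    return half_fs, margin, bits_per_sample


for fs in (8, 12, 16):
    half_fs, margin, bits = subsampling_summary(fs)
    print(f"fs={fs} kHz: half-rate fs={half_fs} kHz, "
          f"margin={margin:.2f}, bits/sample={bits}")
```

Under these assumptions, 8-kHz sampling leaves a margin well below 1 after subsampling, 12-kHz sampling brings it close to 1, and 16-kHz sampling exceeds 1 but leaves fewer bits per sample for the quantizer.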
In 64-kbit/s decoding, for example, 4-bit quantization of 16-kHz speech produces more quantization noise than the 5-bit quantizer that is possible with 12-kHz speech.

A second advantage of 12-kHz sampling is that it permits nonadaptive interpolations; adaptive interpolations yielded near-zero gains with 12-kHz speech. If 8-kHz speech is subsampled, it cannot be adequately reconstituted by nonadaptive interpolation even if the interpolation is invoked with a probability much less than 100 percent.

Waveform reconstructions in the half-bit-rate (half-sample-rate) systems of this paper are described by

$$\hat{u}(n) = A_{1}\,u(n-1) + A_{-1}\,u(n+1); \qquad A_{1} = A_{-1} = 0.5. \tag{1}$$

The samples $u(n)$ will be quantized prediction-error samples in general; more simply, they will be quantized speech samples in the special case of nonpredictive, or nondifferential, PCM.

III. THE DPCM CODEC

Figure 2 shows block diagrams of full-rate and half-rate DPCM systems with fixed first-order predictors. In each case, decoding is defined by

$$y(n) = h_1\,y(n-1) + q(n), \tag{2}$$

where $q(n)$ is the quantized prediction-error signal, and $h_1$ is the first-order predictor coefficient. In the special case of nondifferential PCM ($h_1 = 0$), $y(n)$ is simply the quantized speech output:

$$y(n) = q(n). \tag{3}$$

Subscripts H and F in Fig. 2 distinguish half- and full-rate versions of $y(n)$ and $q(n)$.

[Fig. 2. Block diagrams of the full-rate and half-rate DPCM systems; caption illegible in the source scan.]
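The splitting, interpolation, and decoding steps described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the paper's implementation: the function names are hypothetical, samples are treated as already-dequantized numbers, a missing neighbor at either end of the sequence is taken as zero, and the decoder starts from a zero state. With 0-based indexing, the "even" subchannel holds indices 0, 2, 4, ...

```python
def split_odd_even(codewords):
    """Split a sequence into even-indexed and odd-indexed subchannels."""
    return codewords[0::2], codewords[1::2]


def merge_full_rate(even, odd):
    """Full-rate receiver: re-interleave the two subchannels."""
    merged = []
    for i in range(len(even) + len(odd)):
        merged.append(even[i // 2] if i % 2 == 0 else odd[i // 2])
    return merged


def interpolate_missing(received, total_len, offset):
    """Half-rate receiver: rebuild a full-length sequence from one subchannel.

    received[k] belongs at index offset + 2*k (offset 0 = even, 1 = odd).
    Each missing sample is 0.5*left + 0.5*right, as in eq. (1).
    Assumption: a neighbor outside the sequence counts as 0.
    """
    u = [0.0] * total_len
    for k, value in enumerate(received):
        u[offset + 2 * k] = float(value)
    for n in range(1 - offset, total_len, 2):
        left = u[n - 1] if n - 1 >= 0 else 0.0
        right = u[n + 1] if n + 1 < total_len else 0.0
        u[n] = 0.5 * left + 0.5 * right
    return u


def dpcm_decode(q, h1):
    """First-order DPCM decoder, y(n) = h1*y(n-1) + q(n), as in eq. (2).

    Assumption: y(-1) = 0.
    """
    y, prev = [], 0.0
    for qn in q:
        prev = h1 * prev + qn
        y.append(prev)
    return y


q = [1.0, 3.0, 5.0, 7.0, 9.0, 11.0]
even, odd = split_odd_even(q)
y_full = dpcm_decode(merge_full_rate(even, odd), h1=0.5)
y_half = dpcm_decode(interpolate_missing(even, len(q), offset=0), h1=0.5)
```

Setting `h1=0` in `dpcm_decode` reduces the decoder to the nondifferential PCM case of eq. (3).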
In contrast with well-known adaptive DPCM systems where the quantizer step size is adapted for every sample, the present paper assumes a system where the step size is adapted once for each block (of several ms duration), and held fixed for the duration of the block. A periodically updated, rather than instantaneously adaptive, quantizer is used in anticipation of interpolation procedures, which are known from recent experience to be unreliable when the step size exhibits sample-to-sample fluctuation.

The periodically adaptive quantizers are defined by a block-specific step size $\Delta$ that is proportional to the root-mean-square value of a generalized first difference, in the form

$$\Delta_{AQF} = K(R)\cdot \mathrm{rms}\,[\,x(n) - h_1\,x(n-1)\,]; \tag{4a}$$

$$\Delta_{AQB} = K(R)\cdot \mathrm{rms}\,[\,y(n) - h_1\,y(n-1)\,], \tag{4b}$$

with maximum and minimum constraints on $\Delta$ [constraint equation illegible in the source scan], where $[K(R), K_1(R)]$ are bit-rate-specific constants with suggested values of [0.25, (illegible)], [(illegible), (illegible)], [0.58, 0.12] and [1.0, 0.18] for 5-, 4-, 3- and 2-bit quantizers, respectively.

The subscripts AQF (4a) and AQB (4b) refer to forward-adaptive and backward-adaptive procedures, respectively; rms values in (4a) and (4b) are evaluated over the duration of the speech block to be coded in AQF, and over the duration of the most recent decoded speech block in AQB. The AQB procedure is less effective because of speech nonstationarity, as well as the effect of quantization noise that is present in the $y(n)$ sequence in (4b). However, step-size information in an AQB system need not be separately transmitted to a receiver; it is inherently available in the decoded $y(n)$ sequence. AQF procedures, by contrast, require the explicit transmission of step-size information (typically, about 5 bits' worth per block of 16 ms). In our experiments, quality losses in AQB were more noticeable in full-bit-rate speech than in half-bit-rate speech, and in each case the losses were of a second order of importance.
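The forward-adaptive step-size rule of (4a) can be sketched as below. This is a hedged illustration: the function name `block_step_size` and the `prev_sample` argument (the last sample of the preceding block, defaulting to zero) are assumptions introduced here, and the maximum and minimum constraints on the step size are not modeled.

```python
import math


def block_step_size(x, h1, k_r, prev_sample=0.0):
    """Block step size: K(R) times the rms of x(n) - h1*x(n-1) over the block.

    x           : samples of the block to be coded (AQF form of eq. 4a)
    h1          : first-order predictor coefficient
    k_r         : bit-rate-specific constant K(R)
    prev_sample : x(-1); assumed 0.0 unless the caller supplies it
    """
    if not x:
        raise ValueError("block must contain at least one sample")
    prev = prev_sample
    sum_sq = 0.0
    for xn in x:
        diff = xn - h1 * prev   # generalized first difference
        sum_sq += diff * diff
        prev = xn
    return k_r * math.sqrt(sum_sq / len(x))


# K(R) = 1.0 is the value quoted in the text for a 2-bit quantizer.
block = [1.0, -1.0, 1.0, -1.0]
step_pcm = block_step_size(block, h1=0.0, k_r=1.0)    # PCM-matched (h1 = 0)
step_dpcm = block_step_size(block, h1=0.5, k_r=1.0)   # DPCM-matched
```

Applying the same function to a decoded block of `y(n)` values, instead of the input block, gives the backward-adaptive form of (4b).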
With this in mind, we have elected to cite only AQF results in Section IV; those results can be regarded as upper bounds as far as quantizer performance is concerned.

In the context of 1:2 subsampling, the AQB procedure of (4b) cannot be implemented as such unless $h_1 = 0$ (PCM). However, step sizes obtained by setting $h_1$ to zero in (4) have been found to have fairly small effects on DPCM performance. Differences between PCM-optimal and DPCM-optimal step sizes are less significant than differences among step sizes of different speech blocks. Moreover, the suboptimality of a PCM-matched step size becomes less significant as $h_1$ decreases. The next section shows that half-bit-rate DPCM systems favor values of $h_1$ that are indeed much smaller than those appropriate for conventional full-rate DPCM.

IV. RESULTS AND CONCLUSIONS

Figures 3 through 6 illustrate the performance of the interpolation procedure for the example of a 5-bit encoder. The waveform segments refer to two 20-ms blocks from a 3.2-kHz bandlimited, female utterance, "The chairman cast three votes," sampled at 12 kHz. Figure 3 shows full-rate and interpolated $q$ waveforms ($h_1 = 0.5$) for the two segments. It demonstrates that the nonadaptive interpolator (1) is reasonably adequate even for the fast-varying unvoiced example. This is confirmed in Fig. 4, which shows corresponding full-rate and half-rate $y$ waveforms. Notice that the half-bit-rate output provides a much better reproduction of the voiced segment, but is nevertheless reasonably effective in unvoiced speech reproduction. Perceptually, the waveform degradations in the half-bit-rate coder are indeed fairly acceptable for R = 5.

[Figs. 3 and 4. Waveform plots; captions illegible in the source scan.]

The significance of the choice of $h_1$ is demonstrated in Fig. 5, which shows DPCM performance as a function of predictor coefficient value.
The objective quality measure used is the segmental signal-to-noise ratio, defined as the average of 10 log (s/n) (dB) values measured over the totality of 20-ms blocks in the input. Note the maximization of full-rate and half-rate quality for $h_1 = 0.9$ and $h_1 = 0.5$, respectively, and notice also that these are not very sharp maxima, suggesting flexibility for practical implementations. The special, simple case of $h_1 = 0$ (PCM) leads to a noticeable quality degradation only for the full-rate system.

[Fig. 5. Caption illegible in the source scan.]

Figure 6 depicts the performance of full-rate and half-rate DPCM receivers as a function of encoder bit rate. The performance curves (i) and (ii) are for $h_1$ values that maximize half-rate speech quality ($h_1 = 0.5$ for 5 and 4 bits/sample; $h_1 = 0$ for 3 and 2 bits/sample). The full-rate characteristic shows the expected 6-dB-per-bit behavior, while the half-bit-rate characteristic falls more gradually with decreasing R. Both characteristics tend to the expected 0-dB limit for no transmission (R -> 0). The square dots in Fig. 6 represent the performance of a full-rate receiver in a system designed to maximize full-rate speech quality ($h_1 = 0.8$ to 0.9).

[Fig. 6. Caption illegible in the source scan.]

An important observation from Fig. 6 is that for encoder bit rates in the important range of 2 to 5 bits/sample,

$$Q_H(B/2) \approx Q_F(B/2). \tag{5}$$

This suggests that the half-bit-rate qualities in the subsample-interpolate system are extremely good results, considering the crucial constraint that the half-bit-rate systems combine to yield the full-rate performance $Q_F(B)$. The approximate equality in (5) is borne out very well in perceptual assessments of $Q_H$ and $Q_F$. In contrast to (5), analytical results in Refs. 1 and 2 are quite pessimistic. This difference in conclusions is related to the fact that these analytical results apply at the rate-distortion limit, while the bit rates in this paper are nowhere close to the rate-distortion limit for speech.
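The segmental signal-to-noise measure described above can be sketched as follows. This is a minimal illustration, not the paper's evaluation code: the function name `segmental_snr_db` is hypothetical, and skipping blocks with zero signal or zero noise energy is an assumption made here so that the logarithm stays defined. At 12-kHz sampling, a 20-ms block corresponds to `block_len = 240`.

```python
import math


def segmental_snr_db(x, y, block_len):
    """Average, over whole blocks, of 10*log10(signal energy / noise energy).

    x : input samples, y : reconstructed samples (same length).
    Assumption: blocks with zero signal or zero noise energy are skipped,
    and a trailing partial block is ignored.
    """
    values = []
    for start in range(0, len(x) - block_len + 1, block_len):
        xb = x[start:start + block_len]
        yb = y[start:start + block_len]
        signal = sum(v * v for v in xb)
        noise = sum((a - b) ** 2 for a, b in zip(xb, yb))
        if signal > 0.0 and noise > 0.0:
            values.append(10.0 * math.log10(signal / noise))
    return sum(values) / len(values) if values else float("nan")


x = [1.0] * 4 + [2.0] * 4
y = [0.9] * 4 + [1.8] * 4
snr = segmental_snr_db(x, y, block_len=4)   # close to 20 dB in each block
```

Because the average is taken over per-block decibel values, quiet blocks weigh as much as loud ones, which is what distinguishes this measure from a single long-term signal-to-noise ratio.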
In fact, our bit rates are high enough that there is sufficient redundancy left in the coder output to permit subsampling and high-quality interpolation.

The relative performance of the half-bit-rate receiver diminishes with increasing bit rate. Clearly, as R tends to infinity, the quantization noise contributions vanish, $Q_F$ grows without bound, and $Q_H$ tends to a finite asymptotic value that shows the effect of nearest-neighbor interpolation noise. Results elsewhere can be used to show that this asymptotic value, for a first-order Markov signal example, is approximately given by the expected value of a function of $R_{xx}(1)$, in dB [expression illegible in the source scan], where $R_{xx}(1)$ is a block-specific adjacent-sample correlation in the speech input $x(n)$.

Finally, it would be appropriate to calibrate the $Q_F$ and $Q_H$ values in Fig. 6 with well-known definitions of toll quality and communications quality (near-perfect intelligibility with noticeable but not obtrusive degradation). Although simulations and conclusions have centered on a single speech input, it appears that full-bit-rate DPCM realizes toll quality for R = 5 and 4 bits/sample and communications quality for R = 3 and 2 bits/sample. The half-rate DPCM receptions in the proposed system approach toll quality with R = 5 bits/sample and maintain communications quality at R = 4 and 3 bits/sample.

REFERENCES

[Reference entries illegible in the source scan.]
