
BSTJ 60: 9. November 1981: Frequency Scaling of Speech Signals by Transform Techniques. (Malah, D.; Flanagan, J.L.)



Frequency Scaling of Speech Signals by Transform Techniques

By D. MALAH and J. L. FLANAGAN

(Manuscript received May 15, 1981)

The general framework of short-time Fourier analysis, modification, and synthesis is used to describe in a unified way several known techniques for frequency scaling of speech signals. Subsequently, a frequency domain harmonic scaling technique is studied in detail with emphasis on improving its performance and its implementation efficiency. This technique is particularly attractive for 2:1 scaling by use of a sign tracking algorithm which avoids the need for explicit phase computation and unwrapping. The implementation efficiency is achieved by using the fast Fourier transform algorithm, embedded decimation and interpolation, and an extended version of a recently developed weighted overlap-add synthesis scheme. The improvement in quality is achieved by improved sign tracking and elaborate design and selection of the analysis and synthesis prototype filters (data windows). Results of computer simulations, for a variety of adverse acoustical environment conditions, indicate that the system is highly robust but its quality for clean speech is lower than with a time domain harmonic scaling technique which uses pitch information. In applications which do not permit pitch transmission, a hybrid scheme which combines the two techniques is found to yield a better quality than either system alone.

I. INTRODUCTION

Frequency scaling of speech signals is a useful method for reducing the bandwidth requirement in analog and digital speech transmission systems. In analog systems the frequency compressed signal is transmitted at reduced bandwidth. In digital systems the frequency compressed signal is waveform coded to provide reduced bit-rate transmission. A general block diagram of such a digital system is shown in Fig.
1. In this figure, the schematic spectral representation of the input speech signal is shown to consist of a spectral envelope with pronounced resonances (formant peaks) and of a fine structure because of pitch harmonics in voiced speech. The spectral envelope of the compressed signal is a scaled version of the input spectral envelope. However, different frequency scaling techniques may result in different fine structures. Since we do not refer at this point to any specific technique, Fig. 1 does not show the fine structure of the compressed signal. This suggests that frequency scaling techniques can be classified according to the way the fine spectral structure is scaled. In particular, one can distinguish between narrow-band techniques, such as the phase vocoder and time-domain harmonic scaling (TDHS) techniques, which aim at separating and scaling the individual pitch harmonics, and wide-band techniques, such as the analytic signal rooting (ASR) technique and the more recent constant-Q transform (CQT) method, which aim at directly scaling the spectral envelope. The much earlier Vobanc and CODIMEX systems also fall into the latter category.

[Fig. 1. Block diagram of a digital speech transmission system using frequency compression, with schematic input and compressed spectra.]

Another way of classification is to distinguish between time- and frequency-domain techniques. To provide useful quality, time-domain techniques require pitch tracking, as done in the TDHS technique. Once the pitch is known, the time-domain operations are simple and result in good quality scaled and reconstructed speech. Frequency-domain techniques are typically much more complex but do not require explicit pitch tracking. However, they usually have a lower quality because of errors made in resolving phase ambiguity and from the need, in general, to scale both the phase and amplitude signals, as will be elaborated later on.
For applications in which pitch tracking is not desired or not possible, because of adverse acoustical environment conditions, the use of an efficient frequency domain technique is of much interest. In this work we present an efficient implementation of a frequency domain harmonic scaling (FDHS) technique which is based on an improved version of the technique presented in Ref. 12. Frequency-domain harmonic scaling is a narrow-band technique which aims at scaling the individual pitch harmonics. It is particularly attractive for 2:1 scaling, since in this case a sign tracking algorithm avoids the need for explicit phase computation and unwrapping (that is, eliminating $2\pi$ phase ambiguities), which in general is a difficult and error-prone task. The efficient implementation is based on the recently developed weighted overlap-add method for short-time Fourier analysis/synthesis, which allows block processing using the fast Fourier transform (FFT) algorithm. It is extended here to include analysis and synthesis windows which are both longer than the FFT block, or the transform size.

The general framework of the short-time Fourier transform (STFT) as developed in several recent works also provides a unified description of other known frequency scaling techniques, and helps to relate them to the FDHS technique. The unified description is given in the following section, and is followed by a detailed description of the FDHS technique. Section IV gives the details of the implementation scheme and Section V discusses design considerations and simulation results. Section VI presents a hybrid technique which combines TDHS and FDHS. This combination is designed for applications in which it is feasible to extract the pitch at the transmitter but for which the transmission of pitch data is either impossible or to be avoided.
II. A UNIFIED DESCRIPTION OF FREQUENCY SCALING TECHNIQUES

A general scheme for frequency scaling is presented in this section. It is based on viewing the frequency scaling operation as a modification of the short-time spectrum of the speech signal. This scheme is then used to describe in a unified way several known frequency scaling techniques. Our attention in describing the different techniques will be mainly focused on the nature of the spectral modifications used by each technique and not necessarily on the way they are implemented.

The relation between the short-time Fourier transform (STFT) and filter-bank analysis is well established. For the convenience of this presentation, the filter bank which is used to divide the two-sided speech spectrum into sub-bands is assumed to consist of complex bandpass filters. The center frequency of the $k$th filter is denoted by $\omega_k$ and its complex (or analytic) output signal by $z_k(t)$. It is also assumed that each bandpass filter has a real low-pass filter prototype, which means that the complex impulse response $h_k(t)$ of the $k$th filter is given by

\[ h_k(t) = w_k(t)\exp[j\omega_k t], \tag{1} \]

where $w_k(t)$ is the impulse response of the low-pass prototype of $h_k(t)$. Note that in general the prototype filters need not be identical, but we assume that the bandpass filters are contiguous and are arranged symmetrically about $\omega = 0$, so that the filter centered at $\omega = \omega_k$ has the same prototype filter as the one centered at $\omega = -\omega_k$. This way, the summation of the outputs from a pair of corresponding (conjugate) complex filters results in a real signal.

The output signal from the $k$th complex filter has the general form

\[ z_k(t) = A_k(t)\exp[j\theta_k(t)], \tag{2} \]

where $A_k(t)$ is the amplitude, or envelope, function and $\theta_k(t)$ is the phase function. The phase function can be written as a sum of two components

\[ \theta_k(t) = \omega_k t + \varphi_k(t), \tag{3} \]

where the meaning of $\varphi_k(t)$ is elaborated below. By substituting eq.
(3) into eq. (2), we see that $z_k(t)$ can be interpreted as being the result of the simultaneous modulation of the amplitude and phase of the complex carrier signal $\exp(j\omega_k t)$ by the amplitude and phase signals $A_k(t)$ and $\varphi_k(t)$, respectively. The instantaneous frequency of $z_k(t)$ is given by the phase derivative $\dot{\theta}_k(t) = \omega_k + \dot{\varphi}_k(t)$, so that $\dot{\varphi}_k(t)$ is seen to be the deviation of the instantaneous frequency from the center frequency $\omega_k$.

Frequency scaling of $z_k(t)$ by a factor $q$ ($q < 1$ for compression and $q > 1$ for expansion) is achieved if the center frequency $\omega_k$ is scaled (shifted) to $q\omega_k$ and the bandwidth of $z_k(t)$, about $\omega_k$, is also scaled by $q$. It is well known that the bandwidth of a signal which is characterized by simultaneous amplitude and phase modulation of a carrier signal is a function of both modulating signals. Hence, just scaling the instantaneous frequency deviation $\dot{\varphi}_k(t)$ by a factor $q$ does not result, in general, in the exact scaling of the bandwidth of $z_k(t)$ by $q$. The lack of adequate analytical models which describe the time variations of the amplitude and phase modulating signals, for an input speech signal, has resulted in a variety of frequency scaling techniques which use different analysis filter banks and different modifications of the modulating signals. In the following, the modifications applied to the modulation signals by different frequency scaling techniques are used to analyze and compare the different techniques. We first, however, show that the modification of $A_k(t)$ and
$\varphi_k(t)$ corresponds to the modification of the short-time spectrum of the speech signal.

Since $z_k(t)$ is the output signal from a bandpass filter having an impulse response $h_k(t)$, it can be expressed as the convolution between the input speech signal $x(t)$ and $h_k(t)$. From eq. (1) this results in

\[ z_k(t) = X(\omega_k, t)\exp[j\omega_k t], \tag{4} \]

where

\[ X(\omega_k, t) = \int_{-\infty}^{\infty} x(\tau)\, w_k(t - \tau)\exp[-j\omega_k \tau]\, d\tau. \tag{5} \]

Comparing eq. (4) with eq. (2) and using eq. (3), we have

\[ X(\omega_k, t) = A_k(t)\exp[j\varphi_k(t)]. \tag{6} \]

This shows that the amplitude and phase modulations of the carrier $\exp(j\omega_k t)$ are fully described by the composite modulation function $X(\omega_k, t)$. Additional understanding of the modulation functions can be gained from the studies in Refs. 18 and 19. The expression for $X(\omega_k, t)$ in eq. (5) shows that $X(\omega_k, t)$ is equal to the value of the STFT of $x(t)$ at the frequency $\omega = \omega_k$, with $w_k(t)$ being the window function used to weight the input signal. It should be emphasized again that in the present discussion the different bandpass filters covering the speech band have, in general, different prototype filters. Hence, for each bandpass filter one can define an STFT which, if evaluated at the center frequency of that filter, gives the corresponding composite modulation function. If all bandpass filters have identical prototype low-pass filters, only a single STFT is needed to find the value of $X(\omega_k, t)$ for each $\omega_k$, by evaluating the STFT at each center frequency. With this understanding, we will refer to $X(\omega_k, t)$ as the STFT of $x(t)$ at $\omega = \omega_k$, even for the general case of nonidentical prototype filters.

Denoting the frequency scaled version of $z_k(t)$ by $z_{q,k}(t)$ and the corresponding modified STFT by $X_q(\omega_k, t)$, we have

\[ z_{q,k}(t) = X_q(\omega_k, t)\exp[jq\omega_k t]. \tag{7} \]

The magnitude and phase components of $X_q(\omega_k, t)$ are accordingly denoted by $A_{q,k}(t)$ and $\varphi_{q,k}(t)$, respectively.
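The relation between eqs. (4), (5), and (6) can be checked numerically. The following is a minimal discrete-time sketch, not the implementation used in the paper: the Hann prototype, the channel frequency, the test tone, and all function names are assumptions chosen for illustration.

```python
import cmath
import math

def hann(length):
    """Periodic Hann window, assumed here as the low-pass prototype w_k."""
    return [0.5 - 0.5 * math.cos(2.0 * math.pi * d / length) for d in range(length)]

def stft_at(x, w, omega_k, n):
    """Discrete analogue of eq. (5): sum over m of x[m] w[n-m] exp(-j omega_k m)."""
    total = 0j
    for d, w_d in enumerate(w):
        m = n - d
        if 0 <= m < len(x):
            total += x[m] * w_d * cmath.exp(-1j * omega_k * m)
    return total

def bandpass_at(x, w, omega_k, n):
    """Direct convolution of x with h_k[d] = w[d] exp(j omega_k d), as in eq. (1)."""
    total = 0j
    for d, w_d in enumerate(w):
        m = n - d
        if 0 <= m < len(x):
            total += x[m] * w_d * cmath.exp(1j * omega_k * d)
    return total

omega_k = 2.0 * math.pi * 8 / 64   # assumed channel center frequency (rad/sample)
delta = 0.01                       # assumed offset of the input tone from omega_k
x = [math.cos((omega_k + delta) * n) for n in range(200)]
w = hann(64)

n = 100
X_n = stft_at(x, w, omega_k, n)
z_n = bandpass_at(x, w, omega_k, n)

# Eq. (4): the bandpass output equals the STFT value times the carrier.
identity_error = abs(z_n - X_n * cmath.exp(1j * omega_k * n))

# Eq. (6): amplitude A_k and the per-sample increment of the phase phi_k.
A_n = abs(X_n)
phase_step = cmath.phase(stft_at(x, w, omega_k, n + 1) / X_n)
```

For a tone close to the channel center, `phase_step` approximates the frequency deviation `delta`, which is the discrete counterpart of the phase derivative discussed above.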
As noted above, the modification of $X(\omega_k, t)$ needed for exact frequency scaling of speech signals is not known, and the bandwidth of the individual sub-band signals is usually only partially scaled by any given technique. Hence, to avoid excessive interband aliasing when the partially scaled sub-band signals are combined, additional filtering of $z_{q,k}(t)$ may be needed. The filtering of $z_{q,k}(t)$ can be performed either by bandpass filters having a bandwidth which is $q$ times the bandwidth of the analysis filters or, equivalently, by low-pass filtering the modified STFT by the corresponding low-pass prototype filters. Figure 2 shows a general block diagram for frequency scaling which is based on modifying the STFT of the input signal as discussed above. The impulse responses of the synthesis low-pass filters which are used to band-limit the output signals in each channel are denoted by $w_{q,k}(t)$. These scaled-bandwidth synthesis filters can generally be obtained from $w_k(t)$ by the relation $w_{q,k}(t) = w_k(qt)$. In the diagram of Fig. 2, only the details of the $k$th channel are given since all the other channels are similar (see solid line). The filtered modified STFT is denoted in Fig. 2 by $\hat{X}_q(\omega_k, t)$. The output scaled speech signal $y(t)$ is given by

\[ y(t) = \sum_k \hat{z}_{q,k}(t) = \sum_k \hat{X}_q(\omega_k, t)\exp[jq\omega_k t], \tag{8} \]

where, as seen in Fig. 2, $\hat{z}_{q,k}(t)$ is the $k$th-channel scaled and filtered bandpass signal. The summation in eq. (8) is over the sub-bands.

It is clear from the above discussion, and from the block diagram in Fig. 2, that the choice of the filter bank and the STFT modification are the key issues for any given technique. While the block diagram in Fig. 2 provides a basis for comparing different techniques, the actual implementations can differ, either because of historical reasons or the availability of more efficient or convenient ways for implementation.
We turn now to the description of several known frequency scaling techniques in terms of the STFT modification used by each technique. This will exemplify the above discussion and will provide a proper perspective for discussing the FDHS technique and its properties and implementation.

[Fig. 2. General block diagram for frequency scaling by STFT modification.]

2.1 Analytic signal rooting

In the analytic signal rooting technique the number of bandpass filters is chosen to match the formant structure of speech signals so that, preferably, no more than one formant is present in each sub-band. The approach taken in Ref. 9, as well as in the earlier CODIMEX system, is to obtain $z_{q,k}(t)$ by raising the analytic signal $z_k(t)$ to the power of $q$. If $q < 1$, this corresponds to taking the $1/q$ root of $z_k(t)$, which is the origin for the name of this technique. Using the relations in eqs. (4) and (7), we find that the STFT modification performed by this technique is

\[ X_q(\omega_k, t) = [X(\omega_k, t)]^q. \tag{9} \]

In terms of the modulating amplitude and phase signals, this modification corresponds to

\[ A_{q,k}(t) = [A_k(t)]^q, \tag{10a} \]

and

\[ \varphi_{q,k}(t) = q\,\varphi_k(t). \tag{10b} \]

To understand the effect of this modification, we note that since each sub-band is to contain no more than one formant, it can be expected that most often one pitch harmonic, the one closest to the peak of the formant, is dominant over the other harmonics in that sub-band. From the analysis in Refs. 20 and 21 one can conclude that the phase scaling operation in eq. (10b) scales the instantaneous frequency of the dominant harmonic in each band by $q$, but the other lower-amplitude harmonics are shifted in such a way that their spacing from the dominant harmonic remains unchanged. The result of this translation is that the fine structure spectral components are not necessarily harmonic although their spacing is equal to the pitch frequency. The scaling of the amplitude signals in the way given by eq. (10a) can be shown to scale the magnitude of the nondominant components (harmonics) relative to the amplitude of the dominant component.
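The effect of eqs. (10a) and (10b) on a single dominant harmonic can be illustrated with a small sketch. It is a simplified example under stated assumptions: one constant-amplitude harmonic per channel, an already unwrapped phase, and hypothetical function names and parameter values. Raising a complex number to a fractional power directly (for example `X ** q` in Python) uses the principal branch of the phase, so the sketch works on the amplitude and unwrapped phase separately.

```python
import cmath

def asr_modify(amplitude, phase, q):
    """Eqs. (10a) and (10b): root the amplitude, scale the (unwrapped) phase."""
    return amplitude ** q, q * phase

def asr_channel(amplitude, omega_0, omega_k, q, t):
    """Scaled channel signal of eq. (7) for one harmonic at omega_0 in band omega_k."""
    phase = (omega_0 - omega_k) * t            # phi_k(t) of a pure tone, unwrapped
    a_q, phi_q = asr_modify(amplitude, phase, q)
    return a_q * cmath.exp(1j * (q * omega_k * t + phi_q))

# Assumed example values: a harmonic at 0.30 rad/sample in a band centered at 0.25.
q = 0.5
z_a = asr_channel(4.0, 0.30, 0.25, q, 10)
z_b = asr_channel(4.0, 0.30, 0.25, q, 11)

scaled_amplitude = abs(z_a)                    # [A_k]^q
scaled_frequency = cmath.phase(z_b / z_a)      # per-sample phase advance of z_{q,k}
```

Under these assumptions the per-sample phase advance is `q` times the harmonic frequency, in line with the statement that the dominant harmonic is scaled by $q$.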
The amplitude scaling of eq. (10a) also affects the intermodulation terms generated by the phase scaling. For $q < 1$ the effect is to reduce the magnitude of the nondominant components relative to the dominant one and hence, effectively, to reduce the bandwidth of the frequency-scaled formants. To avoid excessive interband aliasing, it is particularly important in this technique to use the band-limiting low-pass filters $w_{q,k}(t)$ following the modification.

For more effective scaling of the amplitude signal, and with respect to the CQT, which uses constant-Q bandpass filters, consider the approach by Ravindra. He suggests that $A_k(t)$ be spectrally analyzed for each $k$ by an additional bank of filters and that its bandwidth be scaled by scaling the phase in each sub-band; this can be repeated in a tree-like structure. The implementation complexity of this approach, however, appears to be exorbitant.

2.2 Phase vocoder

The phase vocoder, as its name indicates, can be used directly as a vocoder system in which the phase derivative and magnitude of the input signal STFT are coded and transmitted. The phase vocoder technique can also be applied for frequency scaling and this aspect is considered here.

In the phase vocoder, the number of bandpass filters is chosen to match the harmonic structure of voiced speech. This means that a relatively large number of filters is used so that, preferably, no more than one pitch harmonic is present in each sub-band. The fact that individual harmonics are separately scaled allows us to infer the characteristics of the modulation signals in each band from known speech properties. In particular, since pitch and vocal tract variations are relatively slow, the bandwidth of each pitch harmonic is quite narrow, as shown by the "pitch teeth" in the input spectrum shown in Fig. 1. In view of this fact, one would expect that even if the pitch harmonics are only shifted to the proper frequencies, without scaling the bandwidth of each pitch tooth, acceptable compression can be achieved (i.e.,
only a small inter-harmonic aliasing is expected), provided that the compression ratio is limited to 2 or at most 3. Indeed, this finds support in the results obtained with the TDHS technique which we discuss later. However, to shift the pitch harmonics to the proper locations requires knowledge of the pitch frequency or, equivalently, the deviation of each pitch harmonic from the center frequency of the sub-band in which it is located.

Let $\Omega_k$ be the pitch-harmonic frequency in the $k$th sub-band, with center frequency $\omega_k$, and $\Delta\omega_k$ the deviation of the pitch harmonic from the center frequency; i.e., $\Delta\omega_k = \Omega_k - \omega_k$. Then, the phase derivative $\dot{\varphi}_k(t)$ can be expressed as

\[ \dot{\varphi}_k(t) = \Delta\omega_k + \gamma_k(t), \tag{11} \]

where $\gamma_k(t)$ describes the contribution of the phase variations to the bandwidth of the pitch harmonic in the $k$th sub-band. In the phase vocoder technique, the phase derivative $\dot{\varphi}_k(t)$ is scaled by $q$ so that, in addition to shifting each pitch tooth to its proper location, a partial scaling of its bandwidth is obtained since $\gamma_k(t)$ is scaled as well. The amplitude modulation signals are not modified in this technique. However, since individual harmonics are analyzed, the amplitude signal in each band varies slowly [see (a) of Fig. 6 in Ref. 19], and its contribution to the pitch-tooth bandwidth is expected to be small. Accordingly, the modified amplitude and phase signals are given by

\[ A_{q,k}(t) = A_k(t), \tag{12a} \]

and

\[ \varphi_{q,k}(t) = q\int_0^t \dot{\varphi}_k(\tau)\, d\tau. \tag{12b} \]

It is observed from eq. (12b) that the constant phase term $\varphi_k(0)$ is discarded [note $\varphi_k(t) = \int_0^t \dot{\varphi}_k(\tau)\, d\tau + \varphi_k(0)$]. This can have an effect on the shape of the scaled signal waveform, but because of the relative insensitivity of the ear to a fixed phase distortion, it was not judged to be perceptually significant.

Since in this technique individual pitch harmonics are scaled and the interband aliasing is expected to be small, use of the output synthesis filters, denoted by $w_{q,k}(t)$ in Fig.
2, is less compelling than for the ASR technique, but it can still be used.

It should be noted that the phase vocoder technique can perform time-scale variations of speech signals simply by playing back the signal which has been frequency-scaled by a factor $q$ at $(1/q)$ times the original speed. This restores the original frequency range but scales the signal's time duration by $q$. This useful property is not shared by the ASR technique because of the way the pitch harmonics are shifted and because of the nonlinear scaling of the amplitude signals. On the other hand, the ASR technique can be useful for restoring speech distorted by a helium atmosphere, where scaling of the formants without changing the perceived pitch of the signal is desired.

We turn now to the more recently developed time-domain harmonic scaling (TDHS) technique. Although this technique is most efficiently implemented in the time domain, it was formulated and derived within the STFT framework.

2.3 Time-domain harmonic scaling

As noted in the discussion on the phase vocoder technique, compression factors of up to 3 can possibly be obtained even if the bandwidth of each pitch harmonic is not scaled, provided that the pitch harmonics are shifted to the correct frequency locations. In the phase vocoder this necessitates scaling the phase derivative of the STFT so that $\Delta\omega_k$, the frequency deviation of the pitch harmonic in the $k$th sub-band from the center frequency $\omega_k$, is scaled by $q$.
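The phase-derivative scaling that the phase vocoder requires, and the playback property noted for it, can be sketched in discrete time as follows. This is an illustrative sketch only: it assumes the channel phase has already been unwrapped, replaces the integral of the phase derivative by a running sum of first differences, and uses hypothetical function names and example values.

```python
def phase_vocoder_phase(phi, q):
    """Discrete analogue of scaling the phase derivative by q and re-integrating.

    phi is an unwrapped phase sequence; the constant term phi[0] is discarded,
    as happens when only the phase derivative is retained.
    """
    scaled = [0.0]
    for n in range(1, len(phi)):
        scaled.append(scaled[-1] + q * (phi[n] - phi[n - 1]))
    return scaled

def playback(duration_s, freq_hz, q):
    """Play a signal that was frequency-scaled by q at (1/q) times the original speed."""
    speed = 1.0 / q
    return duration_s / speed, freq_hz * speed

# Assumed example: a harmonic deviating by 0.02 rad/sample, with initial phase 0.7.
phi = [0.7 + 0.02 * n for n in range(101)]
phi_q = phase_vocoder_phase(phi, 0.5)

# A 200 Hz harmonic compressed to 100 Hz (q = 0.5), then played back twice as fast.
new_duration, new_freq = playback(3.0, 100.0, 0.5)
```

In this example the deviation of 0.02 rad/sample becomes 0.01 rad/sample, the initial phase of 0.7 rad does not appear in the output, and playback restores the 200 Hz harmonic while the duration is multiplied by `q`.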
The approach taken by the TDHS technique is to incorporate pitch information, which is obtained by a separate pitch detector, into the scaling process. If the pitch frequency is known, the bandwidth of each bandpass filter can be made equal to the pitch frequency and the center frequency of each bandpass filter can be aligned with the corresponding pitch harmonic, so that $\Delta\omega_k = 0$ for all the bandpass filters which cover the speech band. Here, in principle, the number of bandpass filters also varies with the pitch frequency and is equal to the number of pitch harmonics in the given speech band. Hence, if only shifting of the pitch harmonics is desired, as schematically shown in Fig. 3 (for $q = 1/2$ and $q = 2$), without scaling the pitch-tooth bandwidth, there is no need to modify the modulating amplitude and phase signals (i.e., the STFT), but only to scale the carrier, or center, frequencies. Thus,

\[ X_q(\omega_k, t) = X(\omega_k, t), \tag{13} \]

with the understanding that $\omega_k$ is chosen to coincide with the pitch harmonic $\Omega_k$ in the $k$th sub-band. Using eq. (13) in eq. (8), and assuming that no synthesis filters are used, the output scaled signal $y(t)$ is given by

\[ y(t) = \sum_k X(\omega_k, t)\exp[jq\omega_k t]. \tag{14} \]

[Fig. 3. Schematic illustration of pitch-harmonic shifting for $q = 1/2$ and $q = 2$.]
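Equation (14) can be illustrated for an idealized, perfectly periodic voiced signal. The sketch below is an assumption-laden toy example rather than the TDHS algorithm itself: each channel is taken to be aligned with one pitch harmonic so that $X(\omega_k, t)$ reduces to a constant complex amplitude, the sum runs over conjugate channel pairs to give a real output, and the harmonic amplitudes, phases, and function names are hypothetical.

```python
import cmath
import math

def tdhs_output(harmonics, omega_p, q, t):
    """Eq. (14) with constant X(omega_k, t): only the carrier frequencies are scaled.

    harmonics is a list of (amplitude, phase) pairs for harmonics 1, 2, 3, ...
    of the pitch frequency omega_p (rad/s).
    """
    total = 0j
    for k, (amp, theta) in enumerate(harmonics, start=1):
        c_k = 0.5 * amp * cmath.exp(1j * theta)    # X(omega_k, t) with omega_k = k * omega_p
        total += c_k * cmath.exp(1j * q * k * omega_p * t)
        total += c_k.conjugate() * cmath.exp(-1j * q * k * omega_p * t)
    return total.real

def original_signal(harmonics, omega_p, t):
    """Reference input: a sum of cosines at the pitch harmonics."""
    return sum(amp * math.cos(k * omega_p * t + theta)
               for k, (amp, theta) in enumerate(harmonics, start=1))

harmonics = [(1.0, 0.0), (0.5, 0.3), (0.25, -1.0)]   # assumed amplitudes and phases
omega_p = 2.0 * math.pi * 100.0                      # assumed 100 Hz pitch
pitch_period = 2.0 * math.pi / omega_p
t0 = 0.0012

unscaled = tdhs_output(harmonics, omega_p, 1.0, t0)
compressed_now = tdhs_output(harmonics, omega_p, 0.5, t0)
compressed_later = tdhs_output(harmonics, omega_p, 0.5, t0 + 2.0 * pitch_period)
```

With `q = 1` the sum reproduces the input, and with `q = 1/2` the output repeats every two original pitch periods, which corresponds to halving every harmonic frequency while leaving the amplitudes and phases untouched.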
