
BSTJ 60: 7. September 1981: Digital Signal Processor: Speech Synthesis. (Buric, M.R.; Kohut, J.; Olive, J.P.) PDF



Digital Signal Processor: Speech Synthesis

By M. R. BURIC, J. KOHUT, and J. P. OLIVE

(Manuscript received June 10, 1980)

This paper describes a device that is capable of synthesizing speech in real time and is based on the digital signal processor chip. The device performs the function of a twelfth-order linear prediction coding synthesizer, and as such represents a linear dynamic system approximation of the vocal tract. In this model, short time segments of the speech waveform are derived as the output of a system driven by a quasi-periodic impulse sequence for voiced sounds, or by white noise for unvoiced sounds. The time-varying nature of the system is derived from the input information presented to the device for every new pitch period. Interfacing of the device to standard microprocessors is easy, so that the synthesizer can conveniently be integrated into larger systems.

Speech synthesis is one of several promising areas of application for the digital signal processor (DSP) chip described in this issue of the Bell System Technical Journal. Its computational power, low cost, and easy interfacing are the properties that allow the design of a stand-alone speech synthesizer with very few components outside the DSP chip. Such a synthesizer can be used in a variety of devices intended for providing new services in a business environment, as well as in future residential services.

One of the most successful ways to synthesize speech is based on a linear predictive coding (LPC) model of the vocal tract. See Refs. 1, 2, and 3. In this model, the vocal tract is approximated by a linear dynamic system driven by impulse sequences for voiced sounds, or by white noise for unvoiced sounds (Fig. 1a). The difference equation for such a representation is

$$s(n) = \sum_{i=1}^{N} a_i\, s(n-i) + G\, u(n). \tag{1}$$

In this discrete description of a linear system, $s(n)$ is the signal at time instant $n$, $\{a_i\}$ for $i = 1, 2, \ldots, N$ is the set of linear prediction coefficients, $u(n)$ is the system input, and $G$ is the gain coefficient.

[Fig. 1: (a) linear system model of the vocal tract; (b) lattice form of the system.]

A pitch synchronous synthesizer is one in which a new set of coefficients $\{a_i, G\}$ is presented to it for each new pitch period (10 milliseconds average for male voices). The coefficients are held constant for the duration of the pitch interval. This assumes that the system properties are relatively slow-varying with respect to this time scale. The variation is provided by an outside source of information, and it can be obtained at the source either by analyzing natural speech in an LPC analyzer or by complete synthesis based on phonological rules. In the first case the synthesizer functions as a receiver in a vocoding system, while in the second case it is the final stage in a speech synthesis system. In addition to the LPC coefficients $\{a_i\}$ and the gain coefficient $G$, the only other variable needed for speech synthesis is the pitch period of the impulse excitation in the case of voiced sounds, or the duration of the noise excitation in the case of unvoiced sounds.

The input-output transfer function of a linear system can be preserved under a set of equivalence transformations performed on the $\{a_i\}$ parameters, which yield various representations of the same system. See Refs. 2 and 3. This fact is used to advantage by selecting a representation that is least sensitive to parameter perturbations and to errors in finite-length arithmetic. In addition, it also provides an easy test of the stability of the system. One such representation is the lattice form of a linear system, described by the following set of difference equations (Fig. 1b),

$$f_{m-1}(n) = f_m(n) - k_m\, b_{m-1}(n-1),$$
$$b_m(n) = k_m\, f_{m-1}(n) + b_{m-1}(n-1), \qquad m = N, N-1, \ldots, 1, \tag{2}$$

where $\{f_m(n), b_m(n)\}$ are the forward and backward signals along the lattice stages, and $\{k_m\}$ are the reflection coefficients. The output of the system is $\{f_0(n)\}$, and the input is $\{f_N(n)\}$.
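As a rough illustration, the sketch below evaluates both recursions in Python, reading eq. (1) as s(n) = Σ a_i s(n-i) + G u(n) and eq. (2) as f_{m-1}(n) = f_m(n) - k_m b_{m-1}(n-1), b_m(n) = k_m f_{m-1}(n) + b_{m-1}(n-1). It is not the DSP program. The function names, the zero initial state, and the boundary condition b_0(n) = f_0(n) are assumptions made for the example. Under those assumptions a first-order lattice with k_1 = 0.5 gives the same impulse response as eq. (1) with a_1 = -0.5 and G = 1.

```python
from typing import List, Sequence


def direct_form(a: Sequence[float], gain: float, u: Sequence[float]) -> List[float]:
    """Eq. (1): s(n) = sum_i a_i * s(n - i) + gain * u(n), with s(n) = 0 for n < 0."""
    s: List[float] = []
    for n, u_n in enumerate(u):
        acc = gain * u_n
        for i, a_i in enumerate(a, start=1):
            if n - i >= 0:
                acc += a_i * s[n - i]
        s.append(acc)
    return s


def lattice(k: Sequence[float], u: Sequence[float]) -> List[float]:
    """Eq. (2): k[m - 1] holds k_m, the input is f_N(n), the output is f_0(n)."""
    if any(abs(k_m) >= 1.0 for k_m in k):
        raise ValueError("reflection coefficients must lie in (-1, 1)")
    order = len(k)
    b_prev = [0.0] * order               # b_prev[m] = b_m(n - 1) for m = 0..N-1
    out: List[float] = []
    for u_n in u:
        f = u_n                          # f_N(n)
        b_now = [0.0] * order
        for m in range(order, 0, -1):
            f = f - k[m - 1] * b_prev[m - 1]               # f_{m-1}(n)
            if m < order:
                b_now[m] = k[m - 1] * f + b_prev[m - 1]    # b_m(n)
        if order:
            b_now[0] = f                 # assumed boundary condition b_0(n) = f_0(n)
        b_prev = b_now
        out.append(f)
    return out


impulse = [1.0, 0.0, 0.0, 0.0]
print(lattice([0.5], impulse))             # [1.0, -0.5, 0.25, -0.125]
print(direct_form([-0.5], 1.0, impulse))   # [1.0, -0.5, 0.25, -0.125]
```

The stability test mentioned in the text appears here as the range check at the top of `lattice`: a coefficient outside (-1, 1) is rejected before any sample is computed.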
Even though this form requires more multiplications per output sample, it has the very important property that all the reflection coefficients belong to the interval (-1, 1) in a stable system.

A device that performs the function of a pitch synchronous synthesizer of the twelfth order in the lattice form has been designed and built around the digital signal processor chip. The device synthesizes speech in real time with an output sampling rate of 10 kHz. It is intended to be used in conjunction with some external device capable of providing the necessary system description for every new pitch period. This information is transmitted to the synthesizer in the form of a fifteen-word message, shown in Fig. 2, consisting of a header word used for synchronization and error recovery, a number representing the pitch period of excitation, an excitation amplitude, and finally the system parameters. The parameters are given as 15-bit reflection coefficients, or as 8-bit log-area parameters.

The basic synthesizer may be interfaced to the outside world in a number of ways, and this will be discussed.

II. DESCRIPTION OF THE SYNTHESIZER

The synthesizer block diagram is shown in Fig. 3. The main components are the digital signal processor chip, an interface for the input data, and an output interface which includes a first-in-first-out buffer memory (FIFO). The output of the FIFO memory is converted to an analog signal through a digital-to-analog (D-to-A) converter.

The processor derives the synthesis program from read-only memory (ROM), which is presently external to the processor but could be contained in the processor's internal memory. The computation within the DSP is done in an arithmetic unit that operates on 20-bit and 16-bit operands. Notably, it includes an efficient multiplier and a 40-bit accumulator. These provide a dynamic range which is sufficient for the synthesis application.
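The dynamic-range remark can be checked with a little arithmetic. The sketch below assumes two's-complement operands of 16 and 20 bits feeding a signed 40-bit accumulator; the number formats are an assumption for illustration, since the text gives only the widths. It asks how many worst-case products could be summed before the accumulator could overflow, and compares that with the filter order of twelve.

```python
COEF_BITS = 16   # assumed width of a coefficient operand
DATA_BITS = 20   # assumed width of a signal operand
ACC_BITS = 40    # accumulator width given in the text
ORDER = 12       # twelfth-order synthesizer

# Largest product magnitude: (-2**15) * (-2**19) = 2**34.
max_product = 2 ** (COEF_BITS - 1) * 2 ** (DATA_BITS - 1)

# Largest positive value of a signed 40-bit accumulator.
acc_max = 2 ** (ACC_BITS - 1) - 1

# Number of worst-case products that can be summed without overflow.
safe_terms = acc_max // max_product

print(max_product == 2 ** 34)   # True
print(safe_terms)               # 31
print(safe_terms >= ORDER)      # True
```

Under these assumptions, even twelve full-scale products accumulated without rescaling would fit, which is consistent with the statement that the dynamic range is sufficient.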
The parallel and pipelined architecture of the processor maintains a high computational throughput rate.

The processor utilizes a multiplexed address and data bus for accessing instructions and data stored in external memory. To transfer instructions or data from the memory, the processor places an address on the bus during the first half of the bus cycle. An address register is used in the synthesizer to latch the address, which specifies the appropriate memory location. The data from the memory is latched into an external data register, and transferred into the processor during the second half of the bus cycle.

The input interface of the synthesizer consists of a parallel-in serial-out shift register and associated control logic. Data requests from the processor are passed to the outside source of information, and when a word is obtained, it is transferred to the processor bit-serially.

The output interface contains a serial-to-parallel converter, a first-in first-out memory buffer, and associated control logic. The output of the DSP is a bit-serial stream, which is shifted into the serial-in parallel-out shift register. The sample is then transferred from the shift register to the FIFO, where it joins the queue of previously computed samples. A sample is taken out of the FIFO queue every 100 microseconds under control of a 10-kHz clock, and applied to the digital-to-analog converter.

The role of the FIFO memory is to even out the differences in the three mutually asynchronous processes that are taking place during synthesis. The first process is the input of synthesis information into the DSP. The second is the computation of the speech samples, and the third is the transfer of the samples to the D-to-A converter. The input process is largely dependent on the host computer and its transmission capabilities. The use of the output FIFO relaxes the transmission requirements. Without the FIFO the data would be required in a burst mode for each pitch period.
Each burst would consist of 15 words transmitted within one sample period of 100 µs. However, since the FIFO queue contains the accumulated output samples for up to 64 sample periods, the input data rate is effectively decreased to 15 words every 6.4 milliseconds.

The computational process within the DSP is asynchronous with respect to the output sampling rate of one sample per 100 microseconds. The FIFO allows the DSP to compute new samples at full speed by providing the capability of saving the samples until they are needed. This queuing mechanism requires that the average computation time be less than the interval between output requests. The device meets this requirement, and in fact the DSP performs the computations faster than required most of the time. This implies that there are times when the FIFO becomes full, at which point the computation will be suspended until output permission is granted to the DSP. This happens when a word is taken out of the FIFO.

III. THE SYNTHESIS PROGRAM

From the above description of the synthesizer, it is clear that it can be used for a number of different synthesis schemes, with a variety of speech representation algorithms (LPC, formant, etc.).

The data rate required by the synthesizer is a very important parameter for practical implementation of a voice response device. The data rate is a function of the speech production model used by the synthesizer, and of the coding scheme used to represent model parameters in a segment of speech. A discussion of quantization schemes of LPC parameters and their effects on the quality of synthesized speech is given in Ref. 2. Even though a significant decrease in the input data rate is possible with these approaches, we will describe an implementation that does not employ parameter quantization or interpolation. Such a scheme is useful when the device is used in a synthesis-by-rule system.
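The effect of the output FIFO on the input transfer rate, described in Section II, can be worked through numerically. The sketch below uses the figures from the text (15 words per pitch period, 100-microsecond sample period) and a FIFO depth of 64 samples, which is inferred from the 6.4-millisecond figure. The names are illustrative. The results are peak rates for delivering one frame, not long-term averages.

```python
SAMPLE_PERIOD_US = 100   # one output sample every 100 microseconds (10 kHz)
WORDS_PER_FRAME = 15     # header word plus 14 parameters
FIFO_DEPTH = 64          # samples held by the output FIFO (inferred from 6.4 ms)


def words_per_second(words: int, window_us: int) -> float:
    """Transfer rate needed to deliver `words` within `window_us` microseconds."""
    return words * 1_000_000 / window_us


# Without the FIFO: the whole frame must arrive within one sample period.
burst = words_per_second(WORDS_PER_FRAME, SAMPLE_PERIOD_US)

# With the FIFO: queued samples keep the D-to-A converter busy meanwhile.
window_us = FIFO_DEPTH * SAMPLE_PERIOD_US
buffered = words_per_second(WORDS_PER_FRAME, window_us)

print(window_us / 1000)    # 6.4
print(burst)               # 150000.0
print(buffered)            # 2343.75
print(burst / buffered)    # 64.0
```

The relaxation factor equals the FIFO depth, which is why a deeper queue directly eases the demands on the host.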
Other applications may require some form of input data compression. The same physical device may be used in such cases; the only difference would be in the DSP program.

So far, we have implemented two versions of the synthesis program for the synthesizer. Both of them employ the lattice form of a linear system. They differ only in the input data representation: one of them requires 15-bit reflection coefficients, and the other accepts 8-bit log-area parameters. Clearly, the data rate in the second program is much lower than in the first, with only a slight decrease in speech quality. In this program the input data is converted into reflection coefficients by a table look-up procedure. The relationship between the reflection coefficients and log-area parameters is given by eq. (3) [equation illegible in the source], where $\{A_i\}$ is a set of log-area parameters. This relationship is implemented in such a way that for each value of $A_i$ there is an entry in a look-up table. Since the log-area parameters are specified by 8-bit numbers, the table contains 256 entries. Once the conversion is made, both programs function in the same way.

There are three major tasks that the program performs repetitively during the synthesis. The first task is obtaining the data for the synthesis of every new pitch period. The input is handled jointly by the program and the input interface. When the program requests data input, the interface obtains it from the host computer and transfers it into the DSP. The protocol used by the interface is described in the next section. This procedure is repeated for each data request by the DSP, until all parameters describing the next pitch period have been transferred from the host computer.

Because of possible data transfer errors, a mechanism is provided for error recovery. Each pitch period requires a header word and 14 parameters. The program will proceed with the synthesis only if the header is received. This procedure is sufficient to guard against deleted or inserted data words. In the worst case, one pitch frame will be incorrectly synthesized.
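The input task of the log-area version of the program can be sketched as follows. Several details are assumptions, not taken from the paper: the header value `HEADER`, the linear coding of the 8-bit parameter over a range `A_MAX`, and the conversion k = (e^A - 1)/(e^A + 1) = tanh(A/2), which is one common log-area convention and may differ from eq. (3) in sign or scaling. The sketch also assumes that no parameter word equals the header, and it restarts a frame whenever a header arrives, which is one possible reading of the recovery rule.

```python
import math
from typing import Iterable, Iterator, List, Optional, Tuple

HEADER = 0xFFFF          # hypothetical sync word; the paper does not give its value
PARAMS_PER_FRAME = 14    # pitch, amplitude, and 12 coefficient codes
A_MAX = 6.0              # hypothetical range of the log-area parameter


def build_table() -> List[float]:
    """256-entry table mapping an 8-bit log-area code to a reflection coefficient."""
    table = []
    for code in range(256):
        area = (code - 128) / 128 * A_MAX    # assumed linear coding
        table.append(math.tanh(area / 2))    # assumed conversion, always in (-1, 1)
    return table


TABLE = build_table()


def frames(words: Iterable[int]) -> Iterator[Tuple[int, int, List[float]]]:
    """Yield (pitch, amplitude, reflection coefficients) for each complete frame."""
    buf: Optional[List[int]] = None          # None means "waiting for a header"
    for w in words:
        if w == HEADER:
            buf = []                         # (re)start a frame on every header
        elif buf is not None:
            buf.append(w)
            if len(buf) == PARAMS_PER_FRAME:
                pitch, amplitude, codes = buf[0], buf[1], buf[2:]
                yield pitch, amplitude, [TABLE[c & 0xFF] for c in codes]
                buf = None                   # ignore stray words until next header


good = [HEADER, 80, 1000] + [128] * 12
damaged = [HEADER, 80, 1000] + [128] * 11    # one data word deleted in transit
print(len(list(frames(damaged + good))))     # 1
```

In this reading, a frame that loses a word is dropped and the next frame is decoded normally, so the damage stays confined to a single pitch period.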
Without the procedure, any inserted or deleted data word would create a permanent synchronization offset with serious perceptual consequences.

The second task of the synthesis program is to compute the speech samples by utilizing the input information and eq. (2). The program makes a decision to synthesize voiced or unvoiced sounds on the basis of the pitch value. If the pitch period is encoded as a zero, the program simulates a transfer function driven by white noise for 100 samples (10 ms). The noise is computed in the program as a pseudo-random sequence of length [illegible]. Otherwise, the input to the transfer function is a single pulse at the beginning of the pitch period, and the number of produced samples is equal to the specified pitch value. The amplitude information is used for scaling the noise value in the case of unvoiced sounds, or for scaling the impulse amplitude for voiced sounds.

As soon as a sample is computed, it is output to the FIFO memory, and the program continues with the computation of the next sample. This final task of the program is conditioned upon the state of the FIFO buffer; if the buffer is full, the processor waits until a word is removed.

IV. INTERFACE DETAILS

The control signals, which facilitate data transfers between the synthesizer and a host processor, consist of a data request signal generated by the device, and a data ready line activated by the host. These two control signals are sufficient to define a communication protocol with the synthesizer, so that the device can easily be integrated into a larger system.

The sequence of events that occurs during the synthesis procedure is shown in Fig. 4. When the DSP requests new data, the interface logic raises the data request signal. The host processor monitors this request, and places a new word on the data lines. When the data is stable, the host processor signals data ready with an edge. The word is latched in the input shift register 100 ns later.
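The excitation rule of Section III (a zero pitch value selects 100 samples of scaled noise, any other value selects a single scaled pulse followed by zeros) can be sketched as below. The function name is illustrative, and Python's `random` module stands in for the paper's pseudo-random sequence, whose generator is not described in enough detail to reproduce.

```python
import random
from typing import List

UNVOICED_SAMPLES = 100   # 10 ms at the 10-kHz output rate, as stated in the text


def excitation(pitch: int, amplitude: float, rng: random.Random) -> List[float]:
    """Return the lattice input f_N(n) for one frame."""
    if pitch < 0:
        raise ValueError("pitch must be zero (unvoiced) or a positive sample count")
    if pitch == 0:
        # Unvoiced: scaled noise for a fixed 100 samples (stand-in noise source).
        return [amplitude * rng.uniform(-1.0, 1.0) for _ in range(UNVOICED_SAMPLES)]
    # Voiced: one scaled pulse at the start, then zeros, `pitch` samples in total.
    return [amplitude] + [0.0] * (pitch - 1)


rng = random.Random(0)
print(excitation(4, 2.0, rng))          # [2.0, 0.0, 0.0, 0.0]
print(len(excitation(0, 2.0, rng)))     # 100
```

Each returned list would then be fed, sample by sample, through the lattice filter with the frame's reflection coefficients held constant.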
Once the input word is latched, a clear-to-receive signal is sent to the DSP. The DSP then generates the shift-clock pulses necessary to transfer the word into its input. When the shifting is completed (at a 400-ns/bit rate), the data request signal goes low until the DSP makes another request for a new word.

The output of the DSP is enabled by means of a clear-to-send signal (CTS). This signal is granted to the DSP by the FIFO memory. The only times when this signal is not granted are when the FIFO queue is full, and for a short period after a word is placed in the queue. The latter is on the order of 4 microseconds, and is a result of internal data propagation in the FIFO. The output protocol is shown in Fig. 5. If the CTS is granted, the transfer of the data from the DSP to the FIFO memory is done in two phases. In the first phase, the DSP outputs a bit-serial data stream into the output shift register, and in the second phase, the information is transferred from the shift register into the FIFO. This transfer is done upon receipt of a positive transition of the signal Output Buffer Empty, generated by the DSP at the end of output shifting. If the FIFO memory becomes full, the CTS is not granted until a sample has been taken from the head of the queue and placed into the D-to-A converter.

There are two other variations of the basic synthesizer circuit that have also been tested. The first one is a slightly enhanced version of the synthesizer which includes an additional first-in-first-out memory buffer at the input of the system, shown in Fig. 6.
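The clear-to-send gating of the output FIFO described above can be modeled in a few lines. This is a behavioral sketch only: the class and method names are illustrative, the default depth of 64 is inferred from the 6.4-millisecond figure, and the roughly 4-microsecond propagation delay after each write is not modeled.

```python
from collections import deque
from typing import Deque, Optional


class OutputFifo:
    """Queue between the DSP (producer) and the D-to-A converter (consumer)."""

    def __init__(self, depth: int = 64) -> None:
        self.depth = depth
        self.queue: Deque[int] = deque()

    def cts(self) -> bool:
        """Clear-to-send is granted only while the queue has room."""
        return len(self.queue) < self.depth

    def push(self, sample: int) -> bool:
        """DSP side: store a sample if CTS is granted, otherwise refuse it."""
        if not self.cts():
            return False      # the DSP must wait and retry
        self.queue.append(sample)
        return True

    def pop(self) -> Optional[int]:
        """D-to-A side: take the oldest sample, called once per sample period."""
        return self.queue.popleft() if self.queue else None


fifo = OutputFifo(depth=4)                   # small depth to keep the demo short
print([fifo.push(n) for n in range(6)])      # [True, True, True, True, False, False]
print(fifo.pop())                            # 0
print(fifo.cts())                            # True
```

The refused writes correspond to the periods in which the DSP is suspended; removing one sample on the converter side is enough to grant CTS again.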
The motivation for this input queuing mechanism is again based on the fact that the input process, which feeds the synthesizer with the system coefficients, is asynchronous with respect to the pitch frames. This configuration is especially useful when the host processor has to compute the reflection coefficients needed for synthesis in real time, while the synthesizer is processing previously obtained parameters. Typically, the time to compute the coefficients has some variance, and this variance is compensated for by the "elasticity" of the input buffer.

The second variation of the basic circuit contains no FIFO buffers. It demonstrates a minimal synthesizer configuration, containing only 16 integrated circuits in addition to the DSP chip. It performs well when the host processor is capable of providing the input data at the required rate.

The synthesizer can easily be connected to standard microcomputers. Several types of connections have been tried successfully. One of them is shown in Fig. 7, where the synthesizer is contained on a single board interfaced directly to a standard microprocessor. The input interface appears as a single memory location in the address space of the processor, and the data is transferred to it by a single "move" instruction. The input FIFO buffer may or may not be implemented. If it is, then the synthesizer interrupts the processor only when the buffer is empty. Without the buffer, the processor is interrupted for each data transfer.

Another type of interface, shown in Fig. 8, contains a synthesizer with direct memory-access (DMA) circuitry. The input and output buffers are not needed in this configuration, since the synthesizer obtains the data by accessing the processor memory.
The processor sets up a DMA transfer by providing an address and a data count to the synthesizer board. It is interrupted only when the specified number of words have been processed by the synthesizer.

The third way of connecting the synthesizer to a microprocessor is by means of a standard parallel interface board, a standard accessory. In this case the synthesizer is not a part of the microcomputer system; rather, it is an outside device.

All of these examples show that the synthesizer may easily be included as a part of intelligent terminals, which are usually built around standard microprocessors. A combination of data and voice services can be provided with such configurations. Applications that require a low bit rate to the synthesizer would use a scheme with quantized parameters and a DSP program that provides for interpolation of log-area parameters over longer time intervals.

V. SOFTWARE DRIVERS FOR THE SYNTHESIZER

In order to demonstrate the flexibility of the synthesizer, two types of host computers were used for implementation of the synthesizer drivers. One is a stand-alone microcomputer, based on an [illegible; possibly LSI-11] microprocessor. This configuration features an enhanced mini-UNIX operating system, and is intended for real-time speech processing experiments. A parallel interface port is used for driving the synthesizer. The hand-shaking signals required by the parallel port are in agreement with the ones provided by the synthesizer. The data request line interrupts the microprocessor, which then transmits a word to the synthesizer. At the same time a data ready pulse is issued, which is used by the synthesizer to latch the data.

A program that drives the synthesizer with the data from a file containing the speech parameters is used as

[command synopsis illegible in the source]

The program reads the file fname into a buffer, and transfers it to the synthesizer on the basis of the protocol described earlier. The optional arguments loop and frame are used for testing purposes. If [...]
