
BSTJ 60: 3. March 1981: On the Use of Dynamic Time Warping for Word Spotting and Connected Word Recognition. (Myers, C.S.; Rabiner, L.R.; Rosenberg, A.E) PDF



THE BELL SYSTEM TECHNICAL JOURNAL

On the Use of Dynamic Time Warping for Word Spotting and Connected Word Recognition*

By C. S. MYERS, L. R. RABINER, and A. E. ROSENBERG

(Manuscript received September 9, 1980)

Several variations on algorithms for dynamic time warping for speech processing applications have been proposed. This paper compares two of these algorithms, the fixed-range method and the local minimum method. We show that, based on results from some simple word spotting and connected word recognition experiments, the local minimum method performs considerably better than the fixed-range method. We describe explanations of this behavior and techniques for optimizing the parameters of the local minimum algorithm for both word spotting and connected word recognition.

I. INTRODUCTION

Time registration of a test and a reference pattern is one of the fundamental problems in the area of automatic speech recognition. This problem is important because the time scales of a test and a reference pattern are not perfectly aligned. In some cases the time scales can be registered by a simple linear compression or expansion; however, in most cases, a nonlinear time warping is required to compensate for local compression or expansion of the time scale. For such cases, the class of algorithms known as dynamic time warping (DTW) methods has been developed. Work by Sakoe and Chiba, Itakura, and White and Neely has shown that DTW algorithms are an effective method of time registering patterns in isolated word recognition systems. Bridle and Christiansen and Rushforth have studied the applicability of DTW algorithms to word spotting, and recently, Sakoe, Rabiner and Schmidt, and Myers and Rabiner have successfully applied dynamic time-warping techniques to connected digit recognition.
A great deal of work has been done in the area of performance evaluation of the various DTW algorithms as applied to discrete word recognition. However, the effects of the DTW parameters on the overall performance of the algorithm for either word spotting or connected word recognition are not as well understood. The purpose of this paper is to discuss several proposed methods of applying DTW algorithms to word spotting and connected word recognition, and to study some of the factors which determine the performance of these algorithms.

The organization of this paper is as follows. In Section II we review the basic dynamic programming method of time alignment and show how it may be used efficiently in either a word spotting or a connected word recognition problem. We describe, in detail, two different DTW algorithms for which we have performed extensive evaluations. Section III contains a description of the experiments which we performed to evaluate the performance of the different DTW algorithms and the effects of the parameters associated with them. In Section IV we summarize the results of these experiments and draw some general conclusions on the use of DTW algorithms for word spotting and connected word recognition.

II. DYNAMIC PROGRAMMING FOR TIME ALIGNMENT

In this section we first review the basic principles of DTW algorithms as applied to discrete word recognition, and then point out some of the inherent difficulties involved in applying these algorithms to word spotting and connected speech recognition. We then show how it is possible to modify the basic DTW idea so that it may be used for both connected word recognition and word spotting applications.

2.1 Dynamic time warping for discrete word recognition

The problem of time alignment for discrete word recognition is illustrated in Fig.
1. A reference pattern, $R(n)$, $n = 1, 2, \ldots, N$, consisting of a time sequence (i.e., frames) of a multidimensional feature vector is to be time registered with a test pattern, $T(m)$, $m = 1, 2, \ldots, M$, which is also represented as a time sequence of a multidimensional feature vector. In Fig. 1, for the sake of clarity, both $R(n)$ and $T(m)$ are shown as one-dimensional functions. We shall assume that both the reference and the test pattern are measured from the acoustic waveform of a single word, spoken in isolation, and that both the beginning and ending points of the reference and the test pattern have been accurately determined. The problem of time alignment is to find the path, here parameterized by the function pair $(i(k), j(k))$, which minimizes a given distance metric. A typical distance metric is of the form

$$D(i(k), j(k)) = \frac{\sum_{k=1}^{K} d(i(k), j(k))\, W(k)}{N(W)}, \tag{1}$$

where $K$ is the length of the path, $d(i(k), j(k))$ is the local distance, or dissimilarity, between frame $i(k)$ of the reference pattern and frame $j(k)$ of the test pattern, $W(k)$ is a weighting function applied to the path, and $N(W)$ is a normalization factor which is based on the particular weighting function that is chosen.

In addition to minimizing the global distance, the time alignment path is chosen to have certain desirable properties. One important property is the proper time registration of the beginning and ending points of the test and reference patterns, i.e.,

$$i(1) = 1, \quad j(1) = 1, \tag{2a}$$
$$i(K) = N, \quad j(K) = M. \tag{2b}$$

Also, the time alignment path is required to obey certain shape and slope constraints. For example, it would not be reasonable to allow a path for which a 10 to 1 expansion or compression of the time axis occurs.
Another consideration is the preservation of time order, i.e., the functions $i(k)$ and $j(k)$ must both be monotonically increasing.

These local continuity constraints are generally described by specifying the full path in terms of simple local paths which may be pieced together to form longer paths. For example, to reach the grid point $(n, m)$ it may be reasonable to have come from any of the grid points $(n-1, m-1)$, $(n-1, m-2)$, or $(n-2, m-1)$, as shown in Fig. 2, part a. We refer to these constraints as Type I local constraints. Some other proposed sets of local constraints are shown in parts b, c, and d of Fig. 2. The crossed out arc in part d signifies the restriction that the path may not move horizontally for two consecutive segments. All these local constraints limit the overall slope of the time alignment contour to be between 1/2 and 2, in accordance with the results found by Sakoe and Chiba.

To solve for the optimal time-alignment path, both the weighting function, $W(k)$, and the normalization factor, $N(W)$, must be specified in addition to the local constraints. Typically $W(k)$ is chosen to be either of two functions, i.e.,

$$W(k) = i(k) - i(k-1) \quad \text{(Type a)}, \tag{3a}$$
$$W(k) = i(k) - i(k-1) + j(k) - j(k-1) \quad \text{(Type b)}. \tag{3b}$$

These two weighting functions are referred to as the asymmetric weighting function, Type a, and the symmetric weighting function, Type b, and were originally proposed by Sakoe and Chiba. Weighting function Type a weights all frames of the reference pattern equally, while weighting function Type b weights all frames of both the reference and the test equally.
For initialization purposes $i(0)$ and $j(0)$ are defined to be 0 and thus $W(1) = 1$ for weighting function Type a and $W(1) = 2$ for weighting function Type b.

The choice of $N(W)$ is typically made such that $D(i(k), j(k))$ is the average local distance along the path defined by $i(k)$ and $j(k)$, and is independent of both the lengths of the reference and test patterns, and of the length of the time alignment path itself. The natural choice for $N(W)$ is thus

$$N(W) = \sum_{k=1}^{K} W(k). \tag{4}$$

For weighting functions Types a and b the normalization is given by

$$N(W_a) = \sum_{k=1}^{K} \bigl(i(k) - i(k-1)\bigr) = i(K) - i(0) = N, \tag{5a}$$
$$N(W_b) = \sum_{k=1}^{K} \bigl(i(k) - i(k-1) + j(k) - j(k-1)\bigr) = i(K) - i(0) + j(K) - j(0) = N + M. \tag{5b}$$

Given a weighting function and a set of local constraints, it is possible to define the optimal time alignment path as that path which minimizes the total distance $D(i(k), j(k))$. More formally, if we denote the distance associated with the optimal path as $\hat{D}$, then

$$\hat{D} = \min_{i(k),\, j(k)} \bigl[ D(i(k), j(k)) \bigr]. \tag{6}$$

The solution to this problem may be found by dynamic programming by use of the following optimality principle:

Local Optimality: If the best path from the grid point $(1, 1)$ to the grid point $(n, m)$ goes through grid point $(n', m')$, then the best path from the grid point $(1, 1)$ to the grid point $(n, m)$ includes, as a portion of it, the best path from the grid point $(1, 1)$ to the grid point $(n', m')$.

Thus if we define $D_A(n, m)$ as the minimum total distance along any path from the grid point $(1, 1)$ to the grid point $(n, m)$, then $D_A(n, m)$ can be computed recursively according to the optimality principle,

$$D_A(n, m) = \min_{n',\, m'} \bigl[ D_A(n', m') + d((n', m'), (n, m)) \bigr], \tag{7}$$

where $d((n', m'), (n, m))$ is the weighted distance from the grid point $(n', m')$ to the grid point $(n, m)$.
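
The recursion in eq. (7) can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the authors' implementation: it assumes one-dimensional features with absolute difference as the local distance, the Type I predecessor set of Fig. 2a, and the asymmetric weighting of eq. (3a), so that the step weight is the advance along the reference axis and the normalization is $N$. The function name `dtw_type1_asymmetric` and the `local_dist` parameter are hypothetical.

```python
import math


def dtw_type1_asymmetric(ref, test, local_dist=lambda r, t: abs(r - t)):
    """Normalized DTW distance with fixed endpoints (1, 1) and (N, M).

    Assumes non-empty sequences. Returns math.inf when no path that
    obeys the Type I local constraints reaches (N, M).
    """
    N, M = len(ref), len(test)
    # D[n][m] holds D_A(n, m); indices are 1-based to match the text.
    D = [[math.inf] * (M + 1) for _ in range(N + 1)]
    # Initialization: D_A(1, 1) = d(1, 1) * W(1), with W(1) = 1 for Type a.
    D[1][1] = local_dist(ref[0], test[0])
    # Type I predecessors of (n, m), written as offsets (dn, dm):
    # (n-1, m-1), (n-1, m-2), (n-2, m-1).
    steps = [(1, 1), (1, 2), (2, 1)]
    for n in range(1, N + 1):
        for m in range(1, M + 1):
            if n == 1 and m == 1:
                continue
            d = local_dist(ref[n - 1], test[m - 1])
            for dn, dm in steps:
                pn, pm = n - dn, m - dm
                if pn >= 1 and pm >= 1:
                    # Asymmetric weight: W = i(k) - i(k-1) = dn.
                    D[n][m] = min(D[n][m], D[pn][pm] + dn * d)
    # Type a normalization, eq. (5a): N(W) = N.
    return D[N][M] / N
```

For instance, `dtw_type1_asymmetric([1, 2, 3, 4, 5], [1, 3, 5])` finds a zero-cost path that skips every other reference frame, while a test pattern more than about twice as long as the reference cannot be reached at all under these slope limits and yields `math.inf`.
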
For example, for Type I local constraints and an asymmetric weighting function, $(n', m')$ may take any of the following values,

$$(n', m') \in \{(n-1, m-1),\ (n-1, m-2),\ (n-2, m-1)\}, \tag{8}$$

and $d((n', m'), (n, m))$ is given by

$$d((n-1, m-1), (n, m)) = d(n, m),$$
$$d((n-1, m-2), (n, m)) = d(n, m),$$
$$d((n-2, m-1), (n, m)) = 2\,d(n, m). \tag{9}$$

Thus the full DTW recursion for Type I local constraints and weighting function Type a is given by

$$D_A(n, m) = \min\bigl[ D_A(n-1, m-1) + d(n, m),\ D_A(n-1, m-2) + d(n, m),\ D_A(n-2, m-1) + 2\,d(n, m) \bigr]. \tag{10}$$

Using the local optimality principle, a complete DTW algorithm is given by the algorithm:

Step 1. Initialize $D_A(1, 1) = d(1, 1)\, W(1)$.
Step 2. Compute $D_A(n, m)$ recursively for $1 \le n \le N$, $1 \le m \le M$.
Step 3. $\hat{D} = D_A(N, M) / N(W)$.

This completes our review of the basic principles involved in applying dynamic programming to discrete word recognition. We will now describe the difficulties which arise when DTW algorithms are applied to connected word recognition problems and then we will show how the DTW principle can be modified for word spotting and connected word recognition applications.

2.2 Difficulties in connected word recognition

We shall assume that we are given a test pattern consisting of a sequence of connected words, spoken in a normal manner, for which the global beginning and ending points have been accurately located and for which no further segmentation has been attempted. Given such a framework, the word spotting problem is to determine all subsections of the test pattern, if any, which match with a specified reference pattern, called the keyword. Thus, for word spotting, a multiplicity of regions of the test pattern must be compared with the keyword pattern.

The connected word recognition problem, on the other hand, is to piece together reference patterns (obtained, in all our work, from isolated occurrences of words) to match the test pattern. The general approach to this problem will be the one proposed by Levinson and Rosenberg, namely

(i) Find the reference pattern that best fits a given section of the test pattern.
(ii) Use the position within the test pattern at which the best matching word ends to postulate the beginning of the following word.
(iii) Continue to concatenate reference patterns in this manner until the test pattern is exhausted.

Dynamic time-warping algorithms, as they have been applied to discrete word recognition applications, are not directly applicable to either the word spotting or the connected word recognition problem. There are two reasons why this is so. Figure 3 illustrates some of the problems which are encountered. In this figure we show the time pattern for log intensity of two speech utterances, "3," "8" in part a and "38" in part b. The utterance in part a was spoken with a discernible pause between the "3" and the "8," while the utterance in part b was spoken with no discernible pause between the "3" and the "8." Dynamic time-warping algorithms, as they have been applied to discrete word recognition, require a reliable set of word boundaries. However, as seen in Fig. 3b, a reliable segmentation for the utterance "38" is difficult, if not impossible, to obtain.

Another difficulty in using DTW algorithms, based on isolated word reference templates, for connected speech applications is the problem of coarticulation between words. For example, the final /i/ of the word "3" and the initial /e/ of the word "8" coarticulate strongly with each other. Thus, another fundamental assumption that has been relied on, namely that the characteristics of the isolated reference words which we are trying to match to our test utterance can be truly found in the test pattern, is not valid. In the next section we will describe the basic techniques that will be used to overcome these difficulties.

2.3 Basic approaches to connected speech recognition problems

In our approach to connected word recognition and word spotting we will make two changes from the structure of the isolated word DTW algorithm.
One change is to no longer attempt to find the entire isolated reference pattern in the test pattern. We will still use isolated words as our reference patterns but will only expect a good match in the middle of the word, and not necessarily near the ends. Thus, we will not require that we be able to accurately match the beginning and ending points of the reference pattern to points within the test pattern. As a result, we would like to consider the possibility of overlapping reference patterns to recognize connected speech. In this manner we hope to account for both errors in the endpoint locations and for some of the gross features of coarticulation.

Another fundamental modification to the basic DTW algorithm is the use of beginning and ending regions rather than beginning and ending frames. In this manner we hope to avoid some of the problems inherent in requiring an accurate segmentation of the test utterance. Figure 4 defines, within a test pattern, a beginning region of size $B$ (frames), with potential starting frames between $b_1$ and $b_2$ ($B = b_2 - b_1 + 1$), and an ending region of size $E$, with potential ending frames between $e_1$ and $e_2$ ($E = e_2 - e_1 + 1$). One possible DTW constraint would be that the best time-alignment contour may begin anywhere within the beginning region and end anywhere within the ending region. Three such potential paths are shown in Fig. 4. Such a framework could be used for word spotting, in which the beginning and ending regions correspond to the entire test pattern, or for connected word recognition, in which the ending region for one word is used to hypothesize the beginning region for the next word.

The use of beginning and ending regions modifies the basic DTW algorithm by changing the constraints which are imposed on the ends of the time alignment contour, i.e.,

$$i(1) = 1, \quad j(1) = b, \quad b_1 \le b \le b_2, \tag{11a}$$
$$i(K) = N, \quad j(K) = e, \quad e_1 \le e \le e_2. \tag{11b}$$

Thus, to find the optimal time-alignment contour, every possible beginning and ending point pair must be tried, that is,

$$\hat{D} = \min_{\substack{b_1 \le b \le b_2 \\ e_1 \le e \le e_2}} \Bigl[ \min_{i(k),\, j(k)} D(i(k), j(k)) \Bigr]. \tag{12}$$
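
Read literally, eq. (12) is an exhaustive search: one fixed-endpoint time warp per beginning and ending pair. The following Python sketch shows that literal reading. It is an illustration only and rests on assumptions not made at this point in the text: one-dimensional features, absolute difference as the local distance, Type I local constraints, and the asymmetric weighting with normalization $N$. The names `dtw_fixed` and `brute_force_region_search` are hypothetical, and frame indices are 1-based as in the text.

```python
import math


def dtw_fixed(ref, test):
    """Fixed-endpoint DTW (Type I constraints, Type a weights), normalized by N."""
    N, M = len(ref), len(test)
    D = [[math.inf] * (M + 1) for _ in range(N + 1)]
    D[1][1] = abs(ref[0] - test[0])
    for n in range(1, N + 1):
        for m in range(1, M + 1):
            if n == 1 and m == 1:
                continue
            d = abs(ref[n - 1] - test[m - 1])
            for dn, dm in ((1, 1), (1, 2), (2, 1)):
                pn, pm = n - dn, m - dm
                if pn >= 1 and pm >= 1:
                    D[n][m] = min(D[n][m], D[pn][pm] + dn * d)
    return D[N][M] / N


def brute_force_region_search(ref, test, b1, b2, e1, e2):
    """Try every (b, e) pair; return (best distance, best b, best e)."""
    best = (math.inf, None, None)
    for b in range(b1, b2 + 1):
        # An ending frame before the beginning frame is not a valid pair.
        for e in range(max(b, e1), e2 + 1):
            # Frames b..e of the test pattern (1-based, inclusive).
            dist = dtw_fixed(ref, test[b - 1:e])
            if dist < best[0]:
                best = (dist, b, e)
    return best
```

With `ref = [1, 2, 3]` embedded in `test = [9, 9, 1, 2, 3, 9, 9]` and regions `b1, b2 = 1, 4` and `e1, e2 = 4, 7`, the search recovers the embedded segment at frames 3 through 5. The nested loops make the cost of this approach visible: the number of warps grows with the product of the two region sizes.
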
The amount of computation required to solve eq. (12) for the optimal path can be excessive, i.e., theoretically we require $B \cdot E$ separate time warps in the most general case. However, the amount of computation required to solve eq. (12) may be reduced to a single time warp by judicious selection of the weighting function. If $W(k)$ is chosen to be the asymmetric weighting function, Type a ($W(k) = i(k) - i(k-1)$), and $N(W)$ is chosen appropriately ($N(W_a) = N$), then $\hat{D}$ may be computed efficiently by a modified DTW algorithm as follows:

Step 1. Set $D_A(1, b) = d(1, b)$ for $b_1 \le b \le b_2$.
Step 2. Compute $D_A(n, m)$ recursively for $1 \le n \le N$, $b_1 \le m \le e_2$.
Step 3. $\hat{D} = \min_{e_1 \le e \le e_2} D_A(N, e) / N$.

This algorithm works because Step 1 initializes all possible beginning points, Step 2 computes the best path to point $(n, m)$ from any of the potential beginning points initialized in Step 1, and Step 3 finds the best possible ending point along any path from any possible beginning point. The particular choice of the asymmetric weighting function is important because its normalization factor is unaffected by the choice of the beginning or ending point, i.e., its normalization factor is always $N$. A dependence on the length of the test pattern, as in the symmetric weighting function, Type b, would require a separate time warp for each set of beginning and ending points because the effective length of the test ($e - b + 1$) depends on the choice of the beginning and ending points.

An important factor, even with the savings of a single time warp, is the large amount of computation required for the DTW algorithm. Step 2 of the modified DTW algorithm is defined for $1 \le n \le N$, $b_1 \le m \le e_2$, and this region may be as large as $N \cdot M$. It is also not possible to significantly reduce this size by using restrictions on the slope of the warping contour when the ending region is left unspecified. This point is illustrated in Fig.
6, where the slope of the warping function is restricted to be between 1/2 and 2. We observe that even with this restriction, when no ending region is specified, the area for which $D_A(n, m)$ must be computed is approximately $\tfrac{3}{4}N^2 + BN$.

Two modifications to the DTW algorithm have been suggested to reduce this amount of computation. In particular, Sakoe and Chiba have proposed that a time-warping path not be allowed to deviate significantly from a straight line, i.e., for any $i(k)$, the value of $j(k)$ is restricted such that

$$|j(k) - i(k) - \bar{b}| \le R, \tag{13}$$

where $\bar{b}$ is the center of the beginning region [$\bar{b} = (b_1 + b_2)/2$] and $R$ is the maximum deviation which is allowed. $R$ must be chosen to at least cover the entire beginning region, i.e., $2R + 1 \ge B$. This algorithm,
