Bayesian Speech and Language Processing

With this comprehensive guide you will learn how to apply Bayesian machine learning techniques systematically to solve various problems in speech and language processing. A range of statistical models is detailed, from hidden Markov models to Gaussian mixture models, n-gram models, and latent topic models, along with applications including automatic speech recognition, speaker verification, and information retrieval. Approximate Bayesian inferences based on MAP, Evidence, Asymptotic, VB, and MCMC approximations are provided, as well as full derivations of calculations, useful notations, formulas, and rules. The authors address the difficulties of straightforward applications and provide detailed examples and case studies to demonstrate how you can successfully use practical Bayesian inference methods to improve the performance of information systems.

This is an invaluable resource for students, researchers, and industry practitioners working in machine learning, signal processing, and speech and language processing.

Shinji Watanabe received his Ph.D. from Waseda University in 2006. He has been a research scientist at NTT Communication Science Laboratories, a visiting scholar at Georgia Institute of Technology, and a senior principal member at Mitsubishi Electric Research Laboratories (MERL), as well as an associate editor of the IEEE Transactions on Audio, Speech, and Language Processing and an elected member of the IEEE Speech and Language Processing Technical Committee. He has published more than 100 papers in journals and conferences, and has received several awards, including the Best Paper Award from IEICE in 2003.

Jen-Tzung Chien is with the Department of Electrical and Computer Engineering and the Department of Computer Science at the National Chiao Tung University, Taiwan, where he is now the University Chair Professor. He received the Distinguished Research Award from the Ministry of Science and Technology, Taiwan, and the Best Paper Award of the 2011 IEEE Automatic Speech Recognition and Understanding Workshop. He currently serves as an elected member of the IEEE Machine Learning for Signal Processing Technical Committee.

"This book provides an overview of a wide range of fundamental theories of Bayesian learning, inference, and prediction for uncertainty modeling in speech and language processing. The uncertainty modeling is crucial in increasing the robustness of practical systems based on statistical modeling under real environments, such as automatic speech recognition systems under noise, and question answering systems based on a limited size of training data. This is the most advanced and comprehensive book for learning fundamental Bayesian approaches and practical techniques."
Sadaoki Furui, Tokyo Institute of Technology

Bayesian Speech and Language Processing
SHINJI WATANABE, Mitsubishi Electric Research Laboratories
JEN-TZUNG CHIEN, National Chiao Tung University
University Printing House, Cambridge CB2 8BS, United Kingdom

Cambridge University Press is part of the University of Cambridge. It furthers the University's mission by disseminating knowledge in the pursuit of education, learning and research at the highest international levels of excellence.

www.cambridge.org
Information on this title: www.cambridge.org/9781107055575

© Cambridge University Press 2015

This publication is in copyright. Subject to statutory exception and to the provisions of relevant collective licensing agreements, no reproduction of any part may take place without the written permission of Cambridge University Press.

First published 2015
Printed in the United Kingdom by Clays, St Ives plc

A catalog record for this publication is available from the British Library

Library of Congress Cataloging in Publication data
Watanabe, Shinji (Communications engineer), author.
Bayesian speech and language processing / Shinji Watanabe, Mitsubishi Electric Research Laboratories; Jen-Tzung Chien, National Chiao Tung University.
pages cm
ISBN 978-1-107-05557-5 (hardback)
1. Language and languages – Study and teaching – Statistical methods. 2. Bayesian statistical decision theory. I. Title.
P53.815.W38 2015
410.1′51–dc23 2014050265

ISBN 978-1-107-05557-5 Hardback

Cambridge University Press has no responsibility for the persistence or accuracy of URLs for external or third-party internet websites referred to in this publication, and does not guarantee that any content on such websites is, or will remain, accurate or appropriate.

Contents

Preface
Notation and abbreviations

Part I  General discussion

1  Introduction
1.1  Machine learning and speech and language processing
1.2  Bayesian approach
1.3  History of Bayesian speech and language processing
1.4  Applications
1.5  Organization of this book

2  Bayesian approach
2.1  Bayesian probabilities
2.1.1  Sum and product rules
2.1.2  Prior and posterior distributions
2.1.3  Exponential family distributions
2.1.4  Conjugate distributions
2.1.5  Conditional independence
2.2  Graphical model representation
2.2.1  Directed graph
2.2.2  Conditional independence in graphical model
2.2.3  Observation, latent variable, non-probabilistic variable
2.2.4  Generative process
2.2.5  Undirected graph
2.2.6  Inference on graphs
2.3  Difference between ML and Bayes
2.3.1  Use of prior knowledge
2.3.2  Model selection
2.3.3  Marginalization
2.4  Summary
3  Statistical models in speech and language processing
3.1  Bayes decision for speech recognition
3.2  Hidden Markov model
3.2.1  Lexical unit for HMM
3.2.2  Likelihood function of HMM
3.2.3  Continuous density HMM
3.2.4  Gaussian mixture model
3.2.5  Graphical models and generative process of CDHMM
3.3  Forward–backward and Viterbi algorithms
3.3.1  Forward–backward algorithm
3.3.2  Viterbi algorithm
3.4  Maximum likelihood estimation and EM algorithm
3.4.1  Jensen's inequality
3.4.2  Expectation step
3.4.3  Maximization step
3.5  Maximum likelihood linear regression for hidden Markov model
3.5.1  Linear regression for hidden Markov models
3.6  n-gram with smoothing techniques
3.6.1  Class-based model smoothing
3.6.2  Jelinek–Mercer smoothing
3.6.3  Witten–Bell smoothing
3.6.4  Absolute discounting
3.6.5  Katz smoothing
3.6.6  Kneser–Ney smoothing
3.7  Latent semantic information
3.7.1  Latent semantic analysis
3.7.2  LSA language model
3.7.3  Probabilistic latent semantic analysis
3.7.4  PLSA language model
3.8  Revisit of automatic speech recognition with Bayesian manner
3.8.1  Training and test (unseen) data for ASR
3.8.2  Bayesian manner
3.8.3  Learning generative models
3.8.4  Sum rule for model
3.8.5  Sum rule for model parameters and latent variables
3.8.6  Factorization by product rule and conditional independence
3.8.7  Posterior distributions
3.8.8  Difficulties in speech and language applications

Part II  Approximate inference

4  Maximum a-posteriori approximation
4.1  MAP criterion for model parameters
4.2  MAP extension of EM algorithm
4.2.1  Auxiliary function
4.2.2  A recipe
4.3  Continuous density hidden Markov model
4.3.1  Likelihood function
4.3.2  Conjugate priors (full covariance case)
4.3.3  Conjugate priors (diagonal covariance case)
4.3.4  Expectation step
4.3.5  Maximization step
4.3.6  Sufficient statistics
4.3.7  Meaning of the MAP solution
4.4  Speaker adaptation
4.4.1  Speaker adaptation by a transformation of CDHMM
4.4.2  MAP-based speaker adaptation
4.5  Regularization in discriminative parameter estimation
4.5.1  Extended Baum–Welch algorithm
4.5.2  MAP interpretation of i-smoothing
4.6  Speaker recognition/verification
4.6.1  Universal background model
4.6.2  Gaussian supervector
4.7  n-gram adaptation
4.7.1  MAP estimation of n-gram parameters
4.7.2  Adaptation method
4.8  Adaptive topic model
4.8.1  MAP estimation for corrective training
4.8.2  Quasi-Bayes estimation for incremental learning
4.8.3  System performance
4.9  Summary

5  Evidence approximation
5.1  Evidence framework
5.1.1  Bayesian model comparison
5.1.2  Type-2 maximum likelihood estimation
5.1.3  Regularization in regression model
5.1.4  Evidence framework for HMM and SVM
5.2  Bayesian sensing HMMs
5.2.1  Basis representation
5.2.2  Model construction
5.2.3  Automatic relevance determination
5.2.4  Model inference
5.2.5  Evidence function or marginal likelihood
5.2.6  Maximum a-posteriori sensing weights
5.2.7  Optimal parameters and hyperparameters
5.2.8  Discriminative training
5.2.9  System performance
5.3  Hierarchical Dirichlet language model
5.3.1  n-gram smoothing revisited
5.3.2  Dirichlet prior and posterior
5.3.3  Evidence function
5.3.4  Bayesian smoothed language model
5.3.5  Optimal hyperparameters

6  Asymptotic approximation
6.1  Laplace approximation
6.2  Bayesian information criterion
6.3  Bayesian predictive classification
6.3.1  Robust decision rule
6.3.2  Laplace approximation for BPC decision
6.3.3  BPC decision considering uncertainty of HMM means
6.4  Neural network acoustic modeling
6.4.1  Neural network modeling and learning
6.4.2  Bayesian neural networks and hidden Markov models
6.4.3  Laplace approximation for Bayesian neural networks
6.5  Decision tree clustering
6.5.1  Decision tree clustering using ML criterion
6.5.2  Decision tree clustering using BIC
6.6  Speaker clustering/segmentation
6.6.1  Speaker segmentation
6.6.2  Speaker clustering
6.7  Summary

7  Variational Bayes
7.1  Variational inference in general
7.1.1  Joint posterior distribution
7.1.2  Factorized posterior distribution
7.1.3  Variational method
7.2  Variational inference for classification problems
7.2.1  VB posterior distributions for model parameters
7.2.2  VB posterior distributions for latent variables
7.2.3  VB–EM algorithm
7.2.4  VB posterior distribution for model structure
7.3  Continuous density hidden Markov model
7.3.1  Generative model
7.3.2  Prior distribution
7.3.3  VB Baum–Welch algorithm
7.3.4  Variational lower bound
7.3.5  VB posterior for Bayesian predictive classification
7.3.6  Decision tree clustering
7.3.7  Determination of HMM topology
7.4  Structural Bayesian linear regression for hidden Markov model
7.4.1  Variational Bayesian linear regression
7.4.2  Generative model
7.4.3  Variational lower bound
7.4.4  Optimization of hyperparameters and model structure
7.4.5  Hyperparameter optimization
7.5  Variational Bayesian speaker verification
7.5.1  Generative model
7.5.2  Prior distributions
7.5.3  Variational posteriors
7.5.4  Variational lower bound
7.6  Latent Dirichlet allocation
7.6.1  Model construction
7.6.2  VB inference: lower bound
7.6.3  VB inference: variational parameters
7.6.4  VB inference: model parameters
7.7  Latent topic language model
7.7.1  LDA language model
7.7.2  Dirichlet class language model
7.7.3  Model construction
7.7.4  VB inference: lower bound
7.7.5  VB inference: parameter estimation
7.7.6  Cache Dirichlet class language model
7.7.7  System performance
7.8  Summary

8  Markov chain Monte Carlo
8.1  Sampling methods
8.1.1  Importance sampling
8.1.2  Markov chain
8.1.3  The Metropolis–Hastings algorithm
8.1.4  Gibbs sampling
8.1.5  Slice sampling
8.2  Bayesian nonparametrics
8.2.1  Modeling via exchangeability
8.2.2  Dirichlet process
8.2.3  DP: Stick-breaking construction
8.2.4  DP: Chinese restaurant process
8.2.5  Dirichlet process mixture model
8.2.6  Hierarchical Dirichlet process
8.2.7  HDP: Stick-breaking construction
8.2.8  HDP: Chinese restaurant franchise
8.2.9  MCMC inference by Chinese restaurant franchise
8.2.10  MCMC inference by direct assignment
8.2.11  Relation of HDP to other methods
8.3  Gibbs sampling-based speaker clustering
8.3.1  Generative model
8.3.2  GMM marginal likelihood for complete data
8.3.3  GMM Gibbs sampler
8.3.4  Generative process and graphical model of multi-scale GMM
8.3.5  Marginal likelihood for the complete data
8.3.6  Gibbs sampler
8.4  Nonparametric Bayesian HMMs to acoustic unit discovery
8.4.1  Generative model and generative process
8.4.2  Inference
8.5  Hierarchical Pitman–Yor language model
8.5.1  Pitman–Yor process
8.5.2  Language model smoothing revisited
8.5.3  Hierarchical Pitman–Yor language model
8.5.4  MCMC inference for HPYLM
8.6  Summary

Appendix A  Basic formulas
Appendix B  Vector and matrix formulas
Appendix C  Probabilistic distribution functions

References
Index
