
Techniques for Noise Robustness in Automatic Speech Recognition PDF

500 Pages·2012·8.679 MB·English

Preview Techniques for Noise Robustness in Automatic Speech Recognition

P1:TIX/XYZ P2:ABC JWST201-fm JWST201-Virtanen August 31, 2012 9:5 Printer Name: Yet to Come Trim: 244mm×168mm

TECHNIQUES FOR NOISE ROBUSTNESS IN AUTOMATIC SPEECH RECOGNITION

Editors
Tuomas Virtanen, Tampere University of Technology, Finland
Rita Singh, Carnegie Mellon University, USA
Bhiksha Raj, Carnegie Mellon University, USA

A John Wiley & Sons, Ltd., Publication

This edition first published 2013
© 2013 John Wiley & Sons, Ltd

Registered office
John Wiley & Sons Ltd, The Atrium, Southern Gate, Chichester, West Sussex, PO19 8SQ, United Kingdom

For details of our global editorial offices, for customer services and for information about how to apply for permission to reuse the copyright material in this book please see our website at www.wiley.com.

The right of the author to be identified as the author of this work has been asserted in accordance with the Copyright, Designs and Patents Act 1988.

All rights reserved. No part of this publication may be reproduced, stored in a retrieval system, or transmitted, in any form or by any means, electronic, mechanical, photocopying, recording or otherwise, except as permitted by the UK Copyright, Designs and Patents Act 1988, without the prior permission of the publisher.

Wiley also publishes its books in a variety of electronic formats. Some content that appears in print may not be available in electronic books.
Designations used by companies to distinguish their products are often claimed as trademarks. All brand names and product names used in this book are trade names, service marks, trademarks or registered trademarks of their respective owners. The publisher is not associated with any product or vendor mentioned in this book. This publication is designed to provide accurate and authoritative information in regard to the subject matter covered. It is sold on the understanding that the publisher is not engaged in rendering professional services. If professional advice or other expert assistance is required, the services of a competent professional should be sought.

Library of Congress Cataloging-in-Publication Data
Virtanen, Tuomas.
Techniques for noise robustness in automatic speech recognition / Tuomas Virtanen, Rita Singh, Bhiksha Raj.
p. cm.
Includes bibliographical references and index.
ISBN 978-1-119-97088-0 (cloth)
1. Automatic speech recognition. I. Singh, Rita. II. Raj, Bhiksha. III. Title.
TK7882.S65V57 2012
006.454–dc23
2012015742

A catalogue record for this book is available from the British Library.

ISBN: 978-0-470-97409-4

Typeset in 10/12pt Times by Aptara Inc., New Delhi, India

Contents

List of Contributors
Acknowledgments

1 Introduction
Tuomas Virtanen, Rita Singh, Bhiksha Raj
  1.1 Scope of the Book
  1.2 Outline
  1.3 Notation

Part One FOUNDATIONS

2 The Basics of Automatic Speech Recognition
Rita Singh, Bhiksha Raj, Tuomas Virtanen
  2.1 Introduction
  2.2 Speech Recognition Viewed as Bayes Classification
  2.3 Hidden Markov Models
    2.3.1 Computing Probabilities with HMMs
    2.3.2 Determining the State Sequence
    2.3.3 Learning HMM Parameters
    2.3.4 Additional Issues Relating to Speech Recognition Systems
  2.4 HMM-Based Speech Recognition
    2.4.1 Representing the Signal
    2.4.2 The HMM for a Word Sequence
    2.4.3 Searching through all Word Sequences
  References

3 The Problem of Robustness in Automatic Speech Recognition
Bhiksha Raj, Tuomas Virtanen, Rita Singh
  3.1 Errors in Bayes Classification
    3.1.1 Type 1 Condition: Mismatch Error
    3.1.2 Type 2 Condition: Increased Bayes Error
  3.2 Bayes Classification and ASR
    3.2.1 All We Have is a Model: A Type 1 Condition
    3.2.2 Intrinsic Interferences - Signal Components that are Unrelated to the Message: A Type 2 Condition
    3.2.3 External Interferences - The Data are Noisy: Type 1 and Type 2 Conditions
  3.3 External Influences on Speech Recordings
    3.3.1 Signal Capture
    3.3.2 Additive Corruptions
    3.3.3 Reverberation
    3.3.4 A Simplified Model of Signal Capture
  3.4 The Effect of External Influences on Recognition
  3.5 Improving Recognition under Adverse Conditions
    3.5.1 Handling the Model Mismatch Error
    3.5.2 Dealing with Intrinsic Variations in the Data
    3.5.3 Dealing with Extrinsic Variations
  References

Part Two SIGNAL ENHANCEMENT

4 Voice Activity Detection, Noise Estimation, and Adaptive Filters for Acoustic Signal Enhancement
Rainer Martin, Dorothea Kolossa
  4.1 Introduction
  4.2 Signal Analysis and Synthesis
    4.2.1 DFT-Based Analysis Synthesis with Perfect Reconstruction
    4.2.2 Probability Distributions for Speech and Noise DFT Coefficients
  4.3 Voice Activity Detection
    4.3.1 VAD Design Principles
    4.3.2 Evaluation of VAD Performance
    4.3.3 Evaluation in the Context of ASR
  4.4 Noise Power Spectrum Estimation
    4.4.1 Smoothing Techniques
    4.4.2 Histogram and GMM Noise Estimation Methods
    4.4.3 Minimum Statistics Noise Power Estimation
    4.4.4 MMSE Noise Power Estimation
    4.4.5 Estimation of the A Priori Signal-to-Noise Ratio
  4.5 Adaptive Filters for Signal Enhancement
    4.5.1 Spectral Subtraction
    4.5.2 Nonlinear Spectral Subtraction
    4.5.3 Wiener Filtering
    4.5.4 The ETSI Advanced Front End
    4.5.5 Nonlinear MMSE Estimators
  4.6 ASR Performance
  4.7 Conclusions
  References

5 Extraction of Speech from Mixture Signals
Paris Smaragdis
  5.1 The Problem with Mixtures
  5.2 Multichannel Mixtures
    5.2.1 Basic Problem Formulation
    5.2.2 Convolutive Mixtures
  5.3 Single-Channel Mixtures
    5.3.1 Problem Formulation
    5.3.2 Learning Sound Models
    5.3.3 Separation by Spectrogram Factorization
    5.3.4 Dealing with Unknown Sounds
  5.4 Variations and Extensions
  5.5 Conclusions
  References

6 Microphone Arrays
John McDonough, Kenichi Kumatani
  6.1 Speaker Tracking
  6.2 Conventional Microphone Arrays
  6.3 Conventional Adaptive Beamforming Algorithms
    6.3.1 Minimum Variance Distortionless Response Beamformer
    6.3.2 Noise Field Models
    6.3.3 Subband Analysis and Synthesis
    6.3.4 Beamforming Performance Criteria
    6.3.5 Generalized Sidelobe Canceller Implementation
    6.3.6 Recursive Implementation of the GSC
    6.3.7 Other Conventional GSC Beamformers
    6.3.8 Beamforming based on Higher Order Statistics
    6.3.9 Online Implementation
    6.3.10 Speech-Recognition Experiments
  6.4 Spherical Microphone Arrays
  6.5 Spherical Adaptive Algorithms
  6.6 Comparative Studies
  6.7 Comparison of Linear and Spherical Arrays for DSR
  6.8 Conclusions and Further Reading
  References

Part Three FEATURE ENHANCEMENT

7 From Signals to Speech Features by Digital Signal Processing
Matthias Wölfel
  7.1 Introduction
    7.1.1 About this Chapter
  7.2 The Speech Signal
  7.3 Spectral Processing
    7.3.1 Windowing
    7.3.2 Power Spectrum
    7.3.3 Spectral Envelopes
    7.3.4 LP Envelope
    7.3.5 MVDR Envelope
    7.3.6 Warping the Frequency Axis
    7.3.7 Warped LP Envelope
    7.3.8 Warped MVDR Envelope
    7.3.9 Comparison of Spectral Estimates
    7.3.10 The Spectrogram
  7.4 Cepstral Processing
    7.4.1 Definition and Calculation of Cepstral Coefficients
    7.4.2 Characteristics of Cepstral Sequences
  7.5 Influence of Distortions on Different Speech Features
    7.5.1 Objective Functions
    7.5.2 Robustness against Noise
    7.5.3 Robustness against Echo and Reverberation
    7.5.4 Robustness against Changes in Fundamental Frequency
  7.6 Summary and Further Reading
  References

8 Features Based on Auditory Physiology and Perception
Richard M. Stern, Nelson Morgan
  8.1 Introduction
  8.2 Some Attributes of Auditory Physiology and Perception
    8.2.1 Peripheral Processing
    8.2.2 Processing at more Central Levels
    8.2.3 Psychoacoustical Correlates of Physiological Observations
    8.2.4 The Impact of Auditory Processing on Conventional Feature Extraction
    8.2.5 Summary
  8.3 "Classic" Auditory Representations
  8.4 Current Trends in Auditory Feature Analysis
  8.5 Summary
  Acknowledgments
  References

9 Feature Compensation
Jasha Droppo
  9.1 Life in an Ideal World
    9.1.1 Noise Robustness Tasks
    9.1.2 Probabilistic Feature Enhancement
    9.1.3 Gaussian Mixture Models
  9.2 MMSE-SPLICE
    9.2.1 Parameter Estimation
    9.2.2 Results
  9.3 Discriminative SPLICE
    9.3.1 The MMI Objective Function
    9.3.2 Training the Front-End Parameters
    9.3.3 The Rprop Algorithm
    9.3.4 Results
  9.4 Model-Based Feature Enhancement
    9.4.1 The Additive Noise-Mixing Equation
    9.4.2 The Joint Probability Model
    9.4.3 Vector Taylor Series Approximation
    9.4.4 Estimating Clean Speech
    9.4.5 Results
  9.5 Switching Linear Dynamic System
  9.6 Conclusion
  References

10 Reverberant Speech Recognition
Reinhold Haeb-Umbach, Alexander Krueger
  10.1 Introduction
  10.2 The Effect of Reverberation
    10.2.1 What is Reverberation?
    10.2.2 The Relationship between Clean and Reverberant Speech Features
    10.2.3 The Effect of Reverberation on ASR Performance
  10.3 Approaches to Reverberant Speech Recognition
    10.3.1 Signal-Based Techniques
    10.3.2 Front-End Techniques
    10.3.3 Back-End Techniques
    10.3.4 Concluding Remarks
  10.4 Feature Domain Model of the Acoustic Impulse Response
  10.5 Bayesian Feature Enhancement
    10.5.1 Basic Approach
    10.5.2 Measurement Update
    10.5.3 Time Update
    10.5.4 Inference
  10.6 Experimental Results
    10.6.1 Databases
    10.6.2 Overview of the Tested Methods
    10.6.3 Recognition Results on Reverberant Speech
    10.6.4 Recognition Results on Noisy Reverberant Speech
  10.7 Conclusions
  Acknowledgment
  References

Part Four MODEL ENHANCEMENT

11 Adaptation and Discriminative Training of Acoustic Models
Yannick Estève, Paul Deléglise
  11.1 Introduction
    11.1.1 Acoustic Models
    11.1.2 Maximum Likelihood Estimation
  11.2 Acoustic Model Adaptation and Noise Robustness
    11.2.1 Static (or Offline) Adaptation
    11.2.2 Dynamic (or Online) Adaptation
  11.3 Maximum A Posteriori Reestimation
  11.4 Maximum Likelihood Linear Regression
    11.4.1 Class Regression Tree
    11.4.2 Constrained Maximum Likelihood Linear Regression
    11.4.3 CMLLR Implementation
    11.4.4 Speaker Adaptive Training
  11.5 Discriminative Training
    11.5.1 MMI Discriminative Training Criterion
    11.5.2 MPE Discriminative Training Criterion
    11.5.3 I-smoothing
    11.5.4 MPE Implementation
  11.6 Conclusion
  References

12 Factorial Models for Noise Robust Speech Recognition
John R. Hershey, Steven J. Rennie, Jonathan Le Roux
  12.1 Introduction
  12.2 The Model-Based Approach
  12.3 Signal Feature Domains
  12.4 Interaction Models
    12.4.1 Exact Interaction Model
    12.4.2 Max Model
    12.4.3 Log-Sum Model
    12.4.4 Mel Interaction Model
  12.5 Inference Methods
    12.5.1 Max Model Inference
    12.5.2 Parallel Model Combination
    12.5.3 Vector Taylor Series Approaches
    12.5.4 SNR-Dependent Approaches
  12.6 Efficient Likelihood Evaluation in Factorial Models
    12.6.1 Efficient Inference using the Max Model
    12.6.2 Efficient Vector-Taylor Series Approaches
    12.6.3 Band Quantization
  12.7 Current Directions
    12.7.1 Dynamic Noise Models for Robust ASR
    12.7.2 Multi-Talker Speech Recognition using Graphical Models
    12.7.3 Noise Robust ASR using Non-Negative Basis Representations
  References

13 Acoustic Model Training for Robust Speech Recognition
Michael L. Seltzer
  13.1 Introduction
  13.2 Traditional Training Methods for Robust Speech Recognition
  13.3 A Brief Overview of Speaker Adaptive Training
  13.4 Feature-Space Noise Adaptive Training
    13.4.1 Experiments using fNAT
  13.5 Model-Space Noise Adaptive Training
  13.6 Noise Adaptive Training using VTS Adaptation
    13.6.1 Vector Taylor Series HMM Adaptation
    13.6.2 Updating the Acoustic Model Parameters
    13.6.3 Updating the Environmental Parameters
    13.6.4 Implementation Details
    13.6.5 Experiments using NAT
  13.7 Discussion
    13.7.1 Comparison of Training Algorithms
    13.7.2 Comparison to Speaker Adaptive Training
    13.7.3 Related Adaptive Training Methods
  13.8 Conclusion
  References

Part Five COMPENSATION FOR INFORMATION LOSS

14 Missing-Data Techniques: Recognition with Incomplete Spectrograms
Jon Barker
  14.1 Introduction
  14.2 Classification with Incomplete Data
    14.2.1 A Simple Missing Data Scenario
    14.2.2 Missing Data Theory
    14.2.3 Validity of the MAR Assumption
    14.2.4 Marginalising Acoustic Models
  14.3 Energetic Masking
    14.3.1 The Max Approximation
    14.3.2 Bounded Marginalisation
    14.3.3 Missing Data ASR in the Cepstral Domain
    14.3.4 Missing Data ASR with Dynamic Features
  14.4 Meta-Missing Data: Dealing with Mask Uncertainty
    14.4.1 Missing Data with Soft Masks
