Spoken Language Processing: Guide to Algorithms and System Development

965 pages · 2001 · 10.354 MB · English

TABLE OF CONTENTS

1. INTRODUCTION
1.1. MOTIVATIONS
1.1.1. Spoken Language Interface
1.1.2. Speech-to-Speech Translation
1.1.3. Knowledge Partners
1.2. SPOKEN LANGUAGE SYSTEM ARCHITECTURE
1.2.1. Automatic Speech Recognition
1.2.2. Text-to-Speech Conversion
1.2.3. Spoken Language Understanding
1.3. BOOK ORGANIZATION
1.3.1. Part I: Fundamental Theory
1.3.2. Part II: Speech Processing
1.3.3. Part III: Speech Recognition
1.3.4. Part IV: Text-to-Speech Systems
1.3.5. Part V: Spoken Language Systems
1.4. TARGET AUDIENCES
1.5. HISTORICAL PERSPECTIVE AND FURTHER READING

PART I: FUNDAMENTAL THEORY

2. SPOKEN LANGUAGE STRUCTURE
2.1. SOUND AND HUMAN SPEECH SYSTEMS
2.1.1. Sound
2.1.2. Speech Production
2.1.3. Speech Perception
2.2. PHONETICS AND PHONOLOGY
2.2.1. Phonemes
2.2.2. The Allophone: Sound and Context
2.2.3. Speech Rate and Coarticulation
2.3. SYLLABLES AND WORDS
2.3.1. Syllables
2.3.2. Words
2.4. SYNTAX AND SEMANTICS
2.4.1. Syntactic Constituents
2.4.2. Semantic Roles
2.4.3. Lexical Semantics
2.4.4. Logical Form
2.5. HISTORICAL PERSPECTIVE AND FURTHER READING

3. PROBABILITY, STATISTICS AND INFORMATION THEORY
3.1. PROBABILITY THEORY
3.1.1. Conditional Probability and Bayes' Rule
3.1.2. Random Variables
3.1.3. Mean and Variance
3.1.4. Covariance and Correlation
3.1.5. Random Vectors and Multivariate Distributions
3.1.6. Some Useful Distributions
3.1.7. Gaussian Distributions
3.2. ESTIMATION THEORY
3.2.1. Minimum/Least Mean Squared Error Estimation
3.2.2. Maximum Likelihood Estimation
3.2.3. Bayesian Estimation and MAP Estimation
3.3. SIGNIFICANCE TESTING
3.3.1. Level of Significance
3.3.2. Normal Test (Z-Test)
3.3.3. χ² Goodness-of-Fit Test
3.3.4. Matched-Pairs Test
3.4. INFORMATION THEORY
3.4.1. Entropy
3.4.2. Conditional Entropy
3.4.3. The Source Coding Theorem
3.4.4. Mutual Information and Channel Coding
3.5. HISTORICAL PERSPECTIVE AND FURTHER READING

4. PATTERN RECOGNITION
4.1. BAYES DECISION THEORY
4.1.1. Minimum-Error-Rate Decision Rules
4.1.2. Discriminant Functions
4.2. HOW TO CONSTRUCT CLASSIFIERS
4.2.1. Gaussian Classifiers
4.2.2. The Curse of Dimensionality
4.2.3. Estimating the Error Rate
4.2.4. Comparing Classifiers
4.3. DISCRIMINATIVE TRAINING
4.3.1. Maximum Mutual Information Estimation
4.3.2. Minimum-Error-Rate Estimation
4.3.3. Neural Networks
4.4. UNSUPERVISED ESTIMATION METHODS
4.4.1. Vector Quantization
4.4.2. The EM Algorithm
4.4.3. Multivariate Gaussian Mixture Density Estimation
4.5. CLASSIFICATION AND REGRESSION TREES
4.5.1. Choice of Question Set
4.5.2. Splitting Criteria
4.5.3. Growing the Tree
4.5.4. Missing Values and Conflict Resolution
4.5.5. Complex Questions
4.5.6. The Right-Sized Tree
4.6. HISTORICAL PERSPECTIVE AND FURTHER READING

PART II: SPEECH PROCESSING

5. DIGITAL SIGNAL PROCESSING
5.1. DIGITAL SIGNALS AND SYSTEMS
5.1.1. Sinusoidal Signals
5.1.2. Other Digital Signals
5.1.3. Digital Systems
5.2. CONTINUOUS-FREQUENCY TRANSFORMS
5.2.1. The Fourier Transform
5.2.2. Z-Transform
5.2.3. Z-Transforms of Elementary Functions
5.2.4. Properties of the Z and Fourier Transform
5.3. DISCRETE-FREQUENCY TRANSFORMS
5.3.1. The Discrete Fourier Transform (DFT)
5.3.2. Fourier Transforms of Periodic Signals
5.3.3. The Fast Fourier Transform (FFT)
5.3.4. Circular Convolution
5.3.5. The Discrete Cosine Transform (DCT)
5.4. DIGITAL FILTERS AND WINDOWS
5.4.1. The Ideal Low-Pass Filter
5.4.2. Window Functions
5.4.3. FIR Filters
5.4.4. IIR Filters
5.5. DIGITAL PROCESSING OF ANALOG SIGNALS
5.5.1. Fourier Transform of Analog Signals
5.5.2. The Sampling Theorem
5.5.3. Analog-to-Digital Conversion
5.5.4. Digital-to-Analog Conversion
5.6. MULTIRATE SIGNAL PROCESSING
5.6.1. Decimation
5.6.2. Interpolation
5.6.3. Resampling
5.7. FILTERBANKS
5.7.1. Two-Band Conjugate Quadrature Filters
5.7.2. Multiresolution Filterbanks
5.7.3. The FFT as a Filterbank
5.7.4. Modulated Lapped Transforms
5.8. STOCHASTIC PROCESSES
5.8.1. Statistics of Stochastic Processes
5.8.2. Stationary Processes
5.8.3. LTI Systems with Stochastic Inputs
5.8.4. Power Spectral Density
5.8.5. Noise
5.9. HISTORICAL PERSPECTIVE AND FURTHER READING

6. SPEECH SIGNAL REPRESENTATIONS
6.1. SHORT-TIME FOURIER ANALYSIS
6.1.1. Spectrograms
6.1.2. Pitch-Synchronous Analysis
6.2. ACOUSTICAL MODEL OF SPEECH PRODUCTION
6.2.1. Glottal Excitation
6.2.2. Lossless Tube Concatenation
6.2.3. Source-Filter Models of Speech Production
6.3. LINEAR PREDICTIVE CODING
6.3.1. The Orthogonality Principle
6.3.2. Solution of the LPC Equations
6.3.3. Spectral Analysis via LPC
6.3.4. The Prediction Error
6.3.5. Equivalent Representations
6.4. CEPSTRAL PROCESSING
6.4.1. The Real and Complex Cepstrum
6.4.2. Cepstrum of Pole-Zero Filters
6.4.3. Cepstrum of Periodic Signals
6.4.4. Cepstrum of Speech Signals
6.4.5. Source-Filter Separation via the Cepstrum
6.5. PERCEPTUALLY-MOTIVATED REPRESENTATIONS
6.5.1. The Bilinear Transform
6.5.2. Mel-Frequency Cepstrum
6.5.3. Perceptual Linear Prediction (PLP)
6.6. FORMANT FREQUENCIES
6.6.1. Statistical Formant Tracking
6.7. THE ROLE OF PITCH
6.7.1. Autocorrelation Method
6.7.2. Normalized Cross-Correlation Method
6.7.3. Signal Conditioning
6.7.4. Pitch Tracking
6.8. HISTORICAL PERSPECTIVE AND FURTHER READING

7. SPEECH CODING
7.1. SPEECH CODERS ATTRIBUTES
7.2. SCALAR WAVEFORM CODERS
7.2.1. Linear Pulse Code Modulation (PCM)
7.2.2. µ-law and A-law PCM
7.2.3. Adaptive PCM
7.2.4. Differential Quantization
7.3. SCALAR FREQUENCY DOMAIN CODERS
7.3.1. Benefits of Masking
7.3.2. Transform Coders
7.3.3. Consumer Audio
7.3.4. Digital Audio Broadcasting (DAB)
7.4. CODE EXCITED LINEAR PREDICTION (CELP)
7.4.1. LPC Vocoder
7.4.2. Analysis by Synthesis
7.4.3. Pitch Prediction: Adaptive Codebook
7.4.4. Perceptual Weighting and Postfiltering
7.4.5. Parameter Quantization
7.4.6. CELP Standards
7.5. LOW-BIT RATE SPEECH CODERS
7.5.1. Mixed-Excitation LPC Vocoder
7.5.2. Harmonic Coding
7.5.3. Waveform Interpolation
7.6. HISTORICAL PERSPECTIVE AND FURTHER READING

PART III: SPEECH RECOGNITION

8. HIDDEN MARKOV MODELS
8.1. THE MARKOV CHAIN
8.2. DEFINITION OF THE HIDDEN MARKOV MODEL
8.2.1. Dynamic Programming and DTW
8.2.2. How to Evaluate an HMM – The Forward Algorithm
8.2.3. How to Decode an HMM – The Viterbi Algorithm
8.2.4. How to Estimate HMM Parameters – Baum-Welch Algorithm
8.3. CONTINUOUS AND SEMI-CONTINUOUS HMMS
8.3.1. Continuous Mixture Density HMMs
8.3.2. Semi-continuous HMMs
8.4. PRACTICAL ISSUES IN USING HMMS
8.4.1. Initial Estimates
8.4.2. Model Topology
8.4.3. Training Criteria
8.4.4. Deleted Interpolation
8.4.5. Parameter Smoothing
8.4.6. Probability Representations
8.5. HMM LIMITATIONS
8.5.1. Duration Modeling
8.5.2. First-Order Assumption
8.5.3. Conditional Independence Assumption
8.6. HISTORICAL PERSPECTIVE AND FURTHER READING

9. ACOUSTIC MODELING
9.1. VARIABILITY IN THE SPEECH SIGNAL
9.1.1. Context Variability
9.1.2. Style Variability
9.1.3. Speaker Variability
9.1.4. Environment Variability
9.2. HOW TO MEASURE SPEECH RECOGNITION ERRORS
9.3. SIGNAL PROCESSING – EXTRACTING FEATURES
9.3.1. Signal Acquisition
9.3.2. End-Point Detection
9.3.3. MFCC and Its Dynamic Features
9.3.4. Feature Transformation
9.4. PHONETIC MODELING – SELECTING APPROPRIATE UNITS
9.4.1. Comparison of Different Units
9.4.2. Context Dependency
9.4.3. Clustered Acoustic-Phonetic Units
9.4.4. Lexical Baseforms
9.5. ACOUSTIC MODELING – SCORING ACOUSTIC FEATURES
9.5.1. Choice of HMM Output Distributions
9.5.2. Isolated vs. Continuous Speech Training
9.6. ADAPTIVE TECHNIQUES – MINIMIZING MISMATCHES
9.6.1. Maximum a Posteriori (MAP)
9.6.2. Maximum Likelihood Linear Regression (MLLR)
9.6.3. MLLR and MAP Comparison
9.6.4. Clustered Models
9.7. CONFIDENCE MEASURES: MEASURING THE RELIABILITY
9.7.1. Filler Models
9.7.2. Transformation Models
9.7.3. Combination Models
9.8. OTHER TECHNIQUES
9.8.1. Neural Networks
9.8.2. Segment Models
9.9. CASE STUDY: WHISPER
9.10. HISTORICAL PERSPECTIVE AND FURTHER READING

10. ENVIRONMENTAL ROBUSTNESS
10.1. THE ACOUSTICAL ENVIRONMENT
10.1.1. Additive Noise
10.1.2. Reverberation
10.1.3. A Model of the Environment
10.2. ACOUSTICAL TRANSDUCERS
10.2.1. The Condenser Microphone
10.2.2. Directionality Patterns
10.2.3. Other Transduction Categories
10.3. ADAPTIVE ECHO CANCELLATION (AEC)
10.3.1. The LMS Algorithm
10.3.2. Convergence Properties of the LMS Algorithm
10.3.3. Normalized LMS Algorithm
10.3.4. Transform-Domain LMS Algorithm
10.3.5. The RLS Algorithm
10.4. MULTIMICROPHONE SPEECH ENHANCEMENT
10.4.1. Microphone Arrays
10.4.2. Blind Source Separation
10.5. ENVIRONMENT COMPENSATION PREPROCESSING
10.5.1. Spectral Subtraction
10.5.2. Frequency-Domain MMSE from Stereo Data
10.5.3. Wiener Filtering
10.5.4. Cepstral Mean Normalization (CMN)
10.5.5. Real-Time Cepstral Normalization
10.5.6. The Use of Gaussian Mixture Models
10.6. ENVIRONMENTAL MODEL ADAPTATION
10.6.1. Retraining on Corrupted Speech
10.6.2. Model Adaptation
10.6.3. Parallel Model Combination
10.6.4. Vector Taylor Series
10.6.5. Retraining on Compensated Features
10.7. MODELING NONSTATIONARY NOISE
10.8. HISTORICAL PERSPECTIVE AND FURTHER READING

11. LANGUAGE MODELING
11.1. FORMAL LANGUAGE THEORY
11.1.1. Chomsky Hierarchy
11.1.2. Chart Parsing for Context-Free Grammars
11.2. STOCHASTIC LANGUAGE MODELS
11.2.1. Probabilistic Context-Free Grammars
11.2.2. N-gram Language Models
11.3. COMPLEXITY MEASURE OF LANGUAGE MODELS
11.4. N-GRAM SMOOTHING
11.4.1. Deleted Interpolation Smoothing
11.4.2. Backoff Smoothing
11.4.3. Class n-grams
11.4.4. Performance of n-gram Smoothing
11.5. ADAPTIVE LANGUAGE MODELS
11.5.1. Cache Language Models
11.5.2. Topic-Adaptive Models
11.5.3. Maximum Entropy Models
11.6. PRACTICAL ISSUES
11.6.1. Vocabulary Selection
11.6.2. N-gram Pruning
11.6.3. CFG vs. n-gram Models
11.7. HISTORICAL PERSPECTIVE AND FURTHER READING

12. BASIC SEARCH ALGORITHMS
12.1. BASIC SEARCH ALGORITHMS
12.1.1. General Graph Searching Procedures
12.1.2. Blind Graph Search Algorithms
12.1.3. Heuristic Graph Search
12.2. SEARCH ALGORITHMS FOR SPEECH RECOGNITION
12.2.1. Decoder Basics
12.2.2. Combining Acoustic and Language Models
12.2.3. Isolated Word Recognition
12.2.4. Continuous Speech Recognition
12.3. LANGUAGE MODEL STATES
12.3.1. Search Space with FSM and CFG
12.3.2. Search Space with the Unigram
12.3.3. Search Space with Bigrams
12.3.4. Search Space with Trigrams
12.3.5. How to Handle Silences Between Words
12.4. TIME-SYNCHRONOUS VITERBI BEAM SEARCH
12.4.1. The Use of Beam
12.4.2. Viterbi Beam Search
12.5. STACK DECODING (A* SEARCH)
12.5.1. Admissible Heuristics for Remaining Path
12.5.2. When to Extend New Words
12.5.3. Fast Match
12.5.4. Stack Pruning
12.5.5. Multistack Search
12.6. HISTORICAL PERSPECTIVE AND FURTHER READING

13. LARGE VOCABULARY SEARCH ALGORITHMS
13.1. EFFICIENT MANIPULATION OF TREE LEXICON
