
Automatic Detection and Classification of Prosodic Events, by Andrew Rosenberg (PDF)

407 pages · 2009 · 3.8 MB · English
by Andrew Rosenberg

Preview: Automatic Detection and Classification of Prosodic Events, by Andrew Rosenberg

Automatic Detection and Classification of Prosodic Events

Andrew Rosenberg

Submitted in partial fulfillment of the requirements for the degree of Doctor of Philosophy in the Graduate School of Arts and Sciences

COLUMBIA UNIVERSITY
2009

© 2009 Andrew Rosenberg. All Rights Reserved.

Abstract

Automatic Detection and Classification of Prosodic Events
Andrew Rosenberg

Prosody, or intonation, is a critically important component of spoken communication. The automatic extraction of prosodic information is necessary for machines to process speech with human levels of proficiency. In this thesis we describe work on the automatic detection and classification of prosodic events – specifically, pitch accents and prosodic phrase boundaries. We present novel techniques, feature representations and state-of-the-art performance in each of these tasks. We also present three proof-of-concept applications – speech summarization, story segmentation and non-native speech assessment – showing that access to hypothesized prosodic event information can be used to improve the performance of downstream spoken language processing tasks. We believe the contributions of this thesis advance the understanding of prosodic events and the use of prosody in spoken language processing towards the goal of human-like processing of speech by machines.

Contents

List of Figures
List of Tables

1 Introduction

2 Materials
2.1 ToBI Standard
2.2 Boston Directions Corpus
2.3 Boston Radio News Corpus
2.4 TDT-4 Corpus

3 Pitch Accent Detection
3.1 Introduction
3.2 Related Work
3.3 Acoustic Detection
3.4 Acoustic Pitch Accent Detection at the Word, Syllable and Vowel Domains
3.5 Using Filtered Energy Features to Detect Pitch Accents
3.5.1 Method
3.5.2 Results and Discussion
3.5.3 Conclusion
3.6 Corrected Energy Based Classifier
3.6.1 Methods
3.6.2 Results and Discussion
3.7 Using Part-of-speech Tags in Pitch Accent Detection
3.7.1 Part-of-speech-based Word Classes
3.7.2 Acoustic Features
3.7.3 Combining Acoustic and Word Class Information
3.7.4 Results and Discussion
3.7.5 Evaluation of class-based modeling with corrected energy-based classifiers
3.7.6 Discussion of Syntactic Experiments
3.8 Conclusion and Future Work
3.8.1 Key Observations

4 Phrase Boundary Detection
4.1 Introduction
4.2 Related Work
4.3 Acoustic Phrase Boundary Detection
4.3.1 Representations of Pitch and Energy Reset
4.3.2 Representations of Preboundary Lengthening
4.4 Lexico-Syntactic Phrase Boundary Detection
4.5 Detection of Intermediate Phrase Boundaries
4.6 Conclusion and Future Work
4.6.1 Key Observations

5 Pitch Accent Type Classification
5.1 Introduction
5.2 Related Work
5.3 Examples of Pitch Accent Types
5.4 Materials
5.5 Descriptive Analysis
5.5.1 H* v. L+H* in BURNC
5.6 Pitch Accent Type Classification Experiments
5.6.1 Basic acoustic aggregations
5.6.2 The value of context-normalized aggregations
5.6.3 Shape modeling
5.6.4 Sampling strategies
5.6.5 The influence of Phrase Accents on Pitch Accent Classification
5.6.6 BURNC syllable-based classification
5.6.7 Use of Part-of-speech Information
5.7 Conclusion and Future Work
5.7.1 Key Observations

6 Phrase-final Type Classification
6.1 Introduction
6.2 Examples of Phrase-Final Types
6.3 Related Work
6.4 Materials
6.5 Experiment Results and Discussion
6.5.1 Acoustic features
6.5.2 Syntactic features
6.5.3 Regions of Analysis
6.5.4 Quantized Contour Modeling
6.5.5 Final segment class modeling
6.6 Phrase Accent Classification
6.7 Conclusion and Future Work
6.7.1 Key Observations

7 Applications
7.1 Broadcast News Summarization
7.1.1 Related Work in Extractive Speech Summarization
7.1.2 Speech Summarization Corpus
7.1.3 Speech Segmentation
7.1.4 Extractive Summarization Experiments and Results
7.1.5 Extractive Summarization Conclusion and Future Work
7.2 Story Segmentation
7.2.1 Related Work in Story Segmentation
7.2.2 Story Segmentation Corpus
7.2.3 Story Segmentation Approach
7.2.4 Story Segmentation Results and Discussion
7.2.5 Story Segmentation Conclusions
7.3 Non-native Intonation Assessment
7.3.1 Material
7.3.2 Non-native Tone Distribution
7.3.3 Sequential Modeling for Non-native Intonation Assessment
7.3.4 Conclusion and Future Work
7.4 Key Observations

8 Conclusion and Future Work
8.1 Key Observations
8.2 Limitations

9 Bibliography

List of Figures

2.1 Example of ToBI annotation of intonation. File h1r1 from the BDC-read corpus.
3.1 Histogram of filtered energy classifier performance accuracy with word-level features.
3.2 Plot of the portion of data points correctly classified by at least N energy classifiers.
3.3 Detail view of a portion of a single pitch-based classifier.
3.4 Logistic Regression accuracy using nominal Uni-, Bi- and Trigram POS tag features on BDC-read.
3.5 Logistic Regression accuracy using nominal Uni-, Bi- and Trigram POS tag features on BDC-spontaneous.
3.6 Logistic Regression accuracy using nominal Uni-, Bi- and Trigram POS tag features on BURNC.
3.7 Logistic Regression accuracy using nominal Uni-, Bi- and Trigram POS tag model posterior features on BDC-read.
3.8 Logistic Regression accuracy using nominal Uni-, Bi- and Trigram POS tag model posterior features on BDC-spontaneous.
3.9 Logistic Regression accuracy using nominal Uni-, Bi- and Trigram POS tag model posterior features on BURNC.
3.10 Logistic Regression accuracy using acoustic features and nominal Uni-, Bi- and Trigram POS tags on BDC-read.
3.11 Logistic Regression accuracy using acoustic features and nominal Uni-, Bi- and Trigram POS tags on BDC-spontaneous.
3.12 Logistic Regression accuracy using acoustic features and nominal Uni-, Bi- and Trigram POS tags on BURNC.
3.13 Logistic Regression accuracy using acoustic features and mixture model likelihoods from Uni-, Bi- and Trigram POS tags on BDC-read.
3.14 Logistic Regression accuracy using acoustic features and mixture model likelihoods from Uni-, Bi- and Trigram POS tags on BDC-spontaneous.
3.15 Logistic Regression accuracy using acoustic features and mixture model likelihoods from Uni-, Bi- and Trigram POS tags on BURNC.
3.16 Post-hoc combination of acoustic and syntactic models using Uni-, Bi- and Trigram POS tags on BDC-read.
3.17 Post-hoc combination of acoustic and syntactic models using Uni-, Bi- and Trigram POS tags on BDC-spontaneous.
3.18 Post-hoc combination of acoustic and syntactic models using Uni-, Bi- and Trigram POS tags on BURNC.
3.19 POS tag class-based modeling using Uni-, Bi- and Trigram POS tags on BDC-read.
3.20 POS tag class-based modeling using Uni-, Bi- and Trigram POS tags on BDC-spontaneous.
3.21 POS tag class-based modeling using Uni-, Bi- and Trigram POS tags on BURNC.
4.1 Error Ratio example. The RMSE of the single fit line is 14.41; the RMSE with two fit lines is 7.74. (The error-ratio computation is sketched in code after this list.)
4.2 Histograms of final rhyme durations preceding intonational phrase boundaries and non-phrase-ending word boundaries. Intonational phrase boundaries: µ = 0.229, σ = 0.100; non-boundaries: µ = 0.125, σ = 0.0661. (A likelihood-ratio reading of these statistics is sketched after this list.)
4.3 A graphical example of the parse tree distance ratio feature. In both examples the distance between "stop" and "from" is 4; the distance ratio in the top example is 2, while in the bottom it is 1.
4.4 A subset of UCP production rules.
5.1 Clear example of H* accent.
5.2 Confusable example of H* accent.
5.3 Clear example of !H* accent.
5.4 Confusable example of !H* accent.
5.5 Clear example of L* accent.
5.6 Confusable example of L* accent.
5.7 Clear example of L+H* accent.
5.8 Confusable example of L+H* accent.
5.9 Clear example of L+!H* accent.
5.10 Confusable example of L+!H* accent.
5.11 Clear example of L*+H accent.
5.12 Confusable example of L*+H accent.
5.13 Clear example of L*+!H accent.
5.14 Confusable example of L*+!H accent.
5.15 Clear example of H+!H* accent.
5.16 Confusable example of H+!H* accent.
5.17 Minimum raw and speaker-normalized pitch by pitch accent type. (One common normalization scheme is sketched after this list.)
5.18 Maximum raw and speaker-normalized pitch by pitch accent type.
5.19 Maximum raw and speaker-normalized intensity by pitch accent type.
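The Error Ratio caption (Figure 4.1) points at a feature that compares how well one line explains a pitch contour window against two lines split at a candidate boundary. The thesis's exact formulation is not visible in this preview; the following is a minimal sketch under the assumption that the feature is the single-fit RMSE divided by the pooled RMSE of the two-piece fit, with all function names being illustrative:

    import numpy as np

    def squared_residuals(t, y):
        # Squared residuals of a least-squares line fit to (t, y).
        slope, intercept = np.polyfit(t, y, 1)
        return (y - (slope * t + intercept)) ** 2

    def error_ratio(t, y, split):
        # Ratio of the RMSE of one line fit over the whole window to the
        # pooled RMSE of two lines fit on either side of `split` (assumes
        # at least two samples on each side). A large ratio suggests the
        # contour is better described in two pieces, e.g. a pitch reset
        # at a candidate phrase boundary.
        t, y = np.asarray(t, float), np.asarray(y, float)
        rmse_one = np.sqrt(np.mean(squared_residuals(t, y)))
        pooled = np.concatenate([squared_residuals(t[:split], y[:split]),
                                 squared_residuals(t[split:], y[split:])])
        rmse_two = np.sqrt(np.mean(pooled))
        return rmse_one / rmse_two

With the values reported in the caption, the ratio would be 14.41 / 7.74 ≈ 1.86.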
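Figure 4.2 reports per-class means and standard deviations of final rhyme duration, the classic preboundary lengthening cue. Purely as an illustration of what those statistics buy, and not as the thesis's classifier, one can treat each class as a Gaussian over duration and compute a likelihood ratio:

    from math import exp, pi, sqrt

    # Statistics reported in Figure 4.2 (durations read as seconds).
    BOUNDARY = (0.229, 0.100)       # mean, std before intonational phrase boundaries
    NON_BOUNDARY = (0.125, 0.0661)  # mean, std at non-phrase-ending word boundaries

    def gaussian_pdf(x, mu, sigma):
        # Density of N(mu, sigma^2) at x.
        return exp(-0.5 * ((x - mu) / sigma) ** 2) / (sigma * sqrt(2 * pi))

    def boundary_likelihood_ratio(duration):
        # > 1 means the duration is more likely before a phrase boundary.
        return (gaussian_pdf(duration, *BOUNDARY) /
                gaussian_pdf(duration, *NON_BOUNDARY))

    # A 0.30 s final rhyme is far more likely before a boundary:
    # boundary_likelihood_ratio(0.30) is roughly 17.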
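Figures 5.17–5.19 compare raw against speaker-normalized pitch and intensity. The preview does not show which normalization the thesis uses; a common choice, assumed here only for illustration, is per-speaker z-score normalization:

    import numpy as np

    def speaker_znorm(values, speaker_values):
        # Z-score normalize acoustic values against the speaker's own
        # distribution, so "high" and "low" are relative to that speaker
        # rather than absolute (e.g. Hz for pitch). Assumed scheme; the
        # thesis may normalize differently.
        mu = np.mean(speaker_values)
        sigma = np.std(speaker_values)
        return (np.asarray(values, float) - mu) / sigma

Under such a scheme a 250 Hz pitch peak counts as high for a typically low-pitched speaker but is unremarkable for a higher-pitched one, which is why raw and normalized distributions can separate accent types differently.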

