A Syllable, Articulatory-Feature, and Stress-Accent Model of Speech Recognition PDF

286 Pages·2.255 MB·English

A Syllable, Articulatory-Feature, and Stress-Accent Model of Speech Recognition

by Shuangyu Chang
B.S.E. (University of Michigan, Ann Arbor) 1997

A dissertation submitted in partial satisfaction of the requirements for the degree of Doctor of Philosophy in Computer Science in the GRADUATE DIVISION of the UNIVERSITY of CALIFORNIA, BERKELEY

Committee in charge:
Professor Nelson Morgan, Cochair
Dr. Lokendra Shastri, Cochair
Dr. Steven Greenberg
Professor Edwin R. Lewis
Professor David L. Wessel
Professor Lotfi A. Zadeh

Fall 2002

Copyright © Fall 2002 by Shuangyu Chang

Abstract

Current-generation automatic speech recognition (ASR) systems assume that words are readily decomposable into constituent phonetic components ("phonemes"). A detailed linguistic dissection of state-of-the-art speech recognition systems indicates that the conventional phonemic "beads-on-a-string" approach is of limited utility, particularly with respect to informal, conversational material. The study shows that there is a significant gap between the observed data and the pronunciation models of current ASR systems. It also shows that many important factors affecting recognition performance are not modeled explicitly in these systems.

Motivated by these findings, this dissertation analyzes spontaneous speech with respect to three important, but often neglected, components of speech (at least with respect to English ASR). These components are articulatory-acoustic features (AFs), the syllable and stress accent.
Analysis results provide evidence for an alternative approach of speech modeling, one in which the syllable assumes preeminent status and is melded to the lower as well as the higher tiers of linguistic representation through the incorporation of prosodic information such as stress accent.

Using concrete examples and statistics from spontaneous speech material it is shown that there exists a systematic relationship between the realization of AFs and stress accent in conjunction with syllable position. This relationship can be used to provide an accurate and parsimonious characterization of pronunciation variation in spontaneous speech. An approach to automatically extract AFs from the acoustic signal is also developed, as is a system for the automatic stress-accent labeling of spontaneous speech.

Based on the results of these studies a syllable-centric, multi-tier model of speech recognition is proposed. The model explicitly relates AFs, phonetic segments and syllable constituents to a framework for lexical representation, and incorporates stress-accent information into recognition. A test-bed implementation of the model is developed using a fuzzy-based approach for combining evidence from various AF sources and a pronunciation-variation modeling technique using AF-variation statistics extracted from data. Experiments on a limited-vocabulary speech recognition task using both automatically derived and fabricated data demonstrate the advantage of incorporating AF and stress-accent modeling within the syllable-centric, multi-tier framework, particularly with respect to pronunciation variation in spontaneous speech.

To Jiangxin

Contents

List of Figures
List of Tables

1 Introduction
  1.1 The Conventional Model of Speech Recognition
  1.2 Finding Alternatives
  1.3 Thesis Outline

2 Linguistic Dissection of LVCSR Systems
  2.1 Background Information
    2.1.1 Corpus Material
    2.1.2 Participating Systems
  2.2 Analysis Results
    2.2.1 Word and Phone Error Patterns
    2.2.2 Syllable Structure and Syllable Position
    2.2.3 Articulatory-acoustic Features and Syllable Position
    2.2.4 Prosodic Stress Accent and Word Errors
    2.2.5 Speaking Rate and Word Errors
    2.2.6 Pronunciation Variation and Word Errors
  2.3 Summary

3 Articulatory-acoustic Features
  3.1 Background and Previous Work
  3.2 Automatic Extraction of Articulatory-acoustic Features
    3.2.1 System Description
    3.2.2 Evaluation
    3.2.3 Extension to Automatic Phonetic Labeling
  3.3 Manner-specific Training and the "Elitist" Approach
    3.3.1 AF Classification on the NTIMIT Corpus
    3.3.2 An "Elitist" Approach
    3.3.3 Manner-Specific Training
  3.4 Cross-linguistic Transfer of AFs
  3.5 Robustness of AFs
    3.5.1 Corpus Material with Noise
    3.5.2 Experimental Results
  3.6 Summary

4 Speech Processing at the Syllable Level
  4.1 What is a Syllable?
  4.2 The Stability and Importance of the Syllable in Speech Perception
    4.2.1 Stability of Syllables in Speech Corpora
    4.2.2 Acoustic-based Syllable Detection and Segmentation
    4.2.3 Significance of Syllable Duration
    4.2.4 Syllables and Words
  4.3 Pronunciation Variation, Prosody and the Syllable
  4.4 Articulatory-acoustic Features and the Syllable
  4.5 Summary

5 Stress Accent in Spontaneous American English
  5.1 Stress Accent in Spontaneous American English
    5.1.1 The Perceptual Basis of Stress Accent
    5.1.2 Vocalic Identity and Stress Accent
  5.2 Stress Accent and Pronunciation Variation
    5.2.1 Pronunciations of "That" – Revisited
    5.2.2 Impact of Stress Accent by Syllable Position
  5.3 Automatic Stress-Accent Labeling of Spontaneous Speech
    5.3.1 System Description
    5.3.2 Experiments on the Switchboard Corpus
  5.4 Summary

6 A Multi-tier Model of Speech Recognition
  6.1 Model Description
  6.2 Questions Regarding the Multi-tier Model
  6.3 Test-bed System Implementation
    6.3.1 Overview
    6.3.2 AF Classification and Segmentation
    6.3.3 Stress-accent Estimation
    6.3.4 Word Hypothesis Evaluation
    6.3.5 Cross-AF-dimension Syllable-score Combination
    6.3.6 Within-syllable Single-AF-dimension Matching
  6.4 Summary

7 Multi-tier Recognition – Experiments and Analysis
  7.1 Experimental Conditions
  7.2 Overall System Performance
  7.3 Testing the Contribution of Stress Accent
  7.4 Testing Pronunciation Modeling
  7.5 Testing the Contribution of Syllable Position
  7.6 Summary

8 Conclusions and Future Work
  8.1 Summary and Conclusions
    8.1.1 Linguistic Dissection of LVCSR Systems
    8.1.2 Detailed Analysis of the Elements of Speech
    8.1.3 An Alternative Model of Speech
  8.2 Future Directions
    8.2.1 Incorporation into Conventional Systems
    8.2.2 Further Analysis and Experiments
    8.2.3 An Improved Framework and Implementation
  8.3 Coda
