
Towards Adaptive Spoken Dialog Systems

258 pages · 2013 · 6.519 MB · English


Alexander Schmitt · Wolfgang Minker

Towards Adaptive Spoken Dialog Systems

Institute of Communications Engineering, University of Ulm, Ulm, Germany

ISBN 978-1-4614-4592-0
ISBN 978-1-4614-4593-7 (eBook)
DOI 10.1007/978-1-4614-4593-7
Springer New York Heidelberg Dordrecht London
Library of Congress Control Number: 2012944961

© Springer Science+Business Media, New York 2013

This work is subject to copyright. All rights are reserved by the Publisher, whether the whole or part of the material is concerned, specifically the rights of translation, reprinting, reuse of illustrations, recitation, broadcasting, reproduction on microfilms or in any other physical way, and transmission or information storage and retrieval, electronic adaptation, computer software, or by similar or dissimilar methodology now known or hereafter developed. Exempted from this legal reservation are brief excerpts in connection with reviews or scholarly analysis or material supplied specifically for the purpose of being entered and executed on a computer system, for exclusive use by the purchaser of the work. Duplication of this publication or parts thereof is permitted only under the provisions of the Copyright Law of the Publisher's location, in its current version, and permission for use must always be obtained from Springer. Permissions for use may be obtained through RightsLink at the Copyright Clearance Center. Violations are liable to prosecution under the respective Copyright Law.

The use of general descriptive names, registered names, trademarks, service marks, etc. in this publication does not imply, even in the absence of a specific statement, that such names are exempt from the relevant protective laws and regulations and therefore free for general use.
While the advice and information in this book are believed to be true and accurate at the date of publication, neither the authors nor the editors nor the publisher can accept any legal responsibility for any errors or omissions that may be made. The publisher makes no warranty, express or implied, with respect to the material contained herein.

Printed on acid-free paper

Springer is part of Springer Science+Business Media (www.springer.com)

Preface

This book investigates stochastic methods for the automatic detection of critical dialog situations in Spoken Dialog Systems (SDS) and implements data-driven modeling and prediction techniques. The aim of this approach is to allow for a robust and user-friendly interaction with next-generation SDS.

Advances in spoken language technology have led to an increasing deployment of SDS in the field, such as speech-enabled personal assistants in our smartphones. Limitations of the current technology and the great complexity of natural language interaction between man and machine keep producing problems in communication. We users may particularly experience this in our everyday usage of telephone-based SDS in call centers. Low speech recognition performance, false expectations towards the SDS, and sometimes bad dialog design lead to frustration and dialog crashes.

In this book, we present a data-driven online monitoring approach that enables future SDS to automatically recognize negative dialog patterns. To this end, we implement novel statistical and machine learning-based approaches. Using the knowledge about existing problems in the interaction, future dialog systems will be able to change dialog flows dynamically and ultimately solve those problems. Unlike rule-based approaches, the presented statistical procedures are more flexible, more portable, and more accurate.

After an introduction to spoken dialog technology, the book describes the foundations of machine learning and pattern recognition, which serve as the basis for the presented approaches.
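As a minimal, self-contained illustration of the supervised-learning setting those foundations cover — each dialog exchange described by a feature vector and a label, with a classifier learned from labeled examples — the following toy nearest-centroid classifier can be sketched. The feature names and all data are invented for illustration; the book's actual models are more sophisticated.

```python
# A toy supervised classifier over invented per-exchange features.
# Nearest-centroid stands in for the statistical models discussed in the book.

def centroid(rows):
    """Component-wise mean of a list of feature vectors."""
    n = len(rows)
    return [sum(r[i] for r in rows) / n for i in range(len(rows[0]))]

def train(samples):
    """samples: list of (features, label). Returns one centroid per label."""
    by_label = {}
    for features, label in samples:
        by_label.setdefault(label, []).append(features)
    return {label: centroid(rows) for label, rows in by_label.items()}

def predict(model, features):
    """Assign the label whose centroid is closest (Euclidean distance)."""
    def dist(c):
        return sum((a - b) ** 2 for a, b in zip(features, c)) ** 0.5
    return min(model, key=lambda label: dist(model[label]))

# Hypothetical features per exchange: (ASR confidence, number of reprompts)
training = [
    ((0.92, 0), "smooth"), ((0.88, 1), "smooth"),
    ((0.35, 3), "problematic"), ((0.41, 2), "problematic"),
]
model = train(training)
print(predict(model, (0.30, 3)))  # a low-confidence, reprompt-heavy exchange
```

Run on the toy data above, the final call prints "problematic", since the query lies far closer to the centroid of the problematic training exchanges.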
Related work in the fields of emotion recognition and data-driven evaluation of SDS (e.g., the PARADISE approach) is presented and discussed.

The main part of the book begins with a major symptom that is closely connected to poor communication and critical dialog situations, namely negative emotions, and their detection. Related work in the field is frequently based on artificial datasets using acted and enacted speech, which may not transfer to real-life applications. We investigate how frequently users show negative emotions in real-life SDS and develop novel approaches to speech-based emotion recognition. We follow a multilayer approach to model and detect emotions, using classifiers based on acoustic, linguistic, and contextual features. We show that acoustic models outperform linguistic models in recognizing emotions in real-life SDS. Furthermore, we examine the relationship between the interaction flow and the occurrence of emotions. We show that the interaction flow of a human-machine dialog has an influence on the user's emotional state. This fact allows us to support emotion recognition using parameters that describe the previous interaction. For our studies, we exclusively employ non-acted recordings from real users.

Not all users react emotionally when problems arise in the interaction with an SDS. In a second step, we therefore present novel statistical methods that allow spotting problems within a dialog based on interaction patterns. The presented Interaction Quality paradigm demonstrates how a continuous quality assessment of ongoing spoken human-machine interaction may be achieved. This expert-based approach represents an objective quality view on an interaction. To which degree this paradigm mirrors subjective user satisfaction is assessed in a laboratory study with 46 users. To our knowledge, this study is the first to assess user satisfaction during SDS interactions. It can be shown that subjective satisfaction correlates with objective quality assessments.
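A correlation between expert quality annotations and subjective satisfaction ratings of this kind is typically quantified with a rank correlation coefficient. As a rough, self-contained illustration — with invented toy ratings, not the study's data — Spearman's rho can be computed as the Pearson correlation of rank-transformed scores:

```python
# Hypothetical illustration: rank correlation between expert Interaction
# Quality (IQ) scores and user satisfaction (US) ratings. Toy data only.

def rank(values):
    """Assign average 1-based ranks to a list of scores, handling ties."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1  # average of the tied positions, 1-based
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks

def spearman(x, y):
    """Spearman's rho = Pearson correlation of the rank-transformed scores."""
    rx, ry = rank(x), rank(y)
    n = len(x)
    mx, my = sum(rx) / n, sum(ry) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    sx = sum((a - mx) ** 2 for a in rx) ** 0.5
    sy = sum((b - my) ** 2 for b in ry) ** 0.5
    return cov / (sx * sy)

iq = [5, 4, 4, 3, 2, 1, 3, 5]  # expert IQ ratings per exchange (toy data)
us = [5, 5, 4, 3, 2, 2, 3, 4]  # user satisfaction ratings (toy data)
print(round(spearman(iq, us), 3))
```

A rho near +1 would indicate that the expert-based quality view tracks subjective satisfaction closely; in practice one would use a tested routine such as scipy.stats.spearmanr rather than hand-rolled code.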
Interaction parameters influencing user satisfaction are statistically determined and discussed.

Furthermore, we present approaches that enable future dialog systems to predict the dialog outcome during an interaction. These models allow dialog systems in call center applications to escalate endangered calls promptly to call center agents who may help out. Problems specific to the estimation of the dialog outcome are assessed and solved. An open-source workbench supporting the development and evaluation of such statistical models, together with the presentation of a parameter set used to quantify SDS interactions, rounds off the book.

The proposed approaches will increase user-friendliness and robustness in future SDS. All models have been evaluated on several large datasets of commercial and non-commercial SDS and have thus been tested for practical use.

Acknowledgments

This project would not have been possible without the support of many people. The authors express their deepest gratitude to David Sündermann, Jackson Liscombe, and Roberto Pieraccini from SpeechCycle Inc. (USA) for supporting us all the time with corpora still "hot from recording", their helpful advice, and their warm welcomes during our stays in New York.

We are notably grateful to Tim Polzehl (Deutsche Telekom Laboratories, Berlin) and Florian Metze (Carnegie Mellon University, Pittsburgh) for our prosperous collaboration in the field of emotion recognition during the past years. We would further like to thank the crew from Deutsche Telekom Laboratories, in particular Sebastian Möller, Klaus-Peter Engelbrecht, and Christine Kühnel, for their friendly and collegial cooperation and the exchange of valuable ideas at numerous speech conferences.

We are indebted to our research colleagues, students, and the technical staff at the Dialog Systems group at University of Ulm.
In particular, we owe a debt of gratitude to Tobias Heinroth, Stefan Ultes, Benjamin Schatz, Shu Ding, Uli Tschaffon, and Carolin Hank, as well as Nada Sharaf and Sherief Mowafey from the German University in Cairo (Egypt), for their fruitful discussions and the tireless exchange of new ideas. Thanks to Allison Michael from Springer for his assistance during the publishing process. Finally, we thank our families for their encouragement.

Contents

1  Introduction  1
   1.1  Spoken Dialog Systems  4
        1.1.1  Automatic Speech Recognition  6
        1.1.2  Semantic Analysis  7
        1.1.3  Dialog Management  8
        1.1.4  Language Generation and Text-to-Speech Synthesis  9
   1.2  Towards Adaptive Spoken Dialog Systems  10

2  Background and Related Research  17
   2.1  Machine Learning: Algorithms and Performance Metrics  18
        2.1.1  Supervised Learning  19
        2.1.2  Performance Metrics  28
   2.2  Emotion Recognition  33
        2.2.1  Theories of Emotion and Categorization  33
        2.2.2  Emotional Speech  35
        2.2.3  Emotional Labeling  39
        2.2.4  Paralinguistic and Linguistic Features for Emotion Recognition  42
        2.2.5  Related Work in Speech-Based Emotion Recognition  48
   2.3  Approaches for Estimating System Quality, User Satisfaction and Task Success  50
        2.3.1  Offline Estimation of Quality on Dialog- and System-Level  50
        2.3.2  Online Estimation of Task Success and User Satisfaction  55
   2.4  Summary and Discussion  58

3  Interaction Modeling and Platform Development  63
   3.1  Raw Data  63
   3.2  Parameterization and Annotation  66
        3.2.1  Interaction Parameters  66
        3.2.2  Emotion Annotation  76
   3.3  A Workbench for Supporting the Development of Statistical Models for Online Monitoring  84
        3.3.1  Requirements Toward a Software Tool  85
        3.3.2  The Workbench  87
        3.3.3  Data Management  87
        3.3.4  Evaluating Statistical Prediction Models for Online Monitoring  89
   3.4  Summary and Discussion  94

4  Novel Strategies for Emotion Recognition  99
   4.1  Speech-Based Emotion Recognition  101
        4.1.1  Paralinguistic Emotion Recognition  101
        4.1.2  Linguistic Emotion Recognition  109
   4.2  Dialog-Based Emotion Recognition  110
        4.2.1  Interaction and Context-Related Emotion Recognition  111
        4.2.2  Emotional History  114
        4.2.3  Emotion Recognition in Deployment Scenarios  116
   4.3  Evaluation  117
        4.3.1  Corpora  118
        4.3.2  Human Performance  122
        4.3.3  Speech-Based Emotion Recognition  124
        4.3.4  Dialog-Based Emotion Recognition  134
        4.3.5  Fusion Strategy  139
        4.3.6  Emotion Recognition in Deployment Scenarios  143
   4.4  Summary and Discussion  149

5  Novel Approaches to Pattern-Based Interaction Quality Modeling  153
   5.1  Interaction Quality Versus User Satisfaction  154
   5.2  Expert Annotation of Interaction Quality  156
        5.2.1  Annotation Example  158
        5.2.2  Rating Statistics and Determination of the Final IQ Score  158
   5.3  A User Satisfaction Study Under Laboratory Conditions  161
        5.3.1  Lab Study Setup  162
        5.3.2  Participants  163
        5.3.3  Comparison of User Satisfaction and Interaction Quality  166
   5.4  Input Variables  167
   5.5  Modeling Interaction Quality and User Satisfaction  169
   5.6  Evaluation  170
        5.6.1  Performance Metrics  170
        5.6.2  Feature Set Composition  171
        5.6.3  Feature Selection  171
        5.6.4  Assessing the Model Performance  174
        5.6.5  Impact of Optimization Through Feature Selection  175
        5.6.6  Cross-Target Prediction and Portability  177
        5.6.7  Causalities and Correlations Between Interaction Parameters and IQ/US  177
   5.7  Summary and Discussion  181

6  Statistically Modeling and Predicting Task Success  185
   6.1  Linear Modeling  186
   6.2  Window Modeling  187
   6.3  SRI and Salience  189
   6.4  Coping with Model Uncertainty  192
   6.5  Evaluation  195
        6.5.1  Overall Performance  197
        6.5.2  Linear Versus Window Modeling  198
        6.5.3  Class-Specific Performance  198
        6.5.4  SRI-Related Features  199
        6.5.5  Model Uncertainty  199
   6.6  Summary and Discussion  200

7  Conclusion and Future Directions  205
   7.1  Overall Summary and Synthesis of the Results  206
        7.1.1  Corpora Creation, Basic Modeling, and Tool Development  208
        7.1.2  Emotion Recognition  208
        7.1.3  Interaction Quality  213
        7.1.4  Task Success Prediction  214
   7.2  Suggestions for Future Work  216
   7.3  Outlook  217

Appendix A: Interaction Parameters for Exchange Level Modeling  219
Appendix B: Detailed Results for Emotion Recognition, Interaction Quality, and Task Success  225
References  235
Index  249
