Handling Emotions in Human-Computer Dialogues
Johannes Pittermann • Angela Pittermann
Wolfgang Minker
Johannes Pittermann
Universität Ulm
Inst. Informationstechnik
Albert-Einstein-Allee 43
89081 Ulm
Germany
johannes.pittermann@alumni.uni-ulm.de

Wolfgang Minker
Universität Ulm
Fak. Ingenieurwissenschaften und Elektrotechnik
Albert-Einstein-Allee 43
89081 Ulm
Germany
wolfgang.minker@uni-ulm.de

Angela Pittermann
Universität Ulm
Inst. Informationstechnik
Albert-Einstein-Allee 43
89081 Ulm
Germany
angelapittermann@gmx.de
ISBN 978-90-481-3128-0    e-ISBN 978-90-481-3129-7
DOI 10.1007/978-90-481-3129-7
Springer Dordrecht Heidelberg London New York
Library of Congress Control Number: 2009931247
© Springer Science+Business Media B.V. 2010
No part of this work may be reproduced, stored in a retrieval system, or transmitted in any form or by
any means, electronic, mechanical, photocopying, microfilming, recording or otherwise, without written
permission from the Publisher, with the exception of any material supplied specifically for the purpose
of being entered and executed on a computer system, for exclusive use by the purchaser of the work.
Cover design: Boekhorst Design b.v.
Printed on acid-free paper
Springer is part of Springer Science+Business Media (www.springer.com)
Preface
“The finest emotion of which we are capable is the mystic emotion”
(Albert Einstein, 1879–1955)
During the past years the “mystery” of emotions has increasingly attracted
interest in research on human–computer interaction. In this work we investigate
the problem of how to incorporate the user’s emotional state into a spoken language
dialogue system. The book describes the recognition and classification of emotions
and proposes models integrating emotions into adaptive dialogue management.
In computer and telecommunication technologies, the way in which people
communicate with each other is changing significantly, from a strictly structured
and formatted information transfer to a flexible and more natural communication.
Spoken language is the most natural way of communication between humans, and it
also provides an easy and quick way to interact with a computer application. Such
systems range from information kiosks where travelers can book flights or buy train
tickets to handheld devices which show tourists around cities while interactively
giving information about points of interest. Generally, spoken language dialogue
not only means simplicity, comfort and time savings but also contributes
to safety in critical environments such as cars, where hands-free operation is
indispensable in order to keep the driver’s distraction minimal. Within the context
of ubiquitous computing in intelligent environments, dialogue systems facilitate
everyday work, e.g., at home, where lights or household appliances can be controlled
by voice commands, and provide the possibility, especially in assisted living, to
quickly summon help in emergencies.
In parallel to the progress made in technical development, customers’ demands
on products have increased. While car owners in the 1920s might
have been completely satisfied once they arrived at a destination without any major
complications, people in the 1970s would already have tended to become annoyed
when their engine refused to start on the first turn of the ignition key. And nowadays
a navigation system showing the wrong way might cause even more anger. For
ubiquitous technology like cars this means, on the one hand, that the driver is literally at
the mercy of sophisticated technology; on the other hand, this does not hinder him/her
from building some kind of personal relation to the car, ranging from decorations
like car fresheners or fuzzy dice to expensive tuning. Such a relation also includes
the expression of emotions towards the car – just imagine drivers spurring on their
cars when climbing a steep hill and being glad to have reached the top, or drivers
shouting at their non-functioning navigation system, hitting or kicking their cars...
A similar behavior can be observed among computer users. Having successfully
written a book using word processing software might arouse happiness; however,
a sudden hard disk crash destroying all documents will probably drive the author up
the wall.
Normally neither the car nor the computer is capable of replying to the user’s
affect. So why not enable devices to react accordingly? Think of a car that refuses
to start and the driver shouting angrily, “Stupid car, I paid more than $40,000 and
now it’s only causing trouble!” Here, a reply from the car like “I am sorry that the engine
does not run properly. This is due to a defective spark plug which needs to be
replaced.” would certainly defuse the tense situation, and it moreover provides useful
information on how to solve the problem. This again contributes to safety
in the car, as the driver can be calmed down, e.g., in the case of a delay due to a
traffic jam, after which the driver tries to make up for the lost time by speeding. Here
the car’s computer could try to rearrange the planned meeting and inform the user:
“Due to our delay I have rescheduled your meeting one hour later. So there is no
need to hurry.”
To implement a more flexible system, the typical architecture of a spoken
language dialogue system needs to be equipped with additional functionality. This
includes the recognition of emotions and the detection of situation-based
parameters as well as user-state and situation managers which calculate models based on
these parameters and influence the course of the dialogue accordingly.
As this is a topic of strong interest in current research, several approaches
exist to classify the user’s emotions. These methods include the measurement
of physiological values using biosensors, the interpretation of gestures and facial
expressions using cameras, natural language processing spotting emotive keywords
and fillers in recognized utterances, or classification of prosodic features extracted
from the speech signal. Concentrating on a monomodal system without video input
and trying to reduce inconveniences to the user, this work focuses on the recognition
of emotions from the speech signal using Hidden Markov Models (HMMs). Based
on a database of emotional speech, a set of prosodic features has been selected and
HMMs have been trained and tested for six emotions and ten speakers. Owing to
variations in the model parameters, multiple recognizers have been implemented.
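As a rough illustration of the classification step described above, the following Python sketch scores an utterance against one HMM per emotion and picks the emotion whose model yields the highest likelihood. It is not the recognizer developed in this book: the discrete observation symbols (0 = low pitch, 1 = high pitch), the two toy models, their probabilities and all names (`forward_log_likelihood`, `classify_emotion`, `MODELS`) are assumptions made purely for this example, whereas the book works with prosodic features and six emotions.

```python
import math


def forward_log_likelihood(obs, start, trans, emit):
    """Log-likelihood of a discrete observation sequence under one HMM,
    computed with the scaled forward algorithm."""
    if not obs:
        raise ValueError("observation sequence must not be empty")
    n_states = len(start)
    # Initialization: start probability times emission of the first symbol.
    alpha = [start[s] * emit[s][obs[0]] for s in range(n_states)]
    log_lik = 0.0
    for t in range(len(obs)):
        if t > 0:
            # Induction: propagate through the transitions, then emit obs[t].
            alpha = [
                sum(alpha[p] * trans[p][s] for p in range(n_states)) * emit[s][obs[t]]
                for s in range(n_states)
            ]
        scale = sum(alpha)
        if scale == 0.0:
            return float("-inf")  # sequence is impossible under this model
        log_lik += math.log(scale)
        alpha = [a / scale for a in alpha]  # rescale to avoid underflow
    return log_lik


def classify_emotion(obs, models):
    """Return the name of the model with the highest log-likelihood."""
    return max(models, key=lambda name: forward_log_likelihood(obs, *models[name]))


# Hypothetical two-state toy models: (start, transition, emission).
# Emission rows are indexed by state, columns by symbol (0 = low, 1 = high).
START = [0.5, 0.5]
TRANS = [[0.7, 0.3], [0.3, 0.7]]
MODELS = {
    "neutral": (START, TRANS, [[0.9, 0.1], [0.8, 0.2]]),
    "anger": (START, TRANS, [[0.2, 0.8], [0.1, 0.9]]),
}

if __name__ == "__main__":
    print(classify_emotion([1, 1, 1, 0, 1], MODELS))
```

Selecting the maximum-likelihood model is only one possible decision rule; a real system would train the model parameters on a speech corpus rather than set them by hand.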
The course of the dialogue is influenced according to the output of the emotion
recognizer(s). With the help of a user-state model and a situation model, the dialogue
strategy is adapted and an appropriate stylistic realization of its prompts is chosen.
For example, if the user is in a neutral mood and speaks clearly, no confirmations are
necessary and the dialogue can be kept relatively short. However, if the user is angry
and speaks correspondingly unclearly, the system has to try to calm down the user,
but it also has to ask for confirmation often, which in turn makes the user angry...
In principle, there exist two methods to model the influence of these so-called
control parameters like emotions: a rule-based approach, where every eventuality in the
user’s behavior is covered by a rule which contains a suitable reply, or a stochastic
approach, which models the probability of a certain reply depending on the user’s
previous utterances and the corresponding control parameters.
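The contrast between the two methods can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the dialogue manager described later in the book: the function names, the dialogue-act labels, the fallback rule for other emotions and the probability table `REPLY_PROBS` are all hypothetical.

```python
import random


def rule_based_strategy(emotion, speech_is_clear):
    """Rule-based approach: one explicit rule per anticipated situation."""
    if emotion == "neutral" and speech_is_clear:
        # Neutral mood and clear speech: no confirmations, short dialogue.
        return {"confirm": False, "calm_user": False}
    if emotion == "angry":
        # Angry user: try to calm the user and ask for confirmation.
        return {"confirm": True, "calm_user": True}
    # Assumed fallback for all other cases: confirm only unclear speech.
    return {"confirm": not speech_is_clear, "calm_user": False}


# Stochastic approach: P(system reply | previous user act, emotion).
# The numbers are invented for illustration only.
REPLY_PROBS = {
    ("request", "neutral"): {"answer": 0.9, "confirm": 0.1},
    ("request", "angry"): {"apologize_and_confirm": 0.7, "answer": 0.3},
}


def stochastic_reply(user_act, emotion, rng):
    """Sample a reply from the conditional distribution for this context."""
    dist = REPLY_PROBS[(user_act, emotion)]
    replies = list(dist)
    weights = [dist[r] for r in replies]
    return rng.choices(replies, weights=weights, k=1)[0]


if __name__ == "__main__":
    print(rule_based_strategy("angry", speech_is_clear=False))
    print(stochastic_reply("request", "angry", random.Random(0)))
```

The rule table has to anticipate every situation by hand, whereas the probability table could in principle be estimated from dialogue data.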
So how is this book organized? An introduction to the research topic is
followed by an overview on emotions – theories and emotions in speech. In the third
chapter, dialogue strategy concepts with regard to integrating emotions in spoken
dialogue are described. Signal processing and speech-based emotion recognition
are discussed in Chapter 4, and improvements to our proposed emotion recognizers
as well as the implementation of our adaptive dialogue manager are discussed in
Chapter 5. Chapter 6 presents evaluation results of the emotion recognition
component and of the end-to-end system with respect to existing spoken language dialogue
systems evaluation paradigms. The book concludes with a final discussion and an
outlook on future research directions.
Ulm, May 2009
Johannes & Angela Pittermann
Wolfgang Minker
Contents
1 Introduction
1.1 Spoken Language Dialogue Systems
1.2 Enhancing a Spoken Language Dialogue System
1.3 Challenges in Dialogue Management Development
1.4 Issues in User Modeling
1.5 Evaluation of Dialogue Systems
1.6 Summary of Contributions
2 Human Emotions
2.1 Definition of Emotion
2.2 Theories of Emotion and Categorization
2.3 Emotional Labeling
2.4 Emotional Speech Databases/Corpora
2.5 Discussion
3 Adaptive Human–Computer Dialogue
3.1 Background and Related Research
3.2 User-State and Situation Management
3.3 Dialogue Strategies and Control Parameters
3.4 Integrating Speech Recognizer Confidence Measures into Adaptive Dialogue Management
3.5 Integrating Emotions into Adaptive Dialogue Management
3.6 A Semi-Stochastic Dialogue Model
3.7 A Semi-Stochastic Emotional Model
3.8 A Semi-Stochastic Combined Emotional Dialogue Model
3.9 Extending the Semi-Stochastic Combined Emotional Dialogue Model
3.10 Discussion
4 Hybrid Approach to Speech–Emotion Recognition
4.1 Signal Processing
4.2 Classifiers for Emotion Recognition
4.3 Existing Approaches to Emotion Recognition
4.4 HMM-Based Speech Recognition
4.5 HMM-Based Emotion Recognition
4.6 Combined Speech and Emotion Recognition
4.7 Emotion Recognition by Linguistic Analysis
4.8 Discussion
5 Implementation
5.1 Emotion Recognizer Optimizations
5.2 Using Multiple (Speech–)Emotion Recognizers
5.3 Implementation of Our Dialogue Manager
5.4 Discussion
6 Evaluation
6.1 Description of Dialogue System Evaluation Paradigms
6.2 Speech Data Used for the Emotion Recognizer Evaluation
6.3 Performance of Our Emotion Recognizer
6.4 Evaluation of Our Dialogue Manager
6.5 Discussion
7 Conclusion and Future Directions
A Emotional Speech Databases
B Used Abbreviations
References
Index
Chapter 1
Introduction
“How may I help you?” (cf. Gorin et al. 1997) – Imagine you are calling your
travel agency’s telephone hotline and you don’t even notice that you are talking to
a computer. Would you be surprised if your virtual dialogue partner recognized you
by means of your voice and if it asked you how you liked your previous trip?
The ongoing trend of computers becoming more powerful, smaller, cheaper and
more user-friendly leads to the effect that these devices increasingly gain in
importance in everyday life and become “invisible”. Within this so-called ubiquitous
computing there exists a large variety of applications and data structures, ranging from
information retrieval systems to control tasks and emergency call functionality. In
order to handle these applications, a manageable user interface is required, which
can be realized with the aid of a spoken language dialogue system (SLDS).
In this chapter, we give a brief overview of the functionality of SLDSs and their
implementation in current dialogue applications. Some of the ideas presented here
are already applied successfully in state-of-the-art dialogue applications; other ideas are
still part of ongoing research. Thus, certain challenges still exist in the
development of speech applications (see also Minker et al. 2006b). In this book, we address
the user-friendliness and the naturalness of an SLDS. This includes the adaptation
of the dialogue to the user’s emotional state and, to accomplish that, the
recognition of emotions from the speech signal. Therefore, we describe the architecture
of an SLDS and refer to approaches where regular dialogue systems may be
improved and how these improvements can be realized. Here, in Sections 1.2–1.4,
challenges in the development of adaptive dialogue management in particular are
addressed. In Chapters 3–5, we describe our strategies of integrating emotions into
adaptive dialogue management and our approach to speech-based emotion
recognition and its derivatives like combined speech–emotion recognition and optimization
approaches. An evaluation of our methods as well as a summary and a discussion of
future perspectives are given in Chapters 6 and 7.
J. Pittermann et al., Handling Emotions in Human-Computer Dialogues,
DOI 10.1007/978-90-481-3129-7_1,
© Springer Science+Business Media B.V. 2010