Polychroniou, Anna (2014) The SSPNet-Mobile Corpus: from the detection of non-verbal cues to the inference of social behaviour during mobile phone conversations. PhD thesis. http://theses.gla.ac.uk/5686/

Copyright and moral rights for this thesis are retained by the author. A copy can be downloaded for personal non-commercial research or study, without prior permission or charge. This thesis cannot be reproduced or quoted extensively from without first obtaining permission in writing from the author. The content must not be changed in any way or sold commercially in any format or medium without the formal permission of the author. When referring to this work, full bibliographic details including the author, title, awarding institution and date of the thesis must be given.

Glasgow Theses Service
http://theses.gla.ac.uk/
[email protected]

THE SSPNET-MOBILE CORPUS: FROM THE DETECTION OF NON-VERBAL CUES TO THE INFERENCE OF SOCIAL BEHAVIOUR DURING MOBILE PHONE CONVERSATIONS.

ANNA POLYCHRONIOU

SUBMITTED IN FULFILMENT OF THE REQUIREMENTS FOR THE DEGREE OF Doctor of Philosophy

SCHOOL OF COMPUTING SCIENCE
COLLEGE OF SCIENCE AND ENGINEERING
UNIVERSITY OF GLASGOW

JULY 2014

© ANNA POLYCHRONIOU

Abstract

Mobile phones are one of the main channels of communication in contemporary society. However, the effect of the mobile phone on both the negotiation process and the non-verbal behaviours used during conversations mediated by this technology remains poorly understood. This thesis investigates the role of the phone in the negotiation process, as well as the automatic analysis of non-verbal behavioural cues during mobile phone conversations, following the Social Signal Processing approach.
The work in this thesis includes the collection of a corpus of 60 mobile phone conversations involving 120 subjects; the development of methods for the detection of non-verbal behavioural events (laughter, fillers, speech and silence); the inference of characteristics influencing social interactions (personality traits and conflict handling style) from speech and movements while using the mobile telephone; and the analysis of several factors that influence the outcome of decision-making processes while using mobile phones (gender, age, personality, conflict handling style and caller versus receiver role).

The findings show that behavioural events can be recognised at levels well above chance by employing statistical language models, and that personality traits and conflict handling styles can be partially recognised. Among the factors analysed, participant role (caller versus receiver) was the most important in determining the outcome of negotiation processes in the case of disagreement between parties. Finally, the corpus collected for the experiments (the SSPNet-Mobile Corpus) has been used in an international benchmarking campaign and constitutes a valuable resource for future research in Social Signal Processing and, more generally, in the area of human-human communication.

Acknowledgements

I strongly want to thank my parents for the endless emotional and moral support they offered me during this truly challenging three-year effort. This thesis would not have been accomplished without them. 'Mum and Dad, thank you very much for your invaluable moral and psychological support and for your constant encouragement. I would not have managed to complete this difficult endeavour successfully without you.'

I feel grateful to my supervisors Dr. Alessandro Vinciarelli and Dr. Rod Murray-Smith for giving me the opportunity to become a researcher.

I would like to express my thankfulness to my examiners Dr. Hayley Hung and Dr. Maurizio Filippone for helping me to improve my research perspectives and skills with their well-aimed comments.
I would like to express my gratitude to my colleague, and my best Swiss friend, Dr. Hugues Salamin for his objective and acute view on critical issues of this research, for generously offering his expertise on high-level statistics and on the technical issues that occurred, and, finally, for keeping a positive, honest and supportive attitude.

Last, but not least, my friend and colleague Rebecca for her help and support.

To myself.

Contents

1 Introduction
  1.1 Introduction
  1.2 Thesis Statement and Research Goals
  1.3 Thesis Structure
  1.4 List of Publications
2 Non-verbal Behaviour
  2.1 Introduction
  2.2 How Non-Verbal Behaviour Is Expressed
  2.3 Methods To Observe Non-Verbal Communication
  2.4 The Method Of This Thesis
3 State-of-the-Art
  3.1 Reviewing Telephone Corpora
  3.2 Reviewing Technology-Mediated Corpora
  3.3 Reviewing Corpora on Prediction of Social Information
    3.3.1 Corpora on Group Interaction
    3.3.2 Corpora on Various-Type Interaction
  3.4 Requirements of Corpus Collection
  3.5 The SSPNet-Mobile Corpus
  3.6 Conclusions
4 The SSPNet-Mobile Corpus: Acquisition of the Data
  4.1 Introduction
  4.2 Subjects
  4.3 Experiment
  4.4 Psychological Tests
    4.4.1 Conflict Handling Style: the Rahim Conflict Inventory-II (ROCI-II)
    4.4.2 Personality Traits: The Big Five Inventory-10 (BFI-10)
  4.5 Recording Human Behaviour
    4.5.1 Sensors and Signals
    4.5.2 Synchronisation
  4.6 Conclusions
5 The SSPNet-Mobile Corpus: Annotation of the Data
  5.1 Introduction
  5.2 Annotation Model
    5.2.1 Behavioural Events
    5.2.2 Topics
  5.3 Annotated Behavioural Events
    5.3.1 Speaking Time
    5.3.2 Laughter
    5.3.3 Overlapping Speech
    5.3.4 Back Channel
    5.3.5 Fillers
    5.3.6 Silence
  5.4 Conclusions
6 The "Caller-Receiver Effect" on Negotiations Using Mobile Phones
  6.1 Introduction
  6.2 Previous Work
    6.2.1 Negotiation and Interpersonal Relationships
    6.2.2 Technology and Behaviour Change
  6.3 The Caller-Receiver Effect
    6.3.1 Gender Effects
    6.3.2 Age Effects
    6.3.3 Personality Effects
    6.3.4 Conflict Handling Style Effects
  6.4 Conclusions
7 Personality Traits and Conflict Handling Style Recognition from Audio and Motor Activation Data
  7.1 Introduction
  7.2 The Data
  7.3 Experiments and Results
    7.3.1 Speech Features
    7.3.2 Motor Activation Features
    7.3.3 Recognition
    7.3.4 Results
    7.3.5 Further Experimentation
  7.4 Conclusions
8 Automatic Detection of Laughter and Fillers
  8.1 Introduction
  8.2 Previous Work
    8.2.1 The Classification Problem
    8.2.2 The Segmentation Problem
  8.3 The Data
  8.4 The Model
    8.4.1 Hidden Markov Model
    8.4.2 Features and Model Parameters
  8.5 Experiments and Results
    8.5.1 Performance Measures
    8.5.2 Detection Results
    8.5.3 ComParE Interspeech 2013 Challenge
  8.6 Conclusions
9 Conclusions
  9.1 Introduction
  9.2 Results and Contributions of the Thesis
  9.3 Data Efficacy and Future Improvements
  9.4 Future Work
  9.5 Final Remarks
A Protocol and Scenario
B Consent Form
C The BFI-10 Questionnaire
D The Conflict Handling Style Questionnaire
Bibliography

List of Tables

3.1 Past corpora collected for the investigation of various aspects of behaviour during meetings of small groups or dyads.
4.1 The table reports the number and the percentages of participants per gender, educational background and nationality.
4.2 Motion Sensors.
6.1 Gender effects. The table reports the gender composition of the subjects ("Total" column) as well as the persuasiveness of male and female subjects at both call and item level. According to a two-tailed binomial test, the p-value is higher than 0.42 for all persuasiveness figures.
7.1 The values of variance v for PCA and gamma γ that correspond to the model giving the best performances after cross-validation. The cost (regularisation term) C for SVM is equal to 0.01 in all cases.
7.2 Accuracy for personality traits (upper part) and conflict handling styles (lower part) using speech, phone movements and their combination (S+M). Bold values are higher than the a-priori probability of the most frequent class (second column from the left) to a statistically significant extent (p-value at 5%).
8.1 The table reports the details of the main recent works on laughter detection presented in the literature. The following abbreviations are used: A = Audio, V = Video, C = Classification, S = Segmentation, Acc = Accuracy, EER = Equal Error Rate.
8.2 HMM performance over the 5-fold setup.
8.3 HMM performance over the challenge setup.
8.4 Confusion matrix for the 2-gram language model with λ = 100. The rows correspond to the ground truth and the columns to the class attributed by the classifier. Each cell is a time in seconds.
8.5 The table reports the approaches, the extracted features and the performances of each participant in the Social Signal Sub-Challenge of the Interspeech 2013 Computational Paralinguistics Challenge. The measurements are the Area Under Curve (AUC) for laughter and filler separately and the Unweighted Average of Area Under Curve (UAAUC) of the two classes, presented in the three columns on the right. The first line corresponds to the results presented by the organisers of the ComParE. The last line presents the results of the approach presented in this thesis. The numbers in the Features column correspond to certain features. 1: Intensity contour, 2: Pitch contour, 3: Timbral contour, 4: Rhythmic patterns, 5: Spectral tilt, 6: Duration, 7: Length of preceding and following pauses. For further explanation of 1 to 4 see [Oh et al., 2013] and for a full description of phonetic features see [Wagner et al., 2013].