Implicit Media Tagging and Affect Prediction from video of spontaneous facial expressions, recorded with depth camera

By DANIEL HADAR
Under the supervision of PROF. DAPHNA WEINSHALL

Department of Cognitive Science
HEBREW UNIVERSITY

A thesis submitted in partial fulfillment of the requirements for the degree of MASTER OF COGNITIVE SCIENCE in the Faculty of Humanities.

DECEMBER 2016

arXiv:1701.05248v1 [cs.HC], 18 Jan 2017

ABSTRACT

We present a method that automatically evaluates emotional response from spontaneous facial activity recorded by a depth camera. The automatic evaluation of emotional response, or affect, is a fascinating challenge with many applications, including human-computer interaction, media tagging and human affect prediction. Our approach to this problem is based on the inferred activity of facial muscles over time, as captured by a depth camera recording an individual's facial activity. Our contribution is two-fold: First, we constructed a database of publicly available short video clips, which elicit a strong emotional response in a consistent manner across different individuals. Each video was tagged by its characteristic emotional response along 4 scales: Valence, Arousal, Likability and Rewatch (the desire to watch again). The second contribution is a two-step prediction method, based on learning, which was trained and tested using this database of tagged video clips. Our method was able to successfully predict the aforementioned 4-dimensional representation of affect, as well as to identify the period of strongest emotional response in the viewing recordings, in a manner that is blind to the video clip being watched, revealing a significantly high agreement between the recordings of independent viewers.

ACKNOWLEDGMENTS

My humble thanks and appreciation to my supervisor, Prof. Daphna Weinshall, who guided me throughout this research and was deeply involved in it. I have been privileged to have her guidance and support.

In addition, I am grateful to Talia Granot for her practical assistance regarding utilizing facial expressions, as well as our brainstorming meetings that provided me with new ideas and inspiration.

Many thanks also go to the readers of this dissertation, Hillel Aviezer and Nir Fierstein, for their constructive comments.

Last but not least, I would like to thank my parents, Limor and Shuki, for their never-ending backing and support, and my beloved wife Liat, for her endless optimism and encouragement.

TABLE OF CONTENTS

List of Tables
List of Figures
1 Introduction
2 Theoretical Background and Previous Work
  2.1 Quantification of Emotion and Facial Expressions
  2.2 Implicit Media Tagging and Affect Prediction
  2.3 Depth Cameras
  2.4 Facial Response Highlight Period
3 Database Construction
  3.1 Eliciting Emotion via Video Clips
  3.2 Database Criteria
  3.3 Overview
  3.4 Method
  3.5 Results and Analysis
4 Method
  4.1 Experimental Design
  4.2 Facial Expressions Recording
  4.3 Features
  4.4 Predictive Models
5 Results and Analysis
  5.1 Learning Performance
  5.2 Relative Importance of Features
  5.3 Localization of Highlight Period
6 Discussion
Appendix A Review of Emotion Eliciting Databases
Bibliography
LIST OF TABLES

3.1 Emotion eliciting database summary.
3.2 Mean, median, standard deviation and range of the different scales over all clips, as well as the Intraclass Correlation Coefficient (ICC) and Cronbach's α.
4.1 Summary of models learned.
5.1 Mean (and std) of Pearson's R between the predicted and actual ranks, as well as the average correlation between the subjective report and media tags.
5.2 Accuracy (and std) of the derived binary measure.
5.3 Mean (and std) of Pearson's R between the predicted and actual ranks of AP* and the One-Viewer-Out method.
5.4 Mean error of all viewers' subjective ranks and the predictions of AP-1, IMT-1 and IMT-2.
A.1 Emotion eliciting video clip databases.

LIST OF FIGURES

1.1 The triangular relationship between the facial expression, the media tags and the viewer's affective state.
1.2 Facebook's reactions.
2.1 Examples from the Facial Action Coding System [28].
2.2 Illustration of the difference between affect prediction (f) and implicit media tagging (g).
3.1 Correlations between valence, arousal, likability and rewatch.
3.2 Distribution of 4 subjective scores over all tested clips, where valence and arousal define the two main axes, also summarized in histogram form above and to the right of the plot. Size and color correspond respectively to the remaining 2 scores of rewatch and likability.
4.1 The experimental design.
4.2 (A) Quantized AU signal (K=4), and (B) its corresponding transition matrix. The number of frames labeled 0, 1, 2, 3 is 6, 3, 5, 5, respectively. Therefore: ActivationRatio = 13/19, ActivationLength = 7/19 and ActivationLevel = 1.473, and ChangeRatio = 6/18, SlowChangeRatio = 4/18, FastChangeRatio = 2/18.
4.3 Facial response (middle row) to a video clip (illustrated in the top row), and the time-varying intensity of AU12 (bottom row).
4.4 Illustration of the two-step prediction algorithm.
5.1 Correlation example between predicted and actual ratings of a single viewer's valence score (R = 0.791).
5.2 The relative contribution of different feature groups to IMT-1.
5.3 The relative contribution of different feature groups to AP-1.
5.4 Histogram of HPs relative to the clips' end time, which marks the origin of the X-axis (µ = −7.22, σ = 4.14, χ² = 0.86).
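The worked example in the caption of Figure 4.2 can be made concrete with a short sketch. This is a minimal illustration, assuming only that a quantized AU signal is given as a sequence of integer levels in {0, ..., K−1}; the feature definitions are inferred from the caption and may differ from the exact formulations in Chapter 4, and ActivationLength is omitted because its definition cannot be recovered from the caption alone.

```python
import numpy as np

def au_signal_features(levels):
    """Illustrative statistics over a quantized AU signal (integer levels 0..K-1).

    The definitions below are inferred from the Figure 4.2 caption and may
    differ from the exact formulations given in Chapter 4.
    """
    x = np.asarray(levels)
    diffs = np.abs(np.diff(x))  # len(x) - 1 frame-to-frame transitions

    return {
        "ActivationRatio": float(np.mean(x > 0)),       # fraction of frames with any activation
        "ActivationLevel": float(np.mean(x)),           # mean quantized intensity over all frames
        "ChangeRatio": float(np.mean(diffs > 0)),       # transitions that change the level at all
        "SlowChangeRatio": float(np.mean(diffs == 1)),  # transitions that move by exactly one level
        "FastChangeRatio": float(np.mean(diffs > 1)),   # transitions that jump by more than one level
    }

# A 19-frame signal with 6, 3, 5, 5 frames at levels 0, 1, 2, 3 and six level
# changes (four slow, two fast), consistent with the Figure 4.2 caption:
example = [0, 0, 0, 1, 1, 3, 3, 3, 2, 2, 2, 2, 2, 3, 3, 1, 0, 0, 0]
print(au_signal_features(example))
# ActivationRatio = 13/19, ActivationLevel = 28/19 ~ 1.473, ChangeRatio = 6/18,
# SlowChangeRatio = 4/18, FastChangeRatio = 2/18
```

The example signal is chosen only to match the frame counts and change statistics quoted in the caption; it is not necessarily the signal plotted in the figure, since those counts alone do not determine the ordering of frames.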
CHAPTER 1

INTRODUCTION

The Roman philosopher Cicero wrote, "The face is a picture of the mind". This statement has been discussed and debated repeatedly over the years, within the scientific community and outside it – is the face really a window to the soul? Is human emotion reflected in facial expressions? There is broad agreement among scholars that the answer is, at least partially, yes. Hence, we asked the following question – could it be done automatically? That is, could computer vision tools be utilized to evaluate humans' emotional state, given their facial expressions?

In the past two decades we have witnessed an increased interest in automatic methods that extract and analyze human emotions, or affective state. The potential applications of automatic affect recognition vary from human-computer interaction to emotional media tagging, including for example the creation of a user's profile on various platforms, building emotion-driven HCI systems, and emotion-based tagging of dating sites, videos on YouTube or posts on Facebook. Indeed, in recent years media tagging has received much attention in the research community (e.g. [83, 92]).

In this work we took advantage of the emerging technology of depth cameras. Recently, depth cameras based on structured light technology have emerged as a means to achieve effective human-computer interaction in gaming, based on both gestures and facial expressions [1]. We used a depth camera (Carmine 1.09) to record participants' facial responses to a set of video clips designed to elicit emotional responses, and developed two types of pertinent prediction models for automatic quantitative evaluation of affect: models for tagging video clips from human facial expressions (i.e. implicit media tagging), and models for predicting viewers' affective state, given their facial behavior (i.e. affect prediction). Accordingly, a clear separation should be drawn between the two.

Both implicit media tagging and affect prediction concern the estimation of emotion-related indicators based on non-verbal cues, but they differ in their target: in the former, the purpose is predicting attributes of the multimedia stimulus, while in the latter, the human affect is the matter at hand. This distinction can be made clear by observing the triangular relationship between the video clip's affective tagging, the facial response to it and the viewer's reported emotional feedback (see Figure 1.1): implicit media tagging concerns the automated annotation of a stimulus directly from spontaneous human response, while affect prediction deals with predicting the viewer's affective state. Note that object and location identification (e.g. [88]) is outside the scope of this work, which deals only with emotion-related tags.

FIGURE 1.1: The triangular relationship between the facial expression, the media tags and the viewer's affective state.
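To make this distinction concrete, the two tasks can be viewed as two mappings from the same kind of input, as with the f and g of Figure 2.2. The following sketch is an interface-level illustration only, with hypothetical type and function names (FacialResponse, ViewerAffect, MediaTags, predict_affect, tag_media); the actual feature and label representations used in this work are defined in Chapters 3 and 4.

```python
from dataclasses import dataclass
from typing import Sequence

# Hypothetical type names, for illustration only; the actual feature and label
# representations used in this work are defined in Chapters 3 and 4.

@dataclass
class FacialResponse:
    """Features extracted from one viewer's recording while watching one clip."""
    au_features: Sequence[float]

@dataclass
class ViewerAffect:
    """Target of affect prediction: the viewer's own (reported) state."""
    valence: float
    arousal: float
    likability: float
    rewatch: float

@dataclass
class MediaTags:
    """Target of implicit media tagging: emotional tags attached to the clip itself."""
    valence: float
    arousal: float
    likability: float
    rewatch: float

def predict_affect(response: FacialResponse) -> ViewerAffect:
    """f in Figure 2.2: from the facial response to this viewer's affective state."""
    raise NotImplementedError  # a learned model; see Chapter 4

def tag_media(response: FacialResponse) -> MediaTags:
    """g in Figure 2.2: from the facial response to tags of the clip being watched."""
    raise NotImplementedError  # a learned model; see Chapter 4
```

Both functions consume the same facial response; they differ only in whose property the output describes – the viewer in affect prediction, the clip in implicit media tagging.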
As opposed to explicit tagging, in which the user is actively involved in the tagging process, implicit tagging is done passively, and relies only on the typical interaction the user has with such stimuli (e.g. watching a video clip). As such, it is less time- and energy-consuming, and more likely to be free of biases. It has been suggested that explicit tagging tends to be rather inaccurate in practice; for example, users tend to tag videos according to their social needs, which yields tagging that could be reputation-driven, especially in a setup where the user's friends, colleagues or family may be exposed to their tags [77].

In recent years, media tagging has become an integral part of surfing the internet. Many web platforms allow (and even encourage) users to label their content by using keywords (e.g. funny, wow, LOL) or designated scales (e.g. Facebook's reactions, see Figure 1.2). Clearly, non-invasive methods which produce such tags implicitly can be of great interest. That being said, implicit media tagging is complementary (rather than contradictory) to explicit media tagging, and as such can be used for assessing the correctness of explicit tags [45], or for examining the inconsistency between intentional (explicit) and involuntary (implicit) tagging [63, 64].

FIGURE 1.2: Facebook's reactions.
