
UNSUPERVISED LATENT BEHAVIOR MANIFOLD LEARNING FROM ACOUSTIC FEATURES: AUDIO2BEHAVIOR

Haoqi Li¹, Brian Baucom², Panayiotis Georgiou¹
¹University of Southern California, Los Angeles, CA, USA
²The University of Utah, Department of Psychology, UT, USA

arXiv:1701.03198v1 [cs.LG] 12 Jan 2017

Work supported by NSF and DoD.

ABSTRACT

Behavioral annotation using signal processing and machine learning is highly dependent on training data and on manual annotations of behavioral labels. Previous studies have shown that speech information encodes significant behavioral information and can be used in a variety of automated behavior recognition tasks. However, extracting behavior information from speech is still a difficult task. Training data are sparse, speech is complex and high-dimensional, and it encodes multiple information streams. In this work we exploit the slowly varying properties of human behavior. We hypothesize that nearby segments of speech share the same behavioral context and hence share a similar underlying representation in a latent space. Specifically, we propose a Deep Neural Network (DNN) model to connect behavioral context and derive the behavioral manifold in an unsupervised manner. We evaluate the proposed manifold in the couples therapy domain and also provide examples from publicly available data (e.g., stand-up comedy). We further investigate training within the couples' therapy domain and from movie data. The results are extremely encouraging and promise improved behavioral quantification in an unsupervised manner. They warrant further investigation in a range of applications.

Index Terms: Behavior Signal Processing, manifold learning, unsupervised learning, behavior representation

1. INTRODUCTION

Analysis and classification of human behaviors is one of the core tasks of observational study. For example, in couples therapy, psychologists observe and identify domain-specific behaviors (e.g., blame and acceptance) during couple interactions. They then provide specific treatments based on their analysis.

Behavior estimation is a complicated task. Unlike emotions, human behaviors such as acceptance are often manifested over long time scales, so longer context needs to be considered when human annotators attempt to quantify behavior. Because of that, human raters need to combine information at different timescales to estimate behaviors correctly. It is difficult to simulate the complex, non-linear nature of this annotation process using one specific algorithm. Moreover, data with rich behavioral information from psychotherapy domains are often severely limited in quantity due to privacy constraints and the cost of annotation.

Behavior Signal Processing (BSP) [1, 2] integrates machine learning and signal processing methods. It employs acoustic [3, 4], lexical [5, 6], and visual [7, 8] information to model and analyze multi-modal human behaviors. For example, in the couples therapy domain, Black et al. [3] used acoustic features to build an automatic human behavioral coding system for couples' interactions. To deal with data sparsity, [9] introduced a sparsely connected and disjointly trained deep neural network (SD-DNN) framework that limits the number of parameters trained at any time.

Despite these efforts in the BSP domain, it is still challenging to extract effective behavior representations from high-dimensional acoustic features. Over the last few years, Deep Neural Networks have demonstrated promise in learning high-level representations from raw data. For instance, a DNN can be trained with audio features as input and corresponding labels as targets (e.g., emotion recognition in [10, 11], keyword spotting in [12]). Its output can then be regarded as a representation of the raw input data. However, this supervised framework fails in our specific domain, since it requires a huge amount of training data with annotated labels. Data sparseness limits the use of AI methods for emotion, stress, and behavior estimation. Thus, in this work we propose an unsupervised way of exploiting data for the BSP domain. We further investigate whether out-of-domain data can be employed for in-domain behavioral quantification.

Recently, context information has been used for a range of applications. For instance, in developing the word2vec model, Mikolov et al. [13, 14] proposed an embedding that ties 1-hot word representations of nearby words via an intermediate, hidden vector representation. This resembles auto-encoders and bottleneck representations [15, 16]: the hidden layer attempts to connect the information at the input and output layers. In this case, however, the information resides at a longer scale than either of the two representations, namely the context.

Our proposed framework employs an idea similar to word2vec. Humans employ a large temporal window to observe the context and evaluate behaviors. We can therefore assume that behavior remains relatively constant within a sufficiently long window. This matches annotation guidelines in the field of psychology, where the minimum observation window is usually set at 30 seconds. It also matches the empirical understanding of behavior. For example, a person can be sad during a conversation for a long window of time despite different intonations and speech patterns throughout that window. This is often the case in couples therapy interactions as well as in everyday life.

In this paper, we propose unsupervised behavior manifold learning using a Deep Neural Network trained on unlabeled acoustic features. We learn the manifold both from unlabeled within-domain data and from Out-Of-Domain (OOD) data. We evaluate whether the knowledge gained includes behaviorally meaningful information under both within-domain and OOD training, and under both within-domain and OOD testing.

The rest of the paper is organized as follows. Section 2 describes in detail our proposed manifold learning, which obtains a behavior representation in an unsupervised manner. Section 3 briefly describes the databases used in our paper. Section 4 describes the audio processing, feature extraction steps, and experimental settings. We then discuss our results in Section 5. Finally, we give our conclusions and future work in Section 6.
2. METHODOLOGY

The success of machine learning algorithms can be attributed to two main properties. First, the DNN can represent any function. Second, it can learn that function from large amounts of data. The underlying representations that the DNN identifies are critical to its success [17, 18]. In the BSP domain, we often suffer from a lack of data, while the complexity of the signal requires high-dimensional acoustic features. The goal of this paper is to identify, in an unsupervised manner, a latent manifold where the signal retains its behavioral characteristics. In this behavioral manifold we expect similar behaviors to appear closer together than they do in the original signal space or in the feature space. Based on the geometric notion of manifolds, the learned representation can be associated with an intrinsic coordinate system on the embedded manifold [17]. In our case, an effective behavior manifold should preserve information residing on a "behavioral axis", while removing other acoustically encoded information.

One reasonable assumption is that a person's behavioral state varies slowly. Note that behavior changes much more slowly than emotional expression, despite the close relation between the two. This means that if we look at a very short interval of behavior (say 5s) and the following interval (say the next 5s), we will most likely observe the same or a very similar behavioral state. Based on this assumption we create a model that exploits context and ties the two intervals via the proposed reduced-dimensionality embedding vector space.

We acknowledge and expect one complication with this assumption: nearby frames also encode speaker characteristics, as well as acoustic conditions such as environment and channel. We discuss this further in Section 5.

[Figure: audio is converted into feature-space frames. Frame (k) passes through hidden layers, a bottleneck layer, and further hidden layers to predict the context frames (k−w), (k−w+1), …, (k+w). Training uses unlabeled in-domain or out-of-domain data. At test time, a test frame is matched to the closest labeled frame (pick closest sample), either in the behavior manifold (proposed) or in the feature space (baseline).]
Fig. 1. Behavior representation training framework

2.1. Training framework

Our proposed training framework is similar to an autoencoder. However, rather than training to reconstruct the input, our system trains to reconstruct neighboring frames. As shown in Fig. 1, for the kth frame of acoustic features, the outputs are frames k−w, ..., k+w, excluding the kth frame itself. Here w is the size of the window within which we consider behavioral context to remain relatively constant. Building such an unsupervised corpus lets us train with backpropagation, as in standard DNN tasks, and so learn the underlying behavioral manifold representation.

2.2. Behavior manifold representation

After training, we use the output of the bottleneck layer as the behavior representation. In general, the dimension of this hidden layer is smaller than that of the original feature space. This process can therefore also be regarded as feature dimensionality reduction or compression.

2.3. Evaluation

Since we train our model with an unsupervised method, we need to demonstrate that the representations indeed include behavior information. We intend to do this on two kinds of evaluation data.

(i) From the field of psychology, we employ Couples Therapy interactions as a case study and compare the underlying representations of similarly rated sessions. For example, after learning the manifold on unsupervised data, a test session (with known rating) can be compared in the manifold space with all known samples of negative/positive behaviors, and the closest match can be selected.

(ii) We also collect a range of samples from political speeches, stand-up comedy, etc. and compare their pairwise similarity.

Details of the datasets are provided below.

3. CORPUS

For the unsupervised training process we utilize two corpora:

• Ti: For in-domain BSP data (Train-in-domain: Ti) we employ the couples therapy database from the UCLA/UW Couple Therapy Research Project [19]. In this database, 134 couples participated in video-taped interactions about marital issues. In each session, one relationship-related topic (e.g., "Why can't you leave my stuff alone?") was initiated. Behavioral labels exist for this corpus, although they are not used for training.
• To: For out-of-BSP-domain training (To), we collected around 400 hours of audio from a range of movies. Many of the selected movies include large parts of emotional conversations reflecting a range of behaviors.

For testing we also employ two datasets:

• Eo: For out-of-domain evaluation (Eo), we collected audio from two different speakers for each of the following scenarios: stand-up comedy by comedians who employ anger as an elicitation mechanism, stand-up comedy by comedians without angry behavior, political debates, TED talks, and eulogies (see Table 1). Each audio recording is around 10 minutes long.
• Ei: Within the BSP domain we employ the labels of our couples therapy data. Trained human annotators evaluated each participant's behavior for a set of 33 behaviors (e.g., "Acceptance", "Blame", etc.). The ratings follow the standard Couples Interaction Rating System [20] and Social Support Interaction Rating System [21]. Each annotator rated each behavior from 1 to 9 at the session level, in terms of the presence of this behavior. In this work we show relationships with 4 of the behaviors by binarizing the top and bottom 20% of the original ratings.

Table 1. Out of Domain test data

Comedy (angry tone):    1. George Carlin; 2. Richard Pryor
Comedy (milder tone):   3. Jim Gaffigan; 4. Steve Hofstetter
Political debate:       5. Final Republican Presidential Debate, 2015; 6. Vice Presidential Debate, 2012
TED talk:               7. Kevin Slavin; 8. Christopher Steiner
Eulogy:                 9. Eulogy for a Son (YouTube); 10. Mr. Li Hongyi's Eulogy for the late Mr. Lee Kuan Yew
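Returning to the training framework of Section 2.1, a few lines of Python can sketch how unlabeled audio is turned into a supervised reconstruction task. This is a minimal illustration, not the authors' code, and it rests on several assumptions:

- Per-window feature vectors (Section 4.2) are already computed at a 1s shift, so one frame offset equals one second.
- The values w = 6 and five sampled context frames are borrowed from Section 4.3.
- Each sampled context frame becomes its own training target.
- The helper name `make_context_pairs` is hypothetical.

```python
import random
from typing import List, Sequence, Tuple

Frame = Sequence[float]


def make_context_pairs(
    frames: Sequence[Frame],
    w: int = 6,
    n_context: int = 5,
    seed: int = 0,
) -> List[Tuple[Frame, Frame]]:
    """Build (input, target) pairs for context reconstruction.

    For every frame k, up to `n_context` distinct frames are drawn uniformly
    from positions k-w .. k+w, with frame k itself excluded and the window
    clipped at the start and end of the recording. Each drawn frame becomes
    one target that the network should predict from frame k.
    Call this once per recording so context never crosses recordings.
    """
    rng = random.Random(seed)  # fixed seed keeps the pairing reproducible
    pairs: List[Tuple[Frame, Frame]] = []
    n = len(frames)
    for k in range(n):
        candidates = [j for j in range(max(0, k - w), min(n, k + w + 1)) if j != k]
        if not candidates:  # a single-frame recording has no context
            continue
        # Sample without replacement so the targets of frame k are distinct.
        for j in rng.sample(candidates, min(n_context, len(candidates))):
            pairs.append((frames[k], frames[j]))
    return pairs


if __name__ == "__main__":
    # Toy "recording": 20 one-dimensional frames whose value is their index.
    toy = [(float(t),) for t in range(20)]
    print(len(make_context_pairs(toy)))  # 100: 5 targets for each of 20 frames
```

Treating each context frame as a separate target mirrors skip-gram training in word2vec [13, 14]. Section 4.3 could equally be read as feeding the five targets jointly, for example concatenated into one output vector. The sketch does not settle which variant was used.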
4. EXPERIMENTS

4.1. Audio processing

The available couples therapy recordings have limitations, so some pre-processing is needed to remove sessions with low SNR. Further, since these are dyadic interaction data, we wanted to diarize the interactions. In this work, we employed the same pre-processing as in [3]. In short, we utilize all interactions with an SNR above 5dB. We perform Voice Activity Detection (VAD) to identify spoken regions and Speaker Diarization to identify same-speaker regions.

For the movie dataset we did not perform any pre-processing. All frames are therefore treated the same, including silence, music, and regions with changing speakers.

4.2. Acoustic feature extraction

We extract acoustic features characterizing:
• speech prosody (pitch, intensity and their derivatives);
• spectral envelope characteristics (MFCCs, MFBs, LPCs and their derivatives);
• voice quality (jitter, shimmer and their derivatives).

All of these Low-Level Descriptors (LLDs) are extracted every 10ms with a 25ms Hamming window using openSMILE [22]. Within each frame, we compute the following functionals of all these acoustic features: Min (1st percentile), Max (99th percentile), Range (99th percentile minus 1st percentile), Mean, Median, and Standard Deviation.

Behavior varies much more slowly over time than basic emotion, so a longer frame window is necessary for its analysis. We want to estimate meaningful behavioral metrics while maintaining high resolution, so we use 20s and 5s windows with a 1s shift.

4.3. Experimental setup

We conducted three types of experiments, using different training or evaluation datasets. In our experiments, the input feature dimension is 420, as described in Section 4.2, and the dimension of the bottleneck layer is set to 64. To generate the training data, we set the window range w to 6. For each frame, we randomly pick 5 context frames within the context window as reconstruction targets for that frame. For instance, consider an input of audio from 100s to 120s. Its context could be any five frames spanning (100+i)s to (120+i)s, where i ∈ {−6, −5, ..., 6}. Following Section 2.1, i ≠ 0, since the input frame itself is excluded.

The three experiments are as follows:

• Experiment (1): Unsupervised training on the couples therapy corpus (Ti), and evaluation on the couples therapy corpus (Ei).
• Experiment (2): Unsupervised training on the movie corpus (To), and evaluation on the couples therapy corpus (Ei).
• Experiment (3): Unsupervised training on the movie corpus (To), and evaluation on the audio sessions listed in Table 1, which represent different behavior styles (To, Eo).

4.4. Evaluation method for in-domain couples therapy

As mentioned before, we have only session-level ratings for the couples therapy corpus. For each behavior code and each gender, we selected 70 sessions at one extreme of the code (e.g., high acceptance) and another 70 sessions at the other extreme (e.g., low acceptance). We binarize the behavior to provide evaluation class labels and to achieve higher inter-annotator agreement. For the couples therapy dataset we use a supervised evaluation procedure, even though the behavioral manifold has been trained in an unsupervised manner.

We first obtain the latent manifold representation for each frame. We then use the Euclidean distance to find the closest "reference frame", the nearest frame among all the labeled frames from other couples. This leave-one-couple-out test procedure ensures a fair evaluation in which speaker characteristics have no impact during testing. Each reference frame takes its session's behavior label. Majority voting then turns the frame-level labels into session-level labels.

4.5. Evaluation method for OOD data

Unlike the couples therapy data, our OOD data do not have any labels. The evaluation data were, however, selected to reflect different behavioral styles. For instance, as seen from Table 1, a politician speaking during a debate is expected to differ greatly in behavioral style from a stand-up comedian, but to resemble another politician. In this case we present the clustering results as percentages: how often a frame's nearest neighbor came from each other recording. This percentage score indicates the similarity between two audio recordings.

5. RESULTS AND DISCUSSION

5.1. Testing on within-domain corpus

Baseline of couples' behavior classification: As a baseline system we use nearest-neighbor behavior classification in the acoustic feature space at the frame level. As in Section 4.4, majority voting generates session-level labels. The results of this baseline are shown in both Table 2 and Table 3. They are only slightly better than random guessing, which gives 50% on these balanced binary tasks. This suggests that the original acoustic features are not an effective behavior representation. Further training is needed to extract behavior information from the high-dimensional acoustic features.

Comparison of within-domain and OOD training: To compare within-domain and OOD training, we conduct experiments (1) and (2) on the behavior code Acceptance. A 20s frame size is chosen, to be consistent with our previous work [4, 9]. The in-domain dataset (Ti) is small, so in that case we build a neural network with only 2 hidden layers. More training data is available for out-of-domain training (To), so there we employ a neural network with 5 hidden layers of 300, 200, 64, 200, and 300 nodes, respectively.
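The Section 4.4 and 4.5 protocols and the Section 5.1 feature-space baseline share one step: a nearest-neighbour search. Its output is then aggregated either by a majority vote or by a per-recording count. The Python sketch below illustrates this under stated assumptions and is not the authors' implementation:

- Frame embeddings are already available as numeric tuples. These would be bottleneck outputs for the proposed method, or raw 420-dimensional feature vectors for the baseline.
- The helper names are hypothetical.
- The brute-force search favours clarity over speed.
- The tie-breaking rule in the vote is an assumption, since the paper does not state one.

```python
import math
from collections import Counter
from typing import Dict, List, Sequence, Tuple

# One reference frame: (embedding, tag reported on a match, group used for exclusion).
Reference = Tuple[Sequence[float], str, str]


def nearest_tag(query: Sequence[float], pool: List[Reference], exclude: str) -> str:
    """Return the tag of the Euclidean-nearest reference whose group != exclude."""
    best_tag, best_dist = "", math.inf
    for vec, tag, group in pool:
        if group == exclude:  # leave-one-couple-out, or skip the input recording
            continue
        dist = math.dist(query, vec)
        if dist < best_dist:
            best_tag, best_dist = tag, dist
    if math.isinf(best_dist):
        raise ValueError("no eligible reference frames")
    return best_tag


def classify_session(frames: List[Sequence[float]], couple_id: str,
                     pool: List[Reference]) -> str:
    """In-domain protocol: label each frame by its nearest reference frame from
    another couple, then take a session-level majority vote."""
    votes = Counter(nearest_tag(f, pool, couple_id) for f in frames)
    # most_common orders equal counts by first occurrence (assumed tie rule).
    return votes.most_common(1)[0][0]


def similarity_scores(frames: List[Sequence[float]], recording_id: str,
                      pool: List[Reference]) -> Dict[str, float]:
    """OOD protocol: fraction of the input recording's frames whose nearest
    frame belongs to each other recording."""
    counts = Counter(nearest_tag(f, pool, recording_id) for f in frames)
    return {tag: n / len(frames) for tag, n in counts.items()}


if __name__ == "__main__":
    # Toy 2-D "embeddings": (vector, session label, couple ID).
    couples = [((0.0, 0.0), "high", "c1"), ((5.0, 5.0), "low", "c3")]
    print(classify_session([(0.1, 0.0), (0.2, 0.1), (4.9, 5.0)], "c2", couples))  # high
```

The `exclude` argument controls which frames may serve as neighbours:
- Passing the test couple's ID implements the leave-one-couple-out rule.
- Passing the input recording's ID makes its own frames ineligible, which produces the zero diagonal in Fig. 2.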
From the results in Table 2, both the Ti and To training methods beat the baseline. This shows that our audio2behavior framework is an effective, unsupervised way to project the signal onto a more meaningful behavioral manifold while reducing the feature dimensionality. As expected, in-domain training performs better than OOD training. This is reasonable: in terms of speech patterns and acoustic characteristics, there is a big gap between the movie and couples therapy corpora. Importantly, the couples data are far-field, low-quality recordings, while the movie data are usually higher-quality signals. The mismatch between training and test sets is minimal when both come from the same domain.

Table 2. Classification accuracy (%) of behavior Acceptance

Baseline    Train on Couples' (20s window size)    Train on Movies (20s window size)
57.50       69.29                                  66.43

Comparison of frame length: The OOD training result in Table 2 is promising, especially since we perform no pre-processing, such as VAD or diarization, on the out-of-domain (movie) training data. As mentioned in Section 4.1, this data will therefore contain non-speech parts, such as background music and silence, as well as multiple sources of noise besides human speech. In addition, with a larger frame window, the speech within each window very likely comes from multiple speakers. The characteristics of different speakers within one window may contaminate the behavior-related acoustic representation. We therefore look for a way to improve performance by reducing the influence of speaker characteristics. One approach is to shorten the frame window. A smaller window makes single-speaker frames more likely. We hypothesize that this should improve the audio2behavior model's performance by lowering acoustic complexity. This assumes that the window, while smaller, is still long enough to capture the behavioral characteristics.

We run experiment (2) on multiple behaviors with different frame lengths to verify this hypothesis. Table 3 shows a clear improvement: average classification accuracy rises by 5% absolute, from 64.11% to 69.11%. Moreover, the 5s frame length improves accuracy for all four behaviors. This suggests that consistency within each acoustic frame region may be a critical issue in the audio2behavior system. It also encourages the use of diarization as a front-end pre-processing step. Note that behavior annotation is a complex process: even human annotators reach an inter-annotator agreement of only about Krippendorff's α = 0.8 [6]. Against that, 69.11% for a completely unsupervised method with just a majority vote at the output is very encouraging.

Table 3. Classification accuracy (%) for behaviors with different frame window sizes

Behavior      Baseline    Train on Movies (20s window size)    Train on Movies (5s window size)
Acceptance    57.50       66.43                                68.57
Blame         55.00       61.07                                71.78
Negativity    63.93       63.93                                69.64
Positivity    51.07       65.00                                66.43
Average       56.88       64.11                                69.11

In general, these results are promising for communicative behavior quantification, since we only utilize unlabeled, any-domain data and train in an unsupervised manner.

5.2. Testing on OOD corpus

As mentioned in Section 3, we collected the OOD test dataset from the scenarios listed in Table 1. In each scenario, two audio files were collected from different speakers. We use a normalized percentage score to evaluate behavior similarity. For each selected scenario, the score is the number of nearest frames falling in that scenario divided by the total number of frames in the input audio. Results are shown in Fig. 2.

Fig. 2. Behavior scenario similarity evaluation results
(Rows: input audio; columns: audio containing the selected nearest frames; recordings numbered as in Table 1.)

                  1     2     3     4     5     6     7     8     9     10
Comedy      1   0.00  0.44  0.13  0.19  0.04  0.05  0.05  0.03  0.05  0.02
            2   0.38  0.00  0.19  0.13  0.03  0.03  0.04  0.06  0.10  0.05
Comedy      3   0.17  0.24  0.00  0.24  0.06  0.04  0.06  0.05  0.09  0.04
            4   0.31  0.18  0.18  0.00  0.06  0.02  0.07  0.05  0.07  0.05
Debate      5   0.12  0.08  0.19  0.12  0.00  0.21  0.13  0.09  0.05  0.02
            6   0.14  0.08  0.14  0.08  0.16  0.00  0.09  0.12  0.10  0.09
TED talk    7   0.07  0.08  0.13  0.11  0.05  0.06  0.00  0.31  0.11  0.08
            8   0.07  0.09  0.08  0.08  0.05  0.07  0.24  0.00  0.19  0.12
Eulogy      9   0.08  0.13  0.10  0.09  0.03  0.03  0.10  0.16  0.00  0.29
           10   0.08  0.05  0.08  0.08  0.01  0.03  0.05  0.16  0.47  0.00

Ideally, audio from similar scenarios should show high similarity with each other, and less related scenarios should receive lower scores. Based on a majority vote over frame-level clustering, 9 out of 10 audio samples are classified as we hoped. Beyond classification, the results also show degrees of behavior similarity. For example, 44% of the data from George Carlin are identified as similar to Richard Pryor; both comedians employ an angry tone in their stand-up comedy. Another 19% and 13% come from Steve Hofstetter and Jim Gaffigan, also comedians, but ones who use a milder tone in their routines. No more than 5% of the data are associated with any of the other conditions. All these results show promising behavioral quantification by our audio2behavior model.

6. CONCLUSION AND FUTURE WORK

Data sparsity is always a critical issue in behavior-related studies. Behavior recognition research suffers from an expensive data annotation process and low inter-annotator agreement, which also limit the performance of automated behavior recognition systems. Existing behavioral recognition in the BSP domain is supervised. Our audio2behavior offers another possible solution: transfer out-of-domain knowledge into training, then adapt the model to in-domain applications. This unsupervised approach vectorizes abstract behavior from audio and yields better behavioral quantification in the manifold. It shows promising results and applications in the behavioral signal processing domain.

In the future, inspired by the results of this paper, we plan to employ VAD and diarization in the front end to further improve the training of the audio2behavior model. This will reduce speaker characteristics and acoustic complexity in the behavior representation, by allowing us to perform speaker-specific normalization. Alternatively, we can keep the speaker-distinct regions and learn both a speaker manifold and a behavioral manifold in a joint, unsupervised manner.

Moreover, unsupervised behavior representation models can also be employed in a range of applications for which training data are unavailable, by quickly allowing out-of-domain bootstrapping.

7. REFERENCES
[1] S. Narayanan and P. G. Georgiou, "Behavioral signal processing: Deriving human behavioral informatics from speech and language," Proceedings of the IEEE, vol. 101, no. 5, pp. 1203–1233, May 2013.
[2] Panayiotis G. Georgiou, Matthew P. Black, and Shrikanth S. Narayanan, "Behavioral signal processing for understanding (distressed) dyadic interactions: Some recent developments," in Proceedings of the 2011 Joint ACM Workshop on Human Gesture and Behavior Understanding (J-HGBU '11), New York, NY, USA, 2011, pp. 7–12, ACM.
[3] Matthew P. Black, Athanasios Katsamanis, Brian R. Baucom, Chi-Chun Lee, Adam C. Lammert, Andrew Christensen, Panayiotis G. Georgiou, and Shrikanth S. Narayanan, "Toward automating a human behavioral coding system for married couples' interactions using speech acoustic features," Speech Communication, vol. 55, no. 1, pp. 1–21, 2013.
[4] Wei Xia, James Gibson, Bo Xiao, Brian Baucom, and Panayiotis G. Georgiou, "A dynamic model for behavioral analysis of couple interactions using acoustic features," in Sixteenth Annual Conference of the International Speech Communication Association, 2015.
[5] Panayiotis G. Georgiou, Matthew P. Black, Adam Lammert, Brian Baucom, and Shrikanth S. Narayanan, ""That's aggravating, very aggravating": Is it possible to classify behaviors in couple interactions using automatically derived lexical features?," in Proceedings of Affective Computing and Intelligent Interaction (ACII), Lecture Notes in Computer Science, Oct. 2011.
[6] Shao-Yen Tseng, Sandeep Nallan Chakravarthula, Brian Baucom, and Panayiotis Georgiou, "Couples behavior modeling and annotation using low-resource LSTM language models," in Proceedings of Interspeech, San Francisco, CA, September 2016.
[7] B. Xiao, P. Georgiou, B. Baucom, and S. S. Narayanan, "Head motion modeling for human behavior analysis in dyadic interaction," IEEE Transactions on Multimedia, vol. 17, no. 7, pp. 1107–1119, July 2015.
[8] A. Metallinou, R. B. Grossman, and S. Narayanan, "Quantifying atypicality in affective facial expressions of children with autism spectrum disorders," in 2013 IEEE International Conference on Multimedia and Expo (ICME), July 2013, pp. 1–6.
[9] Haoqi Li, Brian Baucom, and Panayiotis Georgiou, "Sparsely connected and disjointly trained deep neural networks for low resource behavioral annotation: Acoustic classification in couples' therapy," in Proceedings of Interspeech, San Francisco, CA, September 2016.
[10] Kun Han, Dong Yu, and Ivan Tashev, "Speech emotion recognition using deep neural network and extreme learning machine," in Interspeech, 2014, pp. 223–227.
[11] D. Le and E. M. Provost, "Emotion recognition from spontaneous speech using hidden markov models with deep belief networks," in 2013 IEEE Workshop on Automatic Speech Recognition and Understanding (ASRU), Dec. 2013, pp. 216–221.
[12] Guoguo Chen, Carolina Parada, and Tara N. Sainath, "Query-by-example keyword spotting using long short-term memory networks," in 2015 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), IEEE, 2015, pp. 5236–5240.
[13] T. Mikolov and J. Dean, "Distributed representations of words and phrases and their compositionality," Advances in Neural Information Processing Systems, 2013.
[14] Tomas Mikolov, Kai Chen, Greg Corrado, and Jeffrey Dean, "Efficient estimation of word representations in vector space," arXiv preprint arXiv:1301.3781, 2013.
[15] Geoffrey E. Hinton and Ruslan R. Salakhutdinov, "Reducing the dimensionality of data with neural networks," Science, vol. 313, no. 5786, pp. 504–507, 2006.
[16] Pierre Baldi, "Autoencoders, unsupervised learning, and deep architectures," ICML Unsupervised and Transfer Learning, vol. 27, no. 37-50, pp. 1, 2012.
[17] Yoshua Bengio, Aaron Courville, and Pascal Vincent, "Representation learning: A review and new perspectives," arXiv.org, June 2012.
[18] Yoshua Bengio, "Deep learning of representations for unsupervised and transfer learning," ICML Unsupervised and Transfer Learning, vol. 27, pp. 17–36, 2012.
[19] Andrew Christensen, David C. Atkins, Sara Berns, Jennifer Wheeler, Donald H. Baucom, and Lorelei E. Simpson, "Traditional versus integrative behavioral couple therapy for significantly and chronically distressed married couples," Journal of Consulting and Clinical Psychology, vol. 72, no. 2, pp. 176, 2004.
[20] C. Heavey, D. Gill, and A. Christensen, "Couples interaction rating system 2 (CIRS2)," University of California, Los Angeles, vol. 7, 2002.
[21] J. Jones and A. Christensen, "Couples interaction study: Social support interaction rating system," University of California, Los Angeles, vol. 7, 1998.
[22] Florian Eyben, Felix Weninger, Florian Gross, and Björn Schuller, "Recent developments in openSMILE, the Munich open-source multimedia feature extractor," in Proceedings of the 21st ACM International Conference on Multimedia (MM '13), New York, NY, USA, 2013, pp. 835–838, ACM.
