EGOCENTRIC ACTIVITY RECOGNITION WITH MULTIMODAL FISHER VECTOR

Sibo Song*, Ngai-Man Cheung*, Vijay Chandrasekhar†, Bappaditya Mandal†, Jie Lin†
*Singapore University of Technology and Design, †Institute for Infocomm Research, Singapore
arXiv:1601.06603v1 [cs.MM] 25 Jan 2016

ABSTRACT

With the increasing availability of wearable devices, research on egocentric activity recognition has received much attention recently. In this paper, we build a Multimodal Egocentric Activity dataset which includes egocentric videos and sensor data of 20 fine-grained and diverse activity categories. We present a novel strategy to extract temporal trajectory-like features from sensor data. We propose to apply the Fisher Kernel framework to fuse video and temporal enhanced sensor features. Experiment results show that with careful design of the feature extraction and fusion algorithm, sensor data can enhance information-rich video data. We make the Multimodal Egocentric Activity dataset publicly available to facilitate future research.

Index Terms— Multimodal Egocentric Activity dataset, Egocentric activity recognition, Multimodal Fisher Vector

Fig. 1: (a) Sample frames from videos (b) Activity categories

1. INTRODUCTION

Egocentric or first-person activity recognition has attracted a lot of attention. The ever-increasing adoption of wearable devices such as Google Glass, Microsoft SenseCam, Apple Watch and Mi Band facilitates low-cost, unobtrusive collection of rich egocentric activity data. These devices can monitor activities and gather data anytime from anywhere. Egocentric activity data is helpful in many applications ranging from security, health monitoring and lifestyle analysis to memory rehabilitation for dementia patients.

Research on automatic egocentric activity recognition has focused on two broad categories of data: low-dimensional sensor data and high-dimensional visual data. Low-dimensional sensor data such as GPS, light, temperature, direction or accelerometer data has been found to be useful for activity recognition [1, 2, 3, 4, 5]. [2] proposes features for egocentric activity recognition computed from cell-phone accelerometer data and reports over 90% accuracy for 6 simple activities. [3] reported more than 80% accuracy for 9 activities with sensors located at the two legs. Low-dimensional sensor data can be collected and stored easily, and the computational complexity of the recognition is usually low.

More recently, there has been a lot of interest in performing egocentric activity recognition using high-dimensional visual streams recorded from individuals' wearable cameras. Compared to low-dimensional sensor data, visual data captures much richer information: scene details and the people or objects the individual interacts with, for example. Therefore, several egocentric video datasets and approaches have been proposed to recognize complex activities. Among them, some previous works focus on the extraction of egocentric semantic features like objects [6, 7], gestures [8] and object-hand interactions [9], or discriminative features [10]. Recently, a trajectory-based approach [11] has been applied to characterize ego-motion in egocentric videos, and encouraging results have been obtained for activity classification [12].

Previous work has thus investigated egocentric activity recognition using either low-dimensional sensor data or high-dimensional visual data. To the best of our knowledge, there is no previous work that studies the potential improvement from using both the sensor and visual data simultaneously. To address this research gap, we make several novel contributions in this work.

First, we build and make publicly available a challenging Multimodal Egocentric Activity dataset(1) that consists of 20 complex and diverse activity categories recorded in both sensor and video data. The sensor and video data are recorded in a synchronized and integrated manner to allow new algorithms to explore them simultaneously. Second, we propose a novel technique to fuse the sensor and video features using the Fisher Kernel framework. The Fisher kernel combines the strengths of generative and discriminative approaches [13, 14]. In this work, we propose a generative model for the sensor and video data. Based on the model, we apply the Fisher kernel framework to compute multimodal feature vectors for a discriminative classifier. We refer to the resulting multimodal feature vectors as the Multimodal Fisher Vector (MFV). Third, we perform comprehensive experiments to compare the performance of different approaches: sensor-only, video-only, and the fusion of both.

The rest of this paper is organized as follows. In Section 2 we present our Multimodal Egocentric Activity dataset. The methodology is described in Section 3. Experimental evaluation is presented in Section 4 and we conclude the work in Section 5.

(1) Dataset: http://people.sutd.edu.sg/1000892/dataset
2. MULTIMODAL EGOCENTRIC ACTIVITY DATASET

To record multimodal egocentric activity data, we created our own application that runs on Google Glass. This application enables us to record egocentric video and sensor data simultaneously in a synchronized manner. The following types of sensor data are supported on the glass: accelerometer, gravity, gyroscope, linear acceleration, magnetic field and rotation vector. These sensors are integrated in the device and they help capture accurate head motion of individuals while they are performing different activities.

The Multimodal Egocentric Activity dataset contains 20 distinct life-logging activities performed by different human subjects. Each activity category has 10 sequences, and each clip has a duration of exactly 15 seconds. The activity categories are listed in Fig. 1b. Sample frames of writing sentences, organizing files and running are shown in Fig. 1a. These categories can be grouped into 4 top-level types: Ambulation, Daily Activities, Office Work and Exercise. Subjects have been instructed to perform activities in a natural and unscripted way.

The dataset has the following characteristics. Firstly, it is the first life-logging activity dataset that contains both egocentric video and sensor data recorded simultaneously. Secondly, the dataset contains considerable variability in scenes and illumination. Videos are recorded both indoors and outdoors, with significant changes in illumination conditions. For instance, the walking routes are in various environments, such as residential areas, campus, shopping malls, etc. Thus, even though different subjects do the same activity, their backgrounds vary. Thirdly, we build a taxonomy based on the categories shown in Fig. 1b. All 20 categories can be grouped into 4 top-level types. This allows evaluation of new visual analysis algorithms against different levels of life-logging activity granularity.

3. METHODOLOGY

In this section, we describe egocentric video and sensor feature extraction. Furthermore, we present a simple but effective way to encode temporal information in sensor features. Finally, we propose a Multimodal Fisher Vector approach for combining video and sensor features.

3.1. Video trajectory features

We evaluate state-of-the-art trajectory-based activity recognition on the egocentric videos in our own dataset. The dense trajectory approach we use has already been applied to third-person-view activity recognition in [11]. The dense trajectory approach is applied instead of the improved trajectory approach [15] because dense trajectories can capture more ego-motion information in egocentric videos.

Dense trajectories are obtained by tracking densely sampled points using optical flow fields. Foreground motions, background motions and head movements are extracted using this approach. Several kinds of descriptors are computed for each trajectory and their characteristics are discussed in [11]. The trajectory descriptor is a concatenation of normalized displacement vectors. The other descriptors, such as MBH (motion boundary histograms), are computed in the space-time volume aligned with the trajectory. These descriptors are commonly used for activity recognition. We then apply Fisher kernels to the trajectory features as in [12], where the trajectory features are represented by means of a Gaussian Mixture Model (GMM).

3.2. Temporal enhanced trajectory-like sensor features

In this section, we establish a novel strategy to convert sensor time-series data into trajectory-like features. Temporal order is then introduced to enhance the sensor features.

Fig. 2: Illustration of temporal enhanced trajectory-like sensor features. (Color of windows indicates similar patterns)

Trajectory-like features. In video analysis, dense trajectories are computed by tracking densely sampled points using optical flow fields. Similarly, we first convert the time-series data of the sensors into trajectory-like data. A sliding window is used to generate trajectories of sensor data (see Fig. 2). For example, we build a window of 10 samples to extract a trajectory of length equal to 10, then move the window one frame forward and extract another trajectory. This process is repeated until the end of the file, and thus many trajectories can be generated for the sensor data.
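To make the sliding-window conversion concrete, the following is a minimal sketch (our own illustration, not code from the paper): it slices a multi-channel sensor stream into overlapping windows and flattens each window into one trajectory-like feature vector. The window length of 10 and the 19-channel stream mirror the example above; the function name, array layout and step size are assumptions.

```python
import numpy as np

def sensor_trajectories(signal, window=10, step=1):
    """Convert a (T, C) sensor time series into overlapping
    trajectory-like features, one per sliding-window position.

    signal : array of shape (T, C), T samples of C sensor channels
    window : number of samples per trajectory (10 in the example above)
    step   : how far the window moves between consecutive trajectories
    """
    signal = np.asarray(signal, dtype=float)
    T, C = signal.shape
    feats = []
    for start in range(0, T - window + 1, step):
        # Each trajectory is the flattened block of `window` consecutive samples.
        feats.append(signal[start:start + window].reshape(-1))
    return np.stack(feats) if feats else np.empty((0, window * C))

# Example: 150 samples (15 s at 10 Hz) of a hypothetical 19-channel sensor stream.
dummy = np.random.randn(150, 19)
trajs = sensor_trajectories(dummy, window=10)
print(trajs.shape)  # (141, 190)
```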
Temporal enhanced sensor features. Generally, in image processing, Fisher vector encoding is an orderless approach because it represents an image as an orderless collection of features. For sensor data processing, however, the temporal order of the trajectory-like data or patterns can be decisive in distinguishing two activities. At the same time, we do not want the timestamp of each trajectory to be included, since patterns do not hold an exact temporal correspondence across the time sequences of one activity: if we simply added the timestamp as one more data dimension, noise would also be added to the sensor features, because the precise timing of a pattern essentially does not matter. Yet in some cases, the order of patterns, or the stage at which a pattern occurs, plays a major role in differentiating two distinct activities.

To overcome this difficulty, we segment the time-series data into several stages to avoid fine-grained windows. These segments are then indexed starting from 1, representing their temporal order, as Fig. 2 shows. The trajectory-like features inside each part are then associated with their normalized temporal order. This quantization strategy turns out to be effective for activity recognition using sensor data.
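A minimal sketch of this temporal enhancement follows, under the assumption that the normalized stage index is simply appended as one extra dimension to each trajectory-like feature; the paper describes the association but not an exact encoding, so the number of stages and the append-as-extra-dimension choice are ours.

```python
import numpy as np

def add_temporal_order(trajs, n_stages=4):
    """Append a normalized temporal-order value to each trajectory.

    trajs    : array of shape (N, d), trajectories in temporal order
    n_stages : number of coarse stages the recording is split into
               (4 segments are used for indexing temporal order in Section 4.2)
    Returns an array of shape (N, d + 1).
    """
    N = trajs.shape[0]
    # Stage index (1..n_stages) of each trajectory, based on its position in the clip.
    stage = np.minimum(np.arange(N) * n_stages // max(N, 1) + 1, n_stages)
    order = stage / float(n_stages)          # normalized temporal order in (0, 1]
    return np.hstack([trajs, order[:, None]])

enhanced = add_temporal_order(np.random.randn(141, 190), n_stages=4)
print(enhanced.shape)  # (141, 191)
```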
3.3. Multimodal Fisher Vector

In this section, we construct a Multimodal Fisher Vector (MFV) which utilizes the Fisher Kernel framework to combine the video trajectory features and the temporal enhanced sensor features extracted from a clip. The Fisher Kernel was introduced to combine the benefits of generative and discriminative approaches [13]. The idea is to characterize signals with a gradient vector derived from a pdf which models the generating process of the signal. In our case, the signal is the tuple f = (x, s), with x being the video trajectory feature and s the associated temporal enhanced sensor feature. The generative model of f will be discussed later.

Let p be a pdf with a parameter set λ. Under the framework, one can characterize N features F = {f_n, n = 1, ..., N} with the following gradient vector:

∇_λ log p(F|λ)    (1)

To illustrate the MFV, we first consider a video feature codebook generated by a Gaussian Mixture Model (GMM). X = {x_n, n = 1, ..., N} denotes the set of trajectory features extracted from the videos and λ = {θ_i, µ_{xi}, Σ_{xi}, i = 1, ..., K} the set of parameters of the GMM, where θ_i, µ_{xi}, and Σ_{xi} denote respectively the weight, mean vector and covariance matrix of Gaussian i, and K denotes the number of Gaussians. We assume diagonal Σ_{xi}. The likelihood that the observation x_n was generated by the GMM is:

p(x_n|λ) = Σ_{i=1}^{K} θ_i p_i(x_n)    (2)

The weights are subject to the constraint Σ_i θ_i = 1. The components p_i are given by:

p(x|ω=i) = exp(−(1/2)(x − µ_{xi})′ Σ_{xi}^{-1} (x − µ_{xi})) / ((2π)^{D/2} |Σ_{xi}|^{1/2})    (3)

where D is the dimensionality of the video feature vectors and |·| denotes the determinant operator.

We parameterize θ_i = exp(α_i) / Σ_j exp(α_j), which avoids enforcing the constraints explicitly. With q_{ni} denoting the posterior p(ω = i|x_n), or responsibility, and x_{ni} denoting x_n − µ_{xi}, the gradients of the log-likelihood for a single trajectory feature are:

∂ log p(x_n) / ∂α_i = q_{ni} − θ_i    (4)

∂ log p(x_n) / ∂µ_{xi} = q_{ni} Σ_{xi}^{-1} x_{ni}    (5)

∂ log p(x_n) / ∂Σ_{xi}^{-1} = q_{ni} (Σ_{xi} − x_{ni}²) / 2    (6)

where x_{ni}² denotes the element-wise square. The representation is obtained by averaging these gradients over all x_n.
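The gradient formulas (4)-(6) can be turned into an (unnormalized) Fisher vector encoder as in the following sketch. This is our own illustration assuming a diagonal-covariance GMM whose parameters are already estimated; it omits the power and L2 normalization that the paper applies later (Section 4.1).

```python
import numpy as np

def fisher_vector(X, weights, means, covs):
    """Unnormalized Fisher vector of features X under a diagonal-covariance GMM,
    following (4)-(6): averaged gradients w.r.t. the mixture weights (via alpha),
    the means and the (inverse) covariances.

    X       : (N, D) feature matrix
    weights : (K,)   mixture weights theta_i
    means   : (K, D) mean vectors mu_i
    covs    : (K, D) diagonal covariances sigma_i^2
    """
    N, D = X.shape
    # Responsibilities q_ni = p(omega = i | x_n) for every feature/component pair.
    diff = X[:, None, :] - means[None, :, :]                   # (N, K, D)
    log_gauss = -0.5 * (np.sum(diff**2 / covs[None], axis=2)
                        + np.sum(np.log(covs), axis=1)
                        + D * np.log(2 * np.pi))               # (N, K)
    log_post = np.log(weights)[None, :] + log_gauss
    log_post -= log_post.max(axis=1, keepdims=True)
    q = np.exp(log_post)
    q /= q.sum(axis=1, keepdims=True)                          # (N, K)

    g_alpha = (q - weights[None, :]).mean(axis=0)                        # eq. (4)
    g_mu = (q[:, :, None] * diff / covs[None]).mean(axis=0)              # eq. (5)
    g_cov = (q[:, :, None] * (covs[None] - diff**2) / 2).mean(axis=0)    # eq. (6)
    return np.concatenate([g_alpha, g_mu.ravel(), g_cov.ravel()])
```

The resulting vector has length K(1 + 2D), matching the video part of the MFV size given at the end of this section.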
To generate the MFV, we apply a single Gaussian as the generative model for the sensor features sharing the same label ω. To assign ω to the sensor features, we first regard the GMM as a basic bag-of-words model by labeling the video feature points according to their largest posterior p(ω = i|x_n). That is, instead of fitting a separate GMM for the sensor data, we use the sensor patterns that share the same starting time as the video trajectories of each video codeword to build the sensor codebook. The intuition is that the visual trajectories can be associated with sensor trajectory-like features by establishing a relationship between the two modalities' codebooks. In other words, one single Gaussian over the sensor trajectory-like features is built for each Gaussian of the video trajectory features when constructing the codebook for sensor features.

Each sensor feature can correspond to multiple video trajectories, because many video trajectories are generated for each frame, so we use max pooling to assign the label ω to the sensor data. Each vector is then represented as the tuple b = (ω, s), where ω is the label and s is the sensor feature associated with its temporal order. We then define a generative model over the tuple as:

p(b) = p(ω) p(s|ω)    (7)

p(ω = i) = θ_i    (8)

p(s|ω=i) = exp(−(1/2)(s − µ_{si})′ Σ_{si}^{-1} (s − µ_{si})) / ((2π)^{(d+1)/2} |Σ_{si}|^{1/2})    (9)

It can be shown that the gradients of the log-likelihood of the sensor features have a representation similar to that of the video trajectory features, i.e., (5), (6). We assume that the video feature x and the sensor feature s are independent conditioned on ω. We use the following as the generative model for f = (x, s):

p(f) = Σ_i θ_i p(x|ω=i) p(s|ω=i)    (10)

p(x|ω=i) is defined exactly as in (3) and p(s|ω=i) as in (9). The MFV has size (1 + 2(D + (d+1)))K = (2D + 2d + 3)K, where D is the reduced dimensionality of the video trajectory features after Principal Component Analysis (PCA), and (d+1) is the reduced dimensionality of the temporal enhanced sensor trajectory-like features.
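Our reading of this codebook construction is sketched below. The alignment map between each video trajectory and the sensor trajectory sharing its starting time, the exact max-pooling rule, and all function names are assumptions made for illustration; the paper does not give an implementation.

```python
import numpy as np

def assign_sensor_labels(q_video, video_to_sensor, n_sensor):
    """Hard-assign each video trajectory to its most responsible Gaussian,
    then propagate labels to sensor trajectories by max pooling.

    q_video         : (N_v, K) posteriors p(omega=i | x_n) of video trajectories
    video_to_sensor : (N_v,) index of the sensor trajectory that shares each
                      video trajectory's starting time (assumed alignment map)
    n_sensor        : number of sensor trajectory-like features in the clip
    """
    hard = q_video.argmax(axis=1)                 # bag-of-words style labels
    labels = np.zeros(n_sensor, dtype=int)
    score = np.full(n_sensor, -np.inf)
    for v, s_idx in enumerate(video_to_sensor):
        # Max pooling: keep the label of the most confident video trajectory.
        conf = q_video[v, hard[v]]
        if conf > score[s_idx]:
            score[s_idx], labels[s_idx] = conf, hard[v]
    return labels

def sensor_codebook(S, labels, K, eps=1e-6):
    """Fit one Gaussian (mean, diagonal covariance) over the temporal enhanced
    sensor features assigned to each of the K video Gaussians."""
    d1 = S.shape[1]
    means, covs = np.zeros((K, d1)), np.ones((K, d1))
    for i in range(K):
        Si = S[labels == i]
        if len(Si):
            means[i] = Si.mean(axis=0)
            covs[i] = Si.var(axis=0) + eps
    return means, covs
```

The gradients of (9) with respect to µ_{si} and Σ_{si}^{-1} then take the same form as (5)-(6) and are concatenated with the video gradient blocks to give the (2D + 2d + 3)K-dimensional MFV.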
4. EXPERIMENTAL EVALUATION

In this section, we first describe our experiment setup. In addition, we study the parameter settings for the temporal enhanced trajectory-like sensor features. We then evaluate the performance of our MFV method on our dataset, and present and discuss the results for the activity recognition task using MFV.

4.1. Experiment setup

In our experiments, the resolution of the videos is 320×180 and the frame rate is 10 fps. The sensor data has 19 dimensions in total, covering accelerometer, gravity, gyroscope, linear acceleration, magnetic field and rotation vector. In all cases, the sensor data was collected at 10 Hz.

The number of Gaussians is set to K = 25 for the video data, and we randomly sample a subset of 1% of all samples to estimate the GMM for building the codebook. The dimensionality of the features is reduced by half using PCA. Finally, we apply power and L2 normalization to the MFV as in [16]. The cost parameter C = 10 is used for a linear Support Vector Machine (SVM) and the one-against-rest approach is utilized for multi-class classification. We use the libsvm library [17] to implement the SVM algorithm.
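As a rough illustration of this classification stage, the sketch below applies the power and L2 normalization of [16] to already-encoded Fisher vectors and trains a linear one-vs-rest SVM with C = 10. Scikit-learn's LinearSVC is used here as a stand-in for the libsvm library [17] used in the paper, and the data arrays are placeholders; the PCA reduction mentioned above is applied to the descriptors before GMM encoding and is not shown.

```python
import numpy as np
from sklearn.svm import LinearSVC

def power_l2_normalize(fv, eps=1e-12):
    """Signed square-root (power) normalization followed by L2 normalization,
    as applied to Fisher vectors in [16]."""
    fv = np.sign(fv) * np.sqrt(np.abs(fv))
    return fv / (np.linalg.norm(fv, axis=-1, keepdims=True) + eps)

# Placeholder data standing in for the encoded (M)FVs of the clips and their labels.
train_fvs = np.random.randn(160, 5000)
train_labels = np.random.randint(0, 20, 160)
test_fvs = np.random.randn(40, 5000)
test_labels = np.random.randint(0, 20, 40)

train_fvs = power_l2_normalize(train_fvs)
test_fvs = power_l2_normalize(test_fvs)

# Linear SVM with C = 10; LinearSVC uses one-vs-rest multi-class handling by default.
clf = LinearSVC(C=10.0)
clf.fit(train_fvs, train_labels)
print("accuracy:", (clf.predict(test_fvs) == test_labels).mean())
```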
4.2. Parameter setting

Here, we conduct an experiment to select the window size and the number of clusters in the GMM, since these parameters are crucial in the generation of trajectory-like sensor features.

We choose a moderate number of segments, 4, for indexing temporal orders. In Fig. 3 we show that window size w = 3 and number of clusters k = 4 give the best result. We compare temporal enhanced sensor Fisher vectors (TFVS) and normal sensor Fisher vectors (FVS) with different parameters. The comparison indicates that TFVS usually outperforms FVS with carefully selected parameters. To evaluate MFV, we therefore also choose w = 3 for the sensor trajectory-like features.

Fig. 3: Comparison of FVS and TFVS with (a) different w (b) different k

An important observation is that we usually do not need large w and k. The setting w = 3 already provides enough statistics of the different patterns. We notice that k = 4 achieves the best performance for both FVS and TFVS, which makes intuitive sense: the sensor data has only 19 dimensions per recording, much fewer than the video features, and therefore it does not need a large k to encode the information. This is desirable, because the smaller the k, the less storage the GMM models take and the faster the computation when utilizing TFVS for encoding.

Table 1: Comparison of different methods
  SVM*: 47.75%    Decision Tree*: 51.80%    FVS: 65.60%    TFVS: 69.00%
  FVV: 78.44%     FVV+FVS: 80.45%           FVV+TFVS: 82.20%    MFV: 83.71%
  *Feature-based approach in [2]

4.3. Evaluation of temporal enhanced sensor Fisher Vector

We evaluate the proposed TFVS on the sensor data in our dataset. To provide a comprehensive analysis of different approaches, we further compare the results of the proposed approach with FVS and other methods on activity recognition.

Fig. 4: Confusion matrices of (a) TFVS (b) FVV (c) MFV

Fig. 4a provides the confusion matrix for classifying sensor data using TFVS. The indices of the activities are shown in Fig. 1b. We notice that TFVS performs well, as expected, for some low-level activities like Ambulation and Exercise, while for other high-level activities like organizing files the accuracies are only around 50%. We believe this is because sensor data can only capture the head motion and cannot analyze the details of the scene or objects. Note that TFVS outperforms FVS, which confirms that the temporal enhanced method is effective.

Comparison with feature-based approach on sensor data. We evaluate a system that uses phone-based accelerometers and the feature-based technique from [2] to compare with our TFVS. Features are extracted by calculating statistics like the average value and standard deviation, and by designing other features like the time between peaks, yielding informative features based on the raw accelerometer readings. We choose decision trees and SVM to evaluate these features. The results are reported in Table 1. We can see that our approach outperforms the feature-based approach with both SVM and decision tree classifiers. The reason is that these features are mostly global statistics, whereas usually only some repeated patterns of the signal reveal which activity is being performed. Our TFVS makes use of windows to extract these interesting parts and, with temporal orders, captures more detailed information about the patterns.

4.4. Evaluation of Multimodal Fisher Vector

We conduct experiments to confirm the superiority of our MFV representation over both FVV and TFVS, as well as over a simple concatenation of these two vectors. We also discuss the behavior of MFV on our dataset, particularly on some categories.

The classification accuracy of the normal video Fisher vectors (FVV) is reported in Table 1. Examining the per-category performance of FVV, we find that the accuracy of making phone calls is 28.57%, the lowest in Fig. 4b. Videos of the making phone calls activity are often mis-classified as sitting or drinking. In fact, making phone calls is hard to recognize even for human beings, since there is hardly any object appearance in the scene.

Fig. 4c shows the confusion matrix for recognizing activities using MFV. It looks similar to the confusion matrix for FVV, but there are still improvements, because the sensors provide accurate data on slight head motions. Comparing the confusion matrices of FVV and MFV, we observe that the largest improvements occur in making phone calls (51.1%), working at PC (22.1%) and walking downstairs (12.0%). Making phone calls is difficult for visual-based approaches to classify since there is hardly any object appearance in the subjects' view; with the temporal enhanced sensor data we find that distinctive head-movement patterns do exist while making phone calls, which is a very interesting observation. For working at PC, we believe the video trajectory approach is confused by movements on the monitors. These results again confirm that our MFV approach performs better than FVV and is more accurate and stable.

From Table 1, we observe that our MFV performs better than the concatenation of FVV and FVS, and even better than the concatenation of FVV and TFVS, which reaches an accuracy of 82.2%. The results suggest that MFV is a better representation for combining video and sensor data. Intuitively, the single Gaussian model takes advantage of detailed sensor statistics in each cluster of the GMM built for the video trajectory features. Furthermore, there is the possibility of extending the single Gaussian model to multiple Gaussian models in order to improve the accuracy.

4.5. Computational cost

From Table 1, we can see that by encoding temporal enhanced sensor features we improve the accuracy from 78.4% to 83.7%. This is especially attractive because computing the Fisher vector on sensor data is extremely fast, its volume being quite small compared to video data. In practice, generating trajectory features and calculating the FV on all videos in the dataset takes about 3 hours, while it takes less than 1 minute for all the sensor data. With TFVS alone we obtain an accuracy of 69.0% even for 20 categories, which already satisfies the needs of some applications.

Therefore, there is no single perfect answer. Depending on the particular application and requirements, different solutions can be chosen: MFV usually provides the best performance but is expensive to compute, while in some cases TFVS can achieve satisfactory performance with limited computational resources.

5. CONCLUSIONS

We built a challenging Multimodal Egocentric Activity dataset recorded with both videos and sensor data. The dataset contains a variety of scenes and personal styles, capturing the diversity and complexity of life-logging activities. We performed a detailed evaluation of a state-of-the-art trajectory-based technique on the videos and achieved an accuracy of around 78%. We proposed a novel technique to incorporate temporal information into trajectory-like sensor data; our TFVS significantly outperforms both the basic feature-based approach and the FVS. More importantly, we proposed to apply the Fisher Kernel framework to fuse sensor and video data, improving the performance from 78.4% to 83.7% by utilizing MFV for egocentric activity recognition at a low cost.

6. REFERENCES

[1] Hamed Rezaie and Mona Ghassemian, "Implementation study of wearable sensors for activity recognition systems," Healthcare Technology Letters, vol. 2, no. 4, pp. 95–100, 2015.
[2] Jennifer R. Kwapisz, Gary M. Weiss, and Samuel A. Moore, "Activity recognition using cell phone accelerometers," ACM SIGKDD Explorations Newsletter, vol. 12, no. 2, pp. 74–82, 2011.
[3] Donghai Guan, Weiwei Yuan, Young-Koo Lee, Andrey Gavrilov, and Sungyoung Lee, "Activity recognition based on semi-supervised learning," in Embedded and Real-Time Computing Systems and Applications (RTCSA), 2007 13th IEEE International Conference on. IEEE, 2007, pp. 469–475.
[4] Uwe Maurer, Asim Smailagic, Daniel P. Siewiorek, and Michael Deisher, "Activity recognition and monitoring using multiple sensors on different body positions," in Wearable and Implantable Body Sensor Networks (BSN), 2006 International Workshop on. IEEE, 2006.
[5] Adil Mehmood Khan, Ali Tufail, Asad Masood Khattak, and Teemu H. Laine, "Activity recognition on smartphones via sensor-fusion and KDA-based SVMs," International Journal of Distributed Sensor Networks, vol. 2014, 2014.
[6] Hamed Pirsiavash and Deva Ramanan, "Detecting activities of daily living in first-person camera views," in Computer Vision and Pattern Recognition (CVPR), 2012 IEEE Conference on. IEEE, 2012, pp. 2847–2854.
[7] Alireza Fathi, Xiaofeng Ren, and James M. Rehg, "Learning to recognize objects in egocentric activities," in Computer Vision and Pattern Recognition (CVPR), 2011 IEEE Conference on. IEEE, 2011, pp. 3281–3288.
[8] Sang-Rim Lee, Sven Bambach, David J. Crandall, John M. Franchak, and Chen Yu, "This hand is my hand: A probabilistic approach to hand disambiguation in egocentric video," in Computer Vision and Pattern Recognition Workshops (CVPRW), 2014 IEEE Conference on. IEEE, 2014, pp. 557–564.
[9] Alireza Fathi, Ali Farhadi, and James M. Rehg, "Understanding egocentric activities," in Computer Vision (ICCV), 2011 IEEE International Conference on. IEEE, 2011, pp. 407–414.
[10] Bappaditya Mandal and How-Lung Eng, "3-parameter based eigenfeature regularization for human activity recognition," in Acoustics, Speech and Signal Processing (ICASSP), 2010 IEEE International Conference on. IEEE, 2010, pp. 954–957.
[11] Heng Wang, Alexander Kläser, Cordelia Schmid, and Cheng-Lin Liu, "Dense trajectories and motion boundary descriptors for action recognition," International Journal of Computer Vision, vol. 103, no. 1, pp. 60–79, 2013.
[12] Sibo Song, Vijay Chandrasekhar, Ngai-Man Cheung, Sanath Narayan, Liyuan Li, and Joo-Hwee Lim, "Activity recognition in egocentric life-logging videos," in Computer Vision – ACCV 2014 Workshops. Springer, 2014, pp. 445–458.
[13] Tommi Jaakkola, David Haussler, et al., "Exploiting generative models in discriminative classifiers," Advances in Neural Information Processing Systems, pp. 487–493, 1999.
[14] Florent Perronnin and Christopher Dance, "Fisher kernels on visual vocabularies for image categorization," in Computer Vision and Pattern Recognition (CVPR), 2007 IEEE Conference on. IEEE, 2007, pp. 1–8.
[15] Heng Wang and Cordelia Schmid, "Action recognition with improved trajectories," in Computer Vision (ICCV), 2013 IEEE International Conference on. IEEE, 2013, pp. 3551–3558.
[16] Florent Perronnin, Jorge Sánchez, and Thomas Mensink, "Improving the Fisher kernel for large-scale image classification," in Computer Vision – ECCV 2010, pp. 143–156. Springer, 2010.
[17] Chih-Chung Chang and Chih-Jen Lin, "LIBSVM: A library for support vector machines," ACM Transactions on Intelligent Systems and Technology, vol. 2, pp. 27:1–27:27, 2011.
