M.I.T Media Laboratory Perceptual Computing Section Technical Report No. 309
Abbreviated version appears in Fifth International Conference on Computer Vision, pp. 624-630, Cambridge MA, 1995

Recognition of Human Body Motion Using Phase Space Constraints

Lee Campbell, Aaron Bobick
Room E15-383, The Media Laboratory
Massachusetts Institute of Technology
20 Ames St., Cambridge, MA 02139
[email protected] [email protected]

This work was supported in part by a grant from Interval Research.

Abstract

A new method for representing and recognizing human body movements is presented. Assuming the availability of Cartesian tracking data, we develop techniques for representation of movements based on space curves in subspaces of a "phase space." The phase space has axes of joint angles and torso location and attitude, and the axes of the subspaces are subsets of the axes of the phase space. Using this representation we develop a system for learning new movements from ground truth data by searching for constraints which are in effect during the movement to be learned, and not in effect during other movements. We then use the learned representation for recognizing movements in data.

Prior approaches by other researchers used a small number of classification categories, which demanded less attention to representation. We train and test the system on nine fundamental movements from classical ballet performed by two dancers. The system learns and accurately recognizes the nine movements in an unsegmented stream of motion.

1 Introduction

Until recently, computer vision analysis of image sequences focused predominantly on issues of geometry, either the three-dimensional geometry of a scene or the geometric motion of a moving camera (see [18] for review). Lately, however, there has been an effort at getting computers to understand action or behavior [7]. In this paper we present a technique for recognizing human motion in domains in which there are clearly defined semantic categories of movement.

There are numerous domains (athletics, dance, surveillance) in which the action or behavior of objects is much more semantically rich than the static configuration of the scene components. To bound the scope of the problem, to establish performance criteria, and to be certain that we are extracting semantic categories that are meaningful to people, we require that the domain already have a well defined vocabulary for describing action. Since people use that vocabulary for describing the action, we know that such a description captures the important relevant semantics. If it did not, the vocabulary would not provide a reasonable basis for communication. Finally, a well defined vocabulary reduces the difficulty in establishing ground truth; people should find it easy to provide a baseline against which a system could be tested.

Ballet is a good testbed for work in understanding human body movement because it has the requisite features: there is a vocabulary of some 800 names of steps (as well as several notation languages); the vocabulary has been useful in ballet for over a hundred years, thus it is at the appropriate level of detail for human reasoning and communication; and human observers can easily provide ground truth identifications against which to check the computer recognition.

The description of human motion is one component of the general problem of video annotation. Entertainment companies, newscasters, and sports teams are acquiring ever larger video databases. If these can be (semi-)automatically scanned and annotated, the video can be organized, catalogued, cross-referenced, and searched for keywords. Powerful text search tools can be applied once the mass of signals has been annotated using a vocabulary in which people naturally describe the domain. One attractive aspect of this scenario is that the speed of annotation is relatively unimportant.

In this paper we present a method for recognizing classical ballet steps from three-dimensional point data. We are using 3D data as input, as opposed to video, because we wish to focus on the recognition task, not the geometry recovery. There has been much previous work on recovering 3D models from 2D motion; our question is: given 3D data, how do you see a plié? The approach we develop is based on the idea that different categorical movements each embody a different set of constraints on the motion of the body parts; these constraints are most easily observed in a phase space that relates the independent variables of the body motion. By learning which sets of constraints are highly diagnostic of particular motions we can build constraint set detectors to recognize the movements.

2 Symbolic understanding of human motion

The problem of understanding human body motion from images is one that leads into such diverse areas as dynamics, athletics, and cognitive science. Since our primary motivation is video annotation, we focus on a symbolic description of action that translates the continuous domain of human motion into a discrete sequence of symbols.

2.1 Constrained human motion: ballet as example

With few exceptions (e.g. finger manipulation) human body motion has many constraints. Some of the constraints come from the laws of physics: arms do not separate at the shoulder and people need to maintain balance to stay off the ground. Other constraints, which come from the rules of an athletic or artistic form, we will call cultural constraints. The fundamental idea of the work presented here is that by recognizing the presence of these constraints it is possible to recognize the motions.

Gaits are an example of highly constrained motion. Physical constraints induce a rhythmic and repetitive pattern of motion [23]. From a computational vision perspective these constraints are so strong that several researchers treated walking as a periodic 1DOF motion for purposes of finding the limb orientations [17, 28], or the identity [24] of a walking person.

The domain of ballet

Ballet is a domain rich with both cultural and physical constraints. Classical ballet is made up of a finite number of discrete movements or steps. People with expertise can easily name and recognize most of them, and each of a collection of experts typically will generate identical descriptions. In addition to the named steps there are classifications or categories of steps and written notation languages for recording ballet and other forms of dance [15, 22]. The good agreement among experts indicates that the moves are quite distinct and form a reasonable set for a computer to try to discriminate.

There are three major schools of classical ballet: Russian, French, and Cecchetti or Italian. Since some steps are performed differently in different schools there are about 300 or 400 steps per school and about 800 to 1000 steps total. Major categories of steps that occur in performance include stationary poses, turns and pirouettes, linking steps and movements, jumps, and beats (a fluttering of the legs during a jump). Steps that occur in practice or class and during warmup include relevés, battements and ronds de jambe. Most of the steps can be started from any of five foot positions (although first, fourth, and fifth positions are far more common). Many of the steps can be done either on pointes or on demi-pointes. There are eight arm positions in the Cecchetti school, six in the French, and four in the Russian. There is more freedom for the arms to be expressive and to take a range of positions, and not all arm positions occur with all steps [14, 32].

The practice steps are interesting because they tend to be simple movements with one degree of freedom which are pieces of the more complex movements. Thus they are a sort of syllable or phoneme of dance and we call them "atomic" movements. They include plié, relevé, tendu, développé, and battement. Many steps involve one support leg, and the other leg is termed a "working" leg.

The physical constraints of ballet are the most obvious ones. As in all well controlled motions, maintenance of support and balance is essential. This constraint has a profound effect on whole body movement and constrains most of the motions of the major masses of the body to one intrinsic degree of freedom. As an example from outside ballet, consider the movement of sitting down: people bend at the ankle, knee, hip and spine when they sit, and though there are many possible combinations of angles that will land them in a chair, most people sit with similar movements. It is likely this similarity occurs because people try to maintain their balance and comfortably support their torsos during the movement. Analogously, when two dancers execute the same ballet step, the physical constraints require a high degree of similarity.

Ballet is also subject to numerous cultural constraints which increase the similarity of a step from one execution to the next, and from one dancer to the next. "Placement" and "technique" are the terms for the correct positions of the limbs and ways of moving. They are learned one ballet step at a time, and are well defined for each move. The execution of the steps is also subject to the constraint of "grace:" it is better to do a movement with less extension gracefully than with large extension that shows straining. The set of physical and cultural constraints results in highly repeatable movements.

Of course ballet steps are not executed exactly the same way each time; if they were, standard recognition techniques based on a complete spatio-temporal geometrical description could be exploited. When steps occur in sequences, the beginnings and ends are modified to transition to the neighboring step. This is analogous to co-articulation in speech. Another source of variation is called "extension:" the height to which a leg is lifted or the height of a jump. Because of variations in style and bodies, different dancers may do some movements with less extension; sometimes the choreographer may utilize the special abilities of a dancer and call for a "bravura" form of the movement with more extension.

2.2 Defining the problem

As mentioned in the introduction, many researchers are working on recovering the three-dimensional geometry of moving people (e.g. see [9, 28]). For the work here we assume that the three-dimensional positions of some body points are available and the goal is to recognize the ballet steps from that data. Our system uses a commercially available system [3] to provide video-rate 3D data of approximately 20 markers attached to a human body.

Our specific task is to learn and identify a set of nine atomic ballet movements from the XYZ tracking data. The nine movements, all performed starting from a flat-footed position with legs turned out and with the right leg working and the left leg supporting, are shown in Figure 1 and described below:

1. plié: lowering the torso by bending the knees, then rising back up (used when launching and landing every two-legged jump, and with many other steps);

2. relevé: rising up on the balls of the feet and then lowering (preparation for many steps; a plié or relevé is part of almost every ballet step);

3. tendu à la seconde: sliding the foot to the side and bending the ankle as needed to maintain toe contact with the floor (preparation for many steps);

4. dégagé: raising the leg rapidly about 45° to the side (part of many steps involving transfer of weight or traveling);

5. fondu: a plié on the supporting leg while the working leg bends at the knee and points the toe down (a fluid step frequently used by itself);

6. frappé: raising the working foot vertically by bending the knee and hip until the hip makes a 45° angle, then rapidly straightening the knee and ankle to kick to the side (part of many jetés (one-legged jumps) and assemblés);

7. développé: like frappé but the foot is raised until the hip makes a 90° angle before straightening the knee (frequently used before a promenade, or by itself, especially in partnering, to extend the leg out horizontally);

8. grand battement à la seconde: with the knee straight, raising the leg to the side until the foot is at shoulder height;

9. grand battement devant: same as battement à la seconde except the leg is raised to the front (both grand battements can be used by themselves in allegro passages instead of développé).

Figure 1: Nine "atomic" ballet moves. [Panels: plié, relevé, tendu, dégagé, fondu, frappé, développé, grand battement side, grand battement front.]

These steps were chosen for the following reasons:

• this gives leverage for recognizing more complex steps if several parts of the step are known;

• several of the movements have similarities which will test the system's ability to make fine distinctions;

• several of the movements are quite simple and thus suitable for beginning development of a representation.

We will use two measures of success. The first is a standard pattern recognition evaluation function where, given ground truth, we measure the percentage of time instances in which the system correctly identifies the dance step in progress. A more relevant performance measure is the ability of the system to generate the correct annotation, announcing the same sequence of symbols (dance steps) that a human observer would generate.

3 Related work

In describing related work we focus on (1) the psychophysics and structure-from-motion work that demonstrates the plausibility of solving the specific problem of recognizing motion from the position of 3D points; and (2) previous attempts at recognizing specific motions (for a more complete discussion of related work see [8]).

In 1973 Gunnar Johansson [19] published results from a series of psychophysical experiments which proved that people can recognize movements of humans from Moving Light Displays (MLDs): 2D images of a small number of moving spots attached to the joints. Later results (e.g. [20, 5, 10]) demonstrated the ability of humans to recognize movements such as hammering, box lifting, ball bouncing, and stirring, and two-person activities such as dancing, greeting and boxing.

Early computational vision results (e.g. [31]) proved basic structure-from-motion theorems demonstrating that it is possible to recover the three-dimensional geometry behind MLDs. Hoffman and Flinchbaugh [16] and Webb and Aggarwal [33] exploited the fact that biological motion mostly involves 1DOF rotation joints to compute the three-dimensional motion of restricted human movement. More recent work [35, 12] has focused on exploiting a model in the interpretation of MLDs.

The psychophysics results combined with the 3D-geometry methods raise the issue of whether people perform the task of motion recognition from MLDs using two-dimensional or three-dimensional information. Subjects could either recognize the motion directly from the image positions, or first compute 3D information and then recognize the movement. However, because the 2D input is trivially computable from the 3D information, the results certainly imply that the task should be easy to perform given the 3D position of sufficient points.

Understanding body motion

O'Rourke and Badler [25] tracked features in synthetic imagery, using constraint propagation to implement a feedback loop between high-level and low-level processes of image motion analysis. More recently [17, 1, 28] research has been focused on determining correspondence between frames of an image sequence and the phase of a known body motion. The use of a model with known motion (and typically a known background) greatly simplifies the problem, as certain key features, such as the first few principal components of a shape contour, can be used as a feature vector and can be matched against known vectors.

Additional efforts have devised methods of representing general (model-free) motion. Rangarajan, Allan and Shah [27] developed a system to match trajectories using scale space images of curves of speed vs time and direction vs time. Polana and Nelson [26] present a textural method for detecting periodicity in an XYT solid and, assuming a stationary camera, matching to other periodic motions and rejecting non-periodic motions. Allman and Dyer [2] find optic flow in an XYT solid, trace curves as a function of time, and cluster the curves. Gould and Shah [13] find features on curves from two representations based on trajectories: one curve is X and Y velocity vs time; the other curve is distance vs local curvature. These feature sets can then be compared to previously represented motion.

Perhaps the work most similar to that presented here is that of Shavit [29, 30]. This technique represents "qualitative visual dynamics" using a phase space, the classical technique for analyzing the dynamics of a system. Phase space is a Euclidean space with axes for all the independent variables of a system and their time derivatives. Each point in phase space represents a state of the system, and as the system evolves over time it moves along a phase path. For autonomous systems (systems in which parameters such as spring constants do not change with time and to which energy is not being added) only one phase path passes through each point in the space. In such systems, knowledge of position and velocity of each variable at one time instant completely determines the behavior of the system. Simple dynamic systems (e.g. mass on a spring, or viscous damping) have easily characterized phase portraits in phase space.

In Shavit's work, a binary region in the image is described as a blob of a deformable elastic material. Over time the blob deforms, and the deformations are represented by a time varying deformation parameter vector. The image motion is then described as a trajectory through the phase space that is a function of these parameters. Using training examples of images of known motion, Shavit constructed phase diagrams and segmented them into sections with simple dynamics. Finally, imagery such as black silhouettes of walking, running, and jumping cartoon characters, moving light displays of actors performing the same gaits, and images derived from films of running animals were characterized in terms of combinations of these simple dynamic sections. With these characterizations Shavit was able to identify characteristics of the gaits which are suitable for identifying them. No attempt has yet been made to identify a larger set of movements or motion classes.

Before concluding our discussion of phase-space methods we note that in non-autonomous systems specifying the position and velocity of the independent variables at one time instant is not adequate to describe the motion through phase space. Time varying parameters require that the values of the parameters be specified at many time instances; in the limit one would provide complete parameter trajectories as a function of time. In such cases, specifying velocity information is less critical; approximate values of the velocity can be inferred by finite differencing the time sampled parameter trajectories.

Other methods: Hidden Markov Models (HMMs) were used by Yamato et al. [34] to recognize images of tennis strokes seen from a particular viewpoint. A typical pattern classification technique such as Maximum Likelihood [11] would require a sequence of movements to be segmented into separate dance steps before the steps could be classified. All the difficulty in this approach becomes concentrated in the segmentation phase.

4 Representation and Recognition in Phase Space

Central to this work is the problem of how to represent movement. Two kinds of considerations bear on the choice of representation. One is intrinsic, e.g. is it canonical and can it be stably computed from raw data. The other, extrinsic, involves use of the representation: in this case, how well does it support recognition. In this section we will briefly motivate our choice of representation, explicate its details, and consider its intrinsic properties. Next, we develop a recognition method based upon the representation that requires learning the best parameters to describe each motion. In section 5 we present the results of our recognition experiments.

4.1 Representation Details

Consider several occurrences of a movement, e.g. a plié, in which the legs repeat exactly the same motion while the arms behave differently each time; and consider plotting all the movements in a full phase space of all the torso position and attitude parameters and joint angles, and all their derivatives. Plotting only the leg variables while holding the arm variables constant would show a space curve, and each repetition of the movement would traverse approximately that same space curve. However, if the arm variables are plotted as well, each repetition may traverse a different space curve. The intuition here is that the invariance of the plié (the set of constraints in effect during its execution) is captured in the leg variables, while the variation is in the arm parameters: the constraint of the motion can be found in a subspace of the full phase space.

Figures 2 and 3 illustrate this idea. Each figure shows a 2D-projection of phase space in which a collection of data points is plotted. The data points corresponding to two dancers performing a plié are plotted with two distinct marker symbols, one per dancer; all other data points are marked by a ".". Note that the points corresponding to the plié and the relevé are nicely segregated from the other data points. This intuition that the constraints in effect during a motion are visible in a subspace of phase space motivates our choice of representation for movement.

Figure 2: [Two panels titled "Phase Plot of Plie": R_knee vs. Z_scaled and R_ankle vs. Z_scaled.] Two marker symbols, one per dancer, mark time steps during pliés for two dancers; "." marks time steps during other movements. Angles are in radians. These two views of the phase space show how the plié lies in a region separated from the other movements (data subsampled for clarity).

Figure 3: [Two panels titled "Phase Plot of Releve": Z_scaled vs. R_hip_Phi and Z_scaled vs. R_hip_Y.] Two marker symbols, one per dancer, mark time steps during relevés for two dancers; "." marks time steps during other movements. Two views of relevé.

Define "body phase space" to be a space with axes for each independent parameter of the body model and their first derivatives but no independent time axis. Let a 2D-projection space be a two-dimensional subspace of body phase space spanned by any two axes of the original phase space. Figures 2 and 3 each show two examples of 2D-projection spaces. We can specify 2D-projection space curves by defining curves in the plane of the 2D-projection space. And, finally, we represent motion by a collection of these 2D-projection space curves. If considered in the full phase space, the collection of these 2D-projection curves defines an intersection of manifolds, each defined by one curve.

Continuing with the plié example, suppose we draw a curve through the plié points in each of the two 2D-projection spaces of Figure 2. If we loosely say that "currently performing a plié" is equivalent to the state variables being near both of the hypothesized curves, then, in the full phase space, we have defined the intersection of the R_knee/Z_scaled manifold with the R_ankle/Z_scaled manifold to define a plié. We have also coarsely outlined a strategy for recognizing a motion: at each time step the state of the system can be represented as a point in phase space. We conditionally accept a point as being part of a recognized movement if it is within a threshold distance from each of the 2D-projection space curves used to define the motion. We postpone any further discussion of recognition until after we consider the intrinsic properties of the representation.

Before continuing we mention some of the specifics. For recognizing ballet steps we use cubic polynomials to form the 2D-projection curves. A low order curve implies that a single move cannot require too complicated a motion, agreeing with our intuition that a single action is "simple." Of course, any continuous curve parameterization is possible, such as piecewise linear segments or b-splines. Also, for the examples shown, the body parameters considered are joint angles for 1DOF joints (e.g. knee), Euler angles for 3DOF joints (e.g. hip), torso orientation, and body-height-normalized torso height above the floor plane.

4.2 Representation considerations

Aspects of the general representation problem are discussed in Marr and Nishihara [21] and Badler and Smoliar [4]. Marr and Nishihara consider representations of three-dimensional shapes for object recognition and present three criteria: accessibility, scope and uniqueness, and stability and sensitivity. These criteria are just as relevant to recognizing motions, and we evaluate our representation in light of these considerations.

The representation is clearly accessible; it is easy to compute joint angles from the raw Cartesian tracking data. In our application, scope and uniqueness refer to the class of movements for which the representation is designed and whether movements have canonical descriptions in the representation. Since a point in body phase space completely determines the state of all body parts, two coincident phase paths represent the same motion. Furthermore, the phase variables are uniquely determined by the body geometry, so that two identical motions map to two identical phase paths.

However, since two different dancers will probably not traverse identical phase paths, a question still remains: can two motions judged to be the same by a human observer also be placed in the same category by a system using this representation? In other words, is there a way to represent the commonality between movements, as well as the differences?

By selecting only a subspace of body phase space we can ignore dimensions that are not constrained by a movement. As mentioned, the arms can be excluded when assessing whether a move is a plié. Equally important, if temporal variation (variation in the speed at which a move is executed) is to be abstracted, then all the derivative axes of the body phase space should be excluded. In principle, the learning algorithm in Section 4.4 can learn to ignore all the derivative dimensions; for the results presented in Section 5 we explicitly removed from consideration the velocity axes of the phase space.

Stability and sensitivity: Does degree of similarity in the representation reflect similarity in the motions? And can subtle differences be expressed in the representation? The representation is capable of expressing degrees of similarity by some metric of distance between two paths in phase space. However, the Euler angle representation of 3DOF joints such as hips and shoulders has singularities at certain orientations, and near these singularities the sensitivity becomes unacceptably high (near singularity, an arbitrarily small change in the tip of a vector can cause two of the Euler angles in the representation to make maximal 180° swings). One solution to this sensitivity problem is to use multiple redundant sets of Euler angles to represent 3DOF joints, and to represent a motion using the angle set which is not near singularity.[1] The recognition system described in the next section automatically selects the stable angle set to represent and recognize a given motion.

Footnote 1: Nearness to singularity can always be detected because the axis of the third Euler angle becomes nearly parallel to the axis of the first Euler angle.

Figure 4: Three successive rotations illustrate Euler angles representing 3 rotational degrees of freedom of hip joint. [Panels: starting attitude (femur vertical, knee bent); rotate leg about x; rotate leg about y; rotate leg about z.] Coordinate axes are rigidly attached to femur, i.e. rotations are measured in frame of reference of leg. If the leg is straight and toe pointed such that all 4 reflectors lie on a line, a singularity occurs and the Z angle cannot be recovered from data. Near this singularity the Z measurements suffer from excessive noise.

Quaternions were also considered for representing 3DOF joints. However, an advantage of Euler angles for representing ballet data is related to missing data. When the ankle marker is obscured but hip and knee markers are visible, only partial data can be recovered for hip orientation (at most two out of the three Euler angles, depending on the representation). For Euler angles the partial data can be computed and used for recognition, but for quaternions no values can be computed from the partial data. Thus Euler angles are more robust in the presence of missing data.

One question still remains: why use a phase space of joint angles as opposed to some other set of variables? For the recognition method to work, we need successive repetitions of a motion by different dancers to lie close to the same space curve in phase space, and thus need to represent or describe motion using a set of variables which will give us this property. Such a description needs to be independent of position and orientation of the torso, and it is certainly useful if the description of one limb is independent of the descriptions of the other limbs. Joint angles are a simple representation that has these properties.

4.3 Recognition in phase space

What does it mean to recognize a ballet step? From the perspective of annotation, recognition requires announcing the correct step somewhere during its performance, and not announcing any other steps. However, we begin the development of our recognition system by considering the identification of the currently executed move at each time step.

The basic idea is as follows. At each time instance the state of the system can be represented as a point in phase space. We will accept a point as being part of a recognized movement if it is within a 2D-projection-specific threshold distance of each of the 2D-projection space curves used to define the move.[2]

Footnote 2: In the current implementation, the threshold is fixed, independent of position along the curve. This is a shortcoming which fails to acknowledge the fact that variation is a function of Euler angle sensitivity, of the dancer's degree of control in various positions, and of differences in kinematics of dancers' bodies.

Let us define the following terms:

Pair Relation: $f_{ij}$, a smooth function $\hat{x}_j = f_{ij}(x_i)$ where $x_i$ is an input parameter and $\hat{x}_j$ is a predicted parameter. This is a curve lying in a 2D-projection space.

Threshold: $\epsilon$, a distance above and below a pair predictor which bounds the accepting region of that predictor.
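As a rough illustration of the pair relation and threshold just defined, the sketch below fits a cubic polynomial to sample points by ordinary least squares and then tests whether a new point falls inside the accepting band. The source states only that cubic polynomials are used; the least-squares fitting procedure, the function names (`fit_cubic`, `pair_relation`, `within_threshold`), and the strict inequality are assumptions made for this example, not the authors' implementation.

```python
def fit_cubic(xs, ys):
    """Least-squares cubic fit (assumed fitting method).

    Returns [c0, c1, c2, c3] for c0 + c1*x + c2*x^2 + c3*x^3.
    """
    # Build the 4x4 normal equations A c = b from power sums.
    A = [[sum(x ** (r + c) for x in xs) for c in range(4)] for r in range(4)]
    b = [sum(y * x ** r for x, y in zip(xs, ys)) for r in range(4)]
    # Forward elimination with partial pivoting.
    for col in range(4):
        pivot = max(range(col, 4), key=lambda r: abs(A[r][col]))
        A[col], A[pivot] = A[pivot], A[col]
        b[col], b[pivot] = b[pivot], b[col]
        for r in range(col + 1, 4):
            factor = A[r][col] / A[col][col]
            for c in range(col, 4):
                A[r][c] -= factor * A[col][c]
            b[r] -= factor * b[col]
    # Back substitution.
    coeffs = [0.0] * 4
    for r in range(3, -1, -1):
        tail = sum(A[r][c] * coeffs[c] for c in range(r + 1, 4))
        coeffs[r] = (b[r] - tail) / A[r][r]
    return coeffs


def pair_relation(coeffs, x_i):
    """Predicted parameter x_j_hat = f_ij(x_i), evaluated by Horner's rule."""
    c0, c1, c2, c3 = coeffs
    return c0 + x_i * (c1 + x_i * (c2 + x_i * c3))


def within_threshold(coeffs, x_i, x_j, eps):
    """True if the observed x_j lies within eps above or below the curve."""
    return abs(x_j - pair_relation(coeffs, x_i)) < eps
```

In use, `xs` and `ys` would be the two chosen phase-space coordinates (for example a scaled hip height and a knee angle) at time steps labeled as belonging to one movement, and `eps` the threshold for that 2D-projection.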
Pair Predictor: a binary function of time, $p_{ij}(t) = 1$ if $|x_j - f_{ij}(x_i)| < \epsilon$ and $p_{ij}(t) = 0$ otherwise, composed of a simple pair relation and a threshold.

In the current implementation, a pair relation for a given movement in a given 2D-projection subspace is constructed by fitting a cubic polynomial through the data points known to be from that motion. Note that since the pair relation $f_{ij}$ induces an ordering of dimensions (e.g. predicting ankle-angle from knee-angle is different than the converse) there are $n(n-1)$ possible pair predictors, twice as many as there are two-dimensional subspaces.

The pair predictors may be viewed as each embodying one component of the constraint imposed by a movement. However, they really are static constructs in the sense that no temporal continuity is considered. Even without considering a full dynamic model of dance, it is possible to increase the reliability of the pair predictor $p_{ij}(t)$ by including a simple temporal filter:

Smoothed Predictor: a filter $\tilde{p}_{ij}(t)$ over the pair prediction which eliminates short periods of acceptance or rejection which are less than a set time constant.

Finally, we need to combine pair predictors to represent all the constraints on a movement. To select which combination of predictors is best at representing a motion, we need to define the notion of predictor efficiency. Here we use the standard pattern recognition criterion of minimizing a weighted sum of false acceptances and false rejections. Our goal is to find a collection of pair predictors with the greatest efficiency:

Predictor Fitness: a function $F(p_{ij}(t))$ used to evaluate pair predictors and to order them. Predictor functions used in this paper take the form of a weighted sum of the number of false acceptances and the number of false rejections of the predictor, with a weight factor $w$ setting their relative penalty; a smaller weighted sum means a fitter predictor.

Compound Predictor: a function of time, such as $\tilde{p}_{ij}(t) \wedge \tilde{p}_{kl}(t)$, created by logically ANDing together the smoothed predictions of two or more pair predictors.

Note that the approach to recognition described above defines an independent detector for each dance step; in principle these detectors can be executed in parallel. Thus we are not constructing a pattern recognition system, because there is no constraint preventing the same input data from being recognized as two or more different dance steps.

The described quantities establish a method for representing and recognizing ballet steps. To apply this method we must be able to construct the necessary functions from the data.

4.4 Learning predictors

We employ a supervised learning paradigm to determine the 2D-projection subspaces, the predictor curves, and the thresholds to be used to represent each ballet step. The input data is annotated to indicate, for each time step, whether it is part of a particular dance step or part of a non-dancing pose. To find good identifiers, the system does a hierarchical search, successively constraining the region of phase space in which the movement lies.

For each movement to be learned, the system considers all possible pair predictors for that movement, finds the best threshold for each pair predictor (evaluating predictors according to a fitness function), and saves the $m$ best pair predictors. Several of these are then combined to form a compound predictor for the move. Central to this process is the fitness function, which must evaluate the quality of correlation revealed by a predictor.

4.4.1 Correlations and Fitness Functions

Correlations in the world lead to categorizations. Correlations come in two types: those that are always true, e.g. laws of physics or math, and those that are true when an item is a member of a category. This latter type is the one useful for classification [6]. The learning method tries to find correlations between variables which occur during a move. But how do we find the second kind of correlations, which are useful for categorization? Our technique is to minimize a weighted sum of false acceptances and false rejections of a predictor. The false rejection part of the fitness function seeks a good correlation during the move. The false acceptance part of the function seeks anti-correlation during the non-move. Thus this rule finds the second kind of correlations and not the first.

The learning algorithm proceeds by first using the fitness function to select the acceptance threshold of a given pair predictor. A hierarchical exhaustive search is performed to choose the threshold that maximizes the fitness of the predictor for the current motion. Next, the system determines the $m$ best pair predictors. Since there is a reasonable number ($n(n-1)$) of pair predictors, the pairs are searched exhaustively, and the $m$ best, as measured by the fitness function, are found. However, different fitness functions serve different goals. We found that compounding eliminates many more false acceptances than correct acceptances, so a lower penalty could be put on false acceptances if two-way or three-way compounding was used than if no compounding was used. The intuition here is that true acceptances will be correlated, so they will tend to be accepted after logical ANDing, but false acceptances will be uncorrelated and so will tend to be eliminated by ANDing. In practice the false acceptances tend to be partially correlated, but ANDing predictions usually dramatically decreases the false acceptance rate while only slightly reducing the correct acceptances.

Note that the ability to find functions correlating with category membership allows some freedom in choice of representation: we can allow multiple different representations of the same variable. Although two representations of one variable may be perfectly correlated at all times, they make a poor predictor of category membership and so are not selected by the learning rule. Instead, the system chooses from among the multiple representations the one which predicts best. In the case where multiple sets of Euler angles represent hip orientation, the best predictor tends to be Euler angles which are far from singularity and therefore less noisy.

The result of the learning process is a multidimensional tube-like volume in phase space, formed by intersecting the $(n-1)$-dimensional curved hyperplanes associated with each pair predictor. This result raises the question: why construct the volume by compounding instead of directly optimizing all parameters simultaneously to find the best $m$-dimensional subspace space curve

rate of 60 frames/sec., which is converted to joint angles of six 1DOF and four 3DOF joints in the arms and legs (see Figure 5).

The system was tested on a set of nine ballet steps which were chosen to be "atomic" in the sense that they are "pieces" or "syllables" of more complex ballet movements. Training and recognition data came from two different dancers, whose heights are 157 cm and 173 cm, to test whether the system could do dancer-independent recognition. The input sequences were continuous, not segmented, so the system could make no implicit assumptions about where a dance step begins and ends. Finally, our tests were resubstitution estimates in which the same data is used for training and for testing; this choice was necessitated by an inadequate amount of test data.

The parameters used for learning and recognition were: right hip-X, right hip-Y, right hip-Z: an Euler angle set shown in Figure 4 in which the X-axis (front-back) is superior, the Y-axis (side to side) is the middle axis, and the Z-axis (along femur) is inferior; right hip-φ: part of another Euler angle set which measures deflection of the femur from vertical; right knee and ankle angles; and Z, the average height of the hips, scaled for the dancer's leg length.

As mentioned in the previous section, insensitivity to temporal variation implies that parameter derivatives should not be used as possible candidates for pair predictors. To simplify the learning task we did not include derivative dimensions in the phase space to be searched.

Figure 6 shows the raw data input to the learning and recognition system. Each column contains the data for one dancer executing 9 ballet moves in succession. The difference in duration (16 versus 22 seconds) shows the degree of temporal variability in the data. The spikes in the data are missing data where the tracking system could not provide the three-dimensional position of some of the markers.

As developed, learning occurs in three areas: learning a threshold for each pair predictor, learning which pair predictors are "best" (i.e. maximize the fitness function), and learning a compound predictor. The first two kinds of learning are implemented by choosing the threshold or pair predictor that evaluates highest in the fitness function, and the third is implemented by ANDing the three best pair predictors. There are two parameters which must be set manually: the fitness function weight ratio, and the smoothing time constant.

We evaluate the results in two ways. First, we can consider each time step independently, analyzing how many false acceptances and false rejections each predictor generates. Second, we can count the number of predictor errors, which looks at the behavior of the predictor with respect to the time interval that makes up the move.
A predictor error occur if the predictor (1) outputs to represent the motion? The answer is that the compounding a detection interval during a different step or when no step is approach limits search. The fitness measure as a function of occurring;(2)outputsadetectionintervalmorethanoncewhen threshold does not necessarily have a single minimum (i.e. the thereisonlyoneoccurrenceofthe step; or (3)failsto outputa receiveroperatingcharacteristicisnotconvex),soalargerangeof detectionintervalduringastep. valuesofeachthresholdneedtobesearched. Ifmultiplethresh- Table 1 shows the number of timesteps falsely accepted and oldsweresearchedsimultaneously,itwouldleadtocombinatorial falselyrejected(FAandFR)withneithertemporalsmoothingnor explosion. Theapproachweuseisasequenceoflocaloptimiza- ANDing. This can be viewed as a model-free pattern recogni- tionswhich is not necessarilyaglobal optimum, but the global tion approach to the problem where the system simply adjusts optimumiscomputationallyintractable. thresholdstominimizeFA FR.Manypredictorerrorsoccurred becausethepredictorsoften(cid:1) turnedonoroffforsingletimesteps, 5 ExperimentalResults ascanbeseeninthelastcolumn. Note,however,thatusingjust twovariables,boththeplie´andreleve´canbedetectedquitewell, We have developed a recognition system based on the theory andinfacthavenopredictorerrors. developedintheprevioussection. TheinputisCartesian(XYZ) Table2andthe“announcement”pulsesinFigure7showthe trackingdatarecordedfrom14pointsonthedancer’sbodyata reults of using compound predictors formed by ANDing three 8 Marker Locations Articulated Body Model rigid torso has 3 DOF shoulders location, plus 3 DOF elbows orientation parameters hips 3 DOF shoulder joints wrists 1 DOF elbow joints knees ankles 3 DOF hip joints toes 1 DOF knee joints 1 DOF ankle joints Figure5: Trackingandhumanbodymodels. The AOAsystem[3]returnsXYZdatafor14variablesyielding42channelsofdata. 
From this data 24 model parameters are extracted: torso location and attitude, 6 parameters represented by X, Y, Z, yaw, pitch, and roll; shoulders and hips, 3 parameters each, represented by 3 Euler angles; elbows, knees and ankles, 1 parameter each, represented by 1 angle.

Best pair predictors; no ANDing. Cost ratio = 1:1; temporal filtering time = 0.

Move           | Variables             | Timesteps in move | Timesteps outside move | FA (%)    | FR (%)   | Predictor errors
plié           | hip–$\phi$, Z         | 105               | 2175                   | 0         | 1 (1)    | 0
relevé         | hip–$y$, Z            | 161               | 2119                   | 14 (0.7)  | 26 (9)   | 0
tendu          | hip–$y$, Z            | 143               | 2137                   | 47 (2.2)  | 37 (26)  | 7
dégagé         | hip–$\phi$, knee      | 97                | 2203                   | 31 (1.4)  | 46 (50)  | 17
fondu          | hip–$x$, knee         | 188               | 2092                   | 29 (1.4)  | 88 (47)  | 19
frappé         | hip–$x$, hip–$y$      | 175               | 2105                   | 4 (.02)   | 166 (95) | 6
développé      | hip–$\phi$, Z         | 298               | 1982                   | 121 (6.1) | 147 (50) | 19
g. batt. side  | hip–$y$, Z            | 108               | 2172                   | 5 (.02)   | 79 (73)  | 6
g. batt. front | hip–$x$, hip–$y$      | 105               | 2175                   | 17 (0.8)  | 15 (14)  | 6

Table 1: The cost ratio is the ratio of the cost of FR (false rejection) to the cost of FA (false acceptance). The two timestep columns give the number of timesteps during the move and the number of remaining timesteps (total minus move). This approach tries to maximize the number of timesteps properly classified while ignoring temporal continuity. False acceptance and false rejection receive equal penalties. The predictor error count, [# of times the predictor turned on] − [# of occurrences of dance step], is high because the predictors often turn on very briefly during other movements.

Best for annotation; 3-way ANDing. Cost ratio = 4.3:1; temporal filtering time = .3 sec.

Move           | Variables                                   | Timesteps in move | Timesteps outside move | FA (%)   | FR (%)   | Annot. errs
plié           | hip–$x$, hip–$\phi$, knee, Z                | 105               | 2175                   | 0        | 1 (1)    | 0
relevé         | hip–$y$, ankle, Z                           | 161               | 2119                   | 6 (.3)   | 21 (13)  | 0
tendu          | hip–$x$, hip–$y$, hip–$\phi$, Z             | 143               | 2137                   | 0        | 31 (22)  | 0
dégagé         | hip–$x$, hip–$\phi$, knee                   | 97                | 2203                   | 0        | 15 (16)  | 0
fondu          | hip–$x$, hip–$y$, hip–$z$, hip–$\phi$, Z    | 188               | 2092                   | 0        | 102 (54) | 0
frappé         | hip–$x$, hip–$y$, hip–$\phi$, Z             | 175               | 2105                   | 0        | 18 (10)  | 0
développé      | hip–$x$, hip–$z$, hip–$\phi$, Z             | 298               | 1982                   | 52 (2.6) | 148 (50) | 1
g. batt. side  | hip–$x$, hip–$\phi$, knee, Z                | 108               | 2172                   | 0        | 1 (1)    | 0
g. batt. front | hip–$x$, hip–$y$, ankle, Z                  | 105               | 2175                   | 2 (.01)  | 32 (30)  | 0

Table 2: ANDing means that e.g. only timesteps accepted by all 3 plié detectors (individually temporally filtered) will be classified as plié. Each individual detector can allow higher acceptance rates because the conjunction lowers the probability of random false acceptance. Temporal filtering rejects short-lived acceptance periods. All this allows us to bias the individual predictors towards acceptance with a lower penalty on FA, and yet see very low rates of FA on the compound predictors.

[Figure 6 plot, titled "Input data: joint angles & height". Two columns of panels (dancer 1, dancer 2); rows from top to bottom: Step #, Height, Hip X, Hip Y, Hip Z, Hip Phi, Knee, Ankle; horizontal axis: Time (sec).]

Figure 6: Input data for two dancers processed by the dance step recognition system. The top graph is ground truth identification; the next is torso height scaled by dancer's leg length; three Euler angles representing the dancer's right hip; a fourth Euler angle from a different representation; and right knee and ankle angles. The time base along the bottom is measured in video frames. The "spikes" clipped at the tops of graphs (e.g. in Hip $\phi$) are missing data codes.
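The per-timestep scoring used in Tables 1 and 2, together with the temporal filtering and ANDing described in the Table 2 caption, can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the function names (`count_errors`, `weighted_cost`, `flip_short_runs`, `temporal_filter`, `compound`), the toy acceptance sequences, and the exact filtering rule (flip any run shorter than `min_len` timesteps, acceptances first) are all hypothetical. The filter length is given in timesteps; at the 60 frames/sec input rate, the 0.3 sec setting of Table 2 would correspond to 18 timesteps, but a much shorter value is used here to keep the toy data small.

```python
# Illustrative sketch only: every name and the toy data below are
# assumptions made for this example, not taken from the paper's system.

def count_errors(accepted, truth):
    """Return (false acceptances, false rejections), counted per timestep."""
    fa = sum(1 for a, t in zip(accepted, truth) if a and not t)
    fr = sum(1 for a, t in zip(accepted, truth) if t and not a)
    return fa, fr


def weighted_cost(accepted, truth, fr_to_fa_ratio=1.0):
    """Weighted sum FA + ratio * FR (lower is better)."""
    fa, fr = count_errors(accepted, truth)
    return fa + fr_to_fa_ratio * fr


def flip_short_runs(seq, value, min_len):
    """Copy seq, flipping every run of `value` shorter than min_len timesteps."""
    out = list(seq)
    i = 0
    while i < len(seq):
        j = i
        while j < len(seq) and seq[j] == seq[i]:
            j += 1  # advance j to the end of the current run
        if seq[i] == value and j - i < min_len:
            out[i:j] = [not value] * (j - i)
        i = j
    return out


def temporal_filter(accepted, min_len):
    """Drop brief acceptances first, then fill brief rejections."""
    without_blips = flip_short_runs(accepted, True, min_len)
    return flip_short_runs(without_blips, False, min_len)


def compound(predictors):
    """Logical AND across predictors, timestep by timestep."""
    return [all(step) for step in zip(*predictors)]


def bits(s):
    """Helper: turn a string of 0/1 characters into a list of booleans."""
    return [ch == "1" for ch in s]


# Toy data: the move occupies timesteps 4..9 of 16.
truth = bits("0000111111000000")
a = bits("1100111111000000")  # two false acceptances before the move
b = bits("0000111111110000")  # two false acceptances after the move
c = bits("0000011111000100")  # one late start, one single-timestep blip

filtered = [temporal_filter(p, min_len=2) for p in (a, b, c)]
combined = compound(filtered)

print([count_errors(p, truth) for p in (a, b, c)])  # [(2, 0), (2, 0), (1, 1)]
print(count_errors(combined, truth))                # (0, 1)
```

In this toy case each individual predictor produces false acceptances, but they fall at different timesteps, so the conjunction removes all of them at the price of one false rejection. That mirrors the trade described in the Table 2 caption.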