Novel Skeletal Representation for Articulated Creatures AThesis Presentedto TheAcademicFaculty by Gabriel J. Brostow InPartialFulfillment oftheRequirementsfortheDegree DoctorofPhilosophy CollegeofComputing GeorgiaInstituteofTechnology April2004 Novel Skeletal Representation for Articulated Creatures Approvedby: LeonardMcMillan IrfanA.Essa,CommitteeChair (UNCatChapelHill) AaronBobick JamesRehg JessicaK.Hodgins GregTurk (CarnegieMellonUniversity) DateApproved: 12April2004 TABLE OF CONTENTS LISTOFTABLES . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . v LISTOFFIGURES . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . vi SUMMARY . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . x CHAPTERI INTRODUCTION . . . . . . . . . . . . . . . . . . . . . . . . . 1 1.1 BroaderImpactandApplications . . . . . . . . . . . . . . . . . . . . . . 3 1.2 MathematicalObjective . . . . . . . . . . . . . . . . . . . . . . . . . . . 6 CHAPTERII RELATEDWORK . . . . . . . . . . . . . . . . . . . . . . . . . 9 CHAPTERIII SPINEFORMULATION&ESTIMATION . . . . . . . . . . . 13 3.1 Formulation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 13 3.2 CreatingaSpineforaSingleFrame . . . . . . . . . . . . . . . . . . . . . 15 3.3 CorrespondenceTracking . . . . . . . . . . . . . . . . . . . . . . . . . . 20 3.4 ImposingaSingleGraphontheSpine . . . . . . . . . . . . . . . . . . . 22 CHAPTERIV IMPLEMENTATION&EXPERIMENTALRESULTS . . . . . 23 4.1 DesignDecisions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 23 4.2 DataAcquisitionandProcessing . . . . . . . . . . . . . . . . . . . . . . 24 4.3 Results . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 29 4.4 Limitations . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 41 CHAPTERV COMPARISONOFAPPLICABILITY . . . . . . . . . . . . . . 43 5.1 PossibleApplicationsOverview . . . . . . . . . . . . . . . . . . . . . . . 43 5.2 TheMotionCaptureProblem . . . . . . . . . . . . . . . . . . . . . . . . 45 5.3 CurrentSolution . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 45 5.4 IncorporatingSpines . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 48 5.5 EvaluationPlan . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 48 CHAPTERVI CONCLUSION&FUTUREWORK . . . . . . . . . . . . . . . 51 6.1 Contributions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 51 iii 6.2 FutureWork . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 53 6.2.1 FusingSpineNodes . . . . . . . . . . . . . . . . . . . . . . . . . 54 6.2.2 LimbOptimizationFrameworks . . . . . . . . . . . . . . . . . . 55 6.2.3 UseofthisRepresentation . . . . . . . . . . . . . . . . . . . . . 58 6.3 FinalDiscussion . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 59 REFERENCES . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 60 iv LIST OF TABLES Table1 Contextofourapproach(markedincolor)withrespecttoexistingalgo- rithmsinestablishedfields. . . . . . . . . . . . . . . . . . . . . . . . . . 9 v LIST OF FIGURES Figure1 Photos of five subjects filmed to obtain experimental data. The middle image shows the surface of the adult human, and pictured clockwise from the upper left is the human baby (11 months old), the Rosehair tarantula,thedog,andthecamelmarionette. . . . . . . . . . . . . . . . 4 Figure2 Spine graph limbs encoding motion over time; nodes labeled for illus- trationonly. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 7 Figure3 (A)Articulatedsubject,(B)reconstructedsurface,(C)extractedskeleton. 8 Figure4 Exampleofgeneratingaskeletonforasyntheticstarfishmesh. (A)Cap- ture images of the starfish from a variety of vantage points (B) Extract a 3D surface using generalized voxel carving and improved marching cubes (C) Starting at one extremity tip, calculate geodesic distances for each vertex (D) Quantize distances and cluster vertices into bins of the same distance (E) Create a skeleton by walking through the progression of level set rings (F) Repeat C-E for each tip and merge into a single representativeskeleton. . . . . . . . . . . . . . . . . . . . . . . . . . . . 16 Figure5 2Dexampleofclusteringconnectedverticesintobinsofsimilargeodesic distanceandwalkingthroughtheresultinglevelsetrings. . . . . . . . . 17 Figure6 Theredandgreenskeletonsrepresentthesame“creature,”possiblyseeded from two different places. Wishing to copy nodes from the best limbs eachconstituentskeletonhastooffer,wedevelopedaleaf-nodeseeking topology matching algorithm that recognizes that these pairs of three- wayjunctionsshouldbeasinglefour-wayjunction. . . . . . . . . . . . 18 Figure7 Refinement through imposing of correspondence into the sequence. In- stead of greedily including every protrusion that appears to be an end effector,weareabletokeeponlythelimbsthatappearconsistentlyover time. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 21 Figure8 The sequence of skeleton-trees (left) has separate node-branches L ..L . A I The limb-to-limb correspondence is known across time, but each node exists only in one limb for one frame. Normalizing each limb’s length with respect to time, we resample the Spine to form one set of Spine- nodes(right)whosepositionvariesasafunctionoftime. . . . . . . . . . 22 Figure9 Construction previsualization of our data capture stage. The structure is designed to accommodate small subjects, and support 20 or more cam- eraswithoutobstructingincominglight. Note“floating”calibrationpat- ternofdimensions20x14cm. . . . . . . . . . . . . . . . . . . . . . . 25 vi Figure10 Datacapturestagesurroundingaplatformwithaninterchangeabletable top. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 26 Figure11 Camel marionette used for experimentation, after replacing strings with fishingline(inset). Thissegmentedexampleframefromthevideofootage showstheblue2Dboundingbox,andthesubsequentlyestimatedbound- ingvolumeusedbytheGVCalgorithm. . . . . . . . . . . . . . . . . . . 29 Figure12 BABY DATASET: From left to right, one of the views, voxels, polygonal model,levelsets,andSpinewithdistancefunction. . . . . . . . . . . . 30 Figure13 DOG DATASET: From left to right, subject, polygonal model, distance function,levelsets,andresultingSpine. . . . . . . . . . . . . . . . . . 30 Figure14 CAMEL PUPPET DATASET: Fromlefttoright,oneview,wireframe,dis- tancefunction,levelsets,andresultingSpine. . . . . . . . . . . . . . . 30 Figure15 Graphofcamel’snumberofendeffectorlimbtipsfoundovertimewhen processingsurfacesindividually. |E |variesbetweenfourandten. . . . . 32 t Figure16 Histogramshowinghowmanysurfacesinthecamelsequencewerefound tohaveeachnumberoflimbtips. Thecamelsequencerevealedfivelimb tipswiththegreatestconsistency,asisappropriateforthiscreature. . . . 33 Figure17 Rendered version of the camel Spine, extracted from the animated se- quence. Pictured nodes appear consistently in each frame with known correspondenceandorientation. Eachlimbwasparameterizedonlength and discretized into 15 equal samples, though any subdivision can be used. Frames where limbs were tucked are interpolated for animation purposes. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 33 Figure18 Photograph of tarantula subject taken through transparent table top. Be- yondeightlegs,tarantulashaveapedipalponeachsideofthefangs,and anabdomensection. . . . . . . . . . . . . . . . . . . . . . . . . . . . . 34 Figure19 Tiled subimages of tarantula subject’s footage filmed with 20 cameras. Variations in colorsare result of user error in adjusting settings, though two images pictured in bottom row come from JVC cameras instead of Canon. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 35 Figure20 Reconstructedsurfacemeshoftarantulausingarrayofframes#29. {Red, green, blue} coloring represents { x,y,z } components of surface nor- mals. Lumps on surface are result of fine subdivision of voxel volume despitecomparativelylargepixelsinvideofootage(atmaximalzoom). . 36 vii Figure21 Geodesic distance measured from first automatically detected tarantula limb tip, colored from black to white with increasing distance. Sub- division of this distance field into connected components of level sets according to our algorithm produces the pictured skeleton. Other limb tipssubsequentlyyieldotherskeletonsthataremergedwiththepictured one. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 37 Figure22 Differentviewsofthemergedtarantulaskeleton: (A)Withoutedgescon- nectingskeletonnodes,(B)withedgesthatconvergeonapoint. . . . . . 37 Figure23 Graph of tarantula’s number of end effector limb tips found over time when processing surfaces individually. |E | varies between seven and t fifteen. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 38 Figure24 Histogram showing how many surfaces in the tarantula sequence were foundtohaveeachnumberoflimbtips. Thetarantulasequencerevealed ten limb tips with the greatest consistency, which is almost appropriate; tarantulashaveeightlegs,twopedipalps,andanabdomen. . . . . . . . . 38 Figure25 Rendering of Spine estimated for adult human subject (left), and corre- spondingsurfacemeshcoloredaccordingtonormals. . . . . . . . . . . . 39 Figure26 (A)Graphofadulthuman’snumberofendeffectorlimbtipsfoundover timewhenprocessingsurfacesindividually. |E |variesbetweentwoand t five. (B)Whilemostdatawasprocessedbyquantizingthegeodesicdis- tance field into 30 levels, the pictured graph shows the change in |E | t whenusing40levelsinstead. Thenumberoflevelsusedcannotbearbi- trarily high unless triangles in the surface mesh are subdivided, because levelsetsmustformcompletecircuitsaroundthebody. . . . . . . . . . . 40 Figure27 Histogram showing how many surfaces in the adult human sequence werefoundtohaveeachnumberoflimbtips. Theadulthumansequence revealed four limb tips with the greatest consistency instead of five, be- causeoneofthetwoarmswasalternatinglytuckedagainstthebody. . . . 41 Figure28 Example illustrating difficulty in aligning coordinate frames defined by three or more markers each. Co-locating one red and blue corner of the two frames can easily leave the other two corners grossly misaligned, so best fit techniques must be used instead (e.g., least squared error on distance). . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 46 Figure29 Illustration of attaching bone screw markers to a skeleton. Depending on prior knowledge about limb lengths and joint locations with respect tomarkers,notalllimbsneedthreescrewstobetrackedeffectively. . . . 47 viii Figure30 (A) Illustration of explicit optical motion capture markers attached to subject’s surface (top). In postprocessing, a user selects clouds of mark- ers that subsequently drive the pose of each limb in a revolute joint armature (bottom). (x,y,z) location of each optical marker is tracked and available for pose estimation (B) Illustration of implicit Spine node markers in subject’s interior. In the same manner as for motion capture, ausercanselectclustersofnodestodriveeachofthetwopicturedlimbs on an armature. Spine node markers have (x,y,z) location and absolute orientation. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 50 Figure31 One hypothesized separation of this branch of the Spine tree into three sections,eachbeingevaluatedasrigidlyrotatingaboutitsparentjoint. . 56 Figure32 While both are Spine trees appearing to have three limbs, the creature on the left has three legs, while the one on the right is a snake with bull- horns. Performing a local Spine flexibility analysis would reveal a good startingpointfortheplacementofthreejointsineithercase,andreduces superfluousjoint-locationhypotheses. . . . . . . . . . . . . . . . . . . . 57 Figure33 Volume Capture merges technologies of data acquisition for skeletal poses(mocap),surfacemodels(Cyberware),andlighting/texture([26]). . 58 ix SUMMARY Thisresearchexaminesanapproachforcapturing3Dsurfaceandstructuraldataof movingarticulatedcreatures. Giventhetaskofnon-invasivelyandautomaticallycapturing suchdata,amethodologyandtheassociatedexperimentsarepresented,thatapplytomulti- viewvideosofthesubject’smotion. Ourthesisstates: Afunctionalstructureandthetime- varyingsurfaceofanarticulatedcreaturesubjectarecontainedinasequenceofits3Ddata. Afunctionalstructureisoneexampleofthepossiblearrangementsofinternalmechanisms (kinematic joints, springs, etc.) that is capable of performing the motions observed in the inputdata. Volumetricstructuresarefrequentlyusedasshapedescriptorsfor3Ddata. Thecapture of such data is being facilitated by developments in multi-view video and range scanning, extending to subjects that are alive and moving. In this research, we examine vision-based modeling and the related representation of moving articulated creatures using Spines. We define a Spine as a branching axial structure representing the shape and topology of a 3D object’slimbs,andcapturingthelimbs’correspondenceandmotionovertime. TheSpineconceptbuildsonskeletalrepresentationsoftenusedtodescribetheinternal structure of an articulated object and the significant protrusions. Our representation of a Spine provides for enhancements over a 3D skeleton. These enhancements form tempo- rallyconsistentlimbhierarchiesthatcontaincorrespondenceinformationaboutrealmotion data. We present a practical implementation that approximates a Spine’s joint probability function to reconstruct Spines for synthetic and real subjects that move. In general, our approach combines the objectives of generalized cylinders, 3D scanning, and markerless motion capture to generate baseline models from real puppets, animals, and human sub- jects. x
Description: