Visual Analysis of High DOF Articulated Objects with Application to Hand Tracking

James M. Rehg
April 1995
CMU-CS-95-138

School of Computer Science
Carnegie Mellon University
Pittsburgh, PA 15213

Submitted to the Dept. of Electrical and Computer Engineering in partial fulfillment of the requirements for the degree of Doctor of Philosophy.

Committee:
Dr. Takeo Kanade, Chair
Dr. José Moura
Dr. Katsushi Ikeuchi, SCS
Dr. Andrew Witkin, SCS

This research was conducted at the Robotics Institute, Carnegie Mellon University, and partially supported by the NASA George Marshall Space Flight Center (GMSFC), Huntsville, Alabama 35812, through the Graduate Student Researchers Program (GSRP), Grant No. NGT-50559. The views and conclusions contained in this document are those of the author and should not be interpreted as representing the official policies, either expressed or implied, of NASA or the U.S. government.

© 1995 James M. Rehg

Keywords: Model-Based Visual Tracking, Articulated and Nonrigid Object Motion, Occlusions, Human Motion Analysis, Human-Computer Interaction, Gesture Recognition

Abstract

Measurement of human hand and body motion is an important task for applications ranging from athletic performance analysis to advanced user interfaces. Commercial human motion sensors are invasive, requiring the user to wear gloves or targets. This thesis addresses noninvasive real-time 3D tracking of human motion using sequences of ordinary video images. In contrast to other sensors, video cameras are passive and unobtrusive, and can easily be added to existing work environments. Other computer vision systems have demonstrated real-time tracking of a single rigid object in six degrees of freedom (DOFs). Articulated objects like the hand present three challenges to existing rigid-body tracking algorithms: a large number of DOFs (27 for the hand), nonlinear kinematic constraints, and complex self-occlusion effects. This thesis presents a novel tracking framework for articulated objects that uses explicit kinematic models to overcome these obstacles.

Kinematic models play two main roles in this work: they provide geometric constraints on image features and predict self-occlusions. A kinematic model for hand tracking gives the 3D positions of the fingers as a function of the hand state, which consists of the pose of the palm and the finger joint angles. Image features for the hand consist of lines and points which are obtained by projecting finger phalanges and tips into the image plane. The kinematic model provides a geometric constraint on the image-plane positions of hand features as a function of the hand state. Tracking proceeds by registering the projection of the hand model with measured image features at a high frame rate.

Self-occlusions are modeled by arranging the image features in overlapping layers, ordered by their visibility to the camera. The layered representation is generated automatically by the kinematic model and used to constrain registration. This framework was implemented in a hand tracking system called DigitEyes and tested in two sets of experiments. First, a hand was tracked in real time using two cameras and a 27 DOF model, and using a single camera in a 3D mouse user-interface trial. Second, the occlusion handling framework was tested off-line on a motion sequence with significant self-occlusion.
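As a rough illustration of the kinematic constraint described above, the following minimal Python sketch maps a single finger's joint angles to 3D phalanx endpoints and projects them onto the image plane. It is not the DigitEyes implementation; the planar single-finger model, link lengths, and camera parameters are illustrative assumptions.

import numpy as np

def finger_points_3d(base, joint_angles, link_lengths):
    # Chain planar revolute joints to obtain phalanx endpoints in the
    # camera frame; the finger is assumed to flex in the y-z plane.
    pts = [np.asarray(base, dtype=float)]
    heading = 0.0
    for theta, length in zip(joint_angles, link_lengths):
        heading += theta
        step = length * np.array([0.0, np.cos(heading), np.sin(heading)])
        pts.append(pts[-1] + step)
    return np.array(pts)

def project(points_3d, focal_length=500.0):
    # Pinhole projection; the camera looks along +z.
    z = points_3d[:, 2]
    return focal_length * points_3d[:, :2] / z[:, None]

# One finger with three phalanges, mildly flexed, 0.4 m from the camera.
angles = np.radians([10.0, 20.0, 15.0])   # hypothetical joint angles
links = [0.045, 0.025, 0.020]             # hypothetical phalanx lengths (m)
endpoints = finger_points_3d([0.0, 0.0, 0.40], angles, links)
print(project(endpoints))                 # image-plane feature positions

In the full hand model, the state would also include the 6 DOF pose of the palm, and registration would adjust the state so that the projected phalanx lines and tip points match the measured image features.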
These results illustrate the effectiveness of explicit kinematic models in 3D tracking and analysis of self-occluding motion.

Dedicated to Jim and Marci

Acknowledgments

I wish to thank my advisor, Dr. Takeo Kanade, for his support and technical advice during my graduate studies. My years in the Vision and Autonomous Systems Center (VASC) were extremely enjoyable, and I'm grateful to have had the opportunity to complete my thesis in such a marvelous environment.

I was fortunate to interact closely with my committee members, Dr. Katsushi Ikeuchi, Dr. José Moura, and Dr. Andrew Witkin, at different stages of my thesis work. I am grateful to them for their helpful comments and insight. I am especially grateful to Andy for introducing me to deformable models and for making the resources of the graphics lab available for my use.

I want to thank Dr. Ingemar Cox for an enjoyable and productive internship at the NEC Research Institute in Princeton, NJ during the summer of 1991.

I would like to thank my parents, Jim and Marci, for the encouragement and effort that made this dissertation possible. I want to thank my wife, Dorothy, for her constant support and valuable technical insight.

I am grateful to Alberto Elfes for introducing me to computer vision and robotics, and to Radu Jasinschi for his ideas and friendship. I have enjoyed many interesting conversations with members of the VASC group and ECE department over the years. In particular, I want to thank Sandra Ramos-Thuel, Omead Amidi, Heung-Yeung Shum, Luc Robert, and Fabio Cozman for their time.

Thanks to Omead Amidi and Yuji Mesaki for their help with the hardware environment, and to Dr. David Sturman for kindly providing the postscript file for Fig. 2.4.

Jim Rehg
April 7, 1995