Computational Studies
of Human Motion:
Part 1, Tracking and Motion
Synthesis
David A. Forsyth, Okan Arikan, Leslie Ikemoto,
James O’Brien, Deva Ramanan
David A. Forsyth
University of Illinois Urbana Champaign
Okan Arikan
University of Texas at Austin
Leslie Ikemoto
University of California, Berkeley
James O’Brien
University of California, Berkeley
Deva Ramanan
Toyota Technological Institute at Chicago
Boston – Delft
Foundations and Trends® in
Computer Graphics and Vision
Published, sold and distributed by:
now Publishers Inc.
PO Box 1024
Hanover, MA 02339
USA
Tel. +1-781-985-4510
www.nowpublishers.com
sales@nowpublishers.com
Outside North America:
now Publishers Inc.
PO Box 179
2600 AD Delft
The Netherlands
Tel. +31-6-51115274
A Cataloging-in-Publication record is available from the Library of Congress
The preferred citation for this publication is D.A. Forsyth, O. Arikan, L. Ikemoto,
J. O’Brien, D. Ramanan, Computational Studies of Human Motion: Part 1, Tracking
and Motion Synthesis, Foundations and Trends® in Computer Graphics and Vision,
vol 1, no 2/3, pp 77–254, 2005
Printed on acid-free paper
ISBN: 1-933019-72-7
© 2006 D.A. Forsyth, O. Arikan, L. Ikemoto, J. O’Brien, D. Ramanan
All rights reserved. No part of this publication may be reproduced, stored in a retrieval
system, or transmitted in any form or by any means, mechanical, photocopying, recording
or otherwise, without prior written permission of the publishers.
Photocopying. In the USA: This journal is registered at the Copyright Clearance Cen-
ter, Inc., 222 Rosewood Drive, Danvers, MA 01923. Authorization to photocopy items for
internal or personal use, or the internal or personal use of specific clients, is granted by
now Publishers Inc for users registered with the Copyright Clearance Center (CCC). The
‘services’ for users can be found on the internet at: www.copyright.com
For those organizations that have been granted a photocopy license, a separate system
of payment has been arranged. Authorization does not extend to other kinds of copy-
ing, such as that for general distribution, for advertising or promotional purposes, for
creating new collective works, or for resale. In the rest of the world: Permission to pho-
tocopy must be obtained from the copyright owner. Please apply to now Publishers Inc.,
PO Box 1024, Hanover, MA 02339, USA; Tel. +1 781 871 0245; www.nowpublishers.com;
sales@nowpublishers.com
now Publishers Inc. has an exclusive license to publish this material worldwide. Permission
to use this content must be obtained from the copyright license holder. Please apply to now
Publishers, PO Box 179, 2600 AD Delft, The Netherlands, www.nowpublishers.com; e-mail:
sales@nowpublishers.com
Foundations and Trends® in
Computer Graphics and Vision
Volume 1 Issue 2/3, 2005
Editorial Board
Editors-in-Chief:
Brian Curless
University of Washington
Luc Van Gool
KU Leuven/ETH Zurich
Richard Szeliski
Microsoft Research
Editors
Marc Alexa (TU Darmstadt) Jitendra Malik (UC. Berkeley)
Ronen Basri (Weizmann Inst) Steve Marschner (Cornell U.)
Peter Belhumeur (Columbia) Shree Nayar (Columbia)
Andrew Blake (Microsoft Research) James O’Brien (UC. Berkeley)
Chris Bregler (NYU) Tomas Pajdla (Czech Tech U)
Joachim Buhmann (U. Bonn) Pietro Perona (Caltech)
Michael Cohen (Microsoft Research) Marc Pollefeys (U. North Carolina)
Paul Debevec (USC, ICT) Jean Ponce (U. Illinois UC)
Julie Dorsey (Yale) Long Quan (INRIA)
Fredo Durand (MIT) Cordelia Schmid (INRIA)
Olivier Faugeras (INRIA) Steve Seitz (U. Washington)
Mike Gleicher (U. of Wisconsin) Amnon Shashua (Hebrew Univ)
William Freeman (MIT) Peter Shirley (U. of Utah)
Richard Hartley (ANU) Stefano Soatto (UCLA)
Aaron Hertzmann (U. of Toronto) Joachim Weickert (U. Saarland)
Hugues Hoppe (Microsoft Research) Song Chun Zhu (UCLA)
David Lowe (U. British Columbia) Andrew Zisserman (Oxford Univ)
Editorial Scope
Foundations and Trends® in Computer Graphics and Vision
will publish survey and tutorial articles in the following topics:
• Rendering: Lighting models; Forward rendering; Inverse rendering;
  Image-based rendering; Non-photorealistic rendering; Graphics
  hardware; Visibility computation
• Shape: Surface reconstruction; Range imaging; Geometric modelling;
  Parameterization
• Mesh simplification
• Animation: Motion capture and processing; Physics-based modelling;
  Character animation
• Sensors and sensing
• Image restoration and enhancement
• Segmentation and grouping
• Feature detection and selection
• Color processing
• Texture analysis and synthesis
• Illumination and reflectance modeling
• Shape Representation
• Tracking
• Calibration
• Structure from motion
• Motion estimation and registration
• Stereo matching and reconstruction
• 3D reconstruction and image-based modeling
• Learning and statistical methods
• Appearance-based matching
• Object and scene recognition
• Face detection and recognition
• Activity and gesture recognition
• Image and Video Retrieval
• Video analysis and event recognition
• Medical Image Analysis
• Robot Localization and Navigation
Information for Librarians
Foundations and Trends® in Computer Graphics and Vision, 2005, Volume 1,
4 issues. ISSN paper version 1572-2740. ISSN online version 1572-2759. Also
available as a combined paper and online subscription.
Foundations and Trends® in
Computer Graphics and Vision
Vol. 1, No 2/3 (2005) 77–254
© 2006 D.A. Forsyth, O. Arikan, L. Ikemoto,
J. O’Brien, D. Ramanan
DOI: 10.1561/0600000005
Computational Studies of Human Motion:
Part 1, Tracking and Motion Synthesis
David A. Forsyth¹, Okan Arikan², Leslie
Ikemoto³, James O’Brien⁴ and Deva Ramanan⁵
¹ University of Illinois Urbana Champaign
² University of Texas at Austin
³ University of California, Berkeley
⁴ University of California, Berkeley
⁵ Toyota Technological Institute at Chicago
Abstract
We review methods for kinematic tracking of the human body in video.
The review is part of a projected book that is intended to cross-fertilize
ideas about motion representation between the animation and computer
vision communities. The review confines itself to the earlier stages
of motion, focusing on tracking and motion synthesis; future material
will cover activity representation and motion generation.
In general, we take the position that tracking does not necessarily
involve (as is usually thought) complex multimodal inference problems.
Instead, there are two key problems, both easy to state.
The first is lifting, where one must infer the configuration of the
body in three dimensions from image data. Ambiguities in lifting can
result in multimodal inference problems, and we review what little is
known about the extent to which a lift is ambiguous. The second is
data association, where one must determine which pixels in an image
come from the body. We see a tracking by detection approach as the
most productive, and review various human detection methods.
Lifting, and a variety of other problems, can be simplified by observing
temporal structure in motion, and we review the literature on data-driven
human animation to expose what is known about this structure.
Accurate generative models of human motion would be extremely useful
in both animation and tracking, and we discuss the profound difficulties
encountered in building such models. Discriminative methods – which
should be able to tell whether an observed motion is human or not –
do not work well yet, and we discuss why.
There is an extensive discussion of open issues. In particular, we
discuss the nature and extent of lifting ambiguities, which appear to
be significant at short timescales and insignificant at longer timescales.
This discussion suggests that the best tracking strategy is to track a 2D
representation, and then lift it. We point out some puzzling phenomena
associated with the choice of human motion representation – joint
angles vs. joint positions. Finally, we give a quick guide to resources.
Contents
1 Tracking: Fundamental Notions
1.1 General observations
1.2 Tracking by detection
1.3 Tracking using flow
1.4 Tracking with probability
2 Tracking: Relations between 3D and 2D
2.1 Kinematic inference with multiple views
2.2 Lifting to 3D
2.3 Multiple modes, randomized search and human tracking
3 Tracking: Data Association for Human Tracking
3.1 Detecting humans
3.2 Tracking by matching revisited
3.3 Evaluation
4 Motion Synthesis
4.1 Fundamental notions
4.2 Motion signal processing