DESIGN OF MANY-CAMERA TRACKING SYSTEMS
FOR SCALABILITY AND EFFICIENT RESOURCE
ALLOCATION
A DISSERTATION
SUBMITTED TO THE DEPARTMENT OF ELECTRICAL ENGINEERING
AND THE COMMITTEE ON GRADUATE STUDIES
OF STANFORD UNIVERSITY
IN PARTIAL FULFILLMENT OF THE REQUIREMENTS
FOR THE DEGREE OF
DOCTOR OF PHILOSOPHY
Xing Chen
July 2002
© Copyright by Xing Chen 2002
All Rights Reserved
I certify that I have read this dissertation and that in my opinion
it is fully adequate, in scope and quality, as a dissertation
for the degree of Doctor of Philosophy.
Patrick M. Hanrahan
(Principal Adviser)
I certify that I have read this dissertation and that in my opinion
it is fully adequate, in scope and quality, as a dissertation
for the degree of Doctor of Philosophy.
Christoph Bregler
I certify that I have read this dissertation and that in my opinion
it is fully adequate, in scope and quality, as a dissertation
for the degree of Doctor of Philosophy.
Leonidas J. Guibas
Approved for the University Committee on Graduate Studies:
To my husband Jason Fan,
and to my parents
Huaichen Chen and Jiemin Gong
Abstract
There are many applications where it is useful to know the location and/or orientation of
people or objects as they move through space. For example, head and hand position can
be used as input to drive a virtual reality application, or the joint motion of a dancer can
be captured to drive the movement of an animated figure. Among various motion tracking
technologies, systems using cameras as optical sensors have been the subject of intensive
research for years due to their non-intrusive nature and their immunity to ferromagnetic
distortion. A network of cameras allows for a larger working volume, higher tracking
accuracy, more robustness and greater flexibility, but also introduces a variety of interesting
problems. The architecture of such systems needs to be designed to be scalable to a large
number of cameras. The added complexity of such systems makes it difficult to calibrate
all of the cameras into a single global coordinate frame in a scalable fashion, and also
introduces the problem of how to optimally place cameras to maximize the performance of
the tracking system.
In this dissertation, we have designed and implemented M-Track, a scalable tracking
architecture for real-time motion tracking with tens of cameras. Its one-processor-per-camera
design enables the parallel processing of high-bandwidth image data. The employment of a
central estimator based on an extended Kalman filter allows the smooth and asynchronous
integration of information from camera-processor pairs. Since camera synchronization is
not required, this architecture results in easier deployment, and enables the employment of
heterogeneous sensors including cameras with different resolutions and frame rates. The
architecture also supports tracking of multiple features and automatic labeling of these
features, even when some feature points are temporarily occluded. Three end-to-end
applications are built upon this architecture to demonstrate the usefulness of the system.
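As a rough illustration of the asynchronous integration idea, the following minimal sketch feeds timestamped measurements from two unsynchronized cameras into a single central estimator. It is not the M-Track implementation: it swaps in a linear, one-dimensional constant-velocity Kalman filter for the extended Kalman filter named above, and every name, rate, and noise value in it (`Measurement`, `AsyncKalman1D`, `make_stream`, the 30 Hz and 20 Hz rates, the variances) is a hypothetical choice made only for this example.

```python
from dataclasses import dataclass


@dataclass
class Measurement:
    t: float    # capture timestamp in seconds
    z: float    # observed 1D position of the target
    var: float  # measurement noise variance; may differ per camera


class AsyncKalman1D:
    """Linear constant-velocity Kalman filter; state is (position, velocity)."""

    def __init__(self, q=1.0):
        self.x = [0.0, 0.0]                # state estimate
        self.P = [[1e3, 0.0], [0.0, 1e3]]  # covariance (weak prior)
        self.t = None                      # time of the current estimate
        self.q = q                         # process noise intensity (assumed)

    def predict(self, t):
        """Advance the estimate to time t, whatever the gap since the last update."""
        if self.t is None:
            self.t = t
            return
        dt = t - self.t
        if dt < 0:
            raise ValueError("measurements must arrive in timestamp order")
        p, v = self.x
        (a, b), (c, d) = self.P
        q = self.q
        self.x = [p + v * dt, v]
        # P <- F P F^T + Q with F = [[1, dt], [0, 1]]
        self.P = [
            [a + dt * (b + c) + dt * dt * d + q * dt ** 3 / 3,
             b + dt * d + q * dt ** 2 / 2],
            [c + dt * d + q * dt ** 2 / 2, d + q * dt],
        ]
        self.t = t

    def update(self, m):
        """Fold in one measurement from any camera."""
        self.predict(m.t)
        (a, b), (c, d) = self.P
        s = a + m.var          # innovation variance, with H = [1, 0]
        k0, k1 = a / s, c / s  # Kalman gain
        r = m.z - self.x[0]    # innovation
        self.x = [self.x[0] + k0 * r, self.x[1] + k1 * r]
        # P <- (I - K H) P
        self.P = [[(1 - k0) * a, (1 - k0) * b], [c - k1 * a, d - k1 * b]]


def make_stream(speed=2.0):
    """Two unsynchronized cameras (30 Hz and 20 Hz) observing x(t) = speed * t.

    The observations are noise-free to keep the example deterministic.
    """
    cam_a = [Measurement(i / 30.0, speed * (i / 30.0), 0.01) for i in range(60)]
    cam_b = [Measurement(0.013 + j / 20.0, speed * (0.013 + j / 20.0), 0.04)
             for j in range(40)]
    return sorted(cam_a + cam_b, key=lambda m: m.t)


kf = AsyncKalman1D()
for m in make_stream():
    kf.update(m)
print(kf.t, kf.x)
```

Because the prediction step uses the actual time elapsed since the previous measurement, the estimator does not need the cameras to share a clock rate or a shutter trigger, and each camera can report its own noise level.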
Next, we present a scalable wide-area multi-camera calibration scheme. A large number
of asynchronous cameras can be calibrated into a single consistent coordinate frame by
simply waving a bright light in front of them. This can be achieved even when cameras are
arranged with non-overlapping working volumes and when no initial estimates of camera
poses are available. There is no need for the construction of a universally visible physical
calibration object, and the method is easily adaptable to working volumes of variable size
and shape.
We then propose a quantitative metric for evaluating the tracking quality given a
multi-camera placement configuration. Even though occlusion of tracked objects has a huge
impact on motion tracking, previous work only uses the 3D uncertainty caused by limited
camera resolution to evaluate quality. Our metric considers both camera resolution and the
likelihood of target occlusion. In the formulation of the metric, we propose a novel
probabilistic model that estimates the dynamic self-occlusion of targets and verify its validity
through experimental data. Finally, we analyze various camera placement configurations
using our proposed metric and show the impact on camera placement requirements when
considering either only resolution or both resolution and occlusion.
Acknowledgements
This work would not have been possible without the help and encouragement of many
people.
First of all, I feel extremely privileged and grateful to have had Dr. Pat Hanrahan as
my research advisor for the past five years. He has consistently encouraged me and given
me the freedom to explore a variety of research topics. It has been a rewarding learning
experience for me to work with him, and his vision, advice, high standards and especially
his emphasis on first principles in research have over the years facilitated my growth as
both a researcher and a professional.
My most sincere thanks go to Stanford professors who have graciously expended their
time and effort for my work: Drs. Leo Guibas and Chris Bregler. I thank them for their
technical insights and advice, as well as their friendship and moral support.
I extend my thanks to Dr. Anoop Gupta of Microsoft Research, for initially getting me
started on a video streaming project while he was a professor at Stanford. The experience
from that project has motivated me in my exploration of the continually converging fields
of video, graphics and vision, and their applications in networked environments.
I would like to gratefully acknowledge my good friend and close collaborator in much
of the work in this dissertation, James Davis. His work is an excellent and useful
complement to mine, and this research would not have progressed this far without his knowledge,
insights and hard work.
I would also like to acknowledge my friends and colleagues in the Stanford Graphics
Lab. It has been a lot of fun to be around so many intelligent, motivated people whose
knowledge and creativity I've often drawn on. In particular, I thank François Guimbretière,
Brad Johanson, Tamara Munzner, and Szymon Rusinkiewicz for not only their profes-
sional insights and suggestions, but also for their personal support and friendship. Special
thanks go to staff members John Gerth, Ada Glucksman and Heather Gentner for making
my working environment pleasant and hassle-free with their technical and administrative
support.
My grateful thanks extend to the various organizations that sponsored my work: Intel
Corporation, Interval Research, and Sony Corporation.
Finally, my heartfelt thanks go to my beloved better-half, Jason Fan, who has been
so patient and supportive over the past few years with my flexible, fun-seeking, graduate
school life style while he himself has endured a more regimented one in industry; and I
thank my parents, Huaichen Chen and Jiemin Gong, who are engineering professors
themselves and have raised me to value logic, curiosity, creativity, and scientific exploration,
and who have trusted me and served as an emotional anchor throughout my life. Their love
has been a constant source of strength for me and I dedicate this dissertation to them.
Contents
Abstract
Acknowledgements
1 Introduction
1.1 Applications of motion tracking
1.2 Sensor technologies
1.3 Motivation for a network of cameras
1.3.1 Benefits
1.3.2 Issues to be solved
1.4 Related work
1.5 Contributions of this dissertation
1.6 Organization of this dissertation
2 M-Track: A Scalable, Asynchronous Architecture
2.1 Introduction
2.2 M-Track architecture
2.2.1 General overview
2.2.2 Distributed image-space processing
2.2.3 Central state-space estimator
2.2.3.1 Tracking using asynchronous cameras
2.2.3.2 Support for tracking multiple independent points
2.2.4 Networking module
2.2.4.1 Communication protocol
2.2.4.2 Timing issues
2.3 Applications
2.3.1 3D head tracking for the Interactive Mural
2.3.2 LumiPoint
2.3.3 Simultaneous body and face motion capture
2.4 Discussion
3 Calibrating Distributed Camera Networks
3.1 Introduction
3.2 Background
3.2.1 Basis of traditional camera calibration
3.2.2 Previous work
3.3 Proposed Method
3.3.1 Initial extrinsic calibration
3.3.2 Iterative refinement
3.4 Results
3.4.1 Evaluation method
3.4.2 Comparison with existing techniques
3.4.3 Example in a wide area setting
3.5 Conclusions
4 Determining Optimal Camera Configurations
4.1 Introduction
4.2 Related work
4.3 Construction of the quality metric
4.3.1 Causes of 3D position uncertainty
4.3.2 Quality metric of a camera configuration: general definition
4.3.3 Uncertainty of a point with resolution only
4.3.4 Uncertainty of a point with dynamic occlusion
4.3.5 Uncertainty of a volume
4.3.6 Non-uniform distributions
4.3.7 Summary of our proposed metric