Face Geometry and Appearance Modeling: Concepts and Applications

Zicheng Liu, Microsoft Research, Redmond
Zhengyou Zhang, Microsoft Research, Redmond

Cambridge University Press, 2011

Human faces are familiar to our visual systems. We easily recognize a person's face in arbitrary lighting conditions and in a variety of poses, detect small appearance changes, and notice subtle expression details. Can computer vision systems process face images as well as human vision systems can?

Face image processing has potential applications in surveillance, image and video search, social networking, and other domains. A comprehensive guide to this fascinating topic, this book provides a systematic description of modeling face geometry and appearance from images, including information on mathematical tools, physical concepts, image processing and computer vision techniques, and concrete prototype systems.

This book will be an excellent reference for researchers and graduate students in computer vision, computer graphics, and multimedia, as well as application developers who would like to gain a better understanding of the state of the art.

Dr. Zicheng Liu is a Senior Researcher at Microsoft Research, Redmond. He has worked on a variety of topics including combinatorial optimization, linked figure animation, and microphone array signal processing. His current research interests include activity recognition, face modeling and animation, and multimedia collaboration. He has published more than 70 papers in peer-reviewed international journals and conferences and holds more than 40 granted patents.

Dr. Zhengyou Zhang is a Principal Researcher with Microsoft Research, Redmond, and manages the multimodal collaboration research team. He has published more than 200 papers in refereed international journals and conferences and coauthored 3-D Dynamic Scene Analysis: A Stereo Based Approach (1992); Epipolar Geometry in Stereo, Motion and Object Recognition (1996); Computer Vision (1998); and Face Detection and Adaptation (2010). He is an IEEE Fellow.

© Zicheng Liu and Zhengyou Zhang 2011
Contents

1 Introduction
  1.1 Literature review
  1.2 Scope of the book
  1.3 Notation

PART I FACE REPRESENTATIONS

2 Shape models
  2.1 Mesh
  2.2 Parametric surfaces
  2.3 Linear space representation
  2.4 Expression space representation

3 Appearance models
  3.1 Illumination models
  3.2 Irradiance environment map
  3.3 Spherical harmonics
  3.4 Morphable model of face albedo

PART II FACE MODELING

4 Shape modeling with active sensors
  4.1 Laser scanners
  4.2 Structured light systems
  4.3 Structured light stereo systems

5 Shape modeling from images
  5.1 Structure from motion approach
  5.2 Stereo vision approach
  5.3 Two orthogonal views
  5.4 A single view

6 Appearance modeling
  6.1 Reflectometry
  6.2 Reconstruction of irradiance environment maps
  6.3 Illumination recovery from specular reflection

7 Joint shape and appearance modeling
  7.1 Shape from shading
  7.2 Face morphable model
  7.3 Spherical harmonic basis morphable model
  7.4 Data-driven bilinear illumination model
  7.5 Spatially varying texture morphable model
  7.6 Comparison between SHBMM- and MRF-based methods

PART III APPLICATIONS

8 Face animation
  8.1 Talking head
  8.2 Facial expression synthesis
  8.3 Physically based facial expression synthesis
  8.4 Morph-based facial expression synthesis
  8.5 Expression mapping
  8.6 Expression synthesis through feature motion propagation

9 Appearance editing
  9.1 Detail transfer
  9.2 Physiologically based approach
  9.3 Virtual lighting
  9.4 Active lighting

10 Model-based tracking and gaze correction
  10.1 Head pose estimation
  10.2 Monocular head pose tracking
  10.3 Stereo head pose tracking
  10.4 Multicamera head pose tracking
  10.5 Eye-gaze correction for videoconferencing

11 Human computer interaction
  11.1 Conversational agent
  11.2 Face-based human interactive proof system

Bibliography
Index

1 Introduction

Face image processing has been a fascinating topic for researchers in computer vision, computer graphics, and multimedia. The human face is so interesting to us partly because of its familiarity. We look at many different faces every day. We rely on our faces as a communication channel to convey information that is difficult or impossible to convey by other means. Human faces are so familiar to our visual system that we can easily recognize a person's face in arbitrary lighting conditions and pose variations, detect small appearance changes, and notice subtle expression details. A common question in many researchers' minds is whether a computer vision system can process face images as well as a human vision system.

Face image processing has many applications. In human computer interaction, we would like the computer to be able to understand the emotional state of the user, as this enriches interactivity and improves the productivity of the user. In computer surveillance, automatic face recognition is extremely helpful for both the prevention and investigation of criminal activities. In teleconferencing, face image analysis and synthesis techniques are useful for video compression, image quality enhancement, and better presentation of remote participants. In entertainment, face modeling and animation techniques are critical for generating realistic-looking virtual characters. Some video games even allow players to create their personalized face models and put "themselves" in the game.

The last decade has been an exciting period for face image processing, with many new techniques being developed and refined.
But the literature is scattered, and it has become increasingly difficult for someone who is new to this field to search for published papers and to get a good understanding of the state of the art. The goal of this book is to provide a systematic treatment of the technologies that have been developed in this field. We hope that it will serve as a good reference for practitioners, researchers, and students who are interested in this field.

Face image processing is a broad field. It is virtually impossible to cover all the topics in this field in a single book. This book mainly focuses on geometry and appearance modeling, which involves the representation of face geometry and reflectance properties and how to recover them from images. Face geometry and appearance modeling is a core component of the face image processing field. In general, any face image processing task has to deal with pose and lighting variations, which can be addressed by applying 3D geometry and appearance modeling techniques. Some of these applications are described in the last part of the book.

In this chapter, we give a literature review and describe the scope of the book and the notation scheme.

1.1 Literature review

Human perception of faces has been a long-standing problem in psychology and neuroscience. Early works can be traced back to Duchenne [49, 50] and Darwin [42]. Duchenne [49, 50] studied the connection between an individual facial muscle contraction and the resulting facial movement by stimulating facial muscles with electrodes at selected points on the face. Darwin [42] studied the connections between emotions and facial expressions. In 1888, Galton [69] studied how to use iris and fingerprints to identify people. In the past decade, there has been a great deal of study of what features humans use to recognize faces and objects and how these features are encoded by the nervous system [17, 56, 70, 89].

Early research on computational methods for face recognition can be traced back to Bledsoe [20], Kelly [109], and Kanade [106, 107]. Kanade's thesis [106] is a pioneering work on facial image analysis and feature extraction. In these works, the classification was usually done by using the distances between detected face feature points.

The early 1970s was also the time frame when the first computational methods for face image synthesis were developed. In 1972, Parke [162] developed a polygonal head model and generated mouth and eye opening and closing motions. After that, Parke [163] continued his research by collecting facial expression polygon data and interpolating between the polygonal models to generate continuous motion.

In 1977, Ekman and Friesen [52] developed the Facial Action Coding System (FACS). This scheme breaks down facial movement into Action Units (AUs), where each AU represents an atomic action that results from the contraction of either a single muscle or a small number of muscles. FACS has been extensively used for both analysis and synthesis tasks.

In the early 1980s, as part of an effort to develop a system for performing the actions of American Sign Language, Platt and Badler [170] developed a physically based representation for facial animation. In their representation, face skin deformation was caused by the contraction of muscles, and a mechanism was developed to propagate muscle contractions to the skin surface deformation.

In 1985, Bergeron and Lachapelle [13] produced the famous animated short film Tony de Peltrie. They photographed a real person performing 28 different facial expressions. These images were marked manually, and the feature point motions were used to drive their face model. Their work demonstrated the power of facial expression mapping.
In 1986 and 1987, Pearce et al. [165] and Lewis and Parke [124] developed speech-driven animation systems that used phoneme recognition to drive lip movement. In 1987, Waters [233] developed a muscle model for generating 3D facial expressions. Compared to the work of Platt and Badler [170], Waters's model is more flexible in that it avoids hard-coding the mapping from a muscle contraction to the skin surface deformation. This technique was adopted by Pixar in the animated short film Tin Toy [115].

In 1988, Magnenat-Thalmann et al. [140] proposed to represent a face animation as a series of abstract muscle action procedures, where each abstract muscle action does not necessarily correspond to an actual muscle.

The 1990s witnessed an exciting period of rapid advancement in both the face image analysis and synthesis fields. In 1990, range scanners such as the ones manufactured by Cyberware, Inc., became commercially available. Such scanners were extremely valuable for data collection. In the same year, Williams [239] developed a performance-driven facial animation system. He obtained a realistic 3D face model by using a Cyberware scanner. For tracking purposes, a retroreflective material was put on the live performer's face so that a set of bright spots could be easily detected on the face. The motions of the spots were then mapped to the vertices of the face model.

Terzopoulos and Waters [208, 209, 210] developed a physically based model to not only synthesize but also analyze facial expressions. They used a technique called snakes [108] to track facial feature contours and estimated the muscle control parameters from the tracked contours. The recovered muscle control parameters can then be used to animate a synthetic face model. Lee et al. [121, 122] developed a system for cleaning up laser-scanned data, registering it, and inserting contractile muscles.

Since 1990, there has been a tremendous amount of work on image-based face modeling and animation. In the following, we review the literature along five different but interconnected threads: statistical models and subspace representation, geometry modeling, appearance modeling, animation, and the modeling of eyes and hair.

1.1.1 Statistical models and subspace representation

In 1987, Sirovich and Kirby [112, 199] proposed to use eigenvectors (called eigenpictures) to represent human faces. In 1991, in their seminal paper [216], Turk and Pentland proposed to use this representation for face recognition, and they called the eigenvectors "eigenfaces." Their work not only generated much excitement in face recognition but also made Principal Component Analysis an extremely popular tool in designing statistical models for image analysis and synthesis. In 1992, Yuille et al. [251] developed a template-based technique for detecting facial features. Their technique can be thought of as an extension of the snakes technique [108] in that the local elastic models in the snakes are replaced by global deformable templates. In 1995, Cootes et al. [38] proposed to use a statistical shape model, called the Active Shape Model (ASM), as a better deformable template. An active shape model is constructed by applying Principal Component Analysis to the geometry vectors of a set of face images. ASM was later extended to the Active Appearance Model (AAM) [37], which consists of a statistical shape model as well as a statistical texture model. In 1997, Vetter and Poggio [220] introduced linear object classes for synthesizing novel views of objects and faces.
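To make the PCA construction behind eigenfaces and the ASM concrete, here is a minimal NumPy sketch (not code from the book); the function names and the choice of component count are illustrative assumptions.

```python
import numpy as np

def build_pca_basis(data, num_components=50):
    """Fit a PCA model to row-stacked data vectors.

    data: (N, D) array; each row is a vectorized face image
    (for eigenfaces) or a geometry vector of concatenated
    feature-point coordinates (for an Active Shape Model).
    Returns the mean vector and the top principal components,
    one per row.
    """
    mean = data.mean(axis=0)
    centered = data - mean
    # Rows of Vt are the eigenvectors of the sample covariance,
    # ordered by decreasing variance.
    _, _, Vt = np.linalg.svd(centered, full_matrices=False)
    return mean, Vt[:num_components]

def project(x, mean, basis):
    """Subspace coefficients of a new vector x."""
    return basis @ (x - mean)

def reconstruct(coeffs, mean, basis):
    """Approximate a vector as the mean plus a linear
    combination of the principal components."""
    return mean + coeffs @ basis
```

The same mean-plus-linear-combination construction, applied to 3D shape and texture vectors rather than to 2D images, is the backbone of the morphable model introduced next.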
In 1999, Blanz and Vetter [19] further extended this idea and introduced a morphable model for 3D face reconstruction from a single image. They used the University of South Florida (USF) dataset of 200 Cyberware-scanned faces to construct a statistical shape model and a statistical texture model. Given an input face image, the shape and texture model coefficients can be recovered through an analysis-by-synthesis framework. Since then, both the USF dataset and the notion of linear object classes have become very popular. Linear space representations are described in Chapters 2 and 3. The face-modeling technique of Blanz and Vetter [19] is described in Section 7.2.

The study of subspace representations of face images under varying illumination also started to attract much attention in the same period. In 1992, Shashua [194] proposed that, without considering shadows, the images of a Lambertian surface in a fixed pose under varying illumination live in a three-dimensional linear space. In 1994, Hallinan [82] made a similar observation and proposed to use a low-dimensional representation for image synthesis. In 1995, Murase and Nayar [153] proposed to use low-dimensional appearance manifolds for object recognition under varying illumination conditions.
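Shashua's observation admits a short derivation, sketched here under the stated assumptions (Lambertian reflectance, a distant light source, no shadows). Let \(p\) be a surface point with albedo \(\rho(p)\) and unit normal \(\mathbf{n}(p)\), and encode the light by a single vector \(\mathbf{s} \in \mathbb{R}^3\) (direction scaled by intensity). The image intensity is

\[
I(p) = \rho(p)\,\mathbf{n}(p)^{\top}\mathbf{s},
\]

which is linear in the three components of \(\mathbf{s}\). Every image of the surface in a fixed pose is therefore a linear combination of the three basis images \(b_i(p) = \rho(p)\,n_i(p)\), \(i \in \{x, y, z\}\), so the images under all distant lighting conditions span at most a three-dimensional linear space. Attached and cast shadows break this linearity, which motivates low-dimensional approximations such as Hallinan's [82] and the spherical harmonic representations described in Chapter 3.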