Multilinear Wavelets: A Statistical Shape Space for Human Faces

Alan Brunton
Fraunhofer Institute for Computer Graphics Research IGD
Fraunhoferstraße 5, 64283 Darmstadt
[email protected]

Timo Bolkart, Stefanie Wuhrer
Cluster of Excellence MMCI, Saarland University
Campus E1 7, 66123 Saarbrücken, Germany
{tbolkart|swuhrer}@mmci.uni-saarland.de

July 2, 2014

Abstract

We present a statistical model for 3D human faces in varying expression, which decomposes the surface of the face using a wavelet transform, and learns many localized, decorrelated multilinear models on the resulting coefficients. Using this model we are able to reconstruct faces from noisy and occluded 3D face scans, and facial motion sequences. Accurate reconstruction of face shape is important for applications such as tele-presence and gaming. The localized and multi-scale nature of our model allows for recovery of fine-scale detail while retaining robustness to severe noise and occlusion, and is computationally efficient and scalable. We validate these properties experimentally on challenging data in the form of static scans and motion sequences. We show that in comparison to a global multilinear model, our model better preserves fine detail and is computationally faster, while in comparison to a localized PCA model, our model better handles variation in expression, is faster, and allows us to fix identity parameters for a given subject.

1 Introduction

Acquisition of 3D surface data is continually becoming more commonplace and affordable, through a variety of modalities ranging from laser scanners to structured light to binocular and multi-view stereo systems. However, these data are often incomplete and noisy, and robust regularization is needed. When we are interested in a particular class of objects, such as human faces, we can use prior knowledge about the shape to constrain the reconstruction. This alleviates not only the problems of noise and incomplete data, but also occlusion. Such priors can be learned by computing statistics on databases of registered 3D face shapes.

Accurate 3D face capture is important for many applications, from performance capture to tele-presence to gaming to recognition tasks to ergonomics, and considerable resources of data are available from which to learn a statistical prior on the shape of the human face (e.g. [5, 33, 32, 23]).

In this paper, we propose a novel statistical model for the shape of human faces, and use it to fit to input 3D surfaces from different sources, exhibiting high variation in expression and identity, and severe levels of data corruption in the forms of noise, missing data and occlusions. We make the following specific technical contributions:

• A novel statistical shape space based on a wavelet decomposition of 3D face geometry and multilinear analysis of the individual wavelet coefficients.
• Based on this model, we develop an efficient algorithm for learning a statistical shape model of the human face in varying expressions.
• We develop an efficient algorithm for fitting our model to static and dynamic point cloud data, that is robust with respect to highly corrupted scans.
• We publish our statistical model and code to fit it to point cloud data [6].

Our model has the following advantages. First, it results in algorithms for training and fitting that are highly efficient and scalable. By using a wavelet transform, we decompose a high-dimensional global shape space into many localized, decorrelated low-dimensional shape spaces. This dimensionality is the dominant factor in the complexity of the numerical routines used in both training and fitting. Training on thousands of faces takes a few minutes, and fitting to an input scan takes a few seconds, both using a single-threaded implementation on a standard PC.
Second, it allows capturing fine-scale details due to its local nature, as shown in Figure 5, while retaining robustness against corruption of the input data. The wavelet transform decomposes highly correlated vertex coordinates into decorrelated coefficients, upon which multilinear models can be learned independently. Learning many low-dimensional statistical models, rather than a single high-dimensional model as used in [5, 30, 7], greatly reduces the risk of over-fitting to the training data; it avoids the curse of dimensionality. Thus, a much higher proportion of the variability in the training data can be retained in the model. During fitting, tight statistical bounds can be placed on the model parameters for robustness, yet the model can still fit closely to valid data points.

Third, it is readily generalizable and extendable. Our model requires no explicit segmentation of the face into parts; the wavelet transform decomposes the surface hierarchically into overlapping patches, and the inverse transform recombines them. Unlike manually decomposed part-based models, e.g. [13, 28, 25], it requires no sophisticated optimization of blending weights, and the decomposition is not class-specific. Further, it can be easily extended to include additional information such as texture.

2 Related Work

This work is concerned with learning 3D statistical shape models that can be used in surface fitting tasks. To learn a statistical shape model, a database of shapes with known correspondence information is required. Computing correspondences between a set of shapes is a challenging problem in general [27]. However, for models of human faces, correspondences can be computed in a fully automatic way using template deformation methods (e.g. [19, 22]).
The most closely related works are the part-based multilinear models that were recently proposed to model 3D human body shapes [9]. To define the part-based model, a segmentation of the training shapes into meaningful parts is required. This is done manually by segmenting the human models into body parts, such as limbs. Lecron et al. [16] use a similar statistical model on human spines, which are manually segmented into their vertebrae. In contrast, our method computes a suitable hierarchical decomposition automatically, thereby eliminating the need to manually generate a meaningful segmentation.

Many statistical models have been used to analyze human faces. The first statistical model for the analysis of 3D faces was proposed by Blanz and Vetter [5]. This model is called the morphable model, and uses Principal Component Analysis (PCA) to analyze shape and texture of registered faces, mainly in neutral expression. It is applied to reconstruct 3D facial shapes from images [5] and 3D face scans [4, 21]. Amberg et al. [1] extend the morphable model to consider expressions, by combining it with a PCA model for expression offsets with respect to the neutral expression geometry. An alternative way to incorporate expression changes is to use a multilinear model, which separates identity and expression variations. This model has been used to modify expressions in videos [30, 11, 31], or to register and analyze 3D motion sequences [7]. Multilinear models are mathematically equivalent to TensorFaces [29] applied to 3D data rather than images, and provide an effective way to capture both identity and expression variations; thus in Section 6 we compare to a global multilinear model and show that our model better captures local geometric detail.

Blanz and Vetter [5] manually segmented the face into four regions and learned a morphable model on each segment. The regions are fitted to the data independently and merged in a post-processing step. This part-based model was shown to lead to a higher data accuracy than the global morphable model. As part-based models are suitable to obtain good fitting results in localized regions, they have been used in multiple follow-up works, e.g. [13, 28, 25]. While the model of Kakadiaris et al. [13] shares some similarities with our model, they use a fixed annotated face model and wavelet transforms to compare facial geometry images. In contrast, we learn multilinear models on subdivision wavelet coefficients.

All of the methods discussed so far model shape changes using global or part-based statistical models. In contrast, by applying a wavelet transform to the data first, statistical models can be constructed that capture shape variation in both a local and multi-scale way. Such wavelet-domain techniques have been used extensively for medical imaging [12, 20, 17], and Brunton et al. [8] proposed a method to analyze local shape differences of 3D faces in neutral expression in a hierarchical way. This method decomposes each face hierarchically using a wavelet transform and learns a PCA model for each wavelet coefficient independently. This approach has been shown to capture more facial details than global statistical shape spaces. Hence, in Section 6 we compare to a wavelet-domain approach and show that our model better captures expression variation.

We propose a method that combines this localized shape space with a multilinear model, thereby allowing us to capture localized shape differences across databases of 3D faces of different subjects in different expressions.

3 Multilinear Wavelet Model

Our statistical shape space for human faces consists of a multilinear model for each wavelet coefficient resulting from a spherical subdivision wavelet decomposition of a template face mesh. The wavelet transform takes a set of highly correlated vertex positions and produces a set of decorrelated wavelet coefficients. This decorrelation means that we can treat the coefficients separately and learn a distinct multilinear model for each coefficient. These multilinear models capture the variation of each wavelet coefficient over changes in identity and expression. In the following, we review the two components of our model.

3.1 Second Generation Spherical Wavelets

Spherical wavelets typically operate on subdivision surfaces [24] following a standard subdivision hierarchy, giving a multi-scale decomposition of the surface. This allows coarse-scale shape properties to be represented by just a few coefficients, while localized fine-scale details are represented by additional coefficients. Second generation wavelets can be accelerated using the lifting scheme [26], factoring the convolution of the basis functions into a hierarchy of local lifting operations, which are weighted averages of neighboring vertices. When combined with subsampling, the transform can be computed in time linear in the number of vertices. The particular wavelet decomposition we use [3] follows Catmull-Clark subdivision, and has been used previously for localized statistical models in multiple application domains [17, 8]. The wavelet transform is a linear operator, denoted D. For a 3D face surface X, the wavelet coefficients are s = DX.
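Because D is linear and invertible, analysis and synthesis amount to applying a matrix and its inverse to each coordinate channel of the vertex array. The following sketch only illustrates that relation with a generic stand-in operator; it is not the lifting-based Catmull-Clark implementation of [3], which runs in O(n), and the matrix `D` below is an assumed placeholder rather than the actual subdivision wavelet.

```python
import numpy as np

def analyze(X, D):
    """Forward wavelet analysis s = D X.
    X: (n, 3) vertex positions, D: (n, n) invertible analysis operator.
    Returns one 3-vector of coefficients per basis function."""
    return D @ X

def synthesize(s, D):
    """Inverse transform X = D^{-1} s (a real implementation would use
    the O(n) lifting scheme rather than a dense solve)."""
    return np.linalg.solve(D, s)

# Toy check with a random, well-conditioned stand-in for D.
rng = np.random.default_rng(0)
n = 16
D = np.eye(n) + 0.1 * rng.standard_normal((n, n))
X = rng.standard_normal((n, 3))           # "vertex positions"
s = analyze(X, D)                         # wavelet coefficients, shape (n, 3)
assert np.allclose(synthesize(s, D), X)   # perfect reconstruction
```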
3.2 Multilinear Models

To statistically analyze a population of shapes which vary in multiple ways, such as identity and expression for faces, one can use a multilinear model. In general, one constructs a multilinear model by organizing the training data into an N-mode tensor, where the first mode is the vector representation of each training sample, and the remaining modes contain training samples varied in distinct ways.

We organize our set of parametrized training shapes into a 3-mode tensor A \in R^{d_1 \times d_2 \times d_3}, where d_1 is the dimension of each shape, and d_2 and d_3 are the number of training samples in each mode of variation; in our case, identity and expression. It would be straightforward to extend this model to allow for more modes, such as varying textures due to illumination changes, if the data were available. We use a higher-order Singular Value Decomposition (HOSVD) [15] to decompose A into

A = M \times_2 U_2 \times_3 U_3,    (1)

where M \in R^{d_1 \times m_2 \times m_3} is a tensor called a multilinear model, and U_2 \in R^{d_2 \times m_2} and U_3 \in R^{d_3 \times m_3} are orthogonal matrices. The i-th mode product M \times_i U_i replaces each vector m \in R^{m_i} of M in the direction of the i-th mode by U_i m \in R^{d_i}. To compute the orthogonal matrix U_2, A is unfolded in the direction of the 2nd mode to the matrix A_{(2)} \in R^{d_2 \times d_1 d_3}, where the columns of A_{(2)} are the vectors of A in the direction of the 2nd mode.

The decomposition in (1) is exact if m_i = rank(A_{(i)}) for all i. If m_i < rank(A_{(i)}) for at least one i, the decomposition approximates the data. This technique is called truncated HOSVD, and we use it to reduce the dimensionality of the training data.

The multilinear model represents a shape s \in R^{d_1} by

s \approx \bar{f} + M \times_2 w_2^T \times_3 w_3^T,    (2)

where \bar{f} is the mean of the training data (over all identities and expressions), and w_2 \in R^{m_2} and w_3 \in R^{m_3} are identity and expression coefficients. Varying only w_2 changes identity while keeping the expression fixed, whereas varying only w_3 changes the expression of a single identity.
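For concreteness, the numpy sketch below shows one way to compute a truncated HOSVD of a mean-centered 3-mode data tensor and to reconstruct a shape from identity and expression weights as in (2). It is an illustration under those assumptions (the function names, array layout, and SVD-based truncation strategy are ours), not the authors' implementation.

```python
import numpy as np

def unfold(A, mode):
    """Mode-i unfolding A_(i): the mode-`mode` fibers become the columns."""
    return np.moveaxis(A, mode, 0).reshape(A.shape[mode], -1)

def mode_product(T, U, mode):
    """i-th mode product T x_i U: multiply every mode-`mode` fiber of T by U."""
    return np.moveaxis(np.tensordot(U, T, axes=(1, mode)), 0, mode)

def truncated_hosvd(A, m2, m3):
    """A: d1 x d2 x d3 tensor of mean-centered training shapes.
    Returns (M, U2, U3) with A ~= M x_2 U2 x_3 U3, cf. eq. (1)."""
    U2 = np.linalg.svd(unfold(A, 1), full_matrices=False)[0][:, :m2]  # d2 x m2
    U3 = np.linalg.svd(unfold(A, 2), full_matrices=False)[0][:, :m3]  # d3 x m3
    M = mode_product(mode_product(A, U2.T, 1), U3.T, 2)               # d1 x m2 x m3 core
    return M, U2, U3

def reconstruct(f_bar, M, w2, w3):
    """Shape from weights: s ~= f_bar + M x_2 w2^T x_3 w3^T, cf. eq. (2)."""
    s = mode_product(mode_product(M, w2[None, :], 1), w3[None, :], 2)
    return f_bar + s.ravel()
```

In the per-coefficient setting of the next section, a training sample has d_1 = 3, and modes 2 and 3 are truncated at m_2 = m_3 = 3.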
4 Training

In this section, we describe the process of learning the multilinear wavelet model from a database of registered 3D faces in a fixed number of expressions. Using the notation from Section 3.2, the database contains d_2 identities, each in d_3 expressions. We discuss in Section 6 how to obtain such a registered database. The training process is depicted graphically in Figure 1.

The first stage in our training pipeline is to apply a wavelet transform to every shape in our training database. The leftmost part of Figure 1 shows the influence region of two wavelet coefficients on four face shapes (two identities in two expressions). To obtain a template with the proper subdivision connectivity, we use a registration-preserving stereographic resampling onto a regular grid [8], although any quad-remeshing technique could be used. Because the training shapes are registered and have the same connectivity, we now have a database of registered wavelet coefficients (middle of Figure 1). Note that this does not require any manual segmentation, but is computed fully automatically. By considering the decorrelating properties of wavelet transforms, we can look at it another way: we now have a training set for each individual wavelet coefficient, which we can treat separately.

From these decorrelated training sets, covering variations in both identity and expression, we can learn a distinct multilinear model for each coefficient, resulting in many localized shape spaces as shown in the right part of Figure 1. This allows a tremendous amount of flexibility in the model.

Training our model has the following complexity. Each wavelet transform has complexity O(n), for n vertices, and we perform d_2 d_3 of them. The complexity of the HOSVD is O(d_1^2 (d_2 d_3^2 + d_3 d_2^2)) [15], and we compute n of them. Because every multilinear model is computed for only a single wavelet coefficient over the training set, d_1 = 3, so the complexity is O(d_2 d_3^2 + d_3 d_2^2) per wavelet coefficient and O(n (d_2 d_3^2 + d_3 d_2^2)) overall. Thus, our model allows highly efficient and scalable training, as detailed in Section 6.

Training many low-dimensional models has statistical benefits too. We retain a large amount of the variation present in the training data by truncating modes 2 and 3 at m_2 = 3 and m_3 = 3. We chose m_2 = m_3 = 3 because d_1 = 3 is the smallest mode-dimension in our tensor.

Our model generates a 3D face surface X as follows. The vertex positions x \in X are generated from the wavelet coefficients via the inverse wavelet transform, denoted by D^{-1}. The wavelet coefficients are generated from their individual multilinear weights for identity and expression. Thus, following (2), wavelet coefficients are generated by

s_k = \bar{s}_k + M_k \times_2 w_{k,2}^T \times_3 w_{k,3}^T    (3)

where k is the index of the wavelet coefficient, and the surface is generated by X = D^{-1} s, where s = [s_1 ... s_n]^T.

Figure 1: Overview of the training. Left: Training data with highlighted impact of the basis function. Middle: Wavelet decomposition of each face of the training data. Right: Corresponding wavelet coefficients and learned multilinear model shape spaces.
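Putting Sections 3 and 4 together, the sketch below outlines the overall training loop and the generation of a face from per-coefficient weights as in (3). It reuses the `analyze`, `synthesize`, `truncated_hosvd`, and `reconstruct` helpers from the earlier snippets; the array layout and names are assumptions for illustration, and the actual pipeline uses the lifting-based transform and the registration-preserving resampling described above.

```python
import numpy as np
# Assumes analyze/synthesize and truncated_hosvd/reconstruct from the previous sketches.

def train_multilinear_wavelet(shapes, D, m2=3, m3=3):
    """shapes: (d2, d3, n, 3) array of registered faces
    (d2 identities, d3 expressions, n vertices each).
    Returns one (mean, multilinear model) pair per wavelet coefficient, cf. eq. (3)."""
    d2, d3, n, _ = shapes.shape
    # Wavelet-transform every training face: coefficients keep shape (d2, d3, n, 3).
    coeffs = np.einsum('ij,abjc->abic', D, shapes)
    models = []
    for k in range(n):                              # an independent model per coefficient
        A = coeffs[:, :, k, :].transpose(2, 0, 1)   # 3 x d2 x d3 tensor for coefficient k
        s_bar = A.mean(axis=(1, 2))                 # mean over identities and expressions
        M, _, _ = truncated_hosvd(A - s_bar[:, None, None], m2, m3)
        models.append((s_bar, M))
    return models

def generate_face(models, weights, D):
    """weights: list of (w_k2, w_k3) pairs, one per coefficient.
    Returns the surface X = D^{-1} s, with s stacked from eq. (3)."""
    s = np.stack([reconstruct(s_bar, M, w2, w3)
                  for (s_bar, M), (w2, w3) in zip(models, weights)])
    return synthesize(s, D)
```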
Figure 2: Overview of the fitting. Top: Localized multilinear models. Bottom, left to right: input face scan, result after initialization, result of full surface fitting.

5 Fitting

In this section, we discuss the process of fitting our learned model to an input oriented point cloud or mesh P, which may be corrupted by noise, missing data or occlusions. The process is depicted graphically in Figure 2. We fit our model by minimizing a fitting energy that captures the distance between X and P, subject to the constraints learned in our training phase. We minimize the energy in a coarse-to-fine manner, starting with the multilinear weights of the coarse-scale wavelet coefficients, and refining the result by optimizing finer-scale multilinear weights.

5.1 Fitting Energy

We optimize our model parameters to minimize an energy measuring the distance between X and P. Our model parameters consist of the per-wavelet-coefficient multilinear weights w_{k,2}, w_{k,3} for k = 1, ..., n, and a similarity transform (rigid plus uniform scaling) R mapping the coordinate frame of X to the coordinate frame of P.

Our fitting energy consists of four parts: a landmark term, a surface fitting term, a surface smoothing term, and a prior term. That is,

E_{fit} = E_L + E_X + E_S + E_P    (4)

where E_L, E_X, E_S and E_P are the landmark energy, surface fitting energy, surface smoothing energy and prior energy, respectively. We now describe each of these energies in turn.

The landmark energy measures the Euclidean distance between corresponding landmark sets L^{(m)} \subset X and L^{(d)} \subset P located on the model surface and input data, respectively. These landmarks may be obtained in a variety of ways, including automatically [10, 22], and do not restrict our method. In Section 6, we demonstrate how our method performs using landmarks from multiple sources. The landmarks are in correspondence such that |L^{(m)}| = |L^{(d)}|, and \ell_i^{(m)} and \ell_i^{(d)} represent equivalent points on X and P, respectively. With this, we define our landmark energy as

E_L = \rho_L \frac{|X|}{|L^{(m)}|} \sum_{i=1}^{|L^{(m)}|} \| R \ell_i^{(m)} - \ell_i^{(d)} \|_2^2    (5)

where \rho_L = 1 is a constant balancing the relative influence of landmarks against that of the rest of the surface.

The surface fitting energy measures the point-to-plane distance between vertices in X and their nearest neighbors in P. That is,

E_X = \sum_{x \in X \setminus L^{(m)}} \rho(x) \| R x - y(x) \|_2^2    (6)

where y(x) is the projection of Rx into the tangent plane of p, where p \in P is the nearest neighbor of Rx. The distances are weighted by

\rho(x) = 1 if \| R x - p \|_2 \le \tau, and \rho(x) = 0 otherwise,    (7)

where \tau = 1 cm is a threshold on the distance to the nearest neighbor, providing robustness to missing data. We compute nearest neighbors using ANN [2].

The prior energy restricts the shape to stay in the learned shape space, providing robustness to both noise and outliers. We avoid introducing undue bias toward the mean shape via a hyper-box prior [7],

E_P = \sum_{k=1}^{n} \left( \sum_{j=1}^{m_2} f_{k,2,j}(w_{k,2,j}) + \sum_{j=1}^{m_3} f_{k,3,j}(w_{k,3,j}) \right)    (8)

where

f_{k,2,j}(w) = 0 if \bar{w}_{k,2,j} - \lambda \le w \le \bar{w}_{k,2,j} + \lambda, and f_{k,2,j}(w) = \infty otherwise,    (9)

restricts each component of w_{k,2} to be within a constant amount \lambda of the same component of the mode-mean \bar{w}_{k,2}, and similarly for each component of w_{k,3}.

The smoothing energy is the bi-Laplacian energy, which penalizes changes in curvature between neighboring vertices. It is needed due to the energy minimization algorithm, described in Section 5.2, which optimizes each multilinear wavelet independently. Without a smoothing energy, this can result in visible patch boundaries in the fitted surface, as can be seen in Figure 4. Formally, we write

E_S = \rho_S \sum_{x \in X} \| U^2(x) \|_2^2    (10)

where U^2(x) is the double-umbrella discrete approximation of the bi-Laplacian operator [14], and \rho_S is a constant weight.

The smoothing energy poses a trade-off: visually pleasing smooth surfaces versus fitting accuracy and speed. Leaving out E_S allows the energy minimization to get closer to the data (as expected), and leads to faster fitting due to the energy being more localized. Hence, we retain the option of not evaluating this energy in case the scenario favors close fitting and fast performance over visually smooth results. We use either \rho_S = 100 or \rho_S = 0 in all our experiments. Section 6 discusses this trade-off in more concrete terms.
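As a concrete illustration of the two data terms (5)-(7), the sketch below evaluates the landmark and surface fitting energies for a candidate alignment. It assumes the similarity transform is given as a 4x4 matrix and that coordinates are in meters (so \tau = 1 cm becomes 0.01); scipy's cKDTree stands in for the ANN library of [2]. The prior (8)-(9) is handled as box bounds in the optimization sketch of the next section, and the bi-Laplacian smoothing term is omitted here.

```python
import numpy as np
from scipy.spatial import cKDTree  # stand-in for the ANN library [2]

def apply_similarity(R, pts):
    """Apply a 4x4 similarity transform (rotation, uniform scale, translation)."""
    return pts @ R[:3, :3].T + R[:3, 3]

def landmark_energy(R, lm_model, lm_data, n_vertices, rho_L=1.0):
    """E_L, eq. (5): squared landmark distances scaled by rho_L * |X| / |L^(m)|."""
    diff = apply_similarity(R, lm_model) - lm_data
    return rho_L * n_vertices / len(lm_model) * np.sum(diff ** 2)

def surface_energy(R, X, P, P_normals, tau=0.01):
    """E_X, eq. (6)-(7): point-to-plane distance to the nearest data point,
    ignoring vertices whose nearest neighbor is farther than tau.
    X is assumed to already exclude the landmark vertices (X \\ L^(m))."""
    Xr = apply_similarity(R, X)
    dist, idx = cKDTree(P).query(Xr)                              # nearest neighbors in P
    plane = np.einsum('ij,ij->i', Xr - P[idx], P_normals[idx])    # distance along the normal
    return np.sum((dist <= tau) * plane ** 2)
```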
5.2 Energy Minimization

We minimize (4) in a two-step procedure. In the first step, we iteratively minimize E_L + E_P + E_S with respect to R and the multilinear weights of each wavelet coefficient. This rigidly aligns the model and the data, and coarsely deforms the surface to fit the landmarks, giving a good initialization for subsequent surface fitting. We solve for the R that minimizes E_L, given the landmark positions L^{(m)} and L^{(d)}. This involves solving a small over-determined linear system. Then, we optimize w_{k,2} and w_{k,3} for k = 1, ..., n to minimize E_L + E_P. Figure 2 (bottom, middle) shows the result of landmark fitting for a given input scan.

In the second step, we fix R and minimize (4) with respect to only the multilinear weights. This deforms the surface so that it closely fits the input data P. Figure 2 (bottom, right) shows the final fitting result.

The energies E_L, E_X and E_S are nonlinear with respect to the multilinear weights, and we minimize them using the L-BFGS-B [18] quasi-Newton method. This bounded optimization allows the prior (8) to be enforced simply as bounds on the multilinear weights. The hierarchical and decorrelating nature of the wavelet transform allows us to minimize the energies separately for each multilinear model in a coarse-to-fine manner. During initialization, we recompute R and optimize the multilinear weights iteratively at each level of wavelet coefficients. During surface fitting, nearest neighbors are recomputed and the multilinear weights optimized iteratively at each level. During initialization, we allow greater variation in the model, \lambda = 1, because we assume the landmarks are not located on occlusions. During surface fitting, we restrict the shape space further, \lambda = 0.5, unless the particular weight component is already outside this range from the initialization.

Fitting many low-dimensional local multilinear models is more efficient than fitting a single high-dimensional global multilinear model, because the dimensionality of the variables to be optimized is the dominant factor in the complexity of the quasi-Newton optimization, which achieves super-linear convergence by updating an estimate of the Hessian matrix in each iteration. For a problem of size d = m_2 + m_3, the Hessian contains \Omega(d^2) unique entries, which favors solving many small problems even if the total number of variables optimized is greater. This is confirmed experimentally in Section 6. Further, each multilinear model has compact support on X, which reduces the number of distances that must be computed in each evaluation of (6) and its gradient.
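The per-coefficient optimization with the hyper-box prior enforced as bounds can be sketched with scipy's L-BFGS-B as below. The objective is a placeholder standing in for E_L + E_X (+ E_S) restricted to one wavelet coefficient; the bound construction from the mode-means and \lambda follows (9), while everything else (names, the use of numerical gradients) is an assumption for illustration, not the authors' C++ implementation.

```python
import numpy as np
from scipy.optimize import minimize

def fit_coefficient(energy_k, w_bar_2, w_bar_3, lam, w_init=None):
    """Minimize the fitting energy restricted to one wavelet coefficient,
    with the hyper-box prior (8)-(9) imposed as bounds on the weights.
    energy_k: callable taking the concatenated weights [w_k2, w_k3]."""
    w_bar = np.concatenate([w_bar_2, w_bar_3])            # mode-means for this coefficient
    bounds = [(m - lam, m + lam) for m in w_bar]          # box of half-width lambda, cf. eq. (9)
    w0 = w_bar.copy() if w_init is None else w_init
    res = minimize(energy_k, w0, method='L-BFGS-B', bounds=bounds)
    return res.x[:len(w_bar_2)], res.x[len(w_bar_2):]     # (w_k2, w_k3)

# Schematic coarse-to-fine loop (hypothetical names): a looser box during
# initialization (lambda = 1), a tighter one (lambda = 0.5) during surface fitting,
# optimizing the weights of each coefficient level by level.
# for level in wavelet_levels:
#     for k in coefficients_at(level):
#         w2[k], w3[k] = fit_coefficient(make_energy(k), w_bar_2[k], w_bar_3[k], lam=0.5)
```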
5.3 Tracking

As an application of our shape space, we show how a simple extension of our fitting algorithm can be used to track a facial motion sequence. To the first frame, we fit both identity and expression weights. Subsequently, we fix the identity weights and only fit expression weights. This ensures that shape changes over the sequence are only due to expression, not identity. A more elaborate scheme, which averages the identity weights, would also be feasible.

To avoid jitter, we introduce a temporal smoothing term on the vertex positions. Approaches based on global multilinear models often place a temporal smoothing term on the expression weights themselves [31, 7], since these are usually of much lower dimension than the surface X. In our case, the combined dimensionality of all expression weights is equal to that of the vertex positions, so no efficiency is to be gained by operating on the weights rather than the vertex positions. Further, placing a restriction on the vertex positions fits easily into our energy minimization. We use a simple penalty on the movement of the vertices x \in X between frames. This is easily incorporated into our fitting algorithm by adding a Euclidean distance penalty to our energy function (4) during surface fitting:

E_T = \sum_{x_t \in X_t} \rho_T \| x_t - x_{t-1} \|_2^2    (11)

where \rho_T = 1 is a constant balancing allowing the surface to move versus reducing jitter.

6 Evaluation

6.1 Experimental Setup

Training Data: For a training database, we use the BU3DFE database [33] registered using an automatic template-fitting approach [22] with ground truth landmarks. This database contains 100 subjects in 25 expression levels each. We successfully registered 99 subjects in all expressions and used this for training in our experiments.

Test Data: To test our fitting accuracy we use 200 scans from the Bosphorus database [23], including variation in identity, expression and types of occlusions. We specifically do not test on scans from the same database we use for training, to avoid bias. Further, the Bosphorus scans typically have higher noise levels than those in BU3DFE, and contain occlusions. This database contains landmarks on each scan; we use the subset of those shown in Figure 2 present on a given surface (not blocked by an occlusion). In Section 6.4, we show the performance of our method when tracking facial motion sequences from the BU4DFE database [32] with landmarks automatically predicted using an approach based on local descriptors and a Markov network [22].

Comparison: We compare our fitting results to the localized PCA model [8] and the global multilinear model [7]. All three models are trained with the same data, with the exception that, because the local PCA model does not model expression variation, we train it separately for each expression and give it the correct expression during fitting. The other two are given landmarks for fitting.

Performance: We implemented our model, both training and fitting, in C++ using standard libraries. We ran all tests on a workstation running Windows with an Intel Xeon E31245 at 3.3 GHz. Training our model on 2475 face shapes, each with 24987 vertices, takes < 5 min using a single-threaded implementation. In practice we found our training algorithm to scale approximately linearly in the number of training shapes. Fitting takes 5.37 s on average with \rho_S = 0, and 14.76 s with \rho_S = 100, for a surface with approximately 35000 vertices (Sections 6.2 and 6.3). For the motion sequences with approximately 35000 vertices per frame (Section 6.4), fitting takes 4.35 s per frame on average without smoothing and 11.14 s with smoothing. The global multilinear model takes about 2 min for fitting to a static scan. A single-threaded implementation of the local PCA model takes 5 min due to its sampling-based optimization, which avoids local minima.

Figure 3: Top block: Median reconstruction error for noisy data using multiple localized PCA models, a global multilinear model, our model (\rho_S = 0), and our model (\rho_S = 100). Bottom block: mask showing the characteristic detail regions of the face, and cumulative error plot for varying identity and expression. Errors in millimeters.
6.2 Reconstruction of Noisy Data

In this section, we demonstrate our model's ability to capture fine-scale detail in the presence of identity and expression variation, and high noise levels. We fit it to 120 models (20 identities in up to 7 expressions) from the Bosphorus database [23]. We measure the fitting error as distance-to-data, and the per-vertex median errors are shown for all three models in Figure 3 (left). Our model has a greater proportion of sub-millimeter errors than either of the other models. Specifically, the local PCA and the global multilinear have 63.2% and 62.0%, respectively, of vertices with error < 1 mm, whereas our model has 71.6% with \rho_S = 100 and 72.4% with \rho_S = 0. Figure 3 (right) shows cumulative error plots for all three methods for vertices in the characteristic detail region of the face, which is shown next to the plot. This region contains prominent facial features with the most geometric detail. We see that our model is more accurate than previous models in this region and has many more sub-millimeter errors; the local PCA and global multilinear have 60.4% and 58.0% of errors < 1 mm, respectively, whereas our model has 70.2% with \rho_S = 100 and 72.7% with \rho_S = 0. This shows that our model has improved accuracy for fine-scale detail compared to existing models, in particular in areas with prominent features and high geometric detail.

Figures 4 and 5 show examples of fitting to noisy scans of different subjects in different expressions. These scans contain acquisition noise, missing data and facial hair. Figure 4 (left) shows a surprise expression and close-ups of the nose region; our reconstructions with both \rho_S = 100 and \rho_S = 0 capture significantly more fine-scale detail than previous models. The right part of the figure demonstrates the effect of the smoothing energy in preventing faint grid artifacts from appearing in the reconstruction due to the independent optimization scheme. Figure 5 shows two subjects in fear and happy expressions. We again see the increased accuracy of our model in terms of fine-scale detail on facial features compared to previous models. Note the accuracy of the nose and mouth shapes in all examples compared to the other models, and the accurate fitting of the underlying face shape in the presence of facial hair. Further note how our model captures the asymmetry in the eyebrow region for the fear expression.

Figure 4: Effect of the smoothing energy E_S on an example noisy scan. Left block: fitting results for a scan in surprise expression, with a close-up of the nose region in the bottom row. Left to right: local multiple PCA, global multilinear model, our model (\rho_S = 0), our model (\rho_S = 100), and input data. Right block: our reconstructions for a fear expression for \rho_S = 0 (left) and \rho_S = 100 (right). Note the faint grid artifacts that appear without smoothing, e.g. in the cheek region and around the mouth. The input data can be seen in Figure 5 (left block).

Figure 5: Reconstruction examples for noisy scans in different expressions. Top block: fear expression. Bottom block: happy expression. Each block, from left to right: local multiple PCA [8], global multilinear [7], proposed (\rho_S = 100), input data.
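The error statistics reported above can be reproduced from a fitted surface and an input scan with a short distance-to-data computation. The sketch below is a plausible version of such an evaluation (the masking of occluded or characteristic-detail regions is passed in as a boolean array, and coordinates are assumed to be in meters); it is not the authors' evaluation code.

```python
import numpy as np
from scipy.spatial import cKDTree

def distance_to_data(X_fit, P, mask=None):
    """Per-vertex distance from the fitted surface X_fit (n, 3) to the scan P (m, 3).
    mask: optional boolean array selecting the vertices to evaluate
    (e.g. unoccluded vertices, or the characteristic detail region)."""
    d, _ = cKDTree(P).query(X_fit if mask is None else X_fit[mask])
    return d

def error_summary(d, thresholds_mm=(1.0, 2.0, 3.0)):
    """Median error and the percentage of vertices below each threshold (errors in mm)."""
    d_mm = 1000.0 * d   # assuming coordinates are in meters
    return {'median_mm': float(np.median(d_mm)),
            **{f'<{t}mm': 100.0 * float(np.mean(d_mm < t)) for t in thresholds_mm}}
```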
6.3 Reconstruction of Occluded Data

In this section, we demonstrate our model's robustness to severe data corruptions in the form of occlusions. We fit all three models to 80 scans (20 subjects, 4 types of occlusions) from the Bosphorus database [23]. Figure 6 (top right) shows the cumulative error for all three models. Since distance-to-data is not a valid error measure in occluded areas, we apply different masks, shown next to the error plot, depending on the type of occlusion, so that only unoccluded vertices are measured. Clockwise from top-left: the masks used for eye, glasses, mouth and hair occlusions. From the cumulative error curves, we see that our model retains greater accuracy in unoccluded parts of the face than previous models.

The bottom two rows of Figure 6 show example reconstructions in the presence of severe occlusions. All models show robustness to occlusions and reconstruct plausible face shapes, but our model provides better detail in unoccluded parts of the face than previous models (see the mouth and chin in the first row, and the nose in the second row). For these examples, we show our reconstruction with \rho_S = 100.

Figure 6: Top left: Masks used to measure error for the different occlusion types. Top right: combined cumulative error plot. Bottom two rows: reconstruction examples for scans with occlusions (eye and mouth). Each row: local multiple PCA model, global multilinear model, our reconstruction with \rho_S = 100, input data.

6.4 Reconstruction of Motion Data

In this section, we show our model's applicability to 3D face tracking using the simple extension to our fitting algorithm described in Section 5.3. Figure 7 shows some results for a selection of frames from three sequences from the BU4DFE database [32]. We see that, as for static scans, high levels of facial detail are obtained, and even the simple extension of our fitting algorithm tracks the expression well. Since landmarks are predicted automatically for these sequences, the entire tracking is done automatically. This simple tracking algorithm is surprisingly stable. Videos can be found in the supplemental material.

Figure 7: Tracking results for the application of our fitting algorithm given in Section 5.3. Each block shows frames 0, 20, 40 and 60 of a sequence of a subject performing an expression. Top: happy expression. Bottom: fear expression.

7 Conclusion

We have presented a novel statistical shape space for human faces. Our multilinear wavelet model allows for reconstruction of fine-scale detail, while remaining robust to noise and severe data corruptions such as occlusions, and is highly efficient and scalable. The use of the wavelet transform has both statistical and computational advantages. By decomposing the surfaces into decorrelated wavelet coefficients, we can learn many independent low-dimensional statistical models rather than a single high-dimensional model. Lower dimensional models reduce the risk of overfitting, which allows us to set tight statistical bounds on the shape parameters, thereby providing robustness to data corruptions while capturing fine-scale detail. Model dimensionality is the dominant factor in the numerical routines used for fitting the model to noisy input data, and fitting many low-dimensional models is much faster than fitting a single high-dimensional model, even when the total number of parameters is much greater. We have demonstrated these properties experimentally with a thorough evaluation on noisy data with varying expression, occlusions and missing data. We have further shown how our fitting procedure can be easily and simply extended to give stable tracking of 3D facial motion sequences. Future work includes making our model applicable for real-time tracking. Virtually all aspects of our fitting algorithm are directly parallelizable, and an optimized GPU implementation could likely achieve real-time fitting rates, in particular for tracking, where only expression weights need to be optimized every frame. Such high-detail real-time tracking could have tremendous impact in tele-presence and gaming applications.

References

[1] B. Amberg, R. Knothe, and T. Vetter. Expression invariant 3D face recognition with a morphable model. In FG, pages 1–6, 2008.
[2] S. Arya and D. Mount. Approximate nearest neighbor queries in fixed dimensions. In SODA, pages 271–280, 1993. http://www.cs.umd.edu/~mount/ANN/.
[3] M. Bertram, M. Duchaineau, B. Hamann, and K. I. Joy. Generalized B-spline subdivision-surface wavelets for geometry compression. TVCG, 10(3):326–338, 2004.
[4] V. Blanz, K. Scherbaum, and H.-P. Seidel. Fitting a morphable model to 3D scans of faces. In ICCV, 2007.
[5] V. Blanz and T. Vetter. A morphable model for the synthesis of 3D faces. In SIGGRAPH, pages 187–194, 1999.
[6] T. Bolkart, A. Brunton, A. Salazar, and S. Wuhrer. Statistical 3D shape models of human faces. http://statistical-face-models.mmci.uni-saarland.de/, 2013.
[7] T. Bolkart and S. Wuhrer. Statistical analysis of 3D faces in motion. In 3DV, 2013.
[8] A. Brunton, C. Shu, J. Lang, and E. Dubois. Wavelet model-based stereo for fast, robust face reconstruction. In CRV, 2011.
[9] Y. Chen, Z. Liu, and Z. Zhang. Tensor-based human body modeling. In CVPR, 2013.
[10] C. Creusot, N. Pears, and J. Austin. A machine-learning approach to keypoint detection and landmarking on 3D meshes. IJCV, 102(1-3):146–179, 2013.
[11] K. Dale, K. Sunkavalli, M. K. Johnson, D. Vlasic, W. Matusik, and H. Pfister. Video face replacement. TOG, 30(6):130:1–10, 2011.
[12] C. Davatzikos, X. Tao, and D. Shen. Hierarchical active shape models, using the wavelet transform. TMI, 22(3):414–423, 2003.
[13] I. Kakadiaris, G. Passalis, G. Toderici, M. Murtuza, Y. Lu, N. Karampatziakis, and T. Theoharis. Three-dimensional face recognition in the presence of facial expressions: An annotated deformable model approach. TPAMI, 29(4):640–649, 2007.
[14] L. Kobbelt, S. Campagna, J. Vorsatz, and H.-P. Seidel. Interactive multi-resolution modeling on arbitrary meshes. In CGIT, 1998.
[15] L. De Lathauwer. Signal processing based on multilinear algebra. PhD thesis, K.U. Leuven, Belgium, 1997.
[16] F. Lecron, J. Boisvert, S. Mahmoudi, H. Labelle, and M. Benjelloun. Fast 3D spine reconstruction of postoperative patients using a multilevel statistical model. MICCAI, 15(2):446–453, 2012.
[17] Y. Li, T.-S. Tan, I. Volkau, and W. Nowinski. Model-guided segmentation of 3D neuroradiological image using statistical surface wavelet model. In CVPR, pages 1–7, 2007.
[18] D. Liu and J. Nocedal. On the limited memory BFGS method for large scale optimization. Math. Prog.: Ser. A, B, 45(3):503–528, 1989.
[19] I. Mpiperis, S. Malassiotis, and M. G. Strintzis. Bilinear models for 3-D face and facial expression recognition. TIFS, 3:498–511, 2008.
[20] D. Nain, S. Haker, A. Bobick, and A. Tannenbaum. Multiscale 3D shape analysis using spherical wavelets. In MICCAI, pages 459–467, 2005.
[21] A. Patel and W. Smith. 3D morphable face models revisited. In CVPR, pages 1327–1334, 2009.
[22] A. Salazar, S. Wuhrer, C. Shu, and F. Prieto. Fully automatic expression-invariant face correspondence. MVAP, to appear.
[23] A. Savran, N. Alyüz, H. Dibeklioğlu, O. Çeliktutan, B. Gökberk, B. Sankur, and L. Akarun. Bosphorus database for 3D face analysis. In BIOID, pages 47–56, 2008.
[24] P. Schröder and W. Sweldens. Spherical wavelets: Efficiently representing functions on the sphere. In SIGGRAPH, pages 161–172, 1995.
[25] M. De Smet and L. Van Gool. Optimal regions for linear model-based 3D face reconstruction. In ACCV, pages 276–289, 2010.
[26] W. Sweldens. The lifting scheme: A custom-design construction of biorthogonal wavelets. Appl. Comp. Harm. Anal., 3(2):186–200, 1996.
[27] G. Tam, Z.-Q. Cheng, Y.-K. Lai, F. Langbein, Y. Liu, D. Marshall, R. Martin, X.-F. Sun, and P. Rosin. Registration of 3D point clouds and meshes: A survey from rigid to nonrigid. TVCG, 19(7):1199–1217, 2013.
[28] F. ter Haar and R. Veltkamp. 3D face model fitting for recognition. In ECCV, pages 652–664, 2008.
[29] M. Vasilescu and D. Terzopoulos. Multilinear analysis of image ensembles: TensorFaces. In ECCV, pages 447–460, 2002.
[30] D. Vlasic, M. Brand, H. Pfister, and J. Popović. Face transfer with multilinear models. TOG, 24(3):426–433, 2005.
[31] F. Yang, L. Bourdev, J. Wang, E. Shechtman, and D. Metaxas. Facial expression editing in video using a temporally-smooth factorization. In CVPR, pages 861–868, 2012.
[32] L. Yin, X. Chen, Y. Sun, T. Worm, and M. Reale. A high-resolution 3D dynamic facial expression database. In FG, pages 1–6, 2008.
[33] L. Yin, X. Wei, Y. Sun, J. Wang, and M. J. Rosato. A 3D facial expression database for facial behavior research. In FG, pages 211–216, 2006.
