Table Of Content

Advances in Data Analysis and Classification manuscript No. (will be inserted by the editor) Archetypal shapes based on landmarks and extension to handle missing data Irene Epifanio · Mar´ıa Victoria Ibán˜ez · Amelia Simó Received:date/Accepted:date Abstract Archetypeandarchetypoidanalysisareextendedtoshapes.Theobjec- tiveistofindrepresentativeshapes.Archetypalshapesarepure(extreme)shapes. We focus on the case where the shape of an object is represented by a configuration matrix of landmarks. As shape space is not a vectorial space, we work in the tangentspace,thelinearizedspaceaboutthemeanshape.Then,eachobservation is approximated by a convex combination of actual observations (archetypoids) or archetypes, which are a convex combination of observations in the data set. These tools can contribute to the understanding of shapes, as in the usual multi- variatecase,sincetheyliesomewherebetweenclusteringandmatrixfactorization methods. A new simplex visualization tool is also proposed to provide a picture of the archetypal analysis results. We also propose new algorithms for performing archetypal analysis with missing data and its extension to incomplete shapes. A well-known data set is used to illustrate the methodologies developed. The proposed methodology is applied to an apparel design problem in children. Keywords Statisticalshapeanalysis·Archetypeanalysis·Archetypoidanalysis· Anthropometric data · Children’s wear · Missing data Mathematics Subject Classification (2000) 62H11 · 62H25 · 62H30 1 Introduction Oneofthemainstepsinimageprocessingistherepresentationanddescriptionof theobjectsintheimages.Specifically,hereweconcentrateontheanalysisoftheir shape.Asignificantamountofresearchandactivityhasbeencarriedoutinrecent This paper has been partially supported by the Spanish Ministerio de Econom´ıa y Competi- tividadProjectDPI2013−47279−C2−1−R. I.Epifanio,M.V.Ibań˜ezandA.Simo´ DepartmentofMathematics-IMAC,UniversitatJaumeI Tel.:+34-964728390 Fax:+34-964728429 E-mail:epifanio@uji.es 2 IreneEpifanioetal. decadesinthegeneralareaofshapeanalysis.Byshapeanalysis,wemeanasetof toolsfordescribing,comparing,matching,deforming,andmodelingshapes.Three major approaches can be identified in shape analysis based on how the object is treatedinmathematicalterms(StoyanandStoyan(1995)):Objectscanbetreated as subsets of Rm, they can be described as sequences of points that are given by certain geometrical or anatomical properties (landmarks), or they can be defined using functions that represent their contours. Themajorityofresearchhasbeenrestrictedtolandmark-basedanalysis,where objects are represented using k labeled points in the Euclidean space Rm. These landmarks are required to appear in each data object and to correspond to each other in a physical sense. Seminal papers on this topic are Bookstein (1978), Kendall (1984), and Goodall (1991). The main references are Dryden and Mardia (1998) and Kendall et al (2009). In this paper we concentrate on this approach. When the landmark-based approach is used, the corresponding shape space is a finite-dimensional Riemannian manifold, and statistical methodologies on manifolds must be used. There are several difficulties with generalizing probability distributions and statistical procedures to measurements in a non-vectorial space like a Riemannian manifold, but fortunately, there has been a significant amount ofresearchandactivityinthisareaoverrecentyears.Anexcellentreviewisgiven by Pennec (2006). Statisticallearningcanbesupervisedorunsupervised((Hastieetal,2009,Ch. 14) provide an excellent overview of unsupervised learning techniques) depending on whether or not outcome variables are present or not. This last is our case. In multivariate statistics, data decomposition techniques for finding the latent components are very popular. A data matrix is viewed as a linear combination of several factors. Different unsupervised methods emerge depending on the constraints on the factors and their combination (Mørup and Hansen (2012); Thurau et al (2012); Vinué et al (2015a)). For example, in clustering methods such as k-means (or k-medoids) data are explained by means of several centroids, which are the average of groups of data (or data points in the case of k-medoids). This makesfactorseasilyinterpretable.However,binaryassignmentofdatatotheclus- tersreducestheirmodelingflexibility,ascomparedwithmethodssuchasprincipal component analysis (PCA). PCA can explain data variability very well, but the factorsobtainedarenotalwayseasytointerpret,astheyarealinearcombination ofthevariables.Obviously,theirobjectivesaredifferent.Archetypeanalysis(AA) lies somewhere between these two techniques, in the sense that its modeling flexi- bility is higher than clustering methods and its factors are very easy to interpret. The same happens with fuzzy versions of k-means and k-medoids, although their objectives are different from that of AA. A summary table showing the relation- shipbetweenseveralunsupervisedmethodsinthemultivariatecontext,whichalso applies to shapes, is given by Vinué et al (2015a). AA was proposed by Cutler and Breiman (1994). The objective of AA is to represent data as a convex combination (mixture) of pure or extremal patterns called archetypes. Archetypes are a convex combination of data points. A variant ofAAisarchetypoidanalysis(ADA).UnlikeAA,thepuretypesinADAarenota mixture of cases, but real (observed) cases. ADA represents the data as mixtures ofextremecases,andnotasmixturesofmixtures,asAAdoes.ADAwasproposed by Vinué et al (2015a). Archetypalshapesbasedonlandmarksandextensiontohandlemissingdata 3 The archetypal patterns obtained in both AA and ADA are extreme observations (archetypes belong to the convex hull of data (Cutler and Breiman (1994)). Representing data by their extreme constituents (Davis and Love (2010)) facil- itates human interpretation and understanding of data due to the principle of opposites (Thurau et al (2012)). Central points are not as good as extreme points for human interpretability. Furthermore, the expression of data as a convex combination of archetypal patterns makes a human reading of data easier, unlike a linear combination without any constraints. Besides ADA, AA has given rise to the development of other new methodolo- giesandithasbeenextendedtootherkindsofdata.Someexamplesareweighted and robust AA (Eugster and Leisch (2011)), interval archetypes (D’Esposito et al (2012)), functional AA and ADA (Epifanio (2016)), data-driven prototype iden- tification (Ragozini et al (2017)), probabilistic AA (Seth and Eugster (2016b)), AAfornominalobservations(SethandEugster(2016a))andarchetypalnetworks (Ragozini and D’Esposito (2015)). AA and ADA have been applied in many different fields, such as astrophysics (Chan et al (2003)), biology (D’Esposito et al (2012)), developmental psychology (Ragozini et al (2017)), e-learning (Theodosiou et al (2013)), genetics (Thøgersen et al (2013)), global development (Epifanio (2016)), industrial engineering (Epi- fanio et al (2013); Vinué et al (2015a)), machine learning problems (Mørup and Hansen(2012)),marketresearch(Lietal(2003);Porzioetal(2008);Midgleyand Venaik (2013)), multi-document summarization (Canhasi and Kononenko (2013, 2014)),neuroscience(Tsanousaetal(2015);Hinrichetal(2016))andsports(Eug- ster(2012);VinuéandEpifanio(2017)).AAandADAalgorithmsareimplemented in the R package archetypes (Eugster and Leisch (2009)) and Anthropometry (Vinué et al (2015b); Vinué (2017)), respectively. Displayingfiguresthatcorrespondtoextremescoresofprincipalcomponentsis quiteacommonexploratorytoolinstatisticalshapeanalysis(DrydenandMardia (1998);Claude(2008)).Thiscouldbeviewedaslookingforthearchetypalshapes. Nevertheless, unlike PCA, the purpose of AA or ADA is to find extreme observations, and cases with extreme PCA scores do not necessarily return archetypal cases.ThisisexplainedbyCutlerandBreiman(1994)andshownbyEpifanioetal (2013)throughaproblemwherearchetypescouldnotberecoveredwithPCAeven if all the components had been considered. Although it is common to study the average and variability of shapes in mor- phometrics, many applications have to cope with the analysis of extreme shapes. Severalbiologicalapplicationswheretheinterestisonextremeshapesratherthan mean shapes are introduced by Dryden and Zempléni (2006): the analysis of healthy and diseased muscle fiber cells and a time series of 4000 conformations ofaDNAmoleculeover4nanoseconds.Inseveraldiseases,themorphologyofcer- tain organic structures is affected and they have extreme shapes when compared with those found in controls. For example, spines in scoliosis patients, or corneal endothelium cells in pathological cases (Ayala et al (2006); Zapater et al (2002)). Anotherpotentialbiologicalapplicationistaxonomy,sincetheinterestisprecisely on species (pure types or archetypes) and their hybrids (mixture of archetypes). Therefore,insteadoftheusualapplicationofPCAinthisfield(ViscosiandCardini (2011)), AA for shapes could be very useful. However, there are sometimes error-prone shape data (Du et al (2015)), i.e. shapedatathatarenotfreefrommeasurementerrors,orsomelandmarksmayeven 4 IreneEpifanioetal. be missing for different observations (Brown et al (2012)); for example, if we are working with fossilized specimens, which are frequently subject to fragmentation, distortionanderosion(ArbourandBrown(2014)).Excludingthecases(ormissed landmarks) with missing data is not a good option (Arbour and Brown (2014)), especiallyifthesamplesizeissmallandthecollectionofadditional,morecomplete, material is not possible. Another option would be imputation, provided that the proportion of missing data is small. Nevertheless, if the proportion is large, the errors caused by imputation are increasingly important (Eirola et al (2013)). For thatreason,wealsoproposeanewprocedureforcomputingAAwithmissingdata in the multivariate case and we extend it to shapes with missing landmarks. A previous attempt to compute multivariate AA with missing data was introduced by Mørup and Hansen (2012), but by modifying the original objective function of AA. In industrial design, extremes are also very important. Human modeling is widely used in many industries such as the aviation, automotive, defense and manufacturing sectors. The use of representative human models (cases) provides designers with an efficient way of applying the body size characteristics of the target population to ergonomic design and evaluation. Boundary cases are those located toward the edges of the distribution, and are very relevant in ergonomic product design assuming that the accommodation of boundaries ensures the ac- commodationofinteriorpoints.Notethatifthe“hard-to-fit”extremesubjects(the boundary cases) are located before the design process begins, the design could be improved from the very beginning. This would reduce the time and cost of the design process. The problem that motivated us is concerned with the analysis of children’s shapes for garment fit problems, although it could be applied to the design of other products for an appropriate target population. Lack of fit is a significant problem for both customers and apparel companies. In fact, the return rates due to size misfit are very high in online garment shops (Eneh (2015)). The idea is to identify individuals who represent the fitting problems of the target population by means of archetypal shapes. Then the designer may adapt the base pattern to ensurethateachnewpatternisadaptedtothemeasurementsoftheextremesofa size. A 3D anthropometric study of the child population in Spain was carried out by the Biomechanics Institute of Valencia. The aim of this study was to obtain anthropometric data from the child population for the clothing industry. A total of 502 Spanish children aged 6 to 12 years old were randomly selected. They were scanned using the Vitus Smart 3D body scanner from Human Solutions, a non- intrusive laser system formed by four columns housing the optic system, which moves from head to feet in ten seconds, performing a sweep of the body. The purpose of this work is to extend AA and ADA to statistical shape analysis, which will help make a shape data set easier to understand, displaying and describing the relevant features of the data. AA and ADA in the multivariate case is reviewed in Section 2, together with some aspects of shape analysis, and the novel extension of AA and ADA to shapes is introduced. In Section 3, AA and ADA with landmarks is illustrated with a very well-known data set, and the results are compared with those obtained with PCA, sparse PCA and k-means clustering. A new simplex visualization tool is also proposed to provide a picture ofthearchetypalanalysisresults.InSection4,AAwithmissingdataisproposed, together with its extension to shapes with missing landmarks. The procedure is Archetypalshapesbasedonlandmarksandextensiontohandlemissingdata 5 againillustratedandcomparedwithdifferentalternativesinthemultivariatecase and shape case in the supplementary material (Online Resource 1). ADA with missing data is also discussed. In Section 5, children’s shapes for garment design are analyzed. Finally, conclusions and further developments are discussed in Sec- tion6.ThecodeinR(RDevelopmentCoreTeam(2017))anddataforreproducing theexamplesareavailableathttp://www3.uji.es/∼epifanio/RESEARCH/laa.rar. 2 Definition of AA and ADA with landmarks 2.1 AA and ADA in the multivariate case LetX =(x ,...,x )beann×rdatamatrixwithncasesandrvariables.ForAA, 1 n the p×r matrix Z = (z , ..., z ) will contain the p archetypes in those data, in 1 p suchawaythatrowx isapproximatedbyamixtureoftherowsz ’s(archetypes): i j p (cid:88) x ∼xˆ = α z , (1) i i ij j j=1 with the mixture coefficients contained in the n×p matrix α = (α ). On the ij other hand, the archetypes z are obtained with mixture coefficients compiled in j the p×n matrix β = (β ) according to: jl n (cid:88) z = β x . (2) j jl l l=1 TodeterminethematricesZ,αandβ foragivendatamatrixX,thefollowing residual sum of squares (RSS) is minimized ( (cid:107)·(cid:107) denotes the Frobenius matrix norm for matrices, and the Euclidean norm for vectors ): n p n p n RSS =(cid:107)X−αβX(cid:107)2 =(cid:88)(cid:107)x −(cid:88)α z (cid:107)2 =(cid:88)(cid:107)x −(cid:88)α (cid:88)β x (cid:107)2, (3) i ij j i ij jl l i=1 j=1 i=1 j=1 l=1 under the constraints p (cid:88) 1) α =1 with α ≥0 and i=1,...,n, j =1,...,p and ij ij j=1 n (cid:88) 2) β =1 with β ≥0 and j =1,...,p and l=1,...,n. jl jl l=1 Constraint 1) indicates that the approximation of x is a convex combination i p (cid:88) ofarchetypes,xˆ = α z .Eachα istheweightofthearchetypej forthecase i ij j ij j=1 i; that is to say, the α coefficients indicate how much each archetype contributes to the approximation of each case. Constraint 2) indicates that archetypes z are j n (cid:88) case mixtures, z = β x . j jl l l=1 6 IreneEpifanioetal. Archetypes z are not necessarily observed data points. z would only be an j j observed data point if one β was equal to 1 in constraint 2) for each j. In ADA, jl therefore, β can only take on the binary values 0 or 1, because β ≥ 0 and the jl jl sumofconstraint2)is1.Asaconsequence,insteadofthecontinuousoptimization problemofAA,thefollowingmixed-integeroptimizationproblemhastobesolved in ADA: n p n p n RSS =(cid:88)(cid:107)x −(cid:88)α z (cid:107)2 =(cid:88)(cid:107)x −(cid:88)α (cid:88)β x (cid:107)2, (4) i ij j i ij jl l i=1 j=1 i=1 j=1 l=1 under the constraints p (cid:88) 1) α =1 with α ≥0 and i=1,...,n, j =1,...,p and ij ij j=1 n (cid:88) 2) β =1 with β ∈{0,1} and j =1,...,p and l=1,...,n. jl jl l=1 Note that β =1 for one and only one l, otherwise β =0. jl jl Due to their definitions, archetypes and archetypoids are extremal representa- tives of the data. If p > 1, archetypes are located on the boundary of the convex hull of the data (see Cutler and Breiman (1994)), while this does not necessarily hold for archetypoids (see Vinué et al (2015a)). If p = 1, the mean of the data is the archetype, and the archetypoid is the medoid of the data (Kaufman and Rousseeuw (1990)). An alternating minimizing algorithm was proposed by Cutler and Breiman (1994) to solve the problem (3). It alternates between estimating the best α for given archetypes Z and the best archetypes Z for a given α. The convex least squares problems were solved with a penalized version of the non-negative least squares algorithm by Lawson and Hanson (1974). Eugster and Leisch (2009) im- plemented that algorithm in the R library archetypes, where data are standardized by default. We have based our algorithm on this one, but the data are not standardized and the Frobenius norm is considered, as indicated in equation (3), instead of the spectral norm used by Eugster and Leisch (2009). An algorithm based on the idea of the Partitioning Around Medoids (PAM) clustering algorithm (Kaufman and Rousseeuw (1990)) was proposed by Vinué et al (2015a) to solve the problem (4). A BUILD phase and a SWAP phase are considered in that algorithm. From an initial set of archetypoids calculated in the BUILD step, the SWAP phase improves that set by exchanging selected cases forunselectedobservationsandbycheckingifthesereplacementsreducetheRSS. ThatalgorithmwasimplementedintheRlibraryAnthropometrybyVinuéetal (2015b). Three alternatives for the BUILD phase are considered in the R imple- mentation.ThefirstcandidatesarethenearestneighborsinEuclideandistanceto the p archetypes, the so-called cand set. The second initial candidates, referred ns to as the cand set, are the cases with the maximum α value for each archetype α j, i.e., the cases with the largest relative share for the respective archetype. The third set of candidates, the cand set, consists of the observations with the maxi- β mumβ valueforeacharchetypej,i.e.,themajorcontributorsinthegenerationof thearchetypes.Fromthesethreeinitialsets,aftertheSWAPphase,ADAreturns Archetypalshapesbasedonlandmarksandextensiontohandlemissingdata 7 threesetsofarchetypoids.ThesetwithlowestRSS(oftenthesamesetisobtained from the three initializations) is the returned ADA solution. An open question is the number p of archetypes or archetypoids to compute. Note that neither archetypes nor archetypoids are necessarily nested. It could be decided by the user or the elbow criterion could be used as made by Cutler and Breiman (1994); Eugster and Leisch (2009); Vinué et al (2015a) (the value p is selected as the point where the elbow on the RSS representation for a series of different p values is found). 2.2 Shape space and shape distances Theword“shape”isverycommonlyusedineverydaylanguage,usuallyreferringto the visual appearance of a geometric object. More formally, shape can be defined as geometric information about the object that is invariant under a Euclidean similarity transformation, i.e., with respect to location (translation), orientation (rotation) and scale (size) (Dryden and Mardia (1998)). In this work, the shape of geometrical m-dimensional objects (usually m=2,3) is determined by a finite numberofk>mcoordinatepoints,knownaslandmarkpoints.Eachobjectisthen described by a k×m configuration matrix X = (x ) containing the m Cartesian ij coordinates x of its k landmarks, i.e. each row represents a landmark and each ij column represents one Cartesian coordinate of that landmark. However,aconfigurationmatrixX isnotapropershapedescriptorbecauseit is not invariant to similarity transformations. For any similarity transformation, i.e., for any translation vector b ∈ Rm, scale parameter s ∈ R+, and m × m rotationmatrixR,theconfigurationmatrixgivenbysXR+1 bT (where1 isthe k k k×1 vector of ones, and the superscript T means transpose) describes the same shape as X. Definition 1 The shape space Σk is the set of equivalence classes [X] of k × m m configuration matrices X ∈ Rk×m under the action of Euclidean similarity transformations. A representative of each equivalence class [X] can be obtained by removing the similarity transformations one at a time. There are different ways to do that. LetX beaconfigurationmatrix.Onewaytoremovethelocationeffectconsists ofmultiplyingitbytheHelmertsub-matrix((DrydenandMardia,2016,p.49-50)), H, i. e., X =HX. H TofilterscalewecandivideX bythecentroidsize,whichisgivenbyS(X)= H (cid:107)X (cid:107). H X Y = H (5) (cid:107)X (cid:107) H iscalledthepre-shapeoftheconfigurationmatrixX becauseallinformationabout location and scale is removed, but rotation information remains. Definition 2 The pre-shape space Sk is the set of all possible pre-shapes Y. m Sk isahypersphereofunitradiusinRm(k−1) (aRiemannianmanifoldthatis m widely studied and known). Σk is the quotient space of Sk under rotations. m m 8 IreneEpifanioetal. Asaresult,ashape[X]isanorbitgeneratedbytherotationgroupSO(m)on the pre-shape space. For m=2, this quotient space is isometric with the complex projective space CP , a familiar Riemannian manifold without singularities. For m > 2, which k−2 is the case of our application, Σk is not a familiar space, and it has singulari- m ties; however, the Riemannian structure of the non-singular part of Σk can be m obtainedtakingintoaccountthatthequotientspaceΣk/SO(m)isaRiemannian m submersion; see Kendall et al (2009). Different distances between shapes can be introduced in Σk. One of the most m popular is the full Procrustes distance. Given two configuration matrices X and 1 X , the full Procrustes distance is a least-squares type metric between the corre- 2 sponding pre-shapes Y and Y , respectively. 1 2 Definition 3 ThefullProcrustesdistancebetweenconfigurationmatricesX and 1 X is defined by: 2 d (X ,X )= inf (cid:107)Y −βY R(cid:107), (6) F 1 2 2 1 R∈SO(m),β∈R SO(m) being the orthogonal group of rotations. The solution of this optimization problem is given by: (cid:118) (cid:117) m (cid:117) (cid:88) dF(X1,X2)=(cid:116)1−( λi)2, i=1 where λ ≥ λ ≥ ...λ ≥| λ | are the square roots of the eigenvalues of 1 2 m−1 m YTY YTY , and the smallest value λ is the negative square root if and only if 1 2 2 1 m detYTY <0 Dryden and Mardia (1998). 1 2 Alternative distances could be used in the shape space. In particular, we will also work with: Definition 4 The Procrustes distance ρ(X ,X ) is the closest (over rotations) 1 2 great circle distance (the shortest distance between two points on the surface of a sphere, measured along the surface of the sphere) between Z and Z on the 1 2 pre-shape space Sk. m The relationship between d and ρ is (Kendall et al (2009)): d (X ,X ) = F F 1 2 sinρ(X ,X ). 1 2 Based on the full Procrustes distance a concept of mean shape [µ] can be (cid:98) introduced in the Fréchet sense Fréchet (1948), i.e., one that minimizes the sum of squared distances from any shape in the set. Definition 5 Given a set of configuration matrices X ,...,X , the full Pro- 1 n crustes mean in Σk is given by m n [µ]=arg inf (cid:88)d2(X ,µ). (7) (cid:98) F i µ:S(µ)=1 i=1 Fortwo-dimensionaldataanexpliciteigenvectorsolutionoftheoptimizationprob- lem (5) is available (see p. 44 by Dryden and Mardia (1998)), but for m=3 and higher dimensional data, an iterative procedure must be used. Archetypalshapesbasedonlandmarksandextensiontohandlemissingdata 9 2.3 AA and ADA for shapes Let X ,...,X be n landmark configuration matrices, with k landmarks in di- 1 n mension m, each X is the representative of a shape [X ], an element of the shape i i space Σk. Henceforth, in order to simplify the notation, we will use X to denote m bothaconfigurationmatrixanditsshape,providedthatitisunderstoodfromthe context. OurobjectiveistofindshapesZ ∈Σk,j =1,..,pthatgeneralizethesettings j m (1)and(2)tothisspace.Theycannotbeuseddirectlybecauseshapespaceisnot avectorialspaceandtheexpressions(cid:80)p α Z and(cid:80)n β X arenotdefined. j=1 ij j i=1 jl l Instead we propose to project X , ..., X to an approximating linear space 1 n where this expression could be defined and then take it back to Σk. m In order to obtain this linear space we have to take into account that, as noted before, the non-singular part of Σk has a Riemannian manifold structure m and so it is locally homeomorphic with a Euclidean space such that the local homeomorphismscanbesmoothlypatchedtogether.Thisleadsustothedefinition of the tangent space at each point (pole) that is linear and contains all possible directions at the pole. Additionally, the tangent space has the desirable property that the distance from the shape to the pole is preserved, i.e. the distance from a point in the manifold to the pole is equal to the Euclidean distance between its projections in the tangent space. The type of distance depends on the choice of the tangent; in our application the full Procrustes distance will be adopted (Dryden and Mardia (2016)). As one moves away from the pole, Euclidean distances between some pairs of points in the tangent space are smaller than their corresponding shape distances. This distortion becomes larger as one considers points that are more distant from the pole. For this reason, the pole should be taken close to all of the points and the mean of the observed shapes is the best choice (Rohlf (1998); Slice (2001)). The tangent space of the shape space at the Procrustes mean (7) is called the Procrustes tangent space. As discussed before, if the data are fairly concentrated around this mean, the EuclideandistanceintheProcrustestangentspaceisagoodapproximationtod , F and standard multivariate techniques based on Euclidean distance can be used in thisspace.Forthisreason,thisapproachiswidelyusedforstatisticalinferenceon the shape space in many applications. To know whether shape variation is sufficiently small in practical applications, Rohlf (1999) suggests comparing the Euclidean distances between all pairs of points in the tangent space (or simply the distance to the average shape) against their Procrustes distances in the shape space. Furthermore, Rohlf (1999) also points out that, at least in biological applications, the approximation would usually be good when there are more than just a few landmarks. In terms of Riemannian manifolds, the maps that allow us to move from the manifoldtothetangentspaceorviceversaarecalledlogarithmicandexponential maps, respectively. For m=2 their expressions are easy because, as noted before, inthiscasetheshapespaceisisometrictothecomplexprojectivespace.Toobtain their expressions in the case of m > 2, which is the case of our application, we have to take into account that the mapping π that assigns to each preshape Y in Sk the corresponding element π(Y) in Σk is a Riemannian submersion and it m m 10 IreneEpifanioetal. maps the horizontal subspace of the tangent space to the pre-shape sphere at Y isometricallyontothetangentspacetotheshapespaceatπ(Y).Usingthisresult, theexponentialandlogarithmmapsinΣk canbecomputed(DrydenandMardia m (2016)). Before showing the calculus, it is necessary to introduce the vectorizing operator. The vectorizing operator of an l×m matrix A with columns a ,a ,...,a 1 2 m is defined as: vec(A)=(aT,aT,...,aT)T. 1 2 m Let S be the pre-shape of the Procrustes mean µ of X ,...,X and Y ,...,Y 1 n 1 n their respective preshapes, obtained using equation (5). To obtain the expression of the projection onto the tangent plane at S of X ,...,X , the pre-shape Y is 1 n i rotatedtobeascloseaspossibletoS.Wewritetherotatedpre-shapeasY Γˆ with i i the rotation matrix Γˆ. The expression of Γˆ can be found on p. 61 of Dryden and i i Mardia (1998):Γˆ = U VT, where U ,V ∈ SO(m) are the left and right matrices i i i i i of the singular value decomposition of STY . Then, the projection of Y on the i i tangent space at S is: trace(STY Γˆ) log (Y )=(I −vec(S)vec(S)T)vec(Y Γˆ) i i , (8) S i km−m i i sin(trace(STY Γˆ)) i i where I is the (km−m)×(km−m) identity matrix. km−m ThelogarithmictangentprojectionhasthepropertythatitpreservesthePro- crustes distance (Def. 4). Instead of using these coordinates, we propose to use: sin(trace(STY Γˆ)) v =log (Y ) i i . i S i trace(STY Γˆ) i i Thev coordinatesarecalledKent’spartialtangentcoordinates.Wechosethis i projection because instead of preserving the Procrustes distance it preserves the full Procrustes distance (Def. 6). The practical difference is that, using Kent’s partial tangent coordinates the more extreme observations are pulled in more towards the pole Dryden and Mardia (2016). We are now ready to introduce the definition of AA and ADA for shapes. Let v ,...,v be the tangent coordinates of X ,...,X . The coordinates in the 1 n 1 n tangent space u j = 1,..,p of the archetypes Z ∈ Σk, j = 1,..,p are obtained j j m by minimizing: n p n p n RSS =(cid:88)(cid:107)v −(cid:88)α u (cid:107)2 =(cid:88)(cid:107)v −(cid:88)α (cid:88)β v (cid:107)2, (9) i ij j i ij jl l i=1 j=1 i=1 j=1 l=1 under the constraints p (cid:88) 1) α =1 with α ≥0 and i=1,...,n, j =1,...,p and ij ij j=1 n (cid:88) 2) β =1 with β ≥0 and j =1,...,p and l=1,...,n. jl jl l=1 For the case of ADA, the definition is analogous, but constraint 2) is changed as in Eq. 4. A significant difference between our methodology and the usual statistical methods for shape analysis is that once archetypes are computed in the tangent

Description:

Anthropometric data · Children's wear · Missing data. Mathematics . AA and ADA have been applied in many different fields, such as astrophysics.

Archetypal shapes based on landmarks and extension to handle missing data PDF

30 Pages·2017·0.7 MB·English

Checking for file health...

Download

Upgrade Premium

Most books are stored in the elastic cloud where traffic is expensive. For this reason, we have a limit on daily download.

Download Archetypal shapes based on landmarks and extension to handle missing data PDF Free - Full Version

by Unknow| 2017| 30 pages| 0.7| English

Download Archetypal shapes based on landmarks and extension to handle missing data by in PDF format completely FREE. No registration required, no payment needed. Get instant access to this valuable resource on PDFdrive.to!

Free Download PDF

About Archetypal shapes based on landmarks and extension to handle missing data

Anthropometric data · Children's wear · Missing data. Mathematics . AA and ADA have been applied in many different fields, such as astrophysics.

Detailed Information

Author:	Unknown
Publication Year:	2017
Pages:	30
Language:	English
File Size:	0.7
Format:	PDF
Price:	FREE

Download Free PDF

Safe & Secure Download - No registration required

Why Choose PDFdrive for Your Free Archetypal shapes based on landmarks and extension to handle missing data Download?

100% Free: No hidden fees or subscriptions required for one book every day.
No Registration: Immediate access is available without creating accounts for one book every day.
Safe and Secure: Clean downloads without malware or viruses
Multiple Formats: PDF, MOBI, Mpub,... optimized for all devices
Educational Resource: Supporting knowledge sharing and learning

Frequently Asked Questions

Is it really free to download Archetypal shapes based on landmarks and extension to handle missing data PDF?

Yes, on https://PDFdrive.to you can download Archetypal shapes based on landmarks and extension to handle missing data by completely free. We don't require any payment, subscription, or registration to access this PDF file. For 3 books every day.

How can I read Archetypal shapes based on landmarks and extension to handle missing data on my mobile device?

After downloading Archetypal shapes based on landmarks and extension to handle missing data PDF, you can open it with any PDF reader app on your phone or tablet. We recommend using Adobe Acrobat Reader, Apple Books, or Google Play Books for the best reading experience.

Is this the full version of Archetypal shapes based on landmarks and extension to handle missing data?

Yes, this is the complete PDF version of Archetypal shapes based on landmarks and extension to handle missing data by Unknow. You will be able to read the entire content as in the printed version without missing any pages.

Is it legal to download Archetypal shapes based on landmarks and extension to handle missing data PDF for free?

https://PDFdrive.to provides links to free educational resources available online. We do not store any files on our servers. Please be aware of copyright laws in your country before downloading.

The materials shared are intended for research, educational, and personal use in accordance with fair use principles.