Table Of ContentAdvances in Data Analysis and Classification manuscript No.
(will be inserted by the editor)
Archetypal shapes based on landmarks and extension
to handle missing data
Irene Epifanio · Mar´ıa Victoria Ib´an˜ez ·
Amelia Sim´o
Received:date/Accepted:date
Abstract Archetypeandarchetypoidanalysisareextendedtoshapes.Theobjec-
tiveistofindrepresentativeshapes.Archetypalshapesarepure(extreme)shapes.
We focus on the case where the shape of an object is represented by a configura-
tion matrix of landmarks. As shape space is not a vectorial space, we work in the
tangentspace,thelinearizedspaceaboutthemeanshape.Then,eachobservation
is approximated by a convex combination of actual observations (archetypoids)
or archetypes, which are a convex combination of observations in the data set.
These tools can contribute to the understanding of shapes, as in the usual multi-
variatecase,sincetheyliesomewherebetweenclusteringandmatrixfactorization
methods. A new simplex visualization tool is also proposed to provide a picture
of the archetypal analysis results. We also propose new algorithms for performing
archetypal analysis with missing data and its extension to incomplete shapes. A
well-known data set is used to illustrate the methodologies developed. The pro-
posed methodology is applied to an apparel design problem in children.
Keywords Statisticalshapeanalysis·Archetypeanalysis·Archetypoidanalysis·
Anthropometric data · Children’s wear · Missing data
Mathematics Subject Classification (2000) 62H11 · 62H25 · 62H30
1 Introduction
Oneofthemainstepsinimageprocessingistherepresentationanddescriptionof
theobjectsintheimages.Specifically,hereweconcentrateontheanalysisoftheir
shape.Asignificantamountofresearchandactivityhasbeencarriedoutinrecent
This paper has been partially supported by the Spanish Ministerio de Econom´ıa y Competi-
tividadProjectDPI2013−47279−C2−1−R.
I.Epifanio,M.V.Iba´n˜ezandA.Simo´
DepartmentofMathematics-IMAC,UniversitatJaumeI
Tel.:+34-964728390
Fax:+34-964728429
E-mail:epifanio@uji.es
2 IreneEpifanioetal.
decadesinthegeneralareaofshapeanalysis.Byshapeanalysis,wemeanasetof
toolsfordescribing,comparing,matching,deforming,andmodelingshapes.Three
major approaches can be identified in shape analysis based on how the object is
treatedinmathematicalterms(StoyanandStoyan(1995)):Objectscanbetreated
as subsets of Rm, they can be described as sequences of points that are given by
certain geometrical or anatomical properties (landmarks), or they can be defined
using functions that represent their contours.
Themajorityofresearchhasbeenrestrictedtolandmark-basedanalysis,where
objects are represented using k labeled points in the Euclidean space Rm. These
landmarks are required to appear in each data object and to correspond to each
other in a physical sense. Seminal papers on this topic are Bookstein (1978),
Kendall (1984), and Goodall (1991). The main references are Dryden and Mardia
(1998) and Kendall et al (2009). In this paper we concentrate on this approach.
When the landmark-based approach is used, the corresponding shape space is
a finite-dimensional Riemannian manifold, and statistical methodologies on man-
ifolds must be used. There are several difficulties with generalizing probability
distributions and statistical procedures to measurements in a non-vectorial space
like a Riemannian manifold, but fortunately, there has been a significant amount
ofresearchandactivityinthisareaoverrecentyears.Anexcellentreviewisgiven
by Pennec (2006).
Statisticallearningcanbesupervisedorunsupervised((Hastieetal,2009,Ch.
14) provide an excellent overview of unsupervised learning techniques) depending
on whether or not outcome variables are present or not. This last is our case.
In multivariate statistics, data decomposition techniques for finding the latent
components are very popular. A data matrix is viewed as a linear combination
of several factors. Different unsupervised methods emerge depending on the con-
straints on the factors and their combination (Mørup and Hansen (2012); Thurau
et al (2012); Vinu´e et al (2015a)). For example, in clustering methods such as
k-means (or k-medoids) data are explained by means of several centroids, which
are the average of groups of data (or data points in the case of k-medoids). This
makesfactorseasilyinterpretable.However,binaryassignmentofdatatotheclus-
tersreducestheirmodelingflexibility,ascomparedwithmethodssuchasprincipal
component analysis (PCA). PCA can explain data variability very well, but the
factorsobtainedarenotalwayseasytointerpret,astheyarealinearcombination
ofthevariables.Obviously,theirobjectivesaredifferent.Archetypeanalysis(AA)
lies somewhere between these two techniques, in the sense that its modeling flexi-
bility is higher than clustering methods and its factors are very easy to interpret.
The same happens with fuzzy versions of k-means and k-medoids, although their
objectives are different from that of AA. A summary table showing the relation-
shipbetweenseveralunsupervisedmethodsinthemultivariatecontext,whichalso
applies to shapes, is given by Vinu´e et al (2015a).
AA was proposed by Cutler and Breiman (1994). The objective of AA is to
represent data as a convex combination (mixture) of pure or extremal patterns
called archetypes. Archetypes are a convex combination of data points. A variant
ofAAisarchetypoidanalysis(ADA).UnlikeAA,thepuretypesinADAarenota
mixture of cases, but real (observed) cases. ADA represents the data as mixtures
ofextremecases,andnotasmixturesofmixtures,asAAdoes.ADAwasproposed
by Vinu´e et al (2015a).
Archetypalshapesbasedonlandmarksandextensiontohandlemissingdata 3
The archetypal patterns obtained in both AA and ADA are extreme observa-
tions (archetypes belong to the convex hull of data (Cutler and Breiman (1994)).
Representing data by their extreme constituents (Davis and Love (2010)) facil-
itates human interpretation and understanding of data due to the principle of
opposites (Thurau et al (2012)). Central points are not as good as extreme points
for human interpretability. Furthermore, the expression of data as a convex com-
bination of archetypal patterns makes a human reading of data easier, unlike a
linear combination without any constraints.
Besides ADA, AA has given rise to the development of other new methodolo-
giesandithasbeenextendedtootherkindsofdata.Someexamplesareweighted
and robust AA (Eugster and Leisch (2011)), interval archetypes (D’Esposito et al
(2012)), functional AA and ADA (Epifanio (2016)), data-driven prototype iden-
tification (Ragozini et al (2017)), probabilistic AA (Seth and Eugster (2016b)),
AAfornominalobservations(SethandEugster(2016a))andarchetypalnetworks
(Ragozini and D’Esposito (2015)).
AA and ADA have been applied in many different fields, such as astrophysics
(Chan et al (2003)), biology (D’Esposito et al (2012)), developmental psychology
(Ragozini et al (2017)), e-learning (Theodosiou et al (2013)), genetics (Thøgersen
et al (2013)), global development (Epifanio (2016)), industrial engineering (Epi-
fanio et al (2013); Vinu´e et al (2015a)), machine learning problems (Mørup and
Hansen(2012)),marketresearch(Lietal(2003);Porzioetal(2008);Midgleyand
Venaik (2013)), multi-document summarization (Canhasi and Kononenko (2013,
2014)),neuroscience(Tsanousaetal(2015);Hinrichetal(2016))andsports(Eug-
ster(2012);Vinu´eandEpifanio(2017)).AAandADAalgorithmsareimplemented
in the R package archetypes (Eugster and Leisch (2009)) and Anthropometry
(Vinu´e et al (2015b); Vinu´e (2017)), respectively.
Displayingfiguresthatcorrespondtoextremescoresofprincipalcomponentsis
quiteacommonexploratorytoolinstatisticalshapeanalysis(DrydenandMardia
(1998);Claude(2008)).Thiscouldbeviewedaslookingforthearchetypalshapes.
Nevertheless, unlike PCA, the purpose of AA or ADA is to find extreme obser-
vations, and cases with extreme PCA scores do not necessarily return archetypal
cases.ThisisexplainedbyCutlerandBreiman(1994)andshownbyEpifanioetal
(2013)throughaproblemwherearchetypescouldnotberecoveredwithPCAeven
if all the components had been considered.
Although it is common to study the average and variability of shapes in mor-
phometrics, many applications have to cope with the analysis of extreme shapes.
Severalbiologicalapplicationswheretheinterestisonextremeshapesratherthan
mean shapes are introduced by Dryden and Zempl´eni (2006): the analysis of
healthy and diseased muscle fiber cells and a time series of 4000 conformations
ofaDNAmoleculeover4nanoseconds.Inseveraldiseases,themorphologyofcer-
tain organic structures is affected and they have extreme shapes when compared
with those found in controls. For example, spines in scoliosis patients, or corneal
endothelium cells in pathological cases (Ayala et al (2006); Zapater et al (2002)).
Anotherpotentialbiologicalapplicationistaxonomy,sincetheinterestisprecisely
on species (pure types or archetypes) and their hybrids (mixture of archetypes).
Therefore,insteadoftheusualapplicationofPCAinthisfield(ViscosiandCardini
(2011)), AA for shapes could be very useful.
However, there are sometimes error-prone shape data (Du et al (2015)), i.e.
shapedatathatarenotfreefrommeasurementerrors,orsomelandmarksmayeven
4 IreneEpifanioetal.
be missing for different observations (Brown et al (2012)); for example, if we are
working with fossilized specimens, which are frequently subject to fragmentation,
distortionanderosion(ArbourandBrown(2014)).Excludingthecases(ormissed
landmarks) with missing data is not a good option (Arbour and Brown (2014)),
especiallyifthesamplesizeissmallandthecollectionofadditional,morecomplete,
material is not possible. Another option would be imputation, provided that the
proportion of missing data is small. Nevertheless, if the proportion is large, the
errors caused by imputation are increasingly important (Eirola et al (2013)). For
thatreason,wealsoproposeanewprocedureforcomputingAAwithmissingdata
in the multivariate case and we extend it to shapes with missing landmarks. A
previous attempt to compute multivariate AA with missing data was introduced
by Mørup and Hansen (2012), but by modifying the original objective function of
AA.
In industrial design, extremes are also very important. Human modeling is
widely used in many industries such as the aviation, automotive, defense and
manufacturing sectors. The use of representative human models (cases) provides
designers with an efficient way of applying the body size characteristics of the
target population to ergonomic design and evaluation. Boundary cases are those
located toward the edges of the distribution, and are very relevant in ergonomic
product design assuming that the accommodation of boundaries ensures the ac-
commodationofinteriorpoints.Notethatifthe“hard-to-fit”extremesubjects(the
boundary cases) are located before the design process begins, the design could be
improved from the very beginning. This would reduce the time and cost of the
design process.
The problem that motivated us is concerned with the analysis of children’s
shapes for garment fit problems, although it could be applied to the design of
other products for an appropriate target population. Lack of fit is a significant
problem for both customers and apparel companies. In fact, the return rates due
to size misfit are very high in online garment shops (Eneh (2015)). The idea is to
identify individuals who represent the fitting problems of the target population
by means of archetypal shapes. Then the designer may adapt the base pattern to
ensurethateachnewpatternisadaptedtothemeasurementsoftheextremesofa
size. A 3D anthropometric study of the child population in Spain was carried out
by the Biomechanics Institute of Valencia. The aim of this study was to obtain
anthropometric data from the child population for the clothing industry. A total
of 502 Spanish children aged 6 to 12 years old were randomly selected. They were
scanned using the Vitus Smart 3D body scanner from Human Solutions, a non-
intrusive laser system formed by four columns housing the optic system, which
moves from head to feet in ten seconds, performing a sweep of the body.
The purpose of this work is to extend AA and ADA to statistical shape anal-
ysis, which will help make a shape data set easier to understand, displaying and
describing the relevant features of the data. AA and ADA in the multivariate
case is reviewed in Section 2, together with some aspects of shape analysis, and
the novel extension of AA and ADA to shapes is introduced. In Section 3, AA
and ADA with landmarks is illustrated with a very well-known data set, and the
results are compared with those obtained with PCA, sparse PCA and k-means
clustering. A new simplex visualization tool is also proposed to provide a picture
ofthearchetypalanalysisresults.InSection4,AAwithmissingdataisproposed,
together with its extension to shapes with missing landmarks. The procedure is
Archetypalshapesbasedonlandmarksandextensiontohandlemissingdata 5
againillustratedandcomparedwithdifferentalternativesinthemultivariatecase
and shape case in the supplementary material (Online Resource 1). ADA with
missing data is also discussed. In Section 5, children’s shapes for garment design
are analyzed. Finally, conclusions and further developments are discussed in Sec-
tion6.ThecodeinR(RDevelopmentCoreTeam(2017))anddataforreproducing
theexamplesareavailableathttp://www3.uji.es/∼epifanio/RESEARCH/laa.rar.
2 Definition of AA and ADA with landmarks
2.1 AA and ADA in the multivariate case
LetX =(x ,...,x )beann×rdatamatrixwithncasesandrvariables.ForAA,
1 n
the p×r matrix Z = (z , ..., z ) will contain the p archetypes in those data, in
1 p
suchawaythatrowx isapproximatedbyamixtureoftherowsz ’s(archetypes):
i j
p
(cid:88)
x ∼xˆ = α z , (1)
i i ij j
j=1
with the mixture coefficients contained in the n×p matrix α = (α ). On the
ij
other hand, the archetypes z are obtained with mixture coefficients compiled in
j
the p×n matrix β = (β ) according to:
jl
n
(cid:88)
z = β x . (2)
j jl l
l=1
TodeterminethematricesZ,αandβ foragivendatamatrixX,thefollowing
residual sum of squares (RSS) is minimized ( (cid:107)·(cid:107) denotes the Frobenius matrix
norm for matrices, and the Euclidean norm for vectors ):
n p n p n
RSS =(cid:107)X−αβX(cid:107)2 =(cid:88)(cid:107)x −(cid:88)α z (cid:107)2 =(cid:88)(cid:107)x −(cid:88)α (cid:88)β x (cid:107)2, (3)
i ij j i ij jl l
i=1 j=1 i=1 j=1 l=1
under the constraints
p
(cid:88)
1) α =1 with α ≥0 and i=1,...,n, j =1,...,p and
ij ij
j=1
n
(cid:88)
2) β =1 with β ≥0 and j =1,...,p and l=1,...,n.
jl jl
l=1
Constraint 1) indicates that the approximation of x is a convex combination
i
p
(cid:88)
ofarchetypes,xˆ = α z .Eachα istheweightofthearchetypej forthecase
i ij j ij
j=1
i; that is to say, the α coefficients indicate how much each archetype contributes
to the approximation of each case. Constraint 2) indicates that archetypes z are
j
n
(cid:88)
case mixtures, z = β x .
j jl l
l=1
6 IreneEpifanioetal.
Archetypes z are not necessarily observed data points. z would only be an
j j
observed data point if one β was equal to 1 in constraint 2) for each j. In ADA,
jl
therefore, β can only take on the binary values 0 or 1, because β ≥ 0 and the
jl jl
sumofconstraint2)is1.Asaconsequence,insteadofthecontinuousoptimization
problemofAA,thefollowingmixed-integeroptimizationproblemhastobesolved
in ADA:
n p n p n
RSS =(cid:88)(cid:107)x −(cid:88)α z (cid:107)2 =(cid:88)(cid:107)x −(cid:88)α (cid:88)β x (cid:107)2, (4)
i ij j i ij jl l
i=1 j=1 i=1 j=1 l=1
under the constraints
p
(cid:88)
1) α =1 with α ≥0 and i=1,...,n, j =1,...,p and
ij ij
j=1
n
(cid:88)
2) β =1 with β ∈{0,1} and j =1,...,p and l=1,...,n.
jl jl
l=1
Note that β =1 for one and only one l, otherwise β =0.
jl jl
Due to their definitions, archetypes and archetypoids are extremal representa-
tives of the data. If p > 1, archetypes are located on the boundary of the convex
hull of the data (see Cutler and Breiman (1994)), while this does not necessarily
hold for archetypoids (see Vinu´e et al (2015a)). If p = 1, the mean of the data
is the archetype, and the archetypoid is the medoid of the data (Kaufman and
Rousseeuw (1990)).
An alternating minimizing algorithm was proposed by Cutler and Breiman
(1994) to solve the problem (3). It alternates between estimating the best α for
given archetypes Z and the best archetypes Z for a given α. The convex least
squares problems were solved with a penalized version of the non-negative least
squares algorithm by Lawson and Hanson (1974). Eugster and Leisch (2009) im-
plemented that algorithm in the R library archetypes, where data are standard-
ized by default. We have based our algorithm on this one, but the data are not
standardized and the Frobenius norm is considered, as indicated in equation (3),
instead of the spectral norm used by Eugster and Leisch (2009).
An algorithm based on the idea of the Partitioning Around Medoids (PAM)
clustering algorithm (Kaufman and Rousseeuw (1990)) was proposed by Vinu´e
et al (2015a) to solve the problem (4). A BUILD phase and a SWAP phase are
considered in that algorithm. From an initial set of archetypoids calculated in
the BUILD step, the SWAP phase improves that set by exchanging selected cases
forunselectedobservationsandbycheckingifthesereplacementsreducetheRSS.
ThatalgorithmwasimplementedintheRlibraryAnthropometrybyVinu´eetal
(2015b). Three alternatives for the BUILD phase are considered in the R imple-
mentation.ThefirstcandidatesarethenearestneighborsinEuclideandistanceto
the p archetypes, the so-called cand set. The second initial candidates, referred
ns
to as the cand set, are the cases with the maximum α value for each archetype
α
j, i.e., the cases with the largest relative share for the respective archetype. The
third set of candidates, the cand set, consists of the observations with the maxi-
β
mumβ valueforeacharchetypej,i.e.,themajorcontributorsinthegenerationof
thearchetypes.Fromthesethreeinitialsets,aftertheSWAPphase,ADAreturns
Archetypalshapesbasedonlandmarksandextensiontohandlemissingdata 7
threesetsofarchetypoids.ThesetwithlowestRSS(oftenthesamesetisobtained
from the three initializations) is the returned ADA solution.
An open question is the number p of archetypes or archetypoids to compute.
Note that neither archetypes nor archetypoids are necessarily nested. It could be
decided by the user or the elbow criterion could be used as made by Cutler and
Breiman (1994); Eugster and Leisch (2009); Vinu´e et al (2015a) (the value p is
selected as the point where the elbow on the RSS representation for a series of
different p values is found).
2.2 Shape space and shape distances
Theword“shape”isverycommonlyusedineverydaylanguage,usuallyreferringto
the visual appearance of a geometric object. More formally, shape can be defined
as geometric information about the object that is invariant under a Euclidean
similarity transformation, i.e., with respect to location (translation), orientation
(rotation) and scale (size) (Dryden and Mardia (1998)). In this work, the shape
of geometrical m-dimensional objects (usually m=2,3) is determined by a finite
numberofk>mcoordinatepoints,knownaslandmarkpoints.Eachobjectisthen
described by a k×m configuration matrix X = (x ) containing the m Cartesian
ij
coordinates x of its k landmarks, i.e. each row represents a landmark and each
ij
column represents one Cartesian coordinate of that landmark.
However,aconfigurationmatrixX isnotapropershapedescriptorbecauseit
is not invariant to similarity transformations. For any similarity transformation,
i.e., for any translation vector b ∈ Rm, scale parameter s ∈ R+, and m × m
rotationmatrixR,theconfigurationmatrixgivenbysXR+1 bT (where1 isthe
k k
k×1 vector of ones, and the superscript T means transpose) describes the same
shape as X.
Definition 1 The shape space Σk is the set of equivalence classes [X] of k ×
m
m configuration matrices X ∈ Rk×m under the action of Euclidean similarity
transformations.
A representative of each equivalence class [X] can be obtained by removing the
similarity transformations one at a time. There are different ways to do that.
LetX beaconfigurationmatrix.Onewaytoremovethelocationeffectconsists
ofmultiplyingitbytheHelmertsub-matrix((DrydenandMardia,2016,p.49-50)),
H, i. e., X =HX.
H
TofilterscalewecandivideX bythecentroidsize,whichisgivenbyS(X)=
H
(cid:107)X (cid:107).
H
X
Y = H (5)
(cid:107)X (cid:107)
H
iscalledthepre-shapeoftheconfigurationmatrixX becauseallinformationabout
location and scale is removed, but rotation information remains.
Definition 2 The pre-shape space Sk is the set of all possible pre-shapes Y.
m
Sk isahypersphereofunitradiusinRm(k−1) (aRiemannianmanifoldthatis
m
widely studied and known). Σk is the quotient space of Sk under rotations.
m m
8 IreneEpifanioetal.
Asaresult,ashape[X]isanorbitgeneratedbytherotationgroupSO(m)on
the pre-shape space.
For m=2, this quotient space is isometric with the complex projective space
CP , a familiar Riemannian manifold without singularities. For m > 2, which
k−2
is the case of our application, Σk is not a familiar space, and it has singulari-
m
ties; however, the Riemannian structure of the non-singular part of Σk can be
m
obtainedtakingintoaccountthatthequotientspaceΣk/SO(m)isaRiemannian
m
submersion; see Kendall et al (2009).
Different distances between shapes can be introduced in Σk. One of the most
m
popular is the full Procrustes distance. Given two configuration matrices X and
1
X , the full Procrustes distance is a least-squares type metric between the corre-
2
sponding pre-shapes Y and Y , respectively.
1 2
Definition 3 ThefullProcrustesdistancebetweenconfigurationmatricesX and
1
X is defined by:
2
d (X ,X )= inf (cid:107)Y −βY R(cid:107), (6)
F 1 2 2 1
R∈SO(m),β∈R
SO(m) being the orthogonal group of rotations.
The solution of this optimization problem is given by:
(cid:118)
(cid:117) m
(cid:117) (cid:88)
dF(X1,X2)=(cid:116)1−( λi)2,
i=1
where λ ≥ λ ≥ ...λ ≥| λ | are the square roots of the eigenvalues of
1 2 m−1 m
YTY YTY , and the smallest value λ is the negative square root if and only if
1 2 2 1 m
detYTY <0 Dryden and Mardia (1998).
1 2
Alternative distances could be used in the shape space. In particular, we will
also work with:
Definition 4 The Procrustes distance ρ(X ,X ) is the closest (over rotations)
1 2
great circle distance (the shortest distance between two points on the surface of
a sphere, measured along the surface of the sphere) between Z and Z on the
1 2
pre-shape space Sk.
m
The relationship between d and ρ is (Kendall et al (2009)): d (X ,X ) =
F F 1 2
sinρ(X ,X ).
1 2
Based on the full Procrustes distance a concept of mean shape [µ] can be
(cid:98)
introduced in the Fr´echet sense Fr´echet (1948), i.e., one that minimizes the sum
of squared distances from any shape in the set.
Definition 5 Given a set of configuration matrices X ,...,X , the full Pro-
1 n
crustes mean in Σk is given by
m
n
[µ]=arg inf (cid:88)d2(X ,µ). (7)
(cid:98) F i
µ:S(µ)=1
i=1
Fortwo-dimensionaldataanexpliciteigenvectorsolutionoftheoptimizationprob-
lem (5) is available (see p. 44 by Dryden and Mardia (1998)), but for m=3 and
higher dimensional data, an iterative procedure must be used.
Archetypalshapesbasedonlandmarksandextensiontohandlemissingdata 9
2.3 AA and ADA for shapes
Let X ,...,X be n landmark configuration matrices, with k landmarks in di-
1 n
mension m, each X is the representative of a shape [X ], an element of the shape
i i
space Σk. Henceforth, in order to simplify the notation, we will use X to denote
m
bothaconfigurationmatrixanditsshape,providedthatitisunderstoodfromthe
context.
OurobjectiveistofindshapesZ ∈Σk,j =1,..,pthatgeneralizethesettings
j m
(1)and(2)tothisspace.Theycannotbeuseddirectlybecauseshapespaceisnot
avectorialspaceandtheexpressions(cid:80)p α Z and(cid:80)n β X arenotdefined.
j=1 ij j i=1 jl l
Instead we propose to project X , ..., X to an approximating linear space
1 n
where this expression could be defined and then take it back to Σk.
m
In order to obtain this linear space we have to take into account that, as
noted before, the non-singular part of Σk has a Riemannian manifold structure
m
and so it is locally homeomorphic with a Euclidean space such that the local
homeomorphismscanbesmoothlypatchedtogether.Thisleadsustothedefinition
of the tangent space at each point (pole) that is linear and contains all possible
directions at the pole.
Additionally, the tangent space has the desirable property that the distance
from the shape to the pole is preserved, i.e. the distance from a point in the
manifold to the pole is equal to the Euclidean distance between its projections in
the tangent space. The type of distance depends on the choice of the tangent; in
our application the full Procrustes distance will be adopted (Dryden and Mardia
(2016)).
As one moves away from the pole, Euclidean distances between some pairs of
points in the tangent space are smaller than their corresponding shape distances.
This distortion becomes larger as one considers points that are more distant from
the pole. For this reason, the pole should be taken close to all of the points and
the mean of the observed shapes is the best choice (Rohlf (1998); Slice (2001)).
The tangent space of the shape space at the Procrustes mean (7) is called the
Procrustes tangent space.
As discussed before, if the data are fairly concentrated around this mean, the
EuclideandistanceintheProcrustestangentspaceisagoodapproximationtod ,
F
and standard multivariate techniques based on Euclidean distance can be used in
thisspace.Forthisreason,thisapproachiswidelyusedforstatisticalinferenceon
the shape space in many applications.
To know whether shape variation is sufficiently small in practical applica-
tions, Rohlf (1999) suggests comparing the Euclidean distances between all pairs
of points in the tangent space (or simply the distance to the average shape)
against their Procrustes distances in the shape space. Furthermore, Rohlf (1999)
also points out that, at least in biological applications, the approximation would
usually be good when there are more than just a few landmarks.
In terms of Riemannian manifolds, the maps that allow us to move from the
manifoldtothetangentspaceorviceversaarecalledlogarithmicandexponential
maps, respectively. For m=2 their expressions are easy because, as noted before,
inthiscasetheshapespaceisisometrictothecomplexprojectivespace.Toobtain
their expressions in the case of m > 2, which is the case of our application, we
have to take into account that the mapping π that assigns to each preshape Y
in Sk the corresponding element π(Y) in Σk is a Riemannian submersion and it
m m
10 IreneEpifanioetal.
maps the horizontal subspace of the tangent space to the pre-shape sphere at Y
isometricallyontothetangentspacetotheshapespaceatπ(Y).Usingthisresult,
theexponentialandlogarithmmapsinΣk canbecomputed(DrydenandMardia
m
(2016)).
Before showing the calculus, it is necessary to introduce the vectorizing oper-
ator. The vectorizing operator of an l×m matrix A with columns a ,a ,...,a
1 2 m
is defined as: vec(A)=(aT,aT,...,aT)T.
1 2 m
Let S be the pre-shape of the Procrustes mean µ of X ,...,X and Y ,...,Y
1 n 1 n
their respective preshapes, obtained using equation (5). To obtain the expression
of the projection onto the tangent plane at S of X ,...,X , the pre-shape Y is
1 n i
rotatedtobeascloseaspossibletoS.Wewritetherotatedpre-shapeasY Γˆ with
i i
the rotation matrix Γˆ. The expression of Γˆ can be found on p. 61 of Dryden and
i i
Mardia (1998):Γˆ = U VT, where U ,V ∈ SO(m) are the left and right matrices
i i i i i
of the singular value decomposition of STY . Then, the projection of Y on the
i i
tangent space at S is:
trace(STY Γˆ)
log (Y )=(I −vec(S)vec(S)T)vec(Y Γˆ) i i , (8)
S i km−m i i sin(trace(STY Γˆ))
i i
where I is the (km−m)×(km−m) identity matrix.
km−m
ThelogarithmictangentprojectionhasthepropertythatitpreservesthePro-
crustes distance (Def. 4). Instead of using these coordinates, we propose to use:
sin(trace(STY Γˆ))
v =log (Y ) i i .
i S i trace(STY Γˆ)
i i
Thev coordinatesarecalledKent’spartialtangentcoordinates.Wechosethis
i
projection because instead of preserving the Procrustes distance it preserves the
full Procrustes distance (Def. 6). The practical difference is that, using Kent’s
partial tangent coordinates the more extreme observations are pulled in more
towards the pole Dryden and Mardia (2016).
We are now ready to introduce the definition of AA and ADA for shapes.
Let v ,...,v be the tangent coordinates of X ,...,X . The coordinates in the
1 n 1 n
tangent space u j = 1,..,p of the archetypes Z ∈ Σk, j = 1,..,p are obtained
j j m
by minimizing:
n p n p n
RSS =(cid:88)(cid:107)v −(cid:88)α u (cid:107)2 =(cid:88)(cid:107)v −(cid:88)α (cid:88)β v (cid:107)2, (9)
i ij j i ij jl l
i=1 j=1 i=1 j=1 l=1
under the constraints
p
(cid:88)
1) α =1 with α ≥0 and i=1,...,n, j =1,...,p and
ij ij
j=1
n
(cid:88)
2) β =1 with β ≥0 and j =1,...,p and l=1,...,n.
jl jl
l=1
For the case of ADA, the definition is analogous, but constraint 2) is changed
as in Eq. 4.
A significant difference between our methodology and the usual statistical
methods for shape analysis is that once archetypes are computed in the tangent
Description:Anthropometric data · Children's wear · Missing data. Mathematics . AA and ADA have been applied in many different fields, such as astrophysics.