Principal Polynomial Analysis

Valero Laparra, Sandra Jiménez, Devis Tuia, Gustau Camps-Valls and Jesús Malo

Electronic version of an article published as Int. J. Neural Syst. 24(7) (2014), DOI:10.1142/S0129065714400073, copyright World Scientific Publishing Company. The authors are with the Image Processing Laboratory (IPL), Universitat de València, Catedrático A. Escardino, 46980 Paterna, València (Spain); e-mail: {valero.laparra,jesus.malo,gustau.camps}@uv.es. This work was partially supported by the Spanish Ministry of Economy and Competitiveness (MINECO) under project TIN2012-38102-C03-01, and under a EUMETSAT contract.

Abstract

This paper presents a new framework for manifold learning based on a sequence of principal polynomials that capture the possibly nonlinear nature of the data. The proposed Principal Polynomial Analysis (PPA) generalizes PCA by modeling the directions of maximal variance by means of curves, instead of straight lines. Contrarily to previous approaches, PPA reduces to performing simple univariate regressions, which makes it computationally feasible and robust. Moreover, PPA shows a number of interesting analytical properties. First, PPA is a volume-preserving map, which in turn guarantees the existence of the inverse. Second, such an inverse can be obtained in closed form. Invertibility is an important advantage over other learning methods, because it permits understanding the identified features in the input domain, where the data have physical meaning, and evaluating the performance of dimensionality reduction in sensible (input-domain) units. Volume preservation also allows an easy computation of information-theoretic quantities, such as the reduction in multi-information after the transform. Third, the analytical nature of PPA leads to a clear geometrical interpretation of the manifold: it allows the computation of Frenet-Serret frames (local features) and of generalized curvatures at any point of the space. And fourth, the analytical Jacobian allows the computation of the metric induced by the data, thus generalizing the Mahalanobis distance. These properties are demonstrated theoretically and illustrated experimentally. The performance of PPA is evaluated in dimensionality and redundancy reduction, on both synthetic and real datasets from the UCI repository.

1 Introduction

Principal Component Analysis (PCA), also known as the Karhunen-Loève transform or the Hotelling transform, is a well-known method in machine learning, signal processing and statistics [24]. PCA essentially builds an orthogonal transform to convert a set of observations of possibly correlated variables into a set of linearly uncorrelated variables. PCA has been used for manifold description and dimensionality reduction in a wide range of applications because of its simplicity, energy compaction, intuitive interpretation, and invertibility. Nevertheless, PCA is hampered by data exhibiting nonlinear relations. In this paper, we present a nonlinear generalization of PCA that, unlike other alternatives, keeps all the abovementioned appealing properties of PCA.

1.1 Desirable properties in manifold learning

In recent years, several dimensionality reduction methods have been proposed to deal with manifolds that cannot be linearly described (see [32] for a comprehensive review): the proposed approaches range from local methods [4,45,49,50,52] to kernel-based and spectral decompositions [44,46,53], neural networks [15,20,26], and projection pursuit methods [22,27]. However, despite the advantages of nonlinear methods, classical PCA remains the most widely used dimensionality reduction technique in real applications. This is because PCA: 1) is easy to apply, 2) involves solving a convex problem for which efficient solvers exist, 3) identifies features that are easily interpretable in terms of the original variables, and 4) has a straightforward inverse and out-of-sample extension.

The above properties, which are the base of the success of PCA, are not always present in the new nonlinear dimensionality reduction methods, due either to complex formulations, to the introduction of a number of non-intuitive free parameters to be tuned, to their high computational cost, to their non-invertibility or, in some cases, to strong assumptions about the manifold. More plausibly, the limited adoption of nonlinear methods in daily practice has to do with the lack of feature and model interpretability. In this regard, the usefulness of data description methods is tied to the following properties:

1. Invertibility of the transform. It allows both characterizing the transformed domain and evaluating the quality of the transform. On the one hand, inverting the data back to the input domain is important to understand the features in physically meaningful units, whereas analyzing the results in the transformed domain is typically more complicated (if not impossible). On the other hand, invertible transforms like PCA allow the assessment of dimensionality reduction errors as simple reconstruction distortion.

2. Geometrical interpretation of the manifold. Understanding the system that generated the data is the ultimate goal of manifold learning. Inverting the transform is just one step towards knowledge extraction. Geometrical interpretation and analytical characterization of the manifolds give us further insight into the problem. Ideally, one would like to compute geometric properties from the learned model, such as the curvature and torsion of the manifold, or the metric induced by the data. This geometrical characterization allows us to understand the latent parameters governing the system.

It is worth noting that both properties are scarcely achieved in the manifold learning literature. For instance, spectral methods do not generally yield intuitive mappings between the original and the intrinsic curvilinear coordinates of the low dimensional manifold. Even though a metric can be derived from particular kernel functions [6], the interpretation of the transformation is hidden behind an implicit mapping function, and solving the pre-image problem is generally not straightforward [21]. In such cases, the application of (indirect) evaluation techniques has become a relevant issue for methods leading to non-invertible transforms [51]. One could argue that direct and inverse transforms can alternatively be derived from mixtures of local models [4]; however, the effect of these local alignment operations on the metric is not trivial. In the same way, explicit geometric descriptions of the manifold, such as the computation of curvatures, are not easily obtained from other invertible transforms, such as autoencoders or deep networks [15,20,26,27].
In this paper, we introduce Principal Polynomial Analysis (PPA), a nonlinear generalization of PCA that still shares all its important properties. PPA is computationally easy as it only relies on matrix inversion and multiplication, and it is robust since it reduces to a series of marginal (univariate) regressions. PPA implements a volume-preserving and invertible map. Not only are the features easy to interpret in the input space but, additionally, the analytical nature of PPA allows one to compute classical geometrical descriptors such as curvature, torsion and the induced metric at any point of the manifold. Applying the learned transform to new samples is as straightforward as in PCA. Preliminary versions of PPA were presented in [31] and applied to remote sensing in [30]. However, those conference papers did not study the analytical properties of PPA (volume preservation, invertibility, and model geometry), nor did they compare with approaches that follow a similar logic, such as NL-PCA.

1.2 Illustration of Principal Polynomial Analysis

The proposed PPA method can be motivated by considering the conditional mean of the data. In essence, PCA is optimal for dimensionality reduction in a mean square error (MSE) sense if and only if the conditional mean in each principal component is constant along the considered dimension. Hereafter, we will refer to this as the conditional mean independence assumption. Unfortunately, this symmetry requirement does not hold in general, as many datasets live in non-Gaussian and/or curved manifolds. See for instance the data in Fig. 1 (left): the dimensions have a nonlinear relation even after PCA rotation (center). In this situation, the mean of the second principal component given the first principal component can be easily expressed with a parabolic function (red line). For data manifolds lacking the required symmetry, nonlinear modifications of PCA should remove the residual nonlinear dependence.

Following this intuition, PPA aims to remove the conditional mean. The left panel in Fig. 1 shows the input 2d data distribution, where we highlight a point of interest, x. PPA is a sequential algorithm (as PCA) that transforms one dimension at each step in the sequence. The procedure in each step consists of two operations. The first operation looks for the best vector for data projection. Even though different possibilities will be considered later (Section 2.3), a convenient choice for this operation is the leading eigenvector of PCA. Figure 1 (middle) shows the data after this projection: although the linear dependencies have been removed, there are still relations between the first and the second data dimensions. The second operation consists in subtracting the conditional mean from every sample. The conditional mean is estimated by fitting a curve that predicts the residual from the projections obtained in the first operation.

This step, composed of the two operations above, describes the d-dimensional data along one curvilinear dimension through (1) a projection score onto a certain leading vector, and (2) a curve depending on the projection score. PPA differs from PCA in this second operation because it bends the straight line into a curve, thus capturing part of the nonlinear relations between the leading direction and the orthogonal subspace. Since this example is two-dimensional, PPA ends after one step. However, when there are more dimensions, the two operations are repeated for the remaining dimensions. At the first step, the (d-1)-dimensional information still to be described is the departure from the curve in the subspace orthogonal to the leading vector. This data of reduced dimension is the input for the next step in the sequence. The last PPA dimension is the 1d residual which, in this example, corresponds to the residual in the second dimension.

Figure 1: The two operations in each stage of PPA: projection and subtraction of the polynomial prediction. Left: input mean-centered data with an illustrative sample x highlighted; this set is not suitable for PCA because the conditional mean in the subspace orthogonal to PC1 strongly depends on PC1. Center: PCA projection (rotation) and estimation of the conditional mean by a degree-2 polynomial (red curve) fitted to minimize the residual |x - m̂| for all x; the black square (α) is the projection of x onto PC1, and the red diamond, m̂, is the conditional mean of x predicted from α. The advantage of the polynomial over the straight line is that it accounts for what can be nonlinearly predicted. Right: the data after removing the estimated conditional mean (PPA solution). See the on-line paper for color figures.

1.3 Outline of the paper

The paper is organized as follows. Section 2 formalizes the forward PPA transform and analytically proves that PPA generalizes PCA and improves its performance in dimensionality reduction. The objective function of PPA, its restrictions, and its computational cost are then analyzed. Section 3 studies the main properties of PPA: Jacobian, volume preservation, invertibility, and metric. In Section 4 we discuss the differences between PPA and related work. In Section 5, we check the generalization of the Mahalanobis distance using the PPA metric, and its ability to characterize the manifold geometry (curvature and torsion); finally, we report results on standard databases for dimensionality and redundancy reduction. Section 6 concludes the paper. Additionally, the appendix details a step-by-step example of the forward transform.

2 Principal Polynomial Analysis

In this section, we start by reviewing the PCA formulation as a deflationary (or sequential) method that addresses one dimension at a time. This is convenient since it allows us to introduce PPA as the generalization that uses polynomials instead of straight lines in the sequence.

2.1 The baseline: Principal Component Analysis

Given a d-dimensional centered random variable x, the PCA transform, R, maps data from the input domain, X ⊆ R^{d×1}, to a response domain, R ⊆ R^{d×1}. PCA can actually be seen as a sequential mapping (a set of d-1 concatenated transforms). Each transform in the sequence explains a single dimension of the input data by computing a single component of the response:

x \xrightarrow{R_1} \begin{bmatrix}\alpha_1\\ x_1\end{bmatrix} \xrightarrow{R_2} \begin{bmatrix}\alpha_1\\ \alpha_2\\ x_2\end{bmatrix} \cdots \xrightarrow{R_{d-1}} \begin{bmatrix}\alpha_1\\ \vdots\\ \alpha_{d-1}\\ x_{d-1}\end{bmatrix},   (1)

and hence the PCA transform can be expressed as the composition R = R_{d-1} ∘ R_{d-2} ∘ ··· ∘ R_2 ∘ R_1. Here the vectors x_p and transforms R_p refer to the p-th step of the sequence. Each elementary transform R_p acts only on part of the dimensions of the output of the previous transform: the residual x_{p-1}. The subscript p = 0 refers to the input data, so x_0 = x. This sequential (deflationary) interpretation, which also applies to PPA as we will see later, is convenient for deriving most of the properties of PPA in Section 3.

In PCA, each transform R_p computes: (1) α_p, the projection of the data coming from the previous step, x_{p-1}, onto the unit-norm vector e_p; and (2) x_p^{PCA}, the residual data for the next step, obtained by projecting x_{p-1} onto the complement space:

\alpha_p = e_p^\top x_{p-1}, \qquad x_p^{PCA} = E_p^\top x_{p-1},   (2)

where E_p^\top is a (d-p)×(d-p+1) matrix containing the remaining set of vectors. In PCA, e_p is the vector that maximizes the variance of the projected data:

e_p = \arg\max_e \{ E[(e^\top x_{p-1})^2] \},   (3)

where e ∈ R^{(d-p+1)×1} ranges over the set of possible unit-norm vectors. E_p can be any matrix that spans the subspace orthogonal to e_p, and its rows contain d-p orthonormal vectors. Accordingly, e_p and E_p fulfil

E_p^\top e_p = 0, \qquad E_p^\top E_p = I_{(d-p)\times(d-p)},   (4)

which will be referred to as the orthonormality relations of e_p and E_p in the discussion below.

In the p-th step of the sequence, the data yet to be explained is x_p. Therefore, truncating the PCA expansion at dimension p implies ignoring the information contained in x_p, so that the dimensionality reduction error is

MSE_p^{PCA} = E[\| E_p^\top x_{p-1} \|_2^2] = E[\| x_p \|_2^2].   (5)

PCA is the optimal linear solution for dimensionality reduction in MSE terms, since Eq. (3) implies minimizing the dimensionality reduction error in Eq. (5), due to the orthonormal nature of the projection vectors e_p and E_p.
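To make the deflationary reading of Eqs. (2)-(5) concrete, here is a minimal NumPy sketch of one elementary PCA transform R_p. This is our own illustrative code, not the authors' released Matlab implementation; the function name and conventions (samples stored column-wise) are ours.

```python
import numpy as np

def pca_deflation_step(X):
    """One elementary PCA transform R_p on centered data X (d_p x n).

    Returns the scores alpha_p (n,), the leading vector e_p (d_p,),
    the orthonormal complement E_p (d_p x (d_p - 1)), and the residual
    E_p^T X ((d_p - 1) x n) that is passed to the next step.
    """
    C = np.cov(X)                          # sample covariance, d_p x d_p
    eigval, eigvec = np.linalg.eigh(C)     # ascending eigenvalues
    order = np.argsort(eigval)[::-1]       # sort by decreasing variance
    e_p = eigvec[:, order[0]]              # leading direction, Eq. (3)
    E_p = eigvec[:, order[1:]]             # orthonormal complement, Eq. (4)
    alpha_p = e_p @ X                      # alpha_p = e_p^T x_{p-1}
    residual = E_p.T @ X                   # x_p^{PCA} = E_p^T x_{p-1}, Eq. (2)
    return alpha_p, e_p, E_p, residual
```

Applying this function recursively to the residual d-1 times reproduces the sequence in Eq. (1).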
2.2 The extension: Principal Polynomial Analysis

PPA removes the conditional mean in order to reduce the reconstruction error of PCA in Eq. (5). When the data fulfill the conditional mean independence requirement, the conditional mean at every point along the e_p direction is zero. In this case, the data vector goes through the means in the subspace spanned by E_p, resulting in a small PCA truncation error. However, this is not true in general (cf. Fig. 1), and then the conditional mean m_p = E[x_p | α_p] ≠ 0. In order to remove the conditional mean m_p from x_p, PPA modifies the elementary PCA transforms in Eq. (2) by subtracting an estimation of the conditional mean, m̂_p:

\alpha_p = e_p^\top x_{p-1}, \qquad x_p^{PPA} = E_p^\top x_{p-1} - \hat m_p.   (6)

Assuming for now that the leading vector e_p is computed in the same way as in PCA, PPA only differs from PCA in the second operation of each transform R_p (cf. Eq. (2)). However, this suffices to ensure the superiority of PPA over PCA. We will refer to this particular choice of e_p as the PCA-based solution of PPA. In Section 2.3, we consider more general solutions that optimize the objective function at the cost of facing a non-convex problem. In any case, and independently of the method used to choose e_p, the truncation error in PPA is

MSE_p^{PPA} = E[\| E_p^\top x_{p-1} - \hat m_p \|_2^2].   (7)

Estimation of the conditional mean at step p. The conditional mean can be estimated with any regression method m̂_p = g(α_p). In this work, we propose to estimate the conditional mean at each step of the sequence using a polynomial function with coefficients w_{pij} and degree γ_p. Hence, the estimation becomes

\hat m_p = \begin{bmatrix} w_{p11} & w_{p12} & \cdots & w_{p1(\gamma_p+1)} \\ w_{p21} & w_{p22} & \cdots & w_{p2(\gamma_p+1)} \\ \vdots & \vdots & \ddots & \vdots \\ w_{p(d-p)1} & w_{p(d-p)2} & \cdots & w_{p(d-p)(\gamma_p+1)} \end{bmatrix} \begin{bmatrix} 1 \\ \alpha_p \\ \vdots \\ \alpha_p^{\gamma_p} \end{bmatrix},   (8)

which in matrix notation is m̂_p = W_p v_p, where W_p ∈ R^{(d-p)×(γ_p+1)} and v_p = [1, α_p, α_p^2, ..., α_p^{γ_p}]^⊤.

Note that, when considering n input examples, we may stack them column-wise in a matrix X_0 ∈ R^{d×n}. In the abovementioned PCA-based solution, the p-th step of the PPA sequence starts by computing PCA on X_{p-1}. Then, we use the first eigenvector of the sample covariance as the leading vector e_p, and the remaining eigenvectors as E_p. These eigenvectors are orthonormal; if a different strategy is used to find e_p, then E_p can be chosen to be any orthonormal complement of e_p (see Section 2.3). From the projections of the n samples onto the leading vector (i.e. from the n coefficients α_{p,k} with k = 1, ..., n), we build the Vandermonde matrix V_p ∈ R^{(γ_p+1)×n} by stacking the n column vectors v_{pk}, with k = 1, ..., n. Then, the least squares solution for the matrix W_p of polynomial coefficients is

W_p = (E_p^\top X_{p-1})\, V_p^\dagger,   (9)

where † stands for the pseudoinverse. Hence, the estimation of the conditional mean for all the samples, stacked column-wise in the matrix M̂_p, is

\hat M_p = W_p V_p,   (10)

and the residuals for the next step are X_p^{PPA} = E_p^\top X_{p-1} - M̂_p. Summarizing, the extra elements with respect to PCA are a single matrix inversion in Eq. (9) and the matrix product in Eq. (10).

Also note that the estimation of the proposed polynomial is much simpler than fitting a polynomial that depends on a natural parameter such as the orthogonal projection onto the curve, as one would do according to the classical Principal Curve definition [19]. Since the proposed objective function in Eq. (7) does not measure distortions orthogonal to the curve but rather those orthogonal to the leading vector e_p, the computation of the projections is straightforward and decoupled from the computation of W_p. The proposed estimation in Eq. (9) consists of d-p separate univariate problems only: PPA needs to fit d-p one-dimensional polynomials that depend solely on the (easy-to-compute) projection parameter α_p. Since many samples (n ≫ γ_p + 1) are typically available, the estimation of such polynomials is usually robust. The convenience of this decoupling is illustrated in the step-by-step example presented in the appendix. Since we compute W_p using least squares, we obtain three important properties:

Property 1. The PPA error does not depend on the particular selection of the basis E_p, provided it satisfies the orthonormality relations in Eq. (4).

Proof: Using a different basis E'_p in the subspace orthogonal to e_p is equivalent to applying an arbitrary (d-p)×(d-p) rotation matrix G to the difference vectors expressed in this subspace in Eq. (7): MSE_p^{PPA}(G) = E[(G(E_p^⊤ x_{p-1} - m̂_p))^⊤ G(E_p^⊤ x_{p-1} - m̂_p)]. Since G^⊤G = I, the error is independent of this rotation, and hence independent of the basis.

Property 2. The PPA error is equal to or smaller than the PCA error.

Proof: The PPA Eqs. (7) and (9) reduce to the PCA Eq. (5) in the restricted case of W_p = 0. Since, in general, PPA allows W_p ≠ 0, this implies that MSE_p^{PPA} ≤ MSE_p^{PCA}. Even though the superiority of PPA over PCA in MSE terms is clear when taking e_p as in PCA, this property holds in general: if a better choice for e_p is available, it reduces the error while having no negative impact on the cost function, since the latter is independent of the chosen basis E_p (see Property 1 above).

Property 3. PPA reduces to PCA when using first degree polynomials (i.e. straight lines).

Proof: In this particular situation (γ_p = 1, ∀p), the first eigenvector of X_{p-1} is the best direction to project onto [24]. Additionally, when using first degree polynomials, V_p is very simple and V_p^† can be computed analytically. Plugging this particular V_p^† into Eq. (9), it is easy to see that W_p = 0, since the data are centered and α_p is decorrelated from E_p^⊤ x_{p-1}. Therefore, when using straight lines W_p vanishes and PPA reduces to PCA.

Finally, note that, as in any nonlinear method, in PPA there is a trade-off between the flexibility to fit the training data and the ability to generalize to new data. In PPA, this can be easily controlled by selecting the polynomial degree γ_p. This can be done through standard cross-validation (as in our experiments), or by using any other model selection procedure such as leave-one-out or (nested) v-fold cross-validation. Note that this parameter is also interpretable and easy to tune, since it controls the flexibility of the curves, or the reduction of PPA to PCA in the γ_p = 1 case.
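Putting Eqs. (6)-(10) together, one PPA step can be sketched in a few lines of NumPy. This is a hedged illustration under our own naming conventions, not the released Matlab code; it uses the PCA-based choice of e_p.

```python
import numpy as np

def ppa_step(X, degree=2):
    """One elementary PPA transform on centered data X (d_p x n).

    Uses the PCA-based leading vector e_p, fits the conditional mean with
    a polynomial of the given degree (Eqs. 8-10), and returns the scores,
    the residual for the next step, and the learned parameters.
    """
    C = np.cov(X)
    eigval, eigvec = np.linalg.eigh(C)
    order = np.argsort(eigval)[::-1]
    e_p, E_p = eigvec[:, order[0]], eigvec[:, order[1:]]

    alpha = e_p @ X                                        # alpha_p = e_p^T x_{p-1}
    V = np.vstack([alpha**k for k in range(degree + 1)])   # Vandermonde, (gamma+1) x n
    W = (E_p.T @ X) @ np.linalg.pinv(V)                    # W_p = (E_p^T X_{p-1}) V_p^+, Eq. (9)
    M_hat = W @ V                                          # conditional mean estimate, Eq. (10)
    residual = E_p.T @ X - M_hat                           # X_p^{PPA} = E_p^T X_{p-1} - M_hat_p
    return alpha, residual, (e_p, E_p, W)
```

Chaining this function d-1 times on the returned residuals produces the full sequence of Eq. (1) together with the per-step parameters (e_p, E_p, W_p) needed later for the Jacobian and the inverse.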
2.3 PPA cost function: alternative solutions and optimization problems

By construction, PPA improves the dimensionality reduction performance of PCA when using the restricted PCA-based solution. Here we show that better solutions for the PPA cost function may exist, but unfortunately they are not easy to obtain. Possible improvements would involve (1) alternative functions to estimate the conditional mean, and (2) more adequate projection vectors e_p.

Better estimations of the conditional mean can be obtained with prior knowledge about the system that generated the data. For instance, if one knows that samples should follow a helical distribution, a linear combination of sinusoids could be a better choice. Even in these cases, least squares would yield the weights of the linear combination. Nevertheless, in this work we restrict ourselves to polynomials, since they provide flexible enough solutions when using the appropriate degree. Below we show that one can fit complicated manifolds, e.g. helices, with generic polynomials. More interestingly, geometric descriptions of the manifold, such as curvature or torsion, can be computed from the PPA model despite it being functionally different from the actual generative model.

The selection of an appropriate e_p is more critical, since Property 1 implies that the MSE does not depend on E_p, but only on e_p. The cost function for e_p measuring the dimensionality reduction error is f(e):

e_p = \arg\min_e f(e) = \arg\min_e E[\| E_p^\top x_{p-1} - W_p v_p \|_2^2],
\text{s.t.}\quad E_p^\top E_p = I, \quad E_p^\top e = 0, \quad W_p = (E_p^\top X_{p-1}) V_p^\dagger.

This constrained optimization does not have a closed-form solution, and one has to resort to gradient-descent alternatives. The gradient of the cost function f(e) is

\frac{\partial f}{\partial e_j} = E\left[ \sum_{i=1}^{d-p} 2\,(E_{pi}^\top x_{p-1} - \hat m_{pi})\, W_{pi}\, Q\, v_p\, x_{(p-1)j} \right],   (11)

where E_{pi}^⊤ and W_{pi} refer to the i-th rows of the corresponding matrices, m̂_{pi} and x_{(p-1)j} are the i-th and j-th components of the corresponding vectors, and Q ∈ R^{(γ_p+1)×(γ_p+1)} is the differentiation matrix

Q = \begin{bmatrix} 0 & 1 & 0 & \cdots & 0 \\ 0 & 0 & 2 & \cdots & 0 \\ \vdots & \vdots & \vdots & \ddots & \vdots \\ 0 & 0 & \cdots & 0 & \gamma_p \\ 0 & 0 & \cdots & 0 & 0 \end{bmatrix}.   (12)

In general, the PPA cost function is non-convex. The properties of f(e) for the particular dataset at hand will determine the complexity of the problem and the accuracy of the restricted PCA-based solution. Indeed, the example in Fig. 2 shows that, in general, the PCA-based solution for e_p is suboptimal, and better solutions may be difficult to find given the non-convexity of the cost function. In this 2d illustration, the only free parameter is the orientation of e_p. Fig. 2(b) shows the values of the error, f(e), as a function of the orientation of e. Since PCA ranks the projections by variance (Eq. (3)), the PCA solution is suboptimal with respect to the one obtained by PPA with gradient descent: the first PCA eigenvector does not optimize Eqs. (7) or (11). Even worse, the risk of getting stuck in a suboptimal solution is high when using random initialization and a simple gradient descent search.

Figure 2: The PPA objective is non-convex. (a) Samples drawn from a noisy parabola (blue) and the eigenvectors of the covariance matrix, PC1 and PC2 (gray); the PPA parabolas obtained from projections onto PC1 (PCA-based solution) and onto PC2 are plotted with ◦ and □ respectively. (b) Dimensionality reduction error, f(e), for vectors e_p with different orientation φ, where φ = 0 corresponds to PC2 (□) and φ = π/2 corresponds to PC1 (◦). (c) Fitted PPA parabolas (∗) for a range of orientations of the corresponding e_p (black).

The results in this section suggest that the simple PCA-based solution for e_p may be improved at the expense of solving a non-convex problem. Accordingly, in Section 5 we will present results for PPA optimized using both the gradient descent and the PCA-based solutions. In all cases, and thanks to Property 2, PPA obtains better results than PCA.
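In low-dimensional cases such as Fig. 2, the non-convexity of f(e) can also be inspected by brute force: scan the orientation φ of the candidate leading vector, fit the polynomial for each candidate, and record the error of Eq. (7). The sketch below is illustrative only (our own code; the data matrix X is assumed to be a centered 2×n array such as a noisy parabola) and is an alternative to the gradient of Eq. (11).

```python
import numpy as np

def dimensionality_reduction_error(X, phi, degree=2):
    """f(e) of Eq. (7) for 2d data X (2 x n) and a candidate orientation phi."""
    e = np.array([np.cos(phi), np.sin(phi)])
    E = np.array([-np.sin(phi), np.cos(phi)])          # orthonormal complement in 2d
    alpha = e @ X
    V = np.vstack([alpha**k for k in range(degree + 1)])
    W = (E @ X)[None, :] @ np.linalg.pinv(V)            # least-squares polynomial fit
    m_hat = (W @ V).ravel()
    return np.mean((E @ X - m_hat) ** 2)                # MSE orthogonal to e

# Brute-force scan over orientations (cf. Fig. 2b):
# phis = np.linspace(0.0, np.pi, 180)
# errors = [dimensionality_reduction_error(X, p) for p in phis]
```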
2.4 PPA computational cost

PPA is computationally more costly than PCA, which in a naïve implementation roughly scales cubically with the problem dimensionality, O(d^3). In the case of PCA-based PPA, this cost is increased because, in each of the d-1 deflationary steps, the pseudoinverse of the matrix V_p has to be computed. These pseudoinverses involve d-1 operations of cost O((γ+1)^3). Therefore, in total, the cost of PCA-based PPA is O(d^3 + (d-1)(γ+1)^3).

If the gradient-descent optimization of Eq. (11) is used, the cost increases substantially, since the same problem is solved for a number of iterations k until convergence: O(k(d^3 + (d-1)(γ+1)^3)). The cost associated with this search may be prohibitive in many applications, but it is still lower than the cost of other generalizations of PCA: kernel-PCA scales with the number of samples, O(n^3), which is typically larger than the dimensionality (n ≫ d), and non-analytic Principal Curves are slow to apply since they require computing d curves per sample.

2.5 PPA restrictions

PPA has two main restrictions that limit the class of manifolds for which it is well suited. First, PPA needs to fit single-valued functions in each regression in order to ensure that the transform is a bijection. This may not be a good solution when the manifold exhibits bifurcations, self-intersections, or holes. While other (non-analytical) principal curve methods can deal with such complexities [25,42], their resulting representations could be ambiguous, since a single coordinate value would map close points which are far apart in the input space; this can in turn be problematic for defining an inverse function. Secondly, PPA assumes stationarity along the principal directions, as PCA does. This is not a problem if the data follow the same kind of conditional probability density function along each principal curve, but such conditions do not hold in general. More flexible frameworks such as the Sequential Principal Curves Analysis [28] are good alternatives to circumvent this shortcoming, but at the price of a higher computational cost.

3 Jacobian, invertibility and induced metric

The most appealing characteristics of PPA (invertibility of the nonlinear transform, its geometric properties and the identified features) are closely related to the Jacobian of the transform. This section presents the analytical expression of the Jacobian of PPA as well as the induced properties of volume preservation and invertibility. Then we introduce the analytical expression for the inverse and the metric induced by PPA.

3.1 PPA Jacobian

Since PPA is a composition of transforms, cf. Eq. (1), its Jacobian is the product of the Jacobians of each step:

\nabla R(x) = \prod_{p=d-1}^{1} \nabla R_p = \nabla R_{d-1}\, \nabla R_{d-2} \cdots \nabla R_2\, \nabla R_1.   (13)

Therefore, the question reduces to computing the Jacobian ∇R_p of each elementary transform in the sequence. Taking into account the expression for each elementary transform in Eq. (6), and the way m̂_p is estimated in Eq. (10), simple derivatives lead to

\nabla R_p = \begin{bmatrix} I_{(p-1)\times(p-1)} & 0_{(p-1)\times(d-p+1)} \\ 0_{(d-p+1)\times(p-1)} & \begin{pmatrix} e_p^\top \\ E_p^\top \end{pmatrix} - \begin{pmatrix} 0_{1\times(d-p+1)} \\ u_p e_p^\top \end{pmatrix} \end{bmatrix},   (14)

where u_p = W_p \dot v_p and \dot v_p = [0, 1, 2α_p, ..., γ_p α_p^{γ_p-1}]^⊤. The block structure of the Jacobian of each elementary transform, with the identity in the top-left block, is justified by the fact that each R_p acts only on the residual x_{p-1} of the previous transform, i.e. R_p does not modify the first p-1 components of the previous output.
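The block structure of Eq. (14) maps directly onto code. The following sketch (our own helper, with hypothetical argument names) builds ∇R_p at a point from the learned e_p, E_p and W_p; the full Jacobian of Eq. (13) is then the product of these matrices along the sequence.

```python
import numpy as np

def elementary_jacobian(e_p, E_p, W_p, alpha_p, p, d):
    """Jacobian of the p-th elementary PPA transform, Eq. (14).

    e_p: (d-p+1,) leading vector, E_p: (d-p+1, d-p) orthonormal complement,
    W_p: (d-p, gamma+1) polynomial coefficients, alpha_p: scalar score,
    p: step index (1-based), d: total dimensionality.
    """
    gamma = W_p.shape[1] - 1
    # derivative of v_p = [1, a, a^2, ..., a^gamma] with respect to alpha_p
    v_dot = np.array([k * alpha_p**(k - 1) if k > 0 else 0.0
                      for k in range(gamma + 1)])
    u_p = W_p @ v_dot                                      # u_p = W_p * dv_p/dalpha_p
    lower = np.vstack([e_p, E_p.T - np.outer(u_p, e_p)])   # (d-p+1) x (d-p+1) block
    J = np.eye(d)
    J[p - 1:, p - 1:] = lower                              # identity on already-explained dims
    return J
```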
3.2 PPA is a volume-preserving mapping

Proof: The volume of any d-cube is invariant under a nonlinear mapping R if |∇R(x)| = 1, ∀x ∈ X [9]. In the case of PPA, this holds if |∇R_p| = 1 for every elementary transform R_p in Eq. (13). To prove this, we need to focus on the determinant of the bottom-right submatrix of ∇R_p, since the determinant of a block-diagonal matrix with blocks A and B is |A||B|, where in our case A is the identity matrix. Since the determinant of a matrix is the volume of the parallelepiped defined by its row vectors, the parallelepiped defined by the vector e_p^⊤ and the vectors in E_p^⊤ is a unit-volume (d-p+1)-cube, due to the orthonormal nature of these vectors. The right-hand matrix in Eq. (14) subtracts a scaled version of the leading vector, u_{pi} e_p^⊤, from the vector in the i-th row of E_p^⊤, with i = 1, ..., d-p. Independently of the weights u_{pi}, this is a shear mapping of the (d-p+1)-cube defined by e_p^⊤ and E_p^⊤. Therefore, after the subtraction, the determinant of this submatrix is still 1. As a result |∇R_p| = 1, and hence |∇R(x)| = 1, ∀x ∈ X.

Volume preservation is an appealing property when dealing with distributions in different domains. Note that probability densities under transforms depend only on the determinant of the Jacobian, p_x(x) = p_y(y)|∇R(x)|, so for PPA p_x(x) = p_y(y). A possible use of this property will be shown in Sec. 5.4 to compute the multi-information reduction achieved by the transform.
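A quick numerical sanity check of this property (illustrative only; it reuses the hypothetical elementary_jacobian helper sketched above with randomly generated parameters) is to verify that |det ∇R_p| = 1 at arbitrary points:

```python
import numpy as np

# Random parameters for one step of a d = 4 model.
rng = np.random.default_rng(0)
d, gamma = 4, 3
Q, _ = np.linalg.qr(rng.normal(size=(d, d)))       # random orthonormal basis
e_p, E_p = Q[:, 0], Q[:, 1:]                       # leading vector and complement
W_p = rng.normal(size=(d - 1, gamma + 1))          # arbitrary polynomial coefficients

for alpha in rng.normal(size=5):                   # random projection scores
    J_p = elementary_jacobian(e_p, E_p, W_p, alpha, p=1, d=d)
    assert np.isclose(abs(np.linalg.det(J_p)), 1.0)   # |det| = 1 at every point
```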
3.3 PPA is invertible

Proof: A nonlinear transform is invertible if its derivative (Jacobian) exists and is non-singular ∀x. This is because, in general, the inverse can be thought of as the integration of a differential equation defined by the inverse of the Jacobian [14,33]. Therefore, the volume preservation property, which ensures that the Jacobian is non-singular, also guarantees the existence of the inverse.

Here we present a straightforward way to compute the inverse by undoing each of the elementary transforms in the PPA sequence. Given that there is no loss of information in each PPA step, the inverse achieves perfect reconstruction, i.e. if there is no dimensionality reduction the inverted data are equal to the original ones. Given a transformed point, r = [α_1, α_2, ..., α_{d-1}, x_{d-1}]^⊤, and the parameters of the learned transform (i.e. the variables e_p, E_p, and W_p, for p = 1, ..., d-1), the inverse is obtained by recursively applying the following transform:

x_{p-1} = \begin{bmatrix} e_p & E_p \end{bmatrix} \begin{bmatrix} \alpha_p \\ x_p + W_p v_p \end{bmatrix}.   (15)

3.4 PPA generalizes the Mahalanobis distance

When dealing with nonlinear transformations, it is useful to have a connection between the metrics (distances) in the input and transformed domains. For instance, if one applies a classification method in the transformed domain, it is critical to understand what the classification boundaries are in the original domain. Consistently with results reported for other nonlinear mappings [13,28,29,38], the PPA-induced distance in the input space follows a standard change of metric under a change of coordinates [9] and can be computed as

d^2_{PPA}(x, x + \Delta x) = \Delta x^\top M(x)\, \Delta x,   (16)

where the PPA-induced metric M(x) is tied to the Jacobian,

M(x) = \nabla R(x)^\top\, \Lambda_{PPA}^{-1}\, \nabla R(x),   (17)

and Λ_PPA defines the metric in the PPA domain. In principle, one can choose Λ_PPA depending on the prior knowledge about the problem. For instance, a classical choice in classification problems is the Mahalanobis metric [10,37], which is equivalent to using the Euclidean metric after whitening, i.e. after dividing each PCA dimension by its standard deviation. One can generalize the Mahalanobis metric using PPA by selecting Λ_PPA as a matrix whose diagonal contains the variance of each dimension in the PPA domain, or, analogously, by employing the Euclidean metric after whitening the PPA transform. Figure 3 shows an example of the unit-distance loci induced by the generalized Mahalanobis PPA metric in different domains. The benefits of this metric for classification will be illustrated in Section 5.1.

Figure 3: PPA curvilinear features and discrimination ellipsoids based on the PPA metric. (a) Nonlinearly separable data. PPA results for the first-class data (b) in the input domain, (c) in the PPA domain, and (d) in the whitened PPA domain, included for comparison with the Mahalanobis metric. The curvilinear features (black grid) are computed from the polynomials found by PPA, while the unit-radius spheres represent the metric induced by the whitened PPA domain in each domain.
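Coming back to Eq. (15), the closed-form inverse is a short recursion. The sketch below uses our own naming; params is assumed to hold the per-step tuples (e_p, E_p, W_p) learned by a forward pass such as the hypothetical ppa_step above.

```python
import numpy as np

def ppa_inverse(r, params):
    """Invert the PPA sequence, Eq. (15).

    r: transformed sample [alpha_1, ..., alpha_{d-1}, x_{d-1}] as a 1d array.
    params: list of (e_p, E_p, W_p) tuples for p = 1, ..., d-1.
    """
    d = len(r)
    x = np.atleast_1d(r[-1])                       # innermost residual x_{d-1}
    for p in range(d - 1, 0, -1):                  # undo steps d-1, ..., 1
        e_p, E_p, W_p = params[p - 1]
        alpha = r[p - 1]
        gamma = W_p.shape[1] - 1
        v = np.array([alpha**k for k in range(gamma + 1)])
        # x_{p-1} = e_p * alpha_p + E_p (x_p + W_p v_p)
        x = e_p * alpha + E_p @ (x + W_p @ v)
    return x
```

If no dimensions were discarded, this recursion reconstructs the original sample exactly, which is the perfect-reconstruction property stated above.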
4 Related Methods

The qualitative idea of generalizing principal components from straight lines to curves is not new. Related work includes approaches based on (1) non-analytical principal curves [11,12,28,41,42,54], (2) fitting analytic curves [3,8,24], and (3) implicit methods based on neural networks and autoencoders [15,20,26] as well as reproducing kernels, as in kernel-PCA [46]. Here we review the differences between PPA and these approaches.

Non-analytic Principal Curves. In the Principal Curves literature, the interpretation of the principal subspaces as d-dimensional nonlinear representations is only marginally treated in [12,41,42]. This is due to the fact that such subspaces are not explicitly formulated as data transforms. Actually, in [42] the authors acknowledge that, even though their algorithm could be used as a representation if applied sequentially, such an interpretation was not possible at that point since the projections lacked the required accuracy. The proposed PPA is closer to the recently proposed Sequential Principal Curves Analysis (SPCA) [28], where standard and secondary principal curves [7,19] are used as curvilinear axes to remove the nonlinear dependence among the input dimensions. While flexible and interpretable, defining a transformation based on non-parametric Principal Curves (as in SPCA) has two main drawbacks: (1) it is computationally demanding since, in d-dimensional scenarios, the framework requires drawing d individual Principal Curves per test sample, and (2) the lack of an analytical form for the principal curves implies non-trivial parameter tuning to obtain the appropriate flexibility of the curvilinear coordinates. To resolve these issues and ensure minimal parameter tuning, we propose here to fit polynomials that estimate the conditional mean along each linear direction. We acknowledge that the higher flexibility of methods based on non-parametric Principal Curves suggests possibly better performance than PPA. However, it is difficult to prove such an intuition since, contrarily to PPA, these methods do not provide an analytic solution.

Methods fitting analytic curves. Additive Principal Components (APC), proposed in [8], explicitly fits a sequence of nonlinear functions, as PPA does. However, the philosophy of that approach differs from Principal Curves since it focuses on the low variance features. In the linear case, sequential or deflationary approaches may equivalently start by looking for features that explain most or least of the variance. However, in the nonlinear APC case, the interpretation of low variance features is very different from that of high variance features [8]. The high variance features identified by APC do not represent a summary of the data, as Principal Curves do. In the nonlinear case, minimizing the variance is not the same as minimizing the representation error, which is our goal. Therefore, our approach is closer to the Principal Curves approaches of the previous paragraph than to APC. Our method also presents a model and a minimization of the representation error substantially different from the Fixed Effect Curvilinear Model in [3]. This difference in formulation is not trivial, since it makes their formulation fully d-dimensional, while we restrict ourselves to a sequential framework where d-1 polynomials are fitted, one at a time. Moreover, the PPA projections onto the polynomial are extracted using the subspace orthogonal to the leading vector, which makes the estimation even simpler. Additionally, their goal (minimizing the representation error in a nonlinearly transformed domain) is not equivalent to minimizing the dimensionality reduction error in the input space (as is the case for PPA).

Neural networks and autoencoders. Neural network approaches, namely nonlinear PCA [15,24,26] and autoencoders [20], share many properties with PPA: they can be enforced to specifically reduce the MSE, are nonlinear, invertible, and can be easily applied to new samples [48]. However, the nonlinear features are not explicit in the formulation, and one is forced to use the inverse of the transformation to visualize the curvilinear coordinates of the identified low dimensional subspace. Another inconvenience is selecting the network architecture and fitting the model parameters (see [47] for a recent review), upon which the regularization ability of the network depends. The number of hidden units is typically assumed to be higher than the dimensionality of the input space, but there is still no clear way to set the network beforehand. As opposed to more explicit methods (PPA or SPCA), the curvature of the d-dimensional dataset is not encoded using d nonlinear functions with different relevance, which makes the geometrical analysis difficult.

Kernel PCA. This nonlinear generalization of PCA is based on embedding the data into a higher-dimensional Hilbert space. Linear features in the Hilbert space correspond to nonlinear features in the input domain [46]. Inverting the Hilbert space representation is not straightforward, but a number of pre-imaging techniques have been developed [21]. However, there is a more important complication: while it is possible to obtain reduced-dimensionality representations in the Hilbert space for supervised learning [5], the KPCA formulation does not guarantee that these representations are accurate in MSE terms in the input domain (no matter the pre-imaging technique). This is a fundamental difference with PCA (and with PPA). For this reason, using KPCA in experiments where reconstruction is necessary (such as those in Section 5.3) would not be fair to KPCA.

Similarly to [16], the main motivation of PPA is finding the input data manifold that best represents the data structure in a multivariate regression problem. The above discussion suggests that the proposed nonlinear extension of PCA opens new possibilities in recent applications of linear PCA such as [1,2,17,23,40], and in cases where it is necessary to take higher order relations into account due to the nonlinear nature of the data [39].

5 Experiments

This section illustrates the properties of PPA through a set of four experiments. The first one illustrates the advantage of using the manifold-induced PPA metric for classification. The second one shows how to use the analytic nature of PPA to extract geometrical properties of the manifold. The third experiment analyzes the performance of PPA for dimensionality reduction on different standard databases. Finally, we show the benefits of the PPA volume-preserving property to compute the multi-information reduction. For the interested reader, and for the sake of reproducibility, an online implementation of the proposed PPA method can be found at http://isp.uv.es/ppa.html. The software is written in Matlab and was tested on Windows 7 and Linux 12.4 over several workstations. It contains demos for running examples of forward and inverse PPA transforms. The code is licensed under the FreeBSD license (also known as the Simplified BSD license).
general- ically assumed to be higher than the dimensionality of the input ization of the Mahalanobis distance) for k-NN classification [10]. space,butthereisstillnoclearwaytosetthenetworkbeforehand. Better performance is obtained when considering the PPA metric Asopposedtomoreexplicitmethods(PPAorSPCA),thecurvature comparedtotheEuclideanorthelinearMahalanobiscounterparts, oftheddimensionaldatasetisnotencodedusingdnonlinearfunc- especially for few training samples (Fig. 4). Moreover, the accu- tionswithdifferentrelevance,whichmakesthegeometricalanaly- racyofthe classifier builtwiththePPAmetricisfairly insensitive sisdifficult. to the number of neighbors k in k-NN, no matter the number of samples. ThegainobservedwiththePPAmetricincreaseswiththe KernelPCA. Thisnon-lineargeneralizationofPCAisbasedon curvatureofthedatadistribution(bottomrowofFig.4). Notethat, embeddingthedataintoahigher-dimensionalHilbertspace.Linear with higher curvatures, the Euclidean and the linear Mahalanobis features in the Hilbert space correspond to nonlinear features in metricperformsimilarlypoor. Whenalargernumberofsamplesis the input domain [46]. Inverting the Hilbert space representation availabletheresultsbecomeroughlyindependentofthecurvature, is not straightforward but a number of pre-imaging techniques buteveninthatsituationthePPAmetricoutperformstheothers. have been developed [21]. However, there is a more important The generalization of the Mahalanobis metric using PPA may complication. Whileitispossibletoobtainreduced-dimensionality also be useful in extending hierarchical SOM models using more representationsintheHilbertspaceforsupervisedlearning[5],the general distortion measures [35], which are useful for segmenta- KPCA formulation does not guarantee that these representations tion[34]. 8 2015. PublishedinIJNS-WorldScientific. DOI:10.1142/S0129065714400073 Figure4: EffectofPPAmetricink-nearestneighborsclassificationforlow(top)andhigh(bottom)curvatures. Figure5: GeometriccharacterizationofcurvilinearPPAfeaturesin3dhelicalmanifolds. ScatterplotsshowdatausedtotrainthePPAmodel(1000 trainingand1000cross-validationsamples)underthreedifferentnoiseconditions(seetext).Correspondinglineplotsshowtheactualfirstprincipalcurve (incyan)andtheidentifiedfirstcurvilinearPPAfeature(ingray). Theordersofthefirstpolynomialfoundbycrossvalidationwereγ = [12,14,12], 1 intherespectivenoiseconditions.LinesinRGBstandforthetangent,normalandbinormalvectorsoftheSerret-FrenetframeateachpointofthePPA polynomial. 5.2 DifferentialgeometryofPPAcurvilinearfeatures noise levels. We used a = 2, b = 0.8, and Gaussian noise of standarddeviations0.1,0.3,and0.6,respectively. Notethatinthe According to standard differential geometry of curves in d- high noise situation, the noise scale is comparable to the scale of dimensionalspaces[9],characteristicpropertiesofacurvesuchas thehelixparameters. generalized curvatures χ , with p = [1,...,d−1], and Frenet- p The tangent vectors of this first curvilinear feature (in red) are Serret frames, are related to the p-th derivatives of the vector tan- computedfromthefirstcolumnoftheinverseoftheJacobian(us- genttothecurve. Atacertainpointx,thevectortangenttothep-th ingEqs.(13)and(14)). TheothercomponentsoftheFrenet-Serret curvilineardimensioncorrespondstothep-thcolumnoftheinverse frames(in3d,thenormalandbinormalvectors,hereingreenand oftheJacobian. 
5.2 Differential geometry of PPA curvilinear features

According to standard differential geometry of curves in d-dimensional spaces [9], characteristic properties of a curve, such as the generalized curvatures χ_p, with p = 1, ..., d-1, and the Frenet-Serret frames, are related to the p-th derivatives of the vector tangent to the curve. At a certain point x, the vector tangent to the p-th curvilinear dimension corresponds to the p-th column of the inverse of the Jacobian.

We now use the analytical nature of PPA to obtain a complete geometric characterization of the curvilinear features identified by the algorithm. In each step of the PPA sequence, the algorithm obtains a curve (polynomial) in R^d. Below we compute such a characterization for data coming from helical manifolds, where the comparison with ideal results is straightforward (in 3d spaces, the two generalized curvatures that fully characterize a curve are known simply as curvature and torsion; for a helix with radius a and pitch 2πb, they are given by χ_1 = |a|/(a^2+b^2) and χ_2 = b/(a^2+b^2) [9]). Note that this is not just an illustrative exercise: this manifold arises in real communication problems and, due to its interesting structure, it served as a test case for Principal Curves methods [42].

The first example considers a 3d helix, where the Frenet-Serret frames are easy to visualize as orthonormal vectors. Figure 5 shows the first curvilinear feature identified by PPA (in gray) compared to the actual helix used to generate the 3d data (in cyan), for different noise levels. We used a = 2, b = 0.8, and Gaussian noise of standard deviations 0.1, 0.3, and 0.6, respectively. Note that in the high noise situation, the noise scale is comparable to the scale of the helix parameters. The tangent vectors of this first curvilinear feature (in red) are computed from the first column of the inverse of the Jacobian (using Eqs. (13) and (14)). The other components of the Frenet-Serret frames (in 3d, the normal and binormal vectors, here in green and blue) are computed from the derivatives of the tangent vector, and the generalized curvatures are given by the Frenet-Serret formulas [9]. For each of the three examples, we report the curvature values obtained by the PPA curves, as well as the theoretical values for the generating helix. Even though curvature and torsion are constant for a helix, χ_1^{PPA} and χ_2^{PPA} are slightly point-dependent; that is the reason for the standard deviation in the χ_i^{PPA} values. In this particular illustration, the effect of noise leads to a more curly helix, hence overestimating the curvatures.

Figure 5: Geometric characterization of curvilinear PPA features in 3d helical manifolds. Scatter plots show the data used to train the PPA model (1000 training and 1000 cross-validation samples) under three different noise conditions (see text). The corresponding line plots show the actual first principal curve (cyan) and the identified first curvilinear PPA feature (gray). The polynomial degrees found by cross-validation were γ_1 = [12, 14, 12] in the respective noise conditions. Lines in RGB stand for the tangent, normal and binormal vectors of the Frenet-Serret frame at each point of the PPA polynomial.

In the second example, we consider a higher dimensional setting and embed 3d helices with arbitrary radius and pitch (in the [0,1] range) into the 4d space, by first adding zeros in the 4th dimension and then applying a random rotation in 4d. Since the rotation does not change the curvatures, χ_1^{theor} and χ_2^{theor} can be computed as in the 3d case, and χ_3^{theor} = 0. Fig. 6 shows the alignment between χ_i^{theor} and χ_i^{PPA} for different noise levels. We also report the χ_3^{PPA} values (which should be zero). Noise implies different curvature estimations along the manifold (larger variance) and, for particular combinations of a and b, noise also implies bias in the estimations, i.e. divergence from the ideal agreement.

Figure 6: Geometric characterization of 4d helical manifolds using PPA. Prediction of the generalized curvatures χ_1 (left) and χ_2, χ_3 (right) for a wide family of 4d helical datasets (see text). Darker gray stands for lower noise levels. According to the way the data were generated, the theoretical value of the third generalized curvature is χ_3^{theor} = 0.

Also note that the PPA formulation allows one to obtain Frenet-Serret frames in more than three dimensions; however, visualization in those cases is not straightforward. For illustration purposes, here we focus on the first PPA curvilinear dimension. Nevertheless, the same geometric descriptors (χ_i and Frenet-Serret frames) can be obtained along all the curvilinear features. Estimation of curvatures from the PPA model may be interesting in applications where geometry determines resource allocation [43].
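For reference, the theoretical curvature and torsion of the generating helices follow from the closed-form expressions quoted above; the snippet below (illustrative only) evaluates them for the parameters a = 2, b = 0.8 used in the 3d example.

```python
def helix_curvature_torsion(a, b):
    """Curvature and torsion of a helix with radius a and pitch 2*pi*b."""
    chi1 = abs(a) / (a**2 + b**2)   # curvature
    chi2 = b / (a**2 + b**2)        # torsion
    return chi1, chi2

print(helix_curvature_torsion(2.0, 0.8))   # theoretical values for the 3d example
```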
5.3 Dimensionality reduction

In this section, we first illustrate the ability of PPA to visualize high dimensional data in a similar way to Principal Volumes and Surfaces. Then, we compare the performance of PCA, PPA and the nonlinear PCA (NLPCA) of [15] in terms of the reconstruction error obtained after truncating a number of features.

Data. We use six databases extracted from the UCI repository (available at http://archive.ics.uci.edu/ml/datasets.html). The selected databases deal with challenging real problems and were chosen according to these criteria: they are defined in the real domain, they are high-dimensional (d ≥ 9), the ratio between the number of samples and the number of dimensions is large (n/d ≥ 40), and they display nonlinear relations between components (which was evaluated by pre-visualizing the data). See the data summary below and in Table 1:

• MagicGamma. The dataset represents traces of high energy gamma particles in a ground-based atmospheric Cherenkov gamma telescope. The available information consists of pulses left by the incoming Cherenkov photons on the photomultiplier tubes, arranged in an image plane. The input features are descriptors of the clustered image of gamma rays in a hadronic shower background.

• JapaneseVowels. This dataset deals with vowel identification in Japanese and contains cepstrum coefficients estimated from speech. Nine speakers uttered two Japanese vowels /ae/ successively. Linear analysis was applied to obtain a discrete-time series with 12 linear prediction cepstrum coefficients, which constitute the input features.

• Pageblocks. The database describes the blocks of the page layout of documents detected by a segmentation process. The feature vectors come from 54 distinct documents and characterize each block with 10 numerical attributes such as height, width, area, eccentricity, etc.

• Sat. This dataset considers a Landsat MSS image consisting of 82×100 pixels with a spatial resolution of 80 m × 80 m, and 4 wavelength bands. Contextual information was included by stacking neighboring pixels in 3×3 windows. Therefore, 36-dimensional input samples were generated, with a high degree of redundancy.

• Segmentation. This dataset contains a collection of images described by 16 high-level numeric-valued attributes, such as average intensity, rows and columns of the center pixel, local density descriptors, etc. The images were hand-segmented to create a classification label for every pixel.

• Vehicles. The database describes vehicles through the application of an ensemble of 18 shape feature extractors to the 2D silhouettes of the vehicles. The original silhouettes come from views at many different distances and angles. This is a suitable dataset to assess manifold learning algorithms that can adapt to specific data invariances of interest.

For every dataset we normalized the values in each dimension between zero and one. We use a maximum of 20 dimensions, which is the limit in the available implementation of NLPCA (http://www.nlpca.org/) [47]; note that our implementation of PPA does not have this limitation.

Table 1: Summary of the datasets.

Database        | n (# samples) | d (dimension) | n/d
1 MagicGamma    | 19020         | 10            | 1902
2 JapaneseVowels| 9961          | 12            | 830
3 Pageblocks    | 5473          | 10            | 547
4 Sat           | 6435          | 36            | 179
5 Segmentation  | 2310          | 16            | 144
6 Vehicles      | 846           | 18            | 47
PPA learning strategies. In the experiments, the alternative strategies described in Sections 2.2 and 2.3 are referred to as: (1) PPA, the PCA-based solution that inherits the leading vectors e_p from PCA; and (2) PPA GD, the gradient-descent solution that obtains e_p via minimization of Eq. (11). In both cases, the transforms are obtained using 50% of the data, and the polynomial degree is selected automatically (in the range γ ∈ [1,5]) by cross-validation using 50% of the training data.

PPA Principal Curves, Surfaces and Volumes. First we illustrate the use of PPA to visualize the "MagicGamma" data using a small number of dimensions. Figure 7 shows how the model obtained by PPA (red line and gray grids) adapts to the samples (in blue). All plots represent the same data from different points of view. Note that the relation between data dimensions cannot be explained with linear correlation. The red curve in the plots corresponds to the first identified polynomial, i.e. to the data reconstructed using just one PPA dimension. The grids in the first row of Fig. 7 were computed by defining a uniform grid in the first two dimensions of the transformed PPA domain and transforming it back into the original domain. The second row of Fig. 7 shows visualizations in three dimensions, together with grids computed by inverting uniform samples in a 5×5×5 cube (or 5 stacked surfaces) in the PPA domain.

The qualitative conclusion is that, despite the differences in the cost function (see the discussions in Sections 2.2 and 4), the first PPA polynomial (red curve) also passes through the middle of the samples, so it can be seen as an alternative to the Principal Curve of the data [18]. The gray grids also go through the middle of the samples, which suggests that not only alternative Principal Curves can be obtained with PPA, but also Principal Surfaces and Volumes [7,18,41]. Moreover, these surfaces and volumes help to visualize the structure of the data. This advantage can be seen clearly in the third and fourth plots of the first row, where the data manifold seems to be embedded in more than two dimensions.

Figure 7: Principal Curves, Surfaces and Volumes using PPA. The first row shows 2d visualizations; panel titles indicate the dimensions being visualized ([2,5], [3,9], [4,3], [5,7], [9,1]). In each panel, the blue dots are the original data, the red curve is the data reconstructed using only one dimension, and the gray lines correspond to a grid representing the two first PPA dimensions. The second row shows 3d visualizations of dimensions [3,5,10] from different camera positions; in this case, the inverted uniform grid was constructed in the three first dimensions of the transformed domain. See text for details.

Reconstruction error. To evaluate the performance in dimensionality reduction, we employ the reconstruction mean square error (MSE) in the original domain. For each method, the data are transformed and then inverted retaining a reduced set of dimensions. This kind of direct evaluation can be used only with invertible methods. The distortion introduced by method m is shown in terms of the relative MSE (in percentage) with regard to PCA: Rel. MSE_m = 100 × MSE_m / MSE_PCA. Results in this section are the average over ten independent realizations of the random selection of training samples.
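This evaluation protocol can be summarized in a few lines (a sketch with hypothetical forward/inverse helpers such as those outlined earlier; implementing the truncation by zeroing the discarded coordinates before inverting is our reading of "retaining a reduced set of dimensions").

```python
import numpy as np

def relative_mse(X, forward, inverse, forward_pca, inverse_pca, n_keep):
    """Reconstruction MSE of a method relative to PCA (in %), cf. Sec. 5.3.

    X: d x n data matrix. `forward`/`inverse` map data to/from the method's
    domain (e.g. a chain of the hypothetical ppa_step calls and ppa_inverse).
    Truncation keeps the first n_keep coordinates and zeroes the rest.
    """
    def truncated_mse(fwd, inv):
        R = np.asarray(fwd(X)).copy()     # d x n representation
        R[n_keep:, :] = 0.0               # discard trailing dimensions
        X_rec = inv(R)                    # back to the input domain
        return np.mean((X - X_rec) ** 2)

    return 100.0 * truncated_mse(forward, inverse) / truncated_mse(forward_pca, inverse_pca)
```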
Figure 8 shows the results in relative MSE as a function of the number of retained dimensions. Performance on the training and test sets is reported in the top and bottom panels respectively; note that 100% represents the baseline PCA error. Several conclusions can be extracted from these results. The most important one is that PCA-based PPA always performs better than PCA on the training set, as expected; this may not be the case with new (unobserved) test data. On the one hand, PPA is in general more robust than NLPCA for a high number of extracted features. On the other hand, NLPCA only achieves good performance with a low number of extracted features. It is worth noting that PPA GD obtains good results for the first component in the training sets, in particular always better than PPA (as proved theoretically in Sec. 2.3). Generalization ability (i.e. performance in test) depends on the method and the database. Even though a high samples-per-dimension ratio may help to obtain better generalization, this is not always the case (see for instance the results for the "Sat" database). More complex methods (such as PPA GD and NLPCA) perform better in training but not necessarily in test, probably due to overfitting. More adapted training schemes could be employed (see for instance [47]).
