Mach Learn (2016) 102:85–113
DOI 10.1007/s10994-015-5498-8

Probabilistic archetypal analysis

Sohan Seth^{1,2} · Manuel J. A. Eugster^{1}

Received: 1 April 2014 / Accepted: 13 April 2015 / Published online: 5 June 2015
© The Author(s) 2015

Editor: Kristian Kersting.
Sohan Seth: sohan.seth@hiit.fi · Manuel J. A. Eugster: manuel.eugster@hiit.fi
1 Helsinki Institute for Information Technology HIIT, Department of Computer Science, Aalto University, Espoo, Finland
2 Present address: Institute for Adaptive and Neural Computation, School of Informatics, University of Edinburgh, Edinburgh, UK

Abstract  Archetypal analysis represents a set of observations as convex combinations of pure patterns, or archetypes. The original geometric formulation of finding archetypes by approximating the convex hull of the observations assumes them to be real-valued. This, unfortunately, is not compatible with many practical situations. In this paper we revisit archetypal analysis from the basic principles, and propose a probabilistic framework that accommodates other observation types such as integers, binary values, and probability vectors. We corroborate the proposed methodology with convincing real-world applications on finding archetypal soccer players based on performance data, archetypal winter tourists based on binary survey data, archetypal disaster-affected countries based on disaster count data, and document archetypes based on term-frequency data. We also present an appropriate visualization tool to summarize archetypal analysis solutions better.

Keywords  Archetypal analysis · Probabilistic modeling · Majorization–minimization · Visualization · Convex hull · Binary observation

1 Introduction

Archetypal analysis (AA) represents observations as compositions of pure patterns, i.e., archetypes, or equivalently as convex combinations of extreme values (Cutler and Breiman 1994). Although AA bears resemblance to many well-established prototypical analysis tools, such as principal component analysis (PCA, Mohamed et al. 2009), non-negative matrix factorization (NMF, Févotte and Idier 2011), probabilistic latent semantic analysis (Hofmann 1999), and k-means (Steinley 2006), AA is arguably unique, both conceptually and computationally. Conceptually, AA imitates the human tendency of representing a group of objects by its extreme elements (Davis and Love 2010): this makes AA an interesting exploratory tool for applied scientists (e.g., Eugster 2012; Seiler and Wohlrabe 2013). Computationally, AA is data-driven and requires the factors to be probability vectors: these properties make AA a computationally demanding tool, yet bring better interpretability.

[Fig. 1  (a) Illustration of archetypal analysis with three archetypes Z, and (b) the corresponding factors H, i.e., the projections of the original observations on the convex hull of the archetypes; (a) also explicates the difference between PCA and AA.]

The concept of AA was originally formulated by Cutler and Breiman (1994). The authors posed AA as the problem of learning the convex hull of a point cloud, and solved it using an alternating non-negative least squares method. In recent years, different variations and algorithms based on the original geometrical formulation have been presented (Bauckhage and Thurau 2009; Eugster and Leisch 2011; Mørup and Hansen 2012). Unfortunately, this framework does not tackle many interesting situations. For example, consider the problem of finding archetypal responses to a binary questionnaire.
This is a potentially useful problem in areas such as psychology and marketing research, and it cannot be addressed in the standard AA formulation, which relies on the observations existing in a vector space in order to form a convex hull. Even when the observations do exist in a vector space, standard AA might not be an appropriate tool for analyzing them. For example, in the context of learning archetypal text documents with tf-idf features, standard AA will be inclined to find archetypes based on the volume rather than the content of the documents.

In this paper we revisit archetypal analysis from the basic principles, and reformulate it to extend its applicability. We admit that the approximation of the convex hull, as in standard AA, is indeed an elegant solution for capturing the essence of 'archetypes' (see Fig. 1a, b for a basic illustration). Therefore, our objective is to extend the current framework, not to discard it. We propose a probabilistic foundation of AA, where the underlying idea is to form the convex hull in the parameter space. The parameter space is often vectorial even if the sample space is not (see Fig. 2 for the plate diagram). We solve the resulting optimization problem using majorization–minimization, and also suggest a visualization tool to help understand the solution of AA better.

The paper is organized as follows. In Sect. 2, we start with an in-depth discussion of what archetypes mean, how this concept has evolved over the last decade, and how it has been utilized in different contexts. In Sect. 3, we provide a probabilistic perspective on this concept, and suggest probabilistic archetypal analysis. Here we explicitly tackle the cases of Bernoulli, Poisson, and multinomial probability distributions, and derive the necessary update rules (derivations available as appendix). In Sect. 4, we discuss the connection between AA and other prototypical analysis tools—a connection that has also been partly noted by other researchers (Mørup and Hansen 2012). In Sect. 5 we provide simulations to show the difference between probabilistic and standard archetypal analysis solutions. In Sect. 6, we discuss a visualization method for archetypal analysis, and present several improvements. In Sect. 7, we present an application for each of the above observation models: finding archetypal soccer players based on performance data; finding archetypal winter tourists based on binary survey data; finding archetypal disaster-affected countries based on disaster count data; and finding document archetypes based on term-frequency data. In Sect. 8 we summarize our contribution, and suggest future directions.

[Fig. 2  Plate diagram of probabilistic archetypal analysis: w_k (k = 1, ..., K) and h_n generate the observation x_n through the known parameters Θ, for n = 1, ..., N.]

Table 1  Probability distributions used in the paper

Normal       N(μ, Σ)      pdf: (2π)^{-K/2} |Σ|^{-1/2} exp{-(1/2)(x-μ)^T Σ^{-1} (x-μ)};  μ ∈ R^K, Σ ∈ R^{K×K}
Dirichlet    Dir(α)       pdf: (1/B(α)) ∏_{i=1}^K x_i^{α_i - 1}, where B(α) = ∏_{i=1}^K Γ(α_i) / Γ(∑_{i=1}^K α_i);  α = (α_1, ..., α_K), K > 1, α_i > 0
Poisson      Pois(λ)      pmf: (λ^x / x!) exp{-λ};  λ > 0
Bernoulli    Ber(p)       pmf: p^x (1-p)^{1-x};  0 < p < 1
Multinomial  Mult(n, p)   pmf: (n! / (x_1! ··· x_K!)) p_1^{x_1} ··· p_K^{x_K};  n > 0, p = (p_1, ..., p_K), ∑_{i=1}^K p_i = 1

Throughout the paper, we represent matrices by boldface uppercase letters, vectors by boldface lowercase letters, and scalar variables by normal lowercase letters. 1 denotes the row vector of ones, and I denotes the identity matrix. Table 1 provides the definitions of the distributions used throughout the paper. Implementations of the presented methods and source code to reproduce the presented examples are available at http://aalab.github.io/.
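For readers who want to check the densities of Table 1 numerically, each of them has a direct counterpart in scipy.stats. The snippet below is our own illustration with arbitrary parameter values, not part of the paper's implementation:

```python
import numpy as np
from scipy import stats

# Counterparts of the distributions in Table 1 (illustrative parameter values)
mu, Sigma = np.zeros(2), np.eye(2)
print(stats.multivariate_normal(mu, Sigma).pdf([0.1, -0.2]))   # Normal N(mu, Sigma)
print(stats.dirichlet([1.0, 1.0, 1.0]).pdf([0.2, 0.3, 0.5]))   # Dirichlet Dir(alpha)
print(stats.poisson(3.0).pmf(2))                               # Poisson Pois(lambda)
print(stats.bernoulli(0.7).pmf(1))                             # Bernoulli Ber(p)
print(stats.multinomial(10, [0.2, 0.3, 0.5]).pmf([2, 3, 5]))   # Multinomial Mult(n, p)
```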
2 Review

The goal of archetypal analysis is to find archetypes, that is, 'pure' patterns. In Sect. 2.1 we provide some intuition on what these pure patterns imply. In Sect. 2.2, we discuss the mathematical formulation of archetypal analysis as suggested by Cutler and Breiman (1994). In Sect. 2.3 we discuss how this concept has been utilized since its inception: here, we point out key references, important developments, and convincing applications.

2.1 Intuition

An archetype is an 'ideal example of a type'. The word 'ideal' does not necessarily carry the qualitative meaning of being 'good'; the concept is mostly subjective. For example, one can consider the ideal example to be a prototype, and other objects to be variations of that prototype. This view is close to the concept of clustering, where the centers of the clusters are the prototypes. For archetypal analysis, however, the ideal example has a different meaning. Intuitively, it implies that the prototype cannot be surpassed; it is the purest or most extreme instance that can be witnessed. A simple example of archetypes are the colors red, blue, and green in the RGB color space (cf. Fig. 1a): any other color can be expressed as a combination of these ideal colors. Another example are comic book superheroes who excel in some unique characteristic, say speed, stamina, or intelligence, more than anybody else: they are the archetypal superheroes with that particular ability. The non-archetypal superheroes, on the other hand, possess "many" abilities that are not extreme. It should be noted that a person with all these abilities in their fullest realization, if such a person exists, is also an archetype. Similarly, if one considers normal humans alongside super-humans, then a person with none of these abilities is an archetype as well.

In both these examples the archetypes are rather trivial. If one represents each color in the RGB space, then it is obvious that the unit vectors R, G, and B are the pure colors or archetypes. Similarly, if one represents every (super-)human on a two-dimensional normalized scale of strength and intelligence, then there are four extreme instances, and hence archetypes: first and second, a person with the highest score in one of these attributes and none in the other; third and fourth, a person with the highest/lowest score in both attributes. In reality, however, one may not observe these attributes directly, but only other features. For example, one can describe a person by many personality traits, such as humor, discipline, optimism, etc., but these characteristics cannot be measured directly. One can, however, prepare a questionnaire (of observed variables) that explores these (latent) personality traits. From such a questionnaire, curious users can attempt to identify archetypal humans, say an archetypal leader or an archetypal jester.

Finding archetypal patterns in the observed space is a non-trivial problem. It is difficult, in particular, since the archetype itself may not belong to the set of observed samples but must be inferred; yet, it should also not be "mythological", but rather something that might be observed. Cutler and Breiman (1994) suggested a simple yet elegant solution that finds the approximate convex hull of the observations, and defines its vertices as the archetypes.
This allows individual observations to be best represented as compositions (convex combinations) of archetypes, while archetypes can only be expressed by themselves, i.e., they are the 'purest' or 'most extreme' forms. Admittedly, this is not the most desirable solution in every respect, since the inferred archetypes are restricted to lie on the boundary of the convex hull of the observations, whereas a true archetype may lie outside; inferring such archetypes outside the observation hull would, however, require strong regularity assumptions. The solution suggested by Cutler and Breiman (1994) thus finds a trade-off between computational simplicity and the intuitive nature of archetypes.

2.2 Formulation

Cutler and Breiman posed AA as the problem of learning the convex hull of a point cloud. They assumed the archetypes to be convex combinations of the observations, and the observations to be convex combinations of the archetypes. Let X be a (real-valued) data matrix with each column an observation. Then this is equivalent to solving the following optimization problem:

    min_{W,H} ||X − XWH||_F^2    (1)

with the constraint that both W and H are column-stochastic matrices. The subscript F denotes the Frobenius norm. Given N observations and K archetypes, W is an N × K matrix and H is a K × N matrix. Here Z = XW are the inferred archetypes, which lie on the convex hull of the observations due to the stochasticity of W, and for each n-th sample x_n, Zh_n is its projection on the convex hull of the archetypes.

Cutler and Breiman solved this problem using an alternating non-negative least squares method as follows:

    H^{t+1} = argmin_{H≥0} ||X − Z^t H||_F^2 + λ ||1H − 1||^2    (2)

and

    W^{t+1} = argmin_{W≥0} ||Z^t − XW||_F^2 + λ ||1W − 1||^2    (3)

where after each alternating step, the archetypes Z are updated by solving X = Z^{t+1} H^t and by setting Z^{t+1} = XW^t, respectively. The algorithm alternates between finding the best composition of the observations given a set of archetypes, and finding the best set of archetypes given a composition of the observations. Notice that the stochasticity constraint is cleverly enforced by a suitably strong regularization parameter λ. The authors also proved that for k > 1 the archetypes are located on the boundary of the convex hull of the point cloud, whereas for k = 1 the archetype is the mean of the point cloud.
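The alternating scheme (2)–(3) is straightforward to sketch in code. The following is a minimal NumPy/SciPy illustration that enforces the stochasticity constraints through the λ-penalized least squares problems above; the function name, defaults, and initialization are our own, and the reference implementations linked in Sect. 1 (http://aalab.github.io/) should be preferred for serious use.

```python
import numpy as np
from scipy.optimize import nnls

def archetypal_analysis(X, n_archetypes, n_iter=50, lam=20.0, seed=0):
    """Sketch of the alternating non-negative least squares scheme (1)-(3).

    X: (M, N) real-valued data matrix, one observation per column (N >= n_archetypes).
    Returns archetypes Z (M, K) and approximately column-stochastic W (N, K), H (K, N).
    """
    rng = np.random.default_rng(seed)
    M, N = X.shape
    K = n_archetypes
    Z = X[:, rng.choice(N, K, replace=False)].copy()   # initialize archetypes on data points
    W = np.zeros((N, K))
    H = np.zeros((K, N))
    sl = np.sqrt(lam)

    for _ in range(n_iter):
        # (2): H-update, one penalized NNLS problem per observation
        A = np.vstack([Z, sl * np.ones((1, K))])       # augmented system encodes lam*(1h - 1)^2
        for n in range(N):
            b = np.append(X[:, n], sl)
            H[:, n], _ = nnls(A, b)
        Z = X @ np.linalg.pinv(H)                      # solve X = Z H for the archetypes

        # (3): W-update, one penalized NNLS problem per archetype
        A = np.vstack([X, sl * np.ones((1, N))])
        for k in range(K):
            b = np.append(Z[:, k], sl)
            W[:, k], _ = nnls(A, b)
        Z = X @ W                                      # archetypes back on the data hull

    return Z, W, H
```

The λ penalty only needs to be large enough that the columns of W and H sum to one up to numerical precision; an explicit final normalization can be added if exact stochasticity is required.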
2.3 Development

The first publication, to the best of our knowledge, that deals with the idea of "ideal types" and observations related to them is Woodbury and Clive (1974). There, the authors discuss how to derive estimates of grades of membership of categorical observations, given an a priori defined set of ideal (or pure) types, in the context of clinical judgment. Twenty years later—in 1994—Cutler and Breiman (1994) formulated archetypal analysis (AA) as the problem of estimating both the memberships and the ideal types given a set of real-valued observations. They motivated this new kind of analysis with, among other examples, the estimation of archetypal head dimensions of Swiss Army soldiers.

One of the original authors continued her work on AA in the field of physics and applied it to spatio-temporal data (Stone and Cutler 1996). In this line of research, Cutler and Stone (1997) developed moving archetypes by extending the original AA framework with an additional optimization step that estimates the optimal shift of observations in the spatial domain over time. They applied this method to data gathered from a chemical pulse experiment. Other researchers took up the idea of AA and applied it in different fields; the following is a comprehensive list of problems where AA has been applied: analysis of galaxy spectra (Chan et al. 2003), ethical issues and market segmentation (Li et al. 2003), thermogram sequences (Marinetti et al. 2007), gene expression data (Thøgersen et al. 2013); performance analysis in marketing (Porzio et al. 2008), sports (Eugster 2012), and science (Seiler and Wohlrabe 2013); face recognition (Xiong et al. 2013); and game AI development (Sifa and Bauckhage 2013).

In recent years, animated by the rise of non-negative matrix factorization research, various authors have proposed extensions of and variations on the original algorithm. The following are a few notable publications. Thurau et al. (2009) introduce convex-hull non-negative matrix factorization (NMF). Motivated by convex NMF, the authors make the same assumption as in AA, namely that observations are convex combinations of specific observations. However, they derive an algorithm that estimates the archetypes not from the entire set of observations but from potential candidates found from two-dimensional projections on eigenvectors: this leads to a solution that is also applicable to large data sets. The authors demonstrate their method on a data set consisting of 150 million votes on World of Warcraft® guilds. In Thurau et al. (2010), the authors present an even faster approach by deriving a highly efficient volume maximization algorithm. Eugster and Leisch (2011) tackle the problem of robustness, and the fact that a single outlier can break down the archetype solution. They adapt the original algorithm to be a robust M-estimator and present an iteratively reweighted least squares fitting algorithm. They evaluate their algorithm using the Ozone data from the original AA paper with contaminated observations. Mørup and Hansen (2012) also tackle the problem of deriving an algorithm for large-scale AA. They propose a solution based on a simple projected gradient method, in combination with an efficient initialization method for finding candidate archetypes. The authors demonstrate their method, among other examples, with an analysis of the NIPS bag-of-words corpus and the Movielens movie rating data set.

3 Probabilistic archetypal analysis

We observe that the original AA formulation implicitly exploits a simplex latent variable model and a normal observation model, i.e.,

    h_n ∼ Dir(1),  x_n ∼ N(Zh_n, ε_1 I).    (4)

But it goes a step further, and generates the loading matrix Z from a simplex latent model itself, with known loadings Θ ∈ R^{M×N}, i.e.,

    w_k ∼ Dir(1),  z_k ∼ N(Θw_k, ε_2 I).    (5)

Thus, the log-likelihood can be written as

    LL(X | W, H, Z, Θ) = −(ε_1/2) ||X − ZH||_F^2 − (ε_2/2) ||Z − ΘW||_F^2 + C(ε_1, ε_2).    (6)

The archetypes Z and the corresponding factors H can then be found by maximizing this log-likelihood (or minimizing the negative log-likelihood) under the constraint that both W and H are stochastic: this can be achieved by alternating optimization as Cutler and Breiman did (but with different update rules for Z and ε; see Appendix 1 for details).

The equivalence of this approach to the standard formulation requires that Θ = X. Although unusual in a probabilistic framework, this contributes to the data-driven nature of AA. In the probabilistic framework, Θ can be viewed as a set of known bases defined by the observations, and the purpose of archetypal analysis is to find a sparse set of bases that can explain the observations. These inferred bases are the archetypes, and the stochasticity constraints on W and H ensure that they are the extreme values one desires. It should be noted that Θ_{·n} does not need to correspond to X_{·n}: more generally, Θ = XP, where P is a permutation matrix.
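To make the generative view in (4)–(6) concrete, the short NumPy simulation below draws W, H, Z, and X from the model and evaluates the two Frobenius terms of (6) up to the constant C(ε_1, ε_2). It is only an illustrative sketch: we read ε_1 and ε_2 as precision-like weights (noise standard deviation 1/√ε), Θ is an arbitrary synthetic matrix here, and all variable names are our own.

```python
import numpy as np

rng = np.random.default_rng(0)
M, N, K = 5, 100, 3              # features, observations, archetypes
eps1, eps2 = 10.0, 10.0          # precision-like weights in (6) (our reading)

Theta = rng.normal(size=(M, N))                    # known loadings (synthetic here)
W = rng.dirichlet(np.ones(N), size=K).T            # columns w_k ~ Dir(1), shape (N, K)
H = rng.dirichlet(np.ones(K), size=N).T            # columns h_n ~ Dir(1), shape (K, N)

Z = Theta @ W + rng.normal(scale=eps2 ** -0.5, size=(M, K))   # z_k, cf. eq. (5)
X = Z @ H + rng.normal(scale=eps1 ** -0.5, size=(M, N))       # x_n, cf. eq. (4)

# negative log-likelihood of (6), dropping the constant C(eps1, eps2)
nll = 0.5 * eps1 * np.linalg.norm(X - Z @ H, "fro") ** 2 \
    + 0.5 * eps2 * np.linalg.norm(Z - Theta @ W, "fro") ** 2
print(round(nll, 2))
```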
3.1 Exponential family

We describe AA in a probabilistic set-up as follows (see Fig. 2):

    w_k ∼ Dir(1),  h_n ∼ Dir(1),  x_n ∼ EF(x_n; ΘWh_n)    (7)

where

    EF(z; θ) = h(z) g(θ) exp(η(θ)^T s(z))    (8)

with the standard meaning for the functions g, h, η, and s. Notice that we employ the normal parameter θ rather than the natural parameter η(θ), since the former is more interpretable. In fact, the convex combination of θ is more interpretable than the convex combination of η(θ), as a linear combination in η(θ) would lead to a nonlinear combination in θ. To adhere to the original formulation, we suggest Θ_{·n} to be the maximum likelihood point estimate from observation X_{·n}. Again, the columns of Θ and X do not necessarily have to correspond. Then, we find the archetypes Z = ΘW by solving

    argmin_{W,H≥0} −LL(X | W, H, Θ)  such that  1W = 1, 1H = 1.    (9)

We call this approach probabilistic archetypal analysis (PAA).

The meaning of archetype in PAA is different from that in standard AA, since the former lies in the parameter space whereas the latter lies in the observation space. To differentiate these two aspects, we call the archetypes Z = ΘW found by PAA [by solving (9)] archetypal profiles: our motivation is that Θ_{·n} can be seen as the parametric profile that best describes the single observation x_n, and thus Z are the archetypal profiles inferred from them. We generally refer to the set of indices that contribute to the k-th archetypal profile, i.e., {i : W_{ik} > δ}, where δ is a small value, as the generating observations of that archetype. Notice that when the observation model is multivariate normal with identity covariance, this formulation is the same as solving (1). We explore some other examples of EF: the multinomial, the product of univariate Poisson distributions, and the product of Bernoulli distributions.

3.2 Poisson observations

If the observations are integer-valued then they are usually assumed to originate from a Poisson distribution. Then we need to solve the following problem:

    argmin_{W,H≥0} Σ_{mn} [ −X_{mn} log(ΛWH)_{mn} + (ΛWH)_{mn} ]    (10)

such that Σ_j H_{jn} = 1 and Σ_i W_{ik} = 1. Here Λ_{mn} is the maximum likelihood estimate of the Poisson rate parameter from observation X_{mn}.

To solve this problem efficiently, we employ a technique similar to the one used by Cutler and Breiman, relaxing the equality constraints with a suitably strong regularization parameter. However, we then employ a multiplicative update rule instead of an exact method such as non-negative least squares. The resulting update rules are (see Appendix 1 for the derivation)

    H^{t+1} = H^t ⊙ (∇^-_{H^t} / ∇^+_{H^t}),
    ∇^+_{H_{nj}} = Σ_{im} Λ_{im} W_{mn} + λ,
    ∇^-_{H_{nj}} = Σ_i X_{ij} (Σ_m Λ_{im} W_{mn}) / (ΛWH)_{ij} + λ / Σ_n H_{nj}    (11)

and

    W^{t+1} = W^t ⊙ (∇^-_{W^t} / ∇^+_{W^t}),
    ∇^+_{W_{mn}} = Σ_{ij} Λ_{im} H_{nj} + λ,
    ∇^-_{W_{mn}} = Σ_{ij} X_{ij} Λ_{im} H_{nj} / (ΛWH)_{ij} + λ / Σ_m W_{mn},    (12)

where (ΛWH)_{ij} = Σ_{mn} Λ_{im} W_{mn} H_{nj} and ⊙ denotes the Hadamard product. We choose λ to be 20 times the variance of the samples, which is sufficiently large to enforce stochasticity but not so strong as to dominate. A similar value, λ^2 = 200, has been used by Eugster and Leisch (2011).
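A vectorized sketch of the multiplicative updates (11)–(12) is given below; for count data the maximum likelihood rate estimate is simply Λ = X. The rules are implemented as reconstructed above, the function name and defaults are our own, and the authors' released implementations (Sect. 1) remain the reference.

```python
import numpy as np

def paa_poisson(X, n_archetypes, n_iter=200, lam=None, seed=0, eps=1e-12):
    """Sketch of the multiplicative updates (11)-(12) for Poisson observations.

    X: (M, N) matrix of counts, one observation per column.
    Returns archetypal profiles Z = Lambda W (M, K) and factors W (N, K), H (K, N).
    """
    rng = np.random.default_rng(seed)
    M, N = X.shape
    K = n_archetypes
    Lam = X.astype(float)                     # ML estimate of the Poisson rates
    if lam is None:
        lam = 20.0 * X.var()                  # "20 times the variance of the samples"
    W = rng.random((N, K)); W /= W.sum(0)
    H = rng.random((K, N)); H /= H.sum(0)

    for _ in range(n_iter):
        R = Lam @ W @ H + eps                 # (Lambda W H), shape (M, N)
        # H-update, eq. (11)
        num_H = (Lam @ W).T @ (X / R) + lam / (H.sum(0, keepdims=True) + eps)
        den_H = (Lam @ W).sum(0, keepdims=True).T + lam
        H *= num_H / den_H

        R = Lam @ W @ H + eps
        # W-update, eq. (12)
        num_W = Lam.T @ ((X / R) @ H.T) + lam / (W.sum(0, keepdims=True) + eps)
        den_W = Lam.sum(0, keepdims=True).T @ H.sum(1, keepdims=True).T + lam
        W *= num_W / den_W

    return Lam @ W, W, H
```

As in the standard algorithm, the λ penalty drives the columns of W and H toward the simplex; an explicit renormalization at convergence is a harmless final step.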
3.3 Multinomial observations

In many practical problems, such as document analysis, the observations can be thought of as originating from a multinomial model. In such cases, PAA expresses the underlying multinomial probability as PWH, where P is the maximum likelihood estimate obtained from the word-frequency matrix X. This decomposition is very similar to PLSA: PLSA estimates a topic-by-document matrix H and a word-by-topic matrix Z, while AA estimates a document-by-topic matrix (W) and a topic-by-document matrix (H) from which the topics can be estimated as the archetypes Z = PW. Therefore, the archetypal profiles are effectively topics, but topics might not always be archetypes. For instance, given three documents {A,B}, {B,C}, and {C,A}, the three topics could be {A}, {B}, and {C}, whereas the archetypes can only be the documents themselves. Thus, it can be argued that archetypes are topics with better interpretability.

To find archetypes for this observation model one needs to solve the following problem:

    argmin_{W,H≥0} −Σ_{mn} X_{mn} log Σ_{ij} P_{mi} W_{ij} H_{jn},    (13)

such that Σ_j H_{jn} = 1 and Σ_i W_{ik} = 1. This can be solved efficiently using an expectation–maximization (or majorization–minimization) framework with the following update rules (see Appendix 1 for the derivation):

    H^{t+1}_{ij} = Σ_{kl} X_{il} H^t_{ij} W_{jk} P_{kl} / (HWP)_{il},   followed by   H^{t+1}_{ij} ← H^{t+1}_{ij} / Σ_j H^{t+1}_{ij},    (14)

and

    W^{t+1}_{jk} = Σ_{il} X_{il} H_{ij} W^t_{jk} P_{kl} / (HWP)_{il},   followed by   W^{t+1}_{jk} ← W^{t+1}_{jk} / Σ_k W^{t+1}_{jk}.    (15)

3.4 Bernoulli observations

There are real-world applications that deal with binary observations rather than real-valued ones or integers, e.g., binary questionnaires in marketing research. Such observations can be expressed in terms of the Bernoulli distribution. To find the archetypal representation of binary patterns we need to solve the following problem:

    argmin_{W,H≥0} Σ_{mn} [ −X_{mn} log(PWH)_{mn} − Y_{mn} log(QWH)_{mn} ],    (16)

such that Σ_j H_{jn} = 1 and Σ_i W_{ik} = 1, where X is the binary data matrix (with 1 denoting success/true and 0 denoting failure/false), P_{mn} is the probability of success estimated from X_{mn} (effectively either 0 or 1), Y_{mn} = 1 − X_{mn}, and Q_{mn} = 1 − P_{mn}.

This is a more involved form than the previous ones: one cannot use the relaxation technique of the Poisson case, since relaxing the stochasticity constraints might render the resulting probabilities PWH greater than 1, thus making the cost function incomputable. Therefore, we take a different approach to solving this problem by reparameterizing each stochastic vector (say s) by an unnormalized non-negative vector (say t), such that s = t / Σ_i t_i. We show that the structure of the cost allows us to derive efficient update rules over the unnormalized vectors using majorization–minimization. With g_n and v_k the reparameterizations of h_n and w_k respectively, we get the following update equations (see Appendix 1 for the derivation):

    G^{t+1} = G^t ⊙ (∇^n_G / ∇^d_G),  with
    ∇^d_{G_{nj}} = Σ_i X_{ij} + Σ_i Y_{ij},    (17)
    ∇^n_{G_{nj}} = Σ_i X_{ij} (Σ_m P_{im} W_{mn}) / (PWH)_{ij} + Σ_i Y_{ij} (Σ_m Q_{im} W_{mn}) / (QWH)_{ij},    (18)

and

    V^{t+1} = V^t ⊙ (∇^n_V / ∇^d_V),  with
    ∇^d_{V_{mn}} = Σ_{ij} X_{ij} (Σ_{m'} P_{im'} W_{m'n}) H_{nj} / (PWH)_{ij} + Σ_{ij} Y_{ij} (Σ_{m'} Q_{im'} W_{m'n}) H_{nj} / (QWH)_{ij},    (19)
    ∇^n_{V_{mn}} = Σ_{ij} X_{ij} P_{im} H_{nj} / (PWH)_{ij} + Σ_{ij} Y_{ij} Q_{im} H_{nj} / (QWH)_{ij},    (20)

where (PWH)_{ij} = Σ_{mn} P_{im} W_{mn} H_{nj} and (QWH)_{ij} is defined analogously.
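The updates (17)–(20) vectorize in the same way as the Poisson case. The sketch below follows the formulas as reconstructed above, with G and V the unnormalized counterparts of H and W; it is an illustration under those reconstructed rules rather than the authors' implementation, and all names are our own.

```python
import numpy as np

def paa_bernoulli(X, n_archetypes, n_iter=200, seed=0, eps=1e-12):
    """Sketch of the MM updates (17)-(20) for binary observations.

    X: (M, N) binary matrix, one observation per column.
    Returns archetypal profiles Z = P W (M, K) and stochastic factors W (N, K), H (K, N).
    """
    rng = np.random.default_rng(seed)
    M, N = X.shape
    K = n_archetypes
    P = X.astype(float)                 # ML success probabilities (effectively 0 or 1)
    Y = 1.0 - P                         # Y = 1 - X
    Q = 1.0 - P                         # Q = 1 - P
    G = rng.random((K, N))              # unnormalized H
    V = rng.random((N, K))              # unnormalized W

    def normalize(A):                   # s = t / sum(t), columnwise
        return A / (A.sum(0, keepdims=True) + eps)

    for _ in range(n_iter):
        H, W = normalize(G), normalize(V)
        PWH, QWH = P @ W @ H + eps, Q @ W @ H + eps

        # G-update, eqs. (17)-(18)
        num_G = (P @ W).T @ (X / PWH) + (Q @ W).T @ (Y / QWH)
        den_G = (X.sum(0) + Y.sum(0))[None, :]     # equals M for every observation
        G *= num_G / den_G

        H = normalize(G)
        PWH, QWH = P @ W @ H + eps, Q @ W @ H + eps

        # V-update, eqs. (19)-(20)
        num_V = P.T @ ((X / PWH) @ H.T) + Q.T @ ((Y / QWH) @ H.T)
        den_V = ((P @ W) * ((X / PWH) @ H.T)).sum(0) \
              + ((Q @ W) * ((Y / QWH) @ H.T)).sum(0)
        V *= num_V / den_V[None, :]

    W, H = normalize(V), normalize(G)
    return P @ W, W, H
```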
4 Related work

Archetypal analysis and its probabilistic extension share close connections with other popular matrix factorization methods. We explore some of these connections in this section, and provide a summary in Fig. 3. We represent the original data matrix by X ∈ X^{M×N}, where each column x_n is an observation; the corresponding latent factor matrix by H ∈ R^{K×N}; and the loading matrix by Z ∈ R^{M×K}.

[Fig. 3  Relations among factorization methods: AA (Cutler and Breiman 1994), C-NMF and S-NMF (Ding et al. 2010), PAA, EF-PCA (Mohamed et al. 2009), VCA (do Nascimento and Dias 2005), PLSA (Hofmann 1999), NMF (Lee and Seung 1999), and PPCA, characterized by the properties 1. data-driven methods where the loadings depend on the input (Z = XW), 2. simplex factor models with probability vectors as factors (H stochastic), 3. probabilistic methods (X ∼ f(ZH)), 4. non-negative factors with arbitrary loadings, and 5. non-negative factors with non-negative loadings. The connections are elaborated in Sect. 4.]

Principal component analysis and extensions: Principal component analysis (PCA) finds an orthogonal transformation of a point cloud, which projects the observations into a new coordinate system that best preserves the variance of the point cloud. The concept of PCA has been extended to a probabilistic as well as a Bayesian framework (Mohamed et al. 2009). Probabilistic PCA (PPCA) assumes that the data originate from a lower-dimensional subspace on which they follow a normal distribution (N), i.e.,

    h_n ∼ N(0, I),  x_n ∼ N(Zh_n, εI)

where h_n ∈ R^K, x_n ∈ R^M, K < M, and ε > 0.

Probabilistic principal component analysis explicitly assumes that the observations are normally distributed: an assumption that is often violated in practice, and to tackle such situations one extends PPCA to the exponential family (EF). The underlying principle is to change the observation model accordingly:

    h_n ∼ N(0, I),  x_n ∼ EF(x_n; Zh_n),

i.e., each element of x_n is generated from the corresponding element of Zh_n as EF(z; θ) = h(z) g(θ) exp(θ s(z)), where s(z) is the sufficient statistic and θ is the natural parameter. PAA utilizes a similar approach, but with normal parameters.

Similarly, one can also manipulate the latent distribution. A popular choice is the Dirichlet distribution (Dir), which has been widely explored in the literature, e.g., in probabilistic latent semantic analysis (PLSA, Hofmann 1999), h_n ∼ Dir(1), x_n ∼ Mult(Zh_n), where 1Z = 1; in vertex component analysis (do Nascimento and Dias 2005), h_n ∼ Dir(1), x_n ∼ N(Zh_n, εI); and in simplex factor analysis (Bhattacharya and Dunson 2012), a generalization of PLSA (or, more specifically, of latent Dirichlet allocation, LDA, Blei et al. 2003). PAA additionally decomposes the loadings into simplex factors with known loadings.
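The practical difference between the Gaussian latents of PPCA and the simplex latents shared by PLSA, VCA, and PAA is easy to see in simulation. The following small sketch (our own illustration, using the generative equations quoted above with a common loading matrix Z) contrasts the two latent distributions:

```python
import numpy as np

rng = np.random.default_rng(0)
M, K, N = 5, 3, 1000
Z = rng.normal(size=(M, K))                   # a common loading matrix

# PPCA: Gaussian latents, h_n ~ N(0, I)
H_ppca = rng.normal(size=(K, N))
X_ppca = Z @ H_ppca + 0.1 * rng.normal(size=(M, N))

# Simplex latents as in PLSA/VCA/PAA: h_n ~ Dir(1), each column lies on the simplex
H_dir = rng.dirichlet(np.ones(K), size=N).T
X_dir = Z @ H_dir + 0.1 * rng.normal(size=(M, N))

# Dirichlet factors are non-negative and column-stochastic; Gaussian factors are not
print(H_ppca.min() < 0, np.allclose(H_dir.sum(axis=0), 1.0), H_dir.min() >= 0)
```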
