ebook img

Maximum likelihood estimation in constrained parameter spaces for mixtures of factor analyzers PDF

1.7 MB·English
Save to my drive
Quick download
Download
Most books are stored in the elastic cloud where traffic is expensive. For this reason, we have a limit on daily download.

Preview Maximum likelihood estimation in constrained parameter spaces for mixtures of factor analyzers

NonamemanuscriptNo. (willbeinsertedbytheeditor) Maximum likelihood estimation in constrained parameter spaces for mixtures of factor analyzers FrancescaGreselin SalvatoreIngrassia · Received:date/Accepted:date 3 1 0 Abstract Mixturesoffactoranalyzersarebecomingmoreandmorepopularintheareaof 2 model basedclusteringofhigh-dimensional data.According tothelikelihood approach in n datamodeling,itiswellknownthattheunconstrainedlog-likelihoodfunctionmaypresent a spurious maxima and singularities and this is dueto specificpatterns of theestimatedco- J variancestructure,whentheirdeterminantapproaches0.Toreducesuchdrawbacks,inthis 8 paperweintroduceaprocedurefortheparameterestimationofmixturesoffactoranalyzers, whichmaximizesthelikelihoodfunctioninaconstrainedparameterspace.Wethenanalyze ] E and measure its performance, compared to the usual non-constrained approach, via some M simulationsandapplicationstorealdatasets. . Keywords Constrained estimation Factor Analyzers Modeling Mixture Models t · · · a Model-BasedClustering. t s [ 1 Introductionandmotivation 1 v Finitemixture distributions havebeen receivingagrowing interestinstatisticalmodeling. 5 0 Theircentralroleismainlyduetotheirdoublenature:theycombinetheflexibilityofnon- 5 parametricmodelswiththestrongandusefulmathematicalpropertiesofparametricmodels. 1 According tothisapproach, whenweknow that asampleofobservations hasbeendrawn . from different populations, weassumeaspecificdistributional form ineachof the under- 1 0 lying populations. The purpose is to decompose the sample into its mixture components, 3 which,forquantitativedata,areusuallymodeledasamultivariateGaussiandistribution,and 1 toestimateparameters.Theassumptionofunderlyingnormality,besidestheelegantanalytic : properties, allows alsotoemploy theEMalgorithm for theMLestimationoftheparame- v i ters. On the other side, when considering alarge number of observed variables, Gaussian X r FrancescaGreselin a DepartmentofStatisticsandQuantitativeMethods Milano-BicoccaUniversity ViaBicoccadegliArcimboldi8-20126Milano(Italy).E-mail:[email protected] SalvatoreIngrassia DepartmentofEconomicsandBusiness UniversityofCatania CorsoItalia55,-Catania(Italy).E-mail:[email protected] 2 FrancescaGreselin,SalvatoreIngrassia mixturemodelscanprovideanover-parameterizedsolutionas,besidesthemixingweights, it is required to estimate the mean vector and the covariance matrix for each component (PeelandMcLachlan,2000).Asaconsequence,weobserveatthesametimeanundueload ofcomputationallyintensiveproceduresfortheestimation. This is the reason why a number of strategies have been introduced in the literature to avoid over-parameterized solutions. Among the various proposal, some authors devel- oped methodologies for variable selection (see, f.i., Liuetal. (2003) and Hoff (2005) in theBayesianframework,PanandShen(2007)andRafteryandDean(2006)inthefrequen- tist one). They further motivate their approach from the observation that the presence of non-informative variables can be strongly misleading for some clustering methods. With thesamepurposeofparsimony, but acompletelydifferent approach, BanfieldandRaftery (1993)devisedamethodologytoidentifycommonpatternsamongthecomponent-covariance matrices; their proposal arose a great attention in the literature. Along a slightly different line of thinking, GhahramaniandHilton (1997) and McLachlanetal. (2003) proposed to employlatentvariablestoperformdimensionalreductionineachcomponent,startingfrom theconsiderationthatinmanyphenomenasomefewunobservedfeaturescouldbeexplained bythemanyobservedones. In this paper we address mixtures of factor analyzers by assuming that the data have been generated by a linear factor model with latent variables modeled as Gaussian mix- tures. Our purpose is to improve the performances of the EM algorithm, by facing with someofitsissuesandgivingpracticalrecipestoovercomethem.Itiswellknownthatthe EMalgorithmgeneratesasequenceofestimates,startingfromaninitialguess,sothatthe corresponding sequence of the log-likelihood values is not decreasing. However, the con- vergencetoward theMLEisnot guaranteed, becausethelog-likelihood isunbounded and presentslocalmaxima.Anothercriticalpointisthattheparameterestimatesaswellasthe convergenceofthewholeestimationprocessmaybeaffectedbythestartingvalues(see,f.i., McLachlanandKrishnan(2007))sothat thefinalestimatecruciallydepends ontheinitial guess.Thisissuehasbeeninvestigatedbymanyauthors,startingfromtheseminalpaperof RednerandWalker(1984).Alongthelinesof(Ingrassia,2004),inthispaperweintroduce and implement aprocedure for the parameters estimation of mixtures of factor analyzers, whichmaximizesthelikelihood function inaconstrained parameterspace,having nosin- gularitiesandareducednumberofspuriouslocalmaxima.Wethenanalyzeandmeasureits performance,comparedtotheusualnon-constrainedapproach. We have organized the rest of the paper as follows. In Section 2 we summarize main ideas about Gaussian Mixtures of Factor Analyzer model; in Section 3 we provide fairly extensive notes concerning the likelihood function and the AECM algorithm. Some well knownconsiderations(Hathaway,1985)relatedtospuriousmaximizersandsingularitiesin the EM algorithm are recalled in Section 4, and motivate our proposal to introduce con- straints on factor analyzers. Further, we give a detailed methodology to implement such constraintsintotheEMalgorithm.InSection6weshowanddiscusstheimprovedperfor- manceofourprocedure, onthegroundofsomenumericalresultsbasedonbothsimulated andrealdata.Section7containsconcludingnotesandprovidesideasforfutureresearch. 2 TheGaussianMixtureofFactoranalyzers WithintheGaussianMixture(GM)model-basedapproachtodensityestimationandcluster- ing,thedensityofthed-dimensionalrandomvariableXofinterestismodelledasamixture ofanumber,sayG,ofmultivariatenormaldensitiesinsomeunknownproportionsp ,...p . 1 G Maximumlikelihoodestimationinconstrainedparameterspacesformixturesoffactoranalyzers 3 Thatis,eachdatapointistakentobearealizationofthemixtureprobabilitydensityfunc- tion, G f(x;q )= (cid:229) p f (x;m ,S ) (1) g d g g g=1 wheref (x;m ,S )denotesthed-variatenormaldensityfunctionwithmeanm andcovariance d matrixS .Herethevectorq (d,G)ofunknownparametersconsistsofthe(G 1)mixing GM proportionsp ,theG delementsofthecomponentmeansm ,andthe 1Gd(d+−1)distinct elements of tghe comp×onent-covariance matrices S . Therefogre, the G-2component normal g mixturemodel(1)withunrestrictedcomponent-covariancematricesisahighlyparametrized model. We crucially need some method for parsimonious parametrization of the matrices S ,becausetheyrequiresO(d2)parameters.Amongthevariousproposalsfordimensional- g ityreduction,weareinterestedhereinconsideringMixturesofGaussianFactorAnalyzers (MGFA), which allows to explain data by explicitly modeling correlations between vari- ablesinmultivariateobservations.Wepostulateafinitemixtureoflinearsub-modelsforthe distributionofthefullobservationvectorX,giventhe(unobservable)factorsU.Thatiswe canprovidealocaldimensionalityreductionmethodbyassumingthatthedistributionofthe observationX canbegivenas i X =m +L U +e withprobability p (g=1,...,G) fori=1,...,n, (2) i g g ig ig g whereL is a d q matrix of factor loadings, the factors U ,...,U are N (0,I ) dis- g 1g ng q tributedindepend×entlyoftheerrorse ,whichareindependentlyN (0,Y )distributed,and ig g Y isad d diagonal matrix (g=1,...,G).Wesupposethat q<d, whichmeans thatq g × unobservablefactorsarejointlyexplainingthedobservablefeaturesofthestatisticalunits. Undertheseassumptions,themixtureoffactoranalyzersmodel isgivenby(1),wherethe g-thcomponent-covariancematrixS hastheform g S g=L gL ′g+Y g (g=1,...,G). (3) Theparametervectorq (d,q,G)nowconsistsoftheelementsofthecomponentmeans MGFA m ,theL ,andtheY ,alongwiththemixingproportionsp (g=1,...,G 1),onputting g g g g − mp God=el1(−2)(cid:229) isGi=−s1t1ilplgs.aNtiosfiteedthiaftwinethreepclaasceeoLfqb>y1L,thHer,ewishaenreinHfinisityanoyfochrtohiocgeosnfoalrLmag,trsiixncoef g g ′ orderq.Asq(q 1)/2constraintsareneededforL tobeuniquelydefined,thenumberof g − freeparameters,foreachcomponentofthemixture,is 1 dq+d q(q 1). −2 − Comparingthetwoapproachesandwillingnowtomeasurethegainedparsimonywhen weusemixturesoffactoranalyzers,withrespecttothemoreusualgaussianmixtures,and denotingby q (d,G) and q (d,q,G),thenumberoftheestimatedparameters CovGM CovMGFA | | | | forthecovariancematricesintheGMandMGFAmodels,respectively,wehavetochoose valuesofqsuchthatthefollowingquantityPispositive G 1 P= q (d,G) q (d,q,G) = d(d+1) G[dq d+ q(q 1)] CovGM CovMGFA | |−| | 2 − − 2 − i.e.: G P= [(d q)2 (d+q)]. 2 − − 4 FrancescaGreselin,SalvatoreIngrassia This is the only requirement for parsimony. Now, we can express the relative reduction RR(d,q,G)=RR(d,q)givenby q (d,G) q (d,q,G) (d q)2 (d+q) CovGM CovGMFA RR(d,q)= | |−| | = − − . q (d,G) d(d+1) CovGM | | InTable1wereporttherelativereduction,intermoflowernumberofestimatedparameters forthecovariancematricesintheMGFAmodels,withrespecttotheGMmodels. Table1 RelativereductionRR(d,q) qd 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 | 1 - - - 0.20 0.33 0.43 0.50 0.56 0.60 0.64 0.67 0.69 0.71 0.73 0.75 2 - - - - 0.07 0.19 0.29 0.36 0.42 0.47 0.52 0.55 0.58 0.61 0.63 3 - - - - - - 0.11 0.19 0.27 0.33 0.38 0.42 0.46 0.50 0.53 4 - - - - - - - 0.06 0.13 0.20 0.26 0.31 0.35 0.39 0.43 5 - - - - - - - - 0.02 0.09 0.15 0.21 0.25 0.30 0.33 Therelativereduction represents theextenttowhichthefactormodel offersasimpler interpretation forthebehaviour ofxthanthealternativeassumptiongivenbythegaussian mixturemodel. 3 ThelikelihoodfunctionandtheEMalgorithmforMGFA In this section we summarize the main steps of the EM algorithm for mixtures of Factor analyzers,seee.g.McLachlanandPeel(2000)fordetails. Let X=(x ,...,x ) be a sample of size n from density (1), and let x (i=1,...,n) 1 n i denotes ∼the realization of X in (2). For given data X, parameters in (1) can be estimated i accordingtothelikelihoodapproachviatheEMalgo∼rithm,wherethelikelihoodfunctionis givenby: n G n G L(q ;X)=(cid:213) (cid:229) f (x;m ,S )p =(cid:213) (cid:229) f (x;m ,L ,Y )p , d i g g g d i g g g g ∼ i=1(g=1 ) i=1(g=1 ) wherewesetS g=L gL ′g+Y g(g=1,...,G).Considertheaugmenteddata{(xi,uig,zi),i= 1,...,n , where z =(z ,...,z ), with z =1 if x comes from the g-th population and i i1 ig ′ ig i } z =0otherwise.Then,thecomplete-datalikelihoodfunctioncanbewrittenintheform: ig n G L (q ;X)=(cid:213) (cid:213) f x u;m ,L ,Y f (u )p zig. (4) c d i| i g g g q ig g ∼ i=1g=1 (cid:2) (cid:0) (cid:1) (cid:3) In particular, due to the factor structure of the model, see MengandvanDyk (1997), we havetoconsiderthealternatingexpectation-conditional maximization (AECM)algorithm. Suchaprocedure isanextensionoftheEMalgorithm that usesdifferent specificationsof missingdataateachstage.Theideaistopartitionq =(q ,q ) insuchawaythatL(q ;X) ′1 ′2 ′ iseasytomaximizeforq givenq andviceversa.Then,wecaniteratebetweenthesetw∼o 1 2 conditionalmaximizationsuntilconvergence.Inthiscaseq = p ,m ,g=1,...,G where 1 { g g } Maximumlikelihoodestimationinconstrainedparameterspacesformixturesoffactoranalyzers 5 themissingdataaretheunobservedgrouplabelsZ=(z ,...,z ),andthesecondpartofthe ′1 ′n parametersvectorisgivenbyq = (L ,Y ),g=1,...,G wherethemissingdataarethe 2 g g { } grouplabelsZandtheunobservedlatentfactorsUe=(U11,...,UnG).Hence,theapplication of the AECM algorithm consists of two cycles, and there is one E-step and one CM-step alternativelyconsideringq andq ineachpairofcycles. 1 2 First Cycle. Here it is q = p ,m ,g=1,...,G where the missing data are the unob- 1 { g g } servedgrouplabelsZ=(z ,...,z ).Thecompletedatalikelihoodis ′1 ′n n G L (q )=(cid:213) (cid:213) f x;m ,S p zig. (5) c1 1 d i g g g i=1g=1 (cid:2) (cid:0) (cid:1) (cid:3) TheE-steponthefirstcycleonthe(k+1)-thiterationrequiresthecalculationofQ (q ;q (k))= 1 1 E L (q )X whichistheexpectedcomplete-datalog-likelihoodgiventhedataXand q (k){ c 1 | } using the curre∼nt estimate q (k) for q . In practice it requires calculating E Z X∼ and q (k){ ig| } usualcomputationsshowthatthisstepisachievedbyreplacingeachz byitscurre∼ntcon- ig ditionalexpectationgiventheobserveddatax,thatiswereplacez byz(k+1/2),where i ig ig f d xi m (gk),L (gk),Y (gk) p g(k) z(k+1)= | . (6) ig (cid:229) G f(cid:16) x m (k),L (k),Y (cid:17)(k) p (k) j=1 d i| j j j j (cid:16) (cid:17) OntheM-step,themaximizationofthiscomplete-datalog-likelihoodyields (cid:229) n z(k+1) p g(k+1)= i=1 ig n m (k+1)= 1 (cid:229) n z(k+1)x g n ig i gi=1 where ng(k+1) =(cid:229) ni=1zi(gk+1). According to notation in McLachlanandPeel (2000), we set q (k+1/2)=(q (k+1)′,q (k)′). 1 2 ′ Second Cycle. Here it is q = S ,g=1,...,G = (L ,Y ),g=1,...,G where the 2 g g g { } { } missing data are the unobserved group labels Z and the latent factors U. Therefore, the completedatalikelihoodis Lc2(q 2)=(cid:213) n (cid:213) G f d xi|uig;m g(k+1),S g f q(uig)p g(k+1) zig i=1g=1 h (cid:16) (cid:17) i =(cid:213) n (cid:213) G f d xi|uig;m g(k+1),L g,Y g f q(uig)p g(k+1) zig, (7) i=1g=1 h (cid:16) (cid:17) i where 1 1 f d xi|uig;m g(k+1),L g,Y g = 2Yp 1/2exp −2(xi−m g(k+1)−L guig)′Y −g1(xi−m g(k+1)−L guig) . (cid:16) (cid:17) | g| (cid:26) (cid:27) 1 1 f q(uig)= (2p )q/2exp −2u′iguig . (cid:26) (cid:27) 6 FrancescaGreselin,SalvatoreIngrassia Nowthecompletedatalog-likelihoodisgivenby Lc2(q 2)=−n2dln2p +(cid:229)G nglnp g+21(cid:229)n (cid:229)G zigln|Y −g1| g=1 i=1g=1 −21(cid:229) n (cid:229)G zigtr (xi−m g(k+1)−L guig)(xi−m g(k+1)−L guig)′Y −g1 . (8) i=1g=1 n o Somealgebrasleadtothefollowingestimateof (L ,Y ),g=1,...,G : g g { } Lˆg=Sg(k+1)g (gk)′[Q (gk)]−1 Yˆg=diag Sg(k+1)−Lˆgg (gk)Sg(k+1) . n o whereweset n Sg(k+1)=(1/ng(k+1))(cid:229) zi(gk+1)(xi−m g(k+1))(xi−m g(k+1))′ i=1 g (gk)=L (gk)′(L (gk)L (gk)′+Y (gk))−1 Q (igk)=Iq−g (gk)L (gk)+g (gk)(xi−m g)(xi−m g)′g (gk)′. HencethemaximumlikelihoodestimatesLˆ andYˆ forL andY canbeobtainedbyalter- g g nativelycomputingtheupdateestimatesL +andY +,by g g L +g =Sg(k+1)g (gk)′[Q (gk)]−1 and Y +g =diag Sg(k+1)−L +gg (gk)Sg(k+1) , (9) n o and,fromthelatter,evaluatingtheupdateestimatesg+andQ +by g g g +g =L ′g(L gL ′g+Y g)−1 and Q +g =Iq−g gL g+g gSg(k+1)g ′g, (10) iteratingthesetwostepsuntilconvergenceonLˆ andYˆ ,sogivingL (k+1)andY (k+1). g g g g Insummary,theprocedurecanbedescribedasfollows.Foragiveninitialrandomclus- tering z(0), on the (k+1) th iteration, the algorithm carries out the following steps, for − g=1,...,G: 1. Computez(k+1)andconsequentlyobtainp (k+1),m (k+1),n(k+1)andS(k+1); ig g g g g 2. SetastartingvalueforL andY fromS(k+1); g g g 3. Repeatthefollowingsteps,untilconvergenceonLˆ andYˆ : g g (a) Computeg +andQ +from(10); g g (b) Setg g +andQ Q +; g← g g← g (c) ComputeL +g ←Sg(k+1)g ′g(Q −g1)andY +g ←diag Sg(k+1)−L +gg gSg(k+1) ; (d) SetL g←L +g andY g←Y +g; n o Tocompletelydescribethealgorithm,herewegivemoredetailsonhowtospecifythe startingvaluesforL andY fromS(k+1),asitisneededinStep2. g g g Starting from the eigen-decomposition of Sg(k+1), say Sg(k+1) =AgBgA′g, computed on thebaseofz(k+1),themainideaisthatL hastosynthesizethe”moreimportant”relations ig g Maximumlikelihoodestimationinconstrainedparameterspacesformixturesoffactoranalyzers 7 betweenthedobservedfeatures,seeMcNicholasandMurphy(2008).Then,lookingatthe equalityS =G G +Y ,theinitialvaluesofL weresetas g g ′g g g l = d a (11) ij j ij whered isthe jthlargesteigenvalueofS(k+1)panda istheithelementofthecorresponding j g ij eigenvectora (the jthcolumninA ),fori 1,2,...,p and j 1,2,...,q .Finallythe j g ∈{ } ∈{ } Y gmatricescanbeinitializedbythepositionY g=diag{Sg(k+1)−L gL ′g}. 4 Likelihoodmaximizationinconstrainedparametricspaces Propertiesofmaximumlikelihoodestimationfornormalmixturemodelshavebeendeeply investigated. It iswell known that L(q )is unbounded onQ and maypresent many local maxima. Day(1969) wasperhaps thefirstnoting thatanysmallnumber ofsamplepoints, grouped sufficiently close together, can give raise to spurious maximizers, corresponding toparameterspointswithgreatlydifferingcomponentstandarddeviation.Toovercomethis issue and to prevent L(q ) from singularities, Hathaway (1985) proposed a constrained maximumlikelihoodformulationformixturesofunivariatenormaldistributions,suggesting anaturalextensiontothemultivariatecase.Letc (0,1],thenthefollowingconstraints ∈ 1 mh=inj kl (S hS −j1)≥c (12) ≤ 6 ≤ ontheeigenvaluesl ofS S 1leadstoproperlydefined,scale-equivariant,consistentML- h −j estimatorsforthemixture-of-normal case,seeHennig(2004).Itiseasytoshowthatasuf- ficientconditionfor(12)is a l b, i=1,...,d; g=1,...,G (13) ig ≤ ≤ where l denotes the ith eigenvalue of S i.e. l =l (S ), and for a,b R+ such that ig g ig i g ∈ a/b c, see Ingrassia (2004). Differently from (12), condition (13) can be easily imple- ment≥edinanyoptimizationalgorithm.LetusconsidertheconstrainedparameterspaceQ c ofQ : Q = (p ,...,p ,m ,...,m ,S ,...,S ) Rk[1+d+(d2+d)/2] : c { 1 G 1 G 1 G ∈ p 0,p + +p =1,a l b, g=1,...,G i=1,...,d . (14) g 1 G ig ≥ ··· ≤ ≤ } DuetothestructureofthecovariancematrixS givenin(3),boundin(13)yields g l min(L gL ′g+Y g)≥a and l max(L gL ′g+Y g)≤b, g=1,...,G (15) wherel ()andl ()denotethesmallestandthelargesteigenvalueof()respectively. min max SinceL L ·andY are·symmetricandpositivedefinite,thenitresults: · g ′g g l min(L gL ′g+Y g)≥l min(L gL ′g)+l min(Y g)≥a (16) l max(L gL ′g+Y g)≤l max(L gL ′g)+l max(Y g)≤b, (17) seeLu¨tkepohl(1996).Moreover,beingY adiagonalmatrix,then g l (Y )=miny and l (Y )=maxy , (18) min g ig max g ig i i 8 FrancescaGreselin,SalvatoreIngrassia wherey denotesthei-thdiagonalentryofthematrixY . ig g decoCmopnocseirtnioinng,it.he.ewsequcaarnefidnd×Ld manatdriGx LsguLch′gth(gat=1,...,G), we can get its eigenvalue g g L gL ′g=G gD gG ′g (19) whereG g is the orthonormal matrix whose rows are the eigenvectors ofL gL ′g and D g = diag(d 1g,...,d dg)isthediagonalmatrixoftheeigenvaluesofL gL ′g,sortedinnonincreasing order,i.e.d d ... d 0,andd = =d =0. 1g≥ 2g≥ ≥ qg≥ (q+1)g ··· dg Now, we can apply the singular value decomposition to the d q rectangular matrix L , so giving L =U D V , where U is a d d unitary matrix (×i.e., such that U U = g g g g ′g g × ′g g I )andD isad qrectangular diagonal matrixwithqnonnegative real numbers onthe d g × diagonal,knownassingularvalues,andV isaq qunitarymatrix.Thed columnsofU g andtheqcolumnsofVarecalledtheleftsingularv×ectorsandrightsingularvectorsofL , g respectively.Nowwehavethat L gL ′g=(UgDgV′g)(VgD′gU′g)=UgDgIqD′gU′g=UgDgD′gU′g (20) andequating(19)and(20)wegetG =U andD =D D ,thatis g g g g ′g diag(d ,...,d )=diag(d2 ,...,d2 ). (21) 1g qg 1g qg withd d d 0.Inparticular,itisknownthatonlythefirstqvaluesofD 1g 2g qg g ≥ ≥···≥ ≥ arenonnegative,andtheremainingd qtermsarenull.Thusitresults − l max(L gL ′g)=d12g. (22) Supposingnowtochooseavaluefortheupperboundbinsuchawaythatb Y for ig ≥ g=1,...,Gandi=1,...,q,thenconstraints(16)and(17)aresatisfiedwhen d2 +y a i=1,...,d (23) ig ig≥ d b Y i=1,...,q (24) ig ig ≤ − y b i=q+1,...,d (25) ig p ≤ for g=1,...,G. In particular, we remark that condition (23) reduces toY a for i= ig ≥ (q+1),...,d. 5 Constraintsonthecovariancematrixforfactoranalyzers Thetwo-fold(eigenvalueandsingularvalue)decompositionoftheL presentedabove,sug- g gestshowtomodifytheEMalgorithminsuchawaythattheeigenvaluesofthecovariances S (for g=1,...,G)areconfined intosuitableranges. Tothisaimwehavetoimplement g constraints(23),(24)and(25). Weproceedasfollowsonthe(k+1)thiteration: 1. DecomposeL accordingtothesingularvaluedecompositionasL =U D V ; g g g g ′g 2. Computethesquaredsingularvalues(d2 ,...,d2 )ofL ; 1g qg g 3. CreateacopyD ofD(k+1)andacopyY ofY (k+1); ∗g g ∗g g 4. Fori=1toq, ifd2 +y (k+1)<a, then ifa y (k+1) 0 setd a y (k+1) else ig ig − ig ≥ ig← − ig dig←√aintoD∗g; q Maximumlikelihoodestimationinconstrainedparameterspacesformixturesoffactoranalyzers 9 5. Fori=q+1tod,ify i(gk+1)<athensety i(gk+1)←aintoY ∗g; 6. Fori=1toq,ifd2 +y (k+1)>b,thenifb y (k+1) 0setd b y (k+1)intoD ig ig − ig ≥ ig← − ig ∗g elsed √bintoD ; q ig← ∗g 7. Fori=q+1tod,ify i(gk+1)>bthensety i(gk+1)←bintoY ∗g; 8. SetL (k+1) U D V ; g ← g ∗g ′g 9. SetY (k+1) Y . g ← ∗g 10. Stop. It is important to remark that the resulting EM algorithm is monotone, once the initial guess,sayS 0,satisfiestheconstraints.Further,asshowninthecaseofgaussianmixturesin g IngrassiaandRocci(2007),themaximizationofthecompleteloglikelihood isguaranteed. Fromtheotherside,itisapparentthattheaboverecipesrequiresomeaprioriinformation onthecovariancestructureofthemixture,throughouttheboundsaandb. 6 Numericalstudies In this section we present numerical studies, based on both simulated and real data sets, inorder toshow theperformance oftheconstrainedEMalgorithm withrespecttouncon- strainedapproaches. 6.1 Artificialdata We consider here three mixtures of G components of d-variate normal distributions, for different values of the parameter q . First, we point out that the point of local maximum 0 corresponding to the consistent estimator q , has been chosen to be the limit of the EM ∗ algorithmusingthetrueparameterq asinitialestimate,i.e.consideringthetrueclassifica- 0 tion.Inotherwords,wesetz =1iftheithunitcomesfromthegthcomponentandz =0 ig ig otherwise.Inthefollowing, suchestimatewillbereferred toastheright maximum ofthe likelihoodfunction. Tobeginwith,wegenerateasetof100differentrandominitialclusteringstoinitialize thealgorithmateachrun.Tothisaim,forafixednumberGofcomponentsofthemixture,we draweachtimeasetofrandomstartingvaluesforthez fromthemultinomialdistribution ig with values in (1,2,...,G) with parameters (p ,p ,...,p )=(1/G,1/G,...,1/G). Then 1 2 g werunahundredtimesboththeunconstrainedandtheconstrainedAECMalgorithms(for differentvaluesoftheconstraintsa,b)usingthesamesetofinitialclusteringsinbothcases. TheinitialvaluesfortheelementsofL andY canbeobtainedasdescribedattheendof g g Section3fromtheeigen-decompositionofS ,andthealgorithmsrununtilconvergenceor g itreachesthefixedmaximumnumberofiterations. ThestoppingcriterionisbasedontheAitkenaccelerationprocedure(Aitken,1926),to estimatetheasymptoticmaximumofthelog-likelihoodateachiterationoftheEMalgorithm (in such a way, a decision can be made regarding whether or not the algorithm reaches convergence; thatis,whetherornotthelog-likelihood issufficientlyclosetoitsestimated asymptoticvalue).TheAitkenaccelerationatiterationkisgivenby L(k+1) L(k) a(k)= − , L(k) L(k 1) − − 10 FrancescaGreselin,SalvatoreIngrassia whereL(k+1),L(k),andL(k 1) arethelog-likelihoodvaluesfromiterationsk+1,k,and − k 1,respectively.Then,theasymptoticestimateofthelog-likelihood atiterationk+1is − givenby 1 L¥(k+1)=L(k)+ L(k+1) L(k) , 1 a(k) − − (cid:16) (cid:17) see Bo¨hningetal. (1994). In our analyses, the algorithms stop when L¥(k+1) L(k) <e , withe =0.001.ProgramshavebeenwrittenintheRlanguage; thedifferentc−asesandthe obtainedresultsaredescribedbelow. MIXTURE1:G=3,d=6,q=2,N=150. Thesample has been generated withweights a =(0.3,0.4,0.3) according tothe fol- ′ lowingparameters: m 1=(0,0,0,0,0,0)′ Y 1=diag(0.1,0.1,0.1,0.1,0.1,0.1) m 2=(5,5,5,5,5,5)′ Y 2=diag(0.4,0.4,0.4,0.4,0.4,0.4) m 3=(10,10,10,10,10,10)′ Y 3=diag(0.2,0.2,0.2,0.2,0.2,0.2) 0.50 1.00 0.10 0.20 0.10 0.20 1.00 0.45 0.20 0.50 0.20 0.00       0.05 0.50 1.00 1.00 1.00 0.00 L 1= 0.60 −0.50  L 2= 0.20 −0.50  L 3= 0.20 0.00 . −  −  −   0.50 0.10   1.00 0.70   1.00 0.00         1.00 0.15  1.20 0.30  0.00 1.30  −   −   −        Hence,thecovariancematricesS g=L gL ′g+Y g(g=1,2,3)havethefollowingeigen- values: l (S 1)=(3.17,1.63,0.10,0.10,0.10,0.10)′ l (S 2)=(4.18,2.27,0.40,0.40,0.40,0.40)′ l (S 3)=(2.29,1.93,0.20,0.20,0.20,0.20)′, whoselargestvalueisgivenbymax l (S )=4.18. i,g i g First we run the unconstrained algorithm: the right solution has been attained in 24% ofcases,withoutincurringinsingularities.Summarystatistics(minimum,firstquartileQ , 1 medianQ ,thirdquartileQ andmaximum)aboutthedistributionofthemisclassification 2 3 erroroverthe100runsarereportedinTable2.Duetothechoiceonparameters,werarely expecttoosmalleigenvaluesintheestimatedcovariancematrices:weseta=0.01toprotect fromthem;conversely,aslocalmaximaarequiteoftenduetolargeestimatedeigenvalues, weconsidersettingalsoaconstraintfromabove,takingintoaccountsomevaluesforb,the upperbound.Tocomparehowthechoiceoftheboundsaandbinfluencestheperformance oftheconstrained EM, weexperimented withdifferent pairs of values, andinTable3we report themore interesting cases.Further results arereported in Figure 1, which provides the boxplots of thedistribution of themisclassification errors obtained in the sequence of 100runs,showingthepoorperformanceoftheunconstrainedalgorithmcomparedwiththe good behaviour of its constrained version. For all values of the upper bound b, the third quartileofthemisclassificationerrorissteadilyequalto0.Indeed,forb=6,10and15we hadnomisclassificationerror,whileweobservedverylowandraremisclassificationerrors

See more

The list of books you might like

Most books are stored in the elastic cloud where traffic is expensive. For this reason, we have a limit on daily download.