SPRINGER BRIEFS IN STATISTICS – ABE

Víctor Hugo Lachos Dávila
Celso Rômulo Barbosa Cabral
Camila Borelli Zeller

Finite Mixture of Skewed Distributions

SpringerBriefs in Statistics – ABE

Series Editors
Francisco Louzada Neto, Departamento de Matemática Aplicada e Estatística, Universidade de São Paulo, São Carlos, São Paulo, Brazil
Helio dos Santos Migon, Instituto de Matemática, Universidade Federal do Rio de Janeiro, Rio de Janeiro, Rio de Janeiro, Brazil
Gilberto Alvarenga Paula, Instituto de Matemática e Estatística, Universidade de São Paulo, São Paulo, São Paulo, Brazil

SpringerBriefs present concise summaries of cutting-edge research and practical applications across a wide spectrum of fields. Featuring compact volumes of 50 to 125 pages, the series covers a range of content from professional to academic. Briefs are characterized by fast, global electronic dissemination, standard publishing contracts, standardized manuscript preparation and formatting guidelines, and expedited production schedules.

Typical topics might include:
• A timely report of state-of-the-art techniques
• A bridge between new research results, as published in journal articles, and a contextual literature review
• A snapshot of a hot or emerging topic
• An in-depth case study
• A presentation of core concepts that students must understand in order to make independent contributions

This SpringerBriefs in Statistics – ABE subseries aims to publish relevant contributions to several fields in the Statistical Sciences, produced by full members of the ABE – Associação Brasileira de Estatística (Brazilian Statistical Association), or by distinguished researchers from all Latin America.
These texts are targeted to a broad audience that includes researchers, graduate students and professionals, and may cover a multitude of areas like Pure and Applied Statistics, Actuarial Science, Econometrics, Quality Assurance and Control, Computational Statistics, Risk and Probability, Educational Statistics, Data Mining, Big Data and Confidence and Survival Analysis, to name a few.

More information about this subseries at http://www.springer.com/subseries/13778

Víctor Hugo Lachos Dávila
Department of Statistics
University of Connecticut
Storrs Mansfield, CT, USA

Celso Rômulo Barbosa Cabral
Department of Statistics
Federal University of Amazonas
Manaus, Brazil

Camila Borelli Zeller
Department of Statistics
Federal University of Juiz de Fora
Juiz de Fora, Minas Gerais, Brazil

ISSN 2191-544X    ISSN 2191-5458 (electronic)
SpringerBriefs in Statistics
ISSN 2524-6917
SpringerBriefs in Statistics – ABE
ISBN 978-3-319-98028-7    ISBN 978-3-319-98029-4 (eBook)
https://doi.org/10.1007/978-3-319-98029-4

Library of Congress Control Number: 2018951927

Mathematics Subject Classification: 62-07, 62Exx, 62Fxx, 62Jxx

© The Author(s), under exclusive licence to Springer Nature Switzerland AG 2018

This work is subject to copyright. All rights are reserved by the Publisher, whether the whole or part of the material is concerned, specifically the rights of translation, reprinting, reuse of illustrations, recitation, broadcasting, reproduction on microfilms or in any other physical way, and transmission or information storage and retrieval, electronic adaptation, computer software, or by similar or dissimilar methodology now known or hereafter developed.

The use of general descriptive names, registered names, trademarks, service marks, etc. in this publication does not imply, even in the absence of a specific statement, that such names are exempt from the relevant protective laws and regulations and therefore free for general use.
The publisher, the authors, and the editors are safe to assume that the advice and information in this book are believed to be true and accurate at the date of publication. Neither the publisher nor the authors or the editors give a warranty, express or implied, with respect to the material contained herein or for any errors or omissions that may have been made. The publisher remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

This Springer imprint is published by the registered company Springer Nature Switzerland AG.
The registered company address is: Gewerbestrasse 11, 6330 Cham, Switzerland

Lourdes, Carlos and Marcelo (Camila Borelli Zeller)
Sílvia, Ivanete and Gabriela (Celso Rômulo Barbosa Cabral)
Rosa Dávila, Eduardo, Thuany and Alberto (Víctor Hugo Lachos Dávila)

Preface

Modeling based on finite mixture distributions is a rapidly developing area with an exploding range of applications. Finite mixture models are nowadays applied in such diverse areas as biology, biometrics, genetics, medicine, and marketing, among others. The main objective of this book is to present recent results in this area of research and to prepare readers to work with mixture models based on scale mixtures of skew-normal (SMSN) distributions. We consider maximum likelihood estimation for univariate and multivariate finite mixtures whose components are members of the flexible class of SMSN distributions proposed by Lachos et al. (2010), which is a subclass of the skew-elliptical class proposed by Branco and Dey (2001). This subclass contains the entire family of normal independent distributions, also known as scale mixtures of normal (SMN) distributions (Lange and Sinsheimer 1993). In addition, the skew-normal distribution and skewed versions of some other classical symmetric distributions are SMSN members: the skew-t (ST), the skew-slash (SSL), and the skew-contaminated normal (SCN), for example.
These distributions have heavier tails than the normal distribution, and thus they seem to be a reasonable choice for robust inference. The proposed EM-type algorithm and methods are implemented in the R package mixsmsn.

In Chap. 1, we present a motivating example in which mixtures of SMSN distributions appear to explain the data better than mixtures with symmetric components.

In Chap. 2, we present background material on mixture models and the EM algorithm for maximum likelihood estimation. This is followed by the derivation of the observed information matrix, which is used to obtain the standard errors.

In Chap. 3, for the sake of completeness, we define the multivariate SMSN distributions and study some of their important properties, viz., moments, kurtosis, linear transformations, marginal and conditional distributions, among others. Further, the EM algorithm for performing maximum likelihood estimation is presented.

In Chap. 4, we propose a finite mixture of univariate SMSN distributions (FM-SMSN) and an EM-type algorithm for maximum likelihood estimation. The associated observed information matrix is obtained analytically. The proposed methodology is illustrated through the analysis of a real data set and simulation studies.

In Chap. 5, we consider a flexible class of models whose elements are finite mixtures of multivariate SMSN distributions. A general EM-type algorithm is employed for iteratively computing parameter estimates, and this is discussed with emphasis on finite mixtures of skew-normal, skew-t, skew-slash, and skew-contaminated normal distributions. Further, a general information-based method for approximating the asymptotic covariance matrix of the estimates is also presented.

Finally, in Chap. 6, we present a proposal to deal with mixtures of regression models by assuming that the random errors follow scale mixtures of skew-normal distributions.
This approach allows us to model data with great flexibility, accommodating skewness and heavy tails at the same time. A simple EM-type algorithm to perform maximum likelihood inference of the parameters of the proposed model is derived. A real data set is analyzed, illustrating the usefulness of the proposed method.

We hope that the publication of this text will enhance the spread of ideas that are currently trickling through the literature of mixture models. The skew models and methods developed recently in this field have yet to reach their largest possible audience, partly because the results are scattered in various journals. [email protected], [email protected], or [email protected].

Víctor H. Lachos was supported by CNPq-Brazil (BPPesq) and São Paulo State Research Foundation (FAPESP). Celso Rômulo Barbosa Cabral was supported by CNPq (BPPesq) and Amazonas State Research Foundation (FAPEAM, Universal project). Camila Borelli Zeller was supported by CNPq (BPPesq and Universal project) and Minas Gerais State Research Foundation (FAPEMIG, Universal project).

Storrs, USA              Víctor Hugo Lachos Dávila
Manaus, Brazil           Celso Rômulo Barbosa Cabral
Juiz de Fora, Brazil     Camila Borelli Zeller
January 2018

Contents

1 Motivation
2 Maximum Likelihood Estimation in Normal Mixtures
  2.1 EM Algorithm for Finite Mixtures
  2.2 Standard Errors
3 Scale Mixtures of Skew-Normal Distributions
  3.1 Introduction
  3.2 SMN Distributions
    3.2.1 Examples of SMN Distributions
  3.3 Multivariate SMSN Distributions and Main Results
    3.3.1 Examples of SMSN Distributions
    3.3.2 A Simulation Study
  3.4 Maximum Likelihood Estimation
  3.5 The Observed Information Matrix
4 Univariate Mixture Modeling Using SMSN Distributions
  4.1 Introduction
  4.2 The Proposed Model
    4.2.1 Maximum Likelihood Estimation via EM Algorithm
    4.2.2 Notes on Implementation
  4.3 The Observed Information Matrix
    4.3.1 The Skew-t Distribution
    4.3.2 The Skew-Slash Distribution
    4.3.3 The Skew-Contaminated Normal Distribution
  4.4 Simulation Studies
    4.4.1 Study 1: Clustering
    4.4.2 Study 2: Asymptotic Properties
    4.4.3 Study 3: Model Selection
  4.5 Application with Real Data
5 Multivariate Mixture Modeling Using SMSN Distributions
  5.1 Introduction
  5.2 The Proposed Model
    5.2.1 Maximum Likelihood Estimation via EM Algorithm
  5.3 The Observed Information Matrix
    5.3.1 The Skew-Normal Distribution
    5.3.2 The Skew-t Distribution
    5.3.3 The Skew-Slash Distribution
    5.3.4 The Skew-Contaminated Normal Distribution
  5.4 Applications with Simulated and Real Data
    5.4.1 Consistency
    5.4.2 Standard Deviation
    5.4.3 Model Fit and Clustering
    5.4.4 The Pima Indians Diabetes Data
  5.5 Identifiability and Unboundedness
6 Mixture Regression Modeling Based on SMSN Distributions
  6.1 Introduction
  6.2 The Proposed Model
    6.2.1 Maximum Likelihood Estimation via EM Algorithm
    6.2.2 Notes on Implementation
  6.3 Simulation Experiments
    6.3.1 Experiment 1: Parameter Recovery
    6.3.2 Experiment 2: Classification
    6.3.3 Experiment 3: Classification
  6.4 Real Dataset
References
Index