Applied Compositional Data Analysis: With Worked Examples in R

288 Pages·2018·7.287 MB·English

Springer Series in Statistics

Peter Filzmoser · Karel Hron · Matthias Templ

Applied Compositional Data Analysis
With Worked Examples in R

Series Editors: Peter Diggle, Ursula Gather, Scott Zeger
Past Editors: Peter Bickel, Nanny Wermuth
Founding Editors: David Brillinger, Stephen Fienberg, Joseph Gani, John Hartigan, Jack Kiefer, Klaus Krickeberg
More information about this series at http://www.springer.com/series/692

Peter Filzmoser
Institute of Statistics and Mathematical Methods in Economics
TU Wien
Vienna, Austria

Karel Hron
Department of Mathematical Analysis and Applications of Mathematics
Palacký University Olomouc
Olomouc, Czech Republic

Matthias Templ
Institute of Data Analysis and Process Design
ZHAW Zurich University of Applied Sciences
Winterthur, Switzerland

ISSN 0172-7397
ISSN 2197-568X (electronic)
ISBN 978-3-319-96420-1
ISBN 978-3-319-96422-5 (eBook)
https://doi.org/10.1007/978-3-319-96422-5
Library of Congress Control Number: 2018952636
Mathematics Subject Classification (2010): 62H25, 62H30, 62H20, 62J05, 15A03, 62P12, 62P25

© Springer Nature Switzerland AG 2018

This work is subject to copyright. All rights are reserved by the Publisher, whether the whole or part of the material is concerned, specifically the rights of translation, reprinting, reuse of illustrations, recitation, broadcasting, reproduction on microfilms or in any other physical way, and transmission or information storage and retrieval, electronic adaptation, computer software, or by similar or dissimilar methodology now known or hereafter developed.

The use of general descriptive names, registered names, trademarks, service marks, etc. in this publication does not imply, even in the absence of a specific statement, that such names are exempt from the relevant protective laws and regulations and therefore free for general use.
The publisher, the authors and the editors are safe to assume that the advice and information in this book are believed to be true and accurate at the date of publication. Neither the publisher nor the authors or the editors give a warranty, express or implied, with respect to the material contained herein or for any errors or omissions that may have been made. The publisher remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

This Springer imprint is published by the registered company Springer Nature Switzerland AG
The registered company address is: Gewerbestrasse 11, 6330 Cham, Switzerland

To our families

Preface

Compositional data are nowadays widely accepted as multivariate observations carrying relative information: those following the principle of scale invariance, typically being represented in proportions and percentages, but also in other units like mg/kg and mg/l that reflect their relative nature. In other words, for compositional data the relevant information is contained in the (log-)ratios between the components (parts). In 2006, 20 years after the seminal book of John Aitchison, The statistical analysis of compositional data, had been published, we met compositional data and the logratio methodology for the first time; to be honest, not as something highly appealing, but originally in order to get a research paper finally accepted for publication after a tedious reviewing process. We were not fully convinced that this approach would be so important for practical applications, because at that time the methodology was presented more from a theoretical perspective, and the applications were partially even based on invented data. On the other hand, it was clear that the logratio methodology formed a consistent approach to deal with this type of data, and further interesting directions were proposed: the paper on orthonormal coordinates for compositional data [Egozcue, J.J., Pawlowsky-Glahn, V., Mateu-Figueras, G., Barceló-Vidal, C.
Isometric logratio transformation for compositional data analysis in Mathematical Geology] was published just 3 years before, and also the principle of working in coordinates was just born.

When working more and more in this area, we felt at some point that there could be a need for a practical guide to compositional data analysis, not just for people from applications, but also for our own curiosity, to understand which added value the logratio methodology could yield when processing compositional data. How do the results differ when simply taking a log-transformation, compared to working in an appropriate geometry? And are the results (more) reasonable and justified? In the last ten or more years, we made quite an effort in this direction, by touching systematically almost all popular multivariate statistical methods and those fields that are of primary importance for practical data analysis (robust statistics, outlier detection, and dealing with missing and zero values).

This book provides a summary of our efforts. We wrote it with great freedom from what should be followed or mentioned for historical or any other reasons. The focus is on a proper orthonormal coordinate representation of compositional data that indeed provides a useful way for a reasonable processing of multivariate observations. Central to the approach are the so-called pivot coordinates, which aim to extract all relative information about one of the parts in a composition. These coordinates have proven their advantages in a number of applications and provoked many discussions. We present the pivot coordinates in a form that shows their flexibility in various data processing contexts and their strength for the interpretation of the results. Nevertheless, we admit that other representations, like more general orthonormal coordinates, balances, but also centered logratio coefficients, or pairwise logratios, are also useful in concrete contexts.
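The preface names scale invariance and pivot coordinates without writing them out. As a rough sketch only, assuming a composition x = (x_1, ..., x_D) with D strictly positive parts (this notation is an assumption and does not appear in the excerpt), the two ideas are commonly written as follows in the logratio literature:

```latex
% Assumed notation: x = (x_1, \dots, x_D) is a composition with strictly positive parts.

% Scale invariance: rescaling all parts by a constant c > 0 leaves every log-ratio unchanged.
\ln\frac{c\,x_i}{c\,x_k} = \ln\frac{x_i}{x_k}, \qquad c > 0 .

% Pivot coordinates (one common orthonormal coordinate system):
z_j = \sqrt{\frac{D-j}{D-j+1}}\;
      \ln\frac{x_j}{\sqrt[D-j]{\prod_{k=j+1}^{D} x_k}},
\qquad j = 1, \dots, D-1 .
```

In this form only the first coordinate z_1 involves the part x_1, which is compared with the geometric mean of all remaining parts; this is the sense in which a pivot coordinate gathers the relative information about a single part. Permuting the parts so that a different one comes first gives the corresponding coordinate for that part.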
The book can be taken as a concise, self-contained manual on how to apply the logratio methodology for compositional data analysis in everyday practice, using the statistical software environment R and the package robCompositions. We tried to illustrate the theoretical parts with several generally understandable examples from applications, like those from official statistics, economics, geology, or chemometrics. As a minimum prerequisite for accessing the book, just a basic course on probability and statistics is required, although additional experience with multivariate statistics and statistical computing might be advisable. On the other hand, the book can also be considered as a source of inspiration for those who are familiar enough with standard knowledge on compositional data analysis, as presented in the book by V. Pawlowsky-Glahn, J.J. Egozcue, and R. Tolosana-Delgado, Modeling and analysis of compositional data. According to these aims, after providing the geometrical reasoning for a relevant (not exclusively statistical) processing of compositional data, many popular statistical methods, like principal component analysis, cluster analysis, classification and regression analysis, are adapted for dealing with data carrying relative information. Moreover, exploratory and preprocessing issues are discussed: visualization, outlier detection, and dealing with missing values and particularly with zeros that form a touchstone of the logratio analysis. Last but not least, emerging fields like analyzing high-dimensional compositional data and compositional tables, with great potential for future developments, are also discussed. This clearly illustrates that what is presented is not a closed methodological framework but rather the state of the art of an intensively developing research field. Finally, the structure of the book can also be used for a one-semester course on applied compositional data analysis.
The interactive form of the book enables students to practice theoretical knowledge directly with data sets coming from different fields of their possible future expertise. Our sincere wish is to contribute to the education of a new generation of people for whom statistical analysis of compositional data is a matter of creative thinking.

Vienna, Austria: Peter Filzmoser
Olomouc, Czech Republic: Karel Hron
Winterthur, Switzerland: Matthias Templ
August 25, 2018

Acknowledgments

We are very grateful to our colleagues from the Vienna University of Technology and from the Palacký University Olomouc and to many collaborators who helped us to get familiar with compositional data analysis. In particular, we would like to thank Dr. Clemens Reimann from the Geological Survey of Norway, for bringing us in touch with real data applications from geochemistry, which made it necessary at some point to get acquainted with compositional data. We are grateful to Prof. Vera Pawlowsky-Glahn from the University of Girona and to Prof. Juan José Egozcue from the Polytechnic University of Catalonia for numerous fruitful discussions; they are really the "parents" of compositional data analysis. Their hints and ideas greatly helped us to write the book in its present form. And, finally, our greatest gratitude is to our families: without their long-term support, our research activities would not have been possible.

Contents

1 Compositional Data as a Methodological Concept
  1.1 What Are Compositional Data?
  1.2 Introductory Problems
    1.2.1 PhD Students Example
    1.2.2 Beer Data Example
    1.2.3 Geochemical Data Example
  1.3 Principles of Compositional Data Analysis
  1.4 Steps to a Concise Methodology
  References
2 Analyzing Compositional Data Using R
  2.1 Brief Overview on Packages Related to Compositional Data Analysis
    2.1.1 compositions
    2.1.2 robCompositions
    2.1.3 ggtern
    2.1.4 zCompositions
    2.1.5 mvoutlier, StatDA
    2.1.6 CoDaPack
    2.1.7 compositionsGUI
  2.2 The Statistics Environment R
  2.3 Basics in R
    2.3.1 Installation of R and Updates
    2.3.2 Install robCompositions
    2.3.3 Help
    2.3.4 The R Workspace and the Working Directory
    2.3.5 Data Types
    2.3.6 Generic Functions, Methods and Classes
  References
