Use R!

K. Gerald van den Boogaart
Raimon Tolosana-Delgado

Analyzing Compositional Data with R

Use R! Series Editors: Robert Gentleman, Kurt Hornik, Giovanni G. Parmigiani
For further volumes: http://www.springer.com/series/6991

Authors:
K. Gerald van den Boogaart, Raimon Tolosana-Delgado
Helmholtz Institute Freiberg for Resources Technology, Freiberg, Germany

Series Editors:
Robert Gentleman, Program in Computational Biology, Division of Public Health Sciences, Fred Hutchinson Cancer Research Center, 1100 Fairview Avenue, N. M2-B876, Seattle, Washington 98109, USA
Kurt Hornik, Department of Statistik and Mathematik, Wirtschaftsuniversität Wien, Augasse 2-6, A-1090 Wien, Austria
Giovanni Parmigiani, The Sidney Kimmel Comprehensive Cancer Center at Johns Hopkins University, 550 North Broadway, Baltimore, MD 21205-2011, USA

ISBN 978-3-642-36808-0
ISBN 978-3-642-36809-7 (eBook)
DOI 10.1007/978-3-642-36809-7
Springer Heidelberg New York Dordrecht London
Library of Congress Control Number: 2013940100
Mathematics Subject Classification: 62H99 (general multivariate methods), 62J05 (linear regression), 62P12 (environmental applications)

© Springer-Verlag Berlin Heidelberg 2013

This work is subject to copyright. All rights are reserved by the Publisher, whether the whole or part of the material is concerned, specifically the rights of translation, reprinting, reuse of illustrations, recitation, broadcasting, reproduction on microfilms or in any other physical way, and transmission or information storage and retrieval, electronic adaptation, computer software, or by similar or dissimilar methodology now known or hereafter developed. Exempted from this legal reservation are brief excerpts in connection with reviews or scholarly analysis or material supplied specifically for the purpose of being entered and executed on a computer system, for exclusive use by the purchaser of the work.
Duplication of this publication or parts thereof is permitted only under the provisions of the Copyright Law of the Publisher's location, in its current version, and permission for use must always be obtained from Springer. Permissions for use may be obtained through RightsLink at the Copyright Clearance Center. Violations are liable to prosecution under the respective Copyright Law.

The use of general descriptive names, registered names, trademarks, service marks, etc. in this publication does not imply, even in the absence of a specific statement, that such names are exempt from the relevant protective laws and regulations and therefore free for general use.

While the advice and information in this book are believed to be true and accurate at the date of publication, neither the authors nor the editors nor the publisher can accept any legal responsibility for any errors or omissions that may be made. The publisher makes no warranty, express or implied, with respect to the material contained herein.

Printed on acid-free paper

Springer is part of Springer Science+Business Media (www.springer.com)

Acknowledgments

We would like to thank Regina van den Boogaart for her infinite patience and support to the authors during many periods of hard work and Matevz Bren for his help with the package and the fruitful collaboration on the special compositions. We are especially indebted to Vera Pawlowsky-Glahn and Juan-Jose Egozcue for a long running general support, without which none of us would be writing a book on compositional data analysis.

Contents

1 Introduction
1.1 What Are Compositional Data?
1.1.1 Compositions Are Portions of a Total
1.1.2 Compositions Are Multivariate by Nature
1.1.3 The Total Sum of a Composition Is Irrelevant
1.1.4 Practically All Compositions Are Subcompositions
1.1.5 The Very Brief History of Compositional Data Analysis
1.1.6 Software for Compositional Data Analysis
1.2 Getting Started with R
1.2.1 Software Needed for the Examples in This Book
1.2.2 Installing R and the Extension Packages
1.2.3 Basic R Usage
1.2.4 Getting Help for Composition Specific Commands
1.2.5 Troubleshooting
References

2 Fundamental Concepts of Compositional Data Analysis
2.1 A Practical View to Compositional Concepts
2.1.1 Definition of Compositional Data
2.1.2 Subcompositions and the Closure Operation
2.1.3 Completion of a Composition
2.1.4 Compositions as Equivalence Classes
2.1.5 Perturbation as a Change of Units
2.1.6 Amalgamation
2.1.7 Missing Values and Outliers
2.2 Principles of Compositional Analysis
2.2.1 Scaling Invariance
2.2.2 Perturbation Invariance
2.2.3 Subcompositional Coherence
2.2.4 Permutation Invariance
2.3 Elementary Compositional Graphics
2.3.1 Sense and Nonsense of Scatterplots of Components
2.3.2 Ternary Diagrams
2.3.3 Log-Ratio Scatterplots
2.3.4 Bar Plots and Pie Charts
2.4 Multivariate Scales
2.4.1 Classical Multivariate Vectorial Data (rmult)
2.4.2 Positive Data with Absolute Geometry (rplus)
2.4.3 Positive Data with Relative Geometry (aplus)
2.4.4 Compositional Data with Absolute Geometry (rcomp)
2.4.5 Compositional Data with Aitchison Geometry (acomp)
2.4.6 Count Compositions (ccomp)
2.4.7 Practical Considerations on Scale Selection
2.5 The Aitchison Simplex
2.5.1 The Simplex and the Closure Operation
2.5.2 Perturbation as Compositional Sum
2.5.3 Powering as Compositional Scalar Multiplication
2.5.4 Compositional Scalar Product, Norm, and Distance
2.5.5 The Centered Log-Ratio Transformation (clr)
2.5.6 The Isometric Log-Ratio Transformation (ilr)
2.5.7 The Additive Log-Ratio Transformation (alr)
2.5.8 Geometric Representation of Statistical Results
2.5.9 Expectation and Variance in the Simplex
References

3 Distributions for Random Compositions
3.1 Continuous Distribution Models
3.1.1 The Normal Distribution on the Simplex
3.1.2 Testing for Compositional Normality
3.1.3 The Dirichlet Distribution
3.1.4 The Aitchison Distribution
3.2 Models for Count Compositions
3.2.1 The Multinomial Distribution
3.2.2 The Multi-Poisson Distribution
3.2.3 Double Stochastic Count Distributions
3.3 Relations Between Distributions
3.3.1 Marginalization Properties
3.3.2 Conjugated Priors
References

4 Descriptive Analysis of Compositional Data
4.1 Descriptive Statistics
4.1.1 Compositional Mean
4.1.2 Metric Variance and Standard Deviation
4.1.3 Variation Matrix and Its Relatives
4.1.4 Variance Matrices
4.1.5 Normal-Based Predictive and Confidence Regions
4.2 Exploring Marginals
4.2.1 The Three Types of Compositional Marginals
4.2.2 Ternary Diagram Matrices for Multidimensional Compositions
4.3 Exploring Projections
4.3.1 Projections and Balances
4.3.2 Balance Bases
4.3.3 The Coda-Dendrogram
References

5 Linear Models for Compositions
5.1 Introduction
5.1.1 Classical Linear Regression (Continuous Covariables)
5.1.2 Classical Analysis of the Variance (Discrete Covariables)
5.1.3 The Different Types of Compositional Regression and ANOVA
5.2 Compositions as Independent Variables
5.2.1 Example
5.2.2 Direct Visualization of the Dependence
5.2.3 The Model
5.2.4 Estimation of Regression Parameters
5.2.5 Displaying the Model
5.2.6 Prediction and Predictive Regions
5.2.7 Model Checks
5.2.8 The Strength of the Relationship
5.2.9 Global and Individual Tests
5.2.10 Model Diagnostic Plots
5.2.11 Checking the Normality Assumptions
5.2.12 Checking Constant Variance
5.2.13 Robust Regression
5.2.14 Quadratic Regression
5.3 Compositions as Dependent Variables
5.3.1 Example
5.3.2 Direct Visualization of the Dependence
5.3.3 The Model and Its R Interface
5.3.4 Estimation and Representation of the Model Parameters
5.3.5 Testing the Influence of Each Variable
5.3.6 Comparing Predicted and Observed Values
5.3.7 Prediction of New Values
5.3.8 Residuals
5.3.9 Measures of Compositional Determination
5.4 Compositions as Both Dependent and Independent Variables
5.4.1 Example
5.4.2 Visualization