Use R!
K. Gerald van den Boogaart
Raimon Tolosana-Delgado
Analyzing Compositional Data with R
Use R!
Series Editors
Robert Gentleman, Kurt Hornik, Giovanni G. Parmigiani
For further volumes:
http://www.springer.com/series/6991
K. Gerald van den Boogaart
Raimon Tolosana-Delgado
Helmholtz Institute Freiberg for Resources Technology
Freiberg
Germany
Series Editors:

Robert Gentleman
Program in Computational Biology
Division of Public Health Sciences
Fred Hutchinson Cancer Research Center
1100 Fairview Avenue, N. M2-B876
Seattle, Washington 98109
USA

Kurt Hornik
Department of Statistik and Mathematik
Wirtschaftsuniversität Wien
Augasse 2-6
A-1090 Wien
Austria

Giovanni Parmigiani
The Sidney Kimmel Comprehensive Cancer Center at Johns Hopkins University
550 North Broadway
Baltimore, MD 21205-2011
USA
ISBN 978-3-642-36808-0
ISBN 978-3-642-36809-7 (eBook)
DOI 10.1007/978-3-642-36809-7
Springer Heidelberg New York Dordrecht London
Library of Congress Control Number: 2013940100
Mathematics Subject Classification: 62H99 (general multivariate methods), 62J05 (linear regression), 62P12 (environmental applications)
© Springer-Verlag Berlin Heidelberg 2013
This work is subject to copyright. All rights are reserved by the Publisher, whether the whole or part of the material is concerned, specifically the rights of translation, reprinting, reuse of illustrations, recitation, broadcasting, reproduction on microfilms or in any other physical way, and transmission or information storage and retrieval, electronic adaptation, computer software, or by similar or dissimilar methodology now known or hereafter developed. Exempted from this legal reservation are brief excerpts in connection with reviews or scholarly analysis or material supplied specifically for the purpose of being entered and executed on a computer system, for exclusive use by the purchaser of the work. Duplication of this publication or parts thereof is permitted only under the provisions of the Copyright Law of the Publisher's location, in its current version, and permission for use must always be obtained from Springer. Permissions for use may be obtained through RightsLink at the Copyright Clearance Center. Violations are liable to prosecution under the respective Copyright Law.
The use of general descriptive names, registered names, trademarks, service marks, etc. in this publication does not imply, even in the absence of a specific statement, that such names are exempt from the relevant protective laws and regulations and therefore free for general use.
While the advice and information in this book are believed to be true and accurate at the date of publication, neither the authors nor the editors nor the publisher can accept any legal responsibility for any errors or omissions that may be made. The publisher makes no warranty, express or implied, with respect to the material contained herein.
Printed on acid-free paper
Springer is part of Springer Science+Business Media (www.springer.com)
Acknowledgments
We would like to thank Regina van den Boogaart for her infinite patience and support to the authors during many periods of hard work, and Matevz Bren for his help with the package and the fruitful collaboration on the special compositions. We are especially indebted to Vera Pawlowsky-Glahn and Juan-Jose Egozcue for their long-running general support, without which none of us would be writing a book on compositional data analysis.
Contents
1 Introduction
1.1 What Are Compositional Data?
1.1.1 Compositions Are Portions of a Total
1.1.2 Compositions Are Multivariate by Nature
1.1.3 The Total Sum of a Composition Is Irrelevant
1.1.4 Practically All Compositions Are Subcompositions
1.1.5 The Very Brief History of Compositional Data Analysis
1.1.6 Software for Compositional Data Analysis
1.2 Getting Started with R
1.2.1 Software Needed for the Examples in This Book
1.2.2 Installing R and the Extension Packages
1.2.3 Basic R Usage
1.2.4 Getting Help for Composition Specific Commands
1.2.5 Troubleshooting
References
2 Fundamental Concepts of Compositional Data Analysis
2.1 A Practical View to Compositional Concepts
2.1.1 Definition of Compositional Data
2.1.2 Subcompositions and the Closure Operation
2.1.3 Completion of a Composition
2.1.4 Compositions as Equivalence Classes
2.1.5 Perturbation as a Change of Units
2.1.6 Amalgamation
2.1.7 Missing Values and Outliers
2.2 Principles of Compositional Analysis
2.2.1 Scaling Invariance
2.2.2 Perturbation Invariance
2.2.3 Subcompositional Coherence
2.2.4 Permutation Invariance
2.3 Elementary Compositional Graphics
2.3.1 Sense and Nonsense of Scatterplots of Components
2.3.2 Ternary Diagrams
2.3.3 Log-Ratio Scatterplots
2.3.4 Bar Plots and Pie Charts
2.4 Multivariate Scales
2.4.1 Classical Multivariate Vectorial Data (rmult)
2.4.2 Positive Data with Absolute Geometry (rplus)
2.4.3 Positive Data with Relative Geometry (aplus)
2.4.4 Compositional Data with Absolute Geometry (rcomp)
2.4.5 Compositional Data with Aitchison Geometry (acomp)
2.4.6 Count Compositions (ccomp)
2.4.7 Practical Considerations on Scale Selection
2.5 The Aitchison Simplex
2.5.1 The Simplex and the Closure Operation
2.5.2 Perturbation as Compositional Sum
2.5.3 Powering as Compositional Scalar Multiplication
2.5.4 Compositional Scalar Product, Norm, and Distance
2.5.5 The Centered Log-Ratio Transformation (clr)
2.5.6 The Isometric Log-Ratio Transformation (ilr)
2.5.7 The Additive Log-Ratio Transformation (alr)
2.5.8 Geometric Representation of Statistical Results
2.5.9 Expectation and Variance in the Simplex
References
3 Distributions for Random Compositions
3.1 Continuous Distribution Models
3.1.1 The Normal Distribution on the Simplex
3.1.2 Testing for Compositional Normality
3.1.3 The Dirichlet Distribution
3.1.4 The Aitchison Distribution
3.2 Models for Count Compositions
3.2.1 The Multinomial Distribution
3.2.2 The Multi-Poisson Distribution
3.2.3 Double Stochastic Count Distributions
3.3 Relations Between Distributions
3.3.1 Marginalization Properties
3.3.2 Conjugated Priors
References
4 Descriptive Analysis of Compositional Data
4.1 Descriptive Statistics
4.1.1 Compositional Mean
4.1.2 Metric Variance and Standard Deviation
4.1.3 Variation Matrix and Its Relatives
4.1.4 Variance Matrices
4.1.5 Normal-Based Predictive and Confidence Regions
4.2 Exploring Marginals
4.2.1 The Three Types of Compositional Marginals
4.2.2 Ternary Diagram Matrices for Multidimensional Compositions
4.3 Exploring Projections
4.3.1 Projections and Balances
4.3.2 Balance Bases
4.3.3 The Coda-Dendrogram
References
5 Linear Models for Compositions
5.1 Introduction
5.1.1 Classical Linear Regression (Continuous Covariables)
5.1.2 Classical Analysis of the Variance (Discrete Covariables)
5.1.3 The Different Types of Compositional Regression and ANOVA
5.2 Compositions as Independent Variables
5.2.1 Example
5.2.2 Direct Visualization of the Dependence
5.2.3 The Model
5.2.4 Estimation of Regression Parameters
5.2.5 Displaying the Model
5.2.6 Prediction and Predictive Regions
5.2.7 Model Checks
5.2.8 The Strength of the Relationship
5.2.9 Global and Individual Tests
5.2.10 Model Diagnostic Plots
5.2.11 Checking the Normality Assumptions
5.2.12 Checking Constant Variance
5.2.13 Robust Regression
5.2.14 Quadratic Regression
5.3 Compositions as Dependent Variables
5.3.1 Example
5.3.2 Direct Visualization of the Dependence
5.3.3 The Model and Its R Interface
5.3.4 Estimation and Representation of the Model Parameters
5.3.5 Testing the Influence of Each Variable
5.3.6 Comparing Predicted and Observed Values
5.3.7 Prediction of New Values
5.3.8 Residuals
5.3.9 Measures of Compositional Determination
5.4 Compositions as Both Dependent and Independent Variables
5.4.1 Example
5.4.2 Visualization
Description: This book presents the statistical analysis of compositional data sets, i.e., data in percentages, proportions, concentrations, etc. The subject is covered from its grounding principles to the practical use in descriptive exploratory analysis, robust linear models and advanced multivariate statistic