Multivariate Analysis of Ecological Data PDF

110 Pages·2003·1.687 MB·English
Multivariate Analysis of Ecological Data

Jan Lepš & Petr Šmilauer

Faculty of Biological Sciences, University of South Bohemia, České Budějovice, 1999

Foreword

This textbook provides study materials for the participants of the course named Multivariate Analysis of Ecological Data that we teach at our university for the third year. The material provided here should serve both the introductory and the advanced versions of the course. We admit that some parts of the text would profit from further polishing; they are quite rough, but we hope to improve the text further.

We hope that this book provides an easy-to-read supplement to the more exact and detailed publications, such as the collection of Dr. Ter Braak's papers and the Canoco for Windows 4.0 manual. Beyond the scope of these publications, this textbook adds information on the classification methods of multivariate data analysis and introduces some of the modern regression methods most useful in ecological research.

Wherever we refer to commercial software products, these are covered by trademarks or registered marks of their respective producers.

This publication is far from being final, and this shows in its quality: some issues appear repeatedly throughout the book, but we hope this provides, at least, an opportunity for the reader to see the same topic expressed in different words.

Table of contents

1. INTRODUCTION AND DATA MANIPULATION
1.1. Examples of research problems
1.2. Terminology
1.3. Analyses
1.4. Response (species) data
1.5. Explanatory variables
1.6. Handling missing values
1.7. Importing data from spreadsheets - CanoImp program
1.8. CANOCO Full format of data files
1.9. CANOCO Condensed format
1.10. Format line
1.11. Transformation of species data
1.12. Transformation of explanatory variables

2. METHODS OF GRADIENT ANALYSIS
2.1. Techniques of gradient analysis
2.2. Models of species response to environmental gradients
2.3. Estimating species optimum by the weighted averaging method
2.4. Ordinations
2.5. Constrained ordinations
2.6. Coding environmental variables
2.7. Basic techniques
2.8. Ordination diagrams
2.9. Two approaches
2.10. Partial analyses
2.11. Testing the significance of relationships with environmental variables
2.12. Simple example of Monte Carlo permutation test for significance of correlation

3. USING THE CANOCO FOR WINDOWS 4.0 PACKAGE
3.1. Overview of the package
     Canoco for Windows 4.0
     CANOCO 4.0
     WCanoImp and CanoImp.exe
     CEDIT
     CanoDraw 3.1
     CanoPost for Windows 1.0
3.2. Typical analysis workflow when using Canoco for Windows 4.0
3.3. Decide about ordination model: unimodal or linear?
3.4. Doing ordination - PCA: centering and standardizing
3.5. Doing ordination - DCA: detrending
3.6. Doing ordination - scaling of ordination scores
3.7. Running CanoDraw 3.1
3.8. Adjusting diagrams with CanoPost program
3.9. New analyses providing new views of our datasets
3.10. Linear discriminant analysis

4. DIRECT GRADIENT ANALYSIS AND MONTE-CARLO PERMUTATION TESTS
4.1. Linear multiple regression model
4.2. Constrained ordination model
4.3. RDA: constrained PCA
4.4. Monte Carlo permutation test: an introduction
4.5. Null hypothesis model
4.6. Test statistics
4.7. Spatial and temporal constraints
4.8. Design-based constraints
4.9. Stepwise selection of the model
4.10. Variance partitioning procedure

5. CLASSIFICATION METHODS
5.1. Sample data set
5.2. Non-hierarchical classification (K-means clustering)
5.3. Hierarchical classifications
     Agglomerative hierarchical classifications (Cluster analysis)
     Divisive classifications
     Analysis of the Tatry samples

6. VISUALIZATION OF MULTIVARIATE DATA WITH CANODRAW 3.1 AND CANOPOST 1.0 FOR WINDOWS
6.1. What can we read from the ordination diagrams: Linear methods
6.2. What can we read from the ordination diagrams: Unimodal methods
6.3. Regression models in CanoDraw
6.4. Ordination Diagnostics
6.5. T-value biplot interpretation

7. CASE STUDY 1: SEPARATING THE EFFECTS OF EXPLANATORY VARIABLES
7.1. Introduction
7.2. Data
7.3. Data analysis

8. CASE STUDY 2: EVALUATION OF EXPERIMENTS IN THE RANDOMIZED COMPLETE BLOCKS
8.1. Introduction
8.2. Data
8.3. Data analysis

9. CASE STUDY 3: ANALYSIS OF REPEATED OBSERVATIONS OF SPECIES COMPOSITION IN A FACTORIAL EXPERIMENT: THE EFFECT OF FERTILIZATION, MOWING AND DOMINANT REMOVAL IN AN OLIGOTROPHIC WET MEADOW
9.1. Introduction
9.2. Experimental design
9.3. Sampling
9.4. Data analysis
9.5. Technical description
9.6. Further use of ordination results

10. TRICKS AND RULES OF THUMB IN USING ORDINATION METHODS
10.1. Scaling options
10.2. Permutation tests
10.3. Other issues

11. MODERN REGRESSION: AN INTRODUCTION
11.1. Regression models in general
11.2. General Linear Model: Terms
11.3. Generalized Linear Models (GLM)
11.4. Loess smoother
11.5. Generalized Additive Model (GAM)
11.6. Classification and Regression Trees
11.7. Modelling species response curves: comparison of models

12. REFERENCES

1. Introduction and Data Manipulation

1.1. Examples of research problems

Methods of multivariate statistical analysis are no longer limited to the exploration of multidimensional data sets. Intricate research hypotheses can be tested, and complex experimental designs can be taken into account during the analyses. The following are a few examples of research questions where multivariate data analyses were extremely helpful:

• Can we predict loss of nesting locality of endangered wader species based on the current state of the landscape? What landscape components are most important for predicting this process?
The following diagram presents the results of a statistical analysis that addressed this question:

Figure 1-1 Ordination diagram displaying the first two axes of a redundancy analysis for the data on the waders' nesting preferences

The diagram indicates that three of the studied bird species decreased their nesting frequency in landscapes with a higher percentage of meadows, while the fourth one (Gallinago gallinago) retreated in landscapes with a recently low percentage of the area covered by wetlands. Nevertheless, when we tested the significance of the indicated relations, none of them turned out to be significant.

In this example, we were looking at the dependency of (semi-)quantitative response variables (the extent of retreat of particular bird species) upon the percentage cover of the individual landscape components. The ordination method provides here an extension of regression analysis in which we model the response of several variables at the same time.

• How do individual plant species respond to the addition of phosphorus and/or exclusion of AM symbiosis? Does the community response suggest an interaction effect between the two factors?

This kind of question used to be approached using one or another form of analysis of variance (ANOVA). Its multivariate extension allows us to address similar problems while looking at more than one response variable at the same time. Correlations between the plant species occurrences are accounted for in the analysis output.

Figure 1-2 Ordination diagram displaying the first two ordination axes of a redundancy analysis summarizing effects of the fungicide and of the phosphate application on a grassland plant community.

This ordination diagram indicates that many forbs decreased their biomass when either the fungicide (Benomyl) or the phosphorus source was applied. The yarrow (Achillea millefolium) seems to profit from the fungicide application, while the grasses seem to respond negatively to the same treatment.
This time, the effects displayed in the diagram are supported by a statistical test which suggests rejection of the null hypothesis at a significance level α = 0.05.

1.2. Terminology

The terminology for multivariate statistical methods is quite complicated, so we must spend some time with it. There are at least two different terminological sets. One, more general and more abstract, contains purely statistical terms applicable across the whole field of science. In this section, we give the terms from this set in italics, mostly in parentheses. The other set represents a mixture of terms used in ecological statistics with the most typical examples from the field of community ecology. This is the set we will focus on, using the former one just to be able to refer to the more general statistical theory. This is also the set adopted by the CANOCO program.

In all cases, we have a dataset with the primary data. This dataset contains records on a collection of observations - samples┼ (sampling units). Each sample collects values for multiple species or, less often, environmental variables (variables). The primary data can be represented by a rectangular matrix, where the rows typically represent individual samples and the columns represent individual variables (species, chemical or physical properties of the water or soil, etc.).

Very often, our primary data set (containing the response variables) is accompanied by another data set containing the explanatory variables. If our primary data represent a community composition, then the explanatory data set typically contains measurements of the soil properties, a semi-quantitative scoring of the human impact, etc. When we use the explanatory variables in a model to predict the primary data (like the community composition), we might divide them into two different groups.
The first group is called, somewhat inappropriately, the environmental variables and refers to the variables which are of prime interest in our particular analysis. The other group represents the so-called covariables (often referred to as covariates in other statistical approaches), which are also explanatory variables with an acknowledged (or, at least, hypothesized) influence over the response variables. But we want to account for (or subtract or partial out) such an influence before focusing on the influence of the variables of prime interest.

As an example, let us imagine a situation where we study the effects of soil properties and type of management (hay-cutting or pasturing) on the plant species composition of meadows in a particular area. In one analysis, we might be interested in the effect of soil properties, paying no attention to the management regime. In this analysis, we use the grassland composition as the species data (i.e. primary data set, with individual plant species acting as individual response variables) and the measured soil properties as the environmental variables (explanatory variables). Based on the results, we can draw conclusions about the preferences of individual plant species' populations with respect to particular environmental gradients, which are described (more or less appropriately) by the measured soil properties.

Similarly, we can ask how the management style influences plant composition. In this case, the variables describing the management regime act as the environmental variables. Naturally, we might expect that the management also influences the soil properties and that this is probably one of the ways the management acts upon the community composition. Based on that expectation, we might ask about the influence of the management regime beyond that mediated through the changes of soil properties.
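The question of a management effect beyond that mediated through soil properties can be illustrated numerically. The sketch below is not the CANOCO procedure: it reduces the problem to a single response variable and ordinary least squares, and every name and number in it (`management`, `soil_n`, `biomass`, the simulated coefficients) is a hypothetical assumption made for illustration. It removes the linear effect of the covariable from both the response and the variable of interest and then relates the two sets of residuals.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 500

# Hypothetical simulated data (assumptions, not taken from the text):
# management: 0 = hay-cutting, 1 = pasturing
management = rng.integers(0, 2, size=n).astype(float)
# A soil property that is itself influenced by the management regime
soil_n = 1.5 * management + rng.normal(size=n)
# One response variable driven by the soil and, directly, by management
biomass = 2.0 * soil_n + 0.5 * management + rng.normal(scale=0.5, size=n)


def residuals(y, covariable):
    """Return y with the linear effect of the covariable (plus intercept) removed."""
    X = np.column_stack([np.ones_like(covariable), covariable])
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    return y - X @ coef


def slope(x, y):
    """Least-squares slope of y on x in a simple regression with intercept."""
    xc = x - x.mean()
    return float(xc @ (y - y.mean()) / (xc @ xc))


# Management effect ignoring soil: includes the part mediated by the soil
total_effect = slope(management, biomass)

# Management effect after partialling out the soil covariable
partial_effect = slope(residuals(management, soil_n), residuals(biomass, soil_n))

print("effect ignoring the covariable:   ", round(total_effect, 2))
print("effect after partialling it out:  ", round(partial_effect, 2))
```

In this simulated setting the first estimate mixes the direct effect of management with the effect passed on through the soil property, while the second one should lie close to the direct coefficient used in the simulation. Residualizing both variables on the covariable gives the same slope as fitting both predictors in one multiple regression.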
To address such a question, we use the variables describing the management regime as the environmental variables and the measured properties of soil as the covariables.

One of the keys to understanding the terminology used by the CANOCO program is to realize that the data referred to by CANOCO as the species data might, in fact, be any kind of data with variables whose values we want to predict. So, if we would like, for example, to predict the contents of various metal ions in river water based on the landscape composition in the catchment area, then the individual ions' concentrations would represent the individual "species" in the CANOCO terminology. If the species data really represent the species composition of a community, then we usually apply various abundance measures, including counts, frequency estimates and biomass estimates. Alternatively, we might have information only on the presence or the absence of the species in individual samples. Also among the explanatory variables (I use this term as covering both the environmental variables and covariables in CANOCO terminology), we might have the quantitative and the presence-absence variables. These various kinds of data values are treated in more detail later in this chapter.

┼ There is an inconsistency in the terminology: in classical statistical terminology, sample means a collection of sampling units, usually selected at random from the population. In community ecology, sample is usually used for a description of a sampling unit. This usage will be followed in this text. The general statistical packages use the term case with the same meaning.

1.3. Analyses

If we try to model one or more response variables, the appropriate statistical modeling methodology depends on whether we model each of the response variables separately and whether we have any explanatory variables (predictors) available when building the model. The following table summarizes the most important statistical methodologies used in the different situations:

Response variable ...   Predictor(s) absent                  Predictor(s) present
... is one              • distribution summary               • regression models s.l.
... are many            • indirect gradient analysis         • direct gradient analysis
                          (PCA, DCA, NMDS)                   • constrained cluster analysis
                        • cluster analysis                   • discriminant analysis (CVA)

Table 1-1 The types of the statistical models

If we look at just a single response variable and there are no predictors available, then we can hardly do more than summarize the distributional properties of that variable. In the case of multivariate data, we might use either the ordination approach represented by the methods of indirect gradient analysis (most prominent are the principal components analysis - PCA, detrended correspondence analysis - DCA, and non-metric multidimensional scaling - NMDS) or we can try to (hierarchically) divide our set of samples into compact distinct groups (methods of the cluster analysis s.l., see chapter 5).

If we have one or more predictors available and we model the expected values of a single response variable, then we use the regression models in the broad sense, i.e. including both the traditional regression methods and the methods of analysis of variance (ANOVA) and analysis of covariance (ANOCOV). This group of methods is unified under the so-called general linear model and was recently further extended and enhanced by the methodology of generalized linear models (GLM) and generalized additive models (GAM). Further information on these models is provided in chapter 11.

1.4. Response (species) data

Our primary data (often called, based on the most typical context of the biological community data, the species data) can often be measured in a quite precise (quantitative) way. Examples are the dry weight of the above-ground biomass of plant species, counts of specimens of individual insect species falling into soil traps, or the percentage cover of individual vegetation types in a particular landscape. We
