Published by Pelagic Publishing
www.pelagicpublishing.com
PO Box 725, Exeter, EX1 9QU

Community Ecology: Analytical Methods Using R and Excel®

ISBN 978–1–907807–61–9 (Pbk)
ISBN 978–1–907807–62–6 (Hbk)
ISBN 978–1–907807–63–3 (ePub)
ISBN 978–1–907807–65–7 (PDF)
ISBN 978–1–907807–64–0 (Mobi)

Copyright © 2014 Mark Gardener

All rights reserved. No part of this document may be reproduced, stored in a retrieval system, or transmitted in any form or by any means, electronic, mechanical, photocopying, recording or otherwise without prior permission from the publisher.

While every effort has been made in the preparation of this book to ensure the accuracy of the information presented, the information contained in this book is sold without warranty, either express or implied. Neither the author, nor Pelagic Publishing, its agents and distributors will be held liable for any damage or loss caused or alleged to be caused directly or indirectly by this book.

Windows, Excel and Word are trademarks of the Microsoft Corporation. For more information visit www.microsoft.com. OpenOffice.org is a trademark of Oracle. For more information visit www.openoffice.org. LibreOffice is a trademark of The Document Foundation. For more information visit www.libreoffice.org. Apple Macintosh is a trademark of Apple Inc. For more information visit www.apple.com.

British Library Cataloguing in Publication Data
A catalogue record for this book is available from the British Library.

Cover image: Over under water picture, showing Fairy Basslets (Pseudanthias tuka) amongst Cabbage Coral (Turbinaria reniformis) and tropical island in the background. Indo Pacific. © David Fleetham/OceanwideImages.com

Typeset by Swales & Willis Ltd, Exeter, Devon, UK

About the author

Mark Gardener (www.gardenersown.co.uk) is an ecologist, lecturer and writer working in the UK.
His primary area of research was in pollination ecology and he has worked in the UK and around the world (principally Australia and the United States). Since his doctorate he has worked in many areas of ecology, often as a teacher and supervisor. He believes that ecological data, especially community data, are the most complicated and ill-behaved and are consequently the most fun to work with. He was introduced to R by a like-minded pedant whilst working in Australia during his doctorate. Learning R was not only fun but opened up a new avenue, making the study of community ecology a whole lot easier. He is currently self-employed and runs courses in ecology, data analysis and R for a variety of organisations. Mark lives in rural Devon with his wife Christine, a biochemist who consequently has little need of statistics.

Acknowledgements

There are so many people to thank that it is hard to know where to begin. I am sure that I will leave some people out, so I apologise in advance. Thanks to Richard Rowe (James Cook University) for inspiring me to use R. Data were contributed from various sources, especially from MSc students doing Biological Recording; thanks especially to Robin Cure, Jessie MacKay, Mark Latham, John Handley and Hing Kin Lee for your hard-won data. The MSc programme helped me to see the potential of ‘proper’ biological records and I thank Sarah Whild for giving me the opportunity to undertake some teaching on the course. Thanks also to the Field Studies Council in general: many data examples have arisen from field courses I’ve been involved with.

Software used

Several versions of Microsoft’s Excel® spreadsheet were used in the preparation of this book. Most of the examples presented show version 2007 for Microsoft Windows® although other versions may also be illustrated. The main version of the R program used was 2.12.1 for Macintosh: The R Foundation for Statistical Computing, Vienna, Austria, ISBN 3-900051-07-0, http://www.R-project.org/.
Other versions were used in testing code.

Support material

Free support material is available on the Community Ecology companion website, which can be accessed via the book’s resources page: http://www.pelagicpublishing.com/community-ecology-resources.html

Reader feedback

We welcome feedback from readers – please email us at [email protected] and tell us what you thought about this book. Please include the book title in the subject line of your email.

Publish with Pelagic Publishing

We publish scientific books to the highest editorial standards in all life science disciplines, with a particular focus on ecology, conservation and environment. Pelagic Publishing produces books that set new benchmarks, share advances in research methods and encourage and inform wildlife investigation for all. If you are interested in publishing with Pelagic please contact [email protected] with a synopsis of your book, a brief history of your previous written work and a statement describing the impact you would like your book to have on readers.

Contents

Introduction
1. Starting to look at communities
   1.1 A scientific approach
   1.2 The topics of community ecology
   1.3 Getting data – using a spreadsheet
   1.4 Aims and hypotheses
   1.5 Summary
   1.6 Exercises
2. Software tools for community ecology
   2.1 Excel
   2.2 Other spreadsheets
   2.3 The R program
   2.4 Summary
   2.5 Exercises
3. Recording your data
   3.1 Biological data
   3.2 Arranging your data
   3.3 Summary
   3.4 Exercises
4. Beginning data exploration: using software tools
   4.1 Beginning to use R
   4.2 Manipulating data in a spreadsheet
   4.3 Getting data from Excel into R
   4.4 Summary
   4.5 Exercises
5. Exploring data: choosing your analytical method
   5.1 Categories of study
   5.2 How ‘classic’ hypothesis testing can be used in community studies
   5.3 Analytical methods for community studies
   5.4 Summary
   5.5 Exercises
6. Exploring data: getting insights
   6.1 Error checking
   6.2 Adding extra information
   6.3 Getting an overview of your data
   6.4 Summary
   6.5 Exercises
7. Diversity: species richness
   7.1 Comparing species richness
   7.2 Correlating species richness over time or against an environmental variable
   7.3 Species richness and sampling effort
   7.4 Summary
   7.5 Exercises
8. Diversity: indices
   8.1 Simpson’s index
   8.2 Shannon index
   8.3 Other diversity indices
   8.4 Summary
   8.5 Exercises
9. Diversity: comparing
   9.1 Graphical comparison of diversity profiles
   9.2 A test for differences in diversity based on the t-test
   9.3 Graphical summary of the t-test for Shannon and Simpson indices
   9.4 Bootstrap comparisons for unreplicated samples
   9.5 Comparisons using replicated samples
   9.6 Summary
   9.7 Exercises
10. Diversity: sampling scale
   10.1 Calculating beta diversity
   10.2 Additive diversity partitioning
   10.3 Hierarchical partitioning
   10.4 Group dispersion
   10.5 Permutation methods
   10.6 Overlap and similarity
   10.7 Beta diversity using alternative dissimilarity measures
   10.8 Beta diversity compared to other variables
   10.9 Summary
   10.10 Exercises
11. Rank abundance or dominance models
   11.1 Dominance models
   11.2 Fisher’s log-series
   11.3 Preston’s lognormal model
   11.4 Summary
   11.5 Exercises
12. Similarity and cluster analysis
   12.1 Similarity and dissimilarity
   12.2 Cluster analysis
   12.3 Summary
   12.4 Exercises
13. Association analysis: identifying communities
   13.1 Area approach to identifying communities
   13.2 Transect approach to identifying communities
   13.3 Using alternative dissimilarity measures for identifying communities
   13.4 Indicator species
   13.5 Summary
   13.6 Exercises
14. Ordination
   14.1 Methods of ordination
   14.2 Indirect gradient analysis
   14.3 Direct gradient analysis
   14.4 Using ordination results
   14.5 Summary
   14.6 Exercises
Appendices
Bibliography
Index

Introduction

Interactions between species are of fundamental importance to all living systems and the framework we have for studying these interactions is community ecology. This is important to our understanding of the planet’s biological diversity and how species interactions relate to the functioning of ecosystems at all scales. Species do not live in isolation and the study of community ecology is of practical application in a wide range of conservation issues.

The study of ecological community data involves many methods of analysis. In this book you will learn many of the mainstays of community analysis including: diversity, similarity and cluster analysis, ordination and multivariate analyses. This book is for undergraduate and postgraduate students and researchers seeking a step-by-step methodology for analysing plant and animal communities using R and Excel.

Microsoft’s Excel spreadsheet is virtually ubiquitous and familiar to most computer users. It is a robust program that makes an excellent storage and manipulation system for many kinds of data, including community data. The R program is a powerful and flexible analytical system able to conduct a huge variety of analytical methods, which means that the user only has to learn one program to address many research questions. Its other advantage is that it is open source and therefore free. Novel analytical methods are being added constantly to the already comprehensive suite of tools available in R.

What you will learn in this book

This book is intended to give you some insights into some of the analytical methods employed by ecologists in the study of communities. The book is not intended to be a mathematical or theoretical treatise but inevitably there is some maths!
I’ve tried to keep this in the background and to focus on how to undertake the appropriate analysis at the right time. There are many published works concerning ecological theory; this book is intended to support them by providing a framework for learning how to analyse your data. The book does not cover every aspect of community ecology. There are a few minor omissions – I hope to cover some of these in later works.

How this book is arranged

There are four main strands to scientific study: planning, recording, analysis and reporting. The first few chapters deal with the planning and recording aspects of study. You will see how to use the main software tools, Excel and R, to help you arrange and begin to make sense of your data. Later chapters deal more explicitly with the grand themes of community ecology, which are:

• Diversity – the study of diversity is split into several chapters covering species richness, diversity indices, beta diversity and dominance–diversity models.
• Similarity and clustering – this is contained in one chapter covering similarity, hierarchical clustering and clustering by partitioning.
• Association analysis – this shows how you can identify which species belong to which community by studying the associations between species. The study of associations leads into the identification of indicator species.
• Ordination – there is a wide range of methods of ordination and they all have similar aims: to represent complicated species community data in a more simplified form.

The reporting element is not covered explicitly; however, the presentation of results is shown throughout the book. A more dedicated coverage of statistical and scientific reporting can be found in my previous work, Statistics for Ecologists Using R and Excel.

Throughout the book you will see example exercises that are intended for you to try out.
In fact they are expressly aimed at helping you on a practical level – reading how to do something is fine but you need to do it for yourself to learn it properly. The Have a Go exercises are hard to miss.

Have a Go: Learn something by doing it
The Have a Go exercises are intended to give you practical experience at various analytical methods. Many will refer to supplementary data, which you can get from the companion website. Some data are intended to be used in Excel and others are for using with R.

Most of the Have a Go exercises utilise data that are available on the companion website. The material on the website includes various spreadsheets, some containing data and some allowing analytical processes. The CERE.RData file is the most helpful – this is an R file, which contains data and custom R commands. You can use the data for the exercises (and for practice) and the custom commands to help you carry out a variety of analytical processes. The custom commands are mentioned throughout the book and the website contains a complete directory.

You will also see tips and notes, which will stand out from the main text. These are ‘useful’ items of detail that pertain to the text and that I felt were important to highlight.

Tips and Notes: Useful additional information
The companion website contains supplementary data, which you can use for the exercises. There are also spreadsheets and useful custom R commands that you can use for your own analyses.

At the end of each chapter there is a summary table to help give you an overview of the material in that chapter. There are also some self-assessment exercises for you to try out. The answers are in Appendix 1.

Support files

The companion website (see resources page: http://www.pelagicpublishing.com/community-ecology-resources.html) contains support material that includes spreadsheet calculations and data in Excel and CSV (comma separated values) format.
There is also an R data file, which contains custom R commands and datasets. Instructions on how to load the R data into your copy of R are on the website. In brief, you need to use the load() command. For Windows or Mac you can type the following:

    load(file.choose())

This will open a file browser window and you can select the CERE.RData file. On Linux machines you’ll need to replace the file.choose() part with the exact filename in quotes; see the website for more details.

I hope that you will find this book helpful, useful and interesting. Above all, I hope that it helps you to discover that analysis of community ecology is not the ‘boring maths’ at the end of your fieldwork but an enjoyable and enlightening experience.

Mark Gardener, Devon 2013
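
As an illustration of the loading step described under Support files, the following R sketch shows how load() behaves when given an exact filename in quotes. It is only a sketch under stated assumptions: the file demo.RData and the object demo_counts are hypothetical stand-ins created on the spot so that the example runs without the real CERE.RData file, and loading into a separate environment is a choice made here for tidiness, not something the website instructions require.

```r
# Hypothetical stand-in for CERE.RData: a tiny data file written to a
# temporary folder (the names demo_counts and demo.RData are illustrative).
demo_counts <- data.frame(site = c("A", "B"), richness = c(12, 7))
demo_file <- file.path(tempdir(), "demo.RData")
save(demo_counts, file = demo_file)

# Load by giving the exact filename in quotes. load() invisibly returns
# the names of the objects it restored, which is handy for checking
# what a data file contained.
target <- new.env()
loaded_names <- load(demo_file, envir = target)
print(loaded_names)
print(ls(target))

# Interactive alternative on Windows or Mac, as given in the text:
# load(file.choose())
```

If you leave out the envir argument, the objects are placed directly in your current workspace, which is what the simple load(file.choose()) command does.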