Introduction to WinBUGS for Ecologists: Bayesian approach to regression, ANOVA, mixed models and related analyses

300 Pages·2010·4.68 MB·English
For information on all Academic Press publications visit our Web site at www.books.elsevier.com

Typeset by: diacriTech, Chennai, India
Printed and bound in China

A Creed for Modeling

To make sense of an observation, everybody needs a model … whether he or she knows it or not. It is difficult to imagine another method that so effectively fosters clear thinking about a system than the use of a model written in the language of algebra.

Foreword

The title of Marc Kéry’s book, Introduction to WinBUGS for Ecologists, provides some good hints about its content. From this title, we might guess that the book focuses on a piece of software, WinBUGS, that the treatment will not presuppose extensive knowledge of this software, and that the focus will be on the kinds of questions and inference problems that are faced by scientists who do ecology. So why WinBUGS and why ecologists? Of course, the most basic answer to this question is that Marc Kéry is an ecologist who has found WinBUGS to be extremely useful in his own work. But the important question then becomes, “Is Marc correct that WinBUGS can become an important tool for other ecologists?” The ultimate utility of this book will depend on the answer to this question, so I will try to develop a response here.

WinBUGS is a flexible, user-friendly software package that permits Bayesian inference from data, based on user-defined statistical models. Because the models must be completely specified by the user, WinBUGS may not be viewed by some as being as user-friendly as older statistical software packages that provide classical inference via methods such as maximum likelihood. So why should an ecologist invest the extra time and effort to learn WinBUGS? I can think of at least two reasons. The first is that all inference is based on underlying models (a basic constraint of the human condition).
In the case of ecological data, the models represent caricatures of the processes that underlie both the data collection methods and the dynamics of ecological interest. I confess to knowing from personal experience that it is possible to obtain and “interpret” results of analyses from a standard statistical software package, without properly understanding the underlying model(s) on which inference was based. In contrast, having to specify a model in WinBUGS ensures a basic understanding that need not accompany use of many common statistical software packages. So the necessity of specifying models, and thus of thinking clearly about underlying sampling and ecological processes, provides a good reason for ecologists to learn and use WinBUGS.

A second reason is that ecological data are typically generated by multiple processes, each of which induces variation. Frequently, such multiple sources of variation do not correspond closely to models available in more classical statistical software packages. Through my career as a quantitative ecologist, hundreds of field ecologists have brought me data sets asking for “standard” analyses, suggesting that I must have seen many similar data sets and that analysis of their data should thus be relatively quick and easy. However, despite these claims, I can’t recall ever having seen a data set for which a standard, off-the-shelf analysis was strictly appropriate. There are always aspects of either the studied system or, more typically, the data collection process that require nonstandard models.

It was noted above that most ecological data sets are generated by at least two classes of process: ecological and sampling. The ecological process generates the true patterns that our studies are designed to investigate, and conditional on this truth, the sampling process generates the data that we actually obtain. The data are thus generated by multiple processes that are best viewed and modeled as hierarchical.
Indeed, hierarchical models appropriate for such data are readily constructed in WinBUGS, and hierarchical Bayes provides a natural approach to inference for such data. Relative ease of implementation for complex hierarchical models is a compelling reason for ecologists to become proficient with WinBUGS.

For many problems, use of WinBUGS to implement a complex model can result in substantial savings of time and effort. I am always impressed by the small amount of WinBUGS code needed to provide inference for capture–recapture models that are represented by extremely complicated-looking likelihood functions. However, even more important than problems that can be solved more easily using WinBUGS than using traditional likelihood approaches are the problems that biologists would be unable to solve using these traditional approaches. For example, natural variation among individual organisms of a species will always exist for any characteristic under investigation. Such variation is pervasive and provides the raw material for Darwinian evolution by natural selection, the central guiding paradigm of all biological sciences. Even within groups of animals defined by age, sex, size, and other relevant covariates, ecologists still expect variation among individuals in virtually any attribute of interest. In capture–recapture modeling, for example, we would like to develop models capable of accounting for variation in capture probabilities and survival probabilities among individuals within any defined demographic group. However, in the absence of individual covariates, it is simply not possible to estimate a separate capture and survival probability for each individual animal. But we can consider a distribution of such probabilities across individuals and attempt to estimate characteristics of that distribution. In WinBUGS, we can develop hierarchical models in which among-individual variation is treated as a random effect, with this portion of the inference problem becoming one of estimating the parameters of the distributions that describe this individual variation.
In contrast, I (and most ecologists) would not know how to begin to construct even moderately complex capture–recapture models with random individual effects using a likelihood framework. So WinBUGS provides access to models and inferences that would be otherwise unapproachable for most ecological scientists.

I conclude that WinBUGS, specifically, and hierarchical Bayesian analysis, generally, are probably very good things for ecologists to learn. However, we should still ask whether the book’s contents and Marc’s tutorial writing style are likely to provide readers with an adequate understanding of this material. My answer to this question is a resounding “Yes!” I especially like Marc’s use of simulation to develop the data sets used in exercises and analyses throughout the book, as this approach effectively exploits the close connection between data analysis and generation. Statistical models are intended to be simplified representations of the processes that generate real data, and the repeated interplay between simulation and analysis provides an extremely effective means of teaching the ability to develop such models and understand the inferences that they produce.

Finally, I like the selection of models that are explored in this book. The bulk of the book focuses on general model classes that are used frequently by ecologists, as well as by scientists in other disciplines: linear models, generalized linear models, linear mixed models, and generalized linear mixed models. The learn-by-example approach of simulating data sets and analyzing them using both WinBUGS and classical approaches implemented in R provides an effective way not only to teach WinBUGS but also to provide a general understanding of these widely used classes of statistical model.
The two chapters dealing with specific classes of ecological models, site occupancy models and binomial mixture abundance models, then provide the reader with an appreciation of the need to model both sampling and ecological processes in order to obtain reasonable inferences using data produced by actual ecological sampling. Indeed, it is in the development and application of models tailored to deal with specific ecological sampling methods that the power and utility of WinBUGS are most readily demonstrated.

However, I believe that the book, Introduction to WinBUGS for Ecologists, is far too modest and does not capture the central reasons why ecologists should read this book and work through the associated examples and exercises. Most important, I believe that the ecologist who gives this book a serious read will emerge with a good understanding of statistical models as abstract representations of the various processes that give rise to a data set. Such an understanding is basic to the development of inference models tailored to specific sampling and ecological scenarios. A benefit that will accompany this general understanding is specific insights into major classes of statistical models that are used in ecology and other areas of science. In addition, the tutorial development of models and analyses in WinBUGS and R should leave the reader with the ability to implement both standard and tailored models. I believe that it would be hard to overstate the value of adding to an ecologist’s toolbox this ability to develop and then implement models tailored to specific studies.

Jim Nichols
Patuxent Wildlife Research Center, Laurel, MD

Preface

This book is a gentle introduction to applied Bayesian modeling for ecologists using the highly acclaimed, free WinBUGS software, as run from program R. The bulk of the book is formed by a very detailed yet, I hope, enjoyable tutorial consisting of commented example analyses.
These form a progression from the trivially simple to the moderately complex and cover linear, generalized linear (GLM), mixed, and generalized linear mixed models (GLMMs). Along the way, a comprehensive and largely nonmathematical overview is given of these important model classes, which represent the core of modern applied statistics and are those which ecologists use most in their work. I provide complete R and WinBUGS code for all analyses; this allows you to follow them step-by-step and at the desired pace. Being an ecologist myself and having collaborated with many ecologist colleagues, I am convinced that the large majority of us best understand more complex statistical methods by first executing worked examples step-by-step and then by modifying these template analyses to fit our own data.

All analyses with WinBUGS are directly compared with analyses of the same data using standard R functions such as lm(), glm(), and lmer(). Hence, I would hope that this book will appeal to most ecologists regardless of whether they ultimately choose a Bayesian or a classical mode of inference for their analyses. In addition, the comparison of classical and Bayesian analyses should help demystify the Bayesian approach to statistical modeling. A key feature of this book is that all data sets are simulated (= “assembled”) before analysis (= “disassembly”) and that fully commented R code is provided for both. Data simulation, along with the powerful, yet intuitive model specification language in WinBUGS, represents a unique way to truly understand that core of applied statistics in much of ecology and other quantitative sciences, generalized linear models (GLMs) and mixed models.

This book traces my own journey as a quantitative ecologist toward an understanding of WinBUGS for Bayesian statistical modeling and of GLMs and mixed models. Both the simulation of data sets and model fitting in WinBUGS have been crucial for my own advancement in these respects.
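The assemble-then-disassemble idea can be sketched in a few lines. The book does this in R with functions such as lm() and in WinBUGS; what follows is only a minimal Python illustration under stated assumptions. The function names (simulate_line, fit_line) and all parameter values are hypothetical, and the fit is a plain least-squares estimate rather than a Bayesian analysis.

```python
import random

def simulate_line(n, intercept, slope, sigma, seed=1):
    """'Assemble' a data set: y = intercept + slope * x + normal noise."""
    rng = random.Random(seed)
    x = [rng.uniform(0.0, 10.0) for _ in range(n)]
    y = [intercept + slope * xi + rng.gauss(0.0, sigma) for xi in x]
    return x, y

def fit_line(x, y):
    """'Disassemble' it again: ordinary least-squares intercept and slope."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return intercept, slope

# Hypothetical "true" values that the analysis should recover approximately.
x, y = simulate_line(n=2000, intercept=2.0, slope=0.5, sigma=1.0)
est_intercept, est_slope = fit_line(x, y)
print(f"true intercept 2.0, estimate {est_intercept:.2f}")
print(f"true slope 0.5, estimate {est_slope:.2f}")
```

Because the true values are known, any gap between them and the estimates reflects sampling noise or a mistake in the analysis, which is what makes simulated data useful for learning.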
The book grew out of the documentation for a 1-week course that I teach at the graduate school for life sciences at the University of Zürich, Switzerland, and elsewhere to similar audiences. Therefore, the typical readership would be expected to be advanced undergraduates, graduate students, and researchers in ecology and other quantitative sciences. To maximize your benefits, you should have some basic knowledge in R computing and statistics at the level of the linear model (LM) (i.e., analysis of variance and regression).

After three introductory chapters, normal LMs are dealt with in Chapters 4–11. In Chapter 9 and especially Chapter 12, they are generalized to contain more than a single stochastic process, i.e., to the (normal) linear mixed model (LMM). Chapter 13 introduces the GLM, i.e., the extension of the normal LM to allow error distributions other than the normal. Chapters 13–15 feature Poisson GLMs and Chapters 17–18 binomial GLMs. Finally, the GLM, too, is generalized to contain additional sources of random variation to become a GLMM in Chapter 16 for a Poisson example and in Chapter 19 for a binomial example. I strongly believe that this step-up approach, where the simplest of all LMs, that “of the mean” (Chapter 4), is made progressively more complex until we have a GLMM, helps you to get a synthetic understanding of these model classes, which have such a huge importance for applied statistics in ecology and elsewhere.

The final two main chapters go one step further and showcase two fairly novel and nonstandard versions of a GLMM. The first is the site-occupancy model for species distributions (Chapter 20; MacKenzie et al., 2002, 2003, 2006), and the second is the binomial (or N-) mixture model for estimation and modeling of abundance (Chapter 21; Royle, 2004). These models allow one to make inference about two pivotal quantities in ecology: distribution and abundance of a species (Krebs, 2001).
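To see how such a model separates an ecological process from a sampling process, consider a minimal Python sketch of the site-occupancy setting. It is not the book's code (which uses R and WinBUGS); the function name simulate_occupancy and all parameter values are assumptions chosen for illustration. Each site is occupied with probability psi, and an occupied site yields a detection on each visit with probability p.

```python
import random

def simulate_occupancy(n_sites, n_visits, psi, p, seed=1):
    """Simulate true occupancy states z and per-site detection counts y."""
    rng = random.Random(seed)
    # Ecological process: is each site occupied (1) or not (0)?
    z = [1 if rng.random() < psi else 0 for _ in range(n_sites)]
    # Sampling process: detections are only possible at occupied sites.
    y = [sum(1 for _ in range(n_visits) if rng.random() < p) if zi == 1 else 0
         for zi in z]
    return z, y

# Hypothetical values: 60% of sites occupied, 30% detection chance per visit.
z, y = simulate_occupancy(n_sites=1000, n_visits=2, psi=0.6, p=0.3)
true_occupancy = sum(z) / len(z)
naive_occupancy = sum(1 for yi in y if yi > 0) / len(y)
print(f"true proportion of occupied sites: {true_occupancy:.2f}")
print(f"proportion with at least one detection: {naive_occupancy:.2f}")
```

Sites where the species was present but never detected look the same in the data as empty sites, so the proportion of sites with detections can never exceed the true proportion occupied. A site-occupancy model estimates psi and p jointly from the repeated visits to correct for this.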
Importantly, these models fully account for the imperfect detection of occupied sites and individuals, respectively. Arguably, imperfect detection is a hallmark of all ecological field studies. Hence, these models are extremely useful for ecologists but owing to their relative novelty are not yet widely known. Also, they are not usually described within the GLM framework, but I believe that recognizing how they fit into the larger picture of linear models is illuminating. The Bayesian analysis of these two models offers clear benefits over that by maximum likelihood, for instance, in the ease with which finite-sample inference is obtained (Royle and Kéry, 2007), but also just heuristically, since these models are easier to understand when fit in WinBUGS.

Owing to its gentle tutorial style, this book should be excellent for teaching yourself. I hope that you can learn much about Bayesian analysis using WinBUGS and about linear statistical models and their generalizations by simply reading it. However, the most effective way to do this obviously is by sitting at a computer and working through all examples, as well as by solving the exercises. Fairly often, I just give the code required to produce a certain output but do not show the actual result, so to fully grasp what is happening, it is best to execute all code.

If the book is used in a classroom setting and plenty of time is given to the solving of exercises, then up to two weeks might be required to cover all material. Alternatively, some chapters may be skipped or left for the students to go through for themselves. Chapters 1–5, inclusive, contain key material. If you already have experience with Bayesian inference, you may skip Chapters 1–2. If you understand (generalized) linear models well, you may also skip Chapter 6 and just skim Chapters 7–11 to see whether you can easily follow. Chapters 9 and 12 are the key chapters for your understanding of mixed models, whether LMM or GLMM, and should not be skipped.
The same goes for Chapter 13, which introduces GLMs. The next chapters (14–19) are examples of (mixed) GLMs and may be sampled selectively as desired. There is some redundancy in content, e.g., between the following pairs of chapters, which illustrate the same kind of model for a Poisson and a binomial response: 13/17, 15/18, and 16/19. Finally, Chapters 20 and 21 are somewhat more specialized and may not have the same importance for all readers (though I find them to be the most fascinating models in the whole book).

As much as I believe in the great benefits of data simulation for your understanding of a model, data assembly at the start of each chapter may be skipped. You can download all data sets from the book Web site or simply execute the R code to generate your own data sets and only go to the line-by-line mode of study where the analysis begins. Similarly, comparison of the Bayesian solutions with the maximum likelihood estimates can be dropped by simply fitting the models in WinBUGS and not also in R.

All R and WinBUGS code in this book can be downloaded from the book Web site at http://www.mbr-pwrc.usgs.gov/software/kerybook/ maintained by Jim Hines at the Patuxent Wildlife Research Center. The Web site also contains some bonus material: a list of WinBUGS tricks, an Errata page, solutions to exercises, a text file containing all the code shown in the book, as well as the actual data sets that were used to produce the output shown in the book. It also contains a real data set (the Swiss hare data) we deal with extensively in the exercises. The Swiss hare data contain replicated counts of Brown hares (Lepus europaeus: see Chapter 13) conducted over 17 years (1992–2008) at 56 sites in eight regions of Switzerland. Replicated means that each year two counts were conducted during a 2-week period. Sites vary in area and elevation and belong to two types of habitat (arable and grassland): hence, there are both continuous and discrete explanatory variables.
Unbounded counts may be modeled as Poisson random variables with log(area) as an offset, but we can also treat the observed density (i.e., the ratio of a count to area) as a normal random variable or the incidence of a density exceeding some threshold as a binomial random variable. Hence, you can practice with all models shown in this book and meet many features of genuine data sets such as missing values and other nuisances of real life.
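As a rough illustration of the offset idea, the sketch below simulates counts whose expected value is proportional to site area, i.e., log(expected count) = log(area) + alpha. It is written in Python rather than the book's R and WinBUGS, the values are invented and not taken from the Swiss hare data, and the helper names (rpois, simulate_counts) are hypothetical. For this intercept-only case, the maximum likelihood estimate of the density exp(alpha) is the total count divided by the total area.

```python
import math
import random

def rpois(rng, mean):
    """Draw one Poisson variate (Knuth's multiplication method; fine for small means)."""
    limit = math.exp(-mean)
    k = 0
    prod = rng.random()
    while prod > limit:
        k += 1
        prod *= rng.random()
    return k

def simulate_counts(n_sites, alpha, seed=1):
    """Counts with log(expected count) = log(area) + alpha, i.e. an offset of log(area)."""
    rng = random.Random(seed)
    areas = [rng.uniform(1.0, 5.0) for _ in range(n_sites)]
    counts = [rpois(rng, math.exp(math.log(a) + alpha)) for a in areas]
    return areas, counts

# Hypothetical setting: 500 sites, density exp(1.0) animals per unit area.
areas, counts = simulate_counts(n_sites=500, alpha=1.0)
density_hat = sum(counts) / sum(areas)
print(f"true density {math.exp(1.0):.2f}, estimate {density_hat:.2f}")
```

The offset enters the linear predictor with a fixed coefficient of 1, so alpha describes density rather than raw count, and sites of different size become comparable.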

Bayesian statistics has exploded into biology and its sub-disciplines such as ecology over the past decade. The free software program WinBUGS and its open-source sister OpenBUGS are currently the only flexible and general-purpose programs available with which the average ecologist can conduct their ow
