Table Of ContentAcademicPressisanimprintofElsevier
30CorporateDrive,Suite400,Burlington,MA01803,USA
525BStreet,Suite1900,SanDiego,CA921014495,USA
84Theobald’sRoad,LondonWC1X8RR,UK
Radarweg29,POBox211,1000AEAmsterdam,TheNetherlands
Firstedition2010
Copyright©2010,ElsevierInc.Allrightsreserved.
Nopartofthispublicationmaybereproduced,storedinaretrievalsystem,ortransmitted
inanyformorbyanymeanselectronic,mechanical,photocopying,recording,orotherwise
withoutthepriorwrittenpermissionofthepublisher.
PermissionsmaybesoughtdirectlyfromElsevier’sScience&TechnologyRights
DepartmentinOxford,UK:phone(+44)(0)1865843830;fax(+44)(0)1865853333;
email:permissions@elsevier.com.Alternativelyyoucansubmityourrequestonline
byvisitingtheElsevierWebsiteathttp://elsevier.com/locate/permissions,andselecting
ObtainingpermissiontouseElseviermaterial.
Notice
Noresponsibilityisassumedbythepublisherforanyinjuryand/ordamagetopersonsor
propertyasamatterofproductsliability,negligence,orotherwiseorfromanyuseor
operationofanymethods,products,instructions,orideascontainedinthematerialherein.
Becauseofrapidadvancesinthemedicalsciences,inparticular,independentverificationof
diagnosesanddrugdosagesshouldbemade.
LibraryofCongressCataloginginPublicationData
Kéry,Marc.
IntroductiontoWinBUGSforecologists:ABayesianapproachtoregression,ANOVA,
mixedmodelsandrelatedanalyses/MarcKéry. 1sted.
p.cm.
ISBN:9780123786050
1.Biometry Dataprocessing.2.WinBUGS.I.Title.
QH323.5.K472010
577.01'5118 dc22 2009048549
BritishLibraryCataloguinginPublicationData
AcataloguerecordforthisbookisavailablefromtheBritishLibrary.
ForinformationonallAcademicPresspublications
visitourWebsiteatwww.books.elsevier.com
Typesetby:diacriTech,Chennai,India
PrintedandboundinChina
10 10 9 8 7 6 5 4 3 2 1
A Creed for Modeling
To make sense of an observation, everybody needs a model …
whether he or she knows it or not.
It is difficult to imagine another method
that so effectively fosters clear thinking about a
system than the use of a model written in the language of algebra.
v
Foreword
The title of Marc Kéry’s book, Introduction to WinBUGS for Ecologists,
provides some good hints about its content. From this title, we might
guess that the book focuses on a piece of software, WinBUGS, that the
treatment will not presuppose extensive knowledge of this software,
and that the focus will be on the kinds of questions and inference pro-
blems that are faced by scientists who do ecology. So why WinBUGS
and why ecologists? Of course, the most basic answer to this question is
that Marc Kéry is an ecologist who has found WinBUGS to be extremely
useful in his own work. But the important question then becomes, “Is
Marc correct that WinBUGS can become an important tool for other eco-
logists?”Theultimateutilityofthisbookwilldependontheanswertothis
question, so I will try to develop a response here.
WinBUGS is a flexible, user-friendly software package that permits
Bayesian inference from data, based on user-defined statistical models.
Because the models must be completely specified by the user, WinBUGS
may not be viewed by some as being as user-friendly as older statistical
software packages that provide classical inference via methods such as
maximum likelihood. So why should an ecologist invest the extra time
and effort to learn WinBUGS? I can think of at least two reasons. The
firstisthatallinferenceisbasedonunderlyingmodels(abasicconstraint
of the human condition). In the case of ecological data, the models repre-
sent caricatures of the processes that underlie both the data collection
methods and the dynamics of ecological interest. I confess to knowing
from personal experience that it is possible to obtain and “interpret”
results of analyses from a standard statistical software package, without
properly understanding the underlying model(s) on which inference was
based.Incontrast,having tospecify amodel in WinBUGS insuresabasic
understanding that need not accompany use of many common statistical
software packages. So the necessity of specifying models, and thus of
thinkingclearlyaboutunderlyingsamplingandecologicalprocesses,pro-
vide a good reason for ecologists to learn and use WinBUGS.
Asecondreasonisthatecologicaldataaretypicallygeneratedbymulti-
ple processes, each of which induces variation. Frequently, such multiple
sourcesofvariationdonotcorrespondcloselytomodelsavailableinmore
classicalstatisticalsoftwarepackages.Throughmycareerasaquantitative
xi
xii
FOREWORD
ecologist, hundreds of field ecologists have brought me data sets asking
for “standard” analyses, suggesting that I must have seen many similar
data sets and that analysis of their data should thus be relatively quick
and easy. However, despite these claims, I can’t recall ever having seen
a data set for which a standard, off-the-shelf analysis was strictly appro-
priate.Therearealwaysaspectsofeitherthestudiedsystemor,moretypi-
cally, the data collection process that requires nonstandard models. It
was noted above that most ecological data sets are generated by at least
two classes of process: ecological and sampling. The ecological process
generates the true patterns that our studies are designed to investigate,
and conditional on this truth, the sampling process generates the data
thatweactuallyobtain.Thedataarethusgeneratedbymultipleprocesses
that are best viewed and modeled as hierarchical. Indeed, hierarchical
models appropriate for such data are readily constructed in WinBUGS,
and hierarchical Bayes provides a natural approach to inference for such
data.Relativeeaseofimplementationforcomplexhierarchicalmodelsisa
compelling reason for ecologists to become proficient with WinBUGS.
For many problems, use of WinBUGS to implement a complex model
canresultinsubstantialsavingsoftimeandeffort.Iamalwaysimpressed
by the small amount of WinBUGS code needed to provide inference
for capture–recapture models that are represented by extremely
complicated-lookinglikelihoodfunctions.However,evenmoreimportant
thanproblemsthatcanbesolvedmoreeasilyusingWinBUGSthanusing
traditional likelihood approaches are the problems that biologists would
be unable to solve using these traditional approaches. For example,
natural variation among individual organisms of a species will always
existforanycharacteristicunderinvestigation.Suchvariationispervasive
and provides the raw material for Darwinian evolution by natural selec-
tion, the central guiding paradigm of all biological sciences. Even within
groupsofanimalsdefinedbyage,sex,size,andotherrelevantcovariates,
ecologists still expect variation among individuals in virtually any attri-
bute of interest. In capture–recapture modeling, for example, we would
like to develop models capable of accounting for variation in capture
probabilities and survival probabilities among individuals within any
defined demographic group. However, in the absence of individual
covariates,itissimplynotpossibletoestimateaseparatecaptureandsur-
vivalprobabilityforeachindividualanimal. Butwecanconsideradistri-
bution of such probabilities across individuals and attempt to estimate
characteristicsofthatdistribution.InWinBUGS,wecandevelophierarch-
ical models in which among-individual variation is treated as a random
effect,withthisportionoftheinferenceproblembecomingoneofestimat-
ing theparametersofthedistributions thatdescribethisindividualvaria-
tion. Incontrast, I(andmostecologists)would notknowhow to beginto
xiii
FOREWORD
construct even moderately complex capture–recapture models with
random individual effects using a likelihood framework. So WinBUGS
provides access to models and inferences that would be otherwise unap-
proachable for most ecological scientists.
I conclude that WinBUGS, specifically, and hierarchical Bayesian ana-
lysis, generally, are probably very good things for ecologists to learn.
However, we should still ask whether the book’s contents and Marc’s
tutorialwritingstylearelikelytoprovidereaderswithanadequateunder-
standing of this material. My answer to this question is a resounding
“Yes!” I especially like Marc’s use of simulation to develop the data sets
used in exercises and analyses throughout the book, as this approach
effectivelyexploitsthecloseconnectionbetweendataanalysisandgenera-
tion.Statisticalmodelsareintendedtobesimplifiedrepresentationsofthe
processes that generate real data, and the repeated interplay between
simulationandanalysisprovidesanextremelyeffectivemeansofteaching
theabilitytodevelopsuchmodelsandunderstandtheinferencesthatthey
produce.
Finally,Iliketheselectionofmodelsthatareexploredinthisbook.The
bulkofthebookfocusesongeneralmodelclassesthatareusedfrequently
by ecologists, as well as by scientists in other disciplines: linear models,
generalized linear models, linear mixed models, and generalized linear
mixed models. The learn-by-example approach of simulating data sets
andanalyzingthemusingbothWinBUGSandclassicalapproachesimple-
mentedinRprovideaneffectivewaynotonlytoteachWinBUGSbutalso
to provide a general understanding of these widely used classes of statis-
tical model. The two chapters dealing with specific classes of ecological
models, site occupancy models, and binomial mixture abundance models
then provide the reader with an appreciation of the need to model both
samplingandecologicalprocessesinordertoobtainreasonableinferences
using data produced by actual ecological sampling. Indeed, it is in the
developmentandapplication ofmodelstailoredtodeal withspecificeco-
logicalsamplingmethodsthatthepowerandutilityofWinBUGSaremost
readily demonstrated.
However,Ibelievethatthebook,IntroductiontoWinBUGSforEcologists,
is far too modest and doesnot capture the central reasons why ecologists
should read this book and work through the associated examples and
exercises. Most important, I believe that the ecologist who gives this
book a serious read will emerge with a good understanding of statistical
models as abstract representations of the various processes that give rise
to a data set. Such an understanding is basic to the development of infer-
encemodelstailoredtospecificsamplingandecologicalscenarios.Abene-
fitthatwillaccompanythisgeneralunderstandingisspecificinsightsinto
majorclassesofstatisticalmodelsthatareusedinecologyandotherareas
xiv
FOREWORD
ofscience.Inaddition,thetutorialdevelopmentofmodelsandanalysesin
WinBUGS and R should leave the reader with the ability to implement
bothstandardandtailoredmodels.Ibelievethatitwouldbehardtoover-
state the value of adding to an ecologist’s toolbox this ability to develop
and then implement models tailored to specific studies.
Jim Nichols
Patuxent Wildlife Research Center, Laurel, MD
Preface
This book is a gentle introduction to applied Bayesian modeling for
ecologists using the highly acclaimed, free WinBUGS software, as run
from program R. The bulk of the book is formed by a very detailed yet,
I hope, enjoyable tutorial consisting of commented example analyses.
Theseformaprogressionfromthetriviallysimpletothemoderatelycom-
plex and cover linear, generalized linear (GLM), mixed, and generalized
linearmixedmodels(GLMMs). Along theway,acomprehensive andlar-
gelynonmathematicaloverviewisgivenoftheseimportantmodelclasses,
whichrepresentthecoreofmodernappliedstatisticsandarethosewhich
ecologists use most in their work. I provide complete R and WinBUGS
code for all analyses; this allows you to follow them step-by-step and in
the desired pace. Being an ecologist myself and having collaborated with
many ecologist colleagues, I am convinced that the large majority of us
best understands more complex statistical methods by first executing
worked examples step-by-step and then by modifying these template
analyses to fit their own data.
AllanalyseswithWinBUGSaredirectlycomparedwithanalysesofthe
same data using standard R functions such as lm(), glm(), and lmer().
Hence,Iwouldhopethatthisbookwillappealtomostecologistsregard-
less of whether they ultimately choose a Bayesian or a classical mode of
inference for their analyses. In addition, the comparison of classical and
Bayesian analyses should help demystifythe Bayesianapproachto statis-
ticalmodeling.Akeyfeatureofthisbookisthatalldatasetsaresimulated
(=“assembled”)before analysis(=“disassembly”)andthat fullycommen-
tedRcodeisprovidedforboth.Datasimulation,alongwiththepowerful,
yet intuitive model specification language in WinBUGS, represents a
unique way to truly understand that core of applied statistics in much
of ecology and other quantitative sciences, generalized linear models
(GLMs) and mixed models.
Thisbooktracesmyownjourneyasaquantitativeecologisttowardan
understanding of WinBUGS for Bayesian statistical modeling and of
GLMs and mixed models. Both the simulation of data sets and model fit-
ting in WinBUGS have been crucial for my own advancement in these
respects. The book grew out of the documentation for a 1-week course
that I teach at the graduate school for life sciences at the University of
Zürich, Switzerland, and elsewhere to similar audiences. Therefore, the
typical readership would be expected to be advanced undergraduate,
xv
xvi
PREFACE
graduate students, and researchers in ecology and other quantitative
sciences. To maximize your benefits, you should have some basic knowl-
edge in R computing and statistics at the level of the linear model (LM)
(i.e., analysis of variance and regression).
After three introductory chapters, normal LMs are dealt with in
Chapters 4–11. In Chapter 9 and especially Chapter 12, they are general-
ized to contain more than a single stochastic process, i.e., to the (normal)
linear mixed model (LMM). Chapter 13 introduces the GLM, i.e., the
extension of the normal LM to allow error distributions other than the
normal. Chapters 13–15 feature Poisson GLMs and Chapters 17–18 bino-
mial GLMs. Finally, the GLM, too, is generalized to contain additional
sourcesofrandomvariationtobecomeaGLMMinChapter16foraPois-
son exampleandinChapter19forabinomialexample.Istrongly believe
that this step-up approach, where the simplest of all LMs, that “of the
mean” (Chapter 4), is made progressively more complex until we have
a GLMM, helps you to get a synthetic understanding of these model
classes, which have such a huge importance for applied statistics in ecol-
ogy and elsewhere.
The final two main chapters go one step further and showcase two
fairly novel and nonstandard versions of a GLMM. The first is the site-
occupancy model for species distributions (Chapter 20; MacKenzie et al.,
2002,2003,2006),andthesecondisthebinomial(orN-)mixturemodelfor
estimation and modeling of abundance (Chapter 21; Royle, 2004). These
models allow one to make inference about two pivotal quantities in
ecology: distribution and abundance of a species (Krebs, 2001). Impor-
tantly, these models fully account for the imperfect detection of occupied
sitesandindividuals,respectively.Arguably,imperfectdetectionisahall-
markofallecologicalfieldstudies.Hence,thesemodelsareextremelyuse-
ful for ecologists but owing to their relative novelty are not yet widely
known. Also, they are not usually described within the GLM framework,
but Ibelievethatrecognizing how theyfitinto thelargerpictureoflinear
models is illuminating. The Bayesian analysis of these two models offers
clear benefits over that by maximum likelihood, for instance, in the ease
withwhichfinite-sampleinferenceisobtained(RoyleandKéry,2007),but
alsojustheuristically,sincethesemodelsareeasiertounderstandwhenfit
in WinBUGS.
Owingtoitsgentletutorialstyle,thisbookshouldbeexcellenttoteach
yourself. I hope that you can learn much about Bayesian analysis using
WinBUGS and about linear statistical models and their generalizations
bysimplyreadingit.However,themosteffectivewaytodothisobviously
is by sitting at a computer and working through all examples, as well as
by solving the exercises. Fairly often, I just give the code required to pro-
duce a certain output but do not show the actual result, so to fully grasp
what is happening, it is best to execute all code.
xvii
PREFACE
Ifthebookisusedinaclassroomsettingandplentyoftimeisgivento
the solvingof exercises, then up totwoweeksmightberequiredtocover
all material. Alternatively, some chapters may be skipped or left for the
students to go through for themselves. Chapters 1–5, inclusive, contain
key material. If you already have experience with Bayesian inference,
you may skip Chapters 1–2. If you understand well (generalized) linear
models, you may also skip Chapter 6 and just skim Chapters 7–11 to
seewhetheryoucaneasilyfollow.Chapters9and12arethekeychapters
for your understanding of mixed models, whether LMM or GLMM, and
should not be skipped. The same goes for Chapter 13, which introduces
GLMs.ThenextChapters(14–19)areexamplesof(mixed)GLMsandmay
be sampled selectively as desired. There is some redundancy in content,
e.g., between the following pairs of chapters, which illustrate the same
kind of model for a Poisson and a binomial response: 13/17, 15/18, and
16/19. Finally, Chapters 20 and 21 are somewhat more specialized and
may not have the same importance for all readers (though I find them
to be the most fascinating models in the whole book).
As much as I believe in the great benefits of data simulation for your
understanding of a model, data assembly at the start of each chapter
may be skipped. You can download all data sets from the book Web site
orsimplyexecutetheRcodetogenerateyourowndatasetsandonlygoto
the line-by-line mode of study where the analysis begins. Similarly, com-
parison of the Bayesian solutions with the maximum likelihood estimates
canbedroppedbysimplyfittingthemodelsinWinBUGSandnotinRalso.
All R and WinBUGS code in this book can be downloaded from
the book Web site at http://www.mbr-pwrc.usgs.gov/software/kerybook/
maintained by Jim Hines at the Patuxent Wildlife Research Center. The
Web site also contains some bonus material: a list of WinBUGS tricks,
an Errata page, solutions to exercises, a text file containing all the code
shown in the book, as well as the actual data sets that were used to pro-
duce the output shown in the book. It also contains a real data set (the
Swiss hare data) we deal with extensively in the exercises. The Swiss
hare data contain replicated counts of Brown hares (Lepus europaeus: see
Chapter 13) conducted over 17 years (1992–2008) at 56 sites in eight
regions of Switzerland. Replicated means that each year two counts
were conducted during a 2-week period. Sites vary in area and elevation
andbelongtotwotypesofhabitat(arableandgrassland):hence,thereare
both continuous and discrete explanatory variables. Unbounded counts
may be modeled as Poisson random variables with log(area) as an offset,
butwecanalsotreattheobserveddensity(i.e.,theratioofacounttoarea)
as a normal or the incidence of a density exceeding some threshold as a
binomialrandomvariable.Hence,youcanpracticewithallmodelsshown
inthisbookandmeetmanyfeaturesofgenuinedatasetssuchasmissing
values and other nuisances of real life.
Description:Introduction to WinBUGS for ecologists : A Bayesian approach to regression, ANOVA, software packages that provide classical inference via methods such as . This book is a gentle introduction to applied Bayesian modeling for.