Statistics and Scientific Method: An Introduction for Students and Researchers

189 Pages·2011·4.63 MB·English

Statistics and Scientific Method

Statistics and Scientific Method, First Edition, Peter J. Diggle, Amanda G. Chetwynd © Peter J. Diggle, Amanda G. Chetwynd 2011. Published in 2011 by Oxford University Press

Statistics and Scientific Method
An Introduction for Students and Researchers

PETER J. DIGGLE and AMANDA G. CHETWYND
Lancaster University

Great Clarendon Street, Oxford OX2 6DP

Oxford University Press is a department of the University of Oxford. It furthers the University’s objective of excellence in research, scholarship, and education by publishing worldwide in

Oxford, New York

Auckland, Cape Town, Dar es Salaam, Hong Kong, Karachi, Kuala Lumpur, Madrid, Melbourne, Mexico City, Nairobi, New Delhi, Shanghai, Taipei, Toronto

With offices in

Argentina, Austria, Brazil, Chile, Czech Republic, France, Greece, Guatemala, Hungary, Italy, Japan, Poland, Portugal, Singapore, South Korea, Switzerland, Thailand, Turkey, Ukraine, Vietnam

Oxford is a registered trade mark of Oxford University Press in the UK and in certain other countries

Published in the United States by Oxford University Press Inc., New York

© Peter J. Diggle and Amanda G. Chetwynd 2011

The moral rights of the authors have been asserted

Database right Oxford University Press (maker)

First published 2011

All rights reserved. No part of this publication may be reproduced, stored in a retrieval system, or transmitted, in any form or by any means, without the prior permission in writing of Oxford University Press, or as expressly permitted by law, or under terms agreed with the appropriate reprographics rights organization. Enquiries concerning reproduction outside the scope of the above should be sent to the Rights Department, Oxford University Press, at the address above

You must not circulate this book in any other binding or cover and you must impose the same condition on any acquirer

British Library Cataloguing in Publication Data
Data available

Library of Congress Cataloging in Publication Data
Data available

Typeset by SPI Publisher Services, Pondicherry, India
Printed in Great Britain on acid-free paper by CPI Antony Rowe, Chippenham, Wiltshire

ISBN 978–0–19–954318–2 (Hbk.)
978–0–19–954319–9 (Pbk.)
1 3 5 7 9 10 8 6 4 2

To Mike, especially for Chapter 8

Acknowledgements

Most of the diagrams in the book were produced using R. For the remainder, we thank the following people and organizations:

Figure 2.1. Institute of Astronomy library, University of Cambridge
Figure 4.1. Professor Andy Cossins, University of Liverpool
Figure 5.1. copyright Rothamsted Research Ltd.
Figure 5.3. copyright photograph by Antony Barrington Brown, reproduced with the permission of the Fisher Memorial Trust
Figure 10.1. Devra Davis (www.environmentalhealthtrust.org)

We have cited original sources of data in the text where possible, but would like here to add our thanks to: Dr Bev Abram for the Arabidopsis microarray data; Professor Nick Hewitt for the Bailrigg daily temperature data; Dr Raquel Menezes and Dr José Angel Fernandez for the Galicia lead pollution data; Professor Stephen Senn for the asthma data; CSIRO Centre for Irrigation Research, Griffith, New South Wales, for the glyphosate data; Dr Peter Drew for the survival data on dialysis patients; Professor Tanja Pless-Mulloli for the Newcastle upon Tyne black smoke pollution data.

Preface

Statistics is the science of collecting and interpreting data. This makes it relevant to almost every kind of scientific investigation. In practice, most scientific data involve some degree of imprecision or uncertainty, and one consequence of this is that data from past experiments cannot exactly predict the outcome of a future experiment. Dealing with uncertainty is a cornerstone of the statistical method, and distinguishes it from mathematical method.
The mathematical method is deductive: its concern is the logical derivation of consequences from explicitly stated assumptions. Statistical method is inferential: given empirical evidence in the form of data, its goal is to ask what underlying natural laws could have generated the data; and it is the imprecision or uncertainty in the data which makes the inferential process fundamentally different from mathematical deduction.

As a simple example of this distinction, consider a system with a single variable input, x, and a consequential output, y. A scientific theory asserts that the output is a linear function of the input, meaning that experimental values of x and y will obey the mathematical relationship

y = a + b × x

for suitable values of two constants, a and b. To establish the correct values of a and b, we need only run the experiment with two different values of the input, x, measure the corresponding values of the output, y, plot the two points (x, y), connect them with a straight line and read off the intercept, a, and slope, b, of the line. If the truth of the assumed mathematical model is in doubt, we need only run the experiment with a third value of the input, measure the corresponding output and add a third point (x, y) to our plot. If the three points lie on a straight line, the model is correct, and conversely. However, if each experimental output is subject to any amount, however small, of unpredictable fluctuation about the underlying straight-line relationship, then logically we can neither determine a and b exactly, nor establish the correctness of the model, however many times we run the experiment. What we can do, and this is the essence of the statistical method, is estimate a and b with a degree of uncertainty which diminishes as the number of runs of the experiment increases, and establish the extent to which the postulated model is reasonably consistent with the experimental data.

In some areas of science, unpredictable fluctuations in experimental results are a by-product of imperfect experimental technique. This is presumably the thinking behind the physicist Ernest Rutherford’s much-quoted claim that ‘If your experiment needs statistics, you ought to have done a better experiment.’ Perhaps for this reason, in the physical sciences unpredictable fluctuations are often described as ‘errors’. In other areas of science, unpredictability is an inherent part of the underlying scientific phenomenon, and need carry no pejorative associations. For example, in medicine different patients show different responses to a given treatment for reasons that cannot be wholly explained by measurable differences amongst them, such as their age, weight or other physiological characteristics. More fundamentally, in biology unpredictability is an inherent property of the process of transmission and recombination of genetic material from parent to offspring, and is essential to Darwinian evolution.

It follows that the key idea in the statistical method is to understand variation in data and in particular to understand that some of the variation which we see in experimental results is predictable, or systematic, and some unpredictable, or random. Most formal treatments of statistics tend, in the authors’ opinion, to overemphasize the latter, with a consequential focus on the mathematical theory of probability.
This is not to deny that an understanding of probability is of central importance to the statistics discipline, but from the perspective of a student attending a service course in statistics an emphasis on probability can make the subject seem excessively technical, obscuring its relevance to substantive science. Even worse, many service courses in statistics respond to this by omitting the theory and presenting only a set of techniques and formulae, thereby reducing the subject to the status of a recipe book.

Our aim in writing this book is to provide an antidote to technique-oriented service courses in statistics. Instead, we have tried to emphasize statistical concepts, to link statistical method to scientific method, and to show how statistical thinking can benefit every stage of scientific inquiry, from designing an experimental or observational study, through collecting and processing the resulting data, to interpreting the results of the data-processing in their proper scientific context.

Each chapter, except Chapters 1 and 3, begins with a non-technical discussion of a motivating example, whose substantive content is indicated in the chapter subtitle. Our examples are drawn largely from the biological, biomedical and health sciences, because these are the areas of application with which we are most familiar in our own research. We do include some examples from other areas of science, and we hope that students whose specific scientific interests are not included in the subject matter of our examples will be able to appreciate how the underlying statistical concepts are nevertheless relevant, and adaptable, to their own areas of interest.

Our book has its origins in a service course at Lancaster University which we have delivered over a period of several years to an audience of first-year postgraduate students in science and technology. The scientific maturity of students at this level, by comparison with undergraduates, undoubtedly helps our approach to succeed.
However, we do not assume any prior knowledge of statistics, nor do we make mathematical demands on our readers beyond a willingness to get to grips with mathematical notation (itself a way of encouraging precision of thought) and an understanding of basic algebra.

Even the simplest of statistical calculations requires the use of a computer; and if tedium is to be avoided, the same applies to graphical presentation of data. Our book does not attempt to teach statistical computation in a systematic way. Many of the exercises could be done using pencil, paper and pocket calculator, although we hope and expect that most readers of the book will use a computer.

We have, however, chosen to present our material in a way that will encourage readers to use the R software environment (see the website www.r-project.org). From our perspectives as teachers and as professional statisticians, R has a number of advantages: its open-source status; the fact that it runs on most platforms, including Windows, Macintosh and Linux operating systems; its power in terms of the range of statistical methods that it offers. Most importantly from a pedagogical perspective, using R encourages the open approach to problems that the book is intended to promote, and discourages the ‘which test should I use on these data’ kind of closed thinking that we very much want to avoid. Of course, R is not the only software environment which meets these criteria, but it is very widely used in the statistical community and does seem to be here to stay.

We have provided datasets and R scripts (sequences of R commands) that will enable any reader to reproduce every analysis reported in the book. This material is freely available at: www.lancs.ac.uk/staff/diggle/intro-stats-book. We hope that readers unfamiliar with R will either be able to adapt these datasets and programmes for their own use, or will be stimulated to learn more about the R environment. But we emphasize that the book can be read and used without any knowledge of, or reference to, R whatsoever.
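
The straight-line example earlier in the preface, y = a + b × x, can be illustrated with a short simulation. What follows is a minimal sketch in Python, written purely for illustration; the book's own scripts are in R and are not reproduced here. The values a = 1 and b = 2, the noise level, the evenly spaced inputs, the Gaussian form of the fluctuation, the use of ordinary least squares and all function names are assumptions made for this sketch, not taken from the book.

```python
import random
import statistics


def fit_line(xs, ys):
    """Ordinary least-squares estimates (a_hat, b_hat) for y = a + b*x."""
    x_bar = statistics.fmean(xs)
    y_bar = statistics.fmean(ys)
    sxx = sum((x - x_bar) ** 2 for x in xs)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    b_hat = sxy / sxx
    a_hat = y_bar - b_hat * x_bar
    return a_hat, b_hat


def run_experiment(n_runs, a, b, noise_sd, rng):
    """Simulate n_runs input/output pairs with random fluctuation, then fit."""
    # Hypothetical design: inputs spread evenly over [0, 1].
    xs = [i / (n_runs - 1) for i in range(n_runs)]
    # Assumed Gaussian fluctuation about the underlying straight line.
    ys = [a + b * x + rng.gauss(0.0, noise_sd) for x in xs]
    return fit_line(xs, ys)


def slope_spread(n_runs, a=1.0, b=2.0, noise_sd=0.5, repeats=2000, seed=1):
    """Standard deviation of the slope estimate over many simulated experiments."""
    rng = random.Random(seed)
    slopes = [run_experiment(n_runs, a, b, noise_sd, rng)[1] for _ in range(repeats)]
    return statistics.stdev(slopes)


# Without fluctuation, two runs are enough to recover a and b (the deductive case).
exact_a, exact_b = fit_line([0.0, 1.0], [1.0, 3.0])

# With fluctuation, a and b can only be estimated; the spread of the
# slope estimate is computed here for experiments of increasing size.
spreads = {n: slope_spread(n) for n in (5, 20, 80)}

if __name__ == "__main__":
    print("noise-free fit:", exact_a, exact_b)
    for n, sd in spreads.items():
        print(f"{n:3d} runs: sd of estimated slope = {sd:.3f}")
```

Running the script prints the noise-free fit followed by the spread of the estimated slope for each experiment size. Under the assumptions above, the spread should shrink as the number of runs grows, which is the sense in which the uncertainty in a and b diminishes with more data.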
