
Ten Projects in Applied Statistics PDF

415 Pages·2023·10.135 MB·English

Preview Ten Projects in Applied Statistics

Springer Series in Statistics

Peter McCullagh
Ten Projects in Applied Statistics

Series Editors
Peter Bühlmann, Seminar für Statistik, ETH Zürich, Zürich, Switzerland
Peter Diggle, Dept. Mathematics, University Lancaster, Lancaster, UK
Ursula Gather, Dortmund, Germany
Scott Zeger, Baltimore, MD, USA

Springer Series in Statistics (SSS) is a series of monographs of general interest that discuss statistical theory and applications. The series editors are currently Peter Bühlmann, Peter Diggle, Ursula Gather, and Scott Zeger. Peter Bickel, Ingram Olkin, and Stephen Fienberg were editors of the series for many years.

Peter McCullagh
Department of Statistics
University of Chicago
Chicago, IL, USA

ISSN 0172-7397
ISSN 2197-568X (electronic)
ISBN 978-3-031-14274-1
ISBN 978-3-031-14275-8 (eBook)
https://doi.org/10.1007/978-3-031-14275-8

© The Editor(s) (if applicable) and The Author(s), under exclusive license to Springer Nature Switzerland AG 2022

This work is subject to copyright. All rights are solely and exclusively licensed by the Publisher, whether the whole or part of the material is concerned, specifically the rights of translation, reprinting, reuse of illustrations, recitation, broadcasting, reproduction on microfilms or in any other physical way, and transmission or information storage and retrieval, electronic adaptation, computer software, or by similar or dissimilar methodology now known or hereafter developed.

The use of general descriptive names, registered names, trademarks, service marks, etc. in this publication does not imply, even in the absence of a specific statement, that such names are exempt from the relevant protective laws and regulations and therefore free for general use.

The publisher, the authors, and the editors are safe to assume that the advice and information in this book are believed to be true and accurate at the date of publication. Neither the publisher nor the authors or the editors give a warranty, expressed or implied, with respect to the material contained herein or for any errors or omissions that may have been made. The publisher remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

This Springer imprint is published by the registered company Springer Nature Switzerland AG
The registered company address is: Gewerbestrasse 11, 6330 Cham, Switzerland

To Rosa

Preface

Goals

The book begins with ten chapters, each devoted to a detailed discussion of a specific project. Some of these are projects that arose as part of the statistical consulting program at the University of Chicago; others are taken from recent publications in the scientific literature. The discussion specifically covers analyses that might seem superficially plausible, but are in fact misleading.

The areas of application range from medical and biological sciences to animal behavior studies, growth curves, time series, and environmental work. Statistical techniques are kept as simple as is reasonable to do justice to the scientific goals. They range from summary tables and graphs to linear models, generalized linear models, variance-component models, time series, spatial processes, and so on. Recognition of relationships among the observational units and the need to accommodate correlations among responses is a recurring theme.
The second half of the book begins by discussing a range of fundamental considerations that shape my attitude to applied statistics. Once they are pointed out, these matters appear so simple and obvious that it is hard to explain why a detailed discussion should be needed in an advanced-level text. But the simple fact that they are so frequently overlooked shows that this attitude is unhelpful. Most such matters are related to statistical design, the placement of the baseline, the identification of observational units and experimental units, the role of covariates and relationships, initial values, randomization and treatment assignment, and so on. Others are related to the interplay between design and modelling: Is the proposed model compatible with the design? More technical matters related to stochastic processes, including stationarity and isotropy, techniques for constructing spatio-temporal processes, likelihood functions, and so on are covered in later chapters. Parametric inference is important, but it is not a major or primary focus. More important by far is to estimate the right thing, however inefficiently, than to estimate the wrong thing with maximum efficiency.

The book is aimed at professional statisticians and at students who are already familiar with linear models but who wish to gain experience in the application of statistical ideas to scientific research. It is also aimed at scientific researchers in ecology, biology, or medicine who wish to use appropriate statistical methods in their work.

The primary emphasis is on the translation of experimental concepts and scientific ideas into stochastic models, and ultimately into statistical analyses and substantive conclusions. Although numerical computation plays a major role, it is not a driving force.

The aim of the book is not so much to illustrate algorithms or techniques of computation, but to illustrate the role of statistical thinking and stochastic modelling in scientific research.
Typically, that cannot be accomplished without a detailed discussion of the scientific objectives, the experimental design, randomization, treatment assignment, baseline factors, response measurement, and so on. Before settling on a standard family of stochastic processes, the statistician must first ask whether the model is adequate as a formulation of the scientific goals. Is it self-consistent? Is it in conflict with randomization? Does it address adequately the sexual asymmetry in Drosophila courtship rituals? Is the sampling scheme adequate for the stated objective? A glance at Chaps. 1–5 shows the extent to which the discussion must be tailored to the specific needs of the application, and the unfortunate consequences of adopting an off-the-shelf model in a routine way. As D. R. Cox once remarked at an ISI meeting in 1979, “There are no routine statistical questions, only questionable statistical routines.”

Every analysis and every stochastic model is open to criticism on the grounds that it is not a good match for the application. A perfect match is a rarity, so compromise is needed in every setting, and a balance must be struck in order to proceed. At various points, I have included plausible analyses that are deficient, inappropriate, misleading, or simply incorrect. The hope is that students might learn not only from their own mistakes but from the mistakes of others.

Computation

The R package (R Core Team, 2015) is used throughout the book for all plots and computations. Initial data summaries may use built-in functions such as apply(...) or tapply(...) for one- and two-way tables of averages, or plot(...) for plots of seasonal trends or residuals. Apart from standard functions such as lm(...) for fitting linear models, glm(...) for generalized linear models, and fft(...) for the fast Fourier transformation, two additional packages are used for specialized purposes:

1. regress(...) (Clifford and McCullagh, 2006) for fitting linear Gaussian models having non-trivial spatial or temporal correlations;
2. lmer(...) (Bates et al., 2015) for fitting linear and generalized linear random-effects models.

Either package may be used for fitting the models in Chaps. 1 and 2; glm() is adequate for most computations in Chaps. 3 and 7; regress() is better suited to the needs of Chaps. 4 and 5; and lmer() is better suited for Chap. 9.

Organization

The book is not organized linearly by topic. From a logical standpoint, it might have been intellectually more satisfying to begin at the beginning with Chap. 11 and to illustrate the various statistical design concepts with examples drawn from the literature. That option was considered and quickly abandoned. A deliberate choice has been made to put as much emphasis on the projects as on the statistical methods and to draw upon whatever statistical techniques are required as and when they are required. For that reason, the projects come first. Thus, a reader who is unsure of the distinction between covariates and relationships or between observational and experimental units or the implications of those distinctions may consult the relevant portions of Chap. 11.

Several of the projects are taken from experiments reported in the recent scientific literature. In most cases, the experimental design is fairly easy to understand when the structure of the units is properly laid out. The importance of accommodating correlations arising from temporal or other relationships among the units is a recurring theme. And if that is the only lesson learned, the book will have served a useful purpose.

Acknowledgments

My attitude toward applied statistics has been shaped over many years by discussions with colleagues, particularly David Wallace, Mike Stein, Steve Stigler, Mei Wang, and Colm O’Muircheartaigh. The books on experimental design by Cox (1958), Mead (1988) and Bailey (2008), and the pair of papers Nelder (1965a,b) on randomized field experiments have been particularly influential. Cox and Snell (1981) is a trove of statistical wisdom and unteachable common sense.

For the past 25 years, I have worked closely with my colleague Mei Wang, encouraging graduate students and advising research workers on projects brought to the statistical consulting program at the University of Chicago. Over that period, we have given advice on perhaps 500 projects from a wide range of researchers, from all branches of the Physical and Biological Sciences to the Social Sciences, Public Policy, Law, Humanities, and even the Divinity School. All of the attitudes expressed in these notes are the product of direct experiences with researchers, plus discussions with students and colleagues. Parts of three consulting projects are included in Chaps. 1, 3 and 10.

Other themes and whole chapters have emerged from an advanced graduate course taught at the University of Chicago, in which students are encouraged to work on a substantial applied project taken from the recent scientific literature. Chapter 5 is based on a course project selected by Wei Kuang in 2020. I have included a detailed analysis of this project because it raises a number of technical issues concerning factorial models that are well understood but seldom adequately emphasized in the applied statistical literature. Chapter 9 is based on course projects by Dongyue Xie, Irina Cristali, Lin Gui, and Y. Wei in 2018–2020. Chapter 16 was motivated by a 2020 masters project by Ben Harris on spatio-temporal variation in summer solar irradiance and its effect on solar power generation in downstate Illinois. All of the attitudes and opinions expressed in these analyses are mine alone.

Over the past few years, several students have tackled various aspects of the Out-of-Africa project. Some have gone beyond the call of duty, including Shane Miao for her Masters project at the University of Oxford in 2016, and Josephine Santoso for her Masters project at the University of Chicago in 2022.

I am indebted to students and colleagues, particularly Heather Battey, Laurie Butler, Emma McCullagh, Mike Stein, Steve Stigler, and Mei Wang, for reading and commenting on an earlier draft of the manuscript.

Chicago, IL, USA
Peter McCullagh
