Analysis of Variance, Design, and Regression: Applied Statistical Methods Ronald Christensen Department of Mathematics and Statistics University of New Mexico v To Mark, Karl, and John It’s been great fun. Contents Preface xiii 1 Introduction 1 1.1 Probability 1 1.2 Randomvariablesandexpectations 4 1.2.1 Expectedvaluesandvariances 6 1.2.2 Chebyshev’sinequality 9 1.2.3 Covariancesandcorrelations 9 1.2.4 Rulesforexpectedvaluesandvariances 11 1.3 Continuousdistributions 13 1.4 Thebinomialdistribution 15 1.5 Themultinomialdistribution 20 1.6 Exercises 23 2 Onesample 27 2.1 Exampleandintroduction 27 2.2 Inferenceaboutµ 31 2.2.1 Confidenceintervals 33 2.2.2 Hypothesistests 35 2.3 Predictionintervals 41 2.4 Checkingnormality 43 2.5 Transformations 51 2.6 Inferenceaboutσ2 55 2.7 Exercises 58 3 Ageneraltheoryfortestingandconfidenceintervals 61 3.1 Theoryforconfidenceintervals 62 3.2 Theoryforhypothesistests 67 3.3 Validityoftestsandconfidenceintervals 75 3.4 Therelationshipbetweenconfidenceintervalsandtests 76 3.5 Theoryofpredictionintervals 77 3.6 Samplesizedeterminationandpower 80 3.7 Exercises 83 4 Twosampleproblems 85 4.1 Twocorrelatedsamples:pairedcomparisons 85 4.2 Twoindependentsampleswithequalvariances 88 4.3 Twoindependentsampleswithunequalvariances 94 4.4 Testingequalityofthevariances 98 4.5 Exercises 101 vii viii CONTENTS 5 One-wayanalysisofvariance 107 5.1 Introductionandexamples 107 5.1.1 Theory 113 5.1.2 BalancedANOVA:introductoryexample 117 5.1.3 Analyticandenumerativestudies 120 5.2 Balancedone-wayanalysisofvariance:theory 121 5.2.1 Theanalysisofvariancetable 125 5.3 Unbalancedanalysisofvariance 127 5.4 Choosingcontrasts 129 5.5 Comparingmodels 134 5.6 ThepoweroftheanalysisofvarianceF test 136 5.7 Exercises 137 6 Multiplecomparisonmethods 143 6.1 Fisher’sleastsignificantdifferencemethod 145 6.2 Bonferroniadjustments 149 6.3 Studentizedrangemethods 151 6.3.1 Tukey’shonestsignificantdifference 152 6.3.2 Newman–Keulsmultiplerangemethod 154 6.4 Scheffe´’smethod 155 6.5 Othermethods 157 6.5.1 Ott’sanalysisofmeansmethod 157 6.5.2 Dunnett’smany-onet statisticmethod 158 6.5.3 Duncan’smultiplerangemethod 159 6.6 Summaryofmultiplecomparisonprocedures 159 6.7 Exercises 160 7 Simplelinearandpolynomialregression 163 7.1 Anexample 163 7.2 Thesimplelinearregressionmodel 168 7.3 Estimationofparameters 168 7.4 Theanalysisofvariancetable 171 7.5 Inferentialprocedures 172 7.6 Analternativemodel 174 7.7 Correlation 175 7.8 Recognizingrandomness:simulateddatawithzerocorrelation 178 7.9 Checkingassumptions:residualanalysis 183 7.10 Transformations 194 7.10.1 Box–Coxtransformations 196 7.11 Polynomialregression 201 7.12 Polynomialregressionandone-wayANOVA 208 7.13 Exercises 220 8 Theanalysisofcountdata 225 8.1 Onebinomialsample 225 8.1.1 Thesigntest 228 8.2 Twoindependentbinomialsamples 228 8.3 Onemultinomialsample 231 8.4 Twoindependentmultinomialsamples 233 8.5 Severalindependentmultinomialsamples 236 8.6 Lancaster–Irwinpartitioning 239 8.7 Logisticregression 245 CONTENTS ix 8.8 Exercises 250 9 Basicexperimentaldesigns 253 9.1 Completelyrandomizeddesigns 255 9.2 Randomizedcompleteblockdesigns 255 9.3 Latinsquaredesigns 267 9.4 Discussionofexperimentaldesign 274 9.5 Exercises 275 10 Analysisofcovariance 281 10.1 Anexample 281 10.2 Analysisofcovarianceindesignedexperiments 286 10.3 Computationsandcontrasts 287 10.4 PowertransformationsandTukey’sonedegreeoffreedom 291 10.5 Exercises 295 11 Factorialtreatmentstructures 299 11.1 Twofactors 299 11.2 Two-wayanalysisofvariancewithreplication 315 11.3 Multifactorstructures 318 11.4 ExtensionsofLatinsquares 329 11.5 Exercises 332 12 Splitplots,repeatedmeasures,randomeffects,andsubsampling 337 12.1 Theanalysisofsplitplotdesigns 337 12.2 Afour-factorsplitplotanalysis 348 12.3 Multivariateanalysisofvariance 367 12.4 Randomeffectsmodels 374 12.4.1 Subsampling 374 12.4.2 Randomeffects 376 12.5 Exercises 378 13 Multipleregression:introduction 383 13.1 Exampleofinferentialprocedures 383 13.2 Regressionsurfacesandprediction 386 13.3 Comparingregressionmodels 388 13.4 Sequentialfitting 393 13.5 Reducedmodelsandprediction 395 13.6 Partialcorrelationcoefficientsandaddedvariableplots 396 13.7 Collinearity 397 13.8 Exercises 399 14 Regressiondiagnosticsandvariableselection 405 14.1 Diagnostics 405 14.2 Bestsubsetmodelselectionmethods 412 14.2.1 R2statistic 412 14.2.2 AdjustedR2statistic 413 14.2.3 Mallows’sC statistic 415 p 14.2.4 Acombinedsubsetselectiontable 416 14.3 Stepwisemodelselectionmethods 417 14.3.1 Backwardselimination 417 14.3.2 Forwardselection 418 14.3.3 Stepwisemethods 419 x CONTENTS 14.4 Modelselectionandcasedeletion 419 14.5 Exercises 422 15 Multipleregression:matrixformulation 425 15.1 Randomvectors 425 15.2 Matrixformulationofregressionmodels 426 15.3 Leastsquaresestimationofregressionparameters 428 15.4 Inferentialprocedures 432 15.5 Residuals,standardizedresiduals,andleverage 435 15.6 Principalcomponentsregression 436 15.7 Weightedleastsquares 440 15.8 Exercises 445 16 Unbalancedmultifactoranalysisofvariance 447 16.1 Unbalancedtwo-wayanalysisofvariance 447 16.1.1 Proportionalnumbers 447 16.1.2 Generalcase 448 16.2 Balancedincompleteblockdesigns 456 16.3 Unbalancedmultifactoranalysisofvariance 463 16.4 Youdensquares 467 16.5 Matrixformulationofanalysisofvariance 470 16.6 Exercises 474 17 Confoundingandfractionalreplicationin2nfactorialsystems 477 17.1 Confounding 480 17.2 Fractionalreplication 489 17.3 Analysisofunreplicatedexperiments 494 17.4 Moreongraphicalanalysis 501 17.5 Augmentingdesignsforfactorsattwolevels 504 17.6 Exercises 505 18 Nonlinearregression 507 18.1 Introductionandexamples 507 18.2 Estimation 508 18.2.1 TheGauss–Newtonalgorithm 509 18.2.2 Maximumlikelihoodestimation 513 18.3 Statisticalinference 513 18.4 Linearizablemodels 523 18.5 Exercises 523 AppendixA:Matrices 525 A.1 Matrixadditionandsubtraction 526 A.2 Scalarmultiplication 526 A.3 Matrixmultiplication 526 A.4 Specialmatrices 528 A.5 Lineardependenceandrank 529 A.6 Inversematrices 530 A.7 Alistofusefulproperties 532 A.8 Eigenvaluesandeigenvectors 532 CONTENTS xi AppendixB:Tables 535 B.1 Tablesofthet distribution 536 B.2 Tablesoftheχ2distribution 538 B.3 TablesoftheW(cid:48)statistic 542 B.4 Tablesoforthogonalpolynomials 543 B.5 TablesoftheStudentizedrange 544 B.6 TheGreekalphabet 548 B.7 TablesoftheF distribution 549 AuthorIndex 567 SubjectIndex 569 Preface Thisbookexaminestheapplicationofbasicstatisticalmethods:primarilyanalysisofvarianceand regressionbutwithsomediscussionofcountdata.ItisdirectedprimarilytowardsMastersdegree students in statistics studying analysis of variance, design of experiments, and regression analy- sis.IhavefoundthattheMasterslevelregressioncourseisoftenpopularwithstudentsoutsideof statistics. These students are often weaker mathematically and the book caters to that fact while continuingtogiveacompletematrixformulationofregression. The book is complete enough to be used as a second course for upper division and beginning graduate students in statistics and for graduate students in other disciplines. To do this, one must beselectiveinthematerialcovered,butthemoretheoreticalmaterialappropriateonlyforStatistics Mastersstudentsisgenerallyisolatedinseparatesubsectionsand,lessoften,inseparatesections. ForaMasterslevelcourseinanalysisofvarianceanddesign,IhavethestudentsreviewChap- ter 2, I present Chapter 3 while simultaneously presenting the examples of Section 4.2, I present Chapters5and6,verybrieflyreviewthefirstfivesectionsofChapter7,presentSections7.11and 7.12indetailandthenIcoverChapters9,10,11,12,and17.Dependingontimeconstraints,Iwill deletematerialoraddmaterialfromChapter16. ForaMasterslevelcourseinregressionanalysis,IagainhavethestudentsreviewChapter2and IreviewChapter3withexamplesfromSection4.2.IthenpresentChapters7,13,and14,Appendix A, Chapter 15, Sections 16.1.2, 16.3, 16.5 (along with analysis of covariance), Section 8.7 and finallyChapter18.Allofthisisdoneincompletedetail.IfanytimeremainsIliketosupplement thecoursewithdiscussionofresponsesurfacemethods. Asasecondcourseforupperdivisionandbeginninggraduatestudentsinstatisticsandgraduate students in other disciplines, I cover the first eight chapters with omission of the more technical material. A follow up course covers the less technical aspects of Chapters 9 through 15 and Ap- pendixA. Ithinkthebookisreasonablyencyclopedic.ItreallycontainseverythingIwouldlikemystu- dents to know about applied statistics prior to them taking courses in linear model theory or log- linearmodels. I believe that beginning students (even Statistics Masters students) often find statistical proce- dures to be a morass of vaguely related special techniques. As a result, this book focuses on four connectingthemes. 1. Most inferential procedures are based on identifying a (scalar) parameter of interest, estimat- ingthatparameter,obtainingthestandarderroroftheestimate,andidentifyingtheappropriate referencedistribution.Giventheseitems,theinferentialproceduresareidenticalforvariouspa- rameters. 2. Balanced one-way analysis of variance has a simple, intuitive interpretation in terms of com- paringthesamplevarianceofthegroupmeanswiththemeanofthesamplevariancesforeach group.Allbalancedanalysisofvarianceproblemsareconsideredintermsofcomputingsample variancesforvariousgroupmeans. 3. Comparing different models provides a structure for examining both balanced and unbalanced analysis of variance problems and for examining regression problems. In some problems the mostreasonableanalysisissimplytofindasuccinctmodelthatfitsthedatawell. 4. Checkingassumptionsisacrucialpartofeverystatisticalanalysis. xiii xiv PREFACE The object of statistical data analysis is to reveal useful structure within the data. In a model- based setting,I know of twoways to do this.One way isto find a succinct model forthe data. In such a case, the structure revealed is simply the model. The model selection approach is particu- larlyappropriatewhentheultimategoaloftheanalysisismakingpredictions.Thisbookusesthe model selection approach for multiple regression and for general unbalanced multifactor analysis ofvariance.Theotherapproachtorevealingstructureistostartwithageneralmodel,identifyin- terestingone-dimensionalparameters,andperformstatisticalinferencesontheseparameters.This parametricapproachrequiresthatthegeneralmodelinvolveparametersthatareeasilyinterpretable. Weusetheparametricapproachforone-wayanalysisofvariance,balancedmultifactoranalysisof variance,andsimplelinearregression.Inparticular,theparametricapproachtoanalysisofvariance presentedhereinvolvesastrongemphasisonexaminingcontrasts,includinginteractioncontrasts. Inanalyzingtwo-waytablesofcounts,weuseapartitioningmethodthatisanalogoustolookingat contrasts. Allstatisticalmodelsinvolveassumptions.Checkingthevalidityoftheseassumptionsiscrucial because the models we use are never correct. We hope that our models are good approximations to the true condition of the data and experience indicates that our models often work very well. Nonetheless,tohavefaithinouranalyses,weneedtocheckthemodelingassumptionsasbestwe can.Someassumptionsareverydifficulttoevaluate,e.g.,theassumptionthatobservationsarestatis- ticallyindependent.Forcheckingotherassumptions,avarietyofstandardtoolshasbeendeveloped. Usingthesetoolsisasintegraltoaproperstatisticalanalysisasisperforminganappropriateconfi- denceintervalortest.Forthemostpart,usingmodel-checkingtoolswithouttheaidofacomputer ismoretroublethanmostpeoplearewillingtotolerate. My experience indicates that students gain a great deal of insight into balanced analysis of variancebyactuallydoingthecomputations.Thecomputationofthemeansquarefortreatmentsin abalancedone-wayanalysisofvarianceistrivialonanyhandcalculatorwithavarianceorstandard deviationkey.Moreimportantly,thecalculationreinforcesthefundamentalandintuitiveideabehind the balanced analysis of variance test, i.e., that a mean square for treatments is just a multiple of the sample variance of the corresponding treatment means. I believe that as long as students find the balanced analysis of variance computations challenging, they should continue to do them by hand(calculator).Ithinkthatautomatedcomputationshouldbemotivatedbyboredomratherthan bafflement. In addition to the four primary themes discussed above, there are several other characteristics thatIhavetriedtoincorporateintothisbook. I have tried to use examples to motivate theory rather than to illustrate theory. Most chapters beginwithdataandaninitialanalysisofthatdata.Afterillustratingresultsfortheparticulardata, wegobackandexaminegeneralmodelsandprocedures.Ihavedonethistomakethebookmore palatabletotwogroupsofpeople:thosewhoonlycareabouttheoryafterseeingthatitisusefuland thoseunfortunateswhocanneverbringthemselvestocareabouttheory.(TheolderIget,themoreI identifywiththefirstgroup.Asfortheothergroup,IfindmyselfagreeingwithW.EdwardsDeming that experience without theory teaches nothing.) As mentioned earlier, the theoretical material is generallyconfinedtoseparatesubsectionsor,lessoften,separatesections,soitiseasytoignore. Ibelievethattheultimategoalofallstatisticalanalysisispredictionofobservablequantities.I haveincorporatedpredictiveinferentialprocedureswheretheyseemednatural. The object of most statistics books is to illustrate techniques rather than to analyze data; this book is no exception. Nonetheless, I think we do students a disservice by not showing them a substantial portion of the work necessary to analyze even ‘nice’ data. To this end, I have tried to consistentlyexamineresidualplots,topresentalternativeanalysesusingdifferenttransformations and case deletions, and to give some final answers in plain English. I have also tried to introduce suchmaterialasearlyaspossible.Ihaveincludedreasonablydetailedexaminationsofathree-factor analysisofvarianceandofasplitplotdesignwithfourfactors.Ihaveincludedsomeexamplesin which,likereallife,thefinalanswersarenot‘neat.’WhileIhavetriedtointroducestatisticalideas assoonaspossible,Ihavetriedtokeepthemathematicsassimpleaspossibleforaslongaspossible. PREFACE xv Forexample,matrixformulationsarepostponedtothelastchapteronmultipleregressionandthe lastsectiononunbalancedanalysisofvariance. Ineverusesideconditionsornormalequationsinanalysisofvariance. Inmultiplecomparisonmethods,(weakly)controllingtheexperimentwiseerrorrateisdiscussed intermsoffirstperforminganomnibustestfornotreatmenteffectsandthenchoosingacriterionfor evaluatingindividualhypotheses.Mostmethodsconsidereddivideintothosethatusetheomnibus F test,thosethatusetheStudentizedrangetest,andtheBonferronimethod,whichdoesnotuseany omnibustest. I have tried to be very clear about the fact that experimental designs are set up for arbitrary groupsoftreatmentsandthatfactorialtreatmentstructuresaresimplyanefficientwayofdefining thetreatmentsinsomeproblems.Thus,thenatureofarandomizedcompleteblockdesigndoesnot dependonhowthetreatmentshappentobedefined.Theanalysisalwaysbeginswithabreakdown of the sum of squares into treatments, blocks, and error. Further analysis of the treatments then focusesonwhateverstructurehappenstobepresent. Theanalysisofcovariancechapterincludesanextensivediscussionofhowthecovariatesmust bechosentomaintainavalidexperiment.Tukey’sonedegreeoffreedomtestfornonadditivityis presented as an analysis of covariance test for the need to perform a power transformation rather thanasatestforaparticulartypeofinteraction. The chapter on confounding and fractional replication has more discussion of analyzing such datathanmanyotherbookscontain. Minitab commands are presented for most analyses. Minitab was chosen because I find it the easiestofthecommonpackagestouse.However,therealpointofincludingcomputercommandsis toillustratethekindsofthingsthatoneneedstospecifyforanycomputerprogramandthevarious auxiliarycomputationsthatmaybenecessaryfortheanalysis.Theotherstatisticalpackagesused increatingthebookwereBMDP,GLIM,andMSUSTAT. Acknowledgements Many people provided comments that helped in writing this book. My colleagues Ed Bedrick, Aparna Huzurbazar, Wes Johnson, Bert Koopmans, Frank Martin, Tim O’Brien, and Cliff Qualls helpedalot.IgotnumerousvaluablecommentsfrommystudentsattheUniversityofNewMex- ico.MarjorieBond,MattCooney,JeffS.Davis,BarbaraEvans,MikeFugate,JanMines,andJim Shieldsstandoutinthisregard.Thebookhadseveralanonymousreviewers,someofwhommade excellentsuggestions. IwouldliketothankMartinGilchristandSpringer-VerlagforpermissiontoreproduceExam- ple 7.6.1 from Plane Answers to Complex Questions: The Theory of Linear Models. I also thank theBiometrikaTrusteesforpermissiontousethetablesinAppendixB.5.ProfessorJohnDeelyand theUniversityofCanterburyinNewZealandwerekindenoughtosupportcompletionofthebook duringmysabbaticalthere. Now my only question is what to do with the chapters on quality control, pn factorials, and responsesurfacesthatendeduponthecuttingroomfloor. RonaldChristensen Albuquerque,NewMexico February1996
Description: