Baldi-4100190 psls January 13, 2014 14:23

CHAPTER 26

More about Analysis of Variance: Follow-up Tests and Two-Way ANOVA

IN THIS CHAPTER WE COVER...
■ Beyond one-way ANOVA
■ Follow-up analysis: Tukey pairwise multiple comparisons
■ Follow-up analysis: contrasts*
■ Two-way ANOVA: conditions, main effects, and interaction
■ Inference for two-way ANOVA
■ Some details of two-way ANOVA*

Analysis of variance (ANOVA) is a statistical method for comparing the means of several populations based on independent random samples, or the mean responses to several treatments in a randomized comparative experiment. When we compare just two means, we use the two-sample t procedures described in Chapter 18. ANOVA allows comparison of any number of means. The basic form of ANOVA is one-way ANOVA, which treats the means being compared as mean responses to different levels of a single variable. For example, in Chapter 24 we used one-way ANOVA to compare the mean weights of adult male Wistar rats fed one of three types of diets. Figure 26.1 shows the Minitab ANOVA output for these data (displayed in Table 24.1, page 598).

Beyond one-way ANOVA

You should recall or review the big ideas of one-way ANOVA from Chapter 24. One-way ANOVA compares the means $\mu_1, \mu_2, \ldots, \mu_k$ of $k$ populations based on samples of sizes $n_1, n_2, \ldots, n_k$ from these populations.

■ Using separate two-sample t procedures to compare many pairs of means is a bad idea, because we don't have a P-value or a confidence level for the complete set of comparisons together. This is the problem of multiple comparisons.
■ One-way ANOVA gives a single test for the null hypothesis that all the population means are the same against the alternative hypothesis that not all are the same ($H_0$ simply is not true).
■ ANOVA works by comparing how far apart the sample means are relative to the variation among individual observations in the same sample. The test statistic is the ANOVA F statistic

$$F = \frac{\text{variation among the sample means}}{\text{variation among individuals in the same sample}}$$

The P-value comes from an F distribution.
■ The required conditions for ANOVA are independent random samples from each of the $k$ populations (or a randomized comparative experiment with $k$ treatments), Normal distributions for the response variable in each population, and a common standard deviation $\sigma$ in all populations. Fortunately, ANOVA inference is quite robust against moderate violations of the Normality and common standard deviation conditions.
■ In basic statistical practice, we combine the F test with descriptive data analysis to check the conditions for ANOVA and to see which means appear to differ and by how much.

Examples 24.1 and 24.2 (pages 597 and 600) showed all the steps required for a one-way ANOVA. This chapter moves beyond basic one-way ANOVA in two directions.

Minitab Session

One-way ANOVA: Chow, Restricted, Extended

Source  DF      SS     MS      F      P
Factor   2   63400  31700  10.71  0.000
Error   47  139174   2961
Total   49  202573

S = 54.42   R-Sq = 31.30%   R-Sq(adj) = 28.37%

Level        N    Mean  StDev
Chow        19  605.63  49.64
Restricted  16  657.31  50.68
Extended    15  691.13  63.41

Pooled StDev = 54.42

[Plot omitted: individual 95% CIs for each mean, based on the pooled StDev, on an axis running from 595 to 700.]

FIGURE 26.1 Minitab ANOVA output for the rat weight data of Examples 24.1 and 24.2.

Follow-up analysis. The ANOVA F test in Figure 26.1 tells us only that the population means are not all the same. We would like to say which means differ and by how much. For example, do the data allow us to say that the "extended diet" population does have a higher mean weight than the "chow diet" and the "restricted diet" populations of adult male Wistar lab rats? This is a follow-up analysis to the F test that goes beyond data analysis to confidence intervals and tests of significance for specific comparisons of means.
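The ANOVA table in Figure 26.1 can be rebuilt, up to rounding, from the group sizes, means, and standard deviations that Minitab reports. The sketch below is only an illustration of that arithmetic: the function name `one_way_anova` and the dictionary keys are hypothetical, and the calculation assumes the usual one-way ANOVA sums of squares from Chapter 24.

```python
import math

# Summary statistics copied from the Minitab output in Figure 26.1:
# (group label, sample size, sample mean, sample standard deviation)
groups = [
    ("Chow", 19, 605.63, 49.64),
    ("Restricted", 16, 657.31, 50.68),
    ("Extended", 15, 691.13, 63.41),
]


def one_way_anova(groups):
    """Rebuild the one-way ANOVA table from group summaries (hypothetical helper)."""
    k = len(groups)
    total_n = sum(n for _, n, _, _ in groups)
    grand_mean = sum(n * mean for _, n, mean, _ in groups) / total_n

    # Variation among the sample means (the "Factor" row).
    ss_groups = sum(n * (mean - grand_mean) ** 2 for _, n, mean, _ in groups)
    # Variation among individuals in the same sample (the "Error" row).
    ss_error = sum((n - 1) * sd ** 2 for _, n, _, sd in groups)

    ms_groups = ss_groups / (k - 1)
    ms_error = ss_error / (total_n - k)
    return {
        "df_groups": k - 1,
        "df_error": total_n - k,
        "SS_groups": ss_groups,
        "SS_error": ss_error,
        "MS_groups": ms_groups,
        "MS_error": ms_error,
        "F": ms_groups / ms_error,
        "pooled_sd": math.sqrt(ms_error),  # s_p is the square root of MSE
    }


result = one_way_anova(groups)
for key, value in result.items():
    print(f"{key:>10}: {value:.2f}")
```

Because the inputs are rounded summaries, the results should agree with Figure 26.1 only approximately (F close to 10.71 and a pooled standard deviation close to 54.42).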
Two-way ANOVA. One-way ANOVA compares mean responses for several levels of just one explanatory variable. In Examples 24.1 and 24.2, that variable is "the type of diet provided." Suppose that we have data on two explanatory variables, say, the type of diet provided and whether the lab rats are physically active. There are now 6 groups formed by combinations of diet type and physical activity, as follows:

                          Variable 2
                          Active    Inactive
Variable 1   Chow         Group 1   Group 2
             Restricted   Group 3   Group 4
             Extended     Group 5   Group 6

One-way ANOVA will still tell us if there is evidence that mean body weight in these 6 experimental groups differs. But we want more: Does diet type matter? Does physical activity matter? And do these two variables interact? That is, does the effect of diet type change when the lab rats are physically active? Perhaps physical activity reduces the craving for cafeteria food, so that diet type has less effect when the rats are active than when they are inactive. To answer these questions we must extend ANOVA to take into account the fact that the 6 groups are formed from two explanatory variables. This is two-way ANOVA.

We will discuss follow-up analysis in ANOVA first, and then two-way ANOVA. Fortunately, the distinction between one-way and two-way doesn't affect the follow-up methods we will present. So once you have mastered these methods in the one-way setting, you can apply them immediately to two-way problems.

Follow-up analysis: Tukey pairwise multiple comparisons

In Examples 24.1 and 24.2 we saw that there is good evidence that the mean body weight of adult male Wistar rats is not the same when they are assigned to a diet consisting of chow only, chow plus restricted access to cafeteria food, or chow plus extended access to cafeteria food.¹ The sample means in Figure 26.1 suggest that (as we might expect) the mean body weight is highest in rats given extended access to cafeteria food and lowest in rats given chow only.
EXAMPLE 26.1 Comparing groups: individual t procedures

Let's use A, B, and C to refer to the chow, restricted, and extended groups, respectively. How much higher is the mean body weight of rats given restricted access to cafeteria food than that of rats given chow only? A 95% confidence interval comparing Groups A and B answers this question. Because the conditions for ANOVA require that the population standard deviation be the same in all three populations of rats, we can use a version of the two-sample t confidence interval that also assumes equal standard deviations.

The Minitab output in Figure 26.1 gives the pooled standard deviation (first defined in Chapters 18 and 24, pages 453 and 616) as $s_p = 54.42$ grams (g). This is an estimate of the common standard deviation $\sigma$ based on all three samples. It has 47 degrees of freedom, the degrees of freedom for "Error" in the ANOVA table. The standard error for the difference in sample means $\bar{x}_A - \bar{x}_B$ is (page 453)

$$s_p \sqrt{\frac{1}{n_A} + \frac{1}{n_B}}$$

A 95% confidence interval for $\mu_A - \mu_B$ would therefore be

$$(\bar{x}_A - \bar{x}_B) \pm t^* s_p \sqrt{\frac{1}{n_A} + \frac{1}{n_B}}$$

using $t^* = 2.012$ from technology (or approximately $t^* = 2.021$ from Table C for df = 40, conservatively, since there is no row for df = 47 in Table C).

However, we really want to estimate all three pairwise differences among the population means,

$$\mu_A - \mu_B \qquad \mu_A - \mu_C \qquad \mu_B - \mu_C$$

Three 95% confidence intervals will not give us 95% confidence that all three simultaneously capture their true parameter values. This is the familiar problem of multiple comparisons that we discussed in Chapters 22 and 24. ■

In general, we want to give confidence intervals for all pairwise differences among the population means $\mu_1, \mu_2, \ldots, \mu_k$ of $k$ populations. We want an overall confidence level of (say) 95%. That is, in very many uses of the method, all the intervals will simultaneously capture the true differences 95% of the time.
To do this, take the number of comparisons into account by replacing the t critical value $t^*$ in Example 26.1 with another critical value based on the distribution of the difference between the largest and smallest of a set of $k$ sample means. We will call this critical value $m^*$, for multiple comparisons. Values of $m^*$ no longer come from a t table. They depend on the number of populations we are comparing and on the total number of observations in the samples, as well as on the confidence level we want. Software is very helpful for practical use. This method is named after its inventor, John Tukey (1915–2000), the same man who developed the ideas of modern data analysis. A short table of $m^*$ values for a 95% confidence level (Table G) is provided for convenience at the end of this chapter.

TUKEY PAIRWISE MULTIPLE COMPARISONS

In the ANOVA setting, we have independent SRSs of size $n_i$ from each of $k$ populations having Normal distributions with means $\mu_i$ and a common standard deviation $\sigma$. Tukey simultaneous confidence intervals for all pairwise differences $\mu_i - \mu_j$ among the population means have the form

$$(\bar{x}_i - \bar{x}_j) \pm m^* s_p \sqrt{\frac{1}{n_i} + \frac{1}{n_j}}$$

Here $\bar{x}_i$ is the sample mean of the $i$th sample and $s_p$ is the pooled estimate of $\sigma$. The critical value $m^*$ depends on the confidence level $C$, the number of populations $k$, and the total number of observations $N$.

If all samples are the same size, the Tukey simultaneous confidence intervals provide an overall level $C$ of confidence that all the intervals simultaneously capture the true pairwise differences. If the samples differ in size, the true confidence level is at least as large as $C$. That is, the conclusions are then conservative.

To carry out simultaneous tests of the hypotheses

$$H_0: \mu_i = \mu_j \qquad H_a: \mu_i \neq \mu_j$$

for all pairs of population means, reject $H_0$ for any pair whose confidence interval does not contain 0. These tests have overall significance level no greater than $1 - C$. That is, the probability that, when all the population means are equal, any of the tests incorrectly rejects its null hypothesis is at most $1 - C$ (and exactly $1 - C$ when all samples are the same size).
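To see what replacing $t^*$ with $m^*$ does to a single interval, the sketch below applies the boxed formula to the restricted-minus-chow difference from Figure 26.1, once with the individual critical value $t^* = 2.012$ from Example 26.1 and once with $m^* = 2.434$, the Table G value for $k = 3$ and df = 40 quoted in Example 26.2. The helper name `pairwise_interval` is hypothetical, and the critical values are typed in from the text rather than computed.

```python
import math


def pairwise_interval(mean_i, mean_j, n_i, n_j, pooled_sd, critical_value):
    """Interval for mu_i - mu_j: difference +/- critical value * standard error."""
    difference = mean_i - mean_j
    standard_error = pooled_sd * math.sqrt(1 / n_i + 1 / n_j)
    margin = critical_value * standard_error
    return difference - margin, difference + margin


s_p = 54.42      # pooled standard deviation from Figure 26.1
t_star = 2.012   # individual 95% critical value, df = 47 (Example 26.1)
m_star = 2.434   # Tukey value from Table G, k = 3, df = 40 (Example 26.2)

# Restricted (n = 16, mean 657.31) minus chow (n = 19, mean 605.63).
individual = pairwise_interval(657.31, 605.63, 16, 19, s_p, t_star)
simultaneous = pairwise_interval(657.31, 605.63, 16, 19, s_p, m_star)

print("individual t interval: %.2f to %.2f" % individual)
print("Tukey interval:        %.2f to %.2f" % simultaneous)
```

The Tukey interval is the wider of the two, because the larger critical value buys confidence in the whole family of intervals rather than in one interval at a time.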
Minitab Session

Tukey 95% Simultaneous Confidence Intervals
All Pairwise Comparisons

Individual confidence level = 98.05%

Chow subtracted from:
             Lower  Center   Upper
Restricted    7.03   51.68   96.33
Extended     40.05   85.50  130.95

Restricted subtracted from:
             Lower  Center   Upper
Extended    −13.47   33.82   81.12

[Plots omitted: each interval drawn on an axis running from −60 to 120.]

FIGURE 26.2 Additional Minitab ANOVA output showing Tukey pairwise multiple comparisons for the rat weight data, for Example 26.2.

EXAMPLE 26.2 Rats and a cafeteria-style diet: multiple intervals

Figure 26.2 contains more Minitab output for the ANOVA comparing the mean body weights in 3 experimental groups of lab rats. We asked for Tukey multiple comparisons with an overall error rate of 5%. That is, the overall confidence level for the three intervals together is 95%.

The format of the Minitab output takes some study. Be sure you can see that the Tukey confidence intervals are

   7.03 to  96.33   for $\mu_B - \mu_A$
  40.05 to 130.95   for $\mu_C - \mu_A$
 −13.47 to  81.12   for $\mu_C - \mu_B$

If you do not have access to technology, these intervals can easily be computed by hand. Let's see how to obtain the first interval, for $\mu_B - \mu_A$. Table G at the end of this chapter gives values of $m^*$ when using an overall confidence level of 95% and various combinations of $N$ and $k$. Start by finding the right combination of $k$ populations being compared (top row) and $N - k$ degrees of freedom (left margin). In our example, $k = 3$ and $N - k = 47$, so $m^* = 2.434$, approximately (based on a conservative df = 40, since df = 47 is not available). The interval for $\mu_B - \mu_A$ is therefore

$$(\bar{x}_B - \bar{x}_A) \pm m^* s_p \sqrt{\frac{1}{n_B} + \frac{1}{n_A}} = (657.31 - 605.63) \pm (2.434)(54.42)\sqrt{\frac{1}{16} + \frac{1}{19}} = 51.68 \pm 44.94 = 6.74 \text{ to } 96.62$$

This is slightly wider than the Minitab interval (7.03 to 96.33) because the table value of $m^*$ for df = 40 is a little larger than the exact value for df = 47. Notice that the value of $m^*$ we use here is larger than the value of $t^*$ in Example 26.1.
This is the price we pay for having 95% confidence not just in one interval but in all three simultaneously. ■

EXAMPLE 26.3 Rats and a cafeteria-style diet: multiple tests

The ANOVA null hypothesis is that all population means are equal,

$$H_0: \mu_A = \mu_B = \mu_C$$

We know from the output in Figure 26.1 that the ANOVA F test rejects this hypothesis (F = 10.71, P < 0.0005). So we have good evidence that some pairs of means are not the same. Which pairs? Look at the simultaneous 95% confidence intervals in Example 26.2. Which of these intervals do not contain 0? If an interval does not contain 0, we reject the hypothesis that this pair of population means are equal.

The conclusions are

We can reject $H_0: \mu_B = \mu_A$
We can reject $H_0: \mu_C = \mu_A$
We cannot reject $H_0: \mu_C = \mu_B$

That is,

We do have enough evidence to conclude that $\mu_A \neq \mu_B$
We do have enough evidence to conclude that $\mu_A \neq \mu_C$
We do not have enough evidence to conclude that $\mu_B \neq \mu_C$

This Tukey simultaneous test of three null hypotheses has the property that when all three hypotheses are true, there is at most a 5% probability that any of the three tests wrongly rejects its hypothesis. ■

Recall what a test at a fixed significance level such as 5% tells us: either we do have enough evidence to reject the null hypothesis, or the data do not give enough evidence to allow rejection.

The study found evidence that rats on a chow-only diet differ significantly in body weight from rats given restricted access and from rats given extended access to a cafeteria-style diet. However, the study did not find evidence that restricted and extended access to cafeteria food result in rats with significantly different body weights. That is, $\bar{x}_A = 605.63$ and $\bar{x}_B = 657.31$ are far enough apart to conclude that the population means $\mu_A$ and $\mu_B$ differ, $\bar{x}_A = 605.63$ and $\bar{x}_C = 691.13$ are far enough apart to conclude that the population means $\mu_A$ and $\mu_C$ differ, but $\bar{x}_B = 657.31$ is not far enough from $\bar{x}_C = 691.13$ to rule out the possibility that the population means $\mu_B$ and $\mu_C$ might be the same.
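The hand calculation of Example 26.2 and the reject-or-not decisions of Example 26.3 can be carried through for all three pairs at once. The following is a minimal sketch, assuming the conservative Table G value $m^* = 2.434$ (so the intervals come out slightly wider than Minitab's); the function name `tukey_pairs` and the result layout are hypothetical.

```python
import math
from itertools import combinations

# Group label -> (sample size, sample mean), from Figure 26.1.
groups = {
    "A": (19, 605.63),  # chow
    "B": (16, 657.31),  # restricted
    "C": (15, 691.13),  # extended
}
s_p = 54.42     # pooled standard deviation from Figure 26.1
m_star = 2.434  # Table G value for k = 3, df = 40 (conservative for df = 47)


def tukey_pairs(groups, pooled_sd, critical_value):
    """Return (label, lower, upper, reject) for every pairwise difference."""
    results = []
    for (name_i, (n_i, mean_i)), (name_j, (n_j, mean_j)) in combinations(
        groups.items(), 2
    ):
        difference = mean_j - mean_i  # later group minus earlier, as Minitab does
        margin = critical_value * pooled_sd * math.sqrt(1 / n_i + 1 / n_j)
        lower, upper = difference - margin, difference + margin
        # Reject H0: mu_i = mu_j when the interval does not contain 0.
        reject = not (lower <= 0 <= upper)
        results.append((f"mu_{name_j} - mu_{name_i}", lower, upper, reject))
    return results


results = tukey_pairs(groups, s_p, m_star)
for label, lower, upper, reject in results:
    decision = "reject H0" if reject else "cannot reject H0"
    print(f"{label}: {lower:8.2f} to {upper:8.2f}  ->  {decision}")
```

With these inputs the first two intervals exclude 0 and the third does not, which matches the conclusions stated in Example 26.3.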
Notice that the Tukey method does not give a P-value for the three tests taken together. Rather, we have a set of "reject" or "fail to reject" conclusions with an overall significance level that we fixed in advance, 5% in this example. There are several other multiple-comparisons procedures that produce simultaneous confidence intervals with an overall confidence level or simultaneous tests with an overall probability of any false rejection. The Tukey procedures are arguably the most useful.² If you can interpret results from Tukey, you can understand output from other multiple-comparisons procedures.

APPLY YOUR KNOWLEDGE

26.1 Caffeine and sugar. Exercise 24.26 (page 624) describes a double-blind randomized experiment that assigned healthy undergraduate students to drink one of four beverages after fasting overnight: water, water with 75 mg of caffeine, water with 75 g of glucose, and water with 75 mg of caffeine and 75 g of glucose. The subjects performed a cognitive task, and their reaction times in the task are summarized below (SEM is the standard error of the mean):³

Beverage                        n   Mean     SEM
Water                          18  389.35  18.50
Water and caffeine             18  320.16  17.98
Water and glucose              18  318.16  17.04
Water, caffeine, and glucose   18  336.44  14.02

An ANOVA F test gives a significant P-value of 0.0134, with MSE = 5186.0358.
(a) Because all four groups are the same size, the margin of error is the same for all 6 pairwise comparisons. Obtain this margin of error using Table G on page 26-41. Find the Tukey simultaneous 95% confidence intervals for all pairwise comparisons of population means.
(b) Explain in simple language what "95% confidence" means for these intervals.
(c) Which pairs of means differ significantly at the overall 5% significance level?

26.2 Logging in the rainforest. Exercise 24.3 (page 604) describes a study comparing forest plots in Borneo that had never been logged (Group 1) with similar plots nearby that had been logged 1 year earlier (Group 2) and 8 years earlier (Group 3).
The three groups can be considered to be independent random samples. The data appear in Table 24.2 (page 604); the variable Trees is the number of trees in a plot.⁴ The one-way ANOVA shown in Figure 24.4 compared the mean counts of trees in the 3 types of forest plots and was statistically significant, with P = 0.0002. It also gave MSE = 27.3574.
(a) Find the Tukey simultaneous 95% confidence intervals for all pairwise comparisons of population means. Use software or Table G on page 26-41.
(b) Explain in simple language what "95% confidence" means for these intervals.
(c) Which pairs of means differ significantly at the overall 5% significance level?

26.3 Which color attracts beetles best? Example 24.4 (page 611) presents data on the numbers of cereal leaf beetles trapped by boards of four different colors.⁵ Yellow boards appear most effective. ANOVA gives very strong evidence (P < 0.0005, MSE = 32.167) that the colors differ in their ability to attract beetles.
(a) How many pairwise comparisons are there when we compare four colors?
(b) Use software or Table G on page 26-41 to obtain the Tukey simultaneous 95% confidence intervals for all pairwise comparisons of population means. Which pairs of colors are significantly different when we require a significance level of 5% for all comparisons as a group?

26.4 Dogs, friends, and stress. In Exercise 24.4 (page 605) you examined the effect of pets in stressful situations from the EESEE story "Stress among Pets and Friends." The ANOVA F test had a very small P-value, giving good reason to conclude that mean heart rates under stress do differ depending on whether a pet, a friend, or no one is present. Table 24.3 (page 606) displays the subjects' mean heart rate during a stressful task. We want to know whether the means for the two treatments (pet, friend) differ significantly from each other and from the mean for the control group.
(a) What are the corresponding three null hypotheses?
(b) We want to be 95% confident that we don't wrongly reject any of the three null hypotheses. Tukey pairwise comparisons can give conclusions that meet this condition. What are the conclusions? Use software or Table G on page 26-41. The Minitab output for these data shown in Figure 24.5 (page 606) indicates "Pooled StDev = 9.208." ■

Follow-up analysis: contrasts*

Multiple-comparisons methods give conclusions about all comparisons in some class with a measure of confidence that applies to all the comparisons taken together. For example, Tukey's method gives conclusions about all pairwise comparisons among a set of population means. These methods are most useful when we did not have any specific comparison in mind before we produced the data.

Multiple-comparisons procedures sometimes give tests or confidence intervals for comparisons that don't interest us. And they may leave out comparisons that do interest us. If we have specific questions in mind before we produce data, it is more efficient to plan an analysis that asks these specific questions.

EXAMPLE 26.4 Which color attracts beetles best?

What color should we use on sticky boards placed in a field of oats to attract cereal leaf beetles? Example 24.4 (page 611) gives data from an experiment in which 24 boards (6 each of blue, green, white, and yellow) were placed at random locations in a field. ANOVA shows that there are significant differences among the mean numbers of beetles trapped by these colors. We might follow ANOVA with Tukey pairwise comparisons (Exercise 26.3).

But in fact we have specific questions in mind: We suspect that warm colors are generally more attractive than cold colors. That is, before any data are gathered, we suspect that blue and white boards will have similar properties, that green and yellow boards will give similar results, and also that the average beetle count for green and yellow will be greater than the average count for blue and white.
Therefore, we want to test three hypotheses:

Hypothesis 1: $H_0: \mu_B = \mu_W$ against $H_a: \mu_B \neq \mu_W$
Hypothesis 2: $H_0: \mu_G = \mu_Y$ against $H_a: \mu_G \neq \mu_Y$
Hypothesis 3: $H_0: (\mu_Y + \mu_G)/2 = (\mu_B + \mu_W)/2$ against $H_a: (\mu_Y + \mu_G)/2 > (\mu_B + \mu_W)/2$

Two of these hypotheses involve pairwise comparisons. The third does not and also has a one-sided alternative. ■

We can ask questions about population means by specifying contrasts among the means.

CONTRASTS

In the ANOVA setting comparing the means $\mu_1, \mu_2, \ldots, \mu_k$ of $k$ populations, a population contrast is a combination of the means

$$L = c_1\mu_1 + c_2\mu_2 + \cdots + c_k\mu_k$$

with numerical coefficients that add to 0, $c_1 + c_2 + \cdots + c_k = 0$.

*This material is optional.

EXAMPLE 26.5 Attracting beetles: contrasts

We can restate the three hypotheses in Example 26.4 in terms of three contrasts:

$$L_1 = (1)(\mu_B) + (0)(\mu_G) + (-1)(\mu_W) + (0)(\mu_Y)$$
$$L_2 = (0)(\mu_B) + (1)(\mu_G) + (0)(\mu_W) + (-1)(\mu_Y)$$
$$L_3 = (-1/2)(\mu_B) + (1/2)(\mu_G) + (-1/2)(\mu_W) + (1/2)(\mu_Y)$$

Check that the four coefficients in each line do add to 0. In terms of these contrasts, the hypotheses become

Hypothesis 1: $H_0: L_1 = 0$ against $H_a: L_1 \neq 0$
Hypothesis 2: $H_0: L_2 = 0$ against $H_a: L_2 \neq 0$
Hypothesis 3: $H_0: L_3 = 0$ against $H_a: L_3 > 0$
■

Some statistical software will test hypotheses and give confidence intervals for any contrasts you specify. Because other software lacks this capability, here's how to proceed by hand, using information from basic ANOVA output.

To estimate a population contrast

$$L = c_1\mu_1 + c_2\mu_2 + \cdots + c_k\mu_k$$

use the corresponding sample contrast

$$\hat{L} = c_1\bar{x}_1 + c_2\bar{x}_2 + \cdots + c_k\bar{x}_k$$

The sample contrast $\hat{L}$ has standard error (estimated standard deviation)

$$SE_{\hat{L}} = s_p \sqrt{\frac{c_1^2}{n_1} + \frac{c_2^2}{n_2} + \cdots + \frac{c_k^2}{n_k}}$$

INFERENCE ABOUT A POPULATION CONTRAST

In the ANOVA setting, a level $C$ confidence interval for a population contrast is

$$\hat{L} \pm t^* SE_{\hat{L}}$$

where $\hat{L}$ is the corresponding sample contrast and $t^*$ is a critical value from the t distribution with the degrees of freedom for error in the ANOVA.
To test $H_0: L = 0$, use the t statistic

$$t = \frac{\hat{L}}{SE_{\hat{L}}}$$

with the same degrees of freedom.

For one-way ANOVA, the degrees of freedom for error are $N - k$, where $N$ is the total number of observations and $k$ is the number of populations compared.
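The contrast formulas translate directly into a few lines of code. The sketch below is an illustration under stated assumptions: the function names are hypothetical, the pooled standard deviation is taken as the square root of the MSE (32.167, quoted in Exercise 26.3), and the group sizes are the 6 boards per color from Example 26.4. The beetle sample means from Example 24.4 are not reproduced here, so only the standard errors are computed for the beetle contrasts; the t statistic is illustrated with the rat data of Figure 26.1, treating the pairwise difference $\mu_B - \mu_A$ as a contrast.

```python
import math


def contrast_estimate(coefficients, means):
    """Sample contrast: the sum of c_i * xbar_i."""
    return sum(c * m for c, m in zip(coefficients, means))


def contrast_se(coefficients, sizes, pooled_sd):
    """Standard error of a sample contrast: s_p * sqrt(sum of c_i^2 / n_i)."""
    if abs(sum(coefficients)) > 1e-12:
        raise ValueError("contrast coefficients must add to 0")
    return pooled_sd * math.sqrt(
        sum(c ** 2 / n for c, n in zip(coefficients, sizes))
    )


def contrast_t(coefficients, means, sizes, pooled_sd):
    """t statistic for H0: L = 0, with the ANOVA error degrees of freedom."""
    return contrast_estimate(coefficients, means) / contrast_se(
        coefficients, sizes, pooled_sd
    )


# Beetle contrasts of Example 26.5; coefficient order: blue, green, white, yellow.
beetle_contrasts = {
    "L1": [1, 0, -1, 0],
    "L2": [0, 1, 0, -1],
    "L3": [-0.5, 0.5, -0.5, 0.5],
}
beetle_sizes = [6, 6, 6, 6]        # 6 boards of each color (Example 26.4)
beetle_s_p = math.sqrt(32.167)     # assumes s_p = sqrt(MSE), MSE from Exercise 26.3

beetle_se = {
    name: contrast_se(c, beetle_sizes, beetle_s_p)
    for name, c in beetle_contrasts.items()
}
for name, se in beetle_se.items():
    print(f"SE for {name}: {se:.3f}")

# Rat data (Figure 26.1): mu_B - mu_A written as a contrast over (A, B, C).
rat_t = contrast_t([-1, 1, 0], [605.63, 657.31, 691.13], [19, 16, 15], 54.42)
print(f"t for mu_B - mu_A (df = 47): {rat_t:.2f}")
```

Checking that the coefficients add to 0 inside `contrast_se` is a design choice: it catches a mistyped contrast before any inference is attempted. To finish the beetle analysis, the four sample means from Example 24.4 would be passed to `contrast_t` with the same coefficients.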