ANOVA: Analysis of Variance
Marc H. Mehlman
[email protected]
University of New Haven

"The analysis of variance is (not a mathematical theorem but) a simple method of arranging arithmetical facts so as to isolate and display the essential features of a body of data with the utmost simplicity." – Sir Ronald A. Fisher

Table of Contents
1 ANOVA: One Way Layout
2 Comparing Means
3 ANOVA: Two Way Layout
4 Chapter #12 R Assignment

ANOVA (analysis of variance) tests whether the means of k different populations are equal when all the populations are independent, normal and have the same unknown variance.

An ANOVA test compares the randomness (variance) within groups (populations) to the randomness between groups. To test whether the means of all the populations are equal, one considers the ratio

$$\frac{\text{variance between groups}}{\text{variance within groups}}$$

as a test statistic. A large ratio would indicate a difference in means between the groups.

ANOVA: One Way Layout

The Idea of ANOVA

[Figure: two sets of three samples, labeled (a) and (b).]

The sample means for the three samples are the same for each set.
The variation among sample means for (a) is identical to (b).
The variation among the individuals within the three samples is much less for (b).

CONCLUSION: the samples in (b) contain a larger amount of variation among the sample means relative to the amount of variation within the samples, so ANOVA will find more significant differences among the means in (b), assuming equal sample sizes for (a) and (b).

Note: all else being equal, larger samples make differences among the means more likely to be found significant.
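The contrast between (a) and (b) can be sketched numerically. The two data sets below are made up for illustration (they are not the data behind the figure): both have group means 5, 8 and 11, but the spread within each group is smaller in `set_b`. The function `variance_ratio` is a hypothetical helper, and it assumes the usual mean-square form of the ratio (between-group sum of squares divided by k − 1, within-group sum of squares divided by N − k), which has not yet been stated at this point in the slides.

```python
def mean(values):
    """Arithmetic mean of a non-empty list of numbers."""
    return sum(values) / len(values)


def variance_ratio(groups):
    """Between-group variance divided by within-group variance.

    Assumption: uses the standard mean-square form, dividing the
    between-group sum of squares by k - 1 and the within-group
    sum of squares by N - k.
    """
    k = len(groups)
    n_total = sum(len(g) for g in groups)
    grand_mean = mean([x for g in groups for x in g])
    between = sum(len(g) * (mean(g) - grand_mean) ** 2 for g in groups) / (k - 1)
    within = sum((x - mean(g)) ** 2 for g in groups for x in g) / (n_total - k)
    return between / within


# Hypothetical data: identical group means (5, 8, 11) in both sets,
# but much less variation inside each group of set_b.
set_a = [[1, 5, 9], [4, 8, 12], [7, 11, 15]]
set_b = [[4, 5, 6], [7, 8, 9], [10, 11, 12]]

ratio_a = variance_ratio(set_a)
ratio_b = variance_ratio(set_b)
print(ratio_a, ratio_b)
```

For these made-up numbers the ratio comes out far larger for `set_b` than for `set_a`, which matches the conclusion drawn from the figure: the same differences among sample means stand out more when the variation within the samples is small.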
Note

When k = 2, one usually uses the two-sample t test. ANOVA gives the same result as the pooled (equal-variance) two-sample t test.

When k > 2, hypothesis testing two populations at a time does not work well. For instance, with four populations there are $\binom{4}{2} = 6$ pairwise tests. If each test is run at significance level 0.05, and the tests are treated as independent, the significance level of all six tests taken together would be $1 - (1 - 0.05)^6 = 0.265$.

The ANOVA procedure is computationally intense; one usually uses a computer program.

Assumptions for doing ANOVA
1 The populations are normal.
2 The populations have the same (unknown) variance.

The above conditions are robust in the sense that one can use ANOVA if the populations are approximately normal (otherwise use the Kruskal–Wallis test, a nonparametric test) and the population variances are approximately equal.

Convention: Rule for establishing equal variance
If the largest sample standard deviation is less than twice the smallest sample standard deviation, one can use ANOVA techniques under the assumption the variances are all the same. Some textbooks use four times the smallest sample variance instead of just twice.

The Treatment or Factor is what differs between populations.

Example
A blood pressure drug is administered to k populations in k different doses. One samples from each of the k populations.

dosage #1    $X_{11}, \ldots, X_{1 n_1}$
   ...          ...
dosage #k    $X_{k1}, \ldots, X_{k n_k}$

Definition
Let
$k$ = number of levels (populations)
$n_i$ = sample size of the random sample from the $i$th population
$N = n_1 + n_2 + \cdots + n_k$ = total number of random variables
$\bar{x}_i$ = sample mean from the $i$th population
$s_i^2$ = sample variance from the $i$th population
$\bar{x} = \frac{1}{N} \sum_{i=1}^{k} \sum_{j=1}^{n_i} x_{ij}$ = the grand mean

Definition
$SS_{TOT} = \sum_{i=1}^{k} \sum_{j=1}^{n_i} (x_{ij} - \bar{x})^2$ = Sum of Squares Total
$SS_A = n_1(\bar{x}_1 - \bar{x})^2 + n_2(\bar{x}_2 - \bar{x})^2 + \cdots + n_k(\bar{x}_k - \bar{x})^2$ = Sum of Squares between levels
$SS_E = (n_1 - 1)s_1^2 + (n_2 - 1)s_2^2 + \cdots + (n_k - 1)s_k^2$ = Sum of Squares within the levels

Theorem
$SS_{TOT} = SS_A + SS_E$.
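The decomposition in the theorem can be checked numerically with a minimal sketch. The function `sums_of_squares` and the `doses` data are hypothetical, invented for illustration in the spirit of the blood pressure example. The three quantities are computed directly from the definitions of SS_TOT, SS_A and SS_E, using the standard library's sample variance (divisor n − 1) for each s².

```python
from statistics import mean, variance


def sums_of_squares(groups):
    """Return (SS_TOT, SS_A, SS_E) for a list of samples, one per level."""
    all_values = [x for g in groups for x in g]
    grand_mean = mean(all_values)
    # Total: squared deviations of every observation from the grand mean.
    ss_tot = sum((x - grand_mean) ** 2 for x in all_values)
    # Between levels: n_i * (group mean - grand mean)^2, summed over levels.
    ss_a = sum(len(g) * (mean(g) - grand_mean) ** 2 for g in groups)
    # Within levels: (n_i - 1) * s_i^2, summed over levels.
    ss_e = sum((len(g) - 1) * variance(g) for g in groups)
    return ss_tot, ss_a, ss_e


# Hypothetical responses for k = 3 doses with unequal sample sizes.
doses = [[4.0, 5.0, 6.0], [7.0, 8.0, 9.0, 10.0], [10.0, 11.0, 12.0]]
ss_tot, ss_a, ss_e = sums_of_squares(doses)
print(ss_tot, ss_a + ss_e)  # the two values agree up to rounding error
```

Each level needs at least two observations, since the sample variance is undefined for a single value.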