Chapter 4
ST 370 - Factorial Experiments and ANOVA
Readings: Chapter 13.1-13.2, Chapter 14.1-14.4

Recap: So far we've learned:
• Why we want a 'random' sample and how to achieve it (Sampling Scheme)
• How to use randomization, replication, and control/blocking to create a valid experiment.
• Now we'll look at a specific type of experiment and how to investigate which factors are important.

Motivating Example: Mentos and Coke
Consider an experiment where we want to determine the effect of initial volume (591 ml, 1000 ml, or 2000 ml) on the % of Coke expelled when Mentos (the freshmaker) are dropped in. A CRD was used.
What is the response? Factor(s)? Level(s)? Treatments?
What parameters might answer our question?

Suppose we collect data. Consider the following two hypothetical sets of boxplots for the data:
Which set of boxplots gives more evidence that the true means differ?
Although we'll never know the true values of the parameters, we can use our sample data to estimate them.

How can we use these estimates to make a claim?

One-Way Analysis of Variance (ANOVA) Model (used to analyze a CRD):
Consider the data below.
We 'fit' the following model to this completely randomized design:

Is the factor important in our one-way ANOVA model?
How can we estimate each treatment mean?
If these sample means differ by enough, what would this imply?
Main effect - the difference(s) in the mean response when the factor goes from one level to another.
Here, we would have two elements of our main effect. Write down their true values and their estimates.
If the differences are not

OK, so now we can estimate the treatment means for the Mentos and Coke example. Does there appear to be evidence the factor is important? What information would help?
One-Way ANOVA model in StatCrunch

Remember: Statistics is all about Variation!
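The estimation steps above can be sketched in Python. This is a minimal illustration, assuming the responses for each level are stored in a dict keyed by a level label; the labels and numbers below are made up and are not the Mentos and Coke data. Each sample mean estimates the corresponding true treatment mean, and each element of the main effect is estimated by a difference of sample means. Comparing every level with the first level listed is a hypothetical choice; comparing adjacent levels would also give two elements for three levels.

```python
from statistics import mean


def treatment_means(data: dict[str, list[float]]) -> dict[str, float]:
    """Estimate each true treatment mean by its sample mean."""
    return {level: mean(values) for level, values in data.items()}


def main_effect_estimates(data: dict[str, list[float]]) -> dict[str, float]:
    """Estimate main-effect elements as differences in sample means.

    Hypothetical choice: every level is compared with the first level in
    `data`, which gives (number of levels - 1) differences.
    """
    means = treatment_means(data)
    levels = list(means)
    baseline = levels[0]
    return {f"{level} - {baseline}": means[level] - means[baseline] for level in levels[1:]}


# Made-up numbers for illustration only (not real experimental data).
example = {
    "591 ml": [40.0, 44.0],
    "1000 ml": [50.0, 54.0],
    "2000 ml": [60.0, 62.0],
}
print(treatment_means(example))        # {'591 ml': 42.0, '1000 ml': 52.0, '2000 ml': 61.0}
print(main_effect_estimates(example))  # {'1000 ml - 591 ml': 10.0, '2000 ml - 591 ml': 19.0}
```

With three levels the function returns two differences, matching the two elements of the main effect described above.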
Total amount of variation in the data:

The ANOVA (Analysis of Variance) table splits this total variation into different sources to help determine which sources are statistically significant. The ANOVA table generally has 6 columns:
• Source:
• SS:
• DF: Degrees of Freedom
• MS:
• F-stat:
• P-value:

In one-way ANOVA we only have 2 sources we care about (recall sources of variation from the design of experiments section!):
• Treatment Effect
• Error

Source: Treatment Effect
• SS(Trt) =
• DF: For t treatments, we have
• Often called
• MS(Trt) =

Source: Error
• SS(E) =
• DF: For N total observations and t treatments, we have
• Often called
• MS(E) =

Table for balanced one-way ANOVA (t treatments, n replicates per treatment, N = nt observations):

Source       DF          SS        MS                       F
Treatments   t − 1       SS(T)     MS(T) = SS(T)/(t − 1)    F = MS(T)/MS(E)
Error        t(n − 1)    SS(E)     MS(E) = SS(E)/(N − t)
Total        nt − 1      SS(Tot)

where $\bar{y}_{i\bullet}$ is the sample mean of treatment i, $\bar{y}_{\bullet\bullet}$ is the overall (grand) mean, and

$$SS(T) = \sum_{i=1}^{t}\sum_{j=1}^{n} (\bar{y}_{i\bullet} - \bar{y}_{\bullet\bullet})^2 = n\sum_{i=1}^{t} (\bar{y}_{i\bullet} - \bar{y}_{\bullet\bullet})^2$$

$$SS(E) = \sum_{i=1}^{t}\sum_{j=1}^{n} (y_{ij} - \bar{y}_{i\bullet})^2$$

$$SS(Tot) = \sum_{i=1}^{t}\sum_{j=1}^{n} (y_{ij} - \bar{y}_{\bullet\bullet})^2$$

Notes:
• SS(T) is also called SS(Between) and SS(E) is also called SS(Within).
• Treatment DF + Error DF = Total DF
• SS(Trt) + SS(E) = SS(Tot)

More on the F-ratio and P-value
Recall the boxplot idea: Consider the following two hypothetical sets of boxplots for the data:
We use the p-value to determine if the F-ratio is 'large' enough. If the p-value is less than a pre-specified value (usually 0.05), we say we have evidence that the main effect(s) are not all 0. That is

Idea of a P-value
• P-values are
• Here, p-value represents
• P-value for Initial Volume = 0.0148. Small!

Goals of One-Way ANOVA
• Determine if the factor is related to the response.
• If so, estimate the main effects (factor level differences).

One-Way ANOVA Example: (some description taken from Goosen, 2014)
Consider having 24 pieces of cheese. Color of the cheese is important in terms of consumer satisfaction.
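The ANOVA table entries can be computed directly from these definitions. The sketch below is a minimal, dependency-free illustration, assuming the responses are stored in a dict mapping each treatment label to its observations; the function names are hypothetical. It weights each treatment by its own sample size, which reduces to the balanced formula for SS(T) when every treatment has n replicates. For the p-value it evaluates the upper tail of the F distribution with (t − 1, N − t) degrees of freedom through the regularized incomplete beta function, using a standard continued-fraction expansion; this is swapped in only to avoid extra packages, and in practice `scipy.stats.f.sf(F, t - 1, N - t)` or StatCrunch reports the same quantity.

```python
from math import exp, lgamma, log


def _betacf(a: float, b: float, x: float, max_iter: int = 300, eps: float = 3e-14) -> float:
    """Continued fraction used by the regularized incomplete beta function."""
    tiny = 1e-300
    qab, qap, qam = a + b, a + 1.0, a - 1.0
    c = 1.0
    d = 1.0 - qab * x / qap
    d = 1.0 / (d if abs(d) > tiny else tiny)
    h = d
    for m in range(1, max_iter + 1):
        m2 = 2 * m
        even = m * (b - m) * x / ((qam + m2) * (a + m2))
        odd = -(a + m) * (qab + m) * x / ((a + m2) * (qap + m2))
        for aa in (even, odd):
            d = 1.0 + aa * d
            d = 1.0 / (d if abs(d) > tiny else tiny)
            c = 1.0 + aa / c
            c = c if abs(c) > tiny else tiny
            delta = d * c
            h *= delta
        if abs(delta - 1.0) < eps:  # delta from the odd step signals convergence
            break
    return h


def _betai(a: float, b: float, x: float) -> float:
    """Regularized incomplete beta function I_x(a, b)."""
    if x <= 0.0:
        return 0.0
    if x >= 1.0:
        return 1.0
    bt = exp(lgamma(a + b) - lgamma(a) - lgamma(b) + a * log(x) + b * log(1.0 - x))
    if x < (a + 1.0) / (a + b + 2.0):
        return bt * _betacf(a, b, x) / a
    return 1.0 - bt * _betacf(b, a, 1.0 - x) / b


def f_upper_tail(f: float, df1: int, df2: int) -> float:
    """P(F > f) for F ~ F(df1, df2); this is the ANOVA p-value."""
    if f <= 0.0:
        return 1.0
    return _betai(df2 / 2.0, df1 / 2.0, df2 / (df2 + df1 * f))


def one_way_anova(groups: dict[str, list[float]]) -> dict[str, float]:
    """Compute the one-way ANOVA table entries for a CRD."""
    all_y = [y for ys in groups.values() for y in ys]
    n_total, t = len(all_y), len(groups)
    grand_mean = sum(all_y) / n_total
    group_means = {k: sum(ys) / len(ys) for k, ys in groups.items()}
    ss_trt = sum(len(ys) * (group_means[k] - grand_mean) ** 2 for k, ys in groups.items())
    ss_e = sum((y - group_means[k]) ** 2 for k, ys in groups.items() for y in ys)
    ss_tot = sum((y - grand_mean) ** 2 for y in all_y)
    df_trt, df_e = t - 1, n_total - t
    ms_trt, ms_e = ss_trt / df_trt, ss_e / df_e
    f_stat = ms_trt / ms_e
    return {
        "SS(Trt)": ss_trt, "SS(E)": ss_e, "SS(Tot)": ss_tot,
        "DF(Trt)": df_trt, "DF(E)": df_e, "DF(Tot)": n_total - 1,
        "MS(Trt)": ms_trt, "MS(E)": ms_e,
        "F": f_stat, "p_value": f_upper_tail(f_stat, df_trt, df_e),
    }


# Made-up balanced data (t = 3 treatments, n = 3 each), for illustration only.
toy = {"A": [1.0, 2.0, 3.0], "B": [4.0, 5.0, 6.0], "C": [7.0, 8.0, 9.0]}
table = one_way_anova(toy)
print(table["SS(Trt)"], table["SS(E)"], table["SS(Tot)"])  # 54.0 6.0 60.0
print(f"F = {table['F']:.2f}, p = {table['p_value']:.4f}")  # F = 27.00, p = 0.0010
```

The toy output shows the decomposition SS(Trt) + SS(E) = SS(Tot) (54 + 6 = 60) and a large F-ratio with a small p-value, as expected when the group means are far apart relative to the spread within groups.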
We are interested in how the color differs for 4 different types of corn syrup (26, 42, 55, and 62), giving 4 treatments. A CRD is chosen, and we randomly assign each corn syrup type to 6 pieces of cheese (6 replicates for each treatment). As a response, we measure the color using the 3-part CIE L*a*b* color system.
• 'L' reflects the lightness of a sample, from black (L = 0) to white (L = 100), and runs from top to bottom.
• 'a' defines the shades from red (positive values) to green (negative values).
• 'b' defines the shades from yellow (positive values) to blue (negative values).
All three of these could be treated as responses (and analyzed together), but for our purposes we will only look at the 'L' response variable. Again, we will focus on the means of the population. How might we make inference here?
Define
• $\mu_1$ = mean 'L' score for all pieces of cheese with corn syrup 26.
• $\mu_2$ = mean 'L' score for all pieces of cheese with corn syrup 42.
• $\mu_3$ = mean 'L' score for all pieces of cheese with corn syrup 55.
• $\mu_4$ = mean 'L' score for all pieces of cheese with corn syrup 62.
1. What is our factor and what are the levels of that factor?
2. What hypothesis do we want to test?
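The randomization step of this CRD (each of the 4 corn syrup types assigned to 6 of the 24 pieces of cheese) can be sketched in Python. This is one possible way to carry it out, not necessarily how the original study did it: the pieces get hypothetical ID numbers 1 to 24, the treatment labels are shuffled with Python's standard `random` module, and the seed is an arbitrary value used only to make the plan reproducible.

```python
import random
from collections import Counter
from typing import Optional


def crd_assignment(treatments: list[str], reps: int, seed: Optional[int] = None) -> dict[int, str]:
    """Randomly assign each treatment to `reps` experimental units (a CRD).

    Units are numbered 1..len(treatments) * reps; this numbering is a
    hypothetical labeling of the physical pieces.
    """
    labels = [trt for trt in treatments for _ in range(reps)]  # 6 copies of each label
    rng = random.Random(seed)
    rng.shuffle(labels)  # random order, so each piece's treatment is left to chance
    return {unit: trt for unit, trt in enumerate(labels, start=1)}


syrups = ["26", "42", "55", "62"]
plan = crd_assignment(syrups, reps=6, seed=370)  # seed 370 is arbitrary
print(Counter(plan.values()))  # each syrup type appears on exactly 6 pieces
```

Shuffling a list that already contains 6 copies of each label keeps the design balanced while letting chance decide which pieces receive which corn syrup.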