
ANOVA and R-squared revisited. Multiple regression and R-squared. Correlation matrix

28 Pages·2016·1.16 MB·English


Agenda for Week 7

Hour 1: ANOVA and R-squared revisited. Multiple regression and R-squared.
Hour 2: Multiple regression: co-linearity, perturbations, correlation matrix.

Stat 302 Notes. Week 7, Hour 1, Page 1 / 28

Consider this made-up dataset on silicon wafers, wafers.csv. It is based on a very common type of quality control analysis in manufacturing.

A factory manager is interested in reducing the number of bad wafers the factory produces in a batch. She sets the factory to make 6 batches of wafers at each combination of 3 levels of cooking temperature and 3 levels of spin speed, for 54 batches of wafers in total. The response variable is the number of bad wafers (in a batch of 1000).

Here are select rows from the dataset.

[Table of select rows not shown.]

'cooktemp' is the cooking temperature in Celsius
'spinrpm' is the spin rate while cooling, in RPM
'bad' is the number of bad wafers in the batch

Note that even though we can describe temperature and speed as continuous variables, we are treating them as categories here. Essentially we are calling them 'low', 'medium', and 'high' settings.

    wafers$spinrpm = as.factor(wafers$spinrpm)
    wafers$cooktemp = as.factor(wafers$cooktemp)

Here is the one-way ANOVA of 'bad' using cooking temperature as an explanatory variable.

    mod = lm(bad ~ cooktemp, data=wafers)
    anova(mod)

The p-value is small, so we have strong evidence that cooking temperature matters. Without the p-value, we could compare the obtained F to a critical value for F. (Recall: the F test is one-tailed; we only care about larger variances.)

A hypothesis test tells us that some of the variance in bad wafer count is explained by cooktemp. It doesn't tell us how much of the variance is explained.
To measure how much of the variance is explained, we need the total sum of squares, which is

\[ SS_{total} = SS_{group} + SS_{resid} = 727 + 2934 = 3661 \]

The proportion of variance explained, or R-squared, is

\[ R^2 = \frac{SS_{group}}{SS_{total}} = \frac{727}{3661} = 0.1986 \]

or 19.86% of variation explained.

We can also get this information from the summary of the lm() object that we used to get the ANOVA in the first place.

There's no such thing as a correlation in an ANOVA, but sometimes the ANOVA is referred to as having an R-squared because of this variance-explained connection.

A two-armed bird needs a two-way ANOVA
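
The one-way ANOVA arithmetic for cooktemp can be checked by hand. The following is a minimal sketch in R that uses only the two sums of squares quoted in these notes (727 and 2934) and the design described earlier (3 temperature levels, 54 batches). The variable names and the 0.05 significance level are assumptions made for illustration, and because the sums of squares are rounded, the F statistic here may differ slightly from the one in the actual ANOVA table.

```r
# Sums of squares as quoted in the notes (rounded values)
ss_group <- 727    # between-group sum of squares for cooktemp
ss_resid <- 2934   # residual sum of squares

# Design from the notes: 3 temperature levels, 54 batches in total
k <- 3
n <- 54
df_group <- k - 1   # numerator degrees of freedom
df_resid <- n - k   # denominator degrees of freedom

# Proportion of variance explained
ss_total  <- ss_group + ss_resid
r_squared <- ss_group / ss_total

# F statistic: ratio of mean squares
f_obs <- (ss_group / df_group) / (ss_resid / df_resid)

# One-tailed critical value and p-value (alpha = 0.05 is an assumption)
f_crit <- qf(0.95, df_group, df_resid)
p_val  <- pf(f_obs, df_group, df_resid, lower.tail = FALSE)

print(round(r_squared, 4))
print(f_obs > f_crit)

# With the real data loaded, the same R-squared should be available from
# the fitted model via summary(mod)$r.squared
```

Only the upper tail is used for both the critical value and the p-value, in line with the one-tailed F test recalled earlier.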
