Author: Brenda Gunderson, Ph.D., 2015

License: Unless otherwise noted, this material is made available under the terms of the Creative Commons Attribution-NonCommercial-Share Alike 3.0 Unported License: http://creativecommons.org/licenses/by-nc-sa/3.0/

The University of Michigan Open.Michigan initiative has reviewed this material in accordance with U.S. Copyright Law and has tried to maximize your ability to use, share, and adapt it. The attribution key provides information about how you may share and adapt this material.

Copyright holders of content included in this material should contact [email protected] with any questions, corrections, or clarification regarding the use of content.

For more information about how to attribute these materials visit: http://open.umich.edu/education/about/terms-of-use.

Some materials are used with permission from the copyright holders. You may need to obtain new permission to use those materials for other uses. This includes all content from:

Attribution Key
For more information see: http://open.umich.edu/wiki/AttributionPolicy

Content the copyright holder, author, or law permits you to use, share and adapt:
• Creative Commons Attribution-NonCommercial-Share Alike License
• Public Domain – Self Dedicated: Works that a copyright holder has dedicated to the public domain.

Make Your Own Assessment

Content Open.Michigan believes can be used, shared, and adapted because it is ineligible for copyright:
• Public Domain – Ineligible: Works that are ineligible for copyright protection in the U.S. (17 USC §102(b)) *laws in your jurisdiction may differ.

Content Open.Michigan has used under a Fair Use determination:
• Fair Use: Use of works that is determined to be Fair consistent with the U.S. Copyright Act (17 USC § 107) *laws in your jurisdiction may differ.

Our determination DOES NOT mean that all uses of this third-party content are Fair Uses and we DO NOT guarantee that your use of the content is Fair.
To use this content you should conduct your own independent analysis to determine whether or not your use will be Fair.

Statistics 250 Lab Workbook
Fall 2015
Weekly Labs and Supplements
Used in all lab sections of Stat 250

Dr. Brenda Gunderson
Department of Statistics
University of Michigan

Table of Contents
Note to Students and Supplements
Supplement 1: R Commands Summary
Supplement 2: Notation Sheet
Supplement 3: Name That Scenario
Supplement 4: Interpretation Examples
Supplement 5: Summary of the Main t-Tests
Supplement 6: Regression Output in R
Lab 1: Describing Data with Graphs and Numbers
Lab 2: Probability and Random Variables
Lab 3: Confidence Intervals for a Population Proportion
Lab 4: Hypothesis Testing for a Population Proportion
Lab 5: Understanding Normal and Random Data
Lab 6: Learning about a Population Mean
Lab 7: Paired Data Analysis
Lab 8: Comparing Two Means
Lab 9: One-Way Analysis of Variance (ANOVA)
Lab 10: Exploring Linear Regression
Lab 11: Regression Inference
Lab 12: Chi-Square Tests

Note to Students

Welcome to Statistics 250 at the University of Michigan! This is the first summer term in which R and R Commander will be used as the software package for Stats 250. Some of the reasons why we made this switch are:
• The ability to use R is a valuable skill recognized by employers.
• Other Statistics courses use R, and this will make for an easier transition into those courses.
• R is free, open-source software that can be downloaded onto student machines, so students can have access to it at any time on their personal devices and won't have to use Virtual Sites.

This lab workbook is designed for you to use in lab and as extra preparation for exams. In the workbook, you will find the following materials:

Supplemental Material – great summaries for reference throughout the term:
1. R Commands Summary
2. Notation Sheet
3. Name That Scenario
4. Interpretation Examples
5. Summary of the Main t-Tests
6. Regression Output in R

Weekly Labs (numbered 1 to 12) – each lab contains the following parts:
o Lab Background – objective and brief overview material, which is worth taking a couple of minutes to read before you come to lab each week.
o Warm-Up Activity – quick questions for you to do before the In-Lab Project, usually a quick review of concepts you have seen in lecture.
o ILP (In-Lab Project) – one or more activities you will work on in lab, in groups.
o Cool-Down Activity – questions for you to do after the ILP for further reflection and application of the concepts covered in the ILP.

The Labs are designed to be interactive and to provide you with a complete example for each concept. Completing the corresponding PreLab assignment (a link to video instructions for PreLabs will be on Canvas and the Stat 250 YouTube channel) and reading the upcoming lab background overview before lab each week is a good way to prepare for the various lab activities.

Good luck in Statistics 250!
-- The Stat 250 Instructors and GSIs

Special Thanks to the Statistics Graduate Students
Kit Clement
Sean Pikosz
Daniel Walter
For their substantial contributions in transitioning and modernizing the Lab Materials to the Awesome R computing package

Supplement 1: R Commands Summary
By Lab – For Quick Reference

Lab 1 – Bar Charts, Histograms, Numerical Summaries, Boxplots
Open a data file after loading R Commander: Data > Load data set
To produce a Histogram: Graphs > Histogram
To generate Descriptive Statistics: Statistics > Summaries > Numerical summaries
To produce a Bar Chart: Graphs > Bar Graph
To produce a Boxplot: Graphs > Boxplot

Lab 5 – Time Plots, QQ Plots
To produce a Sequence or Time Plot for the variable named "VARIABLE" in the data set "DATA", type this line of code into the R Script box:
    plot(DATA$VARIABLE, type = "l", main = "Time Plot of variable by name")
Note that you can find the dataset name in blue text at the top.
To find variable names, click View data set and look at the top row. To create the plot, highlight the above code and click the Submit button.
To produce a QQ Plot: you can use the built-in option under Graphs > Quantile-comparison plot
Or you can make a QQ plot for the variable "VARIABLE" in the data set "DATA" by typing these two lines of code into the R Script box:
    qqnorm(DATA$VARIABLE, main="Normal QQ Plot of variable by name")
    qqline(DATA$VARIABLE)
Then highlight this code and click the Submit button.

Lab 6 – One-Sample t Procedures for a Population Mean
To perform a One-Sample T Test for a population mean and obtain a confidence interval: Statistics > Means > Single-sample t-test

Lab 7 – Paired t Procedures
To perform a Paired T Test and obtain a confidence interval: Statistics > Means > Paired t-test
To compute Differences: Data > Manage variables in active data set > Compute new variable

Lab 8 – Independent Samples t Procedures
To perform Levene’s Test: Statistics > Variances > Levene’s Test
To perform a Two-Samples T Test and obtain a confidence interval: Statistics > Means > Independent samples t-test

Lab 9 – One-Way Analysis of Variance (ANOVA)
To perform an ANOVA: Statistics > Means > One-Way ANOVA

Lab 10 and 11 – Linear Regression
To produce the correlation (R) for all pairs of variables: Statistics > Summaries > Correlation matrix
To produce a Scatterplot: Graphs > Scatterplot
To perform a Linear Regression: Statistics > Fit models > Linear regression
To produce a Residual plot and QQ Plot of residuals, first make sure you have the correct model selected, then follow: Models > Graphs > Basic diagnostic plots

Lab 12 – Chi-Square Tests
To perform a Goodness of Fit Test: Statistics > Summaries > Frequency distributions. Make sure to check the box to run a goodness of fit test, and then you can specify the null probabilities.
To perform a Test of Independence: Statistics > Contingency tables > Two-way table
To perform a Test of Homogeneity: Statistics > Contingency tables > Two-way table

Supplement 2: Notation Sheet

The table below defines important notations, including that used by R, which you will come across in the course. This is not an exhaustive list, but it is a fairly comprehensive overview of the “strange letters” used in the course.
Note: Blank cells mean there is no corresponding notation.

Summary Measures

| Name | Population Notation | Sample Notation | Notation used in R Commander |
|---|---|---|---|
| Mean | μ (read as “mu”) | x̄ (x-bar) | Mean |
| Proportion | p | p̂ (p-hat) | |
| Standard deviation | σ (sigma) | s | Varies, often “sd” |
| Variance | σ² | s² | Variance |
| Sample size | | n | n (sometimes N) |

Confidence Intervals

| Name | Population Notation | Sample Notation | Notation used in R Commander |
|---|---|---|---|
| Multipliers | | z* (z-star), t* (t-star) | |
| Margin of error | | m, m.e. | |

Hypothesis Testing

| Name | Population Notation | Sample Notation | Notation used in R Commander |
|---|---|---|---|
| Test statistics | | z | |
| | | t | t |
| | | F | F |
| | | χ² (chi-square) | Chi-square |
| Significance level | α (alpha) | | |
| p-value | | p-value | Pr(*) (the star will depend on what test is being used) |

Note: t, F, and χ² statistics have degrees of freedom (abbreviated df) associated with them. Look for these on your Formula Card.

Analysis of Variance (abbreviated ANOVA)

| Name | Population Notation | Sample Notation | Notation used in R Commander |
|---|---|---|---|
| Sum of squares for groups | | SSG | Row labeled with the grouping variable, column labeled Sum Sq |
| Sum of squares for error | | SSE | Row labeled Residuals, column labeled Sum Sq |
| Mean square for groups | | MSG | Row labeled with the grouping variable, column labeled Mean Sq |
| Mean square error | | MSE | Row labeled Residuals, column labeled Mean Sq |

Regression

| Name | Population Notation | Sample Notation | Notation used in R Commander |
|---|---|---|---|
| Response (dependent) variable | y | y | (given by name of y-variable) |
| Predicted (estimated) response | E(y) (expected value of y) | ŷ (y-hat) | |
| Explanatory (independent) variable | x | x | (given by name of x-variable) |
| y-intercept | β₀ (beta-not) | b₀ | B (look in the row labeled (Intercept)) |
| Slope | β₁ (beta-one) | b₁ | B (look in the row labeled with the name of the x-variable) |
| Coefficient of correlation | | r | Values in Correlation Matrix |
| Coefficient of determination | | r² | Multiple-R Squared |
| Error terms vs residuals | ε (error terms) | e (residuals) | Unstandardized Residuals |
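
As a rough illustration of how the regression notation in this sheet lines up with R output, the sketch below fits a simple linear regression with lm(). It uses R's built-in cars data set (speed as the x-variable, dist as the y-variable), which is an assumption made for illustration and not one of the course data sets. The object names fit, b0, b1, r, r2, y_hat, and e are hypothetical labels chosen to mirror the notation above.

```r
# Illustration only: 'cars' ships with base R and is not a Stat 250 lab data set.
fit <- lm(dist ~ speed, data = cars)

# Sample regression coefficients, read from the fitted model
b0 <- coef(fit)[["(Intercept)"]]   # y-intercept b0: row labeled (Intercept)
b1 <- coef(fit)[["speed"]]         # slope b1: row labeled with the x-variable name

# Correlation and coefficient of determination
r  <- cor(cars$speed, cars$dist)   # coefficient of correlation r
r2 <- summary(fit)$r.squared       # coefficient of determination r^2

# Predicted responses and residuals
y_hat <- fitted(fit)               # y-hat for each observation
e     <- residuals(fit)            # residuals e = y - y_hat

# Print the usual regression output table
print(summary(fit))
```

In the printed summary, the coefficient rows named (Intercept) and speed hold b0 and b1, and the line beginning "Multiple R-squared" reports r². For a simple linear regression like this one, r2 equals r squared.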