Statistics 514: Checking Model Adequacy
Lecture 4. Checking Model Adequacy
Montgomery: 3-4, 15-1.1
Fall, 2005

Model Checking and Diagnostics
• Model Assumptions
    1 Model is correct
    2 Independent observations
    3 Errors normally distributed
    4 Constant variance

    $y_{ij} = (\bar{y}_{..} + (\bar{y}_{i.} - \bar{y}_{..})) + (y_{ij} - \bar{y}_{i.})$
    $y_{ij} = \hat{y}_{ij} + \hat{\epsilon}_{ij}$
    observed = predicted + residual
• Note that the predicted response at treatment $i$ is $\hat{y}_{ij} = \bar{y}_{i.}$
• Diagnostics use predicted responses and residuals.

Diagnostics
• Normality
    – Histogram of residuals
    – Normal probability plot / QQ plot
    – Shapiro-Wilk Test
• Constant Variance
    – Plot $\hat{\epsilon}_{ij}$ vs $\hat{y}_{ij}$ (residual plot)
    – Bartlett's or Levene's Test
• Independence
    – Plot $\hat{\epsilon}_{ij}$ vs time/space
    – Plot $\hat{\epsilon}_{ij}$ vs variable of interest
• Outliers

Constant Variance
• In some experiments, error variance ($\sigma_i^2$) depends on the mean response
    $E(y_{ij}) = \mu_i = \mu + \tau_i$.
  So the constant variance assumption is violated.
• Size of error (residual) depends on mean response (predicted value)
• Residual plot
    – Plot $\hat{\epsilon}_{ij}$ vs $\hat{y}_{ij}$
    – Is the range constant for different levels of $\hat{y}_{ij}$?
• More formal tests:
    – Bartlett's Test
    – Modified Levene's Test.

Bartlett's Test
• $H_0 : \sigma_1^2 = \sigma_2^2 = \dots = \sigma_a^2$
• Test statistic: $\chi_0^2 = 2.3026 \, \frac{q}{c}$, where
    $q = (N - a) \log_{10} S_p^2 - \sum_{i=1}^{a} (n_i - 1) \log_{10} S_i^2$
    $c = 1 + \frac{1}{3(a-1)} \left( \sum_{i=1}^{a} (n_i - 1)^{-1} - (N - a)^{-1} \right)$
  and $S_i^2$ is the sample variance of the $i$th population and $S_p^2$ is the pooled sample variance.
• Decision Rule: reject $H_0$ when $\chi_0^2 > \chi^2_{\alpha, a-1}$.
Remark: sensitive to normality assumption.
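The Bartlett statistic defined above can be computed by hand in a few lines. The following is a minimal Python sketch, not the lecture's SAS code; the function name `bartlett_statistic` and the sample data are hypothetical and only illustrate the formulas for $q$, $c$, and $\chi_0^2$.

```python
import math
from statistics import variance


def bartlett_statistic(groups):
    """Return (chi0_sq, df) using the log10 form of Bartlett's statistic.

    `groups` is a list of lists, one list of observations per treatment.
    Every group needs at least two observations and a nonzero variance.
    """
    a = len(groups)                      # number of treatments
    n = [len(g) for g in groups]         # group sizes n_i
    N = sum(n)                           # total number of observations
    s2 = [variance(g) for g in groups]   # sample variances S_i^2
    # Pooled sample variance S_p^2
    sp2 = sum((ni - 1) * si for ni, si in zip(n, s2)) / (N - a)
    q = (N - a) * math.log10(sp2) - sum(
        (ni - 1) * math.log10(si) for ni, si in zip(n, s2)
    )
    c = 1 + (sum(1 / (ni - 1) for ni in n) - 1 / (N - a)) / (3 * (a - 1))
    return 2.3026 * q / c, a - 1


# Hypothetical data: three treatments, four observations each.
example = [[7.1, 6.8, 7.4, 7.0], [8.9, 9.6, 8.2, 9.1], [12.0, 10.5, 13.1, 11.2]]
stat, df = bartlett_statistic(example)
print(f"chi0^2 = {stat:.3f} on {df} df")
```

The returned statistic is compared with the upper-$\alpha$ chi-square critical value on $a - 1$ degrees of freedom (for example, 9.49 for $\alpha = 0.05$ and 4 degrees of freedom); the standard library has no chi-square quantile function, so the sketch leaves that lookup to a table.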

Modified Levene's Test
• For each fixed $i$, calculate the median $m_i$ of $y_{i1}, y_{i2}, \dots, y_{in_i}$.
• Compute the absolute deviation of each observation from its sample median:
    $d_{ij} = |y_{ij} - m_i|$
  for $i = 1, 2, \dots, a$ and $j = 1, 2, \dots, n_i$.
• Apply ANOVA to the deviations $d_{ij}$.
• Use the usual ANOVA F-statistic for testing $H_0 : \sigma_1^2 = \dots = \sigma_a^2$.

SAS code

    options ls=80 ps=65;
    title1 'Diagnostics Example';

    /* Read treatment level, response, and run order */
    data one;
      infile 'c:\saswork\data\tensile.dat';
      input percent strength time;

    /* One-way ANOVA with homogeneity-of-variance tests;           */
    /* hovtest=levene is the classical Levene test (hovtest=bf     */
    /* requests the median-based version). Save predicted values   */
    /* and residuals in data set diag.                             */
    proc glm data=one;
      class percent;
      model strength=percent;
      means percent / hovtest=bartlett hovtest=levene;
      output out=diag p=pred r=res;

    /* Residuals vs predicted values */
    proc sort; by pred;
    symbol1 v=circle i=sm50;
    title1 'Residual Plot';
    proc gplot; plot res*pred/frame; run;

    /* Normality checks: tests, QQ plot, histogram */
    proc univariate data=diag normal noprint;
      var res;
      qqplot res / normal (L=1 mu=est sigma=est);
      histogram res / normal;
    run;

    /* Residuals vs time, with two levels of smoothing */
    proc sort; by time;
    symbol1 v=circle i=sm75;
    title1 'Plot of residuals vs time';
    proc gplot; plot res*time / vref=0 vaxis=-6 to 6 by 1; run;

    symbol1 v=circle i=sm50;
    title1 'Plot of residuals vs time';
    proc gplot; plot res*time / vref=0 vaxis=-6 to 6 by 1; run;

SAS output

    Diagnostics Example

                                 Sum of
    Source             DF       Squares    Mean Square   F Value   Pr > F
    Model               4   475.7600000    118.9400000     14.76   <.0001
    Error              20   161.2000000      8.0600000
    Corrected Total    24   636.9600000

    Levene's Test for Homogeneity of strength Variance
    ANOVA of Squared Deviations from Group Means

                        Sum of       Mean
    Source     DF      Squares     Square   F Value   Pr > F
    percent     4      91.6224    22.9056      0.45   0.7704
    Error      20       1015.4    50.7720

    Bartlett's Test for Homogeneity of strength Variance

    Source     DF   Chi-Square   Pr > ChiSq
    percent     4       0.9331       0.9198

Non-constant Variance: Impact and Remedy
• Does not affect the F-test dramatically when the experiment is balanced
• Why be concerned?
    – Comparison of treatments depends on MSE
    – Incorrect intervals and comparison results
• Variance-Stabilizing Transformations
    – Common transformations:
      $\sqrt{x}$, $\log(x)$, $1/x$, $\arcsin(\sqrt{x})$, and $1/\sqrt{x}$
    – Box-Cox transformations
        1. approximate the relationship $\sigma_i = \theta \mu_i^{\beta}$; then the transformation is $X^{1-\beta}$
        2. use the maximum likelihood principle
        ∗ Distribution often more "normal" after transformation
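
Taking logs of the relationship $\sigma_i = \theta \mu_i^{\beta}$ gives $\log \sigma_i = \log \theta + \beta \log \mu_i$, so one rough way to pick the power is to regress the log of each treatment's sample standard deviation on the log of its sample mean and use the slope as an estimate of $\beta$. The following is a minimal Python sketch under that assumption; the function name `estimate_power` and the data are hypothetical, and the lecture itself does not prescribe this code.

```python
import math
from statistics import mean, stdev


def estimate_power(groups):
    """Estimate beta in sigma_i = theta * mu_i**beta by least squares.

    Returns (beta, 1 - beta), where 1 - beta is the suggested power for
    the transformation X**(1 - beta). Requires at least two groups with
    distinct positive means and positive standard deviations.
    """
    x = [math.log(mean(g)) for g in groups]    # log of treatment means
    y = [math.log(stdev(g)) for g in groups]   # log of treatment std devs
    xbar, ybar = mean(x), mean(y)
    sxy = sum((xi - xbar) * (yi - ybar) for xi, yi in zip(x, y))
    sxx = sum((xi - xbar) ** 2 for xi in x)
    beta = sxy / sxx                           # slope of the fitted line
    return beta, 1 - beta


# Hypothetical data in which the spread grows with the mean.
example = [[9.2, 10.1, 10.9, 9.8], [18.5, 21.0, 19.4, 22.3], [37.0, 43.5, 39.8, 41.2]]
beta, power = estimate_power(example)
print(f"beta = {beta:.2f}, suggested power 1 - beta = {power:.2f}")
```

A slope near 1/2 points to the square-root transformation, and a slope near 1 gives a power of 0, which is conventionally read as the log transformation.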