Section 6: Comparing means and analysis of variance (ANOVA)

33 Pages·2015·0.41 MB·English

Section VI: Comparing means & analysis of variance

VII – Analysis of variance

Analysis of variance (ANOVA) refers to methods for comparing means. It can also be thought of as a special case of linear regression in which all of the predictor variables (Xs) are categorical. Gender, diagnosis and ethnicity are examples of categorical predictors. The outcome, Y, is continuous. When means are compared with ANOVA, as opposed to running many separate t tests, the results (SEs, p values) are based on a pooled SD rather than on the individual group SDs.

In a study design with several factors, each cross-classified combination of the factors (predictors) forms a “cell”. For example, if the predictors are sex (male or female) and dementia (yes or no), there are four cross-classified categories, or four “cells” (males without dementia, males with dementia, females without dementia, females with dementia), each with its own cell mean for Y.

In balanced ANOVA, the sample size is the same for every cross-classified combination of the X predictors; that is, every cell has the same sample size. When properly coded, the predictor variables in a balanced ANOVA model are all uncorrelated. These uncorrelated variables are called “orthogonal”, since their zero correlation is induced “artificially” by the design.

               Males     Females   Overall
Dementia       Cell      Cell      Margin
No dementia    Cell      Cell      Margin
Overall        Margin    Margin

Presenting means: ANOVA data

Since all of the factors are discrete, it is usually easy, and strongly desirable, to graph the means as a function of the factors.

[Figure: mean serum glucose (mg/dl) by drug and gender; x axis: drug (A, B, C, D); separate series for males and females.]

One can also add “error bars” to these means. In analysis of variance, these error bars are based on the sample size and the pooled standard deviation, $SD_e$. This $SD_e$ is the same residual SD as in regression.

When comparing means, the “yardstick” is critical.
In the weight loss comparison below, is a 4 lb difference “big”? Compared to what? What is the “yardstick”?

Diet              Mean weight loss (lbs)    n
Pritikin          5.0                       20
UCLA GS           9.0                       20
Mean difference   4.0

Example: with SD = 1, SE diff = 0.32, t = 12.6, p < 0.001.

What if the SD changes? In the next example the means are still 4 lbs apart and have not changed, but now SD = 5, so SE diff = 1.58, t = 2.5 and p = 0.02. The p value is larger because the SD is larger: a larger SD makes SE diff larger, which shrinks t and therefore increases the p value.

Comparing means: two groups, t test (review)

Mean differences are judged “statistically significant” (different beyond chance) relative to their standard error, $SE_d$, a measure of the variability (“noise”) of the means.

$$t = \frac{\bar{Y}_1 - \bar{Y}_2}{SE_d} = \frac{\text{signal}}{\text{noise}}$$

where $\bar{Y}_i$ is the mean of group $i$ and $SE_d$ is the standard error of the mean difference.

The t statistic is the mean difference expressed in $SE_d$ units, and the p value is a function of t: as $|t|$ increases, the p value gets smaller. Rule of thumb: statistical significance, defined as p < 0.05, is achieved if $|t| > 2$, or equivalently if

$$|\bar{Y}_1 - \bar{Y}_2| > t_{cr}\,SE_d \approx 2\,SE_d = \text{LSD}$$

The quantity $t_{cr}\,SE_d \approx 2\,SE_d$ is sometimes called the critical distance or the LSD (least significant difference). So getting the correct $SE_d$ is crucial! The $SE_d$ is the “yardstick” for significance. Whether a difference is significant depends on a) the mean difference, b) the SD (the individual variability) and c) the sample size; $SE_d$ itself depends only on b) and c), together with the study design.

How to compute $SE_d$ (review)

$SE_d$ depends on n, the SD and the study design.
(Study design examples: factorial or repeated measures.)

For a single mean with sample size n:

$$SEM = SD/\sqrt{n} = \sqrt{SD^2/n}$$

For a mean difference $\bar{Y}_1 - \bar{Y}_2$, the SE of the mean difference, $SE_d$, is

$$SE_d = \sqrt{SD_1^2/n_1 + SD_2^2/n_2} \quad\text{or, equivalently,}\quad SE_d = \sqrt{SEM_1^2 + SEM_2^2}$$

If the data are paired (before-after), first compute the difference $d_i = Y_{2i} - Y_{1i}$ for each person. For paired data, $SE_d = SD(d_i)/\sqrt{n}$.

3 or more groups: analysis of variance (ANOVA) and pooled SDs

What if we have many treatment groups, each with its own mean and SD?

Group   Mean          SD       Sample size (n)
A       $\bar{Y}_1$   $SD_1$   $n_1$
B       $\bar{Y}_2$   $SD_2$   $n_2$
C       $\bar{Y}_3$   $SD_3$   $n_3$
…       …             …        …
k       $\bar{Y}_k$   $SD_k$   $n_k$

Usually, at least on some scale (perhaps the log scale), there is a single true underlying SD, called $\sigma$, shared by all groups. The observed SDs, $SD_1, SD_2, \dots, SD_k$, vary around $\sigma$ because of “chance” (random variation). This will be true when the variability is caused by the equipment or the experimental conditions, not by the treatments/groups. One can check “variance homogeneity” visually by plotting the means (X axis) against the SDs (Y axis): if the SDs are homogeneous, the points should scatter about a horizontal line at $\sigma$.

If the constant variance assumption is reasonable, then the best thing to do is to pool ALL of the sample SDs into a single common estimate, the pooled SD, $SD_e$. This is the main idea of analysis of variance (as opposed to a bunch of t tests). When the individual SDs vary only randomly, $SD_e$ is more accurate than any of the individual SDs and thus gives more accurate standard errors for means and mean differences. It also provides a common “yardstick”.

$$SD_e^2 = \text{pooled error variance} = \frac{(n_1-1)SD_1^2 + (n_2-1)SD_2^2 + \dots + (n_k-1)SD_k^2}{(n_1-1) + (n_2-1) + \dots + (n_k-1)}, \qquad SD_e = \sqrt{SD_e^2}$$

In ANOVA we use this pooled $SD_e$ to compute $SE_d$ and to compute “post hoc” (post-pooling) t statistics and p values.
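The SE formulas above translate directly into code. The following Python sketch uses only the standard library; the function names are hypothetical, chosen for illustration. The usage lines reuse the SD = 1 and SD = 5, n = 20 weight-loss settings from these notes, plus invented toy values for the pooled and paired cases.

```python
from math import sqrt
from statistics import stdev  # sample SD (n - 1 denominator)


def sem(sd, n):
    """Standard error of a single mean: SD / sqrt(n)."""
    return sd / sqrt(n)


def se_diff(sd1, n1, sd2, n2):
    """SE of an unpaired mean difference, using each group's own SD."""
    return sqrt(sem(sd1, n1) ** 2 + sem(sd2, n2) ** 2)


def se_diff_paired(before, after):
    """SE for paired data: SD of the per-person differences / sqrt(n)."""
    diffs = [y2 - y1 for y1, y2 in zip(before, after)]
    return stdev(diffs) / sqrt(len(diffs))


def pooled_sd(sds, ns):
    """Pooled SD_e: each group's variance weighted by n_i - 1."""
    numerator = sum((n - 1) * sd ** 2 for sd, n in zip(sds, ns))
    denominator = sum(n - 1 for n in ns)
    return sqrt(numerator / denominator)


print(round(se_diff(1, 20, 1, 20), 2))  # 0.32 (weight-loss example, SD = 1)
print(round(se_diff(5, 20, 5, 20), 2))  # 1.58 (weight-loss example, SD = 5)
print(round(pooled_sd([1, 3], [11, 11]), 3))  # 2.236 (toy groups)
print(round(se_diff_paired([10, 12, 9, 11], [12, 15, 10, 14]), 3))  # 0.479
```

Because `pooled_sd` weights each variance by $n_i - 1$, larger groups pull $SD_e$ toward their own SD, exactly as in the pooled-variance formula.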
With the pooled SD, the SE of a mean difference changes from

$$SE_d = \sqrt{SD_1^2/n_1 + SD_2^2/n_2} \quad\text{to}\quad SE_d = \sqrt{SD_e^2/n_1 + SD_e^2/n_2} = SD_e\sqrt{1/n_1 + 1/n_2},$$

since $SD_1$ and $SD_2$ are both replaced by $SD_e$, a “common yardstick”. Note: if $n_1 = n_2 = n$, then $SE_d = SD_e\sqrt{2/n}$, a constant.

Comparing means: post hoc t test under ANOVA

The usual t test is

$$t = \frac{\bar{Y}_1 - \bar{Y}_2}{SE_d}, \qquad SE_d = \sqrt{SD_1^2/n_1 + SD_2^2/n_2}$$

Under ANOVA, the groups are assumed to share one true SD, so $SD_1, SD_2, \dots, SD_k$ are all replaced by $SD_e = SD_{pooled}$. Hence, under ANOVA, t is computed as above except that $SD_e$ is used in place of $SD_1$ and $SD_2$ for any comparison between two means:

$$SE_d = \sqrt{SD_e^2/n_1 + SD_e^2/n_2} = SD_e\sqrt{1/n_1 + 1/n_2}$$

If $n_1 = n_2 = \dots = n_k = n$, then $SE_d = SD_e\sqrt{2/n}$, a constant for all pairwise comparisons among the k means.
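As a check on the weight-loss numbers quoted earlier, the following Python sketch runs a post hoc t test from a pooled $SD_e$. Two assumptions are worth flagging: the error degrees of freedom are taken as $(n_1-1) + \dots + (n_k-1)$, the denominator of the pooled-variance formula, and the two-sided p value is obtained by numerically integrating the Student t density with Simpson's rule so that no third-party library is needed (a statistics package's t distribution would normally be used instead). The function names are hypothetical, not from any particular package.

```python
from math import exp, lgamma, log, pi, sqrt


def t_pdf(x, df):
    """Density of Student's t distribution with df degrees of freedom."""
    log_c = lgamma((df + 1) / 2) - lgamma(df / 2) - 0.5 * log(df * pi)
    return exp(log_c - (df + 1) / 2 * log(1 + x * x / df))


def two_sided_p(t, df, steps=2000):
    """Two-sided p value, 1 - 2 * (area under the t density from 0 to |t|).

    The area is approximated with Simpson's rule; steps must be even.
    """
    b = abs(t)
    h = b / steps
    total = t_pdf(0.0, df) + t_pdf(b, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    area = total * h / 3
    return max(0.0, 1.0 - 2.0 * area)  # clamp tiny negative rounding error


def post_hoc_t(mean1, n1, mean2, n2, sd_e, df_e):
    """Post hoc t test for two group means using the pooled SD_e."""
    se_d = sd_e * sqrt(1 / n1 + 1 / n2)
    t = (mean1 - mean2) / se_d
    return se_d, t, two_sided_p(t, df_e)


# Weight-loss example: UCLA GS (9.0 lbs) vs Pritikin (5.0 lbs), n = 20 each,
# so df_e = (20 - 1) + (20 - 1) = 38.
for sd_e in (1, 5):
    se_d, t, p = post_hoc_t(9.0, 20, 5.0, 20, sd_e, df_e=38)
    print(f"SD_e={sd_e}: SE_d={se_d:.2f}  t={t:.2f}  p={p:.2g}  2*SE_d={2 * se_d:.2f}")
```

With $SD_e = 1$ this gives $SE_d \approx 0.32$ and $t \approx 12.6$; with $SD_e = 5$ it gives $SE_d \approx 1.58$, $t \approx 2.5$ and a p value between 0.01 and 0.02, consistent with the rounded 0.02 in the notes. The printed $2\,SE_d$ is the rule-of-thumb LSD.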
