Statistical Power Analysis
A Simple and General Model for Traditional and Modern Hypothesis Tests
Second Edition

Kevin R. Murphy, Pennsylvania State University
Brett Myors, Griffith University

LAWRENCE ERLBAUM ASSOCIATES, PUBLISHERS
Mahwah, New Jersey    London

Copyright © 2004 by Lawrence Erlbaum Associates, Inc. All rights reserved. No part of this book may be reproduced in any form, by photostat, microform, retrieval system, or any other means, without prior written permission of the publisher.

Lawrence Erlbaum Associates, Inc., Publishers
10 Industrial Avenue
Mahwah, NJ 07430

Cover design by Kathryn Houghtaling Lacey

Library of Congress Cataloging-in-Publication Data
Statistical power analysis : a simple and general model for traditional and modern hypothesis tests, second edition, by Kevin R. Murphy and Brett Myors.
Includes bibliographical references and index.
ISBN 0-8058-4525-9 (cloth : alk. paper)
ISBN 0-8058-4526-7 (pbk. : alk. paper)
Copyright information for this volume can be obtained by contacting the Library of Congress.
2004 -dc21 2004000000 CIP

Books published by Lawrence Erlbaum Associates are printed on acid-free paper, and their bindings are chosen for strength and durability.

Printed in the United States of America

Disclaimer: This eBook does not include the ancillary media that was packaged with the original printed version of the book.
Contents

Preface

1 The Power of Statistical Tests
    The Structure of Statistical Tests
    The Mechanics of Power Analysis
    Statistical Power of Research in the Social and Behavioral Sciences
    Using Power Analysis
    Hypothesis Tests Versus Confidence Intervals
    Conclusions

2 A Simple and General Model for Power Analysis
    The General Linear Model, the F Statistic, and Effect Size
    The F Distribution and Power
    Translating Common Statistics and Effect Size Measures Into F
    Alternatives to the Traditional Null Hypothesis
    Minimum-Effect Tests as Alternatives to Traditional Null Hypothesis Tests
    Analytic and Tabular Methods of Power Analysis
    Using the One-Stop F Table
    The One-Stop PV Table
    The One-Stop F Calculator
    Effect Size Conventions for Defining Minimum-Effect Hypotheses
    Conclusions

3 Using Power Analyses
    Estimating the Effect Size
    Four Applications of Statistical Power Analysis
    Conclusions

4 Multi-Factor ANOVA and Repeated-Measures Studies
    The Factorial Analysis of Variance
    Repeated Measures Designs
    The Multivariate Analysis of Variance
    Conclusions

5 Illustrative Examples
    Simple Statistical Tests
    Statistical Tests in Complex Experiments
    Conclusions

6 The Implications of Power Analyses
    Tests of the Traditional Null Hypothesis
    Tests of Minimum-Effect Hypotheses
    Power Analysis: Benefits, Costs, and Implications for Hypothesis Testing
    Conclusions

References
Appendix A - Working With the Noncentral F Distribution
Appendix B - One-Stop F Table
Appendix C - One-Stop PV Table
Appendix D - df Needed for Power = .80 (α = .05) in Tests of Traditional Null Hypothesis
Appendix E - df Needed for Power = .80 (α = .05) in Tests of the Hypothesis That Treatments Account for 1% or Less of the Variance in Outcomes
Author Index
Subject Index

Preface

One of the most common statistical procedures in the behavioral and social sciences is to
test the hypothesis that treatments or interventions have no effect, or that the correlation between two variables is equal to zero, and so on (i.e., tests of the null hypothesis). Researchers have long been concerned with the possibility that they will reject the null hypothesis when it is in fact correct (i.e., make a Type I error), and an extensive body of research and data-analytic methods exists to help understand and control these errors. Substantially less attention has been devoted to the possibility that researchers will fail to reject the null hypothesis, when in fact treatments, interventions, and so forth, have some real effect (i.e., make a Type II error). Statistical tests that fail to detect the real effects of treatments or interventions might substantially impede the progress of scientific research.

The statistical power of a test is the probability that it will lead you to reject the null hypothesis when that hypothesis is in fact wrong. Because most statistical tests are done in contexts where treatments have at least some effect (although it might be minuscule), power often translates into the probability that the test will lead to a correct conclusion about the null hypothesis. Viewed in this light, it is obvious why researchers have become interested in the topic of statistical power, and in methods of assessing and increasing the power of their tests.

This book presents a simple and general model for statistical power analysis based on the widely used F statistic. A wide variety of statistics used in the social and behavioral sciences can be thought of as special applications of the general linear model (e.g., t tests, analysis of variance and covariance, correlation, multiple regression), and the F statistic can be used in testing hypotheses about virtually any of these specialized applications.
The model for power analysis laid out here is quite simple, and it illustrates how these analyses work and how they can be applied to problems of study design, to evaluating others' research, and even to problems such as choosing the appropriate criterion for defining statistically significant outcomes.

In response to criticisms of traditional null hypothesis testing, several researchers have developed methods for testing what is referred to as a minimum-effect hypothesis (i.e., the hypothesis that the effect of treatments, interventions, etc. exceeds some specific minimal level). This is the first book to discuss in detail the application of power analysis to both traditional null hypothesis tests and minimum-effect tests. It shows how the same basic model applies to both types of testing, and illustrates applications of power analysis to both traditional null hypothesis tests (i.e., tests of the hypothesis that treatments have no effect) and minimum-effect tests (i.e., tests of the hypothesis that the effects of treatments exceed some minimal level). A single table is used to conduct both significance tests and power analyses for traditional and for minimum-effect tests (the One-Stop F Table, presented in Appendix B), and some relatively simple procedures are presented that may be used to ask a series of important and sophisticated questions about the research.

This book is intended for a wide audience, and so presentations are kept simple and nontechnical wherever possible. For example, Appendix A presents some fairly daunting statistical formulas, but it also shows how a researcher with little expertise or interest in statistical analysis could quickly obtain the values needed to carry out power analyses for any range of hypotheses. Similarly, the first three chapters of this book present a few formulas, but the reader who skips them entirely will still be able to follow the ideas being presented in this book.
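
To make the definition of power given earlier in this preface concrete, the following sketch computes the probability of rejecting a false null hypothesis in one of the simplest possible cases, a two-sided z test on a mean with known unit variance. It is an illustration only and is not the F-based model developed in this book. The function name `z_test_power` and its parameters are hypothetical, and only Python's standard library is assumed.

```python
from statistics import NormalDist


def z_test_power(effect_size: float, n: int, alpha: float = 0.05) -> float:
    """Power of a two-sided one-sample z test (illustrative assumption:
    normally distributed scores with known standard deviation of 1).

    effect_size -- true mean difference in standard deviation units
    n           -- number of observations
    alpha       -- Type I error rate used to define significance
    """
    std_normal = NormalDist()  # mean 0, standard deviation 1
    # Critical value: reject the null hypothesis when |Z| exceeds it.
    z_crit = std_normal.inv_cdf(1 - alpha / 2)
    # When the null hypothesis is wrong, Z is centered on this shift.
    shift = effect_size * n ** 0.5
    # Probability that Z falls in the upper or the lower rejection region.
    upper = 1 - std_normal.cdf(z_crit - shift)
    lower = std_normal.cdf(-z_crit - shift)
    return upper + lower


if __name__ == "__main__":
    for n in (10, 32, 100):
        print(n, round(z_test_power(0.5, n), 3))
```

When the effect size is set to zero the function returns alpha itself, because the only possible rejections are then Type I errors. Holding the effect size fixed and increasing n raises the returned power.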
Finally, most of the examples presented herein are drawn from the social and behavioral sciences, as are many of the generalizations about statistical methods that are most likely to be used. In part, this reflects our biases (we are both psychologists), but it also reflects the fact that issues related to power analysis have been widely discussed in this literature over the last several years. Researchers in other areas may find that some of the specific advice offered here does not apply as well to them, but the general principles articulated in this book should be useful to researchers in a wide range of disciplines.

This second edition includes a number of features that were not part of our first edition. First, a chapter (chap. 4) dealing with power analysis in multifactor analysis of variance (ANOVA), including repeated measures designs, has been added. Multifactor ANOVA is very common in the behavioral and social sciences, and whereas the conceptual issues in power analysis are much the same in factorial ANOVA as in other methods of analysis, there are several features of ANOVA that require special attention, and this topic deserves treatment in a separate chapter. Second, a "One-Stop PV Table" has been included, which presents the same information as in the One-Stop F Table,