Springer Series in Statistics

Advisors:
P. Bickel, P. Diggle, S. Fienberg, K. Krickeberg, I. Olkin, N. Wermuth, S. Zeger

Springer Science+Business Media, LLC

Paul W. Mielke, Jr.
Kenneth J. Berry

Permutation Methods
A Distance Function Approach

Springer

Paul W. Mielke, Jr.
Department of Statistics
Colorado State University
Fort Collins, Colorado 80523
E-mail: [email protected]

Kenneth J. Berry
Department of Sociology
Colorado State University
Fort Collins, Colorado 80523
E-mail: [email protected]

Library of Congress Cataloging-in-Publication Data
Mielke, Paul W.
Permutation methods: a distance function approach / Paul W. Mielke, Jr., Kenneth J. Berry.
p. cm. - (Springer series in statistics)
Includes bibliographical references and index.
ISBN 978-1-4757-3451-5
ISBN 978-1-4757-3449-2 (eBook)
DOI 10.1007/978-1-4757-3449-2
1. Statistical hypothesis testing. 2. Resampling (Statistics) I. Berry, Kenneth J. II. Title. III. Series.
QA277 .M53 2001
519.5'6--dc21 00-067920

Printed on acid-free paper.

© 2001 Springer Science+Business Media New York
Originally published by Springer-Verlag New York, Inc. in 2001.
Softcover reprint of the hardcover 1st edition 2001

All rights reserved.
This work may not be translated or copied in whole or in part without the written permission of the publisher Springer Science+Business Media, LLC except for brief excerpts in connection with reviews or scholarly analysis. Use in connection with any form of information storage and retrieval, electronic adaptation, computer software, or by similar or dissimilar methodology now known or hereafter developed is forbidden. The use of general descriptive names, trade names, trademarks, etc., in this publication, even if the former are not especially identified, is not to be taken as a sign that such names, as understood by the Trade Marks and Merchandise Marks Act, may accordingly be used freely by anyone.

Production managed by A. Orrantia; manufacturing supervised by Jacqui Ashri.
Photocomposed pages prepared from the authors' LaTeX files.

9 8 7 6 5 4 3 2 1

SPIN 10731001

To our families.

Preface

The introduction of permutation tests by R. A. Fisher relaxed the parametric structure requirement of a test statistic. For example, the structure of the test statistic is no longer required if the assumption of normality is removed. The between-object distance function of classical test statistics based on the assumption of normality is squared Euclidean distance. Because squared Euclidean distance is not a metric (i.e., the triangle inequality is not satisfied), it is not at all surprising that classical tests are severely affected by an extreme measurement of a single object. A major purpose of this book is to take advantage of the relaxation of the structure of a statistic allowed by permutation tests. While a variety of distance functions are valid for permutation tests, a natural choice possessing many desirable properties is ordinary (i.e., non-squared) Euclidean distance. Simulation studies show that permutation tests based on ordinary Euclidean distance are exceedingly robust in detecting location shifts of heavy-tailed distributions.
These tests depend on a metric distance function and are reasonably powerful for a broad spectrum of univariate and multivariate distributions.

Least sum of absolute deviations (LAD) regression linked with a permutation test based on ordinary Euclidean distance yields a linear model analysis which controls for type I error. These Euclidean distance-based regression methods offer robust alternatives to the classical method of linear model analyses involving the assumption of normality and ordinary least sum of squared deviations (OLS) regression linked with tests based on squared Euclidean distance. In addition, consideration is given to a number of permutation tests for (1) discrete and continuous goodness-of-fit, (2) independence in multidimensional contingency tables, and (3) discrete and continuous multisample homogeneity. Examples indicate some favorable characteristics of seldom used tests.

Following a brief introduction in Chapter 1, Chapters 2, 3, and 4 provide the motivation and description of univariate and multivariate permutation tests based on distance functions for completely randomized and randomized block designs. Applications are provided. Chapter 5 describes the linear model methods based on the linkage between regression and permutation tests, along with recently developed linear and nonlinear model prediction techniques. Chapters 6, 7, and 8 include the goodness-of-fit, contingency table, and multisample homogeneity tests, respectively. Appendix A contains an annotated listing of the computer programs used in the book, organized by chapter.

Paul Mielke is indebted to the following former University of Minnesota faculty members: his advisor Richard B. McHugh for introducing him to permutation tests, Jacob E. Bearman and Eugene A. Johnson for motivating the examination of various problems from differing points of view, and also to Constance van Eeden and I. Richard Savage for motivating his interest in nonparametric methods.
He wishes to thank two of his Colorado State University students, Benjamin S. Duran and Earl S. Johnson, for stimulating his long term interest in alternative permutation methods. Finally, he wishes to thank his Colorado State University colleagues Franklin A. Graybill, Lewis O. Grant, William M. Gray, Hariharan K. Iyer, David C. Bowden, Peter J. Brockwell, Yi-Ching Yao, Mohammed M. Siddiqui, Jagdish N. Srivastava, and James S. Williams, who have provided him with motivation and various suggestions pertaining to this topic over the years.

Kenneth Berry is indebted to the former University of Oregon faculty members Walter T. Martin, mentor and advisor, and William S. Robinson who first introduced him to nonparametric statistical methods. Colorado State University colleagues Jeffrey L. Eighmy, R. Brooke Jacobsen, Michael G. Lacy, and Thomas W. Martin were always there to listen, advise, and encourage.

Acknowledgments. The authors thank the American Meteorological Society for permission to reproduce excerpts from Weather and Forecasting and the Journal of Applied Meteorology, Sage Publications, Inc. to reproduce excerpts from Educational and Psychological Measurement, the American Psychological Association for permission to reproduce excerpts from Psychological Bulletin, the American Educational Research Association for permission to reproduce excerpts from the Journal of Educational and Behavioral Statistics, and the editors and publishers to reproduce excerpts from Psychological Reports and Perceptual and Motor Skills.

The authors also wish to thank the following reviewers for their helpful comments: Mayer Alvo, University of Ottawa; Bradley J. Biggerstaff, Centers for Disease Control and Prevention; Brian S. Cade, U.S. Geological Survey; Hariharan K. Iyer, Colorado State University; Bryan F. J. Manly, WEST, Inc.; and Raymond K. W. Wong, Alberta Environment.
At Springer-Verlag New York, Inc., we thank our editor, John Kimmel, for guiding the project throughout. We are grateful for the efforts of the production editor, Antonio D. Orrantia, and the copy editor, Hal Henglein. We wish to thank Roberta Mielke for reading the entire manuscript and correcting our errors. Finally, we alone are responsible for any shortcomings or inaccuracies.

Paul W. Mielke, Jr.
Kenneth J. Berry

Contents

Preface

1 Introduction

2 Description of MRPP
  2.1 General Formulation of MRPP
    2.1.1 Examples of MRPP
  2.2 Choice of Weights and Distance Functions
  2.3 Probability of an Observed δ
    2.3.1 Resampling Approximation
    2.3.2 Pearson Type III Approximation
    2.3.3 Approximation Comparisons
    2.3.4 Group Weights
    2.3.5 Within-Group Agreement Measure
  2.4 Exact and Approximate P-Values
  2.5 MRPP with an Excess Group
  2.6 Detection of Multiple Clumping
  2.7 Detection of Evenly Spaced Location Patterns
  2.8 Dependence of MRPP on v
  2.9 Permutation Version of One-Way ANOVA
  2.10 Euclidean and Hotelling Commensuration
  2.11 Power Comparisons
    2.11.1 The Normal Probability Distribution
    2.11.2 The Cauchy Probability Distribution