MONOGRAPHS ON APPLIED PROBABILITY AND STATISTICS DISTRIBUTION-FREE STATISTICAL METHODS MONOGRAPHS ON APPLIED PROBABILITY AND STATISTICS General Editor D.R. COX, F.R.S. Also available in the series Probability, Statistics and Time M.S. Bartlett The Statistical Analysis of Spatial Pattern M.S. Bartlett Stochastic Population Models in Ecology and Epidemiology M.S. Bartlett Risk Theory R.E. Beard, T. Pentikamen and E. Pesonen Point Processes D.R. Cox and V. Isham Analysis of Binary Data D.R. Cox The Statistical Analysis of Series of Events D.R. Cox and P.A.W. Lewis Queues D.R. Cox and W.L. Smith Stochastic Abundance Models E. Engen The Analysis of Contingency Tables B.S. Everitt Finite Mixture Distributions B.S. Everitt and D.J. Hand Population Genetics W.J. Ewens Classification A.D. Gordon Monte Carlo Methods J.M. Hammersley and D.C. Handscomb Identification of Outliers D.M. Hawkins Multivariate Analysis in Behavioural Research A.E. Maxwell Applications of Queueing Theory G.F. Newell Some Basic Theory for Statistical Inference E.J.G. Pitman Statistical Inference S.D. Silvey Models in Regression and Related Topics P. Sprent Sequential Methods in Statistics G.B. Wetherill Distribution-Free Statistical Methods J.S. MARITZ Department of Mathematical Statistics La Trobe University Australia SPRINGER-SCIENCE+BUSINESS MEDIA B.V. © 1981 1.S. Maritz Originally published by Chapman & Hall in 1981 Softcover reprint of the hardcover 1s t edition 1981 All rights reserved. No part of this book may be reprinted, or reproduced or utilized in any form or by any electronic, mechanical or other means, now known or hereafter invented, including photocopying and recording, or in any information storage and retrieval system, without permission in writing from the Publisher. British Library Cataloguing in Publication Data Maritz, 1. S. Distribution-free statistical methods. - (Monographs on applied probability and statistics) 1. Probabilities 2. Mathematical statistics I. Title II. Series 519.2 QA278 80-42136 ISBN 978-0-412-15940-4 ISBN 978-1-4899-3302-7 (eBook) DOI 10.1007/978-1-4899-3302-7 Contents Preface Page ix 1 Basic concepts in distribution-free methods 1 1.1 Introduction 1 1.2 Randomization and exact tests 2 1.3 Consistency of tests 5 1.4 Point estimation of a single parameter 7 1.5 Confidence limits 7 1.6 Efficiency considerations in the one-parameter case 8 1.7 Multiple samples and parameters 11 1.8 Normal approximations 15 2 One-sample location problems 21 2.1 Introduction 21 2.2 The median 23 2.3 Symmetric distributions 28 2.4 Asymmetric distributions: M-estimates 58 3 Miscellaneous one-sample problems 61 3.1 Introduction 61 3.2 Dispersion: the interquartile range 61 3.3 The sample distribution function and related inference 65 3.4 Estimation of F when some observations are censored 73 4 Two-sample problems 81 4.1 Types of two-sample problems 81 4.2 The basic randomization argument 82 4.3 Inference about location difference 83 4.4 Proportional hazards (Lehmann alternative) 108 4.5 Dispersion alternatives 117 viii CONTENTS 5 Straight-line regression 123 5.1 The model and some preliminaries 123 5.2 Inference about f3 only 124 5.3 Joint inference about IX and P 152 6 Multiple regression and general linear models 173 6.1 Introduction 173 6.2 Plane regression: two independent variables 173 6.3 General linear models 202 6.4 One-way analysis of variance 206 6.5 Two-way analysis of variance 209 7 Bivariate problems 215 7.1 Introduction 215 7.2 Tests of correlation 215 7.3 One-sample location 220 7.4 Two-sample location problems 236 7.5 Three-sample location problems 245 7.6 Multiple-sample location problems 251 Appendix A 253 Appendix B 257 Bibliography 259 Subject index 263 Author index 265 Preface The preparation of several short courses on distribution-free statisti cal methods for students at third and fourth year level in Australian universities led to the writing ofthis book. My criteria for the courses were, firstly, that the subject should have a clearly recognizable underlying common thread rather than appear to be a collection of isolated techniques. Secondly, some discussion of efficiency seemed essential, at a level where the students could appreciate the reasons for the types of calculations that are performed, and be able actually to do some of them. Thirdly, it seemed desirable to emphasize point and interval estimation rather more strongly than is the case in many of the fairly elementary books in this field. Randomization, or permutation, is the fundamental idea that connects almost all of the methods discussed in this book. Application of randomization techniques to original observations, or simple transformations of the observations, leads generally to conditionally distribution-free inference. Certain transformations, notably 'sign' and 'rank' transformations may lead to unconditionally distribution-free inference. An attendant advantage is that useful tabulations of null distributions of test statistics can be produced. In my experience students find the notion of asymptotic relative efficiency of testing difficult. Therefore it seemed worthwhile to give a rather informal introduction to the relevant ideas and to concentrate on the Pitman 'efficacy' as a measure of efficiency. Most of the impetus to use distribution-free methods was originally in hypothesis testing. It is now well recognized that adaptation of some of the ideas to point estimation can be advantageous from the points of view of efficiency and robustness. Pedagogically there are also advantages in emphasizing estimation. One of them is that one can adopt the straightforward approach of defining relative efficiency in terms of variances of estimates. Another is that using the notion of an estimating equation makes it easy to relate the distribution-free techniques to methods which will have been encountered in the x PREFACE standard statistics courses. Examples include the method of mo ments, and large sample approximations to standard errors of estimates. The aim of this book is to give an introduction to the distribution free way of thinking and sufficient detail of some standard techniques to be useful as a practical guide. It is not intended as a compendium of distribution-free techniques and some readers may find that a technique which they regard as important is not mentioned. For the most part the book deals with problems oflocation and location shift. They include one- and two-sample location problems, and some aspects of regression and of the 'analysis of variance'. Although some ofthe presentation is somewhat different from what appears to have become the standard in this field, very little, if any, of the material is original. Much has been gleaned from various texts. Direct acknowledgement of my indebtedness to the authors of these works is made by the listing of general references. Through these, and other references, I also acknowledge indirectly the work of other authors whose names may not appear in the bibliography. No serious attempt has been made to attribute ideas to their originators. Specific references are given only where it is felt that readers may be particularly interested in more detail. While the origins of this book are in undergraduate teaching I do hope that some experienced statisticians will find parts of it interest ing. In particular, developments in point and interval estimation, and noting of their connections with 'robust' methods have taken place fairly recently. Some interesting problems of estimating standard errors, as yet not fully resolved, are touched upon in several places. Many of my colleagues have helped me, by discussion and by reading sections of manuscript. Dr D.G. Kildea read the first draft of Chapter 2 and his detailed comments led to many improvements. Dr B.M. Brown was not only a patient listener on many occasions but also generously provided Appendix A. Melbourne, November 1980 lS. Maritz