
Applied Nonparametric Statistical Methods PDF

463 Pages·2001·7.091 MB·English

Preview Applied Nonparametric Statistical Methods

©2001 CRC Press LLC

Library of Congress Cataloging-in-Publication Data

Sprent, Peter.
Applied nonparametric statistical methods.--3rd ed. / P. Sprent, N.C. Smeeton.
p. cm. -- (Chapman & Hall/CRC texts in statistical science series)
Includes bibliographical references and index.
ISBN 1-58488-145-3 (alk. paper)
1. Nonparametric statistics. I. Smeeton, N.C. II. Title. III. Texts in statistical science.
QA278.8 S74 2001
519.5'4--dc21
00-055485

This book contains information obtained from authentic and highly regarded sources. Reprinted material is quoted with permission, and sources are indicated. A wide variety of references are listed. Reasonable efforts have been made to publish reliable data and information, but the author and the publisher cannot assume responsibility for the validity of all materials or for the consequences of their use.

Apart from any fair dealing for the purpose of research or private study, or criticism or review, as permitted under the UK Copyright Designs and Patents Act, 1988, this publication may not be reproduced, stored or transmitted, in any form or by any means, electronic or mechanical, including photocopying, microfilming, and recording, or by any information storage or retrieval system, without the prior permission in writing of the publishers, or in the case of reprographic reproduction only in accordance with the terms of the licenses issued by the Copyright Licensing Agency in the UK, or in accordance with the terms of the license issued by the appropriate Reproduction Rights Organization outside the UK.

All rights reserved. Authorization to photocopy items for internal or personal use, or the personal or internal use of specific clients, may be granted by CRC Press LLC, provided that $.50 per page photocopied is paid directly to Copyright Clearance Center, 222 Rosewood Drive, Danvers, MA 01923 USA. The fee code for users of the Transactional Reporting Service is ISBN 1-58488-145-3/01/$0.00+$.50.
The fee is subject to change without notice. For organizations that have been granted a photocopy license by the CCC, a separate system of payment has been arranged.

The consent of CRC Press LLC does not extend to copying for general distribution, for promotion, for creating new works, or for resale. Specific permission must be obtained in writing from CRC Press LLC for such copying.

Direct all inquiries to CRC Press LLC, 2000 N.W. Corporate Blvd., Boca Raton, Florida 33431.

Trademark Notice: Product or corporate names may be trademarks or registered trademarks, and are used only for identification and explanation, without intent to infringe.

Visit the CRC Press Web site at www.crcpress.com

© 2001 by Chapman & Hall/CRC
No claim to original U.S. Government works
International Standard Book Number 1-58488-145-3
Library of Congress Card Number 00-055485
Printed in the United States of America 3 4 5 6 7 8 9 0
Printed on acid-free paper

Contents

Preface

1 Introducing nonparametric methods
  1.1 Basic statistics
  1.2 Samples and populations
  1.3 Hypothesis tests
  1.4 Estimation
  1.5 Ethical issues
  1.6 Computers and nonparametric methods
  1.7 Further reading
  Exercises

2 Centrality inference for single samples
  2.1 Using measurement data
  2.2 Inferences about medians based on ranks
  2.3 The sign test
  2.4 Transformation of ranks
  2.5 Asymptotic results
  2.6 Robustness
  2.7 Fields of application
  2.8 Summary
  Exercises

3 Other single-sample inference
  3.1 Inferences for dichotomous data
  3.2 Tests related to the sign test
  3.3 Matching samples to distributions
  3.4 Angular data
  3.5 A runs test for randomness
  3.6 Fields of application
  3.7 Summary
  Exercises

4 Methods for paired samples
  4.1 Comparisons in pairs
  4.2 A less obvious use of the sign test
  4.3 Power and sample size
  4.4 Fields of application
  4.5 Summary
  Exercises

5 Methods for two independent samples
  5.1 Centrality tests and estimates
  5.2 Rank based tests
  5.3 The median test
  5.4 Normal scores
  5.5 Tests for survival data
  5.6 Asymptotic approximations
  5.7 Power and sample size
  5.8 Tests for equality of variance
  5.9 Tests for a common distribution
  5.10 Fields of application
  5.11 Summary
  Exercises

6 Three or more samples
  6.1 Comparisons with parametric methods
  6.2 Centrality tests for independent samples
  6.3 Centrality tests for related samples
  6.4 More detailed treatment comparisons
  6.5 Tests for heterogeneity of variance
  6.6 Some miscellaneous considerations
  6.7 Fields of application
  6.8 Summary
  Exercises

7 Correlation and concordance
  7.1 Correlation and bivariate data
  7.2 Ranked data for several variables
  7.3 Agreement
  7.4 Fields of application
  7.5 Summary
  Exercises

8 Regression
  8.1 Bivariate linear regression
  8.2 Multiple regression
  8.3 Nonparametric regression models
  8.4 Other multivariate data problems
  8.5 Fields of application
  8.6 Summary
  Exercises

9 Categorical data
  9.1 Categories and counts
  9.2 Nominal attribute categories
  9.3 Ordered categorical data
  9.4 Goodness-of-fit tests for discrete data
  9.5 Extension of McNemar's test
  9.6 Fields of application
  9.7 Summary
  Exercises

10 Association in categorical data
  10.1 The analysis of association
  10.2 Some models for contingency tables
  10.3 Combining and partitioning of tables
  10.4 Power
  10.5 Fields of application
  10.6 Summary
  Exercises

11 Robust estimation
  11.1 When assumptions break down
  11.2 Outliers and influence
  11.3 The bootstrap
  11.4 M-estimators and other robust estimators
  11.5 Fields of application
  11.6 Summary
  Exercises

Appendix
References
Solutions to odd-numbered exercises

Preface

The second edition of this book was written by the first-named author to provide a then (1993) up-to-date introduction to nonparametric and distribution-free methods.
It took a midway course between a bare description of techniques and a detailed exposition of the theory. Individual methods and links between them were illustrated mainly by examples. Mathematics was kept to the minimum needed for a clear understanding of scope and limitations. The book was designed to meet the needs both of statistics students making first contact with these methods and of research workers, managers, research and development staff, consultants and others working in various fields who had an understanding of basic statistics and who, although they had little previous knowledge of nonparametric methods, now found or thought they might find them useful in their work.

A positive response from readers and reviewers has encouraged us to retain the basic format while taking the opportunity to introduce new topics as well as changing the emphasis to reflect both developments in computing and new attitudes towards data analysis. Nonparametric methods are basically analytic tools, but data collection, analyses and their interpretation are interrelated.
This is why we have expanded the coverage of topics such as ethical considerations and calculation of power and of sample sizes needed to achieve stated aims. These make their main impact at the planning stage, but also influence the analytic and inferential phases.

There has been widespread criticism in recent years by many statisticians of inappropriate and even improper use of significance tests and the related concept of P-values. However, these tools have a positive role when properly used and understood. To encourage better use the section on hypothesis testing in Chapter 1 has been rewritten, and throughout the book there is more emphasis on how these concepts should be used and warnings about potential misuse.

The layout of Chapters 1 to 10 follows the broad pattern of the corresponding chapters in the second edition but there are many changes in order and other aspects of presentation including new and more detailed examples. One or two topics have been dropped or are treated in less detail, and new material has been inserted where appropriate.
As well as comments on ethical considerations and discussions on power and sample size, there are new sections on the analysis of angular data, the use of capture-recapture methods, the measurement of agreement between observers and several lesser additions. Examples have been chosen from a wider range of disciplines.

For a few more advanced topics such as regression smoothing techniques and M-estimation we have not given details of specific methods but only a broad overview of each topic to enable readers to judge whether it may be relevant to their particular needs. In such cases references are given to sources that contain the detail needed for implementation. Chapter 11 has been rewritten to give an elementary introduction to influence functions, the nonparametric bootstrap and robust estimation generally, again with references to source material for those who want to make full use of these ideas. Material that appeared in Chapter 12 of the second edition has been updated and incorporated at relevant points in the text.
We have not included tables for basic nonparametric procedures, mainly because more satisfactory information is provided by modern statistical software, making many standard tables insufficient or superfluous for serious users of the methods. Those who need such tables because they have no access to specialized software are well catered for by standard collections of statistical tables. We give references to these throughout the book and also when relevant to some specialized tables. We have retained the section outlining solutions to odd-numbered exercises.

We are grateful to many readers of the earlier editions who made constructive comments about the content and treatment, or sometimes about the lack of treatment, of particular topics. This input triggered many of the changes made in this edition.

Our special thanks go to Jim McGanrick for helpful discussions on physiological measurements and to Professor Richard Hughes for advice on the Guillain-Barré syndrome. We happily renew the thanks recorded in the second edition to Timothy P. Davis and Chris Theobald who supplied us with data sets used initially in that edition for examples that we have retained.

P. Sprent
N. C. Smeeton
July 2000

1 Introducing nonparametric methods

1.1 BASIC STATISTICS

One need know only a little statistics to make sensible use of simple nonparametric methods. While we assume that many readers will be familiar with the basic statistical notions met in introductory or service courses of some 20 hours' instruction, nevertheless those without formal statistical training should be able to use this book in parallel with one of the many introductory general statistical texts available in libraries or from most academic bookshops, or one recommended by a statistician colleague or friend. The choice may depend upon how familiar one is with mathematical terms and notation. For example, Essential Statistics (Rees, 1995) adopts a straightforward approach that should suffice for most purposes, but some may prefer a more advanced treatment, or an introductory text that emphasizes applications in particular areas such as medicine, biology, agriculture, the social sciences and so on.

Readers with considerable experience in general statistics but who are new to nonparametric methods will be familiar with some background material we give, but we urge them at least to skim through this to see whether we depart from conventional treatments. For example, our approach to testing and estimation in Sections 1.3 and 1.4 differs in certain aspects from that in some general statistics courses.

In this chapter we survey some concepts relevant to nonparametric methods. It is the methods and not the data that are nonparametric.

1.1.1 Parametric and nonparametric methods

Statistics students meet families of probability distributions early in their courses. One of the best known is the normal or Gaussian family, where individual members are specified by assigning constant values to two quantities called parameters. These are usually denoted by µ and σ² and represent the mean and variance.
The notation N(µ, σ²) denotes a normal distribution with these parameters. Normal distributions are often associated with continuous data such as measurements.

Given a set of independent observations (a concept explained more fully in Section 1.2) from a normal distribution, we often want to infer something about the unknown parameters. The sample mean provides a point (i.e. single value) estimate of the parameter µ (sometimes called, in statistical jargon, the population mean). Here the well-known t-test is used to measure the strength of the evidence provided by a sample to support an a priori hypothesized value µ₀ for the population mean. More usefully, we may determine what is called a confidence interval for the ‘true’ population mean. This is an interval for which we have, in a sense we describe in Section 1.4.1, reasonable confidence that it contains the true but unknown mean µ. These are examples of parametric inferences.

The normal distribution is strictly relevant only to some types of continuous scale data such as measurements, but it often works quite well if the measurement scale is coarse (e.g. for examination marks recorded to the nearest integer). More importantly, it is useful for approximations in a wide range of circumstances when applied to other types of data.

The binomial distribution is relevant to some types of counts. The family also has two parameters n, p, where n is the total number of observations and p is the probability that a particular one from two possible events occurs at any observation. Subject to certain conditions, the number of occurrences, r, where 0 ≤ r ≤ n, of that event in the n observations has a binomial distribution, which we refer to as a B(n, p) distribution.
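To make the B(n, p) notation concrete, the following minimal Python sketch tabulates the probability of each possible count r. It relies on the standard binomial probability formula, P(r) = C(n, r) p^r (1 - p)^(n - r), which the passage above does not quote. The function names and the values n = 8, p = 0.3 are illustrative assumptions, not taken from the book.

```python
from math import comb


def binomial_pmf(r: int, n: int, p: float) -> float:
    """Probability of exactly r occurrences in n independent observations,
    each with occurrence probability p (standard binomial formula)."""
    if not 0 <= r <= n:
        raise ValueError("r must satisfy 0 <= r <= n")
    if not 0.0 <= p <= 1.0:
        raise ValueError("p must lie in [0, 1]")
    return comb(n, r) * p**r * (1.0 - p) ** (n - r)


def binomial_distribution(n: int, p: float) -> list:
    """Probabilities for r = 0, 1, ..., n under B(n, p)."""
    return [binomial_pmf(r, n, p) for r in range(n + 1)]


if __name__ == "__main__":
    n, p = 8, 0.3  # illustrative values only (an assumption for this sketch)
    for r, prob in enumerate(binomial_distribution(n, p)):
        print(f"P(r = {r}) = {prob:.4f}")
```

The probabilities over r = 0, ..., n sum to 1, and their mean is np, which gives a quick sanity check on any such table.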
With a binomial distribution if we observe r occurrences of an event in a set of n observations, then p̂ = r/n is a point estimate of p, the probability of success at each independent observation. We may want to assess how strongly sample evidence supports an a priori hypothesized value p₀, say, for p or obtain a confidence interval for the value of p for the population.

The binomial distribution is often relevant to counts in dichotomous outcome situations. For example, the number of male children in a family of size n is often assumed to have a binomial distribution with p = 1/2, but we see in Section 1.1.3 that this is only approximate. The number of ‘sixes’ recorded in 10 casts of a fair die has a B(10, 1/6) distribution. The outcome of interest is sometimes called a ‘favourable’ event, but this is hardly appropriate if, for example, the event is a positive diagnosis of illness.

Other well-known families include the uniform (or rectangular), multinomial, Poisson, exponential, double exponential, gamma, beta and Weibull distributions. This list is not exhaustive and you may not be, and need not be, familiar with all of them.

It may be reasonable on theoretical grounds or on the basis of past experience to assume that observations come from a particular family of distributions. Also experience, backed by theory, suggests that for many measurements inferences based on the assumption that observations form a random sample from some normal distribution may not be misleading even if the normality assumption is incorrect. A theorem called the central limit theorem justifies these and other uses of the normal distribution, particularly in what are called asymptotic approximations. We often refer to these in this book.

Parametric inference is sometimes inappropriate or even impossible. To assume that samples come from any specified family of distributions may be unreasonable.
For example, we may not have examination marks on, say, a percentage scale for each candidate but know only the numbers of candidates in banded and ordered grades designated Grade A, Grade B, Grade C, etc. Given these numbers for two different schools, we may want to know if they indicate a difference in performance between schools that might be due to unequal standards of teaching or the ability of one school to attract more able pupils. The method of inference is then usually nonparametric.

Even when we have precise measurements it may be irrational to assume a normal distribution because normality implies certain properties of symmetry and spread. We may be able to see that a sample is not from a normal distribution simply by looking at a few characteristics like the sample mean, median, standard deviation and range of values. For example, if all observations are either zero or positive and the standard deviation is appreciably greater than the mean then, unless the sample is very small, it is unreasonable to assume it comes from a normal distribution (see Exercise 1.9). There are well-known distributions that are flatter than the normal (e.g. the continuous uniform, or rectangular, distribution) or skew (e.g. the exponential and gamma distributions). In practice we are often able to say little more than that our sample appears to come from a distribution that is skew, or very peaked, or very flat, etc. Here nonparametric inference may again be appropriate. In this latter situation some writers prefer the term distribution-free to nonparametric.

The terms ‘distribution-free’ and ‘nonparametric’ are sometimes regarded as synonymous. Indeed, in defining a distribution-free
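The rule of thumb given earlier in this section (non-negative observations whose standard deviation appreciably exceeds their mean are unlikely to be normal) can be sketched as a quick screening check. The reasoning is that a normal distribution with standard deviation greater than its mean would put more than about 16% of its probability on negative values, which contradicts data that are all zero or positive. The function name, the `ratio` threshold and both samples below are hypothetical illustrations, not taken from the book; "appreciably greater" and "very small sample" remain judgment calls that this sketch does not settle.

```python
from statistics import mean, stdev


def normality_looks_implausible(sample, ratio=1.0):
    """Rule-of-thumb screen, not a formal test.

    Returns True when every observation is zero or positive and the sample
    standard deviation exceeds `ratio` times the sample mean. The default
    ratio of 1.0 is an assumption; the text says only 'appreciably greater'.
    """
    if len(sample) < 2:
        raise ValueError("need at least two observations")
    if min(sample) < 0:
        return False  # the rule applies only to non-negative data
    return stdev(sample) > ratio * mean(sample)


if __name__ == "__main__":
    skewed = [0, 0, 1, 1, 2, 3, 40]   # hypothetical, strongly right-skewed
    compact = [9, 10, 11, 10, 12]     # hypothetical, tightly clustered
    print(normality_looks_implausible(skewed))
    print(normality_looks_implausible(compact))
```

A result of True suggests only that a nonparametric method deserves consideration; a result of False is not evidence that the data are normal.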
