BASIC STATISTICS
A Primer for the Biomedical Sciences
Fourth Edition

OLIVE JEAN DUNN
VIRGINIA A. CLARK

WILEY
A JOHN WILEY & SONS, INC., PUBLICATION

Copyright © 2009 by John Wiley & Sons, Inc. All rights reserved.

Published by John Wiley & Sons, Inc., Hoboken, New Jersey
Published simultaneously in Canada.

No part of this publication may be reproduced, stored in a retrieval system, or transmitted in any form or by any means, electronic, mechanical, photocopying, recording, scanning, or otherwise, except as permitted under Section 107 or 108 of the 1976 United States Copyright Act, without either the prior written permission of the Publisher, or authorization through payment of the appropriate per-copy fee to the Copyright Clearance Center, Inc., 222 Rosewood Drive, Danvers, MA 01923, (978) 750-8400, fax (978) 750-4470, or on the web at www.copyright.com. Requests to the Publisher for permission should be addressed to the Permissions Department, John Wiley & Sons, Inc., 111 River Street, Hoboken, NJ 07030, (201) 748-6011, fax (201) 748-6008, or online at http://www.wiley.com/go/permission.

Limit of Liability/Disclaimer of Warranty: While the publisher and author have used their best efforts in preparing this book, they make no representations or warranties with respect to the accuracy or completeness of the contents of this book and specifically disclaim any implied warranties of merchantability or fitness for a particular purpose. No warranty may be created or extended by sales representatives or written sales materials. The advice and strategies contained herein may not be suitable for your situation. You should consult with a professional where appropriate. Neither the publisher nor author shall be liable for any loss of profit or any other commercial damages, including but not limited to special, incidental, consequential, or other damages.
For general information on our other products and services or for technical support, please contact our Customer Care Department within the United States at (800) 762-2974, outside the United States at (317) 572-3993 or fax (317) 572-4002.

Wiley also publishes its books in a variety of electronic formats. Some content that appears in print may not be available in electronic format. For information about Wiley products, visit our web site at www.wiley.com.

Library of Congress Cataloging-in-Publication Data:

Dunn, Olive Jean.
Basic statistics: a primer for the biomedical sciences / Olive Jean Dunn, Virginia A. Clark. - 4th ed.
p. ; cm.
Includes bibliographical references and index.
ISBN 978-0-470-24879-9 (cloth)
1. Medical statistics. 2. Biometry. I. Clark, Virginia, 1928- II. Title.
[DNLM: 1. Biometry. 2. Statistics as Topic. WA 950 D923b 2009]
RA409.D87 2009
519.5'02461-dc22
2009018425

Printed in the United States of America.

10 9 8 7 6 5 4 3 2 1

CONTENTS

Preface to the Fourth Edition

1 Initial Steps
  1.1 Reasons for Studying Biostatistics
  1.2 Initial Steps in Designing a Biomedical Study
    1.2.1 Setting Objectives
    1.2.2 Making a Conceptual Model of the Disease Process
    1.2.3 Estimating the Number of Persons with the Risk Factor or Disease
  1.3 Common Types of Biomedical Studies
    1.3.1 Surveys
    1.3.2 Experiments
    1.3.3 Clinical Trials
    1.3.4 Field Trials
    1.3.5 Prospective Studies
    1.3.6 Case/Control Studies
    1.3.7 Other Types of Studies
    1.3.8 Rating Studies by the Level of Evidence
    1.3.9 CONSORT
  Problems
  References

2 Populations and Samples
  2.1 Basic Concepts
  2.2 Definitions of Types of Samples
    2.2.1 Simple Random Samples
    2.2.2 Other Types of Random Samples
    2.2.3 Reasons for Using Simple Random Samples
  2.3 Methods of Selecting Simple Random Samples
    2.3.1 Selection of a Small Simple Random Sample
    2.3.2 Tables of Random Numbers
    2.3.3 Sampling With and Without Replacement
  2.4 Application of Sampling Methods in Biomedical Studies
    2.4.1 Characteristics of a Good Sampling Plan
    2.4.2 Samples for Surveys
    2.4.3 Samples for Experiments
    2.4.4 Samples for Prospective Studies
    2.4.5 Samples for Case/Control Studies
  Problems
  References

3 Collecting and Entering Data
  3.1 Initial Steps
    3.1.1 Decide What Data You Need
    3.1.2 Deciding How to Collect the Data
    3.1.3 Testing the Collection Process
  3.2 Data Entry
  3.3 Screening the Data
  3.4 Code Book
  Problems
  References

4 Frequency Tables and Their Graphs
  4.1 Numerical Methods of Organizing Data
    4.1.1 An Ordered Array
    4.1.2 Stem and Leaf Tables
    4.1.3 The Frequency Table
    4.1.4 Relative Frequency Tables
  4.2 Graphs
    4.2.1 The Histogram: Equal Class Intervals
    4.2.2 The Histogram: Unequal Class Intervals
    4.2.3 Areas Under the Histogram
    4.2.4 The Frequency Polygon
    4.2.5 Histograms with Small Class Intervals
    4.2.6 Distribution Curves
  Problems
  References

5 Measures of Location and Variability
  5.1 Measures of Location
    5.1.1 The Arithmetic Mean
    5.1.2 The Median
    5.1.3 Other Measures of Location
  5.2 Measures of Variability
    5.2.1 The Variance and the Standard Deviation
    5.2.2 Other Measures of Variability
  5.3 Sampling Properties of the Mean and Variance
  5.4 Considerations in Selecting Appropriate Statistics
    5.4.1 Relating Statistics and Study Objectives
    5.4.2 Relating Statistics and Data Quality
    5.4.3 Relating Statistics to the Type of Data
  5.5 A Common Graphical Method for Displaying Statistics
  Problems
  References

6 The Normal Distribution
  6.1 Properties of the Normal Distribution
  6.2 Areas Under the Normal Curve
    6.2.1 Computing the Area Under a Normal Curve
    6.2.2 Linear Interpolation
    6.2.3 Interpreting Areas as Probabilities
  6.3 Importance of the Normal Distribution
  6.4 Examining Data for Normality
    6.4.1 Using Histograms and Box Plots
    6.4.2 Using Normal Probability Plots or Quantile-Quantile Plots
  6.5 Transformations
    6.5.1 Finding a Suitable Transformation
    6.5.2 Assessing the Need for a Transformation
  Problems
  References

7 Estimation of Population Means: Confidence Intervals
  7.1 Confidence Intervals
    7.1.1 An Example
    7.1.2 Definition of Confidence Interval
    7.1.3 Choice of Confidence Level
  7.2 Sample Size Needed for a Desired Confidence Interval
  7.3 The t Distribution
  7.4 Confidence Interval for the Mean Using the t Distribution
  7.5 Estimating the Difference Between Two Means: Unpaired Data
    7.5.1 The Distribution of $\bar{X}_1 - \bar{X}_2$
    7.5.2 Confidence Intervals for $\mu_1 - \mu_2$: Known Variance
    7.5.3 Confidence Intervals for $\mu_1 - \mu_2$: Unknown Variance
  7.6 Estimating the Difference Between Two Means: Paired Comparison
  Problems
  References

8 Tests of Hypotheses on Population Means
  8.1 Tests of Hypotheses for a Single Mean
    8.1.1 Test for a Single Mean When $\sigma$ Is Known
    8.1.2 One-sided Tests When $\sigma$ Is Known
    8.1.3 Summary of Procedures for Test of Hypotheses
    8.1.4 Test for a Single Mean When $\sigma$ Is Unknown
  8.2 Tests for Equality of Two Means: Unpaired Data
    8.2.1 Testing for Equality of Means When $\sigma$ Is Known
    8.2.2 Testing for Equality of Means When $\sigma$ Is Unknown
  8.3 Testing for Equality of Means: Paired Data
  8.4 Concepts Used in Statistical Testing
    8.4.1 Decision to Accept or Reject