
Introduction to Biostatistics. A Guide to Design, Analysis and Discovery. PDF

565 Pages·1995·33.504 MB·English


Introduction to Biostatistics
A Guide to Design, Analysis, and Discovery

Ronald N. Forthofer
Longmont, Colorado

Eun Sul Lee
School of Public Health
The University of Texas Health Sciences Center at Houston
Houston, Texas

Academic Press
San Diego New York Boston London Sydney Tokyo Toronto

This book is printed on acid-free paper.

Copyright © 1995 by ACADEMIC PRESS, INC. All Rights Reserved. No part of this publication may be reproduced or transmitted in any form or by any means, electronic or mechanical, including photocopy, recording, or any information storage and retrieval system, without permission in writing from the publisher.

Academic Press, Inc.
A Division of Harcourt Brace & Company
525 B Street, Suite 1900, San Diego, California 92101-4495

United Kingdom Edition published by
Academic Press Limited
24-28 Oval Road, London NW1 7DX

Library of Congress Cataloging-in-Publication Data
Introduction to Biostatistics: A Guide to Design, Analysis, and Discovery / edited by Ronald M. Forthofer, Eun Sul Lee
p. cm.
Includes bibliographical references and index.
ISBN 0-12-262270-7
1. Medicine-Research-Statistical methods. 2. Biometry. I. Forthofer, Ron N., date. II. Lee, Eun Sul.
R853.S7B54 1995
574'.01'5195--dc20 94-24912 CIP

PRINTED IN THE UNITED STATES OF AMERICA
95 96 97 98 99 00 EB 9 8 7 6 5 4 3 2 1

To Mary and Chong Mahn
Without their love and support, this book would not have been possible.

Preface

This introductory biostatistics textbook encourages readers to consider the full context of the problem being examined. The context includes what the data actually represent, why and how the data were collected, whether or not one can generalize from the sample to the target population, and what problems occur when the data are incomplete due to people refusing to participate in the study or due to the researcher failing to obtain all the relevant data from some sample subjects.
Although many introductory biostatistical textbooks do a very good job in presenting statistical tests and estimators, they are limited in their presentations of the context. In addition, most textbooks do not emphasize the relevance of biostatistics to people's lives and well being. We have written this textbook to address these deficiencies and to provide a good introduction to statistical methods. We address the context as well as the importance of research design, particularly in controlling for confounding variables and in dealing with reversion to the mean. We focus on these issues in Chapters 1 to 3 and Chapter 8 and raise them again in examples and exercises throughout the book.

This textbook also differs from the other texts in that it uses real data for most of the exercises and examples in the book. For example, real data on the relation between prenatal care and birthweight, instead of data from tossing dice or dealing cards, are used in the definition of probability and in the demonstration of the rules of probability. We then show how these rules are applied to the life table, a major tool used by health analysts.

Another major difference between this and other texts is Chapter 12 on the analysis of the follow-up life table. The follow-up life table can be used to summarize survival data and is one of the more important tools used in clinical trials.

We also include material on tolerance and prediction intervals, topics generally ignored in other texts. We demonstrate in which situations these intervals should be used and how they provide different information than that provided by confidence intervals. Two other topics, usually not mentioned in other introductory texts, introduced here are multiple regression and logistic regression, two of the more useful methods of analysis in statistics and epidemiology.
We do not assume that the reader has prior knowledge of statistical methods, but we do assume that the reader is not rendered unconscious by the sight of a formula. In dealing with a formula, we first try to explain the concept underlying the formula. We then show how the formula is a translation of the concept into something that can be measured. The emphasis is on when and how to apply the formula, not on its derivation. We also show how the calculation can be quickly performed using a statistical package. The package shown in the text is MINITAB. Comparable commands for two other packages, Stata and SAS, are shown in the Appendix.

The textbook is designed for a two-quarter course for graduate students and for a two-semester course for undergraduate students. If used for a one-semester course, possible deletions include sections on the following topics: the geometric mean, the life table, the Poisson distribution, the distribution-free approach to intervals, the confidence interval and test of hypothesis for the correlation coefficient, the Kruskal-Wallis test, the trend test for r by 2 contingency tables, the two-way ANOVA and the linear model representation of the ANOVA.

We wish to acknowledge especially useful suggestions and comments provided by Joel A. Harrison and Mary Forthofer. Others who made valuable contributions include Herbert Gautschi, Irene Easling, Anna Baron, Mary Grace Kovar, and the students at the University of Texas School of Public Health Satellite Program in El Paso who reviewed parts or all of the text. Any problems in the text are the responsibility of the authors, not of the reviewers.

Introduction

Biostatistics is the application of statistical methods to the biological and life sciences. Statistical methods include procedures for: (1) collecting data, (2) presenting and summarizing data, and (3) drawing inferences from sample data to a population.
These methods are particularly useful in studies involving humans because the processes under investigation are often very complex. Because of this complexity, a large number of measurements on the study subjects are usually made to aid the discovery process; however, this complexity and abundance of data often mask the underlying processes. It is in these situations that the systematic methods found in Statistics help create order out of the seeming chaos. Some areas of application are:

1. A collection of vital statistics, for example, mortality rates, used to inform about and to monitor the health status of the population.
2. Clinical trials to determine whether or not a new hypertension medication performs better than the standard treatment for mild to moderate essential hypertension.
3. Surveys to estimate the proportion of low-income women of childbearing age with iron-deficiency anemia.
4. Studies to examine whether or not exposure to electromagnetic fields is a risk factor for leukemia.

Biostatistics aids administrators, legislators, and researchers in answering questions. The questions of interest are explicit in examples 2 and 4 above: Is the new drug more effective than the standard and is exposure to the electromagnetic field a risk factor? In examples 1 and 3 the values or estimates obtained are measurements at a point in time which could be used with measures at other time points to determine whether or not a policy change, for example, a 10 percent increase in Medicaid funding in each state, had an effect.

I. DATA: THE KEY COMPONENT OF A STUDY

In this textbook, much of the material relates to methods to be used in the analysis of data. It is necessary to become familiar with these methods and their use as this knowledge will enable one to: (1) better understand reports of studies, and (2) better design and carry out studies.
Readers, however, must not let the large number of methods of analysis and the associated calculations presented in this book overwhelm them. More important than the methods used in the analysis is the use of the correct study design and the correct definition and measurement of the study variables. The key to a good study is good data! The following examples demonstrate the importance of the data.

Sometimes because of an incomplete understanding of the data or of possible problems with the data, the conclusion from a study may be problematic. For example, consider a study to examine whether or not circumcision status is associated with cancer of the cervix. One issue the researcher must decide is how to determine the circumcision status. The easiest way is to ask the male if he had been circumcised; however, Lilienfeld and Graham (1) found that 34 percent of 192 consecutive male patients they studied gave incorrect answers about their circumcision status. Most of the incorrect responses were due to the men not knowing they had been circumcised. Hence the use of a direct question instead of an examination may lead to an incorrect conclusion about the relation between circumcision status and cancer of the cervix.

In the preceding example, reliance on the study subject's memory or knowledge could be a mistake. Yaffe and Shapiro (2) provide another example of potential problems when the study subjects' responses are used. They examined the accuracy of subjects' reports of health care utilization and expenditures for 7 months compared with that shown in their medical and insurance records for two geographical areas. In the Baltimore area, which provided data from approximately 375 households, subjects reported only 73 percent of the identified physician office visits and only 54 percent of the clinic visits.
The results for Washington County, Maryland, based on about 315 households, showed 84 percent accuracy for physician office visits but only 39 percent accuracy for clinic visits. Hence the reported utilization of health services by subjects can greatly underestimate the actual utilization and, perhaps more importantly, the accuracy can vary by type of utilization and by population subgroups.

An example of how a wrong conclusion could be reached because of a failure to understand how data are collected comes from Norris and Shipley (3). Figure 1.1 shows the infant mortality rates, calculated conventionally as the ratio of the number of infant deaths to the number of live births during the same period multiplied by 1000, for different racial groups in California and the United States in 1967. Norris and Shipley questioned the accuracy of the rate for American Indians in California because it was much lower than the corresponding American Indian rate in the U.S., and even lower than the rates of the Chinese- and Japanese-Americans in California. Therefore they used a cohort method to recalculate the infant mortality rates. The cohort rate is based on following all the children that were born in California during a year and observing how many of those infants died before they reached 1 year of age. Some deaths were missed, for example, infants that died out of California, but it was estimated that almost 97 percent of the infant deaths of the cohort were captured in the California death records.

[Figure 1.1: Infant mortality rates per 1000 live births by race for California and the United States in 1967.]

Norris and Shipley used 3 years of data in their reexamination of the infant mortality to provide better stability for the rates.
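The two calculations described above can be sketched in a few lines of Python. This is only an illustrative sketch: the function names and all counts are hypothetical, chosen to make the arithmetic easy to follow, and are not taken from Norris and Shipley's data.

```python
def conventional_rate(infant_deaths: int, live_births: int) -> float:
    """Infant deaths per 1000 live births, both counted in the same period."""
    if live_births <= 0:
        raise ValueError("live_births must be positive")
    return 1000 * infant_deaths / live_births


def cohort_rate(died_before_age_1: list[bool]) -> float:
    """Deaths before age 1 per 1000 births, following one birth cohort.

    Each entry is one child born in the cohort year; True means the child
    died before reaching 1 year of age.
    """
    if not died_before_age_1:
        raise ValueError("the birth cohort must not be empty")
    return 1000 * sum(died_before_age_1) / len(died_before_age_1)


# Hypothetical counts, not real data.
print(conventional_rate(infant_deaths=40, live_births=2500))  # 16.0

# Hypothetical cohort: 2000 births, 50 of whom died before age 1.
cohort = [True] * 50 + [False] * 1950
print(cohort_rate(cohort))  # 25.0
```

The conventional rate takes its numerator and denominator from two separate sets of records, whereas the cohort rate counts deaths only among the same children who make up the denominator.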
Figure 1.2 shows the conventional and the cohort rates for the 1965-1967 period by race. The use of data from 3 years has not changed the conventional rates much. The conventional rate for American Indians in California is still much lower than the rate for American Indians in the U.S., although now it is slightly above the Chinese- and Japanese-American rates. The cohort rate for American Indians, however, is now much closer to the corresponding rate found in the United States. The rates for the Chinese- and Japanese-Americans and other races have also increased substantially when the cohort method of calculation is used. What is the explanation for this discrepancy in results between these methods of calculating infant mortality rates?

Norris and Shipley attributed much of the difference to how the birth and death certificates, used in the conventional method, were completed. They found that the birth certificate is typically filled out by hospital staff who deal mostly with the mother; hence, the birth certificate usually reflects the race of the mother. The funeral director is responsible for completing the death record and usually deals with the father who may be of a different racial group than the mother. Hence, the racial identification of an infant can vary between the birth and death records—a mismatch of the numerator (death) and the denominator (birth) in the calculation of the infant death rate. The cohort method is not affected by this possible difference because it uses only the child's race from the birth certificate.

[Figure 1.2: Infant mortality rates per 1000 live births by conventional and cohort methods by race for California, 1965-1967.]

Beginning with the 1989 data year, the National Center for Health Statistics (NCHS) (4, page 53) uses primarily the race of the mother taken from the birth certificate in tabulating data on births. This change should remove the problem caused by having parents from two racial groups in the use of the conventional method of calculating infant mortality rates.

As can be seen, data rarely speak clearly and usually require an interpreter. The interpreter—someone like Norris and Shipley in the earlier example—is someone who is familiar with the subject matter, who understands what the data are supposed to represent, and who knows how the data were collected.

II. REPLICATION: PART OF THE SCIENTIFIC METHOD

Even though most of the examples and problems in this book refer to the analysis of data from a single study, the reader must remember that one study rarely tells the complete story. Statistical analysis of data may demonstrate that there is a high probability of an association between two variables; however, a single study rarely provides proof that such an association exists. Results must be replicated by additional studies that eliminate other factors that could have accounted for the relationship observed between the study variables. For example, many studies have examined the role of cigarette smoking in lung cancer and other diseases. Proponents of smoking argue that these studies do not prove that smoking is the cause of lung cancer; however, through the large number of studies, which almost always have found an association between smoking and lung cancer in a wide variety of situations, it has become clear that smoking greatly increases the risk of developing lung cancer.

Another example of the use of replication is provided by the Food and Drug Administration (FDA). The FDA requires a pharmaceutical company to present data from a number of drug trials before it considers the drug.
The FDA believes that a single trial does not provide sufficient evidence of the drug's efficacy and safety.

III. CONTENTS

The following chapters continue the theme of combining substantive knowledge with statistical methods. Where possible, we also demonstrate how the figures and calculations being considered can be created or performed on the computer. We believe the computer can be an asset as it removes the burden of the calculations and provides more time for the
