The Biostatistics Cookbook; The most user-friendly Guide for the Bio-medical Scientist - Kluwer Academic PDF

THE BIOSTATISTICS COOKBOOK

The most user-friendly guide for the bio/medical scientist

Seth Michelson
Roche Bioscience, Palo Alto, CA, USA

and

Timothy Schofield
Merck Research Laboratories, West Point, PA, USA

KLUWER ACADEMIC PUBLISHERS
NEW YORK / BOSTON / DORDRECHT / LONDON / MOSCOW

eBook ISBN: 0-306-46853-0
Print ISBN: 0-792-33884-7

©2002 Kluwer Academic Publishers
New York, Boston, Dordrecht, London, Moscow

All rights reserved. No part of this eBook may be reproduced or transmitted in any form or by any means, electronic, mechanical, recording, or otherwise, without written consent from the Publisher.

Created in the United States of America

Visit Kluwer Online at: http://www.kluweronline.com
and Kluwer's eBookstore at: http://www.ebooks.kluweronline.com

CONTENTS

Introduction
1 Description
    Populations, distributions and samples
    Measures of central tendency
    Data dispersion, noise and error
    Graphics
2 Inference
    Comparing a sample mean to a population with known mean and variance - the one sample z-test
    Comparing a sample mean to a population with known mean and unknown variance - the one sample t-test
    Comparing before and after data - the two sample paired t-test
    Comparing two means - the two sample unpaired t-test
    Comparing three or more means - the one way analysis of variance
    Comparing two or more proportions: proportions tests and chi-square (χ²)
    Distribution-free measures: non-parametric statistics
3 Estimation
    Data relationships: association and correlation
    Data relationships: mathematical models and linear regression
    Complex data relationships: mathematical models and non-linear regression
4 Design of a statistical experiment
Index

INTRODUCTION

We live in a very uncertain world. Variation surrounds our work. There is noise in our experiments, in our measurements, and in our test subjects.
From all these sources of uncertainty and variation, we try to extract a coherent picture of very complex, and sometimes dynamic, biological and chemical processes. In fact, one of our major challenges is to separate this signal, the 'real' biology or chemistry, from the noise. The tools developed to do this are called, collectively, biostatistics.

Any tool, even a hammer, can be misused. This could result, at best, in inefficiency, and, at worst, in disaster. With the advent of newer, user-friendly statistical software packages, desktop computing, and point-and-click technologies, it is easier than ever to make mistakes in your analyses. The beauty of having access to so much computing power is that you can now enjoy ultimate flexibility in data processing: that can also be a problem. Ask your computer to produce a particular analysis, report or graphic, and that is exactly what you will get: if you happen to have asked for the wrong thing it will be produced just as quickly, and you will probably never know it was wrong. One aim of this handbook is to help you choose the correct tool for the job at hand, understand its strengths and weaknesses, and recognize when you should seek expert advice.

We describe biostatistics as a collection of tools for very good reasons. They are techniques that have been developed to do a job. Although the mathematical theory behind them can sometimes be rather esoteric and quite complex, our primary concern, as experimental scientists, is with how they may be applied, not with the theory behind them. We use biostatistics - the entire tool box - to achieve a variety of goals. We can use some of these tools to describe our data in standard, rigorous ways which allow our audience to know exactly what we mean, and do not mean, when we discuss our results. Other tools are used to compare and draw inferences about populations: a word that needs to be taken in its broadest sense.
Animals treated with different drugs represent different populations, but so do stones quarried from different sites. Yet another set of tools can be used to derive estimates of model parameters. A dose-response curve is a good example of a model-based system from which estimates for parameters such as the ED50 or LD10 can be derived. These estimation tools can also provide good insight into how much uncertainty there is in the model, the data, etc., and how much faith should be placed in the results. The main categories we have just described are called description, inference and estimation, and we will devote one chapter to each.

The point of this book is to make biostatistics accessible. We want to inflame your intuition. Biostatistics can be intimidating if all you see are mathematical formulae - but if you understand why a particular test is performed and what it means in plain English, then you will know when and how to apply it to your own particular problems. That is our goal!

1. DESCRIPTION

Collections of data are not the same thing as information. This is a rather harsh generalization, but one which holds when examined critically. Data points are measurements; they are random 'snapshots' of random processes. Because we human beings are limited by our technology, our measurements contain errors, and because it is impossible to run an experiment of infinite scope and range, data obtained from a limited sample must be extended to an entire underlying population. Data are, therefore, inherently noisy and incomplete.

Information, on the other hand, depends upon context. Data need to be interpretable within that context. Valid summary and description are required to allow the signal to be separated from the noise and to enable the information obtained to be shared. For example, it makes no sense to separate your subjects into different classes and then ignore these classifications when you summarize your results.
There must have been a reason for separating them in the first place: either they received different treatments, they represent different kinds of people, perhaps men and women, or they display some other attribute that makes them unique.

In the next chapter we will explore ways of comparing groups. Before we do, however, it is important that you become acquainted with your data - summarize it, display it and extract from it all the information it has to offer. The tools of biostatistics which allow you to summarize, plot and interpret your data are called descriptive statistics. In the following sections we will discuss each tool separately, but first we will present a brief overview of the areas to be covered.

The point of data description is to enable communication with your colleagues - but what do you want to tell them? Do you really just want to describe the single sample of 10 rats you just received from your animal colony, or do you want to describe the class of subjects known as 'rat' and the effects of a particular treatment upon them? In order to generalize from your sample to the whole population you must be able to associate your observed data with an ideal underlying population that represents all the rats you could have possibly tested. In other words, we need to separate in our own minds the idea of 'population' from the idea of 'sample' so that we can derive a description of the first from the second.

What do we mean by a description? Typically, we want to tell our audience about how our population responds to a stimulus. We would like to say something about the average behavior we observe, whether we mean blood pressure in rats or densities in rocks. The statistician (and the skeptic!) usually also wants to know how your data are distributed around the average. Is one value, or set of values, more likely to occur than any other? We also need to know how much noise is inherent in the experiment.
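To make the idea of a description concrete, here is a minimal Python sketch that summarizes a single sample of 10 rats by its average, its spread around that average, and its range. The blood pressure values are hypothetical, invented purely for illustration, and the `describe` helper is our own name rather than anything defined in the text.

```python
import statistics

# Hypothetical blood pressures (mmHg) for a sample of 10 rats.
# These numbers are invented for illustration only.
sample_mmhg = [172, 168, 181, 175, 190, 166, 178, 184, 171, 177]


def describe(values):
    """Summarize a sample: size, central tendency, dispersion and range."""
    return {
        "n": len(values),
        "mean": statistics.mean(values),
        "median": statistics.median(values),
        "sd": statistics.stdev(values),  # sample SD, n - 1 denominator
        "min": min(values),
        "max": max(values),
    }


summary = describe(sample_mmhg)
print(f"n      = {summary['n']}")
print(f"mean   = {summary['mean']:.1f} mmHg")
print(f"median = {summary['median']:.1f} mmHg")
print(f"sd     = {summary['sd']:.1f} mmHg")
print(f"range  = {summary['min']} to {summary['max']} mmHg")
```

The mean and median speak to the average behavior, while the standard deviation and range give a first indication of how the data are distributed around it and how noisy the sample is.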
Suppose you could study simultaneously all the spontaneously hypertensive rats in the world. You might observe some with mean blood pressures below 90 mmHg, although the chances of that happening are quite small, maybe even 1 in a million. You would probably see more rats with blood pressures between 90 and 100 mmHg, and more still between 100 and 110 mmHg. If you allocated every hypertensive rat in the world to a group defined by blood pressure, classified in 10 mmHg intervals from 90 to 300 mmHg, you would have a clear picture of your population. That kind of experiment cannot be performed and reported in any reasonable time. You therefore need to say something about rats based upon the data observed in, say, 10 of their representatives. In the next section we will discuss populations, samples and distributions, and tie them together so that the summaries you derive from your sample actually represent the underlying population in a statistically rigorous way.
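The thought experiment above can be sketched in a few lines of Python. Since the real population cannot be measured, the sketch simulates a stand-in: a normal distribution with mean 180 mmHg and standard deviation 20 mmHg, which is purely an assumption for illustration and not a figure from the text. It then allocates every value to a 10 mmHg interval from 90 to 300 mmHg and compares the population picture with a sample of 10. The names `bin_label` and `frequency_table` are hypothetical helpers.

```python
import random
from collections import Counter

LOW, HIGH, WIDTH = 90, 300, 10  # interval scheme described in the text (mmHg)


def bin_label(value):
    """Return the lower edge of the 10 mmHg interval holding value.

    Values outside the 90-300 mmHg range return None.
    """
    if not LOW <= value < HIGH:
        return None
    return LOW + WIDTH * int((value - LOW) // WIDTH)


def frequency_table(values):
    """Count how many values fall in each interval, ordered by lower edge."""
    counts = Counter(bin_label(v) for v in values)
    counts.pop(None, None)  # drop values outside the 90-300 mmHg range
    return dict(sorted(counts.items()))


rng = random.Random(0)  # fixed seed so the sketch is repeatable

# ASSUMPTION: a simulated stand-in for "every hypertensive rat in the world".
population = [rng.gauss(180, 20) for _ in range(100_000)]
sample = rng.sample(population, 10)  # the 10 representatives we can afford

pop_table = frequency_table(population)
sample_table = frequency_table(sample)

for lower, count in pop_table.items():
    share = count / len(population)
    in_sample = sample_table.get(lower, 0)
    print(f"{lower:3d}-{lower + WIDTH:3d} mmHg: {share:7.2%} of population, "
          f"{in_sample} of 10 sampled")
```

With only 10 animals, most intervals are empty in the sample even though the population occupies them, which is why the summaries drawn from a sample need a statistically rigorous link back to the underlying population.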
