
Essentials of Statistics for Scientists and Technologists PDF

179 Pages·1967·5.474 MB·English

Preview Essentials of Statistics for Scientists and Technologists

Essentials of Statistics for Scientists and Technologists

C. MACK
Reader in Applied Mathematics
Institute of Technology, Bradford

PLENUM PRESS, NEW YORK, 1967

U.S. edition published by Plenum Press, a division of Plenum Publishing Corporation, 227 West 17th Street, New York, N.Y. 10011

ISBN 978-1-4684-7967-6
ISBN 978-1-4684-7965-2 (eBook)
DOI 10.1007/978-1-4684-7965-2

© C. Mack 1966
First published 1967
Library of Congress Catalog Card Number 67-17769

Preface

Statistics is of ever-increasing importance in Science and Technology and this book presents the essentials of the subject in a form suitable either as the basis of a course of lectures or to be read and/or used on its own. It assumes very little in the way of mathematical knowledge: just the ability to substitute numerically in a few simple formulae. However, some mathematical proofs are outlined or given in full to illustrate the derivation of the subject; these can be omitted without loss of understanding.

The book does aim at making clear the scope and nature of those essential tests and methods that a scientist or technologist is likely to need; to this end each chapter has been divided into sections with their own subheadings and some effort has been made to make the text unambiguous (if any reader finds a misleading point anywhere I hope he will write to me about it). Also with this aim in view, the equality of probability to proportion of population is stated early, then the normal distribution and the taking of samples is discussed. This occupies the first five chapters. With the principles of these chapters understood, the student can immediately learn the significance tests of Chapter 6 and, if he needs it, the analysis of variance of Chapter 7. For some scientists this will be most of what they need. However, they will be in a position to read and/or use the remaining chapters without undue difficulty.
Chapters 8 to 13 contain material which is of value to almost everyone at some time, but the remaining chapters are more specialized in content. The laws (or rules) of probability and the binomial theorem are dealt with in some detail in Chapter 9, for those interested in the fundamentals or in fraction defective testing. (The author and some of his colleagues have found the above order of material to give very good results with scientists and technologists, who have, as a result, grasped the essentials without encountering the difficulties that the formal laws of probability followed by the binomial distribution often give at the start.)

A word to the student

Learn the technical terms, they are usually introduced in inverted commas. You must do all the examples (all who have learnt statistics have found this essential). These are largely straightforward applications of the text; they are, by the way, in each section and not collected at the end of each chapter; this is to enable student or teacher to identify quickly the nature of each example, and also to enable an occasional user to try out his understanding of a test on a numerical example (answers are given at the end of the book, p. 158). In addition Chapter 16 consists largely of a collection of problems of practical origin which will acquaint the reader with actual situations requiring statistics (and which are also useful as examination questions). The Appendix contains enough tables for most practical purposes of a not too specialized nature.
The pertinent latest theoretical developments and improvements have been included, among them Welch's test for the significant difference between two sample means (which is independent of the true values of the population variances), the so-called 'distribution free or non-parametric' tests, and my own method of calculating confidence intervals when finding the fraction defective of articles whose parts are tested separately (in other words the product of several binomial parameters).

Acknowledgments

Thanks are due to Messrs Oliver and Boyd, Edinburgh, for permission to quote data from Statistical Methods in Research and Production by O. L. Davies. Other quotation acknowledgments are made in the Appendix. The original results of section 9.9 are quoted with the editor's permission from The New Journal of Statistics and Operational Research, published at the Institute of Technology, Bradford.

Finally, I must acknowledge my deep personal indebtedness to those who introduced me to statistics, especially to L. H. C. Tippett and Professor M. S. Bartlett, and to my present colleagues M. L. Chambers and M. Gent, on whose knowledge and advice I have leaned heavily in writing this book, as well as to my scientist colleagues E. Dyson, J. M. Rooum, and H. V. Wyatt who contributed many of the practical examples of Chapter 16.

Contents

1 Introduction or 'What is statistics?'
2 The presentation of data
3 Probability, its meaning, real and theoretical populations
4 Basic properties of the normal distribution
5 Some properties of sampling distributions
6 Applications of normal sampling theory; significance tests
7 Normal sampling theory: test for difference between several sample means, analysis of variance, design of experiments
8 Normal sampling theory: estimation of 'parameters' by confidence intervals, by maximum likelihood
9 The binomial distribution: laws of probability, applications of the binomial distribution, the multinomial distribution
10 The Poisson, negative exponential, and rectangular distributions
11 The χ² test for 'goodness of fit': test for 'association'
12 Fitting lines and curves to data, least squares method
13 Regression curves and lines, correlation coefficient, normal bivariate distribution
14 Some distribution-independent (or 'distribution-free' or 'non-parametric') tests
15 Note on sampling techniques and quality control
16 Some problems of practical origin
Appendix
Answers
Index

1 Introduction or 'What is statistics?'

I do believe,
Statist though I am none nor like to be,
That this will prove a war
(Cymbeline, Act II, Scene I)

Often the best way to understand a subject is to trace its historical development and, pursuing this policy, we note that the word 'statist' first appeared in the English language while Shakespeare was a young man (c. 1584). It meant 'a person knowledgeable in state affairs' as might be inferred from the quotation above. Almost exactly two hundred years later the word 'statistics' first appeared denoting the science of collecting, presenting, classifying and/or analysing data of importance in state or public affairs (note that the ending 'ics' usually connotes a science, e.g. physics).
Within a further fifty years the word 'statistics' began to be applied to the data itself and, later, particular types of data acquired particular adjectives. Thus 'vital statistics' came to be used for the figures of births and deaths (from vita the Latin for life). Later still, any collection of data was called 'statistics' and, so, today the word means either 'data' or the 'science of collecting and analysing data', and whichever of the two meanings applies is determined from the context. Though, perhaps, the very latest use of the term 'vital statistics' to denote the most important measurements of bathing beauties should be mentioned (together with the Duke of Edinburgh's witty remark that 'Nowadays, statistics appears to mean the use of three figures to define one figure').

In this book we shall concentrate more on the analysis of data than on its collection or presentation and this analysis derives largely from the study of quite a different type of heavenly body, i.e. astronomy. In astronomy very great accuracy is essential and the need arose to make the maximum use of all the observations on a star to deduce its 'best' position. The methods developed by Gauss for this purpose (which were based on the theory of dice-throwing developed in 1654 by Pascal and Fermat, the very first theoretical work of all) were later applied to state and biological data, and were greatly extended and developed by Karl Pearson and others from 1900 onwards to this end. Later, agricultural experiments (e.g. the measurements of wheat yields with different strains, with different fertilizers, etc.) led to further important analytical developments largely due to Fisher. From 1920 onwards an ever-increasing flow of both problems and solutions has arisen in every branch of science and technology, from industrial administration, from economics, etc. Indeed, it would be fair to say that some knowledge of statistics is essential to any education today.
However, the reader need not be alarmed at the apparent need to acquire a mass of knowledge, for the subject matter has been reduced to a few clear-cut principles which, once grasped, apply to most problems. The aim of this book is to make those principles clear and also to give a clear idea of their use.

We conclude this introduction with some problems which illustrate the need for, and use of, statistics. Now, shoe manufacturers are faced with the problem of what proportion of size 5, size 6, etc. shoes they should make and with what range of widths, etc. Since it is impossible to measure everyone in the country a small fraction or 'sample' of the population has to be measured. These questions then arise: how many people should be measured; how should they be selected; what reliability can be placed on the answer, etc.? The modern theory of statistics supplies answers to these problems. (The Shoe and Allied Trade Research Association has carried out a number of such surveys since its inception in 1948 and it may be a coincidence but I can now buy a good-fitting pair of shoes 'off the peg' and have lost the feeling of deformity that remarks like 'Rather a low instep, sir' had given me; my thanks are due to the manufacturers and to SATRA.)

Again no manufacturer can guarantee that every article he makes is perfect, indeed, it is quite uneconomic to attempt to do so, but he likes to be sure that 95%, say, of the articles reach a given standard. Again, it is usually uneconomic to test every article (it may even be impossible, e.g. testing the reliability of striking of matches). The problems of how many should be tested and what proportion of these can be safely allowed to be below the given standard are answered by statistical theory.

Similarly, in scientific work, no experiment exactly repeats itself. Thus, suppose we measure the velocity of light several times by a particular method.
Each time we shall get a slightly different answer and we have to decide what the 'best' answer is. Again, suppose we measure the velocity by a different method; then we shall get a different 'best' answer and we have to compare this with the first to determine if they are consistent.

These and other problems will be dealt with in this book, though we shall not deal with the theory in mathematical detail, but will outline sufficient to enable a scientist or technologist to appreciate the scope of each method or test. With this aim in mind we start with one of the early problems in the development of statistics, one which is essential to a clear understanding of the subject, namely the presentation of data.

2 The presentation of data

2.1 Graphical presentation (frequency diagram, histogram)

One answer to the problem of presenting a mass of data in a clear manner is to present it graphically. One very useful form of graphical presentation is the 'frequency diagram'. In Fig. 2.1 the contents of Table 2.1, which gives the wages 'statistics' of a small factory, are shown as a frequency diagram.

[FIGURE 2.1. Frequency diagram showing the wage-rate data of Table 2.1. Horizontal axis: wage-rate (variate), 69 to 90; vertical axis: frequency, 0 to 50.]

Table 2.1 WAGE-RATES IN PENCE PER HOUR OF THE MEN IN A FACTORY

Rate in pence per hour   69  72  75  78  81  84  87  90
Number of men            10  17  23  55  47  28  18  13

Before describing the nature of a frequency diagram in detail, we define a few basic terms which the reader should know. Data in statistics usually consist of measurements or readings of some property of each person or object in a specified group. Thus, the wage-rates of Table 2.1 are measurements of the earning power of that group consisting of the men in the factory. We call the property measured the 'variate', and the number of persons (or objects) who have a given value of the variate we call the 'frequency' of that variate-value.
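As a minimal sketch (in Python, which the book itself does not use), the data of Table 2.1 can be held as a mapping from each variate-value to its frequency and drawn as a rough text frequency diagram in which the length of each bar is proportional to the frequency. The names wage_rate_frequency and frequency_diagram are illustrative assumptions, not taken from the text.

```python
# Table 2.1: wage-rate in pence per hour (the variate) -> number of men (the frequency).
wage_rate_frequency = {69: 10, 72: 17, 75: 23, 78: 55, 81: 47, 84: 28, 87: 18, 90: 13}


def frequency_diagram(freq, scale=1):
    """Return one text line per variate-value.

    The number of '#' marks on each line is the frequency divided by
    `scale` (rounded), so bar lengths are proportional to the frequencies.
    """
    lines = []
    for value in sorted(freq):
        bar = "#" * round(freq[value] / scale)
        lines.append(f"{value:>3} | {bar} {freq[value]}")
    return lines


# Total number of men in the factory: the sum of all the frequencies.
total_men = sum(wage_rate_frequency.values())

if __name__ == "__main__":
    for line in frequency_diagram(wage_rate_frequency):
        print(line)
    print("total men:", total_men)
```

With scale=1 each '#' stands for one man; a larger scale shortens every bar by the same factor, so the proportions between bars are kept apart from rounding.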
In Table 2.1 the 'variate' is the wage-rate and the number of men with a given wage-rate is the frequency (e.g. the frequency of a variate-value of 81 pence per hour is 47). The essential feature of a frequency diagram is that the heights of the ordinates are proportional to the frequencies.

Example 2.1(i). Draw frequency diagrams for the following two sets of data:

(a) Lengths of words selected at random from a dictionary

Number of letters in the word    3    4    5    6    7    8    9   10   11   12   13
Frequency                       26   81  100   96   89   61   68   42   31   13   11

(b) Observations of proportion of sky covered by cloud at Greenwich

Proportion of sky covered   0.0  0.1  0.2  0.3  0.4  0.5  0.6  0.7  0.8  0.9  1.0
Frequency                   320  129   74   68   45   45   55   65   90   48  676

The frequency diagram gives a good pictorial description of the data if there are not too many different values of the variate (in Table 2.1 there are 8 different variate-values). When there are 15 or more variate-values it is usually better to group some of the variate-values together and to present the results in the form of a 'histogram', as it is called. Fig. 2.2 presents the data of Table 2.2 (this gives the observed heights of 8,585 men) in 'histogram' form. In a histogram the area of each box is proportional to the frequency (which, in this case, is the number of men with heights within a given box-width).

There are one or two details of presentation by histogram worth remarking on.

(1) Where there are only a few observations, as with heights less than 60 in., the observations are usually grouped into one box; similarly with heights greater than 74 in.

(2) The width of the boxes is usually the same except for grouped frequencies.

(3) By the height 62-in. is meant heights of 62 in. or more up to but not including 63 in.; similarly 63-in. means 63 in. or more up to 64 and so on.

(4) (This point can be omitted on a first reading.)
The accuracy of the measurements must be carefully noted; in this case each height was measured to the nearest tin.; thus, any height
