Applied Statistics
PRINCIPLES AND EXAMPLES
D.R. Cox
E.J. SNELL
Department of Mathematics,
Imperial College, University of London
LONDON NEW YORK
CHAPMAN AND HALL
First published 1981 by
Chapman and Hall Ltd
11 New Fetter Lane, London EC4P 4EE
Published in the USA by
Chapman and Hall
in association with Methuen, Inc.
733 Third Avenue, New York NY 10017
© 1981 D.R. Cox and E.J. Snell
Softcover reprint of the hardcover 1st edition 1981
at the University Press, Cambridge
ISBN-13: 978-94-009-5840-1 e-ISBN-13: 978-94-009-5838-8
DOI: 10.1007/978-94-009-5838-8
This title is available in both hardbound and paperback
editions. The paperback edition is sold subject
to the condition that it shall not, by way of trade
or otherwise, be lent, re-sold, hired out, or
otherwise circulated without the publisher's prior
consent in any form of binding or cover other than
that in which it is published and without a similar
condition including this condition being imposed on
the subsequent purchaser.
All rights reserved. No part of this book may be
reprinted, or reproduced or utilized in any form or
by any electronic, mechanical or other means, now
known or hereafter invented, including photocopying
and recording, or in any information storage and
retrieval system, without permission in writing from
the Publisher.
British Library Cataloguing in Publication Data
Cox, D.R.
Applied statistics.
1. Mathematical statistics
I. Title II. Snell, E.J.
519.5 QA276
Contents
Preface
PART I PRINCIPLES
1. Nature and objectives of statistical analysis
1.1 Introduction
1.2 Data quality
1.3 Data structure and quantity
1.4 Phases of analysis
1.5 Styles of analysis
1.6 Computational and numerical analytical aspects
1.7 Response and explanatory variables
1.8 Types of investigation
1.9 Purposes of investigation
2. Some general concepts
2.1 Types of observation
2.2 Descriptive and probabilistic methods
2.3 Some aspects of probability models
3. Some strategical aspects
3.1 Introduction
3.2 Incorporation of related data and external information
3.3 Role of special stochastic models
3.4 Achievement of economical and consistent description
3.5 Attitudes to assumptions
3.6 Depth and complexity of analysis appropriate
3.7 Analysis in the light of the data
4. Some types of statistical procedure
4.1 Introduction
4.2 Formulation of models: generalities
4.3 Formulation of models: systematic component
4.4 Formulation of models: random component
4.5 Calculation of summarizing quantities
4.6 Graphical analysis
4.7 Significance tests
4.8 Interval estimation
4.9 Decision procedures
4.10 Examination of the adequacy of models
4.11 Parameters and parameterization
4.12 Transformations
4.13 Interaction
PART II EXAMPLES
A Admissions to intensive care unit
B Intervals between adjacent births
C Statistical aspects of literary style
D Temperature distribution in a chemical reactor
E A 'before and after' study of blood pressure
F Comparison of industrial processes in the presence of trend
G Cost of construction of nuclear power plants
H Effect of process and purity index on fault occurrence
I Growth of bones from chick embryos
J Factorial experiment on cycles to failure of worsted yarn
K Factorial experiment on diets for chickens
L Binary preference data for detergent use
M Fertilizer experiment on growth of cauliflowers
N Subjective preference data on soap pads
O Atomic weight of iodine
P Multifactor experiment on a nutritive medium
Q Strength of cotton yarn
R Biochemical experiment on the blood of mice
S Voltage regulator performance
T Intervals between the failure of air-conditioning equipment in aircraft
U Survival times of leukemia patients
V A retrospective study with binary data
W Housing and associated factors
X Educational plans of Wisconsin schoolboys
Summary of examples
Further sets of data
References
Author index
Subject index
Preface
There are many books which set out the more commonly used statistical
methods in a form suitable for applications. There are also widely available
computer packages for implementing these techniques in a relatively painless
way. We have in the present book concentrated not so much on the techniques
themselves but rather on the general issues involved in their fruitful
application.
The book is in two parts, the first dealing with general ideas and principles
and the second with a range of examples, all, however, involving fairly small
sets of data and fairly standard techniques. Readers who have experience of
the application of statistical methods may want to concentrate on the first
part, using the second part, and better still their own experience, to illuminate
and criticize the general ideas. If the book is used by students with little or no
experience of applications, a selection of examples from the second part of the
book should be studied first, any general principles being introduced at a
later stage when at least some background for their understanding is available.
After some hesitation we have decided to say virtually nothing about
detailed computation. This is partly because the procedures readily available
will be different in different institutions. Those having access to GLIM will
find that most of the examples can be very conveniently handled; however, the
parameterization in GLIM, while appropriate for the great generality
achieved, is not always suitable for interpretation and presentation of
conclusions. Most, although not all, of the examples are in fact small enough to
be analysed on a good pocket calculator. Students will find it instructive
to carry out the detailed analysis themselves.
We do not put forward our analyses of the examples as definitive. If the
examples are used in teaching statistical methods, students should be
encouraged to try out their own ideas and to compare thoughtfully the
conclusions from alternative analyses. Further sets of data are included for use
by students.
Many of the examples depend in some way on application of the method
of least squares or analysis of variance or maximum likelihood. Some
familiarity with these is assumed, references being given for specific points.
The examples all illustrate real applications of statistical methods to some
branch of science or technology, although in a few cases fictitious data have
been supplied. The main general limitation on the examples is, as noted
above, that inevitably they all involve quite small amounts of data, and
important aspects of statistical analysis specific to large amounts of data are
therefore not well covered. There is the further point that in practice
over-elaboration of analysis is to be avoided. With very small sets of data, simple
graphs and summary statistics may tell all, yet we have regarded it as
legitimate for illustration in some cases to apply rather more elaborate analyses
than in practice would be justified.
We are grateful to Dr C. Chatfield, University of Bath, for constructive
comments on a preliminary version of the book.
D.R. Cox
E.J. Snell
London, September 1980
Part I Principles
Chapter 1 Nature and objectives of statistical analysis
1.1 Introduction
Statistical analysis deals with those aspects of the analysis of data that are
not highly specific to particular fields of study. That is, the object is to provide
concepts and methods that will, with suitable modification, be applicable in
many different fields of application; indeed one of the attractions of the
subject is precisely this breadth of potential application.
This book is divided into two parts. In the first we try to outline, without
going into much specific detail, some of the general ideas involved in applying
statistical methods. In the second part, we discuss some special problems,
aiming to illustrate both the general principles discussed earlier and also
particular techniques. References to these problems are given in Part I where
appropriate. While all the examples are real, discussion of them is inhibited
by two fairly obvious constraints. Firstly, it is difficult in a book to convey
the interplay between subject-matter considerations and statistical analysis
that is essential for fruitful work. Secondly, for obvious reasons, the sets of
data analysed are quite small. In addition to the extra computation involved
in the analysis of large sets of data, there are further difficulties; for
example, it is hard in large sets of data to detect initially unanticipated
complications. To the extent that many modern applications involve
large sets of data, this book thus paints an oversimplified picture of applied
statistical work.
We deal very largely with methods for the careful analysis and
interpretation of bodies of scientific and technological data. Many of the ideas are in
fact very relevant also to procedures for decision making, as in industrial
acceptance sampling and automatic process control, but there are special
issues in such applications, arising partly from the relatively mechanical
nature of the final procedures. In many applications, however, careful
consideration of objectives will indicate a specific feature of central interest.
Special considerations enter also into the standardized analysis of routine
test data, for example in a medical or industrial context. The need here may
be for clearly specified procedures that can be applied, again in a quite
mechanical fashion, giving sensible answers in a wide range of circumstances,
and allowing possibly for individual 'management by exception' in extremely
peculiar situations; quite commonly, simple statistical procedures are built
in to measuring equipment. Ideally, largely automatic rejection of 'outliers'
and routine quality control of measurement techniques are incorporated. In
the present book, however, we are primarily concerned with the individual
analysis of unique sets of data.
1.2 Data quality
We shall not in this book deal other than incidentally with the planning of
data collection, e.g. the design of experiments, although it is clear in a general
way that careful attention to design can simplify analysis and strengthen
interpretation.
We begin the discussion here, however, by supposing that data become
available for analysis. The first concerns are then with the quality of the data
and with what can be broadly called its structure. In this section we discuss
briefly data quality.
Checks of data quality typically include:
(i) visual or automatic inspection of the data for values that are logically
inconsistent or in conflict with prior information about the ranges likely to
arise for the various variables. For instances of possibly extreme
observations, see Examples E and S. Inspection of the minimum and maximum of
each variable is a minimal check;
(ii) examination of frequency distributions of the main variables to look
for small groups of discrepant observations;
(iii) examination of scatter plots of pairs of variables likely to be highly
related, which detects discrepant observations more sensitively than (ii);
(iv) a check of the methods of data collection to discover the sources, if
any, of biases in measurement (e.g. differences between observers) which it
may be necessary to allow for in analysis, and to assess the approximate
measurement and recording errors for the main variables;
(v) a search for missing observations, including observations that have
been omitted because of their highly suspicious character. Often missing
observations are denoted in some conventional way, such as 0 or 99, and it
will be important not to enter these as real values in any analysis.
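Checks (i) and (v) lend themselves to a simple mechanical screen. The following Python fragment is a minimal sketch only, not a procedure from this book: the function name screen, the plausible range, the missing-value codes and the readings are all hypothetical and chosen purely for illustration. It reports the positions of conventional missing-value codes, the positions of recorded values outside the assumed plausible range, and the minimum and maximum of the recorded values.

```python
# Hypothetical settings for illustration; neither comes from the book.
PLAUSIBLE_RANGE = (40, 250)   # assumed plausible limits for the variable
MISSING_CODES = {0, 99}       # assumed conventional codes for "missing"


def screen(values, plausible_range=PLAUSIBLE_RANGE, missing_codes=MISSING_CODES):
    """Flag missing-value codes and out-of-range values in one variable."""
    low, high = plausible_range
    # Check (v): positions holding a conventional missing-value code.
    missing_idx = [i for i, v in enumerate(values) if v in missing_codes]
    # Everything else is treated as a recorded observation.
    recorded = [(i, v) for i, v in enumerate(values) if v not in missing_codes]
    # Check (i): recorded values outside the plausible range.
    suspect_idx = [i for i, v in recorded if not low <= v <= high]
    observed = [v for _, v in recorded]
    return {
        "missing_idx": missing_idx,
        "suspect_idx": suspect_idx,
        "min": min(observed) if observed else None,
        "max": max(observed) if observed else None,
    }


raw = [120, 135, 99, 0, 142, 1180, 128]   # made-up readings
report = screen(raw)
print(report)
```

Values flagged in this way are candidates for inspection, not for automatic deletion. A code such as 99 may also coincide with a genuine value of some variables, so the choice of missing-value codes itself needs checking against the method of data collection.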
It is of great importance that data quality should be high, without extensive
effort being spent on achieving unrealistically high precision. In
particular, recording of data to a large number of digits can be wasteful; on
the other hand, excessive rounding sacrifices information. The extent to which
poor data quality can be set right by more elaborate analysis is very limited,
particularly when appreciable systematic errors are likely to be present and
cannot be investigated and removed. By and large such poor-quality data
will not merit very detailed analysis.