
Modelling Binary Data

381 Pages·1991·10.961 MB·English


Modelling Binary Data

OTHER STATISTICS TEXTS FROM CHAPMAN & HALL
Practical Statistics for Medical Research, D. Altman
The Analysis of Time Series, C. Chatfield
Problem Solving: A Statistician's Guide, C. Chatfield
Statistics for Technology, C. Chatfield
Introduction to Multivariate Analysis, C. Chatfield and A. J. Collins
Applied Statistics: Principles and Examples, D. R. Cox and E. J. Snell
An Introduction to Generalized Linear Models, A. J. Dobson
Introduction to Optimization Methods and their Application in Statistics, B. S. Everitt
Multivariate Statistics: A Practical Approach, B. Flury and H. Riedwyl
Readings in Decision Analysis, S. French
Multivariate Analysis of Variance and Repeated Measures, D. J. Hand and C. C. Taylor
Multivariate Statistical Methods: A Primer, Bryan F. Manly
Statistical Methods in Agriculture and Experimental Biology, R. Mead and R. N. Curnow
Elements of Simulation, B. J. T. Morgan
Probability: Methods and Measurement, A. O'Hagan
Essential Statistics, D. G. Rees
Foundations of Statistics, D. G. Rees
Decision Analysis: A Bayesian Approach, J. Q. Smith
Applied Statistics: A Handbook of BMDP Analyses, E. J. Snell
Elementary Applications of Probability Theory, H. C. Tuckwell
Intermediate Statistical Methods, G. B. Wetherill

Further information on the complete range of Chapman & Hall statistics books is available from the publishers.

Modelling Binary Data
D. Collett
Department of Applied Statistics, University of Reading, UK
Springer-Science+Business Media, B.V.

First edition 1991
© D. Collett 1991
Originally published by Chapman & Hall in 1991
Softcover reprint of the hardcover 1st edition 1991
Typeset in Times by Interprint Limited, Malta
ISBN 978-0-412-38790-6
ISBN 978-1-4899-4475-7 (eBook)
DOI 10.1007/978-1-4899-4475-7

Apart from any fair dealing for the purposes of research or private study, or criticism or review, as permitted under the UK Copyright Designs and Patents Act, 1988, this publication may not be reproduced, stored, or transmitted, in any form or by any means, without the prior permission in writing of the publishers, or in the case of reprographic reproduction only in accordance with the terms of the licences issued by the Copyright Licensing Agency in the UK, or in accordance with the terms of licences issued by the appropriate Reproduction Rights Organization outside the UK. Enquiries concerning reproduction outside the terms stated here should be sent to the publishers at the London address printed on this page.

The publisher makes no representation, express or implied, with regard to the accuracy of the information contained in this book and cannot accept any legal responsibility or liability for any errors or omissions that may be made.

The GLIM macros contained in Appendix C may be reproduced and used, with no implied warranty from the author or publisher.

A catalogue record for this book is available from the British Library.

Library of Congress Cataloging-in-Publication Data
Collett, D., 1952-
Modelling binary data / D. Collett.
p. cm.
Includes bibliographical references and index.
ISBN 978-0-412-38790-6
1. Analysis of variance. 2. Distribution (Probability theory) 3. Linear models (Statistics) I. Title.
QA279.C64 1991
519.5'38 - dc20
91-23845 CIP

To Janet

Contents

Preface
1 Introduction
  1.1 Some examples
  1.2 The scope of this book
  1.3 Use of statistical software
  Further reading
2 Statistical inference for binary data
  2.1 The binomial distribution
  2.2 Inference about the success probability
  2.3 Comparison of two proportions
  2.4 Comparison of two or more proportions
  Further reading
3 Models for binary and binomial data
  3.1 Statistical modelling
  3.2 Linear models
  3.3 Methods of estimation
  3.4 Fitting linear models to binomial data
  3.5 Models for binomial response data
  3.6 The linear logistic model
  3.7 Fitting the linear logistic model to binomial data
  3.8 Goodness of fit of a linear logistic model
  3.9 Comparing linear logistic models
  3.10 Linear trends in proportions
  3.11 Comparing stimulus-response relationships
  3.12 Non-convergence and overfitting
  3.13 A further example on model selection
  3.14 Predicting a binary response probability
  Further reading
4 Bioassay and some other applications
  4.1 The tolerance distribution
  4.2 Estimating an effective dose
  4.3 Relative potency
  4.4 Natural response
  4.5 Non-linear logistic regression models
  4.6 Applications of the complementary log-log model
  Further reading
5 Model checking
  5.1 Definition of residuals
  5.2 Checking the form of the linear predictor
  5.3 Checking the adequacy of the link function
  5.4 Identification of outlying observations
  5.5 Identification of influential observations
  5.6 Checking the assumption of a binomial distribution
  5.7 Model checking for binary data
  5.8 Summary and recommendations
  5.9 A further example on the use of diagnostics
  Further reading
6 Overdispersion
  6.1 Potential causes of overdispersion
  6.2 Modelling variability in response probabilities
  6.3 Modelling correlation between binary responses
  6.4 Modelling overdispersed data
  6.5 The special case of equal n_i
  6.6 The beta-binomial model
  6.7 Random effects in a linear logistic model
  6.8 Comparison of methods
  6.9 A further example on modelling overdispersion
  Further reading
7 Modelling data from epidemiological studies
  7.1 Basic designs for aetiological studies
  7.2 Measures of association between disease and exposure
  7.3 Confounding and interaction
  7.4 The linear logistic model for data from cohort studies
  7.5 Interpreting the parameters in a linear logistic model
  7.6 The linear logistic model for data from case-control studies
  7.7 Matched case-control studies
  7.8 A matched case-control study on sudden infant death syndrome
  Further reading
8 Some additional topics
  8.1 Analysis of proportions and percentages
  8.2 Analysis of rates
  8.3 Analysis of binary data from cross-over trials
  8.4 Random effects modelling
  8.5 Modelling errors in the measurement of explanatory variables
  8.6 Analysis of binary time series
  8.7 Multivariate binary data
  8.8 Experimental design
9 Computer software for modelling binary data
  9.1 Statistical packages for modelling binary data
  9.2 Computer-based analyses of example data sets
  9.3 Using packages to perform some non-standard analyses
  9.4 Summary of the relative merits of packages for modelling binary data
  Further reading
Appendix A
Appendix B
  B.1 An algorithm for fitting a generalized linear model to binomial data
  B.2 The likelihood function for a matched case-control study
Appendix C
  C.1 GLIM macro for computing Anscombe residuals
  C.2 GLIM macro for constructing a half-normal plot of the standardised deviance residuals with simulated envelopes
  C.3 GLIM macro for constructing a smoothed line
  C.4 GLIM macro for computing the Δβ-statistic
  C.5 GLIM macro for computing the ΔD-statistic
  C.6 GLIM macro for computing the C-statistic
  C.7 GLIM macro for implementing the Williams procedure for modelling overdispersion
References
Index of Examples
Index

Preface

Data are said to be binary when each observation falls into one of two categories, such as alive or dead, positive or negative, defective or non-defective, and success or failure. In this book, it is shown how data of this type can be analysed using a modelling approach. There are several books that are devoted to the analysis of binary data, and a larger number that include material on this topic. However, there does appear to be a need for a textbook of an intermediate level which dwells on the practical aspects of modelling binary data, which incorporates recent work on checking the adequacy of fitted models, and which shows how modern computational facilities can be fully exploited. This book is designed to meet that need.

The book begins with a description of a number of studies in which binary data have been collected. These data sets, and others besides, are then used to illustrate the techniques that are presented in the subsequent chapters. The majority of the examples are drawn from the agricultural, biological and medical sciences, mainly because these are the areas of application in which binary data are most frequently encountered. Naturally, the methods described in this book can be applied to binary data from other disciplines.

Underlying most analyses of binary data is the assumption that the observations are from a binomial distribution, and in Chapter 2, a number of standard statistical procedures based on this distribution are described. The modelling approach is then introduced in Chapter 3, with particular emphasis being placed on the use of the linear logistic model. The analysis of binary data from biological assays, and other applications of models for binary data, are covered in Chapter 4.
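As a rough illustration of the modelling approach described above (not code from the book, whose analyses use GLIM and other packages), the sketch below fits a linear logistic model, logit(p_i) = b0 + b1*x_i, to grouped binomial data with y_i successes out of n_i trials, using iteratively reweighted least squares. It assumes a single explanatory variable so that each step is a 2x2 weighted least-squares solve; the function names fit_logistic and deviance and the toy data are hypothetical.

```python
import math


def fit_logistic(x, y, n, tol=1e-10, max_iter=50):
    """Fit logit(p_i) = b0 + b1*x_i to y_i successes out of n_i trials by
    iteratively reweighted least squares (Fisher scoring).

    Sketch only: no safeguards for data where the estimates diverge
    (e.g. complete separation, or every y_i equal to 0 or n_i).
    """
    b0, b1 = 0.0, 0.0
    for _ in range(max_iter):
        sw = swx = swxx = swz = swxz = 0.0
        for xi, yi, ni in zip(x, y, n):
            eta = b0 + b1 * xi                   # linear predictor
            p = 1.0 / (1.0 + math.exp(-eta))     # inverse logit
            w = ni * p * (1.0 - p)               # iterative weight
            z = eta + (yi - ni * p) / w          # adjusted dependent variable
            sw += w
            swx += w * xi
            swxx += w * xi * xi
            swz += w * z
            swxz += w * xi * z
        # Solve the 2x2 weighted normal equations for the updated estimates.
        det = sw * swxx - swx * swx
        new_b0 = (swxx * swz - swx * swxz) / det
        new_b1 = (sw * swxz - swx * swz) / det
        converged = abs(new_b0 - b0) < tol and abs(new_b1 - b1) < tol
        b0, b1 = new_b0, new_b1
        if converged:
            break
    return b0, b1


def deviance(x, y, n, b0, b1):
    """Binomial deviance of the fitted model, treating 0*log(0) as 0."""
    d = 0.0
    for xi, yi, ni in zip(x, y, n):
        fitted = ni / (1.0 + math.exp(-(b0 + b1 * xi)))
        if yi > 0:
            d += yi * math.log(yi / fitted)
        if yi < ni:
            d += (ni - yi) * math.log((ni - yi) / (ni - fitted))
    return 2.0 * d


if __name__ == "__main__":
    # Hypothetical data: 5/10 successes at x = 0 and 8/10 at x = 1.
    b0, b1 = fit_logistic([0, 1], [5, 8], [10, 10])
    print(b0, b1)  # approximately 0.0 and log(4), about 1.386
```

With only two groups the model is saturated, so the fitted probabilities reproduce the observed proportions 0.5 and 0.8 and the deviance is zero. With more groups than parameters, the deviance can be compared with its residual degrees of freedom as a rough check on goodness of fit.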
A distinguishing feature of this book is the amount of material that has been included on methods for assessing the adequacy of a fitted model, and the phenomenon of overdispersion. Chapter 5 provides a comprehensive account of model checking diagnostics, while models for overdispersed data, that is, data which are more variable than the binomial distribution can accommodate, are reviewed in Chapter 6. Both of these chapters include a summary of the methods that are most likely to be useful on a routine basis. A major area of application of the linear logistic model is to the analysis of data from epidemiological studies. In Chapter 7, we see how this model provides the basis of a systematic approach for analysing data from cohort studies and both unmatched and matched case-control studies. Chapter 8 contains a brief discussion of some additional topics in the analysis of binary data.
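The idea of data that are "more variable than the binomial distribution can accommodate" can be made concrete with a simple and widely used check, which is not necessarily the procedure the book recommends. If the binomial model holds, the Pearson X^2 statistic divided by its residual degrees of freedom should be close to 1, and a ratio well above 1 suggests overdispersion. The sketch fits a common success probability to several groups (its maximum likelihood estimate is the pooled proportion). The function name pearson_dispersion and the data are hypothetical.

```python
def pearson_dispersion(y, n, fitted_p, n_params):
    """Return Pearson X^2, residual df, and X^2/df for grouped binomial data."""
    x2 = 0.0
    for yi, ni, pi in zip(y, n, fitted_p):
        expected = ni * pi
        variance = ni * pi * (1.0 - pi)   # binomial variance of y_i
        x2 += (yi - expected) ** 2 / variance
    df = len(y) - n_params
    return x2, df, x2 / df


# Hypothetical data: y[i] successes out of n[i] trials in four groups.
y = [2, 8, 3, 7]
n = [10, 10, 10, 10]

# Under a common success probability, the MLE is the pooled proportion.
p_pooled = sum(y) / sum(n)
x2, df, phi = pearson_dispersion(y, n, [p_pooled] * len(y), n_params=1)
print(f"X2 = {x2:.2f} on {df} df, ratio = {phi:.2f}")  # X2 = 10.40 on 3 df, ratio = 3.47
```

Here the group proportions vary far more than binomial sampling with a single p = 0.5 would produce, so the ratio is about 3.5 rather than about 1.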
