Wiley Series in Probability and Statistics
Applied Logistic
Regression
Second Edition
David W. Hosmer
Stanley Lemeshow
WILEY SERIES IN PROBABILITY AND STATISTICS
TEXTS AND REFERENCES SECTION
Established by WALTER A. SHEWHART and SAMUEL S. WILKS
Editors: Noel A. C. Cressie, Nicholas I. Fisher, Iain M. Johnstone, J. B. Kadane,
David W. Scott, Bernard W. Silverman, Adrian F. M. Smith, Jozef L. Teugels;
Vic Barnett, Emeritus, Ralph A. Bradley, Emeritus, J. Stuart Hunter, Emeritus,
David G. Kendall, Emeritus
A complete list of the titles in this series appears at the end of this volume.
Applied Logistic Regression
Second Edition
DAVID W. HOSMER
University of Massachusetts
Amherst, Massachusetts
STANLEY LEMESHOW
The Ohio State University
Columbus, Ohio
A Wiley-Interscience Publication
JOHN WILEY & SONS, INC.
New York • Chichester • Weinheim • Brisbane • Singapore • Toronto
To Trina, Wylie, Tri,
D. W. H.
To Elaine, Jenny, Adina, Steven,
S. L.
This text is printed on acid-free paper.
Copyright © 2000 by John Wiley & Sons, Inc.
All rights reserved. Published simultaneously in Canada.
No part of this publication may be reproduced, stored in a retrieval system or transmitted
in any form or by any means, electronic, mechanical, photocopying, recording, scanning
or otherwise, except as permitted under Sections 107 or 108 of the 1976 United States
Copyright Act, without either the prior written permission of the Publisher, or
authorization through payment of the appropriate per-copy fee to the Copyright
Clearance Center, 222 Rosewood Drive, Danvers, MA 01923, (978) 750-8400, fax
(978) 750-4470. Requests to the Publisher for permission should be addressed to the
Permissions Department, John Wiley & Sons, Inc., 111 River Street, Hoboken, NJ 07030,
(201) 748-6011, fax (201) 748-6008, E-Mail: PERMREQ@WILEY.COM.
To order books or for customer service, please call 1(800)-CALL-WILEY (225-5945).
Library of Congress Cataloging in Publication Data:
Hosmer, David W.
Applied logistic regression / David W. Hosmer, Jr., Stanley Lemeshow.-2nd ed.
p. cm.
Includes bibliographical references and index.
ISBN 0-471-35632-8 (cloth : alk. paper)
1. Regression analysis. I. Lemeshow, Stanley. II. Title.
QA278.2.H67 2000
519.5'36-dc21 00-036843
Printed in the United States of America
10 9 8 7 6 5 4
CONTENTS
1 Introduction to the Logistic Regression Model 1
1.1 Introduction, 1
1.2 Fitting the Logistic Regression Model, 7
1.3 Testing for the Significance of the Coefficients, 11
1.4 Confidence Interval Estimation, 17
1.5 Other Methods of Estimation, 21
1.6 Data Sets, 23
1.6.1 The ICU Study, 23
1.6.2 The Low Birth Weight Study, 25
1.6.3 The Prostate Cancer Study, 26
1.6.4 The UMARU IMPACT Study, 27
Exercises, 28
2 Multiple Logistic Regression 31
2.1 Introduction, 31
2.2 The Multiple Logistic Regression Model, 31
2.3 Fitting the Multiple Logistic Regression Model, 33
2.4 Testing for the Significance of the Model, 36
2.5 Confidence Interval Estimation, 40
2.6 Other Methods of Estimation, 43
Exercises, 44
3 Interpretation of the Fitted Logistic Regression Model 47
3.1 Introduction, 47
3.2 Dichotomous Independent Variable, 48
3.3 Polychotomous Independent Variable, 56
3.4 Continuous Independent Variable, 63
3.5 The Multivariable Model, 64
3.6 Interaction and Confounding, 70
3.7 Estimation of Odds Ratios in the Presence of
Interaction, 74
3.8 A Comparison of Logistic Regression and
Stratified Analysis for 2 x 2 Tables, 79
3.9 Interpretation of the Fitted Values, 85
Exercises, 88
4 Model-Building Strategies and Methods for
Logistic Regression 91
4.1 Introduction, 91
4.2 Variable Selection, 92
4.3 Stepwise Logistic Regression, 116
4.4 Best Subsets Logistic Regression, 128
4.5 Numerical Problems, 135
Exercises, 142
5 Assessing the Fit of the Model 143
5.1 Introduction, 143
5.2 Summary Measures of Goodness-of-Fit, 144
5.2.1 Pearson Chi-Square Statistic and Deviance, 145
5.2.2 The Hosmer-Lemeshow Tests, 147
5.2.3 Classification Tables, 156
5.2.4 Area Under the ROC Curve, 160
5.2.5 Other Summary Measures, 164
5.3 Logistic Regression Diagnostics, 167
5.4 Assessment of Fit via External Validation, 186
5.5 Interpretation and Presentation of Results from
a Fitted Logistic Regression Model, 188
Exercises, 200
6 Application of Logistic Regression with Different
Sampling Models 203
6.1 Introduction, 203
6.2 Cohort Studies, 203
6.3 Case-Control Studies, 205
6.4 Fitting Logistic Regression Models to Data
from Complex Sample Surveys, 211
Exercises, 222
7 Logistic Regression for Matched Case-Control Studies 223
7.1 Introduction, 223
7.2 Logistic Regression Analysis for the 1-1
Matched Study, 226
7.3 An Example of the Use of the Logistic Regression
Model in a 1-1 Matched Study, 230
7.4 Assessment of Fit in a Matched Study, 236
7.5 An Example of the Use of the Logistic Regression
Model in a 1-M Matched Study, 243
7.6 Methods for Assessment of Fit in a 1-M
Matched Study, 248
7.7 An Example of Assessment of Fit in a 1-M
Matched Study, 252
Exercises, 259
8 Special Topics 260
8.1 The Multinomial Logistic Regression Model, 260
8.1.1 Introduction to the Model and Estimation of the
Parameters, 260
8.1.2 Interpreting and Assessing the Significance of the
Estimated Coefficients, 264
8.1.3 Model-Building Strategies for Multinomial Logistic
Regression, 273
8.1.4 Assessment of Fit and Diagnostics for the
Multinomial Logistic Regression Model, 280
8.2 Ordinal Logistic Regression Models, 288
8.2.1 Introduction to the Models, Methods for Fitting
and Interpretation of Model Parameters, 288
8.2.2 Model Building Strategies for Ordinal Logistic
Regression Models, 305
8.3 Logistic Regression Models for the Analysis of
Correlated Data, 308
8.4 Exact Methods for Logistic Regression Models, 330
8.5 Sample Size Issues When Fitting Logistic Regression
Models, 339
Exercises, 347
Addendum 352
References 354
Index 369
Preface To The Second Edition
The use of logistic regression modeling has exploded during the
past decade. From its original acceptance in epidemiologic research, the
method is now commonly employed in many fields including but not
nearly limited to biomedical research, business and finance, criminology,
ecology, engineering, health policy, linguistics and wildlife biology.
At the same time there has been an equal amount of effort in research
on all statistical aspects of the logistic regression model. A literature
search that we did in preparing this Second Edition turned up
more than 1000 citations that have appeared in the 10 years since the
First Edition of this book was published.
When we worked on the First Edition of this book we were very limited
by software that could carry out the kinds of analyses we felt were
important. Specifically, beyond estimation of regression coefficients,
we were interested in such issues as measures of model performance,
diagnostic statistics, conditional analyses and multinomial response data.
Software is now readily available in numerous easy-to-use and widely
available statistical packages to address these and other extremely
important modeling issues. Enhancements to these capabilities are being
added to each new version. As is well-recognized in the statistical
community, the inherent danger of this easy-to-use software is that
investigators are using a very powerful tool about which they may have only
limited understanding. It is our hope that this Second Edition will
bridge the gap between the outstanding theoretical developments and
the need to apply these methods to diverse fields of inquiry.
Numerous texts have sections containing a limited discussion of logistic
regression modeling but there are still very few comprehensive
texts on this subject. Among the textbooks written at a level similar to
this one are: Cox and Snell (1989), Collett (1991) and Kleinbaum
(1994).
As was the case in our First Edition, the primary objective of the
Second Edition is to provide a focused introduction to the logistic
regression model and its use in methods for modeling the relationship
between a categorical outcome variable and a set of covariates. Topics
that have been added to this edition include: numerous new techniques
for model building including determination of scale of continuous
covariates; a greatly expanded discussion of assessing model performance;
a discussion of logistic regression modeling using complex sample survey
data; a comprehensive treatment of the use of logistic regression
modeling in matched studies; completely new sections dealing with
logistic regression models for multinomial, ordinal and correlated
response data, exact methods for logistic regression and sample size
issues. An underlying theme throughout this entire book is the focus on
providing guidelines for effective model building and interpreting the
resulting fitted model within the context of the applied problem.
The materials in the book have evolved considerably over the past
ten years as a result of our teaching and consulting experiences. We
have used this book to teach parts of graduate level survey courses,
quarter- or semester-long courses, and focused short courses to working
professionals. We assume that students have a solid foundation in linear
regression methodology and contingency table analysis.
The approach we take is to develop the model from a regression
analysis point of view. This is accomplished by approaching logistic
regression in a manner analogous to what would be considered good
statistical practice for linear regression. This differs from the approach
used by other authors who have begun their discussion from a
contingency table point of view. While the contingency table approach may
facilitate the interpretation of the results, we believe that it obscures the
regression aspects of the analysis. Thus, discussion of the interpretation
of the model is deferred until the regression approach to the analysis is
firmly established.
To a large extent there are no major differences in the capabilities
of the various software packages. When a particular approach is
available in a limited number of packages, it will be noted in this text. In
general, analyses in this book have been performed in STATA [Stata
Corp. (1999)]. This easy-to-use package combines excellent graphics
and analysis routines, is fast, is compatible across Macintosh, Windows
and UNIX platforms and interacts well with Microsoft Word. Other