Generalized Linear Models
and Extensions
Fourth Edition
James W. Hardin, Department of Epidemiology and Biostatistics, University of South Carolina
Joseph M. Hilbe, Statistics, School of Social and Family Dynamics, Arizona State University
A Stata Press Publication, StataCorp LLC, College Station, Texas
Copyright © 2001, 2007, 2012, 2018 by StataCorp LLC
All rights reserved.
First edition 2001
Second edition 2007
Third edition 2012
Fourth edition 2018
Published by Stata Press, 4905 Lakeway Drive, College Station, Texas 77845
Typeset in LaTeX 2ε
Printed in the United States of America
10 9 8 7 6 5 4 3 2 1
Print ISBN-10: 1-59718-225-7
Print ISBN-13: 978-1-59718-225-6
ePub ISBN-10: 1-59718-226-5
ePub ISBN-13: 978-1-59718-226-3
Mobi ISBN-10: 1-59718-227-3
Mobi ISBN-13: 978-1-59718-227-0
Library of Congress Control Number: 2018937959
No part of this book may be reproduced, stored in a retrieval system, or transcribed, in any form or by any
means—electronic, mechanical, photocopy, recording, or otherwise—without the prior written permission
of StataCorp LLC.
Stata, Stata Press, Mata, and NetCourse are registered trademarks of StataCorp LLC.
Stata and Stata Press are registered trademarks with the World Intellectual Property Organization of the
United Nations.
NetCourseNow is a trademark of StataCorp LLC.
LaTeX 2ε is a trademark of the American Mathematical Society.
In all editions of this text, this dedication page was written as the final
deliverable to the editor—after all the work and all the requested
changes and additions had been addressed. In every previous edition,
Joe and I co-dedicated this book to our wives and children. This time,
I write the dedication alone.
In memory of
Joseph M. Hilbe
who passed away after responding to the editor’s final change
requests, but before the specification of our dedication. Joe was a dear
friend and colleague. We worked closely on this new edition, and he
was as cheerful and tireless as always. We worked a long time to put
this latest edition together, and he would want the reader to know that
he is very proud of our collaboration, but even more proud of his
family: Cheryl, Heather, Michael, and Mitchell.
Contents
Figures
Tables
Listings
Preface
1 Introduction
1.1 Origins and motivation
1.2 Notational conventions
1.3 Applied or theoretical?
1.4 Road map
1.5 Installing the support materials
I Foundations of Generalized Linear Models
2 GLMs
2.1 Components
2.2 Assumptions
2.3 Exponential family
2.4 Example: Using an offset in a GLM
2.5 Summary
3 GLM estimation algorithms
3.1 Newton–Raphson (using the observed Hessian)
3.2 Starting values for Newton–Raphson
3.3 IRLS (using the expected Hessian)
3.4 Starting values for IRLS
3.5 Goodness of fit
3.6 Estimated variance matrices
3.6.1 Hessian
3.6.2 Outer product of the gradient
3.6.3 Sandwich
3.6.4 Modified sandwich
3.6.5 Unbiased sandwich
3.6.6 Modified unbiased sandwich
3.6.7 Weighted sandwich: Newey–West
3.6.8 Jackknife
Usual jackknife
One-step jackknife
Weighted jackknife
Variable jackknife
3.6.9 Bootstrap
Usual bootstrap
Grouped bootstrap
3.7 Estimation algorithms
3.8 Summary
4 Analysis of fit
4.1 Deviance
4.2 Diagnostics
4.2.1 Cook’s distance
4.2.2 Overdispersion
4.3 Assessing the link function
4.4 Residual analysis
4.4.1 Response residuals
4.4.2 Working residuals
4.4.3 Pearson residuals
4.4.4 Partial residuals
4.4.5 Anscombe residuals
4.4.6 Deviance residuals
4.4.7 Adjusted deviance residuals
4.4.8 Likelihood residuals
4.4.9 Score residuals
4.5 Checks for systematic departure from the model
4.6 Model statistics
4.6.1 Criterion measures
AIC
BIC
4.6.2 The interpretation of R² in linear regression
Percentage variance explained
The ratio of variances
A transformation of the likelihood ratio
A transformation of the F test
Squared correlation
4.6.3 Generalizations of linear regression R² interpretations
Efron’s pseudo-R²
McFadden’s likelihood-ratio index
Ben-Akiva and Lerman adjusted likelihood-ratio index
McKelvey and Zavoina ratio of variances
Transformation of likelihood ratio
Cragg and Uhler normed measure
4.6.4 More R² measures
The count R²
The adjusted count R²
Veall and Zimmermann R²
Cameron–Windmeijer R²
4.7 Marginal effects
4.7.1 Marginal effects for GLMs
4.7.2 Discrete change for GLMs
II Continuous Response Models
5 The Gaussian family
5.1 Derivation of the GLM Gaussian family
5.2 Derivation in terms of the mean
5.3 IRLS GLM algorithm (nonbinomial)
5.4 ML estimation
5.5 GLM log-Gaussian models
5.6 Expected versus observed information matrix
5.7 Other Gaussian links
5.8 Example: Relation to OLS
5.9 Example: Beta-carotene
6 The gamma family
6.1 Derivation of the gamma model
6.2 Example: Reciprocal link
6.3 ML estimation
6.4 Log-gamma models
6.5 Identity-gamma models
6.6 Using the gamma model for survival analysis
7 The inverse Gaussian family
7.1 Derivation of the inverse Gaussian model
7.2 Shape of the distribution
7.3 The inverse Gaussian algorithm
7.4 Maximum likelihood algorithm
7.5 Example: The canonical inverse Gaussian
7.6 Noncanonical links
8 The power family and link
8.1 Power links
8.2 Example: Power link
8.3 The power family
III Binomial Response Models
9 The binomial–logit family
9.1 Derivation of the binomial model
9.2 Derivation of the Bernoulli model
9.3 The binomial regression algorithm
9.4 Example: Logistic regression
9.4.1 Model producing logistic coefficients: The heart data
9.4.2 Model producing logistic odds ratios
9.5 GOF statistics
9.6 Grouped data
9.7 Interpretation of parameter estimates
10 The general binomial family
10.1 Noncanonical binomial models
10.2 Noncanonical binomial links (binary form)
10.3 The probit model
10.4 The clog-log and log-log models
10.5 Other links
10.6 Interpretation of coefficients
10.6.1 Identity link
10.6.2 Logit link
10.6.3 Log link
10.6.4 Log complement link
10.6.5 Log-log link
10.6.6 Complementary log-log link
10.6.7 Summary
10.7 Generalized binomial regression
10.8 Beta binomial regression
10.9 Zero-inflated models
11 The problem of overdispersion
11.1 Overdispersion
11.2 Scaling of standard errors
11.3 Williams’ procedure
11.4 Robust standard errors
IV Count Response Models
12 The Poisson family
12.1 Count response regression models
12.2 Derivation of the Poisson algorithm
12.3 Poisson regression: Examples
12.4 Example: Testing overdispersion in the Poisson model
12.5 Using the Poisson model for survival analysis
12.6 Using offsets to compare models
12.7 Interpretation of coefficients
13 The negative binomial family
13.1 Constant overdispersion
13.2 Variable overdispersion
13.2.1 Derivation in terms of a Poisson–gamma mixture
13.2.2 Derivation in terms of the negative binomial probability function
13.2.3 The canonical link negative binomial parameterization
13.3 The log-negative binomial parameterization
13.4 Negative binomial examples
13.5 The geometric family
13.6 Interpretation of coefficients
14 Other count-data models
14.1 Count response regression models
14.2 Zero-truncated models
14.3 Zero-inflated models
14.4 General truncated models
14.5 Hurdle models
14.6 Negative binomial(P) models
14.7 Negative binomial(Famoye)
14.8 Negative binomial(Waring)
14.9 Heterogeneous negative binomial models
14.10 Generalized Poisson regression models
14.11 Poisson inverse Gaussian models
14.12 Censored count response models
14.13 Finite mixture models
14.14 Quantile regression for count outcomes
14.15 Heaped data models
V Multinomial Response Models
15 Unordered-response family
15.1 The multinomial logit model
15.1.1 Interpretation of coefficients: Single binary predictor
15.1.2 Example: Relation to logistic regression
15.1.3 Example: Relation to conditional logistic regression
15.1.4 Example: Extensions with conditional logistic regression
15.1.5 The independence of irrelevant alternatives
15.1.6 Example: Assessing the IIA
15.1.7 Interpreting coefficients
15.1.8 Example: Medical admissions—introduction
15.1.9 Example: Medical admissions—summary
15.2 The multinomial probit model
15.2.1 Example: A comparison of the models
15.2.2 Example: Comparing probit and multinomial probit
15.2.3 Example: Concluding remarks
16 The ordered-response family
16.1 Interpretation of coefficients: Single binary predictor
16.2 Ordered outcomes for general link
16.3 Ordered outcomes for specific links
16.3.1 Ordered logit
16.3.2 Ordered probit
16.3.3 Ordered clog-log
16.3.4 Ordered log-log
16.3.5 Ordered cauchit
16.4 Generalized ordered outcome models
16.5 Example: Synthetic data
16.6 Example: Automobile data
16.7 Partial proportional-odds models