Springer Texts in Statistics
Advisors:
George Casella Stephen Fienberg Ingram Olkin
Springer
New York
Berlin
Heidelberg
Barcelona
Hong Kong
London
Milan
Paris
Singapore
Tokyo
Springer Texts in Statistics
Alfred: Elements of Statistics for the Life and Social Sciences
Berger: An Introduction to Probability and Stochastic Processes
Bilodeau and Brenner: Theory of Multivariate Statistics
Blom: Probability and Statistics: Theory and Applications
Brockwell and Davis: An Introduction to Time Series and Forecasting
Chow and Teicher: Probability Theory: Independence, Interchangeability,
Martingales, Third Edition
Christensen: Plane Answers to Complex Questions: The Theory of Linear
Models, Second Edition
Christensen: Linear Models for Multivariate, Time Series, and Spatial Data
Christensen: Log-Linear Models and Logistic Regression, Second Edition
Creighton: A First Course in Probability Models and Statistical Inference
Dean and Voss: Design and Analysis of Experiments
du Toit, Steyn, and Stumpf: Graphical Exploratory Data Analysis
Durrett: Essentials of Stochastic Processes
Edwards: Introduction to Graphical Modelling, Second Edition
Finkelstein and Levin: Statistics for Lawyers
Flury: A First Course in Multivariate Statistics
Jobson: Applied Multivariate Data Analysis, Volume I: Regression and
Experimental Design
Jobson: Applied Multivariate Data Analysis, Volume II: Categorical and
Multivariate Methods
Kalbfleisch: Probability and Statistical Inference, Volume I: Probability,
Second Edition
Kalbfleisch: Probability and Statistical Inference, Volume II: Statistical
Inference, Second Edition
Karr: Probability
Keyfitz: Applied Mathematical Demography, Second Edition
Kiefer: Introduction to Statistical Inference
Kokoska and Nevison: Statistical Tables and Formulae
Kulkarni: Modeling, Analysis, Design, and Control of Stochastic Systems
Lehmann: Elements of Large-Sample Theory
Lehmann: Testing Statistical Hypotheses, Second Edition
Lehmann and Casella: Theory of Point Estimation, Second Edition
Lindman: Analysis of Variance in Experimental Design
Lindsey: Applying Generalized Linear Models
Madansky: Prescriptions for Working Statisticians
McPherson: Statistics in Scientific Investigation: Its Basis, Application, and
Interpretation
Mueller: Basic Principles of Structural Equation Modeling: An Introduction to
LISREL and EQS
(continued after index)
Ashish Sen Muni Srivastava
Regression Analysis
Theory, Methods, and Applications
With 38 Illustrations
Springer
Ashish Sen
College of Architecture, Art, and Urban Planning
School of Urban Planning and Policy
The University of Illinois
Chicago, IL 60680
USA

Muni Srivastava
Department of Statistics
University of Toronto
Toronto, Ontario
Canada M5S 1A1
Editorial Board

George Casella
Biometrics Unit
Cornell University
Ithaca, NY 14853-7801
USA

Stephen Fienberg
Department of Statistics
Carnegie-Mellon University
Pittsburgh, PA 15213
USA

Ingram Olkin
Department of Statistics
Stanford University
Stanford, CA 94305
USA
Mathematics Subject Classification: 62Jxx, 62-01
Library of Congress Cataloging-in-Publication Data
Sen, Ashish K.
Regression analysis: Theory, methods, and applications/Ashish Sen, Muni
Srivastava.
p. cm.-(Springer texts in statistics)
ISBN-13: 978-1-4612-8789-6
1. Regression analysis. I. Srivastava, M.S. II. Title. III. Series.
QA278.2.S46 1990
519.5'36-dc20 89-48506
Printed on acid-free paper.
© 1990 Springer-Verlag New York Inc.
Softcover reprint of the hardcover 1st edition 1990
All rights reserved. This work may not be translated or copied in whole or in part without the written
permission of the publisher (Springer-Verlag New York, Inc., 175 Fifth Avenue, New York, NY
10010, USA), except for brief excerpts in connection with reviews or scholarly analysis. Use in
connection with any form of information and retrieval, electronic adaptation, computer software, or
by similar or dissimilar methodology now known or hereafter developed is forbidden.
The use of general descriptive names, trade names, trademarks, etc., in this publication, even if the
former are not especially identified, is not to be taken as a sign that such names, as understood by
the Trade Marks and Merchandise Marks Act, may accordingly be used freely by anyone.
Photocomposed copy prepared from the authors' LaTeX file.
9 8 7 6 5
ISBN-13: 978-1-4612-8789-6 e-ISBN-13: 978-1-4612-4470-7
DOI: 10.1007/978-1-4612-4470-7
To
Ashoka Kumar Sen
and the memory of
Jagdish Bahadur Srivastava
Preface
Any method of fitting equations to data may be called regression. Such
equations are valuable for at least two purposes: making predictions and
judging the strength of relationships. Because they provide a way of empirically identifying how a variable is affected by other variables, regression
methods have become essential in a wide range of fields, including the social
sciences, engineering, medical research and business.
Of the various methods of performing regression, least squares is the
most widely used. In fact, linear least squares regression is by far the most
widely used of any statistical technique. Although nonlinear least squares
is covered in an appendix, this book is mainly about linear least squares
applied to fit a single equation (as opposed to a system of equations).
The writing of this book started in 1982. Since then, various drafts have
been used at the University of Toronto for teaching a semester-long course
to juniors, seniors and graduate students in a number of fields, including
statistics, pharmacology, engineering, economics, forestry and the behavioral sciences. Parts of the book have also been used in a quarter-long course given to Master's and Ph.D. students in public administration, urban planning and engineering at the University of Illinois at Chicago (UIC). This
experience and the comments and criticisms from students helped forge the
final version.
The book offers an up-to-date account of the theory and methods of
regression analysis. We believe our treatment of theory to be the most
complete of any book at this level. The methods provide a comprehensive
toolbox for the practicing regressionist. The examples, most of them drawn from 'real life', illustrate the difficulties commonly encountered in the practice of regression, while the solutions underscore the subjective judgments the practitioner must make. Each chapter ends with a large number of exercises that supplement and reinforce the discussions in the text and provide valuable practical experience. When the reader has mastered the contents of this book, he or she will have gained both a firm foundation in the theory of regression and the experience necessary to competently practice this
valuable craft.
A first course in mathematical statistics, the ability to use statistical
computer packages and familiarity with calculus and linear algebra are
prerequisites for the study of this book. Additional statistical courses and
a good knowledge of matrices would be helpful.
This book has twelve chapters. The Gauss-Markov conditions are assumed to hold in the discussion of the first four chapters; the next five chapters present methods to alleviate the effects of violations of these conditions. The final three chapters discuss the somewhat related topics of
multicollinearity, variable search and biased estimation. Relevant matrix
and distribution theory is surveyed in the first two appendices at the end
of the book, which are intended as a convenient reference. The last appendix
covers nonlinear regression.
Chapters and sections that some readers might find more demanding are
identified with an asterisk or are placed in appendices to chapters. A reader
can navigate around these without losing much continuity. In fact, a reader
who is primarily interested in applications may wish to omit many of the
other proofs and derivations. Difficult exercises have also been marked with
asterisks.
Since the exercises and examples use over 50 data sets, a disk containing
most of them is provided with the book. The READ.ME file on the disk
gives further information on its contents.
This book would have been much more difficult, if not impossible, to
write without the help of our colleagues and students. We are especially
grateful to Professor Siim Soot, who examined parts of the book and was
an all-round friend; George Yanos of the Computer Center at UIC, whose instant E-mail responses to numerous cries for help considerably shortened the time to do the numerical examples (including those that were ultimately not used); Dr. Chris Johnson, who was a research associate of one
of the authors during the time he learnt most about the practical art of
regression; Professor Michael Dacey, who provided several data sets and
whose encouragement was most valuable; and to Professor V. K. Srivastava, whose comments on a draft of the book were most useful. We also
learnt a lot from earlier books on the subject, particularly the first editions
of Draper and Smith (1966) and Daniel and Wood (1971), and we owe a
debt of gratitude to their authors.
Numerous present and former students of both authors contributed their
time in editing and proof-reading, checking the derivations, inputting data,
drawing diagrams and finding data-sets. Soji Abass, Dr. Martin Bilodeau,
Robert Drozd, Andrea Fraser, Dr. Sucharita Ghosh, Robert Gray, Neleema
Grover, Albert Hoang, M.R. Khavanin, Supin Li, Dr. Claire McKnight,
Cresar Singh, Yanhong Wu, Dr. Y. K. Yau, Seongsun Yun and Zhang Tingwei constitute but a partial list of their names. We would like to single out
for particular mention Marguerite Ennis and Piyushimita Thakuriah for
their invaluable help in completing the manuscript. Linda Chambers TeXed an earlier draft of the manuscript, Barry Grau was most helpful in identifying computer programs, some of which are referred to in the text, Marilyn
Engwall did the paste-up on previous drafts, Ray Brod drew one of the
figures and Bobbie Albrecht designed the cover. We would like to express
our gratitude to all of them. A particular thanks is due to Dr. Colleen Sen, who painstakingly edited and proofread draft after draft.
We also appreciate the patience of our colleagues at UIC and the University of Toronto during the writing of this book. The editors at Springer-Verlag, particularly Susan Gordon, were most supportive. We would like to gratefully acknowledge the support of the Natural Sciences and Engineering Research Council of Canada and the National Science Foundation of the U.S. during the time this book was in preparation. The help of the Computer Center at UIC, which made computer time freely available, was indispensable.
Preface to the Fourth Printing
We have taken advantage of this as well as previous reprintings to correct several typographic errors. In addition, two exercises have been changed: one because it required too much effort, and another because we were able to replace it with problems we found more interesting.

In order to keep the price of the book reasonable, the data disk is no longer included. Its contents have been placed at web sites from which they may be downloaded. The URLs are http://www.springer-ny.com and http://www.uic.edu/~ashish/regression.html.
Contents
1 Introduction 1
1.1 Relationships 1
1.2 Determining Relationships: A Specific Problem 2
1.3 The Model 5
1.4 Least Squares 7
1.5 Another Example and a Special Case 10
1.6 When Is Least Squares a Good Method? 11
1.7 A Measure of Fit for Simple Regression 13
1.8 Mean and Variance of b₀ and b₁ 14
1.9 Confidence Intervals and Tests 17
1.10 Predictions 18
Appendix to Chapter 1 20
Problems 23

2 Multiple Regression 28
2.1 Introduction 28
2.2 Regression Model in Matrix Notation 28
2.3 Least Squares Estimates 30
2.4 Examples 31
2.5 Gauss-Markov Conditions 35
2.6 Mean and Variance of Estimates Under G-M Conditions 35
2.7 Estimation of σ² 37
2.8 Measures of Fit 39
2.9 The Gauss-Markov Theorem 41
2.10 The Centered Model 42
2.11 Centering and Scaling 44
2.12 *Constrained Least Squares 44
Appendix to Chapter 2 46
Problems 49

3 Tests and Confidence Regions 60
3.1 Introduction 60
3.2 Linear Hypothesis 60
3.3 *Likelihood Ratio Test 62
3.4 *Distribution of Test Statistic 64
3.5 Two Special Cases 65
3.6 Examples 66
3.7 Comparison of Regression Equations 67
3.8 Confidence Intervals and Regions 71
3.8.1 C.I. for the Expectation of a Predicted Value 71
3.8.2 C.I. for a Future Observation 71
3.8.3 *Confidence Region for Regression Parameters 72
3.8.4 *C.I.'s for Linear Combinations of Coefficients 73
Problems 74

4 Indicator Variables 83
4.1 Introduction 83
4.2 A Simple Application 83
4.3 Polychotomous Variables 84
4.4 Continuous and Indicator Variables 88
4.5 Broken Line Regression 89
4.6 Indicators as Dependent Variables 92
Problems 95

5 The Normality Assumption 100
5.1 Introduction 100
5.2 Checking for Normality 101
5.2.1 Probability Plots 101
5.2.2 Tests for Normality 105
5.3 Invoking Large Sample Theory 106
5.4 *Bootstrapping 107
5.5 *Asymptotic Theory 108
Problems 110

6 Unequal Variances 111
6.1 Introduction 111
6.2 Detecting Heteroscedasticity 111
6.2.1 Formal Tests 114
6.3 Variance Stabilizing Transformations 115
6.4 Weighting 118
Problems 128

7 *Correlated Errors 132
7.1 Introduction 132
7.2 Generalized Least Squares: Case When Ω Is Known 133
7.3 Estimated Generalized Least Squares 134
7.3.1 Error Variances Unequal and Unknown 134
7.4 Nested Errors 136