Handbook of REGRESSION METHODS

Derek S. Young
University of Kentucky, Lexington

CRC Press
Taylor & Francis Group
6000 Broken Sound Parkway NW, Suite 300
Boca Raton, FL 33487-2742

© 2017 by Taylor & Francis Group, LLC
CRC Press is an imprint of Taylor & Francis Group, an Informa business

No claim to original U.S. Government works

Printed on acid-free paper
Version Date: 20170613

International Standard Book Number-13: 978-1-4987-7529-8 (Hardback)

This book contains information obtained from authentic and highly regarded sources. Reasonable efforts have been made to publish reliable data and information, but the author and publisher cannot assume responsibility for the validity of all materials or the consequences of their use. The authors and publishers have attempted to trace the copyright holders of all material reproduced in this publication and apologize to copyright holders if permission to publish in this form has not been obtained. If any copyright material has not been acknowledged please write and let us know so we may rectify in any future reprint.

Except as permitted under U.S. Copyright Law, no part of this book may be reprinted, reproduced, transmitted, or utilized in any form by any electronic, mechanical, or other means, now known or hereafter invented, including photocopying, microfilming, and recording, or in any information storage or retrieval system, without written permission from the publishers.

For permission to photocopy or use material electronically from this work, please access www.copyright.com (http://www.copyright.com/) or contact the Copyright Clearance Center, Inc. (CCC), 222 Rosewood Drive, Danvers, MA 01923, 978-750-8400. CCC is a not-for-profit organization that provides licenses and registration for a variety of users. For organizations that have been granted a photocopy license by the CCC, a separate system of payment has been arranged.
Trademark Notice: Product or corporate names may be trademarks or registered trademarks, and are used only for identification and explanation without intent to infringe.

Library of Congress Cataloging‑in‑Publication Data
Names: Young, Derek S.
Title: Handbook of regression methods / Derek S. Young.
Description: Boca Raton : CRC Press, 2017. | Includes bibliographical references and index.
Identifiers: LCCN 2017011248 | ISBN 9781498775298 (hardback)
Subjects: LCSH: Regression analysis. | Multivariate analysis.
Classification: LCC QA278.2 .Y66 2017 | DDC 519.5/36--dc23
LC record available at https://lccn.loc.gov/2017011248

Visit the Taylor & Francis Web site at http://www.taylorandfrancis.com
and the CRC Press Web site at http://www.crcpress.com

To Andri.

Contents

List of Examples
Preface

I Simple Linear Regression

1 Introduction

2 Basics of Regression Models
  2.1 Regression Notation
  2.2 Population Model for Simple Linear Regression
  2.3 Ordinary Least Squares
  2.4 Measuring Overall Variation from the Sample Line
    2.4.1 R²
  2.5 Regression Through the Origin
  2.6 Distinguishing Regression from Correlation
  2.7 Regression Effect
    2.7.1 Regression Fallacy
  2.8 Examples

3 Statistical Inference
  3.1 Hypothesis Testing and Confidence Intervals
  3.2 Power
  3.3 Inference on the Correlation Model
  3.4 Intervals for a Mean Response
  3.5 Intervals for a New Observation
  3.6 Examples

4 Regression Assumptions and Residual Diagnostics
  4.1 Consequences of Invalid Assumptions
  4.2 Diagnosing Validity of Assumptions
  4.3 Plots of Residuals Versus Fitted Values
    4.3.1 Ideal Appearance of Plots
    4.3.2 Difficulties Possibly Seen in the Plots
  4.4 Data Transformations
  4.5 Tests for Normality
    4.5.1 Skewness and Kurtosis
  4.6 Tests for Constant Error Variance
  4.7 Examples

5 ANOVA for Simple Linear Regression
  5.1 Constructing the ANOVA Table
  5.2 Formal Lack of Fit
  5.3 Examples

II Multiple Linear Regression

6 Multiple Linear Regression Models and Inference
  6.1 About the Model
  6.2 Matrix Notation in Regression
  6.3 Variance–Covariance Matrix and Correlation Matrix of β̂
  6.4 Testing the Contribution of Individual Predictor Variables
  6.5 Statistical Intervals
  6.6 Polynomial Regression
  6.7 Examples

7 Multicollinearity
  7.1 Sources and Effects of Multicollinearity
  7.2 Detecting and Remedying Multicollinearity
  7.3 Structural Multicollinearity
  7.4 Examples

8 ANOVA for Multiple Linear Regression
  8.1 The ANOVA Table
  8.2 The General Linear F-Test
  8.3 Lack-of-Fit Testing in the Multiple Regression Setting
  8.4 Extra Sums of Squares
  8.5 Partial Measures and Plots
  8.6 Examples

9 Indicator Variables
  9.1 Leave-One-Out Method
  9.2 Coefficient Interpretations
  9.3 Interactions
  9.4 Coded Variables
  9.5 Conjoint Analysis
  9.6 Examples

III Advanced Regression Diagnostic Methods

10 Influential Values, Outliers, and More Diagnostic Tests
  10.1 More Residuals and Measures of Influence
  10.2 Masking, Swamping, and Search Methods
  10.3 More Diagnostic Tests
  10.4 Comments on Outliers and Influential Values
  10.5 Examples

11 Measurement Errors and Instrumental Variables Regression
  11.1 Estimation in the Presence of Measurement Errors
  11.2 Orthogonal and Deming Regression
  11.3 Instrumental Variables Regression
  11.4 Structural Equation Modeling
  11.5 Dilution
  11.6 Examples

12 Weighted Least Squares and Robust Regression Procedures
  12.1 Weighted Least Squares
  12.2 Robust Regression Methods
  12.3 Theil–Sen and Passing–Bablok Regression
  12.4 Resistant Regression Methods
  12.5 Resampling Techniques for β̂
  12.6 Examples

13 Correlated Errors and Autoregressive Structures
  13.1 Overview of Time Series and Autoregressive Structures
  13.2 Properties of the Error Terms
  13.3 Testing and Remedial Measures for Autocorrelation
  13.4 Advanced Methods
    13.4.1 ARIMA Models
    13.4.2 Exponential Smoothing
    13.4.3 Spectral Analysis
  13.5 Examples

14 Crossvalidation and Model Selection Methods
  14.1 Crossvalidation
  14.2 PRESS
  14.3 Best Subset Procedures
  14.4 Statistics from Information Criteria
  14.5 Stepwise Procedures for Identifying Models
  14.6 Example