
Linear Regression: An Introduction to Statistical Models PDF

201 Pages·2022·7.453 MB·English

Preview Linear Regression: An Introduction to Statistical Models

LINEAR REGRESSION

THE SAGE QUANTITATIVE RESEARCH KIT

Beginning Quantitative Research by Malcolm Williams, Richard D. Wiggins, and the late W. Paul Vogt is the first volume in The SAGE Quantitative Research Kit. This book can be used together with the other titles in the Kit as a comprehensive guide to the process of doing quantitative research, but it is equally valuable on its own as a practical introduction to completing quantitative research.

Editors of The SAGE Quantitative Research Kit:
Malcolm Williams – Cardiff University, UK
Richard D. Wiggins – UCL Social Research Institute, UK
D. Betsy McCoach – University of Connecticut, USA

Founding editor:
The late W. Paul Vogt – Illinois State University, USA

LINEAR REGRESSION: AN INTRODUCTION TO STATISTICAL MODELS
PETER MARTIN
THE SAGE QUANTITATIVE RESEARCH KIT

SAGE Publications Ltd
1 Oliver’s Yard
55 City Road
London EC1Y 1SP

SAGE Publications Inc.
2455 Teller Road
Thousand Oaks, California 91320

SAGE Publications India Pvt Ltd
B 1/I 1 Mohan Cooperative Industrial Area
Mathura Road
New Delhi 110 044

SAGE Publications Asia-Pacific Pte Ltd
3 Church Street
#10-04 Samsung Hub
Singapore 049483

© Peter Martin 2021

This volume published as part of The SAGE Quantitative Research Kit (2021), edited by Malcolm Williams, Richard D. Wiggins and D. Betsy McCoach.

Apart from any fair dealing for the purposes of research, private study, or criticism or review, as permitted under the Copyright, Designs and Patents Act, 1988, this publication may not be reproduced, stored or transmitted in any form, or by any means, without the prior permission in writing of the publisher, or in the case of reprographic reproduction, in accordance with the terms of licences issued by the Copyright Licensing Agency. Enquiries concerning reproduction outside those terms should be sent to the publisher.

Library of Congress Control Number: 2020949998

British Library Cataloguing in Publication data
A catalogue record for this book is available from the British Library

ISBN 978-1-5264-2417-4

Editor: Jai Seaman
Assistant editor: Charlotte Bush
Production editor: Manmeet Kaur Tura
Copyeditor: QuADS Prepress Pvt Ltd
Proofreader: Elaine Leek
Indexer: Cathryn Pritchard
Marketing manager: Susheel Gokarakonda
Cover design: Shaun Mercier
Typeset by: C&M Digitals (P) Ltd, Chennai, India
Printed in the UK

At SAGE we take sustainability seriously. Most of our products are printed in the UK using responsibly sourced papers and boards. When we print overseas we ensure sustainable papers are used as measured by the PREPS grading system. We undertake an annual audit to monitor our sustainability.

Contents

List of Figures, Tables and Boxes
About the Author
Acknowledgements
Preface

1 What Is a Statistical Model?
  Kinds of Models: Visual, Deterministic and Statistical
  Why Social Scientists Use Models
  Linear and Non-Linear Relationships: Two Examples
  First Approach to Models: The t-Test as a Comparison of Two Statistical Models
  The Sceptic’s Model (Null Hypothesis of the t-Test)
  The Power Pose Model: Alternative Hypothesis of the t-Test
  Using Data to Compare Two Models
  The Signal and the Noise

2 Simple Linear Regression
  Origins of Regression: Francis Galton and the Inheritance of Height
  The Regression Line
  Regression Coefficients: Intercept and Slope
  Errors of Prediction and Random Variation
  The True and the Estimated Regression Line
  Residuals
  How to Estimate a Regression Line
  How Well Does Our Model Explain the Data? The R² Statistic
  Sums of Squares: Total, Regression and Residual
  R² as a Measure of the Proportion of Variance Explained
  R² as a Measure of the Proportional Reduction of Error
  Interpreting R²
  Final Remarks on the R² Statistic
  Residual Standard Error
  Interpreting Galton’s Data and the Origin of ‘Regression’
  Inference: Confidence Intervals and Hypothesis Tests
  Confidence Range for a Regression Line
  Prediction and Prediction Intervals
  Regression in Practice: Things That Can Go Wrong
  Influential Observations
  Selecting the Right Group
  The Dangers of Extrapolation

3 Assumptions and Transformations
  The Assumptions of Linear Regression
  Investigating Assumptions: Regression Diagnostics
  Errors and Residuals
  Standardised Residuals
  Regression Diagnostics: Application With Examples
  Normality
  Homoscedasticity and Linearity: The Spread-Level Plot
  Outliers and Influential Observations
  Independence of Errors
  What if Assumptions Do Not Hold? An Example
  A Non-Linear Relationship
  Model Diagnostics for the Linear Regression of Life Expectancy on GDP
  Transforming a Variable: Logarithmic Transformation of GDP
  Regression Diagnostics for the Linear Regression With Predictor Transformation
  Types of Transformations, and When to Use Them
  Common Transformations
  Techniques for Choosing an Appropriate Transformation

4 Multiple Linear Regression: A Model for Multivariate Relationships
  Confounders and Suppressors
  Spurious Relationships and Confounding Variables
  Masked Relationships and Suppressor Variables
  Multivariate Relationships: A Simple Example With Two Predictors
  Multiple Regression: General Definition
  Simple Examples of Multiple Regression Models
  Example 1: One Numeric Predictor, One Dichotomous Predictor
  Example 2: Multiple Regression With Two Numeric Predictors
  Research Example: Neighbourhood Cohesion and Mental Wellbeing
  Dummy Variables for Representing Categorical Predictors
  What Are Dummy Variables?
  Research Example: Highest Qualification Coded Into Dummy Variables
  Choice of Reference Category for Dummy Variables

5 Multiple Linear Regression: Inference, Assumptions and Standardisation
  Inference About Coefficients
  Standard Errors of Coefficient Estimates
  Confidence Interval for a Coefficient
  Hypothesis Test for a Single Coefficient
  Example Application of the t-Test for a Single Coefficient
  Do We Need to Conduct a Hypothesis Test for Every Coefficient?
  The Analysis of Variance Table and the F-Test of Model Fit
  F-Test of Model Fit
  Model Building and Model Comparison
  Nested and Non-Nested Models
  Comparing Nested Models: F-Test of Difference in Fit
  Adjusted R² Statistic
  Application of Adjusted R²
  Assumptions and Estimation Problems
  Collinearity and Multicollinearity
  Diagnosing Collinearity
  Regression Diagnostics
  Standardisation
  Standardisation and Dummy Predictors
  Standardisation and Interactions
  Comparing Coefficients of Different Predictors
  Some Final Comments on Standardisation

6 Where to Go From Here
  Regression Models for Non-Normal Error Distributions
  Factorial Design Experiments: Analysis of Variance
  Beyond Modelling the Mean: Quantile Regression
  Identifying an Appropriate Transformation: Fractional Polynomials
  Extreme Non-Linearity: Generalised Additive Models
  Dependency in Data: Multilevel Models (Mixed Effects Models, Hierarchical Models)
  Missing Values: Multiple Imputation and Other Methods
  Bayesian Statistical Models
  Causality
  Measurement Models: Factor Analysis and Structural Equations

Glossary
References
Index

List of Figures, Tables and Boxes

List of figures

1.1 Child wellbeing and income inequality in 25 countries
1.2 Gross domestic product (GDP) per capita and life expectancy in 134 countries (2007)
1.3 Hypothetical data from a power pose experiment
1.4 Illustrating two statistical models for the power pose experiment
1.5 Partition of a statistical model into a systematic and a random part
2.1 Scatter plot of parents’ and children’s heights
2.2 Galton’s data with superimposed regression line
2.3 An illustration of the regression line, its intercept and slope
2.4 Illustration of residuals
2.5 Partition of the total outcome variation into explained and residual variation
2.6 Illustration of R² as a measure of model fit
2.7 Galton’s regression line compared to the line of equal heights
2.8 Regression line with 95% confidence range for mean prediction
2.9 Regression line with 95% prediction intervals
2.10 Misleading regression lines resulting from influential observations
2.11 The relationship between GDP per capita and life expectancy, in two different selections from the same data set
2.12 Linear regression of life expectancy on GDP per capita in the 12 Asian countries with highest GDP, with extrapolation beyond the data range
2.13 Checking the extrapolation from Figure 2.12 by including the points for the 12 Asian countries with the lowest GDP per capita
3.1 Illustration of the assumptions of normality and homoscedasticity in Galton’s regression
3.2 An illustration of the normal distribution
3.3 Histogram of standardised residuals from Galton’s regression, with a superimposed normal curve
3.4 Histograms of standardised residuals illustrating six distribution shapes
