Econometrics and data analysis for
developing countries
Econometrics and Data Analysis for Developing Countries provides a
rigorous but accessible foundation to modern data analysis and econometric
practice. The book contains many examples and exercises with data
from developing countries, available for immediate use on the floppy disk
provided.
Distinctive features include:
• teaching regression by example using data from actual development
experiences
• a wide range of detailed information from Latin America, Africa and
South Asia
• extensive use of regression graphics as a complementary diagnostic
tool of applied regression
• opportunities for readers to gain hands-on experience in empirical
research
• hundreds of useful statistical examples from developing countries on
computer disk
Econometrics and Data Analysis for Developing Countries is designed as
a course consisting both of lecture and of computer-assisted practical
workshops. It is a unique resource for students and researchers in development
economics, quantitative development studies and policy analysis.
Chandan Mukherjee is the Director of the Centre for Development
Studies, Trivandrum, India. He has over twenty years' experience of
teaching quantitative methods to economics students.
Howard White is Senior Lecturer in Applied Quantitative Economics at
the Institute of Social Studies, The Hague, The Netherlands. He has
published widely on development economics and other subjects.
Marc Wuyts is Professor in Applied Quantitative Economics at the Institute
of Social Studies, The Hague, The Netherlands. He has extensive experience
as a teacher in statistics, econometrics and development economics.
Priorities for development economics
Series Editor: Paul Mosley
University of Reading
Development economics deals with the most fundamental problems of
economics: poverty, famine, population growth, structural change,
industrialisation, debt, international finance, the relations between state and
market, the gap between rich and poor countries. Partly because of this,
its subject matter has fluctuated abruptly over time in response to
political currents in a way which sometimes causes the main issues to be
obscured; at the same time it is being constantly added to and modified
in every developed and developing country. The present series confronts
these problems. Each contribution will begin with a dispassionate review
of the literature worldwide and will use this as a springboard to argue the
author's own original point of view. In this way the reader will both be
brought up to date with the latest advances in a particular field of study
and encounter a distinctive approach to that area.
Econometrics and
data analysis for
developing countries
Chandan Mukherjee, Howard White and
Marc Wuyts
London and New York
First published 1998 by Routledge
2 Park Square, Milton Park, Abingdon, Oxon OX14 4RN
Simultaneously published in the USA and Canada
by Routledge
270 Madison Ave, New York, NY 10016
Routledge is an imprint of the Taylor & Francis Group, an informa business
© 1998 Chandan Mukherjee, Howard White and Marc Wuyts
Typeset in Times by Florencetype Ltd, Stoodleigh, Devon
All rights reserved. No part of this book may be reprinted or reproduced or
utilised in any form or by any electronic, mechanical, or other means, now known
or hereafter invented, including photocopying and recording, or in any information
storage or retrieval system, without permission in writing from the publishers.
British Library Cataloguing in Publication Data
A catalogue record for this book is available from the British Library.
Library of Congress Cataloging in Publication Data
A catalog record for this book has been requested.
ISBN 10: 0-415-09399-6 (hbk)
ISBN 10: 0-415-09400-3 (pbk)
ISBN 13: 978-0-415-09399-6 (hbk)
ISBN 13: 978-0-415-09400-9 (pbk)
Contents
List of figures ix
List of tables xiii
List of boxes xvi
Preface xvii
Introduction 1
1 The purpose of this book 1
2 The approach of this book: an example 3
Part I Foundations of data analysis 21
1 Model specification and applied research 23
1.1 Introduction 23
1.2 Model specification and statistical inference 24
1.3 The role of data in model specification:
traditional modelling 29
1.4 The role of data in model specification:
modern approaches 32
1.5 The time dimension in data 39
1.6 Summary of main points 42
2 Modelling an average 44
2.1 Introduction 44
2.2 Kinds of averages 45
2.3 The assumptions of the model 50
2.4 The sample mean as best linear unbiased
estimator (BLUE) 53
2.5 Normality and the maximum likelihood principle 58
2.6 Inference from a sample of a normal distribution 61
2.7 Summary of main points 71
Appendix 2.1: Properties of mean and variance 73
Appendix 2.2: Standard sampling distributions 73
3 Outliers, skewness and data transformations 75
3.1 Introduction 75
3.2 The least squares principle and the concept
of resistance 76
3.3 Mean-based versus order-based sample statistics 80
3.4 Detecting non-normality in data 90
3.5 Data transformations to eliminate skewness 97
3.6 Summary of main points 106
Part II Regression and data analysis 109
4 Data analysis and simple regression 111
4.1 Introduction 111
4.2 Modelling simple regression 112
4.3 Linear regression and the least squares principle 114
4.4 Inference from classical normal linear
regression model 120
4.5 Regression with graphics: checking the model
assumptions 124
4.6 Regression through the origin 136
4.7 Outliers, leverage and influence 137
4.8 Transformation towards linearity 148
4.9 Summary of main points 159
5 Partial regression: interpreting multiple regression coefficients 163
5.1 Introduction 163
5.2 The price of food and the demand for
manufactured goods in India 165
5.3 Least squares and the sample multiple regression line 173
5.4 Partial regression and partial correlation 180
5.5 The linear regression model 184
5.6 The t-test in multiple regression 192
5.7 Fragility analysis: making sense of
regression coefficients 198
5.8 Summary of main points 206
6 Model selection and misspecification in multiple regression 208
6.1 Introduction 208
6.2 Griffin's aid versus savings model: the omitted
variable bias 209
6.3 Omitted variable bias: the theory 212
6.4 Testing zero restrictions 219
6.5 Testing non-zero linear restrictions 229
6.6 Tests of parameter stability 231
6.7 The use of dummy variables 237
6.8 Summary of main points 246
Part III Analysing cross-section data 249
7 Dealing with heteroscedasticity 251
7.1 Introduction 251
7.2 Diagnostic plots: looking for heteroscedasticity 252
7.3 Testing for heteroscedasticity 256
7.4 Transformations towards homoscedasticity 264
7.5 Dealing with genuine heteroscedasticity: weighted
least squares and heteroscedastic standard errors 270
7.6 Summary of main points 277
8 Categories, counts and measurements 279
8.1 Introduction 279
8.2 Regression on a categorical variable: using
dummy variables 280
8.3 Contingency tables: association between
categorical variables 287
8.4 Partial association and interaction 293
8.5 Multiple regression on categorical variables 295
8.6 Summary of main points 298
9 Logit transformation, modelling and regression 302
9.1 Introduction 302
9.2 The logit transformation 303
9.3 Logit modelling with contingency tables 307
9.4 The linear probability model versus logit regression 313
9.5 Estimation and hypothesis testing in logit regression 320
9.6 Graphics and residual analysis in logit regression 327
9.7 Summary of main points 331
Part IV Regression with time-series data 333
10 Trends, spurious regressions and transformations
to stationarity 335
10.1 Introduction 335
10.2 Stationarity and non-stationarity 335
10.3 Random walks and spurious regression 338
10.4 Testing for stationarity 349
10.5 Transformations to stationarity 356
10.6 Summary of main points 363
Appendix 10.1: Generated DSP and TSP series for exercises 365
11 Misspecification and autocorrelation 366
11.1 Introduction 366
11.2 What is autocorrelation and why is it a problem? 366
11.3 Why do we get autocorrelation? 370
11.4 Detecting autocorrelation 379
11.5 What to do about autocorrelation 387
11.6 Summary of main points 390
Appendix 11.1: Derivation of variance and covariance
for AR(1) model 391
12 Cointegration and the error correction model 393
12.1 Introduction 393
12.2 What is cointegration? 393
12.3 Testing for cointegration 399
12.4 The error correction model (ECM) 406
12.5 Summary of main points 412
Part V Simultaneous equation models 413
13 Misspecification bias from single equation estimation 415
13.1 Introduction 415
13.2 Simultaneity bias in a supply and demand model 417
13.3 Simultaneity bias: the theory 422
13.4 The Granger and Sims tests for causality and
concepts of exogeneity 425
13.5 The identification problem 428
13.6 Summary of main points 434
14 Estimating simultaneous equation models 437
14.1 Introduction 437
14.2 Recursive models 437
14.3 Indirect least squares 439
14.4 Instrumental variable estimation and two-stage
least squares 442
14.5 Estimating the consumption function in a
simultaneous system 445
14.6 Full information estimation techniques 448
14.7 Summary of main points 451
Appendix A: The data sets used in this book 455
Appendix B: Statistical tables 463
References 481
Index 485
Figures
1 Histogram of the three variables 8
2 The scatter plot matrix of the variables 10
3 Histograms with transformed data 11
4 The scatter plot matrix of the transformed variables 12
5 Scatter plot of birth against infant mortality with regression
curve (regression of birth against square root of infant
mortality) 15
1.1 The elements of a statistical study 26
2.1 The demand for and recruitment of casual labour, Maputo
harbour 47
2.2 Weekly overtime payments 48
2.3 Real manufacturing GDP, Tanzania 51
2.4 Comparing the sampling and population distributions of
labour demand 55
2.5 Comparing the sampling distributions of mean and median:
demand for labour 60
2.6 Confidence intervals of sample means: demand for labour,
Maputo harbour 64
3.1 The least squares property 78
3.2 Box plots of GNP per capita (two samples of seven African
countries) 86
3.3 Comparative box plots of gender differences in life
expectancy 88
3.4 Mean versus median: GNP per capita 92
3.5 Symmetric but unusual tail: female-male life expectancy 92
3.6 Household income data 99
3.7 Log household income 99
3.8 Fourth root transformation 102
4.1 Regressing D on R 127
4.2 Exploratory band regression: D on R 128
4.3 Scatter plot with regression line: RE on RR 130
4.4 Residuals versus fitted values 130
4.5 Exploratory band regression: RE on RR 131