Micro-Econometrics: Methods of Moments and Limited Dependent Variables
Second Edition

Myoung-jae Lee
Department of Economics
Korea University
Anam-dong, Sungbuk-gu
Seoul, Korea 136-701

GAUSS is a trademark of Aptech Systems, Inc.
STATA® and the STATA® logo are registered trademarks of StataCorp LP.

ISBN 978-0-387-95376-2   e-ISBN 978-0-387-68841-1
DOI 10.1007/b60971
Springer New York Dordrecht Heidelberg London
Library of Congress Control Number: 2009935059

© Springer Science+Business Media, LLC 1996, 2010
All rights reserved. This work may not be translated or copied in whole or in part without the written permission of the publisher (Springer Science+Business Media, LLC, 233 Spring Street, New York, NY 10013, USA), except for brief excerpts in connection with reviews or scholarly analysis. Use in connection with any form of information storage and retrieval, electronic adaptation, computer software, or by similar or dissimilar methodology now known or hereafter developed is forbidden.
The use in this publication of trade names, trademarks, service marks, and similar terms, even if they are not identified as such, is not to be taken as an expression of opinion as to whether or not they are subject to proprietary rights.

Printed on acid-free paper
Springer is part of Springer Science+Business Media (www.springer.com)

To my “Little Women” Hyun-joo, Hyun and Young, full of life, love and the cool

PREFACE

When I wrote the book Methods of Moments and Semiparametric Econometrics for Limited Dependent Variable Models, published by Springer in 1996, my motivation was clear: there was no book available to convey the latest messages in micro-econometrics. The messages were that most econometric estimators can be viewed as method-of-moment estimators and that inference for models with limited dependent variables (LDV) can be done without going fully parametric.
Time has passed, and there are now several books available for the same purpose. These days, methods of moments are the mainstay in econometrics, not just in micro- but also in macro-econometrics. Many papers have been published on semiparametric methods and LDV models. I, myself, have learned much since 1996, so much so that my own view on what should be taught, and how, has changed considerably. In particular, my exposure to the “sample selection” and “treatment effect” literature has changed the way I look at econometrics. When I set out to write the second edition of the 1996 book, these changes prompted me to re-title, reorganize, and re-focus the book.

This book, the second edition of the 1996 Springer book, differs greatly from the first edition in three respects. First, I tried to write it more as a textbook than as a monograph, so that it can be used as a first-year textbook in graduate econometrics courses. Second, unlike the 1996 book, many empirical examples have been added, and estimators that work well in practice are given more coverage than the others. Third, the literature has been updated, or at least the relevant new papers have been cited so that the reader can consult them if he/she so desires. These changes more than doubled the length of the book.

One may classify econometrics into two branches: micro-econometrics, dealing with individual data, and macro-econometrics, dealing with (aggregate) time-series data. Micro-econometrics may be further classified into “cross-section micro-econometrics” and “panel-data micro-econometrics”; an analogous classification can be made for macro-econometrics. In 2002, I published a book entitled Panel Data Econometrics: Methods of Moments and Limited Dependent Variables with Academic Press; for me, this leaves “cross-section micro-econometrics” to cover within micro-econometrics, which is what this book is mainly about, although panel data models are also examined occasionally.
One of the “buzz words” in micro-econometrics these days is “treatment effect.” This topic has been studied extensively in epidemiology and medical science as well as in some social science disciplines. The treatment effect framework is, in fact, nothing but the “switching regression” of micro-econometrics that was popular some time ago: the effect of a binary treatment is of interest, and if there is any treatment effect, we get to see two different (i.e., switching) regimes depending on the treatment. In 2005, I published a book entitled Micro-Econometrics for Policy, Program, and Treatment Effects with Oxford University Press. Hence, despite its prominence, treatment effect will be discussed only minimally, if at all, in this book.

Closely related to treatment effect is “sample selection,” where the sample at hand comes from only one regime while our interest is in both regimes or in the “averaged” regime. I am planning to write a book on sample selection in the near future, and thus the coverage of sample selection in this book will not be extensive. Sample selection is a fairly well-confined topic in micro-econometrics, and the limited coverage in this book should not distort the overall picture of micro-econometrics.

The book consists of three parts in the main text, each with a number of chapters, and three appendices. The first part (Chapters 1 and 2) covers methods of moments for linear models, the second part (Chapters 3–6) covers nonlinear models and parametric methods for LDV models, and the third part (Chapters 7–9) covers semiparametric and nonparametric methods. Appendix I contains one section on mathematical and statistical background and eight more sections of appendices for Chapters 2–9. Appendix II has further supporting material. Appendices I and II are both technical, digressive, or tentative, Appendix II more so than Appendix I.
Most things the reader may find missing from the main text can be found in the appendices, although the main text does not specifically point out what is available in the appendices. Some interesting topics are put in the appendices to avoid lengthening the main text too much and thus discouraging the reader prematurely.

Appendix III provides some GAUSS programs. I tried to select only simple and numerically stable (i.e., reliable) programs. All programs use simulated data. Although I wrote this book so that readers can write their own programs, STATA commands are occasionally referred to, in case the reader thinks that the procedure under consideration is difficult to implement and not available in ready-made econometric packages.

As in my other books, small-sample issues and matters of “second-order importance” will not be discussed much, because whatever mistakes econometricians make are likely to be of large magnitude. This being the case, paying attention to small-sample improvements and low-order precision does not seem very meaningful. Of course, ideally, one should avoid mistakes of both large and small magnitude, but saying so would ignore econometricians’ budget and time constraints; politicians might feel comfortable saying that, but not most economists.

Some glaring omissions in this book’s coverage include weak instruments, factor analysis, stochastic frontiers, measurement errors (or errors in variables), semiparametric efficiency, auction-related econometrics, spatial dependence, demand system analysis, sampling, and missing data and imputation, which are closely related to sample selection. It would also be nicer to have more detailed coverage of duration analysis, multinomial choices, “bandwidth-dependent” semiparametric methods for LDV models, and so on. All of these would require much more time and effort on my side, and covering them would mean this book not seeing the light of day for several more years. Perhaps next time.
The target audience of this book is graduate students and researchers. The entire book may be covered in two to four semesters (one semester for each part plus the appendices), but a selective course covering the essential topics while omitting the optional starred topics (and some others) can be done in two semesters. Most estimators and tests have been tried out with real or simulated data, except some in the appendices. The reader will find intuitions for how estimators and tests work as well as various tips for hands-on experience. As for the empirical examples in this book, it would be ideal to choose the “best” empirical example for a given estimator or test. Unfortunately, my time constraints prevented me from doing that; rather, most examples were chosen more or less “randomly,” i.e., I happened to run into, or just remembered, the example when writing about the topic. In this book, theoretically oriented readers will find an overview of micro-econometrics, and applied researchers will find helpful information on how to apply micro-econometric techniques; there will be something for everybody, or at least that is what I hope. The reader may also want to consult other good books with a micro-econometric focus, such as Wooldridge (2002), Cameron and Trivedi (2005, 2009), and Greene (2007). Compared with these books, the theoretical coverage of this book is at a relatively higher level, with a semi-(non)parametric bent.

I am grateful to Springer Statistics Editor John Kimmel for his patience while this project dragged on for eight-plus years after our initial talk. I am also grateful to the anonymous reviewers for their comments, which led to substantial improvements and reorganization of the book. João Santos-Silva provided valuable feedback on many occasions, and Jing-young Choi helped me a great deal by proofreading most parts of the book. Also, Sang-hyeok Lee, Jong-hun Choi, and Young-min Ju proofread various chapters and gave me comments.
I should admit, however, that I could not incorporate all the comments and feedback, due to book-length and time constraints and also because making too many changes near the final stage is rather risky. Without implicating any reviewer, or anybody for that matter, I am solely responsible for any errors in the book.

REMARKS ON EXPRESSIONS AND NOTATIONS

Many acronyms will be used in lower- or uppercase letters: “rv” for random variable, “cdf” or “df” for (cumulative) distribution function, “rhs” for right-hand side, “lhs” for left-hand side, “dof” for degrees of freedom, “wrt” for “with respect to,” “cov” for covariance, “cor” for correlation, and so on.

For a matrix $A$, “p.d.” (“n.d.”) stands for “positive definite” (“negative definite”) and “p.s.d.” (“n.s.d.”) stands for “positive semidefinite” (“negative semidefinite”). $\mathrm{tr}(A)$ denotes its trace, and $|A|$ (or $\det(A)$) denotes its determinant; $||A||$ will then be the absolute value of the determinant. Sometimes, however, $|A|$ or $||A||$ may mean the matrix norm $\{\mathrm{tr}(A'A)\}^{1/2}$. For matrices $a_1,\ldots,a_M$, $\mathrm{diag}(a_1,\ldots,a_M)$ is the block-diagonal matrix with $a_1,\ldots,a_M$ along the diagonal.

The notation $\to^p$ means “convergence in probability,” and $\to^{as}$ or $\to^{ae}$ means “convergence almost surely (a.s.)” or “convergence almost everywhere (a.e.),” respectively. The notation $\rightsquigarrow$ denotes “convergence in distribution (or in law),” and “$\sim$” denotes the distribution of a rv; e.g., $x \sim N(0,1)$ means that $x$ follows the standard normal distribution. We will also use $x \sim (\mu,\sigma^2)$ to mean that $E(x)=\mu$ and $V(x)=\sigma^2$ without the distribution specified. Frequently, $\phi$ and $\Phi$ will be used to denote the $N(0,1)$ density and distribution function, respectively. The uniform distribution on $[a,b]$ is denoted $U[a,b]$, the exponential distribution with parameter $\theta$ is denoted $\mathrm{Expo}(\theta)$, and the Poisson distribution with parameter $\lambda$ is denoted $\mathrm{Poi}(\lambda)$. Other distributions are often denoted analogously; e.g., $\mathrm{Weibull}(\alpha,\theta)$.
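To make the matrix notation concrete, here is a minimal sketch in Python with NumPy (an assumption; the book's own programs are in GAUSS). It computes $\mathrm{tr}(A)$, $|A|$, the norm $\{\mathrm{tr}(A'A)\}^{1/2}$, and a block-diagonal $\mathrm{diag}(a_1,\ldots,a_M)$ for an illustrative $2\times 2$ matrix. The helper `block_diag` is hypothetical and written here only to avoid depending on SciPy.

```python
import numpy as np

def block_diag(*blocks):
    """Hypothetical helper: place 2-D arrays a_1, ..., a_M along the diagonal, i.e. diag(a_1, ..., a_M)."""
    rows = sum(b.shape[0] for b in blocks)
    cols = sum(b.shape[1] for b in blocks)
    out = np.zeros((rows, cols))
    r = c = 0
    for b in blocks:
        out[r:r + b.shape[0], c:c + b.shape[1]] = b  # copy block into its diagonal slot
        r += b.shape[0]
        c += b.shape[1]
    return out

# Illustrative matrix (not from the book).
A = np.array([[2.0, 1.0],
              [0.0, 3.0]])

trace_A = np.trace(A)                   # tr(A)
det_A = np.linalg.det(A)                # |A| = det(A)
abs_det_A = abs(det_A)                  # ||A|| read as the absolute value of the determinant
norm_A = np.sqrt(np.trace(A.T @ A))     # ||A|| read as the matrix norm {tr(A'A)}^{1/2}
D = block_diag(A, np.array([[7.0]]))    # diag(A, 7), a 3 x 3 block-diagonal matrix

print(trace_A, det_A, norm_A)
print(D)
```

The norm $\{\mathrm{tr}(A'A)\}^{1/2}$ is the square root of the sum of squared entries (the Frobenius norm), which is why it can be checked against `np.linalg.norm(A, "fro")`.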
In many textbooks, an uppercase letter and its lowercase letter are used to denote, respectively, a rv and its realized value. In this case, for a rv $Y$ with distribution function $F(y) \equiv P(Y \le y)$ and density $f(y)$, we have $E\{g(Y)\} = \int g(y) f(y)\,dy$ for a function $g(\cdot)$. This distinction between $Y$ and $y$ will not be followed in most parts of this book, however, because uppercase letters are frequently used to denote matrices. A downside of not following the uppercase/lowercase convention can be seen in $E\{g(y)\} = \int g(y) f(y)\,dy$, where $y$ in $E\{g(y)\}$ is a rv but $y$ in $\int g(y) f(y)\,dy$ is just an integration dummy: $\int g(z) f(z)\,dz$ would mean exactly the same thing. In most cases, it will be clear from the context whether $y$ is a rv or not. If necessary, to avoid this kind of confusion, we may also write $\int g(y_o) f(y_o)\,dy_o$. Also, from the given context, it will be clear whether $F(y)$ means $P(Y \le y)$ with $Y$ random and $y$ fixed, or $F(\cdot)$ evaluated at a rv $y$; if the meaning is not clear, we may write $F(y_o)$ for $P(y \le y_o)$ where $y$ is a rv.

The conditional distribution function for $P(y \le y_o \mid x = x_o)$ is denoted $F_{y|x}(y_o|x_o)$, $F_{y|x=x_o}(y_o)$, or $F_{y|x_o}(y_o)$. If we are not interested in any particular values of $y_o$ and $x_o$, we may just write $F_{y|x}(y|x)$, $F_{y|x}(y)$, or $F(y|x)$. The corresponding density function will be denoted $f_{y|x}(y|x)$, $f_{y|x}(y)$, or $f(y|x)$, respectively. In these cases, $y$ and $x$ in parentheses are not rv's but stand for values that those rv's can take (just to indicate that $F$ and $f$ are for those rv's). $E_x(\cdot)$ and $E_{y|x}(\cdot)$ indicate that the expected value is taken with respect to $x$ and $y|x$, respectively. $\mathrm{Med}(y|x)$ and $\mathrm{Mode}(y|x)$ denote the conditional median and mode, respectively. $Q_\alpha(y|x)$ (or $q_\alpha(y|x)$) denotes the conditional $\alpha$th quantile. The independence between two random vectors $x$ and $y$ is denoted $x \amalg y$, and the conditional independence between $x$ and $y$ given $z$ is denoted $x \amalg y \mid z$.
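The two readings of $y$ in $E\{g(y)\} = \int g(y) f(y)\,dy$ can be seen numerically. The sketch below (Python with NumPy, an assumption) takes the illustrative choice $y \sim N(0,1)$ and $g(y) = y^2$, so that $E\{g(y)\} = V(y) = 1$. On the right-hand side $y$ is only an integration dummy, approximated by a Riemann sum on a grid; on the left-hand side $y$ is a rv, approximated by averaging $g$ over simulated draws.

```python
import numpy as np

def phi(y):
    """N(0,1) density, the book's phi."""
    return np.exp(-0.5 * y ** 2) / np.sqrt(2.0 * np.pi)

def g(y):
    """Illustrative g(.); for y ~ N(0,1), E{g(y)} = V(y) = 1."""
    return y ** 2

# Integral form: y is an integration dummy running over a grid.
grid = np.linspace(-10.0, 10.0, 200_001)
step = grid[1] - grid[0]
integral = np.sum(g(grid) * phi(grid)) * step   # Riemann-sum approximation of the integral

# Expectation form: y is a rv, so average g over simulated draws.
rng = np.random.default_rng(0)
draws = rng.standard_normal(1_000_000)
sample_mean = g(draws).mean()

print(integral, sample_mean)
```

Both numbers should be close to 1; the simulated average carries sampling error of order $\sqrt{2}/\sqrt{10^6}$, while the grid approximation is nearly exact because the integrand is smooth and negligible beyond $|y| = 10$.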
The indicator function $1[\cdot]$ is defined as $1[A] = 1$ if $A$ holds and $0$ otherwise. The sign function $\mathrm{sgn}(a) \equiv 2\times 1[a \ge 0] - 1$ denotes the sign of $a$: $\mathrm{sgn}(a) = 1$ if $a \ge 0$ and $-1$ if $a < 0$. Sometimes the sign function may be defined such that it becomes $0$ or $-1$ when $a = 0$.

When $E(z)$ is used, it is implicitly assumed that $E(z) < \infty$; when $E^{-1}(z)$ is used (a shorthand for $\{E(z)\}^{-1}$), it is also assumed that $E(z)$ is invertible. Most vectors in this book are column vectors, and for an $m\times 1$ vector function $g(b)$ where $b$ has dimension $k\times 1$, its first derivative matrix $g_b$ has dimension $k\times m$; in comparison, $g_{b'} \equiv g_b'$ is an $m\times k$ matrix. $R^k$ denotes the $k$-dimensional Euclidean space, and $R = R^1$ denotes the real line; $|\cdot|$ denotes the Euclidean norm in most cases. For a function $g(\cdot)$, “increasing” means “non-decreasing,” and “strictly increasing” means increasing without the equality; analogous usages hold for “decreasing” and “strictly decreasing.” $L(y|x)$ and $L_N(y|x)$ denote the linear projection $E(yx')E^{-1}(xx')x$ and its estimator, respectively; in comparison to $L(y|x)$, $E(y|x)$ may be called the (nonlinear) projection, and $E_N(y|x)$ denotes an estimator for $E(y|x)$.

Since this book is mainly about cross-section micro-econometrics, unless otherwise noted, we will assume that the data, say $z_1, z_2, \ldots, z_N$ from $N$ subjects (individuals), are iid (independent and identically distributed) from a common distribution. The individuals will be indexed by $i = 1,\ldots,N$, and we will often drop the subscript $i$ to write $z_i$ just as $z$ if not interested in any particular subject. Hence, when $z_i = (z_{i1},\ldots,z_{im})'$ is an “$m$-vector” (i.e., an $m\times 1$ vector), its $m$th component $z_{im}$ may be denoted as $z_m$ with $i$ omitted.
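The following sketch (Python with NumPy, an assumption) illustrates two of the definitions above: the sign function under the convention $\mathrm{sgn}(a) = 2\times 1[a\ge 0] - 1$, and a sample analogue $L_N(y|x)$ of the linear projection $L(y|x) = E(yx')E^{-1}(xx')x$, obtained by replacing population moments with sample averages over iid observations. The function names and the simulated design $y = 1 + 2x_1 + u$ are hypothetical, chosen only for illustration; the book does not prescribe this code.

```python
import numpy as np

def indicator(condition):
    """1[A]: 1 where the condition A holds and 0 otherwise (elementwise)."""
    return np.asarray(condition).astype(int)

def sgn(a):
    """Sign function with the convention sgn(a) = 2 * 1[a >= 0] - 1, so sgn(0) = 1."""
    return 2 * indicator(np.asarray(a) >= 0) - 1

def linear_projection_estimator(y, X):
    """Hypothetical sample analogue L_N(y|x) of L(y|x) = E(yx') E^{-1}(xx') x.

    y is an (N,) array of outcomes; X is an (N, k) array whose ith row is x_i'.
    Returns the k coefficients E_N^{-1}(xx') E_N(xy) and the fitted values at each x_i.
    """
    N = y.shape[0]
    Exy = X.T @ y / N                 # sample E(xy), a k-vector (transpose of E(yx'))
    Exx = X.T @ X / N                 # sample E(xx'), a k x k matrix
    coef = np.linalg.solve(Exx, Exy)  # E_N^{-1}(xx') E_N(xy) without forming the inverse
    return coef, X @ coef

# Illustrative simulated iid data (an assumption, not from the book).
rng = np.random.default_rng(1)
N = 5_000
x1 = rng.normal(size=N)
X = np.column_stack([np.ones(N), x1])
y = 1.0 + 2.0 * x1 + rng.normal(size=N)

coef, fitted = linear_projection_estimator(y, X)
ols_coef, *_ = np.linalg.lstsq(X, y, rcond=None)  # OLS coefficients for comparison

print(sgn(np.array([-2.0, 0.0, 3.0])), np.sign(0.0))
print(coef, ols_coef)
```

Two design points are worth noting. NumPy's `np.sign` returns 0 at zero, which is the alternative convention mentioned in the text, so a separate `sgn` is needed for the $2\times 1[a\ge 0]-1$ definition. The projection coefficients coincide with OLS, since OLS is exactly the sample-moment version of $E^{-1}(xx')E(xy)$.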