- DOMINICK SALVATORE, Ph.D. DERRICK REAGE, PT UpdaW examples withthe most arrent U. So and world data 'Ikro complete self examinations I New chapter on Time Series kgnom.e- trics ,. -,., Theory and Problems o f STATISTICS AND ECO NOM ETRICS SECOND EDITION DOMINICK SALVATORE, Ph.D. Projessor and Chairperson, Department of Economics, Fordham University DERRICK REAGLE, Ph.D. Assistan b Professo r of Economics, Fordham University Schaum's Outline Series McGRAW-HILL New York Chicago San Francisco Lisbon London Madrid Mexico City Milan New Delhi San Juan Seoul Singapore Sydney Toronto McGraw -Hill R A Llivision of TheMcGraw.HillC ompanies Copyiight O 2002 by The McGraw-Hill Companies, Inc. All lights resewed. Manufactured in the United States of America. Except as permitted under the United States Copyright Act of 1976, no pat of this publication may be reproduced or dishibuted in any form or by any means, or stored in a database or retrieval system, without the prior written permission of the publisher. The material in this eBook also appears in the print version of this title: 0-07-134852-2. All trademarks are trademarks of their respective owners. Rather than put a trademark symbol after every occuivence of a trade- marked name, we use names in an editorial fashion only, and to the benefit of the trademark owner, with no intention of infi-inge- ment of the trademark. Where such designations appear in this book, they have been printed with initial caps. McGraw-Hill eBooks are available at special quantity discounts to use as premiums and sales promotions, or for use in corporate training programs. For more information, please contact George Hoare, Special Sales, at [email protected] or (212) 904-4069. TERMS OF USE This is a copyrighted work and The McGraw-Hill Companies, Inc. ("McGraw-Hill") and its licensors reserve all rights in and to the work. Use of this work is subject to these terms. Except as permitted under the Copyright Act of 1976 and the right to store and retrieve one copy of the work, you may not decompile, disassemble, reverse engineer, reproduce, modify, create derivative works based upon, transmit, distribute, disseminate, sell, publish or sublicense the work or any part of it without McGraw-Hill's prior consent. You may use the work for your own noncommercial and personal use; any other use of the work is strictly prohib- ited. Your right to use the work may be terminated if you fail to comply with these terms. THE WORK IS PROVIDED "AS IS". McGRAW-HILL AND ITS LICENSORS MAKE NO GUARANTEES OR WARRANTIES AS TO THE ACCURACY, ADEQUACY OR COMP1,ETENESS OF OR RESU1,TS TO RE OBTAINED FROM USING THE WORK, INCLUDING ANY INFORMATION THAT CAN BE ACCESSED THROUGH THE WORK VIA HYPERLINK OR OTHERWISE, AND EXPRESSLY DISCLAIM ANY WARRANTY, EXPRESS OR IMPLIED, INCLUDING BUT NOT LIM- ITED TO IMPLIED WARRANTIES OF MERCHANTABILITY OR FITNESS FOR A PARTICULAR PURPOSE. McCraw-Hill and its licensors do not warrant or guarantee that the functions contained in the work will meet your requirements or that its oper- ation will be uninterrupted or eivor free. Neither McGraw-Hill nor its licensors shall be liable to you or anyone else for any inac- curacy, eivor or omission, regardless of cause, in the work or for any damages resulting therefrom. McGraw-Hill has no respon- sibility for the content of any information accessed through the work. Under no circumstances shall McGraw-Hill and/or its licen- sors be liable for any indirect, incidental, special, punitive, consequential or similar damages that result from the use of or inabil- ity to use the work, even if any of them has been advised of the possibility of such damages. This limitation of liability shall apply to any claim or cause whatsoever whether such claim or cause arises in contract, tort or otherwise. DOI: 10.1036/0071395687 This book presents a clear and concise introduction to statistics and econometrics. A course in statistics or wonomelrics is orlm unt: u1 lht: musl userul bul also unt: u1 lhe musl cliKcull or lht: requiwd courws in colleges and universities. The purpose of this book is to help overcome this difficulty by using a problem-solving approach. Each chapter begins with a statement of theory, principles, or background information, fully illu- strated with examples. This is followed by numerous theoretical and practical problems with detailed, stepby-step solutions. While primarily intended as a supplement to all current standard textbooks of statistics andlor econometrics, the book can also be used as an independent text, as well as to supplement class lectures. The book is aimed at college students in economics, business administration, and the social sciences taking a onesernester or a oneyear course in statistics andlor econometrics. It also provides a very useful source of reference for M.A. and M.B.A. students and for all those who use (or would like to use) statistics and econometrics in their work. No prior statistical background is assumed. The book is completely self-contained in that it covers the statistics (Chaps. 1 to 5) required for econometrics (Chaps. 6 to 11). Tt is applied in nature, and all proofs appear in the problems section rather than in the text itself. Real-world socioeconomic and business data are used, whenever possible, to demonstrate the more advanced econometric techniques and models. Several sources of online data are used, and Web addresses are given for the student's and researcher's further use (App. 12). Topics frequently encountered in econometrics, such as multicollinearity and autocorrelation, are clearly and concisely discussed as to the problems they create, the methods to test for their presence, and possible wrrwlion lwhniques. In lhis mond diliun, wt: havt: exp'andd lht: compuler appli~liunslo provide a general introduction to data handling, and specific programming instruction to perform all estimations in this book by computer (Chap. 12) using Microsoft Excel, Eviews, or SAS statistical packages. We have also added sections on nonparametric testing matrix notation, binary choice models, and an entire chapter on time series analysis (Chap. ll), a field of econometrics which has expanded as of late. A sample statistics and econometrics examination is also included. The methodology of this book and much of its content has been tested in undergraduate and graduate classes in statistics and econometrics at Fordham University. Students found the approach and content of the book extremely useful and made many valuable suggestions for improvement. We have also received very useful advice from Professors Mary Beth Combs, Edward Dowling, and Damo- dar Gujarati. The following students carefully read through the entire manuscript and made many useful comments: Luca Bonardi, Kevin Coughlin, Sean Hennessy, and James Santangelo. To all of them we are deeply grateful. We owe a great intellectual debt to our former professors of statistics and econometrics: J. S. Butler, Jack Johnston, Lawrence Klein, and Bernard Okun. We are indebted to the Literary Executor of the late Sir Ronald A. Fisher, F. R. S., to Dr. Frank Yates, F. R. S., and the Longrnan Group Ltd., London, for permission to adapt and reprint Tables TII and IV from their book, Statistical Tables for W ological, Agricultural and Medical Research. In addition to Statistics and Econometrics, the Schaum's Outline Series in Economics includes Microeconomic Theory, Macroeconomk Theory, International Economics, Mathematics for Economists, and Principles uf Ecunuwrics. New York, 2001 iii Copyright 2002 The McGraw-Hill Companies, Inc. Click Here for Terms of Use. L CONTENTS CHAPTER 1 Introduction 1.1 The Nature of Statistics 1.2 Statistics and Econometrics 1.3 The Methodology of Econometrics CHAPTER 2 Desdp tive Statistics 2.1 Frequency Distributions 2.2 Measures of Central Tendency 2.3 Measures of Dispersion 2.4 Shape of Frequency Distributions CHAPTER 3 Probability and Probability Distributions 3.1 Probability of a Single Event 3.2 Probability of Multiple Events 3.3 Discrete Probability Distributions: The Binomial Distribution 3.4 The Poisson Distribution 3.5 Continuous Probability Distributions: The Normal Distribution CHAPTER 4 Statistical Inference: Estimation 4.1 Sampling 4.2 Sampling Distribution of the Mean 4.3 Estimation Using the Normal Distribution 4.4 Confidence Intervals fnr the Mean TJsing the t Distributinn CHAPTER 5 Statistical Inference: Testing Hypotheses 5.1 Testing Hypotheses 5.2 Testing Hypotheses about the Population Mean and Proportion 3.3 Testing Hypotheses for Differences between Two Means or Proportions 5.4 Chi-Square Test of Goodness of Fit and Tndependence 5.5 Analysis of Variance 5.6 Nonparametric Testing STATISTICS EXAMINATION CHAPTER 6 Simple Regression Analysis 6.1 The Two-Variable Linear Model 6.2 The Ordinary Least-Squares Method Copyright 2002 The McGraw-Hill Companies, Inc. Click Here for Terms of Use. CONTENTS v 6.3 Tests of Significance of Parameter Estimates 6.4 Test d Goodness of Fit and Correlation 6.5 Properties of Ordinary Least-Squares Estimators CHAPTER 7 Multiple Regression Analysis 154 7.1 The Thrse-Variable Linear Model 7.2 Tests of Significance of Parameter Estimates 7.3 The Coefficient of Multiple Determination 7.4 Test of the Overall Significance of the Regression 7.5 Partial-Correlation Coefficients 7.6 Matrix Notation CHAPTER 8 Further Techniques and Applications in Regression Anatysis 181 8.1 Functional Form 181 8.2 Dummy Variables 182 8.3 Distributed Lag Models 182 8.4 Forecasting 183 8.5 Binary Choice Models 184 8.6 Tnterpretation of Binary Choice Models 185 CHAPTER 9 Problems in Regression Analysis 9.1 Multicollinearity 9.2 Heterosoedasticity 9.3 Autocorrelation 9.4 Errors in Variables CHAPTER 10 Simultaneous-Equations Met hods 10.1 Simultaneous-Equations Models 10.2 Identification 10.3 Estimation: Tndirect Least Squares 10.4 Estimation: Two-Stage Least Squares CHAPTER 11 Time-Series Met hods 11.1 ARMA 11.2 Tdent.ifyin ARMA 11.3 Nonstationary Series 11.4 Testing for Unit Root 11.5 Cointegration and Error Correction 11.6 Causality CHAPTER 12 Computer Applications in Econometrics 12.1 Data Formats 12.2 Microsoft Excel CONTENTS 12.3 Eviews 12.4 SAS ECONOMETRICS 'EXAMINATION Appendix 1 Binomial Distribution Appendix 2 Poisson Distribution Appndix 3 Standard Normal Distribution Appendix 4 Table of Random Numbers Appendix 5 Student's t Distribution Appendix 6 Chi-Square Distribution Appendix 7 F Distribution Appendix 8 Durbin-Watson Statistic Appendix 9 Wilcoxon W Appendix 10 KolrnogorovSrnirnov Critical Values Appendix 11 ADF Critical Values Appendix 12 Data Souroes on the Web Introduction NATURE OF STATISTICS refers to the collection, presentation, analysis, and utilization of numerical data to make nd reach decisions in the face of uncertainty in economics, business, and other social and cs is subdivided into descriptive and inferential. Descriptive statistics is concerned with g and describing a body of data. Inferential statistics is the process of reaching general- whole (called the population) by examining a portion (called the sample). In order valid, the sample must be representative of the population and the probability of error also statistics is discussed in detail in Chap. 2. This is followed by (the more crucial) rence; Chap. 3 deals with probability, Chap. 4 with estimation, and Chap. 5 with hypoth- EXAMPLE 1. Suppose that we have data on the incomes of 1000 U.S. families. This body of data can be summarized by finding the average family income and the spread of these family incomes above and below the ata also can be described by constructing a table, chart, or graph of the number or proportion of wme class. This is descriptive statistics. If these 1000 families are representative of all U.S. estimate and test hypotheses about the average family income in the United States as a whole. ns are subject to error, we also would have to indicate the probability of error. This is ECONOMETRICS refers to the application of economic theory, mathematics, and statistical techniques f testing hypotheses and estimating a.nd forecasting ewnomic phenomena. Ewno- strongly identified with regression analysis. This relates a dependent variable to one nt or explanatory variables. Since relationships among economic variables are disturbance or error term (with well-defined probabilistic properties) must be 7 deal with regression analysis; Chap. 8 extends the basic regression model; Chap. 9 s of testing and correcting for violations in the assumptions of the basic regression . 10 and 11 deal with two specific areas of econometrics, specifically simultaneous- -series methods. Thus Chaps. 1 to 5 deal with the statistics required for econometrics Chapter 12 is concerned with using the computer to aid in the calculations involved in Copyright 2002 The McGraw-Hill Companies, Inc. Click Here for Terms of Use. 2 INTRODUCTION [CHAP. 1 EXAMPLE 2. Consumption theory tells us that, in general, people increase their consumption expenditure C as their disposable (after-tax) income Yd increases, but not by as much as the increase in their disposable income. This can be stated in explicit linear equation form as where bo and bl are unknown constants called parameters. The parameter bl is the slope coefficient representing the marginal propensity to consume (MPC). Since even people with identical disposable income are likely to have somewhat different consumption expenditures, the theoretically exact and deterministic relationship represented by Eq. (1.1) must be modified to include a random disturbance or error term, u, making it stochastic: 1.3 THE METHODOLOGY OF ECONOMETRICS Econometric research, in general, involves the following three stages: 1. Specification of the model or maintained hypothesis in explicit stochastic equation form, together with the a priori theoretical expectations about the sign and size of the parameters of the function. 2. Collection of data on the variables of the model and estimation of the coefficients of the function with appropriate econometric techniques (presented in Chaps. 6 to 8). 3. Evaluation of the estimated coefficients of the function on the basis of economic, statistical, and econometric criteria. EXAMPLE 3. The $rst stage in econometric research on consumption theory is to state the theory in explicit stochastic equation form, as in Eq. (1.1), with the expectation that bo > 0 (i.e., at Yd = 0, C > 0 as people dissave and/or borrow) and 0 < bl < 1. Thc second stage involvcs thc collcction of data on consumption cxpcnditurc and disposable income and estimation of Eq. (1 .I). The thirdstage in econometric research involves (1) checking to see if the estimated value of bo > 0 and if 0 < bl < 1; (2) determining if a "satisfactory" proportion of the variation in Cis "explained" by changes in Yd and if bo and bl are "statistically significant at acceptable levels" [see Prob. 1.13(c) and Sec. 5.21; and (3) testing to see if the assumptions of the basic regression model are satisfied or, if not, how to correct for violations. If the estimated relationship does not pass these tests, the hypothesized relationship must be modified and reestimated until a satisfactory estimated consumption relationship is achieved. Solved Problems THE NATURE OF STATISTICS 1.1 What is the purpose and function of (a) The field of study of statistics? (b) Descriptive sta- tistics? (c) Inferential statistics? (a) Statistics is the body of procedures and techniques used to collect, present, and analyze data on which to base decisions in the face of uncertainty or incomplete information. Statistical analysis is used today in practically every profession. The economist uses it to test the efficiency of alternative production techniques; the businessperson may use it to test the product design or package that maximizes sales; the sociologist to analyze the result of a drug rehabilitation program; the industrial psychologist to examine workers' responses to plant environment; the political scientist to forecast voting patterns; the physician to test the effectiveness of a new drug; the chemist to produce cheaper fertilizers; and so on. (b) Descriptive statistics summari~esa body of data with one or two pieces of information that characteri~e the whole data. It also refers to the presentation of a body of data in the form of tables, charts, graphs, and other forms of graphic display. CHAP. 11 INTRODUCTION (c) Inferential statistics (both estimation and hypothesis testing) refers to the drawing of generalizations about the properties of the whole (called a population) from the specific or a sample drawn from the population. Inferential statistics thus involves inductive reasoning. (This is to be contrasted with deductive reasoning, which ascribes properties to the specific starting with the whole.) 1.2 (a) Are descriptive or inferential statistics more important today? (b) What is the importance of a representative sample in statistical inference? (c) Why is probability theory required? (a) Statistics started as a purely descriptive science, but it grew into a powerful tool of decision making as its inferential branch was developed. Modern statistical analysis refers primarily to inferential or inductive statistics. However, deductive and inductive statistics are complementary. We must study how to generate samples from populations before we can learn to generalize from samples to popula- tions. (b) In order for statistical inference to be valid, it must be based on a sample that fully reflects the characteristics and properties of the population from which it is drawn. A representative sample is ensured by random sampling, whereby each element of the population has an equal chance of being included in the sample (see Sec. 4.1). (c) Since the possibility of error exists in statistical inference, estimates or tests of a population property or characteristic are given together with the chance or probability of being wrong. Thus probability thcory is an csscntial clcmcnt in statistical infcrcncc. 1.3 How can the manager of a firm producing lightbulbs summarize and describe to a board meeting the results of testing the life of a sample of 100 lightbulbs produced by the firm? Providing the (raw) data on the life of each in the sample of 100 lightbulbs produced by the firm would be very inconvenient and time-consuming for the board members to evaluate. Instead, the manager might summarize the data by indicating that the average life of the bulbs tested is 360 h and that 95% of the bulbs tested lasted between 320 and 400 h. By doing this, the manager is providing two pieces of information (the average life and the spread in the average life) that characterize the life of the 100 bulbs tested. The manager also might want to describe the data with a table or chart indicating the number or proportion of bulbs tested that lasted within each 10-h classification. Such a tubular or graphic representation of the data is also very useful for gaining a quick overview of the data. In summarizing and describing the data in the ways indicated, the manager is engaging in descriptive statistics. It should be noted that descriptive statistics can be used to summarize and describe any body of data, whether it is a sample (as above) or a population (when all the elements of the population are known and its characteristics can be calculated). 1.4 (a) Why may the manager in Prob. 1.3 want to engage in statistical inference? (b) What would this involve and require? (a) Quality control requires that the manager have a fairly good idea about the average life and the spread in the life of the lightbulbs produced by the firm. However, testing all the lightbulbs produced would destroy the entire output of the firm. Even when testing does not destroy the product, testing the entire output is usually prohibitively expensive and time-consuming. The usual procedure is to take a sample of the output and infer the properties and characteristics of the entire output (population) from the corresponding characteristics of a sample drawn from the population. (b) Statistical inference requires first of all that the sample be representative of the population being sampled. If the firm produces lightbulbs in different plants, with more than one workshift, and with raw materials from different suppliers, these must be represented in the sample in the proportion in which they contribute to the total output of the firm. From the average life and spread in the life of the bulbs in the sample, the firm manager might estimate, with 95% probability of being correct and 5% probability of being wrong, the average life of all the lightbulbs produced by the firm to be between 320 and 400 h (see Sec. 4.3). Instead, the manager may use the sample information to test, with 95% probability of being correct and 5% probability of being wrong, that the average life of the population of all the bulbs produced by the firm is greater than 320 h (see Sec. 5.2). In estimating or testing the average for a population from sample information, the manager is engaging in statistical inference.