Probability and Statistics for Economists

Yongmiao Hong
Department of Economics & Department of Statistical Science
Cornell University
Email: [email protected]

March 2013

© 2013 Yongmiao Hong. All rights reserved.

Outline of Contents

Preface

Chapter 1 Introduction to Probability and Statistics
1.1 Quantitative Analysis
1.2 Fundamental Axioms of Econometrics
1.3 Roles of Statistics in Economics
1.4 Limitations of Statistical Analysis in Economics
1.5 Conclusion
Exercise 1

Chapter 2 Foundation of Probability Theory
2.1 Random Experiments
2.2 Basic Concepts of Probability
2.3 Review of Set Theory
2.4 Fundamental Probability Laws
2.4.1 Interpretation of Probability
2.4.2 Basic Probability Laws
2.5 Methods of Counting
2.5.1 Permutation
2.5.2 Combination
2.6 Conditional Probability
2.7 Bayes' Theorem
2.8 Independence
2.9 Conclusion
Exercise 2

Chapter 3 Random Variables and Univariate Probability Distributions
3.1 Random Variables
3.2 Distribution Functions
3.3 Discrete Random Variables
3.4 Continuous Random Variables
3.5 Functions of a Random Variable
3.6 Mathematical Expectations
3.7 Moments
3.8 Quantiles
3.9 Moment Generating Function
3.10 Characteristic Function
3.11 Conclusion
Exercise 3

Chapter 4 Important Probability Distributions
4.1 Introduction
4.2 Discrete Probability Distributions
4.2.1 Bernoulli Distribution
4.2.2 Binomial Distribution
4.2.3 Negative Binomial Distribution
4.2.4 Geometric Distribution
4.2.5 Poisson Distribution
4.3 Continuous Probability Distributions
4.3.1 Uniform Distribution
4.3.2 Beta Distribution
4.3.3 Normal Distribution
4.3.4 Cauchy and Stable Distributions
4.3.5 Lognormal Distribution
4.3.6 Gamma and Generalized Gamma Distributions
4.3.7 Chi-square Distribution
4.3.8 Exponential and Weibull Distributions
4.3.9 Double Exponential Distribution
4.4 Conclusion
Exercise 4

Chapter 5 Multivariate Probability Distributions
5.1 Random Vectors and Joint Probability Distributions
5.1.1 The Discrete Case
5.1.2 The Continuous Case
5.2 Marginal Distributions
5.2.1 The Discrete Case
5.2.2 The Continuous Case
5.3 Conditional Distributions
5.3.1 The Discrete Case
5.3.2 The Continuous Case
5.4 Independence
5.5 Bivariate Transformation
5.6 Bivariate Normal Distribution
5.7 Expectations and Covariance
5.8 Joint Moment Generating Function
5.9 Implications of Independence on Expectations
5.10 Conditional Expectations
5.11 Conclusion
Exercise 5

Chapter 6 Introduction to Statistics
6.1 Population and Random Sample
6.2 Sampling Distribution of Sample Mean
6.3 Sampling Distribution of Sample Variance
6.4 Student's t Distribution
6.5 Snedecor's F Distribution
6.6 Sufficient Statistics
6.7 Conclusion
Exercise 6

Chapter 7 Convergences and Limit Theorems
7.1 Limits and Orders of Magnitude: A Review
7.2 Motivation for Convergence Concepts
7.3 Convergence in Quadratic Mean and Lp-convergence
7.4 Convergence in Probability and Weak Law of Large Numbers
7.5 Almost Sure Convergence and Strong Law of Large Numbers
7.6 Convergence in Distribution
7.7 Central Limit Theorems
7.8 Conclusion
Exercise 7

Chapter 8 Parameter Estimation and Evaluation
8.1 Population and Distribution Model
8.2 Maximum Likelihood Estimation
8.3 Asymptotic Properties of MLE
8.4 Method of Moments and Generalized Method of Moments
8.4.1 Method of Moments Estimation
8.4.2 Generalized Method of Moments Estimation
8.5 Asymptotic Properties of GMM
8.6 Mean Squared Error Criterion
8.7 Best Unbiased Estimators
8.8 Cramer-Rao Lower Bound
8.9 Conclusion
Exercise 8

Chapter 9 Hypothesis Testing
9.1 Introduction to Hypothesis Testing
9.2 Neyman-Pearson Lemma
9.3 Wald Test
9.4 Lagrangian Multiplier (LM) Test
9.5 Likelihood Ratio (LR) Test
9.6 Illustrative Examples
9.6.1 Hypothesis Tests Under Bernoulli Distribution
9.6.2 Hypothesis Tests Under Normal Distribution
9.7 Conclusion
Exercise 9

Chapter 10 Conclusion
10.1 Summary
10.2 Direction for Studies in Econometrics
Exercise 10

References

Preface

This book is an introduction to probability theory and statistics for graduate students in economics, finance, management, statistics, applied mathematics, and other related fields.

Statistics is a science about data. It consists of descriptive statistics and inferential statistics. The former concerns the collection, summary, analysis, and presentation of an often large amount of observed data in a simple and interpretable manner. The latter, on the other hand, makes use of the fundamental laws of probability and statistical inference to draw conclusions about the underlying system that generates the observed data.

Probability and statistics is often the first course in a graduate econometrics sequence in North American universities. Why do we need to teach probability and statistics to graduate students in economics? Put simply, such a course provides the necessary probability and statistics background for first-year graduate students in their courses in econometrics, microeconomics, macroeconomics, and finance. Statistics and calculus are two basic analytic tools in economics. Statistics is an essential tool for studying situations involving uncertainty, in the same way that calculus is essential for characterizing optimizing behavior in economics. For example, probability theory is needed in the study of game theory. In macroeconomics, as Robert Lucas points out, the introduction of stochastic factors can provide many new insights into dynamic economic systems. In a certain sense, a course on probability and statistics might not be called Econometrics I, because probability and statistics are necessary analytic tools in every field of economics. Of course, the demand for probability and statistics varies from field to field in economics, with econometrics using them most heavily. Specifically, those who are attracted to theoretical econometrics would be expected to study probability and statistics in more depth by taking further graduate courses in mathematics and statistics. Those who are not should find this book an adequate preparation in the theory of probability and statistics for both their applied courses in econometrics and their courses in microeconomics and macroeconomics.

The aims of this book are twofold. First, it provides the essentials of probability theory and mathematical statistics needed by graduate students in economics and finance. Second, it offers intuitions, explanations, and applications of important probability and statistical concepts and tools from an economic perspective. Indeed, there have been many textbooks on probability and statistics; it is the second aim that motivates the writing of this textbook. It is strongly believed that the second aim is an indispensable part of the training of graduate students in economics and finance when they study probability and statistics.

The book is written as a one-semester course. It consists of two parts: Part I is probability theory, and Part II is statistical theory. Probability theory is a natural mathematical tool to describe stochastic phenomena. A solid background in probability theory allows students to have a better understanding of statistical inference.
Without some formalism of probability theory, students cannot appreciate the valid interpretation of data analysis through modern statistical methods. Chapters 1-5 cover probability theory, and Chapters 6-9 cover statistical theory. Chapter 1 is an introduction to probability and statistics, arguing why probability and statistics are basic analytic tools for economics. Chapter 2 lays down the foundation of probability theory, which is important for understanding the subsequent materials. Chapter 3 introduces random variables and probability distributions in a univariate context. Chapter 4 discusses important examples of discrete and continuous probability distributions that are commonly used in economics and finance. Chapter 5 introduces random vectors and multivariate probability distributions. In most cases, we consider bivariate distributions, which offer much insight into multivariate distributions. Chapter 6 is an introduction to sampling theory, focusing on classical statistical sampling theory under the normality assumption. Chapter 7 introduces basic analytic tools for large sample or asymptotic theory, suitable for analysis when normality is not assumed. Chapter 8 discusses parameter estimation methods and methods to evaluate parameter estimators. Chapter 9 deals with hypothesis testing. Finally, Chapter 10 concludes the book.

The book contains discussions ranging from basic concepts to advanced asymptotic analysis, with much intuition and explanation provided for important probability and statistics concepts and ideas. The purpose of this book is to develop a deep understanding of probability and statistics and a solid intuition for statistical concepts. One year of calculus is a prerequisite for understanding the materials in this book; an additional year of advanced calculus and some basic background in probability and statistics will be very helpful. The analysis is conducted in a relatively rigorous manner. Proofs will be given for some important theorems, because the proofs themselves can aid understanding and, in some cases, the proof techniques or methods have practical value, particularly for students who are later interested in econometric theory. On the other hand, graphical representation is also useful for understanding the abstract and subtle concepts of probability and statistics.

Many students taking this course are encountering the ideas of probability and statistics for the first time. It is important for students in economics and finance to build up stochastic thinking and statistical thinking. Essentially, this requires, among many other things, that students view observed economic data as being generated from a stochastic economic system or process, and that the observed data, which represent limited information about the system, can be used to make inferences about the stochastic system or process. It will be helpful for them to spend some time learning how the mathematical ideas of probability and statistics carry over into the world of applications in economics and finance. Thus, in addition to developing a fundamental understanding of the probability and mathematical statistics most relevant to modern econometrics, this book also tries to develop a sound intuition for statistical concepts from an economic perspective. For example, why are statistical concepts (e.g., conditional mean, conditional variance) useful in economics? What are the economic intuition and interpretation for probability and statistical relations?
The book will provide many economic and financial examples to illustrate how probability tools and statistical methods can be used in economic analysis. This is in fact a most important feature that distinguishes this book from many other textbooks on probability and statistics.

The book is based on the lecture notes that I have been using in teaching probability and statistics to first-year doctoral students in the Department of Economics at Cornell University. I thank the students who have taken this course for their comments. Moreover, I thank Weiping Bao, Daumantas Bloznelis, Biqing Cai, Ying Fang, Muyi Li, Ming Lin, Xia Wang, and Ke Xiao for their comments and suggestions.

Yongmiao Hong
Ernest S. Liu Professor of Economics & International Studies
Department of Economics & Department of Statistical Science
Cornell University, Ithaca, U.S.A.

Chapter 1 Introduction to Probability and Statistics

Abstract: Probability has become the best analytic tool to describe any system involving uncertainties, and statistics provides the mathematical foundation for modeling situations involving uncertainty. As the beginning of this book, this chapter introduces two fundamental axioms behind modern econometrics, emphasizes the important role of statistics in economics, and discusses the limitations of statistical analysis in economics.

Key words: Chaos, Data generating process, Econometrics, Quantitative analysis, Probability law, Uncertainty.

1.1 Quantitative Analysis in Economics

The most important feature of modern economics and finance is the wide use of quantitative analysis. Quantitative analysis consists of mathematical modeling of economic theory and empirical study of economic data. This is due to the cumulative efforts of many generations of economists to make economics a science, something like or close to physics, chemistry, and biology, which can make accurate predictions or forecasts. Economic theory, when formulated via mathematical tools, can achieve logical consistency among its assumptions, theories, and implications. Indeed, as Karl Marx points out, the use of mathematics is an indication of the mature stage of a science.

On the other hand, for any economic theory to be a science, it must be able to explain important empirical stylized facts and to predict future economic evolutions. This requires validating economic models using observed economic phenomena, usually in the form of data. Mathematical tools cannot help in achieving this objective. Instead, statistical tools have proven to be rather useful. The history of the development of economics is a continuous process of refuting existing economic theories that cannot explain new empirical stylized facts and developing new economic theories that can explain them. Empirical analytic tools play a vital role in such a process. In a sense, statistical methods and techniques are really the heart of scientific research in economics.

As a matter of fact, the main empirical analytic tool in economics is econometrics. Econometrics is the statistical analysis of economic data in combination with economic theory. There is a great deal of uncertainty in real economies and financial markets, and economic agents usually have to make decisions under uncertainty. Probability is a natural quantitative tool to describe uncertainty in economics. Historically, probability was motivated by interest in games of chance.
Scholars then began to apply probability theory to actuarial problems and some aspects of the social sciences. Later, probability and statistics were introduced into physics by L. Boltzmann, J. Gibbs, and J. Maxwell, and by the last century they had found applications in all phases of human endeavor that in some way involve an element of uncertainty or risk. Indeed, probability theory has become the best analytic tool to describe any system involving uncertainties.

Modern statistics encompasses both the science of basing inferences on observed data and the entire problem of making decisions in the face of uncertainty. It would be presumptuous to say that statistics, in its present state of development, can handle all situations involving uncertainty, but new techniques are constantly being developed, and modern statistics can at least provide a framework for looking at situations involving uncertainty in a logical and systematic fashion. It can be said that statistics provides the mathematical models needed to study situations involving uncertainty in the same way that calculus provides the mathematical models needed to describe, say, the concepts of Newtonian physics. Indeed, as Robert Lucas points out, the introduction of stochastic factors into a dynamic economic system can provide new insight into dynamic economic processes.

1.2 Fundamental Axioms of Statistical Analysis in Economics

There are two fundamental axioms behind modern econometrics:

• Axiom A: Any economy can be viewed as a stochastic process governed by some probability law;
• Axiom B: Economic phenomena, as often summarized in the form of data, can be viewed as a realization of this stochastic data generating process.

Economics is about resource allocation in an uncertain environment. When an economic agent makes a decision, he or she usually does not know precisely the outcome of his or her action, which usually arises in an unpredictable manner after some time lag. As a consequence, uncertainty and time are two of the most important features of an economy. Therefore it seems reasonable to assume Axiom A. With Axiom A, it is natural to assume Axiom B, under which one can call the economic system a "data generating process". It is impossible to prove these two axioms. They are the philosophical views of econometricians and economists about an economy. We note that not all economists and econometricians agree with these two axioms. For example, some economists view an economic system as a chaotic process, which is deterministic but can generate seemingly random numbers.

[Figure 1.1: Standard & Poor 500 Daily Closing Prices and Daily Returns]

To illustrate the different implications of the stochastic view and the chaotic view of an economy, we consider an example. As a well-known empirical stylized fact, high-frequency stock returns are found to have little autocorrelation with their own lagged returns. Figure 1.1 plots the observations on the Standard & Poor 500 daily closing price and daily return over time, respectively. To explain this empirical stylized fact, there are at least two possible hypotheses or conjectures.
The first is to assume that the stock price follows a geometric random walk, that is,

ln P_t = ln P_{t-1} + X_t,

so that its return series {X_t = ln(P_t/P_{t-1})} is a statistically independent sequence over time, which implies zero correlation between stock returns over time. Figure 1.2 plots the observations on the price level P_t and the return X_t generated from this geometric random walk model using a random number generator on a personal computer. Comparing Figures 1.1 and 1.2, we can observe some similarity between the real data series and the artificial series generated from the computer.

[Figure 1.2: Price Series P_t and Return Series X_t]

Alternatively, one may assume that the stock return follows a deterministic chaotic logistic map:

X_t = 4 X_{t-1}(1 - X_{t-1}).

If we generate a large set of observations from this logistic map and calculate the sample autocorrelations between observations over time, we would find zero or little correlation as well. Thus, both the stochastic random walk hypothesis and the deterministic logistic map hypothesis can explain the empirical stylized fact of zero or little autocorrelation in high-frequency stock returns. However, their implications are different: the random walk hypothesis implies that the future stock return cannot be predicted using historical stock returns, because a future stock return is independent of historical stock returns. On the other hand, the time series observations from a logistic map display zero autocorrelation, but they are not independent over time. In fact, there exists a deterministic nonlinear quadratic relationship between X_t and X_{t-1}, from which one can predict X_t perfectly using X_{t-1}. Which view is more realistic for explaining stock returns is an issue for empirical study.
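To make this contrast concrete, the following is a minimal simulation sketch (not from the original text, written in Python with NumPy). The series length of 10,000, the i.i.d. Gaussian increments with standard deviation 0.01 for the random walk returns, the starting value 0.3 for the logistic map, and the helper function lag1_autocorr are all illustrative choices. The sketch generates both series, checks that their lag-1 sample autocorrelations are close to zero, and then shows that the logistic map series is nevertheless perfectly predictable from its own past.

```python
# Minimal sketch: both hypotheses imply (near-)zero autocorrelation in returns,
# but only the logistic map series is perfectly predictable from its own past.
import numpy as np

rng = np.random.default_rng(0)   # fixed seed so the experiment is reproducible
T = 10_000                       # illustrative series length

# Hypothesis 1: geometric random walk, ln P_t = ln P_{t-1} + X_t with i.i.d. X_t.
x_rw = rng.normal(loc=0.0, scale=0.01, size=T)   # independent "returns" X_t
log_price = np.cumsum(x_rw)                      # ln P_t (log price level, as in Figure 1.2)

# Hypothesis 2: deterministic chaotic logistic map, X_t = 4 X_{t-1} (1 - X_{t-1}).
x_lm = np.empty(T)
x_lm[0] = 0.3                                    # arbitrary starting value in (0, 1)
for t in range(1, T):
    x_lm[t] = 4.0 * x_lm[t - 1] * (1.0 - x_lm[t - 1])

def lag1_autocorr(x):
    """Sample autocorrelation between x_t and x_{t-1}."""
    d = x - x.mean()
    return np.sum(d[1:] * d[:-1]) / np.sum(d * d)

print("lag-1 autocorrelation, random walk returns:", lag1_autocorr(x_rw))
print("lag-1 autocorrelation, logistic map       :", lag1_autocorr(x_lm))

# Despite the near-zero autocorrelation, the logistic map series satisfies an
# exact quadratic relation: X_t - 4 X_{t-1}(1 - X_{t-1}) is identically zero.
pred_error = x_lm[1:] - 4.0 * x_lm[:-1] * (1.0 - x_lm[:-1])
print("max |prediction error| for logistic map   :", np.max(np.abs(pred_error)))
```

Under this sketch, both lag-1 sample autocorrelations come out close to zero, while the maximum one-step prediction error for the logistic map is numerically zero, which is exactly the distinction between independence and mere lack of correlation drawn above.

The probability law of a stochastic economic process describes the average behavior of mass economic phenomena and may be called the "law of economic motions". The objective of econometrics is to infer the probability law of an economic system based on observed economic data,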