Essential Statistics, Regression, and Econometrics
Second Edition

Gary Smith
Pomona College

Amsterdam • Boston • Heidelberg • London • New York • Oxford • Paris • San Diego • San Francisco • Singapore • Sydney • Tokyo

Academic Press is an imprint of Elsevier

125, London Wall, EC2Y 5AS
525 B Street, Suite 1800, San Diego, CA 92101-4495, USA
225 Wyman Street, Waltham, MA 02451, USA
The Boulevard, Langford Lane, Kidlington, Oxford OX5 1GB, UK

Copyright © 2015, 2012 Gary Smith. Published by Elsevier Inc. All rights reserved.

No part of this publication may be reproduced or transmitted in any form or by any means, electronic or mechanical, including photocopying, recording, or any information storage and retrieval system, without permission in writing from the publisher. Details on how to seek permission, further information about the Publisher’s permissions policies and our arrangements with organizations such as the Copyright Clearance Center and the Copyright Licensing Agency, can be found at our website: www.elsevier.com/permissions.

This book and the individual contributions contained in it are protected under copyright by the Publisher (other than as may be noted herein).

Notices

Knowledge and best practice in this field are constantly changing. As new research and experience broaden our understanding, changes in research methods, professional practices, or medical treatment may become necessary.

Practitioners and researchers must always rely on their own experience and knowledge in evaluating and using any information, methods, compounds, or experiments described herein. In using such information or methods they should be mindful of their own safety and the safety of others, including parties for whom they have a professional responsibility.
To the fullest extent of the law, neither the Publisher nor the authors, contributors, or editors, assume any liability for any injury and/or damage to persons or property as a matter of products liability, negligence or otherwise, or from any use or operation of any methods, products, instructions, or ideas contained in the material herein.

ISBN: 978-0-12-803459-0

British Library Cataloguing in Publication Data
A catalogue record for this book is available from the British Library

Library of Congress Cataloging-in-Publication Data
A catalog record for this book is available from the Library of Congress

For Information on all Academic Press publications visit our website at http://store.elsevier.com/

Introduction

Econometrics is elegant and powerful. Many departments of economics, politics, psychology, and sociology require students to take a course in regression analysis or econometrics. So do many business, law, and medical schools. These courses are traditionally preceded by an introductory statistics course that adheres to the fire hose pedagogy: bombard students with information and hope they do not drown. Encyclopedic statistics courses are a mile wide and an inch deep, and many students remember little after the final exam. This textbook focuses on what students really need to know and remember.

Essential Statistics, Regression, and Econometrics is written for an introductory statistics course that helps students develop the statistical reasoning they need for regression analysis. It can be used either for a statistics class that precedes a regression class or for a one-term course that encompasses statistics and regression.

One reason for this book’s focused approach is that there is not enough time in a one-term course to cover the material in more encyclopedic books. Another reason is that an unfocused course overwhelms students with so much nonessential material that they have trouble remembering what is essential.

This book does not cover the binomial distribution and related tests of a population success probability. Also omitted are difference-in-means tests, chi-square tests, and ANOVA tests.
Instructors who cover these topics can use the supplementary material at the book’s website.

The regression chapters at the end of the book set up the transition to a more advanced regression or econometrics course and are also sufficient for students who take only one statistics class but need to know how to use and understand basic regression analysis.

This textbook is intended to give students a deep understanding of the statistical reasoning they need for regression analysis. It is innovative in its focus on this preparation and in the extended emphasis on statistical reasoning, real data, pitfalls in data analysis, modeling issues, and word problems. Too many people mistakenly believe that statistics courses are too abstract, mathematical, and tedious to be useful or interesting. To demonstrate the power, elegance, and even beauty of statistical reasoning, this book includes a large number of compelling examples and discusses not only the uses but also the abuses of statistics. These examples show how statistical reasoning can be used to answer important questions and also expose the errors, accidental or intentional, that people often make.

The goal is to help students develop the statistical reasoning they need for later courses and for life after college.

I am indebted to the reviewers who helped make this a better book: Woody Studenmund, The Laurence de Rycke Distinguished Professor of Economics, Occidental College; Michael Murray, Bates College; Steffen Habermalz, Northwestern University; and Manfred Keil, Claremont McKenna College.

Most of all, I am indebted to the thousands of students who have taken statistics courses from me, for their endless goodwill, contagious enthusiasm, and, especially, for teaching me how to be a better teacher.
1 Data, Data, Data

CHAPTER OUTLINE
Measurements
  Flying Blind and Clueless
Testing Models
  The Political Business Cycle
Making Predictions
  Okun’s Law
Numerical and Categorical Data
Cross-Sectional Data
  The Hamburger Standard
Time Series Data
  Silencing Buzz Saws
Longitudinal (or Panel) Data
Index Numbers (Optional)
  The Consumer Price Index
  The Dow Jones Index
Deflated Data
  Nominal and Real Magnitudes
  The Real Cost of Mailing a Letter
  Real Per Capita
Exercises

Essential Statistics, Regression, and Econometrics. http://dx.doi.org/10.1016/B978-0-12-803459-0.00001-7
Copyright © 2015 Gary Smith. Published by Elsevier Inc. All rights reserved.

You’re right, we did it. We’re very sorry. But thanks to you, we won’t do it again.
Ben Bernanke

The Great Depression was a global economic crisis that lasted from 1929 to 1939. Millions of people lost their jobs, their homes, and their life savings. Yet, government officials knew too little about the extent of the suffering, because they had no data measuring output or unemployment. They instead had anecdotes: “It is a recession when your neighbor is unemployed; it is a depression when you lose your job.” Herbert Hoover was president of the United States when the Great Depression began. He was very smart and well-intentioned, but he did not know that he was presiding over an economic meltdown because his information came from his equally clueless advisors, none of whom had yet lost their jobs.
He had virtually no economic data and no models that predicted the future direction of the economy.

In his December 3, 1929, State of the Union message, Hoover concluded that “The problems with which we are confronted are the problems of growth and progress” [1]. In March 1930, he predicted that business would be normal by May [2]. In early May, Hoover declared that “we have now passed the worst” [3]. In June, he told a group that had come to Washington to urge action, “Gentlemen, you have come 60 days too late. The depression is over” [4].

A private organization, the National Bureau of Economic Research (NBER), began estimating the nation’s output in the 1930s. There were no regular monthly unemployment data until 1940. Before then, the only unemployment data were collected in the census, once every 10 years. With hindsight, it is now estimated that between 1929 and 1933, national output fell by one-third, and the unemployment rate rose from 3 percent to 25 percent. The unemployment rate averaged 19 percent during the 1930s and never fell below 14 percent. More than a third of the nation’s banks failed and household wealth dropped by 30 percent.

Behind these aggregate numbers were millions of private tragedies. One hundred thousand businesses failed and 12 million people lost their jobs, income, and self-respect. Many lost their life savings in the stock market crash and the tidal wave of bank failures. Without income or savings, people could not buy food, clothing, or proper medical care. Those who could not pay their rent lost their shelter; those who could not make mortgage payments lost their homes. Farm income fell by two-thirds and many farms were lost to foreclosure. Desperate people moved into shanty settlements (called Hoovervilles), slept under newspapers (Hoover blankets), and scavenged for food where they could. Edmund Wilson [5] reported that:

There is not a garbage-dump in Chicago which is not haunted by the hungry.
Last summer in the hot weather when the smell was sickening and the flies were thick, there were a hundred people a day coming to one of the dumps.

Measurements

Today, we have a vast array of statistical data that can help individuals, businesses, and governments make informed decisions. Statistics can help us decide which foods are healthy, which careers are lucrative, and which investments are risky. Businesses use statistics to monitor operations, estimate demand, and design marketing strategies. Government statisticians measure corn production, air pollution, unemployment, and inflation.

The problem today is not a scarcity of data, but rather the sensible interpretation and use of data. This is why statistics courses are taught in high schools, colleges, business schools, law schools, medical schools, and Ph.D. programs. Used correctly, statistical reasoning can help us distinguish between informative data and useless noise, and help us make informed decisions.

Flying Blind and Clueless

US government officials had so little understanding of economics during the Great Depression that even when they finally realized the seriousness of the problem, their policies were often counterproductive. In 1930, Congress raised taxes on imports to record levels. Other countries retaliated by raising their taxes on goods imported from the United States. Worldwide trade collapsed, with US exports and imports falling by more than 50 percent.

In 1931, Treasury Secretary Andrew Mellon advised Hoover to “liquidate labor, liquidate stocks, liquidate the farmers, liquidate real estate” [6]. When Franklin Roosevelt campaigned for president in 1932, he called Hoover’s federal budget “the most reckless and extravagant that I have been able to discover in the statistical record of any peacetime government anywhere, anytime” [7]. Roosevelt promised to balance the budget by reducing government spending by 25 percent.
One of the most respected financial leaders, Bernard Baruch, advised Roosevelt to “Stop spending money we haven’t got. Sacrifice for frugality and revenue. Cut government spending—cut it as rations are cut in a siege. Tax–tax everybody for everything” [8]. Today, because we have models and data, we know that cutting spending and raising taxes are exactly the wrong policies for fighting an economic recession. The Great Depression did not end until World War II, when there was a massive increase in government spending and millions of people enlisted in the military.

The Federal Reserve (the “Fed”) is the government agency in charge of monetary policy in the United States. During the Great Depression, a seemingly clueless Federal Reserve allowed the money supply to fall by a third. In their monumental work, A Monetary History of the United States, Milton Friedman and Anna Schwartz argued that the Great Depression was largely due to monetary forces, and they sharply criticized the Fed’s perverse policies. In a 2002 speech honoring Milton Friedman’s 90th birthday, Ben Bernanke, who became Fed chairman in 2006, concluded his speech: “I would like to say to Milton and Anna: Regarding the Great Depression. You’re right, we did it. We’re very sorry. But thanks to you, we won’t do it again” [9].

During the economic crisis that began in the United States in 2007, the president, Congress, and Federal Reserve did not repeat the errors of the 1930s. Faced with a credit crisis that threatened to pull the economy into a second Great Depression, the government did the right thing by pumping billions of dollars into a deflating economy. Instead of destroying the economy, they saved it.

Why do we now know that cutting spending, raising taxes, and reducing the money supply are the wrong policies during economic recessions? Because we now have reasonable economic models that have been tested with data.
Testing Models

The great British economist John Maynard Keynes observed that the master economist “must understand symbols and speak in words” [10]. We need words to explain our reasoning, but we also need models so that our theories can be tested with data.

In the 1930s, Keynes hypothesized that household spending rises and falls with income. This “consumption function” was the lynchpin of his explanation of business cycles. If people spend less, others will earn less and spend less, too. This fundamental interrelationship between spending and income explains how recessions can persist and grow like a snowball rolling downhill. If, on the other hand, people buy more coal from a depressed coal-mining area, the owners and miners will then buy more and better food, farmers will buy new clothes, and tailors will start going to the movies again. Not only do the coal miners gain; the region’s entire economy prospers.

At the time, Keynes had no data to test his theory. It just seemed reasonable to him that households will spend more when their income goes up and spend less when their income goes down. Eventually, a variety of data were assembled that confirmed his intuition. Table 1.1 shows estimates of US aggregate disposable income (income after taxes) and spending for the years 1929 through 1940. When income fell, so did spending; and when income rose, so did spending.

Table 1.2 shows a very different type of data based on a household survey during the years 1935–1936. Families with more income tended to spend more.

Today, economists agree that Keynes’ hypothesis is correct, that spending does depend on income, but that other factors also influence spending. These more complex models can be tested with data, and we do so in later chapters.
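The statement that income and spending rose and fell together in Table 1.1 can be checked with a few lines of code. What follows is a minimal Python sketch rather than anything from the original text: the function name same_direction_years is made up for illustration, and the figures are copied from Table 1.1.

```python
# Table 1.1: US disposable personal income and consumer spending,
# billions of dollars, 1929 through 1940.
years = list(range(1929, 1941))
income = [83.4, 74.7, 64.3, 49.2, 46.1, 52.8, 59.3, 67.4, 72.2, 66.6, 71.4, 76.8]
spending = [77.4, 70.1, 60.7, 48.7, 45.9, 51.5, 55.9, 62.2, 66.8, 64.3, 67.2, 71.3]


def same_direction_years(years, income, spending):
    """Return the years in which income and spending both rose or both fell
    relative to the previous year (hypothetical helper for illustration)."""
    matches = []
    for i in range(1, len(years)):
        change_in_income = income[i] - income[i - 1]
        change_in_spending = spending[i] - spending[i - 1]
        # A positive product means the two changes have the same sign.
        if change_in_income * change_in_spending > 0:
            matches.append(years[i])
    return matches


matches = same_direction_years(years, income, spending)
print(len(matches), "of", len(years) - 1, "year-to-year changes moved together")
# prints: 11 of 11 year-to-year changes moved together
```

A check like this only confirms the direction of the changes. It says nothing about how much spending changes when income changes, which is the kind of question regression analysis addresses.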
Table 1.1 US Disposable Personal Income and Consumer Spending, Billions of Dollars [11]

Year    Income    Spending
1929    83.4      77.4
1930    74.7      70.1
1931    64.3      60.7
1932    49.2      48.7
1933    46.1      45.9
1934    52.8      51.5
1935    59.3      55.9
1936    67.4      62.2
1937    72.2      66.8
1938    66.6      64.3
1939    71.4      67.2
1940    76.8      71.3

Table 1.2 Family Income and Spending, 1935–1936 [12]

Income Range ($)    Average Income ($)    Average Spending ($)
<500                292                   493
500–999             730                   802
1000–1499           1176                  1196
1500–1999           1636                  1598
2000–2999           2292                  2124
3000–3999           3243                  2814
4000–4999           4207                  3467
5000–10,000         6598                  4950

The Political Business Cycle

There seems to be a political business cycle in the United States, in that the unemployment rate typically increases after a presidential election and falls before the next presidential election. The unemployment rate has increased in only three presidential election years since the Great Depression. This rarity is no doubt due to the efforts of incumbent presidents to avoid the wrath of voters suffering through an economic recession. Two exceptions were the reelection bids of Jimmy Carter in 1980 (the unemployment rate went up 1.3 percentage points) and George H. W. Bush in 1992 (the unemployment rate rose 0.7 percentage points). In each case, the incumbent was soon unemployed, too. The third exception was in 2008, when George W. Bush was president; the unemployment rate rose 1 percentage point and the Republicans lost the White House.

In later chapters, we test the political business cycle model.

Making Predictions

Models help us understand the world and are often used to make predictions; for example, a consumption function can be used to predict household spending, and the political business cycle model can be used to predict the outcome of a presidential election. Here is another example.

Okun’s Law

The US unemployment rate was 6.6 percent when John F. Kennedy became president of the United States in January 1961 and reached 7.1 percent in May 1961.
Reducing the unemployment rate was a top priority because of the economic and psychological distress felt by the unemployed and because the nation’s aggregate output would be higher if these people were working. Not only would the unemployed have more income if they were working, but also they would create more food, clothing, and homes for others to eat, wear, and live in.

One of Kennedy’s advisors, Arthur Okun, estimated the relationship between gross domestic product (GDP) and the unemployment rate. His estimate, known as Okun’s law, was that output would be about 3 percent higher if the unemployment rate were 1 percentage point lower. Specifically, if the unemployment rate had been 6.1 percent, instead of 7.1 percent, output would have been about 3 percent higher.

This prediction helped sell the idea to Congress and the public that there are both private and public benefits from reducing the unemployment rate. Later in this book, we estimate Okun’s law using more recent data.

Numerical and Categorical Data

Unemployment, inflation, and other data that have natural numerical values (5.1 percent unemployment, 3.2 percent inflation) are called numerical or quantitative data. The income and spending in Tables 1.1 and 1.2 are quantitative data.

Some data, for example, whether a person is male or female, do not have natural numerical values (a person cannot be 5.1 percent male). Such data are said to be categorical or qualitative data. With categorical data, we count the number of observations in each category. The data can be described by frequencies (the number of observations) or relative frequencies (the fraction of the total observations); for example, out of 1000 people surveyed, 510, or 51 percent, were female.

The Dow Jones Industrial Average, the most widely reported stock market index, is based on the stock prices of 30 of the most prominent US companies. If we record whether the Dow went up or down each day, these would be categorical data.
If we record the percentage change in the Dow each day, these would be numerical data. From 1901 through 2007, the Dow went up on 13,862 days and went down on 12,727 days. The relative frequency of up days is 52.1 percent:

\[ \frac{13{,}862}{13{,}862 + 12{,}727} = 0.521 \]

For the numerical data on daily percentage changes, we might calculate a summary statistic, such as the average percentage change (0.021 percent), or we might separate the percentage changes into categories, such as the number of days when the percentage change in the Dow was between 1 and 2 percent, between 2 and 3 percent, and so on.

Cross-Sectional Data

Cross-sectional data are observations made at the same point in time. These could be on a single day, as in Table 1.3, which shows the price/earnings ratios for each of the Dow Jones stocks on February 5, 2015. Cross-sectional data can also be for a single week, month, or year; for example, the survey data on annual household income and spending in Table 1.2 are cross-sectional data.

Table 1.3 Price/Earnings Ratios for Dow Stocks, February 5, 2015

Company                   Price/Earnings Ratio    Company                Price/Earnings Ratio
Merck                     32.3                    Cisco Systems          18.4
Visa                      30.4                    Wal-Mart Stores        18.2
AT&T                      29.0                    Johnson & Johnson      18.0
Nike                      27.8                    United Technologies    17.6
Procter & Gamble          25.0                    Microsoft              17.1
Home Depot                24.7                    General Electric       16.2
Walt Disney               24.1                    American Express       15.1
Pfizer                    23.4                    Intel                  14.7
Coca-Cola                 23.2                    Caterpillar            14.2
3M                        22.2                    ExxonMobil             11.6
Boeing                    20.1                    Goldman Sachs          10.6
Verizon                   19.8                    Chevron                10.1
McDonald’s                19.6                    IBM                    10.1
EI du Pont de Nemours     19.5                    Travelers Companies    10.0
UnitedHealth Group        19.1                    JPMorgan               9.9

The Hamburger Standard

The law of one price says that, in an efficient market, identical goods should have the same price. Applied internationally, it implies that essentially identical goods should cost about the same anywhere in the world, once we convert prices to the same currency. Suppose the exchange rate between US dollars and British pounds (£) is 1.50 dollars/pound.
If a sweater sells for £20 in Britain, essentially identical sweaters should sell for $30 in the United States. If the price of the American sweaters were more than $30, Americans would import British sweaters instead of buying overpriced American sweaters. If the price were less than $30, the British would import American sweaters.

The law of one price works best for products (like wheat) that are relatively homogeneous and can be imported relatively inexpensively. The law of one price does not work very well when consumers do not believe that products are similar (for example, BMWs and Chevys) or when there are high transportation costs, taxes, and other trade barriers. Wine can be relatively expensive in France if the French prohibit wine imports or tax imports heavily. A haircut and round of golf in Japan can cost more than in the United States, because it is impractical for the Japanese to have their hair cut in Iowa and play golf in Georgia.

Since 1986, The Economist, an influential British magazine, has surveyed the prices of Big Mac hamburgers around the world. Table 1.4 shows cross-sectional data on the prices of Big Mac hamburgers in 20 countries. The law of one price does not apply to Big Macs, because Americans will not travel to Hong Kong to buy a hamburger for lunch.
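The sweater arithmetic earlier in this section can be written out as a short calculation. The Python sketch below is only an illustration under the text's assumptions (a £20 sweater and an exchange rate of 1.50 dollars/pound); the function names implied_home_price and trade_direction are hypothetical, and the $35 and $25 prices are made-up test values, not data from the book.

```python
def implied_home_price(foreign_price, exchange_rate):
    """Price in home currency implied by the law of one price.

    exchange_rate is home-currency units per unit of foreign currency
    (here, dollars per pound).
    """
    return foreign_price * exchange_rate


def trade_direction(us_price, implied_price):
    """Describe which way sweaters would flow, following the text's argument."""
    if us_price > implied_price:
        return "Americans import British sweaters"
    if us_price < implied_price:
        return "British import American sweaters"
    return "no incentive to import"


# A 20-pound sweater at 1.50 dollars/pound, as in the text.
implied = implied_home_price(20.0, 1.50)
print(implied)                         # prints: 30.0
print(trade_direction(35.0, implied))  # hypothetical US price above $30
print(trade_direction(25.0, implied))  # hypothetical US price below $30
```

The sketch ignores transportation costs, taxes, and other trade barriers, which is exactly why the law of one price fits wheat better than haircuts or Big Macs.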