
Statistics Made Simple PDF

196 Pages·1968·19.998 MB·English


STATISTICS MADE SIMPLE

H. T. Hayslett, Jr., M.S.
Assistant Professor of Mathematics, Colby College

Made Simple Books
Doubleday & Company, Inc.
Garden City, New York

ABOUT THIS BOOK

This book contains a selection of topics from a vast amount of material in the field of statistics. The use of statistical techniques in experiments in areas of science as diverse as agronomy, anthropology, and astrophysics is becoming more and more important, to say nothing of the use of statistics in economic and government forecasting and planning.

No mathematical training except high-school algebra is presupposed in this book, although when you read some sections it will be clear that the more mathematical maturity you possess the better. Perhaps summation notation, inequalities, and the equations of straight lines are unfamiliar to you. If so, you will find that these topics are treated where they are needed so that the book is self-contained.

Numerous examples are included. Most of the data used in the examples are artificial. No attempt has been made to have examples from all subject areas, nor to provide realistic examples in every case. The author has assumed that anyone using this book will be motivated to learn some of the concepts and techniques of statistics because he must have some knowledge of the subject for his work or his study. The reader can, therefore, undoubtedly supply his own specific applications after reading the explanation and examples given. In the later chapters, where the problems become more complex, step-by-step directions for making various statistical tests are given and then these directions are illustrated by means of examples.

The material in Chapters 2 through 4 is basic, and is needed throughout the book. The material in Chapter 5 is needed specifically for Chapter 6. More generally, though, a knowledge of probability is helpful throughout, although it is not essential to the ability to perform various tests.
I would like to thank Professor Wilfred J. Combellack for his encouragement and advice, and for giving so freely of his time, energy, knowledge, and wisdom. Professor Combellack read most of the chapters, and nearly all of his numerous suggestions for improvement have been incorporated into the text, thereby making it clearer and eliminating many errors. The reader of this book is in his debt almost as much as I. The errors that might remain are my sole responsibility.

Finally, I wish to thank my wife, Loyce, for her help in innumerable ways. Not only did she read, criticize, type, and check the entire manuscript, but she took on many of my responsibilities at home during the months that I was engaged in writing this book while teaching full-time. This book could not have been completed without her help, and I am grateful for it.

H. T. Hayslett, Jr.

Library of Congress Catalog Card Number 67-10414
Copyright © 1968 by Doubleday & Company, Inc.
All Rights Reserved
Printed in the United States of America

TABLE OF CONTENTS

About This Book

CHAPTER 1 What Is Statistics?
  The Present Importance of Statistics
  Two Kinds of Statistics

CHAPTER 2 Pictorial Description of Data
  Introduction
  Selecting a Random Sample
  Classification of Data
  Frequency Distributions and Cumulative Frequency Distributions
  Graphical Representation of Data
    Histogram
    Frequency Polygon
    Ogive
  Exercises

CHAPTER 3 Measures of Location
  Introduction
  The Midrange
  The Mode
  The Median
  The Arithmetic Mean
  The Median of Classified Data
  Summation Notation
  The Mean of Classified Data
  Exercises

CHAPTER 4 Measures of Variation
  Introduction
  The Range
  The Mean Absolute Deviation
  The Variance and the Standard Deviation
  The Variance and Standard Deviation of Classified Data
  Exercises

CHAPTER 5 Elementary Probability and the Binomial Distribution
  Introduction
  Probabilities of Simple Events
  Probabilities of Two Events
  Probabilities for Combinations of Three or More Events
  Permutations
  Fundamental Principle
  Combinations
  More Probability
  The Binomial Distribution
  The Theoretical Mean of the Binomial Distribution
  The Theoretical Variance of the Binomial Distribution
  Exercises

CHAPTER 6 The Normal Distribution
  Introduction
  The Normal Distribution
  Use of Standard Normal Tables
  More Normal Probabilities
  The Normal Approximation to the Binomial
  Theorem
  Exercises

CHAPTER 7 Some Tests of Statistical Hypotheses
  Introduction
  The Nature of a Statistical Hypothesis: Two Types of Error
  Test of H0: π = π0 versus a Specified Alternative
  Tests about the Mean of a Normal Distribution
  Exercises

CHAPTER 8 More Tests of Hypotheses
  Introduction
  Tests of H0: μ = μ0, Normal Population, σ² Unknown
  Tests about the Mean of a Non-Normal Population
  Tests about the Difference of Two Proportions
  Tests about the Differences of Two Means
  Exercises

CHAPTER 9 Correlation and Regression
  The Sample Correlation Coefficient
  Computation of r
  Testing Hypotheses about the Population Correlation Coefficient
  Linear Regression
  Finding the Regression (Least-Square) Line
  Testing Hypotheses about μ in a Regression Problem
  Testing Hypotheses about β in a Regression Problem
  Exercises

CHAPTER 10 Confidence Limits
  Introduction
  A Note on Inequalities
  Confidence Intervals for μ
  Confidence Interval for π
  Confidence Interval for μ1 − μ2
  Confidence Interval for π1 − π2
  Confidence Interval for ρ
  Exercises

CHAPTER 11 Non-parametric Statistics
  Introduction
  The Chi-Squared Distribution
  Contingency Tables
  The Rank-Correlation Coefficient
  The Sign Test (One Population)
  The Wilcoxon Signed-Rank Test
  The Rank-Sum Test (Two Populations)
  Exercises

CHAPTER 12 The Analysis of Variance
  Introduction
  One-Way Analysis of Variance
  One-Way Analysis of Variance: Another Approach
  One-Way Analysis of Variance, Different Sample Sizes
  Two-Way Analysis of Variance
  Exercises

APPENDIX
  Some Notes about Desk Calculators
  List of Selected Symbols
  Tables
    Area of the Standard Normal Distribution
    t-Distribution
    χ²-Distribution
    F-Distribution
    Fisher-z Values
    Spearman Rank-Correlation Coefficient
    Wilcoxon Signed-Rank Values
    Rank-Sum Critical Values
  Answers to Exercises
  Index

CHAPTER 1
WHAT IS STATISTICS?

In order to study the subject of statistics intelligently we should first understand what the term means today, and know something of its origin.

As with most other words, the word "statistics" has different meanings to different persons. When most people hear the word they think of tables of figures giving births, deaths, marriages, divorces, automobile accidents, and so on, such as might be found in the World Almanac, for instance. This is indeed a vital and correct use of the term. In fact, the word "statistics" was first applied to these affairs of the state, to data that government finds necessary for effective planning, ruling, and tax-collecting.
Collectors and analyzers of this information were once called "statists," which shows much more clearly than the term "statistician" the original preoccupation with the facts of the state.

Today, of course, the term "statistics" is applied, in this first sense, to nearly any kind of factual information given in terms of numbers: the so-called "facts and figures." Radio and television announcers tell us that they will "have the statistics of the game in a few minutes," and newspapers frequently publish articles about beauty contests giving the "statistics" of the contestants.

The term "statistics," however, has other meanings, and people who have not studied the subject are relatively unfamiliar with these other meanings. Statistics is a body of knowledge in the area of applied mathematics, with its own symbolism, terminology, content, theorems, and techniques. When people study the subject, statistics, they usually attempt to master some of these techniques.

The term "statistics" has a second meaning for those who have been initiated into the mysteries of the subject "statistics." In this second sense, "statistics" are quantities that have been calculated from sample data; a single quantity that has been so calculated is called a "statistic." For example, the sample mean is a statistic, as are the sample median and sample mode (all discussed in Chapter 3). The sample variance is a statistic, and so is the sample range (both discussed in Chapter 4). The sample correlation coefficient (discussed in Chapter 9) is a statistic, and so on.

We can summarize these meanings of the word "statistics":

1. The public meaning of facts and figures, graphs and charts. The word is plural when used in this sense.
2. The subject itself, with a terminology, methodology, and body of knowledge of its own. The word is singular when used in this sense.
3. Quantities calculated from sample data. The word is plural when used in this sense.

In this book we will not use the word "statistics" at all in the first sense above.
When we want to refer to "facts and figures" we will use the term "observations," or the term "data." We will occasionally refer to a quantity that has been calculated from sample data as a "statistic." In these cases, we will be using the singular of the word "statistics," in the third sense above. Nearly always, when we use the word "statistics" we will mean the subject itself, the body of knowledge.

The methodology of statistics is sufficiently misunderstood to give rise to a number of humorous comments about statistics and statisticians. For example: "A statistician is a person who draws a mathematically precise line from an unwarranted assumption to a foregone conclusion." This strikes out at two abuses of statistical techniques, although the abuse is not by professional statisticians. In order to apply most statistical techniques, certain assumptions must be made, the number and scope of the assumptions varying from situation to situation. Perhaps some persons do make assumptions that are not justified, and disguise their doubt. And perhaps, also, some persons do have a conclusion already decided upon, and then choose their sample or "doctor" their data in order to "prove" their conclusion. Each of these abuses, when knowingly done, is dishonest.

One indictment of the techniques and methodology of statistics says that "statistical analysis has often meant the manipulation of ambiguous data by means of dubious methods to solve a problem that has not been defined." Probably the remark that is known best is the one attributed by Mark Twain to Disraeli: "There are three kinds of lies: lies, damned lies, and statistics." And yet another well-known remark, critical of the manner in which statistics are used, is, "He uses statistics as a drunk uses a street lamp, for support, rather than illumination."
THE PRESENT IMPORTANCE OF STATISTICS

The application of statistical techniques is so widespread, and the influence of statistics on our lives and habits so great, that the importance of statistics can hardly be overemphasized.

Our present agricultural abundance can be partially ascribed to the application of statistics to the design and analysis of agricultural experiments. This is an area in which statistical techniques were used relatively early. Some questions that the methods of statistics help answer are: Which type of corn gives the best yield? Which feed mixture should chickens be fed so that they will gain the most weight? What kind of mixture of grass seeds gives the most tons of hay per acre? All of these questions, and hundreds of others, have a direct effect on all of us through the local supermarket.

The methodology of statistics is also used constantly in medical and pharmaceutical research. The effectiveness of new drugs is determined by experiments, first on animals, and then on humans. New developments in medical research and new drugs affect most of us.

Statistics is used by the government as well. Economic data are studied and affect the policies of the government in the areas of taxation, funds spent for public works (such as roads, dams, etc.), public assistance funds, and so on. Statistics on unemployment affect efforts to lower the unemployment rate. Statistical methods are used to evaluate the performance of every sort of military equipment, from bullets used in pistols to huge missiles. Probability theory and statistics (especially a rather new area known as statistical decision theory) are used as an aid in making extremely important decisions at the highest levels.

In private industry the uses of statistics are nearly as important and their effects nearly as widespread as they are in government use. Statistical techniques are used to control the quality of products being produced and to evaluate new products before they are marketed.
Statistics are used in marketing, in decisions to expand business, in the analysis of the effectiveness of advertising, and so on. Insurance companies make use of statistics in establishing their rates at a realistic level.

The list could go on and on. Statistics is used in geology, biology, psychology, sociology: in any area in which decisions must be made on the basis of incomplete information. Statistics is used in educational testing, in safety engineering. Meteorology, the science of weather prediction, is using statistics now. Even seemingly unlikely areas use statistics. Who would think that statistics could help a literary scholar or a historical sleuth determine the authorship of disputed documents? Perhaps the best-known instance of this is the use of statistical techniques to settle the long controversy over who wrote those essays in The Federalist Papers whose authorship had been disputed.

On the lighter side, statistical studies have been made of the effect of the full moon on trout fishing; of which of two kinds of water glasses are better for use in restaurants; and of the optimum strategies for games of skill and chance such as bridge, solitaire, blackjack, and baseball.

There can be little doubt, then, of the effect of statistics and statistical techniques on each of us. The results of statistical studies are seen, but perhaps not realized, in our paychecks, our national security, our insurance premiums, our satisfaction with products of many kinds, and our health.

TWO KINDS OF STATISTICS

In addition to a brief consideration of the basic elements of probability, there are two kinds of statistics treated in this book. In Chapters 2, 3, and 4 we are concerned primarily with the description of data. In Chapter 2 we treat the pictorial description of data; in Chapters 3 and 4 we treat the numerical description of data. The natural name for this kind of statistics is descriptive statistics. The classification of data; the drawing of histograms that correspond to the frequency distributions that result after the data are classified; the representation of data by other sorts of graphs, such as line graphs, bar graphs, pictograms; the computation of sample means, medians, or modes; the computation of variances, mean absolute deviations, and ranges: all these activities deal with descriptive statistics. The statistical work done back in the nineteenth century and the early part of this century was largely descriptive statistics.

The second important kind of statistics is known as inferential statistics. Statistics has been described as the science of making decisions in the face of uncertainty; that is, making the best decision on the basis of incomplete information. In order to make a decision about a population, a sample (usually just a few members) of that population is selected from it. The selection is usually by a random process. Although there are various kinds of sampling, the kind that we will be assuming throughout this book is known as random sampling. As the term suggests, this is a kind of sampling in which the members of the sample are selected by some sort of process that is not under the control of the experimenter. There are various mathematical definitions of random sampling, but we will consider it as a sample for which each member of the population has an equal chance of being selected, and for which the selection of any one member does not affect the selection of any other member.

On the basis of the random sample, we infer things about the population. This inferring about populations on the basis of samples is known as statistical inference. In other words, statistical inference is the use of samples to reach conclusions about the populations from which those samples have been drawn.

Let us mention several examples of statistical inference. Suppose that a manufacturer of tricycles buys bolts in large quantities.
The manufacturer has the right to refuse to accept the shipment if more than 3 per cent of the bolts are defective. It is not feasible, of course, to check all of the bolts before they are used. This would take too long. Neither is it possible to simply lay aside the defective bolts as they are encountered during the assembly of the tricycles. The bolts cannot be returned after they have been used, even if 20 per cent are defective; and, of course, the tricycle manufacturer does not want to use a shipment of bolts that contains a large percentage of defectives, because it is expensive to attempt to use a defective bolt, realize that it is defective, and then do the job again with a satisfactory bolt. So for several reasons, the manufacturer needs to have a quick, inexpensive method by which he can determine whether the shipment contains too many defectives. So he obtains a random sample from the shipment of bolts, and on the basis of the percentage of defectives in the sample, he makes a decision about the percentage of defectives in the population (the shipment). This is an example of statistical inference.

Consider another example of statistical inference. A medical research worker wants to determine whether a new drug is superior to the old one. One hundred patients in a large hospital are divided at random into two groups. One group is given the old drug and the other group is given the new drug. Various medical data are obtained for each patient on the day the administration of the drug began, and the same things measured ten days later. By analyzing the data for each group, and by comparing the data, a conclusion can be reached about the relative effectiveness of the two drugs.

A similar example is discussed at greater length at the beginning of Chapter 7, in which the testing of statistical hypotheses is first discussed.

The usual procedure for testing a statistical hypothesis is the following: A hypothesis, known as the null hypothesis, is proposed about a population; a random sample is obtained from the population, and a numerical quantity, known as a statistic, is calculated from the sample data.
The null hypothesis is accepted or rejected, depending upon the value of the statistic. (An alternative hypothesis is formulated at the same time as the null hypothesis, and rejection of the null hypothesis means automatic acceptance of the alternative hypothesis.) Thus, the testing of a statistical hypothesis is an illustration of statistical inference, because a decision is made about a population by means of a sample. Chapters 7, 8, 9, 11, and 12 are concerned (some partially, some entirely) with tests of statistical hypotheses, and therefore with statistical inference. Chapter 10, in which an important topic known as confidence intervals is discussed, also deals with statistical inference.

In summary, the subject matter in this book falls rather naturally into three categories. Chapters 2, 3, and 4 treat the description of data, both graphically and numerically, and are classified as descriptive statistics. Some very simple topics from probability theory are discussed in Chapters 5 and 6: elementary probability, two important distributions (the binomial and the normal) and how the two are related, and the use of the normal table. The final six chapters treat selected topics, mainly about testing hypotheses, from that part of the subject matter of statistics known as inferential statistics. The first two categories, comprising the first six chapters, are preliminary to the last one.

CHAPTER 2
PICTORIAL DESCRIPTION OF DATA

INTRODUCTION

This chapter is concerned with the presentation of sample data. Before treating the classification of data and the sketching of histograms, we will briefly discuss the idea of a random sample and how one can be obtained.

If one is sampling from a population composed of an infinite number of elements, a sample selected in such a manner that the selection of any member of the population does not affect the selection of any other member, and each member has the same chance of being included in the sample, is called a random sample.
If one is sampling from a finite population with replacement (each member is returned to the population after being selected, and might be selected more than once), a random sample is defined exactly as above. If one is sampling from a finite population without replacement (the elements are not returned to the population after they have been observed), then we say that a sample is a random sample if all other samples of the same size have an equal chance of being selected. No sample is any more likely to be selected than any other.

The word "random" indicates that the sample is selected in such a way that it is impossible to predict which members of the population will be included, and that it is simply a matter of chance that any particular member is selected. In order to apply the statistical techniques explained in this book in analyzing sample data, it is necessary that the sample be a random one (with very few exceptions). The statistical techniques are justified by statistical theory, which in turn rests upon probability theory, and we must have random samples before the probability theory is applicable.

SELECTING A RANDOM SAMPLE

It is sometimes not an easy matter to obtain a random sample. If the population is small, one of the simplest ways of obtaining a random sample is to list the members (on small pieces of paper, for instance) and draw the sample "out of a hat." Perhaps you remember seeing the famous picture of the Secretary of War in the early 1940s drawing from a large container the names of the first men to be inducted into the Army under the newly passed conscription act. This is an excellent example of drawing a sample "from a hat."

Whenever an integer can be assigned to each member of the population, a random number table can be used to obtain a random sample. This table is a listing of digits that have been obtained by some random process. One way of assigning an integer to each member of the population is simply to number the members 1, 2, 3, and so on.
(Sometimes the members cannot be conveniently numbered, in which case there are other methods of obtaining a random sample by means of random numbers.) Each member of the population has a corresponding number in the random number table (or perhaps more than one corresponding number). To obtain a random sample, we would begin reading numbers in the random number table at some randomly chosen place, and for each random number read, the member of the population that corresponds to that number is included in the sample. For instance, if our population consists of a thousand members, we could assign them numbers from 000 to 999. If we read the numbers 027, 831, and 415 in the random number table, we would include in the random sample those members of the population whose numbers are 027, 831, and 415.

The data shown in Table 2-1 are the scores that one hundred students obtained on the verbal portion of the Scholastic Aptitude Test; we shall refer to these scores as the SAT-Verbal scores. The sample was obtained from a population of freshmen students, using a table of random numbers to guarantee that the sample was random. Many statistics textbooks contain a random number table and discuss its use. See Wallis and Roberts, Statistics: A New Approach, or Dixon and Massey, Introduction to Statistical Analysis.
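
The procedure of reading a random number table can be imitated on a computer. The following is only a minimal sketch, not a method from the book: it assumes Python's standard random module as a stand-in for the printed table, and the names read_random_numbers, draw_sample, and the "student-027" style labels are hypothetical, chosen for illustration. Members are numbered 000 to 999 as in the example above, digits are read three at a time, and a number that has already been read is skipped when sampling without replacement.

```python
import random


def read_random_numbers(rng, width):
    """Yield integers built from `width` random digits, as if read from a table."""
    while True:
        digits = [rng.randrange(10) for _ in range(width)]
        yield int("".join(str(d) for d in digits))


def draw_sample(population, size, rng, replace=False):
    """Select `size` members by matching random numbers to member numbers."""
    if not population:
        raise ValueError("population is empty")
    if not replace and size > len(population):
        raise ValueError("sample cannot be larger than the population")
    # Number of digits needed to label every member (3 for labels 000..999).
    width = len(str(len(population) - 1))
    numbers = read_random_numbers(rng, width)
    chosen, seen = [], set()
    while len(chosen) < size:
        number = next(numbers)
        if number >= len(population):
            continue  # no member carries this number
        if not replace and number in seen:
            continue  # member already drawn; read the next number
        seen.add(number)
        chosen.append(population[number])
    return chosen


# Assumed setup: a thousand members labeled 000 to 999, and an arbitrary seed
# that plays the role of the "randomly chosen place" to start reading.
rng = random.Random(1968)
population = [f"student-{i:03d}" for i in range(1000)]
sample = draw_sample(population, 100, rng)
```

Passing replace=True corresponds to sampling with replacement, in which a member may appear in the sample more than once. The standard library call random.sample(population, 100) would give a sample without replacement more directly; the longer version is shown only because it mirrors the table-reading steps described in the text.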
