ebook img

Collaborative Statistics Using R PDF

2011·0.31 MB·English
Save to my drive
Quick download
Download
Most books are stored in the elastic cloud where traffic is expensive. For this reason, we have a limit on daily download.

Preview Collaborative Statistics Using R

Collaborative Statistics Using R By: Ananda Mahto Collaborative Statistics Using R By: Ananda Mahto Online: < http://cnx.org/content/col11219/1.7/ > C O N N E X I O N S RiceUniversity,Houston,Texas This selection and arrangement of content as a collection is copyrighted by Ananda Mahto. It is licensed under the CreativeCommonsAttribution3.0license(http://creativecommons.org/licenses/by/3.0/). Collectionstructurerevised:January4,2011 PDFgenerated:February6,2011 Forcopyrightandattributioninformationforthemodulescontainedinthiscollection,seep.37. Table of Contents Preface ................................................................................................1 1 SamplingandData 1.1 Introduction ................................................................... ..............3 1.2 KeyTerms ..................................................................... ..............3 1.3 Data .........................................................................................5 1.4 Sampling ...................................................................... ..............6 1.5 VariationandCriticalEvaluation .............................................................9 1.6 Frequency,RelativeFrequency,andCumulativeFrequency ..................................12 1.7 Practice .....................................................................................15 Solutions ........................................................................................18 2 DescriptiveStatistics 2.1 Introduction ................................................................................19 2.2 StemandLeafGraphsandBarGraphs ......................................................19 2.3 Histograms .................................................................................24 2.4 BoxPlots ....................................................................................28 Solutions ........................................................................................32 Glossary ..............................................................................................34 Index .................................................................................................36 Attributions ..........................................................................................37 iv Preface1 About "Collaborative Statistics" Collaborative Statistics was written by Barbara Illowsky and Susan Dean, faculty members at De Anza CollegeinCupertino,California. TheoriginalprefacetothebookaswrittenbyprofessorsIllowskyandDean,nowfollows: This book is intended for introductory statistics courses being taken by students at two– and four– yearcollegeswhoaremajoringinfieldsotherthanmathorengineering. Intermediatealgebraistheonly prerequisite. The book focuses on applications of statistical knowledge rather than the theory behind it. ThetextisnamedCollaborativeStatisticsbecausestudentslearnbestbydoing. Infact,theylearnbestby workinginsmallgroups. Theoldsaying“twoheadsarebetterthanone”trulyapplieshere. Ouremphasisinthistextisonfourmainconcepts: • thinkingstatistically • incorporatingtechnology • workingcollaboratively • writingthoughtfully These concepts are integral to our course. Students learn the best by actively participating, not by just watching and listening. Teaching should be highly interactive. Students need to be thoroughly engaged in the learning process in order to make sense of statistical concepts. Collaborative Statistics provides techniquesforstudentstowriteacrossthecurriculum,tocollaboratewiththeirpeers,tothinkstatistically, andtoincorporatetechnology. Thisbooktakesstudentsstepbystep.Thetextisinteractive.Therefore,studentscanimmediatelyapply whattheyread. Oncestudentshavecompletedtheprocessofproblemsolving,theycantackleinteresting and challenging problems relevant to today’s world. The problems require the students to apply their newlyfoundskills. Thebookalsocontainslabsthatuserealdataandpracticesthatleadstudentsstepby stepthroughtheproblemsolvingprocess. About this custom edition of Collaborative Statistics ThiscustomeditionofCollaborativeStatisticsisdesignedforuseinashortcourseinintroductorystatistics. Additionally,thetextincludesexamplesofhowtousetheR-projectopen-sourcestatisticalpackageforthe calculations. R software was chosen for several reasons. First, it is free. Second, it is relatively easy to learn once youactuallystartusingit. Third,thesoftwareisstableandquiteadvanced;therearemanyfeaturesimple- mentedinRthatarenotfoundincommercialsoftwarepackages. Fourth,Rhasgreatcommunitysupport; ifthereareanyquestionsyoumighthave,therearenumeroususer-groupswhichcanhelpyousolveyour problems. 1Thiscontentisavailableonlineat<http://cnx.org/content/m35060/1.1/>. 1 2 Wehopethatyouenjoytheprocessoflearningaboutstatisticsandsimultaneouslylearninghowtouse R. Chapter 1 Sampling and Data 1.1 Introduction1 1.1.1StudentLearningObjectives Bytheendofthischapter,thestudentshouldbeableto: • Recognizeanddifferentiatebetweenkeyterms. • Applyvarioustypesofsamplingmethodstodatacollection. • Createandinterpretfrequencytables. • BeabletoapplysomebasicfunctionsinRtogeneratesamplesandtosummarizedata. The R functions you will be using in this chapter are cumsum(), cut(), length(), sample(), set.seed(), andtable(). 1.1.2Introduction You are probably asking yourself the question, "When and where will I use statistics?". If you read any newspaperorwatchtelevision,orusetheInternet,youwillseestatisticalinformation. Therearestatistics about crime, sports, education, politics, and real estate. Typically, when you read a newspaper article or watch a news program on television, you are given sample information. With this information, you may makeadecisionaboutthecorrectnessofastatement,claim,or"fact."Statisticalmethodscanhelpyoumake the"besteducatedguess." Since you will undoubtedly be given statistical information at some point in your life, you need to knowsometechniquestoanalyzetheinformationthoughtfully. Thinkaboutbuyingahouseormanaging abudget. Thinkaboutyourchosenprofession. Thefieldsofeconomics, business, psychology, education, biology, law, computer science, police science, and early childhood development require at least a basic understandingofstatistics. Includedinthischapterarethebasicideasofstatistics. Youwillalsolearnhowdataaregatheredand what"good"dataare. Additionally, youwillbeintroducedtosomeverybasicfunctionsinRtohelpyou workmoreefficiently. 1.2 Key Terms2 Instatistics,wegenerallywanttostudyapopulation. Youcanthinkofapopulationasanentirecollection ofpersons,things,orobjectsunderstudy. Tostudythelargerpopulation,weselectasample. Theideaof 1Thiscontentisavailableonlineat<http://cnx.org/content/m35069/1.2/>. 2Thiscontentisavailableonlineat<http://cnx.org/content/m35062/1.1/>. 3 4 CHAPTER1. SAMPLINGANDDATA samplingistoselectaportion(orsubset)ofthelargerpopulationandstudythatportion(thesample)to gaininformationaboutthepopulation. Dataaretheresultofsamplingfromapopulation. Because it takes a lot of time and money to examine an entire population, sampling is a very practi- cal technique. If you wished to compute the overall grade point average at your school, it would make sensetoselectasampleofstudentswhoattendtheschool. Thedatacollectedfromthesamplewouldbe thestudents’gradepointaverages. Inpresidentialelections,opinionpollsamplesof1,000to2,000people are taken. The opinion poll is supposed to represent the views of the people in the entire country. Man- ufacturersofcannedcarbonateddrinkstakesamplestodetermineifa16ouncecancontains16ouncesof carbonateddrink. From the sample data, we can calculate a statistic. A statistic is a number that is a property of the sample. For example, if we consider one math class to be a sample of the population of all math classes, then the average number of points earned by students in that one math class at the end of the term is an exampleofastatistic. Thestatisticisanestimateofapopulationparameter. Aparameterisanumberthat isapropertyofthepopulation. Sinceweconsideredallmathclassestobethepopulation,thentheaverage numberofpointsearnedperstudentoverallthemathclassesisanexampleofaparameter. One of the main concerns in the field of statistics is how accurately a statistic estimates a parameter. Theaccuracyreallydependsonhowwellthesamplerepresentsthepopulation. Thesamplemustcontain thecharacteristicsofthepopulationinordertobearepresentativesample. Weareinterestedinboththe sample statistic and the population parameter in inferential statistics. In a later chapter, we will use the samplestatistictotestthevalidityoftheestablishedpopulationparameter. A variable, notated by capital letters like X and Y, is a characteristic of interest for each person or thing in a population. Variables may be numerical or categorical. Numerical variables take on values with equal units such as weight in pounds and time in hours. Categorical variables place the person or thing into a category. If we let X equal the number of points earned by one math student at the end of a term,thenX isanumericalvariable. IfweletYbeaperson’spartyaffiliation,thenexamplesofYinclude Republican,Democrat,andIndependent. Y isacategoricalvariable. Wecoulddosomemathwithvalues of X (calculatetheaveragenumberofpointsearned,forexample),butitmakesnosensetodomathwith valuesofY(calculatinganaveragepartyaffiliationmakesnosense). Data are the actual values of the variable. They may be numbers or they may be words. Datum is a singlevalue. Twowordsthatcomeupofteninstatisticsareaverageandproportion. Ifyouweretotakethreeexams inyourmathclassesandobtainedscoresof86,75,and92,youcalculateyouraveragescorebyaddingthe threeexamscoresanddividingbythree(youraveragescorewouldbe84.3toonedecimalplace).If,inyour mathclass, thereare40studentsand22aremenand18arewomen, thentheproportionofmenstudents is 22 andtheproportionofwomenstudentsis 18. Averageandproportionarediscussedinmoredetailin 40 40 laterchapters. Example1.1 Definethekeytermsfromthefollowingstudy: Wewanttoknowtheaverageamountofmoney firstyearcollegestudentsspendatABCCollegeonschoolsuppliesthatdonotincludebooks. We randomlysurvey100firstyearstudentsatthecollege. Threeofthosestudentsspent$150, $200, and$225,respectively. Solution ThepopulationisallfirstyearstudentsattendingABCCollegethisterm. ThesamplecouldbeallstudentsenrolledinonesectionofabeginningstatisticscourseatABC College(althoughthissamplemaynotrepresenttheentirepopulation). The parameter is the average amount of money spent (excluding books) by first year college studentsatABCCollegethisterm. The statistic is the average amount of money spent (excluding books) by first year college studentsinthesample. Thevariablecouldbetheamountofmoneyspent(excludingbooks)byonefirstyearstudent.

See more

The list of books you might like

Most books are stored in the elastic cloud where traffic is expensive. For this reason, we have a limit on daily download.