Use R! Advisors: RobertGentleman KurtHornik GiovanniParmigiani Forothertitlespublishedinthisseries,goto http://www.springer.com/series/6991 · Richard M. Heiberger Erich Neuwirth R Through Excel A Spreadsheet Interface for Statistics, Data Analysis, and Graphics 123 RichardM.Heiberger ErichNeuwirth DepartmentofStatistics UniversityofVienna TempleUniversity Fakulta¨tfu¨rInformatik PhiladelphiaPA19122 Dr.-Karl-Lueger-Ring1 USA A-1010Vienna [email protected] Austria [email protected] SeriesEditors RobertGentleman KurtHornik PrograminComputationalBiology Wirtschaftsuniversita¨tWien,Vienna DivisionofPublicHealthSciences Austria FredHutchinsonCancerResearchCenter 1100FairviewAvenue N.M2-B876 Seattle,Washington98109 USA GiovanniParmigiani JohnsHopkinsUniversity Baltimore,MD USA ISBN978-1-4419-0051-7 e-ISBN978-1-4419-0052-4 DOI10.1007/978-1-4419-0052-4 SpringerDordrechtHeidelbergLondonNewYork LibraryofCongressControlNumber:2009929743 (cid:2)c SpringerScience+BusinessMedia,LLC2009 Allrightsreserved.Thisworkmaynotbetranslatedorcopiedinwholeorinpartwithoutthewritten permission of the publisher (Springer Science+Business Media, LLC, 233 Spring Street, New York, NY10013,USA),exceptforbriefexcerptsinconnectionwithreviewsorscholarlyanalysis.Usein connection with any form of information storage and retrieval, electronic adaptation, computer software,orbysimilarordissimilarmethodologynowknownorhereafterdevelopedisforbidden. The use in this publication of trade names, trademarks, service marks, and similar terms, even if they are not identified as such, is not to be taken as an expression of opinion as to whether or not theyaresubjecttoproprietaryrights. Printedonacid-freepaper SpringerispartofSpringerScience+BusinessMedia(www.springer.com) Let’snotkidourselves:Themostwidelyused pieceofsoftwareforstatisticsisExcel. BrianD.Ripley “StatisticalMethodsNeedSoftware:AView ofStatistical Computing.” Openinglecture RoyalStatistical Society2002,Plymouth (September2002). Preface Abstract MS Excel, the most widely available spreadsheet on MS Windows ma- chines,isoftenusedfordatacollection,manipulation,andstorage.Elementaryand medium-complexitymathematicalandstatisticalfunctionsareincludedwithExcel. More advanced and highly reliable statistical analysis in Excel requires an add-in package. R is one of the best statistics programs available. It is an extensible sys- tem of software facilities for data manipulation, statistical analysis, and graphical display. With RExcel, the entire R environment (including more than a thousand contributedpackages)canbetreatedasanextensionofExcel. This book is a supplementary text to any introductory course in statistics. The book supports the instructor by giving students step-by-step screenshots showing accesstostate-of-the-artstatisticalcomputationsinRdirectlyfromthemenubarin Excel. Thebookcanalsobeusedasacomputationalintroductionbydataanalystswho alreadyhavebasicstatisticalskills. Risaprogramforstatisticalanalysisandgraphicaldisplayofdata. Risoneofthebestprogramsforstatisticalanalysisandgraphicaldisplayofdata. It is maintained and distributed by an international team of statisticians and com- puter scientists working in universities and industry. R is one of the major tools usedinstatisticalresearchandinapplicationsofstatisticsinscience,socialscience, economics,andbusiness.Rusedinbothacademiaandindustry. Amongotherthings,Rhas • datahandlingandstoragefacilities. • asuiteofoperatorsforcalculationsonarrays,inparticularmatrices. • alarge,coherent,integratedcollectionofintermediatetoolsfordataanalysis. vii viii Preface • graphical facilities for data analysis and display either directly at the computer oronhardcopy. • awell-developed,powerful,andeffectiveprogramminglanguage(calledS)that includes conditionals, loops, user-defined functions (including recursive func- tions), functions for creating complex data structures, and input and output fa- cilities.(Indeed,mostofthesystem-suppliedfunctionsarethemselveswrittenin theSlanguage.) • Alargeselectionofdemonstrationdatasetsusedintheillustrationofmanysta- tisticalmethods. Excelisthemostwidelyusedspreadsheetprogram. MicrosoftExcel(cid:2)R [Microsoft,2008a]isthemostwidelyavailablespreadsheet.En- tering data, cleaning data, and simple data processing (including simple statistics) are very easily done on spreadsheets. As a consequence, much statistical data is availableas,orevencreatedin,Excelworksheets. Spreadsheets have a different paradigm for representing mathematical formu- las than statistical (and mathematical) programming languages. The spreadsheet paradigm is much more visual and action-oriented than the functional or proce- dural paradigm of statistical programming languages. This problem of different paradigms can be overcome. In this book, we illustrate some of the ways the two paradigmscanbemadetoworkwitheachother.[NeuwirthandArganbright,2004] discussindetailhowtorepresentthedevelopmentandstructureofspreadsheetsin printedformandhowspreadsheetscanbeusedtodoseriousmathematicalwork. Exceliseasytouse,butstatisticianshavefoundithassomedeficiencies inthe area of numerical precision. Statistical software is usually perceived as difficult to learn.Thiscanbeamajorobstacleforpotentialusersofadvancedstatisticalmeth- ods.Asthisbookshows,usingRwithinExcelallowsaccesstoboththeeasy-to-use toolsfordataentryandmanipulationavailableinExcelandthepowerandprecision oftheadvancedstatisticalmethodsavailableviaR. RExcelisaninterfaceprogramthatusesRasanadd-in toExcel. RExcel is an add-in to Excel on MS Windows [Microsoft,2008b] machines that allows the use of R as a “helper application” for Excel. Data can be transferred betweenExcelandR(inbothdirections),andExcelcancallRfunctionstoperform calculationsandthentransfertheresultstoExcel. RExceloffersthefollowingfeatures: • allows the use of R functions in Excel cell formulas, effectively controlling R calculationsfromExcel’sautomaticrecalculationmechanism. • connectsRdataframesandExceldatalists. Preface ix • handlesmissingdata. • allowsthecreationofastandaloneRExcelapplicationthathidesRalmostcom- pletelyfromtheuserandusesExcelasthemaininterfacetoR.(Instructionsare giveninRExcel’shelpfile.) • if R Commander [Fox et al., 2007] is available, RExcel optionally places the RcmdrmenusonanExcelmenubar.AnymenuitemintegratedintotheRcmdr menuusinganRcmdrpluginwillalsobeavailableontheRExcelRcmdrmenu. • works with the statconn (D)COM server (previously called R(D)COM server) server,turningRintoan(invisible)backgroundserverforExcel. • works with the rcom package, turning R into a (visible) foreground server for Excel.Usingthisconfiguration,theusercanaccessthesameinstanceofReither fromExcelorfromthecommandlineinanRGUIConsolewindow. • supportsRprocessesrunningunderthecontrolofRServerManager.Rserveris attached to Excel from a server pool. Different instances of Excel (running on different machines) may access the same R process with the same data. We do notusethiscapabilityinthisbook. TheRExcelinterfaceisdescribedin[BaierandNeuwirth,2007].RExcelisbuilt on the the rcom and statconn (D)COM (previously called R(D)COM) packages, which we use for communication between R and the Microsoft Office software [Baier,2007].BasicinformationontheinstallationofR,RExcel,andRcmdrisin AppendixA.FullinformationonRExcelisavailableathttp://rcom.univie.ac.at/. RcmdrisanRpackagethatprovidesGUImenuaccesstoR. Rcmdr (R Commander) is a platform-independent menu interface to R. The menu itemsimplementedbyRcmdrareprimarilydesignedforintroductorycourses.They canbeextendedbytheRcmdrpluginfacilitytoprovideaclickablegraphicaluser interface(GUI)toanystatisticalprocedurecodedinR. Audience Therearetwoaudiencesforthisbook: 1. studentslearningstatistics. 2. peopleanalyzingdata. x Preface Students Introductorycoursesinstatistics,andintroductorystatisticscomponentsofcourses inallothersubjects,requireaccesstoasoftwaresystemforthecollectionandanal- ysisofdata. This book is a supplementary text for an introductory course in statistics. We includeexamplesofallthestandarddataanalysistechniquesthatareintroducedin such courses. We also include some of the elementary probability examples from thosecourses. Manyexamplesarestructuredparalleltosimilarpresentationsforothersoftware thatappearsinsuchtexts.Theoutlineforsuchexamplesisasfollows: 1. ReaddataintoExcelfromatextbookCD. 2. PutdataintoRfromtheRExcelmenu. 3. ConstructstandardanalysistablesandgraphsfromtheRcmdrmenuinstalledin theExcelmenubar. 4. Cutandpastethetablesandgraphsintoadocumentdescribingtheresultsofthe analysis. Wehavetwoworksheets,usingExcelandRonly,thatareusedtoillustratethe fundamental concepts of hypothesis testing, the construction and interpretation of confidence intervals, and the ideas behind least-squares fitting. We have several otherworksheetsthatareusedtoillustratedatatransferbetweenRandExcelandto illustrateadditionalstatisticaltechniques. DataAnalysts SeetheStudentssectionaboveforanintroductiontotheuseofRExcelandRcmdr toaccessmanyoftheanalysisandgraphicalcapabilitiesofR.Itispossibletowrite additionalmenuitemstoaccessspecializedfunctionswritteninRdirectlyfromthe RcmdrmenuinstalledintheExcelmenubar.SeetheRExcelandRcmdrdocumen- tationfordetails. UpdatesandAdditionalInformation RExcelhasaWikiathttp://rcom.univie.ac.at/. Update material for this book will be available from the book’s website at Springerhttp://www.springer.com/978-1-4419-0051-7. AvideoonRExcel,includingboththematerialinthisbookandadditionalma- terial,isavailableathttp://rcom.univie.ac.at/RExcelDemo/. Preface xi Acknowledgments Firstandforemost,wehavetothankThomasBaier,withoutwhosework[thercom package and the statconn (D)COM server (previously called R(D)COM server)] RExcel and the book built on it would not have been possible. It should be noted that his design, now more than 10 years old, has not needed any change—a very uncommoneventinthesoftwareworld. Christian Ritter has been the premier user of RExcel, and he has contributed manyideastothedesignofthesystem. WewishtoacknowledgeourstudentsatTempleUniversityandtheUniversityof Vienna who have used preliminary versions of this book and the RExcel software in class and made many helpful suggestions that have been incorporated into this version.WewishtothankBurtHollandatTempleUniversityforteachingwithan earlier version of the book and software. Our early experiences using RExcel in teachingaredescribedin[Baieretal.,2006]. WewishtoacknowledgeJohnFoxofMcMasterUniversitybothfortheRcmdr menu system [Foxetal.,2007] and especially for his willingness to incorporate changes into his system that were needed to make Rcmdr and RExcel cooperate. Rcmdrwasdesignedasaplatform-independentmenusystem.Wehavemovedthe RcmdrmenutotheExceltoolbaraspartofourintegrationofExcelandR. WewishtothankRCorefortheRprogram[RDevelopmentCoreTeam,2008]. PhiladelphiaandVienna RichardM.Heiberger July2009 ErichNeuwirth