Statistics and Computing
Series Editors: J. Chambers, W. Eddy, W. Härdle, S. Sheather, L. Tierney

Gentle: Numerical Linear Algebra for Applications in Statistics.
Gentle: Random Number Generation and Monte Carlo Methods.
Härdle/Klinke/Turlach: XploRe: An Interactive Statistical Computing Environment.
Krause/Olson: The Basics of S-PLUS, 3rd Edition.
Lange: Numerical Analysis for Statisticians.
Loader: Local Regression and Likelihood.
Ó Ruanaidh/Fitzgerald: Numerical Bayesian Methods Applied to Signal Processing.
Pannatier: VARIOWIN: Software for Spatial Data Analysis in 2D.
Pinheiro/Bates: Mixed-Effects Models in S and S-PLUS.
Venables/Ripley: Modern Applied Statistics with S-PLUS, 3rd Edition.
Venables/Ripley: S Programming.
Wilkinson: The Grammar of Graphics.

Andreas Krause
Melvin Olson

The Basics of S-PLUS
Third Edition
With 94 Illustrations

Andreas Krause
Novartis Pharma AG
Biostatistics
4002 Basel
Switzerland
[email protected]

Melvin Olson
Novartis Pharma AG
Biostatistics
4002 Basel
Switzerland
[email protected]

Series Editors:

J. Chambers
Bell Labs, Lucent Technologies
600 Mountain Ave.
Murray Hill, NJ 07974
USA

W. Eddy
Department of Statistics
Carnegie Mellon University
Pittsburgh, PA 15213
USA

W. Härdle
Institut für Statistik und Ökonometrie
Humboldt-Universität zu Berlin
Spandauer Str. 1
D-10178 Berlin
Germany

S. Sheather
Australian Graduate School of Management
University of New South Wales
Sydney, NSW 2052
Australia

L. Tierney
School of Statistics
University of Minnesota
Vincent Hall
Minneapolis, MN 55455
USA

Library of Congress Cataloging-in-Publication Data
Krause, Andreas.
The basics of S-PLUS / Andreas Krause, Melvin Olson.—3rd ed.
p. cm.—(Statistics and computing)
Rev. ed. of: The basics of S and S-PLUS. 2nd ed. c2000.
Includes bibliographical references and index.
ISBN 0-387-95456-2 (pbk.: alk. paper)
1. S-PLUS. 2. Mathematical statistics—Data processing. I. Olson, Melvin. II. Krause, Andreas. Basics of S and S-PLUS. III. Title. IV. Series.
QA276.4.K73 2002
519.5′0285′53—dc21    2002021152

ISBN 0-387-95456-2    Printed on acid-free paper.

© 2002, 2000, 1997 Springer-Verlag New York, Inc.
All rights reserved. This work may not be translated or copied in whole or in part without the written permission of the publisher (Springer-Verlag New York, Inc., 175 Fifth Avenue, New York, NY 10010, USA), except for brief excerpts in connection with reviews or scholarly analysis. Use in connection with any form of information storage and retrieval, electronic adaptation, computer software, or by similar or dissimilar methodology now known or hereafter developed is forbidden.
The use in this publication of trade names, trademarks, service marks, and similar terms, even if they are not identified as such, is not to be taken as an expression of opinion as to whether or not they are subject to proprietary rights.

Printed in the United States of America.

9 8 7 6 5 4 3 2 1    SPIN 10868549

www.springer-ny.com

Springer-Verlag New York Berlin Heidelberg
A member of BertelsmannSpringer Science+Business Media GmbH

Preface

Only five years have passed since we published the first edition, and here is the third edition already. We are very grateful to all our readers, in particular those sending us suggestions, comments, and any other kind of feedback.

S-Plus (and R) have gained much popularity in the past few years. This popularity is documented by the many books that appeared recently: “Analyzing Medical Data Using S and S-Plus,” by Everitt and Rabe-Hesketh; “Regression Modeling Strategies,” by Harrell; “Applied Statistics in the Pharmaceutical Industry,” edited by Millard and Krause; “Mixed Effects Models in S and S-Plus,” by Pinheiro and Bates; “Modeling Survival Data: Extending the Cox Model,” by Therneau and Grambsch; and the fourth edition of “Modern Applied Statistics with S-Plus” as well as the second edition of “S Programming,” both by Venables and Ripley.

The second edition of our book was based on S-Plus for Windows 2000 and S-Plus for UNIX 5.1. The former is based on the S Language Version 3, whereas the latter already incorporates S Version 4. With the release of S-Plus Version 6 for UNIX and Windows, both systems are based on the same S Version and even on the same code base.
This book was written using S-Plus for Windows 6.0 and S-Plus for Linux 6.0 (on SuSE Linux).

We have updated the manuscript to reflect changes such as the introduction of a graphical user interface (GUI) for S-Plus for UNIX, new import and export formats for data, and new graphics formats. In particular, the GUI under Windows and the new UNIX GUI are covered side-by-side. More emphasis was put on Trellis-type graphics, and a new section on the details of factor objects was added. The R system is treated in more detail in a new chapter, although on a basic level it is not much different from S-Plus, and this book serves well as an introduction to R.

The basic structure of the book remains. If you have never worked with S-Plus before and want to get a first impression, take a look at “A First Session,” “A Second Session,” and “Exploring Data” to get an idea of the philosophy of the programming language. If you intend to work with point-and-click graphical user interfaces, take a look at “Graphical User Interfaces.” We are convinced though that if you want to work permanently with S-Plus, you will need to use the command line and appreciate it soon.

The book originates from lectures such that each chapter can serve as the basis for a 90-minute lecture. The exercises reinforce the material covered and sometimes point out a few more details. It should be interesting to work out a solution to an exercise before comparing it to the solution given here. There’s never a single solution.

Finally, we would like to thank all those who have directly or indirectly contributed to this edition. David Smith at Insightful coordinated a very thorough alpha and beta testing period for S-Plus 6 and provided much appreciated support. Maria Beth Silkey (Predict AG, Reinach, Switzerland), Martin Mächler (ETH Zürich), and Tony Rossini (University of Washington, Seattle) read parts of the manuscript and provided very useful comments. It continues to be a pleasure to work with John Kimmel as editor and the professional team at Springer-Verlag.
If you would like to provide any kind of comment, you are welcome to send an E-mail to [email protected]/or [email protected] to provide general support for S-Plus or R.

Finally, a Web page is set up to accompany this book:
http://www2.active.ch/~krause.a/doc/splus-book/

Basel, Switzerland
April 2002
Andreas Krause and Melvin Olson

Contents

Preface  v
Figures  xv
Tables  xix

1 Introduction  1
  1.1 The History of S and S-Plus  2
  1.2 S-Plus on Different Operating Systems  4
  1.3 Notational Conventions  6

2 Graphical User Interface  7
  2.1 Introduction  7
  2.2 System Overview  8
    2.2.1 Using a Mouse  9
    2.2.2 Object Explorer  9
    2.2.3 Commands Window  9
    2.2.4 Toolbars  10
    2.2.5 Graph Sheets  10
    2.2.6 Script Window  10
  2.3 Getting Started with the Interface  11
    2.3.1 Importing Data  11
    2.3.2 Graphs  11
    2.3.3 Data and Statistics  13
    2.3.4 Customizing the Toolbars  13
    2.3.5 Chapters  14
  2.4 Detailed Use of the GUI Interface  16
  2.5 Object Explorer  16
  2.6 Help  18
  2.7 Data Export  19
  2.8 Working Directory  21
  2.9 Data Import  22
  2.10 Data Summaries  25
  2.11 Graphs  27
  2.12 Trellis Graphs  34
  2.13 Linear Regression  36
  2.14 PowerPoint (Windows Only)  41
  2.15 Excel (Windows Only)  41
  2.16 Script Window  43
  2.17 UNIX/Linux GUI  45
  2.18 Summary  54
  2.19 Exercises  55
  2.20 Solutions  56

3 A First Session  71
  3.1 General Information  71
    3.1.1 Starting and Quitting  72
    3.1.2 The Help System  72
    3.1.3 Before Beginning  73
  3.2 Simple Structures  74
    3.2.1 Arithmetic Operators  74
    3.2.2 Assignments  75
    3.2.3 The Concatenate Command: c  77
    3.2.4 The Sequence Command: seq  78
    3.2.5 The Replicate Command: rep  79
  3.3 Mathematical Operations  80
  3.4 Use of Brackets  81
  3.5 Logical Values  83
  3.6 Review  85
  3.7 Exercises  89
  3.8 Solutions  90

4 A Second Session  93
  4.1 Constructing and Manipulating Data  93
    4.1.1 Matrices  94
    4.1.2 Arrays  99
    4.1.3 Data Frames  102
    4.1.4 Lists  104
  4.2 Introduction to Functions  106
  4.3 Introduction to Missing Values  106
  4.4 Merging Data  108
  4.5 Putting It All Together  108
  4.6 Exercises  111
  4.7 Solutions  113

5 Graphics  119
  5.1 Basic Graphics Commands  119
  5.2 Graphics Devices  120
    5.2.1 Working with Multiple Graphics Devices  122
  5.3 Plotting Data  122
    5.3.1 The plot Command  123
    5.3.2 Modifying the Data Display  124
    5.3.3 Modifying Figure Elements  124
  5.4 Adding Elements to Existing Plots  125
    5.4.1 Functions to Add Elements to Graphs  127
    5.4.2 More About abline  128
    5.4.3 More on Adding Axes  129
    5.4.4 Adding Text to Graphs  131
  5.5 Setting Options  131
  5.6 Figure Layouts  133
    5.6.1 Layouts Using Trellis Graphs  134
    5.6.2 Matrices of Graphs  134
    5.6.3 Multiple-Screen Graphs  135
    5.6.4 Figures of Specified Size  136
  5.7 Exercises  138
  5.8 Solutions  139

6 Trellis Graphics  145
  6.1 An Example  146
  6.2 Trellis Basics  148
    6.2.1 Trellis Syntax  148
    6.2.2 Trellis Functions  149
    6.2.3 Displaying and Storing Graphs  149
  6.3 Output Devices  150
  6.4 Customizing Trellis Graphs  152
    6.4.1 Setting Options  152
    6.4.2 Arranging the Layout of a Trellis Graph  153
    6.4.3 Layout  153
    6.4.4 Ordering of Graphs  155
    6.4.5 Changing Graph Elements  156
    6.4.6 Modifying Panel Strips  156
    6.4.7 Arranging Several Graphs on a Single Page  157
    6.4.8 Updating Existing Trellis Graphs  158
    6.4.9 Writing Panel Functions  159
  6.5 Further Hints  162
    6.5.1 Graphing Individual Profiles  162
    6.5.2 Preparing Data to Use for Trellis  163
    6.5.3 The subset Option  164
    6.5.4 The key Option  164
    6.5.5 The subscripts Option in Panel Functions  165
  6.6 Exercises  166
  6.7 Solutions  168

7 Exploring Data  179
  7.1 Descriptive Data Exploration  179
  7.2 Graphical Exploration  190
    7.2.1 Interactive Dynamic Graphics  205
    7.2.2 Old-Style Graphics  205
  7.3 Distributions and Related Functions  207
  7.4 Confirmatory Statistics and Hypothesis Testing  212
  7.5 Missing and Infinite Values  217
    7.5.1 Testing for Missing Values  218
    7.5.2 Supplying Data with Missing Values to Functions  218
    7.5.3 Missing Values in Graphs  219
    7.5.4 Infinite Values  220
  7.6 Exercises  222
  7.7 Solutions  225

8 Statistical Modeling  237
  8.1 Introductory Examples  237
    8.1.1 Regression  237
    8.1.2 Regression Diagnostics  239
  8.2 Statistical Models  241
  8.3 Model Syntax  242
  8.4 Regression  243
    8.4.1 Linear Regression and Modeling Techniques  244
    8.4.2 ANOVA  247
    8.4.3 Logistic Regression  249
    8.4.4 Survival Data Analysis  251
    8.4.5 Endnote  253
  8.5 Exercises  254
  8.6 Solutions  257

9 Programming  271
  9.1 Lists  271
    9.1.1 Adding and Deleting List Elements  273
    9.1.2 Naming List Elements  274
    9.1.3 Applying the Same Function to List Elements  276
    9.1.4 Unlisting a List  280
    9.1.5 Generating a List by Using split  280
  9.2 Writing Functions  280
    9.2.1 Documenting Functions  283