
Probability and Statistics with R

710 Pages·2008·9.963 MB·English


PROBABILITY and STATISTICS WITH R

María Dolores Ugarte
Ana F. Militino
Alan T. Arnholt

CRC Press
Taylor & Francis Group
6000 Broken Sound Parkway NW, Suite 300
Boca Raton, FL 33487-2742

© 2008 by Taylor & Francis Group, LLC
CRC Press is an imprint of Taylor & Francis Group, an Informa business

No claim to original U.S. Government works
Version Date: 20140904
International Standard Book Number-13: 978-1-58488-892-5 (eBook - PDF)

This book contains information obtained from authentic and highly regarded sources. Reasonable efforts have been made to publish reliable data and information, but the author and publisher cannot assume responsibility for the validity of all materials or the consequences of their use. The authors and publishers have attempted to trace the copyright holders of all material reproduced in this publication and apologize to copyright holders if permission to publish in this form has not been obtained. If any copyright material has not been acknowledged please write and let us know so we may rectify in any future reprint.

Except as permitted under U.S. Copyright Law, no part of this book may be reprinted, reproduced, transmitted, or utilized in any form by any electronic, mechanical, or other means, now known or hereafter invented, including photocopying, microfilming, and recording, or in any information storage or retrieval system, without written permission from the publishers.

For permission to photocopy or use material electronically from this work, please access www.copyright.com (http://www.copyright.com/) or contact the Copyright Clearance Center, Inc. (CCC), 222 Rosewood Drive, Danvers, MA 01923, 978-750-8400. CCC is a not-for-profit organization that provides licenses and registration for a variety of users.
For organizations that have been granted a photocopy license by the CCC, a separate system of payment has been arranged.

Trademark Notice: Product or corporate names may be trademarks or registered trademarks, and are used only for identification and explanation without intent to infringe.

Visit the Taylor & Francis Web site at http://www.taylorandfrancis.com and the CRC Press Web site at http://www.crcpress.com

Preface

The authors would like to thank their parents
Lola: Pedro and Loli
Ana: Carmelo and Juanita
Alan: Terry and Loretta
for their unflagging support and encouragement.

The Book

Probability and Statistics with R is a work born of the love of statistics and of the advances that have been made in the field as more powerful computers can be used to perform calculations and simulations that were only dreamed of by those who came before. The S language and its derivative, R, have made the practice of statistics available to anyone with the time and inclination to do so. Teachers will enjoy the real-world examples and the thoroughly worked-out derivations. Those wanting to use this book as a reference work will appreciate the extensive treatments of data analysis using appropriate techniques, both parametric and nonparametric. Students who are visual learners will appreciate the detailed graphics and clear captions, while hands-on learners will be pleased with the abundant problems and solutions. (A solutions manual should be available from Taylor & Francis.) It is our hope that practitioners of statistics at every level will welcome the features of this book and that it will become a valuable addition to their statistics libraries.

The Purpose

Our primary intention when we undertook this project was to introduce R as a statistical package for teaching, rather than just a program for researchers. As much as possible, we have linked the statistical content to the procedures R uses, so that undergraduate students can see how consistently the two fit together.
The reader who uses S-PLUS will also find this text useful, as S-PLUS commands are included with those for R in the vast majority of the examples.

This book is intended to be practical, readable, and clear. First, it gives the reader real-world examples of how S can be used to solve problems in every topic covered, including, but not limited to, general probability in both the univariate and multivariate cases, sampling distributions and point estimation, confidence intervals, hypothesis testing, experimental design, and regression. Most of the problems are taken from genuine data sets rather than created out of thin air. Second, it is unusually thorough in its treatment of virtually every topic, covering both the traditional methods of solving problems and many nonparametric techniques. Third, the figures used to explain difficult topics are exceptionally detailed. Finally, the derivations of difficult equations are worked out thoroughly rather than being left as exercises. These features, and many others, will make this book beneficial to any reader interested in applying the S language to the world of statistics.

The Program

The S language includes both R and S-PLUS. “R can be regarded as an implementation of the S language which was developed at Bell Laboratories by Rick Becker, John Chambers, and Allan Wilks, and also forms the basis of the S-PLUS systems.” (http://cran.r-project.org/doc/manuals/R-intro.html#Preface) The current R is the result of a collaborative effort with contributions from all over the world. R was initially written by Robert Gentleman and Ross Ihaka of the Statistics Department of the University of Auckland. Since mid-1997 there has been a core group with write access to the R source (http://www.r-project.org/; click “Contributors” on the sidebar). Not only is R an outstanding statistical package, but it is offered free of charge and can be downloaded from http://www.r-project.org/.
The authors are greatly indebted to the giants of statistics and programming on whose shoulders we have stood to see what we will show the readers of this text.

The Content

The core of the material covered in this text has been used in undergraduate courses at the Public University of Navarre for the last ten years. It has been used to teach engineering (agricultural, industrial, and telecommunications) and economics majors. Some of the material in this book has also been used to teach graduate students studying agriculture, biology, engineering, and medicine.

The book starts with a brief introduction to S that includes syntax, structures, and functions. It is designed to provide an overview of how to use both R and S-PLUS so that even a neophyte will be able to solve the problems by the end of the chapter. Chapter 2, entitled “Exploring Data,” covers important graphical and numerical descriptive methods. This chapter could be used to teach a first course in statistics. The next three chapters deal with probability and random variables in a generally classical presentation that includes many examples and an extensive collection of problems to practice all that has been learned. Chapter 6 presents some important statistics and their sampling distributions. Solving the exercises will give any reader confidence that the difficult topics covered in this chapter are understood. The next four chapters encompass point estimation, confidence intervals, hypothesis testing, and a wide range of nonparametric methods including goodness-of-fit tests, categorical data analysis, nonparametric bootstrapping, and permutation tests. Chapter 11 provides an introduction to experimental design using fixed and random effects models as well as the randomized block design and the two-factor factorial design. The book ends with a chapter on simple and multiple regression analysis. The procedures from this chapter are used to solve three interesting case studies based on real data.
The Fonts

Knowing several typographical conventions will help the reader in understanding the material presented in this text. R code is displayed in a monospaced font with the > symbol in front of commands that are entered at the R prompt.

> x <- 0.28354
> round(x, 2)
[1] 0.28

The same font is used for data sets and functions, though functions are followed by (). For example, the reader would see the PASWR package written without parentheses but the round() function written with them. Throughout the text, a special symbol marks the end of solutions to examples. In the index, page numbers in BOLD are where the primary occurrences of topics are found, while those in ITALICS indicate the pages where a problem about a topic or using a given data set can be located.

The Web

This text is supported on the Internet at http://www1.appstate.edu/~arnholta/PASWR. The website has up-to-date errata, chapter scripts, and a copy of the PASWR package (which is also on CRAN) available for download.

Acknowledgments

We gratefully acknowledge the invaluable help provided by Susie Arnholt. Her willingness to apply her expertise in LaTeX and knowledge of English grammar to the production of this book is appreciated beyond words. Several people were instrumental in improving the overall readability of this text. The recommendations made by Phil Spector, the Applications Manager and Software Consultant for the Statistical Computing Facility in the Department of Statistics at the University of California at Berkeley, who reviewed this text for Taylor & Francis, were used in improving much of the original R code as well as decreasing the inevitable typographical error rate. Tomás Goicoa, a member of the Spatial Statistics Research Group at the Public University of Navarre, was of great help in preparing and checking exercises. Celes Alexander, an Appalachian State University graduate student, graciously read the entire text and found several typos. Any remaining typos or errors are entirely the fault of the authors. Thanks to our editor at Taylor & Francis, David Grubbs, for embracing and encouraging our project.
Many thanks to the Statistics and Operations Research Department at the Public University of Navarre and to the Department of Mathematical Sciences at Appalachian State University for the support they gave us in writing this text. The “You choose, you decide” initiative sponsored by Caja Navarra also provided funding for in-person collaborations. Thanks to the Universidad Nacional de Educación a Distancia, in particular the Centro Asociado de Pamplona, for allowing us to present this project under their auspices. Special thanks to José Luis Iriarte, the former Vicerector of International Relations of the Public University of Navarre, and to T. Marvin Williamsen, the former Associate Vice Chancellor for International Programs at Appalachian State University. These men were instrumental in gaining funding and support for several in-person collaborations, including a year-long visit to the Public University of Navarre for the third author and two multi-week visits to Appalachian State University for the first two authors.

Finally, to the geniuses of this age who first conceived of the idea of an excellent open-source software for statistics and those who reared the idea to adulthood, our gratitude is immeasurable. May the lighthouse of your brilliance guide travelers on the ocean of statistics for decades to come. Thank you, R Core Team.

Contents

1 A Brief Introduction to S
  1.1 The Basics of S
  1.2 Using S
  1.3 Data Sets
  1.4 Data Manipulation
    1.4.1 S Structures
    1.4.2 Mathematical Operations
    1.4.3 Vectors
    1.4.4 Sequences
    1.4.5 Reading Data
      1.4.5.1 Using scan()
      1.4.5.2 Using read.table()
      1.4.5.3 Using write()
      1.4.5.4 Using dump() and source()
    1.4.6 Logical Operators and Missing Values
    1.4.7 Matrices
    1.4.8 Vector and Matrix Operations
    1.4.9 Arrays
    1.4.10 Lists
    1.4.11 Data Frames
    1.4.12 Tables
    1.4.13 Functions Operating on Factors and Lists
  1.5 Probability Functions
  1.6 Creating Functions
  1.7 Programming Statements
  1.8 Graphs
  1.9 Problems
2 Exploring Data
  2.1 What Is Statistics?
  2.2 Data
  2.3 Displaying Qualitative Data
    2.3.1 Tables
    2.3.2 Barplots
    2.3.3 Dot Charts
    2.3.4 Pie Charts
  2.4 Displaying Quantitative Data
    2.4.1 Stem-and-Leaf Plots
    2.4.2 Strip Charts (R Only)
    2.4.3 Histograms
  2.5 Summary Measures of Location
    2.5.1 The Mean
    2.5.2 The Median
