
Using R for Introductory Statistics PDF

413 Pages·2004·3.205 MB·English

Preview Using R for Introductory Statistics

Using R for Introductory Statistics

John Verzani

CHAPMAN & HALL/CRC
A CRC Press Company
Boca Raton London New York Washington, D.C.

This edition published in the Taylor & Francis e-Library, 2005.

“To purchase your own copy of this or any of Taylor & Francis or Routledge’s collection of thousands of eBooks please go to http://www.ebookstore.tandf.co.uk/.”

Library of Congress Cataloging-in-Publication Data

Verzani, John.
Using R for introductory statistics/John Verzani.
p. cm.
Includes index.
ISBN 1-58488-4509 (alk. paper)
1. Statistics--Data processing. 2. R (Computer program language) I. Title
QA276.4.V47 2004
519.5--dc22 2004058244

This book contains information obtained from authentic and highly regarded sources. Reprinted material is quoted with permission, and sources are indicated. A wide variety of references are listed. Reasonable efforts have been made to publish reliable data and information, but the author and the publisher cannot assume responsibility for the validity of all materials or for the consequences of their use.

Neither this book nor any part may be reproduced or transmitted in any form or by any means, electronic or mechanical, including photocopying, microfilming, and recording, or by any information storage or retrieval system, without prior permission in writing from the publisher.

The consent of CRC Press does not extend to copying for general distribution, for promotion, for creating new works, or for resale. Specific permission must be obtained in writing from CRC Press for such copying.

Direct all inquiries to CRC Press, 2000 N.W. Corporate Blvd., Boca Raton, Florida 33431.

Trademark Notice: Product or corporate names may be trademarks or registered trademarks, and are used only for identification and explanation, without intent to infringe.

Visit the CRC Press Web site at www.crcpress.com

© 2005 by Chapman & Hall/CRC Press

No claim to original U.S.
Government works

ISBN 0-203-49989-1 (Master e-book ISBN)
ISBN 0-203-59470-3 (Adobe e-Reader Format)
International Standard Book Number 1-58488-4509 (Print Edition)
Library of Congress Card Number 2004058244

Contents

1 Data
2 Univariate data
3 Bivariate data
4 Multivariate data
5 Describing populations
6 Simulation
7 Confidence intervals
8 Significance tests
9 Goodness of fit
10 Linear regression
11 Analysis of variance
12 Two extensions of the linear model
A Getting, installing, and running R
B Graphical user interfaces and R
C Teaching with R
D More on graphics with R
E Programming in R
Index

Preface

What is R?

R is a computer language for statistical computing similar to the S language developed at Bell Laboratories. The R software was initially written by Ross Ihaka and Robert Gentleman in the mid 1990s. Since 1997, the R project has been organized by the R Development Core Team. R is open-source software and is part of the GNU project. R is being developed for the Unix, Macintosh, and Windows families of operating systems. The R home page (http://www.r-project.org/) contains more information about R and instructions for downloading a copy.

R is excellent software to use while first learning statistics. It provides a coherent, flexible system for data analysis that can be extended as needed. The open-source nature of R ensures its availability. R’s similarity to S allows you to migrate to the commercially supported S-Plus software if desired. Finally, despite its reputation, R is as suitable for students learning statistics as it is for researchers using statistics.

The purpose of this book

This book started as a set of notes, titled “simpleR,” that were written to fill a gap in documentation for using R in an introductory statistics class. The College of Staten Island had been paying a per-seat fee to use a commercial statistics program.
The cost of the program precluded widespread installation and curtailed accessibility. It was determined that the students would be better served if they could learn statistics with a software package that taught them good computer skills at the outset, could be installed all over campus and at home with relative ease, and was free to use. However, no suitable materials were available to accompany the class text. Hence, the basis for “simpleR”: a set of notes to accompany an in-class text.

Now, as R gains wider acceptance, for pedagogic, style, and economic reasons, there is an increase, but no abundance, in available documentation. The adoption of R as the statistical software of choice when learning statistics depends on introductory materials. This book aims to serve the needs of students in introductory applied-statistics classes that are based on precalculus skills. An emphasis is put on finding simple-looking solutions, rather than clever ones. Certainly, this material could be covered more quickly (and is in other books such as those by Dalgaard, Fox, and Venables and Ripley). The goal here is to make it as accessible to student-learners as possible.

This book aims to serve a hybrid purpose: to cover both statistical topics and the R software. Though the material stands alone, this book is also intended to be useful as an accompaniment to a standard introductory statistics book.

Description of this book

The pacing and content of this book are a bit different from those in most introductory texts. More time is spent with exploratory data analysis (EDA) than is typical, a chapter on simulation is included, and a unified approach to linear models is given. If this book is being used in a semester-long sequence, keep in mind that the early material is conceptually easier but requires that the student learn more on the computer. The pacing is not as might be expected, as time must be spent learning the software and its idiosyncrasies.
Chapters 1 through 4 take a rather leisurely approach to the material, developing the tools of data manipulation and exploration. The material is broken up so that users who wish only to analyze univariate data can safely avoid the details of data frames, lists, and model formulas covered in Chapter 4. Those wishing to cover all the topics in the book can work straight through these first four chapters.

Chapter 5 covers populations, random samples, sampling distributions, and the central limit theorem. There is no attempt to cover the background probability concepts thoroughly. We go over only what is needed in the sequel to make statistical inference.

Chapter 6 introduces simulation and the basics of defining functions. Since R is a programming language, simulations are a strong selling point for R’s use in the classroom.

Traditional topics in statistical inference are covered in Chapters 7–11. Chapters 7, 8, and 9 cover confidence intervals, significance tests, and goodness of fit. Chapters 10 and 11 cover linear models. Although this material is broken up into chapters on linear regression and analysis of variance, for the most part we use a common approach to both.

Chapter 12 covers a few extensions to the linear model to illustrate how R is used in a consistent manner with many different statistical models. The necessary background to appreciate the models is left for the reader to find.

The appendices cover some background material and have information on writing functions and producing graphics that goes beyond the scope of the rest of the text.

Typographic conventions

The book uses a few quirky typographic conventions. Variables and commands are typeset with a data typeface; functions as a.function() (with accompanying parentheses); and arguments to functions as col= (with a trailing equal sign). Help-page references have a leading question mark: ?par. Data sets are typeset like faithful.
Those that require a package to be loaded prior to usage also have the package name, such as Animals (MASS). Large blocks of commands are set off with a vertical bar:

    > hist(rnorm(100)) # draw histogram

Often the commands include a comment, as does the one above. The output is formatted to have 4 digits and 65 characters per column, and the type size is smaller, in order to get more information in a single line. This may cause minor differences if the examples are tried with different settings.

Web accompaniments

The home page for this book is

    http://www.math.csi.cuny.edu/UsingR

On this page you will find solutions to selected homework problems (a full solutions manual for instructors is available from the publisher), a list of errata, and an accompanying package containing data sets and a few functions used in the text. The UsingR package contains data sets collected from various places. Consult the help page of a data set for proper attribution. The package needs to be installed on your computer prior to usage. If your computer has an internet connection, the command

    > install.packages("UsingR")

will fetch the package from CRAN, R’s warehouse of add-on packages, and install it. The command library(UsingR) will load the package for use. If for some reason this fails, the package can be retrieved from this book’s home page with the commands

    > where="http://www.math.csi.cuny.edu/UsingR"
    > install.packages("UsingR",contriburl=where)

Finally, if that fails, the package can be downloaded from the home page and installed manually as described in Chapter 1.

Using R

The R software is obtained from the Comprehensive R Archive Network (CRAN), which may be reached from the main R web site http://www.r-project.org/. Some basic details for installation appear in Appendix A and more detail is on the CRAN website. This book was written to reflect the changes introduced by version 2.0.0 of R.
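As a small illustration of the version numbering discussed in this section, the following sketch reports the version of the running R installation and compares it with 2.0.0, the version this book targets. It relies only on base R (R.version.string and getRversion()); the variable names current, parts, and is_recent_enough are chosen here for illustration and do not come from the text.

```r
# Print a human-readable description of the running R version.
print(R.version.string)

# getRversion() returns a version object that supports comparison operators.
current <- getRversion()

# Split the version into its three numeric components
# (first, second, and third number of the version).
parts <- as.integer(strsplit(as.character(current), ".", fixed = TRUE)[[1]])
names(parts) <- c("first", "second", "third")
print(parts)

# Check whether this installation is at least version 2.0.0,
# the version the book was written against (illustrative check only).
is_recent_enough <- current >= "2.0.0"
print(is_recent_enough)
```

If the comparison yields FALSE, some examples in the text may behave differently from what is printed, since they assume the changes introduced in version 2.0.0.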
R has approximately two new major releases per year (the second number in the version number). Despite the addition of numerous improvements with each new version, the maintainers of R do a very careful job with the upgrades to R. Minor bug fixes appear in maintenance versions (the third number). It is recommended that you upgrade your installation to keep pace with these changes to R, although these new releases may affect some of the details given in this text.

Acknowledgments

The author has many people to thank for this project. First, the numerous contributors to the R software and especially the core members for their seemingly tireless efforts in producing this excellent software. Next, the editorial staff at Chapman & Hall/CRC was
