Statistical Analysis with R
Beginner's Guide
Take control of your data and produce superior statistical
analyses with R
John M. Quick
BIRMINGHAM - MUMBAI
Copyright © 2010 Packt Publishing
All rights reserved. No part of this book may be reproduced, stored in a retrieval system,
or transmitted in any form or by any means, without the prior written permission of the
publisher, except in the case of brief quotations embedded in critical articles or reviews.
Every effort has been made in the preparation of this book to ensure the accuracy of the
information presented. However, the information contained in this book is sold without
warranty, either express or implied. Neither the author nor Packt Publishing, nor its dealers
and distributors, will be held liable for any damages caused or alleged to be caused directly
or indirectly by this book.
Packt Publishing has endeavored to provide trademark information about all of the
companies and products mentioned in this book by the appropriate use of capitals.
However, Packt Publishing cannot guarantee the accuracy of this information.
First published: October 2010
Production Reference: 1191010
Published by Packt Publishing Ltd.
32 Lincoln Road
Olton
Birmingham, B27 6PA, UK.
ISBN 978-1-849512-08-4
www.packtpub.com
Cover Image by John M. Quick (john@johnmquick.com)
Credits
Author: John M. Quick
Reviewers: Ajay Ohri, Joshua Wiley
Acquisition Editor: Douglas Paterson
Development Editor: Meeta Rajani
Technical Editor: Vanjeet D'souza
Indexer: Tejal Daruwale
Editorial Team Leader: Akshara Aware
Project Team Leader: Priya Mukherji
Project Coordinator: Jovita Pinto
Proofreaders: Aaron Nash, Chris Smith
Graphics: Nilesh Mohite
Production Coordinator: Aparna Bhagat
Cover Work: Aparna Bhagat
About the Author
John M. Quick is an Educational Technology Ph.D. student at Arizona State University who
is interested in the design, research, and use of educational innovations. Currently, his work
focuses on mixed-reality systems, interactive media, and innovation adoption. In addition,
he has recently published multiple gaming applications for the iPhone and iPad. John's blog,
High-Technically Correct, which covers various topics in technology, is available online at
http://www.johnmquick.com.
I give thanks to the R Project and its user community for offering the
world superior open-source statistical software. I also thank Dr. Roy Levy
for introducing me to, and encouraging me to share my knowledge of, R.
Lastly, I would like to thank my parents for their lifelong support and Zarraz
for the companionship and insights that she offered to me throughout the
authoring of this book.
About the Reviewers
Ajay Ohri has been working in the field of analytics since 2004, when it was still a nascent
industry in India. He has worked with the top two Indian outsourcers listed on the NYSE,
and with Citigroup on cross-sell analytics, where he helped sell an extra 50,000 credit cards.
He was one of the very first independent data mining consultants in India, working on
analytics products and domestic Indian market analytics. He regularly writes on analytics
topics on his website, www.decisionstats.com, and is currently working on open source
analytical tools such as R and analytical software such as SAS.
Joshua Wiley has implemented R in several laboratories on multiple campuses of the
University of California system to run statistical analyses and produce high-quality graphics.
He also uses it for data processing in descriptive and inferential statistics. He is currently
working towards his Ph.D. at UCLA, where he researches Health Psychology. In addition to
his own work with R, Mr. Wiley has led tutorials for other psychology researchers on using R,
and is an active member of the R-help mailing list.
Table of Contents
Preface
Chapter 1: Uncovering the Strategist's Data Analysis Tool
What is R?
What are the benefits of using R?
Why should I use R?
Why should I read this book?
What topics are covered in this book?
Chapter 2 – Preparing R for Battle
Chapter 3 – Exploring the Mysterious Data Analysis Tool
Chapter 4 – Collecting and Organizing Information
Chapter 5 – Assessing the Situation
Chapter 6 – Planning the Attack
Chapter 7 – Organizing the Battle Plans
Chapter 8 – Briefing the Emperor
Chapter 9 – Briefing the Generals
Chapter 10 – Becoming a Master Strategist
Summary
Chapter 2: Preparing R for Battle
Time for action – downloading and installing R
Example: R 2.11.1 Mac OS X 10.5+ installation wizard demonstration
Time for action – issuing your first R command
Time for action – setting your R working directory
Summary
Chapter 3: Exploring the Mysterious Data Analysis Tool
Deciphering Zhuge Liang's magic square
Time for action – solving the first 4x4 magic square
Lines
Comments
Calculations
Output
Visualizing the R console
Summary
Chapter 4: Collecting and Organizing Information
Time for action – importing external data
read.csv(file)
comma-separated values (csv) files
Time for action – creating and calling variables
Time for action – accessing data within variables
variable$column notation
attach(variable) function
variable[row, column] notation
Time for action – manipulating variable data
Performing a calculation on an entire dataset
Performing a calculation on a row, column, or cell
Using variable data in function arguments
Saving a variable calculation into a new variable
Time for action – managing the R workspace
Listing the contents of the R workspace
Saving the contents of the R workspace
Loading the contents of the R workspace
Quitting R
Distinguishing between the R console and workspace
Saving the R console
Summary
Chapter 5: Assessing the Situation
Time for action – making an initial inference from our data
Examining our data
Time for action – creating a subset from a large dataset
Multi-argument functions
Variable-argument functions
Equivalency operators
subset(data, ...)
Time for action – deriving summary statistics
Means
Standard deviations
Ranges
summary(object)
Why use summary statistics?
Time for action – quantifying categorical variables
as.numeric(data)
Overwriting variables
Time for action – correlating variables
Interpreting correlations
cor(x, y)
cor(data)
NA values
Regression
Time for action – modelling with simple linear regression
lm(formula, data)
Linear model output
Linear model summary
Interpreting a linear regression model
Time for action – modelling with multiple linear regression
Interpreting the summary output
Explaining model differences
Time for action – modelling interactions
Interpreting interaction variables
Time for action – comparing and choosing models
Interpreting the model summaries
Interpreting the ANOVA results
anova(object, ...)
Summary
Chapter 6: Planning the Attack
Review of models
Head to head
Surround
Ambush
Fire
Predicting outcomes using regression models
Rating
Successfully executed
Number of Wei soldiers
Duration of battle
A word about assumptions
Time for action – calculating outcomes from regression models
Time for action – creating custom functions
function()
Extended lines
Description: R is a data analysis tool, graphical environment, and programming language. Even if you have no prior experience with programming or statistical software, this book will help you quickly become a knowledgeable user of R. Now is the time to take control of your data and start producing superior statistical ana