Mastering Scientific Computing with R

Employ professional quantitative methods to answer scientific questions with a powerful open source data analysis environment

Paul Gerrard
Radia M. Johnson

BIRMINGHAM - MUMBAI

Mastering Scientific Computing with R

Copyright © 2015 Packt Publishing

All rights reserved. No part of this book may be reproduced, stored in a retrieval system, or transmitted in any form or by any means, without the prior written permission of the publisher, except in the case of brief quotations embedded in critical articles or reviews.

Every effort has been made in the preparation of this book to ensure the accuracy of the information presented. However, the information contained in this book is sold without warranty, either express or implied. Neither the authors, nor Packt Publishing, and its dealers and distributors will be held liable for any damages caused or alleged to be caused directly or indirectly by this book.

Packt Publishing has endeavored to provide trademark information about all of the companies and products mentioned in this book by the appropriate use of capitals. However, Packt Publishing cannot guarantee the accuracy of this information.

First published: January 2015

Production reference: 1270115

Published by Packt Publishing Ltd.
Livery Place
35 Livery Street
Birmingham B3 2PB, UK.

ISBN 978-1-78355-525-3

www.packtpub.com

Cover image by Jason Dupuis Mayer ([email protected])

Credits

Authors: Paul Gerrard, Radia M. Johnson
Reviewers: Laurent Drouet, Ratanlal Mahanta, Mzabalazo Z. Ngwenya, Donato Teutonico
Commissioning Editor: Kartikey Pandey
Acquisition Editor: Greg Wild
Content Development Editor: Akshay Nair
Technical Editors: Rosmy George, Ankita Thakur
Copy Editors: Shivangi Chaturvedi, Pranjali Chury, Puja Lalwani, Adithi Shetty
Project Coordinator: Mary Alex
Proofreaders: Simran Bhogal, Martin Diver, Ameesha Green, Paul Hindle, Bernadette Watkins
Indexer: Priya Subramani
Graphics: Sheetal Aute, Disha Haria, Abhinash Sahu
Production Coordinator: Conidon Miranda
Cover Work: Conidon Miranda

About the Authors

Paul Gerrard is a physician and healthcare researcher who is based out of Portland, Maine, where he currently serves as the medical director of the cardiopulmonary rehabilitation program at New England Rehabilitation Hospital of Portland. He studied business economics in college. After completing medical school, he did a residency in physical medicine and rehabilitation at Harvard Medical School and Spaulding Rehabilitation Hospital, where he served as chief resident and stayed on as faculty at Harvard before moving to Portland. He continues to collaborate on research projects with researchers at other academic institutions within the Boston area and around the country. He has published and presented research on a range of topics, including traumatic brain injury, burn rehabilitation, health outcomes, and the epidemiology of disabling medical conditions.

I would like to thank my beautiful wife, Deirdre, and my son, Patrick. My work on this book is dedicated to the loving memory of Fiona.

Radia M. Johnson has a doctorate degree in immunology and currently works as a research scientist at the Institute for Research in Immunology and Cancer at the Université de Montréal, where she uses genomics and bioinformatics to identify and characterize the molecular changes that contribute to cancer development. She routinely uses R and other computer programming languages to analyze large data sets from ongoing collaborative projects.
Since obtaining her PhD at the University of Toronto, she has also worked as a research associate at the University of Cambridge in Hematology, where she gained experience using systems biology to study blood cancer.

I would like to thank Dr. Charlie Massie for teaching me to love programming in R and Dr. Phil Kousis for all his support through the years. You are both excellent mentors and wonderful friends!

About the Reviewers

Laurent Drouet holds a PhD in economics and social sciences from the University of Geneva, Switzerland, and a master's degree in applied mathematics from the Institute of Applied Mathematics of Angers, France. He was a postdoctoral research fellow at the Research Lab of Economics and Environmental Management at the Ecole Polytechnique Federale de Lausanne (EPFL), Switzerland, and a researcher at the Public Research Center Tudor, Luxembourg. He is currently a senior researcher at Fondazione Eni Enrico Mattei (FEEM) and a research affiliate at Centro Euro-Mediterraneo sui Cambiamenti Climatici (CMCC), Italy. His main research is related to integrated assessment modeling and energy modeling. For more than a decade, he has designed scientific tools to perform data analysis for this type of modeling. He has also built optimization frameworks to couple models of many kinds (such as climate models, air quality models, and economy models). He created and developed the bottom-up techno-economic energy model ETEM to study optimal energy policies at urban or national levels.

I want to thank my wife for her support every day both in my private life and professional life.

Ratanlal Mahanta holds an MSc in computational finance. He is currently working at GPSK Investment Group as a senior quantitative analyst. He has 4 years of experience in quantitative trading and strategy development for sell-side and risk consulting firms. He is an expert in high-frequency and algorithmic trading.
He has expertise in these areas: quantitative trading (FX, equities, futures and options, and engineering on derivatives); algorithms: partial differential equations, stochastic differential equations, the finite difference method, Monte Carlo, and machine learning; code: R programming, C++, MATLAB, HPC, and scientific computing; data analysis: Big Data analytics [EOD to TBT], Bloomberg, Quandl, and Quantopian; and strategies: vol-arbitrage, vanilla and exotic options modeling, trend following, mean reversion, co-integration, Monte Carlo simulations, Value at Risk, stress testing, buy-side trading strategies with a high Sharpe ratio, credit risk modeling, and credit rating. He has reviewed Mastering R for Quantitative Finance, Packt Publishing. He is currently reviewing two other books for Packt Publishing: Mastering Python for Data Science and Machine Learning with R Cookbook.

Mzabalazo Z. Ngwenya holds a postgraduate degree in mathematical statistics from the University of Cape Town. He has worked extensively in the field of statistical consulting, where he has used a variety of statistical software, including R. His areas of interest are primarily centered on statistical computing. Previously, he was involved in reviewing Learning RStudio for R Statistical Computing by Mark P.J. van der Loo and Edwin de Jonge; R Statistical Application Development by Example Beginner's Guide by Prabhanjan Narayanachar Tattar; Machine Learning with R by Brett Lantz; R Graph Essentials by David Alexander Lillis; and R Object-oriented Programming by Kelly Black, all by Packt Publishing. He currently works as a biometrician.

Donato Teutonico has several years of experience in modeling and the simulation of drug effects and clinical trials in industrial and academic settings. He received his PharmD degree from the University of Turin, Italy, specializing in chemical and pharmaceutical technology, and his PhD in pharmaceutical sciences from Paris-Sud University, France.
He is the author of two R packages for pharmacometrics, CTStemplate and panels-for-pharmacometrics, which are both available on Google Code. He is also the author of Instant R Starter, Packt Publishing.
Table of Contents

Preface

Chapter 1: Programming with R
    Data structures in R
    Atomic vectors
    Operations on vectors
    Lists
    Attributes
    Factors
    Multidimensional arrays
    Matrices
    Data frames
    Loading data into R
    Saving data frames
    Basic plots and the ggplot2 package
    Flow control
    The for() loop
    The apply() function
    The if() statement
    The while() loop
    The repeat{} and break statement
    Functions
    General programming and debugging tools
    Summary

Chapter 2: Statistical Methods with R
    Descriptive statistics
    Data variability
    Confidence intervals
    Probability distributions