
Parallel R PDF

122 Pages·2012·5.627 MB·English

Preview Parallel R

Parallel R
Q. Ethan McCallum and Stephen Weston

Beijing • Cambridge • Farnham • Köln • Sebastopol • Tokyo

Parallel R
by Q. Ethan McCallum and Stephen Weston

Copyright © 2012 Q. Ethan McCallum and Stephen Weston. All rights reserved.
Printed in the United States of America.

Published by O’Reilly Media, Inc., 1005 Gravenstein Highway North, Sebastopol, CA 95472.

O’Reilly books may be purchased for educational, business, or sales promotional use. Online editions are also available for most titles (http://my.safaribooksonline.com). For more information, contact our corporate/institutional sales department: (800) 998-9938 or [email protected].

Editors: Mike Loukides and Meghan Blanchette
Production Editor: Kristen Borg
Proofreader: O’Reilly Production Services
Cover Designer: Karen Montgomery
Interior Designer: David Futato
Illustrator: Robert Romano

Revision History for the First Edition:
2011-10-21 First release
See http://oreilly.com/catalog/errata.csp?isbn=9781449309923 for release details.

Nutshell Handbook, the Nutshell Handbook logo, and the O’Reilly logo are registered trademarks of O’Reilly Media, Inc. Parallel R, the image of a rabbit, and related trade dress are trademarks of O’Reilly Media, Inc.

Many of the designations used by manufacturers and sellers to distinguish their products are claimed as trademarks. Where those designations appear in this book, and O’Reilly Media, Inc., was aware of a trademark claim, the designations have been printed in caps or initial caps.

While every precaution has been taken in the preparation of this book, the publisher and authors assume no responsibility for errors or omissions, or for damages resulting from the use of the information contained herein.

ISBN: 978-1-449-30992-3
[LSI]
1319202138

Table of Contents

Preface

1. Getting Started
    Why R?
    Why Not R?
    The Solution: Parallel Execution
    A Road Map for This Book
    What We’ll Cover
    Looking Forward…
    What We’ll Assume You Already Know
    In a Hurry?
    snow
    multicore
    parallel
    R+Hadoop
    RHIPE
    Segue
    Summary

2. snow
    Quick Look
    How It Works
    Setting Up
    Working with It
    Creating Clusters with makeCluster
    Parallel K-Means
    Initializing Workers
    Load Balancing with clusterApplyLB
    Task Chunking with parLapply
    Vectorizing with clusterSplit
    Load Balancing Redux
    Functions and Environments
    Random Number Generation
    snow Configuration
    Installing Rmpi
    Executing snow Programs on a Cluster with Rmpi
    Executing snow Programs with a Batch Queueing System
    Troubleshooting snow Programs
    When It Works…
    …And When It Doesn’t
    The Wrap-up

3. multicore
    Quick Look
    How It Works
    Setting Up
    Working with It
    The mclapply Function
    The mc.cores Option
    The mc.set.seed Option
    Load Balancing with mclapply
    The pvec Function
    The parallel and collect Functions
    Using collect Options
    Parallel Random Number Generation
    The Low-Level API
    When It Works…
    …And When It Doesn’t
    The Wrap-up

4. parallel
    Quick Look
    How It Works
    Setting Up
    Working with It
    Getting Started
    Creating Clusters with makeCluster
    Parallel Random Number Generation
    Summary of Differences
    When It Works…
    …And When It Doesn’t
    The Wrap-up

5. A Primer on MapReduce and Hadoop
    Hadoop at Cruising Altitude
    A MapReduce Primer
    Thinking in MapReduce: Some Pseudocode Examples
    Calculate Average Call Length for Each Date
    Number of Calls by Each User, on Each Date
    Run a Special Algorithm on Each Record
    Binary and Whole-File Data: SequenceFiles
    No Cluster? No Problem! Look to the Clouds…
    The Wrap-up

6. R+Hadoop
    Quick Look
    How It Works
    Setting Up
    Working with It
    Simple Hadoop Streaming (All Text)
    Streaming, Redux: Indirectly Working with Binary Data
    The Java API: Binary Input and Output
    Processing Related Groups (the Full Map and Reduce Phases)
    When It Works…
    …And When It Doesn’t
    The Wrap-up

7. RHIPE
    Quick Look
    How It Works
    Setting Up
    Working with It
    Phone Call Records, Redux
    Tweet Brevity
    More Complex Tweet Analysis
    When It Works…
    …And When It Doesn’t
    The Wrap-up

8. Segue
    Quick Look
    How It Works
    Setting Up
    Working with It
    Model Testing: Parameter Sweep
    When It Works…
    …And When It Doesn’t
    The Wrap-up

9. New and Upcoming
    doRedis
    RevoScale R and RevoConnectR (RHadoop)
    cloudNumbers.com

Preface

Conventions Used in This Book

The following typographical conventions are used in this book:

Italic
    Indicates new terms, URLs, email addresses, filenames, and file extensions.
Constant width
    Used for program listings, as well as within paragraphs to refer to program elements such as variable or function names, databases, data types, environment variables, statements, and keywords.
Constant width bold
    Shows commands or other text that should be typed literally by the user.
Constant width italic
    Shows text that should be replaced with user-supplied values or by values determined by context.

This icon signifies a tip, suggestion, or general note.

This icon indicates a warning or caution.

Using Code Examples

This book is here to help you get your job done. In general, you may use the code in this book in your programs and documentation. You do not need to contact us for permission unless you’re reproducing a significant portion of the code. For example, writing a program that uses several chunks of code from this book does not require permission. Selling or distributing a CD-ROM of examples from O’Reilly books does require permission. Answering a question by citing this book and quoting example code does not require permission. Incorporating a significant amount of example code from this book into your product’s documentation does require permission.

We appreciate, but do not require, attribution. An attribution usually includes the title, author, publisher, and ISBN. For example: “Parallel R by Q. Ethan McCallum and Stephen Weston (O’Reilly). Copyright 2012 Q. Ethan McCallum and Stephen Weston, 978-1-449-30992-3.”

If you feel your use of code examples falls outside fair use or the permission given above, feel free to contact us at [email protected].

Safari® Books Online

Safari Books Online is an on-demand digital library that lets you easily search over 7,500 technology and creative reference books and videos to find the answers you need quickly.

With a subscription, you can read any page and watch any video from our library online. Read books on your cell phone and mobile devices. Access new titles before they are available for print, and get exclusive access to manuscripts in development and post feedback for the authors. Copy and paste code samples, organize your favorites, download chapters, bookmark key sections, create notes, print out pages, and benefit from tons of other time-saving features.

O’Reilly Media has uploaded this book to the Safari Books Online service. To have full digital access to this book and others on similar topics from O’Reilly and other publishers, sign up for free at http://my.safaribooksonline.com.

How to Contact Us

Please address comments and questions concerning this book to the publisher:

    O’Reilly Media, Inc.
    1005 Gravenstein Highway North
    Sebastopol, CA 95472
    800-998-9938 (in the United States or Canada)
    707-829-0515 (international or local)
    707-829-0104 (fax)

We have a web page for this book, where we list errata, examples, and any additional information. You can access this page at:

    http://oreilly.com/catalog/0636920021421

To comment or ask technical questions about this book, send email to:

    [email protected]
