
Big Data Now: 2012 Edition PDF

131 Pages·2012·24.202 MB·English

Preview Big Data Now: 2012 Edition

Change the world with data. We’ll show you how. strataconf.com

Sep 25–27, 2013 Boston, MA
Oct 28–30, 2013 New York, NY
Nov 11–13, 2013 London, England

©2013 O’Reilly Media, Inc. O’Reilly logo is a registered trademark of O’Reilly Media, Inc.

Big Data Now: 2012 Edition
O’Reilly Media, Inc.

Big Data Now: 2012 Edition
by O’Reilly Media, Inc.

Copyright © 2012 O’Reilly Media. All rights reserved.
Printed in the United States of America.

Published by O’Reilly Media, Inc., 1005 Gravenstein Highway North, Sebastopol, CA 95472.

O’Reilly books may be purchased for educational, business, or sales promotional use. Online editions are also available for most titles (http://my.safaribooksonline.com). For more information, contact our corporate/institutional sales department: (800) 998-9938 or [email protected].

Cover Designer: Karen Montgomery
Interior Designer: David Futato

October 2012: First Edition

Revision History for the First Edition:
2012-10-24 First release

See http://oreilly.com/catalog/errata.csp?isbn=9781449356712 for release details.

Nutshell Handbook, the Nutshell Handbook logo, and the O’Reilly logo are registered trademarks of O’Reilly Media, Inc.

Many of the designations used by manufacturers and sellers to distinguish their products are claimed as trademarks. Where those designations appear in this book, and O’Reilly Media, Inc. was aware of a trademark claim, the designations have been printed in caps or initial caps.

While every precaution has been taken in the preparation of this book, the publisher and authors assume no responsibility for errors or omissions, or for damages resulting from the use of the information contained herein.

ISBN: 978-1-449-35671-2

Table of Contents

1. Introduction

2. Getting Up to Speed with Big Data
  What Is Big Data?
  What Does Big Data Look Like?
  In Practice
  What Is Apache Hadoop?
  The Core of Hadoop: MapReduce
  Hadoop’s Lower Levels: HDFS and MapReduce
  Improving Programmability: Pig and Hive
  Improving Data Access: HBase, Sqoop, and Flume
  Coordination and Workflow: Zookeeper and Oozie
  Management and Deployment: Ambari and Whirr
  Machine Learning: Mahout
  Using Hadoop
  Why Big Data Is Big: The Digital Nervous System
  From Exoskeleton to Nervous System
  Charting the Transition
  Coming, Ready or Not

3. Big Data Tools, Techniques, and Strategies
  Designing Great Data Products
  Objective-based Data Products
  The Model Assembly Line: A Case Study of Optimal Decisions Group
  Drivetrain Approach to Recommender Systems
  Optimizing Lifetime Customer Value
  Best Practices from Physical Data Products
  The Future for Data Products
  What It Takes to Build Great Machine Learning Products
  Progress in Machine Learning
  Interesting Problems Are Never Off the Shelf
  Defining the Problem

4. The Application of Big Data
  Stories over Spreadsheets
  A Thought on Dashboards
  Full Interview
  Mining the Astronomical Literature
  Interview with Robert Simpson: Behind the Project and What Lies Ahead
  Science between the Cracks
  The Dark Side of Data
  The Digital Publishing Landscape
  Privacy by Design

5. What to Watch for in Big Data
  Big Data Is Our Generation’s Civil Rights Issue, and We Don’t Know It
  Three Kinds of Big Data
  Enterprise BI 2.0
  Civil Engineering
  Customer Relationship Optimization
  Headlong into the Trough
  Automated Science, Deep Data, and the Paradox of Information
  (Semi)Automated Science
  Deep Data
  The Paradox of Information
  The Chicken and Egg of Big Data Solutions
  Walking the Tightrope of Visualization Criticism
  The Visualization Ecosystem
  The Irrationality of Needs: Fast Food to Fine Dining
  Grown-up Criticism
  Final Thoughts

6. Big Data and Health Care
  Solving the Wanamaker Problem for Health Care
  Making Health Care More Effective
  More Data, More Sources
  Paying for Results
  Enabling Data
  Building the Health Care System We Want
  Recommended Reading
  Dr. Farzad Mostashari on Building the Health Information Infrastructure for the Modern ePatient
  John Wilbanks Discusses the Risks and Rewards of a Health Data Commons
  Esther Dyson on Health Data, “Preemptive Healthcare,” and the Next Big Thing
  A Marriage of Data and Caregivers Gives Dr. Atul Gawande Hope for Health Care
  Five Elements of Reform that Health Providers Would Rather Not Hear About

CHAPTER 1
Introduction

In the first edition of Big Data Now, the O’Reilly team tracked the birth and early development of data tools and data science. Now, with this second edition, we’re seeing what happens when big data grows up: how it’s being applied, where it’s playing a role, and the consequences, good and bad alike, of data’s ascendance.

We’ve organized the 2012 edition of Big Data Now into five areas:

Getting Up to Speed With Big Data: Essential information on the structures and definitions of big data.

Big Data Tools, Techniques, and Strategies: Expert guidance for turning big data theories into big data products.

The Application of Big Data: Examples of big data in action, including a look at the downside of data.

What to Watch for in Big Data: Thoughts on how big data will evolve and the role it will play across industries and domains.

Big Data and Health Care: A special section exploring the possibilities that arise when data and health care come together.

In addition to Big Data Now, you can stay on top of the latest data developments with our ongoing analysis on O’Reilly Radar and through our Strata coverage and events series.
