
Data science for dummies PDF

438 Pages·2015·9.255 MB·English

Preview Data science for dummies

Data Science For Dummies® Published by: John Wiley & Sons, Inc., 111 River Street, Hoboken, NJ 07030-5774, www.wiley.com Copyright © 2015 by John Wiley & Sons, Inc., Hoboken, New Jersey Published simultaneously in Canada No part of this publication may be reproduced, stored in a retrieval system or transmitted in any form or by any means, electronic, mechanical, photocopying, recording, scanning or otherwise, except as permitted under Sections 107 or 108 of the 1976 United States Copyright Act, without the prior written permission of the Publisher. Requests to the Publisher for permission should be addressed to the Permissions Department, John Wiley & Sons, Inc., 111 River Street, Hoboken, NJ 07030, (201) 748-6011, fax (201) 748-6008, or online at http://www.wiley.com/go/permissions. Trademarks: Wiley, For Dummies, the Dummies Man logo, Dummies.com, Making Everything Easier, and related trade dress are trademarks or registered trademarks of John Wiley & Sons, Inc. and may not be used without written permission. All other trademarks are the property of their respective owners. John Wiley & Sons, Inc. is not associated with any product or vendor mentioned in this book. LIMIT OF LIABILITY/DISCLAIMER OF WARRANTY: THE PUBLISHER AND THE AUTHOR MAKE NO REPRESENTATIONS OR WARRANTIES WITH RESPECT TO THE ACCURACY OR COMPLETENESS OF THE CONTENTS OF THIS WORK AND SPECIFICALLY DISCLAIM ALL WARRANTIES, INCLUDING WITHOUT LIMITATION WARRANTIES OF FITNESS FOR A PARTICULAR PURPOSE. NO WARRANTY MAY BE CREATED OR EXTENDED BY SALES OR PROMOTIONAL MATERIALS. THE ADVICE AND STRATEGIES CONTAINED HEREIN MAY NOT BE SUITABLE FOR EVERY SITUATION. THIS WORK IS SOLD WITH THE UNDERSTANDING THAT THE PUBLISHER IS NOT ENGAGED IN RENDERING LEGAL, ACCOUNTING, OR OTHER PROFESSIONAL SERVICES. IF PROFESSIONAL ASSISTANCE IS REQUIRED, THE SERVICES OF A COMPETENT PROFESSIONAL PERSON SHOULD BE SOUGHT. NEITHER THE PUBLISHER NOR THE AUTHOR SHALL BE LIABLE FOR DAMAGES ARISING HEREFROM. 
THE FACT THAT AN ORGANIZATION OR WEBSITE IS REFERRED TO IN THIS WORK AS A CITATION AND/OR A POTENTIAL SOURCE OF FURTHER INFORMATION DOES NOT MEAN THAT THE AUTHOR OR THE PUBLISHER ENDORSES THE INFORMATION THE ORGANIZATION OR WEBSITE MAY PROVIDE OR RECOMMENDATIONS IT MAY MAKE. FURTHER, READERS SHOULD BE AWARE THAT INTERNET WEBSITES LISTED IN THIS WORK MAY HAVE CHANGED OR DISAPPEARED BETWEEN WHEN THIS WORK WAS WRITTEN AND WHEN IT IS READ.

For general information on our other products and services, please contact our Customer Care Department within the U.S. at 877-762-2974, outside the U.S. at 317-572-3993, or fax 317-572-4002. For technical support, please visit www.wiley.com/techsupport.

Wiley publishes in a variety of print and electronic formats and by print-on-demand. Some material included with standard print versions of this book may not be included in e-books or in print-on-demand. If this book refers to media such as a CD or DVD that is not included in the version you purchased, you may download this material at http://booksupport.wiley.com. For more information about Wiley products, visit www.wiley.com.

Library of Congress Control Number: 2014955780
ISBN 978-1-118-4155-6 (pbk); ISBN 978-1-118-84145-7 (ebk); ISBN 978-1-118-84152-5

Data Science For Dummies®
Visit www.dummies.com/cheatsheet/datascience to view this book’s cheat sheet.
Table of Contents

Cover
Foreword
Introduction
    About This Book
    Foolish Assumptions
    Icons Used in This Book
    Beyond the Book
    Where to Go from Here
Part I: Getting Started With Data Science
    Chapter 1: Wrapping Your Head around Data Science
        Seeing Who Can Make Use of Data Science
        Looking at the Pieces of the Data Science Puzzle
        Getting a Basic Lay of the Data Science Landscape
    Chapter 2: Exploring Data Engineering Pipelines and Infrastructure
        Defining Big Data by Its Four Vs
        Identifying Big Data Sources
        Grasping the Difference between Data Science and Data Engineering
        Boiling Down Data with MapReduce and Hadoop
        Identifying Alternative Big Data Solutions
        Data Engineering in Action — A Case Study
    Chapter 3: Applying Data Science to Business and Industry
        Incorporating Data-Driven Insights into the Business Process
        Distinguishing Business Intelligence and Data Science
        Knowing Who to Call to Get the Job Done Right
        Exploring Data Science in Business: A Data-Driven Business Success Story
Part II: Using Data Science to Extract Meaning from Your Data
    Chapter 4: Introducing Probability and Statistics
        Introducing the Fundamental Concepts of Probability
        Introducing Linear Regression
        Simulations
        Introducing Time Series Analysis
    Chapter 5: Clustering and Classification
        Introducing the Basics of Clustering and Classification
        Identifying Clusters in Your Data
    Chapter 6: Clustering and Classification with Nearest Neighbor Algorithms
        Making Sense of Data with Nearest Neighbor Analysis
        Seeing the Importance of Clustering and Classification
        Classifying Data with Average Nearest Neighbor Algorithms
        Classifying with K-Nearest Neighbor Algorithms
        Using Nearest Neighbor Distances to Infer Meaning from Point Patterns
        Solving Real-World Problems with Nearest Neighbor Algorithms
    Chapter 7: Mathematical Modeling in Data Science
        Introducing Multi-Criteria Decision Making (MCDM)
        Using Numerical Methods in Data Science
        Mathematical Modeling with Markov Chains and Stochastic Methods
    Chapter 8: Modeling Spatial Data with Statistics
        Generating Predictive Surfaces from Spatial Point Data
        Using Trend Surface Analysis on Spatial Data
Part III: Creating Data Visualizations that Clearly Communicate Meaning
    Chapter 9: Following the Principles of Data Visualization Design
        Understanding the Types of Visualizations
        Focusing on Your Audience
        Picking the Most Appropriate Design Style
        Knowing When to Add Context
        Knowing When to Get Persuasive
        Choosing the Most Appropriate Data Graphic Type
        Choosing Your Data Graphic
    Chapter 10: Using D3.js for Data Visualization
        Introducing the D3.js Library
        Knowing When to Use D3.js (and When Not To)
        Getting Started in D3.js
        Understanding More Advanced Concepts and Practices in D3.js
    Chapter 11: Web-Based Applications for Visualization Design
        Using Collaborative Data Visualization Platforms
        Visualizing Spatial Data with Online Geographic Tools
        Visualizing with Open Source: Web-Based Data Visualization Platforms
        Knowing When to Stick with Infographics
    Chapter 12: Exploring Best Practices in Dashboard Design
        Focusing on the Audience
        Starting with the Big Picture
        Getting the Details Right
        Testing Your Design
    Chapter 13: Making Maps from Spatial Data
        Getting into the Basics of GIS
        Analyzing Spatial Data
        Getting Started with Open-Source QGIS
Part IV: Computing for Data Science
    Chapter 14: Using Python for Data Science
        Understanding Basic Concepts in Python
        Getting on a First-Name Basis with Some Useful Python Libraries
        Using Python to Analyze Data — An Example Exercise
    Chapter 15: Using Open Source R for Data Science
        Introducing the Fundamental Concepts
        Previewing R Packages
    Chapter 16: Using SQL in Data Science
        Getting Started with SQL
        Using SQL and Its Functions in Data Science
    Chapter 17: Software Applications for Data Science
        Making Life Easier with Excel
        Using KNIME for Advanced Data Analytics
Part V: Applying Domain Expertise to Solve Real-World Problems Using Data Science
    Chapter 18: Using Data Science in Journalism
        Exploring the Five Ws and an H
        Collecting Data for Your Story
        Finding and Telling Your Data’s Story
        Bringing Data Journalism to Life: Washington Post’s The Black Budget
    Chapter 19: Delving into Environmental Data Science
        Modeling Environmental-Human Interactions with Environmental Intelligence
        Modeling Natural Resources in the Raw
        Using Spatial Statistics to Predict for Environmental Variation across Space
    Chapter 20: Data Science for Driving Growth in E-Commerce
        Making Sense of Data for E-Commerce Growth
        Optimizing E-Commerce Business Systems
    Chapter 21: Using Data Science to Describe and Predict Criminal Activity
        Temporal Analysis for Crime Prevention and Monitoring
        Spatial Crime Prediction and Monitoring
        Probing the Problems with Data Science for Crime Analysis
Part VI: The Part of Tens
    Chapter 22: Ten Phenomenal Resources for Open Data
        Digging through Data.gov
        Checking Out Canada Open Data
        Diving into data.gov.uk
        Checking Out U.S. Census Bureau Data
        Knowing NASA Data
        Wrangling World Bank Data
        Getting to Know Knoema Data
        Queuing Up with Quandl Data
        Exploring Exversion Data
        Mapping OpenStreetMap Spatial Data
    Chapter 23: Ten (or So) Free Data Science Tools and Applications
        Making Custom Web-Based Data Visualizations with Free R Packages
        Checking Out More Scraping, Collecting, and Handling Tools
        Checking Out More Data Exploration Tools
        Checking Out More Web-Based Visualization Tools
About the Author
Cheat Sheet
Advertisement Page
Connect with Dummies
End User License Agreement

Foreword

We live in exciting, even revolutionary times. As our daily interactions move from the physical world to the digital world, nearly every action we take generates data. Information pours from our mobile devices and our every online interaction. Sensors and machines collect, store and process information about the environment around us. New, huge data sets are now open and publicly accessible.
This flood of information gives us the power to make more informed decisions, react more quickly to change, and better understand the world around us. However, it can be a struggle to know where to start when it comes to making sense of this data deluge. What data should one collect? What methods are there for reasoning from data? And, most importantly, how do we get the answers from the data to answer our most pressing questions about our businesses, our lives, and our world?

Data science is the key to making this flood of information useful. Simply put, data science is the art of wrangling data to predict our future behavior, uncover patterns to help prioritize or provide actionable information, or otherwise draw meaning from these vast, untapped data resources. I often say that one of my favorite interpretations of the word “big” in Big Data is “expansive.” The data revolution is spreading to so many fields that it is now incumbent on people working in all professions to understand how to use data, just as people had to learn how to use computers in the 80’s and 90’s. This book is designed to help you do that.

I have seen firsthand how radically data science knowledge can transform organizations and the world for the better. At DataKind, we harness the power of data science in the service of humanity by engaging data science and social sector experts to work on projects addressing critical humanitarian problems. We are also helping drive the conversation about how data science can be applied to solve the world’s biggest challenges. From using satellite imagery to estimate poverty levels to mining decades of human rights violations to prevent further atrocities, DataKind teams have worked with many different nonprofits and humanitarian organizations just beginning their data science journeys.
One lesson resounds through every project we do: The people and organizations that are most committed to using data in novel and responsible ways are the ones who will succeed in this new environment. Just holding this book means you are taking your first steps on that journey, too.

Whether you are a seasoned researcher looking to brush up on some data science techniques or are completely new to the world of data, Data Science For Dummies will equip you with the tools you need to show whatever you can dream up. You’ll be able to demonstrate new findings from your physical activity data, to present new insights from the latest marketing campaign, and to share new learnings about preventing the spread of disease.

We truly are on the forefront of a new data age, and those that learn data science will be able to take part in this thrilling new adventure, shaping our path forward in every field. For you, that adventure starts now. Welcome aboard!

Jake Porway
Founder and Executive Director of DataKind™
