Practical Data Science with Python 3: Synthesizing Actionable Insights from Data PDF

553 Pages·2019·15.507 MB·English

Preview Practical Data Science with Python 3: Synthesizing Actionable Insights from Data

Practical Data Science with Python 3
Synthesizing Actionable Insights from Data

Ervin Varga
Kikinda, Serbia

Any source code or other supplementary material referenced by the author in this book is available to readers on GitHub via the book’s product page, located at www.apress.com/978-1-4842-4858-4. For more detailed information, please visit http://www.apress.com/source-code.

ISBN 978-1-4842-4858-4
e-ISBN 978-1-4842-4859-1
https://doi.org/10.1007/978-1-4842-4859-1

© Ervin Varga 2019

Apress Standard

Trademarked names, logos, and images may appear in this book. Rather than use a trademark symbol with every occurrence of a trademarked name, logo, or image we use the names, logos, and images only in an editorial fashion and to the benefit of the trademark owner, with no intention of infringement of the trademark. The use in this publication of trade names, trademarks, service marks, and similar terms, even if they are not identified as such, is not to be taken as an expression of opinion as to whether or not they are subject to proprietary rights.

While the advice and information in this book are believed to be true and accurate at the date of publication, neither the author nor the editors nor the publisher can accept any legal responsibility for any errors or omissions that may be made. The publisher makes no warranty, express or implied, with respect to the material contained herein.

Distributed to the book trade worldwide by Springer Science+Business Media New York, 233 Spring Street, 6th Floor, New York, NY 10013. Phone 1-800-SPRINGER, fax (201) 348-4505, e-mail [email protected], or visit www.springeronline.com. Apress Media, LLC is a California LLC and the sole member (owner) is Springer Science + Business Media Finance Inc (SSBM Finance Inc). SSBM Finance Inc is a Delaware corporation.

Like traveling, writing a book is more enjoyable when accompanied by your family. I am thankful to my wife, Zorica, and sons, Andrej and Stefan, for all their great support.

Introduction

This book amalgamates data science and software engineering in a pragmatic manner. It guides the reader through topics from these worlds and exemplifies concepts through software. As a reader, you will gain insight into areas rarely covered in textbooks, since they are hard to explain and illustrate. You will see the Cynefin framework in action via examples that give an overarching context and systematic approach for your data science endeavors. The book also introduces you to the most useful Python 3 data science frameworks and tools: Numpy, Pandas, scikit-learn, matplotlib, Seaborn, Dask, Apache Spark, PyTorch, and other auxiliary frameworks. All examples are self-contained and allow you to reproduce every piece of content from the book, including graphs. The exercises at the end of each chapter advise you how to further deepen your knowledge. Finally, the book explains, again using lots of examples, all phases of a data science life cycle model: from project initiation to data exploration and retrospection. The aim is to equip you with necessary comprehension pertaining to major areas of data science so that you may see the forest for the trees.

Acknowledgments

I would like to thank Apress for giving me an opportunity and full support for writing this book about data science. Comments and help from James Markham, Aditee Mirashi, and Celestin Suresh John were invaluable. I am also grateful for excellent remarks from Jojo John Moolayil, who was the technical reviewer on this book.
Table of Contents

Chapter 1: Introduction to Data Science
    Main Phases of a Data Science Project
    Brown Cow Model Case Study
    Big Data
    Big Data Example: MOOC Platforms
    How to Learn Data Science
    Domain Knowledge Attainment—Example
    Programming Skills Attainment—Example
    Overview of the Anaconda Ecosystem
    Managing Packages and Environments
    Sharing and Reproducing Environments
    Summary
    References

Chapter 2: Data Engineering
    E-Commerce Customer Segmentation: Case Study
    Creating a Project in Spyder
    Downloading the Dataset
    Exploring the Dataset
    Inspecting Results
    Persisting Results
    Restructuring Code to Cope with Large CSV Files
    Public Data Sources
    Summary
    References

Chapter 3: Software Engineering
    Characteristics of a Large-Scale Software System
    Software Engineering Knowledge Areas
    Rules, Principles, Conventions, and Standards
    Context Awareness and Communicative Abilities
    Reducing Cyclomatic Complexity
    Cone of Uncertainty and Having Time to Ask
    Fixing a Bug and Knowing How to Ask
    Handling Legacy Code
    Understanding Bug-Free Code
    Understanding Faulty Code
    The Importance of APIs
    Fervent Flexibility Hurts Your API
    The Socio-* Pieces of Software Production
    Funny Elevator Case Study
    Summary
    References

Chapter 4: Documenting Your Work
    JupyterLab in Action
    Experimenting with Code Execution
    Managing the Kernel
    Descending Ball Project
    Refactoring the Simulator’s Notebook
    Document Structure
    Wikipedia Edits Project
    Summary
    References

Chapter 5: Data Processing
    Augmented Descending Ball Project
    Version 1.1
    Version 1.2
    Version 1.3
    Abstractions vs. Latent Features
    Compressing the Ratings Matrix
    Summary
    References

Chapter 6: Data Visualization
    Visualizing Temperature Data Case Study
    Showing Stations on a Map
    Plotting Temperatures
    Closest Pair Case Study
