
Hands-On Data Analysis with Pandas: A Python data science handbook for data collection, wrangling, analysis, and visualization PDF

788 Pages·2021·70.837 MB·English
Most books are stored in the elastic cloud where traffic is expensive. For this reason, we have a limit on daily download.

Preview Hands-On Data Analysis with Pandas: A Python data science handbook for data collection, wrangling, analysis, and visualization

Hands-On Data Analysis with Pandas – Second Edition
A Python data science handbook for data collection, wrangling, analysis, and visualization

Stefanie Molin

BIRMINGHAM–MUMBAI

Hands-On Data Analysis with Pandas
Second Edition

Copyright © 2021 Packt Publishing

All rights reserved. No part of this book may be reproduced, stored in a retrieval system, or transmitted in any form or by any means, without the prior written permission of the publisher, except in the case of brief quotations embedded in critical articles or reviews.

Every effort has been made in the preparation of this book to ensure the accuracy of the information presented. However, the information contained in this book is sold without warranty, either express or implied. Neither the author(s), nor Packt Publishing or its dealers and distributors, will be held liable for any damages caused or alleged to have been caused directly or indirectly by this book.

Packt Publishing has endeavored to provide trademark information about all of the companies and products mentioned in this book by the appropriate use of capitals. However, Packt Publishing cannot guarantee the accuracy of this information.

Group Product Manager: Kunal Parikh
Publishing Product Manager: Sunith Shetty
Senior Editor: Roshan Ravikumar
Content Development Editor: Athikho Sapuni Rishana
Technical Editor: Sonam Pandey
Copy Editor: Safis Editing
Project Coordinator: Aishwarya Mohan
Proofreader: Safis Editing
Indexer: Pratik Shirodkar
Production Designer: Shankar Kalbhor

First published: July 2019
Second edition: April 2021
Production reference: 1270421

Published by Packt Publishing Ltd.
Livery Place
35 Livery Street
Birmingham
B3 2PB, UK.

ISBN 978-1-80056-345-2
www.packt.com

To everyone that made the first edition such a success.

Foreword to the Second Edition

As educators, we are inclined to teach across the medium that we best learn from. I personally gravitated towards video content early in my career.
As I produce more online content, surprisingly, one of the most frequently asked questions I get is: What book would you recommend for someone getting started in data science? Initially, I was baffled at why people would turn to books when there are so many great online resources out there. However, after reading Hands-On Data Analysis with Pandas, my perception of books for learning data science began to change.

The first thing I loved about Hands-On Data Analysis with Pandas was the structure. The book gives you just the right amount of information at the right time to keep you progressing at a natural pace. Starting with light foundations in statistics and concepts gives the perfect amount of cognitive glue to keep theory and practice comfortably bound together.

After the foundations, you are introduced to the star of the show: pandas. Stefanie uses practical examples (not the same old datasets you have used before) to bring the module to life. I use pandas almost every day, and I still learned quite a few tricks across these sections.

As a software engineer, Stefanie knows the importance of quality documentation. She has all of the data, examples, and more in a tidy GitHub repo. Through these examples, the book truly earns the "Hands-On" moniker in its title.

The latter portion of the book gives the reader a taste of what is possible with a strong foundation in pandas. Stefanie leads you just a little bit deeper into the more advanced machine learning concepts. Once again, she provides just enough information to get you excited about taking the next step in your learning journey without inundating you with overly technical jargon.

I could sense the pride Stefanie took in this work through our conversations. While the book is a great resource for people looking to learn data science tools, it was also a way for her to solidify her own knowledge and push her boundaries.
In my opinion, you want to learn from people that are creating not only for the community but also for their own learning. People with intrinsic motivation like this are willing to go the extra mile to make that extra revision or get the wording perfect.

I hope you enjoy learning from this book as much as I did. To those who asked me the question above, I have a simple answer: This one.

Ken Jee
YouTuber & Head of Data Science @ Scouts Consulting Group
Honolulu, HI (03/09/2021)

Foreword to the First Edition

Recent advancements in computing and artificial intelligence have completely changed the way we understand the world. Our current ability to record and analyze data has already transformed industries and inspired big changes in society.

Stefanie Molin's Hands-On Data Analysis with Pandas is much more than an introduction to the subject of data analysis or the pandas Python library; it's a guide to help you become part of this transformation.

Not only will this book teach you the fundamentals of using Python to collect, analyze, and understand data, but it will also expose you to important software engineering, statistical, and machine learning concepts that you will need to be successful.

Using examples based on real data, you will be able to see firsthand how to apply these techniques to extract value from data. In the process, you will learn important software development skills, including writing simulations, creating your own Python packages, and collecting data from APIs.

Stefanie possesses a rare combination of skills that makes her uniquely qualified to guide you through this process. Being both an expert data scientist and a strong software engineer, she can not only talk authoritatively about the intricacies of the data analysis workflow but also about how to implement it correctly and efficiently in Python.
Whether you are a Python programmer interested in learning more about data analysis, or a data scientist learning how to work in Python, this book will get you up to speed fast, so you can begin to tackle your own data analysis projects right away.

Felipe Moreno
New York, June 10, 2019.

Felipe Moreno has been working in information security for the last two decades. He currently works for Bloomberg LP, where he leads the Security Data Science team within the Chief Information Security Office and focuses on applying statistics and machine learning to security problems.

Contributors

About the author

Stefanie Molin is a data scientist and software engineer at Bloomberg LP in NYC, tackling tough problems in information security, particularly revolving around anomaly detection, building tools for gathering data, and knowledge sharing. She has extensive experience in data science, designing anomaly detection solutions, and utilizing machine learning in both R and Python in the AdTech and FinTech industries. She holds a B.S. in operations research from Columbia University's Fu Foundation School of Engineering and Applied Science, with minors in economics, and entrepreneurship and innovation. In her free time, she enjoys traveling the world, inventing new recipes, and learning new languages spoken among both people and computers.

Writing this book was a tremendous amount of work, but I have grown a lot through the experience: as a writer, as a technologist, and as a person. This wouldn't have been possible without the help of my friends, family, and colleagues. I'm very grateful to you all. In particular, I want to thank Aliki Mavromoustaki, Felipe Moreno, Suphannee Sivakorn, Lucy Hao, Javon Thompson, and Ken Jee. (The full version of my acknowledgments can be found in the code repository; see the preface for the link.)

About the reviewer

Aliki Mavromoustaki is the lead data scientist at Tasman Analytics.
She works with direct-to-consumer companies to deliver scalable infrastructure and implement event-driven analytics. Previously, she worked at Criteo, an AdTech company that employs machine learning to help digital commerce companies target valuable customers. Aliki has worked on optimizing marketing campaigns and designed statistical experiments comparing Criteo products. Aliki holds a PhD in fluid dynamics from Imperial College London and was an assistant adjunct professor in applied mathematics at UCLA.

Table of Contents

Preface

Section 1: Getting Started with Pandas

1 Introduction to Data Analysis
    Chapter materials
    The fundamentals of data analysis
        Data collection
        Data wrangling
        Exploratory data analysis
        Drawing conclusions
    Statistical foundations
        Sampling
        Descriptive statistics
        Prediction and forecasting
        Inferential statistics
    Setting up a virtual environment
        Virtual environments
        Installing the required Python packages
        Why pandas?
        Jupyter Notebooks
    Summary
    Exercises
    Further reading

2 Working with Pandas DataFrames
    Chapter materials
    Pandas data structures
        Series
        Index
        DataFrame
    Creating a pandas DataFrame
        From a Python object
        From a file
        From a database
        From an API
    Inspecting a DataFrame object
        Examining the data
