Pandas Cookbook

Recipes for Scientific Computing, Time Series Analysis and Data Visualization using Python

Theodore Petrou

BIRMINGHAM - MUMBAI

Copyright © 2017 Packt Publishing

All rights reserved. No part of this book may be reproduced, stored in a retrieval system, or transmitted in any form or by any means, without the prior written permission of the publisher, except in the case of brief quotations embedded in critical articles or reviews.

Every effort has been made in the preparation of this book to ensure the accuracy of the information presented. However, the information contained in this book is sold without warranty, either express or implied. Neither the author, nor Packt Publishing, and its dealers and distributors will be held liable for any damages caused or alleged to be caused directly or indirectly by this book.

Packt Publishing has endeavored to provide trademark information about all of the companies and products mentioned in this book by the appropriate use of capitals. However, Packt Publishing cannot guarantee the accuracy of this information.

First published: October 2017
Production reference: 1181017

Published by Packt Publishing Ltd.
Livery Place
35 Livery Street
Birmingham B3 2PB, UK.

ISBN 978-1-78439-387-8
www.packtpub.com

Credits

Author: Theodore Petrou
Reviewers: Sonali Dayal, Kuntal Ganguly, Shilpi Saxena
Commissioning Editor: Veena Pagare
Acquisition Editor: Tushar Gupta
Content Development Editor: Snehal Kolte
Technical Editor: Sayli Nikalje
Copy Editor: Tasneem Fatehi
Project Coordinator: Manthan Patel
Proofreader: Safis Editing
Indexer: Tejal Daruwale Soni
Graphics: Tania Dutta
Production Coordinator: Deepika Naik

About the Author

Theodore Petrou is a data scientist and the founder of Dunder Data, a professional educational company focusing on exploratory data analysis.
He is also the head of Houston Data Science, a meetup group with more than 2,000 members that has the primary goal of getting local data enthusiasts together in the same room to practice data science. Before founding Dunder Data, Ted was a data scientist at Schlumberger, a large oil services company, where he spent the vast majority of his time exploring data. Some of his projects included using targeted sentiment analysis to discover the root cause of part failure from engineer text, and developing customized client/server dashboarding applications and real-time web services to avoid the mispricing of sales items. Ted received his master's degree in statistics from Rice University, and used his analytical skills to play poker professionally and teach math before becoming a data scientist. Ted is a strong supporter of learning through practice and can often be found answering questions about pandas on Stack Overflow.

Acknowledgements

I would first like to thank my wife, Eleni, and two young children, Penelope and Niko, who endured extended periods of time without me as I wrote. I’d also like to thank Sonali Dayal, whose constant feedback helped immensely in structuring the content of the book to improve its effectiveness. Thank you to Roy Keyes, who is the most exceptional data scientist I know and whose collaboration made Houston Data Science possible. Thank you to Scott Boston, an extremely skilled pandas user, for developing ideas for recipes. Thank you very much to Kim Williams, Randolph Adami, Kevin Higgins, and Vishwanath Avasarala, who took a chance on me during my professional career when I had little to no experience. Thanks to my coworker at Schlumberger, Micah Miller, for his critical, honest, and instructive feedback on anything that we developed together and his constant pursuit to move toward Python. Thank you to Phu Ngo, who critically challenges and sharpens my thinking more than anyone.
Thank you to my brother, Dean Petrou, for being right by my side as we developed our analytical skills through poker and again through business. Thank you to my sister, Stephanie Burton, for always knowing what I’m thinking and making sure that I’m aware of it. Thank you to my mother, Sofia Petrou, for her ceaseless love, support, and endless math puzzles that challenged me as a child. And thank you to my father, Steve Petrou, who, although no longer here, remains close to my heart and continues to encourage me every day.

About the Reviewers

Sonali Dayal is a master's candidate in biostatistics at the University of California, Berkeley. Previously, she worked as a freelance software and data science engineer for early-stage start-ups, where she built supervised and unsupervised machine learning models as well as data pipelines and interactive data analytics dashboards. She received her bachelor of science (B.S.) in biochemistry from Virginia Tech in 2011.

Kuntal Ganguly is a big data machine learning engineer focused on building large-scale data-driven systems using big data frameworks and machine learning. He has around 7 years of experience building several big data and machine learning applications. Kuntal provides solutions to AWS customers in building real-time analytics systems using managed cloud services and open source Hadoop ecosystem technologies such as Spark, Kafka, Storm, and Solr, along with machine learning and deep learning frameworks such as scikit-learn, TensorFlow, Keras, and BigDL. He enjoys hands-on software development, and has single-handedly conceived, architected, developed, and deployed several large-scale distributed applications. He is a machine learning and deep learning practitioner and very passionate about building intelligent applications. Kuntal is the author of the books Learning Generative Adversarial Network and R Data Analysis Cookbook - Second Edition, Packt Publishing.
Shilpi Saxena is a seasoned professional who leads in management and is also a technology evangelist. She is an engineer with exposure to a variety of domains (machine-to-machine space, healthcare, telecom, hiring, and manufacturing). She has experience in all aspects of the conception and execution of enterprise solutions. She has been architecting, managing, and delivering solutions in the big data space for the last 3 years, handling high-performance, geographically distributed teams of elite engineers. Shilpi has more than 12 years of experience (3 years in the big data space) in the development and execution of various facets of enterprise solutions, in both the product and services dimensions of the software industry. She is an engineer by degree and profession who has worn various hats (developer, technical leader, product owner, tech manager) and has seen all the flavors that the industry has to offer. She has architected and worked through some of the pioneering production implementations in big data on Storm and Impala with auto scaling in AWS.

LinkedIn: http://in.linkedin.com/pub/shilpi-saxena/4/552/a30

www.PacktPub.com

For support files and downloads related to your book, please visit www.PacktPub.com.

Did you know that Packt offers eBook versions of every book published, with PDF and ePub files available? You can upgrade to the eBook version at www.PacktPub.com and, as a print book customer, you are entitled to a discount on the eBook copy. Get in touch with us at [email protected] for more details.

At www.PacktPub.com, you can also read a collection of free technical articles, sign up for a range of free newsletters, and receive exclusive discounts and offers on Packt books and eBooks.

https://www.packtpub.com/mapt

Get the most in-demand software skills with Mapt.
Mapt gives you full access to all Packt books and video courses, as well as industry-leading tools to help you plan your personal development and advance your career.

Why subscribe?

    Fully searchable across every book published by Packt
    Copy and paste, print, and bookmark content
    On demand and accessible via a web browser

Customer Feedback

Thanks for purchasing this Packt book. At Packt, quality is at the heart of our editorial process. To help us improve, please leave us an honest review on this book's Amazon page at https://www.amazon.com/dp/1784393878.

If you'd like to join our team of regular reviewers, you can email us at [email protected]. We award our regular reviewers with free eBooks and videos in exchange for their valuable feedback. Help us be relentless in improving our products!

Table of Contents

Preface
Chapter 1: Pandas Foundations
    Introduction
    Dissecting the anatomy of a DataFrame
        Getting ready
        How to do it...
        How it works...
        There's more...
        See also
    Accessing the main DataFrame components
        Getting ready
        How to do it...
        How it works...
        There's more...
        See also
    Understanding data types
        Getting ready
        How to do it...
        How it works...
        There's more...
        See also
    Selecting a single column of data as a Series
        Getting ready
        How to do it...
        How it works...
        There's more...
        See also
    Calling Series methods
        Getting ready
        How to do it...
        How it works...
        There's more...
        See also
    Working with operators on a Series