
Mastering Python for Data Science

294 Pages·5.65 MB·English

Mastering Python for Data Science

Explore the world of data science through Python and learn how to make sense of data

Samir Madhavan

BIRMINGHAM - MUMBAI

Copyright © 2015 Packt Publishing

All rights reserved. No part of this book may be reproduced, stored in a retrieval system, or transmitted in any form or by any means, without the prior written permission of the publisher, except in the case of brief quotations embedded in critical articles or reviews.

Every effort has been made in the preparation of this book to ensure the accuracy of the information presented. However, the information contained in this book is sold without warranty, either express or implied. Neither the author nor Packt Publishing, nor its dealers and distributors, will be held liable for any damages caused or alleged to be caused directly or indirectly by this book.

Packt Publishing has endeavored to provide trademark information about all of the companies and products mentioned in this book by the appropriate use of capitals. However, Packt Publishing cannot guarantee the accuracy of this information.

First published: August 2015

Production reference: 1260815

Published by Packt Publishing Ltd.
Livery Place
35 Livery Street
Birmingham B3 2PB, UK.

ISBN 978-1-78439-015-0

www.packtpub.com

Credits

Author: Samir Madhavan
Reviewers: Sébastien Celles, Robert Dempsey, Maurice HT Ling, Ratanlal Mahanta, Yingssu Tsai
Commissioning Editor: Pramila Balan
Acquisition Editor: Sonali Vernekar
Content Development Editor: Arun Nadar
Technical Editor: Chinmay S. Puranik
Copy Editor: Sonia Michelle Cheema
Project Coordinator: Neha Bhatnagar
Proofreader: Safis Editing
Indexer: Monica Ajmera Mehta
Graphics: Disha Haria, Jason Monteiro
Production Coordinator: Arvindkumar Gupta
Cover Work: Arvindkumar Gupta

About the Author

Samir Madhavan has been working in the field of data science since 2010. He is an industry expert on machine learning and big data.
He has also reviewed R Machine Learning Essentials by Packt Publishing. He was part of the Aadhaar project of the Unique Identification Authority of India, which is helping every Indian obtain a unique number, similar to a Social Security number in the United States. He was also the first employee of Flutura Decision Sciences and Analytics and is part of the core team that helped grow the company to 50 employees. The company is now recognized as one of the most promising Internet of Things (IoT) and decision sciences companies in the world.

I would like to thank my mom, Rajasree Madhavan, and dad, P Madhavan, for all their support. I would also like to thank Srikanth Muralidhara, Krishnan Raman, and Derick Jose, who gave me the opportunity to start my career in the world of data science.

About the Reviewers

Sébastien Celles is a professor of applied physics at Université de Poitiers, working in the thermal science department. He has used Python for numerical simulations, data plotting, data predictions, and various other tasks since the early 2000s. He is a member of PyData and was granted commit rights to the pandas DataReader project. He is also involved in several open source projects in the scientific Python ecosystem.
Sébastien is also the author of several Python packages available on PyPI, including the following:

• openweathermap_requests: This is a package used to fetch data from OpenWeatherMap.org using Requests and Requests-cache and to return a pandas DataFrame with weather history
• pandas_degreedays: This is a package used to calculate degree days (a measure of heating or cooling) from a pandas time series of temperatures
• pandas_confusion: This is a package used to manage confusion matrices, plot and binarize them, and calculate overall and class statistics
• Other packages authored by him include pyade, pandas_datareaders_unofficial, and more

He also has a personal interest in data mining, machine learning techniques, forecasting, and so on. You can find more information about him at http://www.celles.net/wiki/Contact or https://www.linkedin.com/in/sebastiencelles.

Robert Dempsey is a leader and technology professional who specializes in delivering solutions and products that solve tough business challenges. His experience forming and leading agile teams, combined with more than 15 years of technology experience, enables him to solve complex problems while always keeping the bottom line in mind. Robert founded and built three start-ups in the tech and marketing fields, developed and sold two online applications, consulted for Fortune 500 and Inc. 500 companies, and has spoken nationally and internationally on software development and agile project management.

He is the founder of Data Wranglers DC, a group dedicated to improving the craft of data wrangling, as well as a board member of Data Community DC. He is currently the team leader of data operations at ARPC, an econometrics firm based in Washington, DC. In addition to spending time with his growing family, Robert geeks out on Raspberry Pis, Arduinos, and automating more of his life through hardware and software.

Maurice HT Ling has been programming in Python since 2003.
He completed his PhD in bioinformatics and his BSc (Hons) in molecular and cell biology at The University of Melbourne and is currently a research fellow at Nanyang Technological University, Singapore. He is also an honorary fellow of The University of Melbourne, Australia. Maurice is the chief editor of Computational and Mathematical Biology and coeditor of The Python Papers. Recently, he cofounded AdvanceSyn Pte. Ltd., the first synthetic biology start-up in Singapore, where he serves as director and chief technology officer. His research interests lie in life itself, both biological and artificial, and in artificial intelligence, using computer science and statistics as tools to understand life and its numerous aspects. In his free time, Maurice likes to read, enjoy a cup of coffee, write his personal journal, or philosophize on various aspects of life. His website and LinkedIn profile are http://maurice.vodien.com and http://www.linkedin.com/in/mauriceling, respectively.

Ratanlal Mahanta is a senior quantitative analyst at GPSK Investment Group. He holds an MSc degree in computational finance and has 4 years of experience in quantitative trading and strategy development for sell-side and risk consultation firms. He is an expert in high-frequency and algorithmic trading.
He has expertise in the following areas:

• Quantitative trading: This includes FX, equities, futures, options, and derivatives engineering
• Algorithms: This includes partial differential equations, stochastic differential equations, the finite difference method, Monte Carlo methods, and machine learning
• Code: This includes R programming, C++, Python, MATLAB, HPC, and scientific computing
• Data analysis: This includes big data analytics (EOD to TBT), Bloomberg, Quandl, and Quantopian
• Strategies: This includes volatility arbitrage, vanilla and exotic options modeling, trend following, mean reversion, cointegration, Monte Carlo simulations, Value at Risk, stress testing, buy-side trading strategies with high Sharpe ratios, credit risk modeling, and credit rating

He has already reviewed Mastering Scientific Computing with R, Mastering R for Quantitative Finance, and Machine Learning with R Cookbook, all by Packt Publishing. You can find out more about him at https://twitter.com/mahantaratan.

Yingssu Tsai is a data scientist. She holds degrees from the University of California, Berkeley, and the University of California, Los Angeles.

www.PacktPub.com

Support files, eBooks, discount offers, and more

For support files and downloads related to your book, please visit www.PacktPub.com.

Did you know that Packt offers eBook versions of every book published, with PDF and ePub files available? You can upgrade to the eBook version at www.PacktPub.com, and as a print book customer, you are entitled to a discount on the eBook copy. Get in touch with Packt for more details.

At www.PacktPub.com, you can also read a collection of free technical articles, sign up for a range of free newsletters, and receive exclusive discounts and offers on Packt books and eBooks.

https://www2.packtpub.com/books/subscription/packtlib

Do you need instant solutions to your IT questions? PacktLib is Packt's online digital book library.
Here, you can search, access, and read Packt's entire library of books.

Why subscribe?

• Fully searchable across every book published by Packt
• Copy and paste, print, and bookmark content
• On demand and accessible via a web browser

Free access for Packt account holders

If you have an account with Packt at www.PacktPub.com, you can use this to access PacktLib today and view 9 entirely free books. Simply use your login credentials for immediate access.

Table of Contents

Preface
Chapter 1: Getting Started with Raw Data
The world of arrays with NumPy
Creating an array
Mathematical operations
Array subtraction
Squaring an array
A trigonometric function performed on the array
Conditional operations
Matrix multiplication
Indexing and slicing
Shape manipulation
Empowering data analysis with pandas
The data structure of pandas
Series
DataFrame
Panel
Inserting and exporting data
CSV
XLS
JSON
Database
Data cleansing
Checking the missing data
Filling the missing data
String operations
Merging data
Data operations
Aggregation operations
