ebook img

Foundations for Analytics with Python: From Non-Programmer to Hacker PDF

351 Pages·2016·17.2 MB·English
Save to my drive
Quick download
Download
Most books are stored in the elastic cloud where traffic is expensive. For this reason, we have a limit on daily download.

Preview Foundations for Analytics with Python: From Non-Programmer to Hacker

Foundations for Analytics with Python Clinton W. Brownley BBeeiijjiinngg BBoossttoonn FFaarrnnhhaamm SSeebbaassttooppooll TTookkyyoo For Aisha and Amaya, “Education is the kindling of a flame, not the filling of a vessel.” —Socrates May you always enjoy stoking the fire. Foundations for Analytics with Python by Clinton W. Brownley Copyright © 2016 Clinton Brownley. All rights reserved. Printed in the United States of America. Published by O’Reilly Media, Inc., 1005 Gravenstein Highway North, Sebastopol, CA 95472. O’Reilly books may be purchased for educational, business, or sales promotional use. Online editions are also available for most titles (http://safaribooksonline.com). For more information, contact our corporate/ institutional sales department: 800-998-9938 or [email protected]. Editors: Laurel Ruma and Tim McGovern Indexer: Judith McConville Production Editor: Colleen Cole Interior Designer: David Futato Copyeditor: Jasmine Kwityn Cover Designer: Karen Montgomery Proofreader: Rachel Head Illustrator: Rebecca Demarest August 2016: First Edition Revision History for the First Edition 2016-08-10: First Release See http://oreilly.com/catalog/errata.csp?isbn=9781491922538 for release details. The O’Reilly logo is a registered trademark of O’Reilly Media, Inc. Foundations for Analytics with Python, the cover image, and related trade dress are trademarks of O’Reilly Media, Inc. While the publisher and the author have used good faith efforts to ensure that the information and instructions contained in this work are accurate, the publisher and the author disclaim all responsibility for errors or omissions, including without limitation responsibility for damages resulting from the use of or reliance on this work. Use of the information and instructions contained in this work is at your own risk. If any code samples or other technology this work contains or describes is subject to open source licenses or the intellectual property rights of others, it is your responsibility to ensure that your use thereof complies with such licenses and/or rights. 978-1-491-92253-8 [LSI] Table of Contents Preface. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . ix 1. Python Basics. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1 How to Create a Python Script 1 How to Run a Python Script 4 Useful Tips for Interacting with the Command Line 7 Python’s Basic Building Blocks 11 Numbers 12 Strings 14 Regular Expressions and Pattern Matching 19 Dates 22 Lists 25 Tuples 31 Dictionaries 32 Control Flow 37 Reading a Text File 44 Create a Text File 44 Script and Input File in Same Location 47 Modern File-Reading Syntax 47 Reading Multiple Text Files with glob 48 Create Another Text File 49 Writing to a Text File 52 Add Code to first_script.py 53 Writing to a Comma-Separated Values (CSV) File 55 print Statements 57 Chapter Exercises 58 v 2. Comma-Separated Values (CSV) Files. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 59 Base Python Versus pandas 61 Read and Write a CSV File (Part 1) 62 How Basic String Parsing Can Fail 69 Read and Write a CSV File (Part 2) 70 Filter for Specific Rows 72 Value in Row Meets a Condition 73 Value in Row Is in a Set of Interest 75 Value in Row Matches a Pattern/Regular Expression 77 Select Specific Columns 79 Column Index Values 79 Column Headings 81 Select Contiguous Rows 83 Add a Header Row 86 Reading Multiple CSV Files 88 Count Number of Files and Number of Rows and Columns in Each File 90 Concatenate Data from Multiple Files 93 Sum and Average a Set of Values per File 97 Chapter Exercises 100 3. Excel Files. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 101 Introspecting an Excel Workbook 104 Processing a Single Worksheet 109 Read and Write an Excel File 109 Filter for Specific Rows 113 Select Specific Columns 120 Reading All Worksheets in a Workbook 124 Filter for Specific Rows Across All Worksheets 124 Select Specific Columns Across All Worksheets 127 Reading a Set of Worksheets in an Excel Workbook 129 Filter for Specific Rows Across a Set of Worksheets 129 Processing Multiple Workbooks 132 Count Number of Workbooks and Rows and Columns in Each Workbook 134 Concatenate Data from Multiple Workbooks 136 Sum and Average Values per Workbook and Worksheet 138 Chapter Exercises 142 4. Databases. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 143 Python’s Built-in sqlite3 Module 145 Insert New Records into a Table 151 Update Records in a Table 156 MySQL Database 160 vi | Table of Contents Insert New Records into a Table 165 Query a Table and Write Output to a CSV File 170 Update Records in a Table 172 Chapter Exercises 177 5. Applications. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 179 Find a Set of Items in a Large Collection of Files 179 Calculate a Statistic for Any Number of Categories from Data in a CSV File 192 Calculate Statistics for Any Number of Categories from Data in a Text File 204 Chapter Exercises 213 6. Figures and Plots. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 215 matplotlib 215 Bar Plot 216 Histogram 218 Line Plot 220 Scatter Plot 222 Box Plot 224 pandas 226 ggplot 227 seaborn 231 7. Descriptive Statistics and Modeling. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 239 Datasets 239 Wine Quality 239 Customer Churn 240 Wine Quality 241 Descriptive Statistics 241 Grouping, Histograms, and t-tests 243 Pairwise Relationships and Correlation 244 Linear Regression with Least-Squares Estimation 247 Interpreting Coefficients 249 Standardizing Independent Variables 249 Making Predictions 251 Customer Churn 252 Logistic Regression 255 Interpreting Coefficients 257 Making Predictions 259 8. Scheduling Scripts to Run Automatically. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 261 Task Scheduler (Windows) 261 The cron Utility (macOS and Unix) 270 Table of Contents | vii Crontab File: One-Time Set-up 271 Adding Cron Jobs to the Crontab File 273 9. Where to Go from Here. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 277 Additional Standard Library Modules and Built-in Functions 278 Python Standard Library (PSL): A Few More Standard Modules 278 Built-in Functions 279 Python Package Index (PyPI): Additional Add-in Modules 280 NumPy 280 SciPy 286 Scikit-Learn 290 A Few Additional Add-in Packages 292 Additional Data Structures 293 Stacks 293 Queues 294 Graphs 294 Trees 295 Where to Go from Here 295 A. Download Instructions. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 299 B. Answers to Exercises. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 311 Bibliography. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 313 Index. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 315 viii | Table of Contents

Description:
If you’re like many of Excel’s 750 million users, you want to do more with your data—like repeating similar analyses over hundreds of files, or combining data in many files for analysis at one time. This practical guide shows ambitious non-programmers how to automate and scale the processing a
See more

The list of books you might like

Most books are stored in the elastic cloud where traffic is expensive. For this reason, we have a limit on daily download.