Data Science Fundamentals Pocket Primer PDF

451 Pages·2021·4.253 MB·English

Preview Data Science Fundamentals Pocket Primer

Data Science Fundamentals Pocket Primer

LICENSE, DISCLAIMER OF LIABILITY, AND LIMITED WARRANTY

By purchasing or using this book and companion files (the “Work”), you agree that this license grants permission to use the contents contained herein, including the disc, but does not give you the right of ownership to any of the textual content in the book / disc or ownership to any of the information or products contained in it. This license does not permit uploading of the Work onto the Internet or on a network (of any kind) without the written consent of the Publisher. Duplication or dissemination of any text, code, simulations, images, etc. contained herein is limited to and subject to licensing terms for the respective products, and permission must be obtained from the Publisher or the owner of the content, etc., in order to reproduce or network any portion of the textual material (in any media) that is contained in the Work.

Mercury Learning and Information (“MLI” or “the Publisher”) and anyone involved in the creation, writing, or production of the companion disc, accompanying algorithms, code, or computer programs (“the software”), and any accompanying Web site or software of the Work, cannot and do not warrant the performance or results that might be obtained by using the contents of the Work.

The author, developers, and the Publisher have used their best efforts to ensure the accuracy and functionality of the textual material and/or programs contained in this package; we, however, make no warranty of any kind, express or implied, regarding the performance of these contents or programs. The Work is sold “as is” without warranty (except for defective materials used in manufacturing the book or due to faulty workmanship).
The author, developers, and the publisher of any accompanying content, and anyone involved in the composition, production, and manufacturing of this work will not be liable for damages of any kind arising out of the use of (or the inability to use) the algorithms, source code, computer programs, or textual material contained in this publication. This includes, but is not limited to, loss of revenue or profit, or other incidental, physical, or consequential damages arising out of the use of this Work. The sole remedy in the event of a claim of any kind is expressly limited to replacement of the book and/or disc, and only at the discretion of the Publisher. The use of “implied warranty” and certain “exclusions” vary from state to state, and might not apply to the purchaser of this product.

Companion files for this title are available by writing to the publisher at [email protected].

Data Science Fundamentals Pocket Primer

Oswald Campesato

Mercury Learning and Information
Dulles, Virginia
Boston, Massachusetts
New Delhi

Copyright ©2021 by Mercury Learning and Information LLC. All rights reserved.

This publication, portions of it, or any accompanying software may not be reproduced in any way, stored in a retrieval system of any type, or transmitted by any means, media, electronic display or mechanical display, including, but not limited to, photocopy, recording, Internet postings, or scanning, without prior permission in writing from the publisher.

Publisher: David Pallai
Mercury Learning and Information
22841 Quicksilver Drive
Dulles, VA 20166
[email protected]
www.merclearning.com
800-232-0223

O. Campesato. Data Science Fundamentals Pocket Primer.
ISBN: 978-1-68392-733-4

The publisher recognizes and respects all marks used by companies, manufacturers, and developers as a means to distinguish their products. All brand names and product names mentioned in this book are trademarks or service marks of their respective companies.
Any omission or misuse (of any kind) of service marks or trademarks, etc. is not an attempt to infringe on the property of others.

Library of Congress Control Number: 2021937777
212223321

This book is printed on acid-free paper in the United States of America.

Our titles are available for adoption, license, or bulk purchase by institutions, corporations, etc. For additional information, please contact the Customer Service Dept. at 800-232-0223 (toll free).

All of our titles are available in digital format at academiccourseware.com and other digital vendors. Companion files (figures and code listings) for this title are available by contacting info@merclearning.com. The sole obligation of Mercury Learning and Information to the purchaser is to replace the disc, based on defective materials or faulty workmanship, but not based on the operation or functionality of the product.

I’d like to dedicate this book to my parents – may this bring joy and happiness into their lives.

Contents

Preface

Chapter 1 Working With Data
  What are Datasets?
  Data Preprocessing
  Data Types
  Preparing Datasets
  Discrete Data Versus Continuous Data
  “Binning” Continuous Data
  Scaling Numeric Data via Normalization
  Scaling Numeric Data via Standardization
  What to Look for in Categorical Data
  Mapping Categorical Data to Numeric Values
  Working with Dates
  Working with Currency
  Missing Data, Anomalies, and Outliers
  Missing Data
  Anomalies and Outliers
  Outlier Detection
  What is Data Drift?
  What is Imbalanced Classification?
  What is SMOTE?
  SMOTE Extensions
  Analyzing Classifiers (Optional)
  What is LIME?
  What is ANOVA?
  The Bias-Variance Trade-Off
  Types of Bias in Data
  Summary

Chapter 2 Intro to Probability and Statistics
  What is a Probability?
  Calculating the Expected Value
  Random Variables
  Discrete versus Continuous Random Variables
  Well-Known Probability Distributions
  Fundamental Concepts in Statistics
  The Mean
  The Median
  The Mode
  The Variance and Standard Deviation
  Population, Sample, and Population Variance
  Chebyshev’s Inequality
  What is a P-Value?
  The Moments of a Function (Optional)
  What is Skewness?
  What is Kurtosis?
  Data and Statistics
  The Central Limit Theorem
  Correlation versus Causation
  Statistical Inferences
  Statistical Terms – RSS, TSS, R^2, and F1 Score
  What is an F1 Score?
  Gini Impurity, Entropy, and Perplexity
  What is the Gini Impurity?
  What is Entropy?
  Calculating Gini Impurity and Entropy Values
  Multidimensional Gini Index
  What is Perplexity?
  Cross-Entropy and KL Divergence
  What is Cross-Entropy?
  What is KL Divergence?
  What’s their Purpose?
  Covariance and Correlation Matrices
  The Covariance Matrix
  Covariance Matrix: An Example
  The Correlation Matrix
  Eigenvalues and Eigenvectors
  Calculating Eigenvectors: A Simple Example
  Gauss Jordan Elimination (Optional)
  PCA (Principal Component Analysis)
  The New Matrix of Eigenvectors
  Well-Known Distance Metrics
  Pearson Correlation Coefficient
  Jaccard Index (or Similarity)
  Local Sensitivity Hashing (Optional)
  Types of Distance Metrics
  What is Bayesian Inference?
  Bayes’ Theorem
  Some Bayesian Terminology
  What is MAP?
  Why Use Bayes’ Theorem?
  Summary

Chapter 3 Linear Algebra Concepts
  What is Linear Algebra?
  What are Vectors?
  The Norm of a Vector
  The Inner Product of Two Vectors
  The Cosine Similarity of Two Vectors
  Bases and Spanning Sets
  Three Dimensional Vectors and Beyond
  What are Matrices?
  Add and Multiply Matrices
  The Determinant of a Square Matrix
  Well-Known Matrices
  Properties of Orthogonal Matrices
  Operations Involving Vectors and Matrices
  Gauss Jordan Elimination (Optional)
  Covariance and Correlation Matrices
  The Covariance Matrix
  Covariance Matrix: An Example
  The Correlation Matrix
  Eigenvalues and Eigenvectors
  Calculating Eigenvectors: A Simple Example
  What is PCA (Principal Component Analysis)?
  The Main Steps in PCA
  The New Matrix of Eigenvectors
  Dimensionality Reduction
  Dimensionality Reduction Techniques
  The Curse of Dimensionality
  SVD (Singular Value Decomposition)
  LLE (Locally Linear Embedding)
  UMAP
  t-SNE
  PHATE
  Linear Versus Non-Linear Reduction Techniques
  Complex Numbers (Optional)
  Complex Numbers on the Unit Circle
  Complex Conjugate Root Theorem
  Hermitian Matrices
  Summary
