
Probability for Data Scientists PDF

355 Pages·2019·9.859 MB·English

Probability for Data Scientists
1st Edition

Juana Sánchez
University of California, Los Angeles

SAN DIEGO

Bassim Hamadeh, CEO and Publisher
Mieka Portier, Acquisitions Editor
Tony Paese, Project Editor
Sean Adams, Production Editor
Jess Estrella, Senior Graphic Designer
Alexa Lucido, Licensing Associate
Susana Christie, Developmental Editor
Natalie Piccotti, Senior Marketing Manager
Kassie Graves, Vice President of Editorial
Jamie Giganti, Director of Academic Publishing

Copyright © 2020 by Cognella, Inc. All rights reserved. No part of this publication may be reprinted, reproduced, transmitted, or utilized in any form or by any electronic, mechanical, or other means, now known or hereafter invented, including photocopying, microfilming, and recording, or in any information retrieval system without the written permission of Cognella, Inc. For inquiries regarding permissions, translations, foreign rights, audio rights, and any other forms of reproduction, please contact the Cognella Licensing Department at [email protected].

Trademark Notice: Product or corporate names may be trademarks or registered trademarks, and are used only for identification and explanation without intent to infringe.

Cover image and interior image copyright © 2018 Depositphotos/SergeyNivens; © 2017 Depositphotos/rfphoto; © 2015 Depositphotos/creisinger; © 2014 Depositphotos/Neode; © 2013 Depositphotos/branex; © 2013 Depositphotos/vitstudio; © 2012 Depositphotos/oconner; © 2012 Depositphotos/scanrail; © 2016 Depositphotos/lamnee; © 2012 Depositphotos/shirophoto.

Printed in the United States of America.

3970 Sorrento Valley Blvd., Ste. 500, San Diego, CA 92121

Contents

PREFACE
Part 1. Probability in Discrete Sample Spaces
1 An Overview of the Origins of the Mathematical Theory of Probability
2 Building Blocks of Modern Probability Modeling
3 Rational Use of Probability in Data Science
4 Sampling and Repeated Trials
5 Probability Models for a Single Discrete Random Variable
6 Probability Models for More Than One Discrete Random Variable
Part 2. Probability in Continuous Sample Spaces
7 Infinite and Continuous Sample Spaces
8 Models for More Than One Continuous Random Variable
9 Some Theorems of Probability and Their Application in Statistics
10 How All of the Above Gets Used in Unsuspected Applications

Detailed Contents

PREFACE
Part 1. Probability in Discrete Sample Spaces
1 An Overview of the Origins of the Mathematical Theory of Probability
  1.1 Measuring uncertainty
    1.1.1 Where do probabilities come from?
    1.1.2 Exercises
  1.2 When mathematics met probability
    1.2.1 It all started with long (repeated) observations (experiments) that did not conform with our intuition
    1.2.2 Exercises
    1.2.3 Historical empirical facts that puzzled gamblers and mathematicians alike in the seventeenth century
    1.2.4 Experiments to reconcile facts and intuition. Maybe the model is wrong
    1.2.5 Exercises
    1.2.6 The Law of large numbers and the frequentist definition of probability
    1.2.7 Exercises
  1.3 Classical definition of probability. How gamblers and mathematicians in the seventeenth century reconciled observation with intuition.
    1.3.1 The status of probability studies before Kolmogorov
    1.3.2 Kolmogorov Axioms of Probability and modern probability
  1.4 Probability modeling in data science
  1.5 Probability is not just about games of chance and balls in urns
  1.6 Mini quiz
  1.7 R code
    1.7.1 Simulating roll of three dice
    1.7.2 Simulating roll of two dice
  1.8 Chapter Exercises
  1.9 Chapter References
2 Building Blocks of Modern Probability Modeling
  2.1 Learning the vocabulary of probability: experiments, sample spaces, and events.
    2.1.1 Exercises
  2.2 Sets
    2.2.1 Exercises
  2.3 The sample space
    2.3.1 A note of caution
    2.3.2 Exercises
  2.4 Events
  2.5 Event operations
  2.6 Algebra of events
    2.6.1 Exercises
  2.7 Probability of events
  2.8 Mini quiz
  2.9 R code
  2.10 Chapter Exercises
  2.11 Chapter References
3 Rational Use of Probability in Data Science
  3.1 Modern mathematical approach to probability theory
    3.1.1 Properties of a probability function
    3.1.2 Exercises
  3.2 Calculating the probability of events when the probability of the outcomes in the sample space is known
    3.2.1 Exercises
  3.3 Independence of events. Product rule for joint occurrence of independent events
    3.3.1 Exercises
  3.4 Conditional Probability
    3.4.1 An aid: Using two-way tables of counts or proportions to visualize conditional probability
    3.4.2 An aid: Tree diagrams to visualize a sequence of events
    3.4.3 Constructing a two way table of joint probabilities from a tree
    3.4.4 Conditional probabilities satisfy axioms of probability and have the same properties as unconditional probabilities
    3.4.5 Conditional probabilities extended to more than two events
    3.4.6 Exercises
  3.5 Law of total probability
    3.5.1 Exercises
  3.6 Bayes theorem
    3.6.1 Bayes Theorem
    3.6.2 Exercises
  3.7 Mini quiz
  3.8 R code
    3.8.1 Finding probabilities of matching
    3.8.2 Exercises
  3.9 Chapter Exercises
  3.10 Chapter References
4 Sampling and Repeated Trials
  4.1 Sampling
    4.1.1 n-tuples
    4.1.2 A prototype model for sampling from a finite population
    4.1.3 Sets or samples?
    4.1.4 An application of an urn model in computer science
    4.1.5 Exercises
    4.1.6 An application of urn sampling models in physics
  4.2 Inquiring about diversity
    4.2.1 The number of successes in a sample. General approach
    4.2.2 The difference between k successes and successes in k specified draws
  4.3 Independent trials of an experiment
    4.3.1 Independent Bernoulli Trials
    4.3.2 Exercises
  4.4 Mini Quiz
  4.5 R corner
    R exercise Birthdays.
  4.6 Chapter Exercises
  4.7 Chapter References
SIMULATION: Computing the Probabilities of Matching Birthdays
  The birthday matching problem
  The solution using basic probability
  The solution using simulation
  Testing assumptions
  Using R statistical software
  Summary comments on simulation
  Chapter References
5 Probability Models for a Single Discrete Random Variable
  5.1 New representation of a familiar problem
  5.2 Random variables
    5.2.1 The probability mass function of a discrete random variable
    5.2.2 The cumulative distribution function of a discrete random variable
    5.2.3 Functions of a discrete random variable
    5.2.4 Exercises
  5.3 Expected value, variance, standard deviation and median of a discrete random variable
    5.3.1 The expected value of a discrete random variable
    5.3.2 The expected value of a function of a discrete random variable
    5.3.3 The variance and standard deviation of a discrete random variable
    5.3.4 The moment generating function of a discrete random variable
    5.3.5 The median of a discrete random variable
    5.3.6 Variance of a function of a discrete random variable
    5.3.7 Exercises
  5.4 Properties of the expected value and variance of a linear function of a discrete random variable
    5.4.1 Short-cut formula for the variance of a random variable
    5.4.2 Exercises
  5.5 Expectation and variance of sums of independent random variables
    5.5.1 Exercises
  5.6 Named discrete random variables, their expectations, variances and moment generating functions
  5.7 Discrete uniform random variable
  5.8 Bernoulli random variable
    5.8.1 Exercises
  5.9 Binomial random variable
    5.9.1 Applicability of the Binomial probability mass function in Statistics
    5.9.2 Exercises
  5.10 The geometric random variable
    5.10.1 Exercises
  5.11 Negative Binomial random variable
    5.11.1 Exercises
  5.12 The hypergeometric distribution
    5.12.1 Exercises
  5.13 When to use binomial, when to use hypergeometric? When to assume independence in sampling?
    5.13.1 Implications for data science
  5.14 The Poisson random variable
    5.14.1 Exercises
  5.15 The choice of probability models in data science
    5.15.1 Zipf laws and the Internet. Scalability. Heavy tails distributions.
  5.16 Mini quiz
  5.17 R code
  5.18 Chapter Exercises
  5.19 Chapter References
6 Probability Models for More Than One Discrete Random Variable
  6.1 Joint probability mass functions
    6.1.1 Example
    6.1.1 Exercises
  6.2 Marginal or total probability mass functions
    6.2.1 Exercises
  6.3 Independence of two discrete random variables
    6.3.1 Exercises
  6.4 Conditional probability mass functions
    6.4.1 Exercises
  6.5 Expectation of functions of two random variables
    6.5.1 Exercises
  6.6 Covariance and Correlation
    6.6.1 Alternative computation of the covariance
    6.6.2 The correlation coefficient. Rescaling the covariance
    6.6.3 Exercises
  6.7 Linear combination of two random variables. Breaking down the problem into simpler components
    6.7.1 Exercises
  6.8 Covariance between linear functions of the random variables
  6.9 Joint distributions of independent named random variables. Applications in mathematical statistics
  6.10 The multinomial probability mass function
    6.10.1 Exercises
  6.11 Mini quiz
  6.12 Chapter Exercises
  6.13 Chapter References