
Introduction to statistical thought PDF

393 Pages·2007·5.701 MB·English

Preview Introduction to statistical thought

Introduction to Statistical Thought
Michael Lavine
November 11, 2007

Copyright © 2005 by Michael Lavine

Contents

List of Figures
List of Tables
Preface

1 Probability
  1.1 Basic Probability
  1.2 Probability Densities
  1.3 Parametric Families of Distributions
    1.3.1 The Binomial Distribution
    1.3.2 The Poisson Distribution
    1.3.3 The Exponential Distribution
    1.3.4 The Normal Distribution
  1.4 Centers, Spreads, Means, and Moments
  1.5 Joint, Marginal and Conditional Probability
  1.6 Association, Dependence, Independence
  1.7 Simulation
    1.7.1 Calculating Probabilities
    1.7.2 Evaluating Statistical Procedures
  1.8 R
  1.9 Some Results for Large Samples
  1.10 Exercises

2 Modes of Inference
  2.1 Data
  2.2 Data Description
    2.2.1 Summary Statistics
    2.2.2 Displaying Distributions
    2.2.3 Exploring Relationships
  2.3 Likelihood
    2.3.1 The Likelihood Function
    2.3.2 Likelihoods from the Central Limit Theorem
    2.3.3 Likelihoods for several parameters
  2.4 Estimation
    2.4.1 The Maximum Likelihood Estimate
    2.4.2 Accuracy of Estimation
    2.4.3 The sampling distribution of an estimator
  2.5 Bayesian Inference
  2.6 Prediction
  2.7 Hypothesis Testing
  2.8 Exercises

3 Regression
  3.1 Introduction
  3.2 Normal Linear Models
    3.2.1 Introduction
    3.2.2 Inference for Linear Models
  3.3 Generalized Linear Models
    3.3.1 Logistic Regression
    3.3.2 Poisson Regression
  3.4 Predictions from Regression
  3.5 Exercises

4 More Probability
  4.1 More Probability Density
  4.2 Random Vectors
    4.2.1 Densities of Random Vectors
    4.2.2 Moments of Random Vectors
    4.2.3 Functions of Random Vectors
  4.3 Representing Distributions
  4.4 Exercises

5 Special Distributions
  5.1 Binomial and Negative Binomial
  5.2 Multinomial
  5.3 Poisson
  5.4 Uniform
  5.5 Gamma, Exponential, Chi Square
  5.6 Beta
  5.7 Normal
    5.7.1 The Univariate Normal Distribution
    5.7.2 The Multivariate Normal Distribution
  5.8 t and F
    5.8.1 The t distribution
    5.8.2 The F distribution
  5.9 Exercises

6 More Models
  6.1 Hierarchical Models
  6.2 Time Series and Markov Chains
  6.3 Contingency Tables
  6.4 Survival analysis
  6.5 The Poisson process
  6.6 Change point models
  6.7 Spatial models
  6.8 Point Process Models
  6.9 Evaluating and enhancing models
  6.10 Exercises

7 Mathematical Statistics
  7.1 Properties of Statistics
    7.1.1 Sufficiency
    7.1.2 Consistency, Bias, and Mean-squared Error
    7.1.3 Efficiency
    7.1.4 Asymptotic Normality
    7.1.5 Robustness
  7.2 Transformations of Parameters
  7.3 Information
  7.4 More Hypothesis Testing
    7.4.1 p values
    7.4.2 The Likelihood Ratio Test
    7.4.3 The Chi Square Test
    7.4.4 Power
  7.5 Exponential families
  7.6 Location and Scale Families
  7.7 Functionals
  7.8 Invariance
  7.9 Asymptotics
  7.10 Exercises

Bibliography

List of Figures

1.1 pdf for time on hold at Help Line
1.2 p_Y for the outcome of a spinner
1.3 (a): Ocean temperatures; (b): Important discoveries
1.4 Change of variables
1.5 Binomial probabilities
1.6 P[X = 3|λ] as a function of λ
1.7 Exponential densities
1.8 Normal densities
1.9 Ocean temperatures at 45°N, 30°W, 1000m depth
1.10 Normal samples and Normal densities
1.11 hydrographic stations off the coast of Europe and Africa
1.12 Water temperatures
1.13 Two pdf's with ±1 and ±2 SD's.
1.14 Water temperatures with standard deviations
1.15 Permissible values of N and X
1.16 Features of the joint distribution of (X,Y)
1.17 Lengths and widths of sepals and petals of 150 iris plants
1.18 correlations
1.19 1000 simulations of θ̂ for n.sim = 50, 200, 1000
1.20 1000 simulations of θ̂ under three procedures
1.21 Monthly concentrations of CO₂ at Mauna Loa
1.22 1000 simulations of a FACE experiment
1.23 Histograms of craps simulations

2.1 quantiles
2.2 Histograms of tooth growth
2.3 Histograms of tooth growth
2.4 Histograms of tooth growth
2.5 calorie contents of beef hot dogs
2.6 Strip chart of tooth growth
2.7 Quiz scores from Statistics 103
2.8 QQ plots of water temperatures (°C) at 1000m depth
2.9 Mosaic plot of UCBAdmissions
2.10 Mosaic plot of UCBAdmissions
2.11 Old Faithful data.
2.12 Waiting time versus duration in the Old Faithful dataset
2.13 Time series of duration and waiting time at Old Faithful
2.14 Time series of duration and waiting time at Old Faithful
2.15 Temperature versus latitude for different values of longitude
2.16 Temperature versus longitude for different values of latitude
2.17 Spike train from a neuron during a taste experiment. The dots show the times at which the neuron fired. The solid lines show times at which the rat received a drop of a .3 M solution of NaCl.
2.18 Likelihood function for the proportion of red cars
2.19 ℓ(θ) after Σ y_i = 40 in 60 quadrats.
2.20 Likelihood for Slater School
2.21 Marginal and exact likelihoods for Slater School
2.22 Marginal likelihood for mean CEO salary
2.23 FACE Experiment: data and likelihood
2.24 Likelihood function for Quiz Scores
2.25 Log of the likelihood function for (λ, θ_f) in Example 2.13
2.26 Likelihood function for the probability of winning craps
2.27 Sampling distribution of the sample mean and median
2.28 Histograms of the sample mean for samples from Bin(n,.1)
2.29 Prior, likelihood and posterior in the seedlings example
2.30 Prior, likelihood and posterior densities for λ with n = 1,4,16
2.31 Prior, likelihood and posterior densities for λ with n = 60
2.32 Prior, likelihood and posterior density for Slater School
2.33 Plug-in predictive distribution for seedlings
2.34 Predictive distributions for seedlings after n = 0,1,60
2.35 pdf of the Bin(100,.5) distribution
2.36 pdfs of the Bin(100,.5) (dots) and N(50,5) (line) distributions
2.37 Approximate density of summary statistic t
2.38 Number of times baboon father helps own child
2.39 Histogram of simulated values of w.tot

3.1 Four regression examples
3.2 1970 draft lottery. Draft number vs. day of year
3.3 Draft number vs. day of year with smoothers
3.4 Total number of New seedlings 1993 – 1997, by quadrat.
3.5 Calorie content of hot dogs
3.6 Density estimates of calorie contents of hot dogs
3.7 The PlantGrowth data
3.8 Ice cream consumption versus mean temperature
3.9 Likelihood functions for (µ, δ_M, δ_P) in the Hot Dog example.
3.10 pairs plot of the mtcars data
3.11 mtcars — various plots
3.12 likelihood functions for β_1, γ_1, δ_1 and δ_2 in the mtcars example.
3.13 Pine cones and O-rings
3.14 Pine cones and O-rings with regression curves
3.15 Likelihood function for the pine cone data
3.16 Actual vs. fitted and residuals vs. fitted for the seedling data
3.17 Diagnostic plots for the seedling data
3.18 Actual mpg and fitted values from three models
3.19 Happiness Quotient of bankers and poets

4.1 The (X_1, X_2) plane and the (Y_1, Y_2) plane
4.2 pmf's, pdf's, and cdf's

5.1 The Binomial pmf
5.2 The Negative Binomial pmf
5.3 Poisson pmf for λ = 1,4,16,64
5.4 Rutherford and Geiger's Figure 1
5.5 Numbers of firings of a neuron in 150 msec after five different tastants. Tastants: 1=MSG .1M; 2=MSG .3M; 3=NaCl .1M; 4=NaCl .3M; 5=water. Panels: A: A stripchart. Each circle represents one delivery of a tastant. B: A mosaic plot. C: Each line represents one tastant. D: Likelihood functions. Each line represents one tastant.
5.6 The line shows Poisson probabilities for λ = 0.2; the circles show the fraction of times the neuron responded with 0, 1, ..., 5 spikes for each of the five tastants.
5.7 Gamma densities
5.8 Exponential densities
5.9 Beta densities
5.10 Water temperatures (°C) at 1000m depth
5.11 Bivariate Normal density
5.12 Bivariate Normal density
5.13 t densities for four degrees of freedom and the N(0,1) density

6.1 Graphical representation of hierarchical model for fMRI
6.2 Some time series
6.3 Y_{t+1} vs. Y_t for the Beaver and Presidents data sets
6.4 Y_{t+k} vs. Y_t for the Beaver data set and lags 0–5
6.5 coplot of Y_{t+1} ∼ Y_{t−1} | Y_t for the Beaver data set
6.6 Fit of CO₂ data
6.7 DAX closing prices
6.8 DAX returns

7.1 The Be(.39,.01) density
7.2 Densities of Ȳ_in
7.3 Densities of Z_in
