Statistics and Machine Learning Toolbox™
User's Guide
R2016a
How to Contact MathWorks
Latest news: www.mathworks.com
Sales and services: www.mathworks.com/sales_and_services
User community: www.mathworks.com/matlabcentral
Technical support: www.mathworks.com/support/contact_us
Phone: 508-647-7000
The MathWorks, Inc.
3 Apple Hill Drive
Natick, MA 01760-2098
Statistics and Machine Learning Toolbox™ User's Guide
© COPYRIGHT 1993–2016 by The MathWorks, Inc.
The software described in this document is furnished under a license agreement. The software may be used
or copied only under the terms of the license agreement. No part of this manual may be photocopied or
reproduced in any form without prior written consent from The MathWorks, Inc.
FEDERAL ACQUISITION: This provision applies to all acquisitions of the Program and Documentation
by, for, or through the federal government of the United States. By accepting delivery of the Program
or Documentation, the government hereby agrees that this software or documentation qualifies as
commercial computer software or commercial computer software documentation as such terms are used
or defined in FAR 12.212, DFARS Part 227.72, and DFARS 252.227-7014. Accordingly, the terms and
conditions of this Agreement and only those rights specified in this Agreement, shall pertain to and
govern the use, modification, reproduction, release, performance, display, and disclosure of the Program
and Documentation by the federal government (or other entity acquiring for or through the federal
government) and shall supersede any conflicting contractual terms or conditions. If this License fails
to meet the government's needs or is inconsistent in any respect with federal procurement law, the
government agrees to return the Program and Documentation, unused, to The MathWorks, Inc.
Trademarks
MATLAB and Simulink are registered trademarks of The MathWorks, Inc. See
www.mathworks.com/trademarks for a list of additional trademarks. Other product or brand
names may be trademarks or registered trademarks of their respective holders.
Patents
MathWorks products are protected by one or more U.S. patents. Please see
www.mathworks.com/patents for more information.
Revision History
September 1993 First printing Version 1.0
March 1996 Second printing Version 2.0
January 1997 Third printing Version 2.11
November 2000 Fourth printing Revised for Version 3.0 (Release 12)
May 2001 Fifth printing Minor revisions
July 2002 Sixth printing Revised for Version 4.0 (Release 13)
February 2003 Online only Revised for Version 4.1 (Release 13.0.1)
June 2004 Seventh printing Revised for Version 5.0 (Release 14)
October 2004 Online only Revised for Version 5.0.1 (Release 14SP1)
March 2005 Online only Revised for Version 5.0.2 (Release 14SP2)
September 2005 Online only Revised for Version 5.1 (Release 14SP3)
March 2006 Online only Revised for Version 5.2 (Release 2006a)
September 2006 Online only Revised for Version 5.3 (Release 2006b)
March 2007 Eighth printing Revised for Version 6.0 (Release 2007a)
September 2007 Ninth printing Revised for Version 6.1 (Release 2007b)
March 2008 Online only Revised for Version 6.2 (Release 2008a)
October 2008 Online only Revised for Version 7.0 (Release 2008b)
March 2009 Online only Revised for Version 7.1 (Release 2009a)
September 2009 Online only Revised for Version 7.2 (Release 2009b)
March 2010 Online only Revised for Version 7.3 (Release 2010a)
September 2010 Online only Revised for Version 7.4 (Release 2010b)
April 2011 Online only Revised for Version 7.5 (Release 2011a)
September 2011 Online only Revised for Version 7.6 (Release 2011b)
March 2012 Online only Revised for Version 8.0 (Release 2012a)
September 2012 Online only Revised for Version 8.1 (Release 2012b)
March 2013 Online only Revised for Version 8.2 (Release 2013a)
September 2013 Online only Revised for Version 8.3 (Release 2013b)
March 2014 Online only Revised for Version 9.0 (Release 2014a)
October 2014 Online only Revised for Version 9.1 (Release 2014b)
March 2015 Online only Revised for Version 10.0 (Release 2015a)
September 2015 Online only Revised for Version 10.1 (Release 2015b)
March 2016 Online only Revised for Version 10.2 (Release 2016a)
Contents
Getting Started
1
Statistics and Machine Learning Toolbox Product
Description . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1-2
Key Features . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1-2
Supported Data Types . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1-3
Statistics and Machine Learning Toolbox Functions with
gpuArray Arguments . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1-5
Organizing Data
2
Other MATLAB Functions Supporting Nominal and Ordinal
Arrays . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 2-3
Create Nominal and Ordinal Arrays . . . . . . . . . . . . . . . . . . . . 2-4
Create Nominal Arrays . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 2-4
Create Ordinal Arrays . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 2-6
Change Category Labels . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 2-10
Reorder Category Levels . . . . . . . . . . . . . . . . . . . . . . . . . . . . 2-12
Reorder Category Levels in Ordinal Arrays . . . . . . . . . . . . . 2-12
Reorder Category Levels in Nominal Arrays . . . . . . . . . . . . 2-13
Categorize Numeric Data . . . . . . . . . . . . . . . . . . . . . . . . . . . . 2-18
Merge Category Levels . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 2-22
Add and Drop Category Levels . . . . . . . . . . . . . . . . . . . . . . . 2-24
Plot Data Grouped by Category . . . . . . . . . . . . . . . . . . . . . . . 2-28
Test Differences Between Category Means . . . . . . . . . . . . . 2-33
Summary Statistics Grouped by Category . . . . . . . . . . . . . . 2-42
Sort Ordinal Arrays . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 2-44
Categorical Arrays . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 2-46
What Are Categorical Arrays? . . . . . . . . . . . . . . . . . . . . . . . 2-46
Categorical Array Conversion . . . . . . . . . . . . . . . . . . . . . . . 2-46
Advantages of Using Categorical Arrays . . . . . . . . . . . . . . . 2-48
Manipulate Category Levels . . . . . . . . . . . . . . . . . . . . . . . . 2-48
Analysis Using Categorical Arrays . . . . . . . . . . . . . . . . . . . 2-48
Reduce Memory Requirements . . . . . . . . . . . . . . . . . . . . . . 2-49
Index and Search Using Categorical Arrays . . . . . . . . . . . . 2-51
Index By Category . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 2-51
Common Indexing and Searching Methods . . . . . . . . . . . . . 2-51
Grouping Variables . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 2-56
What Are Grouping Variables? . . . . . . . . . . . . . . . . . . . . . . 2-56
Group Definition . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 2-57
Analysis Using Grouping Variables . . . . . . . . . . . . . . . . . . . 2-57
Missing Group Values . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 2-58
Dummy Indicator Variables . . . . . . . . . . . . . . . . . . . . . . . . . . 2-59
What Are Dummy Variables? . . . . . . . . . . . . . . . . . . . . . . . 2-59
Creating Dummy Variables . . . . . . . . . . . . . . . . . . . . . . . . . 2-60
Regression with Categorical Covariates . . . . . . . . . . . . . . . . 2-62
Create a Dataset Array from Workspace Variables . . . . . . . 2-67
Create a Dataset Array from a Numeric Array . . . . . . . . . . 2-67
Create Dataset Array from Heterogeneous Workspace
Variables . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 2-70
Create a Dataset Array from a File . . . . . . . . . . . . . . . . . . . . 2-74
Create a Dataset Array from a Tab-Delimited Text File . . . 2-74
Create a Dataset Array from a Comma-Separated Text File . . . 2-77
Create a Dataset Array from an Excel File . . . . . . . . . . . . . 2-79
Add and Delete Observations . . . . . . . . . . . . . . . . . . . . . . . . . 2-82
Add and Delete Variables . . . . . . . . . . . . . . . . . . . . . . . . . . . . 2-86
Access Data in Dataset Array Variables . . . . . . . . . . . . . . . . 2-90
Select Subsets of Observations . . . . . . . . . . . . . . . . . . . . . . . 2-96
Sort Observations in Dataset Arrays . . . . . . . . . . . . . . . . . . 2-100
Merge Dataset Arrays . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 2-104
Stack or Unstack Dataset Arrays . . . . . . . . . . . . . . . . . . . . . 2-108
Calculations on Dataset Arrays . . . . . . . . . . . . . . . . . . . . . . 2-113
Export Dataset Arrays . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 2-116
Clean Messy and Missing Data . . . . . . . . . . . . . . . . . . . . . . 2-118
Dataset Arrays in the Variables Editor . . . . . . . . . . . . . . . . 2-123
Open Dataset Arrays in the Variables Editor . . . . . . . . . . 2-123
Modify Variable and Observation Names . . . . . . . . . . . . . 2-124
Reorder or Delete Variables . . . . . . . . . . . . . . . . . . . . . . . 2-126
Add New Data . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 2-128
Sort Observations . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 2-130
Select a Subset of Data . . . . . . . . . . . . . . . . . . . . . . . . . . . 2-131
Create Plots . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 2-134
Dataset Arrays . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 2-137
What Are Dataset Arrays? . . . . . . . . . . . . . . . . . . . . . . . . 2-137
Dataset Array Conversion . . . . . . . . . . . . . . . . . . . . . . . . . 2-137
Dataset Array Properties . . . . . . . . . . . . . . . . . . . . . . . . . 2-138
Index and Search Dataset Arrays . . . . . . . . . . . . . . . . . . . . 2-140
Ways To Index and Search . . . . . . . . . . . . . . . . . . . . . . . . 2-140
Examples . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 2-140
Descriptive Statistics
3
Introduction to Descriptive Statistics . . . . . . . . . . . . . . . . . . 3-2
Measures of Central Tendency . . . . . . . . . . . . . . . . . . . . . . . . 3-3
Measures of Central Tendency . . . . . . . . . . . . . . . . . . . . . . . 3-3
Measures of Dispersion . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 3-5
Compare Measures of Dispersion . . . . . . . . . . . . . . . . . . . . . 3-6
Quantiles and Percentiles . . . . . . . . . . . . . . . . . . . . . . . . . . . . 3-7
Exploratory Analysis of Data . . . . . . . . . . . . . . . . . . . . . . . . . 3-11
Resampling Statistics . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 3-16
Bootstrap Resampling . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 3-16
Jackknife Resampling . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 3-19
Parallel Computing Support for Resampling Methods . . . . . 3-20
Data with Missing Values . . . . . . . . . . . . . . . . . . . . . . . . . . . . 3-21
Working with Data with Missing Values . . . . . . . . . . . . . . . 3-21
Statistical Visualization
4
Introduction to Statistical Visualization . . . . . . . . . . . . . . . . 4-2
Create Scatter Plots Using Grouped Data . . . . . . . . . . . . . . . 4-3
Box Plots . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 4-6
Compare Grouped Data Using Box Plots . . . . . . . . . . . . . . . . 4-6
Distribution Plots . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 4-9
Overview . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 4-9
Normal Probability Plots . . . . . . . . . . . . . . . . . . . . . . . . . . . . 4-9
Quantile-Quantile Plots . . . . . . . . . . . . . . . . . . . . . . . . . . . . 4-12
Cumulative Distribution Plots . . . . . . . . . . . . . . . . . . . . . . . 4-14
Other Probability Plots . . . . . . . . . . . . . . . . . . . . . . . . . . . . 4-16
Probability Distributions
5
Working with Probability Distributions . . . . . . . . . . . . . . . . . 5-3
Types of Probability Distributions . . . . . . . . . . . . . . . . . . . . . 5-3
Probability Distribution Objects . . . . . . . . . . . . . . . . . . . . . . 5-4
Probability Distribution Functions . . . . . . . . . . . . . . . . . . . . 5-8
Probability Distribution Apps and User Interfaces . . . . . . . 5-10
Supported Distributions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 5-17
Continuous Distributions (Data) . . . . . . . . . . . . . . . . . . . . . 5-19
Continuous Distributions (Statistics) . . . . . . . . . . . . . . . . . . 5-23
Discrete Distributions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 5-25
Multivariate Distributions . . . . . . . . . . . . . . . . . . . . . . . . . . 5-27
Nonparametric Distributions . . . . . . . . . . . . . . . . . . . . . . . . 5-29
Flexible Distribution Families . . . . . . . . . . . . . . . . . . . . . . . 5-29
Maximum Likelihood Estimation . . . . . . . . . . . . . . . . . . . . . 5-30
Negative Loglikelihood Functions . . . . . . . . . . . . . . . . . . . . . 5-33
Random Number Generation . . . . . . . . . . . . . . . . . . . . . . . . . 5-37
Nonparametric and Empirical Probability Distributions . . 5-40
Overview . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 5-40
Kernel Distribution . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 5-40
Empirical Cumulative Distribution Function . . . . . . . . . . . . 5-42
Piecewise Linear Distribution . . . . . . . . . . . . . . . . . . . . . . . 5-44
Pareto Tails . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 5-45
Triangular Distribution . . . . . . . . . . . . . . . . . . . . . . . . . . . . 5-46
Fit Kernel Distribution Object to Data . . . . . . . . . . . . . . . . . 5-49
Fit Kernel Distribution Using ksdensity . . . . . . . . . . . . . . . 5-54
Fit Distributions to Grouped Data Using ksdensity . . . . . . 5-57
Create and Plot Empirical Cumulative Distribution
Functions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 5-60
Fit a Nonparametric Distribution with Pareto Tails . . . . . . 5-61
Generate Random Numbers Using the Triangular
Distribution . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 5-66
Explore the Probability Distribution Function UI . . . . . . . 5-71
Model Data Using the Distribution Fitting App . . . . . . . . . . 5-74
Explore Probability Distributions Interactively . . . . . . . . . . 5-74
Create and Manage Data Sets . . . . . . . . . . . . . . . . . . . . . . 5-75
Create a New Fit . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 5-80
Display Results . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 5-85
Manage Fits . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 5-87
Evaluate Fits . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 5-88
Exclude Data . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 5-92
Save and Load Sessions . . . . . . . . . . . . . . . . . . . . . . . . . . . 5-98
Generate a File to Fit and Plot Distributions . . . . . . . . . . . 5-99
Fit a Distribution Using the Distribution Fitting App . . . 5-101
Step 1: Load Sample Data . . . . . . . . . . . . . . . . . . . . . . . . . 5-101
Step 2: Import Data . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 5-101
Step 3: Create a New Fit . . . . . . . . . . . . . . . . . . . . . . . . . 5-103
Step 4: Create and Manage Additional Fits . . . . . . . . . . . . 5-108
Custom Distributions Using the Distribution Fitting App . . . 5-111
Opening the Distribution Fitting App . . . . . . . . . . . . . . . . 5-111
Defining Custom Distributions . . . . . . . . . . . . . . . . . . . . . 5-113
Importing Custom Distributions . . . . . . . . . . . . . . . . . . . . 5-113
Explore the Random Number Generation UI . . . . . . . . . . . 5-114
Compare Multiple Distribution Fits . . . . . . . . . . . . . . . . . . 5-117
Fit Probability Distribution Objects to Grouped Data . . . 5-124
Multinomial Probability Distribution Objects . . . . . . . . . . 5-128
Multinomial Probability Distribution Functions . . . . . . . . 5-132
Generate Random Numbers Using Uniform Distribution
Inversion . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 5-135
Represent Cauchy Distribution Using t Location-Scale . . 5-138
Generate Cauchy Random Numbers Using Student’s t . . . 5-142