The correct bibliographic citation for this manual is as follows: Carver, Robert. 2019. Practical Data Analysis with
JMP®, Third Edition. Cary, NC: SAS Institute Inc.
Practical Data Analysis with JMP®, Third Edition
Copyright © 2019, SAS Institute Inc., Cary, NC, USA
ISBN 978-1-64295-614-6 (Hardcover)
ISBN 978-1-64295-610-8 (Paperback)
ISBN 978-1-64295-611-5 (Web PDF)
ISBN 978-1-64295-612-2 (EPUB)
ISBN 978-1-64295-613-9 (Kindle)
All Rights Reserved. Produced in the United States of America.
For a hard copy book: No part of this publication may be reproduced, stored in a retrieval system, or transmitted, in
any form or by any means, electronic, mechanical, photocopying, or otherwise, without the prior written permission
of the publisher, SAS Institute Inc.
For a web download or e-book: Your use of this publication shall be governed by the terms established by the vendor
at the time you acquire this publication.
The scanning, uploading, and distribution of this book via the Internet or any other means without the permission of
the publisher is illegal and punishable by law. Please purchase only authorized electronic editions and do not
participate in or encourage electronic piracy of copyrighted materials. Your support of others’ rights is appreciated.
U.S. Government License Rights; Restricted Rights: The Software and its documentation is commercial computer
software developed at private expense and is provided with RESTRICTED RIGHTS to the United States Government.
Use, duplication, or disclosure of the Software by the United States Government is subject to the license terms of this
Agreement pursuant to, as applicable, FAR 12.212, DFAR 227.7202-1(a), DFAR 227.7202-3(a), and DFAR 227.7202-4,
and, to the extent required under U.S. federal law, the minimum restricted rights as set out in FAR 52.227-19 (DEC
2007). If FAR 52.227-19 is applicable, this provision serves as notice under clause (c) thereof and no other notice is
required to be affixed to the Software or documentation. The Government’s rights in Software and documentation
shall be only those set forth in this Agreement.
SAS Institute Inc., SAS Campus Drive, Cary, NC 27513-2414
October 2019
SAS® and all other SAS Institute Inc. product or service names are registered trademarks or trademarks of SAS Institute
Inc. in the USA and other countries. ® indicates USA registration.
Other brand and product names are trademarks of their respective companies.
SAS software may be provided with certain third-party software, including but not limited to open-source software,
which is licensed under its applicable third-party software license agreement. For license information about
third-party software distributed with SAS software, refer to http://support.sas.com/thirdpartylicenses.
Contents
About This Book .................................................................................................................. ix
About The Author .............................................................................................................. xvii
Chapter 1: Getting Started: Data Analysis with JMP ............................................................ 1
Overview .................................................................................................................................................................1
Goals of Data Analysis: Description and Inference .................................................................................................1
Types of Data ..........................................................................................................................................................2
Starting JMP ............................................................................................................................................................3
A Simple Data Table ...............................................................................................................................................5
Graph Builder: An Interactive Tool to Explore Data .................................................................................................8
Using an Analysis Platform ................................................................................................................................... 11
Row States ............................................................................................................................................................ 14
Exporting and Sharing JMP Reports ..................................................................................................................... 16
Saving and Reproducing Your Work ..................................................................................................................... 20
Leaving JMP ......................................................................................................................................................... 21
Chapter 2: Data Sources and Structures ............................................................................ 23
Overview ............................................................................................................................................................... 23
Populations, Processes, and Samples .................................................................................................................. 23
Representativeness and Sampling ........................................................................................................................ 25
Cross-Sectional and Time Series Sampling .......................................................................................................... 28
Study Design: Experimentation, Observation, and Surveying ............................................................................... 29
Creating a Data Table ........................................................................................................................................... 36
Raw Case Data and Summary Data ..................................................................................................................... 36
Application ............................................................................................................................................................. 37
Chapter 3: Describing a Single Variable ............................................................................. 39
Overview ............................................................................................................................................................... 39
The Concept of a Distribution ................................................................................................................................ 39
Variable Types and Their Distributions.................................................................................................................. 40
Distribution of a Categorical Variable .................................................................................................................... 41
Using Graph Builder to Explore Categorical Data Visually .................................................................................... 44
Distribution of a Quantitative Variable ................................................................................................................... 45
Using the Distribution Platform for Continuous Data ............................................................................................ 46
Exploring Further with the Graph Builder.............................................................................................................. 52
Summary Statistics for a Single Variable.............................................................................................................. 53
Application ............................................................................................................................................................ 56
Chapter 4: Describing Two Variables at a Time ................................................................. 61
Overview .............................................................................................................................................................. 61
Two-by-Two: Bivariate Data ................................................................................................................................. 61
Describing Covariation: Two Categorical Variables .............................................................................................. 63
Describing Covariation: One Continuous, One Categorical Variable .................................................................... 71
Describing Covariation: Two Continuous Variables .............................................................................................. 74
Application ............................................................................................................................................................ 82
Chapter 5: Review of Descriptive Statistics ....................................................................... 87
Overview .............................................................................................................................................................. 87
The World Development Indicators ...................................................................................................................... 87
Questions for Analysis .......................................................................................................................................... 88
Applying an Analytic Framework .......................................................................................................................... 89
Preparation for Analysis ....................................................................................................................................... 92
Univariate Descriptions......................................................................................................................................... 92
Explore Relationships with Graph Builder ............................................................................................................ 95
Further Analysis with the Multivariate Platform ..................................................................................................... 98
Further Analysis with Fit Y by X .......................................................................................................................... 100
Summing Up: Interpretation and Conclusions .................................................................................................... 101
Visualizing Multiple Relationships ...................................................................................................................... 101
Chapter 6: Elementary Probability and Discrete Distributions ......................................... 105
Overview ............................................................................................................................................................ 105
The Role of Probability in Data Analysis............................................................................................................. 105
Elements of Probability Theory ........................................................................................................................... 106
Contingency Tables and Probability ................................................................................................................... 109
Discrete Random Variables: From Events to Numbers ...................................................................................... 111
Three Common Discrete Distributions ................................................................................................................ 112
Simulating Random Variation with JMP.............................................................................................................. 116
Discrete Distributions as Models of Real Processes .......................................................................................... 117
Application .......................................................................................................................................................... 118
Chapter 7: The Normal Model .......................................................................................... 123
Overview ............................................................................................................................................................ 123
Continuous Data and Probability ........................................................................................................................ 123
Density Functions ............................................................................................................................................... 124
The Normal Model .............................................................................................................................................. 127
Normal Calculations ........................................................................................................................................... 128
Checking Data for the Suitability of a Normal Model .......................................................................................... 133
Generating Pseudo-Random Normal Data .......................................................................................................... 137
Application ........................................................................................................................................................... 138
Chapter 8: Sampling and Sampling Distributions ............................................................. 143
Overview ............................................................................................................................................................. 143
Why Sample? ...................................................................................................................................................... 143
Methods of Sampling........................................................................................................................................... 144
Using JMP to Select a Simple Random Sample ................................................................................................. 145
Variability Across Samples: Sampling Distributions ............................................................................................ 148
Application ........................................................................................................................................................... 159
Chapter 9: Review of Probability and Probabilistic Sampling ........................................... 163
Overview ............................................................................................................................................................. 163
Probability Distributions and Density Functions .................................................................................................. 163
The Normal and t Distributions ............................................................................................................................ 164
The Usefulness of Theoretical Models ................................................................................................................ 166
When Samples Surprise Us: Ordinary and Extraordinary Sampling Variability ................................................... 167
Conclusion .......................................................................................................................................................... 171
Chapter 10: Inference for a Single Categorical Variable .................................................. 173
Overview ............................................................................................................................................................. 173
Two Inferential Tasks .......................................................................................................................................... 173
Statistical Inference Is Always Conditional .......................................................................................................... 174
Using JMP to Conduct a Significance Test ......................................................................................................... 174
Confidence Intervals............................................................................................................................................ 179
Using JMP to Estimate a Population Proportion .................................................................................................. 179
A Few Words about Error .................................................................................................................................... 183
Application ........................................................................................................................................................... 184
Chapter 11: Inference for a Single Continuous Variable .................................................. 189
Overview ............................................................................................................................................................. 189
Conditions for Inference ...................................................................................................................................... 189
Using JMP to Conduct a Significance Test ......................................................................................................... 190
What If Conditions Are Not Satisfied? ................................................................................................................. 197
Using JMP to Estimate a Population Mean ......................................................................................................... 197
Matched Pairs: One Variable, Two Measurements ............................................................................................. 199
Application ........................................................................................................................................................... 201
Chapter 12: Chi-Square Tests .......................................................................................... 205
Overview ............................................................................................................................................................. 205
Chi-Square Goodness-of-Fit Test ....................................................................................................................... 205
Inference for Two Categorical Variables ............................................................................................................. 208
Contingency Tables Revisited ............................................................................................................................. 209
Chi-Square Test of Independence ...................................................................................................................... 211
Application ........................................................................................................................................................... 213
Chapter 13: Two-Sample Inference for a Continuous Variable ........................................ 217
Overview ............................................................................................................................................................ 217
Conditions for Inference ..................................................................................................................................... 217
Using JMP to Compare Two Means ................................................................................................................... 217
Using JMP to Compare Two Variances .............................................................................................................. 224
Application .......................................................................................................................................................... 226
Chapter 14: Analysis of Variance ..................................................................................... 229
Overview ............................................................................................................................................................ 229
What Are We Assuming? ................................................................................................................................... 229
One-Way ANOVA ............................................................................................................................................... 230
What If Conditions Are Not Satisfied? ................................................................................................................ 237
Including a Second Factor with Two-Way ANOVA ............................................................................................. 238
Application .......................................................................................................................................................... 245
Chapter 15: Simple Linear Regression Inference ............................................................. 249
Overview ............................................................................................................................................................ 249
Fitting a Line to Bivariate Continuous Data ........................................................................................................ 249
The Simple Regression Model ........................................................................................................................... 253
What Are We Assuming? ................................................................................................................................... 255
Interpreting Regression Results ......................................................................................................................... 256
Application .......................................................................................................................................................... 261
Chapter 16: Residuals Analysis and Estimation ............................................................... 267
Overview ............................................................................................................................................................ 267
Conditions for Least Squares Estimation............................................................................................................ 267
Residuals Analysis ............................................................................................................................................. 268
Estimation ........................................................................................................................................................... 276
Application .......................................................................................................................................................... 280
Chapter 17: Review of Univariate and Bivariate Inference............................................... 285
Overview ............................................................................................................................................................ 285
Research Context ............................................................................................................................................... 285
One Variable at a Time....................................................................................................................................... 286
Life Expectancy by Income Group ...................................................................................................................... 287
Life Expectancy by GDP per Capita ................................................................................................................... 291
Conclusion .......................................................................................................................................................... 293
Chapter 18: Multiple Regression ...................................................................................... 295
Overview ............................................................................................................................................................ 295
The Multiple Regression Model .......................................................................................................................... 295
Visualizing Multiple Regression .......................................................................................................................... 296
Fitting a Model .................................................................................................................................................... 298
A More Complex Model ...................................................................................................................................... 302
Residuals Analysis in the Fit Model Platform ...................................................................................................... 304
Using a Regression Tree Approach: The Partition Platform ............................................................................... 306
Collinearity .......................................................................................................................................................... 309
Evaluating Alternative Models ............................................................................................................................. 315
Application ........................................................................................................................................................... 319
Chapter 19: Categorical, Curvilinear, and Non-Linear Regression Models ....................... 323
Overview ............................................................................................................................................................. 323
Dichotomous Independent Variables................................................................................................................... 323
Dichotomous Dependent Variable ....................................................................................................................... 327
Curvilinear and Non-Linear Relationships ........................................................................................................... 330
More Non-Linear Functions ................................................................................................................................. 337
Application ........................................................................................................................................................... 338
Chapter 20: Basic Forecasting Techniques ...................................................................... 341
Overview ............................................................................................................................................................. 341
Detecting Patterns Over Time ............................................................................................................................. 341
Smoothing Methods ............................................................................................................................................ 344
Trend Analysis .................................................................................................................................................... 350
Autoregressive Models ........................................................................................................................................ 352
Application ........................................................................................................................................................... 355
Chapter 21: Elements of Experimental Design ................................................................. 359
Overview ............................................................................................................................................................. 359
Why Experiment? ................................................................................................................................................ 360
Goals of Experimental Design ............................................................................................................................. 360
Factors, Blocks, and Randomization ................................................................................................................... 361
Multi-Factor Experiments and Factorial Designs ................................................................................................. 362
Blocking ............................................................................................................................................................... 369
A Design for Main Effects Only ........................................................................................................................... 371
Definitive Screening Designs .............................................................................................................................. 373
Non-Linear Response Surface Designs .............................................................................................................. 375
Application ........................................................................................................................................................... 379
Chapter 22: Quality Improvement .................................................................................... 385
Overview ............................................................................................................................................................. 385
Processes and Variation ..................................................................................................................................... 385
Control Charts ..................................................................................................................................................... 386
Variability Charts ................................................................................................................................................. 395
Capability Analysis .............................................................................................................................................. 398
Pareto Charts ...................................................................................................................................................... 400
Application ........................................................................................................................................................... 402
Bibliography ..................................................................................................................... 407
Index ................................................................................................................................. 413
About This Book
What Does This Book Cover?
Purpose: Learning to Reason Statistically
We live in a world of uncertainty. Today more than ever before, we have vast resources of data
available to shed light on crucial questions. But at the same time, the sheer volume and
complexity of the “data deluge” can distract and overwhelm us. The goal of applied statistical
analysis is to work with data to calibrate, cope with, and sometimes reduce uncertainty. Business
decisions, public policies, scientific research, and news reporting are all shaped by statistical
analysis and reasoning. Statistical thinking is an essential part of the boom in “big data analytics”
in numerous professions. This book will help you use and discriminate among some fundamental
techniques of analysis, and it will also help you engage in statistical thinking by analyzing real
problems. You will come to see statistical investigations as an iterative process and will gain
experience in the major phases of that process.
To be an effective analyst or consumer of other people’s analyses, you must know how to use
these techniques, when to use them, and how to communicate their implications. Knowing how
to use these techniques involves mastery of computer software like JMP. Knowing when to use
these techniques requires an understanding of the theory underlying the techniques and
practice with applications of the theory. Knowing how to communicate effectively with
consumers of an analysis or with other analysts requires a clear understanding of the theory and
techniques, as well as clarity of expression directed toward one’s audience.
There was a time when a first course in statistics emphasized abstract theory, laborious
computation, and small sets of artificial data—but not practical data analysis or interpretation.
Those days are thankfully past, and now we can address all three of the skill sets just cited.
Scope and Structure of This Book
As a discipline, statistics is large and growing; the same is true of JMP. One paperback book must
limit its scope, and the content boundaries of this book are set intentionally along several
dimensions.
First, this book provides considerable training in the basic functions of JMP 15. JMP is a full-
featured, highly interactive, visual, and comprehensive package. The book assumes that you
have the software at your school or office. The software’s capabilities extend far beyond an
introductory course, and this book makes no attempt to “cover” the entire program. The book
introduces students to its major platforms and essential features and should leave students with
sufficient background and confidence to continue exploring on their own. Fortunately, the Help
system and accompanying manuals are quite extensive, as are the learning resources available
online at http://www.jmp.com.
Second, the chapters largely follow a traditional sequence, making the book compatible with
many current texts. As such, instructors and students will find it easy to use the book as a
companion volume in an introductory course. Chapters are organized around core statistical
concepts rather than software commands, menus, or features. Several chapters include topics
that some instructors might view as “advanced”—typically when the output from JMP makes it a
natural extension of a more elementary topic. This is one way in which software can redefine the
boundaries of introductory statistics.
Third, nearly all the data sets in the book are real and are drawn from those disciplines whose
practitioners are the primary users of JMP software. Inasmuch as most undergraduate programs
now require coursework in statistics, the examples span major areas in which statistical analysis
is an important path to knowledge. Those areas include engineering, life sciences, business, and
economics.
Fourth, each chapter invites students to practice the habits of thought that are essential to
statistical reasoning. Long after readers forget the details of a particular procedure or the
options available in a specific JMP analysis platform, this book may continue to resonate with
valuable lessons about variability, uncertainty, and the logic of inference.
Each chapter concludes with a set of “Application Scenarios,” which lay out a problem-solving or
investigative context that is in turn supported by a data table. Each scenario includes a set of
questions that implicitly require the application of the techniques and concepts presented in the
chapter.
New in the Third Edition
This edition preserves much of the content and approach of the earlier editions, while updating
examples and introducing new JMP features. As in the second edition, there are three review
chapters (Chapters 5, 9, and 17) that pause to recap concepts and techniques. One of the
perennial challenges in learning statistics is that it is easy to lose sight of major themes as a
course progresses through a series of seemingly disconnected techniques and topics. Some
readers should find the review chapters to be helpful in this respect. The review chapters share a
single large data set of World Development Indicators, published by the World Bank.
The scope and sequence of chapters are basically the same as in the prior edition. There is some
new material about the importance of documenting one’s work with an eye toward
reproducibility of analyses, as well as production of presentation-ready reports. The second
edition was based on JMP 11, and since that time, platforms have been added or modified, and
some functionality has relocated in the menu system. This edition captures those changes.
Some of the updated data tables are considerably larger than their counterparts in earlier
editions. This creates the opportunity to demonstrate methods for meaningful graphs when data
density and overplotting become issues. I also use some of the larger data tables to introduce
machine learning practices like partitioning a data set into training and validation sets.