
Practical Data Analysis with JMP, Third Edition PDF

442 Pages·2019·8.162 MB·English


The correct bibliographic citation for this manual is as follows: Carver, Robert. 2019. Practical Data Analysis with JMP®, Third Edition. Cary, NC: SAS Institute Inc.

Practical Data Analysis with JMP®, Third Edition
Copyright © 2019, SAS Institute Inc., Cary, NC, USA
ISBN 978-1-64295-614-6 (Hardcover)
ISBN 978-1-64295-610-8 (Paperback)
ISBN 978-1-64295-611-5 (Web PDF)
ISBN 978-1-64295-612-2 (EPUB)
ISBN 978-1-64295-613-9 (Kindle)
All Rights Reserved. Produced in the United States of America.

For a hard copy book: No part of this publication may be reproduced, stored in a retrieval system, or transmitted, in any form or by any means, electronic, mechanical, photocopying, or otherwise, without the prior written permission of the publisher, SAS Institute Inc.

For a web download or e-book: Your use of this publication shall be governed by the terms established by the vendor at the time you acquire this publication.

The scanning, uploading, and distribution of this book via the Internet or any other means without the permission of the publisher is illegal and punishable by law. Please purchase only authorized electronic editions and do not participate in or encourage electronic piracy of copyrighted materials. Your support of others’ rights is appreciated.

U.S. Government License Rights; Restricted Rights: The Software and its documentation is commercial computer software developed at private expense and is provided with RESTRICTED RIGHTS to the United States Government. Use, duplication, or disclosure of the Software by the United States Government is subject to the license terms of this Agreement pursuant to, as applicable, FAR 12.212, DFAR 227.7202-1(a), DFAR 227.7202-3(a), and DFAR 227.7202-4, and, to the extent required under U.S. federal law, the minimum restricted rights as set out in FAR 52.227-19 (DEC 2007).
If FAR 52.227-19 is applicable, this provision serves as notice under clause (c) thereof and no other notice is required to be affixed to the Software or documentation. The Government’s rights in Software and documentation shall be only those set forth in this Agreement. SAS Institute Inc., SAS Campus Drive, Cary, NC 27513-2414 October 2019 SAS® and all other SAS Institute Inc. product or service names are registered trademarks or trademarks of SAS Institute Inc. in the USA and other countries. ® indicates USA registration. Other brand and product names are trademarks of their respective companies. SAS software may be provided with certain third-party software, including but not limited to open-source software, which is licensed under its applicable third-party software license agreement. For license information about third- party software distributed with SAS software, refer to http://support.sas.com/thirdpartylicenses. Contents About This Book .................................................................................................................. ix About The Author .............................................................................................................. xvii Chapter 1: Getting Started: Data Analysis with JMP ............................................................ 
1 Overview .................................................................................................................................................................1 Goals of Data Analysis: Description and Inference .................................................................................................1 Types of Data ..........................................................................................................................................................2 Starting JMP ............................................................................................................................................................3 A Simple Data Table ...............................................................................................................................................5 Graph Builder: An Interactive Tool to Explore Data .................................................................................................8 Using an Analysis Platform ................................................................................................................................... 11 Row States ............................................................................................................................................................ 14 Exporting and Sharing JMP Reports ..................................................................................................................... 16 Saving and Reproducing Your Work ..................................................................................................................... 20 Leaving JMP ......................................................................................................................................................... 21 Chapter 2: Data Sources and Structures ............................................................................ 
23 Overview ............................................................................................................................................................... 23 Populations, Processes, and Samples .................................................................................................................. 23 Representativeness and Sampling ........................................................................................................................ 25 Cross-Sectional and Time Series Sampling .......................................................................................................... 28 Study Design: Experimentation, Observation, and Surveying ............................................................................... 29 Creating a Data Table ........................................................................................................................................... 36 Raw Case Data and Summary Data ..................................................................................................................... 36 Application ............................................................................................................................................................. 37 Chapter 3: Describing a Single Variable ............................................................................. 39 Overview ............................................................................................................................................................... 39 The Concept of a Distribution ................................................................................................................................ 39 Variable Types and Their Distributions.................................................................................................................. 
40 Distribution of a Categorical Variable .................................................................................................................... 41 Using Graph Builder to Explore Categorical Data Visually .................................................................................... 44 Distribution of a Quantitative Variable ................................................................................................................... 45 iv Contents Using the Distribution Platform for Continuous Data ............................................................................................ 46 Exploring Further with the Graph Builder.............................................................................................................. 52 Summary Statistics for a Single Variable.............................................................................................................. 53 Application ............................................................................................................................................................ 56 Chapter 4: Describing Two Variables at a Time ................................................................. 61 Overview .............................................................................................................................................................. 61 Two-by-Two: Bivariate Data ................................................................................................................................. 61 Describing Covariation: Two Categorical Variables .............................................................................................. 63 Describing Covariation: One Continuous, One Categorical Variable .................................................................... 71 Describing Covariation: Two Continuous Variables .............................................................................................. 
74 Application ............................................................................................................................................................ 82 Chapter 5: Review of Descriptive Statistics ....................................................................... 87 Overview .............................................................................................................................................................. 87 The World Development Indicators ...................................................................................................................... 87 Questions for Analysis .......................................................................................................................................... 88 Applying an Analytic Framework .......................................................................................................................... 89 Preparation for Analysis ....................................................................................................................................... 92 Univariate Descriptions......................................................................................................................................... 92 Explore Relationships with Graph Builder ............................................................................................................ 95 Further Analysis with the Multivariate Platform ..................................................................................................... 98 Further Analysis with Fit Y by X .......................................................................................................................... 100 Summing Up: Interpretation and Conclusions .................................................................................................... 
101 Visualizing Multiple Relationships ...................................................................................................................... 101 Chapter 6: Elementary Probability and Discrete Distributions ......................................... 105 Overview ............................................................................................................................................................ 105 The Role of Probability in Data Analysis............................................................................................................. 105 Elements of Probability Theory ........................................................................................................................... 106 Contingency Tables and Probability ................................................................................................................... 109 Discrete Random Variables: From Events to Numbers ...................................................................................... 111 Three Common Discrete Distributions ................................................................................................................ 112 Simulating Random Variation with JMP.............................................................................................................. 116 Discrete Distributions as Models of Real Processes .......................................................................................... 117 Application .......................................................................................................................................................... 118 Chapter 7: The Normal Model .......................................................................................... 123 Overview ............................................................................................................................................................ 
123 Continuous Data and Probability ........................................................................................................................ 123 Density Functions ............................................................................................................................................... 124 The Normal Model .............................................................................................................................................. 127 Normal Calculations ........................................................................................................................................... 128 Checking Data for the Suitability of a Normal Model .......................................................................................... 133 Contents v Generating Pseudo-Random Normal Data .......................................................................................................... 137 Application ........................................................................................................................................................... 138 Chapter 8: Sampling and Sampling Distributions ............................................................. 143 Overview ............................................................................................................................................................. 143 Why Sample? ...................................................................................................................................................... 143 Methods of Sampling........................................................................................................................................... 144 Using JMP to Select a Simple Random Sample ................................................................................................. 
145 Variability Across Samples: Sampling Distributions ............................................................................................ 148 Application ........................................................................................................................................................... 159 Chapter 9: Review of Probability and Probabilistic Sampling ........................................... 163 Overview ............................................................................................................................................................. 163 Probability Distributions and Density Functions .................................................................................................. 163 The Normal and t Distributions ............................................................................................................................ 164 The Usefulness of Theoretical Models ................................................................................................................ 166 When Samples Surprise Us: Ordinary and Extraordinary Sampling Variability ................................................... 167 Conclusion .......................................................................................................................................................... 171 Chapter 10: Inference for a Single Categorical Variable .................................................. 173 Overview ............................................................................................................................................................. 173 Two Inferential Tasks .......................................................................................................................................... 173 Statistical Inference Is Always Conditional .......................................................................................................... 
174 Using JMP to Conduct a Significance Test ......................................................................................................... 174 Confidence Intervals............................................................................................................................................ 179 Using JMP to Estimate a Population Proportion .................................................................................................. 179 A Few Words about Error .................................................................................................................................... 183 Application ........................................................................................................................................................... 184 Chapter 11: Inference for a Single Continuous Variable .................................................. 189 Overview ............................................................................................................................................................. 189 Conditions for Inference ...................................................................................................................................... 189 Using JMP to Conduct a Significance Test ......................................................................................................... 190 What If Conditions Are Not Satisfied? ................................................................................................................. 197 Using JMP to Estimate a Population Mean ......................................................................................................... 197 Matched Pairs: One Variable, Two Measurements ............................................................................................. 
199 Application ........................................................................................................................................................... 201 Chapter 12: Chi-Square Tests .......................................................................................... 205 Overview ............................................................................................................................................................. 205 Chi-Square Goodness-of-Fit Test ....................................................................................................................... 205 Inference for Two Categorical Variables ............................................................................................................. 208 Contingency Tables Revisited ............................................................................................................................. 209 Chi-Square Test of Independence ...................................................................................................................... 211 Application ........................................................................................................................................................... 213 vi Contents Chapter 13: Two-Sample Inference for a Continuous Variable ........................................ 217 Overview ............................................................................................................................................................ 217 Conditions for Inference ..................................................................................................................................... 217 Using JMP to Compare Two Means ................................................................................................................... 
217 Using JMP to Compare Two Variances .............................................................................................................. 224 Application .......................................................................................................................................................... 226 Chapter 14: Analysis of Variance ..................................................................................... 229 Overview ............................................................................................................................................................ 229 What Are We Assuming? ................................................................................................................................... 229 One-Way ANOVA ............................................................................................................................................... 230 What If Conditions Are Not Satisfied? ................................................................................................................ 237 Including a Second Factor with Two-Way ANOVA ............................................................................................. 238 Application .......................................................................................................................................................... 245 Chapter 15: Simple Linear Regression Inference ............................................................. 249 Overview ............................................................................................................................................................ 249 Fitting a Line to Bivariate Continuous Data ........................................................................................................ 249 The Simple Regression Model ........................................................................................................................... 
253 What Are We Assuming? ................................................................................................................................... 255 Interpreting Regression Results ......................................................................................................................... 256 Application .......................................................................................................................................................... 261 Chapter 16: Residuals Analysis and Estimation ............................................................... 267 Overview ............................................................................................................................................................ 267 Conditions for Least Squares Estimation............................................................................................................ 267 Residuals Analysis ............................................................................................................................................. 268 Estimation ........................................................................................................................................................... 276 Application .......................................................................................................................................................... 280 Chapter 17: Review of Univariate and Bivariate Inference............................................... 285 Overview ............................................................................................................................................................ 285 Research Context ............................................................................................................................................... 
285 One Variable at a Time....................................................................................................................................... 286 Life Expectancy by Income Group ...................................................................................................................... 287 Life Expectancy by GDP per Capita ................................................................................................................... 291 Conclusion .......................................................................................................................................................... 293 Chapter 18: Multiple Regression ...................................................................................... 295 Overview ............................................................................................................................................................ 295 The Multiple Regression Model .......................................................................................................................... 295 Visualizing Multiple Regression .......................................................................................................................... 296 Fitting a Model .................................................................................................................................................... 298 A More Complex Model ...................................................................................................................................... 302 Residuals Analysis in the Fit Model Platform ...................................................................................................... 304 Using a Regression Tree Approach: The Partition Platform ............................................................................... 
306 Contents vii Collinearity .......................................................................................................................................................... 309 Evaluating Alternative Models ............................................................................................................................. 315 Application ........................................................................................................................................................... 319 Chapter 19: Categorical, Curvilinear, and Non-Linear Regression Models ....................... 323 Overview ............................................................................................................................................................. 323 Dichotomous Independent Variables................................................................................................................... 323 Dichotomous Dependent Variable ....................................................................................................................... 327 Curvilinear and Non-Linear Relationships ........................................................................................................... 330 More Non-Linear Functions ................................................................................................................................. 337 Application ........................................................................................................................................................... 338 Chapter 20: Basic Forecasting Techniques ...................................................................... 341 Overview ............................................................................................................................................................. 
341 Detecting Patterns Over Time ............................................................................................................................. 341 Smoothing Methods ............................................................................................................................................ 344 Trend Analysis .................................................................................................................................................... 350 Autoregressive Models ........................................................................................................................................ 352 Application ........................................................................................................................................................... 355 Chapter 21: Elements of Experimental Design ................................................................. 359 Overview ............................................................................................................................................................. 359 Why Experiment? ................................................................................................................................................ 360 Goals of Experimental Design ............................................................................................................................. 360 Factors, Blocks, and Randomization ................................................................................................................... 361 Multi-Factor Experiments and Factorial Designs ................................................................................................. 362 Blocking ............................................................................................................................................................... 
369 A Design for Main Effects Only ........................................................................................................................... 371 Definitive Screening Designs .............................................................................................................................. 373 Non-Linear Response Surface Designs .............................................................................................................. 375 Application ........................................................................................................................................................... 379 Chapter 22: Quality Improvement .................................................................................... 385 Overview ............................................................................................................................................................. 385 Processes and Variation ..................................................................................................................................... 385 Control Charts ..................................................................................................................................................... 386 Variability Charts ................................................................................................................................................. 395 Capability Analysis .............................................................................................................................................. 398 Pareto Charts ...................................................................................................................................................... 400 Application ........................................................................................................................................................... 

Bibliography
Index

About This Book

What Does This Book Cover?

Purpose: Learning to Reason Statistically

We live in a world of uncertainty. Today more than ever before, we have vast resources of data available to shed light on crucial questions. But at the same time, the sheer volume and complexity of the “data deluge” can distract and overwhelm us. The goal of applied statistical analysis is to work with data to calibrate, cope with, and sometimes reduce uncertainty. Business decisions, public policies, scientific research, and news reporting are all shaped by statistical analysis and reasoning. Statistical thinking is an essential part of the boom in “big data analytics” in numerous professions.

This book will help you use and discriminate among some fundamental techniques of analysis, and it will also help you engage in statistical thinking by analyzing real problems. You will come to see statistical investigations as an iterative process and will gain experience in the major phases of that process.

To be an effective analyst or consumer of other people’s analyses, you must know how to use these techniques, when to use them, and how to communicate their implications. Knowing how to use these techniques involves mastery of computer software like JMP. Knowing when to use these techniques requires an understanding of the theory underlying the techniques and practice with applications of the theory. Knowing how to effectively communicate with consumers of an analysis or with other analysts requires a clear understanding of the theory and techniques, as well as clarity of expression, directed toward one’s audience.

There was a time when a first course in statistics emphasized abstract theory, laborious computation, and small sets of artificial data—but not practical data analysis or interpretation. Those days are thankfully past, and now we can address all three of the skill sets just cited.

Scope and Structure of This Book

As a discipline, statistics is large and growing; the same is true of JMP. One paperback book must limit its scope, and the content boundaries of this book are set intentionally along several dimensions.

First, this book provides considerable training in the basic functions of JMP 15. JMP is a full-featured, highly interactive, visual, and comprehensive package. The book assumes that you have the software at your school or office. The software’s capabilities extend far beyond an introductory course, and this book makes no attempt to “cover” the entire program. The book introduces students to its major platforms and essential features and should leave students with sufficient background and confidence to continue exploring on their own. Fortunately, the Help system and accompanying manuals are quite extensive, as are the learning resources available online at http://www.jmp.com.

Second, the chapters largely follow a traditional sequence, making the book compatible with many current texts. As such, instructors and students will find it easy to use the book as a companion volume in an introductory course. Chapters are organized around core statistical concepts rather than software commands, menus, or features. Several chapters include topics that some instructors might view as “advanced”—typically when the output from JMP makes it a natural extension of a more elementary topic. This is one way in which software can redefine the boundaries of introductory statistics.

Third, nearly all the data sets in the book are real and are drawn from those disciplines whose practitioners are the primary users of JMP software. Inasmuch as most undergraduate programs now require coursework in statistics, the examples span major areas in which statistical analysis is an important path to knowledge. Those areas include engineering, life sciences, business, and economics.

Fourth, each chapter invites students to practice the habits of thought that are essential to statistical reasoning. Long after readers forget the details of a particular procedure or the options available in a specific JMP analysis platform, this book may continue to resonate with valuable lessons about variability, uncertainty, and the logic of inference. Each chapter concludes with a set of “Application Scenarios,” which lay out a problem-solving or investigative context that is in turn supported by a data table. Each scenario includes a set of questions that implicitly require the application of the techniques and concepts presented in the chapter.

New in the Third Edition

This edition preserves much of the content and approach of the earlier editions, while updating examples and introducing new JMP features. As in the second edition, there are three review chapters (Chapters 5, 9, and 17) that pause to recap concepts and techniques. One of the perennial challenges in learning statistics is that it is easy to lose sight of major themes as a course progresses through a series of seemingly disconnected techniques and topics. Some readers should find the review chapters to be helpful in this respect. The review chapters share a single large data set of World Development Indicators, published by the World Bank.

The scope and sequence of chapters is basically the same as the prior edition. There is some additional new material about the importance of documenting one’s work with an eye toward reproducibility of analyses, as well as production of presentation-ready reporting.

The second edition was based on JMP 11, and since that time, platforms have been added or modified, and some functionality has relocated in the menu system. This edition captures those changes. Some of the updated data tables are considerably larger than their counterparts in earlier editions. This creates the opportunity to demonstrate methods for meaningful graphs when data density and overplotting become issues. I also use some of the larger data tables to introduce machine learning practices like partitioning a data set into training and validation sets.
