Data Analysis: A Model Comparison Approach To Regression, ANOVA, and Beyond PDF

379 Pages·2017·2.06 MB·English


Data Analysis
Third Edition

Data Analysis: A Model Comparison Approach to Regression, ANOVA, and Beyond is an integrated treatment of data analysis for the social and behavioral sciences. It covers all of the statistical models normally used in such analyses, such as multiple regression and analysis of variance, but it does so in an integrated manner that relies on the comparison of models of data estimated under the rubric of the general linear model.

Data Analysis also describes how the model comparison approach and uniform framework can be applied to models that include product predictors (i.e., interactions and nonlinear effects) and to observations that are nonindependent. Indeed, the analysis of nonindependent observations is treated in some detail, including models of nonindependent data with continuously varying predictors as well as standard repeated measures analysis of variance. This approach also provides an integrated introduction to multilevel or hierarchical linear models and logistic regression. Finally, Data Analysis provides guidance for the treatment of outliers and other problematic aspects of data analysis. It is intended for advanced undergraduate and graduate level courses in data analysis and offers an integrated approach that is very accessible and easy to teach.

Highlights of the third edition include:

• a new chapter on logistic regression;
• expanded treatment of mixed models for data with multiple random factors;
• updated examples;
• an enhanced website with PowerPoint presentations and other tools that demonstrate the concepts in the book; exercises for each chapter that highlight research findings from the literature; data sets, R code, and SAS output for all analyses; additional examples and problem sets; and test questions.

Charles “Chick” M. Judd is Professor of Distinction in the College of Arts and Sciences at the University of Colorado at Boulder. His research focuses on social cognition and attitudes, intergroup relations and stereotypes, judgment and decision making, and behavioral science research methods and data analysis.

Gary H. McClelland is Professor of Psychology at the University of Colorado at Boulder. A Faculty Fellow at the Institute of Cognitive Science, his research interests include judgment and decision making, psychological models of economic behavior, statistics & data analysis, and measurement and scaling.

Carey S. Ryan is a Professor in the Department of Psychology at the University of Nebraska at Omaha. She has research interests in stereotyping and prejudice, group processes, and program evaluation.

Data Analysis
A Model Comparison Approach to Regression, ANOVA, and Beyond
Third Edition

Charles M. Judd
University of Colorado

Gary H. McClelland
University of Colorado

Carey S. Ryan
University of Nebraska at Omaha

Third edition published 2017 by Routledge
711 Third Avenue, New York, NY 10017
and by Routledge
2 Park Square, Milton Park, Abingdon, Oxon, OX14 4RN

Routledge is an imprint of the Taylor & Francis Group, an informa business

© 2017 Taylor & Francis

The right of Charles M. Judd, Gary H. McClelland, and Carey S. Ryan to be identified as authors of this work has been asserted by them in accordance with sections 77 and 78 of the Copyright, Designs and Patents Act 1988.

All rights reserved. No part of this book may be reprinted or reproduced or utilised in any form or by any electronic, mechanical, or other means, now known or hereafter invented, including photocopying and recording, or in any information storage or retrieval system, without permission in writing from the publishers.

Trademark notice: Product or corporate names may be trademarks or registered trademarks, and are used only for identification and explanation without intent to infringe.
First edition published by Harcourt Brace Jovanovich 1989
Second edition published by Routledge 2009

Library of Congress Cataloging in Publication Data
Names: Judd, Charles M., author. | McClelland, Gary H., 1947– author. | Ryan, Carey S., author.
Title: Data analysis : a model comparison approach to regression, ANOVA, and beyond / Charles M. Judd, University of Colorado, Gary H. McClelland, University of Colorado, Carey S. Ryan, University of Nebraska at Omaha.
Description: Third Edition. | New York : Routledge, 2017. | Revised edition of Data analysis, 2007. | Includes bibliographical references and indexes.
Identifiers: LCCN 2016055541 | ISBN 9781138819825 (hardback : alk. paper) | ISBN 9781138819832 (pbk. : alk. paper) | ISBN 9781315744131 (ebook)
Subjects: LCSH: Mathematical statistics.
Classification: LCC QA276 .J83 2017 | DDC 519.5/36—dc23
LC record available at https://lccn.loc.gov/2016055541

ISBN: 978-1-138-81982-5 (hbk)
ISBN: 978-1-138-81983-2 (pbk)
ISBN: 978-1-315-74413-1 (ebk)

Typeset in Times New Roman by Florence Production Ltd, Stoodleigh, Devon, UK

Visit the companion website: www.dataanalysisbook.com

Contents

Preface
1 Introduction to Data Analysis
2 Simple Models: Definitions of Error and Parameter Estimates
3 Simple Models: Models of Error and Sampling Distributions
4 Simple Models: Statistical Inferences about Parameter Values
5 Simple Regression: Estimating Models with a Single Continuous Predictor
6 Multiple Regression: Models with Multiple Continuous Predictors
7 Moderated and Nonlinear Regression Models
8 One-Way ANOVA: Models with a Single Categorical Predictor
9 Factorial ANOVA: Models with Multiple Categorical Predictors and Product Terms
10 ANCOVA: Models with Continuous and Categorical Predictors
11 Repeated-Measures ANOVA: Models with Nonindependent Errors
12 Incorporating Continuous Predictors with Nonindependent Data: Towards Mixed Models
13 Outliers and Ill-Mannered Error
14 Logistic Regression: Dependent Categorical Variables
References
Appendix
Author Index
Subject Index

Preface

We are stunned that twenty-seven years after publishing the first edition of our text, we find ourselves preparing this third edition. The first edition resulted from our teaching a graduate level course in data analysis in the 1980s at the University of Colorado (Judd, C. M., & McClelland, G. H. (1989). Data analysis: A model comparison approach. Orlando, FL: Harcourt Brace Jovanovich). The second edition was published in 2008, with the invaluable assistance of Carey Ryan (Judd, C. M., McClelland, G. H., & Ryan, C. S. (2008). Data analysis: A model comparison approach. New York, NY: Routledge). Now we once again find ourselves happy to see the continuing interest in our approach and our book, as revealed by the continuing course adoptions and the enthusiasm of our publisher.

There is much that is new and different in this third edition. At a later point in this Preface, we make clear the significant changes that we have undertaken. At the same time, our basic model comparison approach and our way of thinking about data have not changed at all and remain the core of the book. Accordingly, we reproduce below parts of the Preface of the second edition, making clear our assumptions and our approach from the very beginning.

GOALS AND ASSUMPTIONS

Statistics courses, textbooks, and software are usually organized in the same way that a cookbook is organized. Typically, various recipes are given in different chapters for different kinds of research designs or data structures. In fact, numerous statistics books include a chart at the beginning, pointing the reader to various chapters and ostensibly different statistical procedures, depending on what their data look like and the kinds of questions that they wish to answer.
As a result, social and behavioral scientists, as consumers of statistics, typically organize their statistical knowledge in much the same way: “With this sort of data, I know to do this test. With that sort of data, I know to do that test.”

This book has been written under the assumption that this sort of organization for statistical knowledge has distinct costs. To extend the cookbook analogy, cooks who rely on cookbooks do not know how to proceed when they wish to prepare a dish for which no recipe has been included. When students are confronted with data that do not fit nicely into one of the categories for which they have learned a statistical procedure, frustration and error frequently occur. The student may proceed to use the wrong test, making inappropriate assumptions about the structure of the data in order to use a well-learned statistical procedure. We have encountered students who have bludgeoned their data to fit an orthogonal analysis of variance with equal sample sizes because it is the only test they know. We have also heard some of our colleagues moan that a given data analysis problem would be simple if only they had a particular specialized computer program that was available at their previous university.

A totally different sort of organization forms the basis of this book. Our focus is on how statistical procedures can be seen as tools for building and testing models of data. A consistent framework is used throughout the book to develop a few powerful techniques for model building and statistical inference. This framework subsumes all of the different statistical recipes that are normally found in the cookbooks, but it does so in a totally integrated manner. As we build and test models, we develop a consistent vocabulary to refer to our models and inference procedures. At each step, however, we clearly show the reader how our vocabulary and models can be translated back into the recipes of the old cookbooks.

We are convinced that this integrated framework represents a better way to learn and think about data analysis. Students who understand data analysis through our framework will be able to ask the questions of their data that they want to ask instead of the questions that the designer of some textbook or of some computer statistical package assumed they would want to ask. In the end, students will use theory and their own intelligence to guide their data analysis instead of turning over the analysis to a cookbook set of procedures or to a computer program. Intelligent data analysis is our goal. Rather than applying recipes that have been learned by rote, we wish to enable students to write their own recipes as they do data analysis.

A few words are in order concerning our assumptions about the reader and the structure of the book. We assume that the reader is a student or professional researcher in the behavioral or social sciences, business, education, or a related field who wants to analyze data to answer significant theoretical or practical questions in a substantive discipline. We assume that the reader does not want to become a statistician but is more interested in the substantive discipline, with data analysis being a tool rather than an end in itself. We assume that the reader is reasonably facile with basic algebraic manipulations because almost all of our models for data are expressed algebraically. But we do not assume any other mathematical training such as calculus or matrix algebra. A very important assumption is that the reader has access to a good multiple regression program in a statistical software package. But, importantly, we do not care which one.

The book is appropriate for semester or year-long statistics and data analysis courses emphasizing multiple regression and/or analysis-of-variance and taught at the upper undergraduate or graduate levels.
Previous editions have been successfully used with students from psychology, sociology, anthropology, political science, linguistics, cognitive science, neuroscience, biology, geology, geography, museum sciences, applied mathematics, marketing, management science, and organizational behavior.

Our assumptions about the reader and our goal of training data analysts instead of statisticians have prompted several decisions about the structure of the book:

1. We present only enough mathematical derivations to provide conceptual clarity. We will generally assume that the mathematical statisticians have done their job correctly, and so we will use many of their results without proving them ourselves. At the same time, however, we cannot abandon all mathematical details because it is extremely important that the data analyst be able to recognize when the data analysis and particularly the model of the data are inappropriate or in need of modification. The choice of which derivations to include is therefore guided by the goal of training educated data analysts and not mathematical statisticians.

2. We let the computer do the computational work. Most statistics cookbooks present many different formulas for each statistic. The different formulas facilitate hand calculation for different organizations of the raw data. We assume that the reader is interested in the science and not the calculations and will use the most efficient means to do the calculations: the computer. Ironically, the old hand-calculation formulas can be disastrous when implemented in computer programs because of problems of rounding errors and other computer idiosyncrasies. Neither the computational formulas that are best for computers nor the hand-calculation formulas are very conceptual. Hence, we avoid both computer- and hand-oriented formulas. We present instead formulas that emphasize the concepts but may not be useful for direct computations either by hand or by computer.

3. We try as much as possible to work from the general to the specific so that an integrated framework emerges. Many statistics books begin with details and simple statistics and then slowly build up to more general models, changing concepts and notation frequently along the way. Although we begin with simple models of data and work up, we do so within the context of a consistent overall framework, and we present the simple models using the same concepts and notation that we use with the more complex models presented later. Most importantly, we use the same inferential statistics throughout.

4. We do not try to cover all of statistics. There are many statistical procedures that we have left out either because they are infrequently used or because the same thing can be accomplished with a more general model that we do cover. We provide the data analyst with a limited stock of models and statistical procedures, but the models we do provide are quite general and powerful. The goal is to learn a few powerful techniques very thoroughly.

5. Our framework for data analysis is consistent with what has been termed the regression approach or the general linear model. Historically, the regression approach has often been contrasted, erroneously, with an analysis-of-variance approach. And, in this tradition, regression has been more often associated with surveys, quasi-experiments, and nonexperimental data, while analysis of variance has more frequently been used for randomized, laboratory experiments. These historical associations caused many people to have a false belief that there is a fundamental difference between the two approaches. We adopt the more general regression approach and show how all of analysis of variance can be accomplished within this framework. This has the important advantage of reducing the number of specialized statistical techniques that must be learned for particular data analysis problems. More importantly, we show how the regression approach provides more control in the analysis of experimental data so that we can ask specific, theoretically motivated questions of our data, questions that often differ from the standard questions that the procedures of a traditional analysis of variance presume we want to ask.

6. We had a difficult decision to make about the vocabulary to be used in this book. On the one hand, we are convinced that the traditional names for the various
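
The remark in point 2 that hand-calculation formulas can be disastrous in computer programs can be illustrated with a small sketch. It is not taken from the book: the data values and the function names are assumptions chosen to expose the problem. It contrasts the definitional sum of squared deviations with the familiar hand-calculation shortcut, the sum of x^2 minus (sum of x)^2 / n, in ordinary double-precision arithmetic.

```python
def ss_definitional(xs):
    """Sum of squared deviations from the mean (the conceptual formula)."""
    mean = sum(xs) / len(xs)
    return sum((x - mean) ** 2 for x in xs)


def ss_shortcut(xs):
    """Hand-calculation shortcut: sum(x^2) - (sum(x))^2 / n."""
    n = len(xs)
    return sum(x * x for x in xs) - sum(xs) ** 2 / n


# Hypothetical scores with a large common offset.
# Deviations from the mean are -6, -3, 3, 6, so the true sum of squares is 90.
scores = [1000000004.0, 1000000007.0, 1000000013.0, 1000000016.0]

definitional = ss_definitional(scores)
shortcut = ss_shortcut(scores)

print("definitional:", definitional)
print("shortcut:    ", shortcut)
```

Each squared score here is near 10^18, where neighbouring double-precision values are more than 100 apart, so the shortcut subtracts two huge rounded numbers and cannot resolve a quantity as small as 90. The definitional form works with small deviations and recovers it.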

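Point 5 states that analysis of variance can be carried out within the regression framework. A minimal sketch of that idea follows, assuming NumPy is available. The scores, the variable names, the dummy coding, and the labels "compact" and "augmented" are illustrative assumptions and not the book's own worked example. The F statistic is computed twice: once from between- and within-group sums of squares, and once by comparing an intercept-only model with a model that adds two dummy-coded group predictors.

```python
import numpy as np

# Hypothetical scores for three groups (illustrative data only).
groups = [
    np.array([4.0, 5.0, 6.0, 5.0]),
    np.array([7.0, 8.0, 6.0, 7.0]),
    np.array([9.0, 10.0, 8.0, 9.0]),
]
y = np.concatenate(groups)
n, k = y.size, len(groups)

# Traditional one-way ANOVA: between- and within-group sums of squares.
grand_mean = y.mean()
ss_between = sum(g.size * (g.mean() - grand_mean) ** 2 for g in groups)
ss_within = sum(((g - g.mean()) ** 2).sum() for g in groups)
f_anova = (ss_between / (k - 1)) / (ss_within / (n - k))


def sse(design, outcome):
    """Sum of squared errors from a least-squares fit of outcome on design."""
    coefs, *_ = np.linalg.lstsq(design, outcome, rcond=None)
    residuals = outcome - design @ coefs
    return float(residuals @ residuals)


# Regression version: intercept-only model versus intercept plus dummy codes.
labels = np.repeat(np.arange(k), [g.size for g in groups])
intercept = np.ones(n)
compact = intercept.reshape(-1, 1)
augmented = np.column_stack([intercept, labels == 1, labels == 2]).astype(float)

sse_compact = sse(compact, y)
sse_augmented = sse(augmented, y)

# Proportional reduction in error achieved by adding the group predictors.
pre = (sse_compact - sse_augmented) / sse_compact
f_regression = ((sse_compact - sse_augmented) / (k - 1)) / (sse_augmented / (n - k))

print(f"ANOVA F = {f_anova:.4f}, regression F = {f_regression:.4f}, PRE = {pre:.4f}")
```

The two F values agree because the error of the intercept-only model is the total sum of squares and the error of the dummy-coded model is the within-group sum of squares, so the reduction in error between the two models is the between-group sum of squares.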