
Introductory Topics PDF

375 Pages·1978·17.113 MB·English


SHELBY J. HABERMAN
DEPARTMENT OF STATISTICS
UNIVERSITY OF CHICAGO
CHICAGO, ILLINOIS

Analysis of Qualitative Data
Volume 1
INTRODUCTORY TOPICS

ACADEMIC PRESS New York San Francisco London 1978
A Subsidiary of Harcourt Brace Jovanovich, Publishers

COPYRIGHT © 1978, BY ACADEMIC PRESS, INC. ALL RIGHTS RESERVED. NO PART OF THIS PUBLICATION MAY BE REPRODUCED OR TRANSMITTED IN ANY FORM OR BY ANY MEANS, ELECTRONIC OR MECHANICAL, INCLUDING PHOTOCOPY, RECORDING, OR ANY INFORMATION STORAGE AND RETRIEVAL SYSTEM, WITHOUT PERMISSION IN WRITING FROM THE PUBLISHER.

ACADEMIC PRESS, INC.
111 Fifth Avenue, New York, New York 10003

United Kingdom Edition published by
ACADEMIC PRESS, INC. (LONDON) LTD.
24/28 Oval Road, London NW1 7DX

Library of Congress Cataloging in Publication Data
Haberman, Shelby J
Analysis of qualitative data.
Includes bibliographical references and indexes.
CONTENTS: v. 1. Introductory topics.
1. Log-linear models. 2. Contingency tables. 3. Social sciences—Statistical methods. I. Title.
HA33.H27 519.5 77-25731
ISBN 0-12-312501-4

PRINTED IN THE UNITED STATES OF AMERICA

Preface

Qualitative data are encountered throughout the social sciences. Despite this widespread occurrence, statistical methods for analysis of these data remained quite primitive until the 1960s. Since the early 1960s, development of log-linear models has resulted in very rapid progress in the sophisticated analysis of qualitative and nominal data. Although much of the development of log-linear models has been brought about by statisticians such as Leo Goodman and Frederick Mosteller with close ties to the social sciences, few social scientists have had more than a limited acquaintance with this class of statistical models.

A social scientist interested in learning about log-linear models currently has available a welter of journal articles.
These articles appear in the statistics, social science, and biological science literature, and varying levels of mathematical sophistication are required to read them. Three recent books, Plackett (1974), Bishop, Fienberg, and Holland (1975), and Haberman (1974a), are also available. None is specifically oriented toward the needs of social scientists, either in choice of examples and topics or in mathematical level.

This book attempts to provide an introduction to log-linear models that is oriented toward social scientists. Mathematical requirements are kept to a minimum. Some familiarity with matrix algebra sometimes will be helpful, but no explicit use will be made of calculus. Although the formal mathematical requirements are limited, the book does contain numerous algebraic formulas, and summation notation is repeatedly used. Without such expressions it is not possible to present general computational algorithms, and assessment of the variability of parameter estimates is impossible. The book assumes sufficient prior knowledge of statistics so that concepts such as confidence intervals, hypothesis tests, and point estimates are familiar. Knowledge of analysis of variance and regression analysis is very helpful in gaining perspective on the relationships between the analysis of continuous and of discrete variables; however, this knowledge is not required for understanding the material in this book. General familiarity with maximum-likelihood estimation is also helpful but not necessary. These restrictions prevent inclusion of formal proofs of results; however, such proofs can be found in Haberman (1974a).

Examples used to illustrate the statistical techniques described in this book are obtained from real data of interest to social scientists. Some examples will involve basic problems in survey research, such as memory error.
Some examples will consider topics of public interest such as variations in homicide rates related to variables such as the race and sex of victim. Still other examples will use the General Social Survey of the National Opinion Research Center to examine public opinion concerning abortion.

The first four chapters are ordered by the number of variables under study. Chapter 1 introduces some basic methods for study of the distribution of a single polytomous variable. Several simple examples provide an introduction to the basic estimation and testing procedures associated with log-linear models.

Chapter 2 considers contingency tables in which two polytomous variables are cross classified. Tests for independence and methods for describing departures from independence are discussed.

In Chapter 3 hierarchical log-linear models for three-way tables are introduced. In these tables, three polytomous variables are cross classified. Models for independence, conditional independence, and no three-factor interaction are described. The iterative proportional fitting algorithm of Deming and Stephan (1940) is introduced.

Chapter 4 extends procedures developed in Chapter 3 to contingency tables in which four or more variables are cross classified.

Chapter 5 provides an introduction to logit models. In these models discrete or continuous independent variables are used to predict a dichotomous dependent variable. The models considered are analogous to regression models used with continuous dependent variables.

In Chapter 6 multinomial-response models are developed for prediction of one or more polytomous dependent variables. These models are generalizations of the logit models of Chapter 5.

Chapters 7 and 8 examine contingency tables with special structures. In Chapter 7 tables are considered in which certain cells are unusual in their behavior. The quasi-independence model and other models that ignore these unusual cells are considered.
In Chapter 8 models are considered for contingency tables in which several polytomous variables have the same categories. Symmetry models, quasi-symmetry models, and distance models are introduced.

Chapter 9 considers a classical problem of applied statistics, the adjustment of data. The methods developed by Deming and Stephan in the 1940s are illustrated, and some newer possibilities are discussed.

Chapter 10 examines the relationship of log-linear models to latent-structure models. The material in this chapter is quite new. It is intended to provide an indication of possible future developments.

When used as a textbook in a one-quarter or one-semester course, the first five chapters, which constitute Volume 1, can be regarded as a minimal goal.

Acknowledgments

Work on this book has been supported in part by National Science Foundation Grant No. SOC72-05228 A04, National Science Foundation Grant No. MCS72-04364 A04, and National Institutes of Health Grant No. GM22648.

Discussions with Professors Leo Goodman, William Mason, and Clifford Clogg have been particularly helpful in preparation of the manuscript. Much of the data used in this book has been provided by the National Opinion Research Center of the University of Chicago. Thanks are also due to the editors of Sociological Analysis for permission to use Table 2.2, to John Wiley & Sons for permission to use Tables 2.4 and 2.14, and to Harcourt Brace Jovanovich for permission to use Tables 2.1 and 2.12.
Contents of Volume 2

6 MULTINOMIAL-RESPONSE MODELS
7 INCOMPLETE TABLES
8 SYMMETRICAL TABLES
9 ADJUSTMENT OF DATA
10 LATENT-CLASS MODELS
APPENDIX: COMPUTER PROGRAMS FOR COMPUTATION OF MAXIMUM-LIKELIHOOD ESTIMATES

1 Polytomous Responses

The basic tools used to analyze frequency tables by means of log-linear models may be introduced by an examination of some one-way frequency tables, which summarize observations on a single polytomous variable. Examples of such tables are provided in Sections 1.1 to 1.4. In Section 1.5, some general observations are made concerning use of log-linear models with one-way tables; chapter references are provided here.

Sections 1.1 and 1.2 treat multinomial data. The example in Section 1.1 uses memory data to illustrate use of Pearson and likelihood-ratio chi-square statistics, adjusted residuals, and linear combinations of frequency counts to examine departures from a model of equal probabilities for each class. Departures from this model are then described by a log-linear model that has many formal analogies to simple linear regression. The analogies are used to describe the Newton-Raphson algorithm for computation of maximum-likelihood estimates in terms of a series of weighted regression analyses. Formulas used in weighted regression are also used to obtain normal approximations for the distribution of estimated parameters. Once maximum-likelihood estimates are obtained, chi-square tests and adjusted residuals are used to explore the adequacy of the proposed model.

In Section 1.2, which has a similar structure to Section 1.1, self-classification by social class is considered. Chi-square statistics and adjusted residuals are now used to determine whether the distribution of self-classifications is symmetrical.
Departures from symmetry are examined by a log-linear model which is formally analogous to a multiple regression model, and computational procedures for maximum-likelihood estimates are shown to correspond to weighted multiple regressions.

In Sections 1.3 and 1.4 attention shifts to Poisson data. Section 1.3 considers whether daily suicide rates are seasonally dependent, and models of constant daily rates and of constant daily rates for each season are tested with chi-square statistics and adjusted residuals. Regional variations in suicide rates are considered in Section 1.4, and simultaneous confidence intervals are introduced to determine which regional differences are clearly established.

Section 1.5 provides the general results on which the chapter is based. Maximum-likelihood equations and the Newton-Raphson algorithm are introduced for general log-linear models for both multinomial and Poisson data. General analogies with weighted regression analysis are used to describe the Newton-Raphson algorithm and to develop large-sample approximations to the distributions of parameter estimates, residuals, and chi-square statistics.

1.1 Effects of Memory—An Introduction to Log-Linear Models

Analysis of data by log-linear models involves several distinct stages. First, a plausible model is proposed for the data under study. Second, unknown parameters in the model are estimated from the data, generally by the method of maximum likelihood. This method yields estimates available in closed form, as in the equiprobability model, or estimates computed by a version of the Newton-Raphson algorithm with analogies to weighted regression analysis, as in the log-linear time-trend model. Third, these parameter estimates are used in statistical tests of the model's adequacy. Pearson and likelihood-ratio chi-square tests provide overall measures of the compatibility of the model and the data.
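As a minimal sketch of the two overall fit statistics just mentioned, the following Python fragment computes the Pearson and likelihood-ratio chi-square statistics for the equiprobability model, in which every one of k classes has the same expected count N/k. The counts and the function name equiprobability_fit are hypothetical and serve only as an illustration; they are not the data of Table 1.1.

```python
import math


def equiprobability_fit(counts):
    """Return (Pearson X2, likelihood-ratio L2, degrees of freedom)
    for the model in which all classes are equally probable."""
    total = sum(counts)
    k = len(counts)
    # Closed-form maximum-likelihood estimate of each expected count.
    expected = total / k
    pearson = sum((n - expected) ** 2 / expected for n in counts)
    # A class with a zero count contributes nothing to the sum.
    likelihood_ratio = 2.0 * sum(
        n * math.log(n / expected) for n in counts if n > 0
    )
    return pearson, likelihood_ratio, k - 1


# Hypothetical one-way table of counts (an assumption for illustration).
counts = [20, 15, 10, 15]
x2, l2, df = equiprobability_fit(counts)
print(f"X2 = {x2:.3f}, L2 = {l2:.3f}, df = {df}")
```

Both statistics would be compared with a chi-square distribution on k - 1 degrees of freedom, since the equiprobability model has no free parameters once the total count is fixed.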
More specific insight into deviations between model and data is provided by analysis of adjusted residuals and of selected linear combinations of frequencies.

Two possibilities exist at the fourth stage. If the model appears adequate, then the parameter estimates are used to obtain quantitative implications concerning the data. Asymptotic standard deviations and approximate confidence intervals are major tools at this point. If the model appears inadequate, then the residual analyses and analyses of linear combinations of frequencies of the third stage are employed to suggest new models that are more consistent with the data, and the four stages are then applied to the new model. Thus, analysis will often be an iterative process in which the four-stage analysis is applied to several distinct models, many of which are suggested by previous exploration of the data.

As an introduction to the use of log-linear models, consider the data shown in Table 1.1. These data were compiled to study a problem observed during a study conducted to explore the relationship between life stresses and illnesses. The study, which is described in Uhlenhuth, Lipman, Balter,
