
R (and S-PLUS) Manual to Accompany Agresti’s Categorical PDF

281 Pages·2009·0.85 MB·English


R (and S-PLUS) Manual to Accompany Agresti’s Categorical Data Analysis (2002), 2nd edition
© Laura A. Thompson, 2009

Table of Contents

Introduction and Changes from First Edition
A. Obtaining the R Software for Windows
B. Libraries in S-PLUS and Packages in R
C. Setting contrast types using options()
D. Credit for functions
E. Editing functions
F. A note about using S-PLUS Menus
G. Notice of errors
H. Introductions to the S Language
I. References
J. Acknowledgements

Chapter 1: Distributions and Inference for Categorical Data
A. Summary of Chapter 1, Agresti
B. Categorical Distributions in S-PLUS and R
C. Proportion of Vegetarians (Statistical Inference for Binomial Parameters)
D. The Mid-P-Value
E. Pearson’s Chi-Squared Statistic
F. Likelihood Ratio Chi-Squared Statistic
G. Maximum Likelihood Estimation

Chapter 2: Describing Contingency Tables
A. Summary of Chapter 2, Agresti
B. Comparing two proportions
C. Partial Association in Stratified 2 x 2 Tables
D. Conditional Odds Ratios
E. Summary Measures of Association: Ordinal Trends

Chapter 3: Inference for Contingency Tables
A. Summary of Chapter 3, Agresti
B. Confidence Intervals for Association Parameters
C. Testing Independence in Two-way Contingency Tables
D. Following Up Chi-Squared Tests
E. Two-Way Tables with Ordered Classification
F. Small Sample Tests of Independence
G. Small-Sample Confidence Intervals For 2x2 Tables

Chapter 4: Generalized Linear Models
A. Summary of Chapter 4, Agresti
B. Generalized Linear Models for Binary Data
C. Generalized Linear Models for Count Data
D. Overdispersion in Poisson Generalized Linear Models
E. Negative Binomial GLIMs
F. Residuals for GLIMs
G. Quasi-Likelihood and GLIMs
H. Generalized Additive Models (GAMs)

Chapter 5: Logistic Regression
A. Summary of Chapter 5, Agresti
B. Logistic Regression for Horseshoe Crab Data
C. Goodness-of-fit for Logistic Regression for Ungrouped Data
D. Logit Models with Categorical Predictors
E. Multiple Logistic Regression
F. Extended Example (Problem 5.17)

Chapter 6: Building and Applying Logistic Regression Models
A. Summary of Chapter 6, Agresti
B. Model Selection
C. Using Causal Hypotheses to Guide Model Fitting
D. Logistic Regression Diagnostics
E. Inference about Conditional Associations in 2 x 2 x K Tables
F. Estimation/Testing of Common Odds Ratio
G. Using Models to Improve Inferential Power
H. Sample Size and Power Considerations
I. Probit and Complementary Log-Log Models
J. Conditional Logistic Regression and Exact Distributions
K. Bias-reduced Logistic Regression

Chapter 7: Logit Models for Multinomial Responses
A. Summary of Chapter 7, Agresti
B. Nominal Responses: Baseline-Category Logit Models
C. Cumulative Logit Models
D. Cumulative Link Models
E. Adjacent-Categories Logit Models
F. Continuation-Ratio Logit Models
G. Mean Response Models
H. Generalized Cochran-Mantel-Haenszel Statistic for Ordinal Categories

Chapter 8: Loglinear Models for Contingency Tables
A. Summary of Chapter 8, Agresti
B. Loglinear Models for Three-way Tables
C. Inference for Loglinear Models
D. Loglinear Models for Higher Dimensions
E. Loglinear-Logit Model Connection
F. Contingency Table Standardization

Chapter 9: Building and Extending Loglinear Models
A. Summary of Chapter 9, Agresti
B. Model Selection and Comparison
C. Diagnostics for Checking Models
D. Modeling Ordinal Associations
E. Association Models
F. Association Models, Correlation Models, and Correspondence Analysis
G. Poisson Regression for Rates
H. Modeling Survival Times
I. Empty Cells and Sparseness

Chapter 10: Models for Matched Pairs
A. Summary of Chapter 10, Agresti
B. Comparing Dependent Proportions
C. Conditional Logistic Regression for Binary Matched Pairs
D. Marginal Models for Square Contingency Tables
E. Symmetry, Quasi-symmetry, and Quasi-independence
F. Square Tables with Ordered Categories
G. Measuring Agreement Between Observers
H. Kappa Measure of Agreement
I. Bradley-Terry Model for Paired Preferences
J. Bradley-Terry Model with Order Effect
K. Marginal and Quasi-symmetry Models for Matched Sets

Chapter 11: Analyzing Repeated Categorical Response Data
A. Summary of Chapter 11, Agresti
B. Comparing Marginal Distributions: Multiple Responses
C. Marginal Modeling: Maximum Likelihood Approach
D. Marginal Modeling: Maximum Likelihood Approach. Modeling a Repeated Multinomial Response
E. Marginal Modeling: GEE Approach. Modeling a Repeated Multinomial Response
F. Marginal Modeling: GEE Approach. Modeling a Repeated Multinomial Ordinal Response
G. Markov Chains: Transitional Modeling

Chapter 12: Random Effects: Generalized Linear Mixed Models for Categorical Responses
A. Summary of Chapter 12, Agresti
B. Logit GLIMM for Binary Matched Pairs
C. Examples of Random Effects Models for Binary Data
D. Random Effects Models for Multinomial Data
E. Multivariate Random Effects Models for Binary Data

Chapter 13: Other Mixture Models for Categorical Data
A. Summary of Chapter 13, Agresti
B. Latent Class Models
C. Nonparametric Random Effects Models
D. Beta-Binomial Models
E. Negative-Binomial Regression
F. Poisson Regression with Random Effects

Introduction and Changes from First Edition

This manual accompanies Agresti’s Categorical Data Analysis (2002). It provides help in carrying out the statistical methods illustrated there, using S-PLUS and the R language. Although I have used the Windows versions of these two programs, I expect that few changes are needed to use the code on other platforms. I have included examples of almost all of the major (and some minor) analyses introduced by Agresti. The manual chapters parallel those of Agresti so that, for example, in Chapter 2 I discuss using the software to conduct analyses from Agresti’s Chapter 2. In most cases I use the data provided in the text; only on one or two occasions do I use data from the problem sections in Agresti.
Discussion of results is brief, since the discussion appears in the text. That said, one improvement of the current manual over the previous version (Thompson, 1999) is that it is more self-contained. In addition, I include a summary of the corresponding chapter from Agresti at the beginning of each chapter of the manual. However, it will still be helpful to refer to the text to fully understand the analyses in this manual.

In the manual, I frequently use functions that come from user-contributed libraries (packages) of S-PLUS (or R). In the text, I note when a particular library or package is used. These libraries are not automatically included with the software, but they can easily be downloaded from the internet, especially for the newest version of R for Windows. I explain in Section B below how to obtain these libraries and install them for use. Often, the library or package has its own help manual or help section; I demonstrate how to access these from inside the software.

I used S-PLUS 6.1 through 7.0 for Windows and R versions 1.8 through 2.8.1 for Windows for the analyses. However, S-PLUS for Windows versions as far back as 3.0 will do many of the analyses (but not all). This is not so easily said for R, as user-contributed packages frequently require newer versions of R (e.g., at least 1.3.0). Many of the analyses can be run in either S-PLUS or R. Some need small changes in order to run in both; I have provided these changes where necessary. In general, when I refer to both programs, I use the "generic" name, S. Also, if I do not indicate that I am using either S-PLUS or R for a particular command, that command works in both. To separate input to R or S-PLUS and output from the other text in this manual, I have put normal text in Arial font and commands and output in Courier font. The input commands are in bold font, whereas the output is not.
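The version and package checks mentioned above can be carried out from inside R. The following is a minimal sketch for a current version of R, not code from the original manual: MASS stands in for any package only because it is distributed with standard R builds, and the CRAN mirror URL is an assumption (any CRAN mirror will do).

```r
## Which version of R is running? (the manual was tested with R 1.8 through 2.8.1)
print(R.version.string)
at.least.2.8.1 <- getRversion() >= "2.8.1"   # a single logical; TRUE on any recent R
print(at.least.2.8.1)

## Is a package installed? requireNamespace() checks without attaching it.
pkg <- "MASS"
if (!requireNamespace(pkg, quietly = TRUE)) {
  # Reached only when the package is missing; needs an internet connection.
  # The mirror below is one choice among many (assumption).
  install.packages(pkg, repos = "https://cloud.r-project.org")
}
print(packageVersion(pkg))

## Attach the package, browse its help index, then detach it again
library(MASS)
if (interactive()) help(package = "MASS")   # opens the help index in interactive sessions
was.attached <- "package:MASS" %in% search()
detach("package:MASS")
```

requireNamespace() returns FALSE rather than stopping with an error when a package is absent, which makes it a convenient test for deciding whether install.packages needs to be called at all.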
All assignments in this manual use the "<-" convention instead of "=" (or "_"). Finally, this manual assumes some familiarity with the basic commands of S. To keep the manual from being too long, I do not discuss at length the functions I use that are not directly related to categorical data analysis. See Section H below for information on obtaining introductory documentation for R or S-PLUS.

A. Obtaining the R Software for Windows

The language (and associated software interface) R can loosely be described as "open-source" S. It can be downloaded from http://cran.r-project.org. The website also provides information on how to install R, as well as several PDF manuals and user-contributed documents on the language and its features.

B. Libraries in S-PLUS and Packages in R

The S-PLUS libraries used in this manual that do not come with the software are:

MASS (B. Ripley) - (used throughout)
Multiv (F. Murtagh) - (used for correspondence analysis)
cond (A. Brazzale) - (used for conditional logistic regression in chapter 6; NOTE: no longer supported) http://www.ladseb.pd.cnr.it/~brazzale/lib.html#ins
Design (F. Harrell) - (used throughout)
Hmisc (F. Harrell) - (support for Design library)
nnet (B. Ripley) - (for the function multinom, chapter 7, multinomial logit models)
nolr (M. Mathieson) - (nonlinear ordinal models, supplement to chapter 9)
rmtools (A. Azzalini & M. Chiogna) - (used for chapter 11)
yags2 (V. Carey) - (used for chapter 11)

Most of these libraries can be obtained in .zip form from http://lib.stat.cmu.edu/DOS/S/Swin or http://lib.stat.cmu.edu/DOS/S. Currently, http://www.stats.ox.ac.uk/pub/MASS4/Winlibs/ contains many ports to S-PLUS 6.0 for Windows. To install a library, first download it to any folder on your computer. Next, unzip the file using an unzipping program. This extracts all the files into a new folder in the directory to which you downloaded the zip file.
Move the entire folder to the library directory under your S-PLUS directory (e.g., c:/program files/Insightful/splus61/library). To load a library, you can either pull down the File menu in S-PLUS and select Load Library, or type one of the following in a script or command window:

library("libraryname", first=T)   # loads libraryname into the first database position
library(libraryname)

To use the library’s help manual from inside S-PLUS, type the following in a script or command window:

help(library="libraryname")

Many of the R packages used in this manual that do not come with the software are listed below (this is not a complete list):

MASS - (VR bundle, Venables and Ripley)
rmutil (J. Lindsey) - (used with gnlm) http://alpha.luc.ac.be/~lucp0753/rcode.html
gnlm (J. Lindsey) - http://alpha.luc.ac.be/~lucp0753/rcode.html
repeated (J. Lindsey) - http://alpha.luc.ac.be/~lucp0753/rcode.html
SuppDists (B. Wheeler) - (used in chapter 1)
combinant (V. Carey) - (used in chapters 1, 3)
methods - (used in chapter 3)
Bhat (E. Luebeck) - (used throughout)
mgcv (S. Wood) - (used for fitting GAM models)
modreg (B. Ripley) - (used for fitting GAM models)
gee and geepack (J. Yan) - (used in chapter 11)
yags (V. Carey) - (used in chapter 11)
gllm - (used for generalized loglinear models and latent class models)
GlmmGibbs (Myles and Clayton) - (used for generalized linear mixed models, chapter 12)
glmmML (G. Broström) - (used for generalized linear mixed models, chapter 12)
CoCoAn (S. Dray) - (used for correspondence analysis)
e1071 (A. Weingessel) - (used for latent class analysis)
vcd (M. Friendly) - (used in chapters 2, 3, 5, 9 and 10)
brlr (D. Firth) - (used in chapter 6)
BradleyTerry (D. Firth) - (used in chapter 10)
ordinal (P. Lindsey) - (used in chapter 7) http://popgen.unimaas.nl/~plindsey/rlibs.html
Design (F. Harrell) - (used throughout) http://hesweb1.med.virginia.edu/biostat
Hmisc (F. Harrell) - (used throughout)
VGAM (T. Yee) - (used in chapters 7 and 9) http://www.stat.auckland.ac.nz/~yee/VGAM/download.shtml
mph (J. Lang) - (used in chapters 7, 10) http://www.stat.uiowa.edu/~jblang/mph.fitting/mph.fit.documentation.htm#availability
exactLoglinTest (B. Caffo) - (used in chapters 9, 10) http://www.biostat.jhsph.edu/~bcaffo/downloads.htm
aod - (used in chapter 13)
lca - (used in chapter 13)
mmlcr - (used in chapter 13)
flexmix - (used in chapter 13)
npmlreg - (used in chapter 13)
Rcapture - (used in chapter 13)

R packages can be installed from Windows using the install.packages function, which can be called from the console or from the pull-down "Packages" menu. A package is loaded with library(pkg), as in S-PLUS; the first=T argument shown above is specific to S-PLUS. The command help(package=pkg) opens the help index for package pkg. To detach a library or package named library.name or pkg, respectively, one can issue the following commands, among others:

detach("library.name")   # S-PLUS
detach("package:pkg")    # R

C. Setting contrast types using options()

The options function sets the type of contrasts used in estimating models. The default for S-PLUS is Helmert contrasts for factors and polynomial contrasts for ordered factors. The default for R is treatment contrasts for factors and polynomial contrasts for ordered factors. For most of the analyses I use treatment contrasts for factors so that the estimates match Agresti’s. However, there are times when I use sum-to-zero contrasts (contr.sum). When necessary, the type of contrasts I use is indicated by a call to options prior to fitting the model, for example options(contrasts=c("contr.treatment", "contr.poly")). If the call to options has been omitted, please assume I am using treatment contrasts for factors. The functions model.matrix and contrasts show exactly which contrasts a glm-type fit uses: in R, attr(model.matrix(fit), "contrasts") lists the contrast type for each factor, and contrasts(f) displays the contrast matrix for a factor f.

D. Credit for functions

The author of a function is named if the function was not written by me.
Whenever I use functions that do not come with S, I mention the S-PLUS library or R package that includes them, and I give a URL if applicable.

E. Editing functions

In several places in the text, I mention creating functions or editing existing functions that come with either S-PLUS or R. There are many ways to edit or create a function, some of them specific to your platform. However, one procedure that works on all platforms from the command line is a call to fix. For example, to edit the default method update.default and call the new version update.crosstabs, type the following at the command line (R or S-PLUS):

update.crosstabs <- fix(update.default)

This brings up the code for update.default in a text editor, in which you can make changes to the function, save them, and exit the editor. The changes are incorporated in update.crosstabs. (In R, fix also assigns the edited copy to update.default in the workspace; update.crosstabs <- edit(update.default) avoids this.) Note that the function edit works in mostly the same way here, but it is actually a generic function that allows editing not just of function objects but of all other S objects as well. To create a function from scratch, put the name of the new function as the argument to fix, for example:

fix(my.new.function)

To create functions from a script file (e.g., in S-PLUS) or another editing program, one general procedure is to source the script file, e.g.,

source("c:/path/name.of.script.file")

F. A note about using S-PLUS Menus

Many of the more common methods I illustrate can be accomplished via the S-PLUS menus. If you want to know what code corresponds to a particular menu command, issue the menu command and call up the History window (using the Window menu). All commands you have issued from menus will be there in (GUI) code form, which can then be used in a command window or script.

G. Notice of errors

All code has been tested, but there are undoubtedly still errors. Please notify me of any errors in the manual or of easier ways to perform tasks. My email address is [email protected].

H. Introductions to the S Language

This manual assumes some working knowledge of the S language; there is not space to also describe it here. Fortunately, there are many tutorials available for learning S. Some of them are listed in the User’s Guides that come with S-PLUS. Others are listed on the CRAN website for R (see Section A above).

I. References

Agresti, A. (2002). Categorical Data Analysis, 2nd ed. Wiley.
Bishop, C. (1995). Neural Networks for Pattern Recognition. Oxford University Press.
Chambers, J. (1998). Programming with Data. Springer-Verlag.
Chambers, J. and Hastie, T. (1992). Statistical Models in S. Chapman & Hall.
Ewens, W. and Grant, G. (2001). Statistical Methods in Bioinformatics. Springer-Verlag.
Fletcher, R. (1987). Practical Methods of Optimization. Wiley.
Gentleman, R. and Ihaka, R. (2000). "Lexical Scope and Statistical Computing." Journal of Computational and Graphical Statistics, 9, 491-508.
Green, P. and Silverman, B. (1994). Nonparametric Regression and Generalized Linear Models. Chapman & Hall.
Harrell, F. (1998). Predicting Outcomes: Applied Survival Analysis and Logistic Regression. Unpublished manuscript. Now available as Regression Modeling Strategies: With Applications to Linear Models, Logistic Regression, and Survival Analysis. Springer (2001).
Liao, J. and Rosen, O. (2001). "Fast and Stable Algorithms for Computing and Sampling From the Noncentral Hypergeometric Distribution." The American Statistician, 55, 366-369.
Lloyd, C. (1999). Statistical Analysis of Categorical Data. Wiley.
McCullagh, P. and Nelder, J. (1989). Generalized Linear Models, 2nd ed. Chapman & Hall.
Ripley, B. (2002). On-line complements to Modern Applied Statistics with S (http://www.stats.ox.ac.uk/pub/MASS4).
Ross, S. (1997). Introduction to Probability Models. Academic Press.
Selvin, S. (1998). Modern Applied Biostatistical Methods Using S-Plus. Oxford University Press.
Sprent, P. (1997). Applied Nonparametric Statistical Methods. Chapman & Hall.
Thompson, L. (1999). S-PLUS Manual to Accompany Agresti’s (1990) Categorical Data Analysis. (http://math.cl.uh.edu/~thompsonla/5537/Splusdiscrete.PDF)
Venables, W. and Ripley, B. (2000). S Programming. Springer-Verlag.
Venables, W. and Ripley, B. (2002). Modern Applied Statistics with S. Springer-Verlag.
Wickens, T. (1989). Multiway Contingency Tables Analysis for the Social Sciences. Lawrence Erlbaum Associates.

J. Acknowledgements

Thanks to Gregory Rodd and Frederico Zanqueta Poleto for reviewing the manuscript and finding many errors that I overlooked. Remaining errors are the sole responsibility of the author.
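The contrast settings described in Section C can be inspected with a short R session. The sketch below is an illustration rather than an analysis from the manual: the data frame toy and its counts are hypothetical, and the behaviour shown is that of current R (S-PLUS may differ).

```r
## Save the current contrast settings so they can be restored afterwards
old.opts <- options(contrasts = c("contr.treatment", "contr.poly"))

## Hypothetical toy data: successes and failures for a three-level factor
toy <- data.frame(
  grp = factor(c("a", "b", "c")),
  yes = c(10, 15, 20),
  no  = c(20, 15, 10)
)

## Treatment (first-level baseline) contrasts, R's default for unordered factors
fit.trt <- glm(cbind(yes, no) ~ grp, family = binomial, data = toy)
print(coef(fit.trt))
print(attr(model.matrix(fit.trt), "contrasts"))   # contrast type recorded per factor
print(contrasts(toy$grp))                          # contrast matrix under the current option

## Sum-to-zero contrasts reparameterize the same model
options(contrasts = c("contr.sum", "contr.poly"))
fit.sum <- glm(cbind(yes, no) ~ grp, family = binomial, data = toy)
print(coef(fit.sum))
print(attr(model.matrix(fit.sum), "contrasts"))

## Restore the previous settings
options(old.opts)
```

Both fits give the same fitted probabilities, since only the parameterization changes. Under treatment contrasts the intercept is the logit for the first level, a (here log(10/20)); under sum-to-zero contrasts it is the average of the three group logits, which is 0 for these symmetric counts.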
