Master thesis
Department of Statistics

On the Performance of Chi Square Tests Adjusted for Complex Survey Design

Israel Gebregiorgis

Master thesis, 30 higher education credits, spring term 2013
Supervisor: Dan Hedlin

Abstract

This paper discusses a Monte Carlo study of procedures for testing independence in a two-way table under two-stage cluster sampling. The study was designed to compare the different adjusted chi square tests of independence with respect to their type I error control and power, to assess how much these adjustments improve on the unadjusted tests, and to examine how much factors such as the number of clusters, the strength of clustering and the number of secondary sampling units in a sample affect the type I error control and power of the adjusted chi square tests.

Acknowledgment

I would like to express my deepest gratitude to my advisor Dan Hedlin, who was abundantly helpful and offered invaluable assistance, support and guidance throughout the course of this thesis.

Table of contents

1. Introduction
2. The standard Chi square test of independence for multinomial case
3. Impact of design on standard statistical tests
3.1 The impacts of design on contingency table
3.2 The impact of design on hypothesis test and confidence intervals
4. First order Rao and Scott adjustment (goodness of fit test)
4.1 The second order Rao-Scott adjustment (goodness of fit test)
4.2 First order Rao-Scott adjustment (test of independence)
4.3 Wald statistics
5. Simulations
5.1 Generating clustered data
5.2 The Monte Carlo study
5.3 Empirical measures used in the study
6. Study result and discussion on type I error
6.1 The effect of strength of clustering
6.2 The effect of number of clusters
6.3 The effect of secondary sampling units
7. Study result and discussion on power
7.1 The effect of strength of clustering
7.2 The effect of number of clusters
7.3 The effect of secondary sampling units
8. Finite population correction to Pearson chi square test
9. Conclusion
References
Appendix

1. Introduction

A variety of statistical methods for the analysis of cross-classified count data are used by survey researchers to analyze data from large surveys. Researchers often choose software that is designed for simple random samples, however, and arrive at misleading conclusions because the impact of the complex sample design is not taken into account. The earlier versions of SAS, SPSS and GLIM are some of the standard computer packages used to implement these methods. These packages are based on multinomial sampling assumptions, which means they assume that samples are selected by simple random sampling (SRS) with replacement, or that the observations are iid (independent and identically distributed) random variables. Researchers have been using these methods to analyze sample survey data, but survey designs often use stratification or cluster sampling and hence do not satisfy the multinomial assumptions. We will therefore first examine the impact that complex design features such as stratification and clustering have on methods that are based on the assumption of iid observations.
The first and second order Rao-Scott corrections to the standard Pearson chi square statistic ($X_P^2$) and the likelihood ratio statistic ($X_{LR}^2$), the Wald test, and their F-corrected versions are some of the alternative procedures that take account of the complexity of the design, such as the clustering effect mentioned above. The Rao-Scott chi square statistic can be calculated from the Pearson chi square statistic, with a design correction based on the design effects of the proportions. Under the null hypothesis that there is no association between the row and column variables, the Rao-Scott statistic approximately follows a chi square distribution with $(R-1)(C-1)$ degrees of freedom. The main goal of this paper is to test how well these adjustment procedures perform in comparison to one another, and to the simple Pearson chi square test and the likelihood ratio test, when testing independence in a two by two contingency table on an artificial population consisting of variables x and y.

Only a very limited number of papers have addressed this issue, and most of them are more than 15 years old. Since then the statistical software packages have been updated and new features have been added to them, so that these adjustment methods can be used more easily. To choose among the methods, however, we have to look back at papers that were written before this software was developed. In this paper I will therefore use the latest SAS (9.3) statistical software to compare the adjustment methods, and in doing so we can also see by how much these adjustments perform better than the simple chi square tests.
Another goal of this paper is to corroborate the papers written on this subject and to obtain results that are more convincing. The analysis and interpretations will be presented clearly and intuitively, so that researchers can repeat the procedures step by step and see why the simple $X_P^2$ and $X_{LR}^2$ tests need to be adjusted for complex statistical surveys, and which adjustment to choose. The adjustment methods are expected to perform better as the number of clusters increases, so in this paper we will see how much of an effect this has on their performance. The effects of the strength of clustering and of the number of secondary sampling units will also be discussed.

There is an unproved statement in Rao, J.N.K. & Thomas, D.R. (1989) which says that the Pearson statistic is conservative when applied to data from simple random sampling without replacement, and that a finite population correction to the Pearson chi square test will correct this problem, make the type I error rate equal to the nominal level (α) asymptotically, and also make the test asymptotically more powerful. In this paper we will therefore do a simulation study to see whether the Pearson test is actually conservative when applied to data from SRS without replacement and, if it is, whether the finite population correction solves the problem.

2. The standard Chi square test of independence for multinomial case

Consider a two-way contingency table under the multinomial assumption, and the test of independence for it. Let the row factor R with r levels and the column factor C with c levels be the two factors that cross-classify each of n independent observations.
Each observation has probability $p_{ij}$ of falling into row $i$ and column $j$, where

$p_{i+} = \sum_{j=1}^{c} p_{ij}$ is the probability that a randomly selected unit will fall in row $i$,

$p_{+j} = \sum_{i=1}^{r} p_{ij}$ is the probability that a randomly selected unit will fall in column $j$, and

$x_{ij}$ is the observed count in cell $(i,j)$.

If all units in the sample are independent, the $x_{ij}$'s are from a multinomial sampling distribution with $rc$ categories. In surveys, multinomial sampling assumptions are met in SRS with replacement.

The null hypothesis of independence is

\[ H_0: p_{ij} = p_{i+}\,p_{+j} \quad \text{for } i = 1, \dots, r \text{ and } j = 1, \dots, c. \]

The alternative hypothesis is

\[ H_1: p_{ij} \neq p_{i+}\,p_{+j} \quad \text{for at least one pair } (i, j). \]

Let $m_{ij} = n p_{ij}$ represent the expected counts. If $H_0$ is true, $m_{ij} = n p_{i+} p_{+j}$, and $m_{ij}$ can be estimated by

\[ \hat{m}_{ij} = n\,\hat{p}_{i+}\,\hat{p}_{+j} = n \left( \frac{x_{i+}}{n} \right) \left( \frac{x_{+j}}{n} \right), \]

where $x_{i+}$ and $x_{+j}$ are the row and column totals of the observed counts, $\hat{p}_{ij} = x_{ij}/n$, $\hat{p}_{+j} = \sum_{i=1}^{r} \hat{p}_{ij}$ and $\hat{p}_{i+} = \sum_{j=1}^{c} \hat{p}_{ij}$.

The standard Pearson chi square statistic for testing $H_0: p_{ij} = p_{i+} p_{+j}$ for $i = 1, \dots, r$ and $j = 1, \dots, c$ is given by

\[ X_P^2 = \sum_{i=1}^{r} \sum_{j=1}^{c} \frac{(x_{ij} - \hat{m}_{ij})^2}{\hat{m}_{ij}} = n \sum_{i=1}^{r} \sum_{j=1}^{c} \frac{(\hat{p}_{ij} - \hat{p}_{i+}\hat{p}_{+j})^2}{\hat{p}_{i+}\hat{p}_{+j}}. \]

The standard likelihood ratio statistic for testing the same hypothesis is given by

\[ X_{LR}^2 = 2 \sum_{i=1}^{r} \sum_{j=1}^{c} x_{ij} \ln\!\left( \frac{x_{ij}}{\hat{m}_{ij}} \right). \]

If multinomial sampling is used with a sufficiently large sample size, $X_P^2$ and $X_{LR}^2$ are approximately distributed as $\chi^2$ random variables with $(r-1)(c-1)$ degrees of freedom (df) when $H_0$ holds. For a simple goodness-of-fit test with $I$ cells and fully specified cell probabilities, the corresponding reference distribution is $\chi^2_{I-1}$. $H_0$ is rejected when $X_P^2$ or $X_{LR}^2$ exceeds the upper α-point of the reference distribution, so that the type I error rate is equal to α (e.g. 0.05 or 0.01).

3. Impact of design on standard statistical tests

It has been common practice in survey research to use the standard chi square tests $X_P^2$ and $X_{LR}^2$, which are based on the assumption of multinomial sampling, as valid tests even when the survey has complex design features. Studies such as Holt et al. (1982) and Rao & Scott (1981) have examined the effects that complex sampling procedures such as clustering have on the use of the standard chi square tests. These studies found that the standard tests give erroneous results when applied to data arising from a complex design.

When we use a complex survey design such as clustering, the estimated cell probabilities and the test of association are affected, because in these designs we no longer have the random sampling that gives the standard $X_P^2$ or $X_{LR}^2$ an approximate $\chi^2$ distribution. Ignoring the survey design will therefore result in wrong significance levels and p-values.
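As a reference point for the design effects discussed in this section, the standard statistics $X_P^2$ and $X_{LR}^2$ of Section 2 can be computed directly from a table of counts; these are the values that a package assuming multinomial sampling would report. The following is a minimal Python sketch and not the SAS procedure used in the thesis. The function name `independence_statistics` and the example counts are hypothetical, and the p-value line relies on an identity that holds only for 1 degree of freedom.

```python
import math

def independence_statistics(table):
    """Return (X2_P, X2_LR, df) for an r x c table of observed counts x_ij,
    assuming multinomial sampling (no design adjustment)."""
    n = sum(sum(row) for row in table)
    row_totals = [sum(row) for row in table]          # x_{i+}
    col_totals = [sum(col) for col in zip(*table)]    # x_{+j}
    x2_p = 0.0
    x2_lr = 0.0
    for i, row in enumerate(table):
        for j, x in enumerate(row):
            m_hat = row_totals[i] * col_totals[j] / n  # n * p_{i+} * p_{+j}
            x2_p += (x - m_hat) ** 2 / m_hat
            if x > 0:                                  # 0 * ln(0) is taken as 0
                x2_lr += 2.0 * x * math.log(x / m_hat)
    df = (len(table) - 1) * (len(table[0]) - 1)
    return x2_p, x2_lr, df

# Hypothetical 2 x 2 table of counts, for illustration only.
counts = [[30, 20], [20, 30]]
x2_p, x2_lr, df = independence_statistics(counts)

# For df = 1 only: P(chi2_1 > x) = erfc(sqrt(x / 2)).
p_value = math.erfc(math.sqrt(x2_p / 2.0))
print(x2_p, x2_lr, df, p_value)
```

For tables with more than one degree of freedom, the p-value would have to come from a chi square distribution function in a statistical library rather than from the `erfc` shortcut.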
If we take clustering with a positive intraclass correlation coefficient (ICC) as an example, the correct p-value will usually be much larger than the p-value reported by statistical packages that use multinomial sampling assumptions.

For a simple goodness-of-fit test with $I$ cells, Rao and Scott (1979, 1981) have shown that the standard procedures are not valid for complex designs. They also showed that $X_P^2$ and $X_{LR}^2$ are asymptotically distributed as a weighted sum

\[ \delta_1 W_1 + \dots + \delta_{I-1} W_{I-1} \]

of $I-1$ independent $\chi^2_1$ variables $W_i$, where

the $\delta_i$ are the eigenvalues of the design effect matrix $P_0^{-1} V_P$,

$P_0 = n^{-1} [\mathrm{diag}(p_0) - p_0 p_0']$ is the multinomial covariance matrix under $H_0$,

$p_0' = (p_{01}, p_{02}, \dots, p_{0I})$ is the vector of hypothesised proportions, and

$V_P$ is the covariance matrix under the actual design.

If $I = 2$, the design effect matrix becomes the ordinary design effect, which is

\[ \frac{\mathrm{var}(\hat{p}_1)}{n^{-1} p_{01} (1 - p_{01})}. \]

In the special case of simple random sampling with replacement,

\[ P_0^{-1} V_P = I, \]

an identity matrix with all eigenvalues equal to 1. In this case the weighted sum of the $W_i$'s becomes $\chi^2_{I-1}$, which is the original distribution:

\[ X_P^2 \sim \sum_{i=1}^{I-1} \delta_i W_i = \sum_{i=1}^{I-1} W_i. \]

It should be noted that

\[ E(X_P^2) = \sum_{i=1}^{I-1} \delta_i = (I-1)\,\bar{\delta}. \]

If $\bar{\delta} > 1$, $X_P^2$ will often be larger than a $\chi^2_{I-1}$ variable. This will make the test reject too often, causing a high type I error rate. If $\bar{\delta} < 1$, $X_P^2$ will often be smaller than a $\chi^2_{I-1}$ variable, and this will make the test conservative, which means it does not reject as often as it should. Since $(I-1)\bar{\delta}$ is the sum of the $\delta_i$, $\bar{\delta}$ can be interpreted as an average of the $I-1$ design effects.
Large design effects therefore imply that the chi square test will have an inflated type I error rate.

3.1 The Impacts of design on Contingency table

Unless our sampling method is self-weighting (e.g. SRS), the observed counts $x_{ij}$ in a contingency table do not necessarily reflect the relative frequencies of the categories in the population. If samples are selected with unequal probabilities, a contingency table of observed sample counts that ignores the inclusion probabilities will not give the real picture of the association between the two factors in the table, and quantities estimated from the margins of the table will be wrong.

3.2 The impact of design on hypothesis test and confidence intervals

Cluster sampling usually gives a design effect greater than one. Under clustering, $X_P^2$ and $X_{LR}^2$ values are often larger than they would be under multinomial sampling. If we ignore the clustering and refer them to the usual $\chi^2$ distribution, this leads to p-values that are too small, which could mean that an association is declared statistically significant when it is really just due to random variation in the data. Confidence intervals for log odds ratios will be narrower than they should be, which leads to the belief that estimates are more precise than they actually are. For these reasons, ignoring clustering may lead to adopting an unnecessarily complicated model to describe the data, and the social implications can be very costly.
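The effect of the design on the type I error rate can be illustrated with the weighted-sum result of Section 3. The sketch below is a hedged illustration and not the Monte Carlo study of this thesis: it draws $\sum_i \delta_i W_i$ directly, with independent $W_i \sim \chi^2_1$, and counts how often the sum exceeds the nominal 5% critical value of $\chi^2_{I-1}$. The function name `rejection_rate`, the number of replications, the seed and the three sets of $\delta_i$ values are all hypothetical choices made for illustration.

```python
import random

# Upper 5% points of the chi square distribution for 1, 2 and 3 df.
CHI2_CRIT_05 = {1: 3.841459, 2: 5.991465, 3: 7.814728}

def rejection_rate(deltas, n_rep=20000, seed=1):
    """Monte Carlo estimate of P(sum_i delta_i * W_i > chi2_{I-1}(0.05)),
    where the W_i are independent chi square variables with 1 df."""
    rng = random.Random(seed)
    crit = CHI2_CRIT_05[len(deltas)]
    rejections = 0
    for _ in range(n_rep):
        # A squared standard normal draw is a chi square variable with 1 df.
        stat = sum(d * rng.gauss(0.0, 1.0) ** 2 for d in deltas)
        if stat > crit:
            rejections += 1
    return rejections / n_rep

# All delta_i = 1: the multinomial (SRS with replacement) case.
srs_rate = rejection_rate([1.0, 1.0, 1.0])
# Hypothetical design effects with mean above 1, as under clustering.
clustered_rate = rejection_rate([2.0, 1.5, 1.2])
# Hypothetical design effects with mean below 1.
conservative_rate = rejection_rate([0.6, 0.5, 0.4])

print(srs_rate, clustered_rate, conservative_rate)
```

With all $\delta_i = 1$ the estimated rejection rate stays close to the nominal 0.05, while a mean design effect above one pushes it well above the nominal level and a mean below one pushes it below, in line with the discussion of $\bar{\delta}$ in Section 3.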