
ESTIMATING INTRACLUSTER HOMOGENEITY IN MULTISTAGE SAMPLES by Anne Fakler ... PDF

155 Pages·2008·3 MB·English


ESTIMATING INTRACLUSTER HOMOGENEITY IN MULTISTAGE SAMPLES
by Anne Fakler Clemmer
Department of Biostatistics
University of North Carolina at Chapel Hill
Institute of Statistics Mimeo Series No. 1482lr
February 1985

A Dissertation submitted to the faculty of the University of North Carolina at Chapel Hill in partial fulfillment of the requirements for the degree of Doctor of Public Health in the Department of Biostatistics. Chapel Hill, 1985.

ABSTRACT

CLEMMER, ANNE FAKLER. Estimating Intracluster Homogeneity In Multistage Samples. (Under the direction of WILLIAM D. KALSBEEK.)

Multistage cluster samples are often used in social science because clusters of households are cost effective. However, sampling elements within the clusters are homogeneous, thus negating independence assumptions and increasing the variance compared to simple random sampling. This research quantifies when each of two formulas is appropriate to estimate rate of homogeneity (roh). Roh is a measure of how much sampling elements are alike within clusters. It is speculated that one formula performs better when several different domain categories are present within each sampling cluster (cross-class domains) and the other better when only one domain category is present per cluster (segregated domains). As a secondary issue, this research investigates how the relationship between the domain and analysis variables may determine the appropriate formula for roh.

To investigate the primary issue, two-stage cluster samples were selected from the National Medical Care Expenditure Survey (NMCES) household file; sample values for roh were calculated using each formula and compared with the population value from the entire file.

The secondary issue was explored in two steps. First, the average domain ratio (ADR), or ratio of average rate of homogeneity across all categories in a particular domain to roh for the population total, was calculated for each sample. The ADR was calculated using each of the two estimation procedures and compared with the true population value. Secondly, the domain degree of segregation or cross-classness in each sample was modeled and the fit evaluated separately for each estimation procedure.

TABLE OF CONTENTS

ACKNOWLEDGMENTS
LIST OF TABLES
LIST OF FIGURES

CHAPTER 1 - STATEMENT OF THE PROBLEM
  1.1 What Is Intracluster Homogeneity And Why Is It Important
  1.2 Usefulness Of Intracluster Homogeneity
    1.2.1 Population Measures For Cluster Homogeneity: Clusters Of Equal Sizes
    1.2.2 Sample Estimate For Cluster Homogeneity: Clusters Of Equal Sizes
    1.2.3 Population Measures For Cluster Homogeneity: Clusters Of Unequal Sizes
    1.2.4 Sample Estimate For Cluster Homogeneity: Clusters Of Unequal Sizes
    1.2.5 Current Uses For Intracluster Homogeneity
    1.2.6 Range Of Values For Intracluster Homogeneity
  1.3 Recapitulation And Specification Of Objectives Of This Research
    1.3.1 Primary Objective
    1.3.2 Secondary Objective
    1.3.3 Orientation Of Analysis Toward Veterans

CHAPTER 2 - PLAN OF RESEARCH
  2.1 Sample Selection
    2.1.1 Subsample From Total To Limited Population
    2.1.2 Select Ten Independent Samples From Limited Population
  2.2 Selection Of Domain And Analysis Variables
    2.2.1 Determination Of Analysis Variables
    2.2.2 Determination Of Domain Variables
  2.3 Primary Purpose: To Determine Under What Circumstances The Design Effect And Proportion Variation Estimation Method Are Appropriate To Calculate The Rate Of Homogeneity
    2.3.1 Estimation Of Roh Using Design Effect Method
    2.3.2 Estimation Of Roh Using Proportion Variation Method
    2.3.3 Comparison Of Methods
  2.4 Secondary Purpose: To Investigate How The Relationship Between The Analysis Variable And The Nature Of The Domain May Determine The Appropriate Formula
    2.4.1 Evaluate How Average Domain Ratio May Determine Appropriate Formula For Roh
    2.4.2 Investigate How The Proportion Of The Domain Contained In A Particular Category May Affect The Appropriate Formula
  2.5 Analyses Of Interest To Veterans

CHAPTER 3 - FINDINGS
  3.1 Results From Primary Objective
  3.2 Secondary Objective Findings
    3.2.1 Determination Of Average Domain Ratio
    3.2.2 Model Proportion Of Domain Contained In A Particular Category
  3.3 Findings Of Interest To Veterans
  3.4 Discussion Of Results

CHAPTER 4 - SUMMARY OF RESULTS AND SUGGESTIONS FOR FUTURE RESEARCH
  4.1 Summary Of Results
  4.2 Suggestions For Further Research

BIBLIOGRAPHY

APPENDIX A - Flowchart And Computer Software To Calculate Rate Of Homogeneity
APPENDIX B - Show Algebraically That p* For Unequal Clusters Reduces To p For Equal Clusters
APPENDIX C - Summary Of Checks To Insure Computer Software To Calculate Roh Is Accurate
APPENDIX D - Derivation Of Another Estimator For Roh, Designed Specifically For Proportions And Assuming Approximately Equal Sampling Clusters
APPENDIX E - Proof That Mean Of Difference In Absolute Values Of Relative Deviations Minus Its Expected Value And Divided By Its Standard Error Follows Student's t Distribution
APPENDIX F - Proof That Proportion Variation Estimator Provides A Consistent Estimate For Roh According To The Definition Of Consistency In Hansen, Hurwitz, and Madow

ACKNOWLEDGMENTS

I am especially indebted to my advisor, Dr. William D. Kalsbeek, for his patient guidance, strong encouragement, and helpful suggestions at each step of this research effort. The valuable comments and support of the other members of my committee, Dr. Shirley A. A. Beresford, Dr. James R. Chromy, Dr. Dennis B. Gillings, Dr. Barbara S. Hulka, and Dr. Gary G. Koch are also gratefully acknowledged.

Many thanks are additionally due to the Veterans Administration Medical Center in Durham, North Carolina for providing financial support for this research project. I also am grateful to Ruth Drum for her skillful and conscientious typing of this manuscript.

Finally, very special thanks are due to my husband, Tom, my children, Hew and Nikki Dalhouse, and my parents, Mr. and Mrs. Paul C. Fakler, whose unwavering support, patience, and understanding, I have constantly relied on.
LIST OF TABLES

1.1 Analysis of variance of the mean into the components of the total variance
2.1 Comparison of stratum representation in total and limited population
2.2 Comparison of response frequencies for total and limited population
2.3 Description of one independent replicate from limited population
2.4 Some summary measures of intracluster homogeneity for analysis variables in limited population
2.5 Selected analysis variables
2.6 Four interesting average domain sample sizes
2.7 Selected domain variables
2.8 Recommended domain and analysis variables to calculate rates of homogeneity
2.9 Suggested domain and analysis variables to compare differences in domain proportions
3.1 Comparison of sample rates of homogeneity by estimation method and analysis variable
3.2 Sample rates of homogeneity by estimation method and domain variable for categorical analysis variables
3.3 Sample rates of homogeneity by estimation method and domain variable for continuous analysis variables
3.4 Comparison of relative deviations by estimation method and analysis variable
3.5 Comparison of differences in absolute values of relative deviations by domain category and analysis variable
3.6 Comparison of estimation methods for categorical analysis variables
3.7 Comparison of estimation methods for continuous analysis variables
3.8 Comparison of true average domain ratios (ADR) for study population
3.9 Comparison of average domain ratios by estimation method and analysis variable
3.10 Estimates of study population degree of segregation and model fit for Verma et al. model
3.11 Estimated rates of homogeneity by design effect and proportion variation methods for urban and rural veteran users of medical services from entire NMCES sample
3.12 Estimated rates of homogeneity by design effect and proportion variation methods for veteran users of medical services by region from entire NMCES sample
3.13 Comparisons between veterans and nonveterans in the total NMCES sample
3.14 Comparisons between veteran users of medical services and other veterans in the total NMCES sample
3.15 Comparisons between urban and rural veteran users of medical services in the total NMCES sample

LIST OF FIGURES

1.1 Indirect Imputation of Deft and Standard Error Via Roh
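
The abstract and contents refer to estimating roh by a "design effect method" without giving the formula in this excerpt. As a minimal sketch, assuming the widely used relation deff = 1 + roh(b - 1) for clusters of average size b (an assumption here, not a formula quoted from the dissertation), the conversion in both directions could look as follows. The function names are hypothetical and chosen only for illustration.

```python
def roh_from_deff(deff: float, avg_cluster_size: float) -> float:
    """Solve deff = 1 + roh * (b - 1) for roh.

    Assumes the standard relation between design effect and rate of
    homogeneity; the source excerpt does not state its exact formula.
    """
    if avg_cluster_size <= 1:
        # With one element per cluster the relation carries no information on roh.
        raise ValueError("average cluster size must exceed 1")
    return (deff - 1.0) / (avg_cluster_size - 1.0)


def deff_from_roh(roh: float, avg_cluster_size: float) -> float:
    """Design effect implied by a given roh and average cluster size."""
    return 1.0 + roh * (avg_cluster_size - 1.0)


if __name__ == "__main__":
    # Illustrative numbers only, not results from the dissertation.
    roh = roh_from_deff(deff=2.0, avg_cluster_size=11)
    print(f"roh = {roh:.3f}")
    print(f"deff = {deff_from_roh(roh, 11):.3f}")
```

Under this relation, a design effect of 1 corresponds to roh = 0 (no gain or loss relative to simple random sampling), and larger clusters amplify the effect of any positive roh on the variance.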
