Series/Number 07–071

ANALYZING COMPLEX SURVEY DATA
Second Edition

Eun Sul Lee
Division of Biostatistics, School of Public Health,
University of Texas Health Science Center-Houston

Ronald N. Forthofer

SAGE PUBLICATIONS
International Educational and Professional Publisher
Thousand Oaks London New Delhi

Copyright © 2006 by Sage Publications, Inc.

All rights reserved. No part of this book may be reproduced or utilized in any form or by any means, electronic or mechanical, including photocopying, recording, or by any information storage and retrieval system, without permission in writing from the publisher.

For information:

Sage Publications, Inc.
2455 Teller Road
Thousand Oaks, California 91320
E-mail: [email protected]

Sage Publications Ltd.
1 Oliver’s Yard
55 City Road
London EC1Y 1SP
United Kingdom

Sage Publications India Pvt. Ltd.
B-42, Panchsheel Enclave
Post Box 4109
New Delhi 110017
India

Printed in the United States of America

Library of Congress Cataloging-in-Publication Data

Lee, Eun Sul.
Analyzing complex survey data / Eun Sul Lee, Ronald N. Forthofer.--2nd ed.
p. cm.--(Quantitative applications in the social sciences; vol. 71)
Includes bibliographical references and index.
ISBN 0-7619-3038-8 (pbk.: alk. paper)
1. Mathematical statistics. 2. Social surveys--Statistical methods. I. Forthofer, Ron N., 1944- II. Title. III. Series: Sage university papers series. Quantitative applications in the social sciences; no. 07–71.
QA276.L3394 2006
001.422--dc22 2005009612

This book is printed on acid-free paper.

05 06 07 08 09 10 9 8 7 6 5 4 3 2 1

Acquisitions Editor: Lisa Cuevas Shaw
Editorial Assistant: Karen Gia Wong
Production Editor: Melanie Birdsall
Copy Editor: A. J. Sobczak
Typesetter: C&M Digitals (P) Ltd

CONTENTS

Series Editor’s Introduction
Acknowledgments
1. Introduction
2. Sample Design and Survey Data
   Types of Sampling
   The Nature of Survey Data
   A Different View of Survey Data*
3. Complexity of Analyzing Survey Data
   Adjusting for Differential Representation: The Weight
   Developing the Weight by Poststratification
   Adjusting the Weight in a Follow-Up Survey
   Assessing the Loss or Gain in Precision: The Design Effect
   The Use of Sample Weights for Survey Data Analysis*
4. Strategies for Variance Estimation
   Replicated Sampling: A General Approach
   Balanced Repeated Replication
   Jackknife Repeated Replication
   The Bootstrap Method
   The Taylor Series Method (Linearization)
5. Preparing for Survey Data Analysis
   Data Requirements for Survey Analysis
   Importance of Preliminary Analysis
   Choices of Method for Variance Estimation
   Available Computing Resources
   Creating Replicate Weights
   Searching for Appropriate Models for Survey Data Analysis*
6. Conducting Survey Data Analysis
   A Strategy for Conducting Preliminary Analysis
   Conducting Descriptive Analysis
   Conducting Linear Regression Analysis
   Conducting Contingency Table Analysis
   Conducting Logistic Regression Analysis
   Other Logistic Regression Models
   Design-Based and Model-Based Analyses*
7. Concluding Remarks
Notes
References
Index
About the Authors

SERIES EDITOR’S INTRODUCTION

When George Gallup correctly predicted Franklin D. Roosevelt as the 1936 presidential election winner, public opinion surveys entered the age of scientific sampling, and the method used then was quota sampling, a type of nonprobability sampling representative of the target population. The same method, however, incorrectly predicted Thomas Dewey as the 1948 winner though Harry S. Truman actually won. The method failed because quota sampling is nonprobabilistic and because Gallup’s quota frames were based on the 1940 census, overlooking the urban migration during World War II.

Today’s survey sampling has advanced much since the early days, now relying on sophisticated probabilistic sampling designs. A key feature is stratification: The target population is divided into subpopulations, or strata, and the sample sizes in the strata are controlled by the sampler and are often proportional to the strata population sizes. Another feature is cluster and multistage sampling: Groups are sampled as clusters in a hierarchy of clusters selected at various stages until the final stage, when individual elements are sampled within the final-stage clusters. The General Social Survey, for example, uses a stratified multistage cluster sampling design. (Kalton [1983] gives a nice introduction to survey sampling.)

When the survey design is of this complex nature, statistical analysis of the data is no longer a simple matter of running a regression (or any other modeling) analysis. Surveys today all come with sampling weights to assist with correct statistical inference. Most texts on statistical analysis, by assuming simple random sampling, do not include treatment of sampling weights, an omission that may have important implications for making inferences. During the last two to three decades, statistical methods for data analysis have also made huge strides. These must have been the reasons my predecessor, Michael Lewis-Beck, who saw through the early stages of the editorial work in this volume, chose to have a second edition of Analyzing Complex Survey Data.

Lee and Forthofer’s second edition of the book brings us up to date in uniting survey sampling designs and survey data analysis. The authors begin by reviewing common types of survey sample designs and demystifying sampling weights by explaining what they are and how they are developed and adjusted. They then carefully discuss the major issues of variance estimation and of preliminary as well as multivariate analysis of complex cross-sectional survey data when sampling weights are taken into account.
They focus on the design-based approach that directly engages sample designs in the analysis (although they also discuss the model-based perspective, which can augment a design-based approach in some analyses), and they illustrate the approach with popular software examples. Students of survey analysis will find the text of great use in their efforts in making sample-based statistical inferences.

Tim Futing Liao
Series Editor

ACKNOWLEDGMENTS

We sadly acknowledge that Dr. Ronald J. Lorimor, who had collaborated on the writing of the first edition of this manuscript, died in 1999. His insights into survey data analysis remain in this second edition. We are grateful to Tom W. Smith for answering questions about the sample design of the General Social Survey and to Barry L. Graubard, Lu Ann Aday, and Michael S. Lewis-Beck for their thoughtful suggestions for the first edition. Thanks also are due to anonymous reviewers for their helpful comments for both editions and to Tim F. Liao for his thoughtful advice for the contents of the second edition. Special thanks go to many students in our classes at the University of Texas School of Public Health who participated in discussions of many topics contained in this book.

1. INTRODUCTION

Survey analysis often is conducted as if all sample observations were independently selected with equal probabilities. This analysis is correct if simple random sampling (SRS) is used in data collection; however, in practice the sample selection is more complex than SRS. Some sample observations may be selected with higher probabilities than others, and some are included in the sample by virtue of their membership in a certain group (e.g., household) rather than being selected independently. Can we simply ignore these departures from SRS in the analysis of survey data? Is it appropriate to use the standard techniques in statistics books for survey data analysis?
Or are there special methods and computer programs available for a more appropriate analysis of complex survey data? These questions are addressed in the following chapters.

The typical social survey today reflects a combination of statistical theory and knowledge about social phenomena, and its evolution has been shaped by experience gained from the conduct of many different surveys during the last 70 years. Social surveys were conducted to meet the need for information to address social, political, and public health issues. Survey agencies were established within and outside the government in response to this need for information. In the early attempts to provide the required information, however, the survey groups were mostly concerned with the practical issues in the fieldwork (such as sampling frame construction, staff training/supervision, and cost reduction), and theoretical sampling issues received only secondary emphasis (Stephan, 1948). As these practical matters were resolved, modern sampling practice had developed far beyond SRS. Complex sample designs had come to the fore, and with them, a number of analytic problems.

Because the early surveys generally needed only descriptive statistics, there was little interest in analytic problems. More recently, demands for