Statistical Society of Canada / Société statistique du Canada
2000 Proceedings of the Survey Methods Section
Papers presented at the 28th Annual Meeting of the Statistical Society of Canada, Ottawa, Ontario, Canada, June 4-7, 2000
SSC Annual Meeting, June 2000
Proceedings of the Survey Methods Section, pp. 185-190

A NONLINEAR HIERARCHICAL MODELLING APPROACH FOR CENSUS UNDERCOVERAGE ESTIMATION

Yong You¹

¹ Yong You, Household Survey Methods Division, Statistics Canada, Ottawa, Canada, K1A 0T6.

ABSTRACT

Area-level nonlinear mixed effects models are considered in this paper for Canadian census undercoverage estimation. We fit an area-level nonlinear mixed effects model to the province-level undercoverage survey estimates. In particular, the sampling model is based on the survey estimate of the undercoverage count, and the linking model is a log-linear model for the undercoverage rate. A full hierarchical Bayes (HB) approach is developed to obtain the posterior estimates of the census undercoverage using Markov Chain Monte Carlo (MCMC) sampling methods. Our results show that the proposed method can provide efficient model-based estimates. Analysis of model fitting is also presented using posterior predictive distributions, and the corresponding results indicate that the proposed model fits the data quite well.

KEYWORDS: Census undercoverage, Gibbs sampling, Model checking, Nonlinear mixed model, Posterior.
1. INTRODUCTION

In Canada, a census is conducted every five years. However, the census does not enumerate all the inhabitants who should fill in a census form on Census Day. In the 1991 Canadian census, it is estimated that about 3% of the population were not enumerated. Thus the census needs to be adjusted for undercoverage in order to properly represent the demographic picture of the country on Census Day. Since 1966, the Reverse Record Check (RRC) has been used by Statistics Canada to measure the gross number of persons missed by the census. Starting in 1991, an Overcoverage Study was conducted to measure the gross number of persons erroneously included in the census. In 1991, for the first time, the RRC results were combined with those of the Overcoverage Study to produce direct survey estimates of the net undercoverage for the nation and all provinces. Through the analysis of the results of these coverage studies, the collection methodology is adjusted in order to improve coverage in the succeeding census. More details on the coverage studies can be found, for example, in Germain and Julien (1993).

In 1991, the population estimates were based on the census counts adjusted for the estimated net undercoverage in the census. The base population was formed by adding the net provincial undercoverage estimate to the provincial census count. This created an adjusted base upon which all the other population figures were derived using modelling and demographic methods. Rivest (1995) proposed a composite estimator of the provincial undercoverage that uses the national undercoverage rate as a synthetic estimate. The effect of the composite estimator is to shrink all provincial rates towards the national rate. Rivest's composite estimator performs poorly for the extreme provinces, namely P.E.I. and Ontario, the smallest and the largest provinces of Canada (You, 1997).

In recent years, modelling techniques have been applied to improve the direct undercoverage estimates from sample surveys. Datta et al. (1992) and Zaslavsky (1993) proposed models for US census undercount adjustment, and Dick (1995) used an empirical Bayes (EB) method to estimate the undercoverage in small domains. Dick and You (1997) employed a linear Fay-Herriot model (Fay and Herriot, 1979) for the province-level census undercoverage estimates in an HB framework. In this paper, we consider nonlinear modelling of the census undercoverage. In particular, we propose a nonlinear mixed model in which the sampling model is based on the undercoverage counts, whereas the linking model is based on the undercoverage rates; the model yields model-based estimates of both undercoverage counts and rates. Section 2 reviews the undercoverage models and introduces the proposed model, Section 3 describes the Gibbs sampler, Section 4 presents the implementation and estimation results, and Section 5 tests the model fit.

2. UNDERCOVERAGE MODELS

Suppose there are $m$ provinces. In the $i$-th province, the census has counted $c_i$ persons, while an unknown number $u_i$ of persons were missed by the census. The coverage studies, through standard estimation procedures, provide a survey estimate $y_i$ of the net undercoverage along with an associated variance estimate. The sampling model is

$$y_i = u_i + \varepsilon_i, \quad i = 1, \ldots, m, \tag{1}$$

where $\varepsilon_i \sim N(0, \psi_i^2)$. This model assumes that the direct survey estimator $y_i$ is design-unbiased. The design-unbiasedness assumption may be restrictive; in particular, the estimates $y_i$ of missed persons are probably subject to some unknown bias (Zaslavsky, 1993). However, since, unlike in the US case (Zaslavsky, 1993), we do not have an estimate of the possible bias, we assume that the survey estimate $y_i$ is design-unbiased for all $i$. The normality assumption for the sampling error (see Rao, 1999) seems quite reasonable at the province level because of the large sample sizes.

The sampling variances $\psi_i^2$ are estimated through generalized variance function models (Dick, 1995). Thus $\psi_i^2$ is actually a smoothed estimate of the sampling variance of $y_i$, and it is treated as known in the model.

Instead of using (1) directly to model the undercoverage count, Dick and You (1997) considered a transformation from the undercoverage count to the undercoverage rate, namely $r_i = y_i/(y_i + c_i)$, with the true undercoverage rate defined as $\theta_i = u_i/(u_i + c_i)$. Dick and You (1997) then used the model

$$r_i = \theta_i + e_i, \quad \theta_i = x_i^T\beta + v_i, \quad i = 1, \ldots, m, \tag{2}$$

where $x_i$ is a vector of auxiliary variables, $\beta$ is a vector of unknown regression parameters, $e_i \sim N(0, \sigma_i^2)$ and $v_i \sim N(0, \sigma^2)$. The sampling variance $\sigma_i^2$ was also treated as known and calculated from the approximation

$$\sigma_i^2 = (1 - r_i)^4 V(y_i)/c_i^2 = (1 - r_i)^4 \psi_i^2/c_i^2.$$

Model (2) is an area-level model of the Fay-Herriot type (Fay and Herriot, 1979). However, the application of model (2) has the following limitations: (i) the zero-mean sampling model may not hold because of the nonlinear transformation from counts to rates (You and Rao, 2000); (ii) treating $\sigma_i^2$ as known is a strong assumption, since $\sigma_i^2$ is in fact a function of the unknown rate $\theta_i$; (iii) the normal linking model may not be appropriate, since the rate $\theta_i$ lies between 0 and 1; see, for example, Datta et al. (1992) and Zaslavsky (1993). A more realistic model may use $\log(\theta_i) = x_i^T\beta + v_i$. To overcome these limitations, we now propose a new model:

Model 1:
  sampling model: $y_i = u_i + \varepsilon_i$, $\varepsilon_i \sim N(0, \psi_i^2)$;
  linking model: $\log\big(u_i/(u_i + c_i)\big) = x_i^T\beta + v_i$, $v_i \sim N(0, \sigma^2)$.

In Model 1, the sampling model is based directly on the undercoverage count; it is the same as model (1), and the sampling variance $\psi_i^2$ is assumed to be known.
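The rate-variance approximation used with model (2) is a first-order Taylor (delta-method) expansion: since $r_i = y_i/(y_i + c_i)$, we have $dr_i/dy_i = c_i/(y_i + c_i)^2 = (1 - r_i)^2/c_i$, hence $V(r_i) \approx (1 - r_i)^4 \psi_i^2/c_i^2$. The Python sketch below is not from the paper; it evaluates this approximation and compares it with a Monte Carlo variance under the normal sampling model (1). The function names and the province figures are hypothetical.

```python
import math
import random


def rate_and_delta_variance(y, c, psi2):
    """Direct rate r = y / (y + c) and its delta-method variance.

    Since dr/dy = c / (y + c)**2 = (1 - r)**2 / c, the first-order
    approximation is Var(r) ~= (1 - r)**4 * psi2 / c**2.
    """
    r = y / (y + c)
    return r, (1.0 - r) ** 4 * psi2 / c ** 2


def monte_carlo_rate_variance(u, c, psi2, n_draws=200_000, seed=2000):
    """Sample variance of Y / (Y + c) when Y ~ N(u, psi2), i.e. under model (1)."""
    rng = random.Random(seed)
    sd = math.sqrt(psi2)
    rates = []
    for _ in range(n_draws):
        y = rng.gauss(u, sd)
        rates.append(y / (y + c))
    mean = math.fsum(rates) / n_draws
    return math.fsum((r - mean) ** 2 for r in rates) / (n_draws - 1)


if __name__ == "__main__":
    # Hypothetical province: 1,000,000 persons counted, 20,000 missed, CV(y) = 0.15.
    c, u = 1_000_000, 20_000
    psi2 = (0.15 * u) ** 2
    r, var_delta = rate_and_delta_variance(u, c, psi2)
    var_mc = monte_carlo_rate_variance(u, c, psi2)
    print(f"r = {r:.4%}, delta-method var = {var_delta:.4e}, Monte Carlo var = {var_mc:.4e}")
```

With counts of this size the two variances should agree to within Monte Carlo error.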
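In Model 1, the linking model is a log-linear random effects model for the undercoverage rate, but here the rate $\theta_i$ is expressed as a function of the count $u_i$, that is, $\theta_i = u_i/(u_i + c_i)$. As a result, the sampling and linking models cannot be combined to form a standard linear mixed model, so standard small area estimation methods based on linear mixed models cannot be applied directly. Nonlinear mixed effects models of this kind have been discussed by Rao (1999) and You and Rao (2000). However, using a complete HB approach with the Gibbs sampling method, we can obtain posterior estimates of the undercoverage count and rate simultaneously.

3. GIBBS SAMPLER

We now use Model 1 to obtain model-based estimates of census undercoverage. In a hierarchical Bayes framework, Model 1 is expressed as

$$y_i \mid u_i \sim N(u_i, \psi_i^2), \quad i = 1, \ldots, m, \tag{3}$$

and

$$\log\left(\frac{u_i}{u_i + c_i}\right) \Big|\, \beta, \sigma^2 \sim N(x_i^T\beta, \sigma^2), \quad i = 1, \ldots, m; \qquad \pi(\beta) \propto 1, \tag{4}$$

with $\sigma^2 \sim IG(a, b)$, where $a$ and $b$ are known positive constants. Let $Y = (y_1, \ldots, y_m)^T$ and $U = (u_1, \ldots, u_m)^T$. We are interested in the posterior distribution of $u_i$ given the data, in particular $E(u_i \mid Y)$ and $V(u_i \mid Y)$. To implement Gibbs sampling (Gelfand and Smith, 1990), we need to draw samples from the following full conditional distributions, where $p$ is the dimension of $\beta$:

$$[u_i \mid Y, \beta, \sigma^2] \propto \exp\left\{-\frac{(y_i - u_i)^2}{2\psi_i^2}\right\} \frac{c_i}{u_i(u_i + c_i)} \exp\left\{-\frac{\big(\log(u_i/(u_i + c_i)) - x_i^T\beta\big)^2}{2\sigma^2}\right\}, \quad u_i > 0, \; i = 1, \ldots, m;$$

$$[\beta \mid Y, U, \sigma^2] \sim N_p\left(\Big(\sum_{i=1}^{m} x_i x_i^T\Big)^{-1} \sum_{i=1}^{m} x_i \log\frac{u_i}{u_i + c_i}, \; \sigma^2 \Big(\sum_{i=1}^{m} x_i x_i^T\Big)^{-1}\right);$$

$$[\sigma^2 \mid Y, U, \beta] \sim IG\left(a + \frac{m}{2}, \; b + \frac{1}{2}\sum_{i=1}^{m} \Big(\log\frac{u_i}{u_i + c_i} - x_i^T\beta\Big)^2\right).$$

Drawing samples from $[\beta \mid Y, U, \sigma^2]$ and $[\sigma^2 \mid Y, U, \beta]$ is straightforward. However, to draw samples from $[u_i \mid Y, \beta, \sigma^2]$, a Metropolis-Hastings updating scheme (see, e.g., Chib and Greenberg, 1995) is used within the Gibbs sampler. The Metropolis-Hastings updating step is summarized as follows. Suppose the Markov chain is at $u_i^{(k)}$ after the $k$-th iteration. To sample $u_i^{(k+1)}$ from $[u_i \mid Y, \beta, \sigma^2]$, we first draw a candidate $u_i^*$ from $N(y_i, \psi_i^2)$; then, with probability

$$\alpha(u_i^{(k)}, u_i^*) = \min\left\{\frac{g(u_i^*, \beta, \sigma^2)}{g(u_i^{(k)}, \beta, \sigma^2)}, \; 1\right\}, \tag{5}$$

we accept the candidate and set $u_i^{(k+1)} = u_i^*$; otherwise, we set $u_i^{(k+1)} = u_i^{(k)}$. In (5), $g(\cdot)$ is a function of $u_i$, $\beta$ and $\sigma^2$ given by

$$g(u_i, \beta, \sigma^2) = \frac{c_i}{u_i(u_i + c_i)} \exp\left\{-\frac{\big(\log(u_i/(u_i + c_i)) - x_i^T\beta\big)^2}{2\sigma^2}\right\}, \quad u_i > 0, \tag{6}$$

so a candidate $u_i^* \le 0$, which lies outside the support, is always rejected. The estimation of the posterior distribution of $u_i$, as well as of $E(u_i \mid Y)$ and $V(u_i \mid Y)$, can then be based on the samples $\{u_i^{(k)}\}$ from the Gibbs sampler.

4. IMPLEMENTATION AND ESTIMATION

As in Dick and You (1997), we use the logarithm of the 1991 census count as the auxiliary variable, $x_i = \log(c_i)$; the linking model for $u_i$ is then

$$\log\left(\frac{u_i}{u_i + c_i}\right) = \beta_0 + x_i\beta_1 + v_i.$$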
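A minimal pure-Python sketch of one iteration of the Metropolis-Hastings-within-Gibbs sampler of Section 3, specialized to this linking model with $x_i^T\beta = \beta_0 + \beta_1 x_i$, is given below. It assumes the flat prior on $\beta$ and the $IG(a, b)$ prior on $\sigma^2$ from (4). The function names (`log_g`, `gibbs_sweep`), the starting values, the choice $a = b = 0.01$ and the data in the usage lines are hypothetical; this is an illustration, not the author's implementation. The ratio (5) is computed on the log scale to avoid underflow.

```python
import math
import random


def log_g(u, c, mean, sigma2):
    """Log of g(u_i, beta, sigma^2) in (6); -inf outside the support u_i > 0."""
    if u <= 0:
        return -math.inf
    w = math.log(u / (u + c))
    return math.log(c / (u * (u + c))) - (w - mean) ** 2 / (2.0 * sigma2)


def gibbs_sweep(u, beta, sigma2, y, psi2, c, x, a, b, rng):
    """One Gibbs iteration for Model 1 with linking mean beta0 + beta1 * x_i.

    Returns the updated (u, beta, sigma2) and the number of accepted u-candidates.
    """
    m = len(y)
    # 1) Metropolis-Hastings step for each u_i with independence proposal N(y_i, psi_i^2).
    new_u, accepted = list(u), 0
    for i in range(m):
        mean_i = beta[0] + beta[1] * x[i]
        cand = rng.gauss(y[i], math.sqrt(psi2[i]))
        log_ratio = log_g(cand, c[i], mean_i, sigma2) - log_g(u[i], c[i], mean_i, sigma2)
        if log_ratio >= 0 or rng.random() < math.exp(log_ratio):
            new_u[i], accepted = cand, accepted + 1
    # 2) beta | U, sigma^2 ~ N_2((X'X)^{-1} X'w, sigma^2 (X'X)^{-1}) under pi(beta) ∝ 1.
    w = [math.log(ui / (ui + ci)) for ui, ci in zip(new_u, c)]
    sx, sxx = sum(x), sum(xi * xi for xi in x)
    sw, sxw = sum(w), sum(xi * wi for xi, wi in zip(x, w))
    det = m * sxx - sx * sx
    b0_hat = (sxx * sw - sx * sxw) / det
    b1_hat = (m * sxw - sx * sw) / det
    # Covariance sigma^2 (X'X)^{-1}, sampled through its 2x2 Cholesky factor.
    v00, v01, v11 = sigma2 * sxx / det, -sigma2 * sx / det, sigma2 * m / det
    l00 = math.sqrt(v00)
    l10 = v01 / l00
    l11 = math.sqrt(v11 - l10 * l10)
    z0, z1 = rng.gauss(0.0, 1.0), rng.gauss(0.0, 1.0)
    new_beta = (b0_hat + l00 * z0, b1_hat + l10 * z0 + l11 * z1)
    # 3) sigma^2 | U, beta ~ IG(a + m/2, b + SS/2): draw the precision, then invert.
    ss = sum((wi - new_beta[0] - new_beta[1] * xi) ** 2 for wi, xi in zip(w, x))
    precision = rng.gammavariate(a + m / 2.0, 1.0 / (b + ss / 2.0))  # shape, scale
    return new_u, new_beta, 1.0 / precision, accepted


if __name__ == "__main__":
    # Hypothetical data for four provinces (not the paper's data).
    c = [130_000, 900_000, 2_500_000, 10_000_000]
    y = [1_300.0, 17_000.0, 52_000.0, 360_000.0]
    psi2 = [(0.2 * yi) ** 2 for yi in y]
    x = [math.log(ci) for ci in c]  # x_i = log(c_i), as in Section 4
    rng = random.Random(1)
    u, beta, sigma2 = list(y), (math.log(0.02), 0.0), 0.1  # hypothetical starting values
    for _ in range(1000):
        u, beta, sigma2, _ = gibbs_sweep(u, beta, sigma2, y, psi2, c, x, 0.01, 0.01, rng)
    print([round(ui) for ui in u], beta, sigma2)
```

Because the proposal $N(y_i, \psi_i^2)$ has the same shape in $u_i$ as the likelihood factor of $[u_i \mid Y, \beta, \sigma^2]$, the acceptance ratio reduces to the ratio of $g$ values in (5). Running `gibbs_sweep` repeatedly from several starting points produces independent sequences of the kind used for the convergence check.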
To implement and monitor the convergence of the Gibbs sampler, we use the method of Gelman and Rubin (1992). We simulated $L = 8$ independent sequences, each of length $2d = 10000$, and deleted the first $d = 5000$ iterations of each sequence. We then took every 10th iteration of the remaining 5000 iterations, leading to 500 iterations for each sequence kept for analysis. Thus we finally have $L = 8$ sequences, each of size $n = 500$.

For the parameter of interest $u_i$, let $u_{ijl}$ denote the $j$-th simulated value of $u_i$ in the $l$-th sequence. Then the posterior mean of $u_i$ is estimated by

$$\hat{u}_i = \sum_{l=1}^{L}\sum_{j=1}^{n} u_{ijl} \Big/ (Ln).$$

We then compute $B_i/n$, the variance between the 8 sequence means $\bar{u}_{il}$ of $u_i$, each based on $n = 500$ simulated values; that is,

$$B_i/n = \sum_{l=1}^{L} (\bar{u}_{il} - \hat{u}_i)^2 \Big/ (L - 1).$$

Also, let $W_i$ denote the average of the 8 within-sequence variances $s_{il}^2$, each based on $n - 1$ degrees of freedom; that is, $W_i = \sum_{l=1}^{L} s_{il}^2 / L$. The posterior variance of $u_i$ is then estimated by a weighted average of $W_i$ and $B_i$,

$$\hat{\sigma}_i^2 = \frac{n-1}{n} W_i + \frac{1}{n} B_i,$$

where $n = 500$. Note that if only one sequence is simulated, $B_i$ cannot be calculated. To estimate the undercoverage rate of the $i$-th province, $\theta_i = u_i/(u_i + c_i)$, let $\theta_{ijl} = u_{ijl}/(u_{ijl} + c_i)$; the posterior mean of $\theta_i$ can then be estimated by $\hat{\theta}_i = \sum_{l=1}^{L}\sum_{j=1}^{n} \theta_{ijl}/(Ln)$. Similarly, we can calculate the posterior variance of $\theta_i$. To monitor the convergence of the Gibbs sampler, we calculate $\hat{V}_i = \hat{\sigma}_i^2 + B_i/(Ln)$ and then find $R_i = \hat{V}_i / W_i$ for each $u_i$. $R_i$ is known as a potential scale reduction factor (Gelman and Rubin, 1992). If the $R_i$ are near 1 for all of the parameters $u_i$ of interest, this suggests that the desired convergence has been achieved in the Gibbs sampler. In our study, the values of $R_i$ for the 10 provinces are equal or very close to 1, which strongly suggests that the Gibbs sampler converged very well.

Table 1 shows the direct undercoverage estimates and the posterior undercoverage estimates, together with the associated coefficients of variation (CV). The posterior estimates of the undercoverage counts $u_i$ have smaller CVs than the direct estimates for all provinces except Ontario, Quebec and New Brunswick. Particularly for some small provinces, such as P.E.I. and Manitoba, the improvement of the posterior estimates over the direct estimates is significant. For Ontario and Quebec, the two largest provinces with the largest sample sizes, the posterior estimates have the same CV as the direct estimates. For New Brunswick, the posterior estimate has a larger CV than the direct estimate (0.17 versus 0.14), and the posterior undercoverage rate (2.55%) is quite different from the direct rate (3.25%). More detailed analysis will be given in the next section.

5. TEST OF MODEL FITTING

To test the overall fit of Model 2, we use the method of posterior predictive p values (Meng, 1994; Gelman et al., 1995). Let $T(y, \theta)$ be a discrepancy measure depending on the data $y$ and on the parameters $\theta$. Let $\theta^*$ represent a draw from the posterior distribution of $\theta$, and let $y^*$ represent a draw from $f(y \mid \theta^*)$; then, marginally, $y^*$ is a sample from the posterior predictive distribution $f(y \mid y_{obs})$, where $y_{obs}$ represents the observed data. The posterior predictive p value is defined as

$$p = \Pr\{T(y^*, \theta) > T(y_{obs}, \theta) \mid y_{obs}\}.$$

Note that the probability is taken with respect to the posterior distribution given the observed data. This is a natural extension of the usual p value to a Bayesian context. If a model fits the observed data, then the observed discrepancy $T(y_{obs}, \theta)$ should be typical of the replicated discrepancies $T(y^*, \theta)$, so that $p$ is not close to 0 or 1.
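The post-processing described in Section 4 (discarding the first $d$ of $2d$ iterations, keeping every 10th remaining draw, and computing $\hat{u}_i$, $\hat{\sigma}_i^2$ and $R_i$) can be sketched as follows. This is an illustrative Python sketch of the paper's formulas, not the author's code; the function names are hypothetical, and the simulated normal draws in the usage lines merely stand in for the Gibbs output of one parameter.

```python
import math
import random


def burn_and_thin(sequence, d, thin):
    """Drop the first d of 2d iterations, then keep every thin-th remaining one."""
    if len(sequence) != 2 * d:
        raise ValueError(f"expected 2d = {2 * d} iterations, got {len(sequence)}")
    return sequence[d + thin - 1 :: thin]


def posterior_summary(sequences):
    """Posterior mean, posterior variance and potential scale reduction factor R.

    sequences: L lists of n retained draws of one parameter (e.g. u_i).
    B/n is the variance between the L sequence means, W the average
    within-sequence variance, sigma2_hat = (n-1)/n * W + B/n,
    V_hat = sigma2_hat + B/(L n) and R = V_hat / W.
    """
    L, n = len(sequences), len(sequences[0])
    if L < 2:
        raise ValueError("B cannot be calculated from a single sequence")
    means = [math.fsum(s) / n for s in sequences]
    grand_mean = math.fsum(means) / L  # equals the mean of all L*n draws
    b_over_n = math.fsum((mb - grand_mean) ** 2 for mb in means) / (L - 1)
    within = [math.fsum((v - mb) ** 2 for v in s) / (n - 1)
              for s, mb in zip(sequences, means)]
    w = math.fsum(within) / L
    sigma2_hat = (n - 1) / n * w + b_over_n
    v_hat = sigma2_hat + b_over_n / L  # B/(L n) = (B/n) / L
    return grand_mean, sigma2_hat, v_hat / w


if __name__ == "__main__":
    L, d, thin = 8, 5000, 10  # settings reported in Section 4
    rng = random.Random(1)
    # Placeholder draws (hypothetical) standing in for the Gibbs output of one u_i.
    raw = [[rng.gauss(20_000.0, 2_500.0) for _ in range(2 * d)] for _ in range(L)]
    kept = [burn_and_thin(s, d, thin) for s in raw]
    print(len(kept), len(kept[0]))  # prints: 8 500
    print(posterior_summary(kept))
```

For well-mixed sequences such as these placeholders, $R$ comes out very close to 1; sequences stuck in different regions inflate $B_i$ and push $R$ well above 1.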
Table 1. Census undercoverage estimation
(y_i and r_i: direct survey estimates of the undercoverage count and rate; û_i and θ̂_i: posterior HB estimates)

Province   y_i           CV(y_i)   û_i       CV(û_i)   r_i (%)   CV(r_i)   θ̂_i (%)   CV(θ̂_i)
NFLD       11566         0.16      10782     0.14      1.99      0.16      1.86      0.13
PEI        1220          0.30      1486      0.19      0.93      0.30      1.13      0.19
NS         17329         0.20      17412     0.14      1.89      0.20      1.90      0.14
NB         24280         0.14      18948     0.17      3.25      0.13      2.55      0.17
QUE        184473        0.08      189599    0.08      2.58      0.08      2.65      0.08
ONT        [illegible]   0.08      368424    0.08      3.64      0.08      3.52      0.08
MAN        [illegible]   0.21      21504     0.14      1.86      0.20      1.93      0.14
SASK       18106         0.19      18822     0.14      1.80      0.18      1.87      0.13
ALTA       51825         0.15      55329     0.12      2.01      0.14      2.13      0.12
BC         92236         0.10      89992     0.09      2.73      0.10      2.67      0.09
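The posterior predictive p value defined in Section 5 can be approximated by simulation: for each retained posterior draw $\theta^*$, simulate $y^*$ from the sampling model (3) and record whether $T(y^*, \theta^*)$ exceeds $T(y_{obs}, \theta^*)$. The discrepancy measure used in the paper is not given in this section, so the Python sketch below assumes, purely for illustration, the chi-square discrepancy $T(y, \theta) = \sum_i (y_i - u_i)^2/\psi_i^2$; the function names are hypothetical.

```python
import random


def chi2_discrepancy(y, u, psi2):
    """Assumed discrepancy T(y, theta) = sum_i (y_i - u_i)^2 / psi_i^2."""
    return sum((yi - ui) ** 2 / p for yi, ui, p in zip(y, u, psi2))


def posterior_predictive_p(y_obs, u_draws, psi2, seed=0):
    """Monte Carlo estimate of Pr{T(y*, theta) > T(y_obs, theta) | y_obs}.

    u_draws: posterior draws of (u_1, ..., u_m), e.g. from the Gibbs sampler.
    For each draw, y* is simulated from the sampling model (3): y*_i ~ N(u_i, psi_i^2).
    """
    rng = random.Random(seed)
    exceed = 0
    for u in u_draws:
        y_rep = [rng.gauss(ui, p ** 0.5) for ui, p in zip(u, psi2)]
        if chi2_discrepancy(y_rep, u, psi2) > chi2_discrepancy(y_obs, u, psi2):
            exceed += 1
    return exceed / len(u_draws)
```

The same loop works for any other discrepancy: only `chi2_discrepancy` needs to be replaced.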
