
A Course on Small Area Estimation and Mixed Models

606 pages · 2021 · 6.166 MB · English

Statistics for Social and Behavioral Sciences

Domingo Morales • María Dolores Esteban • Agustín Pérez • Tomáš Hobza

A Course on Small Area Estimation and Mixed Models
Methods, Theory and Applications in R

Statistics for Social and Behavioral Sciences (SSBS) includes monographs and advanced textbooks relating to education, psychology, sociology, political science, public policy, and law. More information about this series at http://www.springer.com/series/3463

Domingo Morales, Miguel Hernández University of Elche, Elche, Spain
María Dolores Esteban, Miguel Hernández University of Elche, Elche, Spain
Agustín Pérez, Miguel Hernández University of Elche, Elche, Spain
Tomáš Hobza, Czech Technical University in Prague, Prague, Czech Republic

ISSN 2199-7357    ISSN 2199-7365 (electronic)
Statistics for Social and Behavioral Sciences
ISBN 978-3-030-63756-9    ISBN 978-3-030-63757-6 (eBook)
https://doi.org/10.1007/978-3-030-63757-6

Mathematics Subject Classification: 62J12, 62P25, 62D05

© The Editor(s) (if applicable) and The Author(s), under exclusive licence to Springer Nature Switzerland AG 2021

This work is subject to copyright. All rights are solely and exclusively licensed by the Publisher, whether the whole or part of the material is concerned, specifically the rights of translation, reprinting, reuse of illustrations, recitation, broadcasting, reproduction on microfilms or in any other physical way, and transmission or information storage and retrieval, electronic adaptation, computer software, or by similar or dissimilar methodology now known or hereafter developed.

The use of general descriptive names, registered names, trademarks, service marks, etc. in this publication does not imply, even in the absence of a specific statement, that such names are exempt from the relevant protective laws and regulations and therefore free for general use.
The publisher, the authors, and the editors are safe to assume that the advice and information in this book are believed to be true and accurate at the date of publication. Neither the publisher nor the authors or the editors give a warranty, expressed or implied, with respect to the material contained herein or for any errors or omissions that may have been made. The publisher remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

This Springer imprint is published by the registered company Springer Nature Switzerland AG. The registered company address is: Gewerbestrasse 11, 6330 Cham, Switzerland

Preface

Small area estimation (SAE) is a branch of statistical science that combines three things: methods and tools of sampling and inference in finite populations, statistical models with random effects, and programming languages for mathematical computation. Statisticians need training in all three disciplines to be competitive. This applies both to those in academic centers who do research in SAE and to those in statistical offices who apply SAE methodology to real data. SAE is essentially a multidisciplinary branch of statistics, and this complicates the training of new researchers and of statisticians specializing in its application. University departments train researchers in statistical methodology, but there are still few doctoral programs specialized in SAE. Statistical offices have training programs where SAE courses are taught from time to time, such as the "European Statistical Training Program (ESTP)" of EUROSTAT. Both doctoral courses and training programs need books or manuals that facilitate teaching and learning.

There are already excellent books on SAE covering a broad spectrum of statistical inference procedures. These books try to cover most of the relevant theoretical developments. For this reason, some PhD students and applied statisticians have expressed the need for a book with a different approach, complementary to the existing ones.
The idea is to have a book that covers only some basic aspects of the theory and practice of SAE. It addresses those aspects on the assumption that the reader is not an expert in sampling, statistical modeling, or programming languages.

This book aims to be useful to researchers and doctoral students. To this end, the chapters present the mathematical developments in full detail, so that the reader can follow and understand the usual reasoning behind the research and development of new SAE methodologies. The chapters are intended to be self-contained, so they can be read without first reading the previous ones.

The book also aims to be useful to applied statisticians and statistical offices. To this end, each chapter contains examples that apply SAE techniques to synthetic data. The data have been simulated to imitate the structure of labor force or living conditions surveys. The socioeconomic indicators to be estimated include totals of unemployed people, unemployment rates, average annual net incomes, and poverty proportions and gaps. The examples are implemented in the R language, and the corresponding code is provided. For better understanding, the code is written in a simple, didactic style, which makes it easier to identify the mathematical formulas and the methodology in the lines of R code. Functional programming is avoided as far as possible.

The contents of the book come from the subjects taught by the authors in SAE courses at universities and statistical offices and from their research in recent years. This has conditioned the selection of contents, and it is also why some very relevant aspects of SAE theory are not covered.

Chapter 1 gives some elementary comments on SAE and mixed models and describes the structure of the data files used in the R examples. Two files contain unit-level data, and two files contain data aggregated at the domain level.
Chapters 2 and 3 introduce the most popular design-based estimators of domain means and totals. They also describe some resampling procedures for estimating the mean squared errors (MSEs) of these estimators. The book studies the direct estimators of Horvitz–Thompson and Hájek type and the basic synthetic, post-stratified, and generalized regression estimators. It also treats the problem of adapting these estimators to complex survey designs. In such designs, the inverses of the inclusion probabilities (sampling weights) are corrected for non-response and calibrated to known population quantities to obtain elevation factors.

As an alternative to the design-based approach, Chap. 4 introduces prediction theory in finite populations and proves the general prediction theorem. This theorem gives the expression of the best linear unbiased predictor (BLUP) of a linear parameter and the corresponding prediction variance. Prediction theory requires a different way of thinking from the theory of sampling from a finite, fixed population. For this reason, the chapter presents several examples in which the BLUP of a total is derived under linear models.

Chapter 5 presents some elements of linear models (LMs) in the framework of SAE problems. It is devoted to the derivation of best linear unbiased predictors of domain linear parameters. It also presents some examples where the BLUP has an expression similar to that of a known estimator based on the distribution of the sampling design.

Chapter 6 deals with linear mixed models (LMMs) and focuses on the methods and algorithms for estimating their parameters. Three fitting methods are considered: maximum likelihood (ML), residual maximum likelihood (REML), and the Henderson 3 moment-based method (H3). The algorithms for implementing these methods are given. Two R functions for fitting LMMs, lmer from package lme4 and lme from package nlme, are described in applications to the book's data files.

Chapter 7 studies the basic unit-level model in SAE, the so-called nested error regression (NER) model.
It gives the algorithms for calculating the ML, REML, and H3 estimators of the model parameters, and it also calculates the moments of the H3 estimator. The chapter ends with a simulation experiment that compares the behavior of the three types of estimators.

Chapter 8 derives the empirical best linear unbiased predictor (EBLUP) of domain linear parameters under the NER model. It also introduces model-assisted estimators of domain means, which are constructed to be unbiased with respect to the design-based distribution. The chapter presents a design-based simulation study that compares several SAE estimators. For this purpose, an artificial population is generated, and stratified random samples are drawn from it.

Chapter 9 presents mathematical developments to approximate and estimate the MSEs of the BLUP and the EBLUP of a domain linear parameter. It gives the mathematical derivation of the MSE of the BLUP in full detail. The derivation of the approximation to the MSE of the EBLUP, and the properties of the corresponding estimator, are only outlined. The chapter then particularizes these results to the NER model.

Chapter 10 addresses the problem of estimating domain nonlinear parameters under the NER model. It introduces the empirical best predictors (EBPs) for estimating additive parameters. It also treats the estimation of poverty indicators, such as poverty proportions and gaps. The chapter presents parametric bootstrap procedures for estimating the MSE of the EBP.

Chapters 11 and 12 deal with the two-fold nested error regression model. This model accounts for the variability between domains and between subdomains within each domain. Chapter 11 introduces the model and develops the fitting methods. It also derives the EBLUPs of domain means and the corresponding MSE estimators. Chapter 12 contains the EBP theory for additive parameters, particularized to the prediction of poverty proportions and gaps and of average income indicators.
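The following minimal Python sketch makes the NER-model EBLUP of a domain mean (Chapter 8) more concrete. It is an illustration only: the book's own examples are in R, and this code is not taken from them. The sketch rests on three assumptions:

- The model parameters beta, sigma_u^2 and sigma_e^2 have already been estimated, for instance by ML, REML or H3.
- The sampling fraction n_d/N_d is negligible, so the commonly used approximation applies. The domain mean is then predicted as Xbar_d' beta + gamma_d (ybar_d - xbar_d' beta), with gamma_d = sigma_u^2 / (sigma_u^2 + sigma_e^2 / n_d).
- The function name `ner_eblup_mean` and its arguments are hypothetical.

```python
def ner_eblup_mean(y_sample, x_sample, x_pop_mean, beta, sigma_u2, sigma_e2):
    """Approximate EBLUP of a domain mean under the nested error regression model.

    Hypothetical helper for illustration. Assumes n_d / N_d is negligible, so the
    predicted mean is the synthetic part Xbar_d' beta plus the predicted domain
    effect u_hat_d = gamma_d * (ybar_d - xbar_d' beta).

    y_sample   -- sampled responses in domain d
    x_sample   -- covariate vectors (lists) of the sampled units
    x_pop_mean -- population mean vector of the covariates in domain d
    beta, sigma_u2, sigma_e2 -- already-estimated model parameters
    """
    # Regression-synthetic part based on the population covariate means.
    synthetic = sum(b * xm for b, xm in zip(beta, x_pop_mean))
    n_d = len(y_sample)
    if n_d == 0:
        # Non-sampled domain: no data on u_d, so its prediction is 0.
        return synthetic
    y_bar = sum(y_sample) / n_d
    x_bar = [sum(col) / n_d for col in zip(*x_sample)]
    # Shrinkage factor: close to 1 when n_d is large or sigma_u2 dominates.
    gamma = sigma_u2 / (sigma_u2 + sigma_e2 / n_d)
    u_hat = gamma * (y_bar - sum(b * xb for b, xb in zip(beta, x_bar)))
    return synthetic + u_hat


if __name__ == "__main__":
    # Toy domain with two sampled units; covariates are (intercept, x).
    est = ner_eblup_mean(
        y_sample=[3.0, 5.0],
        x_sample=[[1, 0.5], [1, 1.5]],
        x_pop_mean=[1, 1.2],
        beta=[1.0, 2.0],
        sigma_u2=1.0,
        sigma_e2=2.0,
    )
    print(round(est, 6))  # prints 3.9 (synthetic 3.4 plus u_hat 0.5)
```

In this toy case gamma_d = 0.5, so half of the sample residual ybar_d - xbar_d' beta is added to the synthetic value. A domain with no sample falls back to the purely synthetic prediction.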
In some SAE problems, we can intuitively expect that the slope parameters of some explanatory variables are not constant and should therefore take different values in different domains. Chapter 13 studies random regression coefficient (RRC) models, which solve this problem in practice by treating the regression parameters as random. This gives a more flexible way of modeling. The chapter presents two RRC models, which differ in the covariance structure of the random slopes, and derives the EBLUPs of domain linear parameters with the corresponding MSE estimators.

Chapters 14 and 15 consider two unit-level mixed models, the logit mixed model and the two-fold logit mixed model. Both belong to the class of generalized linear mixed models (GLMMs). These chapters adapt algorithms for fitting GLMMs to the logit mixed models under consideration, namely the method of simulated moments, the EM algorithm, and the Laplace approximation. For the estimation of domain proportions, EBPs are introduced. Calculating the EBPs requires an auxiliary census file, which is a serious drawback in practice. This is why the chapters also show how to compute the EBPs when the vector of auxiliary variables takes only a finite number of values. Both chapters give R code with example applications to synthetic data.

SAE models can be formulated at the unit level or at the area level. The basic area-level model in SAE is the Fay–Herriot model. Chapter 16 studies this model and addresses the estimation of the sampling error variances by the generalized variance function approach. It introduces the EBLUPs of domain means with the corresponding MSE estimators and considers four types of estimators of the model parameters. This is the only chapter that considers the Bayesian approach to SAE.
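The Fay–Herriot EBLUP of Chapter 16 can be sketched in a few lines. The Python below is an illustration only, not the book's R code. It rests on these assumptions:

- The model is y_d = b0 + b1 x_d + u_d + e_d, with a single covariate.
- The sampling variances psi_d are known.
- A value of sigma_u^2 has already been estimated by one of the methods the chapter discusses.
- The name `fay_herriot_eblup` is hypothetical.

The function first computes the weighted least squares (GLS) estimate of (b0, b1) with weights 1 / (sigma_u^2 + psi_d). It then shrinks each direct estimate toward its regression-synthetic value with gamma_d = sigma_u^2 / (sigma_u^2 + psi_d).

```python
def fay_herriot_eblup(y, x, psi, sigma_u2):
    """EBLUPs under a Fay-Herriot model with an intercept and one covariate.

    Hypothetical helper for illustration.
    y        -- direct estimates y_d
    x        -- covariate values x_d (at least two distinct values are required)
    psi      -- known sampling variances psi_d
    sigma_u2 -- estimated random-effect variance, taken as given
    Returns ((b0, b1), list of EBLUPs).
    """
    # GLS weights 1 / (sigma_u^2 + psi_d).
    w = [1.0 / (sigma_u2 + p) for p in psi]
    # Weighted normal equations for (b0, b1) with regressors (1, x_d).
    s0 = sum(w)
    s1 = sum(wi * xi for wi, xi in zip(w, x))
    s2 = sum(wi * xi * xi for wi, xi in zip(w, x))
    t0 = sum(wi * yi for wi, yi in zip(w, y))
    t1 = sum(wi * xi * yi for wi, xi, yi in zip(w, x, y))
    det = s0 * s2 - s1 * s1  # zero if all x_d are equal
    b0 = (s2 * t0 - s1 * t1) / det
    b1 = (s0 * t1 - s1 * t0) / det
    eblups = []
    for yi, xi, p in zip(y, x, psi):
        gamma = sigma_u2 / (sigma_u2 + p)  # weight given to the direct estimate
        eblups.append(gamma * yi + (1.0 - gamma) * (b0 + b1 * xi))
    return (b0, b1), eblups


if __name__ == "__main__":
    # Toy data: four domains, alternating small and large sampling variances.
    beta, est = fay_herriot_eblup(
        y=[2.0, 2.5, 5.5, 6.0],
        x=[0.0, 1.0, 2.0, 3.0],
        psi=[0.5, 2.0, 0.5, 2.0],
        sigma_u2=1.0,
    )
    print("beta:", beta)
    print("EBLUPs:", est)
```

A domain with a large sampling variance psi_d gets a small gamma_d and leans on the regression-synthetic part. A precisely estimated domain stays close to its direct estimate. Setting sigma_u2 to 0 reduces every EBLUP to the synthetic value.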
Chapters 17 and 18 generalize the Fay–Herriot model to include temporal or spatial correlation structures. Chapter 17 studies two temporal area-level LMMs, the first with independent time effects and the second with AR(1)-correlated time effects. Chapter 18 deals with three spatial area-level LMMs. The first does not include past data, while the other two include both spatial and temporal correlation.

Chapter 19 introduces a bivariate area-level model. It gives algorithms to calculate the ML and REML estimators of the model parameters, derives the EBLUPs of domain means, and approximates the matrix of MSEs. This is the only chapter devoted to multivariate LMMs.

Finally, Chaps. 20 and 21 deal with non-temporal and temporal area-level Poisson mixed models, respectively. These chapters present fitting algorithms for the model parameters and introduce the EBPs of functions of fixed and random effects. They also give analytic and/or bootstrap procedures to estimate the corresponding MSEs.

This book does not cover many important SAE methodologies. For example, it does not give predictors based on nonparametric, robust, or Bayesian approaches. The underlying philosophy is simple: to treat a few topics in depth and in a self-contained way.

This book is an introductory monograph for doctoral students and practitioners. It can be used as an auxiliary tool in SAE courses taught at universities and statistical offices.

Elche, Spain    Domingo Morales
Elche, Spain    María Dolores Esteban
Elche, Spain    Agustín Pérez
Prague, Czech Republic    Tomáš Hobza
September 2020

Contents

1 Small Area Estimation
  1.1 Introduction
  1.2 Mixed Models
  1.3 The Data Files
    1.3.1 The LFS Data Files
    1.3.2 The LCS Data Files
  References
2 Design-Based Direct Estimation
  2.1 Introduction
  2.2 Survey Sampling Theory
  2.3 Direct Estimator of the Total and the Mean
  2.4 Estimator of the Ratio
  2.5 Other Direct Estimators of the Mean and the Total
  2.6 Bootstrap Resampling for Variance Estimation
  2.7 Jackknife Resampling for Variance Estimation
    2.7.1 Delete-One-Cluster Jackknife for Estimators of Domain Parameters
  2.8 R Codes for Design-Based Direct Estimators
    2.8.1 Horvitz–Thompson Direct Estimators of the Total and the Mean
    2.8.2 Hájek Direct Estimator of the Mean and the Total
    2.8.3 Jackknife Estimator of Variances
    2.8.4 Functions for Calculating Direct Estimators
  References
3 Design-Based Indirect Estimation
  3.1 Introduction
  3.2 Basic Synthetic Estimator
  3.3 Post-Stratified Estimator
  3.4 Sample Size Dependent Estimator
  3.5 Generalized Regression Estimator
