SMALL AREA ESTIMATION
Second Edition

J. N. K. RAO AND ISABEL MOLINA

Wiley Series in Survey Methodology

Copyright © 2015 by John Wiley & Sons, Inc.
Published by John Wiley & Sons, Inc., Hoboken, New Jersey
Published simultaneously in Canada

Library of Congress Cataloging-in-Publication Data:
Rao, J. N. K., 1937- author.
Small area estimation / J. N. K. Rao and Isabel Molina. – Second edition.
pages cm – (Wiley series in survey methodology)
Includes bibliographical references and index.
ISBN 978-1-118-73578-7 (cloth)
1. Small area statistics. 2. Sampling (Statistics) 3. Estimation theory. I. Molina, Isabel, 1975- author. II. Title. III. Series: Wiley series in survey methodology.
QA276.6.R344 2015
519.5′2–dc23 2015012610

Printed in the United States of America

CONTENTS

List of Figures
List of Tables
Foreword to the First Edition
Preface to the Second Edition
Preface to the First Edition

1 *Introduction
  1.1 What is a Small Area?
  1.2 Demand for Small Area Statistics
  1.3 Traditional Indirect Estimators
  1.4 Small Area Models
  1.5 Model-Based Estimation
  1.6 Some Examples
    1.6.1 Health
    1.6.2 Agriculture
    1.6.3 Income for Small Places
    1.6.4 Poverty Counts
    1.6.5 Median Income of Four-Person Families
    1.6.6 Poverty Mapping

2 Direct Domain Estimation
  2.1 Introduction
  2.2 Design-Based Approach
  2.3 Estimation of Totals
    2.3.1 Design-Unbiased Estimator
    2.3.2 Generalized Regression Estimator
  2.4 Domain Estimation
    2.4.1 Case of No Auxiliary Information
    2.4.2 GREG Domain Estimation
    2.4.3 Domain-Specific Auxiliary Information
  2.5 Modified GREG Estimator
  2.6 Design Issues
    2.6.1 Minimization of Clustering
    2.6.2 Stratification
    2.6.3 Sample Allocation
    2.6.4 Integration of Surveys
    2.6.5 Dual-Frame Surveys
    2.6.6 Repeated Surveys
  2.7 *Optimal Sample Allocation for Planned Domains
    2.7.1 Case (i)
    2.7.2 Case (ii)
    2.7.3 Two-Way Stratification: Balanced Sampling
  2.8 Proofs
    2.8.1 Proof of $\hat{Y}_{GR}(\mathbf{x}) = \mathbf{X}$
    2.8.2 Derivation of Calibration Weights $w_j^*$
    2.8.3 Proof of $\hat{Y} = \hat{\mathbf{X}}^T \hat{\mathbf{B}}$ when $c_j = \boldsymbol{\nu}^T \mathbf{X}_j$

3 Indirect Domain Estimation
  3.1 Introduction
  3.2 Synthetic Estimation
    3.2.1 No Auxiliary Information
    3.2.2 *Area Level Auxiliary Information
    3.2.3 *Unit Level Auxiliary Information
    3.2.4 Regression-Adjusted Synthetic Estimator
    3.2.5 Estimation of MSE
    3.2.6 Structure Preserving Estimation
    3.2.7 *Generalized SPREE
    3.2.8 *Weight-Sharing Methods
  3.3 Composite Estimation
    3.3.1 Optimal Estimator
    3.3.2 Sample-Size-Dependent Estimators
  3.4 James–Stein Method
    3.4.1 Common Weight
    3.4.2 Equal Variances $\psi_i = \psi$
    3.4.3 Estimation of Component MSE
    3.4.4 Unequal Variances $\psi_i$
    3.4.5 Extensions
  3.5 Proofs

4 Small Area Models
  4.1 Introduction
  4.2 Basic Area Level Model
  4.3 Basic Unit Level Model
  4.4 Extensions: Area Level Models
    4.4.1 Multivariate Fay–Herriot Model
    4.4.2 Model with Correlated Sampling Errors
    4.4.3 Time Series and Cross-Sectional Models
    4.4.4 *Spatial Models
    4.4.5 Two-Fold Subarea Level Models
  4.5 Extensions: Unit Level Models
    4.5.1 Multivariate Nested Error Regression Model
    4.5.2 Two-Fold Nested Error Regression Model
    4.5.3 Two-Level Model
    4.5.4 General Linear Mixed Model
  4.6 Generalized Linear Mixed Models
    4.6.1 Logistic Mixed Models
    4.6.2 *Models for Multinomial Counts
    4.6.3 Models for Mortality and Disease Rates
    4.6.4 Natural Exponential Family Models
    4.6.5 *Semi-parametric Mixed Models

5 Empirical Best Linear Unbiased Prediction (EBLUP): Theory
  5.1 Introduction
  5.2 General Linear Mixed Model
    5.2.1 BLUP Estimator
    5.2.2 MSE of BLUP
    5.2.3 EBLUP Estimator
    5.2.4 ML and REML Estimators
    5.2.5 MSE of EBLUP
    5.2.6 Estimation of MSE of EBLUP
  5.3 Block Diagonal Covariance Structure
    5.3.1 EBLUP Estimator
    5.3.2 Estimation of MSE
    5.3.3 Extension to Multidimensional Area Parameters
  5.4 *Model Identification and Checking
    5.4.1 Variable Selection
    5.4.2 Model Diagnostics
  5.5 *Software
  5.6 Proofs
    5.6.1 Derivation of BLUP
    5.6.2 Equivalence of BLUP and Best Predictor $E(\mathbf{m}^T\mathbf{v} \mid \mathbf{A}^T\mathbf{y})$
    5.6.3 Derivation of MSE Decomposition (5.2.29)

6 Empirical Best Linear Unbiased Prediction (EBLUP): Basic Area Level Model
  6.1 EBLUP Estimation
    6.1.1 BLUP Estimator
    6.1.2 Estimation of $\sigma_v^2$
    6.1.3 Relative Efficiency of Estimators of $\sigma_v^2$
    6.1.4 *Applications
  6.2 MSE Estimation
    6.2.1 Unconditional MSE of EBLUP
    6.2.2 MSE for Nonsampled Areas
    6.2.3 *MSE Estimation for Small Area Means
    6.2.4 *Bootstrap MSE Estimation
    6.2.5 *MSE of a Weighted Estimator
    6.2.6 Mean Cross Product Error of Two Estimators
    6.2.7 *Conditional MSE
  6.3 *Robust Estimation in the Presence of Outliers
  6.4 *Practical Issues
    6.4.1 Unknown Sampling Error Variances
    6.4.2 Strictly Positive Estimators of $\sigma_v^2$
    6.4.3 Preliminary Test Estimation
    6.4.4 Covariates Subject to Sampling Errors
    6.4.5 Big Data Covariates
    6.4.6 Benchmarking Methods
    6.4.7 Misspecified Linking Model
  6.5 *Software

7 Basic Unit Level Model
  7.1 EBLUP Estimation
    7.1.1 BLUP Estimator
    7.1.2 Estimation of $\sigma_v^2$ and $\sigma_e^2$
    7.1.3 *Nonnegligible Sampling Fractions
  7.2 MSE Estimation
    7.2.1 Unconditional MSE of EBLUP
    7.2.2 Unconditional MSE Estimators
    7.2.3 *MSE Estimation: Nonnegligible Sampling Fractions
    7.2.4 *Bootstrap MSE Estimation
  7.3 *Applications
  7.4 *Outlier Robust EBLUP Estimation
    7.4.1 Estimation of Area Means
    7.4.2 MSE Estimation
    7.4.3 Simulation Results
  7.5 *M-Quantile Regression
  7.6 *Practical Issues
    7.6.1 Unknown Heteroscedastic Error Variances
    7.6.2 Pseudo-EBLUP Estimation
    7.6.3 Informative Sampling
    7.6.4 Measurement Error in Area-Level Covariate
    7.6.5 Model Misspecification
    7.6.6 Semi-parametric Nested Error Model: EBLUP
    7.6.7 Semi-parametric Nested Error Model: REBLUP
  7.7 *Software
  7.8 *Proofs
    7.8.1 Derivation of (7.6.17)
    7.8.2 Proof of (7.6.20)

8 EBLUP: Extensions
  8.1 *Multivariate Fay–Herriot Model
  8.2 Correlated Sampling Errors
  8.3 Time Series and Cross-Sectional Models
    8.3.1 *Rao–Yu Model
    8.3.2 State-Space Models
  8.4 *Spatial Models
  8.5 *Two-Fold Subarea Level Models
  8.6 *Multivariate Nested Error Regression Model
  8.7 Two-Fold Nested Error Regression Model
  8.8 *Two-Level Model
  8.9 *Models for Multinomial Counts
  8.10 *EBLUP for Vectors of Area Proportions
  8.11 *Software

9 Empirical Bayes (EB) Method
  9.1 Introduction
  9.2 Basic Area Level Model
    9.2.1 EB Estimator
    9.2.2 MSE Estimation
    9.2.3 Approximation to Posterior Variance
    9.2.4 *EB Confidence Intervals
  9.3 Linear Mixed Models
    9.3.1 EB Estimation of $\mu_i = \mathbf{l}_i^T\boldsymbol{\beta} + \mathbf{m}_i^T\mathbf{v}_i$
    9.3.2 MSE Estimation
    9.3.3 Approximations to the Posterior Variance
  9.4 *EB Estimation of General Finite Population Parameters
    9.4.1 BP Estimator Under a Finite Population
    9.4.2 EB Estimation Under the Basic Unit Level Model
    9.4.3 FGT Poverty Measures
    9.4.4 Parametric Bootstrap for MSE Estimation
    9.4.5 ELL Estimation
    9.4.6 Simulation Experiments
  9.5 Binary Data
    9.5.1 *Case of No Covariates
    9.5.2 Models with Covariates
  9.6 Disease Mapping
    9.6.1 Poisson–Gamma Model
    9.6.2 Log-Normal Models
    9.6.3 Extensions
  9.7 *Design-Weighted EB Estimation: Exponential Family Models
  9.8 Triple-Goal Estimation
    9.8.1 Constrained EB
    9.8.2 Histogram
    9.8.3 Ranks
  9.9 Empirical Linear Bayes
    9.9.1 LB Estimation
    9.9.2 Posterior Linearity
  9.10 Constrained LB
  9.11 *Software
  9.12 Proofs
    9.12.1 Proof of (9.2.11)
    9.12.2 Proof of (9.2.30)
    9.12.3 Proof of (9.8.6)
    9.12.4 Proof of (9.9.1)

10 Hierarchical Bayes (HB) Method
  10.1 Introduction
  10.2 MCMC Methods
    10.2.1 Markov Chain
    10.2.2 Gibbs Sampler
    10.2.3 M–H Within Gibbs
    10.2.4 Posterior Quantities
    10.2.5 Practical Issues
    10.2.6 Model Determination
  10.3 Basic Area Level Model
    10.3.1 Known $\sigma_v^2$
    10.3.2 *Unknown $\sigma_v^2$: Numerical Integration
    10.3.3 Unknown $\sigma_v^2$: Gibbs Sampling
    10.3.4 *Unknown Sampling Variances $\psi_i$
    10.3.5 *Spatial Model
  10.4 *Unmatched Sampling and Linking Area Level Models
  10.5 Basic Unit Level Model
    10.5.1 Known $\sigma_v^2$ and $\sigma_e^2$
    10.5.2 Unknown $\sigma_v^2$ and $\sigma_e^2$: Numerical Integration
    10.5.3 Unknown $\sigma_v^2$ and $\sigma_e^2$: Gibbs Sampling
    10.5.4 Pseudo-HB Estimation
  10.6 General ANOVA Model
  10.7 *HB Estimation of General Finite Population Parameters
    10.7.1 HB Estimator under a Finite Population
    10.7.2 Reparameterized Basic Unit Level Model
    10.7.3 HB Estimator of a General Area Parameter
  10.8 Two-Level Models
  10.9 Time Series and Cross-Sectional Models
  10.10 Multivariate Models
    10.10.1 Area Level Model
    10.10.2 Unit Level Model
  10.11 Disease Mapping Models
    10.11.1 Poisson–Gamma Model
    10.11.2 Log-Normal Model
    10.11.3 Two-Level Models
  10.12 *Two-Part Nested Error Model
  10.13 Binary Data
    10.13.1 Beta-Binomial Model
    10.13.2 Logit-Normal Model
    10.13.3 Logistic Linear Mixed Models
  10.14 *Missing Binary Data
  10.15 Natural Exponential Family Models
  10.16 Constrained HB
  10.17 *Approximate HB Inference and Data Cloning
  10.18 Proofs
    10.18.1 Proof of (10.2.26)
    10.18.2 Proof of (10.2.32)
    10.18.3 Proof of (10.3.13)–(10.3.15)

References
Author Index
Subject Index

FIGURES

3.1 Direct, Census, composite SPREE, and GLSM estimates of Row Profiles $\boldsymbol{\theta}_i^M = (\theta_{i1}^M, \ldots, \theta_{iA}^M)^T$ for Canadian Provinces Newfoundland and Labrador (a) and Quebec (b), for Two-digit Occupation class A1.

3.2 Direct, Census, composite SPREE, and GLSM estimates of row profiles $\boldsymbol{\theta}_i^M = (\theta_{i1}^M, \ldots, \theta_{iA}^M)^T$ for Canadian provinces Newfoundland and Labrador (a) and Nova Scotia (b), for Two-digit occupation class B5.

6.1 EBLUP and Direct Area Estimates of Average Expenditure on Fresh Milk for Each Small Area (a). CVs of EBLUP and Direct Estimators for Each Small Area (b). Areas are Sorted by Decreasing Sample Size.

7.1 Leverage measures $s_{ii}$ versus scaled squared residuals.

8.1 Naive Nonparametric Bootstrap MSE Estimates Against Analytical MSE Estimates (a). Bias-corrected Nonparametric Bootstrap MSE Estimates Against Analytical MSE Estimates (b).

8.2 EBLUP Estimates, Based on the Spatial FH Model with SAR Random Effects, and Direct Estimates of Mean Surface Area Used for Production of Grapes for Each Municipality (a). CVs of EBLUP Estimates and of Direct Estimates for Each Municipality (b). Municipalities are Sorted by Increasing CVs of Direct Estimates.

9.1 Bias (a) and MSE (b) over Simulated Populations of EB, Direct, and ELL Estimates of Percent Poverty Gap $100F_{1i}$ for Each Area i.

9.2 True MSEs of EB Estimators of Percent Poverty Gap and Average of Bootstrap MSE Estimators Obtained with B = 500 for Each Area i.