SpringerBriefs in Statistics
JSS Research Series in Statistics
Shonosuke Sugasawa · Tatsuya Kubokawa
Mixed-Effects Models and Small Area Estimation

Editors-in-Chief
Naoto Kunitomo, The Institute of Statistical Mathematics, Tachikawa, Tokyo, Japan
Akimichi Takemura, The Center for Data Science Education and Research, Shiga University, Hikone, Shiga, Japan

Series Editors
Genshiro Kitagawa, Meiji Institute for Advanced Study of Mathematical Sciences, Nakano-ku, Tokyo, Japan
Shigeyuki Matsui, Graduate School of Medicine, Nagoya University, Nagoya, Aichi, Japan
Manabu Iwasaki, School of Data Science, Yokohama City University, Yokohama, Kanagawa, Japan
Yasuhiro Omori, Graduate School of Economics, The University of Tokyo, Bunkyo-ku, Tokyo, Japan
Masafumi Akahira, Institute of Mathematics, University of Tsukuba, Tsukuba, Ibaraki, Japan
Masanobu Taniguchi, School of Fundamental Science and Engineering, Waseda University, Shinjuku-ku, Tokyo, Japan
Hiroe Tsubaki, The Institute of Statistical Mathematics, Tachikawa, Tokyo, Japan
Satoshi Hattori, Faculty of Medicine, Osaka University, Suita, Osaka, Japan
Kosuke Oya, School of Economics, Osaka University, Toyonaka, Osaka, Japan
Taiji Suzuki, School of Engineering, University of Tokyo, Tokyo, Japan

The current research of statistics in Japan has expanded in several directions in line with recent trends in academic activities in the area of statistics and statistical sciences around the globe. The core of these research activities in statistics in Japan has been the Japan Statistical Society (JSS). This society, the oldest and largest academic organization for statistics in Japan, was founded in 1931 by a handful of pioneer statisticians and economists and now has a history of about 80 years. Many distinguished scholars have been members, including the influential statistician Hirotugu Akaike, who was a past president of JSS, and the notable mathematician Kiyosi Itô, who was an earlier member of the Institute of Statistical Mathematics (ISM), which has been a closely related organization since the establishment of ISM. The society has two academic journals: the Journal of the Japan Statistical Society (English Series) and the Journal of the Japan Statistical Society (Japanese Series). The membership of JSS consists of researchers, teachers, and professional statisticians in many different fields including mathematics, statistics, engineering, medical sciences, government statistics, economics, business, psychology, education, and many other natural, biological, and social sciences. The JSS Series of Statistics aims to publish recent results of current research activities in the areas of statistics and statistical sciences in Japan that otherwise would not be available in English; they are complementary to the two JSS academic journals, both English and Japanese. Because the scope of a research paper in academic journals has inevitably become narrowly focused and condensed in recent years, this series is intended to fill the gap between academic research activities and the form of a single academic paper. The series will be of great interest to a wide audience of researchers, teachers, professional statisticians, and graduate students in many countries who are interested in statistics and statistical sciences, in statistical theory, and in various areas of statistical applications.
Shonosuke Sugasawa
Center for Spatial Information Science
University of Tokyo
Kashiwa-shi, Chiba, Japan

Tatsuya Kubokawa
Faculty of Economics
University of Tokyo
Tokyo, Japan

ISSN 2191-544X, ISSN 2191-5458 (electronic): SpringerBriefs in Statistics
ISSN 2364-0057, ISSN 2364-0065 (electronic): JSS Research Series in Statistics
ISBN 978-981-19-9485-2, ISBN 978-981-19-9486-9 (eBook)
https://doi.org/10.1007/978-981-19-9486-9

© The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2023

This work is subject to copyright. All rights are solely and exclusively licensed by the Publisher, whether the whole or part of the material is concerned, specifically the rights of translation, reprinting, reuse of illustrations, recitation, broadcasting, reproduction on microfilms or in any other physical way, and transmission or information storage and retrieval, electronic adaptation, computer software, or by similar or dissimilar methodology now known or hereafter developed.
The use of general descriptive names, registered names, trademarks, service marks, etc. in this publication does not imply, even in the absence of a specific statement, that such names are exempt from the relevant protective laws and regulations and therefore free for general use.
The publisher, the authors, and the editors are safe to assume that the advice and information in this book are believed to be true and accurate at the date of publication. Neither the publisher nor the authors or the editors give a warranty, expressed or implied, with respect to the material contained herein or for any errors or omissions that may have been made. The publisher remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

This Springer imprint is published by the registered company Springer Nature Singapore Pte Ltd.
The registered company address is: 152 Beach Road, #21-01/04 Gateway East, Singapore 189721, Singapore

Preface
This book provides a self-contained introduction to mixed-effects models and small area estimation techniques.
In particular, it focuses both on introducing classical theory and on reviewing the latest methods. It first introduces basic issues of mixed-effects models, such as parameter estimation, random effects prediction, variable selection, and asymptotic theory. The standard mixed-effects models used in small area estimation, known as the Fay–Herriot model and the nested error regression model, are then introduced. Both frequentist and Bayesian approaches are given for computing predictors of the small area parameters of interest. For measuring the uncertainty of the predictors, several methods for calculating mean squared errors and confidence intervals are discussed. Various advanced approaches using mixed-effects models are introduced, ranging from frequentist to Bayesian approaches. This book is helpful for researchers and graduate students in various fields requiring data analysis skills, as well as in mathematical statistics.
The authors would like to thank Professor Masafumi Akahira for giving us the opportunity to publish this book. The work of the first author was supported in part by Grant-in-Aid for Scientific Research (21H00699) from the Japan Society for the Promotion of Science (JSPS). The work of the second author was supported in part by Grant-in-Aid for Scientific Research (18K11188) from the JSPS.

Tokyo, Japan
September 2022
Shonosuke Sugasawa
Tatsuya Kubokawa

Contents
1 Introduction
  References
2 General Mixed-Effects Models and BLUP
  2.1 Mixed-Effects Models and Examples
  2.2 Best Linear Unbiased Predictors
  2.3 REML and General Estimating Equations
  2.4 Asymptotic Properties
  2.5 Proofs of the Asymptotic Results
  References
3 Measuring Uncertainty of Predictors
  3.1 EBLUP and the Mean Squared Error
  3.2 Approximation of the MSE
  3.3 Evaluation of the MSE Under Normality
  3.4 Estimation of the MSE
  3.5 Confidence Intervals
  References
4 Basic Mixed-Effects Models for Small Area Estimation
  4.1 Basic Area-Level Model
    4.1.1 Fay–Herriot Model
    4.1.2 Asymptotic Properties of EBLUP
  4.2 Basic Unit-Level Models
    4.2.1 Nested Error Regression Model
    4.2.2 Asymptotic Properties of EBLUP
  References
5 Hypothesis Tests and Variable Selection
  5.1 Test Procedures for a Linear Hypothesis on Regression Coefficients
  5.2 Information Criteria for Variable or Model Selection
  References
6 Advanced Theory of Basic Small Area Models
  6.1 Adjusted Likelihood Methods
    6.1.1 Strictly Positive Estimate of Random Effect Variance
    6.1.2 Adjusted Likelihood for Empirical Bayes Confidence Intervals
    6.1.3 Adjusted Likelihood for Solving Multiple Small Area Estimation Problems
  6.2 Observed Best Prediction
  6.3 Robust Methods
    6.3.1 Unit-Level Models
    6.3.2 Area-Level Models
  References
7 Small Area Models for Non-normal Response Variables
  7.1 Generalized Linear Mixed Models
  7.2 Natural Exponential Families with Conjugate Priors
  7.3 Unmatched Sampling and Linking Models
  7.4 Models with Data Transformation
    7.4.1 Area-Level Models for Positive Values
    7.4.2 Area-Level Models for Proportions
    7.4.3 Unit-Level Models and Estimating Finite Population Parameters
  7.5 Models with Skewed Distributions
  References
8 Extensions of Basic Small Area Models
  8.1 Flexible Modeling of Random Effects
    8.1.1 Uncertainty of the Presence of Random Effects
    8.1.2 Modeling Random Effects via Global–Local Shrinkage Priors
  8.2 Measurement Errors in Covariates
    8.2.1 Measurement Errors in the Fay–Herriot Model
    8.2.2 Measurement Errors in the Nested Error Regression Model
  8.3 Nonparametric and Semiparametric Modeling
  8.4 Modeling Heteroscedastic Variance
    8.4.1 Shrinkage Estimation of Sampling Variances
    8.4.2 Heteroscedastic Variance in Nested Error Regression Models
  References

Chapter 1 Introduction
The term 'small area' or 'small domain' refers to a small geographical region such as a county, municipality or state, or a small demographic group such as a specific age–sex–race group.
In estimating a characteristic of such a small group, the direct estimate based only on the data from that group is likely to be unreliable, because only a small number of observations is available from it. The problem of small area estimation is how to produce a reliable estimate of the characteristic of the small group. Small area estimation has been studied actively and extensively, from both theoretical and practical perspectives, owing to an increasing demand for reliable small area estimates from the public and private sectors. The articles by Ghosh and Rao (1994) and Pfeffermann (2013) give good reviews and motivations, and the comprehensive book by Rao and Molina (2015) covers all the main developments in small area estimation. A more recent review of the use of mixed models in small area estimation is given in Sugasawa and Kubokawa (2020). See also Demidenko (2004) for general mixed models and Pratesi (2016) for the analysis of poverty data by small area estimation. In this book, we describe the details of classical methods and review recent developments, which we hope will be helpful for readers interested in this topic.

To improve the accuracy of direct survey estimates, we make use of relevant supplementary information, such as data from other related areas and covariate data from other sources. Linear mixed models (LMM) enable us to 'borrow strength' from the relevant supplementary data, and the resulting model-based estimators, namely the best linear unbiased predictors (BLUP), provide reliable estimates of the small area characteristics. The BLUP shrinks the direct estimates in small areas toward a stable quantity constructed by pooling all the data; thus BLUP is characterized by two effects, pooling and shrinkage of the data.
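The pooling and shrinkage effects can be made concrete with a minimal sketch. It assumes the simplest area-level setting, an intercept-only version of the Fay–Herriot model treated in Chap. 4: y_i = μ + v_i + e_i, with random effect variance A and sampling variances D_i both treated as known. The function name `fay_herriot_blup` and the toy numbers are hypothetical and serve only as illustration. In practice A must be estimated, and plugging in the estimate gives the EBLUP.

```python
from typing import List, Tuple


def fay_herriot_blup(y: List[float], D: List[float], A: float) -> Tuple[float, List[float]]:
    """Illustrative BLUP of theta_i = mu + v_i in the intercept-only model
    y_i = mu + v_i + e_i, Var(v_i) = A, Var(e_i) = D_i (A and D_i assumed known)."""
    # Pooling: the GLS estimate of the common mean mu combines all areas,
    # weighting each direct estimate by its inverse total variance 1 / (A + D_i).
    weights = [1.0 / (A + d) for d in D]
    mu_hat = sum(w * yi for w, yi in zip(weights, y)) / sum(weights)

    # Shrinkage: each direct estimate y_i is pulled toward mu_hat.
    # gamma_i = A / (A + D_i) is the weight kept on the direct estimate,
    # so areas with a larger sampling variance D_i are shrunk more.
    preds = [mu_hat + (A / (A + d)) * (yi - mu_hat) for yi, d in zip(y, D)]
    return mu_hat, preds


if __name__ == "__main__":
    mu_hat, preds = fay_herriot_blup(y=[10.0, 12.0, 20.0], D=[1.0, 4.0, 4.0], A=4.0)
    print(round(mu_hat, 4), [round(p, 4) for p in preds])
```

With these toy inputs, the script prints `13.3333 [10.6667, 12.6667, 16.6667]`. The first area has the smallest sampling variance (D_1 = 1), so it keeps 80% of its deviation from the pooled mean. The other two areas keep only 50%.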
These two features of BLUP come mainly from the structure of linear mixed models, which can be described as (observation) = (common parameters) + (random effects) + (error terms). The shrinkage effect arises from the random effects, and the pooling effect is due to the common parameters. While BLUP was originally proposed by Henderson (1950), the empirical version of BLUP (EBLUP) is related to the classical shrinkage estimator studied by Stein (1956). Stein established analytically that shrinkage of this kind improves on the sample means, in terms of total mean squared error, when the number of small areas is three or more. This shows not only that EBLUP can be more precise than the sample means, but also that similar ideas emerged at about the same time: from Henderson (1950) for practical use, and from Stein (1956) for theoretical interest. Building on this historical background, many methods have since been proposed.

In the first part of this book, we introduce the theory of linear mixed models and BLUP (or EBLUP) in Chap. 2. Since measuring the variability or risk of EBLUP is an important task in small area estimation, we focus on mean squared errors and prediction intervals in Chap. 3. There we describe several methods based on asymptotic calculations, as well as simulation-based methods such as the jackknife and the bootstrap. Throughout, we try to retain generality by avoiding normality assumptions on the distributions as far as possible. We then introduce two basic small area models in Chap. 4: the Fay–Herriot model (Fay and Herriot 1979) and the nested error regression model (Battese et al. 1988). In Chap. 5, we provide basic techniques of hypothesis testing and variable selection in linear mixed models, which can be applied directly to the basic small area models.
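The claim above can be checked with a small Monte Carlo experiment. The claim is that shrinkage improves on the sample means, in total mean squared error, once there are at least three areas. The sketch below uses the explicit James–Stein form, which shrinks toward zero with a known unit sampling variance. This particular estimator, the simulated θ values, and the helper names `james_stein` and `compare_risk` are illustrative assumptions. They are not the book's method.

```python
import random
from typing import List, Tuple


def james_stein(y: List[float], sigma2: float = 1.0) -> List[float]:
    """Illustrative James-Stein estimator shrinking y toward zero (needs len(y) >= 3)."""
    p = len(y)
    norm2 = sum(v * v for v in y)
    # Common shrinkage factor applied to every coordinate.
    factor = 1.0 - (p - 2) * sigma2 / norm2
    return [factor * v for v in y]


def compare_risk(theta: List[float], n_rep: int = 2000, seed: int = 0) -> Tuple[float, float]:
    """Monte Carlo total MSE of the raw estimates y and of James-Stein,
    with y_i ~ N(theta_i, 1) drawn independently in each replication."""
    rng = random.Random(seed)
    loss_raw = loss_js = 0.0
    for _ in range(n_rep):
        y = [t + rng.gauss(0.0, 1.0) for t in theta]
        js = james_stein(y)
        loss_raw += sum((a - t) ** 2 for a, t in zip(y, theta))
        loss_js += sum((a - t) ** 2 for a, t in zip(js, theta))
    return loss_raw / n_rep, loss_js / n_rep


if __name__ == "__main__":
    raw, js = compare_risk([0.5] * 10)
    print(f"raw: {raw:.2f}  James-Stein: {js:.2f}")
```

With ten areas, the total MSE of the raw estimates should be close to p = 10, and the James–Stein value should be clearly smaller. The exact figures depend on the seed and on θ. Shrinking toward zero keeps the formula short. The analogous version that shrinks toward the pooled mean, as BLUP does, uses p − 3 in place of p − 2.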
In the second part of this book, we focus more on techniques of small area estimation based on mixed-effects models, with examples in places. First, in Chap. 6, we explain advanced theory of the basic small area models for handling practical problems. We mainly focus on three techniques: adjusted likelihood methods for estimating the random effects variance, observed best prediction of random effects, and robust prediction and fitting of small area models. In Chap. 7, we introduce several techniques for handling non-normal response variables in small area estimation, including generalized linear mixed models, models with data transformation, and models with non-normal distributions. Finally, in Chap. 8, we review several extensions of the basic small area models. The topics treated there are flexible modeling of random effects, measurement error models, nonparametric and semiparametric models, and heteroscedastic variance models.

References
Battese G, Harter R, Fuller W (1988) An error-components model for prediction of county crop areas using survey and satellite data. J Am Stat Assoc 83:28–36
Demidenko E (2004) Mixed models: theory and applications. Wiley
Fay R, Herriot R (1979) Estimates of income for small places: an application of James-Stein procedures to census data. J Am Stat Assoc 74:269–277
Ghosh M, Rao J (1994) Small area estimation: an appraisal. Stat Sci 9:55–76
Henderson C (1950) Estimation of genetic parameters. Ann Math Stat 21:309–310
Pfeffermann D (2013) New important developments in small area estimation. Stat Sci 28:40–68
Pratesi M (ed) (2016) Analysis of poverty data by small area estimation. Wiley