arXiv:1701.04176v1 [math.ST] 16 Jan 2017

A New Model Variance Estimator for an Area Level Small Area Model to Solve Multiple Problems Simultaneously

Masayo Yoshimori Hirose
The Institute of Statistical Mathematics

and

Partha Lahiri
Joint Program in Survey Methodology, University of Maryland, College Park, U.S.A.

Abstract
The two-level normal hierarchical model (NHM) has played a critical role in the theory of small area estimation (SAE), one of the growing areas in statistics with numerous applications in different disciplines. In this paper, we address major well-known shortcomings associated with the empirical best linear unbiased prediction (EBLUP) of a small area mean and its mean squared error (MSE) estimation by considering an appropriate model variance estimator that satisfies multiple properties. The proposed model variance estimator simultaneously (i) improves on the estimation of the related shrinkage factors, (ii) protects EBLUP from the common overshrinkage problem, and (iii) avoids complex bias correction in generating a strictly positive second-order unbiased MSE estimator by either the Taylor series or single parametric bootstrap method. The idea of achieving multiple desirable properties in an EBLUP method through a suitably devised model variance estimator is the first of its kind and holds promise for providing good inferences for small area means under the classical linear mixed model prediction framework. The proposed methodology is also evaluated using a Monte Carlo simulation study and real data analysis.
Keywords: Adjusted maximum likelihood method; Empirical Bayes; Empirical best linear unbiased prediction; Linear mixed model; Second-order unbiasedness.
1 Introduction
Planning and evaluation of government programs usually require access to
a wide range of national and sub-national socio-economic, environmental,
and health-related statistics. There is, however, a growing need for statistics
relating to much smaller geographical areas where data are too sparse to
support the sort of standard estimation methods typically employed at the
national level. These small area official statistics are routinely used for a va-
riety of purposes, including assessing economic well-being of a nation, mak-
ing public policies, and allocating funds in various government programs.
In this context, the term small area typically refers to a sub-population for
which reliable statistics of interest cannot be produced using the limited
area specific data available from the primary data source.
With the availability of alternative data sources such as survey data, administrative and census records, different governmental agencies are now
exploring ways to combine information from different data sources in or-
der to produce reliable small area statistics. A common practice is to use a
statistical model, usually a mixed model, and an efficient statistical methodology such as Bayesian or EBLUP for combining information from multiple
databases. Such a strategy generally improves on estimation for a domain
with small or no sample from the primary data source. We refer to the book
by Rao and Molina (2015) for a comprehensive recent account of small area
estimation literature.
Both classical and Bayesian methods and theories have been developed
using the following widely applied two-level Normal hierarchical model:
A Two-Level Normal Hierarchical Model (NHM)

Level 1 (sampling model): y_i | θ_i ∼ N(θ_i, D_i), independently;
Level 2 (linking model): θ_i ∼ N(x_i′β, A), independently,

for i = 1, ..., m.
In the above model, level 1 is used to account for the sampling distribution of unbiased estimates y_i. For example, y_i could be a sample mean based on n_i observations taken from the ith population (e.g., a small geographic area, a hospital, or a school). As in other papers on the NHM (e.g., Efron and Morris 1973, 75; Fay and Herriot 1979; Morris 1983; Datta, Rao, and Smith 2005), we assume that the sampling variances D_i are known, in order to concentrate on the main issues. The assumption of known sampling variances D_i often follows from the asymptotic variances of transformed direct estimates (Efron and Morris 1975; Carter and Rolph 1974) and/or from empirical variance modeling (Fay and Herriot 1979; Bell and Otto 1995).
Level 2 links the random effects θ_i to a vector of p known auxiliary variables x_i = (x_i1, ..., x_ip)′, often obtained from various alternative data sources (e.g., administrative records, a severity index for a hospital, a school register, etc.). The parameters β and A of the linking model, commonly referred to as hyperparameters, are generally unknown and are estimated from the available data. We assume that β ∈ R^p, the p-dimensional Euclidean space, and A ∈ [0, ∞).
The NHM can be viewed as the following simple linear mixed model:

    y_i = θ_i + e_i = x_i′β + v_i + e_i,  (i = 1, ..., m),   (1)

where {v_1, ..., v_m} and {e_1, ..., e_m} are independent with v_i ∼ N(0, A) and e_i ∼ N(0, D_i); x_i is a p-dimensional vector of known auxiliary variables; β ∈ R^p is a p-dimensional vector of unknown regression coefficients; A ∈ [0, ∞) is an unknown variance component; and D_i > 0 is the known sampling variance of y_i (i = 1, ..., m).
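As a concrete illustration, the following minimal sketch (with hypothetical values for β, A, and the D_i, and an intercept-only covariate) simulates data from model (1):

```python
import random

random.seed(42)

m = 10           # number of small areas (hypothetical)
beta = 5.0       # intercept-only linking model: x_i'beta = beta
A = 2.0          # model variance of the random effects v_i
D = [1.0 + 0.5 * i for i in range(m)]   # known sampling variances D_i

# Level 2 (linking model): theta_i = beta + v_i, with v_i ~ N(0, A)
theta = [beta + random.gauss(0.0, A ** 0.5) for _ in range(m)]
# Level 1 (sampling model): y_i = theta_i + e_i, with e_i ~ N(0, D_i)
y = [theta[i] + random.gauss(0.0, D[i] ** 0.5) for i in range(m)]
```

The two levels are drawn sequentially, mirroring the hierarchical structure: the linking model generates the true area means, and the sampling model generates the direct estimates around them.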
NHM is particularly effective in combining different sources of informa-
tion and explaining different sources of errors. Some earlier applications of
NHM include the estimation of: (i) false alarm probabilities in New York City (Carter and Rolph 1974), (ii) the batting averages of major league baseball players (Efron and Morris 1975), and (iii) the prevalence of toxoplasmosis in El Salvador (Efron and Morris 1975).
Since the publication of the landmark paper by Fay and Herriot (with 971 Google citations to date), the NHM, commonly known as the Fay-Herriot
(FH) model in the small area research community, has been extensively
used in developing small area estimation theory and in a wide range of
applications. In a small area estimation setting, the NHM or FH model has been used:
to estimate poverty rates for the US states, counties, and school districts
(Citro and Kalton 2000) and Chilean municipalities (Casas-Codero et al.
2015), and to estimate proportions at the lowest level of literacy for states
and counties (Mohadjer et al. 2007).
The MSE of a given predictor θ̂_i of θ_i is defined as M_i(θ̂_i) = E(θ̂_i − θ_i)^2, where the expectation is with respect to the joint distribution of y = (y_1, ..., y_m)′ and θ = (θ_1, ..., θ_m)′ under the Fay–Herriot model (1). The best linear unbiased predictor (BLUP) θ̂_i^BLUP of θ_i, which minimizes M_i(θ̂_i) among all linear unbiased predictors θ̂_i, is given by:

    θ̂_i^BLUP(A) = (1 − B_i) y_i + B_i x_i′β̂(A),

where B_i ≡ B_i(A) = D_i/(A + D_i) is the shrinkage factor and β̂(A) = (X′V^{-1}X)^{-1} X′V^{-1} y is the weighted least squares estimator of β when A is known. Here we use the following notation: X′ = (x_1, ..., x_m), a p × m matrix of known auxiliary variables; V = diag(A + D_1, ..., A + D_m), an m × m diagonal matrix. By plugging in an estimator Â of A (e.g., ML, REML, ANOVA) in the BLUP, one gets an empirical BLUP (EBLUP): θ̂_i^EB ≡ θ̂_i^BLUP(Â).
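For the intercept-only case (p = 1, x_i = 1), the BLUP formulas above reduce to simple weighted averages; a minimal sketch with hypothetical inputs:

```python
# Hypothetical direct estimates, known sampling variances, and a given A
y = [4.2, 6.1, 5.0, 7.3, 4.8]
D = [1.0, 0.5, 2.0, 1.5, 0.8]
A = 2.0

# Weighted least squares estimate of beta: weights w_i = 1 / (A + D_i)
w = [1.0 / (A + Di) for Di in D]
beta_hat = sum(wi * yi for wi, yi in zip(w, y)) / sum(w)

# Shrinkage factors B_i = D_i / (A + D_i) and the BLUP
# theta_hat_i = (1 - B_i) y_i + B_i * beta_hat
B = [Di / (A + Di) for Di in D]
blup = [(1 - Bi) * yi + Bi * beta_hat for Bi, yi in zip(B, y)]
```

Each BLUP is a convex combination of the direct estimate y_i and the regression estimate; areas with larger sampling variance D_i have larger B_i and are shrunk more heavily toward beta_hat.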
In the context of an empirical Bayesian approach, Morris (1983) noted that for making inferences about θ_i, estimation of B_i is more important than that of A because the posterior means and variances of θ_i are linear in B_i, not in A. He also noted that, even if an exact unbiased estimator of A is plugged into B_i ≡ B_i(A), one may estimate B_i with large bias. For that reason, to motivate the James-Stein estimator of θ_i, Efron and Morris (1973) used an exact unbiased estimator of B_i and not a maximum likelihood estimator of A. For small m, the maximum likelihood estimator of A (even with the REML correction) frequently produces an estimate of A at the boundary (that is, 0), resulting in B̂_i = 1 for all i, even when some of the true B_i are not close to 1. This causes an overshrinkage problem in EBLUP. That is, for each i, the EBLUP of θ_i reduces to the regression estimator. To overcome the overshrinkage problem, Morris (1983) suggested the fraction (m − p − 2)/(m − 1) when the estimator of B_i is 1. Li and Lahiri (2010) and Yoshimori and Lahiri (2014) avoided the overshrinkage problem by considering strictly positive consistent estimators of A, but did not devise their estimators of A to obtain a nearly accurate estimator of B_i; that is, the biases of their estimators of B_i, like all other existing estimators (e.g., ML or REML), are of the order O(m^{-1}) and not o(m^{-1}). This is an important research gap, which we will fill in this paper.
An estimator M̂_i(θ̂_i^EB) of M_i(θ̂_i^EB) is called second-order unbiased if E[M̂_i(θ̂_i^EB)] = M_i(θ̂_i^EB) + o(m^{-1}), for large m, under suitable regularity conditions. Let M_{i;approx}(A) be a second-order approximation to M_i(θ̂_i^EB). That is, M_i(θ̂_i^EB) = M_{i;approx}(A) + o(m^{-1}), for large m, under regularity conditions. Prasad and Rao (1990) proposed a second-order unbiased estimator of M_i(θ̂_{i;MOM}^EB), where θ̂_{i;MOM}^EB is the EBLUP of θ_i when the method-of-moments (MOM) estimator Â_MOM of A is used. They noticed that the simple plug-in estimator M_{i;approx}(Â_MOM) is not a second-order unbiased estimator of M_i(θ̂_{i;MOM}^EB). They showed that

    E[M_{i;approx}(Â_MOM)] = M_i[θ̂_i^EB(Â_MOM)] + O(m^{-1}),

for large m, under regularity conditions. In fact, M_{i;approx}(Â) is not a second-order unbiased estimator of M_i(θ̂_i^EB) for any of the variance component estimators proposed in the literature. Bias correction is usually applied to achieve second-order unbiasedness. However, some bias corrections can even yield negative estimates of the MSE. See Jiang (2007) and Molina and Rao (2015) for further discussions.
Mimicking a Bayesian hyperprior calculation, Laird and Louis (1987)
introduced a parametric bootstrap method for measuring uncertainty of
an empirical Bayes estimator. While their point estimator is identical to
EBLUP, their measure of uncertainty has more of a Bayesian flavor than an MSE. Butar (1997) [see also Butar and Lahiri 2003] was the first to
introduce parametric bootstrap method to produce a second-order unbiased
MSE estimator in the small area estimation context. Since Butar’s work,
a number of papers on parametric bootstrap MSE estimation methods ap-
peared in the SAE literature; see Pfeffermann and Glickman (2004), Chat-
terjee and Lahiri (2007); Hall and Maiti (2006); Pfeffermann and Correa (2012). Some of them are second-order unbiased but not strictly positive. Some adjustments were proposed to make the second-order unbiased
double parametric bootstrap MSE estimators strictly positive, but adjusted
MSE estimators were not claimed to have the dual property of second-order
unbiasedness and strict positivity. As pointed out in Jiang et al. (2016),
a proof is not at all trivial and it is not even clear if the adjustments for
positivity retain the second-order unbiasedness of the MSE estimators.
In this paper, we focus on the estimation of two important area-specific functions of A: the shrinkage factor B_i and the MSE of the EBLUP, M_i(θ̂_i^EB). We propose a single area-specific estimator of A, say Â_i, that simultaneously satisfies the following multiple desirable properties under certain mild regularity conditions:

Property 1: Obtain a second-order unbiased estimator of B_i, that is, E(B̂_i) = B_i + o(m^{-1}), among the class of estimators of B_i with identical variance, up to the order O(m^{-1}), where B̂_i = D_i/(Â_i + D_i).

Property 2: 0 < inf_{m≥1} B̂_i ≤ sup_{m≥1} B̂_i < 1. That is, it protects EBLUP from overshrinking to the regression estimator, a common problem encountered in the EB method.

Property 3: Obtain a second-order unbiased Taylor series MSE estimator of EBLUP without any bias correction; that is, E[M_{i;approx}(Â_i)] = M_i(θ̂_i^EB) + o(m^{-1}).

Property 4: Produce a strictly positive second-order unbiased single parametric bootstrap MSE estimator without any bias correction.
Note that the variance component in the FH model (1) is not area specific, but to satisfy the above properties simultaneously for a given area, we propose an area-specific estimator of A. This introduces an area-specific bias, but interestingly the order of the bias is O(m^{-1}), the same as the bias of the ML estimator of A but higher than that of REML in the higher-order asymptotic sense. This seems to be a reasonable approach as our main targets are area-specific parameters and not the global parameter A. Obviously, if A were the main target, we would recommend a standard variance component method. We stress that, in general, none of the existing methods for estimating A satisfies all four properties simultaneously.
In Section 2, we propose a new adjusted maximum likelihood estimator of A that satisfies all four desirable properties listed above. The
balanced case has been heavily studied in the literature. We consider the
balanced case in Section 3 and show how our results are related to the ones
in the literature. In Section 4, using real data from the U.S. Census Bureau, we demonstrate the superior performance of our proposed estimators and MSE estimators over competing estimators. A Monte Carlo simulation study, described in Section 5, shows that the proposed estimators
outperform competing estimators. All the technical proofs are deferred to
the Appendix.
2 A New Adjusted Maximum Likelihood Estimator of A
The residual maximum likelihood (REML) estimator of A is defined as:

    Â_RE = argmax_{A ∈ [0,∞)} L_RE(A),

where L_RE(A) is the residual likelihood of A given by

    L_RE(A) = |X′V^{-1}X|^{-1/2} |V|^{-1/2} exp(−(1/2) y′Py),

with P = V^{-1} − V^{-1}X(X′V^{-1}X)^{-1}X′V^{-1}. Note that Â_RE does not satisfy any of the four desirable properties listed in the introduction.
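For intuition, L_RE(A) can be maximized numerically; a sketch for the intercept-only case (p = 1), where X′V^{-1}X reduces to Σ 1/(A + D_i), with hypothetical data and a crude grid search standing in for a proper one-dimensional optimizer:

```python
import math

def log_residual_likelihood(A, y, D):
    """Log residual likelihood of A for the intercept-only model (p = 1)."""
    w = [1.0 / (A + Di) for Di in D]
    beta_hat = sum(wi * yi for wi, yi in zip(w, y)) / sum(w)
    # For p = 1, y'Py is the weighted residual sum of squares
    quad = sum(wi * (yi - beta_hat) ** 2 for wi, yi in zip(w, y))
    return -0.5 * (math.log(sum(w)) + sum(math.log(A + Di) for Di in D) + quad)

y = [4.2, 6.1, 5.0, 7.3, 4.8, 9.0, 3.1, 6.6]   # hypothetical data
D = [1.0, 0.5, 2.0, 1.5, 0.8, 1.2, 0.9, 1.1]
grid = [0.01 * k for k in range(5001)]          # A in [0, 50]
A_reml = max(grid, key=lambda A: log_residual_likelihood(A, y, D))
```

Because the maximization is over [0, ∞), the grid includes A = 0; with small m the maximizer can land exactly on that boundary, which is precisely the overshrinkage problem discussed in the introduction.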
In an effort to find a likelihood-based estimator of A that satisfies all four desirable properties, we define the following adjusted maximum likelihood estimator of A:

    Â_i = argmax_{A ∈ [0,∞)} h_i(A) L_RE(A),

where h_i(A) is a factor to be suitably chosen so that all four desirable properties are satisfied.

We first find h_i(A) so that the resulting estimator of A yields a nearly unbiased estimator of B_i that also protects EBLUP from overshrinking. In other words, we first find the adjustment factor h_i(A) that simultaneously satisfies Properties 1 and 2. Interestingly, it turns out that such an adjusted maximum likelihood estimator also satisfies Properties 3 and 4.
Using Lemma 1 in Appendix A and a Taylor series expansion, we have

    Var(B̂_i) = 2D_i^2 / [(A + D_i)^4 tr[V^{-2}]] + o(m^{-1}),   (2)

for large m. We restrict ourselves to the class of estimators of A that satisfies (2).

Using Lemma 1 and a Taylor series expansion, we have

    E(B̂_i) = B_i + [ (∂B_i/∂A)(∂ log h_i(A)/∂A) + (1/2)(∂^2 B_i/∂A^2) ] · 2/tr[V^{-2}] + o(m^{-1}).   (3)

Thus, Property 1 is satisfied if we have

    (∂B_i/∂A)(∂ log h_i(A)/∂A) + (1/2)(∂^2 B_i/∂A^2) = 0.   (4)

The differential equation (4) simplifies to:

    ∂ log h_i(A)/∂A = 1/(A + D_i).   (5)

Thus, an adjustment factor that satisfies (5) is given by

    h_{i0}(A) = A + D_i.

This adjustment factor is indeed the unique solution to (4) up to the order O(m^{-1}). Let Â_{i0} be the adjusted maximum likelihood estimator of A for the choice h_i(A) = h_{i0}(A). We note that Â_{i0} is not strictly positive. To
achieve strict positivity, we propose our final estimator of A as:

    Â_{i;MG} = argmax_{A ∈ [0,∞)} h̃_i(A) L_RE(A),

where h̃_i(A) = h_+(A) h_{i0}(A), with the additional adjustment h_+(A) satisfying regularity conditions R4 and R6-R7.

Our proposed estimator of B_i and the EBLUP are given by

    B̂_{i;MG} = B_i(Â_{i;MG}),   θ̂_{i;MG}^EB = θ̂_i^BLUP(Â_{i;MG}),

respectively.
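Continuing the intercept-only sketch, the area-specific adjusted maximum likelihood estimate maximizes log h̃_i(A) + log L_RE(A). Here h_+(A) is taken as 1 for simplicity (the actual h_+ must satisfy conditions R4 and R6-R7, which this toy choice does not claim to verify), so the objective just adds log(A + D_i) to the residual log-likelihood:

```python
import math

def log_residual_likelihood(A, y, D):
    """Log residual likelihood of A, intercept-only model (p = 1)."""
    w = [1.0 / (A + Di) for Di in D]
    beta_hat = sum(wi * yi for wi, yi in zip(w, y)) / sum(w)
    quad = sum(wi * (yi - beta_hat) ** 2 for wi, yi in zip(w, y))
    return -0.5 * (math.log(sum(w)) + sum(math.log(A + Di) for Di in D) + quad)

def adjusted_ml(i, y, D, grid):
    """Maximize log(A + D_i) + log L_RE(A): the h_i0 adjustment with h_+ = 1."""
    return max(grid, key=lambda A: math.log(A + D[i])
               + log_residual_likelihood(A, y, D))

y = [4.2, 6.1, 5.0, 7.3, 4.8, 9.0, 3.1, 6.6]   # hypothetical data
D = [1.0, 0.5, 2.0, 1.5, 0.8, 1.2, 0.9, 1.1]
grid = [0.01 * k for k in range(5001)]
A_0 = adjusted_ml(0, y, D, grid)   # area-specific estimate for i = 0
B_0 = D[0] / (A_0 + D[0])          # proposed shrinkage estimate
```

The log(A + D_i) term penalizes small A, pulling the maximizer away from the boundary at 0; this is the mechanism behind Property 2's protection against overshrinkage.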
Unlike the common practice, we avoid bias correction in obtaining both the Taylor series and parametric bootstrap MSE estimators of our proposed EBLUP. Interestingly, our approach ensures the important dual property of an MSE estimator: second-order unbiasedness and strict positivity. Such an MSE estimator is the first of its kind in the small area estimation literature.
We obtain our Taylor series estimator of the MSE of EBLUP by simply plugging the proposed estimator Â_{i;MG} in for A in the second-order MSE approximation M_{i;approx}(A); it is given by:

    M̂_{i;MG} ≡ M_{i;approx}(Â_{i;MG}) = g_{1i}(Â_{i;MG}) + g_{2i}(Â_{i;MG}) + g_{3i}(Â_{i;MG}).   (6)

Our proposed parametric bootstrap MSE estimator retains the simplicity of the bootstrap originally intended in Efron (1979). It is given by

    M̂_{i;MG}^boot ≡ E_*[θ̂_i(Â*_{i;MG}, y*) − θ_i*]^2,   (7)

where θ_i* = x_i′β̂(Â_{1;MG}, ..., Â_{m;MG}) + v_i*, with v_i* ∼ N(0, Â_{i;MG}). Note that the new bootstrap MSE estimator does not require any bias correction.
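The single parametric bootstrap in (7) can be sketched as follows. This is a simplified illustration, not the paper's full procedure: it assumes an intercept-only model, uses a crude grid-based stand-in for Â_{i;MG}, approximates E_* by a Monte Carlo average over replicates, and (unlike (7), which uses all m area-specific estimates in β̂) generates every bootstrap draw from the single area-i estimate:

```python
import math
import random

random.seed(1)

def estimate_A(i, y, D, grid):
    """Grid-based adjusted ML estimate of A for area i (intercept-only,
    h_+ taken as 1); a crude stand-in for the paper's estimator."""
    def obj(A):
        w = [1.0 / (A + Dj) for Dj in D]
        beta = sum(wj * yj for wj, yj in zip(w, y)) / sum(w)
        quad = sum(wj * (yj - beta) ** 2 for wj, yj in zip(w, y))
        llre = -0.5 * (math.log(sum(w))
                       + sum(math.log(A + Dj) for Dj in D) + quad)
        return math.log(A + D[i]) + llre
    return max(grid, key=obj)

def eblup(i, A, y, D):
    w = [1.0 / (A + Dj) for Dj in D]
    beta = sum(wj * yj for wj, yj in zip(w, y)) / sum(w)
    Bi = D[i] / (A + D[i])
    return (1 - Bi) * y[i] + Bi * beta, beta

y = [4.2, 6.1, 5.0, 7.3, 4.8, 9.0, 3.1, 6.6]   # hypothetical data
D = [1.0, 0.5, 2.0, 1.5, 0.8, 1.2, 0.9, 1.1]
grid = [0.1 * k for k in range(501)]
i = 0
A_hat = estimate_A(i, y, D, grid)
_, beta_hat = eblup(i, A_hat, y, D)

# Single parametric bootstrap: regenerate (theta*, y*) from the fitted
# model, re-estimate A on each replicate, and average the squared
# prediction error -- no bias correction is applied.
R = 100   # bootstrap replicates (small, for illustration only)
sq_err = 0.0
for _ in range(R):
    theta_star = [beta_hat + random.gauss(0.0, A_hat ** 0.5) for _ in y]
    y_star = [t + random.gauss(0.0, Dj ** 0.5) for t, Dj in zip(theta_star, D)]
    A_star = estimate_A(i, y_star, D, grid)
    pred, _ = eblup(i, A_star, y_star, D)
    sq_err += (pred - theta_star[i]) ** 2
mse_boot = sq_err / R
```

Because mse_boot is an average of squared errors, it is strictly positive by construction; the theoretical content of the paper is that, with the proposed Â_{i;MG}, this uncorrected average is also second-order unbiased.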
The following theorem states that our proposed adjusted maximum likelihood estimator of A satisfies all four desirable properties.

Theorem 1. Under the regularity conditions R1-R7, we have, for large m,

(i) Bias(B̂_{i;MG}) = o(m^{-1}); Var(B̂_{i;MG}) = 2D_i^2 / [(A + D_i)^4 tr[V^{-2}]] + o(m^{-1});

(ii) 0 < inf_{m≥1} B̂_{i;MG} ≤ sup_{m≥1} B̂_{i;MG} < 1, for m > p + 2;

(iii) E(M̂_{i;MG}) − M_i(θ̂_{i;MG}^EB) = o(m^{-1});

(iv) E(M̂_{i;MG}^boot) − M_i(θ̂_{i;MG}^EB) = o(m^{-1}).

For the proof of Theorem 1, see Appendix B.
3 The balanced case: D_i = D, i = 1, ..., m
In this section, we show how the proposed adjusted maximum likelihood estimator of A is related to the problem of simultaneous estimation of several independent normal means, a topic of intense research activity, especially in the 1960s, 70s, and 80s, since the introduction of the celebrated James-Stein estimator (James and Stein 1961).
Let y_i | θ_i ∼ N(θ_i, 1) independently, i = 1, ..., m. James and Stein (1961) showed that for m ≥ 3 the maximum likelihood (also unbiased) estimator of θ_i is inadmissible under the sum of squared error loss function L(θ̂, θ) = Σ_{j=1}^m (θ̂_j − θ_j)^2 and is dominated by the James-Stein estimator θ̂_i^JS = (1 − B̂_JS) y_i, where B̂_JS = (m − 2)/Σ_{j=1}^m y_j^2. That is,

    E[Σ_{j=1}^m (θ̂_j^JS − θ_j)^2 | θ] ≤ E[Σ_{j=1}^m (y_j − θ_j)^2 | θ],   for all θ ∈ R^m,   (8)

where R^m is the m-dimensional Euclidean space, with strict inequality holding for at least one point θ. The dominance result, however, does not hold for individual components.
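The aggregate dominance in (8) is easy to see numerically; a small Monte Carlo sketch with arbitrary (hypothetical) true means:

```python
import random

random.seed(7)

m = 20
theta = [0.5 * j for j in range(m)]   # arbitrary true means (hypothetical)

def james_stein(y):
    # B_JS = (m - 2) / sum(y_j^2); shrinks the MLE y toward 0
    b = (len(y) - 2) / sum(yj * yj for yj in y)
    return [(1 - b) * yj for yj in y]

# Compare total squared-error losses of the MLE y and the JS estimator
reps = 2000
loss_mle = loss_js = 0.0
for _ in range(reps):
    y = [random.gauss(t, 1.0) for t in theta]
    js = james_stein(y)
    loss_mle += sum((yj - t) ** 2 for yj, t in zip(y, theta))
    loss_js += sum((e - t) ** 2 for e, t in zip(js, theta))
```

Averaged over replications, the James-Stein total loss comes out below the MLE's, in line with (8); as noted above, this says nothing about any individual component, for which the MLE can do better.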
Efron and Morris (1973) offered an empirical Bayesian justification of the James-Stein estimator under the prior θ_i ∼ N(0, A), iid, i = 1, ..., m. Their model is indeed a special case of the two-level normal hierarchical model with D_i = 1, x_i′β = 0, i = 1, ..., m, and thus the James-Stein estimator of θ_i can also be viewed as an EBLUP.
Morris (1983) discussed an empirical Bayesian estimation of θ_i for a Bayesian model that is equivalent to the balanced case of NHM, that is, when D_i = D, implying B_i = B, i = 1, ..., m. In this case, he noted that B̂_U = (m − p − 2)D/S is an exact unbiased estimator of B, using the fact that, under NHM, S = Σ_{j=1}^m (y_j − x_j′β̂_ols)^2 ∼ (D + A)χ^2_{m−p}, where β̂_ols is the ordinary least squares estimator of β. We can write B̂_U ≡ B(Â_Morris) = D/(D + Â_Morris), where Â_Morris = S/(m − p − 2) − D. One can alternatively estimate B by a simple plug-in estimator: B̂_plug ≡ B(Â_U) = D/(D + Â_U), where Â_U = S/(m − p) − D is an unbiased estimator of A. Note that for m > p + 4,

    E(B̂_U − B) = 0,   E(B̂_plug − B) = [2/(m − p − 2)] B = O(m^{-1}),

    V(B̂_U) = [(m − p − 2)/(m − p)]^2 V(B̂_plug) ≤ V(B̂_plug).
Thus, B̂_U is better than B̂_plug both in terms of bias and variance properties. We can write B̂_U = B̂_plug (m − p − 2)/(m − p). As pointed out by Morris (1983), the factor (m − p − 2)/(m − p) helps correct for the curvature dependence of B on A.
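A quick numerical check of the relation between B̂_U and B̂_plug in the balanced case (hypothetical data, intercept-only model with p = 1 and D = 1):

```python
# Balanced case with an intercept-only model (p = 1) and D = 1
y = [1.0, 5.2, 2.8, 7.5, 3.9, 6.1, 0.4, 8.0, 2.2, 6.9]
m, p, D = len(y), 1, 1.0

beta_ols = sum(y) / m                        # OLS estimate of the intercept
S = sum((yj - beta_ols) ** 2 for yj in y)    # residual sum of squares

B_U = (m - p - 2) * D / S     # Morris's exact unbiased estimator of B
A_U = S / (m - p) - D         # unbiased estimator of A
B_plug = D / (D + A_U)        # plug-in estimator, equal to (m - p) D / S
```

Both estimators are multiples of D/S, so B̂_U = B̂_plug (m − p − 2)/(m − p) holds exactly, matching the factor discussed above.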
Consider the following empirical Bayes estimator (same as EBLUP) of θ_i:

    θ̂_i^EB(Â_Morris) = (1 − B̂_U) y_i + B̂_U x_i′β̂_ols.   (9)

In this case, the exact MSE and an exact unbiased estimator of the MSE can be obtained. Componentwise, for m ≥ p + 3, we have

    E[(θ̂_i^EB(Â_Morris) − θ_i)^2] ≤ D.

Thus, θ̂_i^EB(Â_Morris) dominates y_i in terms of unconditional MSE for m ≥ p + 3. Such a componentwise dominance property, however, does not hold for conditional MSE (conditional on θ); see Morris (1983) for details.
Since B < 1, using Stein's argument, Morris (1983) suggested the following estimator of B: B̂_Morris = D/(D + Â_Morris^+), where Â_Morris^+ = S/(m − p − 2) − D if S > (m − p − 2)D, and Â_Morris^+ = 2D/(m − p − 2) otherwise. This improves the estimation of both B and θ_i. It is straightforward to show that in this special case Â_Morris^+ satisfies all four properties. Moreover, under the regularity conditions R6-R8 and m > p + 2, Â_MG, our proposed estimator of A, is unique (see Appendix C for a proof) and is equivalent to Â_Morris^+ in the higher-order asymptotic sense, that is, E(Â_MG − Â_Morris^+) = o(m^{-1}).
Let θ̂_i^EB = θ̂_i^EB(Â) denote an EBLUP of θ_i, where Â could be Â_MG, Â_Morris^+, or the REML estimator Â_RE = max(0, Â_U). We can write M_{i;approx}(A) = g_1(A) + g_2(A) + g_3(A) as the second-order approximation to M_i(θ̂_i^EB) = MSE(θ̂_i^EB) for any of the three choices of the estimator of A. The traditional second-order unbiased MSE estimator is obtained by correcting the bias of M_{i;approx}(Â_RE), up to the order O(m^{-1}). It is given by M̂_{i,RE} = g_1(Â_RE) + g_2(Â_RE) + 2g_3(Â_RE); see Prasad and Rao (1990), Datta and Lahiri (2000), and Das et al. (2004). In this paper, we suggest an alternative second-order unbiased MSE estimator without bias correction, that is, M̂_{i;MG} = g_1(Â_MG) + g_2(Â_MG) + g_3(Â_MG).
We can show that

    V(M̂_{i,RE}) = a_m + o(m^{-1}),

    V(M̂_{i;MG}) = b_m + o(m^{-1}),