New Important Developments in Small Area Estimation

Danny Pfeffermann
University of Southampton and Hebrew University of Jerusalem
Danny Pfeffermann is Professor of Statistics, Southampton Statistical Sciences Research Institute, University of
Southampton, Southampton, SO17 1BJ, United Kingdom, and Department of Statistics, Hebrew University of Jerusalem, Jerusalem,
91905, Israel.

Abstract

The problem of small area estimation (SAE) is how to produce reliable
estimates of characteristics of interest such as means, counts,
quantiles, etc., for areas or domains for which only small samples or no
samples are available, and how to assess their precision. The purpose of
this paper is to review and discuss some of the new important
developments in small area estimation methods. Rao [Small Area Estimation (2003)] wrote a very
comprehensive book, which covers all the main developments in this topic
until that time. A few review papers have been written after 2003, but
they are limited in scope. Hence, the focus of this review is on new
developments in the last 7–8 years, but to make the review more
self-contained, I also briefly mention some of the older developments.
The review covers both design-based and model-dependent methods, with
the latter methods further classified into frequentist and Bayesian
methods. The style of the paper is similar to the style of my previous
review on SAE published in 2002, explaining the new problems
investigated and describing the proposed solutions, but without dwelling
on theoretical details, which can be found in the original articles. I
hope that this paper will be useful both to researchers who wish to
learn more about the research carried out in SAE and to practitioners who
might be interested in applying the new methods.

1 Preface

The problem of small area estimation (SAE) is how to produce reliable
estimates of characteristics of interest such as means, counts,
quantiles, et cetera, for areas or domains for which only small samples
or no samples are available. Although the point estimators are usually
of first priority, a related problem is how to assess the estimation
(prediction) error.

The great importance of SAE stems from the fact that many new programs,
such as fund allocation to areas in need, new educational or health
programs and environmental planning, rely heavily on these estimates. SAE
techniques are also used in many countries to test and adjust the counts
obtained from censuses that use administrative records.

In 2002 I published a review paper with a similar title (Pfeffermann, 2002). In that year SAE was flourishing both in
research and applications, but my own feeling then was that the topic
had been more or less exhausted in terms of research and that it would
just turn into a routine application in sample survey practice. As the
past 9 years show, I was completely wrong; not only is the research in
this area accelerating, but it now involves some of the best known
statisticians, who otherwise are not involved in survey sampling theory
or applications. The diversity of new problems investigated is
overwhelming, and the solutions proposed are not only elegant and
innovative, but also very practical.

Rao (2003) published a comprehensive book on SAE that covers all the
main developments in this topic up to that time. The book was written
about ten years after the review paper of Ghosh and Rao (1994),
published in Statistical Science, which stimulated much of the
early research in SAE. Since 2003, a few other review papers have been
published; see, for example, Rao (2005, 2008), Jiang and Lahiri (2006a, 2006b),
Datta (2009) and Lehtonen and Veijanen (2009). Notwithstanding, SAE is
researched and applied so broadly that I decided the time is ripe for a new
comprehensive review that focuses on the main developments in the last
7–8 years that I am aware of, and which are hardly covered in the review
papers mentioned above. The style of the paper is similar to the style
of my previous review, explaining the problems investigated and
describing the proposed solutions, but without dwelling on theoretical
details, which can be found in the original articles. For further
clarity and to make the paper more self-contained, I start with a short
background and overview some of the “older” developments. I hope
that this paper will be useful to researchers who wish to learn about
the research carried out in SAE and to practitioners who might be
interested in applying the new methods.

2 Some Background

The term “SAE” is somewhat confusing, since it is the size of
the sample in the area that causes estimation problems, and not the size
of the area. Also, the “areas” are not necessarily geographical
districts and may define another grouping, such as socio-demographic
groups or types of industry, in which case they are often referred to as
domains. Closely related concepts in common use are “poverty
mapping” or “disease mapping,” which amount to SAE of poverty
measures or disease incidence and then presenting the results on a map,
with different colors defining different levels (categories) of the
estimators. What is common to most small area estimation problems is
that point estimators and error measures are required for every area
separately, and not just as an average over all the areas under
consideration.

SAE methods can be divided broadly into “design-based” and
“model-based” methods. The latter methods use either the
frequentist approach or the full Bayesian methodology, and in some cases
combine the two, known in the SAE literature as “empirical
Bayes.” Design-based methods often use a model for the construction of
the estimators (known as “model assisted”), but the bias,
variance and other properties of the estimators are evaluated under the
randomization (design-based) distribution. The randomization
distribution of an estimator is the distribution over all possible samples that could be
selected from the target population of interest under the sampling
design used to select the sample, with the population measurements
considered as fixed values (parameters). Model-based methods on the
other hand usually condition on the selected sample, and the inference
is with respect to the underlying model.

A common feature to design- and model-based SAE is the use of auxiliary
covariate information, as obtained from large surveys and/or
administrative records such as censuses and registers. Some estimators
only require knowledge of the covariates for the sampled units and the
true area means of these covariates. Other estimators require knowledge
of the covariates for every unit in the population. The use of auxiliary
information for SAE is vital because, with the small sample sizes often
encountered in practice, even the most elaborate model can be of little
help if it does not involve a set of covariates with good predictive power for
the small area quantities of interest.

3 Notation

Consider a population $U$ of size $N$, divided into $M$ exclusive and
exhaustive areas $U_1 \cup \cdots \cup U_M$ with $N_i$ units in area
$i$, $\sum_{i=1}^{M} N_i = N$. Suppose that samples are available for
$m \le M$ of the areas, and let $s = s_1 \cup \cdots \cup s_m$ define
the overall sample, where $s_i$ of size $n_i$ is the sample observed
for sampled area $i$, $\sum_{i=1}^{m} n_i = n$. Note that $n_i$ is random
unless a planned sample of fixed size is taken in that area. Let $y$
define the characteristic of interest, and denote by $y_{ij}$ the
response value for unit $j$ belonging to area $i$, $i = 1, \ldots, M$,
$j = 1, \ldots, N_i$, with sample means $\bar{y}_i = \sum_{j=1}^{n_i} y_{ij}/n_i$,
where we assume without loss of generality that the
sample consists of the first $n_i$ units. We denote by
$x_{ij} = (x_{1ij}, \ldots, x_{pij})'$ the covariate values
associated with unit $(i,j)$ and by $\bar{x}_i = \sum_{j=1}^{n_i} x_{ij}/n_i$ the column vector of sample means.
The corresponding vector of true area means is $\bar{X}_i = \sum_{j=1}^{N_i} x_{ij}/N_i$. The area target quantity is denoted
by $\theta_i$; for example, $\theta_i = \bar{Y}_i = \sum_{j=1}^{N_i} y_{ij}/N_i$, the response area mean. Estimating a
proportion is a special case where $y_{ij}$ is binary. In other
applications $\theta_i$ may represent a count or a quantile.

4 Design-Based Methods

4.1 Design-Based Estimators in Common Use

A recent comprehensive review of design-based methods in SAE is provided
by Lehtonen and Veijanen (2009). Here I only overview some of the basic
ideas. Suppose that the sample is selected by simple random sampling
without replacement (SRSWOR) and that the target quantities of interest
are the means $\bar{Y}_i$. Estimation of a mean contains as special
cases the estimation of a proportion and the estimation of the area
distribution $F_i(t) = \sum_{j \in U_i} v_{ij}/N_i$, in which case
$v_{ij} = I(y_{ij} \le t)$, where $I(A)$ is the indicator function.
Estimators of the percentiles of the area distribution are commonly
obtained from the estimated distribution.

If no covariates are available, the direct design-unbiased
estimator of the area mean and its conditional design variance over the
randomization distribution for given $n_i$ are given by

$$\bar{y}_i = \sum_{j=1}^{n_i} y_{ij}/n_i; \quad V_D[\bar{y}_i \mid n_i] = (S_i^2/n_i)[1 - (n_i/N_i)], \tag{1}$$

where $S_i^2 = \sum_{j=1}^{N_i} (y_{ij} - \bar{Y}_i)^2/(N_i - 1)$. The term "direct" is used to
signify an estimator that only uses the data available for the target
area at the specific time of interest. The variance
$V_D[\bar{y}_i \mid n_i]$ is $O(1/n_i)$, and for small $n_i$ it is
usually large, unless $S_i^2$ is sufficiently small.
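As a small numerical illustration (simulated data; the population size, sample size and distributional values below are all hypothetical), the direct estimator (1) and its design variance can be computed as follows:

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical area: N_i population units, SRSWOR sample of size n_i.
N_i, n_i = 500, 10
y_pop = rng.normal(loc=50.0, scale=8.0, size=N_i)  # population values, fixed under the design
sample = rng.choice(N_i, size=n_i, replace=False)
y_s = y_pop[sample]

# Direct estimator (1): the sample mean.
ybar_i = y_s.sum() / n_i

# Design variance (S_i^2 / n_i) * (1 - n_i / N_i). Here S_i^2 is computed from the
# full population for illustration; in practice it is estimated from the sample.
S2_i = ((y_pop - y_pop.mean()) ** 2).sum() / (N_i - 1)
V_D = (S2_i / n_i) * (1 - n_i / N_i)

print(ybar_i, V_D)
```

The finite-population correction $1 - n_i/N_i$ only helps when $n_i$ is a nonnegligible fraction of $N_i$; for the small $n_i$ typical in SAE the variance remains of order $S_i^2/n_i$.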

Next suppose that covariates $x_{ij}$ are also observed, with
$x_{1ij} \equiv 1$. An estimator in common use that utilizes the
covariate information is the synthetic estimator,

$$\hat{\bar{Y}}^{\text{syn}}_{\text{reg},i} = \bar{X}_i' \hat{B} = \frac{1}{N_i} \sum_{j=1}^{N_i} x_{ij}' \hat{B}, \tag{2}$$

where $\hat{B} = [\sum_{i=1}^{m} \sum_{j=1}^{n_i} x_{ij} x_{ij}']^{-1} \sum_{i=1}^{m} \sum_{j=1}^{n_i} x_{ij} y_{ij}$ is the
ordinary least squares (OLS) estimator. Notice that under SRSWOR,
$\hat{B}$ is approximately design-unbiased and consistent for the vector
$B$ of regression coefficients computed from all the population values,
irrespective of whether a linear relationship between $y$ and $x$ exists in the population. The design-unbiasedness and
consistency are with respect to the randomization distribution, letting
N and n increase to infinity in a proper way. An estimator is
approximately design-unbiased if the randomization bias tends to zero as
the sample size increases. The term “synthetic” refers to the
fact that an (approximately) design-unbiased estimator computed from all
the areas (^B in the present case) is used for every area
separately, assuming that the areas are “homogeneous” with
respect to the quantity being estimated. Thus, synthetic estimators
borrow information from other “similar areas” and they are
therefore indirect estimators.

The obvious advantage of the synthetic estimator over the simple sample
mean or other direct estimators such as the regression estimator
$\hat{\bar{Y}}^{\text{dir}}_{\text{reg},i} = \bar{y}_i + (\bar{X}_i - \bar{x}_i)' \hat{B}_i$, where $\hat{B}_i$ is computed only from the
data observed for area $i$, is that
$\mathrm{Var}_D(\hat{\bar{Y}}^{\text{syn}}_{\text{reg},i}) = O(1/n)$, and $n = \sum_{i=1}^{m} n_i$ is usually large. The use of the synthetic estimator is
motivated ("assisted") by a linear regression model of $y$ on
$x$ in the population with a common vector of coefficients.
However, for $x_{1ij} \equiv 1$,
$E_D(\hat{\bar{Y}}^{\text{syn}}_{\text{reg},i} - \bar{Y}_i) \cong -\bar{X}_i'(B_i - B)$, where $B_i$ is the OLS estimator computed from all the
population values in area $i$. Thus, if in fact different regression
coefficients $B_i$ operate in different areas, the synthetic estimator
may have a large bias. When the sample is selected with unequal
probabilities, the OLS estimator $\hat{B}$ in (2) is commonly replaced
by the probability-weighted (PW) estimator

$$\hat{B}_{pw} = \Biggl[\sum_{i=1}^{m} \sum_{j=1}^{n_i} w_{ij} x_{ij} x_{ij}'\Biggr]^{-1} \sum_{i=1}^{m} \sum_{j=1}^{n_i} w_{ij} x_{ij} y_{ij},$$

where $\{w_{ij} = 1/\Pr[(i,j) \in s]\}$ are the base sampling weights.
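A minimal sketch of the PW estimator and the synthetic estimator (2) follows; the data are simulated and the area covariate means are hypothetical values, not taken from the paper:

```python
import numpy as np

rng = np.random.default_rng(1)

m, n_i = 5, 8                            # areas and sample size per area (hypothetical)
n = m * n_i
X = np.column_stack([np.ones(n), rng.uniform(0, 10, size=n)])  # x_{1ij} = 1 (intercept)
beta_true = np.array([2.0, 1.5])
y = X @ beta_true + rng.normal(size=n)
w = rng.uniform(1.0, 3.0, size=n)        # base sampling weights w_ij = 1 / Pr[(i,j) in s]

# Probability-weighted estimator: B_pw = (sum w x x')^{-1} sum w x y.
B_pw = np.linalg.solve((w[:, None] * X).T @ X, (w[:, None] * X).T @ y)

# Synthetic estimator (2) for one area: Xbar_i' B_pw, with Xbar_i the known
# true area mean of the covariates (a hypothetical value here).
Xbar_i = np.array([1.0, 6.0])
Y_syn_i = Xbar_i @ B_pw
print(B_pw, Y_syn_i)
```

Note that the same pooled coefficient vector `B_pw` is used for every area, which is exactly what makes the estimator "synthetic" and exposes it to bias when the area-specific coefficients differ.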

In order to deal with the possible large bias of the synthetic
estimator, it is common to estimate the bias and then subtract it from
the synthetic estimator. The resulting survey regression
estimator takes the form

$$\hat{\bar{Y}}^{\text{S-R}}_i = \bar{X}_i' \hat{B}_{pw} + \frac{1}{N_i} \sum_{j=1}^{n_i} w_{ij}(y_{ij} - x_{ij}' \hat{B}_{pw}) = \hat{\bar{Y}}_{i,\text{H-T}} + (\bar{X}_i - \hat{\bar{X}}_{i,\text{H-T}})' \hat{B}_{pw},$$

where $(\hat{\bar{Y}}_{i,\text{H-T}}, \hat{\bar{X}}_{i,\text{H-T}})$ are the
Horvitz–Thompson (H–T) estimators of $(\bar{Y}_i, \bar{X}_i)$. This
estimator is approximately design-unbiased and performs well when
the covariates have good predictive power, but the variance is back to
order $O(1/n_i)$. The variance is often reduced by multiplying the
bias correction $\sum_{j=1}^{n_i} w_{ij}(y_{ij} - x_{ij}' \hat{B}_{pw})/N_i$ by $N_i/\sum_{j=1}^{n_i} w_{ij} = N_i/\hat{N}_i$.
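The survey regression estimator and its ratio-adjusted variant can be sketched as follows. The data are simulated, `B_pw` is computed from the pooled sample of all areas (as in the paper), and all names and values are hypothetical:

```python
import numpy as np

rng = np.random.default_rng(2)

m, n_i, N_i = 5, 12, 200                 # areas, area sample size, area population size
n = m * n_i
area = np.repeat(np.arange(m), n_i)
X = np.column_stack([np.ones(n), rng.uniform(0, 10, size=n)])
y = X @ np.array([1.0, 2.0]) + np.repeat(rng.normal(0, 1, m), n_i) + rng.normal(size=n)
w = rng.uniform(10, 25, size=n)          # base sampling weights

# Pooled probability-weighted coefficients.
B_pw = np.linalg.solve((w[:, None] * X).T @ X, (w[:, None] * X).T @ y)

i = 0
in_i = area == i
Xbar_i = np.array([1.0, 5.0])            # known true covariate means in area i (hypothetical)
resid_i = y[in_i] - X[in_i] @ B_pw

# Survey regression estimator: synthetic part plus design-weighted bias correction.
Y_sr_i = Xbar_i @ B_pw + (w[in_i] * resid_i).sum() / N_i

# Variance-reducing variant: rescale the bias correction by N_i / N_i_hat.
N_i_hat = w[in_i].sum()
Y_sr_adj_i = Xbar_i @ B_pw + (w[in_i] * resid_i).sum() / N_i_hat
print(Y_sr_i, Y_sr_adj_i)
```

The two forms of the estimator in the display are algebraically identical, which makes a convenient correctness check when implementing it.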

A compromise between the possibly large bias of the synthetic estimator
and the possibly large variance of the survey regression estimator is
achieved by taking a linear combination of the two. The resulting
combined (composite) estimator is defined as

$$\hat{\bar{Y}}^{\text{COM}}_i = \delta_i \hat{\bar{Y}}^{\text{S-R}}_i + (1 - \delta_i) \hat{\bar{Y}}^{\text{syn}}_{\text{reg},i}; \quad 0 \le \delta_i \le 1. \tag{4}$$

Ideally, the coefficient $\delta_i$ should be chosen to minimize the
mean square error (MSE) of $\hat{\bar{Y}}^{\text{COM}}_i$, but assessing
the bias of the synthetic estimator for a given area with sufficient accuracy is
usually impossible. Hence, it is common to let $\delta_i$ depend on
the sample size $n_i$ in the area, such that the larger $n_i$, the
larger is $\delta_i$. See Rao (2003) for a review of other combined estimators and methods of
specifying $\delta_i$.
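Given the two component estimates, a composite estimate per (4) is a one-liner. The sample-size-based weight below is one simple ad hoc choice for illustration, not a rule prescribed in the paper, and the tuning constant `n0` is hypothetical:

```python
def composite(y_sr_i, y_syn_i, n_i, n0=20.0):
    """Composite estimator (4) with delta_i = n_i / (n_i + n0): the weight on the
    (approximately unbiased) survey regression part grows with the area sample size,
    so small areas lean on the low-variance synthetic part instead."""
    delta_i = n_i / (n_i + n0)
    return delta_i * y_sr_i + (1.0 - delta_i) * y_syn_i

print(composite(12.0, 10.0, 5))    # small sample: close to the synthetic estimate
print(composite(12.0, 10.0, 500))  # large sample: close to the survey regression estimate
```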

4.2 Some New Developments in Design-Based Small Area Estimation

A general class of estimators is obtained by calibrating the base
sampling weights $w_{ij}$. Suppose that the population can be
partitioned into $C$ calibration groups $U = U_{(1)} \cup \cdots \cup U_{(C)}$ with known totals $t_{x(c)}$ of the auxiliary
variables in the groups, such that each area $U_i$ belongs to one of
the groups. Let $s = s_{(1)} \cup \cdots \cup s_{(C)}$ define the respective
partitioning of the sample. In the special case $C = 1$, $U_{(1)} = U$.
The calibrated estimator of the mean $\bar{Y}_i$ is computed
as

$$\hat{\bar{Y}}^{\text{cal}}_i = \sum_{j=1}^{n_i} w^c_{ij} y_{ij}/N_i; \quad \sum_{i,j \in s_{(c)}} w^c_{ij} x_{ij} = t_{x(c)}. \tag{5}$$

The calibration weights $\{w^c_{ij}\}$ are chosen so that they
minimize an appropriate distance from the base weights $\{w_{ij}\}$,
subject to satisfying the constraints $\sum_{i,j \in s_{(c)}} w^c_{ij} x_{ij} = t_{x(c)}$. For example, when using the
distance $\chi^2 = \sum_{i,j \in s_{(c)}} (w^c_{ij} - w_{ij})^2/w_{ij}$ and $x_{1ij} \equiv 1$, the calibrated weights are

$$w^c_{ij} = w_{ij} g_{ij}; \quad g_{ij} = \Biggl\{1 + (t_{x(c)} - \hat{t}_{x(c),\text{H-T}})' \Biggl[\sum_{i,j \in s_{(c)}} w_{ij} x_{ij} x_{ij}'\Biggr]^{-1} x_{ij}\Biggr\},$$

where $\hat{t}_{x(c),\text{H-T}}$ is the H–T estimator of the total
$t_{x(c)}$. When $U_{(c)} = U_i$ (the calibration group is the
domain), $\hat{\bar{Y}}^{\text{cal}}_i$ is the familiar generalized
regression (GREG) estimator in the domain.
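The chi-square-distance calibration weights can be sketched as follows for one calibration group, using simulated data and hypothetical known totals. By construction, the calibrated weights reproduce the group totals exactly:

```python
import numpy as np

rng = np.random.default_rng(3)

# One calibration group: n sampled units out of N, base weights w,
# covariates x with x_1 = 1, and known group totals t_x.
n, N = 30, 600
x = np.column_stack([np.ones(n), rng.uniform(0, 10, size=n)])
w = np.full(n, N / n)
t_x = np.array([600.0, 3100.0])          # hypothetical known totals of (1, x_2)

# H-T estimator of the totals and the chi-square calibrated weights:
# w_c = w * g,  g = 1 + (t_x - t_x_HT)' [sum w x x']^{-1} x.
t_x_ht = (w[:, None] * x).sum(axis=0)
M_inv = np.linalg.inv((w[:, None] * x).T @ x)
g = 1.0 + x @ (M_inv @ (t_x - t_x_ht))
w_c = w * g

print((w_c[:, None] * x).sum(axis=0))    # matches t_x up to floating-point error
```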

Calibration of the sampling weights is in broad use in sample survey
practice, not only for SAE. See Kott (2009) for a recent comprehensive
review and discussion. The rationale for the use of calibrated estimators
in SAE is that if $y$ is approximately a linear combination of
$x$ in $U_{(c)}$, then $\bar{Y}_i \cong \bar{X}_i' B_{(c)}$ for domains $i \in U_{(c)}$, and since
$\sum_{i,j \in s_{(c)}} w^c_{ij} x_{ij} = t_{x(c)}$,
$\hat{\bar{Y}}^{\text{cal}}_i = \sum_{j=1}^{n_i} w^c_{ij} y_{ij}/N_i$
is expected to be a good estimator of $\bar{Y}_i$. Indeed, the
advantage of estimator (5) over (2) is that it is assisted by a
model that only assumes common regression coefficients within the groups
$U_{(c)}$, and not for all the domains, as implicitly assumed by
estimator (2). Estimator (5) is approximately design-unbiased
irrespective of any model, but $\mathrm{Var}_D(\hat{\bar{Y}}^{\text{cal}}_i \mid n_i) = O(1/n_i)$, which may still be large.

Another way of calibrating the weights is by use of instrumental
variables (Estevao and Särndal, 2004, 2006). Denote the vector of
instrument values for unit $(i,j)$ by $h_{ij}$. The calibrated weights
are defined as

$$w^{\text{ins}}_{ij} = w_{ij}(1 + g_c' h_{ij}); \quad g_c' = (t_{x(c)} - \hat{t}_{x(c),\text{H-T}})' \Biggl[\sum_{i,j \in s_{(c)}} w_{ij} h_{ij} x_{ij}'\Biggr]^{-1}. \tag{7}$$

Note that the instrument values need only be known for the sampled units
in $s_{(c)}$ and that $\sum_{i,j \in s_{(c)}} w^{\text{ins}}_{ij} x_{ij} = t_{x(c)}$, thus satisfying the same constraints
as before. The calibrated estimator of $\bar{Y}_i$ is now
$\hat{\bar{Y}}^{\text{cal}}_{i,\text{ins}} = \sum_{j=1}^{n_i} w^{\text{ins}}_{ij} y_{ij}/N_i$. When $h = x$, $w^{\text{ins}}_{ij} = w^c_{ij}$. The use
of instruments replaces the search for an appropriate distance function
by imposing a structure on the calibration weights, and it allows one,
in principle, to find the best instruments in terms of minimizing an
approximation to the variance of the calibrated estimator. However, as
noted by Estevao and Särndal (2006), the resulting optimal weights depend on
unknown population quantities which, when estimated from the sample, may
yield unstable estimators. See Kott (2009) for further discussion.

The synthetic estimator (2), the survey regression estimator of Section 4.1 and
the various calibrated estimators considered above are all assisted by
models that assume a linear relationship between $y$ and
$x$. These estimators only require knowledge of the covariates
for the sampled units, and the area (or group) totals of these
covariates. Lehtonen, Särndal and Veijanen (2003, 2005) consider the use of generalized
linear models (GLM), or even generalized linear mixed models (GLMM), as
the assisting models, which require knowledge of the covariates for
every element in the population. Suppose that $E_M(y_{ij}) = f(x_{ij};\psi)$ for some nonlinear function $f(\cdot)$ with an
unknown vector parameter $\psi$, where $E_M(\cdot)$ defines the
expectation under the model. A simple important example is where
$f(x_{ij};\psi)$ is the logistic function. Estimating $\psi$ by
the pseudo-likelihood (PL) approach yields the estimator
$\hat{\psi}_{pl}$ and the predicted values $\{\hat{y}_{ij} = f(x_{ij};\hat{\psi}_{pl})\}$. The PL approach consists of
estimating the likelihood equations that would be obtained in the case of a
census by the corresponding H–T estimators (or weighting each score
function by its sampling weight), and then solving the resulting
estimated equations. The synthetic and "generalized GREG" estimators
are computed as

$$\hat{\bar{Y}}^{\text{syn}}_{\text{GLM},i} = \frac{1}{N_i} \sum_{j=1}^{N_i} f(x_{ij};\hat{\psi}_{pl}); \quad \hat{\bar{Y}}^{\text{GREG}}_{\text{GLM},i} = \hat{\bar{Y}}^{\text{syn}}_{\text{GLM},i} + \frac{1}{N_i} \sum_{j=1}^{n_i} w_{ij}[y_{ij} - f(x_{ij};\hat{\psi}_{pl})].$$
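A sketch of the pseudo-likelihood fit and the resulting GLM synthetic and generalized GREG estimators follows, under a logistic assisting model with simulated data; the population layout and all names are hypothetical:

```python
import numpy as np

rng = np.random.default_rng(4)

def fit_logistic_pl(X, y, w, iters=25):
    """Pseudo-likelihood fit of a logistic model: solve the design-weighted score
    equations sum_j w_j x_j (y_j - p_j(beta)) = 0 by Newton-Raphson."""
    beta = np.zeros(X.shape[1])
    for _ in range(iters):
        p = 1.0 / (1.0 + np.exp(-(X @ beta)))
        score = X.T @ (w * (y - p))
        H = (X * (w * p * (1.0 - p))[:, None]).T @ X
        beta = beta + np.linalg.solve(H, score)
    return beta

# Simulated population split into m areas; psi_pl is fitted from the pooled sample,
# predictions f(x_ij; psi_pl) are made for every population unit.
m, Nd, nd = 4, 500, 40
N = m * Nd
area = np.repeat(np.arange(m), Nd)
X_pop = np.column_stack([np.ones(N), rng.normal(size=N)])
u_true = rng.normal(0.0, 0.5, size=m)
p_pop = 1.0 / (1.0 + np.exp(-(X_pop @ np.array([-0.5, 1.0]) + u_true[area])))
y_pop = rng.binomial(1, p_pop)
s = np.concatenate([rng.choice(np.where(area == i)[0], nd, replace=False)
                    for i in range(m)])
w = np.full(len(s), Nd / nd)             # SRSWOR-within-area weights

psi_pl = fit_logistic_pl(X_pop[s], y_pop[s], w)
f_pop = 1.0 / (1.0 + np.exp(-(X_pop @ psi_pl)))

# Synthetic and generalized GREG estimators of the proportion in area i = 0.
i = 0
s_i = s[area[s] == i]
Y_syn_i = f_pop[area == i].mean()
Y_greg_i = Y_syn_i + (w[area[s] == i] * (y_pop[s_i] - f_pop[s_i])).sum() / Nd
print(Y_syn_i, Y_greg_i)
```

Because the assisting model ignores the area effects $u_i$, the per-area bias correction in the GREG part is what protects the estimator in areas whose level differs from the pooled fit.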

A further extension is to include random area effects in the assisting
model, assuming $E_M(y_{ij} \mid x_{ij}, u_i) = f(x_{ij}, u_i; \psi^*)$, $E_M(u_i) = 0$, $\mathrm{Var}_M(u_i) = \sigma^2_u$. Estimation of the fixed parameters $\psi^*$, $\sigma^2_u$ and the random effects $u_i$ is now under the model,
ignoring the sampling weights. The extended synthetic and generalized
GREG estimators are defined as above, but with
$f(x_{ij};\hat{\psi}_{pl})$ replaced by
$f(x_{ij}, \hat{u}_i; \hat{\psi}^*)$. For sufficiently large
sample size $n_i$, the extended generalized GREG is approximately
design-unbiased for the true area mean, but it is not clear how to
estimate the design (randomization) variance in this case in a way that
accounts for the prediction of the random effects. Torabi and Rao (2008)
compare the MSE of model-based predictors and a GREG assisted by a
linear mixed model (LMM).

Jiang and Lahiri (2006a) propose the use of model-dependent estimators
that are design-consistent under the randomization distribution as the
area sample sizes increase. The basic idea is to model the direct
estimators $\hat{\bar{Y}}_{iw} = \sum_{j=1}^{n_i} w_{ij} y_{ij}/\sum_{j=1}^{n_i} w_{ij}$ instead of the individual observations
$y_{ij}$, and then employ the empirical best predictor of the area mean
under the model. The authors consider the general two-level model
$E_M[\hat{\bar{Y}}_{iw} \mid u_i] = \xi_i = \xi(u_i, \hat{\bar{X}}_{iw}; \psi)$, where the $u_i$s are independent
random area effects with zero mean and variance $\sigma^2_u$,
$\hat{\bar{X}}_{iw} = \sum_{j=1}^{n_i} w_{ij} x_{ij}/\sum_{j=1}^{n_i} w_{ij}$, and $\xi(\cdot)$ is some known function with
unknown parameters $\psi$. The empirical best predictor is the best
predictor under the model (minimum expected quadratic loss), but with
the parameters $\psi$ replaced by model-consistent estimators:
$\hat{\bar{Y}}^{\text{EBP}}_i = E_M(\xi_i \mid \hat{\bar{Y}}_{iw}, \hat{\bar{X}}_{iw}; \hat{\psi})$. The estimator is shown to be model-consistent under correct model
specification and design-consistent for large $n_i$, even if the model
is misspecified, thus robustifying the estimation. The authors develop
estimators of the prediction mean squared error (PMSE) for bounded
sample sizes ni, with bias of desired order o(1/m), where m is the
number of sampled areas. The PMSE is computed with respect to the model
holding for the individual observations and over the randomization
distribution. The use of design consistent estimators in SAE is somewhat
questionable because of the small sample sizes in some or all of the
areas, but it is nonetheless a desirable property. This is so because it
is often the case that in some of the areas the samples are large, and
it is essential that an estimator should work well at least in these
areas, even if the model fails. Estimators with large randomization bias
even for large samples do not appeal to practitioners.

Chandra and Chambers (2009) propose the use of model-based direct
estimators (MBDE). The idea is to fit a model for the population values,
compute the weights defining the Empirical Best Linear Unbiased
Predictor (EBLUP) of the population total under the model and then use
the weights associated with a given area to compute an almost direct
estimator. The model fitted for the population values $Y_U$ is the
general linear model,

$$Y_U = X_U \beta + \varepsilon_U; \quad E(\varepsilon_U) = 0, \quad E(\varepsilon_U \varepsilon_U') = \Sigma = \begin{bmatrix} \Sigma_{ss} & \Sigma_{sr} \\ \Sigma_{rs} & \Sigma_{rr} \end{bmatrix}, \tag{9}$$

where $s$ signifies the sample of size $n$, and $r$ signifies the
sample-complement of size $(N - n)$. As seen later, the models in common
use for SAE defined by (12) and (14) below are special cases of (9).
Let $y_s$ denote the column vector of sample outcomes. For known
$\Sigma$, the BLUP of the population total $t_y = \sum_{k=1}^{N} y_k$ under the model is

$$\hat{t}^{\text{BLUP}}_y = 1_n' y_s + 1_{N-n}'[X_r \hat{\beta}_{\text{GLS}} + \Sigma_{rs} \Sigma_{ss}^{-1}(y_s - X_s \hat{\beta}_{\text{GLS}})] = \sum_{k \in s} w^{\text{BLUP}}_k y_k, \tag{10}$$

where $1_k'$ is a row vector of ones of length $k$, $X_s$ ($X_r$) is the design matrix corresponding to the sampled (nonsampled) units and
$\hat{\beta}_{\text{GLS}}$ is the generalized least squares estimator.
The EBLUP is $\hat{t}^{\text{EBLUP}}_y = \sum_{k \in s} w^{\text{EBLUP}}_k y_k$,
where the EBLUP weights are the same as in (10), but with estimated
parameters. The MBDE of the true mean in area $i$ is

$$\hat{\bar{Y}}^{\text{MBD}}_i = \sum_{j \in s_i} w^{\text{EBLUP}}_j y_j \Big/ \sum_{j \in s_i} w^{\text{EBLUP}}_j. \tag{11}$$

The authors derive estimators for the bias and variance of the MBDE and
illustrate its robustness to certain model misspecifications. Note,
however, that $\hat{\bar{Y}}^{\text{MBD}}_i$ is a ratio estimator and
therefore may have a nonnegligible bias in areas $i$ with small sample
size.
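The MBDE computation can be sketched as follows under a nested error special case of (9), with the covariance matrix treated as known (so the weights are BLUP rather than EBLUP weights) and all data simulated:

```python
import numpy as np

rng = np.random.default_rng(5)

# Small simulated population with m areas. Under the nested error model,
# Sigma has sigma_u^2 for same-area pairs plus sigma_e^2 on the diagonal.
m, Nd = 4, 30
N = m * Nd
area = np.repeat(np.arange(m), Nd)
X = np.column_stack([np.ones(N), rng.uniform(0, 10, size=N)])
sigma2_u, sigma2_e = 2.0, 4.0
Sigma = sigma2_u * (area[:, None] == area[None, :]) + sigma2_e * np.eye(N)

# Sample: the first 6 units of each area (w.l.o.g., as in the paper's notation).
s = np.concatenate([np.where(area == i)[0][:6] for i in range(m)])
r = np.setdiff1d(np.arange(N), s)
Xs, Xr = X[s], X[r]
Sss, Srs = Sigma[np.ix_(s, s)], Sigma[np.ix_(r, s)]

# BLUP weights for the total t_y: writing beta_GLS = A y_s with
# A = (Xs' Sss^-1 Xs)^-1 Xs' Sss^-1, the predictor (10) is w' y_s with
# w = 1_n + [Xr A + Srs Sss^-1 (I - Xs A)]' 1_{N-n}.
Sss_inv = np.linalg.inv(Sss)
A = np.linalg.solve(Xs.T @ Sss_inv @ Xs, Xs.T @ Sss_inv)
n = len(s)
w_blup = np.ones(n) + (Xr @ A + Srs @ Sss_inv @ (np.eye(n) - Xs @ A)).sum(axis=0)

# MBDE (11) for area i: ratio of weighted sums over the area's sampled units.
y_s = Xs @ np.array([1.0, 0.5]) + rng.normal(size=n) * 2.0  # illustrative outcomes
i = 0
in_i = area[s] == i
Y_mbd_i = (w_blup[in_i] * y_s[in_i]).sum() / w_blup[in_i].sum()
print(Y_mbd_i)
```

A useful check on the weight algebra is that the BLUP weights calibrate exactly to the population totals of the covariates.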

All the estimators considered so far assume a given sampling design with
random area sample sizes. When the target areas are known in advance,
considerable gains in efficiency can be achieved by modifying the
sampling design and, in particular, by controlling the sample sizes
within these areas. In a recent article, Falorsi and Righi (2008)
propose a general strategy for multivariate multi-domain estimation that
guarantees that the sampling errors of the domain estimators are lower
than pre-specified thresholds. The strategy combines the use of a
balanced sampling technique with GREG estimation, but extensions to the
use of synthetic estimators and model-based estimation are also
considered. A successful application of this strategy requires good
predictions of the weighted sums of residuals featuring in the variance
expressions, and it may happen that the resulting overall sample size is
far too large, but this is a promising approach that should be studied
further.

4.3 Pros and Cons of Design-Based Small Area Estimation

The apparent advantage of design-based methods is that the estimation is
less dependent on an assumed model, although models are used (assisted)
for the construction of the estimators. The estimators are approximately
unbiased and consistent under the randomization distribution for large
sample sizes within the areas, which as discussed before is a desirable
property that protects against possible model misspecification at least
in large areas.

Against this advantage stand many disadvantages. Direct estimators
generally have large variance due to small sample sizes. The survey
regression estimator is approximately unbiased but may likewise be too
variable. Synthetic estimators have small variance but are generally
biased. Composite estimators have smaller bias than synthetic estimators
but larger variance, and it is not obvious how to best choose the
weights attached to the synthetic estimator and the unbiased estimator. Computation of randomization-based confidence
intervals generally requires large sample normality assumptions, but the
sample sizes in at least some of the areas may be too small to justify
asymptotic normality.

Another limitation of design-based inference (not restricted to SAE) is
that it does not lend itself to conditional inference, for example,
conditioning on the sampled values of the covariates or the sampled
clusters in a two-stage sampling design. This again inflates the
variance of the estimators. Conditional inference is at the heart of
classical statistical inference under both the frequentist and the
Bayesian approaches. Last, but not least, an important limitation of
design-based SAE is that there is no founded theory for estimation in
areas with no samples. The use of the randomization distribution does
not extend to prediction problems, such as the prediction of small area
means for areas with no samples. It is often the case that samples are
available for only a minority of the areas, but estimators and MSE
estimators are required for each of the areas, whether sampled or not.

5 Model-Based Methods

5.1 General Formulation

Model-based methods assume a model for the sample data and use the
optimal or approximately optimal predictor of the area characteristic of
interest under the model. The MSE of the prediction error is likewise
defined and estimated with respect to the model. Note that I now use the
term “prediction” rather than estimation because the target
characteristics are generally random under the model. The use of models
overcomes the problems underlying the use of design-based methods, but
it is important to emphasize again that even the most elaborate model
cannot produce sufficiently accurate predictors when the area sample
size is too small and no covariates with good predictive power are
available. The use of models raises the question of the robustness of
the inference to possible model misspecification, and Sections 6.3–6.6
review studies that deal with this problem from different perspectives.
Section 8 considers model selection and diagnostic checking.

Denote by $\theta_i$ the target quantity in area $i$ (mean,
proportion, …). Let $y_i$ define the observed responses for area
$i$ and $x_i$ define the corresponding values of the
covariates (when available). As becomes evident below, $y_i$ is either
a scalar, in which case $x_i$ is a vector, or $y_i$ is a
vector, in which case $x_i$ is usually a matrix. A typical
small area model consists of two parts: the first part models the
distribution (or just the moments) of $y_i \mid \theta_i; \psi^{(1)}$. The
second part models the distribution (moments) of
$\theta_i \mid x_i; \psi^{(2)}$, linking the $\theta_i$s to
known covariates and to each other. This is achieved by including in the
model random effects that account for the variability of the
$\theta_i$s not explained by the covariates. The hyper-parameters
$\psi = (\psi^{(1)}, \psi^{(2)})$ are typically unknown and are estimated
either under the frequentist approach, or under the Bayesian approach
after setting appropriate prior distributions. In some applications the index $i$
may define time, in which case the model for
$\theta_i \mid x_i; \psi^{(2)}$ is a time series model.

5.2 Models in Common Use

In this section, I review briefly three models in common use, as most of
the recent developments in SAE relate to these models or extensions of
them. For more details see Rao (2003), Jiang and Lahiri (2006a, 2006b), Datta (2009)
and the references therein. I assume that the model holding for
the sample data is the same as the model holding in the population, so
that there is no sample selection bias. The case of informative selection of the areas to be sampled, or
informative sampling within the selected areas, whereby the sample selection or
response probabilities are related to the response variable even after
conditioning on the model covariates, is considered in Section 7. Notice
that in this case the sample model differs from the population model.

5.2.1 Area level model

This model is in broad use when the covariate information is only at the
area level, so that $x_i$ is a vector of known area
characteristics. The model, studied originally for SAE by Fay and Herriot (1979), is defined as

$$\tilde{y}_i = \theta_i + e_i; \quad \theta_i = x_i' \beta + u_i, \tag{12}$$

where $\tilde{y}_i$ denotes the direct sample estimator of $\theta_i$
(e.g., the sample mean $\bar{y}_i$ when the sample is selected by
SRS), and $e_i$ represents the sampling error, assumed to have zero
mean and known design (randomization) variance, $\mathrm{Var}_D(e_i) = \sigma^2_{Di}$. The random effects $u_i$ are assumed to be
independent with zero mean and variance $\sigma^2_u$. For known
$\sigma^2_u$, the best linear unbiased predictor (BLUP) of
$\theta_i$ under this model is

$$\hat{\theta}_i = \gamma_i \tilde{y}_i + (1 - \gamma_i) x_i' \hat{\beta}_{\text{GLS}} = x_i' \hat{\beta}_{\text{GLS}} + \gamma_i(\tilde{y}_i - x_i' \hat{\beta}_{\text{GLS}}) = x_i' \hat{\beta}_{\text{GLS}} + \hat{u}_i.$$

The BLUP $\hat{\theta}_i$ is in the form of a composite estimate
[equation (4)], but with a tuning (shrinkage) coefficient $\gamma_i = \sigma^2_u/(\sigma^2_u + \sigma^2_{Di})$, which is a function
of the ratio $\sigma^2_u/\sigma^2_{Di}$ of the variances of the
prediction errors of $x_i' \beta$ and $\tilde{y}_i$,
respectively. The coefficient $\gamma_i$ defines optimally the weights
assigned to the synthetic estimator $x_i' \hat{\beta}_{\text{GLS}}$
and $\tilde{y}_i$, unlike the case of design-based estimators where
the weight is assigned in a more ad hoc manner. See the discussion below
(4). Note that the BLUP property does not require specifying the
distribution of the error terms beyond the first two moments, and
$\hat{\theta}_i$ is also the linear Bayes predictor in this case.
Under normality of the error terms and a diffuse uniform prior for
$\beta$, $\hat{\theta}_i$ is the Bayesian predictor (posterior mean) of
$\theta_i$. For a nonsampled area $k$, the BLUP is obtained
as $x_k' \hat{\beta}_{\text{GLS}}$.

In practice, the variance $\sigma^2_u$ is seldom known and is
replaced in $\gamma_i$ and $\hat{\beta}_{\text{GLS}}$ by a sample estimate,
yielding what is known as the empirical BLUP (EBLUP) under the
frequentist approach, or the empirical Bayes (EB) predictor when
assuming normality. The latter predictor is the posterior mean of
$\theta_i$, but with $\sigma^2_u$ replaced by a sample estimate
obtained from the marginal distribution of the direct estimators given
the variance. Alternatively, one may compute the Hierarchical Bayes (HB)
predictor by assuming prior distributions for $\beta$ and $\sigma^2_u$ and
computing the posterior distribution of $\theta_i$ given the available
data. The posterior distribution can then be used for computation of the point predictor and a
credibility (confidence) interval.
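A sketch of the Fay–Herriot EBLUP on simulated data follows; here the variance $\sigma^2_u$ is estimated by a simple Prasad–Rao-type moment estimator, which is one of several options in the literature, and all data values are hypothetical:

```python
import numpy as np

rng = np.random.default_rng(6)

# Simulated Fay-Herriot data for m areas.
m = 40
X = np.column_stack([np.ones(m), rng.uniform(size=m)])
sigma2_D = rng.uniform(0.2, 1.0, size=m)        # known design variances Var_D(e_i)
theta = X @ np.array([1.0, 2.0]) + rng.normal(scale=np.sqrt(0.5), size=m)
y_tilde = theta + rng.normal(scale=np.sqrt(sigma2_D))

def fh_eblup(y, X, s2D):
    """EBLUP for the Fay-Herriot model (12): shrink the direct estimator toward the
    synthetic part x'b_GLS with gamma_i = s2u / (s2u + s2D_i). The variance s2u is
    estimated by a Prasad-Rao-type moment estimator based on OLS residuals."""
    m, p = X.shape
    H = X @ np.linalg.solve(X.T @ X, X.T)       # OLS hat matrix
    e = y - H @ y                               # OLS residuals
    s2u = max(0.0, (e @ e - np.sum(s2D * (1.0 - np.diag(H)))) / (m - p))
    gamma = s2u / (s2u + s2D)
    W = 1.0 / (s2u + s2D)
    b_gls = np.linalg.solve((X * W[:, None]).T @ X, (X * W[:, None]).T @ y)
    return gamma * y + (1.0 - gamma) * (X @ b_gls), s2u, b_gls

theta_eblup, s2u_hat, b_gls = fh_eblup(y_tilde, X, sigma2_D)
print(s2u_hat)
```

Since $0 \le \gamma_i < 1$, each EBLUP lies between the direct estimate and the synthetic estimate, shrinking more strongly in areas with larger sampling variance.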

Remark 5.2.1. The synthetic estimator
$x_i' \hat{\beta}_{\text{GLS}}$, and hence the BLUP $\hat{\theta}_i$, are unbiased predictors under the joint distribution of $y_i$
and $\theta_i$ in the sense that $E(\hat{\theta}_i - \theta_i) = 0$, but are biased when conditioning on $u_i$. The predictor
$\hat{\theta}_i$ is biased also under the randomization distribution.
Conditioning on $u_i$ amounts to assuming different fixed intercepts
in different areas, and the unbiasedness of $\hat{\theta}_i$ under the
model is achieved by viewing the intercepts as random.

Remark 5.2.2. It is often the case that the linking
model is defined for a transformation of $\theta_i$. For example, Fay and Herriot (1979)
actually assume $\log(\theta_i) = x_i' \beta + u_i$ in (12) and use the direct estimator
$\tilde{y}_i = \log(\bar{y}_i)$, and then predict $\theta_i$ as
$\exp(\tilde{\theta}_i)$, where $\tilde{\theta}_i$ is the BLUP
(EBLUP) of $\log(\theta_i)$ under the model. However,
$\exp(\tilde{\theta}_i)$ is not the BLUP of $\theta_i = \exp[\log(\theta_i)]$. On the other hand, the EB and HB approaches produce
optimal predictors of $\theta_i$, even if the linking model uses a
transformation of $\theta_i$, with or without the use of a similar
transformation for the direct estimator. In this respect, the latter two
approaches are more flexible and have wider applicability, but at the
expense of requiring further parametric assumptions.

5.2.2 Nested error unit level model

This model uses individual observations $y_{ij}$, so that $y_i$ is
now a vector and $x_i$ is generally a matrix. The use of
this model for SAE requires that the area means
$\bar{X}_i = \sum_{j=1}^{N_i} x_{ij}/N_i$ are known. The model, first proposed for SAE by
Battese, Harter and Fuller (1988), has the form

$$y_{ij} = x_{ij}' \beta + u_i + \varepsilon_{ij}, \tag{14}$$

where the $u_i$s (random effects) and the $\varepsilon_{ij}$s
(residual terms) are mutually independent with zero means and variances
$\sigma^2_u$ and $\sigma^2_\varepsilon$, respectively. Under
the model, the true small area means are $\bar{Y}_i = \bar{X}_i' \beta + u_i + \bar{\varepsilon}_i$, but since
$\bar{\varepsilon}_i = \sum_{j=1}^{N_i} \varepsilon_{ij}/N_i \cong 0$ for large $N_i$, the target means are often defined as
$\theta_i = \bar{X}_i' \beta + u_i = E(\bar{Y}_i \mid u_i)$. For
known variances $(\sigma^2_u, \sigma^2_\varepsilon)$, the BLUP
of $\theta_i$ is

$$\hat{\theta}_i = \gamma_i[\bar{y}_i + (\bar{X}_i - \bar{x}_i)' \hat{\beta}_{\text{GLS}}] + (1 - \gamma_i) \bar{X}_i' \hat{\beta}_{\text{GLS}},$$

where $\hat{\beta}_{\text{GLS}}$ is the GLS estimator of $\beta$ computed from all the
observations, $\bar{x}_i = \sum_{j=1}^{n_i} x_{ij}/n_i$ and $\gamma_i = \sigma^2_u/(\sigma^2_u + \sigma^2_\varepsilon/n_i)$. For area $k$ with no sample (but
known $\bar{X}_k$), the BLUP is $\hat{\theta}_k = \bar{X}_k' \hat{\beta}_{\text{GLS}}$. See Rao (2003) for the BLUP of the
means $\bar{Y}_i$ in sampled areas.

The BLUP above is also the Bayesian predictor (posterior mean) under
normality of the error terms and a diffuse uniform prior for $\beta$.
Replacing the variances $\sigma^2_u$ and $\sigma^2_\varepsilon$
in $\gamma_i$ and $\hat\beta_{GLS}$ by sample estimates yields the
corresponding EBLUP or EB predictors. Hierarchical Bayes (HB) predictors
are obtained by specifying prior distributions for $\beta$ and the two
variances and computing the posterior distribution of $\theta_i$ (or
$\bar Y_i$) given all the sample observations in all the areas.
Remark 5.2.1 applies to the BLUP (EBLUP) under this model as well.
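The BLUP formula above can be sketched directly in code (my own illustration with made-up inputs, not code from the paper), assuming the variance components and the GLS estimate of $\beta$ are given:

```python
import numpy as np

def area_blup(ybar_i, xbar_i, Xbar_i, n_i, beta_gls, sigma2_u, sigma2_e):
    """BLUP of theta_i = Xbar_i' beta + u_i under the nested error model."""
    gamma_i = sigma2_u / (sigma2_u + sigma2_e / n_i)   # shrinkage weight
    return (gamma_i * (ybar_i + (Xbar_i - xbar_i) @ beta_gls)
            + (1.0 - gamma_i) * (Xbar_i @ beta_gls))

# made-up inputs, for illustration only
beta_gls = np.array([2.0, 0.5])
theta_hat = area_blup(ybar_i=3.1,
                      xbar_i=np.array([1.0, 1.8]),     # sample covariate mean
                      Xbar_i=np.array([1.0, 2.0]),     # known population covariate mean
                      n_i=10, beta_gls=beta_gls,
                      sigma2_u=1.0, sigma2_e=2.0)
print(theta_hat)
```

Setting $\gamma_k=0$ recovers the synthetic predictor $\bar X'_k\hat\beta_{GLS}$ used for a non-sampled area.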

5.2.3 Mixed logistic model

The previous two models assume continuous responses. Suppose now that
$y_{ij}$ is binary, taking the values 1 or 0, in which case the small
area quantities of interest are usually proportions or counts (say, the
proportion or total of unemployed persons in the area). The following
generalized linear mixed model (GLMM), considered originally by MacGibbon
and Tomberlin (1989) for SAE, is in broad use for this kind of problem:

$\Pr(y_{ij}=1|p_{ij})=p_{ij}$;

(16)

$\operatorname{logit}(p_{ij})=x'_{ij}\beta+u_i$; $u_i\sim N(0,\sigma^2_u)$.

The responses $y_{ij}$ are assumed to be conditionally independent,
given the random effects $u_i$, and likewise for the random effects.
The purpose is to predict the true area proportions $p_i=\sum_{j=1}^{N_i}y_{ij}/N_i$. Let $\psi=(\beta,\sigma^2_u)$ denote the
model parameters. For this model, there is no explicit expression for
the best predictor (BP) under a quadratic loss function, that is, for
$\hat p^{BP}_i=E(p_i|y_i,x_i;\psi)$, but as shown in
Jiang and Lahiri (2006b), the BP can be computed (approximated)
numerically as the ratio of two one-dimensional integrals. Jiang and
Lahiri
review methods of estimating $\psi$, yielding the empirical BP (EBP)
$\hat p^{EBP}_i=E(p_i|y_i,x_i;\hat\psi)$, which
is also the EB predictor under the same assumptions. Application of the
full HB approach under this model consists of the following basic steps:
(1) specify prior distributions for $\sigma^2_u$ and $\beta$;

(2) generate observations from the posterior distributions of $\beta$, $\sigma^2_u$ and
$u_1,\dots,u_m$ by, say, MCMC simulations, and draw a large number of
realizations $(\hat\beta^{(r)},\sigma^{2(r)}_u,\{\hat u^{(r)}_i\})$, $r=1,\dots,R$, $i=1,\dots,m$, and hence realizations
$y^{(r)}_{ik}\sim p^{(r)}_{ik}=\exp(x'_{ik}\beta^{(r)}+u^{(r)}_i)/[1+\exp(x'_{ik}\beta^{(r)}+u^{(r)}_i)]$ for $k\notin s_i$;

(3) predict: $\hat p_i=(\sum_{j\in s_i}y_{ij}+\sum_{k\notin s_i}\hat y_{ik})/N_i$; $\hat y_{ik}=\sum_{r=1}^{R}y^{(r)}_{ik}/R$, $k\notin s_i$.

Writing $\hat p_i=\frac{1}{R}\sum_{r=1}^{R}(\sum_{j\in s_i}y_{ij}+\sum_{k\notin s_i}y^{(r)}_{ik})/N_i=\frac{1}{R}\sum_{r=1}^{R}\hat p^{(r)}_i$, the posterior variance
is approximated as $\hat V_{\mathrm{post}}(\hat p_i)=\frac{1}{R(R-1)}\sum_{r=1}^{R}(\hat p^{(r)}_i-\hat p_i)^2$.
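The prediction steps for a single area can be sketched as below (my own illustration, not code from the paper), assuming posterior draws of $\beta$, $\sigma^2_u$ and $u_i$ are already available from an MCMC sampler; the draws and data here are fabricated for illustration only.

```python
import numpy as np

rng = np.random.default_rng(0)

def hb_predict_proportion(y_sampled, X_nonsampled, beta_draws, u_draws, rng):
    """Posterior-mean predictor of p_i and its approximate posterior variance."""
    R = len(u_draws)
    N_i = len(y_sampled) + len(X_nonsampled)
    p_r = np.empty(R)
    for r in range(R):
        eta = X_nonsampled @ beta_draws[r] + u_draws[r]      # logits for non-sampled units
        y_new = rng.binomial(1, 1.0 / (1.0 + np.exp(-eta)))  # draw y_ik^(r), k not in s_i
        p_r[r] = (y_sampled.sum() + y_new.sum()) / N_i       # p_i^(r)
    p_hat = p_r.mean()
    v_post = ((p_r - p_hat) ** 2).sum() / (R * (R - 1))      # approximation in the text
    return p_hat, v_post

# fabricated posterior draws and toy area data, for illustration only
R = 2000
beta_draws = rng.normal([0.0, 0.5], 0.1, size=(R, 2))
u_draws = rng.normal(0.0, 0.3, size=R)
y_sampled = np.array([1, 0, 1, 1, 0])                        # n_i = 5 sampled units
X_nonsampled = np.column_stack([np.ones(20), rng.normal(0.0, 1.0, 20)])
p_hat, v_post = hb_predict_proportion(y_sampled, X_nonsampled, beta_draws, u_draws, rng)
print(p_hat, v_post)
```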

Ghosh et al. (1998) discuss the use of HB SAE for GLMM, covering binary,
count, multi-category and spatial data. In particular, sufficient
conditions are developed for the joint posterior distribution of the
parameters of interest to be proper.

6 New Developments in Model-Based SAE

6.1 Estimation of Prediction MSE

As stated in the introduction, an important aspect of SAE is the
assessment of the accuracy of the predictors. This problem is solved
“automatically” under the Bayesian paradigm, which produces
realizations of the posterior distribution of the target quantities.
However, estimation of the prediction MSE (PMSE) and the computation of
confidence intervals (C.I.) under the frequentist approach are
complicated because of the added variability induced by the estimation
of the model hyper-parameters. Prasad and Rao (1990) developed PMSE
estimators with bias of order $o(1/m)$ ($m$ is the number of sampled
areas) under the linear mixed models (12) and (14) for the case where
the random errors have a normal distribution and the model variances
are estimated by the ANOVA method of moments. Datta and Lahiri (2000)
extended the estimation of Prasad and Rao to the more general mixed linear
model,

$y_i=X_i\beta+Z_iu_i+e_i$, $i=1,\dots,m$,

(17)

where $X_i$ and $Z_i$ are fixed matrices of order $n_i\times k$ and $n_i\times d$, respectively, and $u_i$ and $e_i$ are
independent normally distributed random effects and residual terms of
orders $d\times 1$ and $n_i\times 1$, respectively; $u_i\sim N_d(0,Q_i)$, $e_i\sim N_{n_i}(0,R_i)$. The variance matrices are
known functions of the variance components $\zeta=(\zeta_1,\dots,\zeta_L)$. The authors develop PMSE estimators with
bias of order $o(1/m)$ for the EBLUP obtained when estimating $\beta$
and $\zeta$ by MLE or REML. Das, Jiang and Rao (2004) extend the model of
Datta and Lahiri (2000) by relaxing the assumption of independence of
the error terms between the areas, and likewise develop an estimator for
the PMSE of the EBLUP when estimating the unknown model parameters by
MLE or REML, with bias of order $o(1/m)$. Datta, Rao and Smith (2005) show that for the area level model (12), if
$\sigma^2_u$ is estimated by the method proposed by Fay and Herriot (1979), an extra term must be added to the PMSE estimator to
achieve the desired order of bias of $o(1/m)$. See Datta (2009) for an extensive review of methods of estimating the PMSE of the EBLUP and
EB under linear mixed models (LMM).

Estimation of the PMSE under the GLMM is more involved, and in what
follows, I review resampling procedures that can be used in such cases.
For convenience, I consider the mixed logistic model (16), but the
procedures are applicable to other models belonging to this class. The
first procedure, proposed by Jiang, Lahiri and Wan (2002), uses the jackknife
method. Let $\lambda_i=E(\hat p^{EBP}_i-p_i)^2$ denote the
PMSE, where $p_i=\sum_{j=1}^{N_i}y_{ij}/N_i$ is the true
proportion and $\hat p^{EBP}_i=E(p_i|y_i,x_i;\hat\psi)$ is the EBP. The following
decomposition holds:

$\lambda_i=E(\hat p^{(BP)}_i-p_i)^2+E(\hat p^{(EBP)}_i-\hat p^{(BP)}_i)^2=M_{1i}+M_{2i}$,

where $M_{1i}$ is the PMSE of the BP (which assumes known parameter values)
and $M_{2i}$ is the contribution to the PMSE from estimating the model
parameters $\psi$. Denote by $\hat\lambda^{BP}_i(\hat\psi)$ the
“naive” estimator of $M_{1i}$, obtained by setting
$\psi=\hat\psi$. Let $\hat\lambda^{BP}_i(\hat\psi_{-l})$
denote the naive estimator when estimating $\psi$ from all the areas
except for area $l$, and $\hat p^{EBP}_i(\hat\psi_{-l})$ denote
the corresponding EBP. The jackknife estimator of the PMSE is

$\hat\lambda^{JK}_i=\hat M_{1i}+\hat M_{2i}$;

$\hat M_{1i}=\hat\lambda^{BP}_i(\hat\psi)-\frac{m-1}{m}\sum_{l=1}^{m}[\hat\lambda^{BP}_i(\hat\psi_{-l})-\hat\lambda^{BP}_i(\hat\psi)]$,

$\hat M_{2i}=\frac{m-1}{m}\sum_{l=1}^{m}[\hat p^{EBP}_i(\hat\psi_{-l})-\hat p^{EBP}_i(\hat\psi)]^2$.

Under some regularity conditions, $E(\hat\lambda^{JK}_i)-\lambda_i=o(1/m)$, as desired.
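The jackknife estimator can be sketched generically as follows (my own sketch, not code from the paper); `fit`, `naive_pmse` and `ebp` are hypothetical user-supplied functions for re-estimating $\psi$ with area $l$ deleted, evaluating the naive estimator of $M_{1i}$, and computing the EBP.

```python
# `fit(exclude=l)` re-estimates psi from all areas except l (exclude=None: all areas),
# `naive_pmse(psi, i)` evaluates the naive M1i estimator at psi, and
# `ebp(psi, i)` computes the EBP of p_i at psi. All three are hypothetical hooks.
def jackknife_pmse(i, m, fit, naive_pmse, ebp):
    """Jackknife PMSE estimator for area i based on m sampled areas."""
    psi_full = fit(exclude=None)
    lam_full = naive_pmse(psi_full, i)
    p_full = ebp(psi_full, i)
    bias_corr = 0.0
    m2 = 0.0
    for l in range(m):
        psi_l = fit(exclude=l)                    # delete-one-area estimate of psi
        bias_corr += naive_pmse(psi_l, i) - lam_full
        m2 += (ebp(psi_l, i) - p_full) ** 2
    m1 = lam_full - (m - 1) / m * bias_corr       # bias-corrected estimate of M1i
    return m1 + (m - 1) / m * m2                  # add the estimate of M2i
```

The main cost is the $m$ re-fits of the model, one per deleted area.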

The jackknife estimator estimates the unconditional PMSE over the joint
distribution of the random effects and the responses. Lohr and Rao (2009) proposed a modification of the jackknife, which is simpler and
estimates the conditional PMSE, $E[(\hat p^{(EBP)}_i-p_i)^2|y_i]$. Denoting $q_i(\psi,y_i)=\mathrm{Var}(p_i|y_i;\psi)$, the modification consists of replacing
$\hat M_{1i}$ in the jackknife estimator above by $\hat M_{1i,c}=q_i(\hat\psi,y_i)-\sum_{l\neq i}^{m}[q_i(\hat\psi_{-l},y_i)-q_i(\hat\psi,y_i)]$. The modified estimator $\hat\lambda^{JK}_{i,c}=\hat M_{1i,c}+\hat M_{2i}$ has bias of order $o_p(1/m)$ in
estimating the conditional PMSE and bias of order $o(1/m)$ in estimating
the unconditional PMSE.

Hall and Maiti (2006) propose estimating the PMSE by use of a
double-bootstrap. For model (16), the procedure consists of the
following steps:

(1) Generate a new population from the model (16) with parameters
$\hat\psi$ and compute the “true” area proportions for this
population. Compute the EBPs based on the new sample data and newly
estimated parameters. The new population and sample use the same
covariates as the original population and sample. Repeat the process
independently $B_1$ times, with $B_1$ sufficiently large. Denote by
$p_{i,b_1}(\hat\psi)$ and $\hat p^{(EBP)}_{i,b_1}(\hat\psi_{b_1})$ the “true” proportions and corresponding EBPs for
population and sample $b_1$, $b_1=1,\dots,B_1$. Compute the
first-step bootstrap PMSE estimator,

$\widehat{\mathrm{PMSE}}^{(1)}_i=\frac{1}{B_1}\sum_{b_1=1}^{B_1}[\hat p^{(EBP)}_{i,b_1}(\hat\psi_{b_1})-p_{i,b_1}(\hat\psi)]^2$.

(20)

(2) Repeat the same process within each first-step bootstrap population and
sample to obtain a second-step bootstrap PMSE estimator, $\widehat{\mathrm{PMSE}}^{(2)}_i$.

The double-bootstrap PMSE estimator is obtained by computing one of the
classical bias-corrected estimators. For example,

$\widehat{\mathrm{PMSE}}^{D}_i=2\,\widehat{\mathrm{PMSE}}^{(1)}_i-\widehat{\mathrm{PMSE}}^{(2)}_i$.

(22)

Notice that whereas the first-step bootstrap estimator (20) has bias of
order $O(1/m)$, the double-bootstrap estimator has bias of order
$o(1/m)$ under some regularity conditions.
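Step (1) of the Hall and Maiti procedure can be sketched as below (my own sketch, not code from the paper); `simulate_population` and `fit_and_predict` are hypothetical helpers standing in for generating a bootstrap population and sample from $\hat\psi$ and for re-estimating the model and computing the EBPs.

```python
import numpy as np

def first_step_bootstrap_pmse(i, psi_hat, B1, simulate_population, fit_and_predict, rng):
    """Average squared error of the bootstrap EBPs against the bootstrap 'true' proportions."""
    sq_err = np.empty(B1)
    for b in range(B1):
        true_p, sample = simulate_population(psi_hat, rng)  # new population and sample
        ebp = fit_and_predict(sample)                       # re-estimate psi, compute EBPs
        sq_err[b] = (ebp[i] - true_p[i]) ** 2
    return sq_err.mean()
```

The second-step estimator repeats the same loop inside each first-step replicate, which is what makes the double-bootstrap computationally expensive.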

Pfeffermann and Correa (2012) develop a general method of bias
correction, which models the error of a target estimator as a function
of the corresponding bootstrap estimator and of the original and
bootstrap estimators of the parameters governing the model fitted to
the sample data. This is achieved by drawing at random a large number of
plausible parameters governing the model, generating a pseudo original
sample for each parameter and bootstrap samples for each pseudo sample,
and then searching, by a cross-validation procedure, for the best functional
relationship among a set of eligible bias correction functions that
includes the classical bootstrap bias corrections. The method produces
estimators with bias of the correct order, and under certain
conditions it also permits estimating the MSE of the bias-corrected
estimator. In an extensive simulation study, applying the method to
estimate the PMSE under the model (16) outperformed the
double-bootstrap and jackknife procedures, with good performance in
estimating the MSE of the PMSE estimators.

Remark. All the resampling methods considered above are in
fact model dependent, since they require repeatedly computing the
empirical best predictors under the model.

Chambers, Chandra and Tzavidis (2011) develop conditional bias-robust PMSE estimators
for the case where the small area estimators can be expressed as
weighted sums of sample values. The authors assume that for unit $j\in U_i$, $y_j=x'_j\beta_i+e_j$; $E(e_j)=0$,
$\mathrm{Var}(e_j)=\sigma^2_j$, $j=1,\dots,n_i$, with $\beta_i$ taken as
a fixed vector of coefficients, and consider linear estimators of the
form $\hat\theta_i=\sum_{k\in s}w_{ik}y_k$ with fixed
weights $w_{ik}$. Thus, if $\theta_i$ defines the true area mean,