Introduction

Consider a time series y_t for t = 1, …, n that is conditionally independent given an unobserved state α_t, which is assumed to be a Markov process. We wish to perform on-line filtering, learning about the unobserved state from the currently available information, by estimating the density
f(α_t | y_1, …, y_t) = f(α_t | Y_t)
for t = 1, …, n. The measurement density f(y_t | α_t)
and the transition density f(α_{t+1} | α_t) implicitly depend on
a finite vector of parameters. The initial distribution of the state is
f(α_0).

Suppose we know the filtering distribution f(α_t | Y_t) at time t
and we receive a new observation for period t+1. We can obtain the updated
filtering density in two steps. First, we use the transition density to obtain
f(α_{t+1} | Y_t) from f(α_t | Y_t) as

f(α_{t+1} | Y_t) = ∫ f(α_{t+1} | α_t) dF(α_t | Y_t).

Then, we obtain the new filtering density f(α_{t+1} | Y_{t+1}) by
using Bayes' Theorem:

(1) f(α_{t+1} | Y_{t+1}) = f(y_{t+1} | α_{t+1}) f(α_{t+1} | Y_t) / ∫ f(y_{t+1} | α_{t+1}) dF(α_{t+1} | Y_t).

If the support of α_{t+1} | α_t is known and finite, then the
integrals above reduce to weighted sums over the points of the support. In
other cases, numerical methods may be needed.
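For the finite-support case, the two-step recursion can be sketched directly. The two-state support, transition matrix, and likelihood values below are made-up illustrations, not part of any particular model:

```python
# Exact filtering recursion for a state with finite support.
import numpy as np

def filter_step(filt, P, lik_y):
    """One prediction/update step of the exact recursion.

    filt  : f(alpha_t | Y_t) as a probability vector over the support
    P     : transition matrix, P[i, j] = P(alpha_{t+1}=j | alpha_t=i)
    lik_y : vector of f(y_{t+1} | alpha_{t+1}=j) over the support
    """
    pred = filt @ P           # prediction: f(alpha_{t+1} | Y_t), a weighted sum
    post = lik_y * pred       # numerator of Bayes' Theorem
    return post / post.sum()  # normalise; the integral is a sum here

# Toy two-state example (all numbers illustrative).
P = np.array([[0.9, 0.1],
              [0.2, 0.8]])
filt = np.array([0.5, 0.5])
lik_y = np.array([0.7, 0.1])  # likelihood of the new observation in each state
filt = filter_step(filt, P, lik_y)
```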

Particle Filters

Particle filters are a class of simulation-based filters that recursively
approximate the distribution of α_t | Y_t using a collection of
particles α_t^1, …, α_t^M with probability masses
π_t^1, …, π_t^M. The particles are thought of as a sample from
f(α_t | Y_t). In this article, the weights are taken to be equal:
π_t^1 = ⋯ = π_t^M = 1/M for all t. As M → ∞, the
approximation should improve. Thus, we can approximate the true
filtering density (1) by an empirical one:

(2) f̂(α_{t+1} | Y_{t+1}) ∝ f(y_{t+1} | α_{t+1}) ∑_{j=1}^M f(α_{t+1} | α_t^j).

Then, a new sample of particles α_{t+1}^1, …, α_{t+1}^M can be
generated from this empirical density and the procedure can continue
recursively. A particle filter is said to be fully adapted if it generates
independent and identically distributed samples from (2).
It is useful to think of (2) as a posterior density which
is the product of a prior, ∑_{j=1}^M f(α_{t+1} | α_t^j), and
a likelihood, f(y_{t+1} | α_{t+1}).

Assuming that we can evaluate f(y_{t+1} | α_{t+1}) up to a constant
of proportionality, we can sample from (2) by drawing proposals from its
prior part, first obtaining a draw α_t^j with probability 1/M and then drawing from
f(α_{t+1} | α_t^j), and then correcting for the likelihood term. The authors
describe three possible methods for doing this. The most commonly used is the
sampling/importance resampling (SIR) method of Rubin (1987). The first particle filter,
independently proposed by several authors, was based on SIR. In particular,
Gordon, Salmond, and Smith (1993) suggested
it for non-Gaussian, nonlinear state space models and Kitagawa (1996) for
time series models. The other two methods, acceptance sampling and MCMC
methods, are discussed in the article but not in these notes.
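The proposal stage, drawing from the mixture prior (1/M) ∑_j f(α_{t+1} | α_t^j), can be sketched as follows. The Gaussian random-walk transition is an assumption made only so the example is self-contained:

```python
# Drawing from the mixture prior: pick an ancestor index uniformly,
# then propagate it through the (assumed Gaussian) transition density.
import numpy as np

rng = np.random.default_rng(0)

def sample_mixture_prior(particles, R, sigma=1.0):
    M = len(particles)
    j = rng.integers(0, M, size=R)  # ancestor chosen with probability 1/M
    # draw from f(alpha_{t+1} | alpha_t^j), here a random walk with scale sigma
    return particles[j] + sigma * rng.normal(size=R)

particles = rng.normal(size=100)  # stand-in for a sample from f(alpha_t | Y_t)
proposals = sample_mixture_prior(particles, R=500)
```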

Sampling/importance resampling (SIR)

Given a set of draws α_t^1, …, α_t^M, the SIR method
first takes draws α_{t+1}^1, …, α_{t+1}^R from the empirical
prediction density: for each draw, an index j is selected with probability
1/M and the proposal is drawn from f(α_{t+1} | α_t^j). A weight
π_{t+1}^j is then assigned to each draw, where

(3) π_{t+1}^j = w_j / ∑_{i=1}^R w_i

and w_j = f(y_{t+1} | α_{t+1}^j).
As R → ∞, this weighted sample converges to a sample from
the empirical filtering distribution. To generate a uniformly weighted
sample of size M, a resampling step is introduced in which the draws
α_{t+1}^1, …, α_{t+1}^R are resampled with weights
π_{t+1}^1, …, π_{t+1}^R.
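A full SIR update, propagate, weight by (3), resample, can be sketched as below. The Gaussian transition and measurement densities are illustrative assumptions, not part of the article:

```python
# One SIR update: propagate R proposal draws through the transition
# density, weight by the measurement density, and resample M particles.
import numpy as np

def sir_step(particles, y_new, R, M, rng, sig_trans=1.0, sig_obs=0.5):
    j = rng.integers(0, len(particles), size=R)           # ancestors, prob. 1/M each
    prop = particles[j] + sig_trans * rng.normal(size=R)  # proposal draws
    # w_j = f(y_{t+1} | alpha_{t+1}^j), evaluated up to a constant
    w = np.exp(-0.5 * ((y_new - prop) / sig_obs) ** 2)
    pi = w / w.sum()                                      # normalised weights (3)
    idx = rng.choice(R, size=M, p=pi)                     # resampling step
    return prop[idx]                                      # uniformly weighted sample of size M

rng = np.random.default_rng(1)
particles = rng.normal(size=200)
new_particles = sir_step(particles, y_new=1.5, R=1000, M=200, rng=rng)
```

The resampled particles concentrate where the likelihood of the new observation is high, at the cost of duplicating some proposals.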

Adaptation

The SIR particle filter above produces proposal draws of
α_{t+1} without taking into account the new information, the value of
y_{t+1}. A particle filter is said to be adapted if its proposal
draws take this new information into account. An adapted version of the
algorithm would look something like:

1. Draw α_{t+1}^r ∼ g(α_{t+1} | y_{t+1}) for r = 1, …, R.

2. Evaluate the weights

(4) w_{t+1}^r = f(y_{t+1} | α_{t+1}^r) ∑_{j=1}^M f(α_{t+1}^r | α_t^j) / g(α_{t+1}^r | y_{t+1}).

3. Resample with weights proportional to w_{t+1}^r to obtain a
sample of size M.
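The weight computation (4) can be sketched as follows; note the O(M) mixture sum evaluated for every proposal. The Gaussian densities and the proposal centred on the new observation are assumptions for illustration only:

```python
# Weights (4) for an adapted SIR step with a generic proposal g.
import numpy as np

def gauss_pdf(x, mean, sig):
    return np.exp(-0.5 * ((x - mean) / sig) ** 2) / (sig * np.sqrt(2 * np.pi))

def adapted_weights(prop, particles, y_new, sig_trans=1.0, sig_obs=0.5, sig_g=0.5):
    lik = gauss_pdf(y_new, prop, sig_obs)       # f(y_{t+1} | alpha_{t+1}^r)
    # mixture prior sum_j f(alpha_{t+1}^r | alpha_t^j): an R x M evaluation
    mix = gauss_pdf(prop[:, None], particles[None, :], sig_trans).sum(axis=1)
    g = gauss_pdf(prop, y_new, sig_g)           # proposal density g(alpha | y)
    w = lik * mix / g
    return w / w.sum()

rng = np.random.default_rng(2)
particles = rng.normal(size=100)
y_new = 1.0
prop = y_new + 0.5 * rng.normal(size=400)       # draws from the assumed g
w = adapted_weights(prop, particles, y_new)
```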

This algorithm allows proposals to come from a general density
g(α_{t+1} | y_{t+1}) which depends on y_{t+1}, as opposed to the
standard SIR particle filter, where the proposal density does not depend
on y_{t+1}. To understand how the importance weights above were derived,
consider importance sampling from f(α_{t+1} | y_{t+1}) with
the importance sampling density g(α_{t+1} | y_{t+1}). We would
first take draws from g(α_{t+1} | y_{t+1}) and then weight
by f(α_{t+1} | y_{t+1}) / g(α_{t+1} | y_{t+1}). But from
Bayes' Theorem, as in (2),

(5) f(α_{t+1} | y_{t+1}) ∝ f(y_{t+1} | α_{t+1}) ∑_{j=1}^M f(α_{t+1} | α_t^j).

Hence, after dividing by g(α_{t+1} | y_{t+1}), we have the
importance weights shown above.

This illustrates the difficulty of adapting the standard particle filter.
To obtain a single new particle we must evaluate M+1 densities:
f(y_{t+1} | α_{t+1})
as well as f(α_{t+1} | α_t^j) for each j = 1, …, M.

Auxiliary Particle Filters

The authors extend standard particle filtering methods by including an
auxiliary variable which allows the particle filter to be adapted more
efficiently. They introduce a variable k, an index into the
mixture (2), and filter in a higher dimension. This
auxiliary variable is introduced only to aid in simulation. With
this additional variable, the filtering density we wish to approximate
becomes

(6) f(α_{t+1}, k | Y_{t+1}) ∝ f(y_{t+1} | α_{t+1}) f(α_{t+1} | α_t^k)

for k = 1, …, M. Now, if we can sample from f(α_{t+1}, k | Y_{t+1}), then we can discard the sampled values of k and be left with
a sample from the original filtering density (2).

To sample from (6) using SIR, we make R proposal
draws (α_{t+1}^j, k^j) from some proposal density g(α_{t+1}, k | Y_{t+1}) and calculate the weights

(7) w_j = f(y_{t+1} | α_{t+1}^j) f(α_{t+1}^j | α_t^{k^j}) / g(α_{t+1}^j, k^j | Y_{t+1})

for j = 1, …, R.

The choice of g is left completely to the researcher. The authors propose a
generic choice of g which can be applied in many situations and go on to
provide more examples in specific models where the structure of the model
informs the choice of g. Here, I present only the generic g in terms
of the SIR algorithm. The density (6) can be approximated
by

(8) g(α_{t+1}, k | Y_{t+1}) ∝ f(y_{t+1} | μ_{t+1}^k) f(α_{t+1} | α_t^k)

where μ_{t+1}^k is some value with a high probability of occurrence, for
example, the mean or mode of the distribution of α_{t+1} | α_t^k. This choice is made for convenience since

g(k | Y_{t+1}) ∝ ∫ f(y_{t+1} | μ_{t+1}^k) dF(α_{t+1} | α_t^k) = f(y_{t+1} | μ_{t+1}^k).

Hence, we can draw from g(α_{t+1}, k | Y_{t+1}) by first drawing
values of k with probabilities λ_k ∝ g(k | Y_{t+1}) and
then drawing from the transition density f(α_{t+1} | α_t^k). The weights λ_k are called first stage weights. Then,
after sampling R times from g(α_{t+1}, k | Y_{t+1}), we form the
weights

(9) w_r = f(y_{t+1} | α_{t+1}^r) / f(y_{t+1} | μ_{t+1}^{k^r})

for r = 1, …, R. We can then resample M times from this weighted
distribution to obtain a uniformly weighted sample.
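One step of this generic auxiliary particle filter can be sketched as below. The Gaussian AR(1) transition and Gaussian measurement density are assumptions made for illustration, with μ_{t+1}^k taken to be the transition mean:

```python
# One auxiliary particle filter step with the generic choice of g:
# first-stage weights lambda_k = f(y_{t+1} | mu_{t+1}^k), then propagate
# chosen ancestors and form the second-stage weights (9).
import numpy as np

def gauss_pdf(x, mean, sig):
    return np.exp(-0.5 * ((x - mean) / sig) ** 2) / (sig * np.sqrt(2 * np.pi))

def apf_step(particles, y_new, R, M, rng, phi=0.9, sig_trans=1.0, sig_obs=0.5):
    mu = phi * particles                            # mu_{t+1}^k: mean of alpha_{t+1} | alpha_t^k
    lam = gauss_pdf(y_new, mu, sig_obs)             # first-stage weights
    lam /= lam.sum()
    k = rng.choice(len(particles), size=R, p=lam)   # draw the auxiliary indices
    prop = phi * particles[k] + sig_trans * rng.normal(size=R)  # draw from the transition
    # second-stage weights (9): likelihood at the draw over likelihood at mu
    w = gauss_pdf(y_new, prop, sig_obs) / gauss_pdf(y_new, mu[k], sig_obs)
    w /= w.sum()
    idx = rng.choice(R, size=M, p=w)                # resample M times
    return prop[idx]

rng = np.random.default_rng(3)
particles = rng.normal(size=200)
new_particles = apf_step(particles, y_new=1.0, R=1000, M=200, rng=rng)
```

Because ancestors are pre-selected by λ_k, the second-stage weights (9) are typically much less variable than the plain SIR weights.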

Auxiliary Particle Filter Algorithm

The following algorithm is based on the generic choice of g from the
discussion above. Other choices are possible, and may be more efficient
for some model specifications.

1. Initialize the algorithm with a uniformly weighted sample
α_0^1, …, α_0^M from the distribution f(α_0).

2. Given draws α_t^1, …, α_t^M from
f(α_t | Y_t), determine μ_{t+1}^k and
the first stage weights λ_k ∝ f(y_{t+1} | μ_{t+1}^k) for each k = 1, …, M.

3. For r = 1, …, R, draw k^r from the indices k = 1, …, M
with weights λ_k and then draw α_{t+1}^r from
the transition density f(α_{t+1} | α_t^{k^r}).