Friday, May 11, 2012

Bayes Estimators, Loss Functions, and J. M. Keynes

As a result of my recent post on Bayesian estimation of a simple consumption function, a few people emailed asking for proofs of the results that the Bayes estimator is the mean (a median) [a mode] of the posterior density, when the loss function is quadratic (absolute error) [zero-one].

Let's take a look at this, for the case of a single parameter.

Throughout, the parameter to be estimated will be called θ; y will denote the vector of random data; and θ* will be an estimator of θ. L[θ , θ*] will denote the (non-negative) loss when θ* is used to estimate θ.

First, some preliminaries........

The risk function is just expected loss, where the expectation is taken with respect to the data density. That is, R[θ , θ*] = ∫ L[θ , θ*] p(y | θ) dy. The Bayes estimator is defined as the estimator that minimizes the "Bayes risk", or "average risk", where now the averaging is done with respect to the prior p.d.f. for θ, p(θ). That is, BR(θ*) = ∫ R(θ , θ*) p(θ) dθ.

If the double integral that's implicit in the definition of the Bayes risk converges, so that the order of integration can be reversed (Fubini's Theorem), then it's easily shown that choosing θ* so as to minimize BR(θ*) amounts to choosing θ* so as to minimize posterior expected loss, which is defined as ∫ L[θ , θ*] p(θ | y) dθ.

Whenever the Bayes risk is defined, the Bayes and "minimum expected loss" (MELO) estimators coincide. In addition, the latter estimator is usually defined even if the Bayes risk isn't. So, it's quite common to refer to the MELO estimator as the Bayes estimator of θ, even though that's not strictly the correct definition.
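Before turning to the specific cases, the MELO recipe itself can be sketched numerically: discretize the posterior on a grid, and pick the θ* that minimizes posterior expected loss over a grid of candidates. The N(2, 1) "posterior" and the use of quadratic loss below are purely illustrative.

```python
import math

# A numerical sketch of the MELO idea: approximate p(theta | y) on a grid,
# then choose theta* to minimize posterior expected loss.
# The N(2, 1) "posterior" is invented purely for illustration.

grid = [i * 0.01 for i in range(-300, 701)]      # theta values from -3 to 7
dens = [math.exp(-0.5 * (t - 2.0) ** 2) for t in grid]
total = sum(dens)
post = [d / total for d in dens]                  # normalized posterior weights

def expected_loss(theta_star, loss):
    # Posterior expected loss, approximated by a sum over the grid.
    return sum(loss(t, theta_star) * w for t, w in zip(grid, post))

# Minimize expected quadratic loss over the same grid of candidate estimates:
best = min(grid, key=lambda s: expected_loss(s, lambda t, s_: (t - s_) ** 2))
print(round(best, 2))   # close to the posterior mean, 2.0
```

With quadratic loss the minimizer lands (up to grid spacing) on the posterior mean, which is the first of the three results proved below.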

Alright, so let's now consider our three loss functions:

L[θ , θ*] = a(θ - θ*)² ; where a > 0

L[θ , θ*] = a |θ - θ*| ; where a > 0

L[θ , θ*] = 0 ; if |θ - θ*| < ε ; where ε > 0

L[θ , θ*] = c ; if |θ - θ*| ≥ ε ; where c > 0 .

Here, ε is going to be very small; and without any loss of generality, let's set a = c = 1.
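As a quick sketch, the three loss functions (with a = c = 1, as above) can be written out directly; the function names here are mine, not standard:

```python
# The three loss functions discussed above, with a = c = 1.
# Function names are illustrative only.

def quadratic_loss(theta, theta_star, a=1.0):
    return a * (theta - theta_star) ** 2

def absolute_error_loss(theta, theta_star, a=1.0):
    return a * abs(theta - theta_star)

def zero_one_loss(theta, theta_star, eps=0.01, c=1.0):
    # Zero loss within an eps-neighbourhood of theta; constant loss outside it.
    return 0.0 if abs(theta - theta_star) < eps else c
```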

Notice that each of these loss functions is symmetric. This can be unduly restrictive, and often we use asymmetric loss functions, such as the LINEX loss (e.g., Varian, 1974; Zellner, 1986). See, also, Christoffersen and Diebold (1997).

Another point to note is that the first two loss functions above are unbounded, while the third one is bounded. Bounded loss functions are often needed, for instance in quality assurance. Another example of a bounded loss that has received some attention is the "reflected Normal" loss function, suggested by Spiring (1993) and applied by various authors, including Kulkarni (2008), and Giles (2002).

Anyway, back to our three cases.............

Quadratic Loss

This one is really easy. We want to choose θ* so as to minimize Q = ∫ (θ - θ*)² p(θ | y) dθ. Differentiating with respect to θ* gives ∂Q/∂θ* = -2 ∫ (θ - θ*) p(θ | y) dθ. Setting this to zero yields θ* = ∫ θ p(θ | y) dθ; that is, the Bayes estimator is the mean of the posterior density. (The second derivative is 2 > 0, so we do indeed have a minimum.)

[I've used the result that ∫ p(θ | y) dθ = 1; that is, that the posterior p.d.f. is "proper". This isn't restrictive, as the condition is generally satisfied, even if we use a diffuse "improper" prior to represent a state of prior ignorance.]
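For a concrete check of this result, here's a small sketch with an invented three-point discrete "posterior": the expected quadratic loss is minimized, up to grid spacing, at the posterior mean.

```python
# A small check that expected quadratic loss is minimized at the posterior
# mean. The three-point "posterior" is invented purely for illustration.

support = [0.0, 1.0, 3.0]
probs = [0.2, 0.5, 0.3]
post_mean = sum(t * p for t, p in zip(support, probs))    # 1.4

def q(theta_star):
    # Posterior expected quadratic loss.
    return sum(p * (t - theta_star) ** 2 for t, p in zip(support, probs))

# Scan a fine grid of candidates; the minimizer sits at the mean.
cands = [i * 0.001 for i in range(0, 3001)]
best = min(cands, key=q)
print(round(best, 3))   # minimizer is the posterior mean, 1.4
```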

Absolute Error Loss

This case is a little more tricky. In class, I use one of two different ways to show that the median of the posterior p.d.f. is the Bayes estimator. For simplicity, I assume that the median is unique, but the result still holds when it isn't.

The first method, reproduced here, looks at the difference between L[θ , m] and L[θ , θ*], where m is the median and θ* is an arbitrary estimator, and then uses the result that the Bayes estimator minimizes posterior expected loss.

The second method uses Leibniz's rule for the differentiation of an integral. I have a proof here that the median minimizes the mean absolute deviation. (It's interesting to note that this basic result relates to Laplace's "first law", and hence the Laplace distribution, and that the first modern treatment of it was given by Keynes - yes, Keynes (1911) - based on his thesis work of 1907 and 1908. See Klein and Grottke, 2008, for a modern interpretation of Keynes' 1911 paper.)

Using this result, with the mean being taken with respect to the posterior p.d.f., yields the posterior median as the Bayes estimator immediately.
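A similar numerical sketch, again with an invented discrete "posterior", confirms that expected absolute-error loss is minimized at the posterior median:

```python
# A small check that expected absolute-error loss is minimized at the
# posterior median. The three-point "posterior" is invented for illustration;
# its c.d.f. is 0.2, 0.7, 1.0, so the median is 1.0.

support = [0.0, 1.0, 3.0]
probs = [0.2, 0.5, 0.3]

def mad(theta_star):
    # Posterior expected absolute-error loss.
    return sum(p * abs(t - theta_star) for t, p in zip(support, probs))

cands = [i * 0.001 for i in range(0, 3001)]
best = min(cands, key=mad)
print(round(best, 3))   # minimizer is the posterior median, 1.0
```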

Zero-One Loss

The zero-one loss function is sometimes called the "step loss function" (e.g., Smith, 1980). This third case is an interesting one, and I prove it by using Leibniz's rule, together with a graphical argument taken from Leonard and Hsu (1999). You can find my proof here.
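Without reproducing the proof, the mechanics can be sketched numerically: for small ε, posterior expected zero-one loss is one minus the posterior mass within ε of θ*, so minimizing it puts θ* where the density is highest. The skewed "posterior" below, with density proportional to θexp(-θ) and mode at θ = 1, is invented purely for illustration.

```python
import math

# A sketch of the zero-one case: on a grid, posterior expected zero-one loss
# is 1 minus the posterior mass within eps of theta*, so the minimizer sits
# where the density is highest, i.e. at the posterior mode.
# The skewed "posterior" (density proportional to t * exp(-t)) is invented.

grid = [i * 0.01 for i in range(0, 1001)]        # theta in [0, 10]
dens = [t * math.exp(-t) for t in grid]          # mode at theta = 1
total = sum(dens)
post = [d / total for d in dens]

eps = 0.045                                       # "very small" epsilon

def zero_one_risk(theta_star):
    # Expected zero-one loss = 1 - P(|theta - theta*| < eps | y).
    return 1.0 - sum(w for t, w in zip(grid, post) if abs(t - theta_star) < eps)

best = min(grid, key=zero_one_risk)
print(round(best, 2))   # close to the posterior mode at theta = 1
```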

For each of these three loss functions, the same results apply in the case of multiple parameters, if we marginalize the joint posterior density, and then apply our loss structures to the marginal posterior for a particular parameter. However, there are some issues that we have to be careful about if we take that route.

For example, suppose there are two parameters, θ1 and θ2, and the modes of the marginal posteriors occur at θ1m and θ2m. There is no reason why the mode of the joint posterior density, p(θ1, θ2 | y), has to lie at the point (θ1m , θ2m)! In addition, while the median of a univariate density is easily defined, its definition is not so obvious in the case of a multivariate density. Finally, even for univariate distributions, there can be multiple modes and medians.
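Here's a tiny numerical illustration of the first point, using an invented discrete bivariate "posterior" whose joint mode does not sit at the pair of marginal modes:

```python
# An invented discrete bivariate "posterior" p(theta1, theta2 | y), chosen so
# that the joint mode is NOT at the pair of marginal modes.

probs = {
    (0, 0): 0.00, (0, 1): 0.25,
    (1, 0): 0.20, (1, 1): 0.20,
    (2, 0): 0.00, (2, 1): 0.35,
}

joint_mode = max(probs, key=probs.get)           # (2, 1), with mass 0.35

# Marginalize over each parameter in turn.
marg1, marg2 = {}, {}
for (t1, t2), p in probs.items():
    marg1[t1] = marg1.get(t1, 0.0) + p           # p(theta1 | y): 0.25, 0.40, 0.35
    marg2[t2] = marg2.get(t2, 0.0) + p           # p(theta2 | y): 0.20, 0.80

marg_modes = (max(marg1, key=marg1.get), max(marg2, key=marg2.get))
print(joint_mode, marg_modes)   # (2, 1) versus (1, 1)
```

The marginal modes are at (1, 1), while the joint mode is at (2, 1), so the two routes give different "modal" estimates.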

For more on these sorts of issues, see DeGroot (1970, chap. 11) and O'Hagan (1976).