Bayesian estimation

Objectives: learn how to combine maximum likelihood estimation and Bayesian estimation of the population parameters.

Projects: theobayes1_project, theobayes2_project

Introduction

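The Bayesian approach considers the vector of population parameters $\theta$ as a random vector with a prior distribution $\pi_\theta$. Given the observations $y$, we can then define the *posterior distribution* of $\theta$:

$$p(\theta \mid y) = \frac{\pi_\theta(\theta)\, p(y \mid \theta)}{p(y)}$$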

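We can estimate this conditional distribution and derive statistics (posterior mean, standard deviation, quantiles, etc.) and the so-called maximum a posteriori (MAP) estimate of $\theta$:

$$\hat{\theta}^{\rm MAP} = \arg\max_{\theta}\, p(\theta \mid y) = \arg\max_{\theta} \left\{ LL_y(\theta) + \log \pi_\theta(\theta) \right\}$$

where $LL_y(\theta) = \log p(y \mid \theta)$ is the observed log-likelihood.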

The MAP estimate maximizes a penalized version of the observed log-likelihood $LL_y(\theta)$. In other words, MAP estimation is the same as penalized maximum likelihood estimation. Suppose for instance that $\theta$ is a scalar parameter and the prior is a normal distribution with mean $\theta_0$ and variance $\gamma^2$. Then, the MAP estimate is the solution of the following minimization problem:

$$\hat{\theta}^{\rm MAP} = \arg\min_{\theta} \left\{ -2\, LL_y(\theta) + \frac{1}{\gamma^2} (\theta - \theta_0)^2 \right\}$$
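As a minimal numerical sketch of this penalized criterion, assume a deliberately simple model that is not one of the Monolix projects: independent observations y_i ~ N(theta, sigma^2) with known sigma, and a normal prior on theta with mean `theta0` and standard deviation `gamma`. Under these assumptions the minimization has a closed-form solution, a precision-weighted average of the sample mean and the prior mean. The function name `map_normal_mean` and the data values are hypothetical.

```python
from statistics import fmean

def map_normal_mean(y, sigma, theta0, gamma):
    """MAP estimate of theta for y_i ~ N(theta, sigma^2), prior N(theta0, gamma^2).

    Minimizes sum((y_i - theta)^2) / sigma^2 + (theta - theta0)^2 / gamma^2,
    which is -2*LL_y(theta) up to an additive constant, plus the prior penalty.
    """
    n = len(y)
    w_data = n / sigma**2        # precision contributed by the data
    w_prior = 1.0 / gamma**2     # precision contributed by the prior
    return (w_data * fmean(y) + w_prior * theta0) / (w_data + w_prior)

y = [2.9, 3.1, 3.4, 2.6, 3.0]    # hypothetical observations
mle = fmean(y)                   # in this model the MLE is the sample mean

# Print the MAP estimate for increasingly informative priors.
for gamma in (10.0, 1.0, 0.1, 0.01):
    theta_map = map_normal_mean(y, sigma=0.5, theta0=2.0, gamma=gamma)
    print(f"gamma={gamma:<5} MAP={theta_map:.4f}  (MLE={mle:.4f}, prior mean=2.0)")
```

Running the loop with a decreasing `gamma` moves the estimate from the sample mean toward `theta0`.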

The MAP estimate is a trade-off between the MLE, which minimizes the deviance $-2\, LL_y(\theta)$, and the prior mean $\theta_0$, which minimizes $(\theta - \theta_0)^2$. The weight given to the prior depends directly on the variance $\gamma^2$ of the prior distribution: the smaller $\gamma^2$ is, the closer the MAP estimate is to $\theta_0$. In the limiting case $\gamma^2 = 0$, $\theta$ is fixed at $\theta_0$ and no longer needs to be estimated.
Both the Bayesian and frequentist approaches have their supporters and detractors. But rather than being dogmatic and following the same rule-book every time, we need to be pragmatic and ask the right methodological questions when confronted with a new problem.
All things considered, the problem comes down to knowing whether the data contains sufficient information to answer a given question, and whether some other information may be available to help answer it. This is the essence of the art of modeling: find the right compromise between the confidence we have in the data and our prior knowledge of the problem. Each problem is different and requires a specific approach. For instance, if all the patients in a clinical trial have essentially the same weight, it is pointless to estimate a relationship between weight and the model’s PK parameters using the trial data. A modeler would be better served trying to use prior information based on physiological knowledge rather than just some statistical criterion.
Generally speaking, if prior information is available it should be used, on the condition of course that it is relevant. For continuous data, for example, what does putting a prior on the residual error model’s parameters mean in reality? A reasoned statistical approach consists of including prior information only for certain parameters (those for which we have real prior information) and having confidence in the data for the others. Monolix allows this hybrid approach, which reconciles the Bayesian and frequentist approaches. A given parameter can be:

a fixed constant if we have absolute confidence in its value or the data does not allow it to be estimated, essentially due to lack of identifiability.

estimated by maximum likelihood, either because we have great confidence in the data or no information on the parameter.

estimated by introducing a prior and calculating the MAP estimate or estimating the posterior distribution.
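The three treatments listed above can coexist in a single fit. As a rough sketch, and not a description of Monolix's estimation algorithm, assume a simple linear model y_i = a + b*x_i + e_i with normal residuals. The residual standard deviation `sigma` is held fixed, the intercept `a` is estimated by maximum likelihood, and the slope `b` receives a normal prior with mean `b0` and standard deviation `gamma`. The function `fit_hybrid`, the parameter names and the data are all hypothetical.

```python
def fit_hybrid(x, y, sigma, b0, gamma):
    """Minimize sum((y_i - a - b*x_i)^2)/sigma^2 + (b - b0)^2/gamma^2 over (a, b).

    sigma: fixed constant (not estimated)
    a:     no prior, so it is a maximum likelihood estimate
    b:     normal prior N(b0, gamma^2), so it is a MAP estimate
    """
    n = len(x)
    sx, sy = sum(x), sum(y)
    sxx = sum(xi * xi for xi in x)
    sxy = sum(xi * yi for xi, yi in zip(x, y))
    lam = sigma**2 / gamma**2          # relative weight of the prior on b
    # Normal equations of the penalized least-squares problem:
    #   n*a  + sx*b          = sy
    #   sx*a + (sxx + lam)*b = sxy + lam*b0
    det = n * (sxx + lam) - sx * sx
    a = (sy * (sxx + lam) - sx * (sxy + lam * b0)) / det
    b = (n * (sxy + lam * b0) - sx * sy) / det
    return a, b

x = [0.0, 1.0, 2.0, 3.0, 4.0]          # hypothetical covariate values
y = [1.1, 2.9, 5.2, 6.8, 9.1]          # hypothetical observations
print(fit_hybrid(x, y, sigma=0.2, b0=1.5, gamma=0.05))
```

In this sketch, a very large `gamma` makes the prior negligible and both parameters reduce to ordinary least-squares estimates, while a very small `gamma` effectively fixes `b` at `b0`.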

Computing the Maximum a posteriori (MAP) estimate

We want to introduce a prior distribution for one of the population parameters in this example. Click on the option button and select Maximum A Posteriori Estimation.

We propose a typical value, here 2, and a standard deviation, here 0.1, for the prior of this parameter, and compute its MAP estimate. The distribution of the MAP is inevitably the same as the one used for the parameter.
The parameter is then colored in purple.
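
To get a feel for how informative these settings are, the sketch below assumes, purely for illustration, a normal prior with mean 2 and standard deviation 0.1. The text above does not say which distribution the parameter follows, so a different parameter distribution would change the numbers. The sketch computes the central 95% prior interval and the penalty such a prior would add to the deviance. The name `prior_penalty` is hypothetical and is not a Monolix function.

```python
from statistics import NormalDist

# Assumed normal prior: typical value 2, standard deviation 0.1.
prior = NormalDist(mu=2.0, sigma=0.1)

# Central 95% interval of the prior.
lower, upper = prior.inv_cdf(0.025), prior.inv_cdf(0.975)

def prior_penalty(theta, prior=prior):
    """Term added to -2*LL(theta) by a normal prior (additive constant dropped)."""
    return ((theta - prior.mean) / prior.stdev) ** 2

print(f"95% prior interval: [{lower:.3f}, {upper:.3f}]")
for theta in (2.0, 2.1, 2.2, 2.5):
    print(f"theta={theta}: penalty={prior_penalty(theta):.2f}")
```

Under this assumption, each standard deviation away from the typical value costs its square in deviance units, so the data must improve the deviance by at least that much to pull the estimate there.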