statistics, probability, nature of evidence

Monthly Archives: September 2016

This post is mostly for people who know a bit about doing statistics and want a primer on mixed-effects models. I’ve used the plainest language possible, so even if you don’t fit that bill, you might still find it readable. At the very least, you’ll get to laugh at my nerdy attempts to make statistics more entertaining to non-experts. My main goals are:

Give accessible examples (the last being an online services simulation) of fitting mixed-effects models in R

Discuss the interpretation of and motivation for mixed-effects models

Provide short, educational GOT fan-fic, with pictures

Some of this is in response to a great post from folks at Google. They show a cool result about the predictive efficacy of a mixed-effects model, so I decided to try a related experiment. However, their introduction of the model did not distinguish it thoroughly enough (for me) from a Bayesian approach, so I’ll be discussing this as well.

I’ll provide some code I used, but not all of it. Most code I provide in-line is meant to display the basic functionality of the R library I chose to fit the mixed-effects models, called lme4. I put some other code at the end, and the rest is available upon request.

Hat tip to this cool website for providing me endless Westeros-style names to use! In the following example, set in the Game of Thrones universe, I go over what it means to have different experimental units in a regression model. This is an important concept for understanding why we might want to use a mixed-effects model in the first place. Enter the Lannister family:

Jaime’s archers

Jaime was preparing for a long journey, on official Lannister business. Knowing that things would likely come to swords, he assembled a small force to accompany him. He had some time to take special care when outfitting his troops, so he decided to run an experiment with two different kinds of bows.

Normally, Lannister forces use bows made from Northwood, as common knowledge says they are stronger than Southwood bows. However, since Jaime is a Southerner, he prefers Southwood bows out of pride and convenience. He wanted to see if Southwood bows cause a difference in archer range.

So, he chose three archers randomly from the kingdom’s forces, who each would shoot a Northwood bow and a Southwood bow 10 times. He bullied some local city-folk into recording the distance of the shots, in meters. Some of the resulting data is shown below:

The other two archers were named Fabiar Darklyn and Margan Drumm. Jaime chose to fit the following linear regression model to his experiment’s data: \[
Y_{ik} = \mu + \tau\mathbb{1}(i = 2) + \epsilon_{ik}
\] Above, \(Y_{ik}\) is the \(k\)-th shot from bow \(i = 1,2\), and the Southwood bow is considered to be the second bow. The parameters \(\mu\) and \(\tau\) represent (respectively) the mean shot distance with the Northwood bow and the mean effect of switching to the Southwood bow. The final term \(\epsilon_{ik}\) is the random error (the Winds of Westeros, perhaps) contributing to the \(k\)-th shot from the \(i\)-th bow. There are \(3\times 10 = 30\) shots per bow, since each marksman shoots each bow 10 times. The errors are independent shot-to-shot and Normally distributed with constant variance \(\sigma^2_e\). Here’s the summary of Jaime’s analysis:

We see that when the archers switched to a Southwood bow, their shot distances decreased by about 10 meters, on average. However, Jaime realized his p-value was not below that magical 5% threshold! He strutted merrily out of the experiment grounds, confident that his results must be due to unfair winds. (Clearly, Jaime was absent for some important Statistical Lectures of the Crown.)
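For readers who want to follow along, here is a minimal sketch of how a fit like Jaime’s could be run in R. The data are simulated, not Jaime’s: the archer labels, the 150-meter baseline, the archer offsets and the noise level are all assumptions made for illustration, and only the 10-meter Southwood drop is chosen to echo the summary above. The column names mirror the ones used in the lme4 call later in the post.

```r
set.seed(1)

# Simulated stand-in for Jaime's first experiment (NOT the post's data).
# Assumed values: 150 m baseline, 10 m Southwood drop, archer offsets, 5 m noise.
archer_offset <- c(A = -30, B = 0, C = 30)
shots <- expand.grid(Shot_ = 1:10,
                     Bow_  = c("Northwood", "Southwood"),
                     Name_ = names(archer_offset))
shots$Distance_ <- 150 -
  10 * (shots$Bow_ == "Southwood") +
  unname(archer_offset[as.character(shots$Name_)]) +
  rnorm(nrow(shots), sd = 5)

# Jaime's model: bow effect only, archers ignored
fit_bow <- lm(Distance_ ~ Bow_, data = shots)
coef(summary(fit_bow))   # estimate, std. error, t value and p-value per term
```

The `Bow_Southwood` row of the coefficient table plays the role of \(\tau\), and the `(Intercept)` row plays the role of \(\mu\).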

“A most excellent result”

Adding archer terms

He was, of course, stopped by his brother Tyrion, who observed the whole experiment. “Now Jaime, can’t you see that you chose archers of wildly different statures? Only a fool would not account for this in his analysis.” Grumpily, Jaime re-fit his model with additional terms for the archers: \[
Y_{ijk} = \mu + \tau\mathbb{1}(i = 2) + \alpha_j + \epsilon_{ijk}
\] Above, \(Y_{ijk}\) is the \(k\)-th shot (now \(k\) goes from 1 to 10) from the \(j\)-th archer when he is using the \(i\)-th bow. The \(\alpha_j\) parameters represent archer \(j\)’s baseline skill. Here were Jaime’s results, using the new model:

(A small detail you’ll notice above is that there is no estimate for the 1st archer effect. The estimate of that effect is couched in the Intercept term; to predict shot distance from the 1st archer, we simply leave out all archer terms).

After accounting for the differences between archers, the estimated effect of the Southwood bow became extremely statistically significant. These results weren’t as pleasing to Jaime, who stormed past a smirking Tyrion. “Well? What read your results?” he heard as he slammed the training field gate behind him.

So what happened? In truth, there are two separable sources of error around the bow effect: per-shot random error, and variance among archer shooting strengths. Jaime (well, actually Tyrion) controlled for the latter in the second model by including archer effects.

“I have some experience in these matters”

If you’ve never encountered this statistical situation before, a visualization can help illustrate why it’s important to include archer effects, or in general, “experimental unit” effects. Here’s what Jaime’s data look like, without knowledge of the particular archer:

The red shots look somewhat above the blue shots, but something’s fishy. The shot points seem to be layered, somehow. Now let’s shape the points differently for different archers:

Now we can see clearly that the red shots are usually higher than the green shots, and the shots cluster around distinct locations that depend on the archer. By including archer terms in our model, we’re able to estimate the noise variance around the archer centers, rather than the overall center. This is beneficial from (at least) two standpoints:

Interpretation: the variance due to archer skill is clearly not random error (Winds of Westeros), so a model that does not account for the marksman will give a highly inaccurate estimate of the per-shot variance.

Estimation of \(\tau\): Loosely speaking, we are better able to “see” the effect of the Southwood bow when we separate the shots by archer.
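Both points can be checked numerically. The sketch below fits the bow-only model and the bow-plus-archer model to the same simulated data and puts the bow estimate, its standard error and the residual standard deviation side by side. As before, the data and every numeric value in the simulation are assumptions for illustration, not Jaime’s results.

```r
set.seed(2)

# Simulated stand-in data (assumed values, not the post's data)
archer_offset <- c(A = -30, B = 0, C = 30)
shots <- expand.grid(Shot_ = 1:10,
                     Bow_  = c("Northwood", "Southwood"),
                     Name_ = names(archer_offset))
shots$Distance_ <- 150 -
  10 * (shots$Bow_ == "Southwood") +
  unname(archer_offset[as.character(shots$Name_)]) +
  rnorm(nrow(shots), sd = 5)

fit_pooled <- lm(Distance_ ~ Bow_, data = shots)          # archers ignored
fit_archer <- lm(Distance_ ~ Bow_ + Name_, data = shots)  # archer terms added

bow_row <- "Bow_Southwood"
comparison <- data.frame(
  model    = c("bow only", "bow + archer"),
  tau_hat  = c(coef(fit_pooled)[[bow_row]], coef(fit_archer)[[bow_row]]),
  tau_se   = c(coef(summary(fit_pooled))[bow_row, "Std. Error"],
               coef(summary(fit_archer))[bow_row, "Std. Error"]),
  resid_sd = c(summary(fit_pooled)$sigma, summary(fit_archer)$sigma)
)
print(comparison)
```

Because every archer shoots each bow the same number of times, the two fits return the same bow estimate. What changes is the residual standard deviation, which no longer absorbs archer-to-archer differences, and with it the standard error of the bow effect.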

After cooling down a little, Jaime realized that (despite not achieving the result he desired) his findings could be of use to the Lannister army. He decided to run a larger experiment with 50 archers to solidify the results. Here was the outcome:

Having disregarded much of his scientific education, Jaime felt quite overwhelmed with this massive list of parameter estimates. In particular, he was confused by the few archers without a statistically significant estimate. Should he re-run the experiment without them?

(If you’re wondering why all the estimated standard errors for the archer effect estimates are the same, it’s because they all fired the same number of shots. Intuitively, this means we should have the same amount of certainty in each estimate.)

This time, Cersei happened to be hanging out around the training grounds. Overhearing his concerns, she decided to step in. “Jaime, don’t trouble yourself with those worrisome parameter estimates for the archer. I looked at your charts from the first experiment. It is certainly proper to account for archers in the model, but the only quantity we should care about right now is the variance of their skills.” Later that afternoon she developed Westeros’s first linear mixed-effects model, in which archer effects are considered to be random draws from a larger population.

“With these equations I shall secure the reputation of the Lannister name in ways my father would never have dreamed”

Formulation of the mixed-effects model

Cersei’s key observation is that Jaime chose 50 archers out of the large population of archers in and around King’s Landing. Furthermore, he did so in a random fashion, caring only to distribute the bow test across a representative population of archers. In this sense, the archer effects are random. We can incorporate this in the model, by treating them as Normal random variables with a common, fixed variance: \[
Y_{ijk} = \mu + \tau\mathbb{1}(i = 2) + \alpha_j + \epsilon_{ijk}\qquad\qquad\alpha_j\sim\mathcal{N}(0, \sigma^2_a)
\] The \(\alpha_j\)’s are no longer of direct interest to estimate. They exist simply to separate out the contribution of marksman skill to per-shot variance. The “variance component” \(\sigma^2_a\) is now the archer parameter of interest.

It is important to realize that, though we model the \(\alpha_j\)’s as random, they do not change shot-to-shot. From a physical perspective, they are still fixed features of the archer. We regard them as random only to focus on the underlying distribution of archer skill, and its effect on the predictive model.
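One consequence of this formulation is worth writing out, since it is what lets the model separate archer skill from per-shot noise. Assuming, as is standard for this model (though I have not stated it explicitly above), that the \(\alpha_j\)’s are independent of the \(\epsilon_{ijk}\)’s, the variance of a single shot splits into two pieces, and two distinct shots are correlated only when they come from the same archer: \[
\mathrm{Var}(Y_{ijk}) = \sigma^2_a + \sigma^2_e,
\qquad
\mathrm{Cov}(Y_{ijk}, Y_{i'j'k'}) =
\begin{cases}
\sigma^2_a & \text{if } j = j' \text{ and } (i,k) \neq (i',k') \\
0 & \text{if } j \neq j'
\end{cases}
\] So the share of shot-to-shot variance attributable to archers is \(\sigma^2_a / (\sigma^2_a + \sigma^2_e)\), which is also the correlation between two distinct shots from the same archer.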

Random-effects vs. Bayesian randomness

Wait, if the parameters are random, isn’t this a Bayesian model? This is a reasonable question. One easy answer is that the bow effect is fixed, so it doesn’t make sense to give some parameters a Bayesian treatment and not others. But there are mixed-effects models that have no fixed parameters (usually termed “random-effects” models), so the distinction is more fundamental.

Here’s one perspective on the difference. As mentioned above, the parameters in a mixed model get their randomness from the unit (marksman) sampling process. In this sense, they are objectively random, because there is a real population (large enough to be considered infinite) of parameters being sampled from. In a Bayesian model, however, parameters are subjectively random in the sense that their prior and posterior distributions are meant to quantify our personal uncertainty about their values.

These are distinct conceptions of randomness. In fact, it is possible to model the random effects in a mixed-effects model as, additionally, subjectively random with respect to their values. Thus, we can fit either fixed-effects or mixed-effects models with fully Bayesian approaches. The choice to employ a mixed-effects model in the first place still exists in the Bayesian framework, and depends on all the same aspects of our modeling goals.

Let’s map out the Bayesian approach to our two types of models. Look back at the full fixed-effects model we considered for the bow experiments. In the most straightforward Bayesian framework, we would specify a conjugate set of prior distributions: Normal for \(\mu\), \(\tau\), and the \(\alpha_j\)’s, and Inverse Gamma for \(\sigma^2_e\).

For the full mixed-effects model, however, it sort of looks like we already have a prior on the \(\alpha\)’s. I would argue that this is an abuse of the term “prior”. The Normal distribution on the \(\alpha\)’s still represents the sampling uncertainty in the model. In our Gibbs sampler, it is used to update the \(\alpha\)’s, but if we store the \(\alpha\)’s, what we end up with should not be interpreted as a posterior distribution of a fixed parameter.

This is not merely a semantic difference: in the Bayesian approach to our mixed-effects model, we would need a prior on \(\sigma^2_a\), since that is the archer parameter of interest. In the Bayesian fixed-effects model, we gave the \(\alpha\)’s Normal priors directly, and \(\sigma^2_a\) was a fixed hyperparameter.
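To make the contrast concrete, here is one way the two Bayesian specifications could be written side by side. The hyperparameter symbols (\(m\), \(s^2\), \(a\), \(b\) and their subscripts) are placeholders I am introducing for illustration, and the Inverse Gamma prior on \(\sigma^2_a\) is just one conventional choice, not something dictated by the discussion above. Both versions share the likelihood and the priors on \(\mu\), \(\tau\) and \(\sigma^2_e\): \[
\begin{aligned}
Y_{ijk} \mid \mu, \tau, \alpha_j, \sigma^2_e &\sim \mathcal{N}\left(\mu + \tau\mathbb{1}(i = 2) + \alpha_j,\ \sigma^2_e\right) \\
\mu \sim \mathcal{N}(m_\mu, s^2_\mu), \quad \tau &\sim \mathcal{N}(m_\tau, s^2_\tau), \quad \sigma^2_e \sim \text{Inverse Gamma}(a_e, b_e)
\end{aligned}
\] They differ only in how the archer terms are handled: \[
\begin{aligned}
\text{Bayesian fixed-effects:} \quad & \alpha_j \sim \mathcal{N}(0, \sigma^2_a), \quad \sigma^2_a \text{ a fixed number chosen in advance} \\
\text{Bayesian mixed-effects:} \quad & \alpha_j \mid \sigma^2_a \sim \mathcal{N}(0, \sigma^2_a), \quad \sigma^2_a \sim \text{Inverse Gamma}(a_a, b_a)
\end{aligned}
\] In the first version the Normal distribution on \(\alpha_j\) is a prior in the usual sense. In the second it describes the sampling of archers, and the prior proper sits one level up, on \(\sigma^2_a\).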

But the semantic distinctions are important too. Though I’m no expert in Bayesian analysis, I feel like it would be inappropriate, for instance, to talk about Bayes Factors or credible intervals associated with the \(\alpha\)’s, since from a modeling perspective, we have decided not to treat them as values to estimate or do inference upon.

When not to use a mixed-effects model

Aren’t we always picking experimental units out of a larger population? What’s special about this situation? We are, but we don’t always do so randomly. Often, we care about particular units of interest that we knew about before the experiment, so we pick them directly. There’s nothing random about this process. Also, if we have that sort of interest in the units, usually estimating their effects directly is an important goal of the analysis.

For instance, Jaime might have also cared to estimate the skills of a few archers he knew about. In this case, he’d have probably hand-picked fewer archers (like in his first experiment), and had them take more shots. As a more realistic example, consider a small school district doing some sort of standardized test analysis on their three middle schools. There will be many, many observations from each middle school, and the district is not necessarily interested in considering the larger population of middle schools. Their aim is to build a predictive model for their particular district, and do inference on parameters for their schools only.

Fitting the mixed-effects model

Alright, enough philosophy. What can mixed-effects modeling actually do for us, from an inferential or decision-based perspective? First, let’s fit the model to Jaime’s 2nd set of experimental data:

library(lme4)  # provides lmer()
lme2 <- lmer(Distance_ ~ Bow_ + (1 | Name_), data = JaimeExp2$Data)  # (1 | Name_): random intercept per archer

The switch to mixed-effects modeling shifts the focus from inference on the individual archer to inference on the population of archers. This focus is important in applications from many fields, like biology (where observations may come from many organisms), economics (where observations may come from many regions or agents), or clinical trials (where observations may come from many different labs). In our example, it makes sense for Cersei to want to learn about the variance in city-wide marksman skill and separate it from random error. The estimate \(\hat\sigma^2_a\) gives her precisely this, and she won’t get that from the fixed-effects model:

The output above gives the estimates for \(\sigma_a\) and \(\sigma_e\), and the associated confidence intervals.
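For anyone wanting to reproduce this kind of output, here is a rough sketch using simulated data in place of Jaime’s second experiment. The 50 archers and 10 shots per bow follow the story, but the true values used in the simulation (\(\sigma_a = 20\), \(\sigma_e = 5\), a 150-meter baseline and a 10-meter Southwood drop) are my own assumptions. The variance components come from `VarCorr()`, and `confint()` is asked for profile intervals on the variance parameters only.

```r
library(lme4)
set.seed(3)

# Simulated stand-in for the 50-archer experiment (assumed true values)
n_archers <- 50
alpha <- rnorm(n_archers, mean = 0, sd = 20)   # archer effects, sigma_a = 20
sim <- expand.grid(Shot_ = 1:10,
                   Bow_  = c("Northwood", "Southwood"),
                   Name_ = paste0("archer_", seq_len(n_archers)))
sim$Distance_ <- 150 -
  10 * (sim$Bow_ == "Southwood") +
  alpha[as.integer(sim$Name_)] +
  rnorm(nrow(sim), sd = 5)                     # per-shot noise, sigma_e = 5

fit <- lmer(Distance_ ~ Bow_ + (1 | Name_), data = sim)

# Variance components: the Name_ row is sigma_a, the Residual row is sigma_e
vc <- as.data.frame(VarCorr(fit))
print(vc[, c("grp", "vcov", "sdcor")])

# Profile confidence intervals for the standard deviations only
ci <- confint(fit, parm = "theta_", oldNames = FALSE)
print(ci)
```

The `sdcor` column is on the standard deviation scale and `vcov` is on the variance scale, so the two rows of `sdcor` are the estimates of \(\sigma_a\) and \(\sigma_e\).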

Using the model to choose archers

“But wait,” Jaime might protest, “what if I care about the particular archers I happened to put to task? I now accept that Northwood bows are the obvious choice, but how does our model help me choose which men to take on my journey? At least with the fixed effects model I had parameter estimates!”

Actually, a mixed-effects model can rank the units (archers) in a functionally identical way. Once the model is fit, we can calculate the conditional expectation of a particular archer’s effect, given his experimental data. This can operate as a skill score for the archer: \[
\text{Score for archer }j = \mathbb{E}\left[\alpha_j | \bar y_{j}\right]
\] Above, \(\bar y_{j}\) is the \(j\)-th archer’s average shot distance. Remember, in the original mixed-effects model, each \(\alpha_j\) is random, with mean zero. So taking expectations shouldn’t seem useful, at first. However, knowledge of our experimental data biases our expectation of \(\alpha_j\), depending on where the data fall relative to the overall mean. The above quantity is often taken as a “predictor” of \(\alpha_j\), and archers with higher \(\bar y_{j}\)’s have higher predictions. As it turns out, the predictions are perfectly correlated with the parameter estimates from the fixed-effects model:
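Here is a sketch of how the scores could be pulled out of a fitted model and compared with the fixed-effects estimates. The data are again simulated under assumed values (not Jaime’s). In lme4 the predicted archer effects are returned by `ranef()`. For a balanced design like this one, with \(n\) shots per archer, the score works out to the archer’s centered average shrunk toward zero by the factor \(\hat\sigma^2_a / (\hat\sigma^2_a + \hat\sigma^2_e / n)\), and the last lines check that against the `ranef()` output.

```r
library(lme4)
set.seed(4)

# Simulated stand-in data (assumed true values, not the post's data)
n_archers <- 50
shots_per_archer <- 20                          # 10 shots with each bow
alpha <- rnorm(n_archers, mean = 0, sd = 20)
sim <- expand.grid(Shot_ = 1:10,
                   Bow_  = c("Northwood", "Southwood"),
                   Name_ = paste0("archer_", seq_len(n_archers)))
sim$Distance_ <- 150 -
  10 * (sim$Bow_ == "Southwood") +
  alpha[as.integer(sim$Name_)] +
  rnorm(nrow(sim), sd = 5)

mixed <- lmer(Distance_ ~ Bow_ + (1 | Name_), data = sim)
fixed <- lm(Distance_ ~ Bow_ + Name_, data = sim)

# Skill scores: predicted archer effects from the mixed model
score <- ranef(mixed)$Name_[, "(Intercept)"]

# Fixed-effects archer estimates; the first archer is the baseline (0)
name_terms <- grep("^Name_", names(coef(fixed)))
fixed_est <- c(0, unname(coef(fixed)[name_terms]))

print(cor(score, fixed_est))

# The same scores by hand: shrunken, centered archer averages
vc <- as.data.frame(VarCorr(mixed))
var_a <- vc$vcov[vc$grp == "Name_"]
var_e <- vc$vcov[vc$grp == "Residual"]
shrink <- var_a / (var_a + var_e / shots_per_archer)
archer_mean <- tapply(sim$Distance_, sim$Name_, mean)
by_hand <- as.numeric(shrink * (archer_mean - mean(sim$Distance_)))
print(shrink)
```

Since every archer’s average is shrunk by the same factor here, the ranking of archers is the same under both models. With unequal numbers of shots per archer the shrinkage would differ by archer, and the two rankings could disagree.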