Functional regression, INLA, and how to build the World’s Slowest Fourier Transform

[This was originally going to be about multilevel modelling using INLA, but to do that flexibly you need INLA’s observation matrix mechanism, and this one deserves an explanation in its own right, so I’m introducing it by way of an easier example (for which you don’t actually need the observation matrix mechanism, but it’s clearer that way and it generalises better). For more on INLA, see the INLA website or this previous post.]

In Ramsay and Silverman’s Functional Data Analysis there’s a chapter devoted to doing regression when what you are trying to predict is a scalar quantity and what you have as covariate (input) is a function. For example, you might have measured someone’s heart rate during an effort test (which can be thought of as , a function of time), and you want to predict their performance in a marathon. The model they suggest is a functional linear regression, which is simply:

with the usual assumption that . You also need to constrain in some way (otherwise you are guaranteed to always get a perfect fit), and they choose to force to be smooth, which makes sense in a lot of contexts. In the marathon example, maybe what matters is not heart rate at the beginning of the effort test, but heart rate after a few minutes when subjects have reached a steady-state of sorts, so that would be small for around 0 and gradually grow, but we don’t expect it to suddenly jump at some .

The linear functional regression problem actually comes up quite a bit: in neuroscience and psychology, reverse correlation and classification images problems can be viewed as more or less exactly the same thing (trying to predict a neuron’s linear response from an audio signal is exactly the marathon example, with a different likelihood function). In other fields, it might be known as a signal reconstruction problem, or as an integral equation problem. In signal reconstruction we know think of as a signal to be reconstructed from measurements, and the ‘s are measurement functions. If , then we have sampled at locations . If , and is an indicator function, we have measured the average value of over a sequence of intervals . If the measurement functions are Fourier polynomials, we are measuring in Fourier space (apparently this happens in some real physical systems).

So suppose we have measured a bunch of Fourier coefficients. If the measurements are noisy we can’t just use the inverse Fourier transform, there is some uncertainty involved. The Bayesian way would be to set a prior on , and look at . If the prior can be expressed as a Gauss-Markov random field, which is the case for smoothness priors, then we can do inference using INLA. It’ll be useful to describe the framework in full generality first.

In latent Gaussian models you have, on the one hand, a set of latent variables, which we’ll call , and on the other a set of observations, . The latent variables z have a prior distribution that is Gaussian, hence the name. INLA has the further constraint that the ‘s are independently distributed given a specific linear combination of the ‘s. The whole thing can be summarised as:

(the prior)

and:

(the likelihood)

where is some distribution parameterised by , for example a Gaussian with mean . Most often will define what sort of observation we are making on datapoint , and it is useful to stack all those vectors in an observation matrix , and to define:

Usually is the identity matrix, and that is indeed the default in INLA, but modifying greatly expands the class of models that can be fitted. In our case we can take the latent field to be the values of at different time points, , arranged on a regular grid. Each of our measurements corresponds to a noisy version of , which we can approximate as:

so that (that’s not the optimal way of doing quadrature, but for fine grids it’s good enough).

Here’s a concrete example. We take to be a relatively complicated function over :

Our measurement functions will be cosines at various frequencies and phases

To get the equivalent of a discrete Fourier transform we can restrict the ‘s to be integers, and the to be either 0 (cosine phase) or (sine phase).

We construct the matrix by stacking row-wise, so that A looks something like that when plotted as a heatmap:

This bit uses a “rw2” model to enforce smoothness, and a variable “ind” which corresponds to the time index (it corresponds to in the expressions above). There are two notable changes compared to normal INLA calls. First, data, which is usually a data.frame, is now a list, because has a different length from the index variable. Second, the interpretation of the formula changes. The left hand-side defines the response as usual, but the right-hand side now defines the latent field . This makes the notation a tad inadequate, because the terms are not in one-to-one correspondence anymore. It probably should be , or something similar. I’m putting this out there in case INLA developers happen to read this post (*).

We take measurements for a sequence of (non-integer) from 0 to 15, and a noise level , and run the INLA command shown above. Below, the measurement values (blue dots), along with the posterior expectations and the usual 2.5% and 97.5% quantiles.

And here is the estimated signal

If you remove measurement noise, you’ll have the world’s slowest discrete Fourier transform! It’s actually interesting to play around with this example a bit, because there are plenty of ways to make it go completely wrong (for example when removing sine measurements).

(*) Not only would this notation be clearer, but it would also make it possible to have things like

that’d define two latent fields with different dimensions. On the other hand, I imagine it would be a lot harder to parse.