This post was jointly authored with Prof Sebastian Reich, University of Potsdam. An edited version will appear on the Cambridge University Press blog page.

Computer-generated forecasts play an important role in our daily lives, for example in predicting the weather or the economy. Forecasts combine computational models of the relevant dynamical processes with measured data. Errors are always present, arising from incomplete observations and imperfections in the models, so forecasts must be continually calibrated against new data. In the geosciences, this process is called data assimilation. The introduction of probabilistic approaches to forecasting and data assimilation, facilitated by powerful supercomputers, has been a major breakthrough in the field.
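To give a flavour of the calibration step, here is a minimal sketch (not any operational system) of how a forecast and a new observation can be blended, each weighted by its uncertainty. This is a one-dimensional Kalman-style update, and all numbers are made up for illustration.

```python
# Minimal sketch of the calibration step in data assimilation:
# blend a model forecast with a new observation, weighting each
# by its uncertainty. All numbers are illustrative.

forecast, forecast_var = 22.0, 4.0   # model predicts 22 C, variance 4
observation, obs_var = 24.0, 1.0     # sensor reads 24 C, variance 1

# Gain: how much to trust the observation relative to the forecast.
gain = forecast_var / (forecast_var + obs_var)

# Updated ("analysis") estimate and its reduced uncertainty.
analysis = forecast + gain * (observation - forecast)
analysis_var = (1.0 - gain) * forecast_var

print(analysis)      # 23.6 -- pulled toward the more certain observation
print(analysis_var)  # 0.8  -- combining sources reduces uncertainty
```

The key point is that the corrected estimate always lies between the forecast and the observation, closer to whichever is more certain, and its uncertainty is smaller than either input's.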

Why are forecasts made in terms of probabilities?

The classical idea of a forecast is a prediction of the precise value of something (e.g., temperature) at a particular point in space and time. This is referred to as a deterministic forecast. In a probabilistic forecast, the prediction is instead expressed as a probability. For example, a forecast could say that there is a 60% chance that the temperature will fall between 21 and 25 degrees Celsius, a 20% chance that it will be below this range, and a 20% chance that it will be above.
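In practice, such probabilities are often derived from an ensemble: the model is run many times from slightly perturbed starting conditions, and the fraction of runs landing in each range gives the forecast probability. A hedged sketch, using a synthetic ensemble rather than any real model output:

```python
import numpy as np

# Hypothetical ensemble of 10,000 temperature forecasts (degrees C),
# standing in for model runs started from perturbed initial conditions.
rng = np.random.default_rng(seed=0)
ensemble = rng.normal(loc=23.0, scale=2.0, size=10_000)

# Turn the ensemble into a probabilistic forecast over three outcomes:
# the fraction of members below, within, and above the 21-25 C range.
p_below = np.mean(ensemble < 21.0)
p_within = np.mean((ensemble >= 21.0) & (ensemble <= 25.0))
p_above = np.mean(ensemble > 25.0)

print(f"P(T < 21 C)        = {p_below:.2f}")
print(f"P(21 C <= T <= 25 C) = {p_within:.2f}")
print(f"P(T > 25 C)        = {p_above:.2f}")
```

The three probabilities sum to one, and the forecast communicates not a single number but how confident the system is about each outcome.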

I’m mainly posting this because I’m getting tired of explaining it repeatedly! There are plenty of other, better-written articles on this topic, but they don’t make the combination of points that I would like to make. This post is about correlation and what you can and can’t use it for. More generally, it is about being careful when drawing conclusions from data.

It is very easy to jump to conclusions when we see changes in the world around us. For example, you might be looking at the success of a particular vaccine in protecting populations from a particular disease. Let’s say that in the countries where the population is given this vaccine, disease levels are lower.