The following selection is intended to illustrate the key research areas that formed the focus of my ecological research in the years 1998 (when I began my PhD) to 2015 (when I joined DeepMind). The selected papers are very much not what most would consider my most prestigious. Rather, each selected paper is what I consider to be the best exemplar of a given cluster of research. The other papers in each cluster can be found via my Google Scholar profile.

Emergent global patterns of ecosystem structure and function from a mechanistic general ecosystem model

Whilst I was at Microsoft Research, our group head Stephen Emmott set us a seemingly ridiculous challenge: model the future of all life on Earth. Ten or so person-years of work later, we had created the Madingley Model, which simulates the key aspects of the ecology of most of the multicellular animals on Earth (vegetation is modelled using the carbon model outlined below). With each tick of the model, around ten thousand trillion interacting animals eat, get eaten, metabolize, grow, shrink, give birth, move, and die, all based on ecological rules specified at the level of the individual. What emerges are patterns that show a good match to our best understanding of real ecosystems, from the lifespan and reproductive rates of individuals, to trophic pyramids at one location, to regional, global and seasonal trends in biomass and functional diversity. Hopefully, the Madingley Model will encourage others to begin to build General Ecosystem Models of this sort, which will one day help scientists and others to understand and manage the biosphere, on which we all depend for our survival. The Madingley Model can be run at any spatial scale and is open source (the code is also currently being re-engineered to make it easier to use), and we are seeing the beginnings of a community of users and contributors. The animal component of the Madingley Model has yet to be rigorously trained and tested against data, but several scientists are currently planning how to make this happen.

The climate dependence of the terrestrial carbon cycle, including parameter and structural uncertainties

Smith, Purves, Vanderwel, Lyutsarev, Emmott; Biogeosciences, 2013.

Every year the biosphere circulates more than ten times as much carbon as we emit through fossil fuels. In recent years, the net result has been that the biosphere has sucked up half of our emissions, doing us an enormous favour in slowing climate change. But will this beneficial effect continue, increase, or reverse in the future? Worryingly, current global vegetation models disagree about the answer to this question, so much so that the natural carbon cycle is one of the largest sources of uncertainty in global climate change projections. The models disagree in part because they are complex, and so they are impossible to constrain properly against data … right? We tried to prove otherwise, by using a Bayesian approach to train such a model against a diverse array of publicly available data. The result was a model where, uniquely, it was possible to trace every parameter value back to data, and to rigorously include parameter uncertainty in the model projections. At the same time, the functional forms and parameter distributions can be viewed as a summary of our current understanding of the climate dependence of the terrestrial carbon cycle, hence the title. Before we get carried away, it is important to recognize that the model is missing some of the key complexities that are considered (but not statistically proven) to be important in state-of-the-art carbon predictions. In particular, it lacks a direct effect of CO2 on photosynthesis. Still, the work did prove that it is possible to train and test non-linear Earth System Model components against data, something that we will hopefully see more of in coming years.

A succinct definition of ecology is ‘the study of how the interaction among organisms and their environment leads to the distribution and abundance of those organisms’. And yet, there are almost no cases where ecologists have been able to answer this question for an actual species or group of species for an actual region on Earth. One of the very few exceptions is this paper by my previous postdoc Mark Vanderwel. Mark combined the forest modelling approach that I had been involved with at Princeton (see below) with the latest forest inventory data and some serious computational statistics. The resulting model had the following amazing property. You could spread all of the tree species over the whole of the Eastern US. Then, as the millions of individual trees grew and shaded each other in different physical conditions, over a simulated time span of 500 years, they separated themselves across the continent in approximately the way that they have done in reality: cold-adapted trees in the north, early-successional pines giving way to late-successional deciduous species in the south, and so on. Mark then went on to kill various processes in the model, to diagnose which features were responsible for this environmental sorting. For me, this represented nothing less than a true step advance in the state of the art in ecology. I know of nothing like it. If we can do this for US trees, can we eventually do it for many other species in many other places? It remains to be seen.

Predicting and understanding forest dynamics using a simple tractable model

Purves, Lichstein, Strigul, Pacala; PNAS, 2008.

Forests are obviously very important ecosystems. They harbour two thirds of terrestrial biodiversity, store as much carbon as is in the atmosphere, and provide fuel, timber and livelihoods for billions of people. What is less obvious is that forests are a wonderful ecosystem to study for a theoretically-minded ecologist. Unlike animals, trees stay still to be counted. Unlike most other plants, they live a long time, and many record their growth histories in tree rings. Moreover, during the 20th century foresters and forest ecologists arrived at a surprisingly coherent view of the fundamentals that drive forest ecosystems, in particular the asymmetry of how larger trees shade smaller trees (but not vice versa). This understanding led to some revolutionary simulation models of forest ecosystems in the late 20th century.
Working in the group of one of the pioneers of this class of forest model, Steve Pacala, my own contribution was to help develop a vastly simplified model (the PPA model) that could nonetheless capture the dynamics of these models, and then to rigorously train and test this model against real data. The key point here was that the model was trained only against short-term measurements taken from individual trees, but the model predictions were evaluated over 10- to 100-year timescales for whole forests. Moreover, the model was even mathematically tractable, allowing us to generate two simple metrics, Z* and H20, that could predict the competitive ability of a given species for, respectively, the late- vs early-successional niche. I feel very privileged to have been involved in generating this kind of end-to-end understanding of an actual, real ecosystem. However, this understanding was restricted to the location from which the data were taken. Later on, in a truly groundbreaking piece of work, Mark Vanderwel extended this approach to model how forest dynamics respond to continental-scale gradients in temperature and precipitation (see above).
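The one-sided shading that underlies this class of model can be illustrated with a few lines of code. This is only a rough sketch under my own simplifying assumptions (flat-topped crowns, and trees filling the canopy strictly from the tallest downwards); it is not the published PPA implementation. The function names and the example numbers are hypothetical, and the stand-level closure height computed here should not be confused with the species-level Z* metric.

```python
def canopy_closure_height(trees, ground_area):
    """Return the height at which the canopy closes.

    `trees` is a list of (height_m, crown_area_m2) tuples. Trees are added
    from the tallest downwards until their summed crown area covers the
    ground area; the height of the tree that completes the cover is
    returned. If the crowns never cover the ground, the canopy is open and
    0.0 is returned (every tree is in full sun).
    """
    cumulative_crown_area = 0.0
    for height, crown_area in sorted(trees, reverse=True):
        cumulative_crown_area += crown_area
        if cumulative_crown_area >= ground_area:
            return height
    return 0.0


def in_canopy(tree_height, closure_height):
    """Taller trees shade shorter ones, never the other way round."""
    return tree_height >= closure_height


# Hypothetical stand: four trees on a 100 m^2 plot.
stand = [(30.0, 40.0), (25.0, 40.0), (20.0, 40.0), (10.0, 40.0)]
closure = canopy_closure_height(stand, ground_area=100.0)
sunlit = [h for h, _ in stand if in_canopy(h, closure)]
```

In a sketch like this, the only thing a tree needs to know about its neighbours is a single number, the closure height, which gives a feel for why a model built on this kind of rule can stay simple enough to analyse.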

During the time we shared at Microsoft Research, Rich Williams and I developed a new way to model and understand food webs: ecological networks describing what eats what in natural ecosystems. Our approach combined Rich’s previous thinking about generative models of food webs with my experience in fitting such models to data. As a result, we could estimate, for each species, its position in an n-dimensional ‘niche space’, as well as the position of its feeding optimum within this same space. By selectively imposing various top-down constraints on the model webs (e.g., that the feeding optimum must be below the position of the species) as applied to over 50 real food webs, and comparing the loss of fit conferred by such constraints, we could assess which of the usual assumptions about food web structure real webs actually satisfy. Strikingly, only a small minority of the webs conformed to all of the usual assumptions. Rich later went on, with my CEES colleague Lucas Joppa, to apply the same approach to bipartite networks of plants / pollinators, and plants / herbivores, in both cases providing yet more insights into the underlying structure of the networks. Separately, working with an undergraduate intern, Jordan Ryda, I was able to apply a very similar approach to protein-protein interaction networks, resulting in a model that could predict interactions with an AUC of over 70% when trained on only a fifth of the data, and providing some suggestive patterns in the positions of the proteins in niche space.
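To make the idea of fitting niche positions and feeding optima to an observed web a little more concrete, here is a minimal sketch of the kind of likelihood involved. The kernel used here (a link probability that decays with distance between the prey's position and the consumer's feeding optimum) and the parameters `alpha`, `gamma` and `width` are assumptions chosen for illustration, as are all the function names and the toy three-species web; the papers should be consulted for the forms actually used.

```python
import math


def link_probability(prey_pos, optimum, width, alpha=0.999, gamma=2.0):
    """Probability that a consumer eats a prey species.

    `prey_pos`, `optimum` and `width` are sequences with one entry per
    niche dimension. The probability peaks at `alpha` when the prey sits
    exactly on the consumer's feeding optimum and decays with distance.
    """
    distance = 0.0
    for p, c, w in zip(prey_pos, optimum, width):
        distance += abs((p - c) / (w / 2.0)) ** gamma
    return alpha * math.exp(-distance)


def log_likelihood(web, positions, optima, widths):
    """Bernoulli log-likelihood of an observed web under the model.

    web[i][j] is 1 if species i eats species j, and 0 otherwise.
    """
    total = 0.0
    for i, row in enumerate(web):
        for j, eats in enumerate(row):
            p = link_probability(positions[j], optima[i], widths[i])
            # Clamp to avoid log(0) when the probability underflows.
            p = min(max(p, 1e-12), 1.0 - 1e-12)
            total += math.log(p) if eats else math.log(1.0 - p)
    return total


# Toy one-dimensional example: a three-species food chain.
positions = [[0.1], [0.5], [0.9]]   # where each species sits in niche space
optima = [[0.0], [0.1], [0.5]]      # where each species prefers to feed
widths = [[0.05], [0.2], [0.2]]     # breadth of each feeding range
chain = [[0, 0, 0], [1, 0, 0], [0, 1, 0]]      # 1 eats 0, 2 eats 1
reversed_chain = [[0, 1, 0], [0, 0, 1], [0, 0, 0]]

fit_chain = log_likelihood(chain, positions, optima, widths)
fit_reversed = log_likelihood(reversed_chain, positions, optima, widths)
```

A constraint such as "the feeding optimum must lie below the species' own position" can then be imposed during fitting, and the resulting drop in the maximised log-likelihood compared with the unconstrained fit.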

Is ‘peak N’ key to understanding the timing of flowering in annual plants?

Guilbaud, Dalchau, Purves, Turnbull; New Phytologist 2015.

This paper, which was built on years of research with my close colleague Lindsay Turnbull, introduces a simple model that describes the growth and development of an optimal annual plant, and argues that this simple model ties together a great many otherwise disparate observations about the behaviour of Arabidopsis thaliana, the lab-rat of plant biology. Every year there are tens of thousands of papers about Arabidopsis, but almost without exception this research is focussed on small details about how tiny sub-systems of the overall plant work. In contrast, this paper introduces a simple, system-wide model that describes what we think the whole plant is actually trying to achieve. The focus of the paper is on flowering, but the model predicts the root, leaf and flower / seed biomass at every point from germination to senescence, and how all of that responds to nutrients, pot volume, and in principle, light, temperature, humidity and CO2. In unpublished work we experimented with fitting the model to time series data, as we have done previously for analogous models. If the model proves to be useful for understanding annual plants in general, it may help scientists understand, and even reprogram, one special kind of annual plant: the cereal crops that give the world most of its calories.

At a general level we know that the geographical distributions of tree species in heterogeneous regions such as Spain, and their potential responses to climate change, are affected by all kinds of processes, from tree growth and mortality, to competition, to patterns of seed dispersal often mediated by animals (which themselves show complex behaviours), to disturbances like fire. This piece of work showed that, with sufficient data, model development time, and the right kind of computational statistics, it is possible to create useful simulation models that combine such ecological complexities. My Spanish colleague Miguel Zavala and I developed what ecologists would call a non-homogeneous, stochastic, spatial patch occupancy model (SPOM) and what computer scientists might call a non-homogeneous Markov Random Field. The model has a particularly interesting aspect, in that the movement of seeds among nearby sites was modelled according to the known seed-caching behaviour of the European Jay, which takes Oak seeds (acorns) but buries them in Pine woodlands. We then rigorously fit the model to large amounts of Spanish forest inventory data, using Bayesian methods. We could then simulate the impacts of fire and climate change on the distributions, and assess how important the Jay behaviour was to Oak species at regional scales, including a propagation of parameter uncertainty. The work was a huge undertaking, and pushed the limits of the statistical methods that we had available to us at the time. This was the first of what turned out to be many ecological papers on Spanish forest dynamics that I have been lucky enough to have been involved with.

This paper describes the first time that I produced an ecological model that was properly grounded in data, and that made correct predictions. The research reported in this paper resulted from my PhD at the University of York under Richard Law. The model described the growth of Arabidopsis plants through time, as a function of the current size, and the size and distance of a neighbour plant. To train and test the model I seeded hundreds of pairs of plants into plant pots in a large greenhouse, then measured every single plant every 2-3 days for a couple of months. Each such set of measurements took me a whole working day. I carried out this kind of experiment twice, once with plants at different distances, and once with plants seeded at different times. It sounds laborious but I have never been happier. It was like a form of ecological meditation. Anyhow, when the model was trained against the short-term observations of growth, it then correctly predicted the long-term outcome of competition for pairs of plants of different relative sizes and distances apart. When I saw the predictions match reality it was like a lightbulb going off in my head. At least some ecological systems could be modelled precisely, at least sometimes! To a large extent, the next 15 years of my research (as illustrated above) were an attempt to show that this kind of numerical predictability extends to ecological systems that are much larger and much more complex than a pair of annual plants.