Species distribution modelling

The problem of modelling the geographic distribution of a given
animal or plant species has received much attention recently. This
is a critical problem in conservation biology because one needs to
know where a given species prefers to live and what its
requirements are before conservation action can be taken. The data
available typically consists of a list of occurrences, i.e. a set
of geographic coordinates for locations where the species has been
observed. In addition, we have access to data on environmental
variables, such as climate, elevation and land use, which have
been measured or estimated across the region of interest. The goal
is to predict which areas within the region satisfy the
requirements of the species and thus form part of its potential
distribution. The potential distribution describes where
conditions are suitable for survival of the species. The
actual, or realised, distribution is often
somewhat smaller than the potential distribution, either because the
species cannot reach all the areas that could potentially support
it (e.g. because of some barrier to dispersal) or because it has
been eliminated from some areas by human exploitation, pollution,
competition with other species, etc.
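
The two inputs described above (a list of occurrence coordinates and environmental variables measured across the region) can be pictured with a minimal Python sketch. The EnvGrid class, the sample_environment helper, the grid layout and all values are hypothetical and chosen only for illustration; real environmental layers would normally be read from raster files.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional, Tuple


@dataclass
class EnvGrid:
    """One environmental variable on a regular lon/lat grid (hypothetical layout)."""
    name: str
    x_min: float               # western edge of the grid
    y_min: float               # southern edge of the grid
    cell_size: float           # cell width and height, in degrees
    values: List[List[float]]  # rows ordered north to south

    def value_at(self, lon: float, lat: float) -> Optional[float]:
        """Return the value of the cell containing (lon, lat), or None if outside."""
        n_rows = len(self.values)
        n_cols = len(self.values[0])
        col = int((lon - self.x_min) // self.cell_size)
        row_from_south = int((lat - self.y_min) // self.cell_size)
        # Cells are half-open: the northern and eastern edges count as outside.
        if not (0 <= col < n_cols and 0 <= row_from_south < n_rows):
            return None
        return self.values[n_rows - 1 - row_from_south][col]


def sample_environment(
    occurrences: List[Tuple[float, float]], grids: List[EnvGrid]
) -> List[Dict[str, Optional[float]]]:
    """Attach the value of every environmental variable to each occurrence."""
    rows = []
    for lon, lat in occurrences:
        row: Dict[str, Optional[float]] = {"lon": lon, "lat": lat}
        for grid in grids:
            row[grid.name] = grid.value_at(lon, lat)
        rows.append(row)
    return rows


# Illustrative data only: a 3 x 3 elevation grid and three presence records.
elevation = EnvGrid(
    name="elevation_m",
    x_min=-3.0,
    y_min=50.0,
    cell_size=1.0,
    values=[
        [300.0, 250.0, 120.0],  # northernmost row
        [200.0, 150.0, 80.0],
        [100.0, 60.0, 10.0],    # southernmost row
    ],
)
occurrences = [(-2.5, 52.5), (-0.5, 50.2), (1.5, 51.0)]  # (lon, lat) pairs

samples = sample_environment(occurrences, [elevation])
for sample in samples:
    print(sample)
```

The third record falls outside the grid and therefore receives no environmental value; in practice such records would need to be checked or discarded before modelling.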

Natural history museum and herbarium collections and field
observations from volunteers (collated by National Recording
Schemes and Local Record Centres) provide a rich source of such
occurrences. In the UK, this type of data is becoming
increasingly accessible via the National Biodiversity Network.
However, there is typically little or no information about the
failure to observe the species at any given location and
many locations have not been surveyed. Consequently, only presence
data is usually available to indicate
the occurrence of the species. In addition, for many species in
more obscure groups (e.g. many lower plants and
invertebrates), even this data is quite sparse.

This sort of scattered presence-only data is much more
difficult to deal with than systematically collected
presence/absence or abundance data, and
statistical methods to analyse and model it form a recent and
rapidly developing field of study.

The most successful modelling methods so far are based on
machine learning techniques. The computer packages Maxent and
DesktopGarp (GARP: Genetic
Algorithm for Rule-set Production) are both freely downloadable
and use the same formats for their environmental and species
observation data. The most comprehensive model comparison to date
was provided by Elith et al. (2006). The authors compared
16 modelling methods using 226 species across six regions of the
world. These analyses found differences between predictions from
alternative methods, but also found that some methods consistently
outperformed others. Maxent came out as the top-rated method,
narrowly ahead of GARP. However, on practical
grounds, Maxent is much easier and quicker to use and is the
main modelling system that has been used here. It is described in
detail by Phillips et al. (2006). The GARP algorithm is
described by Stockwell & Peters (1999).
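
As a rough illustration of preparing the input files mentioned above, the sketch below assumes the layouts described in the Maxent documentation: a comma-separated occurrence file with species, longitude and latitude columns, and environmental layers as ESRI ASCII grids. The helper names, the species name and all values are hypothetical, and the exact requirements of each package should be checked against its own documentation.

```python
import csv
import io


def occurrences_to_csv(records):
    """Serialise (species, lon, lat) records as CSV text with a header row."""
    buffer = io.StringIO()
    writer = csv.writer(buffer, lineterminator="\n")
    writer.writerow(["species", "longitude", "latitude"])
    for species, lon, lat in records:
        writer.writerow([species, f"{lon:.4f}", f"{lat:.4f}"])
    return buffer.getvalue()


def grid_to_ascii(values, x_min, y_min, cell_size, nodata=-9999):
    """Serialise a grid (rows ordered north to south) as ESRI ASCII grid text."""
    lines = [
        f"ncols {len(values[0])}",
        f"nrows {len(values)}",
        f"xllcorner {x_min}",
        f"yllcorner {y_min}",
        f"cellsize {cell_size}",
        f"NODATA_value {nodata}",
    ]
    for row in values:
        # Cells with no measurement (None) are written as the NODATA value.
        lines.append(" ".join(str(nodata if v is None else v) for v in row))
    return "\n".join(lines) + "\n"


# Illustrative data only: two presence records and a 3 x 3 elevation grid.
records = [
    ("Example species", -2.5, 52.5),
    ("Example species", -0.5, 50.2),
]
grid = [
    [300, 250, 120],
    [200, 150, 80],
    [100, None, 10],
]

csv_text = occurrences_to_csv(records)
asc_text = grid_to_ascii(grid, x_min=-3.0, y_min=50.0, cell_size=1.0)
print(csv_text)
print(asc_text)
```

Building the text in memory keeps the sketch free of side effects; writing csv_text and asc_text to files with .csv and .asc extensions would be the final step.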