Monday, 8 August 2016

GBIF has just put a new backbone taxonomy into production! Since our last update of the GBIF Backbone we have received various feedback and gained insight into potential code improvements. Here is a quick summary of what has changed in this August 2016 version.

Important code changes:

much less eager basionym detection, resulting in fewer algorithmically assigned synonyms and removing many false synonyms, especially in plants

New sources

The following new sources have been incorporated into the August backbone:

major new version of The Paleobiology Database, contributing 2,315 new families, 11,390 genera and 131,958 species names to the backbone. It also feeds many isExtinct and livingPeriod values into the backbone for fossil taxa

thousands of new Plazi articles with 1,883 genera, 28,725 species and 1,935 infraspecific names. We only use names at genus rank and below from Plazi, and exclude any synonyms until we are confident they are all correctly marked up

added Artsnavnebasen source, contributing 3,640 new genera and 29,751 species names to the backbone

Backbone impact

The new backbone has a total of 5,307,978 names of which it treats 2,525,274 species names as accepted (previously 2,420,842 out of 5,208,172). More backbone metrics are available through our portal and in more detail through our API.
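As a rough illustration of the change between the two versions, the totals quoted above can be compared directly. This is only a sketch using the four figures from the text; the dictionary and function names are made up for the example and do not correspond to any GBIF API.

```python
# Totals quoted in the text above; the variable names are illustrative only.
PREVIOUS = {"names": 5_208_172, "accepted_species": 2_420_842}
CURRENT = {"names": 5_307_978, "accepted_species": 2_525_274}


def growth(previous, current):
    """Return the absolute increase for each metric in the previous version."""
    return {key: current[key] - previous[key] for key in previous}


def accepted_share(version):
    """Percentage of all backbone names that are accepted species."""
    return 100 * version["accepted_species"] / version["names"]


for metric, delta in growth(PREVIOUS, CURRENT).items():
    print(f"{metric}: +{delta:,}")

print(
    f"accepted share: {accepted_share(PREVIOUS):.1f}% "
    f"-> {accepted_share(CURRENT):.1f}%"
)
```

The differences work out to 99,806 additional names and 104,432 additional accepted species.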

Occurrence impact

With the new backbone in place, we have reprocessed all of our 642 million occurrences. The larger changes were:

Fixed various old/new world distributions of incorrectly synonymized species

Reduced the number of virus records from 157,492 down to just 5,348. Most of the removed records were Lepidoptera, e.g. the common peacock butterfly, that had formerly been mismatched because no classification was given with the name.

Some more metrics of backbone names in our occurrences. There are:

216,699 distinct genera in GBIF occurrences. That is 55% of all 396,990 genera in the backbone

1,226,668 accepted species in GBIF occurrences. That is 50% of all 2,420,842 backbone species

2,059,961 distinct names in GBIF occurrences. That is 39% of all 5,208,172 names in the backbone
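The shares in the list above are simple ratios of names seen in occurrences to backbone totals. A minimal sketch for recomputing them from the quoted counts follows; the labels and the helper name are invented for the example. The unrounded values may differ by a point from the whole percentages given in the text, depending on rounding and on which backbone version supplies the denominator.

```python
# (count found in occurrences, backbone total) as quoted in the list above.
COVERAGE = {
    "genera": (216_699, 396_990),
    "accepted species": (1_226_668, 2_420_842),
    "names": (2_059_961, 5_208_172),
}


def share(part, total):
    """Return part as a percentage of total."""
    return 100 * part / total


for label, (in_occurrences, total) in COVERAGE.items():
    print(f"{label}: {share(in_occurrences, total):.1f}% of {total:,}")
```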

The distribution of the major taxonomic groups exceeding 3%, i.e. those with at least 36,800 species, is shown in this last diagram: