SPECIES: a standalone command-line application that identifies taxonomic mentions in documents and maps them to the corresponding NCBI Taxonomy database entries.

Given a folder of plain text files, SPECIES uses its dictionary of taxonomic names and synonyms to report each taxonomic mention: its start and end position in the document, the detected term, and the identifier of the corresponding NCBI Taxonomy database record.
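The kind of report described above can be illustrated with a minimal Python sketch of dictionary-based tagging. This is not the SPECIES implementation: the toy dictionary, the `Mention` type and the `tag_text` function are hypothetical names introduced here for illustration, and the offset convention (0-based start, exclusive end) is an assumption that may differ from what the tagger actually reports.

```python
import re
from typing import Dict, List, NamedTuple

# Hypothetical toy dictionary: lower-cased name or synonym -> NCBI Taxonomy ID.
# The real tagger ships far larger dictionaries; these entries are only examples.
TOY_DICTIONARY: Dict[str, int] = {
    "homo sapiens": 9606,
    "human": 9606,
    "escherichia coli": 562,
    "e. coli": 562,
    "mus musculus": 10090,
}


class Mention(NamedTuple):
    start: int   # 0-based offset of the first character (assumed convention)
    end: int     # offset one past the last character (assumed convention)
    term: str    # the text exactly as it appears in the document
    taxid: int   # NCBI Taxonomy identifier looked up in the dictionary


def build_pattern(names) -> "re.Pattern[str]":
    # Try longer names first so a multi-word name is preferred over any
    # shorter name that happens to be a prefix of it.
    alternatives = sorted(names, key=len, reverse=True)
    body = "|".join(re.escape(name) for name in alternatives)
    # The lookarounds stop a name from matching inside a longer word.
    return re.compile(r"(?<!\w)(?:" + body + r")(?!\w)", re.IGNORECASE)


def tag_text(text: str, dictionary: Dict[str, int] = TOY_DICTIONARY) -> List[Mention]:
    pattern = build_pattern(dictionary)
    return [
        Mention(m.start(), m.end(), m.group(0), dictionary[m.group(0).lower()])
        for m in pattern.finditer(text)
    ]


if __name__ == "__main__":
    for mention in tag_text("E. coli was isolated from human samples."):
        print(*mention, sep="\t")
```

To process a folder, the same function could be applied to the contents of each file in turn, with the file name added to every reported row.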

Besides binomials following the Linnaean naming convention, recognised taxonomic mentions include acronyms, common names and abbreviations, as well as misspellings and the other name types supported by the NCBI Taxonomy.

To increase the diversity of taxonomic mentions in the corpus, the S800 abstracts were collected by selecting 100 abstracts from each of the following 8 categories: bacteriology, botany, entomology, medicine, mycology, protistology, virology and zoology. S800 has been annotated with a focus on the species level; however, mentions of higher taxa (such as genera, families and orders) have also been considered.

Availability: the tagger software (under BSD license), along with its species-level and complete taxonomic dictionaries and associated stopword lists (both under CC-BY license), is available here. The species-level S800 corpus (subject to Medline restrictions) can be downloaded from here.

Sister Projects: ORGANISMS, a web resource providing access to the tagging results for all abstracts in the Medline database, including all taxonomic levels.