89 percent of the Moscow University Herbarium has been digitised in the last three years

A senior researcher at the Moscow University Herbarium has published the results of his work on the Moscow University Digital Herbarium in the journal Taxon. Working within the framework of the Noah's Ark project over the last three years, the scientist managed the data mining for the largest biodiversity database in Russia, which covers plants from different regions of the world. The Moscow University Digital Herbarium is now available online to a wide audience, and the scientists plan to train a neural network to check the accuracy of plant identifications. They also intend to create an Atlas and a Checklist of the Russian Flora.

The collection of the Moscow University Herbarium consists of over a million specimens. Extensive work on its digitisation began in 2015 within the framework of the Noah's Ark project. The scientists imaged the dried plant specimens and entered the data from their labels into a web-based system. In three years, university employees, together with volunteers and a partner company, digitised over 900,000 records (89% of the collection). Besides images of plant specimens, the digital herbarium contains the text of the original labels and the coordinates of the locations at which the plants were collected. In the course of digitisation the team relied both on volunteers and on automatic systems that read barcodes and help to work with geodata. For example, an algorithm helped to establish where plants were collected either by matching the name of the botanist with the date of collection or by grouping the plants by the descriptions of collection sites on their labels. The coordinates for each group were then determined and entered manually.
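The grouping step described above can be illustrated with a minimal Python sketch. It is not the project's actual software. The field names (barcode, collector, date, locality), the rule of falling back to the locality text when the collector or date is missing, and the sample records are all assumptions made for illustration.

```python
from collections import defaultdict


def group_for_georeferencing(records):
    """Group specimen records so that one manual coordinate lookup can serve
    a whole group. All field names are hypothetical."""
    by_event = defaultdict(list)     # (collector, date) -> list of barcodes
    by_locality = defaultdict(list)  # normalised locality text -> list of barcodes
    for rec in records:
        collector = rec.get("collector", "").strip().lower()
        date = rec.get("date", "").strip()
        if collector and date:
            # Same botanist on the same day: assumed to be the same place.
            by_event[(collector, date)].append(rec["barcode"])
        else:
            # Otherwise group by the label's site description, ignoring
            # differences in letter case and spacing.
            locality = " ".join(rec.get("locality", "").lower().split())
            by_locality[locality].append(rec["barcode"])
    return dict(by_event), dict(by_locality)


# Invented sample records, for illustration only.
sample = [
    {"barcode": "X1", "collector": "A. Ivanov", "date": "1987-06-12", "locality": "river bank"},
    {"barcode": "X2", "collector": "a. ivanov ", "date": "1987-06-12", "locality": "near the river"},
    {"barcode": "X3", "collector": "", "date": "", "locality": "Pine  forest, 5 km N of village"},
    {"barcode": "X4", "collector": "", "date": "", "locality": "pine forest, 5 km n of village"},
]
events, localities = group_for_georeferencing(sample)
```

In this sketch each resulting group would still be handed to a person, who looks up one pair of coordinates and applies it to every specimen in the group, in line with the manual step described in the text.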

The majority of plants in the Moscow University Herbarium were collected in Russia (634K). The floras of Ukraine and Mongolia are also well represented (30K and 27K, respectively). At least 99K specimens were collected in Middle Asia. Significant collections also originate from Mali, Vietnam, and North Korea.

The materials of the Moscow University Digital Herbarium are fully available on its website. The images are released under a CC BY 4.0 licence, i.e. they may be freely reused provided that the original source is credited. Scanned plants and the text of the labels are accessible via a number of search engines. The herbarium records can be filtered by plant characteristics, collection localities, and other criteria.

The collections are not only being digitised but are also constantly growing. In 2016, 22K specimens were added to the herbarium, followed by 19K in 2017. The majority of new specimens arrived from Eastern Europe, Asian Russia, Middle Asia, and the Caucasus. In 2016, MSU employees described 16 new species of plants from all over the world.

"Thanks to machine learning technologies and neural networks, we will soon be able to automatically check the accuracy of identifications of dried plants. Last year was a turning point: at least three articles on machine recognition of plants based on scans were published. Machine learning technologies will rely on libraries of dried plant images that already exist and have been verified. The Moscow University Herbarium is among the seven largest digital herbaria in the world, and the data contained in it will certainly serve as a basis for this unusual futuris," said Alexey Seregin, the author of the article and a senior researcher at the Faculty of Biology, MSU.

The database of herbarium specimens compiled within the framework of the project will help to produce an Atlas and a Checklist of the Russian Flora in due time. The Atlas will contain data on the distribution of Russian plants and will require data from other herbaria, including those that have not been digitised yet. The Checklist of the Russian Flora is a standard list of all plant species occurring in the country. It may be produced on the basis of the Moscow University Digital Herbarium in a couple of years. Both projects are important for the documentation and scientific analysis of Russian plant diversity and for the conservation of endangered species.