02/10/2016

Conveying Specimen Value through Digitization

The United States National Herbarium (Herbarium Code: US) has always been about sharing. Initially, the “diffusion of knowledge” was achieved primarily by publishing articles, hosting visitors, and sending out specimen loans. Now, the Department of Botany is taking sharing to a new level by digitizing a significant portion of the collection’s 5 million specimens using conveyor belt technology contracted from Picturae, a European company that specializes in digitization. Using this new technology, the herbarium has begun the task of imaging specimens at an average rate of about 4,000 specimens per day. The plan is to create an image for half a million plant specimens in 6 to 8 months. This rapid digitization project, which began in October 2015 with the evening primrose family (Onagraceae), is now well into the ferns, and will eventually continue with the sunflower family (Asteraceae).
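As a rough check on that schedule, the arithmetic can be sketched in a few lines of Python. The imaging rate and the target come from the figures above; the number of imaging days per month is an assumption made for illustration, not a figure from the project.

```python
import math

SPECIMENS_TARGET = 500_000   # "half a million plant specimens"
RATE_PER_DAY = 4_000         # average imaging rate quoted above

# Assumption (not from the project): about 21 imaging days per month.
IMAGING_DAYS_PER_MONTH = 21


def imaging_days(target: int, rate_per_day: int) -> int:
    """Whole days of imaging needed to reach the target at a steady rate."""
    return math.ceil(target / rate_per_day)


days = imaging_days(SPECIMENS_TARGET, RATE_PER_DAY)
months = days / IMAGING_DAYS_PER_MONTH
print(f"{days} imaging days, about {months:.1f} months")
```

At the quoted rate the target takes 125 imaging days, or close to six months under the assumed calendar. Slower days or downtime would push the total toward the upper end of the 6-to-8-month estimate.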

The timing of the project is no coincidence, according to Sylvia Orli, IT Manager and Webmaster for the Department of Botany. The digitization project “follows rapid and amazing advances in computer technology,” she explains, with the U.S. National Herbarium joining a worldwide digitization effort that is particularly popular among herbaria. The once insurmountable task of making millions of specimens accessible is suddenly feasible. The herbarium digitization project, which uses conveyor-based technology to image natural history collections, is the first of its kind for a herbarium in the United States, and the second conveyor-based project at the Smithsonian (following the digitization of the National Museum of American History's National Numismatics Collection).

The U.S. National Herbarium has been working towards the goal of complete collection digitization for a long time. “We have always looked out for opportunities to digitize,” reports Orli, even if those opportunities involved the laborious process of punching holes into a paper card and keying them into a computer, as was done in the 1970s.

An inventory of the Type Collection was initiated in the 1970s to make this portion of the collection easier to use. Over the course of almost 50 years, the herbarium has built a digital inventory of around 1.5 million specimens, representing more than a third of the pressed specimens in the herbarium. Using rapid digitization approaches, the entire collection could be completed in under 10 years.

The herbarium of the Muséum National d'Histoire Naturelle in Paris has already imaged about 7 million specimens using the conveyor technique, although their entries, for now, contain only the image and the name on the specimen’s folder. The U.S. National Herbarium has its eyes on a bigger, more scientifically valuable prize: a complete database of label information and images with verified species names.

The Venezuelan fern specimen, Huperzia myrsinites, collected by Julian Steyermark in 1944, has the distinction of being the 500,000th image captured by the Smithsonian Digitization project, which has digitized objects from across the Smithsonian Institution. The Smithsonian Digitization Program Office team has worked alongside staff to digitize items from nine museums: National Museum of Natural History, Smithsonian Gardens, Smithsonian National Museum of African American History and Culture, National Museum of American History, Smithsonian's Freer and Sackler Galleries, Cooper Hewitt, Smithsonian Center for Folklife and Cultural Heritage, National Air and Space Museum, and Smithsonian National Postal Museum.

The Smithsonian began exploring the idea of using a conveyor belt to image specimens about a year and a half ago. The Office of the Chief Information Officer (OCIO) and the Digitization Program Office (DPO) are in charge of managing money for the Smithsonian’s digitization projects across the entire Institution. Per-specimen conveyor costs run at less than a dollar, so it was decided to begin with a portion of the collection and move forward from there. “The hope is that we find a donor,” comments Orli, “but we haven’t found that resource yet.”

Though current technology makes digitization more affordable than ever before, money remains one of the primary obstacles to completing the project. Time, of course, is the other. “We can’t keep up with the level of need,” Orli laments, proposing that we may need to rethink the current workflow. Specimen preparation is time- and labor-intensive, requiring experts who can provide the correct name for a specimen, as well as a set of barcodes linking names to EMu (Electronic Museum management database) for each specimen within each folder. The ideal situation would be to complete this preparation before scanning the images, but with the speed of the conveyor, it will be more efficient to scan first, then retroactively provide the correct labels and barcodes. This arrangement is more cost-effective, if slightly slower. After the scanning is finished, another company (contracted through Picturae) will transcribe the label data and return it for inclusion in the EMu catalogue.

The improved accessibility will make the herbarium a more influential force in conservation and scientific research. In the past, the herbarium relied on specimen loans and researcher visits, limited to those individuals who had the necessary resources to travel to study the collections. The costs and difficulties associated with lending specimens, including packaging and shipping costs, are compounded by the risks of mailing fragile specimens. These risks are greatly reduced by redirecting traffic to online images and data.

We have already witnessed some of the benefits of online access. Whenever Museum Specialist Barrett Brooks received a loan request, he would digitize the requested specimens before sending them through the mail. By creating a record of what the specimens looked like before being loaned, Brooks captured an image of each specimen in its original condition in case it got lost or damaged. Often, upon seeing the images, the researcher who made the request decided that he or she could work with the high-resolution images instead, saving the specimens from the dangerous journey.

Digitization will benefit herbarium collections by creating a detailed inventory of plants and records at each herbarium. This database will allow us to track and group individual specimens, and to keep a record of what is currently on loan (and to whom). According to Curator Pedro Acevedo, “The first thing we need to know is what we have, then we have a responsibility to put that data out there for others to use.”

Because digitization opens up new research possibilities, it will allow the collections to be used more frequently and in new ways. As Laurence Dorr (Curator and Chair of the Botany Department) muses, “When you collect specimens, you don’t necessarily know what will happen in the future.”

Herbaria develop different strengths based on their location or particular collectors’ interests. “Herbaria tend to grow in ways that form unique collections,” explains Dorr. “Information is scattered across continents and countries.” The U.S. National Herbarium is particularly strong in ferns, specimens from early exploring expeditions, types, Mexico/South America, and the Philippines. Digitization will allow researchers to integrate information from a variety of herbaria, and to keep better track of which herbaria have which specimens.

Similarly, researchers can access specimens from all over the world right from their desks. Acevedo describes how he can access both literature and specimens online. For example, he can pull up protologues (initial descriptions of a species) for species of Paullinia (Sapindaceae), and then compare them with type specimens on his computer. Digitization can also reduce travel costs by letting researchers limit their visits to those herbaria that have exactly what they are looking for. The database will also serve as an inventory that allows researchers to identify under-collected and under-researched areas.

Morphological traits visible in specimen images can be used in phylogenetic and evolutionary studies. Perhaps some researchers will perform image analyses and morphometrics. With label data describing the time and location a specimen was collected, researchers will be able to easily track distributions over time to monitor invasive and endangered species. These distribution data may also be used to chart the effects of climate change or human interactions on natural populations and ecosystems. For example, because the information will be globally accessible, those working in tropical areas, such as Guaramacal National Park in Venezuela where Dorr has done some of his research, will be able to access specimens from the U.S. National Herbarium and other herbaria to aid in describing new species and mapping biodiversity across difficult-to-access areas.
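To make the idea of tracking distributions over time concrete, here is a minimal sketch in Python that groups collection records by decade. The record fields (`taxon`, `year`, `region`) and the sample entries are hypothetical placeholders, not the structure or contents of the herbarium's actual catalogue.

```python
from collections import defaultdict

# Hypothetical records for illustration only; these are not real specimens.
records = [
    {"taxon": "Taxon A", "year": 1912, "region": "Region 1"},
    {"taxon": "Taxon A", "year": 1918, "region": "Region 1"},
    {"taxon": "Taxon A", "year": 1956, "region": "Region 2"},
    {"taxon": "Taxon B", "year": 1957, "region": "Region 1"},
    {"taxon": "Taxon A", "year": 1959, "region": "Region 1"},
]


def regions_by_decade(records, taxon):
    """Map each decade to the sorted regions where the taxon was collected."""
    seen = defaultdict(set)
    for rec in records:
        if rec["taxon"] == taxon:
            decade = rec["year"] // 10 * 10  # e.g. 1956 -> 1950
            seen[decade].add(rec["region"])
    return {decade: sorted(regions) for decade, regions in sorted(seen.items())}


summary = regions_by_decade(records, "Taxon A")
for decade, regions in summary.items():
    print(f"{decade}s: {', '.join(regions)}")
```

A region that first appears in a later decade hints at a range expansion or simply at new collecting effort, so a summary like this is a starting point for questions rather than an answer in itself.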

Specimens from the United States National Herbarium are placed on a conveyor belt for rapid digitization. (photo by Ingrid Lin)

Herbarium specimens are especially useful in tracing botanical history. A searchable, usable catalogue of specimen collection dates can allow researchers to piece together itineraries of early exploratory expeditions. Additionally, when combined with early descriptions, a digitized database will make it easier to identify types for species described before the late 19th century, when the use of types became standard practice.

Perhaps the most exciting possibility is the promise that the database will grow as new discoveries are made. Curator Eric Schuettpelz states that “by letting people see the specimens, we can increase their value.” As time goes on, researchers might incorporate photos from the field, additional literature, or research findings into the database, adding information to and improving each herbarium specimen.

Even though you cannot take a DNA sample from a digital specimen, digitization helps streamline molecular analysis as well. Curator Ashley Egan, a molecular systematist and population geneticist, says that she “would benefit from the metadata that will come from the digitization of our specimens,” which would allow her to integrate metadata from the U.S. National Herbarium with data from other digitized collections all over the world. Further, since the database also functions as an inventory, it will be easier to manage requests for DNA samples when they are needed.

While digitization is widely accepted as positive and necessary for moving forward, some have expressed concerns that will need to be considered. First, the creation of a database could place rare and endangered plants at risk by revealing their georeferenced locations, making them susceptible to poaching. It is possible, however, to mask critical location information to protect vulnerable species, limiting this knowledge to experts with approved access. Another concern involves the management of the collections themselves. Curators might be reluctant to lend type specimens if they are available online, though some research requires direct interaction with the physical specimen. Some might argue that once specimens are digitized there is little reason to maintain the actual herbarium collections, which may impact funding or other management decisions. Yet, with new technology in molecular biology and DNA barcoding, the actual specimens have proven useful in ways that a digitized image cannot.
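One way to picture the masking of location information is to coarsen coordinates for protected taxa unless the viewer has approved access. The Python sketch below does this under stated assumptions: the `Record` fields, the `SENSITIVE_TAXA` list, and the `approved` flag are all hypothetical, and this is not a description of how EMu or any Smithsonian system handles sensitive data.

```python
from dataclasses import dataclass, replace
from typing import Optional

# Hypothetical list of protected taxa; a real list would come from curators.
SENSITIVE_TAXA = {"Taxon X"}


@dataclass(frozen=True)
class Record:
    taxon: str
    latitude: Optional[float]
    longitude: Optional[float]


def public_view(rec: Record, approved: bool = False, decimals: int = 0) -> Record:
    """Return the record as a given viewer should see it.

    Approved viewers and non-sensitive taxa get the record unchanged;
    otherwise the coordinates are rounded to `decimals` decimal places.
    """
    if approved or rec.taxon not in SENSITIVE_TAXA:
        return rec
    if rec.latitude is None or rec.longitude is None:
        return rec  # nothing to mask
    return replace(
        rec,
        latitude=round(rec.latitude, decimals),
        longitude=round(rec.longitude, decimals),
    )


exact = Record("Taxon X", 9.2345, -70.1234)
print(public_view(exact))                 # coarsened coordinates
print(public_view(exact, approved=True))  # full precision
```

Rounding to whole degrees blurs a location by tens of kilometers, which keeps a record useful for broad distribution maps while hiding the exact site. Withholding coordinates entirely is the stricter alternative.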

It is important to realize that digitized databases are not a replacement for existing herbaria, but simply another tool to expand the possibilities of research. “You can ask questions you couldn’t possibly ask before,” Dorr says. “Specimens are here so you can form ideas about the world and how things are related.”

Elizabeth Jacobsen was a 2015 undergraduate intern in the Plant Conservation Unit, Department of Botany.