When you have a few hundred years worth of data on biological records, as the Smithsonian does, from journals to preserved specimens to field notes to sensor data, even the most diligently kept records don't perfectly align over the years, and in some cases there is outright conflicting information. This data is important, it is our civilization's best minds giving their all to capture and record the biological diversity of our planet. Unfortunately, as it stands today, if you or I were to decide we wanted to learn more, or if we wanted to research a specific species or subject, accessing and making sense of that data effectively becomes a career. Earlier this year an executive order was given which generally stated that federally funded research had to comply with certain data management rules, and the Smithsonian took that order to heart, event though it didn't necessarily directly apply to them, and has embarked to make their treasure of information more easily accessible. This is a laudable goal, but how do we actually go about accomplishing this? Starting with digitized information, which is a challenge in and of itself, we have a real Big Data challenge, setting the stage for data visualization.
The Smithsonian has already gone a long way in curating their biodiversity data on the Biodiversity Heritage Library (BHL) website, where you can find ever increasing sources. However, we know this curation challenge can not be met by simply wrapping the data with a single structure or taxonomy. When we search and explore the BHL data we may not know precisely what we're looking for, and we don't want a scavenger hunt to ensue where we're forced to find clues and hidden secrets in hopes of reaching our national treasure; maybe the Gates family can help us out...

People see relationships in the data differently, so when we go exploring one person may do better with a tree structure, others prefer a classic title/subject style search, or we may be interested in reference types and frequencies. Why we don't think about it as one monolithic system is akin to discussing the number of Angels that fit on the head of a pin, we'll never be able to test our theories. Our best course is to accept that we all dive into data from different perspectives, and we must therefore make available different methods of exploration.

Data Visualization DC (DVDC) is partnering with the Smithsonian's Biodiversity Heritage Library to introduce new methods of exploring their vast national data treasure. Working with cutting edge visualizers such as Andy Trice of Adobe, DVDC is pairing new tools with the Smithsonian's, our public, biodiversity data. Andy's development of web standards for Visualizing data with HTML5 is a key step forward to making the BHL data more easily accessible not only by creating rich immersive experiences, but also by providing the means through which we all can take a bite out of this huge challenge. We have begun with simple data sets, such as these sets organized by title and subject, but there are many fronts to tackle. Thankfully the Smithsonian is providing as much of their BHL data as possible through their API, but for all that is available there is more that is yet to even be digitized. Hopefully over time we can utilize data visualization to unlock this great public treasure, and hopefully knowledge of biodiversity will help us imagine greater things.