Organizing and Sharing Data Online to Advance Conservation

Information Power

Grant Ballard

More than ever before, we at PRBO perform our work in a changing world. Instead of thinking of conservation science in the context of preserving a place or restoring it to some previous, less disturbed state, we are compelled to consider where ecosystems are headed in the context of global warming and staggering human population expansion. As baselines and expectations of normalcy shift on an annual basis, we need a way to monitor the natural world's responses to adaptive management strategies--in real time.

One silver lining to all the rapid change is that new technologies have significantly enhanced our ability to process and share information, connecting the observer to a global network of concerned stewards and researchers determined to make a difference. Thanks to huge improvements in our computing infrastructure, provided by our new San Francisco Bay Research Center, and bolstered by timely support from the National Science Foundation (NSF), PRBO is emerging as a leader in employing new information technologies to further conservation.

For as long as I've worked at PRBO, we've been busily organizing millions of bird observation records, collected since 1965. When I arrived, in 1991, we were limited by how few computers were available to process all the data collected before personal computers came of age. Even if we'd been able to enter all those data, computers then lacked the capacity even to store that much information.

An ongoing question has been whether technologies that allow ever better organization, storage, query, and communication of information--"informatics" technologies--would ever catch up with technologies that allow us to gather more and more information.

Some of our information-gathering techniques need not be very high-tech to produce a lot of data in a hurry. For example, PRBO has had great success at working with hundreds of volunteer "citizen scientists" to survey thousands of miles of habitat, resulting in millions of bird observations (see page 6 of this Observer). All that was required was paper, pencils, binoculars, the US Postal Service, and a lot of dedicated, passionate people. A relatively simple increase in our ability to communicate with volunteers--say, via email or a website--has the potential to exponentially increase participation and efficiency in citizen-science projects.

As new informatics technologies have become available, we've necessarily taken a piecemeal approach to upgrades in our database. This has left us with a bit of a patchwork, with each piece (representing years of work) resulting in a PhD thesis, a few publications, a report to a habitat manager, or perhaps just a data set still waiting for analysis. We reached a critical juncture in the late 1990s, when storage capacities finally began to match and exceed our data collection capabilities, and the question of whether we'd be able to store all our data became simply a question of when. Since then, most of our historic data have been stored digitally but in loosely associated tables with various designs, limiting their utility to the conservation and scientific communities.

We've now reached another critical point in this race: PRBO has entered the era of powerful new large-scale data management and data sharing across high-speed networks.

Big Steps

Thanks to our recent move into the new Center, it is now possible for PRBO to transfer all of our data into a unified database, which will greatly increase our ability to assure data integrity, backup, and ease of access for future use. With two years (so far!) of full-time volunteer contributions from a professional database administrator, and hundreds of volunteer hours from staff data managers (especially Christine Abraham, Chris Rintoul, Lynne Stenzel, and Mark Herzog), we've begun making this critical transition within PRBO. For the first time in PRBO's existence, all of our data will exist in a common framework. The new database is sensitive to the multitude of standards under which data have been collected, and it integrates protocol descriptions, data documentation, and other project information to preserve the meaning and purpose of the collected data, as well as the data themselves. It is the key to securely storing these millions of records in perpetuity and to providing access to them for the largest possible audience.

Why is this significant? Because environmental change on a large scale is accelerating. Even as scientists amass more and more data, the world's ecosystems face increasingly urgent challenges. It's no longer acceptable for ecological monitoring data to take years to see the light of day: we need to put the data to work right now. Only by sharing our information with a global community of researchers, policy-makers, and habitat stewards can we maximize its potential value. People with different perspectives will see patterns in the data that we can't imagine, and we will bring our expertise to bear on others' data sets, as well.

A second major step in this crucial process is also currently under way, thanks to a technology that many people use daily without knowing its name--the distributed database. When you query Amazon.com for a book, for example, that company's computers search inventories from participating booksellers across the country in order to offer the largest possible range of used and new copies, paper and hardbound, and shipping options. The "global marketplace" rests on the premise of networking distributed data.

At PRBO, we are participating in an effort to build an analogous system for avian and associated ecological information, enabling all the world's ecological data providers to share information within a common framework, the Avian Knowledge Network (see Steve Kelling's article on page 8).
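For readers curious about what "networking distributed data" looks like in practice, here is a minimal Python sketch of the idea: each provider keeps its own records, and a single query is sent to all of them and the answers are combined. Everything in it is an assumption made for illustration (the provider names, the record fields, and the counts are invented), and it is not a description of how the Avian Knowledge Network is actually built.

```python
from dataclasses import dataclass
from typing import Callable, List, Tuple


@dataclass(frozen=True)
class Observation:
    provider: str  # which (hypothetical) data provider supplied the record
    species: str
    year: int
    count: int


# A provider is modeled as a function that answers a species query
# from its own locally held records.
Provider = Callable[[str], List[Observation]]


def make_provider(name: str, records: List[Tuple[str, int, int]]) -> Provider:
    """Wrap a provider's local (species, year, count) records in a query function."""
    def query(species: str) -> List[Observation]:
        return [Observation(name, s, y, c) for (s, y, c) in records if s == species]
    return query


def federated_query(providers: List[Provider], species: str) -> List[Observation]:
    """Send the same query to every provider and merge the results by year."""
    results: List[Observation] = []
    for query in providers:
        results.extend(query(species))
    return sorted(results, key=lambda o: (o.year, o.provider))


# Invented example data for two hypothetical field stations.
station_a = make_provider("station_a", [("Wrentit", 2004, 12), ("Song Sparrow", 2004, 30)])
station_b = make_provider("station_b", [("Wrentit", 2003, 7), ("Wrentit", 2005, 9)])

hits = federated_query([station_a, station_b], "Wrentit")
total = sum(o.count for o in hits)
```

The point of the design is that no provider has to hand over its whole data set: each one answers only the question asked, and the person querying sees a single combined result.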

Expanding Opportunities

In August 2006, PRBO, the Cornell Lab of Ornithology (CLO), and Cornell University's Computer Science department were awarded $1.2 million from the National Science Foundation's Biological Databases and Informatics program. The grant will involve close collaboration with the USDA Forest Service's Redwood Sciences Laboratory, Klamath Bird Observatory, UC Davis' Information Center for the Environment, and others. This new funding will greatly broaden the amount and types of bird monitoring data that can be archived, accessed, and analyzed across the Internet, aggregating data from hundreds of field stations. We will also develop new analytical methods that combine emerging "data-mining" techniques with more traditional statistical approaches. As a result, we will be able to search the networked database for meaningful patterns--across space and time, and at scales ranging from local to hemispheric.

For PRBO, the grant will support several staff positions, including an Avian Knowledge Network coordinator, web programmer, database administrator, and other statistical and programming staff. PRBO's initial task will be to create what we're tentatively calling the "California Avian Data Center," an access portal to the Avian Knowledge Network and a model for other regional data centers across North America in the future.

The NSF funding for PRBO's role in the Avian Knowledge Network will enable us to help lead an international effort to make ecological observation data readily available across the Internet. Think of the implications for conservation as we begin to unravel the complexities of our impacts on the planet and search for ways to mitigate these impacts. While many answers are available in the vast data banks we have already built, we'll need sophisticated methods to retrieve useful information in a timely fashion.

True to the spirit of the race I described above, between data storage capacity and speed of data acquisition, the new systems will also immensely increase our ability to gather new information--especially useful for large-scale, citizen science projects.

Imagine all the information contained in birders' checklists and the number of birders who frequently collect such data in California. CLO's "eBird" (www.ebird.org) is designed to harness those data and put them to work. In the next year, PRBO will help develop a refined version of eBird that will target tens of thousands of California birdwatchers and encourage them to submit baseline bird monitoring data. eBird currently receives several thousand checklists per month from California birders, but the data are not standardized in a way that could maximize their potential utility to managers and to bird conservation in general.

In partnership with CLO and Audubon California, we will add breeding status and basic demographic information to eBird for California, and then promote the tool as a baseline monitoring method for Audubon California's Important Bird Area program (www.audubon.org/bird/iba/). As a model, this system has potential to reach millions of birdwatchers internationally, and we have already begun discussions with partners in other countries, such as Canada, Mexico, Panama, and Colombia.

We have entered a time when the perceptions and realities of the conservation challenges facing the earth evolve on a monthly basis. The flood of scientific results and media focused on global warming has touched off an international groundswell. The first hurdle is cleared, we now believe. But in order to provide stewardship in the context of global change, we need to maintain a living network of information: quickly reporting when action is required in specific areas; elucidating the underlying processes that ecosystems need to function; and supporting adaptive approaches, in real time, to conservation in this changing world.