The Biodiversity Heritage Library improves research methodology by collaboratively making biodiversity literature openly available to the world as part of a global biodiversity community.
BHL also serves as the literature component of the Encyclopedia of Life.

Purposeful gaming and BHL: engaging the public in improving and enhancing access to digital texts

Although this project ended in November 2015, both the Smorball and Beanstalk games will remain available in 2016 at http://smorballgame.org and http://beanstalkgame.org, and player input will continue to improve BHL's OCR output. Thank you for playing and helping to improve access to science resources!

About Smorball - Players of the more challenging Smorball game are asked to type the words they see as quickly and accurately as possible to coach their team, the Eugene Melonballers, to victory and win the coveted Dalahäst Trophy in the fictional sport of Smorball. Each correctly typed word defeats an opposing smorbot and brings the Melonballers closer to the championship.

About Beanstalk - Players of the more relaxed Beanstalk game must type the words presented to them correctly in order to grow their beanstalk from a tiny tendril to a massive cloudscraper. The more words they type correctly, the faster the beanstalk grows. Players who accurately transcribe the most words will ascend to the top of the leaderboard as a result of their valuable contributions.

Both Smorball and Beanstalk were designed by Tiltfactor and are licensed as Free and Open Source Software (FOSS).

We're not currently integrating material from other institutions into our build of the games, but the good news is that the games and their supporting software are open source, so you can fairly easily host your own.

There are a few steps to hosting your own Smorball or Beanstalk games:
1. Prepare your material. The games are OCR-correction games: to function, they take as input single words on which two OCR programs disagree. Each "difference" sent to the games must include a page image URL, the word's location on that page image, and two strings representing what each of the two OCR programs thinks the word is. The games use these two strings to estimate whether the player has typed the right answer.
2. Host the game(s) and the game backend. You can find the game code here: https://github.com/tiltfactor/smorball and the code for the game database and data management server here: https://github.com/tiltfactor/SmorballBeanstalk-Backend
3. Configure the games. If you want to run Beanstalk, make sure your version has its own high-score database (via parse.com). If you want the Facebook and Twitter buttons in your Smorball to point to your own social media accounts, generate Facebook and Twitter developer API keys, and so on.
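
To make step 1 more concrete, here is a minimal, hypothetical sketch of what one "difference" record and an answer check might look like. The actual schema and matching rule used by the SmorballBeanstalk-Backend are not described on this page, so every name here (`Difference`, `box`, `ocr_a`, `ocr_b`, `similarity`, `plausible_answer`), the (x, y, width, height) box layout, and the 0.8 threshold are assumptions for illustration only. The similarity measure is Python's standard `difflib.SequenceMatcher`, swapped in as one simple way to compare the player's input against the two OCR readings.

```python
from dataclasses import dataclass
from difflib import SequenceMatcher


@dataclass(frozen=True)
class Difference:
    """One word on which two OCR programs disagree (hypothetical schema)."""
    page_image_url: str
    # Word location on the page image in pixels; (x, y, width, height) is an assumed layout.
    box: tuple[int, int, int, int]
    ocr_a: str  # reading from the first OCR program
    ocr_b: str  # reading from the second OCR program


def _normalize(text: str) -> str:
    # Ignore case and surrounding whitespace when comparing strings.
    return text.strip().lower()


def similarity(a: str, b: str) -> float:
    # Ratio in [0, 1]; 1.0 means the strings are identical after normalization.
    return SequenceMatcher(None, _normalize(a), _normalize(b)).ratio()


def plausible_answer(diff: Difference, typed: str, threshold: float = 0.8) -> bool:
    """Accept the player's input if it is close to either OCR reading.

    The 0.8 threshold is an arbitrary, hypothetical choice.
    """
    if not typed.strip():
        return False  # empty input is never accepted
    best = max(similarity(typed, diff.ocr_a), similarity(typed, diff.ocr_b))
    return best >= threshold


if __name__ == "__main__":
    d = Difference(
        page_image_url="https://example.org/page/12345.jpg",  # hypothetical URL
        box=(120, 340, 96, 28),
        ocr_a="Plantarum",
        ocr_b="Plantarnm",
    )
    print(plausible_answer(d, "plantarum"))  # True
    print(plausible_answer(d, "xyz"))        # False
```

Note that a check like this only estimates whether an answer is reasonable; it accepts input close to either reading, so deciding which reading is actually correct would still require aggregating many players' answers, which this sketch does not attempt.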

The BHL is an international consortium of the world's leading natural history libraries, including the Missouri Botanical Garden's Peter H. Raven Library, that have collaborated to digitize the public domain literature documenting the world's biological diversity. The result is the single largest openly licensed source of biodiversity literature, made available both through the Internet Archive and through a customized portal at http://www.biodiversitylibrary.org/. BHL is an ideal testbed for investigating alternative ways of generating digital outputs, both because its corpus is very large (41 million pages of scanned text accompanied by 41 million OCR outputs) and because most of its content is historic literature, on which OCR tends to perform poorly (the majority of BHL content was published between the 1450s and the 1900s). OCR is also largely ineffective on handwritten texts such as field notebooks, a growing content type in the BHL.

Purposeful Gaming and BHL will test whether digital games are a successful tool for analyzing and improving digital outputs from OCR and transcription activities. The premise is that presenting the review and correction of particularly problematic words as a game can harness large numbers of users quickly and efficiently.

The project ran from December 1, 2013 through November 30, 2015 and was conducted by the Missouri Botanical Garden's Center for Biodiversity Informatics (CBI) in partnership with Harvard University, Cornell University, and the New York Botanical Garden.

A sample of poor OCR output from an 18th-century publication.
This page is from Linnaeus' Species Plantarum, published in 1753.
An image of the original text is on the left; the OCR is on the right.

A sample of poor OCR output from a handwritten text. This page is from the Diaries of William Brewster, 1865-1919.

Related

We recently joined the Crowdsourcing Consortium for Libraries and Archives (CCLA). Supported by the Institute of Museum and Library Services, CCLA aims to create a forum where all interested stakeholders can join a national conversation about the most pressing needs and challenges in developing and deploying crowdsourcing technologies in the cultural heritage domain.

Here is a page by Chris Freeland that covers the history of the thinking behind using games with BHL content.

Discussion minutes, software, and recorded presentations from the Notes from Nature/iDigBio Hackathon to Further Enable Public Participation in the Online Transcription of Biodiversity Specimen Labels, held December 16-20 at the University of Florida in Gainesville, are available at https://www.idigbio.org/wiki/index.php/Transcription_Hackathon

Contact Us

For more information please contact the project's Principal Investigator, Trish Rose-Sandler at 314-577-9473 x6396 or trish.rose-sandler@mobot.org