Project Gado: Building an Open Archival Scanning Robot Using Python and Arduino

Description

Project Gado aims to create an open-source archival scanning robot that small archives can purchase for $500 and use to autonomously scan their photographic collections. This talk presents the Gado 2, a prototype scanning robot built around Python and Arduino, and shares lessons learned from using Python as the primary language in a large-scale archival scanning project.

Abstract

The archives of the Afro American Newspaper in Baltimore, MD contain over 1.5 million historical photos spanning 115 years of the city’s African American history. One of the largest Black history collections in the world, the Afro’s archives include thousands of photos that have never been seen by the public.

Why? Of the paper’s 1.5 million photos, only around 10,000 exist in a digital form; the Afro, like many small archives, simply does not have the human resources to manually digitize its collections. As a result, photos with incredible value for scholars, educators and community members alike are available only to the select few with the access, specialized skills, and time to travel to the physical archive and locate them.

Project Gado was founded in 2010 to address these challenges. The project seeks to create an open-source archival scanning robot that small organizations like the Afro can use to autonomously digitize their photographic holdings. The Gado 1, a proof-of-concept machine built using Python and Arduino, has successfully scanned over 1,000 photos to date.

At present, Project Gado is developing the Gado 2 (pictured below as an early prototype), a second-generation machine which will cut scanning time by a factor of four, occupy a footprint half the size of the Gado 1’s, and require no specialized skills to assemble and operate. The project is also developing a photographic licensing site (launching May 2012) which will allow archival partners to generate a lasting revenue stream from their digital collections, creating an incentive for more small archives to adopt the Gado technology.

This talk will provide an overview of Project Gado and the Gado 2, and will address specific challenges faced and lessons learned from using Python as the primary language for an open robotics project and a major archival digitization initiative.

Technical topics covered will include Python and Arduino interfacing for machine control, Python/TWAIN integration, use of PIL and OpenCV for post-processing, and MySQL integration for image management and metadata annotation. These topics will be presented primarily in the context of a case study, rather than a tutorial; the main goal will be to show how Project Gado used these Python technologies to solve problems, and to demonstrate how the technologies could be used to solve similar problems in other cases.
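As a rough illustration of the first of these topics, the sketch below shows one common way a Python host can drive an Arduino: send a short text command over a serial link and wait for a one-line acknowledgement. It is not taken from the Gado codebase. The command strings ("ARM 90", "PUMP ON"), the "OK" reply, and the FakePort class are all hypothetical, and FakePort stands in for a real serial port so the example runs without hardware.

```python
class FakePort:
    """Hypothetical stand-in for a serial port (no hardware needed).

    A real port object, such as pySerial's serial.Serial, offers the same
    write(), flush() and readline() methods used below.
    """

    def __init__(self, replies):
        self._replies = list(replies)  # canned replies, returned in order
        self.sent = []                 # every chunk of bytes written so far

    def write(self, data):
        self.sent.append(data)
        return len(data)

    def flush(self):
        pass

    def readline(self):
        # An empty bytes object mimics a read that timed out.
        return self._replies.pop(0) if self._replies else b""


def send_command(port, command):
    """Send one newline-terminated command and require an "OK" reply.

    The line-based protocol is an assumption made for this sketch.
    """
    port.write((command + "\n").encode("ascii"))
    port.flush()
    reply = port.readline().decode("ascii").strip()
    if reply != "OK":
        raise RuntimeError(f"unexpected reply to {command!r}: {reply!r}")
    return reply


if __name__ == "__main__":
    port = FakePort([b"OK\r\n", b"OK\r\n"])
    for cmd in ("ARM 90", "PUMP ON"):  # hypothetical machine commands
        send_command(port, cmd)
    print(port.sent)
```

Requiring an explicit acknowledgement for each command keeps the host and the microcontroller in step, which matters when the next action depends on the previous one having finished.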

The talk will conclude with a discussion of opportunities for interested developers to contribute to the Gado codebase, and for interested institutions to implement the Gado 2 in their own archives.

Outline:

Brief overview of Project Gado

The Hardware
    Overview of the Gado 2
    Machine demo

The Gado Codebase: design strategies, problems faced, modules used
    Interfacing with Arduino
    Scanning with TWAIN
    Capturing metadata and performing OCR with Tesseract
    OpenCV and PIL for automatic post-processing
    Collection management with MySQL

Challenges, Pythonic solutions, next steps

Get involved!
    Opportunities for developers
    Options for partner archives/organizations
    Pieces of the codebase with relevance to other problems/projects and how to steal them