During the 2012 election, ProPublica created a news application called Free the Files that crowdsourced political TV spending data by asking users to transcribe certain data points from FCC filings. This Rails plugin extracts the "transcribable" bits from Free the Files so anyone can crowdsource data out of documents, as long as they're stored in DocumentCloud. The gem builds the models, controllers and views you need, assigns documents to users and verifies the data you get back.

To install, add

gem 'transcribable'

to your Gemfile. Then run:

bundle install

Transcribable will add a transcribable method to your models. In the model for your "master" table (the items you'd like verified), pass the attributes you would like users to be able to transcribe to transcribable, and declare the one-to-many relationship to transcriptions with has_many :transcriptions.

Make sure your master table also has url (string) and verified (boolean) columns.
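As a rough illustration of the declaration described above, here is a plain-Ruby sketch that runs without Rails. The TranscribableSketch module is a hypothetical stand-in for the macro the gem adds, and the Filing class and its :buyer and :amount attributes are made-up names. In a real app the class would inherit from ActiveRecord::Base and also declare has_many :transcriptions.

```ruby
# Hypothetical stand-in for the class macro, so the sketch runs outside Rails.
# This is not the gem's implementation; it only records the attribute names.
module TranscribableSketch
  def transcribable(*attrs)
    @transcribable_attrs = attrs
  end

  def transcribable_attrs
    @transcribable_attrs || []
  end
end

# Hypothetical master model. In a Rails app this would be
# `class Filing < ActiveRecord::Base` with `has_many :transcriptions`.
class Filing
  extend TranscribableSketch

  # The attributes users are allowed to transcribe (made-up names).
  transcribable :buyer, :amount
end

Filing.transcribable_attrs # => [:buyer, :amount]
```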

If you'd like users to be able to transcribe a field, but don't want that field to be verified (free-form notes, say), add a line such as

skip_verification :notes, :related_url

to your master model.
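To make the effect of skipping a field concrete, here is a minimal sketch of an agreement check that ignores skipped fields. It assumes "verified" means that some number of transcriptions match on every checked field. That rule, the agreed? helper, the threshold of 2 and the :buyer and :amount fields are all assumptions for illustration, not the gem's actual behavior.

```ruby
# Hypothetical field lists; :buyer and :amount are made-up attribute names.
TRANSCRIBABLE = [:buyer, :amount, :notes].freeze
SKIPPED       = [:notes].freeze # fields you would pass to skip_verification

# Assumed rule (not taken from the gem): a document counts as agreed when at
# least `threshold` transcriptions match on every field that is not skipped.
def agreed?(transcriptions, threshold: 2)
  fields = TRANSCRIBABLE - SKIPPED
  groups = transcriptions.group_by { |t| t.values_at(*fields) }
  groups.values.any? { |group| group.size >= threshold }
end

a = { buyer: "Acme PAC", amount: 1200, notes: "handwritten total" }
b = { buyer: "Acme PAC", amount: 1200, notes: "" }
c = { buyer: "Acme PAC", amount: 1500, notes: "" }

agreed?([a, b, c]) # => true, a and b match even though their notes differ
agreed?([a, c])    # => false, the amounts disagree
```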

Run the generator, which will create everything you need for transcriptions: a migration based on your master table's transcribable attributes, a transcription model, controller, views and routes.

rails g transcribable

Then run:

rake db:migrate

If you ever need to add more transcribable columns, just add them to your master table, add them to the transcribable call in your master model, and then rerun rails g transcribable. That will generate a new migration for adding the new columns to the transcriptions table.

To populate your master table with documents for users to verify, you can harvest them from a DocumentCloud project. Fill out the documentcloud.yml file that was generated for you in your config directory, and run:

You can override the assign! method, the algorithm that chooses a filing for users to transcribe, in your master table's model.
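If you do write your own assignment logic, one possible strategy is to skip anything already verified or already seen by the current user, then prefer the document with the fewest transcriptions so far. The sketch below shows that idea in plain Ruby. FilingRecord and pick_filing are hypothetical names, and this is not the gem's default algorithm or the real signature of assign!.

```ruby
# Hypothetical stand-in for a master record; not the gem's model.
FilingRecord = Struct.new(:id, :verified, :transcriber_ids)

# Returns the unverified filing this user has not transcribed yet that has
# the fewest transcriptions, or nil when nothing is left to assign.
def pick_filing(filings, user_id)
  candidates = filings.reject do |f|
    f.verified || f.transcriber_ids.include?(user_id)
  end
  candidates.min_by { |f| f.transcriber_ids.size }
end

filings = [
  FilingRecord.new(1, true,  []),
  FilingRecord.new(2, false, ["u1"]),
  FilingRecord.new(3, false, ["u2", "u3"]),
  FilingRecord.new(4, false, ["u2"])
]

pick_filing(filings, "u1").id # => 4
```

Inside a Rails model, the same filtering would normally be expressed as a database query rather than over an in-memory array.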

Note: By default, Transcribable keeps users from transcribing the same document more than once by assigning a UUID-based cookie. Because cookies can be cleared or blocked, this isn't reliable enough for rigorous journalistic applications. You'll want to implement a real login system for complicated projects.
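The limitation is easy to see in a small sketch. The code below assumes a hash-like cookie store and a hypothetical cookie name, :transcriber_id. It is not the gem's code, only an illustration of the general UUID-in-a-cookie approach.

```ruby
require "securerandom"

# Hypothetical cookie name; the name the gem actually uses is not stated here.
COOKIE_KEY = :transcriber_id

# `cookies` is any hash-like store. Returns the existing ID, or assigns a
# fresh random UUID the first time a browser is seen.
def transcriber_id(cookies)
  cookies[COOKIE_KEY] ||= SecureRandom.uuid
end

cookies = {}
first = transcriber_id(cookies)

transcriber_id(cookies) == first # => true, same browser keeps its ID
cookies.clear                    # the user clears their cookies
transcriber_id(cookies) == first # => false, now treated as a new user
```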