RailsRumble 2013: Building SeeMeSpeak
Learnings from building an app in 48 hours

Biggest props to team members Bodo (@bitboxer), Florian (@Argorak) and Jan (@JanNilpferd), whom it was an honor and a great joy to work with.
Be sure to follow @seemespeakdict on Twitter to stay up to date with the future of this open platform.

SeeMeSpeak, a crowd-sourced dictionary for sign languages

While many people take online dictionaries like Leo for granted and use them on a daily basis, there is no such “go-to” solution for sign language speakers.
The ones available are either not free (as in freedom) or not compelling to use.

Luckily, we had a native speaker of the German Sign Language on our team.
Our domain expert (if you will) introduced us to the challenges involved in such an undertaking:
There are different languages (e.g. German Sign Language and American Sign Language).
Furthermore, a multitude of dialects and regional variations leads to subtle but noticeable differences in pronunciation.

Our goal was to build a dictionary platform that is fun to use, with minimal friction, and flexible enough to handle ontologies for several languages.

Features & Technology Stack

Like a wiki, SeeMeSpeak doesn’t require an account to add content (there’s a review flag in order to prevent abuse).
Instead of a fixed ontology, we want to enable flexible categorization and search based on tags and text.
Modern technology and the capabilities of modern browsers to record and play videos natively were drivers on the tech side.
Obviously, internationalization plays a bigger part in this domain than usual.
We localized the app to German and show translations for German transcriptions as a proof-of-concept.

Our tech stack looks as follows. On the frontend:

getUserMedia / the Video API to record from a user’s webcam, without plugins.
Unfortunately, we had issues with Firefox not supporting video/webm.

MediaStreamRecorder. It’s a standardized but as yet unimplemented API, so we used a polyfill of the same name for recording videos from getUserMedia.
The API/polyfill library is a small abstraction of the recording process, which is generally “output the webcam image in a <video> and capture a frame every other millisecond”.
So instead of piling up BinaryArrays manually, you have a simple start and stop API available.

XHR2 to send JavaScript Blobs via multipart forms.
Thus, it made no difference to our backend whether the video was recorded live or sent using a regular <input type="file">.
The MediaStreamRecorder polyfill turned out to be the easiest and most future-proof way of recording from user media.
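From the backend’s point of view, that indifference is easy to picture. A minimal, hypothetical sketch (handle_upload and the parameter shape are illustrative, not the app’s actual controller):

```ruby
require "tempfile"

# Hypothetical handler shape: a video recorded via getUserMedia and sent
# through XHR2 arrives as the very same multipart file part as one picked
# with <input type="file">. Rack/Rails exposes either as filename + tempfile.
def handle_upload(params)
  upload = params["video"]
  { filename: upload[:filename], size: upload[:tempfile].size }
end

# Simulate the part Rack would hand us, regardless of where it came from:
blob = Tempfile.new("clip")
blob.write("fake video bytes")
blob.rewind
result = handle_upload("video" => { filename: "clip.webm", tempfile: blob })
```

Whether the browser filled in the part from a live-recorded Blob or from a file picker, the handler never has to care.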

Popcorn.js for cross-browser video playback.

On the backend:

Rails 4 (thanks, Captain Obvious)

Elasticsearch for searching and metadata storage.
Thanks to its flexibility, we didn’t need an additional database.
For speaking to Elasticsearch, we used the brand-new elasticsearch gem instead of the retired Tire.
Using Elasticsearch to store ALL THE things is a refreshingly straightforward approach.
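A minimal sketch of what that looks like through the elasticsearch gem, where documents and queries are plain Ruby hashes; the index name, field names, and query shape here are illustrative assumptions, not the app’s actual schema:

```ruby
require "time"

# With the elasticsearch gem you pass plain hashes to the client, e.g.
#   client.index(index: "videos", body: video_doc(...))
#   client.search(index: "videos", body: search_body("hello", ["greeting"]))

# Build the document we store per video (fields are assumptions).
def video_doc(title:, tags:, video_url:)
  { title: title, tags: tags, video_url: video_url,
    created_at: Time.now.utc.iso8601 }
end

# Build a search body combining full-text matching with exact tag terms.
def search_body(text, tags)
  must = [{ match: { title: text } }]
  must += tags.map { |t| { term: { tags: t } } }
  { query: { bool: { must: must } } }
end
```

Because both metadata and search live in the same store, there is no syncing step between a primary database and the index.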

Virtus and ActiveModel::Validations as DSLs in our models.

Torquebox as an app server and background queuing system.
I had my doubts before, but the stack’s tooling was absolutely turn-key.
Installation via RubyGems and deployment works like a charm.
Queuing background jobs for video conversion was as easy as declaring a method always_background.
We benefited from all cores of our machine out of the box.
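The pattern can be sketched with a runnable stand-in; TorqueBox::Messaging::Backgroundable is the real module, while the FakeBackgroundable below only mimics its shape and runs the method inline instead of enqueuing it:

```ruby
# Stand-in for TorqueBox::Messaging::Backgroundable so the shape runs
# anywhere; the real module enqueues the call on TorqueBox's message queue.
module FakeBackgroundable
  def always_background(name)
    original = instance_method(name)
    define_method(name) do |*args|
      # TorqueBox would return immediately and run this on a worker;
      # the stand-in simply runs it inline.
      original.bind(self).call(*args)
    end
  end
end

class Video
  extend FakeBackgroundable

  def convert!
    :converted # in the real app: shell out to avconv here
  end
  always_background :convert!
end
```

Callers just invoke `Video.new.convert!`; with TorqueBox in place, the declaration alone moves the work off the request thread.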

Nginx for serving video files using pseudo streaming.
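A hedged sketch of what such a configuration might look like, assuming MP4 pseudo streaming via ngx_http_mp4_module (the path and buffer sizes are illustrative, not our actual config):

```nginx
# Illustrative only: serve converted videos with MP4 pseudo streaming,
# letting players seek via the ?start= query argument.
location /videos/ {
    root /var/www/seemespeak;
    mp4;
    mp4_buffer_size     1m;
    mp4_max_buffer_size 5m;
}
```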

libav, to shell out to avconv for converting incoming video streams to multiple formats.
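The conversion step can be sketched in Ruby; the codec flags below are assumptions for illustration, not the team’s actual avconv invocation:

```ruby
# Hypothetical sketch: build an avconv command line per target format.
CODECS = {
  "mp4"  => %w[-c:v libx264 -c:a aac -strict experimental],
  "webm" => %w[-c:v libvpx -c:a libvorbis],
}.freeze

def conversion_command(input, output)
  format = File.extname(output).delete(".")
  ["avconv", "-y", "-i", input, *CODECS.fetch(format), output]
end

# Inside the background job this would be run via e.g.
#   system(*conversion_command("upload.webm", "out.mp4"))
```

Passing the command as an argument array (rather than one interpolated string) avoids shell-quoting problems with user-supplied filenames.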

Bing Translator API for translating transcriptions.
Fun fact: since Google shut down its free translation API, people built hacks involving spreadsheets to keep using the service anyway.

No user sessions required. We ask for your name for CC licensing/attribution reasons only.

Unfortunately, we couldn’t convince Firefox to reveal the webcam image as image/webp instead of PNGs.
As we had to prioritize against handling the necessary conversion on the server, Firefox users will currently kindly be asked to leave.
While such issues must be expected when using bleeding-edge browser features, judging from the feature matrices the support should have been there.

We integrated ~800 videos that were already available under a CC license.
This allowed us to present an engaging first experience instead of looking like a generic video platform.
If you’re German, you may find searching for Angela Merkel or Telekom Taliban funny.

As for advice concerning the format of a RailsRumble, I’ll leave you with the words of How To Build an App in 48 Hours.
I was honored to rumble with veterans, which was probably the reason we finished successfully, with almost all features as planned and without night-time hacking or sleep deprivation.
Absolute prioritization, straightforward coding, and avoiding smart-ass, complex solutions really are key here.

I also liked how the competition system is fully automated.
As opposed to other hackathons I have participated in, which encourage and reward faking, I really enjoyed this format for being focused solely on shipping.

About the logo: this is the sign that will reveal you as a sign language
speaker. It’s a combination of the letters spelling ILU, which expands to “I love
you”. I really like that aspect.

Almost all I know about sign language and the deaf community was picked
up alongside hacking this weekend. I’d be glad if you’d point out any
concerns in this text, and I apologize in advance for any errors.