Things used in this project

Software apps and online services

Google Cloud Functions

Google AutoML

Expo: React Native

Story

Video Demo of ASL Translator

Inspiration

Over 500 million people have impaired hearing, and while Google Translate handles over 200 languages, it does not translate American Sign Language. We thought we could build something powered by computer vision in the cloud to help with that. Introducing the American Sign Language Translator, powered by cloud machine learning.

What it does

Take out your phone, open the trAnSLate app, and hear ASL turn into words as our app does the heavy lifting.

ASL Translation in Progress

After hitting start, the app takes a picture and classifies it in under a second. If no hand is detected, trAnSLate inserts a space and reads the previous word aloud. In effect, trAnSLate acts as a live transcription service for ASL.
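The transcription loop described above can be sketched as a small pure function. This is a hypothetical illustration, not the actual app source: `speak` stands in for a text-to-speech call (e.g. Expo's Speech module), and the classifier is assumed to return a letter, or null when no hand is detected.

```javascript
// Sketch of the transcription loop: letters accumulate into a word;
// a null classification (no hand) inserts a space and speaks the word.
function makeTranscriber(speak) {
  let transcript = '';
  let currentWord = '';

  return function onClassification(letter) {
    if (letter === null) {
      // No hand detected: finish the word, read it aloud, add a space.
      if (currentWord.length > 0) {
        speak(currentWord);
        transcript += currentWord + ' ';
        currentWord = '';
      }
    } else {
      currentWord += letter;
    }
    return transcript + currentWord;
  };
}
```

For example, feeding the classifications 'H', 'I', null would yield the transcript "HI " and speak the word "HI" aloud.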

How we built it

The app is built with Expo (React Native) on Node.js; it captures an image and converts it to a base64 string, which is passed to Google Cloud Functions. The Cloud Function calls Cloud AutoML Vision, which classifies the image with a convolutional neural network trained on over 17,000 custom-labeled images that we collected. After classification, the translated character is passed back to the app, which displays it on screen. There is also an option to send the transcript as a text message to a phone number via a Twilio integration.

Services We Used

Challenges we ran into

Training an effective machine learning model was challenging. Our first models could not detect signs against varied backgrounds. To overcome background noise, we took moving videos of each sign against different backgrounds and extracted the frames as training data. In the end, we had about 800 training images for each letter of the alphabet, for a combined 17,000 custom-labeled images. This was a major investment that paid off tremendously, as there are no large data sets available online for American Sign Language.

Sample of Data Set

What's next

We will expand our machine learning model from single images to a sequence of images to capture motion. By doing so, we will be able to accurately translate the entirety of ASL, including waving motions, into text.