ASLens

What it does

ASLens uses the AWS DeepLens to translate the American Sign Language (ASL) alphabet into speech.

The AWS DeepLens captures video and runs a deep learning model (built with AWS SageMaker) against each frame. When a letter from the ASL alphabet is recognised, the AWS DeepLens plays the audio for that letter from an MP3 file generated with AWS Polly.
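The MP3s only need to be generated once, before deployment. The sketch below assumes one file per letter named `A.mp3`, `B.mp3` and so on (a hypothetical naming scheme) and the Polly voice `Joanna` (also an assumption). It calls `synthesize_speech` from `boto3`, so running `synthesize_letters` needs `boto3` installed and AWS credentials configured. The letter list and filename helper work without either.

```python
import os
import string

# Letters whose ASL signs are static; J and Z involve motion and are excluded.
STATIC_LETTERS = [c for c in string.ascii_uppercase if c not in ("J", "Z")]


def mp3_filename(letter: str) -> str:
    """Hypothetical naming scheme: one MP3 per letter, e.g. 'A.mp3'."""
    return f"{letter.upper()}.mp3"


def synthesize_letters(output_dir: str = ".", voice_id: str = "Joanna") -> None:
    """Generate one MP3 per static letter with AWS Polly (needs boto3 and credentials)."""
    import boto3  # imported lazily so the helpers above work without boto3

    polly = boto3.client("polly")
    for letter in STATIC_LETTERS:
        response = polly.synthesize_speech(
            Text=letter, OutputFormat="mp3", VoiceId=voice_id
        )
        # AudioStream is a streaming body; read it fully and write it to disk.
        with open(os.path.join(output_dir, mp3_filename(letter)), "wb") as f:
            f.write(response["AudioStream"].read())
```

The generated files are then copied to the device, which only ever plays them, so no call to Polly is made at runtime.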

ASLens runs locally on the AWS DeepLens, so no internet connection is required. This avoids bandwidth issues and increases speed, because frames never have to hop between networks.

How I built it

The ASLens deep learning model was created with AWS SageMaker. Using the image transfer learning example, I was able to go from training data to my first model in under an hour!
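Transfer learning with SageMaker's built-in image classification algorithm is driven mostly by hyperparameters. The sketch below builds an illustrative set: `use_pretrained_model=1` starts from pretrained weights so that a modest ASL dataset is enough, and `num_classes` is 24 (the 26 letters minus J and Z). The ResNet depth, batch size, epochs and learning rate are assumptions, not the settings ASLens actually used.

```python
def transfer_learning_hyperparameters(
    num_training_samples: int, num_classes: int = 24, epochs: int = 10
) -> dict:
    """Illustrative hyperparameters for SageMaker's built-in image classification algorithm."""
    if num_training_samples <= 0:
        raise ValueError("num_training_samples must be positive")
    return {
        "num_layers": "18",               # ResNet depth (assumed)
        "use_pretrained_model": "1",      # 1 = fine-tune pretrained weights (transfer learning)
        "image_shape": "3,224,224",       # channels,height,width
        "num_classes": str(num_classes),  # 26 letters minus J and Z
        "num_training_samples": str(num_training_samples),
        "mini_batch_size": "32",          # assumed
        "epochs": str(epochs),            # assumed
        "learning_rate": "0.001",         # assumed
    }
```

With the SageMaker Python SDK, a dictionary like this would typically be passed to an estimator via `estimator.set_hyperparameters(**hp)` before calling `fit`.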

The Lambda function first optimises the AWS SageMaker model to run on the AWS DeepLens GPU, and then crops and scales each frame. Once resized, the video frame is run against the model, and if an ASL letter is detected, a corresponding MP3 file is played.
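The per-frame logic can be sketched in plain Python. The sketch assumes two things that go beyond the description above: the frame is cropped to its largest centred square before being resized to the model's input size (for example 224x224 with an image library), and a letter is spoken only when its probability clears a threshold and differs from the last letter spoken. That second rule is a hypothetical debounce so that a held sign does not replay its MP3 on every frame. The device-specific capture and inference calls (the DeepLens `awscam` module) are left out.

```python
import string
from typing import Optional, Sequence, Tuple

# Class order assumed to match training: A..Y without J (24 labels).
LABELS = [c for c in string.ascii_uppercase if c not in ("J", "Z")]


def center_square_crop(width: int, height: int) -> Tuple[int, int, int, int]:
    """Return (x0, y0, x1, y1) of the largest centred square in a width x height frame."""
    side = min(width, height)
    x0 = (width - side) // 2
    y0 = (height - side) // 2
    return x0, y0, x0 + side, y0 + side


def pick_letter(
    probabilities: Sequence[float],
    threshold: float = 0.6,
    last_letter: Optional[str] = None,
) -> Optional[str]:
    """Return the letter to speak for this frame, or None.

    Speaks only when the top class clears `threshold` (an assumed value) and
    differs from the letter spoken last (a hypothetical debounce).
    """
    if len(probabilities) != len(LABELS):
        raise ValueError("expected one probability per label")
    best = max(range(len(probabilities)), key=lambda i: probabilities[i])
    if probabilities[best] < threshold:
        return None
    letter = LABELS[best]
    return None if letter == last_letter else letter
```

For a 1920x1080 frame the crop box is `(420, 0, 1500, 1080)`, a 1080-pixel square that is then scaled down to the model's input size.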

Challenges I ran into

As the signs for J and Z involve motion, I excluded these letters from the training set.
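SageMaker's image classification algorithm can read training data from a `.lst` file, where each tab-separated line holds an image index, a class label and a relative image path. The sketch below shows one way to drop J and Z while building that file, giving the remaining 24 letters contiguous class ids. The input format, a list of `(path, letter)` pairs, is a hypothetical stand-in for however the training images are organised.

```python
import string
from typing import Iterable, List, Tuple

EXCLUDED = {"J", "Z"}  # signs that involve motion
LABELS = [c for c in string.ascii_uppercase if c not in EXCLUDED]
LABEL_INDEX = {letter: i for i, letter in enumerate(LABELS)}


def build_lst_lines(samples: Iterable[Tuple[str, str]]) -> List[str]:
    """Turn (relative_image_path, letter) pairs into tab-separated .lst lines.

    Each line is: image index, class label, relative path. Images of J or Z
    are skipped; the remaining letters get class ids 0..23 (A=0, ..., Y=23).
    """
    lines: List[str] = []
    for path, letter in samples:
        letter = letter.upper()
        if letter in EXCLUDED:
            continue
        lines.append(f"{len(lines)}\t{LABEL_INDEX[letter]}\t{path}")
    return lines
```

Keeping the class ids contiguous matters because `num_classes` must match the number of distinct labels, which is 24 here.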

Getting the AWS Polly MP3s to play on the AWS DeepLens took a significant amount of trial and error. For anyone else struggling with this, in summary: add the ggc_user to the audio group, and add resources to the Greengrass group (and to the Lambda functions therein), then repeat after every deploy!
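Because the fix has to be reapplied after every deploy, a quick check can save time. Adding the user to the group is normally done with `sudo usermod -a -G audio ggc_user`. The sketch below only verifies that membership using Python's standard `grp` and `pwd` modules (Unix only) and then plays a file. Using `mpg123` as the command-line MP3 player is an assumption, not necessarily what ASLens used. The Greengrass resource configuration is done in the console and is not covered here.

```python
import grp
import pwd
import shutil
import subprocess
from typing import List


def in_group(user: str, group: str = "audio") -> bool:
    """True if `user` belongs to `group`, either as a listed member or via its primary gid."""
    try:
        g = grp.getgrnam(group)
    except KeyError:
        return False  # group does not exist on this machine
    if user in g.gr_mem:
        return True
    try:
        return pwd.getpwnam(user).pw_gid == g.gr_gid
    except KeyError:
        return False  # user does not exist


def player_command(mp3_path: str, player: str = "mpg123") -> List[str]:
    """Command line for a CLI MP3 player; mpg123 with -q (quiet) is an assumption."""
    return [player, "-q", mp3_path]


def play(mp3_path: str) -> None:
    """Play an MP3, failing loudly if the player is missing."""
    if shutil.which("mpg123") is None:
        raise RuntimeError("mpg123 not found on PATH")
    subprocess.run(player_command(mp3_path), check=True)
```

Calling `in_group("ggc_user")` at the start of the Lambda function and logging the result makes it obvious when a deploy has reset the group membership.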

Accomplishments that I'm proud of

I still can’t believe it works! It’s like magic! My wife came up with the idea, and I thought it was too big to work. Whilst I was confident I could master the AWS DeepLens hardware, I was concerned that I lacked the experience to create the appropriate model. Thankfully, AWS SageMaker takes care of all of the machine learning heavy lifting, which meant I could focus on collating training data (and getting audio to play on the AWS DeepLens device).

What I learned

As the AWS DeepLens uses AWS Greengrass behind the scenes, I've learnt a lot about this service. This project has also inspired me to resume an online course on deep learning, so that I can advance my usage of AWS SageMaker to include custom algorithms.

What's next for ASLens

ASLens is currently limited to the ASL alphabet and omits J and Z, as their signs are not static. I'd like to continue my work on ASLens by identifying whole words, which involve movement and expression.