Building an interactive computer vision demo in a few hours on AWS DeepLens

Ryan Gross · Jan 21

A couple months ago, I posted an article explaining my job as a technology consultant to my daughter’s preschool class of 3-year-olds.

One of the more understandable parts of what I’m doing these days is working on computer vision problems.

People (even the toddler crowd) inherently understand the idea of recognizing what is in front of you.

Interestingly, as I would learn through this experience, computer vision is actually easier for a toddler to accept than it is for most of the readers of this article.

My daughter’s generation will be the first to grow up with the expectation that they can interact with computers in the same way that they interact with each other.

The complete lack of surprise that kids had when the computer could say their names was shocking to me.

With today’s advancements in deep learning, cloud platforms, and IoT camera technology, we can bring this capability to computers quickly and easily.

In this article, I’ll go through the architecture and code I used to build a real-time computer vision system to greet people.

Most of the credit for this idea, architecture, and code goes to Sander van de Graaf and his Doorman entry to the AWS DeepLens challenge.

I’m going to assume that you understand coding principles, and my goal is that you can follow along to build your own demo.

Let’s get started…

Architecture Overview

This system uses a host of AWS serverless technology (no networks or operating systems to configure here).

The DeepLens is already integrated with AWS Lambda through the GreenGrass interface, which runs the function code on the device.

The remainder of the architecture uses cloud-native, event-driven processing.

The images of people uploaded from the DeepLens are moved around in S3, which triggers the appropriate actions using Lambda functions.

This has the nice side effect of sorting the images for downstream debugging and analytics purposes.

In this example, a console application uses AWS Polly to speak to the person discovered.

The actual workflow goes something like this:

The DeepLens is running a Find Person Lambda function, which loops each frame through a model that detects a person and then identifies whether there is a face in the image.

Once the DeepLens recognizes a person in the frame, it uploads the image to an encrypted S3 bucket location.
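The frame-filtering step can be sketched as a pure function over the model's detection output. This is a simplified illustration, not the actual find-person code: the detection format (a list of dicts with `label`, `prob`, and box coordinates) and the 0.55 confidence threshold are assumptions. On the device, the GreenGrass Lambda would obtain raw frames and inference results through the DeepLens SDK before reaching a step like this.

```python
# Sketch of the per-frame person check (hypothetical detection format).
PERSON_LABEL = "person"

def find_person(detections, threshold=0.55):
    """Return bounding boxes of detected people above the confidence threshold."""
    return [
        (d["xmin"], d["ymin"], d["xmax"], d["ymax"])
        for d in detections
        if d["label"] == PERSON_LABEL and d["prob"] >= threshold
    ]

detections = [
    {"label": "person", "prob": 0.91, "xmin": 10, "ymin": 20, "xmax": 110, "ymax": 220},
    {"label": "chair",  "prob": 0.88, "xmin": 200, "ymin": 30, "xmax": 260, "ymax": 120},
    {"label": "person", "prob": 0.30, "xmin": 50, "ymin": 60, "xmax": 90, "ymax": 180},
]

print(find_person(detections))  # only the high-confidence person remains
```

Only when this list is non-empty does the function crop the frame and upload it, which keeps S3 traffic down to frames that actually contain someone.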

The Guess Lambda function is configured to trigger on the S3 upload, and it passes the image along to Rekognition to use Amazon’s pre-trained facial recognition algorithms to determine which Slack user the image matches.

When the first image of a new person arrives, Rekognition will likely not find a match, so the Lambda function moves the image to an Unknown folder in S3 for further processing, setting appropriate permissions so that Slack will be able to display it.
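The routing decision inside the Guess function can be sketched independently of the AWS call. In the real function, boto3's Rekognition client returns a `FaceMatches` list from `search_faces_by_image`; here that response shape is mocked, and the folder names (`unknown/`, `known/<user-id>/`) and the 80% similarity threshold are assumptions for illustration.

```python
def route_image(face_matches, image_name, similarity_threshold=80.0):
    """Pick the destination S3 key prefix for an uploaded face image.

    face_matches mimics the FaceMatches list that Rekognition's
    search_faces_by_image returns: each entry has a Similarity score
    and a Face whose ExternalImageId holds the Slack user ID.
    """
    best = max(face_matches, key=lambda m: m["Similarity"], default=None)
    if best is None or best["Similarity"] < similarity_threshold:
        return f"unknown/{image_name}"          # no confident match: ask Slack
    user_id = best["Face"]["ExternalImageId"]   # opaque Slack user ID
    return f"known/{user_id}/{image_name}"

# First image of a new person: Rekognition finds nothing.
print(route_image([], "frame-001.jpg"))

# Later, after training, a confident match appears.
matches = [{"Similarity": 97.2, "Face": {"ExternalImageId": "U024BE7LH"}}]
print(route_image(matches, "frame-002.jpg"))
```

Keying the destination prefix off the match result is what makes the rest of the pipeline event-driven: each prefix triggers a different downstream Lambda.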

Once the image is placed into S3, the Unknown Lambda function is triggered.

It sends a message to the Slack API that attaches the unknown image, which posts a message to a pre-configured Slack channel asking which Slack user is in the image.

Once someone from your team has identified the Slack user associated with the image, Slack sends a message back to AWS API Gateway, which triggers the Train Lambda function to tell Rekognition who the user is.
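The Train function's first job is unpacking Slack's callback. Slack POSTs interactive-message responses as form-encoded data with a single `payload` field containing JSON; the idea of carrying the image's S3 key in `callback_id` is an assumption about how this particular app is wired up, not a Slack requirement.

```python
import json
from urllib.parse import parse_qs

def parse_slack_selection(body):
    """Extract the chosen Slack user ID and image key from an
    interactive-message callback body (application/x-www-form-urlencoded)."""
    payload = json.loads(parse_qs(body)["payload"][0])
    selected_user = payload["actions"][0]["selected_options"][0]["value"]
    image_key = payload["callback_id"]  # assumed to carry the S3 key
    return selected_user, image_key

# A minimal stand-in for what API Gateway would hand the Lambda function.
body = "payload=" + json.dumps({
    "callback_id": "unknown/frame-001.jpg",
    "actions": [{"selected_options": [{"value": "U024BE7LH"}]}],
})
print(parse_slack_selection(body))
```

With the user ID and image key in hand, the function can index the face into the Rekognition collection with that user ID as the `ExternalImageId`.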

The next time the DeepLens uploads an image of a person who has been identified in Slack, Rekognition is likely to match that person in the Guess Lambda function, which posts the identified person to the Slack channel so that a user can correct the identification if necessary.

The app also sends a message to an SNS topic including the user information and the additional Emotion Detection information.

An SQS queue is subscribed to the SNS topic to allow applications to process this information.

A simple console app listens to the SQS queue and uses AWS Polly to greet the person and note which emotion they are displaying.
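The console app's core is a small transformation from the queued message to a phrase for Polly. SNS wraps the published payload in a `Message` field inside the SQS body; the payload's field names (`user_name`, `emotion`) are assumptions about what the Guess function publishes.

```python
import json

def build_greeting(sqs_body):
    """Turn an SQS message (forwarded from an SNS topic) into a
    Polly-ready greeting phrase."""
    payload = json.loads(json.loads(sqs_body)["Message"])
    name = payload.get("user_name", "there")
    emotion = payload.get("emotion")
    if emotion:
        return f"Hello {name}, you look {emotion.lower()} today!"
    return f"Hello {name}!"

# What a delivered message might look like after SNS fan-out.
envelope = json.dumps({"Message": json.dumps({"user_name": "Ryan", "emotion": "HAPPY"})})
print(build_greeting(envelope))  # Hello Ryan, you look happy today!
```

The resulting string would then be passed to Polly's `synthesize_speech` call and played through the console machine's speakers.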

Setting up the Service

First, you’ll need the following (along with their dependencies) installed on your development machine:

- Git access: you’ll have to generate an SSH key and add it to your GitHub account
- NodeJS & Python 3 (with pip3 & pipenv)
- the Serverless Framework & the serverless-python-requirements plugin
- the AWS CLI

If you don’t already use AWS for work, you can set up a free account to test out this functionality (as of this writing, you just enter your email and click Get Started).

You will then need to create an app on the Slack API associated with your workspace.

For now, all you need is a name and pointer to the workspace you’ll be using.

Later on in the post, you will configure the Slack app to point to your AWS backend.

You will need to activate Incoming Webhooks.

This will allow the app to post to Slack.

You will then need to go to the OAuth & Permissions section and select the following scopes: channels:read, chat:write:bot, incoming-webhook, and users:read. Next, you will need to install the app to your workspace, which will bring you to a screen where you can complete the installation.

You will need to remember the Slack channel that you select for the app.

Security Note: the app will gain access to all user profile information in your workspace.

This will allow you to select a user for each picture that is uploaded by the DeepLens camera.

Only the opaque Slack user ID will be sent to AWS Rekognition, but with your access token, a user can get additional profile information from the Slack API.

Moral of the story: be sure to protect your Slack API Token.

For added security, if Slack finds your access token on a public site like GitHub, they will automatically deactivate it for you.

Once you have the OAuth permissions set, you can copy your Access Token from either the Install App or OAuth & Permissions Page.

The Serverless deployment package uses several environment variables to allow the same code to be deployed pointing to multiple accounts.

To simplify deployments, you can create a shell script.

I call mine environment.sh, and I will reference it in the scripts below. You will see references to <name> in many scripts.

Security Note: do not commit this file to source control.
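A minimal environment.sh might look like the sketch below. The variable names here are illustrative; match them to whatever your serverless.yml actually references, and substitute your own values for each placeholder.

```shell
#!/usr/bin/env bash
# environment.sh -- deployment settings (do NOT commit to source control).
export NAME="<name>"                       # used to namespace stacks and buckets
export AWS_REGION="us-east-1"              # region to deploy into
export SLACK_API_TOKEN="xoxb-your-token"   # from the Slack Install App page
export SLACK_CHANNEL="#doorman"            # channel you chose for the app
```

Run `source environment.sh` in your terminal before invoking `serverless deploy` so the variables are available to the deployment.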

Also, if you are using a shared Cloud9 Environment, then you should only enter the slack API token environment variable directly in the terminal.

You will now need to update your Slack App and deploy your Lambda function to the DeepLens.

For the Slack app, you will need to capture the API Gateway URL, which should appear on the console after a successful deployment. You will then need to update this URL in the Slack app and reinstall it from the Install App screen.

The last thing you’ll need is to have a DeepLens camera from Amazon.

If you attended re:Invent 2017 and went to any ML sessions, you likely have one.

If you didn’t, you can get one on Amazon for about $250.

The camera is a pre-packaged bundle that is essentially a mini-computer running AWS GreenGrass attached to a video camera.

The camera makes it easy to deploy Computer Vision models to run over live streaming video, especially if you’re already familiar with AWS Lambda.

I’m going to assume you’ve followed the setup guide and your DeepLens is now connected to the internet.

You should definitely capture the client certificate in order to view the video feed in a browser.

Once you have completed these steps, you should browse to the DeepLens Console.

You will need to Create a Project using the Object Detection template.

Once you choose this, click through to create the app, and then edit it to associate the find-person Lambda function.

Next, you select it in the list of projects and click Deploy to Device.

From there, select the DeepLens that you registered and push it to the device.

When this is complete, you will see a green bar across the top of the DeepLens console letting you know that the Project has been deployed successfully.

Congratulations! You should have a fully functional system.

At this point, you can connect to the local video feed for your device.

To do this, first find the IP address on the Device Details page, then browse to https://<ip-address-here>:4000 to view the video feed.

If a person steps in front of the camera, it should put a box around them in the feed.

Once you’ve seen this happen, you can browse to the Slack channel you were sending messages to and look for the image upload with a prompt to select the user to associate.

(That’s me testing detection in the dark; it works.)

If it isn’t working for you, feel free to ask questions in the comments below.

If there are enough questions, I’ll post a follow up with details on debugging these types of applications on AWS.