Exploring Facial Recognition with WebRTC

Published on 7th December, 2017 by Andy

Facial recognition technology has been floating around the periphery of smartphone innovations for a few years now. With Apple introducing Face ID to the iPhone X, it was no surprise when people started asking us more about it.

The Simpleweb team love experimenting with and exploring the potential of up-and-coming technologies. With that in mind, we asked Cristiano – who’s taken a placement year from his Digital Media degree to work with us – to do some research and exploration around facial recognition with WebRTC.

In this post, we share some of the things that Cristiano did, what he learned, and his take on the challenges and limits of the technology.

What is WebRTC?

WebRTC is an open framework for the web that enables Real Time Communications in the browser. It includes the fundamental building blocks for high-quality communications on the web, such as network, audio and video components used in voice and video chat applications.

In other words, Cristiano said, “it’s a way to access the microphone, audio and camera through your browser.”
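As a rough sketch of what that means in practice, the snippet below shows the standard `getUserMedia` call for requesting camera access and attaching the stream to a `<video>` element. The element and the `startCamera` helper name are our own illustration, not code from Cristiano's project.

```javascript
// Build a MediaStream constraints object. Requesting only what you
// need (here: video, no audio) keeps the permission prompt minimal.
function buildConstraints({ video = true, audio = false } = {}) {
  return { video, audio };
}

// Attach the webcam stream to a <video> element (browser only).
async function startCamera(videoEl) {
  const stream = await navigator.mediaDevices.getUserMedia(
    buildConstraints({ video: true })
  );
  videoEl.srcObject = stream;
  await videoEl.play();
  return stream;
}

// In a page you would call:
//   startCamera(document.querySelector('video'));
```

The browser asks the user for permission before the promise resolves, so any face-tracking work only starts once the user has explicitly granted camera access.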

Exploration

To help detect whether a face is happy or sad, Cristiano created a grid that maps different facial points via the webcam. When the user moves their face, the points and grid move too. This information is then sent to Microsoft’s Emotion API, which detects what face someone is pulling. Sad face 🙁 or happy 🙂

Cristiano explored a few libraries to map out the points on the face. He eventually chose Beyond Reality Face (BRFv4) because it runs entirely on the client side: it doesn’t rely on a server and works directly in the browser. BRFv4 detects your face, maps 68 points onto it, and overlays a grid, giving you back little dots over the face. Pretty cool stuff.
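To give a feel for the per-frame flow, here is a small sketch. The `tracker.update`/`tracker.getFaces` calls stand in for a generic landmark tracker and are not BRFv4's actual API; the `landmarkBounds` helper is our own, showing one thing you can do with the 68 points, such as finding the face's bounding box.

```javascript
// Compute the bounding box of a set of landmark points -- handy for
// cropping the face region out of a frame.
function landmarkBounds(points) {
  const xs = points.map(p => p.x);
  const ys = points.map(p => p.y);
  const minX = Math.min(...xs), minY = Math.min(...ys);
  const maxX = Math.max(...xs), maxY = Math.max(...ys);
  return { x: minX, y: minY, width: maxX - minX, height: maxY - minY };
}

// Per video frame (illustrative tracker API, not BRFv4's real one):
// feed the tracker the current frame, then read back the 68 landmark
// points for each face it found.
function onFrame(tracker, imageData) {
  tracker.update(imageData);   // hypothetical call
  return tracker.getFaces()    // hypothetical call
    .map(face => landmarkBounds(face.points));
}
```

Because the tracking runs client-side, this loop can execute on every frame without any network round trip, which is what makes the grid follow the face smoothly.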

You can find more about BRFv4 and the other libraries Cristiano explored below.

BRFv4 was able to do most of the heavy lifting, but Cristiano wanted to go further and customise the grid and points to give himself finer control.

To do this, Cristiano used Canvas, an HTML5 element that let him customise the grid and points however he chose. He was able to change the colour easily, remove lines, and even replace the dots with other geometric shapes. It essentially gave him more options to play with.
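The kind of customisation described above boils down to redrawing the landmarks yourself with the standard Canvas 2D API. The helpers below are a sketch of that idea, assuming `points` is the array of landmark coordinates from the tracker; the function names and defaults are ours.

```javascript
// Draw one dot per landmark point on a CanvasRenderingContext2D,
// with configurable colour and radius.
function drawLandmarks(ctx, points, { color = '#00ff88', radius = 2 } = {}) {
  ctx.fillStyle = color;
  for (const p of points) {
    ctx.beginPath();
    ctx.arc(p.x, p.y, radius, 0, Math.PI * 2); // a circle per landmark
    ctx.fill();
  }
}

// Connect only selected point indices with lines (e.g. the jawline)
// instead of drawing the full grid -- this is how you "remove lines".
function drawOutline(ctx, points, indices, color = '#ffffff') {
  ctx.strokeStyle = color;
  ctx.beginPath();
  indices.forEach((i, n) => {
    n === 0 ? ctx.moveTo(points[i].x, points[i].y)
            : ctx.lineTo(points[i].x, points[i].y);
  });
  ctx.stroke();
}
```

Swapping `ctx.arc` for `ctx.rect` (or any other path) is all it takes to replace the dots with different geometric shapes.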

The final step in detecting the emotion was to leverage Microsoft’s Emotion API. It’s easiest to think of this as a very large database of ‘emotional data’ that gives back a response once a video frame is sent to it. Cristiano sent each video frame to the API as a `Base64` image and received an emotion back in response. He also looked into Affectiva’s API, which he felt was superior to Microsoft’s as it provides more detailed information; however, its browser support was limited.
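The capture-and-send step can be sketched as below: draw the current video frame onto an offscreen canvas, encode it, and POST it. The `endpoint` URL and request shape are placeholders rather than a working integration; check the provider's current documentation, since Microsoft has since folded the Emotion API's functionality into the Face API.

```javascript
// Strip the "data:image/jpeg;base64," prefix so only the raw Base64
// payload remains.
function dataUrlToBase64(dataUrl) {
  return dataUrl.slice(dataUrl.indexOf(',') + 1);
}

// Capture the current video frame as a Base64-encoded JPEG
// (browser only).
function captureFrame(videoEl, canvas) {
  canvas.width = videoEl.videoWidth;
  canvas.height = videoEl.videoHeight;
  canvas.getContext('2d').drawImage(videoEl, 0, 0);
  return dataUrlToBase64(canvas.toDataURL('image/jpeg'));
}

// POST the frame to an emotion-recognition endpoint (placeholder URL;
// 'Ocp-Apim-Subscription-Key' is the header Azure Cognitive Services
// uses for API keys).
async function detectEmotion(base64Frame, endpoint, apiKey) {
  const res = await fetch(endpoint, {
    method: 'POST',
    headers: {
      'Content-Type': 'application/octet-stream',
      'Ocp-Apim-Subscription-Key': apiKey,
    },
    // Decode the Base64 string back to raw bytes for the request body.
    body: Uint8Array.from(atob(base64Frame), c => c.charCodeAt(0)),
  });
  return res.json(); // e.g. per-face scores for happiness, sadness, ...
}
```

Sending every frame over the network is also why this step, unlike the client-side tracking, is sensitive to latency and API rate limits.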

Challenges and limitations

Cristiano came across a number of challenges and limitations, mainly centred on device and browser support. According to Cristiano, the key limitations are: