Microsoft Project Oxford: Machines That Recognize Human Emotion

One of the things that separates humans from machines is our ability to recognize and distinguish between different emotions when we interact with others. With advances in Artificial Intelligence (AI), Machine Learning and computing power in general, apps and programs are emerging that can recognize speech, identify faces and now even distinguish between different emotions!

Microsoft Project Oxford is an evolving portfolio of APIs and SDKs that lets developers add intelligent services to their websites, or build apps that provide a better experience for their users, by taking advantage of Microsoft's machine learning capabilities. There are three groups of tools within the portfolio: Vision, Speech and Language.

The Emotion APIs, released for public beta this week, use Microsoft's cloud-based emotion recognition algorithms to identify emotions in the faces in a photograph. The eight emotions detected are anger, contempt, disgust, fear, happiness, neutral, sadness and surprise, which are understood to be communicated cross-culturally and universally through particular facial expressions.

Once a photograph has been uploaded, the emotion recognition tool automatically identifies the faces in it and then assigns each face a score for each of the eight emotions, with the scores adding up to 1. Now that we know how the tool works, we put it to the test with some iconic images from recent history. Let's see how the tool does, shall we?
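To make the scoring concrete, here's a minimal sketch of how such per-face results might be handled. This is our own illustration, not the official SDK: the emotion names come from the list above, but the response format and the numbers are assumptions.

```python
# Hypothetical example: for each detected face, the tool returns a score
# for each of the eight emotions, and the scores sum to 1.
# The dictionary layout below is an assumption for illustration only.

def dominant_emotion(scores):
    """Return the (emotion, score) pair with the highest score."""
    return max(scores.items(), key=lambda item: item[1])

# Made-up scores for one face (note they add up to 1):
face_scores = {
    "anger": 0.01, "contempt": 0.01, "disgust": 0.0, "fear": 0.02,
    "happiness": 0.85, "neutral": 0.10, "sadness": 0.01, "surprise": 0.0,
}

emotion, score = dominant_emotion(face_scores)
print(emotion, score)  # happiness 0.85
```

Because the scores form a distribution over the eight emotions, reading off the highest one gives the tool's "verdict" for that face, while the remaining scores show how confident it is.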

Exhibit A: President Obama Re-election Speech

Let's start with this image from President Obama's re-election speech in November 2012. I think it's fair to say that most of us would describe him as pretty happy. Well, the Emotion Recognition Tool agrees, assigning President Obama a "happiness" score of 85%.

OK, so that was fairly easy. What about Mitt Romney, who lost the 2012 election to President Obama? We thought he cut a rather sad figure, but interestingly, his score for sadness was practically non-existent. Instead, the Emotion Recognition Tool rated him as a mix of Neutral and Happy. Maybe he was just happy the campaign was finally over!

Exhibit B: 48 Hours After The Lehman Brothers Crash

Here's an image of Hank Paulson, Ben Bernanke and Tim Geithner 48 hours after the collapse of Lehman Brothers in 2008. Ben Bernanke was rated almost 100% neutral, but we found Tim Geithner's and Hank Paulson's scores more interesting. As you can see from the picture above, Tim Geithner looked pretty angry, and his scores show it, with a 40% "anger" rating. Hank Paulson, on the other hand, had the highest "fear" rating of the three, coming in at 10%.

Exhibit C: Afghan Girl - National Geographic Cover

For our final image test, we decided to mix it up a little and test the iconic "Afghan Girl" image by journalist Steve McCurry, featured on the June 1985 cover of National Geographic. To us, she looked fearful and sad, with a tinge of anger in her eyes. Let's see how the Emotion Recognition Tool scores the image.

According to the tool, her expression is mostly neutral, with "sadness" and "anger" scoring next highest, which broadly matches our initial assessment.

Applications

We see two main areas where the Emotion Tool could have a big impact. The first is security, where cameras could not only recognize the faces on screen but also identify their emotions and assess whether they pose any threat. From our experiment above, it's clear that a photograph only captures emotions at a single point in time. If the Emotion Tool could be extended to video, it would have far more samples to draw on, allowing for a more complete assessment of the subject's emotions.
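One simple way to draw on those extra samples would be to average the per-frame emotion scores across a video clip. The sketch below is our own illustration of that idea, not a Project Oxford feature; the frame dictionaries are assumed to use the same emotion names throughout.

```python
def average_scores(frames):
    """Average per-frame emotion scores into a single profile.

    `frames` is a list of dicts mapping emotion name -> score for one
    video frame; all frames are assumed to share the same keys.
    """
    if not frames:
        return {}
    totals = {emotion: 0.0 for emotion in frames[0]}
    for frame in frames:
        for emotion, score in frame.items():
            totals[emotion] += score
    # Dividing by the frame count keeps the result a distribution
    # that still sums to 1, just like a single-photo score.
    return {emotion: total / len(frames) for emotion, total in totals.items()}

# Two made-up frames: a broad smile, then a more settled expression.
frames = [
    {"happiness": 0.9, "neutral": 0.1},
    {"happiness": 0.5, "neutral": 0.5},
]
print(average_scores(frames))  # {'happiness': 0.7, 'neutral': 0.3}
```

Averaging smooths out a single unflattering frame, which is exactly the advantage video would have over a snapshot.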

The other area that could use the Emotion APIs is Artificial Intelligence that interacts with humans. We previously wrote about Gmail's new Smart Reply feature, which can parse the content of an email you receive and, based on its words and sentences, come up with a short, intelligent reply. Words form only a small part of our communication; more importance is usually placed on non-verbal cues such as facial expressions. It would be the natural next step in building smart robots if they could decode not just our words and sentence structure, but also our facial expressions.

Rounding Up

Check out the Emotion Tool and the rest of Project Oxford for some cool next-generation technology. Follow us for more interesting tech stories, cool sites and gadget buying guides!

Tech enthusiast who has worked in Tech and Finance for two of the largest companies on the planet. Often the go-to guy in the office for tech advice, whether it's about hardware or software, I'm now putting it down in writing so it can be Googled by everyone.