Head-up display uses facial recognition and augmented reality

Scouter is a facial recognition system and head-up display that [Christopher Mitchell] developed for his Master’s Thesis. The wearable device combines the computing power of an Eee PC 901 with a Vuzix VR920 wearable display and a Logitech QuickCam 9000. The camera is mounted face-forward on the wearable display like a third eye, and the live feed is patched through to the wearer. [Christopher’s] software scans, identifies, and displays information about the people in the camera frame at six frames per second.

We can’t help but think of the Gargoyles in Snow Crash. This rendition isn’t quite that good yet; there are several false positives in the test footage after the break, though there are more correct identifications than false ones. The fact that he’s using inexpensive off-the-shelf hardware is promising. This shouldn’t be too hard to distill down to an inexpensive dedicated system.

Thanks! Yeah, the lack of capabilities was a huge challenge for me. Everything realtime is hand-written in C and C++, and uses Intel’s IPP library (which contains hand-optimized ASM) for the convolution operations. A 900MHz port would operate at abysmal speeds. :)

People always underestimate the requirements needed for image processing.
2-D convolution filters are computationally expensive.
OpenCV really makes the development process a lot easier than other development routes, which has led some researchers to stray from the typical Matlab/Scilab platform. I just wish that Intel would implement a VHDL synthesis tool into OpenCV to exploit the concurrency of FPGAs.
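To make that cost concrete, here is a minimal pure-Python sketch of a naive 2-D convolution (purely illustrative, not code from the thesis): every output pixel costs k×k multiply-adds, which is exactly the hot loop that hand-optimized libraries like Intel’s IPP exist to accelerate.

```python
def convolve2d(image, kernel):
    """Naive 2-D convolution ('valid' region only): each output pixel
    costs kh*kw multiply-adds, so an HxW frame needs ~H*W*kh*kw operations
    per filter, per frame."""
    kh, kw = len(kernel), len(kernel[0])
    oh = len(image) - kh + 1
    ow = len(image[0]) - kw + 1
    out = [[0.0] * ow for _ in range(oh)]
    for y in range(oh):
        for x in range(ow):
            acc = 0.0
            for i in range(kh):
                for j in range(kw):
                    acc += image[y + i][x + j] * kernel[i][j]
            out[y][x] = acc
    return out

# A 640x480 frame with a single 5x5 kernel already needs
# 640 * 480 * 25 ≈ 7.7 million multiply-adds -- and a real
# pipeline runs many such filters on every frame.
```

At six frames per second that per-filter count alone lands in the tens of millions of operations per second, which is why the Atom struggles without SIMD-optimized inner loops.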

Ryan, you make an excellent point: the Atom CPU, with its near-total lack of concurrency, would benefit greatly even from a small, underpowered FPGA running fast concurrent multiplies and adds to take the convolutional load off the CPU. Although the netbook is a nice portable platform for a wearable computing system, including connectivity and decent battery life, the power is just not there for truly realtime CNNs at this point.
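One software-side mitigation worth mentioning (a general technique, not something the thesis is stated to use): many common kernels are separable, so a k×k convolution can be done as two 1-D passes costing 2k multiply-adds per pixel instead of k². A pure-Python sketch:

```python
def convolve_rows(image, k):
    """1-D horizontal pass ('valid'): k multiply-adds per output pixel."""
    n = len(k)
    return [[sum(row[x + j] * k[j] for j in range(n))
             for x in range(len(row) - n + 1)]
            for row in image]

def convolve_cols(image, k):
    """1-D vertical pass ('valid')."""
    n = len(k)
    return [[sum(image[y + i][x] * k[i] for i in range(n))
             for x in range(len(image[0]))]
            for y in range(len(image) - n + 1)]

def box_blur(image, k=3):
    """A kxk box blur done separably: 2k multiply-adds per pixel
    instead of k*k for the equivalent full 2-D kernel."""
    ones = [1.0 / k] * k
    return convolve_cols(convolve_rows(image, ones), ones)
```

For a 5×5 kernel that is a 2.5× cut in arithmetic with no hardware at all; an FPGA or SIMD unit then multiplies whatever is left.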

There are a lot of limitations to providing a truly portable embedded system for augmented reality/vision.
There are various hand-off methods for DSP co-processing with FPGAs, specifically for image processing algorithms, but a lot of the work is current (read: buggy).
Concurrent realization is the way to go for image algorithms; I just hope a better design process emerges that doesn’t require Matlab and its toolboxes alongside Xilinx’s System Generator. That is a rather costly approach, while OpenCV is free but limited in implementation.

I’m no expert at any of this, but I’m a long-time coder and graphics enthusiast going back to the Commodore days. Would any of this be suitable to offload to a GPU rather than an FPGA? I recently looked into Nvidia’s CUDA, and it appears to work with OpenCV.

A bit confused here. First, where’s the augmented reality part? Unless this isn’t Christopher Mitchell’s face on screen, I’m not seeing any augmentation of reality. Second, of the 5 guesses the computer makes, 3 of them are wrong. Can you really get a Master’s if your code doesn’t work right? I’m not trying to be snarky, even though it sounds snarky. I’m just confused.

@criznach: It very much would be, and I actually implemented a CUDA version of the CNN, but needless to say that has no applicability to the kind of portable system I was aiming for.

@Alfred: No offense taken. The main point of the thesis is the detection algorithm, i.e., the thing that detects faces as faces and potatoes as not faces. Secondly, the demo video was shot after I removed the goggles and placed them in front of me, so you’re seeing a third-person view, as if someone other than me were wearing the goggles.

@Pat: I have not yet released my code, but once I clean it up and comment it up a bit more, I plan to do so. Chances are I will not release my training set, however, as it is an aggregation of several other training sets, as acknowledged in my thesis.

I think you could find something comparable in size with more processing power. Have you seen the Nvidia S series of devices designed for small form factors? Here are a couple that could pack some punch…

Cool beans. By no means a new concept, for example, the SixthSense videos demo another take on it. But there does seem to be a distinct shortage of open-source libraries for this, OpenCV notwithstanding.

The idea of running 6FPS on a VST HMD bugs me, though. A VST HMD is annoying enough to wear for long periods already; 6FPS would be “good luck walking down the hall without hurling or smashing into something” territory. If that’s all the Eee can manage, then it shouldn’t be used as the source. Yeah, it makes the system “portable”…but the result probably isn’t practically usable.

Finally having some useful augmented reality software is gonna be pretty awesome. I’ve built two wearcomps in the past and ended up dismantling both because there was an utter lack of software for them.

It’s just a shame that OpenCV’s Haar cascade classifiers are so awful. You can see in the video that every time it detects Christopher’s face, it uses a different sized window, sometimes cutting off bits of his face, etc. I’m amazed that the face recognition component works at all with such inconsistent inputs.
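A common workaround for that jitter (a generic sketch, not anything the video shows Christopher doing) is to smooth the detector’s bounding box across frames before cropping the face for the recognizer, e.g. with a simple exponential moving average:

```python
def smooth_box(prev, current, alpha=0.3):
    """Blend the previous (x, y, w, h) detection box with the new one so
    the recognizer sees a more stable crop; alpha=1.0 disables smoothing,
    smaller alpha trusts the history more."""
    if prev is None:          # first detection: nothing to blend with
        return current
    return tuple(round((1 - alpha) * p + alpha * c)
                 for p, c in zip(prev, current))
```

The trade-off is latency: a heavily smoothed box lags behind fast head motion, so alpha has to be tuned against the 6FPS capture rate.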

Have you tried overclocking the Atom? Using SetFSB in Windows, I can overclock my Eee 1008HA’s Atom to about 1965MHz – it doesn’t sound like much, but there is a pretty noticeable performance difference. It definitely reduces battery life, though. Not sure if it’s possible on the Eee 901 or if it would help that much, but I’m sure there’s a solution for Ubuntu.