AI System Learns to Recognize Faces and Felines

By Richard Adhikari
Jun 27, 2012 7:00 AM PT

A neural network built over the years by researchers from Stanford University and Google has managed to teach itself to recognize faces and cats.

The network runs on 16,000 processor cores spread across a cluster of 1,000 computers. After being exposed to 10 million images downloaded from the Internet, the network software learned to recognize human and cat faces, as well as human bodies, according to the researchers.

"The cool thing about this particular software was that no one was telling it that there were faces in these pictures," Carl Howe, vice president, data sciences research at the Yankee Group, told TechNewsWorld. "The software is figuring out if there are faces or not based on the fact that a lot of faces were in the data, and the neurons learned to recognize these patterns."

This self-learning network "should also be able to read expressions and determine the breed of cat if given enough visual information," Rob Enderle, principal analyst at the Enderle Group, told TechNewsWorld.

Training the System

Each image was a randomly chosen frame from one of 10 million YouTube videos. Those videos were themselves selected at random.
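
That two-stage random selection can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function name, the catalog of video IDs and the frame counts are hypothetical, and nothing here reflects how the researchers actually pulled frames from YouTube.

```python
import random


def sample_frames(video_lengths, n_videos, seed=0):
    """Pick n_videos at random, then one random frame index from each."""
    rng = random.Random(seed)
    # Stage 1: choose the videos (sorted first so a given seed is repeatable).
    chosen = rng.sample(sorted(video_lengths), n_videos)
    # Stage 2: choose a single frame index inside each chosen video.
    return [(video_id, rng.randrange(video_lengths[video_id]))
            for video_id in chosen]


# Hypothetical catalog: video ID -> number of frames in that video.
catalog = {"vid_a": 300, "vid_b": 1200, "vid_c": 45, "vid_d": 900}
frames = sample_frames(catalog, n_videos=3)
```

Because no one inspects the frames before they are chosen, whatever shows up often in the videos shows up often in the training set.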

"The key to this research is that we didn't explicitly pick out particular images to train it on, such as images of a cat," Google spokesperson Jason Freidenfelds told TechNewsWorld. "Instead, we fed it 10 million random images, and it automatically learned certain recurring patterns, including roughly what a cat's face looks like. That's because, presumably, cats are relatively common on YouTube."

The team trained a nine-layer, locally connected sparse autoencoder with pooling and local contrast normalization on the 10 million images in the dataset. A sparse autoencoder is an unsupervised learning algorithm: It learns features by trying to reconstruct its own input while keeping most of its internal units inactive. The basic version doesn't match the best hand-engineered features, but the features it learns are useful for a wider range of problems. More sophisticated versions of the algorithm are as good as, or better than, features coded by hand.
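
For a rough sense of what a sparse autoencoder does, here is a minimal Python sketch. It is not the researchers' code. It assumes a single hidden layer and uses an L1 penalty on hidden activity as the sparsity term, and it leaves out the local connectivity, pooling and local contrast normalization mentioned above. The class name, parameters and toy data are all hypothetical.

```python
import math
import random


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


class SparseAutoencoder:
    """One hidden layer, linear decoder, L1 penalty on hidden activations."""

    def __init__(self, n_in, n_hidden, sparsity_weight=0.01, seed=0):
        rng = random.Random(seed)
        self.w_enc = [[rng.uniform(-0.1, 0.1) for _ in range(n_in)]
                      for _ in range(n_hidden)]
        self.b_enc = [0.0] * n_hidden
        self.w_dec = [[rng.uniform(-0.1, 0.1) for _ in range(n_hidden)]
                      for _ in range(n_in)]
        self.b_dec = [0.0] * n_in
        self.sparsity_weight = sparsity_weight

    def encode(self, x):
        return [sigmoid(sum(w * xi for w, xi in zip(row, x)) + b)
                for row, b in zip(self.w_enc, self.b_enc)]

    def decode(self, h):
        return [sum(w * hj for w, hj in zip(row, h)) + b
                for row, b in zip(self.w_dec, self.b_dec)]

    def loss(self, x):
        h = self.encode(x)
        y = self.decode(h)
        reconstruction = 0.5 * sum((yi - xi) ** 2 for yi, xi in zip(y, x))
        # Hidden activations are positive, so the L1 penalty is just their sum.
        return reconstruction + self.sparsity_weight * sum(h)

    def train_step(self, x, lr=0.1):
        """One step of gradient descent on a single unlabeled example."""
        h = self.encode(x)
        y = self.decode(h)
        err = [yi - xi for yi, xi in zip(y, x)]
        # Gradient reaching each hidden unit: reconstruction error + sparsity.
        grad_h = [sum(err[i] * self.w_dec[i][j] for i in range(len(x)))
                  + self.sparsity_weight for j in range(len(h))]
        delta = [g * hj * (1.0 - hj) for g, hj in zip(grad_h, h)]
        for i in range(len(x)):
            for j in range(len(h)):
                self.w_dec[i][j] -= lr * err[i] * h[j]
            self.b_dec[i] -= lr * err[i]
        for j in range(len(h)):
            for k in range(len(x)):
                self.w_enc[j][k] -= lr * delta[j] * x[k]
            self.b_enc[j] -= lr * delta[j]


def mean_loss(model, data):
    return sum(model.loss(x) for x in data) / len(data)


# Toy, unlabeled "images": six pixels each, with recurring on/off patterns.
data = [
    [1, 1, 0, 0, 0, 0],
    [0, 0, 1, 1, 0, 0],
    [0, 0, 0, 0, 1, 1],
    [1, 1, 1, 1, 0, 0],
]

model = SparseAutoencoder(n_in=6, n_hidden=3)
before = mean_loss(model, data)
for _ in range(200):
    for x in data:
        model.train_step(x)
after = mean_loss(model, data)
```

No labels appear anywhere in the training loop. The only signal is how well the network rebuilds its own input, which is the sense in which such a system teaches itself.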

Kitty Spotter

The researchers concluded that it's possible to train a face detector without having to label images as containing a face or not. Previously, building a face detector required images labeled as containing faces, often with a bounding box around the face. This made it difficult to solve problems where labeled data was scant.

Further, the neural network recognizes faces even when the images are slanted, enlarged or shrunk.

The researchers also trained the network to recognize 20,000 object categories from ImageNet with nearly 16 percent accuracy, a 70 percent relative improvement over the previous state of the art.
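
As a quick check on those figures, the short Python sketch below works out the earlier accuracy they imply. It assumes the 70 percent is a relative gain over the previous result and rounds the new accuracy to 16 percent. The function names are hypothetical, and the output is an implied value, not a number reported in this article.

```python
def implied_baseline(new_accuracy, relative_improvement):
    """Return the earlier accuracy implied by a relative improvement."""
    return new_accuracy / (1.0 + relative_improvement)


def relative_gain(old_accuracy, new_accuracy):
    """Return the improvement as a fraction of the old accuracy."""
    return (new_accuracy - old_accuracy) / old_accuracy


# Assumed inputs: roughly 16 percent accuracy, 70 percent relative gain.
baseline = implied_baseline(0.16, 0.70)
print(f"Implied earlier accuracy: {baseline:.1%}")
```

Under that reading, the earlier systems were right a little more than 9 percent of the time, so the gain is about 6 to 7 percentage points rather than 70.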

The neural network did more than recognize cats; it reportedly assembled a digital image of a cat by combining features from the various cat images it had seen.

Faster and Smarter

Previous learning algorithms have succeeded only in learning low-level features such as edge or blob detectors, the researchers said. High-level detection may have proved difficult because training deep-learning algorithms to yield good results is so time-consuming, they surmised.

"It appears that this research project was able to build a machine that can 'see' objects in images despite never having been told what the objects are or what they should look like," the Yankee Group's Howe said. "What's more remarkable is that the system learned to recognize a wide variety of objects and recognize the total sum of these objects with 16 percent accuracy.."

The neural network's accuracy may improve over time because, "as more information is collected and sub-categories result, you'll have the ability to make more detailed determinations," Enderle said. "This is a foundation step to building a computer that truly can be taught rather than be programmed."

Such networks could be useful in security and in tracking criminals, Enderle suggested.

However, "both kittens and babies can achieve similar results with far fewer resources, and those entities can be created with far less skilled labor," the Yankee Group's Howe pointed out.