Computer Scientists Search for the Secrets of Human Vision

By Patricia Sullivan

The new iPhone X knows your face.

You don’t need a passcode or fingerprint to unlock it. Through the wonders of face recognition, you just hold the screen up to your face and you’re in. You may be awed at the split-second accuracy of the iPhone X’s ability, but Erik Learned-Miller, a professor in the College of Information and Computer Sciences (CICS) at UMass Amherst, saw this coming a decade ago. In fact, early computer science research on campus helped lay the groundwork for computer vision, the bedrock of Apple’s Face ID technology.

Learned-Miller and Subhransu Maji, codirectors of the UMass Amherst Computer Vision Lab, and their colleagues are teaching computers to see—and to understand what they see. As computer vision improves, scientists say, machines will help us in infinite ways. For starters, rescue robots will save people from disasters, medical robots will diagnose your ailments, shopping apps will help you choose the chair that fits perfectly in the odd-shaped corner of your home, your butler robot will find your slippers, and safe self-driving cars will be the norm.

But teaching computers to see is hard. “Vision is a huge part of what our brains do,” says Learned-Miller. “So much of our intelligence is built on the ability to see something and pick it up, to understand what it might be made of and how far away it might be. These things are so easy for us that we take them for granted, but they’re critical as the underpinnings of intelligence.” And while computer vision has made progress, and that progress has accelerated in the last few years, machines still have far to go before they see as people do.

One of the thorniest problems vision scientists wanted to solve was to teach computers to see the human face. Learned-Miller began work on face recognition in the early 2000s. He says, “One way I pick research topics is to ask, ‘What are the biggest discrepancies between computer performance and human performance?’ Computers can beat people at chess and are better at complex calculations, so let’s put that aside. But people are really good at recognizing faces. It might be one of our top capabilities.”

When Learned-Miller started to explore the problem, “computers could identify a person only if they were looking straight at the camera—no glasses, no makeup, no smile—like a passport photo,” he says. “If you were looking one way with your eyebrow raised and another way with your eyebrow furrowed, the software wouldn’t work.”

So he and other scientists began to write complex algorithms to address the problems of A-PIE (aging, pose, illumination, and expression). At the same time, computing power increased exponentially. Camera technology improved. More and more images were digitized. And face recognition got better.

In 2007, Learned-Miller created a database of 13,233 internet images of 5,749 people for scientists at UMass and elsewhere to use to test the accuracy of their face recognition algorithms. For each pair of images in the database, they could ask, “Do these images show the same person or different people?” The database, called “Labeled Faces in the Wild,” spurred something of a face recognition arms race and has been cited in more than 2,000 papers to date. Top tech companies, including Facebook and Google, bragged about their results on the benchmark database, bringing wide attention to the UMass Computer Vision Lab.
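
For readers who want to see the idea in code, the same-or-different test can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the lab's method or the benchmark's official protocol: each face is assumed to be already reduced to a short list of numbers (a hypothetical "feature vector"), the comparison uses cosine similarity, and the 0.8 threshold and the toy data are invented for the example.

```python
import math


def cosine_similarity(a, b):
    """Return the cosine of the angle between two feature vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def same_person(a, b, threshold=0.8):
    """Guess "same person" when the two vectors are similar enough.

    The threshold is an assumption chosen for this toy example.
    """
    return cosine_similarity(a, b) >= threshold


def pair_accuracy(pairs, threshold=0.8):
    """Fraction of labeled pairs for which the guess matches the label."""
    correct = sum(
        1 for a, b, label in pairs if same_person(a, b, threshold) == label
    )
    return correct / len(pairs)


# Hypothetical data: (vector for image 1, vector for image 2, truly the same person?)
toy_pairs = [
    ([1.0, 0.0], [0.9, 0.1], True),
    ([1.0, 0.0], [0.0, 1.0], False),
    ([0.6, 0.8], [0.8, 0.6], True),
    ([1.0, 1.0], [1.0, 0.9], False),  # look-alikes: the sketch gets this one wrong
]

print(pair_accuracy(toy_pairs))
```

The accuracy is simply the share of pairs answered correctly. Real systems differ mainly in how they turn a photo into the feature vector, which is where most of the research effort goes.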

In the early days of “Labeled Faces in the Wild,” computers identified the faces in the database with only about 70 percent accuracy, Learned-Miller says. Now, the latest algorithms get virtually perfect results. This improvement is why, 10 years after he launched his database and 10 years after Apple debuted the iPhone, face recognition now lives in your pocket on the iPhone X.

It also lives in ever more places: you may have had your first contact with face recognition through photo tagging on Facebook. This summer, JetBlue, working with U.S. Customs and Border Protection, began to test face recognition at Boston’s Logan International Airport; passengers could have their faces scanned instead of showing a boarding pass. Law enforcement uses the technology, too. For instance, the FBI has a digital mug shot repository that can be searched using face recognition to match photos in the repository with photos of suspects.

COMPUTER VISION IN YOUR POCKET

Try these apps to see computer vision at work.

iNaturalist
Take a picture of an unfamiliar animal or plant and this app will identify it.

Google Translate
Hold your phone up to a menu or sign written in another language and it will instantly translate the text into English.

Computer vision does very well with facial recognition and on other specifically defined problems, concedes Allen Hanson, professor emeritus of computer science, “but when compared with human vision, computer vision is way, way, way behind.”

And he should know. With the late professor Edward Riseman, Hanson was the founding director, back in 1974, of the UMass Computer Vision Lab. UMass was a top institution in the field at a time when few universities were doing computer vision research. The lab’s work began with pattern recognition, trying to recognize handwritten characters, such as those found on mail envelopes. This work soon generalized into the problem of recognizing objects and spatial relationships. While many other researchers focused on simple scenes, such as stacks of colored blocks, UMass decided to look at the much more difficult problem of recognition of naturally occurring objects in outdoor scenes.

Throughout the 1980s and 1990s, Riseman and Hanson took on a slew of ambitious real-world research efforts: “We worked with doctors, sports physiologists, astronomers, conservationists, mathematicians,” recalls Hanson. One project involved determining the biomass of forest tracts through aerial image analysis. Another was a mobile robot the scientists dubbed Harvey Wallbanger. “It used to run into everything—though it got a lot better,” says Hanson.

The lab even worked with the Department of Defense on its pioneering unmanned ground vehicle program. If you were on campus in the early 1990s and spotted a Humvee with a huge camera on its roof lumbering around the intramural soccer fields, you saw one of the world’s first driverless vehicles. Recalls Hanson, “We could barely make that Humvee move 5 or 10 miles an hour because of the speed of the computers.”

Today’s successors to Riseman and Hanson, professors Learned-Miller and Maji and their CICS colleagues, are driving computer vision forward as a key component of artificial intelligence. The goal is to teach computers to recognize all kinds of objects and scenes and even to interpret those images and to act on what they see.

For his part, Learned-Miller has long been fascinated by the intersection of computer vision and machine learning. For instance, he says, you can show a child an apple or two and give him the name for it and he will be able to generalize from that what an apple is. “People are incredibly good at that,” he says, whereas “you might have to show a computer thousands of images of apples for the computer to reliably identify apples. We’re interested in teaching the computer to learn on its own from just an example or two.”

The next leap forward, Learned-Miller says, will be learning without any labeled examples at all: unsupervised learning. “I’m inspired by how children can go out in the world and learn on their own through play and exploration, through sight, and sound, and feel, without being explicitly taught anything. A child might see a completely unfamiliar fruit, a star fruit, for example, and without being given the name for it, the child will know, through his prior experience, how to categorize it as a fruit.” Unsupervised learning, where machines look for patterns and connections and come to their own conclusions, is in its early days but holds great promise.

Meanwhile, Maji’s current UMass research focuses on algorithms for high-level image recognition. He explains: “Right now, computers are very good at coarse categorization. They can identify common objects—a bird or a chair. But the computer can’t identify what materials the chair is made of or what style chair it is.” Maji works on computer vision technology that can identify styles, materials, and textures from images and train computers to label images in natural language, using such high-level recognition words as paisley, perforated, pitted, pleated, polka-dotted, porous, and potholed.

Maji’s computer vision research also encompasses shape understanding. “How do we bridge the gap between what we see in two dimensions and the real world?” he wonders. Equipped with shape understanding, a computer could navigate through its environment better and could generate a 3-D model from a sketch or a cartoon.

Maji and others expect that, before long, a new generation of computer-vision-enabled glasses or headsets will remind you of people’s names, translate for you in real time, identify every object you see, display your shopping list, and more. “If you look at your car, your glasses might remind you to get your keys,” he says. “Something like this can completely change how people interact and move through the world.”

Upholding its historic place as a world-class center of teaching and research, CICS has the tools it needs to continue pathbreaking work in computer vision and many other areas. Earlier this year, the university installed a new cluster of 400 specialized graphics processing units (GPUs), which gives UMass tremendous capacity to run sophisticated algorithms and train neural networks. Acquisition of the GPUs is the result of a $5 million grant from Massachusetts Governor Charlie Baker’s administration and the Massachusetts Technology Collaborative. The $5 million grant represents a one-third match to a $15 million gift supporting data science and cybersecurity research from the MassMutual Foundation of Springfield, Massachusetts.

The new GPUs may increase the university’s computing power as much as 40-fold, says Learned-Miller. He comments, “With things moving so fast, I’m skeptical of anyone who claims to know exactly where computer vision and artificial intelligence is going.”

“There’s a lot to do!” says Maji.

Hanson, one of the first in the field, gives human context to the challenge of making machines that see: “We have a brain; the brain is kind of slow. We have eyes; they’re not great sensors. Yet we do all these wonderful things with our brain and eye combination. Somewhere in there is a mystery waiting to be solved.”