Why facial recognition tech failed in the Boston bombing manhunt

Despite what you see on TV, facial recognition isn't a silver bullet.

The driver's license and student ID photos of Dzhokhar Tsarnaev, and images published by the FBI and Massachusetts law enforcement during the manhunt for him and his brother.

FBI, photo illustration by Sean Gallagher

In the last decade, the US government has made a big investment in facial recognition technology. The Department of Homeland Security paid out hundreds of millions of dollars in grants to state and local governments to build facial recognition databases—pulling photos from drivers' licenses and other identification to create a massive library of residents, all in the name of anti-terrorism. In New York, the Port Authority is installing a "defense grade" computer-driven surveillance system around the World Trade Center site to automatically catch potential terrorists through a network of hundreds of digital eyes.

But then an act of terror happened in Boston on April 15. Alleged perpetrators Dzhokhar and Tamerlan Tsarnaev were both in the database. Despite having an array of photos of the suspects, the system couldn't come up with a match. Or at least it didn't come up with one before the Tsarnaev brothers had been identified by other means.

For people who understand how facial recognition works, this comes as no surprise. Despite advances in the technology, systems are only as good as the data they're given to work with. Real life isn't like anything you may have seen on NCIS or Hawaii Five-0. Simply put, facial recognition isn't an instantaneous, magical process. Video from a gas station surveillance camera or a police CCTV camera on some lamppost cannot suddenly be turned into a high-resolution image of a suspect's face that can then be thrown against a driver's license photo database to spit out an instant match.

Not yet. Facial recognition technology has gotten a lot better in the past decade, and the addition of other biometric technologies to facial recognition is making it increasingly accurate. Facial recognition and other biometric and image processing technologies, such as gait recognition, helped law enforcement find the suspects in the rush of people around Copley Place that day with the help of retailers' own computerized surveillance systems.

The fact is that it's much more likely for a bank or department store to know who you are when you walk past a camera than for law enforcement to make an ID based on video footage. That's because you give retailers a lot more information to work with—and the systems they use are arguably better suited to keeping track of you than most police surveillance systems.

Three steps to (sometimes) finding the perfect match

Under the best circumstances, facial recognition can be extremely accurate, returning the right person as a potential match more than 99 percent of the time. But getting that level of accuracy almost always requires some skilled guidance from humans, plus some up-front work to get a good image. Depending on the type of facial recognition system, finding the right match usually requires three stages of processing.

Face detection and enhancement

The software looks for patterns in the image that match models in its algorithms for faces. A simpler form of this technology is used in consumer cameras, in photo apps for mobile devices, and in applications and services like iPhoto or Facebook.

In some circumstances, even detecting a face within an image can be difficult for software without human guidance. Lighting, camera angle, and facial expression can all muddle the process. A photo will often be taken from an angle that requires investigators to do preprocessing. "Typically, you'll do some preprocessing of the image," said Brian Martin, director of Biometric Research for facial recognition system provider MorphoTrust USA. "You can try to get rid of blur or the interlacing artifacts from older cameras. Some people use Photoshop to clean up the image; our company has what we call ABIS Face Examiner Workstation, which is face-specific tools to clean up an image. You can take a non-frontal looking face and physically model it as a three-dimensional image, then rotate it toward the camera and re-render a new face. So you do this sort of cleanup of the image and then submit it to the database."

At left, a face from an ATM camera video is recognized and evaluated for facial recognition quality; at right, a photo of a face is enhanced with a 3D model to improve its searchability.

If an image is too low-resolution, sometimes multiple images can be combined to create a higher-resolution composite. Lower-resolution images may still work, but the results are more likely to misidentify the person—or miss him or her completely.

"Hollywood does a pretty good job of creating a myth that you could extract a better image by enhancing and zooming where information wasn't captured," said Masayuki Karahashi, senior vice president of engineering for surveillance and video analysis technology firm 3VR. "You're not going to create more information out of nothing."

Feature registration and extraction

Next, the software tries to identify common facial features to use as reference points to extract a "faceprint"—the centers of the eyes, tip of nose, and corners of the mouth are common features used for this. Again, depending on the quality of the image, a human may have to help the software with this, marking the location of reference points to help the software along.

With the reference points set, the software then adjusts the image to "normalize" it against the images in its database—making sure the face is scaled to the same size and removing other elements of the photo that might reduce the likelihood of a match. Then it runs calculations on the image to generate a faceprint. This is a binary value based on a mathematical representation of the patterns in the face.
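As a rough illustration of the scaling part of this normalization step, the sketch below derives a rotation angle and scale factor from two eye-center reference points and applies them to another landmark. The function names, the pixel coordinates, and the 60-pixel target eye distance are assumptions made for the example, not details of any vendor's system.

```python
import math


def eye_alignment(left_eye, right_eye, target_eye_distance=60.0):
    """Return (angle_degrees, scale) that would level the eye line and
    bring the eyes to a fixed distance apart (hypothetical target)."""
    dx = right_eye[0] - left_eye[0]
    dy = right_eye[1] - left_eye[1]
    distance = math.hypot(dx, dy)
    if distance == 0:
        raise ValueError("eye centers coincide")
    angle = math.degrees(math.atan2(dy, dx))  # tilt of the eye line
    scale = target_eye_distance / distance
    return angle, scale


def normalize_point(point, left_eye, angle_degrees, scale):
    """Map a landmark into the normalized frame: left eye at the origin,
    eye line horizontal, eyes target_eye_distance apart."""
    theta = math.radians(-angle_degrees)  # rotate back by the measured tilt
    x = point[0] - left_eye[0]
    y = point[1] - left_eye[1]
    x_rot = x * math.cos(theta) - y * math.sin(theta)
    y_rot = x * math.sin(theta) + y * math.cos(theta)
    return (x_rot * scale, y_rot * scale)


# Illustrative pixel coordinates only.
left, right = (100.0, 100.0), (130.0, 140.0)
angle, scale = eye_alignment(left, right)
print(normalize_point(right, left, angle, scale))
```

After normalization the right eye should land on the horizontal axis at the target distance, so every face in the gallery is compared in the same frame.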

There are several approaches to creating a faceprint. Some systems use algorithms that measure the distance between sets of features in the normalized image, while others detect contours and "facial boundaries."

Feature extraction is "the classic way" to gather data for facial recognition, according to Parham Aarabi, a professor of computer science at the University of Toronto and CEO of facial software firm ModiFace. "Another way is to do a direct match," he noted. This technique involves using the facial image itself as the basis of comparison rather than using an algorithmic representation. "A lot of the more recent work in facial recognition has been in direct face-to-face matching," Aarabi said. Other systems use multiple images of an individual to "learn" their facial characteristics to build a model, much like the Faces feature in Apple's iPhoto.

But in all of these approaches, the more detailed a source image is, the better. More data to base the faceprint on means a higher likelihood of success in the next steps—matching and classification.

Matching and classification

The feature-based faceprint of a subject can be used in a number of ways, depending on the facial recognition application. Some systems perform additional indexing based on the images to classify the subject for narrowing searches, processing the faceprint with algorithms that can estimate the age and gender of the subject. Other characteristics, such as skin tone and facial features, can be used to help index the image as well, allowing for searches to be narrowed by race, estimated weight, or hair color.

Classification can also be used with what Martin called "short-term biometrics"—things such as gait recognition, or clothing, or other identifying features (such as a black backpack). These all can help locate a subject within a set of images or video streams. This approach was used to find the Tsarnaev brothers in surveillance video and other images collected from multiple sources by law enforcement. Video analysis showed Dzhokhar walking quickly and calmly away from the site of the second bomb as the first exploded; retailers used characteristics such as the brothers' ball caps and backpacks to quickly pick the suspects out of their footage. These businesses had surveillance systems from vendors such as 3VR that could recognize relevant footage in their systems to provide to law enforcement.
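A minimal sketch of this kind of attribute search, assuming each stored clip has already been tagged by the video analysis software, might look like the following. The `Clip` record, its field names, and the tag strings are hypothetical and do not reflect 3VR's actual data model.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Clip:
    camera: str        # hypothetical camera identifier
    start: str         # clip start time as text, for display only
    tags: frozenset    # attributes detected in the clip


def find_clips(clips, required_tags):
    """Return the clips whose tags include every required attribute."""
    required = set(required_tags)
    return [clip for clip in clips if required <= clip.tags]


# Made-up index entries for illustration.
index = [
    Clip("door-1", "14:38", frozenset({"white cap", "black bag"})),
    Clip("door-1", "14:41", frozenset({"red jacket"})),
    Clip("aisle-3", "14:45", frozenset({"white cap"})),
]

for clip in find_clips(index, ["white cap", "black bag"]):
    print(clip.camera, clip.start)
```

Because the search runs over precomputed tags rather than raw pixels, narrowing terabytes of footage becomes a simple filter over metadata.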

"The fact that they were able to start looking for a person with a white baseball cap, a black bag—they were able to use those as variables to pull up videos," said Masayuki Karahashi, 3VR's senior vice president of engineering. Several 3VR customers were able to automatically pull results from their systems to provide to law enforcement from terabytes of video footage from the day.

Finding the actual identity of someone in an image still requires a match against a facial database. In a facial recognition search, the binary faceprint of the subject is checked against those of a collection of "candidate" images. The bigger the pool of "candidates," the longer it takes to find a match—and the larger the pool of possible matches will likely be.
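The matching step can be sketched as a nearest-neighbor search. This example assumes a faceprint is a fixed-length list of numbers compared by Euclidean distance, which is only one of several possible representations; the names `rank_candidates` and `gallery` and the sample values are made up for illustration.

```python
import math


def faceprint_distance(a, b):
    """Euclidean distance between two equal-length faceprints."""
    if len(a) != len(b):
        raise ValueError("faceprints must have the same length")
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def rank_candidates(probe, gallery, top_k=3):
    """Compare the probe against every gallery entry and return the
    top_k closest as (name, distance) pairs, best match first."""
    scored = [(faceprint_distance(probe, fp), name)
              for name, fp in gallery.items()]
    scored.sort()
    return [(name, dist) for dist, name in scored[:top_k]]


# Tiny made-up gallery; real faceprints have far more dimensions.
gallery = {
    "candidate A": [0.0, 0.0],
    "candidate B": [1.0, 1.0],
    "candidate C": [3.0, 4.0],
}

print(rank_candidates([0.9, 1.1], gallery, top_k=2))
```

Every gallery entry is scored once, so the work grows with the size of the candidate pool, and the result is a ranked list of possibilities for a human to review rather than a single definitive answer.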

Performing matching, like everything else in facial recognition, requires significant computation resources. "Given how fast computers have become, it's not that much of an issue," said Aarabi. "If you narrow down a database to 10 million potential matches, that can be done in a reasonably short amount of time, so matching is not really a bottleneck anymore."

According to some National Institute of Standards and Technology benchmarks performed in 2010 (PDF), "Using the most accurate face recognition algorithm, the chance of identifying the unknown subject (at rank 1) in a database of 1.6 million criminal records is about 92 percent." But the study found that for larger data sets, such as the FBI's 12 million image database, the accuracy of searches rapidly degrades. "For other population sizes, this accuracy rate decreases linearly with the logarithm of the population size. In all cases a secondary (human) adjudication process will be necessary to verify that the top-rank hit is indeed that hypothesized by the system," the authors of the study wrote.
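The log-linear relationship the study describes can be written down directly. In the sketch below, the 92 percent baseline at 1.6 million records comes from the quoted passage, but the article does not give the slope, so `drop_per_decade` (the accuracy lost for each tenfold increase in gallery size) is left as a parameter the caller must supply. The function name is hypothetical.

```python
import math

BASELINE_GALLERY = 1_600_000   # gallery size quoted from the NIST study
BASELINE_ACCURACY = 0.92       # rank-1 accuracy quoted at that size


def rank1_accuracy(gallery_size, drop_per_decade):
    """Estimate rank-1 accuracy for another gallery size, assuming
    accuracy falls linearly with log10 of the population.

    drop_per_decade is NOT given in the article; it is an assumption
    the caller has to provide.
    """
    if gallery_size <= 0:
        raise ValueError("gallery_size must be positive")
    decades = math.log10(gallery_size / BASELINE_GALLERY)
    estimate = BASELINE_ACCURACY - drop_per_decade * decades
    return max(0.0, min(1.0, estimate))  # keep the result a valid rate
```

Whatever the real slope is, the shape of the relationship means each tenfold increase in database size costs the same fixed amount of accuracy.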

Under ideal conditions, a facial recognition scan can at least come close to how such things play out in the movies. And even though facial recognition requires significant computing power to pull off, cloud computing and improved graphics processing are making it a lot easier to deploy—even to consumer devices. In testimony before the Senate Judiciary Committee last July, MorphoTrust's Martin told senators, "The technology is currently at a state where these face recognition algorithms can be deployed in anything from cell phones to large multiserver search engines capable of searching over 100 million faces in just a few seconds with operational accuracy."

Useful as it can be, the potential for gross misuse of the tech, particularly in light of how common LEO and governmental misuse of existing technologies is, is fairly scary to me.

We already can't count on various organizations not demanding cell records and the like en masse; the potential for them to similarly misuse private security cameras to bring about almost omniscient surveillance is kinda scary.

I don't mind security cameras, in general, because they're pretty specific: you go to a gas station, and there's a camera watching the door and some general coverage of the interior to inhibit shoplifting. But combining a whole downtown's worth of cameras and some vein of biometric identification with the government's lackadaisical attitude towards citizen privacy completely changes the equation.

It's rather like my attitude towards police use of drones: I don't mind them being used in a specific manner (e.g. SAR, a manhunt, known narcotic paths), but I strongly dislike the idea of using them for basic wide-area observation (just flying around the town watching for anything interesting). However, as I'm not really confident in LEO's ability to not misuse them, I tend to be fairly leery.

Accept mass blanketing of public spaces with security cameras when government accepts publicly accessible versions of the same in all official offices, bathrooms and hallways of power as well as at all meetings with registered lobbyists. I mean if they aren't doing anything they're ashamed of what's the issue?

We get the worst of the two at the end: no privacy and terrorism still...

Trying to pinpoint terrorists is very difficult due to the base rate: only 0.0000000000001% of the population are terrorists, so a good test has to be 1 - 0.0000000000001% = 99.9999...99999% accurate to be effective. Otherwise, you end up with many, many more false positives than real catches. This is not a problem of how good or complete your biometrics database is.

So pick your poison: getting an overwhelming amount of false positives that swamp law enforcement officers, or letting some terrorists through.
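The arithmetic behind this base-rate argument can be sketched in a few lines. All of the numbers below (a population of one million, 10 real targets, a test that catches 99 percent of targets and wrongly flags 1 percent of everyone else) are illustrative assumptions, not figures from the article or the comment above.

```python
def screening_outcome(population, targets, true_positive_rate,
                      false_positive_rate):
    """Expected results of screening a whole population for rare targets.

    Returns (expected real catches, expected false alarms,
    share of all flags that are real).
    """
    innocents = population - targets
    true_hits = targets * true_positive_rate
    false_alarms = innocents * false_positive_rate
    flagged = true_hits + false_alarms
    precision = true_hits / flagged if flagged else 0.0
    return true_hits, false_alarms, precision


# Illustrative assumptions only.
hits, alarms, precision = screening_outcome(1_000_000, 10, 0.99, 0.01)
print(f"expected real catches: {hits:.1f}")
print(f"expected false alarms: {alarms:.1f}")
print(f"share of flags that are real: {precision:.3%}")
```

With these assumed rates, false alarms outnumber real catches by roughly a thousand to one, even though the test sounds very accurate.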

It's really sad that a differentiation needs to be made between TV dramas and real life. It's also really sad how low scientific literacy is, especially with technology as pedestrian as digital photography. You see the "tech gurus" on CSI Miami blowing up a 7-11 video feed and getting professional head-shot like quality from 250 feet away, with what is most likely a VGA camera. There's no understanding that the quality of the image is strongly correlated with the information captured (MP). You can't create that information if it isn't there.

A brief summary of TV Crime drama vs Real Life crime:

TV: A house gets broken into and robbed. The TV cop comes the next day and collects a few convenient fibers, hairs, and prints. He takes them back to the lab at 5:00PM and gets the results back the next morning, while matching the 8 perfect prints to an unknown bad guy in their master bad guy database.

Real life (true story): My friend's house gets broken into while he was on vacation. The thieves stole his Xbox and about $350 worth of games. A detective comes 3 days later. There are no prints or fibers, and only one hair, which is suspected to belong to my friend's live-in girlfriend (same color and length). The detective says he needs to get special permission to get the hair tested because the case only qualified for petty larceny. When asked when they would get the results of the DNA test back (should one be undertaken), the detective replied the fastest was 8 weeks, but an estimate of around 12 weeks was more reasonable.

We get the worst of the two at the end: no privacy and terrorism still...

Trying to pinpoint terrorists is very difficult due to the base rate: only 0.0000000000001% of the population are terrorists, so a good test has to be 0.0000000000001% accurate to be effective. Otherwise, you end up with many, many more false positives than real catches. This is not a problem of how good or complete your biometrics database is.

So pick your poison: getting an overwhelming amount of false positives that swamp law enforcement officers, or letting some terrorists through.

--B

I think you mean a test has to be beyond 99.9999999999999% accurate...

It's rather like my attitude towards police use of drones: I don't mind them being used in a specific manner (e.g. SAR, a manhunt, known narcotic paths), but I strongly dislike the idea of using them for basic wide-area observation (just flying around the town watching for anything interesting). However, as I'm not really confident in LEO's ability to not misuse them, I tend to be fairly leery.

Indeed. The common term for this kind of thing is a "fishing expedition". My bet is that if, for example, you're an attractive female who likes to sunbathe in her fenced-in backyard or apartment roof, you're going to end up on a lot of police "watch lists".

Wasn't there a study in the UK that monitored how security camera operators used their cameras? It basically found that the pan and zoom functions were used to focus in on young black males around 10% of the time and young attractive women 80+% of the time. There was another study that attempted to monitor the attention of people watching security cameras by using eye-tracking cameras in the security rooms; it was shut down because the monitor watchers objected to having cameras watching them in the workplace.

We spent billions on this stuff, it don't really work unless the guy looks directly at the camera, in good lighting, while standing still.......

The 9/11 attack, WTC bombing in 93, USS Cole, Boston bombing, and all of the other attacks against America have probably cost less than a couple of million dollars. We have spent more on body armour for NYPD officers than all the attacks cost combined.......

We need a better plan. Preferably one that doesn't cost so much we go bankrupt trying to be safe, all while giving up our liberties in the name of said safety our government has yet to provide.....

To be effective, facial recognition tech would have to know a priori who you are looking for. Otherwise, every time there is an incident, you would have to run every face in the bad guys database against all faces found in images of the scene! This will never do.

What is really needed is "behavioral recognition" software. Unfortunately, that kind of software only runs in human brains.

All of this is political show. The politicians spend exorbitant amounts of money to show their constituents how they are "doing something" even when that "something" is either useless or actually detrimental to the effort it is meant to convey.

So to fight the war on "terror" or "drugs" or "pedophiles" or the next great catchphrase to attract the public's attention, they will use whatever they can to get elected and stay elected even if it's useless or marginally helpful.

Facial recognition is just another tool in that arsenal. Next we'll be hearing (more) how they can read thoughts into people's actions and declare that the solution to the war on whatever. Even though thoughts are not always (usually?) acted upon.

Aside from outright selling this data to "third parties", give an example of LEO misuse of said data?

Any images (facial recognition) of those mercenaries from http://www.thecraft.com? You'll recall the FBI and Boston PD insisting that only approved, hand-selected photos would be analyzed -- they weren't interested in the mercenaries with the black backpacks and matching black-and-tan outfits. Didn't want to discuss them at all. Kind of odd for a supposed investigation. They certainly did not want any freelance help from the Infobahn.

So now we have one dead alleged perp who's not going to be talking, and a second alleged perp who somehow received damage to his throat and will not be talking. HOW VERY CONVENIENT. So, yeah, let's put up some photos of the perps to continue with the wave of hatred for what certainly appears to be a false flag operation.

Are you for real? I'm not going to even cite the reasons you are wrong.

On another note, I still feel a little spooked about general facial recognition surveillance. Not that my life is that exciting, so I really have nothing to hide. Still a little spooked. And I don't think flying drones around is any different than your local police cruising around looking for trouble. The problem I have is when it's used for traffic enforcement or other means of generating revenue for more and fancier drones. It seems rare that police are actually preventing major crime. Perhaps to an extent, but there is no replacement for on-the-ground detective work, which is what I wish local police were spending time doing instead of trying to catch me speeding when late to work. Facial recognition and drones on non-criminal civilians... a little too much for me.

As annoying as it is to have to remind people that you can't 'enhance' a photograph with data that isn't there, or get DNA from hair in a day for petty larceny, let's remember that we USED to have to remind people that watched too much 'Perry Mason' that not all trials ended with SOMEONE confessing to the crime.

The 9/11 attack, WTC bombing in 93, USS Cole, Boston bombing, and all of the other attacks against America have probably cost less than a couple of million dollars. We have spent more on body armour for NYPD officers than all the attacks cost combined.......

A couple million? You're off by a few orders of magnitude. Just for the most recent one in Boston, I remember seeing estimates that the cost of shutting down the city while everything was going on was well into nine figures. Even ignoring security measures, cleaning up NYC wasn't exactly cheap either.

Edit: Unless I'm illiterate and you meant "the costs to carry out the attacks", which is entirely possible.

There's a segment of the population (of the sort that are Alex Jones fans) who believe the government has a secret dossier on every citizen, that the government listens to all of our phone conversations, that it watches our every move, reads all of our emails and web postings, etc. ad nauseam.

I think this episode shows just how limited our government's surveillance capabilities really are. Not to say they won't get better, but I don't lose a lot of sleep at night worrying about Big Brother watching me.

I'm sure in Alex Jones' world the government knew who these guys were from the very beginning and simply "allowed" them to kill people for some nefarious (and unprovable) purpose . . .

Indeed -- I thought of Deckard in "Blade Runner"; not only zooming in on an analog photograph, but ROTATING the image! I'm not usually one to limit what science may be able to do in the future, but being able to create information that isn't there will be a neat trick.

I think according to the latest from Alex Jones, and some elected official from New Hampshire, these guys are innocent, .... It's more a false flag inside job government operation, ... Achieving exactly what I dunno, but hey! It's from Alex so you know he's gotta have a sound reasonable logical argument loaded with verifiable facts backing it up, otherwise he wouldn't say it.

The idea that facial recognition "failed" here is a misunderstanding of how the technology works.

These systems are built and tested assuming a database of high quality images (a gallery) against which low quality images will be searched. This gallery includes only mugshots and images where the subject has been knowingly captured in balanced lighting conditions.

Low quality images like those in this article are used only for searching this type of gallery, and do return matching candidates when the subject exists in the gallery. Matching multiple poor quality surveillance photos against one another is not expected to work.

It doesn't look like there were high quality pictures of either Tsarnaev brother in anyone's database. Depending on your point-of-view, this is either an intelligence failure, or privacy working as intended. If the elder Tsarnaev had been questioned in 2012, he might have been enrolled in one of these galleries.

The article actually states that both of them should be in the gallery.

Are security cameras even HD quality these days? And you'd need good incentive to replace the cameras that are already installed as well. Maybe with a 4K sensor we might be able to get a decent amount of info from a crop, but I really wouldn't expect too much due to lack of focus.

It really doesn't matter if they work or not; the bombings will be used as justification to buy them anyway, along with more cameras and license plate readers, and whatever else can be crammed into the proposal. The public must be sold security and given comfort, even if they do not exist, and the bombings will be used as the justification for more security, everywhere!

Sean Gallagher / Sean is Ars Technica's IT Editor. A former Navy officer, systems administrator, and network systems integrator with 20 years of IT journalism experience, he lives and works in Baltimore, Maryland.