On big data, the Boston Marathon and civil liberties

For all the concerns over mobile phone logs, video footage and other data collection that could potentially be used to surveil American citizens, it’s times like this that I think we see their real value.

According to a Los Angeles Times article about Monday’s bomb attack at the Boston Marathon, the FBI has collected 10 terabytes of data that it’s sifting through for clues about what exactly happened and who did it. Maybe I’m just a techno-optimist, but I find this very reassuring.

According to the Times, “The data include call logs collected by cellphone towers along the marathon route and surveillance footage collected by city cameras, local businesses, gas stations, media outlets and spectators who volunteered to provide their videos and snap shots.”

Lots of data means lots of potential value

It’s reassuring because I’ve spoken with so many smart people over the years who can do amazing things with data. Ten terabytes isn’t a huge data set by any stretch of the imagination, but it’s plenty to work with if it’s of high quality. It’s very possible there are some needles in that haystack of call logs, and I’m optimistic the analysts within the FBI — possibly with some outside help — will be able to find them.

Techniques around video analysis and facial recognition are better than many people think, too. If there’s a way to stitch together hundreds — maybe thousands — of videos into a single truth of what happened, then I’m confident it will happen. By tracking faces and objects over time and place, we can recreate a crime and track down suspects without relying on after-the-fact accounts by witnesses who weren’t paying any attention until the bomb actually went off.

It’s not that witnesses are lying; it’s just that an attack like this might artificially color certain observations as being more nefarious than they really were. A Middle Easterner standing nearby might seem suspicious in hindsight, for example, but a witness might not have seen that guy cheer on a friend beforehand, stop to get a soda, and then meander over to the area where the bomb went off.

I have no clue what really happened, of course. I just know that cameras, especially hundreds of them at different angles and shooting over different timeframes, don’t suffer from selective or incomplete memories.

Can we crowdsource some surveillance?

I also find all this now-surveillance data reassuring because, if it proves useful, it might actually help to preserve our civil liberties going forward. We don’t necessarily need drones flying overhead and cameras on every corner if we can crowdsource (at least from densely populated areas or big events) relatively high-resolution videos and photos during the investigation phase. We don’t necessarily need all manner of mobile call and location tracking if we can collect what we need from the relevant area afterward.

This does little to prevent attacks, of course, and intelligence agencies will no doubt continue to trace phone calls and generally do what they do. That’s fine by me. If airports want to use facial recognition to flag known threats as they walk in the door, I’m not sure I can take issue with that either.

But by and large, it seems there’s precious little that surveillance — especially video — can do to predict crime unless an agency already knows what it’s looking for and has the means to act fast enough to make a difference. (IBM Fellow and general identity analytics guru Jeff Jonas wrote a great blog post in November about what’s actually possible to predict given the data on hand.)

So to the extent anyone thinks additional surveillance is going to help solve crimes that we didn’t see coming, I’d rather leave the data in the hands of hundreds or thousands of individuals and businesses than with a handful of city, state and federal governments that might be tempted to overstep the bounds of what’s acceptable.

Really, though, how to prevent terrorist attacks and other mass-casualty crimes is a complex issue, and I’m not sure there are many ethically right or wrong answers. But when we get past the tragedy and criminality of what happened in Boston, we have to look at it as part of the bigger picture that’s shaping up around all the data we’re generating, collecting and analyzing. If terabytes of geospatially targeted call records and crowdsourced audio-video surveillance can help solve this type of crime and spare us the time, money and privacy concerns of more intrusive and expansive government efforts, then maybe there’s something worth considering.