The Google Clones That Power NSA Surveillance

NSA computer scientist Nathanael Burton, on stage at an open source conference this summer.

Photo: Aaron Hockney

Google. Amazon. Facebook. These giants of the web have a knack for juggling enormous amounts of data across tens of thousands of computer servers. And they've been kind enough to share their "big data" methods with the world at large. Many open source software projects now let anyone build similarly enormous operations capable of juggling similarly enormous amounts of data – including the National Security Agency.

The NSA has an estimated budget of about $10 billion a year, but in building its massive online operation – an operation that helps the agency track the behavior of people across the U.S. and beyond – it too has opted for free software inspired by the likes of Google and Amazon.

The agency's Googleness was on full display this week, when the Washington Post published another document leaked by former NSA contractor Edward Snowden: a white paper detailing how the agency uses mobile location data to track people. The paper confirms once again that the agency is using Hadoop – an open source clone of technology developed at Google – to store and analyze data collected as part of its surveillance efforts.

The agency first presented some details about its use of Hadoop at a cloud computing conference in 2009, as Information Week reported at the time. But the new story ties that use of Hadoop directly to the agency's surveillance efforts. And it shows that open source knows no morality. Anyone can use it, for any purpose.

Hadoop was created by Doug Cutting and Mike Cafarella in 2005. They were inspired by a research paper Google published a year before detailing a system called MapReduce. Hadoop stores data across hundreds or even thousands of servers, and these machines then collectively process and analyze that data using the MapReduce model. Originally funded by Yahoo, Hadoop was soon adopted by Facebook and Twitter, but it's not just for web companies anymore. In recent years, many businesses – as well as government agencies – have adopted it.
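
The map-then-reduce pattern described in that Google paper can be sketched in a few lines of Python. This is a toy, single-machine illustration, not Hadoop's actual API: the function names (map_phase, shuffle, reduce_phase, word_count) and the sample text are made up for the example, and each inner list merely stands in for the data held on one server.

```python
from collections import defaultdict


def map_phase(chunk):
    """Emit a (word, 1) pair for every word in one server's chunk of text."""
    return [(word.lower(), 1) for line in chunk for word in line.split()]


def shuffle(mapped_chunks):
    """Group every emitted value by its key, the step between map and reduce."""
    groups = defaultdict(list)
    for pairs in mapped_chunks:
        for key, value in pairs:
            groups[key].append(value)
    return groups


def reduce_phase(groups):
    """Collapse each key's list of values into a single total."""
    return {key: sum(values) for key, values in groups.items()}


def word_count(chunks):
    # On a real cluster each chunk would be mapped in parallel on the
    # machine that stores it; here the chunks are processed one by one.
    mapped = [map_phase(chunk) for chunk in chunks]
    return reduce_phase(shuffle(mapped))


# Hypothetical sample data: each inner list plays the role of one server.
chunks = [
    ["open source knows no morality"],
    ["open source software", "open data"],
]
counts = word_count(chunks)
print(counts["open"])  # 3
```

The point of the split is that the map step needs only the data already sitting on each machine, so adding more servers adds more capacity. Only the grouped intermediate results have to move across the network.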

The NSA doesn't use Hadoop to save money. The agency uses Hadoop to process data at an unprecedented scale. "The object is to do things that were essentially impossible before," Randy Garrett, a director of technology for NSA's integrated intelligence program, reportedly said in 2009.

Separately, the NSA uses OpenStack, an open source cloud computing system originally developed by NASA and Rackspace. OpenStack mimics Amazon's ability to build countless virtual servers spread across thousands of physical machines. Earlier this year, at the OpenStack Summit in Portland, Oregon, NSA computer scientist Nathanael Burton explained how the agency is using the system to speed the deployment of IT projects.

What's more, the agency has built its own Google clone: a massively distributed database called Accumulo. Like Hadoop, Accumulo was based on a research paper published by Google, a paper describing a sweeping database called BigTable. There are several other BigTable clones out there, such as Cassandra and HBase, and the NSA's decision to build its own tool rather than use something that already existed got the agency into some hot water with Congress, as we reported last year. But it had good reason: It wanted tighter security controls.

"Back in 2008 when the project was started there just wasn't a viable project to latch on to," Oren J. Falkowitz, who worked on Accumulo for the NSA, told us earlier this year. "HBase existed, but it was a very different project." He also explained that HBase and Cassandra would have to be rewritten from the ground up to include the security tools offered by Accumulo.

The NSA isn't just using open source technology. It also uses commercial tools such as IBM's Netezza data analytics software. But it's the open source software that gives the agency a certain power it didn't have before the rise of Google and Amazon.

The likes of Google have publicly criticized the NSA's online surveillance programs. But the irony is that they've helped to fuel them – in at least one way.