When the first revelations about the National Security Agency’s (NSA) widespread collection of phone call metadata and Internet traffic began to surface, South Carolina Senator Lindsey Graham noted that for those not talking to terrorists on the phone, “We don’t have anything to worry about. I’m glad that activity is going on, but it is limited to tracking people who are suspected to be terrorists and who they may be talking to.”

Turns out the data collection is not so limited. In testimony yesterday before the House Judiciary Committee, National Security Agency Deputy Director Chris Inglis said that the NSA’s probing of data in search of terrorist activity extended “two to three hops” away from suspected terrorists. Previously, NSA leaders had said surveillance was limited to only two “hops” from a suspect.

If you've ever played "Six Degrees of Kevin Bacon" or used LinkedIn to try to reach someone professionally, you know how small the world of interconnected contacts can be. When you use big data tools to mine for relationships, the world gets even smaller. That third hop greatly expands the number of innocent people worldwide who could be scooped up into the NSA’s surveillance machine, and that number likely includes a good-sized share of American citizens: citizens who Senator Graham said "don't have anything to worry about."

Just how does the NSA pick who falls within those three hops? Based on what the agency has said about its programs, what Edward Snowden has leaked, and what we know about the NSA’s technical capabilities, here’s a best guess at how the NSA does it and why it matters to you. Here's a hint: by reading this article, you're one hop from me—and three hops from Hamid Karzai.

Step 1: Collecting connections

The NSA has two major sources of information about interactions between people: phone call metadata and Internet metadata. As revealed by the FISA court order leaked by Edward Snowden, the NSA has been collecting information on phone calls made through US telecommunications carriers, apparently for years.

The NSA also uses network taps at major Internet hubs to capture packet data. There’s no way the NSA can capture all Internet traffic in any useful fashion: doing so would mean a firehose-like torrent of petabytes per day, far too much to retransmit and store in data centers, despite the NSA's efforts to build a zettabyte-scale storage facility in Utah. But the NSA can collect much of the metadata from the traffic it intercepts, including the Internet addresses that send and receive the packets, as well as information like e-mail headers and Web visits. If that metadata matches a particular pattern of interest, the agency can then capture all of the associated content. But for the moment, let’s focus on the metadata.

Taken together, phone call metadata and Internet metadata amount to something like an involuntary Facebook: they can show who you talk to, when and how often, and what websites you visit. They can also hint at your interests and be used to build a “graph” of relationships between individuals (or at least between phone numbers or IP addresses).
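A relationship graph of this kind is easy to sketch. The Python below is a minimal illustration, not a description of any actual NSA system: it assumes each metadata record has already been reduced to a hypothetical `(source, destination)` pair, and it counts how often each pair of nodes interacted. All phone numbers, IP addresses, and accounts are made-up placeholders from reserved documentation ranges.

```python
from collections import Counter, defaultdict

# Hypothetical metadata records, each reduced to a (source, destination) pair.
# Phone numbers, IP addresses, and e-mail accounts all become graph "nodes".
records = [
    ("+1-410-555-0101", "+1-410-555-0199"),    # phone call
    ("+1-410-555-0199", "+1-410-555-0101"),    # return call
    ("198.51.100.7", "203.0.113.42"),          # IP-to-IP traffic
    ("alice@example.com", "bob@example.net"),  # e-mail From/To headers
]

def build_graph(records):
    """Build an undirected graph: node -> Counter of neighbors and contact counts."""
    graph = defaultdict(Counter)
    for src, dst in records:
        if src == dst:
            continue  # skip self-contacts; they add no relationship
        # Direction is ignored: a contact either way links the two nodes.
        graph[src][dst] += 1
        graph[dst][src] += 1
    return graph

graph = build_graph(records)
for node in sorted(graph):
    print(node, dict(graph[node]))
```

Treating the graph as undirected is a simplifying choice for the sketch; a real analysis could just as well keep direction and timestamps, which is where the “when and how often” comes from.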

This is a vast amount of information. And while the NSA and courts have said that there’s no expectation of privacy for metadata—comparing it to the address written on an envelope as opposed to the letter inside—the knowledge that is now derivable from bulk metadata can be highly personal and sensitive.
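The envelope analogy maps neatly onto an e-mail message, whose header lines sit apart from its body. This sketch uses Python's standard `email` package on a made-up message to separate the “envelope” (sender, recipient, date) from the “letter” (the body). Treating only those three header fields as metadata is an assumption for illustration.

```python
from email import message_from_string

# A made-up message; the blank line separates the headers from the body.
raw = """From: alice@example.com
To: bob@example.net
Date: Thu, 18 Jul 2013 09:15:00 -0400
Subject: Lunch?

Want to grab lunch at noon?
"""

msg = message_from_string(raw)

# The "envelope": header fields assumed here to count as metadata.
metadata = {field: msg[field] for field in ("From", "To", "Date")}

# The "letter": the body text of this single-part message.
content = msg.get_payload()

print(metadata)
print(content)
```

Even this stripped-down envelope records who talked to whom and when; multiplied across millions of messages, that pattern is what makes bulk metadata so revealing.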

But there’s no reason for you to fear unless you’re calling terrorists, right? Software robots are doing the analysis, not people. And robots are completely trustworthy.

Step 2: Counting the hops

Next, the NSA’s systems sort through the data using algorithms to find connections. These can be detected in near real time from Internet data or discovered in the periodic dumps of phone metadata from carriers, building upon the system’s knowledge of previous connections. The system narrows the field of potential surveillance targets through a process that’s similar to playing the game “Six Degrees of Kevin Bacon,” only in this case it’s more like “Three Degrees of Osama bin Laden.”
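Nothing public describes how the NSA's software actually updates its graph, but the idea of building on previous connections can be sketched. The hypothetical Python below keeps hop counts from a single suspect and, when a new contact arrives, updates only the nodes whose distance gets shorter instead of recomputing everything. The `add_contact` name and the three-hop cap are assumptions for illustration.

```python
from collections import deque

MAX_HOPS = 3  # assumed cutoff, per the "two to three hops" testimony

def add_contact(graph, dist, u, v, max_hops=MAX_HOPS):
    """Record a new u-v contact and update hop counts from the suspect.

    graph: node -> set of neighbors; dist: node -> hops from the suspect,
    holding only nodes within max_hops. Both are updated in place.
    """
    graph.setdefault(u, set()).add(v)
    graph.setdefault(v, set()).add(u)
    queue = deque()
    # If the new edge gives either endpoint a shorter path, start from it.
    for a, b in ((u, v), (v, u)):
        if a in dist:
            new = dist[a] + 1
            if new <= max_hops and new < dist.get(b, max_hops + 1):
                dist[b] = new
                queue.append(b)
    # Breadth-first propagation of the improvement, stopping at max_hops.
    while queue:
        node = queue.popleft()
        new = dist[node] + 1
        for neighbor in graph[node]:
            if new <= max_hops and new < dist.get(neighbor, max_hops + 1):
                dist[neighbor] = new
                queue.append(neighbor)
    return dist

graph = {"suspect": set()}
dist = {"suspect": 0}
for u, v in [("suspect", "A"), ("A", "B"), ("B", "you"), ("you", "D")]:
    add_contact(graph, dist, u, v)
print(dist)  # {'suspect': 0, 'A': 1, 'B': 2, 'you': 3}
```

`D` stays outside the boundary at four hops. If the suspect later contacts `you` directly, `add_contact(graph, dist, "suspect", "you")` drops `you` to one hop and pulls `D` in at two. The sketch only handles new contacts; expiring old records would need a different kind of update.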

Inglis said that the NSA looks at two to three hops from a suspect. To determine how many hops you are from Osama, for example, the NSA’s data analysis software constantly plows through this information and builds a model of the relationships among every phone number and IP address on record. Other software robots query the graph to discover which “nodes” (phone numbers, IP addresses, and e-mail accounts) fall within three degrees of separation of an established suspect.

If you have a direct relationship with a suspected terrorist or target (you’ve called them, you’ve emailed them, you’ve visited their website) that’s a “one hop” relationship; there’s a solid line connecting you to that person in the NSA’s relationship graph. If you talk with, e-mail, or visit the Facebook page or website of someone who’s got a one-hop relationship, you’re two hops away. Add one more person in between in the graph, and you’re three hops away.
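In graph terms, this hop counting is a breadth-first search outward from the suspect. Here is a minimal Python sketch, assuming the graph is a plain dictionary of neighbor sets; the toy graph and the `hops_from` name are hypothetical.

```python
from collections import deque

def hops_from(graph, seed, max_hops=3):
    """Breadth-first search: map each node within max_hops of seed to its hop count."""
    distance = {seed: 0}
    queue = deque([seed])
    while queue:
        node = queue.popleft()
        if distance[node] == max_hops:
            continue  # don't expand past the hop limit
        for neighbor in graph.get(node, ()):
            if neighbor not in distance:
                distance[neighbor] = distance[node] + 1
                queue.append(neighbor)
    return distance

# Hypothetical toy graph: each key lists the nodes it has been in contact with.
graph = {
    "suspect": {"A"},
    "A": {"suspect", "B"},
    "B": {"A", "you"},
    "you": {"B", "D"},
    "D": {"you"},
}

print(hops_from(graph, "suspect"))  # {'suspect': 0, 'A': 1, 'B': 2, 'you': 3}
```

`A` is one hop out, `B` two, and `you` three; `D`, a contact of `you`, sits at four hops and is never reached. Raising `max_hops` by one would sweep `D` in as well, which is exactly why each added hop matters.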

Step 3: Digging deeper

If you're within three hops, you may get flagged for analysis, and then you could get extra special attention, such as a secret FISA warrant request to use PRISM for access to your data on cloud providers’ servers.

Under the NSA’s FISA requests, Google, Microsoft, and other Internet services companies can be compelled to hand over relevant data from their servers on any account that falls within the three-hop range and is flagged as belonging to a person of interest. If you’ve won this lottery, the NSA will get access to your e-mails on Gmail or Outlook.com as well as your chats and Web-stored contacts, your documents, your synced data from computers and mobile devices, your backups, and anything else that can be handed over—at least, so the documents Snowden leaked imply.

Your raw Internet traffic will get more attention as well. Your IP address will be watched more carefully by deep packet inspection hardware at the NSA’s 'Net taps, and what you do online will get extra scrutiny.

If your behavior is anomalous enough, and if you’re a US resident, the NSA will pass the surveillance over to the FBI. Otherwise, your data will be collected and analyzed until it’s determined that you have nothing to do with the alleged terrorist; how long that process takes (and how long the data is retained after analysis) is unknown.

It’s a small world after all

Unfortunately, it doesn’t take much to hit the three-hop jackpot: a large percentage of the world's population (and of the US population) could easily be classified, without knowing it, as being within three degrees of separation of a suspected terrorist.

A great deal of research has been done into the interconnectedness of people in the Internet age. Social scientists, mathematicians, and computer scientists have explored the “small world” phenomenon with studies and experiments for over 50 years, and their findings show that the "small world" keeps getting smaller as technology advances. In 1979, Ithiel de Sola Pool, founder and chair of MIT’s political science department, and the University of Michigan’s Manfred Kochen published a paper titled "Contacts and Influence," which drew on a decade of research into social networks. De Sola Pool and Kochen posited that “in a country the size of the United States, if acquaintanceship were random and the mean acquaintance volume were 1,000, the mean length of minimum chain between pairs of persons would be well under two intermediaries.”

In other words, if the average person in the US has contact with and is acquainted with 1,000 others (through brief interactions, such as an e-mail or a phone call, or through stronger associations), then the typical chain linking any two people in the US runs through fewer than two intermediaries, which is to say three hops or fewer. Ergo, if a suspected terrorist has a typical number of contacts in the US, chances are good that you are within three hops of that suspect.
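The arithmetic behind de Sola Pool and Kochen's estimate can be roughed out in a few lines. This sketch uses a naive “tree” model that assumes nobody's acquaintances overlap, so its numbers are upper bounds; the 1979 US population of about 225 million is an assumed round figure, and the function names are hypothetical.

```python
def naive_reach(n_contacts, hops):
    """Upper bound on people reachable at exactly `hops` steps, assuming no overlap."""
    return n_contacts ** hops

def hops_to_cover(n_contacts, population):
    """Smallest hop count at which the naive reach meets or exceeds the population."""
    if n_contacts < 2:
        raise ValueError("need at least 2 contacts per person for the reach to grow")
    hops = 1
    while naive_reach(n_contacts, hops) < population:
        hops += 1
    return hops

N_CONTACTS = 1_000                # de Sola Pool and Kochen's mean acquaintance volume
US_POPULATION_1979 = 225_000_000  # assumed approximate figure

for hops in range(1, 4):
    print(f"{hops} hop(s), {hops - 1} intermediaries: "
          f"up to {naive_reach(N_CONTACTS, hops):,} people")
print("hops needed to cover the population:",
      hops_to_cover(N_CONTACTS, US_POPULATION_1979))
```

Two hops (one intermediary) reach at most a million people, far short of the population, while three hops (two intermediaries) overshoot it more than fourfold. That is why the typical chain lands between the two, even before accounting for overlapping circles of friends.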

The actual degrees of separation between people may be somewhat larger today because the population of the US has grown significantly since 1979, but our interconnectedness with the world at large has grown as well, multiplying the potential links between people. Live in a major metropolitan center in the US and you’re bound to be two degrees of separation away from someone in a country that’s of interest to the NSA. For example: I have been a regular customer of restaurants owned by Baltimore’s Karzai family, which is headed by a brother of Afghan President Hamid Karzai. That's two hops. I’m also, according to LinkedIn, two degrees of separation away from President Obama. Am I a good guy or a bad guy?

The Internet has blown the level of interconnectedness through the proverbial roof. We now have e-mail, social media, and instant message interactions with people we’ll never meet in real life and in places we’ll never go. A 2007 study by Carnegie Mellon University machine learning researcher Jure Leskovec and Microsoft Research’s Eric Horvitz found that the average number of hops between any two arbitrary Microsoft Messenger users, based on interaction, was 6.6. And a study of Twitter feeds published in 2011 found the average degree of separation between random Twitter users to be only 3.43.
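Figures like 6.6 and 3.43 are averages of shortest-path lengths between pairs of users. On a toy graph, that average can be computed by running a breadth-first search from every node, as in this hypothetical Python sketch; studies of networks with hundreds of millions of users generally have to estimate it from a sample of starting nodes instead.

```python
from collections import deque

def shortest_hops(graph, start):
    """Breadth-first search: hop count from start to every reachable node."""
    dist = {start: 0}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        for neighbor in graph[node]:
            if neighbor not in dist:
                dist[neighbor] = dist[node] + 1
                queue.append(neighbor)
    return dist

def average_separation(graph):
    """Mean shortest-path length over all connected pairs of distinct nodes."""
    total = pairs = 0
    for node in graph:
        for other, hops in shortest_hops(graph, node).items():
            if other != node:
                total += hops
                pairs += 1
    # Each unordered pair is counted twice (once from each end), which cancels.
    return total / pairs

# Hypothetical toy networks
chain = {"a": {"b"}, "b": {"a", "c"}, "c": {"b", "d"}, "d": {"c"}}
star = {"hub": {"x", "y", "z"}, "x": {"hub"}, "y": {"hub"}, "z": {"hub"}}

print(average_separation(chain))  # 1.6666666666666667
print(average_separation(star))   # 1.5
```

The star network shows why well-connected nodes matter: routing everything through one hub keeps every pair within two hops, the same shrinking effect heavily connected accounts have in real networks.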

So even if the NSA limited its surveillance activities (and by “surveillance” I mean active probing of the content of an individual’s communications) to people within two hops of suspected terrorists, that’s a sizable population. Three hops ratchet it up to hundreds of millions or potentially billions of people, especially when the definition of a hop is based on relationships so casual we could create them by accidentally clicking on a link in a spam e-mail. So far, we know that there have been about 20,000 requests for FISA warrants to surveil domestic targets since 2001. If those warrants covered three hops from the suspects at the center of the requests, then, depending on how tightly or loosely the NSA defines a relationship, three hops could encompass as much as 50 percent of the Internet-using population of the world.

What’s the likelihood that you’ve managed to fall into that 50 percent? Well, if you live outside the US or ever talk to anyone outside the US, your odds go up. If you have contacts in parts of the world that the US government has interest in as sources of terrorism, it goes up much more. That places people like me (journalists), social activists, academics, and a large chunk of the business world in a zone of high risk for NSA surveillance.

Sure, I’m not calling terrorists, and NSA analysts are not intercepting my calls or rifling through my Gmail account. (Well—probably not.) But the chance that they are is significantly higher than the probability I would have put on that scenario two months ago, and that’s disconcerting. And while government officials insist that all of this surveillance is tightly controlled and there’s no chance of it being abused—well, talk is cheap, as they say. You'd be a fool not to at least consider the possibility that the NSA is already reading your e-mail.

Promoted Comments

The math is almost certainly much more complicated than that. I imagine the actual study must take into account the idea that, especially at the higher distances, some fraction of the contacts will be "repeats" and thus won't count. Also, each person having exactly 1000 contacts is too simplistic, and doesn't account for the idea that some insular or extroverted people will have relatively few or many contacts, and the effects that has on the average distance (probability distributions, yo!).

Yes, it's better to use this formula for the probability that two random people are within d degrees of each other in a population of size p with n connections per person: 1 - (1-1/p)^(∑_{i=1}^d n^i). This formula doesn't take everything into account, particularly the non-independence of friends, but should give a good idea of the true probability.

According to this formula, with p = 350,000,000 and n = 1,000, the probability of being within one degree is 2.8571387781228808e-6; within 2 degrees is 2.8559140987294285e-3; within 3 degrees is 0.9427314035071006. So if you're in the US, you're highly likely to be within 3 degrees of a random person in the US.
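The commenter's numbers can be reproduced directly. This Python sketch implements the quoted formula as written; the only change is evaluating it with `math.log1p` and `math.expm1`, which avoids floating-point trouble when 1/p is tiny. Like the formula itself, it assumes contacts are chosen independently and at random, and `prob_within` is a hypothetical name.

```python
import math

def prob_within(d, n, p):
    """P(two random people are within d hops), per the formula
    1 - (1 - 1/p) ** (n + n**2 + ... + n**d)."""
    exponent = sum(n ** i for i in range(1, d + 1))
    # (1 - 1/p) ** exponent == exp(exponent * log(1 - 1/p)); log1p/expm1
    # keep precision when 1/p is very small.
    return -math.expm1(exponent * math.log1p(-1 / p))

for d in (1, 2, 3):
    print(d, prob_within(d, n=1_000, p=350_000_000))
```

Trying a smaller `n` shows how much hinges on what counts as a contact: with `n = 100`, the same formula puts the chance of being within three degrees below 0.3 percent.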


Sean Gallagher
Sean is Ars Technica's IT and National Security Editor. A former Navy officer, systems administrator, and network systems integrator with 20 years of IT journalism experience, he lives and works in Baltimore, Maryland. Email: sean.gallagher@arstechnica.com // Twitter: @thepacketrat