When the first revelations about the National Security Agency’s (NSA) widespread collection of phone call metadata and Internet traffic began to surface, South Carolina Senator Lindsey Graham noted that for those not talking to terrorists on the phone, “We don’t have anything to worry about. I’m glad that activity is going on, but it is limited to tracking people who are suspected to be terrorists and who they may be talking to.”

Turns out the data collection is not so limited. In testimony yesterday before the House Judiciary Committee, National Security Agency Deputy Director Chris Inglis said that the NSA’s probing of data in search of terrorist activity extended “two to three hops” away from suspected terrorists. Previously, NSA leaders had said surveillance was limited to only two “hops” from a suspect.

If you've ever played "Six Degrees of Kevin Bacon" or used LinkedIn to try to reach someone professionally, you know how small the world of interconnected contacts can be. When you use big data tools to mine for relationships, the world gets even smaller. That third hop greatly raises the odds that innocent people worldwide get scooped up into the NSA’s surveillance machine, including a good-sized share of American citizens: the same citizens who Senator Graham said "don't have anything to worry about."

Just how does the NSA pick who falls within those three hops? Based on what the agency has said about its programs, what Edward Snowden has leaked, and what we know about the NSA’s technical capabilities, here’s a best guess at how the NSA does it and why it matters to you. Here's a hint: by reading this article, you're one hop from me—and three hops from Hamid Karzai.

Step 1: Collecting connections

The NSA has two major sources of information about interactions between people: phone call metadata and Internet metadata. As revealed by the FISA warrant leaked by Edward Snowden, the NSA has been collecting information on phone calls made through US telecommunications carriers, apparently for years.

The NSA also uses network taps at major Internet hubs to capture packet data. There’s no way the NSA can capture all Internet traffic in any useful fashion; that would mean a firehose-like torrent of petabytes per day, far too large to retransmit and store in data centers, despite the NSA's efforts to build a zettabyte-scale storage facility in Utah. But the NSA can collect much of the metadata from the traffic it intercepts, including the Internet addresses that send and receive the packets, as well as information like e-mail headers and Web visits. If those fall within a particular pattern of interest, the agency can then capture all of the associated content. But for the moment, let’s focus on this metadata.

Phone call metadata and Internet metadata have become something like an involuntary Facebook: they can show who you talk to, when and how often, and what websites you visit. They can also give hints of what your interests are and be used to build a “graph” of relationships between individuals (or at least between phone numbers or IP addresses).

This is a vast amount of information. And while the NSA and courts have said that there’s no expectation of privacy for metadata—comparing it to the address written on an envelope as opposed to the letter inside—the knowledge that is now derivable from bulk metadata can be highly personal and sensitive.

But there’s no reason for you to fear unless you’re calling terrorists, right? Software robots are doing the analysis, not people. And robots are completely trustworthy.

Step 2: Counting the hops

Next, the NSA’s systems sort through the data using algorithms to find connections. These can be detected in near real time from Internet data or discovered in the periodic dumps of phone metadata from carriers, building upon the system’s knowledge of previous connections. The system narrows the field of potential surveillance targets through a process that’s similar to playing the game “Six Degrees of Kevin Bacon”—only, in this case, it’s more like “Three Degrees of Osama Bin Laden.”

Inglis said that the NSA looks at two to three hops from a suspect. To determine how many hops you are from Osama, for example, the NSA’s data analysis engine software constantly plows through information and builds a model of all the relationships between every phone number on record and every IP address. Other software robots query the graph to discover which “nodes”—phone numbers, IP addresses and email accounts—fall within three degrees of separation from an established suspect.

If you have a direct relationship with a suspected terrorist or target (you’ve called them, you’ve emailed them, you’ve visited their website) that’s a “one hop” relationship; there’s a solid line connecting you to that person in the NSA’s relationship graph. If you talk with, e-mail, or visit the Facebook page or website of someone who’s got a one-hop relationship, you’re two hops away. Add one more person in between in the graph, and you’re three hops away.
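The hop counting described above can be sketched as a breadth-first search over a contact graph. This is only an illustration under stated assumptions, not the NSA's actual method: the record format (simple pairs of identifiers), the function names `build_graph` and `hops_from`, and the sample identifiers are all hypothetical.

```python
from collections import defaultdict, deque


def build_graph(records):
    """Build an undirected contact graph from (party_a, party_b) metadata pairs."""
    graph = defaultdict(set)
    for a, b in records:
        graph[a].add(b)
        graph[b].add(a)
    return graph


def hops_from(graph, seed, max_hops=3):
    """Return {node: hop count} for every node within max_hops of seed."""
    distance = {seed: 0}
    queue = deque([seed])
    while queue:
        node = queue.popleft()
        if distance[node] == max_hops:
            continue  # do not expand past the hop limit
        for neighbor in graph.get(node, ()):
            if neighbor not in distance:
                distance[neighbor] = distance[node] + 1
                queue.append(neighbor)
    return distance


# Hypothetical records: each pair stands for one call, e-mail, or site visit.
records = [
    ("suspect", "alice"),
    ("alice", "bob"),
    ("bob", "carol"),
    ("alice", "carol"),
    ("carol", "dave"),
    ("dave", "eve"),
]
graph = build_graph(records)
hops = hops_from(graph, "suspect", max_hops=3)
for who, count in sorted(hops.items(), key=lambda item: (item[1], item[0])):
    print(who, count)
```

Because breadth-first search visits nodes in order of distance, each identifier is labeled with its shortest hop count: carol lands at two hops through alice even though a longer path through bob also exists, and eve, four hops out, is never flagged.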

Step 3: Digging deeper

If you're within three hops, you may get flagged for analysis, and then you could get extra special attention, such as a secret FISA warrant request to use PRISM for access to your data on cloud providers’ servers.

Under the NSA’s FISA requests, Google, Microsoft, and other Internet services companies can be compelled to hand over relevant data from their servers on any account that falls within the three-hop range and is flagged as belonging to a person of interest. If you’ve won this lottery, the NSA will get access to your e-mails on Gmail or Outlook.com as well as your chats and Web-stored contacts, your documents, your synced data from computers and mobile devices, your backups, and anything else that can be handed over—at least, so the documents Snowden leaked imply.

Your raw Internet traffic will get more attention as well. Your IP address will be watched more carefully by deep packet inspection hardware at the NSA’s 'Net taps, and what you do online will get extra scrutiny.

If your behavior is anomalous enough, and if you’re a US resident, the NSA will pass the surveillance over to the FBI. Otherwise, your data will be collected and analyzed until it’s determined that you have nothing to do with the alleged terrorist; how long that process takes (and how long the data is retained after analysis) is unknown.

It’s a small world after all

Unfortunately, it doesn’t take much to hit the three-hop jackpot; without knowing it, a large percentage of the world's population (and the US population) could easily be classified as being in a third degree of separation from a suspected terrorist.

A great deal of research has been done into the interconnectedness of people in the Internet age. Social scientists, mathematicians, and computer scientists have explored the “small world” phenomenon with studies and experiments for over 50 years, and their findings show that the "small world" keeps getting smaller as technology advances. In 1979, chair and founder of MIT’s political science department Ithiel de Sola Pool and the University of Michigan’s Manfred Kochen published a paper titled "Contacts and Influence," which draws on a decade of research into social networks. De Sola Pool and Kochen posited that “in a country the size of the United States, if acquaintanceship were random and the mean acquaintance volume were 1,000, the mean length of minimum chain between pairs of persons would be well under two intermediaries.”

In other words, if the average person in the US has contact with and is acquainted with 1,000 others (through brief interactions, such as an e-mail or a phone call, or through stronger associations), then we’re at most two hops from anyone else in the US. Ergo, if any one person in the US is one hop from a terrorist, chances are good that you are three hops away.

The actual degrees of separation between people may be somewhat larger because the population of the US has grown significantly since 1979; our interconnectedness with the world at large has grown as well, widening the potential links between people. Live in a major metropolitan center in the US and you’re bound to be two degrees of separation away from someone in a country that’s of interest to the NSA. For example: I have been a regular customer of restaurants owned by Baltimore’s Karzai family, which is headed by a brother of Afghan President Hamid Karzai—two hops. I’m also, according to LinkedIn, two degrees of separation away from President Obama. Am I a good guy or a bad guy?

The Internet has blown the level of interconnectedness through the proverbial roof: we now have e-mail, social media, and instant message interactions with people we’ll never meet in real life and in places we’ll never go. A 2007 study by Carnegie Mellon University machine learning researcher Jure Leskovec and Microsoft Research’s Eric Horvitz found that the average number of hops between any two arbitrary Microsoft Messenger users, based on interaction, was 6.6. And a study of Twitter feeds published in 2011 found the average degree of separation between random Twitter users to be only 3.43.

So even if the NSA limited its surveillance activities—and by “surveillance” I mean active probing of the content of communications of an individual—to people within two hops of suspected terrorists, that’s a sizable population. Three ratchets it up to hundreds of millions or potentially billions of people, especially when the definition of a hop is based on relationships so casual we could create them by accidentally clicking on a link in a spam e-mail. So far, we know that there have been about 20,000 requests for FISA warrants to surveil domestic targets since 2001, but if those warrants covered three hops from the suspects at the center of the requests—depending on how tightly or loosely the NSA defines a relationship—three hops could encompass as much as 50 percent of the Internet-using population of the world.

What’s the likelihood that you’ve managed to fall into that 50 percent? Well, if you live outside the US or ever talk to anyone outside the US, your odds go up. If you have contacts in parts of the world that the US government has interest in as sources of terrorism, they go up much more. That places people like me (journalists), social activists, academics, and a large chunk of the business world in a zone of high risk for NSA surveillance.

Sure, I’m not calling terrorists, and NSA analysts are not intercepting my calls or rifling through my Gmail account. (Well—probably not.) But the chance that they are is significantly higher than the probability I would have put on that scenario two months ago, and that’s disconcerting. And while government officials insist that all of this surveillance is tightly controlled and there’s no chance of it being abused—well, talk is cheap, as they say. You'd be a fool not to at least consider the possibility that the NSA is already reading your e-mail.

Promoted Comments

The math is almost certainly much more complicated than that. I imagine the actual study must take into account the idea that, especially at the higher distances, some fraction of the contacts will be "repeats" and thus won't count. Also, each person having exactly 1000 contacts is too simplistic, and doesn't examine the idea that some insular or extroverted people will have relatively few or many contacts, and the effects that has on the average distance (probability distributions, yo!).

Yes, it's better to use this formula for the probability that two random people are within d degrees of each other in a population of size p with n connections per person: 1 - (1-1/p)^(∑_{i=1}^d n^i). This formula doesn't take everything into account, particularly the non-independence of friends, but should give a good idea of the true probability.

According to this formula, with p = 350,000,000 and n = 1,000, the probability of being within one degree is 2.8571387781228808e-6; within 2 degrees is 2.8559140987294285e-3; within 3 degrees is 0.9427314035071006. So if you're in the US, you're highly likely to be within 3 degrees of a random person in the US.
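The figures quoted above can be reproduced with a few lines of Python. This is a minimal sketch of the commenter's formula as written, 1 - (1-1/p)^(n + n^2 + ... + n^d); the function name `prob_within` is hypothetical, and the use of `log1p` and `expm1` is a choice made here to keep precision when 1/p is tiny, not something the comment specifies.

```python
import math


def prob_within(d, p, n):
    """Probability that two random people are within d degrees of each other.

    p: population size, n: connections per person. Like the formula it
    implements, this treats every connection as independent.
    """
    links = sum(n ** i for i in range(1, d + 1))  # n + n^2 + ... + n^d
    # (1 - 1/p) ** links == exp(links * log1p(-1/p)); expm1 keeps small results precise
    return -math.expm1(links * math.log1p(-1.0 / p))


for degrees in (1, 2, 3):
    print(degrees, prob_within(degrees, p=350_000_000, n=1_000))
```

With p = 350,000,000 and n = 1,000 the results come out at roughly 2.86e-6, 2.86e-3, and 0.943, in line with the numbers in the comment. The last few digits may differ slightly from a direct evaluation of the power because of floating-point rounding.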

109 Reader Comments

You can trust the government to compartmentalize its information and never use that for unethical uses, right? I mean, what harm could there possibly be in filling out the 'Ethnicity' form on the census cards? Except, of course, when the government decided to use that information to illegally detain US citizens of Japanese descent in what amounted to PoW camps.

I have to wonder how they are handling corporations (like Verizon, bill collectors, hospitals, etc). I get a call from a collection agency, that happens to also call a person of interest. This means I'm only 2 hops? What about using a facility like Boston Medical Center? Since they handled some "one-hop" patients from the Marathon bombings, does that mean that every patient they have ever treated is now a two-hop relationship and must be monitored?

Lovely. I, as with many people online, am friends with others all over the world, several in middle-eastern countries. I guess I now have zero expectation of privacy online. Perhaps the joking "Hello, NSA agents!" that I open online chats with isn't quite so foolish after all.

I have to wonder how they are handling corporations (like Verizon, bill collectors, hospitals, etc). I get a call from a collection agency, that happens to also call a person of interest. This means I'm only 2 hops? What about using a facility like Boston Medical Center? Since they handled some "one-hop" patients from the Marathon bombings, does that mean that every patient they have ever treated is now a two-hop relationship and must be monitored?

Is it treated as a corporation or is the important unit an individual? As part of my job, I interact with large numbers of people (mostly via email), including many outside this country. But because those contacts are going through one or two email accounts, I imagine it treats me as one individual rather than a function of the company.

It may be different when the emails are generic rather than referring to particular individuals. I would not be surprised if it treats a generic email as a specific individual, though. Much easier to do so than to figure out which email addresses are generic.

I created an Outlook folder entitled "NSA suggestion box" that I like to post to from time to time.

lol. Thats an awesome idea.

I already qualify. I have a Muslim Name and originally from a country of interest. I have actually been visited by FBI 3 times in the last 8 years asking some really interesting questions. All in all if there is anyone NSA is listening to its me.

As an immigrant with a scholar visitor visa (J1) it is pretty much guaranteed that NSA has paid special attention to my online activities at some point in the past. If that is true for all scholars working in US institutions, pretty much the entire US population could be flagged in a three-hop scenario.

It's quite surreal. Surreal and disturbing. The government is violating constitutional rights in secret in a way that only illegal information breaches can reveal, and is arrogant enough to try sweeping it under the rug by telling us we're "probably" not being targeted and that the US Government is still your best friend. All the while, they're throwing everything they can at the leakers to try to make an example out of them.

I don't know what exactly needs to happen, but something does. It's severely disturbing that all of this surveillance is thought of as "okay" by so many politicians and law makers.

The flip side is that the NSA has a productivity-based interest in keeping their surveillance scope manageable, or else monitoring a hypothetical 50% of the national population might well require the other 50% in human resources to sift through all the content. So I imagine there are algorithms developed to maximize the potential relevance of those second and third hops, narrowing the likelihood of false positives more than this article seems to suggest.

But as long as these programs are hidden from public accountability, it's a moot point anyway, because we have no way to tell that these systems aren't being misused or abused.

I'd like more details on how a mean acquaintanceship of 1000 people could connect most pairs of 300 million people in two hops. Is this because we're talking averages and so some individuals are highly connected?

I personally don't see how the NSA's statements exclude any citizens at all, especially if they get to decide who a "suspected terrorist" is. Using three hops of fixed size 1000 acquaintanceships, you'd only need one suspected terrorist to cover everybody. With two hops of that size, you'd only need to suspect 0.3% of the population of terrorism. That's really easy, especially if you allow the FBI's methods that any loser who can be befriended and talked into buying a fake bomb is actually a terrorist.

Is it treated as a corporation or is the important unit an individual? As part of my job, I interact with large numbers of people (mostly via email), including many outside this country. But because those contacts are going through one or two email accounts, I imagine it treats me as one individual rather than a function of the company.

It may be different when the emails are generic rather than referring to particular individuals. I would not be surprised if it treats a generic email as a specific individual, though. Much easier to do so than to figure out which email addresses are generic.

Which works OK for email - I would expect that each email bucket/account would be considered either a "node", or attached to a person. But what about phone records? In most companies, the phone switch (PBX) sends outgoing calls through a series of available lines on a "first come first served" basis. It can also (optionally) encode the outgoing line with a different caller ID (all outgoing calls have the same phone number according to the caller ID system). Add this on top of a busy call center where multiple operators share the same desk on different shifts, you now have to (a) dump all calls in one bucket/node, (b) have the PBX do logging and capture that as part of the records request (along with call center operator/line mapping data), or (c) remove this data from the database as it's unreliable.

I really doubt that they are using option (c). I suspect it's a combination of (a) and (b) depending on which company they are dealing with.

I'd like more details on how a mean acquaintanceship of 1000 people could connect most pairs of 300 million people in two hops. Is this because we're talking averages and so some individuals are highly connected?

I personally don't see how the NSA's statements exclude any citizens at all, especially if they get to decide who a "suspected terrorist" is. Using three hops of fixed size 1000 acquaintanceships, you'd only need one suspected terrorist to cover everybody. With two hops of that size, you'd only need to suspect 0.3% of the population of terrorism. That's really easy, especially if you allow the FBI's methods that any loser who can be befriended and talked into buying a fake bomb is actually a terrorist.

The only issue is the partition between US and foreign direct contacts

Level 4 - 1000 * 1000 * 1000 * 1000 = 1,000,000,000,000

And the total world population about 7,000,000,000 or about 1/150 of the level 4 contacts. So according to this back of the envelope math most people are easily within 4 degrees of separation from each other.
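The back-of-the-envelope math above can be checked with a short sketch. It assumes, as the comment does, exactly 1,000 contacts per person and no shared contacts between people, so the counts are upper bounds rather than estimates; the names `level_size`, `CONTACTS`, and `WORLD_POPULATION` are hypothetical.

```python
def level_size(contacts_per_person, level):
    """Naive count of people at a given level, assuming no shared contacts."""
    return contacts_per_person ** level


WORLD_POPULATION = 7_000_000_000  # round figure used in the comment above
CONTACTS = 1_000

for level in range(1, 5):
    size = level_size(CONTACTS, level)
    print(f"Level {level}: {size:,} (covers world: {size >= WORLD_POPULATION})")

# First level whose naive size meets or exceeds the world population
first_level = next(
    level for level in range(1, 10)
    if level_size(CONTACTS, level) >= WORLD_POPULATION
)
print("First level covering the world population:", first_level)
```

Under these assumptions, level 3 (1,000,000,000) still falls short of 7 billion and level 4 (1,000,000,000,000) overshoots it by a factor of about 143, which is where the "about 1/150" figure comes from. Real networks overlap heavily, so the true reach at each level is smaller.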

The flip side is that the NSA has a productivity-based interest in keeping their surveillance scope manageable

There are two ways to manage scope: reduce the amount of data you search to the point where you see what you're looking for, or get a bigger hammer.

And this is why they're building their new facility in Utah. Because they're concerned about the productivity of having to query multiple data sources. It's much more efficient to dump everything into a single data source and query that instead.

It's severely disturbing that all of this surveillance is thought of as "okay" by so many politicians and law makers.

It's thought of as "OK" by almost all politicians for the very same reason that the Patriot Act passed with overwhelming numbers: if you don't like it, you're a traitor. And if you're a traitor, your opposition instantly has all the public opinion weaponry they need to destroy you in your next election.

On top of that, most Senators and top House members all get briefed on this stuff on a daily basis. They all know about the basics of these programs. But if they publicly admit they exist by starting to criticize them, they have all the same whistleblower protections as Snowden and Manning.

If you have a direct relationship with a suspected terrorist or target (you’ve called them, you’ve emailed them, you’ve visited their website) that’s a “one hop” relationship; there’s a solid line connecting you to that person in the NSA’s relationship graph. If you talk with, e-mail, or visit the Facebook page or website of someone who’s got a one-hop relationship, you’re two hops away.

So is it now time to drop all our friends and acquaintances who might have above average curiosity, any interest in crime, politics, international affairs, etc? Or for them to drop us?

Please remove all friends you suspect of having anti-American views. This way we can know which friends of yours should be added to our top-tier list of suspects. Thank you for your cooperation, citizen.

The only issue is the partition between US and foreign direct contacts

Level 4 - 1000 * 1000 * 1000 * 1000 = 1,000,000,000,000

And the total world population about 7,000,000,000 or about 1/150 of the level 4 contacts. So according to this back of the envelope math most people are easily within 4 degrees of separation from each other.

The math is almost certainly much more complicated than that. I imagine the actual study must take into account the idea that, especially at the higher distances, some fraction of the contacts will be "repeats" and thus won't count. Also, each person having exactly 1000 contacts is too simplistic, and doesn't examine the idea that some insular or extroverted people will have relatively few or many contacts, and the effects that has on the average distance (probability distributions, yo!).

If you want to have 'fun' with the NSA, get an ad listed on a major ad server that loads random "pages of interest" into an unseen iframe (or similar) when the ad is loaded to a user's browser. Do it with the ad server for a site like cnn.com, foxnews.com, etc, and let the NSA sort through the mess.

The math is almost certainly much more complicated than that. I imagine the actual study must take into account the idea that, especially at the higher distances, some fraction of the contacts will be "repeats" and thus won't count.

Indeed. Imagine, albeit simplistically, a town of several hundred inhabitants who all know many of each other. The out-of-town connections would be far fewer than a 1000-per-person would imply.

De Sola Pool and Kochen posited that “in a country the size of the United States, if acquaintanceship were random and the mean acquaintance volume were 1,000, the mean length of minimum chain between pairs of persons would be well under two intermediaries.”

I had read "two intermediaries" as "two hops", when actually it's three. It's abundantly clear that three hops is enough, as I mentioned in the comment you replied to.

I already qualify. I have a Muslim Name and originally from a country of interest...

Great, thanks jerk. Now the NSA is wiretapping all of us, too.

That's the thing, we've no way of knowing how they define "contacts". But based on previous NSA misstatements and backtracks, visiting the same popular website could very well count. They've lost all credibility.

I recently discovered a configuration error on one of my mail servers that was causing incoming mail to disappear into a black hole. I was wondering if I could submit a FOIA request to get them back from the NSA. Seems like the govt could raise a little money by doing something useful. Just saying...

I already qualify. I have a Muslim Name and originally from a country of interest. I have actually been visited by FBI 3 times in the last 8 years asking some really interesting questions. All in all if there is anyone NSA is listening to its me.

Great, thanks jerk. Now the NSA is wiretapping all of us, too.

Just to give you even more cause for concern...I suspect that anyone who posts responses to other posters in Ars forums (as I am now doing) qualifies as a potential person of interest, since Edward Snowden used to post here, and many of the responders to his posts of the past are current active members with whom each of us may have had a "dialogue".

The author has already clarified that he has connections to people of interest; but let's take it a step further. We already know that Edward Snowden socialized on Ars during his formative years, then went on to be a traitor to his country. These two points, plus all of the anti-American sentiment being expressed on this website, makes Ars Technica a terrorist organization, breeding future generations of traitors. I mean, the site even hosts many how-to guidelines for encrypting and anonymizing online communications, serving as a veritable recipe book for enemies of the state everywhere!

I already qualify. I have a Muslim Name and originally from a country of interest. I have actually been visited by FBI 3 times in the last 8 years asking some really interesting questions. All in all if there is anyone NSA is listening to its me.

Well, since you bring it up, could you turn your radio down, please? The feedback is really starting to give me a headache.

Anyone NOT worried about the possible (and eventually likely) abuse of the data collected by the NSA needs to only think of long-time head of the FBI, J. Edgar Hoover. He literally blackmailed presidents and other public officials with information collected by the agency.

I have no doubt that those working on this program have the best intentions. I believe they truly think they are keeping us safe. But the road to hell is paved with good intentions.
