June 5, 2014

Some numbers about NSA's data collection

(Updated: July 16, 2017)

Today it is exactly one year since the Snowden leaks started. Among the many highly classified documents that were disclosed during the past year are various charts that provide us with actual numbers about the amount of data the National Security Agency (NSA) is collecting.

Here we will take a look at those numbers and see what we can learn from them by comparing various sources and by breaking them down into NSA divisions, countries and collection programs. As only fragments have been published so far, this overview cannot be complete or fully accurate (estimates are shown as round numbers).

The most detailed numbers about NSA's data collection are from the BOUNDLESSINFORMANT tool, which is used by NSA officials to view the metadata volumes collected from specific countries or by specific programs.

A worldwide overview is provided by a heat map which was published by The Guardian on June 11, 2013. It displays the figures over a 30-day period ending in March 2013:

NSA worldwide total:        221.919.881.317

Internet records (DNI):      97.111.188.358
Telephony records (DNR):    124.808.692.959

This total of 221 billion telephony and internet records a month equals about 2,7 trillion a year and 7,4 billion a day. However, the actual number of records NSA collects worldwide might be higher - see the update below.
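The arithmetic behind these round numbers can be checked with a short sketch. The two constants are the heat-map figures quoted above; the function name `extrapolate` and the choice to treat a year as twelve 30-day periods are assumptions made for illustration only.

```python
# Figures from the BOUNDLESSINFORMANT heat map (30-day period ending March 2013).
DNI_RECORDS = 97_111_188_358    # internet records
DNR_RECORDS = 124_808_692_959   # telephony records


def extrapolate(period_total: int, days_in_period: int = 30) -> dict:
    """Scale a 30-day count to a daily average and to twelve such periods."""
    return {
        "per_day": period_total / days_in_period,
        "per_month": period_total,
        "per_year": period_total * 12,  # assumption: a year is 12 x 30 days
    }


total = DNI_RECORDS + DNR_RECORDS
rates = extrapolate(total)

print(f"total per month: {total:,}")
print(f"per day:  {rates['per_day'] / 1e9:.2f} billion")
print(f"per year: {rates['per_year'] / 1e12:.2f} trillion")
```

Treating a year as twelve 30-day periods slightly understates a 365-day year, which seems acceptable for estimates that are shown as round numbers anyway.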

The BOUNDLESSINFORMANT worldwide overview for March 2013

NSA volumes and limits

The BOUNDLESSINFORMANT tool seems to be very accurate, but there's another chart that gives different numbers. It's from a 2012 presentation for the SIGINT Development conference of the Five Eyes community and shows the volumes and limits of NSA metadata collection. The chart was published by The Washington Post on December 4, 2013 and again in Greenwald's book 'No Place To Hide' on May 13, 2014.

Chart showing the volumes and limits of NSA metadata collection
between January and June 2012
Redactions by Greenwald or the press, explanations added by the author

This chart shows the numbers of:
- telephony metadata which are received by FASCIA, which is NSA's main ingest processor for telephony metadata;
- internet metadata that are transferred to MARINA, which is a huge NSA database that can store internet metadata for up to a year;
- internet metadata that had to be deleted because there was apparently not enough storage space.

Except for the deleted metadata, the chart shows ca. 10,4 billion internet metadata records (DNI) a day, which makes 312 billion a month or 3,7 trillion a year.
There are ca. 4,5 billion telephony metadata records (DNR) a day, which makes 135 billion a month or 1,6 trillion a year. If we compare these numbers with those from BOUNDLESSINFORMANT, we see a big difference:

                             Volumes and Limits          BOUNDLESSINFORMANT
                             (a month, 1st half 2012)    (a month, 1st half 2013)

Internet metadata (DNI):     312.000.000.000             97.111.188.358
Telephony metadata (DNR):    135.000.000.000             124.808.692.959

There's a difference of about 10 billion telephony metadata records between the two charts, but an even bigger gap exists for the internet metadata: the Volumes and Limits chart shows 215 billion more than BOUNDLESSINFORMANT. This discrepancy wasn't noticed in the press reports, nor in Greenwald's book, so at the moment there's no clear explanation for it.
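The size of both gaps follows directly from the monthly figures in the comparison. The sketch below uses only those published numbers; the dictionary layout and the helper name `gaps` are illustrative assumptions. With the unrounded BOUNDLESSINFORMANT figure, the telephony gap comes out closer to 10 than to 11 billion.

```python
# Monthly figures quoted in the comparison above.
VOLUMES_AND_LIMITS = {"DNI": 312_000_000_000, "DNR": 135_000_000_000}
BOUNDLESSINFORMANT = {"DNI": 97_111_188_358, "DNR": 124_808_692_959}


def gaps(a: dict, b: dict) -> dict:
    """Return a[k] - b[k] for every record type present in both sources."""
    return {kind: a[kind] - b[kind] for kind in a if kind in b}


diff = gaps(VOLUMES_AND_LIMITS, BOUNDLESSINFORMANT)
for kind, value in diff.items():
    print(f"{kind}: {value / 1e9:.1f} billion more in Volumes and Limits")
```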

Update:
A possible explanation for the discrepancies between these numbers can be found in a FAQ document for the BOUNDLESSINFORMANT tool, which says the numbers shown in the "map view" are lower than in the so-called "org view" of the tool, because the latter also counts records that don't contain the country identifiers which are needed to be counted in the "map view".
This would also explain the far bigger difference between the numbers of internet metadata, because for internet communications it is often much more difficult to attribute them to a particular country than for telephone conversations (which always contain country and region codes). This means the Volumes and Limits slide provides the more realistic numbers.

Telephony metadata

After being processed by FASCIA, the telephony metadata go to MAINWAY, which is another huge NSA database that keeps this kind of data for at least five years. In 2006 it was estimated that MAINWAY contained 1,9 trillion (1.900.000.000.000) call detail records.

For comparison: in 2007, AT&T's Daytona system, which is used to manage its call detail records (CDRs), supported 2,8 trillion records. In 2012, T-Mobile USA Inc. upgraded to an IBM Netezza 1000 platform with a capacity of 2 petabytes. This is used for loading 17 billion records a day, making 510 billion a month and more than 6 trillion a year.

If we assume the telecom providers and NSA use "records" in the same sense, then this shows that the telecommunication companies produce far more phone call metadata than NSA collects. As T-Mobile USA alone apparently creates 4 times as many records as are presented in NSA's BOUNDLESSINFORMANT tool, the domestic telephone metadata collection under section 215 of the Patriot Act cannot be included in the numbers we've seen so far.
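The factor of 4 can be reproduced from the figures above. This is a minimal sketch that assumes, like the text, that a T-Mobile "record" and an NSA "record" are comparable units and that a month has 30 days; the variable names are illustrative.

```python
TMOBILE_RECORDS_PER_DAY = 17_000_000_000   # loaded into the Netezza platform
NSA_DNR_PER_MONTH = 124_808_692_959        # BOUNDLESSINFORMANT telephony records

tmobile_per_month = TMOBILE_RECORDS_PER_DAY * 30
tmobile_per_year = tmobile_per_month * 12

# How many times larger is one carrier's volume than NSA's worldwide DNR count?
ratio = tmobile_per_month / NSA_DNR_PER_MONTH

print(f"T-Mobile per month: {tmobile_per_month / 1e9:.0f} billion")
print(f"T-Mobile per year:  {tmobile_per_year / 1e12:.2f} trillion")
print(f"ratio to NSA DNR:   {ratio:.1f}x")
```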

Update #1:
Also interesting is that according to slides about the Hemisphere project, some 4 billion telephone metadata records are collected every day from any carrier that uses AT&T switches in response to grand jury subpoenas in counter-narcotics investigations.

Update #2:
During a parliamentary hearing in Germany, an official of BND explained that one cell phone creates between 100 and 200 metadata and business records a day. For 4,5 billion cell phone users worldwide that would equal at least 450 billion metadata records each day.

Update #3:
A 2017 tourism report from the Netherlands provided numbers showing that in January 2013, Dutch mobile phone users generated 255 million metadata a day or 7,65 billion a month. The report also confirms that for Dutch users, mobile phones create about 100 "transactions" a day.

GCHQ metadata collection

Even more metadata seem to be collected by NSA's British partner agency GCHQ, which according to this slide from 2011 collects 50 billion metadata per day. This makes 1,5 trillion a month and an astonishing 18 trillion (18.000.000.000.000) a year!

This (partial) slide was published in Greenwald's book No Place To Hide, but without any further explanation, so we don't know whether GCHQ is able to actually store everything or has to delete large amounts, like NSA. From the slide itself it seems that the number of 50 billion refers to internet metadata alone, which would make this number even more remarkable.

According to a report by The Guardian, GCHQ also collects 600 million telephony metadata a day, which makes 18 billion a month - a small number compared to the internet metadata this agency receives:

                                 BOUNDLESSINFORMANT    Volumes and Limits    GCHQ

Internet metadata per month:     97 bln.               312 bln.              1500 bln.
Telephony metadata per month:    124 bln.              135 bln.              18 bln.

For indexing and searching the content of internet communications, GCHQ uses the TEMPORA system, which is capable of processing the traffic from 46 fiber-optic cables of 10 gigabits per second. At full capacity that is roughly 5 petabytes of data a day; a flow of 21 petabytes a day would require about 200 cables of that speed.
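A back-of-the-envelope check of the cable capacity is sketched below. It assumes decimal units (1 petabyte = 10^15 bytes) and every cable running at its full line rate for 24 hours; both function names are hypothetical. Under those assumptions, 46 cables of 10 gigabits per second carry about 5 petabytes a day, while 21 petabytes a day corresponds to roughly 194 cables of that speed.

```python
SECONDS_PER_DAY = 24 * 60 * 60
PETABYTE = 10**15  # assumption: decimal petabyte


def petabytes_per_day(cables: int, gbit_per_second: float = 10.0) -> float:
    """Daily volume if every cable runs at full line rate all day."""
    bytes_per_second = cables * gbit_per_second * 1e9 / 8
    return bytes_per_second * SECONDS_PER_DAY / PETABYTE


def cables_needed(pb_per_day: float, gbit_per_second: float = 10.0) -> float:
    """Number of cables whose combined full-rate capacity equals pb_per_day."""
    return pb_per_day / petabytes_per_day(1, gbit_per_second)


print(f"46 cables: {petabytes_per_day(46):.2f} PB/day")
print(f"21 PB/day: {cables_needed(21):.0f} cables")
```

Real links rarely run at full line rate, so these values are upper bounds rather than measured volumes.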

NSA collection by country

The main BOUNDLESSINFORMANT interface with the heat map also lists the names of the countries which provide the highest numbers of data. These can be sorted in three different ways: Aggregate, DNI (internet) and DNR (telephony), each resulting in a slightly different top-5. The following aggregated totals (so both DNI and DNR) are known:

These numbers indicate from which countries NSA gathers most data, but the exact meaning of the numbers has still not been clarified. We do know that BOUNDLESSINFORMANT counts metadata records, but what these records exactly are (for example: how many records are created by one phone call?), and how they are attributed to a specific country is not clear.

Communications by definition have two ends: the originating and the receiving end. When both ends are in the same country, it's easy to attribute the communication to that particular country. But when the originating and the receiving ends are in different countries, how is such a communication registered? Maybe for both countries, although that would make many of them appear in these numbers twice.

United States

Edward Snowden saw the heat map with the 3 billion attributed to the United States as proof that NSA was conducting domestic surveillance, although the heat map itself cannot provide sufficient evidence for that. The 3 billion could very well relate to foreign communications which are just transiting the US, or to the American end of, for example, phone calls where the other end is a foreign suspect. Somewhat more information could have been provided by the bar charts for the US, but these haven't been published.

The number of 3.095.553.478 for the United States is the aggregated total. The number of internet records (DNI) for the US is 2.892.343.446, which leaves just 203.210.032 telephony records (DNR) or about 6,6% of the aggregated total. In a table this looks like this:

United States total:        3.095.553.478 per month

Internet records (DNI):     2.892.343.446 per month
Telephony records (DNR):      203.210.032 per month
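The telephony figure and its share can be recomputed from the two published numbers (aggregate and DNI). The sketch below uses only those; the variable names are illustrative. Expressed as a fraction the share is about 0,066, which is roughly 6,6 percent.

```python
US_TOTAL = 3_095_553_478   # aggregated total for the United States
US_DNI = 2_892_343_446     # internet records

# Telephony records are what remains after subtracting the internet records.
us_dnr = US_TOTAL - US_DNI
dnr_share = us_dnr / US_TOTAL

print(f"DNR records: {us_dnr:,}")
print(f"DNR share:   {dnr_share:.1%}")
```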

This small share for telephone metadata is rather strange given the fact that NSA is collecting all American phone records, but does not do so with internet metadata. This seems to indicate that these domestic phone records are not counted by BOUNDLESSINFORMANT and that the internet records are from communications with at least one end foreign.

NSA collection by division

With a BOUNDLESSINFORMANT chart about the NSA's Special Source Operations (SSO) division published in Greenwald's book, we can also compare the number of data collected by this division with the total number of NSA data collection. We see that SSO, which is responsible for tapping the world's main fiber optic cables, accounts for 72% of all data:

NSA worldwide total:                221.919.881.317 (100%)

Special Source Operations (SSO):    160.168.000.000 (72%)
Other NSA divisions:                 61.751.000.000 (28%)

This leaves the remaining 28% of the data to be collected by NSA's other main divisions: Global Access Operations (GAO), which operates mobile collection platforms like satellites, planes, drones and ships, and Tailored Access Operations (TAO), which collects data by hacking into foreign computer networks. The remaining 28% could also encompass data collected by the joint NSA/CIA Special Collection Service (SCS) units and by 3rd Party partner agencies.

BOUNDLESSINFORMANT chart about the SSO division

SSO Collection programs

From the BOUNDLESSINFORMANT chart about Special Source Operations we can see how the total number of data collected by this division breaks down into the 5 biggest collection programs. From other charts we also know the numbers collected by some other programs, and these are added here too:

This listing shows that roughly one third of the data from telecommunication cables are collected by just one single program: DANCINGOASIS. Another third is intercepted by the programs ranking second, third and fourth, but despite their weight, we still don't know more about them than just their names. Finally, the last third of this type of collection is divided among numerous smaller and very small programs, a number of which have been disclosed through the Snowden documents.

Update:
On June 18, 2014 the Danish newspaper Information and Greenwald's website The Intercept broke a story saying that SPINNERET, MOONLIGHTPATH and AZUREPHOENIX are all part of the RAMPART-A program, which encompasses access to fiber-optic cables abroad, in cooperation with 3rd Party partner agencies from at least five different countries.

According to a FAQ document, the BOUNDLESSINFORMANT tool doesn't count data which are collected under FISA authority, so numbers about the famous PRISM program are excluded. However, another source (pdf) says that under PRISM, more than 227 million "internet communications" are collected annually, which is ca. 19 million a month, but it is not known whether these "internet communications" are the same kind of records as presented by BOUNDLESSINFORMANT.

Processing and storing

Metadata from a number of big and important SSO collection programs are processed by a system codenamed SHELLTRUMPET. As can be read in the document below, this system processed almost 500 billion metadata records in 2012, which gives an average of 41,6 billion a month, but by the end of 2012 SHELLTRUMPET was already processing 2 billion call detail records a day, which would make 60 billion a month:
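The two monthly rates can be compared with a short calculation. It is a sketch under stated assumptions: "almost 500 billion" is treated as exactly 500 billion (so the average is an upper bound), a month is taken as 30 days, and the variable names are illustrative.

```python
RECORDS_2012 = 500_000_000_000        # "almost 500 billion", used as upper bound
END_OF_2012_PER_DAY = 2_000_000_000   # call detail records a day, late 2012

average_per_month = RECORDS_2012 / 12
end_of_year_per_month = END_OF_2012_PER_DAY * 30

# How much higher was the late-2012 rate than the average over the year?
growth = end_of_year_per_month / average_per_month

print(f"2012 average: {average_per_month / 1e9:.1f} billion a month")
print(f"end of 2012:  {end_of_year_per_month / 1e9:.0f} billion a month")
print(f"ratio:        {growth:.2f}")
```

Under these assumptions, the late-2012 rate is about 1,4 times the yearly average, which suggests that the processed volume was rising during the year.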

MUSCULAR contributes 60 gigabyte of data to the PINWALE database for internet content every day, which is 1,8 terabyte a month. As BOUNDLESSINFORMANT counts 181 million records for MUSCULAR, this would mean that 1 million internet metadata records represent almost 10 gigabyte of (content) data.

This correlation can be used to make a very rough estimate of the total amount of internet data collected by NSA. The worldwide total of 97 billion internet records a month would then equal some 961 terabyte of data each month or 11,5 petabyte a year (some numbers to compare are here; the new NSA data center in Bluffdale, Utah can store an estimated 12 exabytes, which is 12.000 petabytes).
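The extrapolation can be written out step by step. This sketch assumes decimal units (1 terabyte = 1.000 gigabyte), a 30-day month, and that the MUSCULAR ratio of content to records holds for all internet collection, which is the same very rough assumption the text makes. Variable names are illustrative. Without intermediate rounding the monthly result is a few terabytes higher than the 961 terabyte quoted above.

```python
MUSCULAR_GB_PER_DAY = 60                    # content sent to PINWALE
MUSCULAR_RECORDS_PER_MONTH = 181_000_000    # as counted by BOUNDLESSINFORMANT
WORLDWIDE_DNI_PER_MONTH = 97_111_188_358    # worldwide internet records

gb_per_month = MUSCULAR_GB_PER_DAY * 30
gb_per_million_records = gb_per_month / (MUSCULAR_RECORDS_PER_MONTH / 1e6)

# Apply the MUSCULAR ratio to the worldwide number of internet records.
tb_per_month = WORLDWIDE_DNI_PER_MONTH / 1e6 * gb_per_million_records / 1000
pb_per_year = tb_per_month * 12 / 1000

print(f"{gb_per_million_records:.2f} GB per million records")
print(f"{tb_per_month:.0f} TB a month, {pb_per_year:.1f} PB a year")
```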

Shared by 2nd party partner agencies

The very close working relationship between NSA and the 2nd Party partner agencies from the Five Eyes community leads to a regular exchange of data, of which the most productive facilities can be seen in a BOUNDLESSINFORMANT chart that was published by Der Spiegel:

The total number of data received from these nine countries is slightly more than 1 billion a month, which is just a tiny 0,45% of NSA's overall collection as counted by the BOUNDLESSINFORMANT tool.
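The share is easy to verify. The sketch takes "slightly more than 1 billion" as exactly 1 billion, so the result is a lower bound; the variable names are illustrative. The fraction is about 0,0045, which as a percentage is about 0,45.

```python
PARTNER_RECORDS_PER_MONTH = 1_000_000_000   # "slightly more than 1 billion"
NSA_WORLDWIDE_TOTAL = 221_919_881_317       # BOUNDLESSINFORMANT heat map total

partner_share = PARTNER_RECORDS_PER_MONTH / NSA_WORLDWIDE_TOTAL

print(f"as a fraction:   {partner_share:.4f}")
print(f"as a percentage: {partner_share:.2%}")
```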

Initially, Glenn Greenwald reported in various European newspapers that these numbers represented the phone calls of European citizens intercepted by NSA. But gradually it came out that his interpretation was wrong.

The charts actually show numbers of metadata that were collected from foreign communications by European military intelligence agencies in support of military operations abroad. These data were subsequently shared with partner agencies, most likely through the SIGDASYS system of the SIGINT Seniors Europe (SSEUR) group, which is led by NSA.

This may come as a shock to some readers, but perhaps it is due to our horrific foreign policy, which has provoked hatred around the Middle East. After all, I don't think they really hate us just because we have (some) freedom; I think some of these radicals would not have resorted to terrorism if we had tried to make peaceful international trade deals and relationships, rather than blowing the sh*t out of their countries. When you push people, many times people will push back.

I want, particularly for those aged 60 and above and who care not a jot if they are a target, to be auto-cited - for speeding.

Each time Google Maps, running on their phone and using the GPS satellites, detects that THEY (and only they) are 1m over the speed limit built into the mapping database, it reports the fact to the local police for an automatic fine.

Let's ensure folks "feel" the nature of a Stasi state - starting with those who care not if it exists. Criminals are criminals, including those driving at 1m over the limit.

"I really don’t care that the NSA might listen in on a phone call or read one of my emails."

I do care. They're my phone calls and my eMails, and I want to keep them private. Not because there's anything criminal in them. Just because I can make that choice, and that's what I choose. Anything else effectively makes me the property of the state.

This is the price of the society you're happy to surrender your liberties to protect; that those of us who choose NOT to surrender to the state, don't have to. You might not like it, but that's what you're protecting.

If you think it's worth protecting, then I get privacy because that's the whole point. If you don't think it's worth protecting, then I get privacy because you don't need to read my eMails. There is no argument in which the society is worth protecting AND gets to read my eMail; it's only worth protecting if it DOESN'T get to read my eMail.

Consider this: You can have a conversation with your neighbor in the middle of the street that is protected by the Fourth Amendment and private. If you pick up the phone and call the same neighbor, or chat online, or Skype, or email them, it's recorded in some form. The only difference is that a piece of technology exists between the two of you that can be exploited. How can the first conversation be private and protected but not the second?

That comparison is not fully correct: every conversation is protected by the Fourth Amendment, except when there's a legal reason for law enforcement or intelligence agencies to eavesdrop on it.

If such agencies consider it necessary, they can also place microphones directed at people in their homes or even when they are meeting somewhere outside. However, this is somewhat cumbersome; it's true that eavesdropping on electronic communications is easier.

It should be noted that there's no evidence for indiscriminate or bulk recording of the content of American communications, at least not by NSA. NSA is only collecting the metadata of phone calls, to use that for targeted contact chaining.

The title picture of this weblog shows the watch floor of the NSA's National Security Operations Center (NSOC) in 2006. The URL of this weblog recalls Electrospace Systems Inc., the company which made most of the top level communications equipment for the US Government. All information on this weblog is obtained from unclassified or publicly available sources.