As Ars reported yesterday, documents provided to The Washington Post by former National Security Agency contractor Edward Snowden show that the NSA was able to harvest enormous amounts of unencrypted information from Google and Yahoo by grabbing the data straight off the companies' wide-area networks. Analysis of the documents alongside previously leaked data and other information explains why engineers affiliated with Google shouted expletives when they were shown how the NSA effectively bypassed the safeguards that the companies had put in place to protect customer data.

In an interview with Bloomberg TV yesterday, NSA Director Gen. Keith Alexander said, "I can tell you factually we do not have access to Google servers [or] Yahoo servers." Technically, Gen. Alexander's denial is truthful—the NSA did not access Google's or Yahoo's servers itself. But the agency's MUSCULAR program, undertaken in collaboration with the United Kingdom's NSA equivalent, the GCHQ, does tap into the traffic on the networks that link those companies' data centers.

The taps, described as a "minor circuit move" by NSA documents, simply plugged into the telecommunications infrastructure that carries Google's and Yahoo's private fiber links. They gave the NSA access inside the two companies' Internet perimeters, allowing the agency to scan and capture massive amounts of data—so much that the NSA's Special Source Operations complained that it had too much garbage to sort through.

Forget the PRISM—go for the clear

The NSA already has access to selected content on Google and Yahoo through its PRISM program, a collaborative effort with the FBI that compels cloud providers to turn over select information under a FISA warrant. And it collects huge quantities of raw Internet traffic at major network exchange points, allowing the agency to perform keyword searches in real time against the content and metadata of individual Internet packets.

But much of that raw traffic is encrypted, and the PRISM requests are relatively limited in scope. So the NSA went looking for a way to get the same sort of access to encrypted traffic to cloud providers that it had with unencrypted raw Internet traffic. The solution that the NSA and the GCHQ devised was to tap into the networks of the providers themselves as they crossed international borders.

Google and Yahoo maintain a number of overseas data centers to serve their international customers, and Internet traffic to Google and Yahoo is typically routed to the data center closest to the user. The Web and other Internet servers that handle those requests generally communicate with users over a Secure Sockets Layer (SSL) encrypted session and act as a gateway to other services running within the data center—in the case of Google, this includes services like Gmail message stores, search engines, Maps requests, and Google Drive documents. Within Google's internal network, these requests are passed unencrypted, and requests often travel across multiple Google data centers to generate results.
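The gateway pattern described above—SSL terminated at the front end, plaintext on the internal network—can be sketched with a toy model. The XOR "cipher," the request string, and the function names here are purely illustrative stand-ins, not Google's actual protocols or architecture:

```python
# Toy model of the front-end gateway pattern: SSL protects traffic only
# as far as the edge server; what crosses the private inter-data-center
# fiber is plaintext. The XOR "cipher" below is an illustrative stand-in
# for real SSL, not an actual cryptographic scheme.

def ssl_layer(data: bytes, key: int) -> bytes:
    """Stand-in for SSL encryption/decryption at the front end."""
    return bytes(b ^ key for b in data)

def forward_internal(plaintext: bytes) -> bytes:
    """What the front end sends across the private fiber: the request
    itself, unencrypted. A tap on this link sees user data directly."""
    return plaintext

user_request = b"GET /mail/inbox"
key = 0x5A

on_public_internet = ssl_layer(user_request, key)   # encrypted to the edge
at_front_end = ssl_layer(on_public_internet, key)   # SSL terminated here
on_private_fiber = forward_internal(at_front_end)   # cleartext on the WAN

assert on_public_internet != user_request  # protected in transit to the edge
assert on_private_fiber == user_request    # exposed on the internal link
```

A tap on the private link therefore needs no cryptanalysis at all, which is precisely what made the inter-data-center fiber such an attractive target.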

In addition to carrying user traffic, the fiber connections are used to replicate data between data centers for backup and universal access. Yahoo, for example, replicates users' mailbox archives between data centers to ensure that they're available in case of an outage. In July of 2012, according to documents Snowden provided to the Washington Post, Yahoo began transferring entire e-mail accounts between data centers in its NArchive format, possibly as part of a consolidation of operations.

By gaining access to networks within Google's and Yahoo's security perimeters, the NSA was able to effectively defeat the SSL encryption used to protect customers' Web connections to the cloud providers, giving the agency's network filtering and data mining tools unfettered access to the content passing over the network. As a result, the NSA had access to millions of messages and Web transactions per day without having to use its FISA warrant power to compel Google or Yahoo to provide the data through PRISM. And it gained access to complete mailboxes of e-mail at Yahoo—including attachments that would not necessarily show up as part of intercepted Webmail sessions, because users would download them separately.

But the NSA and the GCHQ had to devise ways to process the streams of data passing between data centers to make them useful. That meant reverse-engineering some of the software and network interfaces of the cloud providers so that they could break apart data streams optimized to be sent across wide-area networks over multiple simultaneous data links. It also meant creating filtering capabilities that allowed the NSA and the GCHQ to separate traffic of intelligence interest from the vast amount of intra-data center communications that have nothing to do with user activity. So the NSA and the GCHQ configured a "distributed data distribution system" (as the NSA described MUSCULAR in this FAQ about the BOUNDLESSINFORMANT metadata search tool acquired by the American Civil Liberties Union) similar to XKeyscore to collect, filter, and process the content on those networks.
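The two processing problems described here—reassembling streams that the providers split across parallel links, and filtering out intra-data center chatter—can be sketched roughly as follows. The record format and "selector" matching are invented for illustration and bear no relation to the actual MUSCULAR tooling:

```python
# Toy sketch of the two processing steps the article describes:
# (1) reassemble a stream that was striped across several parallel
#     links, and (2) filter out intra-data-center chatter.
# Record format and selector logic are invented for illustration.

def reassemble(links):
    """Merge per-link fragments back into one ordered stream.
    Each fragment is a (sequence_number, payload) tuple."""
    fragments = [frag for link in links for frag in link]
    return [payload for _, payload in sorted(fragments)]

def filter_traffic(records, selectors):
    """Keep only records matching an intelligence selector."""
    return [r for r in records if any(s in r for s in selectors)]

# Two parallel links, each carrying interleaved fragments.
links = [[(0, "user:alice mail"), (2, "replication heartbeat")],
         [(1, "user:bob mail"), (3, "index sync")]]

stream = reassemble(links)                 # one ordered stream
kept = filter_traffic(stream, ["user:"])   # drop non-user chatter
# kept == ["user:alice mail", "user:bob mail"]
```

The real systems had the harder job of doing this against undocumented, proprietary wire formats at line rate, which is why reverse engineering was required at all.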

Mailbox overload

Even with filtering, the volume of that data presented a problem for NSA analysts. When Yahoo started performing its mailbox transfers, that data rapidly began to eclipse the other sources being ingested into PINWALE, the NSA's primary analytical database for processing intercepted Internet traffic. PINWALE also pulls in data harvested by the XKeyscore system and processes about 60 gigabytes of data per day passed to it from collection systems.

By February of 2013, Yahoo mailboxes were accounting for about a quarter of that daily traffic. And because many of the mailboxes contained e-mail messages that were months or years old, most of the data was of little use to analysts looking for current intelligence: 59 percent of the mail in the archives was over 180 days old.
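Taken together with the roughly 60 GB/day PINWALE figure, the article's numbers work out as follows (a back-of-the-envelope calculation, not a figure from the documents themselves):

```python
# Rough arithmetic from the figures in the article: ~60 GB/day ingested
# by PINWALE, Yahoo mailboxes at about a quarter of that, and 59 percent
# of the archived mail more than 180 days old.
daily_ingest_gb = 60
yahoo_share = 0.25      # "1/4th of the daily total collect"
stale_fraction = 0.59   # mail older than 180 days

yahoo_gb = daily_ingest_gb * yahoo_share   # mailbox data per day
stale_gb = yahoo_gb * stale_fraction       # of that, stale archive mail

print(f"Yahoo mailbox data: ~{yahoo_gb:.0f} GB/day, "
      f"of which ~{stale_gb:.1f} GB is over 180 days old")
```

In other words, on the order of 15 GB of mailbox data arrived every day, and most of it was months-old archive mail—hence the analysts' request for throttling.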

So the analysts requested "partial throttling" of Yahoo content to prevent data overload. "Numerous analysts have complained of [the Yahoo data's] existence," the notes from the PowerPoint slide on MUSCULAR stated, "and the relatively small intelligence value it contains does not justify the sheer volume of collection at MUSCULAR (1/4th of the daily total collect)."

This isn't the first throttling of data intercepts that the NSA has had to undertake. It also had to throttle back its collection of Webmail address books, focusing instead on instant messaging "buddy lists." In 2012, the NSA created a "defeat" that exposed address book data sent over the Webmail protocols for Gmail, Yahoo, and Hotmail, and it later added Facebook instant messaging friends lists. That collection has since been narrowed to just Facebook address books—in part because of an episode last year in which a Yahoo mail account monitored by the NSA was hacked and used by spammers, causing the account's address book to swell with unrelated e-mail addresses.

I really want to hear the reasoning behind why the NSA tapped the phone communications of the Pope. Nothing can or will be done with that information, unless the NSA believes the Vatican is connected to terrorists. These actions just scream "we have too much power and money at our disposal."

The problem as I see it is that it all comes down to the mindset that hoovering is fine as long as you don't look at the data. With that kind of mentality, they have no real limits on what they're allowed to steal.

Free indulgences via blackmail.

There is not always evil intent behind immoral acts. It's simply the Aperture method at work: "We do what we must because we can."

In an interview with Bloomberg TV yesterday, NSA Director Gen. Keith Alexander said, "I can tell you factually we do not have access to Google servers [or] Yahoo servers." Technically, Gen. Alexander's denial is truthful—the NSA did not access Google's or Yahoo's servers itself.

I'm repeatedly surprised by his tactics: every time he talks (and to anyone, Congress or a reporter), it's to tell functional lies that hide behind narrow technicalities. This is the sort of conduct one expects of opposing counsel in a courtroom, not of a public servant. Of course, I'm sure he regards his duties as being beyond the comprehension of common citizens (see Col. Jessep: "I have neither the time nor the inclination to explain myself to a man who rises and sleeps under the blanket of the very freedom that I provide, and then questions the manner in which I provide it."), but doesn't he care about how this looks? And if not, then why does he bother with the functional lies?

I don't seem to have seen many comments about how mind-shatteringly stupid it was for Google or Yahoo to let plaintext leave their physical facilities in the first place.

Seriously. Any admin idiotic enough to think "leased fiber" is an adequate guarantee of confidentiality for that much information, or for information of unknown sensitivity, should have just been fired ages ago.

You don't even have to be the NSA to tap that stuff. All you have to do is subvert one of the hundreds of unknown people who have access, which is easy even for organized crime.

You shouldn't even be letting plaintext run around loose in a large data center. Too many people have access. Physical security and separation of duties are basic issues.

I'm not saying the NSA and friends aren't criminals, but I expect some basic diligence in protecting data from criminal elements.

No kidding. Goddammit, the more I hear about just how many ways the NSA is bypassing basic (or even complex) security to suck up all available data that they can (far beyond what they could possibly ever NEED) WITHOUT warrants, the more pissed off I'm getting.

The people in charge are really goddamn lucky this isn't an election year, a lot of heads would definitely be rolling. I just hope that collectively everyone remembers this and holds their elected officials accountable.

I saw one yesterday on Ars. The poster had commented about overkill VPNs; in hindsight, the extra level of encryption doesn't seem so silly.

Who was that company that did a VPN audit only to discover some employee had farmed his work out to China? That is, everything was encrypted, but one of the recipients of the data was not authorized. Now that was good security, by pre-Snowden standards.

Tell me about it. I wonder if Google et al. could be sued for negligence?

Yes. However, it does not change the fact that technically he is also a doucheb@g. I know it's unfair, but he personifies (somewhat) what the NSA has become.

Agreed. His claim reminds me of Clinton and his "did not have sexual relations with" statement.

Technically accurate on paper, but completely inaccurate in the real-world sense of the phrase. They are omitting the context of the statement, particularly the way in which they mean it, knowing that folks will tend to accept things at face value. Thus, they are practicing a subtle form of manipulation. And I find that I have less and less patience with people who practice this.

I do audits for part of my living, and as soon as I figure out that someone is hair splitting on technicalities that they keep to themselves, I shift into "I can't trust a single thing out of your mouth without independent proof."

Surprisingly, the audit actually tends to go better at that point because (IMHO) people hate working with guys like this. Guys like this tend to do this manipulation/deception all the time - not just at audit or grilling points. And so they are indeed complete doucheb@gs.

In an interview with Bloomberg TV yesterday, NSA Director Gen. Keith Alexander said, "I can tell you factually we do not have access to Google servers [or] Yahoo servers." Technically, Gen. Alexander's denial is truthful—the NSA did not access Google's or Yahoo's servers itself. But the agency's MUSCULAR program, undertaken in collaboration with the United Kingdom's NSA equivalent, the GCHQ, does tap into the traffic on the networks that link those companies' data centers.

Oy vey. They say they didn't access the servers but they don't deny they accessed the traffic BETWEEN servers which is the same difference.

It's half-truths like this that lead me to conclude that reform of these programs is nigh-impossible—that the only thing that will work is a blanket prohibition, no matter how blunt that may be. Again and again, when confronted with details of their programs, the intelligence community has retreated to these fine, lawyerly answers that split the hair so closely that, while strictly speaking true, they are lies as understood in the everyday meaning of the word.

The NSA and its defenders say they want to have an honest debate about the place of surveillance in a democracy, yet when confronted with the details of that surveillance—details that are needed to have the reasoned debate they claim they want—they repeatedly resort to half-truths that are too clever by half. You can't have an honest debate with folks like Gen. Alexander or my own senator, Dianne Feinstein, who says what the NSA is doing isn't "surveillance" because "surveillance" is the capture of information that enjoys privacy protections, and since metadata is a public record that isn't private, vacuuming it up isn't "surveillance"! It's impossible to engage in a meaningful conversation with those who employ circular, Alice in Wonderland logic like this.

So, PRISM ends up largely being a reconstruction tool (in relation to Google and Yahoo at least).

It seems in many cases they'd already know what they want, because they've already seen it. Go to PRISM and ask for it with a subpoena and you have a sanitized source for use in crafting policy documents for clients.

If the above-board reported requests to Google essentially represent the subset of picked-over data the NSA needs to launder for dissemination ... I wonder how big the actual iceberg is?

When a news outlet like the Washington Post releases a story like this, and the senior government official in charge of the program uses weasel words to say "we don't access their servers," we call that person a "lying sack of shit."

Not that I think truth matters to Keith Alexander, but I'm waiting for him to say "we don't access their data." He obviously knows the difference, and unfortunately for him, so do we.

Sean Gallagher / Sean is Ars Technica's IT Editor. A former Navy officer, systems administrator, and network systems integrator with 20 years of IT journalism experience, he lives and works in Baltimore, Maryland.