One of the key points in the “fact sheet” that the White House published today about its plan to end the NSA’s bulk collection of phone record data is that while the NSA will no longer have possession of phone data, it will still have access to it. Under the newly proposed program, the White House document notes, the NSA would still have the ability to request data without a new court order in an emergency, and “[telecommunications] companies would be compelled by court order to provide technical assistance to ensure that the records can be queried and that results are transmitted to the government in a usable format and in a timely manner.”

In order to live up to that mandate and deliver datasets covering all numbers within two “hops” of a specified phone number in a “timely manner,” one of two things would have to happen: either the telecom companies would need the capability to perform, onsite, the same sort of analytic searches that the NSA currently performs with its Mainway database, or the NSA would need to be able to build its own index of the telcos’ databases that would allow it to perform such searches. In either scenario, the data available to the NSA would be much smaller than what the agency currently retains (five years’ worth), but it would still give the NSA the ability to request large swaths of phone record data.

Hop to it

As I mentioned in my analysis of the NSA’s “three hop” rule, two degrees of separation can cover a significant number of people. For example, if the average person had casual contact (such as by phone) with 1,000 people, two hops could reach as many as a million people (1,000 × 1,000), and a third hop as many as a billion, more than enough in principle to reach nearly anyone in the US. And a hop isn’t just a degree of separation: when an NSA official said the agency looks at “two to three hops” through phone records, that means the phone contacts of every person two to three hops away are checked as well.
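The arithmetic behind that estimate fits in a few lines of Python. This is only a back-of-the-envelope upper bound: it assumes, hypothetically, that every number has exactly the same number of contacts and that no two people share a contact. Real call graphs overlap heavily, so actual reach is lower.

```python
def max_reach(contacts_per_person: int, hops: int) -> int:
    """Upper bound on distinct numbers reachable within `hops` hops.

    Assumes every number has exactly `contacts_per_person` contacts and that
    no two paths ever lead to the same number (no overlap), so the real
    figure is always smaller than this.
    """
    # Hop 1 reaches c numbers, hop 2 reaches c*c more, hop 3 c*c*c more, ...
    return sum(contacts_per_person ** k for k in range(1, hops + 1))


for h in (1, 2, 3):
    print(h, f"{max_reach(1_000, h):,}")
```

With 1,000 contacts each, two hops top out at about a million numbers, and a third hop pushes the ceiling past a billion, well above the US population of roughly 300 million.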

Phone companies generally don’t store call records in a way that is indexed for this sort of search. Such searches are usually done with a database built on an entity-relationship model, like Facebook’s Open Graph, Google’s Knowledge Graph, or the Palantir data analysis platform. These databases allow a search to follow a chain of relationships from a given starting point (which is how Facebook tracks who’s in your network of “friends of friends” and determines who sees your information).
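Following a chain of relationships like this boils down to a graph traversal. As a minimal sketch (not how Open Graph, Knowledge Graph, or Palantir actually work internally), here is a breadth-first search in Python over a toy contact graph; the dictionary-of-sets layout and the 555 numbers are assumptions made for illustration.

```python
from collections import deque


def within_hops(graph: dict[str, set[str]], start: str, max_hops: int) -> dict[str, int]:
    """Breadth-first walk from `start`, returning {number: hop distance}.

    `graph` maps each phone number to the set of numbers it has exchanged
    calls with (an undirected contact graph).
    """
    distance = {start: 0}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        if distance[node] == max_hops:
            continue  # don't expand past the hop limit
        for neighbor in graph.get(node, ()):
            if neighbor not in distance:  # first visit is the shortest path in BFS
                distance[neighbor] = distance[node] + 1
                queue.append(neighbor)
    return distance


# Hypothetical toy data: who has called whom.
calls = {
    "555-0100": {"555-0101", "555-0102"},
    "555-0101": {"555-0100", "555-0103"},
    "555-0102": {"555-0100"},
    "555-0103": {"555-0101", "555-0104"},
    "555-0104": {"555-0103"},
}
print(within_hops(calls, "555-0100", 2))
```

A two-hop walk from 555-0100 picks up 555-0101, 555-0102, and 555-0103, but stops before 555-0104; raising `max_hops` to 3 brings it in.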

So, let’s say that the new phone metadata plan requires the phone companies to do that tracking themselves and to provide the NSA with just the search results for specific queries against that data. That would mean they need to keep an entity database for all the calls that cross their networks, identifying each number involved as a node in the network and tracking all the other nodes it connects to, at least two hops out. If telcos are required to build these databases, they will essentially become miniature NSAs unto themselves: instead of waiting for bulk data dumps, the NSA will be able to send either court-authorized or emergency queries to the phone companies and get nice, tidy data sets piped back quickly. We’ll also need to pay much closer attention to the privacy statements for our “friends and family” plans. The telcos own this data, and they are being told they have to relationship-index it, so they’ll inevitably look for ways to use that capability to defray the cost of running the infrastructure.
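To make that concrete, here is a minimal Python sketch of what a telco-side index might look like, assuming (hypothetically) that call detail records arrive as caller, callee, start time, and duration. `CallRecord`, `ContactIndex`, and `two_hop_records` are illustrative names, not anything from the White House proposal or a real carrier system. Because the query returns every record touching a number two hops out, it also surfaces those numbers’ own contacts.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class CallRecord:
    caller: str
    callee: str
    start: str  # ISO 8601 timestamp (format is an assumption)
    seconds: int


class ContactIndex:
    """Hypothetical telco-side index: each number is a node, each call an edge."""

    def __init__(self) -> None:
        self.neighbors: dict[str, set[str]] = defaultdict(set)
        self.records: dict[str, list[CallRecord]] = defaultdict(list)

    def ingest(self, rec: CallRecord) -> None:
        # Index the call under both endpoints so it can be found from either side.
        self.neighbors[rec.caller].add(rec.callee)
        self.neighbors[rec.callee].add(rec.caller)
        self.records[rec.caller].append(rec)
        self.records[rec.callee].append(rec)

    def two_hop_records(self, seed: str) -> list[CallRecord]:
        """Return every call record touching a number within two hops of `seed`."""
        hop1 = self.neighbors.get(seed, set())
        hop2 = set().union(*(self.neighbors.get(n, set()) for n in hop1))
        numbers = {seed} | hop1 | hop2
        # A set removes records that were indexed under both endpoints.
        found = {r for n in numbers for r in self.records.get(n, [])}
        return sorted(found, key=lambda r: r.start)


idx = ContactIndex()
for rec in [
    CallRecord("555-0100", "555-0101", "2014-03-01T09:00", 120),
    CallRecord("555-0101", "555-0103", "2014-03-02T10:30", 45),
    CallRecord("555-0103", "555-0104", "2014-03-03T18:15", 300),
    CallRecord("555-0200", "555-0201", "2014-03-04T12:00", 60),
]:
    idx.ingest(rec)

for rec in idx.two_hop_records("555-0100"):
    print(rec.caller, "->", rec.callee, rec.start)
```

In this toy data, the query for 555-0100 returns the call from 555-0103 to 555-0104 even though 555-0104 is three hops from the seed, while the unrelated 555-0200 call stays out.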

Making a hash of it

The second route is to have the NSA own the hardware that stores an index of the data at each company, while the data itself still resides at the telcos. While this might sound more intrusive, it actually could boost privacy if done correctly, since the relationship data could be created and stored without having the phone numbers in question available.

Here’s how that would work: the NSA would run an analytical engine against the data stores of each phone company, but it would store pointers to the actual data in the entity structure rather than the phone numbers themselves. That pointer could be an encrypted version of the phone number, or a value created by a “hash” of the number or other data. As a result, the NSA relationship database would be useless by itself and could only be used to find the actual phone records associated with a number when given authorization by a court order (or in an emergency, by the phone company itself).
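One way to build such pointers is sketched below in Python, using a keyed hash (HMAC-SHA256) as a concrete stand-in; the proposal doesn’t specify any technique, and `TELCO_KEY`, `pointer`, `blind_index`, `telco_reverse`, and `authorized_lookup` are all hypothetical names. The key matters: there are only about ten billion possible 10-digit numbers, so a plain, unkeyed hash could be reversed simply by hashing every one of them. Here the secret key and the pointer-to-number map stay with the phone company.

```python
import hashlib
import hmac
import secrets

# Hypothetical: generated and held by the phone company, never by the
# party that stores the relationship index.
TELCO_KEY = secrets.token_bytes(32)


def pointer(number: str, key: bytes = TELCO_KEY) -> str:
    """Opaque pointer for a phone number: HMAC-SHA256 under the telco's key."""
    return hmac.new(key, number.encode("utf-8"), hashlib.sha256).hexdigest()


# The index stores only pointers: it shows the shape of the network,
# but not which numbers are in it.
blind_index: dict[str, set[str]] = {}
telco_reverse: dict[str, str] = {}  # pointer -> number, held only by the telco

for a, b in [("555-0100", "555-0101"), ("555-0101", "555-0103")]:
    pa, pb = pointer(a), pointer(b)
    blind_index.setdefault(pa, set()).add(pb)
    blind_index.setdefault(pb, set()).add(pa)
    telco_reverse[pa], telco_reverse[pb] = a, b


def authorized_lookup(number: str) -> set[str]:
    """Run by the telco once a query is authorized: turn the number into its
    pointer and return the pointers of its direct contacts."""
    return blind_index.get(pointer(number), set())


contacts = {telco_reverse[p] for p in authorized_lookup("555-0100")}
print(contacts)
```

Without `TELCO_KEY` and `telco_reverse`, the index is just a graph of random-looking 64-character hex strings; only the phone company can turn an authorized number into a starting point and the results back into records.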

The biggest sticking point may be what a “timely manner” means. If the NSA requires the data in a matter of hours, that would mean NSA analysts would essentially need a live connection to the telcos’ data warehouses. Otherwise, it may mean the query results get delivered the old-fashioned way: on tape.