November 8, 2016

How the FBI analysed 650,000 emails in less than 8 days

“Drop non-responsive To:/CC:/BCC:, hash both sets, then subtract those that match,” Snowden said, responding to a question posed by CUNY Journalism School professor Jeff Jarvis about how long it would take the NSA “to dedupe 650k emails.”
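The three steps Snowden lists map onto a short script. The following is a minimal sketch of that idea, not a description of the tooling the FBI or NSA actually used. It assumes each email is a plain dictionary, and that a duplicate is defined by matching sender, subject and body; the function names and that choice of fields are hypothetical.

```python
import hashlib

def fingerprint(email):
    """Hash the fields assumed to define a duplicate: sender, subject, body."""
    normalized = "\n".join(
        email.get(field, "").strip().lower()
        for field in ("from", "subject", "body")
    )
    return hashlib.sha256(normalized.encode("utf-8")).hexdigest()

def is_responsive(email, accounts):
    """True if any To/CC/BCC address is on the list of accounts of interest."""
    recipients = set()
    for field in ("to", "cc", "bcc"):
        recipients.update(addr.strip().lower() for addr in email.get(field, []))
    return bool(recipients & accounts)

def new_emails(seized, already_reviewed, accounts):
    """Drop non-responsive mail, hash both sets, subtract the matches."""
    accounts = {a.strip().lower() for a in accounts}
    known = {fingerprint(e) for e in already_reviewed}
    responsive = [e for e in seized if is_responsive(e, accounts)]
    return [e for e in responsive if fingerprint(e) not in known]
```

Because each message is reduced to a fixed-length hash and looked up in a set, the subtraction step is roughly linear in the number of emails, which is why a dataset of this size is not a serious obstacle.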

A former FBI agent corroborated those assessments in an interview with Wired, noting that “you can triage a dataset like this in a much shorter amount of time.”

“We’d routinely collect terabytes of data in a search,” said the agent, who requested anonymity. “I’d know what was important before I left the guy’s house.”

The FBI has not revealed the exact process by which it sifted through all of the relevant emails. But most experts agree that the agents probably didn’t even need the full eight days to do so.

“Given those emails, and a list of known email accounts from Hillary and associates, and a list of other search terms, it would take me only a few hours to reduce the workload from 650,000 emails to only a couple hundred, which a single person can read in less than a day,” cybersecurity consultant Rob Graham wrote in a blog post on Sunday.
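Graham's description amounts to filtering on two lists: known accounts and search terms. A rough sketch of such a filter follows. It is not Graham's own code. The dictionary layout, the function names, and the rule that a message must both involve a known account and mention a term are assumptions made for illustration.

```python
import re

def build_term_pattern(terms):
    """Compile one case-insensitive, whole-word pattern from a non-empty term list."""
    escaped = sorted((re.escape(t) for t in terms), key=len, reverse=True)
    return re.compile(r"\b(?:" + "|".join(escaped) + r")\b", re.IGNORECASE)

def triage(emails, accounts, terms):
    """Keep messages that involve a known account AND mention a search term."""
    accounts = {a.lower() for a in accounts}
    pattern = build_term_pattern(terms)
    shortlist = []
    for email in emails:
        # Collect the sender and all direct recipients of this message.
        parties = {email.get("from", "").lower()}
        parties.update(a.lower() for a in email.get("to", []))
        text = email.get("subject", "") + "\n" + email.get("body", "")
        if parties & accounts and pattern.search(text):
            shortlist.append(email)
    return shortlist
```

Swapping `and` for `or` in the final condition would cast a wider net at the cost of a longer shortlist for a human to read.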