Selby has consulted with US and European governments, investment banks, start-ups and venture capital firms on matters related to security intelligence. Since 2008 he has focused on law enforcement intelligence and was sworn in as a Texas police officer in 2010. He is based in Fort Worth, Texas.

The story of the week is the revelation by a 29-year-old National Security Agency (NSA) contractor named Edward Snowden that the US government has been ingesting large quantities of metadata and data about telephony and Internet traffic for the purposes of gathering intelligence.

This is highly controversial, to say the least.

For those of you who have spent the last week on Pluto, head on over to Doc Searls’ excellent blog for a comprehensive set of links to the stories about this spectacular news event. It is no overstatement to say that this leak is, in terms of its public and operational impact, the most explosive and significant since the 1971 publication in The New York Times of the classified history of the Vietnam conflict (what became known as The Pentagon Papers) and the most profound leak of intelligence tactics since Duncan Campbell’s 1976 reporting on GCHQ in Time Out, and the subsequent ABC Trials in 1978.

To put this in perspective, Bradley Manning’s leaks to WikiLeaks pale in significance when compared to any of the above, and certainly to Snowden’s. This is not about cables, communiques and field intelligence; it’s about the very essence of Signals Intelligence (SIGINT) and specific methods and targets.

There is so much about to happen on this story that to say anything else about it would be folly. So I won’t.

What I will discuss is the “Why?”

I don’t think I’m alone in feeling that explanations given by talking heads (including myself) that what intelligence analysts do with metadata involves “pattern matching” and “dot connecting” and “link analyses” leave people confused. If you already understand the topic, then those statements are highly descriptive. To the uninitiated, though, they are as opaque as viewing your kid’s ballet recital through milk-filled goggles.

So what is it that people would want to do with all that data and metadata that would lead to an agency apparently Hoovering it up wholesale?

And what the hell is metadata, anyway?

Intelligence is about making the pile smaller

Here’s a super-obvious case to get the gist: If you were in San Francisco and you were looking for the person who stole your red Swingline stapler and if you also, magically, had access to the GPS signals from every mobile device in the Bay Area, what would you do?

I’ll tell you what I’d do: I’d draw a geo-fence around the area in which my Swingline stapler was last seen, and I would slice the huge pile of data by seeking only those devices that were in that area at around the time of the theft.

By doing that I have just reduced the pile from 7.15 million – the number of people in the San Francisco metropolitan area – to probably a couple of dozen people I can be pretty sure were near the scene of the crime at the time. Is this proof of anything? Absolutely not. But it is a very short list of people who may have information about who stole the stapler, and that list may even include someone who later turns out to be (for other reasons) a good suspect or in fact the culprit.
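For readers who think in code, the stapler exercise can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the `Ping` record, the device IDs, the coordinates and the 100-metre, 15-minute fence are all invented for the example, and no real system's data format is implied.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from math import asin, cos, radians, sin, sqrt


@dataclass(frozen=True)
class Ping:
    """One hypothetical location report from a mobile device."""
    device_id: str
    lat: float
    lon: float
    seen_at: datetime


def distance_m(lat1, lon1, lat2, lon2):
    """Great-circle distance in metres, using the haversine formula."""
    earth_radius_m = 6_371_000
    dlat = radians(lat2 - lat1)
    dlon = radians(lon2 - lon1)
    a = sin(dlat / 2) ** 2 + cos(radians(lat1)) * cos(radians(lat2)) * sin(dlon / 2) ** 2
    return 2 * earth_radius_m * asin(sqrt(a))


def devices_near(pings, lat, lon, radius_m, when, window):
    """Return IDs of devices seen inside the fence within `window` of `when`."""
    return {
        p.device_id
        for p in pings
        if abs(p.seen_at - when) <= window
        and distance_m(p.lat, p.lon, lat, lon) <= radius_m
    }


# Made-up data: three devices, only one of them near the stapler at the right time.
theft = datetime(2013, 6, 10, 14, 0)
pings = [
    Ping("phone-A", 37.7749, -122.4194, datetime(2013, 6, 10, 13, 55)),  # right place, right time
    Ping("phone-B", 37.7750, -122.4195, datetime(2013, 6, 10, 9, 0)),    # right place, hours earlier
    Ping("phone-C", 37.8044, -122.2712, datetime(2013, 6, 10, 14, 0)),   # right time, across the bay
]
shortlist = devices_near(pings, 37.7749, -122.4194, radius_m=100,
                         when=theft, window=timedelta(minutes=15))
print(shortlist)
```

The point of the sketch is the shape of the operation: two cheap filters (where and when) applied to the whole pile, leaving a short list for a human to look at.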

This exercise is a simplified and streamlined version of the work conducted by SIGINT people each day: find an event, a person, a time and place or even just a place that is for some reason interesting, and work to reduce the quantity of data you have on that thing until you’ve reached a manageable quantity. The difference is that in intelligence, we’re predicting which things are of interest, as opposed to reactively finding things of interest and working backwards to a root cause.

We do this reduction because we have limited resources – even the powerful NSA has limited resources. We need to use broadly-sourced SIGINT like this as a starting point, or as a place to return once an “interesting thing” is discovered.

Metadata defined

The best explanation I’ve heard of the nature and composition of metadata came from Eric Olson, who writes the Digital Water Blog. Eric tells a story of having to fly across the country to make a speech on a subject you’re not familiar with, and you have $20, enough to buy one book on the subject to prepare. You go into the bookshop and you start looking in the shelves, and as you do, you start asking yourself questions: Do you know the author’s name? Is the author someone you trust? Is the author someone who’s been recommended to you? Do you trust the person who recommended it? Have you read any reviews about the book, and were they good? Does the jacket have blurbs from people you trust, praising the book? Is the book heavy (because you do your best reading in the bath)? Is it too large (because you’ll be reading it on a plane and you’re sitting in the cheap seats)? Is it too thick (because you have to give this speech tomorrow and won’t have time to read a tome)?

Olson points out that these questions are all information about the book that can lead to decisions about whether you will buy it, and you haven’t even looked at the table of contents.

Those questions are metadata about the book that you’ve generated, dynamically. Metadata, as you can see, can be extremely powerful. That’s why some are unsatisfied by President Obama’s statements that the NSA is “only” looking at telephony metadata. As pointed out by the wonderful Electronic Frontier Foundation (a group to which I personally donate money each year), metadata is truly important and can be extremely telling. It’s misleading to say that it’s “just” metadata.

It would have been far better for the administration to say, “Yeah, this is what we do, there’s oversight and that’s it.”

Creating metadata

At my company, we put these concepts to work and derive and create metadata that is in fact more immediately applicable and telling than specific records might be. For example, it’s highly specific information that on March 15, 2009, a given fugitive was involved in an altercation with Officer Jack Smith (#1028) of the Tulsa, Oklahoma, Police Department at 18:25, at the corner of B Street and 3rd Avenue, while he was driving a white, 1997 Dodge Ram 1500, OKLP 192BRYS, during which he swung at the officer with his fist and threatened to kill the officer.

We would simply set the “Violent” flag to a 1. Officers don’t need to know all those details of a case from four or five years ago but they do need to have an operational indication that this fugitive has been known in the past to be violent, or to present a threat of violence. Does this mean that this fugitive is guilty of violence now? Of course not. But it tells the officer viewing the information to be on his toes, using an indicator derived solely from metadata.

An officer viewing this fugitive’s case file within StreetCred would not have any idea of the details of the case, but he’d have everything he needs in that one metadatum to take steps to ensure his safety.
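A rough sketch of that idea in Python follows. It is not how StreetCred actually works; the `Incident` record, the subject IDs and the keyword list are assumptions made up for illustration, and a real system would more plausibly rely on coded offense fields or human review than on free-text matching.

```python
from dataclasses import dataclass

# Hypothetical keyword list, invented for this example only.
VIOLENCE_TERMS = ("swung at", "threatened to kill", "assault", "struck")


@dataclass
class Incident:
    """A detailed record that stays in the case file."""
    subject_id: str
    narrative: str


def violent_flag(incidents, subject_id):
    """Return 1 if any incident for the subject mentions violence, else 0."""
    for incident in incidents:
        if incident.subject_id != subject_id:
            continue
        text = incident.narrative.lower()
        if any(term in text for term in VIOLENCE_TERMS):
            return 1
    return 0


# Made-up records for two hypothetical fugitives.
incidents = [
    Incident("F-001", "Swung at the officer with his fist and threatened to kill the officer."),
    Incident("F-002", "Failed to appear in court."),
]
flags = {sid: violent_flag(incidents, sid) for sid in ("F-001", "F-002")}
print(flags)
```

The officer in the field sees only the single derived value; the narrative that produced it stays behind in the record.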

Taking metadata from others

Of course, what is under frantic and frenetic discussion these days is metadata describing telephone calls, and even Internet communication. Here again, descriptions of “pattern matching” and “link analysis” can create confusion. One great way to look at this is the creation of a “social map” showing links between people; when it’s done, it resembles an airline route map.

Say you’ve got information that a telephone number is associated with Dr. Evil. One extremely fast way of determining who really knows Dr. Evil, as opposed to who just sort of casually knows Dr. Evil, would be to look at call data records. This has been practice for a very long time in the European Union (Germany banned it as unconstitutional a couple of years ago).

These records would allow me to understand whom Dr. Evil calls and who calls Dr. Evil, the number of times they call, and their general or even specific location when they do. It would let me see how long people spoke (so, for example, later, when a person says, “But I don’t know him,” the question can be asked, “Oh yeah? So why did you make 51 phone calls to him in the last six months averaging four minutes and 15 seconds each?”).

Very quickly, this would allow me to go out one level to seek more connections that are relevant. It’s really, really easy to see how investigators could use this information to, say, map out a potential terror cell, but here’s the big thing: this is how investigators disqualify people from being looked at further.
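The Dr. Evil walk-through can be sketched as a small piece of Python. Everything here is an assumption for illustration: the record layout (caller, callee, duration in seconds), the phone "numbers" and the three-call threshold are invented, and real call records carry far more fields.

```python
from collections import defaultdict

# Made-up call records: (caller, callee, duration_seconds).
calls = [
    ("dr-evil", "number-2", 240),
    ("number-2", "dr-evil", 270),
    ("dr-evil", "number-2", 255),
    ("dr-evil", "pizza-shop", 30),
    ("number-2", "frau", 600),
]


def contact_summary(calls, target):
    """Map each number in contact with `target` to (call count, mean seconds)."""
    totals = defaultdict(lambda: [0, 0])
    for caller, callee, seconds in calls:
        if caller == target:
            other = callee
        elif callee == target:
            other = caller
        else:
            continue  # this call does not involve the target at all
        totals[other][0] += 1
        totals[other][1] += seconds
    return {number: (count, total / count) for number, (count, total) in totals.items()}


def strong_contacts(calls, target, min_calls):
    """Numbers that exchanged at least `min_calls` calls with `target`."""
    return {
        number
        for number, (count, _mean) in contact_summary(calls, target).items()
        if count >= min_calls
    }


summary = contact_summary(calls, "dr-evil")
strong = strong_contacts(calls, "dr-evil", min_calls=3)

# Go out one level: who do the strong contacts talk to, apart from people already seen?
next_level = set()
for number in strong:
    next_level |= set(contact_summary(calls, number))
next_level -= strong | {"dr-evil"}

print(summary, strong, next_level)
```

Notice that the same threshold that surfaces a strong contact also drops the pizza shop from further attention, which is the disqualifying effect described above.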

Your data is mostly safe because it’s mostly uninteresting (and that’s not the same as ‘nothing to hide, nothing to fear’, which is only ever said by politicians with something to hide who fear being found out).

The analyses are generally there to reduce from petabytes to kilobytes the size of the data-pile in which analysts search for good targets. This is not to opine that there is nothing wrong with what is happening – just as it is not to opine there’s nothing right about it. I’m making no statement at all about whether these revelations are “good” or “bad.”

This post was to give some context about just what someone might do with a lot of that kind of data, and to explain some of the kinds of thinking that go into exploiting it once collected.

6 Archived Comments

But this raises the question: when I write the word "terrorist" in the comment field of this particular blog, what happens?

Will someone now spend some tax dollars looking into my Facebook profile to see if I have in fact mentioned this before in other instances, or if I have a large portion of friends with dubious backgrounds?

I do not have anything to hide so fall into the category of those that will, hopefully, be filtered out but now dollars will have to be spent to at least check that out.

After all since I have now mentioned that taboo word would it not be considered a lack of diligence not to?

It's an interesting question you ask. Resource constraints mean that the "interestingness" of individuals is judged through some major sets of questions that are designed to determine your "un-interestingness" so that no one wastes any time looking at uninteresting stuff. Your mentioning a word like "terrorist" or someone mentioning a phrase like, "pressure-cooker bomb" is hardly a unique occurrence. There are millions of people doing that every day. There are far more ways to skin that particular cat. Most of the stuff that people post on the Internet is posted publicly. There's no need to engage in hanky-panky or relatively expensive privacy intrusion when most people simply tell us the answer to the question of whether they are interesting or not. I would recommend Googling the phrase OSINT or "Open Source Intelligence" and starting there.

Well-made point. I'm choosing not to address the "right" or "wrong" - and by extension the "hero" or "traitor" - discussion yet, as I believe we still don't have all the facts. But your point, if I take it completely seriously, is very interesting when viewed in a context of law enforcement. Many issues within law enforcement are highly complex interactions between human beings and result almost by definition in regular occurrences of cases that rest on one man's word against another's. There is in the kind of analysis we do at StreetCred and that others perform in commercial and academic instances an ability to, for the first time ever, find data that helps prove the proverbial negative. While you were being sarcastic in saying that we can prove that that Congressman *didn't* make that phone call to his broker (or bookie) after that briefing, we are finding that patterns in law enforcement, court, demographic, public and commercial data can be used to demonstrate that people in fact *didn't* engage in bad conduct. That is a very exciting development. Thanks again for your comment.

The short answers, in order, are, 'Yes', 'Nothing', 'They did', and, 'Your guess is as good as mine'. I highly recommend the book GCHQ: The Uncensored Story of Britain's Most Secret Intelligence Agency (http://www.amazon.com/Gchq-Uncensored-Britains-Secret-Intelligence/dp/0007312660/ref=sr_1_1?ie=UTF8&qid=1371122545&sr=8-1) by Richard J Aldrich (thank you, Chris!), which discusses in great detail the history of GCHQ and its symbiotic or at the very least co-dependent relationship with the NSA. It has a chapter on the Falklands sharing you described, plus details of cooperative relationships in other parts of the world where one was stronger than the other, and the history of the UK's attempts to go it on its own with space-based SIGINT and why they ended up simply buying, borrowing or renting from the NSA.

In the past we have shared intelligence and our Intel assets with allies. As an example: Great Britain received satellite imagery from the US during the Falklands Campaign. What’s to keep the US from spying on the citizens of our Allies and giving that intelligence directly to their governments? As I understand the issue, oversight is not required when we spy on anyone outside of the US. For that matter, why don't we give the Brits the knowledge necessary to set up these same systems in their country and they can spy on our citizens for us and share that information with us, no oversight required. We can also use the abilities these systems offer to blackmail foreign dignitaries over any questionable activities we may discover and help influence the elections in these targeted countries. The possibilities are endless and somewhat unnerving.

This is great Nick, thanks. Now we can see just how law enforcement can be assisted by a large store of SIGINT data such as that evidently held by the NSA (with access from other agencies). Most of us have watched The Wire, so we are familiar with the ins and outs of making connections from phone numbers called.

Although the expense has been tremendous, perhaps the payoff in lower law enforcement costs is on its way. The next time congressional staffers are accused of insider trading privileges, a simple query can demonstrate that they did *not* make calls to their brokers after leaving a meeting with a lobbyist. Or if a politician is accused of wrongdoing, a simple lookup will determine that he/she was never in the presence of that bribing beltway bandit. For that matter, after apprehending a retail drug pusher on the streets of NYC, a quick check of his cell phone contacts can demonstrate he did not have contact with any clients in the last 7 years. Or, when a whistle blower is identified, the FBI can quickly determine who was not the journalist he talked to.