Is Big Data the Way Ahead for Intelligence?

October 1, 2013

Another Overhyped Fad

By Mark M. Lowenthal

Director of National Intelligence Lt. Gen. James R. Clapper, USAF (Ret.), once observed that one of the peculiar behaviors of the intelligence community is to erect totem poles to the latest fad, dance around them until exhaustion sets in, and then congratulate itself on a job well done.

One of our more recent totem poles is big data. Big data is a byproduct of the wired world we now inhabit. The ability to amass and manipulate large amounts of data on computers offers, to some, tantalizing possibilities for analysis and forecasting that did not exist before. Much of the discussion about big data centers, in essence, on the possibility of gaining new insights and connections from the reams of new data created every day.

What do modern intelligence agencies run on? They are internal combustion engines burning pipelines of data, and the more fuel they burn, the better their mileage. Analysts and decision makers are the drivers of these vast engines, but to keep them from hoofing it, we need big data.

The intelligence community necessarily has been a pioneer in big data since inception, as both were conceived during the decade after World War II. The intelligence community and big data science always have been intertwined because of their shared goal: producing and refining information describing the world around us, for important and utilitarian purposes.

Share Your Thoughts:

These are two of the smartest people I have ever met. But neither lets us know with clarity how they define Big Data. Seems like Mark defines Big Data as: that thing I don't understand but that everyone is talking about, so they are missing the importance of analysis. Lewis, meanwhile, seems to define Big Data as: Data, the same as when the IC was formed; having data is good, and big data is nothing but data.

Well, there are more precise definitions. You can start with the one in Wikipedia, or use the one TechAmerica used; they are really close to each other. Or you can use the one the White House Office of Science and Technology Policy used; it is close to the others too. It might be good to see a rematch of these two, with both armed with a definition of what they are supposed to be arguing about.

In closing, a quote from Dan Ariely: Big data is like teenage sex: everyone talks about it, nobody really knows how to do it, everyone thinks everyone else is doing it, so everyone claims they are doing it...

Hi Bob, thanks for watching and for the comment. I've read the definitions, and to be honest I doubt there'd be any difference in our debate had we been more definitionally explicit at the outset... I know my case wouldn't change. The Wikipedia definition you cite, for example, immediately defines Big Data as "a collection of data sets so large and complex that it becomes difficult to process using on-hand database management tools or traditional data processing applications." That is a definition unbounded by chronological time, and it was as true in the 1950s about "massive-in-the-'50s-context" SIGINT collections as it is today for collections many orders of magnitude larger. "On-hand" tools and "traditional" applications change over time, but the aspects of Big Data about which I have been writing and speaking are relevant within whatever chronological slice one chooses in the IC's life. Again to quote the Wikipedia definition you cite, "Big data sizes are a constantly moving target," but the issue for the community is not size per se, but whether we gain value from exploiting the data we're ambitious enough to collect. I say yes.

Thanks Lewis. I'm with you on this. Your argument is even stronger when you anchor yourself in definitions, as in this case. The community needs discipline here, because when a concept reaches meme proportions everyone starts to peel off and use the term differently for their own ends (as we saw with other major tech concepts like Web Services, SOA and cloud computing). Using as precise a definition as possible is helpful, I believe.

But, as you mention, the core of your debate would not change; being definitionally explicit would just make you seem even more right than you already were, and would perhaps help Mark understand that technology is going to advance and is going to contribute, even though he is totally right that it is analysis we want, not just tech for tech's sake.

Lewis makes a good point about the definition of "Big Data." Big is a relative term. Looking at it temporally, what was big even 5-10 years ago is small by today's standards. (Remember "No one will ever need more than 640K.") But there are other relative aspects as well, and those deal with the types of problems one is attempting to address with the data. A graph with a few thousand nodes is not all that big size-wise, but performing reasoning on that graph is very big computationally.
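To make that computational point concrete, here is a minimal sketch in plain Python (Warshall's algorithm for transitive closure; the graph data is purely illustrative, not from any real system). The edge list of a few-thousand-node graph fits in kilobytes, yet answering every reachability question costs on the order of n^3 steps, so a few thousand nodes already means tens of billions of operations:

```python
# Sketch: why "reasoning" over a modestly sized graph is computationally big.
# Transitive closure via Warshall's algorithm costs O(n^3) operations, so a
# graph whose raw data fits in kilobytes can still demand billions of steps.

def transitive_closure(n, edges):
    """Warshall's algorithm: reach[i][j] is True iff j is reachable from i."""
    reach = [[False] * n for _ in range(n)]
    for i, j in edges:
        reach[i][j] = True
    for k in range(n):
        for i in range(n):
            if reach[i][k]:
                row_i, row_k = reach[i], reach[k]
                for j in range(n):
                    if row_k[j]:
                        row_i[j] = True
    return reach

# A 4-node chain: 0 -> 1 -> 2 -> 3. The data is four tuples...
reach = transitive_closure(4, [(0, 1), (1, 2), (2, 3)])
print(reach[0][3])  # True: 3 is reachable from 0

# ...but the work grows cubically with node count:
for n in (100, 1_000, 3_000):
    print(f"n={n:>5}: ~{n**3:.1e} inner-loop steps")
```

At n = 3,000 that last line reports roughly 2.7e10 steps, which is the commenter's point: "big" is about the computation the question demands, not just the bytes stored.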

In participating in the NIST Big Data Public Working Group, I know we struggled with and debated the definition endlessly. Most folks start with the 3Vs (Volume, Velocity, Variety), and each of those brings a dimension of complexity to the problem. Many of us had other Vs that contribute to complexity, like Variability and Veracity.

The key with any Data (big or not) is defining techniques and approaches that allow you to efficiently and effectively extract/derive knowledge from that data.

I believe that "Big" Data came about because there are now approaches that permit this type of exploitation for data sets that were not previously exploitable. The Big Data thrust so far has been focused primarily on storage and representation of the data. That is now transitioning to more of an analytic and algorithmic focus (including visualization), which to me is the harder problem.
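The storage-versus-analytics distinction above can be shown with a toy sketch in plain Python (the records and field names are hypothetical, purely illustrative): storing the records is trivial; the value comes from the analytic pass over them.

```python
# Sketch: the same stored records support arbitrarily many analytic passes.
from collections import Counter

# Hypothetical "stored" records: (date, source, keyword) tuples.
records = [
    ("2013-09-01", "sigint", "network"),
    ("2013-09-01", "osint", "network"),
    ("2013-09-02", "osint", "finance"),
    ("2013-09-03", "sigint", "network"),
]

# One simple analytic: keyword frequency broken out by source.
by_source = Counter((src, kw) for _, src, kw in records)
for (src, kw), n in sorted(by_source.items()):
    print(f"{src:>6} {kw:>8}: {n}")
```

Storing four tuples required no insight; deciding which aggregation (or visualization) answers a question did, and that gap only widens as the record count grows.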

I wish I had access to all that Big Data. I'd know what to do with some of it. Not all of it. But certainly some of it. I think good analysts and decision makers would too. That's where I agree with Lewis. Funnily enough, I was reading an article in the Financial Times (12/13/2013) that explored a conversation between analysts across the Atlantic. It seems they had the LeT data collected from faxes, but it was all in Farsi and Arabic. They couldn't do anything with the data since they didn't speak the languages. Hopefully the IT vendors don't pitch more technology innovation to solve that.