Wide Awake Developers

Needles, Haystacks

July 22, 2002

So, this may seem a little off-topic, but it comes round in the end. Really, it does.

I've been aggravated with the way members of the fourth estate have been treating the supposed "information" that various TLAs had before the September 11 attacks. (That used to be my birthday, by the way. I've since decided to change it.) We hear that four or five good bits of information scattered across the hundreds of FBI, CIA, NSA, NRO, IRS, DEA, INS, or IMF offices "clearly indicate" that terrorists were planning to fly planes into buildings. Maybe so. Still, it doesn't take a doctorate in complexity theory to figure out that you could probably find just as much data to support any conclusion you want. I'm willing to bet that if the same amount of collective effort were invested, we could prove that the U.S. Government has evidence that Saddam Hussein and aliens from Saturn are going to land in Red Square to re-establish the Soviet Union and launch missiles at Guam.

You see, if you already have the conclusion in hand, you can sift through mountain ranges of data to find those bits that best support your conclusion. That's just hindsight. It's only good for gossipy hens clucking over the backyard fence, network news anchors, and not-so-subtle innuendos by Congresscritters.

The trouble is, it doesn't work in reverse. How many documents does just the FBI produce every day? 10,000? 50,000? How would anyone find exactly those five or six documents that really matter and ignore all of the chaff? That's the job of analysis, and it's damn hard. A priori, you could only put these documents together and form a conclusion through sheer dumb luck. No matter how many analysts the agencies hire, they will always be crushed by the tsunami of data.

Now, I'm not trying to make excuses for the alphabet soup gang. I think they need to reconsider some of their basic operations. I'll leave questions about separating counter-intelligence from law enforcement to others. I want to think about harnessing randomness. You see, government agencies are, by their very nature, bureaucratic entities. Bureaucracies thrive on command-and-control structures. I think it comes from protecting their budgets. Orders flow down the hierarchy, information flows up. Somewhere, at the top, an omniscient being directs the whole shebang. A command-and-control structure hates nothing more than randomness. Randomness is noise in the system, evidence of inadequate procedures. A properly structured bureaucracy has a big, fat binder that defines who talks to whom, and when, and under what circumstances.

Such a structure is perfectly optimized to ignore things. Why? Because each level in the chain of command has to summarize, categorize, and condense information for its immediate superior. Information is lost at every exchange. Worse yet, the chance for somebody to see a pattern is minimized. The problem is this whole idea that information flows toward a converging point. Whether that point is the head of the agency, the POTUS, or an army of analysts in Foggy Bottom, they cannot assimilate everything. There isn't even any way to build information systems to support the mass of data produced every day, let alone to correlate reports over time.

So, how do Dan Rather and his cohorts find these things and put them together? Decentralization. There are hordes of pit-bull journalists just waiting for the scandal that will catapult them onto CNN. ("Eat your heart out, Wolf, I found the smoking gun first!")

Just imagine if every document produced by the Minneapolis field office of the FBI were sent to every other FBI agent and office in the country. A vast torrent of data flowing constantly around the nation. Suppose that an agent filing a report about suspicious flight school activity could correlate that with other reports about students at other flight schools. He might dig a little deeper and find some additional reports about increased training activity, or a cluster of expired visas that overlap with the students in the schools. In short, it would be a lot easier to correlate those random bits of data to make the connections. Humans are amazing at detecting patterns, but they have to see the data first!
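To make that kind of correlation concrete, here is a minimal sketch of an inverted index that lets anyone holding one report find other reports sharing a tag with it. Everything in it is an assumption made up for illustration: the report ids, the office names, the tags, and the helper names `build_index` and `related_reports`. It says nothing about how any agency actually stores or shares its reports.

```python
from collections import defaultdict

# Hypothetical reports: (report id, filing office, tags attached by the agent).
REPORTS = [
    ("R1", "office-a", {"flight-school", "visa-expired"}),
    ("R2", "office-b", {"flight-school"}),
    ("R3", "office-c", {"visa-expired", "wire-transfer"}),
    ("R4", "office-a", {"stolen-vehicle"}),
]


def build_index(reports):
    """Map each tag to the set of report ids that carry it."""
    index = defaultdict(set)
    for report_id, _office, tags in reports:
        for tag in tags:
            index[tag].add(report_id)
    return index


def related_reports(report_id, reports, index):
    """Return (other report id, shared tags) pairs, most shared tags first."""
    tags = next(t for rid, _office, t in reports if rid == report_id)
    overlap = defaultdict(set)
    for tag in tags:
        for other in index[tag]:
            if other != report_id:
                overlap[other].add(tag)
    # Sort by number of shared tags (descending), then by id for stable output.
    return sorted(overlap.items(), key=lambda kv: (-len(kv[1]), kv[0]))


index = build_index(REPORTS)
for other, shared in related_reports("R1", REPORTS, index):
    print(other, sorted(shared))
```

The point of the design is that the index is the same for everyone. No one has to decide in advance which office gets to see which report; the agent who filed R1 sees R2 and R3 just by asking.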

This is what we should focus on. Not on rebuilding the $6 Billion Bureaucracy, but on finding ways to make available all of the data collected today. (Notice that I haven't said anything that requires weakening our 4th or 5th Amendment rights. This can all be done under laws that existed before 9/11.) Well, we certainly have a model for a global, decentralized document repository that will let you search, index, and correlate all of its contents. We even have technologies that can induce membership in a set. I'd love to see what Google Sets would do with the 19 hijackers' names, after you have it index the entire contents of the FBI, CIA, and INS databases. Who would it nominate for membership in that set?
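I don't know how Google Sets works internally, so the following is not its algorithm. It is a rough sketch of one simple way to "nominate" new members for a set: score every name by how often it appears in the same document as the seed names. The documents, the placeholder names, and the function `nominate` are all hypothetical.

```python
from collections import Counter

# Hypothetical documents, each reduced to the set of names it mentions.
DOCUMENTS = [
    {"name-a", "name-b", "name-c"},
    {"name-a", "name-d"},
    {"name-b", "name-c", "name-d"},
    {"name-e", "name-f"},
]


def nominate(seeds, documents):
    """Rank non-seed names by co-occurrence with the seed names.

    A name earns one point per seed name it shares a document with,
    so documents that mention several seeds count for more.
    """
    seeds = set(seeds)
    scores = Counter()
    for names in documents:
        shared = len(names & seeds)
        if shared == 0:
            continue  # this document mentions no seed, so it tells us nothing
        for name in names - seeds:
            scores[name] += shared
    # Highest score first; ties broken alphabetically for stable output.
    return sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))


print(nominate({"name-a", "name-b"}, DOCUMENTS))
```

With the toy data above, name-c ranks ahead of name-d because it appears alongside both seeds in one document. Names that never share a document with a seed, like name-e and name-f, are never nominated at all.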

Basically, the recipe is this: move away from ill-conceived ideas about creating a "global clearinghouse" for intelligence reports. Decentralize it. Follow the model of the Internet, Gnutella, and Google. Maximize the chances for field agents and analysts to be exposed to that last, vital bit of data that makes a pattern come clear. Then, when an agent perceives a pattern, make damn sure the command-and-control structure is ready to respond.