Or, "How not to build a data warehouse."

On November 5, 2009, an Army psychiatrist stationed at Fort Hood, Texas shot and killed 12 fellow soldiers and a civilian Defense Department employee while wounding 29 others. US Army Major Nidal Malik Hasan, the American-born son of two Palestinian immigrants, reportedly shouted “Allahu Akbar!”—“God is great!”—before launching his 10-minute shooting rampage at the Soldier Readiness Center. The shooting—the worst ever on an American military base—occurred as Hasan was facing imminent deployment to Afghanistan. A civilian police officer shot Hasan and placed him under arrest.

In the investigation that followed, the FBI and Defense Department investigators found that Hasan had been communicating with Anwar al-Aulaqi (sometimes spelled "al-Awlaki"), an American radical Islamic cleric living in Yemen. In the process of reviewing the evidence, investigators found that the FBI’s Joint Terrorism Task Forces in San Diego and Washington, DC had been aware of Hasan’s interactions with Aulaqi for over 11 months before the attack. Yet Hasan had never even been interviewed about his connection with the imam who would later be tied to “underwear bomber” Umar Farouk Abdulmutallab and to attempts to bomb US-bound cargo planes with explosives packed in laser printer cartridges. (Al-Aulaqi would later be killed by a US drone strike in Yemen.)

As federal officials looked into whether they had somehow missed leads that might have prevented the shooting, they found that the information technology at the heart of the FBI’s efforts to prevent terrorist attacks was fractured, overburdened, and running on aging and underpowered hardware.

Two weeks ago—coincidentally, just hours before another gunman would kill 12 and wound many more in Aurora, Colorado—an FBI independent commission led by former FBI director and federal judge William H. Webster filed its final report on the FBI’s performance leading up to the Fort Hood shooting. That report found no evidence the FBI's data would have set off alarms that Hasan was planning to kill fellow soldiers; he received no explicit instructions from al-Aulaqi and never mentioned his plans. But the report strongly implies that FBI IT systems and the bureau’s poor state of information sharing with other agencies played a role in the failure to take a harder look at Hasan.

Much has been made of government's power to surveil citizens using technologies such as packet capture and deep packet inspection. Even used in a limited fashion, these technologies can gather massive amounts of data on the online behaviors of individuals, and when taken together they can create an electronic profile of people's lives. That potential, and concerns about its abuse, have driven privacy advocates to push for the repeal or alteration of laws such as the PATRIOT Act.

At the same time, US law enforcement and intelligence agencies have struggled over the past decade to take all of this information and put it to use. The poor search capabilities of the FBI's software, inadequate user training, and the fragmented nature of the organization's intelligence databases all meant there was no way for anyone involved in the investigation to have a complete picture of what was going on with Hasan.

While much has changed since November of 2009, the FBI’s intelligence analysis and sharing systems remain a work in progress at best—and there's no telling what other potential threats may have gone unnoticed.

Packet captured

Nidal Hasan in a US Army photo after his promotion to major.

Hasan first drew the interest of the Joint Terrorism Task Force in the FBI’s San Diego field office back in December of 2008 when, as a captain assigned to Walter Reed Army Medical Center, he attempted to make contact with Aulaqi via a message form on Aulaqi’s personal website.

The San Diego JTTF, a team made up of FBI agents and analysts along with officers from the Defense Criminal Investigative Service (DCIS) and Naval Criminal Investigative Service (NCIS), had been investigating Aulaqi since the late 1990s, when he was an increasingly radical imam at a San Diego mosque. As part of that investigation, the FBI monitored his electronic communications under a secret warrant, intercepting traffic to his personal webpage, his e-mails sent to a Yahoo webmail account, and his instant messages.

While the tools to do this could be primitive at the time, they did work. Back in 1997, the FBI began intercepting e-mails and other network traffic with the custom tool "Carnivore" (later given the bland name "DCS-1000" after copious criticism), a Windows-based packet sniffer that could capture specific types of communications as part of warranted surveillance. (In 2005, the FBI dropped its bespoke sniffer and switched to commercial deep packet inspection technology, which by then offered better features and performance.)

So when Hasan visited Aulaqi's website in 2008 and used its "Contact the Sheik" page to send Aulaqi a message, he identified himself with his name and his own AOL e-mail address. The FBI’s surveillance software scooped it up and noted Hasan's IP address, which resolved to Reston, Virginia:

Hasan's first message to al-Aulaqi, through his website.

Hasan’s message then entered an FBI database called the Data Warehousing Service (DWS), a database originally designed when the FBI was still using Carnivore. The FBI's Special Technologies and Applications section had designed DWS in 2001, but it was misnamed. DWS wasn’t a data warehouse per se but instead was designed as a transactional database for storing intercepted communications captured in criminal investigations—not for doing analysis on large data sets.
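To make that distinction concrete, here is a minimal sketch in Python using the standard-library sqlite3 module. The table and column names (intercepts, case_id, sender) and the sample rows are invented for illustration and are not taken from DWS. The point is only that a store built for per-record lookups handles the first kind of query naturally, while the second kind, which cuts across every case at once, is what a true data warehouse is organized around.

```python
import sqlite3


def build_store():
    """Create an in-memory, transaction-style store of intercepted messages.

    Schema and data are hypothetical; nothing here reflects the real DWS.
    """
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE intercepts ("
        " id INTEGER PRIMARY KEY,"
        " case_id TEXT NOT NULL,"
        " sender TEXT NOT NULL,"
        " body TEXT NOT NULL)"
    )
    rows = [
        ("case-A", "alice@example.com", "first message"),
        ("case-A", "bob@example.com", "unrelated note"),
        ("case-B", "alice@example.com", "second message"),
        ("case-C", "alice@example.com", "third message"),
    ]
    conn.executemany(
        "INSERT INTO intercepts (case_id, sender, body) VALUES (?, ?, ?)", rows
    )
    conn.commit()
    return conn


def fetch_by_id(conn, intercept_id):
    """Transactional access: fetch one record by its primary key."""
    return conn.execute(
        "SELECT case_id, sender, body FROM intercepts WHERE id = ?",
        (intercept_id,),
    ).fetchone()


def senders_across_cases(conn, min_cases=2):
    """Analytical access: which senders show up in several cases?

    This aggregates over the whole table rather than one record.
    """
    return conn.execute(
        "SELECT sender, COUNT(DISTINCT case_id)"
        " FROM intercepts GROUP BY sender"
        " HAVING COUNT(DISTINCT case_id) >= ?"
        " ORDER BY COUNT(DISTINCT case_id) DESC",
        (min_cases,),
    ).fetchall()
```

On four rows the two queries cost about the same. The gap appears at scale, when the aggregate has to scan everything unless indexes or summary tables were designed in from the start.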

DWS wasn’t the only system used for handling surveillance data, though. In 2002, the Bureau had launched the Electronic Surveillance (ELSUR) Data Management System (EDMS), a separate system for handling foreign intelligence surveillance. The goal of EDMS was to help language analysts translate and annotate electronic content ranging from audio (collected from wiretaps and telephone monitoring) to intercepted e-mail and seized electronic media.

At the time that Hasan’s message ended up in the hands of the San Diego JTTF, the two systems were in turmoil. In February 2009, the systems were merged under a single user interface called DWS-EDMS as part of an effort to improve and consolidate surveillance data access—combining both criminal investigations and intelligence activities.

But with the Global War on Terror in full swing, DWS and EDMS had hit the wall—neither was really intended to handle the volume of surveillance that began rolling in to support counterterrorism investigations.

I interact with some state and federal agencies in my work, and it is a familiar pattern: politicians are perfectly fine increasing the budget a random percentage in something that sounds popular, but are totally unwilling to accept a 200-300 million dollar investment to replace aging infrastructure that is obsolete.

We are dealing now with a state agency that deals with 700,000 clients using a 1989 COBOL-based mainframe, for God's sake.

As a federal employee, I see this as the norm. We have an entire infrastructure that is mis-budgeted and mismanaged to a ridiculous degree. Our hardware is literally dying and in need of replacement, and only because the stupid software they install runs the PCs into the ground sooner than expected. The IT Security people are dolts who make sure software is literally unusable in any form, and we have so many overlapping data systems it's horrifying. If the government was a private enterprise the whole lot of them would be fired for sheer incompetency. But for some reason each department/agency/section all feel the need to reinvent the wheel. All the time.

We have a system where data from reports is stored in an Oracle SQL database. Another system, built on the same software but housing different data, is managed by a different department and contractor and has its own Oracle database. Then we have yet another database system that is supposed to unify all of these, but it is so poorly designed that no one uses it except when we are made to. It is basically a duplicate of all the other systems, just with less useful information and no ability to run any useful reports.

That's why I always laugh when people talk of government conspiracies and whatnot. How such a dysfunctional system could put together such a well-planned and well-kept secret plan is beyond me.

Am I the only one that is more concerned with government surveillance than terrorism? Giving up 100% of your freedom to communicate without government watchers to possibly mitigate extremely rare attacks seems about as un-American as one can get.

Very interesting stuff and aligns with my own experience with government agencies. Most big agencies or even commercial organizations don't seem to realize that they are at the mercy of their IT systems and infrastructure. Essentially a lot of big organizations have become IT shops, but they refuse to recognize this.

Those who invest into IT and take it seriously will actually have a chance at gaining a considerable competitive edge.

Also typo in picture caption:

"including the plot to blow up cargo planes using bombes disguised in laser printer cartridges"

bombs, not bombes as far as I'm aware

You missed this beauty:

That report found no evidence the FBI's data should would have set off alarms that Hasan was planning to kill fellow soldiers

You can't blame the technology when it comes down to human error and poor communication between departments.

Secondly it looks like they are making the problem worse for themselves by casting a ridiculously large net, rather than focusing on the problem.

In the documentary, USA vs Al-Arian (don't watch if you don't like being depressed for a whole week), you can see that years of family telephone conversations were recorded and it didn't serve any purpose.

The NYPD's surveillance of tens of thousands of non-suspicious citizens puts needless strain on systems that are meant to track potential criminals.

Ultimately, it seems that poor judgment is to blame here.

This shooting in particular had warning flags that were not recorded via email or phone, but rather through observed behavior.

Thanks for this well-researched article on a subject that is seldom seriously discussed. We all tend to view military and special agencies' information systems as state of the art, oversized and efficient, a view in which we’re comforted by hundreds of books, movies, comics and probably quite a bit of government propaganda. In that light, it feels especially nice to read an article depicting the actual workings of one such system.

It’s quite impressive to realize that even today, with the prodigious technologies at our disposal, information access, dissemination and sharing remain a challenge that most large structures and administrations fail to properly address.

Our asswipe government is spending so much time with the Megauploads and Kim Dotcoms of the world that they don't have time to focus on much of anything else. They've got to protect the financial interests of the big corporations.

This really seems to shoot down the Fox News promoted story that he was a known problem and that they did not deal with him for political correctness reasons.

Except he was a known problem, to his superiors, classmates and other officers he served with. Dr. Val Finnell told The Associated Press that he and other classmates participating in a 2007-2008 master's program with Hasan at the Uniformed Services University had complained about his comments, including that the war on terror was "a war against Islam." Another classmate told the AP that he complained to five officers and two civilian faculty members at the university. He also wrote to Pentagon officials that fear in the military of being seen as politically incorrect prevented an "intellectually honest discussion of Islamic ideology" in the ranks. Other classmates who participated in a 2007-2008 master's program at a military college said they, too, had complained to superiors about Maj. Nidal Malik Hasan's anti-American views, which included his giving a presentation that justified suicide bombing and telling classmates that Islamic law trumped the U.S. Constitution. So, tell me again how he was not a known problem, or that the military did not deal with him because of their PC outreach program to Muslims?

Very well put-together article, Sean. It's both comforting and terrifying to know that the FBI IT environment is just as much a mess as the average Fortune 500 company's IT environment--broken and outdated silo'd data systems backed by poor or nonexistent disaster recovery.

If the government was a private enterprise the whole lot of them would be fired for sheer incompetency. But for some reason each department/agency/section all feel the need to reinvent the wheel. All the time.

You think private enterprise is any different? Having worked for both, I can tell you it isn't. Having seen a large number of systems on both sides of the fence, it really, really isn't different.

IIRC, the East German Stasi had 1 out of 10 citizens working as informers but was unable to figure out and prevent its entire regime from being overthrown.

Wow... I'd imagine that any army member who subscribed to "44 Ways of Supporting Jihad" feed would be immediately picked up for questioning.

He didn't.

Article wrote:

Hasan did subscribe to get site updates from Aulaqi's site via Google Feedburner

In other words, he had a site on his feed service of choice, and that site posted the PDF you are talking about as well as the email listed in the article. This is the problem with cherry-picking evidence after the fact: all you need to do is go through the data and find things that look incriminating. If you subscribe to any feeds, do you think there are ever any posts/documents that an outsider would find questionable? Perhaps if you were reading about Anonymous even here on Ars? Imagine how that would get twisted if you were later charged with hacking something. Heck, you don't necessarily even have to read the article. Just as it is stated in this article, a site he subscribed to made those posts, but do we even know if he read them?

I think this problem is analogous to logging everything in a corporate environment. Unless you have appropriate detective controls in place, the only purpose logging provides is for forensics so you know what happened after the fact. This is the problem when you are storing huge volumes of data - you need a way to parse it to generate usable information. Until that problem is solved, all you have is data you can use for forensics but that does nothing to prevent any events from occurring. Right now they seem pretty good at collecting evidence but the next step is the hardest when figuring out how to make it usable and actionable.
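The difference between logging for forensics and a detective control can be sketched in a few lines of Python. Everything here is hypothetical: the event fields, the WATCHED_DOMAINS set, and the rule itself are invented for illustration. Both methods store the same event. Only the second one evaluates a rule at ingest time and produces something a person could act on before the fact.

```python
from dataclasses import dataclass, field

# Hypothetical watchlist; a real control would load its rules from policy.
WATCHED_DOMAINS = {"watched.example"}


@dataclass
class Event:
    user: str
    action: str
    target: str


@dataclass
class LogStore:
    events: list = field(default_factory=list)
    alerts: list = field(default_factory=list)

    def log_only(self, event: Event) -> None:
        """Forensics-only logging: keep the event, evaluate nothing."""
        self.events.append(event)

    def log_and_detect(self, event: Event) -> None:
        """Store the event and also check it against a rule at ingest."""
        self.events.append(event)
        domain = event.target.rsplit("@", 1)[-1]
        if event.action == "send_mail" and domain in WATCHED_DOMAINS:
            self.alerts.append(f"{event.user} contacted {event.target}")
```

The rule here is deliberately trivial. Deciding which rules are worth running over a very large volume of events is the hard part described above.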

I love the fact that they have an app called "Telephone Application." That screams "amateur app I made because I was tired of dealing with this bullshit, then everyone started using it, and now we're stuck with it."

None of this is really surprising, large enterprise systems are always a nightmare, throw in government procurement, budgets, and politicking, and you will always have a clusterfuck. They would be a lot better off if they just contracted out things like the database storage and searching to Google, but what sane (aka, not already a government contractor) company would want to step into that mess? I think there's going to need to be a general push in government to modernize, they can't keep going this way forever. I think like medical information systems, it's going to get much, much worse before someone like Google or MS decides there is a market there (and the people buying are stuck in such a hole that they are willing to throw away their existing enterprise stuff) and wipes out all the competition, just like they did with smartphones.

I think this problem is analogous to logging everything in a corporate environment. Unless you have appropriate detective controls in place, the only purpose logging provides is for forensics so you know what happened after the fact. This is the problem when you are storing huge volumes of data - you need a way to parse it to generate usable information.

Basically this. If information is requested as part of an investigation, we don't release it without someone from IT being present to explain exactly what it means.

Because all too often managers will misinterpret something, go haring off down the wrong track, we'll get wind of it and then have to rugby tackle the fuckwit before he kicks off an unfair dismissal suit.

In hindsight a lot of things seem obvious. Lots of soldiers have problems with policy, and conflicts with religion, and that goes for Christian and other religious soldiers as well. There is a line between those kinds of conflict and being willing to actively work against the military you serve in, or in a worst case, shoot your fellow soldiers. This article makes it pretty clear that the largest failure was a technological one, not a political correctness one.

Sure they could have taken more interest in him, but his struggles were not unique. But the FBI had everything needed to know he was on the road to this but their own issues kept them from calling it out.

Any of you thought that perhaps it would be more efficient to kinda avoid situations where you piss off people enough that they want you dead at any cost?

Would be a lot cheaper and safer than making stasi 2.0

When the "situation" is living in a society that claims to protect personal freedoms, and for the most part, succeeds, I choose to not avoid that. Even if the US weren't the "World Police" (and I personally do think we should not be) there would be condemnation of our way of life by other groups, which may lead to acts of terrorism. Did we kick the hornets nest? Probably -- but the sentiment of the hive existed beforehand.

Any of you thought that perhaps it would be more efficient to kinda avoid situations where you piss off people enough that they want you dead at any cost?

Fanatics are irrational; we had just about the same frequency of terrorist attacks before 9/11 as after. They don't need a legitimate reason to hate us. This is about a technological problem anyway, not the politics of why he went nuts.

Also, the incident in which the Chinese embassy in Belgrade was bombed during the Kosovo conflict may have stemmed from people at the CIA using a convenient but out-of-date map that was available on their internal network instead of checking the more up-to-date paper maps they also had...

For example, one could take a product like Ubuntu Cloud and program an open-source solution to the FBI's database problems. Populate the open-source solution using fake data of similar volume and nature, or if there is a repository of unclassified data available (complete with e-mails) then use that instead. Such a system would use products similar in nature to Apache Hadoop/Solr/Lucene, except a bit more "hardened" like the Accumulo project.
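As a rough illustration of the "fake data first" idea, here is a small Python sketch that generates synthetic messages and searches them with a toy inverted index. This is not the Lucene, Solr, or Accumulo API, only the basic data structure that full-text search engines of that kind build on. The function names and the vocabulary are made up for the example.

```python
import random
import re
from collections import defaultdict


def make_fake_messages(n, seed=0):
    """Generate n synthetic messages from a fixed, made-up vocabulary."""
    rng = random.Random(seed)
    vocab = ["meeting", "report", "travel", "budget", "schedule", "update"]
    return {i: " ".join(rng.choices(vocab, k=5)) for i in range(n)}


def build_index(messages):
    """Map each lowercase token to the set of message ids containing it."""
    index = defaultdict(set)
    for msg_id, text in messages.items():
        for token in re.findall(r"[a-z0-9]+", text.lower()):
            index[token].add(msg_id)
    return index


def search(index, query):
    """Return ids of messages that contain every token in the query."""
    tokens = re.findall(r"[a-z0-9]+", query.lower())
    if not tokens:
        return set()
    result = set(index.get(tokens[0], set()))
    for token in tokens[1:]:
        result &= index.get(token, set())
    return result
```

Because the generator is seeded, the same fake corpus can be rebuilt on any machine, which is what would let the search features be developed and tested in the open before any real data is involved.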

Analyze the existing system and find out what features are used most often and implement those first. When a reasonably functional system exists using the fake or unclassified data, then take the Ubuntu Cloud implementation and push it behind the governmental firewall for it to be populated with real data and then tested for its veracity. In a cyclical manner: wash, rinse, repeat adding a bit more functionality each and every time.

Change the user interface to one that is similar in intuitiveness and simplicity to tablets or cellphones. Doing so will minimize end-user training needs because the end-users will already have a pretty good idea as to how to use it. "Share" your data with others much as social media users do today, except when one clicks on the "Share" button, one is shown a different set of targets other than the "Facebook", "Twitter", "Google+", et al. destinations. The different set of targets could be other offices or organizations within the FBI, CIA, NSA and so on, each with a different icon.

All of this could be done in the open by open-source programmers and then pushed behind the firewall to be managed by the government or a third party on its behalf. It is the concept of rapid, iterative development coupled with fake or unclassified data that roughly resembles the actual data that will produce relevant, timely results.

The main problem the FBI or any other law enforcement organization will face in revamping their IT is that they'll try to do it "behind the firewall" initially instead of focusing on what can be done in the open but re-purposed for secrecy later.

Oh well, I guess I'm making the erroneous assumption that someone of influence will read this comment to begin with...sigh.

I've been planning for some time to make one of these for an RPG (Dark*Matter), it would be hilarious if it could have IRL utility.