After hearing this story on NPR this morning about the Harmony Database of al Qaeda correspondence, I began wondering what other public databases like this are out there, ripe for analysis and picking apart. I'm looking specifically for things related to the "War on Terrorism" and the Iraq War, but older material would be good too.
posted by proj to Law & Government

My understanding is that one theory for the name Al Qaeda ("the Base") is that it refers to a computer database file of all sorts of jihadists that the U.S. used against the Soviets during the war in Afghanistan. Bin Laden took a copy and used it to build his organization. I don't know whether it is available, however.
posted by Ironmouth at 12:10 PM on October 23, 2007

If you are interested in network corpora -- any large, rich collection of networked text that you can freely download -- and it doesn't necessarily have to do with terrorism or Iraq war, here are some interesting ones.

MediaDefender Email Corpus -- all of the emails exchanged by employees of MediaDefender, a company hired by media companies to spoof and disrupt P2P networks (you'll probably have to go to BitTorrent for this).

NLTK -- the Natural Language Toolkit -- comes with a variety of research-quality text corpora in English and many other languages.

CiteSeer -- a massive collection of scientific papers (there are several others like it, such as the preprint archives and the NIH databases).

And, of course, Project Gutenberg and Wikipedia can both be downloaded in bulk, but each has a specific procedure for doing so that you should follow. See especially DBpedia for a fantastic collection of semantic data mined from Wikipedia.

My current project is a website to collect, share and redistribute rich datasets like these -- a Flickr for data, if you like. I don't want to shill, so please MetaFilter-mail me if you'd like a link.
posted by mrflip at 10:38 PM on January 9, 2008

