An analysis of the Wikileaks data with R

At his Zero Intelligence Agents blog, Drew Conway has taken on the task of performing a quantitative analysis (using R, of course) of the controversial Afghanistan document dump from Wikileaks. He’s started with an analysis of the overall flow of information in the five Afghanistan regions, categorized by type of activity (enemy, neutral, etc.).

It’s a 10,000-foot view of the data to be sure, but even so it does show some interesting trends: the relative quiet of some regions, surges and ebbs in the war, and the interchanges of activity between the various agents. Drew offers more analysis:

Given the nature of the reports, we would expect a noticeable degree of seasonality (peaks and valleys) given the natural ebb and flow of war. Any drastic deviations from this expectation could indicate a strong degree of selection on the part of Wikileaks. As you can see, however, the data generally do fit this expectation. Note the dramatic upward trending seasonality present in the heavy reporting areas of RC EAST and RC SOUTH. Perhaps more interestingly, though, is the sudden increase in the number of NEUTRAL reports present in the data for RC EAST and RC CAPITAL for the period roughly between mid-2006 and mid-2008.
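
The kind of tally behind a view like this can be sketched in a few lines. The example below is only an illustration, written in Python rather than the R that Drew used, and it is not his code: the records are made up, and the field layout (date, region, report type) and the `monthly_counts` helper are assumptions for the sake of the sketch. It counts reports per region, type and month, which is the sort of series in which seasonal peaks and valleys would show up.

```python
from collections import Counter
from datetime import date

# Made-up records for illustration only; NOT taken from the Wikileaks data.
# Assumed layout: (report date, regional command, report type).
reports = [
    (date(2006, 7, 3), "RC EAST", "ENEMY"),
    (date(2006, 7, 19), "RC EAST", "NEUTRAL"),
    (date(2006, 7, 21), "RC EAST", "ENEMY"),
    (date(2006, 8, 2), "RC SOUTH", "ENEMY"),
    (date(2007, 1, 15), "RC EAST", "NEUTRAL"),
]


def monthly_counts(records):
    """Count reports per (region, type, 'YYYY-MM') bucket."""
    counts = Counter()
    for day, region, kind in records:
        counts[(region, kind, day.strftime("%Y-%m"))] += 1
    return counts


counts = monthly_counts(reports)

# Print one line per bucket, sorted by region, then type, then month.
for (region, kind, month), n in sorted(counts.items()):
    print(region, kind, month, n)
```

Plotting each region's monthly counts as a separate panel, with one line per report type, would give a picture along the lines of the one described above.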

Be sure to follow ZIA as Drew dives deeper into the analysis of this fascinating data set.