Felix Haass

List of useful R packages for quantitative conflict research


I’ve put together a list of R packages that might be useful for the quantitatively oriented conflict scholar. If you’ve done much of your work in R, however, chances are high that you already know most or all of them. Nevertheless, the list might help beginners or those making the transition to R from other software. Since I currently do mostly macro-comparative research at the country level, these packages are most useful for that type of research. If you have suggestions for other packages that are particularly useful in your work, let me know in the comments and I’ll add them to the list.

You can install most of them via

install.packages("packagename")

but some of them can only be installed from the author’s GitHub repository. In those cases, you will usually find installation instructions on the respective GitHub landing page or in accompanying blog posts.

Ever since the world decided to use different identification codes for countries, researchers have faced the problem of identifying the same countries across datasets that use different naming conventions. Anyone who has worked with such data knows the headaches this causes. countrycode offers a remedy. Convert ISO-3 to Correlates of War codes? Plain-text names to ISO-2? IMF names to regions or even continents? This package does the heavy lifting for you. Nevertheless, always watch out for inconsistent country codes in your source data. Countries that used to be part of Yugoslavia, for instance, are likely candidates for being coded differently across datasets. (Author: Vincent Arel-Bundock)
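A minimal sketch of the conversions described above, assuming a recent countrycode release in which `countrycode()` takes `sourcevar`, `origin` and `destination` arguments and the code names `"country.name"`, `"iso3c"` and `"cown"` (COW numeric) are available. The three country names are just examples. The last step follows the advice to check your codes: anything that fails to match is listed rather than silently turning into an `NA` that later disappears in a merge.

```r
library(countrycode)

# Example country names as they might appear in a source dataset
countries <- c("Germany", "Kenya", "Bosnia and Herzegovina")

# Plain-text names -> ISO-3 character codes
iso3 <- countrycode(countries, origin = "country.name", destination = "iso3c")

# ISO-3 -> Correlates of War numeric codes, e.g. for merging with COW-based data
cow <- countrycode(iso3, origin = "iso3c", destination = "cown")

# countrycode() returns NA (with a warning) for values it cannot match,
# so list the failures explicitly instead of letting them vanish later
unmatched <- countries[is.na(iso3)]
if (length(unmatched) > 0) {
  message("No ISO-3 match for: ", paste(unmatched, collapse = ", "))
}

data.frame(country = countries, iso3c = iso3, cown = cow)
```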

The World Bank collects a wealth of economic, political and social country indicators and provides access to them through its API. The WDI package is essentially an R wrapper for that API, letting you download any of its roughly 8,000 indicators in a convenient country-year format. This makes collecting cross-country statistics extremely easy (although one should keep the limitations of the data quality in mind). The package also provides a function to search the World Development Indicators for the one you’re particularly interested in. (Author: Vincent Arel-Bundock)
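A sketch of a typical session, assuming the WDI package’s `WDIsearch()` and `WDI()` functions. The indicator code `NY.GDP.PCAP.KD` (GDP per capita, constant US$) and the three East African countries are only examples, and both calls need an internet connection because they query the World Bank’s servers.

```r
library(WDI)

# Search indicator names for a keyword (case-insensitive pattern match)
hits <- WDIsearch("gdp per capita")
head(hits)

# GDP per capita (constant US$) for three example countries, given as ISO-2 codes
gdp <- WDI(country = c("KE", "UG", "TZ"),
           indicator = "NY.GDP.PCAP.KD",
           start = 2000, end = 2010)

# Country-year format: one row per country and year, one column per indicator
head(gdp[order(gdp$country, gdp$year), ])
```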

When we study conflict, we are often also interested in the potential effects of regime type or other institutional variables on conflict onset or recurrence. Some of the most frequently used datasets in this regard are the Polity IV data, the Database of Political Institutions (DPI), Reinhart and Rogoff’s financial crisis data and Bueno de Mesquita et al.’s ‘winset’ and ‘selectorate’ measures. The psData package provides easy access to these datasets in R; its functions return country-year data frames that can easily be merged with your own datasets. Be advised that the package is still under construction. Nevertheless, it already works quite well for day-to-day tasks. (Author: Christopher Gandrud)

NB: As far as I can tell, psData does not (yet) support downloading Freedom House scores, which come in rather ugly and cumbersome Excel files that need considerable data massaging before they become handy R data frames. Thankfully, Jay Ulfelder has written an R function that does exactly that for you. Another machine-readable compilation of FH scores is here. Keep in mind, though, that you always, always need to check manually whether those country codes are correct.

DataCombine provides tons of useful functions for dealing with cross-section time-series (CSTS) data; too many to list here. But rest assured: if you have ever worked with CSTS data and despaired in the face of one of the myriad problems that come with this data structure (determining start and end dates, anyone?), DataCombine offers functions that make those problems less complicated. (Author: Christopher Gandrud)
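One classic CSTS pitfall is lagging a variable: the lag has to be computed within each country, not across the stacked rows of the panel. A minimal sketch, assuming DataCombine’s `slide()` function with its `Var`, `GroupVar`, `NewVar` and `slideBy` arguments; the panel and its values are made up for illustration.

```r
library(DataCombine)

# Toy country-year panel with hypothetical values
panel <- data.frame(
  country = rep(c("KEN", "UGA"), each = 3),
  year    = rep(2000:2002, times = 2),
  gdppc   = c(10, 11, 12, 20, 21, 22)
)

# slide() shifts values row by row within groups, so sort by group and time first
panel <- panel[order(panel$country, panel$year), ]

# One-year lag of gdppc computed separately for each country, so that
# Kenya's 2002 value never ends up in Uganda's 2000 row
panel <- slide(panel, Var = "gdppc", GroupVar = "country",
               NewVar = "gdppc_lag1", slideBy = -1)
panel
```

The first year of each country gets `NA`, which is exactly what you want: there is no earlier observation to lag from.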

UCDP’s conflict data are among the most frequently used conflict datasets in the world. Although they are readily available for download on UCDP’s website, UCDPtools lets you load them directly into R data frames. The package provides access not only to UCDP’s “classic” Armed Conflict Worldwide data but also to some of the more specialized datasets, such as the GED, External Support, or One-Sided Violence datasets. Note that some of the datasets require Perl to be installed. For an installation guide, see here. (Author: Thomas Scherer)

texreg is certainly useful beyond conflict research, but I include it here because it provides one of the easiest ways to get your results out of R and into a paper (something I rarely see taught in stats or R courses). The package offers screen, HTML or LaTeX output; the HTML output is especially convenient if you want to include clean regression tables in a Word document (simply change the .htm(l) suffix of the output file to .doc, or copy and paste from your browser). The package also offers the convenient plotreg function, which produces neat coefficient plots (much in the spirit of this piece). Check the excellent package vignette for a comprehensive walk-through. (Author: Philip Leifeld)
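A sketch of the workflow described above, using texreg’s `screenreg()` and `htmlreg()` on two logit models. The data are simulated and the variable names (`onset`, `log_gdppc`, `polity`) are hypothetical stand-ins for a real conflict dataset. Writing the HTML straight to a `.doc` file is the Word trick mentioned in the text.

```r
library(texreg)

set.seed(42)
# Simulated, hypothetical country-year data: conflict onset and two covariates
n <- 500
d <- data.frame(log_gdppc = rnorm(n, mean = 8, sd = 1),
                polity    = sample(-10:10, n, replace = TRUE))
d$onset <- rbinom(n, 1, plogis(2 - 0.5 * d$log_gdppc))

m1 <- glm(onset ~ log_gdppc, family = binomial, data = d)
m2 <- glm(onset ~ log_gdppc + polity, family = binomial, data = d)

# Side-by-side regression table in the console
screenreg(list(m1, m2))

# Same table as HTML, saved with a .doc suffix so Word can open it
out <- file.path(tempdir(), "models.doc")
htmlreg(list(m1, m2), file = out)

# Coefficient plot; depending on the texreg version this may also need ggplot2
# plotreg(m2)
```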

A small and simple package, dummies makes creating dummy variables from factor variables easy. Gone are the days of multiple ifelse() calls; simply call dummy(variable) and move on. Nice. (Author: Christopher Brown)
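For comparison, base R can produce the same kind of 0/1 columns with `model.matrix()`, without any extra package. The sketch below is plain base R, not the dummies API, and the `region` factor is a made-up example. Keep in mind that when all levels go into a regression with an intercept, one of them has to be dropped as the reference category.

```r
# Hypothetical factor variable
d <- data.frame(region = factor(c("Africa", "Asia", "Europe", "Africa")))

# "- 1" removes the intercept so that every level gets its own 0/1 column
dummy_mat <- model.matrix(~ region - 1, data = d)

# Columns are named after the variable plus the level, e.g. "regionAfrica"
d <- cbind(d, dummy_mat)
d
```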

These are the packages that come to mind. I’ve deliberately excluded widely known, general-purpose packages such as ggplot2, reshape, dplyr and data.table, since this post focuses on packages that are particularly useful for (macro-)conflict research and, arguably, other types of macro-comparative research in R, although some of them are clearly useful beyond this type of research (e.g. texreg or dummies). Again, if you have any additions to the list, please let me know in the comments and I’ll include them.

— Update 4. Sep 2014 —

I’ve received some suggestions for additional packages over Twitter. I haven’t used the packages myself, but they look extremely useful, too. So here they are.

Quantitative conflict research increasingly takes into account the geography of conflict, both within and between states. cshapes gives you the basic tools to calculate interesting geographical measures, such as the distance between two countries, the length of common borders, or the distance of a country’s capital to its geographical center. Of course, it also comes with plotting functionality to produce maps. Here is the article introducing the package, with examples. (Authors: Nils Weidmann & Kristian Skrede Gleditsch)
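A minimal sketch of one such measure, assuming cshapes’ `distmatrix()` function with a date and a `type` argument (`"capdist"` for capital-to-capital distances, `"mindist"` for the minimum distance between borders, `"centdist"` for distances between centroids). Defaults such as the country-code system (COW vs. Gleditsch-Ward) have changed between package versions, so check the documentation of the version you have installed.

```r
library(cshapes)

# Capital-to-capital distances (in kilometres) between all countries
# as of 1 January 2002; rows and columns are labelled with country codes
capdist <- distmatrix(as.Date("2002-01-01"), type = "capdist")
dim(capdist)

# Minimum border-to-border distances use type = "mindist",
# which takes considerably longer to compute
```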

This one, MSBVAR, is a bit beyond my mathematical pay grade, but it seems useful for a variety of tasks (certainly beyond conflict research), including forecasting, which is an important part of quantitative conflict studies these days. Here’s the package description:

Provides methods for estimating frequentist and Bayesian Vector Autoregression (VAR) models and Markov-switching Bayesian VAR (MSBVAR). Functions for reduced form and structural VAR models are also available. Includes methods for the generating posterior inferences for these models, forecasts, impulse responses (using likelihood-based error bands), and forecast error decompositions. Also includes utility functions for plotting forecasts and impulse responses, and generating draws from Wishart and singular multivariate normal densities. Current version includes functionality to build and evaluate models with Markov switching.

Conflict research frequently deals with binary political outcomes, the most prominent being the outbreak of violent conflict. Quantitative conflict scholars therefore often build statistical models to explain and predict conflict onset from a set of independent variables. However, assessing the overall quality of such a model (its “fit”) is often not straightforward. The separationplot package assists with this task and implements a method presented in this AJPS article to visually assess and communicate model fit, based on the model’s ability to correctly predict the outcome under investigation. (Authors: Brian Greenhill, Michael D. Ward & Audrey Sacks)
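A sketch of the basic call, assuming `separationplot()` takes a vector of predicted probabilities (`pred`) and the observed 0/1 outcomes (`actual`). The data and the logit model are simulated and purely illustrative.

```r
library(separationplot)

set.seed(1)
# Simulated, hypothetical data: a binary "onset" outcome and one predictor
n <- 300
x <- rnorm(n)
onset <- rbinom(n, 1, plogis(-1.5 + 1.2 * x))

fit <- glm(onset ~ x, family = binomial)
p_hat <- as.numeric(fitted(fit))  # predicted probabilities

# Each observation becomes a vertical strip, sorted by predicted probability
# and coloured by the observed outcome; a well-fitting model pushes the
# event strips towards the right-hand end of the plot
separationplot(pred = p_hat, actual = onset)
```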

Finally, don’t forget to give credit to the package authors when you use their packages in your research. So, cite them! The necessary info can be obtained through

citation("packagename")
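If you use several packages, a small loop saves some typing. This sketch relies only on `citation()` and `toBibtex()` from R’s utils package. The package vector is just an example, and the resulting .bib file can go straight into a LaTeX or reference-manager workflow.

```r
# Packages whose citations you want to collect (adjust to what you actually used)
pkgs <- c("countrycode", "WDI", "texreg")

# Skip anything that is not installed, since citation() errors for missing packages
installed <- pkgs[nzchar(vapply(pkgs, function(p) system.file(package = p),
                                character(1)))]

bib <- character(0)
for (p in installed) {
  # toBibtex() turns a citation object into BibTeX lines; "" separates entries
  bib <- c(bib, as.character(toBibtex(citation(p))), "")
}

bibfile <- file.path(tempdir(), "r-packages.bib")
writeLines(bib, bibfile)
```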