One of the most frustrating aspects of data journalism is that, even when data is publicly available and easy to download, it is often in a format that’s nearly impossible to work with. The congressional election results that the Federal Election Commission maintains are a perfect example. The vote totals for every candidate in every race are tossed into one large, poorly documented Excel file. Some candidates appear multiple times under different parties, and the parties themselves are listed on a separate sheet.

It’s easy enough for a human to look at this file and, after much squinting, figure out how the data is structured. But any large-scale analysis needs data that is readable by both people and machines. To get the data I needed for the feature on Libertarian “spoiler” candidates, I wrote a Node.js script that extracts the data and organizes it in a coherent way.
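
To give a sense of what "organizing" can mean here, the following is a minimal sketch of one cleanup step: collapsing a candidate who appears on several rows, once per party line, into a single record. It is not the actual script. The rows are invented sample data, the field names (`state`, `district`, `name`, `party`, `votes`) and the `combineCandidates` helper are hypothetical, and adding the votes from each party line together is an assumption about how repeated rows should be treated.

```javascript
// Invented sample rows, shaped the way they might look once read out of
// the spreadsheet. Field names are assumptions, not the FEC's column names.
const rows = [
  { state: "XX", district: "01", name: "Candidate A", party: "Party 1", votes: 1000 },
  { state: "XX", district: "01", name: "Candidate A", party: "Party 2", votes: 50 },
  { state: "XX", district: "01", name: "Candidate B", party: "Party 3", votes: 900 },
  { state: "XX", district: "02", name: "Candidate C", party: "Party 1", votes: 700 },
];

// Hypothetical helper: merge rows that refer to the same candidate in the
// same race, collecting party labels and summing votes (an assumption).
function combineCandidates(rows) {
  const byCandidate = new Map();
  for (const row of rows) {
    // A candidate is identified by race (state + district) and name.
    const key = [row.state, row.district, row.name].join("|");
    let entry = byCandidate.get(key);
    if (!entry) {
      entry = {
        state: row.state,
        district: row.district,
        name: row.name,
        parties: [],
        votes: 0,
      };
      byCandidate.set(key, entry);
    }
    entry.parties.push(row.party);
    entry.votes += row.votes;
  }
  // A Map preserves insertion order, so output follows first appearance.
  return [...byCandidate.values()];
}

const candidates = combineCandidates(rows);
console.log(JSON.stringify(candidates, null, 2));
```

The result is one object per candidate per race, which is easy for a person to skim as JSON and just as easy for a later analysis step to loop over.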