Unfortunately, particularly in the US, lots of the raw data about politics comes on digital paper. For example, this PDF has 25 pages of detailed Missouri primary election results.

Enrique uses the data to build models in the statistics software R, but first he needs it in structured tables that R can load. He uses PDF Tables to convert the PDF files into Excel. The output looks like this:

Enrique tried various other conversion tools, such as Adobe’s, but found the quality wasn’t high enough. In particular, cells were merged across columns, and data ended up in the wrong place. He had to spend too much time cleaning up the output, and often even that wouldn’t work.
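To give a feel for the kind of clean-up involved, here is a minimal sketch in Python of one simple check: flagging rows whose number of filled cells doesn't match the header, which is a common symptom of merged cells. It is an illustration only, not Enrique's actual process. The function name `find_suspect_rows` and the sample rows are made up, and the check assumes every column should be filled in every row.

```python
def find_suspect_rows(rows):
    """Return (row_number, row) pairs whose filled-cell count differs from the header's.

    Assumes the first row is the header and that every column is normally
    filled in, so a short row suggests two cells were merged into one.
    """
    header, *body = rows
    expected = len(header)
    suspects = []
    for number, row in enumerate(body, start=2):  # row 1 is the header
        filled = [cell for cell in row if cell.strip()]
        if len(filled) != expected:
            suspects.append((number, row))
    return suspects


# Made-up rows, not real election results.
table = [
    ["Precinct", "Candidate", "Votes"],
    ["Ward 1", "Smith", "120"],
    ["Ward 2", "Jones 95", ""],  # candidate and votes merged into one cell
    ["Ward 3", "Smith", "88"],
]

for number, row in find_suspect_rows(table):
    print(f"Check row {number}: {row}")
```

A check like this only points at rows to look at by hand. It can't repair them, which is why the quality of the conversion itself matters so much.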

Enrique’s models calculate, precinct by precinct, which voters to target. There are some people who will vote for you whatever happens, and others who never will. Where are the voters in between, along the chaotic edge? You need to learn as much as you can about the motives of those voters.
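As a rough illustration of what finding the voters "in between" can mean in practice, the sketch below ranks precincts by how close the two-way result was. It is a deliberately simplified stand-in, not Enrique's model. The `Precinct` type, the 10% threshold, and the figures are all assumptions made for the example.

```python
from typing import NamedTuple


class Precinct(NamedTuple):
    name: str
    votes_for: int      # votes for our candidate
    votes_against: int  # votes for the main opponent


def margin(p: Precinct) -> float:
    """Signed margin as a share of the two-way vote, from -1.0 to 1.0."""
    total = p.votes_for + p.votes_against
    if total == 0:
        return 0.0
    return (p.votes_for - p.votes_against) / total


def competitive_precincts(precincts, threshold=0.10):
    """Return precincts with votes cast and a margin within the threshold, closest first."""
    close = [
        p for p in precincts
        if p.votes_for + p.votes_against > 0 and abs(margin(p)) <= threshold
    ]
    return sorted(close, key=lambda p: abs(margin(p)))


# Made-up figures, not real election results.
sample = [
    Precinct("A", 620, 380),  # comfortable win
    Precinct("B", 510, 490),  # near tie
    Precinct("C", 300, 700),  # comfortable loss
    Precinct("D", 470, 530),  # narrow loss
]

for p in competitive_precincts(sample):
    print(p.name, round(margin(p), 3))
```

With these numbers, only precincts B and D fall inside the threshold. A real model would use far more than one election's margin, but the idea of sorting precincts by how contested they are is the same.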

Because there is a lot of open data, and a culture of data analysis in campaigns, the US is a very active market for 7-50 Electoral Math. Being based in Spain, Enrique works a lot there too. There are fewer PDFs in Spain and more traditional web scraping, such as these Catalonia Parliamentary election results.

Enrique often has to go to separate sites for each region, and then into separate pages for each year.

The system in Spain is very secretive. There’s not as much detailed data available, so instead Enrique has to reach conclusions by approximations, and make projections from the data there is.

Israel, in contrast, is a “paradise for elections” according to Enrique. With 120 seats in the Knesset, politicians have constantly shifting alliances. They jump jobs a lot, making it a fun place to analyse.

PDF Tables has lots of customers getting political data from PDFs. One day, the world will work out a popular data interchange method. For now we’re glad Enrique is at least sleeping a bit better!

Got a PDF you want to get data from?
Try our easy web interface over at PDFTables.com!