
PDFs were invented at around the same time as the web. As “digital paper”, they’re trustworthy and don’t change behind your back.

That reliability has a downside: the definitive source of published data is often a PDF, and it’s hard to get tens of thousands of numbers out of one and into a spreadsheet or database. Copying and pasting is too slow, and popular conversion tools munge columns together.

At ScraperWiki, we’ve been helping people get data back out of PDFs for nearly five years.

In that time we’ve developed an Artificial Intelligence algorithm. Just like your eyes, it can see the spacing between columns, picking out the structure of a table from its shape.
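
To give a feel for the idea of finding columns by their spacing, here is a minimal Python sketch. It is not ScraperWiki’s algorithm, just a simple stand-in technique: it assumes the page text has already been laid out as fixed-width lines, and it treats any run of character positions that is blank in every row as a gutter between columns. The function names (`find_column_spans`, `split_rows`), the `min_gap` parameter and the sample rows are all made up for illustration.

```python
def find_column_spans(rows, min_gap=2):
    """Return (start, end) character spans of columns, end exclusive.

    A column boundary is a run of at least `min_gap` positions that
    are blank in every row (an illustrative heuristic, nothing more).
    """
    width = max((len(r) for r in rows), default=0)
    padded = [r.ljust(width) for r in rows]
    # A position is occupied if any row has a non-space character there.
    occupied = [any(r[i] != " " for r in padded) for i in range(width)]

    spans = []
    start = None  # where the current column began, if we are inside one
    gap = 0       # length of the current run of blank positions
    for i, filled in enumerate(occupied):
        if filled:
            if start is None:
                start = i
            gap = 0
        elif start is not None:
            gap += 1
            if gap >= min_gap:
                # The column ended just before this blank run started.
                spans.append((start, i - gap + 1))
                start = None
                gap = 0
    if start is not None:
        spans.append((start, width - gap))
    return spans


def split_rows(rows, min_gap=2):
    """Cut every row at the detected column spans and trim each cell."""
    spans = find_column_spans(rows, min_gap)
    return [[row[a:b].strip() for a, b in spans] for row in rows]


if __name__ == "__main__":
    # Made-up sample table, aligned with spaces as it might appear on a page.
    sample = [
        "Name         Qty    Price",
        "Red widget   12     3.50",
        "Bolt         400    0.08",
    ]
    for cells in split_rows(sample):
        print(cells)
```

Requiring a gutter of at least two blank positions keeps a single space inside a cell, such as the one in “Red widget”, from being mistaken for a column break. Real PDFs are messier than this: text comes with page coordinates rather than neat character grids, and cells can span columns, so treat this as a toy version of the idea.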