
Monthly Archives: December 2012

A couple of months ago, I was shown Horse Death Watch (http://www.horsedeathwatch.com/), a website on which the deaths of race horses are catalogued, and taught a little about how to scrape information from a regularly updated table (of the sort found on that site), using only a Google spreadsheet.

Google spreadsheets are invaluable. Fact.

1. Create a new Google spreadsheet. Type =importhtml into a cell on your spreadsheet. This start to your formula needs to be followed, in brackets, by three pieces of information telling the spreadsheet exactly what to extract, using the following template: ("url", "query", index)

“url” – the address of the page you want to scrape, including the http:// at the start.

“query” – this can be either “table” or “list”. In this case, horsedeathwatch have thoughtfully collated the information in a table rather than a list, so we’re plumping for “table”.

“index” – this is simply the position of the table or list on the page, counting from 1, and may require a little trial and error. So if your table is the second on the page, enter 2. In this case, we will enter 1.
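If it helps to see what the three parts of the formula are doing, here is a rough Python sketch of the same idea: read a page, collect every table on it, and hand back the one at the position you asked for. This is only an illustration, not how Google actually implements importhtml. The names `TableCollector`, `import_html_table` and `SAMPLE_PAGE` are made up for this example, the sample page and its contents are invented, and the sketch ignores tables nested inside other tables.

```python
from html.parser import HTMLParser


class TableCollector(HTMLParser):
    """Collect every <table> on a page as a list of rows of cell text."""

    def __init__(self):
        super().__init__()
        self.tables = []   # one entry per <table>, in page order
        self._row = None   # cells of the row currently being read
        self._cell = None  # text fragments of the cell currently being read

    def handle_starttag(self, tag, attrs):
        if tag == "table":
            self.tables.append([])
        elif tag == "tr" and self.tables:
            self._row = []
        elif tag in ("td", "th") and self._row is not None:
            self._cell = []

    def handle_data(self, data):
        if self._cell is not None:
            self._cell.append(data)

    def handle_endtag(self, tag):
        if tag in ("td", "th") and self._cell is not None:
            self._row.append("".join(self._cell).strip())
            self._cell = None
        elif tag == "tr" and self._row is not None:
            self.tables[-1].append(self._row)
            self._row = None
            self._cell = None


def import_html_table(html, index):
    """Return the table at position `index`, counting from 1 like the spreadsheet."""
    collector = TableCollector()
    collector.feed(html)
    return collector.tables[index - 1]


# Invented sample page: a navigation table first, then the table we want.
SAMPLE_PAGE = """
<table><tr><th>Menu</th></tr><tr><td>Home</td></tr></table>
<table>
  <tr><th>Horse</th><th>Date</th></tr>
  <tr><td>Example Horse</td><td>01/01/2012</td></tr>
</table>
"""

second_table = import_html_table(SAMPLE_PAGE, 2)
print(second_table)
```

The trial and error mentioned above comes from exactly this: layout elements such as a navigation menu can themselves be tables, so the table you can see on the page is not always number 1.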