Why should journalists care about URL patterns?

The other day I was asked to extract data from the IEC website for an article on the 2016 Local Government Elections. Unfortunately, there was no full report available at the time. One option I had (and it could hardly have been less attractive) was to download the documents one by one, for each municipality or each voting district.

Of course, that was not a realistic option. The page takes several seconds to reload every time you select an option from the menu, so the whole process can waste hours just to get access to the data, only to find out that you still need to clean it. I decided to write this post because this is not the first time I have needed to do this, and you might find it valuable. So, how can we do this quickly?

When you click on “download”, a little screen pops up. In this screenshot you can see the URL, which is made up of several parts. If you download two different files and compare their URLs, you will begin to see the pattern: some parts stay the same and others change from file to file.

Take a closer look: the URL is a source of information and you should always read it. In this case, it shows us the parts we need to construct the URLs for downloading the complete data.

Once we have seen how the URL is composed, we can reproduce it and download the data easily.

Build the pattern:

In a Google spreadsheet, create four columns: the first part of the URL, the province code, the municipality code and the end of the URL.

(NB: Don’t forget the “/” at the end)

Concatenate the information.

Using the “concatenate” function, you can merge the four columns into one complete URL per row.
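For readers who prefer a script to a spreadsheet, the same concatenation can be sketched in a few lines of Python. This is only an illustration: the base URL, the codes and the ending below are made-up placeholders, not the real values from the IEC site, and you would need to copy the real parts from the download links you observed.

```python
# All values below are hypothetical placeholders. Replace them with the
# parts you identified in the real download URLs.
BASE = "https://example.org/results/"    # first part of the URL, ending in "/"
SUFFIX = ".csv"                          # end of the URL
CODES = [                                # (province code, municipality code)
    ("P1/", "M1"),
    ("P1/", "M2"),
    ("P2/", "M3"),
]


def build_urls(base, codes, suffix):
    """Join the four parts in order, like CONCATENATE does in a spreadsheet."""
    return [base + province + municipality + suffix
            for province, municipality in codes]


if __name__ == "__main__":
    for url in build_urls(BASE, CODES, SUFFIX):
        print(url)
```

Nothing is added between the parts, so any separator such as “/” has to be included in the parts themselves, exactly as in the spreadsheet columns.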

Paste the URLs into a text file.

Now that the URLs are ready, copy them and paste them into a text editor, one URL per line. Make sure you save the file as plain text (“.txt”).
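If the URLs were generated by a script rather than a spreadsheet, the text file can be written directly. The sketch below assumes a list of URLs already exists; the function name and the file name are hypothetical choices for this example.

```python
from pathlib import Path


def save_urls(urls, path):
    """Write one URL per line to a plain-text file and return its path."""
    target = Path(path)
    target.write_text("\n".join(urls) + "\n", encoding="utf-8")
    return target


if __name__ == "__main__":
    # Placeholder URLs, for illustration only.
    example = [
        "https://example.org/results/P1/M1.csv",
        "https://example.org/results/P1/M2.csv",
    ]
    print(save_urls(example, "urls.txt"))
```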

Use the Firefox add-on “DownThemAll”.

You will need to install an add-on called DownThemAll in Firefox. Once you have downloaded and installed it, go to “Tools”, then the DownThemAll menu, then “Manager”.

Right-click on the empty space, click “Advanced”, open the text file you saved and click “Start”.
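As one possible alternative to a browser add-on, the list of URLs can also be downloaded with a short Python script that uses only the standard library. This is a rough sketch, not a tested replacement for the steps above: the function name, the file and folder names and the one-second pause between requests are all assumptions made for this example.

```python
import shutil
import time
from pathlib import Path
from urllib.parse import urlsplit
from urllib.request import urlopen


def download_all(list_file, out_dir, delay=1.0):
    """Download every URL listed in list_file (one per line) into out_dir."""
    out = Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)
    saved = []
    for line in Path(list_file).read_text(encoding="utf-8").splitlines():
        url = line.strip()
        if not url:
            continue  # skip blank lines
        # Use the last part of the URL path as the local file name.
        name = Path(urlsplit(url).path).name or f"file_{len(saved)}"
        target = out / name
        with urlopen(url) as response, open(target, "wb") as handle:
            shutil.copyfileobj(response, handle)
        saved.append(target)
        time.sleep(delay)  # pause so the server is not flooded with requests
    return saved


if __name__ == "__main__":
    # "urls.txt" and "downloads" are placeholder names.
    for path in download_all("urls.txt", "downloads"):
        print("saved", path)
```

Note that if two URLs end in the same file name, the second download overwrites the first, so check the names before running it on a long list.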

Do you know of another way to do it? Please share it! And don’t forget, our third Data-Driven Journalism Academy is about to start. If you want to learn a bit more about data journalism, don’t hesitate to take a look here to find out how you can apply.

About The Author

Daniela Q. Lépiz is a Costa Rican data journalist. She has a master's degree in data journalism from the Rey Juan Carlos University in Madrid, Spain. She is currently involved with Code for South Africa, where she is Data Editor in the organisation's Data Journalism Academy (Africa's first data journalism school, which offers training in data-driven storytelling to working journalists). She produces data-driven analysis and articles for different publications. She previously worked for Central American publications (print and digital), including La Nación in Costa Rica, one of the first newspapers in Latin America to establish a data unit.

About the Data Journalism Academy

The Data Journalism Academy is an initiative of Code for South Africa’s data literacy programme, which aims to equip participants with the skills and tools they need to access and explore public data.