My Blog about Data

Month: July 2015

I found an interesting data source on FigShare showing how much University College London paid for the Article Processing Charges. According to the description of this data set it should include items between 1st April 2013 and 31st July 2014. However, there are some entries that are dated earlier and later than that. I guess that only the previously mentioned period is complete. I included all data anyway and then I added a linear regression line on top of the plot showing total cost by date. Judging by that, it seems like the costs keep on increasing. However, it is just a short period of time and I don’t know whether this trend held for a longer period of time.
Large journals (e.g. Nature Communications, PLOS One, and Lancet) stand out as taking more APCs than others. The vast majority of articles in this data set has been published with a CC BY licence.
Feel free to further explore the plots. They are all interactive.

Lately I came across a data set published by Research Councils UK (RCUK) about the diversity of grant applicants. The document didn’t include any plots, so I decided to fix that. Unfortunately, the report was only available in a pdf format which is not optimal for acquiring data.
Luckily, I came across Tabula which is an open source project that allows extracting data from tables in pdf files. With the help of Tabula I extracted data from the document to create plots. Formatting of the original document required too much data wrangling, so I only used data from the first table.
This data is showing an estimate of academic populations applying to the Research Councils.

Times Higher Education published a 2015 pay survey in April, but only now I found time to create interactive visualisations showing their data.

I had to edit the table to make it usable in Tableau. I removed the ‘All’ column from the original data source. This column most likely represented weighted means. I couldn’t find any information about how it was calculated, so I only used ‘Female’ and ‘Male’ values. Combined (‘All’) values used in my workbooks are means of these two fields.

I used Edubase data to find additional information about universities, e.g. region or location (available only for HE institutions in England).

Department for Education regularly publishes the list of educational establishments in England and Wales. I was interested in seeing which areas have the largest number of schools with Speech Special Educational Needs Coordinators, and what are the characteristics of those schools. In order to find that out, I created a visualisation in Tableau.

The first tab (‘Map’) is showing the map of educational establishments with or without Speech SENCOs, type of establishment, number of establishments, and total number of pupils.
The second tab (‘Pupils by Region and FSM’) shows a fine-grained description of pupils in different types of educational establishments.