Tagged with data …

Recently, I was asked to give talks at both UCL’s CASA and the ETH Future Cities Lab in Singapore for students and staff new to ‘urban data science’ and the sorts of workflows involved in collecting, processing, analysing, and reporting on urban geo-data. Developing the talk proved to be a rather enjoyable opportunity to reflect on more than a decade in commercial data mining and academic research – not only did I realise how far I had come, I realised how far the domain had come in that time.

In some circles (e.g. mine) news that the government is trying (again) to sell off the Land Registry has caused something of a stir. The curtain closed on the first act of this drama in March 2014, by which time 91% of respondents to the consultation opposed the Land Registry’s transition to a service delivery company. Apparently, it wasn’t the overwhelming opposition from, well, everyone that scuppered the deal, it was Vince Cable.

In my previous post I looked at some of the issues affecting the extent to which ‘big data’ gives a reliable picture of the world around us. In this post I want to take you through one of the least sexy—but most important—parts: the data itself. My point, again, is not to suggest that big data is fatally flawed, but to call into question some of the easy assumptions upon which we rely when working with this type of data, and the universality of the conclusions that we can draw from this type of research.

Note: this was previously posted at simulacra.info, but I am in the process of (re)organising my technical notes and tutorials.

A bit of a dry post here, but I thought I’d share my experience of trying to get two instances of MySQL (and two different versions, to boot) running simultaneously on a single piece of hardware, as I’ve spent the past two days tearing my hair out and swearing profusely (mostly) under my breath.