The Digital Divide of OpenStreetMap

28 December, 2009

In my previous analysis of OpenStreetMap (OSM) data, I compared it to the Index of Deprivation, as a way to understand if there is any socio-economic spatial pattern in the coverage of OSM. Following numerous interactions with various parts of the OSM community, I had suspected that there might be a bias, with the result that affluent areas might be mapped more completely than deprived areas. I explored this systematically, as only empirical analysis could provide evidence one way or another.

OSM completeness coverage compared to Index of Deprivation 2007

Here are the details of the analytical process that was used.

The core data that was used for the comparison is the UK government’s Index of Multiple Deprivation 2007 (IMD 2007), which is calculated from a combination of governmental datasets and provides a score for each Lower Layer Super Output Area (LSOA) in England. The position of each LSOA was used to calculate the percentile position within the IMD 2007. Each percentile point includes about 325 LSOAs. Areas that are in the bottom percentile are the most deprived, while those at the 99th percentile are the most affluent places in England according to the index.
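As a rough illustration of the percentile step, here is a minimal Python sketch. It assumes that rank 1 is the most deprived LSOA and that England has 32,482 LSOAs, which is consistent with "about 325 per percentile" but is not stated in the post. The function name `imd_percentile` is hypothetical.

```python
# Hypothetical helper: map an IMD 2007 rank to a percentile band (0-99).
# Assumptions (not from the post): rank 1 = most deprived, 32,482 LSOAs in total.
TOTAL_LSOAS = 32_482


def imd_percentile(rank: int, total: int = TOTAL_LSOAS) -> int:
    """Return the percentile band for a rank, 0 = most deprived, 99 = most affluent."""
    if not 1 <= rank <= total:
        raise ValueError(f"rank must be between 1 and {total}")
    # Integer arithmetic keeps the band boundaries exact.
    return (rank - 1) * 100 // total
```

With these assumptions, the first 325 ranks fall in band 0 and the last rank falls in band 99.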

Following the same methodology that was used to evaluate completeness, the road datasets from OSM and from the Ordnance Survey’s Meridian 2 were clipped to each of the LSOAs, and then the total length of the two datasets was compared. Because the size of LSOAs varies, it is more meaningful to compare percentage completeness and not the absolute length.
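The comparison itself reduces to a ratio of two lengths per LSOA. The sketch below assumes the clipping has already been done in a GIS and that the two total lengths are available in the same unit; the function and parameter names are illustrative only, not the ones used in the actual analysis.

```python
# Hypothetical helper: percentage completeness of OSM roads against Meridian 2
# for a single LSOA. Both lengths are assumed to be in the same unit (e.g. metres)
# and already clipped to the LSOA boundary.
def completeness_pct(osm_length: float, meridian_length: float) -> float:
    if meridian_length <= 0:
        raise ValueError("Meridian 2 length must be positive to compute a ratio")
    # Values above 100 are possible when OSM holds more road length than Meridian 2.
    return 100.0 * osm_length / meridian_length
```

Because the result is a ratio, a small urban LSOA and a large one can be compared directly, which is the reason for preferring percentages over absolute lengths.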

The analysis of data from March 2008 showed a clear difference between the LSOAs at the bottom of the scale and those at the top. While the LSOAs at the bottom were not neglected, the level of coverage was far lower, even when taking into account the variability in LSOA areas. I wanted to explore whether the situation has changed since then and undertook further analysis using the same methodology.

Has the situation changed during the 19 months from March 2008 to October 2009?

The graph above shows that things have changed, but not for the better. The graph shows the level of completeness for each group of LSOAs. To avoid confusion with rural areas, where the size of the LSOA becomes very large, only LSOAs whose area is within one standard deviation of the mean area are included. The effect of this is that the graph shows the results for mostly urban LSOAs.
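A minimal sketch of the filtering and grouping step follows. It reads "within a standard deviation of area size" as "within one standard deviation of the mean area" and uses the population standard deviation; both are assumptions. The record keys (`area`, `percentile`, `completeness`) are hypothetical.

```python
from collections import defaultdict
from statistics import mean, pstdev


def filter_by_area(records):
    """Keep records whose area lies within one standard deviation of the mean area."""
    areas = [r["area"] for r in records]
    mu, sigma = mean(areas), pstdev(areas)
    return [r for r in records if abs(r["area"] - mu) <= sigma]


def mean_completeness_by_percentile(records):
    """Average the completeness values of the records in each percentile band."""
    groups = defaultdict(list)
    for r in records:
        groups[r["percentile"]].append(r["completeness"])
    return {band: mean(values) for band, values in sorted(groups.items())}
```

Very large (mostly rural) LSOAs inflate the mean area, so they are the ones that fall outside the band and drop out of the graph.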

I compared three datasets: March 2008, March 2009 and October 2009. A rather alarming trend is visible. Instead of shrinking, the gap between affluent and deprived LSOAs is growing. The average completeness of the bottom percentile was 40.7% in March 2008, grew to 65.7% a year later and to 71.8% by October 2009. For the most affluent percentile, completeness grew from 67.5% in March 2008 to 97.0% a year later and to 108.9% by October 2009. In other words, the gap between the top and the bottom has grown from 26.8 to 37.1 percentage points within the analysis period.
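The gap can be checked directly from the rounded figures quoted above. The short Python sketch below keys the values by the three dataset dates (March 2008, March 2009, October 2009); the variable names are illustrative, and gaps computed from the underlying unrounded data may differ slightly.

```python
# Average completeness (%) as quoted in the text, keyed by dataset date.
bottom = {"2008-03": 40.7, "2009-03": 65.7, "2009-10": 71.8}   # most deprived percentile
top = {"2008-03": 67.5, "2009-03": 97.0, "2009-10": 108.9}     # most affluent percentile

# Gap in percentage points, rounded to one decimal to match the quoted precision.
gaps = {date: round(top[date] - bottom[date], 1) for date in bottom}

for date, gap in gaps.items():
    print(f"{date}: gap of {gap} percentage points")
```

The gap widens at each step, which is the trend described above.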

Within the OpenStreetMap community, there are activities such as those led by Mikel Maron to map informal settlements in Kenya and to ensure coverage of other marginalised parts of the world (see the posts on his blog). From the work that we are doing in Mapping for Change, it is clear to me that mapping can be an excellent motivator to encourage people to use digital tools, and therefore adding data to OSM can work as a way to increase digital inclusion. So maybe OSM coverage in the UK can be increased with some support from the government, which has stated an aim of increasing digital inclusion?

If you would like to explore the data by yourself, here is a spreadsheet with the information, including the LSOA codes, the position in IMD 2004 and IMD 2007, and the coverage percentage for March 2008, March 2009 and October 2009. Please note the terms and conditions for its use – and let me know what you have done with it!

Yes, it is easy to map this – the data for LSOAs is freely downloadable from the UK ONS. However, to assist mappers, I’m going to release a version of the 1km grid map very soon, and I think that it will be easier to use. I hope to finish the preparation within the next week.

It should be very easy to link LSOA codes to their boundaries – the boundaries can be obtained from ONS free of charge – http://bit.ly/63bX97 . For obvious reasons (copyright), I can’t reproduce the shapefile and release it, but by releasing the spreadsheet, anyone who wants to map it can request the data and create a map. I would be very interested in seeing visualisations that are based on this analysis.

Thanks for this, Muki. It is good to be reminded of such neglect; however, I am actually rather surprised by how uniform the coverage is across the deprivation scale. I also note that the most deprived area in Oct09 is now better mapped than the least deprived was in Mar08. If one trims out the most and least deprived 5%, then that statement also appears to be true between the most deprived in Oct09 and the least deprived in Mar09.

I fully support your proposal that efforts should be made to even-out coverage across this index and I would be interested to hear if anyone in government would consider supporting such work. Possibly the purchase of aerial photography for target areas (£9 per sq km) and covering expenses for people attending mapping parties might be a good way to support such outreach.

Yes, I wonder what other factors could be involved. As for the increasing disparity, perhaps mapping in a particular geographic area does not grow linearly, but accelerates as it nears “completeness”.

And how many mappers are based in each locality? Were there events, mapping party or otherwise, that focused attention? Do these areas have more “public” facilities, businesses etc? Is there a higher rate of crime (or just perceived rate of crime)? The root cause of any disparity in these factors is very likely also economic, but interesting to examine how poverty and perceptions manifest.

I agree with the comments above – use this analysis to fix the problem! Map the underserved areas to make it clear where work needs to be done. Revisit the analysis on a regular schedule. Publicize the usernames of mappers responsible for advancing coverage. Make it into a game, in the style of http://noticin.gs!

You’ve got my intentions spot-on. That’s why I’ve released the data that is behind the analysis, and it should be easy to turn it into an app that directs people to places that are under-mapped. I’ll be happy to provide more details to anyone who wants to work on this problem.
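As a starting point for such an app, the spreadsheet rows could simply be sorted by coverage so that the least-mapped LSOAs come first. This is only a sketch: the keys `lsoa_code` and `completeness` are hypothetical stand-ins for the spreadsheet columns, and the function name is illustrative.

```python
# Hypothetical helper: pick the n least-mapped LSOAs from spreadsheet-like rows.
# Each row is assumed to be a dict with "lsoa_code" and "completeness" (in %).
def least_mapped(rows, n=10):
    ranked = sorted(rows, key=lambda row: row["completeness"])
    return [row["lsoa_code"] for row in ranked[:n]]
```

Joining the returned codes to the LSOA boundaries would then give the areas to highlight on a map.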

Also excellent to connect directly with people in these communities .. as you said Muki, OSM is an excellent way to introduce people to participation on the web, and facilitate a change in perception of place. Even if you just focus on making a complete map .. only residents of a place are going to be able to keep the map up to date.

Is there a particular place, perhaps in London, to focus on? Connect with the local council, community groups, schools?

So impressive! Your research is really getting at the core of OSM’s value. I would really love to get a sense of how OSM ranks relative to Google Maps — I feel intuitively certain that OSM has a better distribution of data.

And yes it would be thrilling to have a game-like interaction design to promote it further!

Thanks for the comment, and I do hope that the work is helpful to the OSM community. It will be indeed interesting to compare coverage to Google Maps and other sources, though not all data providers are as open as the Ordnance Survey to allow me to compare their data…

I have avoided rural areas in this analysis because, if they are included, there are two impacts to deal with – first, the population density impact (I will post about this soon) and second, the socio-economic impact. When the rural areas are included, they create a ‘hump’ in the middle of the scale in terms of their area (see slide 38 at http://www.slideshare.net/mukih/osm-quality-assessment-2008-presentation). Because the area is larger and the population is smaller, OSM completeness in rural LSOAs is lower than in urban areas.
So in order to focus on one factor (deprivation scores) and avoid mixing in the rural/urban issue, I have decided to present this analysis with ‘urban’ areas only.

Thanks, I will keep an eye out for your post. I think there would be less of a connection between deprivation and why people are less active in mapping those areas. Would population density affect the completeness of the data significantly when OSM coverage is attributed to a relatively small number of participants? Would be interesting to compare the profile of active participants too I guess.

On a very local level, when picking a part of a city to map, a mapper will tend to steer clear of the lower level super output area (otherwise known as the nasty concrete estate). Zoom out a little bit, and OSM completeness is clearly better in more affluent areas because this is where the internet surfers and gadget lovers live and work. In other words, OpenStreetMap is a map of where the geeks live. If we had a measure of internet usage by location, I think that would show an even clearer (though obviously similar) correlation than general deprivation measures.

There’s another pattern to this which will always throw off these correlations slightly. I imagine mapping completeness as little mapping bombs being dropped. Circular blotches of completeness of different sizes appear at semi-random locations, each one representing a new mapper going crazy and mapping their whole area. The best example of this is Hull. I believe this city is comparatively deprived, but Chris Hill lives there, and has dropped a mapping nuke on it!

Although it’s obviously sad that we are mapping deprived areas less well, it might also be interesting to consider areas where there is comparatively high internet usage but low OSM coverage. These are the areas where we should have an easy time promoting more mapping. The U.S. would glow bright on that heat map.