I’m not sure about the licensing requirements for my beautiful Qgis model (which shows roughly which areas of the UK are complete and which are incomplete), so I don’t think I can publish the full results from that yet. Essentially, though, I have found that there is a strong correlation between the length of road in a particular area and the population in that area. It’s good enough to accurately predict the length of road there should be in an area and compare that to the OSM road length. But enough about that project for now; hopefully I will publish all my findings soon.

So instead it’s onwards and upwards to a new project, which was inspired by the view of the Osmarender layer on OpenStreetMap shown below. It is clear that there are vast areas in Asia and South America where there is no OpenStreetMap data. The question is whether there is actually nothing there or whether OSM is just missing cities, roads etc. I plan to find out using open source aerial imagery.

The plan is to use LandSat and other forms of freely available imagery to work out where there should be cities and roads and where there shouldn’t be, then take this information and compare it to OSM. Easier said than done, I’m sure, but it wouldn’t be a project if it wasn’t challenging. Below is a LandSat image of London and the area to the north, to which I have applied a Yellow Contrast Gradient Map using GIMP. As you can see, it emphasises cities and rural areas quite well, and I’m sure this is the starting point for accurately predicting where there should be OSM data.

I’ve moved on to using the models I generated with Local Authority DfT and ONS statistics to predict which areas of the country are complete and incomplete. I used the model on Lower Layer Super Output Areas (LLSOAs), which have an average population of 1500 and variable areas. Using Qgis I wanted to create a ratio of OSM road length in an area divided by the length of road the model predicts. In this way areas that have a value of 1 are complete, while values under 1 show that there are not as many roads in OSM as I would predict and the area is therefore incomplete. Unfortunately there are a few areas with values larger than 1, indicating that the model is under-predicting the road length. After extracting OSM road length for every boundary using Qgis, I used OpenOffice spreadsheet to apply the model to every boundary. The problem came in re-importing the data to the shapefile for use in displaying heat maps in Qgis. The only way my colleague could find involved a serious amount of hacking and command line stuff, which I am not very fluent in. Luckily, with a bit of a search, I found this solution.

All of the attribute data for the shapefile (i.e. all the data apart from the coordinates) is contained in a dBASE (.dbf) database file. If you open this file as a database file in OpenOffice, i.e. by right-clicking on it and choosing to open it with OpenOffice Base, it will load up in OpenOffice spreadsheet in the correct file format. Select Unicode (UTF-8) as the character set in the pop-up window and you’re good to go. You can manipulate the data in whatever way you like; just be sure to save it under the same file name and in the same file format. Then when you import the shapefile to Qgis it will contain all of your new attributes and you can make some fancy new heat maps, as I intend to do.
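
For anyone who would rather script the ratio than build it in a spreadsheet, here is a minimal sketch of the OSM-to-predicted ratio described earlier in this post. The function names, the sample figures and the 5% tolerance band are illustrative assumptions of mine, not part of the Qgis and OpenOffice workflow above.

```python
def completeness_ratio(osm_length_km, predicted_length_km):
    """Return OSM road length divided by the length the model predicts."""
    if predicted_length_km <= 0:
        raise ValueError("predicted length must be positive")
    return osm_length_km / predicted_length_km


def classify(ratio, tolerance=0.05):
    """Label an area by its ratio; the tolerance band is an assumption."""
    if ratio < 1 - tolerance:
        return "incomplete"
    if ratio > 1 + tolerance:
        return "model under-predicts"
    return "complete"


# Hypothetical areas: (name, OSM road km, predicted road km)
areas = [("Area A", 12.0, 12.1), ("Area B", 4.5, 9.0), ("Area C", 15.0, 10.0)]
results = {
    name: classify(completeness_ratio(osm_km, predicted_km))
    for name, osm_km, predicted_km in areas
}
```

The ratio, or the label, could then be written back as a new attribute column for each boundary.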

Pretty soon I hope to have some nice image outputs from Qgis which will show which areas of the UK are complete and which are incomplete. Keep checking back for that and ask as many questions as you can muster; I’m always interested to hear from readers.

So I’ve been working with ONS and DfT stats for the past few days and come pretty close to insanity with the statistics before realising that the simplest models give the nicest results.

Have a look at these 3D graphs I’ve been working with. They show us nice and simply that the amount of road in a boundary depends on the land area of that boundary and on the population within it. Kinda straightforward and what you might expect, but it’s good to get some concrete results.
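
To make the idea concrete, here is a minimal sketch of fitting road length against land area and population by ordinary least squares. The linear form, the function names and the sample rows are assumptions for illustration only; they are not the actual model or the ONS and DfT figures.

```python
def fit_linear(rows):
    """Least-squares fit of length = b0 + b1*area + b2*population.

    rows is a list of (area, population, road_length) tuples.
    """
    # Build the normal equations (X^T X) b = X^T y
    xtx = [[0.0] * 3 for _ in range(3)]
    xty = [0.0] * 3
    for area, population, length in rows:
        x = (1.0, area, population)
        for i in range(3):
            xty[i] += x[i] * length
            for j in range(3):
                xtx[i][j] += x[i] * x[j]
    # Solve by Gauss-Jordan elimination with partial pivoting
    aug = [xtx[i] + [xty[i]] for i in range(3)]
    for col in range(3):
        pivot = max(range(col, 3), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        if abs(aug[col][col]) < 1e-12:
            raise ValueError("predictors are collinear; cannot fit")
        for r in range(3):
            if r != col:
                factor = aug[r][col] / aug[col][col]
                aug[r] = [a - factor * b for a, b in zip(aug[r], aug[col])]
    return [aug[i][3] / aug[i][i] for i in range(3)]


def predict(coeffs, area, population):
    """Predicted road length for one boundary."""
    b0, b1, b2 = coeffs
    return b0 + b1 * area + b2 * population


# Hypothetical boundaries: (area in km2, population, road length in km)
rows = [(10, 1000, 10.0), (20, 1500, 16.5), (5, 3000, 13.5), (40, 500, 23.5)]
coeffs = fit_linear(rows)
```

With real statistics the fit will not be exact, so the residuals are worth plotting before trusting any prediction.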

Now onwards and upwards: I plan to use these results to predict how many roads there should be within any boundary on OpenStreetMap and then compare that result to how many there actually are on OSM. If anyone has any simpler ideas on measuring completeness, then let me know!

These hypotheses below will hopefully help in measuring the completeness of OpenStreetMap. If anyone has any other ideas or comments please do let me know.

A complete map will have all of the roads in an area, but we cannot obtain stats on the amount of road in every area we may wish to test in OpenStreetMap. So an accurate way to predict the length of road in any area would be useful. My hypothesis is that the length of road in a given area is dependent upon population density. That is to say I expect that urban areas will have less road per person than rural areas.
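
A rough way to check this hypothesis is to compute road length per person and population density for each area and see whether they move in opposite directions. The sketch below does that with a Pearson correlation on log-transformed values; the sample areas are invented purely for illustration, and the choice of a log-log comparison is my assumption.

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


# Hypothetical areas: (name, population, area in km2, road length in km)
sample_areas = [
    ("dense urban", 8000, 1.0, 12.0),
    ("suburb", 3000, 2.0, 15.0),
    ("village", 500, 10.0, 20.0),
    ("remote", 100, 50.0, 30.0),
]

# Population density (people per km2) and road per person (metres)
log_density = [math.log10(pop / area) for _, pop, area, _ in sample_areas]
log_road_per_person = [
    math.log10(road_km * 1000 / pop) for _, pop, _, road_km in sample_areas
]

r = pearson(log_density, log_road_per_person)
```

A strongly negative r would support the idea that denser areas have less road per person.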

For areas where OSM has aerial imagery I would like to compare the complexity (ie file size) of the Yahoo! jpeg and the corresponding OSM tiles. The hypothesis is that areas with very small aerial jpeg files (because they are simply one colour like the sea or vast expanses of desert) will have few if any entries on OSM, whereas areas with large file sizes (cities) will have a large density of nodes and ways in OSM and therefore large tile size. I do not have the technical knowledge to test this so any help would be great.
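
As a starting point, the comparison could be sketched as below: given the file size of each imagery tile and of the matching OSM tile, flag the tiles where the imagery looks busy but the map looks nearly empty. The tile ids, byte counts and both thresholds are hypothetical and would need calibrating against real tiles.

```python
def flag_possible_gaps(tiles, imagery_min_bytes=20_000, osm_max_bytes=2_000):
    """Return tile ids whose imagery looks busy but whose OSM tile looks empty.

    tiles maps a tile id to (imagery_bytes, osm_tile_bytes). Both thresholds
    are illustrative guesses, not measured values.
    """
    return sorted(
        tile_id
        for tile_id, (imagery_bytes, osm_bytes) in tiles.items()
        if imagery_bytes >= imagery_min_bytes and osm_bytes <= osm_max_bytes
    )


# Hypothetical file sizes in bytes, keyed by a zoom/x/y tile id
tiles = {
    "12/2046/1361": (48_000, 35_000),  # city, well mapped
    "12/2500/1500": (3_000, 400),      # open sea, nothing to map
    "12/2900/1700": (41_000, 900),     # busy imagery, almost empty map
}
possible_gaps = flag_possible_gaps(tiles)
```

For tiles saved to disk, the sizes could be read with os.path.getsize before being passed in.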

Another hypothesis is that more complete areas of OSM will have a higher level of edit activity. If no-one has ever edited an area then it is unlikely that the map is complete there. Obviously, however, there may just be nothing there, so this test could be used in conjunction with the Yahoo! imagery test described above. If we could produce some sort of heat map showing which areas are edited most frequently, and monitor it over time, this could certainly show us some interesting trends.
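
The counting behind such a heat map can be sketched by binning edit locations into grid cells. The half-degree cell size and the sample edit coordinates below are assumptions for illustration; real edit locations would have to come from OSM's edit history.

```python
import math
from collections import Counter


def grid_cell(lat, lon, cell_deg=0.5):
    """Return the (row, column) index of the grid cell containing a point."""
    return (math.floor(lat / cell_deg), math.floor(lon / cell_deg))


def edit_counts(edits, cell_deg=0.5):
    """Count edits per grid cell; edits is an iterable of (lat, lon) pairs."""
    return Counter(grid_cell(lat, lon, cell_deg) for lat, lon in edits)


# Hypothetical edit locations: three near London, one on Madeira
edits = [(51.5, -0.1), (51.6, -0.2), (51.7, -0.3), (32.7, -16.9)]
counts = edit_counts(edits)
```

Running the same count on edits from successive months would give the trend over time.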

This is an attempt to solve the problem of missing roads. I would think it unlikely that there would be a road which is completely cut off from others, or that there would be an entire settlement of roads not connected to the rest of the country’s road network (as in the Madeira example shown below again). The hypothesis is that every road is connected to at least one other road of equal or higher classification. So if in OSM there are roads that are not, then maybe there are missing roads. This testing may require a lot of calculations and may not return that many missing roads. If someone can think of a way to do it simply then I would very much like that input.
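
One simple way to run this test, assuming each road is available as a classification plus a list of node ids, is to index roads by the nodes they touch and flag any road with no neighbour of equal or higher class. The ranking of classes and the sample ways below are my own assumptions for illustration.

```python
from collections import defaultdict

# Assumed ordering of road classes, lowest to highest
RANK = {
    "residential": 0,
    "tertiary": 1,
    "secondary": 2,
    "primary": 3,
    "trunk": 4,
    "motorway": 5,
}


def suspicious_ways(ways):
    """Return ids of ways that touch no way of equal or higher class.

    ways maps a way id to (classification, list of node ids).
    """
    # Index: which ways pass through each node
    node_to_ways = defaultdict(set)
    for way_id, (_, nodes) in ways.items():
        for node in nodes:
            node_to_ways[node].add(way_id)

    flagged = []
    for way_id, (road_class, nodes) in ways.items():
        neighbours = set()
        for node in nodes:
            neighbours |= node_to_ways[node]
        neighbours.discard(way_id)
        if not any(RANK[ways[n][0]] >= RANK[road_class] for n in neighbours):
            flagged.append(way_id)
    return sorted(flagged)


# Hypothetical ways: id -> (classification, node ids)
ways = {
    1: ("primary", [1, 2, 3]),
    2: ("primary", [3, 4]),
    3: ("residential", [2, 5]),
    4: ("residential", [10, 11]),  # isolated
    5: ("secondary", [5, 6]),      # only touches a residential road
}
flagged = suspicious_ways(ways)
```

Because each node is looked up only once per way, the work grows with the number of nodes rather than with every pair of roads, which keeps the calculation manageable.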

First we must think about what it is important for a map to have; a complete map will obviously contain all of those important features. The features necessary may differ from user to user, though: tourists may be interested in the location of landmarks, whereas those travelling into work every day need accurate road maps with all turn restrictions, road names etc. In general I think named, accurate roads are the most important feature, so a lot of my analysis will be to do with the length of roads present in OpenStreetMap. That is not to say that POIs such as restaurants, post boxes etc. are not important; it is just that for these to be placed well we need a completely mapped road network. With this in mind I have developed a way in which we can follow the progress of the completeness of an area on OSM, using a stage system.

These stages may or may not occur in sequential order, and each stage can be quoted as complete in terms of percentages. For example, we might say that London is 100% complete for Preliminary and stages 1 and 2, but only 60% stage 3 complete and 20% stage 4 complete. The hard question is how we accurately measure these percentages. It is easy for a human to tell that the map of London below is more complete than that of Madeira, with its limited amount of roads and dead ends, but it’s a lot harder for a computer.

OpenStreetMap is the wiki-style answer to maps. It is a collaborative project to create free editable maps of the World using portable GPS devices and other free sources. Due to its open approach the map data is free for anyone to use in whatever way they choose, unlike other map sources such as Google Maps or Yahoo! Maps. I am currently compiling some research for CloudMade to assess the completeness of OpenStreetMap and therefore the viability of its commercial use.

What does it mean for a map to be complete?

Before we can describe what it means for a map to be complete we must think about what is important for different users. For most people, having a complete, categorised and named road network is probably most important; however, for some, such as tourists, points of interest may be just as important. Those living in cities may think having complete public transport networks is the most important facet.

Any thoughts on the completeness of OpenStreetMap or maps in general would be truly appreciated.