How to: Convert OSM waypoints defining polygons into Shapefile

March 5, 2014

Today I got sent a file by a colleague in OSM format. I’d never come across the format before, but I did a quick check and found that OGR could read it (like pretty much every vector GIS format under the sun). So, I ran a quick OGR command:

ogr2ogr -f "ESRI Shapefile" Villages.shp Villages.osm

and imported the results into QGIS:

Oh dear. The data that I was given was meant to be polygons covering village areas in India, but when I imported it I just got all of the vertices of the polygons. I looked around for a while for the best way to convert this in QGIS, but I gave up when I found that the attribute table didn’t seem to have any information showing in which order the nodes should be joined to create polygons (without that information the number of possible polygons is huge, and nothing automated will be able to do it).

Luckily, when I opened the OSM file in a text editor I found that it was XML- and fairly sensible XML at that. Basically the format was this:

Under a main <osm> tag, there seemed to be a number of <node>’s, each of which had a latitude and longitude value, and then a number of <way>’s, each of which defined a polygon by referencing the nodes through their ID, and then adding a few tags with useful information. Great, I thought, I can write some code to process this!

So, that’s what I did, and for those of you who don’t want to see the full explanation, the code is available here.

I used Python, with the ElementTree built-in library for parsing the XML, plus the Shapely and Fiona libraries for dealing with the polygon geometry and writing the Shapefile respectively. The code is fairly self-explanatory, and is shown below, but basically accomplishes the following tasks:

Iterate through all of the <node> elements, and store the lat/lon values in a dictionary with the ID used as the key of the dictionary.

For each <way>, iterate through all of the <nd> elements within it and use the ID to extract the appropriate lat/lon value from the dictionary we created earlier

Take this list of co-ordinates and create a Shapely polygon with it, storing it in a dictionary with the name of the village (extracted from the <tag> element) used as the key

Iterate through this dictionary of polygons, writing them to a shapefile

After all of that, we get this lovely output (overlain with the points shown above):

The code definitely isn’t wonderful, but it does the job (and is relatively well commented, so you should be able to modify it to fit your needs). It is below:

Unfortunately that doesn’t work for me – I get an error saying that multipolygons doesn’t exist. I think possibly my OSM file is a bit strange – it wasn’t actually downloaded from OSM, but was created by someone in JOSM. Anyway – hopefully for people reading this either your comment or my code will work.