All products will continue to be fully supported and developed by Microsoft, but they will accept contributions from the community as well. Releasing them under Apache 2.0 also opens up a lot of new possibilities - e.g. building your own custom version of the projects.

26 March, 2012

In the previous post I showed how to do simple filtering of OSM data with the SpatialLITE library. Today I will go a step further and do something more complicated - extract the data from an OSM file that lies within a specific bounding box.

It seems to be a pretty straightforward task, so how hard can it be?

There is one problem we need to solve - some ways and relations near the edges of the bounding box spread across its boundaries, so part of the entity lies inside the bounding box and the other part outside. We need to decide what to do with these split entities. Should we include them? Should we exclude them?

Split entities

Basically there are four ways to deal with them:

1. Exclude ways and relations that have any part outside the bounding box - this leaves isolated nodes in the output (see picture).

2. Include ways and relations that lie partially outside the bounding box, but do not add any nodes that lie outside it - this leaves ways and relations incomplete (again, see picture).

3. Include all entities that lie at least partially inside the bounding box, together with all of their nodes - this produces the most complete map, but on the other hand you will end up with many entities outside the bounding box.

4. Clip entities at the edges of the bounding box.

Option 4 might seem to be the best solution, but we would need to add new nodes on the edges of the bounding box, split ways in two, or do other things that are complicated with the OSM data model.

Options 1 - 3 look very similar to each other, and which one you should choose depends on your needs. From the performance point of view, however, option 3 is significantly more complex. Entities in OSM files are sorted by entity type and then by ID, so with options 1 and 2 we can create the output file in a single pass, but with option 3 that is impossible. During the first pass we find all nodes in the bounding box, and all ways and relations that contain any of the selected nodes. In the second pass we need to add the nodes that are outside the bounding box but belong to the included ways and relations.
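To make the difference between options 1 - 3 concrete, here is a minimal in-memory sketch. The Node, Way and BBox types, the SplitPolicy enum and the BBoxExtractor class are made up for this illustration (they are not SpatialLITE classes), and relations are left out to keep it short. The point is only to show where option 3 needs a second look at the nodes.

```csharp
using System.Collections.Generic;
using System.Linq;

// Hypothetical minimal types for this sketch (not SpatialLITE classes).
public record Node(int Id, double Lat, double Lon);
public record Way(int Id, int[] NodeIds);

public record BBox(double MinLat, double MinLon, double MaxLat, double MaxLon)
{
    public bool Contains(Node n) =>
        n.Lat >= MinLat && n.Lat <= MaxLat && n.Lon >= MinLon && n.Lon <= MaxLon;
}

// Options 1 - 3 for dealing with split entities.
public enum SplitPolicy { ExcludeSplit = 1, TrimToBox = 2, CompleteWays = 3 }

public static class BBoxExtractor
{
    public static (List<Node> Nodes, List<Way> Ways) Extract(
        IReadOnlyList<Node> nodes, IReadOnlyList<Way> ways, BBox box, SplitPolicy policy)
    {
        // First pass, nodes: remember the IDs of nodes inside the box.
        var inside = new HashSet<int>();
        foreach (var n in nodes)
            if (box.Contains(n)) inside.Add(n.Id);

        // First pass, ways: decide per way according to the chosen option.
        var keptWays = new List<Way>();
        var extra = new HashSet<int>(); // outside nodes that option 3 still needs
        foreach (var w in ways)
        {
            bool any = w.NodeIds.Any(id => inside.Contains(id));
            bool all = w.NodeIds.All(id => inside.Contains(id));
            switch (policy)
            {
                case SplitPolicy.ExcludeSplit:
                    if (all) keptWays.Add(w);
                    break;
                case SplitPolicy.TrimToBox:
                    if (any)
                        keptWays.Add(w with
                        {
                            NodeIds = w.NodeIds.Where(id => inside.Contains(id)).ToArray()
                        });
                    break;
                case SplitPolicy.CompleteWays:
                    if (any)
                    {
                        keptWays.Add(w);
                        extra.UnionWith(w.NodeIds.Where(id => !inside.Contains(id)));
                    }
                    break;
            }
        }

        // Second look at the nodes. For options 1 and 2 'extra' is empty, so this
        // selects exactly the nodes a single pass would already have written.
        // Only option 3 picks up additional nodes from outside the box here.
        var keptNodes = nodes
            .Where(n => inside.Contains(n.Id) || extra.Contains(n.Id))
            .ToList();
        return (keptNodes, keptWays);
    }
}
```

With a real file the node section comes before the ways, so the extra nodes for option 3 are only known after the ways have been read. That is why the file has to be read again.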

Tracking IDs

For all options we need to track the IDs of entities inside the bounding box (or you could keep all entities in memory, but with millions of entities that might be impossible). There are a couple of ways of doing that - for example, Osmosis uses either a bit board or a list of integers. I'd like to try something else. As mentioned above, OSM files are sorted by entity type and then by ID, and another observation is that entities with consecutive IDs often lie near each other. We can take advantage of that - we will store the IDs as a list of range objects.

struct IdRange {
    public int From;
    public int To;
}

Every IdRange structure represents a consecutive range of IDs. Each range costs two integers, so if there are more than two IDs per IdRange on average, this representation of the used nodes takes less memory than a simple list of IDs.

When it comes to memory consumption, the results are worse for SpatialLITE. .NET applications are usually a little bit memory intensive and this sample application is no exception. While the peak memory consumption of Osmosis was 53MB, SpatialLITE took a whole 84MB. It seems that most of the memory was consumed by PbfWriter, which is weird, and I will have to take a look at it. The IdTracker, which should occupy most of the memory, actually reached a size of only 19MB. Internal stats from the IdTracker show that there are approx. 4.5 IDs in every range object - so it isn't actually a bad idea.
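Here is a minimal sketch of how such a tracker could work. It assumes that IDs are added in ascending order (which matches the sorted OSM files) and uses a binary search for lookups. The members Add, Contains and RangeCount are my assumptions for this illustration, not necessarily what the real IdTracker looks like.

```csharp
using System;
using System.Collections.Generic;

public struct IdRange
{
    public int From;
    public int To;
}

// Sketch of a range-based ID tracker; member names are assumptions.
public class IdTracker
{
    private readonly List<IdRange> _ranges = new List<IdRange>();

    public int RangeCount => _ranges.Count;

    // IDs must be added in ascending order, as they appear in a sorted OSM file.
    public void Add(int id)
    {
        if (_ranges.Count > 0)
        {
            IdRange last = _ranges[_ranges.Count - 1];
            if (id == last.To) return; // duplicate of the most recent ID
            if (id < last.To)
                throw new ArgumentException("IDs must be added in ascending order.");
            if (id == last.To + 1)
            {
                // Consecutive ID: extend the last range instead of adding a new one.
                last.To = id;
                _ranges[_ranges.Count - 1] = last;
                return;
            }
        }
        _ranges.Add(new IdRange { From = id, To = id });
    }

    // Binary search over the sorted, non-overlapping ranges.
    public bool Contains(int id)
    {
        int lo = 0, hi = _ranges.Count - 1;
        while (lo <= hi)
        {
            int mid = lo + (hi - lo) / 2;
            IdRange r = _ranges[mid];
            if (id < r.From) hi = mid - 1;
            else if (id > r.To) lo = mid + 1;
            else return true;
        }
        return false;
    }
}
```

Adding the IDs 1, 2, 3, 7, 8 and 20 produces three ranges (1 - 3, 7 - 8, 20 - 20), i.e. six integers for six IDs. That is exactly the break-even point of two IDs per range; anything denser saves memory compared to a plain list.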

11 March, 2012

I’m a big fan of the OpenStreetMap project, so when I released the first version of the SpatialLITE library last week, classes for working with OpenStreetMap data couldn’t be missing from the project. Right now the library supports OSM XML files (without compression) and OSM PBF files, both for reading and for writing. So how can you work with OSM data? And how does the SpatialLITE library compare with other tools in terms of speed? Let’s find out …

... err wait, some introduction might be necessary ...

IEntityInfo vs. IOsmGeometry

There are two different representations of OSM entities in the library – IEntityInfo and IOsmGeometry objects.

IEntityInfo represents a lightweight object that contains only data for the particular entity – relationships with other entities are described only by IDs of related entities.

WayInfo class

On the other hand, a collection of IOsmGeometry objects represents a tree of interconnected objects that implement the IGeometry interface. This allows you to access properties of related entities directly, perform spatial analysis with methods from the SpatialLite.Core library, or use any other methods from the library that accept IGeometry parameters. IOsmGeometry objects are pretty powerful, but the price is obvious – all objects must be held in memory, and with large files that can easily turn into an OutOfMemoryException kind of problem.

The decision whether to use IEntityInfo objects or IOsmGeometry objects is up to you. In some cases it might be better to stick with the simple IEntityInfo objects, and sometimes you will need the more complex IOsmGeometry objects. Fortunately it is possible to switch between these two representations of OSM entities.
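The difference between the two representations can be illustrated with a small sketch. The types below (NodeRecord, WayRecord, ResolvedWay, Resolver) are hypothetical stand-ins invented for this example, they are not the library's IEntityInfo or IOsmGeometry classes. One way refers to its nodes only by ID, the other holds the node objects themselves.

```csharp
using System.Collections.Generic;
using System.Linq;

// Hypothetical stand-in types, not SpatialLITE classes.
public record NodeRecord(int Id, double Lat, double Lon);

// Lightweight: related nodes are known only by their IDs.
public record WayRecord(int Id, int[] NodeIds);

// Interconnected: related nodes are held as object references.
public record ResolvedWay(int Id, NodeRecord[] Nodes);

public static class Resolver
{
    // Needs every referenced node in memory, which is the price mentioned above.
    public static ResolvedWay Resolve(
        WayRecord way, IReadOnlyDictionary<int, NodeRecord> nodesById) =>
        new ResolvedWay(way.Id, way.NodeIds.Select(id => nodesById[id]).ToArray());

    // Going back only requires reading the IDs off the referenced nodes.
    public static WayRecord Flatten(ResolvedWay way) =>
        new WayRecord(way.Id, way.Nodes.Select(n => n.Id).ToArray());
}
```

Resolve fails with a KeyNotFoundException if a referenced node is missing from the dictionary. This mirrors the reason the interconnected form cannot be built until all the nodes have been read.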

Right now there are two formats supported – OSM XML and OSM PBF. Both formats support reading as well as writing – so there is OsmXmlReader, OsmXmlWriter, PbfReader and PbfWriter.

Readers accept a stream or a file in their constructors and provide forward-only reading capabilities – pretty much the same behaviour you find in the built-in readers (e.g. BinaryReader). Because of the structure of the files it is impossible to create IOsmGeometry objects in the reader, and thus readers return IEntityInfo objects. If you need to work with IOsmGeometry objects, you can use the OsmDatabase class, which encapsulates the process of creating full-fledged objects from the data read by an IOsmReader.

The principle of the writers is pretty much the same – they also accept a stream or a file in the constructor and then provide forward-only writing capabilities. Both IOsmGeometry and IEntityInfo objects contain all the information necessary for serialization, so both object types are accepted as a parameter of the Write method.

Putting it together

OK, now that we have covered the basics, let’s go back to the title of this post – filtering of OSM data. For now we will implement a filter that is able to process just nodes.

Ingredients

1 IOsmReader

1 IOsmWriter

1 expression to determine whether node should be filtered out or not

For the purpose of this demo we would like to find all guide posts in the OSM file. In OSM a guide post is represented by a node with the information:guidepost tag. Writing an expression to choose such entities is simple:
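A minimal sketch of such an expression, assuming a hypothetical TaggedNode type that exposes its tags as a dictionary (SpatialLITE's own entity classes may expose tags differently, so treat the names here as illustrative only):

```csharp
using System.Collections.Generic;
using System.Linq;

// Hypothetical node type for this sketch, not a SpatialLITE class.
public record TaggedNode(
    int Id, double Lat, double Lon, IReadOnlyDictionary<string, string> Tags);

public static class GuidepostFilter
{
    // True for nodes carrying the tag information=guidepost.
    public static bool IsGuidepost(TaggedNode node) =>
        node.Tags.TryGetValue("information", out string value) && value == "guidepost";

    // Forward-only filtering: entities are tested one by one as they are read,
    // so nothing except the current node has to be kept in memory.
    public static IEnumerable<TaggedNode> Filter(IEnumerable<TaggedNode> source) =>
        source.Where(node => IsGuidepost(node));
}
```

Because Filter is lazy, it fits the reader / writer model: every node coming from the reader is tested once and either handed to the writer or dropped.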

I wanted to find out how fast the SpatialLITE library is. I chose Osmosis as the competitor, because it is probably the most popular tool for processing OSM data. Both tools were used to perform the same task: extract all nodes with the information:guidepost tag from the OSM files (a 5.25 GB OSM XML file and a 245 MB PBF file).

Let's see the results ...

| | SpatialLITE (XML) | SpatialLITE (PBF) | Osmosis (XML) | Osmosis (PBF) |
|---|---|---|---|---|
| Run 1 | 5:01 | 1:30 | 5:58 | 1:46 |
| Run 2 | 4:48 | 1:34 | 6:40 | 1:44 |
| Run 3 | 4:56 | 1:29 | 6:01 | 1:43 |

Speed comparison of SpatialLITE and Osmosis (all times in minutes:seconds)

Not bad, I guess :-)

OK, it might not be a fair comparison for Osmosis, which has many additional features and can perform significantly more complex filtering, but it shows that the reader / writer classes in the SpatialLITE library are anything but slow.
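If you want to boil the three runs down to a single number per tool, a few lines are enough. The Benchmark helper below is just a throwaway sketch for this post (the class and method names are made up); it parses the m:ss times and averages them.

```csharp
using System.Linq;

// Throwaway helper for averaging "m:ss" timings; not part of any library.
public static class Benchmark
{
    // Converts a time written as minutes:seconds (e.g. "5:01") to seconds.
    public static int ToSeconds(string time)
    {
        string[] parts = time.Split(':');
        return int.Parse(parts[0]) * 60 + int.Parse(parts[1]);
    }

    // Average of several runs, in seconds.
    public static double AverageSeconds(params string[] runs) =>
        runs.Select(run => ToSeconds(run)).Average();
}
```

For the measured runs, Benchmark.AverageSeconds("5:01", "4:48", "4:56") gives 295 seconds for SpatialLITE on XML against 373 seconds for Osmosis, and 91 seconds against roughly 104.3 seconds on PBF. In other words Osmosis took about 1.26 times as long on XML and about 1.15 times as long on PBF.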