Geo-spatial searches with RavenDB

RavenDB has had geo-spatial search capabilities for quite a while, but ever since they were introduced they have been limited to finding documents whose latitude and longitude fall within a given radius of a point. Over the past few weeks I have been revamping the Lucene.Net spatial module, and that work was completed earlier this week. Next in line was getting those changes into RavenDB. I just finished doing that, and this post shows what it can do, and how.

First, a few words on geo-spatial indexes. To be able to represent a shape in an index, and then search for it, shapes are converted to an index-friendly representation. There are quite a few ways to do this; the most commonly known approaches are prefix trees and bounding boxes. The QuadPrefixTree approach, for example, represents the earth with 4 grid squares at its first level of precision. The squares are labeled A, B, C and D. The next level of precision adds another letter to the representation, so we get 16 grid squares - AA, AB, AC, AD, BA, ... and so on. By having these multiple layers of precision, we can create the most efficient representation of a shape, one which balances the number of terms against precision. Another implementation, called GeohashPrefixTree, uses geohashes, which have more grid squares per layer.
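To make the labeling scheme concrete, here is a minimal Python sketch of how a point could be mapped to a quad-tree cell label. It is not the Lucene.Net implementation: the function name `quad_cell` and the assignment of letters to quadrants (A = north-west, B = north-east, C = south-west, D = south-east) are assumptions made purely for illustration.

```python
def quad_cell(lat, lon, levels):
    """Return the label of the grid square containing (lat, lon).

    Each level splits the current square into four and appends one letter.
    The letter-to-quadrant mapping is an assumption for this sketch:
    A = north-west, B = north-east, C = south-west, D = south-east.
    """
    min_lat, max_lat = -90.0, 90.0
    min_lon, max_lon = -180.0, 180.0
    label = []
    for _ in range(levels):
        mid_lat = (min_lat + max_lat) / 2
        mid_lon = (min_lon + max_lon) / 2
        north = lat >= mid_lat
        east = lon >= mid_lon
        if north:
            label.append("B" if east else "A")
            min_lat = mid_lat  # keep the northern half
        else:
            label.append("D" if east else "C")
            max_lat = mid_lat  # keep the southern half
        if east:
            min_lon = mid_lon  # keep the eastern half
        else:
            max_lon = mid_lon  # keep the western half
    return "".join(label)


# A point near London at three levels of precision.
print(quad_cell(51.5, -0.1, 3))  # ABD
```

Note that each deeper label starts with the shallower one, which is what lets a prefix tree trade the number of terms against precision.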

Before diving any deeper, here's how you would perform a simple point and radius spatial search. This is taken directly from the old API (which we revised a bit), and since it's easier to use for the most common usage of geo-spatial searches, we left it mostly intact:

The new spatial stuff is quite powerful, and we really wanted to keep all that power in your hands. Therefore, when defining an index you get a chance to specify which spatial strategy and what prefix tree "height" to use. You can just use the defaults if you wish to, of course.

Shapes in both documents and queries are represented using WKT (Well-Known Text) - a markup language for representing shapes, so they are as human readable as they can possibly be. Using WKT also frees everyone from hard-to-use APIs and tons of classes, at least as long as the shapes you use are simple enough. If you expect to handle complex shapes, it is recommended that you install NetTopologySuite from NuGet to help you create shapes and serialize them to their WKT string representation.
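For simple shapes, a WKT string is easy enough to build by hand. The following Python sketch shows what such strings look like; the helper names `point_wkt` and `polygon_wkt` are made up for this example and are not part of RavenDB or NetTopologySuite. WKT writes coordinates in x y order, which for geographic data means longitude first.

```python
def point_wkt(lon, lat):
    """Format a point as WKT. WKT uses x y order, i.e. longitude then latitude."""
    return f"POINT ({lon} {lat})"


def polygon_wkt(ring):
    """Format a single-ring polygon as WKT from a list of (lon, lat) pairs.

    A WKT ring must be closed, so the first point is repeated at the end
    when the caller has not already done so.
    """
    points = list(ring)
    if points[0] != points[-1]:
        points.append(points[0])
    coords = ", ".join(f"{x} {y}" for x, y in points)
    return f"POLYGON (({coords}))"


print(point_wkt(34.8, 32.1))
print(polygon_wkt([(0, 0), (10, 0), (10, 10), (0, 10)]))
```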

Here is an example of the new capabilities. Please note, I just pushed the code for that in, so the API might change a bit by the time you get to play with it:

This is the unbound version of the API, and you can do just about anything with it. A few notes about this new API:

The SpatialGenerate() method in the index definition is expecting a WKT formatted string. It can be any shape you want, but it has to be a legal shape string.

Specifying a spatial strategy is done when defining the index. Changing a strategy will trigger re-indexing.

The strategy and maxTreeLevels parameters are completely optional. Only use them if you know what you are doing; otherwise, stick to the defaults.

You can provide ANY shape while querying, and an expected relation to it. More details on shape relations below.

The results will be sorted by distance, unless otherwise requested.

You can store several shapes in one document, and specify which shape you want to query on, using the fieldName argument in both the index definition and the query. However, at this point you can execute a query against only one spatial field at a time (but as many non-spatial fields as you want).

Obviously, one of the benefits of this new implementation is the ability to index any shape, and to issue a query with any shape against them. Circles, points, squares, polygons - RavenDB doesn't care anymore.

There are 3 types of shape relationships that are supported with this new implementation:

Intersects - querying for a shape which intersects a shape stored in a document within RavenDB will find those shapes which intersect with the given shape. Intersection occurs when the two shapes have at least one shared grid hash. Because of current limitations of the algorithm, very large indexed shapes are not deemed to intersect with very small query shapes. However, smaller indexed shapes will intersect with larger query shapes.

Disjoint - Finds those indexed shapes which are disjoint from the query shape. This means that the indexed shapes and the query shape must have no shared grid hashes.

Within / Contains - Finds those indexed shapes which are fully contained within the query shape. Unlike intersects, this means that all of the indexed shape must be present in the query shape. Any shapes which have additional area outside of the query shape are excluded.
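One way to picture the three relations is as set operations over the grid cells each shape covers. The Python sketch below is a deliberately simplified model, not the actual algorithm: it assumes every shape has already been reduced to a set of cell labels at a single precision level, whereas a real prefix tree mixes levels. The function names are hypothetical.

```python
def intersects(indexed_cells, query_cells):
    """True when the two shapes share at least one grid cell."""
    return not indexed_cells.isdisjoint(query_cells)


def disjoint(indexed_cells, query_cells):
    """True when the two shapes share no grid cells at all."""
    return indexed_cells.isdisjoint(query_cells)


def within(indexed_cells, query_cells):
    """True when every cell of the indexed shape is covered by the query shape."""
    return indexed_cells <= query_cells


# Simplifying assumption: all labels are at the same precision level.
indexed = {"AB", "AC"}
query = {"AB", "AC", "AD"}

print(intersects(indexed, query))
print(within(indexed, query))
print(within(query, indexed))
```

In this model, the larger shape is never within the smaller one, because some of its cells fall outside it, while intersection only needs a single shared cell.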

Limitations and gotchas:

Distances in this new implementation are in kilometers, while the old implementation used miles. Since this is what the internal implementation uses, and it is hardly exposed to the end user, we kept the metric system. It is quite easy to convert back to miles, and if there is demand we might introduce a configuration option on the server side to do that.

Handling of polygons which cross the dateline isn't supported at this stage.

Multi-polygon support is lacking.
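On the unit change mentioned in the first item above: converting between the two systems on the client side is a one-liner. Here is a small Python sketch; the helper names are made up for this example, and the only fact it relies on is that one international mile is exactly 1.609344 kilometers.

```python
KM_PER_MILE = 1.609344  # exact, by definition of the international mile


def miles_to_km(miles):
    """Convert a distance in miles (old API) to kilometers (new API)."""
    return miles * KM_PER_MILE


def km_to_miles(km):
    """Convert a distance in kilometers (new API) back to miles."""
    return km / KM_PER_MILE


# A radius that used to be passed as 10 miles, expressed in kilometers.
print(round(miles_to_km(10), 3))
```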

This new feature is really neat, and opens up great new opportunities with its simplicity and ease of use. It is available to us thanks to the spatial4j project, and powered by Lucene.Net, Spatial4n and NetTopologySuite.