On Friday 17 October 2008, Sascha Silbe wrote:
> On Fri, Oct 17, 2008 at 07:47:14PM +0200, Freek wrote:
> > Looks like we have enough ideas for adding indexes and such to an OSM
> > binary format, so if the rest of the structure is fixed I think we can
> > get it off the ground (I wouldn't mind at least helping out here and
> > there).
>> OK, so what do "we" want in a binary DB? Currently there is (in my
> implementation):
"we" referring to anyone wanting to get involved :-)
> Node/way/relation internal <-> external id:
> - int -> ext O(1)
> - ext -> int O(log n)
So what kind of indexes do you have, hash tables and balanced binary search
trees (what about external memory)? Anyway, when do you use such an index? I
guess one would want to know the OSM id of objects returned by certain
queries, but not specifically a translation from internal to external id
(though of course you probably get that for free if you have an index on
internal id).
Second case: when would you want to know the internal id of a given OSM
object? Probably we need this if we want to implement updates...
> Node location (2x 32bit int, 64bit Peano):
> - id -> location O(1)
> - location -> int O(log n)
Just a bit of a technical thing (not very important for the design at this
stage), but with "Peano" I suppose you mean the order that looks like a lot
of nested Z's? Historically this is not a curve Peano came up with (though
many people call it /a/ or even /the/ Peano curve). Actually, he proposed a
different one that does not have jumps and hence has better locality
properties. A disadvantage of that one may be that an implementation would be
slightly more complex, but such a discussion can better be deferred.
> Node/way/relation types (using tag -> id mapping with wildcard support
> during import):
> - id -> types ~O(1)
> - type -> ids ~O(log n)
Type? Can I, for example, retrieve all post boxes using this index?
> Node/way/relation "interesting" bitmap (name/ref/is_in set and/or known
> type):
> - id -> interesting O(1)
Important design decision, but more for the data format itself: does it store
all nodes/ways/relations, also the "uninteresting" ones?
> Node/way/relation tags (k/v-pairs are \n-separated and properly
> escaped):
> - id -> all tags O(1)
Guess we would want something like that, yes.
> Way MBR / RTree:
> - id -> MBR O(1)
> - MBR -> ids ~O(log n)
This is an index for retrieving all objects within a specified query
rectangle, right? (So basically what we've been talking about ;-)
> Way members (ordered sequence of node ids):
> - way id -> member ids ~O(1)
> - member id -> way ids ~O(1)
>> Relation members (ordered sequence of id+type+role):
> - rel id -> member ids ~O(1)
> - member id -> rel ids ~O(log n)
Will the data format have all these kinds of information (tags, members, etc.)
in different places (on disk), or will they all be in the same place? In the
latter case it may be logical to retrieve all this information with one
query.
> Node/way/relation name/ref/is_in:
> - text files sorted by id (intended for namefinder - could be
> changed into "proper" database entries similar to the way tags are
> handled)
I think name finding is a topic on its own. Also, a good look at the current
online OSM name finder and Gosmore may be a good idea.
> Actually, there probably isn't a one-fits-all solution for the indices,
> it depends too much on the application and its environment. So each
> index should be optional for the high-level API (makes it more complex,
> though).
Indeed, a number of separate indexes for different purposes seems natural. An
index for efficiently retrieving all data elements in a certain region may be
a good place to start.
--
Freek