Tuesday, March 25, 2014

Got through a late-night programming binge, followed by more of the same on my lunch break, followed by a couple more days of thought and work during downtime. It's up at my fact-base implementation. Basically, we have indices now. Fairly naive, simple indices, but they should suffice for what I'm up to[1].

Before we get to discussing any code, let's back up and discuss the idea of an index for a moment.

The Idea of an Index

An index in this context is an extra layer on top of our fact storage engine that keeps track of what we've put in/taken out in a way that makes certain things easier to look up. It's easier for fact-bases than it is for relational databases. Since every fact is made up of three components[3], all we have to do is keep an index by one or two slots. What we're basically looking to do is a really fast lookup of some subset of all facts in a base based on one or two of those keys[4]. The way I've chosen to do it, after some advice from friends who've used systems something like this, is by maintaining hash tables[5] in memory that give you shortcuts to some specified indices. We're trading space[6] for time[7].
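To make the space-for-time trade concrete, here's a minimal sketch of the idea, not the fact-base code itself. It assumes facts are plain three-element lists; the names `*facts*`, `index-by-first` and `*by-id*` are made up for illustration.

```lisp
;; Illustrative only: a handful of triples, indexed by their first slot.
(defparameter *facts*
  '((0 :title "Indices")
    (0 :tag "lisp")
    (1 :title "Deltas")))

(defun index-by-first (facts)
  "Return a hash table mapping each fact's first slot to the facts sharing it."
  (let ((table (make-hash-table :test 'equal)))
    (dolist (fact facts table)
      ;; GETHASH defaults to NIL, so PUSH starts a fresh list for new keys.
      (push fact (gethash (first fact) table)))))

(defparameter *by-id* (index-by-first *facts*))

;; One hash lookup instead of a walk over every stored fact.
(gethash 0 *by-id*)
```

The table holds a second reference to every fact (that's the space), and in exchange a lookup by first slot no longer has to touch facts that can't match (that's the time).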

The Post-Explanation Version

This was a much different article initially; I was going to discuss some bone-headed intermediate states for this code before getting to the "final"[8]. It ended up looking like a stupid idea in this particular instance because the previous versions weren't things I'd consider running after having thought through it a bit more.

I try to do that a fair amount these days.

Not the stupid implementations thing, though that's also true. I mean sit down with another actual human being and talk them through the code I just wrote. It's not quite as good as blogging about it, but it's faster and turns up almost as many issues. It also has an added benefit that blogging doesn't seem to give me, which is showing me when I'm completely off my nut. Anyhow, here's what I had after I explained the thing to someone, poured some shower time into thinking about it, and drafted the first version of this post:

There, that's not so intimidating, is it? The actual interface to these indices is in fact-base.lisp, but the above contains most of the functionality. First, ignore the show methods at the bottom there. That was just a piece of hackery to give me a usable visual representation of an index while I was debugging this beast. Let's start at the top.

An index has a table of bindings. You'd call make-index in a way resembling (make-index '(:a :b :bc :ac)), which would give you back an index with room to dissect a fact base into chunklets keyed off of

the first element

the second element

the second then the third element

the first then the third element

No, I have no idea if I'll ever actually need a setup like this. The indexed? method takes an index and an ix-type symbol and tells you whether the given index is tracking that particular type of lookup.
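For a rough picture of those two pieces, something like the following would do. This is a sketch under assumptions rather than the code from the repository: the `index` class, its `table` slot and the choice of one `equal` hash table per index type are my guesses at a workable shape.

```lisp
(defclass index ()
  ;; Maps an index type such as :a or :bc to its own lookup table.
  ((table :reader table :initform (make-hash-table :test 'equal))))

(defun make-index (indices)
  "Build an index with one empty lookup table per requested index type."
  (let ((index (make-instance 'index)))
    (dolist (ix-type indices index)
      (setf (gethash ix-type (table index))
            (make-hash-table :test 'equal)))))

(defmethod indexed? ((state index) (ix-type symbol))
  "Return the lookup table for IX-TYPE, or NIL if STATE does not track it."
  (values (gethash ix-type (table state))))

;; Usage: track the four lookups from the example call.
(defparameter *ix* (make-index '(:a :b :bc :ac)))
(indexed? *ix* :bc) ; a hash table, so true
(indexed? *ix* :c)  ; NIL, since :c was never requested
```

Returning the lookup table itself, rather than a bare T, means a caller that checks `indexed?` already has the table it wants to search.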

Those both map an index type to the components they'll need for insertion/lookup. I've thought about factoring out the obvious pattern, and even wrote some prototype code to do it, but it turns out that for 6 indices, the macrology involved is more complicated than the trivial lookup-table thing. If fact-base were an arbitrary storage system, this would probably be complicated enough to macro away, but the whole point is that I'm only ever storing triples. Which means I'm never going to need more index types than this, and usually far fewer.
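The "trivial lookup-table thing" might look roughly like this. It's a sketch: `fact-key` and `query-ix-type` are hypothetical names, and I'm assuming the six index types are `:a`, `:b`, `:c`, `:ab`, `:ac` and `:bc`, with facts stored as three-element lists.

```lisp
(defun fact-key (ix-type fact)
  "Return the part of FACT (a three-element list) that IX-TYPE is keyed on."
  (destructuring-bind (a b c) fact
    (ecase ix-type
      (:a a)
      (:b b)
      (:c c)
      (:ab (list a b))
      (:ac (list a c))
      (:bc (list b c)))))

(defun query-ix-type (&key (a nil a?) (b nil b?) (c nil c?))
  "Name the index type for a query that pins the supplied slots, if one fits."
  (declare (ignore a b c)) ; only whether each slot was supplied matters here
  (cond ((and a? b? c?) nil) ; a fully specified fact needs no index
        ((and a? b?) :ab)
        ((and a? c?) :ac)
        ((and b? c?) :bc)
        (a? :a)
        (b? :b)
        (c? :c)))

;; Usage:
(fact-key :bc '(1 :title "Indices")) ; the second and third slots, as a list
(query-ix-type :a 1 :c "Indices")    ; the index type for "first and third known"
```

Six `ecase` branches are short enough to read at a glance, which is the argument for leaving the pattern unfactored.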

These three utility methods at the end are exactly what you'd expect. insert! takes a fact and an index and inserts one into the other, into how-many-ever particular lookups that index is tracking. map-insert! is a shorthand for inserting a bunch of facts at once into the same index. And finally, delete! takes a fact and removes it from all index lookups, then cleans up empty lookup lists.
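Here's one way those three could fit together, as a self-contained sketch. The argument order, the `index` class, the `fact-key` helper and the list-of-facts buckets are all assumptions on my part, so treat this as an illustration of the behavior described above, not the actual listing.

```lisp
(defclass index ()
  ((table :reader table :initform (make-hash-table :test 'equal))))

(defun make-index (indices)
  "Build an index with one empty lookup table per requested index type."
  (let ((index (make-instance 'index)))
    (dolist (ix-type indices index)
      (setf (gethash ix-type (table index))
            (make-hash-table :test 'equal)))))

(defun fact-key (ix-type fact)
  "Return the part of FACT that IX-TYPE is keyed on."
  (destructuring-bind (a b c) fact
    (ecase ix-type
      (:a a) (:b b) (:c c)
      (:ab (list a b)) (:ac (list a c)) (:bc (list b c)))))

(defmethod insert! ((state index) (fact list))
  "Add FACT to every lookup table STATE is tracking."
  (maphash (lambda (ix-type lookup)
             (push fact (gethash (fact-key ix-type fact) lookup)))
           (table state))
  fact)

(defmethod map-insert! ((state index) (facts list))
  "Insert each of FACTS into STATE and return STATE."
  (dolist (fact facts state)
    (insert! state fact)))

(defmethod delete! ((state index) (fact list))
  "Remove FACT from every lookup table, dropping keys left with no facts."
  (maphash (lambda (ix-type lookup)
             (let* ((key (fact-key ix-type fact))
                    (remaining (remove fact (gethash key lookup)
                                       :test #'equal :count 1)))
               (if remaining
                   (setf (gethash key lookup) remaining)
                   (remhash key lookup))))
           (table state))
  fact)

;; Usage:
(defparameter *ix* (make-index '(:a :bc)))
(map-insert! *ix* '((1 :title "x") (1 :body "y") (2 :title "x")))
(delete! *ix* '(2 :title "x")) ; the key 2 disappears from the :a lookup
```

The `remhash` in `delete!` is the cleanup step: without it, a key whose last fact was removed would stay in the table pointing at an empty list.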

And that's that. You've got a quick run-through of how this works out in practice here. I get the feeling I'll be talking about deltas, forking and the applications of fact bases before long, but we shall see.

Footnotes

1 - [back] - Granted, because "what I'm up to" at this point is "an almost trivial semi-anonymous forum for a local meetup group", and "an almost trivial notebook-style REPL for Common Lisp", that's true of almost any data storage technique ever, but still. The naive, index-less storage was giving cl-kanren[2] a bit more trouble than I wanted it to when I pushed the stored entry count past 10000 or so. Which is not satisfactory. So yeah, this index is basically a search-space optimization for my database traversals. I'll let you know how it goes.

2 - [back] - Which I'll also have to talk about before long, if for no reason other than to get some ideas out of my head temporarily.

3 - [back] - The second two might be compound structures, but it's still only three top-level elements.

4 - [back] - If you have none of the three components of the fact you're looking for, you can't do better than "all facts"; if you have all of them, you already have the fact you're looking for. So we're only interested in the other two cases.

5 - [back] - There's no particular reason you couldn't use some tree structure if you like, but hashes are easy, and they come with Common Lisp, so...
