Friday, January 29, 2016

An alist is a simple data structure that holds key-value pairs in a linked list. When a key is looked up, the list is searched to find it. The time it takes is proportional to the length of the list, or the number of entries.
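
For example, the standard assq does exactly this linear search, comparing keys with eq?:

(define color-values
  '((red . #xff0000)
    (green . #x00ff00)
    (blue . #x0000ff)))

;; assq walks the list pair by pair until a key matches.
(assq 'green color-values)  ;; => (green . 65280)
(assq 'mauve color-values)  ;; => #f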

A hash-table is a more complex data structure that holds key-value pairs in a set of "hash buckets". When a key is looked up, it is first "hashed" to find the correct bucket, then that bucket is searched for the entry. The time it takes depends on a number of things: the hash algorithm, the number of buckets, the number of entries in the bucket, etc. A hash-table can be faster than an alist because the hashing step is quick and the subsequent search step will have very few entries to search.
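
In MIT/GNU Scheme, the same table as a hash table looks something like this (using the hash-table operations from the reference manual):

(define color-values (make-equal-hash-table))

(hash-table-set! color-values 'red   #xff0000)
(hash-table-set! color-values 'green #x00ff00)
(hash-table-set! color-values 'blue  #x0000ff)

;; The key is hashed to pick a bucket; only that bucket is searched.
(hash-table-ref/default color-values 'green #f)  ;; => 65280
(hash-table-ref/default color-values 'mauve #f)  ;; => #f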

In theory, an alist takes time proportional to the number of entries, but a hash-table takes constant time independent of the number of entries. Let's find out if this is true for MIT/GNU Scheme.

I wrote a little program that measures how long it takes to look things up in an alist vs. a hash table.
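
The program itself isn't reproduced here, but the shape of the measurement was roughly this sketch (build-alist, time-lookups, and the probe counts are illustrative names of mine, not the original code; real-time is MIT Scheme's millisecond clock):

;; Build an alist of n entries: ((n-1 . n-1) ... (1 . 1) (0 . 0)).
(define (build-alist n)
  (let loop ((i 0) (alist '()))
    (if (= i n)
        alist
        (loop (+ i 1) (cons (cons i i) alist)))))

;; Average milliseconds per lookup over `trials` random probes.
(define (time-lookups alist n trials)
  (let ((start (real-time)))
    (do ((i 0 (+ i 1)))
        ((= i trials))
      (assv (random n) alist))
    (/ (- (real-time) start) trials)))

Here's what I measured: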

It does indeed seem that alists are linear and hash tables are constant in lookup time. But the hashing step of a hash table does take time, so short alists end up being faster than hash tables. The break-even point looks like a tad over 25 elements. So if you expect 25 or fewer entries, an alist will perform better than a hash table. (Of course, different implementations will have different break-even points.)

A tree data structure is slightly more complex than an alist, but simpler than a hash table. Looking up an entry in a tree takes time proportional to the logarithm of the number of entries. The logarithm function grows quite slowly, so a tree performs pretty well over a very large range of entries. A tree is slower than an alist until you have about 15 entries. At this point, the linear search of an alist cannot compete with the logarithmic search of a tree. The time it takes to search a tree grows, but quite slowly. It takes more than 100 entries before a tree becomes as slow as a hash table.

With a big enough tree, the growth is so slow that you can pretend it is constant.
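
MIT/GNU Scheme ships such trees as "weight-balanced trees". A quick sketch using the wt-tree operations from the reference manual:

;; number-wt-type orders numeric keys; wt-tree/add is functional,
;; returning a new tree that shares structure with the old one.
(define tree
  (wt-tree/add (wt-tree/add (make-wt-tree number-wt-type) 1 'one)
               2 'two))

(wt-tree/lookup tree 2 'not-found)  ;; => two
(wt-tree/lookup tree 5 'not-found)  ;; => not-found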

I noticed that there seems to be a lot more going on than CDRing down a list. The interrupt check is responsible for a lot of the overhead. MIT/GNU Scheme polls for interrupts. The compiler inserts an interrupt check at every procedure entry and backwards branch. This ensures that only a small, bounded number of instructions can run between interrupt polls. The interrupt poll and the heap check are simultaneous because of a hack. To interrupt Scheme, you save the "Top of Heap" pointer and set it to zero. This causes the heap check to fail as if there were an out-of-memory condition. The out-of-memory handler first checks to see whether this was actually an interrupt masquerading as memory exhaustion, and restores the heap pointer if so.
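
In pseudo-Scheme, with entirely made-up names for the VM internals, the trick looks like this:

(define saved-memtop #f)

;; To request an interrupt, clobber MEMTOP so the very next heap
;; check "fails".  (read-memtop, write-memtop!, interrupt-pending?,
;; run-interrupt-handlers, and garbage-collect are stand-ins.)
(define (request-interrupt!)
  (set! saved-memtop (read-memtop))
  (write-memtop! 0))

;; The out-of-memory handler disambiguates the two cases.
(define (handle-heap-check-failure)
  (if (interrupt-pending?)
      (begin (write-memtop! saved-memtop)   ; it was an interrupt
             (run-interrupt-handlers))
      (garbage-collect)))                   ; genuinely out of heap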

The two instructions,

(cmp q (r 7) (@r 6)) ;; Interrupt check
(jge (@pcr interrupt-12))

perform the check. The first compares register 7 with the contents of memory that register 6 points at. The convention for the compiler is that register 7 contains the free pointer and register 6 holds the address of MEMTOP. The interrupt check does a memory cycle.

The reason you poll for interrupts is that you need the virtual machine to be in a coherent state, because the interrupt could attempt to run arbitrary code. The garbage collector in particular has to be able to parse the machine state at an interrupt. Interrupts are polled at apply time: before the application, the interrupts are checked. If an interrupt is to be processed, a continuation is pushed that will do the original application, and an interrupt handler is applied instead.
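
Schematically, again with made-up names:

;; A sketch of the apply-time poll: if an interrupt is pending, run
;; the handler first and do the original application afterwards.  In
;; the real VM the "afterwards" is a continuation pushed on the stack.
(define (apply-with-poll procedure arguments)
  (if (interrupt-pending?)
      (begin (run-interrupt-handler)
             (apply procedure arguments))
      (apply procedure arguments)))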

In MIT-Scheme, the virtual machine is a stack machine, and at apply time the entire state of the virtual machine is pointed to by the stack pointer. This is a good state to be in when you handle interrupts or garbage collect. But this means that arguments are passed on the stack. This instruction:

(mov q (r 0) (@r 4))

loads register 0 with the contents of memory at register 4. (The compiler convention is that register 4 is the stack pointer.) In other words, it is fetching the value of the argument tail from the stack. This is the second memory cycle.

Between interrupt polls, the compiled code can freely manipulate Scheme objects without worrying about being embarrassed by the garbage collector. But at apply time, when an interrupt poll could happen, the compiled code must put the virtual machine back into a coherent state. In particular, modified values are stored back on the stack. This instruction, just before the jump, puts the value of tail back on the stack before jumping to the top of the loop.

(mov q (@r 4) (r 1))

That's three memory cycles in addition to the actual fetching of the CDR in this instruction:

(mov q (r 1) (@ro 0 8))

Of course these are quite inexpensive memory cycles because the top of stack is cached, but there is at least the time it takes to validate the cache entry.

The interrupt poll occurs every time around the loop, so we copy the arguments back and forth between the stack and registers on each iteration. If we unroll the loop, we can avoid some of the copying.
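
The unrolled compiled code isn't shown here, but the source-level analogue of the idea is an assq-style loop that examines several entries per trip around the loop, so the backwards branch, and the interrupt poll that comes with it, runs correspondingly less often. A hypothetical three-way unrolling:

;; Each iteration of the named let examines three entries, so the
;; backwards branch (and its interrupt poll) runs a third as often.
(define (assq-unrolled key alist)
  (let loop ((entries alist))
    (cond ((not (pair? entries)) #f)
          ((eq? key (caar entries)) (car entries))
          ((not (pair? (cdr entries))) #f)
          ((eq? key (caadr entries)) (cadr entries))
          ((not (pair? (cddr entries))) #f)
          ((eq? key (caaddr entries)) (caddr entries))
          (else (loop (cdddr entries))))))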

The garbage collector in MIT Scheme is a very simple, stop-the-world, copying collector. The GC algorithm traverses the heap in breadth-first order. An undesired effect of this is that data structures get spread about the heap. When I first built the test-list, the cons cells were close together. Here are the first few addresses: