Tuesday, October 21, 2008

NHibernate 2.0.1GA is the latest binary download available, but it seems the NHibernate.Caches.SysCache binary release is lagging behind, as the download at Sourceforge was built against NHibernate 2.0.0. Here's a version of NHibernate.Caches.SysCache built against 2.0.1GA.

Monday, October 20, 2008

Object oriented languages like C# and Java use method dispatching techniques to select a concrete method to invoke when given a polymorphic type. By far the most common technique is the virtual method table, or vtable for short.

Long ago I had updated the above Wikipedia page with information regarding alternative dispatch techniques. Research published a few years back indicated that dispatching through a vtable incurred astonishing overheads, as high as 50% of total execution time. Alternative dispatch techniques based on runtime tests, such as a linear sequence of if statements checking for the various concrete class types, or a sequence of nested if statements forming a binary search, were often more efficient on a variety of hardware.

Vtables are rather flexible however, and the composition of a number of vtables can encode a variety of advanced dispatching behaviour. The .NET CLR utilizes vtables, and I've sketched out rough encodings of first-class functions, algebraic data types and pattern matching, higher-ranked polymorphism, and even a form of first-class messages, all by encoding them as virtual method dispatches.

Unfortunately, research indicated that vtable dispatch is quite expensive, and many of these idioms require at least one and often two virtual dispatches. Runtime tests and casts might be usable here instead, but a call site compiled for a fixed hierarchy of classes cannot be extended as easily as vtable-based dispatching. The only exception is a JIT using a technique called "polymorphic inline caches", which neither the CLR nor JVM utilize.

So I set about to measure the dispatch overhead on the CLR. Here's the source for my test, which consists of 3 classes in a hierarchy, each of which provide a visitor for double-dispatching, and a single method for the single virtual dispatch test. The driver code performs the runtime tests and casts at the dispatch site for the last case.

The numbers are total CPU ticks for the entire test, so lower numbers are better. It's a purely CPU-bound test, and I ran it on a Core 2 Duo 2.13GHz. I used .NET's high resolution timer, and I threw away the highest and lowest numbers:

Double Dispatch

Runtime Test

Single Dispatch

1658222872

1092469328

1136358792

1659474960

1092917368

1136520160

1659485472

1093638752

1136745536

1659866480

1094546584

1136985448

1660287768

1094639088

1137321136

1660488312

1094689040

1137526400

1660662648

1094856216

1137994576

1662110088

1095503256

1138300032

1666248296

1095734848

1138461264

1667377456

1095989528

1138609112

1672127744

1097126880

1141472360

Avg:

1662395645.09

1094737353.45

1137844983.27

As you can see, runtime tests and casts are surprisingly efficient, beating out even single dispatch. Any functional language targeting the CLR will definitely want to use runtime tests to implement pattern matching. I might try a deeper hierarchy at some point, but research has shown that up to 95% of algebraic types have 3 cases or less.

Unfortunately, when the complete set of concrete types is not known, a vtable dispatch is unavoidable, unless one resorts to runtime code generation to recompile the dispatch site. This is essentially a polymorphic inline cache, and it requires more sophisticated runtime infrastructure than the CLR natively supports, though it is possible if the language abstracts program execution away from how the CLR natively works.

So in summary, vtables are nice and simple, but rather inefficient on current hardware.