Thursday, November 27, 2008

A few posts back, I hinted at using parameterization as an alternative to metadata. I've just written a serialization interface using this technique, and a binary serializer/deserializer to demonstrate the inherent tradeoffs.

You can inspect the pair of interfaces required, and I implemented a binary serializer/deserializer pair as an example. ICompactSerializable is implemented by each serializable object; its single method is essentially declarative, describing the sequential, internal structure of the object. It's simple and fast, since it provides native-speed access to an object's fields without reflection and with no need for metadata.
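The interfaces themselves aren't reproduced in this excerpt, so here is a hedged sketch of what they might look like based on the description above; the names ICompactSerializable and ICompactSerializer come from the post, but the exact signatures and the Person class are my own guesses:

```csharp
// Hypothetical signatures: one overload per primitive field type.
public interface ICompactSerializer
{
    void Field(ref int value);
    void Field(ref string value);
}

// Implemented by each serializable object; the single method is a
// declarative description of the object's sequential structure.
public interface ICompactSerializable
{
    void Serialize(ICompactSerializer s);
}

public class Person : ICompactSerializable
{
    int age = 42;
    string name = "Alice";

    public void Serialize(ICompactSerializer s)
    {
        // Order matters: writer and reader must make the same calls.
        s.Field(ref age);
        s.Field(ref name);
    }
}
```

Because the fields are passed by ref, the same Serialize method serves both directions: a writing ICompactSerializer reads the fields, while a deserializing one assigns them.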

Of course, the obvious downside is that clients must describe the internal structure themselves via ICompactSerializer, and refactoring must be careful about reordering the sequence of calls. The upshot is that serialization and deserialization are insanely fast compared to ordinary reflection-driven serialization, the binary is far more compact, and clients have full control over versioning and schema upgrade without additional, complicated infrastructure (surrogates, surrogate selectors, binders, deserialization callbacks, etc.).

These serializers are very basic, but the principle is sound. My intention here is only to demonstrate that parameterization can often substitute for the typical approach of relying heavily on metadata and reflection. This is only one possible design of course, and other tradeoffs are possible.

For instance, each method in ICompactSerializer could also accept a string naming the field, which could make the Serialize call invariant to the ordering of fields, and thus more robust against refactoring; this brings the technique much closer to the rich structural information available via reflection, but without the heavy metadata infrastructure of the .NET framework.

The Serialize method of ICompactSerializable can also easily be derived by a compiler, just as the Haskell compiler can automatically derive some type class instances.

This application is so basic that the user wouldn't even need to specify the field names manually; the compiler could insert them all automatically. Consider how much metadata and how much slow, reflection-based code could be replaced by fast compiler-derived methods using such techniques. Projects like NHibernate wouldn't need to generate code to efficiently get and set object properties, since full native-speed methods would be provided by the compiler.

Fundamentally, this question is supposed to represent a causality dilemma. When there is a causal dependency between any two objects, A and B, we can ask ourselves, "which came first, A or B?" An object C could have caused B, and thus triggered the recursive relation C→B→A→B→A... Of course, it could just as easily have been A that was caused first.

To properly answer this question, we must reduce the abstract objects A and B to concrete objects and apply our scientific knowledge to ground the recursion. Using chickens and eggs as our objects, we have to precisely define what chickens and what eggs to consider.

The question "which came first, chickens or any type of egg?" is not a causal dilemma at all, and furthermore is not faithful to the original purpose of the question. The ancient Greeks who first posed this question had no concept of evolution and no inkling that chickens could have any relationship to other egg types, so they would not have asked, "which came first, the chicken or the fish egg?" To the ancient Greeks, such a question isn't a dilemma, it's complete nonsense. Thus, the paper linked in my last post is invalid.

The more precise and faithful phrasing of the dilemma is, "which came first, the chicken or the chicken egg?", or more generally, "which came first, species Sn or its reproductive mechanism Rn?"

Sn and Rn are in the appropriate recursive relation to form a causal dilemma, and we can ground it in biology and chemistry. The very first single-celled organism did not form by mitosis; thus the first single-celled organism, S∅, preceded its own reproductive mechanism, R∅. Thus the dominoes fall, i.e. the first egg-laying species was not hatched from an egg, so it too preceded its own reproductive mechanism, and so on, ad infinitum.

Eventually we reach chickens and chicken eggs, and the conclusion is simply that the chicken necessarily came before its egg.

Friday, November 7, 2008

Parametric polymorphism is a powerful tool for constructing and composing safe programs. To demonstrate this, I've constructed a tiny embedded stack language in C#, and I exploited C# generics, aka parametric polymorphism, to ensure that any embedded programs are type-safe, and thus "can't go wrong". Moreover, this language is jitted, since I use the ILGenerator of System.Reflection.Emit, and the complexity of doing this was no higher than that of writing an interpreter. This is mainly because the CLR is itself a stack-based VM, so jitting a stack-based program to stack-based IL is fairly natural.

This type encapsulates the IL output stream and uses two phantom type variables to encode the values at the top of the execution stack. The T parameter represents the top of the stack, and the R parameter represents the rest of the stack.

A few more types are needed to safely represent other reflected objects used during code generation:

// Represents the 'void' type used in some functions
public struct Unit { }

// Represents a named function with a known type structure T→R.
// This could be a void→void, an int→string, etc.
public struct Function<T, R>
{
    internal MethodInfo method;

    public Function(Action<T> f)
    {
        this.method = f.Method;
    }
}

Next I define a set of extension methods operating on typed stack objects, all parameterized by the underlying types. You can see how the top elements of the stack are specified, and how each operation consumes these elements or otherwise modifies the stack:
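The extension methods themselves aren't shown in this excerpt, so here is a minimal sketch of the idea under assumed names (ILStack, Int, Add are mine; the real code differs):

```csharp
using System;
using System.Reflection.Emit;

// Hypothetical reconstruction: R is the rest of the stack, T its top.
// Neither parameter appears in any field; they exist only to track the
// stack's shape in the type system.
public struct ILStack<R, T>
{
    internal ILGenerator il;
    internal ILStack(ILGenerator il) { this.il = il; }
}

public static class ILStackOps
{
    // Pushing an int turns a stack <R, T> into <<R, T>, int>: the old
    // top becomes part of the rest.
    public static ILStack<ILStack<R, T>, int> Int<R, T>(this ILStack<R, T> s, int i)
    {
        s.il.Emit(OpCodes.Ldc_I4, i);
        return new ILStack<ILStack<R, T>, int>(s.il);
    }

    // Add consumes the two ints on top and leaves their sum; it cannot
    // even be called on a stack without two ints on top.
    public static ILStack<R, int> Add<R>(this ILStack<ILStack<R, int>, int> s)
    {
        s.il.Emit(OpCodes.Add);
        return new ILStack<R, int>(s.il);
    }
}
```

For example, `new ILStack<object, object>(il).Int(2).Int(3).Add()` type-checks, while `.Int(2).Add()` fails to compile, which is exactly the safety the phantom parameters buy.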

This is obviously a fairly limited set of functionality, and it would require a great deal more work and thought to properly abstract the full execution of CIL (especially exceptions!), but it just might be possible. A large subset is certainly possible.

It might serve as an interesting alternative to System.Linq.Expressions for some code generation applications, since Linq expressions are largely untyped. Here are a few more opcode examples for reference purposes:

Pasting all of the code in this post into a file and compiling it should just work.

Wrapping all CIL opcodes this way could be quite useful, since you're guaranteed to have a working program if your code type checks. If you have a language, embedded or otherwise, using this sort of technique might be a good way to achieve a type-safe JIT for .NET, assuming you can assign a stack-based execution semantics. Since you're compiling to .NET, it will need exactly such a stack semantics anyway!

Tuesday, November 4, 2008

I used to be a big fan of reflection, and C#'s attributes also looked like a significant enhancement. Attributes provide a declarative way to attach metadata to fields, methods, and classes, and this metadata is often used during reflection.

The more I learned about functional programming, type systems, and so on, the more I came to realize that reflection isn't all it's cracked up to be. Consider .NET serialization. You can annotate fields you don't want serialized with the attribute [field:NonSerialized].

However, metadata is just data, and every usage of attributes can be replaced with a pair of interfaces. Using [field:NonSerialized] as an example, we can translate this class:
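The translated class is missing from this excerpt, so here is a hedged sketch of the translation's shape; all names besides [NonSerialized] are hypothetical:

```csharp
using System;

// Interface pair replacing the attribute: the class declares its
// serializable fields directly instead of being reflected over.
public interface IFieldSerializer
{
    void Field(ref string value);
    void Field(ref int value);
}

public interface IFieldSerializable
{
    void SerializeFields(IFieldSerializer s);
}

// Attribute-based original: the runtime must reflect over the fields
// and check each one for the attribute.
[Serializable]
public class Session
{
    public string User = "";
    [NonSerialized] public int CachedHandle;
}

// Interface-based translation: CachedHandle is simply never mentioned,
// so no metadata or reflection is needed to skip it.
public class Session2 : IFieldSerializable
{
    public string User = "";
    public int CachedHandle;

    public void SerializeFields(IFieldSerializer s)
    {
        s.Field(ref User);
    }
}
```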

Essentially, we are replacing reflection and metadata with parameterization, which is always safer. This structure is also much more efficient than reflection and attribute checking.

Consider another example, the pinnacle of reflection, O/R mapping. Mapping declarations are often specified in separate files and even in a whole other language (often XML or attributes), which means they don't benefit from static typing. However, using the translation from above, we can obtain the following strongly type-checked mapping specification:
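The mapping specification itself isn't reproduced here; a hedged reconstruction of its shape (names and signatures are my guesses, not the post's actual code) might look like:

```csharp
// IRelation receives references to the object's fields, so one
// declarative mapping serves both hydration and write-back.
public interface IRelation
{
    void Map(string column, ref int field);
    void Map(string column, ref string field);
}

public interface IMapped
{
    void Relate(IRelation r);
}

public class User : IMapped
{
    internal int id;
    internal string name = "";

    public void Relate(IRelation r)
    {
        // Misspell a field here and the compiler complains, unlike XML.
        r.Map("user_id", ref id);
        r.Map("user_name", ref name);
    }
}
```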

There are in general two implementors of IRelation: hydration, when the object is loaded from the database and the object's fields are populated, and write-back, when the object is being written back to the database. The IRelation interface is general enough to support both use-cases because IRelation accepts references to the object's fields.

This specification of the mappings is more concise than XML mappings, and is strongly type-checked at compile-time. The disadvantage is obviously that the domain objects are exposed to mapping, but this would be the case with attributes anyway.

Using XML allows one to cleanly separate mappings from domain objects, but I'm coming to believe that this isn't necessarily a good thing. I think it's more important to ensure that the mapping specification is concise and declarative.

Ultimately, any use of attributes in C# for reflection purposes can be replaced by the use of a pair of interfaces without losing the declarative benefits.

Monday, November 3, 2008

In a previous post, I discussed representations used for polymorphic parameters. These special data representations are a contract between the language and the runtime which enables the runtime to traverse data structures to collect garbage.

64-Bit Polymorphic Representation

I had neglected to mention this interesting variant of tagged representations that I had come across at one point, but which I believe is not in widespread use. This representation uses the full 64-bit floating point representation for all polymorphic values, and encodes non-floating point data in the unused bits of NaN values. IEEE 754 doubles flag a NaN using the 11 exponent bits plus one mantissa bit, so the remaining 52 bits are free to encode integers, pointers, and so on.
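As a rough illustration (my own, not from any particular runtime), here's how an integer can be smuggled into a NaN payload; a real system would also encode pointers and type tags this way, and would canonicalize arithmetic NaNs so they don't collide with boxed values:

```csharp
using System;

// Minimal NaN-boxing sketch: a 64-bit word is either an ordinary
// double, or a NaN whose low payload bits carry a 32-bit integer.
static class NanBox
{
    // Quiet-NaN pattern: all exponent bits set plus the quiet bit.
    // The low 32 bits are free for our payload.
    const ulong NanTag = 0x7FF8000000000000UL;

    public static double BoxInt(int i) =>
        BitConverter.Int64BitsToDouble((long)(NanTag | (uint)i));

    // Any non-NaN double fails this test, since its exponent bits
    // cannot all be set. (Arithmetic NaNs would need canonicalizing.)
    public static bool IsInt(double d) =>
        ((ulong)BitConverter.DoubleToInt64Bits(d) & NanTag) == NanTag;

    public static int UnboxInt(double d) =>
        (int)(uint)BitConverter.DoubleToInt64Bits(d);
}
```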

I haven't found very much empirical data on this approach, but it looks promising. The polymorphic representation is doubled in size, but the structure is unboxed. Thus, the polymorphic function call overhead may be slightly higher, but garbage collector pressure is reduced since less boxing is needed. Numerical code is likely to see the biggest speedups.

Also, this representation can be used to ensure that all values are properly 8-byte aligned, which is often an efficiency gain in modern memory systems. It also permits unboxed floating point values, the lack of which is a significant slowdown in the other tagged representations.

Natural Representations with Bitmasks

A new representation I recently came across is being used in the SML# language. Consider a typical tagged representation where 1-bit is co-opted to distinguish integers from pointers. Each polymorphic parameter must carry this 1 bit.

Naively, each parameter is accompanied by a Bool flag indicating whether it is a pointer or an integer, something like: val f: Bool → 'a → Bool → 'b → 'c. But using a full function parameter to hold a single bit is a waste, so let's coalesce all the Bool parameters into a single Bitmask:

val f: Bitmask → 'a → 'b → 'c

Bitmask can be an arbitrarily long sequence of bits, and each bit in the mask corresponds to the bit that would normally be in the pointer/integer representation of the corresponding parameter in a typical tagged representation. By lifting the tag bit to an external parameter, we can now use full natural representations for all parameters.

Calling a polymorphic function now consists of masking and shifting to extract the bit for the given parameter, or assigning one if calling from a monomorphic function.
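Sketched concretely (my own illustration, not SML# code): the callee recovers each parameter's tag by masking and shifting, and a monomorphic caller assembles the mask from what it knows statically:

```csharp
// Bit i of the mask says whether polymorphic argument i is an unboxed
// integer (1) or a pointer (0), so the arguments themselves keep their
// natural, untagged representations.
static class Bitmask
{
    public static bool IsInt(ulong mask, int arg) =>
        ((mask >> arg) & 1UL) != 0;

    // A monomorphic caller knows each argument's kind at compile time,
    // so building the mask is a constant in real generated code.
    public static ulong Tag(params bool[] isInt)
    {
        ulong mask = 0;
        for (int i = 0; i < isInt.Length; i++)
            if (isInt[i]) mask |= 1UL << i;
        return mask;
    }
}
```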

Bitmasks must also accompany polymorphic data structures. The presence of any polymorphic type variable implies the requirement for a bitmask. It's also not clear how expensive this shifting is in relation to inline tag bits. SML# does not yet have a native code compiler, so any comparisons to other SML compilers aren't representative.

However, the bitmask representation does not unbox floating point numbers, but a hybrid scheme with the above 64-bit representation is possible.

Consider the lifted 1-bit to indicate whether the parameter is word-sized (1) or larger (0). Larger parameters require a full 64-bit representation, while word-sized parameters do not. The garbage collector thus skips any parameter with bitmask tag 1 (since it is an unboxed integer), and analyzes any value with bitmask tag 0 as a 64-bit value. If the 64-bit value is a NaN, it is further analyzed to extract a single distinguished bit indicating whether it is a pointer. If it is not a pointer, it is skipped (since it is an unboxed floating point value). If it is a pointer, the structure pointed to is traversed.

This last scheme provides natural representations for integers and floating point numbers, and expands the representation of pointers in polymorphic functions by one word. Other tradeoffs are certainly possible.

Tuesday, October 21, 2008

NHibernate 2.0.1GA is the latest binary download available, but it seems the NHibernate.Caches.SysCache binary release is lagging behind, as the download at Sourceforge was built against NHibernate 2.0.0. Here's a version of NHibernate.Caches.SysCache built against 2.0.1GA.

Monday, October 20, 2008

Object oriented languages like C# and Java use method dispatching techniques to select a concrete method to invoke when given a polymorphic type. By far the most common technique is the virtual method table, or vtable for short.

Long ago I had updated the above Wikipedia page with information regarding alternative dispatch techniques. Research published a few years back indicated that dispatching through a vtable incurred astonishing overheads, as high as 50% of total execution time. Alternative dispatch techniques based on runtime tests, such as a linear sequence of if statements checking for the various concrete class types, or a sequence of nested if statements forming a binary search, were often more efficient on a variety of hardware.

Vtables are rather flexible however, and the composition of a number of vtables can encode a variety of advanced dispatching behaviour. The .NET CLR utilizes vtables, and I've sketched out rough encodings of first-class functions, algebraic data types and pattern matching, higher-ranked polymorphism, and even a form of first-class messages, all by encoding them as virtual method dispatches.

Unfortunately, research indicated that vtable dispatch is quite expensive, and many of these idioms require at least one and often two virtual dispatches. Runtime tests and casts might be usable here instead, but a call site compiled for a fixed hierarchy of classes cannot be extended as easily as vtable-based dispatching. The only exception is a JIT using a technique called "polymorphic inline caches", which neither the CLR nor JVM utilize.

So I set about to measure the dispatch overhead on the CLR. Here's the source for my test, which consists of 3 classes in a hierarchy, each of which provide a visitor for double-dispatching, and a single method for the single virtual dispatch test. The driver code performs the runtime tests and casts at the dispatch site for the last case.
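The test source itself isn't included in this excerpt; here is a hedged sketch of the shape of such a benchmark (the actual code differs in detail):

```csharp
// A three-class hierarchy dispatched in the three styles measured.
interface IVisitor { int Visit(A a); int Visit(B b); int Visit(C c); }

abstract class Node
{
    public abstract int Value();              // single virtual dispatch
    public abstract int Accept(IVisitor v);   // double dispatch (visitor)
}
class A : Node { public override int Value() => 1; public override int Accept(IVisitor v) => v.Visit(this); }
class B : Node { public override int Value() => 2; public override int Accept(IVisitor v) => v.Visit(this); }
class C : Node { public override int Value() => 3; public override int Accept(IVisitor v) => v.Visit(this); }

class Valuer : IVisitor
{
    public int Visit(A a) => 1;
    public int Visit(B b) => 2;
    public int Visit(C c) => 3;
}

static class Dispatch
{
    // Runtime-test dispatch: a linear sequence of type checks at the
    // call site replaces the vtable lookup entirely.
    public static int RuntimeTest(Node n)
    {
        if (n is A) return 1;
        if (n is B) return 2;
        return 3;
    }
}
```

The benchmark drives each of the three styles over many iterations; the point is that the `is` chain is compiled to cheap type-identity checks, while each virtual call goes through the vtable.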

The numbers are total CPU ticks for the entire test, so lower numbers are better. It's a purely CPU-bound test, and I ran it on a Core 2 Duo 2.13GHz. I used .NET's high resolution timer, and I threw away the highest and lowest numbers:

Double Dispatch     Runtime Test        Single Dispatch
1658222872          1092469328          1136358792
1659474960          1092917368          1136520160
1659485472          1093638752          1136745536
1659866480          1094546584          1136985448
1660287768          1094639088          1137321136
1660488312          1094689040          1137526400
1660662648          1094856216          1137994576
1662110088          1095503256          1138300032
1666248296          1095734848          1138461264
1667377456          1095989528          1138609112
1672127744          1097126880          1141472360

Avg:
1662395645.09       1094737353.45       1137844983.27

As you can see, runtime tests and casts are surprisingly efficient, beating out even single dispatch. Any functional language targeting the CLR will definitely want to use runtime tests to implement pattern matching. I might try a deeper hierarchy at some point, but research has shown that up to 95% of algebraic types have 3 cases or less.

Unfortunately, when the complete set of concrete types is not known, a vtable dispatch is unavoidable, unless one resorts to runtime code generation to recompile the dispatch site. This is essentially a polymorphic inline cache, and it requires more sophisticated runtime infrastructure than the CLR natively supports, though it is possible if the language abstracts program execution away from how the CLR natively works.

So in summary, vtables are nice and simple, but rather inefficient on current hardware.

Friday, September 12, 2008

Creating DSLs is all the rage these days, and for good reason. Most abstractions are actually a little language struggling to get out. Design consists of creating abstractions with maximum power, and minimum restrictions, and reusing this abstraction as much as possible. Small domain-specific languages are the ticket.

However, language implementations are often written in fairly ad-hoc ways, and most interpreters are difficult to extend to compilers and partial evaluators. Languages are usually described by an "initial algebra", which basically just means that language terms are described by data types. Here's a simple definition of a language with integers, addition and subtraction:
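The definition itself is missing from this excerpt; a hedged C# rendering of such an initial-algebra encoding (the original was likely an ML-style datatype) might look like:

```csharp
// Terms are data: every constructor allocates a tagged node.
enum ExpTag { Int, Add, Sub }

class Exp
{
    public ExpTag Tag; public int Value; public Exp Left, Right;
    public static Exp Int(int i) => new Exp { Tag = ExpTag.Int, Value = i };
    public static Exp Add(Exp l, Exp r) => new Exp { Tag = ExpTag.Add, Left = l, Right = r };
    public static Exp Sub(Exp l, Exp r) => new Exp { Tag = ExpTag.Sub, Left = l, Right = r };
}

static class Interp
{
    // The interpreter decodes the tag of every node before dispatching.
    public static int Eval(Exp e)
    {
        switch (e.Tag)
        {
            case ExpTag.Int: return e.Value;
            case ExpTag.Add: return Eval(e.Left) + Eval(e.Right);
            default:         return Eval(e.Left) - Eval(e.Right);
        }
    }
}
```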

So a program in this language can consist of integers, subtraction or addition expressions. The interpreter for this language unpacks the expression by checking the tags, then evaluates the result by dispatching to the appropriate handler.

This is fine for simple languages, but it's not very efficient, since tag decode and dispatch can be expensive. Eliminating tags via program analysis, like partial evaluation, yields a simple compiler of sorts, and thus efficient DSLs.

Furthermore, we would like to exploit the type checking of the host language to ensure the type safety of the embedded language's interpreter, and type safety of any expressions of the embedded language. Unfortunately, more sophisticated languages written using this structure make it difficult to ensure that all the cases in the language are properly handled, which then requires the use of runtime errors.

If I understand the terminology correctly, they use a final algebra instead of an initial algebra to describe a language. In other words, instead of using data types to describe language expressions, they use functions. The language above is described using four functions provided by a module:
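The module signature isn't reproduced here; in C#, the final encoding can be sketched as an interface over an abstract result type (a hedged reconstruction: the paper's OCaml module differs, and the post mentions four functions where only a plausible core is shown):

```csharp
// The language as an interface, not a datatype: each "interpreter" is
// simply an implementation of L<T>.
interface L<T>
{
    T Int(int i);
    T Add(T l, T r);
    T Sub(T l, T r);
}

static class Terms
{
    // The embedded program 1 + (5 - 2), polymorphic in its interpreter.
    public static T Example<T>(L<T> m) =>
        m.Add(m.Int(1), m.Sub(m.Int(5), m.Int(2)));
}

// One possible backend: direct evaluation, with no tags to decode.
class Evaluator : L<int>
{
    public int Int(int i) => i;
    public int Add(int l, int r) => l + r;
    public int Sub(int l, int r) => l - r;
}
```

A pretty-printer or a code generator is just another implementation of L<T>; the catch, discussed below, is that C# can't abstract over the type constructor itself, only over concrete types T.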

This embedded program abstracts over the type of the "interpreter" used to evaluate it. In other words, if we build a compiler backend for the module type L, we can use it to evaluate this term. A compiler is overkill for this simple language, but the more sophisticated the embedded language, the more attractive a compiler looks.

This toy arithmetic language just illustrates the main ideas, but the language described in the paper is actually a full-fledged programming language, with if-statements, first-class functions, etc. The authors then extend their tagless interpreter into a full-fledged compiler using MetaOCaml's staging facilities, and then further extend the compiler into a partial evaluator. There are also more sections on abstracting the evaluation order of the embedded language, i.e. making the embedded language lazy even if the host language is strict, and on self-interpretation (i.e. metacircular interpretation).

It's a very exciting result, but as Oleg describes in the LtU thread linked to above, the abstraction over "interpreter" types L requires the host language to abstract over type constructors. As I explained before, C# cannot do this. In order to implement a program which abstracts over type constructors in some way, we have to resort to dynamics. In other words, we have to use casts.

I have devised three techniques to structure these interpreters to be mostly safe. The tradeoffs are slightly different in each, but in all cases, the interpreter and all embedded language terms are type-safe as long as the client doesn't try any hijinks. If they do, the resulting runtime errors are a perfectly acceptable compromise IMO. The embedded language I describe is also a full-fledged programming language with if-statements, lambdas, and so on, and all examples are written in C#.

The first implementation uses an ML-module-to-C# translation I described in a previous post on this blog. It's actually rather complicated, and you should probably skip over it unless you're masochistic enough to want to delve into nitty gritty details. This encoding is the safest of the three however, as I believe there is no way to trigger a runtime exception except by resorting to reflection.

The second implementation uses a much more natural functional-style encoding, where the language is described by an interface, and an expression is an extensible data type Exp that must be implemented by an interpreter. All Exp types are "branded" by the type of the interpreter, so it's not possible to accidentally mix expressions from different interpreters.

However, because Exp is extensible, this leaves the interpreter open to client injections, where a client can trigger a runtime exception if they subclass Exp themselves, and inject such an Exp value into a program term. This is only a problem if a trusted component evaluates program terms built by untrusted clients. The trusted component must still be aware that runtime errors are possible if clients can be malicious, which seems like a perfectly acceptable compromise for the simplicity of this solution.

The third implementation is a more object-oriented style encoding, reminiscent of Smalltalk. Each core type is now its own class, and operations on those types are methods on that class. For instance, an if-statement is the If method on the Bool class.

The language interface consists only of constructors for the core language types. This approach is more verbose than the functional-style implementation, but it permits a more natural embedding of program terms in C#, because we can now use operators and ordinary method calls to operate on program terms. Like the second implementation, this too can throw runtime errors based on client-injected Exp types, so the same caveats apply.

It's not yet clear which of the last two implementations is preferable, but I prefer the functional-style for its brevity if you're writing a real interpreter and/or compiler for a language. Functional-style programs excel at symbol manipulation. If you're constructing embedded programs from within C# only, then the last implementation is probably preferable, since the availability of operators can make program terms less verbose.

All three implementations abstract over the type of "interpreter" for program terms. It could be a straight-laced interpreter, or it could be a code generator backend (like System.Reflection.Emit). For instance, in the post for the third implementation, I briefly describe a JavaScript-generating backend for an embedded language I'm playing with. It would allow an embedded program to transparently execute server-side via an interpreter or client-side via JavaScript depending on a client's capabilities.

Consider the possible uses for transparently displaying AJAX pages server-side for thin HTML-only clients, and client side for real browsers. In other words, you write this code only once. Witness the power of DSLs.

Monday, August 18, 2008

A continuation is basically the state of a program at the time the continuation was captured. For a common example, consider a blocking system call for any operating system. If the system call blocks the process, the continuation of that process is placed on a blocked queue until the system call completes. When the operation completes, the continuation is removed from the queue and resumed with the result of the system call.

There has been plenty of well-deserved hoopla around continuations for web interaction. Navigating a website is a great deal like operating on a program's continuations. A page generated from a server-side program contains references to a number of program continuations in its links and forms. Clicking a link or submitting a form invokes the captured continuation which the server then executes. The result is yet another set of continuations on the resulting page, ad infinitum.

Unfortunately, most continuation frameworks are designed around server-side continuations, where the continuations are saved on the server, and the client receives only a secure token naming the continuation. The Waterken server is a particularly beautiful example of this for Java. A while ago I ported an older version of the Waterken server to ASP.NET. Here is a public continuation for the comments page, and embedded in the comments page is the editing authority, which is an unguessable capability URL for the continuation to edit the comments page.

The Waterken "web-calculus" is actually a full blown persistent, distributed object system with a binding for the web.

However, maintaining persistent server-side state significantly complicates many aspects of web systems. It makes load balancing servers difficult, and persistent continuations are single-threaded, so only one invocation can proceed at a time, which hurts scalability. Furthermore, if the continuations are persistent, as they are in Waterken which provides orthogonal persistence, then we must also contend with schema upgrade.

Two recent articles prompted me to re-examine these issues and devise a solution. First, Tommy McGuire posted a link to his "Infernal Device" paper on LtU. His paper describes a set of structures which can be used to develop an application by reifying the application's states as transitions in a finite state machine. The Transition objects are essentially application-specific partial continuations. The paper also details all of the different storage available in a browser, and how it can be exploited by a server to save state on the client.

For page-local state, the browser URL can hold up to 2kB on IE (other browsers can store more), and we have embedded form fields for POST requests. For site-local state, we have cookies, which can hold several kB.

Then I came across this blog post about scaling Seaside by reducing the need to capture server-side state. The best way to optimize expensive operations is by avoiding them completely!

However, much program state does not need to reside on the server. We can capture a limited set of program state in the browser itself. In essence, we serialize the program continuation in the browser's URL or in its form fields. We also encrypt the serialized form so that clients can neither inspect nor tamper with it.

And so our program continuations have become mobile objects that migrate between client and server and can survive server restarts as long as any server state the continuations access also survives restart. A set of load balanced machines need only share the encryption keys, and user requests seamlessly migrate between them.

I've created a small library for mobile continuations written in C#. I also provide an IHttpHandler for use with ASP.NET. A web application using this library doesn't need to use any web controls or .aspx pages. An application is structured as a set of continuations which implement two operations: Show, and Apply. Show outputs the continuation via a display-independent set of interfaces. Apply is invoked when a continuation receives a POST request. It takes a NameValueCollection containing all of the form's fields. This takes the place of a continuation's named arguments. I could implement a more elaborate system as found in the Waterken server where the fields are implicitly typed and named, but I felt a more limited scope is preferable for the time being.

By convention, continuations in the research literature are generally named "k", so the continuation interface is named K. This simple HelloWorld class implements the K interface and it embeds a continuation to an Adder object in its display:

The Adder class is equally simple, consisting of a text input and a submit button. This continuation builds a form which submits to itself. When applying the continuation, it adds its currently stored value to the value specified in the form and returns a new instance of Adder with the stored value.
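A hedged sketch of what such a continuation might look like (the library's actual interfaces differ; in particular, Show writes to display-independent interfaces rather than returning a string):

```csharp
using System.Collections.Specialized;

// By convention continuations are named "k", hence the interface name K.
public interface K
{
    string Show();                      // render this continuation
    K Apply(NameValueCollection args);  // handle a POST against it
}

public class Adder : K
{
    readonly int total;
    public Adder(int total) { this.total = total; }

    public string Show() =>
        $"Total so far: {total}. Submit a number to add.";

    // Applying the continuation returns a *new* Adder holding the sum;
    // the old continuation remains valid, like the back button expects.
    public K Apply(NameValueCollection args) =>
        new Adder(total + int.Parse(args["value"]));
}
```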

The library also provides a compact object serializer. HelloWorld serializes to 48 bytes. Adding some simple fields barely increases the size at all. You probably won't be entering any of these URLs by hand of course. There are a few more optimizations that can be made to the serializer as well. CrystalTech hosts ASP.NET in a medium trust environment, so the library also works for shared hosts. Some contortions were necessary, and fully general object serialization is not yet available (for instance, serializing delegates will fail).

By default, the library assumes all continuation objects can be migrated to clients. This encourages more scalable program structures. If some object needs to remain server-side, simply implement the ILocal interface. There are no operations, but the interface marks the class as non-mobile. The default behaviour in this case is to generate a random 64-bit token, and migrate this token instead. When the token migrates back to the server, it re-connects to the original object. If the server restarts, this state is lost, but a state server as found in ASP.NET can be built which will allow the state to survive restarts and be shared between machines.

Schema upgrade can still pose a problem unfortunately, and unlike with server-side state where the continuations are stored centrally, you can't use a tool to upgrade them. Altering method implementations does not pose an upgrade problem. Adding and removing fields is problematic however.

Fortunately, applications using the library are already structured for easy upgrade and permit a configurable policy for deprecation. Basically, don't touch existing classes, and only create new classes. Replace the use of old classes in the program with the new classes, then whenever you judge it appropriate, remove the old class entirely. On any serialization errors, the library will invoke an application-specific handler which returns a continuation. I think any conceivable upgrade policy is feasible given these primitives.

The current source for the library is available here. I plan to improve the mechanism for displaying continuations in a medium-independent fashion. Ideally, I'd like to be able to create a binding for Windows.Forms, and for JSON requests, so that the same continuation-structured application can be reused unmodified in different environments.

Monday, July 28, 2008

I have yet to find a comprehensive description of the various data representations used for garbage collection. This isn't an overview of garbage collection algorithms, but an overview of how to distinguish pointers from other values and how to traverse the data structures, which has seemingly gotten only short shrift in the literature. I aim to give an in-depth overview of the techniques I've come across or managed to puzzle out on my own.

Boxing: all values are allocated on the heap, so basically everything is a pointer, thus, there is no ambiguity between pointers and values.

Tagged pointers: pointers are distinguished from ordinary values by some sort of tag. This information is often encoded in the pointer or value itself for a very compact representation.

Type-passing: each parameter of a function is associated with a set of operations, usually referred to as a "dictionary", which is similar to an interface in object-oriented languages. This dictionary defines a collect() operation for each type, which will traverse the structure looking for garbage. This dictionary can be passed into every function, or it can be reconstructed from the stack.
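As a concrete illustration of the tagged approach (my own sketch; details vary per runtime), a common scheme steals the low bit of each word:

```csharp
// Low-bit tagging: heap pointers are always aligned, so their low bit
// is zero; a set low bit therefore marks an unboxed integer, at the
// cost of one bit of integer range.
static class Tagged
{
    public static ulong TagInt(int i) => ((ulong)(uint)i << 1) | 1UL;
    public static bool IsInt(ulong word) => (word & 1UL) != 0;
    public static int UntagInt(ulong word) => (int)(uint)(word >> 1);
}
```

The collector then treats any word with a clear low bit as a pointer to traverse, and skips the rest.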

Why Are Special Representations Needed?

One might first ask why special representations are even needed. At compile-time, it's generally perfectly clear what types are being used, as well as the types of the various data structures.

While this is true of simple C-like languages, it isn't true of polymorphic languages. C is what's known as monomorphic, meaning all parameters have a fixed, known type at compile-time.

By contrast, polymorphic languages can have function parameters that accept values of any type, and this type is not fixed at compile-time. A simple example is the identity function:

val id: 'a -> 'a

This function accepts a parameter of any type, and returns a value of the same type. But since you can't know this type ahead of time, you need some way to encode the type in a running program so the garbage collector has enough information to traverse it in its search for garbage.

This problem only really crops up with the combination of polymorphism and separate compilation, meaning you compile a program to a single machine code file which can then be distributed without prior knowledge of how it will be instantiated and used. The .NET CLR is an example of a platform which supports polymorphism, but not separate compilation, since it uses the JIT to specialize methods that use generics. A specialized method is GC'd differently depending on whether it's instantiated with a reference type or a value type. In separately compiled languages, this is not true, hence the need for a special representation.

Boxing

All boxed values are allocated in the heap, including integers and floats. This is the most straightforward representation, and the least efficient of the options both in terms of space and time.

The extra space use comes from the additional pointer needed to indirect to an int, and the additional allocator headers required to track it.

The additional runtime overhead is incurred when repeatedly indirecting the pointers to access the values, and the additional garbage collector pressure from having so many small values.

Allocating many small types like this could also lead to significant memory fragmentation, which balloons memory use, and induces poor locality of reference.

A boxed version of the identity function would look like this:

void *id(void *v) { return v; }

Basically, we don't know anything about the value involved, and the void pointer just points to a heap location where the value is stored.

There is a single flag stored in the allocator headers to indicate whether this is a compound structure or a value. An int is a value, and an (int,int) pair is a compound. All compounds consist of pointers, so the collector just recursively traverses all of the pointers in the structure, and halts when it reaches values. For compounds, the size of the structure must also be encoded in the header.
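To make this concrete, here is a hedged sketch in C. The struct layout and names are illustrative, not taken from any real runtime: a boxed cell whose header distinguishes values from compounds, plus the recursive traversal just described.

```c
#include <stdlib.h>
#include <assert.h>

/* A hypothetical boxed heap cell (illustrative layout, not any real
   runtime): the header records whether this is a leaf value or a
   compound, and for compounds, the number of pointer fields. */
typedef struct cell {
    unsigned compound : 1;    /* 0 = value, 1 = compound */
    unsigned size     : 31;   /* pointer count, if compound */
    struct cell *fields[];    /* every field of a compound is a pointer */
} cell;

/* Recursive traversal: follow every pointer in a compound, halt at
   values. Returns the number of cells reached. */
static unsigned scan(const cell *c)
{
    unsigned n = 1;
    if (c->compound)
        for (unsigned i = 0; i < c->size; ++i)
            n += scan(c->fields[i]);
    return n;
}
```

Note how little information the collector needs here: one flag and a size, at the cost of boxing everything.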

Tagged Pointers

Tagged pointers have a long tradition in functional languages. Typically, all of the tag bits are encoded within pointers and values themselves, so the representation is very compact. This is important for functional languages in particular, since they encourage the allocation of many small objects.

Modern memory allocation algorithms generally segregate free space in variously sized buckets. The smallest bucket is generally the size of a pointer, or perhaps twice the size of a pointer. This is generally 4 bytes on a 32-bit machine, so at the very least, the pointer will always point to an address aligned on a 4-byte boundary.

However, memory is generally byte-addressable, meaning a pointer can address bytes aligned on a single-byte boundary. Since allocations are aligned on at least a 4-byte boundary, the lowest two bits of a pointer are wasted space: they will always be 0. So let's use that wasted space for something!

Let's say we use the lowest bit to differentiate between pointers and integers on the stack and in data structures. This means an int will no longer be 32-bits, but 31-bits, with 1 bit going to this flag. If the lowest bit is 0, it's an int, and if the low order bit is 1, it's a pointer. This means we can scan a stack frame or a data structure one word at a time and easily distinguish integers from pointers by just checking the lowest bit.

So now we can have fully unboxed integers, which saves a lot of time and space in programs since we no longer need to allocate them on the heap. Anything larger than a word must still be heap-allocated.

Field offset calculations now require us to omit the lowest bit, which involves subtracting 1 from any offset. Since most offsets are statically known, this costs nothing at runtime.
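A sketch of this representation in C (the macro names are mine, and tag conventions vary between implementations — some tag ints with 1 instead):

```c
#include <stdint.h>
#include <assert.h>

/* Low bit 0 = unboxed 31-bit int, low bit 1 = pointer, which is legal
   because allocations are aligned on at least a 4-byte boundary.
   Macro names are illustrative, not from any real runtime. */
typedef uintptr_t word;

#define IS_INT(w)     (((w) & 1) == 0)
#define IS_PTR(w)     (((w) & 1) == 1)
#define TAG_INT(n)    ((word)(n) << 1)           /* drops the top bit */
#define UNTAG_INT(w)  ((intptr_t)(w) >> 1)       /* arithmetic shift restores sign */
#define TAG_PTR(p)    ((word)(p) | 1)
#define UNTAG_PTR(w)  ((void *)((w) & ~(word)1))

/* Field access untags on the fly: the statically known offset absorbs
   the -1, so the adjustment costs nothing at runtime. */
#define FIELD(w, i)   (((word *)((w) - 1))[(i)])
```

Scanning a frame is then just a loop that checks the low bit of each word.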

There is one remaining problem with the above representation, namely, we still don't know the size of a structure! When the GC traverses a structure, it will iterate right off of the end since it doesn't know when to stop.

The paper I linked to above places the size in a header word present in every structure. This header word is split into multiple fields. On a 32-bit machine, the lowest 22 bits are used for the size of the structure in words, the next two bits are used by the garbage collector (typically a mark-sweep), and the last 8 bits are used for the constructor tag, ie. Nil or Cons. The constructor tag is typically present in every data structure, but using a whole word for it is somewhat wasteful, so this more compact representation is quite useful.
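The packing can be sketched with a few shifts and masks (bit positions taken from the description above; the helper names are mine):

```c
#include <stdint.h>
#include <assert.h>

/* The header word described above, on a 32-bit layout: bits 0-21 hold
   the size in words, bits 22-23 the GC's bits, and bits 24-31 the
   constructor tag. Helper names are illustrative. */
typedef uint32_t header;

#define HDR_MAKE(size, gc, ctor) \
    ((header)(size) | ((header)(gc) << 22) | ((header)(ctor) << 24))
#define HDR_SIZE(h)  ((h) & 0x3FFFFFu)         /* low 22 bits */
#define HDR_GC(h)    (((h) >> 22) & 0x3u)      /* 2 GC bits */
#define HDR_CTOR(h)  ((h) >> 24)               /* 8-bit constructor tag */
```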

This representation may also permit some interesting optimizations. Since word-sized values are unboxed, all heap allocated structures will be at least 8-bytes in size, consisting of the header word followed by a number of data words (a commenter pointed out that this doesn't necessarily imply 8-byte alignment however; while true, as mentioned above allocators segregate memory into variously sized buckets, and so we need to simply ensure that these buckets are multiples of 8-bytes). This means the third lowest order bit is also unused, giving us two free bits in every pointer.

Consider the following type declaration:

type 'a Foo = Foo1 | Foo2 of int | Foo3 of 'a | Foo4 of 'a * 'a

Since Foo1 does not contain any embedded values or pointers, we don't need a pointer for it at all and it can just be represented by a simple unboxed integer.

Foo2 contains an integer, so we must allocate a heap structure to contain this case.

Foo3 and Foo4 are the interesting cases. 'a is a polymorphic value, and if instantiated with a pointer type, we don't need to allocate a heap structure for it. Basically, we can use the two free bits to encode up to 3 constructors for such values, and remove a level of indirection. Pattern matching on such structures is broken down into 3 cases:

If it's an int, then it's one of the empty constructors.

If it's a pointer, and the second and third lowest order bits are clear, extract the constructor tag from the header, and the value is just after the header.

Else, extract the constructor tag from the pointer, and the pointer is a pointer to the value.

However, to maintain separate compilation, we cannot assume that 'a is a pointer type, as it could be an int. Therefore only Foo4 can benefit from this inlining optimization in this particular case.

Coming back to the question of identifying structure sizes, an alternative representation, which I have not seen described anywhere, is to use the unused third lowest bit as a mark bit to terminate GC scanning, and the second lowest bit to distinguish structures with nested pointers from those that are pointer-free.

A '1' in the second lowest order bit position flags a "value structure" which contains no pointers. In other words, it consists entirely of integer-sized data, so the GC does not need to scan this structure. Alternately, a '0' in this position indicates a "compound structure" which contains pointers, and which must be scanned by the GC.

A '1' in the third lowest order bit position flags the last pointer in a data structure. This bit is only set in pointers within data structures, and is clear everywhere else.

Note that only the last pointer is used for the halt condition, since the collector cares only about pointers and ignores unboxed values. The compiler can optimize garbage collection by placing all pointers at the beginning of a structure. Also, when extracting a pointer from a structure or writing one into a structure, it should be masked to clear or set the appropriate bits.
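A rough sketch of the scan loop under this representation. This is my reading of the scheme, with illustrative names; it assumes pointers precede unboxed data and that the untagged pointer addresses the structure's first field:

```c
#include <stdint.h>
#include <assert.h>

typedef uintptr_t word;

/* Bit conventions from the text: bit 0 set marks a pointer, bit 1 set
   marks a pointer to a pointer-free "value structure", and bit 2 set
   marks the last pointer within a structure. Names are illustrative. */
#define IS_VALUE_STRUCT(w)  (((w) & 2) != 0)
#define IS_LAST_PTR(w)      (((w) & 4) != 0)
#define UNTAG(w)            ((const word *)((w) & ~(word)7))

/* Scan a structure's fields: pointers come first, so we walk them,
   recursing into compounds and halting at the flagged last pointer.
   Returns the number of pointers traced. */
static unsigned scan(const word *fields)
{
    unsigned traced = 0;
    for (;; ++fields) {
        word w = *fields;
        ++traced;
        if (!IS_VALUE_STRUCT(w))
            traced += scan(UNTAG(w));   /* compound: trace its pointers too */
        if (IS_LAST_PTR(w))
            return traced;              /* flagged last pointer: stop */
    }
}
```

No size field is consulted anywhere, which is exactly what frees the header bits discussed next.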

Using this representation, we free the 22 bits used for the structure size in the header. We can thus reserve a full 24 bits for the GC, which expands our choice of GC algorithms considerably. Even ref-counting becomes feasible with 24 bits in the header.

One possible downside of this representation is that the size header was also used for array lengths. The length must now be stored in its own field in the structure, so we've effectively grown the array type by one word. Considering the size of arrays compared to a single word, this seems like an acceptable overhead for this representation's uniformity and flexibility.

Type-Passing

Type-passing, aka tagless GC, is the most sophisticated GC technique. There are two approaches that I'm aware of:

Dictionary-passing: a dictionary consisting of functions specific to the type is passed along with all values of that type. These functions know how to collect values of that type.

Type Reconstruction: basically, the concrete type of a polymorphic function is reconstructed at runtime by inferring the type from previous activation frames. Monomorphic functions have all the type information needed to fully specify any polymorphic parameters, so we just search the stack for the type information stored there.

The benefits are that pointers are just pointers, integers are full-sized integers, and no special meaning is assigned to their values or any bits contained within the word.

To my knowledge, only two languages have used this technique, Id and Mercury, and only Mercury is still available as far as I know. The .NET CLR's technique can also be considered a form of type-passing.

The downsides of dictionary-passing are that every polymorphic parameter is paired with a dictionary parameter, thus lengthening the parameter lists of all polymorphic functions. The additional pushing and popping of these parameters can outweigh the advantages gained from the tag-free representation.
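To make the cost concrete, here is a hedged C sketch of dictionary-passing. The names are mine, and real implementations pass richer dictionaries; the point is only that every polymorphic parameter drags a dictionary argument along with it:

```c
#include <assert.h>

/* A per-type dictionary: the operations the runtime needs for values
   of that type, here just the GC's trace function. Illustrative only. */
typedef struct dict {
    void (*trace)(const void *value, unsigned *live);
} dict;

/* A polymorphic function: the caller passes the dictionary alongside
   the value, growing the parameter list — the overhead noted above. */
static const void *poly_id(const dict *d, const void *v)
{
    (void)d;    /* a collection occurring here would consult d->trace */
    return v;
}

static void trace_int(const void *value, unsigned *live)
{
    (void)value;
    ++*live;    /* an int contains no pointers; just count it live */
}
```

In exchange, values themselves stay untagged: pointers are plain pointers and ints are full-width.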

Type-reconstruction can remove the need to pass this parameter at the expense of additional runtime overhead. Basically, we don't need to pass this dictionary to every function call, we just need to get it paired with the parameter in the monomorphic function which calls the first polymorphic function. As we walk the stack we can then extract the dictionary for a given parameter from the last monomorphic function. This becomes more complicated with closures, since the dictionaries for polymorphic parameters must now be saved in the closure environment as well.

This process is essentially a form of type inference performed at runtime. Unfortunately, type inference often has pathological corner cases with exponential running time. These cases are unlikely, but inference in general has substantial overhead (see the Id paper).

Acknowledgments

[Edit: GHC's recent pointer tagging paper actually discusses the constructor inlining optimization I describe above, along with some hard numbers to quantify how many constructors can be eliminated. Turns out that >95% of constructors contain 3 cases or less, so the technique could amount to a substantial savings. Theoretically, it seems that every second nested data type could be unboxed in this fashion.]

Friday, July 18, 2008

My last post generated a bit of discussion here and on Reddit. After further testing, I discovered that stack management didn't work properly on Windows. It would seem that embedded frame pointers on the stack are used to ensure that we are on the same stack, at least in DEBUG mode, and so growing or shrinking the stack generated cryptic errors.

These problems motivated me to explore alternate methods of implementing coroutines. Fundamentally, the current state of a computation is encapsulated in the stack. So switching between two concurrent computations means modifying the stack, either by swapping out the old stack for an entirely new one, or by copying the relevant portion of the stack to a save area, and restoring it when switching back. Coroutines and threads typically use stack switching, and continuations have traditionally used some sort of copying.
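For illustration, stack switching can be sketched on POSIX systems with the ucontext API. This is not how libconcurrency is implemented — just a minimal demonstration of handing control back and forth between two stacks:

```c
#include <ucontext.h>
#include <assert.h>

/* A minimal stack-switching sketch using POSIX ucontext (illustrative
   only). Each switch saves one side's registers, swaps stack pointers,
   and restores the other side's registers. */
static ucontext_t main_ctx, coro_ctx;
static char coro_stack[64 * 1024];
static int trace[4], t;

static void coro_body(void)
{
    trace[t++] = 2;
    swapcontext(&coro_ctx, &main_ctx);   /* yield: back onto main's stack */
    trace[t++] = 4;                      /* runs when resumed below */
}

static void run(void)
{
    getcontext(&coro_ctx);
    coro_ctx.uc_stack.ss_sp = coro_stack;
    coro_ctx.uc_stack.ss_size = sizeof coro_stack;
    coro_ctx.uc_link = &main_ctx;        /* where to go when coro_body returns */
    makecontext(&coro_ctx, coro_body, 0);

    trace[t++] = 1;
    swapcontext(&main_ctx, &coro_ctx);   /* switch onto the coroutine's stack */
    trace[t++] = 3;
    swapcontext(&main_ctx, &coro_ctx);   /* resume the coroutine after its yield */
}
```

Note that nothing is ever copied: the coroutine's locals simply live on its own stack between switches.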

The initial coroutine implementation used stack switching, which is fairly efficient, incurring only a register save, a pointer update, and a register restore. Stack copying involves a register save, a copy to the save area, a copy from the save area, and a register restore. The additional copies incur a certain overhead.

The benefits of copying are that it's virtually guaranteed to work across all platforms, and that all stack data are restored to their original locations so the full semantics of C are available. Stack switching restricts certain C operations, such as taking the address of a local variable.

My naive copying implementation was roughly 40 times slower than stack switching: about 300,000 context switches/sec on my Core 2 Duo, vs. ~12,500,000 context switches/sec for stack switching. I reduced this gap considerably by caching stacks (~1,500,000 context switches/sec); part of the stack save area is wasted, but at least we're not allocating and freeing on each context switch.

The copying implementation can be optimized further using lazy stack copying. Basically, only a portion of the stack is restored, and an underflow handler is set up as the return address of the last stack frame. The underflow handler then restores the next portion of the stack and resumes the computation. If the underflow handler is never invoked, then we only need to copy that small portion of the stack back to the save area, so this ends up being a huge win.

Unfortunately, while establishing an underflow handler is trivial to do in assembly, I haven't been able to figure out how to accomplish this in C. It would be a neat trick though, and I'd estimate that it would bring the performance of a copying implementation very close to a stack switching implementation.

I've also started playing with an implementation of delimited continuations for C, since this has the potential to reduce the amount of copying needed. It's tricky to do in C though, so I don't have anything solid yet.

For expediency, I have also implemented coroutines using Windows Fibers. Unfortunately, Fibers don't support any sort of copy or clone operation, so I had to retire coro_clone for the time being. This means my coroutine library can no longer be used to implement multishot continuations. Of course, this capability is trivial in a stack copying implementation. Hopefully I'll be able to figure out a way to improve the performance of copying coroutines/continuations in a portable way.

Wednesday, July 16, 2008

I've just uploaded a functional coroutine library for C, called libconcurrency. It's available under the LGPL. I think it's the most complete, flexible, and simple coroutine implementation I've seen, so hopefully it will find some use.

Next on the todo list are some more rigorous tests of the corner cases, and then extending libconcurrency to scale across CPUs. This will make it the C equivalent of Manticore for ML.

There is a rich opportunity for scalable concurrency in C. Of course, I only built this library to serve as a core component of a virtual machine I'm building, and that's all I'm going to say about that. ;-)

Monday, May 26, 2008

In a slight departure from my usual computer science focused musings, I'm going to analyze another logical conundrum that has raged for centuries. Which came first, the chicken or the egg?

One of my many online debates clued me into the fact that there is a widespread belief that the egg came first. I even found a "paper" providing an in-depth analysis concluding the same. Unfortunately, the analysis appears flawed. An e-mail to the author bounced, so I figured I might as well post my brief analysis here.

Base case: a single-celled organism, reproducing asexually via mitosis (given our current knowledge).

n implies n+1: species n produces, by its own reproduction mechanism Rn, an offspring n+1 with a slightly different reproduction mechanism, Rn+1.

Conclusion: the "chicken" came first.

In this case, our "chickens" were not hatched from "chicken eggs", but were instead hatched from the "chicken's" progenitor's egg type. The authors of the paper attempted to disregard this semantic interpretation under "Chicken Precision", but as this is a metaphor, a "chicken" is merely a stand-in for a particular species to be substituted at will.

Thus, the only universally quantifiable proposition is that the "chicken" came first, since the first egg-bearing species Sn+1, with reproduction mechanism Rn+1, was produced by some non-egg-based reproduction mechanism, Rn. The contrary proposition that the egg came first contradicts the base case of the above inductive argument, where we know reproduction was based on asexual mitosis.

Thursday, April 17, 2008

My day job has stuck me with C#, but I try to make the best of it, as evidenced by my FP# and Sasa C# libraries.

One thing that still gets in my way more than it should is O/R mapping. No other mapper I've come across encourages a true object-oriented application structure. Granted, I've only really used NHibernate, and I had built my own mapper before that was even available, but I've read up quite a bit on the other mappers.

By true OO structure, I mean that all application objects are constructed only from other application objects, without dependencies on environment-specific code (ie. whether you're running under ASP.NET, Windows Forms, Swing, etc.). A pure structure encourages a proper separation between core application code and display and controller code, which allows more flexible application evolution.

Instead, controller logic often manually constructs application objects, passing in default arguments to properly initialize the required fields. This means constructor and initialization code must be duplicated when running in another environment, or tedious refactoring is needed when changing the constructor interface. Further, the defaults are hardcoded in the code, which means changes in defaults require an application upgrade.

Instead, O/R mappers should promote a factory pattern for constructing application objects. Factories themselves are constructed when the application is initialized, and are henceforth singletons within a given application instance. O/R mappers don't support or encourage factories or singletons in this manner however, as they always map a key/identifier to an object instance. Factories are slightly different, as they are generally singletons.

Of course, a Table with dimensions of 0'x0' is invalid, so we need to ensure that a Table is initialized with a proper length and width. We can pass in a pair of default dimensions when constructing a Table instance in a controller, but chances are the default values will be the same every time you construct an instance of Table. So why duplicate all that code?

For instance, suppose we have another class "DiningSet" which consists of a Table and a set of Chairs. Do we call the Table constructor with the same default values within the DiningSet constructor?

Of course, many of you might now be thinking, "just create an empty constructor which invokes the parameterized constructor with the default values; done". All well and good because your language likely supports the int type very well. Now suppose that constructor needs an object that cannot be just constructed at will from within application code, such as an existing object in the database.

The IProductFactory interface abstracts over all factories which construct products. Any parameters that the base Product class accepts in its constructor are passed in to the Make() method, as this is shared across all Product factories. TableFactory is mapped to a table with a single record containing the default length and width values. If the constructor requires an existing database object, this can be referenced via a foreign key constraint, and the O/R mapper will load the object reference and its dependencies for you.

Since factories are generally singletons, it would be nice if O/R mappers provided special loading functions:

public interface ISession
{
    T Load<T>(object id);
    T Singleton<T>();
}

This models an O/R mapper session interface after the one in NHibernate. Note that the special Singleton() method simply loads the singleton of the given type without needing an object identifier.

Our controller code is thus reduced to:

...
Product table = session.Singleton<TableFactory>().Make();
...

Which encapsulates all the constructor details in application objects, does not hardcode any default values since they live in the database and can be upgraded on the fly, isolates refactorings which alter the Table constructor interface to the TableFactory alone, and simplifies controller code as we don't need to load any objects. This is a "pure" object-oriented design, in that the application can almost bootstrap itself, instead of relying on its environment to properly endow it with "god-given" defaults.

This approach also enables another useful application pattern which I may describe in a future post.

[Edit: I've just realized that the above is misleading in some parts, so I'll amend soon. Singletons aren't needed as much as I suggest above.]

Tuesday, April 1, 2008

Calculating permutations has some fairly nifty algorithms out there. I recently ran into a permutation problem for which I couldn't find an existing algorithm. I admit that I didn't look too hard though. Basically, I needed the permutations of the elements of a set of size N over K slots. However, the permutations should include duplicate elements from the set, as K > N is a valid configuration. This corresponds to N^K permutations. Most algorithms I found did not permit duplicate elements.

As an example of an application for such a permutation algorithm, imagine the set of all function signatures of arity K-1 over N types. This corresponds to K slots with N possible choices for each slot.

I devised a fairly simple implementation of such a permutation algorithm. Essentially, N forms the base of an arbitrary-precision integer of size K. In other words, we have an array of K digits, each less than N, which index our set. To permute, we simply increment the first element and propagate any overflow across the rest of the array. If the carry is 1 when we're done iterating over the whole array, then we're done generating permutations.

Calculating permutations has been reduced to counting! Here is the C code:
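The original listing isn't reproduced here; a minimal sketch of the counting scheme just described — treat the slots as digits of a base-N number and increment with carry — might look like this (function and variable names are mine):

```c
#include <stdio.h>
#include <assert.h>

/* Enumerate all N^K assignments of K slots drawn (with repetition)
   from a set of size N by counting in base N: increment slot 0 and
   propagate any carries. A sketch, not the original listing. */
static unsigned long enumerate(unsigned n, unsigned k)
{
    unsigned slots[32] = { 0 };     /* digits indexing the set; k <= 32 assumed */
    unsigned long count = 0;
    unsigned carry;
    do {
        /* visit the current permutation, e.g. print the slots */
        for (unsigned i = 0; i < k; ++i)
            printf("%u%c", slots[i], i + 1 == k ? '\n' : ' ');
        ++count;
        carry = 1;                  /* increment the base-N number */
        for (unsigned i = 0; carry && i < k; ++i) {
            slots[i] += 1;
            carry = slots[i] / n;
            slots[i] %= n;
        }
    } while (!carry);               /* a final carry of 1 means we wrapped: done */
    return count;
}
```

For N = 3 and K = 2 this visits 9 assignments, i.e. 3^2, as expected.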

The only assumption is that N is less than UINT_MAX. Also, for some insane reason that I cannot fathom, I can't get the slots to print in reverse order. The trivial reversal of the print loop induces an access violation under Windows:

Monday, February 11, 2008

Many people find themselves stumped by the so-called Blue-Eyed Islanders puzzle. There is also much controversy over its supposed solution. I'm going to analyze the problem and the solution, and in the process, explain why the solution works.

To begin, let's modify the problem slightly and say that there's only 1 blue-eyed islander. When the foreigner makes his pronouncement, the blue-eyed islander looks around and sees no other blue eyes, and being logical, correctly deduces that his own eyes must be blue in order for the foreigner's statement to make sense. The lone blue-eyed islander thus commits suicide the following day at noon.

Now comes the tricky part, and the source of much confusion. Let's say there are 2 blue-eyed islanders, Mort and Bob. When the foreigner makes his pronouncement, Mort and Bob look around and see only each other. Mort and Bob thus both temporarily assume that the other will commit suicide the following day at noon. Imagine their chagrin when they gather in the village square at noon the next day, and Mort and Bob both look at each other in surprise because Mort thought Bob was going to commit suicide, and Bob thought the same of Mort! Now both Mort AND Bob know their own eye colour is blue, and they will both commit suicide on day 2.

The very same argument can be extended to 3 blue-eyed islanders, Mort, Bob and Sue, who will commit suicide on the third day at noon. The day of the pronouncement, the three of them see each other, and Sue assumes Mort and Bob see only each other. Being logical, she thus deduces that they will commit suicide on the second day, by the above argument. Mort and Bob each see the same number of blue eyes as Sue, and thus reach the very same conclusions.

Imagine Sue's chagrin, when she gathers in the village square on the second day, and Mort and Bob are both surprised that she's not committing suicide! She now knows that her eye colour is also blue, and all three of them will kill themselves on the third day.

This inductive argument can be generalized to N blue-eyed islanders, where all N of them will commit suicide on the Nth day after the pronouncement. QED.

Saturday, January 19, 2008

Extending the work in my last post, I've developed a way to express an almost type-safe, general monad in C#. Similar to my module translation, the single monad object becomes a pair of co-operating objects, only one of which the monad implementor must define. Since C# cannot abstract over type constructors, I had to exploit the only feature that could accommodate the flexibility I needed: C#'s dynamic typing.

Looks a bit complicated, I know, but every piece is well-motivated. IMonadOps is fortunately the only interface that new monads must implement. Note the type constraints. The interface and the general monad both have a "phantom" or "witness" type M, constraining the type of the monad operations to the same IMonadOps type that created the monad. This means that the monad is entirely closed to extension and inspection.

Each IMonadOps is effectively a stateless singleton. The constraint to a struct is merely an optimization. Every IMonadOps implementor is actually very similar to the body of an ML module. If I want to invoke Identity.Zero, I can do it thusly:

default(Identity).Zero<int>();

This reveals the magic I used to define the Monad type. Whenever the Monad<M,T> type needs to invoke the underlying monad operations, it invokes the operation on the corresponding singleton M just like the above. Indexing the Monad by an IMonadOps implementation M is like a linking step between Monad and M.

You can see the dispatching at work here. All monad operations are dispatched to the "linked" methods of M. In a sense, we have succeeded in abstracting over the concrete implementation of Monad.

Now comes the catch: it's not fully type-safe, because the encapsulated state of the monad must be stored as 'object'. This means that each monad body must ensure it properly casts to and from the appropriate type. This is again due to the type constructor abstraction limitation.

Your first thought might be, why can't the encapsulated state simply be 'T'? Well, if all you wanted was an Identity monad, then that would be fine. But consider the Maybe monad:

The encapsulated state is an Option of type T, not a T. As another example, the List monad encapsulates a list of T's. Since we can't abstract over the type constructor for the encapsulated state, we thus need to resort to dynamic typing.

Now comes a slightly bizarre part: what are the injection and projection functions for? Well, despite the fact that IMonadOps is the "internal implementation" of Monad, it doesn't have direct access to the monad's internals. Unfortunately, sometimes that access is needed. Consider the List monad:

ListM needs access to the private state of the returned list of Monads in order to flatten the list, but that access is not permitted since Monad is a separate, encapsulated type. There is no way to make this state available using inheritance or access modifiers, without also permitting the state to escape inadvertently.

Instead, the Monad provides an injection/projection pair, which are used to construct monad instances when given private state, or read out the private state of a monad instance, respectively. Note that encapsulation is maintained since this ability is granted only to the implementor of a given monad, which is already trusted with its own state.

I suspect there's a more efficient way to share the monad's state, but I'm a little tired from standing on my head all day for C#, so if anyone has any ideas, I welcome them. :-)

While this encoding is less efficient than the one described in my previous post, it's safer in some ways for users and monad implementors alike, and I proved to myself that C#'s type system is powerful enough to encode the monad interface if you contort yourself appropriately. This technique for abstracting over type constructors might even be usable in my tagless Orc implementation.

Friday, January 18, 2008

I've found other monad tutorials very frustrating. They are typically written in expressive languages with type inference, which permits concise descriptions, but obscures the underlying type structure. I've been struggling with writing something close to a monad in C# for quite some time, simply because none of these tutorials give a sufficiently complete description of a monad's structure. Surprisingly, the Wikipedia page on monads helped clarify what I was missing.

-- return is a type constructor that creates monad instances
return :: a → m a

-- bind is a function that combines a monad instance m a with a
-- computation that produces another monad instance m b from a's
-- to produce a new monad instance m b
bind :: m a → (a → m b) → m b

So the monad type 'm' has a function 'return' that constructs instances of that type, and 'bind', which converts an instance of that monad into another monad instance by passing its private state to the provided function. For instance, meet Haskell's Identity monad:

What's not at all obvious from any of the above signatures, or any of the existing tutorials, is that the monad that the function f returns, must be the same monad type. If you're in the list monad, f must return a new instance of the list monad. If you're in the identity monad, f must return a new instance of the identity monad. This simple fact eluded me for quite some time, and explains why the monad interface cannot be expressed in C#. We can still program using monads, but the interface can't be enforced by the type system.

Note that all of these monads in C# share a common structure. They have at least one constructor used to wrap a value (called 'return' earlier), and they all have a Bind method which operates on the value encapsulated in the monad, and maps it to a new instance of the monad. Since we have such a similar structure, is there a way to declare an interface or an abstract base class declaring the signature for the Bind method?

Note the two emphasized sections: the class constraint on Monad, and the type parameter Identity provides to Monad when inheriting from it. They are both used without a type argument. This is illegal in C#/.NET, but it's perfectly legal in languages with more powerful type systems, such as those with "kinds". Identity<int> has kind *, while Identity without a type argument has kind *⇒*, ie. a function that constructs a type of kind * when given a type of kind *. This is why monads are not so easily translated into languages like C#.

While quite simple, it's not as efficient as it should be, since C#/.NET doesn't support proper tail calls. You can see in that source file that I derived FoldLeftDerived from SuperFold. Deriving FoldRight is a bit trickier, so I have to think about it. The simple, inefficient answer is to simply reverse the list.

I'll probably remove SuperFold from my purely functional list, since I don't think it will end up being useful in C#. The iterated FoldLeft I defined is more efficient given the lack of proper tail calls, and the iterated version also supports early termination.
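The early-termination variant mentioned here can be sketched like this (a hypothetical API of my own devising, not the post's actual code): the folding function returns the new accumulator together with a flag saying whether to continue.

```typescript
// Fold that can stop early: f returns the updated accumulator plus
// a flag indicating whether to keep folding. (Hypothetical design.)
function foldLeftWhile<T, A>(
  items: T[],
  acc: A,
  f: (acc: A, x: T) => [A, boolean]
): A {
  for (const x of items) {
    const [next, keepGoing] = f(acc, x);
    acc = next;
    if (!keepGoing) break;
  }
  return acc;
}

// Sum elements, stopping once the running total exceeds 10.
const total = foldLeftWhile([4, 5, 6, 7], 0, (a, x) => {
  const sum = a + x;
  return [sum, sum <= 10];
});
console.log(total); // 15 -- the 7 is never visited
```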

[1] It's very frustrating to see how close MS gets to a truly general and useful abstraction, only to lock it down for no apparent reason. What good reason is there for Nullable to be restricted to struct types? If you give it more than a second's thought, I think you'll realize that there is no good reason. Nullable would be the option type if it weren't for this restriction!
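To make the footnote concrete: an option type has no reason to care whether its payload is a value type or a reference type. A minimal sketch (TypeScript for illustration; in .NET, Nullable&lt;T&gt; rejects reference types for T):

```typescript
// A simple option type, usable with any payload type -- the
// unrestricted version of what Nullable<T> could have been.
type Option<T> = { kind: "some"; value: T } | { kind: "none" };

const some = <T>(value: T): Option<T> => ({ kind: "some", value });
const none: Option<never> = { kind: "none" };

function getOrElse<T>(opt: Option<T>, fallback: T): T {
  return opt.kind === "some" ? opt.value : fallback;
}

console.log(getOrElse(some(42), 0));      // 42 (a "value type" payload)
console.log(getOrElse(some("hi"), ""));   // "hi" (a "reference type" payload)
console.log(getOrElse(none, "fallback")); // "fallback"
```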

Tuesday, January 8, 2008

As Chung-chieh Shan pointed out, my encoding of modules in C# is somewhat limited. In particular, I cannot abstract over type constructors, which is to say, C# is missing generics over generics. Consider the Orc.NET interpreter:

This is the result of my translation, which was necessitated by the "Where" method. Where introduces a dependency which currently cannot be expressed with ordinary C# constraints, so the module encoding is necessary.

The above interface is a direct, faithful implementation of the Orc semantics. The implementation I currently have is an interpreter for those semantics. What if I want to provide a compiler instead? The interface should remain the same, but the implementation should differ. This is a classic abstraction problem.

It's clear that the implementations can be changed at compile-time, but providing a different Orc implementation at runtime, something very natural for OO languages, does not seem possible. The reason is that C# lacks the requisite polymorphism.

A generic type is a type constructor, which means that from an abstract type definition, Exp&lt;T&gt;, you can construct a concrete type by providing a concrete type T, such as Exp&lt;int&gt;. But if Exp&lt;T&gt; is not a concrete type until you provide a concrete type T, what is the type of the abstract definition Exp&lt;T&gt;?

Reasoning about type constructors requires the introduction of something called "kinds". As types classify values, so kinds classify types. The set of values, or concrete types, is of kind *. The set of type constructors is of kind *⇒*, which is to say they are "compile-time functions" accepting a concrete type and producing a concrete type.

Now consider that we have multiple Exp&lt;T&gt; implementations, say Interpreter.Orc.Exp and Compiler.Orc.Exp, all with different internals but defining the same operations, and thus theoretically interchangeable. We would like to abstract over the different implementations of Exp&lt;T&gt; so that we can use whichever one is most appropriate. In other words, we want to make our code polymorphic over a set of generic types Exp&lt;T&gt;.
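A hypothetical sketch of the problem (TypeScript for illustration, since it has the same limitation as C#; the class and member names are invented, not the Orc.NET code): two interchangeable implementations with identical shapes, and the abstraction over them that cannot be written.

```typescript
// Two hypothetical Exp<T> implementations with the same operations.
class InterpExp<T> {
  constructor(readonly run: () => T) {}
  static pure<T>(v: T): InterpExp<T> { return new InterpExp(() => v); }
}

class CompExp<T> {
  constructor(readonly code: string, readonly run: () => T) {}
  static pure<T>(v: T): CompExp<T> {
    return new CompExp(`const ${JSON.stringify(v)}`, () => v);
  }
}

// What we want -- code polymorphic over the *set* of generic types
// Exp<T> -- requires abstracting over a type constructor E:
//
//   function program<E<_>>(pure: <T>(v: T) => E<T>): E<number> { ... }
//
// Neither C# nor TypeScript can express the parameter E<_>, so the
// implementation must instead be chosen statically:
console.log(InterpExp.pure(42).run()); // 42
```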

This necessitates type constructor constructors, the kind *⇒*⇒* (that is, *⇒(*⇒*)): a compile-time function which accepts a type such as Orc.Compiler.Exp, and produces a type constructor Exp&lt;T&gt;.

At the moment, I can't think of a way to encode such polymorphism in C#. Functional languages provide this level of polymorphism, and some even provide higher-order kinds, meaning type constructors can be arguments to other type constructors, thus achieving entirely new levels of expressiveness.