Tuesday, March 23, 2010

Extensible Data Structures in C

A lot of systems programming code is done in C, primarily because of the exposure of explicit memory addresses, but for other reasons too. However, C has pretty poor language support for many of the helpful programming constructs in object oriented languages that improve code re-use and readability, for example generics/templates, polymorphism, and inheritance. So systems programmers have developed a few design patterns to emulate such language constructs using C.

Lately I've been looking at how to design data structures in C that can provide flexible storage for data elements that can be used by different consumers. In particular, I'm looking at how a thread control block can provide storage for different implementations of scheduling systems.

There are three features of C that appear useful for this purpose: void pointers, unions, and typedefs.

union works well when you know exactly what type of data the different consumers will use ahead of time, and you are willing to place multiple specifications within the data structure. The advantages of union are that it is easy-to-read and efficient, providing multiplexed storage space to different data structures. Some disadvantages are that the programmer has to know which field of the union to access, and the size consumed by the union is equal to the largest member.

void pointers are type-less memory addresses, so a programmer can assign the value of the pointer to point to any structure, and later retrieve what was stored. However, as with unions, the programmer must know the type of the structure pointed to by the void pointer in order to use the structure. This usually is not a problem for the scenario I'm investigating.

typedefs provide a way to define new types for structures and primitives. They allow programmers to define opaque types whose representation can change, but whose type can always be checked. They have the advantage over union and void pointers of being able to be type-checked. The way to use typedef to provide extensibility is to combine it with C pre-processor conditional compilation to control which typedef is used. This way, multiple typedefs of the same type can be defined, and only one will be used based on pre-processor defines. However, code that uses the generic typedef may need to provide multiple cases based on the different specific typedefs provided.

I may come back to this topic and provide some examples. I'm playing around with typedefs right now, and like this approach because the ugliness can be hidden and the end result is high-performance code with little bloat in the compiled version.