C/C++

Discriminated Unions (I)

By Andrei Alexandrescu, April 01, 2002

Discriminated unions are an important data abstraction, very helpful in database interfaces, efficient parsers, and various other type-safe abstractions. Usually, support for discriminated unions must be present in the language. This column presents a full-blown source-level implementation of discriminated unions in existing C++, an implementation that satisfies most of the requirements that one would normally ask from an innate language feature, syntactic nicety included.

In conclusion, it would be best if Variant used in-situ allocation.

The search continues with the obvious idea of in-situ allocation: we could use an untyped buffer of characters, right?

We actually can compute neededSize quite easily. Just think of implementing a recursive algorithm that gets the largest element in a typelist: you compare the head with the largest element in the tail (which you compute recursively), and you choose the larger of the two.

As you can see, MaxSize recursively applies itself to compute the internal variable tailResult, and then it chooses the larger number. The limit case (null typelist) just returns the smallest possible positive number  zero.

If you heard an awful sound, that's our current storage implementation hitting the wall: it doesn't work. Worse, it might work sometimes  with some types, on some machines, on some compilers, and during certain phases of the moon. Get ready for tough times  we're having alignment issues.

Indeed, the problem with our approach is that the compiler doesn't know that we plan on storing "true" data inside buffer_. For all it knows, the compiler must care about a buffer of characters.

It turns out that, in most current computer architectures, some data types must be stored at specific offsets in memory. For example, typically four-byte integers can be allocated only at addresses that are a multiple of four. That doesn't mean that all eight-byte data must be allocated at multiple-of-eight addresses  their alignment could be four as well. (No data, however, can require alignment greater than its own size, because that would break the array storage rules.)

The rationale for the notion of alignment is the physical arrangement of memory banks, the bus, and the processor. For example, certain memory banks are physically linked to the lowest-order eight bits of the memory bus, so by construction their content cannot directly reach the higher-order bits.

If you try to access data that's not properly aligned, one of several things might happen. You can get the data you wanted, only much slower, because essentially the processor simulates in software, by shuffling data around, what should be a hardware operation. (The x86 processors do this.) Many other processors, however, terminate your program immediately on grounds of an attempt to execute an illegal memory access.

The good news is that most of the time, you don't need to worry about alignment: the processor nicely takes care of it by allocating all data at properly aligned addresses. The bad news is that you do need to care exactly now, because buffer_[neededSize] really means you're taking your life in your hands.

How can we guarantee that buffer_ is properly aligned? Well, that's exactly what union does  it guarantees enough size for its largest member and alignment for its most restrictive member. This is what we need, so our storage should look something like this:

union
{
unsigned char buffer_[neededSize];
Align dummy_;
};

We never use dummy_; it's there just for alignment purposes. We now only need to define the Align type properly. We'll investigate exactly how to do that in the next installment. Until then, happy coding and ideas galore!

[5] Rumor has it that the next revised C++ Standard, fondly called C++0x by those who keep score at home, will support types with constructors as union members. Anyway, types with destructors can't be allowed, and that makes sense. Given that the union doesn't store a discriminator, how would the compiler know which destructor to call when going out of scope?

Andrei Alexandrescu is a Ph.D. student at University of Washington in Seattle, and author of the acclaimed book Modern C++ Design. He may be contacted at www.moderncppdesign.com. Andrei is also one of the featured instructors of The C++ Seminar

Dr. Dobb's encourages readers to engage in spirited, healthy debate, including taking us to task.
However, Dr. Dobb's moderates all comments posted to our site, and reserves the right to modify or remove any content that it determines to be derogatory, offensive, inflammatory, vulgar, irrelevant/off-topic, racist or obvious marketing or spam. Dr. Dobb's further reserves the right to disable the profile of any commenter participating in said activities.

This month's Dr. Dobb's Journal

This month,
Dr. Dobb's Journal is devoted to mobile programming. We introduce you to Apple's new Swift programming language, discuss the perils of being the third-most-popular mobile platform, revisit SQLite on Android
, and much more!