Discriminated Unions

In computer science, a discriminated union is one of the many names given to a datatype that can hold a value of any one of several types at a given point in time. (You’ll also hear it referred to as a tagged union or a variant.) It does so by “tagging” the type information within the union. Generally speaking, it’s also an efficient datatype because the underlying storage is shared amongst all tags. Since you’re only allowed to use one tag at a time, the union only needs enough room for its largest alternative, and this sharing of memory can greatly reduce the overhead for some applications.

In C and C++, you have something that’s close to discriminated unions with the union keyword. However, you can only store very simple datatypes within the union. For instance:

union u1 {
    int the_int;
    char *the_string;
    double the_double;
};

This declares a union named u1 that can contain an int, a char * or a double at any given point in time. The programmer picks which datatype to use through the “tags” (the_int, the_string or the_double). Since the double is the largest datatype in the union, the entire union typically requires eight bytes. If you instead needed separate storage for all three values, you’d need at least 16 bytes on a 32-bit platform.
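The size arithmetic above can be checked directly. This is a minimal sketch, assuming a C++11 compiler (for static_assert); the struct s1 and the function members_share_storage are hypothetical names introduced only for comparison, and the exact sizes depend on your platform.

```cpp
// Same members as the union in the text.
union u1 {
    int the_int;
    char *the_string;
    double the_double;
};

// Hypothetical struct holding all three values at once, for comparison.
struct s1 {
    int the_int;
    char *the_string;
    double the_double;
};

// The union must be big enough for its largest member...
static_assert(sizeof(u1) >= sizeof(double), "union holds its largest member");
// ...but never needs more room than a struct storing every member separately.
static_assert(sizeof(u1) <= sizeof(s1), "union is no larger than the struct");

// Every member of a union starts at the same address as the union itself.
bool members_share_storage() {
    u1 u;
    u.the_double = 1.5;
    return static_cast<void *>(&u.the_int) == static_cast<void *>(&u) &&
           static_cast<void *>(&u.the_string) == static_cast<void *>(&u) &&
           static_cast<void *>(&u.the_double) == static_cast<void *>(&u);
}
```

On a typical platform sizeof(u1) comes out as 8, while sizeof(s1) is 16 or more once padding is counted.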

In older versions of C++, the only datatypes allowed in a union are the built-in ones (char, int, short, long, long long, float, double, long double, and pointers such as char * and wchar_t *) and user-defined POD structs. Recall that a POD struct is, roughly, a structure that contains only data (no user-defined constructors, destructor or copy assignment, and nothing virtual). This makes some degree of sense: none of these blessed datatypes require any special work on the part of the compiler. All of them are just a bucket of bytes with no worries about constructors or destructors. Unfortunately, it also severely limits the datatypes you can place into a union. Very few custom C++ datatypes get by without a constructor or a destructor!

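However, the new C++0x specification (since published as C++11) relaxes that rule so that it has something closer to truly discriminated unions. Now you are allowed to place almost any class or struct in a union, including one with its own constructors and destructor; the remaining restrictions are that a union cannot have reference members and that the union itself cannot have virtual methods. That means a union such as u1 can now have a member like the_position, of a class type Position that has its own constructor.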
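As a minimal sketch of such a union: the Position struct here is hypothetical (its x and y fields are assumptions made for illustration), chosen only to match the the_position member discussed in the rest of this post. It assumes a C++11 compiler.

```cpp
// Hypothetical class type with a user-written constructor, which C++03
// would not have allowed inside a union.
struct Position {
    int x, y;
    Position(int x_, int y_) : x(x_), y(y_) {}
};

union u1 {
    int the_int;
    char *the_string;
    double the_double;
    Position the_position;  // accepted under the relaxed C++11 rules

    // Position has no default constructor, so the compiler does not provide
    // a usable default constructor for u1. This sketch supplies one that
    // simply starts out with the_int in use.
    u1() : the_int(0) {}
};
```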

However, this brings up an interesting question. When you declare a variable of type union u1, what happens to the_position? Does its constructor fire? Or, when that variable goes out of scope, does the_position’s destructor fire? After all, these were the dangerous things that C++ was protecting against previously.

Unions are not for the faint of heart, and discriminated unions are no different! The only entity that knows whether the constructor or destructor should fire is the programmer. The compiler cannot reliably figure it out, so it’s left up to you to fire them manually, using two little-known features: placement new to run the constructor, and an explicit destructor call to run the destructor!

Taking our example above, let’s say that you wanted to use the_position within the union. What would that look like?
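A minimal sketch of that, assuming the same hypothetical Position struct (with made-up x and y fields) and a C++11 compiler. The function name use_position is also just an illustration.

```cpp
#include <new>  // placement new

// Hypothetical class type; its destructor is generated by the compiler.
struct Position {
    int x, y;
    Position(int x_, int y_) : x(x_), y(y_) {}
};

union u1 {
    int the_int;
    char *the_string;
    double the_double;
    Position the_position;

    u1() : the_int(0) {}  // start out with the_int in use
    ~u1() {}              // the union never destroys a member for you
};

int use_position() {
    u1 u;

    // Explicitly run Position's constructor in the union's storage.
    new (&u.the_position) Position(3, 4);
    int sum = u.the_position.x + u.the_position.y;

    // Explicitly run Position's destructor when done with this member.
    u.the_position.~Position();

    // Now another member can be used.
    u.the_int = sum;
    return u.the_int;
}
```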

As you can see, the Position::Position constructor is called explicitly at the point when we want to use the_position, and then the Position::~Position (automatically-generated) destructor is called when we’re done using the tagged value. Then we’re free to make use of one of the other tagged values within the union.

While I certainly agree with the implementation, and the rationale behind it, I am hard-pressed to think of times when I’d want to use the feature in production code. I can see a lot of use within the embedded markets where space concerns are high. But the danger of forgetting to call the constructor or (if needed) the destructor manually sets quite a high bar for most projects.
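One way to lower that bar is to keep the bookkeeping in a single place. The following is only a sketch under stated assumptions, not a recommended production design: Value, its Tag enum and its member functions are all hypothetical names, Position is the same made-up struct as before, and a C++11 compiler is assumed.

```cpp
#include <cassert>
#include <new>

struct Position {
    int x, y;
    Position(int x_, int y_) : x(x_), y(y_) {}
};

// Hypothetical wrapper: pairs the union with a tag that records which
// member is in use, so construction and destruction cannot be forgotten.
class Value {
public:
    enum Tag { kInt, kPosition };

    Value() : tag_(kInt) {}
    ~Value() { reset(); }

    // Copying is left out to keep the sketch short.
    Value(const Value &) = delete;
    Value &operator=(const Value &) = delete;

    void set_int(int v) {
        reset();
        data_.the_int = v;
    }

    void set_position(int x, int y) {
        reset();
        new (&data_.the_position) Position(x, y);  // run the constructor
        tag_ = kPosition;
    }

    Tag tag() const { return tag_; }
    int as_int() const { assert(tag_ == kInt); return data_.the_int; }
    const Position &as_position() const {
        assert(tag_ == kPosition);
        return data_.the_position;
    }

private:
    // Destroy whatever is in use and fall back to the_int.
    void reset() {
        if (tag_ == kPosition) data_.the_position.~Position();
        data_.the_int = 0;
        tag_ = kInt;
    }

    union Data {
        int the_int;
        Position the_position;
        Data() : the_int(0) {}
        ~Data() {}
    } data_;
    Tag tag_;
};
```

Library types such as boost::variant and, since C++17, std::variant package up the same idea.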

However, it is good to see that C++ has relaxed the rules. One of the benefits of working in C++ is that you’re allowed to shoot yourself in the foot (or in 30 copies of your foot, if you prefer). This allows you to implement powerful, efficient solutions, at the expense of the hand-holding provided by some other languages. I just hope I don’t catch any of my coworkers using this particular one! ;-)

6 Responses to Discriminated Unions

I know the purpose of the post was to talk about discriminated unions, and specifically the changes made possible with C++0x – and you did a great job. And I agree that while it’s an interesting and perhaps useful addition, I’m not sure I’ll ever actually use it.

@Dan: yeah, this is one of those language features that I understand as a language guy exactly why it exists, even if I can’t justify its existence from a practical perspective. I mean, I wouldn’t WANT many people to use this! But it definitely deserves to be allowed. If that makes sense. ;-)

As strange as it may sound, I’ve never used Boost for anything practical, though I’ve certainly read plenty about it. But the concept of a variant has always left me feeling slightly sick coming from my REALbasic background. It’s just too easy to abuse! But in the sense of embedded programming, I can see a lot of utility to it!

Actually I don’t use Boost too much either. It’s rare that I see it used in embedded systems, although I’ve often seen shops with template code that is similar (things like smart pointers, dimensional analysis, etc.) Have you heard about the Highscore Boost book? One of my goals is to work through that material when I get a chance…

I’m mostly an “embedded generalist” – I have lots of experience developing firmware for lots of different kinds of systems — telecom/datacom (from large racks of distributed boards connected via fiber, to deeply embedded handsets, cable modems and routers) to industrial automation/motion control/motor control, to medical devices, to defense/military communications & weapons systems. Background is BSEE, but I’ve been doing firmware most of my life. Started (many years ago) doing board diagnostics & drivers, then RTOS/multitasking stuff, then full-blown systems (digital logic design & all firmware, from bring-up code to the application).

Now I’m a consultant, so the variety of projects I work on is pretty broad. But most of the time the client has the “domain knowledge”, but they don’t always know how to write good software / firmware. No processes, no architecture / design, poor (non-existent) use of tools (version control, static analysis, code review, etc.) So we couple my engineering background with their domain expertise, and we usually end up with a dog that can hunt ;-)

Who

Aaron Ballman is a software engineer for GrammaTech. He has almost two decades of experience writing cross-platform frameworks in C/C++, compiler & language design, and software engineering best practices and is currently a voting member of the C (WG14) and C++ (WG21) standards committees.

In case you can't figure it out easily enough, the views expressed here are my personal views and not the views of my employer, my past employers, my future employers, or some random person on the street. Please yell only at me if you disagree with what you read.