Enumeration Types - A Quantitative Survey

A comparison of different techniques for using enumerations, with their pros and cons.

Abstract

Enumeration types are an essential ingredient in writing human readable source code. Due to their special nature, special care must be taken when deciding how to use them and - even more importantly - assessing implications of their use. By no small means are answers to these two questions governed by choice of implementation of enumeration types - that is, whether to use language provided enumeration type support, or other, customized approaches. This article compares various methods to implement enumeration types ranging from simple preprocessor constructs to more sophisticated, class-based methods. Although these constructs and their semantic peculiarities are discussed within the context of the C++ programming language, most of them can be used in C# or Java without much effort.

Being targeted not only to the novice C++ programmer, this article assumes some familiarity with the semantics of integral types and static class members in C++ and object-oriented design in general.

I Introduction

Motivation

Software design and, as a consequence, programming is much about representing abstract concepts or complicated structures in an easily understandable form benign to the human eye. This is one of the main reasons for the existence of so-called higher-level programming languages. Such languages usually feature advanced concepts like structured data types, loops or classes. Among the more primitive constructs are enumeration types. Their use is to symbolically map constant values, usually of integral types like int or char, to identifiers that are more intuitive to understand. For example, instead of 1 or 2, it is usually better to write foo or bar, respectively. Likewise, when the same constant occurs with different meanings in nearby places, enumeration types help establish the meaning of a specific occurrence.

While most programming languages offer built-in support for enumeration types based on integral types, i.e., enumeration type values are represented as integral type values by the compiler internally, the developer is left alone in extending the concept of enumerated types to classes or structured data types in general. Other issues arise from type conversions performed implicitly by the compiler. While not necessarily a show stopper, they can hamper efforts of programmers striving for type safety or become a problem if conformance to strict semantic rules is required.

The remainder of this section will discuss a simple example for an application of enumeration types. The next section will concentrate on primitive approaches to enumeration types, that is, by preprocessor and built-in language support plus a comparison to "ordinary" constant integer variables. A more sophisticated approach to enumeration types which can be extended to classes is presented in the third section.

A Simple Example

A graphics library is to be implemented. As color-capable displays are common nowadays, the library is expected to support color and color manipulation. Part of this support is therefore representation of color values within the scope of the RGB color model. As an additional requirement, shortcuts for colors white, black, red, green, blue, magenta, yellow and cyan are to be provided.

Colors in RGB color model are specified by three values representing fractions of the three fundamental colors red, green and blue, respectively. These values are modeled as whole, unsigned numbers ranging from 0 to 255. A possible implementation might look like this:
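A minimal sketch of such a class, assuming a C++11 compiler; the accessor and member names are hypothetical and only illustrate the three 0 to 255 channels described above:

```cpp
// Hypothetical sketch of an RGB color class; all names are assumptions.
class RGB {
public:
    // Each channel is a whole, unsigned number in the range 0..255.
    constexpr RGB(unsigned char r, unsigned char g, unsigned char b)
        : red_(r), green_(g), blue_(b) {}

    constexpr unsigned char red() const { return red_; }
    constexpr unsigned char green() const { return green_; }
    constexpr unsigned char blue() const { return blue_; }

private:
    unsigned char red_;
    unsigned char green_;
    unsigned char blue_;
};

// Two colors are equal if all three channels match.
constexpr bool operator==(const RGB& a, const RGB& b) {
    return a.red() == b.red() && a.green() == b.green() && a.blue() == b.blue();
}

constexpr bool operator!=(const RGB& a, const RGB& b) {
    return !(a == b);
}
```

The shortcuts for the eight fundamental colors are deliberately left out here, since how to provide them is the subject of the following sections.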

II Enumeration Types Based On Integral Types

This section discusses enumeration types which are based on integral types. This includes simple constant values as well as built-in language support like C++'s enum-types.

Preprocessor Statements

A crude but nevertheless quite widespread way to define enumeration values is using the preprocessor (which isn't part of C++, strictly speaking). For each enumeration value, a preprocessor macro is defined, which is subsequently expanded wherever it is used. For example, given a macro RED and an array cgFundamentalColors of RGB instances, one can use the macro as an array index and thus access the RGB instance corresponding to red by cgFundamentalColors[RED].
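The macro approach can be sketched as follows. Only RED being 2 and the array name cgFundamentalColors come from the text; the remaining macro values, the simplified RGB struct and the helper redColor are assumptions made for illustration (C++11 is assumed for constexpr):

```cpp
// Hypothetical macro values; only RED == 2 is stated in the text.
#define WHITE   0
#define BLACK   1
#define RED     2
#define GREEN   3
#define BLUE    4
#define MAGENTA 5
#define YELLOW  6
#define CYAN    7

// Simplified stand-in for the RGB class of the example.
struct RGB {
    unsigned char r, g, b;
};

// One RGB instance per macro, stored at the index the macro expands to.
constexpr RGB cgFundamentalColors[] = {
    {255, 255, 255},  // WHITE
    {  0,   0,   0},  // BLACK
    {255,   0,   0},  // RED
    {  0, 255,   0},  // GREEN
    {  0,   0, 255},  // BLUE
    {255,   0, 255},  // MAGENTA
    {255, 255,   0},  // YELLOW
    {  0, 255, 255}   // CYAN
};

// After preprocessing, the compiler sees cgFundamentalColors[2] here.
inline const RGB& redColor() {
    return cgFundamentalColors[RED];
}
```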

It is important to note that the macro RED is never seen by the C++ compiler. Instead, the preprocessor replaces it with the numerical value 2, i.e., the expression seen by the compiler is cgFundamentalColors[2]. A simple corollary of this observation is that such macros are not actual enumeration values or types - they are just a notational convenience. As a consequence, such macros do not have special associated types - they behave just like ordinary integer literals and their type (e.g., long, int or char) is that of the literal.

This means that values of such enumerations in general cannot be distinguished from those of other enumerations or any other integer values. In particular, the members of two enumerations can be compared to each other even if their meanings are totally unrelated:
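A minimal sketch of such an unintended comparison, assuming C++11; the macros RED and CIRCLE and the function colorEqualsShape are hypothetical and stand for members of two unrelated enumerations:

```cpp
// Hypothetical macros from two unrelated "enumerations".
#define RED    2   // a color
#define CIRCLE 2   // a shape

// The compiler accepts this: after preprocessing it is just (2 == 2).
constexpr bool colorEqualsShape() {
    return RED == CIRCLE;
}
```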

However, compiler warnings might indicate possible deviations of intended semantics. Whether or not this is acceptable must be decided case-by-case.

Built-In Enumeration Types

Most higher-level programming languages offer some built-in support for enumeration types. In C++, enumeration types are declared by the keyword enum. The members of an enum-constructed enumeration type are called enumerators. For example,

enum { white, black, red, green, blue, magenta, yellow, cyan };

defines an anonymous enumeration type representing fundamental colors. The main difference between this type and the preprocessor macros considered before is that enumerators are actually encountered by the compiler, that is, the compiler really sees cgFundamentalColors[ red ]. However, as one can easily see, there is no explicit mapping of enumerators to a corresponding integer value in the above declaration. This is an important property of such a type - it is independent of its representation.

In many cases, this is what is wanted. Quite often, however, one needs to be more in control of how the enumerators are represented. In C++, each enumerator can be represented by a literal of an integral type explicitly. For example, the declaration:

enum { white = 0, black, red, green, blue, magenta, yellow, cyan };

is equivalent to the preprocessor approach as far as its mapping to integer values is concerned. In contrast to the preprocessor macros, enumerators are typed, however. This becomes apparent in the following snippet:
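A minimal sketch of this typing behavior, assuming C++11 for the static_assert checks. The enumeration name Color and the function demo are hypothetical, and the rejected assignment is left as a comment so that the block compiles:

```cpp
#include <type_traits>

// Hypothetical named enumeration with explicit mapping starting at 0.
enum Color { white = 0, black, red, green, blue, magenta, yellow, cyan };

// Enumerator to integer: converted implicitly.
static_assert(std::is_convertible<Color, int>::value,
              "Color converts to int implicitly");
// Integer to enumeration: no implicit conversion exists.
static_assert(!std::is_convertible<int, Color>::value,
              "int does not convert to Color implicitly");

inline int demo() {
    Color c = red;                  // fine: red is an enumerator of Color
    int i = c;                      // fine: implicit conversion, i == 2
    c = static_cast<Color>(i + 1);  // fine with an explicit cast: green
    // c = i;                       // error: int is not implicitly a Color
    return c;
}
```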

The reason that the last assignment does not work is the main advantage of enum declarations over macros: they limit the interchangeability of enumerators in assignments. On the other hand, the caveat about possible unintended comparisons between enum-types remains because of built-in integral type conversions.

enum-types have a very distinct enumerator set. As a consequence, they cannot in general be used with bitwise logical operators as is the case with integer values. Operator overloading can help in such cases if the enumerator set isn't too large. If the enum-type's enumerator set is sufficiently small, enumerators can be assigned powers of 2 for representation, for example:

enum { white = 1, black = 2, red = 4, blue = 8/* ... */ };

Most often this will exclude use of enumerators as array indices, however.

Instead of applying the bitwise logical operators to the enumerator itself, they are applied to the result of integral type conversion. However, the result of the operator will fall out of the enum-type's enumerator set in general.
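The following sketch shows one way to overload a bitwise operator for such an enumeration, assuming C++11. The type name Color and the helper contains are hypothetical; the enumerator values are those of the declaration above:

```cpp
// Hypothetical named version of the power-of-2 enumeration.
enum Color { white = 1, black = 2, red = 4, blue = 8 };

// Overloaded operator: both operands are converted to int, combined,
// and the result is cast back to Color explicitly.
constexpr Color operator|(Color a, Color b) {
    return static_cast<Color>(static_cast<int>(a) | static_cast<int>(b));
}

// Tests whether every bit of 'flag' is set in 'set'.
constexpr bool contains(Color set, Color flag) {
    return (static_cast<int>(set) & static_cast<int>(flag)) ==
           static_cast<int>(flag);
}
```

Note that red | blue has the value 12, which is representable by Color but is not one of its enumerators.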

enum-types present another small problem that can prove to be a nuisance - the names of enumerators are added to the type's defining scope. An enum-type doesn't open a scope by itself. Therefore, it is relatively easy to provoke name clashes if enum-types are declared in the global namespace.
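One way to limit such clashes is to declare the enumeration inside a namespace or a class, so that the enumerators are added to that scope instead of the global one. A sketch with hypothetical names (color, traffic, TextBox), assuming C++11 only for the checks in the usage:

```cpp
// Enumerators live in the enclosing namespace, not in the global one.
namespace color {
    enum Type { white, black, red, green, blue, magenta, yellow, cyan };
}

namespace traffic {
    enum Light { red, amber, green };  // no clash with color::red
}

// The same works for classes: enumerators become class members.
class TextBox {
public:
    enum BorderType { BORDER_NONE, BORDER_FLAT, BORDER_3D };
};
```

Enumerators are then referred to by qualified names such as color::red or TextBox::BORDER_3D. Compilers supporting C++11 also offer scoped enumerations (enum class), which open a scope of their own.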

Constant Integral Type Variables

In a nutshell, enumerators are of a constant value nature. Sometimes, one might wish to use an enumerator in a way more akin to using a normal variable or instance, however. For example, there might be situations when access to the address of an enumerator is required. Identifying enumerators with constant variables can be helpful in such situations. This can easily be achieved by declaring constant variables of appropriate integral type and - typically - static storage class. For example, the color enumeration's enumerators could be declared like this:
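A sketch of such declarations, assuming int as the underlying type and the same values as in the enum declaration of the previous subsection; the pointer pRed is hypothetical and only shows that an address can be taken:

```cpp
// Constant variables with static storage class acting as enumerators.
static const int white   = 0;
static const int black   = 1;
static const int red     = 2;
static const int green   = 3;
static const int blue    = 4;
static const int magenta = 5;
static const int yellow  = 6;
static const int cyan    = 7;

// Unlike a macro or an enumerator, a constant variable has an address.
static const int* const pRed = &red;

// It can still be used where a constant is required, e.g. an array bound.
static int histogram[cyan + 1];
```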

It is important to note that these declarations do not declare enumerators in a strict semantic sense, but they can be used as such. Their nature is somewhat of a hybrid between the constant nature of an enum-type's enumerators or macros and normal variables. They can be used in any place where constants can be used. On the other hand, they can be referenced by their address, although only via pointers to const.

As the one-definition-rule applies to const variables, too, the actual value of such a variable is stored in exactly one place. Therefore, changing a const variable's value will not require re-compilation of dependent sources (for simplicity, declarations of the form:

extern const type name = value;

are not dealt with here). The disadvantage of using const variables is that they don't own a distinct type.

While this approach to implement enumerations, at first glance, may seem to be of academic interest only, the transition from values as enumerators to instances of types paves the way to class-based approaches in implementing enumeration types, which is the topic of the next section.

III Class-based Enumeration Types

Up to this point, implementation of enumeration types and their enumerators relied on some integral type for representation. In particular, use of enumerators was more or less boiled down to use of constants. The last variant described in the previous section, however, somewhat blurred this principle by using constant variables or instances of the type in question. By generalizing this concept and using classes instead of integral types, enumerators can be defined that combine most properties of the aforementioned approaches while still maintaining type safety and supporting object-oriented design.

Using Static Class Members As Enumerators

In C++ (and most other strongly typed object-oriented programming languages), each class has its own type, which is distinct from all other types. If not expressly defined, they cannot be converted into each other except for conversions to reference types of base classes. In particular, there is no implicit, built-in type conversion or value promotion as is the case with built-in types except for base/derived class pointer conversion.

Enumerators, on the other hand, are often used in an out-of-class context - that is, they do not require an instance of a particular class to be present; they are of a global constant/variable nature, instead.

Making use of these two observations, the RGB color class' declaration from section one can be rewritten like this:
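A sketch of such a declaration; the member names, the comparison operators and the helper checkEnumerators are assumptions for illustration, and the color values follow the usual RGB definitions:

```cpp
#include <cassert>

class RGB {
public:
    RGB(unsigned char r, unsigned char g, unsigned char b)
        : r_(r), g_(g), b_(b) {}

    bool operator==(const RGB& other) const {
        return r_ == other.r_ && g_ == other.g_ && b_ == other.b_;
    }
    bool operator!=(const RGB& other) const { return !(*this == other); }

    // The eight fundamental colors, declared as static constant members.
    static const RGB white, black, red, green, blue, magenta, yellow, cyan;

private:
    unsigned char r_, g_, b_;
};

// Definitions; in a real project these belong in exactly one source file.
const RGB RGB::white(255, 255, 255);
const RGB RGB::black(0, 0, 0);
const RGB RGB::red(255, 0, 0);
const RGB RGB::green(0, 255, 0);
const RGB RGB::blue(0, 0, 255);
const RGB RGB::magenta(255, 0, 255);
const RGB RGB::yellow(255, 255, 0);
const RGB RGB::cyan(0, 255, 255);

// Example use: any instance can be compared with the enumerators.
inline void checkEnumerators() {
    RGB pixel(255, 0, 0);
    assert(pixel == RGB::red);
    assert(pixel != RGB::blue);
}
```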

This declaration of RGB allows for any two instances of class RGB to be compared to each other (assuming appropriate comparison operators are present). In particular, any RGB instance can be compared with the eight constant instances representing the fundamental colors. No instance of RGB can be compared to instances of other types, except for instances of derived classes. This special case can be dealt with by checking an instance's class from within the comparison operators if necessary. For the rest of the following discussion, RGB is assumed not to have any derived classes.

Special care must be taken when it comes to declaring user-defined conversion operators with integral types as target type - this might allow use of RGB in a manner akin to built-in enumeration types.

Assignment and initialization of class RGB instances can be done via copy constructor implementation and/or assignment operator overloading as usual. Bitwise logical operators can be overloaded as seen fit, if necessary. The only limitation is that only integral or enumeration types can be used as arguments for the switch statement. This is not really a big problem as cascaded if statements can do the same job, although with a slight run-time penalty because of the lack of compiler-generated jump tables.
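The cascaded if statements mentioned above might look like this. The reduced RGB struct, the function colorName and the helper checkColorName are hypothetical and only serve to keep the sketch short:

```cpp
#include <cassert>
#include <string>

// Reduced stand-in for the RGB class with three enumerators.
struct RGB {
    unsigned char r, g, b;
    static const RGB red, green, blue;
};

inline bool operator==(const RGB& a, const RGB& b) {
    return a.r == b.r && a.g == b.g && a.b == b.b;
}

const RGB RGB::red   = {255, 0, 0};
const RGB RGB::green = {0, 255, 0};
const RGB RGB::blue  = {0, 0, 255};

// switch (c) { case RGB::red: ... } would not compile, because RGB is a
// class type. A cascade of comparisons does the same job.
inline std::string colorName(const RGB& c) {
    if (c == RGB::red) {
        return "red";
    } else if (c == RGB::green) {
        return "green";
    } else if (c == RGB::blue) {
        return "blue";
    }
    return "other";
}

inline void checkColorName() {
    assert(colorName(RGB::green) == "green");
}
```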

Run-Time Behavior of Static Members

Whenever static class members are involved, except for trivial cases, it is a good idea to remember the initialization peculiarities of variables or instances of static storage class. For fundamental types, the rules are quite simple: literals, literal expressions and expressions with static variables of fundamental types initialized before are assigned at compile-time. The same goes for static variables of pointer types if they are initialized with literals; normally, this means initialization with NULL. The order of initialization is that of definition within the translation unit.

Pointer type variables that are initialized with constant, non-literal expressions, i.e., addresses of other variables of static storage type of whatever type, are initialized at link-time. Link-time can have two meanings: static link-time, that is, when the link editor is executed from command line (e.g., nmake, Visual Studio project build). Depending on the target platform and executable type, it can also mean dynamic link-time. For example, when executing DLLs or other types of shared code, the final addresses of any variable (including contents of jumptables and procedures) will not be determined until the run-time linker is run.

Finally, the so-called ctor-chain is executed. During ctor-chain, any remaining initializations are executed in the order of definition throughout the translation unit. The following code snippet summarizes these rules:
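A sketch of the three categories, using the terms of this section. All names are hypothetical, and the comments describe the phases as the text does rather than in the wording of the C++ standard:

```cpp
#include <cassert>

struct Counter {
    explicit Counter(int start) : value(start) {}
    int value;
};

// (1) Compile-time: literals and expressions of previously initialized
//     constants of fundamental type, plus pointers set to a null literal.
const int cBase = 40;
const int cAnswer = cBase + 2;
const char* gName = 0;

// (2) Link-time: a pointer initialized with the address of another
//     variable of static storage class.
int gSlot = 7;
int* const cSlotAddress = &gSlot;

// (3) Ctor-chain: everything that needs code to run, executed in the
//     order of definition within this translation unit.
Counter gFirst(cAnswer);
Counter gSecond(gFirst.value + 1);  // safe: gFirst is defined earlier

inline void checkInitialization() {
    assert(gName == 0);
    assert(cSlotAddress == &gSlot && *cSlotAddress == 7);
    assert(gFirst.value == 42 && gSecond.value == 43);
}
```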

Initialization order between translation units is compiler-dependent. Quite often, it is determined by the order in which translation units (i.e., the compiled code) are fed into the link editor at link-time.

Using Pointers Instead of Instances

As long as class-based enumerators are referenced only from the translation unit they are defined in, class instances as static members can be used without many problems. Evidently, this isn't the case in general; static members are referenced from all across the translation units the executable's code is built from or, in case of library projects (e.g., DLLs), from anywhere anytime.

In most situations, this severely limits the use of static class members if they are instances of classes. The solution to this problem is to use pointer types instead. They are initialized at link-time and thus before any access by running code. For example:
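A sketch of the pointer variant with three of the eight enumerators; the member names and the helper checkPointerEnumerators are assumptions. Note that an initializer containing a new expression is itself executed as part of the ctor-chain, so the sketch mainly illustrates the declaration pattern and how such enumerators are used:

```cpp
#include <cassert>

class RGB {
public:
    RGB(unsigned char r, unsigned char g, unsigned char b)
        : r_(r), g_(g), b_(b) {}

    unsigned char r() const { return r_; }
    unsigned char g() const { return g_; }
    unsigned char b() const { return b_; }

    // Enumerators are constant pointers to constant instances.
    static const RGB* const white;
    static const RGB* const black;
    static const RGB* const red;

private:
    unsigned char r_, g_, b_;
};

// Definitions; the instances are allocated once and never freed here.
const RGB* const RGB::white = new RGB(255, 255, 255);
const RGB* const RGB::black = new RGB(0, 0, 0);
const RGB* const RGB::red   = new RGB(255, 0, 0);

// Enumerators are compared by address and dereferenced for their data.
inline void checkPointerEnumerators() {
    const RGB* c = RGB::red;
    assert(c == RGB::red);
    assert(c != RGB::white);
    assert(c->r() == 255 && c->g() == 0 && c->b() == 0);
}
```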

Except for being pointers, this enumerator variant can be used like its instance counterpart. While memory allocation, i.e., calling operator new, is done when executing the ctor-chain, freeing the memory must be done by hand. This isn't really necessary in most situations, though, because allocation takes place only once when the code in question is loaded into the calling process' address space. However, manual deallocation will keep the memory leak detection happy.

Execution of the ctor-chain imposes a time penalty. There are situations when this is not acceptable. For example, this might be the case for time critical code or if too many initializations take place. If enumerators can be guaranteed to be used only as opaque entities which are compared for equality at most, a special initialization expression can be used:
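One possible form of such an expression, given purely as an assumption, takes the address of a distinct dummy variable per enumerator and casts it to the pointer type. The tag variables and the helper checkOpaqueEnumerators are hypothetical:

```cpp
#include <cassert>

class RGB {
public:
    // Opaque enumerators: only their addresses matter.
    static const RGB* const white;
    static const RGB* const black;
    static const RGB* const red;
};

namespace {
    // One byte of static storage per enumerator; each address is unique.
    char whiteTag;
    char blackTag;
    char redTag;
}

// No constructor runs and nothing is allocated. The pointers do NOT refer
// to real RGB objects and must never be dereferenced.
const RGB* const RGB::white = reinterpret_cast<const RGB*>(&whiteTag);
const RGB* const RGB::black = reinterpret_cast<const RGB*>(&blackTag);
const RGB* const RGB::red   = reinterpret_cast<const RGB*>(&redTag);

// The only supported operation is comparison for (in)equality.
inline void checkOpaqueEnumerators() {
    const RGB* c = RGB::red;
    assert(c == RGB::red);
    assert(c != RGB::white);
    assert(RGB::white != RGB::black);
}
```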

This initialization takes place at link-time and preserves the most important property - uniqueness. Source code generators can easily be set up to produce such expressions automatically.

The disadvantage is these pointers do not point at real class instances. Any attempt to access non-static members of class RGB via such pointers will utterly fail. This technique therefore should be used only after careful consideration of its consequences and in a well-documented manner only.

About the Author

Still lacking a university degree in computer science, I have 20 years of experience in software development and implementation. Having expert knowledge in object-oriented programming languages like C++, Java and C# on Windows, LINUX and UNIX platforms, I participated in multiple research projects at the University of Oldenburg. During these assignments, I was entrusted with the implementation of a graphical editor for specification languages like CSP or Z and a prototypical tool for workflow data distribution and analysis. I gained experience in a wide spectrum of CS and software development topics, ranging from compiler construction across database programming to MDA. My research interests include questions of graphical user interface design and component-based systems.

I consider myself an old-school CS geek. While I admit that simple tasks do not pose much of a problem and can be done in a quick and efficient manner, it's the challenging ones that appeal to me. If you are looking for a skillful employee - be it on a permanent, contract or freelance basis - and if you can live with a lacking university degree, why not contact me?

By definition, the BORDER_TYPE is static; you have to refer to it as ClassName::enumeration, e.g. BORDER_TYPE myBorder = CTextBox::BORDER_3D;

In C++ it is "type-safe": the compiler would complain if you passed another type (e.g. int) as a parameter to the function accepting BORDER_TYPE, as long as there is no explicit conversion operator in your class.

Thanks for your comments and my apologies for taking so long to answer. An example for declaring enums in classes and/or namespaces can be found in the last snippet of the subsection about built-in enum-types. Anonymous namespace is used here, but that doesn't matter, really.

Enumerated types (via enum) may help you in describing fixed types but not everyone knows the following:

sizeof(an enumerated type) is not always the same across compilers/platforms. (Some use a fixed 4 bytes, while others will optimize its size according to the MAX/MIN value of the constant values the enumerated type contains.)

The C++ ANSI standard doesn't specify the (representation) size of enums - it just says that the enumerators (or their values, rather) must fit. But that is trivial anyway. So as far as ANSI-compliant compilers are concerned, one cannot rely on specific representation sizes. However, one cannot rely on sizes of integer types, either - int, for example, is only required to be at least 16 bits wide. Compilers can implement it with more, if they like, and most current compilers use 32 bits. The same holds for short, which must be at least 16 bits wide and is implemented with 16 bits by most compilers. Problems when bridging platforms are thus not limited to enum types, and this is one of the motivations for interface specification languages like CORBA/IDL or COM-IDL. Not to mention all the fuss made about serialization.

Re your second point: You can iterate over enumerators by providing appropriately overloaded operators (e.g. operator++). Most people will, however, dread the effort. But yes, the lack of iteration did escape me so far.

What I said is from experience. I'm an embedded programmer, and one of the most common mistakes programmers make with enum types is to forget that their size may not be as expected. As you said, the standards (C/C++) talk about fixed size for char, long (and maybe short) but not for enum types, which is compiler/processor dependent.

About iterating: yes, you are right, but you do not always have the capabilities to do so (e.g. in the world of plain C). The danger is when your enumerated types are not contiguous and you don't deal with the correct iterators.

Back to your article, there is another way to define constants, that I saw in different articles here in CP: the use of namespaces, that also has its pros and cons.

This is almost the same as one of your examples(inside a class declaration). In one of my projects, I have a header file called ConstantsDb.h where I store all the constants for the project, in this way, there is only one place to look for the public constants.

Have a nice day.

-- Ricky Marek (AKA: rbid)-- "Things are only impossible until they are not" --- Jean-Luc Picard