Enums, Macros, Unicode and Token-Pasting

Hi, I am Rocky, a developer on the Visual C++ IDE team. I would like to discuss the C++ programming technique of creating macro-generated enums. I recently used this for distinguishing various C++ types such as class, variable, and function. Having the list of types in one file makes it easy to add types and allows the list to be used in many different ways. The examples below mirror uses in code that I have worked with. I have also seen this technique used in many other source bases, but I have not seen it discussed much in textbooks, so I thought I would highlight it.

Consider this enum:

enum Animal { dog, cat, fish, bird };

Now dog can be used in place of 0. You get compiler-enforced type safety that macros do not provide. The VS debugger will also show the friendly name of the enum value instead of an integer. However, functions that print out enum values need better formatting than a bare integer. This code can help:

wchar_t* AnimalDescription[] = { L"dog", L"cat", L"fish", L"bird" };

With this array, debugging code can now print the friendly value of the enum by using the enum to index into the string array.
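As a minimal sketch of that lookup, the following shows one way debugging code might turn an enum value into its friendly name. The helper names AnimalName and PrintAnimal are hypothetical and not part of the original code, and the array is declared const here as an assumption about how stricter compilers would want it.

```cpp
#include <cassert>
#include <cstddef>
#include <cwchar>

enum Animal { dog, cat, fish, bird };

// Friendly names, listed in the same order as the enumerators above.
const wchar_t* const AnimalDescription[] = { L"dog", L"cat", L"fish", L"bird" };

// Hypothetical helper: maps an Animal to its friendly name.
const wchar_t* AnimalName(Animal a) {
    const std::size_t count = sizeof(AnimalDescription) / sizeof(AnimalDescription[0]);
    assert(static_cast<std::size_t>(a) < count);  // guard against out-of-range values
    return AnimalDescription[a];
}

// Hypothetical debugging helper: writes the friendly name to stdout.
void PrintAnimal(Animal a) {
    std::wprintf(L"animal = %ls\n", AnimalName(a));
}
```

The weakness of this hand-written version is that the enum and the array must be kept in the same order by hand.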

With macro-generated enums, both the enum and the friendly names can be maintained as one entity. Consider the file animal.inc:

MYENUM(dog)
MYENUM(cat)
MYENUM(fish)
MYENUM(bird)

And the following C++ code:

enum Animal {
#define MYENUM(e) _##e,
#include "animal.inc"
#undef MYENUM
};

wchar_t* AnimalDescription[] = {
#define MYENUM(e) L"_" L#e,
#include "animal.inc"
#undef MYENUM
};

Now editing animal.inc will update both the enum and the friendly text of the enum. In this case I added an underscore in front of the animal names I used before to get the macro to work correctly. The token-pasting operator ## cannot be the first token in a macro definition. The stringizing operator # creates a string literal from its operand, the macro argument. Adding an L right before the stringizing operator makes the resulting string a wide string.
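To see the two operators in isolation, here is a small sketch. All macro names in it (STRINGIZE, PASTE, WIDEN, WSTRINGIZE) are hypothetical helpers, not part of the code above. It also assumes that writing L directly in front of # may not produce a wide literal on every compiler, so it builds the wide string by pasting L onto the stringized argument through one extra macro level.

```cpp
#include <cassert>
#include <cstring>
#include <cwchar>

// Hypothetical helper macros, for illustration only.
#define STRINGIZE(e) #e              // STRINGIZE(dog)       -> "dog"
#define PASTE(prefix, e) prefix##e   // PASTE(animal_, dog)  -> animal_dog
#define WIDEN_(s) L##s               // WIDEN_("dog")        -> L"dog"
#define WIDEN(s) WIDEN_(s)
#define WSTRINGIZE(e) WIDEN(#e)      // WSTRINGIZE(dog)      -> L"dog"

// Token pasting builds new identifiers: animal_dog and animal_cat.
enum Animal { PASTE(animal_, dog), PASTE(animal_, cat) };

// Stringizing builds string literals from the same names.
const char* const narrowName = STRINGIZE(dog);
const wchar_t* const wideName = WSTRINGIZE(dog);
```

The prefix animal_ is used instead of a bare underscore only to keep the sketch clear of names the implementation may reserve.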

These macro-generated enums can be “debugged” by using the compiler switch /EP or /P. This causes the compiler to output the preprocessed source:

enum Animal {
_dog,
_cat,
_fish,
_bird,
};

wchar_t* AnimalDescription[] = {
L"_" L"dog",
L"_" L"cat",
L"_" L"fish",
L"_" L"bird",
};

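C++ allows a comma after the last entry of both the enum and the array initializer. The trailing comma has always been allowed in an array initializer; in an enum it is standard from C++11 on, and Visual C++ accepted it before that.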
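The same idea can be tried in a single file, without a separate animal.inc, by keeping the list in a macro that takes another macro as its argument. This is only a sketch of a common variant: ANIMAL_LIST, AS_ENUM, AS_NAME, WIDEN and animal_count are hypothetical names, and the animal_ prefix is an assumption chosen to avoid leading underscores.

```cpp
#include <cassert>
#include <cstddef>
#include <cwchar>

// Hypothetical single-file stand-in for animal.inc.
#define ANIMAL_LIST(X) \
    X(dog)             \
    X(cat)             \
    X(fish)            \
    X(bird)

// Helpers that turn "dog" into L"dog".
#define WIDEN_(s) L##s
#define WIDEN(s) WIDEN_(s)

enum Animal {
#define AS_ENUM(e) animal_##e,
    ANIMAL_LIST(AS_ENUM)
#undef AS_ENUM
    animal_count  // number of animals; not an animal itself
};

const wchar_t* const AnimalDescription[] = {
#define AS_NAME(e) WIDEN(#e),
    ANIMAL_LIST(AS_NAME)
#undef AS_NAME
};

// Both tables come from the same list, so their sizes must agree.
static_assert(sizeof(AnimalDescription) / sizeof(AnimalDescription[0]) ==
                  static_cast<std::size_t>(animal_count),
              "enum and name table are out of sync");
```

Adding X(horse) to ANIMAL_LIST would extend both the enum and the name table in one edit.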

This macro string-replacement technique can be extended to generate code. Here is an example that uses string replacement to create function prototypes:

#define MYENUM(e) void Order_##e();
#include "animal.inc"
#undef MYENUM

This expands to:

void Order_dog();
void Order_cat();
void Order_fish();
void Order_bird();

You may wish to perform some action based on the kind of animal. If you switch on the kind of animal, here is an example of creating case labels and function calls:

#define MYENUM(e) case _##e: \
    Order_##e(); \
    break;
#include "animal.inc"
#undef MYENUM

This expands to:

case _dog: Order_dog(); break;
case _cat: Order_cat(); break;
case _fish: Order_fish(); break;
case _bird: Order_bird(); break;

In this example a function definition would need to be added for each of Order_dog(), Order_cat(), and so on. If you add a new animal to animal.inc, you do not need to remember that it also requires a new Order_ function definition. The linker will give you an error reminding you!
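Putting the prototypes and the switch together, a self-contained sketch might look like the following. It departs from the examples above in a few labeled ways: the list lives in a hypothetical ANIMAL_LIST macro instead of animal.inc, the enumerators use an assumed animal_ prefix, and the Order_ functions return a std::string instead of void so that the result is easy to inspect.

```cpp
#include <cassert>
#include <string>

// Hypothetical single-file stand-in for animal.inc.
#define ANIMAL_LIST(X) X(dog) X(cat) X(fish) X(bird)

enum Animal {
#define AS_ENUM(e) animal_##e,
    ANIMAL_LIST(AS_ENUM)
#undef AS_ENUM
};

// Generated prototypes: one Order_<name>() per list entry.
#define AS_PROTOTYPE(e) std::string Order_##e();
ANIMAL_LIST(AS_PROTOTYPE)
#undef AS_PROTOTYPE

// Hand-written definitions. Leaving one out produces a linker error,
// because the generated switch below calls every Order_ function.
std::string Order_dog()  { return "ordered a dog"; }
std::string Order_cat()  { return "ordered a cat"; }
std::string Order_fish() { return "ordered a fish"; }
std::string Order_bird() { return "ordered a bird"; }

// Hypothetical dispatcher: the case labels are generated from the list.
std::string Order(Animal a) {
    switch (a) {
#define AS_CASE(e) case animal_##e: return Order_##e();
        ANIMAL_LIST(AS_CASE)
#undef AS_CASE
    }
    return "unknown animal";  // reached only for values outside the enum
}
```

Calling Order(animal_cat) dispatches to Order_cat() without any hand-written case label.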

Macro string replacement is a powerful tool that allows internal data to be stored in one spot. Keeping this data in one spot reduces the chance of errors, missed cases, or mismatched cases.

Very interesting. FYI, the Intel IPP library uses this very same technique to generate a different set of functions for each processor architecture.

Charles

1 May 2008 12:44 PM

Nice. I've used macros in a similar fashion to maintain the mapping between code symbolic names and strings that are exposed outside the program in various ways. When used appropriately, macros can prevent errors and even allow the compiler to detect errors in these mappings before they get frozen into shipping code forever.

Thanks for the new ideas of how to use macros.

Also, I just realized I've recently used a similar technique as ajax16384. Good stuff.

Adrian

1 May 2008 4:28 PM

I don't understand why you're risking conflicting with the reserved library identifiers by prefixing the enum values with underscores.

"... I added an underscore in front of the animal names I used before to get the macro to work correctly."

Why? This works perfectly well:

#define MYENUM(e) e,

Many identifiers that start with an underscore are reserved for the implementation of the compiler and the standard libraries.

Rocky

1 May 2008 5:54 PM

Adrian raises a good issue here. The reason for using #define MYENUM(e) _##e, instead of #define MYENUM(e) e, was to demonstrate the token-pasting operator ##. Furthermore, the token-pasting operator ## allows for padding the enum with some prefix to help prevent name collisions. This allows developers to edit these include files, such as animal.inc, without worrying about collisions.

In my work I needed to represent types such as class, template, and enum. By padding the enum I can have:

MYENUM(template)
MYENUM(class)
MYENUM(enum)

And there are no collisions, thanks to the padding. As Adrian points out, padding with the underscore alone does not provide much protection against collisions. In practice I have used #define MYENUM(e) tok##e, for enums of tokens. For my example, #define MYENUM(e) animal_##e, would be less likely to cause collisions.

Also, if there is a collision, in many cases the compiler will provide an error.
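A small sketch of the padded version, with C++ keywords as the list entries, could look like this. TYPE_LIST, TokenKind and TokenKindName are hypothetical names, the list is kept in a macro rather than an include file to make the sketch self-contained, and narrow strings are used for brevity.

```cpp
#include <cassert>
#include <cstring>

// Hypothetical list of kinds; the entries are C++ keywords, which are
// fine as macro arguments because they never appear as bare identifiers.
#define TYPE_LIST(X) X(template) X(class) X(enum)

enum TokenKind {
#define AS_ENUM(e) tok##e,   // toktemplate, tokclass, tokenum
    TYPE_LIST(AS_ENUM)
#undef AS_ENUM
};

const char* const TokenKindName[] = {
#define AS_NAME(e) #e,       // "template", "class", "enum"
    TYPE_LIST(AS_NAME)
#undef AS_NAME
};
```

Without the tok prefix, the enumerators would be the keywords themselves and the enum would not compile.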

Hrm, one of the most ancient tricks in the C hacker's book that's for sure. One major downside you need to consider is that this makes the code ugly and difficult to read. I wouldn't want to maintain a codebase littered with preprocessor statements.

Consider writing a simple code generator instead. One that generates readable code for instance. Then you can generate to your heart's happiness and aren't limited to the preprocessor's quirks (and sometimes compiler-specific quirks).

Unfortunately VC++ has had a history of issues dealing with dependencies once you start using code generators (usually invoked via a makefile project). So mileage varies. Sometimes you need to build twice to actually build.

I suppose it might be useful in some situations but I can't think of a time when I have ever wanted such a thing.

The code is ugly and difficult to read. IMO, macros obfuscate code and are easy to write and/or use dangerously with unexpected results. They are best avoided unless there is a very compelling reason to use them, which there isn't here, IMO. This is especially true in C++ (compared to C) since C++ often provides better, safer and easier-to-read alternatives.

I agree that it is good to have a mechanism which keeps enum values and name strings in sync, but that good is far offset by the bad of ugly, unreadable macro code combined with being forced to separate the list of names out into another file.

For the switch statement, it would be better if the compiler noticed the switch was on an enum and emitted a warning if all cases were not covered (and there was no default clause, of course). Maybe the compiler already does that; I'm not sure. If it does then this is pointless IMO as you might as well add all the calls to the new Order_XYZ functions when you write the Order_XYZ function itself, and the compiler will tell you all the places you need to do so.

Sorry if it seems like I'm having a tinkle on the fireworks here; I just have a strong dislike of code like this, especially when it's coming from Microsoft's VC++ team!

Joe

4 May 2008 3:09 PM

I despise code like this. It's lazy, ugly, bloated and stupid. No wonder Microsoft software is getting ugly, bloated and slow--I'll bet it's chock full of undisciplined crap like this.

Anon

6 May 2008 3:30 AM

Wait until you need an enum with dozens of entries, I bet you'll wanna try using this sort of 'stupid' and 'ugly' solution too!

"I despise code like this. It's lazy, ugly, bloated and stupid. No wonder Microsoft software is getting ugly, bloated and slow--I'll bet it's chock full of undisciplined crap like this."

Vu

6 May 2008 5:23 AM

Macros are something C++ (especially the Standards Committee) has always tried to avoid. Use of macros like this will make code unreadable and cause maintenance problems. I think it's not a good idea to play with such things.

That's an ugly solution, I would put the values in a XML file and then use a XSL file to generate the code.

Anon

8 May 2008 4:40 PM

"That's an ugly solution, I would put the values in a XML file and then use a XSL file to generate the code."

are you kidding me?

ikk

11 May 2008 11:13 PM

I would use boost preprocessor library for this, but it is nice to see a way of doing it without any library.

Nothing beats having everything in sync automatically.

Joe

12 May 2008 9:39 PM

"Wait until you need an enum with dozens of entries, I bet you'll wanna try using this sort of 'stupid' and 'ugly' solution too!"

I have. Many times. And I've found that it's better in the long run to do it manually. (I went through a phase where I used macros for a lot of things, including token replacement. I even remember going through my personal class library and ripping many of those out.)

I stand by my belief that this is a bad idea.

(I do agree that the XML/XSL solution is even worse.)

Chad

18 May 2008 9:20 PM

I use this kind of technique for the sole purpose of having one single place in the source code where the definition exists (I prefer declaring just MYENUM(bird) and not both tok_bird and "bird"). This guarantees that the enum name and the string description will match, and that *must* be what we strive for.

I don't like how this kind of use of the preprocessor confuses intellisense in the IDE. And when used on code, it confuses the debugger (it can't display the source for the code you're in when you're stepping through a function whose source is generated by the preprocessor). Often the best you can do is get taken to the line that says MYENUM(foo).

I do like the idea of using XSLT or text translation (.tt) or any other roll-your-own preprocessor, except that the debugger again, will take you to code from the generated .cpp file, rather than to your original .xml or .tt file. And intellisense often is baffled as to the true origin of the symbol.

I totally agree that using the leading underscore is wrong. If you are in love with the underscore, put it at the end instead of the beginning. Prefer foo_ to _foo. Because, as was pointed out, the leading underscore symbols are reserved by the implementation. I personally prefer putting the enum symbols in a class or namespace to resolve collisions.

I really don't think this is bad style at all. In fact, it shows a maturity of understanding when to use macros and for what purpose. We want to avoid declaring something in two places...the text "bird" would appear in multiple places in the source code for the purposes of symmetric declarations were it not for the pre-processing. And currently, using the built-in preprocessor is the only way we have within the language to accomplish such a thing. It's standard C++, which is good.

The only reason to use this code is to generate both a compile-time symbol, and a compile time string literal at the same time. If the language had the ability to declare something as both a string literal and a symbol in the same breath, we'd use it, but the only way seems to be to use preprocessing of some sort.