More ways to map memory

The most elegant and pleasing notation might not be the most efficient one.

This is the third in a series of articles on accessing memory-mapped device registers using C and C++. In my previous column (November 2004), I discussed how small variations in memory-mapping techniques might lead to differences in the efficiency of compiled code. My column generated some feedback from readers that helped me to refine my observations. This month, I'll share that feedback with you and show you a few more variations on the theme.

Where we were
In Standard C and C++, you typically access a memory-mapped device register by dereferencing a pointer whose value is the register's address. You can define a pointer to a memory-mapped device register either as a macro or as a constant object.

As in my previous columns, I'll use an example from the ARM Evaluator-7T single-board computer. The board's documentation refers to the device registers as special registers, so I do, too. The Evaluator-7T's memory is byte-addressable, but each special register occupies a four-byte word. Special registers are also volatile, so I define the type for special registers as:

typedef unsigned int volatile special_register;

The Evaluator-7T uses five special registers to control the two integrated timers, which I represent as a struct defined as:
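A minimal sketch of such a struct follows. Only TMOD is named in this column; the other four member names (TDATA0, TDATA1, TCNT0, TCNT1) and their order are assumptions for illustration, so check them against the board's documentation. The compile-time checks only confirm that the five registers occupy consecutive words with no padding.

```cpp
#include <cstddef>   // offsetof

typedef unsigned int volatile special_register;

// Assumed layout: one mode register, then data and count registers.
// Only TMOD appears in the text; the remaining names are illustrative.
struct dual_timers
{
    special_register TMOD;    // timer mode (holds the enable bits)
    special_register TDATA0;  // assumed name
    special_register TDATA1;  // assumed name
    special_register TCNT0;   // assumed name
    special_register TCNT1;   // assumed name
};

// The five registers should pack into consecutive words with no padding.
static_assert(sizeof(dual_timers) == 5 * sizeof(special_register),
              "unexpected padding in dual_timers");
static_assert(offsetof(dual_timers, TMOD) == 0,
              "TMOD must sit at the base address");
```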

The timer registers on the Evaluator-7T reside at address 0x03FF6000. A program can access the timer registers via a pointer defined as a macro, as in:

#define timers ((dual_timers *)0x03FF6000)

or as a constant object, as in:

dual_timers *const timers = (dual_timers *)0x03FF6000;

The TMOD register contains bits that you can set to enable a timer and clear to disable a timer. You can define the masks for those bits as enumeration constants:

enum { TE0 = 0x01, TE1 = 0x08 };

Then, for example, you can disable both timers using:

timers->TMOD &= ~(TE0 | TE1);
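To see what the mask arithmetic does, here's a small sketch that takes the register block as a parameter, so it can run against an ordinary object on a desktop machine as well as against the real address. The function names disable_both and enable are hypothetical, and the placeholder array stands in for the four registers not discussed here.

```cpp
typedef unsigned int volatile special_register;

struct dual_timers
{
    special_register TMOD;
    special_register other[4];  // placeholder for the remaining four registers
};

enum { TE0 = 0x01, TE1 = 0x08 };

// Clear both enable bits and leave every other TMOD bit as it was.
void disable_both(dual_timers *timers)
{
    timers->TMOD &= ~(TE0 | TE1);
}

// Set the enable bit(s) given by mask; pass TE0, TE1, or both.
void enable(dual_timers *timers, unsigned int mask)
{
    timers->TMOD |= mask;
}
```

If TMOD starts out as 0xFF, disable_both leaves it at 0xF6: only bits 0 and 3 change.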

When it's defined as a macro, timers expands to an rvalue expression. When it's defined as a constant pointer, timers is an lvalue. An lvalue is an expression that designates an object. An rvalue is an expression that is not an lvalue. Since an rvalue does not necessarily refer to an object, compilers may be able to avoid generating data storage for rvalue expressions. Avoiding generating storage for lvalues is harder, but not impossible.

In C, constant objects declared at global scope have external linkage by default, as if they were declared with the keyword extern. This means that references to timers may appear in other translation units, and a C compiler must generate storage for timers just in case such external references exist.

In C++, constant objects declared at global scope have internal linkage by default, as if they were declared with the keyword static. This means that all references to timers must appear in the same translation unit as the definition for timers. In that case, the compiler might be able to determine that it doesn't need to generate the storage for the constant pointer. A C compiler should also be able to eliminate the storage for the constant pointer if you define it with the keyword static.
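The linkage rules above can be spelled out explicitly. This sketch is written as C++; the three pointer names are hypothetical, and address_of is a helper added only so the values can be inspected. None of the pointers is dereferenced, so it is safe to compile and run on a machine that has no timers at that address.

```cpp
#include <cstdint>

struct dual_timers;  // the layout doesn't matter for linkage

// C++: const at global scope means internal linkage, as if declared static.
// Compiled as C, this same line would have external linkage instead.
dual_timers *const timers_default = (dual_timers *)0x03FF6000;

// Internal linkage in both C and C++.
static dual_timers *const timers_static = (dual_timers *)0x03FF6000;

// C++: extern with an initializer is still a definition, but with external
// linkage, which matches what C does by default.
extern dual_timers *const timers_extern = (dual_timers *)0x03FF6000;

// Helper for inspection only: the pointer's value as an integer.
std::uintptr_t address_of(dual_timers *p)
{
    return reinterpret_cast<std::uintptr_t>(p);
}
```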

I wrote a number of small programs to test whether real compilers generated code for memory-mapped I/O as I just described. The first test program appears in Listing 1. It defines timers as a macro. I compiled the program as both C and C++ using four different compilers for the Intel x86. In all cases, the compiler generated code that used immediate operands for the pointer values and didn't generate a copy of the constant pointer in the data space.

Listing 1: A little test to see how the compiler generates code to access memory-mapped device registers

In the second test program, I replaced the macro with a constant pointer, defined as:

dual_timers *const timers = (dual_timers *)0x03FF6000;

I found that the C compilers invariably generated a copy of the constant pointer in the data space, but the C++ compilers did not.

In the third program, I added the keyword static to the constant pointer definition:

static dual_timers *const timers = (dual_timers *)0x03FF6000;

This had no impact on the code generated by the C++ compilers. (Last time, I reported that this produced a change for one compiler. I've since reviewed my results and found I was mistaken.) The C compilers should have been able to exploit this change to produce better code, but only one compiler actually took advantage of it.

A missed observation
Last time, I wrote that:

"In C, constant objects . . . declared at global scope have external linkage by default. That is, they behave as if they had been declared with the keyword extern . . . . This means that references to timers may appear in other translation units and a C compiler must generate storage for timers just in case such external references exist. In theory, the linker might be able to determine that no external references exist and eliminate the storage for timers, but I don't know of a linker that does." (November 2004, p. 48)

Dave Baker (davidabaker@gmail.com) wrote that he uses a C compiler that discards unused objects at link time. I realized that, in doing my analysis last time, I had looked only at the generated assembly code, not at the linked executable programs. When I looked at the link maps for each test program, I found that one of the C compilers I had tested came with a linker that also discarded unused pointers.

Using a local pointer
Thus far, all of the tested variations declare timers as a non-local name, either as a macro or a global. However, as a general rule, you should declare names in the smallest scope possible. So I defined timers as a constant pointer local to main, as in:
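A minimal sketch of a local constant pointer, under two assumptions: the pointer lives in a hypothetical function disable_both_local rather than in main, and it points at a stand-in object named fake_timers so the code can run on a host. On the board, the initializer would be (dual_timers *)0x03FF6000.

```cpp
typedef unsigned int volatile special_register;

struct dual_timers
{
    special_register TMOD;
    special_register other[4];  // placeholder for the remaining four registers
};

enum { TE0 = 0x01, TE1 = 0x08 };

// Host-side stand-in for the register block (an assumption for testing).
dual_timers fake_timers;

// Disables both timers through a pointer declared in the smallest scope
// that needs it, then returns the resulting TMOD value.
unsigned int disable_both_local()
{
    dual_timers *const timers = &fake_timers;  // local constant pointer
    timers->TMOD &= ~(TE0 | TE1);
    return timers->TMOD;
}
```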

Using this approach, two of the compilers (for both C and C++) generated the same code as they did when timers was defined as a macro. That is, the compiled code allocated nothing for the constant pointer and used immediate operands for the pointer value. A third compiler did much the same, but actually generated slightly shorter code than it did when timers was a macro.

Surprisingly, the fourth compiler generated noticeably poorer code when using a local pointer than it did when using a macro. The compiled C program allocated storage for the pointer on the stack and initialized the pointer at run time. The generated code also contained more instructions than did the code that resulted from using a macro.

When I defined timers as a statically allocated constant pointer local to main, as in:
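The statically allocated form differs from the plain local form only in the keyword static. This sketch again assumes a host-side stand-in object (fake_timers) and a hypothetical function name in place of main; on the board, the initializer would be (dual_timers *)0x03FF6000.

```cpp
typedef unsigned int volatile special_register;

struct dual_timers
{
    special_register TMOD;
    special_register other[4];  // placeholder for the remaining four registers
};

enum { TE0 = 0x01, TE1 = 0x08 };

// Host-side stand-in for the register block (an assumption for testing).
dual_timers fake_timers;

// Same operation as before, but the pointer has static storage duration:
// it is visible only inside the function, yet it is not a stack object.
unsigned int disable_both_static_local()
{
    static dual_timers *const timers = &fake_timers;
    timers->TMOD &= ~(TE0 | TE1);
    return timers->TMOD;
}
```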

I got the same code as when the pointer declaration appeared at global scope. I found this to be true for all C and C++ compilers that I tested.

Using references instead of pointers
References in C++ provide many of the same capabilities as pointers. A reference, like a pointer, is an object that you can use to refer indirectly to another object. The difference between pointers and references is that you must use an explicit operator (the * operator) to dereference a pointer, but you don't use an operator to dereference a reference. A reference automatically dereferences when you access it. For example, if pt is a "pointer to T" pointing to object x of type T, then the expression *pt dereferences pt to refer to x. In contrast, if rt is a "reference to T" referring to x, then the expression rt, without any operators at all, dereferences rt to refer to x.

A reference is essentially a const pointer (not pointer to const!) that's automatically dereferenced each time it's used. You can always rewrite code that uses references as code that uses constant pointers. For example, a reference declaration such as:

int &ri = i;

is equivalent to a pointer declaration such as:

int *const pi = &i;

An assignment to the reference, as in:

ri = 4;

is equivalent to an assignment to the explicitly dereferenced pointer, as in:

*pi = 4;
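Putting the two forms side by side in one small sketch shows that both aliases designate the same object. The function names here are hypothetical and exist only so the result can be checked.

```cpp
// Writes to i through a reference and then through a constant pointer,
// and returns the final value of i.
int write_through_aliases()
{
    int i = 0;
    int &ri = i;         // reference: bound once, dereferenced implicitly
    int *const pi = &i;  // constant pointer: bound once, dereferenced with *

    ri = 4;              // stores 4 in i
    *pi += 1;            // stores 5 in i through the pointer
    return i;
}

// Taking the address of the reference yields the address of i itself,
// which is also the value held by the pointer.
bool same_object()
{
    int i = 0;
    int &ri = i;
    int *const pi = &i;
    return &ri == pi;
}
```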

In C++, you can use a reference to refer to a memory-mapped device register. For example:

dual_timers &timers = *(dual_timers *)(0x03FF6000);

declares timers as a reference referring to the dual_timers object at location 0x03FF6000. Since a reference is automatically dereferenced when you use it in an expression, you don't use the -> operator with a reference as you do with a pointer. Rather, you use the . (dot) operator, as in:

timers.TMOD &= ~(TE0 | TE1);

I like the way that references make memory-mapped registers look like objects.
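Here's a sketch of the reference notation that can run on a host. It assumes a stand-in object named fake_timers in place of the real register block, and the function name is hypothetical; on the board, the reference would be bound to *(dual_timers *)(0x03FF6000) instead.

```cpp
typedef unsigned int volatile special_register;

struct dual_timers
{
    special_register TMOD;
    special_register other[4];  // placeholder for the remaining four registers
};

enum { TE0 = 0x01, TE1 = 0x08 };

// Host-side stand-in for the register block (an assumption for testing).
dual_timers fake_timers;

// On the board: dual_timers &timers = *(dual_timers *)(0x03FF6000);
dual_timers &timers = fake_timers;

// Disables both timers using the dot operator, then returns TMOD.
unsigned int disable_both_via_reference()
{
    timers.TMOD &= ~(TE0 | TE1);
    return timers.TMOD;
}
```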

I modified the test program to define timers as a reference at global scope and compiled it with each of my C++ compilers. Since a reference is implemented as a constant pointer, you might expect that each compiler would generate the same code as it did when timers was a constant pointer. That's almost what I got, but not quite.

When timers is a reference defined as:

dual_timers &timers = *(dual_timers *)(0x03FF6000);

all of the compilers allocate storage for timers as if it had been declared as a constant pointer, as in:

dual_timers *const timers = (dual_timers *)0x03FF6000;

with one difference. Remember that we're now working only in C++ because C doesn't support references. In C++, an object declared const at global scope has internal linkage. However, the reference is not declared const, so it has external linkage just like const objects at the global scope in C. Therefore, C++ code that declares timers as a reference actually generates code that's closer to what you get when you declare timers as a const pointer and compile it with C.

With one compiler, the C++ code using a global reference produced exactly the same machine code as when it compiled C code using a global const pointer. Another compiler chose to use the CPU registers differently, producing a couple more instructions than when using a const pointer in C. The other two compilers did something with the reference that really surprised me: they generated code that initialized the reference at run time rather than at compile time, resulting in bigger and marginally slower code.

When I instead defined timers as a reference local to main, all of the C++ compilers generated the same code as they did when timers was defined as a local constant pointer.

OK, so what's the new bottom line? From my limited sampling, it appears that using a pointer constant defined as a macro is the surest way to obtain the tightest code for accessing memory-mapped registers. However, declaring the pointer as a local constant might be better style than using a macro, and with most compilers, produces code that's just as good as when using a macro. In C++, using a reference instead of a pointer offers an appealing alternative notation for memory mapping, but with many compilers, it may incur minor performance penalties that don't occur with pointers.

For those of you who would like to try this experiment on your compilers, Listing 2 shows all of my test programs combined into one using conditional compilation statements. You can select a variant at compile time by defining the macro VER to the number of that variant.
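A cut-down sketch along those lines, covering only the pointer variants, might look like this. The variant numbers, the TIMER_BASE macro, the function name, and the host-side stand-in object are all assumptions for illustration and need not match Listing 2. To run it on the board, define TIMER_BASE as 0x03FF6000.

```cpp
typedef unsigned int volatile special_register;

struct dual_timers
{
    special_register TMOD;
    special_register other[4];  // placeholder for the remaining four registers
};

enum { TE0 = 0x01, TE1 = 0x08 };

#ifndef VER
#define VER 1   // hypothetical numbering; select 1 through 5 with -DVER=n
#endif

// Host-side stand-in so the sketch can run off-target (an assumption).
#ifndef TIMER_BASE
static dual_timers fake_timers;
#define TIMER_BASE (&fake_timers)
#endif

#if VER == 1                      // macro
#define timers ((dual_timers *)TIMER_BASE)
#elif VER == 2                    // global constant pointer
dual_timers *const timers = (dual_timers *)TIMER_BASE;
#elif VER == 3                    // global static constant pointer
static dual_timers *const timers = (dual_timers *)TIMER_BASE;
#endif

// Disables both timers using whichever definition of timers was selected,
// then returns the resulting TMOD value.
unsigned int disable_both()
{
#if VER == 4                      // local constant pointer
    dual_timers *const timers = (dual_timers *)TIMER_BASE;
#elif VER == 5                    // local static constant pointer
    static dual_timers *const timers = (dual_timers *)TIMER_BASE;
#endif
    timers->TMOD &= ~(TE0 | TE1);
    return timers->TMOD;
}
```

The reference variants are left out because they use the dot operator rather than ->, so the access statement itself would also have to be selected conditionally.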

Listing 2: All of my test programs combined into one using conditional compilation statements

Dan Saks is president of Saks & Associates, a C/C++ training and consulting company. Dan is co-author of C++ Programming Guidelines and co-developer of Suite++: The Plum Hall Validation Suite for C++. You can write to him at dsaks@wittenberg.edu and find him on the web at www.dansaks.com.