In C++ you can’t create new operators to overload; you can only overload the operators that are already defined in the language. Often, an exponentiation operator is cited as an example of a new operator that you might want to have, so something like this isn’t possible.

Previously I’ve written about the performance implications of passing and returning values. It turns out that it needn’t be too expensive and that passing temporaries or initialising objects from function returns need not entail object copies.

Here’s a test class that might represent some sort of number. As discussed, here’s one possible implementation of operator+.

Num operator+( Num l, const Num& r )
{
return l += r;
}

It’s probably about the shortest implementation in number of characters and, superficially, it looks quite good on keeping object copies to a minimum. There’s a single return statement so the compiler should be able to use the ‘return value optimization’ to avoid the copy on return. We also need to copy (or at least construct a new object) at least once, and that happens when we copy the parameter l.

There are however some subtleties.

Here’s a function that uses operator+ in a reasonably simple way that shouldn’t add any extra copies:

I’ll show the assembler for the calling routine and for operator+ so that we can account for all the copies necessary in calling operator+. I’ve ‘demangled’ the function names so while it’s no longer valid assembler, it is at least human-readable. I also took the liberty of pruning some alignment directives and exception handling labels. They’re very important, but not relevant to this discussion.

Well, we have a second copy inside the operator+, in addition to the one outside the function for constructing the first parameter. This isn’t ideal. We know that we want to just return the first parameter as the return value, but because the caller arranged for the parameter copy and they don’t know that we want it constructed in the return value we can’t avoid this extra copy.
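The two copies can also be observed without reading assembler. What follows is a minimal sketch rather than the original test code: Num here is a hypothetical stand-in whose copy constructor bumps a counter, and copy_count and CopiesForByValuePlus are names made up for the illustration. Because this Num declares its own copy constructor, it has no implicit move constructor, which matches the copy-only setting discussed here.

```cpp
#include <cassert>

// Illustrative counter; not part of the original test class.
static int copy_count = 0;

// Hypothetical stand-in for the Num test class.
class Num
{
public:
    explicit Num( int v ) : value_( v ) {}
    Num( const Num& other ) : value_( other.value_ ) { ++copy_count; }
    Num& operator+=( const Num& r ) { value_ += r.value_; return *this; }
    int value() const { return value_; }
private:
    int value_;
};

// First parameter by value: the caller makes one copy to build l ...
Num operator+( Num l, const Num& r )
{
    // ... and a second copy happens here, because the expression
    // l += r is a Num& and not the parameter l itself.
    return l += r;
}

// Returns the number of copies made by a single a + b, or -1 if the
// sum itself came out wrong.
int CopiesForByValuePlus()
{
    Num a( 1 ), b( 2 );
    copy_count = 0;
    int sum = ( a + b ).value();
    return sum == 3 ? copy_count : -1;
}
```

With named operands on both sides, CopiesForByValuePlus() reports two copies: one for the parameter and one for the return value.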

Perhaps we should go back to taking the first parameter by const reference again so that we can at least control where we make the copy.

OK, we’ve still got two copies going on. First the new temporary is constructed from l, then, after calling +=, we’re copying the return value again. What’s going on here?

Look carefully at the definition of operator+=. It returns a reference to a Num. We know it returns a reference to its left hand operand, or it really ought to, but there’s not anything in the function signature that actually specifies this. For all the information in the signature, it might return a reference to a completely unrelated Num and the compiler is not allowed to ‘take a chance’.

We know that we just want to return the copy of the parameter that we just made, so let’s make that explicit.

That’s better! We just have the single copy. In the use of operator+ we’ve only had to construct one new Num object which is as minimal as it gets, having been passed two const references to the incoming Num values.
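The difference between returning the result of += and returning the named local can be sketched with a copy-counting class. This is an assumption-laden illustration, not the original code: Num is a hypothetical stand-in, copy_count is an invented counter, and the two variants are given ordinary function names (PlusReturningReference and PlusReturningLocal) only so that both can live in one file.

```cpp
#include <cassert>

// Illustrative counter; not part of the original test class.
static int copy_count = 0;

// Hypothetical stand-in for the Num test class.
class Num
{
public:
    explicit Num( int v ) : value_( v ) {}
    Num( const Num& other ) : value_( other.value_ ) { ++copy_count; }
    Num& operator+=( const Num& r ) { value_ += r.value_; return *this; }
    int value() const { return value_; }
private:
    int value_;
};

// Returns whatever += refers to, so the result is copied out of tmp.
Num PlusReturningReference( const Num& l, const Num& r )
{
    Num tmp( l );
    return tmp += r;
}

// Returns the named local itself, which allows the compiler to build
// tmp directly in the caller's return slot. This optimization is
// permitted rather than required.
Num PlusReturningLocal( const Num& l, const Num& r )
{
    Num tmp( l );
    tmp += r;
    return tmp;
}

// Counts the copies made by one call, or returns -1 on a wrong sum.
int CopiesFor( Num (*plus)( const Num&, const Num& ) )
{
    Num a( 1 ), b( 2 );
    copy_count = 0;
    int sum = plus( a, b ).value();
    return sum == 3 ? copy_count : -1;
}
```

CopiesFor( PlusReturningReference ) always reports two copies. CopiesFor( PlusReturningLocal ) reports one when the compiler applies the named return value optimization, which mainstream compilers normally do.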

The answer is that it calls the global function f. The reason (or ‘problem’) is that in the expression “f();”, the identifier f is a non-dependent name. It does not depend on the template parameter, so lookup occurs at the point of definition of g, not at the point of instantiation of the template. Because the base class is dependent on the type of the template parameter it is not used to resolve f. The only visible f is, therefore, the global f.
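A small sketch of the situation, with assumed details: the functions return int purely so that the chosen f is observable, and the class names Base and Derived and the helper CallG are illustrative. It relies on the compiler implementing two-phase name lookup; a compiler running in a mode without it may behave differently.

```cpp
#include <cassert>

// The global f.
int f() { return 0; }

template< class T >
struct Base
{
    // Not considered when the unqualified call in Derived is looked up.
    int f() { return 1; }
};

template< class T >
struct Derived : Base< T >
{
    // f is a non-dependent name, so it is bound to ::f at the point
    // of definition of g, before Base< T > is known.
    int g() { return f(); }
};

// Instantiates the template and reports which f was called.
int CallG()
{
    Derived< int > d;
    return d.g();
}
```

Here CallG() returns 0, the value from the global f.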

To call the base class’ f, we just use a qualified identifier:

void g()
{
Base<T>::f();
}

As usual, this disables the virtual function mechanism and a direct call to the base class’ f is made.

How do we ensure that the correct virtual f is called? We need to call f through a pointer or reference which is dependent on the template parameter so that the base class is used in the lookup of f at the point of instantiation.

This means something like:

void g()
{
Base<T>& basethisref = *this;
basethisref.f();
}

or

void g()
{
Base<T>* basethis = this;
basethis->f();
}

It’s actually enough to just do this:

void g()
{
this->f();
}

“this” is type-dependent because g is a member function of a class template, so “this->f()” is a type-dependent expression because it has “this” as a type-dependent subexpression.
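The difference between the qualified call and the call through this shows up once a further-derived class overrides f. The following sketch uses assumed names (MoreDerived, g_qualified, g_this and the two helper functions) and int return values so that the result can be checked.

```cpp
#include <cassert>

template< class T >
struct Base
{
    virtual ~Base() {}
    virtual int f() { return 1; }
};

template< class T >
struct Derived : Base< T >
{
    // Qualified name: a direct call to Base< T >::f, no virtual dispatch.
    int g_qualified() { return Base< T >::f(); }

    // Type-dependent expression: looked up at instantiation and
    // dispatched virtually.
    int g_this() { return this->f(); }
};

// Hypothetical further-derived class that overrides f.
struct MoreDerived : Derived< int >
{
    int f() { return 2; }
};

int CallQualified()   { MoreDerived d; return d.g_qualified(); }
int CallThroughThis() { MoreDerived d; return d.g_this(); }
```

CallQualified() yields 1 (the base implementation), while CallThroughThis() yields 2 (the override).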

The def(), copy(), assign() and destroy() functions are just used to track how many class instances are created and when. They are marked as ‘no throw’ to keep the example simple, but the possibility of having a copy constructor throw does have implications. The heavy use of extern “C” in the examples is just to make the function names in the generated assembler as simple as possible.

If you’re not fluent in x64 assembler then the important things to watch for are the calls to the extern functions which track how many objects are created and also what order copying and function calls happen in.

Here, we can see that if we used an unnamed temporary passed directly into the function taking an object by value then only one object is created and there is no copying. If we have a named object that we pass in then, as you might expect, a copy is made. Note that even though we don’t actually do anything with the copy after the function has returned, the compiler can’t skip the copy as our undefined extern “C” functions may have side effects that can’t be ignored.
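The same observation can be made with a counter instead of assembler. This is only a sketch: the copies counter stands in for the copy() tracking hook, and TakeValue and the two helper functions are illustrative names. C++17 guarantees that the temporary is constructed directly as the parameter; earlier standards merely permit it, although compilers generally did so anyway.

```cpp
#include <cassert>

// Illustrative counter standing in for the copy() tracking hook.
static int copies = 0;

struct TestClass
{
    TestClass() : tag( 7 ) {}
    TestClass( const TestClass& other ) : tag( other.tag ) { ++copies; }
    int tag;
};

// Takes its argument by value and uses it.
int TakeValue( TestClass t ) { return t.tag; }

int CopiesPassingTemporary()
{
    copies = 0;
    TakeValue( TestClass() );   // the temporary becomes the parameter
    return copies;
}

int CopiesPassingNamed()
{
    copies = 0;
    TestClass named;
    TakeValue( named );         // the parameter is a copy of named
    return copies;
}
```

Passing the unnamed temporary costs no copies; passing the named object costs exactly one.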

But in the called function itself is there any copying or clean up of the temporary that needs to be done?

Let’s see. We test a function that passes a pointer to the passed object to an undefined function to ensure that the compiler believes that the object really is used and can’t be optimized out of existence.

Wow. We just jump (that’s goto, not a function call) to the function that takes the pointer. If you wanted to consult the documentation for x86_64 (aka amd64) calling conventions on Linux you would find the following information about passing objects by value. Small objects (up to 16 bytes in size) are passed in one or two registers, or – after the registers assigned for parameters are all allocated – on the stack. For objects of a class type with a non-trivial copy constructor or destructor, such as our test class, the caller allocates temporary space for them and passes a pointer to the temporary. (Larger objects that are trivially copyable are instead copied into the argument area on the stack.) The caller is then responsible for cleaning up the temporary after the function call. In other words, the state of the registers and stack on entry and exit of a function which takes such an object by value is exactly the same as it is for a function which takes a pointer to such an object. (It is also exactly the same for a function taking an object by reference. Once you get to assembler, references are pointers.)

This (and the ‘jmp TakePointer’) proves that passing by value costs exactly the same as constructing a temporary and passing a pointer or a reference to it.

OK, so what about returning a value, what does this cost?

Here we have two functions that return an object by value. One returns it directly and the other uses a local variable, mutates it and then returns it.

Look, no copies! So what’s going on here? x64 functions can use up to two registers for the return value: rax and rdx. For objects larger than 16 bytes, the caller must allocate space for the return value and pass a pointer to the allocated space in register rdi. The pointer passed in rdi is returned to the caller in rax. (rdi does not have to be preserved by the called function, so by returning the pointer to the caller in rax the caller is relieved of the need to save this value in an alternative preserved register or on the stack.) The other thing to note is that rbx belongs to the calling function and must be preserved by the called function.

So here’s what happens in the first case. A pointer to space for the return value comes into the MakeValue function in rdi. MakeValue pushes rbx onto the stack so that it can restore the contents of rbx when it returns. It then saves the pointer value in rdi into rbx. This ensures that when it calls other functions which may overwrite rdi, it still has a copy of the pointer to the allocated space for the return value. MakeValue then calls the default constructor for TestClass. It then copies the return value pointer (saved in rbx) into rax (the return value register) and restores the original value of rbx that it saved on the stack.

In the second case much the same happens. Indeed, the only different instructions are a movq where the pointer value parameter for TakeReference is set up and the call to TakeReference itself. Despite the fact that we had a named local variable, the storage used for this variable was the space allocated for the return value by the caller and no copy was made for the return statement. This is the “named return value optimization” in action.

The sub instruction allocates stack space for the object by moving the stack pointer down 32 bytes, the constructor is called inside the MakeValue function, then the stack pointer is also used as parameter to the TakeReference function and to the destructor. No copies here, either.
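A counting sketch of the two returning functions, under stated assumptions: the copies counter replaces the tracking hooks, TakeReference is given a trivial body here although it is an undefined external function in the discussion above, and MakeValueNamed is an invented name for the variant with a named local. The named return value optimization is permitted rather than guaranteed, so the second count may in principle be one.

```cpp
#include <cassert>

// Illustrative counter standing in for the copy() tracking hook.
static int copies = 0;

struct TestClass
{
    TestClass() : tag( 0 ) {}
    TestClass( const TestClass& other ) : tag( other.tag ) { ++copies; }
    int tag;
};

// Stand-in body for the function that mutates the object.
void TakeReference( TestClass& t ) { t.tag = 42; }

// Returns a temporary directly.
TestClass MakeValue()
{
    return TestClass();
}

// Uses a named local, mutates it, then returns it.
TestClass MakeValueNamed()
{
    TestClass t;
    TakeReference( t );
    return t;
}

// Each helper reports the copies made, or -1 if the value is wrong.
int CopiesReturningTemporary()
{
    copies = 0;
    TestClass t = MakeValue();
    return t.tag == 0 ? copies : -1;
}

int CopiesReturningNamed()
{
    copies = 0;
    TestClass t = MakeValueNamed();
    return t.tag == 42 ? copies : -1;
}
```

With the optimizations described above in effect, both helpers report zero copies.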

Excellent, passing and returning values needn’t imply any unnecessary copies, so implementing operator+ like this is fully optimal:

Unfortunately, this doesn't work as std::bind2nd ends up trying to create a reference to a reference.

Luckily tr1 can come to our rescue... the bind here is that tr1::bind returns a function object that isn't derived from std::unary_function, so we can't pass it straight to std::not1, we have to bind it to std::logical_not.
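For readers on C++11 or later, the same idea can be sketched with std::bind in place of tr1::bind. This is a swapped-in illustration, not the original code: IsBelow, CountNotBelow and ExampleCountNotBelow are hypothetical names, and IsBelow takes its second argument by const reference, which is the kind of signature that trips up std::bind2nd.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <functional>
#include <vector>

// Hypothetical predicate taking its second argument by reference.
bool IsBelow( int value, const int& limit ) { return value < limit; }

// Counts the elements for which IsBelow( element, limit ) is false.
std::ptrdiff_t CountNotBelow( const std::vector< int >& values, int limit )
{
    using namespace std::placeholders;
    // The inner bind fixes the second argument; the outer bind feeds
    // its result to std::logical_not to negate the predicate.
    return std::count_if( values.begin(), values.end(),
        std::bind( std::logical_not< bool >(),
                   std::bind( IsBelow, _1, limit ) ) );
}

// Usage: of 1, 5 and 10, two values are not below 5.
std::ptrdiff_t ExampleCountNotBelow()
{
    const int raw[] = { 1, 5, 10 };
    std::vector< int > values( raw, raw + 3 );
    return CountNotBelow( values, 5 );
}
```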

Classes with value semantics are a very important, err, class of classes. Frequently used as the building blocks for other code, it is important that they are tested and behave as expected.

The following function templates are designed to make testing the basic properties of value classes as simple as possible. They are designed to be used with a simple C++ testing framework such as hshgtest.

The tests are designed to test classes that are default constructible and have a number of properties each of which affect the value of the class. Changing any of the properties away from its default value should change the value of the class. Copy construction and assignment should preserve the value of the class in the expected way and value classes may have a serialization mechanism which should preserve the value of the class through an intermediate ‘flat’ form, typically a byte sequence.

To test a particular class, it is required to specialize the following function template:

template< class T >
void FillTestVector( std::vector< T >& vec );

The implementation should populate vec with any number of class instances each of which must differ from a default constructed instance of the class. Typically one would create as many instances as the class has distinct properties and make each instance differ from a default constructed instance by perturbing just one property from its default value. This maximizes the tests’ coverage.

The “Not Equal” Test

The first set of tests is a basic sanity check of operators == and != and of the set of values chosen for the FillTestVector specialization. Instances should compare equal to themselves and non-default instances should compare unequal to default constructed instances.
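As a rough sketch of how this fits together, here is a FillTestVector specialization for a hypothetical value class and one possible shape for the check. Point and NotEqualTest are assumptions made for the illustration, and the check returns a bool instead of using any particular testing framework's macros.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Primary template from the text; each class under test specializes it.
template< class T >
void FillTestVector( std::vector< T >& vec );

// Hypothetical value class with two properties.
struct Point
{
    Point() : x( 0 ), y( 0 ) {}
    int x;
    int y;
};

bool operator==( const Point& l, const Point& r )
{
    return l.x == r.x && l.y == r.y;
}

bool operator!=( const Point& l, const Point& r ) { return !( l == r ); }

// One instance per property, each perturbing just that property.
template<>
void FillTestVector< Point >( std::vector< Point >& vec )
{
    Point px; px.x = 1; vec.push_back( px );
    Point py; py.y = 1; vec.push_back( py );
}

// Assumed shape of the check; returns false on the first failure.
template< class T >
bool NotEqualTest()
{
    std::vector< T > vec;
    FillTestVector( vec );
    const T def = T();
    if ( !( def == def ) || def != def ) return false;
    for ( std::size_t i = 0; i != vec.size(); ++i )
    {
        // Every instance equals itself ...
        if ( !( vec[i] == vec[i] ) || vec[i] != vec[i] ) return false;
        // ... and differs from a default constructed instance.
        if ( vec[i] == def || !( vec[i] != def ) ) return false;
    }
    return true;
}
```

If operator== for Point forgot to compare y, the second test value would compare equal to the default and NotEqualTest< Point >() would return false.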

The Serialization Test

This test is a little more flexible. There are numerous ways of streaming a class and numerous representations of a ‘flat’ serialized class. The serialization and deserialization functions are abstracted into two class templates, Freezer and Thawer.

The two function templates, Freeze and Thaw, serialize and deserialize the class under test (T) through an intermediate flat data class (S) using the Freezer and Thawer class templates. This test requires a specialization of Freezer and Thawer to specify how the serialization is performed.

Here is a partial specialization which uses standard insertion and extraction operators with std::stringstream classes to serialize the class under test to and from a std::string.
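A minimal sketch of what such a partial specialization might look like. The exact interface of Freezer and Thawer is assumed here: the static member functions, the order of the template parameters and the RoundTrips helper are all illustrative choices.

```cpp
#include <cassert>
#include <sstream>
#include <string>

// Assumed primary templates: T is the class under test, S the flat form.
template< class T, class S > struct Freezer;
template< class T, class S > struct Thawer;

// Any T with operator<< can be frozen into a std::string.
template< class T >
struct Freezer< T, std::string >
{
    static std::string Freeze( const T& t )
    {
        std::ostringstream out;
        out << t;
        return out.str();
    }
};

// Any default constructible T with operator>> can be thawed again.
template< class T >
struct Thawer< T, std::string >
{
    static T Thaw( const std::string& s )
    {
        std::istringstream in( s );
        T t;
        in >> t;
        return t;
    }
};

// S is given explicitly, T is deduced: Freeze< std::string >( t ).
template< class S, class T >
S Freeze( const T& t ) { return Freezer< T, S >::Freeze( t ); }

// T is given explicitly, S is deduced: Thaw< T >( s ).
template< class T, class S >
T Thaw( const S& s ) { return Thawer< T, S >::Thaw( s ); }

// The property under test: freezing then thawing preserves the value.
template< class T >
bool RoundTrips( const T& t )
{
    return Thaw< T >( Freeze< std::string >( t ) ) == t;
}
```

Note that a round trip through operator>> only works for types whose text form can be read back unambiguously; a std::string containing spaces, for example, would not survive it.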

Testing a new property is as simple as adding more test values to the test vector in the FillTestVector specialization. Testing that the property contributes to the equality operator, that the property is appropriately preserved in copy construction, assignment and serialization happens automatically.

Address Space Monitor 0.6 has been released. In true ‘alpha’ quality style it has gained many features, while probably being no more stable than the previous release.

The major new feature is virtual address space recording which enables you to record the address space usage of a program over time and later analyse the recording and compare it with other runs.

Address Space Monitor now comes bundled with a number of command line utilities for manipulating recordings and extracting statistics from them, but for now these are completely undocumented so you should not rely on their interface (or indeed their existence) in future releases.

As always, the installer and source are both available here and as before the license is an MIT license variant.

For many situations the use of RTTI in C++ is unnecessary and it can often be an indicator of poor design. There are, however, some circumstances in which a dynamic_cast can make things cleaner and reduce dependencies.

For many classes an equality operator is desirable, and its implementation often passes through to its members in a natural way. However, some classes may hold pointers (or auto_ptr) to base types because they require a member to have polymorphic behaviour, but that member should still have value semantics.

The problem, then, is how to compare two objects through pointers to their common base class. The result of the comparison should be the result of comparing the derived objects if the derived types are the same and false otherwise.

Note how we can use template argument deduction so that we don’t have to explicitly specify ‘Derived1’. If you fancy, you can use this to make a macro and just put the macro in each class declaration. This is one step beyond what I’d do, though.
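One way the dynamic_cast approach might be sketched is shown here. The signatures are assumptions: the virtual function is called equal, IsEqual takes the derived object first so that Derived can be deduced, and Same is an invented helper for comparing through base references. Note that dynamic_cast also succeeds when the other object is of a class further derived from Derived; a stricter check could compare typeid results instead.

```cpp
#include <cassert>

struct Base
{
    virtual ~Base() {}
    virtual bool equal( const Base& other ) const = 0;
};

// Derived is deduced from the first argument, so callers never have
// to spell it out.
template< class Derived >
bool IsEqual( const Derived& self, const Base& other )
{
    const Derived* p = dynamic_cast< const Derived* >( &other );
    return p != 0 && self == *p;
}

struct Derived1 : Base
{
    explicit Derived1( int v ) : value( v ) {}
    bool operator==( const Derived1& r ) const { return value == r.value; }
    virtual bool equal( const Base& other ) const
    {
        return IsEqual( *this, other );
    }
    int value;
};

struct Derived2 : Base
{
    explicit Derived2( int v ) : value( v ) {}
    bool operator==( const Derived2& r ) const { return value == r.value; }
    virtual bool equal( const Base& other ) const
    {
        return IsEqual( *this, other );
    }
    int value;
};

// Compares two objects known only through their common base class.
bool Same( const Base& l, const Base& r ) { return l.equal( r ); }
```

Same( Derived1( 1 ), Derived1( 1 ) ) is true, while Same( Derived1( 1 ), Derived2( 1 ) ) is false because the cast fails.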

We retain the IsEqual template function for ease of use in the simple case, but provide a template struct with an operator() and a parameterizable equality functor for which there is an obvious default in the standard C++ library.

What’s the alternative without RTTI? Unfortunately it’s a classic double dispatch problem for which there are no universally neat solutions. Here’s what I came up with. It’s not nice.

The basic strategy is that comparing two base class references calls the virtual equal function on one of them, with the other as the parameter. The virtual function mechanism now ensures that we can use a this pointer with the correct derived type in the derived equal function; the other parameter still has a type of base reference, though. To promote this to the derived type we call a virtual helper function (iseq) on the base reference, which is overloaded for all the different derived types. This ensures that we can now promote the other parameter to a derived pointer through a virtual function without losing information about the type of the first reference. When the two derived types differ, the base implementation (return false) is the correct behaviour. Only the iseq overload for the matching derived class needs to be overridden in each derived class. It can then call the appropriate equality operator for two derived instances.

The big ugliness is the requirement for a virtual function in the base class for each derived type which is a nasty reversed coupling between base and derived classes.
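The double dispatch described above might be sketched as follows. The function names equal and iseq come from the text; the class layout, the int members and the Same helper are assumptions made for the illustration.

```cpp
#include <cassert>

struct Derived1;
struct Derived2;

struct Base
{
    virtual ~Base() {}
    // First dispatch: resolves the dynamic type of the left operand.
    virtual bool equal( const Base& other ) const = 0;
    // Second dispatch: one overload per derived type, false by default.
    // This is the reversed coupling between base and derived classes.
    virtual bool iseq( const Derived1& ) const { return false; }
    virtual bool iseq( const Derived2& ) const { return false; }
};

struct Derived1 : Base
{
    explicit Derived1( int v ) : value( v ) {}
    // *this is known to be a Derived1 here, so overload resolution
    // picks iseq( const Derived1& ) on the other object.
    virtual bool equal( const Base& other ) const { return other.iseq( *this ); }
    using Base::iseq;
    virtual bool iseq( const Derived1& d ) const { return value == d.value; }
    int value;
};

struct Derived2 : Base
{
    explicit Derived2( int v ) : value( v ) {}
    virtual bool equal( const Base& other ) const { return other.iseq( *this ); }
    using Base::iseq;
    virtual bool iseq( const Derived2& d ) const { return value == d.value; }
    int value;
};

// Compares two objects known only through their common base class.
bool Same( const Base& l, const Base& r ) { return l.equal( r ); }
```

When the dynamic types differ, the call lands in the base implementation of iseq and the comparison is false without any cast.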

For a function template specialization you can omit any trailing template arguments that can be deduced; if all template arguments can be deduced you can just use the unadorned template name in the declaration of the specialization.

Similarly, for explicit instantiations of function templates you can just use the template name where the arguments can be deduced.
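Both forms can be illustrated with a short sketch. TypeCode and Twice are made-up function templates used only to show the syntax.

```cpp
#include <cassert>

template< class T >
int TypeCode( T ) { return 0; }

// Explicit specialization with the unadorned template name: T is
// deduced as int from the parameter type.
template<>
int TypeCode( int ) { return 1; }

// The same with an empty template argument list: T is deduced as char.
template<>
int TypeCode<>( char ) { return 2; }

// Fully spelled out, for comparison.
template<>
int TypeCode< double >( double ) { return 3; }

template< class T >
T Twice( T x ) { return x + x; }

// Explicit instantiation of Twice< long >, with T deduced from the
// function signature.
template long Twice( long );
```

A call such as TypeCode( 1.5f ) still uses the primary template, since no specialization exists for float.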

This applies to both C and C++, although the details are a little different in the more obscure corners. Consider this C++ snippet which is designed to position a streambuf somewhere a bit before its end:

sb is a std::streambuf
thing.GetSize() returns an unsigned int
pubseekoff takes a streambuf::off_type which is some sort of signed type for representing stream offsets.

Here’s what happened on my implementation.

128 is an integer constant, and fits in an int, so it’s an int.
After integral promotions, it’s still an int (phew!).

-128 is, therefore, an int.

thing.GetSize() is an unsigned int, so -128 gets converted to an unsigned int for the binary ‘-’ operation. (Uh-oh!)

Finally this unsigned int gets converted into whatever streambuf::off_type is. If off_type behaved like a signed int then (implementation defined!) you might get the conversion to wrap around back to the negative numbers to a value that you were expecting in the first place.

On the other hand, off_type might act like a 64-bit signed integer type, with unsigned int being a 32-bit unsigned type. In this case, the very large positive unsigned 32-bit number that you were hoping was actually a reasonably small negative number will quite happily stay large and positive in the conversion. The function call then ends up trying to seek 3.99 GBytes beyond the end of the stream. Whoops.
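The conversion chain can be reproduced in isolation. This sketch makes two assumptions: the offset is computed as -128 minus the unsigned size, and long long stands in for a 64-bit off_type. WrappedOffset and SignedOffset are illustrative names.

```cpp
#include <cassert>

// -128 is an int and size is an unsigned int, so -128 is converted to
// unsigned int and the subtraction wraps around in unsigned arithmetic.
// Only then is the result widened to the signed 64-bit type.
long long WrappedOffset( unsigned int size )
{
    return -128 - size;
}

// Converting the size to the wider signed type first keeps the whole
// computation signed.
long long SignedOffset( unsigned int size )
{
    return -128 - static_cast< long long >( size );
}
```

With a 32-bit unsigned int, WrappedOffset( 1000u ) is 4294966168, just under 4 GBytes, whereas SignedOffset( 1000u ) is the intended -1128.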