GENERIC : Traits on Steroids

The April installment of Generic1 discussed traits templates and traits classes. This article takes traits further by discussing traits objects and hierarchy-wide traits.

Traits are useful, but when do you need their unusual flexibility? If you want to use them, how can you avoid the drudgery of manually traits-enabling reams of classes in existing hierarchies? This article answers these questions in the context of the same SmartPtr example we used last time. In particular, watch for hierarchy-wide traits—a new, killer-cool C++ technique that allows you to define traits not only for individual types, but for entire hierarchies in a single shot.

BACK TO SMARTPTR The previous column presented a smart pointer that can be used with single-threaded code and multithreaded code, depending on how the client instantiates it. Recall SmartPtr's definition:

template >
class SmartPtr
{
...
};

The RefCountingTraits class template customizes SmartPtr to meet the exact reference counting syntax and semantics that type T uses. If you need to use SmartPtr with single-threaded code, RefCountingTraits will do. Otherwise, you must provide a separate traits class (MtRefCountingTraits) as the second template argument. MtRefCountingTraits ensures multithreading safety for the reference count manipulation.

Client code uses SmartPtr for single-threaded widgets and SmartPtr for multithreaded widgets. It would be that simple, were it not for the homework question at the end of the previous installment: What's still a potential source of inefficiencies in the multithreaded version of SmartPtr?

As many readers pointed out, the problem is that MtRefCountingTraits uses what's known as class-level locking. Herb Sutter puts it in a plastic way: Static lock, bad juju! (What a useful mnemonic.) Whenever you perform a serialized operation like MtRefCountingTraits::Refer, class-level locking locks all objects of type MtRefCountingTraits. This happens because lock_ is a static variable shared by all the instances of MtRefCountingTraits.

This phenomenon might become a major source of time inefficiencies if you have many threads that manipulate smart pointers to widgets intensively. Threads that could have been totally independent have to wait in line one after another whenever they copy SmartPtr objects.

Object-level locking is the technique that solves this problem. To use object-level locking, make lock_ a regular (nonstatic) member of MtRefCountingTraits. (That's the only change to MtRefCountingTraits needed.) This way serialized code locks each object independently. The downside of this approach is the increase in size that locks incur to each object. Let's implement object-level locking for smart pointers to Widgets.

TRAITS OBJECTS As soon as we try to graft object-level locking onto SmartPtr, we stumble upon a problem. Let's recall SmartPtr's destructor definition:

As you see, there's no RCTraits object at all; SmartPtr's destructor calls Unrefer with static member syntax (RCTraits::Unrefer). SmartPtr uses the RCTraits traits class as the mere depository of two static functions. Now the traits class has state, so we start talking about storing a traits object. The obvious place to store it is right in the SmartPtr object, so let's modify the code accordingly.

Now each SmartPtr object holds and uses a Lock object, which is exactly what's needed. Different SmartPtr objects belonging to different threads don't share any data and therefore don't have any operations to synchronize. Problem solved.

However, SmartPtr grew bigger. "Obviously," I hear you saying, "now each multithreaded SmartPtr has a Lock object, which is what we wanted in the first place." However, not only multithreaded SmartPtrs grew. The single-threaded SmartPtr is bigger, too, although it has no supplemental data at all (recall from the previous column that RefCountingTraits has no data members). Why? Because in C++ even empty objects have a nonzero size. This rule enables much of the language to remain sound (otherwise, for instance, how would you build arrays of zero-sized objects?).

As sensible as this rule might be, in this case it works against us. SmartPtr > is larger than a mere pointer to T, and it should not be. Now the size of the single-threaded SmartPtr is at least sizeof(T*) + 1, but often, due to alignment and padding issues, it's around 2 * sizeof(T*). If you have many single-threaded smart pointers, the size overhead incurred might become significant—to say nothing about the supplemental cost of passing SmartPtr by value.

Fortunately, the standard provides another rule that can be of help regarding object sizes. It's known as the empty base optimization. If a base class B of a class D is "empty" (i.e., has no nonstatic data members), the B subobject in a D object can have an effective size zero. This doesn't break the language rules because the B subobject is "melted" within the D object; of course, as soon as you extract a standalone B object it has, again, nonzero size. Whether you get the empty base optimization or not depends heavily on your compiler—it's an option, not a requirement. Compilers such as Metrowerks' Code-Warrior 5.x and Microsoft Visual C++ 6.0 do perform empty base optimization—an added incentive to try taking advantage of this optimization.*

Applied to the previously listed SmartPtr code, the empty base optimization suggests it is better for SmartPtr to inherit from RCTraits than aggregate it. This way, if RCTraits is empty, the compiler can optimize away any slack space; if RCTraits is not empty, the situation becomes much the same as with aggregation.

What kind of inheritance (private, protected, or public) should we use? Well, let's not forget it's just about an optimization, not a conceptual change. SmartPtr is not an RCTraits by any means. Therefore, the best choice is to use private inheritance.

This is an "inheritance hack" for optimizing object size. Ironically, we're back to the double-colon call syntax since RCTraits is now a base class of SmartPtr.

Traits objects are needed whenever traits may have to hold state. Traits objects can be part of other objects and/or passed around as parameters. When a traits object can be empty, you might want to consider using the inheritance hack to achieve optimal memory layout of your object, assuming your compiler actually does the empty base optimization.

Definition: A traits object is an instance of a traits class.

INTERMEZZO Traits templates, traits classes, traits objects . . . As we moved from mere static code generation to entities with state, our means of expression evolved from the most static (templates) to the much more dynamic (full-fledged traits objects). Traits templates are exclusively a compile-time mechanism; they vanish completely even before compilation ends. At the other end of the spectrum, traits objects are living, dynamic entities with state and behavior.

One further step to dynamism is to use polymorphic traits and pointers, or references to traits objects, but that's certainly not traits' charter anymore. Polymorphic traits are best spelled as the Strategy design pattern.2

Use whichever traits mechanism suits your needs, and choose the most static solution that you can. Prefer compile-time solutions to runtime solutions. Compile time often means that code is better checked at compile time (essential) and more efficient (doesn't hurt). On the other hand, of course, dynamism is what brings spice to life.

HIERARCHY-WIDE TRAITS Traits often apply not only to individual types, but also to whole class hierarchies. For instance, the reference counting strategy is usually the same within a class hierarchy. It would be nice, then, to define traits that apply to a whole class hierarchy at once without having to hand-code them into each individual class. However, templates—on which traits build—are agnostic of inheritance. What to do?

Maybe the first and foremost rule of good design is to be flexible and not to get stuck with a single strategy. Solving a design problem is like conquering a fortified castle: If one strategy doesn't work well, it's best to try another. A bad strategy might still solve the problem, but with higher costs than an alternative approach.

Following this line of thought, let's regroup. The need is to templatize a class template with something that is invariant within a hierarchy of types. And guess what? Nested classes (classes defined inside other classes) are invariant inside a hierarchy, as long as you don't redefine them. Nested classes are inherited like any other symbols. Looks like a path worth exploring. For automating such inner type definition, let's put in place a simple template:

Now say we have a hierarchy rooted in class Shape (Fig. 1). To publish the hierarchy root role that Shape has, you derive it from Hierarchy, shown as follows. The rest of the hierarchy remains the same.

In any case, the key achievement is that if you write Rectangle::HierarchyId, you get the same type as if you wrote Shape::HierarchyId. As long as a type deriving directly or indirectly from Shape does not redefine the symbol HierarchyId, that symbol identifies a type that's invariant in the hierarchy.

Building a SmartPtr that uses hierarchy-wide traits is just as simple as with regular traits. You only have to replace T with T::HierarchyId, like so:

template >
class SmartPtr
{
...
};

Now say you have two hierarchies in your application: one rooted in Shape and another one rooted in Widget. Just like Shape, Widget inherits HierarchyRoot to state its role of hierarchy root. You can now specialize RefCountingTraits for the two hierarchies in this way:

That's it—the traits above dispatch correctly on classes in the two hierarchies, even for types that you have not defined yet. As the following two sections show, hierarchy-wide traits are quite flexible.

PERSONALIZED HIERARCHY-WIDE TRAITS Simple traits offer per-type specialization; hierarchy-wide traits offer per-hierarchy specialization. Sometimes you might need something in between—a traits template that you can define for a whole hierarchy, yet specialize only for an isolated type or two in that hierarchy.

You can achieve that by defining your traits template as shown in the following:

How does this traits template work? The client code uses Traits, Traits, etc. If you want to specialize the traits for the whole Shape hierarchy, you specialize HierarchyTraits. By default, because Traits inherits HierarchyTraits, all derivatives of Shape use the traits defined in HierarchyTraits. (I'd bet money you have a lot of fun following all these symbols. But it's actually simple. HierarchyTraits refers to the hierarchy, and Traits refers to each type.)

If you want to specialize the traits for a specific class, say Ellipse, you do this by specializing the Traits template directly:

template <>
class Traits
{
... specialized stuff for Ellipse ...
};

You can choose to derive or not from the HierarchyTraits if you want just to override a symbol or two, or not to derive at all if you want to rewrite Ellipse's traits from scratch. It's up to you.

A note about the use of inheritance with the traits just mentioned: From the viewpoint of dynamic polymorphism, the fact that Traits inherits HierarchyTraits is wrong. HierarchyTraits is not a polymorphic base class. We use inheritance here with a different good reason—as a symbol-propagating device; the intent is to have Traits sip the bouillabaisse of symbols that HierarchyTraits defines. Inheritance has uses not only for dynamic polymorphism, but also for compile-time type manipulation.

The Traits-HierarchyTraits idiom presented in this section allows per-type specialization of hierarchy traits. In the example discussed, Traits and Traits take you to HierarchyTraits, while Traits takes you to the specialized Traits—all this with amazingly little scaffolding.

SUBHIERARCHY TRAITS Suppose that in your Shape hierarchy you decide to redefine traits for a certain subhierarchy of that hierarchy, for instance, the subhierarchy rooted in Bitmap (see Fig. 2). You will need therefore to specialize traits on Bitmap and all its derivatives, direct or indirect.

Figure 2. Hierarchy of shapes with a bitmap subhierarchy.

You can do this by deriving Bitmap from HierarchyRoot, in addition to deriving from Shape. Then you provide a using declaration that disambiguates the HierarchyId symbol, like so:

Through the using declaration, Bitmap gives preference to the class HierarchyRoot::HierarchyId over Shape::HierarchyId. This way you can now specialize some traits for Bitmap::HierarchyId and all its derivees will use that specialization. Unless, of course, you decide to define a new subhierarchy with distinct traits down the road.

CAVEAT EMPTOR The biggest disadvantage of hierarchy-wide traits is that they require modifying the base class of the hierarchy, and you sometimes are not in a position to do that. Fortunately, you get a pretty clear compile-time error ("Class Widget does not define a type HierarchyId") instead of subtle runtime errors.

You can mitigate this problem to a certain extent by specializing your hierarchy-wide traits template for an insipid type, like void. Then you use HierarchyTraits for hierarchies that you cannot modify. Not too flexible, but it works when you're stuck.

There are ways to create nonintrusive hierarchy-wide traits, but such techniques are more fragile and exposed to various kinds of errors. Ideas from readers are, as always, welcome.

CONCLUSION Traits objects are useful whenever the trait must hold some state. If a trait class has optional state (some traits have state, some others are empty) it's best to use the "inheritance hack" to grab the empty base optimization when applicable (and if available).

With only a tad of scaffolding, you can define hierarchy-wide traits. This way you write the traits for a given hierarchy once and only once. Hierarchy-wide traits offer considerable flexibility by allowing you to "personalize" traits for specific classes in the hierarchy. You also can define subhierarchy traits: traits that apply to a specific subtree of the inheritance graph.

Hierarchy-wide traits make heretical use of inheritance, proving that inheritance is not only a tool for runtime polymorphism, but also for compile-time type manipulation. C++ mixes the two natures of inheritance, which sometimes becomes confusing.

The best news of all, however, is that hierarchy-wide traits use only simple template amenities, which means that you can use them with your not-so-compliant compiler today.

Acknowledgment Many thanks to Herb Sutter, who took the time to review this article and added insightful remarks.

FOOTNOTE* Conveniently enough, the latest implementations of the standard C++ library for the mentioned compilers take advantage of the empty base optimization, which is applicable to containers. Every standard container aggregates an allocator object, and the default allocator is usually an empty class.