Login

ACCU Buttons

Tiny Template Tidbit

I am constantly amazed at what can be done with templates. I
started programming in C around 1985 and in C++ only around 1995.
Most of my work over the past 6 years has involved using C++ to the
maximum of my abilities, but only for three years have I been
working with templates. In this article I would like to share a
very small template tidbit, something I think is really cool. I
hope it can stimulate your interest in using template capabilities
for your own code, or just be another "trick" to put in your bag,
something I think fairly easy to remember.

I will assume in this article that you are at least familiar
with a few of the STL class templates, and, though you may not have
written your own class templates before, you have "seen how it's
done".

Problem
statement

We have an application in which a set of objects is being
generated by a third-party library and stored in a container. For
simplicity, I will refer to the objects as "points". Objects of
this class don't necessarily have much intrinsic behavior, and are
used by other classes of objects or functions for complex
computations. For instance, an object (which I call a processor for
clarity) might use the set of points to create a spline curve that
best fits the points. That is, each point is simply a 2D
coordinate, an aggregate of two numbers:

struct Point {float x,y;};
std::set<Point> points;

Yet, as the application evolves, it is likely that objects of
the Point class will become more
complex. Perhaps other state data, such as local temperature,
contact normal, density, etc, will be "carried around" along with
the coordinates of the point. In this case, the Point class suddenly changes from being the data to being an aggregate of
a data object and the other state data:

The "curve fitter" processor mentioned earlier suddenly no
longer works since it was designed to use the point as a 2D
coordinate pair, not as an object containing the 2D coordinate pair as a
data member.

Assuming the curve-fitter could be designed to compile in both
cases, the application may evolve such that the 2D coordinates of
the point are no longer a publicly accessible attribute, but rather
a private data member than can only be accessed via a const "get"
method (or even computed on-the-fly instead of being a data
member). This could happen when Point
starts having behavior of its own, not necessarily relevant to the
curve-fitting operation of the processor. Since the latter was
designed to access an attribute, it no longer accepts the new class
of Point. For instance,

struct Foo1 {int bar;};
struct Foo2 {int bar() const;};

Both should be adequate for the curve-fitter, yet the latter
would fail to compile if Foo2 is
used instead of Foo1, complaining
that bar() is not an attribute (or some
equivalent hard-to-decode error message, different for every
compiler). Does the processor really care how the data is
acquired?

If this weren't enough, the container type may change, yet the
processor object (curve-fitter in this example) does not care: a
std::vector, a std::list, a std::set,
or a your::container, as long as the
data is there, the processor should be satisfied. This is trivial
to adapt to, as you will see below, but more importantly, what if
the container stores pointers to the points rather than the actual
points? This is likely to happen if copying Point is onerous and you need to avoid copying
those objects in your application, so you change the container type
from containing Point to containing
Point*. Suddenly, the curve-fitter no longer
compiles since it expects each element of the container to be an
object, not a pointer to an object.

Finally, our processor (object or function) may be used in
different modules, or even different applications we build, perhaps
for related yet distinct application domains. Do we want to have 10
different versions of the same code, one for each possible
container type and point design?

It turns out that templates can be used in C++ to make one
processor (object or function) that accepts any container that uses
iterators AND, much more interestingly, is able to extract the
needed data from the container elements, regardless of whether the
container element type

Is the data, or is a pointer to the data (as opposed to
containing it)

Is an object or pointer to object containing the data

Makes data available as attribute or method

Moreover, templates allow you to do so at compile time, and automatically, without your
intervention. Yep, no kidding! Hopefully, you can see that the
above is a fairly generic problem, unrelated to the particular
application domain (curve-fitting) in which I encountered it.

Solution

Let's start simple and try to keep things simple. Assume we have
a function called fitCurve that takes a
container of points and finds the curve that best fits the set,
returning the result as a Curve object.
We are not concerned with the definition of the Curve class. To be generic, let's call the point
class Coord and the container of points
Coords:

This function can be used only to fit a curve to a set of
Coord objects stored in a std::list. If you have a set of Coords in a std::set or
a std::vector, fitCurve cannot be used. Why should the interface
force us to use one type, when fitCurve
only cares that the set is ordered? If you want to use STL
containers, then the containers are not related by inheritance, but
they do share common interfaces for certain operations. For
instance, they all have an iterator type nested. Let's make
fitCurve independent of the container
type. We use templates to parameterize fitCurve on the container type, for instance

The only requirement on the element type (Coord) in the container is that it have an
operator+ defined. The requirement on
the container is that we use Container::iterator instead of other access
operators such as operator[] to access
the individual elements of the container.

What are some of the advantages and disadvantages of
parameterizing? I can think of the following:

Advantages:

Any kind of container, whose concrete type is known at compile
time, can be given to the curve fitter. For each new type of
container, the compiler takes the "function template" (i.e. a
template for a function) and generates a "template function" (i.e.
a function that comes from a template) - the order of those words
matters. There will be a few constraints on the operations required
from the container, but this is already far better than being stuck
with one container class. The alternative would be to copy all the
elements from your container to the container expected by
fitCurve.

Since fitCurve is a function, most
compilers will know to generate the template function for the given
container without your having to specify it explicitly. The mere
act of calling fitCurve with your
container is enough, since the compiler knows the type of container
you are using, and can generate the template function
automatically.

For the same reason, you can change the type of your container
at some later stage of development, without having to do anything
to fitCurve: the compiler and your build
environment will know that the old "template function" should be
changed to a new one, at compilation time, and will do the change
for you.

Disadvantages:

Every template function that is generated by the compiler means
extra code. This is the famous "code bloat" problem of class and
function templates. This is a factor to consider only if you plan
on using several different containers in the same application
(which is not necessarily the case), and memory footprint
matters.

It can be difficult to predict how "well behaved" is the
Container (template) class. Can it throw
exceptions? Does its interface properly identify what is const and
non-const? Can this invalidate the algorithm that accesses the
Container? There will always be cases
where the disadvantages outweigh the advantages, but I have found
the reverse to be true much more often.

The next step is to allow the container element type to be a
pointer to a Coord as well as a
Coord. Intuitively, we know it should be
possible: the compiler knows exactly what is the type of what is
contained: i.e., a Coord or a pointer to
a Coord. When the element type is a
pointer to Coord, the above piece of
code would have coord1 and coord2 be pointers to Coord objects rather than actual Coord objects. Wouldn't it be nice if we could
write, in the above code, something like:

Now things are going to get a little more interesting. We want
to allow the element type of the container be something else than a
Coord, i.e. it could be a Point that contains a Coord. All we need is to overload getCoord() once more for the case where
v is a Point:

const Coord& getCoord(const Point& v)
{return v.coords;}

Of course, we want to be able to use our own class instead of
Point. Templates come to the rescue once
again:

This requires that POINT, the type of
object contained by Container in fitCurve, have a publicly visible "coords" data member. So it's not totally generic
but again, far better than being stuck with only Point and Coord
objects!

Before dealing with the member "attributes vs. methods" problem,
the last requirement is that the container can contain pointers to
point types that contain the Coord data
as a data member. What we need is a template overload for
pointers:

It is somewhat like "partial specialization" because some of the
type information is still general and unknown (POINT could be anything) but some is specific: it
must be a pointer to something. However "partial specialization"
specifically refers to specialization of a subset of the template
parameters in a multi-parameter template definition.

With those four getCoords put
together, we have the following scenario:

If the element type of the container is anything but a
Coord or Coord*,
it will use the first or second definition of getCoord, depending on whether or not getCoord is called with a pointer to a POINT or a POINT
itself;

Otherwise, it will use the appropriate complete specialization
of getCoord, depending on whether or not
getCoord is called with a pointer to a
Coord or a Coord itself.

The compiler can choose the right version of getCoord to use at compilation time. In the last
two cases, as long as Something has a
public data member called "coords", we are
ok. Container can be any class that
provides an iterator type (or whatever operations are required by
the processor object or function).

The last generalization that we must do is to allow the
"coords" to be a method instead of a data
member. For instance,

You probably know how to "point" to a method in a class. For
instance, the "coords" method can be
pointed to with a pointer of type "const
Coord& (Point::*)() const". Given a typedef

typedef const Coord&
(Point::* Method)() const;

This typedef defines a new type called
Method, as a pointer to a const method of the
Point class returning a reference to a
const Coord. If this seems horribly
difficult to read, don't worry, you're not the only one who feels
that way. Pointers to functions and methods is one of the
idiosyncrasies of C++. The syntax is atrociously difficult to get
your mind around, but I guess it could be worse. In any case once
you've done a few examples on your own it gets better. Here is an
example of using a Method object:

So the question is whether there is a way of doing the same
thing with data members.
Indeed, when getCoord accesses the
coords field of the class, the compiler
knows exactly what "coords" is: a data
member or a method. So if we could write the general getCoord template like

and properly define two getData
functions, one accepting a pointer to a method as second argument,
and another accepting a pointer to a data member as second argument, we have
accomplished our goal. The compiler will know which to choose
without us having to tell it. In pseudo-language:

where Method is the typedef described
earlier, ptr_dmem_Point stands for
"pointer to data member of Point" and
the ellipsis just means "don't know the syntax yet". It took me a
bit of trial and error to find out the correct syntax, but it turns
out it's not too bad: a "pointer" to a data member of type
T for a class Point is "T Point::*".
Simpler than defining a pointer to a method.

Our final solution for the getData is
therefore, after replacing Point by a
template parameter[1]:

Discussion

What does this give us?

Our fitCurve can use any container of
Coord, Coord*,
Something, or Something*, as long as the container has an "iterator" type defined in it, or whatever methods are
required by fitCurve (!=, ++, etc).

"Something" must have a "coords" field, which must be either a Coord or a "const method returning a reference to
a const Coord."

If we make any changes to our container type, no changes are
needed to fitCurve, as long as the
changes satisfy the minimal requirements of 1.

If we make any changes to the type of object stored in the
container, no changes are needed to fitCurve, as long as the new object is something
that satisfies 2. above;

The compiler, as it recompiles the class that uses fitCurve, will make use of the correct getCoord and getData
without our intervention. This is probably the biggest gain.

Since everything is inlined, there are no function calls
involved: everything is resolved at compile time and is equivalent
to your having typed the coords data member
straight where it is used by the processor object or function.

We now have a far more generic processor with only 15 lines of
extra code, and one that requires almost NO intervention on our
part when using the fitter with a new or different type of
container and contained objects. Pretty good for 15 lines of
code.

For a library developer, this level of generality may still be
insufficient: the member in POINT must
be called "coords". This is similar to STL
containers which all have a member called "iterator" that defines an iterator appropriate for the
container. To make it completely general, we would need a way of
allowing the user to specify that they want something other than
"coords" to be used by getCoord(). One
work-around would be to define further complete specializations for
your point class, using the appropriate member:

in your source file to get the specialization. The good news is
that if you forget to define this, you get a compile time error.
The bad news is that you have to define it in a place where it is
visible to the compiler by the time your processor object or
function calls getCoord, which is likely
to increase the coupling between header files. The only other way
that I can think of is to derive your class and provide a
"coords" alias for the member, but this will
be useful only if you have any control over the type of object
stored in the container. For instance, you could derive YourPointClass as

and use a container of PointDerived
objects instead of a container of YourPointClass objects. This only works if you
have control over the element type of the container.

It is interesting to notice that templates are necessary in C++
only because C++ is such a strongly typed language. Consider
Python, which is essentially untyped: classes are not
differentiated by type, only by name. Hence a List class in Python
that takes as argument the name of an object to store in the list,
can have that object be any class of object. The idea of templates
in Python is implicit in some ways, and unnecessary in others.
There are probably many other languages where this applies. The
strong-typing nature of C++ requires extra typing (sorry for pun),
but it allows the compiler to know much more about the data being
handled, and therefore to do much more stringent error-checking and
optimizations not possible in weakly typed languages. Templates in
C++ are necessary so C++ can support very useful features found in
other high-level languages in a strongly-typed and optimized
context.

Note: At the time of this writing, GNU g++ 3.0 (on SGI) has
problems picking the correct getCoord: it seems to get confused by
the pointer overloads. The code compiles and works perfectly,
however, with SGI's MipsPro C++ compiler 7.3.1.

Summary

I showed how a processor (object or function) that uses data
from a set of objects can be generalized with a very small number
of lines of code, using templates and template specialization, to
support application evolution, without requiring you to adapt the
processor. More specifically, your processor object or function is
able to extract the needed data from the container elements,
regardless of whether the container element type.

Is the data, or is a
pointer to the data (as opposed to containing it)

Is an object or pointer to object containing the data

Makes data available as attribute OR method

Moreover, the compiler does so for you, automatically, without
requiring your intervention. I'll gladly reply via email to
suggestions and ideas. Sincere thanks to Phil Bass for his critical
comments and ideas.

References

Bjarne Stroustrup, C++ Programming Language, third
edition

Andrei Alexandrescu, Modern C++ Design

Mark Lutz, Python Programming, second edition

[1] It is not currently possible, most
unfortunately, to define templated typedefs. Hence writing the
following is not possible, even though it would make perfect
sense: