I recently attended the
international meeting of the Embedded C++ organization (held in
San Jose in conjunction with the Embedded Systems Conference). If
you have not heard of this organization, it is a consortium of
(mostly) Japanese companies that have set out to create and adopt
a subset of Standard C++ to be used in developing high end
embedded applications. For more information check out their home
page at "www.caravan.net/ec2plus". I have been
particularly interested in this effort ever since I first heard
about it. In part this is because I have a long standing interest
in embedded software development, even when I can not claim to
actually be doing such. More to the point of this column however,
I am interested in the Embedded C++ effort for a number of
sociological implications that I see it having for the overall
C++ community.

First there is the relationship
of Embedded C++ to Standard C++. On the one hand, there seems to
be a general consensus amongst those "in the know",
including many of the old time members of the C++ committee, that
(d)Standard C++ and its library@ are too large. On the
other hand, there is no consensus whatsoever about which features
/ libraries should be left out (I want to go on record as NOT
being one who thinks (d)Standard C++ is too big, but that is
another column). Embedded C++ is the first real attempt (that I
am aware of) to come up with some viable criteria for subsetting
Standard C++. While there is some disagreement over the criteria,
there is almost universal support for the effort as a whole. One
thing that seems key is the Embedded C++ technical committee's
commitment to being a pure subset of Standard C++ (i.e. no
extensions allowed). It seems that people with a real stake in
creating portable, maintainable software are willing to accept
Standard C++ as is, even if they don't want to actually use it.

The second factor is the emphasis
on efficiency. The Embedded C++ committee identified a spectrum
of hardware and applications domains that they felt represented
the embedded software arena. On one end were the very small
systems where assembly language is still the only real option;
next came those systems where C is the programming language of
choice; above that are larger systems which Embedded C++ is
intended to support; and finally, sitting on the large end of
things are those systems where full blown Standard C++ makes
sense. Naturally, the distinctions are not clear cut and a lot of
overlap is possible. In particular, I think much of the appeal of
Embedded C++ is that it represents a version of C++ that could be
used anywhere C makes sense. In the other direction, I suspect
there are a lot of people who would be perfectly happy using
Embedded C++ instead of Standard C++ for all their application
development. In this sense, it seems that the Embedded C++ effort
is returning C++ to its roots.

One of the keys to C++'s
popularity from the beginning has been its claim to provide
C-like efficiency. This remains one of the major reasons
developers and software organizations choose C++. In spite of
this, there are many hard core C developers who question the
validity of this claim. Given the undeniable fact that developing
large scale C++ software can be fraught with pitfalls, one of the
contentions that often comes up when a C++ project gets into
trouble is that it would have been better and more efficient to
have done it in C. Now, with Java making claims to be able to
generate code as efficient as C++, as well as claiming to be a
more productive object-oriented language to use, the C++
community needs to start paying real attention to some of the
efficiency issues that previously were simply ignored because the
alternatives were clearly much worse.

A lot of these issues fall into
the area of "quality of implementation", in other words
how good is your compiler, and/or linker at doing C++ specific
optimizations, as well as how efficient is your standard library
implementation. As a mental exercise consider the following
question: you have a program written in ANSI C; if you port that
program to C++, basically by just recompiling it, how much extra
runtime and memory overhead would you expect the C++ version to
cost you?

A reasonable answer is "none
at all." After all, the C code contains no constructors or
destructors to be implicitly invoked, no conversion routines, no
virtual functions, no templates, etc. In particular, even those
primary sources of unwanted overhead, RTTI and exceptions, are
not present in a pure C program. Without virtual functions, there
will be no RTTI blocks generated; since there are no destructors,
there is no reason for a compiler to generate the tables needed
to support stack unwinding during an exception; and finally,
since there are no exceptions being thrown from any place within
the code, there is no reason for the final executable to contain
the code to support the exception runtime system.

All of this seems
"reasonable" but is it "likely?" That I do
not know. What I do know is that even if you begin with a pure C
design, C++ programmers soon start to tinker with it. First you
replace malloc/free with new/delete. This
introduces the possibility of exceptions. Then you start building
better abstractions with constructors and destructors and
eventually virtual functions. Finally, you end up using
templates and RTTI and while the functionality may be similar to
the old C program, the code is nothing like it was before.
Consider the following two versions of the classic hello world
program.

First a C version:

#include <stdio.h>

int main(int argc, char* argv[])
{
    const char* p = "Hello world!";

    /* printf() already writes to stdout; it takes no stream argument */
    printf("%s\n", p);

    return 0;
}

Now a C++ version:

#include <iostream>
#include <string>

using namespace std;

int main(int argc, char* argv[])
{
    const string s = "Hello world!";

    cout << s << endl;

    return 0;
}

Just for grins, I turned both of
these into functions, stuck them into a program that ran each a
million times (sending the output to /dev/null), and generated
some statistics. The figures are:

C version 8.83s

C++ version 55.67s

While the actual figures are
implementation dependent, system dependent, time-of-day
dependent, and probably latitude/longitude dependent, they do
illustrate my point: the typical C++ program is usually not
anywhere close to the performance of a similar C program. The
fact that the C++ version is more type safe, more flexible, may
have been easier to get debugged and running, etc, is often
overlooked, especially when it appears that all of those things
are not actually happening. Now, I am not for a moment planning
to give up string and iostreams, or any of the other general
purpose C++ libraries that I have come to depend upon. Nor am I
going to forgo virtual functions, constructors, destructors,
multiple-inheritance, operator and function overloading,
references, RTTI, or even exceptions. Still, it doesn't hurt to
pay attention to what some of these things cost. The main
emphasis of this column is to introduce a lexicon for identifying
certain characteristics about the overhead of certain C++
classes. More broadly, this column is about performance issues in
C++.
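A measurement of this kind could be reproduced with a harness along the following lines. This is only a sketch under stated assumptions: the names helloC, helloCpp, secondsFor, and NullBuffer are hypothetical, the timing uses std::chrono (a C++11 facility that did not exist when the figures above were taken), and NullBuffer is a portable stand-in for sending iostream output to /dev/null.

```cpp
#include <cassert>
#include <chrono>
#include <cstdio>
#include <ostream>
#include <streambuf>
#include <string>

// Stream buffer that discards every character written to it.
class NullBuffer : public std::streambuf {
protected:
    int_type overflow(int_type ch) override { return traits_type::not_eof(ch); }
};

// C-style version: format through stdio.
void helloC(std::FILE* out) {
    assert(out != nullptr);
    const char* p = "Hello world!";
    std::fprintf(out, "%s\n", p);
}

// C++-style version: build a std::string and write it through iostreams.
void helloCpp(std::ostream& out) {
    const std::string s = "Hello world!";
    out << s << std::endl;   // endl also flushes the stream on every call
}

// Calls fn() the given number of times and returns the elapsed seconds.
template <typename Fn>
double secondsFor(long iterations, Fn fn) {
    const std::chrono::steady_clock::time_point start =
        std::chrono::steady_clock::now();
    for (long i = 0; i < iterations; ++i)
        fn();
    const std::chrono::duration<double> elapsed =
        std::chrono::steady_clock::now() - start;
    return elapsed.count();
}
```

To time the C version the same way, helloC can be handed a FILE* opened on /dev/null (a POSIX assumption). Any numbers obtained will depend on the compiler, library, and machine.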

Plain Old Data

Let us start with an overview of
certain parts of the (draft)Standard itself. In ARM C++ there
were the built-in data types and there were user defined data
types. In (d)Standard C++ things are a little more complicated.
In the (d)Standard the distinction is between POD types and
non-POD* types. The POD types include the built-in
types, and user defined POD-class types. The definition of a POD-struct
in the (d)Standard (paraphrased slightly) is "a user defined
struct or class that has no user-declared constructors, no
private or protected non-static data members, no base classes, no
virtual functions, no non-static data members of type pointer to
member, non-POD-struct, non-POD-union (or array of such types) or
reference, and has no user-defined copy assignment operator, and
no user-defined destructor." A POD-union is defined
similarly. A POD-class is either a POD-struct or a
POD-union.

When you get past the double
negatives and the recursion, you find that a POD-struct is
basically a data structure that could be defined in C. It is true
that a POD-class can have static data members, as well as static
and non-static member functions, but they do not affect the
fundamental properties of the class. The point here is that many
of the requirements specified in the (d)Standard apply only to
non-POD-class types. For example, if you write

X x;

the compiler is required to
invoke the default constructor to initialize 'x' only if class X
is a non-POD class. Without attempting to go through the
(d)Standard, the intent is fairly clear -- the compiler is free
to treat POD data types differently from non-POD types. By
implication (often made explicit in the notes and footnotes of
the (d)Standard), the compiler is expected to treat POD types
pretty much just like a C compiler would treat them. Thus, if I
write

podPoint pt;

where podPoint is a POD
Point class (listing 1) I expect the compiler to set aside memory
for the object, but to do no initialization (the value of 'pt'
will be indeterminate). Likewise for

podPoint ptarray[1000];

Whenever these objects go out of
scope, I expect the "destruction" to be equally simple.
Finally, as mentioned above, I do not expect the compiler to
worry about such objects when it comes to generating code to do a
stack unwind in case of an exception.
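To make the "just like C" expectation concrete, here is a minimal sketch of what a POD Point might look like. Listing 1 is not reproduced in this part of the column, so the member names x and y are assumptions, as is the helper copyPoints. The static_assert checks use the <type_traits> header, which arrived in C++11 and is not part of the (d)Standard discussed here.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>
#include <type_traits>

// Hypothetical stand-in for the POD Point of listing 1 (member names assumed).
struct podPoint {
    double x;
    double y;
};

// C++11 expresses "POD" as "trivial and standard-layout".
static_assert(std::is_trivial<podPoint>::value,
              "no user-written special member functions");
static_assert(std::is_standard_layout<podPoint>::value,
              "laid out the way a C compiler would lay it out");

// Copies n points the way C code would: one memcpy, no constructor calls.
void copyPoints(podPoint* dst, const podPoint* src, std::size_t n) {
    assert(n == 0 || (dst != nullptr && src != nullptr));
    std::memcpy(dst, src, n * sizeof(podPoint));
}
```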

Initializing POD objects

If I want to initialize a
POD-class object, the obvious way to do it is with an aggregate
initializer:

podPoint pt = {0.0, 0.0};

The same result as above can be
obtained by simply writing

podPoint pt = {};

This brings up another subtle
difference between ARM C++ and (d)Standard C++. In ARM C++, if
the number of elements in an aggregate initializer was less than
the number required to fully initialize the object, the compiler
would supply zeros for the omitted elements. Thus "{}"
above is equivalent to "{0,0}" in ARM C++. The implicit
conversion from int to float means that the two statements above
have the same effect.

In (d)Standard C++, the
definition of an aggregate has been relaxed somewhat from ARM
C++. You can now have an aggregate that is a non-POD-class. An
object of the following

struct Error {

int errNo;

string errMsg;

};

can be initialized with an
aggregate initializer:

Error err = { 42, "Disk full" };#

As a result of this, the
committee also generalized what the compiler supplies if you omit
elements of an aggregate initializer. Now, instead of zero, the
compiler supplies the equivalent of 'X()' where 'X' is the type
of the element to be initialized. Therefore, under (d)Standard
C++

Error err = {};

is equivalent to

Error err = { int(), string() };

The ability to write 'X()' where
X might be a built-in type was added to C++ syntax when templates
came along. Most current compilers treat 'X()' as a no-op if 'X'
is a built-in type. This is clearly not what people expect to
happen if they omit elements from an aggregate initializer, so
under (d)Standard C++, an explicit initializer of the form 'X()'
for a POD type results in the element being zero initialized.
This means that

Error err = {};

has the same effect as

Error err = { 0, string() };

which is probably what we
expected. Since podPoint is itself a POD-class, we can
write

podPoint pt = podPoint();

and get the same result as

podPoint pt = {};

Note that this is not true for
class Error. Since Error is a non-POD-class, the form

Error err = Error();

does not zero initialize the
elements of the class, instead it invokes the default constructor
for class Error. This is obviously the correct thing to do since
the string element 'errMsg' requires something more than just
being zero initialized. In this case, the default constructor is
implicitly defined by the compiler. The resulting constructor is
equivalent to a user defined version of the form

Error() {}

This is NOT the same thing as

Error() : errNo(), errMsg() {}

Instead the language semantics
make it equivalent to

Error() : errMsg() {}

The empty initializer list in the
implicitly generated default constructor does not initialize any
POD data types. Any elements of non-POD-class type are
initialized by their default constructors, as usual.

All of this is probably of little
consequence in most C++ code.

Aggregate initializers are the
preferred way to initialize POD-structs. If you use aggregate
initializers wherever you did before, you will get the same
results that you did before. The key difference for most
programmers will simply be the ability to use aggregate
initializers in a few situations where they were not permitted
before. I have gone into it here because it does have some
important implications further down, in particular for light
weight classes.
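The aggregate initializer rules above can be exercised with a short sketch. The podPoint definition is an assumption (listing 1 is not shown here), and aggregateExamples is a hypothetical name; the Error struct is the one from the text, with string spelled std::string.

```cpp
#include <cassert>
#include <string>

// Hypothetical POD point (member names assumed).
struct podPoint {
    double x;
    double y;
};

// The non-POD aggregate from the text.
struct Error {
    int errNo;
    std::string errMsg;
};

void aggregateExamples() {
    // Omitted elements of built-in type are zero initialized.
    podPoint pt = {};
    assert(pt.x == 0.0 && pt.y == 0.0);

    // A non-POD aggregate accepts an aggregate initializer too.
    Error full = { 42, "Disk full" };
    assert(full.errNo == 42 && full.errMsg == "Disk full");

    // Omitted elements: int() yields zero, std::string() an empty string.
    Error none = {};
    assert(none.errNo == 0 && none.errMsg.empty());
}
```

One caution when trying the 'X()' forms on a current compiler: later revisions of the language (C++03 onward) introduced value-initialization, under which 'Error()' also zero initializes errNo, so the behavior described for the (d)Standard may not be what you observe.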

(Re)Introducing the Infamous Four

From the above it should be
fairly obvious that if you stick to using just POD data types in
your C++ program you should get performance that rivals a C
program. On the other hand, you might as well use C. It is
probably safe to say that most user defined data types in a
typical C++ program are not POD-classes. In fact, many well
meaning text books and C++ coding style guides go so far as to
insist that if you define a class type you should:

a. Make all data members private.

b. Provide initializing
constructors. These should carefully initialize all base classes
and all members in the constructor initializer list (being
careful to specify them in the order they appear in the class
definition).

c. Provide a default constructor.

d. Provide a copy constructor.

e. Provide an assignment operator
(it should carefully check for self assignment and then make sure
it assigns all data members).

f. Provide a virtual destructor.

A lot of good C++ code has been
written which follows these rules. Unfortunately, it is probably
safe to say that a lot of otherwise good C++ code has been
compared unfavorably with equivalent C code on performance as a
result of blindly following these rules.

I assume that most readers of
this column are aware of the special class members that the
compiler will write for you under certain circumstances. The four
most important of these (hereafter referred to as the Infamous
Four, or I4) are:

-- The default constructor

-- The copy constructor

-- The copy assignment operator

-- The destructor

Further, I assume that most
readers know what the semantics of the compiler generated version
are, and under what circumstances the compiler generated versions
are not sufficient (if not see [3] or [4]). What I am interested
in here is under what circumstances the compiler generated
versions aren't even generated.

For each of the I4, the
(d)Standard indicates that there can be trivial and non-trivial
versions. The definitions of trivial for each of these
follow:

-- A default constructor of a
class is trivial if it is implicitly declared and if (a) the
class has no virtual functions, and no virtual base classes, and
(b) each direct base class has a trivial default constructor, and
(c) for all non-static data members of the class that are of
class type (or array thereof), each such class has a trivial
default constructor.

-- A copy constructor for a class
is trivial if it is implicitly declared and (a) the class has no
virtual functions and no virtual base classes, and (b) each
direct base class has a trivial copy constructor, and (c) for all
non-static data members of the class that are of class type (or
array thereof), each such class has a trivial copy constructor.

-- A copy assignment operator of
a class is trivial if it is implicitly declared, and (a) the
class has no virtual functions and no virtual base classes, and
(b) each direct base class has a trivial copy assignment
operator, and (c) for all non-static data members of the class
that are of class type (or array thereof), each such class has a
trivial copy assignment operator.

-- A destructor of a class is
trivial if it is implicitly declared, and (a) each direct base
has a trivial destructor, and (b) for all non-static data members
of the class that are of class type (or array thereof), each such
class has a trivial destructor.

This is what the (d)Standard
says. As with the distinction between POD-class types and
non-POD-class types, certain requirements in the (d)Standard are
called out only for non-trivial versions of the Infamous
Four. Where there are no requirements imposed on compilers
regarding trivial versions of the I4, the implication is obvious
that the compiler is free to do something intelligent. For a
trivial default constructor or destructor, the obvious
intelligent thing to do is -- nothing. For a trivial copy
constructor or copy assignment operator, the obvious intelligent
thing to do is to substitute bit-wise copy semantics for the
otherwise required member-wise copy semantics. It is upon the
assumption that this is what a good compiler will do that the
rest of this paper is based.
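Modern compilers let you ask directly whether each of the I4 is trivial. The sketch below assumes the C++11 <type_traits> header, whose definitions of "trivial" are close to, but not word for word the same as, the (d)Standard wording quoted above. Plain, Named, and bitwiseCopy are hypothetical names used only for illustration.

```cpp
#include <cassert>
#include <cstring>
#include <string>
#include <type_traits>

struct Plain { int id; double value; };       // all of the I4 are trivial
struct Named { int id; std::string name; };   // the string member makes all four non-trivial

static_assert(std::is_trivially_default_constructible<Plain>::value, "default constructor");
static_assert(std::is_trivially_copy_constructible<Plain>::value, "copy constructor");
static_assert(std::is_trivially_copy_assignable<Plain>::value, "copy assignment");
static_assert(std::is_trivially_destructible<Plain>::value, "destructor");

static_assert(!std::is_trivially_default_constructible<Named>::value, "default constructor");
static_assert(!std::is_trivially_copy_constructible<Named>::value, "copy constructor");
static_assert(!std::is_trivially_copy_assignable<Named>::value, "copy assignment");
static_assert(!std::is_trivially_destructible<Named>::value, "destructor");

// Bit-wise copy, permitted only when the copy operations are trivial.
template <typename T>
void bitwiseCopy(T& dst, const T& src) {
    static_assert(std::is_trivially_copyable<T>::value,
                  "bit-wise copy requires trivial copy operations");
    std::memcpy(&dst, &src, sizeof(T));
}
```

Calling bitwiseCopy on a Named object fails to compile, which is the point: member-wise copy is required there.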

A lexicon for class types.

What I propose is to distinguish
four different categories of user defined class types based upon
whether or not they have trivial versions of the I4 functions.
The class types are:

-- Feather weight classes

-- Light weight classes

-- Middle weight classes

-- Normal or typical weight classes

In overview, the definitions are:

-- A feather weight class is one
that has a trivial default constructor, a trivial copy
constructor, a trivial copy assignment operator, and a trivial
destructor.

-- A light weight class is one
that has a trivial copy constructor, a trivial copy assignment
operator, and a trivial destructor.

-- A middle weight class is one
that has a trivial destructor.

-- A normal weight class (also
known as a typical class) has non-trivial versions of each of the
I4.

Several things should be obvious
from these definitions. First, each class category is a subset of
the following category. Thus, if a class is a light weight class,
it also qualifies as a middle weight class, and a normal weight
class. Furthermore, each definition is recursive in terms of
itself. Thus, a light weight class has base classes and members
that are all light weight classes themselves. Finally, it should
also be obvious that a POD-class type or a built-in type can also
be substituted in place of a feather weight class, with one
caveat which I will mention below.

Normal weight classes are what
most of us create whenever we define a class in C++. It is
definitely what you get if you follow the requirements of most
C++ style guides. Unfortunately, there can be a lot of overhead
associated with a normal weight class. In a lot of cases, this
overhead is unnecessary. This lexicon is about the alternatives.
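The four categories can be approximated mechanically. The following sketch is an assumption-laden illustration, not part of the lexicon itself: Weight and weightOf are hypothetical names, the C++11 type traits stand in for the (d)Standard definitions of trivial, and the function reports the lightest category a type qualifies for.

```cpp
#include <cassert>
#include <string>
#include <type_traits>

enum Weight { Feather, Light, Middle, Normal };

// Reports the lightest category that T qualifies for.
template <typename T>
constexpr Weight weightOf() {
    return !std::is_trivially_destructible<T>::value ? Normal
         : !(std::is_trivially_copy_constructible<T>::value &&
             std::is_trivially_copy_assignable<T>::value) ? Middle
         : !std::is_trivially_default_constructible<T>::value ? Light
         : Feather;
}

struct Pair  { int a; int b; };                       // nothing user-written
struct Sized { int n; Sized() : n(0) {} };            // user-defined constructor only
struct Shape {                                        // virtual function, implicit destructor
    int sides;
    virtual int area() const { return 0; }
};

static_assert(weightOf<Pair>() == Feather, "all of the I4 are trivial");
static_assert(weightOf<Sized>() == Light, "only the default constructor is non-trivial");
static_assert(weightOf<Shape>() == Middle, "only the destructor is trivial");
static_assert(weightOf<std::string>() == Normal, "none of the I4 are trivial");
```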

Feather weight classes

A feather weight class is
a class that has trivial versions of all of the I4 members. While
a feather weight class is not a POD class, by definition,
nevertheless you can think of the POD-class types as a subset of
the feather weight class types with the following exception: if
you try to explicitly invoke the default constructor of a
POD-class type, you will zero-initialize all members of the
object; whereas an explicit default constructor invocation of a
feather weight class object does nothing.

It is fairly easy to turn a
POD-class type into a non-POD feather weight class -- just
provide a base class, or have one of the members turn into a
feather weight class type. A more common transformation is to
have one or more private or protected data members. Listing 2
shows a version of the Point class that qualifies as a feather
weight class.

This is actually a useful
abstraction, though it is not one you might expect. Assuming our
compiler is on the ball, both of

fwPoint pt;

fwPoint ptarray[1000];

should act just as they would for
podPoint. Likewise, any assignments such as

ptarray[100] = pt;

should use bit-wise copy
semantics. The one area where there are differences between a
feather weight class and a POD class is initialization.

Typically, we can not use an
aggregate initializer with a feather weight class object. Instead
we can do initialization such as the following:

fwPoint pt = fwPoint().moveTo(1.0, 1.0);

The (d)Standard makes it clear
that the compiler is free to eliminate the temporary and the copy
constructor invocation implicit in the above and simply treat
'pt' as the target of the moveTo() call. For a typical
normal weight class, we would assume that any decent compiler
would automatically perform this optimization. In the case of a
feather weight class, the compiler might not bother to optimize
away the bit-wise copy that results from the trivial copy
constructor. This is the first of many QOI$ issues
that I will be touching on. If your compiler doesn't optimize
away the copy, then you might prefer

fwPoint pt;

pt.moveTo(1.0, 1.0);

For myself, I have adopted a
guideline that says that no attempt should be made to initialize
feather weight objects unless the feather weight class is also an
aggregate.
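For readers without listing 2 at hand, a feather weight Point might look roughly like the sketch below. The accessors x() and y(), the _x/_y member names, and the two helper functions are assumptions made for illustration; only moveTo() is named in the text. The static_assert relies on the C++11 <type_traits> header.

```cpp
#include <cassert>
#include <type_traits>

// Hypothetical feather weight Point: private data, no user-written I4 members.
class fwPoint {
public:
    double x() const { return _x; }
    double y() const { return _y; }

    // Returns *this so that the call can be chained onto a temporary.
    fwPoint& moveTo(double x, double y) {
        _x = x;
        _y = y;
        return *this;
    }

private:
    double _x;
    double _y;
};

static_assert(std::is_trivial<fwPoint>::value, "all of the I4 are trivial");

// Form 1: initialize from a modified temporary (the compiler may or may not drop the copy).
fwPoint makeFromTemporary() {
    fwPoint pt = fwPoint().moveTo(1.0, 1.0);
    return pt;
}

// Form 2: define the object first, then give it a value.
fwPoint makeInPlace() {
    fwPoint pt;
    pt.moveTo(1.0, 1.0);
    return pt;
}
```

Both forms produce the same point; which one costs less is a question for your compiler.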

A feather weight class can be an
aggregate and can thus be initialized by an aggregate initializer
if (a) it has no base classes, and (b) it has no private or
protected nonstatic members. Stated another way, a feather weight
class is an aggregate if it is otherwise a POD-class type, but
one or more of its members is of feather weight class type
instead of being of POD-class type.

Feather weight classes occur more
often than you might think. They occur fairly regularly in
certain parts of the STL. Most of the functors defined by the STL
qualify as feather weight classes even though they are empty.
Most empty classes would seem to meet the requirements of a
POD-class type, but the STL function objects usually have a base
class. If there is a base class, then the resulting class is, at
best, a feather weight class. Nevertheless, there is usually no
reason to burden such an empty class (unless it is an ABC), with
user defined versions of the I4.

Light weight classes

If truth be told, I do not have a
lot of use for feather weight classes. I think initializing
constructors rank right up there with subroutines as one of the
most important developments in the history of programming.
Therefore, most of my classes, even ones that might otherwise
qualify as aggregates, usually end up with a constructor or two.
If the class is otherwise a feather weight class, the presence of
a user defined constructor converts it into a light weight class.

A light weight class is a
class that has a trivial copy constructor, a trivial copy
assignment operator, and a trivial destructor. The class will
have one or more user defined constructors. This will prevent the
compiler from implicitly declaring a default constructor, so a
light weight class will typically also have a user defined
default constructor. Listing 3 presents a light weight version of
the Point class.

Since initialization is what
light weight classes are all about, that is where we will
concentrate our attention. First, because a light weight class
has an initializing constructor, the compiler will require that
we provide a default constructor if we are to write code such as:

lwPoint pt;

lwPoint ptarray[1000];

It is very tempting, and usually
typical, to create a default constructor for lwPoint by
simply providing default arguments for one of the other
constructors. For example, we could have defined lwPoint's
initializing constructor as

lwPoint(double x = 0.0, double y = 0.0)
    : _x(x), _y(y) {}

and kill two birds with one
stone. But what does this do to our users?

This is a primary illustration of
the difference between C and C++ programming styles. The compiler
is required to invoke the default constructor to initialize lwPoint
objects. So

lwPoint pt;

sets 'pt' to the origin. The
language also requires that the default constructor be used to
initialize every member of the 'ptarray' object. Suppose that the
origin is not the desired initial value for the point. In the
case of the single object, we can override the default
initialization with an explicit initialization as in

lwPoint pt = lwPoint(1.0, 1.0);

Doing the same thing for the
array would require that we specify an aggregate initializer with
1000 elements. Let's be real. Instead, we will write a loop after
the declaration which will step through the elements of the array
and initialize each one to the desired value. This means that the
compiler will first step through the array and initialize every
point to (0.0,0.0) and then we will go back through and
reinitialize every point. Compared to using a POD Point class (or
a fwPoint), the initialization will take at least twice as
long. Maybe for a given application this extra overhead will be
lost in the noise level, but maybe not. One thing is for sure, it
is stuff like this that causes hard core C programmers to snicker
under their breath when C++ programmers talk about writing
efficient code.

We can not change the language
semantics, but we can at least make the compiler's job easier --
maybe. The first thing we do is to re-establish the fact that
object construction and initialization are two different things.
Construction is the responsibility of the class designer.
Establishing an initial value for an object is the responsibility
of the user. In C++, we are so used to using constructors to do
initialization that we forget the fact that the default
constructor's sole reason for existence is to do object
construction in those cases where the user does NOT supply an
initial value. In the case of lwPoint, there is no need
for the default constructor to do anything, so I have written it
to do nothing.
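A light weight Point with a do-nothing default constructor might be sketched as follows. The class body is an assumption (listing 3 is not reproduced here), as is the helper fillPoints; the static_assert checks use the C++11 <type_traits> header.

```cpp
#include <cassert>
#include <cstddef>
#include <type_traits>

// Hypothetical light weight Point: user-defined constructors, everything else trivial.
class lwPoint {
public:
    lwPoint() {}                                    // construction only: members are left unset
    lwPoint(double x, double y) : _x(x), _y(y) {}   // initializing constructor
    double x() const { return _x; }
    double y() const { return _y; }

private:
    double _x;
    double _y;
};

static_assert(std::is_trivially_copyable<lwPoint>::value,
              "copying and destruction are trivial");
static_assert(!std::is_trivially_default_constructible<lwPoint>::value,
              "the default constructor is user-defined");

// Gives every element its real initial value in a single pass.
void fillPoints(lwPoint* pts, std::size_t n, const lwPoint& value) {
    assert(n == 0 || pts != nullptr);
    for (std::size_t i = 0; i < n; ++i)
        pts[i] = value;   // trivial copy assignment
}
```

With the empty default constructor, declaring the array does no per-element work of its own (optimizer permitting), so the loop in fillPoints is the only pass over the data.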

In fact, at this point I will
make two assertions: (1) every light weight class can have an
empty default constructor, and (2) most should. The first
assertion follows from the fact that a light weight class has
both a trivial copy constructor and a trivial copy assignment
operator. Since both these functions exhibit the same bit-wise
copy semantics, it means that assignment can be substituted for
copy construction. Therefore, any class that qualifies as a light
weight class can be left un-initialized by its default
constructor with the assurance that when the user assigns an
initial value (we always initialize our variables before we use
them, don't we?), everything will be taken care of.

The second statement is not quite
as strong as the first. Listing 3 provides the beginning of a
light weight string class, defined as a template. In this case,
while it is legal to have an empty default constructor, the
semantics of the class require that a valid object always has
_length <= SIZE. This, plus the fact that most users expect a
string (or any container) to be initialized to empty, caused me
to yeild to convention and provide some initialization in the
default constructor. Still, in doing so I was conscious of what I
might be losing in terms of performance.
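The trade-off can be seen in a small sketch of such a string template. This is not the listing referred to above; the class name lwString, the _data member, and the truncating constructor are assumptions. Only SIZE, _length, and the invariant _length <= SIZE come from the text.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>
#include <type_traits>

// Hypothetical fixed-capacity light weight string.
template <std::size_t SIZE>
class lwString {
public:
    // Sets only the length; the character buffer is deliberately left untouched.
    lwString() : _length(0) {}

    // Copies at most SIZE characters so that _length <= SIZE always holds.
    lwString(const char* s) : _length(std::strlen(s)) {
        if (_length > SIZE)
            _length = SIZE;
        std::memcpy(_data, s, _length);
    }

    std::size_t length() const { return _length; }

    char operator[](std::size_t i) const {
        assert(i < _length);
        return _data[i];
    }

private:
    std::size_t _length;
    char _data[SIZE];
};

static_assert(std::is_trivially_copyable<lwString<16> >::value,
              "copying and destruction stay trivial");
```

The default constructor costs one store per object, which is the price paid for an array of such strings always being in a valid (empty) state.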

Just exactly what does an empty
default constructor buy us? In the case of a single object
definition

lwPoint pt;

the inline default constructor
will be substituted in place to do the object initialization.
Since the constructor is empty, it will have no effect and the
light weight class is equivalent to the POD class. The case of
the array initialization is more complicated. Given

lwPoint ptarray[1000];

we might assume that the compiler
will just generate a loop which invokes the default constructor
for each element of the array. If this happens, the optimizer
should come along and figure out that the loop body is empty and
eliminate the entire thing. Unfortunately, things are more
complicated than they at first appear. The language specification
not only requires that the default constructor be invoked to
construct every element of the array, but it also requires that
if any one of those constructor invocations throws an exception,
then every element of the array before the one under construction
will have its destructor invoked before the exception is allowed
to propagate.

Because of this extra complexity,
many implementations delegate array initialization to a separate
function. This function is passed the address of the constructor,
the address of the destructor, the number of elements in the
array, etc. With an empty constructor, the loop will only consist
of the function call and return overhead, plus whatever overhead
is typically involved in a try/catch block. For a large array,
this is still a non-trivial amount of overhead to do nothing.
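The kind of helper described above might be sketched like this. It is an illustration under assumptions, not any particular compiler's runtime: constructArray, CtorFn, DtorFn, and the Tracked test class are all hypothetical names.

```cpp
#include <cassert>
#include <cstddef>
#include <new>
#include <stdexcept>

typedef void (*CtorFn)(void*);   // constructs one element in place
typedef void (*DtorFn)(void*);   // destroys one element in place

// Constructs count elements; if a constructor throws, destroys the elements
// already built (in reverse order) and lets the exception propagate.
void constructArray(void* mem, std::size_t count, std::size_t elemSize,
                    CtorFn ctor, DtorFn dtor) {
    char* base = static_cast<char*>(mem);
    std::size_t done = 0;
    try {
        for (; done < count; ++done)
            ctor(base + done * elemSize);
    } catch (...) {
        while (done > 0) {
            --done;
            dtor(base + done * elemSize);
        }
        throw;
    }
}

// Element type whose third construction fails, used to exercise the rollback.
struct Tracked {
    static int built;   // successful constructions so far
    static int live;    // objects currently alive
    Tracked() {
        if (built == 2)
            throw std::runtime_error("third element fails");
        ++built;
        ++live;
    }
    ~Tracked() { --live; }
};
int Tracked::built = 0;
int Tracked::live = 0;

void constructTracked(void* p) { new (p) Tracked; }
void destroyTracked(void* p) { static_cast<Tracked*>(p)->~Tracked(); }

// Returns true if the failed construction rolled back the two built elements.
bool demoRollback() {
    alignas(Tracked) unsigned char storage[5 * sizeof(Tracked)];
    Tracked::built = 0;
    Tracked::live = 0;
    bool threw = false;
    try {
        constructArray(storage, 5, sizeof(Tracked), constructTracked, destroyTracked);
    } catch (const std::runtime_error&) {
        threw = true;
    }
    return threw && Tracked::built == 2 && Tracked::live == 0;
}
```

Every element goes through an indirect call inside the try block, even when the constructor body is empty, which is the overhead the text is concerned with.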

We might reasonably expect that
the compiler will do some optimizations. The first thing we can
hope is that the compiler will recognize that the destructor is
trivial. A trivial destructor means that the stack unwind in case
of an exception is trivial. At the very least, we can hope that
the compiler will use a different initialization function when
the destructor is trivial -- one that doesn't have the try/catch
block overhead to deal with a constructor exception. Ideally, we
can hope that the compiler will also recognize that it has an
inline constructor along with the trivial destructor, and will go
ahead and inline the array initialization. This will give the
optimizer a chance to recognize and remove the empty loop.

Assuming all of this recognition
and optimization takes place, we could be left with an array
whose initialization imposes no more overhead than an array of
POD class type. While most of these optimizations seem reasonable
for this special case, they may not be reasonable in general. For
example, in my own coding I seldom bother to inline a function
that contains a loop. I figure the function call overhead will be
swamped by the loop. It is not unreasonable that a compiler will
make the same choice and refuse to inline the array
initialization. Likewise, it might seem to me a short step from
recognizing that a default constructor is trivial, to recognizing
that it is non-trivial but is inlined and empty. Nevertheless, it
is a step, and I really have no idea how difficult it might be.
Needless to say, these are all very much QOI issues beyond the
scope of the (d)Standard. If your application domain needs the
performance, you might want to run a few tests to see how much
optimization your implementation performs.

In general, you can probably
expect that arrays of light weight classes (or higher) will
likely have an initialization overhead that doesn't exist for
feather weight or POD classes. In the absence of adequate
optimization, what we desire is something similar to an array
that lets us specify the initial value to be used during
initialization. In fact, someone has come up with a reasonable
facsimile of such an array; it is called vector.

If we use a vector of lwPoints,
we have a few different choices. We can write

vector<lwPoint> vpt(1000);

and the vector will initialize
itself using the default constructor. This is the same as an
ordinary array. With a vector, however, we can specify the
initial value:

vector<lwPoint> vpt(1000, lwPoint(1.0, 1.0));

Finally, we can just request a
chunk of memory be reserved and fill it in later.

vector<lwPoint> vpt;

vpt.reserve(1000);
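The three choices differ in what the vector holds afterwards, as the following sketch shows. The lwPoint class is a hypothetical stand-in for the listing's version, and the three function names are invented for illustration.

```cpp
#include <cassert>
#include <vector>

// Hypothetical light weight Point with an empty default constructor.
class lwPoint {
public:
    lwPoint() {}
    lwPoint(double x, double y) : _x(x), _y(y) {}
    double x() const { return _x; }
    double y() const { return _y; }

private:
    double _x;
    double _y;
};

// 1000 elements, each built by the (empty) default constructor.
std::vector<lwPoint> defaultConstructed() {
    return std::vector<lwPoint>(1000);
}

// 1000 elements, each a copy of the supplied value.
std::vector<lwPoint> filled() {
    return std::vector<lwPoint>(1000, lwPoint(1.0, 1.0));
}

// No elements yet, but room for 1000 without reallocation.
std::vector<lwPoint> reservedOnly() {
    std::vector<lwPoint> vpt;
    vpt.reserve(1000);
    assert(vpt.empty() && vpt.capacity() >= 1000);
    return vpt;
}
```

After reserve() the elements must still be added with push_back() or a similar call; indexing into the reserved but unconstructed storage is not valid.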

I said that vector is a
reasonable approximation of what we desire. Unfortunately, it is
not a particularly good approximation. The vector<>
template is intended to be instantiatable with any type.
Therefore, a correct vector<> implementation has to
be written to deal with the needs of any normal weight class.
This means the vector constructor has to have the same general
purpose code to cope with exceptions that the standard array
initialization function has (rather, it should have; until
recently there was nothing in the (d)Standard that required
vectors (or any other standard containers) to cope with
exceptions. A recent addition to the (d)Standard has attempted to
correct this oversight). What we really want is a class that is
specialized to take advantage of the characteristics of light
weight classes, i.e. the bit-wise copy semantics and the trivial
destructor semantics.

It turns out that there is a
container in the (d)Standard library that is specialized for POD
classes. It is called 'basic_string'. While it may seem
incongruous to create a string of points, it should work just
fine. Actually, since basic_string is defined to work only
with POD data types, instantiating basic_string with a
light weight class technically yields undefined behavior. While I
can not recommend this as a portable technique, my own experience
says that a light weight class has enough in common with a POD
data type to work just fine in a string. Alternatively, you might
consider defining your own version of vector<>
specifically tailored for light weight classes.
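
To make that last suggestion concrete, here is a rough sketch of what such a container might look like. Everything in it is an assumption made for illustration: the name lwArray, its interface, and the stand-in lwPoint are all hypothetical. It also leans on the <type_traits> header, which comes from later revisions of the language than the (d)Standard discussed here, to reject element types that are not light weight. Because copying an element is a bit-wise operation that cannot throw, and because destroying one does nothing, the constructor needs no exception handling and the destructor never visits the elements.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>
#include <new>
#include <type_traits>

// Hypothetical stand-in for the column's lwPoint.
struct lwPoint {
    double x, y;
    lwPoint() {}
    lwPoint(double x_, double y_) : x(x_), y(y_) {}
};

// Hypothetical fixed-size container restricted to light weight element types.
template <class T>
class lwArray {
    static_assert(std::is_trivially_copyable<T>::value,
                  "lwArray relies on bit-wise copy semantics");
    static_assert(std::is_trivially_destructible<T>::value,
                  "lwArray never runs element destructors");
    static_assert(alignof(T) <= alignof(std::max_align_t),
                  "over-aligned types are not handled by this sketch");
public:
    // Fill with copies of 'value'. A trivial copy cannot throw, so there is
    // no try/catch and no bookkeeping of partially constructed elements.
    lwArray(std::size_t n, const T& value) : _size(n), _data(allocate(n)) {
        for (std::size_t i = 0; i < n; ++i)
            ::new (static_cast<void*>(_data + i)) T(value);
    }
    // Copying the whole container is one block copy.
    lwArray(const lwArray& other)
        : _size(other._size), _data(allocate(other._size)) {
        std::memcpy(_data, other._data, _size * sizeof(T));
    }
    lwArray& operator=(const lwArray& other) {
        if (this != &other) {
            T* fresh = allocate(other._size);
            std::memcpy(fresh, other._data, other._size * sizeof(T));
            ::operator delete(_data);
            _data = fresh;
            _size = other._size;
        }
        return *this;
    }
    // Release the storage only; no per-element destructor calls.
    ~lwArray() { ::operator delete(_data); }

    std::size_t size() const { return _size; }
    T& operator[](std::size_t i) { assert(i < _size); return _data[i]; }
    const T& operator[](std::size_t i) const { assert(i < _size); return _data[i]; }
private:
    static T* allocate(std::size_t n) {
        return static_cast<T*>(::operator new(n * sizeof(T)));
    }
    std::size_t _size;
    T* _data;
};
```

Used as lwArray<lwPoint> pts(1000, lwPoint(1.0, 1.0)), this gives the "array with an initial value" asked for above. Whether it actually beats your library's vector is something to measure rather than assume.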

Middle weight classes

In my lexicon, a middle weight
class has only a trivial destructor. In general, by the time you
have a class that needs user defined constructors and user
defined copy semantics, you probably also need a user defined
destructor. There is one special case that I mention primarily
for completeness. If you review the definition of what it means
for the destructor to be trivial, you will not find any mention
of virtual functions. This is in contrast to the trivial versions
of the other three of the I4. So, one simple way to get a middle
weight class is to add a virtual function to a POD, feather
weight, or light weight class.
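
The effect can be checked mechanically on a compiler that provides the <type_traits> header, which postdates the (d)Standard and is therefore an assumption here. In the sketch below, podPoint, mwPoint, norm1 and sample_norm are hypothetical names chosen for illustration. Adding a single virtual function leaves the destructor trivial while making the other three of the I4 non-trivial.

```cpp
#include <cassert>
#include <type_traits>

// Hypothetical POD-style point: everything about it is trivial.
struct podPoint {
    double x, y;
};

// The same data plus one virtual function, and no declared destructor.
struct mwPoint {
    double x, y;
    virtual double norm1() const {
        return (x < 0 ? -x : x) + (y < 0 ? -y : y);
    }
};

static_assert(std::is_trivially_destructible<podPoint>::value,
              "POD destructor is trivial");
static_assert(std::is_trivially_destructible<mwPoint>::value,
              "a virtual function does not affect the destructor");
static_assert(!std::is_trivially_default_constructible<mwPoint>::value,
              "default constructor is no longer trivial");
static_assert(!std::is_trivially_copy_constructible<mwPoint>::value,
              "copy constructor is no longer trivial");
static_assert(!std::is_trivially_copy_assignable<mwPoint>::value,
              "copy assignment is no longer trivial");

// Exercise the virtual function on a sample point: |-3| + |4|.
double sample_norm() {
    mwPoint p;
    p.x = -3.0;
    p.y = 4.0;
    assert(p.norm1() == 7.0);
    return p.norm1();
}
```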

If the idea of a class that has
virtual functions but does not have a virtual destructor doesn't
send chills down your back, you definitely need more C++
education. Nevertheless, I mention it because there is one area
where even a middle weight class can have a significant
performance advantage over a normal weight class -- stack
unwinding. I touched on this above, but the full story is that
whenever a C++ function creates a local object, the compiler is
obligated to make sure that object will be destroyed if an
exception propagates out of the function. Different compilers
have adopted different schemes for ensuring this stack unwind
process occurs correctly. Some build tables at compile time,
others build tables at run time. Each has its advantages and
disadvantages. All impose some type of overhead on a C++
application, even if no exceptions ever get thrown.

There are all kinds of QOI issues
surrounding exceptions, but it seems reasonable (that word again)
that a compiler should realize that if a class has a trivial
destructor, it doesn't have to worry about destroying objects of
that type during a stack unwind. Personally, I think that if you
are worried about this type of overhead, you are better off
sticking to light weight classes, and/or just forgoing virtual
functions altogether. Nevertheless, like I said, I mention it for
the sake of completeness.

A more realistic version of a
middle weight class is hinted at in listing 4. This is one of the
rare occasions when it actually makes sense to have user defined
copy semantics, but destruction is still trivial. In this case,
the programmer (me) decided that in order to avoid the code bloat
from having a bunch of lwString templates instantiated
with different sizes, I would provide all functionality via a
single base class that all the templates would derive from. The
base class performs all operations using its three data members,
one of which is a pointer to the actual data area in the derived
class. Obviously, I could not allow bit-wise copy semantics here
(note that _size and _data are declared as const data members).
Still, there is nothing for the destructor to do, and by
eliminating it I (hopefully) keep mwString objects out of
the stack unwind tables. Note that the actual data copy operation
in mwString::operator=() is kept as close to bit-wise
semantics as I could make it.

[Aside: I can't resist noting
that one of the most useful versions of the mwString
template is the specialization for SIZE = 0.

template<> class mwString<0>;

This also derives from mwStringBase,
but since mwString<0> doesn't have a data area of
its own, it has to be constructed from an existing character
array. It uses its argument to construct mwStringBase. As
such, it can be used like so

const mwString<0> s = "Hello World!";

to wrap a standard C++ string
class interface around an existing C style string without having
to actually construct a string object -- which would make
a copy of the data. End aside]

Wrapping up

After all this, what can we
conclude?

A. Beware of C++ coding style
guides (and introductory texts) that recommend always supplying
user defined versions of the I4. There are certainly classes that
need these, and every good C++ programmer should understand the
circumstances where they are required. Nevertheless, blindly
supplying these functions for every class you create can
seriously cripple the efficiency of otherwise useful
abstractions.

The question of whether or not
you should provide user defined versions of the I4 for classes
that are clearly normal weight classes when the implicitly
generated versions are adequate is a much more subjective
question. I tend to be very ambivalent on this topic. Once upon a
time, I was fairly strongly in favor of not writing any code that
the compiler was perfectly capable of generating automatically.
Obviously, if you have any constructors you have to provide your
own default constructor, but in many cases the copy constructor,
copy assignment operator, and destructor just end up being hand
written versions of what the compiler would have generated
anyway. Besides being less code to write, it seemed to me that
allowing the compiler to generate them was likely to make
maintenance easier. I have since changed my mind.

These days I lean toward
"explicit is better". I am not fanatical about it, but
in general it now seems better to make things visible and not
depend upon the compiler's implicit versions. While this may open
the potential for more errors initially, it actually seems to
make maintenance easier in the long run. Besides, this becomes
one way to delineate normal classes from the lighter versions --
if a class doesn't have a user defined version of one or more of
the I4, it should be commented that the function is being
explicitly omitted so it can be recognized as trivial (remember,
user defined versions of the I4 are never considered trivial, no
matter how trivial they look).

B. Likewise, beware of coding
style guides that recommend always giving every class a virtual
destructor. Besides the obvious fact that any user defined
destructor is a non-trivial version, if it is a virtual function,
it automatically ensures that none of the other I4 functions are
trivial either.

C. Remember that good old C style
structs (a.k.a. POD-class types) still have their place. If your
application needs to create and destroy a lot of objects, and do
it efficiently, then you need to consider POD-class types. In
particular, if you need to create large arrays of objects, the
POD data types make a lot of sense. Don't forget that a POD-class
can have member functions, so you can have many of the advantages
of an abstract data type without the overhead.

D. If you want (and can afford) a
bit more abstraction (and most of us can afford it), consider
using feather weight, or light weight classes. A light weight
class, in particular, has most of the benefits of a normal class
in terms of its ability to provide a complete abstraction, but
given a decent compiler its usage can still be as efficient as a
POD class.

E. Don't overspecify your default
constructor. As noted, a light weight class can always have an
empty default constructor. This also means that you should almost
always provide a separate default constructor, not a version
depending upon default arguments for another constructor. As a
general rule, I would apply this guideline to all classes.
Remember, the default constructor is there to construct
the object under those circumstances where no initial value is
available. Do as little as possible in a default constructor.

A secondary issue is whether the
default constructor should be inlined or not. Opinions differ on
whether constructors in general should be specified as inline.
For normal weight classes I tend to prefer to not inline
initializing constructors, but I also tend to inline default
constructors. For light weight and middle weight classes, all
constructors should always be inlined. This allows whatever
optimization the compiler might be able to perform to take place.

F. As a general rule you probably
want to avoid using large arrays of anything heavier than feather
weight objects. If your compiler is good enough, you might get
away with light weight objects, but in general, if you have to
create large collections of normal weight objects, consider using
a vector instead of an array. If you need a large array of light
weight objects, consider using the basic_string template instead
of vector.

G. If you are creating templates,
don't overspecify your class definition. While I haven't
specifically discussed templates, many of these guidelines are of
particular concern to template writers. Consider the following

typedef pair<double, double> Point;

What kind of Point have I
created? Since pair<> has at least one constructor,
the best I can have is a light weight class. Ideally, what I want
is to actually have a light weight class, and not
something heavier. This means the definition of pair<>
needs to be as minimal as possible. In particular, the default
constructor for pair<> needs to be empty, and there
should not be any of the other I4 functions declared.
Unfortunately, there are still compilers that will not correctly
handle this, but hopefully that will be changing soon.
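
As a rough illustration of guidelines A, B, E and G together, the sketch below defines a hypothetical minimal pair template (lwPair, not the library's pair) alongside two slightly "overspecified" variants. All of the names are invented for illustration, and the checks rely on the <type_traits> header from later revisions of the language, which is an assumption about your compiler. The point is that an empty user-defined destructor, or a virtual one, is enough to push the resulting Point out of the light weight category.

```cpp
#include <cassert>
#include <type_traits>

// Hypothetical minimal pair: a separate, empty default constructor, one
// initializing constructor, and none of the other I4 functions declared.
template <class T1, class T2>
struct lwPair {
    T1 first;
    T2 second;
    lwPair() {}
    lwPair(const T1& a, const T2& b) : first(a), second(b) {}
};

// Same members, plus an empty user-defined destructor.
template <class T1, class T2>
struct dtorPair {
    T1 first;
    T2 second;
    dtorPair() {}
    dtorPair(const T1& a, const T2& b) : first(a), second(b) {}
    ~dtorPair() {}
};

// Same members, plus a virtual destructor.
template <class T1, class T2>
struct virtPair {
    T1 first;
    T2 second;
    virtPair() {}
    virtPair(const T1& a, const T2& b) : first(a), second(b) {}
    virtual ~virtPair() {}
};

typedef lwPair<double, double>   Point;
typedef dtorPair<double, double> DtorPoint;
typedef virtPair<double, double> VirtPoint;

static_assert(std::is_trivially_copyable<Point>::value,
              "minimal pair keeps bit-wise copy semantics");
static_assert(std::is_trivially_destructible<Point>::value,
              "minimal pair keeps a trivial destructor");
static_assert(!std::is_trivially_destructible<DtorPoint>::value,
              "an empty user-defined destructor is still non-trivial");
static_assert(!std::is_trivially_destructible<VirtPoint>::value,
              "a virtual destructor is non-trivial");
static_assert(!std::is_trivially_copy_constructible<VirtPoint>::value,
              "and it makes the copy constructor non-trivial as well");

// Small usage example: build a Point from initial values.
Point make_unit() {
    Point p(1.0, 1.0);
    assert(p.first == 1.0 && p.second == 1.0);
    return p;
}
```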

All of this is not intended to in
any way discourage the use of ordinary normal classes.
Fundamentally, a clean design is the most important
characteristic of any program. Starting with a clean design, you
can usually address just about any other problem in an
application, including its performance. Normal weight classes are
a key aspect to C++'s data abstraction and object oriented
programming paradigms, and as such are fundamental components of
most C++ designs. On the other hand, you should not ignore
efficiency (as they say), and so I encourage you to be on the
lookout for those data types that can efficiently be represented
with something less than a full blown normal weight class. This
way we use the best of C++'s data abstraction and object oriented
programming styles where it makes sense, while retaining the
efficiency of C where we can.

Suppose I were to ask "What
is the difference between the following two statements?"

X a; // 1

X b = X(); // 2

In ARM C++ the answer is fairly
easy -- basically there is no difference. If X is a built-in
type, the explicit constructor call syntax on line 2 is just a
syntactic convenience added to the language for the sake of
templates -- it does nothing. If X is a user defined type, then
the compiler will implicitly invoke the default constructor --
which is what is done explicitly on line 2. In (d)Standard C++,
this changes somewhat.

In (d)Standard C++, the
differentiation now depends upon whether or not X is a
"non-POD class type" (see the column text for the
definition of a POD class type). If X is a non-POD class type,
then the behavior is the same as ARM C++ for user-defined types.
If X is a POD type, however, things are different. POD types do
not have default constructors (by definition). In this case, the
statement on line 2 does a "default initialization." A
default initialization is the same as the initialization of a
global static -- it does a zero initialization.

This new convention is
potentially useful in those situations where aggregate
initializers are not allowed (e.g. constructor initializer lists),
but as a general rule, I would use the aggregate initializer form,
if possible. Since a POD class type is by definition also an
aggregate, using the aggregate initializer list is usually
possible. The reason I recommend this is to ease
maintenance. It is very easy for a POD class type to be turned
into a non-POD class type during maintenance. If you have been
using aggregate initializers, then either your code will continue
to compile, and work (it is possible to have an aggregate that is
not a POD), or it will no longer compile. On the other hand, if
you were depending upon the explicit default initialization
syntax, then your code will still compile, but it will have
silently changed meaning -- when X becomes a non-POD class type,
then "X()" becomes an explicit default constructor
invocation. If the maintenance didn't add a default constructor,
then the compiler will synthesize an empty one. This will switch

X a = X();

from being a statement that zero
initializes all data members of object 'a', to a statement that
leaves all data members of 'a' un-initialized.
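
The sketch below puts the two initialization forms side by side. PodX, NonPodX and the two helper functions are hypothetical names used only for illustration. For the "after maintenance" case it assumes that a user-defined, empty default constructor was added, since that is the variant where, as far as I know, the members are left uninitialized under every revision of the language. The rules for a compiler-synthesized constructor were adjusted in later revisions, so check your own compiler before relying on either behavior.

```cpp
#include <cassert>

// A POD class type: no constructors, no virtual functions.
struct PodX {
    int    count;
    double scale;
};

// Aggregate initializer form: every member's value is spelled out.
PodX make_with_aggregate() {
    PodX a = { 0, 0.0 };
    return a;
}

// Explicit "X()" form: for a POD this zero initializes every member.
PodX make_with_parens() {
    PodX b = PodX();
    assert(b.count == 0 && b.scale == 0.0);
    return b;
}

// After maintenance: an empty user-defined default constructor was added,
// so the class is no longer a POD (and no longer an aggregate).
struct NonPodX {
    int    count;
    double scale;
    NonPodX() {}    // leaves count and scale uninitialized
};

// NonPodX c = NonPodX();    // still compiles, but c.count is indeterminate
// NonPodX d = { 0, 0.0 };   // rejected: NonPodX is not an aggregate
```

The commented-out lines show the maintenance argument in miniature: the aggregate form fails loudly, while the "X()" form keeps compiling with a different meaning.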