Which is faster: calling a virtual function or a normal function?

In theory, the overhead of dynamic binding is highly dependent on the compiler, operating system, and machine. In practice, almost all compilers do it the same way, and the overhead is very small.

A virtual function call typically costs 10% to 20% more than a nonvirtual function call. The overhead is smaller if there are several parameters, since the dynamic binding part of a virtual function call has constant cost. In practice, the overhead for the linkage of a function call is usually a very small percentage of the cost of the work that gets done, so the cost for a virtual function call is about the same as the cost for a normal function call.

For example, if a system or application spends 5% of its CPU utilization performing the linkage for function calls, and 25% of those calls are converted to virtual function calls, the additional overhead will be 10% of 25% of 5%, or around one-tenth of one percent overall.

If you can afford a normal function call, you can almost always afford a virtual function call.

How is static typing implemented in C++?

Static typing ensures that all declarations, definitions, and uses of a virtual function are consistent while dynamic binding provides the "plumbing" so that the right implementation is called at runtime.

Given a reference (or pointer) to an object, there are two distinct types in question: the static type of the reference and the dynamic type of the referent (that is, the object being referred to). In other words, the object may be an instance of a class that is derived from the class of the reference. Nonvirtual (statically bound) member functions are selected based on the (statically known) type of the reference. Virtual (dynamically bound) member functions are selected based on the (dynamically known) type of the referent.

The legality of the call is checked based on the (statically known) type of the reference or pointer. This is safe because the referent must be "at least as derived as" the type of the reference. This provides the following type safety guarantee: if the class of the reference has the indicated member function, then the class of the referent will as well.
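
The distinction between the static type of a reference and the dynamic type of its referent can be sketched as follows. The class and function names (Base, Derived, nonvirt, virt, demo) are hypothetical and chosen only for illustration.

```cpp
#include <cassert>
#include <string>

class Base {
public:
    virtual ~Base() = default;
    // Nonvirtual: selected from the static type of the reference.
    std::string nonvirt() const { return "Base::nonvirt"; }
    // Virtual: selected from the dynamic type of the referent.
    virtual std::string virt() const { return "Base::virt"; }
};

class Derived : public Base {
public:
    std::string nonvirt() const { return "Derived::nonvirt"; }  // hides Base::nonvirt
    std::string virt() const override { return "Derived::virt"; }
};

void demo() {
    Derived d;
    Base& ref = d;  // static type Base&, dynamic type of the referent is Derived
    assert(ref.nonvirt() == "Base::nonvirt");  // statically bound
    assert(ref.virt() == "Derived::virt");     // dynamically bound
}
```

Both calls are legal because Base declares both member functions; the referent is at least as derived as Base, so it must have them too.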

Can a destructor be virtual?

Yes, many destructors should be virtual.

Virtual destructors are extremely valuable when some derived classes have specified cleanup code. A practical, easy-to-remember guideline: if a class has any virtual functions, it should have a virtual destructor. The rationale for this is that if a class has no virtual functions, chances are the class designer wasn't planning on the class being used as a base class, so a virtual destructor is unnecessary.

Furthermore, on most compilers there is no additional per-object space cost after the first virtual function, so there is very little reason not to make the destructor virtual if the class already has at least one virtual function.

Note that this guideline is not precise enough for every circumstance, but the precise rule is much harder to remember. Here is the more precise rule: if any derived class (or any data member and/or base class of any derived class, or any base class of any data member of any data member of any derived class, or any data member of any base class of any data member of any derived class and all other recursive combinations of base classes and data members) has (or will ever have) a nontrivial destructor, and if any code anywhere deletes (or will ever delete) that derived class object via a base class pointer, then the base class's destructor needs to be virtual.

What is the purpose of a virtual destructor?

A virtual destructor causes the compiler to use dynamic binding when calling the destructor.

A destructor is called whenever an object is deleted, but there are some cases when the user code doesn't know which destructor should be called. For example, in the following situation, while compiling unawareOfDerived(Base*), the compiler doesn't even know that Derived exists, much less that the pointer base may actually be pointing at a Derived.

Because Base::~Base() is nonvirtual, only Base::~Base() is executed and the Derived destructor will not run. This could be a very serious error, especially if the Derived destructor is supposed to release some precious resource such as closing a shared file or unlocking a semaphore.

The solution is to put the virtual keyword in front of Base's destructor. Once that is done, the compiler dynamically binds to the destructor, and thus the right destructor is always called:

class Base {
public:
    virtual ~Base();
};
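
A minimal sketch of the corrected situation, assuming a Derived class that logs from its destructor. The global g_log and the demo() function are hypothetical helpers that make the destructor order observable; only Base, Derived, and unawareOfDerived(Base*) come from the text.

```cpp
#include <cassert>
#include <string>

std::string g_log;  // hypothetical log, used only to observe destructor calls

class Base {
public:
    virtual ~Base() { g_log += "~Base "; }
};

class Derived : public Base {
public:
    ~Derived() override { g_log += "~Derived "; }
};

// This function knows only about Base; the pointer may refer to a Derived.
void unawareOfDerived(Base* base) {
    delete base;  // the virtual destructor makes this bind dynamically
}

void demo() {
    g_log.clear();
    unawareOfDerived(new Derived());
    // The Derived destructor runs first, then the Base destructor.
    assert(g_log == "~Derived ~Base ");
}
```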

What is a virtual constructor?

The virtual keyword cannot be applied to a constructor since a constructor turns raw bits into a living object, and until there is a living object against which to invoke a member function, the member function cannot possibly work correctly. Instead of thinking of constructors as normal member functions on the object, imagine that they are static member functions (see FAQ 16.05) that create objects.

Even though constructors cannot actually be virtual, a very simple idiom can be used to have the same effect. This idiom, called the virtual constructor idiom, allows the creation of an object without specifying the object's exact type. For example, a base class can have a virtual clone() const member function (for creating a new object of the same class and for copying the state of the object, just like the copy constructor would do) or a virtual createSimilar() const member function (for creating a new object of the same class, just as the default constructor would do).

Following is an example of this idiom (the return type is an auto_ptr to help prevent memory leaks and wild pointers; see FAQ 32.01).

In Circle::createSimilar() const and Circle::clone() const, the kind-of relationship allows the conversion from a Circle* to a Shape*, then the Shape* is converted to an auto_ptr<Shape> (that is, to a ShapePtr) by the auto_ptr's constructor. In Circle::clone() const, the expression new Circle(*this) calls Circle's copy constructor, since *this has type const Circle& inside a const member function of class Circle.

Users can use clone and/or createSimilar as if they were virtual constructors. An example follows.
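
The following is a sketch of the idiom, not the original listing. It swaps std::unique_ptr in for auto_ptr (auto_ptr was removed in C++17), and the describe() member, the integer radius, and demo() are assumptions added so that the result can be checked.

```cpp
#include <cassert>
#include <memory>
#include <string>

class Shape;
typedef std::unique_ptr<Shape> ShapePtr;  // unique_ptr stands in for auto_ptr here

class Shape {
public:
    virtual ~Shape() = default;
    virtual ShapePtr clone() const = 0;          // plays the role of a virtual copy constructor
    virtual ShapePtr createSimilar() const = 0;  // plays the role of a virtual default constructor
    virtual std::string describe() const = 0;    // hypothetical, for observing the result
};

class Circle : public Shape {
public:
    explicit Circle(int radius = 1) : radius_(radius) { }
    ShapePtr clone() const override { return ShapePtr(new Circle(*this)); }
    ShapePtr createSimilar() const override { return ShapePtr(new Circle()); }
    std::string describe() const override {
        return "Circle r=" + std::to_string(radius_);
    }
private:
    int radius_;
};

// User code creates new objects without naming the concrete class.
void demo() {
    Circle c(3);
    const Shape& s = c;
    ShapePtr copy = s.clone();           // same class, same state
    ShapePtr fresh = s.createSimilar();  // same class, default state
    assert(copy->describe() == "Circle r=3");
    assert(fresh->describe() == "Circle r=1");
}
```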

How should a virtual function be called from a constructor or destructor?

Use the scope operator, ::.

For example, if a constructor or a destructor of class Base calls a virtual function this->f(), it should call it using Base::f() rather than merely f().

In our experience, this guideline reduces the probability that misunderstandings will introduce subtle defects, since it forces developers to explicitly state what the compiler is obliged to do anyway. In particular, when a constructor invokes a virtual member function that is attached to this object, the language guarantees that the member function that is invoked is the one associated with the class of the constructor, even if the object being constructed will eventually be an object of a derived class that has its own version of the virtual function. An analogous statement can be made for calling a virtual function from a destructor.

The initialization list of Derived::Derived() calls Base::Base(), even if Base() isn't explicitly specified in the initialization list. During Base::Base(), the object is merely a Base object, even though it will eventually be a Derived object (see FAQ 20.14). This is why Base::f() is called from the body of Base::Base(). During the body of Derived::Derived(), however, Derived::f() is called. The output of this program is as follows.

Since developers are often somewhat surprised by this language feature, we recommend that such calls should be explicitly qualified with the scope operator, ::.
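
A minimal sketch of this behavior follows. Instead of printing, it records the calls in a hypothetical global g_calls so that the order can be checked; demo() is likewise an illustrative helper.

```cpp
#include <cassert>
#include <string>
#include <vector>

std::vector<std::string> g_calls;  // hypothetical log standing in for printed output

class Base {
public:
    // Explicitly qualified; an unqualified f() would also resolve to Base::f() here.
    Base() { Base::f(); }
    virtual ~Base() = default;
    virtual void f() { g_calls.push_back("Base::f()"); }
};

class Derived : public Base {
public:
    Derived() : Base() { Derived::f(); }
    void f() override { g_calls.push_back("Derived::f()"); }
};

void demo() {
    g_calls.clear();
    Derived d;  // runs Base::Base() first, then the body of Derived::Derived()
    assert(g_calls.size() == 2);
    assert(g_calls[0] == "Base::f()");
    assert(g_calls[1] == "Derived::f()");
}
```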

Should the scope operator :: be used when calling a virtual function?

Only from derived classes, constructors, or destructors.

The purpose of the scope operator is to bypass the dynamic binding mechanism. Because dynamic binding is so important to users, user code should generally avoid using ::. For example, the following prints Base::f() even though the object is really a Derived.
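
A sketch of the difference, using return values rather than printing so that the behavior can be checked. The demo() helper is hypothetical.

```cpp
#include <cassert>
#include <string>

class Base {
public:
    virtual ~Base() = default;
    virtual std::string f() const { return "Base::f()"; }
};

class Derived : public Base {
public:
    std::string f() const override { return "Derived::f()"; }
};

void demo() {
    Derived d;
    Base& b = d;
    assert(b.Base::f() == "Base::f()");  // (1) the scope operator bypasses dynamic binding
    assert(b.f() == "Derived::f()");     // (2) a plain call uses dynamic binding
}
```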

In user code, a plain call without the scope operator (for example, b.f()) is generally the better choice for nonstatic member functions.

What is a pure virtual function?

A pure virtual member function is a member function that the base class forces derived classes to provide. Normally these member functions have no implementation; but see FAQ 21.11.

A pure virtual member function specifies that a member function will exist on every object of a concrete derived class even though the member function is not (normally) defined in the base class. This is because the syntax for specifying a pure virtual member function forces derived classes to implement the member function if the derived classes intend to be instantiated (that is, if they intend to be concrete).

For example, all objects of classes derived from Shape will have the member function draw(). However, because Shape is an abstract concept, it does not contain enough information to implement draw(). Thus draw() should be a pure virtual member function in Shape.

class Shape {
public:
    virtual void draw() = 0;
};

This pure virtual function makes Shape an abstract base class (ABC). Imagine that the "= 0" is like saying "the code for this function is at the NULL pointer."

Pure virtual member functions allow users to write code against an interface for which there are several functionally different variants. This means that semantically different objects can be passed to a function if these objects are all under the umbrella of the same abstract base class.
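
As a rough illustration, the sketch below writes one function against the Shape interface and passes it two different concrete shapes. Here draw() returns a string instead of void so that the result is checkable; that change, along with Circle, Square, drawAll(), and demo(), is an assumption made for this example.

```cpp
#include <cassert>
#include <string>
#include <vector>

class Shape {
public:
    virtual ~Shape() = default;
    virtual std::string draw() const = 0;  // pure virtual: Shape is an ABC
};

class Circle : public Shape {
public:
    std::string draw() const override { return "circle"; }
};

class Square : public Shape {
public:
    std::string draw() const override { return "square"; }
};

// Written against the interface only; works for any concrete Shape.
std::string drawAll(const std::vector<const Shape*>& shapes) {
    std::string out;
    for (const Shape* s : shapes) out += s->draw() + ";";
    return out;
}

void demo() {
    Circle c;
    Square s;
    std::vector<const Shape*> shapes;
    shapes.push_back(&c);
    shapes.push_back(&s);
    assert(drawAll(shapes) == "circle;square;");
    // Shape x;  // would not compile: Shape is abstract
}
```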

Can a pure virtual function be defined in the same class that declares it?

Yes, but new C++ programmers don't usually understand what it means, so this practice should be avoided if the organization rotates developers.

If the goal is to create a member function that will be invoked only by derived classes (such as sharing common code in the abstract base class), create a protected: nonvirtual function instead of using this feature. If the goal is to make something that may be callable from user code, create a distinctly named member function so that users aren't forced to use the scope operator, ::.

The exception to this guideline is a pure virtual destructor in an ABC (see FAQ 21.13).

How should an empty virtual destructor be defined?

It should normally be defined as an inline virtual function. An example follows.

The reason Base::~Base() is inline is to avoid an unnecessary function call when Derived::~Derived() automatically calls Base::~Base() (see FAQ 20.05). In this case, Derived::~Derived() is synthesized by the compiler.
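
A minimal sketch of the inline virtual technique. The Member struct, the g_destroyed counter, and demo() are hypothetical additions that make the synthesized Derived destructor observable.

```cpp
#include <cassert>

int g_destroyed = 0;  // hypothetical counter for observing destruction

struct Member {
    ~Member() { ++g_destroyed; }
};

class Base {
public:
    virtual ~Base();
};

// Empty, inline, and virtual; this would normally live in the header.
inline Base::~Base() { }

class Derived : public Base {
    Member m_;  // Derived::~Derived() is synthesized by the compiler
};

void demo() {
    g_destroyed = 0;
    Base* p = new Derived();
    delete p;  // runs the synthesized Derived::~Derived(), then Base::~Base()
    assert(g_destroyed == 1);
}
```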

Leaving out a definition for Base::~Base() will cause a linker error, because Derived::~Derived() automatically calls Base::~Base() (see FAQ 20.05). In this case, Derived::~Derived() is synthesized by the compiler.

Depending on the compiler, there may be a marginal performance benefit in using a pure virtual destructor with an explicit inline definition versus the inline virtual technique that was described in the previous FAQ. Calls to inline virtual functions can be inlined if the compiler is able to statically bind to the class. However, the compiler may also make an outlined copy of an inline virtual function (for any other cases where it isn't able to statically bind to the call). Although in theory destructors of ABCs don't have these limitations, in practice not all compilers produce optimal code when using the inline virtual technique.

How can duplicate copies of inline virtual functions be prevented?

If a class has one or more virtual functions (either inherited or first-declared in that class), then the class should have at least one non-inline virtual function.

Many compilers use the location of the first non-inline virtual function to determine the source file that will house the class's magical stuff (the virtual table, out-lined copies of inline virtual functions, and so on). If all of the class's virtual functions are defined inline, these compilers may put a static copy of a class's magical stuff in every source file that includes the class's header file.

Note that this advice is fairly sensitive to the compiler. Some compilers won't generate copies of the magical stuff even if all the virtual functions in a class are inline. But even in these compilers, it doesn't cost much to ensure that at least one of the class's virtual functions is non-inline.

If the base class has a virtual destructor, the destructor in the derived class will also be virtual, and, unless specified otherwise, will be inline.

The safest bet is to give every derived class at least one non-inline virtual function (assuming the base class has a virtual destructor). To show how subtle this can be, consider this trivial example.

Even though no Base or Derived objects are created, the preceding example will fail to link on many systems. The reason is that the only virtual function in class Derived is inline (Derived::~Derived() is a synthesized inline virtual function), so the compiler puts a static copy of Derived::~Derived() into the current source file. Since this static copy of Derived::~Derived() invokes Base::~Base() (see FAQ 20.05) the linker will need a definition of Base::~Base().

Adding a non-inline virtual function to a derived class (for example, thisDoesNothing()) eliminates the linker errors for that derived class, because the compiler puts the (only) copy of the magical stuff into the source file that defines the non-inline virtual function.

FAQ 22.01 What are constructor initialization lists?

Constructor initialization lists are the best way to initialize member objects.

All member objects are initialized before the body of the constructor begins executing. Constructor initialization lists allow the class to exercise control over the construction of the member objects before the execution of the constructor.

Note that no assignment takes place when the initialization list copies a constructor parameter such as name directly into a member object such as name_.

FAQ 22.02 What will happen if constructor initialization lists are not used?

Initialization lists are usually a performance issue, although there are cases when initialization lists impact correctness (see FAQ 22.03 and 22.04). In some cases the code of a constructor can be three times slower if initialization lists are not used. For example, consider the following Person class.

The following implementation of the constructor initializes member object name_ using an initialization list. From a performance perspective, it is important to note that the result of the + operator is constructed directly inside member object name_. A temporary object is not needed in this case, and most compilers do not produce an extra temporary object. This typically requires one allocation of memory from the heap and one copy of the data from each string.

In contrast, the following constructor sets up member object name_ using assignment. In this case the default constructor (see FAQ 20.08) may have allocated a small amount of memory (many string classes store a '\0' byte even in cases when the string is empty); then that memory is immediately discarded in the assignment operator. In addition, the compiler will probably have to create a temporary object, and this temporary object is passed into the name_ object's assignment operator; then the temporary is destructed at the ;. That's inefficient. All together, this constructor might make three calls to the memory allocation routines (two allocations, one deallocation) and might copy the string's data twice (once into the temporary and once into name_).

Conclusion: All other things being equal, code will run faster with initialization lists than with assignment.
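
The difference can be made visible with an instrumented member type. In the sketch below, Counted is a hypothetical stand-in for a string class that counts its own operations, and PersonInit and PersonAssign are illustrative names for the two constructor styles; actual allocation counts depend on the string implementation.

```cpp
#include <cassert>
#include <string>

// Hypothetical instrumented type that counts how it is constructed and assigned.
struct Counted {
    static int defaultCtors, valueCtors, assignments;
    std::string text;
    Counted() { ++defaultCtors; }
    Counted(const std::string& s) : text(s) { ++valueCtors; }
    Counted& operator=(const Counted& rhs) {
        text = rhs.text;
        ++assignments;
        return *this;
    }
    static void reset() { defaultCtors = valueCtors = assignments = 0; }
};
int Counted::defaultCtors = 0;
int Counted::valueCtors = 0;
int Counted::assignments = 0;

struct PersonInit {
    Counted name_;
    PersonInit(const std::string& first, const std::string& last)
        : name_(first + " " + last) { }       // constructed directly from the result
};

struct PersonAssign {
    Counted name_;
    PersonAssign(const std::string& first, const std::string& last)
    { name_ = first + " " + last; }           // default-construct, build a temporary, assign
};

void demo() {
    Counted::reset();
    PersonInit a("Ann", "Lee");
    assert(Counted::defaultCtors == 0 && Counted::valueCtors == 1 && Counted::assignments == 0);

    Counted::reset();
    PersonAssign b("Ann", "Lee");
    assert(Counted::defaultCtors == 1 && Counted::valueCtors == 1 && Counted::assignments == 1);
    assert(a.name_.text == b.name_.text);
}
```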

FAQ 22.03 What's the guideline for using initialization lists in constructor definitions?

As a general rule, all member objects and base classes should explicitly appear in the initialization list of a constructor. In addition to being more efficient than default initialization followed by assignment, using the initialization list makes the code clearer since it takes advantage of something that the compiler is going to do anyway.

Note that there is no performance gain in using initialization lists with member objects of built-in types, but there is no loss either, so initialization lists should be used for symmetry.

FAQ 22.04 Is it normal for constructors to have nothing inside their body?

Yes, this happens frequently.

The body of a constructor is the {...} part. A constructor should initialize its member objects in the initialization list, often leaving little or nothing to do inside the constructor's body. When the constructor body is empty, it can be written as a bare { } or decorated with a short comment noting that the body is intentionally empty.

Notice that the initialization list resides in the constructor's definition and not its declaration (in this case, the declaration and the definition are separate).
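
A small sketch with a separate declaration and definition. The class Fred, its members, and the wording of the comment in the empty body are hypothetical.

```cpp
#include <cassert>

class Fred {
public:
    Fred(int i, int j);  // declaration: no initialization list here
    int sum() const { return i_ + j_; }
private:
    int i_;
    int j_;
};

// Definition: the initialization list lives here, and the body has nothing left to do.
Fred::Fred(int i, int j)
    : i_(i)
    , j_(j)
{
    // intentionally empty (hypothetical wording)
}

void demo() {
    Fred f(2, 3);
    assert(f.sum() == 5);
}
```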

FAQ 22.05 How is a const data member initialized?

Nonstatic const data members are declared in the class body with a const prefix, and their state must be initialized in the constructor's initialization list. The value used to initialize the const data member can be a literal value, a parameter passed to the constructor, or the result of some expression. After initialization, the state of a const data member within a particular object cannot change, but each object can initialize its const data member to a different value.

In the following example, i_ is a non-const member variable and j_ is a const member variable.
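
A sketch of such a class, assuming int members and hypothetical accessors. Each object fixes j_ at construction time, while i_ remains changeable.

```cpp
#include <cassert>

class Fred {
public:
    Fred(int i, int j) : i_(i), j_(j) { }  // j_ can only be set here
    void setI(int i) { i_ = i; }           // fine: i_ is non-const
    // void setJ(int j) { j_ = j; }        // would not compile: j_ is const
    int i() const { return i_; }
    int j() const { return j_; }
private:
    int i_;
    const int j_;
};

void demo() {
    Fred a(1, 10);
    Fred b(2, 20);  // a different object may use a different value for j_
    a.setI(5);
    assert(a.i() == 5 && a.j() == 10);
    assert(b.i() == 2 && b.j() == 20);
}
```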

Be sure to avoid binding a reference data member to an object passed to the constructor by value (for example, if parameter i were passed by value), since the reference (i_) would refer to a temporary variable allocated on the stack. This would create a dangling reference since value parameters disappear as soon as the function (the constructor in this case) returns.

Depending on the phase of the moon, a dangling reference might crash the program.

FAQ 22.07 Are initializers executed in the same order in which they appear in the initialization list?

Not necessarily.

C++ constructs objects by initializing the subobjects of immediate base classes in the order the base classes appear in the class declaration, then initializing member objects in the order they appear in the class body layout. It uses this ordering so that it can guarantee that base class subobjects and member objects are destructed in the opposite order from which they are constructed. Member objects are destructed in the reverse order of the class body layout, then subobjects of immediate base classes are destructed in the reverse order they appear in the base class list in the class declaration. The order of the initialization list is irrelevant.

The following example demonstrates the fact that initialization order is tied to the order of the class layout rather than to the order of the initialization list. First, class Noisy prints a message during its constructor and destructor.

The constructor of class Fred lists its three Noisy objects in a different order than the one in which they are actually initialized. The important thing to notice is that the compiler ignores the order in which members show up in the initialization list:

The constructor's initialization list order is (b_, a_, base-class), but the class body layout order is the opposite: (base-class, a_, b_). The output of this program demonstrates that the initialization list's order is ignored:
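
The sketch below reconstructs the idea rather than the original listing. It records events in a hypothetical global g_events instead of printing, and the message strings are assumptions. Some compilers warn about the deliberately misordered initialization list.

```cpp
#include <cassert>
#include <string>
#include <vector>

std::vector<std::string> g_events;  // hypothetical log replacing printed output

class Noisy {
public:
    explicit Noisy(const std::string& msg) : msg_(msg) {
        g_events.push_back("construct " + msg_);
    }
    ~Noisy() { g_events.push_back("destruct " + msg_); }
private:
    std::string msg_;
};

class Fred : public Noisy {
public:
    // List order is (b_, a_, base class); the compiler ignores this order.
    Fred() : b_("b_"), a_("a_"), Noisy("base") { }
private:
    Noisy a_;  // class body layout order: base class, a_, b_
    Noisy b_;
};

void demo() {
    g_events.clear();
    { Fred f; }
    const std::vector<std::string> expected = {
        "construct base", "construct a_", "construct b_",
        "destruct b_", "destruct a_", "destruct base"
    };
    assert(g_events == expected);
}
```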

Even though the order of initializers in a constructor's initialization list is irrelevant, see the next FAQ for a recommendation.

FAQ 22.08 How should initializers be ordered in a constructor's initialization list?

Immediate base classes (left to right), then member objects (top to bottom).

In other words, the order of the initialization list should mimic the order in which initializations take place. This guideline discourages a particularly subtle class of order dependency errors by giving an obvious, visual clue. For example, the following contains a hideous error.

Note that y_ is used (Y::f()) before it is initialized (Y::Y()). If the guideline espoused by this FAQ were employed, the error would be more obvious: the initialization list of Z::Z() would have read x_(y_), y_(), visually indicating that y_ was being used before being initialized.

Not all compilers issue diagnostic messages for these cases.

FAQ 22.09 Is it moral for one member object to be initialized using another member object in the constructor's initialization list?

Yes, but exercise great caution.

In a constructor's initialization list, it is best to avoid using one member object from the this object in the initialization expression of a subsequent initializer for the this object. This guideline prevents subtle order dependency errors if someone reorganizes the layout of member objects within the class (see the previous FAQ).

Because of this guideline, the constructor that follows uses s.len_ + 1 rather than len_ + 1, even though they are otherwise equivalent. This avoids an unnecessary order dependency.

An unnecessary order dependency on the class layout of len_ and data_ would have been introduced if the constructor's initialization of data_ had used len_+1 rather than s.len_+1. However using len_ within a constructor body ({...}) is okay. No order dependency is introduced since the entire initialization list is guaranteed to finish before the constructor body begins executing.
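
A minimal sketch of such a copy constructor, assuming a MyString class that stores a length and a heap-allocated buffer. The accessors and demo() are hypothetical, and assignment is disabled to keep the sketch short.

```cpp
#include <cassert>
#include <cstring>

class MyString {
public:
    MyString(const char* s)
        : len_(std::strlen(s)), data_(new char[std::strlen(s) + 1])
    { std::memcpy(data_, s, len_ + 1); }

    // Uses s.len_ + 1 rather than len_ + 1: no dependency on member order.
    MyString(const MyString& s)
        : len_(s.len_), data_(new char[s.len_ + 1])
    { std::memcpy(data_, s.data_, len_ + 1); }  // len_ is safe inside the body

    MyString& operator=(const MyString&) = delete;  // omitted in this sketch
    ~MyString() { delete[] data_; }

    std::size_t length() const { return len_; }
    const char* c_str() const { return data_; }
private:
    std::size_t len_;
    char* data_;
};

void demo() {
    MyString a("hello");
    MyString b(a);
    assert(b.length() == 5);
    assert(std::strcmp(b.c_str(), "hello") == 0);
    assert(b.c_str() != a.c_str());  // the copy owns its own buffer
}
```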

FAQ 22.10 What if one member object has to be initialized using another member object?

Comment the declaration of the affected data members with the comment //ORDER DEPENDENCY.

If a constructor initializes a member object of the this object using another member object of the this object, rearranging the data members in the class body could break the constructor (see FAQ 22.08). This important maintenance constraint should be documented in the class body. For example, in the constructor that follows, the initializer for data_ uses len_ to avoid a redundant call to strlen(s), thus introducing an order dependency in the class body.

Note that the //ORDER DEPENDENCY comment is attached to the affected data members in the class body, not to the constructor initialization list. This is because the order of member objects in the class body is critical; the order of initializers in the constructor initialization list is irrelevant (see FAQ 22.07).
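
A sketch of how this might look, assuming the same kind of MyString class with a length and a buffer. Copying is disabled and the accessors are hypothetical, to keep the example short.

```cpp
#include <cassert>
#include <cstring>

class MyString {
public:
    // data_ is initialized from len_, so len_ must be declared before data_.
    MyString(const char* s)
        : len_(std::strlen(s)), data_(new char[len_ + 1])
    { std::memcpy(data_, s, len_ + 1); }

    MyString(const MyString&) = delete;             // omitted in this sketch
    MyString& operator=(const MyString&) = delete;  // omitted in this sketch
    ~MyString() { delete[] data_; }

    std::size_t length() const { return len_; }
    const char* c_str() const { return data_; }
private:
    std::size_t len_;  //ORDER DEPENDENCY
    char* data_;       //ORDER DEPENDENCY
};

void demo() {
    MyString s("abc");
    assert(s.length() == 3);
    assert(std::strcmp(s.c_str(), "abc") == 0);
}
```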

FAQ 22.11 Are there exceptions to the rule "Initialize all member objects in an initialization list"?

Yes, to facilitate argument screening.

Arguments to constructors sometimes need to be checked (or screened) before they can be used to initialize a member object. When it becomes difficult to squeeze the resultant if (...) throw ... logic into the initialization list, it may be more convenient to initialize the member object via its default constructor, then modify its state in the constructor body ({...}) via assignment or some other mutative member function.

This situation is usually limited to classes that are built directly on built-in types (int, char*, and so forth), because constructors for user-defined (class) types normally check their own arguments.

For example, in the preceding FAQ, MyString::MyString(const char*) passed its parameter to strlen(const char*) without verifying that the pointer was non-NULL. This test can be implemented by using assignment in the constructor.

Using assignment rather than initialization tends to remove order dependencies. For example, MyString::MyString(const char*) no longer introduces an order dependency in the member data of class MyString. However, doing this may introduce performance penalties if the member objects are user-defined (class) types.
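
One way this screening might look is sketched below. The choice of std::invalid_argument, the placeholder initial values, and the rejectsNull() and demo() helpers are assumptions made for illustration.

```cpp
#include <cassert>
#include <cstring>
#include <stdexcept>

class MyString {
public:
    MyString(const char* s);
    MyString(const MyString&) = delete;             // omitted in this sketch
    MyString& operator=(const MyString&) = delete;  // omitted in this sketch
    ~MyString() { delete[] data_; }
    std::size_t length() const { return len_; }
    const char* c_str() const { return data_; }
private:
    std::size_t len_;
    char* data_;
};

MyString::MyString(const char* s)
    : len_(0), data_(nullptr)  // placeholder values; real state is set below
{
    if (s == nullptr)
        throw std::invalid_argument("MyString: null pointer");  // screen the argument
    len_ = std::strlen(s);  // assignment in the body: no order dependency
    data_ = new char[len_ + 1];
    std::memcpy(data_, s, len_ + 1);
}

// Returns true if constructing from a null pointer throws.
bool rejectsNull() {
    const char* none = nullptr;
    try { MyString bad(none); }
    catch (const std::invalid_argument&) { return true; }
    return false;
}

void demo() {
    MyString ok("abc");
    assert(ok.length() == 3);
    assert(rejectsNull());
}
```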

FAQ 22.12 How can an array of objects be initialized with specific initializers?

Why use arrays in the first place? Why not use containers, particularly from the standard library? If arrays are a must, and if the elements require specific initializers, the answer is the {...} initializer syntax.
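
A brief sketch of both options, using a hypothetical Fred class that has no default constructor, so every element needs an explicit initializer.

```cpp
#include <cassert>
#include <vector>

class Fred {
public:
    Fred(int i) : i_(i) { }  // no default constructor
    int value() const { return i_; }
private:
    int i_;
};

int sumOfArray() {
    // Each element gets its own initializer through the {...} syntax.
    Fred a[3] = { Fred(10), Fred(20), Fred(30) };
    return a[0].value() + a[1].value() + a[2].value();
}

int sumOfVector() {
    // The container alternative: elements are added one at a time.
    std::vector<Fred> v;
    v.push_back(Fred(10));
    v.push_back(Fred(20));
    v.push_back(Fred(30));
    int total = 0;
    for (std::size_t i = 0; i < v.size(); ++i) total += v[i].value();
    return total;
}

void demo() {
    assert(sumOfArray() == 60);
    assert(sumOfVector() == 60);
}
```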

FAQ 23.01 Are overloaded operators like normal functions?

Yes, overloaded operators are syntactic sugar for normal functions.

Operator overloading allows existing C++ operators to be redefined so that they work on objects of user defined classes. Overloaded operators are syntactic sugar for equivalent function calls. They form a pleasant facade that doesn't add anything fundamental to the language (but they can improve understandability and reduce maintenance costs).

For example, consider the class Number that supports the member functions add() and mul(). Using named functions (that is, add() and mul()) makes sample() unnecessarily difficult to read, write, and maintain.
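
The contrast can be sketched as follows. Here add() and mul() are written as friend functions of a hypothetical integer-backed Number, sample() uses the named functions, and sample2() is an illustrative operator-based equivalent.

```cpp
#include <cassert>

class Number {
public:
    explicit Number(int v) : v_(v) { }
    int value() const { return v_; }
    friend Number add(const Number& a, const Number& b) { return Number(a.v_ + b.v_); }
    friend Number mul(const Number& a, const Number& b) { return Number(a.v_ * b.v_); }
private:
    int v_;
};

// The operators are only sugar for the named functions.
inline Number operator+(const Number& a, const Number& b) { return add(a, b); }
inline Number operator*(const Number& a, const Number& b) { return mul(a, b); }

// Named functions: harder to read.
Number sample(const Number& a, const Number& b, const Number& c) {
    return add(add(mul(a, b), mul(b, c)), mul(c, a));
}

// Overloaded operators: the same computation in familiar notation.
Number sample2(const Number& a, const Number& b, const Number& c) {
    return a * b + b * c + c * a;
}

void demo() {
    Number a(2), b(3), c(4);
    assert(sample(a, b, c).value() == 26);   // 6 + 12 + 8
    assert(sample2(a, b, c).value() == 26);
}
```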

FAQ 23.02 When should operator overloading be used?

When it makes sense to users.

The goal of operator overloading should be to improve the readability of code that uses a class. However, it should be used only in ways that are semantically familiar to users. For instance, it would be nonintuitive to use operator+ for subtraction.

The ultimate goal is to reduce both the learning curve and the defect rate for users of a class. Another related goal is to enable users to program in the language of the problem domain rather than in the language of the machine.

Here are a few examples of operator overloading that are intuitive.

myString + yourString might concatenate two string objects.

myDate++ might increment a Date object.

a * b might multiply two Number objects.

a[i] might access an element of an Array object.

x = *p might dereference a "smart pointer" (see FAQ 31.09) that acts as if it points to a disk record. The actual implementation could use a database lookup to get the value of record x.

While it is true that operator overloading can be overutilized (by trying to define everything as an operator), it can also be underutilized. For some reason, some developers hate to implement overloaded operators in their classes. They don't do it even in places where it should be done. One reason some developers don't do it is that they're not used to it: they don't do it every day, so they're not comfortable with it. Another reason is that Java doesn't have operator overloading (as if that were a reason not to do something in C++). And another reason is that they think it makes their code ugly by adding all those member functions with funny operator names.

Our response to these concerns (in particular the last one) is to go back to two of the central tenets of this book: "Design classes from the outside in" and "Think of your users, not yourself" when designing interfaces. In particular, if overloaded operators make sense to the users of the library/class, or if their code will be easier to understand and maintain, then the developer should define overloaded operators.

FAQ 23.03 What operators can't be overloaded?

The only C++ operators that can't be overloaded are dot (.), .*, arithmetic if (?:), size (sizeof), typeid, and ::.

Here's an example of an array-like class without operator overloading.
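
A sketch of such a class follows, together with an operator-based variant for comparison. The names Array, Array2, sample(), and sample2() match those used in the next FAQ; the elem() member, the fixed size of 100 ints, and the bodies are assumptions.

```cpp
#include <cassert>
#include <stdexcept>

// Without operator overloading: elements are reached through a named function.
class Array {
public:
    Array() : v_() { }  // zero-initializes the elements
    int& elem(unsigned i) {
        if (i >= 100) throw std::out_of_range("Array::elem");
        return v_[i];
    }
private:
    int v_[100];
};

void sample(Array& a) {
    a.elem(10) = 42;
    a.elem(12) = a.elem(10) + 1;
}

// With operator overloading: slightly more clutter in the class itself...
class Array2 {
public:
    Array2() : v_() { }
    int& operator[](unsigned i) {
        if (i >= 100) throw std::out_of_range("Array2::operator[]");
        return v_[i];
    }
private:
    int v_[100];
};

// ...but simpler code for every user of the class.
void sample2(Array2& a) {
    a[10] = 42;
    a[12] = a[10] + 1;
}

void demo() {
    Array a;
    sample(a);
    Array2 b;
    sample2(b);
    assert(a.elem(10) == 42 && a.elem(12) == 43);
    assert(b[10] == 42 && b[12] == 43);
}
```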

FAQ 23.04 Is the goal of operator overloading to make the class easier to understand?

The goal is to help the users of a class rather than the developer of a class. There may be lots of users of a class, so this is an example of leverage: the good of the many outweighs the good of the few.

When programmers think only about themselves and the class they are writing, operator overloading seems to make matters worse. For example, class Array2 in the previous FAQ has more symbols and clutter than the Array version that didn't have operator overloading. However when programmers think about the overall complexity of the application, they see that operator overloading can help. For example, all the code written by all the users of Array2 will probably be a lot easier to understand than the equivalent code written using class Array (compare sample2() with sample() in the previous FAQ).

FAQ 23.05 Why do subscript operators usually come in pairs?

They usually occur in pairs so that users can access elements of a const object.

Classes that have a subscript operator often have a pair of subscript operators: a const version and a non-const version. The const version normally returns the element by value or by const reference, and the non-const version normally returns the element by non-const reference. The code for the two versions is normally quite similar if not identical.

For example, the following class has two subscript operators. It represents an Array of Fred objects. Class out_of_range is the standard exception class that is thrown if an argument is out of bounds.

When a user accesses an element of an Array via a reference-to-non-const (for example, via Array& a; see a[3] in the following code), the compiler generates a call to the non-const subscript operator. Similarly, when a user accesses an element of an Array via a reference-to-const (for example, via const Array& b; see b[3] below), the compiler generates a call to the const subscript operator. Since the non-const subscript operator returns the element by non-const reference, things like a[3] can appear on the left side of an assignment operator. Conversely, since the const subscript operator returns the element by const reference, things like b[3] cannot appear on the left side of an assignment operator:
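
A minimal sketch of such a pair of subscript operators. The Fred struct with a single value member, the fixed size, and the demo() helper are hypothetical.

```cpp
#include <cassert>
#include <stdexcept>

struct Fred {
    int value;
    Fred() : value(0) { }
};

class Array {
public:
    // Non-const version: returns a non-const reference, usable on the left of =.
    Fred& operator[](unsigned i) { check(i); return data_[i]; }
    // Const version: returns a const reference, read-only access.
    const Fred& operator[](unsigned i) const { check(i); return data_[i]; }
private:
    static const unsigned kSize = 100;
    static void check(unsigned i) {
        if (i >= kSize) throw std::out_of_range("Array index out of range");
    }
    Fred data_[kSize];
};

void demo() {
    Array arr;
    Array& a = arr;
    const Array& b = arr;
    a[3].value = 7;            // calls the non-const subscript operator
    assert(b[3].value == 7);   // calls the const subscript operator
    // b[3].value = 1;         // would not compile: b[3] is a const Fred&
    bool threw = false;
    try { a[100].value = 1; } catch (const std::out_of_range&) { threw = true; }
    assert(threw);
}
```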

FAQ 23.06 What is the most important consideration for operators such as +=, +, and =?

Respect the user's intuition and expectations.

In classes that define +=, +, and =, the expressions a += b and a = a + b should generally have the same observable behavior. Similar comments can be made for the other identities of the built-in types. For example, if the class of a defines both a += operator that can be passed an int and the prefix ++ operator, then a += 1 and ++a should have the same observable behavior.

Similarly if the class of p defines a subscript operator and a + operator that can take i on the right side, and a dereference operator, then p[i] and *(p+i) should be equivalent.

One way to enforce these rules is to implement constructive binary operators using the corresponding mutative operators. This also simplifies maintenance. For example, the code below implements + using +=.
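
A sketch of this approach, using a hypothetical integer-backed class Fred. Because operator+ is built on operator+=, the two cannot drift apart.

```cpp
#include <cassert>

class Fred {
public:
    explicit Fred(int v) : v_(v) { }
    int value() const { return v_; }
    // Mutative operator: changes *this and returns it by reference.
    Fred& operator+=(const Fred& rhs) { v_ += rhs.v_; return *this; }
private:
    int v_;
};

// Constructive operator: copies the left operand, then reuses +=.
inline Fred operator+(Fred lhs, const Fred& rhs) {
    lhs += rhs;
    return lhs;
}

void demo() {
    Fred a(2), b(3);
    Fred c = a + b;
    assert(c.value() == 5);
    assert(a.value() == 2);  // a + b leaves a unchanged
    a += b;
    assert(a.value() == c.value());  // a += b matches a = a + b
}
```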

The two versions of operator-- are similar: the prefix version takes no parameters and the postfix version takes a single parameter of type int.

FAQ 23.08 What should the prefix and postfix versions of operator++ return?

++i should return a reference to i; i++ should return either void or a copy of the original state of i.

The prefix version, operator++(), should return *this, normally by reference. For example, if the class of the object is Fred, Fred::operator++() should normally return *this as a Fred&. It is also valid, but not as desirable, if it returns void.

The postfix version, Fred::operator++(int), should return either nothing (void) or a copy of the original state of the object, *this. In any event, it should not return a Fred& because that would confuse users. For example, if i++ returned *this by reference, the value returned from i++ would be the same as the value returned from ++i. That would be counterintuitive.

It is often easiest if Fred::operator++(int) is implemented in terms of Fred::operator++():
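
A minimal sketch, again using a hypothetical integer-backed Fred:

```cpp
#include <cassert>

class Fred {
public:
    explicit Fred(int v = 0) : v_(v) { }
    int value() const { return v_; }

    // Prefix: increments and returns *this by reference.
    Fred& operator++() { ++v_; return *this; }

    // Postfix: saves the old state, delegates to the prefix version, returns the copy.
    Fred operator++(int) {
        Fred old(*this);
        ++*this;
        return old;
    }
private:
    int v_;
};

void demo() {
    Fred i(5);
    Fred old = i++;
    assert(old.value() == 5);  // postfix yields the original state
    assert(i.value() == 6);
    Fred& r = ++i;
    assert(&r == &i);          // prefix yields the object itself
    assert(i.value() == 7);
}
```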

Note that users should avoid i++ in their code unless the old state of i is needed. In particular, simple statements such as i++; (where the result of i++ is ignored) should generally be replaced by ++i;. This is because i++ may cause more overhead than ++i for user-defined (class) types. For example, calling i++ may create an unnecessary copy of i. Of course, if the old value of i is needed, such as in j = i++, the postfix version is beneficial.

The two versions of operator-- are similar: the prefix version should return *this by reference, and the postfix version should return either the old state of *this or void.

FAQ 23.09 How can a Matrix-like class have a subscript operator that takes more than one subscript?

It should use operator() rather than operator[].

When multiple subscripts are needed, the cleanest approach is to use operator() rather than operator[]. The reason is that operator[] always takes exactly one parameter, but operator() can take any number of parameters. In the case of a rectangular Matrix-like class, an element can be accessed using an (i,j) pair of subscripts. For example,
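A sketch of such a class, assuming a hypothetical Matrix of double stored in row-major order with assert-based bounds checks (all of these details are illustrative):

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

class Matrix {
public:
    Matrix(std::size_t rows, std::size_t cols)
        : rows_(rows), cols_(cols), data_(rows * cols, 0.0) { }

    // Two subscripts in one call; operator[] could accept only one.
    double& operator() (std::size_t row, std::size_t col)
    {
        assert(row < rows_ && col < cols_);
        return data_[row * cols_ + col];
    }

    // Read-only access for const Matrix objects.
    double operator() (std::size_t row, std::size_t col) const
    {
        assert(row < rows_ && col < cols_);
        return data_[row * cols_ + col];
    }

    std::size_t rows() const { return rows_; }
    std::size_t cols() const { return cols_; }

private:
    std::size_t rows_;
    std::size_t cols_;
    std::vector<double> data_;   // row-major storage
};
```

With this interface, user code reads m(i,j) = 7.5; and the storage layout stays hidden behind operator().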

FAQ 23.10 Can a ** operator serve as an exponentiation operator?

No, it can't.

The names, precedence, associativity, and arity (number of arguments) of operators are predefined by the language. There is no ** operator in the C++ language, and it is not possible to add the ** operator to the C++ language.

In fact, the expression x ** y is already syntactically legal C++. It means x * (*y) (that is, y is treated like a pointer that is dereferenced). If C++ allowed users to provide new meaning to **, the compiler's lexical analyzer (the lowest-level operation in the compiler) would need to be contextually dependent on the semantic analyzer (the highest-level operation in the compiler). This would probably introduce ambiguities and break existing code.

Operator overloading is merely syntactic sugar for function calls. Although syntactic sugar is sweet, it is not fundamentally necessary. Raising a number to a power is best performed by overloading pow(base, exponent), a double precision version of which can be found in the <cmath> header file.

Another candidate for an exponentiation operator is operator^, but it has neither the proper precedence nor the proper associativity.

Don't force-fit the semantics of an overloaded operator.

FAQ 24.01 What should assignment operators return?

Assignment operators should generally return *this by reference. This means that they adhere to the same convention used by the built-in types by allowing assignment to be used as an expression rather than simply as a statement. This allows assignment to be cascaded into larger expressions. An example follows.
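A small illustration, assuming a hypothetical Fred holding one int; sample() is likewise just a name chosen for this sketch:

```cpp
class Fred {
public:
    explicit Fred(int value = 0) : value_(value) { }

    // Returns *this by reference, as the built-in types do.
    Fred& operator= (const Fred& f)
    {
        value_ = f.value_;
        return *this;
    }

    int value() const { return value_; }

private:
    int value_;
};

void sample(Fred& a, Fred& b, const Fred& c)
{
    a = b = c;   // parsed as a = (b = c); works because b = c yields b
}
```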

FAQ 24.02 What is wrong with an object being assigned to itself?

Nothing, unless the programmer who developed the class failed to implement the assignment operator correctly.

Assigning an object to itself is called self-assignment.

No one intentionally assigns an object to itself (a = a). However, since two different pointers or references could refer to the same object (aliasing), a statement like a = b could assign an object to itself. If that happens, and if the object doesn't properly handle it, a disaster could occur, especially if remote ownership is involved (see FAQ 30.08). An example follows.
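The following sketch shows the kind of flawed assignment operator under discussion. It assumes a hypothetical class Fred that owns a heap-allocated Wilma through the pointer p_; the int payload and the value() accessors exist only to make the sketch observable. The dangerous call sample(x, x) is deliberately left as a comment, since its behavior is undefined.

```cpp
class Wilma {
public:
    explicit Wilma(int value = 0) : value_(value) { }
    int value() const { return value_; }
private:
    int value_;
};

class Fred {
public:
    explicit Fred(int value = 0) : p_(new Wilma(value)) { }
    Fred(const Fred& f) : p_(new Wilma(*f.p_)) { }
    ~Fred() { delete p_; }

    // BAD FORM: no protection against self-assignment.
    Fred& operator= (const Fred& f)
    {
        delete p_;                 // if &f == this, this also destroys *f.p_
        p_ = new Wilma(*f.p_);     // ...and this line then reads a deleted object
        return *this;
    }

    int value() const { return p_->value(); }

private:
    Wilma* p_;   // remote ownership: Fred deletes what p_ points to
};

void sample(Fred& a, Fred& b)
{
    a = b;   // harmless for distinct objects; undefined if a and b alias
}

// Fred x;
// sample(x, x);   // self-assignment through two references: do not do this
```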

Even though the code of sample() doesn't appear to be assigning an object to itself, the two references a and b could (and in this case, do) refer to the same object. Unfortunately, the assignment operator fails to check for self-assignment, which means that the statement delete p_ deletes both this->p_ and f.p_. Yet the next line uses the deleted object *f.p_, meaning that the program is using a dangling reference. Depending on the contents of class Wilma, this could be a disaster.

FAQ 24.03 What should be done about self-assignment?

The programmer who creates the class needs to make sure self-assignment is harmless. The simplest way to make sure that self-assignment is harmless is to perform an if test such as the one shown in the following example.
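A sketch of the if test, using the same hypothetical Fred and Wilma classes as assumptions (a Fred that owns a Wilma through p_):

```cpp
class Wilma {
public:
    explicit Wilma(int value = 0) : value_(value) { }
    int value() const { return value_; }
private:
    int value_;
};

class Fred {
public:
    explicit Fred(int value = 0) : p_(new Wilma(value)) { }
    Fred(const Fred& f) : p_(new Wilma(*f.p_)) { }
    ~Fred() { delete p_; }

    Fred& operator= (const Fred& f)
    {
        if (this == &f)      // self-assignment: nothing to do
            return *this;
        delete p_;
        p_ = new Wilma(*f.p_);
        return *this;
    }

    int value() const { return p_->value(); }

private:
    Wilma* p_;
};
```

This version is safe against self-assignment, but it still deletes the old Wilma before allocating the new one, which matters when exceptions are considered (see FAQ 24.04).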

Self-assignment can also be rendered harmless by efficiency considerations. For example, the following assignment operator replaces the allocated memory only if the old allocation is too small to handle the new state. In this class, self-assignment would automatically be handled since it would skip the allocate/deallocate steps:
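A sketch of such a class, assuming a hypothetical String with len_, capacity_, and data_ members. One deliberate difference from the memcpy() mentioned in the text: the assignment operator here calls memmove(), because during self-assignment the source and destination are the same buffer and memcpy() requires ranges that do not overlap.

```cpp
#include <cstddef>
#include <cstring>

class String {
public:
    String(const char* s = "")
        : len_(std::strlen(s)), capacity_(len_), data_(new char[len_ + 1])
    {
        std::memcpy(data_, s, len_ + 1);          // copy including the '\0'
    }

    String(const String& s)
        : len_(s.len_), capacity_(s.len_), data_(new char[s.len_ + 1])
    {
        std::memcpy(data_, s.data_, len_ + 1);
    }

    ~String() { delete[] data_; }

    String& operator= (const String& s)
    {
        if (s.len_ > capacity_) {
            // Never taken for self-assignment, since len_ <= capacity_.
            char* fresh = new char[s.len_ + 1];
            delete[] data_;
            data_ = fresh;
            capacity_ = s.len_;
        }
        // Redundant but harmless when &s == this.
        std::memmove(data_, s.data_, s.len_ + 1);
        len_ = s.len_;
        return *this;
    }

    const char* c_str() const { return data_; }

private:
    std::size_t len_;        // current length, excluding the '\0'
    std::size_t capacity_;   // largest length data_ can hold
    char* data_;
};
```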

The goal is to make self-assignment harmless, not to make it fast. It doesn't make sense to optimize self-assignment since self-assignment is rare. For example, when self-assignment actually occurs, the assignment operator would unnecessarily call memcpy(). Although that call could be removed by an extra if test, the extra if test would put more overhead on the normal path in an attempt to optimize the pathological case. So in this case the right trade-off is to avoid the extra if test and put up with an unnecessary memcpy() in the case of self-assignment.

FAQ 24.04 Should an assignment operator throw an exception after partially assigning an object?

The assignment operator should either completely succeed or leave the object unchanged and throw an exception.

This gives callers some strong assurances: if the assignment operator returns normally (as opposed to throwing an exception), the caller is assured that the assignment was completely successful; if the assignment operator throws an exception, the caller is assured that the object was left in a consistent state.

Sometimes achieving this goal means that the object's new state must be copied into some temporary variables; then, after all potential error conditions are bypassed, the state can be copied into the this object.

In any event, it would be a mortal sin to leave the object in a corrupt state when an exception is thrown. For example, if an exception is thrown while evaluating new Wilma(*f.p_) (that is, either an out-of-memory exception or an exception in Wilma's copy constructor), this->p_ will be a dangling pointer: it will point to memory that has already been deleted.

The easiest way to solve this problem is simply to reverse the order of the new and delete lines. That is, to allocate the new Wilma(*f.p_) first, storing the result into a temporary Wilma*, then to do the delete p_ line, and finally to copy the temporary pointer into p_:
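The reordering can be sketched as follows, again assuming the hypothetical Fred that owns a Wilma through p_ (the local name fresh is illustrative):

```cpp
class Wilma {
public:
    explicit Wilma(int value = 0) : value_(value) { }
    int value() const { return value_; }
private:
    int value_;
};

class Fred {
public:
    explicit Fred(int value = 0) : p_(new Wilma(value)) { }
    Fred(const Fred& f) : p_(new Wilma(*f.p_)) { }
    ~Fred() { delete p_; }

    Fred& operator= (const Fred& f)
    {
        Wilma* fresh = new Wilma(*f.p_);   // may throw; *this is still untouched
        delete p_;                         // only now release the old state
        p_ = fresh;
        return *this;
    }

    int value() const { return p_->value(); }

private:
    Wilma* p_;
};
```

If the new expression throws, the function exits before delete p_ runs, so the object keeps its previous, valid state.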

Note that reversing the order of the new and delete statements has the beneficial side effect of making self-assignment harmless even without the explicit if test at the beginning of the assignment operator. That is, if the initial if test is removed and someone does a self-assignment, the only thing that happens is an extra copy of a Wilma object. Since making an extra copy of a Wilma object shouldn't generally cause problems (especially since the caller was expecting a copy to be made during assignment anyway), the if test can be removed, thus simplifying the code and improving the performance in the normal (non-self-assignment) case. Remember: the goal is to make self-assignment harmless, not to make it fast. Never optimize the pathological case (self-assignment) at the expense of the normal case (non-self-assignment). (See FAQ 24.03.)

FAQ 24.05 How should the assignment operator be declared in an ABC?

The assignment operator of an ABC should generally be protected:.

By default, assignment operators for all classes are public:. For ABCs, this default should usually be changed to protected: so that attempts to assign incompatible objects are trapped as compile-time errors. An example follows.
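A sketch under stated assumptions: Shape keeps a hypothetical Position member named p_, and Circle is one illustrative derived class. The static_assert lines (C++11) are an addition of this sketch that lets the compiler confirm which assignments are permitted.

```cpp
#include <type_traits>

struct Position {
    int x;
    int y;
};

class Shape {
public:
    Shape(int x, int y) : p_() { p_.x = x; p_.y = y; }
    virtual ~Shape() { }
    virtual double area() const = 0;
    int x() const { return p_.x; }
    int y() const { return p_.y; }

protected:
    // Callable from derived classes only; still assigns Shape's own state.
    Shape& operator= (const Shape& s)
    {
        p_ = s.p_;
        return *this;
    }

private:
    Position p_;
};

class Circle : public Shape {
public:
    Circle(int x, int y, double radius) : Shape(x, y), radius_(radius) { }
    virtual double area() const { return 3.14159265358979 * radius_ * radius_; }
    double radius() const { return radius_; }
private:
    double radius_;
};

// Assigning through Shape& is rejected at compile time...
static_assert(!std::is_assignable<Shape&, const Shape&>::value,
              "Shape-to-Shape assignment should not be available to users");
// ...while the compiler-synthesized Circle assignment still works.
static_assert(std::is_assignable<Circle&, const Circle&>::value,
              "Circle-to-Circle assignment should remain available");
```

A function such as void sample(Shape& a, Shape& b) { a = b; } would therefore fail to compile, which is the intended outcome when a might be a Square and b a Circle.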

Note that the protected: assignment operator assigns the internal state of one Shape to another Shape. If it didn't assign the Shape's p_ member object, each derived class would have to define an explicit assignment operator in order to do what the base class's assignment operator should have done in the first place.

Also note that the protected: assignment operator can return void instead of the usual *this (see FAQ 24.07).

FAQ 24.06 When should a user-defined assignment operator mimic the assignment operator that the compiler would generate automatically?

When an ABC defines an assignment operator merely to make it protected:.

When a protected: assignment operator contains the same code that the compiler would have synthesized automatically, its only purpose is to prevent users from assigning to an object of the class. This is common with abstract base classes. Here's an example.

Note that such an assignment operator does not automatically trigger the Law of the Big Three (see FAQ 30.13).

FAQ 24.07 What should be returned by private: and protected: assignment operators?

Either return a reference to the this object, or make the return type void.

Assignment operators that are private: or protected: needn't return *this because such operators have very few users, and therefore the advantage of returning *this is limited.

Assignment operators are often declared as private: to prevent users from assigning objects of the class (see FAQ 30.13). They are often left undefined to prevent being accidentally called by a member function or a friend function.

Assignment operators are often declared as protected: in abstract base classes to ensure that assignment doesn't occur when the destination is a reference to an abstract class (for example, assigning a circle to a square when both are referenced by a Shape&; see FAQ 24.05).

FAQ 24.08 Are there techniques that increase the likelihood that the compiler-synthesized assignment operator will be right?

Following a few simple rules helps the compiler to synthesize assignment operators that do the right thing. Without an assignment operator discipline, developers will need to provide an explicit assignment operator for too many classes, because the compiler-synthesized version will be incorrect an unnecessarily large percentage of the time.

The following FAQs provide guidelines for an assignment operator discipline that we have found to be effective and practical.

FAQ 24.09 How should the assignment operator in a derived class behave?

An assignment operator in a derived class should call the assignment operator in its direct base classes (to assign those member objects that are declared in the base class), then call the assignment operators of its member objects (to change those member objects that are declared in the derived class). These assignments should normally be in the same order that the base classes and member objects appear in the class's definition. An example follows.
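A sketch of the ordering, assuming hypothetical classes Base and Derived with plain int members:

```cpp
class Base {
public:
    explicit Base(int b = 0) : b_(b) { }

    Base& operator= (const Base& rhs)
    {
        b_ = rhs.b_;
        return *this;
    }

    int b() const { return b_; }

private:
    int b_;
};

class Derived : public Base {
public:
    Derived(int b, int d1, int d2) : Base(b), d1_(d1), d2_(d2) { }

    Derived& operator= (const Derived& rhs)
    {
        Base::operator= (rhs);   // 1. direct base class first
        d1_ = rhs.d1_;           // 2. then member objects, in declaration order
        d2_ = rhs.d2_;
        return *this;
    }

    int d1() const { return d1_; }
    int d2() const { return d2_; }

private:
    int d1_;
    int d2_;
};
```

Derived never touches b_ directly; it delegates that part of the state to Base::operator=.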

Typically, a Derived::operator= shouldn't access the member objects defined in a base class; instead it should call its base class's assignment operator. Nor should a Base::operator= access member objects defined in a derived class (that is, it usually shouldn't call a virtual routine, like copyState(), to copy the derived class's state).

If a Base::operator= tried to copy a derived class's state via a virtual function, the compiler-synthesized assignment operators in the derived classes would be invalidated. This requires defining an explicit assignment operator in an unnecessarily large percentage of the derived classes. This added work often negates any common code that is shared in the base class's assignment operator.

For example, suppose Base defines Base::operator= (const Base& b), and this assignment operator calls virtual function copyFrom(const Base&). If the derived class Derived overrides copyFrom(const Base&) to change the entire abstract state of the Derived object, then the compiler-synthesized implementation of Derived::operator= (const Derived&) is likely to be unacceptable: the compiler-synthesized Derived::operator= (const Derived&) would call Base::operator= (const Base&), which would call back to Derived::copyFrom(const Base&); after returning, the Derived state would be assigned a second time by Derived::operator= (const Derived&).

At best, this is a waste of CPU cycles because it reassigns the Derived member objects. At worst, this is semantically incorrect, because special changes made during Derived::copyFrom(const Base&) may get wiped out when the Derived member objects are subsequently assigned by Derived::operator= (const Derived&).

FAQ 24.10 Can an ABC's assignment operator be virtual?

An ABC's assignment operator can be virtual only if all derived classes of the ABC will be assignment compatible with all other derived classes and if the developer is willing to put up with a bit of extra work. This doesn't happen that often, but here's how to do it.

Classes derived from a base class are assignment compatible if and only if there's an isomorphism between the abstract states of the classes. For example, the abstract class Stack has concrete derived classes StackBasedOnList and StackBasedOnArray. These concrete derived classes have the same abstract state space as well as the same set of services and the same semantics. Thus, any Stack object can, in principle, be assigned to any other Stack object, whether or not they are instances of the same concrete class.

If all classes derived from an ABC are assignment compatible with all other derived classes from that ABC, there are two choices: when a user has a reference to the ABC, either prevent assignment or make it work correctly.

It is easiest on the class implementer to prevent assignment when the user has a reference to the base class. This is done by making the base class's assignment operator protected:. The disadvantage of this approach is that it restricts users from assigning arbitrary pairs of objects referred to by Stack references (that is, by Stack&).

The other choice is to make assignment work correctly when the user has a reference to the base class. This is done by making the base class's assignment operator public: and virtual. This approach allows any arbitrary Stack& to be assigned with any other Stack&, even if the two Stack objects are of different derived classes. The base class version of the assignment operator must be overridden in each derived class, and these overrides should copy the entire abstract state of the other Stack into the this object.
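A sketch of this second choice, assuming a hypothetical Stack of int whose abstract state is exposed through size() and at(); the class names StackArray and StackList and every member shown are illustrative. The base class version is given an empty body here (rather than being pure virtual) because the ABC itself has no state to copy.

```cpp
#include <cassert>
#include <cstddef>
#include <list>
#include <vector>

class Stack {
public:
    virtual ~Stack() { }
    // Public and virtual; the ABC itself holds no state to copy.
    virtual Stack& operator= (const Stack&) { return *this; }
    virtual void push(int x) = 0;
    virtual int pop() = 0;
    virtual std::size_t size() const = 0;
    virtual int at(std::size_t i) const = 0;   // i == 0 is the bottom element
};

class StackArray : public Stack {
public:
    virtual StackArray& operator= (const Stack& s)   // covariant return type
    {
        std::vector<int> fresh;                 // copy the abstract state first
        for (std::size_t i = 0; i < s.size(); ++i)
            fresh.push_back(s.at(i));
        data_.swap(fresh);                      // then commit
        return *this;
    }
    virtual void push(int x) { data_.push_back(x); }
    virtual int pop()
    {
        assert(!data_.empty());
        int x = data_.back();
        data_.pop_back();
        return x;
    }
    virtual std::size_t size() const { return data_.size(); }
    virtual int at(std::size_t i) const { return data_.at(i); }
private:
    std::vector<int> data_;
};

class StackList : public Stack {
public:
    virtual StackList& operator= (const Stack& s)    // covariant return type
    {
        std::list<int> fresh;
        for (std::size_t i = 0; i < s.size(); ++i)
            fresh.push_back(s.at(i));
        data_.swap(fresh);
        return *this;
    }
    virtual void push(int x) { data_.push_back(x); }
    virtual int pop()
    {
        assert(!data_.empty());
        int x = data_.back();
        data_.pop_back();
        return x;
    }
    virtual std::size_t size() const { return data_.size(); }
    virtual int at(std::size_t i) const
    {
        assert(i < data_.size());
        std::list<int>::const_iterator it = data_.begin();
        for (; i > 0; --i)
            ++it;
        return *it;
    }
private:
    std::list<int> data_;
};
```

Given Stack& dst and Stack& src that refer to objects of different concrete classes, dst = src; dispatches to the override of the destination's class, which rebuilds its own representation from the source's abstract state.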

Note that the override (StackArray::operator= (const Stack&)) returns a StackArray& rather than a mere Stack&. This is called a covariant return type.

FAQ 24.11 What should a derived class do if a base class's assignment operator is virtual?

The developer should probably override the base class's assignment operator and also provide an overloaded assignment operator. For example, when base class B declares B::operator= (const B&) to be virtual, a publicly derived class D should provide both the override (D::operator= (const B&)) and the overload (D::operator= (const D&)).

Because the compiler resolves which overload to call based on the static type of the parameters, the first assignment in the following example is the only one that calls the assignment operator that takes a D; all the others end up calling the assignment operator that takes a B. Because b in sample() is actually of class D, and because the base class's assignment operator is virtual, all four assignments call one of the assignment operators from the derived class.
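A sketch of the four assignments. The global string trace is a hypothetical logging device added only to make the dispatch visible; the signature of sample() is likewise an assumption of this sketch.

```cpp
#include <string>

std::string trace;   // hypothetical log: records which operator= ran

class B {
public:
    virtual ~B() { }
    virtual B& operator= (const B&) { trace += "B(B) "; return *this; }
};

class D : public B {
public:
    // Override of the virtual operator (covariant return type).
    virtual D& operator= (const B&) { trace += "D(B) "; return *this; }
    // Overload that takes a D.
    D& operator= (const D&) { trace += "D(D) "; return *this; }
};

void sample(D& d, B& b, D& d2, B& b2)
{
    d = d2;   // static types D = D: D::operator= (const D&)
    d = b2;   // static types D = B: D::operator= (const B&)
    b = b2;   // static types B = B: virtual, resolved by the actual class of b
    b = d2;   // static types B = D: virtual, resolved by the actual class of b
}
```

When sample() is called with b referring to an object of class D, the last two assignments run D::operator= (const B&); with b referring to a plain B they would run B::operator= (const B&) instead.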

The last two calls resolve to the override (D::operator= (const B&)) because the actual class of b in sample() is D. If b had actually been a B, the last two calls would have resolved to B::operator= (const B&). Naturally, these calls could also resolve to some other override if the object's actual class had been some other derived class that provided an override.

Note that D::operator= (const B& b) does not detect that its parameter b is of class D.

FAQ 24.12 Should the assignment operator be implemented by using placement new and the copy constructor?

It's a well-known trap!

It is tempting to avoid duplicate code for the assignment operator for class X by trying something like this.
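The tempting code looks roughly like the sketch below, shown here as something to avoid. The class X with a single int member is a hypothetical stand-in.

```cpp
#include <new>   // placement new

class X {
public:
    explicit X(int value = 0) : value_(value) { }
    X(const X& rhs) : value_(rhs.value_) { }
    ~X() { }

    // BAD FORM: assignment built from the destructor and copy constructor.
    X& operator= (const X& rhs)
    {
        if (this != &rhs) {
            this->~X();          // destroy the current object in place
            new (this) X(rhs);   // rebuild it as an X; if this throws,
                                 // *this is left destroyed
        }
        return *this;
    }

    int value() const { return value_; }

private:
    int value_;
};
```

The sketch happens to work for a plain X, which is what makes the idiom tempting. Note that it always constructs exactly an X, whatever the actual class of the object being assigned to.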

There are many problems with this approach. rhs will be sliced whenever it is not of type X, and the dtor-new-ctor sequence does not bode well for performance. Worst of all, consider what this does to future classes derived from X, even if operator=() isn't declared virtual in a base class of X (which introduces issues of its own).

It's good to minimize duplicate code, but the smart way to do it is to put common code into a private: member function that can be used by both the copy ctor and assignment operator. That's a safe way to reuse code; the approach in the example should be avoided.

FAQ 25.01 What is the purpose of templates?

Templates share source code among structurally similar families of classes and functions.

Many data structures and algorithms can be defined independently of the type of data they manipulate. A template allows the separation of the type-dependent part from the type-independent part. The result is a significant amount of code sharing.

A template is like a cookie cutter: all the cookies it creates have the same basic shape, though they might be made from different kinds of dough. A class template describes how to build classes that implement the same data structure and algorithm, and a function template describes how to build functions that implement the same algorithm.

In other languages, these facilities are sometimes called parameterized types or genericity.

Prior to templates, macros were used as a means of implementing generics. But the results were so poor that templates have superseded them.

FAQ 25.02 What are the syntax and semantics for a class template?

The syntax of a class template is the keyword template, some template parameters, then something that looks a lot like a class. But semantically a class template is not a class: it is a cookie cutter to create a family of classes.

Consider a container class (see FAQ 2.15). In practice, the C++ source code for a container that holds ints is structurally very similar to the C++ source code for a container that holds strings. The resulting binary machine code is probably quite different, since, for example, copying an int requires different machine instructions than does copying a string. Trying to make the binary machine code the same might impose runtime overhead to generalize, for example, the copying operations for int and string and might also increase the complexity of the container.

Class templates give programmers another option: capturing the source code similarity without imposing extra runtime performance overhead. That is, the compiler generates special purpose code for containers of int, containers of string, and any others that are needed.

For example, if someone desired a container that acted like an array, in practice they would probably use the standard class template vector<T>. However, for illustration purposes we will create a class template Array<T> that acts like a safe array of T.
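A minimal sketch of what such a class template might look like. The default length of 10, the out_of_range exception, and the decision to disable copying in this reduced version are all assumptions made for illustration.

```cpp
#include <cstddef>
#include <stdexcept>

template<class T>
class Array {
public:
    // Elements are value-initialized (zero for built-in types).
    explicit Array(std::size_t len = 10) : len_(len), data_(new T[len]()) { }
    ~Array() { delete[] data_; }

    std::size_t len() const { return len_; }

    // "Safe" subscripting: an out-of-range index throws.
    T& operator[] (std::size_t i) { return data_[check(i)]; }
    const T& operator[] (std::size_t i) const { return data_[check(i)]; }

private:
    // Copying is left out of this sketch: declared but never defined.
    Array(const Array<T>&);
    Array<T>& operator= (const Array<T>&);

    std::size_t check(std::size_t i) const
    {
        if (i >= len_)
            throw std::out_of_range("Array index out of range");
        return i;
    }

    std::size_t len_;
    T* data_;
};
```

Writing Array<int> a(3); makes the compiler generate the int version of every member function that is actually used.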

The template<class T> part indicates that T represents an as-yet unspecified type in the class template definition. Note that the keyword class doesn't imply that T must be a user-defined type; it might be a built-in type such as int or float.

The C++ Standard defines the term instantiated class to mean the instantiation of a class template, but we will use the term instantiation of a class template instead, since most C++ programmers think of an instantiated class as an object rather than another class. When it doesn't matter whether it is a class template or a function template, we will drop the qualifying adjective and refer to the instantiation of a template.

Normally the compiler creates an instantiation of a class template when the name of a class template is followed by a particular sequence of template arguments. In this case, the only template argument is a type. The compiler generates code for the instantiated template by replacing the template argument T with the type that is supplied, such as int.

FAQ 25.03 How can a template class be specialized to handle special cases?

Use explicit specialization.

Sometimes a programmer wants the compiler to bypass the class template when creating an instantiation of a class template for a particular type and use a specialized class template instead. For example, suppose that an array of bits is needed. The natural thing to do is create an Array<bool> using the template class from FAQ 25.02.

    #include "Array.hpp"

    int main()
    {
        Array<bool> ab;
        ab[5] = true;
    }

If the previously defined Array template were used to generate the code for this class, it would end up creating an array of bool which would, at best, be optimized to be an array of bytes. Clearly a bit array would be more space-efficient than a byte array. This more space-efficient implementation can be created by defining class Array<bool> as an explicit specialization of the class template Array. Notice how class Array<bool> uses a bit array rather than a byte array.
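A sketch of such an explicit specialization. The generic template is only declared here to keep the example short, and the proxy class BitRef, the default length of 10, and the packing of bits into unsigned char are assumptions of this sketch.

```cpp
#include <climits>
#include <cstddef>
#include <stdexcept>
#include <vector>

template<class T> class Array;   // generic class template, declared only

template<>
class Array<bool> {
public:
    // Proxy standing in for bool&, since a single bit has no address.
    class BitRef {
    public:
        BitRef(unsigned char& byte, unsigned char mask)
            : byte_(byte), mask_(mask) { }
        BitRef& operator= (bool b)
        {
            if (b) byte_ = static_cast<unsigned char>(byte_ | mask_);
            else   byte_ = static_cast<unsigned char>(byte_ & ~mask_);
            return *this;
        }
        operator bool() const { return (byte_ & mask_) != 0; }
    private:
        unsigned char& byte_;
        unsigned char mask_;
    };

    // One bit per element, rounded up to whole bytes, all initially false.
    explicit Array(std::size_t len = 10)
        : len_(len), bits_((len + CHAR_BIT - 1) / CHAR_BIT) { }

    std::size_t len() const { return len_; }

    BitRef operator[] (std::size_t i)
    {
        check(i);
        return BitRef(bits_[i / CHAR_BIT],
                      static_cast<unsigned char>(1u << (i % CHAR_BIT)));
    }

    bool operator[] (std::size_t i) const
    {
        check(i);
        return ((bits_[i / CHAR_BIT] >> (i % CHAR_BIT)) & 1u) != 0;
    }

private:
    void check(std::size_t i) const
    {
        if (i >= len_)
            throw std::out_of_range("Array<bool> index out of range");
    }

    std::size_t len_;
    std::vector<unsigned char> bits_;
};
```

User code such as ab[5] = true; is unchanged; only the representation behind operator[] differs from the generic version.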

Explicit specializations are often used to take advantage of special properties of the type T and achieve space and/or speed benefits that could not be achieved using the generic class template.

It is normally best to define the explicit specialization (for example, Array<bool>) in the same header that defines the template itself (for example, the same header that defines Array<T>). That way the compiler is guaranteed to see the explicit specialization before any uses of the specialization occur.

FAQ 25.04 What are the syntax and semantics for a function template?

The syntax of a function template is the keyword template, some template parameters, then something that looks a lot like a function. But semantically a function template is not a function: it is a cookie cutter to create a family of functions.

Consider a function that swaps its two integer arguments. Just as with Array in the preceding example, repeating the code for swap() for swapping float, char, string, and so on, will become tedious. A single function template is the solution.
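A sketch of such a function template. It is placed in a hypothetical namespace demo purely to keep it apart from the standard library's std::swap.

```cpp
namespace demo {

// One definition serves int, float, char, string, and any other
// type that can be copied and assigned.
template<class T>
void swap(T& x, T& y)
{
    T temp = x;
    x = y;
    y = temp;
}

}   // namespace demo
```

The call demo::swap(i, j) with two ints instantiates swap<int>; a call with two doubles instantiates swap<double> from the same source text.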

As with class templates, a programmer can get the compiler to bypass the function template when creating a template function: the programmer simply needs to manually create a specialized template function.

FAQ 25.05 Should a template use memcpy() to copy objects of its template argument?

No.

An object should be bitwise copied only when it is known that the class of the object will forever be amenable to bitwise copy. But the class of a template argument can't be known. Here is an example.
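A sketch of an Array<T> whose copy operations go through T's own assignment operator rather than memcpy(). The private helper clone(), the default length, and the StringArray typedef are assumptions made for illustration.

```cpp
#include <cstddef>
#include <string>

template<class T>
class Array {
public:
    explicit Array(std::size_t len = 10) : len_(len), data_(new T[len]()) { }
    Array(const Array<T>& a) : len_(a.len_), data_(clone(a.data_, a.len_)) { }
    ~Array() { delete[] data_; }

    Array<T>& operator= (const Array<T>& a)
    {
        T* fresh = clone(a.data_, a.len_);   // may throw; *this untouched so far
        delete[] data_;
        data_ = fresh;
        len_ = a.len_;
        return *this;
    }

    std::size_t len() const { return len_; }
    T& operator[] (std::size_t i) { return data_[i]; }
    const T& operator[] (std::size_t i) const { return data_[i]; }

private:
    // Element-by-element copy shared by the copy constructor and operator=.
    static T* clone(const T* src, std::size_t n)
    {
        T* fresh = new T[n]();
        try {
            for (std::size_t i = 0; i < n; ++i)
                fresh[i] = src[i];   // T's operator=, never memcpy()
        } catch (...) {
            delete[] fresh;          // do not leak if T's assignment throws
            throw;
        }
        return fresh;
    }

    std::size_t len_;
    T* data_;
};

// A bitwise copy would leave both arrays sharing each string's buffer.
typedef Array<std::string> StringArray;
```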

If a template uses memcpy() to copy some T objects, the template must have a big, fat, juicy comment warning potential users that a class with nontrivial copy semantics might destroy the world. For example, if memcpy() were used in the example class template, and if someone created an Array<string>, it is likely that the memcpy() would create dangling references and/or wild pointers, and they would probably crash the application (see FAQ 32.01).

Finally, notice that the member functions that create T objects (that is, the constructors and the assignment operator) do not have exception specifications (see FAQ 9.04). This is because the T object's constructor may throw arbitrary exceptions, and any restrictions placed on these template member functions would be wrong for some particular type T.

FAQ 25.06 Why does the compiler complain about >> when one template is used inside another?

Maximal munch.

In the following example, a is a list of vector of int (list and vector are standard container classes; see FAQ 28.13).
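The declaration in question can be written as follows (std:: qualification is used here in place of a using-directive, which is a choice of this sketch):

```cpp
#include <list>
#include <vector>

// Note the space between the two > symbols.
std::list< std::vector<int> > a;
```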

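If the declaration had been written without any spaces between the two > symbols, such as list<vector<int>>, a compiler following the rules in force before C++11 would have interpreted the two > symbols as a single right-shift operator. Since C++11, the language treats >> in this position as two closing angle brackets, so the space is no longer required.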

Here are the details. The compiler's tokenizer (something the compiler does to figure out what a program means) has a rule called the maximal munch rule: "Read characters out of the source file until adding one more character causes the current token to stop making sense." For example, the keyword int is one token rather than three separate tokens, i, n, and t. Therefore, if the tokenizer encounters two > symbols together with no whitespace between them, the maximal munch combines them into one token: >>.