
Overview

I was recently asked the question, "How do I clean up primitive types?" The reference to primitive types refers to one of the built-in fundamental types in C/C++ (e.g., char, int, long, double) that are used to construct more elaborate compound types (e.g., arrays, structs, classes, unions). The C++ language also nicely allows this to be easily explored, since the task of reserving/acquiring memory can be separated from the task of constructing an instance of a type. This is the topic explored in this post. One may also be interested in reading Using The C++ Placement New Operator, which uses the implementation of a class such as Boost.Optional via placement new as its example.

It is also helpful to look at the following clause (§3.9.2), as it defines the compound types:

arrays

functions

pointers (e.g., to void, objects, or functions of a given type; to non-static class members)

references

classes (which includes structs)

unions, and,

enumerations.

Storage and Clean Up for the Fundamental Types

When one first learns how to program, one does not normally think about allocating RAM to hold the variables that he/she is creating. The reason is simple: the compiler/interpreter does this for you for all static storage duration (i.e., not thread, dynamic, or automatic storage duration; e.g., global variables) and automatic storage duration (i.e., variables allocated on the function call stack) variables.

When one learns about dynamic memory allocation (e.g., C's malloc, calloc, realloc, and free; C++'s new and delete), one will also learn about sizeof(T). sizeof(T) is a compile-time operator that returns the number of bytes used to represent an object of type T. With static and automatic storage declared variables, the compiler will reserve sizeof(T) bytes of space for each variable. For variables on the function call stack, this is done by adjusting the "stack pointer" by the proper number of bytes within each function/scope; with static storage duration objects this is (usually) done by appropriately reserving those bytes in the generated binary (e.g., in the generated EXE or DLL file). So, in essence, the compiler takes care of setting aside and reclaiming the space used for automatic storage duration variables, and it takes care of setting aside the space for all static storage duration variables. The actual RAM used to hold all such variables is ultimately allocated by the operating system when it loads the program (e.g., the EXE or DLL) at run-time: the operating system acquires the RAM to hold the executable and reserves a certain amount of stack space for that executable.

ASIDE: C++ also supports thread storage and dynamic storage durations. This is specified in Clause 3.7. Thread storage duration might involve dynamic memory allocation managed by the C++ compiler-generated code for such. Dynamic storage duration is typically managed by the programmer explicitly and increasingly through the use of specialized types, e.g., std::shared_ptr and std::unique_ptr.

So instances of fundamental types themselves don't need to be cleaned up. The data stored in instances of these types are simply bits that don't need to be acquired or released, and no "cleaning" operations need to be performed to stop using them or to use them for another purpose. This is because their values do not inherently refer to requested resource(s) that must be released later; thus, one does not need to do anything to "clean" them up or to reuse them. Even with the compound types this is generally true. Note that:

pointer values (on von Neumann architectures) are also represented by integer values so pointer values (on their own) don't need to be "cleaned up", and

non-dynamically allocated fixed-size arrays of fundamental types have a size that is known at compile time (i.e., the size of the array is not stored/used at run-time), so these also, on their own, will not need to be "cleaned up" either.

Thus there is no need to "clean up" fundamental types and many compound types. This is perhaps why, if you are wondering, your computer science/engineering professor never really mentioned this. 🙂

ASIDE: Clearly many pointers do need to be "cleaned up", but it is not the pointer value itself that is cleaned: rather, one "cleans up" a pointer because its value is interpreted as referring to a previously acquired resource that must later be released. A pointer can be set to point anywhere in the computer's memory, and the pointer value itself does not need to be "cleaned up."

High-level programming languages don't typically provide the level of control that can be exploited in C++. Even with all of C++'s constructs and abstractions, one can still control how space is reserved/acquired/released separately from object construction and destruction. When these two operations need to be separated, one normally uses a special form of the new operator called placement new.

If you are new to C++, it is important to first realize that the ISO committee made sure that all types can be treated in a uniform manner syntactically. Specifically, this means that the syntax to default construct, copy construct, move construct, copy assign, move assign, and destruct any type is identical for all types in C++. Further, explicitly using this syntax is always valid whenever the type is represented by a template parameter, regardless of what that type is.

It is meaningful to illustrate that C++, via a template parameter, can even manipulate a fundamental type just like one can do with a user-defined type. Consider this (demonstration-purposes only!) example:

i.e., notice that the destructor can be invoked on the user-defined type as well as on the built-in fundamental type. Clearly there is no "real" destructor for any of the fundamental types, but the C++ language permits the syntax and semantics so such things can be expressed and used with all (complete) types. This is important because it allows one to focus on functionality, not on what type the parameter T actually represents. This facilitates generic programming in C++.

If you are new to C++, know that destructors are actually functions –so they can be directly called but you should never do so unless you are using placement new. Constructors, on the other hand, are defined to not be functions. One implication of this is constructors cannot be directly called. Constructors are invoked either through a variable declaration or by using one of the new operator overloads.

The new operator has a number of overloads that are defined in the language and, like the other operators, can be overloaded. The new operator even allows one to define an overload that permits a user-defined set of function arguments passed to it! The C++ standard defines these forms of the new operator (e.g., see http://en.cppreference.com/w/cpp/memory/new/operator_new):

Four global overloads: two for scalars, two for arrays. One of each of these pairs is a non-throwing version (i.e., it returns nullptr instead of throwing std::bad_alloc).

Placement new overloads: one is defined for scalars, the other for arrays. Both of these accept a void* address argument representing the memory allocated for the object. (NOTE: Programmers can define additional overloads with custom arguments. Such definitions are considered to be placement new forms.)

Four class-based overloads: two for scalars, two for arrays. One of each of these pairs behaves just like the throwing global form and the other has user-defined arguments. (NOTE: All such overloads are considered to be static class members in C++ –even if the static keyword is not used.)

In C++, there are three rules concerning how these are used:

If a non-placement scalar form of new is used to obtain memory, then the delete keyword must be used to release that memory.

If a non-placement array form of new is used to obtain memory, then the delete[] keyword must be used to release that memory.

If a placement form of new is used, then there is no corresponding call to any form of the delete keyword, i.e., it must not be used. Instead, the programmer must manually invoke any destructors and appropriately release any resources as required should that be necessary.

If the third rule above sounds like it can be dangerous, know that it can be! The third rule essentially requires the programmer to completely manage everything manually. This is a good thing: enabling the programmer to control things when needed is what both the C and C++ programming languages are all about!

To understand how placement new and the aforementioned direct invocation of a destructor can be properly and safely used it is best to consider a minimal example:

#include <new>

template <typename T>
class A1
{
  private:
    alignas(T) char data_[sizeof(T)];

  public:
    T& get() { return *reinterpret_cast<T*>(data_); }

    A1() { new (data_) T{}; }   // Use data_'s memory.

    ~A1() { get().~T(); }       // Manual destruction.
};

int main()
{
  A1<int> a;
  a.get() = 34;
}

In the above definition of A1, the instance of T is stored where data_ is in memory. This is why data_ is an array of char having sizeof(T) elements, properly aligned in memory so that it can hold an instance of type T. The actual construction of T's instance is done in the default constructor using one of the placement new forms: data_, which is effectively a char* address, is passed as the void* argument to placement new. Because a memory address is passed to a placement new form, the new operator does not try to acquire any memory to hold the object at all: it assumes the address passed to it is valid to hold the object and it constructs the object at that address. To undo this, the A1 destructor must manually invoke T's destructor. Since the memory was reserved by the class' data_ member declaration itself, no memory actually needs to be released, i.e., nothing further needs to be done.

A1's constructor, however, does not actually need to construct T using placement new. One could defer that construction until the moment (should it ever occur) the object is actually needed. The following program modifies class A1 so that unless the get() member function is invoked, the type stored in data_ is never constructed:

A2 requires an additional bool member to track whether or not the object is constructed.

A2's get() member may throw exceptions from the construction of the object whereas A1's get() member function would have never thrown an exception.

but these trade-offs may be worthwhile, even mandated, under certain conditions. For example, if one needed to store an array of 1,000,000 T objects using A2 instead of A1 then this would mean:

only those instances of T actually accessed would incur run-time costs of construction, and

upon construction of the array only the memory would be acquired –no instances of T would be constructed.

Just how powerful and useful placement new is can be demonstrated with a very practical and simple example. In the code below, the MO ("move-only") type cannot be default-constructed, copy-constructed, or copy-assigned: it can only be move-constructed, move-assigned, and destroyed. Clearly trying to create lvalue scalars or arrays of MO will fail, but std::vector<MO> doesn't fail to compile! It, in fact, has no problem using MO:

// But this works since std::vector
// uses placement new internally...
std::vector<MO> v;
v.reserve(1000000);
v.emplace_back(MO{34});   // MO cannot be default-constructed, so some
                          // constructor argument is required here

Isn't this cool, considering that v reserves space inside std::vector for 1,000,000 MO objects?!

The existence of placement new allows one to separate the two distinct operations of reserving space for an instance of some type T and actually constructing an instance of type T in that space. In the above program, the reservation of space for a million objects did not incur the (here impossible) default construction of those objects. Only two MO objects are constructed: the temporary passed to emplace_back, and the element at index 0 of the std::vector, which is move-constructed from that temporary. This works because std::vector internally uses placement new.

Finally, one should also print out when things are being created, destroyed, etc. This way one can confirm exactly how many instances of a type are being created and destroyed:

Even though 15 array locations are reserved, absolutely no objects are instantiated in TEST 1. When an element is added in TEST 2, you can see the instantiations: the object on the call stack (at 0x7fff54118050) is move-constructed into the newly created object in the array (at 0x604010). So even though there are 14 more locations of RAM reserved for A<int> objects, there is only 1 A<int> object actually constructed in v, and it was constructed only when it needed to be!

Clearly this has obvious benefits for std::vector. One benefit is that elements that are not used don't consume time being initialized. Another benefit is that initialization can occur when it is needed (e.g., in push_back(), emplace_back(), etc.), and those initializations can actually be constructor calls, not assignments.

Closing Words

Hopefully this post helps (i) demystify any misconceptions about new and placement new and (ii) improve your understanding concerning how type instances can be constructed and destructed (independently of how one obtains the memory to hold those types). Placement new is an important part of C++ and it is a vital part of all std::allocator-styled classes. When it is used, it is hidden inside of classes to avoid its misuse and errors that would otherwise likely occur. Containers such as std::vector rely on placement new to give it the abilities it does (e.g., the ability to store move-only objects, objects without a default constructor, etc.) that cannot be done using traditional C-style arrays of the same type.

Overview

A "quine" is a computer program that accepts no input and produces a copy of its own source code [Wikipedia.org]. If one searches for quines on the Internet, one will find all kinds of variations written in various languages. This blog post presents a simple C++11 quine.

Substituting Strings Within Strings

Many short quines one can find on the Internet rely on the ability to substitute strings within other strings. For example, consider the following BASH shell script's use of string substitution to output "Hello World!" [this script is not a quine]:

S="World"
S="Hello $S!"
echo "$S"

Since many languages (e.g., C/C++) are 100% statically typed and don't permit such string substitutions at compile-time, one would not write a quine this way in these languages. This post will present one using C++11 which could easily be ported to other languages with similar restrictions.

The presented code's emphasis is on its simplicity, i.e., the code is meant to be easy-to-read and easy to figure out how it works. Unfortunately the syntax highlighter does not properly highlight the code –so I enabled line numbering and refer to specific line numbers.

With the exception of the program variable, all of the other variables are of type const char* const, i.e., they are compile-time constants. I used raw strings to avoid the additional escaping that would otherwise be needed and to span the string across multiple lines, which makes the code much more readable.

Raw string literals have the form R"delimiter( characters )delimiter", where "delimiter" is a possibly empty token. This means one can copy-and-paste string literals inside of other string literals when they have different delimiters. Without this ability, one would have to use traditional C/C++ string escaping. Except for simple cases, traditional escaping techniques hurt a quine's readability as well as the ease with which one can write it.

Notice that the "data" raw string literal starting on line 4 with R"data( ends on line 19 with )data". Clearly, the use of the raw string enabled me to copy-and-paste a large part of the program source code into the "data" raw string literal. This makes coding a quine so much easier! (I kept the rest of the string literals as traditional C/C++ strings since they are short and easy-to-read.)

The quine works by computing the program variable's value (i.e., lines 27 and 28) at run-time and outputting it (i.e., line 33). Within the source code, the entire program effectively appears twice, however, in the executable binary the source code effectively appears once (broken up) since the necessary concatenations reconstructing the original program will occur at run-time.

The reason this program can output itself is due to how it is broken up: no variable tries to quote itself. Instead the program is broken up across a number of variables (most of which are compile-time constant literals) and a final variable, i.e., program, is used to generate the program so it can be output.

Closing Comments

The presented code could be made more interesting if it output a program that computed the next iteration of something (if possible) or itself (if not possible). For example, one could have a literal array of std::size_t holding the first numbers in the Fibonacci sequence:

std::size_t const fib[] = { 1, 1, 2, 3, 5 };

and upon running the program, it computes the next number in the sequence. If the next number overflows std::size_t, then the program would output itself; otherwise, the program would output itself with the fib array updated to have one more element in it.

Another variation would involve having a program actually output new versions of itself, compile the new code, and then exec() the new executable. Obviously care has to be taken to ensure that the program terminates and doesn't use up all of the CPU.

An easy "back-door" debugging hack to help ensure that the program-that-runs-another-program shuts down when desired during debugging would be to check for the existence of a special file (and, if it exists, terminate instead of calling exec()).

Finally, to keep the generated program code short, consider not making such self-bootstrapping programs quines (or quine-capable) at all: instead, write and use a statically or dynamically linked library of code common to all generated programs.

Overview

It has been a long while since my last post! All of my previous posts were written at a time when GCC and Clang had various issues with C++11 and did not have various portions of C++11 implemented. Very nicely, GCC and Clang now fully implement C++11! Despite the hiatus on this blog, I have been very busy writing C++ code, including a lot of C++ template metaprogramming code. Among these efforts have been updates to the apply_tuple() function. This post describes the (final) updates to apply_tuple(), freely using appropriate C++11 features as needed.

Deleting Old Stuff

When I started messing around with coding apply_tuple() in 2011 (see the Articles By Topic link):

the C++11 standard had only been recently approved,

the C++ compilers were beginning to implement C++11,

C++11 features could not necessarily be used as they triggered compiler bugs or they were not yet implemented, and

everyone who wrote code in C++98/C++03 was starting to explore how to do things in new and better ways in C++11.

As a result, the older apply_tuple() was written using older C++98 techniques. The last vestiges of these appeared in Part III of my last apply_tuple() blog post, and I am happy to delete them from the code. Specifically, the following are completely deleted relative to the previous blog post:

struct make_seq_indices_impl

return_type type alias

apply_tuple_return_type_impl()

apply_tuple_return_type()

tuple_size(), since it is not needed for what amounts to one line of code

C++11 supports the decltype keyword to determine the type of an expression, which allows one to delete return_type, apply_tuple_return_type_impl(), and apply_tuple_return_type(). Additionally, SFINAE can be done with template parameters, which allows the deletion of struct make_seq_indices_impl. Finally, since tuple_size() is only one line of code, it was removed and replaced with a template parameter whose default value computes the size of the tuple.

Rather than explain everything relative to the last version which forces one to read all of the previous blog posts, I will explain the solution from start to finish here. This is the final version of apply_tuple(). If you are interested in the earlier versions then look at my older posts –although after reading this, you might wonder along with me if I should just delete the older posts!

If you still need to write C++98 code, then the older posts are useful. Except for variadic template parameters, everything can easily be done using C++98 features. Variadic parameters can be handled in C++98 by hard-coding definitions having 1, 2, 3, etc. parameters as needed. Uses of constexpr (e.g., tuple_size) can be written as C++ template metaprogramming (TMP) structs instead, and definitions such as std::decay can be obtained from Boost or TR1 library code.

The discussion of the final apply_tuple() solution will start at the top of the source code file and work its way to the end. Simply concatenate all of the code for the complete working program.

The Required #include Files

This solution requires the use of std::tuple-related items, std::size_t, std::enable_if, and perfect forwarding; therefore, these #include files are needed:

#include <cstddef>
#include <tuple>
#include <type_traits>
#include <utility>

Generating the Tuple Indices

C++'s std::tuple requires using std::get<INDEX>(tuple_var) to access the tuple's elements. Unfortunately for newcomers to C++, INDEX must be a compile-time constant std::size_t, i.e., a value permitted to be used as a template argument. This means one cannot use a traditional for-loop to iterate through all elements to process them. While one could write a recursive function template to do this, that is too much work. It is better to create a list of std::size_t indices inside a template parameter list, which can then be expanded as a variadic template parameter pack. In essence, if the tuple has 5 elements, one wants to obtain these indices:

indices<0,1,2,3,4>

These indices can be generated in a straight-forward manner. First one needs to tell the C++ compiler that indices is an empty struct that has zero or more std::size_t valued template parameters:

template <std::size_t...> struct indices {};

The purpose of indices is only to hold the index values needed to be used with:

get<INDEX>(tuple_var)

The actual generating of these values is done in a C++ TMP metafunction called make_seq_indices.

If you are familiar with what was discussed in previous posts concerning make_seq_indices, know that the solution below was rewritten to be simpler and safer (since it checks that Begin <= End).

The definition of make_seq_indices is in several parts. It is defined using a struct that is partially specialized over its template arguments. The partial specializations cover two conditions: Begin < End (the recursive case) and Begin == End (the base case).

The general case requires Begin and End to be set; it defaults Indices to an empty list; and the final parameter is needed so std::enable_if can select the appropriate partial specialization based on the values of Begin and End. The definition of the general case of make_seq_indices is:

Unlike the tuple solution below, nothing special needs to be done to pass the arguments in this version to op since the Args parameter pack is simply expanded inside the call to op:

std::forward<Args>(args)...

Applying an Operation to Each and Every std::tuple Element

Unlike the apply() function above, the apply_tuple() function receives a single std::tuple argument, but it needs to retrieve each element of that single argument. Since the single std::tuple function argument is not, and cannot be, a template parameter pack, one needs a way of obtaining a template parameter pack containing the indices required to invoke std::get() on each tuple element. Doing so enables one to expand the template parameter pack over the indices, which gives the desired result:

But this is not the function the client will invoke, since one wants the indices to be generated automatically. Generating indices was the purpose of the make_seq_indices and make_seq_indices_T metafunctions! Thus, the definition of the apply_tuple() the client will directly invoke is:

Notice that there is an extra template parameter, Indices, with a default value set. The caller of this function will never set its value. This is a handy way of creating a template parameter "variable" to hold a compile-time computed value in a function template's parameter list. In this particular instance, Indices is used twice, and by storing it in a template parameter, one does not have to copy-and-paste the code that computes it twice.

Earlier code went through some extraordinary tricks to obtain the return type of the call op(args...). With C++11's decltype, that return type can be obtained by writing decltype(op(args...)), provided the new function declarator syntax is used where the return type appears after the function arguments. This code is easier to read and understand. Unfortunately, one does have to duplicate the expression in the function's return type and in the function's return statement.

Closing Comments

This solution is shorter and much easier to maintain than the previous solutions in my earlier blog posts. This is understandable given the amount of time that has elapsed since I first started and the evolution of C++ compilers from C++98 support to fully supporting C++11. I have been doing a lot of C++11 coding (normal and C++ TMP) within the last year, have been able to explore all of the new C++11 language features, and have been learning new and better ways to write code that is more easily maintained and more efficient (than C++98, e.g., by fully exploiting move semantics). One of the tasks I have been doing, now that GCC and Clang fully support C++11, is revisiting and updating old code to better use C++11's features. This post clearly shows doing so can be well worth the effort!

I will again resume blogging concerning C++11 topics and other cool things in the fall. Between now and then, I will be posting some simpler coding-related posts as fun things to try out! 🙂

Overview

Reading the article Choosing Between References and Pointers in C++ and its comments has led me to write this article in response. Briefly stated: (i) references are not pointers, and (ii) I wouldn't use that article to form opinions or learn about C++'s pointers and/or references.

UPDATE (Aug. 19, 2012): I changed "know" in some places to "assume" concerning the compiler (related to Mikko Sysikaski's comment and my reply). I would also like to add, in my opinion only: when concepts are finally added to C++, code written using references may well become preferred over some uses of pointers, if concepts permit the definition of type properties akin to group theory (e.g., think of associativity, transitivity, distributivity, etc.). If such occurs, compilers should then be able to powerfully shuffle expressions around to transform and optimize them, and, since the compiler is not mandated to use storage for a reference, it can forgo allocating storage space for references in the resulting transformed expressions should they not need to be stored.

Declarations

Reference Declarations

The C++11 standard states this about references:

It is unspecified whether or not a reference requires storage. (§8.3.2; para. 4)

This implies the C++ compiler does not need to represent a reference as a pointer internally, i.e., the compiler has total freedom to choose a representation it wants. That said, if the programmer explicitly takes the address of a reference, then it is likely the compiler will have no choice but to represent it internally using a pointer. If one does not use the address-of operator explicitly, then the compiler:

knows the reference is an alias to a specific value,

knows whether the value is an instance variable, a constant value/literal, or an lvalue or rvalue, or it can internally flag the reference if the referent is some weird pointer-to-memory-address hack or complicated expression (i.e., treat it internally as a pointer); and,

it can optimize code using that knowledge as it sees fit, since it need not even allocate storage for the reference.

It is therefore foolish to assume anything about a reference other than that it refers to a valid specific value. It is reasonable to assume that if one "converts" the reference to a pointer (i.e., by applying the address-of operator to its referenced value), then one is likely inducing the compiler to implement the reference as a pointer, and perhaps also forcing the compiler to declare a temporary variable to hold the specific value. If such occurs, then this will likely have an effect on the compiler's ability to optimize the code.

On the other hand, if one avoids the address-of operator, then one is allowing the compiler to retain the reference as a reference and to use whatever forms the compiler decides it should be. So if the reference is to a constant literal value, it can keep it as such when/where required when optimizing the code.

These restrictions:

There shall be no references to references, no arrays of references, and no pointers to references. […] (§8.3.2; para. 5)

combined with the fact that what a reference refers to is assumed to be valid, allow an optimizing compiler to more intelligently optimize code. When C++ adds concepts, this should prove to be especially powerful for optimization engines.

Pointer Declarations

Unlike references, pointers can easily refer to any memory address, be it valid or invalid. References cannot:

[…] A reference shall be initialized to refer to a valid object or function. [Note: in particular, a null reference cannot exist in a well-defined program […] [§8.3.2; para. 5]

Thus the code in this comment written by AHelps to the aforementioned article violates the C++ standard:

References can be used to refer to NULL. It looks like this:

int &x = *((int *) NULL);

You should never do this, but I have it on good authority that there exist large commercial code bases that do this. It IS well defined behavior, as references are exactly the same as pointers

This is not okay. It is also not well-defined behaviour. If such is used in a program, the C++ program is not well-defined according to the standard. Further the C++ standard does not say anywhere that references are exactly the same as pointers: they are not!

Since pointers can point to anywhere including invalid memory locations, the compiler knows less information concerning the pointer and its referent than it does with a reference since a reference's referent is at least valid.

Further complicating the compiler's ability to handle pointers are the aliasing problems that arise since there can be pointers to pointers, and frankly, pointers can point to anything except references and bit-fields [§8.3.1, para. 4]. This implies that the compiler might even have to "follow" pointer chains to perform some types of optimizations, provided it knows what those pointers point to. Of course, in general, the compiler doesn't even know if a pointer points to a valid value, so "following" pointer chains is less useful than it might seem. With references, the compiler assumes (i.e., it is assumed from the creation of the reference) that what it refers to is valid. Most code using references sets the referent to literal values or variables, not to some expression involving memory address tricks, and this implies that the compiler can likely use the reference information in the optimization process, since it typically knows which variables, arguments, and/or literals the reference refers to.

Closing Comments

Unless you are doing dynamic memory allocation, exchanging pointers with external libraries / other programming languages, or truly need to do pointer arithmetic, use references instead of pointers. In doing so you are permitting the compiler to make decisions on how it represents the reference with the added benefit that the compiler might even be able to avoid allocating any storage to implement the reference. The same cannot be done for pointers.

Overview

I was wondering whether lambdas together with move semantics or any other new feature can do as good as ETs. Any thoughts?

So I answered the question: no, expression templates are still needed! This blog post is a rewritten version of my response on StackOverflow.

C++11 Move Operations Are Not Free

If you are new to C++11, think of move semantics as optimized copy operations. Never think of them as operations with zero overhead because they don't have zero overhead. In the worst case, moving something has the same complexity as copying: i.e., O(moving data) = O(copying data). In the best case (for classes/structs with data members), the cost of moving is at least the cost of copying pointers to internal data structures: i.e., Ω(moving data) = Ω(copying pointers to internal data structures). In no case, except when the class/struct has no data members, will the cost of moving something be zero: i.e., Ω(moving data) ≠ 0 just as Ω(copying data) ≠ 0 under the same circumstances.
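A quick sketch illustrating the difference (the type names here are illustrative): moving a struct whose storage is a fixed-size std::array still copies every element, i.e., O(n), whereas moving a std::vector-backed struct merely transfers its internal pointers, i.e., O(1).

```cpp
#include <array>
#include <cassert>
#include <utility>
#include <vector>

// All storage is a direct data member: a "move" still copies all 3 doubles.
struct fixed_vec3 {
  std::array<double, 3> data;
};

// Storage is heap-allocated: a move only steals the internal pointers.
struct dyn_vec {
  std::vector<double> data;
};
```

Moving a `fixed_vec3` leaves the source's elements intact (each double is simply copied), while moving a `dyn_vec` transfers ownership of the heap buffer.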

If it ever appears that the cost of moving or copying data in C++ is zero, it is because the copy elision optimization or the return value optimization (RVO) is being performed by the C++ compiler –or, in a very niche set of cases, your classes/structs have no member variables.

The example presented below, and in the StackOverflow answer I made, shows why the costs of moves need to be considered. In the example, a math_vector class uses a statically-sized std::array as its internal representation. Thus, the cost of moving a math_vector object is the same as copying it, and so avoiding any extra copies or moves involving temporaries is of utmost importance. However, I don't stop there, as the next question one would likely ask is: how does one get rid of the extra copies and/or moves? The method used to accomplish this is the C++ template metaprogramming technique called expression templates.

The Objective and the Problem

When code is written, the goal usually sought is maximum efficiency while keeping the code understandable and maintainable by humans. Object-oriented and object-based programming can help achieve this goal as it groups the operations that can be performed on types together. This leads to code that looks like:

i.e., CASE 1 requires four explicitly constructed instances using initializer lists (i.e., the "initlist" items), the "result" variable (i.e., `0x7fff8d6eddd0`), and an additional three objects for copying and moving; CASE 2 is better, only requiring three extra objects for moving.

But one can do better: zero extra temporaries should be possible, with one caveat: all explicitly instantiated objects would still be created (i.e., the four "initlist" constructors and "result"). This will be achieved using the expression templates technique by creating:

a proxy math_vector_expr<LeftExpr,BinaryOp,RightExpr> class to hold an expression not yet computed,

a proxy plus_op class to hold the addition operation not yet computed,

a constructor added to math_vector to accept a math_vector_expr object,

"starter" member functions in math_vector to trigger the creation of the expression template (e.g., operator + and operator +=), and,

"end" member functions in math_vector to force the computation to occur (e.g., by assigning a math_vector_expr to a math_vector and by passing a math_vector_expr to a math_vector constructor).

Key is realizing that care was taken to ensure that all function arguments are perfectly forwarded and all return values are moved, both within the proxy objects and within all code invoked from their use. Additionally, code was written to ensure that copies would not needlessly be made, by writing overloads for rvalue references –not just constant references– and by moving returned values when they are lvalues referring to rvalue references.

which appears between the initialization of the four "initlist" variables and the output of the "result" data. The reason this is an "even better" result is that absolutely no function calls occur here: only CPU operations related to loading and adding floating-point numbers. This is what one would do when hand-optimizing the code, e.g., unrolling loops, etc.

Expression Templates

Before showing the code below, a short bit concerning expression templates should be provided. Expression templates are a form of template metaprogramming. Template metaprogramming is a style of programming using C++ templates in which the compiler computes, at compile-time, the code to generate. This can be done because the C++ template mechanism is itself Turing-complete. Unlike "normal" programming, it involves computing with types and/or compile-time constants.

Aside: Overall, template metaprogramming is not easy, so if you are new to C++, wait until you are at least an intermediate-skilled C++ programmer before jumping into the C++ metaprogramming world, as you will need to understand key subtle points of the language. Many of my other C++ posts on this site use template metaprogramming if you want to see some of the things that can be done and some of what it entails.

The expression template technique involves building parse trees on recognized expressions using user-defined types. Each type in these parse trees is a proxy object representing the values of what is to be computed at a later point in time. Thus, each proxy object stores its arguments and any operations to be performed without actually performing the operation until such absolutely needs to be done or otherwise desired.

In the code below, this is exactly what the math_vector_expr class does. Notice that it stores the "left expression" and the "right expression" for some "binary operator". Both the left and right expressions are stored, but the binary operation is not, since the plus_op class does not have any state in this example. If it did, then the binary operation would also have been stored.

The actual construction of the parse tree is triggered from the operator + code in the math_vector class since it returns a math_vector_expr. Notice that the math_vector_expr class doesn't ever compute anything –it merely stores its arguments and returns another math_vector_expr. Key here is to have all math_vector_expr instances be rvalues or at least constant references to exploit compiler optimizations and avoid generating code with run-time overhead. Finally, notice that only by passing a math_vector_expr to the math_vector constructor or by assigning one to a math_vector will the math_vector_expr finally be asked to compute the values belonging to each vector element. When coupled with C++11 compiler optimizations, the end result is the elision of all compiler temporaries and total code efficiency. It is very cool!
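To make the above concrete, here is a deliberately minimal sketch of the technique (it omits the perfect forwarding, rvalue overloads, and operator += machinery of the full listing below, and fixes the element type to double; treat it as illustrative, not as the article's code):

```cpp
#include <array>
#include <cassert>
#include <cstddef>

// The stateless binary operation proxy: it only knows *how* to combine
// two elements, when eventually asked.
struct plus_op {
  static double apply(double a, double b) { return a + b; }
};

// Proxy holding an uncomputed expression: element i is computed on demand,
// so no temporary vector is ever materialized.
template <typename LeftExpr, typename BinaryOp, typename RightExpr>
struct math_vector_expr {
  LeftExpr const& l;
  RightExpr const& r;
  double operator[](std::size_t i) const { return BinaryOp::apply(l[i], r[i]); }
};

template <std::size_t N>
struct math_vector {
  std::array<double, N> v;
  double operator[](std::size_t i) const { return v[i]; }
  // Assigning an expression is the "end" operation that forces evaluation.
  template <typename E>
  math_vector& operator=(E const& e) {
    for (std::size_t i = 0; i < N; ++i) v[i] = e[i];
    return *this;
  }
};

// The "starter": operator + builds a parse-tree node instead of a result.
template <std::size_t N>
math_vector_expr<math_vector<N>, plus_op, math_vector<N>>
operator+(math_vector<N> const& a, math_vector<N> const& b) {
  return {a, b};
}
```

With this sketch, `c = a + b;` performs exactly one element-wise loop and creates no temporary vectors.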

C++ Code

The following C++ code is the example code used to produce all of the output above. Be sure to compile the code with at least level one optimizations. This code will work with both clang++ v3.1 and g++ v4.8 (and likely v4.7). I compiled the code using these options: -std=c++11 -O3.

If you want to build the code so that it does not use expression templates, then #define DONT_USE_EXPR_TEMPL. If not, then ensure the macro is not defined.

Notice that the indices are passed as an unused third argument for the function template to infer Indices. The C++ compiler will (or should!) eliminate the overhead of passing such a parameter.

Closing Comments

The above version is terse, short, efficient, and substantially easier to understand than any of the previous versions. (My apologies for missing (oops!) the above easy simplifications, which were pointed out in a comment to my previously posted article.)

The technique used by Bryan, mentioned earlier, conceptually does the same thing. The difference is in how Bryan's solution works: it recursively unrolls all template arguments and then applies them to the function, whereas the solution I've presented expands all template arguments at once (i.e., without using recursion to unroll them).

Overview

If you've been following this article series, this is the article you've been waiting to read! It is the one where I show the final versions of apply() and apply_tuple(). These versions have the following features:

perfect forwarding of all tuple elements to the corresponding arguments of a function (as with all previous versions), and,

rvalue template argument deduction is used to reduce the previous solution's apply_tuple() and apply_tuple_impl::apply_tuple() total of eight (8) functions to four (4).

This article will present the full solution while briefly discussing the elements that make it work.

Update (Feb. 15, 2012): After posting this article, I realized I missed a few (obvious!) simplifications to the code which make it substantially easier to read and understand. The main change was to make apply_tuple() use an easy-to-use template alias as a compile-time type function guard for std::enable_if. A non-cosmetic change was made as well: the apply_tuple_return_type_impl() prototype was modified to use std::get to remove any possibility of type mismatches (e.g., differences with const/volatile qualifiers) within the code. I have replaced the previous code and updated the text only in places that make specific code references that would otherwise no longer match the code.

Supporting Code

Required #include Files

Computing The Indices

In order to extract the elements of a std::tuple, the tuple index is needed as a template parameter, and, to extract all elements of a tuple in a single parameter expansion, a complete set of tuple parameter indices is needed to index the provided tuple. This solution stored the indices using the following type:

template <std::size_t...> struct indices;

which has no definition since there will be no (run-time) instances of indices created.

Generating The Indices

Previously, the indices were generated in a slightly different way. In order to make it easier to reduce the number of apply_tuple()-related functions from eight (8) to four (4), the method of generating the indices was changed. Now, to generate the indices, one uses make_seq_indices<BEGIN,END>, where BEGIN is the starting index (i.e., 0) and END is one past the last index. Thus, the definition of make_seq_indices and its supporting code is:
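The supporting code is not reproduced inline here, but one possible definition consistent with the description (an assumed reconstruction, not necessarily the article's exact code) is a recursive template that grows the index pack from BEGIN until it reaches END:

```cpp
#include <cstddef>
#include <type_traits>

// The indices holder from the text: declared only, never instantiated.
template <std::size_t...> struct indices;

// Grow the pack one index at a time: impl<B,E, I...> appends B and recurses
// with B+1 until B == E, at which point the accumulated pack is the result.
template <std::size_t B, std::size_t E, std::size_t... I>
struct make_seq_indices_impl {
  using type = typename make_seq_indices_impl<B + 1, E, I..., B>::type;
};

template <std::size_t E, std::size_t... I>
struct make_seq_indices_impl<E, E, I...> {
  using type = indices<I...>;
};

template <std::size_t B, std::size_t E>
using make_seq_indices = typename make_seq_indices_impl<B, E>::type;
```

For example, `make_seq_indices<0,3>` is the type `indices<0,1,2>`.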

Indirectly Determining the Return Type

The first argument to apply() and apply_tuple() is a function or a functor. For the code to work, one needs to be able to easily determine the return type without directly using template template parameters within apply_tuple(). I came up with a clever way to do this using only function prototypes:

apply_tuple_return_type accepts forwarded types and passes them to apply_tuple_return_type_impl for template argument deduction so that the function's/functor's return type can be determined.

The trick here is to:

use function prototypes to help ensure run-time code for these functions is never generated,

use function prototypes to trigger template argument deduction,

use the auto return type to permit the use of std::result_of to find the return type of Op(OpArgs...), and,

use the auto return type to allow decltype to be used with the function's arguments applied to apply_tuple_return_type_impl, enabling the return type to be computed.

A consequence of the above is that it allows any client code to determine the return type of a function/functor without directly using template template parameters. For example, later in this article apply_tuple_return_type_impl is used like this:

Since template template parameters are not (directly) used to determine the types that will be passed to the function/functor, there is no need to write both lvalue and rvalue overloaded functions to handle the tuple parameter. Since the above definition also uses the same function, std::get, to determine the tuple element type as apply_tuple() does, no type mismatches will occur as a result of using this prototype with the apply_tuple() functions.

Aside: Ensuring that template template parameters need not be directly used, along with another item mentioned later, was key to reducing the number of functions required from eight (8) to four (4) for apply_tuple() and apply_tuple_impl::apply_tuple().

Helper Functions For Readability

Without the following definitions, the presented code is harder to read and understand:

Two versions of apply_tuple() are needed: one for operations returning void and one for those returning a type. The use of enable_if allows the compiler to select the correct function based on Op.

std::decay is used to strip Tuple down to its base type so that the tuple's size can be determined.

With the size, make_seq_indices is invoked to generate the indices.

decltype is used to determine the operation's return type.

The purpose of apply_tuple() is to forward everything to apply_tuple_impl::apply_tuple(). This is needed since the indices must be a parameter pack. Unlike previous versions, the indices and the return type are computed without using a template template type within apply_tuple(). With apply_tuple() setting things up, the definition of apply_tuple_impl::apply_tuple() is straight-forward. The reader will note the code is identical to the previous code since it still invokes std::get<Indices>(std::forward<T>(t)). This is what has changed:

The template arguments do not make use of any template template parameters. This allows rvalue template argument deduction and reference collapsing to take place. Consequently, this allows one to write one function for the two cases: the void return type and non-void return type.

What allowed Item 1 to be done was realizing that one wants to use decltype within the std::forward template parameter to determine the type of each element, instead of using the tuple's template argument parameter pack. While this duplicates the get code, it permits the parameter pack expansion to work identically without explicitly using any template template parameters. It also avoids any possibility of a mismatch between what is passed to the function and what is provided to std::forward.

The choice of not directly using template template parameters is what enables this solution to eliminate four (4) function definitions (i.e., the lvalue overloads). As a bonus, the solution's type safety is now robust since std::forward's parameter type exactly matches what is provided to the function. This avoids the possibility of compiler "error novels" should some of the types not match up.
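For orientation, here is a compressed sketch of the overall shape of such a solution (free functions rather than the apply_tuple_impl class used in the real listing, an assumed make_seq_indices definition, and with the void/non-void split elided since a decltype trailing return type happens to handle both; treat it as illustrative, not as the article's code):

```cpp
#include <cassert>
#include <cstddef>
#include <tuple>
#include <type_traits>
#include <utility>

template <std::size_t...> struct indices;  // declared only, never instantiated

// Assumed index generator: builds indices<0, 1, ..., E-1>.
template <std::size_t B, std::size_t E, std::size_t... I>
struct make_seq_indices_impl {
  using type = typename make_seq_indices_impl<B + 1, E, I..., B>::type;
};
template <std::size_t E, std::size_t... I>
struct make_seq_indices_impl<E, E, I...> { using type = indices<I...>; };
template <std::size_t B, std::size_t E>
using make_seq_indices = typename make_seq_indices_impl<B, E>::type;

// Tuple is an exactly-as-passed T&& parameter, so one overload accepts both
// lvalue and rvalue tuples via reference collapsing. The indices arrive as
// an unused pointer parameter, purely so Indices... can be deduced.
template <typename Op, typename Tuple, std::size_t... Indices>
auto apply_tuple_sketch(Op&& op, Tuple&& t, indices<Indices...> const*)
    -> decltype(std::forward<Op>(op)(std::get<Indices>(std::forward<Tuple>(t))...)) {
  return std::forward<Op>(op)(std::get<Indices>(std::forward<Tuple>(t))...);
}

template <typename Op, typename Tuple,
          typename I = make_seq_indices<
              0, std::tuple_size<typename std::decay<Tuple>::type>::value>>
auto apply_tuple_sketch(Op&& op, Tuple&& t)
    -> decltype(apply_tuple_sketch(std::forward<Op>(op), std::forward<Tuple>(t),
                                   static_cast<I const*>(nullptr))) {
  return apply_tuple_sketch(std::forward<Op>(op), std::forward<Tuple>(t),
                            static_cast<I const*>(nullptr));
}
```

Both lvalue tuples and temporaries work with this single pair of functions, which is exactly the saving the article describes.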

Closing Comments

Overall, this solution is not pretty, but it is straight-forward, maintainable (especially if some comments are added), and, best of all, its use by end-users is almost effortless. The end-user need never see the above, as he/she would simply write code like this:

double add(int a, double b) { return a + b; }

// ...
std::tuple<int,double> some_tuple; // ...

apply(add, 2, 34.2); // or
apply_tuple(add, some_tuple);

Such code can be easily written by newcomers and experienced C++ programmers alike!

Overview

This article updates my previous article on writing an efficient implementation of applying a tuple to a function using perfect forwarding.

This coding endeavour and article series started last year when fellow student Bryan St. Amour wrote an initial version, and we've been hacking on and discussing this (and other metaprogramming topics) off-and-on ever since with our own separate code solutions:

his implementation unrolls the tuple to be applied using function calls, and,

my version relies on expanding the arguments pack within a single function call.

Bryan's solution syntactically resembles a functional programming style, e.g., his latest version uses template aliases as type functions and constexpr functions to perform the "magic" in a terse way. My approach has been to generate the function call without "recursively" unrolling the tuple elements using a function, i.e., to unroll all tuple elements in a single parameter pack expansion. Both of our solutions perform exactly the same operation from the end-user's perspective and are efficient, with various pros and cons to each version –which will be the topic of a future article.

Aside: Bryan has an excellent update to his code and tells me he will be blogging about it soon. When he does I will delete this comment and link directly to it in the above paragraph.

Bryan and I agree we have worked out the details of what is necessary to extract a std::tuple's elements and efficiently pass them to a function call: we have been (and are) polishing our code and better exploiting C++11. Hence, we are in "The Home Stretch".

To avoid having far too much in one article, I will split up The Home Stretch article over a series of blog posts. The solutions presented so far:

Handling void Functions/Functors

A small oversight in the previously presented code was that it could not handle functions and functors that return void. Fortunately this is easy to fix by using std::enable_if found in <type_traits>.

How std::enable_if Works

Details concerning std::enable_if can be found in the Boost Enable If library documentation, which explains how one can use std::enable_if in practice with class and function templates.

NOTE: This article will only explore one aspect of its use as it pertains to the presented code.

A C++11 definition of std::enable_if could be:

template <bool Cond, class T = void> struct enable_if { typedef T type; };

template <class T> struct enable_if<false, T> {};

which makes it a compile-time C++ template metaprogramming function where:

if Cond is true, then std::enable_if<true,T>::type is defined and is a valid type, otherwise,

if Cond is false, then std::enable_if<false,T>::type is not defined and is, therefore, an invalid type.

It is important to recognize that templates are patterns: when template arguments are substituted into a template, the result may or may not contain valid code and/or type constructs. Equally important is recognizing, in the case of function templates, that the C++ compiler will look for the best match of a given function when it is overloaded. Together, these permit std::enable_if to be used without triggering a compile-time error and to properly select the code for a function that returns void versus one that does not. For example, consider:
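A reconstructed sketch of the two example overloads (the original listing is not reproduced inline here, so the exact code is assumed; the last_selected variable is purely illustrative, added so the compiler's choice can be observed):

```cpp
#include <cassert>
#include <type_traits>

int last_selected = 0;  // illustrative only: records which overload ran

// First example function: enabled only when T is exactly void*.
template <typename T>
typename std::enable_if<std::is_same<T, void*>::value, void>::type
example(T) { last_selected = 1; }

// Second example function: enabled only when T is anything else.
template <typename T>
typename std::enable_if<!std::is_same<T, void*>::value, void>::type
example(T) { last_selected = 2; }
```

For any given T, exactly one of the two return types is valid, so exactly one overload participates in overload resolution.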

if the type of the argument is void*, then std::enable_if<Cond,T>::type in the first example function will be void, since std::is_same<T,void*>::value is true, but it is not defined in the second example function; or,

if the type of the argument is not void*, then std::enable_if<Cond,T>::type is not defined in the first example function but it is defined in the second.

Since one of the definitions always results in an invalid return type, the compiler discards the function template with the invalid return type (which would be a compile-time error if there weren't another function that could be selected) in favour of the one with a valid return type. Thus, given any type T, there exists an unambiguous definition of example for the compiler to use. 🙂

Aside: C and C++ compilers select the function to call based only on its name and the number and types of its arguments. The return type of a function is never used to select which function to call. That has not changed here: std::enable_if simply permits one to select a function template based on one or more of its template parameters.

Using std::enable_if With apply_tuple

It is now straight-forward to use std::enable_if with apply_tuple to handle void functions as follows:

given a void function, apply_tuple cannot be defined to return the result of calling apply_tuple_impl::apply_tuple, and,

given a non-void function, apply_tuple must return what apply_tuple_impl::apply_tuple returns.

Except for the addition of std::enable_if, the previous code's definitions of apply_tuple and apply_tuple_impl::apply_tuple represent the latter. Thus, like example(T) above, one only needs to write versions of apply_tuple and apply_tuple_impl::apply_tuple when a void function is passed as an argument. Before presenting the solution, let's define two C++11 template aliases to determine the return type using std::enable_if so the code is more readable and maintainable:
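The aliases themselves are not reproduced inline here; one possible pair of definitions, with illustrative (assumed) names, is:

```cpp
#include <type_traits>

// Alias that is void only when Op(Args...) returns void...
template <typename Op, typename... Args>
using enable_if_void_return = typename std::enable_if<
    std::is_void<typename std::result_of<Op(Args...)>::type>::value>::type;

// ...and an alias yielding Op's return type only when it is not void.
template <typename Op, typename... Args>
using enable_if_nonvoid_return = typename std::enable_if<
    !std::is_void<typename std::result_of<Op(Args...)>::type>::value,
    typename std::result_of<Op(Args...)>::type>::type;
```

Using these as the return types of the apply_tuple overloads keeps the std::enable_if plumbing out of the function signatures.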

Closing Comments

Although the above code now supports void functions, the number of apply-related functions has doubled even though the code within each function is essentially the same. Fortunately, it is possible to exploit C++11's ability to use rvalue template argument deduction to accept lvalues or rvalues (by collapsing references). This feature only works when the function argument is a simple template type (which it is not here); fortunately, with some cleverness, it can still be exploited, and that will be the focus of the next article. The resulting solution will eliminate one half of the functions above (i.e., those taking constant references will be eliminated). 🙂

Overview

In A Template Argument Extender, C++ template metaprogramming was used to repeat a single template parameter type N times. This article outlines how such can be enhanced to repeat a set of template parameter types N times.

Defining The Problem

The previous article outlined how to use C++ template metaprogramming (C++ TMP) to create a template class having a single type repeated N times as a parameter. For example, the type:

one was out of luck with my previous article. But, as you might have guessed, it is possible to easily write C++ TMP code to handle such with a little cleverness! 🙂

Defining A Placeholder Type

One of the weird, clever things one does when writing C++ TMP code is creating types that are never used at run-time. These types are simply placeholders so that the C++ TMP code transformations can be made to work. In this article, a placeholder type is defined and used purely for compile-time code transformations. The purpose of the placeholder is to represent a type list for a set of template parameters:

template <typename... T> struct tl {};

The reason one needs such a type is because template parameter packs:

are not types (e.g., one cannot hold a template parameter pack in a typedef),

cannot have their parameters directly manipulated, and,

can only be expanded (e.g., using ...) or queried for their size (e.g., using sizeof...).

However, when using a placeholder type, template parameter packs:

can be held as the template parameters of the placeholder types,

can be passed around and stored in typedefs via the placeholder type since the placeholder type is a type,

can be (indirectly) queried/manipulated using template template parameters and partial template specialization,

can even have a concrete run-time implementation if needed.

Thus, by using a placeholder type, one can circumvent the limitations of template parameter packs. Some of these benefits of handling template parameter packs by using a placeholder type will be apparent in the code below.
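For example, a small helper (push_front is an assumed, illustrative name, not from the article) shows how a pack stored inside tl can be recovered and manipulated with partial template specialization:

```cpp
#include <type_traits>

template <typename... T> struct tl {};  // the placeholder from the article

// push_front extracts the pack Ts... from tl<Ts...> via partial
// specialization and rebuilds a new list with T prepended.
template <typename T, typename List> struct push_front;

template <typename T, typename... Ts>
struct push_front<T, tl<Ts...>> {
  using type = tl<T, Ts...>;
};

// Because tl<...> is an ordinary type, a pack can be held in a typedef,
// something a bare parameter pack cannot do:
using two_types = tl<int, double>;
```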

Defining A Template Alias

Although one cannot have template typedefs in C++, the C++11 standard introduced template aliases using the using keyword. The definition of the above n_tuple as a template alias is:
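The definitions are not reproduced inline here; a reconstructed sketch consistent with the article's description (the names repeater, tl, and the ArgsList... behaviour described in the closing comments come from the article, but the exact code is assumed) is:

```cpp
#include <cstddef>
#include <tuple>
#include <type_traits>

template <typename... T> struct tl {};  // the placeholder type

// repeater appends the types in the tl<...> list N times to ArgsList...,
// then instantiates Result with the accumulated parameters. Any initial
// ArgsList... parameters appear first and are never repeated.
template <std::size_t N, template <typename...> class Result,
          typename List, typename... ArgsList>
struct repeater;

template <std::size_t N, template <typename...> class Result,
          typename... Ts, typename... ArgsList>
struct repeater<N, Result, tl<Ts...>, ArgsList...> {
  using type =
      typename repeater<N - 1, Result, tl<Ts...>, ArgsList..., Ts...>::type;
};

template <template <typename...> class Result, typename... Ts,
          typename... ArgsList>
struct repeater<0, Result, tl<Ts...>, ArgsList...> {
  using type = Result<ArgsList...>;
};

// The template alias: wrap the repeated types in the placeholder and invoke
// repeater, with no special helper class needed.
template <std::size_t N, template <typename...> class Result, typename... Ts>
using n_tuple = typename repeater<N, Result, tl<Ts...>>::type;
```

For example, `n_tuple<2, std::tuple, int, char>` names the type `std::tuple<int, char, int, char>`.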

The goal of having a list-of-types extended automatically has been reached!

Closing Comments

As is typical when one revisits old code, one improves the previous code. Such is true here: the above code is simpler and easier-to-understand than my previous article's code. The use of a template alias further enhances readability and end-user usability. The alias also avoids the need for a special class simply to properly invoke the repeater class.

Newcomers to C++ TMP should note that the placeholder type used above could be any template class type accepting zero or more types as template parameters. The placeholder's only use is to permit the extraction of its template parameters using partial template specialization. Not hard-coding the placeholder type is especially convenient when combining this with other C++ TMP code: should one use repeater directly, one does not need to extract the parameters from one template class type and then place those types into a special placeholder type just to use repeater.

If one is using repeater directly, one can pass an initial set of template parameters for ArgsList.... These parameters will not be repeated but will be used as the first parameters of the Result type regardless of the value of N.

Finally, note that one could have placed ArgsList... within a placeholder type as well. I wanted it to be easy to pass in a list of mandatory initial template parameters when using repeater directly. If one wanted to process multiple parameter lists independently, then using placeholder types would be essential.

Overview

In previous posts I developed and tested code that could apply tuples to functions and function objects (functors) efficiently. In a reply comment to my last post, it was noted that apply_tuple() does not work with lvalues. This post outlines how to fix the code so it works with both lvalues and rvalues.

Some Code Demonstrating That Lvalues Won't Compile

Nothing fancy has to be done to show that passing an lvalue to apply_tuple() will fail to compile. One need only write some code like this:

// place apply_tuple-related code here

int add(int a, int b) { return a + b; }

int main() {
  std::tuple<int,int> t;
  apply_tuple(add, t); // etc.
}

This code fails to compile since the tuple argument is an lvalue, not an rvalue, and its type is not an exactly-as-passed template type parameter.

C++11 defines a special template argument deduction rule when a function template accepts an argument as an rvalue reference to a template type parameter (i.e., a parameter written T&& where T is deduced). When this occurs, if the argument is an lvalue of type T, then the template parameter is deduced as T& and, after reference collapsing, the function parameter is T&; if the argument is an rvalue of type T, then the template parameter is deduced as T and the function parameter is T&&. This convenience is only applied when T is an exactly-as-passed template argument.
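The rule is easy to observe directly (probe is an illustrative name):

```cpp
#include <cassert>
#include <type_traits>

// T is an exactly-as-passed template parameter, so the special deduction
// rule applies: lvalue arguments deduce T as U&, rvalue arguments deduce
// T as U. After reference collapsing, T&& is U& for lvalues and U&& for
// rvalues, so one overload accepts both.
template <typename T>
bool probe(T&&) {
  return std::is_lvalue_reference<T&&>::value;
}
```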

Thus, by adding overloads for constant references (i.e., lvalues), apply_tuple() can now be used with lvalues. Adding these lvalue overloads is necessary since we could not exploit the special template argument rvalue deduction rule.

Closing Comments

The special template argument rvalue deduction rule is very convenient, but its applicability is limited. When the rule cannot be exploited, one needs to write overloads to handle both lvalues and rvalues. With apply_tuple() one needs to extract the OpArgs... parameter pack, which unfortunately seems to prevent one from exploiting the rule.