Introduction

Modern programming is more than just producing a big slew of C code that seemingly gets the job done. As C programmers, we have probably all experienced the need to reinvent basic data structures over and over again. Then C++ entered the scene and, with its standard library, provided a number of common data structures. It did more than that, though: it provided common data structures that were type safe and generic at the same time. If you have had some exposure to C++, and I hope you have if you are reading this, then it is fairly obvious what I'm talking about... templates.

What many books describe as templates is "a way to generalise the type that a function or class uses"; an example of this would be a singly-linked list taking a user-defined type. C++'s template system provides more than that. It provides further control than merely generalising over a type: it provides functionality for creating specific implementations for specific types through (partial) template specialisation. This may boost the efficiency of an implementation greatly, as you can usually make more presumptions about your data than your compiler can, because your compiler will always make a conservative guess about your intentions.

In this article I will cover how to specialise templates, both completely and partially, illustrate how this can greatly increase the efficiency of an implementation, and show some stranger tricks that rely on in-depth knowledge of the template mechanism rather than treating it merely as a way to 'generalise' types.

A Motivating Example

Let's presume that we have a functor that copies data from one std::vector to another. We use a functor rather than a plain function because we cannot partially specialise functions [1]:
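The original listing has gone missing from the text; a minimal sketch consistent with the description, assuming the functor is called copier (the name is an assumption), might look like this:

```cpp
#include <cstddef>
#include <vector>

// General copier: copies every element from one std::vector to
// another, one element at a time.
template<typename T>
struct copier {
    void operator()(std::vector<T>& dest, const std::vector<T>& src) const {
        dest.resize(src.size());
        for (std::size_t i = 0; i < src.size(); ++i)
            dest[i] = src[i];
    }
};
```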

Since we cannot make presumptions about our data, this is the best we can do... or is it?

Complete Specialisation

A complete specialisation of a template is one where we assign a type or value to every template parameter in a new declaration of a class. With our motivating example this is fairly easy, as there is only one parameter. To return to the question I left hanging in the previous section ("[...] this is the best we can do... or is it?"): what if we knew that our type was a pointer-to-int? In that case we would be able to use std::memcpy to move the contents [2] from the source to the destination. So, how exactly do we specialise this thing?
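The specialisation listing is also absent from the text; a hedged reconstruction of the complete specialisation for vectors of int* (repeating the primary template so the snippet stands alone) could be:

```cpp
#include <cstddef>
#include <cstring>
#include <vector>

// Primary template, repeated from the previous section.
template<typename T>
struct copier {
    void operator()(std::vector<T>& dest, const std::vector<T>& src) const {
        dest.resize(src.size());
        for (std::size_t i = 0; i < src.size(); ++i)
            dest[i] = src[i];
    }
};

// Complete specialisation for vectors of int*: one std::memcpy instead
// of a per-element loop (presumes contiguous storage, see the caveats).
template<>
struct copier<int*> {
    void operator()(std::vector<int*>& dest, const std::vector<int*>& src) const {
        dest.resize(src.size());
        if (!src.empty())
            std::memcpy(&dest[0], &src[0], src.size() * sizeof(int*));
    }
};
```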

This doesn't seem so bad, apart from the code duplication, of course. The above specialisation of our copier will (without optimisations turned on) run somewhere between three and seven times faster than the general version, which is a rather nice speed improvement. However, it becomes tedious, not to mention error-prone, to do this for each and every type you will use your functor on. It would be a lot better if we could say: given a type T that is a pointer, use std::memcpy instead of a per-element copy.

Partial Specialisation

There is, of course, also a way to specialise on any type that is a pointer: partial template specialisation [3]. A partial specialisation looks like a normal template declaration coupled with a complete specialisation, like this:
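The listing is missing here as well; a sketch of the partial specialisation for any pointer type (primary template repeated so the snippet compiles on its own) might read:

```cpp
#include <cstddef>
#include <cstring>
#include <vector>

// Primary template, repeated from the earlier sections.
template<typename T>
struct copier {
    void operator()(std::vector<T>& dest, const std::vector<T>& src) const {
        dest.resize(src.size());
        for (std::size_t i = 0; i < src.size(); ++i)
            dest[i] = src[i];
    }
};

// Partial specialisation: for any T that is a pointer we can copy the
// pointer values with a single std::memcpy.
template<typename T>
struct copier<T*> {
    void operator()(std::vector<T*>& dest, const std::vector<T*>& src) const {
        dest.resize(src.size());
        if (!src.empty())
            std::memcpy(&dest[0], &src[0], src.size() * sizeof(T*));
    }
};
```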

This looks surprisingly like our complete specialisation for int*, right? So with an extra 8 lines of code we now have an appealing speed gain for all pointer types, as we can utilise the faster std::memcpy.

There are, of course, more advanced things one can accomplish using this: compile-time based size constraints, compile-time assertions, type manipulations, and many, many more. This beginning was, however, merely a short example of why it can be interesting to use complete and/or partial template specialisation.

Notes and Caveats

Since we can't guarantee that std::vector uses contiguous storage, it would be sensible to use std::copy, as one normally would; this merely illustrates how to make a functor version of it, provided std::vector does indeed use contiguous storage.

Timing the Difference

To notice the benefit of our specialisation a good benchmark is, of course, useful, and for this I've written the following little program. It is written for Linux, but I am sure it can be reworked for Windows with relatively little effort. I will just present main here, as the specialisations are shown above. First compile without the specialisation, then with it, and notice the difference in speed.
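The original Linux-specific program is not reproduced in the text; the following is a portable sketch using std::clock instead of the original's platform-specific timer. The element count, iteration count, and function name are assumptions, not the article's originals:

```cpp
#include <cstddef>
#include <cstring>
#include <ctime>
#include <vector>

// copier as in the earlier sections, repeated so the benchmark
// compiles on its own. Comment out the partial specialisation to
// time the general version instead.
template<typename T>
struct copier {
    void operator()(std::vector<T>& dest, const std::vector<T>& src) const {
        dest.resize(src.size());
        for (std::size_t i = 0; i < src.size(); ++i)
            dest[i] = src[i];
    }
};

template<typename T>
struct copier<T*> {
    void operator()(std::vector<T*>& dest, const std::vector<T*>& src) const {
        dest.resize(src.size());
        if (!src.empty())
            std::memcpy(&dest[0], &src[0], src.size() * sizeof(T*));
    }
};

// Returns the elapsed time in microseconds for a number of bulk copies.
long benchmark() {
    std::vector<int*> source(100000, static_cast<int*>(0));
    std::vector<int*> destination;
    copier<int*> copy;
    std::clock_t start = std::clock();
    for (int run = 0; run < 100; ++run)
        copy(destination, source);
    std::clock_t stop = std::clock();
    return static_cast<long>((stop - start) * 1000000.0 / CLOCKS_PER_SEC);
}
```

Print the result of benchmark() from main, once for each build, to compare the two versions.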

I have run the above on my 1.5 GHz Athlon XP+ with 256 MB RAM and it produced the following results:

Version                    Time (in microseconds)
Without Specialisation     292925
With Specialisation        51254

That is, indeed, a speed benefit that is tangible, and one that should provide the first inkling of motivation.

Non-Type Related Templates

Apart from providing generalisation over user-defined types, templates can also be used with a number of simple types: bool, (unsigned) char, (unsigned) short, (unsigned) int, (unsigned) long, enumerations, and pointers. In other words, templates can be used with bool, the simple integer types, and pointers. This can, for instance, be used for compile-time computations that are non-trivial and would incur a noticeable overhead at runtime. We might, for instance, be computing something that requires the same factorial in the body of a loop that runs hundreds of times. Rather than compute it at runtime, we would like to reduce it to a constant and just use that. However, just replacing the call to the factorial function with a number introduces one of the so-called 'magic' numbers that leave maintainers scratching their heads in puzzlement. So... time for yet another motivating example:

for (unsigned int i = 0; i < 1000; ++i) {
    ...
    fac(10) * ...
    ...
}

This will evaluate fac(10) on every single iteration, but we would, of course, like to do better than that (we cannot rely on the compiler hoisting the computation out of the loop, as it is a function call). This is critical code, after all, but we would like our changes to be such that another programmer can still deduce the meaning of the code without having to ponder magic constants. So what can we do? Considering this is an article on templates, I'm almost positive you can tell that was a leading question.

So how, exactly, do we use the integral types with templates? Almost exactly like we abstract over types. Let us try to implement a simple recursive factorial using templates; I will come back to why it has to be recursive later in the article.
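The listing is missing from the text; a reconstruction of the factorial template, deliberately still without its stopping criterion, as the following discussion requires, might read:

```cpp
// Compile-time factorial: note the non-type parameter where class or
// typename would normally go. result must be static and const to be
// given a value inside the class.
template<unsigned long n>
struct factorial {
    static const unsigned long result = n * factorial<n - 1>::result;
};
```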

This code might require a bit of explanation. The template "type" is as we are used to, except that instead of class or typename we have now used unsigned long. Nothing too complicated there. To be able to specify the value of a variable inside a class [4], the variable must be both static and const. Compared to a normal recursive definition of factorial we can see the similarity:
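The ordinary runtime recursive factorial the text compares against is missing from the listing; it presumably looked something like:

```cpp
// Normal recursive factorial, evaluated at runtime.
unsigned long fac(unsigned long n) {
    return n == 0 ? 1 : n * fac(n - 1);
}
```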

You may already have spotted the problem with our template code... we don't have a check for the stopping criterion, oh horror! Our template will just keep on going and going: we will have sent the compiler into an infinite loop [5] trying to compile our code(!) If I didn't have a solution to this it would be a rather short and disappointing article. We can, of course, use our knowledge from the last section and completely specialise the factorial template to catch n equal to zero, like this:
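A sketch of that complete specialisation, with the primary template repeated so the listing is self-contained:

```cpp
// Primary template, as before.
template<unsigned long n>
struct factorial {
    static const unsigned long result = n * factorial<n - 1>::result;
};

// Complete specialisation: the stopping criterion for n == 0.
template<>
struct factorial<0> {
    static const unsigned long result = 1;
};
```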

So when we instantiate factorial with zero, result will be 1. Now it is time to see how, exactly, all this takes place within the compiler; knowing how the compiler deals with things often makes it a lot easier to choose the better implementation. Do remember that this is happening at compile time: your compiler is actually doing the computation for you while it is compiling your source file. So let us presume we instantiate our factorial template with 10, like this:

std::cout << factorial<10>::result << '\n';

What exactly happens at the instantiation of factorial<10>::result? It implicitly instantiates factorial<9>, which instantiates factorial<8>, and so on down to factorial<0>, which yields 1. When the 1 is found we proceed all the way back up through the instantiation sequence and thus get the following sequence of computations: 1*1*2*3*4*5*6*7*8*9*10, the result of which is stored in factorial<10>::result.

Using all of this we can apply it to our loop mentioned earlier and we get this:
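The rewritten loop is missing from the text; a sketch, wrapped in a function purely for illustration (the function name and the loop body are assumptions):

```cpp
// factorial as defined in the previous section, repeated for completeness.
template<unsigned long n>
struct factorial {
    static const unsigned long result = n * factorial<n - 1>::result;
};
template<>
struct factorial<0> {
    static const unsigned long result = 1;
};

// The critical loop, now multiplying by a compile-time constant
// instead of calling fac(10) on every iteration.
unsigned long accumulate_factorials() {
    unsigned long total = 0;
    for (unsigned int i = 0; i < 1000; ++i)
        total += factorial<10>::result;  // evaluated at compile time
    return total;
}
```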

Considering factorial<10>::result is evaluated at compile time, we will merely be multiplying by a constant at runtime and have thus saved a significant amount of computation in a critical loop.

There are, of course, other things that we can use this for that aren't only related to mathematics, some of which you might have heard of, or will come to hear of: compile-time assertions and compile-time enforcements. For good measure I will illustrate the first of these here.

Compile-time Assertions

The C++ Standard Library provides the assert macro in the cassert header, but whatever you assert here will only be checked at runtime - that is one more thing to do at runtime that might be unnecessary. There are, of course, some things you want to assert at runtime, but what about the things we can tell already at compile-time?

What exactly is there that we can assert at compile-time? Plenty of stuff, actually. If we know the size of two matrices at compile-time we can also check that they comply with the normal mathematical rules if we wish to multiply them with each other. The benefit of doing this at compile-time is, once again, that we save the time at runtime. This means that the program will potentially run faster for those of you following at home. The way to do compile-time assertions is actually amazingly simple, and like all good ideas you sit wondering afterwards why you didn't think of that yourself.

template<bool> struct CTAssert;
template<> struct CTAssert<true> {};

This code is, indeed, fairly simple, right? We forward-declare a general CTAssert taking a bool (which is a fairly limited type, as its values can only be true or false). We then make a complete specialisation for when the bool is true, but we omit any definition of the general template, and hence of the false case. This means that if you try to use it with a bool that is false, the compiler cannot find a class definition to instantiate, so it will issue a compile-time error. Pretty fancy, isn't it?

So how are we going to use it, and for what? We can use it anywhere we might've used assert to test something that is already known at compile time, for instance whether sizeof(int) is 4. This is accomplished by writing CTAssert<sizeof(int) == 4>(); in some code block. As this constructs an empty object, any optimising compiler should be able to optimise the construction away, since it isn't subsequently used. The use of CTAssert to test the size of int might seem a bit dull to you, but you could, for instance, use it to test whether the compile-time known sizes of two matrices fit for multiplication, and there are many other things you can use it for.
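As a hedged illustration of the matrix idea, here is a sketch where CTAssert guards a fixed-size matrix multiplication; the Matrix and multiply names are assumptions, not code from the original:

```cpp
#include <cstddef>

// CTAssert as in the article: only the true case is defined.
template<bool> struct CTAssert;
template<> struct CTAssert<true> {};

// A fixed-size matrix whose dimensions are template parameters.
template<std::size_t Rows, std::size_t Cols>
struct Matrix {
    double data[Rows][Cols];
};

// Multiplication is only well-formed when the inner dimensions agree;
// otherwise CTAssert<false> is an incomplete type and compilation fails.
template<std::size_t R1, std::size_t C1, std::size_t R2, std::size_t C2>
Matrix<R1, C2> multiply(const Matrix<R1, C1>& lhs, const Matrix<R2, C2>& rhs) {
    CTAssert<C1 == R2>();  // compile-time dimension check
    Matrix<R1, C2> result;
    for (std::size_t i = 0; i < R1; ++i)
        for (std::size_t j = 0; j < C2; ++j) {
            result.data[i][j] = 0.0;
            for (std::size_t k = 0; k < C1; ++k)
                result.data[i][j] += lhs.data[i][k] * rhs.data[k][j];
        }
    return result;
}
```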

Summing up the Paradigms

As the past sections should have illustrated, templates give us an entire unique programming language within the confines of C++. Templates actually constitute a simple system based on a second-order lambda calculus [6] (that is a good thing, in case you are wondering). Unlike the rest of C++, templates aren't bound to the procedural paradigm, but rather to the functional programming paradigm [7], and much like a pure functional programming language, templates provide only a few core concepts. The concepts they provide are, however, adequate for pretty much everything: recursion and testing conditions. You have seen examples of both previously, and soon it is time to see how we can use them to design two mathematical vector classes with a maximal amount of code reuse between the two implementations.

Recursion

We have already seen recursion implemented using templates in the factorial example where the template calls itself with a value one lower. This is recursion in its most easily identifiable form. Whenever you need to make a loop with templates you use recursion, be it for iterating over types, numbers, or enumerations.

Conditions

We can hardly live without our ancient and trustworthy if-construct; all non-trivial programming depends on it, so it is fortunate that we have it with templates as well. "Where?", you may ask. It is the complete and partial specialisations, nothing less, nothing more. Remember the factorial example once more: it was through complete specialisation that we tested for factorial being called with 0, and it is through complete and/or partial specialisation that we can test one or more of the template arguments. Of course, writing or-constructs (with partial specialisation) is a great deal more 'wordy' than a normal if-construct, but the mere fact that we are able to do so will aid us greatly as we delve deeper into the art of template metaprogramming.

An Advanced Example: Mathematical Vectors

There is a duality in how you can implement many things in C++: as a data structure whose size is specified at runtime, or as one whose size is specified at compile time. As such, you seem to get one or the other, never both within the same data structure, and hardly ever with the exact same interface. There is, of course, a way to remedy this, which I am about to explain; a way that I have often used when developing code for my template library: template metaprogramming.

I will make the (evil) presumption that you, the reader, are proficient enough in mathematics to understand vectors and what they do, and instead of explaining these things focus on how we are going to implement them. The first, and most obvious, thing to do is to design the call interface.

The Interface

I will here only describe the public interface, which is to be shared between the compile-time sized vector and the runtime sized vector, because it is in the private sections behind the scenes that the true adventure will unfold.
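The interface listing itself is missing from the text; a sketch consistent with the description (the exact set of operations and all names are assumptions) might be:

```cpp
#include <cstddef>

// Hypothetical sketch of the shared public interface.
template<typename Type, typename Storage>
class Vector {
public:
    Vector();                                 // compile-time sized storage
    explicit Vector(std::size_t size);        // runtime sized storage
    std::size_t size() const;                 // the one function that differs
    Type& operator[](std::size_t index);
    const Type& operator[](std::size_t index) const;
    Vector& operator+=(const Vector& other);  // element-wise addition
    Vector& operator-=(const Vector& other);  // element-wise subtraction
    Vector& operator*=(const Type& scalar);   // scaling
private:
    // the chosen storage goes here
};
```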

These were all trivial functions to implement when we had a simple Vector without all these template frills and whatnot [8]. They will, actually, look almost completely the same this time around, with the exception of size, which will differ. Having established the external interface of the Vector it is time to figure out what is going to go on behind the scenes - how to choose between runtime and compile-time based storage.

Behind the Scenes

The tricky work happens behind the scenes, as is usually the case with templates; the motto is 'easy to use, hard to write'. First we need two distinct classes to specialise on, depending on whether we are using runtime-sized vectors or compile-time-sized vectors. These we will call Vector_rtsize and Vector_ctsize respectively. The latter will be templated on a size (we need to be able to specify the size somehow).

So now we have two distinct classes to work with, and we need to define the actual storage. To do that we need to know the type of the data we are going to store and, in case it is compile-time sized, the amount of data to store. If we call our storage class Vector_storage, then an initial sketch of the runtime-sized storage could look like this:

The actual body here isn't too hard; it is the specialisation that mainly matters to us. As you may recall, this is a partial template specialisation that takes Vector_storage and specialises it with Vector_rtsize as the Storage specifier while leaving Type a template parameter. While the contents of Vector_storage specialised for Vector_ctsize are remarkably similar, it is still interesting to see how the actual specialisation is done, considering Vector_ctsize takes a template parameter of its own and size() relies on that compile-time constant.

So, as can be seen above, we can also introduce further template parameters in specialisations. This allows for even more expressive power when programming with templates.

What we have now is a general interface for mathematical vectors, as well as two distinct storage policies, one using the free store, the other using the stack. All we need to do now is put it all together. This we do by instantiating the appropriate storage inside the Vector class, and for that we need to know the storage policy, thus we make it part of the template parameter list. So, without further ado, I give you the entirety of the Vector class and its auxiliary classes:
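The full listing is missing from the text; the following is a condensed sketch assembled from the pieces described above. Implementation details such as the element-wise operator+= and the zero-initialisation are assumptions:

```cpp
#include <cstddef>

// Storage selector tags.
struct Vector_rtsize {};
template<std::size_t Size> struct Vector_ctsize {};

// Primary storage template: only the specialisations are defined.
template<typename Type, typename Storage> struct Vector_storage;

// Runtime-sized storage on the free store.
template<typename Type>
struct Vector_storage<Type, Vector_rtsize> {
    explicit Vector_storage(std::size_t size)
        : m_data(new Type[size]()), m_size(size) {}
    ~Vector_storage() { delete[] m_data; }
    std::size_t size() const { return m_size; }
    Type& operator[](std::size_t i)             { return m_data[i]; }
    const Type& operator[](std::size_t i) const { return m_data[i]; }
private:
    Vector_storage(const Vector_storage&);            // copying omitted
    Vector_storage& operator=(const Vector_storage&); // for brevity
    Type*       m_data;
    std::size_t m_size;
};

// Compile-time-sized storage on the stack.
template<typename Type, std::size_t Size>
struct Vector_storage<Type, Vector_ctsize<Size> > {
    Vector_storage() { for (std::size_t i = 0; i < Size; ++i) m_data[i] = Type(); }
    std::size_t size() const { return Size; }
    Type& operator[](std::size_t i)             { return m_data[i]; }
    const Type& operator[](std::size_t i) const { return m_data[i]; }
private:
    Type m_data[Size];
};

// The user-visible Vector: one interface, storage chosen by a parameter.
// Only the constructor actually used for a given storage policy is
// ever instantiated.
template<typename Type, typename Storage>
class Vector {
public:
    Vector() {}                                       // compile-time sized
    explicit Vector(std::size_t n) : m_storage(n) {}  // runtime sized
    std::size_t size() const { return m_storage.size(); }
    Type& operator[](std::size_t i)             { return m_storage[i]; }
    const Type& operator[](std::size_t i) const { return m_storage[i]; }
    Vector& operator+=(const Vector& other) {
        for (std::size_t i = 0; i < size(); ++i) m_storage[i] += other[i];
        return *this;
    }
private:
    Vector_storage<Type, Storage> m_storage;
};
```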

The remainder of the Vector methods are fairly trivial to write and I will, as most other lazy authors, leave them to the avid reader. Code should never be published without at least an example or two of how to use it, so a small example of instantiating the Vector in different ways and using the instances is shown in the next piece of code:
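The example code is missing from the text; here is a sketch, repeating condensed class definitions so it compiles on its own. The summing function is purely illustrative:

```cpp
#include <cstddef>

// Definitions condensed from the previous listing.
struct Vector_rtsize {};
template<std::size_t Size> struct Vector_ctsize {};
template<typename Type, typename Storage> struct Vector_storage;
template<typename Type>
struct Vector_storage<Type, Vector_rtsize> {
    explicit Vector_storage(std::size_t n) : m_data(new Type[n]()), m_size(n) {}
    ~Vector_storage() { delete[] m_data; }
    std::size_t size() const { return m_size; }
    Type& operator[](std::size_t i) { return m_data[i]; }
private:
    Type* m_data; std::size_t m_size;
};
template<typename Type, std::size_t Size>
struct Vector_storage<Type, Vector_ctsize<Size> > {
    Vector_storage() { for (std::size_t i = 0; i < Size; ++i) m_data[i] = Type(); }
    std::size_t size() const { return Size; }
    Type& operator[](std::size_t i) { return m_data[i]; }
private:
    Type m_data[Size];
};
template<typename Type, typename Storage>
class Vector {
public:
    Vector() {}
    explicit Vector(std::size_t n) : m_storage(n) {}
    std::size_t size() const { return m_storage.size(); }
    Type& operator[](std::size_t i) { return m_storage[i]; }
private:
    Vector_storage<Type, Storage> m_storage;
};

// Usage: the same interface driven by two different storage policies.
double sum_both() {
    Vector<double, Vector_ctsize<3> > stack_vec;   // size fixed at compile time
    Vector<double, Vector_rtsize>     heap_vec(3); // size chosen at run time
    for (std::size_t i = 0; i < 3; ++i) {
        stack_vec[i] = i + 1.0;
        heap_vec[i]  = i + 1.0;
    }
    double sum = 0.0;
    for (std::size_t i = 0; i < stack_vec.size(); ++i)
        sum += stack_vec[i] + heap_vec[i];
    return sum;
}
```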

The Vector at a Review

What was it that allowed us to create one vector class yet differentiate between storage mechanisms, using the free store or using the stack? Rather than specialising the actual Vector class, we have specialised the storage of the Vector as appropriate. This leaves the user-visible interface of the class the same, but with differing internals depending on the template parameter.

This is but one more place where partial template specialisation may aid our programming: it allows us to use the stack-bound version, which doesn't have the dynamic memory overhead, in time-critical code, and the dynamic version when we cannot predict the size of the storage at compile time. Much like choosing between fixed-width arrays and dynamic arrays in C.

All in all we have a Vector whose storage method is transparent to the user (just as any well-encapsulated object-oriented class should guarantee), but between whose storage types the user can switch without changing anything but the declaration of the variable. This is sometimes known as policy-based design, which defers several implementation choices to the application programmer rather than letting the library developer make all the decisions. Policy-based design is described thoroughly in [Alexandrescu], chapter 1.

Further Down the Road

What you have learned above, providing you have actually read it all and managed to digest it, should provide you with the tools to program just about anything relating to template specialisation. The basic understanding of templates I trust you had already, prior to throwing yourself into this article. So what are some things that you are ready to tackle and play with? To name a few: smart pointers, type lists, matrix classes, object factories, and many, many other design patterns. Further information on design patterns and their applicability in object-oriented design can be found in [GOF].

Footnotes

[1] It is possible to completely specialise functions, and this is used in several places in the C++ Standard Library.

[2] This presumes std::vector uses contiguous memory. As it stands, this is not a requirement of the Standard, but to my knowledge all implementations of the STL use contiguous memory for it. However, this is merely to serve as an example.

[3] At the time of writing only a small handful of compilers support this, so take great care which compiler you use.

[4] A struct is equivalent to a class in C++, just with a different default visibility specifier.

[5] This isn't exactly correct, as most compilers have a limit on the maximum recursion depth of template instantiations and will produce an error message stating that the maximum depth has been exceeded, but in theory we would get an infinite loop.

[6] A typed lambda calculus.

[7] This might be why it draws us nerdy Computer Science majors to experiment with it.

[8] Providing you have actually tried to implement a Vector class before.

History

5th August 2003: Initial version uploaded.

License

This article has no explicit license attached to it but may contain usage terms in the article text or the download files themselves. If in doubt please contact the author via the discussion board below.


Comments and Discussions

I know your example was a simple one, but it begs the question: why bother templatizing something like that at all? You just waste time writing and debugging the template instead of just writing the code you need. Why waste time bothering with something like this when you can actually just code what you mean to write and move on to something else?

Anonymous wrote: I know your example was a simple one but it begs the question why bother templatizing something like that at all?

Well, it would've been helpful if you had referred to which part of the article you found wasn't worth the bother. So, I guess I will just answer for them all.

1) The copier provides a type safe optimisation for data structures about which you can presume more knowledge than your compiler. It is always a benefit if you can carry out optimisations in your high level language rather than having to resort to the nitty-gritty details of assembly.

2) Non-type related templates might seem overly simplified in my article, but it is, after all, an introductory survey; they can also provide an optimisation of some prowess.

3) Compile-time assertions provide you with the ability of detecting programming errors at compile time. The sooner you can detect the error the faster you can fix it.

4) The mathematical Vector is a prime example of something where templates is greatly beneficial. If you take up scientific computing you may very well need a data type far more precise than doubles or long doubles, so you, naturally, create your own class for multiprecision floating point arithmetic and instantiate it with that, whereas if you're working on a game a double might be good enough, or even a float, so you can use those rather than having to roll out a unique version for each data type.

Lastly, to answer in general: using templates allows us to abstract the underlying types in a manner that allows us to reuse our code in more situations. I reuse my code far more often when it is templated than otherwise. Another benefit, for many templates I write, is that they can be used both with ASCII character sets and with Unicode.

Anonymous wrote: you just waste time writing and debugging the template instead of just writing the code you need.

I don't find writing templates any more difficult than writing, say, versions using void*'s. Perhaps because templates closely resemble the functional programming paradigm and I have spent considerable time working with such languages, but I do actually find it worth my time to spend the few moments longer writing out a complete test case for a template so each method is instantiated, because it allows me to reuse it later and know the code works, instead of having to write it again.

The same thoughts apply to the standard library. Why have std::vector as a templated container when we could just typecast to and from void*? Type safety.

Considering template.cpp isn't one of the files I have supplied a bit more context about what you are trying to do would be nice.

Apart from that, Visual C++ .NET 2002 does not support partial template specialisation, so several aspects of the article will not work on that compiler. This is also why I specified VC7.1 as the target compiler in the article header (top right of the page). It should, however, work on most compilers that support partial template specialisation; I have, at least, successfully tested it with GNU gcc-3.2-3.

If you refer to ISO/IEC 14882:1998, then per §23.2.4/2 we know that:
"A vector satisfies all of the requirements of a container and of a reversible container (given in two tables in §23.1) and of a sequence, including most of the optional sequence requirements (§23.1.1)."

Investigating the pertinent sequence requirements §23.1.1/1 the Standard states: "A sequence is a kind of container that organizes a finite set of objects, all of the same type, into a strictly linear arrangement. The library provides three basic kinds of sequence containers: vector, list, and deque."

That is the only item in the Standard pertaining to memory layout requirements of sequence containers, e.g. vector. Having a linear memory layout is not the same as having a contiguous memory layout - in the latter elements are stored just after each other. However, in a linear memory layout all that we need to guarantee is that vec[0] is before vec[1], which is before vec[2], etc. in memory. This allows us to, for instance, have padding between vector elements on quadword boundaries for more efficient data fetching. I am sure there are other and more dark reasons that this may be a benefit, but I also know that it has many a time been a great annoyance to people wanting to use it where functions expect C-style arrays.

Lastly, &theVector[0] only returns a pointer to the first element if our stored type, say T, does not overload operator&; otherwise the Standard makes no such guarantee. And due to the linear layout above we cannot guarantee that &theVector[0] + 1 == &theVector[1], unfortunately.

I hope that this clarified the point. I must admit that I was a bit too lazy to include this lengthier explanation of why it is so in the article.

IIRC, a C compiler is allowed to pad elements even in a C-style array. For example, given X* p, the only requirement is that incrementing p steps through each element of the array. If X happens to be 3 bytes in size, the compiler is permitted to insert a pad byte so that everything is neatly aligned, as long as incrementing p then adds 4 each time.

A "linear arrangement" is not the same as a "linear memory layout". A linear arrangement simply means that the elements of a container appear (logically speaking) one after the other, as they do in vectors, lists and deques. I don't have the Standard to hand, but vectors are a special case where their layout in memory is specifically stated and required to be the same as that of C-style arrays.

Referring to &theVector[0] was a sloppy statement on my part but the intent is clear: vectors *must* be stored in a contiguous block of memory where the address of the first element is the address of that block.

I know I have read in several places statements made by people a lot smarter than me that using vectors where C-style arrays are expected is OK (e.g. item 16 in Effective STL). To have it otherwise would have been an enormous oversight on the part of the standards committee and I can't believe that they would've missed it.

The intention of the working group was that it should be contiguous, but as Andrew Koenig states in defect report item 69 of 29th July 1998, it is not completely specified. Item 69[^].

His proposition is to add this to the end of paragraph 1 of §23.2.4: "The elements of a vector are stored contiguously, meaning that if v is a vector where T is some type other than bool, then it obeys the identity &v[n] == &v[0] + n for all 0 <= n < v.size()".

So in other words - yes, for all practical purposes the vector has its elements stored contiguously, but a zealous implementor could play tricks on you. :) But as I stated in the article, I don't know of any implementations that have chosen to interpret it as allowing non-contiguous storage.

If you decide to interpret the strict linear arrangement as a logical ordering of elements rather than referring to memory layout there is no mention of what memory layouts to choose apart from worst-case runtimes for some operations. (I have just scrutinised the sequence and vector sections once more to make entirely sure that I haven't made some great blunder here). :)

Henrik Stuart wrote: If you decide to interpret the strict linear arrangement as a logical ordering of elements rather than referring to memory layout there is no mention of what memory layouts to choose apart from worst-case runtimes for some operations.

It occurs to me that you *have* to interpret the phrase "linear arrangement" as referring to the logical ordering. Otherwise it makes no sense for lists, where nodes can live anywhere in the heap.

Taka Muraoka wrote: It occurs to me that you *have* to interpret the phrase "linear arrangement" as referring to the logical ordering. Otherwise it makes no sense for lists, where nodes can live anywhere in the heap.

Yes, you are right. It wouldn't be realistic (or sensible) to impose a linear arrangement in memory on a list, so it must be interpreted as a logical, linear arrangement. Guess I was a tad too fast at interpreting the standard there.

I can't dig up the reference in the C++ standard, but Scott Meyers clearly states in "Effective STL": "The elements in a vector are constrained by the C++ standard to be stored in contiguous memory"

If I trust someone on STL, it's Scott, for he wrote the perfect book to teach people why STL is bad

In brief, the STL is cool as a template library, but overly complicated as a standard library. It would even be OK with the right amount of compiler support: solid documentation, correct and informative reporting of syntax errors, a syntax error decoder/type breakdown, and, of course, template code that can actually be compiled by a compiler, or read by a human. Damn, this "new: exceptions and templates" age was a nightmare, I can tell...

The STL is hard to integrate with classic C-style libraries, or libraries that aren't "STL'ish". Big problem - the "STL mindset" (i.e. understanding STL sources and writing similar code) is very complex. You can teach a newbie how to use a vector, and make him write down and remember the syntax for using not1, not2, bind1st and bind2nd. However, making him understand what he is doing, and why, usually takes a few years of C++. That's OK for a library, but IMO not for a STANDARD library.
(All we wanted was a string class! If you throw in

Btw. Scott Meyers' "Effective STL" is both a good book on using the STL effectively and a warning about its (often serious) pitfalls.

peterchen wrote: I can't dig up the reference in the C++ standard, but Scott Meyers clearly states in "Effective STL":
"The elements in a vector are constrained by the C++ standard to be stored in contiguous memory"

I don't happen to own a copy of Meyers' Effective STL, but if you can quote the appropriate paragraph, please do so. As to whether it is contiguous, see my post here. I will choose to trust the Standard on this.

peterchen wrote: If I trust someone on STL, it's Scott, for he wrote the perfect book to teach people why STL is bad

I hope that was a joke. It may not be trivial to use, but it is far from bad.

(The text is from 1999)
"The standards committee agrees that this is a natural thing to want to do, and it intends to put wording in the first Technical Corrigendum to the effect that the elements inside a vector must indeed be stored contiguously and so will be usable with code that expects an array. The library vendors know that this clarification is coming (after all, the major standard library vendors were with us in the room when this was being discussed), and so they will make sure that new versions of their vector implementations won't introduce any incompatibilities that would make them nonconforming in light of this clarification."

Checked your post - OK, it seems it's not in the official release, only known that you "should implement it that way" by everybody who should know.
(Damn - another checkmark on my "STL is evil" list)

I know to trust the standard only (if I had one) - but OTOH if Scott Meyers is wrong on this (even though he shows the deepest understanding of all the pitfalls of STL), I would have to seriously readjust my view of the world.

Funnily, the C++ FAQ Lite[^] does not contain a reference to the C++ Standard section either... (but unanimously states "Yes, it is contiguous")

Well, &vector[0] is the only known legal way to pass an STL container to a legacy API (i.e. the thing we mere mortals have to live with - until the entire OS is included in the STL) - so if you take away this, I will drown you in STL rants..

Hard to say what Meyers has or hasn't said since I haven't got the book, but hey, even the best miss technical details from time to time.

peterchen wrote: Funnily, the C++ FAQ Lite[^] does not contain a reference to the C++ Standard section either... (but unanimously states "Yes, it is contiguous")

From the C++ FAQ Lite, that very same section, bottom part:

Caveat: the above guarantee is currently in the technical corrigendum of the standard and has not, as of this date, officially become a part of the standard. However it will be ratified Real Soon Now. In the mean time, the practically important thing is that existing implementations make the storage contiguous, so it is safe to assume that &v[0] + n == &v[n]." I think he caught it pretty well.