There is a quote that goes “Standards are great! Everyone should have one.” or something along those lines. (Somewhat ironically, this quote, too, has many different variations and many attributions. The earliest I’ve found attributes it to George Morrow in InfoWorld, 21 Oct 1985.)

A case in point is the base64 encoding. Put simply, it’s a method of encoding an array of 8-bit bytes using an alphabet consisting of 64 different printable characters from the ASCII character set. This is done by taking three 8-bit bytes of source data, arranging them into a 24-bit word, and converting that into four 6-bit values that map onto the 64-character alphabet (since 6 bits can hold the values 0-63).
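The regrouping step can be sketched as follows. This is a minimal illustration rather than the implementation discussed in this article: the names `split_triplet` and `check_split_triplet` are made up for the example, and the sketch stops at the four 6-bit indices, leaving the alphabet lookup aside.

```cpp
#include <array>
#include <cassert>
#include <cstdint>

// Hypothetical helper: pack three bytes into a 24-bit word and split
// that word into four 6-bit indices, each in the range 0-63.
inline std::array<std::uint32_t, 4> split_triplet(std::uint8_t b0,
                                                  std::uint8_t b1,
                                                  std::uint8_t b2)
{
    const std::uint32_t word = (std::uint32_t(b0) << 16)
                             | (std::uint32_t(b1) << 8)
                             |  std::uint32_t(b2);
    const std::array<std::uint32_t, 4> indices = {{
        (word >> 18) & 0x3F,   // top 6 bits
        (word >> 12) & 0x3F,
        (word >> 6)  & 0x3F,
         word        & 0x3F    // bottom 6 bits
    }};
    return indices;
}

// Usage sketch: the bytes of "Man" give the indices 19, 22, 5 and 46,
// which are 'T', 'W', 'F' and 'u' in the standard alphabet.
inline void check_split_triplet()
{
    const auto idx = split_triplet('M', 'a', 'n');
    assert(idx[0] == 19u && idx[1] == 22u && idx[2] == 5u && idx[3] == 46u);
}
```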

When I was looking at base64, I was interested in three different varieties or flavours, namely the MIME version, the (per RFC 4648) standard base64, and base64url. These differ in how they handle line breaks and other illegal characters, what characters are used in the 64-character alphabet, and the use of padding at the end to make up an even triplet of bytes.

That’s three mostly similar but slightly different algorithms, so how to design an implementation? A number of designs are possible, of course, like:

three copy-pasted and tweaked sets of functions

hugely parameterised functions where every variation in algorithm or data can be altered in the call

I chose to use a design that in my opinion takes the best from all of those, with none of the disadvantages.

The Wikipedia article has a handy table giving a summary of the differences between the variants, which gives eight possible differences of data or algorithm. I’ve elected to ignore the last of those, which is the addition of a checksum only used for OpenPGP (RFC 4880), since that is easily added outside the base64 coding. The sixth difference on Wikipedia’s list – line separator – can also be ignored since it can be inferred from the maximum line length. The remaining areas of difference are:

Character for index 62

Character for index 63

Character to use for padding

Whether fixed line length is used

Maximum line length

Handling of illegal characters

Fortunately, these are all integral types (bool and char both count as integral types), so they can be used as non-type template parameters. In other words, I can define a single type that holds all the settings for a variant.
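A type along these lines might look like the sketch below. The struct name `base64_variant`, its member names and the parameter order are assumptions made for illustration. The values given for the three variants are my reading of RFC 2045 and RFC 4648 (with 0 taken to mean "no line length limit"), so verify them before relying on them. Padding is optional in base64url; this sketch keeps it.

```cpp
// Hypothetical variant type: one non-type template parameter
// for each of the six differences listed above.
template <char Char62, char Char63, char PadChar,
          bool FixedLineLength, int MaxLineLength, bool IgnoreIllegalChars>
struct base64_variant
{
    static constexpr char char62               = Char62;
    static constexpr char char63               = Char63;
    static constexpr char pad_char             = PadChar;
    static constexpr bool fixed_line_length    = FixedLineLength;
    static constexpr int  max_line_length      = MaxLineLength;   // 0: no limit (assumption)
    static constexpr bool ignore_illegal_chars = IgnoreIllegalChars;
};

// Assumed settings: MIME breaks lines at 76 characters and skips
// characters outside the alphabet; the other two reject them.
using base64MIME = base64_variant<'+', '/', '=', false, 76, true>;
using base64     = base64_variant<'+', '/', '=', false, 0,  false>;
using base64url  = base64_variant<'-', '_', '=', false, 0,  false>;
```

Because each variant is a distinct type, a function that takes the variant as a template parameter cannot be handed, say, the MIME alphabet together with the base64url line length by accident.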

This means that there is no risk of mixing up parameters in function calls – once a set is defined (and tested and verified to be correct), it can be used everywhere. The actual coding functions then get a very simple interface, with an input, an output, and the variant type as the only parameters.
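To give a feel for such an interface, here is a minimal encoder sketch. It is not the code from the archive: `mini_variant`, `to_char` and `encode` are hypothetical names, the variant is cut down to the alphabet tail and the padding character, line breaking and decoding are left out, and the alphabet lookup assumes an ASCII-compatible character set.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <string>
#include <vector>

// Hypothetical, cut-down variant: only the alphabet tail and padding vary.
template <char Char62, char Char63, char PadChar>
struct mini_variant
{
    static constexpr char char62 = Char62;
    static constexpr char char63 = Char63;
    static constexpr char pad    = PadChar;   // '\0' is taken to mean "no padding"
};

// Look up a 6-bit index in the variant's alphabet (assumes ASCII).
template <typename Variant>
char to_char(std::uint32_t index)
{
    if (index < 26) return static_cast<char>('A' + index);
    if (index < 52) return static_cast<char>('a' + (index - 26));
    if (index < 62) return static_cast<char>('0' + (index - 52));
    if (index == 62) return Variant::char62;
    return Variant::char63;
}

// The interface: an input, an output, and the variant as template parameter.
template <typename Variant>
void encode(const std::vector<std::uint8_t>& input, std::string& output)
{
    output.clear();
    for (std::size_t i = 0; i < input.size(); i += 3)
    {
        const std::size_t remaining = input.size() - i;
        const std::size_t count = remaining < 3 ? remaining : 3;   // bytes in this group

        // Pack up to three bytes into a 24-bit word, zero-filling the rest.
        std::uint32_t word = 0;
        for (std::size_t k = 0; k < 3; ++k)
            word = (word << 8) | (k < count ? input[i + k] : 0u);

        // count bytes yield count + 1 characters; the remainder is padding.
        for (std::size_t k = 0; k < 4; ++k)
        {
            if (k <= count)
                output += to_char<Variant>((word >> (18 - 6 * k)) & 0x3F);
            else if (Variant::pad != '\0')
                output += Variant::pad;
        }
    }
}

// Usage sketch with two hypothetical variants.
inline void check_encode()
{
    using standard  = mini_variant<'+', '/', '='>;
    using url_nopad = mini_variant<'-', '_', '\0'>;

    std::string out;
    encode<standard>({'M', 'a', 'n'}, out);
    assert(out == "TWFu");
    encode<standard>({'M', 'a'}, out);
    assert(out == "TWE=");
    encode<url_nopad>({0xFB, 0xFF}, out);
    assert(out == "-_8");
}
```

The call site names the variant once, as a template argument, and otherwise passes nothing but the data.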

(I won’t post the whole implementation here. While it’s only 200 airy and thoroughly commented lines, it’s better to give you a link to an archive with the files.)

Now, the astute reader will note that I only declared the coding functions earlier, and talked about the implementation as if it was separate. That’s the normal way of doing things, but that won’t work with templates, right?

Here’s the thing with template functions and classes: they are not just the one thing. A normal, non-template function is one single thing, fully defined once and once only. Therefore, it can be compiled in one compilation unit, and then linked to. It can be declared elsewhere, and that declaration is essentially a promise that somewhere this thing is defined, which is all the compiler cares about.

A template function is effectively a new function for each template parameter (or combination of parameters) it is used with. (As far as the compiler is concerned, a template class is just a way of saying that all member functions have the same set of template parameters.) To the compiler, there’s no such thing as a “template function”. There is only “template function with these template parameters”.
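A small experiment makes this visible. In the sketch below (the names `call_counter` and `check_call_counter` are made up for the example), each template argument produces a separate function with its own static variable, so the counters do not interfere with each other.

```cpp
#include <cassert>

// Each distinct T yields a separate function, and therefore
// a separate static counter.
template <typename T>
int& call_counter()
{
    static int count = 0;
    return count;
}

inline void check_call_counter()
{
    ++call_counter<int>();
    ++call_counter<int>();
    ++call_counter<long>();

    assert(call_counter<int>() == 2);    // only the <int> counter saw two increments
    assert(call_counter<long>() == 1);
    assert(&call_counter<int>() != &call_counter<long>());   // two different objects
}
```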

This means that there can’t be a single compiled variant that can be linked to, only specialisations that use this or that set of template parameters. Even if only one variant, only one specialisation, is used in your project, the compiler can’t know that.

So instead, the compiler generates these on demand; each distinct set of template parameters used in a call gets its own copy of the function, compiled with those parameters substituted in.

In other words, C++ templates are just a way of bullying the compiler into doing the copy-paste programming you are ashamed of doing yourself.

As it happens, though, that also provides the solution to separating definitions and declarations, provided you know what template parameters you’ll be using. All you need to do is declare the function with the parameters you want (an explicit instantiation), in the same compilation unit as the template function definition.
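A minimal sketch of that layout follows, using the names my_template.cpp and my_temp_func from the discussion. The header name my_template.h, the function body and the `use_my_temp_func` helper are placeholders of mine. The three files are shown concatenated so that the sketch builds as one unit.

```cpp
#include <cassert>

// ---- my_template.h: declaration only ----
template <typename T>
int my_temp_func(const T& t);

// ---- my_template.cpp: generic definition (placeholder body) ----
template <typename T>
int my_temp_func(const T& t)
{
    (void)t;                                // the value is unused in this sketch
    return static_cast<int>(sizeof(T));
}

// Explicit instantiation: compile the int variant in this compilation unit.
template int my_temp_func<int>(const int& t);

// ---- main.cpp: would only see my_template.h ----
inline void use_my_temp_func()
{
    const int result = my_temp_func(42);    // links: the int variant was compiled
    assert(result == static_cast<int>(sizeof(int)));

    // With the files kept separate, a call with a std::string argument
    // would compile against the header but fail at link time, since
    // no std::string variant was instantiated in my_template.cpp.
}
```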

In the example above, the declaration in the last line of my_template.cpp tells the compiler there’ll be a variant of the template function that uses int as template parameter. Okay, says the compiler, I’ll put a compiled copy there. Since the generic definition is right there in the same compilation unit (i.e. my_template.cpp), this is something the compiler can do – it has all the information it needs.

The result of that is that in the compiled file (probably called my_template.obj) there is now a function that has the signature int my_temp_func(const int& t). This is a fully defined specialisation of a template function, so to the linker it looks just like a normal function.

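However, if another compilation unit calls my_temp_func with a std::string, the call compiles against the declaration, but the linker won’t be able to find a string specialisation, so this will generate a linker error.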

This illustrates both how to use this trick, and its limitation. It only works if you list all specialisations you are going to use, which makes it unfeasible for generic libraries.

In this case, though, it’s ideal. I have my three variants of base64 defined – base64MIME, base64 and base64url – and those are the only ones I’ll need.
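Applied to the three variants, the instantiations could look like the sketch below. Only the names base64MIME, base64 and base64url come from this article; `variant_sketch` and `alphabet_tail` are simplified stand-ins of mine for the real variant type and coding functions.

```cpp
#include <cassert>
#include <string>

// Hypothetical stand-in for the variant type; the real one has more parameters.
template <char Char62, char Char63, int MaxLineLength>
struct variant_sketch
{
    static constexpr char char62          = Char62;
    static constexpr char char63          = Char63;
    static constexpr int  max_line_length = MaxLineLength;
};

using base64MIME = variant_sketch<'+', '/', 76>;
using base64     = variant_sketch<'+', '/', 0>;
using base64url  = variant_sketch<'-', '_', 0>;

// ---- header: declaration only ----
template <typename Variant>
std::string alphabet_tail();

// ---- .cpp file: definition, then one explicit instantiation per variant ----
template <typename Variant>
std::string alphabet_tail()
{
    std::string tail;
    tail += Variant::char62;
    tail += Variant::char63;
    return tail;
}

template std::string alphabet_tail<base64MIME>();
template std::string alphabet_tail<base64>();
template std::string alphabet_tail<base64url>();

// Usage sketch: callers only need the header and one of the three types.
inline void check_alphabet_tail()
{
    assert(alphabet_tail<base64>() == "+/");
    assert(alphabet_tail<base64url>() == "-_");
}
```

Note that the three variant types must be distinct types for this to work, since instantiating the same specialisation twice in one compilation unit is an error.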

This lets the linker find compiled varieties for all those base64 definitions.

Should you want to implement a slightly different base64 coder, you could use the code I’ve written. It wouldn’t be enough to declare a new definition type; you would also have to add a declaration using that type to the .cpp file. But the source code is both open and free, so help yourself.

(I should note that, like so much else, base64 has lots and lots of implementations available on the net, but most of the ones I’ve found were lax about checking syntactical correctness, or implemented just one flavour. Hence, writing my own.)

As always, if you found this interesting or useful, or have suggestions for improvements, please let me know.

About the Author

Orjan has worked as a professional developer - in Sweden and England - since 1993, using a wide range of languages (C++, Pascal, Delphi, C, C#, Visual Basic, PHP, Python and x86 assembler), but tends to return to C++.