Wednesday, August 13, 2008

In one of my earlier articles introducing programming with Boost libraries, I had demonstrated some simple techniques using boost::shared_ptr to share dynamically allocated object instances across scopes, and manage the life cycle of such objects effectively. This was an important step in the direction of eliminating memory leaks. But this was not a complete answer to all forms of memory leaks. This article helps you distinguish between "owning" and "aliasing" semantics for pointers and introduces another class from Boost's smart pointer stable - the boost::weak_ptr. Although not a smart pointer itself, the weak_ptr adds important capability that is vital in handling memory issues and thread safety of objects.

Imagine that you are building a linked list, and you decide to store the "head" node as a shared_ptr<Node> and encapsulate inside this node, the life cycle management of all the nodes in the entire list. That way, when the "head" node goes out of scope, all the other nodes are automatically deleted. How can you do it? Well, the head pointer stores a shared_ptr<Node> pointing to the first node, the first node stores a shared_ptr<Node> pointing to the second node, and so on.

This works remarkably well, and by now your penchant for using shared_ptr and RAII is seeing you write elegant code and wish away memory leaks. And then you are asked to enhance your linked list to a doubly linked list. With a deft touch you make the necessary changes - and you are up and running again ...

... except that you have a memory leak. Believe it or not, what I promised a couple of articles ago was a fib - you can leak memory without so much as nibbling on a raw pointer, and in spite of having wrapped all your heap memory in shared_ptr. Run it and you'll not see any of the messages from the destructor of Node. Think it through for a moment and the reason becomes clear.

In the function getList, initially, the nodes head, next1 and next2 each have a reference count of 1 (at lines 22, 23 and 24 respectively). Next, at line 26, next1 has its ref-count bumped up to 2. In the following line 27, next2 has its ref-count bumped up to 2. So far so good - the real fun starts now. In the next line 28, head has its ref-count bumped to 2, and finally next1 has its ref-count bumped to 3. When this function returns, each of the smart pointers has its reference count decreased by 1. But head is copied as a return value - so the decrement is offset, and when we reach the main function at line 35, its reference count remains 2. At this point, the reference counts of next1 and next2 are 2 and 1 respectively - but these objects are no longer in scope - they are completely encapsulated by head. At the end of the main function's scope, the ref-count of head drops back a notch to 1 - it never gets to 0, which is necessary for its internal pointer to get "destructed" and trigger the destruction of its subsequent nodes.

What caused the problem? We only introduced three lines of code - lines 11, 28 and 29 - so the problem would have to be here in these three lines.

Why did all hell break loose? Because we used the shared_ptr indiscriminately. The shared_ptr implements a "shared ownership" idiom. In this case, head "owns" the next1 node - meaning that before head is completely "destructed", it would have ensured complete "destruction" of next1. But next1 does not own head - it only has an alias to head, in the form of the member prev. But by making it into a shared_ptr<Node>, we have unwittingly made next1 the owner of head. So we hit the chicken-and-egg problem. To reiterate what we just said about ownership semantics - before head is completely "destructed", it should ensure complete "destruction" of next1, and before next1 is completely "destructed", it should ensure complete "destruction" of head. Of course that's an impossible scenario. So we need a different construct for handling the semantics of an aliasing pointer. Here is the code example again with some changes that work reasonably well.

The statement at line 6 is crucial, because if prev is not set to 0 at this point, an accidental dereferencing of new_head->prev at a later point in the code will bomb. At the end of the scope, at line 7, the head goes out of scope and is "destructed". What if this destruction automatically set any prev member pointing to it to NULL? Then we could afford to forget to write new_head->prev = 0, without facing any dire consequences.

The boost::weak_ptr is tailor-made for the purpose shown here - it models a "non-owning alias" which is automatically set to a null state when the object it points to is destructed. For this, weak_ptr always works in conjunction with a shared_ptr, and always points to an object via the shared_ptr that owns it. The following example should make all this clear:

The weak_ptr maintains a reference to the shared_ptr owning the object. But it does not bump the reference count of the shared_ptr - therefore any number of weak_ptr's can point to a shared_ptr without affecting its life cycle in any way. If the shared_ptr goes out of scope and the underlying object is destructed, the weak_ptr's are automatically "updated" about this change through the reference that each of them maintains.

In an informal way, weak_ptr's are to shared_ptr's, what soft links are to hard links on a Unix file system.

You can check the state of the underlying object through the weak_ptr, using the weak_ptr::expired() member function. You can also construct an owning reference - a shared_ptr which contributes to the reference count of the original shared_ptr - through the weak_ptr::lock() member function. This can indirectly be used to check the same condition as weak_ptr::expired() does. To understand that - look at the final example here:

The lock method is particularly crucial when sharing an object across threads. Some threads might only have an aliasing reference to the shared object in the form of a weak_ptr. This can be used to great effect. At any time when such a thread needs to share ownership of this object - it can call lock on the weak_ptr. It will get a non-null shared_ptr only if the object is still valid, and otherwise it will get a null shared_ptr.

Notice that the weak_ptr provides no way of dereferencing the underlying pointer - there are no overloaded -> or * operators, for example. If you need to use these operators, use the lock method to try and get a shared_ptr, and then apply them on the shared_ptr. In fact, a weak_ptr is not a smart pointer - I call it a "smarter" pointer.

Sunday, August 03, 2008

In this article, we would see how the use of templates can simplify and generalize the Expression Functors code, and improve performance by eliminating virtual functions.

In the example from the previous segment of this article, the Expression abstract base class represented the family of all types that could be used to represent different kinds of expressions. Two specific classes, Variable and Constant, represented the leaves of an expression tree - numeric literals and single variable names. These could be combined to form other expressions - generally represented by the ComplexExpression class. All these classes implemented the interface defined by the Expression abstract base class.

We now plan to switch from dynamic polymorphism to template driven static polymorphism. For example, given classes with a common interface but without a common base class, like the ones below:

We have just carried over the definition of the ComplexExpression class from the previous article and made it into a template class. The operator classes like Add, Subtract, Multiply and Divide should continue to work unchanged. Guess what, we just have to define the operator overloads (for +, -, * and /) and we will be done with defining our framework of algebraic expression.

In the above, x is a Variable, 3 should generate a Constant, x + 3 should result in a ComplexExpr<Variable, Constant>. Further, x*x is a ComplexExpr<Variable, Variable> and (x*x+3)*(x+3) is a ComplexExpr<ComplexExpr< ComplexExpr<Variable, Variable>, Constant>, ComplexExpr<Variable, Constant> >. One can trace the template parameter types as sub-trees of the class template ComplexExpression.

Clearly, to write an expression like x+3, we need an overload of operator+ between a Variable and an integer. To write an expression like (x*x+3)*(x+3) - we need an operator* between two ComplexExpressions. To write something like, x+Constant(3), we need an overload of operator+ between Variable and Constant. Now, just as we defined the basic operators on Expression in the previous article, we could do the same on the Expr class template here, instead of overloading each operator for different combinations of types. For example:

The other operators are not very difficult to write, following the above code. However, it turns out that while these do take care of operators between complex expressions as well as Variables and Constants, they cannot handle real number or integer literals. To handle real number literals, we define the following overload pairs for each operator.

As should already be apparent - there are no objects allocated on the free store, no reference counted proxy wrappers, and a fair bit of compile-time type computation and call dispatching - this results in significant savings in the runtime costs of the program, apart from the obvious cut in the number of lines of code. The basic philosophy of the template based program has not changed from the non-template version - it uses functional composition involving the function call operator, and operator overloading to support naturally combining simple expressions into arbitrarily complex expressions. However, the use of templates allows a fair bit of this work to be done at compile time. That is all there is to Expression Templates. You can put together the expression framework using the code given here and use the following function to test your code.

Here Predicate represents a functor (an object that overloads the function-call operator). In this particular case, the functor should take an element of the container of which first and last are iterators, and return a boolean truth value based on some criteria, possibly as a function of the element passed.

Using find_if, one can find all elements in a container that meet a particular criterion. What’s remarkable is that what find_if identifies as matching elements in the container is entirely dependent on the logic encoded in the Predicate functor – not on find_if’s implementation. This makes the functionality of find_if open-ended. What we are able to achieve in the process is a form of polymorphism with a very high degree of type-safety and performance. There is a specific name for such idioms – functional composition. In fact, by combining an arbitrary number of Functors with varied functionality, we can construct a fairly complex piece of functionality. This has a very useful application in developing mini-languages (also called DSELs or Domain Specific Embedded Languages) - expressions with a syntactic form alien to C++ syntax, embedded inside C++ programs as valid code. In this article and the next in this series, we look at the powerful technique called Expression Templates which helps solve these classes of problems.

We have all learned in our high school Calculus courses about such functions of a single variable as:

F(x) = x^3 + 3x - 7

or

G(x) = x·sin(x) + 6

Representing such simple algebraic functions using functors is not a big hassle. For example, to represent G(x), one could write a functor like the one below:

This is essentially an area under the curve calculation that takes the average of upper bound and lower bound sums.

Now here is the deal: in the course of mathematical programming, one could need to integrate dozens of such mathematical expressions. If there are 50 different expressions to be integrated, a program needs to define 50 different functors – and this is where the efficiency and ease of the system break down.

Imagine being able to define function objects that can evaluate arbitrary mathematical expressions and being able to pass such expressions to functions like integrate. Something like the following:

This almost looks like magic, doesn’t it – but it isn’t. The complex algebraic expression can, in each case, be engineered to yield a functor which evaluates the expression for different values of the function argument. The standard technique or idiom used for enabling such expression building is known as Expression Templates, and it is the focus area of this article. But Expression Templates have a notorious reputation of being difficult to understand and learn, and have turned off many a learner. Therefore, we would try to build the logic of constructing such expressions intuitively, bit by bit. We will start off with a completely non-template version of the code – an idiom that I discovered for myself and which I call Expression Functor. We would then ‘deduce’ the Expression Template idiom as a special case of this idiom, one which achieves phenomenal improvements in performance and code size, using templates.

So here we go.

Breaking down the problem

Let us first break the problem into its basic elements. We begin with a simple polynomial expression:

f(x) = x + 3

Such an expression has two kinds of entities – a variable (x) and a constant (the literal 3). Both these entities are valid expressions by themselves. For example, the straight line going parallel to the x-axis at a distance of 3 units above x-axis is represented by the function f(x) = 3. Similarly, the straight line going through the origin at an angle of 45 degrees to both x and y axes is represented by the function f(x) = x. Modelling these most trivial functions will be our building blocks for non-trivial expression building.

Consider the case of the horizontal line – f(x) = 3. We want to model a function which takes any value of x and always returns 3. The basic function would be:

Such a functor can be used to model any constant expression. C3 would now be equivalent to Constant(3).

Next, let’s try to model the 45 degree straight line through the origin – f(x) = x. Turns out that this is even easier – "given any value x, return that very value" should summarize our function. In short:

These two classes are remarkable on their own, but what happens when we try to model a function like:

f(x) = x+3

What type will x + 3 be? It is not a Constant, and it is not an independent variable like Variable either. So we possibly need another class to represent more complex expressions. Come to think of it, a Constant and a Variable are each, in themselves, an expression, and so is a more generic expression like x+3. So it would seem logical that we should have an Expression base class from which Constant, Variable and other expression classes can be derived. We define it like this:

There is a reason for the curious looking Expression* clone() virtual function. I have included this with the benefit of contrived foresight - having implemented this whole solution already. Without getting ahead of the story, let me tell you that it plays a small but important role in the life cycle management of Expressions. It is used to copy Expressions where needed.

We would refer to Expressions like Constant and Variable as simple expressions. We also need to define a sub-type of Expression to represent all non-trivial (i.e. other than simple) expressions. But before that can happen, we need to understand how multiple expressions can be combined using arithmetic operators. For example, we want to be able to write such expressions as x*x + 2*x + 1.

Clearly, we need to overload operators like * and + between Variables (x*x), between Constants and Variables (2*x), and between multiple ComplexExpressions (x*x and 2*x). Moreover, the literals like 1 and 2 are not Constant objects, they are doubles that need to be converted to Constant objects. So, these operators should also be overloaded for double arguments. Clearly, given the fact that all of these (Variable, Constant, ComplexExpression) are different sub-types of Expression, it will be easier if we just overload these operators between Expressions, and between Expressions and doubles.

Now any complex expression can be represented as:

f(x) = u(x) OP v(x)

OP is a binary arithmetic operator. For example:

f(x) = c is a degenerate case where, u(x) = 1, v(x) = c and OP = *. Or perhaps u(x) = 0, v(x) = c and OP = +. Assuming that we have such an operation available for each appropriate arithmetic operation, we write the class for complex expressions as a template class - the template parameter being the Binary Operation.

Finally, when we combine two simple Expressions, we get a complex expression. For example, x might be a simple expression, but x*x is a complex expression. We want to be able to cascade the operators to any degree, thus:

x*x*x

should be a valid expression. Clearly x*x must return an object of a type that can be combined with x using a * operator. x*x must return a reference or pointer to Expression - because Expression is an abstract class, it cannot be returned by value, and besides, there is not much point returning it by value because we want to treat it polymorphically (virtual function calls to operator() and clone() in ComplexExpression). Something like the following:

The above does not work because operator* returns a pointer to an Expression object - x*x returns a pointer. Consider an expression like x*x*2, which is equivalent to (x*x)*2 - since x*x returns an Expression*, we would need an operator* taking a pointer (Expression*) and a double (2). The Standard requires at least one parameter of an overloaded operator to be of class (or enumeration) type - a pointer and a double do not qualify, so such an overload is not allowed. Thus, operator* can only return a reference.

The above does not even work - because we are passing a reference to a local object created in the function. It has undefined behaviour. What do we do - well, a bit of inevitable complexity creeps in here. We need to return a pointer wrapped in smart wrappers, which can take care of the life-cycles of the underlying pointer and act as proxy objects when participating in mathematical expressions.

Consider the following definition of a reference counted wrapper cum Proxy for heap allocated Expression pointers:

Thus, the return value of x*x is of type ExpressionRef - it wraps a polymorphic reference to Expression, and ensures that there will be no memory leaks. The only problem now is to make expressions such as x*x*x valid. x*x*x translates to (x*x)*x - whereby x*x returns an ExpressionRef and x is some sub-type of Expression. This is not such a big deal - we define operator * (ExpressionRef&, Expression&). We also need the alternative permutation: operator * (Expression&, ExpressionRef&). In fact, to allow expressions such as (x*x)*(x*x), one must also allow operators like: operator * (ExpressionRef&, ExpressionRef&).

At this point, all the issues with our solution are resolved - and we need to take stock of the final set of operator overloads we need to make the expressions work naturally. Here is a summary: each operand can be an Expression&, an ExpressionRef& or a double, in every combination except (double, double).

That makes it 8 types of signatures. For each type, if we plan to implement +, -, * and /, then we will have a total of 32 operators. One set of operators can have real implementations and the rest can be defined in terms of the first set. Here are the definitions for the + operators.

Having come thus far, you may have made two observations. One, we have written code which is nifty and works well, but is repetitive in some parts and not very extensible. Two, although this article is about Expression Templates, we have hardly used any templates (except for the ComplexExpression class).

What are the key concepts in this code which enabled creating the Expression Functors?

1. Nesting function objects, with the operator() of outer objects calling the operator()'s of inner objects - for example, ComplexExpression::operator(). The formal term for this idiom is Functional Composition.

2. Overloading operators on Expression types.

What we've done so far was laying the foundation for our understanding of Expression Templates. The real deal should not take a lot of time after this. We will now head into investigating how this model of functional composition can be retained and generalized, performance of the Expressions phenomenally improved, and the lines of code drastically cut - by using Templates. This is the topic of the next part of this article.

Saturday, July 12, 2008

Ten nifty tricks to incorporate into your armoury in your first week of using Boost

Boost has a wide array of libraries, and they often differ in complexity, expanse and sometimes power, but not in usefulness. Obviously a utility to convert strings to integers will not have the same complexity or power as a library for functional compositions. But both are useful in their own right. Today, Boost has close to 40 libraries (maybe more), and this large number, along with the wide variety therein, sometimes serves to overawe the new learner. This article has a specific purpose - to get you started with a few, relatively easy, nifty tricks that you can employ in many general cases to add value and usability to your code, without having to be a Boost expert. Such head-starts serve to increase our motivation to use a tool, exposing us to the first hand benefits of a tool at the expense of a little effort. So without further ado, let me cut to the chase.

1. Time measurements and thermometer bars

The following piece of code illustrates how to use the Boost.Timer library to measure elapsed time between two points in code. Using this facility, you can easily measure the time taken to execute a certain section of code.

The Boost header we used for our purpose is progress.hpp. We create an array of 16 integers and simulate some processing on this array by iterating through the elements of the array and pausing for 3 seconds for each element to simulate a period of processing. We intend to find the time taken to set up the array and then perform this dummy operation. To do this, we enclose the relevant part of the code in braces to define a scope. At the start of the scope, we declare an object of boost::progress_timer and initialize it with a stream object which can be written to. At the end of the scope, the progress timer object writes the total time elapsed to the stream object. Thereafter we either extract the double precision floating point value of the time elapsed in seconds from the stream, or directly write the stream's contents to the standard output.

2. Tokenizing strings and seamless value conversions

Suppose you are working on a banking back-office system and are scrubbing the summary data from your bank's daily closing report. Now this report is in some delimited format which your report rendering software understands, but which is not XML or something. Your requirement is simple - you want to extract all the data from the report and do some processing on it. In other words, you need to parse the report data. Second, there are numeric items in the list as well and you want to do quick comparisons of some of those - like the age of the account holder and the net balance in her account. All these quantities will be read as strings from the input, so we will also need an effective and consistent way to convert these to numeric values. Parsing data of even moderate complexity can be a tough job, and while the standard library containers and string classes provide adequate tools for doing this job, the code you'd end up with often tends to be long and not very readable. And converting strings to numeric types isn't always straightforward either. On the other hand, you don't want to spend more than a half-hour writing the code. What do you do? Well, you use the Boost.Tokenizer and Boost.Conversion libraries and you are through.

Suppose the data comes to you in the form of several records delimited inside braces ( { and } ). Within each record, several fields are delimited by semi-colons (;) and each field is a name-value pair of the form name=value or name:value. You don't want to be bothered about verifying the correctness of the grammar - you could assume that the document is well-formed and you then want to extract the data. Here is some sample data.

Obviously, this code does a few things with the data, but certainly does not print it. For a nice way to print this data in a suitable format, look at section 4 below.

3. Expressive code with zero resource-leaks

In an ideal world, all libraries and APIs you work with will have perfect semantics and you will never need to make compromises of any kind in your code. In such a world, you would only need to work with first class C++ objects which take care of their own initialization and clean-up in the C++ tradition of construction and destruction. In the real world however, you would be writing C++ applications to talk to legacy systems using a C API that is less than ideal. In that real world, you will have strdup-style C APIs that return dynamically allocated storage to you and expect you to take care of the life cycle of the pointer, remembering to call free on the pointer once you are done. As a real life programmer you would surely face such dilemmas several times, in several forms - and that's where the family of Smart pointers comes into the picture.

Smart pointers are simple classes which can "wrap" pointers to objects of different kinds, and in some cases arrays of objects. The primary advantage in the use of such wrappers is the seamless life-cycle management of these pointers (through the RAII idiom) and exception-safety. In fact, the Smart Pointer collection from Boost is so good that you no longer have an excuse to write C++ code that has memory leaks. I am not about to start a discussion on either the RAII idiom or exception-safety - primarily for lack of space in this column, and because other sites handle these topics with more focus. Here we will see how the use of Boost's smart pointers provides power, simplicity and correct, reliable behaviour to your code.

Consider the following piece of code - let me tell you, it does not compile.

Given a C++ string, if you want to pass its contents as a char* to a function that takes a (non-const) char* rather than a const char* (like strucase above), you have to make a copy of the string's underlying character array into a new array, and pass the base address of that array instead. It's a bit of a fib when I say you have to - what good is const_cast for, after all? But then, what guarantee do you have that this strucase function, which takes a non-const char*, does not write to the string you passed? In fact, in this case strucase does write to the string, converting it in place to an all upper-case ASCII string. So the call to strucase above fails. The string returned by new_api can be of arbitrary length (not exceeding platform specific limits, still), so you cannot create an array, with a size known at compile time, to hold the copy of the string. In other words, you have to allocate memory dynamically, and whenever you do that there is the hassle of correctly deallocating the allocated memory once every part of your code is done with using it. The following code addresses the issue with a smart array from Boost.

The first good thing about the above code is that it ensures there is no memory leak - the array allocated with new[] is wrapped inside the scoped_array object sa, and at the end of the enclosing scope (in this case the main function), the destructor of scoped_array kicks in and deallocates the entire array by calling delete[] on it.

Now imagine that instead of all this code being written inside the main() function, it was a different function which carried out these operations and returned the allocated character array. Something like the following will obviously crash your code:

Returning and using the raw pointer this way is a recipe for disaster, because new_str is a copy of the pointer copied_str that the my_new_string() function passes back to you. And both of these are aliases or copies of the underlying pointer in the scoped_array in my_new_string(). As soon as my_new_string() returns, scoped_array makes sure that the memory pointed to by its underlying pointer is deallocated. So in the main() function, you are dealing with an invalid memory reference - new_str.

What do we do? Pass back a copy of the scoped_array object itself? For one, scoped_array cannot be copied. Even if it could be, that would work only if the wrapper detected copying and such replication operations, and kept a count of the places in the code that have valid references to the object. This is called reference counting, and scoped_array does not do it. Boost's shared_array does. All you need to change in the above code is this:

Now, whenever you need to pass around a dynamically allocated array, just pass around the shared_array wrapper for it. It takes care of everything else.

There are more goodies in the Boost Smart Pointer library, including its work-horse - shared_ptr. In fact, having already shown scoped_array and shared_array, it is not much work now to get up and running with scoped_ptr and shared_ptr. Both scoped_ptr and shared_ptr are capable of encapsulating a pointer to a single object on dynamically allocated storage. scoped_ptr is, of course, not copiable and not reference counted - suitable for operations limited to a single scope. shared_ptr, on the other hand, is a remarkably mobile smart pointer - you can freely copy it and pass it around, and it is fully reference counted. These classes overload the -> and * operators, and therefore working with them is a breeze. Here is a short example to wrap up this section.

Imagine how you would have handled the case where you could need to create arbitrary objects on the free-store with new and pass them around in the program, across scopes, sharing between multiple owning objects and scopes. Without the shared_ptr, such brevity was impossible.

In reality, shared_ptr models a "shared ownership" idiom that concerns managing the life cycle of the underlying heap-object. Sometimes, this is precisely what we want to avoid - ownership. In such cases, shared_ptr fails to prevent memory leaks and we need other alternatives. For more details, see weak_ptr's.

4. Using reference wrappers to write efficient code

Suppose you are dealing with objects of a class which cannot be copied, and perhaps not be assigned either. Maybe you are modelling humans in a social group, and one of the assumptions / constraints is that humans cannot be replicated or cloned. So a Human object would essentially be a non-assignable, non-copiable object. And yet, for the purposes of effectively managing large groups of humans, you have to use different kinds of containers. Maybe you are modelling the behaviour of humans in a queue and you choose an std::list here - because the Humans have the liberty to leave the queue at any point. A list perhaps also makes sense, because a Human might opportunistically join a queue at the middle rather than at the end where he is actually supposed to join - to reduce his personal wait time. Whatever the social engineering issue here, we have a bigger computer programming issue to beat. Humans are not copiable or assignable - but an std::list container requires its elements to be both. You have no choice - either make the Humans copiable (an untenable model) or roll out your own containers which can store object references - or so you sulk. Enter Boost reference wrappers - a feature that should actually make it to the next release of the C++ Language and Standard Library (C++0x) - and you have reason to cheer.

The idea behind reference wrappers is simple. Create a trivial wrapper around a reference to an object and make this wrapper copiable and assignable. This wrapper object stores a reference or pointer to the real object. The wrapper object can be easily copied and assigned - without the underlying object getting replicated. Finally, the wrapper object should transparently degrade to the underlying type in contexts where an object of the underlying type is needed. This makes a reference wrapped object very easy to store in STL containers and easy to use in STL algorithms. The boost::reference_wrapper<T> template conforms to all these requirements and does a great job of it. Consider the following code.

The above code defines the Human class, and each object of the class gets a unique id at the time it is created - which distinguishes it as a unique individual, distinct from others of the same kind (Humans).

Four humans are created and they are stored in a fixed order in a vector of reference wrapped Humans. This is necessary because we could not have had a vector<Human> since Human is not a copiable class. Since boost::reference_wrapper<T> is copiable, we can create a vector<boost::reference_wrapper<T> >.

Finally, we print the unique id of each Human in the people vector. Notice that we actually wrapped and stored references to const Human, creating the wrappers from regular objects with boost::cref(...). You could have used boost::ref(...) instead to create non-const references. Most importantly, the functor PrintId does not need to know anything about reference wrappers. Its overloaded operator() takes a normal reference to a const Human, and when the for_each algorithm is used, the reference wrapper elements from the vector gracefully degrade to const Human&. This is a huge convenience when you consider how brief the code is.

But if you thought this was the limit of reference_wrapper's applicability, you have been deceived. The reference_wrapper class is not only an idiom-enabling construct (allowing you to store references in STL containers); it is also an optimization-enabling construct, and in many cases changing existing code to use reference_wrappers can reap tremendous performance benefits. Consider the following piece of STL code:

This shows a regular class Foo, which retains a unique id (as in the Human example), being stored in a vector< Foo >, and then the functor Bar<Foo const&> being invoked on each element of the vector. The Foo object is copiable and assignable, although on copy-construction the new object gets its own unique id, and on assignment this unique id is retained and not overwritten. Whenever a new object is created through the default constructor, a message of the form:

Object constructed: id = <object_id>

is printed. Whenever a new object is created through the copy constructor, a similar message of the form:

If I remove the call to the reserve method of the vector at line 46, you might see a few more copies of the Foo objects being made as the vector initially expands upon each push_back. On my system (MSVC 9), with line 46 commented out, the output looks like:

Things look only marginally better with gcc 4.1. This copying is at its peak when the vector starts to grow, because the vector's storage is repeatedly reallocated; it becomes much less pronounced as the vector gets larger. This should be reason enough to use the reserve method to avoid unnecessary reallocation and copying.

If you look closely at this piece of code, we are not using the vector after any of the Foo objects f1, f2, f3 or f4 go out of scope. So, if we could store references to the Foo objects in the vector, it would be a lot more efficient - with almost no copying needed. Here is the code for it. You can compare its output with that of the code above.

This is an important optimization - and there is a lot of scope for this kind of thing, so you should always look out for opportunities to apply it in your code, using boost::ref and boost::cref.

5. Expressive use of STL algorithms with lambda functions

This short section is the least trivial of all the "week 1" techniques we are going to discuss, because here I will whet your appetite for the enormous amount of functional programming that Boost supports. We will take a look at the Lambda library - a library for creating unnamed functions, based on expression templates.

The beauty and power of the Lambda library lies in the fact that one can create specialized functions (or rather functors or function objects) on the fly and invoke them by various means. These functions don't need a name and have an implicit prototype that can easily change.

In the example in section 2 (on string algorithms) above, we extracted the AccountSummary data for each individual from the delimited text stream. The data is now present in the form of a vector. What if we want to write this data in a suitable format to the standard output? Say we want to output each record in a block, like:

We could write a function object and pass it to the std::for_each algorithm to do the job for us - for_each iterates over each element in the vector and prints it in the format supported by the functor. But soon after, you discover that you also need to put this data in a different, delimited format like:

What do you do - fire up your C++ editor and add another functor? The number of formats you might want your data printed in is really open-ended - maybe you just want to summarize the name and age of the individuals - yet another functor? You must be thinking by now that you want an easier way to do this. After all, you might be defining all these functors in some other source or header file, while using them somewhere completely removed from where they are defined. Your operative call would probably just look like:

There is nothing wrong with such code, provided we have reasonable names for the functor classes, like PrintBlocks, PrintPipeDelimited or PrintSummary (all of which are assumed to have a trivial default constructor here). But can you look beyond the PrintBlocks name and figure out the format in which the data is actually printed - or what constitutes a summary in PrintSummary? You cannot - and you don't want to depend on an aspect of programming that hinges on the appropriateness of identifier names. Besides, such formatting detail is not worth encapsulating in functors and storing away - how reusable are these functors really going to be? This is where Boost Lambda helps you be expressive. Below is the slightly curious looking rewrite of the above code using Boost Lambda, with all the formatting logic spelled out in place. The code essentially extends what's already there, so only the newer parts are provided, with a bit of the old code for context. Please don't be exasperated by the syntactic unfamiliarity - try to get the big picture of what's being done.

As you can see, we have printed the data in the AccountSummary records in three different formats - blocked, delimited and summary information. We didn't use a for-loop, but rather the for_each algorithm, to iterate through the records in each vector. The for_each algorithm takes three arguments - two iterators that define the range of records to be handled, and a function pointer or functor which takes a single element of the container (in this case an AccountSummary object) and does some processing on it. The for_each statements for the three cases (blocked, delimited and summary) appear at lines 89, 102 and 113 respectively. In each case, the first two arguments are the iterators marking the beginning and past-the-end location of the AccountSummary vector. But the third argument is a Lambda expression - an unnamed function that evaluates to a function object.

A Lambda expression is really just an expression, whose value is a function object. Just as 5+3 is an expression which results in an integer object, the Lambda expression is an expression which generates a callable entity - a functor object. The type of this functor object is a derived type which is of no consequence here. The Lambda expression is really a way of composing smaller function objects into a more complex function object, using overloaded operators to string together the smaller function objects. Thus expressions such as bind(...), constant(...) and _1 are all function objects with overloaded operators between them. Indeed, a Lambda expression is an excellent example of functional composition - the core idea in functional programming. More explanation follows.

In each of these Lambda expressions, the curious looking _1 represents a "placeholder" for the first argument to the function. Because a Lambda expression evaluates to a functor of some sort, and this functor has a signature of its own, we naturally need a way to reference the arguments to this functor inside the Lambda expression. Boost's Lambda expressions support up to 9 arguments, represented by default by the placeholders _1 through _9. They are just normal C++ objects (of some type, not important here), cleverly named using the fact that a valid C++ identifier can start with an underscore. In this case there is only one argument to deal with - the current element in the vector - so we only need _1. But we need to get at its members.

An expression like bind(&elem_type::name, _1) creates a functor which returns the value of the data member name of the object represented by _1, whose type is elem_type. Here elem_type is typedef'd to mean the value_type of the vector (i.e. AccountSummary). For example, if you were to use for_each to iterate over a map of string keys and double values, then at each step you'd be getting elements whose type is std::map<string, double>::value_type. This is essentially an std::pair<string, double> with members first (string) and second (double). Then, in order to construct a functor which returns the string key, we would use a bind expression like this:

bind(&std::map<string, double>::value_type::first, _1);

We can almost always abridge these expressions by using suitable typedefs. For example:

Typically, we should try to use a single typedef'd name, corresponding to the type of _1 (or whichever argument placeholder).

Expressions like constant("name: ") or constant('\n') represent functors which return a constant value. We don't want literals in the lambda expression, but rather function objects which return such literals, because they need to be invoked on each iteration - not evaluated just once. You may have noticed, however, that "Delimited:" and "Block:" are not enclosed in constant(...). In the output, name: and age: - which were enclosed in constant(...) - are repeated for each record, while "Delimited:" and "Block:" are printed only once, precisely because they were not.

In the same way as constant, there is another construct - var, which can wrap non-literal object instances. We haven't used it here but we could.

When all of these are combined into a single Lambda expression, it represents a callable entity - essentially a generated super-function, built from the above individual functional units. By now, you should have a gut feel of how Lambda expressions work - but I will leave you with another example to squint your eyes at.

The real power of Lambda expressions is the ability to define a callable entity at the site of invocation in a fairly intuitive and succinct manner. There is some syntactic baggage involved - and I feel that's the only factor weighing against Lambda expressions. It takes a bit of mastering, but then you'd be on your way to leveraging their full power.

6. Platform-independent code to manipulate files on the disk

7. Quick and easy threads

8. An extensible way to handle command-line arguments

9. Matching patterns and replacing text using Regular Expressions

Regular expressions are an important programming tool, extensively used in applications for filtering and parsing text and for matching names of objects against patterns. Every language used for application programming owes itself a regular expression facility, and with the slick and fast Boost.Regex, C++ gets its first standard regular expression library.

This section cannot serve as an introduction to the theory and syntax of regular expressions. Assuming that you have a working knowledge of Regular expression syntax as used by grep and egrep on Unix, or the Perl language, we get started here with a few short examples of how to use the Boost.Regex library.

To use the Boost.Regex library, you should know its basic object model - the main classes whose objects you need to define and use for your Regex work.

A regular expression is encapsulated by an instance of the class boost::regex. In fact, boost::regex is a typedef for boost::basic_regex<char>. There is a wide character version available - boost::wregex, which is a typedef for boost::basic_regex<wchar_t>.

Searching for a regular expression in an input can produce matches at several levels - including matches for sub-expressions within the regular expression. All such details are reported back in an object of type match_results. Once again, match_results is a class template, and there are concrete instantiations like cmatch and smatch, and their wide character variants wcmatch and wsmatch, corresponding to specializations for C-strings (char*), C++ strings, and their wchar_t counterparts.

The regex_iterator helps you run through an input sequence, identifying all matches for a particular regular expression. This is a really handy tool.

Finally, there is the regex_token_iterator iterator adaptor, which makes splitting input on delimiters (like split in Perl) as easy as it gets in C++.

In our first example of this section, we will write a simple egrep-like utility which looks for regular expression patterns in a text stream, and prints out lines matching the pattern. It also prints the line number, and the segment of the line that matched the regular expression (in brackets). The program is invoked with a regular expression and one or more input file names as command-line arguments.

If you notice, the above example has a small issue. It picks up each line, searches for a match for the regular expression, and as soon as the first match is found, it prints the line number, line and the matching segment (in brackets). It then moves to the next line. If there are multiple segments in a single line matching the given regular expression, we miss all of those segments except the first one.

While it is possible to call the same regex_search function in a loop, using string iterators to run through all matches in a line of text, the code doesn't look very good. When you have to run through all distinct segments of an input sequence that match a regular expression, use the regex_iterator. A regex_iterator "points to" a match_results object, which represents a segment of input that matches the regular expression (along with sub-expression matches, etc.). Thus, using regex_iterator, we loop over all matches in an input sequence. Here is the slightly altered code ... only the changed lines are highlighted.

There are just three things to note in the above example. On line 39, we instantiate two objects of type sregex_iterator - a typedef of regex_iterator for std::string input. The second of these objects (end) is default constructed, and serves as an end-marker or null-marker. Finally, on line 42, (*it) is an smatch (match_results) object, and (*it)[0] represents the part of the input that matched the whole regular expression. If there are N sub-expressions within the regular expression, (*it)[1] through (*it)[N] represent the parts matching those sub-expressions. The str() member function gives the matching part as a C++ string.

10. Easy date and time measurements

Having come thus far, I must clarify that these ten libraries are in no way the ten most important libraries to know in Boost. There are many others which find no mention in this article - Boost.Interprocess, Boost.Asio, Boost.IoStream, Boost.Graph and the seminal Boost.MPL (the Boost Template Metaprogramming Library) among many others. The reason for not covering these is that they are more involved libraries, and we would need to devote more space and time to each one to leverage their power. Therefore, they don't belong in a '10 nifty tricks' list. I'll be back soon with some of these topics.