Introduction

Don Clugston explains member function pointers and their behavior in a very well-organized way in his article "Member Function Pointers and the Fastest Possible C++ Delegates". In short, a member function pointer cannot be assigned to a void * data pointer, so special care needs to be taken to treat a member function pointer as data (a functor). Don's FastestDelegate is, as he claims, the fastest possible delegate, but it relies on reinterpret_cast<> to deceive the compiler, which he himself calls a 'horrible hack' in his article.

Some time later, Sergey Ryazanov came up with a fast C++ delegate which he found to be as fast as Don's FastestDelegate, but completely C++ Standard-compliant. His technique is to use a templatized static member function (a 'stub') to store and restore the function call information (return type, argument types, cv-qualifier, and platform-specific calling convention) at compile time. Actually, the idea isn't new: a similar approach was introduced by Rich Hickey in his 1994 article "Callbacks in C++ Using Template Functors", where he called such a static member function a 'thunk'. However, Sergey's technique is unique in that he passes the member function call information through a non-type template parameter. Unfortunately, not many commercially available compilers support this perfectly standard C++ feature, so his code is not really portable in that sense.

There is one more great article on implementing C++ delegates here on CodeProject: "Yet Another Generalized Functors Implementation in C++", by Aleksei Trunov. He explains and analyzes in detail the requirements for generalized functors and the problems of existing delegates. His article inspired me to start implementing my own fast delegate, which is explained below.

By the way, these techniques are considered 'fast' because they avoid heap memory allocation when storing member function pointers, unlike boost::function.

Another fast C++ delegate? So, what's new?

No, there is nothing new in my fast delegate. All the features that I will show in this article already exist in others' implementations:

Fast delegate. (No heap memory allocation in 'most' cases.)

Support three callable entities (free function, member function, and functor).

But, what if these features are all available in one? Doesn't that sound promising? Let's see!

Type restoring without heap memory allocation

Let's start off by revisiting Sergey's technique. The only time the type information of the callee object is available is when the member function pointer is being assigned to the delegate; thus, some kind of mechanism is required to store the type information of the member function pointer in an untyped, universal form, and then to restore it whenever the delegate is invoked later. Sergey uses a templatized static member function called 'method_stub' for this purpose, and also passes the member function pointer as a non-type template parameter. These two techniques make it possible to avoid heap memory allocation. But both of these C++ Standard-compliant features are accepted only by relatively new compilers.

From my own experience, I know that old compilers such as VC6 do not like to deal with templatized static member functions. I also know that, in such a case, using a static member function of a nested class template makes those compilers happy. So, I changed the stub to a static member function of a nested class template.
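The original listing is not reproduced here, so the following is only a minimal sketch of the idea for a one-argument delegate. All names (`stub_delegate1`, `method_stub`, `from_method`, `counter`) are hypothetical, and the member function pointer is still passed as a non-type template parameter at this stage; only the stub has moved into a nested class template.

```cpp
#include <cassert>

// Hypothetical one-argument delegate; all names here are illustrative.
template <typename R, typename A1>
class stub_delegate1 {
    typedef R (*stub_type)(void* obj, A1 a1);

    void*     obj_;    // untyped pointer to the callee object
    stub_type stub_;   // plain function pointer that remembers the real types

    // The stub is a static member function of a nested class template
    // rather than a static member function template.
    template <typename T, R (T::*TMethod)(A1)>
    struct method_stub {
        static R invoke(void* obj, A1 a1) {
            return (static_cast<T*>(obj)->*TMethod)(a1);
        }
    };

public:
    stub_delegate1() : obj_(0), stub_(0) {}

    template <typename T, R (T::*TMethod)(A1)>
    static stub_delegate1 from_method(T* obj) {
        stub_delegate1 d;
        d.obj_  = obj;
        d.stub_ = &method_stub<T, TMethod>::invoke;
        return d;
    }

    R operator()(A1 a1) const { return (*stub_)(obj_, a1); }
};

struct counter {
    int total;
    counter() : total(0) {}
    int add(int n) { total += n; return total; }
};

// Small self-check: binds counter::add and calls it twice through the stub.
int stub_demo() {
    counter c;
    stub_delegate1<int, int> d =
        stub_delegate1<int, int>::from_method<counter, &counter::add>(&c);
    d(2);
    int result = d(3);
    assert(c.total == result);
    return result;
}
```

The delegate itself holds only a `void*` and a plain function pointer, so no heap allocation is involved; the type information lives in the instantiated stub.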

One problem is solved, but we still need a 'C++ Standard-compliant but portable' mechanism to substitute for passing the member function pointer as a non-type template parameter. The TMethod should be stored in an untyped form in the delegate class. But, as we have all learned from Don's great article, the size of a member function pointer varies according to the inheritance traits of the class it belongs to, as well as from one compiler vendor to another. Therefore, dynamic memory allocation is 'inevitable' unless we decide to use a really huge buffer. (Don's analysis showed that the maximum size of a member function pointer is about 20 to 24 bytes, but I don't want to add a disclaimer saying that "this delegate class works only if the size of the member function pointer is less than or equal to 24 bytes".)

If you've read Rich's article "Callbacks in C++ Using Template Functors", you will realize that what I am doing here is just a replication of what he did 10 years ago. But after reading Aleksei's article "Yet Another Generalized Functors Implementation in C++", I realized that his meta meta-template technique, which makes a class behave differently according to the size of the member function pointer, could be applied here, and that I might be able to create something neat.

By using the If <> meta meta-template, the member function pointer can be stored in an internal buffer, whose size is fixed at compile time, whenever the pointer is no larger than that buffer; if the member function pointer turns out to be larger than the internal buffer, we can still deal with it by allocating heap memory of whatever size is needed. The compiler makes this decision for you automatically.
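A rough sketch of this selection, under stated assumptions: the `If` template and the policy names `fp_by_value`, `fp_by_malloc`, and `select_fp` are simplified illustrations, `buffer_size` is a hypothetical constant, and a 32-byte struct stands in for an oversized member function pointer so that the example behaves the same on every compiler.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>
#include <new>

// Compile-time selector: yields Then when Cond is true, Else otherwise.
template <bool Cond, typename Then, typename Else>
struct If { typedef Then type; };
template <typename Then, typename Else>
struct If<false, Then, Else> { typedef Else type; };

const std::size_t buffer_size = 8;   // hypothetical default buffer size

struct fp_storage {
    unsigned char buffer[buffer_size];
    void*         heap;
};

// Policy used when the pointer fits: copy its bytes into the buffer.
struct fp_by_value {
    enum { uses_heap = 0 };
    template <typename F> static void init(fp_storage& s, F fp) {
        std::memcpy(s.buffer, &fp, sizeof(F));
    }
    template <typename F> static F get(const fp_storage& s) {
        F fp;
        std::memcpy(&fp, s.buffer, sizeof(F));
        return fp;
    }
    static void destroy(fp_storage&) {}
};

// Policy used when it does not fit: copy its bytes into a heap block.
struct fp_by_malloc {
    enum { uses_heap = 1 };
    template <typename F> static void init(fp_storage& s, F fp) {
        s.heap = ::operator new(sizeof(F));
        std::memcpy(s.heap, &fp, sizeof(F));
    }
    template <typename F> static F get(const fp_storage& s) {
        F fp;
        std::memcpy(&fp, s.heap, sizeof(F));
        return fp;
    }
    static void destroy(fp_storage& s) { ::operator delete(s.heap); }
};

// The compiler picks the policy from sizeof(F) alone.
template <typename F>
struct select_fp {
    typedef typename If<(sizeof(F) <= buffer_size),
                        fp_by_value, fp_by_malloc>::type type;
};

int twice(int x) { return 2 * x; }
struct oversized { char bytes[32]; };   // stands in for a large member function pointer

bool storage_demo() {
    typedef int (*fn_type)(int);
    typedef select_fp<fn_type>::type   small_policy;
    typedef select_fp<oversized>::type large_policy;

    fp_storage s1, s2;
    small_policy::init(s1, &twice);
    oversized big = oversized();
    big.bytes[5] = 'x';
    large_policy::init(s2, big);

    bool ok = small_policy::get<fn_type>(s1)(21) == 42
           && large_policy::get<oversized>(s2).bytes[5] == 'x';
    small_policy::destroy(s1);
    large_policy::destroy(s2);
    return ok;
}
```

Because both policies expose the same static interface, the delegate code that uses them does not need to know which one was selected.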

Now we have a 'C++ Standard-compliant and portable' substitute for Sergey's member function pointer as a non-type template parameter. Two or three levels of indirection have been added, but a compiler with decent inlining can optimize them away and produce code that is equivalent to Sergey's.

In addition to this, we now have a binary representation of the member function pointer as an internal data structure, so it becomes possible to compare it with others. In other words, we can use delegates in STL containers that require their elements to be comparable. (See the demo project included.)

The size of the internal buffer can be customized by defining the proper macro before including the header file. I chose 8 bytes as the default size based on the table of member function pointer sizes in Don's article. (I mostly use MSVC and have never used virtual inheritance, so 8 bytes is sufficient for me, but again, you can customize the default buffer size.)

Object Cloning Manager Stub (New)

In the previous version, only a pointer (reference) to the bound object could be stored in the delegate, for the member function pointer to be called on when the delegate is invoked later (argument binding). I decided to add support for cloning the bound object into the delegate, so that the member function can be invoked on an internal copy of the bound object rather than through a pointer to the original.

Again, the only time the type information of the member function pointer or its bound object is available is when the member function pointer is being bound to the delegate. Therefore, some sort of type-retaining 'stub' for object cloning and destruction is required. A static member function of a nested class template, similar to the one used for invoking the member function pointer, can be applied here again.

The size of the bound object is unknown; it can be anywhere from a few bytes to several hundred bytes, or even more. Thus, heap memory allocation/deallocation is inevitable. This conflicts with the design criteria of 'fast delegates', since the main purpose of using a fast delegate is to avoid the use of heap memory at all costs. (A custom memory allocator, which will be addressed later, can help to mitigate this issue.)
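A minimal sketch of such a cloning stub, assuming a single manager function that handles both operations. The names `clone_op`, `obj_manager`, `bound_object`, and `tracked` are hypothetical, and plain `new`/`delete` is used where a custom allocator could be plugged in.

```cpp
#include <cassert>

// Hypothetical operations a clone-manager stub has to support.
enum clone_op { op_clone, op_destroy };

typedef void* (*obj_manager_type)(void* obj, clone_op op);

// T is known only while this template is instantiated, i.e. at binding time.
template <typename T>
struct obj_manager {
    static void* manage(void* obj, clone_op op) {
        T* typed = static_cast<T*>(obj);
        if (op == op_clone) return new T(*typed);   // runs T's copy constructor
        delete typed;                               // op_destroy: runs T's destructor
        return 0;
    }
};

// Type-erased holder: keeps a heap copy plus the stub that knows its type.
class bound_object {
    void*            obj_;
    obj_manager_type mgr_;
public:
    template <typename T>
    explicit bound_object(const T& src)
        : obj_(new T(src)), mgr_(&obj_manager<T>::manage) {}

    bound_object(const bound_object& o)
        : obj_(o.mgr_(o.obj_, op_clone)), mgr_(o.mgr_) {}

    bound_object& operator=(const bound_object& o) {
        if (this != &o) {
            void* copy = o.mgr_(o.obj_, op_clone);
            mgr_(obj_, op_destroy);
            obj_ = copy;
            mgr_ = o.mgr_;
        }
        return *this;
    }

    ~bound_object() { mgr_(obj_, op_destroy); }

    void* get() const { return obj_; }
};

// Counts live instances so the self-check can see copies and destructions.
struct tracked {
    static int live;
    int value;
    explicit tracked(int v) : value(v) { ++live; }
    tracked(const tracked& o) : value(o.value) { ++live; }
    ~tracked() { --live; }
};
int tracked::live = 0;

int clone_demo() {
    {
        tracked t(7);
        bound_object a(t);   // clones t
        bound_object b(a);   // clones again, through the stub
        assert(tracked::live == 3);
        assert(static_cast<tracked*>(b.get())->value == 7);
    }
    return tracked::live;    // every copy was destroyed through the stub
}
```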

Actually, I decided to introduce the cloning feature in my delegate for smart pointer support. Unlike C#, C++ has no built-in garbage collector, but we do have smart pointers. To work with a smart pointer, the delegate must be able to copy or destroy the smart pointer instance in a type-safe manner (in other words, the appropriate assignment operator or destructor of the smart pointer must be called). We already have a 'stub' function to serve this purpose. But there are still two more prerequisites to be fulfilled. (The idea is borrowed from boost::mem_fn.)

A function, get_pointer(), which takes a reference or const reference to the smart pointer and returns a pointer to the stored target object, must be provided in a namespace where it can be found (including through argument-dependent lookup).

The smart pointer class must expose a public interface (typedef) of element_type. (std::auto_ptr<T>, boost::shared_ptr, and its sibling, loki::smartPtr expose this public interface.)

My delegate implements two versions of get_pointer() by default.

boost::shared_ptr defines get_pointer() for itself in the boost namespace. Thus, compilers that implement Koenig lookup (argument-dependent lookup), such as VC7.1 or higher and GCC 3.x, will be able to see that definition, so boost::shared_ptr is recognized and supported by my delegate without adding a single extra line of code. But for compilers that don't implement argument-dependent lookup properly, or don't have it at all, we can help by providing the appropriate get_pointer() in the fd namespace.
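The library's own overloads are not reproduced here. As a rough sketch of how the two prerequisites fit together, the example below uses a hypothetical, non-owning `user::simple_ptr`, a hypothetical `sketch` namespace standing in for the delegate's namespace, and an `id_of()` helper standing in for the delegate's internals.

```cpp
#include <cassert>

struct widget { int id; };

namespace user {
    // Hypothetical, non-owning smart pointer; just enough to show both prerequisites.
    template <typename T>
    class simple_ptr {
        T* p_;
    public:
        typedef T element_type;              // prerequisite: element_type typedef
        explicit simple_ptr(T* p) : p_(p) {}
        T* get() const { return p_; }
    };

    // Prerequisite: get_pointer() in the smart pointer's own namespace,
    // where argument-dependent lookup can find it.
    template <typename T>
    T* get_pointer(const simple_ptr<T>& sp) { return sp.get(); }
}

namespace sketch {
    // Fallback for plain pointers: nothing to unwrap.
    template <typename T>
    T* get_pointer(T* p) { return p; }

    // Stand-in for the delegate's internals.
    template <typename P>
    int id_of(const P& p) {
        return get_pointer(p)->id;   // unqualified call, so ADL can take part
    }
}

int get_pointer_demo() {
    widget w = { 42 };
    user::simple_ptr<widget> sp(&w);
    int from_raw   = sketch::id_of(&w);   // uses sketch::get_pointer(T*)
    int from_smart = sketch::id_of(sp);   // uses user::get_pointer via ADL
    assert(from_raw == from_smart);
    return from_smart;
}
```

On a compiler without working argument-dependent lookup, the second call would not find `user::get_pointer`, which is why an overload in the delegate's own namespace is the fallback.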

The Preferred syntax can only be accepted by relatively new C++ Standard-compliant compilers such as VC7.1 or higher, or GNU C++ 3.XX, while the Portable syntax is supposed to be accepted by most compilers (I assume that my fast delegate can be easily ported to other compilers without any significant problems, as it is proven to work even in the notorious VC6). (Remark: I tested my delegate only in VC6, VC7.1, and DEV-C++ 4.9.9.2 (Mingw/gcc 3.4.2).)

When both the Preferred syntax and the Portable syntax are supported, it is fine to mix the two (in copying, comparison, copy-construction, assignment, and so on). Therefore, all the example code snippets hereafter are written in Portable syntax.
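One common way to let the two syntaxes coexist, in the style of boost::function, is to have the function-type form derive from the arity-numbered form. The sketch below assumes that arrangement and handles free functions only; `fn_delegate1` and `fn_delegate` are hypothetical names, and the real header is more involved.

```cpp
#include <cassert>

// 'Portable' form: the arity is part of the class name.
template <typename R, typename A1>
class fn_delegate1 {
    R (*fn_)(A1);
public:
    fn_delegate1() : fn_(0) {}
    fn_delegate1(R (*fn)(A1)) : fn_(fn) {}
    R operator()(A1 a1) const { return fn_(a1); }
    bool operator==(const fn_delegate1& other) const { return fn_ == other.fn_; }
};

// 'Preferred' form: the primary template is only declared...
template <typename Signature>
class fn_delegate;

// ...and a partial specialization on a function type forwards to the portable
// class. Older compilers such as VC6 cannot handle this specialization.
template <typename R, typename A1>
class fn_delegate<R (A1)> : public fn_delegate1<R, A1> {
    typedef fn_delegate1<R, A1> base_type;
public:
    fn_delegate() {}
    fn_delegate(R (*fn)(A1)) : base_type(fn) {}
    fn_delegate(const base_type& other) : base_type(other) {}
};

int twice(int x) { return 2 * x; }

bool syntax_demo() {
    fn_delegate<int (int)> preferred(&twice);
    fn_delegate1<int, int> portable = preferred;   // copies the shared base part
    fn_delegate<int (int)> back(portable);         // and converts back again
    return portable(4) == 8 && back(5) == 10 && preferred == portable;
}
```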

Wrapping three callable entities

A callable entity is something to which the function call operator () can be applied. The three callable entities are:

Free functions (including static member functions),

Member functions, and

Functors (Function objects).

These three callable entities can be assigned to fd::delegate in much the same way as with boost::function, except for functors.

Probably, this might not be what you wanted to achieve. You might want to declare a delegate more like: fd::delegate1 <void, int>, rather than: fd::delegate2 <void, CBase1 *, int>. If so, it is called 'argument binding' for a member function, and will be addressed later.

It is very interesting that a member function can be adapted and invoked this way, with the callee object passed as the first argument. While I was using boost::function in this way, I almost had the illusion that a raw member function pointer could be called in the same way, and I actually even tried it (and the compiler complained to me :P). It is a special provision, and it involves a lot of internal coding to create such an illusion. I will call it a 'member function adapter'.
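The adapter behavior can be seen with std::function, the standardized descendant of boost::function, which is used here as a self-contained stand-in for fd::delegate2. `CBase1::foo` is a hypothetical member made up for the example.

```cpp
#include <cassert>
#include <functional>

struct CBase1 {
    int last;
    CBase1() : last(0) {}
    void foo(int n) { last = n; }
};

int adapter_demo() {
    CBase1 b;

    // The wrapper adapts the member function so that the callee object
    // becomes an ordinary first argument.
    std::function<void (CBase1*, int)> f = &CBase1::foo;
    f(&b, 123);

    // A raw member function pointer offers no such call syntax:
    void (CBase1::*pmf)(int) = &CBase1::foo;
    // pmf(&b, 1);          // does not compile
    (b.*pmf)(b.last + 1);   // the built-in syntax must be used instead

    return b.last;
}
```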

I didn't consider including functor support in the beginning. When I changed the plan (even after I had completed the tedious and boring code duplication for calling conventions), I tried to implement the normal assignment operator (operator =) for functors, but it caused so many overload ambiguity issues. So, I almost gave up on this support and planned to make the user implement it, something like:

Silly me! I hope you will be happy with operator<<= instead of the above.

A delegate can't be a delegate without something to represent. That is, the wrapped target callable entity must be in a valid state when the delegate is invoked. This behavior is somewhat different from how boost::function is assigned from a functor. By default, boost::function clones the target functor internally (heap memory allocation) unless boost::ref or boost::cref is used explicitly. In the previous version, my delegate only stored a reference (pointer) to the assigned target functor. So, if it is a stateful functor, the caller is responsible for keeping the target functor intact until it is called (the exact same idea applies to the callee object bound for a member function, described later).

But in the new version, I added the feature of cloning the bound object, so the syntax of operator<<= has been changed to distinguish the reference (pointer) storing version from the cloning version. There is also a special constructor for functors which accepts a dummy bool as its second argument.

A member function pointer must be called on a callee object of the same type. The callee object is bound as a reference (pointer), so it must be in a valid state when the delegate is invoked. It is the caller's responsibility to keep the callee object intact until it is called.

In the new version, the bound object can be cloned internally, or even a smart pointer can be bound for automatic memory management.

But the member function adapter version of fd::make_delegate() needs to be treated differently from the other versions of fd::make_delegate(), and there is a reason for it.

The CBase1::virtual_not_overridden member function in this example is a public member function, and the derived class doesn't override it. Since it is a public member function, it is fine to refer to the member function pointer with the notation 'CDerived1::virtual_not_overridden'. But when this notation is passed as an argument to a function that relies on automatic template deduction, such as fd::make_delegate(), the type deduced is, surprisingly, that of 'CBase1::virtual_not_overridden', not 'CDerived1::virtual_not_overridden'. Therefore, the delegate created by fd::make_delegate() will be of the fd::delegate2 <void, CBase1 *, int> type, while what we wanted was the fd::delegate2 <void, CDerived1 *, int> type. This is why a typed null pointer must be passed explicitly as the first argument of the make_delegate() helper function in this case. A similar concept is used in type-check relaxation later.
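The deduction behavior described here can be checked directly. In the sketch below, `deduced_class_is` and `requested_class_is` are hypothetical helpers written only for this demonstration; the second one mimics the typed-null-pointer idea.

```cpp
#include <cassert>
#include <type_traits>

struct CBase1 {
    virtual ~CBase1() {}
    virtual void virtual_not_overridden(int) {}
};
struct CDerived1 : CBase1 {};

// Reports whether the class deduced from the member function pointer is Expected.
template <typename Expected, typename T, typename R, typename A1>
bool deduced_class_is(R (T::*)(A1)) {
    return std::is_same<T, Expected>::value;
}

// With a typed null pointer in front, the caller states the class explicitly;
// the member pointer only has to be convertible to a member of that class.
template <typename Expected, typename U, typename T, typename R, typename A1>
bool requested_class_is(U*, R (T::*pmf)(A1)) {
    R (U::*converted)(A1) = pmf;   // implicit base-to-derived member pointer conversion
    (void)converted;
    return std::is_same<U, Expected>::value;
}

bool deduction_demo() {
    // Written with CDerived1, but the expression has type void (CBase1::*)(int).
    return deduced_class_is<CBase1>(&CDerived1::virtual_not_overridden)
        && !deduced_class_is<CDerived1>(&CDerived1::virtual_not_overridden)
        && requested_class_is<CDerived1>(static_cast<CDerived1*>(0),
                                         &CDerived1::virtual_not_overridden);
}
```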

Comparing two delegates means comparing the memory addresses of the function pointers stored internally, and it does not really mean anything special. But making it possible allows my delegate to be used in an STL container seamlessly. As it is a 'fast' delegate in 'most' cases, we don't need to worry too much about performance degradation while the delegate is being copied by value inside an STL container.

Platform specific calling convention

Calling conventions are not a C++ Standard feature, but they can't simply be ignored, as the Win32 API and the COM API use them. From an implementation point of view, supporting them is just a matter of boring and tedious replication of the same code. By default, none of the platform-specific calling conventions are enabled. To enable one, the relevant macro needs to be defined before "delegate.h" is included.

FD_MEM_FN_ENABLE_STDCALL - to enable __stdcall support for member function

FD_MEM_FN_ENABLE_FASTCALL - to enable __fastcall support for member function

FD_MEM_FN_ENABLE_CDECL - to enable __cdecl support for member function

FD_FN_ENABLE_STDCALL - to enable __stdcall support for free function

FD_FN_ENABLE_FASTCALL - to enable __fastcall support for free function

FD_FN_ENABLE_PASCAL - to enable Pascal support for free function

(Remark) Calling convention support only works in MSVC at this time, due to my lack of understanding of gcc.

Type-check relaxation

Template parameter types passed over to a delegate are very strictly checked, but this could be too much in real life. There might be a situation where we want to treat a bunch of int (*)(int) functions and int (*)(long) functions together. When these functions are assigned into fd::delegate1 <int, int>, the compiler will emit errors for the int (*)(long) functions saying that it cannot be assigned since 'int' and 'long' are different types.

By defining the FD_TYPE_RELAXATION macro before including "delegate.h", type-check relaxation can be enabled. In a nutshell, a function (free function, member function, or functor) can be assigned or bound to fd::delegate whenever the following three conditions are met:

the number of arguments matches,

each matching argument can be trivially converted (from the delegate's argument to the target function's argument) by the compiler,

the return type can be trivially converted (from the delegate's return type to the target function's return type "and vice versa") by the compiler.

If any of the above conditions can not be met, the compiler will complain about it (compile-time warning and/or error messages).

Likewise, upcasting from 'CDerived1 *' to 'CBase1 *' is always safe, and thus can be converted trivially.
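The three conditions can be illustrated with a small, self-contained sketch for free functions. `relaxed_delegate1` and the helper functions are hypothetical; the stub lets the compiler apply (and check) the implicit conversions at the point of the call.

```cpp
#include <cassert>

template <typename R, typename A1>
class relaxed_delegate1 {
    typedef void (*generic_fn)();
    typedef R (*stub_type)(generic_fn fn, A1 a1);

    generic_fn fn_;
    stub_type  stub_;

    // UR and UA1 are the target function's own return and argument types.
    template <typename UR, typename UA1>
    struct fn_stub {
        static R invoke(generic_fn fn, A1 a1) {
            UR (*typed)(UA1) = reinterpret_cast<UR (*)(UA1)>(fn);
            // A1 -> UA1 and UR -> R must be implicit conversions,
            // otherwise this line does not compile.
            return typed(a1);
        }
    };

public:
    // Accepts one-argument functions only, so the argument count must match.
    template <typename UR, typename UA1>
    relaxed_delegate1(UR (*fn)(UA1))
        : fn_(reinterpret_cast<generic_fn>(fn)),
          stub_(&fn_stub<UR, UA1>::invoke) {}

    R operator()(A1 a1) const { return stub_(fn_, a1); }
};

struct CBase1 { int tag; CBase1() : tag(1) {} };
struct CDerived1 : CBase1 { CDerived1() { tag = 2; } };

int square_int(int x)   { return x * x; }
int square_long(long x) { return static_cast<int>(x * x); }
int tag_of(CBase1* p)   { return p->tag; }

int relaxation_demo() {
    relaxed_delegate1<int, int>        a(&square_int);    // exact match
    relaxed_delegate1<int, int>        b(&square_long);   // int -> long
    relaxed_delegate1<int, CDerived1*> c(&tag_of);        // CDerived1* -> CBase1*
    CDerived1 d;
    return a(3) + b(4) + c(&d);
}
```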

fd::make_delegate() for type-check relaxation mode (Removed)

[Obsolete] When FD_TYPE_RELAXATION is defined, sets of fd::make_delegate() are enabled to support this mode. Since fd::make_delegate() can not guess at all what type of delegate is required to be created, the caller must specify the delegate type null pointer as the first argument of fd::make_delegate(). This is exactly the same concept used in the member function adapter version of fd::make_delegate() explained earlier. [Obsolete]

The purpose of using make_delegate() is automatic template parameter deduction, so there is no reason to use make_delegate() if the additional type information has to be provided as the first argument anyway. Since this feature also causes a lot of confusion for weaker compilers such as VC6, it has been removed in the new version.

static_assert (Debugging support)

When a delegate is assigned or bound to a function pointer, the compiler generates the appropriate function call operator () at compile time. If there is a type mismatch warning or error, it is really difficult to track down where it originates. A smart compiler like VC7.1 has the nice capability of tracing such warnings or errors, with detailed template type information, up to the user source code, but VC6 doesn't. (VC6 usually gives two levels of traces.) So, I tried to place a static_assert (FD_STATIC_ASSERT, FD_PARAM_TYPE_CHK) in as many places as possible, so that it is easier to track down the origin of warnings/errors in the user source code.
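For compilers that predate the built-in static_assert, a compile-time assertion is usually built from an incomplete type. The sketch below shows one such idiom; `SKETCH_STATIC_ASSERT` and `checked_size` are hypothetical names and are not the library's FD_STATIC_ASSERT.

```cpp
#include <cassert>
#include <cstddef>

// Only the 'true' case is defined, so sizeof() on the 'false' case fails to compile.
template <bool Condition> struct static_assert_failure;
template <> struct static_assert_failure<true> {};

#define SKETCH_JOIN2(a, b) a##b
#define SKETCH_JOIN(a, b)  SKETCH_JOIN2(a, b)
#define SKETCH_STATIC_ASSERT(expr)                                \
    enum { SKETCH_JOIN(sketch_static_assert_line_, __LINE__) =    \
           sizeof(static_assert_failure<(expr) ? true : false>) }

// Hypothetical user-facing function: the assertion sits right at the entry
// point, so a violation is reported close to the user's own call.
template <typename T>
std::size_t checked_size() {
    SKETCH_STATIC_ASSERT(sizeof(T) <= 8);
    return sizeof(T);
}

// checked_size<char[16]>() would be rejected at compile time, with the error
// pointing at the SKETCH_STATIC_ASSERT line inside checked_size().
```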

Custom memory allocator support (New)

In the new version, my delegate can use the services of any custom memory allocator when it needs to allocate or deallocate memory, either for storing a member function pointer whose size is greater than the internal buffer size or for storing the cloned bound object. The default, std::allocator<void>, uses general-purpose heap memory, which can be expensive and slow for many small allocations. Using a fixed-size block (chunk) memory allocator for small objects can improve performance considerably compared with the default std::allocator<void>. Of course, the degree of benefit from using a custom allocator will vary according to the implementation details of the custom allocator employed.

I included fd::util::fixed_allocator, which allocates a big chunk of memory at once for later use by small objects. It is implemented based on several articles that can be found on CodeProject. You can provide any custom memory allocator of your choice.
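To give an idea of what a fixed-size block allocator does, here is a minimal free-list pool. It is a sketch only and not the library's fd::util::fixed_allocator: `fixed_block_pool` is a hypothetical class, it does not follow the standard allocator interface, and blocks are aligned only to pointer size.

```cpp
#include <cassert>
#include <cstddef>
#include <new>
#include <vector>

// Hypothetical fixed-size block pool; not the library's fd::util::fixed_allocator.
class fixed_block_pool {
    struct free_node { free_node* next; };

    std::size_t        block_size_;
    std::size_t        blocks_per_chunk_;
    free_node*         free_list_;   // singly linked list of unused blocks
    std::vector<void*> chunks_;      // big chunks owned by the pool

    fixed_block_pool(const fixed_block_pool&);              // non-copyable
    fixed_block_pool& operator=(const fixed_block_pool&);

    // Every block must be able to hold a free_node and stay pointer-aligned.
    static std::size_t round_up(std::size_t n) {
        const std::size_t unit = sizeof(free_node);
        return n <= unit ? unit : ((n + unit - 1) / unit) * unit;
    }

    // Gets one big chunk from the heap and threads its blocks onto the free list.
    void grow() {
        chunks_.push_back(static_cast<void*>(0));   // reserve the slot first
        chunks_.back() = ::operator new(block_size_ * blocks_per_chunk_);
        char* chunk = static_cast<char*>(chunks_.back());
        for (std::size_t i = 0; i < blocks_per_chunk_; ++i) {
            deallocate(chunk + i * block_size_);
        }
    }

public:
    fixed_block_pool(std::size_t block_size, std::size_t blocks_per_chunk)
        : block_size_(round_up(block_size)),
          blocks_per_chunk_(blocks_per_chunk ? blocks_per_chunk : 1),
          free_list_(0) {}

    ~fixed_block_pool() {
        for (std::size_t i = 0; i < chunks_.size(); ++i) {
            ::operator delete(chunks_[i]);
        }
    }

    // Pops a block from the free list; touches the heap only when it is empty.
    void* allocate() {
        if (!free_list_) grow();
        free_node* node = free_list_;
        free_list_ = node->next;
        return node;
    }

    // Pushes the block back onto the free list.
    void deallocate(void* p) {
        free_node* node = new (p) free_node;
        node->next = free_list_;
        free_list_ = node;
    }
};

bool pool_demo() {
    fixed_block_pool pool(16, 4);
    void* a = pool.allocate();
    void* b = pool.allocate();
    pool.deallocate(a);
    void* c = pool.allocate();   // the most recently freed block is reused
    bool ok = (a != b) && (c == a);
    pool.deallocate(b);
    pool.deallocate(c);
    return ok;
}
```

After the first chunk has been obtained, allocation and deallocation are a couple of pointer assignments each, which is where the gain over a general-purpose heap comes from.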

Final words

If speed is your only concern, define FD_DISABLE_CLONE_BOUND_OBJECT (an extra four bytes of space per delegate will be saved as a bonus), and bind member functions only to a pointer to the bound object; otherwise, you can use a smart pointer to the bound object and a custom memory allocator to tune performance by balancing speed against safety. The delegate's behavior and features are fully customizable by defining the proper macros (look at the "config.hpp" file).

I also included a guide on how to extract a simplified version from the full version for those who want to see the implementation details after the macro expansion.

References

[Hickey]. Callbacks in C++ Using Template Functors - summarizes existing callback methods and their weaknesses, then describes a flexible, powerful, and easy-to-use callback technique based on template functors. (1994)

[Peers]. Callbacks in C++ - An article based on Rich Hickey's article to illustrate the concept and techniques used to implement callbacks.

[Clugston]. Member Function Pointers and the Fastest Possible C++ Delegates - A comprehensive tutorial on member function pointers, and an implementation of delegates that generates only two ASM opcodes!

[Ryazanov]. The Impossibly Fast C++ Delegates - An implementation of a delegate library which can work faster than "The Fastest Possible C++ Delegates", and is completely compatible with the C++ Standard.

[Trunov]. Yet Another Generalized Functors Implementation in C++ - An article on a generalized functors implementation in C++. Generalized functor requirements, existing implementation problems, and disadvantages are considered. Several new ideas and problem solutions, together with complete implementation are provided.

[boost]. "...One of the most highly regarded and expertly designed C++ libraries." boost::function, boost::bind, boost::mem_fn.

The above alternative is more intuitive, but there is a caveat for this case. Once functor assignment is involved, fd::delegate is no longer a fast delegate. fd::delegate is a fast delegate only for member functions and free functions. Functor assignment is provided only for convenience.

(Note: the functor assignment syntax above is for the updated version of the fast delegate. I completely redesigned the functor assignment portion, and the updated version will eventually be posted here on CodeProject. I just don't have time now.)

Why do you bother with these things? Bind has been made a part of TR1. Your implementation lacks a great deal of functionality and optimisations that are already tested/running on all major C++ compilers. I'm sorry to say your article may have been relevant 5-10 years ago, however not anymore.

Thank you for your comment. I agree with your point: bind & function are now a part of TR1. However, if you have a chance to read Don Clugston's great article about the 'fast(est)' delegate posted here on CodeProject, you will see that there are certain areas where bind & function are not well suited, as they are implemented to use heap memory allocation/deallocation internally, which tends to be a 'slow' operation in certain situations.

I also love to use bind/function/lambda from the boost library, but there are still many people who don't use boost, and I guess it will take some time until all the popular compiler vendors start to support bind & function as part of the standard package.

One can use bind & function in general; when he/she needs the 'fastest' delegate, he/she can use Don's or Sergey's implementation, and if he/she needs a fast but more feature-rich delegate, then he/she might consider using mine.

Most of all, I enjoyed myself while developing this simple library and I thought what I found here might interest someone else out there as well.

Since the ambiguity issue in VC8 occurs between 1) the bind function which takes a pointer to an object and 2) the one that takes a reference to an object, it is possible to disable all bind functions which take a reference to an object by defining the FD_DISABLE_CLONE_BOUND_OBJECT macro before including "delegate.h".

Once FD_DISABLE_CLONE_BOUND_OBJECT is defined, the Object Cloning Manager Stub will be disabled, so bind functions will accept only a pointer to an object. Binding a member function with a smart pointer to an object also depends on the Object Cloning Manager Stub, and thus will be disabled as well.

Hopefully the fix will be added in the next VC8 service pack release, so this crap goes away.

seems to be a good article, but i am going with don's solution because although reinterpret_cast<> is a hack, it's a lot easier. too many files, too much of the meta-meta- approach, and some ugliness ala <<=, made your approach less appealing and more academic. and, when it's said and done, it's not myself alone spending the additional time understanding your source. the best part of your article is its use of cited references. that said, i think it's a neat puzzle and i do appreciate your attempt to synthesize it.

Yes, I think I tried to squeeze too many features into a single class.

And I agree that the <<= semantic is really ugly. (I don't like it either, but I wasn't able to resolve all kinds of ambiguity issues when I tried to incorporate the copy function into the normal = operator, especially while still supporting old compilers such as VC6.)

But it just occurred to me that it might be a good idea to use some sort of functor_adaptor class to copy a functor into the delegate, rather than the ugly <<= operator. Then I can say that the delegate is only for free function pointers and member function pointers; if you still want to copy a functor into the delegate, use functor_adaptor! (Don's solution does not support functors either, anyway.)

Also, is there some better option than the make_delegate function when passing member pointers to a function that takes a delegate? I will have possibly several hundred such calls and it seems like a lot of added characters to have to use that adapter every time.

Thanks for this article. I enjoyed getting my head around it. I need a delegate mechanism for an application I am developing and this approach looks to me to be just what I am looking for.

I found a few errors (at least I think so!) in the third delegate class definition:

destructor has if_by_malloc rather than is_by_malloc
select_fp_ typedefs for Then and Else need template parameters for fp_by_value and fp_by_malloc
typedef for stub_type is missing the delegate argument

Sorry if I wasn't clear enough. I was referring to the example code in the article itself. Specifically, your third version of the delegate class. I realise that this is not intended as the code that people will use, but I was trying to build the simplified version to try to understand what you had done, since the full version inevitably has a lot more templates and extra complexity.

One thing I have encountered during this is that my compiler at work (VC6) is not very happy about the placement new being used on a non-user defined class. Is there a work around for this, or am I doing something stupid? It also had some issues about members (and the enum) being private within the delegate class.

I'm sure these things have probably crept in due to simplification of the much more general code that you have written. I don't think these issues are vital to understanding the basic idea.

Yes, the full version of the source code has quite a degree of complexity due to its generalization and the compiler-specific template bug workarounds (meta meta template & macros). In the next update, I will include a tip on how to extract the simplified version from the full version, so that people like you can get a better understanding of the implementation details. (That is how I debug the code while developing.)