On moving and d-pointer performance

Edit: fixed links… sigh, I should switch to WordPress. I did!

The usual thing: long time, no blogging. Real life got in the way a good deal, though not in a bad way. While I was doing my M.Sc. thesis at KDAB, I got the opportunity to start a Ph.D. position on information visualization at ILOG (now part of IBM) in Paris. This meant packing our stuff from our Berlin apartment (not too much, luckily, as we rented it furnished) and first moving it back to the Netherlands. We then used the next couple of months to sort out and pack the remaining stuff that we hadn’t taken to Berlin, and to find an apartment in Paris. As you can probably imagine, this was slightly stressful, but I’m pleased to say that we are finally settled now. We have a nice apartment south of Paris, only a 20-minute walk from the office where I work, and we have Internet access again. The only thing we don’t have yet is the ability to understand French… Well, let’s see if that changes over time.

What triggered me to write this post was something different, though. First a note of warning: I didn’t do any research beyond what is presented in the remainder of this post. My former colleague Marc wrote two great articles (here and here) on the topic of d-pointers. They are in German, though they shouldn’t be too hard to read if you’re somewhat familiar with both the topic and German. One thing he doesn’t address (or I skimmed too fast now; it has been some time since I really read them) is the performance of d-pointer classes versus value classes. Recently I was writing a class that is likely to be instantiated often and in large numbers, which raised the question of where the cost difference lies and how big it is. We all know that creating objects on the heap is more expensive than creating them on the stack, so I quickly deduced that d-pointer classes must be more expensive. Okay, that shouldn’t come as much of a surprise. Two questions remain, though: what price does one pay, and where (e.g. creation, copy, access)?
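For readers less familiar with the pattern, here is a minimal sketch of a value class next to a d-pointer class, each holding a single int. The names ValueSketch and DPointerSketch are hypothetical, the use of std::unique_ptr is an assumption, and this is not the benchmarked source.

```cpp
#include <cassert>
#include <memory>

// Plain value class: the int lives inside the object itself.
class ValueSketch {
public:
    explicit ValueSketch(int v = 0) : m_value(v) {}
    int value() const { return m_value; }
    void setValue(int v) { m_value = v; }
private:
    int m_value;
};

// d-pointer class: the public object only holds a pointer to a private
// struct, so every construction and every copy implies a heap allocation.
class DPointerSketch {
public:
    explicit DPointerSketch(int v = 0);
    DPointerSketch(const DPointerSketch &other);
    DPointerSketch &operator=(const DPointerSketch &other);
    ~DPointerSketch();
    int value() const;
    void setValue(int v);
private:
    struct Private;              // only forward-declared in the public header
    std::unique_ptr<Private> d;
};

// --- everything below would normally live in the .cpp file ---
struct DPointerSketch::Private {
    int value;
};

DPointerSketch::DPointerSketch(int v) : d(new Private{v}) {}
DPointerSketch::DPointerSketch(const DPointerSketch &other)
    : d(new Private(*other.d)) {}
DPointerSketch &DPointerSketch::operator=(const DPointerSketch &other) {
    *d = *other.d;               // copies into the existing allocation
    return *this;
}
DPointerSketch::~DPointerSketch() = default;
int DPointerSketch::value() const { return d->value; }
void DPointerSketch::setValue(int v) { d->value = v; }

// Small usage check: a copy is independent of the original.
inline void dPointerSketchUsage() {
    DPointerSketch a(1);
    DPointerSketch b(a);
    b.setValue(2);
    assert(a.value() == 1 && b.value() == 2);
}
```

In this sketch, construction and copy construction each call new, while assignment and the accessors only follow the existing pointer. A real implementation may of course be written differently.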

Intrigued by these questions, I set up a real quick-and-dirty benchmark just to get some rough ideas. I by no means claim to be complete, and given that I’m quite tired I might have made some horrible mistakes as well, so feel free to comment on the results. Here’s what I did: I created three different classes, DPointerClass, ValueClass and InlineValueClass (the latter just for additional comparison). The classes are very simple: they just hold an int, which you can get (const access) and set (non-const access). Complete source can be found here. Next I created a test class using QTestLib with the following methods:

void test${Class}CreateDestruct();

void test${Class}HeapCreateDestruct();

void test${Class}Copy();

void test${Class}Assignment();

void test${Class}ConstAccess();

void test${Class}NonConstAccess();

In these methods I use the QBENCHMARK macro to benchmark the actions described by the test names (measuring wall time). This led to the following results:

In this image you see the number of iterations for each action. To be honest, looking at that graph I’m not completely sure how to interpret it. As I understand from the QTest documentation, for wall time a higher number of iterations may be needed to get more precise results. However, it is not clear to me what “more precise” means here, nor why the number of iterations varies so much, both between the tests for an individual class and across all tests (e.g. for Value it is high for the CreateDestruct test, low for HeapCreateDestruct and high again for Copy). I guess the second issue is closely related to the first one, so if anyone could enlighten me on that, it’d be great.
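One possible explanation for the varying iteration counts is sketched below. It is a hand-rolled wall-time harness, assuming the harness keeps doubling the number of iterations until a single batch takes long enough to be measured reliably. The name measureWallTime, the BenchResult struct and the 50 msec threshold are all hypothetical. This is not QTest's implementation and has not been checked against its source.

```cpp
#include <cassert>
#include <chrono>
#include <cstdint>

struct BenchResult {
    std::uint64_t iterations;   // batch size of the final, accepted pass
    double msecPerIteration;    // average wall time per run in that pass
};

// Runs `body` in batches, doubling the batch size until one batch takes at
// least `minBatchMsec` of wall time. The threshold is an assumption chosen
// for illustration only.
template <typename Body>
BenchResult measureWallTime(Body body, double minBatchMsec = 50.0) {
    using Clock = std::chrono::steady_clock;
    std::uint64_t iterations = 1;
    for (;;) {
        const auto start = Clock::now();
        for (std::uint64_t i = 0; i < iterations; ++i)
            body();
        const std::chrono::duration<double, std::milli> elapsed =
            Clock::now() - start;
        // Stop once the batch is long enough (or the batch size gets absurd).
        if (elapsed.count() >= minBatchMsec || iterations >= (1ull << 40))
            return {iterations,
                    elapsed.count() / static_cast<double>(iterations)};
        iterations *= 2;
    }
}

// Usage: time a trivial operation with a short 1 msec threshold.
inline BenchResult measureTrivialOperation() {
    volatile int sink = 0;      // volatile keeps the loop from being removed
    return measureWallTime([&sink] { sink = sink + 1; }, 1.0);
}
```

Under such a scheme a cheap operation needs many iterations to fill the time window and an expensive one needs few, so the iteration count would roughly mirror the inverse of the per-iteration time.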

Secondly, and slightly more interesting is the following graph:

Here you see the time (in msec) per iteration. We can directly conclude that for creation/destruction and copying the difference is significant, while for assignment and the access operations it is negligible. So the extra level of indirection doesn’t seem to add any cost; more strangely, it even seems to be slightly faster. The factors between DPointer and Value for CreateDestruct, HeapCreateDestruct and Copy are 20, 42 and 12 respectively.

Concluding, we can say that the differences are significant only for CreateDestruct, HeapCreateDestruct and Copy. For creation this is logical; for copying, I assume that the byte-by-byte copying done by the default copy constructor performs better than a custom one. For assignment and access there is no performance loss. One last question remains: although the difference is significant, even in the worst case we’re talking about 0.0012 msec. How does one determine whether this is too much for a given application, and thus fall back to classes with plain private members? The class I’m working on has the potential of being created in large amounts in a short time (e.g. 100,000 instances, several times in a number of consecutive calls); at 0.0012 msec each, 100,000 creations come to roughly 120 msec. Still, I’m somewhat tempted at the moment to move away from value classes to d-pointer classes, for the obvious design reasons, as the class will become part of a library. Any comments and thoughts on this are welcome.

If you need to create a ton of ‘stuff’ on the heap, keep a pool of the ‘stuff’ and avoid the cost of allocating and freeing memory by keeping destroyed (deleted) objects in the pool, to be reused (via new) later. I’d be really interested in what this would do to the second graph.
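A minimal sketch of that pooling idea, assuming the private data class overloads its own operator new and operator delete and recycles storage through a free list. The class name PooledPrivate is hypothetical, and the sketch is deliberately simplistic: it is not thread-safe and never returns pooled memory to the system.

```cpp
#include <cassert>
#include <cstddef>
#include <new>
#include <vector>

// Hypothetical private data class whose storage is recycled via a free list.
// `final` guarantees every allocation has the same size.
class PooledPrivate final {
public:
    int value = 0;

    // Reuse a previously released block if one is available.
    static void *operator new(std::size_t size) {
        std::vector<void *> &pool = freeList();
        if (!pool.empty()) {
            void *p = pool.back();
            pool.pop_back();
            return p;
        }
        return ::operator new(size);
    }

    // Park the block in the pool instead of handing it back to the heap.
    static void operator delete(void *p) noexcept {
        if (p == nullptr)
            return;
        try {
            freeList().push_back(p);
        } catch (...) {
            ::operator delete(p);   // pool could not grow: really free it
        }
    }

private:
    // Blocks still in the list at program exit are simply left to the OS.
    static std::vector<void *> &freeList() {
        static std::vector<void *> list;
        return list;
    }
};

// Usage: the second allocation reuses the block released by the first.
inline bool pooledPrivateReusesStorage() {
    PooledPrivate *first = new PooledPrivate;
    void *address = first;
    delete first;
    PooledPrivate *second = new PooledPrivate;
    const bool reused = (static_cast<void *>(second) == address);
    delete second;
    return reused;
}
```

Whether this actually beats the system allocator for such small objects is something only a measurement would show.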

IMO d-pointers/pimpl pointers are a much-misunderstood beast, and should only be used on publicly facing interfaces. There is no point in using a pimpl pointer on an internal class. If you’re dealing with a class that will be exported from the shared lib and you need the extra performance, there are ways around that problem as well.

There certainly is a point. Build time is another reason for using d-pointers. By moving (most of) your members and code into a private header (or the .cpp), you shift the burden of changes from recompiling everything touching that code to recompiling just what touches the private implementation. With codebases of a few thousand LOC, this quickly starts to make a very visible impact.

d-pointers have advantages (see my comment above about build time) beyond hiding the implementation and aiding ABI stability (though that is certainly a helpful side effect). They also allow for a clearer separation of public vs. private interfaces in the design, which is usually a good thing.

My $0.02 in summary…

…is that you really shouldn’t worry about things like this unless profiling shows it is a real problem in real-world code. Microbenchmarks like this one are great for examining things you’re curious about, but you shouldn’t let them, by themselves, have an impact on how you write code.