I have a method where performance is really important (I know premature optimization is the root of all evil. I know I should and I did profile my code. In this application every tenth of a second I save is a big win.) This method uses different heuristics to generate and return elements. The heuristics are used sequentially: the first heuristic is used until it can no longer return elements, then the second heuristic is used until it can no longer return elements and so on until all heuristics have been used. On each call of the method I use a switch to move to the right heuristic. This is ugly, but work well. Here is some pseudo code

As I said, this works and it's efficient since at each call, the execution jumps right where it should. If a heuristic can't provide an element, m_step is incremented (so the next time we don't try this heuristic again) and because there is no break statement, the next heuristic is tried. Also note that some steps (like step 1) never return an element, but are one time initialization for the next heuristic.

The reason initializations are not all done upfront is that they might never be needed. It is always possible (and common) for GetElem to not get called again after it returned an element, even if there are still elements it could return.

While this is an efficient implementation, I find it really ugly. The case statement is a hack; using it without break is also hackish; the method gets really long, even if each heuristic is encapsulated in its own method.

How should I refactor this code so it's more readable and elegant while keeping it as efficient as possible?

Refactoring isn't generally going to make code more efficient - just more readable and maintainable. The two shouldn't be done at the same time. Optimization will often make the code less readable and maintainable.
–
Larry WatanabeDec 30 '09 at 14:59

It feels like you implemented the "yield return" by hand. Pretty cool!
–
Hamish GrubijanDec 30 '09 at 14:59

@Larry Watanabe : You are right, optimization and readability are often competing objectives. But this code seem so hackish to me I was wondering if there is anyway to improve it.
–
Mathieu PagéDec 30 '09 at 15:02

3

Mathieu: I know you've said you profiled it, but it's hard to believe a lot of CPU time is really being spent in this function... unless the caller and the heuristics are trivial.
–
Jason OrendorffDec 30 '09 at 15:42

1

If it's called about 20 million times per second then it clearly isn't performing badly. That's only 150 cycles per iteration. That doesn't seem bad for something that makes several function calls and has several branches. I think that using a switch statement in this way is sufficiently commonplace that it's not unreasonable.
–
DrPizzaJan 2 '10 at 4:09

10 Answers
10

To my mind if you do not need to modify this code much, eg to add new heuristics then document it well and don't touch it.

However if new heuristics are added and removed and you think that this is an error prone process then you should consider refactoring it. The obvious choice for this would be to introduce the State design pattern. This will replace your switch statement with polymorphism which might slow things down but you would have to profile both to be sure.

I undestand you are saying that each Heuristic must be a diferent object implementing the same interface (or inheriting the same abstract class) and put them into an iterator for choosing the good one. That's a nice solution (polymorphism)
–
heliosDec 30 '09 at 15:15

I don't think your code is so bad, but if you're doing this kind of thing a lot, and you want to hide the mechanisms so that the logic is clearer, you could look at Simon Tatham's coroutine macros. They're intended for C (using static variables) rather than C++ (using member variables), but it's trivial to change that.

Note that once you decide to go this route, you can do the same to each heuristic, and then this becomes even more straightforward: bool Walk(Callback fn) { return WalkHeuristic1(fn) || WalkHeuristic2(fn) || WalkHeuristic3(fn); }.
–
Jason OrendorffDec 30 '09 at 17:51

That's micro optimization, but there is no need to set m_elem value when you are not returning from GetElem. See code below.

Larger optimization definitely need simplifying control flow (less jumps, less returns, less tests, less function calls), because as soon as a jump is done processor cache are emptied (well some processors have branch prediction, but it's no silver bullet). You can give a try at solutions proposed by Aaron or Jason, and there is others (for instance you can implement several get_elem functions annd call them through a function pointer, but I'm quite sure it'll be slower).

If the problem allow it, it can also be efficient to compute several elements at once in heuristics and use some cache, or to make it truly parallel with some thread computing elements and this one merely a customer waiting for results... no way to say more without some details on the context.

May be it would be less performant because of the virtual method call, maybe it would be better performant because of less state maintaining code, but the code would be definitely much clearer and maintainable, as always with patterns.

What could improve performance, is elimination of DoSomeOneTimeInitialisationForHeuristic2();
with separate state between. 1 and 2.

If the element code you are processing can be converted to an integral value, then you can construct a table of function pointers and index based on the element. The table would have one entry for each 'handled' element, and one for each known but unhandled element. For unknown elements, do a quick check before indexing the function pointer table.