Optimizing boolean expressions for speed.

Minimizing boolean expressions is of great pragmatic importance. In hardware, it is used to reduce the number of transistors in microprocessors. In software, it is used to speed up computations. If you have a complex expression you want to minimize and look up a textbook on discrete mathematics, you will usually find a list of boolean laws and identities that you are supposed to use to minimize functions.

Using these laws, you are supposed to massage your expression until you discover a shorter one that computes the same truth values as the original. Or you can use a truth table to see which expressions are equivalent to some of the sub-expressions and how you can combine them to obtain an equivalent but shorter boolean expression. All you need is a pen, a piece of paper, coffee, and a lot of patience. While I make this sound like a tedious process, you will rapidly get the knack of it, and you will find that minimizing expressions of a few variables is quite easy.

Let us take an example. You may happen to write something like (I do, once in a while):

if ( current->next==0 || ( current->next!=0 && SomeThing() ) )
{
...
}
else ...

The conditional statement is equivalent to ¬a ∨ (a ∧ b), where a is current->next!=0, and b is SomeThing(). Using the truth table approach (a favorite of textbooks), you write:

a   b   ¬a   a∧b   ¬a∨(a∧b)   ¬a∨b
0   0   1    0     1          1
0   1   1    0     1          1
1   0   0    0     0          0
1   1   0    1     1          1

The last column you eventually find by trying numerous combinations (ok, in this case it’s not all that difficult, but you get the point) to discover that ¬a ∨ (a ∧ b) = ¬a ∨ b. So you rewrite the above code as:

if ( current->next==0 || SomeThing() )
{
...
}
else ...

The textbook approach to minimization, however, doesn’t tell the whole story: it considers that all operators, ∧, ∨, ¬, ⊕, etc., have an equal cost; namely, zero. However, in real life, evaluating a boolean expression does incur a time cost. First, each variable involved in the expression must be assigned a truth value, which may be a constant but may also be the result of a computationally expensive function. Second, the expression itself must be evaluated, and the cost of that evaluation is not zero: it is proportional to the number of operators involved.

Using boolean simplification and common sub-expression elimination will help a lot to reduce the computational cost of the evaluation by not evaluating costly sub-expressions more than once, but we can go further. Let us note that it is not always necessary to evaluate the entire expression to ascertain its truth value. For example, the expression a ∨ b is true if a is true, regardless of the value of b, so it is safe to abort the evaluation of the expression whenever a is true. If a is false, then we proceed to the evaluation of b. Similarly, the expression a ∧ b is false whenever a is false, regardless of the value of b. The sub-expression b is evaluated if, and only if, a is true.

Some (most?) programming languages make use of this observation to speed up the evaluation of conditional expressions. In Simula, for example, short-cut evaluation is explicitly controlled by the programmer via the logical operators and then and or else. In Turbo Pascal, compiler switches, {$b+} and {$b-}, instruct the compiler to generate code for complete or truncated expression evaluation. C and C++ use short-cut evaluation in all cases.

All C/C++ programmers use short-cut boolean expression evaluation to conditionally enable parts of expressions. Indeed, it is not uncommon to see code that looks like

if ( p!=0 && p->count>0 )
{
...
}

The first part of the expression prevents the second part, p->count, from being evaluated if p is null. However, I think not all programmers explicitly use this feature to speed up tests by carefully resequencing/reorganizing their conditional expressions. Consider the following piece of code:

if ( really_slow_test(with,lots,of,parameters) &&
     slower_test(with,some,parameters) &&
     fast_test(with,one,parameter) )
{
...
}

This code first evaluates an expensive function and then, on success, proceeds to evaluate the remainder of the expression. But even if the first test fails and the evaluation is short-cut, there’s a significant performance penalty because the fat really_slow_test(...) is evaluated anyway. While retaining program correctness, one can rearrange the expression as follows:

if ( fast_test(with,one,parameter) &&
     slower_test(with,some,parameters) &&
     really_slow_test(with,lots,of,parameters) )
{
...
}

Now, the code is much faster because fast_test costs very little compared to the other two parts of the expression, and we proceed to evaluate really_slow_test(...) only if the two previous tests succeed. Ideally, the expressions should lend themselves to such reordering. In practice, there might be dependencies in the tests that force a natural ordering, regardless of which test is expensive or not. Think of our previous example with p and p->count; clearly, inverting the expression is not going to work. In many other cases, however, one can massage the expression to simplify it and to move the easiest, fastest, most likely fail-fast (&&) or succeed-fast (||) tests to the front, allowing the expression’s value to be ascertained as fast as possible while still being correct.

The use of fail- or succeed-fast testing is, I think, a good, simple practice that does not endanger code legibility, does not lean too far toward premature optimization (and all its evils), and yet yields interesting performance gains whenever the conditional expressions aren’t completely trivial.

You’re right, branchings ARE expensive. However, if-cascades aren’t a frequent programming style, and they might not help legibility all that much.

The second trick you present is valid when all variables are already assigned a truth value, or have unit cost of initialisation. && translates to &, || to |, and using De Morgan’s laws as you did lets you transform ((a==0)&&(b==0))==1 into ((a!=0)||(b!=0))==0 which is, indeed, (a|b)==0. I was going to talk about this in Part II, thanks for the spoilers ;)

If you do not have exactly 0 or 1 as your variables’ values, you can still demote them to zero anyway. Suppose you have ((a==3)&&(b==c))==1; you can rewrite this as (((a-3)==0)&&((b-c)==0))==1 which, applying De Morgan’s law again, you transform into ((a-3)|(b-c))==0. All low-cost arithmetic, no extra jumps due to short-cut evaluation.

Let me describe a “brute force” technique for boolean optimization I came up with. The concept is to calculate term evaluation speed and probability at runtime and adjust the order of evaluation dynamically. I’ll apply it to your example:

The cool stuff happens inside OptimizeExpression. The first iteration of the while loop constructs all the terms and captures the initial term timing. The start/end times of each expression are captured and sorted using some appropriate ranking, i.e. time/probability with a running average. The next iteration of the while loop adjusts the order of evaluation using the case statement assigned to each term. This executes the fastest terms with the highest probability of being false for AND expressions and true for OR expressions. Considerations for using this are: 1) you need to evaluate ALL terms in the expression to gather the statistics of unknown/changing execution times and probabilities; 2) the expression has to be something you do a lot, i.e., looping through it hundreds of times with different data; 3) the “while switch” construct should replace any “if” without scoping issues; 4) the term evaluation needs to be expensive to compensate for the overhead of this mechanism (although it has little impact that I can see).

I would think it would be most useful when you gather the run-time behavior, then use the collected data to hard-code the final result: like profile-guided optimization. With gcc you can instrument the code to gather this data, and the compiler is able to use it to generate better code.

Did you actually look at the generated code / measure the overhead?