Optimization of vector operations with bit hacks

Dec 7, 2014 • Alex Rogozhnikov

Recently I worked on optimizing some (internal) classifier.
The problem was mostly not in training, but in applying the trained classifier — this code was
originally written in C++ and then translated to Cython (which, surprisingly, decreased the speed by a factor of 2).

It was quite easy to rewrite the code using numpy and a vectorized approach
(initially predictions were built event-by-event; after the rewrite the classifier was applied tree-by-tree).
However, this only gave speed comparable to the original C++ code (and twice as fast as the Cython version).

What really sped up the code was switching from int8 operations to int64
(the latter are natively supported by all modern 64-bit processors).
So 8 operations on int8 values were grouped into one 64-bit operation.
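A minimal sketch of the trick, assuming a bitwise AND over two int8 arrays whose length is divisible by 8 (the array names and sizes here are illustrative, not from the original benchmark):

```python
import numpy as np

# two int8 arrays; the length must be divisible by 8 for the int64 view
a = np.random.randint(0, 2, size=80_000).astype('int8')
b = np.random.randint(0, 2, size=80_000).astype('int8')

# baseline: elementwise AND on int8, processed one byte at a time
slow = a & b

# view the same buffers as int64 (no copy!), so each AND
# processes 8 bytes at once, then view the result back as int8
fast = (a.view('int64') & b.view('int64')).view('int8')

assert np.array_equal(slow, fast)
```

Since `view` only reinterprets the existing buffer, the only extra work is the single vectorized int64 pass.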

On a simple benchmark this gives about a 5x speedup. Views, of course, do not copy the
data, which is essential for the speed. This trick can be applied to summation,
subtraction, binary or, and binary and, but the size of the original array must be divisible
by 8.
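When the array length is not divisible by 8, one option is to pad with zeros before taking the view and trim the result afterwards. A sketch, with a hypothetical helper name (`and_int64`) not from the original post:

```python
import numpy as np

def and_int64(a, b):
    """Bitwise AND of two int8 arrays via an int64 view,
    zero-padding to a multiple of 8 bytes when needed."""
    n = len(a)
    pad = (-n) % 8  # how many bytes to add to reach a multiple of 8
    if pad:
        # np.concatenate returns a fresh contiguous buffer,
        # which the int64 view requires
        a = np.concatenate([a, np.zeros(pad, dtype=a.dtype)])
        b = np.concatenate([b, np.zeros(pad, dtype=b.dtype)])
    # AND eight bytes at a time, then trim the padding
    return (a.view('int64') & b.view('int64')).view('int8')[:n]
```

Padding with zeros is safe for AND and OR; for summation or subtraction the padded lanes simply stay zero.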

Links

There is an awesome collection of bit twiddling hacks,
which was my starting point in bit optimizations.