Bitwise Optimization in Java: Bitfields, Bitboards, and Beyond

Quick quiz: how do you rewrite the statement below,
which alternates between two constants, without a
conditional?

if (x == a) x= b;
else x= a;

Answer:

x= a ^ b ^ x;
//where x is equal to either a or b

The ^ character is the bitwise
XOR operator. How does the code
work? With XOR, applying the same
operand twice always gives you back the original
value.

Case 1: x equals a    Case 2: x equals b    Step
x= a ^ b ^ x          x= a ^ b ^ x          Use formula
x= a ^ b ^ a          x= a ^ b ^ b          Substitute
x= b ^ a ^ a          x= a ^ b ^ b          Reorder variables
x= b                  x= a                  The last two variables cancel out
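The toggle can be checked with a small program. This is only a sketch: the class name, method name, and the constants 3 and 7 are illustrative assumptions, not taken from the article's sample code.

```java
// Hypothetical demo class; names and constants are illustrative only.
final class XorToggle {
    // Flips x between a and b. Assumes x already equals a or b.
    static int toggle(int x, int a, int b) {
        return a ^ b ^ x;
    }

    public static void main(String[] args) {
        int a = 3, b = 7;
        int x = a;
        x = toggle(x, a, b);   // x now holds b
        System.out.println(x);
        x = toggle(x, a, b);   // x is back to a
        System.out.println(x);
    }
}
```

If x holds any value other than a or b, the result is meaningless, so the precondition matters.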

While not a popular technique, delving into the black art
of bitwise manipulation can boost performance in Java. The
rewrite above benchmarked at 5.2 seconds, versus
5.6 seconds for the conditional statement on my setup.
(See Resources below for the
sample code.)
Avoiding conditionals can often increase performance
on modern processors with multiple pipelines.

Of particular interest to Java programmers are certain
tricks related to what are called bitsets. In Java,
the int and long integer
primitives can double as a kind of set of bits with 32 or
64 elements. These simple data structures can be
combined to represent larger structures, including arrays.
Additionally, certain special operations
(called bit-parallel) effectively condense a number of
separate operations into one. When we work with
finer-grained data (bits rather than integers), operations that do only one
thing at a time when applied to integers can sometimes
do many things at once when applied to bits. We can
think of bit-parallel operations as software
pipelines rediscovered in Java systems. (Of course, assembly programmers may not
see much new in this!)
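As a rough illustration of a bit-parallel operation, the sketch below treats an int as a set of the numbers 0 to 31. A single & then answers 32 membership questions at once. The class and method names are assumptions made for this example.

```java
// Hypothetical helper: an int used as a set of the elements 0..31.
final class BitParallelSets {
    // A set containing only element i (assumes 0 <= i <= 31).
    static int singleton(int i) { return 1 << i; }

    // Union of two sets: one OR handles all 32 elements at once.
    static int union(int s, int t) { return s | t; }

    // Intersection of two sets: one AND handles all 32 elements at once.
    static int intersection(int s, int t) { return s & t; }

    // True if element i is in set s.
    static boolean contains(int s, int i) { return (s & (1 << i)) != 0; }

    public static void main(String[] args) {
        int a = union(singleton(1), singleton(5));
        int b = union(singleton(5), singleton(9));
        int common = intersection(a, b);
        System.out.println(contains(common, 5));
        System.out.println(contains(common, 1));
    }
}
```

The equivalent code over a boolean[32] would need a loop for each union or intersection.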

Advanced chess-playing programs like Crafty (written in
C) use a special chess data structure called a
bitboard to represent chess positions faster than arrays
could. Java programmers should be even more interested in
avoiding arrays than C programmers. While C has a fast
implementation of arrays, Java arrays (which have more
features like bounds checking and garbage collection) are
on the slow side. A simple test I performed on my system
(Windows XP, AMD Athlon, HotSpot JVM)
comparing integer access to array access shows that array
access takes 160 percent longer than integer access. This
doesn't even touch on the garbage collection issue.
Bitsets, which are implemented as integers, can replace
arrays in some situations.
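To make the bitboard idea concrete, here is a minimal sketch that stores one bit per chessboard square in a long. The square numbering (rank * 8 + file) and all names are assumptions for illustration; this is not how Crafty is necessarily laid out.

```java
// Hypothetical bitboard helper: one bit per square of an 8x8 board.
final class Bitboard {
    // Maps rank and file (each 0..7) to a square index 0..63.
    static int square(int rank, int file) { return rank * 8 + file; }

    // Returns board with the given square marked occupied.
    static long set(long board, int square) { return board | (1L << square); }

    // Returns board with the given square marked empty.
    static long clear(long board, int square) { return board & ~(1L << square); }

    // True if the given square is occupied.
    static boolean isOccupied(long board, int square) {
        return (board & (1L << square)) != 0;
    }

    // Number of occupied squares (Long.bitCount is available from J2SE 5.0).
    static int count(long board) { return Long.bitCount(board); }

    public static void main(String[] args) {
        long pawns = 0L;
        pawns = set(pawns, square(1, 0));
        pawns = set(pawns, square(1, 4));
        System.out.println(count(pawns));
        System.out.println(isOccupied(pawns, square(1, 4)));
    }
}
```

The whole board lives in a single primitive, so copying or comparing positions is one assignment or one == test, with no heap allocation.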

Let's take a look at dusting off some of those old
assembly-age tricks with a special emphasis on bitsets.
Java actually has good bitwise support for a portable
language. Additionally, J2SE 5.0 (code-named Tiger) adds a number of useful bit manipulation
methods to the API.

Considering Bitwise Optimization in the Typical Java
Environment

Java programs are pretty far removed from the bit-crunching
machines and operating systems on which they run. While modern
CPUs often have special instructions to manipulate bitsets,
you certainly can't execute these instructions in Java. The
Java language supports only left shift, signed and unsigned right shift, and bitwise
XOR, OR, AND, and NOT. Ironically, Java programmers find
themselves in the same boat as the many assembly programmers
who happened to be targeting CPUs that lacked
extra bitwise instructions. The assembly programmers had
to emulate these instructions, as quickly as possible, in
software. Many of the new Tiger methods do just this in
pure Java.
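As an example of emulating a missing instruction in software, here is a sketch of the well-known divide-and-conquer population count of the kind described in Hacker's Delight. It uses only shifts, ANDs, and adds. The class and method names are assumptions, and this is not claimed to be the exact code of any particular JDK release.

```java
// Hypothetical demo: counting 1-bits without a hardware instruction.
final class SoftwarePopCount {
    static int popCount(int x) {
        // Each 2-bit field now holds the count of 1-bits it contained.
        x = x - ((x >>> 1) & 0x55555555);
        // Sum neighboring 2-bit counts into 4-bit fields.
        x = (x & 0x33333333) + ((x >>> 2) & 0x33333333);
        // Sum neighboring 4-bit counts into 8-bit fields.
        x = (x + (x >>> 4)) & 0x0F0F0F0F;
        // Fold the four byte counts into the low byte.
        x = x + (x >>> 8);
        x = x + (x >>> 16);
        // The total is at most 32, so six bits are enough.
        return x & 0x3F;
    }

    public static void main(String[] args) {
        System.out.println(popCount(0xF0));
        System.out.println(popCount(0xF0) == Integer.bitCount(0xF0));
    }
}
```

Note that the method has no loops or branches, which is typical of these emulations.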

C-language programmers who are micro-tuning may review their actual
assembly code and then consult the optimizer's manual for
their target CPU, and actually count how many instruction
cycles their code will run in. In contrast, a practical
problem with doing low-level optimization in Java is
determining the results. The Sun JVM spec says nothing
about the relative speed at which the given opcodes are
executed. You may spoon-feed the JVM the code you think
will run fastest, only to be shocked by lack of improvement
or even detrimental effects. The JVM may either mangle your
optimizations somehow, or it may be optimizing the normal
code internally. In any case, any JVM optimizations, even
if they are mainstays in compiled languages, should be
benchmarked.

The standard tool for looking at bytecodes, javap,
is included in the JDK. Users of Eclipse can also use
the Bytecode Outline plugin from Andrei Loskutov. A
reference book about the JVM's design and instruction set
is available on Sun's online
Java book page. Note that there are two kinds of
memory in every JVM. There is a stack, where local
primitives and expressions are stored, and a heap, where
objects and arrays are stored. The heap is subject to
garbage collection, while the stack is a fixed chunk of
memory. While most programs only use a small amount of stack,
the JVM specification states that you can expect the stack
to be at least 64K in size. Take special note that the JVM is
really a 32/64-bit machine, so the byte and short primitives
are unlikely to run any faster than the int.

"Two's Complement" Numbers

In Java, all integer primitives are in a
signed number format known as "two's complement." To
manipulate the primitives, we need to understand them.

There are two types of two's complement numbers:
non-negative and negative. The highest bit, the sign bit,
usually shown on the left, is set to zero for non-negative
numbers. These numbers are "normal": you simply read them
left to right and convert to the base you want. However, if
the sign bit is on, then the number is negative and the
remaining bits cannot be read directly as its magnitude.

There are two ways to look at the negative numbers. In the
first way, negatives count up starting at the smallest
possible number and end at -1. So, for a byte, they start at
10000000 (or -128 decimal), then
10000001 (or -127), all the way up
to 11111111 (or -1). The second way
to think about them is a little odd. When the sign bit is
on, the roles of the bits are swapped: instead of leading
zeros followed by significant one bits, there
are leading ones followed by significant zero bits. Read the
zeros as if they were ones, make the result negative, and
then subtract one. For example,
11111111 is just the sign bit padded by seven
ones, which reads as "negative zero" (-0). We then
subtract one to
get -1. 11111110 is
-2, 11111101 is -3,
and so on.
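The byte patterns above can be inspected with a short program. This is a sketch under the assumption that printing the low eight bits of a byte is a fair way to view its two's complement pattern; the class and method names are made up for the example.

```java
// Hypothetical demo: viewing the 8-bit two's complement pattern of a byte.
final class TwosComplementBytes {
    // Returns the eight bits of b as a string of 0s and 1s.
    static String bits(byte b) {
        // Mask with 0xFF so sign extension to int does not add leading ones.
        String s = Integer.toBinaryString(b & 0xFF);
        // Left-pad with zeros to exactly eight characters.
        return "00000000".substring(s.length()) + s;
    }

    public static void main(String[] args) {
        byte[] samples = { -128, -127, -3, -2, -1, 0, 1, 127 };
        for (byte b : samples) {
            System.out.println(bits(b) + " = " + b);
        }
    }
}
```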

It may seem strange, but we can do a lot of operations by
mixing the bitwise operators together with the arithmetic
operators. For example, to change between
x and -x, invert the bits and add one:
(~x)+1. This can be seen in the following table of 4-bit values.

x           ~x          (~x)+1 or -x
0111 (7)    1000 (-8)   1001 (-7)
0110 (6)    1001 (-7)   1010 (-6)
0101 (5)    1010 (-6)   1011 (-5)
0100 (4)    1011 (-5)   1100 (-4)
0011 (3)    1100 (-4)   1101 (-3)
0010 (2)    1101 (-3)   1110 (-2)
0001 (1)    1110 (-2)   1111 (-1)
0000 (0)    1111 (-1)   0000 (0)
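The same identity holds for Java's 32-bit int, which a few lines of code can confirm. The class and method names here are assumptions for the example.

```java
// Hypothetical demo: negation as "invert the bits, then add one".
final class NegateDemo {
    static int negate(int x) {
        return ~x + 1;
    }

    public static void main(String[] args) {
        // Walk the same values as the table, but in 32 bits.
        for (int x = 7; x >= 0; x--) {
            System.out.println(x + " -> " + negate(x) + " (expected " + (-x) + ")");
        }
    }
}
```

One edge case is worth knowing: negating Integer.MIN_VALUE this way, or with the unary minus, yields Integer.MIN_VALUE again, because its positive counterpart does not fit in an int.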

Boolean Flags and Standard Boolean Bitsets

The bit flag pattern is common knowledge and widely used in
the public APIs of GUIs. Perhaps we are writing a Java GUI
for a constrained device like a cell phone or a PDA. We
have widgets like buttons and drop-down lists that each
have a list of Boolean options. With bit flags, we can stuff a
large number of options in a single word.

Your program can pass around a whole group of Boolean
variables in no time with a simple assignment expression.
Perhaps the API states that the user can have only one of
borderStyleA, borderStyleB,
borderStyleC, or borderStyleD at
the same time. To check, first select those four bits with a
mask and second, check to see that the result has, at most,
one bit. The code below uses a little trick we will explain
soon.

Here temp holds the four masked border-style bits, and
rightmostBit is the rightmost 1-bit of temp. If temp is not equal to
rightmostBit, then temp must
have more than one bit set, because rightmostBit is
zero when temp is zero and otherwise
contains exactly one bit.

if (temp != rightmostBit)
throw new IllegalArgumentException();
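Putting the pieces together, a minimal sketch of the whole check might look like the following. The flag values, the unrelated bold flag, and the class and method names are assumptions for illustration, since the article does not give them. The expression temp & (-temp) isolates the rightmost 1-bit, as listed in the operations table later in this article.

```java
// Hypothetical flag set for a widget; the bit assignments are made up.
final class BorderFlags {
    static final int borderStyleA = 1 << 0;
    static final int borderStyleB = 1 << 1;
    static final int borderStyleC = 1 << 2;
    static final int borderStyleD = 1 << 3;
    static final int bold         = 1 << 4; // an unrelated option

    static final int BORDER_MASK =
        borderStyleA | borderStyleB | borderStyleC | borderStyleD;

    // True if options selects zero or one of the four border styles.
    static boolean hasAtMostOneBorderStyle(int options) {
        int temp = options & BORDER_MASK;   // keep only the border bits
        int rightmostBit = temp & (-temp);  // 0, or exactly one bit
        return temp == rightmostBit;
    }

    // Throws if more than one border style is selected.
    static void checkBorderStyle(int options) {
        if (!hasAtMostOneBorderStyle(options))
            throw new IllegalArgumentException("more than one border style");
    }

    public static void main(String[] args) {
        checkBorderStyle(borderStyleC | bold);             // passes
        System.out.println(hasAtMostOneBorderStyle(borderStyleA | borderStyleD));
    }
}
```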

The example above is a toy example. In the real world, AWT
and Swing do use the bit flag pattern, but
inconsistently. java.awt.geom.AffineTransform
uses it extensively. java.awt.Font uses it, as
does java.awt.InputEvent.

Some Common Operations and the New JDK 1.5 Methods

To get very far with bitsets, you need to know the standard
"tricks," or operations you can perform. There are new
bitset API methods as of the J2SE 5.0 (Tiger) release. If you're using
an older release, you can just cut and paste the new methods
into your code. A recent book with a lot of material on
bitwise algorithms is Hacker's Delight by Henry S.
Warren, Jr. (See
www.hackersdelight.org or read
the book on Safari.)

The following table shows some operations that can be done either with
a line of code or one of the API methods:

Operation                                                       Code
y= all 0 bits                                                   y= 0;
y= all 1 bits                                                   y= -1;
y= all zeros except for the rightmost or least significant bit  y= 1;
y= all zeros except for the leftmost or sign bit                y= Integer.MIN_VALUE;
y= the rightmost 1-bit of x                                     y= x & (-x);
y= the leftmost 1-bit of x                                      y= Integer.highestOneBit(x);
y= the rightmost 0-bit of x                                     y= ~x & (x + 1);
y= x with the rightmost 1-bit turned off                        y= x & (x - 1);
y= x with the rightmost 0-bit turned on                         y= x | (x + 1);
y= the number of leading zeros in x                             y= Integer.numberOfLeadingZeros(x);
y= the number of trailing zeros in x                            y= Integer.numberOfTrailingZeros(x);
y= the number of 1 bits in x                                    y= Integer.bitCount(x);
y= x with the bits reversed                                     y= Integer.reverse(x);
y= x after a rotated shift left by c units                      y= Integer.rotateLeft(x,c);
y= x after a rotated shift right by c units                     y= Integer.rotateRight(x,c);
y= x with the bytes reversed                                    y= Integer.reverseBytes(x);
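To see the operations in action, the sketch below applies several of them to the sample value 0x58 (binary 0101 1000). The class name, the wrapper method names, and the sample value are assumptions for illustration; the Integer methods are those added in J2SE 5.0.

```java
// Hypothetical demo of the one-line bit tricks and the J2SE 5.0 Integer methods.
final class BitTricks {
    static int rightmostOneBit(int x)      { return x & (-x); }
    static int rightmostZeroBit(int x)     { return ~x & (x + 1); }
    static int clearRightmostOneBit(int x) { return x & (x - 1); }
    static int setRightmostZeroBit(int x)  { return x | (x + 1); }

    public static void main(String[] args) {
        int x = 0x58; // binary 0101 1000
        System.out.println(Integer.toBinaryString(rightmostOneBit(x)));
        System.out.println(Integer.toBinaryString(rightmostZeroBit(x)));
        System.out.println(Integer.toBinaryString(clearRightmostOneBit(x)));
        System.out.println(Integer.toBinaryString(setRightmostZeroBit(x)));
        System.out.println(Integer.toBinaryString(Integer.highestOneBit(x)));
        System.out.println(Integer.numberOfLeadingZeros(x));
        System.out.println(Integer.numberOfTrailingZeros(x));
        System.out.println(Integer.bitCount(x));
        System.out.println(Integer.toHexString(Integer.rotateLeft(x, 4)));
        System.out.println(Integer.toHexString(Integer.reverseBytes(x)));
    }
}
```

J2SE 5.0 also provides Integer.lowestOneBit(x), which returns the same value as x & (-x).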

To get an idea of how long these methods take, one can step
through the source. Some methods are more cryptic than
others. They are all explained in Hacker's Delight. They
are either one-liners or a few lines long, like
highestOneBit(int) below.