Bitslicing with Quine-McCluskey

Data Orthogonalization for Cryptography

August 27, 2018

Part one gave a short introduction of bitslicing as a concept, talked about
its use cases, truth tables, software multiplexers, LUTs, and manual optimization.

The second covered Karnaugh mapping,
a visual method to simplify Boolean algebra expressions that takes advantage of
humans’ pattern-recognition capability, but is unfortunately limited to at most
four inputs in its original variant.

Part three will introduce the Quine-McCluskey algorithm,
a tabulation method that, in combination with Petrick’s method,
can minimize circuits with an arbitrary number of input values. Both are relatively simple to implement in software.

The Quine-McCluskey algorithm

Without much ado, we’ll jump right in and bitslice functions for both its
output bits in parallel. You’ll probably recognize a few similarities to K-maps,
except that the steps are rather mechanical and don’t require visual
pattern-recognition abilities.

Step 1: Listing minterms

The lookup table SBOX[] can be expressed as the Boolean functions
fL(a,b,c) and fR(a,b,c). Here are their truth tables,
with each combination of inputs assigned a symbol mi. Rows
m0-m7 will be called minterms.

fL(a,b,c)

a

b

c

fL

m0

0

0

0

0

m1

0

0

1

0

m2

0

1

0

1

m3

0

1

1

0

m4

1

0

0

1

m5

1

0

1

1

m6

1

1

0

1

m7

1

1

1

0

fR(a,b,c)

a

b

c

fR

m0

0

0

0

1

m1

0

0

1

0

m2

0

1

0

1

m3

0

1

1

1

m4

1

0

0

0

m5

1

0

1

0

m6

1

1

0

1

m7

1

1

1

0

We’re interested only in the minterms where the function evaluates to 1 and
will ignore all others. Boolean functions can already be constructed with just
those tables. In Boolean algebra,
OR can be expressed as addition, AND as multiplication. The negation of x
is represented by x.

Well, that’s a start. Translated into C, these functions would be constant-time
but not even close to minimal.

Step 2: Bit Buckets

Now that we have all these minterms, we’ll put them in separate buckets based
on the number of 1s in their inputs a, b, and c.

fL(a,b,c)

# of 1s

minterm

binary

1

m2

010

m4

100

2

m5

101

m6

110

fR(a,b,c)

# of 1s

minterm

binary

0

m0

000

1

m2

010

2

m3

011

m6

110

The reasoning here is the same as the Gray code
ordering for Karnaugh maps. If we start with the minterms in the first bucket n,
only bucket n+1 might contain matching minterms where only a single variable
changes. They can’t be in any of the other buckets.

Step 3: Merging minterms

Why would you even look for pairs of minterms with a one-variable difference?
Because they can be merged to simplify our expression. These combinations are
called minterms of size 2.

All minterms have output 1, so if the only difference is exactly one input
variable, then the output is independent of it. For example, (a & ~b & c) | (a & b & c)
can be reduced to just a & c, the expression value is independent of b.

fL(a,b,c)

# of 1s

minterm

binary

size-2

1

m2

010

m2,6

—10

m4

100

m4,5

10—

m4,6

1—0

2

m5

101

m6

110

fR(a,b,c)

# of 1s

minterm

binary

size-2

0

m0

000

m0,2

0—0

1

m2

010

m2,3

01—

m2,6

—10

2

m3

011

m6

110

Always start with the minterms in the very first bucket at the top of the table.
For every minterm in bucket n, we try to find a minterm in bucket n+1 with a
one-bit difference in the binary column. Any matches will be recorded as pairs
and entered into the size-2 column of bucket n.

m2=010 and m6=110 for example differ in only the first
input variable, a. They merge into m2,6=—10, with a dash marking
the position of the irrelevant input bit.

Once all minterms were combined (as far as possible), we’ll continue with the
next size. Minterms of size bigger than 1 have dashes for irrelevant input bits
and it’s important to treat those as a “third bit value”. In other words, their
dashes must be at the same positions, otherwise they can’t be merged.

There’s nothing left to merge for fL(a,b,c) as all
its size-2 minterms are in the first bucket. For fR(a,b,c), none
of the size-2 minterms in the first bucket match any of those in the second,
their dashes are all in different positions.

Step 4: Prime Implicants

All minterms from the previous step that can’t be combined any further are
called prime implicants. Entering them into a table let’s us check how well
they cover the original minterms determined by step 1.

If any prime implicant is the only one to cover a minterm, it’s called an
essential prime implicant (marked with an asterisk). It’s essential because
it must be included in the resulting minimal form, otherwise we’d miss one of
the input values combinations.

fL(a,b,c)

m2

m4

m5

m6

abc

m2,6*

x

x

-10

m4,5*

x

x

10-

m4,6

x

x

1-0

fR(a,b,c)

m0

m2

m3

m6

abc

m0,2*

x

x

0-0

m2,3*

x

x

01-

m2,6*

x

x

-10

Prime implicant m2,6* on the left for example is the only one that
covers m2. m4,5* is the only one that covers
m5. Not only is m4,6 not essential, but we actually
don’t need it at all: m4 and m6 are already covered
by the essential prime implicants. All prime implicants of fR(a,b,c)
are essential, so we need all of them.

When bitslicing functions with many input variables it may happen that you are
left with a number of non-essential prime implicants that can be combined in
various ways to cover the missing minterms. Petrick’s method
helps finding a minimum solution. It’s tedious to do manually, but not hard to
automate.

Step 5: Minimal Forms

Finally, we derive minimal forms of our Boolean functions by looking at the abc
column of the essential prime implicants. Input variables marked with dashes
are ignored.

fL(a,b,c) = m2,6 + m4,5 = bc + ab

The code for SBOXL() with 8-bit inputs:

uint8_tSBOXL(uint8_ta,uint8_tb,uint8_tc){return(b&~c)|(a&~b);}

fR(a,b,c), reduced to the combination of its three essential prime implicants:

Bitslicing a DES S-box

When I started writing this blog post I thought it would be nice to ditch the
small S-box from the previous posts, and naively bitslice a “real” S-box, like
the ones used in DES.

But these are 6-to-4-bit S-boxes, how much more effort can it be? As it turns out,
humans are terrible at understanding exponential growth. Here are my intermediate
results after an hour of writing, trying to bitslice just one of the four output
bits:

I gave up when I spotted a few mistakes that would likely lead to a non-minimal
solution. Bitslicing a function with that many input variables manually is
laborious and probably not worth it, except that it definitely helped me
understand the steps of the algorithm better.

As mentioned in the beginning, Quine-McCluskey and Petrick’s method can be
implemented in software rather easily, so that’s what I did instead. I’ll
explain how, and what to consider, in the next post.