Nonlinearity measurement is particularly useful to quantify the
strength of invertible substitution tables.
This is important when pre-defined tables are a part of a
cipher definition.
But nonlinearity measurement can be even more important in the
context of scalable ciphers:
When ciphers can be scaled down to experimental size, it becomes
possible to talk about the overall nonlinearity (for each
key) of the cipher itself.
This is far more information than we usually have on cipher designs.

Affine Boolean Functions

A Boolean function produces a single-bit result for each
possible combination of values from perhaps many Boolean variables.
The Boolean field consists of the values {0,1}, with
XOR as "addition" and AND as "multiplication."

A Boolean function is affine if it can be written as

f = a_n*x_n + ... + a_1*x_1 + a_0

In the Boolean field, a constant or a_0 value of '1'
inverts or reverses the result, while a constant of '0' has
no effect. The coefficients a_i simply enable or
disable the associated variable x_i. And if we
consider the collected coefficients to be a counting binary value,
we have a unique ordering for affine Boolean functions.

In this way, we can write 16 different forms for 3 variables.
But it is convenient to pair the functions which are the same except
for the value of the constant, and then we have exactly 8 affine
Boolean functions of 3 variables. Each of these has a particular
value for every possible combination of variable values, which we
can show in a truth table.
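The ordering and truth table described above can be sketched in Python. This is only an illustration under stated assumptions: the helper name affine_truth_table is hypothetical, and so are the conventions that bit i-1 of the coefficient value holds a_i and that truth-table position x holds the result for the input whose bits are x.

```python
def affine_truth_table(coeffs, const, nvars):
    """Truth table of a_n*x_n + ... + a_1*x_1 + a_0 over {0,1}.

    coeffs: integer whose bit i-1 is the coefficient a_i (assumed convention)
    const:  the constant a_0 (0 or 1)
    """
    table = []
    for x in range(1 << nvars):
        # AND enables the selected variables; the XOR "sum" is their parity.
        bit = bin(coeffs & x).count("1") & 1
        table.append(bit ^ const)
    return table


NVARS = 3
# The 8 affine functions of 3 variables with constant 0, in counting order.
# Setting const=1 gives the complement of each row, for 16 forms in all.
for coeffs in range(1 << NVARS):
    row = affine_truth_table(coeffs, 0, NVARS)
    print(format(coeffs, "03b"), " ".join(map(str, row)))
```

Each printed row is one affine function; the 3-bit label on the left is the coefficient pattern that defines it.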

Unexpected Distance

One way to measure a sort of "correlation" between two Boolean
functions is to compare their truth tables and count the number
of bits which differ; this is their
Hamming distance.

Since we expect about half the bit positions to differ
(on average), we can subtract that expected distance and come up with
what I am calling -- for lack of a better term -- the "unexpected
distance" (UD). The magnitude of the UD relates to just how
unexpected the distance is, while the sign indicates the direction.
Consider two functions and their difference:
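A minimal Python sketch of this calculation follows. The function names are illustrative, and the sequences are chosen here as assumptions: f is the 8-bit function transformed by hand later in this article, while g and h are the affine functions x_1 and x_2 under an assumed counting order.

```python
def hamming_distance(a, b):
    """Number of truth-table positions in which a and b differ."""
    return sum(x != y for x, y in zip(a, b))


def unexpected_distance(a, b):
    """Hamming distance less the expected distance (half the length)."""
    return hamming_distance(a, b) - len(a) // 2


f = [1, 0, 0, 1, 1, 1, 0, 0]
g = [0, 1, 0, 1, 0, 1, 0, 1]   # affine function x_1 (assumed ordering)
h = [0, 0, 1, 1, 0, 0, 1, 1]   # affine function x_2 (assumed ordering)

print(hamming_distance(f, g), unexpected_distance(f, g))   # 4 0
print(hamming_distance(f, h), unexpected_distance(f, h))   # 6 2
```

A UD of 0 means f looks uncorrelated with g; a UD of 2 means f is 2 bits farther from h than expected, and therefore 2 bits closer than expected to the complement of h.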

Nonlinearity

Nonlinearity is the number of bits which must change in the truth
table of a Boolean function to reach the closest affine function.
But every affine Boolean function also has a complement affine
function which has every truth table bit value reversed. This means
that no function can be more than half its length in bits away
from both an affine Boolean function and its complement.
So a zero UD value is not only what we expect, it is in fact
the best we can possibly do.

A non-zero UD value is that much closer to some affine function,
and that much less nonlinear. So the nonlinearity value is half
the length of the function, less the maximum absolute value of the
unexpected distance to each affine function.

The function f in the previous section has a length of 8 bits,
and an absolute value maximum unexpected distance of 2. This is a
nonlinearity of 4 - 2 = 2; so f has a nonlinearity of 2.
Nonlinearity is never negative (it is zero for an affine function),
and is also even (divisible by 2) if we have a balanced function.
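The definition can be checked directly, without any transform, by comparing a function against every affine function and every complement. The sketch below does that for 8-bit sequences; the name nonlinearity_by_distance is hypothetical, and f is assumed to be the sequence (1 0 0 1 1 1 0 0) used in the hand transform later in this article.

```python
def parity(v):
    """XOR of all bits in the integer v."""
    return bin(v).count("1") & 1


def nonlinearity_by_distance(f):
    """Minimum Hamming distance from f to any affine function.

    Brute force: tries every coefficient pattern w and both constants.
    """
    n = len(f)
    best = n
    for w in range(n):          # coefficient pattern
        for const in (0, 1):    # constant 1 gives the complement
            d = sum(f[x] != (parity(w & x) ^ const) for x in range(n))
            best = min(best, d)
    return best


f = [1, 0, 0, 1, 1, 1, 0, 0]
print(nonlinearity_by_distance(f))                          # 2
print(nonlinearity_by_distance([0, 1, 0, 1, 0, 1, 0, 1]))   # 0 (affine)
```

This costs on the order of n*n comparisons per function, which is the work the fast transform avoids.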

The Hadamard Matrix and Affine Functions

A Hadamard matrix H is an n x n matrix with all entries +1 or -1,
such that all rows are orthogonal and all columns are orthogonal
(see, for example, [HED78]).

The usual development (see, for example [SCH87]) starts with a
defined 2 x 2 Hadamard matrix H2 which is ((1,1),(1,-1)). Each step
consists of multiplying each element in H2 by the previous matrix,
thus negating all elements in the bottom-right entry:

So if we map the values in the affine truth table:
{0,1} -> {1,-1},
we find the same functions as in the Hadamard development.
These are the Walsh functions, and here both developments
produce the same order, which is called "natural" or "Hadamard."
Walsh functions have fast transforms which reduce the cost of
correlation computations from n*n to n log n, which can be a very
substantial reduction.
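The doubling development and the mapping {0,1} -> {1,-1} can be sketched as follows. The function names hadamard and affine_row are illustrative, and the bit-ordering convention for the affine truth tables is an assumption.

```python
def hadamard(order):
    """Doubling construction from H2 = ((1,1),(1,-1)); order is a power of 2."""
    h = [[1]]
    while len(h) < order:
        # Each entry of H2 is multiplied by the previous matrix,
        # so only the bottom-right block is negated.
        top = [row + row for row in h]
        bottom = [row + [-v for v in row] for row in h]
        h = top + bottom
    return h


def affine_row(w, n):
    """Truth table of affine function w (constant 0), mapped {0,1} -> {1,-1}."""
    return [1 - 2 * (bin(w & x).count("1") & 1) for x in range(n)]


H8 = hadamard(8)
for w, row in enumerate(H8):
    print(" ".join(f"{v:2d}" for v in row), "  same as affine:", row == affine_row(w, 8))
```

Every row of the 8 x 8 matrix matches the mapped truth table of the affine function with the same index, which is the "natural" order mentioned above.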

The Fast Walsh-Hadamard Transform

A Fast Walsh Transform (FWT) essentially computes the correlations
which we have been calling the "unexpected distance" (UD). It does
this by calculating the sum and difference of two elements at a time,
in a sequence of particular pairings, each time replacing the
elements with the calculated values.

It is easy to do a FWT by hand. (Well, I say "easy," then always
struggle when I actually do it.) Let's do the FWT of function
f = (1 0 0 1 1 1 0 0). First note that f has a binary power length,
as required. Next, each pair of elements is modified by an "in-place
butterfly"; that is, the values in each pair produce two results
which replace the original pair, wherever they were originally
located. The left result will be the two values added; the right
result will be the first value less the second. That is,

(a',b') = (a+b, a-b)

So for the values (1,0), we get (1+0, 1-0) which is just (1,1).
We start out pairing adjacent elements, then every other element,
then every 4th element, and so on until the correct pairing is
impossible:
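The pairing passes can be traced with a short Python sketch. The name fwt_stages is hypothetical; it works on a copy of the data and records the array after each pass, so the steps of the hand calculation are visible.

```python
def fwt_stages(values):
    """Fast Walsh transform by butterflies; returns the data after each pass."""
    data = list(values)
    n = len(data)
    if n == 0 or n & (n - 1):
        raise ValueError("length must be a power of 2")
    stages = []
    span = 1                      # distance between paired elements
    while span < n:
        for start in range(0, n, 2 * span):
            for i in range(start, start + span):
                a, b = data[i], data[i + span]
                data[i], data[i + span] = a + b, a - b   # (a',b') = (a+b, a-b)
        stages.append(list(data))
        span *= 2
    return stages


for stage in fwt_stages([1, 0, 0, 1, 1, 1, 0, 0]):
    print(stage)
# [1, 1, 1, -1, 2, 0, 0, 0]
# [2, 0, 0, 2, 2, 0, 2, 0]
# [4, 0, 2, 2, 0, 0, -2, 2]
```

The last line is the transform of f: the zeroth element, 4, is the count of 1-bits, and the largest magnitude among the rest is 2.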

Note that all FWT elements -- after the zeroth -- map the UD
results exactly in both magnitude and sign, and in
the exact same order. (This ordering means that the binary index
of any result is also the recipe for expressing the affine function
being compared in that position.) The zeroth element in the FWT
is the number of 1-bits in the function when we use the real values
{0,1} to represent the function.

So to find the "unexpected distance" from any balanced function
to every affine Boolean function, just compute the FWT. Clearly,
the closest affine function has the absolute value
maximum UD value of all the transformed elements past the
zeroth. Just subtract this value from half the function length
(which is the zeroth FWT value in a balanced function) to get the
nonlinearity.
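That recipe can be sketched in Python as follows. The names fwt and nonlinearity_balanced are illustrative, and the sketch assumes a balanced {0,1} sequence whose length is a power of 2 (at least 2).

```python
def fwt(values):
    """Fast Walsh transform of a power-of-2 length sequence (returns a copy)."""
    data = list(values)
    n = len(data)
    span = 1
    while span < n:
        for start in range(0, n, 2 * span):
            for i in range(start, start + span):
                a, b = data[i], data[i + span]
                data[i], data[i + span] = a + b, a - b
        span *= 2
    return data


def nonlinearity_balanced(bits):
    """Half the length, less the largest |UD| past the zeroth FWT element."""
    n = len(bits)
    spectrum = fwt(bits)
    if spectrum[0] != n // 2:
        raise ValueError("sequence is not balanced")
    max_ud = max(abs(v) for v in spectrum[1:])
    return n // 2 - max_ud


print(nonlinearity_balanced([1, 0, 0, 1, 1, 1, 0, 0]))   # 2
print(nonlinearity_balanced([0, 1, 0, 1, 0, 1, 0, 1]))   # 0 (affine)
```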

Understanding the FWT

To understand how the FWT works, suppose we label each
bit-value with a letter, and then perform a symbolic FWT:

Each of these columns is the symbolic description of one element
in the FWT result. Since each uses the same input variables in the
same order, we can represent the uniqueness of each result simply
by the sign applied to each variable:

So not only do we once again find the affine functions, we also
find them implicit in a way appropriate for computing add / subtract
correlations, thus producing UD values directly with high efficiency.
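One way to recover those sign patterns without symbolic algebra is to transform unit vectors: the transform of a sequence with a single 1 in position j shows the sign that every result applies to input j. The sketch below does this for length 8; the names are illustrative, and this is only a numeric stand-in for the symbolic development described above.

```python
def fwt(values):
    """Fast Walsh transform of a power-of-2 length sequence (returns a copy)."""
    data = list(values)
    n = len(data)
    span = 1
    while span < n:
        for start in range(0, n, 2 * span):
            for i in range(start, start + span):
                a, b = data[i], data[i + span]
                data[i], data[i + span] = a + b, a - b
        span *= 2
    return data


N = 8
# columns[j][k] is the sign that result k applies to input j.
columns = [fwt([1 if i == j else 0 for i in range(N)]) for j in range(N)]
signs = [[columns[j][k] for j in range(N)] for k in range(N)]

for k, row in enumerate(signs):
    print(k, " ".join("+" if s > 0 else "-" for s in row))
```

Reading "+" as 0 and "-" as 1, row k is the truth table of the affine function whose coefficient pattern is the binary value of k.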

LintHadFmSeqWalsh takes an array of 32-bit integers, and changes
the array data into the Walsh-Hadamard transform of that data. For
nonlinearity measures, the input data are {0,1} or {1,-1}; the
results are potentially bipolar in either case. (The "lastel"
parameter is the last index in the data array which starts at
index 0; it is thus always 2^n - 1 for some n. The ABSOLUTE
attribute forces Borland Pascal to treat the parameter as a LongInt
array of arbitrary size.)

Using {0,1} Versus {1,-1}

It is common to consider a Boolean function as consisting of the
real values {0,1}, but it is also common to use the
transformation

x' = (-1)^x

(negative 1 to the power x) where x is {0,1}. This transforms
{0,1} into {1,-1}, and for some reason looks much cooler than doing
the exact same thing with

x' = 1 - 2x

This transformation has some implications: Using real values
{1,-1} doubles the magnitude and changes the sign of the FWT
results, but can simplify nonlinearity for unbalanced functions,
because the zeroth term need not be treated specially. But if
the Boolean function is balanced, as it will be in the typical
invertible substitution table, the zeroth element need not be used
at all, so using real values {1,-1} seems to provide no particular
benefit in this application.
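The effect of the two representations can be compared with a short sketch. The function names are illustrative; the bipolar formula used here, half of (length less the largest magnitude over all elements, zeroth included), is the standard one for {1,-1} data and is stated as an assumption rather than taken from the text above.

```python
def fwt(values):
    """Fast Walsh transform of a power-of-2 length sequence (returns a copy)."""
    data = list(values)
    n = len(data)
    span = 1
    while span < n:
        for start in range(0, n, 2 * span):
            for i in range(start, start + span):
                a, b = data[i], data[i + span]
                data[i], data[i + span] = a + b, a - b
        span *= 2
    return data


def nonlinearity_bipolar(bits):
    """Nonlinearity from {1,-1} data; no special case for the zeroth term."""
    n = len(bits)
    spectrum = fwt([1 - 2 * x for x in bits])    # x' = 1 - 2x
    return (n - max(abs(v) for v in spectrum)) // 2


f = [1, 0, 0, 1, 1, 1, 0, 0]
print(fwt(f))                          # [4, 0, 2, 2, 0, 0, -2, 2]
print(fwt([1 - 2 * x for x in f]))     # [0, 0, -4, -4, 0, 0, 4, -4]
print(nonlinearity_bipolar(f))         # 2
print(nonlinearity_bipolar([0, 0, 0, 1, 0, 0, 0, 0]))   # 1 (unbalanced)
```

Past the zeroth element, each bipolar result is -2 times the {0,1} result. The last line is an unbalanced function, handled with no special treatment of the zeroth term.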

Nonlinearity in Invertible Substitution Tables

An invertible substitution table is an array of values in which
any particular value can occur at most once. If the range of the
output values is the same as the input values, then every value
occurs in the table exactly once. Typically the table has a
power-of-2 number of elements, which is related to the size in bits
of its input (and output) value. For example, an "8-bit" table has
2^8 = 256 elements, in which each value from
0 through 255 occurs exactly once.

Even these relatively small tables have remarkable keying
potential. Each invertible table differs from every other only in
the arrangement of the values it holds, but there is typically an
incredible number of possible permutations. A 2-bit
table with 2^2 = 4 elements is one of 4!
(4-factorial) or just 24 different tables. But a 4-bit
table with 2^4 = 16 elements is one of 16!
or 2.09 x 10^13 tables, roughly a 44-bit number,
and potentially a 44-bit keyspace. The usual 8-bit tables have a
1684-bit keyspace (the base-2 log of 256!), per table. When a table
is used alone as Simple Substitution, these entries are easily
resolved. But as part of a more complex block cipher, the
entries may be hidden so that the keying potential of the table
can be realized.
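The keyspace figures are the base-2 logarithm of the number of permutations, and can be checked with a few lines of Python. The function name keyspace_bits is hypothetical; math.lgamma is used so that very large factorials never need to be converted to floating point.

```python
import math


def keyspace_bits(width):
    """log2 of (2^width)!, the number of invertible tables of that width."""
    elements = 1 << width
    # lgamma(n + 1) is the natural log of n!
    return math.lgamma(elements + 1) / math.log(2)


for width in (2, 4, 8):
    elements = 1 << width
    print(f"{width}-bit table: {elements} elements, "
          f"about {keyspace_bits(width):.1f} bits of keyspace")
```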

Nonlinearity applies to Boolean functions, and so does not apply
directly to substitution tables. But each output bit from such
a table can be considered a Boolean function. So we can
run through the table extracting all the bits in a given bit
position, and then measure the nonlinearity of the function
represented by those bits.
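Extracting one bit position from every table entry takes only a shift and a mask. In the sketch below the function name bit_column and the sample 3-bit table are hypothetical, chosen only for illustration.

```python
def bit_column(table, position):
    """Boolean function formed by one output bit of every table entry.

    position 0 is the least-significant bit.
    """
    return [(value >> position) & 1 for value in table]


table = [3, 0, 5, 6, 1, 7, 4, 2]    # hypothetical 3-bit invertible table

for position in range(3):
    print(position, bit_column(table, position))
# 0 [1, 0, 1, 0, 1, 1, 0, 0]
# 1 [1, 0, 0, 1, 0, 1, 0, 1]
# 2 [0, 0, 1, 1, 0, 1, 1, 0]
```

Because the table is invertible, every column is balanced: each holds exactly four 1-bits.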

Clearly, if we measure a nonlinearity value for each output bit
position, we do not have a single nonlinearity for the table.
Several ways have been suggested to combine these values, including
the sum or the average of all values. But for cryptographic use it
may be more significant to collect the minimum nonlinearity
over all the bit positions. This allows us to argue that no bit
position in the table is weaker than the value we have.
Since a table collects multiple Boolean functions, tables tend to
be weaker than the average Boolean function of the same length.
But the nonlinearity values for tables and sequences of the same
length do tend to be similar and somewhat comparable.
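Taking the minimum over all bit positions can be sketched as below. The names fwt and table_nonlinearity are illustrative, and the sample tables are hypothetical; the sketch assumes an invertible table holding each value from 0 to length-1 exactly once.

```python
def fwt(values):
    """Fast Walsh transform of a power-of-2 length sequence (returns a copy)."""
    data = list(values)
    n = len(data)
    span = 1
    while span < n:
        for start in range(0, n, 2 * span):
            for i in range(start, start + span):
                a, b = data[i], data[i + span]
                data[i], data[i + span] = a + b, a - b
        span *= 2
    return data


def table_nonlinearity(table):
    """Return (minimum, per-bit list) of nonlinearity over all bit columns."""
    n = len(table)
    width = n.bit_length() - 1
    if n < 2 or n != 1 << width or sorted(table) != list(range(n)):
        raise ValueError("expected a permutation of 0..2^k - 1")
    per_bit = []
    for position in range(width):
        column = [(v >> position) & 1 for v in table]
        spectrum = fwt(column)
        # Columns of an invertible table are balanced, so skip element 0.
        per_bit.append(n // 2 - max(abs(s) for s in spectrum[1:]))
    return min(per_bit), per_bit


print(table_nonlinearity([3, 0, 5, 6, 1, 7, 4, 2]))   # (2, [2, 2, 2])
print(table_nonlinearity(list(range(8))))             # (0, [0, 0, 0])
```

The second table is the identity permutation, whose bit columns are the single variables themselves, so every column is affine.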

Some Table Nonlinearity Distributions

There are no nonlinear 2-bit tables. We know this because there
are exactly 6 balanced bit sequences of length 4, and each of those
has a measured nonlinearity of zero. So there is no chance to build
a nonlinear table by collecting those sequences.
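This claim is small enough to check exhaustively. The sketch below enumerates the balanced sequences of length 4 and all 24 two-bit tables; the function names are illustrative.

```python
from itertools import combinations, permutations


def fwt(values):
    """Fast Walsh transform of a power-of-2 length sequence (returns a copy)."""
    data = list(values)
    n = len(data)
    span = 1
    while span < n:
        for start in range(0, n, 2 * span):
            for i in range(start, start + span):
                a, b = data[i], data[i + span]
                data[i], data[i + span] = a + b, a - b
        span *= 2
    return data


def nonlinearity(bits):
    """Nonlinearity of a balanced {0,1} sequence."""
    spectrum = fwt(bits)
    return len(bits) // 2 - max(abs(v) for v in spectrum[1:])


# Every way to place two 1-bits in four positions.
balanced = [[1 if i in ones else 0 for i in range(4)]
            for ones in combinations(range(4), 2)]
sequence_nl = [nonlinearity(s) for s in balanced]

# Minimum column nonlinearity of every 2-bit invertible table.
table_nl = set()
for table in permutations(range(4)):
    columns = [[(v >> p) & 1 for v in table] for p in range(2)]
    table_nl.add(min(nonlinearity(c) for c in columns))

print(len(balanced), sequence_nl, table_nl)   # 6 [0, 0, 0, 0, 0, 0] {0}
```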

Here are some coarse graphs of nonlinearity distributions
at various table sizes:

Nonlinearity Measurement

Bit Width: 2 3 4 5 6 7 8

Make Table is just a convenient way to create a random
permutation and place it in the top panel. The buttons select the
size of the table.

Enter Table:

The top panel wants to see a table permutation with a
space or a comma between each element. An arbitrary table can be
entered, but the number of elements must be some power of 2
(such as: 4, 8, 16, ...).

Bit Column:

Extract LS Bits will run down the list in the top panel
and test the least-significant bits of each value to create a
bit-sequence in the bottom panel. Extract Next Bits extracts
the next most-significant bits.

First Combination creates a balanced bit-sequence of the
same length as a table (bit width 4 or less) and puts it in the
bottom panel. Next Combination steps the sequence.

The bottom panel normally holds a bit sequence, or the
transformed result, with a space or a comma between each value.
A general sequence of values can be entered and transformed, but
the number of elements must be some power of 2.

Max UD, Nonlinearity:

Transform will run a fast Walsh-Hadamard transform (FWT)
on the sequence in the bottom panel, and replace the sequence
with the results.

Overall Minimum Nonlinearity, Status:

Overall NL will extract a bit-column and run a FWT for
every bit-column of the table in the top panel. The result is the
minimum nonlinearity value over all bit columns.
Warning: With 8-bit tables this operation has taken almost
a minute to complete, and also has crashed Windows 3.1 with a
"stack overflow" message.