MARS

MARS is IBM's entrant into the AES competition. As IBM is
the company that designed DES, this alone makes this entry of
particular interest. Note, incidentally, that as the AES process
only requires companies to allow free use of the proposed algorithm
if it is selected as the successful candidate, some AES candidates
are protected by patents, both to retain control of the cipher itself
if it is not selected, and to control the basic technology on which
the cipher is based, so that licensing would still be required for
larger variants of the cipher, for example. Among the entrants I have
examined so far, this applies to RC6 and it had also applied to MARS,
but IBM later made MARS available for licensing free of charge.

The document describing MARS, accessible at
http://www.research.ibm.com/security/mars.html,
notes that "little-endian" conventions,
with the "first" byte of a 32-bit word being its least significant
byte, are used in the cipher. Unlike the documentation of some of
the other little-endian designs, the MARS documentation makes it
completely clear and unambiguous which byte goes where.

As the little-endian case is difficult to describe, I am going to
use big-endian conventions consistently in my description of MARS.
Essentially, this means that the very first thing one does when
starting MARS as I describe it is divide the 128-bit block of plaintext
into four words of four bytes, and reverse the order of the four bytes
in each word. The same has to be done on exit.

Overview

The overall structure of MARS is as follows:

First, key material is XORed with the whole block.

Then, eight rounds of a transformation similar to DES are applied,
but that transformation is fixed, and without any part that is affected
by a key. (Note that if not for the initial XOR of key material,
this transformation would be a waste of time.)

Then, there are sixteen rounds (the last eight are called "reverse"
rounds, with a rationale somewhat like the one seen in Skipjack)
which constitute the "cryptographic core" of the cipher. One
32-bit word is used to modify the other three, by being split into
three copies of itself, and subjected to various manipulations,
including one multiplication by key material, one S-box lookup,
and two data-dependent rotations.

Then we have another unkeyed transformation, and another XOR of
key material.

There are 40 32-bit words of subkeys, which are generated by
a kind of shift-register method from the key.

As its documentation points out, the design of MARS is oriented
around having structures that are
secure, but which can also be analyzed, so that it is possible to
be confident of the security the cipher possesses. Hard-to-analyze
structures that might offer more security, but which would be hard
to be certain of, were specifically avoided. The unkeyed rounds
of mixing at the start and end of the cipher were, in a way, an
exception to that rule; this is why they were left unkeyed (although
that may well seem bizarre and wasteful), to ensure that they didn't
form, in some sense, a "real" exception to that rule. (It would seem to me
that if the key schedule of MARS were very strong, i.e., like that of
Blowfish, or if the key used for the outer rounds, presumably XORed with
the inputs to the S-boxes, were independent of the key for the rest of
the cipher, there would be no such concern to prevent keying these rounds.
And the encrypting speed of MARS would not be affected significantly, either.)

Detailed Structure of MARS

The diagrams I will use here show the words of the block with
the first word on the left, and with the most significant byte of
each word on the left. Thus, before starting, and after finishing,
the bytes of the block being enciphered must be transposed to the
following order, if considered as numbered from 0 to 15:

3 2 1 0 7 6 5 4 11 10 9 8 15 14 13 12

to convert from little-endian to big-endian. With this conversion
applied, the procedure I am describing in big-endian form will be correct.
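
The byte transposition above can be sketched in a few lines of Python.
This is only an illustration; the function name swap_word_bytes is
made up for this example and does not come from the MARS documentation.

```python
import struct

def swap_word_bytes(block: bytes) -> bytes:
    """Reverse the byte order inside each 32-bit word of a 16-byte block."""
    if len(block) != 16:
        raise ValueError("MARS operates on 16-byte blocks")
    # Read four little-endian words, then write them back big-endian.
    words = struct.unpack("<4I", block)
    return struct.pack(">4I", *words)

# Bytes numbered 0 to 15 come out in the order listed above.
print(list(swap_word_bytes(bytes(range(16)))))
```

The same function serves on entry and on exit, since applying it
twice restores the original byte order.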

The first step in MARS encipherment is to XOR the first four
subkey words, K0 to K3, with the four words of the block. (Of course
this can be thought of as the 128-bit XOR of a single 128-bit subkey,
but this keeps the notation for subkeys consistent.)

The four bytes of the first word in the block, W0, are used as inputs
to the two fixed S-boxes used by this cipher with eight bits of input and
32 bits of output. The second word is XORed with the word in S0 chosen by
the least significant byte of the first word and then the word in S1 chosen
by the third, or second least significant, byte of the first word is added
to it. The word in S0 chosen by the second byte of the first word is added
to the third word. The fourth word is XORed with the word in S1 chosen by
the first byte of the first word.

In the first and fifth round of this, the fourth word is added to the
first word, and in the second and sixth round of this, the second word is
added to the first word. This additional complication is intended to protect
against differential cryptanalysis.

Then the words are rotated, so that the old first word becomes the new fourth
word, and the other words are moved to the position immediately preceding their
old position.
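
As a rough sketch, one forward mixing round as described in the three
paragraphs above might look like this in Python. It models only the
steps named in the prose (anything shown only in the diagram is left
out), it takes the two S-boxes as parameters because their contents
are not reproduced here, and the function name and the zero-based
round_index are assumptions made for the example.

```python
MASK = 0xFFFFFFFF  # keep every result within 32 bits

def forward_mixing_round(w, round_index, s0, s1):
    """One unkeyed forward mixing round, following the prose only.

    w is a list of four 32-bit words in big-endian convention;
    s0 and s1 are 256-entry tables of 32-bit words (any tables will
    do for experimenting; the real MARS S-boxes are not included).
    round_index counts from 0, so "first and fifth" means 0 and 4.
    """
    w0, w1, w2, w3 = w
    first = (w0 >> 24) & 0xFF   # first (most significant) byte
    second = (w0 >> 16) & 0xFF  # second byte
    third = (w0 >> 8) & 0xFF    # third, or second least significant
    last = w0 & 0xFF            # least significant byte

    w1 ^= s0[last]
    w1 = (w1 + s1[third]) & MASK
    w2 = (w2 + s0[second]) & MASK
    w3 ^= s1[first]

    # The extra additions in rounds 1 and 5, and in rounds 2 and 6.
    if round_index in (0, 4):
        w0 = (w0 + w3) & MASK
    elif round_index in (1, 5):
        w0 = (w0 + w1) & MASK

    # Rotate the words: the old first word becomes the new fourth.
    return [w1, w2, w3, w0]
```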

This is shown clearly in the diagram of MARS at the right side of
this page, above, which outlines the general structure of MARS.

Also on this diagram are the next sixteen rounds, which constitute the
"cryptographic core" of MARS. The E function produces three 32-bit outputs
from the first word of the block and two 32-bit subkeys. The 13-bit circular
left shift shown as being applied to the first word would be duplicated inside
the E-function if it really received only one input; that oversimplification is
avoided in the following diagram,

which shows one of the forward core rounds (which I have chosen to label
as type D in the first diagram for ease of reference) in detail.

The final eight rounds of MARS are the inverse, in the sense of
the operations performed, of the first eight rounds of unkeyed mixing;
subtraction replaces each addition, and the rounds are performed
in reverse order. However, the direction in which the four words are
rotated after each round is not reversed, so these rounds are not an
exact inverse of the first eight rounds in an overall sense.

S(0,1) in the diagram stands for an S-box with nine input bits and 32
output bits which is merely the concatenation of S0 and S1.
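
In code, the combined table is nothing more than the two smaller
tables placed end to end, so that the top bit of the nine-bit index
selects between S0 and S1. A small Python illustration, with a
made-up function name and with the tables passed in as parameters:

```python
def combined_sbox(s0, s1):
    """Concatenate two 256-entry tables into one 512-entry table."""
    if len(s0) != 256 or len(s1) != 256:
        raise ValueError("each S-box must have 256 entries")
    # Indexes 0..255 reach s0; indexes 256..511 reach s1.
    return list(s0) + list(s1)
```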

The S-boxes in MARS are as follows (no, I didn't type them in myself; I
must gratefully acknowledge the C implementation of MARS by Brian Gladman
as my source):

The Key Schedule

The key for MARS may be from 4 to 39 words in length. These words are
considered to be little-endian in format, so the first byte of the key is
the least significant byte of the first word.

To generate the key, we use an array of words which is considered to
have subscripts ranging from -7 to 39. The first seven words of this
array, numbered from -7 to -1, are filled with the first seven words in
the S-box (or the first seven words in S-box zero).

Then, words 0 to 38 of the array are initialized in order with
the following quantity:

The XOR of the following four items:

The word in the position of the array seven places earlier

The word in the position of the array two places earlier, shifted
three bits left

A word of the key (starting with the first word, and using the words
of the key in rotation)

The number of the array word being calculated (from 0 to 38)

Word 39 of the array is loaded with the number of words in the external
key.

At this point, we forget words -7 through -1 of the array, and treat
the array as containing only words 0 through 39.

Then, seven times over, starting from word 1 of the array, add to each
word of the array the S-box entry indicated by the most significant nine
bits of the preceding element of the array. Note that word 39 is
considered as preceding word 0, the last word to be modified by this
loop.

Finally, word i of this temporary array, for i from 0 to 39, becomes
subkey (7 * i) modulo 40.
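
The whole procedure can be put into a short Python sketch. It follows
the prose above literally (a plain left shift by three bits, and the
most significant nine bits as the S-box index) and has not been
checked against the IBM specification or its test vectors, so it
should be read as an illustration of the description rather than as
a reference implementation. The function name is made up, and the
512-entry S-box is passed in as a parameter.

```python
MASK = 0xFFFFFFFF  # keep every result within 32 bits

def original_key_schedule(key_words, sbox):
    """Expand 4 to 39 key words into 40 subkey words, as described above."""
    n = len(key_words)
    if not 4 <= n <= 39:
        raise ValueError("key must be 4 to 39 words long")

    # t[j] holds array word j - 7, so t[0..6] are words -7 to -1.
    t = [sbox[j] & MASK for j in range(7)] + [0] * 40

    # Words 0 to 38: XOR of the four items listed in the text.
    for i in range(39):
        seven_back = t[i]                    # word i - 7
        two_back = (t[i + 5] << 3) & MASK    # word i - 2, shifted left 3
        t[i + 7] = (seven_back ^ two_back ^ key_words[i % n] ^ i) & MASK
    t[39 + 7] = n                            # word 39: key length in words

    t = t[7:]                                # forget words -7 to -1

    # Seven passes, from word 1 round to word 0 (preceded by word 39).
    for _ in range(7):
        for i in list(range(1, 40)) + [0]:
            t[i] = (t[i] + sbox[t[i - 1] >> 23]) & MASK

    # Word i of the array becomes subkey (7 * i) modulo 40.
    subkeys = [0] * 40
    for i in range(40):
        subkeys[(7 * i) % 40] = t[i]
    return subkeys
```

Since 7 and 40 have no common factor, (7 * i) modulo 40 takes each
value from 0 to 39 exactly once, so every subkey position is filled.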

A proposed change to the definition of MARS is
to replace this key generation procedure, which generates 40 subkey
words in one step, with four instances of the following, each of
which generates 10 subkey words:

To generate the key, we use an array of words which is considered to
have subscripts ranging from 0 to 14. The key is placed in the array
starting with word 0, the word in the array immediately following the
last word of the key is filled with the number of words in the key, and
the remainder of the array is filled with zeroes.

What follows is done four times, each time producing 10 words of
subkey.

First, words 0 to 14 of the array are replaced, in order, with
the following quantity:

The XOR of the following five items:

The word at the current position of the array.

The word in the position of the array seven places earlier (modulo
15, so that word 8 is seven places earlier than word 0)

The word in the position of the array two places earlier (again
modulo 15), shifted three bits left

The number of the array word being calculated (from 0 to 14) shifted
two bits left

The number, from 0 to 3, of the instance of this procedure. (That is,
0 when we are generating the first 10 subkey words, 1 when we are generating
the second group of 10 subkey words, and so on.)

Then, four times over, starting from word 1 of the array, add to each
word of the array the S-box entry indicated by the most significant nine
bits of the preceding element of the array. Note that word 14 is
considered as preceding word 0, the last word to be modified by this
loop.

Finally, word (4 * i) modulo 15 of this temporary array, for i from 0 to 9,
becomes subkey i plus 10 times the iteration number (in the first
iteration, considered iteration number zero, we generate subkeys 0 through 9,
in the second iteration we generate subkeys 10 through 19, and so on).
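
A Python sketch of the revised procedure follows. As with any such
transcription, it follows the prose above literally (plain left
shifts, the most significant nine bits as the S-box index) and is not
verified against official test vectors. It takes the word number in
the XOR to run over all fifteen array positions, and it limits the
key to 14 words because the 15-word array must also hold the length
word; the function name and the S-box parameter are assumptions made
for the example.

```python
MASK = 0xFFFFFFFF  # keep every result within 32 bits

def revised_key_schedule(key_words, sbox):
    """Produce 40 subkey words in four batches of ten, as described above."""
    n = len(key_words)
    if not 4 <= n <= 14:
        raise ValueError("key must be 4 to 14 words long for a 15-word array")

    # Key words, then the key length, then zeroes up to 15 words.
    t = [k & MASK for k in key_words] + [n] + [0] * (14 - n)
    subkeys = [0] * 40

    for j in range(4):                        # instance number, 0 to 3
        # Update words 0 to 14 in order, in place.
        for i in range(15):
            seven_back = t[(i - 7) % 15]
            two_back = (t[(i - 2) % 15] << 3) & MASK
            t[i] = (t[i] ^ seven_back ^ two_back ^ (i << 2) ^ j) & MASK

        # Four passes, from word 1 round to word 0 (preceded by word 14).
        for _ in range(4):
            for i in list(range(1, 15)) + [0]:
                t[i] = (t[i] + sbox[t[i - 1] >> 23]) & MASK

        # Word (4 * i) modulo 15 becomes subkey 10 * j + i.
        for i in range(10):
            subkeys[10 * j + i] = t[(4 * i) % 15]
    return subkeys
```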

Subkeys 5, 7, 9, 11, ... to 35 are used to multiply the first word of
the block within the E function. These subkey values are modified if
necessary to ensure that they are good values for multiplication.

The method of correcting these subkeys is somewhat involved, and goes
as follows:

Call the original value of the subkey SR.

Let SM be the bitwise OR of SR and 3.

Let IX be the bitwise AND of SR and 3.

Create MA such that only those bits in MA corresponding to a run of ten
or more consecutive equal bits (all zeroes or all ones) in SM are ones,
excluding the first and last bit of each run. (If MA is zero, SR does
not need to be altered, so you can quit.)

Originally, the reference code implementing
this procedure left the first bit of MA equal to one if the first
bit of SM was zero. A proposed modification to MARS is to change this to
conform to the written description.

Let MA be MA and FFFFFFFC - that is, set its last two bits to zero.

Use IX to select an element from this array (containing four values
from the S-boxes):

0: A4A8D57B
1: 5B5D193B
2: C8A8309B
3: 73F9A978

then rotate the value right by the amount indicated by the element
three places ahead in the array of words of internal key, and then modify
the current key word by XORing it with that result ANDed with MA.

(The proposed modification changes "three places ahead" to "one place
back". Since only odd-numbered subkey words are modified, this will
always lead to a word in the same group of 10 words as the one being
modified.)
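
The construction of the mask MA is the fiddly part of this procedure,
so here is a Python sketch of that step alone. It reads "run" as a
stretch of consecutive equal bits, zeroes or ones, and it follows the
written description rather than the behaviour of the original
reference code. The function name is made up for the example, and the
rotation and final XOR are deliberately left out.

```python
def multiplication_mask(sr):
    """Return (SM, IX, MA) for a subkey value SR, per the steps above."""
    sm = sr | 3          # SM: SR with its last two bits set
    ix = sr & 3          # IX: the last two bits of SR

    # List the bits of SM from most significant to least significant.
    bits = [(sm >> (31 - k)) & 1 for k in range(32)]

    ma = 0
    start = 0
    while start < 32:
        # Find the end of the run of equal bits beginning at start.
        end = start
        while end + 1 < 32 and bits[end + 1] == bits[start]:
            end += 1
        if end - start + 1 >= 10:
            # Mark the run, excluding its first and last bit.
            for k in range(start + 1, end):
                ma |= 1 << (31 - k)
        start = end + 1

    ma &= 0xFFFFFFFC     # set the last two bits of MA to zero
    return sm, ix, ma
```

For example, a subkey of zero gives an SM of 00000003, which begins
with a run of thirty zero bits; the interior of that run gives an MA
of 7FFFFFF8.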

Comments

As noted above, the unkeyed mixing rounds of MARS seemed to me
to be somewhat wasteful. However, I was one of what seemed to be
only a few people who liked MARS, thinking it one of the likeliest
candidates to remain secure for a considerable time to come. Originally,
I could not think of a good way to modify the key schedule to produce
keys for the mixing rounds. Once the key schedule was modified,
however, I saw a natural way to do this, which I described in a comment
to Round 2 of the AES process. However, a second comment in which I
made a correction to that proposal got garbled.

My proposal, as corrected, is:

Using the new key schedule, after generating subkeys 30 to 39,
the process that is applied to the array of 15 words to generate
a batch of 10 subkeys is to be performed once again. Once this
is done, the contents of that array, the elements of which are
known as T[0] through T[14], are to be modified as follows:

The intent of this is to cause the contents of the array T to
be a non-invertible function of the key. This is done by XORing the
contents of T after an iteration of the invertible transformation
function with values generated from it previous to that iteration.

The values chosen are all values used as subkey values, so that
additional memory is not required to retain other information to
be used for this purpose. For that reason, the subkeys that were modified
in order to be used for the multiplication portion of the E function in
the cryptographic core rounds of MARS are also avoided.

Then, two additional iterations of the process previously used
to generate a set of 10 subkey words are performed, but from each of them
only eight subkeys are now taken. Each subkey is used to provide four
bytes to XOR to the values used to index into the MARS S-boxes in
one of the mixing rounds.

Because there are now seven, rather than four, instances of the
procedure to modify the contents of the array T, the number of the
array word being calculated, when XORed with the contents of that word,
should be shifted three bits left instead of two bits left.

In my proposal, I did not specify how to take eight subkeys
from each of the last two key generation phases.
I now suggest that, instead of choosing the subkeys based on (4 * i) mod 15,
for each of these last two phases, the eight subkeys taken should simply be
from words T[8], T[9], ... T[14], T[0], thus always using the last
ones modified, and being different from the scheme used for the rest of
MARS. However, if the subkeys used for the first group of mixing rounds
are subkeys 40 to 47 in order, and for the second group 48 to 55
in order, the words taken from these rounds should be allocated to the
subkeys as follows: