Diving into Gcc: OpenBSD and m88k

This article describes how the m88k-specific backend of the GNU
C compiler, gcc, was fixed, from the discovery and analysis of the
problems to the actual fixing work. Since I started with almost zero
knowledge of gcc internals, it should be understandable by anyone able to
read C code, and it shows that diving into gcc is not as hard as one
could imagine.

Most of the code snippets displayed below come from gcc 2.95 sources,
in the m88k-specific code (found inside
gcc/config/m88k/m88k.*). For more details about the gcc
internals, the reader is welcome to refer to the Resources.

Some Background History

The Motorola 88000 architecture is not well-known today. Think of it as a
bridge between the famous Motorola 68000 family and the well-known PowerPC
family. Although now a dead architecture, many fine m88k-based systems were
produced from 1988 to 1992, such as the Data General AViiON workstations and
Motorola's own embedded systems.

Due to the elegance of its design and the availability of second-hand
machines, the m88k systems became and remain quite popular among hobbyists. No
wonder that several free operating systems have been ported, or are being
ported, to these machines. The most advanced effort was the OpenBSD/mvme88k
port to the Motorola VME boards.

Nivas Madhur started the OpenBSD/mvme88k port in 1995.
Back then, OpenBSD still shipped with gcc 2.7.2.1 and local patches (those
were the days!), and the problems Nivas had to face were dire kernel bugs, so
no real effort was put into the toolchain. Optimization was disabled in order
to prevent compiler bugs, if any, from interfering.

Nivas Madhur eventually stopped working on the port. Dale
Rahn integrated it into the OpenBSD main sources, but Dale
did not have the resources to maintain it. Steve Murphree,
Jr eventually took over. At some point, OpenBSD started to
use gcc 2.8, which did not fix the optimizer problems.

Steve was very close to producing an OpenBSD/mvme88k release.
Unfortunately, some weeks before the code freeze, the in-tree gcc was updated
to egcs 1.1 and compiler problems started to plague the port: even at
the -O0 (no optimization) level, the compiler would not always output
correct code. As the kernel was compiled with -O as an exception
to the -O0 rule, people started running unreliable kernels.

At some point, factoring out the userret() code path in the
OpenBSD kernel for all architectures required kernels to be built with
optimization, since it relied on userret() being an inline function.
Unfortunately, gcc, by design, will never inline functions (even if explicitly
requested) at -O0. The point of no return had been reached: gcc had
to be fixed.

Starting Debugging

Finding and fixing compiler problems is never easy. It's like climbing a
mountain: you need a good rope, and good assets. Moreover, a debugger is
mostly useless: you're not trying to find why code behaves incorrectly,
but rather what causes gcc to produce incorrect code.

In my case, I made sure to keep a known working gcc 2.8 binary in a safe
place, which could be used to bootstrap gcc 2.95 first, then as a working
reference. I also made sure I had a good backup.

Stack Me Harder

After compiling gcc 2.95 with gcc 2.8, my first test was to compile and
run a kernel and hope for the best. After a long compilation, my hopes
were smashed very quickly: the kernel failed very early in an
assertion:

dropping me into the debugger. Yet the function parameters from the
traceback were apparently correct!

From the assertion message and the debugger traceback, I could easily
reconstruct the code flow. Since the assertion failure happened in a
uvm_pagealloc_strat invocation, I built the following simple
program to reproduce a similar flow:

More interestingly, the assertion would only fail for the first
call but not afterwards. This hinted at either incorrect
stack usage or incorrect register usage, which the side effects
of subsequent function calls would eventually correct.

After some tinkering, I finally ended up with this interesting sample
program:

The problem was apparently tied to the use of a 64-bit argument, but only in
some cases. Why?

Let's examine the m88k calling convention for these routines. The
canonical m88k calling convention mandates that arguments are passed
in registers r2 to r9, with extra arguments
passed on the stack. If an argument cannot fit in one register (such as
a double, or an int64_t), it is put in two
consecutive registers starting at an even-numbered register, so that
double word load and store instructions can be used. In our case, the
calling convention would be:

Calling convention for even64()

r2 - oddmaker
r3 - evenmaker
r4, r5 - stamper
r6 - value

Calling convention for odd64()

r2 - oddmaker
r3 (wasted)
r4, r5 - stamper
r6 - value
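
The two tables above can be reproduced with a small model of the register
assignment rule. This is only a sketch of the convention as described in
this article, not code from gcc: the helper name assign_registers and the
constants are hypothetical, argument sizes are given in 32-bit words, and
the sketch assumes that once an argument spills to the stack, all later
arguments do too.

```c
#define FIRST_ARG_REG 2     /* r2 */
#define LAST_ARG_REG  9     /* r9 */

/*
 * Hypothetical helper: words[i] is the size of argument i in 32-bit
 * words (1 or 2).  On return, reg[i] holds the first register used by
 * argument i, or -1 if the argument is passed on the stack.
 */
static void
assign_registers(const int *words, int nargs, int *reg)
{
    int next = FIRST_ARG_REG;
    int i;

    for (i = 0; i < nargs; i++) {
        /* A two-word argument must start at an even register. */
        if (words[i] == 2 && (next & 1))
            next++;                     /* skip (waste) the odd one */

        if (next + words[i] - 1 <= LAST_ARG_REG) {
            reg[i] = next;
            next += words[i];
        } else {
            reg[i] = -1;                /* stack */
            /* Assumption of this sketch: no registers after a spill. */
            next = LAST_ARG_REG + 1;
        }
    }
}
```

With sizes {1, 1, 2, 1} (the even64() case) the sketch yields r2, r3, r4
and r6; with {1, 2, 1} (the odd64() case) it yields r2, r4 and r6, leaving
r3 unused.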

However, looking at the code generated by gcc 2.95, odd64()
would be invoked with

r2 - oddmaker
r3 (unused)
r4, r5 - stamper
stack - value

Why would the last parameter be passed on the stack with the new compiler?

Let's dive into the gcc sources. A large piece of code in
gcc/calls.c is responsible for function invocations, choosing
where to pass arguments, whether on the stack or in a specific
register. To do this, it relies upon a set of macros provided by the
processor-dependent backend of gcc: the FUNCTION_ARG macro
set.

In this particular case, the macros misbehave as soon as they encounter
a 64-bit parameter that requires skipping, and thus wasting, an
odd-numbered register. As such, the problem probably lies in
FUNCTION_ARG or FUNCTION_ARG_ADVANCE. The first
macro decides where to put the argument, while the second one updates a
position counter so that the next FUNCTION_ARG invocation knows at
which register number or stack location to start.
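
The division of labor between the two macros can be sketched as a
query-then-advance protocol. The macros below are hypothetical stand-ins
(prefixed SKETCH_), not the real m88k definitions: they count 32-bit
argument words in a plain int, and the query returns a register number or
-1 for the stack, whereas the real FUNCTION_ARG hands an rtx back to
gcc/calls.c.

```c
#include <stddef.h>

#define SKETCH_FIRST_REG 2      /* r2 */
#define SKETCH_NUM_REGS  8      /* r2 to r9 */

/* Round CUM up to an even slot when the argument needs two words. */
#define SKETCH_ALIGN(CUM, WORDS) \
    (((WORDS) == 2 && ((CUM) & 1)) ? (CUM) + 1 : (CUM))

/* Query only: register number for the argument, or -1 for the stack. */
#define SKETCH_FUNCTION_ARG(CUM, WORDS) \
    ((SKETCH_ALIGN(CUM, WORDS) + (WORDS) <= SKETCH_NUM_REGS) \
        ? SKETCH_FIRST_REG + SKETCH_ALIGN(CUM, WORDS) : -1)

/* Update only: account for the words (and any skipped slot) just used. */
#define SKETCH_FUNCTION_ARG_ADVANCE(CUM, WORDS) \
    ((CUM) = SKETCH_ALIGN(CUM, WORDS) + (WORDS))

/* Caller-side loop, in the spirit of what gcc/calls.c does. */
static void
sketch_place_args(const int *words, size_t nargs, int *where)
{
    int cum = 0;                /* argument words consumed so far */
    size_t i;

    for (i = 0; i < nargs; i++) {
        where[i] = SKETCH_FUNCTION_ARG(cum, words[i]);
        SKETCH_FUNCTION_ARG_ADVANCE(cum, words[i]);
    }
}
```

Because the counter is the only state shared between successive
arguments, anything that corrupts it affects the placement of every
argument that follows.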

A simple grep through the gcc sources shows that gcc 2.8 invokes
FUNCTION_ARG_ADVANCE in five places:

Notice how CUM, the first parameter, is used unprotected, without
parentheses around it? Guess what happens in the CUM++; statement when
CUM happens to be *args_so_far? The statement expands to
*args_so_far++;, which increments the pointer rather than the value it
points to. Our exact problem! Compiling a call to odd64() would trigger the
CUM++; statement from the macro invocation using the
pointer. This is just one more bug caused by preprocessor-unsafe
code, all the harder to notice since it had been working correctly in
previous gcc versions.

Because of this bug, args_so_far would end up incremented,
pointing to a semi-random memory location holding a huge value.
Subsequent FUNCTION_ARG invocations would then consider all the
r2-r9 registers to be in use, and place the remaining arguments on the
stack.
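
The pitfall is easy to reproduce outside of gcc. The following sketch uses
two hypothetical macros, UNSAFE_ADVANCE and SAFE_ADVANCE, written in the
style described above; they are not the actual m88k definitions. The only
difference between them is the parentheses around the CUM parameter.

```c
#include <stddef.h>

/* CUM is used unprotected: CUM++ becomes *p++ when CUM is *p. */
#define UNSAFE_ADVANCE(CUM, WORDS) \
    do { \
        if ((WORDS) == 2 && (CUM % 2)) \
            CUM++; \
        CUM += (WORDS); \
    } while (0)

/* Same logic, with every use of CUM parenthesized. */
#define SAFE_ADVANCE(CUM, WORDS) \
    do { \
        if ((WORDS) == 2 && ((CUM) % 2)) \
            (CUM)++; \
        (CUM) += (WORDS); \
    } while (0)

/* Returns how many elements the pointer has drifted by. */
static ptrdiff_t
pointer_drift_unsafe(int *counters)
{
    int *so_far = counters;

    /* Expands to "*so_far++;": the pointer moves, not the counter. */
    UNSAFE_ADVANCE(*so_far, 2);
    return so_far - counters;
}

static ptrdiff_t
pointer_drift_safe(int *counters)
{
    int *so_far = counters;

    /* Expands to "(*so_far)++;": the counter is incremented. */
    SAFE_ADVANCE(*so_far, 2);
    return so_far - counters;
}
```

Starting from an array holding {1, 1000}, the unsafe variant leaves the
first element untouched, moves the pointer by one element, and adds 2 to
whatever value happens to live there. The safe variant updates the first
element to 4 and leaves the pointer alone. A compiler with warnings
enabled may flag the unsafe expansion as a computed value that is not
used, which is a useful hint in practice.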

That's not all. There is another bug left in there. Look more closely at the
macro expansion: