Linux Compatibility on BSD for the PPC Platform: Part 5

At various stages of debugging the Linux emulation, the lack of a strong debugging tool such as gdb is a big issue. Getting native threads working with the Java Virtual Machine (JVM) is so tricky that it really requires a working gdb to understand what is going on. NetBSD's gdb can debug Linux processes, but not dynamically linked Linux programs, because it knows nothing about Linux's ld.so. Having Linux's gdb working is therefore highly desirable. In this article, we will look at the Linux emulation fixes needed to get a fully functional Linux gdb.

Spurious terminal hangup

The first issue we had with gdb was rather rude: gdb loaded successfully, displayed the credit lines, the prompt, and then it exited, taking the whole session down at the same time. When running Linux gdb in a telnet session, I was simply logged off.

Using ktrace(1) on gdb, we were able to discover that the reason was a hangup signal (SIGHUP) issued to gdb, and probably to all the processes in the process group operating on the terminal, because all were killed.

The question was: Where was this spurious signal coming from? The kernel trace showed no kill() system call from gdb, therefore gdb was not requesting the whole session to die. The decision to send the signal therefore had to be made in the kernel.

Getting a spurious SIGHUP is quite unusual. Most of the time, runaway processes get unexpected SIGSEGV or SIGBUS signals because they attempted to access invalid memory locations in their address spaces.

SIGHUP is rare enough that we were able to locate its origin by running grep for SIGHUP on the kernel sources. There are basically four places in the NetBSD kernel where a hangup signal is sent to a process:

sys/kern/kern_exit.c:exit()

sys/kern/kern_proc.c:orphanpg()

sys/kern/tty.c:ttioctl()

sys/kern/tty.c:ttymodem()

By adding a printf() call before each of these locations in the kernel, it was possible to discover that the problem was coming from sys/kern/tty.c:ttioctl().

As its name suggests, ttioctl() is an ioctl method. Looking at the kernel trace just before gdb caught the SIGHUP gave us a better idea of where the problem was coming from.

Knowing that the problem was related to ioctl() calls, we wanted to take a closer look at the four ioctl() calls that occur before the hangup signal. We started with the last one, the ioctl() TIOCSETAW command, which turned out to be the call that triggered the spurious hangup signal. But first, let us introduce this ioctl() command.

As we explained in part three of this series, ioctl() is used to perform various nonstandard operations on files, that is, operations other than read, write, and so on. The Linux TIOCSETAW ioctl() command sets terminal properties, but only once the pending output has drained. With this system call, gdb is simply trying to adjust a terminal setting.
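For readers who have not met this operation from user space, here is a minimal sketch of what such a request looks like through the portable termios interface. It assumes that tcsetattr() with TCSADRAIN is the libc front end for the "set after output has drained" ioctl, which is how BSD libcs implement it; the function name reapply_after_drain() is made up for this illustration and is not taken from gdb.

```c
#include <assert.h>
#include <errno.h>
#include <termios.h>
#include <unistd.h>

/*
 * Illustrative helper (hypothetical name): read the settings of the
 * terminal behind fd and apply them again, waiting for pending output
 * to drain first. Returns 0 on success, or -1 with errno set
 * (ENOTTY when fd does not refer to a terminal).
 */
int
reapply_after_drain(int fd)
{
    struct termios t;

    assert(fd >= 0);
    if (tcgetattr(fd, &t) == -1)
        return -1;
    /* TCSADRAIN: the change takes effect once written output is sent. */
    return tcsetattr(fd, TCSADRAIN, &t);
}
```

Called on a pipe or a regular file, the helper fails with ENOTTY; called on a terminal, it goes through the terminal's ioctl method, which is the path discussed in the rest of this section.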

Let us now see how this ioctl() call happens to invoke ttioctl(). The ioctl() system call is implemented as sys/compat/linux/common/linux_termios.c:linux_sys_ioctl() for Linux processes. Like most Linux wrapper functions, the job of linux_sys_ioctl() is to make appropriate translations and then call the native ioctl implementation. The native ioctl() implementation depends on the file on which the ioctl() system call was made. linux_sys_ioctl() loads the appropriate function address in the *bsdioctl function pointer, like this:

bsdioctl = fp->f_ops->fo_ioctl;
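The pattern behind that line is an ordinary dispatch through a table of function pointers. The following user-space sketch shows the idea only; every type and function name ending in _sketch is a simplified stand-in invented for this example, not the kernel's real definition.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical, simplified stand-ins for the kernel's file structures. */
struct file_sketch;
typedef int (*ioctl_fn)(struct file_sketch *, unsigned long, void *);

struct fileops_sketch {
    ioctl_fn fo_ioctl;          /* per-file-type ioctl method */
};

struct file_sketch {
    const struct fileops_sketch *f_ops;
};

/* Two illustrative back ends; the return value only identifies them. */
int
tty_ioctl_sketch(struct file_sketch *fp, unsigned long com, void *data)
{
    (void)fp; (void)com; (void)data;
    return 1;
}

int
vnode_ioctl_sketch(struct file_sketch *fp, unsigned long com, void *data)
{
    (void)fp; (void)com; (void)data;
    return 2;
}

const struct fileops_sketch tty_ops_sketch = { tty_ioctl_sketch };
const struct fileops_sketch vnode_ops_sketch = { vnode_ioctl_sketch };

/* Pick the method from the file's operations table, then call it. */
int
dispatch_ioctl_sketch(struct file_sketch *fp, unsigned long com, void *data)
{
    ioctl_fn bsdioctl;

    assert(fp != NULL && fp->f_ops != NULL);
    bsdioctl = fp->f_ops->fo_ioctl;
    return (*bsdioctl)(fp, com, data);
}
```

The caller never needs to know what kind of file it holds: a file whose table is tty_ops_sketch ends up in the terminal method, and any other file ends up in its own.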

Then linux_sys_ioctl() tests the command argument (com) of the ioctl() system call, doing different Linux to BSD translations depending on the command. The Linux ioctl() command we are looking at, TIOCSETAW, is implemented as two NetBSD ioctl() commands: TIOCGETA and TIOCSETAW. Both of these commands are executed through *bsdioctl.

The ioctl() operation is done on file descriptor zero (the first argument of the ioctl() system call), which is the standard input. If the standard input is a terminal (as opposed to a regular file or a pipe), its ioctl() method is the ioctl() method for terminals, which happens to be ttioctl(). In the ttioctl() implementation, we can see that SIGHUP is issued when the TIOCSETAW command is executed with a terminal output speed of zero.

Our problem here is that Linux sometimes leaves the output speed of a terminal at zero, because a virtual terminal does not need a value, whereas NetBSD uses a zero output speed to detect a terminal hangup.
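The NetBSD behavior follows the POSIX convention that the speed constant B0 means "hang up". The small user-space sketch below illustrates that convention; it is not the ttioctl() code, and requests_hangup() is a hypothetical helper name.

```c
#include <assert.h>
#include <stddef.h>
#include <termios.h>

/*
 * Illustrative helper (hypothetical name): returns nonzero when the
 * settings in *t carry an output speed of B0, which POSIX defines as
 * a request to hang up the line.
 */
int
requests_hangup(const struct termios *t)
{
    assert(t != NULL);
    return cfgetospeed(t) == B0;
}
```

A termios structure whose output speed has been set with cfsetospeed(&t, B0) makes the helper return true, while one set to B9600 does not. A Linux program that leaves the speed at zero without meaning "hang up" falls into the first case once its settings reach the NetBSD terminal code.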

The fix was to fool the NetBSD kernel into thinking that the terminal output speed was not zero, even though the Linux process had apparently set it to zero. This was achieved by modifying the linux_termio_to_bsd_termios() and bsd_termios_to_linux_termios() functions from sys/compat/linux/common/linux_termios.c, whose job is to translate between Linux and NetBSD termios structures. The fix is simple: when a Linux process stores a zero value in the output speed field c_ospeed, we set the field to -1 so that the NetBSD kernel will not hang up the terminal:

/*
 * A null c_ospeed causes NetBSD to hangup the terminal.
 * Linux does not do this, and it sets c_ospeed to zero
 * sometimes. If it is null, we store -1 in the kernel
 */
if (bts->c_ospeed == 0)
    bts->c_ospeed = -1;

And when the Linux process reads a struct termios from the kernel, if c_ospeed is -1 then we translate it back to 0. The Linux process thus has a consistent value for c_ospeed:

/*
 * A null c_ospeed causes NetBSD to hangup the terminal.
 * Linux does not do this, and it sets c_ospeed to zero
 * sometimes. If it is null, we store -1 in the kernel
 */
if (bts->c_ospeed == -1)
    bts->c_ospeed = 0;
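Taken together, the two snippets form a round trip. The sketch below reproduces it in user space so that it can be checked in isolation. It assumes c_ospeed is a signed integer, as the use of -1 implies; the one-field structure and the function names are hypothetical simplifications, not the kernel's definitions.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical one-field stand-in for the BSD termios structure. */
struct bsd_termios_sketch {
    int c_ospeed;               /* assumed signed, so -1 is storable */
};

/* Linux to kernel direction: hide a zero speed behind -1. */
void
ospeed_to_kernel_sketch(struct bsd_termios_sketch *bts)
{
    assert(bts != NULL);
    if (bts->c_ospeed == 0)
        bts->c_ospeed = -1;
}

/* Kernel to Linux direction: turn the -1 placeholder back into 0. */
void
ospeed_from_kernel_sketch(struct bsd_termios_sketch *bts)
{
    assert(bts != NULL);
    if (bts->c_ospeed == -1)
        bts->c_ospeed = 0;
}
```

A zero written by the Linux process is stored as -1 and read back as zero, and any other speed passes through both functions unchanged, so the process never observes the substitution.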

The value -1 is arbitrary; it was chosen to be negative so that it cannot collide with any valid value of c_ospeed. With this fix, gdb was able to start up without immediately hanging up the whole session. The next step was to actually use it.