The Linux Signals Handling Model

Communication is the key to healthy relationships between threads and the kernel; these are the signals they use to communicate.

Table 1. Signal Description and Default Action

The disposition of a signal can be changed from its default,
and a process can arrange to catch a signal and invoke a
signal-handling routine of its own or ignore a signal that may not
have a default disposition of Ignore. The only
exceptions are SIGKILL and
SIGSTOP; their default dispositions cannot be
changed. The interfaces for defining and changing signal
disposition are the signal and sigset library routines and the
sigaction system call. Signals can
also be blocked, which means the process has temporarily prevented
delivery of a signal. Generation of a signal that has been blocked
will result in the signal remaining as pending to the process until
it is explicitly unblocked or the disposition is changed to
Ignore. The
sigprocmask system call will set
or get a process's signal mask, the bit array inspected by the
kernel to determine if a signal is blocked or not.
thr_sigsetmask and
pthread_sigmask are the equivalent
interfaces for setting and retrieving the signal mask at the
user-threads level.
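
As a small user-space illustration of the paragraph above, the following
sketch installs a handler with sigaction, blocks the signal with
sigprocmask, generates it, and checks that it stays pending until it is
unblocked. The names on_usr1 and blocked_signal_stays_pending are
hypothetical helpers invented for this example, and the sketch assumes a
single-threaded process in which SIGUSR1 is otherwise unused.

```c
#define _POSIX_C_SOURCE 200809L
#include <signal.h>
#include <string.h>
#include <unistd.h>

static volatile sig_atomic_t caught = 0;

/* Handler (hypothetical name): only records that the signal arrived. */
static void on_usr1(int signo)
{
    (void)signo;
    caught = 1;
}

/*
 * Returns 1 if SIGUSR1 stayed pending while blocked and was delivered
 * only after being unblocked, 0 otherwise, and -1 on a system call error.
 */
int blocked_signal_stays_pending(void)
{
    struct sigaction sa;
    sigset_t block, pending;

    caught = 0;
    memset(&sa, 0, sizeof sa);
    sa.sa_handler = on_usr1;            /* change the disposition to "catch" */
    sigemptyset(&sa.sa_mask);
    if (sigaction(SIGUSR1, &sa, NULL) == -1)
        return -1;

    sigemptyset(&block);
    sigaddset(&block, SIGUSR1);
    if (sigprocmask(SIG_BLOCK, &block, NULL) == -1)
        return -1;

    kill(getpid(), SIGUSR1);            /* generated, but currently blocked */

    if (sigpending(&pending) == -1)
        return -1;
    int was_pending = (sigismember(&pending, SIGUSR1) == 1);
    int ran_early = caught;             /* should still be 0 at this point */

    /* Unblocking lets the pending signal be delivered before the call returns. */
    if (sigprocmask(SIG_UNBLOCK, &block, NULL) == -1)
        return -1;

    return was_pending && !ran_early && caught;
}
```

Calling blocked_signal_stays_pending() from main should return 1 on a
POSIX system; the handler runs during the unblocking sigprocmask call,
not at the time of the kill.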

I mentioned earlier that a signal may originate from several
different places for a variety of different reasons. The first
three signals listed in Table 1—SIGHUP,
SIGINT and SIGQUIT—are
generated by a keyboard entry from the controlling terminal
(SIGINT and SIGQUIT) or when the
controlling terminal becomes disconnected
(SIGHUP; use of the
nohup command makes processes
“immune” from hangups by setting the disposition of
SIGHUP to Ignore).
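
To make the nohup remark concrete, here is a minimal sketch of the same
idea done from inside a program: set the disposition of SIGHUP to Ignore
and confirm that a hangup no longer terminates the process. The function
name ignore_hangup is hypothetical; this is an illustration of the
mechanism, not a copy of how any particular nohup implementation is
written.

```c
#define _POSIX_C_SOURCE 200809L
#include <signal.h>
#include <string.h>

/*
 * Sets SIGHUP to Ignore, sends ourselves a hangup, and then queries the
 * current disposition. Returns 1 if we survived and SIGHUP is ignored,
 * 0 if the disposition is something else, and -1 on error.
 */
int ignore_hangup(void)
{
    struct sigaction sa, cur;

    memset(&sa, 0, sizeof sa);
    sa.sa_handler = SIG_IGN;            /* disposition: Ignore */
    sigemptyset(&sa.sa_mask);
    if (sigaction(SIGHUP, &sa, NULL) == -1)
        return -1;

    raise(SIGHUP);                      /* discarded instead of killing us */

    /* A NULL "new action" only reads back the current disposition. */
    if (sigaction(SIGHUP, NULL, &cur) == -1)
        return -1;
    return cur.sa_handler == SIG_IGN;
}
```

An ignored disposition is inherited across exec, which is why a wrapper
command can set it once and then run the real program.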

Other terminal I/O-related signals include
SIGSTOP, SIGTTIN,
SIGTTOU and SIGTSTP. For the
signals originating from a keyboard command, the actual key
sequence that generates each signal is defined within the
parameters of the terminal session, typically via stty(1). For
example, CTRL-C usually results in a SIGINT being sent to the
process; SIGINT has a default disposition of Exit.

User tasks in Linux, created via explicit calls to either
thr_create or
pthread_create, all have their own
signal masks. Linux threads call
clone with
CLONE_SIGHAND; this shares all signal handlers
between threads via sharing the current->sig
pointer. Delivered signals are unique to a thread.
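
The per-thread signal mask can be observed from user space. The sketch
below blocks SIGUSR1 in the calling thread, lets a second thread unblock
it for itself, and then checks that the caller's mask is unchanged. The
names worker and masks_are_per_thread are hypothetical, and the example
assumes a POSIX threads implementation (link with -pthread where your
toolchain requires it).

```c
#define _POSIX_C_SOURCE 200809L
#include <pthread.h>
#include <signal.h>

/* Worker: unblock SIGUSR1 in this thread only, then report its own mask. */
static void *worker(void *arg)
{
    int *unblocked_in_worker = arg;
    sigset_t set, now;

    sigemptyset(&set);
    sigaddset(&set, SIGUSR1);
    pthread_sigmask(SIG_UNBLOCK, &set, NULL);
    pthread_sigmask(SIG_SETMASK, NULL, &now);   /* NULL set: query only */
    *unblocked_in_worker = (sigismember(&now, SIGUSR1) == 0);
    return NULL;
}

/*
 * Returns 1 if the worker's mask change left the caller's mask untouched,
 * 0 otherwise, and -1 if the thread could not be created.
 */
int masks_are_per_thread(void)
{
    pthread_t tid;
    sigset_t set, old, now;
    int unblocked_in_worker = 0;

    sigemptyset(&set);
    sigaddset(&set, SIGUSR1);
    pthread_sigmask(SIG_BLOCK, &set, &old);     /* block in the caller */

    /* The new thread starts with a copy of the creator's mask. */
    if (pthread_create(&tid, NULL, worker, &unblocked_in_worker) != 0) {
        pthread_sigmask(SIG_SETMASK, &old, NULL);
        return -1;
    }
    pthread_join(tid, NULL);

    pthread_sigmask(SIG_SETMASK, NULL, &now);
    int still_blocked_here = (sigismember(&now, SIGUSR1) == 1);

    pthread_sigmask(SIG_SETMASK, &old, NULL);   /* restore the caller's mask */
    return unblocked_in_worker && still_blocked_here;
}
```

Handlers installed with sigaction, by contrast, are shared: installing
one in any thread changes the disposition for the whole process.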

In some operating systems, such as Solaris 7, signals
generated as a result of a trap (SIGFPE,
SIGILL, etc.) are sent to the thread that caused
the trap. Asynchronous signals are delivered to the first thread
found not blocking the signal. In Linux, it is almost exactly the
same. Synchronous signals happening in the context of a given
thread are delivered to that thread.

Asynchronous in-kernel signals (e.g., asynchronous network
I/O) are delivered to the thread that generated the asynchronous
I/O. Explicit user-generated signals get delivered to the right
thread as well. However, if CLONE_PID is used,
all places that use the PID to deliver a signal will behave in a
“weird” way; the signal gets randomly delivered to the first
thread in the pidhash. Linux threads don't use
CLONE_PID, so there is no such problem if you
are using the pthread.h thread API.

When a signal is sent to a user task, for example, when a
user-space program accesses an illegal page, the following
happens:

page_fault
(entry.S) is the low-level page-fault handler.

do_page_fault
(fault.c) fetches i386-specific parameters of the fault and does
basic validation of the memory range involved.

handle_mm_fault
(memory.c) is generic MM (memory management) code
(i386-independent), which gets called only if the memory range
(VMA) exists. The MM reads the page table entry and uses the VMA to
find out whether the memory access is legal or not.

The case we are interested in now is when the access was
illegal (e.g., a write was attempted to a read-only mapping):
handle_mm_fault returns 0 to do_page_fault in this case. As you can
see from Listing 1, locking of the MM is very finely grained (as
it should be); the per-MM semaphore, mm->mmap_sem,
is used (the MM typically varies from process to
process).

force_sig(SIGBUS,current)
is used to “force” the SIGBUS signal on the
faulting task. force_sig delivers the signal even if the process
has attempted to ignore SIGBUS.

force_sig fills out the
signal event structure and queues it into the process's signal
queue (current->sigqueue and
current->sigqueue_tail). The signal queue
holds an indefinite number of queued signals. The semantics of
“classic” signals are that follow-up signals are ignored—this is
emulated in the signal code kernel/signal.c. “Generic” (or RT)
signals can be queued arbitrarily; there are reasonable limits to
the length of the signal queue.
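
The difference between “classic” and RT signals is easy to see from
user space. The following sketch blocks a signal, generates it several
times, unblocks it, and counts how often the handler runs. The names
count_delivery and count_deliveries are hypothetical, and the sketch
assumes a single-threaded Linux process with the default limit on
queued signals.

```c
#define _POSIX_C_SOURCE 200809L
#include <signal.h>
#include <string.h>
#include <unistd.h>

static volatile sig_atomic_t deliveries = 0;

/* Handler (hypothetical name): counts how many times it is invoked. */
static void count_delivery(int signo)
{
    (void)signo;
    deliveries++;
}

/*
 * Blocks signo, generates it n times, unblocks it, and returns how many
 * times the handler ran (or -1 on error).
 */
int count_deliveries(int signo, int n)
{
    struct sigaction sa, old_sa;
    sigset_t set, old_mask;

    deliveries = 0;
    memset(&sa, 0, sizeof sa);
    sa.sa_handler = count_delivery;
    sigemptyset(&sa.sa_mask);
    if (sigaction(signo, &sa, &old_sa) == -1)
        return -1;

    sigemptyset(&set);
    sigaddset(&set, signo);
    if (sigprocmask(SIG_BLOCK, &set, &old_mask) == -1)
        return -1;

    for (int i = 0; i < n; i++)
        kill(getpid(), signo);          /* each one is generated while blocked */

    /* Whatever is pending for signo gets delivered during this call. */
    sigprocmask(SIG_UNBLOCK, &set, NULL);

    sigprocmask(SIG_SETMASK, &old_mask, NULL);
    sigaction(signo, &old_sa, NULL);
    return deliveries;
}
```

On Linux, count_deliveries(SIGUSR1, 3) should return 1, because the
follow-up instances of a classic signal are merged into the one already
pending, while count_deliveries(SIGRTMIN, 3) should return 3, because
each RT instance is queued separately.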

The signal is queued, and current->signal
is updated. Now comes the tricky part: the kernel returns to user
space. Return to user space happens from
do_page_fault=>page_fault (entry.S), then the
low-level exit code in entry.S is executed in this order:

Next, do_signal unqueues the
signal to be executed. In this case, it's
SIGBUS.

Then handle_signal is called
with the “unqueued” signal (which can potentially hold extra
event information in case of real-time signals/messages).

Next called is setup_frame,
where all user-space registers are saved and the kernel stack frame
return address is modified to point to the installed
signal handler. A small jumper code sequence is put on the user
stack (obviously, the code first makes sure the user stack is
valid) which will return us to kernel space once the signal handler
has finished. (See Listing 2.)

Careful: this area is one of the least-understood pieces of
the Linux kernel, and for good reason; it is really tough code to
read and follow.

The popl %eax ; movl $,%eax ; int $0x80
x86 assembly sequence calls
sys_sigreturn, which later on will
restore the kernel stack frame return address to point to the
original (faulting) user address.

What is all this magic good for? Well, first the kernel has
to guarantee that signal handlers get called properly and the
original state is restored. The kernel also has to deal with binary
compatibility issues. Linux guarantees that on the IA-32 (Intel
x86) architecture, we can run any iBC86-compliant binary code.
Speed is also an issue.

Finally, we return to entry.S again, but
current->signal is already cleared, so we do not
execute do_signal but jump to
restore_all as shown in Listing 3.
restore_all executes the “iret”
that brings us into user space. Suddenly, we are magically
executing the signal handler.

Did you get lost yet? No? Here is some more magic. Once the
signal handler finishes (it does an assembly “ret” like all
well-behaved functions), it will execute the small jumper function
we have set up on the user stack. Again we return to the kernel,
but now we execute the
sys_sigreturn system call, which
lives in arch/i386/kernel/signal.c as well. It essentially executes
the following code section:

The above code restores the exact user-register contents into
the kernel stack frame (including the return address and flags
register) and executes a normal
ret_from_syscall, bringing us back
to the original faulting code. Hopefully the
SIGBUS handler has fixed the problem of why we
were faulting.
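
The whole round trip can be exercised from user space. The sketch below
maps a read-only page, installs a handler that makes the faulting page
writable, and then writes to it; when the handler returns, the kernel
restarts the faulting store, which now succeeds. A few assumptions to
flag: the names fix_fault and write_to_readonly_page are hypothetical;
current Linux kernels usually report this particular fault as SIGSEGV
rather than SIGBUS, so the handler is installed for both; and mprotect
is not on the POSIX list of async-signal-safe functions, although
calling it from a handler like this is a common technique on Linux.

```c
#define _DEFAULT_SOURCE
#include <signal.h>
#include <stdint.h>
#include <string.h>
#include <sys/mman.h>
#include <unistd.h>

static volatile sig_atomic_t faults = 0;
static long page_size;

/* Handler: make the faulting page writable so the retried store succeeds. */
static void fix_fault(int signo, siginfo_t *info, void *ctx)
{
    (void)signo;
    (void)ctx;
    /* Round the faulting address down to the start of its page. */
    uintptr_t page = (uintptr_t)info->si_addr & ~((uintptr_t)page_size - 1);

    faults++;
    if (mprotect((void *)page, (size_t)page_size,
                 PROT_READ | PROT_WRITE) == -1)
        _exit(1);                       /* cannot repair: do not fault forever */
}

/*
 * Writes to a read-only page and returns the number of faults the handler
 * had to fix, or -1 on error.
 */
int write_to_readonly_page(void)
{
    struct sigaction sa, old_segv, old_bus;

    page_size = sysconf(_SC_PAGESIZE);
    faults = 0;

    void *mem = mmap(NULL, (size_t)page_size, PROT_READ,
                     MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    if (mem == MAP_FAILED)
        return -1;
    volatile char *p = mem;

    memset(&sa, 0, sizeof sa);
    sa.sa_sigaction = fix_fault;
    sa.sa_flags = SA_SIGINFO;           /* we need si_addr from siginfo_t */
    sigemptyset(&sa.sa_mask);
    sigaction(SIGSEGV, &sa, &old_segv);
    sigaction(SIGBUS, &sa, &old_bus);

    p[0] = 'x';                         /* faults, handler runs, store is retried */

    int ok = (p[0] == 'x');
    int n = faults;

    sigaction(SIGSEGV, &old_segv, NULL);
    sigaction(SIGBUS, &old_bus, NULL);
    munmap(mem, (size_t)page_size);
    return ok ? n : -1;
}
```

On Linux the function should return 1: one fault, one handler
invocation, and then the original store completes as if nothing had
happened.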

Now, while reading the above description, you might think
this is awfully complex and slow. It actually isn't;
lmbench reveals that Linux has the
fastest signal-handler installation and execution performance by
far of any UNIX running:

"Implementation of correct and reliable signals has been in place for many years now ... reliable signals require the use of the newer sigaction interface" - What is the definition of 'reliable signals'? Are there some signals reliable and some not, in the same implementation? Or are there reliable implementations in which all signals are reliable, and unreliable implementations where all signals are unreliable?
And I have a question that is not answered in all the papers and books I have read so far: what happens if a signal arrives when a previous one (of the same type) is being handled in a signal catching function?

I trust "reliable" is meant in the sense that certain functions make signal handling as a whole reliable, e.g. 'sigaction' can block signals and set a new signal handler in one atomic step, so as to eliminate the potential race condition.
As to your last question, if the signal is not blocked and a new instance is generated in the midst of processing the last one, then the signal handler must be re-entrant. IMHO, the best method is to block the signal on entering the signal handler. The previous signal mask will be automatically restored on exit. On old UNIX versions, once a signal handler is invoked, the disposition of the signal reverts to the default. If a new signal instance comes along, it will be treated the default way, most often terminating the process.
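
For what it's worth, here is a small sketch of that last point. A handler installed with sigaction has its own signal blocked while it runs unless SA_NODEFER is set, so a second instance generated inside the handler waits until the first invocation returns. The names nested_probe and max_handler_nesting are made up for this example, and it assumes SIGUSR1 is otherwise unused.

```c
#define _POSIX_C_SOURCE 200809L
#include <signal.h>
#include <string.h>

static volatile sig_atomic_t depth, max_depth, calls;

/* Handler: tracks how deeply it is nested and re-raises the signal once. */
static void nested_probe(int signo)
{
    depth++;
    calls++;
    if (depth > max_depth)
        max_depth = depth;
    if (calls == 1)
        raise(signo);       /* second instance arrives while the first is handled */
    depth--;
}

/*
 * Installs the handler for SIGUSR1 with the given sa_flags, raises the
 * signal, and returns the deepest nesting level observed (or -1 on error).
 */
int max_handler_nesting(int flags)
{
    struct sigaction sa, old_sa;

    depth = max_depth = calls = 0;
    memset(&sa, 0, sizeof sa);
    sa.sa_handler = nested_probe;
    sa.sa_flags = flags;
    sigemptyset(&sa.sa_mask);
    if (sigaction(SIGUSR1, &sa, &old_sa) == -1)
        return -1;

    raise(SIGUSR1);

    sigaction(SIGUSR1, &old_sa, NULL);
    return max_depth;
}
```

With flags set to 0 the result should be 1: the second instance stays pending and the handler runs twice, one invocation after the other. With SA_NODEFER the result should be 2, because the handler is interrupted by itself, which is the case where it really does have to be re-entrant.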