Tag: Supervisory Bit

One advantage of having a Linux computer is that users can find a wealth of detailed information about their hardware, without having to buy any add-on software. ‘Phoenix’, the PC which also acts as my Web-server, is a Debian 8.5 system, i.e. a Debian / Jessie Linux system, which I think is well-suited to being my server. Yet ‘Phoenix’ is old hardware, and used to exist as my computer ‘Thunderbox’, before I completely wiped it and resurrected it as ‘Phoenix’. This machine was bought in 2008. It is a dual-core, 64-bit system with 4GB of RAM and a top clock-speed of 2.6 GHz.

One subject we can educate ourselves about is how interrupt controllers work today. Back when I was still studying, we had a notion of interrupt controllers which was quite simplistic, and which supported a maximum of 16 interrupt request lines, numbered 0 to 15. Today, that sort of configuration would be called ‘a legacy interrupt controller’, and is not really supported anymore. ( :2 ) What we have today is ‘APIC’, which stands for ‘Advanced Programmable Interrupt Controller’. And one of the things which a present-day Linux user can do is view the complete history of interrupt requests since the last boot. When I do this, this is what I get to see:

We see that the type of interrupt request named “IO-APIC-edge” appears at most on IRQ lines 0-15, even though IRQ9 is already something different: “IO-APIC-fasteoi”. This ‘EOI’ thing needs some explaining.

First of all, my IRQ history looks this small, because even this computer is extremely old. The same view on the laptop named ‘Klystron’ reveals a more complex picture.
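On Linux, this view comes from the file /proc/interrupts. As a rough sketch of how it could be read from a script, the following Python splits each row into its per-CPU counts and its description. The function names and the sample text are my own, for illustration only; the sample numbers are made up and are not taken from ‘Phoenix’.

```python
def parse_interrupts(text):
    """Parse text laid out like /proc/interrupts into {label: (counts, description)}."""
    lines = [ln for ln in text.splitlines() if ln.strip()]
    if not lines:
        return {}
    n_cpus = len(lines[0].split())  # header row: CPU0 CPU1 ...
    table = {}
    for line in lines[1:]:
        label, sep, rest = line.partition(":")
        if not sep:
            continue  # not a "label: ..." row
        fields = rest.split()
        counts = []
        for field in fields[:n_cpus]:
            if not field.isdigit():
                break  # some rows carry fewer counters than there are CPUs
            counts.append(int(field))
        table[label.strip()] = (counts, " ".join(fields[len(counts):]))
    return table


def read_interrupts(path="/proc/interrupts"):
    """Read and parse the live file (Linux only)."""
    with open(path) as handle:
        return parse_interrupts(handle.read())


# Made-up sample in the same layout; NOT the real output of any machine.
SAMPLE = """\
           CPU0       CPU1
  0:         46          0   IO-APIC-edge      timer
  9:          0          0   IO-APIC-fasteoi   acpi
 16:        120         30   IO-APIC-fasteoi   ehci_hcd:usb1
ERR:          1
"""

table = parse_interrupts(SAMPLE)
for label, (counts, description) in table.items():
    print(label, sum(counts), description)
```

The first word of each description is the handler type, so grouping rows by it would separate the “IO-APIC-edge” lines from the “IO-APIC-fasteoi” ones.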

The way interrupt controllers work today requires that Interrupt Service Routines end by issuing an ‘EOI’, which simply stands for End Of Interrupt. Strictly speaking, this is not a dedicated CPU instruction, but a write which the routine makes to a register of the interrupt controller. It sends a message to the controller, which allows the controller to update its local register, and so to know, separately from the CPU, what the current Interrupt Priority Level is. The interrupt controller needs to know this because the Interrupt Priority Encoder is part of the interrupt controller, and is only supposed to pass Interrupt Requests to the CPU which have a higher priority than the current priority level. And the fact has not changed that the lowest-numbered IRQs are also the highest-priority.

In the days of legacy interrupt controllers, Interrupt Service Routines did not need to end with an EOI, because no explicit message needed to be sent to the controller to tell it that the current Interrupt Service Routine was done. This routine would just exit, and upon doing so, return as a subroutine to a level of control that was either a lower-priority Interrupt Service Routine, or that existed unpredictably in user-space, since interrupt requests can interrupt user processes as well as Interrupt Service Routines already running. When a physical interrupt request was processed, the first thing that would happen was that the instruction pointer would get pushed onto the stack, and then the microcode of the CPU would index the interrupt vector table with the IRQ number, to obtain the address to jump to. The Interrupt Service Routine found in this way would get executed with the supervisory bit of the status register set, which could be different from how the status register was before, when running a user-space process.

By the time the return address was popped off the stack, the status register already had to be restored, so that a user-space process could resume, but not inherit the supervisory status that the Interrupt Service Routine was running with, in kernel-space.

Also, any Interrupt Service Routine needs to be coded so that it pushes whatever registers it is going to use onto the stack, and so that, near the end of its life, these registers are popped back off in the reverse order in which they were pushed, by the Interrupt Service Routine itself. An actual Return from Subroutine instruction finally acts to pop the instruction pointer, which lands the CPU in the exact part of the process that was interrupted. ( :1 )
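The save-and-restore discipline just described can be sketched as a toy model in Python. This is only an illustration with names I have made up (run_isr, clobbering_body); the registers are a dictionary and the stack is a list, and real service routines are of course written much closer to the hardware.

```python
def run_isr(regs, stack, used, body):
    """Toy model: save the registers the routine will use, run it, restore them."""
    for name in used:
        stack.append(regs[name])   # prologue: push each register to be used
    body(regs)                     # the routine may now overwrite them freely
    for name in reversed(used):
        regs[name] = stack.pop()   # epilogue: pop in the reverse order


def clobbering_body(regs):
    # Hypothetical routine body that overwrites two registers.
    regs["a"] = 111
    regs["b"] = 222


regs = {"a": 1, "b": 2, "c": 3}
stack = [0x4000]                   # pretend return address pushed by the CPU
run_isr(regs, stack, ["a", "b"], clobbering_body)
print(regs, stack)
```

After the call, the registers hold their old values again, and only the return address is left on the stack for the final return to pop.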

All of this basically did and still does require that the interrupt controller be able to filter interrupt requests, so that only ones with higher priority than the current priority, and which were not masked, would get translated into a physical signal from the controller to the CPU, to execute an Interrupt Service Routine.
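As a minimal sketch of this filtering rule, assuming the convention used above that a lower IRQ number means a higher priority, the decision can be written as one small function. The name should_deliver and its parameters are hypothetical, and a current level of None stands for ‘no service routine is running’.

```python
def should_deliver(irq, current_level, masked):
    """Toy filter: pass a request on to the CPU only if it is unmasked
    and has a higher priority (lower number) than what is running now."""
    if irq in masked:
        return False
    if current_level is None:      # nothing in service: any unmasked request passes
        return True
    return irq < current_level


print(should_deliver(3, 5, set()))   # higher priority than the running level
print(should_deliver(7, 5, set()))   # lower priority: has to wait
print(should_deliver(3, 5, {3}))     # masked: has to wait
```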

An APIC is assumed to be separated from the core CPU more than a legacy interrupt controller was, in that an APIC does not have access to the status register, but still has to filter the IRQ signals from peripherals towards the same goal. And so, with the use of the APIC, the Interrupt Service Routine needs to send an EOI signal, so that the APIC can unwind its local copy of the current interrupt level.

To help the APIC accomplish this, this component, which is not an intimate part of the CPU, additionally needs to have something called an “In-Service Register”, which has 1 bit set for every interrupt level that is currently in the process of running. The assumption is that only the highest-priority of those is truly running, and that if more than 1 bit of the In-Service Register is set, the lower-priority Interrupt Service Routines have all been interrupted by the higher-priority ones.

Further, it is usually assumed that a given Interrupt Service Routine should not be interrupted by additional requests for the same one, until the present instance has returned. And this detail also needs to be assured in an explicit way…
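To make the role of the In-Service Register more concrete, here is a toy model in Python. It is a sketch under my own assumptions and not a description of real APIC hardware: the class and method names are invented, the lower IRQ number is treated as the higher priority as in the text above, and a second integer holds requests which are still waiting.

```python
class ToyController:
    """Toy interrupt controller with an in-service register (one bit per IRQ)."""

    def __init__(self):
        self.in_service = 0   # bit n set: the routine for IRQ n has started, no EOI yet
        self.pending = 0      # bit n set: IRQ n was requested but not yet delivered

    @staticmethod
    def _lowest_bit(value):
        # Index of the lowest set bit, i.e. the highest-priority IRQ in 'value'.
        return (value & -value).bit_length() - 1

    def current_level(self):
        return None if self.in_service == 0 else self._lowest_bit(self.in_service)

    def request(self, irq):
        self.pending |= 1 << irq

    def next_to_deliver(self):
        """Return the IRQ to hand to the CPU now, or None if nothing may pass."""
        if self.pending == 0:
            return None
        irq = self._lowest_bit(self.pending)
        level = self.current_level()
        if level is not None and irq >= level:
            return None       # not higher priority (or the same IRQ): keep waiting
        self.pending &= ~(1 << irq)
        self.in_service |= 1 << irq
        return irq

    def eoi(self):
        # The running routine is the highest-priority bit in service: clear it.
        self.in_service &= self.in_service - 1


pic = ToyController()
pic.request(5)
print(pic.next_to_deliver())   # IRQ 5 starts
pic.request(5)
print(pic.next_to_deliver())   # same IRQ again: held back until EOI
pic.request(2)
print(pic.next_to_deliver())   # IRQ 2 interrupts the routine for IRQ 5
```

Because the comparison is ‘greater than or equal’, a repeated request for an IRQ that is still in service is held back until the matching EOI, which is the explicit assurance mentioned above.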

The APIC needs to have its In-Service Register precisely because it does not have access to any of the CPU registers, including the status register or any interrupt mask which the CPU itself may be storing. If an explicit need occurs to mask one of the interrupt request numbers, and if those come after IRQ15, then this needs to be sent to the APIC explicitly, and in a way that is no longer fast.

Okay. But now, when I look at the interrupt history of my own machine, this is as part of a health-check. And one fact which I always see on ‘Phoenix’, is that there is a history of 1 official hardware error. Presently, ‘Phoenix’ has been running for 2 days. But this machine can frequently run for 10 days, or even for 30 days, before it either needs a reboot, or before it is struck with a power failure. Even if it has been running for 30 days, I still see that there has been 1 official hardware error.

What this tells me is that the error in question happens early in the boot process, and always so. During boot-up, a Linux system starts with all its interrupt request lines masked, and a tedious process begins of detecting hardware, loading drivers, and activating the Interrupt Service Routines associated with those drivers.

Apparently these are chaotic moments, followed by ‘normal operating time’, during which no further errors are supposed to occur.

What I also make sure to note is that the total number of Non-Maskable Interrupts exactly equals the number of Performance Monitoring Interrupts. This is reassuring.
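This health-check could be automated along the following lines. This is a hedged sketch: I am assuming that the counters in question are the rows labelled NMI, PMI and ERR in /proc/interrupts, the helper name total_for is my own, and the sample text is made up rather than copied from ‘Phoenix’.

```python
def total_for(text, label):
    """Sum the per-CPU counters of one labelled row; None if the row is absent."""
    for line in text.splitlines():
        name, sep, rest = line.partition(":")
        if sep and name.strip() == label:
            total = 0
            for field in rest.split():
                if not field.isdigit():
                    break          # the description starts here
                total += int(field)
            return total
    return None


# Made-up sample in the layout of /proc/interrupts.
SAMPLE = """\
           CPU0       CPU1
NMI:         12         10   Non-maskable interrupts
PMI:         12         10   Performance monitoring interrupts
ERR:          1
"""

nmi = total_for(SAMPLE, "NMI")
pmi = total_for(SAMPLE, "PMI")
err = total_for(SAMPLE, "ERR")
print("NMI equals PMI:", nmi == pmi)
print("errors since boot:", err)
```

On a live system, the text would come from reading /proc/interrupts instead of the sample string.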

And the rest is just fun stuff to look at.

Dirk

1: ) If the reader assumes that the Interrupt Service Routine may simply unset the supervisory bit directly before doing a Return from Subroutine, this is a casual error. The Interrupt Service Routine instance has no static way of knowing whether it has just interrupted a user-space process, or another Interrupt Service Routine. And thus there is no static way to code whether to unset the supervisory bit or not.

Along with that, if the reader assumes that the Interrupt Service Routine may simply push the status register at the beginning, then he has overlooked the fact that control is being handed to it with the supervisory bit set, regardless of whether it was set or unset prior to the current hardware interrupt taking place.

And so it would seem that the status register must be popped, but that it is not up to the Interrupt Service Routine to push it, any more than it was up to the Interrupt Service Routine to push the instruction pointer…
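The argument of this footnote can be checked with a toy model in Python. It is only a sketch with invented names (ToyCPU, take_interrupt, return_from_interrupt), in which the ‘hardware’ step saves the old supervisory bit before setting it, and the return step restores it.

```python
class ToyCPU:
    """Toy model of the supervisory bit across nested interrupts."""

    def __init__(self):
        self.supervisor = False    # start in user-space
        self.stack = []

    def take_interrupt(self):
        # Done by the 'hardware', before any routine code runs: save the old
        # bit first. A push made by the routine itself would always save True.
        self.stack.append(self.supervisor)
        self.supervisor = True

    def return_from_interrupt(self):
        # Restore whatever was in effect before this interrupt.
        self.supervisor = self.stack.pop()


cpu = ToyCPU()
cpu.take_interrupt()               # a user process is interrupted
cpu.take_interrupt()               # that routine is interrupted in turn
cpu.return_from_interrupt()
print(cpu.supervisor)              # still inside the outer routine
cpu.return_from_interrupt()
print(cpu.supervisor)              # back in user-space
```

The same return step gives the right answer in both cases, without the routine needing to know what it interrupted.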