Ars Technica Guide to Virtualization: Part II

In this second installment of the Ars Technica Guide to Virtualization, we …

Privilege levels, rings, and fooling the guest OS

In the previous installment of the Virtualization Guide, I talked in general terms about the exclusive hardware access privileges that the OS reserves for itself. Now it's time to nuance that picture a bit, so you can see exactly how the OS retains the upper hand over applications and users. This brief installment sets the stage for Part III, which will talk in some detail about Intel VT.

A microprocessor does more than just blindly run whatever instructions are loaded into its front end, without regard for where those instructions came from. Microprocessors are in fact "aware" of the OS, and they provide direct hardware support for enforcing divisions between components of the hardware/software stack that I described in the previous article.

In order to keep applications from usurping any part of the OS's privileged access to system hardware, processors provide a mechanism that allows different programs to run at different privilege levels. These privilege levels are called rings, and they're arranged in a hierarchy that starts with Ring 0 (the lowest-numbered, most trusted level) and extends upwards through one or more progressively less-trusted rings (e.g., Ring 1, Ring 2, and so on).

The hardware/OS/application stack, with rings

On any given processor, Ring 0 is the most privileged level, and any software that runs in Ring 0 is running in the most privileged state that the hardware supports. Such trusted software has complete command of the processor and of the rest of the system, which is why Ring 0 is typically reserved exclusively for the OS. Rings 1 and higher are less privileged, and they're home to less sensitive parts of the OS and to user-level application software.

Many processors have only two rings, Ring 0 for the OS and Ring 1 for all the other software in the stack. The x86 ISA, in contrast, has four rings (Rings 0 through 3), presumably because x86's designers thought more was better. But it turns out that nearly all mainstream operating systems (with the notable exception of the erstwhile OS/2) use only two of x86's privilege levels: Ring 0 for the OS and Ring 3 for everything else. Rings 1 and 2 typically go unused.
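To make the four-ring arrangement a little more concrete: on x86, the privilege level is a two-bit field, which is why there are exactly four rings, and it appears in the low two bits of a segment selector. The snippet below is only a toy illustration of that encoding. The helper name `ring_of` is made up for this example, and the two selector values are the ones commonly seen on 64-bit Linux, used here as an illustrative assumption rather than anything universal.

```python
# Toy illustration: the low two bits of an x86 segment selector hold a
# privilege level, so there are 2**2 = 4 possible rings (0 through 3).
PRIVILEGE_MASK = 0b11


def ring_of(selector: int) -> int:
    """Return the privilege level encoded in the low two bits of a selector.

    `ring_of` is a hypothetical helper for this sketch, not a real API.
    """
    return selector & PRIVILEGE_MASK


# Assumed example values: code-segment selectors as commonly seen on
# 64-bit Linux. Other operating systems use different selector values.
KERNEL_CS = 0x10
USER_CS = 0x33

print(ring_of(KERNEL_CS))  # 0 (kernel code runs in Ring 0)
print(ring_of(USER_CS))    # 3 (user code runs in Ring 3)
```

Note that nothing in this encoding forces an OS to use all four values, which fits the two-ring usage pattern described above.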

Because programs running in the higher rings have restrictions on what parts of the system they can touch, it's harder for these de-privileged programs to do any real damage to the system, such as crashing it or overwriting another user's data, whether by accident or by malice. Conversely, an accidental or malicious error in a Ring 0 program (typically the OS kernel) often has catastrophic consequences for the entire software stack. The general rule is that programs are vulnerable to interference from programs that are running in the same ring or in a lower ring, but not from programs in a higher ring. This rule means that the program at the very lowest ring is untouchable, while the programs in the higher rings are at the mercy of programs running below them.
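The general rule in the preceding paragraph can be written down as a one-line comparison. What follows is a deliberately simplified model, not a description of how any real processor checks privileges: the function name `can_interfere` and the program names are invented for illustration.

```python
def can_interfere(actor_ring: int, target_ring: int) -> bool:
    """Simplified model of the rule: a program is exposed to programs
    running in the same ring or in a lower-numbered ring."""
    return actor_ring <= target_ring


# Hypothetical programs and the rings they run in (two-ring usage on x86).
programs = {"kernel": 0, "editor": 3, "browser": 3}

for actor, actor_ring in programs.items():
    victims = [
        name
        for name, ring in programs.items()
        if name != actor and can_interfere(actor_ring, ring)
    ]
    print(f"{actor} (Ring {actor_ring}) can interfere with: {victims}")
```

In this model the kernel can reach both applications, while neither application can reach the kernel. The two applications share Ring 3, so the ring mechanism alone does not separate them from each other.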

The introduction of a hypervisor into this ring structure complicates the software stack picture to a greater or lesser degree, depending on the nature of the hardware's ISA and the exact type of virtualization being used. Specifically, the hypervisor must be the most privileged program in the stack that it hosts, which means that it must run in a lower ring than the guest OS. Clearly, this means that the guest OS must be de-privileged by being booted out of Ring 0 and forced to run in a higher ring. Most of the challenge of virtualization lies in keeping the guest OS in the dark about the fact that it's no longer running in Ring 0, and "classical" virtualization solutions meet this challenge with two tricks: trap-and-emulate, and shadow structures.
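As a rough preview of how those two tricks fit together, here is a toy model of trap-and-emulate with a single piece of shadow state. Everything in it is an assumption made for illustration: the class names, the made-up instruction names (`disable_interrupts`, `enable_interrupts`), the choice of Ring 1 for the guest, and above all the premise that every privileged instruction reliably traps when executed outside Ring 0.

```python
class PrivilegeTrap(Exception):
    """Signals that a privileged instruction was attempted outside Ring 0."""


# Hypothetical privileged instructions for this toy model.
PRIVILEGED = {"disable_interrupts", "enable_interrupts"}


class ToyCPU:
    def __init__(self):
        self.interrupts_enabled = True  # the real hardware state

    def execute(self, instr: str, ring: int) -> None:
        if instr in PRIVILEGED:
            if ring != 0:
                raise PrivilegeTrap(instr)  # de-privileged code traps
            self.interrupts_enabled = instr == "enable_interrupts"
        # Unprivileged instructions run directly (modeled as a no-op).


class ToyHypervisor:
    GUEST_RING = 1  # assumption: the guest OS is pushed out to Ring 1

    def __init__(self, cpu: ToyCPU):
        self.cpu = cpu
        self.shadow_interrupts_enabled = True  # what the guest believes
        self.traps = 0

    def run_guest(self, instructions) -> None:
        for instr in instructions:
            try:
                self.cpu.execute(instr, ring=self.GUEST_RING)
            except PrivilegeTrap:
                self.traps += 1
                self.emulate(instr)

    def emulate(self, instr: str) -> None:
        # Apply the effect to the shadow copy, never to the real hardware.
        self.shadow_interrupts_enabled = instr == "enable_interrupts"


cpu = ToyCPU()
hv = ToyHypervisor(cpu)
hv.run_guest(["add", "disable_interrupts", "add"])
print(hv.traps)                      # 1
print(hv.shadow_interrupts_enabled)  # False
print(cpu.interrupts_enabled)        # True
```

The guest's privileged instruction never touches the real interrupt flag. The hypervisor catches the trap and updates only the shadow copy, so from the guest's point of view the instruction appears to have worked.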