Sorry for the delay with this post; I wanted to be sure it was well researched. This final post in the series compares the hardware support for virtualization in the ARM and x86 worlds. As mentioned in the previous post, the biggest reason for ARM to add virtualization to its architecture is to be viable in the server market against x86, so a comparison of x86 and ARM hardware support for virtualization is warranted.

The extensions added to both ISAs have a similar scope, which isn't surprising since they share the same goals. These goals, first formalized by Popek and Goldberg in 1974, are:

Equivalence: a program running under the VMM should exhibit behavior essentially identical to that demonstrated when running directly on an equivalent machine.

Resource control: the VMM must be in complete control of the virtualized resources.

Efficiency: a statistically dominant fraction of machine instructions must be executed without VMM intervention.

Still, there are differences in how each architecture implements them, and the tradeoffs chosen make an interesting case study.

The Players
The ARM architecture is owned by ARM. ARM designs its own processor cores, which it licenses to chip manufacturers, and it also licenses the architecture itself to companies that design their own cores. In the x86 world, both Intel and AMD have their own offerings. Intel's fall under the umbrella marketing term Intel Virtualization Technology, or VT. VT is composed of VT-x, which covers the processor-core features; VT-d, which covers the IOMMU; and VT-c, which covers the network interface. AMD markets its core-side virtualization features as AMD-V and its IOMMU as AMD-Vi. The Intel and AMD offerings differ in implementation details, but they are much closer to each other architecturally than either is to ARM's.

Design Philosophies
The support for virtualization in both architectures is similar, but the implementation details are colored by each architecture's design philosophy. So before comparing the specifics, we need to talk about the general philosophies guiding the design of each.

RISC vs CISC
When we compare x86 and ARM virtualization, their respective pedigrees as CISC and RISC machines become visible. In CISC (Complex Instruction Set Computer), instructions are more powerful and try to bridge the semantic gap between the assembly programmer and the machine. In RISC (Reduced Instruction Set Computer), instructions are simpler, giving more flexibility to the software. Each philosophy involves tradeoffs in implementation difficulty, code density, and compiler support.

Backwards Compatibility
The extensions in both ARM and x86 mostly preserve backward compatibility with existing software. Intel has historically gone the extra mile in providing backward compatibility.

A commitment to backward compatibility makes porting software significantly easier, but over time it leaves cruft in the architecture that makes implementation more difficult. It also makes a major paradigm shift very hard. ARM provides backward compatibility too, but it is generally more willing to make major architectural changes.

Ecosystem
Many differences between the two architectures stem from the different business models of ARM and Intel. ARM is an IP company that does no manufacturing. It designs an architecture, processor cores, and system IP blocks. Customers may buy all or part of its offerings and customize their SoC according to their needs.

Intel, on the other hand, manufactures its own cores and chipsets and provides reference designs to OEMs. There is very little customization, and Intel owns all the blocks in the system except for external devices that communicate through a standard interface. AMD for the most part follows the same top-down design philosophy.

Feature Implementation Comparison
At a high level, the features supported by each architecture are very similar. All of them support two-stage address translation: Intel calls it Extended Page Tables (EPT), AMD calls it Rapid Virtualization Indexing (RVI, originally Nested Page Tables), and ARM calls the second stage Stage 2 translation. All of them let the hypervisor intercept exceptions and interrupts and inject interrupts into the guest, and all of them keep host state separate from VM state.
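Two-stage translation can be sketched with a toy model in Python. The assumptions are loud ones: 4 KiB pages, and single-level tables represented as dicts from page number to page number. Real EPT, RVI, and ARM Stage 2 tables are multi-level radix trees, and the guest's own page tables live in guest-physical memory, so in hardware every stage 1 table access is itself translated by stage 2. The sketch only shows the composition of the two mappings and which stage faults when a mapping is missing.

```python
PAGE_SHIFT = 12                      # assume 4 KiB pages
PAGE_MASK = (1 << PAGE_SHIFT) - 1


class TranslationFault(Exception):
    """Raised when a stage has no mapping; records which stage faulted."""

    def __init__(self, stage: int, addr: int):
        super().__init__(f"stage {stage} translation fault at {addr:#x}")
        self.stage = stage
        self.addr = addr


def walk(table: dict[int, int], addr: int, stage: int) -> int:
    """Translate addr through a toy single-level table of page numbers."""
    page, offset = addr >> PAGE_SHIFT, addr & PAGE_MASK
    if page not in table:
        raise TranslationFault(stage, addr)
    return (table[page] << PAGE_SHIFT) | offset


def translate(gva: int, stage1: dict[int, int], stage2: dict[int, int]) -> int:
    gpa = walk(stage1, gva, stage=1)   # guest OS tables: guest virtual -> guest physical
    return walk(stage2, gpa, stage=2)  # hypervisor tables: guest physical -> host physical


stage1 = {0x10: 0x2}    # guest maps virtual page 0x10 to guest-physical page 0x2
stage2 = {0x2: 0x7F}    # hypervisor backs guest-physical page 0x2 with host page 0x7F
print(hex(translate(0x10ABC, stage1, stage2)))  # 0x7fabc
```

The guest controls only `stage1` and the hypervisor controls only `stage2`, which is what lets the guest manage its own page tables without hypervisor involvement.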

Privilege Levels
On x86, the expectation is that the processor runs Windows or any other OS right out of the box without major changes. The hypervisor needs to be a trusted program that can be installed on top of that operating system. So the hypervisor runs as part of the existing privileged mode and uses special virtualization instructions, including instructions for turning virtualization on and off (VMXON and VMXOFF on Intel). Once virtualization is enabled, the processor separates host operation (VMX root) from VM operation (VMX non-root), and each side keeps the usual rings 0 to 3.

This is very different from how privileges are organized in the ARM world. Developers don't have the same expectations of ARM. Until very recently, most commercial ARM devices did not allow users to purchase and run software of their choice, which let the device maker make major changes to the software. So when adding virtualization, ARM did not face the same restrictions as x86, and its approach to privilege levels is different. On ARM, the hypervisor runs in a new mode at a higher privilege level than the regular kernel (Hyp mode, PL2, in ARMv7; EL2 in ARMv8). Using a hypervisor therefore requires changes to the boot code, and it does not fit the model of a hypervisor installed as an application on an unchanged OS.
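To make the contrast concrete, here is a toy Python model (a sketch, not a hardware description) of where software runs under each scheme. It uses the ARMv8 exception-level names EL0 to EL3, and the two placement tables are illustrative assumptions: the x86 one describes a hosted hypervisor loaded into an unmodified OS, the ARM one a hypervisor at its own level.

```python
from enum import IntEnum
from typing import NamedTuple


class ArmEL(IntEnum):
    """ARMv8 exception levels: a higher number means more privilege."""
    EL0 = 0  # applications
    EL1 = 1  # OS kernel (the guest kernel when virtualized)
    EL2 = 2  # hypervisor (Hyp mode, PL2, in ARMv7)
    EL3 = 3  # secure monitor


class X86Mode(NamedTuple):
    """With VT-x, VMX root/non-root operation is orthogonal to rings 0-3."""
    vmx_root: bool
    ring: int


# Illustrative placement for a hosted x86 hypervisor.
X86_PLACEMENT = {
    "host kernel + hypervisor": X86Mode(vmx_root=True, ring=0),
    "host application": X86Mode(vmx_root=True, ring=3),
    "guest kernel": X86Mode(vmx_root=False, ring=0),
    "guest application": X86Mode(vmx_root=False, ring=3),
}

# Illustrative placement on ARM with a dedicated hypervisor level.
ARM_PLACEMENT = {
    "hypervisor": ArmEL.EL2,
    "guest kernel": ArmEL.EL1,
    "guest application": ArmEL.EL0,
}

# On x86 the guest kernel still runs in ring 0; only the root/non-root bit differs.
print(X86_PLACEMENT["guest kernel"].ring == X86_PLACEMENT["host kernel + hypervisor"].ring)  # True
# On ARM the hypervisor sits strictly above the kernel it manages.
print(ARM_PLACEMENT["hypervisor"] > ARM_PLACEMENT["guest kernel"])  # True
```

The x86 model adds a new dimension next to the existing rings, so an unmodified kernel keeps its ring 0; the ARM model stacks a new level on top, so the boot flow has to start in the hypervisor level.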

Virtual Machine Entry
VT-x introduces an in-memory data structure called the VMCS (Virtual Machine Control Structure) for each virtual processor. The hypervisor fills it with the guest's register state, along with host state and control fields, and then executes VMLAUNCH (or VMRESUME on later entries), which loads that state onto the processor in a single step. AMD has a similar instruction (VMRUN) with a similar data structure (the VMCB, Virtual Machine Control Block).
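The "one structure, one instruction" style of entry can be sketched as a toy Python model. `ToyVMCS`, `vm_entry`, and `vm_exit` are hypothetical names: a real VMCS is accessed with VMREAD/VMWRITE using numeric field encodings, not a Python dict, and it holds far more fields than shown here.

```python
from dataclasses import dataclass, field


@dataclass
class CPU:
    regs: dict[str, int] = field(default_factory=dict)
    guest_mode: bool = False          # models VMX non-root operation


@dataclass
class ToyVMCS:
    """Illustrative stand-in for a VMCS: guest state, host state, exit info."""
    guest_state: dict[str, int] = field(default_factory=dict)
    host_state: dict[str, int] = field(default_factory=dict)
    exit_info: dict[str, str] = field(default_factory=dict)


def vm_entry(cpu: CPU, vmcs: ToyVMCS) -> None:
    """Like VMLAUNCH/VMRESUME: load the whole guest state in one step."""
    cpu.regs = dict(vmcs.guest_state)
    cpu.guest_mode = True


def vm_exit(cpu: CPU, vmcs: ToyVMCS, reason: str) -> None:
    """Like a VM exit: hardware saves guest state and reloads host state."""
    vmcs.guest_state = dict(cpu.regs)
    vmcs.exit_info["reason"] = reason
    cpu.regs = dict(vmcs.host_state)
    cpu.guest_mode = False


cpu = CPU()
vmcs = ToyVMCS()
# The hypervisor fills in the structure (real code uses VMWRITE per field).
vmcs.host_state["rip"] = 0xFFFF0000     # hypothetical address of the exit handler
vmcs.guest_state.update(rip=0x7C00, rsp=0x8000)
vm_entry(cpu, vmcs)
print(hex(cpu.regs["rip"]), cpu.guest_mode)   # 0x7c00 True
```

The hypervisor never copies guest registers onto the processor itself; it describes the desired state and lets the hardware do the transfer in both directions.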

ARM, on the other hand, does not provide these software conveniences and expects the hypervisor to set any registers it needs by hand. There isn't even a special instruction to launch a VM. To launch a VM on ARM, the hypervisor sets the Exception Link Register (ELR) to the desired PC (Program Counter; the Instruction Pointer in x86) and the Saved Program Status Register (SPSR) to the guest's mode, and then performs an exception return.
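The same step on ARM can be modeled in Python for comparison. This is a sketch under stated assumptions: `ArmCPU` and `enter_guest` are hypothetical, only the AArch64 register names ELR_EL2 and SPSR_EL2 and the SPSR mode encoding for EL1h (0b0101, with M[3:2] holding the exception level) come from the architecture, and the toy SPSR value leaves all other PSTATE bits at zero.

```python
from dataclasses import dataclass, field

SPSR_M_EL1H = 0b0101   # SPSR.M[3:0] for "EL1 using SP_EL1" (AArch64)


@dataclass
class ArmCPU:
    el: int = 2                        # currently executing in the hypervisor (EL2)
    pc: int = 0
    regs: dict[str, int] = field(default_factory=dict)
    elr_el2: int = 0                   # Exception Link Register for EL2
    spsr_el2: int = 0                  # Saved Program Status Register for EL2


def eret(cpu: ArmCPU) -> None:
    """Model of ERET: branch to ELR and switch to the EL recorded in SPSR."""
    cpu.pc = cpu.elr_el2
    cpu.el = (cpu.spsr_el2 >> 2) & 0b11   # SPSR.M[3:2] encodes the target EL


def enter_guest(cpu: ArmCPU, saved: dict) -> None:
    # No control structure: the hypervisor restores each register it needs
    # itself (real code would use MSR for system registers like SCTLR_EL1).
    cpu.regs.update(saved["regs"])
    cpu.spsr_el2 = SPSR_M_EL1H          # return into the guest kernel at EL1
    cpu.elr_el2 = saved["pc"]           # guest PC to resume at
    eret(cpu)                           # an ordinary exception return enters the VM


cpu = ArmCPU()
enter_guest(cpu, {"pc": 0x80000, "regs": {"x0": 0x4000_0000, "ttbr0_el1": 0x9000}})
print(hex(cpu.pc), cpu.el)   # 0x80000 1
```

Nothing here is virtualization-specific except the target level: the same ELR/SPSR/ERET sequence is how a kernel returns to user space.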

This is in line with ARM being a RISC architecture and x86 being a CISC architecture. Process handling shows the same split: x86 defines a hardware structure for a task's saved state (the Task State Segment), while on ARM that state is maintained entirely by the OS.

Exception Intercepts
ARM allows redirecting most exceptions, including interrupts, to the hypervisor. There is also a special mode that lets an application run directly under the hypervisor; it allows intercepting undefined instruction and supervisor call exceptions, but enabling it disables the kernel level. However, ARM has no setting for redirecting the guest kernel's own (stage 1) page faults to the hypervisor; only faults in the stage 2 translation are taken by the hypervisor.
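The routing rules above can be summarized as a toy decision function. The two flags are hypothetical and only loosely modeled on the IMO and TGE bits of ARM's HCR_EL2 register; the function covers exceptions raised by the guest (EL0 or EL1) and only the cases described in the paragraph above, not the full architectural routing tables.

```python
from dataclasses import dataclass

EL0, EL1, EL2 = 0, 1, 2


@dataclass
class HypRouting:
    """Toy routing controls, loosely modeled on two HCR_EL2 bits."""
    route_irq_to_hyp: bool = True    # in the spirit of HCR_EL2.IMO
    trap_general: bool = False       # in the spirit of HCR_EL2.TGE


def target_el(exception: str, from_el: int, cfg: HypRouting) -> int:
    """Return the level that handles a guest exception in this simplified model."""
    if exception == "irq":
        return EL2 if cfg.route_irq_to_hyp else EL1
    if exception == "stage2_fault":
        return EL2                   # faults in the hypervisor's own tables
    if cfg.trap_general and from_el == EL0 and exception in ("undef", "svc"):
        return EL2                   # the app runs directly under the hypervisor
    # Everything else, including the guest's own (stage 1) page faults,
    # is delivered to the guest kernel: there is no flag to redirect it.
    return EL1


cfg = HypRouting()
print(target_el("irq", EL1, cfg))            # 2
print(target_el("stage1_fault", EL1, cfg))   # 1
```

The point of the sketch is the asymmetry: interrupts and stage 2 faults can reach the hypervisor, while the guest's stage 1 page faults always land in the guest kernel.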

On x86, a guest page fault can be redirected to the hypervisor. This allows the hypervisor to page out the guest's exception vectors and, more importantly, makes shadow page tables possible. The feature is necessary on x86 because the first generation of hardware virtualization support did not include a two-stage table walk. ARM, by contrast, requires the two-stage table walk to be implemented as part of its virtualization support. Since a two-stage table walk usually performs better than shadow page tables and is easier to implement, the missing page-fault intercept isn't a problem for ARM.
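Shadow page tables replace the two-stage walk with a single table that the hypervisor precomposes from the guest's table (guest virtual to guest physical) and its own map (guest physical to host physical). Below is a minimal Python sketch under the same toy assumptions as before (4 KiB pages, dict-based single-level tables). `ShadowMMU` and its methods are hypothetical, and real hypervisors typically detect guest page-table updates by write-protecting those pages or by syncing lazily on faults and TLB flushes, which is exactly why they need to intercept page faults.

```python
PAGE_SHIFT = 12
PAGE_MASK = (1 << PAGE_SHIFT) - 1


class ShadowMMU:
    """Toy shadow paging: the hypervisor keeps a precomposed GVA->HPA table."""

    def __init__(self, guest_pt: dict[int, int], p2m: dict[int, int]):
        self.guest_pt = dict(guest_pt)   # guest's table: virtual page -> guest-physical page
        self.p2m = dict(p2m)             # hypervisor's map: guest-physical page -> host page
        self.shadow: dict[int, int] = {}
        for vpn in self.guest_pt:
            self._sync(vpn)

    def _sync(self, vpn: int) -> None:
        """Recompute one shadow entry from the guest entry and the p2m map."""
        gpfn = self.guest_pt.get(vpn)
        if gpfn is not None and gpfn in self.p2m:
            self.shadow[vpn] = self.p2m[gpfn]
        else:
            self.shadow.pop(vpn, None)

    def on_guest_pte_write(self, vpn: int, gpfn: int) -> None:
        """Called when the hypervisor intercepts a guest page-table update."""
        self.guest_pt[vpn] = gpfn
        self._sync(vpn)

    def translate(self, gva: int) -> int:
        """Hardware walks only the shadow table: one stage, no nested walk."""
        vpn, offset = gva >> PAGE_SHIFT, gva & PAGE_MASK
        if vpn not in self.shadow:
            # A real hypervisor intercepts this fault and decides whether to
            # fill the shadow entry or reflect the fault into the guest.
            raise LookupError(f"shadow page fault at {gva:#x}")
        return (self.shadow[vpn] << PAGE_SHIFT) | offset


mmu = ShadowMMU(guest_pt={0x10: 0x2}, p2m={0x2: 0x7F, 0x3: 0x80})
print(hex(mmu.translate(0x10ABC)))    # 0x7fabc
mmu.on_guest_pte_write(0x11, 0x3)     # guest maps a new page; hypervisor updates the shadow
print(hex(mmu.translate(0x11004)))    # 0x80004
```

Lookups are cheap because there is only one table to walk, but every guest page-table change costs a trap into the hypervisor; a two-stage walk moves that cost into the hardware walker instead.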

IOMMU and Devices
Both ARM and x86 chipsets support the features we talked about in Post 3. One thing to note is that an IOMMU does not need support from the processor core, so in the ARM world customers can choose their own IOMMU implementation instead of going with ARM's offering. Beyond the IOMMUs already mentioned, there are other important implementations: there is a standard defined for an IOMMU for PCI Express devices, and graphics cards have used the GART (Graphics Address Remapping Table), a simple IOMMU, since AGP.
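What an IOMMU adds for virtualization is per-device translation of DMA addresses, so a device assigned to a VM can only reach that VM's memory. The toy Python model below assumes devices are identified by a PCI-style bus:device.function string and that a device without a translation table is blocked; `ToyIOMMU`, `attach`, and `dma` are hypothetical names, and real IOMMUs use multi-level tables and configurable policies for unassigned devices.

```python
PAGE_SHIFT = 12
PAGE_MASK = (1 << PAGE_SHIFT) - 1


class ToyIOMMU:
    """Per-device DMA translation, keyed by a PCI-style requester ID string."""

    def __init__(self) -> None:
        self.domains: dict[str, dict[int, int]] = {}

    def attach(self, device_id: str, table: dict[int, int]) -> None:
        """Give a device (e.g. one assigned to a VM) its own I/O page table."""
        self.domains[device_id] = table

    def dma(self, device_id: str, iova: int) -> int:
        """Translate a device-issued address, or block the access."""
        table = self.domains.get(device_id)
        if table is None:
            raise PermissionError(f"{device_id}: no translation domain")
        page, offset = iova >> PAGE_SHIFT, iova & PAGE_MASK
        if page not in table:
            raise PermissionError(f"{device_id}: DMA fault at {iova:#x}")
        return (table[page] << PAGE_SHIFT) | offset


iommu = ToyIOMMU()
# A NIC assigned to a VM: I/O page 0 maps to host page 0x500 (VM-owned memory).
iommu.attach("01:00.0", {0x0: 0x500})
print(hex(iommu.dma("01:00.0", 0x123)))   # 0x500123
```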

Different devices offer different levels of virtualization support, which may include sharing one device between multiple virtual machines. For example, Intel's VT-c allows an Ethernet controller to be shared between multiple virtual machines.
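One way a network controller can be shared is by sorting incoming frames into per-VM receive queues in hardware, keyed on destination MAC address; this is roughly the idea behind Intel's VMDq, one part of VT-c. The Python sketch below is a toy illustration under that assumption: `ToySharedNIC` is hypothetical, and sending unknown addresses to a "hypervisor" queue is an assumed fallback policy, not a documented device behavior.

```python
from collections import defaultdict, deque


class ToySharedNIC:
    """Sorts received frames into per-VM queues by destination MAC address."""

    def __init__(self) -> None:
        self.mac_to_vm: dict[str, str] = {}
        self.queues: defaultdict[str, deque] = defaultdict(deque)

    def assign(self, mac: str, vm: str) -> None:
        self.mac_to_vm[mac.lower()] = vm

    def receive(self, dst_mac: str, frame: bytes) -> str:
        # Unknown addresses fall back to a queue handled by the hypervisor's
        # software switch (a hypothetical policy for this sketch).
        vm = self.mac_to_vm.get(dst_mac.lower(), "hypervisor")
        self.queues[vm].append(frame)
        return vm


nic = ToySharedNIC()
nic.assign("52:54:00:00:00:01", "vm1")
nic.assign("52:54:00:00:00:02", "vm2")
print(nic.receive("52:54:00:00:00:02", b"hello"))   # vm2
```

Doing this demultiplexing in the device saves the hypervisor from inspecting and copying every frame in software.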

In conclusion, ARM and x86 have similar hardware extensions for virtualization, which is expected since both architectures are competing for the same market. As described above, their implementations differ in several ways that reflect each architecture's pedigree. ARM's support for virtualization points to an interesting future.