The world's first ATX-compatible, workstation-class mainboard for the IBM POWER8 processor.

In this update we explore the performance of virtualized Linux guests
on an OpenPOWER Linux host with QEMU. Several tests are run, and all
yield a somewhat surprising result – virtual machines actually
provide a performance boost compared to native execution when the host
SMT is set to 1! We suspect this is due to native host scheduling
problems, but this also implies that there is considerable untapped
potential latent within these OpenPOWER machines.

Test Setup

For all tests below, we use a Firestone reference server with dual
8-core 190 W CPUs, 4 Centaur memory buffers, and 256 GB RAM. While
the absolute numbers will change on a Talos machine, proportionally
the numbers should be nearly identical when comparing native execution
to the two virtualized modes.

OpenPOWER machines under KVM/QEMU have two separate virtualization
modes available, “Hypervisor” (kvm-hv) and “Problem” (kvm-pr). The
hypervisor mode uses the native virtualization extensions of POWER7
and later CPUs, and provides the best possible performance of any
virtualization mode on POWER systems. However, this mode is limited
to the host CPU generation or the prior CPU generation, and
furthermore cannot be used from inside another virtual machine. In
comparison, problem mode executes the virtual machine completely in
user mode by utilizing the problem handlers of the POWER architecture,
and emulates privileged instructions where needed. This
virtualization mode can be used on any PPC / POWER hardware, can
emulate any PPC / POWER CPU type or generation, and is suitable for
nested virtualization, but carries a variable performance penalty
based on workload.
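
As a rough illustration of how the two modes are selected, the sketch below builds a QEMU command line for a pseries guest. It assumes the `qemu-system-ppc64` binary and the `kvm-type` machine property (`HV` or `PR`) of QEMU's pseries machine. The helper name `qemu_args`, the memory size, vCPU count, and disk image path are hypothetical placeholders, not the configuration used for the tests in this update.

```python
def qemu_args(kvm_type, mem_mb=4096, vcpus=4, disk="guest.qcow2"):
    """Build an argument list for a pseries guest under KVM.

    kvm_type selects the virtualization mode: "HV" (hypervisor mode)
    or "PR" (problem mode). All other values are illustrative defaults.
    """
    kvm_type = kvm_type.upper()
    if kvm_type not in ("HV", "PR"):
        raise ValueError(f"unknown KVM type: {kvm_type!r}")
    return [
        "qemu-system-ppc64",
        "-machine", f"pseries,accel=kvm,kvm-type={kvm_type}",
        "-m", str(mem_mb),
        "-smp", str(vcpus),
        "-drive", f"file={disk},format=qcow2,if=virtio",
        "-nographic",
    ]


if __name__ == "__main__":
    # Print one command line per mode so the difference is easy to see.
    for mode in ("HV", "PR"):
        print(" ".join(qemu_args(mode)))
```

The only difference between the two command lines is the `kvm-type` value; the host must also have the matching KVM kernel module loaded for the chosen mode.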

One final variable is that POWER machines can be set to different SMT
(Simultaneous MultiThreading) modes. POWER8 CPUs natively support 8
simultaneous threads (SMT 8), but some workloads (e.g. QEMU) require
the native SMT support to be disabled (SMT 1). As a result, we
benchmark the native SMT 8 performance alongside the native SMT 1
performance for direct comparison. It is hoped that over time, as
QEMU on POWER matures further, this limitation can be removed.
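
To make the SMT difference concrete, the sketch below counts the logical CPUs the host sees in each mode for the test machine described above (2 sockets of 8 cores, 8 hardware threads per core). It assumes that Linux numbers the threads of each core consecutively and that lowering the SMT mode leaves only the first threads of each core online, which is the usual result of changing the mode with `ppc64_cpu --smt=N` from powerpc-utils. The function name `online_cpus` is our own.

```python
def online_cpus(cores, threads_per_core, smt):
    """Return the logical CPU numbers left online at a given SMT mode.

    Assumes thread t of core c is logical CPU c * threads_per_core + t,
    and that SMT mode N keeps threads 0..N-1 of every core online.
    """
    if smt not in (1, 2, 4, 8) or smt > threads_per_core:
        raise ValueError(f"unsupported SMT mode: {smt}")
    return [
        core * threads_per_core + thread
        for core in range(cores)
        for thread in range(smt)
    ]


if __name__ == "__main__":
    cores = 2 * 8  # dual 8-core CPUs
    for smt in (8, 1):
        cpus = online_cpus(cores, 8, smt)
        print(f"SMT {smt}: {len(cpus)} logical CPUs, first few: {cpus[:4]}")
```

Under these assumptions the host schedules across 128 logical CPUs at SMT 8 but only 16 at SMT 1, which is why both native configurations are benchmarked.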

Test 1 - Kernel Compile

Building on our previous kernel compilation
tests, we ran timed compile tests on
several native and virtualized configurations. As before, a snapshot
of the Linux kernel source tree was pulled and compiled for POWER
using the stock Debian configuration. The compilation took place
entirely within a dedicated tmpfs mount. The command used to compile
was:

Test 3 - UNIX Bench

Given the rather odd results shown above, a more comprehensive
systemwide open-source benchmark was sought. Unix Bench gives
detailed information on the speed of various system calls, process
spawning, etc. and we ran this benchmark on all four of the test
system configurations.

Discussion

As before, the highest performance is attained within the kvm-hv
virtual machine, which still exceeds native performance. The kvm-pr
virtual machine performs far worse than expected, only reaching 11.6%
of the kvm-hv performance in these kernel-operation-heavy tests.

The results do shed some light on the performance increase inside a
kvm-hv virtual machine, however. It appears that system call overhead
is greatly reduced inside the kvm-hv virtual machine as compared to
native execution, including execl(), and this would easily explain the
observed results for the timed compilation tests. Furthermore,
disabling SMT produces a puzzling, massive drop in timed compile
performance, but this drop is not reflected in the Unix Bench results
above. Overall, these test results hint that the Linux kernel may not be
properly tuned for native execution, and that our prior benchmarks on
the campaign page and in the updates are likely significantly
under-reporting OpenPOWER’s true performance limits. We will be
forwarding these results to IBM for further analysis and hopefully a
fix that unlocks more of OpenPOWER’s true potential!
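
For readers who want a quick sanity check of process spawn overhead on their own host and guests, a minimal sketch follows. It is not Unix Bench and is not the method used for the results in this update: it simply times how long it takes to start a trivial child process and wait for it, which exercises the fork and exec path discussed above. The function name `mean_spawn_seconds` and the default run count are our own choices, and the default child is the Python interpreter itself, so interpreter startup dominates the absolute number. Only the ratio between environments is meaningful.

```python
import subprocess
import sys
import time


def mean_spawn_seconds(runs=20, argv=None):
    """Average wall-clock seconds to spawn a trivial child and wait for it."""
    if argv is None:
        # Portable default; pass a smaller binary such as ["/bin/true"]
        # to reduce the share of time spent in child startup.
        argv = [sys.executable, "-c", "pass"]
    start = time.perf_counter()
    for _ in range(runs):
        subprocess.run(argv, check=True)
    return (time.perf_counter() - start) / runs


if __name__ == "__main__":
    print(f"mean spawn time: {mean_spawn_seconds() * 1e3:.2f} ms")
```

Running the same script natively and inside each guest type gives a rough per-spawn comparison across configurations.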
