CPU Wars, Part 2: POWER to the People

In the second article of a three-part series, David Chisnall examines the current state of the CPU industry. Part 1 looked at general trends. Part 2 focuses on two of the surviving RISC families: POWER and SPARC.

Part 1 of this series took a broad look at the state of the CPU industry and
what general trends existed. This week, we’ll examine the two survivors of
the workstation-targeted RISC families: PowerPC and SPARC.

A Separation of POWERs

IBM’s POWER line began as a very high-performance multi-chip product. In
1996, the POWER2 became the first in the series to ship as a single-chip
microprocessor. Back in 1991, IBM began a collaboration with Apple and Motorola
to create the PowerPC architecture. This was very similar to the POWER
instruction set, although neither was a subset of the other. Some compilers,
including GCC, can output code that uses only the common subset, however, and
AIX running on PowerPC chips would trap POWER-specific instructions and emulate
them in software.

Starting with the POWER3, IBM blurred the line further. The POWER3
implemented the 64-bit PowerPC instruction set, as well as the POWER2
instruction set. Thus, all recent POWER chips have also been PowerPC chips
(although the converse is not true).

The PowerPC was intended to replace both Motorola’s aging 68000 series
and Intel’s 8086 series. It included a number of features to make
emulating both of these architectures easier, including special instructions for
supporting both big-endian and little-endian operating systems.
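
To make the byte-order issue concrete, here is a minimal Python sketch of what
a byte-reversed load achieves, in the spirit of PowerPC instructions such as
lwbrx. The helper name load_word_byte_reversed is hypothetical and only models
the effect; it is not a description of how the hardware implements it.

```python
import struct

value = 0x12345678

# The same 32-bit word as it would sit in memory under each byte order.
big_endian_bytes = struct.pack(">I", value)      # most significant byte first
little_endian_bytes = struct.pack("<I", value)   # least significant byte first


def load_word_byte_reversed(raw: bytes) -> int:
    """Hypothetical model of a byte-reversed load: swap the 4 bytes, then
    interpret them in big-endian order."""
    return struct.unpack(">I", raw[::-1])[0]


# A plain big-endian load of little-endian data yields a scrambled value...
misread = struct.unpack(">I", little_endian_bytes)[0]
# ...while the byte-reversed load recovers the value that was stored.
recovered = load_word_byte_reversed(little_endian_bytes)

print(hex(misread), hex(recovered))
```

Having the swap available as a single load or store is what keeps access to
foreign-endian data cheap; without it, every such access costs several extra
shift and mask operations.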

PowerPC never made the inroads into the desktop market that IBM and Motorola
hoped for, but it has been hugely successful in the embedded market. Microsoft
released Windows NT 4.0 for PowerPC, but few machines running it were sold.
IBM’s OS/2 port took too long and never sold. Only Apple had any real
success with PowerPC on the desktop, and it has now abandoned the architecture
in favor of x86 chips.

IBM sells PowerPC 970 and POWER5+ workstations, but these are considerably
more expensive than similar Opteron offerings.

This year sees IBM moving to the POWER6, replacing its current high-end CPUs.
The main market for the POWER-branded chips is high-performance
computing—the POWER5 series has been very popular among supercomputer
builders—although many of the technologies featured are likely to trickle
down into PowerPC-branded systems eventually. One thing, however, has migrated
the other way: VMX.

Motorola introduced vector extensions into its PowerPC line with the 7400
series, which IBM collaborated in designing but didn’t manufacture. These
were added to the PowerPC specification with version 2.03. IBM has implemented
these extensions with the PowerPC 970 series of chips, but they’ve been
absent from the POWER line. This situation will change with the POWER6, which
incorporates the vector instruction set (dubbed VMX by IBM,
AltiVec by Motorola, and Velocity Engine by Apple).

Another interesting feature is the addition of a decimal floating-point unit.
This change is important for financial institutions, in which a fixed number of
decimal places of accuracy is often required. It’s impossible to represent
many common decimals as nonrecurring binaries. The value 0.2, for example, is
0.001100110011... in binary, with the block 0011 repeating forever, and 0.1 is
the same pattern shifted one place to the right. Storing such a value in a
finite number of bits leads to rounding errors, which are a big problem when
dealing with money. Languages such as COBOL and recent versions of Java provide
support for decimal arithmetic to avoid this problem, but emulating this
arithmetic on a machine that doesn’t natively support it can be very slow.
The decimal unit is thought to be part of IBM’s initiative to consolidate
all of its computing products, from workstations to mainframes, on a single
architecture.
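
The rounding problem is easy to reproduce. The sketch below uses Python rather
than COBOL or Java purely for brevity; Python’s decimal module performs
its arithmetic in software, which is exactly the kind of emulation a hardware
decimal unit is meant to replace. The variable names are mine.

```python
from decimal import Decimal

# Binary floating point: neither 0.1 nor 0.2 is stored exactly, so the sum
# does not compare equal to 0.3.
binary_sum = 0.1 + 0.2
binary_exact = (binary_sum == 0.3)

# Decimal arithmetic: the same values are represented exactly.
decimal_sum = Decimal("0.1") + Decimal("0.2")
decimal_exact = (decimal_sum == Decimal("0.3"))

# Adding ten cents one hundred times, as a ledger might.
binary_total = sum([0.10] * 100)
decimal_total = sum([Decimal("0.10")] * 100, Decimal("0"))

print(binary_exact, decimal_exact)
print(binary_total, decimal_total)
```

The binary total drifts slightly away from 10, while the decimal total is
exactly 10.00. For a single sum the error is tiny, but it accumulates over
millions of transactions.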

The thing that makes the POWER6 most remarkable is the clock speed. In an era
when everyone else is moving to more—slower—cores, IBM plans to
release a chip at 5 GHz. This is likely to give IBM the best single-thread
performance for a while (and, consequently, the best overall performance, if you
use enough chips), which should present an advantage in some markets.

It should come as no surprise that the POWER6 supports SMT, since IBM was the
first to bring this feature to market in a general-purpose CPU, in earlier POWER
processors (the first to market overall was Sun, with a CPU aimed at Java
applications). Another IBM first was virtualization: System/360 derivatives were
the first to support it, and IBM has included hardware support for
virtualization in all recent products. The POWER6 is expected to support 1024
virtual partitions, although it’s unlikely that quite this many will be
needed for a while.

Although the POWER6 is an incremental improvement on the POWER line, IBM has
several other CPU products, the most interesting of which is the Cell,
co-developed with Sony and Toshiba. The most unusual feature of the Cell is that
it’s a heterogeneous multicore design. Most other multicore processors
have two or more copies of the same core, while the Cell has one core of one
kind and eight of another.

The first core is a fairly simple PowerPC. This is a 64-bit design, similar
to the PowerPC 970 in capabilities but with two-way SMT and without out-of-order
execution. This core can run existing PowerPC code, but when the CPU is busy
it’s mainly responsible for coordinating the other eight, known in IBM
buzzword lingo as Synergistic Processing Units (abbreviated to SPU by
everyone who is too embarrassed to say "Synergistic" in polite
company). I’m not going to talk much about the Cell, because it has been
covered in exhaustive detail everywhere else already, but I will cover a few key
points.

The SPU is basically an extended VMX-style vector unit with a few other
instructions. Its instruction set is most notable for having 128 registers,
rather than the 32 of standard VMX. Since each of these registers is 128 bits
wide, this gives 2KB (16 kilobits) of space in registers, four times what the
32 registers of a standard VMX unit hold.

The most interesting feature is not the instruction set, however, but the
memory architecture. Rather than having a cache that’s transparent to the
programmer, the 256KB of RAM that’s local to the SPU is directly exposed,
along with instructions to perform bulk DMA transfers between it and main memory
(or memory belonging to other SPUs).
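
To illustrate the difference from a transparent cache, here is a toy Python
model of a software-managed local store. The class and method names
(LocalStore, dma_get, dma_put) are hypothetical and are not taken from any Cell
SDK; the only figure borrowed from the hardware is the 256KB size. The point is
simply that the program, not the hardware, decides what is resident and when
it moves.

```python
LOCAL_STORE_SIZE = 256 * 1024  # bytes of RAM local to one SPU


class LocalStore:
    """Toy model of a programmer-visible local memory (hypothetical API)."""

    def __init__(self, main_memory: bytearray):
        self.main_memory = main_memory
        self.data = bytearray(LOCAL_STORE_SIZE)

    def _check(self, local_addr: int, main_addr: int, size: int) -> None:
        # Reject transfers that would run off either memory.
        if local_addr < 0 or main_addr < 0 or size < 0:
            raise ValueError("negative address or size")
        if local_addr + size > LOCAL_STORE_SIZE:
            raise ValueError("transfer does not fit in the local store")
        if main_addr + size > len(self.main_memory):
            raise ValueError("transfer runs past the end of main memory")

    def dma_get(self, local_addr: int, main_addr: int, size: int) -> None:
        """Bulk copy from main memory into the local store."""
        self._check(local_addr, main_addr, size)
        self.data[local_addr:local_addr + size] = \
            self.main_memory[main_addr:main_addr + size]

    def dma_put(self, local_addr: int, main_addr: int, size: int) -> None:
        """Bulk copy from the local store back to main memory."""
        self._check(local_addr, main_addr, size)
        self.main_memory[main_addr:main_addr + size] = \
            self.data[local_addr:local_addr + size]


def double_all(main_memory: bytearray, chunk: int = 64 * 1024) -> None:
    """Double every byte (modulo 256), one explicitly fetched chunk at a time."""
    store = LocalStore(main_memory)
    for offset in range(0, len(main_memory), chunk):
        size = min(chunk, len(main_memory) - offset)
        store.dma_get(0, offset, size)        # explicit fetch, no cache involved
        for i in range(size):                 # compute only on local data
            store.data[i] = (store.data[i] * 2) % 256
        store.dma_put(0, offset, size)        # explicit write-back


memory = bytearray([1, 2, 3, 200])
double_all(memory, chunk=2)
print(list(memory))
```

A real program would typically overlap the transfer of the next chunk with
computation on the current one, which this sequential sketch does not attempt.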

In recent years, IBM has put a lot of its weight behind Free Software, and
Linux in particular. This makes sense from the perspective of IBM as a whole,
since IBM is a services company that tries to give customers the product they
want, whatever it is—and a big support contract to go with it. This
approach may well have a knock-on effect for IBM’s CPU arm, since Linux
runs just as well (and in some cases better) on POWER/PowerPC as on x86. For
customers looking at thin clients or simple workstations, PowerPC designs
originally aimed at the embedded market might be a good choice. For those
looking at high-end servers, the POWER6 may be cost-effective. Without the need
to run Windows, there’s a great deal more flexibility in the possible
architectures.

Freescale

Freescale, formerly Motorola’s CPU division, has been shipping PowerPC
chips since the beginning. Freescale was still providing Apple with laptop chips
before the Intel switch, but the design wasn’t updated much in recent
years. The company still ships a large number of PowerPC chips, but they tend to
be mainly for the embedded market (thirty or so in every new BMW, for example).

The Freescale PowerPC 74xx series (the G4, to Apple users) was seriously
limited by the slow front-side bus speed, which made it almost impossible to
keep the vector unit fed with data. The successor to this design, the e600, is
due out early in 2007, and is expected to feature a dual-core design running at
up to 1.5 GHz, with the e700 taking this speed up to 3 GHz and adding 64-bit
support. When these chips were first announced in 2004, they seemed obvious
contenders for a PowerBook upgrade. Now, their place seems a little more
uncertain.

PA Semi

After the death of the Alpha, some of the brightest CPU designers in the
industry were scattered. Many ended up at AMD and worked on the Opteron. A few
went to Intel and worked on XScale. Two have now formed their own company, PA
Semi, and are working on PowerPC designs.

PowerPC grew out of a collaboration between three companies, and IBM has
always been keen to encourage others to use the architecture, hoping to displace
x86. This setup has worked quite well in the embedded sphere, with the PowerPC
4xx series being very popular with ASIC designers who want to add a little bit
of custom logic to an existing design. It’s also found at the core of a
number of higher-end FPGAs, allowing an existing operating system to run on the
PowerPC core and offload application-specific workloads to custom logic in the
FPGA. Most PowerPC designs tend to focus on the bottom end of the market,
however.

PA Semi is more ambitious. Its recently unveiled PWRficient design is
squarely aimed at the performance-per-watt target. Featuring two 2 GHz 64-bit
cores, it’s an impressive design. The chip features DDR2 controllers and
2MB of level-2 cache. These aren’t attached to the cores directly;
instead, they’re connected via a crossbar, giving either CPU direct access
to the cache, and the cache controller direct access to both memory
controllers.

The same crossbar is used to attach a number of other dedicated processing
units. These include 10-gigabit Ethernet controllers, DMA engines, and a TCP
offload engine. The DMA controller, being connected to the crossbar, is capable
of moving data between any of the I/O components and memory (including
I/O-to-I/O and memory-to-memory transfers) in a way somewhat reminiscent of the
Cell. Perhaps the most interesting feature is that the chip implements a
significant portion of iSCSI in hardware, as well as IPSec/SSL and common RAID
functions. This makes these chips an ideal choice for lower-power NAS
controllers.

In spite of expected performance close to that of the PowerPC 970MP, the
power dissipation of the chip is claimed to peak at only 25 W at 2 GHz, dropping
to 10 W for the 1 GHz variant. A 2 GHz dual-core 970MP, by comparison, draws
around 80 W and requires additional chips for memory, PCIe, and other
controllers.
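
As a rough back-of-the-envelope comparison, the quoted figures can be turned
into a crude efficiency number. The sketch below assumes that cores multiplied
by clock speed is an acceptable stand-in for throughput, which is my
simplification and not a benchmark, and it ignores the extra power drawn by the
970MP’s supporting chips. The function name is hypothetical.

```python
def core_ghz_per_watt(cores: int, clock_ghz: float, watts: float) -> float:
    """Crude efficiency proxy: aggregate clock rate delivered per watt."""
    return cores * clock_ghz / watts


# Figures quoted in the text: two 2 GHz cores at 25 W versus a 2 GHz
# dual-core 970MP at around 80 W.
pwrficient = core_ghz_per_watt(cores=2, clock_ghz=2.0, watts=25.0)
ppc_970mp = core_ghz_per_watt(cores=2, clock_ghz=2.0, watts=80.0)

advantage = pwrficient / ppc_970mp
print(f"PWRficient: {pwrficient:.2f} core-GHz/W")
print(f"970MP:      {ppc_970mp:.2f} core-GHz/W")
print(f"Ratio:      {advantage:.1f}x")
```

By this measure the PWRficient comes out a little over three times ahead, and
the gap would widen once the 970MP’s external controllers are counted.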

The PWRficient line looks like the ideal part for a non-x86 laptop, but we
don’t see a lot of people lining up to manufacture those now that Apple
has left the market. IBM might have been interested in a ThinkPad that could run
AIX, but has now sold its laptop arm to Lenovo. I’ll be very interested to
see what products do end up using the chips; the fact that QNX and Wind River
are both partners of PA Semi indicates that they’re aiming hard at the
embedded/real-time market, and I expect that a low-power, high-performance chip
of this nature is going to give rise to a lot of exciting products in the next
few years.