Linux and the Alpha

This is the first of a two-part series: an introduction to the Alpha family of computers, in preparation for Part 2, which presents techniques for optimizing code on this high-performance platform.

Ever since its announcement in the Fall
of 1991, the Alpha architecture (see Reference 3) has been the
foundation of the world's fastest systems. In fact, except for one
or two brief blips, Alpha systems have held the top spot in
single-CPU SPECmark performance. With this
outstanding performance record comes marketing hype and, sometimes,
unrealistic expectations. It is not all that uncommon to find
e-mail messages or USENET articles saying things like: “I heard
the Alpha is so fast, but now I find that my dusty deck is just 10%
faster on the Alpha than on the other system.” So what's the
truth? The honest answer is that it depends on what you're doing.
Alpha systems are without a doubt fast machines, but it is
unreasonable to expect that taking a dusty deck and running it on
an Alpha will result in the best possible performance. This is
particularly true for programs that were written with the mind-set
of the eighties, when CPU cycles were at a premium and memory
bandwidth was abundant. Reality looks quite different today: CPU
clock rates above 150MHz are the rule, and even laptops can run at
200MHz or more. The result is that, today, the memory system, and
not the CPU, is often the first-order bottleneck.

In Part 2 of this series, we will demonstrate a few simple
techniques that help avoid the memory system bottleneck. Except for
one case, the focus is on integer-intensive applications. The topic
of optimizing floating-point intensive applications is certainly
just as important but, unfortunately, well beyond the scope of this
series. The techniques presented can result in tremendous
performance improvements. While the techniques will be helpful on
all modern systems, they normally yield the biggest benefits on
Alpha-based machines. There are two reasons for this
bias.

One, the Alpha architecture has been designed with longevity
in mind. Specifically, the Alpha architecture should be good for
the next 15-25 years, which corresponds roughly to a 1000-fold
increase in overall performance. For this reason, some
design-tradeoffs were made in favor of long-term viability rather
than short-term benefits. For example, the Alpha was a 64-bit
architecture right from the start, even though, at the time of its
announcement, 32-bit address spaces were considered comfortably
large.
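
To put the 1000-fold figure in perspective, the implied compound growth rate can be worked out directly. The following is only a back-of-the-envelope sketch: the function names are made up for illustration, and it assumes performance grows by the same factor every year, which the architecture's designers never claimed.

```python
import math


def annual_growth_factor(total_factor, years):
    """Yearly multiplier that compounds to total_factor after `years` years."""
    return total_factor ** (1.0 / years)


def doubling_time_years(total_factor, years):
    """Years needed to double performance under the same steady growth."""
    return years * math.log(2.0) / math.log(total_factor)


if __name__ == "__main__":
    # The article's range: a 1000-fold increase over 15 to 25 years.
    for years in (15, 25):
        growth = annual_growth_factor(1000.0, years)
        doubling = doubling_time_years(1000.0, years)
        print(f"{years} years: {100 * (growth - 1):.0f}% per year, "
              f"doubling every {doubling:.1f} years")
```

Under this assumption, a 1000-fold increase works out to roughly 32% to 58% growth per year, or a doubling of performance every 1.5 to 2.5 years.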

Two, the current Alpha implementations are designed to
achieve high performance by pushing clock frequency to the limit.
This means the CPU-to-memory-system performance gap is the largest
for Alpha-based systems. For example, suppose a memory access takes
100ns. On a 500MHz Alpha CPU, this corresponds to 50 clock cycles.
In contrast, on a 250MHz CPU, this is only 25 cycles. So the
relative performance penalty of accessing memory is much higher on
a CPU with higher clock speeds. This may sound like a bad thing,
but the absolute time spent on each memory access is the same on
both systems. What this really means is that a fast-clock system
running a memory-bound application will be about as fast as a
slower-clock system, but when running a memory-wise application,
it will be much faster.
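
The arithmetic behind the 100ns example can be sketched in a few lines. This is a deliberately crude model, not a measurement: the function names and the two workloads are hypothetical, and it assumes every memory access stalls the CPU for the full latency, with no caches and no overlap between computation and memory traffic.

```python
def latency_cycles(latency_ns, clock_mhz):
    """Clock cycles that elapse during one memory access."""
    # cycles = (ns * 1e-9 s) * (MHz * 1e6 Hz) = ns * MHz / 1000
    return latency_ns * clock_mhz / 1000.0


def run_time_us(cpu_cycles, memory_accesses, clock_mhz, latency_ns=100.0):
    """Toy run time in microseconds: compute time plus memory stall time."""
    compute_us = cpu_cycles / clock_mhz  # a clock of N MHz does N cycles per microsecond
    stall_us = memory_accesses * latency_ns / 1000.0
    return compute_us + stall_us


if __name__ == "__main__":
    for mhz in (250, 500):
        print(f"{mhz}MHz: {latency_cycles(100.0, mhz):.0f} cycles per access")

    # Hypothetical workloads: same amount of computation, very different
    # numbers of accesses that go all the way to main memory.
    workloads = {"memory-bound": 1_000_000, "memory-wise": 1_000}
    for name, accesses in workloads.items():
        slow = run_time_us(1_000_000, accesses, 250)
        fast = run_time_us(1_000_000, accesses, 500)
        print(f"{name}: 500MHz is {slow / fast:.2f}x as fast as 250MHz")
```

With these made-up numbers, doubling the clock buys the memory-bound workload only about 2% (a ratio of roughly 1.02), while the memory-wise workload runs almost twice as fast (roughly 1.95).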

In this part of the series, I present a brief overview of
existing and upcoming Alpha implementations. While it is not
usually necessary to optimize for a specific CPU, it is helpful to
know what the characteristics of current CPUs and systems are. I
also discuss a couple of simple performance analysis tools that are
available under Linux. When porting legacy code to modern systems,
such tools are invaluable, since they avoid wasting time trying to
optimize rarely executed code.

Overview of Alpha Family

So far, the Alpha CPU family tree spans three generations; it
all began with the 21064 chip. At the time of its introduction, it
was the highest performing CPU, and it still makes for a nice
workstation, though it's no longer competitive with the latest
generation CPUs. This chip branched off into a version that was
called the “Low-Cost Alpha” (LCA), also known as 21066 or 21068.
The chip core was identical to the 21064 but it had an integrated
memory and PCI-bus controller. This high integration made it
possible to build relatively low-cost Alpha-based systems, including
systems for the embedded market. Unfortunately, the design had a
major weakness: the memory system was seriously under-powered. This
created the paradoxical situation in which a system based on this
chip performed, on average, no better than a 100MHz Pentium, yet
on some applications outperformed a P6 running at 200MHz. As a
result, the reaction to this chip varied greatly, and it probably
left Digital with quite a few disappointed customers. On the
other hand, there is no doubt that the low cost at which
21066-based systems eventually were sold caused a quantum leap in
the number of Linux/Alpha users.

Around June 1994, the 21164 chip was announced. It had
dramatically improved performance over the 21064 and was the first,
and so far only, Alpha CPU to feature a three-level cache
hierarchy. The first- and second-level caches were both on-chip and
only the third-level cache was on the motherboard. This chip, in
slightly improved versions, is still going strong. At the Fall 1996
Comdex in Las Vegas, such a chip, coupled with a liquid cooling
system, was demonstrated running at 767MHz. Another version, called
21164PC, is scheduled to become available around Spring 1997. It
omits the relatively expensive second-level, on-chip cache but adds
multi-media extensions and other performance-enhancing features. As
the name indicates, this chip is designed to be price-competitive
with PC processors, specifically the forthcoming Intel Klamath (an
improved P6). While price-competitive, the 21164PC is supposed to
deliver over 50% better performance than the Klamath. For this
second-generation, low-cost Alpha implementation, it certainly
looks like Digital and its co-designer Mitsubishi are not going to
repeat the mistakes of the past. The 21164PC promises to be cheap
and fast.

If you happen to have deep pockets or want to take a glance
at what PC processors might look like in two or three years, the
21264 might be of interest. It is scheduled to become available in
high-end machines during the second half of 1997. With this chip,
CPU performance is expected to take another giant leap. Current
estimates call for a performance level three to four times
that of the fastest CPUs available today.

Between major chip generations, there are typically
“half-generation” CPUs whose improvements derive
primarily from a shrink of the chip manufacturing process. For
example, the 21064 chip was followed by the 21064A, and similarly,
the 21164 was followed by the 21164A. In the former case, the core
of the chip remained virtually identical to the 21064, but the
primary caches doubled in size from 8KB to 16KB. In the latter
case, instructions for byte and word accesses were added and the
maximum clock frequency increased from 333 to 500MHz.

A summary of the performance attributes of the current Alpha
chip family is presented in Table 1.
