SIMD architectures

Introduction

What do Sony's PlayStation 2 and Motorola's MPC7400 (a.k.a. the G4) have in common? Besides the incredible hype behind both products and their legions of crazed fans, there's one acronym that unites them, an acronym that sums up the secret to their stellar performance: SIMD. Single Instruction stream, Multiple Data streams (SIMD) computing first entered the personal computing world in the form of Intel's neglected addition to the x86 instruction set, MMX. Even though MMX was panned by the press and was slow to be adopted, SIMD computing was here to stay on the personal computing landscape. And it's a good thing too, because SIMD is a technology whose time has definitely come, and it's just about ubiquitous on the desktop: MMX, SSE, 3DNow!, AltiVec, etc. are all names for SIMD instruction sets.

In this article, we're going to look at what SIMD is, what it offers, and how it's integrated in three-and-a-half of today's hottest processors. Three and a half? The half is Sun's upcoming MAJC architecture, which isn't actually out yet. We've included it here because its approach to SIMD is quite different from the other three, so it provides a nice contrast.

This article will provide a basic introduction to SIMD concepts, as well as an overview of the three and a half SIMD implementations under discussion. One thing that should definitely be understood is that this article is actually the sequel to my previous G4 vs. K7 tech article. If you want to look at AltiVec and 3DNow! in the context of the G4 and K7 as a whole, then you should read the first article too. This article focuses on SIMD, and ignores many of the important issues already taken up by its predecessor.

SIMD basics

Early microprocessors didn't actually have any floating-point capabilities; they were strictly integer crunchers. Floating-point calculations were done on separate, dedicated hardware, usually in the form of a math coprocessor. Before long, though, transistor sizes shrank to the point where it became feasible to put a floating-point unit directly onto the main CPU die, and the modern integer/floating-point microprocessor was born. Of course, the addition of floating-point hardware meant the addition of floating-point instructions. For the x86 world, this meant the introduction of the x87 floating-point architecture and its (now hopelessly archaic) stack-based register model.

So the x87 brought a new name, new capabilities, new registers, and new instructions to Intel's microprocessors. Sound familiar? It should.

Actually, the addition of SIMD instructions and hardware to a modern, superscalar CPU is a bit more drastic than the addition of floating-point capability. A microprocessor is a SISD device (Single Instruction stream, Single Data stream), and it has been since its inception.

As you can see from the above picture, a SIMD machine exploits a property of the data stream called data parallelism. You get data parallelism when you have a large mass of data of a uniform type that needs the same instruction performed on it. A classic example of data parallelism is inverting an RGB picture to produce its negative. You have to iterate through an array of uniform integer values (pixels) and perform the same operation (inversion) on each one: multiple data points, a single operation. Modern, superscalar SISD machines exploit a property of the instruction stream called instruction-level parallelism (ILP). In a nutshell, this means that you execute multiple instructions at once on the same data stream. (See my other articles for more detailed discussions of ILP.) So a SIMD machine is a different class of machine than a normal microprocessor. SIMD is about exploiting parallelism in the data stream, while superscalar SISD is about exploiting parallelism in the instruction stream.
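To make the picture-negative example concrete, here's a minimal sketch in portable C. It doesn't use any of the real SIMD instruction sets discussed in this article; it just imitates the idea by packing eight 8-bit channel values into one 64-bit word so that a single NOT inverts all of them at once. The function names and the flat byte-array image layout are assumptions made for illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* SISD style: each step inverts exactly one 8-bit channel value. */
void invert_scalar(uint8_t *pixels, size_t n)
{
    assert(pixels != NULL || n == 0);
    for (size_t i = 0; i < n; i++)
        pixels[i] = (uint8_t)(255 - pixels[i]);
}

/* SIMD flavored: eight channel values travel together in one 64-bit
 * word, and a single bitwise NOT inverts all eight (255 - x == ~x for
 * an 8-bit value, and NOT never carries between bytes). */
void invert_packed(uint8_t *pixels, size_t n)
{
    assert(pixels != NULL || n == 0);
    size_t i = 0;
    for (; i + 8 <= n; i += 8) {
        uint64_t word;
        memcpy(&word, pixels + i, sizeof word);  /* load 8 values  */
        word = ~word;                            /* one operation  */
        memcpy(pixels + i, &word, sizeof word);  /* store 8 values */
    }
    /* Fewer than eight values left: finish them one at a time. */
    for (; i < n; i++)
        pixels[i] = (uint8_t)(255 - pixels[i]);
}
```

Both functions produce the same negative; the packed version simply gets there in roughly one eighth as many trips through the loop. Real SIMD hardware applies the same trick with wider registers and a much richer set of packed operations.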

There were some early, ill-fated attempts at making a purely SIMD machine (i.e., a SIMD-only machine). The problem with these attempts is that the SIMD model is simply not flexible enough to accommodate general-purpose code. The only form in which SIMD is really feasible is as a part of a SISD host machine that can execute conditional instructions and other types of code that SIMD doesn't handle well. This is, in fact, the situation with SIMD in today's market. Programs are written for a SISD machine, and include SIMD instructions in their code.
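That division of labor can be sketched in portable C as well. In the hypothetical example below, ordinary scalar host code walks the rows of an image and makes a per-row decision with a conditional branch, then hands each selected row to a branch-free packed routine. As before, the 64-bit word only stands in for a real SIMD register, and the function names, the row-major layout, and the `selected` flag array are all assumptions made for illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Data-parallel part: no decisions about the data, just the same NOT
 * applied to every byte, eight bytes per step. */
static void invert_row_packed(uint8_t *row, size_t width)
{
    size_t i = 0;
    for (; i + 8 <= width; i += 8) {
        uint64_t word;
        memcpy(&word, row + i, sizeof word);
        word = ~word;
        memcpy(row + i, &word, sizeof word);
    }
    for (; i < width; i++)              /* leftover bytes, one by one */
        row[i] = (uint8_t)~row[i];
}

/* Host part: plain scalar control flow. It loops, branches on a flag,
 * and counts, which is exactly the kind of work the packed routine
 * above cannot express. Returns the number of rows inverted. */
size_t invert_selected_rows(uint8_t *image, size_t width, size_t height,
                            const uint8_t *selected)
{
    assert(image != NULL && selected != NULL);
    size_t inverted = 0;
    for (size_t y = 0; y < height; y++) {
        if (!selected[y])
            continue;                   /* conditional: host's job */
        invert_row_packed(image + y * width, width);
        inverted++;
    }
    return inverted;
}
```

The host routine is free to do anything a normal program does; only the uniform, repetitive inner work gets handed to the packed code.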

One thing I'd like to note for the sake of all you nit-pickers out there is that I'm going by the description of SISD as laid out in Hennessy and Patterson. A more detailed discussion of the finer points of SISD vs. SIMD as concepts, while it would be appropriate here, would hinder us from moving more quickly to the actual comparison of the SIMD implementations.