Sound and Vision: A Technical Overview of the Emotion Engine

The Vector Units

VU0 and VU1 share the same core microarchitecture, but they're not functionally
identical. VU1 has some extra features tacked onto the outside of it that
help it do geometry processing, and VU0 has some features that it doesn't
normally use (but that VU1 does). Toshiba did things this way to make the units
easier to manufacture. Since VU0 is simpler, we'll start with it.
Just keep in mind that much of what's said about VU0 also applies to VU1.

Vector Unit 0

VU0 is a 128-bit SIMD/VLIW design. (If you're confused by the term
"SIMD/VLIW," don't worry; so was I at first. We'll discuss what this
term means in a special section to follow.) Since VU0 is a coprocessor for the
MIPS III core, it spends most of its time operating in Coprocessor Mode.
This means that, to the programmer, it looks like just another logical pipe
alongside the integer ALUs. The instructions that make VU0 go are ordinary
32-bit MIPS coprocessor (COP) instructions, mixed in with integer, FPU, and
branch instructions. In this respect, VU0 looks a lot like the G4's AltiVec
unit. Often, in the rendering process, the CPU maintains a separate thread that
controls VU0. The CPU places FP data on the dedicated bus in 128-bit chunks
(w,x,y,z), which the VIF unpacks into four 32-bit words for processing by the
FMACs.
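Because VU0's instructions are ordinary 32-bit words in the CPU's stream, the decoder can pick them out by the 6-bit primary opcode in bits 31..26. The toy sketch below assumes the standard MIPS encoding, in which COP1 (0x11) is the FPU and COP2 (0x12) is the coprocessor slot VU0 occupies on the Emotion Engine; it ignores every other decoding detail, and the function names are hypothetical.

```python
# Simplified sketch: sort 32-bit MIPS instruction words by primary opcode.
# Assumption: VU0 occupies the COP2 slot; all other decoding is ignored.
from typing import List

OP_COP1 = 0x11  # primary opcode 010001: floating-point unit
OP_COP2 = 0x12  # primary opcode 010010: VU0 in coprocessor mode

def primary_opcode(word: int) -> int:
    """Return bits 31..26 of a 32-bit instruction word."""
    return (word >> 26) & 0x3F

def route(word: int) -> str:
    """Name the unit that would receive this instruction (toy model)."""
    op = primary_opcode(word)
    if op == OP_COP2:
        return "VU0"
    if op == OP_COP1:
        return "FPU"
    return "core"  # integer, branch, load/store, ... lumped together

def route_stream(words: List[int]) -> List[str]:
    """Route every word of a mixed instruction stream."""
    return [route(w) for w in words]

# A mixed stream: an integer no-op, a COP2 (VU0) word, a COP1 (FPU) word.
stream = [0x00000000, OP_COP2 << 26, OP_COP1 << 26]
print(route_stream(stream))  # ['core', 'VU0', 'FPU']
```

The point is only that nothing special happens at the instruction-fetch level: VU0 work is interleaved with everything else, and the opcode decides where it goes.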

VU0 has its own set of thirty-two 128-bit FPRs (floating-point registers), each
of which can hold four 32-bit single-precision floating-point numbers. It also
has sixteen 16-bit integer registers for integer computation.
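To make the register layout concrete, here is a minimal Python model of one 128-bit floating-point register holding four single-precision fields, plus a 16-bit integer register. The field order (x in the low 32 bits), the little-endian byte order, and the 16-bit wraparound are assumptions for illustration, and the helper names are hypothetical.

```python
# Toy model of one VU floating-point register: four 32-bit floats in 128 bits.
# Assumptions: x occupies the low 32 bits and the layout is little-endian.
import struct
from typing import Tuple

def pack_quad(x: float, y: float, z: float, w: float) -> int:
    """Pack four single-precision floats into a 128-bit integer."""
    raw = struct.pack("<4f", x, y, z, w)  # 16 bytes, x first
    return int.from_bytes(raw, "little")

def unpack_quad(q: int) -> Tuple[float, float, float, float]:
    """Split a 128-bit integer back into its four float fields."""
    raw = q.to_bytes(16, "little")
    return struct.unpack("<4f", raw)

def to_vi(value: int) -> int:
    """Wrap an integer to 16 bits (assumed behavior of a VU integer register)."""
    return value & 0xFFFF

q = pack_quad(1.0, 2.0, 3.0, 4.0)
print(hex(q & 0xFFFFFFFF))  # 0x3f800000, the bit pattern of 1.0f
print(unpack_quad(q))       # (1.0, 2.0, 3.0, 4.0)
print(to_vi(0x12345))       # 9029 (0x2345)
```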

Here are the computational units available to VU0 (and VU1):

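4 FMACs (floating-point multiply-accumulators)

1 FDIV (floating-point divider)

1 LSU (load/store unit)

1 integer ALU (iALU)

1 random number generator (RNG)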

The first five units here, the 4 FMACs and the FDIV, are sort of the heart of
both VU0 and VU1 (which are themselves the heart of the Emotion Engine,
which is itself the heart of the PS2). So this is where the magic happens. Each
FMAC can execute the following instructions:

Floating-point multiply-accumulate: 1 cycle

Min/Max: 1 cycle
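Since there are four FMACs, one instruction works on all four fields of a register at once. The sketch below models that as a per-lane multiply-accumulate plus per-lane min and max; it is a hypothetical illustration, and because Python floats are 64-bit it does not reproduce the VU's single-precision rounding.

```python
# Toy model of the four FMACs working side by side, one lane each.
# Python floats are 64-bit, so single-precision rounding is not modelled.
from typing import Tuple

Vec4 = Tuple[float, float, float, float]

def madd(acc: Vec4, a: Vec4, b: Vec4) -> Vec4:
    """Per-lane multiply-accumulate: acc + a * b in each of the four lanes."""
    return tuple(c + x * y for c, x, y in zip(acc, a, b))

def vmin(a: Vec4, b: Vec4) -> Vec4:
    """Per-lane minimum."""
    return tuple(min(x, y) for x, y in zip(a, b))

def vmax(a: Vec4, b: Vec4) -> Vec4:
    """Per-lane maximum."""
    return tuple(max(x, y) for x, y in zip(a, b))

def clamp(v: Vec4, lo: Vec4, hi: Vec4) -> Vec4:
    """Clamp each lane into [lo, hi] using one min and one max."""
    return vmax(vmin(v, hi), lo)

print(madd((1.0, 1.0, 1.0, 1.0), (1.0, 2.0, 3.0, 4.0), (2.0, 2.0, 2.0, 2.0)))
# (3.0, 5.0, 7.0, 9.0)
print(vmin((1.0, 5.0, 3.0, 8.0), (2.0, 4.0, 6.0, 7.0)))  # (1.0, 4.0, 3.0, 7.0)
```

The `clamp` helper shows one plausible use for min/max, such as keeping color components in the 0 to 1 range; it is our example, not something the article specifies.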

The FDIV unit executes the following instructions:

Floating-point divide: 7 cycles

Square root: 7 cycles

Inverse square root: 13 cycles
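The latencies above matter for common geometry work. Normalizing a vector, for instance, needs 1/sqrt(d), where d is the squared length. The sketch below uses only the table's numbers and assumes the operations form a dependent chain on the single FDIV (the divide needs the square root's result), so their latencies add; it ignores any FMAC work that could overlap.

```python
# Latencies taken from the FDIV table above; the rest is a toy model.
from typing import List

FDIV_LATENCY = {"div": 7, "sqrt": 7, "rsqrt": 13}

def chain_cycles(ops: List[str]) -> int:
    """Cycles for a chain of dependent FDIV operations.

    Each op needs the previous result, and there is only one FDIV,
    so the latencies simply add up.
    """
    return sum(FDIV_LATENCY[op] for op in ops)

two_step = chain_cycles(["sqrt", "div"])  # 1 / sqrt(d) via sqrt, then divide
one_step = chain_cycles(["rsqrt"])        # 1 / sqrt(d) in one instruction
print(two_step, one_step)  # 14 13
```

Under these assumptions the dedicated inverse square root saves one cycle over the two-step route, and it also frees the FDIV one instruction sooner.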

The bulk of the processing that the PS2 does to make a 3D game involves
performing the above operations on lots and lots of data.
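A typical example of "the above operations on lots of data" is transforming vertices by a 4x4 matrix. The sketch below expresses the transform purely as four per-lane multiply-accumulates, broadcasting one vertex component per step. Storing the matrix as four columns is our assumption, and the code is not taken from any actual PS2 program.

```python
# Sketch: transform vertices by a 4x4 matrix using only per-lane
# multiply-accumulates. Assumption: the matrix is stored as four columns.
from typing import List, Sequence, Tuple

Vec4 = Tuple[float, float, float, float]

def transform(cols: Sequence[Vec4], v: Vec4) -> Vec4:
    acc = (0.0, 0.0, 0.0, 0.0)
    for col, s in zip(cols, v):
        # Broadcast one vertex component across all four lanes, then MAC.
        acc = tuple(a + c * s for a, c in zip(acc, col))
    return acc

def transform_all(cols: Sequence[Vec4], vertices: Sequence[Vec4]) -> List[Vec4]:
    """Apply the same matrix to a whole batch of vertices."""
    return [transform(cols, v) for v in vertices]

translate = [
    (1.0, 0.0, 0.0, 0.0),
    (0.0, 1.0, 0.0, 0.0),
    (0.0, 0.0, 1.0, 0.0),
    (5.0, 6.0, 7.0, 1.0),  # translation column
]
print(transform_all(translate, [(1.0, 2.0, 3.0, 1.0), (0.0, 0.0, 0.0, 1.0)]))
# [(6.0, 8.0, 10.0, 1.0), (5.0, 6.0, 7.0, 1.0)]
```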

Now, those last three units in my list (LSU, ALU, and RNG) aren't shown in most
charts as being part of VU0. I suspect this is because they aren't used in
coprocessor mode. When VU0 is acting like a MIPS coprocessor, it only uses the 4
FMACs. "Wait a minute," you're saying, "isn't VU0 always a MIPS coprocessor? You
know, the 128-bit dedicated bus and stuff? You went to great lengths to make that
point in the first half of the article." Yeah, I did kind of insist that VU0 is
on the CPU's "team," that they share the same goals, that it's bound to the CPU,
etc. This is kind of misleading (although I'd argue it's heuristically
justifiable), but all will become clear in the final section. For now, just
understand that VU0 mostly operates as a MIPS coprocessor that handles any FP
SIMD instructions that show up in the CPU's instruction stream.

Vector Unit 1

VU1 is a fully independent SIMD/VLIW processor that includes all the
architectural features of VU0, plus some additional mojo. These additions relate
directly to VU1's role as a geometry processor for the Graphics Synthesizer, and
they help bind it more tightly to the GS. The primary addition is an extra
functional unit, the Elementary Functional Unit (EFU). The EFU is just 1 FMAC and
1 FDIV, like the CPU's FPU, and it performs some of the more basic calculations
required for geometry processing.
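As an illustration of the kind of scalar job a lone FMAC plus FDIV can handle, here is the length of a 3-D vector: the FMAC builds the sum of squares one term at a time, and the FDIV takes the square root. The choice of vector length as a representative "basic calculation" is our assumption; the article doesn't list the EFU's operations.

```python
# Toy sketch of a scalar "elementary" job: the length of a 3-D vector.
# Assumption: one scalar FMAC builds the sum of squares, then one scalar
# FDIV takes the square root.
import math
from typing import Sequence

def vector_length(v: Sequence[float]) -> float:
    acc = 0.0
    for c in v:
        acc += c * c       # scalar multiply-accumulate, one term at a time
    return math.sqrt(acc)  # scalar square root

print(vector_length((3.0, 4.0, 0.0)))  # 5.0
```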

Another big difference between VU1 and VU0 is that VU1 has 16K of data memory
and 16K of instruction memory (as opposed to VU0's 4K data/4K instruction
sizes). VU1 needs the larger data memory because, as a geometry processor, it
handles much more data than VU0 does.
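A quick back-of-the-envelope calculation shows what 16K of data memory buys. The sketch assumes data is stored as 16-byte (128-bit) quadwords, matching the register width, and uses a hypothetical vertex format of three quadwords (position, normal, color); real programs would lay out memory in their own ways.

```python
# Back-of-the-envelope capacity of VU1's 16K data memory.
# Assumptions: data is stored as 16-byte (128-bit) quadwords, and the
# vertex format below (3 quadwords per vertex) is hypothetical.
QUADWORD_BYTES = 16

def quadwords(mem_bytes: int) -> int:
    """Number of 128-bit quadwords that fit in mem_bytes."""
    return mem_bytes // QUADWORD_BYTES

def max_vertices(mem_bytes: int, qw_per_vertex: int, reserved_qw: int = 0) -> int:
    """Vertices that fit after setting aside space for matrices, constants, etc."""
    return (quadwords(mem_bytes) - reserved_qw) // qw_per_vertex

VU1_DATA = 16 * 1024
print(quadwords(VU1_DATA))                       # 1024
print(max_vertices(VU1_DATA, 3))                 # 341 (position, normal, color)
print(max_vertices(VU1_DATA, 3, reserved_qw=4))  # 340, leaving room for a 4x4 matrix
```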

Finally, VU1 has multiple paths it can take to get data to the GIF (and on to
the GS). Like VU0, VU1 can send display lists to the GIF via the main 128-bit
bus. Or, VU1's VIF can send data directly to the GIF. And there's a direct
connection between VU1's 16K data memory and the GIF, meaning that VU1 can work
on a display list in data memory, and the DMAC can transfer the results directly
to the GIF.

I have to pause here and note that there's some serious confusion in Sony's
literature on the direct path between VU1 and the GIF. One slide-show diagram
appears to connect the instruction memory to the GIF, another quite obviously
shows the path going from the lower execution unit to the GIF, and yet another
connects the data memory to the GIF. This last one is the only one that makes
sense to me, but I went ahead and left my diagram ambiguous.

As you'll recall from the discussion of VU0, VU0 is controlled by the CPU,
and VU0 gets its instructions from whatever program the CPU is currently
running. VU1, however, doesn't work that way. VU1's VIF plays a much more
prominent role in VU1's life than VU0's VIF does in its. VU1's VIF takes in and
parses what Sony confusingly calls a 3D display list. This 3D display
list is not VU1's program. Rather, it's a data structure that contains
two types of information, along with some specialized commands that tell VU1
how to handle this information. The two types of information are:

a. the actual VU1 program instructions, which go in VU1's instruction memory.

b. the data that said program operates on, which goes in VU1's data memory.
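The VIF's sorting job can be sketched as a small command interpreter. The command names below (MPG for program code, UNPACK for data, MSCAL to start the program) are borrowed from Sony's VIF command names, but the tuple encoding, the slot-index addressing, and the `ToyVU1` class are hypothetical simplifications; the instruction words are opaque placeholder values.

```python
# Toy VIF: route a simplified "3D display list" into VU1's two memories.
# Command names follow Sony's VIF commands; everything else is hypothetical.
from dataclasses import dataclass, field
from typing import Dict, List, Optional, Tuple

@dataclass
class ToyVU1:
    # Addresses are slot indices here, not byte addresses.
    instr_mem: Dict[int, int] = field(default_factory=dict)
    data_mem: Dict[int, Tuple[float, ...]] = field(default_factory=dict)
    start_addr: Optional[int] = None

def vif_process(vu: ToyVU1, display_list: List[tuple]) -> None:
    for entry in display_list:
        cmd = entry[0]
        if cmd == "MPG":        # program code -> instruction memory
            _, addr, words = entry
            for i, w in enumerate(words):
                vu.instr_mem[addr + i] = w
        elif cmd == "UNPACK":   # data -> data memory
            _, addr, quads = entry
            for i, q in enumerate(quads):
                vu.data_mem[addr + i] = q
        elif cmd == "MSCAL":    # start the loaded program at addr
            _, addr = entry
            vu.start_addr = addr
        else:
            raise ValueError(f"unknown command: {cmd!r}")

vu = ToyVU1()
vif_process(vu, [
    ("MPG", 0, [0x11111111, 0x22222222]),    # placeholder instruction words
    ("UNPACK", 0, [(1.0, 2.0, 3.0, 1.0)]),   # one vertex position
    ("MSCAL", 0),
])
print(len(vu.instr_mem), len(vu.data_mem), vu.start_addr)  # 2 1 0
```

Once the list has been processed, VU1 has both its program and its data in place and can run without further help from the CPU.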

The VIF decodes and parses the 3D display list commands, and makes sure that
VU1 program code and data find their way into the correct spots. In this manner,
VU1 can operate independently of the CPU to generate display lists. Executing
these "VLIW mode" VU1 programs brings into play the three units that VU0 often
neglects: the LSU, the iALU, and the RNG. These three units, along with the EFU
(which acts as a general FPU), all help make VU1 a full-blown SIMD/VLIW
coprocessor. Hahaha... there's that term again: SIMD/VLIW. Now it's time to
find out what it means.