PowerPC on Apple: An Architectural History, Part II

Part II of Hannibal's three-part series on the history of the PowerPC on Apple …

The PowerPC 7400

The Motorola MPC7400 is a widely used embedded processor, which means that it's designed primarily for use in routers and other non-PC devices that need a microprocessor with low power consumption and strong digital signal processing (DSP) capabilities. Apple Computer used the 7400 as the CPU in the first version of its G4 workstation line, and this processor was later replaced by a lower-power version, the 7410, before the 7450 (a.k.a. the G4+ or G4e) was introduced.

Except for the addition of SIMD capabilities, the 7400/7410 is essentially the same as the 750. Motorola's technical summary of the G4 has this to say about the 7410 vs. the 750:

The design philosophy on the MPC7410 (and the MPC7400) is to change from the MPC750 base only where required to gain compelling multimedia and multiprocessor performance. The MPC7410's core is essentially the same as the MPC750's, except that whereas the MPC750 has a 6-entry completion queue and has slower performance on some floating-point double-precision operations, the MPC7410 has an 8-entry completion queue and a full double-precision FPU. The MPC7410 also adds the AltiVec instruction set, has a new memory subsystem, and can interface to the improved MPX bus. (MPC7410 RISC Microprocessor Technical Summary, section 3.11).

Aside from the vector execution unit, the most important difference between the execution cores of the two processors lies in the 7400's improved FPU. The 7400's FPU is a full-blown double-precision FPU, and it performs single- and double-precision floating-point operations, including multiply and multiply-add, in three fully pipelined cycles.
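To see what "three fully pipelined cycles" buys in practice, here is a minimal sketch of the latency-versus-throughput arithmetic. It assumes a stream of independent floating-point operations with no data dependencies or stalls, and a unit that can accept one new operation per cycle; the function names are hypothetical and exist only for illustration.

```python
def pipelined_cycles(num_ops: int, latency: int = 3) -> int:
    """Cycles to finish num_ops independent operations on a fully
    pipelined unit that accepts one new operation every cycle."""
    if num_ops <= 0:
        return 0
    # The first result appears after `latency` cycles; each later
    # operation finishes one cycle after the one before it.
    return latency + (num_ops - 1)


def unpipelined_cycles(num_ops: int, latency: int = 3) -> int:
    """Same workload on a hypothetical unit that must finish one
    operation before starting the next."""
    return latency * max(num_ops, 0)


if __name__ == "__main__":
    for n in (1, 10, 100):
        print(n, pipelined_cycles(n), unpipelined_cycles(n))
```

Under these assumptions a single operation still takes three cycles, but a long run of independent operations approaches one result per cycle, which is why full pipelining matters more than the raw latency figure for streaming workloads.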

With respect to the instruction window, Motorola describes the differences between the 750 and the 7400/7410 as follows:

The MPC750 has a 6-entry IQ and a 6-entry CQ. For each clock, it can fetch four instructions, dispatch two instructions, fold one branch, and complete two instructions. The MPC7410 is identical, except for an eight-entry CQ, as shown in Figure 1. The extra CQ entries reduce the opportunity for dispatch bottlenecks to the MPC7410's additional execution units.

In all other respects, including the number and size of the reservation stations attached to each execution unit, the processors' instruction windows are the same. (Note that the 7400's two vector execution units each have a one-entry reservation station.)
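Motorola's remark about the extra completion queue entries can be illustrated with a toy model. The sketch below is not a model of the real 750 or 7410: it assumes every instruction takes the same fixed number of cycles to execute after dispatch, ignores reservation stations, branches, and dependencies, and uses only the per-clock widths and queue sizes quoted above. The function name and parameters are hypothetical.

```python
from collections import deque


def simulate(num_instructions, cq_size, latency,
             iq_size=6, fetch_width=4, dispatch_width=2, complete_width=2):
    """Return the cycle count for a toy in-order-completion pipeline."""
    iq = 0            # number of instructions waiting in the instruction queue
    cq = deque()      # finish cycle of each in-flight instruction, oldest first
    fetched = completed = 0
    cycle = 0
    while completed < num_instructions:
        cycle += 1
        # Complete in program order: only a finished instruction at the
        # head of the completion queue may retire.
        for _ in range(complete_width):
            if cq and cq[0] <= cycle:
                cq.popleft()
                completed += 1
        # Dispatch from the IQ only while the CQ has a free entry.
        for _ in range(dispatch_width):
            if iq > 0 and len(cq) < cq_size:
                iq -= 1
                cq.append(cycle + latency)
        # Fetch into whatever IQ space remains.
        n = min(fetch_width, iq_size - iq, num_instructions - fetched)
        iq += n
        fetched += n
    return cycle


if __name__ == "__main__":
    for lat in (1, 6):
        print("latency", lat,
              "CQ=6:", simulate(48, 6, lat),
              "CQ=8:", simulate(48, 8, lat))
```

In this simplified model the two queue sizes behave identically when instructions finish quickly, because the completion queue never fills. With longer-latency instructions the six-entry queue fills sooner and dispatch stalls until the oldest instruction retires, while the eight-entry queue keeps more work in flight.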

Figure POWERPC.5: the PowerPC 7400

The 7400's vector unit

In the late '90s, Motorola and IBM jointly developed a set of SIMD extensions to the PowerPC instruction set for use in the PowerPC processor series. These SIMD extensions go by different names: IBM calls them VMX and Motorola calls them AltiVec. I normally refer to these extensions using Motorola's AltiVec label.

The new AltiVec instructions, which I've covered in detail elsewhere, were first introduced in the 7400. The 7400 executes these instructions in its vector unit, which consists of two vector execution units: the vector ALU (VALU) and the vector permute unit (VPU). The VALU performs vector arithmetic and logical operations, while the VPU performs permute and shift operations on vectors. To support the AltiVec instructions, which can operate on up to 128 bits of data at a time, 32 new 128-bit vector registers were added to the PowerPC ISA. On the 7400/7410, these 32 architectural registers are accompanied by six vector rename registers.
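To make "up to 128 bits of data at a time" concrete, the following sketch emulates a 128-bit vector register as a list of equal-width integer lanes and performs one element-wise add across all of them. It is plain Python, not AltiVec code; the helper names are hypothetical, and wrap-around (modulo) addition is simply the behavior chosen for this illustration.

```python
VECTOR_BITS = 128  # width of one vector register


def lanes(element_bits: int) -> int:
    """Number of elements of the given width that fit in one register."""
    if element_bits <= 0 or VECTOR_BITS % element_bits:
        raise ValueError("element width must evenly divide 128")
    return VECTOR_BITS // element_bits


def vector_add(a, b, element_bits):
    """Element-wise wrap-around add of two emulated vector registers."""
    n = lanes(element_bits)
    if len(a) != n or len(b) != n:
        raise ValueError(f"expected {n} elements per vector")
    mask = (1 << element_bits) - 1  # keep each result inside its lane
    return [(x + y) & mask for x, y in zip(a, b)]


if __name__ == "__main__":
    print(lanes(8), lanes(16), lanes(32))
    print(vector_add([1, 2, 3, 0xFFFFFFFF], [10, 20, 30, 1], 32))
```

The point of the sketch is the lane count: one 128-bit operation covers sixteen 8-bit, eight 16-bit, or four 32-bit elements, where scalar code would need that many separate instructions.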

The AltiVec instruction set was a hit, and it began to see widespread use by Apple and by Motorola's embedded customers. But there was still much room for improvement in the 7400's AltiVec implementation. In particular, the vector unit's single VALU was tasked with handling all integer and floating-point vector operations. Just as scalar code benefits from the presence of multiple specialized scalar ALUs, vector performance can be improved by splitting the burden of vector computation among multiple specialized VALUs operating in parallel. Such an improvement would have to wait for the successor to the G4, the G4e.

PowerPC 7400/7410 Conclusions

The major problem with the 7400/7410 was that its short, four-stage pipeline severely limited the upward scalability of its clock rate. While Intel and AMD were locked in the GHz race, Motorola's 7400/7410 was stuck around the 500MHz mark for quite a long time. As a result, Apple's x86 competitors soon surpassed it in both clock speed and performance, leaving what was once the most powerful commodity RISC workstation line in serious trouble in the market.