Nintendo's GameCube Technical Overview

Yes, it looks like a cute little toy, but a very powerful console lies
underneath.

Release Dates

Nintendo released the GameCube on Sept 14th, 2001 (Japan) and Nov 18th,
2001 (North America). That is nearly three years after the Dreamcast's
Japanese release in Nov 1998.

The GameCube
was released in North America at a price of $199 US.

Specifications

The specifications were first released by Nintendo at its Spaceworld show
on August 24th, 2000 in Tokyo, Japan. Revised specs were announced by
Nintendo on its website on May 15th, 2001, during the 2001 E3 show in LA:
the "Gekko" CPU was upgraded from 400 MHz to 485 MHz, and the "Flipper"
GPU was downgraded from 202.5 MHz to 162 MHz.

*The peak figures listed are all
for maximum instantaneous performance and cannot be achieved in an actual
game. However, following the conventions of the game industry, they are
listed for your reference.

First off, let's point out that Nintendo is being very conservative in
its rating of 6 to 12 million polygons/second, as the developer Factor 5,
which is developing Star Wars: Rogue Squadron, is already doing 12 million
polygons/second, and the developer claims they are only using 50 percent
of the Gamecube's power. Factor 5 indicated they could get 20 million
polygons/second with all effects. "Effects" here means texture layers,
not polygonal lighting.

Two videos of Star Wars: Rogue Squadron running on the Gamecube are
available at cube.ign.com.

Here is information on Star Wars: Rogue Squadron and the Gamecube as
provided by Julian Eggebrecht, President of Factor 5. It was originally
presented on the forum of the German video game site Maniac Online.

Both videos run in real-time and only
use 50% of the hardware.

The X-Wing model is the original model used by Industrial Light and Magic
(ILM) for special effects in the Star Wars film, and it includes ILM's
textures and shaders. The X-Wing alone is made up of 30,000 polygons and
the pilot has 4000 polygons!

Both demos run at a constant 60 fps,
double buffered, true color, and with full screen anti-aliasing and deflickering.

The surface of the second Death Star from the film was rebuilt accurately
at a 1:1 ratio. The simple shapes on the Death Star surface have up to 300
polygons. Every element has a 512x512 true color texture. You can see 25
of them in the demo. There are 70 Tie Fighters and X-Wings onscreen. So
there are roughly 200,000 polygons per frame at 60 fps with up to 8 light
sources along with gloss, dirt, and bump maps. (Note that the number and
type of light sources can have a huge effect on the polygon rate.)
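
The per-frame figure can be turned into a per-second rate with one
multiplication. This small sketch assumes the 200,000 polygons are a
per-frame count (the text does not say so explicitly); under that reading
the result lines up with the 12 million polygons/second Factor 5 quotes.

```python
# Assumption: 200,000 is the polygon count of a single frame.
POLYGONS_PER_FRAME = 200_000
FRAMES_PER_SECOND = 60

polygons_per_second = POLYGONS_PER_FRAME * FRAMES_PER_SECOND

print(f"{polygons_per_second / 1e6:.0f} million polygons/second")
```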

Texturing allows all these effects (alpha, bump mapping, gloss mapping,
specular highlights, etc.) in the same cycle. It can do 8 layers in a
single pass. (It has been previously reported that Julian said single
"cycle", but that is not possible, since Flipper has 4 pipelines with one
texel unit per pipeline.)

The 2 MB frame buffer is a render buffer: when a frame is done, it is
sent to main memory. Anti-aliasing and deflickering are applied during
that transfer.

The 1 MB texture cache is automatically
filled during rendering, as the T&L engine triggers the swap. Textures
that are used often can be "locked" into the cache.

A-Memory is for audio but can also be
used as a buffer for other items that don't need the speed of main RAM.

The 2 MB on-chip frame buffer contains only the data for the current
frame being rendered, and the z-buffer. Double/triple buffers are stored
in main RAM, because the video DAC gets the image data directly from main
RAM. To render an image, polygonal data is sent to the T&L unit, which
automatically loads/swaps textures into the texture cache. Textures are
decompressed during rendering, so uncompressed textures never take up
memory.

Specs are one thing, but it is the Star Wars: Rogue Squadron videos
Nintendo showed that prove it has quite an amazing new console. This one
game looks outstanding, and it is still very early in development.

A side note on Factor 5 is that they
will be providing the sound tools for Gamecube development. The software
is called MusyX and you can find details on it here.

Motherboard

The motherboard is a beautiful piece of engineering in terms of size,
number of large components (only 5 large semiconductors: the CPU, the GPU
and three memory chips), low manufacturing cost, and awesome performance.
The entire board is roughly the size of a compact disc jewel case!

Unlike Sony's Playstation
2 Emotion Engine, the Gekko MPU was not built from the ground up. It's
a derivative of the PowerPC 750 RISC processor and includes some 50 new
instructions. Based on 0.18-micron copper wire process technology, the
device runs at 485 MHz and has an external bus to the Flipper device with
a peak of 1.3 Gbytes/s. The chip has a performance rating of 1125 DMips
(Dhrystone 2.1).

The GameCube team chose to build
off the existing PowerPC design to leverage the available tool chain, such
as compilers and optimizers. IBM claims this has given developers a jump
on creating new games. "Developers have been making software for the GameCube
a long time before people knew we were doing the silicon," said IBM's West.
"If you were to take code written for a PowerPC you could essentially run
it on this device. We didn't deviate from what is a well-understood architecture
by a large amount."

One of the modifications it made
was to cut the 64-bit floating point unit in half, allowing it to do two
32-bit floating point operations every cycle. "Conventional wisdom is that
four-way is actually better, but this is not necessarily true," West said.
"Two-way is actually pretty much as powerful as four-way, plus it takes
up less silicon and it's easier to make it go fast. We're going to try
to complete two instructions every cycle."

To improve the internal data flow,
IBM tried to eliminate "cache trashing," or wasting cache space on
transient data. The 256-Kbyte Level-2 cache can be locked down so that it
retains only the data that needs to be reused. There's also an internal
direct memory access that moves data from the cache while allowing the
device to process a different set of data. This mechanism helps mitigate
the incremental latency associated with compressing and decompressing the
data.

"You often get into a mode of
cache trashing and filling it up with useless data," said PowerPC architect
Peter Sandon. "We tried to optimize the data movement so that we don't
see the cache misses you would otherwise see."

Chip Sizes

Nintendo has released the chip size
for "Gekko" and "Flipper" as indicated in this article
entitled "Designers bring practical touch to GameCube" at EE Times on Sept.
7th, 2000.

Although MoSys was perhaps
the most strategic partner, the GameCube project involved several alliances.
IBM Corp. provided the so-called Gekko CPU, a custom version of the 400-MHz*
PowerPC with 256 kbytes of secondary cache — all made with a 0.18-micron
copper CMOS process. Despite the large secondary cache, IBM was able to
build the chip on a 43-mm2 die, said Takeda.

The 3-D graphics technology is
from ArtX Inc. (Palo Alto, Calif.), which is fabricating the embedded system
chip, a 120-mm2 device made with 0.18-micron technology, which contains
the SRAM embedded memory as well as the ArtX graphics engine and a sound
generator. Volume production of the part is to begin next month at the
No. 9 fab at NEC Kyushu. And Matsushita is supplying a proprietary 8-cm
optical disk for the games.

*Note: CPU is now 485 MHz due to revised
specs.

Small chip sizes are important because they determine the number of chips
produced on a silicon wafer. The more chips per wafer, the cheaper the
cost of production. Smaller circuit sizes can have a huge effect on the
size of the chip, as the IBM PowerPC CPU (0.18 micron) with 256 KB of
secondary cache is the same size as the Hitachi SH-4 CPU (0.25 micron)
used in the Dreamcast!
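
To get a rough feel for "more chips per wafer", here is a small sketch
using a common first-order estimate of gross dies per wafer (wafer area
divided by die area, minus a term for partial dies lost at the edge). The
die areas are the 43 mm2 and 120 mm2 quoted above; the 200 mm wafer
diameter is an assumption, and yield, scribe lines and edge exclusion are
ignored, so treat the results as ballpark numbers only.

```python
import math

def gross_dies_per_wafer(wafer_diameter_mm: float, die_area_mm2: float) -> int:
    """First-order estimate: usable area term minus an edge-loss term."""
    wafer_area = math.pi * (wafer_diameter_mm / 2) ** 2
    edge_loss = math.pi * wafer_diameter_mm / math.sqrt(2 * die_area_mm2)
    return int(wafer_area / die_area_mm2 - edge_loss)

WAFER_DIAMETER_MM = 200.0  # assumption: not stated in the article

gekko_dies = gross_dies_per_wafer(WAFER_DIAMETER_MM, 43.0)     # "Gekko" CPU
flipper_dies = gross_dies_per_wafer(WAFER_DIAMETER_MM, 120.0)  # "Flipper" GPU

print(f"Gekko:   about {gekko_dies} gross dies per wafer")
print(f"Flipper: about {flipper_dies} gross dies per wafer")
```

Under these assumptions the smaller Gekko die yields roughly three times
as many candidates per wafer as the Flipper die.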

The "Flipper" GPU contains 51 million
transistors, of which half is used up by the on-chip memory.

The Gamecube has a massive heat sink that covers all the chips, and it
has a fan on one of the air vents on its side. It has been reported that
the fan and the drive are very quiet.

Motherboard Datapath

The motherboard datapath diagram
is not official as Nintendo has not released that information yet.

All the datapaths listed above are bidirectional (read and write). The 81
MHz A-memory has low bandwidth, but it is more than adequate for
supporting all the sound channels that the Gamecube is capable of.
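
A back-of-the-envelope version of such a calculation can be sketched as
follows. Only the 81 MHz clock comes from the text; the channel count,
sample rate, sample size and 8-bit bus width are assumptions made for
illustration, and uncompressed samples are used as a worst case.

```python
BUS_CLOCK_HZ = 81_000_000   # from the article: 81 MHz A-memory

# Everything below is an assumption, not a figure from this article.
BUS_WIDTH_BYTES = 1         # assumed 8-bit wide A-memory bus
CHANNELS = 64               # assumed number of simultaneous sound channels
SAMPLE_RATE_HZ = 48_000     # assumed sample rate
BYTES_PER_SAMPLE = 2        # assumed 16-bit uncompressed samples

required_bytes_per_s = CHANNELS * SAMPLE_RATE_HZ * BYTES_PER_SAMPLE
available_bytes_per_s = BUS_WIDTH_BYTES * BUS_CLOCK_HZ
share = required_bytes_per_s / available_bytes_per_s

print(f"Needed:    {required_bytes_per_s / 1e6:.1f} MB/s")
print(f"Available: {available_bytes_per_s / 1e6:.1f} MB/s")
print(f"Share of A-memory bandwidth used: {share:.1%}")
```

Even with these worst-case assumptions the sound channels would use well
under a tenth of the bus.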

With an 81 MHz DSP and 16 MB of sound memory, the quality of sound in
Gamecube games should be outstanding!

Someone from Nintendo has confirmed that the sound chip on "Flipper" has
its own data pins to the 81 MHz DRAM, separate from the main memory bus.
This means that sound accesses will not have any negative effect on
graphics operations.

Note that the CPU also has access to the 81 MHz DRAM, so it can be used
for other storage besides sound data. It is a good place to store
information that does not need lots of bandwidth, like selection screens.

Graphics Processing Unit (GPU) Datapath

The "Flipper" chip datapath diagram
above is not official, and was created based on speculation.

The sound chip was not included in
the above diagram, as focus will concentrate on the most bandwidth intensive
aspects of the "Flipper" chip.

The texturing rate was released on March 16, 2001 by cube.ign, who
indicate that they have access to Nintendo's official GameCube Hardware
Overview documentation, which states a pixel rate of 648 MPixels/sec. It
has come to my attention from someone who has access to the Gamecube
developer documentation that there is a single texel unit per pipeline.

4 Pipelines @ 162 MHz        Texture Rate       Texel Rate

1 texel unit per pipeline    648 MPixels/sec    648 MTexels/sec
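
Both figures follow directly from the pipeline count and the clock. A
minimal sketch of the arithmetic, assuming each pipeline outputs one pixel
and each texel unit fetches one texel per clock at peak:

```python
PIPELINES = 4
CLOCK_HZ = 162_000_000            # revised "Flipper" clock
TEXEL_UNITS_PER_PIPELINE = 1

# Assumption: one pixel per pipeline and one texel per texel unit per clock.
pixel_rate = PIPELINES * CLOCK_HZ
texel_rate = pixel_rate * TEXEL_UNITS_PER_PIPELINE

print(f"{pixel_rate / 1e6:.0f} MPixels/sec")
print(f"{texel_rate / 1e6:.0f} MTexels/sec")
```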

Polygon Rate

The information below comes from an article at cube.ign, which in turn
got it from Nintendo's official GameCube Hardware Overview documentation.

Features                                         Performance

1 vertex color + 1 light + 1 texture             20M polygons/sec

no vertex color + 1 texture                      26.4M polygons/sec

1 vertex color + no texture (Gouraud shading)    32M polygons/sec

Note: the above figures were changed to reflect the revised specs. The
Flipper chip is now 162 MHz and not 202.5 MHz.
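
If the polygon rates scale linearly with the Flipper clock (an
assumption; the note only says the figures were changed to reflect the
revised specs), the numbers implied for the original 202.5 MHz clock can
be recovered by dividing by the clock ratio. The dictionary keys below are
just shorthand labels for the three table rows.

```python
OLD_CLOCK_MHZ = 202.5
NEW_CLOCK_MHZ = 162.0
scale = NEW_CLOCK_MHZ / OLD_CLOCK_MHZ   # revised clock is 80% of the original

# Revised figures from the table, in millions of polygons per second.
revised = {
    "color + light + texture": 20.0,
    "texture only": 26.4,
    "color only (Gouraud)": 32.0,
}

# Assumption: linear scaling with clock speed.
implied_original = {name: round(rate / scale, 1) for name, rate in revised.items()}

for name, rate in implied_original.items():
    print(f"{name}: {rate}M polygons/sec at {OLD_CLOCK_MHZ} MHz")
```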

As you can see the Gamecube can push
a lot of polygons per second, and it can do 26.4 million textured
polygons per second maximum. Note that increasing the number of local lights
in a scene would cause the polygonal rate to go down.

Flipper (GPU) Instruction Set

This information comes from a Beyond3D message board thread posted by a
Japanese individual. It lists the different instructions made available
by Flipper's Transformation and Lighting (T&L) unit.

Here is some information
about Flipper from Japanese magazine "Nikkei Electronics 2000/10/9"

As you can see, the Gamecube's Flipper
GPU has a very rich instruction set for transformations, texturing, lighting,
and bump mapping, and is very powerful in the number of instructions it
can do in parallel, while being easy to program.

Texture Compression

The Gamecube's GPU can use S3TC compressed textures, which provide a 6:1
compression ratio for 24-bit textures. For 16-bit textures the ratio is
4:1, and for 8-bit textures the ratio is 2:1.

Let us consider how many compressed textures the Gamecube can hold in its
24 MB of main memory if we consider different memory size requirements
for game code/geometry/etc.

Code/Geometry/etc.    Free Texture Space    24-bit Textures (compressed 6:1)

6 MB                  18 MB                 108 MB

8 MB                  16 MB                 96 MB

10 MB                 14 MB                 84 MB

12 MB                 12 MB                 72 MB
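
The right-hand column is simply the free space multiplied by the 6:1
ratio. A short sketch that reproduces the rows (the function name is just
for illustration):

```python
MAIN_MEMORY_MB = 24
COMPRESSION_RATIO = 6   # 24-bit textures with S3TC, as stated above

def texture_capacity(code_mb: int) -> tuple[int, int]:
    """Return (free texture space, equivalent uncompressed 24-bit textures) in MB."""
    free_mb = MAIN_MEMORY_MB - code_mb
    return free_mb, free_mb * COMPRESSION_RATIO

for code_mb in (6, 8, 10, 12):
    free_mb, equivalent_mb = texture_capacity(code_mb)
    print(f"{code_mb:>2} MB code -> {free_mb} MB free -> {equivalent_mb} MB of textures")
```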

As you can see, the Gamecube can store lots of textures in its memory
using S3TC's texture compression format. Note that another benefit of
using compressed textures is that the bandwidth requirements also
decrease by the same ratio as the actual compression. At a ratio of 6:1,
the memory bus can pass 6 times more textures. That means that each
second the GPU's texture cache bus of 10.4 GB/sec can pass compressed
textures equivalent to 62.4 GB of uncompressed 24-bit texture data, and
the external bus of 2.6 GB/sec can pass the equivalent of 15.6 GB!
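
The effective figures are the raw bus bandwidths multiplied by the
compression ratio, as this small sketch shows. It is peak arithmetic only
and assumes every byte on the bus is 6:1 compressed texture data.

```python
COMPRESSION_RATIO = 6   # 24-bit textures with S3TC

def effective_texture_bandwidth(raw_gb_per_s: float) -> float:
    """Uncompressed-equivalent texture bandwidth in GB/sec."""
    return raw_gb_per_s * COMPRESSION_RATIO

texture_cache_effective = effective_texture_bandwidth(10.4)  # texture cache bus
external_bus_effective = effective_texture_bandwidth(2.6)    # main memory bus

print(f"Texture cache bus: {texture_cache_effective:.1f} GB/sec equivalent")
print(f"External bus:      {external_bus_effective:.1f} GB/sec equivalent")
```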

Should a developer
use 16-bit textures over 24-bit textures in order to save space? Let us
compare:

Texture Size 512 x 512    16-bit    24-bit

Uncompressed              524 KB    786 KB

Compressed                131 KB    131 KB

As you can see with the greater ratio
of 6:1 for 24-bit textures, it makes more sense for the developer to use
only 24-bit compressed textures as they are the same size as 16-bit compressed
textures.
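
The sizes can be checked with a few lines of arithmetic. The sketch
assumes the compressed format stores 4 bits per texel, which is what the
6:1, 4:1 and 2:1 ratios quoted earlier imply for 24-bit, 16-bit and 8-bit
sources, and it uses 1 KB = 1000 bytes.

```python
def texture_bytes(width: int, height: int, bits_per_texel: int) -> int:
    """Size in bytes of a texture without mipmaps."""
    return width * height * bits_per_texel // 8

S3TC_BITS_PER_TEXEL = 4   # assumption consistent with the 6:1 / 4:1 / 2:1 ratios

uncompressed_16 = texture_bytes(512, 512, 16)
uncompressed_24 = texture_bytes(512, 512, 24)
compressed = texture_bytes(512, 512, S3TC_BITS_PER_TEXEL)

print(f"16-bit uncompressed: {uncompressed_16 // 1000} KB")
print(f"24-bit uncompressed: {uncompressed_24 // 1000} KB")
print(f"compressed (either): {compressed // 1000} KB")
```

Because the compressed size depends only on the texel count, the source
bit depth makes no difference to the stored size.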

S3TC also allows texture compression of transparencies, which the Vector
Quantization (VQ) texture compression on the Dreamcast could not do. This
will allow the Gamecube to store lots of transparencies in its main
memory.

The Gamecube does hidden surface removal (HSR) by doing an early z-buffer
check that discards hidden pixels as it renders from front to back. The
front-to-back sorting has to be done by the game engine, as written by
the game developer, or the check will not be effective, so the results
will vary from developer to developer. This HSR is not as effective as
PowerVR's infinite planes, as the PowerVR method does not need the
developer to render objects in any particular order. This information on
the Gamecube's HSR was provided by someone who has access to the
Gamecube's developer documentation.
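
To illustrate why draw order matters for an early z check, here is a toy
simulation on a one-dimensional, 8-pixel "screen". It is only a sketch of
the general idea, not of Flipper's actual hardware; the objects and their
depths are made up. A pixel counts as "shaded" only when it passes the
depth test, so fewer shaded pixels means less wasted texturing work.

```python
import math

def count_shaded_pixels(objects, num_pixels):
    """Draw objects in the given order and count pixels that pass the z test."""
    z_buffer = [math.inf] * num_pixels
    shaded = 0
    for depth, first, end in objects:
        for x in range(first, end):
            if depth < z_buffer[x]:   # early z test: nearer than what is stored
                z_buffer[x] = depth
                shaded += 1           # shading work only happens here
    return shaded

# Hypothetical scene: (depth, first_pixel, end_pixel), smaller depth = nearer.
scene = [(30.0, 0, 8), (10.0, 0, 8), (20.0, 2, 6)]

front_to_back = sorted(scene, key=lambda obj: obj[0])
back_to_front = sorted(scene, key=lambda obj: obj[0], reverse=True)

shaded_sorted = count_shaded_pixels(front_to_back, 8)
shaded_reversed = count_shaded_pixels(back_to_front, 8)

print(f"front to back: {shaded_sorted} pixels shaded")
print(f"back to front: {shaded_reversed} pixels shaded")
```

Drawn front to back, only the nearest object is shaded; drawn back to
front, every covered pixel of every object is shaded and then overwritten.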

Virtual Texturing

Virtual texturing is a hardware feature that manages textures by breaking
them up into smaller blocks. This can save quite a lot of bandwidth and
make more efficient use of the texture cache, for example with sky
textures, where in most games only half of the sky can be seen in most
scenes. Keeping the most used texture blocks in the texture cache allows
main memory bandwidth to be used more efficiently. All of this is done
automatically, and does not have to be coded in by the developer.

Hardware Lighting

8 hardware lights are supported. Every polygon in a scene can be affected
by as many as 8 lights. The number of polygons that the Gamecube can do
with 8 lights depends on whether those lights are local or infinite, and
you also have to consider what type of lighting is being used. The number
of polygons with light sources could vary greatly depending on these
variables.

Vertex Compression

The Gamecube supports vertex compression: it allows vertex data to be
represented by bytes (8 bits) or shorts (16 bits) instead of floating
point numbers (32 bits) if the particular game engine can get away with
using less accurate polygon positioning. The transformation unit
automatically unpacks the integer data and converts it to floating point
values before processing it.
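
The idea can be sketched in software as fixed-point quantization. This is
an illustration only: the choice of 8 fractional bits, the big-endian
layout and the function names are assumptions, and on the real hardware
the unpacking is done by the transformation unit rather than by code like
this.

```python
import struct

FRAC_BITS = 8   # hypothetical: 8 fractional bits, range about -128..+128

def pack_vertex(xyz, frac_bits=FRAC_BITS):
    """Store three coordinates as signed 16-bit fixed-point values."""
    fixed = [round(c * (1 << frac_bits)) for c in xyz]
    return struct.pack(">3h", *fixed)   # raises struct.error if out of range

def unpack_vertex(data, frac_bits=FRAC_BITS):
    """Convert the packed shorts back to floating point."""
    return tuple(v / (1 << frac_bits) for v in struct.unpack(">3h", data))

original = (1.5, -2.25, 0.125)
packed = pack_vertex(original)
restored = unpack_vertex(packed)

print(f"float size:  {struct.calcsize('>3f')} bytes")
print(f"packed size: {len(packed)} bytes")
print(f"restored:    {restored}")
```

Shorts halve the size of a position compared with 32-bit floats, at the
cost of positions being rounded to the nearest 1/256 in this example.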

Main Memory

One of the most amazing aspects of the Gamecube is its 24 MBytes of
1T-SRAM. 1T-SRAM was invented by MoSys, Inc., and you can find specific
information on this memory at MoSys's website.

With the Gamecube's 1T-SRAM main memory and its sustained latency of 10
ns or lower, it should be faster than any other affordable memory
technology out there when it comes to repeated non-linear accesses. This
memory will shine with the repeated random accesses that complex game AI
may introduce, more so than with general texture accesses, since textures
are stored in memory linearly. Texture access speed will not suffer
though, since the main memory has a bandwidth of 2.6 GB/sec, and there is
also that 1 MB of on-chip texture cache to help keep the most repeated
textures near the rendering unit.

The Gamecube also has an extra 16
MB of 81 MHz DRAM and this memory would be great for data that does not
need the access speed of the 1T-SRAM main memory like sound, and selection
screens.

Here again the GameCube
uses 1T-SRAM, this time as 24 Mbytes of external memory. Operating at a
324-MHz clock speed, the memory moves data at 2.6 Gbytes/s, with a sustained
latency of 10 ns. But unlike the Playstation 2 or the forthcoming XBox,
GameCube's memory subsystem does not rely on a Rambus or Double-Data Rate
(DDR) interface to boost the bandwidth. Instead, Mosys developed a proprietary
active termination I/O that resides near the pads and eliminates the need
for placing a bank of resistors on the board, saving area and cost.

Texture Cache

Here is an interesting article from AsiaBizTech that provides some
information on the number of simultaneous accesses that can occur with
the texture cache:

Parallel Processing of 32 Access Transactions

The Flipper LSI has two units
of the 1T-SRAM memory integrated, namely, 2.1MB for a frame-buffer and
Z-buffer and 1MB for a texture cache. NEC Corp. manufactures the LSI.

It was necessary to enhance random
access performance of 1T-SRAM applied to a texture cache which will be
frequently accessed, thus making it faster than that used for the frame-buffer
and Z-buffer. To meet this need, the entire bank was divided into 512
pieces. Fu-Chien Hsu, chairman and CEO of MoSys said, "Of those component
banks, 32 banks can be accessed simultaneously." On the other hand,
the frame and Z buffer was designed to have 128 banks, since there was
no strong need to offer high operation with this buffer.

The main memory of the Gamecube
consists of two sets of 96Mbits 1T-SRAM. As it can drive a 64-bit data-bus
at 400MHz*, the machine transfers data at up to 3.2GB* per second, which
is the same rate the PlayStation2 has achieved through the Direct Rambus
interface consisting of two channels.

However, latency on random access
to the main memory is slower than that of the 1T-SRAM being embedded in
the Flipper LSI, which results from a configuration of the main memory
being externally attached. Nonetheless, the memory access is completed
in less than 10ns, sufficiently faster than multi-purpose DRAM.

*Note: Data bus transfer is now 2.6
GB/sec due to revised specs, and the external 64-bit data-bus runs at 324
MHz now.
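
Both the original and the revised bandwidth figures follow from the bus
width and clock. A minimal sketch of the peak arithmetic, assuming one
transfer per clock and 1 GB = 10^9 bytes:

```python
def bus_bandwidth_gb_per_s(bus_width_bits: int, clock_mhz: float) -> float:
    """Peak bandwidth, assuming one transfer of the full bus width per clock."""
    bytes_per_transfer = bus_width_bits / 8
    return bytes_per_transfer * clock_mhz * 1e6 / 1e9

original_bw = bus_bandwidth_gb_per_s(64, 400.0)   # specs quoted in the article
revised_bw = bus_bandwidth_gb_per_s(64, 324.0)    # revised specs

print(f"64-bit @ 400 MHz: {original_bw:.1f} GB/sec")
print(f"64-bit @ 324 MHz: {revised_bw:.3f} GB/sec (quoted as 2.6)")
```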

Both internal memory
buffers have a sustained latency of under 5 nanoseconds. The frame and
z-buffer memory is capable of 7.68 Gbytes/second of bandwidth. The texture
buffer boasts an even faster bandwidth of 10.4 Gbytes/s because it's divided
into 32 independent macros, each 16 bits wide for a total I/O of 512 bits.
This gives each macro its own address bus, so that all 32 macros can be
accessed simultaneously, said Mark-Eric Jones, vice president of marketing
for Mosys.
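
The 10.4 Gbytes/s figure is consistent with the macro layout described
above if the texture buffer runs at Flipper's 162 MHz clock. That clock is
an assumption here, since the quote does not state it. A short sketch of
the arithmetic:

```python
MACROS = 32
BITS_PER_MACRO = 16
CLOCK_HZ = 162_000_000   # assumption: texture buffer clocked at the Flipper clock

total_io_bits = MACROS * BITS_PER_MACRO    # width of all macros together
bytes_per_clock = total_io_bits // 8
bandwidth_gb_per_s = bytes_per_clock * CLOCK_HZ / 1e9

print(f"Total I/O width: {total_io_bits} bits ({bytes_per_clock} bytes per clock)")
print(f"Peak bandwidth:  {bandwidth_gb_per_s:.3f} GB/sec (quoted as 10.4)")
```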