ATI Hollywood very similar to ATI HD 2000 series in many aspects

Hi there: Those who have read the arguments for why the Wii's Hollywood is a GPU natively capable of GPGPU will have no trouble following this. If you haven't, you should first read the topic "Nvidia 7 series not capable of HDR+AA, but ATI Hollywood".

OK, first off, most of us know about the displacement mapping patents that Nintendo filed some years ago. But why can't even an architecture like the ATI X1000 series (e.g. the Radeon X1900) achieve displacement mapping efficiently? The answer can be found here:

" Blogs Graphics Drivers »Graphics Driver Page »Forceware 97.28 Vertex Displacement Mapping or simply Displacement Mapping is a technique allowing to deform a polygonal mesh using a texture (displacement map) in order to add surface detail. The principle is not new, it is even the basis of number of terrain generation algorithms (see the Terrain Generator GLUT project). The new thing is the use of the GPU to achieve in real time the mesh deformation.

Update, November 5, 2006: Displacement mapping requires a graphics controller that can access at least one texture unit inside the vertex shader. Accessing a texture inside the vertex shader is called Vertex Texture Fetching (VTF). Shader Model 3.0 requires at least 4 texture units to be accessible inside the vertex shader. Currently, only graphics controllers based on the nVidia GeForce 6, GeForce 7 and later support Vertex Texture Fetching. ATI graphics controllers do not support Vertex Texture Fetching, even in the latest high-end models such as the X1950XTX (for more explanation see here: ATI X1900XTX and VTF).

"

As we know, the first ATI X1900 cards were launched in January 2006, but the ATI Hollywood was finished after June 2006, when NEC announced its collaboration with Nintendo and MoSys to provide eDRAM for the Wii.

The thing is, if Nintendo was seeking a way to achieve displacement mapping efficiently, that would make the ATI Hollywood comparable to the ATI HD 2000 series, since those were the ATI Radeons that supported vertex texture fetch.

" The new texture processor supports filtering of FP32 textures as well as the vertex texture fetch feature the Radeon X1000 did not support. It also supports 8192x8192 textures and the RGBE 9:9:9:5 texture format to comply with DirectX 10 requirements. Besides everything else, ATI/AMD claims improved anisotropic filtering quality.

"

In a few words, Hollywood may be very similar to the ATI Radeon HD 2000 series. Plus, there is a displacement mapping patent filed by Nintendo that talks about vertex texture fetching and a command processor providing a stream of vertex commands.

" Associated on the side of the shader core diagram are the texture units. ATI has chosen four texture units for R600. Each unit has eight texture addresses per cycle while four of those are used for bilinear and four are used for four unfiltered lookups. The vertex cache can be used for vertex accesses or other structured accesses. It can even be used for displacements, which will probably become more prevalent in DX10 games.

Each unit contains 20 texture samplers, for a total of 80 samplers in R600. These samplers fetch and return the data. According to ATI, it does not matter whether the data is floating point or integer. Each cycle it will return four filtered floating-point values and four unfiltered floating-point (or any other type of) values. The 2400 and 2600 core functionality will remain the same, but they won't be able to return as much because they have fewer units.

Compared to previous generations, the texture caches are a bit more complicated, as they are broken up into several caches. There is a 32K L1 cache unified across all of the SIMD arrays. In comparison, the R500 series only had an 8K cache (per SIMD it is four times larger). It is backed by a 256K L2 cache (the 2600 has a 128K L2 and the 2400 has no L2). The L2 cache accommodates very large data structures such as fat pixels or very large textures. The aim is to reduce texture bandwidth.

In concert with the texture cache subsystem, there is also a vertex cache system. It is called a vertex cache because that is one of its primary uses, but it can be used for unfiltered texture lookups as well. It is quite common to use this cache for displacement mapping, structured lookups into arrays, and render-to-vertex arrays where data is fetched back. For all intents and purposes, it is a structured linear cache working in parallel. What matters is not so much how much data actually goes through any of these caches as the availability of resources when work needs to be done, that is, which resources the cache can be arbitrated for. In the case of the HD 2400, vertices are actually fetched through the texture cache. The hardware treats all of these units as a general resource, and the compiler takes the shader code and converts it accordingly. The key to this whole architecture is how well the compiler can convert code, which will determine how things work and what throughput you actually end up with.

In tasks such as render to texture, it is common to create a texture and then immediately use it. Issues can arise from doing that: the texture needs to finish being drawn before it is used. On older processors (older ATI and current Nvidia parts), the chip would idle until the texture finished rendering before moving on to the next command, which costs performance. ATI has changed this in the 2000 series. As mentioned before, self-checking has been moved down into the hardware, so when textures are rendered there is a coherency check within the chip across the texture units and the raster back ends. The driver no longer has to care; it just sends the commands down to the chip and fills it up. The processor itself handles all of the synchronization between the units.

Stream Out enables something that was introduced in the R500 called render-to-vertex buffer. This can now be done after geometry shader processing by streaming the data directly out of the shader. The shader can write vertex data out and then circulate it back through for tessellation or any other extra processing. It can also be done via thread communication: one thread can write the data out and the next thread can read it back in, do a render-to-vertex buffer, or overflow the GS data. This can only be done if the GPR stack is virtualized.

"

And as we read in Nintendo's displacement mapping patent, the hardware includes a vertex cache.

Another key point is that some cards in the HD 2000 series are so streamlined that they have very few texture units, comparable to the ATI X1600: the HD 2600 has 8 texture units in total, and the HD 2400 only 4.