What defines an Onyx2 as a workstation is a screen, keyboard and mouse. Without video hardware ([[SGI_Onyx2#InfiniteReality|see InfiniteReality below]]), an Onyx2 is an [[Origin 2000]] server. Even the SGI documentation describes the Onyx2 as a workstation, although it can be configured into five-rack "reality monsters". That's some workstation, and a lot of noise!

An Onyx2 system consists of nodes linked together by an interconnection network. It uses the distributed shared memory S2MP (Scalable Shared-Memory Multiprocessing) architecture. The Onyx2 uses [[NUMAlink]] (originally named CrayLink) for its system interconnect. The nodes are connected to router boards, which use NUMAlink cables to connect to other nodes through their routers. The NUMAlink network topology is a bristled fat hypercube. In configurations with more than 64 processors, a hierarchical fat hypercube network topology is used instead. Additional NUMAlink cables, called Xpress links, can be installed between unused Standard Router ports to reduce latency and increase bandwidth. Xpress links can only be used in systems that have 16 or 32 processors, as these are the only configurations with a network topology that leaves router ports unused.
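As a rough illustration of the hypercube idea only, the sketch below numbers routers in binary and treats two routers as directly linked when their numbers differ in exactly one bit. The function names and the numbering scheme are assumptions made for this example; they are not taken from SGI documentation, and the sketch ignores the "bristled" and "fat" aspects as well as Xpress links.

```python
def hypercube_neighbours(router: int, dimension: int) -> list[int]:
    """Routers directly linked to `router` in a plain hypercube.

    Assumption for illustration: routers are numbered 0 .. 2**dimension - 1
    and a link exists between numbers that differ in exactly one bit.
    """
    return [router ^ (1 << bit) for bit in range(dimension)]


def hop_count(src: int, dst: int) -> int:
    """Minimum number of router-to-router hops: the count of differing bits."""
    return bin(src ^ dst).count("1")


if __name__ == "__main__":
    # A 3-dimensional hypercube has 8 routers, each with 3 neighbours.
    print(hypercube_neighbours(0b000, 3))
    print(hop_count(0b000, 0b111))
```

In this simplified model the worst-case hop count equals the dimension of the hypercube, which is why extra links between otherwise unused ports can shorten some paths.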

==Router boards==

An Onyx2 node fits on a single 16" by 11" printed circuit board that contains one or two processors, the main memory, the directory memory and the Hub ASIC. The node board plugs into the backplane through a 300-pad CPOP (Compression Pad-on-Pad) connector. The connector actually combines two connections, one to the NUMAlink router network and another to the XIO I/O subsystem.

See also the [[Onyx2/Origin2000_Node_boards]] topic.

==Processor==

Each processor and its secondary cache are contained on a HIMM (Horizontal Inline Memory Module) daughter card that plugs into the [[Onyx2/Origin2000_Node_boards|node board]]. At the time of introduction, the Onyx2 used the IP27 board, featuring one or two R10000 processors clocked at 180 MHz, each with a 1 MB secondary cache. A high-end model with two 195 MHz R10000 processors with 4 MB secondary caches was also available. In February 1998, the IP31 board was introduced with two 250 MHz R10000 processors with 4 MB secondary caches. Later, the IP31 board was upgraded to support two 300, 350 or 400 MHz R12000 processors. The 300 and 400 MHz models had 8 MB L2 caches, while the 350 MHz model had 4 MB L2 caches. Near the end of its life, a variant of the IP31 board that could use the 500 MHz R14000 with 8 MB L2 caches was made available.

==Main memory and directory memory==

Each [[Onyx2/Origin2000_Node_boards|node board]] can support a maximum of 4 GB of memory through 16 [[DIMM]] slots by using proprietary ECC SDRAM DIMMs with capacities of 16, 32, 64 and 256 MB. Because the memory bus is 144 bits wide (128 bits for data and 16 bits for ECC), memory modules are inserted in pairs. Directory memory, which contains information on the contents of remote caches for maintaining cache coherency, must be used in configurations with more than 32 processors as the Onyx2 uses a distributed shared memory model. The directory memory is contained on proprietary DIMMs that are inserted into eight DIMM slots set aside for its use. In configurations where there are fewer than 32 processors, the directory memory is contained within the main memory.
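The memory figures above can be checked with a little arithmetic. The sketch below uses only the numbers quoted in the paragraph; the assumption that each DIMM of a pair supplies exactly half of the 144-bit bus is an inference for illustration, not something the text states.

```python
# Figures quoted in the paragraph above.
DIMM_SLOTS = 16
DIMM_SIZES_MB = (16, 32, 64, 256)
DATA_BITS = 128
ECC_BITS = 16


def max_memory_mb(slots: int = DIMM_SLOTS, dimm_mb: int = max(DIMM_SIZES_MB)) -> int:
    """Largest main memory per node board: every slot holds the largest DIMM."""
    return slots * dimm_mb


def bus_width_bits() -> int:
    """Total memory bus width: data bits plus ECC bits."""
    return DATA_BITS + ECC_BITS


def bits_per_dimm(dimms_per_access: int = 2) -> int:
    """Assumption: the DIMMs of a pair split the bus evenly."""
    return bus_width_bits() // dimms_per_access


if __name__ == "__main__":
    print(max_memory_mb())   # 4096 MB, i.e. 4 GB
    print(bus_width_bits())  # 144
    print(bits_per_dimm())   # 72
```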

==Hub ASIC==

==I/O subsystem==

The I/O subsystem is based around the Crossbow (Xbow) ASIC, which shares many similarities with the SPIDER ASIC. Since the Xbow ASIC is intended for use with the simpler XIO protocol, its hardware is also simpler, allowing the ASIC to feature eight ports, compared with the SPIDER ASIC's six ports. Two of the ports connect to the [[Onyx2/Origin2000_Node_boards|node boards]], and the remaining six to XIO cards. While the I/O subsystem's native bus is XIO, PCI-X and VME64 buses can also be used, provided by XIO bridges.

An IO6 base I/O board is present in every system. It is an XIO card that provides:

The implementation is partitioned into '''Geometry''' (also known as the '''Geometry Engine'''), '''Raster Memory''' (also known as the '''Raster Manager''') and '''Display Generator''' boards, with each board corresponding to one of the three major stages in the architecture's pipeline. The board set partitioning scheme is the same as that of the RealityEngine, because Silicon Graphics wanted the RealityEngine to be easily upgradable to the InfiniteReality. Each pipeline consists of one Geometry Engine board, one, two or four Raster Manager boards and one Display Generator board.<ref name="Paper">John S. Montrym et al. "InfiniteReality: A Real-Time Graphics System". ACM SIGGRAPH.</ref>

The implementation comprises twelve application-specific integrated circuit (ASIC) designs fabricated in 0.5 and 0.35 micrometre processes with three layers of metal interconnect.<ref name="Paper"/> These ASICs require a 3.3 V power supply. An InfiniteReality pipeline in a maximal configuration contains 251 million transistors. The InfiniteReality was developed by 55 engineers.<ref name="HC"> John Montrym, Brian McClendon. "InfiniteReality Graphics - Power Through Complexity". Advanced Systems Division, Silicon Graphics, Inc.</ref>

Given a system capable enough, such as certain models of the Onyx2 and Onyx 3000, up to 16 InfiniteReality pipelines can be hosted. The pipelines can be operated in three modes: multi-seat, multi-display and multi-pipe. In multi-seat mode, each pipeline can serve up to eight simultaneous users, each with their own separate displays, keyboards and mice. In multi-display mode, multiple outputs drive multiple displays, which is useful for virtual reality. The multi-pipe mode has two methods of operation. The first method requires a digital multiplexer (DPLEX) daughterboard to be installed in every pipeline, which combines the output of multiple pipelines. The second method uses '''MonsterMode''' software to distribute the data used to render a frame to multiple pipelines.

=== Geometry board ===

The Geometry board is responsible for geometry and image processing and is divided into four stages, each implemented by one or more separate devices. The first stage is the '''Host Interface'''. The InfiniteReality was designed for two very different platforms: the traditional shared memory, bus-based Onyx using the POWERpath-2 bus, and the distributed shared memory, network-based Onyx2 using the [[NUMAlink|NUMAlink2]] interconnect. It therefore needed an interface that could provide similar performance on both platforms despite a large difference in incoming bandwidth (200 MB/s versus 400 MB/s respectively).<ref name="Paper"/>

To this end, a '''Host Interface Processor''', an embedded [[RISC]] core, is used to fetch display list objects using [[direct memory access]] (DMA). The Host Interface Processor is accompanied by 16 MB of [[SDRAM|synchronous dynamic random access memory]] (SDRAM), of which 15 MB is used to cache display leaf objects. The cache can deliver data to the next stage at over 300 MB/s. The next stage is the '''Geometry Distributor''', which transfers data and instructions from the Host Interface Processor to individual Geometry Engines.

The third stage performs geometry and image processing. The '''Geometry Engine''' is used for this purpose, with each Geometry board containing up to four working in a [[MIMD|multiple instruction multiple data]] (MIMD) fashion. The Geometry Engine is a semi-custom ASIC with a single instruction multiple data (SIMD) pipeline containing three floating-point cores, each containing an arithmetic logic unit (ALU), a multiplier and a 32-bit by 32-entry [[register file]] with two read and two write ports. These cores are provided with a 32-bit by 2,560-entry memory that holds elements of OpenGL state and provides scratchpad storage. Each core also has a '''float-to-fix converter''' to convert floating-point values into integer form. The Geometry Engine is capable of completing three instructions per cycle, and each Geometry board, with four such devices, can complete 12 instructions per cycle. The Geometry Engine uses a 195-bit microinstruction, which is compressed to reduce size and bandwidth usage at the cost of slightly lower performance.

The Geometry Engine processor operates at 90 MHz, achieving a maximum theoretical performance of 540 MFLOPS.<ref name="HC"/> As there are four such processors on a GE12-4 or GE14-4 board, the maximum theoretical performance is 2.16 GFLOPS. A 16-pipeline system therefore achieves a maximum theoretical performance of 34.56 GFLOPS.
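The peak figures above follow from simple multiplication. The sketch below reproduces them under one assumption that the text does not state explicitly: each of the three floating-point cores completes two floating-point operations per cycle (one in the ALU and one in the multiplier). The constant and function names are illustrative.

```python
CLOCK_HZ = 90_000_000             # Geometry Engine clock, from the text
FP_CORES = 3                      # floating-point cores per Geometry Engine
FLOPS_PER_CORE_PER_CYCLE = 2      # assumption: ALU + multiplier each finish one op
ENGINES_PER_BOARD = 4             # GE12-4 / GE14-4 boards
MAX_PIPELINES = 16


def peak_mflops(engines: int = 1) -> int:
    """Theoretical peak in MFLOPS for the given number of Geometry Engines."""
    return CLOCK_HZ * FP_CORES * FLOPS_PER_CORE_PER_CYCLE * engines // 1_000_000


if __name__ == "__main__":
    print(peak_mflops())                                   # one engine
    print(peak_mflops(ENGINES_PER_BOARD))                  # one board
    print(peak_mflops(ENGINES_PER_BOARD * MAX_PIPELINES))  # 16 pipelines
```

With these inputs the three results are 540, 2,160 and 34,560 MFLOPS, matching the 540 MFLOPS, 2.16 GFLOPS and 34.56 GFLOPS quoted above.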

The fourth stage is the '''Geometry-Raster FIFO''', a [[FIFO|first in first out]] (FIFO) buffer that merges the outputs of the four Geometry Engines into one, reassembling the outputs in the order they were issued. The FIFO is built from SDRAM and has a capacity of 4 MB,<ref>Mark J. Kilgard. "Realizing OpenGL: Two Implementations of One Architecture". 1997 SIGGRAPH Eurographics Workshop, August 1997.</ref> large enough to store 65,536 vertexes. The transformed vertexes are moved from this FIFO to the Raster Manager boards for triangle reassembly and setup by the Triangle Bus (also known as the Vertex Bus), which has a bandwidth of 400 MB/s.
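The reordering behaviour described above can be pictured with a small software analogy. The sketch assumes each unit of work carries a sequence number assigned when it is issued, which is a hypothetical mechanism chosen for illustration; the text does not say how the hardware tracks issue order. It also derives the per-vertex storage implied by the quoted capacity.

```python
import heapq
from typing import Iterable, Iterator, Tuple

FIFO_BYTES = 4 * 1024 * 1024
FIFO_VERTEXES = 65_536
BYTES_PER_VERTEX = FIFO_BYTES // FIFO_VERTEXES  # storage per vertex implied by the text


def merge_in_issue_order(completions: Iterable[Tuple[int, str]]) -> Iterator[str]:
    """Yield results in issue order, given (sequence_number, result) pairs
    that arrive in completion order from several parallel engines.

    Assumption: sequence numbers start at 0, are unique and have no gaps.
    """
    pending: list = []
    next_seq = 0
    for item in completions:
        heapq.heappush(pending, item)
        # Release every result that is now contiguous with what was emitted.
        while pending and pending[0][0] == next_seq:
            yield heapq.heappop(pending)[1]
            next_seq += 1


if __name__ == "__main__":
    out_of_order = [(1, "b"), (0, "a"), (3, "d"), (2, "c")]
    print(list(merge_in_issue_order(out_of_order)))
    print(BYTES_PER_VERTEX)
```

A result that finishes early simply waits in the buffer until everything issued before it has been released, which is why a sizeable FIFO helps keep the four engines busy.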

=== Raster Memory board ===

The function of the Raster Memory board is to perform rasterization. It also contains the [[texture memory]] and [[framebuffer|raster memory]], which is more commonly known as the [[framebuffer]]. Rasterization is performed in the '''Fragment Generator''' and the eighty '''Image Engines'''. The Fragment Generator comprises four ASIC designs: the '''Scan Converter''' (SC) ASIC, the '''Texel Address Calculator''' (TA) ASIC, the '''Texture Memory Controller''' (TM) ASIC and the '''Texture Fragment''' (TF) ASIC.<ref name="Paper"/>

The SC ASIC and the TA ASIC perform scan conversion, color and depth interpolation, perspective-correct texture coordinate interpolation and level of detail computation on incoming data, and the results are passed to the eight TM ASICs, which are specialized memory controllers optimized for texel access. Each TM ASIC controls four SDRAMs that make up one-eighth of the texture memory. The SDRAMs used are 16 bits wide and have separate address and data buses. SDRAMs with a capacity of 4 Mb are used by Raster Manager boards with 16 MB of texture memory, while 16 Mb SDRAMs are used by Raster Manager boards with 64 MB of texture memory.<ref name="HC"/> The TM ASICs perform texel lookups in their SDRAMs according to the texel addresses issued by the TA ASIC. Texels from the TM ASICs are forwarded to the appropriate TF ASIC, where texture filtering, texture environment combination with interpolated color and fog application are performed. As each SDRAM holds part of the texture memory, all of the 32 SDRAMs must be connected to all of the 80 Image Engines. To achieve this, the TM and TF ASICs implement a two-rank omega network, which reduces the number of individual paths required for the 32 to 80 sort while maintaining the same functionality.
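The texture memory sizes quoted above are consistent with the SDRAM counts, as the following check shows. It uses only figures from the paragraph; the function and constant names are illustrative.

```python
TM_ASICS = 8        # Texture Memory Controller ASICs per Raster Manager
SDRAMS_PER_TM = 4   # SDRAMs controlled by each TM ASIC
BITS_PER_BYTE = 8


def sdram_count() -> int:
    """Total texture SDRAMs on one Raster Manager board."""
    return TM_ASICS * SDRAMS_PER_TM


def texture_memory_mb(sdram_megabits: int) -> int:
    """Texture memory in megabytes for a given per-chip capacity in megabits."""
    return sdram_count() * sdram_megabits // BITS_PER_BYTE


if __name__ == "__main__":
    print(sdram_count())          # 32
    print(texture_memory_mb(4))   # 16
    print(texture_memory_mb(16))  # 64
```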

The eighty Image Engines have multiple functions. Firstly, each Image Engine controls a portion of the raster memory, which in the case of the InfiniteReality, is a 1 MB SGRAM organized as 262,144 by 32-bit words.<ref name="Paper"/><ref name="HC"/> Secondly, the following OpenGL per-fragment operations are performed by the Image Engines: pixel ownership test, stencil test, depth buffer test, blending, dithering and logical operation. Lastly, the Image Engines perform anti-aliasing and [[accumulation buffer]] operations. To deliver pixel data for display, each Image Engine has a 2-bit serial bus to the Display Generator board. If one Raster Manager board is present in the pipeline, the Image Engine uses the entire width of the bus, whereas if two or more Raster Manager boards are present, the Image Engine uses half the bus.<ref name="Paper"/> Each serial bus is actually a part of the Video Bus, which has a bandwidth of 1.2 GB/s. Four Image Engine "cores" are contained on an Image Engine ASIC, which contains nearly 488,000 logic gates, comprising 1.95 million transistors, on a 42 mm<sup>2</sup> (6.5 by 6.5 mm) die that was fabricated in a 0.35 micrometre process by VLSI Technology.

The InfiniteReality uses the '''RM6-16''' or '''RM6-64''' Raster Managers. Each pipeline is capable of display resolutions of 2.62, 5.24 or 10.48 million pixels, provided that one, two or four Raster Manager boards respectively are present.<ref name="Report">Onyx2 Reality, Onyx2 InfiniteReality and Onyx2 InfiniteReality2 Technical Report, August 1998. Silicon Graphics, Inc.</ref> The raster memory can be configured to use 256, 512 or 1024 bits per pixel. 320 MB supports a resolution of 2560 by 2048 pixels with each pixel containing 512 bits of information.<ref name="HC"/> In a configuration with four Raster Managers, the texture memory has a bandwidth of 15.36 GB/s, and the raster memory has a bandwidth of 72.8 GB/s.
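The resolution figures above can be derived from the raster memory size. The sketch below takes the 80 Image Engines with 1 MB each per Raster Manager from the preceding paragraph and assumes, for illustration, that 1 MB means 2^20 bytes and that all raster memory is available for pixel storage.

```python
IMAGE_ENGINES_PER_RM = 80
MB_PER_IMAGE_ENGINE = 1
BYTES_PER_MB = 1024 * 1024  # assumption: binary megabytes


def raster_memory_mb(raster_managers: int) -> int:
    """Raster memory in MB for a pipeline with the given number of boards."""
    return raster_managers * IMAGE_ENGINES_PER_RM * MB_PER_IMAGE_ENGINE


def max_pixels(raster_managers: int, bits_per_pixel: int) -> int:
    """Pixels that fit in raster memory at the given depth."""
    total_bits = raster_memory_mb(raster_managers) * BYTES_PER_MB * 8
    return total_bits // bits_per_pixel


if __name__ == "__main__":
    for boards in (1, 2, 4):
        print(boards, raster_memory_mb(boards), max_pixels(boards, 256))
    # Four boards at 512 bits per pixel hold exactly 2560 x 2048 pixels.
    print(max_pixels(4, 512) == 2560 * 2048)
```

At 256 bits per pixel this gives 2,621,440, 5,242,880 and 10,485,760 pixels for one, two and four boards, which matches the 2.62, 5.24 and 10.48 million pixel figures.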

=== Display Generator board ===

The '''DG5-2''' Display Generator board contains hardware to drive up to two video outputs, which may be expanded to eight video outputs with an optional daughterboard, a configuration known as the '''DG5-8'''. The outputs are independent and each output has hardware for generating video timing, video resizing, gamma correction and digital-to-analog conversion. Digital-to-analog conversion is provided by 8-bit digital-to-analog converters that support a pixel clock frequency up to 220 MHz.

Data for the video outputs are provided by four ASICs that de-serialize and de-interleave the 160-bit streams into 10-bit component [[RGBA]], 12-bit component RGBA, L16, Stereo Field Sequential (FS) or color indexes. The hardware also incorporates the cursor at this stage. A total of 32,768 color index map entries are available.

=== Capabilities and performance ===

== InfiniteReality3 ==

InfiniteReality3 was introduced in 2000 along with the Onyx 3000 to supersede the InfiniteReality2. It was used in the [[SGI Onyx2|Onyx2]] and Onyx 3000 visualization systems. The only improvement over the previous implementation was the replacement of the RM9-64 Raster Manager with the '''RM10-256''' Raster Manager, which has 256 MB of texture memory, four times that of the previous Raster Manager. When maximally configured with four Raster Managers, the InfiniteReality3 pipeline provides 320 MB of raster memory.

Introduction

What defines an Onyx2 as a workstation is a screen, keyboard and mouse. Without video hardware (see InfiniteReality below), an Onyx2 is an Origin 2000 server. Even the SGI documentation describes the Onyx2 as a workstation, although it can be configured into five-rack "reality monsters". That's some workstation, and a lot of noise!

Architecture

An Onyx2 system consists of nodes linked together by an interconnection network. It uses the distributed shared memory S2MP (Scalable Shared-Memory Multiprocessing) architecture. The Onyx2 uses NUMAlink (originally named CrayLink) for its system interconnect. The nodes are connected to router boards, which use NUMAlink cables to connect to other nodes through their routers. The NUMAlink network topology is a bristled fat hypercube. In configurations with more than 64 processors, a hierarchical fat hypercube network topology is used instead. Additional NUMAlink cables, called Xpress links, can be installed between unused Standard Router ports to reduce latency and increase bandwidth. Xpress links can only be used in systems that have 16 or 32 processors, as these are the only configurations with a network topology that leaves router ports unused.

Router boards

There are four different router boards used by the Onyx2. Each successive router board allows a larger number of nodes to be connected.

Null Router

The Null Router connects two nodes in the same module. A system using the Null Router cannot be expanded as there are no external connectors.

Star Router

The Star Router can connect up to four nodes. It is always used in conjunction with a Standard Router to function correctly.

Standard Router (Rack Router)

The Standard Router can connect up to 32 nodes. It contains the SPIDER ASIC, which serves as a router for the NUMAlink network. The SPIDER ASIC has six ports, each with a pair of unidirectional links, connected to a crossbar which enables the ports to communicate with each other.

Meta Router (Cray Router)

The Meta Router is used in conjunction with Standard Routers to connect more than 32 nodes. It can connect up to 64 nodes.

Onyx2 nodes

An Onyx2 node fits on a single 16" by 11" printed circuit board that contains one or two processors, the main memory, the directory memory and the Hub ASIC. The node board plugs into the backplane through a 300-pad CPOP (Compression Pad-on-Pad) connector. The connector actually combines two connections, one to the NUMAlink router network and another to the XIO I/O subsystem.

Processor

Each processor and its secondary cache are contained on a HIMM (Horizontal Inline Memory Module) daughter card that plugs into the node board. At the time of introduction, the Onyx2 used the IP27 board, featuring one or two R10000 processors clocked at 180 MHz, each with a 1 MB secondary cache. A high-end model with two 195 MHz R10000 processors with 4 MB secondary caches was also available. In February 1998, the IP31 board was introduced with two 250 MHz R10000 processors with 4 MB secondary caches. Later, the IP31 board was upgraded to support two 300, 350 or 400 MHz R12000 processors. The 300 and 400 MHz models had 8 MB L2 caches, while the 350 MHz model had 4 MB L2 caches. Near the end of its life, a variant of the IP31 board that could use the 500 MHz R14000 with 8 MB L2 caches was made available.

Main memory and directory memory

Each node board can support a maximum of 4 GB of memory through 16 DIMM slots by using proprietary ECC SDRAM DIMMs with capacities of 16, 32, 64 and 256 MB. Because the memory bus is 144 bits wide (128 bits for data and 16 bits for ECC), memory modules are inserted in pairs. Directory memory, which contains information on the contents of remote caches for maintaining cache coherency, must be used in configurations with more than 32 processors as the Onyx2 uses a distributed shared memory model. The directory memory is contained on proprietary DIMMs that are inserted into eight DIMM slots set aside for its use. In configurations with 32 or fewer processors, the directory memory is contained within the main memory.
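The capacity figures above can be checked with a short calculation. The following is a minimal sketch in Python, not an SGI tool: the function name `node_memory_mb` and its validation rules are illustrative assumptions based only on the figures in this section (16 slots, DIMMs installed in pairs, sizes of 16, 32, 64 and 256 MB).

```python
# Illustrative sketch: names and rules are assumptions drawn from the text above.
DIMM_SLOTS = 16                    # main-memory DIMM slots per node board
DIMM_SIZES_MB = (16, 32, 64, 256)  # DIMM capacities listed in the text


def node_memory_mb(dimms):
    """Return the total main memory in MB for the DIMMs fitted to one node board."""
    if len(dimms) > DIMM_SLOTS:
        raise ValueError("a node board has only 16 main-memory DIMM slots")
    if len(dimms) % 2 != 0:
        raise ValueError("memory modules are inserted in pairs")
    for size in dimms:
        if size not in DIMM_SIZES_MB:
            raise ValueError(f"unsupported DIMM size: {size} MB")
    return sum(dimms)


if __name__ == "__main__":
    # Sixteen 256 MB DIMMs give the 4 GB maximum quoted above.
    print(node_memory_mb([256] * 16), "MB")
```

Filling every slot with the largest DIMM reproduces the 4 GB per-node maximum, and an odd number of modules is rejected because of the paired 144-bit bus.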

Hub ASIC

The Hub ASIC interfaces the processors, memory and XIO to the NUMAlink 2 system interconnect. The ASIC contains five major sections: the crossbar (referred to as the "XB"), the I/O interface (referred to as the "II"), the network interface (referred to as the "NI"), the processor interface (referred to as the "PI") and the memory and directory interface (referred to as the "DM"), which also serves as the memory controller. The interfaces communicate with each other via FIFO buffers that are connected to the crossbar. When two processors are connected to the Hub ASIC, the node does not behave in an SMP fashion. Instead, the two processors operate separately and their buses are multiplexed over the single processor interface. This was done to save pins on the Hub ASIC. The Hub ASIC is clocked at 100 MHz and contains 900,000 gates fabricated in a five-layer metal process.

I/O subsystem

The I/O subsystem is based around the Crossbow (Xbow) ASIC, which shares many similarities with the SPIDER ASIC. Since the Xbow ASIC is intended for use with the simpler XIO protocol, its hardware is also simpler, allowing the ASIC to feature eight ports, compared with the SPIDER ASIC's six ports. Two of the ports connect to the node boards, and the remaining six to XIO cards. While the I/O subsystem's native bus is XIO, PCI-X and VME64 buses can also be used, provided by XIO bridges.

An IO6 base I/O board is present in every system. It is an XIO card that provides:

1 10/100BASE-TX port

2 Serial ports provided by dual UARTs

1 internal Fast 20 UltraSCSI single-ended port

1 external wide UltraSCSI single-ended port

1 real-time interrupt output for frame sync

1 real-time interrupt input (edge triggered)

Flash PROM, NVRAM and real time clock

InfiniteReality

The difference between an SGI Origin 2000 and an Onyx2 is the InfiniteReality. In fact, the Onyx2 rack system pictured top right was built from two Onyx2 racks, with the InfiniteReality taken out of the second rack; in its place, as the top compute module, is an Origin 2000 deskside with the plastics removed. The InfiniteReality was introduced in early 1996. It succeeded the RealityEngine, although the RealityEngine coexisted with the InfiniteReality for some time for the Onyx as an entry-level option for deskside "workstation" configurations.

The InfiniteReality architecture was a third-generation design and is categorized as a sort-middle architecture. It was designed to render complex scenes in high quality at 60 frames per second, roughly two to four times the performance of the RealityEngine it replaced. It was designed explicitly for use in conjunction with the OpenGL graphics library and implements most of the OpenGL pipeline in hardware.

The implementation is partitioned into Geometry (also known as the Geometry Engine), Raster Memory (also known as the Raster Manager) and Display Generator boards, with each board corresponding to one of the three major stages in the architecture's pipeline. The board set partitioning scheme is the same as that of the RealityEngine, as a result of Silicon Graphics wanting the RealityEngine to be easily upgradable to the InfiniteReality. Each pipeline consists of one Geometry Engine board, one, two or four Raster Manager boards and one Display Generator board.[1]

The implementation comprises twelve application-specific integrated circuit (ASIC) designs fabricated in 0.5 and 0.35 micrometre processes with three layers of metal interconnect.[1] These ASICs require a 3.3 V power supply. An InfiniteReality pipeline in a maximal configuration contains 251 million transistors. The InfiniteReality was developed by 55 engineers.[2]

Given a system capable enough, such as certain models of the Onyx2 and Onyx 3000, up to 16 InfiniteReality pipelines can be hosted. The pipelines can be operated in three modes: multi-seat, multi-display and multi-pipe. In multi-seat mode, each pipeline can serve up to eight simultaneous users, each with their own separate displays, keyboards and mice. In multi-display mode, multiple outputs drive multiple displays, which is useful for virtual reality. The multi-pipe mode has two methods of operation. The first method requires a digital multiplexer (DPLEX) daughterboard to be installed in every pipeline, which combines the output of multiple pipelines. The second method uses MonsterMode software to distribute the data used to render a frame to multiple pipelines.

To interface the pipeline to the system, a Flat Cable Interface (FCI) cable is used to connect the Host Interface Processor ASIC on the Geometry Board to the Ibus on the IO4 board, a part of the host system.

Geometry board

The Geometry board is responsible for geometry and image processing and is divided into four stages, each stage being implemented by separate device(s). The first stage is the Host Interface. Due to the InfiniteReality being designed for two very different platforms, the traditional shared memory bus-based Onyx using the POWERpath-2 bus, and the distributed shared memory network-based Onyx2 using the NUMAlink2 interconnect, the InfiniteReality had to have an interface that could provide similar performance on both platforms, which had a large difference in incoming bandwidth (200 MB/s versus 400 MB/s respectively).[1]

To this end, a Host Interface Processor, an embedded RISC core, is used to fetch display list objects using direct memory access (DMA). The Host Interface Processor is accompanied by 16 MB of synchronous dynamic random access memory (SDRAM), of which 15 MB is used to cache display leaf objects. The cache can deliver data to the next stage at over 300 MB/s. The next stage is the Geometry Distributor, which transfers data and instructions from the Host Interface Processor to individual Geometry Engines.

The next stage performs geometry and image processing. The Geometry Engine is used for this purpose, with each Geometry board containing up to four working in a multiple instruction multiple data (MIMD) fashion. The Geometry Engine is a semi-custom ASIC with a single instruction multiple data (SIMD) pipeline containing three floating-point cores, each containing an arithmetic logic unit (ALU), a multiplier and a 32-bit by 32-entry register file with two read and two write ports. These cores are provided with a 32-bit by 2,560-entry memory that holds elements of OpenGL State and provides Scratchpad RAM storage. Each core also has a float-to-fix converter to convert floating-point values into integer form. The Geometry Engine is capable of completing three instructions per cycle, and each Geometry board, with four such devices, can complete 12 instructions per cycle. The Geometry Engine uses a 195-bit microinstruction, which is compressed in order to reduce size and bandwidth usage in return for slightly less performance.

The Geometry Engine processor operates at 90 MHz, achieving a maximum theoretical performance of 540 MFLOPS.[2] As there are four such processors on a GE12-4 or GE14-4 board, the maximum theoretical performance is 2.16 GFLOPS. A 16-pipeline system therefore achieves a maximum theoretical performance of 34.56 GFLOPS.
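The arithmetic behind these peak figures can be reproduced in a few lines. The sketch below is a back-of-the-envelope calculation in Python. It assumes that each of the three floating-point cores retires two floating-point operations per cycle (one from the ALU and one from the multiplier), which is an inference that happens to match the quoted 540 MFLOPS rather than a figure stated in the sources. The function names are illustrative.

```python
# Back-of-the-envelope peak throughput; the per-core figure is an assumption.
CLOCK_MHZ = 90                 # Geometry Engine clock from the text
CORES_PER_ENGINE = 3           # floating-point cores per Geometry Engine
FLOPS_PER_CORE_PER_CYCLE = 2   # assumed: one ALU op plus one multiplier op


def engine_mflops():
    """Peak MFLOPS of a single Geometry Engine."""
    return CLOCK_MHZ * CORES_PER_ENGINE * FLOPS_PER_CORE_PER_CYCLE


def board_mflops(engines=4):
    """Peak MFLOPS of one Geometry board with the given number of engines."""
    return engine_mflops() * engines


def system_mflops(pipelines=16, engines=4):
    """Peak MFLOPS of a system with one Geometry board per pipeline."""
    return board_mflops(engines) * pipelines


if __name__ == "__main__":
    print(engine_mflops(), "MFLOPS per Geometry Engine")
    print(board_mflops() / 1000, "GFLOPS per four-engine board")
    print(system_mflops() / 1000, "GFLOPS for 16 pipelines")
```

These are theoretical peaks only; sustained throughput depends on the instruction mix and on the microinstruction compression mentioned above.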

The fourth stage is the Geometry-Raster FIFO, a first in first out (FIFO) buffer that merges the outputs of the four Geometry Engines into one, reassembling the outputs in the order they were issued. The FIFO is built from SDRAM and has a capacity of 4 MB,[3] large enough to store 65,536 vertexes. The transformed vertexes are moved from this FIFO to the Raster Manager boards for triangle reassembly and setup by the Triangle Bus (also known as the Vertex Bus), which has a bandwidth of 400 MB/s.
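The FIFO capacity and vertex count together imply a per-vertex storage size. The short Python sketch below works that out, assuming that 1 MB means 2^20 bytes and that the whole FIFO is available for vertex data. Both are assumptions, and the function name is illustrative.

```python
# Implied per-vertex storage in the Geometry-Raster FIFO (assumptions noted above).
FIFO_BYTES = 4 * 1024 * 1024   # 4 MB FIFO, taking 1 MB as 2**20 bytes
FIFO_VERTEXES = 65_536         # vertex capacity quoted in the text


def bytes_per_vertex(fifo_bytes=FIFO_BYTES, vertexes=FIFO_VERTEXES):
    """Return how many bytes of FIFO storage each vertex can occupy."""
    return fifo_bytes // vertexes


if __name__ == "__main__":
    print(bytes_per_vertex(), "bytes per vertex")
```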

Raster Memory board

The function of the Raster Memory board is to perform rasterization. It also contains the texture memory and raster memory, which is more commonly known as the framebuffer. Rasterization is performed in the Fragment Generator and the eighty Image Engines. The Fragment Generator comprises four ASIC designs: the Scan Converter (SC) ASIC, the Texel Address Calculator (TA) ASIC, the Texture Memory Controller (TM) ASIC and the Texture Fragment (TF) ASIC.[1]

The SC ASIC and the TA ASIC perform scan conversion, color and depth interpolation, perspective correct texture coordinate interpolation and level of detail computation on incoming data, and the results are passed to the eight TM ASICs, which are specialized memory controllers optimized for texel access. Each TM ASIC controls four SDRAMs that make up one-eighth of the texture memory. The SDRAMs used are 16 bits wide and have separate address and data buses. SDRAMs with a capacity of 4 Mb are used by Raster Manager boards with 16 MB of texture memory while 16 Mb SDRAMs are used by Raster Manager boards with 64 MB of texture memory.[2] The TM ASICs perform texel lookups in their SDRAMs according to the texel addresses issued by the TA ASIC. Texels from the TM ASICs are forwarded to the appropriate TF ASIC, where texture filtering, texture environment combination with interpolated color and fog application are performed. As each SDRAM holds part of the texture memory, all of the 32 SDRAMs must be connected to all of the 80 Image Engines. To achieve this, the TM and TF ASICs implement a two-rank omega network, which reduces the number of individual paths required for the 32 to 80 sort while maintaining the same functionality.
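The relationship between SDRAM density and texture memory size follows directly from the counts above (eight TM ASICs, four SDRAMs each). A minimal Python sketch, with an illustrative function name, shows the conversion from megabits per chip to megabytes per board:

```python
# Texture memory per Raster Manager board, derived from the chip counts in the text.
TM_ASICS = 8        # Texture Memory Controller ASICs per board
SDRAMS_PER_TM = 4   # SDRAM devices controlled by each TM ASIC


def texture_memory_mb(sdram_megabits):
    """Return texture memory in MB for a given per-chip SDRAM density in Mb."""
    total_megabits = TM_ASICS * SDRAMS_PER_TM * sdram_megabits
    return total_megabits // 8  # eight bits per byte


if __name__ == "__main__":
    for density in (4, 16):
        print(f"{density} Mb SDRAMs -> {texture_memory_mb(density)} MB")
```

With 32 devices per board, 4 Mb parts give 16 MB and 16 Mb parts give 64 MB, matching the two board variants described.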

The eighty Image Engines have multiple functions. Firstly, each Image Engine controls a portion of the raster memory, which in the case of the InfiniteReality, is a 1 MB SGRAM organized as 262,144 by 32-bit words.[1][2] Secondly, the following OpenGL per-fragment operations are performed by the Image Engines: pixel ownership test, stencil test, depth buffer test, blending, dithering and logical operation. Lastly, the Image Engines perform anti-aliasing and accumulation buffer operations. To deliver pixel data for display, each Image Engine has a 2-bit serial bus to the Display Generator board. If one Raster Manager board is present in the pipeline, the Image Engine uses the entire width of the bus, whereas if two or more Raster Manager boards are present, the Image Engine uses half the bus.[1] Each serial bus is actually a part of the Video Bus, which has a bandwidth of 1.2 GB/s. Four Image Engine "cores" are contained on an Image Engine ASIC, which contains nearly 488,000 logic gates, comprising 1.95 million transistors, on a 42 mm² (6.5 by 6.5 mm) die that was fabricated in a 0.35 micrometre process by VLSI Technology.

The InfiniteReality uses the RM6-16 or RM6-64 Raster Managers. Each pipeline is capable of display resolutions of 2.62, 5.24 or 10.48 million pixels, provided that one, two or four Raster Manager boards respectively are present.[4] The raster memory can be configured to use 256, 512 or 1024 bits per pixel. 320 MB supports a resolution of 2560 by 2048 pixels with each pixel containing 512 bits of information.[2] In a configuration with four Raster Managers, the texture memory has a bandwidth of 15.36 GB/s, and the raster memory has a bandwidth of 72.8 GB/s.
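The resolution figures can be cross-checked against the raster memory size. The Python sketch below assumes 80 MB of raster memory per Raster Manager board (eighty Image Engines with 1 MB each, as described earlier) and takes 1 MB as 2^20 bytes. It also infers that the quoted 2.62, 5.24 and 10.48 million pixel limits correspond to the 256 bits-per-pixel setting, which is an assumption. The function names are illustrative.

```python
# Raster memory versus pixel capacity; per-board size is derived from the text.
IMAGE_ENGINES_PER_BOARD = 80
BYTES_PER_IMAGE_ENGINE = 1024 * 1024   # 1 MB SGRAM per Image Engine


def raster_memory_bytes(boards):
    """Total raster memory in bytes for a pipeline with the given board count."""
    return boards * IMAGE_ENGINES_PER_BOARD * BYTES_PER_IMAGE_ENGINE


def max_pixels(boards, bits_per_pixel):
    """Number of pixels that fit in raster memory at the given pixel depth."""
    bytes_per_pixel = bits_per_pixel // 8
    return raster_memory_bytes(boards) // bytes_per_pixel


if __name__ == "__main__":
    for boards in (1, 2, 4):
        print(boards, "board(s):", max_pixels(boards, 256), "pixels at 256 bpp")
    # Four boards (320 MB) at 512 bits per pixel hold exactly 2560 x 2048 pixels.
    print(max_pixels(4, 512) == 2560 * 2048)
```

Under these assumptions, one, two and four boards hold 2,621,440, 5,242,880 and 10,485,760 pixels at 256 bits per pixel, and 320 MB at 512 bits per pixel fits a 2560 by 2048 display exactly.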

Display Generator board

The DG5-2 Display Generator board contains hardware to drive up to two video outputs, which may be expanded to eight video outputs with an optional daughterboard, a configuration known as the DG5-8. The outputs are independent and each output has hardware for generating video timing, video resizing, gamma correction and digital-to-analog conversion. Digital-to-analog conversion is provided by 8-bit digital-to-analog converters that support a pixel clock frequency up to 220 MHz.

Data for the video outputs are provided by four ASICs that de-serialize and de-interleave the 160-bit streams into 10-bit component RGBA, 12-bit component RGBA, L16, Stereo Field Sequential (FS) or color indexes. The hardware also incorporates the cursor at this stage. 32,768 color index map entries are available.

750 million trilinear mip-mapped, textured, 16-bit texel, four by four sub-sample anti-aliased, depth buffered pixels per second

710+ million textured and anti-aliased pixels per second

300 million displayed pixels per second, distributed over one to eight outputs

InfiniteReality2

InfiniteReality2 is the name by which hinv (an IRIX utility that lists the hardware present in a system) refers to an InfiniteReality that is used in the Onyx2. The InfiniteReality2, however, was still marketed as the InfiniteReality. It was the second implementation of the InfiniteReality architecture, and was introduced in late 1996. It is identical to the InfiniteReality architecturally, but differs mechanically as the Onyx2's Origin 2000-based card cage is different from the Onyx's Challenge-based card cage.

The InfiniteReality2 introduced an interface scheme that is used in rackmount Onyx2 or later systems. Instead of being connected to the host system via an FCI cable, the board set is plugged into the rear of a midplane, which can support two pipelines. The midplane has eleven slots. Slots six to eleven are for the first pipeline, which may contain one to four Raster Manager boards. Slots one to four are for the second pipeline, which may contain only one or two Raster Manager boards because of the smaller number of slots. Because of this, maximally configured Onyx systems use one midplane for each pipeline to avoid restricting half of the 16 pipelines to a maximum of two Raster Manager boards. Slot five contains a Ktown board if the midplane is used in an Origin 2000-based system (Onyx2) or a Ktown2 board if the midplane is used in an Origin 3000-based system (Onyx 3000). The purpose of these boards is to interface the host system's XIO link to the Host Interface Processor ASIC on the Geometry board. These boards have two XIO ports for this purpose, with the top XIO port connected to the right pipeline and the bottom XIO port connected to the left pipeline.

Reality

The Reality is a cost-reduced version of the InfiniteReality2 intended to provide similar performance. Instead of using the GE14-4 Geometry Engine board and the RM7-16 or RM7-64 Raster Manager boards, the Reality used the GE14-2 Geometry Engine board and the RM8-16 or RM8-64 Raster Manager boards. The GE14-2 has two Geometry Engine Processors, instead of four like the other models. The RM8-16 and RM8-64 have 16 or 64 MB of texture memory respectively and 40 MB of raster memory. The Reality was also limited in the number of Raster Manager boards it could support: one or two. When maximally configured with two RM8-64 Raster Manager boards, the Reality pipeline has 80 MB of raster memory.

InfiniteReality2E

The InfiniteReality2E was an upgrade of the InfiniteReality, marketed as the InfiniteReality2, introduced in 1998. It succeeded the InfiniteReality2 board set and was itself succeeded by the InfiniteReality3 in 2000, but was not discontinued until 10 April 2001.

It improves upon the InfiniteReality by replacing the GE14-4 Geometry Engine board with the GE16-4 Geometry Engine board and the RM7-16 or RM7-64 Raster Manager boards with the RM9-64 Raster Manager board. The new Geometry Engine board operated at 112 MHz,[6] improving geometry and image processing performance. The new Raster Manager board operated at 72 MHz,[6] improving anti-aliased pixel fill performance.

InfiniteReality3

InfiniteReality3 was introduced in 2000 along with the Onyx 3000 to supersede the InfiniteReality2. It was used in the Onyx2 and Onyx 3000 visualization systems. The only improvement over the previous implementation was the replacement of the RM9-64 Raster Manager with the RM10-256 Raster Manager, which has 256 MB of texture memory, four times that of the previous raster manager. When maximally configured with four Raster Managers, the InfiniteReality3 pipeline provides 320 MB of raster memory.

InfiniteReality4

InfiniteReality4 was introduced in 2002 to succeed the InfiniteReality3. It was used in the Onyx2, Onyx 3000 and Onyx 350. It is the last member of the InfiniteReality family, itself succeeded by the ATI FireGL-based UltimateVision, which was used in the Onyx4. The only improvement over the previous implementation was the replacement of the RM10-256 Raster Manager by the RM11-1024 Raster Manager, which has improved performance, 1 GB of texture memory and 2.5 GB of raster memory, four and thirty-two times that of the previous raster manager, respectively. When maximally configured with four Raster Managers, the InfiniteReality4 pipeline has 10 GB of raster memory. In a maximum configuration with 16 pipelines, the InfiniteReality4 contained 16 GB of texture memory and 160 GB of raster memory.[7]
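The memory totals quoted for the InfiniteReality3 and InfiniteReality4 can be reproduced with a small table. In this Python sketch the 80 MB raster figure for the RM10-256 is derived from the 320 MB four-board total given earlier, and texture memory is counted once per pipeline rather than once per board, which is an assumption chosen because it matches the quoted 16 GB system total. The dictionary layout and function names are illustrative.

```python
# Per-board figures from the text; layout and names are illustrative assumptions.
RASTER_MANAGERS = {
    "RM10-256": {"texture_mb": 256, "raster_mb": 80},      # raster derived: 320 MB / 4
    "RM11-1024": {"texture_mb": 1024, "raster_mb": 2560},  # 1 GB texture, 2.5 GB raster
}


def pipeline_raster_mb(board, boards=4):
    """Raster memory in MB for one pipeline with the given number of boards."""
    return RASTER_MANAGERS[board]["raster_mb"] * boards


def system_totals_gb(board, pipelines=16, boards=4):
    """Return (texture GB, raster GB) for a multi-pipeline system."""
    # Assumption: texture memory is counted once per pipeline, not once per board.
    texture_gb = RASTER_MANAGERS[board]["texture_mb"] * pipelines / 1024
    raster_gb = pipeline_raster_mb(board, boards) * pipelines / 1024
    return texture_gb, raster_gb


if __name__ == "__main__":
    old, new = RASTER_MANAGERS["RM10-256"], RASTER_MANAGERS["RM11-1024"]
    print("texture ratio:", new["texture_mb"] // old["texture_mb"])
    print("raster ratio:", new["raster_mb"] // old["raster_mb"])
    print("IR4 pipeline raster:", pipeline_raster_mb("RM11-1024") / 1024, "GB")
    print("IR4 16-pipeline totals (GB):", system_totals_gb("RM11-1024"))
```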

Comparison

The figures presented in the tables are for a minimal 1-pipeline and a maximal 16-pipeline configuration, except for the Reality, which was restricted to single pipe operation.

Diagnostics

Try stripping the Onyx2 until you get a minimum configuration that boots without error.

Remove:

Directory RAM

All standard RAM except the pair in Bank 0 on each node <your hinv indicates all Bank 0s were working>

The Graphics module

The IO6G <if you still have the IO6 to replace it with>

The MENET and FC boards

The HD that contains the failed IRIX install

The external CD

If necessary, all but one nodeboard

<from this point make and test each change/reconfiguration *one* step at a time - it'll take more time, but it will also enable you to make more sense of any errors>

Connect a serial terminal <enable a *large* scroll back buffer on the terminal program and save each session>.

Boot to the PROM monitor and issue "resetenv"

Enter POD mode from the PROM command line by entering "pod", then:

"go cac"

"clearalllogs"

"initalllogs"

"flush"

"reset" <the system will reset>

When it restarts, stop in the PROM and:

run "enableall", followed by "update" at the PROM command line <NOTE: repeat this 3-step process after *every* hardware error>

Reboot - are there any error messages?

If so - what are they? <stop and report back to the forums>

If not, install the IO6G and graphics board <but *nothing* else yet and do not connect kb, m, or monitor>. Boot to the PROM monitor and "update" the PROM hardware inventory. Boot again; if errors appear, report back.

If no errors appear during the boot to PROM, power down, re-install the boot drive, restart the system, clear/prep the drive and install IRIX <what revision is your install set, btw?>

If there are install errors <stop and report back>

If not, connect a kb, mouse and monitor, <leave the serial terminal connected for now> and attempt to boot IRIX

If booting IRIX is unsuccessful what errors appeared?

If the IRIX boot was successful, test each RAM set in Bank 0 of a nodeboard <*no* Directory RAM yet>. If any set gives errors, record the error message, init the POD log, update the PROM inventory, and test the remaining sets.

Once you have eliminated any problem RAM, try the RAM that passed in the other memory banks. If there are any errors during this process, try another known good set in the problem bank. If the problem persists <and cleaning the slot(s) didn't help>, skip the bank or replace the nodeboard.

Once the RAM is tested and running w/o error, reinstall the MENET and FC boards. You can also reinstall the Directory RAM, but in an 8 processor system it does little beyond using electricity and producing heat.

BTW - when you remove nodeboards, the compression connectors <labeled "Connector Actuation 7/64 Hex"> should be released first, then the Phillips-head machine screws at the top and bottom of each board.

When you install a nodeboard, reverse the process: tighten the machine screws first, then the compression bolts. Following this procedure prevents the compression connector from having to support the weight of the nodeboard during removal/installation.