[Opteron as a mesh topology]
And when CPU 1 accesses its RAM, it does not slow the other CPUs down, since every CPU has a dedicated bank of RAM at its disposal (and they can also access the RAM on the other CPUs). And when CPU 1 talks to CPU 2, it does not slow down the communication between CPU 3 and the northbridge (for example).

You're missing the important part here: There's just one link between northbridge and southbridge. In an Octane or Origin 200, every device participates in the system at the same level the processors do. Any two devices can get a high speed point-to-point link on demand. All of them can talk simultaneously, in any combination, without any impact on latency. (And a pretty minimal impact on bandwidth.)
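The any-to-any, all-at-once property is what separates a crossbar from a shared bus, and it can be shown with a toy admission check (device names are made up for illustration; this sketches the idea, not SGI's actual arbitration logic):

```python
# Toy model: a crossbar admits any set of (src, dst) transfers in
# which no port is used twice, while a single shared bus admits
# only one transfer at a time.

def crossbar_admits(transfers):
    """True if all point-to-point transfers can run simultaneously."""
    ports = [p for pair in transfers for p in pair]
    return len(ports) == len(set(ports))  # no port appears twice

def shared_bus_admits(transfers):
    return len(transfers) <= 1

wanted = [("cpu0", "ram"), ("gfx", "disk"), ("net", "cpu1")]
print(crossbar_admits(wanted))    # True: all three proceed at once
print(shared_bus_admits(wanted))  # False: the bus must serialize them
```

Six distinct ports, three simultaneous links: that's the "any combination, without any impact on latency" claim in miniature.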

Edit:
It also bears mentioning that an Opteron only has three hypertransport links. You top out at four processors before you have to use custom chipsets that erect serious performance barriers, abandoning the advantages of a mesh topology.
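The four-socket ceiling follows from simple counting: a glueless full mesh needs a direct link from every CPU to every other, i.e. n-1 links per CPU, so three HyperTransport links max out a fully connected mesh at four sockets. (The larger glueless configurations mentioned later in this thread are not fully connected; they route through intermediate hops.) A quick sanity check, plain combinatorics rather than anything AMD-specific:

```python
from itertools import combinations

# In a fully connected mesh, every CPU links directly to every other.
def links_per_cpu(n_cpus):
    return n_cpus - 1

def total_links(n_cpus):
    return len(list(combinations(range(n_cpus), 2)))

# Three HyperTransport links per socket => full mesh tops out at 4 CPUs.
print(links_per_cpu(4), total_links(4))  # 3 links each, 6 links total
print(links_per_cpu(5))                  # 4: one more link than available
```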

In your PC, even an Opteron system, all of the expansion IO is at the end of a very long tunnel. Sure, communication between CPUs is fast, but the buck stops at the northbridge. The northbridge becomes a ridiculously complex multiplexer.

An Octane, or even a Tezro, is incredibly slow compared to systems on more modern CPUs, but good god are they powerful. It hardly matters that the CPU is dreadfully slow with that kind of I/O throughput. Not just peak bandwidth: actual saturated links with uninterrupted data flows on them.

Even some of the older, pre-XIO64 systems have some ridiculous I/O. (e.g. Challenge L. The damn things were frequently equipped with something like ten SCSI controllers. They could literally sustain 200 MB/s. )

You're missing the important part here: There's just one link between northbridge and southbridge.

In some A64 systems, there is no southbridge.

Quote:

In an Octane or Origin 200, every device participates in the system at the same level the processors do.

No, they don't. Or are you claiming that HDs, optical drives, ports and so forth all "participate at the same level"?

Quote:

Any two devices can get a high speed point-to-point link on demand. All of them can talk simultaneously, in any combination, without any impact on latency. (And a pretty minimal impact on bandwidth.)

And it's the same in those Opteron systems as well. And there will be impacts on bandwidth even on those SGI machines. If two CPUs try to talk to the same HD, the HD does not have the bandwidth to satisfy both of them (hell, it doesn't really have the bandwidth to satisfy ONE of them!)

Quote:

Edit:
It also bears mentioning that an Opteron only has three hypertransport links.

200-series Opteron does, 800-series has more.

Quote:

You top out at four processors before you have to use custom chipsets that erect serious performance barriers, abandoning the advantages of a mesh topology.

You are now talking about big servers and supercomputers, which are outside the scope of this discussion. I thought we were talking about workstations? Show me an SGI workstation that has more than 4 processors. And IIRC, 800-series Opterons support 8-way glueless SMP. Besides, Cray thinks those Opterons are good enough for their massive supercomputers.

Quote:

In your PC, even an Opteron system, all of the expansion IO is at the end of a very long tunnel.

No, they are not. PCI-E slots are connected to the northbridge, which means that they are as close to the CPUs as possible. The devices that are "at the end of the tunnel" are stuff like SATA, USB, FireWire and the like.

Quote:

Sure, communication between CPUs is fast, but the buck stops at the northbridge. The northbridge becomes a ridiculously complex multiplexer.

The HyperTransport link extends from the northbridge to the southbridge. And I bet that the devices on the southbridge are not enough to saturate the link between the northbridge and the southbridge. And I could also have a system where there is no southbridge, eliminating that particular "bottleneck".

Quote:

An Octane, or even a Tezro, is incredibly slow compared to systems on more modern CPUs, but good god are they powerful. It hardly matters that the CPU is dreadfully slow with that kind of I/O throughput. Not just peak bandwidth: actual saturated links with uninterrupted data flows on them.

But when I look at the specs of those machines, I see pretty low-end specs. A 200MHz CPU bus? Relatively slow RAM? I just don't see the benefit from where I'm standing. Sure, we have the "crossbar" and all that, but with the Opteron, for example, I have uber-fast point-to-point links between components.

Quote:

Even some of the older, pre-XIO64 systems have some ridiculous I/O. (e.g. Challenge L. The damn things were frequently equipped with something like ten SCSI controllers. They could literally sustain 200 MB/s. )

With enough HDs, the Opteron system could also support metric assloads of throughput. The HDs on SGI systems are not magically faster than the ones on some Opteron machines. And if you take the memory bandwidth into account, I would say that the Opteron massacres the SGI, both in bandwidth and in latency._________________My tech-blog | My other blog

No, they don't. Or are you claiming that HDs, optical drives, ports and so forth all "participate at the same level"?

Yep.

Quote:

Quote:

Edit:
It also bears mentioning that an Opteron only has three hypertransport links.

200-series Opteron does, 800-series has more.

I was talking about the 800.
Three links to other processors.

Quote:

You are now talking about big servers and supercomputers, which are outside the scope of this discussion. I thought we were talking about workstations? Show me an SGI workstation that has more than 4 processors.

Onyx?
Origin?

Quote:

And IIRC, 800-series Opterons support 8-way glueless SMP. Besides, Cray thinks those Opterons are good enough for their massive supercomputers.

And then we're into the realm of custom chipsets with "interesting" performance limitations.

Evangelion wrote:

Quote:

In your PC, even an Opteron system, all of the expansion IO is at the end of a very long tunnel.

No, they are not. PCI-E slots are connected to the northbridge, which means that they are as close to the CPUs as possible. The devices that are "at the end of the tunnel" are stuff like SATA, USB, FireWire and the like.

This was my whole point.
No PCI-E, no northbridge intermediaries. In an SGI, option cards participate at the same level as the CPUs.

I/O is connected to the northbridge. How exactly is that different from having I/O connected to the HEART?

Quote:

Evangelion wrote:

No, they don't. Or are you claiming that HDs, optical drives, ports and so forth all "participate at the same level"?

Yep.

Really? So those SGI machines used some kind of uber-HDs that could saturate the buses? Or did they in fact use regular SCSI HDs? How about optical drives? Were they some uber-drives that are 50 times faster than the optical drives on other systems?

The drives in SGI machines were regular drives. You could use the exact same drives on other systems as well (including PCs). So where is the difference here?

Quote:

Quote:

Edit:
It also bears mentioning that an Opteron only has three hypertransport links.

200-series Opteron does, 800-series has more.

I was talking about the 800.
Three links to other processors.

Yep, you are right, it has three links. But it still supports 8 CPUs (16 cores in the case of dual-cores).

How many links do SGI machines have? Aren't they all connected to just the HEART, which means that they have just one link?

Quote:

Quote:

You are now talking about big servers and supercomputers, which are outside the scope of this discussion. I thought we were talking about workstations? Show me an SGI workstation that has more than 4 processors.

Onyx?
Origin?

Neither of those are WORKSTATIONS.

Quote:

This was my whole point.
No PCI-E, no northbridge intermediaries. In an SGI, option cards participate at the same level as the CPUs.

Last time I checked, SGI machines had PCI slots, USB ports and so forth. So how exactly do they "participate at the same level"? Or are you saying that the PCI slots in SGI machines are not really PCI slots, but some kind of uber-slots, unlike the PCI slots on other systems?

I doubt it. In a PC, the PCI(-E) slots are connected to the northbridge. In an SGI machine they are connected to the HEART. What is the difference here?

Really? So those SGI machines used some kind of uber-HDs that could saturate the buses? Or did they in fact use regular SCSI HDs? How about optical drives? Were they some uber-drives that are 50 times faster than the optical drives on other systems?

The SCSI controllers participated as XIO devices. I think you were being disingenuous. Or do you usually plug your SCSI devices right into a PCI slot on your PC?

We're all friends here. There's no reason to be a dick about anything.

Quote:

How many links do SGI machines have? Aren't they all connected to just the HEART, which means that they have just one link?

That's right. Just one link.

Evangelion wrote:

Quote:

Onyx?
Origin?

Neither of those are WORKSTATIONS.

They both came in deskside configurations with graphics heads and local consoles. The Onyx series were explicitly workstations. (Origin series machines were rarely workstations, but many were available as such if you were willing to pay enough.)

Evangelion wrote:

Quote:

This was my whole point.
No PCI-E, no northbridge intermediaries. In an SGI, option cards participate at the same level as the CPUs.

Last time I checked, SGI machines had PCI slots, USB ports and so forth. So how exactly do they "participate at the same level"? Or are you saying that the PCI slots in SGI machines are not really PCI slots, but some kind of uber-slots, unlike the PCI slots on other systems?

PCI slots on an SGI are (were) used for low-speed devices. In fact, on an Octane or Origin 200, PCI card cages were an option you had to pay extra for, as they were not commonly used.

The only SGI system that has USB is the Fuel. And the Fuel is a piece of shit; it's hardly more than a MIPS reference board with ARCS firmware. It's also the only PCI-based SGI. (Edit: I could be wrong on that one.) Surprisingly, it was not the first SGI to abandon SGI-designed graphics. (The Onyx is now available with an ATI GPU on an SGI chipset ;_; )

Today, things are different from the good old days. As SGI is no longer interested in developing their own hardware, some of the newer option cards are just off-the-shelf PeeCee PCI-X cards. The FireWire and gigabit-over-copper Ethernet cards come to mind.

Quote:

I doubt it. In a PC, the PCI(-E) slots are connected to the northbridge. In an SGI machine they are connected to the HEART. What is the difference here?

The difference is that in an SGI, you rarely use the PCI slots. (I suspect they exist primarily for home-grown hardware. Developing a PCI card is really fairly cheap / easy.)

The SCSI controllers participated as XIO devices. I think you were being disingenuous. Or do you usually plug your SCSI devices right into a PCI slot on your PC?

We're all friends here. There's no reason to be a dick about anything.

I'm not "being a dick". I just fail to understand what you are saying here. In an SGI you have a bunch of devices in the system. And in PCs, you have a bunch of those same devices in the system. But the ones in the SGI machine are somehow faster. I'm just trying to understand how they are faster.

Quote:

Quote:

How many links do SGI machines have? Aren't they all connected to just the HEART, which means that they have just one link?

That's right. Just one link.

So how is it better than having four links (one to the northbridge, three to other CPUs)?

Quote:

They both came in deskside configurations with graphics heads and local consoles. The Onyx series were explicitly workstations. (Origin series machines were rarely workstations, but many were available as such if you were willing to pay enough.)

Looking at pictures of the Onyx, I see a HUGE computer. Of course it could be used as a workstation if you so wanted, but it's pretty unfair to compare such a system to more typical workstations.

As for the Origin... Yes, it was available as a workstation (well, SGI calls it a "workgroup server"). And it could have four CPUs, but not more.

Quote:

PCI slots on an SGI are (were) used for low-speed devices. In fact, on an Octane or Origin 200, PCI card cages were an option you had to pay extra for, as they were not commonly used.

But they were available.

Quote:

Today, things are different from the good old days. As SGI is no longer interested in developing their own hardware, some of the newer option cards are just off-the-shelf PeeCee PCI-X cards. The FireWire and gigabit-over-copper Ethernet cards come to mind.

Maybe those "PeeCee" cards are as good as or better than the ones SGI designed?

Quote:

The difference is that in an SGI, you rarely use the PCI slots. (I suspect they exist primarily for home-grown hardware. Developing a PCI card is really fairly cheap / easy.)

You are not required to use the PCI slots in a PC either. And you still haven't explained what the magical ingredient is that makes the SGI faster.

As for the Origin... Yes, it was available as a workstation (well, SGI calls it a "workgroup server"). And it could have four CPUs, but not more.

You're talking about the Origin 200. There were graphics options available for the larger deskside and rackmount Origin systems, which could have dozens of processors.

Quote:

Maybe those "PeeCee" cards are as good as or better than the ones SGI designed?

SGI doesn't design their own hardware line any more. It's been years since they did.
There are no SGI-designed gigabit-over-copper or firewire options.

Evangelion wrote:

You are not required to use the PCI slots in a PC either. And you still haven't explained what the magical ingredient is that makes the SGI faster.

evangelion wrote:

In an SGI you have a bunch of devices in the system. And in PCs, you have a bunch of those same devices in the system. But the ones in the SGI machine are somehow faster. I'm just trying to understand how they are faster.

This is very, very, very simple.

The PCI bus is the primary bus in a PC. It is the only way to add option hardware. Most devices are attached to the PCI bus, far away from the core of the system.

In an SGI, the primary bus is the system bus. Option cards participate at the same level as the processors themselves. Option cards can have high speed links to each other or to the processors.

To analogize to modern AMD64 systems, it would be as if every option card had a direct hypertransport link to each and every other device in the system.

The idea behind the Octane is that you have a central chip - XBow - which allows all the devices in an Octane to talk to each other directly. It's like a traffic cop, but a traffic cop that doesn't get in the way of the traffic flow. There are eight interlinked XIO ports (technically, although only seven appear to be used). Here's a breakdown based on my knowledge:

Two links for the system board (one for 'HEART', the other for 'BRIDGE')

Four links for the XIO Expansion slots

One link for the PCI card cage (a second 'BRIDGE'; it allows three 64-bit PCI devices to be connected)

The XBow is very much like the chips found in high-end network switches. They are constantly interlinking ports so that two or more devices are routed efficiently without hitting bottlenecks. The XBow on an Octane is just a smaller cousin to the Crossbow chip on the bigger Origin 2000 clusters.

HEART is the chip that manages access to the CPUs and memory (the system ASIC). It was built to allow up to four processors in an Octane; however, only two-processor systems ever made it to production (the "ski-jump" heatsink on an Octane's HEART chip is evidence that early Octane prototypes may have sported four CPUs). The original Octanes only allowed a max of 2GB of main memory. The newer models allow for a max of 8GB.

BRIDGE is capable of efficiently connecting up to eight PCI devices into the XIO layer. In the Octane's case, four of these available PCI devices are used:

Two QLogic 1040B SCSI controllers (the first services up to three internal devices; the second services up to 15 external devices)

The IOC3, which provides a 10/100Mbps NIC, two serial ports, a parallel port, keyboard/mouse, voltage sensors, and the real-time clock

The RAD1 audio processor, which allows for digital and analog inputs/outputs, including optical connectors.
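Gathered into one place, the breakdown above looks like this (the port labels other than HEART and BRIDGE are invented placeholders for illustration; the device list is taken from the description):

```python
# The Octane's XBow ports and BRIDGE-attached devices as described
# above. Purely descriptive; "XIO-A".."XIO-D" and "BRIDGE2" are
# made-up labels, not SGI's names.
xbow_ports = {
    "HEART":   "system board link: CPUs and memory",
    "BRIDGE":  "system board link: base I/O",
    "XIO-A":   "XIO expansion slot",
    "XIO-B":   "XIO expansion slot",
    "XIO-C":   "XIO expansion slot",
    "XIO-D":   "XIO expansion slot",
    "BRIDGE2": "PCI card cage (up to three 64-bit PCI devices)",
}

bridge_devices = [
    "QLogic 1040B SCSI #1 (up to 3 internal devices)",
    "QLogic 1040B SCSI #2 (up to 15 external devices)",
    "IOC3 (10/100 NIC, serial, parallel, kbd/mouse, sensors, RTC)",
    "RAD1 audio (analog/digital I/O, incl. optical)",
]

print(len(xbow_ports))      # 7 of the 8 XBow ports in use
print(len(bridge_devices))  # 4 of BRIDGE's 8 PCI device slots used
```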

Where all these pieces come together is in the XBow. If the graphics card needs full, unrestricted access to the disks, XBow allows it without the video data having to be routed through memory. The same goes for any other device in the system. Additionally, both CPUs can run at full speed without having to compete for resources.

When you want to compare an Octane to a different system, you have to compare it to systems of the same era. Consider that Octanes came out in 1997. Even then, when they sported 175MHz R10000 CPUs, the amount of expandability and options utterly crushed anything else offered back then. Only in recent times has the x86 line managed to slowly catch up. Modern Octanes can sport up to dual 600MHz R14000 CPUs now (although they'll cost you your firstborn to obtain such a machine). Even these systems are a bit old, coming out in ~2003 if I recall correctly. So to be fair, you'd need to compare them with 2003-era Opterons and such.

The problem with x86 is exactly that -- it's x86. No matter how you re-work it, you're still dealing with design limitations that originated from an x86 desktop. They've just been made so ridiculously fast, that you don't notice it half the time. Everything from the BIOS (itself reportedly a mess of 20 years of spaghetti code) to the bus topology still has issues.

Opteron is a step in the right direction. It's wicked fast and, speed-wise, blows an Octane clear out of the water. The CPU interconnects offer advantages other processors (notably Intel's) lack. But all that speed is needed to do the heavy lifting of constantly moving data around its limited I/O space._________________"The past tempts us, the present confuses us, the future frightens us. And our lives slip away, moment by moment, lost in that vast, terrible in-between."
--Emperor Turhan, Centauri Republic

Damn, I forgot all about this thread! For starters, I would like to apologize for my part in the "heated" discussion that took place. Furthermore, it was not my intention to disparage SGI or their hardware. I was merely interested in knowing what their "secret formula" is.

Relative to the hypertransport vs crossbar discussion, you miss a key point.

With HyperTransport in PC land, for your OS-on-harddisk to move some data from your RAM to the buffer of your soundcard (i.e., to play some audio), it needs to go through the CPU to do this .. it has a fast link to the CPU .. the CPU then goes to the RAM .. has a sickly fast link to the RAM .. gets the data .. back to the CPU .. down to the soundcard .. has a fast link to the soundcard .. done.

With a crossbar (the non-PCI parts of a SPARC-based machine use this model as well), the same scenario looks like this.
The OS-on-harddisk gives a command to one of those little magic chips on the frontplane, and the data goes from the RAM directly to the soundcard buffer without ever even SEEING the CPU, never mind stopping there for routing/processing.

The key to understanding the difference is that yes, HT is way, way faster on a CPU-to-whatever link .... but -everything- in the system is sharing the CPU all the time .. with a crossbar, most other hardware never needs to use the CPU to do its stuff. This is one of the reasons slower-clocked SPARC/MIPS systems can seem as fast as PC systems a generation ahead of them.
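The argument can be condensed into a toy path model (device names are illustrative; this encodes the claim as stated in this post, not a measurement):

```python
# Toy model of the two transfer styles described above: in the
# CPU-mediated path the data itself flows through the CPU, while
# in the crossbar path the CPU only issues the initial command.

cpu_mediated_path = ["disk", "cpu", "ram", "cpu", "soundcard"]
crossbar_path     = ["ram", "crossbar", "soundcard"]  # CPU just kicks it off

def data_touches_cpu(path):
    return "cpu" in path

print(data_touches_cpu(cpu_mediated_path))  # True
print(data_touches_cpu(crossbar_path))      # False
```

Whether the CPU-mediated picture is accurate for Opteron DMA is disputed later in the thread; the model only captures the distinction this post is drawing.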

I've got a SunBlade 1000 on my desk in the other room .. 2 CPUs, 750MHz, 2 gigs of RAM. I've got a sick, sick gaming system on my desk in this room .. Athlon64 X2 2.6GHz, 2 gigs of dual-channel RAM, SATA, etc. The AMD box builds a Linux kernel about 40 times faster than the SPARC. The CPU just does the math way faster. However, in terms of "desktop responsiveness" the SPARC box is at least 90% as fast. During normal operation, all the hardware-level communication that happens in the background is direct .. hardware-to-hardware .. whereas on my AMD box, the CPU wastes huge amounts of time on hardware-CPU-hardware communication, giving an artificial performance hit.

If you stuck one of these high-end PC CPUs into a system which had a crossbar arch, you would have one hellova fast system... sadly, this would require some serious re-engineering on the part of the CPU makers...._________________-Tim Smith

If you stuck one of these high-end PC CPUs into a system which had a crossbar arch, you would have one hellova fast system... sadly, this would require some serious re-engineering on the part of the CPU makers....

There is such a system, but it's not in anybody's hobby price range. OctigaBay (based in Vancouver, BC, Canada) developed a crossbar interconnect for HyperTransport. They were subsequently bought out by Cray, and their technology is now in the Cray XD1; the system is sold with Linux. The unit is actually a cluster of dual-core Opteron 200s (6 sets of linked processors, with 2 cores per processor), crammed into 3U of space. 24 cores in 3U of space in total.

Face it, even XIO is damn slow compared to today's systems. 800MB/s isn't a lot, even if it is full duplex and non-blocking.
Where it shines is in basic QoS: afaik the HEART can allocate bandwidth to some devices, which means that, for example, your sound will not ever stutter.
That's something you don't have in today's x86 systems.

And in 1997 it was very, very fast. Now it's not anymore, but it is still a system that is fast enough in some respects, very stable, and almost indestructible. It looks good and won't ever crash, usually.

In today's PCs the northbridge has a function that is almost identical to the HEART's: both connect some peripherals directly (the disk controller on modern PCs; I don't know about native XIO parts on the Octane), plus a PCI bus, which in turn connects the slower peripherals (the IOC3, sound and similar on the Octane, just as on the PC).

Interconnect-wise, modern AMD PCs are more advanced than an Octane. HyperTransport is a very nice architecture, more flexible than the fixed star topology of the Octane, whose MP is not that advanced: the CPUs share the bandwidth to memory and the HEART, whereas the modern Opteron has dedicated memory and meshed peripheral access.

I like my octane, but it's getting a bit old, even in architectural terms, by now.

Face it, even XIO is damn slow compared to today's systems. 800MB/s isn't a lot, even if it is full duplex and non-blocking.

So you're saying that an XIO SCSI/GigE/whatever card with 800MB/s of DEDICATED bandwidth is somehow slower than the slice of 133MB/s of SHARED bandwidth that a PCI card gets? Or the 200MB/s that a 1x PCI-E card gets? I think not. Only a 4x PCI-E card, or a 64-bit 133MHz PCI-X card, can compare - and how many of these 4x PCI-E or 64/133 PCI-X slots does your PC have? Having your peripherals on the XIO bus in an SGI would be like having your SCSI/VGA/Ethernet directly plugged into the HyperTransport bus on an Opteron.
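The arithmetic behind the comparison, using the peak figures quoted in this thread (real sustained throughput is lower, and the four-card load is a made-up example):

```python
# Peak per-card bandwidth: a classic PCI bus is shared by every
# card on it, while each XIO device gets a dedicated link.
def per_card_mb_s(bus_mb_s, cards, shared):
    return bus_mb_s / cards if shared else bus_mb_s

PCI_33MHZ_32BIT = 133  # MB/s peak, one bus shared by all cards
XIO_LINK        = 800  # MB/s per device, each direction

print(per_card_mb_s(PCI_33MHZ_32BIT, 4, shared=True))   # ~33 with 4 cards
print(per_card_mb_s(XIO_LINK,        4, shared=False))  # 800 regardless
```

With four busy cards, each PCI card's slice drops to roughly 33MB/s, while each XIO device keeps its full 800MB/s.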

Galahad wrote:

Where it shines is in basic QoS: afaik the HEART can allocate bandwidth to some devices, which means that, for example, your sound will not ever stutter. That's something you don't have in today's x86 systems.

Correct. Take a PC from 1997 (when the Octane was new) and try to play back full-screen HD video while running a bunch of other tasks on the PC at the same time. (Forget the fact that a 1997 PC cannot play HD video in real time, period.) It will be like watching a slide show, it is so slow and stuttering. The Octane can handle that with ease (I've tested this myself!). Even a brand new PC will be working pretty hard to play back HD video and do other tasks at the same time.

Galahad wrote:

In today's PCs the northbridge has a function that is almost identical to the HEART's: both connect some peripherals directly (the disk controller on modern PCs; I don't know about native XIO parts on the Octane), plus a PCI bus, which in turn connects the slower peripherals (the IOC3, sound and similar on the Octane, just as on the PC).

Incorrect. The northbridge is a bottleneck between the CPU and the peripherals, and the peripherals reside on a slower secondary bus (PCI). For a northbridge to be comparable to XIO, you would have to be able to attach your VGA, SCSI, Ethernet, etc. directly onto the HyperTransport bus - and we all know that isn't possible now or any time soon.

Galahad wrote:

Interconnect-wise, modern AMD PCs are more advanced than an Octane. HyperTransport is a very nice architecture, more flexible than the fixed star topology of the Octane, whose MP is not that advanced: the CPUs share the bandwidth to memory and the HEART, whereas the modern Opteron has dedicated memory and meshed peripheral access.

Meanwhile, my Celeron system has only a 66MHz FSB sans DDR, and its i810 graphics chip can't top 16 bits per channel in accelerated mode. Its 32-bit CPU is the bottom of the barrel in its class and in its timeframe, and its lack of AGP makes it bad for graphics. It can't pump good graphics through PCI, can it?!?!

...Yet, in practice, it's so much faster that it's not the slightest bit fair to compare the things. The Celeron beats the shit out of the N64 in real performance ten times over.

Basically, what we have here is a situation where the Octane had great technology and very good design. But the steady march of development makes up for that. ~In practice~, newer PCs can just do things that the Octane can't. Even my 1.0GHz Athlon on the cheapest of motherboards can run Gimp filters 10% faster, compiles programs faster, plays video perfectly too, and so on.

It's called having an old computer.

If you want to discuss this, compare an Octane to a PC from the same timeframe - about the year 2000. This Celeron PC is about the same age as my Octane, and my Octane absolutely murders it in every way imaginable. It's wonderful, though it is comparing a budget PC to something that cost $20,000. It'd be more reasonable if I had a top-of-the-line PC, but I don't.

Now if you want a great system, it would be nice to apply all these design concepts with new technology, new chips, new manufacturing processes, new design processes and so on. Unfortunately, SGI will never do that, because they're dying.

Personally I kinda expect SGI to get bought out by someone like AMD who can adopt some of the patented technologies and make real use of them._________________My systems:

This is like if I were to start an argument over whether my Nintendo64 (SGI-made, circa 1996) is better than the PC at my office (a Celeron 700MHz, from around 2000).

"Better" is not really a term we can use to compare the two; "better at a certain task" is preferred. I have no doubt that the aforementioned Celeron can beat the crap out of the N64 at C++ compiling, at running Firefox, and at running OpenOffice. I have no doubt. But the N64 is certainly a faster system than the Celeron when it comes to rendering real-time 3D graphics, as in the Mario64 game. The Celeron simply can't do that. My point in the earlier post was not that the Octane was better than a modern PC in every conceivable way - because it isn't. My point was that the Octane is still competitive with a modern PC for _certain_very_specific_tasks_. Also, that (MHz speeds aside) the architecture used in the Octane is of a more ideal design for high-bandwidth I/O to various parts of the system.

The Nintendo64 has a 250MHz FSB & RAM, double-pumped with DDR to 500MHz. It has a 64-bit CPU. It's also got a Reality Engine-derived GPU capable of real-time antialiasing, advanced texture filtering, mipmapping, and 24- and 32-bit color.

Frapazoid wrote:

Meanwhile, my Celeron system has only a 66MHz FSB sans DDR, and its i810 graphics chip can't top 16 bits per channel in accelerated mode. Its 32-bit CPU is the bottom of the barrel in its class and in its timeframe, and its lack of AGP makes it bad for graphics. It can't pump good graphics through PCI, can it?!?!

...Yet, in practice, it's so much faster that it's not the slightest bit fair to compare the things. The Celeron beats the shit out of the N64 in real performance ten times over.

Again... "so much faster" at what? At compiling Gentoo? Certainly. But at rendering 3D games in real time? No. What do you call "real performance"? Is it how responsive KDE is? If so, the Celeron is definitely better. Is it how many textured pixels per second it can render to the frame buffer? If so, the N64 wins.

Frapazoid wrote:

Basically, what we have here is a situation where the Octane had great technology and very good design. But the steady march of development makes up for that. ~In practice~, newer PCs can just do things that the Octane can't. Even my 1.0GHz Athlon on the cheapest of motherboards can run Gimp filters 10% faster, compiles programs faster, plays video perfectly too, and so on. ...

If you want to discuss this, compare an Octane to a PC from the same timeframe - about the year 2000. This Celeron PC is about the same age as my Octane, and my Octane absolutely murders it in every way imaginable. It's wonderful, though it is comparing a budget PC to something that cost $20,000. It'd be more reasonable if I had a top-of-the-line PC, but I don't.

Blah blah blah. FWIW, the Octane was 1997, not 2000. But that isn't the point. Of course any PC from 1997 is much, much slower than an Octane from 1997; that we all know. How many times does it need to be said? SGIs are purpose-built machines designed for specific tasks. PeeCees are general-purpose machines designed for a variety of tasks. Of course The GIMP is faster on a modern PC than it is on an Octane. Duh. No one is arguing that. But when was the last time you tried putting together a huge 3D physics simulation (e.g. automotive crash analysis) and rendering it in HD quality? I think you'd be quite surprised how a 2x600MHz Octane2 with V12 gfx stacks up against a modern PC.

Relative to the hypertransport vs crossbar discussion, you miss a key point.

With HyperTransport in PC land, for your OS-on-harddisk to move some data from your RAM to the buffer of your soundcard (i.e., to play some audio), it needs to go through the CPU to do this .. it has a fast link to the CPU .. the CPU then goes to the RAM .. has a sickly fast link to the RAM .. gets the data .. back to the CPU .. down to the soundcard .. has a fast link to the soundcard .. done.

What difference does it make? In both cases, the CPU is still processing while data is being transferred; an Opteron does not stall processing during a DMA transfer from its locally attached RAM.

So, in a crossbar system, the soundcard goes to the memory controller, which goes to the RAM chips. In an Opteron system, the soundcard goes to the PCIe-HT bridge, which goes to the memory controller, which goes to the RAM chips. OK, so the PCIe-HT bridge adds a bit of latency (but see HTX), but the higher clock speeds and lower per-stage latencies of modern hardware mean that this effect is hidden.

Don't get me wrong; for its time, the Octane was an incredible design. It says something about the quality of the engineering that we have to compare it to 2004-and-later systems to find commodity equivalents. It's all very well that XBow was amazing (40us latency in the XBow chip was incredibly low for its time), but if your Octane has 80us latency on its RAM, and my Opteron keeps to its 10us latency, I've got an extra 70us to waste in bus design.
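The closing trade-off is just latency-budget arithmetic with the figures the post quotes (taken at face value, purely for illustration):

```python
# If the Octane pays 80us to reach RAM and the Opteron pays 10us,
# the Opteron platform can spend the difference on bridges and
# still break even (figures quoted in the post above).
octane_ram_us  = 80
opteron_ram_us = 10
headroom_us = octane_ram_us - opteron_ram_us
print(headroom_us)  # 70us available for bus/bridge overhead
```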

Does anyone have a kernel config for an SGI Origin against a 2.6.15-18 ish kernel?

I'm trying to rebuild a kernel which works as well as Kumba's ip27-r10k+-20040528.img.

Look no further than /proc/config.gz. Boot up the netboot image, then simply use zcat /proc/config.gz > /gentoo/usr/src/linux/.config to dump it where needed. _________________Stuart Longland (a.k.a Redhatter, VK4MSL)
I haven't lost my mind - it's backed up on a tape somewhere...