There have been quite a few threads on running Gentoo/Linux on SGI hardware, but one more thread can't harm anyone.

I recently got a hard-on (don't ask...) for "alternative architectures". So I started looking around for a computer to play around with. And since I'm a geek, Silicon Graphics has always been close to my heart. And to my surprise I noticed that you can get quite nice machines for modest amounts of money. An Octane 2 with 2x 300MHz R12000 and 1GB of RAM can be had for under $500.

Now, if I were crazy enough to actually buy one, what kind of machine would it be? I noticed some old threads where it was said that SMP does not work on those machines. But some more recent websites say that SMP is supported today. But since X does not work, I think an Octane would be relegated to server-duty or remote X-terminal (although using one as an actual workstation would increase my geekhood by at least a factor of five!). What kind of performance would it offer when compared to regular PeeCees? Looking at pictures of the box, I noticed a few quite large fans; is it noisy? How much power would it consume?

Now, if I wanted an SGI-workstation for real workstation use with Linux (who doesn't?), what kind of machine should I be looking at, considering compatibility and performance? By "performance" I mean a machine that is fast enough for everyday use._________________My tech-blog | My other blog

With the Octanes, SMP is now quite reliable, and an X driver for ImpactSR is in the works. At the moment, the said X driver gets the colours wrong, and is horrendously slow (it uses shadowfb, as the framebuffer's internal memory is not directly accessible by the OS), but it works. As to when we can expect a true solid working X driver though, nobody can say.

As for the best-supported machines at the moment, you're looking at either an Indy or an O2. Both run very reliably and are well supported all round. If going for an Indy, try to get one with an R4400 or an R5000 CPU -- the R4600s have issues, especially the ones with rev 1.7 CPUs. As for O2s, currently only the R5000, RM5200 and RM7000 models work under Linux. The R10000 O2s suffer much the same issues as the Indigo2 Impact systems -- that is, caching issues caused by speculative execution -- and so Linux is extremely flaky, if it runs at all. The RM7000 O2s are also not without their problems. RM5200 and R5000-based O2s, however, work fine.

With the Octanes, as I mentioned, ImpactSR works for console, as does VPro. I'm not sure about other graphics options though. For anything more exotic, you may be stuck with serial console. There's no X driver for VPro yet... nor is there one released for ImpactSR, although one is in the pipeline. This doesn't stop you from running remote X however. I already do this with my Indigo2 Impact machine -- remote X is not just easy to set up, but it also works a treat with both other desktop machines, and real Xterms alike.

As for power usage... The two machines I have (an R4600 Indy and an Indigo2 Impact) aren't too bad on the noise or power usage factor. They're not über quiet, but they're not vacuum cleaners either. Power drain isn't that much more than the average desktop system: ~200 or 300W. Performance-wise, the Octanes certainly have lots of grunt, especially if you get a dual-CPU one. O2s usually don't fare too badly, with R5000 CPUs running up to 180MHz (and possibly beyond), or RM5200s at 300MHz(? not sure). Indys are down the lower end of the scale, most being quite limited in the specs, but also suffering from problems such as a slow SCSI driver which barely puts out 2Mbps on a good day.

My recommendation, try to aim for an R5k/RM5.2k O2, or if you don't require X on the local system, an Octane isn't a bad buy._________________Stuart Longland (a.k.a Redhatter, VK4MSL)
I haven't lost my mind - it's backed up on a tape somewhere...

Thanks for your reply. Now, I KNOW that I shouldn't stare at MHz-ratings. But what kind of performance could I expect from those R5000's and RM5200's? There seem to be R12000-based O2's available here. Would those be OK with Linux? What is the price-range for R5000/RM5200-based O2's?

On my quest for "alternative computing" I started thinking about Sun Ultra 60 as well. Prices (for the 2x 360MHz models) are more or less the same as with Octane2's. Upside compared to SGI would be better supported hardware. Downside would be the fact that those Sun-boxes are simply not as sexy as SGI-boxes are. I have no idea how the two compare as far as performance is concerned.

EDIT: I looked around, and while most websites refer to O2 as a 1-way system, some sites sell O2's with up to 4-way configurations. Does O2 support SMP?

There is no MP O2, and the Octane is only dual, so they probably mean real big iron, an Origin or Onyx.

Personally I have high hopes for good Linux support on Octanes in the future; it's almost there now imo. And these things are quite fast and not very expensive: you can get a R12ka 400MHz 1GB RAM V6 Octane off German eBay for 300 euro. OK, so there's no X support for VPro, but the developer (skylark) has 3D working, and I don't think X is that far off.

O2 is uni-processor only. Octane and Octane2 are single or dual AFAIK, and Origin/Onyx go to ridiculously large systems (a 128-way Origin 2000 was at the time the biggest SMP Linux had ever booted).

I thought the SPARC based Fujitsu AP1000+ had more CPUs than that, way back in the days when the MIPS port wasn't even started, but there's a fair chance I'm remembering wrong (or that it was more a cluster than a regular SMP.)

You're right Kumba. The AP1000+ had the potential to be a 1024 node cluster, but the configuration documented in the link below only had 16 SPARC CPU's, though it appears a 32 node system was on the way. Remember that this was 1996!

SGI machines are awesome. I have an Indy R5000@180MHz, 256MB and an Octane2 R12000@400MHz, just upgraded to 8.0GB memory and V12 graphics. I'm running IRIX 6.5.22 on both of them. The whole point of the SGI machines is their amazing visualization capability - I do like Gentoo (I run it on my Alpha) but, if it doesn't have full, stable, accelerated 3D support for the video card, it's useless as a workstation OS for these machines.

My Octane2 is five years old now, yet it can edit high-definition video streams in REAL TIME. (An HD video frame can fit entirely within the Octane's CPU cache!!) Try that on even a brand new ultimate peecee and see how far you get. For small tasks, a modern peecee is faster, but for large tasks (particularly video editing or 3D oriented), the Octane2 can beat the pants off of even the latest Xeon or Opteron workstations.

For video playback, I can play uncompressed high-definition video using mplayer on the Octane2 and the A/V sync is steady at 0.000 (perfect sync). Even on my 2.4GHz P4 peecee with dual-channel DDR memory, it isn't perfectly synced like that.

It's not just the super efficient processor, or the awesome 3D graphics card that makes Octane fast - it's the interconnects. Rather than traditional bus architecture as is used in a peecee, the Octane uses a cross-bar switch giving large amounts of dedicated bandwidth between the CPU, memory, video, and other peripherals - it's the same crossbar switch architecture of the Origin class supercomputers (few small differences).

late reply, but.... Are those interconnects REALLY faster than on a modern PC? I mean, if we compare SGI to Opteron for example? I mean, the Opteron was good enough for Cray ;). The Opteron has (among other things)

- Glueless SMP
- Dedicated memory-controllers. Each CPU has a dedicated 128bit channel to DDR-RAM, and they can also access the RAM attached to other CPU's (which means that aggregate bandwidth doubles as the number of CPU's doubles)
- 1GHz dedicated bus between each CPU and the rest of the system
- PCI-Express for graphics and other expansion-cards
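
To put a rough number on the "bandwidth scales with the number of CPU's" point, here is a minimal sketch. The 128bit channel width is from the list above; the DDR400 transfer rate (400 million transfers per second) and the function name are my own assumptions for illustration, not figures from this thread.

```python
def aggregate_memory_bandwidth_mb_s(n_cpus, channel_bits=128, mega_transfers_per_s=400):
    """Peak aggregate memory bandwidth in MB/s when every CPU has its own channel.

    channel_bits=128 is the figure from the post; mega_transfers_per_s=400
    is an ASSUMPTION (DDR400), not a figure from the thread.
    """
    per_cpu = channel_bits // 8 * mega_transfers_per_s  # bytes per transfer * MT/s
    return n_cpus * per_cpu


for n in (1, 2, 4):
    print(n, "CPU(s):", aggregate_memory_bandwidth_mb_s(n), "MB/s peak")
```

These are peak figures only, and they assume each CPU mostly works out of its own local RAM; accesses to RAM attached to another CPU have to cross the inter-CPU links.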

I would be very interested in hearing more info regarding the interconnects and buses on SGI-machines. I have SOME information, but there are gaps in my knowledge.

Think of all the various subsystems in an Octane as being arranged in a networked star-topology. The HEART chip on the frontplane of an Octane is the little guy responsible for linking everything together. It allows one part of the system to access any other part at full speed (usually simultaneously with other parts accessing other parts, all without interrupting each other).
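
A toy model of that difference, as a rough sketch only: it assumes every transfer is the same size and takes exactly one time slot, and the port names and traffic pattern are made up for illustration. It says nothing about how the real hardware schedules anything.

```python
def shared_bus_slots(transfers):
    """On a single shared bus only one transfer can run per time slot."""
    return len(transfers)


def crossbar_slots(transfers):
    """Greedy schedule for a crossbar: transfers with no endpoint in common
    share a time slot. Greedy is simple, not guaranteed optimal."""
    slots = []  # each slot is the set of ports busy during that slot
    for src, dst in transfers:
        for busy in slots:
            if src not in busy and dst not in busy:
                busy.update((src, dst))
                break
        else:
            # No existing slot has both ports free, so open a new one.
            slots.append({src, dst})
    return len(slots)


# Hypothetical traffic pattern; the port names are illustrative only.
transfers = [("cpu", "memory"), ("graphics", "io"), ("cpu", "graphics")]
print("shared bus:", shared_bus_slots(transfers), "slots")
print("crossbar:  ", crossbar_slots(transfers), "slots")
```

The first two transfers touch different ports, so the crossbar runs them side by side; only the third has to wait because the CPU and graphics ports are already busy.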

Truly a sickeningly powerful system, but also costly. If PCs had used this setup from Day 1, we'd probably be in a whole different era of computing where even Windows wouldn't seem slow. It's best summed up by the following excerpt between Evil Genius and Robert, one of his henchmen (guess the film):

Code:

Evil Genius: When I have the map, I will be free, and the world will be different, because I have understanding!

Robert: Understanding of what, master?

Evil Genius: Digital watches. And soon I will have understanding of videocassette recorders and car telephones.
And when I have understanding of them, I shall have understanding of computers. And when I have
understanding of computers, I shall be the Supreme Being! God isn't interested in technology. He
knows nothing of the potential of the microchip or the silicon revolution. Look how he spends his
time: forty-three species of parrots! Nipples for men!

Robert: Slugs.

Evil Genius: Slugs! He created slugs! They can't hear, they can't speak, they can't operate machinery. If I
were creating the world, I wouldn't mess about with butterflies and daffodils. I would've
started with lasers, eight o'clock, day one!

Quote:

Think of all the various subsystems in an Octane as being arranged in a networked star-topology.

Yes, I know of that. But if we look at Opteron for example, we will see that each CPU has dedicated RAM. And each CPU is directly connected to the other CPU's as well, using a very fast bus, giving some massive bandwidth. IIRC, the bus on SGI's runs at 200MHz, whereas on Opteron it runs at 1000MHz.

Quote:

The HEART chip on the frontplane of an Octane is the little guy responsible for linking everything together. It allows one part of the system to access any other part at full speed (usually simultaneously with other parts accessing other parts, all without interrupting each other)

Don't modern PC's have something similar these days? And even if they used shared buses, since those buses are so fast, it's still very fast. And then we have the uber-fast RAM, uber-fast cache, uber-fast expansion-buses (16x PCI-Express...). Yes, SGI's are a marvel of engineering. But it seems to me that PC's simply outspec them these days. Yes, their smart engineering has brought them very far. But it can only bring them so far.

Now if SGI (or someone else) took an R16000, moved it to a 0.09um SOI-process, replaced the bus with multiple 1GHz HyperTransport-buses, gave it a fast dedicated memory-controller, overhauled the expansion to PCI-E... It could be done, but only if the will was there. The end-result would be a system with A LOT more bandwidth than they have today.

Quote:

Yes, I know of that. But if we look at Opteron for example, we will see that each CPU has dedicated RAM. And each CPU is directly connected to the other CPU's as well, using a very fast bus, giving some massive bandwidth. IIRC, the bus on SGI's runs at 200MHz, whereas on Opteron it runs at 1000MHz.

Opterons may have a faster pipe, but I believe SGI wins because the "pipes" are enormous. They're slow, but due to the sheer size of data that can be shifted around the system, it still outclasses modern-day systems. The Achilles' heel of an Octane (and an Origin), though, is the Compression Connectors. These little doodads are what allow the transfer of such large amounts of data, but the things are about as fragile as a computer component can get. They're essentially little bristle pads of copper wire that are compressed onto a pad on the frontplane of an Octane. The slightest mistake, even touching one of these things with your fingers, can ruin it.

Although, prolonged soaking in an isopropyl alcohol solution, or for you MIT guys out there, soaking in a graphite bath, will clean the things rather well, since they can be carefully detached from the XIO board they're on.

Evangelion wrote:

Don't modern PC's have something similar these days? And even if they used shared buses, since those buses are so fast, it's still very fast. And then we have the uber-fast RAM, uber-fast cache, uber-fast expansion-buses (16x PCI-Express...). Yes, SGI's are a marvel of engineering. But it seems to me that PC's simply outspec them these days. Yes, their smart engineering has brought them very far. But it can only bring them so far.

Think of it this way (and this might be a little exaggerated): Imagine an Octane as an 8" steel tube. You can roll a golf ball down this tube with relative ease, and with very little effort exerted. Now imagine a PC as a common-variety garden hose. You can't put a golf ball down that with minimal effort. You need a lot of effort, and provided sufficient force (and the assumption that our mythical garden hose will stretch w/o breaking), you could force a golf ball down a common-variety garden hose.

In other words, you can shift the same kinds of data around on a PC nowadays as you can on an Octane, but the Octane can do it much more easily because the hardware was not only designed for this specific purpose, it was fine-tuned for it. Everything from the CPU to the graphics card is designed solely to do one task, and to do it very well, and in the Octane's case, that's being a graphics workstation.

Thanks for all the info; it really supplements a book I started reading about the internals of the MIPS processor design itself (and how to write assembler for it; really detailed, heady stuff). Now I just have to find an Octane2 on eBay for a reasonable price . . . and kick the pants off my Pentium3. ^_^

Quote:

Opterons may have a faster pipe, but I believe SGI wins because the "pipes" are enormous.

Are they? Wikipedia sez:

Quote:

The XIO employs two source-synchronous channels (one in each direction), each 8 or 16 bits wide. They are clocked at 400 MHz to achieve peak rates of 800 MB/s (ie. in megabytes). Each of the devices can utilize the full bandwidth, as the XBow router prevents collisions by being able to route between any two points.

800MB/sec is not a lot. And IIRC the bus from the CPU is 200MHz and it's 64bits wide. HyperTransport on Opterons (for example) is 32bits wide both directions (so it's in effect 64 bits) and it runs at 1000MHz. And they have several of those buses (to each CPU on the system).
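
For anyone who wants to check the arithmetic, here is a rough sketch using only the widths and clocks quoted in this post (several of them IIRC figures, so treat them as unverified). It assumes one transfer per clock cycle, which is a simplification; real links may move data on both clock edges. The function name is made up for illustration.

```python
def peak_mb_per_s(width_bits, clock_mhz, transfers_per_clock=1):
    """Peak one-direction bandwidth in MB/s: bytes per transfer * transfers per second."""
    return width_bits / 8 * clock_mhz * transfers_per_clock


# Widths and clocks as quoted in the post, not independently verified.
links = {
    "XIO channel (16 bits @ 400MHz)": peak_mb_per_s(16, 400),
    "CPU bus (64 bits @ 200MHz)": peak_mb_per_s(64, 200),
    "HyperTransport, one direction (32 bits @ 1000MHz)": peak_mb_per_s(32, 1000),
}
for name, mb_s in links.items():
    print(f"{name}: {mb_s:.0f} MB/s")
```

The 16-bit XIO case comes out at the 800 MB/s from the Wikipedia quote, which suggests the simple width-times-clock model is at least in the right ballpark for that link.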

I hope that I'm not sounding overly negative. I absolutely adore SGI-hardware. But it just seems to me that when actual bandwidth is concerned, SGI isn't competitive anymore. Yes, they have great engineering in there, but PC's seem to have more actual bandwidth, both in the buses, in the RAM, in the expansion and in the cache.

Quote:

They're slow, but due to the sheer size of data that can be shifted around the system, it still outclasses modern-day systems.

But do they, really? It seems to me that modern machines have lots more memory-bandwidth, FSB-bandwidth and expansion-bandwidth. SGI does have smart engineering (like I said), but is it enough? And does the SGI-machine really have more bandwidth?

Quote:

The Achilles' heel of an Octane (and an Origin), though, is the Compression Connectors.

I have heard quite a lot about those connectors.... are there any pics of them available?

Quote:

Think of it this way (and this might be a little exaggerated): Imagine an Octane as an 8" steel tube. You can roll a golf ball down this tube with relative ease, and with very little effort exerted. Now imagine a PC as a common-variety garden hose.

But are modern PC's "garden hoses" anymore? Looking at the buses on modern PC, they seem to have a metric assload of bandwidth available. The CPU-buses are very fast, expansion is very fast, RAM is very fast, cache is very fast (IIRC, the cache on R16000 runs at half the CPU-speed).

Of course, if we are comparing an Octane (for example) to a modern PC, we need to remember that we are comparing a state-of-the-art PC to an old SGI-workstation .... But if we compare them to a Tezro for example, we can still see some strange things: the CPU-bus on the Tezro is quite slow (as in: it has quite little bandwidth) when compared to PC-buses. I don't know about the RAM, but unless it has a 256bit mem-bus, it seems to me that a PC will probably have more memory-bandwidth as well.

But, like I said, the MIPS-landscape COULD be made competitive. How about on-chip full-speed L2-cache? Dual-core? Faster buses and more of them? Metric assload of off-chip L3-cache on 256bit bus (if vid-cards can have 512megs of uber-fast GDDR3-RAM on 256bit bus, why couldn't CPU's have 128MB (for example) of uber-fast L3-cache on similar 256bit bus? Price of the CPU-module would be lower than modern hi-end vid-card would cost).

Quote:

800MB/sec is not a lot. And IIRC the bus from the CPU is 200MHz and it's 64bits wide. HyperTransport on Opterons (for example) is 32bits wide both directions (so it's in effect 64 bits) and it runs at 1000MHz. And they have several of those buses (to each CPU on the system).

Hypertransport is a point-to-point link.

Quote:

I hope that I'm not sounding overly negative. I absolutely adore SGI-hardware. But it just seems to me that when actual bandwidth is concerned, SGI isn't competitive anymore. Yes, they have great engineering in there, but PC's seem to have more actual bandwidth, both in the buses, in the RAM, in the expansion and in the cache.

For practical purposes, this isn't really the case. Otherwise Discreet wouldn't still be selling turnkey Tezros.

Quote:

But, like I said, the MIPS-landscape COULD be made competitive. How about on-chip full-speed L2-cache? Dual-core? Faster buses and more of them? Metric assload of off-chip L3-cache on 256bit bus (if vid-cards can have 512megs of uber-fast GDDR3-RAM on 256bit bus, why couldn't CPU's have 128MB (for example) of uber-fast L3-cache on similar 256bit bus? Price of the CPU-module would be lower than modern hi-end vid-card would cost).

None of these things will ever happen. SGI has completely, totally abandoned IRIX and MIPS.

Quote:

Quote:

800MB/sec is not a lot. And IIRC the bus from the CPU is 200MHz and it's 64bits wide. HyperTransport on Opterons (for example) is 32bits wide both directions (so it's in effect 64 bits) and it runs at 1000MHz. And they have several of those buses (to each CPU on the system).

Hypertransport is a point-to-point link.

Yes it is. One point goes to the northbridge, and other points go to the other CPU's. Is it somehow different in SGI?

Quote:

Quote:

But, like I said, the MIPS-landscape COULD be made competitive. How about on-chip full-speed L2-cache? Dual-core? Faster buses and more of them? Metric assload of off-chip L3-cache on 256bit bus (if vid-cards can have 512megs of uber-fast GDDR3-RAM on 256bit bus, why couldn't CPU's have 128MB (for example) of uber-fast L3-cache on similar 256bit bus? Price of the CPU-module would be lower than modern hi-end vid-card would cost).

None of these things will ever happen. SGI has completely, totally abandoned IRIX and MIPS.

Yes, I know. But the point was that there's nothing in MIPS as such that dooms it to irrelevancy (on computers that is).

Quote:

None of these things will ever happen. SGI has completely, totally abandoned IRIX and MIPS.

Which is sad. It looks like they're slowing down on IRIX releases, two a year now from what I hear. Likely, that will fall to once a year after 2-3 years, then it'll stop at some point in 2010-2015, assuming SGI isn't bought out by some other company before then.

The plus side of this is, we'll hopefully see a quicker drop in price on the highend hardware so hobbyists can get it (hopefully before the damned resellers), and maybe even better, release of all the old company confidential info detailing every little nook and cranny of their systems.

Quote:

Yes it is. One point goes to the northbridge, and other points go to the other CPU's. Is it somehow different in SGI?

Yeah. That's the whole point. It's a crossbar switch. Any two or more components can communicate at full speed without slowing everything else down.

It's a mesh topology.

Quote:

Yes, I know. But the point was that there's nothing in MIPS as such that dooms it to irrelevancy (on computers that is).

High-end MIPS chips don't sell. That's why SGI ended up developing them in the first place. They spun off all of the low-end MIPS stuff a long time ago. MIPS, Inc. makes a hell of a lot more money than SGI does.

For good, ill, or otherwise, commoditization is what makes money.

(P.S.: The next-generation MIPS chip is finished. They got as far as working, full-speed, production-yield engineering samples, then shitcanned it.)

It's much harder to sell a RISC machine running at a lower clock speed than the modern AMD and Intel CPUs, which run at 2 or 3 GHz.

The fastest MIPS chips I've heard of are the Broadcom SB1 and one of the PMC-Sierra series, which were capable of reaching 1GHz.

Quote:

Quote:

Yes it is. One point goes to the northbridge, and other points go to the other CPU's. Is it somehow different in SGI?

Yeah. That's the whole point. It's a crossbar switch. Any two or more components can communicate at full speed without slowing everything else down.

Um, isn't that the way AMD's work? The northbridge has individual links to each CPU, so each CPU has a dedicated pipe to the northbridge. Compare that to Intel for example where the CPU's share one bus between them. And each CPU has a dedicated bus to each of the other CPU's as well. AFAIK, SGI-systems did not; they all talked to each other through the northbridge, correct?

So I fail to see the difference between "Crossbar switch" and "dedicated hypertransport-links between different parts of the system". When the northbridge communicates with CPU 1 in an AMD-system, it does not slow down the northbridge's communication with CPU 2, since they each have a dedicated bus. And when CPU 1 accesses its RAM, it does not slow other CPU's down, since every CPU has a dedicated bank of RAM at its disposal (and they can also access the RAM on other CPU's). And when CPU 1 talks to CPU 2, it does not slow down the communication between CPU 3 and the northbridge (for example).