
SLiDERPiMP writes: "Yahoo! News is reporting that HP created an 'off-the-shelf' supercomputer, using 256 e-pc's (blech!). What they ended up with is the 'I-Cluster,' a Mandrake Linux-powered [Mandrake, baby ;) ] cluster of 225 PCs that has benchmarked its way into the list of the top 500 most powerful computers in the world. Go over there to check out the full article. It's a good read. Should I worry that practically anyone can now build a supercomputer? Speaking of which, anyone wanna loan me $210,000?" Clusters may be old hat nowadays, but the interesting thing about this one is the degree of customization that HP and France's National Institute for Research in Computer Science did to each machine to make this cluster -- namely, none.

Actually, the best MIPS/Wh is probably with the slower versions of the current laptop chips. Maybe portable G3/G4?

Also, I don't think you'd get much useful work done with early Pentiums and 486s. Consider that a P4 at 2 GHz has 20 times the clock speed and probably does twice as much per cycle, so it's ~40x faster. Now, if you connect 40 P100s together, unless your problem is completely parallel (like breaking keys, as opposed to most linear algebra), you're going to lose at least a factor of 2 there. This means that in order to equal one P4 at 2 GHz, you'd need almost 100 Pentium 100 MHz machines, and that 10 P4s would be like a thousand Pentiums. At those numbers, it's going to cost so much in networking and power...

I'd say (pure opinion here) the slowest you'd want today is something like a Duron 1 GHz, and the best MIPS/$ is probably a bunch of dual Athlon 1.4 GHz boxes (a dual isn't cheaper than two singles, but you get more done because of parallelism issues).

My experience doesn't suggest that the P4 does twice as much per cycle. I'm seeing P4s do a fair bit less than the P3 per cycle, and the P3, P2, and PPro cores didn't seem *that* much faster per clock than the original Pentiums. My gut tells me that the P4 doesn't do any more than the original Pentiums per clock cycle, and the only thing they have going for them is Intel's ability to manufacture them at high clock speeds.

If you really want a CPU that does a lot in a single cycle, look at the IBM POWER series. IIRC, on the floating point side, a 2xx MHz POWER3 is not too far from an Alpha 21264 at 733 MHz. And now there are 1.1 GHz and 1.3 GHz POWER4 chips in the new IBM p690 machines. I don't know how they compare to the POWER3 per cycle, though, because the POWER4 opens a whole new (good) can of worms.

I'm seeing P4s do a fair bit less than the P3 per cycle, and the P3, P2, and PPro cores didn't seem *that* much faster per clock than the original Pentiums.

For "unoptimized" applications, that may be approximately true. However, if you're going to build a cluster, you're also going to optimize your code for it. What kills the P4 is branch misprediction. However, by carefully writing your code, you can avoid most of these problems. Also, most of the big clusters are for numerical code, for which branch prediction does well (plus you can do lots of loop unrolling).

Another thing that the P4 (and PIII) has is SSE. On a P4 2 GHz, you can theoretically do 8 gflops. In practice, if you write good code, you'll get between 1 and 2 gflops. On a plain Pentium, the FPU is not pipelined, so a P100 (I'm guessing here) probably has a theoretical maximum of ~25 mflops, with real-code performance around ~10 mflops. That means the P4 is probably 100-200 times faster at floating point than a P100.

Of course, you're right in saying that other architectures are probably faster than the P4... and by the way, I'm not saying the P4 is great... but if you're doing numerical stuff and using SSE, it's VERY fast (in my experience, 3DNow! has been faster than SSE at the same clock rate, but 2 GHz is much higher than the fastest Athlon).

My experience with optimizing research code for a particular platform: it will never happen, unless you're starting from scratch and expect your code to have a short lifetime. We've got libraries that are several years old, written to be portable across various unix and Windows platforms, running on MIPS, Alpha, x86, SPARC and PA-RISC. These libraries aren't optimized for any particular platform, and nobody has time to mess with platform-specific optimizations.

I've never tried optimizing for SSE, but someone in the lab did once. He reported higher performance doing his computations element-at-a-time than vector-at-a-time. His conclusion, for his particular application, was that memory latency was killing SSE. He was better off doing lots of work on a few numbers than doing fancy stuff to a lot of numbers with SSE. On the other hand, some people have had luck with SSE-optimized FFTs, or so I've heard.

At 2 GHz, I'll bet you're better off doing element computations than vector computations because of the radical difference between memory and processor performance, if the P4's L1 takes more than a cycle to feed the registers. Otherwise, do whatever fits in L1 and can be prefetched -- like elementwise computations on long vectors. Anyone have any real or anecdotal evidence to refute or support this theory?

In the end I think that platform-specific optimizations are a waste of time for research code. I seem to remember some people eventually including hooks in BLAS or LAPACK to let the user specify cache sizes, and FFTW does some runtime experiments for optimization. But my guess is that, overall, SSE, 3DNow!, and even AltiVec are irrelevant to most computing researchers. I'll bet they're highly relevant to most embedded engineers and many robotics researchers, i.e. people targeting specific hardware.

I've never tried optimizing for SSE, but someone in the lab did once. He reported higher performance doing his computations element-at-a-time than vector-at-a-time. His conclusion, for his particular application, was that memory latency was killing SSE.

Maybe his problem was really special, but most likely he didn't know how to write SSE code. First, if you write your code correctly, the worst you can do is a bit better than the x87, because you can use SSE with scalar instructions and take advantage of the flat register model (as opposed to the x87's stack).

The only time I've converted some code to SSE, I got a 2-3x improvement in speed. There is one thing you really have to be careful about when writing SSE code: ALIGN EVERYTHING TO 16 BYTES. That's very important. "movaps" (move aligned packed single) is more than twice as fast as "movups" on unaligned data (movups on aligned data is not too bad). That makes all the difference. Also, sometimes you need to change the way you do your computations.

In the case I had (optimizing a speech recognition engine), just by changing the order of the loops (inner vs. outer), we got a 2-3x improvement (still with x87) because of the increased L1 hit rate. Then, switching to SSE, it ended up a 5x improvement over the original code. Had I just rewritten the code in SSE (without the cache optimization), the gain would have been around 25%, because memory would still be the bottleneck.

As for libraries not being optimized, just look at FFTW (www.fftw.org) or ATLAS (www.netlib.org/atlas) and you might change your mind.

On the other hand, for a weather simulation, I would bet on the cluster.

No way. Weather simulation involves lots of linear algebra on huge matrices. When you parallelize that, you need a lot of communication between the nodes. With a cluster of P100s, communication will kill you right there (a 10x or so penalty). It's not so much the network bandwidth as the latency. Weather simulation is one of the hardest problems to parallelize, and that's why, until recently, SMP was preferred over MPP (and, of course, over clusters of small workstations).

As for the memory bandwidth, depending on the problem, sometimes the L1 is really effective.

Actually, I worry that people like Saddam Hussein might get ahold of enough parts and knowledge to build a system specifically for modeling atomic weapons designs. He tried it before when he attempted to get hundreds of PS2s when they first came out.

Actually, the worry about the PS2 machines was that their imaging capabilities were strong enough to be used in missile guidance systems. I think he never actually attempted to get any of them, but the US blocked shipments to Iraq just in case.

It is true that the first nukes were developed without "a Beowulf or a Cray"- BUT, to develop good nukes without doing lots of tests (and the U.S. led the world in sheer number of tests) you need fast computers. To develop small nukes you definitely need fast computers... Hence the paranoia over supercomputers in the Wrong Hands(TM).

Or terrorists could use the Athlon cluster for a more insidious plan. Simply plug them in, turn them on and with the juice they draw, black out any major metropolitan area of their choosing, oh the horror! *g*

>Should I worry that practically anyone can now build a supercomputer?

Unless "practically anyone" has the funds, the storage room, and the manpower to maintain this monstrosity, there is nothing to worry about.

It's only 256 machines; how hard could it be to steal 256 computers? Rip off a couple of Gateway (are they still in business?) and Dell trucks, maybe rob a few schools. Now that you've gotten (by fair means or foul) a supercomputer that is rated in the top 500, what would you like to do with it? How about any brute-force crack you feel like (it's not like you need neat algorithms with that much firepower)? How about generating insane encryption levels that the FBI and CIA can't crack?

Of course, 256 people with laptops and wireless modems, all running Mandrake, sitting in the train station solving tough mathematical problems would just annoy people....

Well, it seems like super clusters are becoming very easy to build hardware-wise. If you throw enough commodity hardware at a problem, it becomes easier. I would think the biggest problem with supercomputers is no longer the hardware itself, but the networking and the programming needed to take advantage of the hardware. These computers still only really work for something that distributes easily. The biggest factors now are the ability to distribute and schedule work for each node. The more nodes you engage, the more you hope your problem is CPU-bound, so it will scale further.

Data transfer and message passing are such a big issue that I believe the most important developments are in the networking topologies and hardware for these environments.

How powerful standard desktop computers are. There are only two orders of magnitude between a normal desktop computer (I refuse to call a Pentium III 733 outdated) and a mainframe computer.

Now all we need are ways of making local connections significantly faster (did someone say Gigabit Ethernet?) to allow faster communication between the nodes, and we will be able to scale beyond several hundred nodes and break the top 100. I hear gigabit NICs will be falling to under $100 US retail soon...

No... What really is needed is not more bandwidth, but lower latency (more bandwidth IS nice, however)... That way we could have hard real-time distributed supercomputers. Wouldn't mind one or two of those myself...

Well, here I thought we were staying with commodity hardware, but if we are going to go beyond commodity hardware into stuff that is engineered to provide low latency and high bandwidth, let's look at technologies like InfiniBand [infinibandta.org], which is engineered to avoid almost all OS latencies by delivering data directly to applications from hardware with OS bypass (almost no software between your app and the hardware anymore); at 2 Gb/sec it isn't too bad on the performance scale. However, I am betting that when this stuff comes out it will be a little pricey and go slightly over the $210,000 budget.

The problem with Ethernet in clustering isn't the bandwidth, it's the latency.

The real issue is how parallel-efficient your algorithms are. We do molecular dynamics (MD) on large clusters, and we can get away with slow networks because each node of the cluster has data that is relatively independent of all other nodes - only neighboring nodes must communicate. If you have a case (and most cases are like this) where every node must communicate to every other node, it becomes a more difficult problem to manage. To deal with this, you need a high-speed, low-latency switch like the interconnects in a Cray. The only real choice for that is a crossbar switch, like Myrinet.

I'm curious. Given that neighbor communications are the most important, how do you network the machines? I mean, it seems a useful design might be to put extra network cards in all the machines, to make overlapping network topologies reminiscent of the physical dependencies.

Yes, you should probably worry that practically anyone can build a supercomputer. But you could mitigate all that fear with the fact that not practically anyone can whip up software that takes full advantage of it.

Thank god there isn't any off-the-shelf "missile trajectory" software in the CDW catalog. You would hope that any society that can whip together motivated coders to write such code already has access to some pretty spiffy kit.

(yeah i said "kit"... and I'm from Chicago... I feel like such a wanker.)

As we all know, "kit" is a british slang term for computer hardware. What many people may not know is that it is also the secret weapon in a British campaign of cultural assimilation.

Yes, you heard me right. Cultural assimilation. The brits are sick of seeing Mickey Mouse and Donald Duck and the sexy chick from Enterprise on TVs all over the world, and they're going to do something about it.

The British invented the English language, and in many circles certain British accents are perceived as more sophisticated or upper-class. They're capitalizing on that by inventing slang terms - "kit" being among the forerunners - that other English-speaking peoples appropriate. Thus it begins.

Soon, British TV will move off of PBS, where it belongs. British computer games and hardware will surpass American in popularity. And there is nothing - absolutely nothing - we can do about it.

(In case you hadn't realized it, yes this is a joke. And yes, I know it's offtopic and will be moderated as such. But this was fun to write.)

No it isn't. That's just the only context in which you've heard it used (translation: you read too much Slashdot, and should get out more often). "Kit" is the British equivalent of the American "rig" when used in this context. It is not used specifically to refer to computers.

I refer to "the british" meaning the residents of the island now known as England, not in the sense of citizens of the modern political entity. I probably shouldn't have done that, but hey - it wasn't meant to be accurate anyway.

I have not had the chance to play with Beowulf clusters at all. Do I still get a local desktop on the clustered computers?

The ultimate Linux selling tool: every Linux box in your company is a node in a cluster. Add a few servers for extra speed, add a few computers to provide file I/O and backup capability, and you have one of the fastest supercomputers available to your company without having to spend an extra dime (everyone needs a desktop anyway). Can you imagine the extra cycles available for simulation and whatever else when people start going home at 5 PM?

Absolutely none, I would hope... The dataset resides on a centrally managed server, and because they are running a Linux desktop, I get to laugh at what a trojan horse virus can do to a user account on a Unix box. This can also be removed as a problem by putting a keyboard, mouse, and monitor on the desktop and locking the PC into a cabinet under each desk... What the user can't touch, the user can't screw up.

That is a serious problem, though, and one I assume Beowulf clusters will take care of: what if a node goes down in the middle of processing? How does the cluster respond to it?

Software such as Platform's [platform.com] LSF takes care of this magically... it even allows for checkpointing, assuming your task allows it. Because my render software didn't really do checkpointing, I had to add that into my wrapper code.

We do use desktops at night to work with our render farm. Platform has some cool tools to work with for such environments. I have never tried LSF in conjunction with PVM or MPI but they have support for it, so I imagine it does pretty well.

You'd think a top computer manufacturer would be able to beat out a part-time dictator from the third-world in gigaflops, but I'm thinking it was more a demonstration by HP that they're getting set to embrace Linux and shelve HP-UX.

They mention in the end that they are working with Microsoft to support this approach. They also suggest using spare cycles. Unlike SETI@home, where you download some stuff, work on it, send it back, this appears to be a system where the power scales linearly with nodes.

Windows support makes a difference. Take a large company (10,000+ in a single location) that has some intensive projects. In this case, they could just drop the $210,000 (call it $750,000 with installation, support, etc.) and put it in a room.

However, a smaller shop, say 50-250 employees, could just install this software on the staff's machines. They rarely use their computers to capacity, and could probably contribute 90% of the CPU 90% of the time. This approach could let people doing giant calculations do so cheaply.

The real question, however, is who needs that kind of horsepower. And for those who need the horsepower, are the savings from off-the-shelf components meaningful?

It's a tremendous accomplishment, and I wonder how much of the changes were new (vs. the Beowulf clusters we always hear about). But if this fills a need, congratulations; it's an impressive accomplishment regardless.

Uh, I was doing this in the early 90's (as were many others). The idea of using idle cycles from your workstations is beyond old. Is it somehow newsworthy because HP did it? The article makes it sound like a revelation. I'm willing to bet what I was doing was more sophisticated. My processes would relocate themselves whenever a regular user logged in and would even save the system state to prevent any lost work. Hmmm... sounds like a nasty virus! And while I'm at it, Beowulf was nothing more than rehash as well. How far back does PVM date? Guess it is just because the name sounds cool.

Kinda depends on what you do for a living, doesn't it? I'd like to see you predict the weather accurately (well, I guess I should say "as accurately as the standard weather forecast") over the coming week or two without a supercomputer. Or design an aerodynamic car. Or an airplane. Or a new medical drug. Or a spaceship. Or any number of defense/military applications. There is plenty that a supercomputer can do that people need on a day-to-day basis. You just have to be in the right line of work. (And by the way, a supercomputer *can* do everything a desktop machine could do; it'd just be rather pointless to use a machine that big as a desktop...)

The individual machines that made up the I-Cluster are now out of date, each running on 733MHz Pentium III processors with 256MB of RAM and a 15GB hard drive. HP introduced a faster version at the beginning of this month and will launch a Pentium 4 e-PC by the end of the year.

this kind of hardware is out of date? unless i'm mistaken HP markets these e-PCs toward home users looking for light processing power, such as the ability to view web pages, read emails, and play solitaire. this looks more like a power-user rig, or something a gamer would have as a decent Q3A machine. how in the world could this hardware be obsolete? i guess i should replace the pentium III 933 i'm running because lord knows it just won't hold up to today's high powered apps! man it's almost a year old, i should start worrying...

It's probably out of date because processors of that speed are either already unavailable or will be shortly. They could presumably underclock, but it makes more sense to just tweak the model number slightly.

What I'd like to see is a shot at a distributed supercomputer cluster utilizing the spare cpu cycles of computers on high-speed internet connections (cable or DSL). Since efficiency would be remarkably degraded by slow communication times and the fact that many of these computers would be running Office (ahem), you'd have to scale up at least one order of magnitude.

Technically I can't see why this wouldn't be feasible. It would be beyond SETI and protein folding in that the 'control center' could change what problem was being worked on at any time. It may not be incredibly practical compared to setting up specific machines in a single large room, but it would be free and have a potential user base in the hundreds of thousands or millions.

Imagine: instead of the same SETI screen output time and again, you'd get a message on your SS saying "would you like to see what your computer is working on right now? How about high-pressure fluid dynamics in environment x?"

Actually, this is exactly the project I've been working towards for the last 6 months. Granted, I'm not an expert in the arena of parallel algorithms, but I've done some reading and have some legitimate exposure to load balancing complex jobs across workstations in my current employment environment.

That being said, my approach has been to design a C/C++ like language that intrinsically understands multithreading and remote processing, but has a minimal standard library, and is by default intrinsically secure by design. People running the client software can allow default security privileges for anyone to run under, or can set up specific execution profiles for certain people or programs, controlling cpu load and even restricting the hours they share out their machine, etc. Basically, think SETI@Home, except with any program that might be written in the language automatically taking advantage of all available clients.

Trust me, there are a _lot_ of technological wrinkles to iron out in the language to make it safe to run on another person's machine without authorization, as this is the heart and soul of the project. Rather than reinvent the distributed processing technology for many different purposes and try to get many people to run several clients, it's better to have a single client that works for all sorts of problems. I think the time is now to prepare to capitalize on spare processing power. As we plunge further into the information age of cheap hardware and don't seem to require much more power to do the same tasks (unless you run Windows XP), the ratio of spare cycles to used ones will only increase over time.

The topology of a network of distributed clients over (potentially) slow connections makes it undesirable for solving some kinds of problems, though. What it's _not_ good for are things like converting video files from one format to another, like the HP I-Cluster is. Making it good at such tasks means slackening the security aspect a bit... And even then, slow transfers can invalidate even a fast remote CPU.

Of course, I'm always looking for funding. I'd love to make this a full time job in the future. Know anybody with loose purse strings? Didn't think so.:-/

I wasn't able to get hard facts about this, so I'm going to throw out the question for general "gee whiz" value.

I was pondering the computrons per watt of a cluster such as this versus a real honest-to-Bob supercomputer (something from Cray/Tera/SGI, for example). We can assume that each machine in HP's cluster uses probably 60-80 watts (since they're sans monitor), so you're looking at roughly 15 to 20 kilowatts to power the whole thing. I'm not sure what a Cray T3E [cray.com] uses, but I have to think it's nowhere near that, because of all the redundancy that PC clusters carry (one power supply, chipset, etc. per node).

Though I'm sure that if you can afford either a Cray or 256 PCs, you can afford the power bills too. If you have to ask how much it will cost, you can't afford it. But while a CIP (Cluster of Inexpensive PCs) is cheaper, is it as efficient?

Regarding efficiency, in a word: no. These systems, much like an OS/390, are written and built around a whole different theory of architecture. They are also often custom-made for specific solutions. The system for 'missile guidance simulations' would have different cards, different chips, and way different software from the system for 'global thermonuclear warfare simulations'.

With an efficient cooling system and network-booting PCs (no hard drives, no moving parts except fans), you have a bank of systems that is not only incredibly cheap and off-the-shelf, but efficient as well.

What about using a bunch of those 1U dual-Athlon rackmount boxes for this? Seems like it would reduce the overall footprint considerably, as well as easily doubling (if not tripling) the power. Comments, anyone?

Hey, remember all those completely and hopelessly out-of-work Russian PhD CS grads sitting around starving and writing strong crypto software for the Russian Mafia? You might even have heard that the Russian Mafia is always looking to explore new business ideas and strategies.

Well, hell, wouldn't this be a great business opportunity for both of them? Call it RMBM (Russian Mafia Business Machines), and then build cheap super-clusters and turnkey code for "specialized" clients. The possibilities are endless.

This is where you get them now: support. You sell them the machines at a 25% markup and then charge a ridiculous annual service agreement.

From the presentation: "Using "borrowed" post-CCCP Mi-8TV assault/commando choppers, RMBM support staff can be deployed to your corner of the desert in a matter of hours! Let's see IBM match that! Not even Larry Ellison and his personal MiG can touch that! (canned laugh track)"

Anyway... I just got a job working with the NCSA as part of a project called OSCAR [sf.net]. It's basically an open-source solution to the problem of creating a cluster. I'm part of the team working on documentation and training materials for people trying to implement OSCAR. I can say, from my own experience, that installing a cluster (even only with 4 PCs) is not a simple task. OSCAR is still young as far as software development goes, but it will do the job well once it is finished.

Maybe I've got a different view on things, but I find it pretty funny when a guy gets all excited about calling a Mandrake cluster a supercomputer, then gives a blech! when he announces they used e-pc's. Does it matter?

So maybe by using e-pc's the peak performance went down some, but anytime you tie anything together in a clustered environment the sustained performance dies (not just takes a hit) too. Only way to make it hit close to peak is assign each node a process that doesn't require any interaction between processes/nodes. In that case, you wouldn't need to tie them together to make a top-500 cluster. Just assign some IP's, cross mount their filesystems and call it a morning.

Besides, government agencies (and scientific companies) are beginning to realize that when you cluster 500 boxes together, you're still administering 500 boxes. When real supercomputer companies make real supercomputers, you've only got to administer one computer. Maybe that's why the term "Supercomputer" isn't plural.

If clusters keep being called supercomputers, you might as well call "the internet" a supercomputer too, since it's an environment where lots of computers are connected and running processes that don't depend on one another. "Wow! Look at the sustained power of all 5000 Counter-Strike servers out there! It's a super-counter-strike-computer!"

I'm a student at a vocational college in New Hampshire, and I'd like to get a Beowulf cluster set up here. The obvious question, though, is how do I convince the local administration to go for it? Any suggestions? I'm thinking of having it be on a donation basis, although of course any support the school can give will be appreciated. ;D I'd deeply and sincerely appreciate any suggestions.
Dave

You know this Beowulf business is getting to be pretty staid and routine by now.

In fact, I'd almost say it would be newsworthy if there were any organization (university, company, govt lab) that had not yet built "a supercomputer from the COTS components".

What I'd like to see now is more metrics (some of which the article does, admittedly, reveal).

hardware cost per FLOP (everyone already tells you this)

FLOPS per human time to build

FLOPS per sysadmin time to maintain

FLOPS per kilowatt of electricity

FLOPS per cubic foot of rack space

can it run smoothly if Bad Andy goes behind the rack and unplugs a few network connections, a few power cords to some nodes?

Everyone knows that you can spend your own time scouring dumpsters for cast-off computers and coaxing them to life, bringing up an old 486 with an ISA 10bT card as a member of your cluster. Unless you're doing it for your own educational benefit, it's just not worth it.

Don't get me wrong. I love these clusters and want to use them. It's just that, in 2001, their mere existence is no longer as exciting as it was in the mid-1990s.

Nowadays, I care more about ease of use and ease of maintenance, taking the low cost of a Beowulf cluster as a given.

With the size of these clusters going up and the ratio of hardware cost to human time constantly decreasing, I'd be more impressed to see how a system with many hundreds of nodes was brought up in a short time and never rebooted for a year, even as 13 of the nodes developed various problems and became unproductive members of the cluster.

A more daunting task might be taking the model to a consumer environment, which, as Richard pointed out, is full of often-dormant processors like those in printers and DVD players.

HP imagines "clouds" of devices, or "virtual entities," which could discover and use the resources around a user.

Anyone else reminded of A Deepness in the Sky by Vernor Vinge? In that story, IIRC, one of the protagonists controlled a ubiquitous cloud of invisible compute "particles". Each particle was networked to the rest through its neighbors floating in the air around it.

I fail to see what is impressive about this. It looks like the wheel reinvented several times.
For cluster installs on several machines, use SystemImager [systemimager.org].
For using and controlling a cluster of machines for various tasks, use LSF [platform.com].

The number of machines is pathetic too... 225 @ 733 MHz? That only makes it to #325? How sad. I need to benchmark our render farm (200+ boxes, 120 of them dual 1 GHz) and see what we come up with. I know it would be higher than that... and ours is a smaller install for the industry.

I looked for info to spec our machines, but I couldn't find anything... any help?

Since you can now build a supercomputer with off-the-shelf stuff (well, I'm sure you could before today as well), will that make software like this (read: Beowulf) fall under export laws preventing exportation to countries like Iraq and China?

No, a beowulf cluster is the last thing that one would use for nuclear simulation.

While great at highly parallel tasks that require very little synchronization between threads (think code cracking), nuclear testing (and almost all other fluid dynamic problems) generally requires all of the cpu's to have high speed access to all of the memory. So one needs a huge shared memory system (think Cray or Sun StarCat).

And for this reason, I find the Top 500 list to be a bit misleading in these days of massively parallel systems. It's great as a test of how many flops the system can crank out, but it does not take into account the memory bandwidth between the CPUs, and that is often more important than raw CPU horsepower.

I find too that people assume that an "X" type of cluster will solve all problems, regardless of what they are. Each cluster type serves a purpose. Cray and then SGI spent time developing the Cray Link for a reason. Sun, IBM, HP and others have gotten into the game as well. Sometimes you need a ton of procs with access to the same memory, sometimes the task divides well.

I see this from almost the opposite side of the spectrum with rendering. To render a shot, you can divide the task amongst ignorant machines. They just need to know what they are working on. The cleverness goes into the management of these machines. A place where the massively parallel machines would be nice is rendering a single frame. After the renderer's initial load of the scene file and prepping for render, the task can be divided amongst many processors on the same machine. To divide it Beowulf-style would throttle the network with the memory sharing over the ethernet ports.
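The render-farm model described here -- ignorant workers, clever manager -- amounts to dealing whole frames out of a queue, since no frame needs another frame's memory. A minimal sketch (the node names are made up):

```python
from collections import deque

def assign_frames(frames, workers):
    """Deal frames round-robin to workers; each worker renders its
    frames independently, so no state is shared between machines."""
    queue = deque(frames)
    assignments = {w: [] for w in workers}
    while queue:
        for w in workers:
            if not queue:
                break
            assignments[w].append(queue.popleft())
    return assignments

jobs = assign_frames(range(1, 11), ["node01", "node02", "node03"])
print(jobs)  # node01 gets frames 1, 4, 7, 10; node02 gets 2, 5, 8; etc.
```

Splitting a single frame across machines, by contrast, would mean shipping scene geometry and intermediate buffers over the wire on every sample, which is exactly the traffic pattern ethernet chokes on.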

Yes, we all saw the Apple ads for the G4 being capable of 1 GFLOP. What you didn't see was that the Pentium III 500 was capable of ~2 GFLOPs. Now those run at 1 GHz. You also didn't see that AMD's Athlon, having a superscalar FPU, is faster than a P3. And those now run at 1.6 GHz. The P4 has new instructions to speed up certain types of multimedia processing as well. By contrast, the G4 is only now approaching 1 GHz. Go figure (as you Americans say..:o)
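Worth remembering that all of these marketing numbers are theoretical peaks: clock rate times floating-point results per cycle, which SIMD units inflate. A sketch of the arithmetic (the ops-per-cycle figures are illustrative assumptions, not vendor specs):

```python
def peak_gflops(clock_mhz, flops_per_cycle):
    """Theoretical peak throughput: clock * FP results per cycle."""
    return clock_mhz * flops_per_cycle / 1000.0

# A 500 MHz part retiring 4 single-precision ops/cycle via SIMD:
print(peak_gflops(500, 4))   # 2.0 GFLOPS peak

# The same clock with only 2 ops/cycle is the famous "1 gigaflop":
print(peak_gflops(500, 2))   # 1.0 GFLOPS peak
```

Sustained throughput on real code is far below these peaks, which is why memory bandwidth matters more than the ad copy.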

An Apple is not a supercomputer.

RISC does not mean faster. It allows for a simpler design which can lead to increased speed, but as we have seen, Apple have consistently failed to compete with Intel and AMD (not that they even make their own chips...). CISC is actually a good idea, since with the huge speed differential between CPU and memory, and the introduction of cache, the bottleneck in any system is memory bandwidth. Think for a moment: why did Intel add instructions to the x86 architecture in every iteration? Because it's faster to have one instruction do something complex than many simple ones, simply because of the reduced frequency of memory accesses. In today's computers, RISC doesn't mean anything, since memory, storage and network bandwidth are the bottleneck.
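The fetch-bandwidth argument can be put in numbers: a dense encoding trades fewer, fatter instructions against more, fixed-size ones. The byte counts below are invented for illustration, not real ISA figures:

```python
def fetch_bytes(n_ops, bytes_per_insn, insns_per_op):
    """Instruction bytes fetched to carry out n_ops logical operations."""
    return n_ops * bytes_per_insn * insns_per_op

# CISC-style: one variable-length (say 6-byte) instruction per operation.
cisc = fetch_bytes(1_000_000, 6, 1)
# RISC-style: fixed 4-byte instructions, but three of them per operation.
risc = fetch_bytes(1_000_000, 4, 3)
print(cisc, risc)  # 6000000 vs 12000000 bytes of instruction fetch
```

In practice instruction caches absorb most of this traffic on-chip, which is part of why the argument is weaker today than it sounds.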

The moral of this story:
1: Don't believe Apple's advertising.
2: Don't believe what a Mac Zealot will tell you about RISC or some other claptrap.
3: Get ppc Mandrake if you're unfortunate enough to have actually bought a G4.

Yes, I use Macs. Daily. And I hate Apple. But my PHB is a Mac zealot. It frightens the hell out of me seeing all our company's work being stored on a Mac (OS9 (no pre-emption, memory protection, RAID, journalling, or anything you would want for a server...)).

While I agree with much of what flegged said, his/her post implies that modern Intel/AMD CPUs -are- largely CISC devices. This simply isn't the case. Both (the AMD moreso) make heavy use of RISC-type design and technique.

Nonsense. It is not the superiority of CISC, it is the superiority of AMD and Intel. The opposite is also nonsense: CPUs have not directly executed their instruction sets for the last 20 years or so; there is always a layer that translates to simpler operations. It is not the architecture that counts -- after all, every current CPU is RISC in a sense -- it is the implementation.

True. Every computer interprets instructions, right the way down to the simplest level of switching a gate on or off.
The difference between RISC and CISC is the instruction set exposed to the programmer (compiler, interpreter, whatever), and thus the number of memory accesses needed to implement something complex. But if you were to believe Apple advertising (which anyone who actually buys Apples must do...), you would think fewer instructions is better.
I was simply demonstrating that anything Apple tell their lusers is believed, whether it's true, fabricated, or just hiding certain facts (like the relative floating point performance of the G4).

I bet you HP and many other tech companies have people who called the government telling them they should bomb "enemy" computers because they are "weapons systems." With this cluster, we see this justification could apply to any computer whatsoever.

Then, the US gets tired of bombing, and HP sells them new machines. Soon thereafter, we decide their new "good" dictator is just as bad as their old "bad" dictator, and the cycle begins again.

So the question then is, is this good for Open Source computing? I mean, this gives us dollar metrics like the IDG and other measurement people want, but the end work product couldn't be described as beneficial, so it's really not that good that this happens.

Of course, the same could be argued about a Win2K or WinXP hacked clone - but the utility in solving nuclear equations and modeling explosions is not as high.

We know that they have #3, this HP open source supercomputer may give them #2, now they only have to pick up #1 - maybe Pakistan or the Taliban have such and will sell them to raise cash or create more problems for us.

I honestly doubt that terrorist organizations would go through all the trouble of sending people off to get educated in physics, having them build a supercluster and compose simulation software, draw up and test some designs, etc. Much more likely, they'll buy finished bombs, or at least pre-tested blueprints.

This might be different for a country like Iraq, which already has many educated physicists and a realistic chance of actually doing all this work from scratch. Of course, the IAEA does random inspections there all the time, but maybe they could "disguise" their number-crunching supercomputer as 256 separate workstation terminals for all the government clerks who write email. By night, though, it's Linux time.

You're right about the missing material, but I'm sure someone somewhere will be willing to sell 10kg of it... (the more people we bomb, the more likely that seems).