Posted by Unknown Lamer on Tuesday May 24, 2011 @06:29PM
from the runs-quake-at-ten-billion-fps dept.

An anonymous reader writes "Supercomputer giant Cray has lifted the lid on its first GPU offering, bringing it into the realm of top supers like the Chinese Tianhe-1A."
The machine consists of racks of blades, each holding eight GPU/CPU pairs (blades that can even be installed into older machines). It looks like Cray delayed releasing GPU hardware in order to build a higher-level programming environment than is available from other vendors.

Physics simulations involving discretized partial differential equations can make any machine less powerful than a Matrioshka Brain cry uncle. If some of my optimizations work out, I'll be able to get near-slideshow framerates on a 512x256 2D simulation of a single, ideal, purely hydrodynamic fluid using NVIDIA's top-of-the-line C2050 GPU.
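
To give a feel for where the cycles go, here's a minimal sketch in plain C (not the actual GPU code; the diffusion-style update and coefficient are illustrative stand-ins for a real hydro solver's per-cell work):

    #include <stddef.h>

    #define NX 512
    #define NY 256

    /* One explicit time step of a toy stencil update on the 2D grid.
     * A real hydro solver does far more per cell (fluxes, equation of
     * state, etc.); this just shows the O(NX*NY) sweep that has to run
     * thousands of times per simulated second. */
    static void step(const double src[NX][NY], double dst[NX][NY], double c)
    {
        for (size_t i = 1; i < NX - 1; i++)
            for (size_t j = 1; j < NY - 1; j++)
                dst[i][j] = src[i][j] + c * (src[i-1][j] + src[i+1][j]
                                           + src[i][j-1] + src[i][j+1]
                                           - 4.0 * src[i][j]);
    }

    int main(void)
    {
        static double u[NX][NY], v[NX][NY];  /* static: keep off the stack */
        step(u, v, 0.1);
        return 0;
    }

Going to a thousand cells per side in 3D multiplies the cell count by roughly 7,600 and adds a third loop, which is where the "ten thousand times larger" below comes from.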

Now consider that I'd prefer to have at least a thousand cells per side in all 3 dimensions, which makes the problem roughly ten thousand times larger, and preferably several thousand.

A Beowulf cluster of Beowulf clusters is not a Beowulf cluster; it's a multidimensional Beowulf cluster. Likewise, a BOINC of Beowulf clusters, or a "jagged Beowulf cluster," is not just a Beowulf cluster.

You'd want to make a MOSIX cluster of Beowulf clusters, so as to allow for each cluster to appear as a node without any conflicts. To make it 3D, you'd use a Kerrighed cluster of MOSIX clusters of Beowulf clusters.

I did some rough calculations regarding NICS's Kraken Cray XT5 and bitcoin mining. FYI, the Kraken was the 8th-fastest supercomputer in November 2010. I determined that if the supercomputer put all of its resources into mining bitcoins, it could generate 1,511.61 BTC per day (or about $8,450.53/day). Granted, the Kraken has just regular CPUs doing the calculations. I can only imagine what a Cray supercomputer with GPUs in it would be capable of...

Uh, no you couldn't. The rate of bitcoin creation is fixed (about 50 BTC per 10 minutes, for now). If you add more computational power, the system adjusts and it becomes proportionally harder to generate them, so the global rate stays stable.

So despite the hundred-thousand-fold increase in mining difficulty over the past 15 months, the network continuously self-adjusts to issue one block of bitcoins about every 10 minutes. The difficulty increase is caused entirely by users competing with one another for those blocks.
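
For the record, the adjustment rule itself is tiny; a sketch in C (the 2016-block window, 10-minute target, and 4x clamp are Bitcoin's actual protocol parameters):

    #include <stdio.h>

    /* Bitcoin retargets difficulty every 2016 blocks so that, on average,
     * one block is found every 600 seconds. If the last window was mined
     * faster than expected, difficulty rises proportionally, clamped to a
     * factor of 4 per adjustment. */
    static double retarget(double difficulty, double actual_seconds)
    {
        const double expected = 2016.0 * 600.0;  /* about two weeks */
        double ratio = expected / actual_seconds;
        if (ratio > 4.0)  ratio = 4.0;           /* protocol clamp */
        if (ratio < 0.25) ratio = 0.25;
        return difficulty * ratio;
    }

    int main(void)
    {
        /* e.g. a big miner joins and the window takes half as long */
        printf("%.2f\n", retarget(100.0, 2016.0 * 300.0));  /* -> 200.00 */
        return 0;
    }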

Ok, yes, but the difficulty would increase for everyone mining as well. Last I checked, the entire bitcoin network had a mining strength of 1,747 Ghash/s. The Kraken alone has about 367 Ghash/s; that's 21% of the entire network. With all that power coming into the network at once, you're still bound to make a TON of bitcoins, because you're essentially taking a substantial portion of the blocks from other miners. I did neglect to factor in the scaling of difficulty (and that's why I said it was a rough calculation).
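
The arithmetic behind that 1,511 BTC/day figure is just a proportional share of the block reward; a quick sketch (hashrates as quoted in this comment, 50 BTC per block and ~144 blocks/day as of 2011, and ignoring the difficulty rise the Kraken itself would trigger):

    #include <stdio.h>

    int main(void)
    {
        /* Figures quoted in this thread (mid-2011) */
        const double network_ghash = 1747.0;        /* whole network, GH/s */
        const double kraken_ghash  = 367.0;         /* the Kraken's rate */
        const double btc_per_day   = 144.0 * 50.0;  /* 6 blocks/hr * 50 BTC */

        /* Expected yield is proportional to your share of total hashpower. */
        double share = kraken_ghash / network_ghash;
        printf("share = %.1f%%, yield = %.2f BTC/day\n",
               100.0 * share, share * btc_per_day);  /* ~21%, ~1512 BTC/day */
        return 0;
    }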

That means with ten of these you could have the majority of the compute power: enough to earn majority trust, poison the bitcoin system with your own fake transaction records, and wipe everybody's funny-money into oblivion. I'm sure the NSA has enough compute power to do that today if they pointed it at bitcoin instead of your emails for a day. The FBI would probably pay as much attention to a bad actor crashing a fake currency as they would to someone hacking your WoW and selling your Traveler's

You need to join a pool. Acting alone, at the rate processing power is currently being added (with all the pump-and-dump slashspam), you likely won't win a 50-coin fabulous prize even if you leave your computer running for four years.

The Kraken reportedly consumes about 2.8 megawatts of power, so assuming your figures are accurate, the power alone would cost about $6,720/day (at $0.10/kWh), for a "profit" of about $1,730/day. Factor in that it's a $30 million machine with a very short usable lifespan (i.e., massive depreciation), and they'd be losing a ridiculous amount of money.
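
Putting the whole back-of-the-envelope together, with all figures as quoted in this thread and $0.10/kWh assumed:

    #include <stdio.h>

    int main(void)
    {
        const double megawatts   = 2.8;      /* the Kraken's reported draw */
        const double usd_per_kwh = 0.10;     /* assumed electricity price */
        const double revenue     = 8450.53;  /* BTC yield * price, per above */

        double kwh_per_day = megawatts * 1000.0 * 24.0;  /* 67,200 kWh */
        double power_cost  = kwh_per_day * usd_per_kwh;  /* ~$6,720/day */
        printf("power: $%.2f/day, net: $%.2f/day\n",
               power_cost, revenue - power_cost);
        /* ...before even touching depreciation on a $30M machine. */
        return 0;
    }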

The fact that writing C and Fortran code with a message-passing library constitutes a "high-level programming environment" is a complete indictment of the sad state of parallel programming today.
Seriously, do you want to be programming complex parallel algorithms on HPC machines using Soviet-era technology? I've tried it, and it made me want to jump out a window. Programming in this kind of environment is about as easy as programming an FPGA (hint: it's a pain in the ass).

You're forgetting things like PGAS and other higher-level parallel programming models. MPI is the dominant technology in use, so these machines have to support it well, but they also support more forward-looking tools.

Really, you've tried it and it made you want to jump out of a window? OpenMP is an extremely simple, easy-to-use add-on to the C language. It is one of the two current standards for parallelized scientific computing, and although it will eventually be succeeded by a language with more features, its successor will find it difficult to match OpenMP's ease and workmanlike grace.

I honestly have trouble believing someone could have much difficulty with it. If you want the work in a "for" loop parallelized, you just put a single pragma above the loop.
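
A minimal sketch of what that looks like in practice (standard OpenMP in C; compile with something like -fopenmp):

    #include <stdio.h>
    #include <omp.h>

    int main(void)
    {
        enum { N = 1000000 };
        static double a[N], b[N];   /* static: too big for the stack */

        for (int i = 0; i < N; i++)
            b[i] = i;

        /* One pragma is all it takes: iterations are split across threads. */
        #pragma omp parallel for
        for (int i = 0; i < N; i++)
            a[i] = 2.0 * b[i] + 1.0;

        printf("a[42] = %f (up to %d threads)\n", a[42], omp_get_max_threads());
        return 0;
    }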

Maybe so, but the comment to which you replied, and with which you disagreed, was specifically about "using a message passing library." That's MPI, not OpenMP. It's like responding to someone saying "I don't like spam!" with "But grilled cheese sandwiches are so much tastier when you put ham on them, so clearly you're wrong!" Your statement may be technically correct, but as a response to the topic at hand, it is in error. :-)

Yes, and it works just fine; the only issues would be if I got my data placement wrong via poor layout/blocking, or if I neglected to upc_memget something I needed intense access to but for some reason couldn't make local in the initial layout. Neither of those requires much forethought at all to avoid.
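
For anyone who hasn't seen UPC, here's a minimal sketch of the pattern being described: a blocked shared array plus an explicit upc_memget into private memory (the block size and array names are illustrative, not from any particular code):

    #include <upc.h>
    #include <stdio.h>

    #define B 256   /* block size per thread (illustrative) */

    /* Block the shared array so each thread owns one contiguous chunk;
     * get this layout right up front and most accesses stay local. */
    shared [B] double grid[B * THREADS];

    int main(void)
    {
        double local[B];   /* private memory */

        upc_forall (int i = 0; i < B * THREADS; i++; &grid[i])
            grid[i] = i;   /* each thread initializes its own block */
        upc_barrier;

        /* Bulk-copy a neighbor's block into private memory ahead of a
         * computation that needs intense access to it. */
        int neighbor = (MYTHREAD + 1) % THREADS;
        upc_memget(local, &grid[neighbor * B], B * sizeof(double));

        printf("thread %d fetched thread %d's block; first = %g\n",
               MYTHREAD, neighbor, local[0]);
        return 0;
    }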

It depends on the task. Some of them are not complex. Some things are so embarrassingly parallel that you just tell the first node to apply a function to the first lot of data, feed the next lot to the next node, and so on, then concatenate the results together at the end. There's a lot of stuff in geophysics like that, for example: apply filter X to twenty million traces (where a trace is just like an audio track). You could do that with twenty million processor cores if you had them.
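
The pattern is literally scatter, map, gather; a sketch in C with MPI (the trace length, traces per rank, and the filter itself are placeholders):

    #include <mpi.h>
    #include <stdio.h>
    #include <stdlib.h>

    #define TRACE_LEN 512  /* samples per trace (placeholder) */
    #define PER_RANK  4    /* traces handled by each rank (placeholder) */

    /* Stand-in for "filter X": in real life a bandpass, decon, etc. */
    static void filter(float *trace, int n)
    {
        for (int i = 0; i < n; i++)
            trace[i] *= 0.5f;
    }

    int main(int argc, char **argv)
    {
        MPI_Init(&argc, &argv);
        int rank, size;
        MPI_Comm_rank(MPI_COMM_WORLD, &rank);
        MPI_Comm_size(MPI_COMM_WORLD, &size);

        /* The root would read the survey from disk; zeros stand in here. */
        float *all = NULL;
        float mine[PER_RANK * TRACE_LEN];
        if (rank == 0)
            all = calloc((size_t)size * PER_RANK * TRACE_LEN, sizeof(float));

        /* Scatter: hand each rank its own traces... */
        MPI_Scatter(all, PER_RANK * TRACE_LEN, MPI_FLOAT,
                    mine, PER_RANK * TRACE_LEN, MPI_FLOAT, 0, MPI_COMM_WORLD);

        /* ...map: every rank filters independently, zero communication... */
        for (int t = 0; t < PER_RANK; t++)
            filter(&mine[t * TRACE_LEN], TRACE_LEN);

        /* ...gather: concatenate the results back on the root. */
        MPI_Gather(mine, PER_RANK * TRACE_LEN, MPI_FLOAT,
                   all, PER_RANK * TRACE_LEN, MPI_FLOAT, 0, MPI_COMM_WORLD);

        if (rank == 0) {
            printf("filtered %d traces\n", size * PER_RANK);
            free(all);
        }
        MPI_Finalize();
        return 0;
    }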

Nothing but a complete money pit, if we have no actual experiments to run on them that require that kind of scale.

And we have plenty, the big ones off the top of my head being nuclear weapons work (as we've replaced live tests entirely with computer simulations), protein folding, climate modelling, and signals intelligence processing. I'm sure others without your childishly narrow experience of the world can think of more.

So I guess the second part of the question is: have the HPC libraries been ported yet? I have heard one of the big reasons Fortran is still so popular is its large collection of highly optimized HPC libraries. The other reason is that Fortran is supposed to be really easy to optimize, which I can believe.
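
Those libraries are a big part of it; the decades of tuned Fortran keep paying off even from other languages. A minimal sketch of calling the classic Fortran BLAS dgemm from C (the trailing-underscore linkage is the common convention, though not guaranteed by every compiler; link with -lblas):

    #include <stdio.h>

    /* Fortran BLAS routine: C = alpha*op(A)*op(B) + beta*C.
     * Fortran passes everything by reference and stores column-major. */
    extern void dgemm_(const char *transa, const char *transb,
                       const int *m, const int *n, const int *k,
                       const double *alpha, const double *a, const int *lda,
                       const double *b, const int *ldb,
                       const double *beta, double *c, const int *ldc);

    int main(void)
    {
        int n = 2;
        double alpha = 1.0, beta = 0.0;
        /* Column-major 2x2 matrices */
        double a[] = {1, 3, 2, 4};   /* A = [[1,2],[3,4]] */
        double b[] = {5, 7, 6, 8};   /* B = [[5,6],[7,8]] */
        double c[4];

        dgemm_("N", "N", &n, &n, &n, &alpha, a, &n, b, &n, &beta, c, &n);
        printf("C = [[%g,%g],[%g,%g]]\n", c[0], c[2], c[1], c[3]);
        /* prints [[19,22],[43,50]] */
        return 0;
    }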

Uh, Cray already has machines in service that blow Tianhe-1A out of the water on real science. Tianhe-1A doesn't even exist anymore. It was a publicity stunt. Cray is already making the top supers. It's others that have to catch up.