SethJohnson writes "Thanks to a $59 million National Science Foundation grant, there's likely to be a new king of the high-performance computing Top 500 list. The contender is Ranger, a behemoth of 15,744 quad-core AMD Opterons built by Sun and hosted at the University of Texas. Its peak processing power of 504 teraflops will be shared among over 500 researchers working across the even larger TeraGrid system. Although its expected lifespan is just four years, Ranger will provide 500 million processor hours to projects attempting to address societal grand challenges such as global climate change, water resource management, new energy sources, natural disasters, new materials and manufacturing processes, tissue and organ engineering, patient-specific medical therapies, and drug design."
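For what it's worth, the summary's numbers are roughly self-consistent: 15,744 quad-core sockets is 62,976 cores, and at 4 double-precision FLOPs per core per cycle at 2.0 GHz (my assumptions for Barcelona, not figures from the article) that works out to about 504 teraflops. A back-of-envelope sketch:

    #include <stdio.h>

    /* Back-of-envelope check of the 504 TFLOPS figure. Assumes the
     * 15,744 number counts quad-core sockets, and that each core
     * retires 4 double-precision FLOPs per cycle at 2.0 GHz --
     * assumptions on my part, not figures from the article. */
    int main(void) {
        long long cores = 15744LL * 4;          /* 62,976 cores */
        double gflops_per_core = 2.0 * 4.0;     /* GHz x FLOPs/cycle = 8 GFLOPS */
        double peak_tflops = cores * gflops_per_core / 1000.0;
        printf("%lld cores -> %.1f TFLOPS peak\n", cores, peak_tflops);
        return 0;
    }

That prints "62976 cores -> 503.8 TFLOPS peak", which rounds to the quoted 504.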

If I had mod points, I'd mod this insightful, not funny. There are a lot of HPC projects that were planning to use Barcelona and were held back by the TLB bug. I'm sure anything approaching this magnitude already had a contract with AMD that included guaranteed delivery dates and penalties, either directly or through the OEM. If you don't have a signed contract with AMD, or with someone who has one with AMD, you're going to have to wait in line.

Explanation: the claim that "a computer is so fast it runs an infinite loop in X seconds" is actually true. Integers overflow: if you increment past the largest positive value, you get a negative number, and the loop condition fails. Of course, this program uses 32-bit integers; running it in 64 bits would take about four billion times longer.
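For the curious, a minimal C sketch of what's being described (the loop shape is my illustration, not any particular program from the thread):

    #include <stdint.h>
    #include <stdio.h>

    int main(void) {
        /* Looks infinite, but a 32-bit signed counter wraps past
         * INT32_MAX to a negative value, so the condition eventually
         * fails. (Signed overflow is technically undefined behavior in
         * C; on common two's-complement hardware it wraps as described.
         * The volatile keeps the compiler from deleting the loop.) */
        volatile int32_t i;
        for (i = 0; i >= 0; i++)
            ;
        printf("\"infinite\" loop ended at i = %d\n", i);  /* INT32_MIN */

        /* A 64-bit counter would need 2^63 iterations instead of 2^31:
         * a factor of 2^32, i.e. roughly four billion times longer. */
        return 0;
    }

On a modern machine the 32-bit version finishes in seconds; hence the joke.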

There are a lot of variables here. Sadly, I've just spent the last 5-10 minutes of my life considering them all and was writing up a post about it. Then I realised that sometimes I take jokes waaaaay too seriously.

Couldn't see details, but this may use Sun's HyperTransport switch as an interconnect. Until Intel's next generation of chips with QPI arrives, you can't do that sort of interconnect with Intel processors. Admittedly, though, I'm not convinced it's a significant enough benefit over recent InfiniBand solutions, despite the penalty of going through an InfiniBand chip and then a PCI Express controller. Even with the L3 errata straightened out, it still looks to be a rough road for AMD, who hasn't demonstrated...

Power and cooling penalties? Are you looking at the same spec sheets I am? Because I'm seeing better performance per watt out of Barcelona systems than out of Intel quad-core Xeon-based systems. Most of it has to do with the fact that Intel uses power-sucking FB-DIMMs, but that's a design tradeoff that Intel made.

I'm saying that to get the same number of FLOPS while using price/performance instead of straightforward performance, you must increase the node count. Even if the processor performance/watt *were* better on AMD's side (I believe it's on Intel's side for the moment with the 45 nm process: a 2.33 GHz quad-core Xeon comes in a 50 W TDP variant, for example, while Barcelona comes in at 95 W TDP, which goes a long way toward offsetting the FB-DIMM power), you'd still have to worry about additional AC power supply inefficiencies, the general power draw of the extra motherboards, and so on.
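To make that concrete, here's a rough sketch plugging in the TDP numbers from this thread; the clock speeds and the 4-FLOPs-per-cycle-per-core figure are my own assumptions, and real perf/watt also depends on memory (FB-DIMM draw), PSU efficiency, node count, and so on:

    #include <stdio.h>

    /* Peak GFLOPS for one socket: GHz x cores x FLOPs/cycle. */
    static double peak_gflops(double ghz, int cores, int flops_per_cycle) {
        return ghz * cores * flops_per_cycle;
    }

    int main(void) {
        double xeon = peak_gflops(2.33, 4, 4);      /* ~37.3 GFLOPS, 50 W TDP */
        double barcelona = peak_gflops(2.0, 4, 4);  /* ~32.0 GFLOPS, 95 W TDP */
        printf("Xeon 2.33:     %.1f GFLOPS / 50 W = %.2f GFLOPS/W\n",
               xeon, xeon / 50.0);
        printf("Barcelona 2.0: %.1f GFLOPS / 95 W = %.2f GFLOPS/W\n",
               barcelona, barcelona / 95.0);
        return 0;
    }

On chip TDP alone that comes out roughly 0.75 vs. 0.34 GFLOPS/W, which is the grandparent's point; the parent's counterpoint is that system-level numbers (FB-DIMMs included) tell a different story.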

There already is a faster computer. Ranger's peak is over 90 TF slower than BG/L at LLNL so unless there are some amazing efficiency breakthroughs (doubtful), it isn't going to be number one in June. That, of course, doesn't mean BG/L will still be number one either.
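For reference, here's the arithmetic behind that claim, using the November 2007 Top500 figures for BG/L as I remember them (treat them as approximate):

    #include <stdio.h>

    int main(void) {
        double bgl_rpeak    = 596.0;  /* TFLOPS, BG/L peak (approximate)      */
        double bgl_rmax     = 478.0;  /* TFLOPS, BG/L sustained Linpack       */
        double ranger_rpeak = 504.0;  /* TFLOPS, Ranger peak from the summary */

        printf("Peak gap: %.0f TFLOPS\n", bgl_rpeak - ranger_rpeak);
        /* Ranger would need ~95% Linpack efficiency just to match BG/L's
         * sustained number; large commodity clusters typically come in
         * far lower, hence "amazing efficiency breakthroughs". */
        printf("Efficiency Ranger would need: %.0f%%\n",
               100.0 * bgl_rmax / ranger_rpeak);
        return 0;
    }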

...How is this different from a regular supercomputer? Supercomputers are already parallel; this just adds a second layer of parallelism. Competing supercomputers have always had different economics per "processor hour"; multicore CPUs are just one technique among many for increasing the processing power of the machine.

And you could *TRY* to build a 15,744 single-core machine and claim the same performance, but it would all fall apart very, very quickly when someone asks "how many FLOPS?" (which is what computing power is actually measured in).

The 4-year lifespan in the /. article refers to the amount of time the award money covers operating costs. So if it finds some other means of operational funding, it could live longer... of course, those funds would probably come from a private organization, and Ranger would no longer be open for research.

Within four years the performance/watt ratio will have fallen well behind the state of the art, so it would make very little sense to keep the thing taking up valuable machine room space and the working hours of the technical staff. It happens with all supercomputing machines; it's just Moore's law in practice. What I think is still a big problem is that there are still many problems getting the hardware to work correctly in parallel. Often half a year or longer is lost debugging file system/network issues, which is a considerable cost.

Well, I was talking from personal experience ;) Not at TACC, though. Debugging such issues on these systems is always very difficult. There are a lot of users, all with different programs, and it might just be a bug in a user's program. And then there are a lot of incompatibilities between the various compilers and network systems (InfiniBand, etc.).

I guess the field is just too fast-moving to force hardware/software standards on it, but its bugginess does cost a lot of money and computing time.

Seriously though, this money comes from an NSF grant earmarked specifically for this project. We get these kinds of complaints from other departments, and especially from undergrad editorials in the student newspaper. Unfortunately, the budget from the football team won't be used to renovate the social work buildings, either.

With that many cores, they will need to find new energy sources just to power it, and re-think water resource management as they redirect the river to cool the thing and to prevent it from causing global climate change itself!

Yes, it's going to use some power...but I would compare this to teaching a man how to fish rather than handing him one. This energy will be an investment in all sorts of efficiencies in the future. That is, if the computing time is allocated efficiently.

Actually, UT Austin has its own natural-gas-fired power plant. As of 12 years ago, it made enough power to sell the excess back to the city. It also had fairly impressive uptime, with only one blackout during its entire operational lifetime. Unless UT's power requirements have grown out of control (there has been A LOT of construction since I graduated), I suspect there will be little trouble running the computer cluster.

AMD has been trying to build a plant in Austin for years now. This is most likely a deal deeply connected with finally breaking ground on a plant people have been protesting for YEARS due to its location (near watersheds, etc.).

I was at TACC a few weeks ago, and the peak performance quoted was around 519 teraflops....
Sadly, they also said the word on the street is that IBM won't take too kindly to the new king in town, and since the TOP500 list is published biannually, everyone is biting their nails about Blue Gene getting a quick upgrade in time to stay on top.
Turns out the Blue Gene systems are so scalable it's quite easy to strap on a few thousand new processors for a nice performance boost.

BlueGene/L at LLNL already peaks significantly faster than Ranger. The only question is whether Ranger can get a =sustained= number that passes BG/L, and considering the difference in peak performance, it's unlikely. June is going to be an interesting list, as there could be quite a bit of shuffle at the top. You are also correct about the scalability of BG. If you look at last June's list and last November's list, you'll see a big difference in performance for BG/L. That's entirely due to simply adding more processors.

Since TFA says this hardware will last only four years, what typically happens to these supercomputers built out of so-called commodity hardware? Is Sun going to donate/resell this gear? Or does it end up on the scrap heap? Surely these Sun blades are supposed to have a useful lifespan greater than four years. Sun could probably give these blades to every elementary school in all of Texas. Is the future of supercomputing really disposable computing?

Probably this means the system is funded for 4 years. About 2 years in, they will try to renew their grants. If they do, they will probably upgrade the system at about the 3-year mark with a mid-life kicker. This would probably mean new blades using 8-core Opterons and 2-4 times as much memory. They might even get a second kicker if AMD comes out with a socket-compatible upgrade. After that, it's probably time for a forklift upgrade. This cycle is fairly typical for HPTC systems, and a 6-year total lifespan is pretty common.

It sounds like they don't need faster supercomputers; instead, they need to narrow the number of areas they are crunching away at. Why not pick the top 2 or 3 issues and crunch away at those, instead of running 20 jobs, all of which will hardly get anywhere in the four years this supercomputer has to live?

They've put quite a bit of thought into how to efficiently allocate time on a 60 million dollar compute cluster, I can assure you. If some jobs get time, it's because non-trivial progress can be made on those things in the time they've been allocated.

What does this polluting plant have to do with anything? I can't see how they would have decided to build a 59 million dollar supercomputer to placate protesters who were protesting against something completely irrelevant to supercomputers. (To answer your question: after 4 years the computer loses public funding, but doesn't cease operation.)

Some fraction of this machine was originally supposed to be in production in May of last year (a requirement of the original request for proposals), but as far as I know it wasn't even accessible to friendly users until some time last fall. I don't understand how TACC, Sun, and/or AMD avoided getting hit with penalties from the NSF.

From "The Hitchhiker's Guide to the Galaxy" by Douglas AdamsChapter 25There are of course many problems connected with life, of which some of the most popular are Why are people born? Why do they die? Why do they want to spend so much of the intervening time wearing digital watches?

Many many millions of years ago a race of hyperintelligent pan-dimensional beings (whose physical manifestation in their own pan-dimensional universe is not dissimilar to our own) got so fed up with the constant bickering about the meaning of life...

OK, the end of the skit from Hitchhiker's Guide:

Chapter 26

"Yes, very salutary," said Arthur, after Slartibartfast had related the salient points of the story to him, "but I don't understand what all this has got to do with the Earth and mice and things."

"That is but the first half of the story Earthman," said the old man. "If you would care to discover what happened seven and a half millions later, on the great day of the Answer, allow me to invite you to my study where you can experience the events yoursel

Yes. This is how science works. What you and your whiny brethren call "doctrine", scientists tend to call "accepted theories". You have always had to argue a whole lot more if you're on the other team, and you always will. You have a whole lot of people to convince, for a start.

Who should get precedence: a medical researcher who is trying to prove that HIV does not cause AIDS, or a biological chemist who is looking for a cure for cancer?

Both Opteron and Phenom were at the same B2 stepping, complete with the same L3 errata, despite the different packaging. That's why you haven't seen a tier-one vendor touch the Opterons with a 10-foot pole for a generally available product. You can bet your ass this is the reason AMD released the kernel patch: so 'some customer' could proceed with a Linux Opteron deployment on B2 parts without the performance penalty or the risk of the L3 errata.

This deployment is probably where AMD focused a firesale of B2 parts, since it's nice and well controlled.