Posted
by
timothy
on Tuesday September 11, 2012 @10:01AM
from the plus-a-fun-museum dept.

1sockchuck writes "Three of the world's most powerful supercomputers live in adjacent aisles within a single data center at Oak Ridge National Laboratory in Tennessee. Inside this facility, technicians are busy installing new GPUs into the Jaguar supercomputer, the final step in its transformation into a more powerful system that will be known as Titan. The Oak Ridge team expects the GPU-accelerated machine to reach 20 petaflops, which should make it the fastest supercomputer in the Top 500. Data Center Knowledge has a story and photos looking at this unique facility, which also houses the Kraken machine from the University of Tennessee and NOAA's Gaea supercomputer."

(Oh, for God's sake, I clicked the "post anonymously" button 25 minutes after my last comment and it says I didn't wait long enough. Good way to spoil a joke... No, I'm not really Dr. Smoot. Posting under my real name is the only way to post this. Are you on slashdot staff, Dr. Cooper?)

Yeah, WTF does that mean, "...which should make it the fastest supercomputer in the Top 500"? If it's the fastest, then it's the fastest regardless of the size of the list it's compared against (e.g. fastest of the top 1,000,000 or fastest of the top 2), so it's meaningless to include that unless the goal was to confuse people.

It's more correct to say it's the fastest on the list, than the fastest in the world. There are any number of metrics you can use to compare supercomputers. Top 500 just uses the most popular metric. Another machine could easily be the fastest on a different list, like http://www.graph500.org/ [graph500.org].


The other specific consideration is that the list is ONLY for those that volunteer to run the Linpack benchmark and wish to publicize the results. It is presumed that governments with classified computing facilities withhold this information, for obvious reasons, so there are likely many "supercomputers" (perhaps even a "fastest") that will never be part of the Top 500. The US NSA, for example, is widely believed to operate facilities at or near the top of the list, but they are nowhere in sight.

At first I thought this was redundant, but then I wondered if there are faster supercomputers that simply are not independently verified to be in the top 500 supercomputers. Anyone have any more info, or am I just overthinking this?

Different benchmarks will produce a different fastest supercomputer list. 'Top 500' is a specific list that uses a specific benchmark, a benchmark that this particular machine is currently at the top of. Using a different benchmark could be just as valid and produce a completely different list.
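To make that concrete, here is a minimal Python sketch. The machine names are real, but the scores are illustrative, not actual benchmark results; the point is only that two metrics can crown two different "fastest" machines:

```python
# Illustrative (made-up) scores, to show that "fastest" depends entirely
# on which benchmark defines the list.
machines = {
    "Titan":   {"linpack_pflops": 20.0, "graph500_gteps": 15.0},
    "Sequoia": {"linpack_pflops": 16.3, "graph500_gteps": 23.0},
    "K":       {"linpack_pflops": 10.5, "graph500_gteps": 17.0},
    "Mira":    {"linpack_pflops":  8.2, "graph500_gteps": 14.0},
}

def fastest(metric):
    # Rank the same machines by a chosen metric.
    return max(machines, key=lambda m: machines[m][metric])

print(fastest("linpack_pflops"))  # the "Top 500"-style winner: Titan
print(fastest("graph500_gteps"))  # a Graph 500-style winner: Sequoia
```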

I went to Oak Ridge my sophomore year. I didn't go for computing, but since my guide knew that I like computing, he took me to look at the supercomputers. It's this huge room, visible through glass windows, that looks essentially like a huge clean, white office floor with all the cubicles removed and the supercomputers in their place instead.

At that time (2009?) I heard it wasn't really the fastest supercomputer but it's awesome to hear they're revving it up to that. If I didn't hat

I really, really wish articles would stop saying that computer X has Y GFLOPS. It's almost meaningless, because when you're dealing with that much CPU power, the real challenge is to make the communications topology match the computational topology. That is, you need the physical structure of the computer to be very similar to the structure of the problem you are working on. If you're doing parallel processing (and of course you are, for systems like this), then you need to be able to break your problem into chunks, and map each chunk to a processor. Some problems are more easily divided into chunks than other problems. (Go read up on the "parallel dwarfs" for a description of how things can be divided up, if you're curious.)

I'll drill into an example. If you're doing a problem that can be spatially decomposed (fluid dynamics, molecular dynamics, etc.), then you can map regions of space to different processors. Then you run your simulation by having all the processors run for X time period (on your simulated timescale). At the end of the time period, each processor sends its results to its neighbors, and possibly to "far" neighbors if the forces exceed some threshold. In the worst case, every processor has to send a message to every other processor. Then, you run the simulation for the next time chunk. Depending on your data set, you may spend *FAR* more time sending the intermediate results between all the different processors than you do actually running the simulation. That's what I mean by matching the physical topology to the computational topology. In a system where the communications cost dominates the computation cost, then adding more processors usually doesn't help you *at all*, or can even slow down the entire system even more. So it's really meaningless to say "my cluster can do 500 GFLOPS", unless you are talking about the time that is actually spent doing productive simulation, not just time wasted waiting for communication.
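A minimal serial sketch of that pattern in Python, standing in for what would really be MPI ranks on separate nodes. The problem (1-D diffusion), the sizes, and the update rule are all hypothetical; the shape to notice is the two phases per timestep, exchange ghost cells, then compute on owned cells, which is exactly where the communication-vs-computation balance gets decided:

```python
# 1-D spatial decomposition with ghost-cell ("halo") exchange, run serially
# here for clarity. On a real machine each chunk would be a separate rank
# and the exchange phase would be actual network traffic.
N_CELLS, N_RANKS, STEPS, ALPHA = 12, 4, 100, 0.1
CHUNK = N_CELLS // N_RANKS

# Start with a single spike of "heat" in the middle of the domain.
field = [1.0 if i == N_CELLS // 2 else 0.0 for i in range(N_CELLS)]
# Each "rank" owns CHUNK cells plus one ghost cell on each side.
ranks = [[0.0] + field[r * CHUNK:(r + 1) * CHUNK] + [0.0]
         for r in range(N_RANKS)]

for _ in range(STEPS):
    # Communication phase: copy neighbours' boundary cells into our ghosts.
    for r in range(N_RANKS):
        ranks[r][0] = ranks[r - 1][-2] if r > 0 else 0.0
        ranks[r][-1] = ranks[r + 1][1] if r < N_RANKS - 1 else 0.0
    # Computation phase: explicit diffusion update on owned cells only.
    for r in range(N_RANKS):
        old = ranks[r][:]
        for i in range(1, CHUNK + 1):
            ranks[r][i] = old[i] + ALPHA * (old[i - 1] - 2 * old[i] + old[i + 1])

# Reassemble the global field (drop the ghost cells).
field = [cell for r in ranks for cell in r[1:-1]]
print(max(field))  # the initial spike has diffused outward
```

On a real machine the communication phase is where all the time can go: if every rank must talk to every other rank each step, the network, not the FLOPS, sets the pace.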

Here's a (somewhat dumb) analogy. Let's say a Formula 1 race car can do a nominal 250 MPH. (The real number doesn't matter.) If you had 1000 F1 cars lined up, side by side, then how fast can you go? You're not going 250,000 MPH, that's for sure.

I'm not saying that this is not a real advance in supercomputing. What I am saying is that you cannot measure the performance of any supercomputer with a single GFLOPS number. It's not an apples-to-apples comparison unless you really are working on the exact same problem (like molecular dynamics). And in that case, you need some unit of measurement that is specific to that kind of problem. Maybe for molecular dynamics you could quantify the number of atoms being simulated, the average bond count, and the length of time in every "tick" (the simulation time unit). THEN you could talk about how many of that unit your system can do, per second, rather than a meaningless number like GFLOPS.
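A sketch of such a problem-specific metric, "atom-steps per second" for molecular dynamics. The function name and every number here are made up for illustration; the only point is that the unit counts useful simulation work, not raw arithmetic:

```python
# Hypothetical problem-specific throughput metric for molecular dynamics:
# atom-steps completed per wall-clock second, instead of GFLOPS.
def atom_steps_per_sec(n_atoms, n_steps, wall_seconds):
    # Total useful simulation work done, divided by real time taken.
    return n_atoms * n_steps / wall_seconds

# Two hypothetical runs of the *same* problem on different machines:
a = atom_steps_per_sec(n_atoms=1_000_000, n_steps=10_000, wall_seconds=500.0)
b = atom_steps_per_sec(n_atoms=1_000_000, n_steps=10_000, wall_seconds=800.0)
print(a > b)  # machine A did more useful simulation work per second
```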


I don't remember reading any stipulation about the nature of where the cars are placed.

I guess my point is that with ideal conditions, i.e. an infinitely wide and straight track (plus no mechanical failures, infinite tyres, come on we all understand the meaning of the words "ideal conditions"), then collectively 1000 cars cover the same distance as one car going 1000 times the speed.

Comparing LINPACK numbers makes sense. But GFLOPS (or TFLOPS or PFLOPS or whatever), by itself, is a meaningless and misleading number. Most people just stop thinking when they see a single metric like GFLOPS, and then they compare GFLOPS in one system to GFLOPS in another system. If those systems *are* comparable, then fine. But often enough, they are not comparable.


Considering that computational fluid dynamics, molecular dynamics, etc., break down into linear algebra operations, I'd say that the FLOPS count on a LINPACK benchmark is probably the best single metric available. In massively parallel CFD, we don't match the physical topology to the computational topology, because we don't (usually) build the physical topology. But I can and do match the computational topology to the physical one.
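A toy version of that measurement in Python, using the standard 2/3·n³ + 2n² operation count for solving a dense linear system. This is only a sketch of the idea; the real benchmark (HPL) is a heavily tuned parallel code, and pure Python will report comically low numbers:

```python
import random
import time

# LINPACK-style measurement sketch: time a dense solve of Ax = b, then
# convert wall time to FLOPS using the nominal operation count.
def solve(a, b):
    """Gaussian elimination with partial pivoting, then back substitution."""
    n = len(b)
    for k in range(n):
        # Partial pivoting: bring the largest remaining entry to the diagonal.
        p = max(range(k, n), key=lambda i: abs(a[i][k]))
        a[k], a[p] = a[p], a[k]
        b[k], b[p] = b[p], b[k]
        for i in range(k + 1, n):
            m = a[i][k] / a[k][k]
            for j in range(k, n):
                a[i][j] -= m * a[k][j]
            b[i] -= m * b[k]
    x = [0.0] * n
    for i in range(n - 1, -1, -1):
        s = sum(a[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (b[i] - s) / a[i][i]
    return x

random.seed(42)  # reproducible random system
n = 120
a = [[random.random() for _ in range(n)] for _ in range(n)]
b = [random.random() for _ in range(n)]

t0 = time.perf_counter()
x = solve([row[:] for row in a], b[:])  # solve on copies; keep a, b intact
elapsed = time.perf_counter() - t0

flops = (2 / 3) * n**3 + 2 * n**2  # nominal LINPACK operation count
print(f"{flops / elapsed / 1e9:.6f} GFLOPS (pure Python, so very low)")
```

Pointing the same stopwatch at numpy.linalg.solve (BLAS/LAPACK underneath) on the same machine would report orders of magnitude more FLOPS, which is the point: the number measures the implementation and the hardware together, not the problem.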

Yes, many problems can be expressed as dense linear algebra, and so measuring and comparing LINPACK perf for these makes sense for those problems. However, many problems don't map well to dense linear algebra. The Berkeley "parallel dwarfs" paper expresses this idea better than I ever could: http://view.eecs.berkeley.edu/wiki/Dwarfs [berkeley.edu]


Sure, but as far as I've seen, linear algebra problems dominate the runtime of these very large systems. That's what I use them for.

At least the first 6 on that dwarfs list are done daily on top500 machines. I write parallel spectral methods, and use structured and unstructured grids. Achieving high scaling on these on massively parallel machines is not at all what I would call an open problem (as far as correctly using a given network for a problem, or designing a network for a given problem). For any give

The US still has these Big Science centers left over from the glory years. There's Oak Ridge, Los Alamos, and the Lawrence Livermore Senior Activity Center (er, "stockpile stewardship"), plus the NASA centers. Their original missions (designing bombs, sending people to the Moon) are long gone, but nobody turned off the money, so they keep looking for something, anything, to justify the pork.

The atomic centers are all located in the middle of nowhere. This was originally done for good reasons - their exist

Umm, what is this "post-nuke" era? There's a reason they have huge computing capability - the nukes haven't gone away; we just don't talk about them anymore. And they don't just sit around gathering dust; they must be carefully maintained, and a huge amount of computing power is expended in "improving" them. And you may rest assured that the nuclear establishment is developing new tactical and strategic nuclear weapons for specialized applications, again using vast amounts of computing power.

Please do your homework first. While the supercomputers at Lawrence Livermore, Los Alamos, and Sandia National Laboratories are primarily used for nuclear weapons work, the work of keeping the country's huge stockpile safe and reliable is a gigantic job, especially if you don't want to actually detonate any of the warheads. Yep, that's the trick. Simulate the ENTIRE weapon, from high explosive initiation all the way to final weapon delivery. With all of the hydrodynamics, chemistry, materials science, nucle

While it's certainly fascinating to hear about the machine itself, it's easy to forget part of why it exists: simulating destruction. Oak Ridge itself was built for the Manhattan Project, if you recall.

As someone who lives in the region, nobody is particularly keen on what possibly goes on at these places. There are various "secret" military installations scattered around here, from Oak Ridge to Holston Army Ammunition. Between what we factually know is buried under and developed at these places, and what is r