The Rich Report: Five Minutes with Cray's CTO, Steve Scott

Now that ISC has turned 25 and I’m seeing all these familiar faces, I’ve been getting a little nostalgic for the Old Cray Days. And as I look around the show floor at all this capitalism going on, I think about how it all goes back to Seymour Cray, the guy who created the supercomputer industry. So this week it was a great pleasure for me to meet with Cray CTO, Steve Scott.

insideHPC: You know, I started my career at Cray back in 1986, so I just wanted to say how wonderful it is to see the company doing so well and making money again.


Steve Scott: It is nice. A lot of the credit I think goes to our CEO, Peter Ungaro. He knows the industry inside and out and he put together a management team that got the finances in order. That was a big deal because people would go, “We like what you’re saying, but what about the financial viability of the company?” So we don’t get that any more.

insideHPC: So I’d like to start out by talking about Exascale. I was having a discussion earlier and someone asked which company is going to get there first. It seems to me it’s down to Cray or IBM.

Steve Scott: You certainly can’t count them out. But there’s actually a record to consider. If you go back and look at not the first peak number or linpack result, but the first sustained application Gigaflop, Teraflop, and Petaflop, they were all on Crays. So the first sustained application Gigaflop was on a CRAY Y-MP in 1988. The first Teraflop was in 1998 on a CRAY T3E. And the first sustained application Petaflop was in 2008 on Jaguar and the CRAY XT-5. So I’ve gone out on a limb and said publicly that the first sustained application Exaflop will be on a Cray in 2018. So that’s our internal target. And this one is going to be harder than the last one. So now that Cascade is kind of in the bag, most of my time is spent thinking about how we get to Exascale.

insideHPC: So what is the superscale user community doing right now to get things going?

Steve Scott: There’s a lot of stuff heating up in the Exaflop race or whatever you want to call it. The DOE Exascale program is getting closer to reality and the DARPA Ubiquitous High Performance Computing program just got off the ground. They’re not specifically targeting Exascale, but UHPC actually grew out of all the study teams that DARPA sponsored: one on hardware, one on software, and one on resilience.

So UHPC is focused on a Petascale in a box, but at power efficiency targets that are good enough for an Exascale. They target 50 sustained Gigaflops per Watt and if you scale that up, that turns into Exascale for 20 Megawatts. So if we can hit the UHPC target, we will be on our way.

Interestingly, the biggest datacenters today are north of 100 Megawatts. That’s the big Internet datacenters, you know Google, Amazon, and Microsoft. So this Exascale system would be 20 Megawatts, which is bigger than any single system today. Jaguar (#1 on TOP500) is the biggest HPC power consumer today at roughly 7 Megawatts, but it’s also the greenest x86 system in the TOP50 in terms of sustained flops per watt.
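Editor's note: the power figures quoted above can be checked with a few lines of arithmetic. The sketch below is only a back-of-the-envelope illustration; the function name `power_megawatts` is ours, and the inputs are simply the numbers mentioned in the interview (50 sustained Gigaflops per Watt, a 7 Megawatt Jaguar).

```python
# Back-of-the-envelope check of the figures quoted in the interview.
# All names here are illustrative; the inputs come from the text above.

EXAFLOP = 1e18   # floating-point operations per second
GIGAFLOP = 1e9


def power_megawatts(sustained_flops, gigaflops_per_watt):
    """Power in Megawatts needed to sustain a flop rate at a given efficiency."""
    watts = sustained_flops / (gigaflops_per_watt * GIGAFLOP)
    return watts / 1e6


exascale_mw = power_megawatts(EXAFLOP, 50)  # UHPC efficiency target
jaguar_mw = 7                               # rough figure quoted for Jaguar

print(f"Exascale at 50 GF/W: {exascale_mw:.0f} MW")
print(f"Ratio to Jaguar: {exascale_mw / jaguar_mw:.1f}x")
```

At 50 Gigaflops per Watt, one Exaflop works out to 20 Megawatts, a little under three times Jaguar's quoted draw.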

insideHPC: So if they can hit these targets, the Exascale system would consume roughly three times as much power as Jaguar?

Steve Scott: Yes, and that’s a big factor because you have to consider the budget to run the system and what’s practical from a political perspective. The power budget for a 20 Megawatt system would be roughly $20 Million per year, since power costs about a million dollars per Megawatt year.
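Editor's note: the operating-cost estimate follows directly from the rule of thumb Scott gives, about a million dollars per Megawatt-year. The sketch below just applies that rule; the function name and the default rate are illustrative, and actual electricity prices vary by site.

```python
# Annual power bill under the interview's rule of thumb:
# roughly $1 million per Megawatt-year (an approximation, not a tariff).

def annual_power_cost_usd(megawatts, usd_per_megawatt_year=1_000_000):
    """Estimated yearly electricity cost for a system drawing `megawatts`."""
    return megawatts * usd_per_megawatt_year


exascale_cost = annual_power_cost_usd(20)  # the 20 MW Exascale target
jaguar_cost = annual_power_cost_usd(7)     # Jaguar at roughly 7 MW

print(f"Exascale system: ${exascale_cost:,} per year")
print(f"Jaguar: ${jaguar_cost:,} per year")
```

By the same rule, Jaguar's roughly 7 Megawatts would come to about $7 Million per year, so the Exascale target implies a power bill close to three times larger.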

insideHPC: It sounds like the next 8 years are going to be very exciting.

Steve Scott: It is fun. And the past five or so years have been pretty interesting as well. We had this huge technology inflection point, you know, with power and delay, and with transistor counts going up. A bunch of things came together which said we’ve got to do things completely differently. That’s why we had the whole multicore phenomenon happening.

But when we look out to Exascale, it’s clear that just straight multi-core x86 is not going to get us there. Fundamentally we have to do something different underneath the covers to get the energy efficiency. So that’s why we’re pretty interested these days in looking at forms of accelerated computing. We formed a partnership with NVIDIA and we’re working with them on some future stuff. It’s not going to take over the world overnight, but it’s headed in the right direction.

insideHPC: So what are the most important things you would like potential customers to know about Cray?

Steve Scott: I would say number one is Efficiency. We’re all about power efficiency and system efficiency. Our focus is entirely on sustained computing, sustained results for real scientists. So we work very, very closely with our end users. We have a relationship with them that is not typical. We know their codes and we work on things that don’t make the peak of the machine any better, but things that help them get sustained results. So the fact that the Jaguar system is sitting at number one on the list is nice, but we really don’t care how fast it runs linpack. There are three applications on Jaguar that are achieving over a Petaflop, sustained, and that’s what it’s all about.

I would say the third thing is that Cray now has complete coverage of the HPC space, from the desk-side up to the biggest supercomputers. We’ve got a collaboration with Microsoft where we offer Windows HPC on our low-end, desk-side boxes. And we have now introduced the CX1000, which is a single rack system that uses Intel processors. It has SMP nodes as well as distributed-memory nodes. It also has graphics acceleration, all integrated and ready to run. So HPC is all we do and we can get you started.

Comments

This claim from the interview:
“And the first sustained application Petaflop was in 2008 on Jaguar and the CRAY XT-5”

Not true. Back at ISC 2008 in Dresden on Wednesday, June 18, there was a special session on the Los Alamos Roadrunner system, and this was reported (slide 11 of 18 in the presentation):
“Petavision achieved a sustained 1.144 PF/s on the full RR system (SP).”

Jeff – We heard back from Steve; the context of his comment is that (as far as he knows) Petavision is a 32-bit application. While the single-precision Petaflop for that code is still a great result on the machine, when we talk about sustained performance records, it’s for double precision. I know not everyone uses DP, but it’s dominant in the high-end community. Seems like a fair explanation.
