MIT: towards the 1000-core processor

MIT is a disorienting place, particularly the Stata Center, the home of EECS. There are no straight lines in the building and nothing appears plumb. Architect Frank Gehry, it seems, wanted his design to disturb and overwhelm and there he has succeeded, particularly when it rains: The building leaks. But does the building also stimulate? Again, Gehry has succeeded: The building hums with energy.

On a sunny day in July, the place is crawling with people. Students of various ages, genders, and nationalities wander by chatting in their t-shirts and flip flops, professors share bag lunches with their children in shady corners of the labyrinthine lobby, the line at the deli counter queues around in a disorderly sort of meander, while people in suits mingle with the flip-flop crowd in and under staircases that wander up and off into brick-lined oblivion.

Stata is part intellectual Grand Central Station and part Winchester Mystery House, enticing tourists and visiting scholars alike to wander in off the ponderous corporate streets of Cambridge.

EECS Professor Srini Devadas has an office on the 8th floor of Stata. When we sat down to chat there on Monday, July 9th, he started with an enthusiastic endorsement of MIT’s most famous building.

*********************

WWJD: Do you like the building?

Srini Devadas: It’s got a ‘cool’ factor and I love to show it off to visitors. There are tourists all the time in the lobby!

WWJD: When I last visited you in 2003, you were head of Area II. At the time, there were 6 areas numbered I to VII.

Srini Devadas: [laughing] Yes, well there’s been some consolidation since then. Now there are only 5 areas, still numbered I to VII.

WWJD: Also the last time I was here, you were somewhat down about DAC, but now I see you presented a paper in June in San Francisco. What’s up with that?

Srini Devadas: It was an invited paper, plus DAC’s done a good job of broadening the topic material and creating new areas of discussion. Our paper this year was a ‘vision’ paper, and our second invited paper since I last saw you. We also gave a paper in 2007 with Ed Suh presenting.

DAC is definitely changing and I credit the Technical Program Committee with those changes. DAC has been nimble and has broadened its scope to include hardware systems with interesting features, and things like low power. The conference now has a broader focus than just design automation.

Also, great conferences are being co-located with DAC, including HOST on Sunday and Monday at DAC. I have grad students working in the area of hardware security, so I came to San Francisco for that and killed 2 birds with one stone by also presenting our invited paper on Tuesday.

Srini Devadas: We want to build a 1000-core processor here at MIT. That is the Angstrom Project. We have DARPA support, plus various companies, and I’m now leading the effort. There are a great number of applications for this hardware, particularly in the area of Big Data.

Tremendous streams of data are being produced in things like weather prediction, with data coming in from large numbers of hardware sensors. The result is huge amounts of traffic and the software taking a lot of time to process it all. Also the Googles and Hadoops of the world are dealing with so many bits, they have to have huge amounts of hardware storage.

There are many strategies to deal with all of this. Many are algorithmic, but [ultimately] the solution has to be on-chip at the nanoscale level. But then you have to deal with power dissipation problems, many joules of energy that can fry the whole system!

The Angstrom project wants to address these things with a completely new architecture on a single chip with 1000 cores – solving the bandwidth issue, having a lot of storage on-chip, and balancing the demands of power, performance, and programmability. Which is where self-aware computing comes in [SEEC], the topic of our invited paper at DAC.

[From the DAC paper abstract: In SEEC, applications explicitly state goals, while other systems components provide actions that the SEEC runtime system can use to meet those goals. Angstrom supports this model by exposing sensors and adaptations that traditionally would be managed independently by hardware. This exposure allows SEEC to coordinate hardware actions with actions specified by other parts of the system, and allows the SEEC runtime system to meet application goals while reducing costs (e.g., power consumption).]
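
[Editor's illustration: The goal-and-action model in the abstract can be sketched in a few lines of Python. This is not the SEEC runtime or its API. The `Action` class, the `choose_action` function, and every number below are hypothetical, chosen only to show the idea of an application stating a performance goal and a runtime picking the cheapest available action that is predicted to meet it.]

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Action:
    """A hypothetical adaptation that a system component exposes to the runtime."""
    name: str
    speedup: float      # predicted performance multiplier relative to the baseline
    power_watts: float  # assumed power cost of running with this action


def choose_action(goal_rate, observed_rate, actions):
    """Pick the lowest-power action predicted to meet the application's goal.

    goal_rate: performance the application asks for (e.g. frames per second).
    observed_rate: performance measured under the baseline configuration.
    Falls back to the fastest action when no action can meet the goal.
    """
    needed = goal_rate / observed_rate
    feasible = [a for a in actions if a.speedup >= needed]
    if feasible:
        return min(feasible, key=lambda a: a.power_watts)
    return max(actions, key=lambda a: a.speedup)


# Illustrative actions only; the values are made up for this sketch.
actions = [
    Action("1 core, low clock", 1.0, 1.0),
    Action("4 cores, low clock", 3.5, 3.0),
    Action("4 cores, high clock", 6.0, 8.0),
]

# The application wants 30 units/s and currently observes 10 units/s.
best = choose_action(goal_rate=30.0, observed_rate=10.0, actions=actions)
print(best.name)
```

[In this toy version the runtime decides once. A self-aware system as described in the abstract would presumably repeat the observe-and-decide step as sensor readings change.]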

WWJD: Of those “demands” you mention that need balancing, it seems like programmability is the biggest problem.

Srini Devadas: Yes, there’s a disconnect between the Grand Vision of highly multicore and the reality of parallel programming. There’s absolutely a gap, and two ways of looking at the solution.

One school of thought says solve the problem as simply as possible through the software, which they say doesn’t require 1000 cores. Big Data today has an embarrassing level of parallelism, while an incredible number of applications coming online also face the obvious problem of needing to have the parallelism extracted. A lot of people are beating their heads on the wall trying to extract that parallelism.

The Angstrom Project school of thought says that solving the parallel algorithm is becoming increasingly hard, and is now harder than solving the hardware problem. We want to create systems with clusters of pods or machines instead. [Critics say] these chips will be huge, with non-uniform data access, which costs time and energy.

But we say, don’t move all of the data around, move the program around [to the various clusters]. We believe the faster you can do that, the better you’ll do with solving the energy problems, the better the efficiency of the chip, and the better the performance.

The fact is that the world today is different. There are lots of applications with tremendous parallelism, and we believe if we can provide a 1000-core solution in the future, it will prove to be scientifically useful to every lab in the world, and to everyone handling Big Data.

WWJD: So how far along are you towards creating the 1000-core Angstrom processor?

Srini Devadas: Currently, we’re working on a 121-core chip in an 11 x 11 array. It’s pretty big, 10 mm x 10 mm at 45 nanometers. Ultimately, we will do a 64 x 16 array, which will be the “1000” core processor, in actuality with 1024 cores.

WWJD: Where are you going to fab the chip your team is currently working on?

Srini Devadas: Through IBM’s TAPO [Trusted Access Program Office], housed in Kansas. It’s like MOSIS, but less well known.

WWJD: If IBM is involved by way of the Common Platform Alliance with GlobalFoundries in Singapore and Samsung in Korea, are there IP issues with a DARPA-funded project in all of this?

Srini Devadas: Yes, there are both IP and export issues, but we are paying close attention to [the regulations].

And although DARPA is supporting the project, the RTL will be completely open. The GDS II is less likely to be open, however. It will be proprietary, because we can’t give away somebody’s library that might be involved.

Also, our chip is being designed completely using Bluespec, which we want to shout from the rooftop of Stata! The primary designers involved swear there are huge benefits from using Bluespec versus coding the RTL directly – with the Verilog RTL generated by Bluespec.

And we’re taking the academic attitude: Forget legacy designs! We’re not using any third-party IP. The RTL is all truly MIT home grown. We’re proving that a small group of 4 students can do this huge design – assuming it works, of course!

Also in our group, Li-Shiuan Peh is doing a big chip as well, a 16-core processor with lots of innovation. All of this is very nice to see, particularly as I’ve always wanted to be close to the architecture, always wanted to architect something. I’ve been involved with the hardware security world for a long time, but not with a parallel compute project.

We’re all very excited here about parallel programming on-chip. And there are tremendous benefits for the students, because [with this experience] they will get good offers from industry when they are done.

Students at the graduate level can write compilers, but they really want to be architecture people creating general purpose processors with general purpose software. Something that’s just not happening on a regular basis in academia.

Of course, we never forget that hardware is uncompromising and can be heartless. Either you get it right, or you don’t. Nonetheless, this is a beautiful project!

One Response to “MIT: towards the 1000-core processor”

[…] Aycinena, the editor of EDACafe, did a nice piece on July 17th, MIT: towards the 1000-core processor. This article is about a project that came as a surprise to many of us here. Here is a great […]