World's most interesting computer in jeopardy

Cray's MTA goes MIA


Cray confirms that the MTA-2 multithreaded supercomputer, one of the most interesting and unusual machines ever built, has indeed been sidelined by the company.

"The R&D development level has gone down," spokesman Steve Conway told The Register.

Will there be an MTA-3?

"That will all be determined by the market. We've shipped two of the MTA-2s, one to a Japanese customer. It all depends on the market," said Conway.

The MTA represents over twenty years of pioneering work in parallel processing, and its ideas inspired Intel's current SMT chips. But very little about the CMOS-based MTA resembles any of today's high-end commercial systems, let alone personal computers.

Each MTA processor handles up to 128 hardware threads, and each thread has its own virtual register file and program counter. The MTA processor is attached to a system board, with up to 4GB of memory per board, and up to eight of these modules can be accommodated in a single MTA system.

But that's only part of the story of this remarkable machine. It's a uniform flat shared memory system, with a full-empty bit for every word of memory providing much faster synchronization. And there's no data cache. So cache coherency - the bane of SMP shared memory systems - isn't a problem. The machine creates a large number of tasks, and ensures that each execution stream is kept busy.
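To make the full-empty bit idea concrete, here is a rough software sketch of one tagged memory word. It is only an emulation using ordinary Python threads and a condition variable, not how the MTA hardware works. The class and method names (`FullEmptyWord`, `write_when_empty`, `read_when_full`) are invented for illustration and are not taken from Cray's programming interface.

```python
import threading


class FullEmptyWord:
    """Software emulation of one memory word tagged with a full/empty bit."""

    def __init__(self):
        self._value = None
        self._full = False  # the full/empty bit; the word starts out empty
        self._cond = threading.Condition()

    def write_when_empty(self, value):
        """Block until the word is empty, store the value, then mark it full."""
        with self._cond:
            self._cond.wait_for(lambda: not self._full)
            self._value = value
            self._full = True
            self._cond.notify_all()

    def read_when_full(self):
        """Block until the word is full, take the value, then mark it empty."""
        with self._cond:
            self._cond.wait_for(lambda: self._full)
            value = self._value
            self._full = False
            self._cond.notify_all()
            return value


def run_demo(n=5):
    """Hand n values from a producer thread to a consumer through one word."""
    word = FullEmptyWord()
    received = []

    def producer():
        for i in range(n):
            word.write_when_empty(i)

    def consumer():
        for _ in range(n):
            received.append(word.read_when_full())

    threads = [threading.Thread(target=producer),
               threading.Thread(target=consumer)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return received


if __name__ == "__main__":
    print(run_demo())
```

The producer and consumer never touch a separate lock or queue; the state of the word itself decides who may proceed. In this emulation that state is guarded by a software lock, whereas the article describes the bit as part of every word of memory, which is where the speed advantage would come from.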

"In an effort to improve scaling, some vendors have abandoned shared memory and introduced distributed-memory computers. These are also euphemistically called scalable parallel, massively parallel, or cluster computers. Regardless of the name, they all suffer the same basic problem: a truly horrible programming model.

"First, they require that applications be rewritten before they can even be run in parallel. Then, to achieve mediocre levels of performance, they require programs to be carefully tuned to manage communications and data placement. And since these systems are built using off-the-shelf microprocessors, they require further tuning for effective use of their data caches. Finally, these systems all suffer from inadequate communication bandwidth. Parallel applications can never be expected to run as well on these computers as on shared memory systems regardless of the programming effort invested."