For the prior version of this page, which includes numerous links (some broken/expired) to news coverage, click here.

Summary

In the past, mainstream computers used a single processor core. Nearly a decade ago, all mainstream commercial computers switched to using multiple cores (for technical reasons,
such as overheating) and became known as multi-cores. Consequently, the XMT vision of a desktop many-core supercomputer, introduced in 1997, became relevant for mainstream
computing. XMT is just as timely today: policy reports, such as The Future of Computing Performance:
Game Over or Next Level?, recognize that the field is desperate for alternative multi-core ideas,
as well as for competition. To appreciate how serious the situation is, note the extremely low level of application innovation in general-purpose desktop computing (excluding graphics) during
the last decade (e.g., in comparison to the internet and mobile markets). The main reasons for the current situation are: 1. Commercial multi-core machines have, so far, failed to generate a
broad base of application programmers; most programmers simply find these multi-cores too difficult to program effectively; indeed,
the Game Over report points out that "Only heroic programmers can exploit the vast parallelism in today's machines". 2. In
turn, driving programmers away took the steam out of desktop computing:
not having enough ready-to-deploy applications for multi-cores weakened the business case for larger investment in better multi-core hardware. It is also relevant to point out that by the
time competition among machine vendors was needed to give the market a true choice in the transition to multi-cores, most brand names of the 1990s (e.g., Apple, DEC, IBM, Motorola,
Sun) had already dropped out of the desktop hardware market.

XMT aspires to break this impasse. Rather than sidestep ease of programming, or treat it
as an afterthought, XMT builds on PRAM -- the foremost parallel algorithmic theory to date.
XMT appears to be the only active architecture research project in the computer science
community today that handles ease of programming as a first-class constraint.
Indeed, the following quote captures the programming difficulty of current commercial
systems: "In practice, exploiting eight cores means that a problem has to be broken down
into eight pieces -- which for many algorithms is difficult to impossible. The piece that
can't be parallelized will limit your improvement," Paolo Gargini (Intel), Nature, 2/2016.

Status report 2017. A new "capstone" for the overall XMT effort was reported in this
paper, which demonstrated that XMT
can be programmed in a lock-step manner. To appreciate the significance of this accomplishment, note that
textbooks on (PRAM) parallel algorithms employ just one command, "pardo", in their pseudo-code
for parallel algorithms. The new contribution is that this command alone is all that is
needed from the parallel programmer. Namely, such pseudo-code can be imported as-is into the
XMT parallel program. The surprising result is that such a program achieves the same performance
as the fastest hand-optimized threaded code, which is what XMT programmers had to
produce prior to this work.
For other recent work, see the
presentation of a paper entitled Can Cooling Technology Save Many-Core Parallel Programming from Its Programming Woes?
at the Compiler, Architecture and Tools Conference (CATC), Intel Israel Development Center (IDC), Haifa,
Israel, November 23, 2015, and, most recently, this
2017 Technion and IDC talk.

Old (2012-13) status report.
Using our hardware and software prototypes of XMT, we recently demonstrated parallel speedups that beat all current platforms by orders of magnitude on the most advanced parallel
algorithms in the literature, including for
max-flow, graph
biconnectivity, and
graph triconnectivity, as well as for
our 2013 parallelization of the engine (known as the
Burrows-Wheeler algorithm) that drives the Bzip2 text compression standard. This stress test validates our 30-year-old hypothesis that the PRAM algorithmic theory can
save parallel computing from its programming woes, countering an architecture narrative that has for decades belittled the PRAM theory, stipulating that it is too simplistic for any
existing or future parallel machine. The XMT/PRAM advantage is especially important for the so-called irregular, fine-grained programs that all commercial platforms to date fail to support effectively.


The XMT programmer's model is unique in that it incorporates an algorithmic model, the parallel random-access machine/model
(PRAM), into the programming "workflow".
The wealth of the PRAM theory of algorithms is well documented. The XMT project has been
driven by a PRAM-On-Chip vision, seeking to build an easy-to-program parallel computer
comprising thousands of processors on a single chip, using PRAM-like programming.
Interestingly, starting with a PRAM might not have been an obvious choice.
Technology constraints guide us away from tightly coupled concurrency in programs;
e.g., away from the PRAM and towards multi-threading. On the other hand, relaxed
concurrency in programs is notoriously difficult to design or analyze for correctness or
performance. The XMT programming approach incorporates an elegant workaround.
XMT provides a workflow from a PRAM algorithm to an XMT program,
and even to fine-tuning an XMT program for performance. Given a problem, a PRAM-style
parallel algorithm is developed for it using the Shiloach-Vishkin 1982 Work-Depth (WD)
Methodology. All the operations that can be concurrently performed in the first round are
noted, followed by those that can be performed in the second round, and so on. See the figure below that contrasts this
"natural" (parallel) algorithm with the serial, one-operation-at-a-time, doctrine.
Such a synchronous (lock-step) description of a parallel algorithm makes it easy to reason about
correctness and to analyze for work (the total number of operations) and depth
(the number of rounds). The 2017 update above reports that the WD lock-step description is all that
is needed from an XMT program!
However, for those who are interested in how XMT programming looked until 2016 and is still
possible today, we kept the following description. The XMT programmer can use the XMTC
language (basically C with two additional commands) to produce a multi-threaded program.
Reasoning about correctness or performance can now be restricted to just comparing
the program with the WD algorithm, assuming that the correctness and performance of the
algorithm have been established, often a much easier task than directly analyzing the
program. This workaround allowed college freshmen and even high-school students to solve the
same problems they get in typical freshman serial programming course assignments using
(XMT) parallel programming.

The programming side is quite simple. See the figure below, which depicts the serial and parallel
modes and the spawn and join commands. The primary
example is XMTC, a variant of the standard C programming language that adds
only two basic commands. The main added command is spawn. The program
alternates between serial mode and parallel mode. The spawn command can declare
any number of concurrent threads and causes a switch from serial mode to
parallel mode. Each thread advances through its program at its own speed until
termination. When all threads terminate (depicted as Join in the figure), the
program switches back to serial mode. Successive spawn commands can each
declare a different number of threads. This brief description is not meant to
replace a fuller description that can be found in technical presentations and
papers below. For example, this brief description suppresses: (i) the second
added command, (ii) the classification of XMTC as a so-called single-program
multiple-data (SPMD) language, (iii) the so-called independence-of-order (IOS)
semantics of XMTC, and, of course, (iv) how the program is compiled and
implemented in hardware.

Why is the problem that XMT solves of broad interest?

General-purpose computing (your personal computer) is one of the biggest success stories of technology
as a business model. Intel's former CEO,
Andy Grove, coined the term Software Spiral to describe the following process: improvements in hardware lead to improvements in software that lead back to improvements in
hardware. Dr. Grove noted that the software spiral has been a powerful engine of sustained growth for general-purpose
computing for quite a few decades. The bad news is that this spiral
is now broken. Until recently, programmers could assume that computers execute one operation at a time.
To programmers, the improvements in hardware simply looked like executing single operations faster and faster. Software
developers (programmers) could continue to write programs in the same way, knowing that the next generation of
hardware would
run them faster. Technology constraints are now forcing the
industry to base all its new hardware on parallel computing, which executes many concurrent operations at a time.
This hardware requires a complete overhaul of
the way that software needs to be written. Unfortunately, as hardware vendors recognize, this hardware can be too demanding
on software developers. For example, Chuck Moore, Senior Fellow,
AMD said in 2008:
"To make effective use of multicore
hardware today, you need a PhD in computer science".
Evidence that not much has changed is reflected in the following July 2010 title
by Prof. David Patterson, University of California, Berkeley:
The Trouble with Multicore: Chipmakers are busy designing microprocessors that most programmers can't handle.

Keeping the general-purpose "Software Spiral" on track, which requires reinventing both software and hardware platforms for parallel computing, is one of the biggest challenges of our times.
Parallel software productivity problems are breaking the spiral, and failing to resolve the problem can cause a significant
recession in a key component of the economy.

What specific problem does XMT solve? And what is unique about the XMT solution?

XMT provides a complete solution for the way future computers should be built (hardware) and programmed (software).
Unlike other contemporary computers, XMT is based on the only approach that has been endorsed by the theory of computer science
for addressing the intellectual effort of developing parallel computer programs (or parallel algorithms).
The advanced computers that the industry currently builds are very difficult to program, while even high school students can program
XMT. In fact, XMT programming has been successfully taught to
high school students.
To appreciate the significance, simply contrast this fact with the above quote by Mr. Moore of AMD or Prof. Patterson's title.

High School Student on His Experience with XMT
- I was motivated to solve all the XMT programming assignments we got, since I had to cope with solving the algorithmic problems
themselves, which I enjoy doing. In contrast, I did not see the point of programming the other parallel systems available to us at school,
since too much of the programming effort went into getting around the way the systems were engineered, and that was not fun.
Jacob Hurwitz, 10th grader, Montgomery Blair High School Magnet Program, Silver Spring, Maryland, December 2007. Jacob was one of the students
who completed
all programming assignments, in the
Informal Parallel
Programming Course for High School Students, offered in Fall 2007. These challenging high school assignments included all programming assignments in the
UMD graduate course on parallel algorithms.

- The single-chip supercomputer prototype built by Prof. Uzi Vishkin's group uses rich algorithmic theory to address the practical problem of building an easy-to-program multicore computer. Vishkin's XMT chip reincarnates the PRAM algorithmic technology developed over the past 25 years, uniting the theory of yesterday with the reality of today.
Charles E. Leiserson, Professor of Computer Science, MIT, June 2007.

- I am happy to hear of the impending announcement of a 64 processor prototype of Uzi Vishkin's XMT architecture. This system represents a significant improvement in generality and flexibility for parallel computer systems because of its unique ability to exploit fine-grain concurrency. It will be able to exploit a wider spectrum of parallel algorithms than today's microprocessors can, and this in turn will help bring general purpose parallel computing closer to reality.
Burton Smith, Technical Fellow, Microsoft Corporation, June 2007.

- Today's multi-core processors support coarse grain parallelism. Professor Vishkin has defined a new parallel architecture that supports extremely fine-grained threading. On XMT, a program can be profitably decomposed into tasks as small as 10 instructions. With a complete programming model and an impressive FPGA-based prototype, Professor Vishkin is proposing a compelling alternative design for future microprocessors.
Geoff Lowney, Intel Fellow, April 2008.

The September 15, 2007 tutorial kicked off an informal parallel programming course for high school students. Beyond the opening tutorial,
the course was conducted through a weekly office hour that Scott Watson, an undergraduate teaching assistant, held at Montgomery Blair High School.
Item 8 above, the home page for the informal course, provides practical guidance on using the XMT FPGA computer and links to 8 programming assignments.
Past tutorial:
Tutorial on
"How to Think Algorithmically in Parallel", Seattle, WA,
Sunday, June 17, 2007, held in conjunction with the 21st ACM International Conference on
Supercomputing (ICS).

·(General talk:) Explicit Multi-Threading
(XMT): A Theorist's Proposal for a Big Little Computer System. Download talk (27
slides). Versions of the SPAA'98 talk were given at
IBM Haifa Research Laboratory (8/98), University of Maryland Computer
Engineering Colloquium (9/98), The Computer Architecture and Parallel System
Laboratory, University of Delaware (10/98), The Center for Research on Parallel
Computation, Rice University (10/98), CASES98: Workshop on Compiler and
Architecture Support for Embedded Systems, Washington, DC (12/98), New York
University (12/98), IBM T.J. Watson (12/98), the Workshop on Parallel
Algorithms (WOPA'99), Atlanta (5/99), Technion - Israel Institute of Technology
(5/99), and Georgia Institute of Technology (3/00).

·(General talk) PRAM on Chip: Advancing
from Parallel Algorithms to a Parallel System. Download talk (pdf,
46 slides) . Talks were given at Sandia National Lab 3/2006 and the UMIACS
Workshop on Parallelism in Algorithms and Architectures, May 12, 2006.

·(Panel discussion presentation at IPDPS08, April 16, 2008, Miami, Florida)
Topic of panel: How to avoid making the same mistakes all over again or...
how to make the experiences of the parallel processing communities useful for the multi/many-core generation.
Download
presentation (pdf, 6X2 slides) . Program of IPDPS08 .

·(General talk)
Using Simple Abstraction to Guide the Reinvention of Computing for Parallelism. Parallel@Illinois Distinguished Lecture and a Computer Science Colloquium, University of Illinois,
Urbana-Champaign, March 15, 2010.
Abstract, video, and
slides from the Parallel@Illinois Archives. Power point slides are available
here (pptx, 52 slides). Talks with the same title were given at
the
Hebrew University, Jerusalem, Israel, May 30, 2010 and
University of Vienna, Austria, June 2, 2010. Power point slides are available
here (pptx, 52 slides).

· (General talk)
Current mainstream computing: If in a hole, should we keep digging? Abstract,
presented on June 2, 2011 at the Computer Science Department, University of California, San Diego.
61 slides, or link through
the UCSD class on Parallel Algorithms. Later talks expanding on this title were given at ETH Zurich (11/2011), Tel Aviv University (11/2011),
Georgetown University (12/2011) and Los Alamos National Lab (1/2012).

·
(Position on research direction) Principles Matter and Can Matter More: Big Lead of PRAM Algorithms on Prototype-HW.
Download Uzi Vishkin's abstract and
slides.
NSF Workshop on Research Directions in the Principles of Parallel Computation, Pittsburgh, Pennsylvania, USA
June 28, 2012.

·U.
Vishkin, S. Dascal, E. Berkovich and J. Nuzman. Explicit Multi-Threading (XMT)
Bridging Models for Instruction Parallelism (Extended Abstract). In Proc.
10th ACM Symposium on Parallel Algorithms and Architectures (SPAA), 1998. Note: This paper introduces XMT to readers whose background includes
reasonable understanding of algorithms and theory. The presentation is by way
of a bridging model. Among other things, a bridging model provides a design
space for algorithm designers and programmers, as well as a design space for
computer architects. It is convenient to describe our wider vision regarding
"parallel-computing-on-a-chip" as a two-stage development, and therefore two
bridging models are presented: Spawn-based multi-threading (Spawn-MT) and
Elastic multi-threading (EMT). Download TR (12
pages).

·U.
Vishkin, S. Dascal, E. Berkovich and J. Nuzman. Explicit Multi-Threading (XMT)
Bridging Models for Instruction Parallelism (Extended Summary and Working
Document). Current version of UMIACS TR-98-05 . First version: January
1998. Note: This long paper is really the MAIN publication for the XMT framework, as first conceived.
The SPAA'98 paper gives a principled presentation of XMT. It also introduces
XMT to readers with algorithms and theory background. The next central
presentation for the project is the journal version of the SPAA'01 paper. In
contrast, the September 1999 TR UMIACS99-55 by Berkovich, Nuzman, Franklin,
Jacob and Vishkin is not a good single representative of the XMT project, as a
whole; while it describes possible architecture choices for the XMT framework,
and studies their microarchitecture performance implications, some parts of it
are misleading when it comes to understanding the choices actually made for
XMT. The paper emphasizes an important "decentralization" property of the
architecture: its performance hardly degrades even when the number of cycles
for traversing wires (interconnects) increases. The introduction that the latter
paper provides to XMT is less comprehensive and less direct than the SPAA98
paper, as some aspects of XMT are not addressed, and some architecture choices
were made for the sake of completeness of the architecture and not because
they are required by the XMT paradigm. Download TR
(47 pages).

·D. Naishlos, J. Nuzman, C-W. Tseng, and
U. Vishkin. Towards a First Vertical Prototyping of an Extremely Fine-Grained
Parallel Programming Approach. The conference version appeared in Proc. 13th
ACM Symposium on Parallel Algorithms and Architectures (SPAA-01), July
2001. Download TR
(pdf, 10 pages).
The journal version appeared in the invited Special Issue for SPAA01: TOCS
36, 5, pages 521-552, Springer-Verlag, 2003. Download
the TR that led to the journal version (pdf, 26 pages).

·U.
Vishkin, G. Caragea and B. Lee. Models for Advancing PRAM and Other Algorithms
into Parallel Programs for a PRAM-On-Chip Platform. Chapter 5, Handbook on
Parallel Computing: Models, Algorithms, and Applications, Editors: S.
Rajasekaran and J. Reif, Chapman and Hall/CRC Press, 2008. This paper provides
models for developing performance-tuned XMT programs. There is no conflict
between this work and the quest for ease-of-programming using the PRAM model.
The implied methodology (for the XMT system developers) is as follows. Given a
simple PRAM-like program, manually performance-tune it using the performance
models in this paper. Then teach the compiler to produce the performance-tuned
program. Download TR
(pdf, 62 pages).

·
L. Hochstein, V. Basili, U. Vishkin and J. Gilbert.
A pilot study to compare programming effort
for two parallel programming models. Journal of Systems and Software Volume 81 Issue 11, 1920-1930, November, 2008.
This paper compares the programming effort in MPI and XMTC for a similar programming assignment.
The main finding is that, with high confidence, XMTC development time is nearly half that of MPI.
Download
TR (pdf, 26 pages).

· U. Vishkin. Algorithmic approach to designing an easy-to-program system:
can it lead to a HW-enhanced programmer's workflow add-on? Proc. International Conference on Computer Design (ICCD), Lake Tahoe, CA, October 4-7, 2009.
Download
TR (pdf, 4 pages). Download talk
(pptx, 47 slides): this is a slightly updated version of the slides following later talks at the Electrical Engineering Department, Technion and Intel, Haifa, October 21 and 22, 2009.
Download abstract for these talks.

·
U. Vishkin.
Using simple abstraction to reinvent computing for parallelism.
Communications of the ACM (CACM) 54,1, pages 75-85, January, 2011.
Download from CACM.
[Download
TR (pdf, 9 pages); note: the published version should be accessible to many more readers, as it was edited by the CACM.]

· N. Crowell. Parallel algorithms for graph problems, May 2011. MSc scholarly paper
by a Computer Science student who evaluated XMT for
some wide-interest benchmarks. This student was not part of the XMT team. Crowell's work provides
independent validation of evidence published by the XMT team on the advantages of XMT.
In particular, his experience has been that, beyond developing serial programs for the
problems he worked on, the extra effort for producing parallel code was minimal.
Download paper.

· U. Vishkin. Restoring software productivity crucial to economic recovery:
The multi-core dilemma. White paper, the Technology Innovation Program (TIP),
National Institute of Standards and Technology (NIST). Input
to help focus the TIP program on areas of critical national need. July 2011.
Download this single page paper.

Funding by the National Science Foundation and the Department of Defense is
gratefully acknowledged. Some of the material on this web page, and on XMT subpages linked from this page, is based upon work
supported by the National Science Foundation under grants No. 0325393, 0811504 and 0834373. Any
opinions, findings, and conclusions or recommendations expressed in this
material are those of the author(s) and do not necessarily reflect the views of
the National Science Foundation or the Department of Defense.