Thoughts on High Performance Computing: from 1989 to the present and beyond

I have been involved in technical computing for over 40 years, starting with the minicomputers used by this community. This is a collection of articles, memos, talks, and testimony covering that period.

Paths and cul-de-sacs on the endless road to supercomputing (2.5 PPT). Talk given at the centenary celebration of John Vincent Atanasoff at Iowa State University, 30 October–1 November, at their Symposium on Modern Computing. This talk is substantially along the lines of my criticism of the U.S. efforts to regain supremacy for the world's fastest computer from Japan. Our approach, using loosely coupled, commodity-based uni- and multi-processor computers, is still not a match for the very high speed cross-point switch interconnecting shared-memory vector processors that NEC has evolved based on the "Cray" formulae.

Global Grid Forum Keynote, 25 June 2003, Seattle (12 MB PPT). Basic messages, using six examples: enable your applications now as web services; Grid developers need to adopt users in order to understand how to create the Grid; and we see a real drive to Community- and Data-Centric Computing. This new world will revolve around databases and data mining, where discoveries come from analyzing large datasets. Over time, the web services programming model may become so pervasive that these services will be the way to communicate within the computer. Furthermore, web services begin to open up the 20-year promise of distributed computing.

DOE Science Computing Conference 2003, 19 June 2003, includes my talk entitled Seven Paths to High Performance (and the Petaflops) (2 MB PDF). Evolution is the most likely path. Any idea of a magic bullet that will outpace Moore's Law is foolhardy. The DARPA HPC program is certain to require more than six years to achieve any significant performance gain. The tried and true vector architecture with tight coupling among nodes must remain a component, as the Japanese have shown in their use at NEC and Fujitsu, and especially in their compilers. Finally, we must go back and work on the programming environment for clusters, including both the compiler and run-time systems.

International Conference on Computational Science (ICCS2003) presentation (PPT and PDF), "Progress, Prizes, and Community-centric Computing," Melbourne, 2 June 2003. The presentation has three parts: the history of the Gordon Bell Prize and the computers that have enabled the winners; a very brief look at the misguided efforts of GRID computing to supply more operations, as epitomized by the Tera-grid project; and a strong plea for Community- and data-centric versus Centers-centric computing. In this regard, planning has to revert to the various scientific communities that have the need. Whether such communities are capable of planning and operating as a community is a key question. However, the SkyServer that the astronomy community constructed provides a model, as does NCAR.

Interview in IEEE Software, July 1987, discussing the NSF research needs aimed at parallelism, given as Assistant Director for the Computer and Information Science and Engineering Directorate. Also, the Gordon Bell Prize was proposed. I stated: "Our goal is obtaining a factor of 100 in the performance of computing, not counting vectors, within the decade and a factor of 10 within five years. I think 10 will be easy because it is inherently there in most applications right now. The hardware will clearly be there if the software can support it or the users can use it. Many researchers think this goal is aiming too low. They think it should be a factor of 1 million within 15 years. However, I am skeptical; anything more than our goal will be too difficult in this time period. Still, a factor of 1 million may be possible through the SIMD approach. The reasoning behind the NSF goals is that we have parallel machines now and on the near horizon that can actually achieve these levels of performance. Virtually all new computer systems support parallelism in some form (such as vector processing or clusters of computers). However, this quiet revolution demands a major update of computer science, from textbooks and curriculum to applications research."
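As a quick arithmetic check of the goals quoted above (my own back-of-the-envelope illustration, not from the interview), the five-year and ten-year targets imply the same compound annual growth rate:

```python
# Compound annual growth rates implied by the NSF goals quoted above.

def annual_rate(factor: float, years: float) -> float:
    """Growth multiplier per year needed to reach `factor` in `years`."""
    return factor ** (1 / years)

ten_in_five = annual_rate(10, 5)                 # 10x in 5 years
hundred_in_ten = annual_rate(100, 10)            # 100x in 10 years
million_in_fifteen = annual_rate(1_000_000, 15)  # critics' 10^6x in 15 years

# The 5-year and 10-year goals both work out to ~1.58x per year,
# so the decade goal is simply the five-year goal sustained.
print(f"10x in 5 years    -> {ten_in_five:.3f}x/year")
print(f"100x in 10 years  -> {hundred_in_ten:.3f}x/year")
print(f"10^6x in 15 years -> {million_in_fifteen:.3f}x/year")
```

Both stated goals correspond to roughly a 58% annual improvement, while the critics' factor of a million in 15 years would require a sustained ~2.5x per year.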

Funding_Alternatives_for_NSF_Supercomputing_Centers_870821
(Word, HTML)
A memo written as the founding Assistant Director of NSF's CISE (Computer and Information Science and Engineering) Directorate. It outlines the basic problem of how funds are allocated to form computing centers. It should be noted that as of 2003, nothing had fundamentally changed in this allocation process.

CACM_Ultracomputers_A_Teraflop_Before_Its_Time_9208 argues that, given the state of software standards and the limited ability to exploit the peak performance of large, parallel computers, spending over $100 million for a computer is not prudent. Just wait and let Moore's Law provide a less expensive machine at a later time, and invest in software tools and research.

A policy for how to conduct computing research, entitled GB_Report_Policy-Best_Computer_R&D_support_1994-1995.doc, was written in 1994 based on observing the results of numerous computing research projects, nearly all of which failed.

IEEE_Proc_DSM_Perspective_Another_Point_of_View_9903 is a perspective on Distributed Shared Memory architectures, which I felt were essential for programmability but were unattainable because clustered commodity computers, aka multicomputers, were driving them out.

CACM_What's_Next_in_HPC_Bell_&_Gray_0202 argues that, for the time being, it is all over: clustered Beowulfs using the Intel architecture are "the way". The good: everyone can buy and build their own, and we have a common platform that enables apps. The bad: programs that exploit the inherent performance of large clusters to provide over 10% of peak performance are still elusive.

Petascale Computational Systems: Balanced CyberInfrastructure in a Data-Centric World (PDF). Response to an NSF invitation regarding the CyberInfrastructure, October 2005, by Alex Szalay (Johns Hopkins), Jim Gray, and myself. Abstract: Computational science is becoming data intensive. NSF should support balanced systems, not just CPU farms but also petascale I/O and networking. NSF should allocate resources to support a balanced Tier-1 through Tier-3 national cyber-infrastructure.

History of Supercomputers PowerPoint talk (PDF) and a one-hour video (.wmv) of the talk, given at Lawrence Livermore National Laboratory on 24 April 2013. Abstract: My first visit to Livermore in 1961 to see the LARC, recalling the elegance of the 6600, and observing the evolution of this computer class have been high points of my life as an engineer and computing observer. Throughout their early evolution, supercomputer architecture "trickled down" in various ways for use in other computers. In the mid-1990s the flow reversed, when large computers became scalable and were constructed from clusters of microprocessors. Unlike the two paths of Bell's Law that account for the birth, evolution, and death of other computer classes, e.g. minicomputers (http://ieeeghn.org/wiki/index.php/STARS:Rise_and_Fall_of_Minicomputers), supercomputers have doubled in performance every year for the last 50 years just by building larger structures. While raw computer performance is the first-order term in tracking their progress, many other factors, e.g. FORTRAN, LINPACK, government funding policy, and applications, have contributed to the extraordinary progress. This talk traces the trajectory of, and contributors to, this exciting class.
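The cumulative growth implied by the abstract's claim of doubling every year for 50 years can be made concrete with a small sketch (my own illustration):

```python
# Cumulative growth factor implied by performance doubling every year,
# as claimed in the talk abstract above: n doublings give a factor of 2^n.
for years in (10, 20, 30, 40, 50):
    print(f"after {years:2d} years: x{2 ** years:,}")
```

Fifty annual doublings amount to a factor of 2^50, roughly 1.1 x 10^15.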

Supercomputing-A_Brief_History_1965_2002 (.doc, html) is an expanded draft of the article by Jim Gray and myself, but focuses more on the history and especially traces the evolution of the "Cray" brand: starting with Seymour Cray at Univac, going on to help form CDC (1957), forming Cray Research (1972), where he created the vector supercomputer, then on to Cray Computer Corp. and SRC, with the brand finally bought by Tera, which renamed itself Cray Inc. This rather tragic trajectory shows how government policy helped wipe out the U.S. supercomputing industry, aka Cray, and on the other hand enabled NEC to provide the highest performance computer, measured in real application performance (RAP), in the 2002-2005 time frame.

A Critique of an NRC/OSTP Plan for Numerical Computing was written in 1984, and conditions are virtually unchanged. The report starts: "I believe the report greatly underestimates the position and underlying strength of the Japanese in regard to supercomputers. The report fails to make a substantive case about the U.S. position, based on actual data, in all the technologies from chips (where the Japanese clearly lead) to software engineering productivity." Note that a set of heuristics is given for managing such an effort.