Index for University of Tennessee Technical Reports

file ut-cs-91-131.ps
by Ed Anderson, Z. Bai & Jack Dongarra,
title LAPACK Working Note 31: Generalized QR Factorization
, and Its Applications,
ref University of Tennessee Technical Report CS-91-131,
, April 1991.
for The purpose of this paper is to reintroduce the
, generalized QR factorization with or without pivoting
, of two matrices A and B having the same number of
, rows. When B is square and nonsingular, the
, factorization implicitly gives the orthogonal
, factorization of B^{-1}A. Continuing the work of
, Paige [20] and Hammarling [12], we discuss the
, different forms of the factorization from the point of
, view of general-purpose software development. In
, addition, we demonstrate the applications of the GQR
, factorization in solving the linear equality
, constrained least squares problem and the generalized
, linear regression problem, and in estimating the
, conditioning of these problems.
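As a sketch of the statement above about B^{-1}A, the identity can be written out explicitly. The notation below follows the convention of LAPACK's xGGQRF routines (A is n-by-m, B is n-by-p, Q and Z orthogonal); this is an assumption made for illustration, and the report's own symbols may differ.

```latex
A = Q R, \qquad B = Q T Z, \qquad Q^{T} Q = I_n, \quad Z^{T} Z = I_p .
% If B is square (p = n) and nonsingular, then T is nonsingular and
B^{-1} A = (Q T Z)^{-1} Q R = Z^{T} T^{-1} Q^{T} Q R = Z^{T} \left( T^{-1} R \right) .
```

The product B^{-1}A is therefore never formed: its orthogonal factor Z^T and the factors T and R of its triangular part come directly from the factorizations of A and B.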
file sc91.ps
by Adam Beguelin, Jack J. Dongarra, G.A. Geist, Robert
, Manchek, & V.S. Sunderam,
title Graphical Development Tools for Network-Based
, Concurrent Supercomputing,
ref Proceedings of Supercomputing `91, pp. 435-444,
, Albuquerque, New Mexico, November 1991.
for This paper describes an X-window based software
, environment called HeNCE (Heterogeneous Network
, Computing Environment) designed to assist scientists
, in developing parallel programs that run on a network
, of computers. HeNCE is built on top of a software
, package called PVM which supports process management
, and communication across a network of heterogeneous
, computers. HeNCE is based on a parallel programming
, paradigm where an application program can be described
, by a graph. Nodes of the graph represent subroutines
, and the arcs represent data dependencies. HeNCE is
, composed of integrated graphical tools for creating,
, compiling, executing, and analyzing HeNCE programs.
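The graph model in the abstract above (nodes as subroutines, arcs as data dependencies) can be illustrated with a minimal Python sketch. It is not HeNCE's interface: the `run_graph` helper, the node names, and the use of the standard-library `graphlib` module are hypothetical choices for this example, and the nodes run one after another rather than in parallel.

```python
from graphlib import TopologicalSorter


def run_graph(nodes, deps):
    """Run each node's subroutine once every node it depends on has finished.

    nodes: dict mapping node name -> function taking a dict of input values
    deps:  dict mapping node name -> list of node names it depends on (arcs)
    """
    results = {}
    # TopologicalSorter expects a mapping of node -> predecessors.
    order = TopologicalSorter({name: deps.get(name, []) for name in nodes})
    for name in order.static_order():
        inputs = {d: results[d] for d in deps.get(name, [])}
        results[name] = nodes[name](inputs)
    return results


# Hypothetical four-node program: "left" and "right" depend only on "init",
# so a parallel scheduler could run them concurrently; "sum" needs both.
nodes = {
    "init": lambda inp: 10,
    "left": lambda inp: inp["init"] + 1,
    "right": lambda inp: inp["init"] * 2,
    "sum": lambda inp: inp["left"] + inp["right"],
}
deps = {"left": ["init"], "right": ["init"], "sum": ["left", "right"]}
results = run_graph(nodes, deps)
```

Because the arcs carry all the ordering information, nodes with no path between them are free to execute on different machines.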
file ut-cs-91-136.ps
by Adam Beguelin, Jack Dongarra, Al Geist, Robert
, Manchek, & Vaidy Sunderam,
title A Users' Guide to PVM Parallel Virtual Machine,
ref University of Tennessee Technical Report CS-91-136,
, July 1991.
for This report is the PVM version 2.3 users' guide. It
, contains an overview of PVM and how it is installed
, and used. Example programs in C and Fortran are
, included.
,
, PVM stands for Parallel Virtual Machine. It is a
, software package that allows the utilization of a
, heterogeneous network of parallel and serial computers
, as a single computational resource. PVM consists of
, two parts: a daemon process that any user can install
, on a machine, and a user library that contains
, routines for initiating processes on other machines,
, for communicating between processes, and for synchronizing
, processes.
file ornl-tm-11850.ps
by Jean R.S. Blair & Barry W. Peyton,
title On Finding Minimum-Diameter Clique Trees,
ref Oak Ridge National Laboratory Technical Report
, ORNL/TM-11850, Oak Ridge National Laboratory, Oak
, Ridge, Tennessee, August 1991.
for It is well-known that any chordal graph can be
, represented as a clique tree (acyclic hypergraph,
, join tree). Since some chordal graphs have many
, distinct clique tree representations, it is
, interesting to consider which one is most desirable
, under various circumstances. A clique tree of minimum
, diameter (or height) is sometimes a natural candidate
, when choosing clique trees to be processed in a
, parallel computing environment.
,
, This paper introduces a linear time algorithm for
, computing a minimum-diameter clique tree. The new
, algorithm is an analogue of the natural greedy
, algorithm for rooting an ordinary tree in order to
, minimize its height. It has potential application in
, the development of parallel algorithms for both
, knowledge-based systems and the solution of sparse
, linear systems of equations.
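The abstract above compares the new algorithm to the greedy method for rooting an ordinary tree so that its height is minimal. One common form of that ordinary-tree method is leaf stripping, sketched below in Python. This illustrates only the ordinary-tree case, not the report's clique tree algorithm, and the function name `min_height_roots` is hypothetical.

```python
from collections import defaultdict


def min_height_roots(edges):
    """Return the vertex (or two adjacent vertices) minimizing rooted height.

    Greedy leaf stripping: repeatedly remove all current leaves until at most
    two vertices remain; rooting the tree at a survivor minimizes its height.
    Runs in time linear in the number of edges.
    """
    adj = defaultdict(set)
    for u, v in edges:
        adj[u].add(v)
        adj[v].add(u)
    if not adj:
        return []
    remaining = len(adj)
    leaves = [v for v in adj if len(adj[v]) == 1]
    while remaining > 2:
        remaining -= len(leaves)
        next_leaves = []
        for leaf in leaves:
            neighbor = adj[leaf].pop()        # a leaf has exactly one neighbor
            adj[neighbor].discard(leaf)
            if len(adj[neighbor]) == 1:       # neighbor has just become a leaf
                next_leaves.append(neighbor)
        leaves = next_leaves
    return sorted(leaves)


path5 = min_height_roots([(1, 2), (2, 3), (3, 4), (4, 5)])
path4 = min_height_roots([(1, 2), (2, 3), (3, 4)])
star = min_height_roots([(0, 1), (0, 2), (0, 3)])
```

A path with an odd number of vertices has a single best root (its middle vertex), while an even path has two adjacent ones.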
file ornl-tm-12318.ps
by Jack J. Dongarra, Thomas H. Rowan, and Reed C. Wade
title Software Distribution Using XNETLIB
ref Oak Ridge National Laboratory Technical Report ORNL/TM-12318
, June, 1993
for Xnetlib is a new tool for software distribution. Whereas its
, predecessor netlib uses e-mail as the user interface
, to its large collection of public-domain mathematical software,
, Xnetlib uses an X Window interface and socket-based communication.
, Xnetlib makes it easy to search through a large distributed
, collection of software and to retrieve requested software in seconds.
file ut-cs-91-141.ps
by James Demmel, Jack Dongarra, & W. Kahan,
title LAPACK Working Note 39: On Designing Portable High
, Performance Numerical Libraries,
ref University of Tennessee Technical Report CS-91-141,
, July 1991.
for High quality portable numerical libraries have existed
, for many years. These libraries, such as LINPACK and
, EISPACK, were designed to be accurate, robust,
, efficient and portable in a Fortran environment of
, conventional uniprocessors, diverse floating point
, arithmetics, and limited input data structures.
, These libraries are no longer adequate on modern high
, performance computer architectures. We describe their
, inadequacies and how we are addressing them in the
, LAPACK project, a library of numerical linear algebra
, routines designed to supplant LINPACK and EISPACK. We
, shall now show how the new architectures lead to
, important changes in the goals as well as the methods
, of library design.
file ut-cs-89-85.ps
by Jack J. Dongarra,
title Performance of Various Computers Using Standard Linear
, Equations Software,
ref University of Tennessee Technical Report CS-89-85,
, December 1990.
for This report compares the performance of different
, computer systems in solving dense systems of linear
, equations. The comparison involves approximately a
, hundred computers, ranging from a CRAY-MP to
, scientific workstations such as the Apollo and Sun to
, IBM PCs.
file ut-cs-91-134.ps
by Jack Dongarra,
title LAPACK Working Note 34: Workshop on the BLACS,
ref University of Tennessee Technical Report CS-91-134,
, May 1991.
for Forty-three people met on March 28, 1991, to discuss a
, set of Basic Linear Algebra Communication Subprograms
, (BLACS). This set of routines is motivated by the
, needs of distributed memory computers.
file pc.v17.10.ps
by Jack Dongarra, Mark Furtney, Steve Reinhardt, &
, Jerry Russell,
title Parallel Loops -- A Test Suite for Parallelizing
, Compilers: Description and Example Results,
ref Parallel Computing 17 (1991), pp. 1247-1255.
for Several multiprocessor systems are now commercially
, available, and advances in compiler technology provide
, automatic conversion of programs to run on such
, systems. However, no accepted measure of this
, parallel compiler ability exists. This paper presents
, a test suite of subroutines and loops, called Parallel
, Loops, designed to (1) measure the ability of
, parallelizing compilers to convert code to run in
, parallel and (2) determine how effectively parallel
, hardware and software work together to achieve high
, performance across a range of problem sizes. In
, addition, we present the results of compiling this
, suite using two commercially available parallelizing
, Fortran compilers, Cray and Convex.
file ut-cs-91-146.ps
by Jack Dongarra & Bill Rosener,
title NA-NET: Numerical Analysis NET,
ref University of Tennessee Technical Report CS-91-146,
, September 1991.
for The NA-NET is a mail facility created to allow
, numerical analysts (na) an easy method of
, communicating with one another. The main advantage of
, the NA-NET is uniformity of addressing. All mail is
, addressed to the Internet host ``na-net.ornl.gov'' at
, Oak Ridge National Laboratory. Hence, members of the
, NA-NET do not need to remember complicated addresses
, or even where a member is currently located. This
, paper describes the software.
file ut-cs-91-137.ps
by Jack J. Dongarra & Majed Sidani,
title A Parallel Algorithm for the Non-Symmetric Eigenvalue
, Problem,
ref University of Tennessee Technical Report CS-91-137,
, July 30, 1991.
for This paper describes a parallel algorithm for
, computing the eigenvalues and eigenvectors of a
, non-symmetric matrix. The algorithm is based on a
, divide-and-conquer procedure and uses an iterative
, refinement technique.
file ut-cs-91-138.ps
by Jack Dongarra & Robert A. van de Geijn,
title LAPACK Working Note 37: Two Dimensional Basic Linear
, Algebra Communication Subprograms,
ref University of Tennessee Technical Report CS-91-138,
, October 28, 1991.
for In this paper, we describe extensions to a proposed
, set of linear algebra communication routines for
, communicating and manipulating data structures that
, are distributed among the memories of a distributed
, memory MIMD computer. In particular, recent
, experience shows that higher performance can be
, attained on such architectures when parallel dense
, matrix algorithms utilize a data distribution that
, views the computational nodes as a logical two
, dimensional mesh. The motivation for the BLACS
, continues to be to increase portability, efficiency
, and modularity at a high level. The audience of the
, BLACS consists of mathematical software experts and people
, with large scale scientific computation to perform.
, A systematic effort must be made to achieve a de facto
, standard for the BLACS.
file ut-cs-91-130.ps
by Jack Dongarra & Robert A. van de Geijn,
title Reduction to Condensed Form for the Eigenvalue Problem
, on Distributed Memory Architectures,
ref University of Tennessee Technical Report CS-91-130,
, April 30, 1991.
for In this paper, we describe a parallel implementation
, for the reduction of general and symmetric matrices to
, Hessenberg and tridiagonal form, respectively. The
, methods are based on LAPACK sequential codes and use a
, panel-wrapped mapping of matrices to nodes. Results
, from experiments on the Intel Touchstone Delta are
, given.
file icci91.ps
by Eric S. Kirsch & Jean R.S. Blair,
title Practical Parallel Algorithms for Chordal Graphs,
ref pp. 372-382 in Proceedings of the International
, Conference on Computing and Information (ICCI '91)--
, Advances in Computing and Information, Ottawa, Canada,
, May 1991.
for Until recently, a large majority of theoretical work
, in parallel algorithms has ignored communication costs
, and other realities of parallel computing. This paper
, attempts to address this issue by developing parallel
, algorithms that not only are efficient using standard
, theoretical analysis techniques, but also require a
, minimal amount of communication. The specific
, parallel algorithms developed here include one to find
, the set of maximal cliques and one to find a perfect
, elimination ordering of a chordal graph.
file vector.ps
by David Levine, David Callahan, & Jack Dongarra,
title A Comparative Study of Automatic Vectorizing Compilers,
ref Parallel Computing 17 (1991), pp. 1223-1244.
for We compare the capabilities of several commercially
, available, vectorizing Fortran compilers using a test
, suite of Fortran loops. We present the results of
, compiling and executing these loops on a variety of
, supercomputers, mini-supercomputers, and mainframes.
file ut-cs-91-147.ps
by Bruce MacLennan,
title Characteristics of Connectionist Knowledge
, Representation,
ref University of Tennessee Technical Report CS-91-147,
, November 1991.
for Connectionism--the use of neural networks for
, knowledge representation and inference--has profound
, implications for the representation and processing of
, information because it provides a fundamentally new
, view of knowledge. However, its progress is impeded
, by the lack of a unifying theoretical construct
, corresponding to the idea of a calculus (or formal
, system) in traditional approaches to knowledge
, representation. Such a construct, called a simulacrum,
, is proposed here, and its basic properties are
, explored. We find that although exact classification
, is impossible, several other useful, robust kinds of
, classification are permitted. The representation of
, structured information and constituent structure are
, considered, and we find a basis for more flexible
, rule-like processing than that permitted by
, conventional methods. We discuss briefly logical
, issues such as decidability and computability and show
, that they require reformulation in this new context.
, Throughout we discuss the implications for artificial
, intelligence and cognitive science of this new
, theoretical framework.
file ut-cs-91-145.ps
by Bruce MacLennan,
title Continuous Symbol Systems: The Logic of Connectionism,
ref University of Tennessee Technical Report CS-91-145,
, September 1991.
for It has been long assumed that knowledge and thought
, are most naturally represented as discrete symbol
, systems (calculi). Thus a major contribution of
, connectionism is that it provides an alternative model
, of knowledge and cognition that avoids many of the
, limitations of the traditional approach. But what
, idea serves for connectionism the same unifying role
, that the idea of a calculus served for the traditional
, theories? We claim it is the idea of a continuous
, symbol system.
,
, This paper presents a preliminary formulation of
, continuous symbol systems and indicates how they may
, aid the understanding and development of connectionist
, theories. It begins with a brief phenomenological
, analysis of the discrete and continuous; the aim of
, this analysis is to directly contrast the two kinds of
, symbol systems and identify their distinguishing
, characteristics. Next, based on the phenomenological
, analysis and on other observations of existing
, continuous symbol systems and connectionist models, I
, sketch a mathematical characterization of these
, systems. Finally the paper turns to some applications
, of the theory and to its implications for knowledge
, representation and the theory of computation in a
, connectionist context. Specific problems addressed
, include decomposition of connectionist spaces,
, representation of recursive structures, properties of
, connectionist categories, and decidability in
, continuous formal systems.
file nipt91-panel.ps
by Bruce MacLennan,
title The Emergence of Symbolic Processes From the
, Subsymbolic Substrate,
ref text of invited panel presentation, International
, Symposium on New Information Processing Technologies
, `91, Tokyo, Japan, March 13-14, 1991.
for A central question for the success of neural network
, technology is the relation of symbolic processes
, (e.g., language and logic) to the underlying
, subsymbolic processes (e.g., parallel distributed
, implementations of pattern recognition, analogical
, reasoning and learning). This is not simply an issue
, of integrating neural networks with conventional
, expert system technology. Human symbolic cognition is
, flexible because it is not purely formal, and because
, it retains some of the ``softness'' of the subsymbolic
, processes. If we want our computers to be as flexible
, as people, then we need to understand the emergence
, of the discrete and symbolic from the continuous and
, subsymbolic.
file ut-cs-91-144.ps
by Bruce MacLennan,
title Gabor Representations of Spatiotemporal Visual Images,
ref University of Tennessee Technical Report CS-91-144,
, September 1991.
for We review Gabor's Uncertainty Principle and the limits
, it places on the representation of any signal.
, Representations in terms of Gabor elementary functions
, (Gaussian-modulated sinusoids), which are optimal in
, terms of this uncertainty principle, are compared with
, Fourier and wavelet representations. We also review
, Daugman's evidence for representations based on
, two-dimensional Gabor functions in mammalian visual
, cortex. We suggest three-dimensional Gabor elementary
, functions as a model for motion selectivity in complex
, and hypercomplex cells in visual cortex. This model
, also suggests a computational role for low frequency
, oscillations (such as the alpha rhythm) in visual
, cortex.
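The abstract above describes a Gabor elementary function as a Gaussian-modulated sinusoid. A minimal one-dimensional Python sketch follows. The parameter names (`t0`, `f`, `alpha`, `phase`) and the normalization of the Gaussian envelope are assumptions made for illustration, not taken from the report.

```python
import math


def gabor(t, t0=0.0, f=1.0, alpha=1.0, phase=0.0):
    """Gaussian envelope centered at t0 multiplied by a cosine of frequency f.

    alpha sets the width of the envelope: a larger alpha localizes the
    function less in time and more in frequency.
    """
    envelope = math.exp(-math.pi * ((t - t0) / alpha) ** 2)
    carrier = math.cos(2.0 * math.pi * f * (t - t0) + phase)
    return envelope * carrier


peak = gabor(0.0)           # center of the envelope, zero phase
quarter = gabor(0.25)       # carrier crosses zero a quarter period later
far = gabor(5.0)            # far from the center the envelope is negligible
```

The three sample values show the two ingredients: the carrier sets the oscillation and the envelope confines it to a neighborhood of `t0`.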
file uist91.ps
by Brad Vander Zanden, Brad A. Myers, Dario Giuse, &
, Pedro Szekely,
title The Importance of Pointer Variables in Constraint
, Models,
ref pp. 155-164 in Proceedings of UIST '91, ``ACM SIGGRAPH
, Symposium on User Interface Software and Technology,''
, Hilton Head, South Carolina, November 11-13, 1991.
for Graphical tools are increasingly using constraints to
, specify the graphical layout and behavior of many
, parts of an application. However, conventional
, constraints directly encode the objects they reference,
, and thus cannot provide support for the dynamic
, runtime creation and manipulation of application
, objects. This paper discusses an extension to current
, constraint models that allows constraints to
, indirectly reference objects through pointer variables.
, Pointer variables permit programmers to create the
, constraint equivalent of procedures in traditional
, programming languages. This procedural abstraction
, allows constraints to model a wide array of dynamic
, application behavior, simplifies the implementation of
, structured object and demonstrational systems, and
, improves the storage and efficiency of highly
, interactive, graphical applications. It also promotes
, a simpler, more effective style of programming than
, conventional constraints. Constraints that use
, pointer variables are powerful enough to allow a
, comprehensive user interface toolkit to be built for
, the first time on top of a constraint system.
file hence.ieee
title HeNCE: Graphical Development Tools for Network-Based
, Concurrent Computing
by Adam Beguelin, Jack J. Dongarra, G.A. Geist, Robert Manchek,
, Keith Moore, V. S. Sunderam, and Reed Wade.
for Wide area computer networks have become a basic part of
, today's computing infrastructure.
, These networks connect a variety of machines, presenting an
, enormous computing resource.
, In this project we focus on developing methods
, and tools which allow a programmer to tap into this resource.
, In this talk we describe HeNCE,
, a tool and methodology under development that assists a
, programmer in developing programs to execute on a networked group of
, heterogeneous machines.
,
, HeNCE is implemented on top of a system called PVM
, (Parallel Virtual Machine).
, PVM is a software package that allows
, the utilization of a heterogeneous network of parallel and serial
, computers as a single computational resource. PVM provides
, facilities for spawning, communication, and
, synchronization of processes over a
, network of heterogeneous machines. While PVM provides the low
, level tools for implementing parallel programs,
, HeNCE provides the programmer
, with a higher level abstraction for specifying parallelism.
file siampvm.ps
by A. Beguelin, J. Dongarra, A. Geist, R. Manchek & V. Sunderam,
title Solving Computational Grand Challenges Using a Network of Heterogeneous
, Supercomputers
ref Proceedings of the Fifth SIAM Conference on Parallel
, Processing for Scientific Computing, pp. 596-601, March 25-27, 1991.
for This paper describes simple experiments connecting a Cray XMP, an
, Intel iPSC/860, and a Thinking Machines CM2 together over a high
, speed network to form a much larger virtual computer. It also
, describes our experience with running a Computational Grand Challenge
, on a Cray XMP and an iPSC/860 combination. The purpose of the
, experiments is to demonstrate the power and flexibility of the PVM
, (Parallel Virtual Machine) system to allow programmers to exploit a
, diverse collection of the most powerful computers available to solve Grand
, Challenge problems.
file ut-cs-92-168.ps
by Jack J. Dongarra & H.A. Van der Vorst,
title Performance of Various Computers Using Standard Sparse Linear Equations
, Solving Techniques
ref University of Tennessee Technical Report CS-92-168,
, February 1992.
for The LINPACK benchmark has become popular in the past few years
, as a means of measuring floating-point performance on computers.
, The benchmark shows in a simple and direct way what performance is to
, be expected for a range of machines when doing dense matrix computations.
, We present performance results for sparse matrix computations that
, use an iterative approach.
file ut-cs-92-154.ps
by Bruce MacLennan
title $L_p$-Circular Functions
ref University of Tennessee Technical Report CS-92-154, May 1992.
for In this report we develop the basic properties of a set of
, functions analogous to the circular and hyperbolic functions,
, but based on $L_p$ circles. The resulting identities may simplify
, analysis in $L_p$ spaces in much the way that the circular functions
, do in Euclidean space. In any case, they are a pleasing example of
, mathematical generalization.
file ut-cs-92-172.ps
by Bruce J. MacLennan
title Research Issues in Flexible Computing: Two Presentations in Japan
ref University of Tennessee Technical Report CS-92-172, September 1992.
for This report contains the text of two presentations made in
, Japan in 1991, both of which deal with the Japanese ``Real World
, Computing Project'' (previously known as the ``New Information
, Processing Technology,'' and informally as the ``Sixth Generation Project'').
file ut-cs-92-174.ps
by Bruce MacLennan
title Field Computation in the Brain
ref University of Tennessee Technical Report
, CS-92-174, October 1992.
for We begin with a brief consideration of the {\it topology of knowledge}.
, It has traditionally been assumed that true knowledge must be represented
, by discrete symbol structures, but recent research in psychology,
, philosophy and computer science has shown the fundamental importance of
, {\it subsymbolic} information processing, in which knowledge is represented
, in terms of very large numbers--or even continua--of {\it microfeatures}.
, We believe that this sets the stage for a fundamentally new
, theory of knowledge, and we sketch a theory of continuous information
, representation and processing. Next we consider {\it field computation},
, a kind of continuous information processing that emphasizes spatially
, continuous {\it fields} of information. This is a reasonable
, approximation for macroscopic areas of cortex and provides a convenient
, mathematical framework for studying information processing at this level.
, We apply it also to a linear-systems model of dendritic information
, processing. We consider examples from the visual cortex,
, including Gabor and
, wavelet representations, and outline field-based theories of sensorimotor
, intentions and of model-based deduction.
file ut-cs-92-180.ps
by Bruce MacLennan
title Information Processing in the Dendritic Net
ref University of Tennessee Technical Report CS-92-180, October 1992.
for The goal of this paper is a model of the dendritic net that:
, (1) is mathematically tractable, (2) is reasonably true to the
, biology, and (3) illuminates information processing in the neuropil.
, First I discuss some general principles of mathematical modeling in a
, biological context that are relevant to the use of linearity and
, orthogonality in our models. Next I discuss the hypothesis that
, the dendritic net can be viewed as a linear field computer. Then I
, discuss the approximations involved in analyzing it as a dynamic,
, lumped-parameter, linear system. Within this basically linear framework
, I then present: (1) the self-organization of matched filters and
, of associative memories; (2) the dendritic computation of Gabor and other
, nonorthogonal representations; and (3) the possible effects of
, reverse current flow in neurons.
file oopsla.ps
by Brad A. Myers, Dario A. Giuse, & Brad Vander Zanden
title Declarative Programming in a Prototype-Instance System: Object-Oriented
, Programming Without Writing Methods
ref Sigplan Notices, Vol.~27,
, No.~10, October 1992, pp.~184-200.
for Most programming in the Garnet system uses a declarative style that
, eliminates the need to write new methods. One implication is that
, the interface to objects is typically through their data values.
, This contrasts significantly with other object systems where writing
, methods is the central mechanism of programming. Four features are
, combined in a unique way in Garnet to make this possible: the use
, of a prototype-instance object system with structural inheritance, a
, retained-object model where most objects persist, the use of constraints
, to tie the objects together, and a new input model that makes writing
, event handlers unnecessary. The result is that code is easier to
, write for programmers, and also easier for tools, such as interactive,
, direct manipulation interface builders, to generate.
file ut-cs-92-152.ps
by Marc D. VanHeyningen & Bruce J. MacLennan,
title A Constraint Satisfaction Model for Perception of Ambiguous Stimuli
ref University of Tennessee Technical Report CS-92-152, April 1992.
for Constraint satisfaction networks are natural models of the interpretation
, of ambiguous stimuli, such as Necker cubes. Previous constraint
, satisfaction models have simulated the initial interpretation of a
, stimulus, but have not simulated the dynamics of perception, which
, includes the alternation of interpretations and the phenomena known
, as bias, adaptation and hysteresis. In this paper we show that these
, phenomena can be modeled by a constraint satisfaction network {\it with
, fatigue}, that is, a network in which unit activities decay in time.
, Although our model is quite simple, it nevertheless exhibits some
, key characteristics of the dynamics of perception.
file ut-cs-93-194.ps
by Michael Berry, Theresa Do, Gavin O'Brien,
, Vijay Krishna, & Sowmini Varadhan,
title SVDPACKC (Version 1.0) User's Guide,
ref University of Tennessee Technical Report CS-93-194,
, April 1993.
for SVDPACKC comprises four numerical (iterative) methods
, for computing the singular value decomposition (SVD)
, of large sparse matrices using ANSI C. This software
, package implements Lanczos and subspace
, iteration-based methods for determining several of the
, largest singular triplets (singular values and
, corresponding left- and right-singular vectors) for
, large sparse matrices. The package has been ported to
, a variety of machines ranging from supercomputers to
, workstations: CRAY Y-MP, IBM RS/6000-550,
, DEC 5000-100, HP 9000-750, SPARCstation 2, and
, Macintosh II/fx. This document {\it (i)} explains each
, algorithm in some detail, {\it (ii)} explains the
, input parameters for each program, {\it (iii)}
, explains how to compile/execute each program, and
, {\it (iv)} illustrates the performance of each method
, when we compute lower rank approximations to sparse
, {\it term-document} matrices from information
, retrieval applications. A user-friendly software
, interface to the package for UNIX-based systems and
, the Macintosh II/fx is also described.
file ut-cs-93-195.ps
by Brian Howard LaRose
title The Development and Implementation of a Performance
, Database Server
ref University of Tennessee Technical Report CS-93-195,
, August 1993.
for The process of gathering, archiving, and distributing
, computer benchmark data is a cumbersome task usually
, performed by computer users and vendors with little
, coordination. Most importantly, there is no
, publicly-available central depository of performance
, data for all ranges of machines: supercomputers to
, personal computers. We present an Internet-accessible
, performance database server (PDS) which can be used to
, extract current benchmark data and literature. As an
, extension to the X-Windows-based user interface
, (Xnetlib) to the Netlib archival system, PDS provides
, an on-line catalog of public-domain computer
, benchmarks such as the Linpack Benchmark, Perfect
, Benchmarks, and the Genesis benchmarks. PDS does not
, reformat or present the benchmark data in any way
, which conflicts with the original methodology of any
, particular benchmark, and is thereby devoid of any
, subjective interpretations of machine performance.
, We feel that all branches (academic and industrial) of
, the general computing community can use this facility
, to archive performance metrics and make them readily
, available to the public. PDS can provide a more
, manageable approach to the development and support of
, a large dynamic database of published performance
, metrics.
file ut-cs-93-196.ps
by Douglas J. Sept
title The Design, Implementation and Performance of a
, Queue Manager for PVM
ref University of Tennessee Technical Report CS-93-196,
, August 1993.
for The PVM Queue Manager (QM) application addresses
, some of the load balancing problems associated with
, the heterogeneous, multi-user, computing environments
, for which PVM was designed. In such environments, PVM
, is not only confronted with the difficulties of
, distributing tasks among machines of variable loads,
, it must also contend with machines of varying
, performance levels in the same virtual machine. The
, QM addresses both of these problems using two
, different load balancing techniques, one static, the
, other dynamic. In its simplest (static) mode, the QM
, will initiate PVM processes for the user on demand,
, taking into account information such as the peak
, megaflops/sec and actual load of each machine. In
, addition to the initiation of processes, the QM will
, also accept tasks to be completed by a specified PVM
, process type. These tasks are shipped to the QM where
, they are kept in a FIFO queue. Worker processes in
, the virtual machine send idle messages to the QM when
, they are ready for a task, and the QM ships a task to
, the process if there is one (of a type matching the
, process) in the queue. The QM also maintains a list
, of idle processes and chooses the {\em best} one for
, the task, should one arrive when several processes
, are idle. Since faster machines typically send more
, idle messages (and receive more tasks) than slower
, ones, this provides a level of dynamic load balancing
, for the system. Three applications have already been
, implemented using the QM within PVM: a Mandelbrot
, image generator, a conjugate-gradient algorithm, and
, a map analysis program used in landscape ecology
, applications. Benchmarks of elapsed wall-clock time
, comparing standard PVM versions with the QM-based
, versions demonstrate substantial performance gains for
, both methods of load balancing. When processing a
, $1000 \times 1000$ image, for example, the QM-based
, Mandelbrot application averaged 63.92 seconds,
, compared to 139.62 seconds for the standard PVM
, version in a heterogeneous network of five
, workstations (comprised of Sun4's and an
, IBM RS/6000).
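The dynamic mode described in the abstract above (a FIFO task queue, idle messages from workers, and a list of idle processes from which the best one is chosen) can be modeled with a small Python sketch. It is a toy model, not the QM's actual interface: the class and method names are hypothetical, and "best" is assumed here to mean the idle worker with the highest speed rating.

```python
from collections import deque


class QueueManager:
    """Toy model of a FIFO task queue combined with an idle-worker list."""

    def __init__(self, workers):
        self.workers = workers    # name -> (process_type, assumed speed rating)
        self.queue = deque()      # pending (task_type, payload), oldest first
        self.idle = []            # names of workers waiting for a task
        self.log = []             # (worker, payload) in dispatch order

    def submit(self, task_type, payload):
        """Hand the task to the best matching idle worker, else queue it."""
        candidates = [w for w in self.idle if self.workers[w][0] == task_type]
        if candidates:
            best = max(candidates, key=lambda w: self.workers[w][1])
            self.idle.remove(best)
            self.log.append((best, payload))
        else:
            self.queue.append((task_type, payload))

    def idle_message(self, worker):
        """Give the worker the oldest queued task of its type, if any."""
        wtype = self.workers[worker][0]
        for i, (task_type, payload) in enumerate(self.queue):
            if task_type == wtype:
                del self.queue[i]             # safe: we return immediately
                self.log.append((worker, payload))
                return
        self.idle.append(worker)


qm = QueueManager({"fast": ("mandel", 40.0), "slow": ("mandel", 10.0)})
qm.idle_message("slow")          # nothing queued: slow joins the idle list
qm.idle_message("fast")          # nothing queued: fast joins the idle list
qm.submit("mandel", "row-0")     # both idle: the faster worker is chosen
qm.submit("mandel", "row-1")     # only slow is idle
qm.submit("mandel", "row-2")     # nobody idle: task waits in the queue
qm.idle_message("fast")          # fast finishes first and takes row-2
```

In this run the faster worker ends up with two of the three tasks, which is the load-balancing effect the abstract attributes to faster machines sending idle messages more often.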
file ut-cs-93-197.ps
by Karen Stoner Minser
title Parallel Map Analysis on the CM-5 for Landscape
, Ecology Models
ref University of Tennessee Technical Report CS-93-197,
, August 1993.
for In landscape ecology, computer modeling is used to
, assess habitat fragmentation and its ecological
, implications. Specifically, maps (2-D grids) of
, habitat clusters are analyzed to determine numbers,
, sizes, and geometry of clusters. Previous ecological
, models have relied upon sequential Fortran-77 programs
, which have limited the size and density of maps that
, can be analyzed. To efficiently analyze relatively
, large maps, we present parallel map analysis software
, implemented on the CM-5. For algorithm development,
, random maps of different sizes and densities were
, generated and analyzed. Initially, the Fortran-77
, program was rewritten in C, and the sequential cluster
, identification algorithm was improved and implemented
, as a recursive or nonrecursive algorithm. The major
, focus of parallelization was on cluster geometry using
, C with CMMD message passing routines. Several
, different parallel models were implemented: host/node,
, hostless, and host/node with vector units (VUs). All
, models obtained some speed improvements when compared
, against several RISC-based workstations. The
, host/node model with VUs proved to be the most
, efficient and flexible with speed improvements for a
, $512\times 512$ map of 187, 95, and 20 over the Sun
, Sparc 2, HP 9000-750, and IBM RS/6000-350,
, respectively. When tested on an actual map produced
, through remote imagery and used in ecological studies
, this same model obtained a speed improvement of 119
, over the Sun Sparc 2.
file ut-cs-92-157.ps
title HeNCE: A Users' Guide Version 1.2
by Adam Beguelin, Jack Dongarra, G. A. Geist, Robert Manchek,
, Keith Moore, Reed Wade, Jim Plank, and Vaidy Sunderam
ref University of Tennessee Technical Report CS-92-157
for HeNCE, Heterogeneous Network Computing Environment, is a graphical
, parallel programming environment. HeNCE provides an easy-to-use
, interface for creating, compiling, executing, and debugging parallel
, programs. HeNCE programs can be run on a single Unix workstation or
, over a network of heterogeneous machines, possibly including
, supercomputers. This report describes the installation and use of the
, HeNCE software.
file ut-cs-93-191.ps
title Software Distribution Using XNETLIB
by Jack Dongarra, Tom Rowan and Reed Wade
ref University of Tennessee Technical Report CS-93-191
for Xnetlib is a new tool for software distribution. Whereas its
, predecessor netlib uses e-mail as the user interface
, to its large collection of public-domain mathematical software,
, Xnetlib uses an X-Window interface and socket-based communication.
, Xnetlib makes it easy to search through a large distributed collection
, of software and to retrieve requested software in seconds.
file ut-cs-93-207.ps
title Data-parallel Implementations of Map Analysis and Animal Movement
, for Landscape Ecology Models
by Ethel Jane Comiskey
file ut-cs-93-213.ps
title Public International Benchmarks for Parallel Computers
by assembled by Roger Hockney (chairman) and Michael Berry (secretary)
ref PARKBENCH Committee: Report-1, November 17, 1993
file ornl-tm-11669.ps
title Fortran Subroutines for Computing the Eigenvalues and Eigenvectors of
, a General Matrix by Reduction to General Tridiagonal Form,
by J. Dongarra, A. Geist, and C. Romine
ref ORNL/TM-11669, 1990.
, (Also appeared in ACM TOMS Vol. 18, No. 4, Dec 1992, pp. 392-400.)
for This paper describes programs to reduce a nonsymmetric matrix to
, tridiagonal form, compute the eigenvalues of the tridiagonal matrix,
, improve the accuracy of an eigenvalue, and compute the corresponding
, eigenvector. The intended purpose of the software is to find a few
, eigenpairs of a dense nonsymmetric matrix faster and more accurately
, than previous methods. The performance and accuracy of the new
, routines are compared to two \eispack\ paths: {\tt RG} and {\tt
, HQR-INVIT}. The results show that the new routines are always more accurate
, and also faster if fewer than 20\% of the eigenpairs are needed.
file ut-cs-89-90.ps
title Advanced Architecture Computers,
by Jack Dongarra and Iain S. Duff,
ref University of Tennessee, CS-89-90, November 1989.
for We describe the characteristics of several recent computers that
, employ vectorization or parallelism to achieve high performance
, in floating-point calculations.
, We consider both top-of-the-range supercomputers and computers
, based on readily available and inexpensive basic units.
, In each case we discuss the architectural base, novel features,
, performance, and cost. We intend to update this report
, regularly, and to this end we welcome comments.
file ornl-tm-12404.ps
title Software Libraries for Linear Algebra Computation on High-Performance
, Computers
by Jack J. Dongarra and David W. Walker
ref Oak Ridge National Laboratory, ORNL TM-12404, August, 1993.
for This paper discusses the design of linear algebra libraries for high
, performance computers. Particular emphasis is placed on the development
, of scalable algorithms for MIMD distributed memory concurrent
, computers. A brief description of the EISPACK, LINPACK, and LAPACK
, libraries is given, followed by an outline of ScaLAPACK, which is a
, distributed memory version of LAPACK currently under development. The
, importance of block-partitioned algorithms
, in reducing the frequency of data movement between different levels
, of hierarchical memory is stressed. The use of such algorithms
, helps reduce the message startup costs on distributed memory concurrent
, computers. Other key ideas in our approach are the use of distributed
, versions of the Level 3 Basic Linear Algebra Subprograms (BLAS) as
, computational building blocks, and the use of Basic
, Linear Algebra Communication Subprograms (BLACS) as communication
, building blocks. Together the distributed BLAS and the BLACS can be
, used to construct higher-level algorithms, and hide many details of
, the parallelism from the application developer.
,
, The block-cyclic data distribution is described, and adopted as a good
, way of distributing block-partitioned matrices. Block-partitioned
, versions of the Cholesky and LU factorizations are presented, and
, optimization issues associated with the implementation of the LU
, factorization algorithm on distributed memory concurrent computers
, are discussed, together with its performance on the Intel Delta system.
, Finally, approaches to the design of library interfaces are reviewed.
file ut-cs-93-205.ps
title HeNCE: A Heterogeneous Network Computing Environment,
by Adam Beguelin, Jack Dongarra, Al Geist, Robert Manchek, and Keith Moore
ref University of Tennessee Technical Report CS-93-205
for Network computing seeks to utilize the aggregate resources
, of many networked computers to solve a single problem.
, In so doing it is often possible to obtain supercomputer performance
, from an inexpensive local area network.
, The drawback is that network computing is complicated
, and error prone when done by hand, especially if the computers
, have different operating systems and data formats and are thus
, heterogeneous.
,
, HeNCE (Heterogeneous Network Computing Environment)
, is an integrated graphical environment for creating and running
, parallel programs over a heterogeneous collection of computers.
, It is built on a lower level package called PVM.
, The HeNCE philosophy of parallel programming is to have the programmer
, graphically specify the parallelism of a computation and to automate,
, as much as possible, the tasks of writing, compiling,
, executing, debugging, and tracing the network computation.
, Key to HeNCE is a graphical language based on directed graphs
, that describe the parallelism and data dependencies of an application.
, Nodes in the graphs represent conventional Fortran or C subroutines
, and the arcs represent data and control flow.
,
, This paper describes the present state of HeNCE,
, its capabilities, limitations, and areas of future research.
file ut-cs-93-186.ps
title A Proposal for a User-Level, Message-Passing Interface
, in a Distributed Memory Environment
by Jack J. Dongarra, Rolf Hempel, Anthony J. G. Hey, and David W. Walker
ref University of Tennessee Technical Report CS-93-186
for This paper describes Message Passing Interface 1 (MPI1), a
, proposed library interface standard for supporting point-to-point
, message passing. The intended standard will be provided with
, Fortran 77 and C interfaces, and will form the basis of a standard high
, level communication environment featuring collective communication and
, data distribution transformations. The standard proposed here provides
, blocking and nonblocking message passing between pairs of processes,
, with message selectivity by source process and message type. Provision
, is made for noncontiguous messages. Context control provides a
, convenient means of avoiding message selectivity conflicts between
, different phases of an application. The ability to form and manipulate
, process groups permits task parallelism to be exploited, and is a useful
, abstraction in controlling certain types of collective communication.
file ut-cs-93-214.ps
by Message Passing Interface Forum,
title DRAFT: Document for a Standard Message-Passing
, Interface,
ref University of Tennessee Technical Report CS-93-214,
, October 1993.
for The Message Passing Interface Forum (MPIF), with
, participation from over 40 organizations, has been meeting
, since January 1993 to discuss and define a set of library
, interface standards for message passing. MPIF is not
, sanctioned or supported by any official standards
, organization.
,
, This is a draft of what will become the Final Report,
, Version 1.0, of the Message Passing Interface Forum. This
, document contains all the technical features proposed for
, the interface. This copy of the draft was processed by
, LATEX on October 27, 1993.
,
, MPIF invites comments on the technical content of MPI, as
, well as on the editorial presentation in the document.
, Comments received before January 15, 1994 will be
, considered in producing the final draft of Version 1.0 of
, the Message Passing Interface Specification.
,
, The goal of the Message Passing Interface, simply stated, is
, to develop a widely used standard for writing
, message-passing programs. As such the interface should
, establish a practical, portable, efficient, and flexible
, standard for message passing.
file ut-cs-93-209.ps
title Efficient Communication Operations in Reconfigurable Parallel Computers
by F. Desprez, A. Ferreira, and B. Tourancheau,
ref University of Tennessee Technical Report CS-93-209
for Reconfiguration is largely an unexplored property in the context
, of parallel models of computation. However, it is a powerful concept
, as far as massively parallel architectures are concerned, because it
, overcomes the constraints due to the bisection width arising in
, most distributed memory machines. In this paper, we show how
, to use reconfiguration in order to improve communication operations
, that are widely used in parallel applications. We propose quasi-optimal
, algorithms for broadcasting, scattering, gossiping and multi-scattering.
file ut-cs-93-208.ps
title Trace2au Audio Monitoring Tools for Parallel Programs,
by Jean-Yves Peterschmitt and Bernard Tourancheau
ref University of Tennessee Technical Report CS-93-208, August 1993.
for It is not easy to reach the best performances you can expect of
, a parallel computer. We therefore have to use monitoring programs
, to study the performances of parallel programs. We introduce here
, a way to generate sound in real-time on a workstation, with no
, additional hardware, and we apply it to such monitoring programs.
file ut-cs-93-204.ps
title A General Approach to the Monitoring of Distributed Memory MIMD
, Multicomputers
by Maurice van Riek, Bernard Tourancheau, Xavier-Francois Vigouroux,
ref University of Tennessee Technical Report CS-93-204
for Programs for distributed memory parallel machines are generally
, considered to be much more complex than sequential programs.
, Monitoring systems that collect runtime information about a program
, execution often prove a valuable help in gaining insight into the
, behavior of a parallel program and thus can increase its performance.
, This report describes in a systematic and comprehensive way the
, issues involved in the monitoring of parallel programs for distributed
, memory systems. It aims to provide a structured general approach
, to the field of monitoring and a guide for further documentation.
, First the different approaches to parallel monitoring are presented
, and the problems encountered are discussed and classified.
, In the second part, the main existing systems are described to provide
, the user with a feeling for the possibilities and limitations of
, real tools.
file ut-cs-93-210.ps
by Frederic Desprez, Pierre Fraigniaud, and Bernard Tourancheau
title Successive Broadcasts on Hypercube,
ref University of Tennessee Technical Report CS-93-210,
, August 1993.
for Broadcasting is an information dissemination problem in
, which information originating at one node of a communication
, network must be transmitted to all the other nodes as
, quickly as possible. In this paper, we consider the
, problem in which all the nodes of a network must, by turns,
, broadcast a distinct message. We call this problem the
, successive broadcasts problem. Successive broadcasts is a
, communication pattern that appears in several parallel
, implementations of linear algebra algorithms on distributed
, memory multicomputers. Note that the successive broadcasts
, problem is different from the gossip problem in which all
, the nodes must perform a broadcast in any order, even
, simultaneously. We present an algorithm solving the
, successive broadcasts problem on hypercubes. We derive a
, lower bound on the time of any successive broadcasts
, algorithm that shows that our algorithm is within a factor
, of 2 of optimality.
file ut-cs-94-222.ps
title Netlib Services and Resources, (Rev. 1)
by S. Browne, J. Dongarra, S. Green, E. Grosse, K. Moore, T. Rowan,
, and R. Wade
ref University of Tennessee Technical Report CS-94-222,
, December, 1994.
for The Netlib repository, maintained by the University of Tennessee and
, Oak Ridge National Laboratory, contains freely available software,
, documents, and databases of interest to the numerical, scientific
, computing, and other communities. This report includes both the
, Netlib User's Guide and the Netlib System Manager's Guide, and
, contains information about Netlib's databases, interfaces, and system
, implementation. The Netlib repository's databases include
, the Performance Database, the Conferences Database, and
, the NA-NET mail forwarding and Whitepages Databases. A variety of
, user interfaces enable users to access the Netlib repository in the
, manner most convenient and compatible with their networking
, capabilities. These interfaces include the Netlib email interface,
, the Xnetlib X Windows client, the netlibget command-line TCP/IP
, client, anonymous FTP, anonymous RCP, and gopher.
file ut-cs-94-226.ps
by Makan Pourzandi and Bernard Tourancheau
title A Parallel Performance Study of Jacobi-like
, Eigenvalue Solution
ref University of Tennessee Technical Report CS-94-226,
, March 1994.
for In this report we focus on the Jacobi-like solution of
, the eigenproblem for a real symmetric matrix from a
, parallel performance point of view: we try to optimize the
, algorithm working on the communication intensive part of the
, code. We discuss several parallel implementations and
, propose an implementation that overlaps communication
, with computation to achieve better efficiency. We show
, that the overlapping implementation can lead to significant
, improvements. We conclude by presenting our future work.
file ut-cs-94-229.ps
by James C. Browne, Jack Dongarra, Syed I. Hyder, Keith
, Moore, and Peter Newton,
title Visual Programming and Parallel Computing
ref University of Tennessee Technical Report CS-94-229,
, April 1994.
for Visual programming arguably provides greater benefit
, in explicit parallel programming, particularly coarse grain
, MIMD programming, than in sequential programming.
, Explicitly parallel programs are multi-dimensional objects;
, the natural representations of a parallel program are
, annotated directed graphs: data flow graphs, control flow
, graphs, etc. where the nodes of the graphs are sequential
, computations. The execution of parallel programs is a
, directed graph of instances of sequential computations. A
, visually based (directed graph) representation of parallel
, programs is thus more natural than a pure text string
, language where multi-dimensional structures must be
, implicitly defined. The naturalness of the annotated
, directed graph representation of parallel programs enables
, methods for programming and debugging which are
, qualitatively different and arguably superior to the
, conventional practice based on pure text string languages.
, Annotation of the graphs is a critical element of a
, practical visual programming system; text is still the best
, way to represent many aspects of programs.
,
, This paper presents a model of parallel programming and a
, model of execution for parallel programs which are the
, conceptual framework for a complete visual programming
, environment including capture of parallel structure,
, compilation and behavior analysis (performance and
, debugging). Two visually-oriented parallel programming
, systems, CODE 2.0 and HeNCE, each based on a variant of the
, model of programming, will be used to illustrate the
, concepts. The benefits of visually-oriented realizations of
, these models for program structure capture, software
, component reuse, performance analysis and debugging will be
, explored and hopefully demonstrated by examples in these
, representations. It is only by actually implementing and
, using visual parallel programming languages that we have
, been able to fully evaluate their merits.
file ut-cs-94-230.ps
by Message Passing Interface Forum,
title MPI: A Message-Passing Interface Standard,
ref University of Tennessee Technical Report CS-94-230,
, April 1994.
for The Message Passing Interface Forum (MPIF), with
, participation from over 40 organizations, has been meeting
, since November 1992 to discuss and define a set of library
, standards for message passing. MPIF is not sanctioned or
, supported by any official standards organization.
,
, The goal of the Message Passing Interface, simply
, stated, is to develop a widely used standard for writing
, message-passing programs. As such the interface should
, establish a practical, portable, efficient and flexible
, standard for message passing.
,
, This is the final report, Version 1.0, of the
, Message Passing Interface Forum. This document contains all
, the technical features proposed for the interface. This
, copy of the draft was processed by LATEX on April 21, 1994.
,
, Please send comments on MPI to mpi-comments@cs.utk.edu.
, Your comment will be forwarded to MPIF committee members who
, will attempt to respond.
file ut-cs-94-232.ps
by Robert J. Manchek,
title Design and Implementation of PVM Version 3,
ref University of Tennessee Technical Report CS-94-232,
, May 1994.
for There is a growing trend toward distributed
, computing - writing programs that run across multiple
, networked computers - to speed up computation, solve larger
, problems or withstand machine failures. A programming model
, commonly used to write distributed applications is
, message-passing, in which a program is decomposed into
, distinct subprograms that communicate and synchronize with
, one another by explicitly sending and receiving blocks of
, data.
,
, PVM (Parallel Virtual Machine) is a generic message-passing
, system composed of a programming library and manager
, processes. It ties together separate physical machines
, (possibly of different types), providing communication and
, control between the subprograms and detection of machine
, failures. The resulting virtual machine appears as a
, single, manageable resource. PVM is portable to a wide
, variety of machine architectures and operating systems,
, including workstations, supercomputers, PCs and
, multiprocessors.
,
, This paper describes the design, implementation and
, testing of version 3.3 of PVM and surveys related works.
file ut-cs-94-261.ps
by Peter Newton & Jack Dongarra,
title Overview of VPE: A Visual Environment for Message-Passing
, Parallel Programming,
ref University of Tennessee Technical Report CS-94-261,
, November 1994.
for This document introduces the VPE parallel programming
, environment as it was first conceived.
,
, VPE is a visual parallel programming environment for
, message-passing parallel computing and is intended to
, provide a simple human interface to the process of
, creating message-passing programs. Programmers describe
, the process structure of a program by drawing a graph in
, which nodes represent processes and messages flow on arcs
, between nodes. They then annotate these computation nodes
, with program text expressed in C or Fortran which
, contains simple message-passing calls. The VPE
, environment can then automatically compile, execute, and
, animate the program. VPE is designed to be implemented on
, top of standard message-passing libraries such as PVM and
, MPI.
file vp.ps
by Bruce J. MacLennan,
title Visualizing the Possibilities,
ref Commentary, Behavioral and Brain Sciences (1993) 16:2.
for I am in general agreement with Johnson-Laird \& Byrne's (J-L \& B's)
, approach and find their experiments convincing; therefore my commentary
, will be limited to several suggestions for extending and refining their
, theory.
file ipdn.ps
by Bruce MacLennan,
title Information Processing in the Dendritic Net
ref Ch. 6 of
, Rethinking Neural Networks: Quantum Fields \& Biological Data,
, Karl H. Pribram, ed., Lawrence Erlbaum Associates, Publishers, 1993,
, pp.~161-197.
for The goal of this paper is a model of the dendritic net that: (1) is
, mathematically tractable, (2) is reasonably true to the biology, and
, (3) illuminates information processing in the neuropil. First I
, discuss some general principles of mathematical modeling in a
, biological context that are relevant to the use of linearity
, and orthogonality in our models. Next I discuss the hypothesis that
, the dendritic net can be viewed as a linear field computer. Then
, I discuss the approximations involved in analyzing it as a dynamic,
, lumped-parameter, linear system. Within this basically linear
, framework I then present: (1) the self-organization of matched
, filters and of associative memories; (2) the dendritic computation
, of Gabor and other nonorthogonal representations; and (3) the
, possible effects of reverse current flow in neurons.
file fcb.ps
by Bruce MacLennan
title Field Computation in the Brain,
ref Ch. 7 of Rethinking Neural
, Networks: Quantum Fields \& Biological Data, Karl H. Pribram,
, ed., Lawrence Erlbaum Associates, Publishers, 1993, pp.~199-232.
for We begin with a brief consideration of the topology of knowledge.
, It has traditionally been assumed that true knowledge must be
, represented by discrete symbol structures, but recent research in
, psychology, philosophy and computer science has shown the
, fundamental importance of subsymbolic information processing, in
, which knowledge is represented in terms of very large numbers---or
, even continua---of microfeatures. We believe that this sets the
, stage for a fundamentally new theory of knowledge, and we sketch
, a theory of continuous information representation and processing.
, Next we consider field computation, a kind of continuous information
, processing that emphasizes spatially continuous fields of information.
, This is a reasonable approximation for macroscopic areas of cortex
, and provides a convenient mathematical framework for studying
, information processing at this level. We apply it also to a
, linear-systems model of dendritic information processing. We consider
, examples from the visual cortex, including Gabor and wavelet
, representations, and outline field-based theories of sensorimotor
, intentions and of model-based deduction.
file cckr.ps
by Bruce J. MacLennan
title Characteristics of Connectionist Knowledge Representation,
ref Information Sciences 70, pp. 119-143, 1993.
for Connectionism---the use of neural networks for knowledge
, representation and inference---has profound implications for the
, representation and processing of information because it provides
, a fundamentally new view of knowledge. However, its progress is
, impeded by the lack of a unifying theoretical construct corresponding
, to the idea of a calculus (or formal system) in traditional
, approaches to knowledge representation. Such a construct, called a
, simulacrum, is proposed here, and its basic properties are explored.
, We find that although exact classification is impossible, several
, other useful, robust kinds of classification are permitted. The
, representation of structured information and constituent structure are
, considered, and we find a basis for more flexible rule-like processing
, than that permitted by conventional methods. We discuss briefly
, logical issues such as decidability and computability and show that
, they require reformulation in this new context. Throughout we
, discuss the implications of this new theoretical framework for
, artificial intelligence and cognitive science.
file kohl-contact
for Information on where to contact author, James Arthur Kohl.
file kohl-93-mascots.ps
by T. L. Casavant, J. A. Kohl,
title "The IMPROV Meta-Tool Design Methodology for Visualization
, of Parallel Programs,"
ref Invited Paper, International Workshop on Modeling, Analysis,
, and Simulation of Computer and Telecommunication Systems (MASCOTS),
, January 1993.
for A design methodology is presented that simplifies the creation
, of program visualization tools while maintaining a high degree
, of flexibility and expressive power. The approach is based on a
, "circulation architecture" model that organizes the details of the
, user specification, and provides a formal means for indicating
, relationships. The overall user specification is divided into
, independent modules containing distinct, well-defined entities,
, and the relationships among these module entities are identified
, using a powerful "mapping language". This language maps conditions
, on entities to manipulations that modify entities, resulting in
, dynamic animations of program behavior. The mapping language
, supports arbitrary levels of abstraction providing a full range
, of detail, and allowing efficient view development. To demonstrate
, the feasibility and usefulness of this approach, a specific program
, visualization meta-tool design, IMPROV, is described.
file kohl-92-compsac.tgz
by J. A. Kohl, T. L. Casavant,
title "A Software Engineering, Visualization Methodology
, for Parallel Processing Systems,"
ref Proceedings of the Sixteenth Annual International
, Computer Software & Applications Conference (COMPSAC),
, Chicago, Illinois, September 1992, pp. 51-56.
for This paper focuses on techniques for enhancing the feasibility
, of using graphic visualization in analyzing the complexities of
, parallel software. The central drawback to applying such visual
, techniques is the overhead in developing analysis tools with flexible,
, customized views. The "PARADISE" (PARallel Animated DebuggIng and
, Simulation Environment) system, which has been in operation since 1989,
, alleviates some of this design overhead by providing an abstract,
, object-oriented, visual modeling environment which expedites custom
, visual tool development. PARADISE is a visual tool which is used to
, develop other visual tools, or a "meta-tool". This paper complements
, previous work on PARADISE by describing the philosophy behind its
, design, and how that philosophy leads to a methodology for constructing
, visual models which characterize parallel systems in general. Emphasis
, will be on the crucial issues in utilizing visualization for parallel
, software development, and how PARADISE deals with these issues.
file kohl-92-prop.ps
by J. A. Kohl,
title "The Construction of Meta-Tools for Program Visualization
, of Parallel Software,"
ref Ph.D. Thesis Proposal,
, Written Paper Accompanying Oral Comprehensive Examination,
, Technical Report Number TR-ECE-920204, Department of ECE,
, University of Iowa, Iowa City, IA, 52242, February 1992.
for This proposal provides a design methodology for program visualization
, meta-tools for parallel software that simplifies the use of such tools
, while maintaining a high degree of flexibility and expressive power.
, The approach is based on a "meta-tool circulation architecture"
, model that organizes the details of the user specification, and
, provides a circulation of information which supports a formal means
, for indicating relationships among that information. The overall user
, specification is divided into independent modules containing distinct
, entities, and the relationships among these module entities are
, identified using a powerful "relationship mapping language". This
, language maps conditions on selected entities to manipulations that
, modify the entities, allowing the state of an entity to be controlled
, in terms of the state of any other entity or itself. The mapping
, language supports arbitrary levels of abstraction in manipulating
, entities, allowing a full range of possible detail. As a result,
, visual analyses can be specified efficiently, utilizing only the
, minimum level of detail necessary. To demonstrate the feasibility
, and usefulness of this approach, a specific program visualization
, meta-tool design is proposed based on the methodology.
file kohl-92-ewpc-conf.tgz
by T. L. Casavant, J. A. Kohl, Y. E. Papelis,
title "Practical Use of Visualization for Parallel Systems,"
ref Invited Keynote Address Text for
, 1992 European Workshop on Parallel Computers (EWPC),
, Barcelona, Spain, March 23-24, 1992.
for This paper overviews the major contributions to the field of
, visualization as applied to parallel computing to date.
, Advances have come mostly from academics, but the influence on
, industrial and commercial settings for the future will be dramatic.
, The paper emphasizes how to improve the software development
, process for high-performance parallel computers through the use of
, visualization techniques for program creation as well as for
, debugging, verification, performance tuning, and maintenance.
, A concrete discussion of actual tool behavior is also presented.
file kohl-92-ewpc-full.tgz
by T. L. Casavant, J. A. Kohl, Y. E. Papelis,
title "Practical Use of Visualization for Parallel Systems,"
ref Technical Report Number TR-ECE-920102, Department of ECE,
, University of Iowa, Iowa City, IA, 52242,
, January 1992 (full version of EWPC 92 paper).
for This paper overviews the major contributions to the field of
, visualization as applied to parallel computing to date.
, Advances have come mostly from academics, but the influence on
, industrial and commercial settings for the future will be dramatic.
, The paper emphasizes how to improve the software development
, process for high-performance parallel computers through the use of
, visualization techniques for program creation as well as for
, debugging, verification, performance tuning, and maintenance.
, A concrete discussion of actual tool behavior is also presented.
file kohl-91-ipps.tgz
by J. A. Kohl, T. L. Casavant,
title "Use of PARADISE: A Meta-Tool for Visualizing Parallel Systems,"
ref Proceedings of the Fifth International Parallel Processing
, Symposium (IPPS),
, Anaheim, California, May 1991, pp. 561-567.
for This paper addresses the problem of creating software tools for
, visualizing the dynamic behavior of parallel applications and systems.
, "PARADISE" (PARallel Animated DebuggIng and Simulation Environment)
, approaches this problem by providing a "meta-tool" environment
, for generating custom visual analysis tools. PARADISE is a meta-tool
, because it is a tool which is utilized to create other tools. This
, paper focuses on the user's view of the use of PARADISE for
, constructing tools which analyze the interaction between parallel
, systems and parallel applications. An example of its use, involving
, the PASM Parallel Processing System, is given.
file kohl-91-santafe.ps
by J. A. Kohl, T. L. Casavant,
title "Methodologies for Rapid Prototyping of Tools for
, Visualizing the Performance of Parallel Systems,"
ref Presentation at Workshop on Parallel Computer Systems: Software Tools,
, Santa Fe, New Mexico, October 1991.
for This presentation focuses on the issues encountered in developing
, visualization tools for performance tuning of parallel software. This
, task will be analyzed from the perspective of the user and the
, "meta-tool" designer. The talk will emphasize these two perspectives on
, performance tuning, as well as another approach which utilizes a
, limited tool kit. Then, the current state of the PARADISE tool, a
, meta-tool for analyzing parallel software, will be examined, along
, with other visual tools, to determine the extent to which each tool
, satisfies the goals and guidelines of the previous discussion.
, Finally, directions for future work will be explored.
, ( Note: Presentation slides only. )
file kohl-91-comp.ps
by J. A. Kohl,
title "Visual Techniques for Parallel Processing,"
ref Written Comprehensive Examination,
, University of Iowa, Department of Electrical and Computer Engineering,
, ECETR-910726, July 1991.
for This Comprehensive Examination consists of an accumulation and
, analysis of research on the use of visualization in computing systems
, over the past decade, as well as recent efforts specifically in the
, area of software development for parallel processing. The goal of
, the examination is to determine the relationships among the references
, located, and their cumulative effect in directing the course of future
, research in the field of visualization. The examination includes a
, creative portion in which the various uses and approaches for
, visualization are to be classified via a taxonomical system. This
, classification will identify the central issues which differentiate
, the visualization environments for developing parallel software.
, In addition, a quantitative assessment of these environments will
, be constructed which presents a more concrete evaluation
, and categorization technique.
file kohl-91-901011.tgz
by J. A. Kohl, T. L. Casavant,
title "PARADISE: A Meta-Tool for Program Visualization in
, Parallel Computing Systems,"
ref Technical Report Number TR-ECE-901011, Department of ECE,
, University of Iowa, Iowa City, IA, 52242,
, Revised December 1991.
for This paper addresses the problem of creating software tools for
, visualizing the dynamic behavior of parallel applications and systems.
, "PARADISE" (PARallel Animated DebuggIng and Simulation Environment)
, approaches this problem by providing a "meta-tool" environment for
, generating custom visual analysis tools. PARADISE is a meta-tool
, because it is a tool which is utilized to create other tools.
, The fundamental concept is the use of abstract visual models to
, simulate complex, concurrent behavior. This paper focuses on the
, goals of PARADISE, and reflects on the extent to which the prototype
, system, which has been in operation since 1989, meets these goals.
, The prototype system is described, along with a methodology for
, using visual modeling to analyze parallel software and systems.
, Examples of its use are also given.
file ut-cs-94-263.ps
by Shirley Browne, Jack Dongarra, Stan Green, Keith Moore,
, Tom Rowan, Reed Wade, Geoffrey Fox and Ken Hawick
title Prototype of the National High-Performance Software Exchange
ref University of Tennessee Technical Report CS-94-263,
, December, 1994
for This report describes a short-term effort to construct a prototype
, for the National High-Performance Software Exchange (NHSE).
, The prototype demonstrates how
, the evolving National Information Infrastructure (NII) can be used
, to facilitate sharing of software and information among members of the
, High Performance Computing and Communications (HPCC) community.
, Shortcomings of current information searching and retrieval tools
, are pointed out, and recommendations are given for areas in need
, of further development.
, The hypertext home page for the NHSE is accessible at
, http://www.netlib.org/nse/home.html.
file ut-cs-95-272.ps
by Shirley Browne, Jack Dongarra, Stan Green, Keith
, Moore, Tom Rowan, Reed Wade, Geoffrey Fox, Ken Hawick, Ken
, Kennedy, Jim Pool, and Rick Stevens,
title National HPCC Software Exchange,
ref University of Tennessee Technical Report CS-95-272,
, January 1995.
for This report describes an effort to construct a
, National HPCC Software Exchange (NHSE). This system shows
, how the evolving National Information Infrastructure (NII)
, can be used to facilitate sharing of software and
, information among members of the High Performance Computing
, and Communications (HPCC) community. To access the system
, use the URL: http://www.netlib.org/nse/.
file ut-cs-95-274.ps
by Jack J. Dongarra, Steve W. Otto, Marc Snir, and
, David Walker,
title An Introduction to the MPI Standard,
ref University of Tennessee Technical Report CS-95-274,
, January 1995.
for The Message Passing Interface (MPI) is a portable
, message-passing standard that facilitates the development of
, parallel applications and libraries. The standard defines
, the syntax and semantics of a core of library routines
, useful to a wide range of users writing portable
, message-passing programs in Fortran 77 or C. MPI also forms
, a possible target for compilers of languages such as High
, Performance Fortran. Commercial and free, public-domain
, implementations of MPI already exist. These run on both
, tightly-coupled, massively-parallel machines (MPPs), and on
, networks of workstations (NOWs).
,
, The MPI standard was developed over a year of intensive
, meetings and involved over 80 people from approximately 40
, organizations, mainly from the United States and Europe.
, Many vendors of concurrent computers were involved, along
, with researchers from universities, government laboratories,
, and industry. This effort culminated in the publication of
, the MPI specification. Other sources of information on MPI
, are available or are under development.
,
, Researchers incorporated into MPI the most useful features
, of several systems, rather than choosing one system to adopt
, as the standard. MPI has roots in PVM, Express, P4,
, Zipcode, and Parmacs, and in systems sold by IBM, Intel,
, Meiko, Cray Research, and Ncube.
file ut-cs-95-276.ps
by Henri Casanova, Jack Dongarra, Phil Mucci
title A Test Suite for PVM,
ref University of Tennessee Technical Report CS-95-276,
, June 1995.
for Although PVM is well established in the field of distributed
, computing, the need has been shown for a standard set of
, tests to give its users further confidence in the
, correctness of their installation. This report introduces
, pvm_test and its X interface pvm_test_gui. pvm_test was
, designed to exercise some of PVM's more important functions
, and to provide some primitive measures of its performance.
file ut-cs-95-277.ps
by Philip J. Mucci and Jack Dongarra,
title Possibilities for Active Messaging in PVM,
ref University of Tennessee Technical Report CS-95-277,
, February 1995.
for Active messaging is a communications model designed
, around the interaction of a network interface and its
, driving software in an operating system. By utilizing this
, model, the user can design applications that make better
, use of the available computing and communication resources.
, Currently, successful implementations exist only for a
, certain subset of workstations and network adapters. This
, paper is an exploration into a portable implementation of
, active messaging for possible inclusion in the PVM suite, a
, generalized framework for distributed computing.
file ut-cs-95-278.ps
title Location-Independent Naming for Virtual Distributed Software
, Repositories
by Shirley Browne, Jack Dongarra, Stan Green,
, Keith Moore, Theresa Pepin, Tom Rowan, Reed Wade, Eric Grosse
ref University of Tennessee Technical Report CS-95-278,
, February 1995.
for A location-independent naming system for network resources
, has been designed to facilitate organization and description
, of software components accessible through a virtual distributed
, repository.
, This naming system enables easy and efficient searching and retrieval,
, and it addresses many of the
, consistency, authenticity, and integrity issues involved with
, distributed software repositories by providing mechanisms for
, grouping resources and for authenticity and integrity checking.
, This paper details the design of the naming system, describes
, a prototype implementation of some of the capabilities, and
, describes how the system fits into the development of the National
, HPCC Software Exchange, a virtual software repository that has the goal
, of providing access to reusable software components for
, high-performance computing.
file ut-cs-95-279.ps
title Digital Software and Data Repositories for Support of
, Scientific Computing
by Ronald Boisvert, Shirley Browne, Jack Dongarra, and Eric Grosse
ref University of Tennessee Technical Report CS-95-279,
, February 1995.
for This paper discusses the special characteristics and needs of
, software repositories and describes how these needs have been
, met by some existing repositories. These repositories include
, Netlib, the National HPCC Software Exchange,
, and the GAMS Virtual Repository.
, We also describe some systems that provide on-line access
, to various types of scientific data.
, Finally, we outline a proposal for integrating software and data
, repositories into the world of digital document libraries, in
, particular CNRI's ARPA-sponsored Digital Library project.
file ut-cs-95-288.html
title Distributed Information Management in the National HPCC
, Software Exchange
by Shirley Browne, Jack Dongarra, Geoffrey C. Fox, Ken Hawick,
, Ken Kennedy, Rick Stevens, Robert Olson, Tom Rowan
ref University of Tennessee Technical Report CS-95-288,
, April 1995.
for The National HPCC Software Exchange
, is a collaborative effort by member institutions of the
, Center for Research on Parallel Computation
, to provide network access to HPCC-related software, documents,
, and data.
, Challenges for the NHSE include identifying, organizing, filtering,
, and indexing the rapidly growing wealth of relevant information
, available on the Web.
, The large quantity of information necessitates performing these
, tasks using automatic techniques, many of which make use of parallel
, and distributed computation, but human intervention is needed for
, intelligent abstracting,
, analysis, and critical review tasks. Thus, major goals of
, NHSE research are to find the right mix of
, manual and automated techniques, and to leverage the results of
, manual efforts to the maximum extent possible. This paper describes
, our current information gathering and
, processing techniques, as well as our future plans for integrating
, the manual and automated approaches.
, The NHSE home page is accessible at http://www.netlib.org/nse/.
file ut-cs-95-287.ps
title Management of the NHSE - A Virtual Distributed Digital Library
by Shirley Browne, Jack Dongarra, Ken Kennedy, Tom Rowan
ref University of Tennessee Technical Report CS-95-287,
, April 1995.
for The National HPCC Software Exchange (NHSE) is a distributed collection
, of software, documents, and data of interest to the high performance
, computing community. Our experiences with the design and initial
, implementation of the NHSE are relevant to a number of general digital
, library issues, including the publication process, quality control,
, authentication and integrity, and information retrieval.
, This paper describes an authenticated submission process that is
, coupled with a multilevel review process.
, Browsing and searching tools for aiding with
, information retrieval are also described.
file ut-cs-95-294.ps
by J. J. Dongarra, B. Straughan, and D. W. Walker,
title Chebyshev tau - QZ Algorithm Methods for Calculating
, Spectra of Hydrodynamic Stability Problems,
ref University of Tennessee Technical Report CS-95-294,
, June 1995.
for The Chebyshev tau method is examined in detail for a
, variety of eigenvalue problems arising in hydrodynamic
, stability studies, particularly those of Orr-Sommerfeld
, type. We concentrate on determining the whole of the top
, end of the spectrum in parameter ranges beyond those often
, explored. The method employing a Chebyshev representation
, of the fourth derivative operator, D^4, is compared with
, those involving the second and first derivative operators,
, D^2, D, respectively; the latter two representations require
, use of the QZ algorithm in the resolution of the singular
, generalised matrix eigenvalue problem which arises. The D^2
, method is shown to be different from the stream function -
, vorticity scheme in certain (important and practical) cases.
, Physical problems explored are those of Poiseuille, Couette,
, and pressure gradient driven circular pipe flow. Also
, investigated is the three-dimensional problem of Poiseuille
, flow arising from a normal velocity - normal vorticity
, interaction. Finally, Couette and Poiseuille problems for
, two viscous, immiscible fluids, one overlying the other, are
, studied.
file ut-cs-95-297.ps
by Jack J. Dongarra, Hans W. Meuer, and Erich
, Strohmaier,
title TOP500 Supercomputer Sites,
ref University of Tennessee Technical Report CS-95-297,
, July 1995.
for To provide a better basis for statistics on
, high-performance computers, we list the sites that have the
, 500 most powerful computer systems installed. The best
, LINPACK benchmark performance achieved is used as a
, performance measure in ranking the computers.
file ut-cs-95-299.ps
by Jack J. Dongarra and Tom Dunigan,
title Message-Passing Performance of Various Computers,
ref University of Tennessee Technical Report CS-95-299,
, July 1995.
for This report compares the performance of different
, computer systems for basic message passing. Latency and
, bandwidth are measured on Convex, Cray, IBM, Intel, KSR,
, Meiko, nCUBE, NEC, SGI, and TMC multiprocessors.
, Communication performance is contrasted with the
, computational power of each system. The comparison includes
, both shared and distributed memory computers as well as
, networked workstation clusters.
file ut-cs-95-301.ps
by Henri Casanova, Jack Dongarra, and Weicheng Jiang,
title The Performance of PVM on MPP Systems,
ref University of Tennessee Technical Report CS-95-301,
, August 1995.
for PVM (Parallel Virtual Machine) is a popular standard
, for writing parallel programs so that they may execute over
, a network of heterogeneous machines. This paper presents
, some performance results of PVM on three massively parallel
, processing systems: the Thinking Machines CM-5, the Intel
, Paragon, and the IBM SP-2. We describe the basics of the
, communication model of PVM and its communication routines.
, We then compare its performance with native message-passing
, systems on the MPPs.
file ut-cs-95-310.ps
by Jack Dongarra, Loic Prylli, Cyril Randriamaro, and
, Bernard Tourancheau,
title Array Redistribution in ScaLAPACK using PVM,
ref University of Tennessee Technical Report CS-95-310,
, October 1995.
for Linear algebra on distributed-memory parallel
, computers raises the problem of data distribution of
, matrices and vectors among the processes. Block-cyclic
, distribution works well for most algorithms. The block size
, must be chosen carefully, however, in order to achieve good
, efficiency and good load balancing. This choice depends
, heavily on each operation; hence, it is essential to be able
, to go from one distribution to another very quickly. We
, present here the algorithms implemented in the ScaLAPACK
, library, and we discuss timing results on a network of
, workstations and on a Cray T3D using PVM.
file ut-cs-95-312.ps
by Shirley Browne and Tom Rowan,
title Assessment of the NHSE Software Submission and
, Review Process,
ref University of Tennessee Technical Report CS-95-312,
, November 1995.
for An NHSE Software submission trial run was conducted
, to facilitate evaluation of the submission and review
, process. This document describes the experiment and
, assesses the current state of the NHSE software submission
, and review process.
file ut-cs-95-313.ps
by Henri Casanova and Jack Dongarra,
title NetSolve: A Network Server for Solving
, Computational Science Problems,
ref University of Tennessee Technical Report CS-95-313,
, November 1995.
for This paper presents a new system, called NetSolve,
, that allows users to access computational resources, such as
, hardware and software, distributed across the network. This
, project has been motivated by the need for an easy-to-use,
, efficient mechanism for using computational resources
, remotely. Ease of use is obtained as a result of different
, interfaces, some of which do not require any programming
, effort from the user. Good performance is ensured by a
, load-balancing policy that enables NetSolve to use the
, computational resources available as efficiently as possible.
, NetSolve is designed to run on any heterogeneous network and
, is implemented as a fault-tolerant client-server
, application.
file ut-cs-96-318.ps
by Jack J. Dongarra and Horst D. Simon,
title High Performance Computing in the U.S. in 1995 - An
, analysis on the Basis of the TOP500 List,
ref University of Tennessee Technical Report CS-96-318,
, January 1996.
for In 1993, for the first time, a list of the top 500
, supercomputer sites worldwide was made available. The
, TOP500 list allows a much more detailed and well founded
, analysis of the state of high performance computing.
, Previously data such as the number and geographical
, distribution of supercomputer installations were difficult
, to obtain, and only a few analysts undertook the effort to
, track the press releases by dozens of vendors. With the
, TOP500 report now generally and easily available it is
, possible to present an analysis of the state of High
, Performance Computing (HPC) in the U.S. This note
, summarizes some of the most important observations about HPC
, in the U.S. as of late 1995, in particular the continued
, dominance of the world market in HPC by the U.S., the market
, penetration by commodity microprocessor based systems, and
, the growing industrial use of supercomputers.
file ut-cs-96-325.ps
by Aad J. van der Steen and Jack J. Dongarra,
title Overview of Recent Supercomputers,
ref University of Tennessee Technical Report CS-96-325,
, April 1996.
for In this report we give an overview of parallel- and
, vector computers which are currently available or will become
, available within a short time frame from vendors; no attempt
, is made to list all machines that are still in the research
, phase. The machines are described according to their
, architectural class. Shared- and distributed memory SIMD-
, and MIMD machines are discerned. The information about each
, machine is kept as compact as possible. Moreover, no attempt
, is made to quote prices as these are often even more elusive
, than the performance of a system. This document reflects the
, technical state of the supercomputer arena as accurately as
, possible. However, neither the authors nor their employers take any
, responsibility for errors or mistakes in this document. We
, encourage anyone who has comments or remarks on the contents
, to inform us, so we can improve this work.
file ut-cs-96-329.ps
by Shirley Browne, Jack Dongarra, Kay Hohn, and Tim
, Niesen,
title Software Repository Interoperability,
ref University of Tennessee Technical Report CS-96-329,
, July 1996.
for A number of academic, commercial, and government
, software repositories currently exist that provide access to
, software packages, reusable software components, and related
, documents, either via the Internet or via
, intraorganizational intranets. It is highly desirable,
, both for user convenience and savings in duplication of
, effort, that these repositories interoperate. This paper
, describes interoperability standards that have already been
, developed as well as those under development by the Reuse
, Library Interoperability Group (RIG). These standards
, include a data model for a common semantics for describing
, software resources, as well as frameworks for describing
, software certification policies and intellectual property
, rights. The National HPCC Software Exchange (NHSE) is
, described as an example of an organization that is achieving
, interoperation between government and academic HPCC software
, repositories, in part through adoption of RIG standards.
file ut-cs-96-342.ps
by Jack J. Dongarra, Hans W. Meuer, and Erich
, Strohmaier,
title TOP500 Report 1996,
ref University of Tennessee Technical Report CS-96-342,
, November 1996.
for This report is a snapshot of the state of
, supercomputer installations in the world. It is based on
, the TOP500 list that was published in November 1996 and
, includes trends from the previous lists from June 1993 till
, November 1996.
file ut-cs-96-343.ps
by Henri Casanova, Jack Dongarra, and Keith Seymour,
title Client User's Guide to NetSolve
ref University of Tennessee Technical Report CS-96-343,
, December 1996.
for The NetSolve system, developed at the University of
, Tennessee, is a client-server application designed to solve
, computational science problems over a network. Users may
, access NetSolve computational servers through C, Fortran,
, MATLAB, or Java interfaces. This document briefly presents
, the basics of the system. It then describes in detail, with
, numerous examples, how the different clients can contact the
, NetSolve system to have some computation performed.
, Complete reference manuals are given in the appendixes.
file ut-cs-96-338.ps
by Shirley V. Browne and James W. Moore,
title Reuse Library Interoperability and the World Wide
, Web,
ref University of Tennessee Technical Report CS-96-338,
, October 1996.
for The Reuse Library Interoperability Group (RIG) was
, formed in 1991 for the purpose of drafting standards
, enabling the interoperation of software reuse libraries. At
, that time, prevailing wisdom among many reuse library
, operators was that each should be a stand-alone operation.
, Many operators saw a need for only a single library, their
, own, and most strived to provide the most general possible
, services to appeal to a broad community of users. The ASSET
, program, initiated by the Advanced Research Project Agency
, STARS program, was the first to make the claim that it
, should properly be one part of a network of interoperating
, libraries. Shortly thereafter, the RIG was formed,
, initially as a collaboration between the STARS program and
, the Air Force RAASP program, but growing within six months
, to a self-sustaining cooperation among twelve chartering
, organizations. The RIG has grown to include over twenty
, members from government, industry, and academic reuse
, libraries. It has produced a number of technical reports
, and proposed interoperability standards, some of which are
, described in this report.
file ut-cs-97-346.ps
by Keith Moore, Shirley Browne, Jason Cox, and Jonathan
, Gettler,
title Resource Cataloging and Distribution System,
ref University of Tennessee Technical Report CS-97-346,
, January 1997.
for We describe an architecture for cataloging the
, characteristics of Internet-accessible resources, for
, replicating such resources to improve their accessibility,
, and for registering the current locations of the resources
, so replicated. Message digests and public-key
, authentication are used to ensure the integrity of the files
, provided to users. The service is designed to provide
, increased functionality with only minimal changes to either
, a client or a server. Resources can be named either by URNs
, or by existing URLs, and either type of resource name can be
, resolved to a description and ultimately to a set of
, locations from which the resource can be retrieved.
file ut-cs-97-350.ps
by Pierre-Yves Calland, Jack Dongarra, and Yves Robert,
title Tiling with limited resources,
ref University of Tennessee Technical Report CS-97-350,
, February 1997.
for In the framework of perfect loop nests with uniform
, dependences, tiling has been extensively studied as a
, source-to-source program transformation. Little work has
, been devoted to the mapping and scheduling of the tiles
, onto physical processors. We present several new results in
, the context of limited computational resources, and assuming
, communication-computation overlap. In particular, under
, some reasonable assumptions, we derive the optimal mapping
, and scheduling of tiles to physical processors.
file ut-cs-97-351.ps
by Ronald F. Boisvert, Shirley V. Browne, Jack J.
, Dongarra, Eric Grosse, and Bruce Miller,
title Interactive and Dynamic Content in Software
, Repositories,
ref University of Tennessee Technical Report CS-97-351,
, February 1997.
for The goal of our software repository research is to
, improve access to tools for doing computational science for
, both expert and non-expert users. We are exploring the use
, of emerging Web and network technologies for enhancing
, repository usability and interactivity. Technologies such
, as Java, Inferno/Limbo, and remote execution services can
, interactively assist users in searching for, selecting, and
, using scientific software and computational tools. This
, paper describes various related prototype experimental
, interfaces and services we have developed for traversing a
, software classification hierarchy, for selection of software
, and test problems, and for remote execution of library
, software. After developing and testing our research
, prototypes, we deploy them in working network services
, useful to the computational science community.
file ut-cs-97-354.ps
by Erich Strohmaier,
title Statistical Performance Modeling: Case Study of the
, NPB 2.1 Results,
ref University of Tennessee Technical Report CS-97-354,
, March 1997.
for With the results of version 2.1, a consistent set
, of performance measurements of the NAS Parallel Benchmarks
, (NPB) is available. Unchanged portable MPI code was used
, for this set of 269 single measurements. In this study we
, investigate how this amount of information can be condensed.
, We present a methodology for analyzing performance data not
, requiring detailed knowledge of the codes. For this we
, study several different generic timing models and fit the
, reported data. We show that with a joint timing model for
, all codes and all systems the data can be fitted reasonably
, well. This model also contains only a minimal set of free
, parameters. This method is usable in all cases where the
, analysis of results from complex application code benchmarks
, is necessary.
file ut-cs-97-360.ps
by Frederic Desprez, Jack Dongarra, Fabrice Rastello,
, and Yves Robert,
title Determining the Idle Time of a Tiling: New Results,
ref University of Tennessee Technical Report CS-97-360,
, May 1997.
for In the framework of perfect loop nests with uniform
, dependencies, tiling has been studied extensively as
, a source-to-source program transformation. We build
, upon recent results by Hogsted, Carter, and Ferrante
, [10], who aim at determining the cumulated idle time
, spent by all processors while executing the partitioned
, (tiled) computation domain. We propose new, much
, shorter proofs of all their results and extend these
, in several important directions. More precisely, we
, provide an accurate solution for all values of the
, rise parameter that relates the shape of the iteration
, space to that of the tiles, and for all possible
, distributions of the tiles to processors. In contrast,
, the authors in [10] deal only with a limited number of
, cases and provide upper bounds rather than exact
, formulas.
file ut-cs-97-371.ps
by Antoine Petitet
title Algorithmic Redistribution Methods for Block Cyclic
, Decompositions
ref University of Tennessee Technical Report CS-97-371,
, July 1997.
for This research aims at creating and providing a framework
, to describe algorithmic redistribution methods
, for various block cyclic decompositions. To do so,
, properties of this data distribution scheme are
, formally exhibited. The examination of a number of
, basic dense linear algebra operations illustrates the
, application of those properties. This study analyzes
, the extent to which the general two-dimensional block
, cyclic data distribution allows for the expression of
, efficient as well as flexible matrix operations. This
, study also quantifies theoretically and practically
, how much of the efficiency of optimal block cyclic
, data layouts can be maintained.
, First, the general block cyclic decomposition scheme is shown
, to allow for the expression of flexible basic matrix
, operations with little impact on the performance and
, efficiency delivered by optimal and restricted kernels
, available today. Second, block cyclic data layouts,
, such as the purely scattered distribution, which seem
, less promising as far as performance is concerned, are
, shown to be able to achieve optimal performance and
, efficiency for a given set of matrix operations.
, Consequently, this research not only demonstrates that the
, restrictions imposed by the optimal block cyclic data
, layouts can be alleviated, but also that efficiency and
, flexibility are not antagonistic features of the block
, cyclic mappings. These results are particularly relevant
, to the design of dense linear algebra software libraries
, as well as to data parallel compiler technology.
file overview98.ps
by Aad J. van der Steen, Jack Dongarra
title Overview of recent supercomputers
date Feb 1998
size 748k
for In this report we give an overview of parallel and vector
, computers which are currently available or will become
, available within a short time frame from vendors; no
, attempt is made to list all machines that are still in
, the research phase. The machines are described according
, to their architectural class. Shared and
, distributed-memory SIMD and MIMD machines are discerned.
, The information about each machine is kept as compact as
, possible. Moreover, no attempt is made to quote price
, information as this is often even more elusive than the
, performance of a system.