NestStep is a partitioned global address space (PGAS) language
for the bulk-synchronous parallel (BSP) programming model.
NestStep is designed as an
explicitly parallel extension of existing imperative programming
languages such as C or Java.
Several instances of the same NestStep program
run on different processors
and communicate via an interconnection network
and the NestStep run-time system, which
emulates a CRCW-BSP computer with a virtually shared memory
on top of a message-passing system.

The language design of NestStep was developed in 1998-2000 by
Christoph W. Kessler,
then at the University of Trier, Germany.
The NestStep language extensions have been defined for Java
(NestStep-Java, 1998), for
C (NestStep-C, 2000) and for the imperative part of
Modelica (NestStepModelica, 2006).
Implementations of a NestStep run-time system have been written
in Java (1999) and C (2000, 2006).
See the
implementation and download information further below for technical
details of the implementations.
The current version of the NestStep-C run-time system is available for
download.

NestStep is well suited for parallel applications that match the BSP
model, such as most parallel linear algebra computations and iterative
solvers for partial differential equations; see, for instance,
Bisseling's book for a presentation of BSP algorithms in scientific computing.
In contrast, NestStep is not
appropriate for problems that inherently require sequential consistency
or bilateral mutual exclusion synchronization, because these requirements conflict
with the superstep consistency of NestStep.
In certain cases, such applications could be transformed into bulk-synchronous form
without introducing prohibitive overhead. An elaboration on this topic can
be found in the
PhD thesis by Arturo Gonzalez-Escribano (2003).

A superstep
consists of
(1) a phase of local computation of each processor, where only
local variables (and locally held copies of remote variables)
can be accessed, and
(2) a communication phase that sends some data to the processors that
may need them in the next superstep(s), and then
waits for incoming messages and
processes them.
The separating barrier after a superstep is not necessary
if it is implicitly covered by (2). For instance,
if the communication pattern in (2) is a complete exchange,
a processor can proceed to the next superstep as soon as it has received
a message from every other processor.
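
The two phases can be illustrated with a small sequential simulation. This is only a sketch in plain Python, not NestStep code: the names `run_superstep`, `compute` and `process` are hypothetical, and the processors are simulated one after another instead of running in parallel. The communication pattern is a complete exchange, so each simulated processor has everything it needs once it holds one message from every other processor.

```python
def run_superstep(states, compute, process):
    """Simulate one superstep over p = len(states) processors (hypothetical helper)."""
    p = len(states)

    # Phase 1: local computation. Each processor sees only its own state and
    # returns its new state plus a dict {destination rank: payload}.
    results = [compute(rank, states[rank], p) for rank in range(p)]

    # Phase 2a: communication. Deliver every outgoing message to its inbox.
    inboxes = [{} for _ in range(p)]
    for src, (_, outgoing) in enumerate(results):
        for dest, payload in outgoing.items():
            inboxes[dest][src] = payload

    # Phase 2b: each processor processes the messages it has received.
    return [process(rank, results[rank][0], inboxes[rank]) for rank in range(p)]


def compute(rank, state, p):
    # Local work: square the local value, then send it to every other processor.
    squared = state * state
    outgoing = {dest: squared for dest in range(p) if dest != rank}
    return squared, outgoing


def process(rank, state, inbox):
    # Combine the local result with the values received from all other processors.
    return state + sum(inbox.values())


final_states = run_superstep([1, 2, 3, 4], compute, process)
```

After the superstep, every simulated processor holds the same global sum of squares, which it could only compute after receiving all p-1 incoming messages.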

In order to simplify cost predictions,
the BSP machine model is characterized by only four
parameters: the number p of processors,
the per-processor byte transfer rate g,
the latency L (i.e. the minimum time
between subsequent synchronizations), and the processor speed s.

Additionally, some properties of the program dramatically influence
the run time, in particular the communication behaviour.
By convention, the maximum total number of bytes in all outgoing or incoming messages
for a single processor at the end of a specific superstep is denoted by h.
A communication pattern that bounds the maximum communication indegree
and outdegree of any processor by h is called an h-relation.
Hence, a superstep performing balanced local work w and requiring the
system to realize an h-relation in the communication phase
has time complexity w + gh + L.
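
As a worked example of this cost formula, the following sketch computes h from per-processor traffic and then evaluates w + gh + L. The function names and all numbers are made up for illustration. The sketch assumes that g is expressed as a cost per byte and that w, g and L are given in the same time unit, so that the three terms can be added.

```python
def h_relation(sent_bytes, received_bytes):
    """Largest number of bytes any single processor sends or receives."""
    return max(max(sent_bytes), max(received_bytes))


def superstep_cost(w, h, g, L):
    """BSP cost of one superstep: local work + communication + synchronization."""
    return w + g * h + L


# Hypothetical traffic for p = 4 processors, in bytes per processor.
sent = [100, 400, 200, 300]
received = [250, 250, 250, 250]

h = h_relation(sent, received)              # processor 1 sends the most: 400 bytes
cost = superstep_cost(w=1000, h=h, g=2, L=50)
```

With these values the communication term g*h (800) is of the same order as the local work (1000), while the latency term L (50) is minor. Note that the processor with the heaviest traffic determines h for the whole superstep.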

The BSP model, as originally defined, has no support for shared memory.
Also, in BSP there is no support for processor subset synchronization,
i.e. for nesting of supersteps. Thus, programs can only exploit
one-dimensional parallelism or must apply a
flattening transformation
that converts nested parallelism to flat parallelism.
However, automatic flattening by the compiler
has only been achieved for SIMD parallelism (e.g. NESL).
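
To make the idea of flattening concrete, the sketch below converts a nested (irregular) data structure into one flat value vector plus a list of segment lengths, and then computes per-segment sums in a single flat pass over the elements. It is written in plain Python with hypothetical helper names and only illustrates the general technique of segmented vectors. It is not NESL or NestStep code and says nothing about how a compiler derives such a transformation automatically.

```python
def flatten(nested):
    """Turn a list of lists into (flat values, segment lengths)."""
    values = [x for segment in nested for x in segment]
    lengths = [len(segment) for segment in nested]
    return values, lengths


def segment_ids(lengths):
    """For every flat element, the index of the segment it belongs to."""
    return [seg for seg, n in enumerate(lengths) for _ in range(n)]


def segmented_sum(values, lengths):
    """Sum each segment using one flat loop over all elements."""
    sums = [0] * len(lengths)
    for value, seg in zip(values, segment_ids(lengths)):
        sums[seg] += value
    return sums


nested = [[1, 2, 3], [], [4, 5], [6]]
values, lengths = flatten(nested)
sums = segmented_sum(values, lengths)
```

The flat loop in `segmented_sum` ranges over all elements regardless of which inner list they came from, so the work can be distributed evenly even when the inner lists have very different lengths.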

a source-to-source (NestStep-C to C) compiler
(finished in 2010, but without support for nested supersteps yet).

the NestStep-C run-time system, written in C.
Two versions are available:
Cluster-NestStep-C using MPI for interprocessor communication, and
Cell-NestStep-C running on the Cell BE, a heterogeneous multicore processor.

BlockLib,
a library of data-parallel skeletons such as
map, reduce and map-with-overlap that work on
Cell-NestStep-C distributed arrays, was designed and implemented for Cell in 2007/08.

NestStep-Modelica

The NestStepModelica implementation combines an extension
of the Modelica language and compiler to support the NestStep constructs,
with the NestStep-C run-time system.
The language extensions, implemented in MetaModelica,
are limited to the imperative part of the Modelica language, namely
imperative computations of computationally heavy but side-effect free user functions
occurring on the right-hand side of the ordinary differential equation system generated
from a Modelica program. These computations are encapsulated in
Modelica's algorithm construct.
The extended compiler generates C code that is linked with the NestStep-C run-time system.

A first version of the NestStep-C to C front-end compiler,
written by Magnus Holm in his final thesis project (2010), is
now available, along with an extended run-time system for Cell BE
(nested supersteps are not supported yet).
Available from C. Kessler on request.
System requirements (tested on a Windows machine):
Java 2 SDK SE 1.6.x (or later), ANTLRv2, GCC.