CS267: Feb 13, 1996
Programming with pSather

Sather is an object-oriented language designed to be simple, efficient,
safe, and non-proprietary. Sather was developed at the
International Computer Science
Institute, a research institute affiliated with the
computer science department of the University of California at Berkeley.
It was first introduced in 1991. Since then, considerable practical
experience has been obtained with the language by the hundreds of users
making up the Sather community. Sather offers many safety and convenience
features to help programmers avoid common errors and reuse code.

pSather,
the parallel and distributed extension of Sather, presents a shared memory
abstraction to the programmer while allowing explicit placement of data and
threads. pSather adds threads and synchronization mechanisms to the language.
Even though pSather programs can run on distributed computer systems, they
offer a shared memory abstraction across all threads.

Serial Sather has been ported to almost every Unix platform, as well as
to the Macintosh and PCs. The Sather and pSather compilers are now integrated
and distributed jointly. The current version of the pSather compiler is known
to run on:

Sun SMPs running SunOS and Solaris

Clusters of Sun SMPs and single processing machines connected by Myrinet

Clusters of SMPs and single processing machines connected by Ethernet

Meiko CS-2

There have been ports of older versions of pSather to the CM-5.

We begin with an explanation of some of the serial Sather features necessary
for the following discussion of pSather and Sharks and Fish problem 1.

Important Concepts

This section briefly introduces some concepts important to Sather that the
reader may not have been exposed to in C++ or other popular OO languages. It
isn't meant as a complete language tutorial. More information of a tutorial
nature, manuals, class browsers, and numerous examples are available from
the WWW page:
http://www.icsi.berkeley.edu/~sather

Safety

Sather is designed to shield programmers from common sources of bugs.
Two important language features are strong typing and
garbage collection.

Sather programs are strongly typed, so variables
can't point to memory of an incorrect type. For example, in C it is common practice
to freely mix different data types, such as signed and unsigned
integers, characters, and even pointers. This can lead to subtle
bugs and prohibits many compiler optimizations, because very little
is known at compile time about the behavior of pointers
(recall assignment 1 and how much trouble you had to go through to
make sure you knew what code the compiler generated).

Like many object-oriented languages, serial Sather is garbage
collected, so programmers never have to free memory explicitly.
The runtime system does so automatically when it can be proven to be safe.
With explicit deallocation, this work is done by the programmer, and mistakes
can lead to "dangling pointers" and "memory leaks". These problems are severe
enough that a small industry has arisen to provide tools that help C and
C++ programmers find such bugs (e.g., Purify).

Work is under way to complete a garbage collector for pSather. However,
for Assignment 2 you will need to free memory explicitly, or structure
the code to avoid generating garbage.

When checking options have been turned on in a Sather program by
compiler flags (-check all turns on all checking options), the
resulting program cannot crash disastrously or mysteriously. All sources
of errors that cause crashes are either eliminated at compile-time or
funneled into narrow circumstances (such as accessing beyond array bounds)
that are found at run-time precisely at the source of the error.

Separation of Subtyping and Code Inclusion

In many object-oriented languages, the term `inheritance' is used to mean
two things simultaneously:

subtyping - the requirement that a class provide implementations
for the abstract methods in a supertype.

code inheritance - allowing a class to reuse a portion of the
implementation of another class.

Sather provides separate mechanisms for these two concepts.
Abstract classes
represent interfaces: sets of signatures that subtypes of the abstract class
must provide. The name of an abstract class has to start with a ``$'', as in
$FISH. Other kinds of classes provide implementations.
Classes may include implementations from other classes using a special
`include' clause;
this does not affect the subtyping relationship between classes. Separating
these two concepts simplifies the language considerably and makes it easier
to understand code.
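As a small illustration (the class and routine names here are hypothetical,
chosen for this sketch), an abstract class declares signatures, `<' declares
subtyping, and `include' pulls in implementation without creating any subtype
relationship:

```
abstract class $ANIMAL is
   move(dt: FLT);               -- signature every subtype must implement
end;

class MOTION_MIXIN is
   attr pos: FLT;
   step(d: FLT) is pos := pos + d; end;
end;

class SHARK < $ANIMAL is        -- subtyping: SHARK conforms to $ANIMAL
   include MOTION_MIXIN;        -- code inclusion only; SHARK is not a
                                -- subtype of MOTION_MIXIN
   move(dt: FLT) is step(dt); end;
end;
```

A routine expecting a $ANIMAL can be passed a SHARK, while the `include'
of MOTION_MIXIN remains a private implementation detail.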

No Implicit Calls

Sather does as little as possible behind the user's back at runtime.
There are no implicitly constructed temporary objects, and therefore no rules
to learn or circumvent. This extends to
class constructors: all calls that
can construct an object are explicitly written by the programmer.

In Sather, constructors are ordinary routines distinguished only by a
convenient but optional calling syntax. With garbage collection there is no
need for destructors; however, explicit finalization is available when
desired (SYS::destroy).

Sather never converts types implicitly, such as from integer to character,
integer to floating point, single to double precision, or subclass to
superclass. With neither implicit construction nor conversion, Sather
resolves routine overloading (choosing one of several similarly named
operations based on argument types) much more clearly than C++. The
programmer can easily deduce which routine will be called.

"#" is syntactic sugar for function "create". ``::'' is used as
a shorthand, when the type of the lefthand side could be infered from
that of the righthand side: i::=2 is equivalent to i:INT:=2;
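For instance (a minimal sketch; the create arguments for FISH are assumed
for illustration only):

```
f: FISH := FISH::create(x, y);  -- explicit constructor call
g ::= #FISH(x, y);              -- the same call written with the `#' sugar;
                                -- g's type FISH is inferred by ::=
i ::= 2;                        -- shorthand for i: INT := 2
```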

Iteration Abstraction

Earlier versions of Sather used a conventional
until...end statement much like other languages. This
made Sather susceptible to bugs that afflict looping constructs in most
languages. Code which controls loop iteration is known for tricky
``fencepost errors'' (incorrect initialization or termination).
Traditional iteration constructs also require the internal
implementation details of data structures to be exposed when iterating
over their elements.

An important language improvement in Sather 1.0 over earlier versions was
the addition of iterators (or just iters);
please check out our
Iterator Page. Iterators encapsulate user-defined looping control
structures just as routines do for algorithms. Code using iterators is
safe, because the creation, increment, and termination check are bound
together at one point. Each class may define many sorts of iters,
whereas a traditional approach requires a different yet intimately coupled
class for each kind of iteration over the major class.

Iterators are part of the class interface just like routines. Instead
of a return statement, they use yield and quit,
and may only be called in loops. When an iter yields, it returns
control to the calling loop. When it is called in the next iteration
of the loop, execution resumes in the iterator at the statement
following the yield. When an iter quits, it terminates the loop in
which it appears.
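A sketch of a user-defined iter (the class is hypothetical, not from the
standard libraries): each call resumes after the last yield, and falling
off the end of the iter body quits the enclosing loop:

```
class INT_LIST is
   attr arr: ARRAY{INT};

   elt!: INT is
      -- yields the stored integers one by one
      i: INT := 0;
      loop
         until!(i >= arr.size);  -- leaves the internal loop when exhausted
         yield arr[i];           -- returns control to the calling loop
         i := i + 1;             -- execution resumes here on the next call
      end;
   end;                          -- falling off the end quits the calling loop
end;
```

Note how creation, increment, and the termination check of i all live in
one place, inside the iter, rather than being scattered around the call site.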

Iterator names must end with a `!', which textually points out all
places where a loop may exit. The Sather loop construct is simply
loop...end. Built-in iters until!, while!,
and break! offer traditional control constructs. The standard
libraries define many other useful iters in many classes such as
upto! (generate successive numbers), elt! (yield the
elements of a container) and set! (store elements into a
container). Such iterators make it convenient and safe to traverse
complicated data structures by isolating the details of the iteration
from the client of the data structure abstraction.

Iterators are critical for operating on collections of
items. Matrices define iters to yield rows and columns; tree classes
have recursive iters to traverse the nodes in pre-order, in-order and
post-order; graph classes have iters to traverse vertices or edges
breadth-first and depth-first. Other container classes such as hash
tables, queues, etc. all provide iters to yield and insert elements.
Arbitrary iterators may be used together in loops with other code.

Most likely, you will be able to do Assignment 2 using only already
defined iterators. If interested, examine our iterator tutorial
for more information and exercises. Also see the TOPLAS paper
"Iteration Abstraction in Sather" by
Stephan Murer, Stephen Omohundro, David Stoutamire and Clemens Szyperski.

Iterator usage example.

my_fish.elt! produces the elements of my_fish
(which is a list of fish) one by one. A new
position is computed for each produced fish.
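In code, that loop might look like this (move_one_fish is a hypothetical
helper name used only for this sketch):

```
loop
   f ::= my_fish.elt!;             -- yields each fish in turn;
                                   -- quits the loop when the list is exhausted
   move_one_fish(f, current, dt);  -- compute the new position for this fish
end;
```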

pSather

pSather is the parallel extension to Sather. A major goal has been the easy
reuse of serial code in parallel applications. pSather adds support for
threads, synchronization, communication and placement of objects and threads.

Because of volume production, commercial workstations today offer
better potential price/performance for general code than massively
parallel processors. They also fit comfortably in the capital budget
of most research grants. For these reasons networks of workstations
considered as a single parallel computing facility will become an
economically important platform.

Networks of workstations have longer latencies than centralized
machines. In order to achieve high performance, it is important to
organize data so that it does not need to be moved between workstations
in ways that stall waiting computations. Some compiler optimizations
can alleviate this, but generally the programmer must design a layout
intimately integrated with the specific algorithm.

Machines do not have to have large latencies to make data placement
important. Because processor speeds are outpacing memory speeds,
attention to locality can have a profound effect on the performance of
even ordinary serial programs (recall assignment 1). Existing serial
languages can make life difficult for the performance-minded programmer
because they do not allow much leeway in expressing placement. For example,
extensions allowing the programmer to describe array layouts such as
block-cyclic are helpful for matrix-oriented code but of no use for
general data structures.

Some environments expose latencies to the programmer with a distributed
memory model and explicit communication (split-phase). However,
it is easier to program with a shared memory space which uses one name
to refer to each datum no matter where the reference is. High performance
still requires explicit human-directed placement. pSather tries to provide
the best of both worlds; the compiler implements the shared memory abstraction
using the most efficient facilities of the target platform available,
while allowing the programmer to provide placement directives for
control and data (without requiring them).

The memory performance model of pSather has two levels. The basic unit
of location in pSather is the cluster. It is assumed that
reading or writing memory on the same cluster is significantly faster
than on a remote cluster. A cluster corresponds to an efficient group
in the memory hierarchy, and may have more than one processor. For
example, on a network of workstations a cluster would correspond to one
workstation, although that workstation may have multiple processors
sharing a common bus. This model is appropriate for any machine for
which local cached access is significantly faster than general access.

In most languages there is no way to distinguish the locality of data
that is referenced. This is convenient for the programmer but ignores
the realities of modern machines, which introduce penalties for poorly
placed data in the form of cache-line misses, TLB misses, or even
paging to disk. This is especially important for distributed machines
where data may reside on other nodes. pSather allows the programmer
to help the compiler and runtime by providing explicit placement. If
threads or objects are unfixed, the compiler and runtime can attempt to
place the data somewhere suitable.

Threads

In serial Sather there is only one thread of execution; in pSather there
may be many. Multiple threads are similar to multiple serial Sather programs
executing concurrently, but threads share variables of a single namespace.

A new thread is created by executing a fork, which may be a par
or fork statement, a parloop statement, or an attach.
The new thread is a child of the forking thread.
pSather provides operations that can block a thread, making it unable to
execute statements until some condition occurs. pSather threads that are not
blocked will eventually run, but there is no other constraint on the order
of execution of statements between threads that are not blocked. Threads no
longer exist once they terminate. When a pSather program begins execution
it has a single thread corresponding to the main routine.

Fork Statement Example

A fork statement must be syntactically enclosed in a par
statement. Statements in the fork body are executed in a separate
thread. Variables declared outside par (such as dt) are shared
among all threads. Each thread has a copy of locals declared in the
par body (such as t).
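A minimal sketch of the construct just described (the bodies are
placeholders, not code from the solution):

```
dt: FLT := 0.1;       -- declared outside par: shared by all threads
par
   t: FLT := 0.0;     -- declared in the par body: each thread gets its own copy
   fork
      t := t + dt;    -- updates this thread's private copy of t; reads shared dt
   end;
   fork
      t := t - dt;    -- a second thread, with an independent copy of t
   end;
end;                  -- par completes when all threads forked inside terminate
```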

Parloop Statement Example

A parloop statement of the form parloop S1 do S2 end is syntactic sugar for

par
  loop
    S1
    fork
      S2
    end
  end
end
i::=0.upto!(nthreads-1) is evaluated serially. Code bracketed by
do and end is executed in a different thread.
Variables declared outside parloop (such as dt) are shared
among all threads. Each thread has a copy of locals declared in the
parloop body (such as t).
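A sketch of the shape described above (work is a hypothetical per-thread
routine used only for illustration):

```
dt: FLT := 0.1;                  -- declared outside parloop: shared
parloop
   i ::= 0.upto!(nthreads - 1);  -- evaluated serially in the forking thread
do
   t ::= i * dt;                 -- local declared in the body: private per thread
   work(i, t);                   -- body runs concurrently, one thread per i
end;
```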

Attach Statement Example

The left side of the attach statement must be of type $ATTACH
(i.e., a gate). For example, if the lhs is of type GATE{T}, the return
type of the rhs must be T. If the gate is locked by another
thread, the executing thread is suspended until the gate becomes unlocked.
The new thread is attached to the lhs. It receives a unique copy of
every local variable. Changes to the locals by the originating thread
are not observed by the new thread. When the rhs terminates, it detaches
itself from the lhs and enqueues the return value, if any.
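A small sketch, assuming pSather's `:-' attach syntax and a hypothetical
routine compute that returns an INT:

```
g ::= #GATE{INT};  -- gate that will collect an INT result
g :- compute(42);  -- attach: compute runs in a new thread; when it
                   -- returns, the thread detaches and enqueues its
                   -- result on g
r ::= g.dequeue;   -- block until the result arrives, then take it
```

Used this way, a gate behaves like a future: the forking thread can do
other work between the attach and the dequeue.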

Synchronization between tasks often coincides with communication. Many
parallel languages and libraries distinguish primitives for communicating and
synchronizing. Because these so frequently go hand-in-hand, pSather provides
a powerful construct to do both at once. A pSather GATE is a queue
with implicit synchronization. It generalizes many constructs found in other
languages, such as fork/join, barriers, semaphores, futures,
condition variables, and mailboxes .

Gate Features

Signature                    Description                                        Exclusive
---------                    -----------                                        ---------
create:SAME                  Make a new unlocked GATE{T} object with an         N/A
                             empty queue and no attached threads.

size:INT                     Returns the number of elements in the queue.       No
                             [GATE: returns the counter.]

has_thread:BOOL              Returns true if there exists a thread attached     No
                             to the gate.

set(T)                       Replace the head of the queue with the argument,   Yes
[GATE::set]                  or insert into the queue if empty. [GATE: if the
                             counter is zero, set it to one.]

get:T                        Return the head of the queue; do not remove it     Yes
[GATE::get]                  from the queue. Blocks until the queue is not
                             empty. [GATE: blocks until the counter is
                             nonzero.]

enqueue(T)                   Insert the argument at the tail of the queue.      Yes
[GATE::enqueue]              [GATE: increment the counter.]

dequeue:T                    Block until the queue is not empty, then remove    Yes
[GATE::dequeue]              and return the head of the queue. [GATE: block
                             until the counter is nonzero, then decrement.]
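A producer/consumer sketch using the operations above (the loop bounds
are arbitrary, for illustration only):

```
g ::= #GATE{INT};          -- unlocked gate with an empty queue
par
   fork
      i ::= 0;
      loop
         until!(i >= 10);
         g.enqueue(i);     -- producer: insert at the tail
         i := i + 1;
      end;
   end;
   fork
      j ::= 0;
      loop
         until!(j >= 10);
         x ::= g.dequeue;  -- consumer: blocks until an element is available
         j := j + 1;
      end;
   end;
end;
```

No explicit condition variable or flag is needed: the blocking dequeue
provides both the synchronization and the communication.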

Locks

Locks control the blocking and unblocking of threads. `GATE',
`GATE{T}', various `MUTEX's and read/write locks are special synchronization
objects which provide a mutual exclusion lock. A thread acquires a lock,
then holds the lock until it releases it. A single thread may acquire a
lock multiple times recursively; it will be held until a corresponding
number of releases occur. Exclusive locks such as `MUTEX' may only be held
by one thread at a time. In addition to these simple exclusive locks, it is
possible to lock on other more complex conditions.

Locks are acquired by the lock statement. The types of all expressions
following `when' must be subtypes of $LOCK (MUTEX is a subtype of $LOCK).
The statement list following the `then' is called the lock branch. A lock
statement guarantees that all listed locks are atomically acquired before
a lock branch executes. If the lock can't be acquired (for example, when
some other thread holds it), the thread is suspended.
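A sketch of the when/then form described above, listing two locks in one
branch (the variable x is a placeholder):

```
x: INT := 0;
m1 ::= #MUTEX;
m2 ::= #MUTEX;
lock when m1, m2 then  -- both locks are acquired atomically before
                       -- the branch runs, so threads cannot deadlock
                       -- by grabbing m1 and m2 in different orders
   x := x + 1;         -- critical section protected by both locks
end;                   -- both locks released on exit from the branch
```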

pSather Solution for Sharks & Fish 1.

A complete solution for problem 1 is available here. A solution for problem 2 can be obtained here.

There are three main classes: MAIN, FISH, and CURRENT. Class
MAIN contains the top-level parallel loop for the simulation.
Class FISH encapsulates the state of a single fish. It contains
various fish attributes, such as position, mass, and velocity. It also
defines a function that updates these attributes given a force
acting on the fish.

The main job of class CURRENT is to return the current's
force at a given position in space.

After the definitions of various class attributes and local variables in main,
we see calculations of the total number of threads and the number of
fish managed by each thread. cluster_size is a built-in expression
that returns the number of processors on the calling cluster. For example,
on the ICSI machines that you will use for assignment 2, it should return 4,
since each SMP has 4 processors. clusters is another built-in
expression that returns the number of clusters in the network.
Since you will be using a single stand-alone multiprocessor, it should
return 1. Fish are distributed equally among all threads.
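The computation sketched above might look like this (nfish, the total
number of fish, is an assumed variable name):

```
nthreads ::= clusters * cluster_size;  -- e.g. 1 * 4 = 4 on one ICSI SMP
fish_per_thread ::= nfish / nthreads;  -- equal split of fish among threads
```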

A single parloop is sufficient to specify parallelism of the simulation.
Each thread maintains a CURRENT object (although this is not necessary).
An alternative would be to share a single global CURRENT object.
Since the state of CURRENT does not change over the course of simulation
and the object itself occupies very little space, it was decided to
replicate it across the threads to save some communication cost.

Each thread manages a list of its local fish, declared as
my_fish:FLIST{FISH}
This means that my_fish is a variable of type FLIST{FISH} (a list whose
elements are objects of type FISH).

Each thread iterates over a specified number of time steps. For each
time step, threads first compute new positions for their respective
lists of fish (by calling move_my_fish(my_fish, current, dt)). This
happens without any communication or synchronization, since the lists
are completely disjoint and there is no interaction among the fish.
Then, each thread computes the quantities necessary to determine
the size of the next time step: maximum velocity, maximum acceleration, etc.
For simplicity (and also given that the number of threads is very
limited), the following strategy was used to find the global maxima of
fish velocity and acceleration.

Each thread compares its maximum values with the currently known
global maxima and replaces them if necessary. To avoid race conditions
when multiple threads try to update the same variables, such as
global_max_acc and global_max_speed, a mutual exclusion lock
is used to protect the critical section.
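A sketch of that critical section (global_max_acc and global_max_speed
are from the text; max_lock, my_max_acc, and my_max_speed are assumed
names for the mutex and the thread-local maxima):

```
lock when max_lock then                      -- one thread at a time in here
   if my_max_acc > global_max_acc then
      global_max_acc := my_max_acc;          -- publish this thread's maximum
   end;
   if my_max_speed > global_max_speed then
      global_max_speed := my_max_speed;
   end;
end;                                         -- lock released on exit
```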

Another mutual exclusion lock is used to serialize drawing of fish
by different threads. This is necessary primarily because Xlib functions
are not thread safe.