ARSC T3D Users' Newsletter 98, August 2, 1996

Porting T3D PVM Codes to Network PVM

Last week, Richard Barrett of LANL discussed numerous issues in porting Network PVM codes to the T3D. I have been interested in this area, and in the reverse scenario - writing T3D codes so that they'll easily port to a network PVM environment. Clearly, the issues mentioned by Richard, such as sending messages to TID's rather than PE's, are important when writing portable code. I have found that the most difficult task in writing portable PVM code is that of dealing with "spawning" vs. "non-spawning" environments. Although Richard touched upon this, I'd like to go a little deeper.

If we are writing code for the T3D, we are restricted to a Single-Program Multiple-Data (SPMD) paradigm - that is, we have the same executable running on each processor. We tend to address other processes by a PE number, or at the very least, the PVM TID of a PE. We don't have an inherent master-slave relationship between processes, although traditionally we view PE0 as a "master" process. If we want to run our T3D SPMD code in a network PVM environment, where spawning is necessary, then we must simulate the non-spawning behavior of the T3D. The following function is designed to do just that. It has been used both on the T3D and on a cluster of PC's running Linux.

We assume that the "master" process (the initial process in a
network environment, PE0 on the T3D) somehow knows how many
processes are to run. This might be read in from a file, come
from a command-line argument, etc. The "master" process regards
"numprocs" as an (in) parameter. All other processes regard
this as an (out) parameter. Additionally, "mype", "tidlist",
and "icode" are all (out) parameters.

From each process's point of view, startup() is called with
uninitialized parameters (with the exception of "numprocs" on
the master process), and upon return from the function, every
process has a value for each of the parameters. Thus, each
process knows how many total processes are in the virtual
machine, knows its own logical position in the system, and
holds an identical list of TID's, so the processes can easily
communicate with each other by referencing the appropriate
logical PE in "tidlist" - if Process 3 wants to communicate
with Process 7, it simply references tidlist(7).

"icode" is an error flag. If the call to startup() was
successful, icode stores the total number of processes,
otherwise it stores some negative value (note - this aspect
hasn't been fully implemented in the following code).
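
The postconditions described above can be written down as a small check. This is only a sketch, in Python rather than the Fortran of startup.F, and it calls no PVM routines. The names StartupResult, check_startup_contract, tid_of, and the "mytid" field are hypothetical and exist only for illustration; it also assumes logical PEs are numbered from 0, with PE0 as the master.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class StartupResult:
    """What one process holds after startup() returns (illustrative only)."""
    numprocs: int        # total number of processes
    mype: int            # this process's logical position, assumed 0-based
    tidlist: List[int]   # TID of every process, indexed by logical PE
    icode: int           # numprocs on success, negative on failure
    mytid: int           # this process's own TID (hypothetical, used for checking)


def check_startup_contract(results):
    """Return True if every process ended up with a consistent view."""
    if not results:
        return False
    n = results[0].numprocs
    reference = results[0].tidlist
    if len(results) != n or len(reference) != n:
        return False
    positions = set()
    for r in results:
        # Everyone must agree on the process count and hold the same TID list.
        if r.numprocs != n or r.tidlist != reference:
            return False
        # On success, icode repeats the total number of processes.
        if r.icode != n:
            return False
        # A process's own slot in tidlist must hold its own TID.
        if not (0 <= r.mype < n) or reference[r.mype] != r.mytid:
            return False
        positions.add(r.mype)
    # Logical positions must be distinct and cover 0..n-1.
    return len(positions) == n


def tid_of(result, pe):
    """Look up the TID to use when addressing logical PE 'pe'."""
    return result.tidlist[pe]
```

With results like these in hand, a process that wants to reach logical PE 7 would pass tid_of(my_result, 7) to its send call, which is the tidlist(7) lookup described above.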

The code (see below), "startup.F", is preprocessed with gpp on the T3D and cpp in other environments, and is conditionally compiled depending on whether the preprocessor macro _CRAYMPP is defined. _CRAYMPP is automatically defined if you're using the Cray MPP compiler.

The entire function is based on processes joining a global group, then obtaining information about themselves and other processes within the global group. Conditional compilation enables/disables code for spawning and T3D environments. The following differences are addressed through conditional compilation:

The global group name, "ALLGROUPNAME."

Variables declared for use in spawning environments are not used
in the T3D environment.

In T3D environments, we're not able to join the global group
(via pvmfjoingroup()), or obtain the instance within the global
group (via pvmfgetinst()). Thus, the T3D-specific function,
pvmfgetpe() is used. In spawning environments, we join the
global group and get our instance in the group with
pvmfjoingroup().

A good portion of the code in the function is only used in a
spawning environment. After the spawn, we force a
synchronization by having child processes send a message to the
parent. Without this forced synchronization, future group
operations (e.g. pvmfbcast(), pvmfbarrier(), pvmfgettid()) may
fail, as the parent process may attempt to call these functions
before others have even joined the group. Trust me, I've
encountered this way too many times!

After the spawning code, there is no difference between the T3D and spawning implementations. The rest of the code is centered on ensuring that every process knows the total number of processes, after which each process obtains the TID's of every other process. One might be tempted to use pvmfgsize() to find the total number of processes, but, as with the spawn above, a "forced" synchronization is necessary. Without it, it is possible for one process to reach this point before others have joined the global group, producing different values of "numprocs" for different processes! Again, trust me! Been there, done that!
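
The ordering problem described above can be illustrated without PVM at all. The following is a rough simulation in Python, not the code from startup.F: threads stand in for spawned tasks, queues stand in for PVM sends and receives, and the TIDs are made-up numbers. It also assumes that the master hands "numprocs" and the TID list to each child in a message, which may differ from what the real function does. The function name run_startup is hypothetical.

```python
import queue
import threading


def run_startup(numprocs):
    """Simulate the startup handshake; returns one result dict per logical PE."""
    group = []                                   # stand-in for the global group
    group_lock = threading.Lock()
    tids = [1000 + i for i in range(numprocs)]   # made-up TIDs
    inbox = {tid: queue.Queue() for tid in tids} # one message queue per task
    results = [None] * numprocs
    master_tid = tids[0]

    def join_group(tid):
        with group_lock:
            group.append(tid)

    def child(tid):
        join_group(tid)
        # Forced synchronization: tell the parent we have joined the group.
        inbox[master_tid].put(("joined", tid))
        # Block until the parent sends the process count and the TID list.
        _tag, (n, tidlist) = inbox[tid].get()
        mype = tidlist.index(tid)
        results[mype] = {"numprocs": n, "mype": mype,
                         "tidlist": tidlist, "icode": n}

    # Master side: join first, then "spawn" the other tasks.
    join_group(master_tid)
    workers = [threading.Thread(target=child, args=(t,)) for t in tids[1:]]
    for w in workers:
        w.start()

    # Wait for one "joined" message per child before touching the group.
    # Reading the group size before this loop could see too few members.
    for _ in range(numprocs - 1):
        inbox[master_tid].get()

    with group_lock:
        tidlist = list(group)                    # now complete, master at index 0

    for t in tidlist[1:]:
        inbox[t].put(("setup", (numprocs, list(tidlist))))
    results[0] = {"numprocs": numprocs, "mype": 0,
                  "tidlist": tidlist, "icode": numprocs}

    for w in workers:
        w.join()
    return results
```

The point of the sketch is the wait loop on the master side: only after every child has reported in does the master read the group membership, so every task ends up with the same count and the same list regardless of how the threads happen to be scheduled.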

The source code for the function is below, followed by the Makefile, which in turn is followed by "test.f," a simple program which calls the startup() subroutine. Note that in the main program there is no dependence on the architecture. This is all handled in startup(), and as long as basic PVM operations are utilized, it will be portable.

ARSC T3D User Group Met on August 1st

We had a nice turnout yesterday, despite the rain. Users and staff alike are both excited and apprehensive about the T3E. Excited? Of course. This is new technology that promises to solve bigger problems faster. Apprehensive? Of course. This is new technology that promises startup problems, incompatibilities, and a learning curve.

Some Concerns:

How easily will T3D codes port to the T3E?

Will there be a T3E version of AVS?

Will the T3E have a C compiler, or only C++? Users report difficulty compiling public domain and personal C code on various C++ compilers.

Will existing CRAFT codes port to the T3E? How small a subset is planned? What other implicit programming models will be available?

What will the file system look like? DMF? CRL?

Some Expectations:

Improved job scheduling and more immediate access to PEs may be possible, given the T3E's job swapping capability and the removal of the restriction that each job use a power-of-two number of PEs.

Turnaround on the Y-MP should improve once it is separated from the T3D.

Many users see the T3E as a major upgrade: they are already looking forward to running bigger simulations.

Don Morton Migrates South

At any rate, Don plans to start the long drive tomorrow morning. At ARSC, Don's departure is an early sign of approaching winter -- next thing you know, the sandhill cranes will start to fly. We thank Don for his many contributions to ARSC, the T3D User Group, and this Newsletter. We wish him a safe journey, a good year, and look forward to seeing him again next summer.
