ARSC Events and User Training

UPC, or Unified Parallel C, is an explicit parallel extension of ISO C that follows the Partitioned Global Address Space (PGAS) programming model. UPC combines the ease of programming of the shared-memory model with the ability to control and exploit data locality, as in message-passing paradigms.

In this talk, we will share benchmarking results comparing UPC against other popular programming paradigms, using the NAS Parallel Benchmarks and other workloads. Scalability and programming-effort results will be discussed. We will also show that, should UPC compilers employ a number of simple optimizations, UPC compares favorably with current popular paradigms in both execution time and code-development time.
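To give a flavor of the PGAS style described above, here is a minimal UPC sketch (my own illustration, not material from the talk; it requires a UPC compiler to build):

```c
#include <upc.h>     /* UPC extensions: shared, upc_forall, MYTHREAD */

#define N 100
shared int a[N];     /* one logically global array, distributed
                        cyclically across all threads */

int main(void)
{
    int i;
    /* The fourth clause (&a[i]) is the affinity expression: each
       iteration runs on the thread that owns a[i] -- the explicit
       locality control the abstract refers to. */
    upc_forall (i = 0; i < N; i++; &a[i])
        a[i] = MYTHREAD;
    return 0;
}
```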

The Speaker:

Dr. El-Ghazawi is a Professor in the Department of Electrical and Computer Engineering at The George Washington University,
http://www.gwu.edu/
and founding Co-Director of the High-Performance Computing Lab (HPCL).

Using TurboMPI 3.0

[ Thanks to Don Bahls of ARSC for this article. ]

TurboMP is a pair of API libraries developed by IBM that take advantage of on-node shared memory. The libraries provide a small set of collective functions from the MPI API as well as an implementation of SHMEM. This article focuses on TurboMPI, the optimized MPI subset in TurboMP.

The TurboMPI versions of these functions can be linked in by adding the following libraries and library path to your link line:

-L/usr/local/lib -lturbo1 -lxlf90_r

Note that in the current release, TurboMP 3.0, the xlf90 library is required for both C and Fortran programs.

MPI-1 functionality is in the turbo1 library (libturbo1.a), while MPI-2 functionality is in the turbo2 library (libturbo2.a). The turbo2 library is a superset of turbo1, so only one of the two should be linked. The documentation recommends linking with turbo2 only if MPI-2 functionality is needed.
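For example, complete link lines might look like this (a sketch only: mpcc_r and mpxlf90_r are the usual IBM thread-safe MPI compiler wrappers, and the library path may differ at your site):

```shell
# C program using only MPI-1 collectives:
mpcc_r -o model model.c -L/usr/local/lib -lturbo1 -lxlf90_r

# Fortran program that also uses MPI-2 collectives:
mpxlf90_r -o model model.f90 -L/usr/local/lib -lturbo2 -lxlf90_r
```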

Performance:

To assess the performance gain from TurboMP over the default IBM MPI implementation, I ran the Pallas MPI benchmark on both a single P655+ and P690+ node of ARSC's recently installed SP cluster, "iceberg." The benchmark was run with and without the TurboMPI libraries and all runs, including those using the default MPI, set "MP_SHARED_MEMORY=yes".

On both the P655+ and P690+ nodes I saw the same behavior: MPI_Reduce and MPI_Allreduce outperformed the standard MPI implementations of these functions for all message sizes up to 4 MB (the largest message size in the Pallas benchmark). For messages under 1024 bytes, these calls consistently completed in under half the time of the standard MPI calls.

With other operations, such as MPI_Bcast and MPI_Alltoall, there were no noticeable differences in performance.

Results can obviously vary quite a bit from program to program, but this quick experiment suggests that if your code spends a lot of time in MPI_Reduce or MPI_Allreduce calls, TurboMPI could be an easy way to shave some run-time off your model.
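Note that the pattern which benefits is ordinary MPI source: nothing TurboMPI-specific appears in the code, only the link line changes. A minimal sketch of a small reduction, the case that saw the biggest gains above (requires an MPI environment to compile and run):

```c
#include <mpi.h>
#include <stdio.h>

int main(int argc, char **argv)
{
    int rank, nprocs;
    double local, sum = 0.0;

    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &nprocs);

    local = rank + 1.0;
    /* A small (8-byte) reduction -- the message-size range where the
       TurboMPI versions completed in under half the standard time. */
    MPI_Allreduce(&local, &sum, 1, MPI_DOUBLE, MPI_SUM, MPI_COMM_WORLD);

    if (rank == 0)
        printf("sum over %d ranks = %g\n", nprocs, sum);

    MPI_Finalize();
    return 0;
}
```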

X1: make or gmake ?

Cray supports many open source utilities on the X1 (dubbed "Cray Open Source" or "COS" to distinguish them from the set of regular Cray utilities). To access the open source versions, load one of two modules provided by ARSC:

open
open_lastinpath

If you load "open", the paths to the COS utilities will be prepended to your path variables. If you load "open_lastinpath", the COS paths will be appended.
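The effect of prepending versus appending can be simulated in any POSIX shell. The directory names below are stand-ins for illustration, not the real X1 paths:

```shell
# Build two fake utility directories, each with its own "make"
tmp=$(mktemp -d)
mkdir -p "$tmp/cos/bin" "$tmp/cray/bin"
printf '#!/bin/sh\necho COS make\n'  > "$tmp/cos/bin/make"
printf '#!/bin/sh\necho Cray make\n' > "$tmp/cray/bin/make"
chmod +x "$tmp/cos/bin/make" "$tmp/cray/bin/make"

# Like module "open": COS dirs prepended, so the COS version wins
open_result=$(PATH="$tmp/cos/bin:$tmp/cray/bin" make)

# Like module "open_lastinpath": COS dirs appended, so Cray's wins
last_result=$(PATH="$tmp/cray/bin:$tmp/cos/bin" make)

echo "open:            $open_result"
echo "open_lastinpath: $last_result"
```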

Some of the open source utilities are not duplicated in the standard Cray utilities (e.g., less, gnuplot, vim, seq). Other utilities, however, are duplicated, but, in some cases, provide different features and command line options (e.g., make, find, diff, tar).

The reason ARSC provides the two different modules is to make it easier for you to select which version of the utilities will execute. Here are some options for setting up your account:

Load module open (the default):

EFFECT:
Always use the COS version when it exists; otherwise use Cray's.

Load module open_lastinpath:

EFFECT:
Always use Cray's version when it exists; otherwise use the COS version.

Don't load module open or open_lastinpath (you'll have to comment out the "module load open" command from your .login or .profile):

EFFECT:
You'd only get Cray's standard utilities, and never the COS version.

Quick-Tip Q & A

A:[[ I almost always need my loadleveler script to start executing in the
[[ same directory from which I "llsubmit" it.
[[
[[ Thus, my first executable command is generally a "cd" right back to
[[ whatever that directory is. Sometimes I copy a script to a new
[[ directory and forget to change the "cd" command, and of course
[[ then everything goes wrong. Any advice?
#
# From Matt MacLean
#
From the LoadLeveler manual, the environment variable

cd ${LOADL_STEP_INITDIR}

will give you the initial working directory from which the job was
submitted. So, starting a script with:

cd ${LOADL_STEP_INITDIR}

will set up the directory correctly.
#
# From Kate Hedstrom
#
Delete that cd line - it's not needed. The script will automatically
start in the directory from which it is submitted.
#
# Editor's note:
#
The other batch systems used at ARSC also provide environment
variables which point back to the initial working directory:
PBS (e.g., on the X1): PBS_O_WORKDIR
NQS (e.g., SV1ex or T3E): QSUB_WORKDIR
Both PBS and NQS start your script in your home directory, as if you
had just logged on. Thus, you must include an explicit "cd" to the
correct working directory.
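Putting the three batch systems together, a script could begin with a preamble like the following. The fallback order is my own arbitrary choice, and only whichever variable the local batch system actually sets will take effect:

```shell
#!/bin/sh
# Return to the submit directory under LoadLeveler, PBS, or NQS;
# fall back to the current directory if none of the variables is set.
cd "${LOADL_STEP_INITDIR:-${PBS_O_WORKDIR:-${QSUB_WORKDIR:-$PWD}}}" || exit 1
echo "working directory: $(pwd)"
```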
Q: Let's repeat a good one from 1998, issue #146:
What's your favorite shell alias?
If you would, send the alias and a very brief explanation.
[[
Answers, Questions, and Tips Graciously Accepted
]]

The University of Alaska Fairbanks is an affirmative action/equal
opportunity employer and educational institution and is a part of the University
of Alaska system.
Arctic Region Supercomputing Center (ARSC) |PO Box 756020, Fairbanks, AK 99775 | voice: 907-450-8602 | fax: 907-450-8601 | Supporting high performance computational research in science and engineering with emphasis on high latitudes and the arctic.