ARSC T3D Users' Newsletter 101, August 23, 1996

ARSC Upgrades to UNICOS MAX 1.3.0.2

Last Tuesday, we upgraded the T3D's MAX operating system from version 1.2.0.5 to version 1.3.0.2. The upgrade should be transparent, but you should recompile and relink all of your T3D executables, because it affects include and library files as well as kernel routines and the user environment.

Here are the release contents for both MAX 1.3.0.0 and 1.3.0.2. In each case, point #1 is the most important.

Release contents
----------------
The UNICOS MAX 1.3.0.0 release includes the following changes:
1) Added a series of fixes designed to enhance system stability
2) Added support for preallocation of the roll file
3) Added binary executables for SAM, mppview, and URM
4) Added support for Phase III I/O

Release contents
----------------
The UNICOS MAX 1.3.0.2 release includes the following changes:
1) Added a series of fixes designed to enhance system stability
2) Added some improvements to the XDR routines, primarily to improve
the performance by converting numbers in large blocks. This allows
the conversion to vectorize on PVP systems and to execute in a small
(icache) loop on MPP systems.

Use f90 for Loopmark Listings of T3D Codes

If you use the "-rm" flag, CRI's f90 compiler will create a listing file with loops marked and optimizations explained. It can do this for either T3D or Y-MP compilations. This is a big improvement over cf77, which produces a "loopmark listing" only for Y-MP compiles.

It's nice to know how a compiler alters your code when it optimizes it. Some optimizations reduce precision. Others can lead to incorrect results if you mislead the compiler (for instance, by telling the Y-MP compiler to ignore vector dependencies when it shouldn't).

When I compiled a program (prog.f) for the T3D and the Y-MP by setting the TARGET environment variable accordingly and then using the cf77 commands:

T3D: "cf77 prog.f -o t3d.exe"
Y-MP: "cf77 prog.f -o ymp.exe"

I was surprised by the timings:

T3D: 200,000 mflop/s
Y-MP: 150 mflop/s
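
The two cf77 commands above differ only in the output name, so the target is selected entirely by the TARGET environment variable. Here is a minimal sketch of that workflow for a POSIX shell. The TARGET values cray-t3d and cray-ymp and the helper name build_for are assumptions for illustration and are not taken from this newsletter, so check your local documentation. By default the script only prints the commands it would run; set DRY_RUN=0 on a system where cf77 is installed.

```shell
#!/bin/sh
# Sketch: build prog.f for two targets by changing only TARGET.
# Assumptions: the TARGET values used below and the helper name
# build_for are illustrative. DRY_RUN=1 (the default) prints the
# command instead of running it.

DRY_RUN=${DRY_RUN:-1}

build_for() {
    target=$1   # value to place in TARGET (assumed spelling)
    output=$2   # name of the executable to create
    if [ "$DRY_RUN" = "1" ]; then
        echo "TARGET=$target cf77 prog.f -o $output"
    else
        TARGET=$target cf77 prog.f -o "$output"
    fi
}

build_for cray-t3d t3d.exe
build_for cray-ymp ymp.exe
```

Setting TARGET on the command line of each compile, rather than exporting it once, keeps the two builds from accidentally sharing a target.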

It was easy to find out what the Y-MP compiler had done, because a recompile with the cf77 flag -Wf"-em":

Y-MP: "cf77 -Wf"-em" prog.f -o ymp.exe"

produced a "loopmark listing" which showed that the loop had vectorized. Good enough.

I assumed that the T3D compiler had actually eliminated the loop, but as "loopmark listing" is not available under cf77 for T3D codes, I didn't know how to prove it. Eventually, I discovered that I could recompile with the -Wf"-cm" flag:

T3D: "cf77 -Wf"-cm" prog.f -o t3d.exe"

which produced a CIF (Compiler Information File). CIFs are not human-readable, but in the CIF manual I found a C program that extracts "compiler messages" from CIFs. I copied, compiled, and ran this C program on my CIF to get the following information:

"message at line 22: A loop was eliminated by optimization."

This was moderately satisfying, at best. As far as I know, it's the most information on T3D optimizations you can get from the cf77 compiling system (if anyone knows a better way, let me know, and I'll pass it on).

The solution I found was to use f90.

With f90, you can compile with the same flags for either T3D or Y-MP to get various human-readable listing files. For instance:

T3D: "f90 -rm loops.f -o loops"
Y-MP: "f90 -rm loops.f -o loops"

will produce a listing similar to cf77's Y-MP loopmark listing. To provide a T3D vs Y-MP example, I used these compile commands on the following code:

Running the "explain" command on any of these messages provides even more help (but note that the message ID you give to explain should start with "cf90", not "f90"). For instance:

denali$ explain cf90-6204
Vector code was generated for the loop. The compiler vectorizes
a loop when it can be determined that the meaning of the loop
will not change by doing so. However, the order of expression
evaluation may change, and results may differ. Generally, the
vector version of a loop executes much faster than the scalar
version.
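
Since explain wants the "cf90" prefix, a small helper can save some retyping. This is only a sketch: it assumes the listing prints message IDs in the form f90-NNNN (as the parenthetical above suggests), and the helper name explain_id is made up for illustration.

```shell
#!/bin/sh
# Sketch: turn a message ID as printed in an f90 listing (e.g. f90-6204)
# into the form that explain expects (cf90-6204).
# Assumption: the helper name explain_id is illustrative.

explain_id() {
    case $1 in
        f90-*) echo "c$1" ;;   # add the missing "c" prefix
        *)     echo "$1" ;;    # leave any other ID unchanged
    esac
}

explain_id f90-6204    # prints cf90-6204
explain_id cf90-6204   # prints cf90-6204
```

On a system with explain installed, you could then run: explain "$(explain_id f90-6204)"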

This consistent behavior across platforms is really nice, and it is a good reason to use f90 instead of cf77.

The 'mppfixpe' Command and Plastic Executables

The UNICOS command "mppfixpe" converts a plastic executable to a fixed one. In other words, it converts an executable which can use a variable number of PEs (determined at run time) to one which must always use the same number of PEs (as specified in arguments to the command).

This may not seem like the most useful command (why would one want to sacrifice flexibility?), but there are good reasons to use fixed executables. For instance, we had visitors working on-site last week who got a 2:1 speedup in the load time of a program when they switched from plastic to fixed. This was a boon because they wanted to do multiple, short test runs, and the load time had become a major percentage of the total time spent on each run.

They used 'mppldr -X $(NPES) ...' to re-link the program with a fixed number of PEs. If they had no longer had access to the source or object files, mppldr would not have been an option, and they could have used mppfixpe instead.

For a thorough discussion of plastic and fixed executables, see Newsletter #44. Here, however, is a quick comparison:

Advantages of fixed executables:

mppldr not called on each run

smaller file size

Advantages of plastic executables:

number of PEs is flexible -- determined at runtime

can usually be converted to fixed, whenever desired, using mppfixpe

This is from CRI's man page:

NAME
     mppfixpe - Reconfigures a CRAY T3D absolute for a different
     number of PEs

SYNOPSIS
     mppfixpe -o newname -X npes [-M opts] [-V] oldname

DESCRIPTION
     The mppfixpe utility reads an existing CRAY T3D absolute (plastic
     a.out file) and, if possible, changes it so that it will execute using
     a different number of processing elements (PEs).

     A plastic a.out file refers to an a.out file on a CRAY T3D system that
     has been created without using either compiler or loader directives to
     specify (or fix) the number of processing elements. This lets you
     specify at execution time the number of PEs. For example:

          /mpp/bin/cft77 t.f
          /mpp/bin/mppldr t.o
          a.out npes 128

     If you fix the number of PEs on either the cf77 or the mppldr command
     line, the resulting a.out file no longer is considered to be plastic,
     and you cannot specify the number of PEs to use at run time.

     A plastic a.out file is assumed to have been targeted for 0 PEs.

     The mppfixpe utility accepts the following options:

     -o newname   Specifies the path name where the new absolute is to be
                  stored.

     -X npes      Specifies the number of PEs for which the new absolute
                  is to be configured.

     -M opts      Requests that the loader produce a map of the new
                  absolute. The opts values are those known to mppldr(1).

     -V           Causes the mppfixpe utility to write its version
                  identification to stderr.

     oldname      Specifies the path name of the existing CRAY T3D
                  absolute.

NOTES
     The mppldr and mppfixpe utilities assume that fairly ordinary things
     are being done. However, if you are changing the loader's CALLXFER
     directive, things may not work the way you want.
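
Following the synopsis above, here is a minimal sketch of how one might produce fixed copies of a plastic executable for a few PE counts. The file names (a.out, a.out.8, and so on), the PE counts, and the helper name fix_pes are assumptions for illustration. By default the script only prints the mppfixpe commands; set DRY_RUN=0 on a system where mppfixpe is installed.

```shell
#!/bin/sh
# Sketch: make fixed-PE copies of a plastic executable with mppfixpe,
# following the synopsis "mppfixpe -o newname -X npes oldname".
# Assumptions: the file names and the helper name fix_pes are
# illustrative. DRY_RUN=1 (the default) prints the command instead
# of running it.

DRY_RUN=${DRY_RUN:-1}

fix_pes() {
    oldname=$1                 # existing plastic executable
    npes=$2                    # number of PEs to fix in the new copy
    newname="$oldname.$npes"   # e.g. a.out.8
    if [ "$DRY_RUN" = "1" ]; then
        echo "mppfixpe -o $newname -X $npes $oldname"
    else
        mppfixpe -o "$newname" -X "$npes" "$oldname"
    fi
}

for n in 8 16 32; do
    fix_pes a.out "$n"
done
```

Keeping the plastic original and writing each fixed copy under a new name preserves the flexibility of the plastic file for later runs.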
