ARSC T3E Users' Newsletter 158, December 18, 1998

ARSC Upgrades to VAMPIR 2.0 and Revamps Tutorial

VAMPIR is a graphical tool for analyzing the performance and message passing characteristics of parallel programs that use the MPI communication library. There are three steps to using VAMPIR: 1) compile your T3E MPI code for tracing, 2) run the executable, 3) analyze the resulting .bpv file on an SGI workstation.
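Concretely, the three steps look something like this on yukon (a sketch only: the library names, link options, and launcher invocation here are assumptions -- the on-line tutorial and "news VAMPIR" give the actual settings):

```shell
# Step 1: compile the MPI code and link in the VAMPIRtrace library
# (library names below are assumptions; see "news VAMPIR" for the real ones).
f90 -o ring ring.f -lVT -lpmpi -lmpi

# Step 2: run the executable as usual; a trace file (.bpv) is written at exit.
mpprun -n 8 ./ring

# Step 3: copy ring.bpv to an ARSC SGI workstation and analyze it there.
vampir ring.bpv
```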

VAMPIRtrace has been upgraded to version 1.5.1 on yukon and VAMPIR has been upgraded to version 2.0 on the ARSC SGIs. VAMPIR 2.0 offers several new features and a significantly improved user interface.

The location of the libraries and license files has been changed to conform with ARSC's standard method of installing third-party packages.

We have extracted the two-part tutorial given in issues #146 and #147, brought it up to date for the new version of VAMPIR, and put it on-line at:

http://www.arsc.edu/support/howtos/usingvampir.html

This tutorial tells you everything you need to know to use this powerful tool. Alternatively, ARSC users can read "news VAMPIR" for the nitty-gritty. Enjoy!

MPICH-T3E Installed on Yukon

The MPICH-T3E implementation of MPI is now available on yukon in the directory:

/usr/local/pkg/mpich/current/

MPICH-T3E is the port of MPICH 1.1.0 to the Cray T3E supercomputer. The port was developed by the High Performance Computing Lab at Mississippi State University, whose web pages have more information.

Below are times from the "ring" program first given in newsletter #66. The program was modified to use larger buffers and MPI_REAL rather than MPI_REAL4 buffers.

The program times a buffer as it is passed around a ring of PEs using MPI_Send and MPI_Recv.
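The communication pattern and the timing arithmetic can be modeled without MPI. Here is a pure-Python sketch (hypothetical names; the actual benchmark, which uses MPI_Send and MPI_Recv, is in newsletter #66):

```python
# Model of the "ring" benchmark: a buffer travels PE 0 -> 1 -> ... -> n-1 -> 0.
# This only sketches the send/receive schedule and the per-transfer arithmetic;
# the real program does the sends with MPI_Send/MPI_Recv on the T3E.

def ring_schedule(npes):
    """Return the (sender, receiver) pairs for one trip around the ring."""
    return [(pe, (pe + 1) % npes) for pe in range(npes)]

def usec_per_transfer(total_usec, npes, ntrips):
    """Per-buffer transfer time, as reported in the table: total elapsed
    time divided by the number of individual sends."""
    return total_usec / (npes * ntrips)

print(ring_schedule(4))                    # [(0, 1), (1, 2), (2, 3), (3, 0)]
print(usec_per_transfer(8000.0, 4, 10))    # 200.0 microseconds per transfer
```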

The table below reports transfer times in microseconds per MPI_REAL buffer for runs compiled with both MPICH-T3E 1.1.0 and the MPT 1.2.0.2 version of MPI, made on a T3E-900. Your mileage may vary, but if your program passes a lot of messages, you might try MPICH. Be sure to do some validation runs, and let us know what you find.

Another Example of Post-Processing: Co-array Fortran

A strength of Co-array Fortran (CAF) is that it's such a simple extension to a well-known language. We thought we might learn something by using CAF to rewrite the post-processing program given in the last issue (/arsc/support/news/t3enews/t3enews157/index.xml).

As a reminder, the problem is the reduction of data stored across many files. We read the files on all PEs, extract particular fields from them, combine the data on a "master" PE, sum it, and store the results.
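In outline, the data flow looks like this (a Python sketch with made-up records and field names; the real programs read T3E data files and sum Fortran arrays):

```python
# Sketch of the post-processing data flow: each PE reads its share of the
# files, extracts one field, reduces locally, and the master PE sums the
# per-PE contributions.  File reading is faked with in-memory lists.

def extract_field(record):
    # Stand-in for pulling one particular field out of a file record.
    return record["flux"]

def per_pe_sum(files_on_this_pe):
    # Each PE reads its files and reduces them locally.
    return sum(extract_field(rec) for f in files_on_this_pe for rec in f)

def reduce_to_master(partials):
    # Every PE's partial sum is combined on the "master" PE.
    return sum(partials)

pe_files = [
    [[{"flux": 1.0}, {"flux": 2.0}]],   # files assigned to PE 0
    [[{"flux": 3.0}]],                  # files assigned to PE 1
]
partials = [per_pe_sum(files) for files in pe_files]
print(reduce_to_master(partials))       # 6.0
```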

The CAF version was easy to write, its logic is transparent, and, given a little background, it would make sense to most Fortran programmers. On the other hand, someone who knows MPI could write the MPI version easily, and the MPI_REDUCE call conveys a lot of meaning by encapsulating several actions. Also, the MPI version is faster.

Addressing the performance issue: in the CAF version, only the master PE does any work. It gets data from the other PEs, one at a time, while they sit idle. This work can be distributed better by using a tree to gather data to the master.
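The tree idea can be sketched in a few lines of Python (a model of the communication schedule only, not the actual CAF #2 code): at each stage, PEs combine pairwise, so the master holds the full sum after about log2(P) stages instead of P-1 sequential gets.

```python
# Binary-tree gather of per-PE values to PE 0.  At stage s, PE p receives
# from PE p + 2**s (when that partner exists), so the idle time of the
# non-master PEs shrinks from O(P) to O(log P) stages.

def tree_reduce(values):
    """Return (total_on_master, number_of_stages) for a tree gather."""
    vals = list(values)
    npes = len(vals)
    stages = 0
    stride = 1
    while stride < npes:
        for p in range(0, npes, 2 * stride):
            if p + stride < npes:
                vals[p] += vals[p + stride]   # partner's data arrives here
        stride *= 2
        stages += 1
    return vals[0], stages

total, stages = tree_reduce(range(8))   # 8 PEs holding values 0..7
print(total, stages)                    # 28 3
# CAF #1 needs npes-1 = 7 sequential gets; the tree needs only 3 stages.
```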

CAF version #2 is almost as fast as the MPI version. It's certainly not as easy to write or read as CAF version #1.

The graphs in figure 1 give timing comparisons between the two CAF versions and the MPI version. The execution times include the time to read the files, combine the results, and reduce the data.

Figure 1

The first graph considers a problem size that scales as we add PEs. This is a common scenario--people want to solve bigger problems, not just the same problems faster. The inefficiency of the CAF #1 implementation is apparent in this graph as the number of PEs increases.

The second graph considers a fixed problem size. We see the same problem with CAF #1. It's also apparent that the overall rate of disk I/O stops improving once 8 or more PEs are reading at the same time. (This was the conclusion of the article in the previous issue.)

It's interesting that the three traces converge at 8 PEs. Given the file I/O constraint, 8 PEs seems appropriate regardless of algorithm. Thus, in this situation, the simple CAF program is perfectly serviceable and (depending on your MPI experience) might be easier to write.

Here is the code for CAF #1. (Send e-mail if you'd like to see all of CAF #2.)

Call for CUG Technical Paper Abstracts by 8 Jan 1999

[ Received recently... ]

> You are invited to submit a technical paper for the 41st CUG Conference
> (Supercomputing Summit) in Minneapolis, Minnesota USA during 24-28 May
> 1999.
>
> CUG is your SGI Cray systems user forum for high-performance
> computation and visualization, and your CUG Program Committee is
> working diligently to maximize that value for you and your colleagues
> at supercomputing sites like yours across the world.
>
> That's why we're asking you to consider submitting a technical paper
> abstract for the next Supercomputing Summit sponsored by CUG. All the
> information you need is on-line at either of the following URLs.
>
>
http://www.cug.org/
> or
>
http://www.fpes.com/cug_abstracts/call.html
>
> As it was with the 40 technical programs before it, the key to our next
> conference is a broad variety of high quality technical papers from SGI
> Cray contacts and from CUG site colleagues like yourself. Exchanging
> supercomputing insights is the essence of the technical presentations
> and their publication in the conference proceedings on CD-ROM. So, we
> have a great need for your suggestions on how to make the next
> conference the best one, yet.
>
> And, we hope that you will please consider submitting an abstract for a
> technical paper!
>
> In any case, you will want to mark your calendar and plan to register
> for the conference. You can find out more about the conference on-line
> at our CUG home page.
>
>
http://www.cug.org/
>
> There will be many formal and informal opportunities for you to share
> challenges and exchange information with your colleagues from other CUG
> sites and with SGI Cray technical experts, so you won't want to miss
> this meeting! And, we don't want you to miss the opportunity to submit
> an abstract for a technical paper. The deadline for submitting an
> abstract is Friday, 8 Jan 1999.
>
> Please check the WWW information on the CUG home page today!
>
>
> With regards,
>
> Sam Milosevich (sam@lilly.com)
> CUG Vice President and Program Committee Chair

Next Issue: Jan. 8, 1999

Happy Holidays, Everyone!

Quick-Tip Q & A

A: {{ Fortran again. ALOG10 is an intrinsic CF90 function. The man
   page even says so:

       DESCRIPTION
            LOG10 is the generic function name. ALOG10 and DLOG10 are
            intrinsic for the CF90 compiler.

   But this program won't compile:

       !----------------------------------------
             program test
             write (6,*) "alog10 (1000) :", alog10 (1000)
             end
       !----------------------------------------

   Here's the error message:

       yukon% f90 test.f
         write (6,*) "alog10 (1000) :", alog10 (1000)
                                        ^
       cf90-700 f90: ERROR TEST, File = junk.f, Line = 3, Column = 35
         No specific intrinsic exists for the intrinsic call "ALOG10".

   What's wrong? }}

Thanks to two readers:

######################

Here is my two-minute fix: insert a decimal point into 1000 to make
it a floating-point number, which is the argument type that the
intrinsic function alog10 expects.

###

I think there are two problems here: i) an imprecise error message,
and ii) ugly Fortran programming. The compiler is trying to tell you
that it has no integer intrinsic for the log10. If you replace
alog10(1000) with log10(1000), you get the same error message. There
is no ilog10, as documented in the man page.

The second problem is people's bad habit of using specific intrinsics
rather than generic ones. I think generic intrinsics are one of the
nice things about Fortran. The compiler looks at the data type of the
argument and chooses by itself, making code cleaner. If you need or
want to direct it toward a specific intrinsic, do so by type-casting
the argument. In this case you could, for example, write

    write (6,*) "alog10 (1000) :", log10 (real(1000))

and everything will work as expected.
Q: (Finally, one for the C programmers!) Will C correctly cast a
   pointer reference value? For instance, given this declaration:

       unsigned temp = *uint;

   will "temp" be set as expected even if "uint" is an int pointer?
