ARSC T3D Users' Newsletter 97, July 26, 1996

IHPSTAT Provides Run-Time Stats on Available Memory

"IHPSTAT" is a UNICOS library routine which you can call from your programs (see "man IHPSTAT"). It returns statistics about the heap.

You can use IHPSTAT to check the availability of memory at run-time. You might want to do this even if your code does not itself allocate memory dynamically: it might call library routines that do. If you get unexpected out-of-memory conditions, you can instrument your code with IHPSTAT to help locate and avoid them.

What follows is a sample program, and some of its output, demonstrating IHPSTAT. The program causes the system library routine OPEN to fail on an out-of-memory condition.

The program allocates a chunk of static memory, then dynamically allocates up to ten blocks, one at a time, stopping early if an allocation fails. At every step it prints all of IHPSTAT's available information, which lets us watch the heap approach its critical limits. After the allocation loop, the program attempts to open a file, write to it, and close it.
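A sketch of such a program follows. The array sizes are illustrative, not those of the actual run; we assume argument codes 1 through 12 are the defined IHPSTAT statistics (see the man page for what each code selects) and use HPALLOC's documented ADDR, LENGTH, ERRCODE, ABORT argument list:

      PROGRAM TST0
*     Allocate STATSZ words statically, then up to NBLKS blocks of
*     BLKSZ words dynamically, printing heap statistics as we go.
      INTEGER STATSZ, NBLKS, BLKSZ
      PARAMETER (STATSZ = 500000, NBLKS = 10, BLKSZ = 400000)
      INTEGER STATIC(STATSZ)
      INTEGER ADDR(NBLKS), ERR, I, J, IHPSTAT
      SAVE STATIC

      DO 20 I = 1, NBLKS
*        Print every IHPSTAT statistic ("man IHPSTAT" describes
*        what each argument code selects).
         DO 10 J = 1, 12
            PRINT *, 'IHPSTAT(', J, ') = ', IHPSTAT(J)
 10      CONTINUE
*        ABORT=0 asks HPALLOC to return an error code rather than
*        abort the program on failure.
         CALL HPALLOC (ADDR(I), BLKSZ, ERR, 0)
         IF (ERR .NE. 0) THEN
            PRINT *, 'HPALLOC failed, error code ', ERR
            GOTO 30
         ENDIF
 20   CONTINUE
 30   CONTINUE

*     OPEN allocates heap space of its own, so it may now fail.
      OPEN (UNIT=10, FILE='tst0.out')
      WRITE (10,*) 'done'
      CLOSE (UNIT=10)
      END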

For the sample run, I requested more memory than the system can provide. The program traps this error in HPALLOC (another UNICOS library routine) and aborts the allocation loop. Although the program is still running at this point, it then crashes in the OPEN call and gives this message:

mpplib-5010 ./tst0: UNRECOVERABLE library error
A request for more memory has failed.

The program is, of course, designed to test memory limits. In a real program, there are a couple of ways the programmer might avoid a crash of this nature. The first line of defense is to trap the error in the call to OPEN:
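For example, using the standard IOSTAT and ERR specifiers (the unit number and file name are illustrative):

      INTEGER IOS
*     IOSTAT returns a nonzero code and ERR= transfers control,
*     instead of letting the library abort the program.
      OPEN (UNIT=10, FILE='tst0.out', IOSTAT=IOS, ERR=100)
      WRITE (10,*) 'done'
      CLOSE (UNIT=10)
      STOP
 100  PRINT *, 'OPEN failed, IOSTAT = ', IOS
      END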

A second approach is to use the output of IHPSTAT(11) to watch for critically low memory. An advantage of this approach is that it can be used to protect any library calls -- even those which, unlike OPEN and HPALLOC, do not have built-in exception handling.
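For instance, something like this could guard the OPEN (the 5000-word threshold is arbitrary, and we take IHPSTAT(11) to report the largest free heap block):

      INTEGER IHPSTAT
*     Check the heap before any library call that may allocate.
      IF (IHPSTAT(11) .LT. 5000) THEN
         PRINT *, 'heap critically low -- not attempting OPEN'
      ELSE
         OPEN (UNIT=10, FILE='tst0.out')
      ENDIF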

Porting Network PVM Codes to the T3D

(This article discusses low-level issues involved in porting network PVM codes to the T3D. It makes a nice companion to the articles in Newsletters #90 and #91 on porting heterogeneous, master-slave codes.)

The Cray T3D message passing library uses a PVM interface. (Under the hood, the functionality is implemented with the shmem library.) So, for the most part, programs written in PVM will run properly on the T3D. There are some exceptions, however, which we list below, along with some performance tips and cautions.

There is no spawning on the T3D. You simply tell the system how many processes you want, and they are all started at program launch. So, to make your code portable, put #ifndef _CRAYMPP around the code that does the spawning. Then, to get the task ids (tids) of the participating processes, each process can execute the following code:
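A sketch, using the standard Fortran PVM bindings. We assume here that the default global group (see the note on group operations below) can be named with the string 'NULL', and MAXPE is just an illustrative bound:

      INCLUDE 'fpvm3.h'
      INTEGER MAXPE
      PARAMETER (MAXPE = 256)
      INTEGER MYTID, NPROC, I, TIDS(0:MAXPE-1)

*     Enroll (no spawning -- all PEs start together), then look up
*     each task's tid through the default global group.
      CALL PVMFMYTID (MYTID)
      CALL PVMFGSIZE ('NULL', NPROC)
      DO 10 I = 0, NPROC-1
         CALL PVMFGETTID ('NULL', I, TIDS(I))
 10   CONTINUE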

Unlike the network version of PVM, the T3D allows only one process per processor. The processes are numbered 0 to NPROCS-1, and there is a one-to-one correspondence between process ids and processor numbers. (Process 0 is the parent and runs on processor 0.)

Because processes/processors are numbered 0 to NPROCS-1 on the T3D, you could avoid tids altogether and address processes by processor number (returned by pvm_get_PE( taskid )). This isn't portable, however, so use tids as usual. Still, processor numbers are convenient for debugging, so we store each task's number in a global variable. This is possible on all platforms: assign processor numbers to PVM tasks according to their position in the tids array (i.e., order of spawning), which makes pvm_get_PE unnecessary anyway. Then use TIDS(I) to send a message to process I.

Be careful with the use of PvmDataInPlace. It is a documented T3D feature that this mode of data management may allow pvm_send/pvm_psend to return before the data is safely on its way to the target process. So it's possible to overwrite the data you think you are sending. And Cray doesn't provide a polling function to find out if/when the data is safe. Therefore, the safe way to send data is to use PvmDataRaw and incur a data copy/performance penalty.
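In the Fortran bindings, the safe pattern looks something like this (the array, message tag, and TIDS are illustrative, with TIDS and I as in the snippet above):

      INCLUDE 'fpvm3.h'
      INTEGER BUFID, INFO, MSGTAG, N
      PARAMETER (N = 1000, MSGTAG = 10)
      REAL X(N)

*     With PVMRAW, PVMFPACK copies X into the message buffer, so X
*     can safely be reused as soon as the pack returns.  PVMINPLACE
*     would avoid the copy but leave X unprotected until delivery.
      CALL PVMFINITSEND (PVMRAW, BUFID)
      CALL PVMFPACK (REAL8, X, N, 1, INFO)
      CALL PVMFSEND (TIDS(I), MSGTAG, INFO)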

Group operations. With the exception of the default global group (designated "NULL"), the performance of group operations is poor. This is due to hardware constraints, so a software fix probably won't improve the situation.

Fast parallel I/O is possible using pvm_channels. However, this functionality is specific to the T3D, so it is not portable. On the other hand, efficient parallel I/O is not yet available using a common interface, so pvm_channels is worth investigating when large amounts of data are involved.

The pvm_pack/unpack functions perform poorly on this machine, and the cost is especially noticeable if you make repeated calls to pvm_pack before a pvm_send. By doing our own data management (i.e., packing everything into one array ourselves and making a single call to pvm_pack), we got a 25-fold speedup on a hydro code, even though that code is not (at least with this method) communication intensive.
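Roughly, the idea is this (names and declarations are illustrative):

      INCLUDE 'fpvm3.h'
      INTEGER BUFID, INFO, I, N
      PARAMETER (N = 1000)
      REAL A(N), B(N), SCRATCH(2*N)

*     Instead of one PVMFPACK call per array ...
*       CALL PVMFPACK (REAL8, A, N, 1, INFO)
*       CALL PVMFPACK (REAL8, B, N, 1, INFO)
*     ... gather the pieces into one scratch array and pack once.
      DO 10 I = 1, N
         SCRATCH(I)   = A(I)
         SCRATCH(N+I) = B(I)
 10   CONTINUE
      CALL PVMFINITSEND (PVMRAW, BUFID)
      CALL PVMFPACK (REAL8, SCRATCH, 2*N, 1, INFO)
*     Send the whole batch to process 0, for example.
      CALL PVMFSEND (TIDS(0), 20, INFO)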

For small messages, the T3D-specific function pvm_fastsend transfers the data in about half the time of a regular send. The "small" threshold is set by a user-definable environment variable (the default is 256 bytes). Note that the speedup comes from carrying the data in the message header (which is used in all T3D message passing) rather than in the subsequent data packets, so setting the threshold too high can have an adverse effect on your larger messages.

A note on MPI:

If you are starting from scratch, you should consider using MPI. Most of the above problems disappear, and you might find programming simpler and performance better (especially in light of the PvmDataInPlace problem) than with PVM. Additionally, the MPI I/O group has joined more closely with the MPI Forum, so portable parallel I/O may one day come more easily. One caution, however: the MPI-2 effort (to be announced at Supercomputing '96 in November) will probably include one-sided communication (a la T3D shmem). Although this will be portable, its performance on other platforms is in question. And while I don't want to join the flat-earth society, I believe this performance gap will not easily go away, since it is probably rooted in hardware constraints.

Local User Group Meeting -- Next Thursday

ARSC's T3D User Group will be meeting on Thursday, August 1 at 3:00 PM. The location will be Butrovich, room 106A.

We would like to get to know our local users a little better, and seek opinions on possible configurations for the T3E. The agenda is:

Attendees describe the work they are doing (or planning to do) on the T3D/E. It would be helpful if each member of the group gave a quick summary of his or her work; this would give ARSC deeper insight into the activities of local users and give you a chance to find out what your peers at UAF are doing.

We stimulate conversation about possible configurations for the "E."

Please drop by!

On the Web: Current Status of ARSC's T3D

Next time you wake up with that nagging question, "I just wonder how many grayling I could catch, up there in Alaska..." Well, we wouldn't be much help. But if you're wondering how many PEs you could get, go to ARSC's welcome page:
