ARSC HPC Users' Newsletter 238, February 4, 2002

Contents

FFTs on Chilkoot

I have been testing the various FFT routines available at ARSC, with an eye to performance. This article describes results for Chilkoot, the Cray SV1ex.

For comparison purposes, I started with a fairly quick FFT code taken from the INFO-MAC hyper archive, originally written by John Green (I shall refer to this as Green's FFT). Green's FFT routines were reported to be 2-3 times faster than the standard Numerical Recipes FFT algorithms.

I compared Green's complex 1-D FFT with the equivalent CRAY LibSci routine CCFFT and with the IMSL Math Library routines FFTCF/FFTCB. The tests were run on randomly generated complex arrays of varying lengths by calling both the forward and the reverse FFT 10000 times each. The run times in seconds for vectors of lengths from 64 to 16384 are summarized in the following table:

The results overwhelmingly show the CRAY LibSci routines are superior.
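The timing methodology described above can be sketched generically. The snippet below is my illustration only: it uses NumPy's FFT rather than the Cray LibSci or IMSL routines, and a reduced repetition count (the actual benchmark called each routine 10000 times), but the forward/reverse timing loop over varying vector lengths is the same idea.

```python
import time
import numpy as np

def time_fft_pair(n, reps=1000):
    """Time reps forward + reps inverse complex 1-D FFTs of length n."""
    x = np.random.rand(n) + 1j * np.random.rand(n)  # random complex test vector
    t0 = time.perf_counter()
    for _ in range(reps):
        y = np.fft.fft(x)    # forward transform
        x = np.fft.ifft(y)   # reverse transform
    return time.perf_counter() - t0

# Vector lengths matching those in the article's tests.
for n in (64, 256, 1024, 4096, 16384):
    print(f"n = {n:6d}: {time_fft_pair(n):.3f} s")
```

Substituting each library's forward/reverse pair into the inner loop gives directly comparable timings.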

Figure 1 - FFT Comparison on Chilkoot:

Graphically we see a 2 to 7.5 times speedup over Green's implementation when using LibSci.

Figure 2 - Speedup on SV1ex: LibSci versus Green's:

As a further comparison, I performed a similar test using the 2-D complex FFTs from LibSci (CCFFT2D) and from IMSL (FFT2D/FFT2B), running each 2000 times forward and reverse on matrices of varying sizes. Not surprisingly, the results were quite similar to the 1-D cases, with the LibSci routine running 2.5 to 5 times faster than the IMSL version.

Figure 3 - 2-D FFT Comparison on Chilkoot:
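Whichever library you benchmark, it is worth confirming that the forward/reverse pair actually round-trips, since libraries differ in where they apply normalization (LibSci's CCFFT routines, for instance, take an explicit scale argument). A minimal round-trip check, again using NumPy in place of LibSci/IMSL as an illustration:

```python
import numpy as np

def roundtrip_error(nx, ny):
    """Max abs error after a forward + reverse 2-D complex FFT."""
    a = np.random.rand(nx, ny) + 1j * np.random.rand(nx, ny)
    # np.fft.ifft2 applies the 1/(nx*ny) scaling; with LibSci you would
    # pass the appropriate scale to the reverse call instead.
    b = np.fft.ifft2(np.fft.fft2(a))
    return np.max(np.abs(b - a))

for size in (64, 128, 256):
    assert roundtrip_error(size, size) < 1e-10, f"round-trip failed at {size}"
```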

Upon reading the CCFFT2D man page more closely, I noticed under the performance tips the comment "it is very important to make the leading dimensions of the arrays odd numbers to avoid memory bank conflicts." Doing this provided an additional 1.2 to 1.4 times speedup over the already speedy LibSci FFT routine.
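The trick is simply to allocate the array with an odd leading dimension and work in the first n rows. The sketch below shows the padding logic; Python/NumPy stands in for Fortran here (with order='F' mimicking Fortran column-major storage), and the odd-leading-dimension advice itself is specific to the SV1ex's memory banks.

```python
import numpy as np

def odd_lda(n):
    """Smallest odd leading dimension >= n, per the CCFFT2D man-page tip."""
    return n if n % 2 == 1 else n + 1

n1, n2 = 256, 256
lda = odd_lda(n1)   # 257: an odd column stride avoids bank conflicts
# Allocate lda x n2 in column-major (Fortran) order, like DIMENSION A(LDA, N2).
buf = np.empty((lda, n2), dtype=np.complex128, order="F")
a = buf[:n1, :]     # use the n1 x n2 view; the storage stride stays odd
```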

I think the conclusion to this exercise is obvious - if your application on the SV1ex uses FFTs, then you will be richly rewarded by faster run times when you use the CRAY LibSci routines and read the man pages for usage tips.

T3E Programming Environment and OS Upgrades

As announced in news/motd, yukon's default programming environment was upgraded to PE 3.5.0.3 last Wednesday, and the default message passing toolkit will be upgraded to MPT 1.4.0.4 this Wednesday.

As always, we'd like to hear of any problems, performance improvements, or other changes you notice. You'll have to recompile your code for the upgrade to have any effect.

UNICOS/mk will be upgraded soon. Watch news/motd.

"slayall" on Cluster

Clusters require a bit more attention from users than tightly integrated HPC systems like the T3E. One common problem is that processes are inadvertently left running after a job completes.

Users of ARSC's linux cluster, quest, can ensure that all processes have been terminated by issuing the "slayall" command before leaving the system:

usage:

slayall $USER

(Here $USER is the environment variable containing your username. You can use the command exactly as given above or substitute your actual username. For example, if you were user "farquat," you could type "slayall farquat".)

Note that slayall will also kill any jobs you have running in PBS, so don't use it while those are still active. Once you have no PBS jobs running, please run "slayall" prior to logging off.

Quick-Tip Q & A

(The quick tip is still in hibernation... If you have any tips to
share, we'd love to see them.)

The University of Alaska Fairbanks is an affirmative action/equal
opportunity employer and educational institution and is a part of the University
of Alaska system.
Arctic Region Supercomputing Center (ARSC) |PO Box 756020, Fairbanks, AK 99775 | voice: 907-450-8602 | fax: 907-450-8601 | Supporting high performance computational research in science and engineering with emphasis on high latitudes and the arctic.