This class will cover the characterization and history of MPPs. With this background, students will see how the T3D approaches the problem of executing a program in parallel. The class will cover the three programming paradigms for extracting parallelism:

Data-sharing, as with Fortran 90

Work-sharing, as with Craft Fortran

Message-passing as implemented with PVM or shmem

The primary goal is to provide practical experience in getting codes up and running efficiently on the T3D. Examples used in the class can be used as models for application programs on the T3D. Also covered will be:

Performance measurement and tools

Debugging techniques and tools

This class will have directed lab sessions, and users will have an opportunity to review their applications with the instructor.
Intended Audience:
Researchers who will be developing programs to run on the T3D and current users of the T3D who want a comprehensive, up-to-date survey of programming on the T3D.
Prerequisites:
Applicants should have a denali userid or be in the process of applying for a userid. Applicants should be familiar with programming in Fortran or C on a UNIX system.

Application Procedure

There is no charge for attendance, but enrollment will be limited to 15. In the event of greater demand, applicants will be selected by ARSC staff based on qualifications, need, and order and completeness of application. The class may be cancelled if there are fewer than 5 applicants.

Send e-mail to
consult@arsc.edu
with the following information:

course name

your name

UA status (e.g., undergrad, grad, Asst. Prof.)

institution/dept.

phone

advisor (if you are a student)

denali userid

preferred e-mail address

describe programming experience

describe need for this class

I/O on the T3D and Y-MP

In investigating the prospect of implementing Phase II I/O on the T3D, we at ARSC have begun measuring I/O speeds on both the Y-MP and the T3D. Measuring I/O performance is complicated because:

It is always implemented with shared resources:

shared physical disks

shared I/O devices

shared system buffers

It depends on the operating system to service user requests and the availability of the OS depends on system load.

Its environment is not uniform across Y-MP systems:

what physical devices?

SSD or BMR or LDcache in memory?

T3D or Y-MP?

A rich set of user options is available:

formatted or unformatted?

sequential or direct?

record size large or small?
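To make the sequential-versus-direct distinction above concrete, here is a minimal sketch in Python rather than Fortran (the file name and record size are made up for illustration): direct access means fixed-length records addressed by number, much like OPEN(..., ACCESS='DIRECT', RECL=...) in Fortran, so record n can be fetched without reading the records before it.

```python
# Sketch of direct (record-oriented) access; names and sizes are
# illustrative, not from the newsletter's actual benchmarks.
import os
import tempfile

RECL = 16  # bytes per record (arbitrary example size)

def write_records(path, records):
    """Write each record padded to the fixed length RECL."""
    with open(path, "wb") as f:
        for rec in records:
            f.write(rec.ljust(RECL, b"\0"))

def read_record(path, n):
    """Fetch record n without touching the records before it."""
    with open(path, "rb") as f:
        f.seek(n * RECL)            # direct access: jump straight to record n
        return f.read(RECL).rstrip(b"\0")

if __name__ == "__main__":
    path = os.path.join(tempfile.gettempdir(), "direct_demo.dat")
    write_records(path, [b"rec0", b"rec1", b"rec2", b"rec3"])
    print(read_record(path, 2))     # prints b'rec2'
    os.remove(path)
```

Sequential access, by contrast, would read the file front to back; the fixed record length is what makes the seek arithmetic possible.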

There are so many options that it is sometimes easier to ignore the whole question, except that eventually everyone has to do I/O, and the larger the problem, the more likely I/O is to become a bottleneck. So to get started I want to present some speeds contrasting I/O on the T3D and the Y-MP. Below is a table of the speed (in MW per second) of reads and writes on the Y-MP to a file on the /u1 file system (a typical home directory) and on the /tmp file system (a larger, faster file system where users are encouraged to work). Table 1 gives Y-MP speeds for an unformatted write of arrays of increasing size:

Of course, speed increases with the size of the transfer, but only while the transfers are smaller than the buffer. The high speed on the /tmp file system is due (among other things) to the LDcache, which acts like a RAM disk used as a buffer and is carved out of part of the 1 GW of memory on ARSC's Y-MP. (So higher I/O speed is another reason why users should work out of /tmp rather than their home directories, which are not LDcached.)
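A measurement like the one described can be sketched as follows. This is my own rough harness in Python, not the benchmark behind Table 1; the file name is made up, and to mimic the comparison in the text you would point it at /tmp versus a home directory. It reports unformatted-write speed in millions of 8-byte words per second as the transfer size grows.

```python
# Rough write-bandwidth harness (illustrative sketch, not the Table 1
# benchmark): time an unformatted write and report MW/s.
import os
import tempfile
import time

def write_mw_per_sec(words, path):
    """Write `words` 8-byte words to `path`; return MW/s, or None if untimeable."""
    buf = bytes(8 * words)              # words * 8 bytes, like Cray 64-bit words
    t0 = time.perf_counter()
    with open(path, "wb") as f:         # unformatted sequential write
        f.write(buf)
        f.flush()
        os.fsync(f.fileno())            # push past OS buffers so real I/O is timed
    elapsed = time.perf_counter() - t0
    return words / elapsed / 1e6 if elapsed > 0 else None

if __name__ == "__main__":
    path = os.path.join(tempfile.gettempdir(), "iospeed_demo.dat")
    for words in (1 << 12, 1 << 16, 1 << 20):
        rate = write_mw_per_sec(words, path)
        if rate is not None:
            print("%8d words: %6.1f MW/s" % (words, rate))
    os.remove(path)
```

The fsync call matters: without it, small writes land in the operating system's buffers and the measured "speed" reflects memory copying rather than the file system, which is exactly the shared-buffer effect the list above warns about.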

Next, we compare the Y-MP uniprocessor speeds to those of a single T3D PE running the same program:

The big difference between I/O on the Y-MP and the T3D is that I/O on the T3D is done by the mppexec agent, which is just another Y-MP job competing with all the other jobs in the mix (ARSC's Y-MP always runs at more than 95% utilization). The degradation beyond 16K operations on the T3D must be due to some buffer other than the LDcache, because both writes are to files on the /tmp file system.

I'm sure this is only a temporary difference between the Y-MP and T3D compilers, and that in the future the T3D compilers will be as smart as the Y-MP compiler about such an implied DO loop in an I/O statement. I/O is complicated, and if you find a useful insight or technique, I'm sure we'd all like to hear about it.
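The implied-DO point can be illustrated with a rough analogy in Python rather than Fortran (the function names and sizes are mine, for illustration only): an unoptimized implied DO loop in an I/O statement becomes one tiny request per array element, while the "vectorized" form the Y-MP compiler produces moves the whole array in one large request.

```python
# Analogy for the implied-DO difference: many small writes versus one
# large write of the same data. Names and sizes are illustrative.
import os
import tempfile
import time

def elementwise_write(values, path):
    """One small write per element -- like the unoptimized implied DO loop."""
    with open(path, "wb") as f:
        for v in values:
            f.write(v.to_bytes(8, "little"))

def bulk_write(values, path):
    """One large write for the whole array -- the optimized form."""
    with open(path, "wb") as f:
        f.write(b"".join(v.to_bytes(8, "little") for v in values))

if __name__ == "__main__":
    values = list(range(200000))
    path = os.path.join(tempfile.gettempdir(), "implied_do_demo.dat")
    for writer in (elementwise_write, bulk_write):
        t0 = time.perf_counter()
        writer(values, path)
        print("%-18s %.3f s" % (writer.__name__, time.perf_counter() - t0))
    os.remove(path)
```

Both variants produce byte-identical files; the difference is purely in how many requests the runtime must service, which is where the per-call overhead accumulates.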

Reminders

List of Differences Between T3D and Y-MP

The current list of differences between the T3D and the Y-MP is:

Data type sizes are not the same (Newsletter #5)

Uninitialized variables are different (Newsletter #6)

The effect of the -a static compiler switch (Newsletter #7)

There is no GETENV on the T3D (Newsletter #8)

Missing routine SMACH on T3D (Newsletter #9)

Different Arithmetics (Newsletter #9)

Different clock granularities for gettimeofday (Newsletter #11)

Restrictions on record length for direct I/O files (Newsletter #19)

Implied DO loop is not "vectorized" on the T3D (Newsletter #20)

I encourage users to e-mail in differences that they have found, so we all can benefit from each other's experience.

In newsletter #18 there is a list of CRI T3D optimization articles available from ARSC.

In Newsletter #19 there is a list of CUG articles on the T3D available from ARSC.
