ARSC T3E Users' Newsletter 145, June 26, 1998

CUG Notes

ARSC staff members Kurt Carlson and Barbara Horner-Miller, who attended the Stuttgart CUG, were kind enough to share their notes with the T3E Newsletter. The first notes were written by co-editor Guy Robinson:

Guy's Notes:

SV1. What is it?

The SV1 is the latest member of Cray's vector supercomputer product line. Key features of the new line are:

Peak performance of 4 gigaflops from a single processor unit, which actually consists of four 1 gigaflop processors. These processors exploit vector cache memory which, with a suitable compiler, will increase effective memory bandwidth.

A symmetric multiprocessing architecture which builds larger systems by combining many cabinets. It scales from entry-level single-cabinet systems of 32 gigaflops to multi-cabinet teraflop systems, with a current upper limit of 1 terabyte of memory.

A powerful suite of clustering tools to ease the management of such complex systems will also be provided.

Existing J systems can actually be upgraded to use the new fast processors, giving higher levels of performance but without some of the advanced clustering abilities, and Cray is expecting many sites to take this path.

With entry-level systems priced at $500,000 and an aggressive trade-in policy, it is clear this system is intended to be the mainstream of the SGI/Cray scientific product line. More on the SV1 can be found at:

http://www.cray.com/products/systems/craysv1/intro.html.

Software tools

Several SGI/Cray staff described what would be happening in the current programming environment in the coming months/years. Along with a continual set of fixes and improved compatibility between the various SGI/Cray systems, key points are:

an improved totalview debugger which will allow the inspection of message queues, etc.

a new set of scientific libraries with a clearer overall structure, which should reduce user confusion over which libraries contain which algorithms and which are parallelized for which architecture.

in the longer term, a single workshop of tools with a similar general look and feel across all platforms.

T3E news

SGI/Cray predicted continuing sales for the T3E systems in the coming year and reported it to be one of the most successful MPP platforms ever produced, with several 1000+ node systems either in place or being delivered shortly, and many systems with over 500 nodes in production.

A new internal network is now available which increases bandwidth from 350 MBytes/sec to 420 MBytes/sec, but other hardware changes are unlikely to occur unless there is demand from users. (The current hardware could accept a 750 MHz processor.)

A tutorial was held on the problems of scheduling MPP systems and several available products were reviewed. All present agreed there was no perfect solution and that there was much to be said for sites determining a best practice rather than seeking a holy grail.

Barbara's Notes:

Cray User Group (CUG) Report
Stuttgart Germany
June 14-18, 1998

This CUG was the first once-a-year CUG and the first CUG with the new SGI management. It was the last CUG for several Cray Research corporate faces; fond farewells were bid to Bo Ewald and Irene Qualters. There was an election for four offices and a reorganization of the Special Interest Group (SIG) structure in addition to the regular general and parallel sessions.

CHANGING OF THE GUARD

Bo Ewald, Executive Vice President of SGI, and Irene Qualters, President of Cray Research, resigned from SGI a few weeks before the Stuttgart CUG. They attended the conference to say their good-byes to, and be honored by, CUG, an organization that had supported them, and been supported by them, for many years. The honoring of Bo and Irene took place at the Cray Reception on Monday night when they received tokens of remembrance presented by Gary Jensen, President of CUG. Bo was given a remote controlled Mercedes and Irene received a necklace and earring set. Both spoke briefly Monday night but their official good-byes were said at the close of the executive general session the next day.

Rick Belluzzo, President of SGI, addressed the Tuesday general session where he presented his direction for the company. He touched on the company definition (visualization, data management and computation), the market focus (time to insight), the product roadmap, operation/execution (clear responsibility and accountability through annual plans and tracking of metrics) and the business model (key industries). One of the more interesting slides in this series showed that supercomputers migrate downward: the supercomputers of yesterday are the servers of today and the desktops of tomorrow. Prior to addressing the General Session, Rick and others had held a news conference to formally announce the SV1, the J90 follow-on.

Beau Vrolyk and Earl Joseph II wrapped up the executive General Session with information on the SV1 announcement followed by Q & A. SGI has orders for more than 500 processors. Specific information on the SV1 can be found on the SGI web pages beginning with:

http://www.cray.com/products/systems/craysv1/intro.html.

CUG ELECTION

Sally Haerer ran unopposed for CUG President. Sam Milosevich, ELILLY, won a very close race with Nick Cardo, SSD-SS, for Vice President. The race for Secretary, a position held for 11 years by Gunther Giorgi, GRUMANN, became dynamic when Gunther withdrew and Margaret Simmons, SDSC, petitioned onto the ballot. This resulted in a race between Margaret and Eric Greenwade, INEL. It was won by Margaret. Bruno Loepfe, ETHZ, won the race for Director of Europe over Michael Brown, EPCC. ??? The remaining members of the new Board of Directors were not up for re-election: Barbara Horner-Miller, ARSC, Treasurer; Shigeki Miyaji, CHIBA, Director of Asia; Barry Sharp, BCS, Director of the Americas. Gary Jensen, UIUCNCSA, completes the Board as Past President.

SIG REORGANIZATION

Following a recommendation by the Future of CUG Committee, the Board of Directors reorganized the SIGs into a two-tiered structure. Under the new structure, there are five Group SIGs, each of which comprises several Focus SIGs. The Board appointed Chairs for each of the five Group SIGs and for many of the initial Focus SIGs. On Thursday afternoon, the SIGs met to discuss organizational issues and to confirm the focus areas within each SIG. The SIG organization which will carry forward to the Minneapolis CUG next May will be:

Computer Center Management Group Chair - Mike Brown
    User Services Chair - Leslie Southern
    Operations Chair -

Communications & Data Management Group Chair - Hartmut Fichtel
    Mass Storage Chair -
    Networking Chair - Hans Mandt

Operating Systems Group Chair - Chuck Keagle
    UNICOS Chair - Ingeborg Weidl
    IRIX Chair -
    Security Chair - Virginia Bedford

Programming Environments Group Chair - Jeff Kuehn
    Compilers & Libraries Chair - Hans-Hermann Frese
    Software Tools Chair - Guy Robinson

High Performance Solutions Group Chair - Eric Greenwade
    Applications Chair - Larry Eversole
    Visualization Chair -
    Performance Chair -

TIDBITS FROM SESSIONS

CUG has 195 members, $193,869.21 in the bank and 271 attendees in Stuttgart

Future CUGs are slated for

Minneapolis, MN, USA for May 24-28, 1999

Noordwijk, NL for May 22-26, 2000

An Origin 2000 meeting will be sponsored by CUG this fall

The SV1 is not IEEE-based but the SV2 will be

Walter Wehinger was named Chief Information Manager for CUG

SGI will take a global view of training and try to put together classes that would not draw enough participants at the regional level

SESSION NOTES

Hardware:

The SGI hardware organization is split into two parts: Vector Supercomputing Development under Steve Oberlin will concentrate on the T90, T3E, J90, SV1, SV2 and GigaRing; the Advanced Systems Development under Rick Barr will concentrate on the SN1, SN2 and XIO products. The support organizations span the Mountain View and Chippewa Falls sites with Barr's organization located in both sites.

The SV1 has multi-streaming. The SV1 and SV2 are expected to move forward at Moore's law or super Moore's law rates. The SV1e will have faster processors, reduced interconnect latency, increased GigaRing interconnect speed and higher bandwidth.

The name Cray will continue to be applied to high-end, computational products, e.g., Cray SV1.

The SN1 employs 2nd generation Origin DSM architecture, a more scalable router and more ports. Each link is twice the speed of the Origin or the T3E router. It's scalable to a thousand processors.

The SN2 is the 3rd generation. It will use the Merced follow-on from Intel and will have a faster hub. It will employ flexible network architecture and has an adjustable balance of PEs and routers to fit the applications. It will be air-cooled.

Software:

UNICOS 10 is Y2K compliant; it will have updates released on a 3-6 month interval; there will not be a UNICOS 11.

The SV1 will be supported through UNICOS updates.

UNICOS/mk is at the 2.0 level and will be updated on a 6-9 month interval with weekly archives. UNICOS/mk will be active until 2000 and maintained until 2004. Cray believes the software MTTI of UNICOS/mk to be more than 3000 hours. psched was added in 2.0.2. 2.0.3 brought the prime job concept, improvement to swap and the implementation of express message queues. Future activity includes the migration and checkpointing of swapped jobs as well as DCE and DFS implementation.

Service:

There are more than 2000 employees in the SGI service area with a 2% turnover rate. While a few locations and classifications are difficult to recruit for, in general they find recruitment easy. The role of the local Support Manager is customer satisfaction, account management, and the work environment and morale of their employees. SGI is striving for a common support environment and tools between IRIX and UNICOS.

Training:

Leslie Southern described how OSC takes a 2-day, instructor-led training course and makes it available on the Web for self-paced instruction. The instructor puts materials on the Web in his/her preferred format and they are converted to HTML. The instructor uses a wireless microphone and a projection system. Real audio and video are recorded using the Web Lecture System (WLS) from NC State University. The resulting class can be viewed as a class (sound and video), a review (printable notes) or sound (lecture only) on the Web.

Documentation:

Lynda Lester gave pointers on how to use the Web effectively; she had lots of examples, of both good and bad usage, to emphasize her points. Among the suggestions were to put the important stuff at the top and to watch out for platform-specific gotchas such as monitor resolution, color differences, tables and line spacing. Provide the viewer with alternative ways to navigate through the document: Table of Contents, Article Index and Search Engines are a few. Viewers would rather scroll than click, so make links predictive (links with more words are better than shorter, more cryptic links).

System Group Manager: Manage multiple systems... Group event tracking, Config mgt., Availability monitor, Notification based on group system events.

System Support Console: GUI and ASCII interfaces; launch and configure; control notifiers and reports.

Proposed to be available next year for IRIX; written for NT as well.
Intent is to provide for U/mk (unknown: "Efforts all directed to IRIX now").

Industry Directions in Storage
Mike Anderson, SGI/Cray

Seagate & IBM are primary players in high performance disks. Quantum bought out by Matsushita (sp).
Market dominated by desktop (70% of units), roughly 17% is high performance and 13% mobile.
CD-RW likely to take over CD market by 2000-2001.
Industry has not accepted IBM SSA disks.
Fibre channel-0wid has industry acceptance.
Capacities growing (expect 40 GB drives by end of 1998).
MTBF measurement varies... for some it's when 2/3 have failed, for some it's measured by returned failed drives (many of which are just thrown away so by measurement they're still ticking); useful life expectancy is 5 years, but economic life may be less than actual life.
LTO (Linear Tape Open) (www.lto-technology.com) is a new emerging media type/standard... near term expect 100 GB capacity, expect 800 GB futures.
Super-DLT (100 GB/cart) also should be out by 1999, SGI will support when available.
STK Eagle will be released June 1998, SGI will need 4 months for validation testing once released.

General Session

Does IT for others: 50% split of public (University) vs. industry at HWW.
Objectives:
More power at same cost; less operating personnel; smaller infrastructure costs; ability to solve very large problems;
bi-directional knowledge transfer between industry and University.
Academic & Industry: 2 very different cultures, took time to get it together.

Running in house now.
Will be the resultant operating system for all (move from dual expertise to less duplication with more applications available).
Support for large systems: 64 to 4000 CPUs: fault tolerance, reliability, high-end features like checkpoints and accounting.
Support for server systems: 4-64 CPUs: general purpose workload; fault containment; different requirements from large.... constant availability.
Architecture: scalability and fault containment.

8 participants, most with performance interest. Many viewed performance here as application or algorithms oriented vs. capacity planning and data center management which may be covered by Group 1 (not clear). Lots of TBDs.

Reminder: Use qsub's "-l mpp_t" Option

The "-l mpp_t" option allows you to request a specific amount of time for your MPP job to run. You should request, as closely as possible without going under, the actual time your job needs, and not simply request the maximum possible time for a given queue. Realistic requests improve job scheduling, both by NQS and by the real people who manage the system. (Yes, we look at the time requests!)

Smaller mpp_t requests have a chance of running sooner. At ARSC, this is especially true if the request is under 30 minutes (which puts it into one of the "Quick" queues, which have the highest priority).

For help, see "man qsub" or:

http://www.arsc.edu/support/howtos/usingnqs.html

For more on ARSC's checkpoint procedures and queue policy, see "news chkpnt_sched" and "news queue_policy" on yukon.

TARGET Follow-up

[ Thanks to the reader who sent this response to last week's article on "TARGET." ]

Regarding the TARGET environment variable, another possibility would be to use the following TARGET setting:

setenv TARGET cray-t3e,memsize=256M

See "man target".

Although your solution (TARGET=target) works for your machine (since all PEs contain identical amounts of memory), users of other T3Es, or your users if your T3E is eventually upgraded with PEs of differing size, may prefer this more general solution to allow them to specify which size PE to compile for.

Quick-Tip Q & A

A: {{
In C, you don't need to specify the size of arrays at compile time
(i.e., pointers are basically arrays). So you could have a code
fragment:
    double* x;
    double* y;
    for(i=0;i<SIZE;i++) {
        y[i] = alpha*x[i] + y[i];
    }
How can you view C arrays in totalview? }}
# In the GUI version, double-click on the array you want to display.
# Then in the resulting data_object_window, double-click on "type"
# (or choose "Edit -> Type" from the menu). Then specify the type as
# <double>[SIZE]. The value of SIZE will probably need to be
# explicitly typed. For example, enter "<double>[100]" if SIZE ==
# 100.
Q: A shell alias allows Unix users to create custom mnemonics and
short-hands for commands or command strings. Two common aliases:
alias ll='ls -lF' <korn shell syntax>
alias mroe more <csh syntax>
What's your favorite alias?
(Send it in with a brief explanation. If you can't choose only one,
send two--they're small.)

Arctic Region Supercomputing Center (ARSC) |PO Box 756020, Fairbanks, AK 99775 | voice: 907-450-8602 | fax: 907-450-8601 | Supporting high performance computational research in science and engineering with emphasis on high latitudes and the arctic.