February 15, 2005 | As clusters take on more of the computational chores within life science organizations, the challenge for researchers is how to make sure software is running efficiently on their systems.

Many open-source and commercial diagnostic tools can probe a cluster's performance. But virtually all of these tools are designed for use by the experienced software developer. This quarter, Engineered Intelligence, Microway, and PathScale are bringing new tools to market that are aimed at the scientists themselves.

The new tools are part of a class of tools that Art Wieboldt, marketing manager for IBM Deep Computing, and others call productivity software. "This is a layer of software that [functionally] sits above the operating environment and management layer software," Wieboldt says.

"It's the next layer [of software] up the stack from operating environment tools such as debuggers, compilers, and math libraries," Wieboldt says. He notes that productivity software includes such things as interconnect management tools, trace analyzers, performance tuning tools, and parallelization tools.

Changing Requirements
The need for new tools — especially ones that work at Wieboldt's productivity layer — is being driven by two major life science software development trends.

First, there is the porting (to clusters) of software that used to run on SMP Unix minicomputers and high-end workstations. Industry experts say there is a large installed base of such software (much of it custom written) within the life sciences.

The second trend is to take software that was written to run on a single Linux server and move it to a cluster.

In both areas, the big issue is how to get software that was designed to run on a system with tightly integrated memory, processors, and data management to run efficiently on multiple independent cluster nodes.

For several years, application developers and programmers have had a bevy of tools that helped in this area. For example, vendors such as Verari Systems Software (formerly MPI Software Technology), Scali, Intel, and Scientific Computing Associates have been offering tools that help parallelize applications, tune performance, and manage the interconnection of server nodes and storage systems.

The new tools from Engineered Intelligence, Microway, and PathScale work in these areas, but with a twist. The tools are some of the first designed specifically with the needs of a researcher — and not necessarily the experienced application developer or programmer — in mind.

One distinction between these new tools and what has typically been available is ease of use. "I have been using trace analyzers for years," says Martin Cuma, scientific applications programmer at the University of Utah. In some cases, the newer tools simply handle mundane tasks that the older tools required of the user. For instance, Cuma notes that with some trace analyzers he had to run the software on each node and then collect the data together to look for patterns. He is now using PathScale's OptiPath MPI Acceleration Tools, which perform performance analysis of applications that use the message passing interface (MPI) to run in distributed mode on a cluster.

PathScale says that the OptiPath software is easier to use than many open-source performance analysis programs. For instance, the software automates many manual tasks (e.g., running test programs in multiple nodes and aggregating the information in one place) that typically are required to diagnose a cluster performance problem.
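The run-on-every-node-then-aggregate chore that such tools automate can be sketched generically. The following is an illustrative sketch only, not PathScale's implementation; the node names, timings, and helper functions are invented for the example:

```python
# Illustrative sketch (not PathScale's code): automate the "run a test
# on every node, gather the results in one place" task that older trace
# tools left to the user, then flag nodes that look abnormally slow.

def gather_node_timings(run_on_node, nodes):
    """Run a measurement function on each node and collect the results."""
    return {node: run_on_node(node) for node in nodes}

def flag_outliers(timings, tolerance=2.0):
    """Flag nodes whose runtime exceeds tolerance x the cluster median."""
    values = sorted(timings.values())
    median = values[len(values) // 2]
    return [n for n, t in timings.items() if t > tolerance * median]

if __name__ == "__main__":
    # Simulated per-node wall-clock times; "node03" is misbehaving.
    fake_times = {"node01": 1.1, "node02": 1.0, "node03": 9.7, "node04": 1.2}
    report = gather_node_timings(fake_times.get, fake_times)
    print(flag_outliers(report))  # -> ['node03']
```

The point of the sketch is the division of labor: the collection step is mechanical and scriptable, which is exactly the part users previously had to do by hand.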

The PathScale OptiPath MPI Acceleration Tools software has been in limited distribution for several months; it will be commercially available this spring.

In a similar ease-of-use vein, Microway's MPI Link-Checker offers performance analysis features that a developer could use, but the tool presents that information in a way that helps scientists troubleshoot problems and tune an application's performance.

In the past, performance monitoring was often done using benchmark programs alone. But this approach has limitations. For instance, if an application underperforms, benchmarking software typically cannot tell whether the problem is a single bad cable, a systems-wide problem, or poorly written application algorithms. And even when a benchmark program can isolate such problems, the user must sift through data to determine the root cause.

The MPI Link-Checker runs an MPI application on each node and then measures latency and bandwidth between all of the computational nodes of a cluster. The software then collects the results for all the nodes and plots the latency and bandwidth between all pairs of nodes in the cluster. The visual display of the data makes it easy to identify problems. For instance, if a single cable, network interface card, or node is bad, the program's display flashes a yellow background on the bad node to point out the problem.
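The pairwise idea behind this kind of tool can be illustrated with a small sketch. This is not Microway's code; the node names, latency figures, and threshold are invented, and real measurements would come from MPI ping-pong tests rather than a hard-coded table:

```python
# Illustrative sketch of pairwise link checking (not Microway's code):
# given a latency measurement for every pair of nodes, flag any node
# whose links are uniformly slow -- the signature of a bad cable or NIC.

from itertools import combinations
from statistics import median

def find_bad_nodes(latency, threshold_us=50.0):
    """Return nodes whose median link latency exceeds the threshold.

    `latency` maps a (node_a, node_b) pair to microseconds. The median
    (rather than the mean) keeps one bad link from implicating the
    healthy node on its other end.
    """
    links = {}
    for (a, b), value in latency.items():
        links.setdefault(a, []).append(value)
        links.setdefault(b, []).append(value)
    return sorted(n for n, vals in links.items() if median(vals) > threshold_us)

if __name__ == "__main__":
    nodes = ["n1", "n2", "n3", "n4"]
    # Simulated measurements: every link through n3 is slow.
    latency = {(a, b): (400.0 if "n3" in (a, b) else 12.0)
               for a, b in combinations(nodes, 2)}
    print(find_bad_nodes(latency))  # -> ['n3']
```

A graphical tool would plot the full latency matrix rather than just print names, but the underlying test is the same: a faulty component stands out because every link touching it is slow.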

Microway used the tool to test its own systems before shipping them, and included it for its distributors and partners to use. After feedback from partners, systems vendors, and others, Microway decided to commercialize the tool. The MPI Link-Checker was announced late last year and will be commercially available this quarter.

Test Driving Applications
The PathScale and Microway products can help troubleshoot, optimize, and tune clustered applications' performance. But before such testing is required, an application must be running on a cluster. And that's where the Engineered Intelligence tool comes in.

Labs looking to move applications onto clusters often need help porting software that ran on a single machine to the distributed cluster environment. Many organizations do not have the technical skills to do this and must hire developers to perform the porting.

Engineered Intelligence's CxC tool helps a scientist develop a parallel application on his or her desktop computer, and then helps deploy that application to a cluster. Using virtual machine technology (much as Java applications run on a virtual machine), CxC lets a scientist use a desktop computer to prototype and test a program intended for a cluster.

Essentially, the CxC software lets scientists define a parallel computer environment. In that environment, the scientist can then run existing programs or create new applications and test them on a virtual system with a pre-defined number of nodes. (CxC can work with programs written in C, C++, or Fortran.)
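The prototyping idea can be illustrated generically. The sketch below is not CxC's API; it simply shows the underlying concept of testing a data-parallel routine on a pre-defined number of "virtual nodes" on one machine, using local worker threads to stand in for cluster nodes:

```python
# Generic illustration of desktop prototyping on virtual nodes (this is
# not CxC's API): split a computation across a configurable number of
# local workers and check that the combined answer matches a serial run.

from concurrent.futures import ThreadPoolExecutor

def node_task(chunk):
    """Work assigned to one virtual node: sum its slice of the data."""
    return sum(chunk)

def run_on_virtual_cluster(data, num_nodes):
    """Distribute data across num_nodes virtual nodes and combine results."""
    chunks = [data[i::num_nodes] for i in range(num_nodes)]
    with ThreadPoolExecutor(max_workers=num_nodes) as pool:
        partial = list(pool.map(node_task, chunks))
    return sum(partial)

if __name__ == "__main__":
    data = list(range(1000))
    # The answer must not depend on the virtual node count -- that
    # invariance is what desktop prototyping lets a scientist verify
    # before the program ever touches a real cluster.
    print(run_on_virtual_cluster(data, num_nodes=4))  # -> 499500
```

On a real cluster the chunks would live on separate machines and communicate over MPI, but the decomposition logic a scientist debugs on the desktop is the same.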

CxC and the other new, easier-to-use productivity-layer tools are part of a general trend in high-performance computing. "There's been a maturing of cluster computing," says Michael Swenson, research manager at Life Science Insights. "There's an expanding base of users for clusters."

In the past few years, the increased use of clusters has focused attention on the operational management of the systems. Today, Swenson says, "you see cluster management tools maturing; IT vendors offering packaged, pre-loaded small cluster systems for the life sciences; and improving tools and compilers for developing parallel applications."