Energizing Grid Computing

04/25/2002

Beyond all the battles between Linux and Microsoft on the desktop, Linux has already made its bones in business. IBM has invested over US$1 billion in Linux technologies, while Sun, Compaq, and HP have all made Linux a key part of their business strategies.

One of the hottest areas of computing in which Linux plays has come to be called "grid computing." We used to call this, variously, "supercomputing," "high-performance computing," "distributed computing," and "peer-to-peer (P2P) networking."

All of this terminology was apparently causing too much confusion in the marketplace, so the marketing powers-that-be came up with "grid computing" as a catchall term. It describes the various mechanisms that allow massive numbers of computers, and almost unimaginable computing power, to be applied to problems ranging from drug discovery and weather modeling to SETI@home and computer-generated animation.

Many people know about the showpieces of Linux high-performance computing like IBM's "ASCI White," a Department of Energy supercomputer, and the rendering farms used to create such blockbusters as Titanic and Shrek, but more pedestrian uses of Linux power have been harder to find.

The State of the World in Grid Computing

Until the term "grid computing" hit the scene, there were basically two ways to make a distributed, parallel application, and they both involved a lot of infrastructure development and a lot of programming.

On the infrastructure side, you generally had to design a more-or-less custom network for running distributed applications, one optimized for moving your application's inter-process communications efficiently. Unless you were solving toy problems, this involved lots of expensive networking hardware and a lot of planning.

On the programming side, unless you were really brave and wanted to invent your own mechanisms for slicing and dicing data and getting it to processors, you used either the Message Passing Interface (MPI) or the Parallel Virtual Machine (PVM) systems as the communications libraries to effect the parallel communications in your application.

In the last six months there have been several major announcements of efforts to formalize the grid computing revolution. IBM and Sun have each announced hardware systems specially designed to support grid computing. They’ve also delivered software packages to make running programs in a distributed environment easier and more productive, with software libraries that allow everything from Java to legacy mainframe applications to join the grid.

These join a number of existing efforts, including those sponsored by the Defense Advanced Research Projects Agency (DARPA) and the National Science Foundation (NSF). Among them are the Cactus and Globus projects, which provide users with an entire set of libraries and communications systems to grid-enable their applications.

These are great projects that solve a lot of the low-level networking issues and allow many kinds of applications to take advantage of distributed-computing capabilities. The main drawback is the large investment of time and resources needed to convert an existing application to run on the grid. And once you port your application to a "grid" structure there’s no guarantee that you’ll get better or faster results, which should be one of the primary goals of creating a distributed or parallel application.

So, with all of the buzz and hype, what can you (and your enterprise) actually do right now to speed up your applications and take advantage of this new grid-computing paradigm?

The Powerllel System

One of the hottest new systems to help businesses jumpstart the process of parallelizing applications and reaping the benefits of faster applications has been developed by a New York City-based start-up called Powerllel.

The founders of Powerllel have a long history at large brokerage firms of developing trading analytics, a process that takes the idea of needing results "fast" to a whole new level. In the financial markets there aren’t many differentiators -- when you get down to it everyone is using the same algorithms, and more or less the same data. What differentiates the winners from the losers in trading systems is how fast and accurately you can get to a result and how much data you can plow through in the time available to make a trade.

For many firms the answer has been to buy traditional supercomputers, but this is a limited solution because of the expense of these systems and the insatiable appetite for computing power that financial firms (and other businesses) have.

The Powerllel developers figured there had to be a way to capitalize on off-the-shelf hardware (personal computers, workstations, and UNIX servers) to make parallel-application development easier and more effective, while ensuring that existing investment in software technologies (like applications written with MPI and PVM and custom parallel libraries) didn’t go to waste. They accomplished this and a whole lot more with the Powerllel Software Suite.

DNET, Lobsters and Adapters: Oh, MY!

Powerllel has created an innovative system that allows applications to be developed and/or re-targeted in a matter of days, rather than the months (or even years) typical of most parallel program-development processes.

The Powerllel system consists of three major components:

A distributed network layer, called DNET, that completely hides the underlying networking and communication system from an application. This means programmers no longer have to code their applications to a particular network communications model (MPI, PVM, Linda, etc.). The DNET system decides what the most effective communications mechanism is, given the computational resources available on the network and what communications models they support.

A load-balancing system called Lobster (Load-Balancing Sub-Task ExecuteR). Lobster is an integrated mechanism to distribute slices of a computation across a network of computational resources (such as a collection of Linux boxes, Suns, mainframes, whatever) and to ensure that failover, recovery, and other critical services are managed on behalf of the application.

Adapters -- application frameworks that represent various styles of execution for distributed/parallel computations. These adapters include simple parametric systems (send a bunch of parameters to a subroutine, get back a discrete result, much like SETI@home-type calculations), and complex tree-structured recursive systems where all of the computations are interdependent and intertwined (typical of financial models, pharmaceutical modeling, weather modeling, and other real-world problems).
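To make that division of labor a little more concrete, here is a minimal Python sketch of the kind of job Lobster is described as doing: handing slices of a computation to workers and re-queuing a slice when a node fails. This is not Powerllel's code or API. The names dispatch, WorkerFailed, healthy_node, and dead_node are invented for illustration, the round-robin policy is an assumption, and the "nodes" are plain local functions called one after another rather than machines on a network.

```python
from collections import deque


class WorkerFailed(Exception):
    """Raised by a (hypothetical) worker that cannot finish its slice."""


def dispatch(slices, workers, max_attempts=3):
    """Hand each slice to the next worker in rotation; re-queue it on failure."""
    # Each pending entry is (slice index, slice data, attempt number).
    pending = deque((index, piece, 1) for index, piece in enumerate(slices))
    results = {}
    turn = 0
    while pending:
        index, piece, attempt = pending.popleft()
        worker = workers[turn % len(workers)]  # simple round-robin choice
        turn += 1
        try:
            results[index] = worker(piece)
        except WorkerFailed:
            if attempt >= max_attempts:
                raise  # give up: the slice failed on every attempt
            # Failover: put the slice back so a later turn can retry it.
            pending.append((index, piece, attempt + 1))
    # Return results in the original slice order.
    return [results[i] for i in range(len(slices))]


def healthy_node(piece):
    """Hypothetical worker that succeeds: adds up its slice."""
    return sum(piece)


def dead_node(piece):
    """Hypothetical worker that always fails."""
    raise WorkerFailed("node unreachable")


totals = dispatch([[1, 2], [3, 4], [5, 6]], [dead_node, healthy_node])
print(totals)  # [3, 7, 11]
```

The caller never sees that one of the two nodes was dead; the slices it dropped were simply retried elsewhere. A production system would run workers concurrently and weigh their load, which this sketch leaves out.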

All of this is brought together in a system that allows the developer to focus on the problem being solved, rather than the intricacies of how to structure communications between nodes or whether MPI is a better model than PVM for a given class of problem.

What's the Big Deal?

Parallel problems fall into two broad classes: embarrassingly parallel and deeply parallel. Embarrassingly parallel problems are things like SETI@home, where you have a whole lot of data that you’re applying some kind of analysis to, but it doesn't really matter where you break up the data. In effect, you're handing a bunch of parameters (the data) to a subroutine and waiting for some discrete set of results to come back. More importantly, the results from one data block are not linked to the results from any other block. Everything is discrete; this is often referred to as parametric computing.
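A parametric computation is easy to sketch with nothing but Python's standard library. The example below is an illustration only: analyze_block is a made-up stand-in for the real analysis, and a local thread pool stands in for what would be a network of machines. The point is that the data can be cut anywhere and no block ever needs another block's result.

```python
from concurrent.futures import ThreadPoolExecutor


def analyze_block(block):
    """Stand-in analysis (hypothetical): sum of squares of one block."""
    return sum(sample * sample for sample in block)


def split_into_blocks(data, block_size):
    """Cut the data at arbitrary boundaries; where we cut does not matter."""
    return [data[i:i + block_size] for i in range(0, len(data), block_size)]


def run_parametric(data, block_size, workers=4):
    """Analyze every block independently and return one result per block."""
    blocks = split_into_blocks(data, block_size)
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # map() hands each block to a worker and keeps results in block order.
        return list(pool.map(analyze_block, blocks))


samples = list(range(10))
per_block = run_parametric(samples, block_size=3)
print(per_block)       # [5, 50, 149, 81]
print(sum(per_block))  # 285
```

Changing block_size changes how the work is cut up but not the combined answer, which is exactly what makes this class of problem so easy to distribute.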

Deeply parallel problems are those in which most, if not all, of the calculations are intertwined with some other part of the computational process, either as an input or a potential side effect. These are often referred to as non-linear or tree-structured problems. Weather modeling, financial models, and drug development are great examples: every aspect of the computational process is interlinked.
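For contrast, here is a small sketch of an interlinked calculation, using a textbook binomial option-pricing lattice as a stand-in for the financial models mentioned above. The choice of model, the function name binomial_call, and its parameters are illustrative assumptions, not anything taken from Powerllel. Every node's value depends on two nodes in the following time step, so a whole level has to finish before the one before it can be computed.

```python
import math


def binomial_call(spot, strike, rate, sigma, years, steps):
    """Price a European call on a recombining binomial lattice."""
    dt = years / steps
    up = math.exp(sigma * math.sqrt(dt))    # size of an up move
    down = 1.0 / up                         # size of a down move
    grow = math.exp(rate * dt)              # risk-free growth per step
    p = (grow - down) / (up - down)         # risk-neutral up probability
    discount = 1.0 / grow

    # Final level: payoffs at expiry. These ARE independent of each other.
    values = [
        max(spot * up ** j * down ** (steps - j) - strike, 0.0)
        for j in range(steps + 1)
    ]

    # Backward induction: each node needs its two successors, so level k
    # cannot start until level k + 1 is completely finished.
    for level in range(steps, 0, -1):
        values = [
            discount * (p * values[j + 1] + (1.0 - p) * values[j])
            for j in range(level)
        ]
    return values[0]


price = binomial_call(spot=100.0, strike=100.0, rate=0.05,
                      sigma=0.2, years=1.0, steps=200)
print(round(price, 2))
```

Splitting this across machines means every step ends with neighbors exchanging values, which is where the networking and decomposition decisions start to pile up.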

These deeply parallel problems can take vast amounts of time to get right if coded by hand: there are just too many things that have to be done exactly right, from networking decisions and communications libraries to the decomposition of the algorithms themselves. Mistakes can cost millions in redesign, re-coding, and delays in deployment.

The Powerllel solution is a revolution in the development of "grid" applications because the developer simply provides the fundamental algorithm to be solved and compiles the application; it is then good to go across any range and type of computing system (Windows, Mac, Linux, UNIX, etc.). The DNET and Lobster mechanisms make all the hard decisions about network topology, routing, and failover/robustness, while the Adapters determine which kind of processing structure makes sense given the kind of algorithm being solved.

The Bottom Line

Grid computing is the newest and hottest trend in high-performance computing, and will help companies -- eventually -- take advantage of the power of distributed systems. What's been missing is the ability to simplify the application development or porting process and get applications up and running without months or years of expensive, error-prone development. Powerllel's Software Suite just may be the killer app for grid computing.