The OSCAR Revolution

Richard describes the history and goals of the Open Source Cluster Application Resource.

The Software Stack

It took nearly a full year, but OSCAR had a beta
demonstration at SC2000 in Dallas, Texas, in November 2000, at the
Oak Ridge National Lab booth. The beta ran on a heterogeneous
cluster of servers provided by Dell and SGI. The first release was
announced shortly thereafter and made a successful debut at
LinuxWorld Expo in New York City in February 2001, at the Intel
booth. Since then, there have been continuous improvements in the
OSCAR software stack, which currently includes:

Linux installation: SIS (System Installation
Suite). SIS is an open-source cluster installation tool based on
the merger of LUI (the Linux Utility for cluster Installation) and the
popular SystemImager. SIS, developed by Michael Chase-Salerno and
Sean Dague of IBM, made its debut in the 1.2.1 version of OSCAR.
Most recently, Brian Finley of Bald Guy Software, the creator of
SystemImager, has been attending the OSCAR meetings and helping
out for free, as in free beer.

Security: OpenSSH, the most common way to enable
secure connections in a Linux environment. OpenSSH is a collection
of packages that handles secure connections, server-side SSH
services, secure key generation and the other functions needed to
support secure connections between computers.

Cluster management: for cluster-wide management
operations, OSCAR uses the Cluster Command and Control (C3)
management package developed at Oak Ridge National Lab by Stephen
Scott and Brian Luethke, an East Tennessee State University student
working at ORNL. C3 provides a “single-system illusion” so that a
single command affects the entire cluster. C3 remains installed on
the cluster nodes for later use by cluster users and
administrators.
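As a sketch of that single-system illusion, the C3 tools fan one command out to every node. The cexec and cpush commands below are part of the C3 suite; the specific arguments are hypothetical and assume a cluster already configured for C3:

```shell
# Run one command on every node in the cluster definition
cexec uptime

# Copy a file from the head node to all compute nodes
cpush /etc/hosts
```

Because each node sees the same command, routine administration takes the same effort whether the cluster has 4 nodes or 400.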

Programming environments: Message-Passing Interface
(MPI) and Parallel Virtual Machine (PVM). Most cluster users write
the software that runs on the cluster. There are many different
ways to write software for clusters. The most common approach is to
use a message-passing library. Currently, compilers or math
libraries installed by OSCAR come from the Linux distribution. Both
LAM/MPI and MPICH have been available since OSCAR 1.1.
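From the user's side, the edit-compile-run cycle looks much the same with either MPI implementation. As a sketch (the file names, process count and machine file are hypothetical; mpicc and mpirun are the compiler wrapper and launcher shipped with MPICH and LAM/MPI):

```shell
# Compile a message-passing program with the MPI compiler wrapper
mpicc -O2 -o hello hello.c

# Launch eight copies of the program across the nodes
# listed in the machines file
mpirun -np 8 -machinefile machines ./hello
```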

Workload management: Portable Batch System (PBS)
from Veridian and the Maui Scheduler (developed by the Maui High
Performance Computing Center). To time-share a cluster, some type
of workload or job management is needed. Maui acts as the job
scheduler for OSCAR, making all resource allocation and scheduling
decisions. PBS is the job server/launcher and, in addition to
launching and killing jobs, handles the job queues.
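To make that division of labor concrete, a minimal PBS job script might look like the following (the job name, resource request and program are hypothetical). PBS queues and launches the job; Maui decides when and where it runs:

```shell
#!/bin/sh
# Hypothetical job script: request 4 nodes for one hour
#PBS -N hello-job
#PBS -l nodes=4,walltime=01:00:00
#PBS -j oe

# Run from the directory the job was submitted from
cd $PBS_O_WORKDIR
mpirun -np 4 ./hello
```

The script is handed to the PBS server with qsub, and qstat shows it waiting in the queue until Maui schedules it onto free nodes.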

MSC.Software and OSCAR

MSC.Linux, a distribution developed by the Systems Division
of MSC.Software Corporation, is of special importance in the
acceptance of OSCAR. Shortly after the 1.0 version of OSCAR was
available, MSC.Software announced their own cluster solution, the
MSC.Linux Version 2001 operating system. This 2001 offering was in
large part built on OSCAR, making it the first commercial offering
based on the work of the OCG. MSC.Software's Joe Griffin added a
Webmin interface to LUI (the first OSCAR cluster installation tool)
that generated LUI command lines for multiple nodes, providing an
easy-to-use interface for defining the nodes of the cluster and
what resources to install on each. One of the original intents of the
OCG was that commercial companies would see the value in the open
OSCAR software stack and build their own proprietary or open stacks
around the OSCAR stack. In so doing, companies using OSCAR would be
freed from the mundane chores associated with building a cluster,
such as providing the basic infrastructure, and could concentrate
instead on more cutting-edge improvements to distinguish their
offering.

Working Together

Like other far-flung open-source projects, it was clear from
the beginning that doing the work of the consortium face to face
would not always be an option. The travel expense was simply too
great, and it was difficult to align so many schedules. To
coordinate the work, the group held open weekly phone conferences
and would rely on mailing lists and an occasional meeting at a
workshop or expo. There were face-to-face “integration parties”
held quarterly, one at Intel in Hillsboro, Oregon and another at
NCSA in Illinois. But for integrations held between meetings, a new
construct was developed, called DIP Day, for distributed
integration party. The intent of DIP Days was that everyone working
on the project who had a cluster would set those days aside to
work on OSCAR, jointly and remotely. Everyone would download the
OSCAR package, install it and run it, reporting any bugs to the
group. On DIP Days, programmers were expected to provide fixes in
real time, so that multiple iterations of the code could be tested
in quick succession. Several conference calls with the entire team were held
every DIP Day to assess progress and assign new work and
priorities. By loosely coordinating the group between DIPs and
face-to-face meetings, OSCAR made great strides in reliability and
function.

Comments


As Senior Executive Manager of Product Operational Testing (POT) at the Maui "High Times" Computing Center, let me say that we're like totally stoked that the OSCAR dudes are using Maui Wowee scheduler in their groovy software!

We're gonna be like helping out with their upcoming Benchmark Oscar for the Next Generation (BONG) project. Oops maybe I wasn't sposed to mention that yet, but kudos all around and oh yeah I forgot to mention that we now print all of our documentation on like organically grown hemp stock. But it mostly just gives you a headache (reading or smoking it). Bummer.

One of the projects at the Open Systems Lab (Ericsson Research) is the ARIES project,
which aims to improve the clustering capabilities of Linux to fulfill carrier-class requirements. ARIES shares some overlapping activities with the OSCAR project. However, the typical Ericsson Linux cluster supports many high-end characteristics that are not available on an OSCAR cluster.

Telecommunication systems are one of several specialized platforms that can take full advantage of clustering. These systems have some of the most stringent requirements in terms of reliability, availability and scalability: for any mission-critical server application, they must be available 99.999 percent of the time, including during hardware and software upgrades (operating system included). Among these characteristics are built-in redundancy schemes at different levels, such as redundant Ethernet connections, redundant Network File System servers, software RAID for data redundancy, special methods for booting diskless nodes, and optimized traffic distribution and
load-balancing schemes.
