Getting Started with Condor

Cluster computing emerged in the early 1990s, when hardware prices were
dropping and PCs were becoming increasingly powerful. Companies were
shifting from large minicomputers to small, powerful microcomputers, and many people
realized that this shift would waste computing power on a large scale
as computing resources became increasingly fragmented. Organizations today have hundreds to thousands of
PCs in their offices, and many of them sit idle most of the time. The
same organizations also face huge computation-intensive problems and need substantial
computing power to remain competitive. Hence the steady demand for supercomputing
solutions, most of which are built on cluster computing concepts.

Many vendors offer commercial cluster computing solutions. By using
free and open-source software, it is possible to forgo the purchase
of these expensive commercial solutions and set up
your own cluster. This article describes one such solution, Condor,
developed at the University of Wisconsin.

The idea behind Condor is simple. Install it on every machine you
want to make part of the cluster. (In Condor terminology, a Condor cluster
is called a pool. This article uses both terms interchangeably.) You
can launch jobs from any machine, and Condor matches the requirements of the job
with the capabilities offered by the idle computers currently available.
Once it finds a suitable idle machine, it transfers the job to it,
executes it and retrieves the results of the execution. One of the
features of Condor is that it doesn't require programs to be modified
to run on the cluster.
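
To make the idea concrete, here is a minimal sketch of launching a job. It writes a submit description file and hands it to condor_submit if that command is available. The file names (hostname.sub, hostname.out and so on) are hypothetical, and the requirements line assumes the pool advertises Linux machines on Intel x86 with the attribute values shown; adjust both for your own pool.

```shell
#!/bin/sh
# Work in a scratch directory so the example leaves nothing behind in $PWD.
workdir=$(mktemp -d)

# Write a minimal submit description file. All file names are hypothetical.
cat > "$workdir/hostname.sub" <<'EOF'
# Run an unmodified program in the vanilla universe.
universe     = vanilla
executable   = /bin/hostname
# Match only Linux machines with an Intel x86 CPU (assumed attribute values).
requirements = (OpSys == "LINUX") && (Arch == "INTEL")
output       = hostname.out
error        = hostname.err
log          = hostname.log
queue
EOF

# Submit only if Condor's tools are on the PATH; otherwise just show the file.
if command -v condor_submit >/dev/null 2>&1; then
    (cd "$workdir" && condor_submit hostname.sub)
else
    cat "$workdir/hostname.sub"
fi
```

The executable here is an ordinary system program, which illustrates the point above: nothing about it was changed to run on the pool.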

In practice, however, Condor is more complicated. Condor is installed
in a different configuration depending on each machine's role. Each Condor pool has a
central manager, which detects new idle machines
and coordinates the matchmaking between job requirements and
available resources. Other machines in a Condor pool can have Submit
or Full Install configurations. Submit machines can only submit
jobs and cannot run them; Full Install machines
can both submit and execute jobs.

Requirements and Installation

Condor does not require the addition of any new hardware to the network;
the existing network itself is sufficient. Condor runs on a variety
of operating systems, including Linux, Solaris, Digital Unix, AIX, HP-UX
and Mac OS X, as well as MS Windows 2000 and XP. It supports
various architectures, including Intel x86, PowerPC and SPARC. However,
a job compiled for one specific architecture, such as Intel x86, will
run only on computers of that architecture. So, it is best if all the computers in
a Condor pool share a single architecture. Java applications are the exception,
because they can run on different architectures.
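
One quick way to check whether your machines share an architecture is to compare what uname reports on each of them. The helper name platform below is made up for this sketch, and the strings uname prints are not necessarily the names Condor itself uses for operating systems and architectures.

```shell
#!/bin/sh
# Print this machine's operating system and CPU architecture on one line.
platform() {
    printf '%s %s\n' "$(uname -s)" "$(uname -m)"
}

platform
```

Run it on every machine you plan to add to the pool; machines that print the same line can run the same compiled binaries.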

In this article, we cover the installation from the basic tarballs
on Linux, although distribution- or OS-specific packages also
may be available from the official site or other sources. (See the Condor Project site,
www.cs.wisc.edu/condor/downloads, for more details.)

Download the tarball from the Project site, and uncompress it with:

tar -zxvf condor.tar.gz

The condor_install script, located in the sbin directory, is all you need
to run to set up Condor on a machine. Before you run this script, add a
user named condor. For security reasons, Condor does not allow you to run jobs
as root; thus, it is advisable to make a new user to protect the system.
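
Before running the install script, you can check whether the condor account already exists. The following is a small sketch; the helper name user_exists is made up for illustration, and the suggested useradd invocation assumes a typical Linux system where you have root access.

```shell
#!/bin/sh
# Report whether the named account exists, using the standard id utility.
user_exists() {
    id "$1" >/dev/null 2>&1
}

if user_exists condor; then
    echo "condor user already exists"
else
    # useradd needs root; -m also creates the user's home directory.
    echo "condor user missing; as root, run: useradd -m condor"
fi
```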

One of the first questions the script asks is how many machines
you are setting up to be part of the pool. This is important if you have a
shared filesystem. If you do, the installation script prompts
you for the names of those machines and handles the installation of Condor on
them itself. If you do not have a shared filesystem,
you have to install Condor manually on each
system. Also, if you want to use Java support, you
need to have Sun's Java virtual machine installed before installing
Condor. The install script provides plenty of help and annotation on
each question it asks, and you always can turn to Condor's
comprehensive user manual and its associated mailing lists for help.

The variable $CONDOR is used from now on to denote the root path where
Condor has been installed (untarred).
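
If you want the commands in the rest of this article to work as typed, set $CONDOR in your shell first. A minimal sketch follows; the path /opt/condor is only a placeholder, so substitute the directory where you actually untarred Condor.

```shell
#!/bin/sh
# Placeholder location: replace with the directory where you untarred Condor.
CONDOR="${CONDOR:-/opt/condor}"
export CONDOR

# Add the release's bin and sbin directories to the command search path.
PATH="$CONDOR/bin:$CONDOR/sbin:$PATH"
export PATH

echo "Using Condor installation at $CONDOR"
```

Putting these lines in the condor user's shell startup file saves retyping them in every session.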

After the installation, start Condor by running:

$CONDOR/sbin/condor_master

This command should spawn all other processes that Condor requires. On the
central manager, you should be able to see five condor_ processes running after entering:

ps aux | grep condor

On the central manager machine, you should have the following processes:

condor_master

condor_collector

condor_negotiator

condor_startd

condor_schedd

All other machines in the pool should have processes for the following:

condor_master

condor_startd

condor_schedd

And, on submit-only machines you will see:

condor_master

condor_schedd
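
Checking these lists by eye gets tedious on a large pool, so here is a small sketch that automates the comparison. The role labels (central, full and submit) and the function names are invented for this example; the daemon lists are the ones given above.

```shell
#!/bin/sh
# Print the daemons expected for a machine role.
# The role labels central, full and submit are made up for this sketch.
expected_daemons() {
    case "$1" in
        central) echo "condor_master condor_collector condor_negotiator condor_startd condor_schedd" ;;
        full)    echo "condor_master condor_startd condor_schedd" ;;
        submit)  echo "condor_master condor_schedd" ;;
        *)       echo "unknown role: $1" >&2; return 1 ;;
    esac
}

# Read a process listing on stdin and print each expected daemon that is absent.
missing_daemons() {
    daemons=$(expected_daemons "$1") || return 1
    listing=$(cat)
    for daemon in $daemons; do
        printf '%s\n' "$listing" | grep -q -- "$daemon" || echo "$daemon"
    done
}
```

To use it, pipe a process listing into the function, for example ps -e -o args= | missing_daemons central. No output means that every expected daemon appeared in the listing.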

After that, you should be able to see the central manager machine as part
of your Condor cluster when you run condor_status:

Note that the source code is not available for download from the Web site. Condor is still open source, however: if you request the source from the project and have a good reason, such as extending it, the request will not be denied.