Clusters for Nothing and Nodes for Free

When the users are away, your company's legacy desktop systems can become a powerful temporary Linux cluster.

At Quantum Magnetics we do contract R&D. We often need to design silicon
chips, simulate electromagnetic systems and analyze masses of data from
field tests. When a single set of regression tests started taking longer
than a working day to perform, coauthor Alex Perry found himself wondering how to get
short-term access to a cluster. We describe here the sequence of steps
that enabled us to set up an OpenMosix cluster with little effort and
without having to purchase anything.

Each productivity increase justified putting time into the next step
of bringing up the company-wide cluster. We omit details that are
covered in the instructions and FAQs for each project (see the on-line
Resources section), partly for brevity and partly because things will
have changed by the time this article goes to print.

Choose an Application

The simplest applications to run on a cluster are command-line based
and can run as multiple instances on one computer. Applications don't
have to be written specifically for Linux, because they can run under
WINE or another portability layer. If multiple instances are not
possible, much more time has to be put into providing a virtual machine
abstraction layer. Before putting any effort into building a cluster,
it is worth checking whether your specific application is capable of
benefiting from an OpenMosix-based cluster.
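
A quick way to make that check is to start several copies of the
command at once on a single machine and confirm that they all finish
cleanly. The following is a minimal sketch in Python, not something
from our own setup: the function name run_instances is hypothetical,
and the stand-in workload is a trivial Python one-liner that you would
replace with your own tool's command line. It shows only that the
instances can coexist; it does not test whether OpenMosix can migrate
them.

```python
import subprocess
import sys


def run_instances(command, count):
    """Start `count` copies of `command` at once and return their exit codes."""
    # Launch every copy before waiting on any of them, so they overlap in time.
    procs = [subprocess.Popen(command) for _ in range(count)]
    return [proc.wait() for proc in procs]


if __name__ == "__main__":
    # Stand-in workload (an assumption): replace with the real tool's command.
    stand_in = [sys.executable, "-c", "print('instance finished')"]
    codes = run_instances(stand_in, 4)
    print("all instances succeeded:", all(code == 0 for code in codes))
```

If any instance exits with a nonzero code, or the copies interfere with
one another through shared temporary files or license locks, the
application probably needs more work before a cluster can help.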

Most of our logic code is written in Verilog partly because, as the
joke goes, we can't type fast enough to use VHDL.
Mainly, though, our reason is that a broader range of tools is available
for Verilog. We use several
closed-source place-and-route tools under Microsoft Windows, but their
run time is so short that putting them on the cluster is not worth the effort.
For simulation, we have both open- and closed-source options. It is
convenient to use the graphical tools (all closed-source, unfortunately)
that have IDE source-level debuggers when trying to track down a bug,
but these either don't like clusters or have a hefty licensing price
tag when running on a cluster. We use Icarus Verilog for
non-interactive simulations, as regression testing is more than 99%
of the total simulation workload. We like it because
multiple simulators can run in parallel; each simulator is a single Linux
process; the tool has its own public regression suite;
the developers are helpful and responsive; and the syntax parser is paranoid and accurate.

The paranoia of the syntax parser flags a lot of problems for us.
Many parsers simply select one interpretation of ambiguously
written source, leading to incorrect behaviour that is effectively a bug.
In contrast, Icarus immediately complains about ambiguities, and once
we've made the tiny rewrite it asks for, the synthesized chip starts
working the way it was intended.

The developers for Icarus, by responding rapidly to bug reports and
patches, enhance the value of the simulator in our work. We update
from CVS to benefit from those almost-immediate source changes. In
addition, it is much easier to standardize on one virtual machine (the
cluster) than to manage the versions of Icarus installed on the
individual workstations.

We run all our proprietary simulation tests
immediately before and after a new version of
Icarus is retrieved from CVS. About once a year,
the simulation results are different, so we submit
a bug report that localizes the problem to a test
case outside our proprietary work. In this way,
all our proprietary work acts as an additional
regression suite for the Icarus Project without
us having to make it available to our competitors.
It also ensures that any official release of Icarus
is useful to us.
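
The before-and-after comparison can be sketched in a few lines of
Python. This is an illustration under assumptions rather than our
actual tooling: the function name changed_tests and the sample test
names and outputs are hypothetical, and in practice the two
dictionaries would be filled from the result files of each run.

```python
def changed_tests(before, after):
    """Return the sorted names of tests whose output differs between two runs.

    `before` and `after` map a test name to the text that test produced.
    A test that appears in only one of the runs also counts as changed.
    """
    names = set(before) | set(after)
    return sorted(name for name in names if before.get(name) != after.get(name))


if __name__ == "__main__":
    # Hypothetical results captured before and after updating the simulator.
    before = {"adder": "PASS 42", "fifo": "PASS 7"}
    after = {"adder": "PASS 42", "fifo": "PASS 8"}
    print(changed_tests(before, after))
```

Any name the function returns is a starting point for reducing the
difference to a small test case that can be shared in a bug report.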

In our engineering design work, we use make, as shown in Listing 1, to automate
test execution and to manage all the Verilog source files, the reference
implementation in C, validated test data, the pool of regression tests
and all the simulation results.

Without the cluster, between six and ten hours were needed to rebuild
all the dependencies that resulted from a minor change to a source file.
Logic simulation usually is about a factor of a million slower than real
life, so the regression suite covers only about 20 milliseconds of
simulated time. The tests have to be selected carefully, because the
board can run for as long as 30 seconds per use, which would take about
a year to simulate.
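
The arithmetic behind those figures is easy to reproduce. The sketch
below assumes a slowdown of exactly one million, which the text gives
only as an approximation; the function name wall_clock_seconds is
hypothetical. With that assumption, 20 milliseconds of simulated time
costs roughly 5.6 hours of wall-clock time, and 30 seconds costs
roughly 347 days.

```python
SLOWDOWN = 1_000_000  # "about a factor of a million"; an approximation


def wall_clock_seconds(simulated_seconds, slowdown=SLOWDOWN):
    """Estimate the wall-clock time needed to simulate a span of real time."""
    return simulated_seconds * slowdown


if __name__ == "__main__":
    regression = wall_clock_seconds(0.020)  # 20 ms covered by the regression
    full_use = wall_clock_seconds(30)       # one 30-second use of the board
    print(f"20 ms simulated: {regression / 3600:.1f} hours")
    print(f"30 s simulated: {full_use / 86400:.0f} days")
```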