I have somewhat of a general question, and I hope this is the right place to ask.

We currently have 11 servers (9 Red Hat, 2 CentOS), purchased at different times over the last 5 years, and although they all run Linux, there are hardware differences between them simply because they were bought at different times. However, we need a core set of bioinformatics software installed on all of the machines so that users have the same environment everywhere. In the past we have used a shared mounted drive, but we have run into problems when software does not work on a given machine because not all of the necessary libraries were installed, etc.

Long story short, I'm looking for tips and advice on the best approach to this problem. Do you think working on a shared drive and troubleshooting issues as they occur is the best path, or is there a better method I'm not privy to (software or theoretical)? Ultimately I feel that trying to maintain each machine independently has to be the most painful method.

3 Answers

I don't think there is "one best approach", but an oft-touted solution is to use Puppet or similar software (Chef, CFEngine).

In my experience it's quite a lot of work to set up, but useful in an environment where you have a lot of similar systems. That said, I found the solution somewhat cumbersome.
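
As a rough sketch of what the Puppet approach buys you, a single resource declaration can pin a baseline package set on every node; the package names here are just placeholders for whatever your core bioinformatics stack is:

    # Sketch: apply a one-line manifest ensuring a baseline package set is installed
    puppet apply -e 'package { ["samtools", "bwa", "bedtools"]: ensure => installed }'

In a real setup the same declaration would live in a manifest or module that the Puppet agent pulls from a central master, so every machine converges to the same package list automatically.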

Maybe "a better" way to go - and I use a hybrid solution for an organisation I work for - and in addition to using Puppet to "minimally manage" key parts of the system, I have also rolled out a PXE booting system which means its very easy to reimage systems so they all have the same "base config" without any manual labour.

Another technique in my arsenal is to have a "master machine" which can SSH into all the others and execute the same job on all of them with a single command. (We have a script which takes the entire command we want to execute as a parameter and executes it over SSH on all machines).
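
The script itself isn't shown here, but a minimal sketch of the idea, assuming passwordless SSH keys and a hosts.txt file with one hostname per line, looks something like this:

    #!/bin/bash
    # run-everywhere.sh (sketch): run the given command on every host listed in hosts.txt
    # Usage: ./run-everywhere.sh "yum -y update"
    cmd="$1"
    while read -r host; do
        echo "== $host =="
        ssh -o BatchMode=yes "$host" "$cmd"
    done < hosts.txt

Tools such as pdsh or pssh implement the same pattern with parallel execution and better error handling.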

I have to say that for me PXE is actually the minimal setup. After I have a base system set up by PXE that can run Puppet, I use that to configure everything else.
– ptman Mar 10 '13 at 21:12

Another thought - if you want each machine to appear identical, it might make sense to look at converting them to virtual machines (for example with KVM), and then using virtualised drivers so the devices are identical (rather than, for example, sda vs hda, ethX vs bondX, etc.). This does, however, impose a performance penalty (around 5% for KVM).
– davidgo Mar 11 '13 at 2:54

I used to use Cluster SSH (cssh) to handle sshing into multiple machines at once. It pops up multiple xterm windows, multiplexes your commands onto the different sessions, and lets you select individual windows to handle particular machines individually. This approach worked, but was somewhat of a pain.
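
For reference, invoking it is as simple as listing the machines (the hostnames here are placeholders):

    # Opens one xterm per host plus a small console window that broadcasts keystrokes to all of them
    cssh user@node01 user@node02 user@node03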

Since then I've started using apt-dater. This runs subprocesses that ssh to the target machines and run the appropriate update commands (it supports apt, rug, and yum). It'll also let you select new packages for installation. Despite being called apt-dater, its yum support means it should work with CentOS/RHEL systems.

To make sure the necessary libraries are present, use a build system to package the software (e.g. as an RPM). The package then contains information about its requirements (including versions) and can be installed with a simple command (yum in the case at hand), possibly via a parallel shell or a tool like Puppet or CFEngine.
For Open Source software, you can use the Open Build Service (https://build.opensuse.org/; despite the URL, it handles Red Hat and Debian targets as well). OBS itself is Open Source, so you can even run it at your own site (but the setup might be complex; I have never tried).
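
If you use the hosted instance, the day-to-day workflow goes through the osc command-line client, roughly like this (the project, package and repository names below are placeholders):

    # Sketch: fetch a package from OBS, test-build it locally against a CentOS target, then push
    osc checkout home:yourname mytool
    cd home:yourname/mytool
    osc build CentOS_6 x86_64 mytool.spec    # local chroot build against the chosen repository
    osc commit -m "update to 1.2"            # triggers server-side builds for all configured targets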

The basic principle is to write an RPM spec file that contains a general recipe for building the software. If the software ships with a decent build system (e.g. autotools), the spec file mainly consists of a list of required libraries (usually without version numbers, or only with minimum requirements) and a call to the software's build system; in the case of autotools, that is configure (with options to enable the desired functionality), make, and make install.
A list of files is also required, but that can often be kept short by using shell globs.
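
To make that concrete, here is a minimal, hypothetical spec file for an autotools-based tool; the name, version, and BuildRequires line are invented for illustration:

    # mytool.spec (sketch)
    Name:           mytool
    Version:        1.2
    Release:        1%{?dist}
    Summary:        Example bioinformatics tool
    License:        GPLv2+
    Source0:        mytool-%{version}.tar.gz
    BuildRequires:  zlib-devel

    %description
    Example package built from a generic autotools recipe.

    %prep
    %setup -q

    %build
    %configure
    make %{?_smp_mflags}

    %install
    make install DESTDIR=%{buildroot}

    %files
    %{_bindir}/*
    %{_mandir}/man1/*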

The build system will then install the required packages (say, libfoo-devel) and all of their dependencies, run the recipe, and create an RPM package from your file list. Shared libraries that the packaged software links against are detected automatically, and a corresponding requirement is inserted into the RPM. At install time, yum uses these requirements to pull in the needed libraries. A proper build system can also run a number of checks on the built packages, e.g. looking for serious compiler warnings, violations of the filesystem hierarchy, suspicious file permissions, etc.
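
Locally, the same cycle can be exercised with rpmbuild, and the automatically generated requirements can be inspected before installing (the package file name below is illustrative):

    rpmbuild -ba mytool.spec                          # build binary and source RPMs
    rpm -qp --requires mytool-1.2-1.el6.x86_64.rpm    # list the auto-detected library requirements
    yum localinstall mytool-1.2-1.el6.x86_64.rpm      # yum pulls in those libraries automatically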

Writing a spec file that builds without modification across various versions of one distribution (or closely related distributions, such as RH and Fedora) is usually very little extra work. With a little more work, even cross-distro specs (e.g. openSUSE/Fedora) can be done.