Taming the Beast

The right plan can mean the difference between a large-scale system administration nightmare and a good night's sleep for you and your sysadmin team.

Configuration Management

In small environments, you can maintain Linux systems successfully without a
configuration management tool. This is not the case in large
environments. If you plan on running a large number of Linux systems
efficiently, I strongly encourage you to consider a configuration management
system. There are currently two heavyweights in this area, Cfengine and
Puppet. Cfengine is a mature product that has been around for years, and it
works well. The new kid on the block is Puppet, a Ruby-based tool that is
quickly gaining popularity. Your configuration management tool should, at
a minimum, let you add or modify system or application configuration
files on a single system or on groups of machines. Some examples of files
you might want to manage are /etc/fstab, ntpd.conf, httpd.conf or
/etc/passwd. Your tool also should be able to manage symlinks and
software packages or any other node attributes that change frequently.
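
What these tools share is idempotent convergence: you describe the desired
state, and the tool changes the system only where it differs. The Python
sketch below illustrates that idea for a file and a symlink. The function
names ensure_file and ensure_symlink are hypothetical and are not part of
Cfengine or Puppet; real tools also manage ownership, permissions and
packages, which this sketch ignores.

```python
import os
import tempfile


def ensure_file(path, content):
    """Make sure `path` holds exactly `content`. Return True if changed."""
    try:
        with open(path, "r") as current:
            if current.read() == content:
                return False  # already in the desired state, do nothing
    except FileNotFoundError:
        pass  # the file is missing, so it has to be created

    # Write to a temporary file in the same directory and rename it over
    # the target, so nothing ever reads a half-written file.
    directory = os.path.dirname(os.path.abspath(path))
    fd, tmp_path = tempfile.mkstemp(dir=directory)
    with os.fdopen(fd, "w") as tmp:
        tmp.write(content)
    os.replace(tmp_path, path)
    return True


def ensure_symlink(link_path, target):
    """Make sure `link_path` is a symlink to `target`. Return True if changed.

    A regular file already sitting at `link_path` is left alone; os.symlink
    raises FileExistsError in that case.
    """
    if os.path.islink(link_path):
        if os.readlink(link_path) == target:
            return False
        os.remove(link_path)  # the link points somewhere else, replace it
    os.symlink(target, link_path)
    return True
```

Running either function a second time with the same arguments returns
False and touches nothing, which is what makes it safe to apply the same
policy repeatedly across many machines.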

Configuration Management Tools

Cfengine is the grandfather of configuration management systems. The
project started in 1993 and continues to be actively developed. Although I
personally find some aspects of Cfengine a little clunky, I've been using
it successfully for many years.

Puppet is a highly regarded Ruby-based tool that anyone evaluating a
configuration management solution should consider.

Regardless of which configuration management tool you use, it's important
to implement it early. Managing Linux configurations is something that
should be set up as the node is being installed. Retrofitting configuration
management on a node that is already in production can be a dangerous
endeavor. Imagine pushing out an incorrect fstab or password file, and you
get an idea of what can go wrong. Despite the obvious hazards of fat-fingering a configuration management tool, the benefits far outweigh the
dangers. Configuration management tools provide a highly effective way of
managing Linux systems and can reduce system administration
overhead dramatically.
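
One way to reduce the risk of pushing a bad file is to sanity-check it
before it leaves your desk. The sketch below is a hypothetical pre-push
check for fstab content, not a feature of any particular tool. It assumes
every entry should have between four and six whitespace-separated fields
and that a root filesystem entry must be present; a real check would
likely go further.

```python
def check_fstab(text):
    """Return a list of (line_number, message) problems found in fstab text.

    Line number 0 is used for problems that concern the file as a whole.
    """
    problems = []
    mount_points = set()
    for number, line in enumerate(text.splitlines(), start=1):
        stripped = line.strip()
        if not stripped or stripped.startswith("#"):
            continue  # skip blank lines and comments
        fields = stripped.split()
        # Assumption: device, mount point, type and options are required;
        # the dump and pass fields are optional.
        if not 4 <= len(fields) <= 6:
            problems.append(
                (number, "expected 4 to 6 fields, found %d" % len(fields))
            )
            continue
        mount_points.add(fields[1])
    if "/" not in mount_points:
        problems.append((0, "no entry for the root filesystem"))
    return problems
```

An empty list means the file passed these basic checks; anything else is
a reason to stop before the file reaches a production node.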

As an added bonus, configuration management systems also can be used as a
system backup mechanism of sorts. Granted, you don't want to store large
amounts of data in a tool like Cfengine, but in the event of system
failure, using a configuration management tool in conjunction with your
node installation tools should allow you to get the system into a known
good state in a minimal amount of time.

Provisioning

Provisioning is the process of installing the operating system on a machine
and performing basic system configuration. At home, you probably boot your
computer from a DVD to install the latest version of your favorite Linux
distro. Can you imagine popping a DVD in and out of a data center full of
systems? Not appealing. A more efficient approach is to install the OS
over the network, and you typically do this with a combination of PXE and
Kickstart. There are numerous tools to assist with large-scale
provisioning (Cobbler and Spacewalk are two), but you may prefer to roll
your own. Your provisioning tools should be tightly coupled to your
configuration management system. The ultimate goal is to be able to sit at
your desk, run a couple of commands, and see a hundred systems appear on
the network a few minutes later, fully configured and ready for production.
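
If you do roll your own, much of the work is generating one small boot
file per machine. The sketch below assumes a PXELINUX setup, where a
host-specific file in the pxelinux.cfg directory is named "01-" followed
by the machine's MAC address in lowercase with dashes. The kernel and
initrd names, the Kickstart URLs and the function names are all
hypothetical. It also uses the legacy ks= boot option; newer installer
releases expect inst.ks= instead.

```python
import os
import string

# Assumption: vmlinuz and initrd.img sit in the TFTP root.
PXE_TEMPLATE = """default linux
label linux
  kernel vmlinuz
  append initrd=initrd.img ks={ks_url}
"""


def pxe_filename(mac):
    """Return the per-host file name PXELINUX looks for, e.g. 01-00-1a-..."""
    octets = mac.lower().replace("-", ":").split(":")
    valid = len(octets) == 6 and all(
        len(octet) == 2 and all(ch in string.hexdigits for ch in octet)
        for octet in octets
    )
    if not valid:
        raise ValueError("not a MAC address: %r" % mac)
    return "01-" + "-".join(octets)


def render_pxe_config(ks_url):
    """Fill in the boot template with the host's Kickstart URL."""
    return PXE_TEMPLATE.format(ks_url=ks_url)


def write_pxe_configs(hosts, directory):
    """Write one config file per host. `hosts` maps MAC -> Kickstart URL."""
    written = []
    for mac, ks_url in sorted(hosts.items()):
        path = os.path.join(directory, pxe_filename(mac))
        with open(path, "w") as config:
            config.write(render_pxe_config(ks_url))
        written.append(path)
    return written
```

Feeding write_pxe_configs a dictionary built from your inventory is one
way to get from "a list of MAC addresses" to "a rack of machines
installing themselves", with the Kickstart file handing off to your
configuration management tool at the end of the install.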

Provisioning Tools

Rocks is a Linux distribution with built-in network installation
infrastructure. Rocks is great for quickly deploying large clusters of
Linux servers, though it can be difficult to use in mixed Linux distro
environments.

Cobbler, part of the Fedora Project, is a lightweight system installation
server that works well for installing physical and virtual systems.

Hardware

When it's time to purchase hardware for your new Linux super cluster,
there are many things to consider, especially when it comes to choosing
a good vendor. When selecting vendors, be sure to understand their
support offerings fully. Will they come on-site to troubleshoot issues, or do
they expect you to sit for hours on the phone pulling your hair out while they
plod through an endless series of troubleshooting scripts? In my
experience, the best, most responsive shops have been local whitebox
vendors. It doesn't matter which route you go, large corporate or whitebox
vendor, but it's important to form a solid business relationship, because
you're going to be interacting with each other on a regular basis.

Old hardware is more likely to fail than newer hardware.
In my shop, we typically purchase systems with three-year support contracts
and then retire the machines in year four. Sometimes we keep machines
around longer and simply discard a system if it experiences any type of
failure. This is particularly true in tight budget years.

Purchasing the latest, greatest hardware is always tempting, but I suggest
buying widely adopted, field-tested systems. Common hardware usually means
better Linux community support. When your network card starts flaking out,
you're more likely to find a solution to the problem if 100,000
other Linux users also have the same NIC. In recent years, I've been very
happy with the Linux compatibility and affordability of Supermicro systems.
If your budget allows, consider purchasing a system with hardware RAID and
redundant power supplies to minimize the number of after-hours pages. Spare
systems or excess hardware capacity are a must for large shops, because
regardless of the quality of the hardware, systems will fail.

