If you're a modern sysadmin, you've probably been sipping at the devops
koolaid and trying out one or more of the current system configuration
management tools like puppet or chef.

These tools are awesome - particularly for large-scale deployments of
identical nodes.

In practice, though, things get messier in the enterprise. You can
have legacy nodes that can't be puppetised due to their sensitivity and
importance; nodes that are sufficiently unusual that the payoff of
putting them under configuration management doesn't justify the work;
or just systems that you don't have full control over.

We've been using a simple tool called extract in these kinds of
environments, which pulls a given set of files from remote hosts and
stores them under version control in a set of local per-host trees.

You can think of it as the yang to puppet or chef's yin - instead of
pushing configs onto remote nodes, it's about pulling configs off
nodes, and storing them for tracking and change control.
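
To make the pull model concrete, here is a minimal sketch of fetching a
list of files from one host into a per-host tree. It is not extract's
actual code: the function name pull_host, its arguments, and the
REMOTE_SH variable are all made up for illustration, and it assumes ssh
access to the host and GNU tar on both ends.

```shell
# Hypothetical sketch only; this is not extract's actual code.
# pull_host HOST TREE FILELIST
#   HOST      remote host name
#   TREE      local directory holding one subdirectory per host
#   FILELIST  local file containing one absolute path per line
# REMOTE_SH (an assumed name) is the command used to reach the host;
# it defaults to ssh.
pull_host() {
    mkdir -p "$2/$1" || return 1
    # Strip the leading "/" so the remote tar archives paths relative
    # to "/", then unpack the stream under TREE/HOST/.
    # Assumes GNU tar, which reads the file list from stdin with "-T -".
    sed 's|^/||' "$3" |
        ${REMOTE_SH:-ssh} "$1" tar -cf - -C / -T - |
        tar -xf - -C "$2/$1"
}
```

Called as, say, `pull_host web01 ./trees ./web01.files`, this would leave
a copy of each listed file under ./trees/web01/ with its original path
preserved, ready to be checked into whatever version control you use.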

We've been primarily using it in a Red Hat/CentOS environment, so we
use it in conjunction with rpm-find-changes, which identifies all the
config files under /etc that have been changed from their deployment
versions, or are custom files not belonging to a package.
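
I won't reproduce rpm-find-changes here, but as a rough approximation of
the idea using only stock rpm commands, something like the following
could build a similar list. The function names are hypothetical, and the
awk filter assumes the usual `rpm -Va` output layout (a flags field, an
optional `c` marker for config files, then the path) and paths without
spaces.

```shell
# Rough approximation only; not how rpm-find-changes is implemented.

# Print packaged config files under /etc whose contents differ from the
# packaged version (a "5" in the rpm -Va flags means the digest changed).
changed_configs() {
    rpm -Va 2>/dev/null |
        awk '$2 == "c" && $1 ~ /5/ && $3 ~ /^\/etc\// { print $3 }'
}

# Print regular files under DIR (default /etc) that no package owns.
# "rpm -qf" exits non-zero when a file does not belong to any package.
unowned_files() {
    find "${1:-/etc}" -type f | while IFS= read -r f; do
        rpm -qf "$f" >/dev/null 2>&1 || printf '%s\n' "$f"
    done
}
```

Concatenating the output of the two functions gives a candidate file
list to feed to the extraction step.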

Extract doesn't care where its list of files to extract comes from, so
it should be easily customised for other environments.

It uses a simple extract.conf shell-variable-style config file,
like this:

Extract also allows arbitrary scripts to be called at the beginning
(setup) and end (teardown) of a run, and before and/or after each host.
It ships with some example shell scripts for loading ssh keys and for
checking extracted changes into git or bzr. These hooks are also
configured in extract.conf, e.g.:

# Pre-process scripts
# PRE_EXTRACT_SETUP - run once only, before any extracts are done
PRE_EXTRACT_SETUP=pre_extract_load_ssh_keys
# PRE_EXTRACT_HOST - run before each host extraction
#PRE_EXTRACT_HOST=pre_extract_noop
# Post-process scripts
# POST_EXTRACT_HOST - run after each host extraction
POST_EXTRACT_HOST=post_extract_git
# POST_EXTRACT_TEARDOWN - run once only, after all extracts are completed
#POST_EXTRACT_TEARDOWN=post_extract_touch
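
The order in which these four hooks fire can be sketched as a small
driver loop. This is a guess at the shape, not extract's real internals:
run_hook, extract_all, and do_extract are hypothetical names, and passing
the host name to the per-host hooks is an assumption.

```shell
# Hypothetical driver loop; extract's real internals may differ.

# Run a hook command with any extra arguments, or do nothing if the
# hook is unset or empty (i.e. commented out in the config).
run_hook() {
    [ -n "$1" ] || return 0
    "$@"
}

# extract_all HOST...
# do_extract (an assumed name) stands in for the per-host file pull.
extract_all() {
    # Once, before any host is touched; abort the run if setup fails.
    run_hook "${PRE_EXTRACT_SETUP:-}" || return 1
    for host in "$@"; do
        # Assumption: a failing pre-host hook skips that host.
        run_hook "${PRE_EXTRACT_HOST:-}" "$host" || continue
        # Only run the post-host hook if the extraction succeeded.
        do_extract "$host" && run_hook "${POST_EXTRACT_HOST:-}" "$host"
    done
    # Once, after every host has been processed.
    run_hook "${POST_EXTRACT_TEARDOWN:-}"
}
```

With the example config above, a run over two hosts would load the ssh
keys once and then, for each host in turn, extract its files and invoke
the git hook.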

Extract is available on GitHub, and packages for RHEL/CentOS 5 and 6
are available from my repository.