incron
is a useful little cron-like utility that lets you run arbitrary jobs
(like cron), but instead of being triggered at certain times, your
jobs are triggered by changes to files or directories.

It uses the linux kernel inotify
facility (hence the name), and so it isn't cross-platform, but on linux
it can be really useful for monitoring file changes or uploads, reporting
or forwarding based on status files, simple synchronisation schemes, etc.

Again like cron, incron supports the notion of job 'tables' where
commands are configured, and users can manage their own tables
using an incrontab command, while root can manage multiple system
tables.

So it's a really useful linux utility, but it's also fairly old (the
last release, v0.5.10, is from 2012), doesn't appear to be under
active development any more, and it has a few frustrating quirks that
can make using it unnecessarily difficult.

So this post is intended to highlight a few of the 'gotchas' I've
experienced using incron:

You can't monitor recursively i.e. if you create a watch on a
directory incron will only be triggered on events in that
directory itself, not in any subdirectories below it. This isn't
really an incron issue since it's a limitation of the underlying
inotify mechanism, but it's definitely something you'll want
to be aware of going in.

The incron interface is enough like cron (incrontab -l,
incrontab -e, man 5 incrontab, etc.) that you might think
that all your nice crontab features are available. Unfortunately
that's not the case - most significantly, you can't have comments
in incron tables (incron will try and parse your comment lines and
fail), and you can't set environment variables to be available for
your commands.

That means that cron's MAILTO support is not available, and in
general there's no easy way of getting access to the stdout or
stderr of your jobs. You can't even use shell redirects in your
command to capture the output (e.g. echo $@/$# >> /tmp/incron.log
doesn't work). If you're debugging, the best you can do is add a
layer of indirection by using a wrapper script that does the
redirection you need (e.g. echo $1 >> /tmp/incron.log 2>&1)
and calling the wrapper script in your incrontab with the incron
arguments (e.g. debug.sh $@/$#). This all makes debugging
misbehaving commands pretty painful. The main place to check if
your commands are running is the cron log (/var/log/cron) on
RHEL/CentOS, and syslog (/var/log/syslog) on Ubuntu/Debian.

incron is also very picky about whitespace in your incrontab.
If you put more than one space (or a tab) between the inotify
masks and your command, you'll get an error in your cron log
saying cannot exec process: No such file or directory, because
incron will have included everything after the first space as part
of your command e.g. (gavin) CMD ( echo /home/gavin/tmp/foo)
(note the evil space before the echo).

It's often difficult (and non-intuitive) to figure out what inotify
events you want to trigger on in your incrontab masks. For instance,
does 'IN_CREATE' get fired when you replace an existing file with a
new version? What events are fired when you do a mv or a cp?
If you're wanting to trigger on an incoming remote file copy, should
you use 'IN_CREATE' or 'IN_CLOSE_WRITE'? In general, you don't want to guess,
you actually want to test and see what events actually get fired on
the operations you're interested in. The easiest way to do this is
use inotifywait from the inotify-tools package, and run it using
inotifywait -m <dir>, which will report to you all the inotify
events that get triggered on that directory (hit <Ctrl-C> to exit).
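For example (directory and file names here are arbitrary):

# in one terminal, watch the directory you care about
inotifywait -m /tmp/watched
# in another, try the operations you're interested in and compare
# the events reported for each
cp somefile /tmp/watched/
mv somefile2 /tmp/watched/
rsync -av somefile3 /tmp/watched/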

The "If you're wanting to trigger on an incoming remote file copy,
should you use 'IN_CREATE' or 'IN_CLOSE_WRITE'?" above was a trick
question - it turns out it depends on how you're doing the copy! If
you're just doing a simple copy in-place (e.g. with scp), then
(assuming you want the completed file) you're going to want to trigger
on 'IN_CLOSE_WRITE', since that's signalling all writing is complete and
the full file will be available. If you're using a vanilla rsync,
though, that's not going to work, as rsync does a clever
write-to-a-hidden-file trick, and then moves the hidden file to
the destination name atomically. So in that case you're going to want
to trigger on 'IN_MOVED_TO', which will give you the destination
filename once the rsync is completed. So again, make sure you test
thoroughly before you deploy.
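As a sketch, the corresponding incrontab entries might look like this (the
handler script is hypothetical):

# direct writes (scp, local cp): fire once the file is fully written
/data/incoming IN_CLOSE_WRITE /usr/local/bin/process.sh $@/$#
# rsync: the temp file is renamed into place, so watch for moves instead
/data/incoming IN_MOVED_TO /usr/local/bin/process.sh $@/$#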

First, the WiredTiger storage engine (the default since mongodb 3.2)
"strongly" recommends using the xfs filesystem on linux, rather than
ext4 (see https://docs.mongodb.com/manual/administration/production-notes/#prod-notes-linux-file-system
for details). So the first thing to do is reorganise your disk to make
sure you have an xfs filesystem available to hold your upgraded database.
If you have the disk space, this may be reasonably straightforward; if
you don't, it's a serious PITA.
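If you do have a spare partition (or LVM volume) available, the filesystem side
is straightforward - roughly something like this (device and mountpoint here
are assumptions, adjust to suit):

mkfs.xfs /dev/sdc1
mkdir -p /var/lib/mongo-xfs
mount /dev/sdc1 /var/lib/mongo-xfs
# add a matching /etc/fstab entry, and point mongod's dbPath
# (storage.dbPath in /etc/mongod.conf) at the new filesystem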

These are the basics anyway. This doesn't cover configuring access control on your
new database, or wrangling SELinux permissions on your database directory, but if
you're using those currently you should be able to figure them out.

support for (secure) downloads, ideally via a browser (no special software required)

support for (secure) uploads, ideally via sftp (most of our customers are familiar with ftp)

Our target was RHEL/CentOS 7, but this should transfer to other linuxes pretty
readily.

Here's the schema we ended up settling on, which seems to give us a good mix of
security and flexibility.

use apache with HTTPS and PAM with local accounts, one per customer, and nologin
shell accounts

users have their own groups (group=$USER), and also belong to the sftp group

we use the users group for internal company accounts, but NOT for customers

customer data directories live in /data

we use a 3-layer hierarchy for security: /data/chroot_$USER/$USER (the
customer accounts themselves are created with a nologin shell, as above)

the /data/chroot_$USER directory must be owned by root:$USER, with
permissions 750, and is used for an sftp chroot directory (not writeable
by the user)

the next-level /data/chroot_$USER/$USER directory should be owned by $USER:users,
with permissions 2770 (where users is our internal company user group, so both
the customer and our internal users can write here)

we also add an ACL to /data/chroot_$USER to allow the company-internal users
group read/search access (but not write)
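Putting that together, setting up a new customer looks roughly like this (a
sketch only - the account name and exact useradd flags are assumptions):

CUSTOMER=acme
# nologin account with its own group, also a member of the sftp group
useradd -U -G sftp -s /sbin/nologin -M -d /$CUSTOMER $CUSTOMER
# chroot directory: owned by root, not writeable by the customer
mkdir -p /data/chroot_$CUSTOMER/$CUSTOMER
chown root:$CUSTOMER /data/chroot_$CUSTOMER
chmod 750 /data/chroot_$CUSTOMER
# writeable data directory, shared with our internal 'users' group
chown $CUSTOMER:users /data/chroot_$CUSTOMER/$CUSTOMER
chmod 2770 /data/chroot_$CUSTOMER/$CUSTOMER
# give the internal 'users' group read/search access to the chroot as well
setfacl -m g:users:rx /data/chroot_$CUSTOMER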

We just use openssh internal-sftp to provide sftp access, with the following config:
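Roughly, the relevant sshd_config stanza looks something like this (a sketch -
adjust the group name and paths to match the scheme above):

Subsystem sftp internal-sftp

Match Group sftp
    ChrootDirectory /data/chroot_%u
    ForceCommand internal-sftp -d /%u
    AllowTcpForwarding no
    X11Forwarding no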

So we chroot sftp connections to /data/chroot_$USER and then (via the ForceCommand)
chdir to /data/chroot_$USER/$USER, so they start off in the writeable part of their
tree. (If they bother to pwd, they see that they're in /$USER, and they can chdir
up a level, but there's nothing else there except their $USER directory, and they
can't write to the chroot.)

Ran out of space on an old CentOS 6.8 server in the weekend, and so had
to upgrade the main data mirror from a pair of Hitachi 2TB HDDs to a pair
of 4TB WD Reds I had lying around.

The volume was using mdadm, aka Linux Software RAID, and is a simple mirror
(RAID1), with LVM volumes on top of the mirror. The safest upgrade path is
to build a new mirror on the new disks and sync the data across, but there
weren't any free SATA ports on the motherboard, so instead I opted to do an
in-place upgrade. I haven't done this for a good few years, and hit a couple
of wrinkles on the way, so here are the notes from the journey.

Below, the physical disk partitions are /dev/sdb1 and /dev/sdd1, the
mirror is /dev/md1, and the LVM volume group is extra.

1. Backup your data (or check you have known good rock-solid backups in
place), because this is a complex process with plenty that could go wrong.
You want an exit strategy.

4. Since these are brand new disks, we need to partition them. And since
these are 4TB disks, we need to use parted rather than the older fdisk.

parted /dev/sdb
print
mklabel gpt
# Create a partition, skipping the 1st MB at beginning and end
mkpart primary 1 -1
unit s
print
# Not sure if this flag is required, but whatever
set 1 raid on
quit

5. Then add the new partition back into the mirror. Although this is much
bigger, it will just sync up at the old size, which is what we want for now.

mdadm --manage /dev/md1 --add /dev/sdb1
# This will take a few hours to resync, so let's keep an eye on progress
watch -n5 cat /proc/mdstat

6. Once all resynched, rinse and repeat with the other disk - fail and remove
/dev/sdd1, shutdown and swap the new disk in, boot up again, partition the new
disk, and add the new partition into the mirror.
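In other words, roughly:

mdadm --manage /dev/md1 --fail /dev/sdd1
mdadm --manage /dev/md1 --remove /dev/sdd1
# shut down, physically swap in the new disk, boot, partition as in step 4, then:
mdadm --manage /dev/md1 --add /dev/sdd1
watch -n5 cat /proc/mdstat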

7. Once all resynched again, you'll be back where you started - a nice stable
mirror of your original size, but with shiny new hardware underneath. Now we
can grow the mirror to take advantage of all this new space we've got.

mdadm --grow /dev/md1 --size=max
mdadm: component size of /dev/md1 has been set to 2147479552K

Since I got bitten by this recently, let me blog a quick warning here:
glibc iconv - a utility for character set conversions, like iso8859-1 or
windows-1252 to utf-8 - has a nasty misfeature/bug: if you give it data on
stdin it will slurp the entire file into memory before it does a single
character conversion.

Which is fine if you're running small input files. If you're trying to
convert a 10G file on a VPS with 2G of RAM, however ... not so good!

This looks to be a
known issue, with
patches submitted to fix it in August 2015, but I'm not sure if they've
been merged, or into which version of glibc. Certainly RHEL/CentOS 7 (with
glibc 2.17) and Ubuntu 14.04 (with glibc 2.19) are both affected.

Once you know about the issue, it's easy enough to workaround - there's an
iconv-chunks wrapper on github that
breaks the input into chunks before feeding it to iconv, or you can do much
the same thing using the lovely GNU parallel
e.g.
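(a sketch - the block size, character sets, and filenames here are just examples)

parallel --pipe -k --block 10M iconv -f windows-1252 -t utf-8 < bigfile.txt > bigfile-utf8.txt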

Those files capture the ACPI events and handle them via a custom script in
/etc/acpi/actions/volume.sh, which uses amixer from alsa-utils. Volume
control worked just fine, but muting was a real pain to get working correctly
due to what seems like a bug in amixer - amixer -c1 sset Master playback toggle
doesn't toggle correctly - it mutes fine, but then doesn't unmute all
the channels it mutes!

I worked around it by figuring out the specific channels that sset Master
was muting, and then handling them individually, but it's definitely not as clean:
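Something like this, as a sketch (the exact channel names are assumptions -
check amixer -c1 scontrols on your own hardware):

amixer -c1 sset Master playback toggle
amixer -c1 sset Headphone playback toggle
amixer -c1 sset Speaker playback toggle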

So in short, really pleased with the X250 so far - the screen is lovely, battery
life seems great, I'm enjoying the keyboard, and most things have Just
Worked or have been pretty easily configurable with CentOS. Happy camper!

I wrote a really simple personal URL shortener a couple of years ago, and
have been using it happily ever since. It's called shrtn
("shorten"), and is just a simple perl script that captures (or generates) a
mapping between a URL and a code, records in a simple text db, and then generates
a static html file that uses HTML meta-redirects to point your browser towards
the URL.

It was originally based on posts from
Dave Winer
and Phil Windley,
but was interesting enough that I felt the itch to implement my own.

I just run it on my laptop (shrtn <url> [<code>]), and it has settings to
commit the mapping to git and push it out to a remote repo (for backup),
and to push the generated html files up to a webserver somewhere (for
serving the html).

Most people seem to like the analytics side of personal URL shorteners
(seeing who clicks your links), but I don't really track that side of it
at all (it would be easy enough to add Google Analytics to your html
files to do that, or just do some analysis on the access logs). I
mostly wanted it initially to post nice short links when microblogging,
where post length is an issue.

Surprisingly though, the most interesting use case in practice is the
ability to give custom mnemonic codes to URLs I use reasonably often, or
cite to other people a bit. If I find myself sharing a URL with more
than a couple of people, it's easier just to create a shortened version and
use that instead - it's simpler, easier to type, and easier to remember for
next time.

So my shortener has sort of become a cross between a Level 1 URL cache
and a poor man's bookmarking service. For instance:

If you don't have a personal url shortener you should give it a try - it's
a surprisingly interesting addition to one's personal cloud. And all you
need to try it out is a domain and some static webspace somewhere to host
your html files.

Too easy.

[ Technical Note: html-based meta-redirects work just fine with browsers,
including mobile and text-only ones. They don't work with most spiders and
bots, however, which may be a bug or a feature, depending on your usage. For a
personal url shortener meta-redirects probably work just fine, and you gain
all the performance and stability advantages of static html over dynamic
content. For a corporate url shortener where you want bots to be able to
follow your links, as well as people, you probably want to use http-level
redirects instead. In which case you either go with a hosted option, or look
at something like YOURLS for a slightly more heavyweight
self-hosted option. ]

Just picked up a shiny new Fujitsu ScanSnap 1300i ADF scanner to get
more serious about less paper.

I chose the 1300i on the basis of the nice small form factor, and that SANE
reports
it as having 'good' support with current SANE backends. I'd also been able
to find success stories of other linux users getting the similar S1300
working okay:

I plugged the S1300i in (via the dual USB cables instead of the power
supply - nice!), turned it on (by opening the top cover) and then ran
sudo sane-find-scanner. All good:

found USB scanner (vendor=0x04c5 [FUJITSU], product=0x128d [ScanSnap S1300i]) at libusb:001:013
# Your USB scanner was (probably) detected. It may or may not be supported by
# SANE. Try scanimage -L and read the backend's manpage.

Ran sudo scanimage -L - no scanner found.

I downloaded the S1300 firmware Luuk had provided in his post and
installed it into /usr/share/sane/epjitsu, and then updated
/etc/sane.d/epjitsu.conf to reference it:
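Roughly like this (the firmware filename just needs to match whatever file you
installed - the one here is an assumption):

# /etc/sane.d/epjitsu.conf
# Fujitsu ScanSnap S1300i
firmware /usr/share/sane/epjitsu/1300_0C26.nal
usb 0x04c5 0x128d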

And so far gscan2pdf 1.2.5 seems to work pretty nicely. It handles both
simplex and duplex scans, and both the cleanup phase (using unpaper)
and the OCR phase (with either gocr or tesseract) work without
problems. tesseract seems to perform markedly better than gocr so
far, as seems pretty typical.

So thus far I'm a pretty happy purchaser. On to a paperless
searchable future!

You'd think that 20 years into the Web we'd have billing all sorted out.
(I've got in view here primarily bill/invoice delivery, rather than
payments, and consumer-focussed billing, rather than B2B invoicing).

We don't. Our bills are probably as likely to still come on paper as in
digital versions, and the current "e-billing" options all come with
significant limitations (at least here in Australia - I'd love to hear
about awesome implementations elsewhere!).

Here, for example, are a representative set of my current vendors, and
their billing delivery options (I'm not picking on anyone here, just
grounding the discussion in some specific examples).

So that all looks pretty reasonable, you might say. All your vendors have
some kind of e-billing option. What's the problem?

The current e-billing options

Here's how I'd rate the various options available:

email: email is IMO the best current option for bill delivery - it's
decentralised, lightweight, push-rather-than-pull, and relatively easy
to integrate/automate. Unfortunately, not everyone offers it, and sometimes
(e.g. Citibank) they insist on putting passwords on the documents they send
out via email on the grounds of 'security'. (On the other hand, emails
are notoriously easy to fake, so faking a bill email is a straightforward
attack vector if you can figure out customer-vendor relationships.)

(Note too that most of the non-email e-billing options still use email
for sending alerts about a new bill, they just don't also send the bill
through as an attachment.)

web (i.e. a company portal of some kind which you log into and can
then download your bill): this is efficient for the vendor, but pretty
inefficient for the customer - it requires going to the particular
website, logging in, and navigating to the correct location before you
can view or download your bill. So it's an inefficient, pull-based
solution, requiring yet another username/password, and with few
integration/automation options (and security issues if you try).

BillPayView
/ Australia Post Digital Mailbox:
for non-Australians, these are free (for consumers) solutions for
storing and paying bills offered by a consortium of banks
(BillPayView) and Australia Post (Digital Mailbox) respectively.
These provide a pretty decent user experience in that your bills are
centralised, and they can often parse the bill payment options and
make the payment process easy and less error-prone. On the other
hand, centralisation is a two-edged sword, as it makes it harder to
change providers (can you get your data out of these providers?);
it narrows your choices in terms of bill payment (or at least makes
certain kinds of payment options easier than others); and it's
basically still a web-based solution, requiring login and navigation,
and very difficult to automate or integrate elsewhere. I'm also
suspicious of 'free' services from corporates - clearly there is value
in driving you through their preferred payment solutions and/or in the
transaction data itself, or they wouldn't be offering it to you.

Also, why are there limited providers at all? There should be a
standard in place so that vendors don't have to integrate separately
with each provider, and so that customers have maximum choice in whom
they wish to deal with. Wins all-round.

And then there's the issue of formats. I'm not aware of any Australian
vendors that bill customers in any format except PDF - are there any?

PDFs are reasonable for human consumption, but billing should really be
done (instead of, or as well as) in a format meant for computer consumption,
so they can be parsed and processed reliably. This presumably means billing
in a standardised XML or JSON format of some kind (XBRL?).

How billing should work

Here's a strawman workflow for how I think billing should work:

the customer's profile with the vendor includes a billing delivery
URL, which is a vendor-specific location supplied by the customer to
which their bills are to be HTTP POST-ed. It should be an HTTPS URL to
secure the content during transmission, and the URL should be treated
by the vendor as sensitive, since its possession would allow someone
to post fake invoices to the customer

if the vendor supports more than one bill/invoice format, the customer
should be able to select the format they'd like

the vendor posts invoices to the customer's URL and gets back a URL
referencing the customer's record of that invoice. (The vendor might,
for instance, be able to query that record for status information, or
they might supply a webhook of their own to have status updates on the
invoice pushed back to them.)

the customer's billing system should check that the posted invoice has
the correct customer details (at least, for instance, the vendor/customer
account number), and ideally should also check the bill payment methods
against an authoritative set maintained by the vendor (this provides
protection against someone injecting a fake invoice into the system with
bogus bill payment details)

the customer's billing system is then responsible for facilitating the
bill payment manually or automatically at or before the due date, using
the customer's preferred payment method. This might involve billing
calendar feeds, global or per-vendor preferred payment methods, automatic
checks on invoice size against vendor history, etc.
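As a sketch of the vendor side of that flow (the URL, payload fields, and
format here are all hypothetical):

curl -X POST https://billing.example.com/inbox/0aGdk29s \
     -H 'Content-Type: application/json' \
     -d '{"vendor":"acme-power","account":"12345678","amount":"104.50","due_date":"2015-07-01"}'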

This kind of solution would give the customer full control over their
billing data, the ability to choose a billing provider that's separate from
(and more agile than) their vendors and banks, as well as significant
flexibility to integrate and automate further. It should also be pretty
straightforward on the vendor side - it just requires a standard HTTP POST
and provides immediate feedback to the vendor on success or failure.

I'm now working on a big data web mining startup,
and spending an inordinate amount of time buried in large data files, often
some variant of CSV.

My favourite new tool over the last few months is Karlheinz Zoechling's
App::CCSV
perl module, which lets you do some really powerful CSV processing using
perl one-liners, instead of having to write a trivial/throwaway script.

If you're familiar with perl's standard autosplit functionality (perl -a)
then App::CCSV will look pretty similar - it autosplits its input into an
array on your CSV delimiters for further processing. It handles
embedded delimiters and CSV quoting conventions correctly, though, which
perl's standard autosplitting doesn't.

App::CCSV uses @f to hold the autosplit fields, and provides utility
functions csay and cprint for doing say and print on the CSV-joins
of your array. So for example:

# Print just the first 3 fields of your file
perl -MApp::CCSV -ne 'csay @f[0..2]' < file.csv
# Print only lines where the second field is 'Y' or 'T'
perl -MApp::CCSV -ne 'csay @f if $f[1] =~ /^[YT]$/' < file.csv
# Print the CSV header and all lines where field 3 is negative
perl -MApp::CCSV -ne 'csay @f if $. == 1 || ($f[2]||0) < 0' < file.csv
# Insert a new country code field after the first field
perl -MApp::CCSV -ne '$cc = get_country_code($f[0]); csay $f[0],$cc,@f[1..$#f]' < file.csv

App::CCSV can use a config file to handle different kinds of CSV input.
Here's what I'm using, which lives in my home directory in ~/.CCSVConf:

That just defines two sets of names for different kinds of input: comma,
tabs, and pipe for [,\t|] delimiters with standard CSV quote conventions;
and three nq ("no-quote") variants - commanq, tabsnq, and pipenq - to
handle inputs that aren't using standard CSV quoting. It also makes the comma
behaviour the default.

You use one of the names by specifying it when loading the module, after an =:
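e.g. to use the tabs definition on tab-separated input (the filename is just
an example):

perl -MApp::CCSV=tabs -ne 'csay @f[0..2]' < file.tsv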

If you're a modern sysadmin you've probably been sipping at the devops
koolaid and trying out one or more of the current system configuration
management tools like puppet or chef.

These tools are awesome - particularly for homogeneous large-scale
deployments of identical nodes.

In practice in the enterprise, though, things get more messy. You can
have legacy nodes that can't be puppetised due to their sensitivity and
importance; or nodes that are sufficiently unusual that the payoff of
putting them under configuration management doesn't justify the work;
or just systems which you don't have full control over.

We've been using a simple tool called extract in these kinds of
environments, which pulls a given set of files from remote hosts and
stores them under version control in a set of local per-host trees.

You can think of it as the yang to puppet or chef's yin - instead of
pushing configs onto remote nodes, it's about pulling configs off
nodes, and storing them for tracking and change control.

We've been primarily using it in a RedHat/CentOS environment, so we
use it in conjunction with
rpm-find-changes,
which identifies all the config files under /etc that have been
changed from their deployment versions, or are custom files not
belonging to a package.

Extract doesn't care where its list of files to extract comes from, so
it should be easily customised for other environments.

It uses a simple extract.conf shell-variable-style config file,
like this:

Extract also allows arbitrary scripts to be called at the beginning
(setup) and end (teardown) of a run, and before and/or after each host.
Extract ships with some example shell scripts for loading ssh keys, and
checking extracted changes into git or bzr. These hooks are also
configured in the extract.conf config e.g.:

# Pre-process scripts
# PRE_EXTRACT_SETUP - run once only, before any extracts are done
PRE_EXTRACT_SETUP=pre_extract_load_ssh_keys
# PRE_EXTRACT_HOST - run before each host extraction
#PRE_EXTRACT_HOST=pre_extract_noop
# Post process scripts
# POST_EXTRACT_HOST - run after each host extraction
POST_EXTRACT_HOST=post_extract_git
# POST_EXTRACT_TEARDOWN - run once only, after all extracts are completed
#POST_EXTRACT_TEARDOWN=post_extract_touch

Extract is available on github, and
packages for RHEL/CentOS 5 and 6 are available from
my repository.

Ok, this has bitten me enough times now that I'm going to blog it so I
don't forget it again.

Symptom: you're doing a yum update on a centos5 or rhel5 box, using rpms
from a repository on a centos6 or rhel6 server (or anywhere else with
a more modern createrepo available), and you get errors like this:

What this really means is that yum is too stupid to calculate the sha256
checksum correctly (and also too stupid to give you a sensible error
message like "Sorry, primary.sqlite.bz2 is using a sha256 checksum,
but I don't know how to calculate that").

The fix is simple:

yum install python-hashlib

from either rpmforge or epel, which makes the necessary libraries
available for yum to calculate the new checksums correctly. Sorted.

I've been enjoying playing around with ZeroMQ
lately, and exploring some of the ways it changes the way you approach
system architecture.

One of the revelations for me has been how powerful the pub-sub (Publish-
Subscribe) pattern is. An architecture that makes it straightforward for
multiple consumers to process a given piece of data promotes lots of
small simple consumers, each performing a single task, instead of a
complex monolithic processor.

This is both simpler and more complex, since you end up with more
pieces, but each piece is radically simpler. It's also more flexible and
more scalable, since you can move components around individually, and it
allows greater language and library flexibility, since you can write
individual components in completely different languages.

What's also interesting is that the benefits of this pattern don't
necessarily require an advanced toolkit like ZeroMQ, particularly for
low-volume applications. Here's a sketch of a low-tech pub-sub pattern
that uses files as the pub-sub inflection point, and
incron, the 'inotify cron' daemon, as our
dispatcher.

Recipe:

Install incron, the inotify cron daemon, to monitor our data directory
for changes. On RHEL/CentOS this is available from the rpmforge or EPEL
repositories: yum install incron.
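A sketch of the glue (paths and script names here are hypothetical). The
incrontab entry - since incron can be picky about the same path appearing more
than once in a table, a single entry fans out to the consumers:

/data/feed IN_CLOSE_WRITE /usr/local/bin/publish.sh $@/$#

and the publish script itself:

#!/bin/sh
# /usr/local/bin/publish.sh - hand each new file to every consumer;
# consumers stay small, independent, and easy to add or remove
for consumer in /usr/local/lib/consumers/*; do
    "$consumer" "$1" &
done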

I'm a big fan of Coraid and their relatively
low-cost storage units.
I've been using them for 5+ years now, and they've always been pretty
well engineered, reliable, and performant.

They talk ATA-over-Ethernet (AoE),
which is a very simple non-routable protocol for transmitting ATA
commands directly via Ethernet frames, without the overhead of higher
level layers like IP and TCP. So it's a lighter protocol than
something like iSCSI, and theoretically offers higher performance.

One issue with them on linux is that the in-kernel 'aoe' driver is
typically pretty old. Coraid's
latest aoe driver is version
78, for instance, while the RHEL6 kernel (2.6.32) comes with aoe v47,
and the RHEL5 kernel (2.6.18) comes with aoe v22. So updating to the
latest version is highly recommended, but also a bit of a pain, because
if you do it manually it has to be recompiled for each new kernel
update.

The modern way to handle this is to use a
kernel-ABI tracking kmod, which gives you
a driver that will work across multiple kernel updates for a given EL
generation, without having to recompile each time.

So I've created a kmod-aoe package that seems to work nicely here. It's
downloadable below, or you can install it from my
yum repository.
The kmod depends on the 'aoetools' package, which supplies the command
line utilities for managing your AoE devices.
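Installation and a quick sanity check look roughly like this (assuming the yum
repository above is configured):

yum install kmod-aoe aoetools
modprobe aoe
aoe-discover
aoe-stat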

Be aware that there are multiple ldap configuration files involved now.
All of the following end up with ldap config entries in them and need to
be checked:

/etc/openldap/ldap.conf

/etc/pam_ldap.conf

/etc/nslcd.conf

/etc/sssd/sssd.conf

Note too that /etc/openldap/ldap.conf uses uppercased directives (e.g. URI)
that get lowercased in the other files (URI -> uri). Additionally, some
directives are confusingly renamed as well - e.g. TLS_CACERT in
/etc/openldap/ldap.conf becomes tls_cacertfile in most of the others.
:-(

If you want to do SSL or TLS, you should know that the default behaviour
is for ldap clients to verify certificates, and give misleading bind errors
if they can't validate them. This means:

if you're using CA-signed certificates, and want to verify them, add
your CA PEM certificate to a directory of your choice (e.g.
/etc/openldap/certs, or /etc/pki/tls/certs, for instance), and point
to it using TLS_CACERT in /etc/openldap/ldap.conf, and
tls_cacertfile in /etc/ldap.conf.
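i.e. something like this (paths are just examples):

# /etc/openldap/ldap.conf
TLS_CACERT /etc/openldap/certs/ca.pem

# /etc/ldap.conf, /etc/pam_ldap.conf, /etc/nslcd.conf, etc.
tls_cacertfile /etc/openldap/certs/ca.pem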

RHEL6 uses a new-fangled /etc/openldap/slapd.d directory for the old
/etc/openldap/slapd.conf config data, and the
RHEL6 Migration Guide
tells you how to convert from one to the other. But if you simply
rename the default slapd.d directory, slapd will use the old-style
slapd.conf file quite happily, which is much easier to read/modify/debug,
at least while you're getting things working.

If you run into problems on the server, there are lots of helpful utilities
included with the openldap-servers package. Check out the manpages for
slaptest(8), slapcat(8), slapacl(8), slapadd(8), etc.

rpm-find-changes is a little script I wrote a while ago for rpm-based
systems (RedHat, CentOS, Mandriva, etc.). It finds files in a filesystem
tree that are not owned by any rpm package (orphans), or are modified
from the version distributed with their rpm. In other words, any file
that has been introduced or changed from its distributed version.

It's intended to help identify candidates for backup, or just for
tracking interesting changes. I run it nightly on /etc on most of my
machines, producing a list of files that I copy off the machine (using
another tool, which I'll blog about later) and store in a git
repository.

I've also used it for tracking changes to critical configuration trees
across multiple machines, to make sure everything is kept in sync, and
to be able to track changes over time.