Thomas Kjeldahl Nilsson

Unix: Tools for Creating Tools

This article is translated from a guest post originally written for a Norwegian programming blog.

Blacksmithing fascinates me.

Blacksmiths work with simple, basic tools: furnace, anvil, hammer,
and tongs. However: the smith can create more specialized tools as
needed. To my knowledge, no other craftsman can bootstrap their
process this way…

… apart from programmers. Given good fundamental tools we can
build anything we need to work faster and more efficiently.

This article will go over how Unix environments provide tools for
building other tools. We'll work on three levels: the bare
command-line, shellscripting, and Ruby programming.

Note: this text is mainly aimed at programmers not
comfortable/experienced with Unix and command-line work, but
programmers more experienced with Unix/Linux may pick up a thing or
two as well.

Infrastructure: installing offlineimap and msmtp

When these are installed and configured we'll end up with a Maildir
dir synchronized with our Gmail account via IMAP, plus a simple
command to send new emails from the command-line. Both tools are
available for Linux as well as OS X and other flavors of Unix.

Note: I've only tested the setup below in Ubuntu — YMMV in other
Unix/Linux flavors.

offlineimap

In Ubuntu, offlineimap installs like this:

sudo apt-get install offlineimap

Then we'll create a local Maildir dir where we want offlineimap to
store our local copy of our emails:

mkdir ~/Maildir

We need to configure offlineimap. We'll create ~/.offlineimaprc,
and set correct permissions on it:
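The exact config was not reproduced here; a minimal sketch of what ~/.offlineimaprc could look like for a Gmail account (the address and password are placeholders for your own, and this is a starting point rather than a complete reference):

```
[general]
accounts = Gmail

[Account Gmail]
localrepository = Local
remoterepository = Remote

[Repository Local]
type = Maildir
localfolders = ~/Maildir

[Repository Remote]
type = Gmail
remoteuser = YOUR_ADDRESS@gmail.com
remotepass = YOUR_PASSWORD
```

Since the file contains your password, restrict it to your own user with chmod 600 ~/.offlineimaprc.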

Now test your config: Run the offlineimap command in your
terminal. If your config is correct, your Gmail inbox will be synched
down to ~/Maildir. Be patient: this can take a while if you have a
lot of unarchived email in the root inbox.

msmtp

In Ubuntu, install msmtp like this:

sudo apt-get install msmtp

We'll need config for this as well: create the ~/.msmtprc file with correct permissions:
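Again, the original config isn't reproduced here; a sketch of a ~/.msmtprc for Gmail (the address and password are placeholders, and the certificate path is where Ubuntu keeps its CA bundle):

```
defaults
auth on
tls on
tls_trust_file /etc/ssl/certs/ca-certificates.crt

account gmail
host smtp.gmail.com
port 587
from YOUR_ADDRESS@gmail.com
user YOUR_ADDRESS@gmail.com
password YOUR_PASSWORD

account default : gmail
```

msmtp refuses to run if the file is readable by others, so set chmod 600 ~/.msmtprc. Then verify that sending works:

echo 'Sent from the terminal' | msmtp -a gmail TO_ADDRESS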

The filenames are kinda cryptic — offlineimap uses filenames to
encode some metadata about each email: unique ids, checksums,
etc. Files under cur folders are read emails, while unread emails
are found under new.

The state of your local mailbox is synchronized with Gmail on each
offlineimap execution. For example: by moving an email from a new
to cur dir and synchronizing, that email will be marked as read in
the remote Gmail account.

Why is this useful?

Since our mailbox is represented by standard directories, files and
strings, we'll be able to use simple Unix tools to read and manipulate
emails from the command-line. We're ready to start playing with our
toolbox!

"In The Beginning was the Command Line"

Unix tools follow some common conventions to receive and pass along data.

Programs run on the command-line take data in on the STDIN stream,
and spit results out on the STDOUT stream (exceptions/errors are
directed to a different stream: STDERR). The output can either end
up directly in your terminal, or be redirected as input to other
programs.

Given these common conventions we can combine programs: by chaining
multiple commands with the | (pipe) operator, we can let data
flow through them sequentially as in a water pipe — we build new
tools by creating pipelines of other, simpler tools.

For example: I wrote this line last week to find all likely synch
conflicts in my Dropbox folder:

find ~/Dropbox | grep conflicted

The find command lists all files, recursively, below the named
dir. The files/paths are output as multiple lines, one for each path,
and then piped to grep, which acts as a filter and only passes along
the ones with 'conflicted' somewhere in them. The result ends up in my
terminal, giving me a list of likely conflicted files to deal with.

There are more efficient ways to perform this task, but this works,
and it only took a few seconds to bang out this automation.

Now, let's build some command-line tools.

Terminal-snippet: Send an email

This is the line we ran to verify that email sending worked after installing msmtp above:

echo 'Sent from the terminal' | msmtp -a gmail TO_ADDRESS

echo dumps the following text to STDOUT. On its own, this will
print the text to the terminal. We instead pipe it to msmtp, which
receives the mail body on STDIN.

Terminal-snippet: Count unread emails

This one-liner counts unread emails:

find ~/Maildir/Gmail/INBOX/new -type f | wc -l

We find all files in the new folder (only files, not directories),
and use wc to count how many hits we got. I think just dumping the
number is a bit terse, so let's add a human readable label:
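One way to do that (the label text is just my choice) is to wrap the pipeline in a command substitution:

```shell
# Print the unread count with a human-readable label
echo "Unread: $(find ~/Maildir/Gmail/INBOX/new -type f | wc -l)"
```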

Terminal-snippet: Read an email

We should also be able to read a specific email. The following
one-liner lets you dump out the contents of mail no. N from the top
of the list above.

This one is a bit more complicated:

find ~/Maildir/Gmail/INBOX -type f | sed -n 2p | xargs cat

We find all files recursively in our inbox. We pluck out the nth
line in that list (in this case number two), and pass that single
filepath along to cat, which dumps out the contents of the file.

These commands work just fine, but aren't super-readable or easy to
recollect. It's time to reach for shell-scripting to
simplify and reuse things.

Aside: preserve small things learned and built

I have trouble remembering useful snippets the first time I use
them. The following tricks help, though:

You can search backwards through your terminal history by pressing
Ctrl-r in your terminal. Subsequent typing will display the first
matching entry in your history. Press Ctrl-r again to cycle backwards
through older matches in your history. Note that this works best
if you set your terminal to preserve a lot, or all, of your history
between sessions.

Personal "cheat-sheets". I've got an orgmode-file where I store
handy one-liners, tools, snippets etc. that I encounter during work,
in articles and books, or from colleagues. I'm not great at retaining
stuff the first time, so I like to come back and refresh or rediscover
it later on.

Define aliases in your shell environment. If you use Bash, create
or update ~/.bashrc with lines like this:

alias helloworld="echo 'hello world'"

When you reload your environment you can use this alias like any other
command. For instance, we could simplify one of our one-liners above:
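For example, a hypothetical unread-count alias for the counting one-liner (the alias name is my own invention):

```shell
# Count unread emails with a single short command
alias unread-count='find ~/Maildir/Gmail/INBOX/new -type f | wc -l'
```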

When one-liners don't suffice, shell-scripting takes over

We'll get to a point where we need more actual programming to get
things done. In other words: variables, conditionals, loops and last
but not least: the ability to spread our logic over multiple lines of
code.

Let's turn our email tools into bash scripts. That way we can make
them available as shorter commands that take parameters.

Shellscript: Send an email

We'll create a script called send-email, which takes the recipient email
and mail body as parameters.

#!/bin/sh
RECIPIENT=$1
TEXT=$2
echo $TEXT | msmtp -a gmail $RECIPIENT

The very first line is a shebang, which tells the system how to
execute the script (in this case, run it as a shellscript). $1, $2 etc. are
variables bound to the inbound parameters. To be extra clear, we
assign them to explicit variable names before executing the same
command as above to send the email.

If you put this script file in your PATH you can run it from anywhere like this:

send-email EMAIL_ADDRESS "Sent from a tiny shellscript"

A bit more user friendly than the original one-liner, don't you think?

Shellscript: Count unread emails

We'll port our "unread count widget" to a script called watch-unread-emails, which looks like this:

Shellscript: Read an email

The read-email script takes "mail no. N from the top of your inbox"
as an argument. We construct the sed command separately to make it
more readable.
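Here's a sketch of how such a read-email script could look, wrapping the earlier one-liner (the argument guard is my addition):

```shell
#!/bin/sh
# read-email: dump the contents of mail no. N from the inbox listing
if [ -n "$1" ]; then
  find ~/Maildir/Gmail/INBOX -type f | sed -n "${1}p" | xargs cat
fi
```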

Now we can read an email like this:

read-email 2

Better, yes?

Aside: how to script early and often

Make the threshold for writing new scripts as low as possible, and
you'll end up writing more of them. That way you can't help but mold
and improve your personal workflow/environment over time.

Here are two steps that will help with that:

Create a dir in your HOMEDIR, something like ~/bin or ~/scripts.
Put this dir in your PATH, making your scripts available throughout
your environment. Bonus points: create a git repo of the script
directory to give you version control of your scripts. Also, if you
work across several machines, synch your scripts between them using
Dropbox or a scheduled rsync operation.
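For the first step, the commands could look like this (the dir name is a matter of taste):

```shell
# Create a personal script dir and put it on PATH for future sessions
mkdir -p ~/bin
echo 'export PATH="$HOME/bin:$PATH"' >> ~/.bashrc
```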

Create a program that makes it super simple to create new
scripts. Below you'll find my ~/script/generatescript bash
script. It'll take the name of the new script as its argument,
create it in the script directory (with executable permission set),
and fire up my standard editor to let me start working on it right
away.
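Mine isn't reproduced here, but a minimal sketch could look like this (the script dir and the editor fallback are assumptions):

```shell
#!/bin/sh
# generatescript: scaffold a new executable script and open it for editing
if [ -n "$1" ]; then
  FILE="$HOME/scripts/$1"
  printf '#!/bin/sh\n' > "$FILE"   # start with a shebang
  chmod +x "$FILE"                 # make it executable right away
  "${EDITOR:-vi}" "$FILE"
fi
```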

When shell-scripting becomes too ugly, lovely Ruby says hello

Perl was born because Larry Wall thought raw shell-scripting was too
primitive and limiting. Later on we got additional languages like
Ruby, Python and Groovy, directly inspired by Perl. Unix scripting got
a whole lot more comfortable.

We'll rewrite our commands in Ruby. This provides two benefits: more
readable and extendable scripts and access to tons of external
libraries (for example, we can use a Rubygem called mail to parse
email).

We now start with a different shebang to make the system run the file
using the Ruby interpreter.

We also add a validation of the number of parameters. CLI arguments to a
Ruby program are placed in a constant, global array called ARGV. If
the script is called with the wrong number of arguments we dump out a
usage text and immediately exit with an error code.

For the actual execution of msmtp we just shell out to the
underlying system. This is the charm of using Ruby and other such
languages for scripting: we can choose to call out to the underlying
system at any time. This way, we can choose how much to lean on
standard Unix tools versus the libraries and frameworks of the
programming language.
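Pieced together from the description, a sketch of the Ruby version (the structure and names are my assumptions):

```ruby
#!/usr/bin/env ruby
# send-email, Ruby version (sketch)

# Pipe the mail body to msmtp on its STDIN.
def send_email(recipient, text)
  IO.popen(["msmtp", "-a", "gmail", recipient], "w") do |io|
    io.write(text)
  end
end

if ARGV.length == 2
  send_email(*ARGV)
else
  puts "Usage: send-email RECIPIENT TEXT"
  # (the original exits with an error code here)
end
```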

Instead of leaning on watch, we implement the same logic directly in
Ruby: output the unread count every N seconds.

On each loop we clear our terminal of content and synch our email,
then wait N seconds before we do it again. The actual unread count we
find by using the Ruby File and Dir apis.
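A sketch of how that loop might look (the interval handling and paths are assumptions):

```ruby
#!/usr/bin/env ruby
# watch-unread-emails, Ruby version (sketch)

NEW_DIR = File.join(Dir.home, "Maildir", "Gmail", "INBOX", "new")

# Count the files (one per unread mail) in the Maildir 'new' dir.
def unread_count(dir)
  Dir.glob(File.join(dir, "*")).count { |f| File.file?(f) }
end

def watch_unread(interval)
  loop do
    system("clear")
    system("offlineimap")  # synch the local mailbox
    puts "Unread: #{unread_count(NEW_DIR)}"
    sleep interval
  end
end

# Pass the refresh interval in seconds as the first argument.
watch_unread(ARGV[0].to_i) if ARGV[0]
```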

This is a bit longer than our original shellscript. However, it
still feels a bit more extendable and readable than the original
one-liner and shellscript.

Ruby-script: Check inbox contents

We only port display-inbox to Ruby to stay consistent here: the Ruby
version of the script simply shells out to the same one-liner. I find a
single line of grep perfectly readable, and it makes a point: Ruby
can at times be a very thin wrapper around regular shell-scripting.
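The one-liner itself isn't reproduced here; the wrapper amounts to a single system call, along these lines (the exact grep expression is my assumption, listing the Subject header of each mail in the inbox):

```ruby
#!/usr/bin/env ruby
# display-inbox, Ruby version: a thin wrapper around a shell one-liner (sketch)
system("grep -rh '^Subject:' ~/Maildir/Gmail/INBOX")
```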

We use Ruby file apis to find the path with the nth mail. Then we
lean on an external Ruby library (a so-called gem) called Mail to
parse the email. Finally we dump the email contents to STDOUT, which
leaves us with this:
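The script body isn't shown here; a sketch of how it could look, assuming the mail gem is installed (gem install mail):

```ruby
#!/usr/bin/env ruby
# read-email, Ruby version (sketch; requires the 'mail' gem)

INBOX = File.join(Dir.home, "Maildir", "Gmail", "INBOX")

# All mail files under the inbox, recursively.
def mail_files(root)
  Dir.glob(File.join(root, "**", "*")).select { |f| File.file?(f) }
end

def read_email(root, n)
  require "mail"
  mail = Mail.read(mail_files(root)[n - 1])
  puts "From:    #{mail.from.join(', ')}"
  puts "Subject: #{mail.subject}"
  puts
  puts mail.body.decoded
end

read_email(INBOX, ARGV[0].to_i) if ARGV[0]
```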

The advantage of modern scripting-languages like Ruby and Python is
less arcane syntax, and tons of useful libraries and DSLs. Modern
scripting languages are also more portable than raw shell-scripting — enabling you to support Windows as well. For example: by using the
Ruby File API you'll abstract away the difference between path
separators, filesystem commands etc between Linux and Windows.
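For instance, File.join builds a path using whatever separator the platform expects:

```ruby
# Portable path construction: no hardcoded '/' or '\'
path = File.join("Maildir", "Gmail", "INBOX")
puts path  # "Maildir/Gmail/INBOX" on Unix
```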

A downside of modern scripting languages is that they introduce
additional dependencies: if you stick to standard shell-scripting and
basic Unix tools, your script can function in very minimal systems
without installing external packages.

I often start small tools as simple shellscript automations in the
terminal. As soon as a script becomes unwieldy I switch to Ruby
instead.

Build or not?

When you accumulate new building blocks like this, you see ever more
solutions to problems. It's tempting to just build anything you need
yourself. But: just because you can do so doesn't make it a good
idea. We have to pick our battles. Sometimes the pragmatic choice is
to pick an off-the-shelf, suboptimal, proprietary tool… that actually gets the job done right today.