The Command-Line RESTafarian

Almost any modern service or application provides an HTTP endpoint to
work with. Whether it exposes metrics, allows remote administration, or
accepts complex requests, a system administrator will spend a lot of time
in the terminal accessing and updating such APIs.

There are many, many tools to help us, but today we’re going to look at just
four key ones: curl, jq, yajl, and HTTPie.

curl is available out of the box on almost every non-Windows system
these days, minimal builds excepted. It supports almost any protocol you can
name, and many you’ve likely never even heard of. For most people, curl is
both the reference tool when testing an HTTP API for compatibility with the
HTTP spec (which, as of 2014, is split into multiple specifications), and the
workhorse used in many applications, either directly, or via the libcurl
library built into the application.

-o to write the response body to a file; verbose information such as headers still goes to stderr

-# to replace the default progress meter with a simple bar of hash marks, drawn on stderr as data is received

-L to follow redirects, for example when checking a 301 redirect
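Taken together, the three flags might be combined as below; example.com is a placeholder for a real endpoint, while the file:// variant shows the body-to-file behaviour without needing a network at all:

```shell
# Follow redirects, draw a hash-mark progress bar on stderr,
# and write the final body to page.html:
#   curl -L -# -o page.html https://example.com/
#
# -o works for any protocol curl speaks, including file://:
printf 'hello' > /tmp/demo.txt
curl -s -o /tmp/out.txt file:///tmp/demo.txt
cat /tmp/out.txt
# hello
```

Because the progress meter and headers go to stderr, -o output stays clean even when you pipe it onwards.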

Uploading large files

When uploading large files, it’s important to consider the data format used
(binary, ASCII, UTF-8, Base64-encoded), and whether the data is streamed or
held entirely in memory. Most people use the -d | --data option, which
loads the whole file into memory, strips carriage returns and newlines, and
then sends it with Content-Type: application/x-www-form-urlencoded. This is
usually not what you expected to happen: your data may be mangled, and you
need enough memory to hold the entire file.

“if you start the data with the letter @, the rest should be a file name
to read the data from, or - if you want curl to read the data from
stdin. The contents of the file must already be URL-encoded. Multiple
files can also be specified. Posting data from a file named ‘foobar’ would
thus be done with --data-binary @foobar”.

The --data-binary option is normally more appropriate, as it at least
sends the data exactly as given, with no newline stripping, but it will
still load the entire file into memory.

I recommend using -T | --upload-file pretty much all the time. It streams
the data, so your memory usage stays modest, and it does not alter the data
unnecessarily (note that it issues a PUT rather than a POST by default).
Most of the time it just Does What You Mean.
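To make the trade-offs concrete, here is how the three options line up. The HTTP URL is a placeholder, while the file:// upload at the end actually runs, since curl’s -T works for that protocol too:

```shell
# -d / --data: whole file in memory, newlines stripped,
# sent with the form-urlencoded content type:
#   curl -d @bigfile https://example.com/upload
#
# --data-binary: byte-for-byte, but still buffered entirely in memory:
#   curl --data-binary @bigfile https://example.com/upload
#
# -T / --upload-file: streamed from disk (and a PUT by default):
#   curl -T bigfile https://example.com/upload
#
# Offline demonstration of -T using the file:// protocol:
printf 'payload' > /tmp/src.bin
curl -s -T /tmp/src.bin file:///tmp/dst.bin
cmp /tmp/src.bin /tmp/dst.bin && echo "copies match"
# copies match
```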

jq is one of those tools that you wonder how you ever did without. It’s a
pipe-capable terminal tool that can reformat or select from streaming
JSON data, in the same way you might use grep or wc on some
arbitrary data.

Let’s take a look at a simple example, piping the output of curl directly
into jq. The . parameter represents the identity function, and since jq
pretty-prints JSON by default, it takes the single-line curl response
and gives it pretty colours and indentation. This alone makes me happy.
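For instance, the welcome JSON below standing in for a real server response:

```shell
# jq's identity filter '.' passes the input through unchanged,
# pretty-printed (and colourised on a terminal):
echo '{"couchdb":"Welcome","version":"1.0.2"}' | jq .
# {
#   "couchdb": "Welcome",
#   "version": "1.0.2"
# }
```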

But let’s say we are only interested in the version number of Cloudant’s
API. Maybe this is part of a script or cron job, and we want to confirm
that the version number is compatible with some operation we want to perform.
Easy-peasy.

$ curl -s skunkwerks.cloudant.com | jq .version
"1.0.2"

Perhaps we need to transform that JSON in some fashion. In this case, we’ll
destructure the JSON object and produce a new one that happens to have
different keys from the original. This is useful when, for example, you
migrate JSON data out of a system that uses id as the key for a document,
into CouchDB, which wants _id.
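A minimal sketch of that rename, with made-up sample data:

```shell
# Build a new object whose keys differ from the input's:
# the source system's "id" becomes CouchDB's "_id".
echo '{"id":"widget-1","name":"sprocket"}' \
  | jq -c '{_id: .id, name: .name}'
# {"_id":"widget-1","name":"sprocket"}
```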

The second operation pulls out all document ids, and the current revision
from the value object, and finally wraps them as a convenient array. Neat.
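Against a CouchDB-style _all_docs response (sample data inline), that second operation might look like:

```shell
# Pull each document's id and the rev from its value object,
# then wrap the results in a single array:
echo '{"rows":[{"id":"a","value":{"rev":"1-x"}},
               {"id":"b","value":{"rev":"2-y"}}]}' \
  | jq -c '[.rows[] | {_id: .id, _rev: .value.rev}]'
# [{"_id":"a","_rev":"1-x"},{"_id":"b","_rev":"2-y"}]
```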

Unbuffered

The final feature of jq that I love is its support for streaming APIs.
By default, jq buffers its output, flushing data to the next part of our
shell command, perhaps another pipe, only when required. But sometimes we
need to deal with sources that deliver data at a different rate than it
can be processed.

The --unbuffered flag tells jq to spit out JSON data as soon as it has
something useful, rather than waiting for enough data to complete parsing.

Our example is a bit contrived, as it would work perfectly well without
streaming support. In this case we receive a continual stream of updated
metrics from a Riemann server that is monitoring a number of Erlang
servers.
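The effect is easy to demonstrate offline, with a slow producer standing in for the Riemann stream (the service name and metric values are invented):

```shell
# Each object is printed the moment jq parses it, instead of
# sitting in an output buffer until the producer finishes:
(echo '{"service":"beam.memory","metric":1024}'; sleep 1;
 echo '{"service":"beam.memory","metric":2048}') \
  | jq --unbuffered -c .metric
# 1024  (appears immediately)
# 2048  (a second later)
```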

yajl is actually the first JSON library I encountered, and it’s still one
of my favourites. It’s extremely fast, and it’s available as both a library
and a set of command-line tools on every platform I’ve used recently.

I tend to use yajl for validating JSON, and for reformatting it from
pretty-printed to packed form. The latter can be done with jq as well of
course, so let’s just see validation of piped data. Often somebody will ask
why CouchDB is rejecting their valid JSON, and I point them to yajl so that
they can check for themselves. The culprit is almost always invalid UTF-8,
by the way.

HTTPie is another very powerful tool, written in Python using the well-known
requests library under the hood. I use it a lot when talking to JSON
web services, building up JSON objects on the command line rather than
copying and pasting text into the shell from elsewhere. This way, HTTPie
takes care of ensuring I’m supplying valid JSON.