Sunday, February 1

why aren't relational databases everywhere, like filesystems?

A couple of months ago, I went from working on one big software project
to working on a series of smaller, limited-scope projects. Of course a lot
changes when you do that. One of the things I’ve really been noticing is that
I’m not using relational databases for much right now, and probably haven’t
written any SQL since November or thereabouts.

I should note here that I think filesystems are a really powerful abstraction,
frequently underrated by people who ought to know better, and I usually argue
for simple tools. I am often that guy who wonders why you didn’t just use
grep, and even when I have a robust database at my disposal, I often explore
data by hacking together silly ad hoc pipelines in the shell.

With that out of the way, I just spent years treating first MySQL/MariaDB and
then PostgreSQL as default storage for most problems, and I’ve arrived at a
healthy respect for the tools, particularly PostgreSQL. It is nice to
accumulate collections of structured data. It’s nice to have defined schema
with a rich set of types, and to expect type constraints to be enforced and the
semantics of conversion between types made explicit. It’s really nice to
have a regular language for querying these collections.

Thinking about all of that, I thought “I should really just stand up psql on my
primary machine and populate it with interesting things”. SQL is, to be sure,
imperfect, but as a nerd working in the guts of a money-making enterprise, it
quickly becomes one of your basic tools for turning a store of facts into
answers to the questions that you didn’t know you were going to have to answer.
It’s one of the technologies that help render empirical questions
answerable, or at least help define the boundaries of which ones are
answerable.

And thinking about that, I started to wonder why I hadn’t set up a personal
database years ago — why stores of relational data weren’t as much a part of
my personal kit as the hierarchical filesystems I’ve been accumulating for 20+
years. And it seems to me like there’s something important in the answers to
that question.

In no particular order, some thoughts:

No operating system in wide use offers a relational DB to the user out of the
box. There’s a concerted effort underway to obscure them, but filesystems
have been user-level interfaces for most of the history of computing and the
idea of a “file” is pretty well embedded in the general awareness by now.

Moving files between machines and environments is (relatively) easy: The
protocols, hardware, and services are widely available. Moving relational
data is almost universally unpleasant, despite piggybacking on the file
stuff. It’s unforgiving in interface, slow, error prone, and implementation
specific.

There are plenty of tolerable GUI file managers, and desktop systems offer
them out of the box. The same can’t really be said for databases. GUI
clients exist (even some fairly good ones), but they’re mostly erratic,
require installation, demand configuration, and/or cost money.

text : vim / emacs :: sql : ? — there is not, so far as I am aware, any
good answer to this question. The statement generalizes quite a bit, and
yet.

While most programming languages offer library support for the widely-used
relational DBs, making them accessible for programmers who care to invest
a little time, there’s no comparable level of access at the level of the
shell, the windowing system, the web browser, etc.

People accomplish a lot with applications like Access, but while this speaks
to the utility of databases, forms, and scripting, it is not anything we
should feel especially good about.

A relational data store requires configuration to install, and configuration
(or at least another set of credentials) to access from a client.

Like programming languages and text editors, a full-featured database is
generally designed as a robust environment for tool use, offering lots of
abstractions and utility functions, and quite extensible within its own
space. What it’s not is much of a first-class citizen in a user-level tool
environment. Databases are for people who build infrastructure, but they
aren’t part of the infrastructure from the perspective of an end-user in
the same way that things like filesystems, HTTP clients, and simple utility
software are. Databases are earth-moving equipment, not highways and
on-ramps.

Writing queries is fairly cheap, once you learn a little conceptual
machinery, but defining or altering schema is usually slow and painful. (I’m
pretty sure that the friction here generates a lot of the incentive to
prefer NoSQL tools, and avoid rigorous schema definitions
generally, which leads to a lot of probably-unnecessary pain.)

It should be trivial to pipe things in and out of queries/resultsets,
but I don’t think it really is.

Of course, there’s a whole history here, and though I don’t know it well I do
have a rough sense of the conditions and accidents that brought us to the
status quo. It’d be a mistake to believe that the dominance of hierarchical
filesystems as we know them was inevitable, but filesystems as a base layer
appear to satisfy a lot of evolutionary constraints.

Which is to say that I’m not really interested in arguments that the filesystem
is a dead-end or whatever. On a long enough timescale, I suppose it might be,
but life is short and I have problems to solve. I’m just interested in whether
you can make something approximately as good as all the good parts of
PostgreSQL legible and accessible in the day-to-day nerd environment,
transparent to the network and to the broader ecosystem of tools. What would
it actually take?