Paths aren't strings

18 January 2014

In Ruby, we often deal with files: reading them, writing them, checking
whether they exist. When working with these files, we generally
reference them by their paths on the filesystem: /etc/hosts, for
example, or /usr/local/bin/git.

As in other languages, it’s pretty common in Ruby to represent these
filesystem paths as strings. In a way, that’s fine: it works okay, and
if we want to do something with the files they represent, there are
methods on File that can help (fetching the absolute path of a relative
filename, for example, or checking whether a file exists).

But in the world of Ruby, with its rich object model, this feels neither
very idiomatic nor very object oriented. There’s lots of behaviour
associated with paths, and strings don’t encapsulate this behaviour very
well.

Paths can be relative, for example. That is, multiple paths that seem
different when expressed as a string can in fact correspond to the same
file; if we’re in the /usr/local directory, for example, we can reach
/etc/hosts using the paths /etc/hosts or ../../etc/hosts; we can
reach /usr/local/bin/git with both /usr/local/bin/git and bin/git.
To check if one string path is the same as the other, then, we can’t
just do path1 == path2.
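
To make the problem concrete, here is a minimal sketch using only strings and File. The helper name same_path? is hypothetical (it is not part of Ruby), and the comparison is purely lexical: it resolves both strings against a base directory but does not follow symlinks.

```ruby
# Hypothetical helper: true when two string paths name the same location
# once both are resolved against the directory `base`.
def same_path?(path1, path2, base)
  File.expand_path(path1, base) == File.expand_path(path2, base)
end

base = "/usr/local"

puts "/etc/hosts" == "../../etc/hosts"                  # prints false
puts same_path?("/etc/hosts", "../../etc/hosts", base)  # prints true
puts same_path?("/usr/local/bin/git", "bin/git", base)  # prints true
```

The naive comparison fails even though both strings reach the same file; only after expanding each one do they compare equal.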

That’s not all. Paths are representations of files, and those files have
attributes and states that matter to our programs. Does the path point
to a directory, for example? Does the file the path points to exist? How
big is it? Can we read from it? Can we write to it?

Paths are fundamentally also a hierarchical data type, expressed using
a delimiter (usually /); we can traverse deeper into the filesystem by
adding slash-separated values to a path, and climb back up the
filesystem hierarchy by removing them.

The String class in Ruby is aware of precisely none of these
behaviours, and so if we want to use them then we’re forced to use
a kludgey mix of static methods; things like File.join to build up
paths, File.exist? to check for the existence of files, and so on.
Some things can’t really be done at all if we store our paths as
strings, assuming that things like traversing the filesystem by using
split("/") fills you — rightly — with unease.
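
As a rough illustration of that mix, the sketch below gathers a few facts about a path held as a plain string. The helper describe_string_path is hypothetical; the File methods it calls are real. It uses File.exist?, since the older File.exists? spelling was removed in Ruby 3.2.

```ruby
# Hypothetical helper: gathers facts about a path held as a plain string.
# Every step is a call on File, never on the string itself.
def describe_string_path(*parts)
  path = File.join(*parts)
  {
    path:   path,
    parent: File.dirname(path),
    name:   File.basename(path),
    exists: File.exist?(path) # depends on the machine this runs on
  }
end

p describe_string_path("/usr", "local", "bin", "git")
```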

So if storing paths as strings is an anti-pattern, what are we to do?
Well, it turns out that the Ruby standard library comes with a type for
just this purpose, albeit one that’s underused: Pathname.

Pathname is part of the standard library in Ruby; it’s not an external
dependency like a gem, so you can safely rely on it being present in all
your scripts. Once we’ve required the library, we can create
a Pathname in Ruby by passing a string to Pathname.new:

require "pathname"
path = Pathname.new("/etc/hosts")

In fact, there’s a shortcut for Pathname.new; just call Pathname
like a method:

path = Pathname("/etc/hosts")

If we do nothing else, we’ve got ourselves an object that behaves in
many ways like a string. Its to_s method, for example, returns the
path as a human-readable, ordinary string:

path.to_s # => "/etc/hosts"

In places where things are implicitly converted to strings, then — like
puts and print — we can use our Pathname object just as we would
a normal string.

It also implements to_path, which is used internally
by the File class; so, we can pass our Pathname object into
something like File.open, and it will act just the same as if we
passed it the path as a string:

File.open(path, "r") { |file| puts file.read }

But we also gain a lot of methods that a string doesn’t have. In this
brief overview, I’m going to split them into two categories: inquiry and
traversal.

Inquiry

Since our Pathname object, unlike a string, knows that it represents
a path to a file, we can ask it questions about the file that our
path represents. To continue our above example, we might want to check
whether the path points to a directory:

path.directory? # => false

Or whether the file actually exists:

path.exist? # => true

We can also check whether the current process has permission to either
read from or write to the file:

path.readable? # => true
path.writable? # => false

Of course, these aren’t particularly exciting features; they’re already
fairly accessible as part of the File class thanks to the FileTest
module. But it certainly feels a lot more OO to pass these messages to
the path itself, rather than using some entirely separate static
methods.
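
To see several of these questions answered at once, here is a small sketch that runs them against a temporary directory, so it does not depend on what is installed on a particular machine. The helper inspect_path is hypothetical; the Pathname methods it calls (exist?, directory?, file?, readable?, size) are part of the standard library.

```ruby
require "pathname"
require "tmpdir"

# Hypothetical helper: asks one path the questions raised earlier in the post.
def inspect_path(path)
  path = Pathname(path)
  {
    exists:    path.exist?,
    directory: path.directory?,
    file:      path.file?,
    readable:  path.readable?,
    size:      path.exist? ? path.size : nil # size raises if the path is missing
  }
end

Dir.mktmpdir do |tmp|
  file = Pathname(tmp) + "hosts"
  file.write("127.0.0.1 localhost\n")

  p inspect_path(file)                     # a regular file
  p inspect_path(tmp)                      # a directory
  p inspect_path(Pathname(tmp) + "absent") # a path that does not exist
end
```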

Traversal

For my money, though, it’s when traversing the filesystem that
representing paths as Pathname objects really starts to feel
worthwhile.

Let’s imagine that you have the following folder structure:

lib/
+ script.rb
data/
+ file.txt

You want to access file.txt from your script.rb script, but you want
to make sure that this works whatever working directory you run the
script from. That means you need to figure out what the absolute path to
file.txt is, and then reference it using this absolute path.

If you’ve written a gem, for example, you might well have encountered
this sort of task before. A solution I often see is something like the
following:
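
One common form of this pattern, shown here as an illustration rather than a quotation from any particular gem, chains File.dirname, File.join and File.expand_path. The wrapper method data_file_for is hypothetical and exists only so the line can be tried with any script location; inside a real script the argument would simply be __FILE__.

```ruby
# Hypothetical wrapper around the typical one-liner: resolves data/file.txt
# relative to the directory one level above the one containing `script`.
def data_file_for(script)
  File.expand_path(File.join(File.dirname(script), "..", "data", "file.txt"))
end

puts data_file_for(__FILE__)
```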

I see this pattern in gems a lot, and despite having seen it hundreds of
times and knowing instinctively what it’s doing, it still throws me
a little when I encounter it: there’s so much noise there that I have to
actively think about what the author is doing.

Let’s rewrite this to use Pathname, and see if we can’t reveal our
intentions a little more clearly:

path = Pathname(__FILE__).dirname.parent + "data" + "file.txt"

We start by getting a reference to the current file. Then, we go up one
level to the directory that the file resides in; then up another to the
directory one level above. [1]

The next step, if you’re used to representing files as strings, might
seem odd: we’re just using the + operator to add elements onto the
path, but we’re not adding a separator as we might otherwise do either
manually ("foo" + "/bar") or with File.join. That’s because
Pathname will take care of adding the separators for us every time we
append a new element to the path.
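
A short sketch of how + behaves, assuming a Unix-style separator. Note that joining is purely lexical (nothing is looked up on disk), that ".." is resolved as part of the join, and that adding an absolute path replaces the left-hand side rather than appending to it.

```ruby
require "pathname"

base = Pathname("/usr")

puts base + "local" + "bin" # prints /usr/local/bin
puts base + "bin/ruby"      # prints /usr/bin/ruby
puts base + ".."            # prints /
puts base + "/etc/hosts"    # prints /etc/hosts
```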

I don’t know about you, but to me the second example seems clearer.

We’re not just limited to this simple traversal, either. Let’s imagine
we have a path to a file deep in the hierarchy of the file system:

path = Pathname("/some/really/deep/file/in/some/really/deep/folder")

Imagine we want to work our way up the filesystem from our current
location until we hit a certain point: a directory with a certain name,
for example. There’s no straightforward way to do this with the path
represented as a string, but with Pathname it’s easy:

dir = nil
path.ascend { |f| dir = f and break if f.basename.to_s == "some" }

Here, we climb upwards through the filesystem (so we get to folder,
then up to deep, then up to really, and so on backwards through the
path). As soon as we find a directory whose name is some, we’ve found
what we’re looking for and so break out of our loop.

(If we wanted to proceed in the opposite direction — that is, to start
with /, then /some, then /some/really, and so on — we could use
descend, which is otherwise identical to ascend.)

The great thing about this type of traversal is that we don’t have to
touch the filesystem at all. The above example, with its path that
doesn’t exist at all, will still execute perfectly well; Pathname has
enough information from the path to know each step along the way, right
up to the filesystem root.
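
The following sketch lists every step that ascend and descend visit for a shorter, made-up path. The helpers ascent and descent are hypothetical names; they use the block forms of Pathname#ascend and Pathname#descend and never touch the disk.

```ruby
require "pathname"

# Hypothetical helpers: collect each step of the walk as a string.
def ascent(path)
  steps = []
  Pathname(path).ascend { |step| steps << step.to_s }
  steps
end

def descent(path)
  steps = []
  Pathname(path).descend { |step| steps << step.to_s }
  steps
end

path = Pathname("/some/really/deep/folder")

p path.exist?   # false unless this machine really has such a path
p ascent(path)  # => ["/some/really/deep/folder", "/some/really/deep", "/some/really", "/some", "/"]
p descent(path) # => ["/", "/some", "/some/really", "/some/really/deep", "/some/really/deep/folder"]
```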

That’s not to say that we can’t access the filesystem when we want to,
though. For example, we don’t have to traverse the filesystem upwards:
we can drill down into it with children:
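
A minimal sketch, assuming the directory being listed is /etc and that it exists on the machine running the code. The helper entries_in is a hypothetical name; Pathname#children is the real method, and unlike the lexical operations above it does read the disk.

```ruby
require "pathname"

# Hypothetical helper: children returns an array of Pathname objects,
# one per directory entry (excluding "." and "..").
def entries_in(dir)
  Pathname(dir).children
end

etc = Pathname("/etc")
puts entries_in(etc).first(5) if etc.directory? # skipped where /etc is absent
```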

The array returned by children contains references — as Pathname
objects, naturally — to all the files and directories in the /etc
directory.

From here, it’s a short leap to powerful and expressive traversal of the
filesystem, especially for methods like children that return arrays
(and so have the full power of Enumerable available to them). For
example, let’s fetch all the directories in the current directory that
have more than 10 files in them:
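
One way this might look, as a sketch under stated assumptions: "files" is taken to mean regular files directly inside each subdirectory (not counted recursively), and unreadable directories are skipped. The helper name crowded_directories and its min parameter are hypothetical.

```ruby
require "pathname"

# Hypothetical helper: subdirectories of `dir` that directly contain
# more than `min` regular files.
def crowded_directories(dir, min = 10)
  Pathname(dir).children.select do |child|
    child.directory? && child.readable? &&
      child.children.count(&:file?) > min
  end
end

puts crowded_directories(".")
```

Because children returns a plain array, select, count and the rest of Enumerable work on it directly, with each element still answering Pathname questions such as directory? and file?.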

There’s much more to Pathname than the small snippet I’ve presented
here, but hopefully it’s been enough to convince you to think about
using Pathname the next time you want to represent file paths in Ruby.
It’s powerful, semantic and, since it’s part of the Ruby standard
library, there’s not much excuse not to use it.

[1] In Ruby 2.0 and later, we can simplify this further by calling
Pathname(__dir__), eliminating the need for the call to
dirname.

I'm Rob Miller. I'm a Ruby developer and I work as Operations Director
for Big Fish in London.