Safer paths, part 1 - valid and typed paths

Filepaths have been a pain in my neck for years. Paths are hard, overused,
misused and mostly unsafe. In this post I present a newly released library
that serves to make working with paths safer in the common use-case.

Misuse and scope

Have you ever thought about why we use paths in the first place? And why
we use strings to denote them? Is it not weird that . means the
‘current’ directory (whatever that even means) and is also the
separator for extensions? Is it not absurd that / is both the
root of the path system and the separator? Paths are hard, but in any case, we
are stuck with them now.

Paths are often overused or misused. By this I mean that they are used to
denote more than just the location of a file(/…). Sometimes a path is a vital
part of the meaning of the contents of the file. Sometimes the mere existence
of a file at a certain path even has a meaning.

The safepath library
is only to be used for real paths: paths that can point to a directory, a
file, a device, a socket, etc. It is not meant to be used for glob patterns,
$PATH values, etc. It also encourages safer use of paths: only
absolute paths, no Strings and valid-by-construction
Paths.

Data.FilePath versus System.FilePath

Data.FilePath uses an opaque datatype to represent a path
instead of a plain String. This is the exact opposite of what
System.FilePath does: there, FilePath is merely a type synonym for String.

safepath encourages safe usage of paths: preferably no
semantics in paths and no String juggling.
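To make the String-juggling problem concrete, here is a small sketch using the real System.FilePath functions (</>) and isValid. The values baseDir and userInput are hypothetical; the point is that the type FilePath accepts any String, so mistakes only show up at run time, if at all.

```haskell
import System.FilePath ((</>), isValid)

-- System.FilePath defines `type FilePath = String`, so any String
-- type-checks as a path, including the empty string.
emptyIsValidPath :: Bool
emptyIsValidPath = isValid ""  -- a validity check exists, but calling it is optional

-- Hypothetical values: a base directory and some input that we *meant*
-- to be a relative path below it.
baseDir :: FilePath
baseDir = "/srv/app/uploads"

userInput :: FilePath
userInput = "/etc/passwd"

-- On POSIX, (</>) returns its right operand unchanged when that operand
-- is absolute, so the result silently escapes baseDir.
joined :: FilePath
joined = baseDir </> userInput
```

On a POSIX system, joined is "/etc/passwd" and emptyIsValidPath is False. Both are well-typed Strings; nothing in the types warns about either problem.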

Data.FilePath versus System.Path

Data.FilePath’s opaque data type, Path, has one type parameter. There are
two possible instantiations of it:

    type AbsPath = Path Absolute
    type RelPath = Path Relative

This means that there is now a type-level distinction between absolute and
relative paths. It ensures that the right kind of path is always passed
to a function. It should also encourage programmers to only ever use absolute
paths in the heart of their application.
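A minimal sketch of how a phantom type parameter gives this distinction. The names Path, Absolute, Relative, AbsPath and RelPath mirror the post, but the constructor, parseAbs, parseRel, (</>) and render are hypothetical illustrations, not safepath's actual API.

```haskell
import Data.List (intercalate)

-- Empty types used only as type-level tags.
data Absolute
data Relative

-- In a real library the Path constructor would not be exported, so the
-- checked parsers below would be the only way to obtain a Path.
newtype Path rel = Path [String]
  deriving (Eq, Show)

type AbsPath = Path Absolute
type RelPath = Path Relative

-- Split on '/', dropping the empty pieces produced by repeated or
-- trailing separators. This sketch does no further validation of pieces.
splitPieces :: String -> [String]
splitPieces s = case break (== '/') s of
  (piece, [])       -> keep piece
  (piece, _ : rest) -> keep piece ++ splitPieces rest
  where
    keep p = [p | not (null p)]

-- Accept only strings that start with '/'.
parseAbs :: String -> Maybe AbsPath
parseAbs ('/' : rest) = Just (Path (splitPieces rest))
parseAbs _            = Nothing

-- Accept only non-empty strings that do not start with '/'.
parseRel :: String -> Maybe RelPath
parseRel s@(c : _) | c /= '/' = Just (Path (splitPieces s))
parseRel _ = Nothing

-- Only a relative path may appear on the right, so an absolute path can
-- never silently replace the left-hand side.
(</>) :: Path rel -> RelPath -> Path rel
Path a </> Path b = Path (a ++ b)

-- Rendering is only defined for absolute paths here.
render :: AbsPath -> String
render (Path ps) = '/' : intercalate "/" ps
```

With these definitions, joining the parsed "/home/user" and ".config/app" renders as "/home/user/.config/app", while passing a RelPath to render, or an AbsPath as the right operand of (</>), is rejected by the type checker rather than at run time.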

This approach may look familiar. The path and pathtype
libraries do something similar, only with more type parameters.

These additional type parameters also track:

Whether a path points to a file or a directory

What platform the path originates from

The Path in Data.FilePath does not make these
distinctions for two reasons:

Whether a path points to a file or a directory cannot be determined
from the path itself. Whether it points to a file or a directory on disk
is not even an immutable fact. It is dangerous to pretend that Path Dir
is any safer than ∃ t. Path t. As such,
Data.FilePath does not make this distinction.

Supporting paths from different platforms in the same system is not a
common enough use-case. The common use-case is to use the host
platform’s paths.
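The first point can be demonstrated with a small sketch. It assumes a hypothetical Dir tag and TypedPath wrapper in the style of libraries that track files versus directories in the type; they are not the API of path, pathtype or safepath. The file-system calls are the real ones from System.Directory.

```haskell
import System.Directory
  ( createDirectory, removeDirectory, removeFile
  , doesDirectoryExist, doesFileExist )

-- Hypothetical type-level tag, for illustration only.
data Dir

newtype TypedPath t = TypedPath FilePath

-- Checks the file system once and, if p currently holds a directory,
-- hands back a path tagged as Dir.
asDir :: FilePath -> IO (Maybe (TypedPath Dir))
asDir p = do
  isDir <- doesDirectoryExist p
  pure (if isDir then Just (TypedPath p) else Nothing)

-- Tags a fresh directory as Dir, then replaces it with a regular file
-- (as any other process could) and reports what the Dir-tagged path
-- points at afterwards: (is a directory?, is a file?).
-- Expects that nothing exists at p yet; removes the file again at the end.
replaceDirWithFile :: FilePath -> IO (Bool, Bool)
replaceDirWithFile p = do
  createDirectory p
  tagged <- asDir p
  case tagged of
    Nothing -> error "expected a directory"
    Just (TypedPath d) -> do
      removeDirectory p
      writeFile p "no longer a directory"
      result <- (,) <$> doesDirectoryExist d <*> doesFileExist d
      removeFile p
      pure result
```

Run on a fresh path, replaceDirWithFile returns (False, True): the value still carries the Dir tag, but it now points at a regular file. The tag only records what was true at the moment of the check.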

Implementation

A path is data with invariants

A path has invariants. Some can be encoded in the type; others have to be
enforced by the implementation. Those should of course be hidden from the user,
but they must be thoroughly tested. Looking at the other ‘safe’ path
libraries, I see only a handful of tests and no property tests that check
invariants.
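As an illustration of an invariant that is enforced by a smart constructor and then checked by a property, here is a sketch with a hypothetical RelPath representation. The specific invariants (no empty, ".", ".." or slash-containing pieces) are assumptions for this example, not a description of safepath's internals. A real test suite would typically generate inputs with QuickCheck; to stay dependency-free, this sketch checks every short string over a small alphabet instead.

```haskell
-- Hypothetical representation of a relative path as a list of pieces.
newtype RelPath = RelPath [String]
  deriving (Eq, Show)

-- Invariants assumed for this sketch: every piece is non-empty,
-- contains neither '/' nor NUL, and is not "." or "..".
validPiece :: String -> Bool
validPiece p = not (null p) && all (`notElem` "/\0") p && p `notElem` [".", ".."]

-- A RelPath is valid if it has at least one piece and all pieces are valid.
isValidRelPath :: RelPath -> Bool
isValidRelPath (RelPath ps) = not (null ps) && all validPiece ps

-- Split on every '/', keeping empty pieces so that "a//b" is rejected
-- rather than silently normalised. Never returns an empty list.
splitOnSlash :: String -> [String]
splitOnSlash s = case break (== '/') s of
  (piece, [])       -> [piece]
  (piece, _ : rest) -> piece : splitOnSlash rest

-- Smart constructor: with the RelPath constructor hidden, this would be
-- the only way to obtain a value.
parseRelPath :: String -> Maybe RelPath
parseRelPath s
  | all validPiece pieces = Just (RelPath pieces)
  | otherwise             = Nothing
  where
    pieces = splitOnSlash s

-- Every string of length 0..n over an alphabet containing the
-- troublesome characters '.' and '/'.
stringsUpTo :: Int -> [String]
stringsUpTo n = concatMap (\k -> sequence (replicate k "a./")) [0 .. n]

-- Property: the parser never produces a value that breaks the invariants.
prop_parseKeepsInvariants :: String -> Bool
prop_parseKeepsInvariants s = maybe True isValidRelPath (parseRelPath s)
```

The property looks trivially true today, and that is the point: if the parser is later changed to, say, normalise "a/./b", the property catches any change that lets an invalid value escape.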

Extreme testing

In terms of lines, there is much more test code than library code.
In absolute numbers, there are:

over 150 doctests

over 100 property tests

over 10000 unit tests for parsing and rendering

Doctests ensure that the semantics of functions are at least intuitively
clear. Property tests look for edge cases that need to be handled and ensure
that the invariants of the Data.FilePath types are maintained
under all circumstances. Lastly, the unit tests serve to ensure that when the
handling of the paths changes behind the scenes, the API retains its
semantics.
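A sketch of the kinds of checks described above, applied to a hypothetical AbsPath type (parseAbs and renderAbs are illustrative names, not safepath's API). The Haddock comments contain doctest-style `>>>` examples, and two round-trip properties pin down the relationship between parsing and rendering. As before, inputs are enumerated exhaustively over short strings instead of being generated with a property-testing library.

```haskell
import Data.List (intercalate)

-- Hypothetical absolute path type: the pieces below the root "/".
newtype AbsPath = AbsPath [String]
  deriving (Eq, Show)

-- A piece must be non-empty, free of '/' and NUL, and not "." or "..".
validPiece :: String -> Bool
validPiece p = not (null p) && all (`notElem` "/\0") p && p `notElem` [".", ".."]

-- Split on every '/', keeping empty pieces. Never returns an empty list.
splitOnSlash :: String -> [String]
splitOnSlash s = case break (== '/') s of
  (piece, [])       -> [piece]
  (piece, _ : rest) -> piece : splitOnSlash rest

-- | Parse an absolute path, rejecting empty, "." and ".." pieces.
--
-- >>> parseAbs "/home/user"
-- Just (AbsPath ["home","user"])
-- >>> parseAbs "home/user"
-- Nothing
parseAbs :: String -> Maybe AbsPath
parseAbs "/" = Just (AbsPath [])
parseAbs ('/' : rest)
  | all validPiece pieces = Just (AbsPath pieces)
  where
    pieces = splitOnSlash rest
parseAbs _ = Nothing

-- | Render an absolute path back to a String.
--
-- >>> renderAbs (AbsPath ["home","user"])
-- "/home/user"
renderAbs :: AbsPath -> String
renderAbs (AbsPath ps) = '/' : intercalate "/" ps

-- Every string of length 0..n over a small alphabet.
stringsUpTo :: Int -> [String]
stringsUpTo n = concatMap (\k -> sequence (replicate k "a./")) [0 .. n]

-- Round trip 1: whatever parses renders back to exactly the input.
prop_renderParse :: String -> Bool
prop_renderParse s = maybe True ((== s) . renderAbs) (parseAbs s)

-- Round trip 2: every valid path survives render-then-parse unchanged.
prop_parseRender :: AbsPath -> Bool
prop_parseRender p = parseAbs (renderAbs p) == Just p

-- Valid paths are obtained only through the parser, never built by hand.
validPaths :: Int -> [AbsPath]
validPaths n = [p | Just p <- map parseAbs (stringsUpTo n)]
```

Because the properties fix the observable behaviour of parseAbs and renderAbs, the internal representation behind AbsPath can change without the API silently changing its semantics.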