Future readers might also want to search for "file URI scheme RFC" and find the latest version. If you're a programmer, read the RFC. This post is meant to raise awareness of some of the issues around encoding file paths as URIs, but it's not a substitute for the RFC.

Recently I've been running into interop problems because some platforms are unable to parse file:/foo/bar. But this is not the first time I've had trouble with file paths represented as URIs. Considering that the notion of a filesystem goes back to the 1960s, and URLs have been around since the 1990s, it's surprising that we haven't come to a consensus on this. But then again, like decimal numbers, once we start digging deeper, or start exchanging data, we find some glitches in the Matrix.

what are file paths?

The following is by no means an exhaustive list, but it covers most of the paths that we come across on popular operating systems like macOS, Linux, and Windows:

For our purposes, we can think of it mostly as the path component of a URI, which then gets applied to some target URI.

absolute paths on Unix-like filesystem

An absolute path on a Unix-like filesystem, such as /etc/hosts, should be encoded using u3 notation, file:///etc/hosts, to maximize compatibility with current and previous RFCs.

The current RFC 8089 allows /etc/hosts to be encoded in u1, u2, and u3 notations:

file:/etc/hosts

file://localhost/etc/hosts

file:///etc/hosts
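As a quick sanity check, here is a minimal sketch that parses the three notations with java.net.URI and shows that they differ only in the authority part. The object name FileUriNotations and the describe helper are made up for illustration, and this only tells us what one parser on the JVM does; other platforms may disagree.

```scala
import java.net.URI

object FileUriNotations {
  // The three notations that RFC 8089 allows for /etc/hosts.
  val u1 = new URI("file:/etc/hosts")            // no authority at all
  val u2 = new URI("file://localhost/etc/hosts") // explicit host
  val u3 = new URI("file:///etc/hosts")          // empty authority

  // Hypothetical helper: show how java.net.URI split a given notation.
  def describe(uri: URI): String =
    s"host=${uri.getHost} path=${uri.getPath}"

  def main(args: Array[String]): Unit =
    List(u1, u2, u3).foreach(u => println(s"$u -> ${describe(u)}"))
}
```

All three yield the same path, /etc/hosts; only u2 reports a host.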

But the problem is that RFC 8089 came out in February 2017, and there have been plenty of programs and libraries written prior to 2017. RFC 1738, which came out in 1994, defines URLs, and its section 3.10 FILES defines the file scheme as

file://<host>/<path>

and

As a special case, <host> can be the string "localhost" or the empty string; this is interpreted as 'the machine from which the URL is being interpreted'.

In other words, RFC 1738 requires u2 notation or u3 notation. This is further confirmed in RFC 3986 and Kerwin 2013 Draft examples. So if we encode using u1 notation, it might be legal for RFC 8089, but other programs may not be able to parse it correctly.
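On the JVM, one way to control which notation comes out is the four-argument java.net.URI constructor. The sketch below relies on my understanding that an empty-string host produces an empty authority (u3), while a null host omits the authority (u1). UnixFileUri, toU3, and toU1 are hypothetical names, not part of any library.

```scala
import java.net.URI

object UnixFileUri {
  // Hypothetical helper: encode an absolute Unix path in u3 notation.
  // The empty (not null) host makes the URI carry an empty authority.
  def toU3(absolutePath: String): URI = {
    require(absolutePath.startsWith("/"), s"not an absolute path: $absolutePath")
    new URI("file", "", absolutePath, null)
  }

  // Hypothetical helper: the same path in u1 notation, for comparison.
  // A null host leaves the authority out entirely.
  def toU1(absolutePath: String): URI = {
    require(absolutePath.startsWith("/"), s"not an absolute path: $absolutePath")
    new URI("file", null, absolutePath, null)
  }
}
```

For example, UnixFileUri.toU3("/etc/hosts") renders as file:///etc/hosts, whereas UnixFileUri.toU1("/etc/hosts") renders as file:/etc/hosts, which is the form older parsers may reject.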

absolute path on Windows filesystem

An absolute path on a Windows filesystem, such as C:\Documents and Settings\, should be encoded using u3 notation, file:///C:/Documents%20and%20Settings/, to maximize compatibility with current and previous RFCs.

In addition to RFC 1738, there's another interesting source: a post titled File URIs in Windows, written by Dave Risney for the Internet Explorer Team Blog in 2006. The post states that C:\Documents and Settings\davris\FileSchemeURIs.doc should be encoded as file:///C:/Documents%20and%20Settings/davris/FileSchemeURIs.doc.
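The conversion can be sketched as plain string handling, so it does not depend on which operating system the JVM is running on. WindowsFileUri and toU3 are hypothetical names, and the sketch assumes a drive-letter path only (no UNC paths, no relative paths).

```scala
import java.net.URI

object WindowsFileUri {
  // Hypothetical helper: encode an absolute Windows path that starts with a
  // drive letter in u3 notation, without touching the local filesystem.
  def toU3(windowsPath: String): URI = {
    require(
      windowsPath.length >= 3 &&
        windowsPath(0).isLetter &&
        windowsPath(1) == ':' &&
        windowsPath(2) == '\\',
      s"expected a drive-letter path: $windowsPath"
    )
    // Slash conversion, plus a leading slash so that the drive letter
    // becomes the first segment of the URI path.
    val slashed = "/" + windowsPath.replace('\\', '/')
    // The empty host yields an empty authority; the constructor
    // percent-encodes characters such as spaces.
    new URI("file", "", slashed, null)
  }
}
```

With this, WindowsFileUri.toU3("C:\\Documents and Settings\\") renders as file:///C:/Documents%20and%20Settings/.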

In Scala/Java, java.nio.file.Path#toUri works only when you run it on Windows:
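Here is a small sketch of that platform dependence. PathToUriDemo and its members are names I made up. On Windows I would expect the result to start with file:///C:/; on other systems the backslashes are treated as ordinary file name characters and the path is resolved against the current working directory, so the output depends on where you run it.

```scala
import java.nio.file.Paths

object PathToUriDemo {
  // Hypothetical helper: detect Windows from the os.name system property.
  def isWindows: Boolean =
    sys.props.getOrElse("os.name", "").toLowerCase.startsWith("windows")

  // Path#toUri interprets the string using the *current* platform's
  // filesystem rules, so the same input gives different URIs per OS.
  def toUriString(path: String): String =
    Paths.get(path).toUri.toString

  def main(args: Array[String]): Unit = {
    val uri = toUriString("C:\\Documents and Settings\\")
    println(s"isWindows=$isWindows uri=$uri")
  }
}
```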

RFC 8089 Appendix E.2 (DOS and Windows Drive Letters) describes a drive-letter form:

This is intended to support the minimal representation of a local file in a DOS- or Windows-like environment, with no authority field and an absolute path that begins with a drive letter. For example:

file:c:/path/to/file

Accommodating u0 notation for Windows absolute paths opens the door to an elegant conversion from any absolute file path to a URI: just prepend file: to the path after slash conversion. But this does not work by default:

scala> import java.io.File, java.net.URI
scala> new File(new URI("file:C:/Documents%20and%20Settings/"))
java.lang.IllegalArgumentException: URI is not hierarchical
at java.io.File.<init>(File.java:418)