The great demise of the file

I recently had an HDD failure and had to dig deep into the filesystem to recover some data. Pain in the ass. While navigating crummy hidden folders to recover weirdly named files, an epiphany hit me: The File is dying. The metaphorical role of a file is dying. The file used to be the beginning and the end. It contained an essay, a book, a photo, a song, a todo list, a business card… To give you a document, I would put the file containing it on a disk. Or on a USB stick. Or I would send it via email.

The piece of information was the file and vice versa: the file was the piece of information.

But not so much anymore. The email program does not operate on files. Neither does the Twitter client, nor the todo-tracking application. They store data in well-hidden databases in internal directories that we’re not supposed to look at. And quite rightly, because there is nothing to see there. Looking at a huge photos.db, you can only wonder where your dog photos are.

Sure, the applications still give you a way to share things and take them out of the storage. You can export a contact from your address book as a vCard file. But the role of The File here is slowly being reduced to that of an intermediate storage medium. The business card is temporarily put in a .vcf file before it gets injected into somebody else’s database (another address book?).
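To make the "intermediate medium" idea concrete, here is a minimal sketch in Python of a contact leaving one address book as vCard text and landing in another. The function names and the contact fields (`given`, `family`, `email`) are made up for illustration, and the parser assumes simple one-line properties only; real vCard files also use line folding, escaping and property parameters, which this ignores.

```python
def export_vcard(contact):
    """Serialize a contact dict into minimal vCard 3.0 text (hypothetical helper)."""
    lines = [
        "BEGIN:VCARD",
        "VERSION:3.0",
        # N holds the structured name: family;given;additional;prefix;suffix
        f"N:{contact['family']};{contact['given']};;;",
        f"FN:{contact['given']} {contact['family']}",
        f"EMAIL:{contact['email']}",
        "END:VCARD",
    ]
    return "\r\n".join(lines) + "\r\n"


def import_vcard(text):
    """Read the simple vCard produced above back into a contact dict."""
    fields = {}
    for line in text.splitlines():
        key, _, value = line.partition(":")
        fields[key] = value
    family, given = fields["N"].split(";")[:2]
    return {"given": given, "family": family, "email": fields["EMAIL"]}


# The .vcf text only exists for the trip between two "databases".
contact = {"given": "Ada", "family": "Lovelace", "email": "ada@example.com"}
card = export_vcard(contact)
restored = import_vcard(card)
```

The file here is just a courier: nobody keeps the .vcf around once the receiving address book has swallowed its contents.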

As more and more applications operate on databases, the computer is becoming a monolithic black box that “has things”. How exactly (and where) the data is stored is becoming less clear. The application and its interface become united with the user data. They become one.

And if I own your user interface, I own your data.

Not so good

Databases as storage mediums solve problems by creating new ones.

I've got a directory on my disk that contains stuff related to my company – documents, invoices and some other content which is still mostly file-based. But I also have a todo list related to my work things and some notes about current projects. These are stored in two separate applications that operate on their own storage. So now, if I copy the whole company directory to another medium, I'm not actually copying everything. This seemingly trivial issue is extremely confusing for most users, who are not familiar with the application internals.

There are other technical issues as well. Apple’s Time Machine is a good example here. It’s a great backup service, but it was designed to operate on files. When a file is modified locally, Time Machine copies it to backup storage. This way it’s fast and incremental – the changes are small and isolated. You can even browse past versions of the file. But if your “file” is a monolithic 1GB notes & pictures database (it’s “everything”), the whole concept falls apart: any small change modifies that one file, so the whole thing has to be copied again (Yojimbo is one example).
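The cost difference is easy to see in a toy model. The sketch below is not how Time Machine is implemented; it is a simplified file-level backup, assumed for illustration, that copies every file whose content digest changed since the last snapshot. The names `snapshot` and `bytes_to_copy` and the sample data are hypothetical.

```python
import hashlib


def snapshot(files):
    """Map each file name to a SHA-256 digest of its contents."""
    return {name: hashlib.sha256(data).hexdigest() for name, data in files.items()}


def bytes_to_copy(previous, files):
    """Bytes a file-level backup must copy: every new or changed file, in full."""
    current = snapshot(files)
    return sum(
        len(files[name])
        for name, digest in current.items()
        if previous.get(name) != digest
    )


# Layout A: 1000 separate note files of 1000 bytes each; one note is edited.
notes = {f"note-{i}.txt": b"x" * 1000 for i in range(1000)}
before = snapshot(notes)
notes["note-7.txt"] = b"y" * 1000
per_file_cost = bytes_to_copy(before, notes)

# Layout B: the same notes packed into one database file; the same edit is made.
db = {"notes.db": b"x" * 1_000_000}
before = snapshot(db)
db["notes.db"] = b"x" * 7000 + b"y" * 1000 + b"x" * 992_000
monolith_cost = bytes_to_copy(before, db)

print(per_file_cost, monolith_cost)
```

With separate files the backup copies only the one edited note. With the single database file the same edit forces a copy of the entire megabyte, and the gap grows with the size of the database.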

Meet the service

Not surprisingly, we stopped talking “files” a long time ago. We’re talking “services” now. In some cases it works great: for example, I don’t really care how my Linux, Mac or iPhone email clients store the mail data. I don’t care because the central storage (my IMAP server) is out there and it has a clear, well-defined service API. It’s transparent.

Too bad that email is a lonely exception here. A significant portion of the data we generate today is put into totally black-boxed, closed services that are very open to taking our data but not very keen on giving it back in a sane way. This is often a deliberate business practice, but sometimes it is just a lack of thought. Social services like Facebook come to mind. Google does not score any better.

The great escape

Should we stop defending the lost trenches of The File? Should we fight for more open services? Should we get concerned about freedom of data storage? Should we acknowledge that open code is not enough? Should I buy a new guitar?

Could be. But remember the hard lesson from the web: convenience always wins over integrity.