Storage and beyond

POHMELFS

In the meantime I rewrote pohmelfs from scratch, and it is entering the heavy-testing stage.
As promised, it became just a POSIX frontend to the elliptics network with weak synchronization. By using elliptics as its backend, it gains support for multiple copies, atomic transactions (within a single replica), multiple-datacenter support with IO balancing, checksums, namespaces and so on.

And by ‘weak synchronization’ here I mean that writes are not visible to other users who mounted the external storage until the writer performs a sync or the writer’s host system decides to write back dirty pages to the storage.
This actually mirrors the behaviour of the VFS in all modern OSes – we write data into the page cache, and if the system hits a power failure, that data is lost. Moreover, users are not synchronized in any way: if one of them removes a file, another will only detect that after reading the directory again (or trying to open/access the given filename).
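To make the semantics concrete, here is a toy model of weak synchronization: writes live in the writer's local page cache and only reach the shared storage on sync(). All class and method names are hypothetical, for illustration only – this is not pohmelfs code.

```python
# Toy model of weakly synchronized writeback: a writer's data is
# invisible to other clients until it syncs its dirty pages.

class Storage:
    def __init__(self):
        self.objects = {}

class Client:
    def __init__(self, storage):
        self.storage = storage
        self.dirty = {}          # local page cache: key -> data

    def write(self, key, data):
        self.dirty[key] = data   # not visible to other clients yet

    def sync(self):
        # write back dirty pages to the shared storage
        self.storage.objects.update(self.dirty)
        self.dirty.clear()

    def read(self, key):
        # local dirty pages win over storage, like a page cache
        return self.dirty.get(key, self.storage.objects.get(key))

storage = Storage()
writer, reader = Client(storage), Client(storage)
writer.write("file", b"hello")
print(reader.read("file"))   # None: nothing synced yet
writer.sync()
print(reader.read("file"))   # b'hello'
```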

There is a very interesting approach I use for directory listing. We store directory information as a record indexed by the directory’s key id. It is updated atomically (within a single replica; multiple replicas are updated independently) for every written/removed object and holds the whole inode, indexed by dentry name.
Directory listing just reads that whole directory structure and parses it, adding inodes/dentries not at lookup time (that is supported too, of course) but at readdir time. Since records are stored as single contiguous areas in elliptics, we only have to download this blob and iterate over it sequentially to complete the listing, without multiple server lookups per name.
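A minimal sketch of this single-blob scheme: each entry is packed as (name, inode fields) into one contiguous record, and readdir parses the downloaded blob in a single sequential pass with no per-name server lookup. The wire format below is made up for illustration; the real record layout is pohmelfs-specific.

```python
# Pack a directory into one contiguous blob and list it in one pass.
import struct

HDR = "<IQQ"  # name length, inode number, size (hypothetical fields)

def pack_dir(entries):
    """entries: list of (name, inode_number, size) tuples."""
    blob = b""
    for name, ino, size in entries:
        raw = name.encode()
        blob += struct.pack(HDR, len(raw), ino, size) + raw
    return blob

def readdir(blob):
    """One sequential iteration over the downloaded blob."""
    off, out = 0, []
    while off < len(blob):
        nlen, ino, size = struct.unpack_from(HDR, blob, off)
        off += struct.calcsize(HDR)
        name = blob[off:off + nlen].decode()
        off += nlen
        out.append((name, ino, size))
    return out

blob = pack_dir([("a.txt", 2, 10), ("b.txt", 3, 20)])
print(readdir(blob))   # [('a.txt', 2, 10), ('b.txt', 3, 20)]
```

The point is the access pattern: one download, one linear parse, instead of a round trip per name.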

But we have to clean up the parent dentry list every time we are about to perform a directory listing, since other users may have deleted or renamed some files. For example, rsync first creates a ‘.blah.random-crap’ file and then renames it to ‘blah’ when the copy is completed, which previously resulted in two files having the same inode and id.
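A sketch of that invalidation, under illustrative names: the cached dentry list is dropped and rebuilt from the fresh listing on every readdir, so a rename performed by another client (rsync's temp-file pattern) does not leave both the temporary and the final name cached against the same inode.

```python
# Drop stale cached dentries before listing, then rebuild from the
# freshly fetched directory record.

class DirCache:
    def __init__(self):
        self.dentries = {}       # name -> inode number

    def readdir(self, fetch_listing):
        self.dentries.clear()    # other users may have removed/renamed files
        for name, ino in fetch_listing():
            self.dentries[name] = ino
        return sorted(self.dentries)

# simulated remote directory record shared with another client
remote = {".blah.random-crap": 42}
cache = DirCache()
cache.readdir(lambda: remote.items())
# the other client finishes the copy and renames, rsync-style
remote["blah"] = remote.pop(".blah.random-crap")
print(cache.readdir(lambda: remote.items()))   # ['blah'] only, no stale temp name
```

Without the clear(), the cache would still list both names for inode 42.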

There is no hardlink support yet, nor balanced/random reads from multiple replicas, nor quorum reads, where we try to reach multiple replicas and select the one with the latest consistent data. Writes should also support a quorum option (at least marking pages as dirty again if a write did not reach the quorum or the requested number of replicas).
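The planned quorum read could look roughly like this generic scheme (not pohmelfs code, and the timestamp-based versioning is an assumption): query all replicas, require at least a majority to answer, and return the value with the newest version among the answers.

```python
# Generic quorum-read sketch: majority of replicas must respond, and
# the record with the newest timestamp wins.

def quorum_read(replicas, key):
    answers = []
    for replica in replicas:
        record = replica.get(key)        # (timestamp, data) or None
        if record is not None:
            answers.append(record)
    quorum = len(replicas) // 2 + 1
    if len(answers) < quorum:
        raise IOError("quorum not reached")
    return max(answers)[1]               # newest timestamp wins

# two of three replicas answer; the one with timestamp 2 is freshest
replicas = [{"k": (1, b"old")}, {"k": (2, b"new")}, {}]
print(quorum_read(replicas, "k"))        # b'new'
```

The write-side counterpart would be symmetric: if fewer than the requested number of replicas acknowledge, the pages are marked dirty again and retried later.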

I also plan to add column read/write (definitely not a POSIX interface, but a kind of file-is-a-directory feature).
We also want HTTP API compatibility with elliptics, i.e. we write data via pohmelfs and read it via an HTTP (or any other) client, which uses the default id-is-a-name-hash approach.
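The id-is-a-name-hash convention can be sketched as follows: both pohmelfs and any other elliptics client derive the object id from the file name with the same hash function, so data written through one interface is addressable through the other. The choice of sha512 below is only an example; the actual transform is whatever the elliptics deployment is configured with.

```python
# Both sides compute the object id from the name with the same hash,
# so no shared lookup table is needed.
import hashlib

def object_id(name):
    return hashlib.sha512(name.encode()).hexdigest()

# a pohmelfs writer and an HTTP reader agree on the id independently
writer_id = object_id("photo.jpg")
reader_id = object_id("photo.jpg")
assert writer_id == reader_id
print(writer_id[:16])
```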

All of this is planned for future releases though; I plan to submit a new stable version in December, ideally within a week or two.
Stay tuned!