EPRD – An eventually persistent ramdisk / disk cache

Today I uploaded a kernel project that I call EPRD. This kernel module allows you to create a persistent RAM disk. It can also be used to cache disk I/O in DRAM. Of course, this comes with all the dangers that volatile RAM introduces. EPRD does, however, support barriers. When barriers are enabled, any sync() on the file system causes EPRD to flush all dirty buffers to disk. It also allows setting a commit interval for flushing dirty buffers to disk.
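The write-back behavior described above can be sketched roughly as follows. This is a Python toy model with made-up names, not EPRD's actual internals (which live in a kernel module): writes land in RAM and only reach the backing store on a sync() barrier or when the commit interval expires.

```python
import time

class EventuallyPersistentCache:
    """Toy model of an eventually persistent write cache (hypothetical
    names, not EPRD's real code)."""

    def __init__(self, backing: dict, commit_interval: float = 5.0):
        self.backing = backing              # stands in for the disk
        self.dirty = {}                     # sector -> data, RAM only
        self.commit_interval = commit_interval
        self.last_flush = time.monotonic()

    def write(self, sector: int, data: bytes) -> None:
        self.dirty[sector] = data           # volatile until flushed
        if time.monotonic() - self.last_flush >= self.commit_interval:
            self.sync()                     # periodic commit

    def read(self, sector: int) -> bytes:
        # Dirty (newest) data wins over the backing store.
        return self.dirty.get(sector, self.backing.get(sector, b""))

    def sync(self) -> None:
        """Barrier: push every dirty buffer to the backing store."""
        self.backing.update(self.dirty)
        self.dirty.clear()
        self.last_flush = time.monotonic()

disk = {}
cache = EventuallyPersistentCache(disk, commit_interval=5.0)
cache.write(0, b"hello")
assert disk == {}           # still only in RAM: fast but volatile
cache.sync()                # barrier -> durable
assert disk[0] == b"hello"
```

The point of the model is the trade-off: between barriers, everything lives in RAM (hence the IOPS gain and the risk), and only sync() or the commit timer makes data durable.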

This project can be useful whenever one needs more IOPS in a non-critical environment. There is more to this project, though. I am working on a kernel-based, high-performance deduplicating block device. That project will share code and ideas from both EPRD and Lessfs.

Hi Mark,
Thanks for your great new project!
I had tried to find many solutions for a VM filesystem implementation until I found your project!
What do you think about a new feature for EPRD, like multi-tiering and caching the most-used data blocks?
My idea is to use EPRD as a raw device for LVM and KVM virtualization. What do you think about that? (I know it's not safe.)
Thanks so much

The latest version of EPRD will allow you to create a storage solution for, for example, VMware or KVM. EPRD can now transparently export any disk; there is no need to reformat the drive. EPRD now supports sector sizes ranging from 512 to 4096 bytes.

Testing with SCST + EPRD to export an EPRD drive to VMware was very promising. A Windows 2008 guest machine running on VMware was capable of doing 20k IOPS on a SATA drive, while the SATA drive itself can handle about 150 IOPS.

Multi-tiering is a possibility, even with de-duplication.
I'll give it some thought.

Hi Mark,
Thanks for the reply; a very promising solution!
My idea for a multi-tier solution is, for example, RAM+SSD+HDD, destaging data blocks using some sort of per-device weighting and the most-wanted data blocks (please take a look at this discussion: https://bbs.archlinux.org/viewtopic.php?id=113529).
At the moment I have tried using EPRD as a physical volume in LVM, and it works! So I can use it as a raw volume for my VMs.
The best solution would be a multi-tier and dedupe filesystem! But that is another story.
Thanks so much for your work

bcache is certainly interesting, as is flashcache.
The problem with bcache is that it comes with its own kernel code instead of a clean patch that works with most kernels.
Flashcache works well but, as I recall, requires you to reformat your drive.

Judging from the thread at https://www.redhat.com/archives/dm-devel/2012-March/msg00078.html,
having bcache merged into the kernel is not very likely. Furthermore, bcache is not the same as EPRD: it does read caching, so it is less specifically aimed, and as far as I recall it does not respect barriers, which is a very big problem IMHO.

I am trying to use hamsterdb with 1.5.9.
I used the default lessfs.etc-hamster and created the directories.
mklessfs -f -c /etc/lessfs.cfg works fine and creates the files.
lessfs /etc/lessfs.cfg /usr/lessfs does nothing, and /var/log/messages says:
Apr 19 12:24:01 localhost lessfs[10648]: ham_env_open failed to open the databases() returned error -8

This is somewhat like the many projects aimed at caching HDDs with SSDs; however, none of the others clearly separates the two tasks of caching/optimizing writes and caching/optimizing reads, and they are mainly concerned with optimizing reads. Yours is the only one concerned only with writes, which is more interesting for me, and it seems you did a great job! (I have not tried it yet.)

– I would suggest you consider the existence of SSDs and combine your EPRD with another layer; let's call it DW (dual writer). You would use two backend devices, one SSD and one HDD. You write data simultaneously to the SSD and the HDD, you return completion to the application when the SSD returns completion, but you unmap the data from the SSD as soon as the HDD returns completion, so that SSD space can be reused later. In this way you will be able to sustain very long bursts of I/O at the speed of an SSD, with the data eventually ending up on the HDD. You only need to slow down I/O if the HDD lags so far behind that the space on the SSD is exhausted.
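The dual-writer idea above can be modeled in a few lines. This is a hypothetical sketch with invented names (DualWriter, hdd_completed, etc.), not part of EPRD: writes are acknowledged at SSD speed, and the SSD slot is unmapped once the slower HDD confirms.

```python
class DualWriter:
    """Toy model of the proposed 'DW' layer (hypothetical, not EPRD code).
    Every write goes to both devices: completion is reported as soon as
    the fast SSD acks; the SSD copy is unmapped once the HDD confirms."""

    def __init__(self):
        self.ssd = {}        # fast staging area (lba -> data)
        self.hdd = {}        # durable backing store
        self.inflight = {}   # writes the HDD has not acked yet

    def write(self, lba: int, data: bytes) -> None:
        self.ssd[lba] = data        # SSD write: completes quickly
        self.inflight[lba] = data   # HDD write proceeds in the background
        # ...completion is returned to the caller here, at SSD speed.

    def hdd_completed(self, lba: int) -> None:
        """Called when the (asynchronous) HDD write finishes."""
        self.hdd[lba] = self.inflight.pop(lba)
        del self.ssd[lba]           # unmap: reclaim the SSD space

    def ssd_is_full(self, capacity: int) -> bool:
        """Only when this is true must upstream I/O be throttled."""
        return len(self.ssd) >= capacity

dw = DualWriter()
dw.write(7, b"burst")
assert 7 in dw.ssd and 7 not in dw.hdd   # acked at SSD speed
dw.hdd_completed(7)
assert 7 in dw.hdd and 7 not in dw.ssd   # SSD slot reclaimed
```

The key design point is that throttling is needed only when the staging area fills, which is exactly the "very long bursts at SSD speed" property described above.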

– I don't know if you already do this, but I suggest writing data to the HDD in LBA order (at each flush) to minimize seeks (i.e., reimplement the block-device queue scheduler). If you eventually create the Dual Writer, there is an additional optimization you can do: X upstream flushes should result in X flushes to the SSD downstream (this is needed for barrier correctness), but for the HDD you can do it in fewer than X flushes, and at every flush to the HDD you can sort the blocks in LBA order. So you can actually sort blocks across upstream flush requests (i.e., across barriers), eventually obtaining a much higher HDD throughput.
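The LBA-ordering optimization is simple to illustrate. The sketch below is illustrative only (not EPRD's actual scheduler): dirty blocks accumulated across several upstream flushes are emitted to the HDD in one ascending-LBA sweep, since barriers have already been honored on the SSD side.

```python
def flush_to_hdd(dirty: dict) -> list:
    """Hypothetical helper: return the accumulated dirty blocks in
    ascending LBA order so the HDD services them with minimal head
    seeking. Blocks gathered across several upstream flushes can be
    merged into a single sorted pass."""
    return sorted(dirty.items())  # (lba, data) pairs, ascending by LBA

# Writes arrive in seek-heavy order, possibly across two barriers...
dirty = {900: b"c", 5: b"a", 412: b"b"}
# ...but go to the HDD in one sequential sweep:
assert [lba for lba, _ in flush_to_hdd(dirty)] == [5, 412, 900]
```

A real implementation would use an elevator-style sorted structure (as the kernel's I/O schedulers do) rather than re-sorting a dict, but the throughput argument is the same: fewer, larger, sequential HDD passes.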

– Also, I suggest publicizing your project on the dm-devel mailing list ASAP, because Mike Snitzer has said he is working on bringing some caching/HSM mechanism to DM; see: https://www.redhat.com/archives/dm-devel/2012-March/msg00069.html
Try to pitch your EPRD to him, especially if you can implement the Dual Writer so that it can be presented as a RAM+SSD hybrid caching mechanism for writes.

Now, speaking of lessfs: I would really like to see snapshotting in there, or maybe BTRFS_IOC_CLONE (a.k.a. cp --reflink), which can be used to implement a poor man's snapshotting mechanism (i.e., not instantaneous for a whole directory hierarchy, but close enough), but can also be used to clone single files, which is extremely useful in itself. Please don't stop the 1.x project just yet if it is at all possible to implement BTRFS_IOC_CLONE in there (maybe with a small kernel patch for FUSE to be pushed upstream).

I'm working on a similar project, and I was wondering if you ever faced issues reading the image file from within the kernel. I'm trying to use vfs_read to read my image file, but the call freezes! My read routine looks like yours, and I don't see you doing anything different, so I'm not sure what's causing my reads to hang.