Say I mount some cloud storage (Amazon Cloud Drive in my case) with a FUSE client at /mnt/cloud. Because reading and writing files directly under /mnt/cloud is slow (everything has to go over the internet), I want to cache the files that I read from and write to cloud storage. Since I might write a lot of data at a time, the cache should sit on my disk and not in RAM. But I don't want to replicate the entire cloud storage on my disk, because my disk may be too small.

So I want a cached view of /mnt/cloud mounted at /mnt/cloud_cache, which uses another path, say /var/cache/cloud, as the caching location.

If I now read /mnt/cloud_cache/file, I want the following to happen:

1. Check whether file is cached at /var/cache/cloud/file.

2. If cached: check that the file in the cache is up-to-date by fetching the modtime and/or checksum from /mnt/cloud. If it is up-to-date, serve the file from the cache; otherwise go to step 3.

3. If not cached or the cache is out-of-date: copy /mnt/cloud/file to /var/cache/cloud/file and serve it from the cache.
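
The read path above can be sketched in a few lines of Python. This is only an illustration of the logic, not a FUSE file system: the function names `is_fresh` and `cached_read` are made up for this example, freshness is judged by size and modification time only (no checksum), and nothing is ever evicted from the cache.

```python
import os
import shutil


def is_fresh(source_path, cache_path):
    """Return True if the cached copy matches the source's size and mtime."""
    if not os.path.exists(cache_path):
        return False
    src, cached = os.stat(source_path), os.stat(cache_path)
    # Compare whole seconds so differing timestamp granularity does not matter.
    return (src.st_size == cached.st_size
            and int(src.st_mtime) == int(cached.st_mtime))


def cached_read(relative_path, source_root="/mnt/cloud",
                cache_root="/var/cache/cloud"):
    """Serve a file from the cache, refreshing it from the source if needed."""
    source_path = os.path.join(source_root, relative_path)
    cache_path = os.path.join(cache_root, relative_path)
    if not is_fresh(source_path, cache_path):
        os.makedirs(os.path.dirname(cache_path), exist_ok=True)
        # copy2 also copies the modification time, which is_fresh relies on.
        shutil.copy2(source_path, cache_path)
    with open(cache_path, "rb") as handle:
        return handle.read()
```

Note that even the freshness check costs a stat call on the slow mount for every read, so a real implementation would probably want to cache metadata for a short time as well.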

When I write to /mnt/cloud_cache/file, I want this to happen:

1. Write to /var/cache/cloud/file and record in a journal that file needs to be written back to /mnt/cloud.

2. Wait for the write to /var/cache/cloud/file to finish and/or for previous write-backs to /mnt/cloud to complete.
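
A rough Python sketch of the write path, under several assumptions of mine: the journal is a plain text file with one JSON record per line, `cached_write` and `write_back` are hypothetical names, and the write-back step simply copies each journaled file to the slow mount and then deletes the journal. It is single-process and does no locking, so it only shows the idea.

```python
import json
import os
import shutil


def cached_write(relative_path, data, cache_root, journal_path):
    """Write data to the cache and journal the path for later write-back."""
    cache_path = os.path.join(cache_root, relative_path)
    os.makedirs(os.path.dirname(cache_path), exist_ok=True)
    with open(cache_path, "wb") as handle:
        handle.write(data)
        handle.flush()
        os.fsync(handle.fileno())  # data is on disk before it is journaled
    with open(journal_path, "a", encoding="utf-8") as journal:
        journal.write(json.dumps({"path": relative_path}) + "\n")


def write_back(source_root, cache_root, journal_path):
    """Copy every journaled file to the slow mount, then clear the journal."""
    if not os.path.exists(journal_path):
        return []
    with open(journal_path, encoding="utf-8") as journal:
        pending = [json.loads(line)["path"] for line in journal if line.strip()]
    done = []
    for relative_path in dict.fromkeys(pending):  # drop duplicates, keep order
        target = os.path.join(source_root, relative_path)
        os.makedirs(os.path.dirname(target), exist_ok=True)
        shutil.copy2(os.path.join(cache_root, relative_path), target)
        done.append(relative_path)
    os.remove(journal_path)
    return done
```

Calling `write_back` from a background thread or a timer would let writes return as soon as the data is in the cache, which is the point of journaling in the first place.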

gcsfuse (https://github.com/GoogleCloudPlatform/gcsfuse): I think this does exactly what I want, but it is integrated with Google Cloud Storage. To make it work in general, I would have to hack it and replace every access to GCS with either local file accesses under the given mount point or accesses to Amazon Cloud Drive.

Curious if you ever found a solution? I'm looking for a similar caching layer with requirements similar to yours.
– SS44 Jan 20 '17 at 16:28

bitbucket.org/nikratio/s3ql does pretty much what I want. Unfortunately, it doesn't play too nicely with Amazon Cloud Drive in particular (mainly ACD's fault, for lacking a good Linux client).
– Flecto Jan 24 '17 at 20:16

I've used s3ql in the past myself, but having migrated my files over to ACD seems to limit its use with that provider. I did run into data consistency problems with s3ql when data collections were > 2TB. RClone seems promising but is missing that vital caching piece.
– SS44 Jan 25 '17 at 21:33

If you are seriously interested in that - we can write it in C++, using tmpfs and stat.
– GOST Mar 6 '17 at 15:08

It is possible to use FS-Cache/CacheFS to cache a FUSE-mounted file system by adding an NFS indirection in between. If your FUSE mount is on /fusefs, share it to yourself over NFS by writing this in /etc/exports: