This blog is about the Google Summer of Code project "ZFS filesystem for FUSE/Linux"

Tuesday, December 26, 2006

First alpha of ZFS on FUSE with write support

Ladies (?) and gentlemen, the first preview of ZFS on FUSE/Linux with full write support is finally here!

You can consider it my (late) Christmas gift for the Linux community ;)

Don't forget this is an alpha-quality release. Testing has been very limited.

Performance sucks right now, but it should improve before 0.4.0 final, once the multi-threaded event loop and kernel caching support are working (both should be easy to implement, since FUSE itself provides the kernel caching).

For more information, see the README and the STATUS file for working/not working features. Download here.

Awesome! I compared a zpool backed by a single file (rather than a partition) against ext2 on a loopback device backed by a single file. With bonnie++, I was impressed to see that zfs-fuse was only 10-20% slower than ext2.
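For reference, a file-backed comparison like this one can be set up along the following lines. The pool name, file paths, and mount point are made up for illustration; run as root, with a regular user for bonnie++ (it refuses to run as root by default):

```shell
# ZFS side: a 1 GB backing file used as a single file vdev.
# zpool requires file vdevs to be given by absolute path.
dd if=/dev/zero of=/var/tmp/zfs.img bs=1M count=1024
zpool create testpool /var/tmp/zfs.img   # mounts at /testpool by default

# ext2 side: a same-sized file on a loopback device.
dd if=/dev/zero of=/var/tmp/ext2.img bs=1M count=1024
mke2fs -F /var/tmp/ext2.img
mkdir -p /mnt/ext2
mount -o loop /var/tmp/ext2.img /mnt/ext2

# Run bonnie++ against each mount point as an unprivileged user.
bonnie++ -d /testpool -u nobody
bonnie++ -d /mnt/ext2 -u nobody
```

Note that with a file vdev, ZFS's I/O still goes through the host filesystem holding the backing file, which is one reason this setup is not a level playing field.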

For fun, check out what happens when you turn compression on and run bonnie++. The bonnie++ test files compress 28x, and the read and write rates quadruple! It's not a realistic scenario, but interesting to see.

I also tried turning off checksums to see if that had any noticeable impact on speed. Much to my surprise, with checksumming off the read rate dropped by 20%! I don't understand how that could be possible, though...
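Both of the toggles used in these tests are per-dataset ZFS properties. A sketch, assuming a pool named testpool (any pool or filesystem name works in its place):

```shell
# Enable compression for the pool's root dataset (lzjb by default):
zfs set compression=on testpool

# Disable checksumming (only sensible for benchmarks -- this gives up
# ZFS's end-to-end data integrity checking):
zfs set checksum=off testpool

# Inspect the current values:
zfs get compression,checksum testpool
```

Child datasets inherit these properties unless they override them, so setting them on the pool's root dataset affects everything below it.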

I have lots of bugs to report, but I can never seem to access http://developer.berlios.de/projects/zfs-fuse/. All I get is timeouts. I've tried several proxies, but nothing. I've even checked the uptime at http://www.siteuptime.com, but it says: "Quick check for: developer.berlios.de/projects/zfs-fuse — Failed".

To follow up: I next tried zfs-fuse and ext2 on an LVM2 logical volume (one layer closer to the metal than my file vdev test). It's clear the loopback in my previous test penalized ext2 more than zfs-fuse. With logical volume-based vdevs, reads and writes are 40% slower with zfs-fuse than with ext2 (as measured by bonnie++).
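A logical-volume-based comparison like this might look roughly as follows. The volume group name (vg0) and LV names are illustrative; this assumes an existing volume group with free space:

```shell
# Carve two equal logical volumes out of volume group "vg0":
lvcreate -L 10G -n zfstest vg0
lvcreate -L 10G -n ext2test vg0

# Use one LV as a vdev for a new pool:
zpool create testpool /dev/vg0/zfstest

# Put ext2 on the other and mount it:
mke2fs /dev/vg0/ext2test
mkdir -p /mnt/ext2
mount /dev/vg0/ext2test /mnt/ext2
```

This removes the loopback layer from the ext2 side, which is why it is a fairer test than the file-backed setup above.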

For a more apples-to-apples comparison, I should be testing zfs with physical disk vdevs against a journaling filesystem on LVM2. Right now ext2 is whipping zfs-fuse, but ext2 also provides no filesystem integrity guarantees or disk spanning. For that, you need something like ext3 + LVM2, which would be a more fair match.

BTW: I have one issue with ZFS. After creating e.g. test/users/mneisen and cd'ing into that directory, I cannot check out a Subversion or git project. The creation of the .git or .svn directory (or some of its contents) fails miserably, although the current user has all rights in this directory. root does not have this limitation. What am I doing wrong?

Yesterday I redid my bonnie++ tests with zfs-fuse LD_PRELOAD'ed with Google's tcmalloc library (a high performance malloc implementation) and found it shaved a minute off the compressed test and over 1 minute 30s off the uncompressed tests.
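LD_PRELOAD substitutes tcmalloc for the system malloc without recompiling anything. A minimal sketch (the library path varies by distribution, so this one is an assumption):

```shell
# Start the zfs-fuse daemon with Google's tcmalloc in place of glibc malloc:
LD_PRELOAD=/usr/lib/libtcmalloc.so zfs-fuse
```

Because the dynamic linker resolves malloc/free to the preloaded library first, every allocation the daemon makes goes through tcmalloc, which is where the speedup comes from.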

Well I just realised I handicapped my XFS benchmark tests quite severely. I'd forgotten I was running Beagle and it was helpfully trying to index the 2GB scratch files that Bonnie++ was creating as Bonnie++ was running!

This meant there was a lot of contention for disk I/O and it looks like it penalised XFS by almost 90 seconds over the whole run. I've updated my blog post (again) with the new (better) numbers for XFS.

I've just done a different series of tests on an ancient machine with 4 spare SCSI drives and was surprised to see that I didn't get any speedup when I added more drives as stripes (or mirrors, which under Linux SW RAID can give improved read performance).
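For context, the three layouts compared here are all expressed at pool-creation time. A sketch with hypothetical device names (/dev/sda through /dev/sdd standing in for the four SCSI drives):

```shell
# Four-way stripe: each listed disk becomes its own top-level vdev,
# and ZFS stripes writes across all of them.
zpool create stripetank /dev/sda /dev/sdb /dev/sdc /dev/sdd

# Two two-way mirrors, striped together (2x2):
zpool create mirrortank mirror /dev/sda /dev/sdb mirror /dev/sdc /dev/sdd

# Single-parity RAID-Z across all four drives:
zpool create raidztank raidz /dev/sda /dev/sdb /dev/sdc /dev/sdd
```

In theory the stripe and the mirrored layout should both scale read throughput with drive count, which is what makes the flat results above puzzling.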

I'm pretty sure it's not a hardware limitation, as XFS is almost twice as fast on a single drive. What puzzles me is that the I/Os do seem nicely balanced over the drives; it's just that when you add more drives, each drive becomes less busy. RAID-Z was slower than a single drive too, though that's probably because of the additional burden of parity calculation.

It might be that, because it's running in user space, its threading model isn't sufficiently optimised yet to take advantage of the 4 CPUs in the box (i.e. the machine is too old for one of the 200MHz Pentium Pros to do enough computation for the filesystem).

Anyway, the fact that I can just add drives to the array/mirror while the filesystem is live and have them in use immediately is still pretty cool. :-)
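Growing a live pool is a one-liner in both the stripe and mirror cases. A sketch, again with made-up pool and device names:

```shell
# Add another top-level vdev to a striped pool; new writes start
# using it immediately, with no downtime:
zpool add tank /dev/sde

# Attach a disk to an existing device, turning it into (or widening)
# a mirror; ZFS resilvers the new disk in the background:
zpool attach tank /dev/sda /dev/sde
```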

I've pretty much exhausted the tests I was going to do until the bug that stops me running executables from ZFS is fixed, so you can all get some rest from me for a bit.

Ricardo (and all the contributors, bug reporters, sponsors and supporters), thanks so much for your good work on this.

With the executable bug fixed, the trunk version of ZFS can now do a "make bootstrap" of GCC 4.1.1, which involves a three-stage build: stage 1 is built with the system GCC, stage 2 with the stage 1 compiler, and stage 3 with the stage 2 compiler. Stages 2 and 3 are then compared with each other to make sure they produced identical code.
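Reproducing this stress test is straightforward: build GCC in a directory that lives on the ZFS mount. Paths below are illustrative; /testpool stands in for wherever the pool is mounted:

```shell
# Out-of-tree build on the ZFS filesystem, so every compile, link,
# and comparison exercises zfs-fuse:
mkdir -p /testpool/gcc-build
cd /testpool/gcc-build
/path/to/gcc-4.1.1/configure --prefix=/usr/local/gcc-4.1.1
make bootstrap   # builds stages 1-3, then compares stage 2 vs stage 3
```

A successful bootstrap is a good smoke test for a filesystem, since any corruption of object files tends to show up as a stage 2/3 comparison failure.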

I was really impressed by it when I played with it. I had it running on an Ubuntu Edgy box. I'm afraid it was not very stable at all, though. It brought down GNOME when I played with drag-and-drop functionality, trying to copy about 200 megs of songs to the filesystem that I had created.

Maybe it was just me, as I don't really know how to use ZFS yet. But I also wasn't able to delete pools afterwards, which was frustrating. And when it crashed or was restarted, it seemed to lose the filesystem along with any data put on it; however, it would still believe the filesystem was there, even though you couldn't see it.

All in all, I am really impressed, and I am desperately looking forward to it being stable enough for real work.