ZFS Still Trying To Compete With EXT4 & Btrfs On Linux

Phoronix: ZFS Still Trying To Compete With EXT4 & Btrfs On Linux

With the recent release of ZFS On Linux 0.6.2, which provides an open-source native Linux kernel module implementation of the Sun/Oracle ZFS file-system, performance is faster, Linux kernel compatibility is broader, and there are other improvements. Here's a fresh round of ZFS Linux benchmarks against EXT4 and Btrfs.

Whenever other ZFS benchmarks have been posted, I've asked whether the pool was created with "ashift=12" for backing devices optimized for 4k writes, or even better, "ashift=13" for 8k on modern SSDs (basically everything these days; though I know the 510 advertises 512-byte physical sectors for whatever reason, it should still do significantly better with 4k or 8k). I never found out the answer, but it's probably 'no', which makes those tests kind of pointless.
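For reference, this is roughly how an ashift is forced at pool creation time. The pool name and device path are placeholders, and on some ZFSOnLinux versions the value is only visible through zdb:

```shell
# Placeholder pool name and device path -- substitute your own.
# ashift is the base-2 logarithm of the sector size ZFS assumes:
# ashift=12 aligns to 4KiB (2^12), ashift=13 to 8KiB (2^13).
zpool create -o ashift=13 tank /dev/disk/by-id/ata-EXAMPLE-SSD

# Confirm what the pool actually got; zdb prints each vdev's ashift.
zdb -C tank | grep ashift
```

Note that ashift is fixed at vdev creation time, so getting it wrong means destroying and recreating the pool.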

Phoronix refuses to adjust ashift itself. However, ZFSOnLinux added a drive database that will automatically do this for known drives. It is incomplete, but it will grow as people send me information on drives that are missing. Instructions for those who wish to contribute are available on the mailing list. Note that the link to the database is outdated. The current database is visible in the repository.

In this case, the drive was on the list, which means that the benchmarks were done with ashift=13. This is what enabled ZFSOnLinux to go from underperforming ext4 in the IOMeter file server benchmark to outperforming it significantly. With that said, it is not clear to me how partitioning was done. ZFS would be somewhat handicapped (although not by much) if partitioning was done for it rather than letting it do its own partitioning. This is because the Linux elevator is redundant for ZFS, and ZFS sets it to noop when it has full control of the disk. Anyway, I have a few comments on each of the benchmarks:
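The whole-disk versus partition distinction can be sketched like this (pool name and device paths are hypothetical):

```shell
# Whole-disk vdev: ZFS writes its own GPT partition table and sets
# the disk's IO scheduler to noop by itself.
zpool create tank /dev/disk/by-id/ata-EXAMPLE-SSD

# Pre-partitioned vdev (the alternative): ZFS leaves the scheduler
# alone, so set it to noop manually to avoid redundant reordering.
zpool create tank /dev/sda1
echo noop > /sys/block/sda/queue/scheduler
```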

1. Good benchmarking is hard and it is easy to do benchmarks that provide irrelevant results. I can usually find issues with the design of Phoronix's benchmarks, but in the case of IOMeter, I have not found anything wrong yet. Incidentally, ZFS does well here. That is likely because of a mix of ARC and ZIL.

2. The FS-Mark benchmarks tested the creation of 1MB files. This is a purely synthetic benchmark that does not match any real workload and does not matter to me much; if anyone has an actual workload that does this, please let me know so I can start caring. Of some interest is how the filesystems scaled from 1 to 4 threads: ZFS had a 3% increase while btrfs and ext4 had 82% and 59% increases respectively. It is probably worth investigating why ZFS' throughput did not scale as well. There is an Illumos patch to ZFS' internal IO elevator that might help with this; it will likely be merged in 0.6.3.

3. Phoronix did not appear to use DBench as it was intended to be used. It is supposed to run with a load file that simulates a specific application, but there is no information about that. Since it was designed to test network filesystems, it is most useful when data points are taken at different client counts, but Phoronix only tested 1 client. With that said, I am okay with how ZFS performed versus the other filesystems. The numbers here do not matter to me much.

4. Compile Bench is a fairly useless benchmark because compilation is not IO-bound, yet this appears to run the IO workload without doing any real compilation. It is unlikely that a real build process will exceed a few megabytes per second, which basically any filesystem can handle. Despite that, it is interesting that ext4 managed to outperform the interface bandwidth of SATA III: the peak bandwidth of SATA III is approximately 600MB/sec, but ext4 managed 726MB/sec. This suggests that writes are being buffered. It is possible to get the same effect with ZFS by using a dedicated dataset and setting sync=disabled, which is what I do for builds on my computer. However, it does not make much of a difference because compilation is CPU-bound, not IO-bound.
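A minimal sketch of that setup, assuming a pool named tank (the dataset name is arbitrary):

```shell
# Dedicated dataset for build output.
zfs create tank/builds

# sync=disabled makes fsync() and synchronous writes return
# immediately instead of waiting on the ZIL -- faster, but anything
# not yet flushed by the normal transaction commit is lost on a
# crash. Fine for scratch build artifacts that can be rebuilt.
zfs set sync=disabled tank/builds
zfs get sync tank/builds
```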

5. Postmark has a few interesting irregularities. The first is that it is absent from previous Phoronix benchmarks; I noticed this when I went to look at ZFS' performance relative to ext4 and the others to see how a proper ashift changed things. Another is that the standard error for both ext4 and btrfs is 0, which suggests that neither was actually writing to disk. This benchmark was intended to measure mail server IO performance, but it does a remarkably poor job of it. First, it is single-threaded, while mail server software intended to scale should be multithreaded. Second, it never calls fsync(); good mail server software should call fsync() before reporting delivery to ensure data integrity, but that does not happen here. The benchmark writes about 500 small files totaling less than 5MB, which the kernel has no reason to flush to disk. In the case of ZFS, the non-zero standard error suggests that data is being written out. If a crash occurred during this benchmark, the simulated mail would be lost on ext4 and btrfs, while ZFS would have saved at least some of it. Doing better here means more data loss in the event of a crash, which does not interest me very much.
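The fsync() distinction is easy to see with dd, which can optionally flush before exiting; without the flush, a small write can sit in the page cache indefinitely (file paths here are just examples):

```shell
# Small write with no explicit flush: the kernel may keep it in
# the page cache, and a crash at this point loses it.
dd if=/dev/zero of=/tmp/mail_nofsync bs=4k count=1 2>/dev/null

# conv=fsync makes dd call fsync() before exiting, so the data is
# on stable storage before a mail server would report delivery.
dd if=/dev/zero of=/tmp/mail_fsync bs=4k count=1 conv=fsync 2>/dev/null

ls -l /tmp/mail_nofsync /tmp/mail_fsync
```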

More Drives, and Data Reliability

I would love to see tests on ZFS done with at least 4 drives using RAID-Z or RAID-Z2 modes, since that to me is the strongest feature of ZFS, along with some data-reliability tests compared to Btrfs and EXT4.

Yeah, I saw your other thread about the 0.6.2 release after I posted here. Are you sure the Intel 510 is in your list? It doesn't appear to be to me. As I posted in the other thread: