As promised, I benchmarked some of the Linux filesystems on my solid-state disk.

Introduction

I wanted to benchmark the following filesystems:

ext2, ext3

ext4

xfs

reiserfs, reiser4

nilfs

btrfs

zfs (via fuse)

NILFS2 didn't even manage to finish the Bonnie++ test. This means the filesystem is not yet ready for use (though it promises very nice features). The other filesystem that was not benchmarked is reiser4, because the Ubuntu kernel doesn't support it. I would have had to patch the kernel, and I wasn't happy about that.

The images shown here present the results of the standard Bonnie++ tests. The command used to run them was:

bonnie -d /dir/on/ssd/partition -n 200:200

The -n parameter was tuned so that each test returned actual values. With the default setting I got many "++++" values, indicating the test completed so fast that Bonnie++ was not able to measure the performance.

Write caching (enabled by default) is a good thing though, so I have no problem with that.

All tests were run twice, but the results were nearly the same, so I dropped the second run for each filesystem.

For each test, bigger is better, with values in thousands of operations per second.

The best filesystem

As some suggest, the preferred I/O scheduler for SSDs is "noop", which means there is no I/O scheduling in the kernel: we rely on the scheduling logic in the hardware (which, for various reasons, is believed to be good in SSDs) and avoid the software overhead of queuing.
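For reference, the scheduler can be switched per block device at runtime through sysfs; a minimal sketch, assuming the SSD is /dev/sdb (requires root):

```shell
# The sysfs file lists the available schedulers, with the active one in brackets.
cat /sys/block/sdb/queue/scheduler      # e.g.: noop deadline [cfq]
# Switch the device to the noop scheduler:
echo noop > /sys/block/sdb/queue/scheduler
cat /sys/block/sdb/queue/scheduler      # [noop] deadline cfq
```

Note this setting is per-device and not persistent across reboots.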

Let's then compare how well filesystems perform with this scheduler chosen:

This benchmark was performed for all filesystems but NILFS2 and reiser4.

Random seeks

When it comes to random seeks (very important for low-latency systems), ext4 is the best, with reiserfs and xfs achieving almost the same result. Btrfs is next (10% slower), then ext3, with ext2 and zfs at the end being six times worse than the best.

Creation and deletion of files

Ext4 is the fastest at creating files (both sequentially and randomly), while btrfs is the fastest at deleting them, with the small exception of ext2, which is 7 times faster than everything else at deleting files sequentially. On the other hand, ext2's ability to delete files in random order is pretty bad. Comparing only btrfs and ext4, both are fast; the difference is about 10% one way or the other. Ext3 performs pretty well in this test, reiserfs reaches about half the performance of ext4/btrfs, while xfs and zfs are really slow.

Read/write

Reading and writing data yields pretty similar benchmark results across filesystems. The worst results come from zfs and ext2 (especially in random reads, which are vital in modern computer use).

Since per-character reading/writing is not so important, let's concentrate on the remaining tests. As you can see, btrfs is clearly the best, taking first place in 3 of 5 tests and coming really close to first in the other two. Reiserfs and ext4 also perform really well in this area.

Semi-summary

At this point it's clear that, when it comes to performance on an SSD, there are two filesystems to consider the best: btrfs and ext4. Reiserfs is slightly worse, but the most mature of the three. Xfs and ext3 are just OK but don't perform equally well in all tests, while ext2 and zfs (on FUSE) are not an option at all.

Let's then compare the features of the three:

Limits

                  reiserfs   ext4                      btrfs
file name         4 KB       256 B                     255 B
max file size     8 TB       16 GB to 16 TB            16 EB
                             (depends on block size)
max volume size   16 TB      1 EB                      16 EB

Features

                                 reiserfs   ext4   btrfs
checksum (error check)           no         yes    yes
snapshots (like Time Machine)    no         no     yes
mirroring/striping on FS layer   no         no     yes
compression                      no         no     yes

So it seems btrfs is full of new features compared to (old) reiserfs and (new) ext4, with only a small performance penalty in some areas, while even being faster in others.

What is the best scheduler, then?

Having chosen the best filesystems, let's see which scheduler works best with each.

ext4

Comparison of schedulers performance for ext4 filesystem:

As we can see, cfq is the best in 5 tests, significantly worse in two (random file creation and random seeks), and nearly as good as the best in the rest. Deadline and noop perform pretty much the same (noop is better at creating files randomly, deadline at creating files sequentially).

reiserfs

Schedulers performance for reiserfs filesystem:

For reiserfs, again, cfq does its job really well, with only random reads being significantly slower than under the deadline and noop schedulers.

btrfs

Let's now see what scheduler will be best for btrfs filesystem:

This time not cfq but the deadline scheduler is the winner. In random seeks, where cfq is generally weaker, it is about 30% worse than the best, the deadline scheduler. Deadline is worse than noop in only one test, and only slightly. Cfq is slightly better in 4 tests, but in the rest deadline comes out ahead.

Ultimate comparison of btrfs, reiserfs and ext4

Now that we know which scheduler runs best with each filesystem, let's compare Bonnie++ results for the perfect tandems:

btrfs with deadline scheduler on underlying disk

ext4 with cfq

reiserfs with cfq

Random seeks

Choosing the cfq scheduler degraded random-seek performance on every filesystem tested. This is why btrfs with the deadline scheduler beats its competitors.

Creation and deletion of files

This time, without ext2 setting the bar so high, you can see the differences in creating and deleting files on the three filesystems I tested. Btrfs is much faster (about 2 times) than ext4 at deleting files, while ext4 is a bit faster at creating files randomly, and significantly faster (about 50%) at creating them sequentially. Reiserfs is about 2 times slower than the slower of the other two in each test.

Read/write

In the read/write tests btrfs, ext4 and reiserfs perform almost equally well, with btrfs slightly ahead of the other two.

Summary

The ext4 and btrfs filesystems perform really well on SSDs, making users really happy about the speed they get from normal computer use.

With ext4 being the default filesystem in Ubuntu 9.10, if you have an SSD you'll notice that it boots really fast:

from Grub to GDM in 8 seconds

from GDM to GNOME in 5 seconds

OpenOffice launches in 2-3 seconds the first time (the second launch takes 0 seconds)

With btrfs being as fast as (or even faster than) ext4, the features it delivers are amazing:

snapshots (you can make a snapshot of the filesystem and then roll back to it, or just explore historical versions of files)

compression

mirroring/striping — things usually done at the block-device level, now incorporated into the filesystem

nice internal structures and algorithms (copy on write, B-trees, …)

integrated volume management
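As a hedged illustration of two of these features, here is roughly how snapshots and transparent compression are used with the btrfs-progs tools (the device and mount-point names are my assumptions; requires root):

```shell
# Mount with transparent compression enabled (device/mount point assumed).
mount -o compress /dev/sdb1 /mnt/data
# Take a snapshot of the filesystem; it appears as a normal directory
# that can be explored, or later promoted to roll the filesystem back.
btrfs subvolume snapshot /mnt/data /mnt/data/snap-before-upgrade
```

Because btrfs snapshots are copy-on-write, taking one is nearly instant and initially consumes almost no extra space.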

On the other hand, btrfs as of version 0.19 still has an experimental disk format, which means it may be incompatible with future kernels. Usually, though, kernel developers write code that either preserves the old format or converts the filesystem to the new one on first mount (after which the partition can no longer be mounted by an older kernel). Also, if I understand correctly, the biggest format changes between 0.18 and 1.0 landed in 0.19, so this will probably be the final format of btrfs partitions.

It's clear that btrfs is the Linux answer to Sun's ZFS, which due to its incompatible license can't be incorporated into the kernel (which is why only a FUSE port is available).

Having said all this, it's time for me to migrate my /home (and maybe the system partition too) to btrfs!

Some of you following the Wikidot code on GitHub may see that it's nicely split into templates, php, web and conf directories. But that is only the first impression.

Maintaining Wikidot is a bit more complex, because files uploaded to sites are located in web, side by side with some static Wikidot PHP and JavaScript files. Also, for historical reasons, there are web/files--common and web/files--local directories, which map to the /common--* and /local--* URLs; in fact, files--local is never served directly by the web server (permissions need to be checked first).

Also, some time ago we made the static files versioned, so that we can apply more aggressive HTTP caching to them (reducing average page-load time) and still be able to fix bugs in them without waiting a few days for the cache to expire. In the current model, the URL of a static file contains a version hash, for example: http://static.wikidot.com/v--b44e0ce810ee/common--javascript/WIKIDOT.js (notice the b44e0ce810ee). The whole of static.wikidot.com is now hosted on Amazon's CloudFront, which means you get static Wikidot files from a server near your location, not always from the USA.
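The post doesn't describe how the version hash is derived; one plausible scheme is hashing the file contents and truncating. A minimal sketch — the file name, the choice of SHA-1 and the 12-character length are all my assumptions, not Wikidot's actual code:

```shell
# Hypothetical: derive a 12-character version hash from a file's contents
# and build a versioned URL in the style shown above.
printf 'window.WIKIDOT = {};\n' > /tmp/WIKIDOT.js
hash=$(sha1sum /tmp/WIKIDOT.js | cut -c1-12)
echo "http://static.wikidot.com/v--$hash/common--javascript/WIKIDOT.js"
```

A content-derived hash has the nice property that the URL changes exactly when the file changes, so caches can be told to keep each version forever.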

This all became quite complicated, so we decided to make things really clear and simple in the source code. The primary rule: keep the source code (updatable from git) separate from the files uploaded by users and generated by Wikidot. The second rule: keep files that are automatically generated during installation (not at runtime) separate from both the persistent files (like those uploaded by users) and the source code.

And finally there needs to be a place for logs and a place for temporary data (we need this to generate some random cool stuff, but after generating it, the files are deleted).

So we end up with something like this:

WIKIDOT_ROOT
  data/
    avatars/ — user avatars
    sites/ — site files (both generated thumbnails and uploaded files)
  generated/
    static/ — generated static files; this dir can be served directly by a fast non-PHP web server for static.wikidot.com in case we don't want CloudFront anymore
  tmp/ — temporary files, including Smarty-compiled versions of templates; the contents of this dir can be safely removed
Wikidot's persistent data is now ONLY the database and the data/ directory, so it's easy to back up and restore the application (if you have enough time to make a full backup).

There is still one exception to this nice scheme: the php/db/base directory, which is autogenerated during installation from the XML database-definition files. The cleanup is not over; I'm still working on this.

The nice thing about this work is that it doesn't require a lot of code changes, because directory paths are usually stored in one (at most two) places in the application, so this kind of total reorganization of the directory structure doesn't break things. As such, it is very much worth doing. In the end we get a clean internal file structure, and it's clear which files you can safely remove, which you can restore from git (and thus experiment on a little — in case of a crash, just re-download the application), which constitute the "state" of Wikidot, and where to look for logs.

All this is also very important because we aim to open the current Wikidot.com source, and as such we want it to be nice code.