I am a new Gentoo developer working on ZFS support in Gentoo Linux, among a few other things. I am responsible for the sys-fs/zfs and sys-kernel/spl ebuilds in the tree, and I also wrote the ZFS support in genkernel. I have been working hard on ZFS support for the better part of a year, and that work is starting to bear fruit.

In particular, support for Linux swap on ZFS zvols, kernel preemption and Gentoo Hardened will soon enter the tree in the form of a snapshot. It is possible to boot off ZFS using GRUB2, but that is limited to single-drive pools and mirrored setups. I am working on a port of FreeBSD's bootloader to address that. I intend to release documentation detailing how to install Gentoo Linux on ZFS when it is ready.

Does anyone have any questions?


Is ZFS on Linux fast and stable? I'd be interested in using the features, but I'm just not sure how "hacky" it is.

Qualitatively speaking, I see a performance improvement whenever I rebuild a system to use it. My desktop, which has 8GB of RAM, is more responsive. I have not done any serious quantitative performance analysis, but I do have some figures. I have a server with 6x 5400RPM 2TB disks that had been using md RAID 6, LVM and ext4. Sequential write performance would never exceed 20MB/sec, but with ZFS raidz2, it achieves 286MB/sec sequential write performance when using dd to write 4GB of zeroes (without compression). The code is not performance optimized, but all indicators are promising.
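For reference, a test along these lines is what I mean; the target path, block size and fdatasync flag are illustrative, not my exact invocation:

dd if=/dev/zero of=/pool/ddtest bs=1M count=4096 conv=fdatasync   # write 4GB of zeroes and flush to disk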

ZFS has the ARC page replacement algorithm, which is likely responsible for the improvement in my desktop's interactive response. ZFS also has the ability to use SSDs as L2ARC and SLOG devices, which should enable it to outscale other RAID/LVM/filesystem combinations. Using an L2ARC device purely benefits read IOPS by effectively expanding the space available to the ARC algorithm. Using a SLOG device makes random write performance match sequential write performance by requiring that all changes be written to it first and then written to the actual disks asynchronously; sequential write performance then equals that of the SLOG device. ZFS organizes disks into vdevs and stripes across vdevs, so the maximum write performance is theoretically equal to the sum of the SLOG devices across all of the vdevs.
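To give a concrete idea, attaching such devices to an existing pool is a single command each; the pool and device names below are placeholders, not a recommendation for any particular hardware:

zpool add tank cache /dev/sdb    # SSD used as an L2ARC (read cache) device
zpool add tank log /dev/sdc      # SSD used as a dedicated SLOG device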

As for stability, people have had good experiences with the code, but I discovered numerous possible deadlocks involving direct reclaim when trying to make Linux swap on zvols work. Many of those deadlocks could even occur on systems that did not use swap on ZFS. So far, my git repository contains fixes for all but one of those bugs.

The remaining bug is a regression caused by the use of PF_MEMALLOC in sys-kernel/spl to work around the kernel deadlock described in bug #416685. The developers at LLNL discovered it close to two years ago, talked to the kernel virtual memory subsystem maintainer and came to an agreement that it was a bug. They then discovered the PF_MEMALLOC workaround, and no attempt has been made to fix the deadlock since then, although I want to address it soon. Using PF_MEMALLOC prevents a deadlock in which memory allocations made during direct reclaim could themselves trigger direct reclaim inside the upstream Linux kernel code, but it has the side effect of allowing pages to be allocated from ZONE_DMA. Under heavy memory pressure, this can exhaust ZONE_DMA, causing a deadlock.

The deadlock caused by the remaining PF_MEMALLOC instance is a rare issue. Nearly all of the people on freenode that use Gentoo on ZFS have never encountered it, and I do not think it should stop people from using ZFS. With that said, you will likely want to use code from my git repository. It is on GitHub, including an overlay that provides ebuilds that pull from it:

I plan to put a snapshot into the Portage tree after I solve the last PF_MEMALLOC issue and implement Linux 3.4.0 support. I am not yet certain how I will get a patch for the issue in the upstream Linux sources accepted, although I intend to push to get it into gentoo-sources when it is ready.

I've been running your overlay for a while now with no problems, but a few days ago when I re-ran genkernel again, I ended up with a system that won't boot. I don't remember the exact details, but it hangs when trying to mount the root filesystem. It was working before with the exact same kernel, but one thing that I noticed is that mount now also passes the "zfsutil" option, which it previously didn't. Have there been any recent changes to the genkernel support that could have caused this? For now, I've downgraded to a slightly older kernel that has a working initrd.

Also, when do you plan to add kernel 3.4 support? IIRC, the main repository already has the relevant patches. Thanks!

EatMeerkats wrote:

I've been running your overlay for a while now with no problems, but a few days ago when I re-ran genkernel again, I ended up with a system that won't boot. I don't remember the exact details, but it hangs when trying to mount the root filesystem. It was working before with the exact same kernel, but one thing that I noticed is that mount now also passes the "zfsutil" option, which it previously didn't. Have there been any recent changes to the genkernel support that could have caused this? For now, I've downgraded to a slightly older kernel that has a working initrd.

The zfsutil option is something that I had originally failed to pass; it should not cause this. Which kernel sources do you use, what is your kernel version, which version of genkernel do you use, and what command did you execute to generate the initramfs? Is it possible that you could have failed to rebuild the kernel modules against the updated kernel?

Note that I have a genkernel improvement planned that will eliminate the need to explicitly rebuild out-of-tree kernel modules by making that rebuild part of the process, but for now, it is necessary to pass --callback='module-rebuild rebuild' whenever you rebuild your initramfs.
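For example, a typical invocation would look something like this (the --zfs flag assumes a genkernel with the ZFS support mentioned above; adjust the rest to your usual options):

genkernel all --zfs --callback='module-rebuild rebuild'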

EatMeerkats wrote:

Also, when do you plan to add kernel 3.4 support? IIRC, the main repository already has the relevant patches. Thanks!

Unfortunately, those patches do not meet upstream's coding standards. I will likely work on Linux 3.4 support after I resolve the last PF_MEMALLOC issue, assuming someone else does not beat me to it.

Is your copy of my overlay up to date? There was a regression in my overlay's zfs-9999.ebuild that prevented sys-fs/zfs from being added to the module database, so `module-rebuild rebuild` would not rebuild it. You can check whether this is happening on your system by looking for sys-fs/zfs in `module-rebuild list`. Run `layman -S && emerge --oneshot sys-kernel/spl sys-fs/zfs` to correct it.

Yes, I had noticed that when I first switched to ZFS root, but I've since updated the overlay and it's definitely being rebuilt now.

I was able to reproduce this on my system. It seems that I made a mistake when introducing Gentoo Hardened support into ZFS. I have reverted that patch for now. You should be able to re-run genkernel and these issues will go away.

With that said, I am going to make a snapshot in a day or two. That way people will not need to use 9999 ebuilds anymore.

Tried installing Gentoo using your guide, but after reboot it would get stuck on OpenRC. Not sure what the issue is. My approach was to have an rpool on a single SSD and a raidz dpool on 4 SATA drives, with /home and /usr/portage on dpool.
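In other words, roughly this kind of layout (the device names and mountpoint commands below are placeholders, not my exact setup):

zpool create rpool /dev/sda2
zpool create dpool raidz /dev/sdb /dev/sdc /dev/sdd /dev/sde
zfs create dpool/home
zfs create dpool/portage
zfs set mountpoint=/home dpool/home
zfs set mountpoint=/usr/portage dpool/portage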


Would you elaborate on the point at which it got stuck? Also, there are two reboots in my guide. Is this after the first reboot (needed to refresh partition tables) or the second reboot (needed to start the system)?

Speed and throughput are nice, but this lag is killing me. It has significantly improved, but using it on the desktop with multi-tasking is still not possible this way. It could also be an issue with data writing on ~amd64 and the core architecture.

Unfortunately, I did not spot your reply until now. What do you have on your pool? Does this problem occur merely from the module being loaded, or is your media player software interacting with ZFS in some way?

Note that the best way to get in touch with me is in #zfsonlinux on freenode.

But I gave up on FS-Mark; it runs for ages...
Flexible IO Tester: ~112s
FS-Mark (1000 files, 1MB each): ~25 files/s
The FS-Mark test with 5000 files had not finished its first run after ~30min.

If I had to guess, I would suspect that this is a mix of an improper ashift value and high flash utilization, in either actual space usage or the SSD's dirty pages. I suggest emailing the mailing list for further advice:
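For example, something along these lines would show the pool's current ashift and how it is chosen at pool creation time (the pool name is just an example):

zdb -C tank | grep ashift              # 9 means 512-byte sectors, 12 means 4KiB sectors
zpool create -o ashift=12 tank <vdevs>  # ashift can only be set when a vdev is created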

I just merged zfs the other day and was trying to experiment with a few things here and there. I'd consider adding support to sys-boot/mkinitramfs-ll (ref.: sig). However, I'm struggling(?) with the compression property:

The [/var/]portage tree is ~300MB, and ~60MB for a squashed copy (aufs+squashfs[gzip]); [/var/lib/]layman is ~36MB, and ~15MB squashed (aufs+squashfs[gzip]). And here I get almost 3x those sizes with the compression=gzip-9 property on ZFS (WTF).

I have not yet read the whole zpool(8) and zfs(8) man pages, so I cannot tell what is going on or what could be going wrong at the moment.
So what could be the cause of getting 3x the size with compression=gzip[-9]?
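In case it matters, the properties involved can be queried like this (the dataset name is a placeholder for wherever the tree actually lives):

zfs get compression,compressratio,recordsize,used,referenced pool/portage
zfs list -o name,used,refer pool/portage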

On the Gentoo Wiki there is a request for a Gentoo on ZFS article. That would be a good place to get all the relevant documentation in one place. If you guys could get that started, that would be grand... I am available for help on editing / wikifying (among other places in the #gentoo-wiki channel).

ryao wrote:

Unfortunately, I did not spot your reply until now. What do you have on your pool? Does this problem occur merely from the module being loaded, or is your media player software interacting with ZFS in some way?

Note that the best way to get in touch with me is in #zfsonlinux on freenode.

So I switched to lzjb compression. I don't need the best compression ratio on this new hard drive, but it's nevertheless a nice bonus, and lzjb even seems to raise performance; the load on the CPU is way lower.
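For reference, the switch itself is just a property change (the dataset name is a placeholder); note that only newly written blocks pick up the new compression setting:

zfs set compression=lzjb pool/data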

Lowering compression helped with the issue, but the main problem still occurs when ZFS is writing to the hard drive.

I'm not sure how much of a role using cryptsetup/LUKS plays here, since I don't want to try writing all of my data to unencrypted space.

3) Watching (HD) videos on YouTube stutters a lot - any form of video, actually.

4) Entering text and other everyday productivity tasks are affected, in the sense that letters or other keystrokes are simply ignored/omitted; while this "lock" occurs, the computer is "blind" to outside input [this is improved with BFS + the O(1) patch].

5) Streaming audio/video content is more affected than playing videos or sound locally.

6) Luckily (?), this only happens when ZFS is writing data; when ZFS is idle, the kernel module doesn't affect workflow.

hope that helps

edit:

Made some changes - video playback is especially affected.

The issue can be mitigated by using alternative CPU schedulers + tweaks.