
In addition, ZFSOnLinux now has an internal database of drives known to misreport sector size information, which should improve the performance of pools made by people unaware of ashift whenever they use hardware listed in the database. For those who are unaware, there is an internal setting called ashift (alignment shift) that is set at vdev creation (pools are made out of vdevs). It determines the layout of the on-disk format such that the minimum block size is 2^ashift. ZFS automatically picks the value of ashift by calculating the base-2 logarithm of each drive's reported physical sector size (512 bytes gives 9; 4096 bytes gives 12) and using the largest one found in a vdev; this makes ashift analogous to blocksize in other filesystems. In an ideal world, this would be sufficient to ensure vdevs are always created with proper alignment. Unfortunately, most (all?) SSDs and some advanced format disks misreport their sector sizes for Windows XP compatibility. ZFSOnLinux allows system administrators to override ashift at vdev/pool creation to ensure that the proper value is used, but ZFS will suffer from a fairly severe misalignment penalty on such hardware when this is not done. Other filesystems tend to default to a 4096-byte sector size, regardless of what the drive reports.
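To make the arithmetic concrete, here is a rough Python sketch of the selection rule described above. It is only an illustration, not the actual ZFSOnLinux code; the function names and the `override` parameter are made up for the example.

```python
def ashift_for_sector_size(sector_size):
    """Base-2 logarithm of a power-of-two sector size (512 -> 9, 4096 -> 12)."""
    if sector_size <= 0 or sector_size & (sector_size - 1):
        raise ValueError("sector size must be a positive power of two")
    return sector_size.bit_length() - 1


def vdev_ashift(reported_sector_sizes, override=None):
    """Pick the ashift for a vdev.

    Hypothetical helper: uses the administrator's override if one is given,
    otherwise the largest ashift implied by the sector sizes the drives report.
    """
    if override is not None:
        return override
    return max(ashift_for_sector_size(s) for s in reported_sector_sizes)


def min_block_size(ashift):
    """Minimum block size in bytes for a given ashift (2^ashift)."""
    return 1 << ashift


if __name__ == "__main__":
    # A drive that really has 4096-byte sectors but reports 512 ends up at ashift=9.
    print(vdev_ashift([512, 512]))               # both drives report 512 bytes
    print(vdev_ashift([512, 4096]))              # the largest value in the vdev wins
    print(vdev_ashift([512, 512], override=12))  # administrator override
    print(min_block_size(12))
```

The first case is the problem scenario: if both drives actually have 4096-byte sectors but report 512, the computed value is 9 and writes end up misaligned unless the override (or a database entry) corrects it.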

The new database does not include every drive that misreports sector size information, but it covers dozens of drives that do and it will grow as users contact me with missing entries. Instructions for those who wish to contribute are available on the mailing list. Note that the link to the database is outdated. The current database is visible in the repository.

The hardware used in Phoronix benchmarks is known to misreport its sector size. Last year, Michael informed me that ZFSOnLinux had to fix this for him. I am happy to state that this is now the case. This should make Phoronix's benchmarks more accurately compare ZFSOnLinux's real-world single-disk performance in synthetic benchmarks with that of other filesystems.

On Gentoo, which is my main production system, I do (I even use the live/9999 ebuilds).

For the moment, the 9999 ebuilds are effectively the same as the 0.6.2 ebuilds. The only difference is that the 0.6.2 SPL ebuild includes a patch for FreeBSD-style hostid detection that upstream has not adopted.

Just found this thread after Michael's most recent non-representative ZFS benchmark. And of course he managed to find an Intel SSD that's not in the blacklist. You might want to check the FreeBSD 4k quirks (ADA_Q_4K) list from ata_da.c to boost your list.

Thanks for the link. Unfortunately, the formats are not interchangeable. While entries can be copied from our list into FreeBSD, the reverse is not the case. It also does not make a distinction between drives with 4KB sectors and 8KB sectors.

On a semi-related note, FreeBSD currently uses that list solely to adjust stripe size, which ZFS does not use.

Yeah, I realize they don't differentiate between 4KB and 8KB, which is somewhat problematic. On incompatibility, matching via pattern would seem to be a much more sensible idea, otherwise your list needs to be enormous for minor model variations, and it will take a seriously long time to catalogue them all. You'll also need to make changes every time a minor model rev is released, which going forward is likely to be every device released, until they stop lying, which could be a seriously long time. I understand the current method is more accurate, but it's also a maintenance nightmare, and specificity is pretty simply controlled by list order if you need to override an existing pattern for a special-case device. Also, since matching only happens very infrequently, the performance difference should have negligible impact.
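To illustrate what I mean by pattern matching with list order controlling specificity, here is a minimal Python sketch. The model strings and ashift values are invented placeholders, not real database entries, and this is not how either the ZFSOnLinux list or the FreeBSD quirk table is actually implemented.

```python
from fnmatch import fnmatchcase

# Hypothetical quirk table: (model pattern, ashift). Order matters: the first
# matching pattern wins, so special cases go above the broader patterns.
QUIRKS = [
    ("EXAMPLE SSD 8K-SPECIAL*", 13),  # special-case device with 8KB sectors
    ("EXAMPLE SSD *", 12),            # every other model in the same family
]


def quirk_ashift(model, quirks=QUIRKS):
    """Return the ashift of the first pattern matching the model string, or None."""
    for pattern, ashift in quirks:
        if fnmatchcase(model, pattern):
            return ashift
    return None  # no quirk known; fall back to what the drive reports


if __name__ == "__main__":
    print(quirk_ashift("EXAMPLE SSD 8K-SPECIAL rev2"))
    print(quirk_ashift("EXAMPLE SSD 120GB"))
    print(quirk_ashift("SOME OTHER DRIVE"))
```

A new minor model revision in the same family is then covered by the broad pattern without touching the list, and an exception only needs one more line placed above it.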

I actually sort of wonder if it wouldn't have been a better idea to just go ashift=12 by default and potentially have a blacklist for true 512B devices.