On Dec 29, 2009, at 3:48 AM, Matthew McClement wrote:
> On 12/27/09 19:45, Daniel Kristjansson wrote:
>> It sounds like you had performance issues with mysql on ext4.
>> I'm not aware of any such problems which are not shared by ext3.
>> It may just be that the file system creation params are
>> different in your distro for ext3 and ext4. XFS beats both handily
>> for mysql performance. But I'm actually using ext3 here for mysql
>> tables because XFS clears all open files on a crash, while I prefer
>> to have them in an intermediate state so mysql can attempt recovery.
>
> Just to address the "XFS clears open files" bit, this isn't strictly
> true (certainly not since 2.6.22). XFS will never clear an open file
> that already has committed data on disk. What it *will* clear are any
> extents that were allocated prior to the crash (whether it's a power
> failure, kernel panic, or FS error) but hadn't actually been written
> out to disk yet. This is a side effect of only metadata being
> journaled, so a rollback to create a consistent filesystem means that
> the extent allocation needs to be wiped.

I realize anecdotes don't prove anything, but I've been using XFS for
many years and have never had an entire file zeroed out, which would
seem to back up the above.

I suspect XFS just makes explicit something that is, as far as I know,
a limitation of any filesystem that does caching and doesn't journal
file data. This would presumably include ext3 in its default
configuration. (Ext3 *can* be configured to journal data, but it
results in a large performance hit.)

If XFS *didn't* clear those extents, they'd contain whatever stale data
happened to be on that patch of disk. Remember, the new data hasn't
actually been written yet. To my mind this is worse than zeroes.

While we're on the subject, it's worth mentioning that with
consumer-grade disks, there's also no guarantee that the disk has
actually written the data to the platter when it says it has. Most do,
but some (in particular, some USB enclosures) fake it and leave the
data in cache. The ZFS guys have had trouble with this in the past.
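
To put the caching point in application terms: a program that can't
tolerate a half-written file generally has to ask for durability
itself. Here is a minimal sketch in Python, assuming a POSIX system
such as Linux. The name durable_write is made up for illustration and
is not part of any library. It writes to a temporary file, fsyncs it,
and renames it over the target, so after a crash you should find either
the old contents or the new ones, not a partly allocated file.

```python
import os
import tempfile


def durable_write(path, data):
    """Replace `path` with `data` (bytes) via write, fsync, rename.

    Hypothetical helper for illustration; assumes a POSIX filesystem.
    """
    directory = os.path.dirname(os.path.abspath(path))
    # Create the temp file in the same directory so the rename below
    # stays within one filesystem.
    fd, tmp_path = tempfile.mkstemp(dir=directory)
    try:
        with os.fdopen(fd, "wb") as tmp:
            tmp.write(data)
            tmp.flush()             # push Python's buffer to the kernel
            os.fsync(tmp.fileno())  # ask the kernel to flush to the device
        os.replace(tmp_path, path)  # rename over the old file
    except BaseException:
        # Clean up the temp file if anything failed before the rename.
        if os.path.exists(tmp_path):
            os.unlink(tmp_path)
        raise
    # fsync the directory so the rename itself is flushed as well.
    dir_fd = os.open(directory, os.O_RDONLY)
    try:
        os.fsync(dir_fd)
    finally:
        os.close(dir_fd)
```

This only goes as far as the hardware is honest: if the disk or
enclosure reports a flush as complete while the data is still in its
cache, fsync can't do anything about that.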