On Dec 15, 2009, at 12:32 AM, tytso@mit.edu wrote:> Can you separate this patch into separate ones for each file system?

Sure.

> I think we can actually do better for ext4; for example, in the case> of data=journal or data=writeback, it's not necessary to flush the fs> data device. It's only the case of data=ordered that we need to send> a barrier. With that optimization, we do need to add a barrier in the> case of fsync() and when we are doing a journal checkpoint, but the> extra code complexity is worth not having to force a barrier for the> fs data device for every single commit, especially in the data=journal> mode with a heavy fsync workload.

Indeed, this is a good idea too.I think we still can squeeze a bit more juice out of it if we cansubmit the data device flush right after submitting all the data, butonly do the waiting for it before issuing journal device flush in normaljournaling mode (we need to do the waiting before writing commit blockin async journaling mode).

> Do you have a test case that will allow you to easily try out this> patch, in all of ext4's various journalling modes? Thanks!!

I am afraid this is not enough. This code is called after journal was flushed forasync commit case, so it leaves a race window where journal transaction is alreadyon disk and complete, but the data is still in cache somewhere.

Also the callsite has this comment which is misleading, I think: /* * This is the right place to wait for data buffers both for ASYNC * and !ASYNC commit. If commit is ASYNC, we need to wait only after * the commit block went to disk (which happens above). If commit is * SYNC, we need to wait for data buffers before we start writing * commit block, which happens below in such setting. */

In fact it is only safe to wait here in ASYNC mode (and internal journal only) becausethe device was already flushed (and I hope all device drivers drain the queue too, if not,even this might be problematic) by the blkdev_issue_flush.