Thanks everyone for your input. There were some very valuable
observations in the various emails.
I will try to pull most of it together and bring out what seem to be
the important points.
1/ A BIO_RW_BARRIER request should never fail with -EOPNOTSUPP.

Sounds good to me, but how do we test to see if the underlying
device supports barriers? Do we just assume that they do and
only change behaviour if -o nobarrier is specified in the mount
options?

I would assume so.
Then, when the block layer finds that barriers aren't supported and
falls back to non-barrier writes, it could log a message.
I guess we (XFS) can't take much other course of action anyway,
and we aren't doing much now beyond no longer requesting barriers
and printing an error message.
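
To make the fallback idea concrete, here is a rough userspace sketch of
what I understand proposal 1/ to mean. Everything in it is hypothetical
(struct fake_dev, lowlevel_submit(), submit_barrier_write() are made-up
names, not kernel interfaces); it only models the behaviour "try the
barrier, and on -EOPNOTSUPP warn once and redo it as a plain write".

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stdio.h>

/* Hypothetical stand-in for a block device; not a kernel structure. */
struct fake_dev {
    bool supports_barriers; /* what the underlying device can do */
    bool warned;            /* fallback message already printed? */
    int  barrier_writes;    /* writes issued as real barriers */
    int  plain_writes;      /* writes issued without barrier semantics */
};

/* Hypothetical low-level submit: rejects barriers on incapable devices. */
static int lowlevel_submit(struct fake_dev *dev, bool barrier)
{
    if (barrier && !dev->supports_barriers)
        return -EOPNOTSUPP;
    if (barrier)
        dev->barrier_writes++;
    else
        dev->plain_writes++;
    return 0;
}

/*
 * Entry point as seen by the filesystem: a barrier request never
 * comes back with -EOPNOTSUPP. If the device cannot do barriers,
 * report it once and resubmit the write without barrier semantics.
 */
static int submit_barrier_write(struct fake_dev *dev)
{
    assert(dev != NULL);

    int err = lowlevel_submit(dev, true);
    if (err != -EOPNOTSUPP)
        return err;

    if (!dev->warned) {
        fprintf(stderr, "barriers not supported, using plain writes\n");
        dev->warned = true;
    }
    return lowlevel_submit(dev, false);
}
```

With something like this the filesystem never sees the error, so the
only place left to notice the missing barrier support is the message.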

2/ Maybe barriers provide stronger semantics than are required.
All write requests are synchronised around a barrier write. This is
often more than is required and apparently can cause a measurable
slowdown.
Also the FUA for the actual commit write might not be needed. It is
important for consistency that the preceding writes are in safe
storage before the commit write, but it is not so important that the
commit write is immediately safe on storage. That isn't needed until
a 'sync' or 'fsync' or similar.

The use of barriers in XFS assumes the commit write to be on stable
storage before it returns. One of the ordering guarantees that we
need is that the transaction (commit write) is on disk before the
metadata block containing the change in the transaction is written
to disk and the current barrier behaviour gives us that.

Yep, and that one is what we want the FUA for -
for the write into the log.
I'm taking it that the FUA write will just guarantee that that
particular write has made it to disk on i/o completion
(and no write cache flush is done).
The other XFS constraint is that we know when the metadata hits the disk
so that we can move the tail of the log.
And that is what we are effectively getting from the pre-write-flush
part of the barrier. It ensures that any metadata not yet on disk is
on disk before we overwrite the tail of the log.
If we could determine the cases where we don't have to worry about
overwriting the tail of the log, then it would be good if we could
just do FUA writes for constraint 1 above (log write on disk before
the metadata write). Is that possible?
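
Roughly the kind of check I have in mind, as a sketch only. It assumes
we could track where the log tail was at the last write cache flush
(flushed_tail below). That field, struct fake_log, and
choose_log_write_mode() are all hypothetical and not existing XFS code.
Positions are absolute block counts so that wrap-around doesn't matter.

```c
#include <assert.h>
#include <stdint.h>

enum log_write_mode {
    LOG_WRITE_FUA_ONLY,     /* constraint 1 only: FUA log write */
    LOG_WRITE_PREFLUSH_FUA  /* both: flush write cache, then FUA log write */
};

/* Hypothetical model of a circular log; not an XFS structure. */
struct fake_log {
    uint64_t size;          /* log size in blocks */
    uint64_t head;          /* absolute count of blocks written so far */
    uint64_t flushed_tail;  /* absolute tail position as of the last
                             * write cache flush; metadata behind it
                             * is known to be on stable storage */
};

/*
 * A log write of len blocks can skip the pre-write flush if it fits
 * in the space that was already free at the last cache flush, since
 * it then cannot overwrite log blocks whose metadata might still be
 * sitting in the drive's write cache.
 */
static enum log_write_mode
choose_log_write_mode(const struct fake_log *log, uint64_t len)
{
    assert(log->head >= log->flushed_tail);
    assert(log->head - log->flushed_tail <= log->size);

    uint64_t safe_free = log->size - (log->head - log->flushed_tail);

    return len <= safe_free ? LOG_WRITE_FUA_ONLY
                            : LOG_WRITE_PREFLUSH_FUA;
}
```

The flush case doesn't create space by itself, of course; it just lets
flushed_tail catch up with wherever the real tail has moved to.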
--Tim