On Fri, 1 Jul 2005, Alan Cox wrote:
> > But that mentions compiler only, not CPU ordering! I understand the BIU
> > of the issuing CPU and any external hardware is still permitted to
> > merge/reorder these accesses unless separated by wmb()/rmb()/mb() as
>
> I think the practical situation is that this implies ordering to the bus
> interface. It might be interesting to ask the powerpc people their
> experience but looking at most PCI drivers they assume this and it would
> be expensive not to do so on x86.
Hmm, doing this OTOH would be expensive on platforms that actually require
explicit barriers for this to be the case. The problem is that only drivers
know what ordering they expect, e.g. in one place you may need as much as:
	writel();
	mb();
	readl();
but elsewhere only:
	readl();
	rmb();
	readl();
With barriers coded explicitly in drivers you can control this; with ones
inside these mmio functions/macros you need to use mb() everywhere, as you
don't know what the surrounding operations are going to be. And mb() may
be significantly more expensive than rmb().
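
To put rough numbers on the above, here is a minimal userspace sketch, not
kernel code. All the model_*/cost_* names and the cost figures are made up
for illustration; the only assumption that matters is that mb() costs at
least as much as rmb(). It just counts what each approach pays in barriers.

```c
#include <stdint.h>

/* Hypothetical relative costs; real figures are platform-specific. */
enum { COST_RMB = 1, COST_MB = 4 };

static unsigned long barrier_cost;	/* barrier cost accumulated so far */
static uint32_t fake_reg[2];		/* stand-in for device registers */

static void model_rmb(void) { barrier_cost += COST_RMB; }
static void model_mb(void)  { barrier_cost += COST_MB; }

/* Raw accessors: no ordering implied (think double-underscore calls). */
static uint32_t model_raw_readl(unsigned int reg) { return fake_reg[reg]; }
static void model_raw_writel(uint32_t v, unsigned int reg) { fake_reg[reg] = v; }

/*
 * Accessors with the barrier folded in: they cannot see the surrounding
 * operations, so they have to assume the worst case and use a full mb().
 */
static uint32_t model_readl(unsigned int reg)
{
	model_mb();
	return model_raw_readl(reg);
}

static void model_writel(uint32_t v, unsigned int reg)
{
	model_mb();
	model_raw_writel(v, reg);
}

/* Driver-placed barriers: two ordered reads need only an rmb(). */
unsigned long cost_explicit_read_read(void)
{
	barrier_cost = 0;
	(void)model_raw_readl(0);
	model_rmb();
	(void)model_raw_readl(1);
	return barrier_cost;
}

/* The same two reads through accessors with implied barriers. */
unsigned long cost_implicit_read_read(void)
{
	barrier_cost = 0;
	(void)model_readl(0);
	(void)model_readl(1);
	return barrier_cost;
}

/* Driver-placed barriers: a write followed by a read needs a full mb(). */
unsigned long cost_explicit_write_read(void)
{
	barrier_cost = 0;
	model_raw_writel(1, 0);
	model_mb();
	(void)model_raw_readl(1);
	return barrier_cost;
}

/* The same write/read pair through accessors with implied barriers. */
unsigned long cost_implicit_write_read(void)
{
	barrier_cost = 0;
	model_writel(1, 0);
	(void)model_readl(1);
	return barrier_cost;
}
```

With these made-up costs the read/read case comes out at 1 with explicit
barriers versus 8 with the barriers folded in, and the write/read case at
4 versus 8.
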
Of course, to facilitate such explicit barriers on platforms where
inter-processor ordering rules differ from those for mmio, a different
set of operations would have to be defined -- actually we've already got
one, mmiowb(), as a starting point.
> > We have that iob() macro/call as well, so that you can push cycles out of
> > the CPU domain immediately as well, which is equivalent to:
>
> > mb();
> > make_host_complete_writes();
>
> My feeling is the default readb etc are __readb + mb + make_hos...
Hmm, barriers are normally expected to happen *before* the affected
operations, which is natural and often much faster, as in the case of
traditional MIPS write-back buffers, where there is no "flush" operation
and mb() is just a tight loop spinning while the WB is non-empty,
e.g.: "0: bc0f 0b", till the buffer empties itself. So I'd rather make
readb() be mb() + make_host_complete_writes() + __readb(). But that
would be more painful performance-wise than necessary in many cases,
which calls the whole idea into question, as any sane driver writer would
prefer to use these double-underscore calls and schedule barriers manually
as necessary anyway.
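
To illustrate the barrier-before-access ordering, here is a toy userspace
model of a write-back buffer with no flush operation. It is only a sketch
under my own assumptions: all names are hypothetical, the "spin" advances
a fake bus by hand, reads are assumed to bypass the buffer, and the
make_host_complete_writes() step is not modelled at all.

```c
#include <stdint.h>

#define WB_DEPTH 4			/* hypothetical buffer depth */

static uint8_t device_reg;		/* what the "device" really holds */
static uint8_t wb_data[WB_DEPTH];	/* posted writes, oldest first */
static unsigned int wb_count;		/* entries still pending */
static unsigned int mb_spins;		/* loop iterations spent in mb() */

/* One fake bus cycle: retire the oldest buffered write to the device. */
static void wb_drain_one(void)
{
	unsigned int i;

	if (wb_count == 0)
		return;
	device_reg = wb_data[0];
	for (i = 1; i < wb_count; i++)
		wb_data[i - 1] = wb_data[i];
	wb_count--;
}

/* Raw write: lands in the buffer, not yet in the device. */
void model_raw_writeb(uint8_t v)
{
	if (wb_count == WB_DEPTH)
		wb_drain_one();		/* full buffer: wait for a free slot */
	wb_data[wb_count++] = v;
}

/* Raw read: assumed to bypass the buffer and see the device as it is. */
uint8_t model_raw_readb(void)
{
	return device_reg;
}

/*
 * mb(): there is nothing to flush, so just spin until the buffer has
 * emptied itself.  The model advances the bus inside the loop, which
 * real hardware does on its own.
 */
void model_mb(void)
{
	while (wb_count != 0) {
		wb_drain_one();
		mb_spins++;
	}
}

/* readb() with the barrier placed *before* the access. */
uint8_t model_readb(void)
{
	model_mb();
	return model_raw_readb();
}

/* How many iterations mb() has spun for so far. */
unsigned int model_mb_spins(void)
{
	return mb_spins;
}
```

In this model a raw read issued right after a raw write still returns the
old device state, while model_readb() waits for the buffer to drain first
and so sees the new value. The price is the spin, which is paid even when
the caller did not need that ordering.
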
But if it's indeed what's intended, I'd prefer it to be documented
somewhere in a reasonable place, as there are people outside the Intel
world who may not necessarily know which interfaces imply Intel
semantics and which do not. ;-)
Maciej