Re: NetBSD 5.1 TCP performance issue (lots of ACK)

On Sat, Oct 29, 2011 at 01:37:40PM -0700, Dennis Ferguson wrote:
>
> On 29 Oct, 2011, at 12:59 , Manuel Bouyer wrote:
> > On Fri, Oct 28, 2011 at 06:55:30PM +0100, David Laight wrote:
> >> On Fri, Oct 28, 2011 at 04:10:36PM +0200, Manuel Bouyer wrote:
> >>> Here is an updated patch. The key point to avoid the receive errors is
> >>> to do another BUS_DMASYNC after reading wrx_status, before reading the
> >>> other values to avoid reading e.g. len before status gets updated.
> >>> The errors were because of 0-len receive descriptors.
> >>
> >> I'm not entirely clear where the mis-ordering happens. I presume the
> >> fields are volatile so gcc won't re-order them. Which seems to imply
> >> that the only problem can be the adapter writing the fields in the
> >> wrong order (unless the data is cached and spans cache lines).
> >> In that case the BUS_DMASYNC is also acting as a delay.
> >
> > AFAIK the CPU is allowed to reorder reads. Linux has an rmb() here,
> > which is the equivalent of our x86_lfence(), I guess.
> > But for platforms where BUS_DMASYNC is not a simple barrier,
> > 2 BUS_DMASYNC calls are needed.
>
> CPUs in general are allowed to reorder reads, but Intel and AMD
> x86 CPUs in particular won't do that. The linux rmb() expands to
> an empty asm() statement, essentially (not quite) a NOP.

I have established that in -current, at least, the compiler
doesn't reorder the reads in wm_rxintr(). People seem to disagree
whether an x86 CPU will reorder the reads. :-)

According to <http://www.linuxjournal.com/article/8212?page=0,2>, x86
will reorder reads:

    ..., x86 CPUs give no ordering guarantees for loads, so the smp_mb()
    and smp_rmb() primitives expand to lock;addl.

In NetBSD-current, membar_consumer() is

    ENTRY(_membar_consumer)
            LOCK(13)
            addl    $0, -4(%esp)
            ret
    ENDLABEL(membar_consumer_end)

which resembles the x86_lfence() that bus_dmamap_sync(POSTREAD) calls:

    ENTRY(x86_lfence)
            lock
            addl    $0, -4(%esp)
            ret
    END(x86_lfence)

I believe that on a UP machine, the LOCK prefix in membar_consumer() is
overwritten with a NOP. The LOCK prefix in x86_lfence() is not erased
in that way. Is the LOCK prefix important to the proper operation of
bus_dmamap_sync() even on a UP machine?
Dave
--
David Young
dyoung%pobox.com@localhost Urbana, IL (217) 721-9981