On 4/8/07, Nathan Beyer wrote:
> Well, all of the mfence operations seem to be wrapped in helper
> functions, so it should be a fairly targeted extraction that can
> easily be tweaked as we go forward.
>
> In the 'atomics.h' file, the helper functions on EM64T/Win64 use the
> intrinsic functions "_ReadWriteBarrier" and "_WriteBarrier". Could we
> just use those same intrinsics on all platforms? They seem to be
> available everywhere.
Good idea. Those fence instructions each have their respective uses, but
they may be unnecessary on an architecture with a stronger memory model.
Putting them into a macro or intrinsic would be a better approach than
simply removing them. (For architectures that use a different atomic
mechanism, such as LL/SC instead of CAS, we may have to rewrite some
code sequences, but that's another issue.)
Thanks.
xiaofeng
> -Nathan
>
> On 4/6/07, Rana Dasgupta wrote:
> > Gregory,
> > First, the experiments are really useful and increase confidence
> > more than any amount of discussion can. Thanks.
> > Here is my understanding of some processor basics, which is not a
> > whole lot. The x86 memory model is actually quite similar across P3,
> > P4, and Xeon processors for write-back caches (most of them) and
> > non-write-combining memory (most of it).
> > Some things always hold true... writes are committed in program
> > order (they are not done speculatively, so if a thread/processor does
> > 3 updates in the program stream, they will appear in order, except
> > for streaming writes like those in SSE2 instructions and some rare
> > string operations, which are unordered), but reads can be in any
> > order. Reads can pass buffered writes, but almost certainly not a
> > buffered write to the same location. Reads/writes cannot pass
> > instructions with a lock prefix, etc.
> > This is true of a single processor/thread, but for SMPs the
> > guarantees are weaker. The above is true for each processor, but not
> > for all the processors together. Writes from one processor can be
> > unordered with respect to writes from another processor. This is OK
> > because when we have a true contention between writes to the same
> > memory location across threads, we always explicitly use critical
> > sections and locks. We never rely on the processor ordering. Any VM
> > code that does not do this is possibly wrong, and if we find it, we
> > will need to change it.
> > The fence instructions (sfence and mfence) force all the pending and
> > queued store and load/store operations, respectively, to finish
> > before the next instruction (after the fence) executes. They are not
> > true lock instructions and are much cheaper... they only prevent the
> > following instructions from observing earlier instructions that have
> > not yet been committed because of some complex
> > cache/buffer/speculation behaviour. For example, they enforce
> > volatile behaviour in the concurrent.atomics classes, etc. On PIII,
> > if we don't use the SSE-type instructions, then given the simpler
> > cache and write buffer architecture of the older PIII machines, there
> > is a good chance that we will be OK. This is unlikely to be true on
> > P4, HT and multicore systems.
> > So we should just try operating without them on the PIII only (not
> > sfence, which exists on PIII, but lfence, which is used for the
> > read/write barriers), and if Nathan or we find concurrency-related
> > failures in some tests down the line, we will need to put locks in
> > that part of the code. Locks are a really expensive way to do this
> > type of serialization, but that's the only option.
> >
> > Thanks,
> > Rana
> >
> >
> >
> > On 4/6/07, Gregory Shimansky wrote:
> > > On Friday 06 April 2007 02:39 Rana Dasgupta wrote:
> > > > On 4/5/07, Gregory Shimansky wrote:
> > > > > On Thursday 05 April 2007 00:48 Rana Dasgupta wrote:
> > > > > > On 4/4/07, Gregory Shimansky wrote:
> > > > > > > On Wednesday 04 April 2007 23:33 Rana Dasgupta wrote:
> > > > > > > > On 4/4/07, Mikhail Fursov wrote:
> > > > > > > > > On 4/4/07, Alexey Petrenko wrote:
> > > > > > > > > > 2007/4/4, Gregory Shimansky :
> > > > > > > > > > > > > I would like to see these modifications. I wonder what
> > > > > > > > > > > > > you've done in
> > > > > > > > > > >
> > > > > > > > > > > port/src/thread/linux/apr_thread_ext.c and
> > > > > > > > > > > vmcore/include/atomics.h. They contain mfence and sfence
> > > > > > > > > > > instructions in inline assembly which have to be changed to
> > > > > > > > > > > something else on P3.
> > > > > > > >
> > > > > > > > MemoryWriteBarrier() etc. should be no-ops on PIII. x86 is already
> > > > > > > > strongly ordered for writes?
> > > > > > >
> > > > > > > What about MemoryReadWriteBarrier()? If you know, what kind of code
> > > > > > > should be used for this in P3?
> > > > > >
> > > > > > One of the compiler guys can confirm this. But I don't believe that
> > > > > > you need to worry about any of the fence instructions on any of the
> > > > > > PIII, PIV genuine Intel procs unless you are using streaming-mode
> > > > > > (SIMD) instructions, which are weakly ordered.
> > > > >
> > > > > I actually grepped the use for MemoryReadWriteBarrier, MemoryWriteBarrier
> > > > > and apr_memory_rw_barrier functions which are wrappers to mfence/sfence
> > > > > instructions. They aren't used in the code which uses SSE2 in any way.
> > > > >
> > > > > - The apr_memory_rw_barrier (executes mfence) function is used in thin
> > > > > locks implementation in threading code.
> > > > >
> > > > > - MemoryReadWriteBarrier (executes mfence) is used in
> > > > > org.apache.harmony.util.concurrent natives implementation after
> > > > > writing/reading int/long/object fields via JNI.
> > > > >
> > > > > - MemoryWriteBarrier (executes sfence) is used in classloader for fast
> > > > > management of classes collection and in strings pool for the same reason.
> > > > >
> > > > > In all three cases SSE2 is not involved in any way; only plain loads
> > > > > and stores are done to memory. According to you, in all of those cases
> > > > > memory barriers are not needed. I am just confused then: why were they
> > > > > inserted in those places?
> > > >
> > > > I don't know the answer to this question ...unless it was intended to
> > > > cover clones etc. that don't fully support the writeback model...
> > >
> > > I should have put the question in a different way. I didn't actually mean that
> > > you should know why some code is written in VM. I don't know why some code is
> > > written in many places including those I mentioned.
> > >
> > > The question should actually be: should we remove the mfence and sfence
> > > assembly instructions from the VM sources for x86/x86_64 platforms? I
> > > commented out mfence in port/src/thread/linux/apr_thread_ext.c and
> > > mfence/sfence in vmcore/include/atomics.h and ran VM tests on 5 different
> > > SMP boxes with no less than 4 logical CPUs on each of them (2 win32,
> > > linux32, windows64 and linux64). The tests seem to work just fine without
> > > mfence and sfence in the VM code.
> > >
> > > With these instructions removed from the code there should be no problem
> > > with the P3 port on the VM side. It seems they are actually unnecessary
> > > and were inserted in the belief that they help synchronize caches on SMP.
> > > After your explanation that they are actually needed only when SSE2 is
> > > involved, it seems (and my tests show this) that they are just not needed.
> > >
> > > --
> > > Gregory
> > >
> >
>
--
http://xiao-feng.blogspot.com