Michael Neuling wrote:
> Philippe Bergheaud <felix@linux.vnet.ibm.com> wrote:
>
>> Unaligned stores take alignment exceptions on POWER7 running in little-endian.
>> This is a dumb little-endian base memcpy that prevents unaligned stores.
>> It is replaced by the VMX memcpy at boot.
>
> Is this any faster than the generic version?
The little-endian assembly code of the base memcpy is similar to the code emitted by gcc when compiling the generic memcpy in lib/string.c, and runs at the same speed.
However, the base memcpy has to be written in little-endian assembly (rather than C) in order to use the self-modifying code instrumentation system.
After the CPU feature CPU_FTR_ALTIVEC is detected at boot, the slow base memcpy is nopped out, and the fast memcpy_power7 is used instead.
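For comparison, the generic memcpy in lib/string.c is, roughly, a plain byte-at-a-time loop like the one below. It is shown as a user-space sketch under the hypothetical name generic_memcpy so it does not clash with the C library's memcpy. Because every access is a single byte, no load or store can ever be unaligned, which is also why such a loop is slow.

```c
#include <stddef.h>

/*
 * Byte-at-a-time copy modelled on the generic memcpy() in lib/string.c.
 * The name generic_memcpy is hypothetical (avoids clashing with libc).
 * Every access is one byte, so no load or store can be unaligned.
 */
void *generic_memcpy(void *dest, const void *src, size_t count)
{
	char *tmp = dest;
	const char *s = src;

	while (count--)
		*tmp++ = *s++;
	return dest;
}
```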
Philippe

Hi,
> > Unaligned stores take alignment exceptions on POWER7 running in
> > little-endian. This is a dumb little-endian base memcpy that
> > prevents unaligned stores. It is replaced by the VMX memcpy at boot.
>
> Is this any faster than the generic version?
Once booted, the feature fixup code switches us over to the VMX copy
loops (which are already endian safe).
The question is what we do before that switch-over. The base 64-bit
memcpy takes alignment exceptions on POWER7, so we can't use it as is.
Fixing the causes of the alignment exceptions would slow it down, because
we'd need to ensure all loads and stores are aligned, either through
rotate tricks or bytewise loads and stores. Either would be bad for
all other 64-bit platforms.
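As a user-space analogy only, the boot-time switch can be pictured as choosing a copy routine once from a feature bitmask. The kernel does not use a function pointer here: it patches the instructions in place (the base copy is nopped out), so this sketch models the selection logic, not the self-modifying mechanism. FTR_VMX_HYPOTHETICAL, base_copy, fast_copy and select_copy are hypothetical names; fast_copy simply forwards to the C library memcpy as a stand-in for memcpy_power7.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical feature bit standing in for CPU_FTR_ALTIVEC. */
#define FTR_VMX_HYPOTHETICAL (1UL << 0)

typedef void *(*copy_fn)(void *, const void *, size_t);

/* Slow but alignment-safe copy, usable before features are known. */
static void *base_copy(void *dst, const void *src, size_t n)
{
	unsigned char *d = dst;
	const unsigned char *s = src;

	while (n--)
		*d++ = *s++;
	return dst;
}

/* Stand-in for the fast VMX copy (memcpy_power7 in the kernel). */
static void *fast_copy(void *dst, const void *src, size_t n)
{
	return memcpy(dst, src, n);
}

/* Early boot: only the safe routine may be used. */
static copy_fn active_copy = base_copy;

/* Called once the CPU features have been detected. */
static void select_copy(unsigned long features)
{
	if (features & FTR_VMX_HYPOTHETICAL)
		active_copy = fast_copy;
}
```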
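One of the "rotate tricks" mentioned above can be modelled in C: load only aligned doublewords from the source and build each destination doubleword by shifting and OR-ing two neighbouring source words, so no access is ever unaligned. This sketch assumes little-endian byte order within each word (the POWER7 LE case discussed here), uses plain shifts in place of the hardware rotate-and-mask instructions, and works on arrays of already-aligned words; le_word and merge_words are hypothetical helper names. A real copy loop would also need head and tail handling, which is part of the extra cost described above.

```c
#include <stddef.h>
#include <stdint.h>

/* Pack 8 bytes into a word the way a little-endian load would. */
static uint64_t le_word(const unsigned char *p)
{
	uint64_t w = 0;

	for (int i = 7; i >= 0; i--)
		w = (w << 8) | p[i];
	return w;
}

/*
 * Shift-and-merge: src holds aligned little-endian words, and the data
 * to copy starts off bytes (1..7) into src[0]. Each output word combines
 * the top bytes of src[i] with the low bytes of src[i + 1], so src must
 * contain nwords + 1 words. off == 0 is excluded (shift by 64 is undefined).
 */
static void merge_words(uint64_t *dst, const uint64_t *src,
			size_t nwords, unsigned off)
{
	unsigned lo = 8 * off;	/* bits dropped from the first word */
	unsigned hi = 64 - lo;	/* bit position where the second word lands */

	for (size_t i = 0; i < nwords; i++)
		dst[i] = (src[i] >> lo) | (src[i + 1] << hi);
}
```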
Anton

OK, can you add that and/or maybe Anton's description to the patch changelog?
Mikey
On Wed, Nov 6, 2013 at 9:21 PM, Philippe Bergheaud
<felix@linux.vnet.ibm.com> wrote:
> Michael Neuling wrote:
>>
>> Philippe Bergheaud <felix@linux.vnet.ibm.com> wrote:
>>>
>>> Unaligned stores take alignment exceptions on POWER7 running in
>>> little-endian.
>>> This is a dumb little-endian base memcpy that prevents unaligned stores.
>>> It is replaced by the VMX memcpy at boot.
>>
>> Is this any faster than the generic version?
>
> The little-endian assembly code of the base memcpy is similar to the code
> emitted by gcc when compiling the generic memcpy in lib/string.c, and runs
> at the same speed.
> However, a little-endian assembly version of the base memcpy is required (as
> opposed to a C version), in order to use the self-modifying code
> instrumentation system.
> After the cpu feature CPU_FTR_ALTIVEC is detected at boot, the slow base
> memcpy is nop'ed out, and the fast memcpy_power7 is used instead.
>
> Philippe