Hey,
On Tue, 2013-03-12 at 14:09 +0000, Geoffrey Mainland wrote:
> On 03/10/2013 09:52 PM, Nicolas Trangez wrote:
> > ...
>
> Hi Nicolas,
>
> Have you read our paper about the SIMD work? It's available here:
>
> https://research.microsoft.com/en-us/um/people/simonpj/papers/ndp/haskell-beats-C.pdf
I hadn't read that one before (I had read other stream-fusion papers),
but I have now. I'd already picked up most of it while reading the
vector simd branch commits. The benchmark results look very nice!
I'm afraid I didn't 'get' how the framework would allow both AVX and
SSE instructions to work on streams, since it seems to assume Multis
are always a fixed number of bytes wide (in this case 16, for SSE).
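A toy list model of the question above, assuming the register width were a parameter rather than fixed at 16 bytes: the same stream of 8-byte Doubles splits into 2-lane chunks for SSE or 4-lane chunks for AVX, with a scalar remainder either way. `chunksFor` is a hypothetical helper for illustration, not the vector-simd `Multi` type.

```haskell
-- | Toy model (hypothetical, not vector-simd's Multi): cut a stream of
-- elements of 'elemBytes' bytes into chunks that each fill one
-- 'regBytes'-wide SIMD register, returning the full chunks and the
-- scalar leftovers.
chunksFor :: Int -> Int -> [a] -> ([[a]], [a])
chunksFor regBytes elemBytes xs0
  | lanes < 1 = error "chunksFor: register narrower than one element"
  | otherwise = go xs0
  where
    -- Number of elements that fit in one register.
    lanes = regBytes `div` elemBytes
    go xs = case splitAt lanes xs of
      (chunk, rest)
        | length chunk == lanes ->
            let (cs, leftover) = go rest in (chunk : cs, leftover)
        | otherwise -> ([], chunk)

-- SSE (16-byte registers) holds 2 Doubles per chunk; AVX (32-byte) holds 4.
```

For example, `chunksFor 16 8 [1 .. 5]` gives `([[1,2],[3,4]],[5])`, while `chunksFor 32 8 [1 .. 5]` gives `([[1,2,3,4]],[5])`.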
> The paper describes the issues involved with integrating SIMD
> instructions into the vector fusion framework.
>
> There are two primary issues with alignment: stack alignment and heap
> alignment.
>
> We cannot rely on the stack being properly aligned for AVX spills on any
> platform, and LLVM's stack fixup code does not play well with GHC, so we
> *rewrite* all AVX spill instructions to their unaligned counterparts. On
> Win32 we must do the same for SSE.
Does this imply that, outside Win32, stack values are always 16-byte
aligned?
I haven't worked with AVX yet (my CPU doesn't support it).
> Unboxed vectors are allocated by GHC, and it does not align memory on
> 16-byte boundaries, so our first cut at SSE intrinsics simply used
> unaligned accesses. Obviously with ForeignPtrs we can control alignment
> and potentially use the aligned variants of SSE instructions, but this
> will almost double the number of primops. One could imagine extending
> our fusion framework to transition to aligned move instructions.
Right. I created the patch in #7067
(http://hackage.haskell.org/trac/ghc/ticket/7067) for vector-simd
purposes back then (it adds mallocForeignPtrAlignedBytes and
mallocPlainForeignPtrAlignedBytes).
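Without dedicated aligned allocators, a similar effect can be approximated with existing `Foreign` functions: over-allocate pinned memory with `mallocForeignPtrBytes`, round the start up with `alignPtr`, and keep the shifted pointer tied to the original buffer with `plusForeignPtr` (base 4.10 or later). `mallocAlignedBytes` and `isAligned` are hypothetical names for this sketch, not the API from #7067.

```haskell
import Data.Word (Word8)
import Foreign.ForeignPtr (ForeignPtr, mallocForeignPtrBytes, plusForeignPtr, withForeignPtr)
import Foreign.Ptr (Ptr, alignPtr, minusPtr, ptrToWordPtr)

-- | Hypothetical helper: allocate at least n bytes starting on an
-- 'align'-byte boundary. It over-allocates (align - 1) extra bytes of
-- pinned memory and rounds the start address up with 'alignPtr'.
-- Assumes align >= 1.
mallocAlignedBytes :: Int -> Int -> IO (ForeignPtr Word8)
mallocAlignedBytes n align = do
  raw <- mallocForeignPtrBytes (n + align - 1) :: IO (ForeignPtr Word8)
  -- Pinned memory does not move, so the computed offset stays valid.
  off <- withForeignPtr raw $ \p -> pure (alignPtr p align `minusPtr` p)
  -- plusForeignPtr shares the original buffer's lifetime.
  pure (raw `plusForeignPtr` off)

-- | True when the address is a multiple of 'align' bytes.
isAligned :: Ptr a -> Int -> Bool
isAligned p align = ptrToWordPtr p `mod` fromIntegral align == 0
```

For a 32-byte AVX buffer this costs at most 31 wasted bytes per allocation, which is the trade-off a proper aligned allocator would avoid.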
> Finally, LLVM 3.2 does not work with GHC. This means we cannot yet take
> advantage of its new vectorization optimizations, which is a shame.
>
> So, four projects for you or anyone else who is interested, in rough
> dependency order:
>
> 1) Get LLVM 3.2 working with GHC's LLVM back end.
According to other mails in this thread, this should be fixed now. I'll
give it a go.
> 2) Fix the stack alignment issue with LLVM. This will likely require a
> patch to LLVM.
I'm afraid that's a bit out of my league for now :-)
> 3) Add support for aligned move primops.
I looked into this before; I might give it a stab.
> 4) Extend the current SIMD fusion framework to handle transitioning to
> aligned move instructions. As an alternative, only use aligned move
> instructions on memory that we know is aligned.
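Transitioning to aligned moves usually means peeling: handle a few elements with scalar code until the address reaches a register boundary, run the aligned SIMD body, then finish the tail with scalar code. A minimal sketch of that split, assuming byte addresses as plain `Int`s; `peelForAlignment` is a hypothetical helper, not part of vector or GHC.

```haskell
-- | Hypothetical helper: plan a loop over 'count' elements of 'elemBytes'
-- bytes each, starting at byte address 'addr', so that the middle part can
-- use aligned moves on 'regBytes'-wide registers. Returns
-- (scalar prologue, full aligned blocks, scalar epilogue), or Nothing when
-- no whole number of scalar steps can reach a register boundary.
-- Assumes regBytes is a positive multiple of elemBytes.
peelForAlignment :: Int -> Int -> Int -> Int -> Maybe (Int, Int, Int)
peelForAlignment regBytes elemBytes addr count
  | gapBytes `mod` elemBytes /= 0 = Nothing
  | otherwise = Just (prologue, blocks, epilogue)
  where
    -- Bytes from addr up to the next regBytes boundary (0 if aligned).
    gapBytes = (regBytes - addr `mod` regBytes) `mod` regBytes
    -- Elements handled one at a time before the first aligned block.
    prologue = min count (gapBytes `div` elemBytes)
    lanes = regBytes `div` elemBytes
    rest = count - prologue
    -- Full registers' worth of elements that can use aligned moves.
    blocks = rest `div` lanes
    -- Leftover elements after the last full block, handled scalar again.
    epilogue = rest - blocks * lanes
```

For 10 Doubles starting at address 4104, SSE (`peelForAlignment 16 8 4104 10`) gives `Just (1,4,1)` and AVX (`peelForAlignment 32 8 4104 10`) gives `Just (3,1,3)`. On memory already known to be aligned, the prologue is always 0, which is the simpler alternative the item mentions.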
This is why I sent my previous mail in the first place: is there any
plan for how to approach the 'memory that we know is aligned' part?
Would it make sense to have a more general 'alignment restriction'
framework for arbitrary values, not only unboxed vectors (if there are
any other use cases)?
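One possible shape for such a framework, assuming modern GHC type-level naturals: record the known alignment of a pointer in its type, so that only runtime-checked or explicitly weakened pointers reach code that would use aligned moves. `AlignedPtr`, `mkAlignedPtr` and `weaken` are hypothetical names, not an existing API.

```haskell
{-# LANGUAGE DataKinds, KindSignatures, ScopedTypeVariables #-}
import Data.Proxy (Proxy (..))
import Foreign.Ptr (Ptr, nullPtr, plusPtr, ptrToWordPtr)
import GHC.TypeLits (KnownNat, Nat, natVal)

-- | Hypothetical wrapper: a pointer whose address is known to be a
-- multiple of n bytes. The constructor should stay unexported.
newtype AlignedPtr (n :: Nat) a = AlignedPtr (Ptr a)

-- | Smart constructor: checks the alignment once, at runtime.
mkAlignedPtr :: forall n a. KnownNat n => Ptr a -> Maybe (AlignedPtr n a)
mkAlignedPtr p
  | ptrToWordPtr p `mod` fromIntegral align == 0 = Just (AlignedPtr p)
  | otherwise = Nothing
  where
    align = natVal (Proxy :: Proxy n)

unAligned :: AlignedPtr n a -> Ptr a
unAligned (AlignedPtr p) = p

-- | 32-byte (AVX) alignment implies 16-byte (SSE) alignment.
weaken :: AlignedPtr 32 a -> AlignedPtr 16 a
weaken (AlignedPtr p) = AlignedPtr p

-- Example: a fake address 64 bytes past null passes the 32-byte check
-- and can then be used where 16-byte alignment is required.
example :: Maybe (AlignedPtr 16 ())
example = weaken <$> mkAlignedPtr (nullPtr `plusPtr` 64)
```

The same tag could sit on a vector's underlying buffer, so that fusion rules choose aligned or unaligned primops based on the type rather than on a runtime check.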
> These are all on my todo list, but my plate is quite full at the moment.
Heh, sounds familiar ;-)
Thanks,
Nicolas