After reading about Intel's 'Shevlin Park' project to implement C++ AMP in
LLVM/Clang, and failing to find any code for it, I decided to try to implement
something similar. I did it as an excuse to explore and hack on llvm/clang,
which I hadn't done before, but it's now at the point where it will run the
simplest matrix multiplication sample from MSDN, so I thought I might as well
share it.

The source is in:

https://github.com/corngood/llvm.git
https://github.com/corngood/clang.git
https://github.com/corngood/compiler-rt.git [unchanged]
https://github.com/corngood/amp.git [simple test project]

It's fairly hacky, and very fragile, so don't expect anything that isn't used
in the sample to work. I also haven't tested it on large datasets, and there
are some things that definitely need fixing before I'd expect good performance
(e.g. workgroup size). It currently works only on NVIDIA GPUs, and has only
been tested on my shitty old 9600GT on amd64 Linux with the stable binary
drivers.

The compilation process currently works like this:

.cpp -> [clang++ -fc++-amp] -> .ll
    - compile non-amp code
.cpp -> [clang++ -fc++-amp -famp-is-kernel] -> .amp.ll
    - compile amp kernels only
.amp.ll -> [opt -amp-to-opencl] -> .nvvm.ll
    - create kernel wrapper to deal with buffer/const inputs
    - add nvvm annotations
.nvvm.ll -> [llc -march=nvptx] -> .ptx
    - compile kernels to NVPTX (unchanged)
.ll + .ptx -> [opt -amp-create-stubs .ptx] -> .opt.ll
    - embed ptx as array data
    - create functions to get kernel info, load inputs, etc
.opt.ll -> [llc] -> .o
    - unchanged

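
As a rough illustration, the stages above can be written down as a dry-run
driver that only prints the command lines. This is a sketch, not the actual
build script: the -S, -emit-llvm, -filetype=obj and -o options are stock
clang/opt/llc options that I am assuming here, the file names are
hypothetical, and the way the .ptx file is handed to -amp-create-stubs simply
mirrors the notation above rather than a confirmed syntax.

```python
import shlex
from pathlib import Path
from typing import List


def amp_build_commands(source: str) -> List[List[str]]:
    """Return one argv list per pipeline stage; nothing is executed."""
    stem = Path(source).with_suffix("")  # "matmul.cpp" -> "matmul"
    host_ll = f"{stem}.ll"          # non-amp (host) code as LLVM IR
    kernel_ll = f"{stem}.amp.ll"    # amp kernels only
    nvvm_ll = f"{stem}.nvvm.ll"     # kernels with wrappers + nvvm annotations
    ptx = f"{stem}.ptx"             # NVPTX assembly
    opt_ll = f"{stem}.opt.ll"       # host IR with embedded PTX and stubs
    obj = f"{stem}.o"

    return [
        # 1. compile non-amp code (-S -emit-llvm is an assumption)
        ["clang++", "-fc++-amp", "-S", "-emit-llvm", source, "-o", host_ll],
        # 2. compile amp kernels only
        ["clang++", "-fc++-amp", "-famp-is-kernel", "-S", "-emit-llvm",
         source, "-o", kernel_ll],
        # 3. create kernel wrappers and add nvvm annotations
        ["opt", "-amp-to-opencl", "-S", kernel_ll, "-o", nvvm_ll],
        # 4. compile kernels to NVPTX
        ["llc", "-march=nvptx", nvvm_ll, "-o", ptx],
        # 5. embed the PTX and create stubs; how the .ptx path is really
        #    passed to the pass is not confirmed, this mirrors the notation
        ["opt", "-amp-create-stubs", ptx, "-S", host_ll, "-o", opt_ll],
        # 6. ordinary code generation to an object file
        ["llc", "-filetype=obj", opt_ll, "-o", obj],
    ]


if __name__ == "__main__":
    for argv in amp_build_commands("matmul.cpp"):
        print(shlex.join(argv))
```

Keeping each stage as a plain argv list makes it easy to swap the print for
subprocess.run(argv, check=True) once the real option spellings are known.
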
The clang steps only differ in codegen, so eventually they should be combined
into one clang call. NVPTX is meant to be replaced with SPIR at some point,
to make it portable, which is why I didn't bother with text kernel generation.
I won't go into implementation details, but if anyone is interested, or
working on something similar, feel free to get in touch.
Thanks,
Dave McFarland