Sunday, November 11, 2012

In an earlier post, I whined about OpenCL's lack of atomic functions for floating-point operations. This makes it hard to code a high-performance dot product in OpenCL, but by using vectors and local memory, we can still do pretty well.

I've coded an application that computes the dot product of two vectors, each containing 2^18 floating-point values. The source files are on GitHub and the kernel looks like this:

When this kernel executes, the device doesn't compute the entire dot product. Instead, each work group returns a partial sum to the host, and the host adds those partial sums to get the final result. In my tests, this ran much faster than a basic multiply-and-add algorithm. Still, I'm sure there's room for improvement.
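To make the two-stage idea concrete, here is a rough sketch in plain Python. It is not the kernel from the repository, only a sequential simulation of the pattern described above: each simulated work group multiplies its slice of the inputs, reduces the products in a scratch list standing in for local memory, and hands one partial sum back to the "host", which adds them up. The function names and the group size of 256 are my assumptions for illustration, and the tree-style reduction is one common way to do the in-group sum, not necessarily the one the real kernel uses.

```python
def work_group_partial_sums(a, b, group_size):
    """Simulate the device side: return one partial dot product per work group."""
    if len(a) != len(b):
        raise ValueError("input vectors must have the same length")
    if group_size <= 0 or group_size & (group_size - 1):
        raise ValueError("group_size must be a power of two")
    if len(a) % group_size:
        raise ValueError("vector length must be a multiple of group_size")

    partials = []
    for start in range(0, len(a), group_size):
        # Stand-in for local memory: each work item stores its own product.
        local = [a[i] * b[i] for i in range(start, start + group_size)]

        # Tree reduction: halve the number of active work items each pass.
        stride = group_size // 2
        while stride > 0:
            for lid in range(stride):
                local[lid] += local[lid + stride]
            stride //= 2

        # Work item 0 would write this value to the group's output slot.
        partials.append(local[0])
    return partials


def dot_product(a, b, group_size=256):
    """Simulate the host side: add up the per-group partial sums."""
    return sum(work_group_partial_sums(a, b, group_size))


if __name__ == "__main__":
    n = 2 ** 18  # same problem size as in the post
    a = [1.0] * n
    b = [2.0] * n
    print(len(work_group_partial_sums(a, b, 256)), dot_product(a, b))
```

The point of the split is that no work group ever needs to add into a value shared with another group, so the missing floating-point atomics never come into play. On a real device the inner loop over `lid` would run in parallel, with a barrier between passes.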

I've decided to open this blog for comments. If you have any thoughts on this kernel or anything else on this blog, feel free to write.