>> Wednesday, November 19, 2014

The good folks at Manning have uploaded the new FFT code to the book's main site. I've updated the Linux and Windows archives, but I didn't change the Mac OS archive because my MacBook White is long dead.

As far as I can tell, the new code takes care of the race condition. If anyone has any concerns, please let me know.

>> Saturday, November 15, 2014

Because of the comments I received, I decided to test my FFT on new systems with new hardware and new drivers. My FFT passed every test, so I wrote a self-satisfied post stating that the commenter's problem was caused by using work-groups whose sizes weren't a power of two.
Then it dawned on me. In the fft_init kernel, work items read data from bit-reversed addresses and write the processed data to unreversed addresses in the same buffer. This makes it possible for one work item to read data that has already been processed by another. This is the race condition to which the commenter was referring.
Thankfully, this problem is easy to fix. I'll add a second buffer to fft_init so that every work item reads from the first buffer and writes to the second. I'll get this coded tomorrow morning and I'll contact Manning to get it uploaded to their software site.
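To show why reading and writing the same buffer goes wrong, here is a minimal sequential C sketch. It assumes an 8-element buffer and leaves out the butterfly arithmetic that fft_init actually performs, so only the data movement remains. A plain loop stands in for the work items, and the names `rev_bits`, `permute_in_place`, and `permute_out_of_place` are hypothetical, not taken from the book's code.

```c
#include <assert.h>

#define LOG2N 3              /* hypothetical size: 2^3 = 8 elements */
#define N (1u << LOG2N)

/* Reverse the low `bits` bits of x (scalar loop, bits >= 1). */
static unsigned rev_bits(unsigned x, unsigned bits) {
    unsigned ans = x & 1;
    while (--bits) {
        x >>= 1;
        ans = (ans << 1) | (x & 1);
    }
    return ans;
}

/* Buggy pattern (hypothetical helper): each iteration plays one work item
   that reads a bit-reversed slot and writes its own slot in the SAME
   buffer. If work item rev(i) has already written, item i reads a value
   that has already been moved. */
static void permute_in_place(int *data) {
    for (unsigned i = 0; i < N; i++)
        data[i] = data[rev_bits(i, LOG2N)];
}

/* Fixed pattern (hypothetical helper): read only from `in`, write only to
   `out`, so the result no longer depends on execution order. */
static void permute_out_of_place(const int *in, int *out) {
    assert(in != out);       /* the two buffers must be distinct */
    for (unsigned i = 0; i < N; i++)
        out[i] = in[rev_bits(i, LOG2N)];
}
```

Run in index order on {0, 1, ..., 7}, the out-of-place version yields {0, 4, 2, 6, 1, 5, 3, 7}, while the in-place loop yields {0, 4, 2, 6, 4, 5, 6, 7}: slots 4 and 6 pick up values that slots 1 and 3 had already overwritten. On a GPU the work items have no guaranteed order, so the in-place result can change from run to run, which makes this kind of bug easy to miss in testing.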
I'd like to thank the commenter for his/her assistance. I'd also like to point out that my bit-reversal algorithm, while idiosyncratic, is perfectly functional.

>> Saturday, November 1, 2014

Over three and a half years ago, I completed the OpenCL FFT that I discussed in Chapter 14. I tested it with data sets of varying sizes on different graphics cards and operating systems. It ran successfully every time, but recent comments make it seem likely that there's a race condition that needs to be addressed.

The problem with debugging an FFT is that it requires long stretches of concentration, which usually involves me lying on the floor and squinting up at the ceiling for hours on end. Unfortunately, I'm busy at the moment and don't have the time. But because I'm so ashamed, I'm going to take the week of 11/10 off from work and do my best to resolve the problem.

It looks like the root cause is my bit-reversal routine, and I'll explain why this is particularly jarring. If you're familiar with FFT code, then you know that many routines perform bit-reversal with code like the following:

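unsigned int reverse_bits(unsigned int x, unsigned int numBits)
{
    unsigned int ans = x & 1;   /* lowest bit of x becomes highest bit of ans */
    while(--numBits) {          /* numBits must be at least 1 */
        x >>= 1;                /* bring the next bit of x into position 0 */
        ans <<= 1;              /* make room at the bottom of ans */
        ans += x & 1;           /* append that bit */
    }
    return ans;
}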

Rather than operate on scalars, I devised a routine that bit-reverses all four elements of a uint4 vector at the same time. I thought it was clever, but if it causes a race condition, it has to go.
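The uint4 routine itself isn't reproduced here, so the following is only a sketch of one well-known branch-free way to reverse four indices at once, not necessarily the approach used in the book. It uses the mask-and-swap method: swap neighbouring bits, then bit pairs, nibbles, bytes, and half-words, which reverses all 32 bits, and finally shift right so only the low numBits remain. The `lanes4` struct and `reverse_bits4` function are hypothetical stand-ins so the sketch compiles as plain C. In OpenCL C, the bitwise and shift operators apply component-wise to vector types, so the same loop body could be written once against a uint4.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical stand-in for OpenCL's uint4: four 32-bit lanes. */
typedef struct { uint32_t s[4]; } lanes4;

/* Reverse the low numBits bits of each lane (1 <= numBits <= 32). */
static lanes4 reverse_bits4(lanes4 v, unsigned numBits) {
    assert(numBits >= 1 && numBits <= 32);   /* shift by 32 would be undefined */
    for (int i = 0; i < 4; i++) {
        uint32_t x = v.s[i];
        x = ((x >> 1) & 0x55555555u) | ((x & 0x55555555u) << 1);  /* swap adjacent bits */
        x = ((x >> 2) & 0x33333333u) | ((x & 0x33333333u) << 2);  /* swap bit pairs */
        x = ((x >> 4) & 0x0F0F0F0Fu) | ((x & 0x0F0F0F0Fu) << 4);  /* swap nibbles */
        x = ((x >> 8) & 0x00FF00FFu) | ((x & 0x00FF00FFu) << 8);  /* swap bytes */
        x = (x >> 16) | (x << 16);                                /* swap half-words */
        v.s[i] = x >> (32 - numBits);        /* keep only the reversed low bits */
    }
    return v;
}
```

For example, reversing {1, 2, 3, 4} with numBits = 3 gives {4, 2, 6, 1}. Because each output depends only on its own input index, a routine like this cannot cause a race by itself; a race can only come from how the reversed indices are used to read and write shared memory.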

I apologize to everyone who was/is disappointed with the code. If you're still looking for a good OpenCL FFT, I recommend the clFFT project. This was once part of AMD's Accelerated Parallel Processing Math Libraries (APPML), but APPML itself no longer appears to be supported.