>> Sunday, July 31, 2011

I'm currently coding DynLab, a scientific visualization tool that uses OpenCL for geometry and physics, OpenGL for rendering, and Qt as the overall framework. Some of the theory is difficult, particularly the parts involving boundary representation and object collision, but I'm enjoying the work. Thankfully, two technologies make life easier: COLLADA and the Open Dynamics Engine (ODE).

I'd worked with COLLADA when I wrote the Cell processor book, but that was version 1.4. The latest version, COLLADA 1.5, supports physics and boundary representation, which is wonderful.

And so is ODE. Not only does it provide routines for rigid-body dynamics, it also provides a test application that demonstrates how ODE and OpenGL work together. As I port parts of ODE to OpenCL, I'm genuinely impressed with the author's code and documentation. I'm surprised I'd never heard of it before.

One thing bothers me, though. ODE hasn't had an official update since 2009, and COLLADA hasn't had a new version since 2008. Has the Khronos Group decided that COLLADA 1.5 is perfect, or has it realized that commercial users rely on Microsoft's technology (stable and integrated) instead of its own (high-performance but decentralized)?

>> Sunday, July 17, 2011

I've been experimenting with my FFT code, changing the size of the work-groups and the amount of local memory each group has to work with. Here are my observations:

Increasing the work-group size always improves performance.

Decreasing the amount of local memory available to each work-group usually improves performance.

The first point didn't surprise me, but the second did. My initial kernel computes one FFT per work-group, and the FFT's size fully occupies local memory. That is, if a work-group has 32 kB of local memory and each complex point occupies 2*sizeof(float) = 8 bytes, then each work-group can perform a 4k-point (32768 / 8 = 4096-point) FFT. Successive kernels merge the work-groups' results until the final FFT is computed.
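The sizing arithmetic above, and the trade-off it creates, can be checked with a few lines of C. This is a minimal sketch, not DynLab's actual code: it assumes power-of-two FFT sizes and a radix-2 merge in which each successive kernel doubles the transform size, and the function names `max_points_per_group` and `merge_stages` are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

/* Bytes occupied by one complex point: a float real part plus a float imaginary part. */
#define BYTES_PER_POINT (2 * sizeof(float))

/* Largest power-of-two FFT size whose points fit in local_mem_bytes.
   Returns 0 if not even one point fits. */
size_t max_points_per_group(size_t local_mem_bytes)
{
    size_t capacity = local_mem_bytes / BYTES_PER_POINT;
    size_t points;
    if (capacity == 0)
        return 0;
    points = 1;
    while (points * 2 <= capacity)
        points *= 2;
    return points;
}

/* Number of merge stages needed to combine per-group FFTs into one
   total_points FFT. Assumption: each merge stage doubles the transform
   size (a radix-2 merge), which may not match the real kernels. */
unsigned merge_stages(size_t total_points, size_t points_per_group)
{
    unsigned stages = 0;
    assert(points_per_group > 0);
    while (points_per_group < total_points) {
        points_per_group *= 2;
        stages++;
    }
    return stages;
}
```

For a hypothetical 1M-point (2^20) transform, 32 kB of local memory gives 4096-point groups and 8 merge stages, while 16 kB gives 2048-point groups and 9 merge stages. Under these assumptions, shrinking local memory adds stages, which is why the measured speedup is counterintuitive.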

I'd assumed that each work-group should perform as large an FFT as possible, since that would mean less synchronization and fewer successive stages. But when I experimented, the opposite held true: as I reduced the amount of local memory allocated to each group, FFT performance improved.

I have a theory. The more local memory each work-group has, the more each work-item needs to read from global memory. Ideally, the work-items in a work-group combine (coalesce) their read requests so that the group's memory accesses are serviced together. But in my FFT, the repeated iterations may end up producing staggered global memory accesses, which are very time-consuming. Further experiments are needed.
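Combining read requests is usually called memory coalescing, and a toy model shows why staggered addresses hurt. The C sketch below is not a measurement and not the FFT kernel's real access pattern: it assumes a hypothetical 128-byte memory transaction and counts how many transactions one work-group needs in a single iteration when neighbouring work-items read neighbouring points (interleaved) versus when each work-item walks its own contiguous block (blocked). The function name `segments_touched` and both patterns are illustrative assumptions.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical model parameters: real hardware differs by vendor and generation. */
#define SEGMENT_BYTES 128u               /* assumed size of one memory transaction */
#define POINT_BYTES (2u * sizeof(float)) /* one complex float point */

/* Counts distinct SEGMENT_BYTES-sized segments touched when work-items
   0..group_size-1 each read one point during iteration `step`.
   interleaved != 0: work-item i reads point step*group_size + i.
   interleaved == 0: work-item i owns points_per_item contiguous points
                     and reads point i*points_per_item + step. */
size_t segments_touched(size_t group_size, size_t points_per_item,
                        size_t step, int interleaved)
{
    size_t count = 0;
    size_t last = (size_t)-1;
    for (size_t i = 0; i < group_size; i++) {
        size_t point = interleaved ? step * group_size + i
                                   : i * points_per_item + step;
        size_t segment = point * POINT_BYTES / SEGMENT_BYTES;
        /* Addresses never decrease as i grows in either pattern, so a new
           segment appears exactly when the segment index changes. */
        if (segment != last) {
            count++;
            last = segment;
        }
    }
    return count;
}
```

With 64 work-items each responsible for 64 points, the interleaved pattern touches 4 segments per iteration while the blocked pattern touches 64. That is 16 times as many transactions for the same amount of data, and the gap grows as each work-item's block (and therefore the group's local memory) gets larger.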

>> Wednesday, July 13, 2011

Dear Khronos Group,

I'm a devoted fan of your technologies, from COLLADA to OpenGL to OpenCL. I applaud your commitment to open standards and the free support you provide through your forums. Like academics and enthusiasts throughout the world, I admire all you've accomplished.

But professionals (excluding micro-entrepreneurs like myself) don't admire you. They appreciate Microsoft and the Visual Studio framework for software development. With Visual Studio, developers can not only access all of Microsoft's technologies but also incorporate them into professional applications. Microsoft's range of technologies can't compete with yours, but Microsoft always wins in the end, not because of its technology focus but because of its developer focus.

Here's a case in point. I downloaded a set of example OpenGL applications from your khronos.org site. I'm impressed with how far OpenGL has come since the disastrous 3.0 release, but there's a problem: every example requires the OpenGL Framework (GLF), which requires GLUT. This is a disgrace.

GLUT was created as a teaching tool for OpenGL, and it serves this purpose well. But its features haven't progressed to a level anyone would consider professional. I've spent a lot of time evaluating different frameworks that support OpenGL rendering, but I'm not 100% satisfied with any of them. To access your technology, I need to make trade-offs in performance and capability that no Windows developer would ever worry about.

And you're in luck. Nokia, Qt's primary guardian, has partnered with Microsoft on its mobile platform. This means that from now on, Nokia's smartphones will run Windows Phone rather than Qt-based software. Nokia's leadership has stated that Qt support is still a priority, but I'll bet Qt's lead developers would rather work with you than with their former sponsor.

Qt has been around since the mid-1990s and has an established developer base, so you wouldn't have to put any effort into marketing or bug fixing. All you'd have to do is integrate your technologies so developers can easily code full-featured applications with them. This wouldn't be hard. Qt already provides access to OpenGL rendering, and there's even a preliminary Qt library that calls OpenCL functions. But neither of these features is perfectly accessible because no one is making integration a serious priority. If you took the reins, however, that would change.

You may think these concerns are beneath your notice, Khronos Group, but if you don't pay attention to your developers' needs, developers will stop paying attention to you.

>> Tuesday, July 12, 2011

I'd heard whispers about OpenCL running in a browser, but I figured it would take months if not years to see any real code. So I was pleasantly shocked when Nokia released a WebCL implementation that runs in Firefox. I haven't figured out how this will set the world on fire, but even if no one takes advantage of it, the technology is astounding.

I found a CNET article that discusses WebCL. It doesn't say anything particularly profound, but one conclusion is clear: OpenCL is gaining momentum.

Reviews of the desktop AMD Fusion A8-3850 have been trickling in, and they're positive for the most part. Tom's Hardware and AnandTech agree that the device is great for entry-level systems but doesn't compare to a discrete CPU and GPU combination.

I'm impressed with the technical discussion of the chip at Real World Technologies. My sole interest in the Fusion is its ability to process OpenCL kernels, so CPU-GPU integration is a major concern for me. I'm glad that data moves between the Fusion's CPU and GPU faster than it moves between a CPU and a discrete GPU (8 GB/s instead of 6 GB/s), but I'd hoped for better. Intel's Sandy Bridge chips provide tighter integration, but from what I've seen, their integrated GPUs don't support OpenCL yet.
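To put the 8 GB/s and 6 GB/s figures in perspective, the small C helper below converts a bandwidth into an ideal transfer time. It is a back-of-the-envelope sketch that assumes the quoted rates are sustained peaks and ignores latency and driver overhead; the function name `transfer_ms` is hypothetical.

```c
#include <assert.h>
#include <stddef.h>

/* Ideal time in milliseconds to move `bytes` at a sustained rate of
   gb_per_s gigabytes per second (1 GB = 1e9 bytes).
   Assumption: no latency or setup cost; real transfers rarely hit the peak. */
double transfer_ms(double bytes, double gb_per_s)
{
    assert(gb_per_s > 0.0);
    return bytes / (gb_per_s * 1e9) * 1000.0;
}
```

Moving a hypothetical 1 GB dataset takes 125 ms at 8 GB/s versus about 167 ms at 6 GB/s, roughly a 25% saving per transfer. That matters for kernels that shuttle data back and forth every frame, but it is still far below the bandwidth a GPU has to its own memory.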