Mathematica’s GPU programming integration is not just about performance. Yes, of course, with GPU power you get some of your answers several times faster than before, but that is only half the story.

The heart of the integration is the full automation of the GPU function development process. With proper hardware, you can write, compile, test, and run your code in a single transparent step. There is no need to worry about details such as memory allocation or library binding. Mathematica handles them for you. As a developer, you will be able to focus on developing and optimizing your algorithms, and nothing else.

Here are a couple of examples to give you a taste of the upcoming feature.
We have often been asked whether it is possible to do 3D volumetric rendering with Mathematica. It’s a natural question, considering that Mathematica has supported the DICOM format for a while. There have been a few solutions, but none was perfect or fully integrated with Mathematica, until now. With the power of the GPU, you can embed interactive volumetric images into your notebook.

Another example is ray tracing. The following image shows a quaternion Julia set ray-traced on the GPU. We can perform real-time ray tracing and maintain an interactive frame rate on consumer-level graphics hardware.

Of course, this is just the tip of the iceberg. We chose these examples simply because they look beautiful. You will be able to see more examples and applications during the NVIDIA conference, so again, stop by booth #31 if you get a chance.

Not attending the NVIDIA conference, but interested in learning more? There’s still time to register for the Wolfram Technology Conference 2010, taking place in October, and learn how you can tap into the power of the GPU in Mathematica.

Impressive. Obviously, relative to Mathematica 7.0.1, the software needs a massive speedup, and perhaps other improvements, if it is going to be useful for creating animated graphics and the like. That is something Mathematica is, in principle, designed to do nicely, but its usefulness is greatly reduced by the current limitations in speed, GPU support, and parallelism.

I have no doubt that impressive things can come of this, but the real question is: how hard will it be to code for massively parallel systems? Mathematica has long had the ability to run on multiple processes/cores, but almost nobody ever uses it because it has always been a hassle.

So Jan and Lubos hint at the issue that is of more interest to most of us. It’s nice to have some sort of support for GPUs in Mathematica, but this is somewhat specialized, and what most of us care about much more is much better utilization of the parallelism in existing machines, for something other than dense machine-precision linear algebra and FFTs.

Which raises the real question. Obviously, at the 1000-mile level, Mathematica provides a perfect environment for programming GPUs: express what you want via Table or Map or some similar primitive, and have the system run each sub-calculation in parallel on one of the GPU’s execution engines.
Of course it’s not quite that trivial, because there is the problem of exactly what each of those sub-calculations looks like, and (right now) they usually boil down, eventually, to pattern matching inside the Mathematica Kernel, which has not (so far?) been parallelized.
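
To make the Table/Map idea concrete, here is a minimal sketch, written in Python rather than Mathematica, with a thread pool standing in for the GPU's execution engines. The names `parallel_map` and `square_plus_one` are hypothetical and chosen only for illustration; nothing here reflects how Mathematica actually dispatches work. It only shows the programming model: when each sub-calculation is independent, a parallel map gives the same result as a sequential one.

```python
from concurrent.futures import ThreadPoolExecutor


def square_plus_one(x):
    # A pure function: the result depends only on the argument.
    return x * x + 1


def parallel_map(func, items, workers=4):
    # Apply func to every item using a pool of workers.
    # executor.map returns results in input order, as Map or Table would.
    with ThreadPoolExecutor(max_workers=workers) as executor:
        return list(executor.map(func, items))


inputs = list(range(10))
sequential = [square_plus_one(x) for x in inputs]
parallel = parallel_map(square_plus_one, inputs)
```

The two results agree only because `square_plus_one` touches no shared state. A function that updates a global on every call could not safely be handed to the pool this way, and the thread pool here illustrates the model, not the speedup.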

So the real issue of interest to us is: to get this to work, did you simply slap on a band-aid (somewhat like ParallelMap/ParallelTable/etc.), or did you do the low-level work to make sure that most “reasonable” Mathematica code will be dramatically sped up?
AND if that is the case, does this mean we will see this same technology applied to the rest of Mathematica?
Even apart from parallelizing the kernel (admittedly a hard problem), there is a lot of low-hanging fruit that, even in version 7, was not being picked.
Obvious examples include:
- using SSE/AVX when there will be a lot of parallel computation (e.g. in filling in a Table, in numeric integration, or in various plots, especially 3D and density-type plots)
- likewise, at a higher level, using multiple CPUs for numeric integration and plots, for anything that involves searching over a large space (NMinimize, numeric integration), and for generating large quantities of randomness and then random numbers, etc.
- at the very least, giving us versions of ParallelMap etc. that all see a SINGLE Mathematica kernel, so we don’t have to go through this nonsense of exporting the relevant definitions to the other kernels (not to mention the wasted memory).

Obviously, changing things for parallelism means that various items will break. Things broke frequently with early upgrades to Mathematica and, you know what, it was worth it. If the user wants to write weird code that involves the update of a global by every successive call to a function being plotted, or whatever, give them the means to do so. But the bulk of us writing “normal” code shouldn’t have to be held back by such weird edge cases.

So there you are: that’s what would REALLY excite us. Not being told that we can write to CUDA for some set of problems of interest to but a small fraction of us, but being told that Mathematica 8 is the Parallelism Edition, and that pretty much anything will run a factor of N faster, where N is your number of CPUs, with, for many purposes, a further factor of 2 from use of SSE (and a factor of 4 coming with AVX in Sandy Bridge).
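
The arithmetic behind that wish can be sketched as follows, in Python for convenience. The function names `simd_lanes` and `ideal_speedup` are hypothetical, and the model assumes perfectly parallel, double-precision code with no memory or scheduling overhead, so it gives an upper bound rather than a prediction. The factors of 2 and 4 correspond to the number of 64-bit doubles that fit in a 128-bit SSE register and a 256-bit AVX register.

```python
def simd_lanes(register_bits, element_bits=64):
    # Number of elements one SIMD instruction can process at once.
    return register_bits // element_bits


def ideal_speedup(cpu_count, register_bits):
    # Best case: every CPU is busy and every SIMD lane is used.
    return cpu_count * simd_lanes(register_bits)


sse_speedup = ideal_speedup(cpu_count=4, register_bits=128)  # 4 CPUs with SSE
avx_speedup = ideal_speedup(cpu_count=4, register_bits=256)  # 4 CPUs with AVX
```

Under these idealized assumptions, a four-CPU machine tops out at 8x with SSE and 16x with AVX. Real code would fall short of that wherever it is serial or memory-bound.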

Just read the whitepaper; to be honest, I’m less than impressed. All the additions are nice, but the claim that “users are not exposed to low-level complexities of GPU programming”, as Herbert mentioned above? Please, gimme a break… So we are now spared from managing data transfers between CPU and GPU memory, and some rudimentary Mathematica functionality (basically BLAS and FFT, as well as some image operations) is now CUDA-enabled. But most of the time we’re still supposed to write CUDA kernels by hand, and if I have to do that, I’d probably keep doing everything else by hand too. This other stuff is not that hard after all, and this way at least I have all the other tools (debugger, profiler, etc.) from the CUDA C SDK at my disposal… So, again: this is all great and welcome, but the AccelerEyes guys with their Jacket plugin for Matlab definitely have a strong edge here.