Monday, 30 March 2009

For those of you who cannot run the demo for some reason, I captured a short video of it. You can watch it in the window below or download the larger, higher-quality version to see more detail.

Saturday, 28 March 2009

Today I would like to share a couple of interesting references on optimizing CUDA. There are many similarities among these presentations, but they are still worth reading, as going through them gives you new ideas about what is possible.

Looking at slide 3 of the first presentation, using the GPU should give an average speedup of about a factor of 10 compared to the CPU, provided the algorithm can be fully SIMD-parallelized (GPU: GTX280, 933 GFlops / 141.7 GB/s memory bandwidth; CPU: Intel Core 2 QX9650, 96 GFlops / 12.8 GB/s memory bandwidth).

Now, looking at NVIDIA's CUDA page, I am often surprised to see that some algorithms appear to have been sped up 100x or even more compared to the CPU. That is rather hard to believe, given the numbers above.
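As a quick sanity check of the factor-10 estimate, the peak figures quoted above can simply be divided. The sketch below is plain C++ and uses only the numbers from the slide; the names `PeakSpec`, `compute_ratio` and `bandwidth_ratio` are made up for this illustration, and peak figures say nothing about what a real kernel achieves.

```cpp
#include <cassert>
#include <cstdio>

// Peak figures as quoted in the post (taken from the slide, not measured here).
struct PeakSpec {
    double gflops;        // peak arithmetic throughput in GFlops
    double gbytes_per_s;  // peak memory bandwidth in GB/s
};

// Upper bound on the speedup of a purely compute-bound algorithm.
double compute_ratio(const PeakSpec& gpu, const PeakSpec& cpu) {
    assert(cpu.gflops > 0.0);
    return gpu.gflops / cpu.gflops;
}

// Upper bound on the speedup of a purely bandwidth-bound algorithm.
double bandwidth_ratio(const PeakSpec& gpu, const PeakSpec& cpu) {
    assert(cpu.gbytes_per_s > 0.0);
    return gpu.gbytes_per_s / cpu.gbytes_per_s;
}

// Prints both ratios for the two chips named in the post.
void print_ratios() {
    const PeakSpec gtx280{933.0, 141.7};
    const PeakSpec qx9650{96.0, 12.8};
    std::printf("compute:   %.1fx\n", compute_ratio(gtx280, qx9650));
    std::printf("bandwidth: %.1fx\n", bandwidth_ratio(gtx280, qx9650));
}
```

Both ratios come out at roughly 10 (about 9.7 for compute and about 11.1 for bandwidth), so a reported 100x most likely means the CPU baseline was far from its own peak.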

This time the scene is the complex version of the one shown in the pictures below (spherescape_complex.rle4).

The low CPU performance is mostly due to the many floating-point operations, I guess. Changing the calculations to integer might improve the speed. For now, however, it is the fairest possible comparison, since CPU and GPU get the same C++ code to execute.

Wednesday, 18 March 2009

Small update: the demo linked below now also includes 2xAA (not 2x2!), which significantly reduces the aliasing of distant pixels. On the GTS 8800 it is quite slow right now, but on the GTX285 I found almost no difference to the normal version. For the GTS I will perhaps think about applying AA only to distant geometry to increase the speed.
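To make the idea of 2xAA with an optional "distant geometry only" switch concrete, here is a minimal CPU-side sketch. It is not the demo's code: the sample positions inside the pixel, the `Sample` struct, the `TraceFn` callback and the `aa_min_depth` threshold are all assumptions for illustration.

```cpp
#include <cassert>
#include <functional>

// Result of tracing one ray: a color plus the hit distance.
struct Sample {
    float r, g, b;
    float depth;
};

// Hypothetical tracer callback: returns the sample for a ray through the
// image-plane position (x, y), given in pixel units.
using TraceFn = std::function<Sample(float, float)>;

// Shades one pixel with two samples (2xAA, not 2x2). The second ray is only
// traced when the first hit is at least aa_min_depth away, so near geometry
// costs a single ray. aa_min_depth = 0 applies AA everywhere.
Sample shade_pixel(int px, int py, float aa_min_depth, const TraceFn& trace) {
    assert(aa_min_depth >= 0.0f);
    // Assumed pattern: two samples on the pixel diagonal. For simplicity the
    // single-sample case reuses the first position instead of the center.
    Sample a = trace(px + 0.25f, py + 0.25f);
    if (a.depth < aa_min_depth) {
        return a;  // near geometry: skip the second ray
    }
    Sample b = trace(px + 0.75f, py + 0.75f);
    return {0.5f * (a.r + b.r), 0.5f * (a.g + b.g), 0.5f * (a.b + b.b),
            0.5f * (a.depth + b.depth)};
}
```

With a threshold of 0 every pixel pays for two rays; raising it trades quality on near surfaces for speed, which is the trade-off considered for the slower card.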

Tuesday, 17 March 2009

Today I finished moving the ray generation part to the GPU, saving another 1-4 ms as well as an unnecessary memory copy. Silhouette smoothing is also working well, together with basic anti-aliasing (so far only for GTX2xx cards).
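For readers wondering what "ray generation" involves: each ray's direction can be computed from its pixel coordinates alone, so nothing has to be prepared on the CPU and copied over. The sketch below assumes a generic pinhole camera (at the origin, looking down -z, +y up), which is not necessarily the setup used in the demo. It is written as plain C++ so it runs on the CPU; on the GPU the same function body would be evaluated once per thread. `Vec3` and `primary_ray_dir` are hypothetical names.

```cpp
#include <cassert>
#include <cmath>

struct Vec3 {
    float x, y, z;
};

// Unit direction of the primary ray through the center of pixel (px, py).
// Assumed camera: at the origin, looking down -z, +y up, with a vertical
// field of view of fov_y radians. Pixel (0, 0) is the top-left corner.
Vec3 primary_ray_dir(int px, int py, int width, int height, float fov_y) {
    assert(width > 0 && height > 0);
    const float aspect = static_cast<float>(width) / static_cast<float>(height);
    const float tan_half = std::tan(0.5f * fov_y);

    // Map the pixel center to the image plane at z = -1.
    const float sx = (2.0f * (px + 0.5f) / width - 1.0f) * aspect * tan_half;
    const float sy = (1.0f - 2.0f * (py + 0.5f) / height) * tan_half;

    // Normalize so that distances along the ray are in world units.
    const float len = std::sqrt(sx * sx + sy * sy + 1.0f);
    return {sx / len, sy / len, -1.0f / len};
}
```

Because the function depends only on its arguments, it needs no per-frame buffer of precomputed rays.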

As for the smoothing, I tried two variants (left) and found that the one in the middle looks best so far. The unsmoothed original (top) is too edgy, and the one on the bottom smooths too much for the tree scene, which makes nearby geometry look like a 2D impostor.

View distance set to 4,000,000 and still interactive (18 fps). Having unique voxels everywhere is a problem in this case, however.

Here we can also see an advantage of the RLE structure: it is very easy to generate procedural mountains. With octree raycasting it might be possible too, but right now I do not have an idea how this could work easily.
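Why this is easy with RLE can be shown in a few lines: a heightfield mountain is just one solid run per (x, z) column, from the ground up to the terrain height. The sketch below is only an illustration under assumptions; the `Run` layout is not the actual .rle4 format, and the sine-based `terrain_height` is a hypothetical stand-in for a real noise function.

```cpp
#include <cassert>
#include <cmath>
#include <cstdint>
#include <vector>

// One run of solid voxels in a vertical column (assumed layout).
struct Run {
    std::uint16_t start;   // height of the first solid voxel
    std::uint16_t length;  // number of consecutive solid voxels
};
using Column = std::vector<Run>;

// Hypothetical height function: two sine waves mapped to [0, max_height - 1].
int terrain_height(int x, int z, int max_height) {
    const float h = 0.5f + 0.25f * std::sin(x * 0.05f) + 0.25f * std::sin(z * 0.07f);
    return static_cast<int>(h * (max_height - 1));
}

// Builds size_x * size_z columns, each holding a single run from the ground
// up to the terrain height. Column (x, z) is stored at index z * size_x + x.
std::vector<Column> make_mountains(int size_x, int size_z, int max_height) {
    assert(size_x > 0 && size_z > 0);
    assert(max_height > 0 && max_height <= 65535);
    std::vector<Column> columns(static_cast<std::size_t>(size_x) * size_z);
    for (int z = 0; z < size_z; ++z) {
        for (int x = 0; x < size_x; ++x) {
            const int h = terrain_height(x, z, max_height);
            columns[static_cast<std::size_t>(z) * size_x + x].push_back(
                {0, static_cast<std::uint16_t>(h + 1)});
        }
    }
    return columns;
}
```

Each column is generated independently and costs a single run regardless of its height, whereas an octree would have to be subdivided down to the surface to represent the same terrain.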