
In Part I, I covered Bridson’s “curl noise” method for particle advection, describing how to build a static grid of velocity vectors. I portrayed the construction process as an acyclic image processing graph, where the inputs are volumetric representations of obstacles and turbulence.

The demo code in Part I was a bit lame, since it moved particles on the CPU. In this post, I show how to perform advection on the GPU using OpenGL’s transform feedback. For more complex particle management, I’d probably opt for OpenCL/CUDA, but for something this simple, transform feedback is the easiest route to take.

Initialization

In my particle simulation, the number of particles remains fairly constant, so I decided to keep it simple by ping-ponging two statically sized VBOs. The beauty of transform feedback is that the two VBOs can stay in dedicated graphics memory; no bus travel.
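If the ping-pong idea is new to you, here is a minimal CPU-side sketch of it in plain C. It is only an analogy: two plain arrays stand in for the two VBOs, and `sample_velocity` is a hypothetical stand-in (a simple swirl) for the precomputed curl-noise grid. The names `Particle`, `PingPong`, `advect`, and `step` are made up for illustration and are not taken from the demo.

```c
#include <assert.h>
#include <stddef.h>

typedef struct { float x, y, z; } Particle;

/* Hypothetical velocity field: a rigid swirl around the Y axis.
   The real demo samples a precomputed curl-noise grid instead. */
static void sample_velocity(const Particle *p, float v[3])
{
    v[0] = -p->z;
    v[1] = 0.0f;
    v[2] = p->x;
}

/* One advection pass: read every particle from src and write the moved
   particle to dst. src is never modified, which mirrors how transform
   feedback reads from one buffer and captures into the other. */
static void advect(const Particle *src, Particle *dst, size_t count, float dt)
{
    assert(src != dst);
    for (size_t i = 0; i < count; ++i) {
        float v[3];
        sample_velocity(&src[i], v);
        dst[i].x = src[i].x + v[0] * dt;
        dst[i].y = src[i].y + v[1] * dt;
        dst[i].z = src[i].z + v[2] * dt;
    }
}

/* The two fixed-size buffers and their current roles. */
typedef struct {
    Particle *read;   /* source of this pass       */
    Particle *write;  /* destination of this pass  */
} PingPong;

/* Advance the simulation by one step, then swap roles so the freshly
   written buffer becomes the source of the next pass. */
static void step(PingPong *pp, size_t count, float dt)
{
    advect(pp->read, pp->write, count, dt);
    Particle *tmp = pp->read;
    pp->read = pp->write;
    pp->write = tmp;
}
```

Neither buffer is ever resized or copied; only the two pointers trade places each frame, which is what lets the GPU version keep both VBOs resident in graphics memory.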

In the days before transform feedback (and CUDA), the only way to achieve GPU-based advection was sneaky usage of the fragment shader and a one-dimensional FBO. Those days are long gone — OpenGL now allows you to effectively shut off the rasterizer, performing advection completely in the vertex shader and/or geometry shader.

The first step is creating the two ping-pong VBOs, which is done like you’d expect:

Note that I provided some initial seed data in ParticleBufferA, but I left ParticleBufferB uninitialized. This initial seeding is the only CPU-GPU transfer in the demo.

By the way, I don’t think the GL_STREAM_DRAW hint really matters; most drivers are smart enough to manage memory in a way that they think is best.

The only other initialization task is binding the outputs from the vertex shader (or geometry shader) with glTransformFeedbackVaryings. Watch out because this needs to take place after you compile the shaders, but before you link them:

Yep, that’s an OpenGL program object that only has a vertex shader attached; no fragment shader!

I realize it smells suspiciously like Hungarian, but I like to prefix my vertex shader outputs with a lowercase “v”, geometry shader outputs with lowercase “g”, etc. It helps me avoid naming collisions when trickling a value through the entire pipe.

Advection Shader

The vertex shader for noise-based advection is crazy simple. I stole the randhash function from a Robert Bridson demo; it was surprisingly easy to port to GLSL.

Note the sneaky usage of gl_VertexID to help randomize the seed. Cool eh?
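For anyone who wants to poke at this kind of hash off the GPU, here is a rough C rendition. The constants and shifts are written from memory of Bridson's sample code, so treat them as an assumption rather than a verified copy. `particle_seed` is a hypothetical helper showing one way to mix a time value with a per-particle index, which is the role gl_VertexID plays in the shader.

```c
#include <stdint.h>

/* Integer hash in the style of Bridson's randhash.
   Constants and shift amounts are recalled from memory (assumption). */
static uint32_t randhash(uint32_t seed)
{
    uint32_t i = (seed ^ 12345391u) * 2654435769u;
    i ^= (i << 6) ^ (i >> 26);
    i *= 2654435769u;
    i += (i << 5) ^ (i >> 12);
    return i;
}

/* Map the hash to a float in the range [0, b]. */
static float randhashf(uint32_t seed, float b)
{
    return b * ((float)randhash(seed) / 4294967296.0f);
}

/* Hypothetical seed mixer: combine a millisecond clock with the
   particle's index so that particles respawned in the same frame
   still get different random values. */
static uint32_t particle_seed(uint32_t time_ms, uint32_t particle_index)
{
    return time_ms + particle_index;
}
```

The hash is stateless, so every particle can compute its own random number independently. That property is what makes it usable inside a vertex shader, where there is no shared generator state to advance.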

Using Transform Feedback

Now let’s see how to apply the above shader from your application code. You’ll need to use three functions that you might not be familiar with: glBindBufferBase specifies the target VBO, and gl{Begin/End}TransformFeedback delimit the draw call that performs advection. I’ve highlighted these calls below, along with the new enable (GL_RASTERIZER_DISCARD) that allows you to turn off rasterization:

In my case, rendering the particles was definitely the bottleneck; the advection was insanely fast. As covered in Part I, I use the geometry shader to extrude points into view-aligned billboards that get stretched according to the velocity vector. An interesting extension to this approach would be to keep a short history (3 or 4 positions) with each particle, allowing nice particle trails, also known as “particle traces”. This brings back memories of the ASCII snake games of my youth (does anyone remember QBasic Nibbles?)
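As a rough illustration of the history idea, here is a small ring buffer in C that remembers the last four positions of a particle. This is a sketch of the data structure only, not something the demo implements; `ParticleTrail`, `TRAIL_LENGTH`, and the function names are hypothetical. On the GPU, the same thing would presumably be done with a few extra vertex attributes per particle that get shifted along on each advection pass.

```c
#include <assert.h>

#define TRAIL_LENGTH 4  /* "3 or 4 positions", per the text above */

typedef struct { float x, y, z; } Vec3;

typedef struct {
    Vec3 history[TRAIL_LENGTH];
    int head;   /* slot holding the newest position      */
    int count;  /* number of valid entries, <= TRAIL_LENGTH */
} ParticleTrail;

/* Start a trail with a single known position. */
static void trail_init(ParticleTrail *t, Vec3 start)
{
    t->head = 0;
    t->count = 1;
    t->history[0] = start;
}

/* Record a new position, overwriting the oldest one once full. */
static void trail_push(ParticleTrail *t, Vec3 p)
{
    t->head = (t->head + 1) % TRAIL_LENGTH;
    t->history[t->head] = p;
    if (t->count < TRAIL_LENGTH)
        t->count++;
}

/* Fetch a position by age: 0 is the newest, count - 1 the oldest. */
static Vec3 trail_get(const ParticleTrail *t, int age)
{
    assert(age >= 0 && age < t->count);
    int idx = (t->head - age + TRAIL_LENGTH) % TRAIL_LENGTH;
    return t->history[idx];
}
```

Walking ages 0 through count - 1 yields the points of the trail from head to tail, which is the order you would want when extruding a ribbon or a chain of billboards.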

Well, that’s about it! Please realize that I’ve only covered the simplest possible usage of transform feedback. OpenGL 4.0 introduced much richer functionality, allowing you to intermingle several VBOs in any way you like and to execute draw calls without any knowledge of buffer size. If you want to learn more, check out this nice write-up from Christophe Riccio, where he describes the evolution of transform feedback:

Downloads

The first time you run my demo, it’ll take a while to initialize because it needs to construct the velocity field. On subsequent runs, it loads the data from an external file, so it starts up quickly. Note that I’m using the CPU to generate the velocity field; performing it on the GPU would be much, much faster.

I’ve tested the code with Visual Studio 2010. It uses CMake for the build system.