
CUDA, Supercomputing for the Masses: Part 18

By Rob Farber, May 27, 2010

Using Vertex Buffer Objects with CUDA and OpenGL

Using Primitive Restart for 3D Performance

As mentioned in the introduction, the examples in this article use an OpenGL extension called "primitive restart" to minimize communication across the PCIe bus and speed rendering. Simply put, primitive restart allows the programmer to specify a data value that the OpenGL state machine interprets as a token indicating that the current graphics primitive is complete. The next data item is then treated as the start of a new graphics primitive. Valid graphics primitives include GL_TRIANGLE_STRIP, GL_TRIANGLE_FAN, and others.
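To make these semantics concrete, the following C function emulates on the CPU what happens to a GL_TRIANGLE_STRIP index list when primitive restart is active: each restart token ends the current strip, and every index after the first two of a strip completes one triangle. It is an illustrative sketch, not part of the article's source. The function name expand_strip_with_restart is hypothetical, and the code assumes 65535 as the restart value and the common convention of swapping the first two vertices of every odd-numbered triangle so that all triangles keep the same winding order.

```c
#include <stddef.h>

#define RESTART_INDEX 65535u /* assumed restart token, same value as in qIndices */

/* Hypothetical helper (illustration only): expand a GL_TRIANGLE_STRIP index
   list that may contain restart tokens into an explicit triangle list.
   Writes three indices per triangle into tris and returns the triangle
   count. tris needs room for 3 * n indices in the worst case. */
size_t expand_strip_with_restart(const unsigned *idx, size_t n, unsigned *tris)
{
    size_t count = 0;
    size_t in_strip = 0;    /* vertices seen since the last restart */
    unsigned a = 0, b = 0;  /* the two most recent vertices of the strip */
    for (size_t i = 0; i < n; ++i) {
        if (idx[i] == RESTART_INDEX) {  /* token: the next index starts a new strip */
            in_strip = 0;
            continue;
        }
        unsigned c = idx[i];
        if (in_strip >= 2) {
            size_t t = in_strip - 2;    /* triangle number within this strip */
            /* Swap the first two vertices of odd triangles so every
               triangle keeps the same winding order. */
            tris[3 * count + 0] = (t % 2 == 0) ? a : b;
            tris[3 * count + 1] = (t % 2 == 0) ? b : a;
            tris[3 * count + 2] = c;
            ++count;
        }
        a = b;
        b = c;
        ++in_strip;
    }
    return count;
}
```

Applied to the nine-element qIndices array from the next example, it yields four triangles: (0, 1, 2), (2, 1, 3), (2, 3, 4) and (4, 3, 5).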

The following illustrates this process. Assume the variable qIndices contains the indices of the data points used to draw a triangle strip:

unsigned int qIndices[] = { 0, 1, 2, 3, 65535, 2, 3, 4, 5 };
GLsizei size = sizeof(qIndices) / sizeof(qIndices[0]); /* 9 elements */

Without primitive restart, 65535 is treated as an ordinary vertex index, so the call to glDrawElements shown below draws seven triangles from the nine indices. Note: size is the number of elements in qIndices.

glDrawElements(GL_TRIANGLE_STRIP, size, GL_UNSIGNED_INT, qIndices);

The following code snippet calls glPrimitiveRestartIndexNV to specify that the value 65535 (passed via RestartIndex) is the primitive restart token. glEnableClientState is then called to tell the OpenGL state machine to start using primitive restart:

glPrimitiveRestartIndexNV(RestartIndex);
glEnableClientState(GL_PRIMITIVE_RESTART_NV);

Now a single call to glDrawElements using qIndices draws four triangles, because the value 65535 tells OpenGL to act as if two separate glDrawElements calls had been made, one for { 0, 1, 2, 3 } and one for { 2, 3, 4, 5 }:

glDrawElements(GL_TRIANGLE_STRIP, size, GL_UNSIGNED_INT, qIndices);
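The triangle counts quoted above follow from a simple rule: a triangle strip of n indices produces n - 2 triangles, and with primitive restart enabled the rule applies separately to each segment between tokens. For the nine-element qIndices array, whose restart token splits it into two segments of four indices each:

```latex
T_{\text{no restart}} = n - 2 = 9 - 2 = 7
\qquad
T_{\text{restart}} = \sum_{k} (n_k - 2) = (4 - 2) + (4 - 2) = 4
```

In the first case the token occupies position 4 of the strip, so three of the seven triangles reference vertex 65535 as if it were real data.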

The primitive restart approach has several advantages:

All control tokens and data for viewing can be generated and kept on the GPU.

Variable numbers of items can be specified between the primitive restart tokens. This allows irregular grids and surfaces to be drawn, because arbitrary numbers of line segments, triangle strips, triangle fans, etc., can be specified depending on the drawing mode passed to glDrawElements.

Rendering performance can be optimized by arranging the indices to achieve the highest reuse of the data cache in the texture units.

Higher quality images can be created by alternating the direction of tessellation as noted in the primitive restart specification and illustrated in Figures 6 and 7.

Figure 6: Two strips.

Figure 7: Four fans (center marked with filled circle).
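The second and fourth advantages can be sketched on the CPU side. The two functions below are hypothetical helpers, not code from the article's source. The first builds triangle-strip indices for an irregular grid whose rows hold different numbers of vertices; the second builds one triangle fan per 3x3 block of a regular grid. The fan layout is an assumption about the kind of pattern Figure 7 shows, chosen because triangles radiating from each center alternate the diagonal direction from one cell to the next. Both assume vertices stored row by row and 65535 as the restart token.

```c
#include <stddef.h>

#define RESTART_INDEX 65535u /* assumed restart token */

/* Hypothetical helper: GL_TRIANGLE_STRIP indices for a grid whose rows may
   hold different numbers of vertices (stored row by row). Each pair of
   adjacent rows becomes one strip over the columns both rows share, and
   strips are separated by RESTART_INDEX. Returns the index count. */
size_t build_irregular_strips(const unsigned *widths, size_t rows, unsigned *out)
{
    size_t n = 0;
    unsigned row_start = 0;
    for (size_t r = 0; r + 1 < rows; ++r) {
        unsigned next_start = row_start + widths[r];
        unsigned cols = widths[r] < widths[r + 1] ? widths[r] : widths[r + 1];
        if (cols >= 2) {                          /* narrower pairs form no triangles */
            if (n > 0) out[n++] = RESTART_INDEX;  /* close the previous strip */
            for (unsigned c = 0; c < cols; ++c) {
                out[n++] = row_start + c;         /* vertex on row r */
                out[n++] = next_start + c;        /* matching vertex on row r + 1 */
            }
        }
        row_start = next_start;
    }
    return n;
}

/* Hypothetical helper: GL_TRIANGLE_FAN indices for a (2*bx+1) x (2*by+1)
   vertex grid, one 8-triangle fan per 3x3 block of vertices, with fans
   separated by RESTART_INDEX. Returns the index count. */
size_t build_fan_grid(unsigned bx, unsigned by, unsigned *out)
{
    /* Neighbours of the center walked in one direction; the first one is
       repeated at the end to close the fan. */
    static const int ring[9][2] = {
        {0, 1}, {-1, 1}, {-1, 0}, {-1, -1},
        {0, -1}, {1, -1}, {1, 0}, {1, 1}, {0, 1}
    };
    unsigned width = 2 * bx + 1;
    size_t n = 0;
    for (unsigned j = 0; j < by; ++j) {
        for (unsigned i = 0; i < bx; ++i) {
            int cr = 2 * (int)j + 1, cc = 2 * (int)i + 1;  /* fan center (row, col) */
            if (n > 0) out[n++] = RESTART_INDEX;
            out[n++] = (unsigned)cr * width + (unsigned)cc;
            for (int k = 0; k < 9; ++k)
                out[n++] = (unsigned)(cr + ring[k][0]) * width
                         + (unsigned)(cc + ring[k][1]);
        }
    }
    return n;
}
```

For row widths {3, 3, 2}, build_irregular_strips produces the strips 0, 3, 1, 4, 2, 5 and 3, 6, 4, 7 separated by one token. For a 5x5 grid (bx = by = 2) build_fan_grid produces four fans of ten indices each plus three tokens. The arrays would be drawn with glDrawElements using GL_TRIANGLE_STRIP or GL_TRIANGLE_FAN, respectively, once primitive restart is enabled.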

More information on various OpenGL optimizations, including multiDraw (an alternative OpenGL method to draw multiple items with one call), can be found on the OpenGL website. In particular, the primitive restart specification notes that multiDraw "still remain[s] more expensive than one would like".
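For comparison with the multiDraw alternative, the sketch below converts a restart-delimited index array into the per-primitive counts and byte offsets that glMultiDrawElements takes when the indices live in a bound element array buffer. It is a hypothetical helper written for illustration, not code from the article. It uses plain int and size_t instead of GLsizei and pointer offsets so that it compiles without OpenGL headers, and it assumes unsigned int indices with 65535 as the restart value.

```c
#include <stddef.h>

#define RESTART_INDEX 65535u /* assumed restart token */

/* Hypothetical helper: split a restart-delimited index array into
   per-primitive index counts and byte offsets, the form a multi-draw call
   expects. counts and byte_offsets need room for one entry per primitive.
   Returns the number of primitives found. */
size_t restart_to_multidraw(const unsigned *idx, size_t n,
                            int *counts, size_t *byte_offsets)
{
    size_t prims = 0, start = 0;
    for (size_t i = 0; i <= n; ++i) {
        /* The end of the array also terminates the last primitive. */
        if (i == n || idx[i] == RESTART_INDEX) {
            if (i > start) {  /* skip empty primitives (back-to-back tokens) */
                counts[prims] = (int)(i - start);
                byte_offsets[prims] = start * sizeof(unsigned);
                ++prims;
            }
            start = i + 1;
        }
    }
    return prims;
}
```

For qIndices it finds two primitives of four indices each, starting at byte offsets 0 and 5 * sizeof(unsigned). The offsets, cast to pointers, and the counts would then be passed as the indices and count arguments of glMultiDrawElements; primitive restart avoids building and submitting these side arrays at all.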

A rough performance comparison using the example code from this article on Linux shows the speed of primitive restart relative to other techniques (see Table 1). Source code for the examples, which can use different OpenGL rendering techniques selected with preprocessor #define statements, is available on the GPUcomputing.net website. For clarity, the #ifdef preprocessor statements are omitted from the source code shown in this article. Performance results will vary with the machine and GPU combination as well as with driver version and settings, and no attempt was made to fully optimize any of the drawing methods.

Table 1: Approximate performance numbers on a GTX 285.

It is important to stress that these frame rates include the time required to recompute the 3D position and color of every vertex in the image. This represents a worst-case scenario that demonstrates the power and speed possible with hybrid CUDA/OpenGL applications. Real applications should deliver much higher frame rates by recalculating the data only when necessary.
