OpenGL Renderer Design

Lately I’ve been writing lots of OpenGL programs for course projects and for-fun rendering side-projects. I’ve started to notice an overall design that works pretty well for these kinds of projects, and I’m going to document it here.

This design matches closely the idea of “Model-view-viewmodel” aka. “MVVM”. If you’re unfamiliar with this term, be aware that the words “model” and “view” here aren’t talking about 3D models or cameras or transformation matrices. MVVM is an object-oriented architectural pattern, similar to MVC (“Model view controller”). MVVM maps to the design of a renderer as follows:

Model: The scene which the renderer should render.

View: The renderer that traverses the scene and displays it.

View model: The data the renderer uses internally to render the scene.

Another important inspiration of the design in this article is the relational database. While I don’t use an actual relational database like SQL in my projects, I think the principles of relational databases are useful to reason formally about designing the model in the MVVM, so I’ll talk a bit about that in this article too.

In my projects, the three parts of MVVM are split into two C++ classes: “Scene”, and “Renderer”. In short, the Scene stores the data to render, and the Renderer reads the Scene to render it. In this article I’ll first talk about how I design the Scene, and then I’ll talk about how I design the Renderer.

Scene

The “Scene” class is very bare-bones, it contains no member functions whatsoever. Treat this class like a database: It has a bunch of tables in it, the tables contain data that possibly refer to other tables in the database, you can read/write the objects in the tables, and you can insert or erase elements from tables.

In a typical renderer, you might have a table of all the materials, a table of all the meshes, a table of all the instances of meshes, a table of all the cameras, and etc. The specifics of these tables depends on your project, so if you’re doing a skeletal animation project you would also have a table of the skeletons, instances of skeletons, and so on.

I found that designing the schema of the database is a big part of the requirements gathering phase for a project. Once you nail the design of the schema, the code you have to write to manage your data becomes more obvious (“I just need to hook up all this information”). Later modifications to your code can be done by figuring out how your schema needs to change, which is made easier since the relationships between your objects lay bare all together in one place.

In practice, I implement the tables of the database as C++ containers. This allows me to use containers with the access and update patterns I need for the project I’m working on. For example, I might want the objects in a table to be contiguous in memory, or do insertion/deletion of objects in arbitrary order, or be able to efficiently lookup an object by its name.

The design of the Scene class involves many concepts from relational database design. Let’s take some time to consider how primary keys, foreign keys, and indexing apply to the design of the Scene.

Primary Keys and Foreign Keys

Table usually have a “primary key”, which can be used to identify each object in the table uniquely, and can also be used to efficiently lookup the object. When an object in one table wants to refer to an object in another table, it can do so by keeping a copy of the primary key of the referred object, a concept also known as a “foreign key”.

An example of this situation is a mesh instance keeping a reference to the mesh associated to it, and keeping a reference to the transform used to place the instance in the world. Each mesh instance would then have a foreign key to the mesh table, and a foreign key to the transform table.

Primary keys and foreign keys can be as simple as integer IDs, though some care is required if you want to avoid foreign keys becoming invalid when modifications are done on the table they refer to. For example, if you decide to use a std::vector as the container for a table, erasing an element from the vector might cause objects to get shifted around, which will invalidate foreign keys referring to them.

In order to support insertion and deletion, I use a container I call “packed_freelist”, based on bitsquid’s ID lookup table. This container gives each object an ID that won’t change as elements are inserted and deleted, and allows efficient lookup of objects from their ID. The packed_freelist also automatically packs all objects in it to be contiguous in memory, which is desirable for efficient traversal.

Another way to implement foreign keys is to update the foreign keys when their pointed-to table is modified, which might be automated with a more sophisticated scene database system that is aware of foreign key relationships. On the simpler side of the spectrum, if you know that a table will never change after initialization, or if you only support insertion through push_back(), you can be sure that your foreign keys will never be invalidated with zero extra effort. You might also be able to use a C++ container that supports inserting/erasing elements without breaking pointers, like std::set, then using pointers to those elements to refer to other tables.

There exists a lot of formalism about primary keys and foreign keys in the world of relational databases. You might be interested to explore this topic further if you’re interested in exploring the limits of this concept.

Indexing

It’s nice to have primary keys and all, but sometimes you also want to lookup objects based on other attributes. For example, you might want to lookup all meshes with the name “crate”. In the worst case, you can iterate through all the meshes looking for it, which is not a big deal if you do this search infrequently. However, if you do this search often, you might want to accelerate it.

In database-speak, accelerating lookups is done by building an index on a table. You can store the table itself in a way that allows efficient lookups, like sorting all meshes by their name, which would allow binary searching. Storing the table itself in a sorted order is known as building a clustered index. Naturally, you can only have one clustered index per table, since you can’t keep a table sorted by two unrelated things at the same time. By default, your tables should support efficient lookup from their primary key.

If you want to have many indexes on a table, you can store an acceleration structure outside the table. For example, I could store a multimap alongside the table of meshes that lets me lookup mesh IDs by name in logarithmic time. The trouble is that every index needs to be kept in sync with the table, meaning that every additional non-clustered index adds further performance cost to insertions and deletions.

The Code

To give you an idea of what this looks like, I’ll show you some examples of Scene setups in my projects, with varying amount of adherence to the principles I explained so far in this article.

Exhibit A: “DoF”

I did a project recently where I wanted to implement real-time depth-of-field using summed area tables. The code for scene.h is at the bottom of this section of this blog post.

If you look at the code, you’ll see I was lazy and basically just hooked everything together using packed_freelists. I also defined two utility functions for adding stuff to the Scene, namely a function to add a mesh to the scene from an obj file, and a function to add a new instance of a mesh. The functions return the primary keys of the objects they added, which lets me make further modifications to the object after adding it, like setting the transform of a mesh instance after adding it.

You can identify the foreign keys by the uint32_t “ID” variables, such as MeshID and TransformID. Beyond maybe overusing packed_freelist, the relationships between the tables is mostly obvious to navigate. You can see I was lazy and didn’t even implement removing elements, and didn’t even write a destructor that cleans up the OpenGL objects.

Not deleting OpenGL objects is something I do often out of laziness, and it barely matters because everything gets garbage collected when you destroy the OpenGL context anyways, so I would actually recommend that approach for dealing with any OpenGL resources that stay around for the lifetime of your whole program. If you want to be a stickler for correctness, I’ve had success using shared_ptr, using shared_ptr as follows:

A small extension to this is to wrap the shared_ptr in a small class that makes get() return a GLuint, and adds a new member function “get_address_of()” that returns a GLuint*. You can then use get() for your normal OpenGL calls, and get_address_of() for functions like glGenBuffers and glDeleteBuffers. This is very similar to how ComPtr is used in D3D programs (with a very similar interface: Get() and GetAddressOf()).

This shared_ptr thing is not the most efficient approach, but it’s straightforward standard C++, and it saves you the trouble of defining a bunch of RAII wrapper classes for OpenGL objects. I find such RAII wrapper classes are usually leaky abstractions on top of OpenGL anyways, and they also force me to figure out how somebody else’s wrapper classes work instead of being able to reuse my existing OpenGL knowledge to write plain reusable standard OpenGL code.

I should point out that since my Scene class is implemented only in terms of plain OpenGL objects, that makes it easy for anybody to write an OpenGL renderer that reads the contents of the tables and renders them in whatever way they see fit. One thing you might notice is that I also supply a ready-built Vertex Array Object with each mesh, which means that rendering a mesh is as simple as binding the VAO and making a DrawElements. Note you don’t have to bind any vertex buffers or the index buffer, since OpenGL’s Vertex Array Object automatically keeps a reference to the vertex buffers and index buffer that were attached to it when I loaded the model (It’s true!). For a better explanation about how vertex array objects work, check out my Vertex Specification Catastrophe slides.

The pinch is that OpenGL’s Vertex Array Object defines a mapping between vertex buffers and vertex shader attributes. I don’t use glGetAttribLocation to get the attribute locations, since I want to separate the mesh from the shaders used to render it. Instead, I use a convention for the attribute locations, and I establish this convention using the “preamble.glsl” file, as described in my article on GLSL Shader Live-Reloading, which allows me to establish an automatic convention for attribute locations between my C++ code and all my GLSL shaders.

As an isolated example of how to load a mesh (using tinyobjloader), I’ll leave you with some example code. I hope this helps you understand how to use the Vertex Array Object API, which is tragically poorly designed.

Exhibit B: “fictional-doodle”

This was a course project I did with my friend Leon Senft. We wanted to load a 3D mesh and be able to animate it in two modes: skeletal animation, and ragdoll physics. This project is way bigger than the DoF project, and it was also a much earlier project before I had the proper separation of MVVM in mind. Still, from a coding perspective, the project went very smoothly. Leon and I designed the schema together, and once we figured out how we were going to interface between the animation and the ragdoll physics systems, we were able to easily work on the project in parallel.

To decide how the two animation systems should communicate, we sketched a feedback loop flowchart that explained the order in which the two systems read and wrote their data at every frame, which mapped in a straightforward way to our code since the flowchart’s reads and writes corresponded to reads and writes of our scene database’s data. Below is the flowchart copied verbatim from the OneNote document we scratched it up in.

As you can see in the code at the bottom of this section, this schema has a lot more tables. You can also see some non-clustered indexes used to lookup textures by filename, using unordered_map. We used std::vector for most of the tables, mainly because I hadn’t developed the packed_freelist class yet, and because random insertion/deletion wasn’t a requirement we had anyways.

Again, this project worked with OpenGL in terms of plain OpenGL GLuints, and we didn’t go through the trouble of arbitrarily abstracting all that stuff. This project also used shader live-reloading, but it was through a method less polished than my newer ShaderSet design. Leon was working on a Mac, so we had to deal with OpenGL 4.1, which meant we had to use glGetUniformLocation because GL 4.1 doesn’t allow shader-specified uniform locations. This made live-reloading and shader input location conventions a lot more tedious than with newer GL versions.

Speaking of Mac support, it’s really painful that Mac’s GL doesn’t support GL_ARB_debug_output (standard in GL 4.3), which allows you to get an automatic callback whenever a GL error happens. Instead, you have to call glGetError() after every API call, which is really painful. To fix this, we wrote our own OpenGL header that wraps each OpenGL call we used inside a class that defines operator() and automatically calls glGetError() after every call. You can see our implementation here It’s a bit disgusting, but it worked out well for us. I think it could be improved by wrapping the function pointers at the place where SDL_GL_GetProcAddress is called, which would make the interface more transparent.

Renderer

The whole point of the Scene was to store the data of the scene you want to render, without having to specify exactly how it should be rendered. This gives you the freedom of being able to implement different renderers for the same scene data, which is useful for when you’re making a bunch of different OpenGL projects. You can easily implement a new rendering technique without even having to touch the scene loading code, which is nice.

My implementation of Renderer usually has three member functions: Init(), Resize(width, height), and Render(Scene*). Init() does the one-time setup of renderer data, Resize() deletes and recreates all textures that need to be sized according to the screen, and Render() traverses the Scene and renders the data inside it.

I usually begin the program by calling Init() followed by an initial Resize(), which creates all the screen-sized textures for the first time. The code inside Resize() deletes and recreates all textures when the window gets resized, and since deleting is a no-op for null textures it’s safe to call it even after the first Init(). After that, Render() is just called at every frame to render the scene. Before calling Render(), I might also call an Update() method that does any necessary simulation work, like if I want to run some game logic. You can also be lazy and write that simulation update code in the Render(), whatever makes sense.

Your renderer probably has to maintain some data that is used to implement whatever specific rendering algorithm you have in mind. This data is what constitutes the viewmodel. I usually don’t separate the viewmodel into its own class, but I guess you could if you somehow find that useful. The “view” itself is basically the Render() function, which uses the model and viewmodel to produce a frame of video.

Below is an example of part of the Resize() function inside the “DoF” project’s renderer (which you can see here: https://github.com/nlguillemot/dof/blob/master/viewer/renderer.cpp) As you can see, I’m deleting and recreating a color and depth texture that need to be sized according to the window, and I also recreate the FBO used to render with them. As you can see, I’m still sticking to my guns about writing plain standard OpenGL code.

The code I write inside the Scene loading and inside Init() and Resize() tries to be careful about binding. I always set my bindings back to zero after I’m done working with an object, since I don’t want to leak any state. This is especially important for stuff like Vertex Array Objects, which interacts in highly non-obvious ways with the buffer bindings (see the mesh loading example code earlier in this article.)

When writing code in the Render() function, keeping your state clean becomes more difficult, since you probably are going to set a bunch of OpenGL pipeline state to get the effects you want. One approach to this problem is to glGet the old state, set your state, do your stuff, then set the old state back. This is maybe a best-effort approach for OpenGL middleware, but I really hate having to type all the glGet stuff. For this reason, I generally write code that assumes the pipeline is in its default state, and I make sure to re-set the pipeline to its default state at the end of my work. This leaves me with no surprises. Notice that this is yet another place where I try to express everything in terms of plain standard OpenGL code, since that makes it easier to see which pipeline state is set where, which makes it easier to simply read the code and see that all the state for a block of code is set and reset properly.

As an example of Render(), consider the following bit of code that renders all the models in the scene:

As you can see, I set the state at the start of this rendering pass, then reset it back to its default at the end. Since it’s just one block of code, I can easily double-check the code to make sure that there’s an associated reset for every set. There are also some shortcuts to use, for example you can use “glBindTextures(0, 32, NULL);” to unbind all textures.

There are still some parts of OpenGL state which I don’t bother to reset. First, I don’t reset the viewport, since the viewport’s default value is mostly arbitrary anyways (the size of the window). Instead, I adopt the convention that one should always set the viewport immediately after setting a framebuffer. I’ve seen too many bugs of people forgetting to set the viewport before rendering anyways, so I’d rather make it a coding convention not to assume what the viewport is. There is no default viewport in D3D12 anyways, so this convention makes sense moving forward. Another example is the clear color and clear depth, which I always set to the desired values before calling glClear (again, you have to do this in newer APIs too, including GL 4.5’s DSA).

Another example of state I don’t bother to reset is the uniforms I set on my programs. OpenGL does remember what uniforms have been set to a program, but I choose to have the convention that one should not rely on this caching. Instead, you should by convention set every uniform a program has between calling glUseProgram() and glDraw*(). Anyways, D3D12 also forces you to re-upload all your root parameters (uniforms) every time you switch root signature (the program’s interface for bindings), so this convention also makes sense moving forward.

There also exist more automatic solutions to preventing binding leakage. If you’re able to define your rendering code as a series of nested loops (see below), then you can be sure that you’re setting all necessary state before each further nested loop, and you can reset the state all together in one place. Another way to design these nested loops is using sort-based draw call bucketing, which you can use to automatically iterate over all the draw calls in your program in an order that respects which state changes are expensive and which ones aren’t. Since that design also loops over draw calls in order of state changes, you can again have some guarantees about the lifetime of your state changes, so you can reset any state at a place that produces the least waste. This was approximately the design we used in fictional-doodle’s renderer, which you can find here: https://github.com/nlguillemot/fictional-doodle/blob/master/renderer.cpp#L86

Conclusions and Future Work

The details of all this stuff is not super important, and the amount of engineering effort you should invest depends on the scale of your projects. Since I’m usually just implementing graphics techniques in isolation, I don’t have to worry too much about highly polished interfaces. I’ve enjoyed the benefits of being able to reuse Scene code, and I’ve been able to quickly write new Renderers by copy and pasting another one for some base code. My projects that separate the Scene and Renderer have so far been enjoyable, straightforward, and productive to work on, so I consider this design successful.

One part where this starts to crack is if you’re writing a project that implements a whole bunch of rendering techniques in one project, like if you’re trying to compare a whole bunch of different rendering techniques. You can probably get away with stuffing everything into the Scene and Renderer, but there’s a certain threshold where you can no longer keep the whole project in your mind at the same time, and it becomes important to limit the set of data that a given function can access to better understand the code. You might be able to get away with grouping up the model or viewmodel data into different structs, whatever helps keep you sane.

Another problem which I haven’t addressed much is multi-threading. OpenGL projects generally don’t worry much about multi-threading since the OpenGL API abstracts almost all that stuff away from you. This abstraction is partly what makes OpenGL a good API for prototyping rendering techniques, but in the future I’d like to apply this design to newer APIs like Vulkan and Direct3D 12. I suspect that I’ll have to identify which data varies “per-frame” (in order to have multiple frames in flight), while using some encapsulation to prevent one frame to read the data that belongs to another frame, or to avoid global state in general. I also suspect that the design of Render() will have to change to allow multi-threaded evaluation (to write commands faster using CPU parallelism), something which I’m currently looking at TBB flow graph to find a solution.

Appendix A: Transformation Hierarchies

It’s commonly desired to have a hierarchy of transforms to render a scene, sometimes called a “scene graph”. In practice, you shouldn’t think too hard about the whole scene graph thing, since that mindset has a lot of problems. If you do want to implement a tree of transforms, I would recommend keeping the links between transforms separate from the transforms themselves. For example, you might have a table for all the transforms, a table for mesh instances, and a table for meshes. Each mesh instance then keeps a foreign key to a transform and mesh that represent it. To support relative transformations, I suggest creating a new table that stores the edges of the transformation tree, meaning each object in the table stores the foreign keys of two transforms: The parent and the child.

The goal of this design is to compute the absolute transforms of each object at each frame, based on their relative transforms (in the transforms table) and the edges between transforms (in the transform edge table). To do this, make a copy of the local transforms array, then iterate over the transform edges in partial-order based on the height in the tree of the child of each edge. By iterating over the edges in this order to propagate transforms, you’ll always update the parent of a node before its children, meaning the transforms will propagate properly from top to bottom of the tree. You can then use these absolute transforms as a model matrix to render your objects.

By keeping the edges between the transforms in their own table, the cost of the propagation will be proportional to the number of edges, rather than the number of nodes. This is nice if you have fewer edges than nodes, which is likely if your tree is mostly flat. What I mean is that scene tree designs can easily end up with 1 root node with a bazillion children, which is just useless, and is avoided in this design. This also keeps the data structure for your scene nice and flat, without having to have a recursive scene node data structure (like many scene graph things end up).

Then again, I didn’t apply this in the examples earlier in this article, and everything still turned out ok. Pick the design that makes sense for your workload.

Appendix B: User Interface

You probably want some amount of GUI to work with the renderer. Hands down, the easiest solution is to use Dear ImGui. You can easily integrate it into your OpenGL project by copy-pasting the code from one of their example programs. ImGui is used in the dof program mentioned earlier in this article, so you can consult how I used it there too if you’re interested.

Working with cameras is also important for 3D programming of course. Programming a camera is kinda tedious, so I have some ready-made libraries for that too. You can use flythrough_camera to have a camera that lets you fly around the 3D world, and you can use arcball_camera to have a camera that lets you rotate around the scene. Both of these libraries are a single C file you can just add to your project, and you use them by calling an immediate-mode style update function once per frame. These libraries tend to integrate naturally.

Appendix C: Loading OpenGL Functions

One of my pet peeves with typical OpenGL programs is that people use horribly over-engineered libraries like GLEW in order to dynamically link OpenGL functions. You can do something way simpler if you only need to use Core GL and the few extra nonstandard functions that are actually useful. What I did was write a script that read contents of Khronos’ official glcorearb.h header, and produces my own opengl.h and opengl.cpp files. The header declares extern function pointers to all OpenGL entry points, and opengl.cpp uses SDL’s SDL_GL_GetProcAddress function to load all the functions.

To use it, you just have to add those two files to your project, call OpenGL_Init() once after creating your OpenGL context, and you’re good to go. No need to link an external library (like glew), and no need to deal with all of glew’s cruft. You can find the files here: opengl.h and opengl.cpp. You can easily modify them by hand if you want to adapt them to glfw. Since these files are so simple, they’re much more “hackable”, allowing you to do things like wrapping calls in glGetError (like was done in the “fictional-doodle” project, as I explained earlier).

As a bonus, this header also defines GLDrawElementsIndirectCommand and GLDrawArraysIndirectCommand, which are not in the standard GL headers but exist only in the GL documentation for indirect rendering. You can use these structs to pass around draw arguments (perhaps storing them once up front in your Scene) or use them in glDrawIndirect calls.

Advertisements

Share this:

Like this:

4 comments

Awesome post! Really enjoyed it. The database approach for the Scene is a really interesting concept, I’ve come across a similar idea before while reading this – http://www.dataorienteddesign.com/dodmain/ – might be of interest if you haven’t seen it before! It’s really nice to see a concrete use case of a prototyping style renderer without trying to have everything and the kitchen sink style approach. Thank you for sharing!

Could you elaborate on the below paragraph and perhaps provide a code sample?

“To do this, make a copy of the local transforms array, then iterate over the transform edges in partial-order based on the height in the tree of the child of each edge. By iterating over the edges in this order to propagate transforms, you’ll always update the parent of a node before its children”

The algorithm to arrange the nodes in that order is called topological sort. It’s relatively complex, so it’s mostly useful as a preprocessing step for parts of the scenegraph that are static. For example, MD5Mesh stores the skeleton hierarchy in this order, which makes updating the skeleton very efficient. I haven’t thought much about if there’s a good way to do runtime updates to the scenegraph while keeping it in partial order. It’s probably possible, but it’s also not necessarily a bad idea to split the static and dynamic parts of the scenegraph and use different approaches.