Problems getting on the fast path in OpenGL

I'm trying to use glTextureRangeAPPLE and GL_TEXTURE_STORAGE_HINT_APPLE to cache textures on the video card as per Apple's sample code, but I don't seem to be getting any benefit. My game is 2D, with scrolling tiled levels in front of a static sky image. What's drawn each frame is the screen-filling sky, followed by the level's currently visible background tiles, and finally the game objects and text.

The sprite image contains all the sprites for the objects in the game, the background image contains tiles for a single level, and the sky image contains a single large image that is displayed behind the playfield. I read somewhere that for the texture caching to work correctly I need to make sure my texture image dimensions are multiples of 32 bytes, so I set them up accordingly. All three of these images are stored in a contiguous buffer as per Apple's texture range sample. My text library doesn't use the texture range extension since the textures it creates are of variable size. Everything is drawn with simple quads.
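For reference, my setup follows the sample's pattern, roughly like this (a paraphrase with placeholder names like texID and textureBuffer, not my exact code):

```c
/* Sketch of the texture range / client storage setup from Apple's
   TextureRange sample: rectangle texture, app-owned backing buffer,
   and the cached (VRAM) storage hint. Names are placeholders. */
glBindTexture(GL_TEXTURE_RECTANGLE_EXT, texID);
glTextureRangeAPPLE(GL_TEXTURE_RECTANGLE_EXT, bufferSize, textureBuffer);
glTexParameteri(GL_TEXTURE_RECTANGLE_EXT, GL_TEXTURE_STORAGE_HINT_APPLE,
                GL_STORAGE_CACHED_APPLE);
glPixelStorei(GL_UNPACK_CLIENT_STORAGE_APPLE, GL_TRUE);
glTexImage2D(GL_TEXTURE_RECTANGLE_EXT, 0, GL_RGBA, width, height, 0,
             GL_BGRA, GL_UNSIGNED_INT_8_8_8_8_REV, spritePtr);
```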

I'm shooting for a 60fps frame rate, and each frame the sky is drawn to fill the screen with a single quad, and then up to ~400 background tiles, about 60 objects, and around 30 text textures. I've used timing code in various places and the amount of time spent in game logic and issuing OpenGL commands is negligible, but flushing the buffer seems to be taking long enough to interfere visibly with the framerate. Sometimes I'll get a smooth 60fps, and sometimes it'll drop to 30fps for a while. Scene complexity doesn't seem to be a factor.

If I comment out the texture range and caching stuff, nothing seems to change, so it looks like I'm missing something and not getting on the fast path to begin with.

Here's some relevant code from my NSOpenGLView subclass. The sky and background images are reloaded for each level in a different method, but they are done the same way as the sprite image, just into the appropriate part of the texture buffer.

I'm using a Dual G5 with a Radeon 9600 (cheap replacement for the faulty original card), and I also test on a 800MHz iMac G4 with a GeForce 2 MX. The iMac stays around 30fps. I can't imagine that my game's visuals are complex enough to chug like this even on crappy video cards like mine. :P

So, anyone ready to hit me over the head with the obvious? This is my first OpenGL project, so be gentle. :P

1) The texture range / storage stuff is a performance path for texture upload, not texture rasterization. If you're continually updating texture data (i.e. glTexSubImage2D) then it will help. If all of your texture data is static, which is what it sounds like from your description, then the default GL path will already do what you want (move textures into VRAM on demand, and rasterize from there.)

2) Quick sanity check: compare performance with this tilescroller. See the about box for key commands; in particular, try increasing the number of parallax layers until your machine can no longer sustain 60 fps. Every layer here is bilinear filtered and blended, so this is basically a fill-rate test, although the edges of the layers are continually updated via glTexSubImage2D (and yes, it does use texture range etc.)

3) Did you try the obvious? Shark? GL Profiler? Maybe there is something else going wrong that you don't know about yet.

Ah, I didn't realize the texture range optimization was just for the speed of texture uploads. I think seeing the texture range sample use the same image over and over made me assume it was making rendering faster. Oops.

At any rate, I've seen your demo before (nice, btw!), but I fired it back up and played with it some more, made the window big, tried it with and without lots of parallax, and discovered that it does the exact same thing my game does! It will scroll smoothly for a while, then it will get choppy for a while, then go back to smooth. In your about box, you mention dropped frames due to NSTimer losing sync. So perhaps this is the issue?

My NSTimer fires at 120Hz, but the game updates at 60fps. It doesn't seem to matter if I set the timer frequency to 60Hz, any other multiple of 60Hz, or even randomly chosen numbers, I still get the choppy/smooth effect. I take it there's no workaround for this since there's no VBL interrupt? :/

Also, is there a good newbie reference for using GL Profiler? The times I've used it I couldn't make much sense of the data I was getting. I'm still new to OpenGL. :P Shark tells me that the bulk of my program's time is being spent in system routines that I can't optimize, and my timing code indicates that the amount of time spent on game logic and issuing OpenGL commands amounts to maybe a couple thousandths of a second per frame, and that the bulk of the time spent in each frame is in the flushBuffer call.

There has been a lot of discussion about this kind of stuff in the past, but I haven't seen it come up in a while. I have fought with that intermittent stuttering problem many times myself. There are several different ways to approach it. The very first thing to remember is that you must set your rendering to synch to VBL for release versions to avoid visual tearing. One of the side-effects of synching to VBL is that framerate will not be consistent across different display setups. LCDs will be *almost* 60 Hz, and CRTs could be many different values. Getting the VBL rate isn't very hard as I recall, but trying to set up a timer to fire at that rate never worked consistently for me because of timer drift. The timers just aren't accurate enough to guarantee everything matches up perfectly, which is where the majority of the stutter issues come from. The only thing that is perfect is VBL synch, but you'll need to overdrive the timer to fire at 1kHz for that to be completely smooth (which isn't actually a problem, as I'll get to later).

Now, accepting that rendering must be done at the VBL rate is fine, but decoupling updating from it has never worked perfectly that I've seen. Your update code looks like the best approach I know of in a single-threaded environment. I was doing it almost identically to yours and also had intermittent stuttering problems that I could never quite get rid of. I have yet to see a stutter-free implementation like that. In fact, I was so disappointed/frustrated with every approach I tried that I finally bit the bullet and went multi-threaded, and it works great with no intermittent stuttering!

If you want to give threading a try, here are some of my thoughts on it:

So for many months now (maybe a year) I've been using the threaded approach and have been extremely pleased with the results. The main thread handles all the Cocoa interface and system interaction and OpenGL rendering. I have the rendering code (which is in the main thread) set to be called by a 1 kHz timer for maximum smoothness (VBL synch automatically blocks extra calls from the timer from rendering more than the VBL rate, so there's no efficiency lost there). The sim (update) thread is completely decoupled from the main thread, is called from its own timer, and is very consistent in its timing. You could do rendering there too, but I chose to keep all rendering code in the main thread for simplicity. The update thread only handles physics, collision, game logic, etc. The rendering thread merely takes a `snapshot' of the game data whenever it needs to draw. The one and only serious issue that I've had with it so far, which I can't seem to lick, is that the sim thread sometimes doesn't run. I'll get that figured out one of these days... Anyway, it seems that the best all-around rate I've found for the sim thread timer is 110 Hz. Why that is, I don't know, but it seems to be the smoothest rate for synchronizing with whatever VBL rate is going on on any given display setup, although I haven't exhaustively tested it everywhere.

All that being said, there are several issues one needs to be aware of when multithreading. I could go on and on about this for quite a while, but I'll stick to the big ones here. First and foremost, you can't call any AppKit routines from anywhere but the main thread, which generally isn't much of a restriction in practice, that I've found. Debugging threaded Obj-C stuff (or threaded anything, for that matter) can be really, really tricky sometimes, so the best technique is to build very frequently when adding new experimental code. Once you've established some `safe' territory (code and techniques that you *know* are thread safe, tested, and working), it's smooth coding. I would say that if a person isn't very experienced with tracking down Cocoa bugs in general, forget about threading. The debugger quite often has no idea what crashed or why. BAD_ACCESS errors are quite often missed, and when they aren't, they can be extremely confusing to track down. The debugger can easily wind up with a crash on one thread while the bug is in another one entirely, and you'll have absolutely no idea why it happened. Worse, many times the debugger and the log will spit out error messages that take you off in completely the wrong direction and have nothing to do with the real cause. Like I said, build often when adding fresh code that isn't known to work solidly when threaded.

There is a whole bunch more I could rattle on about here, but I'll spare the space. The last thing I can say about multi-threading games like this is that it really isn't for novices. However, you don't need to be an expert either. And threading isn't a nightmare either, as some would say. It is a realistically usable approach that can offer excellent performance advantages, but there are some relatively small trade-offs in the ease of coding and debugging department.

BTW, arekkusu, I downloaded that tile scroller a while back and have to say it's really cool!

Thanks for the input, guys. I do set NSOpenGLCPSwapInterval to use VBL sync, and the problem occurred even with nothing else running and no significant background processes. AnotherJake's explanation makes a lot of sense.

I guess I'll start multithreading a bit earlier than I intended. I figured I'd give it a stab on my next project, since my current game isn't MVC, but I'm already using this project as a learning experience for OpenGL/OpenAL and a Cocoa refresher (it's a port-up of a fifteen-year-old shareware game I wrote), so I might as well cover all the bases. From what I've read on threading already, interface/rendering on the main thread and game logic in a separate thread seemed the most sensible separation of workload, and now you've confirmed that it even has framerate benefits. Sounds like a winner to me.

What I need to find is a good explanation of how to do an MVC design properly in the context of a game. Should my game data reside within the model only, and be requested by the viewer as needed? Should that data be double-buffered somehow? Should I make an external data structure of the coordinate data, etc., that the viewer will need each frame? Currently each object in the game has methods to both update and draw itself, and I need to change that to a proper MVC design before I touch threading. Guess I have some more research to do. Hey, anything to put off working on the art/music. :X

Well, if you already have your draw and update code separated per object, that's enough to thread right there.

The model is your scene graph (linked list of entities, entity database, array, mutable array, whatever you use to organize your scene) and the update code for each of those entities. The view is all your rendering code (your renderer) that reads from the model and draws it. The controller in this case is the NSOpenGLView, because it orders the model to be loaded and to update itself, and it also orders the view to draw the model -- it `controls' and manages their operations and timing from a higher level. So when you think about it, there are really two layers of MVC going on in a game: the app-level MVC, which is all the Cocoa UI stuff, and the lower-level game MVC contained within the NSOpenGLView of the Cocoa UI layer. Well, that's how I look at it anyway...

How much separation you have between your rendering code and your updating code is really up to you. Like you mentioned, there are different approaches to think about (I'm kind of repeating what you mentioned here). You could have a draw method alongside your update method in each entity class (I think this is least desirable). Or you could have your draw method in a superclass primitive (like a sprite or mesh) and your update in its subclass. Or you could have your drawing code completely separated into a generic primitive that your object instantiates and merely references, and the view (renderer) calls that when it needs the entity to draw itself; this is what I've been trying lately. Or you could even have all your rendering code in one rendering class which knows how to draw each type of entity, which I've done in the past with good results. One really big benefit of having all rendering code in one class is that you can switch out the renderer for something else, like a software renderer or another rendering API entirely (like DirectX), with ease. A drawback of the one big renderer is that it doesn't follow the OOP paradigm as closely as one might desire.

There are really several different ways to go about this. The main thing to keep in mind is simply that no drawing whatsoever happens during update, and conversely, no data manipulation happens during drawing -- that is the crucial point of separation which must be maintained at a minimum.

The need for double-buffering data hasn't really come up yet with any of my projects, although I suspect there will be issues with dynamic geometry later on. I think I would be much more willing to try using semaphores first if concurrent data access problems come up with that. But as far as simple rotations or positions of static geometry I have seen no need for double-buffering of data yet. I was initially worried about that too. It's a weird concept to think about, but it's like visual tearing, only it's in 3D. Some objects may be updated for the current frame and some might not, but the speed at which all of this is happening (60 fps) seems to eliminate any noticeable anomalies. [edit] And actually, I've tested scenes with so many objects that they slowed the update down to below 10 ticks per second and I still didn't notice any positional irregularities. Of course the rendering thread was still cranking along at 60 fps because it wasn't hindered by the update's slowness, so if there were irregularities, they weren't there for long enough to see them. [/edit] I can imagine that with dynamic data you might see some strange out-of-synch deformations though, but I haven't gotten there yet with my multi-threaded projects.

[edit] Actually, I hadn't really thought through the dynamic geometry thing yet. However, on second thought, it might not be an issue at all since it might be better to update the mesh geometry itself in the rendering thread based upon basic parameters, like the current morph position, given from the update thread. Hmm... [/edit]

I don't want to dissuade you from multithreading if you're willing to deal with the extra complexity, since it is a performance win in general (with all machines today being multi-core).

But, if your problem with occasionally dropping frames is due to NSTimer inaccuracies, or beat frequencies between NSTimer and the VBL, then another avenue worth exploring is a Core Video DisplayLink. I think if you ignore all the discussion in the documentation about grabbing video frames from QuickTime, you can set up a high-accuracy timer thread which will call your render routine once per VBL, with only a few lines of code.
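The bare-bones setup really is only a few calls; roughly this (an untested sketch from memory, callback body omitted; note the callback runs on its own high-priority thread, so the usual AppKit restrictions apply there):

```c
#include <CoreVideo/CoreVideo.h>

/* Called once per retrace, on a dedicated high-priority thread. */
static CVReturn renderCallback(CVDisplayLinkRef displayLink,
                               const CVTimeStamp *now,
                               const CVTimeStamp *outputTime,
                               CVOptionFlags flagsIn,
                               CVOptionFlags *flagsOut,
                               void *context)
{
    /* make the GL context current, then render here */
    return kCVReturnSuccess;
}

void startDisplayLink(void)
{
    CVDisplayLinkRef link;
    CVDisplayLinkCreateWithActiveCGDisplays(&link);
    CVDisplayLinkSetOutputCallback(link, renderCallback, NULL);
    CVDisplayLinkStart(link);
}
```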

That definitely looks like it's worth exploring if one doesn't want to go multi-threaded. Had I known about that last year, I definitely would have tried it first! In fact, I *am* going to give it a try soon. Having an optional non-stuttering, non-threaded path that could be enabled/disabled on the fly would be really useful for testing, and possibly even as a workaround release option in the event that a miserable threading bug couldn't be squashed in time. It would also be nice to be able to turn off threading temporarily to help identify any bugs associated with it being threaded.

arekkusu, since your suggestion was a very simple change, I went ahead and set up a display link with a callback function that just calls setNeedsDisplay in my NSOpenGLView, and commented out the one in my update function. It works just fine, but I still get the periods of stuttering and smoothness. Is there more to it? I still have the NSTimer in the application controller object driving the game logic updates. Should the display link be driving both?

AnotherJake, my update code and my drawing code are in completely separate methods in each object (and I do in fact have the drawing method in an abstract superclass that all game objects are subclassed from), so I guess I'll start reading the thread programming guides and get started.

The game data double buffering idea occurred to me because in some of my object update functions, the coordinates of the object get changed, then it checks to see if there is a collision with the maze, and if so, the coordinates are adjusted again. So even with the kind of meta-tearing you're describing, I might have the odd flash of an object embedded in the wall if the rendering occurs at the wrong moment. Of course, I could easily use temporary variables and not update the actual object coordinates until they were final, but that would be too sensible. :P I guess I was just thinking of giving the renderer the cleanest data set possible every frame, without running into too much thread blocking. In any case, I'll just try it as is, and see how it looks.

Thanks for the thread/OOP suggestions, they make me feel a bit more confident that I'm not running pell-mell in the wrong direction as it is. :X

Sea Manky Wrote:The game data double buffering idea occurred to me because in some of my object update functions, the coordinates of the object get changed, then it checks to see if there is a collision with the maze, and if so, the coordinates are adjusted again. So even with the kind of meta-tearing you're describing, I might have the odd flash of an object embedded in the wall if the rendering occurs at the wrong moment. Of course, I could easily use temporary variables and not update the actual object coordinates until they were final, but that would be too sensible. I guess I was just thinking of giving the renderer the cleanest data set possible every frame, without running into too much thread blocking. In any case, I'll just try it as is, and see how it looks.

I haven't had to do one single thread block at all yet (no semaphores or mutexes, etc.). The rendering thread simply reads what's there at the moment -- data synching be damned! Like I mentioned, I too have definitely been thinking about the double-buffering thing for the `clean dataset' as you say. But with the testing I've done using Newton for physics, I have yet to see one out-of-place object, which actually surprised me. So yeah, definitely try it without first. Be sure to let us know if you find any wackiness and need to double buffer, because I'm interested!

I second the guess that calling setNeedsDisplay would likely kill the display link benefits, but I haven't studied it closely yet. You should be able to call your rendering directly from the display link callback.

Hmmm, but when I try calling drawRect directly from the callback, I get a bus error. I must be calling some AppKit routines or something in my draw code that is killing it since it's being spawned off a separate thread. I'll have to do some digging.

Edit:
Okay, figured it out. I was neglecting to make the GL context current from within the display-link-spawned drawRect. Thing is, I still get the stuttering.