Is this a case of glTexImage2D vs glTexSubImage2D? There's a huge difference between those two: one allocates (and frees) texture memory, while the other just updates the contents of an existing texture.

I'm not sure exactly what's going on at the OpenGL level, as I'm using LibGDX, which is a thin layer above OpenGL. At a high level, though, I know the bottleneck was that I had been storing 8 textures on the GPU (4 copies of each pattern table, one per attribute, for background and sprites), and when drawing the nametable and sprites it was switching between those textures quite frequently, which is an expensive operation. So I rewrote the code that generates images from the PPU to use just one texture rather than 8. The speed boost was phenomenal: I went from 23fps on an old phone of mine to 60fps.

Would it help to use a pixel shader for palette lookup so that you don't have to store eight copies of the pattern table in memory at once?

I actually do use a fragment shader to interpret the textures I'm generating; it performs a palette lookup against a texture that is 32 pixels wide by 1 pixel tall. I encode the pixel value, the attribute, and a background/sprite flag in the RGB channels of the actual textures. With the shader turned off, everything comes out in sickly greenish/red/blue hues, haha.
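To make the encoding concrete: the NES palette RAM holds 32 entries (16 background + 16 sprite), so a 2-bit pixel value, a 2-bit attribute, and a 1-bit bg/spr flag together select one of the 32 columns of the 1-pixel-tall lookup texture. The exact channel layout below is my assumption, not necessarily what the emulator actually uses; it's a plain-Java sketch of what the shader effectively computes.

```java
public class PaletteIndex {
    // Hypothetical encoding: red = pixel value (0-3), green = attribute (0-3),
    // blue = bg/spr flag (0 or 1), packed into an RGB triple.
    static int encodeRgb(int pixel, int attribute, boolean sprite) {
        return (pixel << 16) | (attribute << 8) | (sprite ? 1 : 0);
    }

    // What the fragment shader effectively does: decode the channels and
    // compute a column index into the 32x1 palette texture.
    static int paletteColumn(int rgb) {
        int pixel = (rgb >> 16) & 0xFF;
        int attribute = (rgb >> 8) & 0xFF;
        int layer = rgb & 0xFF;
        return layer * 16 + attribute * 4 + pixel; // 0..31
    }

    public static void main(String[] args) {
        int rgb = encodeRgb(3, 2, true);
        System.out.println(paletteColumn(rgb)); // 16 + 2*4 + 3 = 27
    }
}
```

With this scheme, swapping palettes only requires re-uploading the tiny 32x1 texture, never the tile data itself.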

I've been trying to think of a way to delegate all of the PPU decoding to the shader, since that work is performed in parallel on the GPU for each pixel (afaik). However, the bandwidth needed to upload 8KB of pattern-table data to the GPU every frame, though small, might be prohibitive? I'm not sure. #longtermgoals. It'd be neat to get that working, though, because then live CHR-RAM updates would suddenly be possible. Right now I can only support them during transitions.
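For context, the decode the shader would have to replicate is simple: each 8x8 CHR tile is 16 bytes, two 8-byte bitplanes that combine into 2-bit pixel values. Here's a plain-Java sketch of that per-tile decode (method and variable names are mine):

```java
public class ChrDecode {
    // Decode one 8x8 tile (16 bytes: bitplane 0, then bitplane 1)
    // into 2-bit pixel values (0..3).
    static int[][] decodeTile(byte[] chr, int tileOffset) {
        int[][] pixels = new int[8][8];
        for (int y = 0; y < 8; y++) {
            int lo = chr[tileOffset + y] & 0xFF;     // bitplane 0, this row
            int hi = chr[tileOffset + y + 8] & 0xFF; // bitplane 1, this row
            for (int x = 0; x < 8; x++) {
                int bit = 7 - x; // leftmost pixel is the high bit
                pixels[y][x] = ((lo >> bit) & 1) | (((hi >> bit) & 1) << 1);
            }
        }
        return pixels;
    }

    public static void main(String[] args) {
        byte[] chr = new byte[16];
        chr[0] = (byte) 0b11000000; // plane 0, row 0
        chr[8] = (byte) 0b10000000; // plane 1, row 0
        int[][] px = decodeTile(chr, 0);
        System.out.println(px[0][0] + " " + px[0][1]); // prints "3 1"
    }
}
```

Since the two planes of a row sit exactly 8 bytes apart, a shader could do the same thing with two texel fetches and some bit masking, which is why pushing the raw 8KB pattern table to the GPU is tempting.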

The PPU graphics data is generated pixel by pixel into a LibGDX Pixmap (only on startup for CHR-ROM based games, or during transitions when graphics are off for CHR-RAM games; live updates aren't supported). The Pixmap is converted into a Texture, which is then split into TextureRegions (one for each CHR tile of each pattern table/attribute combination, for background and sprites), and finally Sprite objects are created from the TextureRegions. I don't know what's going on at a lower level than this (I'd never done OpenGL before, so I probably wouldn't easily follow what the library is doing). I just know that if I derive TextureRegions/Sprites from multiple Textures and I'm not careful about how many times I switch between those Textures, drawing slows down a lot. I was actually able to get a temporary speedup by sorting the draws by attribute, so that each of the 8 textures I had previously been generating was only switched to once. Having rewritten this to use a single master texture, I don't have to do that sorting anymore, which both gets me the speedup and keeps the code simple.
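The texture-sort workaround described above boils down to grouping draw calls by texture so each texture is bound at most once per frame. A minimal sketch of that idea (the `Draw` record and `textureBinds` counter are hypothetical stand-ins for the real Sprite/Texture objects):

```java
import java.util.*;

public class DrawSort {
    // A draw call referencing some texture; nested records are implicitly static.
    record Draw(int textureId, int spriteId) {}

    // Count how many texture switches a given draw order would cause.
    static int textureBinds(List<Draw> draws) {
        int binds = 0, current = -1;
        for (Draw d : draws) {
            if (d.textureId() != current) {
                binds++;
                current = d.textureId();
            }
        }
        return binds;
    }

    public static void main(String[] args) {
        List<Draw> draws = new ArrayList<>(List.of(
            new Draw(0, 1), new Draw(1, 2), new Draw(0, 3), new Draw(1, 4)));
        System.out.println(textureBinds(draws)); // 4 switches when interleaved
        draws.sort(Comparator.comparingInt(Draw::textureId));
        System.out.println(textureBinds(draws)); // 2 switches after sorting
    }
}
```

With a single master texture there is only ever one bind per frame, so the sort (and the draw-order constraints it imposes) can be dropped entirely.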

Premature optimization truly is the root of all evil. For most of this project I had been executing the CPU on a background thread; apparently, on Android devices that's a bad idea. Just for the hell of it, I experimented with advancing the CPU synchronously in the render thread. Games that had been performing decently but with a lot of stuttering are now very close to buttery smooth, even on my oldest phone.
