Let’s talk about these speed problems I keep alluding to. The framerate is half of what it should be. I’ve mucked about, looking for inefficiencies. I’ve made some minor changes but haven’t seen the framerate change all that much.

I suppose I’ve been spoiled by my days at Activeworlds. Back then I used Microsoft Developer Studio 6 to write my code. It was pretty old (1998) but it was the Professional Edition, which had top-notch tools for profiling performance. You could just fire up the program, run it for a couple of minutes, and then it would show you where all the time went.

This is far more accurate and convenient than manually measuring time from within the program as it’s running. You’ve got to put this timing code everywhere you want to measure, then you’ve got to surround it with additional lines of code to disable it when it isn’t needed. Then you have to run the program again and again, measuring performance and adding more and more clock-checks to zero in on the source of the problem.
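Manual instrumentation like this usually ends up as a pair of macros: one to grab the clock, one to report, plus a switch so the whole thing can be compiled out when it isn’t needed. A minimal sketch of the pattern (all names are mine, using std::chrono rather than whatever timing calls the original program used):

```cpp
#include <chrono>
#include <cstdio>

// Flip this to 0 to compile all the timing code out of the build.
#define ENABLE_TIMING 1

#if ENABLE_TIMING
#define TIME_BLOCK_START(name) \
    auto name##_t0 = std::chrono::steady_clock::now();
#define TIME_BLOCK_END(name) \
    do { \
        auto name##_t1 = std::chrono::steady_clock::now(); \
        double ms = std::chrono::duration<double, std::milli>(name##_t1 - name##_t0).count(); \
        std::printf("%s: %.3f ms\n", #name, ms); \
    } while (0)
#else
#define TIME_BLOCK_START(name)
#define TIME_BLOCK_END(name)
#endif
```

Usage is `TIME_BLOCK_START(render_terrain); ... TIME_BLOCK_END(render_terrain);` around each suspect region. This is exactly the annoyance described above: every new suspect means another pair of these, another rebuild, another run.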

This is an interesting look at how we perceive prices[2]. I would say that $1,200, while steep, isn’t unreasonable for a robust tool like Visual Studio. You can use it to make retail-ready software. Companies can and do use this thing to make millions of bucks. But since Microsoft gives away the free version and the free version does nearly everything the pro version does, it doesn’t feel like I’m paying $1,200 for a development environment. It feels like I’m paying $1,200 for a profiling tool. And there’s no way that’s worth it[3]. I keep hoping MS will discount some old version and I can pick up the pro edition for a few hundred[4].

The point of all this bellyaching is that we’re going to have to look for our performance bottleneck the hard way. But before we get to that, I need to talk about my normal map some more. You’ll see why in a bit.

Normal maps are kind of a pain to make if you don’t have the right tools. There are plugins for Photoshop and such, but I’m using Paint Shop Pro 8. Which came out in 2003. Which means it’s older than normal maps[5]. I don’t think I’ll be getting any normal-map plugins anytime soon. I know it’s old as hell, but it does exactly what I want and it’s already paid for. Also, it doesn’t have any DRM. I literally just copy it from one hard drive to the next when I migrate to a new computer. And like I keep saying: Convenience is everything.

But even if I had modern Photoshop and a nice plugin, I’m betting it’s a hassle to use them on atlas textures. The problem is that your typical atlas texture looks like this:

That’s the texture atlas for Minecraft. (Or was. I’m sure this version is ages old. It’s just one of the first results from GIS.) And if Minecraft used normal maps then it might have one that looks something like this:

I’m assuming that the average plugin doesn’t “get” atlas textures. It wouldn’t respect the boundaries between grid elements, which means textures wouldn’t tile right. (Or at least, their normal maps wouldn’t.) Instead it would do what we see above, which is create all these goofy edges around every element, making EVERYTHING behave like this tile with a steep lip. To avoid this, you’d probably have to put it together by hand. So if I changed the brick texture in column 8, row 1, then to update the normal map I’d have to run the plugin to make my little patch for the bricks, then copy & paste the result into my normal map.
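The “respect the boundaries between grid elements” requirement boils down to one bit of index math: figure out which atlas cell a pixel lives in, then wrap any neighbor lookup back inside that cell instead of letting it leak into the next element. A sketch (function and parameter names are invented for illustration):

```cpp
// Wrap a neighbor lookup so it stays inside one atlas cell.
// `center` is the pixel coordinate we're working on, `offset` is the
// neighbor offset (e.g. -1 or +1), and `cell_size` is the width of one
// atlas element in pixels (e.g. 2048 / 16 = 128).
int WrapNeighbor(int center, int offset, int cell_size) {
    int cell_origin = (center / cell_size) * cell_size; // left/top edge of this cell
    int local = (center - cell_origin) + offset;        // position within the cell
    local = ((local % cell_size) + cell_size) % cell_size; // wrap, handling negatives
    return cell_origin + local;
}
```

So with 4-pixel cells, stepping left from a cell’s first column lands you on its last column, not on the neighboring texture. That’s the whole difference between a tiling normal map and the “goofy edges around every element” a generic plugin produces.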

I’m sure a proper studio with mature tools has a better way of doing this. But here at Twenty Sided we’ve got old ideas and older tools. We’re also cranky and set in our ways. So here is what I came up with:

Reduced. The original is much larger. You know, just in case I suddenly need to add a couple hundred more textures to the scene.

This is the atlas texture I’m using now. You can see there’s not much in it. I’m not even sure where all the bits came from. I’m pretty sure this is mostly a leftover from project Octant. I don’t think this project is going anywhere, so I’m not in a big hurry to make a bunch of new graphics for it[6].

I take my atlas texture and make a “relief” version of it. It’s just a greyscale image where the parts that should stick out are white and the parts that dent inward are dark.

Back in my previous project I wrote a shader to take a heightmap (in that case, terrain) and turn it into a normal map. The exact same idea applies here. Instead of reading height values, I’m reading the brightness of individual pixels. The only change I have to make is to have it respect the edges of each element. So when it gets to the edge of the first little square of texture, instead of looking at the neighboring element it will look at the opposite edge of the element it’s working on. That way the elements will tile.
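Here is roughly what that conversion looks like, written as CPU code for clarity (the real version runs as a shader, and every name and the `strength` knob here are mine): a central-difference gradient over the relief brightness, with neighbor lookups wrapped inside the current atlas cell so each element tiles.

```cpp
#include <cmath>
#include <vector>

struct Vec3 { float x, y, z; };

// Sketch of the relief-to-normal step described above. `relief` is a
// greyscale image stored row-major, `size` its width/height in pixels,
// `cell` the width of one atlas element, `strength` a fudge factor for
// how steep the bumps come out.
Vec3 ReliefNormal(const std::vector<float>& relief, int size, int cell,
                  int x, int y, float strength) {
    // Wrap a neighbor coordinate back inside this pixel's atlas cell.
    auto wrap = [cell](int center, int offset) {
        int origin = (center / cell) * cell;
        int local = ((center - origin + offset) % cell + cell) % cell;
        return origin + local;
    };
    // Central differences: brighter neighbors tilt the normal away.
    float left  = relief[y * size + wrap(x, -1)];
    float right = relief[y * size + wrap(x, +1)];
    float down  = relief[wrap(y, -1) * size + x];
    float up    = relief[wrap(y, +1) * size + x];
    Vec3 n = { (left - right) * strength, (down - up) * strength, 2.0f };
    float len = std::sqrt(n.x * n.x + n.y * n.y + n.z * n.z);
    return { n.x / len, n.y / len, n.z / len };
}
```

On a perfectly flat relief this yields the straight-up normal (0, 0, 1), and at the edge of an element the wrap makes it sample the element’s opposite edge, exactly as described.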

This doesn’t create beautiful normal maps or anything. These are kind of rough and half-assy. Sometimes the contours are too severe, or not severe enough, or they just don’t feel right for the kind of surface we’re trying to emulate. But that’s all fine. It’s bad from an artistic standpoint, but it’s correct from a technical standpoint, and that’s what matters here.

I probably don’t need to post the final product. I think one blob of lavender pixels looks pretty much like any other. You get the idea. I run this shader on the relief texture at program startup to generate my normal map. Then I can forget all about it and just render everything normally[7].

On line four I check the flag “normal_map_ready”. If it’s false, then I run my expensive normal-building shader on the relief texture and continue on. Except, I don’t change the value of normal_map_ready, which means next frame we’ll do the whole thing again. And again.
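The bug boils down to a few lines. This is a sketch with stand-in functions (the names and the counter are mine, for demonstration; in the real program this is a shader pass inside the render loop), showing the missing assignment and the one-line fix:

```cpp
// Stand-ins so the bug can be shown in isolation.
int build_count = 0;           // how many times the expensive pass ran
bool normal_map_ready = false;

void BuildNormalMapFromRelief() { ++build_count; } // stands in for the shader pass

// The bug: the flag is tested, but never set...
void RenderFrameBuggy() {
    if (!normal_map_ready) {
        BuildNormalMapFromRelief(); // expensive -- and re-run every frame
        // BUG: missing `normal_map_ready = true;`
    }
}

// ...and the one-line fix:
void RenderFrameFixed() {
    if (!normal_map_ready) {
        BuildNormalMapFromRelief(); // now runs exactly once
        normal_map_ready = true;
    }
}
```

Run the buggy version for a hundred frames and the normal map gets built a hundred times; with the fix it gets built once and the branch is skipped ever after.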

Head. Hit. Keyboard.

Well, at least we’ve figured it out. Too bad I wasted a lot of time looking for actual problem areas and overlooked this flagrant waste of power. I feel like I spent a bunch of time meticulously tuning up a car without seeing any performance improvements and then realized there’s a half ton of bricks in the back seat. (It’s been too long since our last Terrible Car Analogy.)

A huge portion of my graphics muscle was spent repeatedly re-creating the normal map every single frame. A giant 2048×2048 normal map. It’s only got 8 texture elements in it, and the rest is empty space. And I think I’m only using two of them.

Well, our speed is back up and we’re free to move on. Technically I’ve completed my major goals. We’ve made stencil shadows and bump mapping. The speed is good, even when doing a rough VR approximation.

Looks kind of bare. I mean, obviously. I haven’t done anything with it yet. But hey, I just recently made that grass shader and that worked out pretty nice. Let’s import that and just drop it into this project.

Eh. Better than a poke in the eye. There are a few more experiments I want to do here before we move on. We’re not quite done yet.

Footnotes:

[1] SUPER indie. I’m so indie, I’m not even working on an actual game!

[2] Or at least, how I perceive them. But I’m willing to bet I’m not the only one.

[3] Seriously. This is NOT me shaking my tin cup or beating around the bush for more donations. No matter how much money I had, I’m not sure I could ever bring myself to spend that much for the profiling tool.

[4] I do see offers here & there around the web, but they always strike me as being sketchy. It’s always some company I’ve never heard of and I’m always worried I’d be buying a bootleg version.

[5] Not older than the IDEA of normal maps, but it certainly pre-dates the ubiquitous use of them in games by a few years.

[6] Note also that this atlas texture doesn’t match what was used to create the screenshots in this write-up. I saved screenshots as I went, but I didn’t meticulously back up each and every iteration of my texture. So if you’re the sort of person who obsesses over little details like that then in this case, don’t.

[7] Pun not intended. I didn’t even notice this until the post went live.

Have you tried holding Ctrl and scrolling in/out with the mouse wheel to adjust size? You could also hold a piece of cardboard between your eyes when putting your face right in front of your screen to check the faux-VR pictures :)

Unless I’m mistaken, there isn’t any 3D effect to be had here. It looks as if Shamus is rendering the exact same scene twice, without moving the camera–the point isn’t to ACTUALLY create a 3D scene in this instance, but rather to prove that the pipeline is capable of pumping out two full renders every frame at 60 FPS (which is to say that it’s capable of running under the bare minimum requirements specified by the Oculus Rift developer’s information, though more recent guidelines set the ideal rate at somewhere around 90 or higher–rendering your scene 180 times per second requires an extremely well optimized engine).

EDIT: After looking at it for another 5 seconds, it turns out that I was mistaken. Womp womp.

Yeah, it’s there. And it really opened up the scene when I tried it. Due to the textures used, you can’t really get a feel for the shape of the terrain in the screenshot with just a flat image. Go into stereo, though, and you can see EVERYTHING.

I actually came down here just to comment on how the stereo really opened the terrain up in that one shot.

Not sure how easy it is to use, but it seems Gimp has a plugin for creating normal maps at least. (Although I’m sure it doesn’t understand atlases, which might be auto-generated by something like the level editor anyway?)

Yeah, it really seems like the atlas ought to be generated by a tool that combines several texture images into one. I imagine a dedicated artist would want to work on individual texture files, and the texture editing programs seem to assume this. The tool could also resize the atlas as your texture library grows.

Yeah, the same thing with the heightmap to normalmap conversion step. This screams to me like something for Make (or equivalent) to handle for each build as and when the source files change. I guess on a larger project it probably would be, but here it might be overkill.

This is definitely how any professional tool handles it. You create n number of textures and another tool takes them all and stitches them together into a texture sheet and defines a data file that says texture n is these pixel ranges and n+1 is these ranges. This has the upshot of also accommodating non-square or non-power-of-2 textures easily (which usually comes up with UI elements, not so much for game elements), since they’re all fit into a square, power-of-2 texture despite their source size.
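That “data file of pixel ranges” can be as simple as a name-to-rectangle table, plus a conversion from pixel rectangles to the 0..1 UV coordinates shaders actually use. A sketch (the structure and names are invented for illustration, not any particular tool’s format):

```cpp
#include <map>
#include <string>

// One packed texture's location inside the atlas, in pixels.
struct AtlasEntry {
    int x, y;          // top-left corner in the atlas
    int width, height; // source size -- need not be square or power of 2
};

// The "data file", loaded into a name -> rectangle lookup.
using AtlasIndex = std::map<std::string, AtlasEntry>;

// Convert a pixel rectangle into the 0..1 UV range shaders expect.
void EntryToUV(const AtlasEntry& e, int atlas_size,
               float& u0, float& v0, float& u1, float& v1) {
    u0 = float(e.x) / atlas_size;
    v0 = float(e.y) / atlas_size;
    u1 = float(e.x + e.width) / atlas_size;
    v1 = float(e.y + e.height) / atlas_size;
}
```

The renderer never cares where “brick” physically landed in the sheet; it asks the index, gets a UV rectangle, and the packing tool is free to rearrange things between builds.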

The tricky bit would be getting atlases for multiple maps (diffuse, specular, normal) to all lay out the exact same way every time so that when overlaid, they would line up. Most atlas generating tools have some degree of “randomness” (probably not really random, but random enough to the average user), meaning even simple updates could re-layout the entire texture. Either the tool would need to lay out all sheets identically through some process which probably sacrifices some level of texture efficiency and requires more front end work on the tool, or your shader would need to be able to identify and look up the pixel ranges of each texture map individually which would probably increase render times slightly per shader.

Back in Minecraft release 1.5, I was actually working on a tool to help texture pack makers modify the texture atlas without fiddly image manipulation. I’d gotten to the point where it could read a MC atlas and split things up, name them nicely, handle different sizes… and then Mojang changed Minecraft so that it builds the atlas internally. Kinda killed my motivation to finish the rest of it.

Perhaps I should have another bash at it, though – it could be adapted to other games.

Cliffski (of Gratuitous Space Battles and Democracy fame) did a few posts on optimising his games. For example here’s one about optimizing GSB2 rendering. He uses AQtime Pro which seems to be “reasonably” priced at USD$599. There’s a free trial you might want to check out as well.

“Sometimes the contours are too severe, or not severe enough”
My guess that this is due to the following problem:
If you’ve got a value at location 1 and another at location 2, then if you compute the gradient between them (that’s almost the same as the vector from point 1 to point 2), that gradient will be valid for neither location 1 nor 2 but at 1.5 (assuming either a linear or 2nd order function between them)… so for a 2D map, this will assign gradients to the center of a pixel that are valid at the left (x component) and bottom (y component) border of that pixel. (This is a problem well known in engineering numerics, where there are intricate and elaborate schemes of dealing with it, but any solution is an approximation…)

I can think of two solutions that may work on a graphics card:
* Determine the gradient at one pixel by using only the neighbouring pixels (so to get the gradient at Pixel 2, you use the values at 1 and 3). This is known as “second order central differences” and will render strong gradients a bit unsharp. As I understand, that’s what you’re doing.
* first order forward difference: Compute the gradient from four pixels that share one corner (i.e. a 2×2 square). So your two vectors now go diagonally between them, but the result will be the normal vector at the corner these pixels share. This means (and I don’t know whether that’s possible) the projection of the normal map onto the geometry needs to be shifted by half a pixel against the texture map, so that the normal map pixels’ centers are at the texture (and bump) map pixels’ corners. Of course, the bump and texture maps could be constructed from the beginning to account for this.
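The difference between the two schemes is easy to see on a one-dimensional height row with a sharp step. A sketch (helper names are mine):

```cpp
#include <vector>

// Second-order central difference: the gradient AT pixel i.
// Looks one pixel to each side, which smears sharp steps across
// two pixels but keeps the result centered on the pixel.
float CentralDiff(const std::vector<float>& h, int i) {
    return (h[i + 1] - h[i - 1]) / 2.0f;
}

// First-order forward difference: the gradient at the BOUNDARY
// between pixels i and i+1 -- sharp, but valid half a pixel to the
// right of pixel i's center, hence the half-pixel shift discussed above.
float ForwardDiff(const std::vector<float>& h, int i) {
    return h[i + 1] - h[i];
}
```

On a row like {0, 0, 1, 1}, the central difference reports a gradient of 0.5 at both pixels flanking the step (softened), while the forward difference reports the full 1.0 at the boundary (sharp, but offset).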

The third option: Use a bump map of much higher resolution (in engineering circles known as “just refine the friggin’ mesh until it goes away”). If you want to look smug, you could use a Lanczos(sinc) resizer to scale the original, to give you a smoothly enlarged version that tries to preserve steps. Compute a normal map on that and scale it down again (using simple linear resampling).

Oh, fourth: Decide that this isn’t supposed to be an exercise in optimal use of individual pixels. Sorry for lecturing, I had to.

If it was still the Nineties and I had nothing better to do than play with trueSpace, I’d probably run off now and try them all. Good times.

Yeah, that texture sheet is quite old. Nowadays Minecraft doesn’t have a fixed texture sheet; instead each texture has its own file, and a texture atlas is generated as the individual textures are loaded.

This was implemented basically because of all the mods having their own textures that they wished to render and pre-1.5 they had to switch to a different texture atlas multiple times every frame to render the various mod blocks, which came at a huge frame-rate penalty.

I don’t know about Shamus, but I know I’ve certainly considered it. The reason I haven’t done it yet is that the SDK always seems to be promised as “coming soon” but never arrives. I’ve made a few tools that edit levels, but putting the effort into writing a bunch of mod code, just to have a release come out and require you to re-code everything because it’s built on a stilt-and-ladder mod framework…

IIRC Shamus doesn’t use regular software platforms because he doesn’t have enough control over them. Minecraft is even more problematic in this regard.

I recommended this to other developers, but if you are a small business or sole proprietorship, which you’d have to be if you ever want to put this game on steam, you’ll want to try Project DreamSpark, which is not the game Spark, but Microsoft’s program to get MSDN tools like VS Pro at a very small cost.

I got free access to DreamSpark through my community college a couple semesters ago while I was taking a Java class. There really wasn’t anything I wanted on there. Since the class was about Java, I didn’t need any of the developer tools. It gave me access to Windows 8 as well, but I’d need to be pretty desperate to install 8 when I’ve got a desktop with 7 and a laptop with Mint. From what I could tell, there were different versions of DreamSpark with different levels of access, and I was able to get basically everything.
Sadly, I no longer have access now that the class is over. I’m taking a C++ class this semester though; maybe they’ll give me access again.

Hahaaaaa! Yes! I’m right there with you.
It kills me, going over old code especially, and finding little spots where “Huh, I’ve been calculating this twice for every element in the list, when I only needed to calculate it once… per list.”
I know exactly what you mean with manual profiling as well. I still do all my optimization this way…

You know what would be really cool? Write a bit of graphics card profiling that fills in a texture with different colors based on the different processes that are running, so you get a visual process profile display. Probably a waste of time for a one-off project like this, but it could be useful in the future, and it would be a neat way to flex your new shader coding muscles.

For a profiler, you could check out Very Sleepy. It’s a free sampling profiler – “sampling” as in it just peeks into your program a bunch of times per second and reports what percentage of the time it saw that it was in function X. I’ve used it in the past with some success.

Another coding buddy of mine also speaks well of Very Sleepy. Might be a bit overkill for optimizing a learning project… unless you wanted to learn how to integrate a free optimizer for use in future coding excursions.

VerySleepy is OK, but I also recommend the punnily-named Luke Stackwalker. It has a very nice feature: hot path visualization. Also, in some cases, it simply works better than VerySleepy.

You can also try CodeXL, a free, feature-rich profiler from AMD, which replaced venerable CodeAnalyst. I haven’t tried it myself, but I have used CodeAnalyst: it was much harder to use than VerySleepy or Stackwalker, but it had a lot more features. It also includes tools for profiling GPU code (unlike VerySleepy and SW), but I would guess they only work on AMD GPUs, so it may not be useful to everyone.

I simply LOVE to use various tools that make development easier. I’m also a big proponent of static code analyzers, though so far I’ve failed to get the company I work for to use one (at least the free cppcheck; the paid PVS-Studio is just GREAT).

No idea about the Windows world, but on GNU/Linux you have FLOSS profiling tools like gprof and valgrind(*). I can’t say how they hold up “comfort”-wise with “professional” tools, but I for one can work pretty well with them.

(*) valgrind is even more. It’s a full-fledged dynamic analyzing framework. I love it to bits.

gcc has flags to build a profiling version of the program instead of a normal version, which dumps the profile data to a file (I think whose name is fixed?) at program exit time. Then gprof lets you interpret the results.

I’m partial to an internal tool we have at work that dumps the results out in text form, or postscript or svg form (with the call tree; one blob per function with lines between them of a thickness that depends on number of calls in the profile data). But obviously that’s not available. I think gprof can get most of the way there though. Its “annotated source” view looks particularly helpful for finding issues like this. (As in: why are the lines in that if block executed a whole bunch of times? …Oh. :-) )

What it *does* mean is that you have to make the program buildable with gcc (…but on the other hand, that old Makefile that you have should be usable with something like cygwin? possibly? might be more time than you want to invest though).

I’ve never used valgrind, but it’s certainly possible that it’s good at this too.

Sorry to rain on your parade, but I’ve just noticed another annotation bug:

When you open an annotation box, if it covers another annotation, that new annotation will be on top instead of below the box. Click on the first one, and you’ll see what I mean.

It’s no biggie though, and I wouldn’t even notice, if not for the fact that this time two annotations were close enough and big enough to overlap. At first I even thought that you did an annotation inside an annotation, which would’ve been neat.

How much time did you spend trying to find your performance bottleneck? And how often do you have to do that? $1200 is 240 hours at sub-minimum wage, or 40 hours at junior programmer wage. Just saying…

Where are you getting this $1200 figure from? On the Microsoft Store, I am seeing Visual Studio Professional 2013 for $500. The next version of Visual Studio is around the corner, though, so you may want to hold off for that.

Going with the theory that Shamus makes minimum wage (federal) (a theory supported by his statements in the last Patreon post): If Shamus spends 69 hours working on profiling with no return (and that’s a big caveat, since he can write content about his experience profiling and put that content on the website, which contributes to his income by encouraging Patreon support, but I digress) he would be able to buy the pro version of the software at $500 with the time savings. Whether he sees value in doing so is not for me to say; the education in doing stuff the ‘hard way’ has value of its own.

Of course that is just valuing Shamus’ time at minimum wage, calculated hourly. I suspect a Shamushour is worth considerably more than minimum wage. At a hunch I might put the cost tradeoff of the profiling tool closer to 20 or 30 hours – half or three-quarters of a work week – spread out over the potentially large number of projects Shamus might work on (for return or not) as an indie (who is SO indie that he’s not even working on a game). Especially because he is so interested in framerate and speed optimization due to the interest in VR!

As a kid who got outta tech college on 3d graphics in ’04 – just as normal maps were becoming a thing – what exactly is the difference between a ‘bump’ map and a ‘normal’ map, cause these posts are giving conflicting information to how I’ve been perceiving these things for like…ten years.

They’re kind of describing the same thing (describing detailed shapes of a surface). Bump maps are single values, essentially a height map. Normal maps are instead vector valued, describing the direction that point of the surface is facing. Either can be generated from the other.
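One detail worth adding to this answer: a normal map stores that per-texel direction as an RGB color, with each component remapped from [-1, 1] to [0, 255]. That’s why a “flat” normal map is the lavender blob mentioned in the post: (128, 128, 255) decodes to the straight-up vector (0, 0, 1). A sketch of the encoding (names are mine):

```cpp
struct RGB { unsigned char r, g, b; };

// Pack a unit normal vector into the usual normal-map color encoding:
// each component mapped from [-1, 1] to [0, 255], rounded to nearest.
RGB EncodeNormal(float x, float y, float z) {
    auto enc = [](float v) {
        return (unsigned char)((v * 0.5f + 0.5f) * 255.0f + 0.5f);
    };
    return { enc(x), enc(y), enc(z) };
}
```

A bump (height) map needs only one channel per texel; the normal map spends three, which is the storage cost of precalculating the direction instead of deriving it at runtime.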

As of 2017, wiki says that normal maps are actually a subtype of bump maps, bump maps being any map that adds texture to flat polys. Why the heightmap bump maps we all think of predated the precalculated normal maps is beyond me; as Shamus showed, calculating normal maps from heightmaps is expensive, you’d think that the older ones would be more likely to precalculate that rather than calculate them at runtime.

Many moons ago I remember helping someone out with a lottery generator they were doing for a C class. It would run just fine most of the time, but every once in a while it would throw itself into an endless loop. After adding several “Here I am” print statements and some head scratching, we figured out that when the program generated two numbers that were the same, it didn’t reroll the second number, so the loop never ended.

On a similar note, I found some ancient Pascal code I wrote to generate Cyberpunk 2020 NPCs that had the seed generator inside the stat loop. It worked alright at 8 MHz, but my DX4 100 (which at the time was a barn burner) would spit out the same number over and over.

The solution to that problem seems to be breaking into it with a debugger once it’s entered its infinite loop state. I really wish schools did a better job of teaching people how to use symbolic debuggers. Recent grads keep showing up at workplaces of mine thinking that printf is the best debugging tool ever developed.

Tell me about it. I just graduated with a CS degree… literally the only time actual debugging tools got mentioned was in a presentation by the campus computing club when they came in to one of my classes and talked about GDB (we were using GCC in the class). The concept of debugging and breakpoints and so on got mentioned enough that I understood what it was at least, but all of my practical knowledge of debugging came from things I read on the internet, my own experimentation, and people at my internship explaining stuff to me.

And it just floors me, when I figure out where a problem is in five minutes just by setting a few breakpoints, and I remember that literally none of my teachers actually sat us down and told us how to do it and why we’d want to!

I figured out how to use debug messages halfway through my final computer science project before graduating college. I’m still surprised they gave me a degree at all! I don’t regret that I never got a job in “my field”; I would much rather learn the important stuff on my own time, without an entire company counting on me!

There are loads of profiling tools that are both good and free. Oprofile and gprof spring to mind. I don’t know if you can use them on Windows. Your code is portable, right? A decent profiler combined with source annotation would probably have made that RenderBlitFramebuffer call pop right out.

Unfortunately, Pro doesn’t include the profiler these days either; you have to get up to Premium for that. It is a nice tool, though :-)

Besides the “Very Sleepy” recommendation posted above, I’d also recommend looking into xperf and WPA (Windows Performance Analyzer), both of which are part of the free Windows SDK. They are considerably improved in the Windows 8.1 SDK, so you definitely want that even if developing on/targeting older Windows versions (multiple SDKs can be installed simultaneously).

It’s not quite as nice a profiler for pure code microoptimization as the Visual Studio one (or at least if it is, I haven’t mastered its powers), but it’s way better at general “why is this slow?” questions, because it has visibility into lots of *other* bottlenecks that the Visual Studio CPU profiler ignores (GPU, I/O, thread scheduling, lock contention, etc). And it doesn’t cost $3000…

This is why when I write code like this, I write the IF statement, then the code to update the flag, THEN the worker code… has saved me plenty of times when I’ve gotten distracted with code somewhere else.

If you’re set on manual profiling, you might want to look into boost::timer. At its simplest, you declare a timer object at the beginning of a scope, and when it goes out of scope it spits out how much time it’s been around to a stream. It’s portable, and it uses the highest-resolution timers available on the system it’s compiled for.
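For anyone without Boost handy, the same RAII idea is a few lines of std::chrono. This is my sketch of the pattern, not boost::timer’s actual implementation (its auto_cpu_timer does this with more polish, including CPU vs wall time):

```cpp
#include <chrono>
#include <cstdio>

// RAII scope timer: construct at the top of a scope, and the destructor
// reports how long the scope took when control leaves it.
class ScopeTimer {
public:
    explicit ScopeTimer(const char* label)
        : label_(label), start_(std::chrono::steady_clock::now()) {}
    ~ScopeTimer() { std::printf("%s: %.3f ms\n", label_, ElapsedMs()); }

    // Elapsed wall time so far, in milliseconds.
    double ElapsedMs() const {
        auto now = std::chrono::steady_clock::now();
        return std::chrono::duration<double, std::milli>(now - start_).count();
    }
private:
    const char* label_;
    std::chrono::steady_clock::time_point start_;
};
```

Usage: `{ ScopeTimer t("update_terrain"); UpdateTerrain(); }` prints the elapsed time on its own, no matter how the scope exits, which is exactly the convenience being described.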

Then you can have all the fun of writing a program to analyze your profiler stream!

Don’t know if it’s already been brought to your attention Shamus, but annotation links show through annotation boxes, at least on a setup like mine. Here‘s a screenshot. I’m running Firefox 30.0 on Win7.

Not that it’s anything more than a nuisance. I just got bummed because it wasn’t an annotation inside an annotation.

I use cross-eyed 3D on your VR image and it looks nice. I basically cross my eyes until the two images overlap, then if I wait a few seconds, my brain kicks in, thinks it’s seeing actual stereo vision and voila, I see 3D. :D

When working on tiles, instead of loading one huge image of tiles, I use layering, with each tile in its own layer. I actually got a plugin which then takes all the layers and makes a tiled image from them. I’m not sure if you could do that with Paint Shop. You really should look into using free tools. IDEs like Code::Blocks, which uses MinGW32 to compile, come with profilers and tons of other tools that work great. They’re always totally free, and even better, you can compile your program for Linux if it’s coded right. And once you get used to using it, even if just for Windows, you’ll always be able to download up-to-date versions (or even move to a different IDE, since you’re compiling with MinGW, the GNU compiler). Code::Blocks can import Visual Studio projects as well, so it might be something for you to consider.

For graphics there’s GIMP, which is also free and has many of the features of Photoshop, with plugins etc… I try to stick with free tools as much as possible, and then show others who want to get started, that they can do it without any cost to them, but with quality tools. There’s youtube videos showing how to do various things with GIMP for anyone interested. (A quick search found this tutorial on normal map creation using GIMP: https://www.youtube.com/watch?v=8Jdo3ZmtPWk)

While I agree it’s hard to justify spending the $$$ for the incremental improvements that Visual Studio Pro provides over the free version, I’m personally tempted simply because another thing that doesn’t work on the free version is nVidia’s nsight environment, and it’s definitely been handy for me figuring out OpenGL performance issues (it also works with DirectX). Of course if you’re not working with nVidia hardware, it’s less tempting.