My analysis and annotation of the Quakecon 2013 keynote continues. As before, I want to caution you that I’m sometimes out of my depth as much as anyone. My graphics knowledge is years out of date by now, and even when it was springtime fresh I never got as close to the hardware as Carmack does. I’ll probably make small errors or omissions in my notes.

Times are approximate.

11:30 “I think the Kinect has some fundamental limitations.”

When he mentions “latency issues” on the Kinect he’s talking about the time between the user taking some action and the point where the game knows about it. The Kinect has to analyze incoming images, identify the human being(s), extrapolate what position their body is in, and compare that to the last several frames to understand how the user is moving. That sort of image processing takes time. The Kinect 2.0 supposedly will have a 60ms latency, while the original Kinect had a latency of 90ms. For reference, I think your typical bargain wireless mouse has a latency of about 20ms or so, meaning that even the new and improved Kinect is still three times slower than the slowest mice.

But this strikes me as being a bit beside the point. The input lag is pretty bad in a technical sense, but it’s not the real problem in my mind. The gesture itself is going to be the real slowdown. You can’t just wiggle a finger and have the Kinect understand what you want. You need to make broad, obvious body motions. Those take a long time to perform. Compare the time it takes to wave your arm over your head to the time it takes to make a little micro-movement with your fingertips on a mouse. We can haggle over the 60ms input lag of the Kinect, but the real problem is probably on the scale of 200ms to 500ms. And that’s assuming the sensor registers every gesture perfectly, which it doesn’t.

This doesn’t mean the Kinect is worthless, but it does mean you’re limited as to what sorts of games you can use it for. It’s a major contributor to the Xbox One price tag, and it’s just not all that useful as a generalized gaming peripheral. It’s like having every console ship with a Guitar Hero instrument. Nice, if you’re into it. But not everyone is into it, even though everyone pays for it. (And those people might just get the $100 cheaper PS4.)

23:00 “It has a lot of the messiness of Linux […] but there’s also some of the magic of it.”

He’s talking about developing for Android phones, and how the system has Linux underneath it. As always, Linux is a double-edged sword. In this specific case, he’s talking about having the source code handy when things go wrong.

How it works is this:

There are many layers of software between the game you’re writing and the actual hardware that runs it. If you want mouse input, you ask the operating system, and the operating system asks the device driver, which gives you the state of the keyboard, the mouse, the graphics card, the sound system, or whatever else you need. And when I say “operating system”, keep in mind that the OS is probably a few levels deep, all by itself.

So when something crashes or fails to work as documented, advertised, or expected, then hopefully the problem is there in your code. But on some occasions – particularly on young and fast-changing platforms like mobiles – the problem is on one of those layers below your program. If you’re on a proprietary operating system like Windows, then you’re out of luck. You can’t “see” those lower levels. They’re just blocks of machine code talking to other blocks of machine code. Maybe there’s a bug in your code. Maybe the layer below you is working as designed, but the documentation is wrong. Maybe the layer below you has a bug in it that nobody’s run into before. That’s normally extremely unlikely if you’re doing something commonplace. But if you’re doing something outlandish – like cutting-edge game development that pushes the device to its limits – then you may run into problems and situations that the designers never anticipated.

But on Linux, you’ve got the source code. When a problem happens you can “step into” someone else’s code. “Step into” here being programmer talk for when your development tools show you the exact line of code that’s being run right now, allowing you to run the program a line at a time. If something happens in one of those lower layers, then you can see the code that goes with them and understand what’s really going on. You can see the difference between a bug on your end, bad documentation, or a bug on their end.

25:00 “You can have a four millisecond difference in when you wind up waking up the next time.”

I’ve never done mobile development so I’m a little out of my area of knowledge here, but what I assume he’s talking about is calling Sleep() or the platform equivalent. If you’re developing a videogame designed to run at 30 frames per second, then you have 33 milliseconds per frame to work with. That means 33ms to process user input, make sound effects, update the state of the game, and draw the entire scene. If you take more than 33ms then you’ll have dropped frames, which makes the game feel stutter-y and uneven. (Usually only a concern with fast-paced games.)

But if you happen to take less than 33ms, then what do you do with the leftover time? If you finish everything and you still have 10ms left over, you don’t want to begin a new frame. This can lead to uneven framerates in the other direction, and can also needlessly devour CPU cycles drawing frames that the user will never see. (Which would also be a waste of battery life on a handheld, but I don’t know if it’s enough to matter.) So what you do is call Sleep(n), where n is the number of milliseconds you want your program to be dormant. You’re telling the operating system that your program should stop running, and that it should start it again in n milliseconds.

The problem he’s having is that sometimes the OS wakes the program up late. You tell the OS to wake you up in 3ms, and it doesn’t actually get around to resuming the game until 7ms later, making you 4ms late in starting the next frame. If you’re trying to run a game at 30fps, that’s really annoying. If you’re hardcore and want to run at 60fps, that’s downright scary. That’s like setting your alarm to go off at 6am, knowing that it could go off anywhere between 6am and noon.

A few seconds later Carmack mentions that this problem is probably not going to be solved by an intrepid programmer crawling down into the guts of the operating system and finding out why the OS is so sloppy about this. The problem will likely be solved by hardware improvements that just absorb the inefficiencies causing this.

29:00 “We are fundamentally creativity-bound.”

To be clear, by “creativity bound” I’m sure he’s saying “we are bound by creativity” and not “we are bound FOR creativity”. The intended meaning might be missed by non-programmers because it’s kind of a programmer thing to talk in terms of being “CPU bound”, “pipeline bound”, or “throughput bound” when describing which part of a system is limiting the performance of the whole.

Here he’s saying that further visual improvements will be driven more by what artists can do than by how many polygons we can draw. I also want to point out that I said the same thing before, and it feels pretty good to have my assertion supported by Carmack.

As someone who has done some mobile development: sleep() does indeed save battery, especially when your program is also using the GPU. Also, the phone (or pad or whatever) can get awfully hot if you are wasting CPU, doubly so since the battery also generates more heat the more power you draw. (Also, heat is not a friend of lithium battery capacity, so there are some long dependency chains there.)

And also why you can’t run web browsers on your embedded car computer. Embedded systems don’t ever need to do anything other than their single dedicated task. There’s no problem with interruptions from “outside” or conflicting code, or thread blocking, or any of that. The developer controls all the software and the hardware, and can guarantee that it will all work together.

Theoretically, this is the advantage of the dedicated gaming box as well. It doesn’t have an operating system to get in the way of the performance code… sigh.

Ooh! Zing!
Seriously though, maybe some static code analysis would help here. Like a spellchecker. I know it’s a pain, but maintaining a clean code base is worth it! (coming up soon in the annotations no doubt)

The comment on mobile isn’t about a sleep command or something similar to handing over the core you’re running on. Cooperative multitasking (for a programmer on anything like a modern OS, including mobile) is dead, except for weird platforms (which is what game developers often get for their consoles). This means you’re working with time slices (on PC or mobile, possibly also on some consoles, but I get the impression they still permanently hand over X cores to the game executable to wrangle itself), and while the code thinks it is executing in order and with no gaps, the system is actually flipping your code (and the register values) off the CPU core at regular intervals and scheduling other stuff. So there is no way to know how long has elapsed between one executed atomic instruction and the next (unless you ask), and certainly no way to guarantee it. If you got a slice and then the OS had a lot of other things to schedule before you got another one, then it could be 1 second later when you next get access to the CPU. Sleep() can still be used to give back the rest of your time slice early if you’re at a good point to wait, but fundamentally you can’t code expecting any guarantees on timeliness.

With a console, those systems where programmers are asked for a lot but given a lot of access to do it, you often see rules like: the 360 has 6 threads over 3 cores, one core is totally for the programmer, and one thread on each other core is also totally for game code (with garbage collection/.Net VM running one thread and the OS on the final one in that specific example; the next gen systems are probably handing over one CPU block of 4 cores and some percentage of the other one). Being predictable enough to optimise against is how consoles compete with far more powerful but varied PC platforms, and so they get to grab their cores and not worry about being scheduled off. Those cores are theirs and they can sit there and enjoy that consistency. This is a weird exception, and I agree that it isn’t likely that mobiles will just allow a program to grab and monopolise some cores (even with these quad-core modern designs) and avoid the time slices that everyone else has to work with (including the OS).

An interesting extension to the ‘you may think you are running alone on a free CPU but reality is not that simple’ idea (so be thankful your registers are still full of what you put there, and sorry the CPU cache is no longer pointing at the stuff you care about, it’ll refill) is that many mobile platforms (like Android) actually hand programs the expectation that they won’t always be resumed cleanly (invisibly to the program). You can be asked to prepare to hide, be asked to totally stop and not expect more CPU cycles at any point soon, and possibly be expected to not get your RAM back if you are reactivated (although there may now be a proper paging system in place so you do get your RAM back). Basically the CPU is running on limited batteries and may well be taken away by the OS and never resumed at all, with your program given a new entry point and no registers instead (if you don’t leave when asked to). I was recently reading about using this mobile pause/resume stuff to do in-place upgrades to programs as they continued to run (sort of like the Java etc hotswapping idea, but with a method called to rework the owned RAM to deal with any changes required in the update). This is totally different to the multitasking time slices, but it gives an idea of how mobile feels different: programs are written with a different set of expectations, which will probably never match up with the console form of being given a CPU (set of cores) to manage yourself.

This is actually one of the bigger differences between the XBone and PS4 from a development point of view. The XBone is rather like an Android app in the way it shares time – the XBone OS (essentially a Windows RT variant) can tell you to go to sleep, at which point it is your responsibility to save your game state and wait for the restore signal. When you get the restore, you have to restore your game state and carry on (and you can also have issues with being in restricted mode, where you only get 45% of the CPU time because some sod is using Skype / watching TV at the same time). But basically the game has to handle sleep and restore messages from the system.

The PS4 on the other hand makes all that completely transparent to the user. It is much more akin to time slicing on a PC CPU. You have to call a function to mark when a frame starts and ends, and any sleeping is done completely invisibly to the user. Your game may sleep at any point between frames for any amount of time, and the game will never know. The OS quietly handles the saving and restoring of the game state for you. (The one side effect of this is that you can’t have the GPU running compute shaders that take more than a frame to complete, since you have to guarantee a point at which the GPU is idle in every frame for the potential sleep call).

When I got to the “fundamentally creativity bound” part I cheered. I think he said this in many more words at the last QuakeCon as well, but it was good to hear it again.
I’d venture that this isn’t really talking about graphics necessarily either. It goes for game mechanics, sound design, UI, all of it. We have more technology than we know what to do with at this point.

I can’t remember all the details, and can’t find the relevant info with a quick search, but it was like this:

Someone (a reporter or PR person) was talking about the Kinect’s Time-of-flight camera system, and somehow made it sound like the system was reducing latency by cancelling out the travel time of the photons. I think they actually mentioned this in episode 15 of the Diecast.

It’s entirely possible that all the brain cells responsible for that particular memory died from the stupid. At least I didn’t make myself look like an idiot by explaining to DamienLucifer how far light could travel in 60 milliseconds.

I remember that. I thought it meant they were switching from video to lidar as explained by someone who had no clue what lidar was but had lidar explained to him by someone who was really bad at explaining things.

Because it sounded like someone playing telephone with a very dumbed down description of lidar.

No need for videos. Just look at any commercial GPS, and there you have a device where not only light latency, but also relativity, are taken into account. And we use such things every day now. I love living in the future (though I still want my rocket car, dammit!).

You make a good point. I was thinking only about TOF within a single room of the average household. (The video I saw included 3D scanning of tomatoes using a pulse from a xenon laser to produce a brief wavefront of light, which was picked up by the camera as it passed over everything in the scene.)

Maybe there’s a bug in your code. Maybe the layer below you is working as designed, but the documentation is wrong. Maybe the layer below you has a bug in it that nobody’s run into before. That’s normally extremely unlikely if you’re doing something commonplace.

It’s especially fun when “the layer below you” is the raw hardware. Or the code that the compiler generated.

Why yes, I do know people who have found both compiler (x86-64 red-zone infractions) and CPU (NDA, unfortunately, but it’s a hilarious bug) bugs before.

Edit: On the other hand, “extremely unlikely” does still apply. The chances of this happening to any given person are probably so low that you can expect to see it less than once per lifetime. Unless you’re doing something like regression testing for new CPUs or new compilers.

I also figure I should explain the red-zone thing. Each CPU has at least one calling convention — this is the set of rules that compilers use when building functions into object code that the CPU can actually run. It covers things like which function in a caller/callee pair saves which registers, how the stack may be used, where function arguments go (either in registers or the stack), in which order, etc. (I say “at least one” calling convention because 32-bit x86 had at least four or five, depending on the OS and language. Windows had at least four of them; Linux used either a fifth or one of the four from Windows, I don’t remember. 64-bit has one for each of these OSes.)

Anyway. The 64-bit Linux calling convention specifies a 128-byte chunk of the stack that any function’s code is allowed to use without doing anything to save or restore it (and which interrupt handlers and such must preserve), but which functions that this one calls will overwrite. That’s the red zone.

The bug was that some versions of the compiler would sometimes miscalculate the end of the red zone, and would generate object code that used memory past the end of the real red zone. So depending on what happened and when, we’d see the program crash in impossible ways. (Because an interrupt — or I believe in our case, a signal, since this was all in userspace and I’m pretty sure kernel code has a different stack — would happen, and its handler would overwrite the bit of the stack past the real red zone. When the handler cleaned up and returned, some bits of the data had changed when the function didn’t expect them to.)

“Something commonplace” can mean something very different if you’re doing embedded development, though – common microprocessors are available in a few hundred different microcontrollers, and it’s ludicrously common to find an undocumented issue that’s unique to a particular revision of a particular uC. (This also makes reproducibility fun – differing demand and lead times can mean that there’s a mix of revisions in the supply channels, particularly across different chip packages, so the part you prototyped with may end up having different issues than the part that goes on the board.)

Bugs in the CPU core itself are rare, though, particularly given the age of some of the cores in question.

Our application would crash out after roughly two weeks of continuous running. Not exactly the same length of time, but roughly to within a day or so.

Eventually, it turned out that the memory manager in that version of XPE had a debug switch turned on that made it dump after a certain number of either memory allocations or page misses, and our application hit it at around the two week mark of 24-hour running.

MS eventually realised and gave us a patch to turn that flag off.

– It’s likely that the desktop edition of WinXP very rarely ran the same application for that length of time, and MS could (and maybe did) push out an automatic patch to fix it.

Running into compiler bugs is considerably more common though. We’ve hit two or three of those – though none on x86 so far!

I find it interesting that Carmack buys into this whole cloud rendering thing. He complains about the response time lag of something like Kinect (60ms); but it’s pretty clear that the physical limitations of our universe are going to ensure that cloud rendering will likely have similar input lag issues. (Though admittedly the computation cost of examining the input will be lower.)

There is one caveat: If the companies running cloud rendering are willing to spam millions of renderers to blanket major markets with them, then I suppose all bets are off. Maybe that’s what Carmack is thinking; that there is so much money at stake that spamming render farms is a potential solution?

Articles I’ve read on cloud gaming services have been generally positive, and apparently the latency is not too terribly bad. It’s worth noting that the limit for typical human perception is 80ms, and my experience with Kinect has been that actual latency has been closer to 300-800ms, while at the best of times, my internet latency (to the League of Legends servers, for example) is typically 60-100ms.

The year it came out, I saw a demo unit that had a 1500ms delay. I was fairly unimpressed.

My personal prediction is that cloud gaming will eventually gain widespread but very lukewarm adoption. It’ll be like TV or Windows Solitaire, where everyone has access to it and uses it, but nobody really celebrates it, and most gamers still buy dedicated boxes for local processing.

The bit about hitting diminishing returns on pure graphics improvements feels like about the right time to call it. People around here are more perceptive and somewhat ahead of the curve in gaming habits, and so had the pleasure of calling it early. But a PS3 game does look and run noticeably better than a PS2 game, to the extent that going back would feel completely unacceptable, and I think even the difference between early and late PS3 games is pretty noticeable.

But we’re into the flat end of the curve now, and I think we’ll just have momentum of thought left with this generation (so telling people that they should stop caring about graphics will become a thing even for mainstreamers, and everyone will be happy with that answer by the time we reach the end of the cycle).