A blog by Michael Abrash

Next week, I’ll be giving a half-hour talk at Game Developers Conference titled Why Virtual Reality Is Hard (And Where It Might Be Going). That talk will use a number of diagrams of a sort that, while not complicated, might require a little study to fully grasp, so I’m going to explain here how those diagrams work, in the hopes that at least some of the attendees will have read this post before the talk, and will therefore be positioned to follow the talk more easily. The diagrams are generally useful for talking about some of the unique perceptual aspects of head-mounted VR and AR, and I will use them in future posts as well.

The diagrams are used in the literature on visual perception, and are called space-time diagrams, since they plot one spatial dimension against time. Here is the space-time diagram for an object that’s not moving in the spatial dimension (x) over time:

You can think of space-time diagrams as if you’re looking down from above an object, with movement right and left representing movement right and left relative to the eyes. However, instead of the vertical axis representing spatial movement toward and away from the eyes, it represents time. In the above case, the plot is a vertical line because the point isn’t moving in space over time. An example of this in the real world would be looking at a particular key on your keyboard – assuming your keyboard is staying in one place, of course.
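If it helps to play with these plots directly, here is a small sketch of my own (not from the post) that renders a space-time diagram in ASCII: x position relative to the eyes runs left to right across each row, and time runs down the page, one row per sampled instant.

```python
def space_time_diagram(x_of_t, steps=8, width=17):
    """Return ASCII rows plotting x position (columns) against time (rows).

    x_of_t maps a time step to an x position in [0, width).
    """
    rows = []
    for t in range(steps):
        x = int(round(x_of_t(t)))
        row = ["."] * width
        if 0 <= x < width:
            row[x] = "*"
        rows.append("".join(row))
    return rows

# A stationary point: x never changes, so the plot is a vertical line.
for line in space_time_diagram(lambda t: 8):
    print(line)
```

Every row puts the point in the same column, which is exactly the vertical line described above.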

Here’s the space-time diagram for an object that’s moving at a constant speed from left to right, while the eyes remain fixated straight ahead (that is, not tracking the object):

It’s important to understand that x position in these diagrams is relative to the position and orientation of the eyes, not the real world, because it’s the frame of reference of the eyes that matters for perception. It may not be entirely clear what that means right now, but I’ll return to this shortly.

Here’s a sample of what the viewer might see during the above space-time plot (each figure is at a successively later time):

A real-world example of this would be watching a light on the side of a train pass by from left to right at a constant speed, while keeping your eyes fixed straight ahead.
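The constant-velocity case can be sketched the same way (again, my own illustration, with speeds chosen arbitrarily): with the eyes fixated straight ahead, an object moving left to right at constant speed traces a diagonal line on the space-time diagram.

```python
def diagram(x_of_t, steps=8, width=17):
    """ASCII space-time plot: x position (columns) against time (rows)."""
    rows = []
    for t in range(steps):
        x = int(round(x_of_t(t)))
        row = ["."] * width
        if 0 <= x < width:
            row[x] = "*"
        rows.append("".join(row))
    return rows

# Constant velocity of 2 columns per time step, starting at the left edge:
# the point lands one step further right on each successive row.
for line in diagram(lambda t: 2 * t):
    print(line)
```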

Before looking at the next figure, take a moment and try to figure out what the space-time diagram would be for a car that drives by from right to left at a constant speed, then hits a concrete wall head-on, while the eyes are fixated straight ahead.

Ready? Here it is:

These diagrams change in interesting ways when the viewer is looking at a display, rather than the real world. Pixels update only once a frame, remaining lit for part or all of that frame, rather than changing continuously the way a real-world object would. That means that if a light on the side of a virtual train moves past from left to right on a full-persistence display (that is, one where the pixels remain illuminated for the entire frame time) while the eyes are fixated straight ahead, the space-time diagram would look like this, rather than the diagonal line above, as the train’s position updated once per frame:
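That staircase shape can be sketched in code as well (frame timings and step sizes here are my own illustrative assumptions): the displayed position only updates once per frame, but the pixel stays lit at that position for the whole frame, so the diagonal line becomes a stack of short vertical segments.

```python
SUBSTEPS = 3   # time samples per displayed frame
FRAMES = 5
STEP = 3       # columns the image advances per frame
WIDTH = 16

rows = []
for t in range(FRAMES * SUBSTEPS):
    frame = t // SUBSTEPS          # position is sampled once per frame...
    x = frame * STEP               # ...and then held for the frame's duration
    row = ["."] * WIDTH
    row[x] = "*"
    rows.append("".join(row))

for line in rows:
    print(line)
```

Within each frame the point sits still for several rows, then jumps rightward, rather than sweeping smoothly as a real object would.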

The above diagram has implications all by itself, but things get much more interesting if the eyes track the moving virtual object:

Remember that the spatial dimension is relative to the eyes, not to the real world; the x axis is perpendicular to a line coming out of the pupil at all times, so if the eyes move relative to the world over time, the x axis reflects that changing position. You can see the effect of this in the above diagram, where even though the virtual object is being drawn so that it appears to the viewer to move relative to the real world, it’s staying in the same position relative to the eyes (because the eyes are tracking it), except for the effects of full persistence.
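Here is a sketch of that eye-tracking case in the same ASCII form (numbers are my own, chosen for illustration): in eye-relative coordinates the eye moves continuously, but the pixel's position on the display is held for the whole frame, so within each frame the image slides backward across the retina, then snaps forward again when the next frame is drawn.

```python
SUBSTEPS = 4   # time samples per displayed frame
FRAMES = 4
STEP = 4       # columns the object (and the tracking eye) advance per frame
CENTER = 8     # column corresponding to straight ahead of the fovea
WIDTH = 17

rows = []
for t in range(FRAMES * SUBSTEPS):
    eye = STEP * t / SUBSTEPS            # the eye tracks the object continuously
    displayed = STEP * (t // SUBSTEPS)   # the display updates once per frame
    x = CENTER + round(displayed - eye)  # image position relative to the eye
    row = ["."] * WIDTH
    row[x] = "*"
    rows.append("".join(row))

for line in rows:
    print(line)
```

The result is a sawtooth instead of a vertical line: the image lands in the right place only at the instant a frame is drawn, then smears across the retina until the next update.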

These temporal sampling effects occur on all types of displays, but are particularly important for head-mounted displays in that they create major new artifacts, unique to VR and AR, that in my opinion have to be solved before VR and AR can truly be great. My talk will be about why this is so, and I hope you’ll be there. If not, don’t worry – I’m sure I’ll get around to posting about it before too long.

It might help to have some simple animation to go along with these. A dot moving down the time axis at a constant rate, with another “first person perspective” frame to the right that shows what it really represents.

Actually, CRTs are not (as) affected, since they only light up each pixel for an instant and then turn black. As you track a moving object on a CRT, your eye won’t actually “see” anything between refreshes. Nothing but a ghost image stuck on the retina, that is. And since the eye only moved over the actual visible image for an instant, that ghost image will be a lot sharper than that of a frame blasted over the entire frame time.

Most CRT phosphors are still too slow for this retinal blurring to be nonexistent, but I’m guessing OLEDs could solve this completely – you’d simply turn them off between frames (which is entirely possible since they’re ridiculously fast).

CRTs certainly are a lot better than, say, LCDs. However, they don’t actually light up for just an instant; there’s a considerable amount of persistence, often several milliseconds, which varies depending on the phosphor type.

The real problems with CRTs are that they’re bulky – not good for head mounting – and that they’ve pretty much been displaced by other display technologies, so there’s not much available to work with.

OLEDs can switch very rapidly. However, they aren’t that bright, and if you crank that up, it could reduce lifetime. Nonetheless, OLEDs are promising for reducing persistence.

Of course, if you reduce persistence enough, you get a new class of problems. I’ll talk about that a bit at GDC, and post about it here later.
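To put rough numbers on the persistence discussion above, here is a back-of-the-envelope sketch (the eye velocity and persistence values are my own assumptions, not from the post): the angular width of the smear on the retina is roughly the eye's angular velocity times how long the pixel stays lit.

```python
def smear_degrees(eye_velocity_deg_per_s, persistence_ms):
    """Approximate angular smear across the retina while a pixel stays lit."""
    return eye_velocity_deg_per_s * persistence_ms / 1000.0

# Tracking an object at an assumed 60 deg/s, for a few persistence values:
# a full 60 Hz frame, a slowish phosphor, and a fast low-persistence display.
for persistence_ms in (16.7, 4.0, 1.0):
    print(f"{persistence_ms:5.1f} ms persistence -> "
          f"{smear_degrees(60, persistence_ms):.2f} deg of smear")
```

Under these assumed numbers, full persistence at 60 Hz smears the image about a degree across the retina, which is why shortening the lit interval matters so much.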

It seems like you could emulate CRT-like persistence with an active shutter system, like the shutters used in 3D glasses: let the shutter stay open just long enough to sample once per frame.
Though I suppose this opens up the reduced-persistence can of worms.

“These temporal sampling effects occur on all types of displays, but are particularly important for head-mounted displays in that they create major new artifacts, unique to VR and AR”
Not so unique, really. It’s something that needs to be taken into account when shooting TV and film, and changes depending on the capture device and display device (e.g. shooting on film to be displayed on a film projector has different constraints than shooting with a vidicon tube and displaying on an LCD). Charles Poynton has a nice little rundown here: http://www.poynton.com/PDFs/Motion_portrayal.pdf

It’s absolutely true that TV and film have similar types of issues, and I will mention film judder in my talk, but I would say that VR and AR truly do have unique problems resulting from the combination of rapid relative velocity between the eyes and the display and from the expectation of the perceptual system that the virtual images will remain in the correct positions relative to the real world.

Great plan. I have one suggestion:
I think it’s easier for people to understand time as “left-to-right”. Perhaps, in your example, you should represent vertical motion instead. The added benefit is that your example could then be a heavy object falling to the ground rather than a car hitting a wall. The average person likely expects an object to bounce off a wall, and I think that expectation further confuses your example.

I always find it interesting how we try to mash four-dimensional principles into a two-dimensional visual diagram to better understand them.

Are you going to record video of your talk? While the slides posted on the Valve publications page are usually decent, good slides supplement what the speaker is saying rather than duplicating it, which leaves out people who aren’t attending. Any plans on expanding your coverage of presentations, either this talk specifically or in general?

Regardless of which of us will see & experience this development:
Costs will go down, even for the small & specialized displays used in VR devices/helmets/glasses.
Just compare it to the development of graphics processors over the last 10–15 years. In the beginning, we had small & weak processors on the boards; in recent years these chips have dramatically improved their performance.
The same will happen to displays – production prices will slump while the “technical quality” of the components rises.
And then – gotcha, cool VR devices.
Sure, this may be far in the future, let’s say 15+ years?
Regards

LCD TVs’ Motionflow (interpolation) strikes me as possibly useful here – though I guess it depends on the implementation; Wikipedia shows some TV sets’ interpolation going over 900 Hz. (My guess, though, is it will add some latency.)
If nothing else, I’m assuming the higher refresh rate might help with some artifacts – though at what threshold?

Higher frame rate would help a huge amount in terms of artifacts, although rendering at those higher rates would have its own set of costs. Interpolation is an interesting possibility, but the head can move so quickly between frames that it’s not clear whether that will work well enough.

It would be pretty much the same as 60 Hz, since the 10 identical images would show up in the same place on the display – they’d still get extruded across the retina, instead of landing in the right place.
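That point can be checked with a quick sketch of my own (refresh rates and step size are assumptions for illustration): if the content only advances 60 times a second, repeating each image ten times at a 600 Hz refresh produces exactly the same set of displayed positions as plain 60 Hz.

```python
def displayed_positions(refreshes, display_hz, content_hz, step):
    """Position shown at each display refresh.

    The content only advances `step` units once per content frame,
    regardless of how often the display refreshes.
    """
    positions = []
    for i in range(refreshes):
        content_frame = i * content_hz // display_hz  # current content frame
        positions.append(step * content_frame)
    return positions

# One tenth of a second at each rate, same 60 Hz content underneath.
at_60 = displayed_positions(6, 60, 60, step=10)
at_600 = displayed_positions(60, 600, 60, step=10)
print(set(at_60) == set(at_600))  # prints True
```

The 600 Hz run just revisits each position ten times, so the image still gets extruded across the retina in the same way.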

What about alternately rendering the L & R eyes, but at a higher Hz (like page flipping, but only rendering half the screen)? That’s approximately the same amount of work.

this (A):
frame1/120 L
frame2/120 R
frame3/120 L
…
frame120/120 R

instead of (B):
frame1/60 LR
frame2/60 LR
…
frame60/60 LR

In a way you’d get more information out of it.
But then with interpolation you’d always be some units of time behind before the first image is shown:
in the case of (A): time before the first L image is shown = (16 ms getting the first ‘L’ and then ‘R’ image + however long it takes to interpolate ‘L’ from ‘R’)

sidebar:
-I’m assuming that the Oculus displays one L&R image pair, rendered at the same time and representing the same point in game time.
-also, thanks for the kind replies.

Oculus has only one panel, split between the eyes. You could update only half of the frame buffer, though.

I don’t know what would happen with your alternating approach, which came up in the comments to an earlier post. It could make the eyes unhappy over time, because the information for the two eyes is always conflicting (doing stereo correspondence of retinal images seems dicey), or it could cause the brain to think the frame rate is doubled. In any case, to do this with a single Oculus-style display you’d need to be able to transmit 120 fps to the display, in which case just doing 120 fps would be better. If you had a display for each eye, though, you could run each at 60 fps.


Michael Abrash is the author of several books, including Zen of Code Optimization and Michael Abrash's Graphics Programming Black Book, and has written columns on graphics and performance programming for several magazines, including Dr. Dobb's Journal and PC Techniques. He was the GDI programming lead for the original version of Windows NT, coauthored Quake at Id Software with John Carmack, and worked on the first two versions of Xbox. He is currently working on R&D projects, including wearable computing, at Valve. He can be reached here.