A blog by Michael Abrash

Back in the spring of 1986, Dan Illowsky and I were up against the deadline for an article that we were writing for PC Tech Journal. The name of the article might have been “Software Sprites,” but I’m not sure, since it’s one of the few things I’ve written that seems not to have made it to the Internet. In any case, I believe the article showed two or three different ways of doing software animation on the very simple graphics hardware of the time. With the deadline looming, both the article and the sample code that would accompany it were written, but one part of the code just wouldn’t work right.

As best I can remember, the problematic sample moved two animated helicopters and a balloon around the screen. All the drawing was done immediately after vsync; the point was to show that since nothing was being scanned out to the display at that time (vsync happens in the middle of the vertical blanking interval), the contents of the frame buffer could be modified with no visible artifacts. The problem was that when an animated object got high enough on the screen, it would start vanishing – oddly enough, from the bottom up – and more and more of the object would vanish as it rose until it was completely gone. Stranger still, the altitude at which this happened varied from object to object. We had no idea why that was happening – and the clock was ticking.

I’m happy to report that we did solve the mystery before the deadline. The problem was that back in those days of dog-slow 8088s and slightly faster 80286s, the display was scanning out pixels before the code had finished updating them. And if that explanation doesn’t make much sense to you at the moment, it should all be clear by the end of today’s post, which covers some decidedly non-intuitive consequences of an interesting aspect of the discussion of latency in the last post – the potentially problematic AR/VR implications of a raster scan display, and the way that racing the beam interacts with the raster scan to address those problems.

Raster scanning

Raster scanning is the process of displaying an image by updating each pixel one after the other, rather than all at the same time, with all the pixels on the display updated over the course of one frame. Typically this is done by scanning each row of pixels from left to right, and scanning rows from top to bottom, so the rightmost pixel on each scan line is updated a few microseconds after the leftmost pixel, and the bottommost row on the screen is updated a few milliseconds (roughly 15 ms for 60 Hz refresh – less than 16.7 ms because of vertical blanking time) after the topmost row. Figure 1 shows the order in which pixels are updated on an illustrative if not particularly realistic 8×4 raster scan display.
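To make that arithmetic concrete, here's a quick Python sketch of scan-line timing. The resolution and blanking time are illustrative assumptions, not tied to any particular display:

```python
# Illustrative raster-scan timing: when is each scan line updated,
# relative to the start of scan-out? All numbers are assumptions.

REFRESH_HZ = 60
FRAME_MS = 1000.0 / REFRESH_HZ      # ~16.67 ms per frame
VBLANK_MS = 1.6                     # assumed vertical blanking time
ACTIVE_MS = FRAME_MS - VBLANK_MS    # ~15 ms scanning the visible lines
ROWS = 1080                         # assumed vertical resolution

def line_update_time_ms(row):
    """Milliseconds after the start of scan-out at which a row is updated."""
    return ACTIVE_MS * row / (ROWS - 1)

print(line_update_time_ms(0))                    # topmost row: 0.0
print(round(line_update_time_ms(ROWS - 1), 1))   # bottommost row: ~15 ms later
```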

Originally, the raster scan pattern directly reflected the way the electron beam in a CRT moved to update the phosphors. There’s no longer an electron beam on most modern displays; now the raster scan reflects the order in which pixel data is scanned out of the graphics adapter and into the display. There’s no reason that the scan-in has to proceed in that particular order, but on most devices that’s what it does, although there are variants like scanning columns rather than rows, scanning each pair of lines in opposite directions, or scanning from the bottom up. If you could see events that happen on a scale of milliseconds (and, as we’ll see shortly, under certain circumstances you can), you would see pixel updates crawling across the screen in raster scan order, from left to right and top to bottom.

It’s necessary that pixel data be scanned into the display in some time-sequential pattern, because the video link (HDMI, for example) transmits pixel data in a stream. However, it’s not required that these changes become visible over time. It would be quite possible to scan in a full frame to, say, an LCD panel while it was dark, wait until all the pixel data has been transferred, and then illuminate all the pixels at once with a short, bright light, so all the pixel updates become visible simultaneously. I’ll refer to this as global display, and, in fact, it’s how some LCOS, DLP, and LCD panels work. However, in the last post I talked about reducing latency by racing the beam, and I want to follow up by discussing the interaction of that with raster scanning in this post. There’s no point to racing the beam unless each pixel updates on the display as soon as the raster scan changes it; that means that global display, which doesn’t update any pixel’s displayed value until all the pixels in the frame have been scanned in, precludes racing the beam.

So for the purposes of today’s discussion, I’ll assume we’re working with a display that updates each pixel on the screen as soon as the scanned-in pixel data provides a new value for it; I’ll refer to this as rolling display. I’ll also assume we’re working with zero persistence pixels – that is, pixels that illuminate very brightly for a very short period after being updated, then remain dark for the remainder of the frame. This eliminates the need to consider the positions and times of both the first and last photons emitted, and thus we can ignore smearing due to eye movement relative to the display. Few displays actually have zero persistence or anything close to it (scanning lasers are a notable exception), but this simplifying assumption will make the basic principles easier to understand.

Raster scanning is not how anything works in nature

To recap, racing the beam is when rendering proceeds down the frame just a little ahead of the raster, so that pixels appear on the screen shortly after they’re drawn. Typically this would be done by rendering the scene in horizontal strips of perhaps a few dozen lines each, using the latest reading from the tracking system to position each strip for the current HMD pose just before rendering it.

This is an effective latency-reducing technique, but it’s hard to implement, because it’s very timing-dependent. There’s no guarantee as to how long a given strip will take to render, so there’s a delicate balance involved in leaving enough padding so the raster won’t overtake rendering, while still getting close enough to the raster to reap significant latency reduction. As discussed in the last post, there are some interesting ways to try to address that balance, such as rendering the whole frame, then warping each strip based on the latest position data. In any case, racing the beam is capable of reducing display latency purely in software, and that’s a rare thing, so it’s worth looking into more deeply. However, before we can even think about racing the beam, we need to understand some non-intuitive implications of rolling display, which, as explained above, is required in order for racing the beam to provide any benefit.
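As a sketch of the scheduling involved (not working driver code; `read_raster_line`, `sample_pose`, and `render_strip` are hypothetical stand-ins for facilities real code would need from the hardware and tracking system), the beam-racing loop looks something like this:

```python
# Hypothetical racing-the-beam loop: render each strip as late as
# possible while still staying safely ahead of the raster.

STRIP_HEIGHT = 32       # strips of a few dozen scan lines
SCREEN_LINES = 1024
LEAD_LINES = 48         # headroom: worst-case render time, in raster lines

def race_the_beam(read_raster_line, sample_pose, render_strip):
    for top in range(0, SCREEN_LINES, STRIP_HEIGHT):
        # Wait until the raster is close behind where this strip starts;
        # rendering now still beats scan-out, but latency is minimized.
        while read_raster_line() < top - LEAD_LINES:
            pass
        pose = sample_pose()                   # latest tracking reading
        render_strip(top, STRIP_HEIGHT, pose)  # draw strip for current pose
```

The delicate balance described above lives entirely in `LEAD_LINES`: too small and the raster overtakes rendering, too large and the latency win shrinks.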

So let’s look at a few scenarios. If you’re wearing an HMD with a 60 Hz rolling display, and rendering each frame in its entirety, waiting for vsync, and then scanning the frame out to the display in the normal fashion (with no racing the beam involved at this point), what do you think you’d see in each of the following scenarios? (Hint: think about what you’d see in a single frame for each scenario, and then just repeat that.)

Scenario 1: Head is not moving; eyes are fixated on a vertical line that extends from the top to the bottom of the display, as shown in Figure 2; the vertical line is not moving on the display.

Scenario 2: Head is not moving; the vertical line in Figure 2 is moving left to right on the display at 60 degrees/second; eyes are tracking the line.

Scenario 3: Head is not moving; the vertical line in Figure 2 is moving left to right relative to the display at 60 degrees/second across the center of the screen; eyes are fixated on the center of the screen, and are not tracking the line.

Scenario 4: Head is rotating left to right at 60 degrees/second; the vertical line in Figure 2 is moving right to left on the display at 60 degrees/second, compensating for the head motion so that to the eye the image appears to stay in the same place in the real world; eyes are counter-rotating, tracking the line.

Take a second to think through each of these and write down what you think you’d see. Bear in mind that raster scanning is not how anything works in nature; the pixels in a raster image are updated at differing times, and in the case of zero persistence aren’t even on at the same time. Frankly, it’s a miracle that raster images look like anything coherent at all to us; the fact that they do has to do with the way our visual system collects photons and makes inferences from that data, and at some point I hope to talk about that a little, because it’s fascinating (and far from fully understood).

Here are the answers, as shown in Figure 3, below:

Scenario 1: an unmoving vertical line.

Scenario 2: a line moving left to right, slanted to the right by about one degree from top to bottom. (The slant is exaggerated in Figure 3 to make it easier to see; in an HMD, even a one-degree slant is quite noticeable, for reasons I’ll discuss a little later.)

Scenario 3: a vertical line moving left to right.

Scenario 4: a line staying in the same place relative to the real world (although moving right to left on the display, compensating for the display movement from left to right), slanted to the left by about one degree from top to bottom.

How did you do? If you didn’t get all four, don’t feel bad; as I said at the outset, this is not intuitive – which is what makes it so interesting.

In a moment, I’ll explain these results in detail, but here’s the underlying rule for understanding what happens in such situations: your perception will be based on whatever pattern is actually produced on your retina by the photons emitted by the image. That may sound obvious, and in the real world it is, but with an HMD, the time-dependent sequence of pixel illumination makes it anything but.

Given that rule, we get a vertical line in scenario 1 because nothing is moving, so the image registers on the retina exactly as it’s displayed.

Things get more complicated with scenario 2. Here, the eye is smoothly tracking the image, so it’s moving to the right at 60 degrees/second relative to the display. (Note that 60 degrees/second is a little fast for smooth pursuit without saccades, but the math works out neatly on a 60 Hz display, so we’ll go with that.) The topmost pixel in the vertical line is displayed at the start of the frame, and lands at some location on the retina. Then the eye continues moving to the right, and the raster continues scanning down. By the time the raster reaches the last scan line and draws the bottommost pixel of the line, it’s something on the order of 15 ms later, and here we come to the crux of the matter – the eye has moved about one degree to the right since the topmost pixel was drawn. (Note that the eye will move smoothly in tracking the line, even though the line is actually drawn as a set of discrete 60 Hz samples.)

That means that the bottommost pixel will land on the retina about one degree to the right of the topmost pixel, which, due to the way images are formed on the retina and then flipped, will cause the viewer to perceive it to be one degree to the left of the topmost pixel. The same is true of all the pixels in the vertical line, in direct proportion to how much later they’re drawn relative to the topmost pixel. The pixels of the vertical line land on the retina slanted by one degree, so we see a line that’s similarly slanted, as shown in Figure 4 for an illustrative 4×4, 60 Hz display.

Note that for clarity, Figure 4 omits the retinal image flipping step and just incorporates its effects into the final result. The slanted pixels are shown at the locations where they’d be perceived; the pixels would actually land on the retina offset in the opposite direction, and reversed vertically as well, due to image inversion, but it’s the perceived locations that matter.
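The slant is just eye velocity multiplied by scan-out delay; a two-line computation makes it concrete. The numbers come from the scenario, with the same illustrative 15 ms top-to-bottom scan-out time used above:

```python
# Scenario 2: horizontal retinal offset of a pixel, relative to the
# topmost pixel, as a function of how far down the frame it's drawn.

EYE_DEG_PER_S = 60.0    # smooth-pursuit speed from the scenario
SCANOUT_MS = 15.0       # top-to-bottom scan-out time (illustrative)

def retinal_offset_deg(fraction_down_screen):
    delay_s = (SCANOUT_MS / 1000.0) * fraction_down_screen
    return EYE_DEG_PER_S * delay_s

print(retinal_offset_deg(0.5))   # mid-screen: ~0.45 degrees
print(retinal_offset_deg(1.0))   # bottom: ~0.9 degrees, the one-degree slant
```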

If it’s that easy to produce this effect, you may well ask: Why can’t I see it on a monitor? The answer depends on whether the monitor waits for vsync; that is, whether the entire rendered frame is scanned out to the display only once per displayed frame (i.e., at the refresh rate), or scanned out to the display as fast as frames can be drawn (so multiple rendered frames affect a single displayed frame, each in its own horizontal strip – a form of racing the beam).

In the case where vsync isn’t waited for, you won’t see lines slant for reasons that may already be obvious to you – because each horizontal strip is drawn at the right location based on the most recent position data; we’ll return to this later. However, in this case it’s easy to see the problem with not waiting for vsync as well. If vsync is off on your monitor, grab a screen-height window that has a high-contrast border and drag it rapidly left to right, then back right to left, and you’ll see that the vertical edge breaks up into segments. The segments are separated by the scan lines where the copy to the screen overtook the raster. If you move the window to the left and don’t track it with your eyes, the lower segments will be to the left of the segments above them, because as soon as the copy overtakes the raster (this assumes that the copy is faster than the raster update, which is very likely to be the case), the raster starts displaying the new pixels, which represent the most up-to-date window position as it moves to the left. This segmentation is called tearing, and is a highly visible artifact that needs to be carefully smoothed over for any HMD racing-the-beam approach.

In contrast, if vsync is waited for, there will be no tearing, but the slanting described above will be visible. If your monitor waits for vsync, grab a screen-height window and drag it back and forth, tracking it with your eyes, and you will see that the vertical edges do in fact tilt as advertised; it’s subtle, because it’s only about a degree and because the pixels smear due to long persistence, but it’s there.

In either case, the artifacts are far more visible for AR/VR in an HMD, because objects that dynamically warp and deform destroy the illusion of reality; in AR in particular, it’s very apparent when artifacts mis-register against the real world. Another factor is that in an HMD, your eyes can counter-rotate and maintain fixation while you turn your head (via the combination of the vestibulo-ocular reflex, or VOR, and the optokinetic response, or OKR), and that makes possible relative speeds of rotation between the eye and the display that are many times higher than the speeds at which you can track a moving object (via smooth pursuit) while holding your head still, resulting in proportionally greater slanting.

By the way, although it’s not exactly the same phenomenon, you can see something similar – and more pronounced – on your cellphone. Put it in back-facing camera mode, point it at a vertical feature such as a door frame, and record a video while moving it smoothly back and forth. Then play the video back while holding the camera still. You will see the vertical feature tilt sharply, or at least that’s what I see on my iPhone. This differs from scenario 4 because it involves a rolling shutter camera (if you don’t see any tilting, either you need to rotate your camera 90 degrees to align with the camera scan direction – I had to hold my iPhone with the long dimension horizontal – or your camera has a global shutter), but the basic principles of the interaction of photons and motion over time are the same, just based on sampling incoming photons in this case rather than displaying outgoing ones. (Note that it is risky to try to draw rolling display conclusions relevant to HMDs from experiments with phone cameras because of the involvement of rolling shutter cameras, because the frame rates and scanning directions of the cameras and displays may differ, and because neither the camera nor the display is attached to your head.)

Scenario 3 results in a vertical line for the same reason as scenario 1. True, the line is moving between frames, but during a frame it’s drawn as a vertical line on the display. Since the eye isn’t moving relative to the display, that image ends up on the retina exactly as it’s displayed. (A bit of foreshadowing for some future post: the image for the next frame will also be vertical, but will be at some other location on the retina, with the separation depending on the velocity of motion – and that separation can cause its own artifacts.)

It may not initially seem like it, but scenario 4 is the same as scenario 2, just in the other direction. I’ll leave this one as an exercise for the reader, with the hint that the key is the motion of the eye relative to the display.

Rolling displays can produce vertical effects as well, and they can actually be considerably more dramatic than the horizontal ones. As an extreme but illustrative example (you’d probably injure yourself if you actually tried to move your head at the required speed), take a moment and try to figure out what would happen if you rotated your head upward over the course of a frame at exactly the same speed that the raster scanned down the display, while fixating on a point in the real world.

Ready?

The answer is that the entire frame would collapse into a single horizontal line, because every scan line will land in exactly the same place on the retina. Less rapid motion will result in vertical compression of the image. Vertical motion in the same direction as the raster scan will similarly result in vertical expansion. Either case can cause either intra- or inter-frame brightness variation.
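The same landing-position rule predicts all of these cases. Here's a small sketch (line count is illustrative; gaze velocity is expressed in display lines moved per scan-out) of where each scan line lands on the retina:

```python
# Vertical rolling-display effects: a scan line lands on the retina offset
# by however far the gaze has moved across the display when it's drawn.

SCREEN_LINES = 1000   # illustrative

def retinal_line_position(line, gaze_lines_per_scanout):
    """Retinal landing position of a scan line, in display-line units.
    gaze_lines_per_scanout: how far the gaze point moves down the display
    during one scan-out (positive = same direction as the raster, as when
    the head rotates upward while the eyes fixate on a point in the world).
    """
    fraction = line / (SCREEN_LINES - 1)
    return line - gaze_lines_per_scanout * fraction

# Gaze sweeping down exactly as fast as the raster: every line lands at
# position 0, and the frame collapses to a single horizontal line.
collapse = [retinal_line_position(n, SCREEN_LINES - 1) for n in (0, 500, 999)]

# Half that speed: the bottom line lands ~500 lines up (vertical compression).
# Gaze moving against the raster: positions spread out (vertical expansion).
```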

None of this is hypothetical, nor is it a subtle effect. I’ve looked at cubes in an HMD that contort as if they’re made of Jell-O, leaning this way and that, compressing and expanding as I move my head around. It’s hard to miss.

Racing the beam fixes everything – or does it?

In sum, rolling display of a rendered frame produces noticeable shear, compression, expansion, and brightness artifacts that make both AR and VR less solid and hence less convincing; the resulting distortion may also contribute to simulator sickness. What’s to be done? Here we finally return to racing the beam, which updates the position of each scan line or block of scan lines just before rendering, which in turn occurs just before scan-out and display, thereby compensating for intra-frame motion and placing pixels where they should be on the retina. (Here I’m taking “racing the beam” to include the whole family of warping and reconstruction approaches that were mentioned in the last post and the comments on the post.) In scenario 4, HMD tracking data would cause each scan line or horizontal strip of scan lines to be drawn slightly to the left of the one above, which would cause the pixels of the image to line up in proper vertical arrangement on the retina. (Another approach would be the use of a global display; that comes with its own set of issues, not least the inability to reduce latency by racing the beam, which I hope to talk about at some point.)

So it appears that racing the beam, for all its complications, is a great solution not only to display latency but also to rolling display artifacts – in fact, it seems to be required in order to address those artifacts – and that might well be the case. But I’ll leave you with a few thoughts (for which the bulk of the credit goes to Atman Binstock and Aaron Nicholls, who have been diving into AR/VR perceptual issues at Valve):

1) The combination of racing the beam and compensating for head motion can fix scenario 4, but that scenario is a specific case of a general problem; head-tracking data isn’t sufficient to allow racing the beam to fix the rolling display artifacts in scenario 2. Remember, it’s the motion of the eye relative to the display, not the motion of the head, that’s key.

2) It’s possible, when racing the beam, to inadvertently repeat or omit horizontal strips of the scene, in addition to the previously mentioned brightness variations. (In the vertical rotation example above, where all the scan lines collapse into a single horizontal line, think about what each scan line would draw.)

3) Getting rid of rolling display artifacts while maintaining proper AR registration with the real world for moving objects is quite challenging – and maybe even impossible.

These issues are key, and I’ll return to them at some point, but I think we’ve covered enough ground for one post.

Finally, in case you still aren’t sure why the sprites in the opening story vanished from the bottom up, it was because both the raster and the sprite rendering were scanning downward, with the raster going faster. Until the raster caught up to the current rendering location, it scanned out pixels that had already been rendered; once it passed the current rendering location, it scanned out background pixels, because the foreground image hadn’t yet been drawn to those pixels. Different images started to vanish at different altitudes because they were drawn at different times, one after the other; an image lost whatever scan lines the raster reached while, or before, they were being drawn. Since the raster scans at a fixed speed, images drawn sooner could rise higher before vanishing, because the raster was still near the top of the screen when they were drawn; by the time the last image was drawn, the raster had advanced far down the screen, and that image started to vanish at a correspondingly lower level.
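For the curious, the race is easy to model. This toy sketch (the speeds are made-up, but preserve the key relationship from the story: the raster sweeps down faster than the renderer) reproduces all three symptoms: intact sprites low on the screen, bottom-up vanishing as they rise, and complete disappearance high enough up:

```python
# Toy model of the vanishing-sprite bug: raster and renderer both scan
# downward, with the raster going faster. Speeds are illustrative.

RASTER_LINES_PER_MS = 70.0   # scan-out speed
RENDER_LINES_PER_MS = 50.0   # slower top-down software rendering

def visible_rows(top, height, start_ms):
    """Rows of a sprite that reach the screen this frame. The sprite
    occupies rows [top, top + height); its drawing starts start_ms after
    vsync. A row survives only if it's drawn before the raster scans it."""
    survivors = []
    for k in range(height):
        drawn_at = start_ms + k / RENDER_LINES_PER_MS
        scanned_at = (top + k) / RASTER_LINES_PER_MS
        if drawn_at <= scanned_at:
            survivors.append(top + k)
    return survivors

print(len(visible_rows(100, 32, 1.0)))  # 32: low on screen, fully intact
print(len(visible_rows(75, 32, 1.0)))   # 13: higher, bottom rows vanished
print(len(visible_rows(60, 32, 1.0)))   # 0: higher still, gone entirely
```

A later `start_ms` (an image drawn after the others) shrinks the surviving rows at any given height, which is why the images drawn last started vanishing at lower altitudes.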

51 Responses to Raster-Scan Displays: More Than Meets The Eye

So if you had eye tracking data as well, would that help? It would be nice if you did; it would also allow you to do more realistic motion blur, which is also eye-relative, right?

Aside: the last time I commented, there seemed to be a delay before my comment appeared, presumably due to a comment moderation system. That’s all fine and good, but last time you did definitely have a bunch of comments that all said the same thing, which I imagine was a bit tedious for you. That delay prevents us commenters from seeing other comments and suppressing redundant ones, and makes it more difficult for us to have a conversation between ourselves.

You are correct – the delay is due to the moderation system. I have a full time job beside moderating, so I tend to do it in the evening. I agree that the delay results in overlapping comments, but the alternative of having a ton of spam posted is not particularly appealing either. I’ll try to moderate more often.

Extending naturally from scenario 4, which can be compensated with head-motion tracking, scenario 2 could be compensated with eye tracking. I know there are already many projects using eye tracking, from controlling computers to analyzing human behavior, but it would be interesting to know how much processing real-time beam racing would need in order to correctly use head and eye tracking data, and, on top of that, to track the real moving objects you want to augment on the display.

Thinking about it quickly, that seems to require so much processing that it would be nearly impossible to do at 60 frames per second, especially for an augmented reality HMD, which would need to be a mobile device in the long run. Smoothly tracking external moving objects would also require a high-fps camera (60 fps at the least) and sophisticated software to recognize the objects in real time. Lots of work for you guys to work on.

Yes, there is lots to work on. Tobii now has a 300 fps eye tracker, so it can be done, but you’re right, 60 fps eye tracking in a mobile device is quite a challenge. Also, don’t forget latency; with head tracking, you can get very low latency out of an IMU, but optical eye tracking is going to take longer, if only to capture and transfer the image.

It sounds to me like you need to stop thinking about how to display a good image on the screen and instead think directly about how to deliver the required photons, at the required time, to the required part of the eye.

This would require tracking eye movements (as the first poster here suggests) and some sort of abstraction layer that translated the AR image into the one needed to be shown on the screen to get the required effect.

Perhaps this is what you are thinking of doing? I would imagine such a scheme would be easier if you didn’t have to update each pixel in a raster scan, but could instead individually tell a particular pixel to emit light when it was needed. For example, tell the pixel at (48,54) to update and then (72,36) to update, etc. I don’t know enough about display hardware to know if this is feasible, and as you have said, developing new display technology is likely to be cost prohibitive.

Anyway, thought I would share my immediate thoughts!

(P.S. I’m really enjoying the blog posts, it’s nice to think about something more tangible than quantum physics every once in a while!)

> … and instead think directly about how to deliver the required photons, at the required time, to the required part of the eye

Interesting. This almost lends itself to ray tracing. The model transformations are typically instantaneous (practically as fast as the tracking HW notifies of the change), but rendering a frame takes time. However, if by the time each pixel is drawn it needs to represent a different transformation of the model, we might as well ray trace this pixel instead.

Ray tracing is a good fit for racing the beam – except for the performance issues, which are considerable. I think performance would be quite a bit better using the standard pipeline and doing image reconstructions, although I’d be happy to be proven wrong.

Glad you’re enjoying the posts! I can see how they might be more concrete – and a lot easier to think through – than quantum physics.

It’s certainly possible to update some types of displays in arbitrary order, although it would require custom hardware, as you note. But I don’t see how that helps; whenever the head moves, all the pixels have to change.

Delivering the required photons to the right place at the right time is what’s needed – the trick is how to get them there, especially with variants of existing display technology. For example, with a sampled display you can never get photons to land on the retina exactly as they would with a real scene. I’m not following what you have in mind with your abstraction layer. Can you explain?

I’m interested to see what your thoughts are on global displays and laser retinal displays – I know Carmack has played around with the latter – I believe he complained about DOF issues?

Better display hardware really does seem to be the real answer here. From what I understand, OLED can do global updates and can handle very high refresh rates that should be able to reduce the latency sufficiently, the limiting factor being the circuitry and interconnect – can you confirm? I’m not sure how feasible it is for hobbyists to put together the right hardware for that, and I know you would’ve thought of it already, but I really do feel spending all this effort on beam racing to stay on commodity hardware is wasted when an HMD is already going to have different requirements anyway, if only because it needs higher resolutions than anything reasonable for, say, a phone. I’d take wobbly pictures for a year or two until displays catch up, and I’m on the more snobbish end of hardware. In any case I’d guess ergonomics, cost and, let’s be honest here, the games, will all be way more noticeably terrible for that period!

Have you already called up the display manufacturers to see what sort of interest they would need to see to make what you really want?

Laser retinal displays have some nice attributes – especially low persistence, although that creates its own issues – but I have yet to find one that had the resolution, FOV, and refresh rate that’s needed.

Better display hardware would be great, but of course the problems associated with getting that made are what the last post was about.

OLEDs switch in microseconds, so yes, they could do higher refresh rates and global display given the proper circuitry and interfacing. Definitely not a hobbyist thing, though.

Beam racing is only wasted if better hardware does get made, which is not guaranteed, and also only if there’s an alternative in the interim that’s good enough. You say you’d take wobbly pictures, but I’m guessing you haven’t experienced them – I have (most recently, today), and I’m not at all sure they’re acceptable.

Kinect sure isn’t acceptable, either – that didn’t stop it being shipped and a bunch of games made for it.* The question is not “is it acceptable”, the question is “is it acceptable enough for us to get to make a better one”, and “is that a better path than working around current tech”.

You’re right though – I’ve not seen VR since Dactyl Nightmare (or something – I was like 8 at the time) – I’ll defer to your expertise; I only suggest that the market is looking for “better”, not “best”. I’d want to focus on, say, solving where the heck you’re going to render those images when you’re walking around – battery life is probably a far bigger issue to the general public than most IQ concerns. At least, until the first stories about simulator sickness hit, I guess….

In any case, I’m loving hearing about all this stuff – your smooth pursuit link alone led me on an hours long wiki walk all over optical neuroscience.

* I’m interested to see if Microsoft push a Kinect 2 with whatever their next thing is, if it’s actually good enough this time, and if customers aren’t already burned by the last one.

Wavy is a very non-random pattern. I would have guessed that the resulting perceived image would be somewhat noisy, but possibly perceived as having better overall snapping to the real world, or at least, less obvious misalignments and wobblings (which would be replaced with noise). The noise level would correlate to the eye movement speed.

If it’s uniquely random for each frame, the result would be fuzziness/blurriness during motion. If it’s the same random pattern for all frames, then I would guess that there’d be a pattern you’d pick up during motion, where edges and areas didn’t look solid in a consistent way.

Have you looked at something like the Nintendo DS’s GPU. Basically you submit a static scene, and it renders it in nearly in lockstep with the LCD controller scan out. They did it that way because the DS’s GPU is really a 2D processor with 3D tacked on, but it’s an interesting design that may help you. Adding some realtime feedback during actual rendering still leaves you with the slanted line issue, but maybe the latency is low enough to make it not be as big of a deal.

Unfortunately, as I noted last time, anything that requires hardware changes to displays is very expensive and difficult to do. Given the ability to change the hardware, many interesting possibilities open up. I’m not familiar with the DS’s GPU – can you provide more detail?

Actually, you’d be surprised how well the human visual system does at preserving consistent perception even during eye motion. Although some aspects of visual acuity suffer when the eye moves relative to an object (particularly high-frequency spatial acuity), peripheral vision isn’t lost. In fact, the visual system compensates for anticipated eye movement; when tracking an object, the background visibly smears far less than a target moving at the same speed relative to a non-moving eye. In addition, the targeting mechanism of saccadic eye movements depends on peripheral vision (at least away from the fovea) beforehand and a correct target image where it lands, regardless of whether the eye is moving beforehand.

Most importantly, as Michael has pointed out previously, during head rotation the eye often counter-rotates to compensate, particularly if you’re fixating on something in the world. When that happens, you expect the same level of visual acuity that you’d have when the head isn’t moving; in fact, you’d notice immediately if your peripheral vision smeared every time you moved your head but kept your eyes looking straight ahead.

You mentioned that one of the chief problems with “racing the beam” is that rendering a tile or strip of pixels can take a variable length of time depending on complexity, meaning that a careful balance is needed when determining the buffer length to ensure the render beats the raster by a safe enough margin, but still improves latency.

I feel like one potential solution is to use “good enough” rendering for each tile, by progressively rendering in greater detail. For instance, start by rendering a tile with simple Gouraud shading and diffuse/specular textures, then progressively add further details (shader effects, proper lighting, even antialiasing if there’s enough time). At any point, if the raster is too close to catching up, just output what you have and move on. That’s a gross oversimplification, but you get the idea.

It might even be possible to detect when a tile would be the same as the tile in the previous refresh, and thus “resume” rendering of it in greater detail, starting from an old output buffer and continuing the rendering process. This would provide both fast updates during movement and high quality for static viewing. Much more difficult, though, and I’m not sure how often you would be able to do that.

Obviously this would require some changes to the rendering pipeline (ones I’m fairly confident would slow down a complete render, just because of the extra memory I/O), and possibly a way for the display to communicate back to the rendering hardware where it is in the scan (although a simple timer and counter should be enough for that, I would think). But it might be worth it, if it leads to a renderer that can reliably output something within a certain timespan, enabling closer beam-racing and thus lower latency.
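The deadline-driven refinement idea above can be sketched in a few lines. Everything here – the pass names, the cost estimates, the safety margin – is a hypothetical stand-in for a real tiled renderer, not an actual pipeline:

```python
import time

# Hypothetical quality passes, cheapest first; each entry is
# (label, estimated cost in seconds). In a real renderer these would be
# actual passes (flat shade, texture, lighting, antialiasing, ...).
QUALITY_PASSES = [("flat", 0.0005), ("textured", 0.001),
                  ("lit", 0.002), ("aa", 0.004)]

def render_tile_progressive(deadline, now=time.perf_counter, do_pass=None):
    """Render one tile, adding passes until the raster deadline looms.

    `deadline` is an absolute time (same clock as `now`) by which the tile
    must be handed to scan-out. Returns the label of the best quality level
    completed, or None if even the cheapest pass wouldn't fit.
    `do_pass` stands in for the real rendering work.
    """
    completed = None
    margin = 0.0002  # safety margin so we never lose the race to the beam
    for label, cost in QUALITY_PASSES:
        # Only start a pass if its estimated cost fits before the deadline.
        if now() + cost + margin > deadline:
            break
        if do_pass is not None:
            do_pass(label)
        completed = label
    return completed
```

In practice you would always scan out *something* for a tile, so a None return would mean falling back to the previous frame’s contents – which, as noted above, is exactly the case that breaks down when the head is moving.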

My guess is that the artifacts from this would be pretty noticeable, but I haven’t tried it, so I can’t say for sure. Still, imagine that various horizontal strips are drawing in various resolutions, alternating rapidly – I think that would look pretty bad.

Starting from the last frame is a clever idea. Generally, though, I worry about the worst case, since that’ll be the most noticeable, and the worst (and most common) case is that your head is moving, in which case nothing is reusable.

This is very off topic and could almost be considered a PM, but it’s in some respects related to AR/VR. Do you believe that motion control systems like the Razer Hydra, Wii/Wii U, and Kinect will have a large impact on AR/VR, especially given that that class of controller isn’t fully developed? I personally didn’t see these devices having an effect until I read a recent reddit post which brought up the idea of adding a gyro system to incorporate feedback into a controller’s rotational motion. Although I am no physicist (yet), multi-directional/axial rotational feedback should be possible with at most 3 gyros which are free-spinning when not in use.

Imagine any RPG with swordplay or magic: this could give weight to spells, or indicate the weight of a sword or throwing knife as you thrust it out of your hand; as it leaves, the controller becomes easy to handle. Contact vibrations could be produced by varying degrees of resistance in the spin of the gyros. Your sword is hit, you’re pushed back: the gyros’ rotation is altered to force the player’s wrist to turn back away. (This may take more than 3 gyros to accomplish, or a powerful motor.) Anyway, considering that this kind of controller could exist given ample software/hardware, do you think it would benefit VR/AR and also revive the slow decline of motion controls?

Current problems with this concept: battery, weight, application/implementation, and being blind and deaf while swinging leads to destruction/injury. Linear-motion force feedback isn’t very plausible. All in all, fewer (obvious) problems than VR/AR.

I’m not going to comment in detail on input, because I don’t have extensive experience with it, but it certainly is important to AR/VR. Having actual spinning gyros capable of pushing back against you sounds *very* heavy and power-hungry, not to mention tiring to use. That would be a whole new category of RSI. Also, of course, rotation without solid translation would be weird, although it might still be interesting. I don’t think I’d say that this has fewer problems than AR/VR – those seem like pretty serious problems – and it seems quite a bit less compelling. But I’m just guessing.

In my opinion the trepidation about power hunger is unwarranted in this application. For AR (or VR) outdoors, I 100% agree that power is a huge problem. After all, since batteries are not up to snuff (and neither is infrastructure), we still have no proper adoption of electric cars! That also makes a Kinect-like system mounted on your AR device awkward, since it draws so much power.

However, the Kinect has been installed in many living rooms precisely because the living room admits a power-hungry system. Building a big infrastructure outside is of course a massive challenge, but where do we start building such an infrastructure if not in the living rooms of tech geeks?

The (current) Razer Hydra has cables, the Oculus Rift has (a) cable(s), an early version (or the Windows version) of the Kinect had an extra power cable (not drawing power over USB), and I prefer my mouse on the PC to have a cable, since I already have enough on my hands recharging the batteries on my other wireless devices…

So why is it out of the question for some future motion control device to draw power over a cable? If the cable is long enough and has some cable support maybe?

In general, cables aren’t great but are acceptable in the living room or at a desk, exactly as you say. In the specific case of spinning gyros in a hand control – a device whose effectiveness relies on deft manipulation – the cables would throw off the balance, hamper movement, and have the potential to get caught on things. (This is also true of wired HMDs, as I’m reminded on a daily basis.) I have a hard time seeing that be a good experience. Your mouse cable is lighter than the power cable that would be required for massive gyros, and the same is true for the Hydra; also, with a mouse you don’t have to support the cable’s weight, unlike with a gyro device. Nonetheless, wired input devices could be successful – but for the spinning gyros, there’s also the weight of the gyros, which I think is a big problem.

True, there are several issues with the idea. The reason I believe most of these issues will be resolved is that they are being solved for other products: mobile devices drive ultra-light, high-power batteries, while electric cars and general research drive motor efficiency. Contrast that with VR/AR, where there isn’t much of a market drive for smaller screens with extremely high pixel density and faster refresh rates; it’s more likely the tech required to make a compelling force feedback system will come around before the tech required for VR/AR at that higher level.

Corded is quite possible, but yeah… not a very attractive feature.

Well, I’d at least want to see if it’s any fun. I know it’s more gimmicky stuff, so Nintendo might try something like this… I guess we’ll wait and see.

The target weight of the device is heavy handgun or light sword, for immersion purposes. Weight distribution in the device is an anticipated difficulty, but it’s also possible that the nature of the device might let us cheat. The power cord will be combined with the wrist strap, and include clips for securing it to your clothing. Clipping might be annoying, but one or two clips will go a long way to keeping things neat. A stand for the device might have a boom for holding the cord up and off of the floor.

My greatest challenge so far is how to integrate the three opposing gyroscopes into the same space most efficiently, while keeping the center of mass where it belongs. I also need to work it all out with the center of mass far from the center, so that I can test how the distribution of mass affects my application.

Consider whirling a flail over your head: the device should be able to provide a very convincing torque in the hand, but I’m so curious about how convincing it would be to have only rotational and matching visual feedback. How much like the real thing can fencing be made to feel?

I can’t see any other way to do free moving force feedback effectively.

It has limited viewing, but searching for “sprite” we find the following:

This program animates two helicopters (14 pixels high by 24 wide) and a balloon (20 by 16) against a backdrop of stars, a crescent moon, and a horizon (see photo 1). The helicopters have large windows, through which the stars, moon, and even the rearward helicopter can be seen (see photo 2).

I think that was the issue before the one I referred to in the post. That was the issue with my one and only cover article in PC Tech Journal (with Dan Illowsky); Dan and I called the cover illustration “Pigs in Space,” and if you look at it, you’ll see why. As I recall, that article was about how to do animation by drawing sprites with a blank border big enough to erase the sprite at the old position (limiting maximum movement between frames), using the string instructions, an approach that Dan invented; it was about 5X faster than XORing sprites on and off, which was the standard technique at the time. That article and the one with drawing during the vertical blanking period were basically a two-parter. However, I think “Flicker-Free Scrolling” may have been a separate article from the cover article, maybe one of the one-pagers PC Tech Journal routinely ran, because the snippet only shows my name, not Dan’s. So I guess I had two articles in that issue.

Assuming we had AR glasses with a semi-transparent OLED for each eye, would our renderer actually have to display pixels that are in the center of my FOV on both displays, or could you maybe just display every second pixel on one, and the remaining pixels on the other?

I got this idea by thinking that most of the visual image we perceive is composed of the combined images of both eyes, with just the outer degrees being exclusive to each eye, which of course makes stereo vision possible.

So by tracking the pupil positions of each eye, we should be able to determine how much of the image we see is perceived by both eyes, and then divide the pixels we want to display between each eye’s OLED.

I think this could cut down rendering time by some amount, especially when racing the beam, where we don’t have to scan in order.

The only thing I don’t know is whether depth information still translates well enough to the brain.

Excuse me if my idea or my English sounds strange; I’m a first-year university student from Germany.

Good question. You could do that, but it would be easier, cheaper, and work better to simply use displays with half the resolution. Also, if you just skipped every other pixel, you’d have blank space (gray or black), which would lower contrast.

Yeah, I didn’t think about the obvious solution of using half-res displays.

About the blank spaces: I’ve read an article about researchers using transparent, conductive metal oxides like zinc oxide for (almost) fully transparent OLEDs, which would get rid of (significant) contrast issues.

Do you know what percentage of transparency we would need to make contrast issues unimportant?

As far as I know, most standard OLEDs have a transparency of about 85%.

And thanks for giving people a platform to talk about such interesting R&D stuff with someone as experienced as you.

I don’t know if there’s a threshold level at which contrast doesn’t matter, but 15% occlusion probably isn’t a significant problem. I was thinking of VR, where see-through doesn’t help, but you clearly said AR. However… the problem is that you can’t focus at the distance of the glasses, so a transparent OLED doesn’t help (unless you’re wearing something like the Innovega contact).

Oh, and I’m glad you’ve enjoyed the forum for discussing this stuff. I’m enjoying it too. I’m just a little disappointed that no one has tried to guess what my cryptic third issue with racing the beam is.

I tried this, if I understand the question correctly, and I came to the conclusion that the dominant eye perceives the image and uses the image of the non-dominant eye mainly for depth. So having a pixel on one screen and not on the other will probably not work, as the two images are not merged (equally) in the brain.

Here’s the response sent to me by Aaron Nicholls, who’s been investigating such things:

He and you are correct that depth from stereoscopic vision would likely fall apart if you tried to naively split pixels between hemispheres, breaking both conscious perception and unconscious depth-related reactions such as vergence control and translational VOR. This is especially messy because there is no such thing as splitting pixels between eyes when stereoscopy is involved, as there is no per-pixel correspondence between the eyes in the first place unless everything is at infinity.

However, while eye dominance has a biasing effect beyond conscious perception (for an example, see here: http://m.brain.oxfordjournals.org/content/125/9/2023.long), he’s mistaken in the belief that the secondary eye is only needed for depth and for creating the consciously perceived cyclopean image. There are still a number of other mechanisms (flicker sensitivity, motion detection, and looming) that operate across the entire visual field, even if there is a small bias to the dominant side. If that weren’t true, the 10% of the population who are stereo blind (or the 30% of the population with some sort of stereo deficiency) would be incredibly vulnerable from their non-dominant side.

In animals without stereoscopic vision (of which there are many, and stereoscopic animals evolved from them), there is often completely different directional sensitivity between the two eyes, prioritizing potentially incoming (temporal to nasal) versus outgoing motion and requiring input from both eyes to stabilize the head during egomotion and to keep from being effectively blind to certain classes of motion.

Obviously the brain is capable of operating only on the signals from a single eye, but this is the first I’ve heard of someone saying the secondary eye is only useful for depth and creating the cyclopean image which is perceived. If they have a useful article backing up their claim, I’d love to see it; the Wikipedia article on ocular dominance is mysteriously low on citations, high on speculation, and requoted by other sites as well, so I’d like to see something a little more reliable on the subject.

I’ve always liked the idea of increasing the inactive time in the scan cycle to decrease tearing.

Thus, even at lower resolutions, the monitor is updating as fast as it can, and then just waiting at the end of the cycle.

This would reduce a LOT of the major problems with scan-based displays, not the least of which is tearing.

It is also ideal for TFT displays, which retain their pixels until the next scan: you could decrease the refresh rate to the eye’s threshold for perceiving smooth animation, around 45–55 Hz, without increasing tearing – instead of having to worry about preventing tearing – while increasing the write window.
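The arithmetic behind this idea is simple: if the panel bursts its scan-out as fast as the interface allows and then idles until the next refresh, everything outside that burst is a tear-free window for frame buffer updates. A minimal sketch, with illustrative numbers and a made-up function name:

```python
def safe_write_window_ms(refresh_hz, scanout_ms):
    """Time per frame during which the frame buffer is NOT being scanned out.

    Assumes the display scans all its pixels in a burst lasting
    `scanout_ms` and then idles; the remainder of the frame period is a
    safe window in which the buffer can be rewritten without tearing.
    """
    frame_ms = 1000.0 / refresh_hz
    return frame_ms - scanout_ms

# A 50 Hz refresh (20 ms frame) with a 4 ms burst scan-out leaves a
# 16 ms tear-free window, versus essentially none when scan-out is
# stretched across the whole frame.
window = safe_write_window_ms(50, 4.0)  # 16.0 ms
```

This is essentially a software-visible version of extending the blanking interval, as the comment suggests.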

I have noticed tearing many times on several different devices (like scrolling quickly on a phone- also captured a screenshot of it), and sure, it’s annoying, but I have gotten used to it and expect this imperfection. Would it really be that big of an issue in an HMD?

Unfortunately, yes. When your head is moving, the tear can get much larger, because your head is in a different position when the two sides of the tear are drawn – a difference that can be several degrees (which can equate to dozens or even a hundred pixels, depending on pixel density). When you’re looking at something that your brain believes exists in the real world, and it shears by several degrees when you turn your head, the illusion is destroyed.
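Back-of-the-envelope, the size of such a tear is just head speed, times the scan-time gap between the two sides of the tear, times the display’s pixel density. The numbers below are illustrative assumptions, not measurements:

```python
def tear_offset_pixels(head_speed_deg_per_s, tear_gap_ms, pixels_per_degree):
    """Angular error across a tear line, converted to pixels.

    The two sides of a tear are drawn with the head in different
    positions; the offset is head speed multiplied by the time between
    the two halves being scanned out, multiplied by pixel density.
    """
    degrees = head_speed_deg_per_s * (tear_gap_ms / 1000.0)
    return degrees * pixels_per_degree

# A quick 300 deg/s head turn, with ~16 ms between the two halves of a
# tear, on a display with ~20 pixels per degree:
offset = tear_offset_pixels(300, 16, 20)  # ~4.8 degrees -> ~96 pixels
```

Which is exactly the “several degrees… dozens or even a hundred pixels” regime described above.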

Dear Abrash, recently I came across another reddit post (funny that I keep citing that place) about how important implementing binaural 3D sound is in VR; they linked to a YouTube “sound play” as an example of the possibilities of this relatively untapped tech. So herein lies the question: to what extent do you see the Source/Source 2 engine supporting this kind of tech? As the idea is relatively simple, it should be quite easy to implement, yet there are few (none that I have played) games which fully simulate the “vectors” of sound that give the player a feel for their surroundings. In the video/“sound play”, the recording techniques convey the geometry of the room, the direction of a character’s voice, and the relative proximity of a sound source. Why have we not seen this in video games?

I’d always thought something like what you describe in this article was going on when I see those cheap bright LED arrays displaying text in store fronts and on public transport – when the text is static the letters are upright, and when the text is scrolled the letters (usually) become italic. It was great to read your explanation. I expect the displays that scroll with little appreciable shear are updating their few lines much more quickly, and that the distinct italic effect in other displays was a deliberate choice by the engineers. Presumably they could (and maybe do) flip the shear horizontally by scanning upwards, which wouldn’t be a major stretch for that hardware.
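The italic lean on those signs follows from the same rolling-scan arithmetic: content that moves horizontally while the rows are scanned top to bottom shifts a little between the first row’s update and the last row’s. A toy calculation, with made-up numbers:

```python
def scroll_shear_pixels(scroll_px_per_s, scan_time_s):
    """Horizontal shear across a row-scanned display for scrolling content.

    Rows are refreshed over `scan_time_s`; content moving at
    `scroll_px_per_s` travels that far between the first and last row
    being updated, slanting vertical strokes into an 'italic' lean.
    """
    return scroll_px_per_s * scan_time_s

# Text scrolling at 60 px/s on a sign that takes 50 ms to refresh all
# its rows leans about 3 pixels across its height; scanning bottom-up
# would flip the direction of the lean.
lean = scroll_shear_pixels(60, 0.05)
```

Faster row updates shrink `scan_time_s` and hence the lean, consistent with the observation that some signs show almost no shear.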

I’ve been reading your blog for a little while now and I have really enjoyed reading your take on the realistic possibilities and issues with implementing VR as the technology currently stands. I agree that hard AR is basically impossible until some serious strides in software and in optics are made. What I’m curious about is how much of the input side you’ve thought of when doing VR.

Trying to use something like a mouse and keyboard would be very difficult without being able to see them in your peripheral vision, especially during the transition between the two. Even for touch-typists, being blind to the keyboard, or lag when trying to display virtual hands, would make efficiency drop drastically. This would be especially troublesome when context switching, such as from scrolling with the mouse to typing, or moving from typing words to navigating in 3D space. I would think that to properly do VR, you would need to worry not only about the glasses but also about a new input method that could deal with the shift in interface.

My guess is that, since in VR you are blind to the outside world and current processing can’t keep a lag-free copy of the real world visible, you would have to come up with a controller that relies on your body’s ability to know its own position without seeing it. Trying to rely on control points that are not attached to or in the hand seems like it would cause problems with drift or worse.

I’m curious what your thoughts are on this issue and if you have had time to actually think about this aspect with the same depth that you’ve been working on the display. I think that VR may be used for games primarily to start but I think the potential is severely wasted if the only comfortable control method available is designed for games.

Great question! The answer is that I haven’t thought about input with the same depth as display or tracking, because existing games work well in VR with existing controls, which makes one less very hard problem to have to solve in building something useful. However, I consider input as important as display or tracking, and, as you say, the current input methods are limiting for VR. Also, existing input methods don’t work in AR for the most part, so coming up with new input methods is even more important and challenging there.

Many years ago (1989) there was a lightweight HMD called the “Private Eye” – you may well know it (I’ve still got two of them). Essentially it used a vertical column of 280 LEDs (probably from an LED printer) with a counter-balanced vibrating mirror to perform horizontal scanning. In this context, you’re racing 280 simultaneous beams across the screen, and being LEDs, of course, persistence is no problem. Note that the display is capable of drawing in both directions, i.e. you can draw during ‘flyback’ just as well as in the conventional direction, which has interesting implications (pixels at either end of the line get double updates in quick succession – though you could always interlace).

For fun I just had a look at the hardware interface to see if existing units could be hacked to test the concept – turns out there’s no VSync available on the interface, but I’m sure there’s an internal signal from the mirror/mirror drive that can be pulled out. I’m quite tempted to start hacking!

A modern version with RGB LEDs seems like it would be a good fit for low latency, just as you’ve covered; render the image in vertical strips, and shoot the strips to the LED column just ahead of the scan.
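Structurally, that column-at-a-time racing scheme might look something like the sketch below. `render_strip` and `push_column` are hypothetical stand-ins for the renderer and the LED-column interface; real code would gate each strip on the mirror’s position rather than just iterating:

```python
def race_column_scan(num_columns, render_strip, push_column, strip_width=8):
    """Race a horizontally scanning display with vertical strips.

    Renders `strip_width` columns at a time and hands each column to the
    display just ahead of the mirror sweep, so rendering stays one strip
    ahead of scan-out instead of a whole frame ahead (minimizing the
    age of the pose data each strip is rendered with).
    Returns the order in which columns were emitted.
    """
    order = []
    for start in range(0, num_columns, strip_width):
        end = min(start + strip_width, num_columns)
        strip = render_strip(start, end)   # rendered just-in-time, newest pose
        for x in range(start, end):
            push_column(x, strip[x - start])
            order.append(x)
    return order
```

A bidirectional version, exploiting the Private Eye’s ability to draw during flyback, would simply reverse the strip order on alternate sweeps.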

Apparently the company behind the display (Reflection Technology) was working on a colour version but went out of business first. A shame, as it seems an elegant design and cheap to make in quantity.
