The Promise and Challenges of Head-Mounted Virtual Reality Displays

"The future is already here. It's just not very evenly distributed." - Sci-fi author William Gibson

This quote from Gibson--which he said in a 1999 NPR interview, though not for the first time--endures in popularity because it implies two exciting ideas. One, that with our Internet and our smartphones and scientific advances, we're living in a "future" sci-fi authors imagined in the 1980s. Two, that the future is a nebulous concept, rather than a sudden moment in time, that pops up in different parts of the world based on wealth and other socioeconomic factors. Even as the future has crept forward in fits and bursts, one of its most enduring icons--virtual reality--has stubbornly remained in the realm of fiction.

Until now. The Oculus Rift has arrived.

Oculus VR's head-mounted display fits into the first part of Gibson's paradigm. It's a technology driven by sci-fi fantasies like Tron and The Matrix and Star Trek: The Next Generation, but in reality, head-mounted displays have been impossibly expensive and, in consumer models, pretty crappy, until the Oculus Rift. It's the right time for high-resolution displays and low-latency technology to deliver virtual reality into our present "future." But the Oculus Rift--and VR in general--also fits into the second part of that paradigm: Virtual reality is complicated, and it will be a long, long time until it's evenly distributed.

If there's anyone who knows exactly what it will take to overcome VR's complex challenges and hurry along its development, it's Michael Abrash. Abrash programmed Quake alongside John Carmack and has worked on games and graphics programming at Microsoft and other tech companies for more than 20 years. He's spent the past two years at Valve researching wearable computing. This is his thing. And at this year's Game Developers Conference, Abrash talked to a packed room about why virtual reality is hard and where it's headed in the next few years.

In fact, Abrash and Carmack have both spoken and written extensively about VR in the past year; Carmack demonstrated an early Oculus Rift prototype running Doom 3 at E3 2012, and a few months back Abrash wrote a fascinating dive into the challenge of latency in virtual reality. In his 25-minute GDC talk, Abrash delved even further into the challenges and complexities of both VR and AR. By the end of Abrash's talk, it would be easy to see the long list of deeply technical challenges he referenced as insurmountable obstacles. But he's a technical guy. These are problems to solve, and he's giddy to solve them. So let's get technical for a bit.

Getting Technical: Why Virtual Reality is so Tough

Photo credit: Flickr user sklathill via Creative Commons

"17 years ago, I gave a talk at GDC about the technology John Carmack and I had developed for Quake," Abrash began. "That was the most fun I ever had giving a talk, because for me Quake was science fiction made real...around 1994, I read Neal Stephenson’s Snow Crash, and instantly realized a lot of the Metaverse was doable then, and I badly wanted to be part of making it happen. The best way I could see to do that was to join Id Software to work with John on Quake, so I did. What we created there actually lived up to the dream Snow Crash had put into my head."

Snow Crash, much like William Gibson's Neuromancer, conjured up images of virtual reality and cyberspace that have endured for decades. 20 years ago, it inspired Abrash to help create Quake, and today that inspiration has driven him to explore all of the roadblocks between the Oculus Rift and Stephenson's Metaverse. Even though the Rift is just now making its way into the hands of game developers, the real challenges start now.

Display resolution is an immediately obvious issue. The developer kit for the Oculus Rift ships with a 1280x800 panel display, and that resolution is split between both eyes to create a cohesive image. That means each eye is only seeing at maximum a 640x800 image at extremely close range.

"The math for resolution together with wide fields of view is brutal," said Abrash. He added that 1K x 1K resolution is likely the best an affordable HMD like the Rift will be able to support in the next year. Divided by a 100-degree field of view, "you get a display with less than 1/50th the pixel density of a phone at normal viewing distance...the bottom line is that as with 3D, it will take years, if not decades, to fully refine VR." Oculus does plan on shipping the consumer version of the Rift with a 1080p display in 2014. But based on proximity to the eye, even that will be noticeably more pixelated than a smartphone held at arm's length.

It will take a few years to deliver head-mounted displays with higher resolution panels--a resolution of about 4K should get us much closer to an ideal pixel density--but that's primarily a cost issue. Abrash quickly moved from resolution to what he considers the three main challenges for making virtual reality "far more convincing": Tracking, latency, and producing perceptions indistinguishable from reality.
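For a rough feel of the resolution math Abrash describes, here is a back-of-the-envelope sketch that compares pixels per degree of visual angle. The 1K-across-100-degrees figure comes from the talk. The phone numbers (a 326 ppi panel held 12 inches away) are illustrative assumptions of ours, so the ratio only lands in the same ballpark as his "1/50th" and is not a reconstruction of his calculation.

```python
import math

def hmd_pixels_per_degree(horizontal_pixels: float, fov_degrees: float) -> float:
    """Average pixels per degree if pixels are spread evenly across the field of view."""
    return horizontal_pixels / fov_degrees

def handheld_pixels_per_degree(pixels_per_inch: float, distance_inches: float) -> float:
    """Pixels covered by one degree of visual angle at the centre of a flat screen."""
    inches_per_degree = 2 * distance_inches * math.tan(math.radians(0.5))
    return pixels_per_inch * inches_per_degree

# 1K pixels across a 100-degree field of view (figures quoted in the talk).
hmd_ppd = hmd_pixels_per_degree(1000, 100)

# Assumed phone: 326 ppi at 12 inches. These values are NOT from the talk.
phone_ppd = handheld_pixels_per_degree(326, 12)

# Density per unit area scales with the square of the linear ratio.
areal_ratio = (hmd_ppd / phone_ppd) ** 2

print(f"HMD: {hmd_ppd:.1f} px/deg, phone: {phone_ppd:.1f} px/deg")
print(f"HMD has roughly 1/{1 / areal_ratio:.0f} the areal pixel density")
```

Change the assumed phone or viewing distance and the ratio moves quite a bit, which is worth keeping in mind when reading any single headline number.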

Image credit: Michael Abrash

"In order for virtual images to seem real, they have to appear to remain in the correct position relative to the real world at all times," Abrash said. "That means, for example, that as the head turns in this slide [pictured above], the virtual view has to change correspondingly to show the correct image for the new head position, just as would happen with a real-world view."

Tracking, as defined by Abrash, is "the determination of head position and orientation in the real world, known as pose." If virtual images aren't exactly in the right place relative to head position and the real world, the illusion breaks down. "Tracking has to be super-accurate," Abrash said. "On the order of a millimeter at 2 meters distance from the sensor...The human perceptual system has evolved to be very effective at detecting such anomalies, because anomalies might be thinking about eating you, or might be tasty."

The implementation of tracking involves translation, which Abrash explains as "accurate reporting of head movement from side to side, up and down, and forward and back." The Oculus Rift development kit does not support translation, which is required for absolute positioning. Instead, it uses a gyroscope and accelerometer, which are cheap and fairly effective at determining rotational movement. But without the absolute positioning involved in translation, an image can "drift," creating a disparity between where you should be looking relative to the real world and what you're actually seeing in virtual reality.
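One common way drift shows up in orientation-only tracking is through small sensor errors that accumulate when nothing absolute is available to correct them. The sketch below is a toy model under that assumption: a gyroscope with a tiny constant bias is integrated while the head sits perfectly still. The sample rate and bias are made-up numbers for illustration and do not describe the Rift's actual hardware.

```python
def integrate_yaw(true_rates_dps, bias_dps, dt):
    """Dead-reckon yaw (degrees) by summing biased gyro rate samples."""
    yaw = 0.0
    for rate in true_rates_dps:
        yaw += (rate + bias_dps) * dt
    return yaw

SAMPLE_RATE_HZ = 1000   # hypothetical sensor sample rate
BIAS_DPS = 0.05         # hypothetical constant gyro bias, degrees per second
dt = 1.0 / SAMPLE_RATE_HZ

# The head is perfectly still for 60 seconds, so the true yaw never leaves 0.
still_head = [0.0] * (60 * SAMPLE_RATE_HZ)
drift_after_minute = integrate_yaw(still_head, BIAS_DPS, dt)

print(f"Reported yaw after one minute of sitting still: {drift_after_minute:.2f} degrees")
```

With no outside reference to pull the estimate back, the error just keeps growing with time, which is the kind of slow slide away from the real world the article describes.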

Abrash explained that the Oculus Rift does attempt to model head and neck movements to account for rotation. This works pretty well for first-person shooters, but may break immersion or limit gameplay possibilities when you want to be able to move your head forward and backward, for example. This is something Oculus will undoubtedly continue to improve, but without absolute positioning "VR can’t stay stable with respect to the real world, and that rules out games like board games that need to stay in one place."

Latency, like resolution, has been a storm cloud over the Rift since Oculus first demonstrated the headset. But money won't fix the problem with latency. We need programmers--really, really smart programmers--and maybe technology that doesn't exist yet. We talked in-depth with Oculus about latency at CES, and Abrash devoted an entire article to latency on his Valve blog.

At GDC, he posted a slide showing that for ideal VR, we need tracking, rendering, transmitting to the display, getting photons coming out of the display, and getting photons to stop coming out to all happen in under 20 milliseconds. For comparison, Abrash said: "Since a single 60 Hz frame is 16.6 milliseconds and latency in a typical game is 35 milliseconds or more--often much more--it will be challenging to get to 20 milliseconds or less."
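To make that budget concrete, the sketch below adds up the stages Abrash listed and compares the total with his 20-millisecond target. The per-stage numbers are placeholders we picked so the total matches the "typical game" figure of 35 milliseconds; they are not measurements from the talk or from any real pipeline.

```python
def frame_time_ms(refresh_hz: float) -> float:
    """Duration of one displayed frame in milliseconds."""
    return 1000.0 / refresh_hz

BUDGET_MS = 20.0  # motion-to-photon target cited in the talk

# Hypothetical per-stage latencies in milliseconds (placeholders only).
stages_ms = {
    "tracking": 2.0,
    "rendering": 16.0,
    "transmit_to_display": 8.0,
    "photons_start": 5.0,
    "photons_stop": 4.0,
}

total_ms = sum(stages_ms.values())
over_budget_ms = total_ms - BUDGET_MS

print(f"One 60 Hz frame: {frame_time_ms(60):.1f} ms")
print(f"Pipeline total: {total_ms:.1f} ms ({over_budget_ms:+.1f} ms vs. budget)")
```

Notice that a single 60 Hz frame already eats most of the 20 ms, so every other stage has to fit into the few milliseconds left over.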

The last issue Abrash addressed, producing perceptions indistinguishable from reality, is by far the most complex. At this point in the presentation, Abrash focused on explaining how human perception works with space-time diagrams, which aren't easy to wrap your head around at first. Most importantly, the on-off nature of pixels in an electronic display differs from the constant way we perceive the real world. This really matters when motion comes into play.

Pixels only update once every frame. More importantly, each sub-pixel--red, blue, and green--activates in sequence, rather than simultaneously. "Color fringes appear at the left and right sides of the image, due to the movement of the eyes relative to the display between the times red, green, and blue are shown," Abrash said. With a 60Hz refresh rate, a frame is displayed for 16.6 ms. Not long, but long enough to cause problems.

Abrash explained that a casual turn of the head is a movement of about 100 degrees per second; at a speed of 120 degrees per second, one frame represents an offset of two degrees. That's enough for us to see a fringe of color from the subpixels. But color fringe is a minor annoyance compared to judder, a term movie buffs are familiar with. Judder is a natural side effect of panning while shooting films at 24 frames per second, because the framerate simply isn't high enough to capture entirely smooth motion.
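The two-degree figure is simple arithmetic: head speed divided by refresh rate. Here is a small sketch of it. The conversion to pixels assumes the roughly 10 pixels per degree implied by a 1K display spread over a 100-degree field of view, which is our own extrapolation from the numbers quoted earlier.

```python
def offset_per_frame_degrees(head_speed_dps: float, refresh_hz: float) -> float:
    """How far the view rotates during one displayed frame."""
    return head_speed_dps / refresh_hz

offset_deg = offset_per_frame_degrees(120, 60)

# Assumed angular resolution: 1000 pixels over 100 degrees.
PIXELS_PER_DEGREE = 10
offset_px = offset_deg * PIXELS_PER_DEGREE

print(f"{offset_deg:.1f} degrees per frame, about {offset_px:.0f} pixels")
```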

In VR, judder comes from pixels being lit for too long. Ideally, a pixel is brightly lit for only a fraction of a frame, which Abrash calls zero persistence. Half or full persistence, where pixels remain lit for half or the entirety of a frame, causes problems for fast motion.
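A simplified way to picture persistence: while a pixel stays lit, the eye keeps moving, so the image slides across the retina until the pixel goes dark or the next frame arrives. The sketch below models that under our own assumptions (the eye moves at constant speed, and each frame is drawn in the right place at the instant it appears). The function names are ours, not Abrash's.

```python
def peak_smear_degrees(eye_speed_dps, refresh_hz, persistence=1.0):
    """Maximum slide of a lit pixel relative to the eye within one frame.

    persistence is the fraction of the frame the pixel stays lit (0.0 to 1.0).
    """
    return eye_speed_dps * persistence / refresh_hz

def tracking_error_degrees(t_s, eye_speed_dps, refresh_hz, persistence=1.0):
    """Offset between the lit pixel and the tracked target at time t_s.

    Returns None while the pixel is dark. The error resets to zero at the
    start of every frame, which produces the slide-then-snap sawtooth.
    """
    frame_s = 1.0 / refresh_hz
    time_in_frame = t_s % frame_s
    if time_in_frame >= persistence * frame_s:
        return None
    return eye_speed_dps * time_in_frame

for label, p in [("full", 1.0), ("half", 0.5), ("near-zero", 0.05)]:
    print(f"{label:>9} persistence: {peak_smear_degrees(120, 60, p):.2f} degrees of smear")
```

Shrinking the lit time shrinks the smear in direct proportion, which is why lighting pixels for only a sliver of each frame is so attractive.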

Image credit: Michael Abrash

"Here we see the case where the eyes track a virtual object that’s moving across the display," said Abrash, referring to the slide above. "This could involve tracking a virtual object that appears to be moving through space, or it could involve turning the head...while fixating on a virtual object that isn’t moving relative to the real world...Ideally, the virtual object would stay in exactly the same position relative to the eyes as the eyes move. However, the display only updates once a frame, so, as this diagram shows, with full persistence the virtual object slides away from the correct location for the duration of a frame as the eyes move relative to the display, snaps back to the right location at the start of the next frame, and then starts to slide away again."

He elaborated with the video below, adding:

"It’s easy to see that because the pixels are only updated once per displayed frame, they slide relative to the markers for a full displayed frame time--about 10 camera frames--then jump back to the correct position...Because this is being played back in slow motion, we can see the images clearly as they move during the course of each displayed frame. At real-world speeds, though, the pixel movement is fast enough to smear across the retina, which makes the image blurry."

How, then, can virtual reality get zero persistence? Abrash jokingly said that a refresh rate of 1000-2000 frames per second would probably be fast enough that the human eye would react to a display just like it does the real world. Since 120 Hz displays are still out of the ordinary today, refresh rates that high won't be feasible for years, if ever. There is another solution, scanning laser displays, which Abrash has tested at Valve and called "amazingly real" compared to full-persistence LCDs. But solving judder actually introduces another problem.

A refresh rate of 2000 Hz would be fast enough for the human eye to respond to a display as it does to the real world, but modern displays top out at 120 Hz.

"Zero persistence works perfectly for whatever image the eyes are tracking, because that image lands in exactly the same place on the retina from frame to frame," Abrash said. But "anything that is moving rapidly relative to the eyes now strobes, because successive frames of such images fall too far apart on the retina to fuse, and there’s no smear to hide that."

This is a problem that, currently, has no solution. Abrash ended his presentation by listing out other perceptual problems like "ocular following response," "Cyclovergence and shear in retinal correspondence," and "perception during saccades."

On the bright side, he pointed out that these are obstacles to overcome to make virtual reality great, not to make it work. It works now, and you can play Team Fortress 2 with the Oculus Rift dev kit. Immediately after Abrash's talk, fellow Valve employee Joe Ludwig ran through the process of adding Rift support to TF2.

Making Virtual Reality Work in Games

Compared to the overarching difficulties of improving virtual reality, Valve's work modifying Team Fortress 2 looks straightforward. In his presentation, Ludwig noted that porting another Source game to the Rift should only take about a man-month--relatively brief for a technology as complicated as VR. Of course, that means an awful lot of work was front-loaded into getting the Source engine up and running with the Oculus Rift. That involved changing how Team Fortress 2 was rendered, compensating for latency as much as possible, and learning how to adapt user inputs and interfaces to a new medium.

Image credit: Joe Ludwig

Ludwig praised the Rift SDK's code for correcting lens distortion. Because the curved lenses in the Rift distort an image, it must be corrected in software. "The SDK contains the necessary shader code to do this, and will report the values to plug into the distortion function since they vary from display to display," he wrote. "It might seem like this is a bug, but it’s actually a huge feature of the Rift. One of the things that held VR back in the 90s was the difficult optics involved in letting your eye focus on a large fov display that it’s so close to."
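For a sense of what a distortion function of this kind can look like, here is a minimal sketch of a radial polynomial warp, a common form for lens correction. It is written in Python for readability, not as shader code, and the coefficients are placeholders. As Ludwig notes, the real values come from the SDK and vary from display to display.

```python
def warp(x, y, k=(1.0, 0.22, 0.24, 0.0)):
    """Scale a lens-centred coordinate radially by a polynomial in r squared.

    x and y are measured from the lens centre. The default coefficients in k
    are placeholder values for illustration only.
    """
    r_sq = x * x + y * y
    scale = k[0] + k[1] * r_sq + k[2] * r_sq ** 2 + k[3] * r_sq ** 3
    return x * scale, y * scale

# Points near the centre barely move; points near the edge are pushed outward.
for point in [(0.0, 0.0), (0.25, 0.0), (0.5, 0.0), (1.0, 0.0)]:
    print(point, "->", warp(*point))
```

Because the scale grows with distance from the centre, the rendered image is pre-distorted in the opposite sense to the lens, and the two cancel out when you look through the optics.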

To adapt to the Rift's field of view, Valve had to do away with TF2's normal player weapon models, which cut off around the elbow. VR players could see the edges of those models disappear into nothingness, so Valve replaced them with the full third-person models--minus the heads, where the player's first-person camera sits.

In virtual reality, taking head motion control away from players can be a major mistake.

Valve's tests with different control methods are worth looking into in full--we played around with each in our own Team Fortress 2 video (embedded below), and Ludwig's explanations for each in his presentation are interesting. The default they settled on works like some other first-person games: there's a "dead zone" in the middle of the screen in which you can aim without changing your in-game avatar's orientation. The first mode they tried had aiming completely attached to head motion. Imagine the neck strain!
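A dead-zone scheme like the one described could be sketched as follows. This is our guess at the general idea, not Valve's code: the function name, the 15-degree zone, and the yaw-only treatment are all assumptions made for illustration.

```python
def apply_aim(aim_offset_deg, body_yaw_deg, mouse_delta_deg, dead_zone_deg=15.0):
    """Move the crosshair freely inside the dead zone; past its edge, turn the body.

    Returns the new (aim_offset_deg, body_yaw_deg). The aim offset is measured
    relative to the avatar's facing direction.
    """
    aim = aim_offset_deg + mouse_delta_deg
    overflow = 0.0
    if aim > dead_zone_deg:
        overflow = aim - dead_zone_deg
    elif aim < -dead_zone_deg:
        overflow = aim + dead_zone_deg
    # Whatever spills past the edge of the zone rotates the avatar instead.
    return aim - overflow, body_yaw_deg + overflow

aim, yaw = apply_aim(0.0, 0.0, 10.0)   # stays inside the zone: body does not turn
print(aim, yaw)
aim, yaw = apply_aim(aim, yaw, 10.0)   # crosses the edge: body turns by the excess
print(aim, yaw)
```

Head tracking is deliberately absent here. In this sketch the mouse drives aim and body orientation, and the player's head stays free to look wherever it likes.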

Finally, there was motion sickness. Ludwig had encouraging news on this front. Yes, motion sickness will be a problem for a lot of people using VR, and that's probably not going to change. But Valve's testing found that people adapt. "Motion sickness induced by VR seems to be something that people get used to over multiple days in the same way people get their sea legs," he said.

Image credit: Joe Ludwig

Many elements in TF2 caused motion sickness, including the Scout's speed, walking up stairs, and rocket jumping. But mostly, taking control away from players was the major mistake. "When you die in TF2 you get a nice shot of whoever killed you. This causes significant problems for some people. All we do is show the same image on the screen for a few seconds and ignore head tracking, but some people are convinced that we’re actually moving the view backwards...Don’t mess with headtracking. If the user turns their head 27 degrees to the right and rolls it 3 degrees their view in the game needs to turn 27 degrees to the right and roll 3 degrees. Anything else is going to make people sick."

Ludwig's prediction that other games built in the Source engine will be relatively quick to port to the Oculus Rift is encouraging, especially since Epic already supports the Rift with Unreal Engine 3 and the developer headset ships with the Unreal Development Kit. While Ludwig stressed that every game is different, that baseline engine support should go a long way towards the even distribution of virtual reality--at least within the world of video games.