A developer's perspective on immersive 3D computer graphics

For Science!

I’ve been busy finalizing the upcoming 4.0 release of the Vrui VR toolkit (it looks like I will have full support for Oculus Rift DK2 just before it is obsoleted by the commercial version, haha), and needed a short break.

So I figured I’d do something I’ve never done before in VR, namely, watch a full-length theatrical movie. I’m still getting DVDs from Netflix like it’s 1999, and I had “Avengers: Age of Ultron” at hand. The only problem was that I didn’t have a VR-enabled movie player.

Well, how hard can that be? Not hard at all, as it turns out. I installed the development packages for the xine multimedia framework, browsed through their hacker’s guide, figured out where to intercept audio buffers and decoded video frames, and three hours later I had a working prototype. A few hours more, and I had a user interface, full DVD menu navigation, a scrub bar, and subtitles. In 737 lines of code, a big chunk of which is debugging output to trace the control and data flow of the xine library. So yeah, libxine is awesome.

Then it was time to pull the easy chair into the office, start VruiXine, put on the Rift, map DVD navigation controls to the handy SteelSeries Stratus XL bluetooth gamepad they were giving away at Oculus Connect2, and relax (see Figure 1).

Figure 1: The title menu of the “Avengers: Age of Ultron” DVD in a no-frills VR movie player (VruiXine). Fancy virtual environments are left as an exercise for the reader.

Now, a confession: While I have uncountable many-hour sessions in screen-based VR under my belt, this was the longest time I’ve ever worn a head-mounted display (almost) continuously, by far, and it was a bit of a chore. I don’t recall any eye strain or headaches after a bit more than two hours, and there wasn’t any nausea or discomfort (not that there was any reason for there to be any from sitting still and watching a giant movie screen), but when I saw myself in the mirror during a short bathroom break halfway through the movie, I looked like I had just been in a bar fight. That was some serious Oculus face. No, there won’t be any pics. I also had to adjust the Rift a lot, and after about one and a half hours I couldn’t find a position where it wasn’t uncomfortable.

Even with those caveats, the experience was worlds removed from the torture of trying to watch a movie in my Sony HMZ-T1. See, when I said above that I had never watched a movie in VR, I wasn’t technically lying. For one, the Sony isn’t technically a VR headset, and for two, I never finished watching a movie in it. While I primarily bought the Sony to experiment with head-mounted VR (that was about five years ago), I also really wanted to use it as a video viewer. Alas, it was so uncomfortable, nay, outright painful, that I never managed more than about half an hour. On top of that, the entire idea of a non-head-tracked head-mounted movie viewer is flawed. Even with the Sony’s rather modest 45° field of view, viewers need to move their heads around to focus on all parts of the virtual screen. Problem is, they can’t, because the screen is bolted to their heads. This aspect, of course, worked perfectly with the Rift.

Going into the experiment, my biggest worry was low resolution. The DK2’s display is a tad on the low-res side (see Figures 2 and 3), stretching less than 960×1080 pixels per eye over about 100° field of view, and the whole experience was decidedly fuzzy. Granted, DVD video at 720×480 doesn’t exactly count as a high-resolution format these days, but it does look considerably better on the 1920×1080 projector in my living room. I had to consciously employ temporal super resolution to see fine details: essentially, by making continuous very small head movements, different pixels or parts of pixels from the virtual movie screen get mapped to the same pixels of the display, and the visual cortex combines these impressions into a subjectively higher-resolution image. It’s why head-tracked displays appear higher-resolution than fixed displays at the same pixel count.
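As a rough back-of-the-envelope check on those numbers, the sketch below turns the quoted 960 horizontal pixels over about 100° into an average angular resolution and a Snellen-style acuity figure. It assumes that pixels are spread evenly across the field of view (they are not, because of lens distortion) and that 20/20 vision corresponds to resolving one arcminute per pixel; the function names are made up for this illustration.

```python
def pixels_per_degree(horizontal_pixels, fov_degrees):
    """Average pixel density across the field of view.

    Assumption: pixels are spread evenly, ignoring lens distortion.
    """
    return horizontal_pixels / fov_degrees


def snellen_denominator(ppd):
    """Rough Snellen equivalent (the X in 20/X) for a given pixel density.

    Assumption: 20/20 means resolving 1 arcminute, and one display pixel
    is the smallest detail that can be shown.
    """
    arcmin_per_pixel = 60.0 / ppd
    return 20.0 * arcmin_per_pixel


# Figures quoted in the post: at most 960 pixels over about 100 degrees.
dk2_ppd = pixels_per_degree(960, 100)
dk2_snellen = snellen_denominator(dk2_ppd)

print(f"{dk2_ppd:.1f} pixels per degree, roughly 20/{dk2_snellen:.0f}")
```

Under these assumptions the estimate comes out at 9.6 pixels per degree, or roughly 20/125. This is only an average over the whole field of view, so it should be read as a ballpark figure and not as a measurement.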

Figure 3: A standard vision test chart, as displayed in an Oculus Rift DK2. My test result: 20/80. The result is actually better than what can be seen in this static image, due to temporal super resolution.
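The idea behind temporal super resolution can be illustrated with a toy one-dimensional model. This is a minimal sketch, not a model of the visual system: it assumes that each display pixel shows the mean of the fine samples it covers, that small head movements shift the pixel grid by whole fine samples, and that the viewer simply averages the resulting images. All names are hypothetical.

```python
def display_frame(signal, pixel_size, shift):
    """Show `signal` on a coarse display whose pixel grid is offset by `shift`.

    Each display pixel shows the mean of the fine samples it covers.
    The displayed image is returned resampled onto the fine grid.
    """
    n = len(signal)
    out = [0.0] * n
    start = shift - pixel_size  # the first pixel may begin before the signal
    while start < n:
        lo, hi = max(start, 0), min(start + pixel_size, n)
        if hi > lo:
            value = sum(signal[lo:hi]) / (hi - lo)
            for i in range(lo, hi):
                out[i] = value
        start += pixel_size
    return out


def fuse(frames):
    """Average several displayed frames sample by sample."""
    return [sum(frame[i] for frame in frames) / len(frames)
            for i in range(len(frames[0]))]


def mean_squared_error(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b)) / len(a)


# A sharp edge that does not line up with the coarse pixel grid.
signal = [0.0] * 6 + [1.0] * 10
pixel_size = 4

single = display_frame(signal, pixel_size, 0)
fused = fuse([display_frame(signal, pixel_size, s) for s in range(pixel_size)])

single_error = mean_squared_error(signal, single)
fused_error = mean_squared_error(signal, fused)
print(single_error, fused_error)
```

In this example the fused image is closer to the original edge than the single static frame. The edge is deliberately placed off the pixel grid; an edge that happened to line up with the grid would already be shown exactly by a single frame.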

So, what’s the verdict? I really like watching movies on big screens, which is why I have a projector and a very big empty white wall in my living room. The idea of doing this with a small head-mounted display, insulated from outside light and in a comfy chair (or even lying down), is highly compelling. Unfortunately, we’re not quite there yet, mostly for ergonomic and resolution reasons. The Oculus Rift DK2 is already noticeably under-rezzed for DVD video, and I haven’t even tried 720p or 1080p sources yet. Fortunately, higher-resolution and hopefully more comfortable HMDs are soon to be released. But at this point in time, while it’s definitely already possible to watch and enjoy full movies in VR, I still prefer my projector. Not that it’s possible to take that on the road, of course…

Of course I know it was intentional. That’s actually an interesting point: for me, the cartoonishly-designed but photorealistically-rendered characters in a photorealistic environment were themselves instances of the Uncanny Valley, and stuck out like sore thumbs. It took me right out of the movie the first time we saw Poppa in full view. I couldn’t help but think “Who designed this character? A ten-year-old?”

Compare this to Inside Out, where the characters and backgrounds were in perfect sync.

You have put your finger on the evolutionary divergence that Pixar is faced with: it can make both the scenery and the characters photorealistic, but they have chosen to dumb down the characters in The Good Dinosaur, as opposed to keeping characters and backgrounds in sync as in Inside Out. Will send you a PM to explain this in more detail.

Would it be too slow to extrapolate depth from the stereo pair as the movie plays (no pre-processing)? And are the algorithms for that purpose available today suited for most of the types of scenes present in popular movies?

Unless something bad happens, for sure. The main 4.0 changes were infrastructure for Rift DK2 support, but I always hated how that worked due to the strange way the ovrd tracking driver was set up. It was flaky as hell. And there were no official 6-DOF input devices.

Now that I have really good and stable support for Vive via Valve’s official Linux OpenVR Lighthouse driver, it’s just a matter of polish. I’m currently working on perceived latency reduction, and I’m still lacking a good UI to set up screen protector boundaries.

I’d like to “pre-record” the subject’s frontal face and, using the landmark points from this library (http://blog.dlib.net/2014/08/real-time-face-pose-estimation.html), texture the pre-recorded model of the subject’s face onto the live 3D video stream. It may look weird, as their facial expression will remain static; however, it can’t be worse than the blackness caused by the Rift or Vive.

Alternative idea: pre-record the user’s face using the same 3D camera you use for full-body capture later on. Then, while having the headset on, you use the headset’s tracking data to align the pre-recorded face surface with where the face would be at that point in time. Simple to implement, and almost zero processing cost.
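The alignment step could be sketched roughly as follows. This is a minimal illustration under stated assumptions, not an existing implementation: it assumes the pre-recorded face points have already been expressed in the headset’s local coordinate frame (which would need a one-time calibration between the 3D camera and the tracking system), and that the tracker reports the head pose as a position plus a unit quaternion in (w, x, y, z) order. The function names are hypothetical.

```python
import math


def rotate(q, v):
    """Rotate vector v by the unit quaternion q = (w, x, y, z)."""
    w, x, y, z = q
    vx, vy, vz = v
    # t = 2 * cross(q.xyz, v)
    tx = 2.0 * (y * vz - z * vy)
    ty = 2.0 * (z * vx - x * vz)
    tz = 2.0 * (x * vy - y * vx)
    # v' = v + w * t + cross(q.xyz, t)
    return (
        vx + w * tx + (y * tz - z * ty),
        vy + w * ty + (z * tx - x * tz),
        vz + w * tz + (x * ty - y * tx),
    )


def place_face(face_points_local, head_orientation, head_position):
    """Map face points from the headset's local frame into world space."""
    px, py, pz = head_position
    placed = []
    for point in face_points_local:
        rx, ry, rz = rotate(head_orientation, point)
        placed.append((rx + px, ry + py, rz + pz))
    return placed


# Example: head turned 90 degrees about the z axis, 1.7 units above the origin.
half_angle = math.radians(90.0) / 2.0
head_orientation = (math.cos(half_angle), 0.0, 0.0, math.sin(half_angle))
head_position = (0.0, 0.0, 1.7)

face_points_local = [(1.0, 0.0, 0.0)]  # one made-up face point
placed = place_face(face_points_local, head_orientation, head_position)
print(placed)
```

Each frame only needs one rotation and one translation per face vertex, which is why the processing cost is so low.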

One concern: you say showing a static face can’t be worse than showing no face, but strangely, that’s not necessarily true. A face with a frozen expression that does not match our expectation of what the person’s face should look like at any given point in time might appear very uncanny. I haven’t tried this, so I can’t say for sure.