This Christmas, at Tool we wanted to create a small interactive experience to share with our friends and clients. Since I had recently been experimenting with compositing WebGL objects onto video [1, 2], I thought it was a cool technique we could use.

The idea was simple enough: we would shoot a Christmas tree in a nicely decorated room and composite in a gift box that the user can interact with while watching the video. Everything is rendered with WebGL: the video runs in the background and the interactive 3D content sits on top, with both layers matched in perspective and movement.

To achieve this effect I had to use quite a lot of different pieces of software. Here’s a breakdown of what it took to build it:

Cinema4D

First of all, I needed to match the perspective of the camera in the footage with that of the camera in the 3D scene. There is no exact science to this, so the best way is to take a frame from the video, use it as the background in C4D and match the camera manually.

It’s a trial-and-error technique, and getting the details right can be quite a challenge. Fortunately, I found a good book about matchmoving with some very useful tips… like the one about writing down which lens was used during filming (I forgot about that, of course… :)
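Knowing the lens takes much of the guesswork out of matching. Under a simple pinhole-camera model, the vertical field of view follows directly from the focal length and the sensor height, and that value can seed the 3D camera before any manual tweaking. Below is a minimal Python sketch. The function name is hypothetical, and the 24 mm default sensor height (a full-frame sensor) is an assumption, so substitute your camera's real value.

```python
import math


def vertical_fov_degrees(focal_length_mm: float, sensor_height_mm: float = 24.0) -> float:
    """Vertical field of view of a pinhole camera: fov = 2 * atan(h / (2 * f)).

    sensor_height_mm defaults to 24 mm (full-frame); this is an assumption,
    other camera bodies have smaller sensors and therefore narrower views.
    """
    if focal_length_mm <= 0 or sensor_height_mm <= 0:
        raise ValueError("focal length and sensor height must be positive")
    return math.degrees(2.0 * math.atan(sensor_height_mm / (2.0 * focal_length_mm)))
```

For example, `vertical_fov_degrees(12.0)` gives about 90 degrees. That is the kind of value a perspective projection expecting a vertical FOV takes as input. Residual differences in framing still have to be fixed by eye.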

Mocha

After matching comes tracking. At first I wanted to do a full camera solve, but it turned out to be quite complex and unnecessary in this case, so I went with 2D tracking instead. A very good tool for this is the Mocha AE plugin: it is easy to use, accurate and fast.

Using 2D tracking means that we only track movement on the XY plane and do not account for any rotation of the camera. For handheld shots with only slight camera movement, this is good enough. Of course, it would never work for tracking or panning shots: those require a full camera solve.
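One way to apply XY-only tracking data is to shift the rendered 3D layer in screen space. The shift is the tracked point's offset relative to the frame on which the scene was matched. The post does not say exactly how J3D applies the data, so treat this Python sketch as an assumption. The function name and the choice of reference frame are hypothetical. The sketch converts the pixel offset into normalized device coordinates (NDC), where y points up rather than down.

```python
from typing import Dict, Tuple

Point = Tuple[float, float]


def ndc_offset(track: Dict[int, Point], frame: int, reference_frame: int,
               width: int, height: int) -> Point:
    """Screen-space translation that keeps a 3D overlay glued to a 2D-tracked point.

    Only X/Y translation is modelled: no rotation, scale or parallax, which is
    why this only suits handheld shots with slight camera movement.
    """
    x0, y0 = track[reference_frame]
    x, y = track[frame]
    dx, dy = x - x0, y - y0  # pixel offset since the reference frame
    # Pixels span [0, width] x [0, height] with y pointing down;
    # NDC spans [-1, 1] on both axes with y pointing up.
    return (2.0 * dx / width, -2.0 * dy / height)
```

In a vertex shader, such an offset could be applied after projection as `gl_Position.xy += offset * gl_Position.w`. This moves every vertex by the same amount on screen, which matches the "no rotation" limitation described above.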

Once the tracking is done and tweaked, Mocha lets you export the tracking data in a text-based format. After that, all I needed was a simple Python script to turn the data into nicely formatted JSON.
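The post does not include that script, so here is a minimal sketch of what such a conversion could look like. It assumes the export is the tab-separated After Effects keyframe-data text that Mocha's AE export produces, with the layout shown in the docstring; check this against your own export. The function name and the output JSON shape are hypothetical.

```python
import json
from typing import Dict, List, Optional


def mocha_position_to_json(text: str) -> str:
    """Convert the "Position" block of an After Effects keyframe-data export
    into JSON of the form {"frames": {"<frame>": [x, y], ...}}.

    Assumed layout (tab-separated):
        Position
        <TAB>Frame<TAB>X pixels<TAB>Y pixels<TAB>Z pixels
        <TAB>0<TAB>512.3<TAB>288.1<TAB>0
    Other blocks (Anchor Point, Scale, Rotation, ...) are ignored.
    """
    frames: Dict[str, List[float]] = {}
    section: Optional[str] = None
    for raw in text.splitlines():
        if not raw.strip():
            continue
        if not raw.startswith(("\t", " ")):
            # Unindented lines are block titles ("Position", "Scale", ...).
            section = raw.strip()
            continue
        if section != "Position":
            continue
        cells = raw.split()
        # Skip the column-header row; keep rows that start with a frame number.
        if not cells or not cells[0].isdigit():
            continue
        frames[cells[0]] = [float(cells[1]), float(cells[2])]
    return json.dumps({"frames": frames}, indent=2)
```

The Z column is dropped on purpose, because a 2D track only carries X and Y.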

Unity3D

Once the 3D model of the gift was in place and all the camera angles were matched in C4D, I exported the whole scene to Unity3D. The main reason was to take advantage of the Unity/WebGL exporter to get it into WebGL.

I also used Unity’s animation editor to create the movement of the box when the cap flies off and the nutcracker pops out. To do that, I added animation support to J3D; it can be found in the v2 branch of J3D. One of the main changes I had to make to the engine was switching from Euler angles to quaternions.
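A common motivation for that switch is interpolation. Blending Euler angles component by component can take unexpected paths and suffers from gimbal lock. Spherical linear interpolation (slerp) between unit quaternions rotates along the shortest arc at constant speed. Below is a minimal Python sketch of slerp. The (x, y, z, w) component order matches Unity's `Quaternion`, but whether J3D interpolates keyframes in exactly this way is an assumption.

```python
import math
from typing import Sequence, Tuple

Quat = Tuple[float, float, float, float]  # (x, y, z, w)


def from_axis_angle(axis: Sequence[float], degrees: float) -> Quat:
    """Unit quaternion for a rotation of `degrees` around a unit-length axis."""
    half = math.radians(degrees) / 2.0
    s = math.sin(half)
    return (axis[0] * s, axis[1] * s, axis[2] * s, math.cos(half))


def slerp(q0: Quat, q1: Quat, t: float) -> Quat:
    """Spherical linear interpolation between two unit quaternions, t in [0, 1]."""
    dot = sum(a * b for a, b in zip(q0, q1))
    if dot < 0.0:
        # q and -q encode the same rotation; flip one to take the shorter arc.
        q1 = (-q1[0], -q1[1], -q1[2], -q1[3])
        dot = -dot
    if dot > 0.9995:
        # Nearly identical rotations: normalized linear interpolation is stable here.
        q = [a + t * (b - a) for a, b in zip(q0, q1)]
        n = math.sqrt(sum(c * c for c in q))
        return (q[0] / n, q[1] / n, q[2] / n, q[3] / n)
    theta = math.acos(dot)
    s0 = math.sin((1.0 - t) * theta) / math.sin(theta)
    s1 = math.sin(t * theta) / math.sin(theta)
    return (s0 * q0[0] + s1 * q1[0], s0 * q0[1] + s1 * q1[1],
            s0 * q0[2] + s1 * q1[2], s0 * q0[3] + s1 * q1[3])
```

For example, halfway between no rotation and a 90° turn around Y, `slerp` returns the 45° rotation around Y. Interpolating Euler angles is only guaranteed to behave like this for single-axis rotations.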

FFmpeg

In order to track the video correctly and overlay elements with precision, I needed to know which frame the video is showing at any moment during playback. The easiest way would be to take the current time and divide it by the duration of a single frame (e.g. 1/24 of a second). Unfortunately, it is not that simple! For an in-depth look at why it is so complicated, please read the excellent article by Zeh Fernando. Even though his article talks about Flash, the same thing applies to HTML5 video.
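For reference, the naive calculation is shown below as a Python sketch; the function name is hypothetical. The arithmetic itself is trivial. The trouble is the input: the playback time a video player reports does not map reliably to the frame actually on screen, and no amount of rounding fixes that.

```python
import math


def naive_frame_index(current_time: float, fps: float = 24.0) -> int:
    """Frame index estimated as current_time / (1 / fps).

    This is the approach that turns out to be unreliable: the reported
    playback time is not frame-accurate, so the result can be off by one
    or more frames, especially near frame boundaries.
    """
    frame_duration = 1.0 / fps
    # The small epsilon only absorbs floating-point error in the division;
    # it cannot correct a timestamp that is itself imprecise.
    return math.floor(current_time / frame_duration + 1e-6)
```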

Long story short, each video used in a tracked shot needs to have the frame number encoded into it. The best way to do this is to encode the frame number as a binary marker somewhere in the picture. To see how it looks, try playing one of the videos directly in your browser. See those white boxes at the bottom? That’s the marker!
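The encoding itself is ordinary binary. Each bit of the frame number becomes one box, white for 1 and black for 0. Below is a minimal Python sketch of the conversion in both directions. The 12-bit width (frames 0 to 4095, a little under three minutes at 24 fps) and the most-significant-bit-first order are assumptions, not details from the post.

```python
from typing import List


def frame_to_bits(frame: int, width: int = 12) -> List[int]:
    """Frame number as a fixed-width list of bits, most significant first.

    Each bit maps to one marker box: 1 = white, 0 = black.
    """
    if not 0 <= frame < 2 ** width:
        raise ValueError(f"frame {frame} does not fit in {width} bits")
    return [(frame >> (width - 1 - i)) & 1 for i in range(width)]


def bits_to_frame(bits: List[int]) -> int:
    """Inverse of frame_to_bits: fold the bits back into an integer."""
    value = 0
    for bit in bits:
        value = (value << 1) | bit
    return value
```

A fixed width keeps every box at the same position in every frame, so the reader only has to sample known pixel locations.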

Python/PIL

Adding the binary marker to the video can be painful if done manually. This is where Python comes in. Using a library called PIL (Python Imaging Library) I wrote a simple script that does the following:

decompose a video clip into a sequence of PNGs (using FFmpeg)

manipulate each image by adding the binary frame number at the bottom

encode the frames back into a video optimized for HTML5 (FFmpeg again)
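The three steps above could be sketched in Python as follows. The FFmpeg invocations use standard options (`-i`, `-framerate`, `-c:v libx264`, `-pix_fmt yuv420p`), and the drawing uses Pillow's `ImageDraw.rectangle`. The function names, the 12-bit marker, the 8-pixel box size and the choice of H.264/MP4 as output are all assumptions; this is not the author's actual script. Following the encoding tip in this post, the marker is drawn over the footage rather than appended below it.

```python
import subprocess
from pathlib import Path
from typing import List, Tuple

# (left, top, right, bottom) corners, in the form ImageDraw.rectangle accepts
Rect = Tuple[int, int, int, int]

BITS = 12  # assumed marker width: frames 0..4095
BOX = 8    # assumed box size in pixels


def marker_boxes(frame: int, width: int, height: int,
                 bits: int = BITS, box: int = BOX) -> List[Tuple[Rect, bool]]:
    """One square per bit along the bottom-left edge, most significant bit
    first, paired with True (white, bit = 1) or False (black, bit = 0)."""
    if bits * box > width or box > height:
        raise ValueError("marker does not fit in the frame")
    if not 0 <= frame < 2 ** bits:
        raise ValueError(f"frame {frame} does not fit in {bits} bits")
    top = height - box
    boxes = []
    for i in range(bits):
        on = bool((frame >> (bits - 1 - i)) & 1)
        left = i * box
        boxes.append(((left, top, left + box - 1, height - 1), on))
    return boxes


def stamp_frame(png: Path, frame: int) -> None:
    """Draw the marker over the bottom rows of one extracted PNG, in place."""
    # Pillow is imported here so the pure helpers work even without it installed.
    from PIL import Image, ImageDraw
    with Image.open(png) as src:
        img = src.convert("RGB")
    draw = ImageDraw.Draw(img)
    for rect, on in marker_boxes(frame, img.width, img.height):
        draw.rectangle(rect, fill=(255, 255, 255) if on else (0, 0, 0))
    img.save(png)


def extract_cmd(video: Path, frames_dir: Path) -> List[str]:
    # Produces frames_dir/00001.png, 00002.png, ... (numbering starts at 1).
    return ["ffmpeg", "-i", str(video), str(frames_dir / "%05d.png")]


def encode_cmd(frames_dir: Path, out: Path, fps: int = 24) -> List[str]:
    # H.264 in MP4 with yuv420p pixels is widely playable by HTML5 <video>.
    return ["ffmpeg", "-framerate", str(fps), "-i", str(frames_dir / "%05d.png"),
            "-c:v", "libx264", "-pix_fmt", "yuv420p", str(out)]


def add_frame_markers(video: Path, frames_dir: Path, out: Path, fps: int = 24) -> None:
    frames_dir.mkdir(parents=True, exist_ok=True)
    subprocess.run(extract_cmd(video, frames_dir), check=True)
    # Zero-padded names sort in playback order; marker numbers start at 0.
    for index, png in enumerate(sorted(frames_dir.glob("*.png"))):
        stamp_frame(png, index)
    subprocess.run(encode_cmd(frames_dir, out, fps), check=True)
```

The re-encoded file has no audio track. If the clip has sound, it has to be muxed back in separately. Large, pure black-and-white boxes are used because lossy compression blurs small details.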

On the JavaScript side, I use a simple technique: copy part of the video into a canvas and read the colors of its pixels to determine which frame we are at. And of course, I make sure to mask the marker so it never shows on screen.
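In the browser, the pixels come from drawing the video into a canvas with `drawImage` and reading them back with `getImageData`. To keep all examples in one language, the decoding logic is sketched here in Python. It operates on one row of RGB values taken from inside the marker. The 12-bit width, the 8-pixel boxes, sampling at box centres and the mid-grey threshold are all assumptions.

```python
from typing import Sequence, Tuple

RGB = Tuple[int, int, int]


def decode_frame(row: Sequence[RGB], bits: int = 12, box: int = 8,
                 threshold: float = 128.0) -> int:
    """Read the frame number back from one pixel row inside the marker.

    Samples the centre of each box (most significant bit first) rather than
    its edges, because video compression smears colours near box borders.
    """
    value = 0
    for i in range(bits):
        r, g, b = row[i * box + box // 2]
        # Rec. 601 luma; anything brighter than mid-grey counts as a 1-bit.
        luma = 0.299 * r + 0.587 * g + 0.114 * b
        value = (value << 1) | (1 if luma > threshold else 0)
    return value
```

Thresholding rather than testing for exact white or black is deliberate. After encoding, the "white" and "black" boxes come back as slightly off-colour values.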

Video encoding tip: when adding the frame number marker, keep in mind that video playback is optimal when both pixel dimensions of the video are multiples of 16, for example 1024 × 576. Influxis posted a list of all the optimal video dimensions (again, the same rules apply to Flash and HTML5).

Now, if your video already has optimal pixel dimensions, it would be a shame to add a few rows of pixels for the binary marker and end up with a video that is no longer optimized. It’s usually better to draw the marker over the video. You sacrifice a few pixels of the footage, but the better playback performance makes up for it.
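The arithmetic behind this tip is easy to check. A common explanation for the multiple-of-16 rule is that codecs such as H.264 work on 16×16-pixel macroblocks, so other sizes get padded internally. The Python sketch below (hypothetical helper names) compares appending an 8-pixel-high marker with drawing it over a 1024 × 576 frame.

```python
def is_mod16(width: int, height: int) -> bool:
    """True when both dimensions are whole multiples of 16 pixels."""
    return width % 16 == 0 and height % 16 == 0


def height_with_marker(height: int, marker_rows: int, overlay: bool) -> int:
    """Frame height after adding the marker: unchanged when it is drawn over
    the footage, taller when extra rows are appended below it."""
    return height if overlay else height + marker_rows


for overlay in (False, True):
    h = height_with_marker(576, 8, overlay)
    print(f"overlay={overlay}: 1024x{h}, multiple of 16: {is_mod16(1024, h)}")
# overlay=False: 1024x584, multiple of 16: False
# overlay=True: 1024x576, multiple of 16: True
```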

WebGL

With the Unity exporter, getting the scene to render in WebGL was simple. The one big thing left at this point was writing custom shaders that would make the gift box blend well with the video. In fact, it was the most challenging part of the project! The shader I ended up using on the box has diffuse and specular lighting with a specular map, reflections with a reflection map, and a normal map to make the gift wrap look as realistic as possible. Finally, I added a bit of personalization: if you type your name after the # in the URL, it will be rendered on the label on the box (example).
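The post lists the shader's ingredients but not its GLSL. To keep every example in Python, here is one common way to combine those ingredients per pixel:

- Lambert diffuse lighting.
- Blinn-Phong specular, scaled by the specular-map sample.
- An environment reflection, blended in by the reflection-map sample.
- A normal decoded from a tangent-space normal map.

The lighting model, the single white directional light, the shininess value and the blending formula are all assumptions; the actual shader may differ.

```python
import math
from typing import Tuple

Vec3 = Tuple[float, float, float]


def normalize(v: Vec3) -> Vec3:
    length = math.sqrt(sum(c * c for c in v))
    if length == 0.0:
        return (0.0, 0.0, 0.0)  # degenerate input, e.g. light exactly opposite the viewer
    return (v[0] / length, v[1] / length, v[2] / length)


def dot(a: Vec3, b: Vec3) -> float:
    return a[0] * b[0] + a[1] * b[1] + a[2] * b[2]


def normal_from_map(texel: Vec3, tangent: Vec3, bitangent: Vec3, normal: Vec3) -> Vec3:
    """Decode a tangent-space normal-map texel (channels in 0..1) to world space."""
    x, y, z = (2.0 * c - 1.0 for c in texel)
    return normalize((x * tangent[0] + y * bitangent[0] + z * normal[0],
                      x * tangent[1] + y * bitangent[1] + z * normal[1],
                      x * tangent[2] + y * bitangent[2] + z * normal[2]))


def shade(n: Vec3, light_dir: Vec3, view_dir: Vec3, albedo: Vec3,
          spec_map: float, env: Vec3, reflect_map: float,
          shininess: float = 32.0) -> Vec3:
    """One white directional light. Directions point away from the surface;
    spec_map and reflect_map are texture samples in 0..1."""
    n, l, v = normalize(n), normalize(light_dir), normalize(view_dir)
    diffuse = max(dot(n, l), 0.0)                            # Lambert term
    h = normalize((l[0] + v[0], l[1] + v[1], l[2] + v[2]))  # Blinn-Phong half vector
    specular = spec_map * max(dot(n, h), 0.0) ** shininess if diffuse > 0.0 else 0.0
    lit = tuple(a * diffuse + specular for a in albedo)
    # Mix in the environment reflection, then clamp to the displayable range.
    return tuple(min(1.0, (1.0 - reflect_map) * c + reflect_map * e)
                 for c, e in zip(lit, env))
```

The specular and reflection maps let the shiny ribbon and the matte paper share one material. Wherever a map sample is 0, that surface simply gets no highlight or no reflection.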

Sound

Last but not least: our friends at Plan8 created some great interactive sound FX that, as always, add a lot to the final effect.

It was fun to create this, and I think the technique could have some interesting uses. Of course, nowadays anything that requires WebGL is treated as experimental, but once the majority of browsers can render it… What do you think?

Alexander commented on January 28th, 2013

It looks nice. But isn’t this far too much work and time to do in HTML5 compared with Flash?

OK, I am not an expert, but here is my guess. Flash can work with video, Stage3D (3D) and the mouse, so setting up a 3D model over a video should not be such a big deal, since you can manipulate the camera and perspective of Stage3D. Tracking the mouse is also easy to do in Flash. I also guess you can get the current video frame directly from the Flash video API without encoding it into the video. Another big advantage is that Flash has the Monocle profiling tool, which lets you debug your 3D environment at runtime, test shaders and see the results directly.

Maybe I am wrong, but it seems to me that doing this in Flash would take at most half the time and effort. I don’t want to criticize your work, this looks great. I am just comparing the time and effort, and I am far from being an expert like you ;).

For WebGL, instead of Away3D, you can use Three.js with similar results. I used J3D, an engine I wrote from scratch, though not specifically for this project. Also, writing custom shaders for WebGL in GLSL (a high-level language) seems easier than using AGAL (assembly code) in Flash, but maybe it’s just me.

> Tracking mouse also is easy to do in flash.
It is just as easy to do in JavaScript.

> Also I guess you can get information about current video frame directly from flash video API
No, you can’t rely on that: please read Zeh’s article (linked above).

Thanks dude! Happy New Year to you too! How’s Stockholm, is it cold enough to ice skate on the lake? :)


everyday3d is a blog run by Bartek Drozdz about web development with a focus on realtime 3D graphics. It's been around since 2007. It first focused on Flash (remember?), then moved over to Unity3D, WebGL and, eventually, virtual reality and 360°.

I hope you enjoy the articles and demos posted here. Most of what I write about, I taught myself from good books. Here are a few I would recommend if you want to deepen your knowledge of creative coding, math and realtime graphics.