A Video‐based Motion Tracking System

On October 2nd, 2007, a modified version of the motion tracking system I originally developed for my Lightroom project will be used in an interactive sound installation by Werner Urban at the Kinderboekenbal 2007 at the Muziekgebouw aan ’t IJ in Amsterdam. For those interested, I will share a little of the technical background of this system here.

Hardware

The system consists of a FireWire camera (from UniBrain) with a wide‐angle lens (Marshall Electronics) and an omnidirectional infrared illuminator (homemade, for very dark lighting conditions). The camera is connected via an 11‐metre FireWire cable (also from UniBrain) to a computer running a software patch written in the Max/Jitter programming environment. I usually place the camera directly above the scene being tracked, so that the barrel distortion of the wide‐angle lens isn’t too annoying. The system can track and report up to 8 different positions simultaneously, thanks to the excellent external object for Jitter written by Randy Jones: 2up.jit.centroids. These coordinates can then be used to trigger all sorts of actions in Max. In the original project they drove a matrix of 64 lights and the spatialization of sound over a quadraphonic speaker setup.
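As a rough illustration of how a tracked coordinate might drive such a setup, the Python sketch below maps a normalized position to one cell of a light grid and to four speaker gains. The original patch does all of this inside Max; the 8 x 8 layout of the 64 lights, the function names and the simple bilinear panning law are assumptions made for this example only.

```python
def clamp01(v):
    """Keep a normalized coordinate inside the 0..1 range."""
    return min(max(v, 0.0), 1.0)


def light_index(x, y, cols=8, rows=8):
    """Map a normalized position (0..1) to a light number in a grid.

    The 8 x 8 layout is an assumption; the text only says 64 lights.
    """
    col = min(int(clamp01(x) * cols), cols - 1)
    row = min(int(clamp01(y) * rows), rows - 1)
    return row * cols + col


def quad_gains(x, y):
    """Bilinear panning over four speakers (hypothetical panning law).

    x runs from left (0) to right (1), y from front (0) to rear (1).
    """
    x, y = clamp01(x), clamp01(y)
    return {
        "front_left": (1 - x) * (1 - y),
        "front_right": x * (1 - y),
        "rear_left": (1 - x) * y,
        "rear_right": x * y,
    }
```

A position in the exact middle of the space gives each speaker the same gain, and a position in a corner sends everything to a single speaker.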

Software

The tracking part consists of three separate Max patches connected to each other: one for setting camera parameters and dimensions, one for video pre‐processing and one for the tracking itself.

Setting camera parameters and dimensions.
This patch gets the live video image into Max and sets the camera parameters, which comes down to some pretty basic tuning of the jit.grab object (dimensions, frame rate and camera‐specific settings such as compression). For good tracking (and a speedy patch) the image dimensions don’t have to be big at all. I normally grab a video stream of at most 160 x 120 pixels at a frame rate of 10 fps.
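To get a feeling for how much work the small image saves, here is a quick back‐of‐the‐envelope calculation in Python. The 640 x 480 at 30 fps figure is only a hypothetical comparison point, not a setting used in the patch.

```python
def pixels_per_second(width, height, fps):
    """Number of pixels the patch has to process every second."""
    return width * height * fps


small = pixels_per_second(160, 120, 10)  # the settings mentioned above
large = pixels_per_second(640, 480, 30)  # hypothetical full-size stream

print(small)           # 192000
print(large)           # 9216000
print(large // small)  # 48
```

Under these assumptions the small stream needs 48 times fewer pixel operations per second.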

Video pre‐processing.
Some pre‐processing is needed on the video image to get a good and stable image for tracking. What I did was record (in advance) a short movie of the empty space being tracked, load it into the patch and use it as a reference movie. This (looping) movie is subtracted from the live camera input, resulting in a video stream that shows only the pixels that differ from the reference movie (i.e. moving objects, people, etc.). Finally, I normally make the image monochrome so that only greyscale values are tracked (although you can track colors) and apply some jit.fastblur to get rid of most of the noise before sending it to the tracking part.
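The three steps described above (subtract the reference, go monochrome, blur) can be sketched in plain Python on tiny images stored as nested lists. This is not the Jitter patch itself. The absolute difference, the plain channel average and the 3 x 3 box blur are stand‐ins chosen for the example and are not necessarily what the Jitter objects do internally.

```python
def subtract_reference(live, reference):
    """Per-channel absolute difference; unchanged pixels become (0, 0, 0)."""
    return [[tuple(abs(a - b) for a, b in zip(p, q))
             for p, q in zip(live_row, ref_row)]
            for live_row, ref_row in zip(live, reference)]


def to_grey(frame):
    """Turn (r, g, b) pixels into one grey value by plain averaging."""
    return [[sum(px) // 3 for px in row] for row in frame]


def box_blur(img):
    """3 x 3 mean filter; border pixels average the neighbours that exist."""
    h, w = len(img), len(img[0])
    out = []
    for y in range(h):
        row = []
        for x in range(w):
            vals = [img[j][i]
                    for j in range(max(0, y - 1), min(h, y + 2))
                    for i in range(max(0, x - 1), min(w, x + 2))]
            row.append(sum(vals) // len(vals))
        out.append(row)
    return out


def preprocess(live_rgb, reference_rgb):
    """Subtract, convert to grey, blur: same order as in the text."""
    return box_blur(to_grey(subtract_reference(live_rgb, reference_rgb)))
```

Feeding the same frame in as both live input and reference gives an all‐black result, which is exactly what an empty room should look like to the tracker.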

Tracking.
This is where the magic of the 2up.jit.centroids object comes in. I borrowed a lot from the help patch included with the object, and it is all pretty much self‐explanatory. You pick a value you want to track and 2up.jit.centroids reports up to 8 coordinates if it senses this value anywhere in the image. From here on, you can use these coordinates for anything you can think of.
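For readers who want to see the general idea in code: the Python sketch below finds connected blobs of bright pixels in a greyscale image and returns the centre of each one, largest first, capped at 8. It is a generic flood‐fill approach written for illustration. I am not claiming this is how 2up.jit.centroids works internally, and the function name and parameters are made up.

```python
def find_centroids(img, threshold=128, max_blobs=8):
    """Return (x, y) centres of connected bright regions, largest first."""
    h, w = len(img), len(img[0])
    seen = [[False] * w for _ in range(h)]
    blobs = []
    for y in range(h):
        for x in range(w):
            if seen[y][x] or img[y][x] < threshold:
                continue
            # Flood fill from this pixel to collect one whole blob.
            stack = [(x, y)]
            seen[y][x] = True
            pixels = []
            while stack:
                cx, cy = stack.pop()
                pixels.append((cx, cy))
                for nx, ny in ((cx + 1, cy), (cx - 1, cy),
                               (cx, cy + 1), (cx, cy - 1)):
                    if (0 <= nx < w and 0 <= ny < h
                            and not seen[ny][nx]
                            and img[ny][nx] >= threshold):
                        seen[ny][nx] = True
                        stack.append((nx, ny))
            blobs.append(pixels)
    blobs.sort(key=len, reverse=True)
    return [(sum(px for px, _ in b) / len(b), sum(py for _, py in b) / len(b))
            for b in blobs[:max_blobs]]
```

Called on a pre‐processed frame, this gives a short list of coordinates that can be passed on to whatever needs to react to them.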

Things to watch out for

Light.
A lot depends on lighting conditions. A small change in brightness (for instance clouds moving in front of the sun) can render the system totally unusable. That’s why I made an infrared illuminator from a stack of blue and red Lee filters wrapped around a 150‐watt light bulb, to light the scene as evenly as possible even if the scene itself has to stay dark.

Steadycam.
The camera has to be fixed tightly in position, because even a small movement of the camera makes all pixels differ from the reference movie, resulting in a Jitter patch that freaks out.

Background (floor) and different materials (clothes, hair) affect tracking.
This is the most unreliable part of the system. Sometimes a person wears clothes of a color or material that blends too much with the color or material of the background, resulting in an image that doesn’t have enough contrast and making it hard for the tracking object to track a specific color. Every situation demands a little fine‐tuning; you can set a tolerance and a threshold for 2up.jit.centroids, to define a range of values to be tracked and a minimum value.
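To make the idea of a tolerance and a threshold a bit more concrete, here is a small Python sketch of one possible reading: a pixel counts as a match when it is at least as bright as the threshold and lies within the tolerance of the value being tracked. The exact behaviour of these two settings in 2up.jit.centroids is described in its help patch; the logic and names below are my own assumptions.

```python
def matches(value, target, tolerance, threshold):
    """True if a grey value is bright enough and close enough to the target."""
    return value >= threshold and abs(value - target) <= tolerance


def mask(img, target, tolerance, threshold):
    """Mark every matching pixel with 1 and everything else with 0."""
    return [[1 if matches(v, target, tolerance, threshold) else 0
             for v in row]
            for row in img]
```

A wider tolerance catches more of a person whose clothes vary in brightness, at the price of also picking up more of the background.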