Dynamic eye-tracking in freely moving observers

Using eye-tracking to measure gaze behavior can reveal how individuals process different types of stimuli. Eye-tracking experiments typically use a fixed recording device and require subjects to keep their heads and bodies still as stimuli are presented on a computer screen. Wearable eye-trackers, on the other hand, allow subjects free range of motion, but introduce the computational challenge of tracking a moving target across time.

Described below is a set of tools for dynamically mapping gaze onto static reference images and for decoding an observer's movements through space. In collaboration with the Pearson Lab and the Nasher Museum of Art, this project serves as a case study of how wearable devices can be used to measure naturalistic gaze behavior in a museum context.

Mapping gaze from moving to static frames

The eye-tracker includes a scene camera on the front of the glasses that records the approximate point of view of the observer. The gaze position can be superimposed onto this video to show where the observer was looking at each point in time.

The target stimulus - in this case, the mixed media piece Decompositioning by artist Jeff Sonhouse - shifts across the video frame as the observer assumes different vantage points. Each gaze location is recorded relative to the full video frame; quantifying gaze behavior relative to the artwork requires computing the transformation between the appearance of the artwork in the video frame and a static reference image. This transformation must be computed independently for each frame of the video.

The transformation provides a mapping between the coordinate systems of the video frame and the reference image. Since the gaze location is known in video-frame coordinates, the transformation can be used to project the gaze location on each frame of the video onto the static reference image. With gaze locations translated into a common coordinate system, traditional eye-tracking analysis techniques become available, such as heatmaps showing which parts of an image subjects fixate on most.
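Applying the transformation to a gaze point is a simple projective mapping: append a homogeneous coordinate, multiply by the 3x3 homography, and divide out the scale. A minimal sketch (the function name is illustrative):

```python
import numpy as np

def map_gaze_to_reference(gaze_xy, H):
    """Map a gaze point from video-frame coordinates into
    reference-image coordinates using a 3x3 frame->reference homography."""
    x, y = gaze_xy
    px, py, pw = H @ np.array([x, y, 1.0])  # homogeneous coordinates
    return px / pw, py / pw                 # divide out the projective scale
```

For example, with a homography that scales by 2 and shifts by (5, 7), a gaze point at (10, 10) maps to (25, 27).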

Finally, by inverting the video-to-reference transformation matrix on each frame, we can map new information back into the original video. For instance, the evolving heatmap created above can be overlaid on top of the target image in the source video.

Mapping position/orientation from video

Wearable devices allow study participants to freely navigate the 3D space around the target. For experimental purposes, it is important to record the subject's position in space and the path they take as they navigate. Decoding subject position is possible using information from the point-of-view video alone. To do this, you need two pieces of information: the physical dimensions of the target, and the camera's intrinsic properties, including its lens distortion.

As the subject - and thus the camera - moves through space, the perspective of the target in the video frame is warped to varying degrees. Importantly, the specifics of that warping are unique to each position in space. By knowing the size of the target and the distortion effects of the camera lens, it is possible to figure out, on each video frame, the unique position and orientation of the camera that would have produced the warping seen on that frame.

By combining camera position and orientation with the gaze location on the target image, it is possible to produce 3D reconstructions of a subject's path and viewing behavior across time.

Above, the blue path depicts the subject's position in space over time; the red block indicates the subject's current position and orientation; and the red circle marks the current gaze location, with the size of the circle representing the approximate extent of visual focus, based on 5 degrees of visual angle centered on the fovea.