In its seventh edition, it is our pleasure to bring ETRA 2012 to Santa
Barbara, CA. The series of ETRA symposia has become the leading
international conference in eye tracking technology and its applications,
bringing together people from a wide range of backgrounds. Authors have
been encouraged to submit papers on topics such as advances in eye tracking
hardware and software, eye movement data analysis, visual attention, and
eye movement control.

ETRA 2012's special theme has been Mobile Eye Tracking. Mobile devices
such as smartphones and tablet computers are becoming increasingly
powerful. Embedding the capability to track eye movements and support
gaze-based applications in these devices raises new challenges and
opportunities for many aspects of eye tracking research. These proceedings
contain several papers that address these problems.

As in previous years, ETRA 2012 has had two kinds of submissions: Long
Papers (8 pages), and Short Papers (4 pages). Authors were requested to
submit an abstract in advance and their papers in blind format. For
the second consecutive time, ETRA has received more submissions than
any of the previous symposia, with 53 long and 101 short papers being
submitted. These proceedings contain the 18 long and 22 short papers that
were accepted for oral presentations, and the 43 short papers accepted as
posters. These papers were selected after a rigorous and impartial
double-blind review process, where each original submission was reviewed by
at least three reviewers, followed by careful examination by one of our seven
Area Chairs. Each Area Chair wrote a meta-review for the papers within
their area of expertise, and the final selection was made by the Program
Chairs and Area Chairs, on the basis of the reviews and the meta-reviews.

Gaze visualization

A GPU implementation is given for real-time visualization of aggregate eye
movements (gaze) via heatmaps. Parallelization of the algorithm leads to
substantial speedup over its CPU-based implementation and, for the first time,
allows real-time rendering of heatmaps atop video. GLSL shader colorization
allows the choice of color ramps. Several luminance-based color maps are
advocated as alternatives to the popular rainbow color map, considered
inappropriate (harmful) for depiction of (relative) gaze distributions.
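
As a rough illustration of the aggregation being parallelized (a CPU sketch only, not the paper's GLSL implementation; the image size, kernel width and grey ramp are assumptions), each gaze point contributes a Gaussian kernel and the normalized intensity is mapped through a luminance-based ramp rather than a rainbow map:

    import numpy as np

    def gaze_heatmap(points, width, height, sigma=20.0):
        """Accumulate a Gaussian kernel at every gaze point (CPU sketch of the
        per-pixel work a GLSL fragment shader would perform in parallel)."""
        ys, xs = np.mgrid[0:height, 0:width]
        heat = np.zeros((height, width), dtype=np.float32)
        for (px, py) in points:
            heat += np.exp(-((xs - px) ** 2 + (ys - py) ** 2) / (2.0 * sigma ** 2))
        heat /= heat.max() if heat.max() > 0 else 1.0
        return heat

    def luminance_ramp(heat):
        """Map normalized intensity onto a perceptually monotonic grey ramp,
        one alternative to the rainbow color map."""
        grey = (255 * heat).astype(np.uint8)
        return np.dstack([grey, grey, grey])  # R = G = B

On the GPU, the per-pixel loop body would move into the fragment shader, which is where the reported speedup comes from.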

A method to construct an importance map of an image using the saliency map
model and eye movement analysis

Interpretability and recognizability of images have played important roles
in applications such as the analysis of surveillance images, medical image
diagnosis, and visual communication in education. In order to make an image as
interpretable and recognizable as possible, unimportant visual information is
removed or minimized, and regions that are of higher importance than others are
clearly identified. Several methods have been developed to identify the
important regions in an image. Most of these methods consist of two stages:
segmentation of the image and ordering the segments hierarchically according to
their relative importance. In the present paper, we propose a new method by
which an importance map of a source image can be constructed. First, the source
image is divided into segments based on a saliency map model that indicates
high-saliency regions. Second, the segments are ordered according to the
attention shift induced by the saliency map. Third, eye movement data is
acquired and mapped into the segments. A network for the eye movements is
generated by regarding the segments as nodes. The importance score can be
calculated by the PageRank algorithm. Finally, an importance map of the image
is constructed by combining the attention shift among the segments and the
scores determined from eye movements. The usefulness of the proposed method is
then investigated through several experiments.
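
The PageRank step can be sketched as follows (a minimal power-iteration version; the damping factor, the handling of dangling segments and the way gaze shifts are counted are assumptions, not the authors' exact formulation):

    import numpy as np

    def segment_pagerank(transitions, n_segments, damping=0.85, iters=100):
        """transitions: list of (from_segment, to_segment) gaze shifts.
        Returns an importance score per segment via power iteration."""
        counts = np.zeros((n_segments, n_segments))
        for src, dst in transitions:
            counts[src, dst] += 1
        # Row-normalize; segments without outgoing shifts link uniformly to all.
        row_sums = counts.sum(axis=1, keepdims=True)
        P = np.where(row_sums > 0, counts / np.maximum(row_sums, 1e-12),
                     1.0 / n_segments)
        rank = np.full(n_segments, 1.0 / n_segments)
        for _ in range(iters):
            rank = (1 - damping) / n_segments + damping * rank @ P
        return rank / rank.sum()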

Measuring and visualizing attention in space with 3D attention volumes

Knowledge about the point of regard is a major key for the analysis of
visual attention in areas such as psycholinguistics, psychology, neurobiology,
computer science and human factors. Eye tracking is thus an established
methodology in these areas, e.g., for investigating search processes, human
communication behavior, product design or human-computer interaction. As eye
tracking is a process which depends heavily on technology, the progress of gaze
use in these scientific areas is tied closely to the advancements of
eye-tracking technology. It is thus not surprising that in the last decades,
research was primarily based on 2D stimuli and rather static scenarios,
regarding both content and observer.
Only with the advancements in mobile and robust eye-tracking systems is the
observer freed to physically interact in a 3D target scenario. Measuring and
analyzing the point of regard in 3D space, however, requires additional
techniques for data acquisition and scientific visualization. We describe the
process for measuring the 3D point of regard and provide our own implementation
of this process, which extends recent approaches of combining eye tracking with
motion capturing, including holistic estimations of the 3D point of regard. In
addition, we present a refined version of 3D attention volumes for representing
and visualizing attention in 3D space.
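
A minimal sketch of one way such an attention volume could be accumulated, assuming 3D points of regard are already available in the motion-capture frame (the voxel size, grid dimensions and Gaussian spread are illustrative parameters, not those of the paper):

    import numpy as np

    def attention_volume(points_3d, origin=(0.0, 0.0, 0.0),
                         voxel_size=0.05, dims=(64, 64, 64), sigma=0.1):
        """Accumulate 3D points of regard (metres, motion-capture frame)
        into a voxel grid; each point contributes a Gaussian blob."""
        xs = origin[0] + voxel_size * np.arange(dims[0])
        ys = origin[1] + voxel_size * np.arange(dims[1])
        zs = origin[2] + voxel_size * np.arange(dims[2])
        gx, gy, gz = np.meshgrid(xs, ys, zs, indexing="ij")
        volume = np.zeros(dims, dtype=np.float32)
        for px, py, pz in points_3d:
            d2 = (gx - px) ** 2 + (gy - py) ** 2 + (gz - pz) ** 2
            volume += np.exp(-d2 / (2.0 * sigma ** 2))
        return volume / max(volume.max(), 1e-12)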

Automatic analysis of 3D gaze coordinates on scene objects using data from
eye-tracking and motion-capture systems

We implemented a system, called the VICON-EyeTracking Visualizer, that
combines mobile eye tracking data with motion capture data to calculate and
visualize the 3D gaze vector within the motion capture co-ordinate system. To
ensure that both devices were temporally synchronized we used software that we
had previously developed. By placing reflective markers on objects in the
scene, their positions are known, and spatially synchronizing the eye
tracker with the motion capture system allows us to automatically compute how
many times and where fixations occur, thus overcoming the time-consuming and
error-prone disadvantages of the traditional manual annotation process. We
evaluated our approach by comparing its outcome for a simple looking task and a
more complex grasping task against the average results produced by the manual
annotation process. Preliminary data reveals that the program only differed
from the average manual annotation results by approximately 3 percent in the
looking task with regard to the number of fixations and cumulative fixation
duration on each point in the scene. In the case of the more complex grasping task
the results depend on the object size: for larger objects there was good
agreement (a difference of less than 16 percent, or 950 ms), but this degraded for smaller
objects, where there are more saccades towards object boundaries. The
advantages of our approach are easy user calibration, the ability to have
unrestricted body movements (due to the mobile eye-tracking system), and that
it can be used with any wearable eye tracker and marker based motion tracking
system. Extending existing approaches, our system is also able to monitor
fixations on moving objects. The automatic analysis of gaze and movement data
in complex 3D scenes can be applied to a variety of research domains, e.g.,
Human Computer Interaction, Virtual Reality or grasping and gesture research.
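
A minimal sketch of the kind of assignment such a system performs, assuming the eye position, gaze direction and marker positions are all expressed in the motion-capture frame (the distance tolerance and the nearest-ray criterion are illustrative, not the authors' exact rule):

    import numpy as np

    def fixated_object(eye_pos, gaze_dir, objects, max_dist=0.05):
        """eye_pos, gaze_dir: 3D eye position and gaze direction (mocap frame).
        objects: dict name -> 3D marker position. Returns the object whose
        marker lies closest to the gaze ray, or None if none is within
        max_dist (metres)."""
        gaze_dir = np.asarray(gaze_dir) / np.linalg.norm(gaze_dir)
        best, best_d = None, max_dist
        for name, pos in objects.items():
            v = np.asarray(pos) - np.asarray(eye_pos)
            t = float(v @ gaze_dir)
            if t <= 0:                               # object is behind the observer
                continue
            d = np.linalg.norm(v - t * gaze_dir)     # perpendicular distance to the ray
            if d < best_d:
                best, best_d = name, d
        return best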

Eye tracking systems

Data quality is essential to the validity of research results and to the
quality of gaze interaction. We argue that the lack of standard measures for
eye data quality makes several aspects of manufacturing and using eye trackers,
as well as researching eye movements and vision, more difficult than necessary.
Uncertainty regarding the comparability of research results is a considerable
impediment to progress in the field. In this paper, we illustrate why data
quality matters and review previous work on how eye data quality has been
measured and reported. The goal is to achieve a common understanding of what
data quality is and how it can be defined, measured, evaluated, and reported.

This paper presents a probabilistic approach for the estimation of the angle
between the optical and visual axes (angle kappa) in infants. The approach
assumes that when patterned calibration targets are presented on a uniform
background, subjects are more likely to look at the calibration targets than at
the uniform background, but it does not require accurate and continuous
fixation on presented targets. Simulations results show that when subjects
attend to roughly half of the presented targets, angle kappa can be estimated
accurately with low probability (< 1%) of false detection. In experiments
with five babies who attended to the calibration target for only 47% of the
time (range from 26% to 70%), the average difference between repeated
measurements of angle kappa was 0.04 ± 0.31°.

Augmenting the robustness of cross-ratio gaze tracking methods to head
movement

Remote gaze estimation using a single non-calibrated camera, with simple or no
user calibration, and robustness to head movements are very
desirable features of eye tracking systems. Because the cross-ratio (CR) is an
invariant property of projective geometry, gaze estimation methods that rely on
this property have the potential to provide these features, though most current
implementations rely on a few simplifications that compromise the performance
of the method. In this paper, the CR method for gaze tracking is revisited, and
we introduce a new method that explicitly compensates for head movements using a
simple 3-parameter eye model. The method uses a single non-calibrated camera
and requires a simple calibration procedure per user to estimate the eye
parameters. We have conducted simulations and experiments with real users that
show significant improvements over current state-of-the-art CR methods that do
not explicitly compensate for head motion.
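
For background, the uncompensated baseline can be sketched as a projective mapping from the four corneal glints (reflections of IR lights placed at the screen corners) to the screen corners, through which the pupil centre is mapped. This homography formulation is closely related to, but not identical with, the classical cross-ratio computation, and it omits the head-movement compensation the paper introduces:

    import numpy as np
    import cv2

    def cr_point_of_gaze(glints_px, pupil_px, screen_w, screen_h):
        """glints_px: four corneal reflections (image px) of IR LEDs at the
        screen corners, ordered top-left, top-right, bottom-right, bottom-left.
        Returns the point of gaze on the screen in pixels (baseline mapping,
        no head-movement compensation)."""
        src = np.asarray(glints_px, dtype=np.float32)
        dst = np.asarray([[0, 0], [screen_w, 0],
                          [screen_w, screen_h], [0, screen_h]], dtype=np.float32)
        H, _ = cv2.findHomography(src, dst)
        p = np.array([pupil_px[0], pupil_px[1], 1.0])
        q = H @ p
        return q[0] / q[2], q[1] / q[2]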

Gaze informed user interfaces

Contents of Visual Short-Term Memory depend highly on viewer attention. It
is possible to influence where attention is allocated using a technique called
Subtle Gaze Direction (SGD). SGD combines eye tracking with subtle image-space
modulations to guide viewer gaze about a scene. Modulations are terminated
before the viewer can scrutinize them with high acuity foveal vision. This
approach is preferred to overt techniques that require permanent alterations to
images to highlight areas of interest. In our study, participants were asked to
recall the location of objects or regions in images. We investigated if using
SGD to guide attention to these regions would improve recall. Results showed
that the influence of SGD significantly improved accuracy of target count and
spatial location recall. This has implications for a wide range of applications
including spatial learning in virtual environments as well as image search
applications, virtual training and perceptually based rendering.

We use the Subtle Gaze Direction technique (SGD) to guide novices as they
try to find abnormalities in mammograms. SGD works by performing image-space
modulations on specific regions of the peripheral vision to attract attention.
Gaze is monitored and modulations are terminated before they are scrutinized
with high-acuity foveal vision. This approach is preferred to overt techniques
which permanently alter images to highlight areas of interest. SGD is used to
guide novices along the scanpath of an expert radiologist. We hypothesized that
this would increase the likelihood of novices correctly identifying
irregularities. Results reveal that novices who were guided in this manner
performed significantly better than the control group (no gaze manipulation).
Furthermore, a short-term post-training lingering effect was observed among
subjects guided using SGD. They continued to perform better than the control
group once the training was complete and gaze manipulation was disabled.

What do you want to do next: a novel approach for intent prediction in
gaze-based interaction

Interaction intent prediction and the Midas touch have been a longstanding
challenge for eye-tracking researchers and users of gaze-based interaction.
Inspired by machine learning approaches in biometric person authentication, we
developed and tested an offline framework for task-independent prediction of
interaction intents. We describe the principles of the method, the features
extracted, normalization methods, and evaluation metrics. We systematically
evaluated the proposed approach on an example dataset of gaze-augmented
problem-solving sessions. We present results of three normalization methods,
different feature sets and fusion of multiple feature types. Our results show
that an accuracy of up to 76% can be achieved, with an Area Under Curve of around 80%. We
discuss the possibility of applying the results for an online system capable of
interaction intent prediction.

Wearable eye trackers open up a large number of opportunities to cater for
the information needs of users in today's dynamic society. Users no longer have
to sit in front of a traditional desk-mounted eye tracker to benefit from the
direct feedback given by the eye tracker about users' interest. Instead, eye
tracking can be used as a ubiquitous interface in a real-world environment to
provide users with supporting information that they need. This paper presents a
novel application of intelligent interaction with the environment by combining
eye tracking technology with real-time object recognition. In this context we
present i) algorithms for guiding object recognition by using fixation points,
ii) algorithms for generating evidence of users' gaze on particular objects, and
iii) a next-generation museum guide called Museum Guide 2.0 as a
prototype application of gaze-based information provision in a real-world
environment. We performed several experiments to evaluate our gaze-based object
recognition methods. Furthermore, we conducted a user study in the context of
Museum Guide 2.0 to evaluate the usability of the new gaze-based interface for
information provision. The results show that enormous potential
exists for using a wearable eye tracker as a human-environment interface.

Visual attention: studies, tools, methods

Audio description as an aural guide of children's visual attention: evidence
from an eye-tracking study

Audio description (AD) has become a cultural revolution for the visually
impaired; however, the range of AD beneficiaries can be much broader. We claim
that AD is useful for guiding children's attention. The paper presents an
eye-tracking study testing the usefulness of AD in selective attention to
described elements of a video scene. Forty-four children watched 2 clips from
an educational animation series while their eye movements were recorded.
Average fixation duration, fixation count, and saccade amplitude served as
primary dependent variables. The results confirmed that AD guides children's
attention towards described objects, resulting, e.g., in more fixations on
specific regions of interest. We also evaluated eye movement patterns in terms
of switching between focal and ambient processing. We postulate that audio
description could complement regular teaching tools for guiding and focusing
children's attention, especially when new concepts are introduced.

Let's look at the cockpit: exploring mobile eye-tracking for observational
research on the flight deck

As part of our research on multimodal analysis and visualization of activity
dynamics, we are exploring the integration of data produced by a variety of
sensor technologies within ChronoViz, a tool aimed at supporting the
simultaneous visualization of multiple streams of time series data. This paper
reports on the integration of a mobile eye-tracking system with data streams
collected from HD video cameras, microphones, digital pens, and simulation
environments. We focus on the challenging environment of the commercial airline
flight deck, analyzing the use of mobile eye tracking systems in aviation human
factors and reporting on techniques and methods that can be applied in this and
other domains in order to successfully collect, analyze and visualize
eye-tracking data in combination with the array of data types supported by
ChronoViz.

We present a method to analyze a relationship between eye movements and
saliency dynamics in videos for estimating attentive states of users while they
watch the videos. The multi-mode saliency-dynamics model (MMSDM) is introduced
to segment spatio-temporal patterns of the saliency dynamics into multiple
sequences of primitive modes underlying the saliency patterns. The MMSDM
enables us to describe the relationship by the local saliency dynamics around
gaze points, which is modeled by a set of distances between gaze points and
salient regions characterized by the extracted modes. Experimental results show
the effectiveness of the proposed model to classify the attentive states of
users by learning the statistical difference of the local saliency dynamics on
gaze-paths at each level of attentiveness.

Distinguishing whether eye tracking data reflects reading or skimming
already proved to be of high analytical value. But with a potentially more
widespread usage of eye tracking systems at home, in the office or on the road
the amount of environmental and experimental control tends to decrease. This in
turn leads to an increase in eye tracking noise and inaccuracies which are
difficult to address with current reading detection algorithms. In this paper
we propose a method for constructing and training a classifier that is able to
robustly distinguish reading from skimming patterns. It operates in real time,
considering a window of saccades and computing features such as the average
forward speed and angularity. The algorithm inherently deals with distorted eye
tracking data and provides a robust, linear classification into the two classes
read and skimmed. It achieves reaction times of 750 ms on average, is
adjustable in its horizontal sensitivity and provides confidence values for its
classification results; it is also straightforward to implement. Trained on a
set of six users and evaluated on an independent test set of six different
users, it achieved an 86% classification accuracy and outperformed two other
methods.
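
A minimal sketch of the general idea, window features over a saccade sequence feeding a linear decision (the two features loosely follow the ones named above, but their exact definitions and the weights are placeholders, not the trained classifier from the paper):

    import numpy as np

    def window_features(fixations):
        """fixations: list of (x, y, timestamp) in reading order.
        Returns [average forward speed, mean absolute saccade angle]."""
        pts = np.asarray(fixations, dtype=float)
        dx = np.diff(pts[:, 0])
        dy = np.diff(pts[:, 1])
        dt = np.maximum(np.diff(pts[:, 2]), 1e-6)
        forward_speed = np.mean(dx / dt)                       # rightward progress per second
        angularity = np.mean(np.abs(np.degrees(np.arctan2(dy, dx))))
        return np.array([forward_speed, angularity])

    def is_reading(fixations, w=(1.0, -0.02), bias=-0.5):
        """Linear decision on the two features; the weights are placeholders
        that would normally be learned from labelled training data."""
        f = window_features(fixations)
        return float(np.dot(w, f) + bias) > 0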

Gaze based interaction

Since eye gaze may serve as an efficient and natural input for steering in
virtual 3D scenes, we investigate the design of eye gaze steering user
interfaces (UIs) in this paper. We discuss design considerations and propose
design alternatives based on two selected steering approaches differing in
input condition (discrete vs. continuous) and velocity selection (constant vs.
gradient-based). The proposed UIs have been iteratively advanced based on two
user studies with twelve participants each. In particular, the combination of
continuous and gradient-based input shows a high potential, because it allows
for gradually changing the moving speed and direction depending on a user's
point-of-regard. This has the advantage of reducing overshooting problems and
dwell-time activations. We also investigate discrete constant input for which
virtual buttons are toggled using gaze dwelling. As an alternative, we propose
the Sticky Gaze Pointer as a more flexible way of discrete input.

A novel method for video-based head gesture recognition using eye
information from an eye tracker is proposed. The method uses a combination
of gaze and eye movement to infer head gestures. Compared to other
gesture-based methods a major advantage of the method is that the user keeps
the gaze on the interaction object while interacting. This method has been
implemented on a head-mounted eye tracker for detecting a set of predefined
head gestures. The accuracy of the gesture classifier is evaluated and verified
for gaze-based interaction in applications intended for both large public
displays and small mobile phone screens. The user study shows that the method
detects a set of defined gestures reliably.

Simple gaze gestures and the closure of the eyes as an interaction technique

We created a set of gaze gestures that utilize the following three elements:
simple one-segment gestures, off-screen space, and the closure of the eyes.
These gestures are to be used as the moving tool in a gaze-only controlled
drawing application. We tested our gaze gestures with 24 participants and
analyzed the gesture durations, the accuracy of the stops, and the gesture
performance. We found that the difference in gesture durations between short
and long gestures was so small that there is no need to choose between them.
The stops made by closing both eyes were accurate, and the input method worked
well for this purpose. With some adjustments and with the possibility for
personal settings, the gesture performance and the accuracy of the stops can
become even better.

Eye tracking systems issues I

Self-localization is the process of knowing one's position relative to
one's surroundings. This research integrated artificial intelligence
techniques into a custom-built portable eye tracker for the purpose of
automating the process of determining indoor self-localization. Participants
wore the eye tracker and walked a series of corridors while a video of the
scene was recorded along with fixation locations. Patches of the scene video
without fixation information were used to train the classifier by creating
feature maps of the corridors. For testing the classifier, fixation locations
in the scene were extracted and used to determine the location of the
participant. Scene patches surrounding fixations were used for the
classification instead of objects in the environment. This eliminated the need
for complex computer vision object recognition algorithms and made scene
classification less dependent upon objects and their placement in the
environment. This allowed for a sparse representation of the scene since image
processing to detect and recognize objects was not necessary to determine
location. Experimentally, image patches surrounding fixations were found to be
a highly reliable indicator of location, as compared to random image patches,
non-fixated salient image patches, or other non-salient scene locations. In
some cases, only a single fixation was needed to accurately identify the
correct location of the participant. To the best of our knowledge, this
technique has not been used before for determining human self-localization in
either indoor or outdoor settings.

As pertinent technologies continue to evolve, eye tracking hardware options
grow more diverse. Consequently, it is important that researchers verify that
new systems and parameters used in testing meet data collection quality
standards. The current study evaluated hardware from four manufacturers: SR
Research, Seeing Machines, SensoMotoric Instruments and Tobii Technology. The
eye trackers included different system types and different sampling rates. The
purpose of this research was to determine whether or not the pupil recording of
each system was precise enough to effectively utilize the Index of Cognitive
Activity, a validated cognitive workload metric. Results indicated that each
system effectively captured Index of Cognitive Activity data. System factors
such as system type and sampling rate did not affect the metric. To maintain the
integrity of data collected by succeeding generations of eye tracker, it is
important that this type of quality-control research continues.

Eye tracking analysis is the state-of-the-art technique for studying questions
of usability and cognition of graphical user interfaces. This paper presents a
new technique for the visualization of eye tracking data, the Parallel
Scan-Path Visualization. A key feature is the visualization of eye movements of
many subjects on a single screen in a parallel layout. The visualization
presents various properties of scan-paths, such as fixations, gaze durations
and eye shift frequencies at one glance. The paper concludes with an example of
use of the Parallel Scan-Path Visualization technique.

Permutation test for groups of scanpaths using normalized Levenshtein
distances and application in NMR questions

This paper presents a permutation test that statistically compares two
groups of scanpaths. The test uses normalized Levenshtein distances when the
lengths of scanpaths are not the same. This method was applied in a recent
eye-tracking experiment in which two groups of chemistry students viewed
nuclear magnetic resonance (NMR) spectroscopic signals and chose the
corresponding molecular structure from the candidates. A significant difference
was detected between the two groups, which is consistent with the fact that
students in the expert group showed more efficient scan patterns in the
experiment than the novice group. Various numbers of permutations were tested
and the results showed that p-values only varied in a small range with
different permutation numbers and that the statistical significance was not
affected.
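
The test itself is straightforward to sketch: compute normalized Levenshtein distances between scanpaths encoded as AOI label sequences, take the mean between-group distance as the statistic, and permute group labels (the choice of statistic and the number of permutations here are illustrative):

    import random
    import numpy as np

    def levenshtein(a, b):
        """Edit distance between two scanpaths (sequences of AOI labels)."""
        m, n = len(a), len(b)
        d = np.zeros((m + 1, n + 1), dtype=int)
        d[:, 0] = np.arange(m + 1)
        d[0, :] = np.arange(n + 1)
        for i in range(1, m + 1):
            for j in range(1, n + 1):
                cost = 0 if a[i - 1] == b[j - 1] else 1
                d[i, j] = min(d[i - 1, j] + 1, d[i, j - 1] + 1, d[i - 1, j - 1] + cost)
        return d[m, n]

    def norm_lev(a, b):
        """Normalize by the longer length so unequal scanpaths are comparable."""
        return levenshtein(a, b) / max(len(a), len(b), 1)

    def permutation_test(group_a, group_b, n_perm=1000, seed=0):
        """p-value for the mean between-group distance under label permutation."""
        rng = random.Random(seed)
        def between(ga, gb):
            return np.mean([norm_lev(x, y) for x in ga for y in gb])
        observed = between(group_a, group_b)
        pooled = list(group_a) + list(group_b)
        count = 0
        for _ in range(n_perm):
            rng.shuffle(pooled)
            ga, gb = pooled[:len(group_a)], pooled[len(group_a):]
            if between(ga, gb) >= observed:
                count += 1
        return observed, (count + 1) / (n_perm + 1)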

Robust, accurate, real-time pupil tracking is a key component for online
gaze estimation. On head-mounted eye trackers, existing algorithms that rely on
circular pupils or contiguous pupil regions fail to detect or accurately track
the pupil. This is because the pupil ellipse is often highly eccentric and
partially occluded by eyelashes. We present a novel, real-time dark-pupil
tracking algorithm that is robust under such conditions. Our approach uses a
Haar-like feature detector to roughly estimate the pupil location, performs a
k-means segmentation on the surrounding region to refine the pupil centre, and
fits an ellipse to the pupil using a novel image-aware Random Sample Consensus
(RANSAC) ellipse fitting. We compare our approach against existing real-time
pupil tracking implementations, using a set of manually labelled infra-red
dark-pupil eye images. We show that our technique has a higher pupil detection
rate and greater pupil tracking accuracy.
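
A simplified sketch of the final fitting stage, assuming candidate pupil-edge points have already been extracted (the coarse Haar-like detection and k-means refinement are omitted, and the inlier test on a normalized algebraic residual is an assumption rather than the paper's image-aware criterion):

    import numpy as np
    import cv2

    def ransac_ellipse(points, n_iter=200, tol=0.15, seed=0):
        """Fit an ellipse to candidate pupil-edge points with a basic RANSAC loop.
        points: (N, 2) array of pixel coordinates, N >= 5."""
        pts = np.asarray(points, dtype=np.float32)
        if len(pts) < 5:
            return None
        rng = np.random.default_rng(seed)
        best, best_inliers = None, 0
        for _ in range(n_iter):
            sample = pts[rng.choice(len(pts), 5, replace=False)]
            try:
                (cx, cy), (w, h), angle = cv2.fitEllipse(sample)
            except cv2.error:
                continue                              # degenerate sample
            if w < 1 or h < 1:
                continue
            a, b, t = w / 2.0, h / 2.0, np.deg2rad(angle)
            dx, dy = pts[:, 0] - cx, pts[:, 1] - cy
            xp = dx * np.cos(t) + dy * np.sin(t)      # rotate into the ellipse frame
            yp = -dx * np.sin(t) + dy * np.cos(t)
            resid = np.abs((xp / a) ** 2 + (yp / b) ** 2 - 1.0)
            inliers = pts[resid < tol]
            if len(inliers) > best_inliers and len(inliers) >= 5:
                best_inliers = len(inliers)
                best = cv2.fitEllipse(inliers)        # refit on all inliers
        return best    # ((cx, cy), (major, minor), angle) or None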

Smooth pursuit eye movements hold information about the health, activity and
situation of people, but to date there has been no efficient method for their
automated detection. In this work we present a method to tackle the problem,
based on machine learning. At the core of our method is a novel set of shape
features that capture the characteristic shape of smooth pursuit movements over
time. The features individually represent incomplete information about smooth
pursuits but are combined in a machine learning approach. In an evaluation with
eye movements collected from 18 participants, we show that our method can
detect smooth pursuit movements with an accuracy of up to 92%, depending on the
size of the feature set used for their prediction. Our results have twofold
significance. First, they demonstrate a method for smooth pursuit detection in
mainstream eye tracking, and secondly they highlight the utility of machine
learning for eye movement analysis.
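
The specific shape features are the paper's contribution and are not reproduced here; the following sketch only illustrates the general pattern of computing window-level trajectory descriptors and handing them to an off-the-shelf classifier (the feature choices and the classifier are assumptions):

    import numpy as np
    from sklearn.ensemble import RandomForestClassifier

    def shape_features(window):
        """window: (N, 2) gaze positions at a fixed sampling rate.
        Illustrative descriptors only -- not the paper's feature set."""
        steps = np.diff(window, axis=0)
        step_len = np.linalg.norm(steps, axis=1)
        path_len = step_len.sum()
        displacement = np.linalg.norm(window[-1] - window[0])
        straightness = displacement / max(path_len, 1e-9)  # ~1 for pursuit, low for fixation
        return np.array([path_len, displacement, straightness,
                         step_len.mean(), step_len.std()])

    # Training on labelled windows (1 = smooth pursuit, 0 = other movement):
    # X = np.stack([shape_features(w) for w in windows]); clf.fit(X, labels)
    clf = RandomForestClassifier(n_estimators=100, random_state=0)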

Eye tracking applications I

Automatic segmentation of a video stream poses a serious challenge to
multimedia research. Here we explore the idea that temporal segmentation might
include the observers' watching style. We propose a way to parse a visual
stimulus into temporally-defined units by exploiting the difference of
exploratory eye movements between novice and expert observers. The difference
was condensed into a single quantity, the quasi-instantaneous spatial extension
of the regions fixated significantly longer by either group of observers, which
we termed Visual Differential Attractor (VDA). As test-bed, we presented a
videotaped billiard match to novice and professional players, and recorded
their eye movements. We assessed whether VDA, in tracing over time the
oculomotor difference between experts and novices, would mark the individual
shots embedded in the movie. Indeed, VDA showed systematic modulations over
time, with peaks and troughs occurring before and after the shots, respectively.
The effect disappeared when the scanpaths of novices and experts were analyzed
separately. This finding suggests that it is possible to parse a visual stimulus
into behaviorally relevant temporal units by comparing the gaze of expert and
naïve observers.

This paper presents a study that investigated the potential effect of an
additional sign on people's simulated wayfinding behavior in a transfer
situation at an airport. Participants were presented with photographs of the
status quo and digitally edited images of the potential redesign. Path choice
behavior, gaze behavior and confidence ratings were analyzed. The combination
of the three methods proved to capture the situation better than any of the
methods alone. The results provide evidence that the re-design has a positive
effect on passengers' wayfinding behavior.

Mobile eye tracking has become a useful tool in studies of vision and
attention in real-world tasks. However, there remains a disconnection between
such studies and the laboratory paradigms used by cognitive psychology. In
particular, visual search has been studied intensively, but lab search often
differs from search in the real world in many respects (e.g., in reality one
must walk and move head and eyes to find the target, target and distractors are
not equally visible, and objects are frequently occluded). Here, we took a
broader view of search behaviour and analyzed the gaze of participants who were
asked to walk around within a building, find a room, and then locate a target
mailbox. Our aim was to describe the differences in behaviour according to
principles of (lab-based) visual search, and we did this by testing the effects
of top-down instructions (i.e. having more or less information about where to
go) and target saliency (i.e. having a more or less distinctive target to look
for). These factors made a difference in a real world context by changing the
frequency with which signs and cues in the environment were fixated, and by
affecting head and eye movements in the mail-room. Bottom-up saliency had
little effect on search time, but our approach revealed how it influenced the
coordination of gaze, while still allowing us to make contact with laboratory
paradigms.

Using ScanMatch scores to understand differences in eye movements between
correct and incorrect solvers on physics problems

Using a ScanMatch algorithm we investigate scan path differences between
subjects who answer physics problems correctly and incorrectly. This algorithm
bins a saccade sequence spatially and temporally, recodes this information to
create a sequence of letters representing fixation location, duration and
order, and compares two sequences to generate a similarity score. We recorded
eye movements of 24 individuals on six physics problems containing diagrams
with areas consistent with a novice-like response and areas of high perceptual
salience. We calculated average ScanMatch similarity scores comparing correct
solvers to one another (C-C), incorrect solvers to one another (I-I), and
correct solvers to incorrect solvers (C-I). We found statistically significant
differences between the C-C and I-I comparisons on only one of the problems.
This seems to imply that top down processes relying on incorrect domain
knowledge, rather than bottom up processes driven by perceptual salience,
determine the eye movements of incorrect solvers.
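
A much-simplified sketch of a ScanMatch-style score, with spatial and temporal binning followed by a Needleman-Wunsch alignment (the grid size, duration bin, and the flat match/mismatch/gap scores are placeholders; the real algorithm uses a substitution matrix):

    import numpy as np

    def encode_scanpath(fixations, n_cols=8, n_rows=6, width=1024, height=768, bin_ms=50):
        """Spatially bin each fixation into a grid cell and repeat the cell's
        code once per bin_ms of fixation duration (temporal binning)."""
        codes = []
        for x, y, dur_ms in fixations:
            col = min(int(x / width * n_cols), n_cols - 1)
            row = min(int(y / height * n_rows), n_rows - 1)
            codes.extend([row * n_cols + col] * max(1, int(round(dur_ms / bin_ms))))
        return codes

    def similarity(seq_a, seq_b, match=1.0, mismatch=0.0, gap=0.0):
        """Needleman-Wunsch alignment score, normalized by the longer sequence."""
        m, n = len(seq_a), len(seq_b)
        score = np.zeros((m + 1, n + 1))
        score[:, 0] = gap * np.arange(m + 1)
        score[0, :] = gap * np.arange(n + 1)
        for i in range(1, m + 1):
            for j in range(1, n + 1):
                s = match if seq_a[i - 1] == seq_b[j - 1] else mismatch
                score[i, j] = max(score[i - 1, j - 1] + s,
                                  score[i - 1, j] + gap,
                                  score[i, j - 1] + gap)
        return score[m, n] / (match * max(m, n, 1))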

Integrated Development Environments (IDE) generate multiple graphical and
textual representations of programs. Co-ordination of these representations
during program comprehension and debugging can be a complex task. In order to
better understand the role and effectiveness of multiple representations, we
conducted an empirical study of Java program debugging with a professional,
multi-representation IDE. We found that program code and dynamic
representations (dynamic viewer, variable watch and output) attracted the most
attention of programmers. Static representations like Unified Modeling Language
(UML) Diagrams and Control Structure Diagrams (CSD) saw significantly less
usage. We analyzed gaze patterns by segmenting the debugging sessions into
three, five and fifteen minute intervals, and classifying gazes into short and
long gazes. Novel data mining techniques were used to detect high frequency
patterns from eye tracking data. Visual pattern differences were found among
participants based on their programming experience, familiarity with the IDE
and debugging performance.

An increasing amount of text is being read digitally. In this paper we
explore how eye tracking devices can be used to aggregate reading data of many
readers in order to provide authors and editors with objective and implicitly
gathered quality feedback. We present a robust way to jointly evaluate the gaze
data of multiple readers, with respect to various reading-related features. We
conducted an experiment in which a group of high school students composed
essays subsequently read and rated by a group of seven other students.
Analyzing the recorded data, we find that the number of regression targets, the
reading-to-skimming ratio, reading speed and reading count are the most
discriminative features to distinguish very comprehensible from barely
comprehensible text passages. By employing machine learning techniques, we are
able to classify the comprehensibility of text automatically with an overall
accuracy of 62%.

Eye tracking systems issues II

The development of systems that track the eye while allowing head movement
is one of the most challenging objectives of gaze tracking researchers. Tracker
accuracy decreases as the subject moves from the calibration position and is
especially influenced by changes in depth with respect to the screen. In this
paper, we demonstrate that the pattern of error produced due to user movement
mainly depends on the system configuration and hardware element placement
rather than the user. Thus, we suggest alternative calibration techniques for
error reduction that compensate for the lack of accuracy due to subject
movement. Using these techniques, we can achieve an error reduction of more
than 50%.

Shifts in reported gaze position due to changes in pupil size: ground truth
and compensation

Camera-based eye trackers are the mainstay of today's eye movement research
and countless practical applications of eye tracking. Recently, a significant
impact of changes in pupil size on the accuracy of camera-based eye trackers
during fixation has been reported [Wyatt 2010]. We compared the pupil-size
effect between a scleral search coil based eye tracker (DNI) and an up-to-date
infrared camera-based eye tracker (SR Research Eyelink 1000) by simultaneously
recording human eye movements with both techniques. Between pupil-constricted
and pupil-relaxed conditions we find a subject-specific shift in reported gaze
position exceeding 2 degrees only with the camera-based eye tracker, while the
scleral search coil system simultaneously reported steady fixation. This
confirms that the actual point of fixation did not change during pupil
constriction/relaxation, and the resulting shift in measured gaze position is
solely an artifact of the camera-based eye tracking system. We demonstrate a
method to partially compensate the pupil-based shift using separate
calibrations in pupil-constricted and pupil-dilated conditions, with pupil size
as an index to dynamically weight the two calibrations.
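
The compensation itself reduces to a pupil-size-indexed blend of two calibration mappings. A minimal sketch, assuming the two calibrations are available as callable mappings and that a linear blend is adequate:

    import numpy as np

    def weighted_gaze(raw_point, pupil_size,
                      calib_constricted, calib_dilated,
                      pupil_constricted, pupil_dilated):
        """calib_*: functions mapping a raw eye feature to screen coordinates,
        obtained from separate calibrations in pupil-constricted and
        pupil-dilated conditions. The current pupil size indexes a linear
        blend of the two mappings."""
        w = (pupil_size - pupil_constricted) / (pupil_dilated - pupil_constricted)
        w = float(np.clip(w, 0.0, 1.0))
        g_c = np.asarray(calib_constricted(raw_point))
        g_d = np.asarray(calib_dilated(raw_point))
        return (1.0 - w) * g_c + w * g_d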

Automatic acquisition of a 3D eye model for a wearable first-person vision
device

A wearable gaze tracking device can work with users in daily life. For
long-term use, a non-active method that does not employ an infrared illumination
system is desirable from a safety standpoint. It is well known that eye model
constraints substantially improve the accuracy and robustness of gaze
estimation. However, the eye model needs to be calibrated for each person and
each device. We propose a method to automatically build the eye model for a
wearable gaze tracking device. The key idea is that the eye model, which
includes the eye structure and the eye-camera relationship, imposes constraints on
image analysis even when it is incomplete, so we adopt an iterative eye model
building process with gradually increasing eye model constraints. Performance
of the proposed method is evaluated in various situations, including different
eye colors of users and camera configurations. We have confirmed that the gaze
tracking system using our eye model works well under general situations:
indoor, outdoor and driving scene.

Evaluation of pupil center-eye corner vector for gaze estimation using a web
cam

Low cost eye tracking is currently a challenging research topic for the eye
tracking community. Gaze tracking based on a web cam and without infrared light
is a sought-after goal for broadening the applications of eye tracking systems. Web cam
based eye tracking results in new challenges to solve, such as a wider field of
view and a lower image quality. In addition, no infrared light implies that
glints cannot be used anymore as a tracking feature. In this paper, a thorough
study has been carried out to evaluate pupil (iris) center-eye corner (PC-EC)
vector as a feature for gaze estimation based on interpolation methods in low
cost eye tracking, as it is considered to be partially equivalent to the pupil
center-corneal reflection (PC-CR) vector. The analysis is carried out both
based on simulated and real data. The experiments show that eye corner
positions in the image move slightly when the user is looking at different
points of the screen, even with a static head position. This lowers the
possible accuracy of the gaze estimation, significantly reducing the accuracy
of the system under standard working conditions to 2-3 degrees.
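
A typical interpolation-based mapping of this kind can be sketched as a second-order polynomial regression from the PC-EC vector to screen coordinates, fitted on calibration points (the polynomial terms and least-squares fit are a common choice, not necessarily the exact model evaluated in the paper):

    import numpy as np

    def poly_terms(v):
        """Second-order polynomial expansion of the PC-EC vector (vx, vy)."""
        vx, vy = v
        return np.array([1.0, vx, vy, vx * vy, vx ** 2, vy ** 2])

    def fit_mapping(pcec_vectors, screen_points):
        """Least-squares fit of the gaze mapping from calibration data.
        pcec_vectors: (N, 2), screen_points: (N, 2). Returns (6, 2) coefficients."""
        A = np.stack([poly_terms(v) for v in pcec_vectors])
        coeffs, *_ = np.linalg.lstsq(A, np.asarray(screen_points), rcond=None)
        return coeffs

    def estimate_gaze(coeffs, pcec_vector):
        """Map a PC-EC vector to an estimated on-screen gaze point."""
        return poly_terms(pcec_vector) @ coeffs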

The objective is an efficient means to improve the accuracy of detected
fixations. The context is studies of natural behavior of subjects wearing eye
trackers while observing distant objects. Fixation detection algorithms try to
determine when the image on the retina is stable. Previous algorithms for
wearable eye trackers consider only eye-in-head motion. In the presence of the
vestibular-ocular response (VOR), however, the motion of the head counteracts
eye-in-head rotation. Compensating for this ego-motion increases the number of
detected fixations for all subjects. This compensation significantly affects
the number and size of the fixations detected, more accurately reflecting
mobile observers' natural gaze behavior.
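
A minimal sketch of the compensation idea, assuming per-sample eye-in-head and head angular velocities are available in compatible coordinate frames (the sign convention and the velocity threshold are assumptions):

    import numpy as np

    def detect_fixations(eye_vel, head_vel, threshold_deg_s=30.0):
        """eye_vel: eye-in-head angular velocity per sample (deg/s, horizontal
        and vertical); head_vel: head angular velocity from motion capture or
        an IMU. During VOR the two roughly cancel, so thresholding the
        gaze-in-world velocity (their sum) keeps such samples as fixations."""
        gaze_in_world = np.asarray(eye_vel) + np.asarray(head_vel)
        speed = np.linalg.norm(gaze_in_world, axis=1)
        return speed < threshold_deg_s    # True = fixation sample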

Eye tracking applications II

This paper investigates whether it is feasible to interact with the small
screen of a smartphone using eye movements only. Two of the most common
gaze-based selection strategies, dwell time selections and gaze gestures are
compared in a target selection experiment. Finger-strokes and
accelerometer-based interaction, i.e., tilting, are also considered. In an
experiment with 11 subjects we found gaze interaction to have a lower
performance than touch interaction but comparable to the error rate and
completion time of accelerometer (i.e. tilt) interaction. Gaze gestures had a
lower error rate and were faster than dwell selections by gaze, especially for
small targets, suggesting that this method may be the best option for
hands-free gaze control of smartphones.

The two cardinal problems recognized with gaze-based interaction techniques
are: how to avoid unintentional commands, and how to overcome the limited
accuracy of eye tracking. Gaze gestures are a relatively new technique for
giving commands, which has the potential to overcome these problems. We present
a study that compares gaze gestures with dwell selection as an interaction
technique. The study involved 12 participants and was performed in the context
of using an actual application. The participants gave commands to a 3D
immersive game using gaze gestures and dwell icons. We found that gaze gestures
are not only a feasible means of issuing commands in the course of game play,
but they also exhibited performance that was at least as good as or better than
dwell selections. The gesture condition produced less than half of the errors
when compared with the dwell condition. The study shows that gestures provide a
robust alternative to dwell-based interaction with the reliance on positional
accuracy being substantially reduced.

The validity of using non-representative users in gaze communication
research

Gaze-based interaction techniques have been investigated for the last two
decades, and in many cases the evaluation of these has been based on trials
with able-bodied users and conventional usability criteria, mainly speed and
accuracy. The target user group of many of the gaze-based techniques
investigated is, however, people with different types of physical disabilities.
We present the outcomes of two studies that compare the performance of two
groups of participants with a type of physical disability (one being cerebral
palsy and the other muscular dystrophy) with that of a control group of
able-bodied participants doing a task using a particular gaze interaction
technique. One study used a task based on dwell-time selection, and the other
used a task based on gaze gestures. In both studies, the groups of participants
with physical disabilities performed significantly worse than the able-bodied
control participants. We question the ecological validity of research into gaze
interaction intended for people with physical disabilities that only uses
able-bodied participants in evaluation studies without any testing using
members of the target user population.

Eye typing is one of the most intensively investigated topics in eye
tracking technology. Currently, almost all eye typing systems are developed for
English typing. Some preliminary studies have been made on developing eye
typing systems for inputting Chinese characters/text. In this paper, a novel
eye typing system is proposed for inputting Chinese characters, where a
software keyboard is specially designed based on a study of Chinese Pinyin.
Experimental results show the efficiency and usability of the proposed system.

The potential of dwell-free eye-typing for fast assistive gaze communication

We propose a new research direction for eye-typing which is potentially much
faster: dwell-free eye-typing. Dwell-free eye-typing is in principle possible
because we can exploit the high redundancy of natural languages to allow users
to simply look at or near their desired letters without stopping to dwell on
each letter. As a first step we created a system that simulated a perfect
recognizer for dwell-free eye-typing. We used this system to investigate how
fast users can potentially write using a dwell-free eye-typing interface. We
found that after 40 minutes of practice, users reached a mean entry rate of 46
wpm. This indicates that dwell-free eye-typing may be more than twice as fast
as the current state-of-the-art methods for writing by gaze. A human
performance model further demonstrates that it is highly unlikely traditional
eye-typing systems will ever surpass our dwell-free eye-typing performance
estimate.

Systems, tools, methods

Analysing the potential of adapting head-mounted eye tracker calibration to
a new user

A key issue with state-of-the-art mobile eye trackers, particularly during
long-term recordings in daily life, is the need for cumbersome and time
consuming (re)calibration. To reduce this burden, in this paper we investigate
the feasibility of adapting the calibration obtained for one user to another.
Calibration adaptation is automatically performed using a light-weight linear
translation. We compare three different methods to compute the translation:
"multi-point", where all calibration-points are used, "1-point", and "0-point"
that uses only an external parameter. We evaluate these methods in a
6-participant user study in a controlled laboratory setting by measuring the
error in visual angle between the predicted gaze point and the true gaze point.
Our results show that, averaged across all participants, the best adapted
calibration is only 0.8° (mean) off the calibration obtained for that
specific user. We also show the potential of the 1-point and 0-point methods
compared to the time-consuming multi-point computation.

The demand for improved human computer interaction will lead to increasing
adoption of eye tracking in everyday devices. For interaction with devices such
as Smart TVs, the eye tracker must operate in more challenging environments
such as the home living room. In this paper we present a non-contact eye
tracking system that allows for freedom of viewer motion in a living room
environment. A pan and tilt mechanism is used to orient the eye tracker, guided
by face tracking information from a wide-angle camera. The estimated point of
gaze is corrected for viewer movement in realtime, avoiding the need for
recalibration. The proposed technique achieves accuracy comparable to desktop
systems near the calibration position (less than 1° of visual angle) and
accuracy of less than 2° of visual angle when the viewer moves a large
distance, such as standing or sitting on the other side of the couch. The
system performance achieved was more than sufficient to operate a novel,
hands-free Smart TV interface.

A general framework for extension of a tracking range of
user-calibration-free remote eye-gaze tracking systems

Stereo-camera Remote Eye-Gaze Tracking (REGT) systems can provide
calibration-free estimation of gaze. However, such systems have a limited
tracking range due to the requirement for the eye to be tracked in both
cameras. This paper presents a general framework for extension of a tracking
range of stereo-camera user-calibration-free REGT systems. The proposed method
consists of two distinct phases. In the brief initial phase, estimates of
eye-features [the center of the pupil and corneal reflections] in pairs of
stereo-images are used to estimate automatically a set of subject-specific eye
parameters. In the second phase, these subject-specific eye parameters are used
with estimates of eye-features in images from any one of the systems' cameras
to compute the Point-of-Gaze (PoG). Experiments were conducted with a system
that includes two cameras in a horizontal plane. The experimental results
demonstrate that the tracking range for horizontal gaze directions can be
extended by more than 50%: from ±23.2° when the two cameras are used
as a stereo pair to ±35.5° when the two cameras are used
independently to estimate the PoG. By adding more cameras to the system, the
proposed framework allows further extension of the tracking range in both
horizontal and vertical directions, while preserving a user-calibration-free
status of a REGT system.

Mathematical model for wide range gaze tracking system based on corneal
reflections and pupil using stereo cameras

In this paper, we propose a mathematical model for a wide range gaze
tracking system based on corneal reflections and pupil using calibrated stereo
cameras and light sources. We demonstrate a general calculation method for
estimating the optical axis of the eye for a combination of non-coaxial and
coaxial configurations of many cameras and light sources. Gaze estimation is
possible only when light is reflected from the spherical surface of the cornea.
Moreover, we provide a method for calculating the eye rotation range where gaze
tracking can be achieved, which is useful for positioning cameras and light
sources in real world applications.

We contribute a novel gaze estimation technique, which is adaptable for
person-independent applications. In a study with 17 participants, using a
standard webcam, we recorded the subjects' left eye images for different gaze
locations. From these images, we extracted five types of basic visual features.
We then sub-selected a set of features with minimum Redundancy Maximum
Relevance (mRMR) for the input of a 2-layer regression neural network for
estimating the subjects' gaze. We investigated the effect of different visual
features on the accuracy of gaze estimation. Using machine learning techniques,
by combining different features, we achieved an average gaze estimation error of
3.44° horizontally and 1.37° vertically for person-dependent estimation.

Current microcomputers are powerful enough to implement a realtime eye
tracking system, but the computational throughput still limits the types of
algorithms that can be implemented in real time. Many of the image processing
algorithms that are typically used in eye tracking applications can be
significantly accelerated when the processing is delegated to a graphics
processing unit (GPU). This paper describes a real-time gaze tracking system
developed using the CUDA programming environment distributed by nVidia. The
current implementation of the system is capable of processing a 640 by 480
image in less than 4 milliseconds, and achieves an average accuracy close to
0.5 degrees of visual angle.

Extending the visual field of a head-mounted eye tracker for pervasive
eye-based interaction

Pervasive eye-based interaction refers to the vision of eye-based
interaction becoming ubiquitously usable in everyday life, e.g., across
multiple displays in the environment. While current head-mounted eye trackers
work well for interaction with displays at similar distances, the scene camera
often fails to cover both remote and close-proximity displays, e.g., a public
display on a wall and a handheld portable device. In this paper we describe an
approach that allows for robust detection and gaze mapping across multiple such
displays. Our approach uses an additional scene camera to extend the viewing
and gaze mapping area of the eye tracker and automatically switches between
both cameras depending on the display in view. Results from a pilot study show
that our system achieves a similar gaze estimation accuracy to a single-camera
system while at the same time increasing usability.

We propose a multi-camera-based gaze tracking system that provides a wide
observation area. In our system, multiple camera observations are used to
expand the detection area by employing mosaic observations. Each facial feature
and eye region image can be observed by different cameras, and in contrast to
stereo-based systems, no shared observations are required. This feature relaxes
the geometrical constraints in terms of head orientation and camera viewpoints
and realizes wide availability of gaze tracking with a small number of cameras.
In experiments, we confirmed that our implemented system can track head
rotation of 120° with two cameras. The gaze estimation accuracy is 5.4°
horizontally and 9.7° vertically.

This work describes the design and implementation of an eye tracking system
on an unmodified common tablet PC. A neural network eye tracker is employed as
a solution to eye tracking in the visible spectrum of light. We discuss the
challenges related to image recognition and processing, and provide an
objective evaluation of the accuracy and sampling rate of eye-gaze-based
interaction with such an eye tracker. The results indicate that it is possible
to obtain an average accuracy of 4.42° and a sampling rate of 0.70 Hz with
the described system.

We compared various real-time filters designed to denoise eye movements from
low-sampling devices. Most of the filters found in the literature were implemented
and tested on data gathered in a previous study. An improvement was proposed
for one of the filters. Parameters of each filter were adjusted to ensure their
best performance. Four estimation parameters were proposed as criteria for
comparison. The output from the filters was compared against two idealized
signals (the signals denoised offline). The study revealed that FIR filters
with triangular or Gaussian kernel (weighting) functions and parameters
dependent on signal state show the best performance.

The task of automatically tracking the visual attention in dynamic visual
scenes is highly challenging. To approach it, we propose a Bayesian online
learning algorithm. As the visual scene changes and new objects appear, based
on a mixture model, the algorithm can identify and tell visual saccades
(transitions) from visual fixation clusters (regions of interest). The approach
is evaluated on real-world data, collected from eye-tracking experiments in
driving sessions.

Several possible measures for the precision of an eye-tracker exist. The
commonly used measures of standard deviation and RMS, however, fail to produce
replicable results across varying frame rates, gaze distances and arrangements
of samples within a fixation, which makes it difficult to compare eye-trackers.
It is proposed that an area-based measure, BCEA, is
adapted to provide a one-dimensional quantity that is intuitive, independent of
frame rate and sensitive to small jerks in the reported fixation position.
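
For reference, the standard BCEA computation and one possible reduction to a single number are sketched below (the radius-of-equivalent-circle reduction is an illustrative choice, not necessarily the adaptation proposed here):

    import numpy as np

    def bcea(gaze_x, gaze_y, p=0.68):
        """Bivariate contour ellipse area covering proportion p of gaze samples
        recorded during a fixation: BCEA = 2*k*pi*sx*sy*sqrt(1 - rho^2),
        with k = -ln(1 - p)."""
        sx, sy = np.std(gaze_x, ddof=1), np.std(gaze_y, ddof=1)
        rho = np.corrcoef(gaze_x, gaze_y)[0, 1]
        k = -np.log(1.0 - p)
        return 2.0 * k * np.pi * sx * sy * np.sqrt(1.0 - rho ** 2)

    def bcea_radius(gaze_x, gaze_y, p=0.68):
        """One-dimensional quantity: the radius of the circle whose area equals
        the BCEA (an illustrative reduction of the area-based measure)."""
        return np.sqrt(bcea(gaze_x, gaze_y, p) / np.pi)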

It is important that eye-tracking studies report the accuracy and precision
of the eye tracker. It is argued that the values provided by the manufacturers
are representative of the best possible capability of the eye tracker under
ideal conditions and for participants with good tracking probabilities. A tool
is introduced that will allow researchers to determine the actual data quality
as it applies for individual participants at the time of data capturing.
Results of a study where the tool was implemented are discussed and compared
with the accuracy and precision values as reported by the manufacturer for the
same model of eye-tracker.

In a typical head-mounted eye tracking system, any small slippage of the eye
tracker headband on the participant's head leads to a systematic error in the
recorded gaze positions. While various approaches exist that reduce these
errors at recording time, only few methods reduce the errors of a given
tracking system after recording. In this paper we introduce a novel correction
algorithm that can significantly reduce the drift in recorded gaze data for eye
tracking experiments that use static stimuli. The algorithm is entropy-based
and needs no prior knowledge about the stimuli shown or the tasks participants
accomplish during the experiment.

The development of gaze tracking algorithms is very much bound to the
specific setup and properties of the respective system they are used in. This
makes it hard, e.g., to compare their performance. We propose Gazelnut, a
modular system that eases the development and comparison of gaze tracking
algorithms and also makes development independent of permanent access to
specific hardware.
Building on the message passing architecture of the "robot operating system"
(ROS) the system provides a flexible base to record and replay sessions, record
the input from multiple cameras, run exchangeable algorithms on such sessions,
store their individual results on the recorded (or live) scene, run different
algorithms in parallel to compare their results and attach additional
diagnostic modules to the running system.

This paper presents a new, publicly available eye tracking dataset, aimed to
be used as a benchmark for Point of Gaze (PoG) detection algorithms. The
dataset consists of a set of videos recording the eye motion of human test
subjects as they were looking at, or following, a set of predefined points of
interest on a computer visual display unit. The eye motion was recorded using a
Mobile Eye head-mounted infrared monocular camera. The ground truth of the
point of gaze and of the head location and direction in three-dimensional space
is provided together with the data. The ground truth regarding the point of
gaze is known in advance since the subjects are always looking at predefined
targets, whereas the head position in 3D is captured using a Vicon Motion
Tracking System.

For gaze-based training in surgery to be meaningful, the similarity between
a trainee's gaze and an expert's gaze during performance of surgical tasks must
be assessed. As it is difficult to record two people's gaze simultaneously, we
produced task videos made by experts, and measured the amount of overlap
between the gaze path of the expert surgeon and third-party observers while
watching the videos. For this investigation, we developed a new, simple method
for displaying and summarizing the proportion of time during which two
observers' points of gaze on a common stimulus were separated by no more than a
specified visual angle.
In a study of single-observer self-review and multiple-observer initial view
of a laparoscopic training task, we predicted that self-review would produce
the highest overlap. We found relatively low overlap between watchers and the
task performer; even operators with detailed task knowledge produce low overlap
when watching their own videos. Conversely, there was a high overlap among all
watchers. Results indicate that just watching a video may be insufficient to
improve trainees' eye-hand coordination. Gaze training will need to be
integrated with other teaching methods to be effective.
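
The overlap measure described above can be computed directly; a minimal
sketch, assuming the two gaze recordings have already been resampled to a
common time base and using an illustrative pixels-per-degree conversion:

    # Sketch of the overlap measure described above: the proportion of time
    # during which two observers' gaze points on a common stimulus are
    # separated by no more than a specified visual angle.
    import numpy as np

    def gaze_overlap(gaze_a, gaze_b, max_angle_deg=3.0, px_per_deg=35.0):
        """gaze_a, gaze_b: (n, 2) arrays of screen coordinates in pixels,
        resampled to a common time base. Returns the proportion of samples
        separated by no more than max_angle_deg."""
        dist_px = np.linalg.norm(gaze_a - gaze_b, axis=1)
        within = dist_px <= max_angle_deg * px_per_deg
        return within.mean()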

This paper considers the impact of location as context in mobile eye
tracking studies that extend to large-scale spaces, such as pedestrian
wayfinding studies. It shows how adding a subject's location to her gaze data
enhances the possibilities for data visualization and analysis. Results from an
explorative pilot study on mobile map usage with a pedestrian audio guide
demonstrate that the combined recording and analysis of gaze and position can
help to tackle research questions on human spatial problem solving in a novel
way.

Selecting values for fixation filters is a difficult task, as not only the
specifics of the selected filter algorithm have to be taken into account, but
also what it is going to be used for and by whom. In this paper, the selection
and testing process of values for an I-VT fixation filter algorithm
implementation is described.
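
For reference, the core of a velocity-threshold (I-VT) classification, whose
parameter values such a selection process has to determine, can be sketched as
follows (the threshold shown is a placeholder, not the value chosen in the
paper):

    # Minimal I-VT sketch: samples below the velocity threshold are labeled
    # as fixation samples; the threshold value itself is exactly what the
    # selection process described above has to determine.
    import numpy as np

    def ivt_classify(x_deg, y_deg, sampling_rate_hz, velocity_threshold=30.0):
        """x_deg, y_deg: gaze coordinates in degrees of visual angle.
        Returns a boolean array, True where a sample belongs to a fixation."""
        dx = np.diff(x_deg)
        dy = np.diff(y_deg)
        velocity = np.hypot(dx, dy) * sampling_rate_hz   # deg/s between samples
        is_fixation = np.empty(len(x_deg), dtype=bool)
        is_fixation[1:] = velocity < velocity_threshold
        is_fixation[0] = is_fixation[1]
        return is_fixation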

Comparison of eye movement metrics recorded at different sampling rates

Previous work has shown significant differences in eye movement metrics
recorded by devices differing in sampling rates. Two schools of thought have
emerged on how to effectively compare such apparently disparate data. The
first, termed here upsampling, strives to process eye movement data recorded at
a low sampling rate to allow comparison with data recorded at a high sampling
rate, e.g., by fitting a cubic spline to the signal derivative (i.e.,
velocity). Instead, we suggest downsampling, based on a two-pass solution in
which the data is first downsampled and smoothed prior to its velocity-based
classification. Results indicate that, given a similar experimental task, this
approach gives more equitable results than other single-pass classification
methods, which typically do not explicitly consider sampling rates.
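
A heavily simplified sketch of such a two-pass treatment (the decimation
factor, smoothing window, and velocity threshold are illustrative assumptions,
not the values evaluated in the paper):

    # Sketch of the two-pass idea: downsample and smooth first, classify second.
    import numpy as np
    from scipy.signal import savgol_filter

    def downsample_then_classify(x_deg, y_deg, rate_hz, target_hz=60,
                                 window=5, velocity_threshold=30.0):
        """x_deg, y_deg: numpy arrays of gaze coordinates in degrees."""
        factor = max(1, int(round(rate_hz / target_hz)))
        # Pass 1: decimate to the lower rate and smooth the position signal.
        xs = savgol_filter(x_deg[::factor], window, polyorder=2)
        ys = savgol_filter(y_deg[::factor], window, polyorder=2)
        # Pass 2: velocity-based classification at the common, lower rate.
        velocity = np.hypot(np.diff(xs), np.diff(ys)) * (rate_hz / factor)
        return velocity < velocity_threshold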

On the conspicuity of 3-D fiducial markers in 2-D projected environments

Fiducial markers are used with head-mounted eye trackers to facilitate eye
movement data aggregation for quantitative analysis. However, use of these
markers may be problematic in some situations (e.g., natural tasks) as the
markers may be visually distracting. To date, we are aware of no study that has
examined the conspicuity of such markers to determine how much (if any) effort
should be expended into concealing them from view. This paper presents a study
that examines the conspicuity of Tobii's infrared (IR) markers in a 2-D projected
environment. Results indicate that even when these 3-D markers are superimposed
on a canvas on which the 2-D environment is projected, and no effort is taken
to hide them (i.e., by minimizing contrast with the background), the presence
of markers does not significantly alter the number or duration of fixations on
the location of the markers when a specific task is given.

This paper discusses estimation of the active speaker in multi-party
video-mediated communication from the gaze data of one of the participants. In
the explored setting, we predict the voice activity of participants in one room
based on the gaze recordings of a single participant in another room. The two
rooms were connected by high-definition, low-delay audio and video links, and
the participants engaged in different activities ranging from casual discussion
to simple problem-solving games. We treat the task as a classification problem
and evaluate several types of features and parameter settings in the context of
a Support Vector Machine classification framework. The results show that, using
the proposed approach, the vocal activity of a speaker can be correctly
predicted for 89% of the time for which gaze data are available.
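
As a hedged illustration of this classification setup (the features shown are
placeholders, not the feature types evaluated in the paper), per-window gaze
features can be fed to a standard SVM:

    # Sketch of the classification framework described above, using
    # scikit-learn. The features (e.g., share of gaze time on each remote
    # participant's video tile) are assumptions for illustration only.
    import numpy as np
    from sklearn.svm import SVC
    from sklearn.model_selection import cross_val_score

    # X: one row per time window of assumed gaze features;
    # y: index of the participant who was actually speaking in that window.
    X = np.random.rand(200, 4)          # placeholder feature matrix
    y = np.random.randint(0, 3, 200)    # placeholder voice-activity labels

    clf = SVC(kernel='rbf', C=1.0, gamma='scale')
    scores = cross_val_score(clf, X, y, cv=5)
    print('mean accuracy: %.2f' % scores.mean())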

Characteristics of the human visual field are well known to differ between the
central (foveal) and peripheral areas. Existing computational models of visual
saliency, however, do not take this biological evidence into account. These
models compute visual saliency uniformly over the retina and thus have
difficulty in accurately predicting the next gaze (fixation) point. This
paper proposes to incorporate human visual field characteristics into visual
saliency, and presents a computational model for producing such a saliency map.
Our model integrates image features obtained by bottom-up computation in such a
way that the weights for the integration depend on the distance from the
current gaze point; the weights are learned optimally using actual saccade data.
The experimental results using a large number of fixation/saccade data with
wide viewing angles demonstrate the advantage of our saliency map, showing that
it can accurately predict the point where one looks next.
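
In compact form (the symbols are chosen here for illustration and are not
taken from the paper), such a model combines bottom-up feature maps F_i with
weights that depend on the eccentricity from the current gaze point g:

    S(x) = \sum_i w_i\big(\lVert x - g \rVert\big)\, F_i(x),

where the weight functions w_i are learned from recorded saccade data so that
S assigns high values to the locations actually fixated next.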

Uses and applications

A popular word processor application was adapted to include the use of eye
gaze and speech as a modality for text entry. An onscreen keyboard was used
whereby users were expected to focus on the desired character and then issue a
verbal command in order to type the character in the document. Measures of
speed and accuracy were captured and analyzed. Results indicate that the
keyboard is superior to the gaze and speech entry method in terms of both speed
and accuracy. Keyboard button sizes and spacing between the buttons did not
affect either measure in any way.

In eye-gaze-based human-computer interfaces, the most commonly used
mechanism for generating activation commands (i.e., mouse clicks) is dwell time
(DT). While DT can be relatively efficient and easy to use, it is also
associated with the possibility of generating unintentional activation commands
-- an issue known as the Midas touch problem. To address this problem,
we proposed to use a "tooth-clicker" (TC) device as a mechanism for generating
activation commands independently of the activity of the eyes.
This paper describes a pilot study that verifies the feasibility of using an
eye-gaze tracker (EGT) and a TC to type on an on-screen keyboard, and compares
the performance of the EGT-TC system with that of the EGT with two different DT
thresholds (880 ms and 490 ms). The six subjects that participated in the study
were able to attain typing speeds using the EGT-TC system that were slower than
but comparable to the typing speeds that they attained using the EGT with the
shorter DT threshold.
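
For concreteness, the dwell-time mechanism compared here (the 880 ms and
490 ms thresholds above correspond to the dwell_ms parameter) amounts to
something like the following sketch, with illustrative names:

    # Sketch of a dwell-time (DT) activation check: an on-screen key is
    # "clicked" once gaze has stayed within its bounds for dwell_ms
    # milliseconds.
    def dwell_click(gaze_samples, key_rect, dwell_ms, sample_interval_ms):
        """gaze_samples: iterable of (x, y); key_rect: (x0, y0, x1, y1).
        Returns True as soon as accumulated dwell reaches dwell_ms."""
        x0, y0, x1, y1 = key_rect
        dwell = 0.0
        for x, y in gaze_samples:
            if x0 <= x <= x1 and y0 <= y <= y1:
                dwell += sample_interval_ms
                if dwell >= dwell_ms:
                    return True
            else:
                dwell = 0.0            # gaze left the key: reset the timer
        return False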

The effect of clicking by smiling on the accuracy of head-mounted gaze
tracking

The effect of facial behaviour on gaze tracking accuracy was studied using a
prototype system that integrated head-mounted, video-based gaze tracking with
capacitive facial movement detection for pointing at and selecting objects,
respectively, in a simple graphical user interface. Experiments were carried
out to determine how voluntary smiling movements used to indicate clicks affect
the accuracy of gaze tracking, through the combination of the user's eye
movement behaviour and the operation of the gaze tracking algorithms. The
results showed no observable degradation of the gaze tracking accuracy when
using voluntary smiling for object selections.

The performance of eye gaze and speech when used as a pointing device was
tested using the ISO multi-directional tapping task. Eye gaze and speech were
used for target selection as is, as well as with the use of a gravitational
well and in conjunction with magnification. These selection methods were then
compared to the mouse. The mouse was far superior in terms of performance when
selecting targets, although the use of a gravitational well did increase the
performance of eye gaze and speech. However, magnification did not improve the
use of gaze and speech as a pointing device.

This paper introduces Dynamic Context Switching (DCS) as an extension of the
Context Switching (CS) paradigm for gaze-based interaction. CS replicates
information in each context. The user can freely explore one context without
worrying about the Midas touch problem, and a saccade to the other context
triggers the selection of the item under focus. Because CS has to display two
contexts simultaneously, the amount of useful screen space is limited. DCS
dynamically adjusts the context sizes, where the context that has the focus is
displayed in full size, while the other is minimized, thus improving useful
screen space. A saccade to the minimized context triggers selection, and
properly readjusts the sizes of the contexts. Results from a pilot user
experiment show that DCS improves user performance and does not cause
disorientation due to the dynamic context resizing.

Remote pan-and-zoom control for the exploration of large information spaces
is of interest for various application areas, such as browsing through medical
data in sterile environments or investigating geographic information systems on
a distant display. In this context, taking a user's visual attention into
account for pan-and-zoom operations could be beneficial. In this paper, we
investigate the potential of gaze-supported panning in combination with
different zooming modalities: (1) a mouse scroll wheel, (2) tilting a handheld
device, and (3) touch gestures on a smartphone. This makes it possible to zoom
in at the location a user is currently looking at (i.e., a gaze-directed pivot
zoom). These
techniques have been tested with Google Earth by ten participants in a user
study. While participants were fastest with the already familiar mouse-only
base condition, the user feedback indicates a particularly high potential of
the gaze-supported pivot zooming in combination with a scroll wheel or touch
gesture.
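
The gaze-directed pivot zoom mentioned above essentially scales the view about
the gaze point rather than the view centre; a minimal sketch of that transform,
with illustrative names:

    # Sketch of a gaze-directed pivot zoom: scale the view about the gaze
    # point so that the location the user is looking at stays fixed on screen.
    def pivot_zoom(offset, scale, gaze_screen, zoom_factor):
        """offset: world coordinate shown at the screen origin; scale: pixels
        per world unit; gaze_screen: (x, y) gaze position in screen pixels."""
        gx, gy = gaze_screen
        # World coordinate currently under the gaze point.
        wx = offset[0] + gx / scale
        wy = offset[1] + gy / scale
        new_scale = scale * zoom_factor
        # Choose the new offset so the same world point stays under the gaze.
        new_offset = (wx - gx / new_scale, wy - gy / new_scale)
        return new_offset, new_scale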

In this paper we present an approach to build an eye-tracking based text
cursor placement system. When triggered, the system employs a computer vision
based analysis of the screen's content around the current gaze position to find
the most likely designated gaze target. Eventually it synthesizes a mouse event
at that position, allowing for a rapid text cursor repositioning even in
applications which do not support eye tracking explicitly. For our system we
compared three different computer vision methods in a simulation run and
evaluated the best candidate in two double-blind user studies. We used a total
of 19 participants to assess the system's objective and perceived end-user
speed-up. We demonstrate that, in terms of repositioning time, the OCR-based
method is superior to the other tested methods; it also beats common
keyboard-mouse interaction for some users. We conclude that, while the tool was
almost universally preferred subjectively over keyboard-mouse interaction, the
highest speed is achieved by using the right amount of eye tracking.

This paper presents an experiment comparing performance and user experience
of gaze and mouse interaction in a minimalistic 3D flying game that only
required steering. Mouse interaction provided better performance, and
participants considered it less physically and mentally demanding, less
frustrating, and less difficult to maneuver. Gaze interaction, however, yielded
higher
levels of entertainment and engagement. The paper suggests that gaze steering
provides a high kinesthetic pleasure both because it is difficult to master and
because it presents a unique mapping between fixation and locomotion.

In this work, we present a proactive content-based recommender system that
employs web document clustering performed using eye gaze data. Generally,
recommender systems are used in commercial applications, where information
about the user's habits and interests is of crucial importance for planning
marketing strategies, or in information retrieval systems to suggest similar
resources a user may be interested in. Commonly, these systems use explicit
relevance feedback techniques (e.g., mouse or keyboard) to improve their
performance and to recommend products. In contrast, the proposed system
captures the user's interest through implicit relevance feedback, based on data
acquired by a Tobii T60 eye tracker. The purpose of the system is to collect
eye gaze data during web navigation and, by employing clustering techniques, to
suggest web documents similar to those in which the user implicitly expressed
greater interest. Performance evaluation was carried out on 30 users, and the
results show that the proposed system enhanced the navigation experience in
about 73% of the cases.

A film comic is a kind of artwork that represents a movie's story as a comic.
It uses images from the movie as panels. Verbal information such as dialogue
and narration is represented in word balloons. A key issue in creating film
comics is how to select images that are significant in conveying the story of
the movie. Such significance is inherently semantic and context-dependent, and
hence technologies based purely on image analysis usually fail to produce good
results. On the other hand, the word balloon arrangement requires understanding
not only the semantics of the images but also the verbal information, which is
difficult unless the script of the movie is available. This paper describes a
new attempt to use eye-tracking data for the automatic creation of a film comic
from a movie. Patterns of eye movement are analyzed to detect scene changes,
and gaze information is used to automatically find the locations for inserting
and directing the word balloons. Our experiments showed that the proposed
technique can greatly improve the selection of significant images compared with
a method using image features only, and can realize automatic balloon
arrangement.

Differences between visual attention strategies of experts and novices have
been investigated in many fields, but little has been done in the field of
microneurosurgery. In the hands of an experienced surgeon, microneurosurgery
seems like an elegant, routine and clean procedure with minimal blood loss.
However, microneurosurgery is a multifaceted task with clinical risks
associated with surgeons' skills. In a preliminary study, the eye movements of
eight surgeons were recorded while they observed four images representing four
phases of a tumor removal surgery. A comparison of the eye movement strategies
shows
clear markers of expertise depending on the phase of the surgery.

An eye-tracking study on the role of scan time in finding source code
defects

An eye-tracking study is presented that investigates how individuals find
defects in source code. This work partially replicates a previous eye-tracking
study by Uwano et al. [2006]. In the Uwano study, eye movements are used to
characterize the performance of individuals in reviewing source code. Their
analysis showed that subjects who did not spend enough time initially scanning
the code tend to take more time finding defects. The study here follows a
similar setup with added eye-tracking measures and analyses on effectiveness
and efficiency of finding defects with respect to eye gaze. The subject pool is
larger and comprises a range of skill levels. Results indicate that scanning
significantly correlates with defect detection time as well as visual effort on
relevant defect lines. Results of the study are compared and contrasted to the
Uwano study.

While a great deal of reading happens on mobile devices, little research has
been performed on how the reading interaction actually takes place. We
therefore describe our findings from a study conducted with 18 users who were
asked to read a number of texts while their touch and gaze data were being
recorded. We found three reader types and identified their preferred alignment
of text on the screen. Based on our findings, we are able to computationally
estimate the reading area with approximately .81 precision and .89 recall. Our
computed reading speed estimate has an average 10.9% wpm error relative to the
measured speed, and by combining both techniques we can pinpoint the reading
location at a given time with an overall word error of 9.26 words, or about
three lines of text on our device.

In this paper, we revisit a seminal research contribution by Russo and
Leclerc [1994], which identified three stages of the consumer choice process:
(1) orientation, (2) evaluation, and (3) verification. Their three-stage model
broke with previous research favoring two-stage models, and it disconfirmed
models of planned analysis of choice in favor of an adaptive and constructive
process [Wedel and Pieters 2008]. The aim of this paper is to replicate the
original study by Russo and Leclerc [1994] to better understand the
characteristics of the different stages of the consumer choice process. We
argue that such a replication is needed due to the advances in eye-tracking
technology during the last 15 years and the detrimental effects of think-aloud
protocols. In general, our replication of the research by Russo and Leclerc
[1994] confirms the three-stage model they suggested, but we identify some
noteworthy differences regarding the time it takes to make a decision and the
mean observation time in the three stages.

Human perceptual expertise has significant influence on medical image
inspection. However, little is known regarding whether experts differ in their
cognitive processing or what effective visual strategies they employ for
examining medical images. To remedy this, we conduct an eye tracking experiment
and collect both eye movement and verbal description data from three groups of
subjects with different medical training levels. Each subject examines and
describes 42 photographic dermatological images. We then develop a hierarchical
probabilistic framework to extract the common and unique eye movement patterns
exhibited among multiple subjects' fixation and saccadic eye movements within
each expertise-specific group. Furthermore, experts' annotations of thought
units on the transcribed verbal descriptions are time-aligned with these eye
movement patterns to identify their semantic meanings. In this work, we are
able to uncover the manner in which these subjects alternated their viewing
strategies over the course of inspection, and additionally extract their
perceptual expertise so that it can be used for advanced medical image
understanding.

Visual attention to television programs with a second-screen application

This study examined participants' visual attention via eye-movement patterns
as they watched two television shows -- one a drama, the other a documentary --
while interacting with synchronized second-screen applications introduced in
spring 2011. The second screen garnered considerable visual attention, about
30% of the total viewing session. Visual attention went to the tablet screen
even without a recent "push" of interactive content and without advertising
content on the TV screen. However, interactive content and TV advertising did
trigger more attention to the tablet app. The presence of the second screen
also dramatically decreased the average gaze length on TV as described in
previous research.

Eye-tracking was used to predict choices made during play of a series of
computer-generated simultaneous normal-form games. Four normal-form games were
used as the test bed for the eye-tracking experiment: the Coordination Game,
Battle of the Sexes, the Game of Chicken, and Prisoner's Dilemma. These games
are abstractions of real-life scenarios where a person must make a choice to
either cooperate with another person for some common good, or not cooperate,
given a specific "payoff" for cooperating or not cooperating. The other player
was always an automated agent whose goal was to predict the choice of the human
player. Players were found to cluster into different types according to a
numeric index specific to the game played. An eye-tracking experiment confirms
that attention deployed to particular areas of interest varies according to the
game played and the type to which a player belongs. This enabled a decision
tree to be created from the eye-tracking data, which was used by the agent to
classify each player as a specific type, allowing a prediction to be made about
the player's likely choice.
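
As an illustration of this final classification step (the feature columns
shown are assumptions, not the paper's actual areas of interest), such a
decision tree can be trained with scikit-learn on per-player gaze features:

    # Sketch of the decision-tree step described above: per-player gaze
    # features (here, assumed shares of fixation time on different screen
    # regions) are used to classify the player as a type, which then drives
    # the agent's prediction of the player's likely choice.
    import numpy as np
    from sklearn.tree import DecisionTreeClassifier

    X = np.array([[0.70, 0.20, 0.10],    # placeholder rows: share of fixation
                  [0.30, 0.55, 0.15],    # time on own payoffs, the other
                  [0.45, 0.45, 0.10]])   # player's payoffs, and elsewhere
    y = np.array([0, 1, 1])              # placeholder player-type labels

    tree = DecisionTreeClassifier(max_depth=3).fit(X, y)
    predicted_type = tree.predict([[0.65, 0.25, 0.10]])[0]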

To observe whether there is a difference in eye gaze between doing a task
and watching a video of the task, we recorded the gaze of 17 subjects
performing a simple surgical eye-hand coordination task. We also recorded eye
gaze of the same subjects later while they were watching videos of their
performance.
We divided the task into 9 or more sub-tasks, each of which involved a large
hand movement to a new target location. We analyzed the videos manually and
located the video frame for each sub-task where the operator's saccadic
movement began, and the frame where the watcher's eye movement began. We found
a consistent delay of about 600 ms between initial eye movement when doing the
task, and initial eye movement when watching the task, observed in 96.3% of the
sub-tasks.
For the first time, we have quantified the differences between doing and
watching a manual task. This will help develop gaze-based training strategies
for manual tasks.

How to measure monitoring performance of pilots and air traffic controllers

Prior research on the future of aviation has established that operators will
have to work with highly automated systems. Increasing automation will require
operators monitoring appropriately (OMA). OMA are expected to demonstrate the
use of distinctly different monitoring phases (orientation, anticipation,
detection, and recheck). Within these phases, they must grasp the relevant
information in time to be able to take control should automation fail. The
presented study aims to find appropriate measurements for identifying OMA on
the basis of eye tracking. To this end, a normative model of adequate
monitoring behavior was designed, including the definition of areas of
interest. We tested 90 participants who had to monitor a dynamic automated
process and then take control. To decide on suitable eye tracking parameters,
we asked which parameters are significantly related to manual control
performance. The
results show that the suitability of parameters depends on the specific phase
of the monitoring process. Gaze durations allow for differentiating between
high and low performing subjects during orientation phases. In contrast,
relative fixation counts are suitable for predicting monitoring performance
during detection phases. In general, the results support the assumption that
eye tracking parameters are appropriate for identifying OMA.

Exploring the effects of visual cognitive load and illumination on pupil
diameter in driving simulators

Pupil diameter is an important measure of cognitive load. However, pupil
diameter is also influenced by the amount of light reaching the retina. In this
study we explore the interaction between these two effects in a simulated
driving environment. Our results indicate that it is possible to separate the
effects of illumination and visual cognitive load on pupil diameter, at least
in certain situations.