Gaze-mediated input

Eye tracking input often relies on visual and auditory feedback. Haptic
feedback offers a previously unused alternative to these established methods.
We describe a study to determine the natural time limits for haptic feedback to
gaze events. The goal is to determine how much time is available to evaluate
the object the user is gazing at and to decide whether or not to give the user
a haptic notification on that object. The results indicate that feedback is
best given within 250 milliseconds from the start of fixation on an object.
Longer delays lead to an increase in incorrect associations between objects and
the feedback. Delays longer than 500 milliseconds were confusing for users.

Within a pervasive computing environment, we see content on shared displays
that we wish to acquire and use in a specific way, i.e., with an application on
a personal device, transferring it from point to point. The eyes as input can
indicate the intention to interact with a service, providing implicit pointing
as a result. In this paper we investigate the use of gaze and manual input for
the positioning of gaze-acquired content on personal devices. We evaluate two
main techniques: (1) Gaze Positioning, transfer of content using gaze with
manual input to confirm actions, and (2) Manual Positioning, in which content
is selected with gaze but final positioning is performed by manual input,
involving a switch of modalities from gaze to manual input. A first user study
compares these techniques applied to direct and indirect manual input
configurations: a tablet with touch input and a laptop with mouse input. A
second study evaluated our techniques in an application scenario involving
distractor targets. Our overall results showed general acceptance and
understanding of all conditions, although there were clear individual user
preferences dependent on familiarity with, and preference toward, gaze, touch,
or mouse input.

This paper presents an experimental investigation of gaze-based control
modes for unmanned aerial vehicles (UAVs or "drones"). Ten participants
performed a simple flying task. We gathered empirical measures, including task
completion time, and examined the user experience for difficulty, reliability,
and fun. Four control modes were tested, with each mode applying a combination
of x-y gaze movement and manual (keyboard) input to control speed (pitch),
altitude, rotation (yaw), and drafting (roll). Participants had similar task
completion times for all four control modes, but one combination was considered
significantly more reliable than the others. We discuss design and performance
issues for the gaze-plus-manual split of controls when drones are operated
using gaze in conjunction with tablets, near-eye displays (glasses), or
monitors.

Compared to the mouse, eye pointing is inaccurate. As a consequence, small
objects are difficult to point at by gaze alone. We suggest using a combination
of eye pointing and subtle head movements to achieve accurate hands-free
pointing in a conventional desktop computing environment. To track the head
movements, we exploited information about the eye position in the eye tracker's
camera view. We conducted a series of three experiments to study the potential
caveats and benefits of using head movements to adjust the gaze cursor
position. Results showed that head-assisted eye pointing significantly improves
pointing accuracy without a negative impact on pointing time. In some cases
participants were able to point almost 3 times closer to the target's center
compared to eye pointing alone (7 vs. 19 pixels). We conclude that
head-assisted eye pointing is a comfortable and potentially very efficient
alternative to other assistive methods in eye pointing, such as zooming.

Analysis I: eye tracking data analysis methods

We introduce ISeeCube, a new design for the visual analysis of eye tracking
data recorded from dynamic stimuli such as video. It includes multiple
coordinated views to support different aspects of various analysis tasks, and
combines methods for the spatiotemporal analysis of gaze data recorded from
unlabeled videos with the possibility to annotate and investigate dynamic
Areas of Interest (AOIs). A static overview of the complete data set is
provided by a space-time cube visualization that shows gaze points with
density-based color mapping and spatiotemporal clustering of the data. A
timeline visualization supports the analysis of dynamic AOIs and the viewers'
attention on them. AOI-based scanpaths of different viewers can be clustered by
their Levenshtein distance, an attention map, or the transitions between AOIs.
With the provided visual analytics techniques, the exploration of eye tracking
data recorded from several viewers is supported for a wide range of analysis
tasks.

Most analytic approaches for eye-tracking data focus either on
identification of fixations and saccades, or on estimating saliency properties.
Analyzing both aspects of visual attention simultaneously provides a more
comprehensive view of strategies used to process information. This work
presents a method that incorporates both aspects in a unified Bayesian model to
jointly estimate dynamic properties of scanpaths and a saliency map.
Performance of the model is assessed on simulated data and on eye-tracking data
from 15 children with autism spectrum disorder and 13 control children.
Saliency differences between the ASD and control (TD) groups were found for
both social and non-social images, but differences in dynamic gaze features
were evident in only a subset of social images. These results are consistent
with previous region-based analyses as well as previous fixation parameter
models, suggesting that the new approach may provide a synthesizing,
statistical perspective on eye-tracking analyses.

Creating a new dynamic measure of the useful field of view using
gaze-contingent displays

Fixation identification algorithms facilitate data comprehension and provide
analytical convenience in eye-tracking analysis. However, current fixation
algorithms for eye-tracking analysis are heavily dependent on parameter
choices, leading to instabilities in results and incompleteness in reporting.
This work examines the nature of human scanning patterns during complex
scene viewing. We show that standard implementations of the commonly used
distance-dispersion algorithm for fixation identification are functionally
equivalent to greedy spatiotemporal tiling. We show that modeling the number of
fixations as a function of tiling size leads to a measure of fractal
dimensionality through box counting. We apply this technique to examine
scale-free gaze behaviors in toddlers and adults looking at images of faces and
blocks, as well as a large number of adults looking at movies or static images.
The distributional aspects of the number of fixations may suggest a fractal
structure to gaze patterns in free scanning and imply that the incompleteness
of standard algorithms may be due to the scale-free behaviors of the underlying
scanning distributions. We discuss the nature of this hypothesis, its
limitations, and offer directions for future work.
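
A minimal sketch of the box-counting idea described above, under the
simplifying assumption that plain spatial tiling of gaze samples is used
(the paper tiles spatiotemporally); the function and variable names are
illustrative, not from the paper:

```python
import numpy as np

def box_count_dimension(gaze_xy, box_sizes):
    """Estimate a box-counting (fractal) dimension of a set of gaze points.

    gaze_xy   : (N, 2) array of gaze coordinates in pixels
    box_sizes : iterable of box edge lengths (pixels) used to tile the display
    Returns the slope of log(count) vs. log(1/size), i.e. the dimension estimate.
    """
    counts = []
    for s in box_sizes:
        # Assign every gaze sample to a spatial tile of edge length s
        tiles = np.floor(gaze_xy / s).astype(int)
        # Number of distinct tiles containing at least one sample
        counts.append(len({tuple(t) for t in tiles}))
    sizes = np.asarray(box_sizes, dtype=float)
    counts = np.asarray(counts, dtype=float)
    # Fit log(count) = D * log(1/size) + c; the slope D is the dimension estimate
    D, _ = np.polyfit(np.log(1.0 / sizes), np.log(counts), 1)
    return D

# Example with synthetic gaze samples on a 1280x1024 display
rng = np.random.default_rng(0)
gaze = rng.uniform([0, 0], [1280, 1024], size=(5000, 2))
print(box_count_dimension(gaze, box_sizes=[8, 16, 32, 64, 128]))
```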

Calibration & fixation analysis

Towards accurate and robust cross-ratio based gaze trackers through learning
from simulation

Cross-ratio (CR) based methods offer many attractive properties for remote
gaze estimation using a single camera in an uncalibrated setup by exploiting
invariance of a plane projectivity. Unfortunately, due to several
simplification assumptions, the performance of CR-based eye gaze trackers
decays significantly as the subject moves away from the calibration position.
In this paper, we introduce an adaptive homography mapping for achieving gaze
prediction with higher accuracy at the calibration position and more robustness
under head movements. This is achieved with a learning-based method for
compensating both spatially-varying gaze errors and head pose dependent errors
simultaneously in a unified framework. The model of adaptive homography is
trained offline using simulated data, saving a tremendous amount of time in
data collection. We validate the effectiveness of the proposed approach using
both simulated and real data from a physical setup. We show that our method
compares favorably against other state-of-the-art CR-based methods.
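
For context, a bare-bones sketch of the homography mapping that CR-style gaze
estimators build on (not the adaptive homography proposed in the paper): four
corneal glints produced by IR lights at the screen corners define a plane
projectivity to the screen, and the pupil center is mapped through it. The
coordinates and helper names are illustrative assumptions:

```python
import numpy as np

def homography_from_points(src, dst):
    """Direct Linear Transform: 3x3 homography H with dst ~ H @ src (4+ pairs)."""
    A = []
    for (x, y), (u, v) in zip(src, dst):
        A.append([-x, -y, -1, 0, 0, 0, u * x, u * y, u])
        A.append([0, 0, 0, -x, -y, -1, v * x, v * y, v])
    _, _, Vt = np.linalg.svd(np.asarray(A, dtype=float))
    return Vt[-1].reshape(3, 3)

def map_point(H, p):
    """Apply homography H to a 2D point p."""
    q = H @ np.array([p[0], p[1], 1.0])
    return q[:2] / q[2]

# Glint positions in the eye image (one per screen-corner IR LED), in pixels
glints_img = [(312.0, 205.0), (354.0, 207.0), (351.0, 241.0), (309.0, 238.0)]
# Corresponding screen corners, in pixels
screen = [(0.0, 0.0), (1920.0, 0.0), (1920.0, 1080.0), (0.0, 1080.0)]

H = homography_from_points(glints_img, screen)
pupil_center_img = (330.0, 224.0)
print("estimated gaze point on screen:", map_point(H, pupil_center_img))
```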

This paper presents a fully calibrated extended geometric approach for gaze
estimation in three dimensions (3D). The methodology is based on a geometric
approach utilising a fully calibrated binocular setup constructed as a
head-mounted system. The setup uses two ordinary web-cameras for each eye and
6D magnetic sensors, allowing free head movements in 3D. Evaluation of initial
experiments indicates results comparable to the current state-of-the-art in
estimating gaze in 3D. Initial results show an RMS error of 39-50 mm in the
depth dimension, and even smaller errors in the horizontal and vertical
dimensions, for fixations. Although the workspace is limited, because the
system is designed as a head-mounted device the workspace volume is positioned
relative to the pose of the device. Hence gaze can be estimated in 3D with
relatively free head movements, with external reference to a world coordinate
system, thereby offering flexibility and movability within certain constraints.

Studying natural reading and its underlying attention processes requires
devices that are able to provide precise measurements of gaze without rendering
the reading activity unnatural. In this paper we propose an eye tracking system
that can be used to conduct analyses of reading behavior in low constrained
experimental settings. The system is designed for dual-camera-based
head-mounted eye trackers and allows free head movements and note taking. The
system is composed of three different modules. First, a 3D model-based gaze
estimation method computes the reader's gaze trajectory. Second, a document
image retrieval algorithm is used to recognize document pages and extract
annotations. Third, a systematic error correction procedure is used to
post-calibrate the system parameters and compensate for spatial drifts. The
validation results show that the proposed method is capable of extracting
reliable gaze data when reading in low constrained experimental conditions.

A novel normalization principle for robust glint detection is presented. The
method is based on geometric properties of corneal reflections and allows for
simple and effective detection of glints even in the presence of several
spurious and identically appearing reflections. The method is tested on both
simulated data and data obtained from web cameras. The proposed method is a possible
direction towards making eye trackers more robust to challenging scenarios.

The gaze locations reported by eye trackers often contain error resulting
from a variety of sources. Such error is of increasing concern to eye tracking
researchers, and several techniques have been introduced to clean up the error.
These methods, however, either compensate only for error caused by a particular
source (such as pupil dilation) or require the error to be somewhat constant
across space and time. This paper introduces a method that is applicable to
error generated from a variety of sources and that is resilient to the change
in error across the display. A study shows that, at least in some cases,
although the change in error across the display appears to be random, it in
fact follows a consistent pattern that can be modeled using quadratic equations.
The parameters of these equations can be estimated using linear regression on
the error vectors between recorded fixations and possible target locations. The
resulting equations can then be used to clean up the error. This
regression-based approach is much easier to apply than some of the previously
published methods. The method is applied to the data of a visual search
experiment, and the results show that the regression-based error correction
works very well.
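
A minimal sketch of the regression idea described above, under the simplifying
assumption that each recorded fixation can be paired with its intended target;
the quadratic form and helper names are illustrative:

```python
import numpy as np

def fit_error_model(fixations, targets):
    """Fit quadratic error surfaces e_x(x, y) and e_y(x, y) by least squares.

    fixations, targets : (N, 2) arrays of recorded fixation and target positions
    Returns two coefficient vectors, one per error component.
    """
    x, y = fixations[:, 0], fixations[:, 1]
    # Design matrix of a full quadratic in screen position
    X = np.column_stack([np.ones_like(x), x, y, x * y, x**2, y**2])
    errors = targets - fixations                      # error vectors
    coef_x, *_ = np.linalg.lstsq(X, errors[:, 0], rcond=None)
    coef_y, *_ = np.linalg.lstsq(X, errors[:, 1], rcond=None)
    return coef_x, coef_y

def correct(points, coef_x, coef_y):
    """Apply the fitted model to clean up raw gaze points."""
    x, y = points[:, 0], points[:, 1]
    X = np.column_stack([np.ones_like(x), x, y, x * y, x**2, y**2])
    return points + np.column_stack([X @ coef_x, X @ coef_y])
```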

In this paper, we explore the problem of analyzing gaze patterns towards
attributing greater meaning to observed fixations. In recent years, there have
been a number of efforts that attempt to categorize fixations according to
their properties. Given the multitude of factors that may contribute to
fixational behavior, including both bottom-up and top-down influences on the
neural mechanisms for visual representation and saccadic control, a better
understanding of the factors behind any given fixation may play an important
role in augmenting raw fixation data. A grand objective
of this line of thinking is in explaining the reason for any observed fixation
as a combination of various latent factors. In the current work, we do not seek
to solve this problem in general, but rather to factor out the role of the
holistic structure of a scene as one observable, and quantifiable factor that
plays a role in determining fixational behavior. Statistical methods and
approximations to achieve this are presented, and supported by experimental
results demonstrating the efficacy of the proposed methods.

We show that the error in 3D gaze depth (vergence) estimated from
binocularly-tracked gaze disparity is related to the viewing distance of the
screen calibration plane at which 2D gaze is recorded. In a stereoscopic
(virtual) environment, this relationship is evident in gaze to target depth
error: vergence error behind the screen is greater than in front of the screen
and is lowest at the screen depth. In a physical environment, with no
accommodation-vergence conflict, the magnitude of vergence error in front of
the 2D calibration plane appears reversed, increasing with distance from the
viewer.

The effects of fast disparity adjustment in gaze-controlled stereoscopic
applications

With the emergence of affordable 3D displays, stereoscopy is becoming a
commodity. However, often users report discomfort even after brief exposures to
stereo content. One of the main reasons is the conflict between vergence and
accommodation that is caused by 3D displays. We investigate dynamic adjustment
of stereo parameters in a scene using gaze data in order to reduce discomfort.
In a user study, we measured stereo fusion times after abrupt manipulation of
disparities using gaze data. We found that gaze-controlled manipulation of
disparities can lower fusion times for large disparities. In addition we found
that gaze-controlled disparity adjustment should be applied in a personalized
manner and ideally performed only at the extremities or outside the comfort
zone of subjects. These results provide important insight into the problems
associated with fast disparity manipulation and are essential for developing
appealing gaze-contingent and gaze-controlled applications.

Gaze-contingent depth of field in realistic scenes: the user experience

Computer-generated objects presented on a display typically have the same
focal distance regardless of the monocular and binocular depth cues used to
portray a 3D scene. This is because they are presented on a flat screen display
that has a fixed physical location. In a stereoscopic 3D display, accommodation
(focus) of the eyes should always be at the distance of the screen for clear
vision regardless of the depth portrayed; this fixed accommodation conflicts
with vergence eye movements that the user must make to fuse stimuli located off
the screen. This is known as accommodation-vergence conflict and is detrimental
for user experience of stereoscopic virtual environments (VE), as it can cause
visual discomfort and diplopia during use of a stereoscopic display. It is
believed that, by artificially simulating focal blur and natural accommodation,
it is possible to compensate for the vergence-accommodation conflict and
alleviate these symptoms. We hypothesized that it is possible to compensate for
the conflict with a fixed accommodation cue by adding simulated focal blur
according to the instantaneous fixation.
We examined gaze-contingent depth of field (DOF) when used in stereoscopic
and non-stereoscopic 3D displays. We asked our participants to compare
different conditions in terms of depth perception, image quality and viewing
comfort. As expected, we found that monocular DOF gave a stronger impression of
depth than no depth of field, that stereoscopic cues were stronger than any
kind of monocular cues, and that adding depth of field to stereo displays did
not enhance depth impressions. The opposite was true for image comfort. People thought that
DOF impaired image quality in monocular viewing. We also observed that comfort
was affected by DOF and display mode in similar fashion as image quality.
However, the magnitude of the effects of DOF simulation on image quality
depended on whether people associated image quality with depth or not. These
results suggest that studies evaluating DOF effectiveness need to consider the
type of task, type of image and questions asked.

Research into driving skill, particularly of hazard perception, often
involves studies where participants either view pictures of driving scenarios
or use movie viewing paradigms. However, oculomotor strategies tend to change
between active and passive tasks, and attentional limitations are introduced
during real driving. Here we present a study using eye tracking methods to
contrast oculomotor behaviour differences across a passive video based hazard
perception task and an active hazard perception simulated driving task. The
differences presented highlight a requirement to study driving skill under more
active conditions, where the participant is engaged with a driving task. Our
results suggest that more standard, passive tests may have limited utility
when developing visual models of driving behaviour. The results presented here
have implications for driver safety measures and provide further insights into
how vision and action interact during natural activity.

Moving target acquisition is a challenging and manually stressful task if
performed using an all-manual, pointer-based interaction technique like mouse
interaction, especially if targets are small, move fast, and are visible on
screen only for a limited time. The MAGIC pointing interaction approach
combines the precision of manual, pointer-based interaction with the speed and
low manual stress of eye pointing. In this contribution, a pilot study with
twelve participants on moving target acquisition is presented using an abstract
experimental task derived from a video analysis scenario. Mouse input,
conservative MAGIC pointing and MAGIC button are compared considering
acquisition time, error rate, and user satisfaction. Although none of the
participants had used MAGIC pointing before, eight participants voted for MAGIC
button as their favorite technique; with it, participants performed with only
slightly higher mean acquisition time and error rate than with the familiar
mouse input. Conservative MAGIC pointing was preferred by three participants;
however, mean acquisition time and error rate were significantly worse than
with mouse input.

Analysis II: finding patterns in eye tracking data

Several algorithms, approaches, and implementations have been developed to
support comparison of scan paths and finding of interesting scan path
structures. In this work we contribute a visual approach to support scan path
comparison. A key feature of this approach is the combination of a clustering
algorithm using Levenshtein distance with the parallel scan path visualization
technique. The combination of computational methods with an interactive
visualization allows us to use both the power of pattern finding algorithms and
the human ability to visually recognize patterns. To use the concept in
practice we implemented the approach in a prototype and show its application in
two scan path analysis scenarios from automobile usability testing and
visualization research.
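
A minimal sketch of the Levenshtein distance used when clustering AOI-based
scan paths, assuming each scan path has been encoded as a string of AOI
labels; the resulting pairwise distance matrix could then be fed to any
hierarchical clustering routine:

```python
def levenshtein(a, b):
    """Edit distance between two AOI sequences (insertions, deletions, substitutions)."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        cur = [i]
        for j, cb in enumerate(b, start=1):
            cur.append(min(prev[j] + 1,                 # deletion
                           cur[j - 1] + 1,              # insertion
                           prev[j - 1] + (ca != cb)))   # substitution
        prev = cur
    return prev[-1]

# Scan paths encoded as sequences of fixated AOI labels
paths = ["AABCD", "ABCCD", "DDCBA"]
dist = [[levenshtein(p, q) for q in paths] for p in paths]
print(dist)
```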

Dynamical systems analysis tools, like Recurrence Plotting (RP), allow for
concise mathematical representations of complex systems with relatively simple
descriptive metrics. These methods are invariant for phase-space trajectories
of a time series from a dynamical system, allowing analyses on simplified data
sets which preserve the system model's dynamics. In the past decade, recurrence
methods have been applied to eye-tracking, but those analyses avoided
Time-Delay Embedding (TDE). Without TDE, we lose the assumption that
phase-space trajectories are being preserved in the recurrence plot. Thus,
analysis has been typically limited to clustering fixation locations in the
image space, instead of clustering data sequences in the phase space. We will
show how classical recurrence analysis methods can be extended to allow for
multi-modal data visualization and quantification by presenting an open-source
Python implementation for analyzing eye movements.
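
To make the time-delay embedding (TDE) step concrete, here is a small sketch
of a recurrence matrix computed from an embedded gaze signal (only the
horizontal component, for simplicity); the delay, embedding dimension, and
threshold are illustrative choices, not values from the paper:

```python
import numpy as np

def time_delay_embed(signal, dim=3, delay=5):
    """Embed a 1D time series into `dim`-dimensional phase space with lag `delay`."""
    n = len(signal) - (dim - 1) * delay
    return np.column_stack([signal[i * delay : i * delay + n] for i in range(dim)])

def recurrence_matrix(embedded, eps):
    """Binary recurrence matrix: R[i, j] = 1 if states i and j are within eps."""
    d = np.linalg.norm(embedded[:, None, :] - embedded[None, :, :], axis=-1)
    return (d <= eps).astype(int)

# Example: horizontal gaze coordinate sampled at 60 Hz
t = np.linspace(0, 10, 600)
gaze_x = 300 + 100 * np.sin(2 * np.pi * 0.4 * t)
R = recurrence_matrix(time_delay_embed(gaze_x, dim=3, delay=5), eps=20.0)
print(R.shape, R.mean())   # the recurrence rate is one simple RQA measure
```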

During eye tracking studies, vast amounts of spatio-temporal data in the
form of eye gaze trajectories are recorded. Finding insights into these
time-varying data sets is a challenging task. Visualization techniques such as
heat maps or gaze plots help find patterns in the data but highly aggregate the
data (heat maps) or are difficult to read due to overplotting (gaze plots). In
this paper, we propose transforming eye movement data into a dynamic graph data
structure to explore the visualization problem from a new perspective. By
aggregating gaze trajectories of participants over time periods or Areas of
Interest (AOIs), a fair trade-off between aggregation and details is achieved.
We show that existing dynamic graph visualizations can be used to display the
transformed data and illustrate the approach by applying it to eye tracking
data recorded for investigating the readability of tree diagrams.

The paper introduces a two-step method of quantifying eye movement
transitions between Areas of Interest (AOIs). First, individuals' gaze
switching patterns, represented by fixated AOI sequences, are modeled as Markov
chains. Second, Shannon's entropy coefficient of the fitted Markov model is
computed to quantify the complexity of individual switching patterns. To
determine the overall distribution of attention over AOIs, the entropy
coefficient of individuals' stationary distribution of fixations is calculated.
The novelty of the method is that it captures the variability of individual
differences in eye movement characteristics, which are then summarized
statistically. The method is demonstrated on gaze data collected during free
viewing of classical art paintings. Shannon's coefficient derived from
individual transition matrices is significantly related to participants'
individual differences as well as to their aesthetic experience of art pieces.
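
A small sketch of the two quantities involved, assuming a fixation sequence
already mapped to integer AOI labels; the entropy-rate formula shown is one
common convention and is an assumption, not necessarily the exact coefficient
used in the paper:

```python
import numpy as np

def transition_entropy(aoi_sequence, n_aois):
    """Entropy of the Markov transition structure of an AOI fixation sequence."""
    # Estimate the transition matrix from consecutive fixation pairs
    T = np.zeros((n_aois, n_aois))
    for a, b in zip(aoi_sequence[:-1], aoi_sequence[1:]):
        T[a, b] += 1
    T = T / np.maximum(T.sum(axis=1, keepdims=True), 1)

    # Stationary distribution approximated by the relative frequency of each AOI
    pi = np.bincount(aoi_sequence, minlength=n_aois) / len(aoi_sequence)

    # Entropy rate of the chain: H = -sum_i pi_i sum_j T_ij log2 T_ij
    logT = np.where(T > 0, np.log2(np.where(T > 0, T, 1.0)), 0.0)
    h_transition = -np.sum(pi[:, None] * T * logT)

    # Entropy of the stationary (overall attention) distribution
    h_stationary = -np.sum(np.where(pi > 0, pi * np.log2(np.where(pi > 0, pi, 1.0)), 0.0))
    return h_transition, h_stationary

print(transition_entropy([0, 1, 1, 2, 0, 1, 2, 2, 0], n_aois=3))
```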

Visual attention and eye movements

Sustained attention (vigilance) is required of many professionals, such as air
traffic controllers, imagery analysts, airport security screeners, and cyber
operators. A lapse in attention in any of these environments can have deadly
consequences. The purpose of this study was to determine the ability of
pupillometry to detect changes in vigilance performance. Each participant
performed a 40-minute vigilance task while wearing an eye-tracker on each of
four separate days. Pupil diameter, pupil eccentricity, and pupil velocity all
changed significantly over time (p<.05) during the task. Significant
correlations indicate that all metrics increased as vigilance performance
declined, except for pupil diameter, which decreased as the pupil became
miotic. These results are consistent with other research on attention, fatigue,
and arousal levels. Using an eye-tracker to detect changes in pupillometry in
an operational environment would allow interventions to be implemented.

Pupil size is known to correlate with changes of cognitive task workloads,
but the pupillary response to requirements of basic goal-directed motor tasks
is not yet clear, although pointing with tools is a ubiquitous human task. This
work describes a user study investigating pupil dilations during aiming in
two tele-operation tasks with different target settings: one aiming at targets
of different sizes located at a constant distance, and the other aiming at
targets at varying distances. The task requirements in each task
were defined by Fitts' index of difficulty (ID). The purpose of this work is to
further explore how the changes in task requirements are reflected by the
changes of pupil size, i.e., whether the pupil responds to either target size
or target distance, or to both of them. Pupil responses to different task IDs
were recorded in each task. The results showed that the pupil responds to the
changes of ID, not just to the change of target size. This implies that pupil
diameter can be employed as an indicator of task requirements in goal-directed
movements, because higher task difficulty evoked a higher peak pupil dilation,
which occurred with a longer delay. These findings can be used for detailed
understanding of eye-hand coordination mechanisms in interactive systems and
contribute to the foundation for developing methods to objectively evaluate
interactive task requirements using pupil parameters during goal-directed
movements.
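
For reference, one common (Shannon) formulation of Fitts' index of difficulty,
where D is the movement distance to the target and W is the target width, is
shown below; the abstract does not state which formulation was used, so this
is an assumption:

```latex
\mathrm{ID} = \log_2\!\left(\frac{D}{W} + 1\right)\ \text{bits}
```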

Smooth pursuit eye movements anticipate the future motion of targets when
future motion is either signaled by visual cues or inferred from past history.
To study the effect of anticipation derived from movement planning, the eye
pursued a cursor whose horizontal motion was controlled by the hand via a
mouse. The direction of a critical turn was specified by a cue or was freely
chosen. Information from planning to move the hand (which itself showed
anticipatory effects) elicited anticipatory smooth eye movements, allowing the
eye to track self-generated target motion with virtually no lag. Lags were
present only when either visual cues or motor cues were removed. The results
show that information derived from the planning of movement is as effective as
visual cues in generating anticipatory eye movements. Eye movements in dynamic
environments will be facilitated by collaborative anticipatory movements of
hand and eye. Cues derived from movement planning may be particularly valuable
in fast-paced human-computer interactions.

Exploring the influence of audio in directing visual attention during
dynamic content

The mechanisms underlying the allocation of visual attention toward dynamic
content are still largely unexplored. Due to the number of variables present
during dynamic content, it is often difficult to confidently determine what
components direct visual attention. In this study, we manipulated the presence
of audio in an attempt to explore the contribution of audio in driving visual
attention during dynamic content. Participants viewed a reel of non-global
commercials while their eye movements were recorded. Participants were either
exposed to content containing the original audio track or content in which the
audio track was edited out. Dynamic heat maps were created for each ad in order
to identify areas of high visual attention between the conditions. Fixation
durations and fixation counts for each area of interest were then computed.
Analyses showed that the presence of audio has an influence on the allocation
of visual attention during dynamic content, most notably in regard to on-screen
text. Understanding the influence of audio in directing visual attention may
help future researchers control for the extraneous influence of audio in
eye-tracking methodologies.

Overlaying visual cues on diagrams and animations can help students attend
to relevant areas and facilitate problem solving. In this study we investigated
the effects of visual cues on students' eye movements as they solved conceptual
physics problems. Students (N=80) enrolled in an introductory physics course
individually worked through four sets of problems, each containing a diagram,
while their eye movements were recorded. Each diagram contained regions that
were either relevant to solving the problem correctly or related to
common incorrect responses. Each problem set contained an initial problem, six
isomorphic training problems, and a transfer problem. Those in the cued
condition saw visual cues overlaid on the training problems. Students provided
verbal responses. The cued group more accurately answered the (uncued) transfer
problems, and their eye movements showed they more efficiently extracted the
necessary information from the relevant area than the uncued group.

Mobile eye tracking & applications

For validly analyzing human visual attention, it is often necessary to
proceed from computer-based desktop set-ups to more natural real-world
settings. However, the resulting loss of control has to be counterbalanced by
increasing participant and/or item count. Together with the effort required to
manually annotate the gaze-cursor videos recorded with mobile eye trackers,
this renders many studies unfeasible.
We tackle this issue by minimizing the need for manual annotation of mobile
gaze data. Our approach combines geometric modelling with inexpensive 3D marker
tracking to align virtual proxies with the real-world objects. This allows us
to classify fixations on objects of interest automatically while supporting a
completely free moving participant.
The paper presents the EyeSee3D method as well as a comparison of an
expensive outside-in (external cameras) and a low-cost inside-out (scene
camera) tracking of the eye-tracker's position. The EyeSee3D approach is
evaluated by comparing the results from automatic and manual classification of
fixation targets, which raises old problems of annotation validity in a modern
context.
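
A simplified sketch of the underlying geometric test: once marker tracking
gives the scene camera's pose, a fixation can be cast as a ray in world
coordinates and checked against a virtual proxy (here a sphere) standing in
for a real object. The names and the sphere proxy are illustrative
assumptions, not the EyeSee3D implementation:

```python
import numpy as np

def fixation_hits_proxy(ray_origin, ray_dir, center, radius):
    """True if the gaze ray passes within `radius` of the proxy sphere's center.

    ray_origin : 3D position of the scene camera (from marker-based pose estimation)
    ray_dir    : unit gaze direction in world coordinates
    center     : 3D position of the virtual proxy aligned with the real object
    """
    to_center = center - ray_origin
    t = np.dot(to_center, ray_dir)        # distance along the ray to the closest point
    if t < 0:                             # object lies behind the camera
        return False
    closest = ray_origin + t * ray_dir
    return np.linalg.norm(closest - center) <= radius

origin = np.array([0.0, 0.0, 0.0])
direction = np.array([0.0, 0.0, 1.0])        # looking straight ahead
proxy_center = np.array([0.05, 0.0, 1.5])    # an object 1.5 m away, 5 cm off axis
print(fixation_hits_proxy(origin, direction, proxy_center, radius=0.08))
```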

An investigation into determining head pose for gaze estimation on
unmodified mobile devices

Traditionally, devices which are able to determine a user's gaze are large,
expensive and often restrictive. We investigate the prospect of using common
webcams and mobile devices such as laptops, tablets and phones without
modification as an alternative means for obtaining a user's gaze. A person's
gaze can be fundamentally determined by the pose of the head as well as the
orientation of the eyes. This initial work investigates the first of these
factors -- an estimate of the 3D head pose (and subsequently the positions of
the eye centres) relative to a camera device. Specifically, we seek a low cost
algorithm that requires only a one-time calibration for an individual user,
that can run in real-time on the aforementioned mobile devices with noisy
camera data. We use our head tracker to estimate the 4 eye corners of a user
over a 10 second video. We present the results at several different frames per
second (fps) to analyse the impact on the tracker with lower quality cameras.
We show that our algorithm is efficient enough to run at 75fps on a common
laptop, but struggles with tracking loss when the fps is lower than 10fps.

Despite the widespread use of mobile phones and tablets, hand-held portable
devices have only recently been identified as a promising platform for
gaze-aware applications. Estimating gaze on portable devices is challenging
given their limited computational resources, low quality integrated
front-facing RGB cameras, and small screens to which gaze is mapped. In this
paper we present EyeTab, a model-based approach for binocular gaze estimation
that runs entirely on an unmodified tablet. EyeTab builds on a set of
established image processing and computer vision algorithms and adapts them for robust and
near-realtime gaze estimation. A technical prototype evaluation with eight
participants in a normal indoors office setting shows that EyeTab achieves an
average gaze estimation accuracy of 6.88° of visual angle at 12 frames per
second.

Humans see things from various viewpoints but nobody attempts to see
anything from every viewpoint owing to physical limitations and the great
effort required. Intelligent interfaces for viewing multi-viewpoint videos may
effectively remove these limitations and open up a new visual world to mankind.
We have developed a multi-viewpoint video viewer that incorporates
target-centered viewpoint switching. The viewer stabilizes an object at the
center of the display field, which helps to focus the user's gaze on the
target. We conducted a user study to analyze user behavior, especially eye
movement, while watching a multi-viewpoint video on the viewer. Statistical
analyses of the results indicated that the target-centered viewpoint switching
encouraged the users to gaze at the center of the display where the target was
located during the viewing. We believe that these are useful findings that pave
the way for the design of even more intelligent viewers.

The heatmap is one of the most popular visualizations of gaze behavior;
however, increasingly voluminous streams of eye-tracking data make processing
such visualizations computationally demanding. Because of the high requirements
on a single processing machine, real-time visualizations from multiple users are
unfeasible if rendered locally. We designed a framework that collects data from
multiple eye-trackers regardless of their physical location, analyses these
streams, and renders heatmaps in real-time. We propose a cloud computing
architecture (EyeCloud) consisting of master and slave nodes on a cloud
cluster, and a web interface for fast computation and effective aggregation of
the large volumes of eye-tracking data. In experimental studies of the
feasibility and effectiveness, we built a cloud cluster on a well-known
service, implemented the architecture and reported on a comparison between the
proposed system and traditional local processing. The results showed the
efficiency of EyeCloud when recordings vary in duration. To our knowledge, this is
the first solution to implement cloud computing for gaze visualization.

When evaluating eye tracking algorithms, a recurring issue is what metric to
use and what data to compare against. User studies are informative when
considering the entire eye tracking system; however, they are often
unsatisfactory for evaluating the gaze estimation algorithm in isolation. This
is particularly an issue when evaluating a system's component parts, such as
pupil detection, pupil-to-gaze mapping or head pose estimation.
Instead of user studies, eye tracking algorithms can be evaluated using
simulated input video. We describe a computer graphics approach to creating
realistic synthetic eye images, using a 3D model of the eye and head and a
physically correct rendering technique. By using rendering, we have full
control over the parameters of the scene such as the gaze vector or camera
position, which allows the calculation of ground truth data, while creating a
realistic input for a video-based gaze estimator.

We present in this paper a novel study aiming at identifying the differences
in visual search patterns between physicians of diverse levels of expertise
during the screening of colonoscopy videos. Physicians were clustered into two
groups -- experts and novices -- according to the number of procedures
performed, and fixations were captured by an eye-tracker device during the task
of polyp search in different video sequences. These fixations were integrated
into heat maps, one for each cluster. The obtained maps were validated over a
ground truth consisting of a mask of the polyp, and the comparison between
experts and novices was performed by using metrics such as reaction time,
dwelling time and energy concentration ratio. Experimental results show a
statistically significant difference between experts and novices, and the
obtained maps prove to be a useful tool for the characterisation of the
behaviour of each group.

Poster abstracts

Visual foraging is investigated by examining the nature of statistical
distributions underlying human search strategies. Eye movements uninfluenced by
scene perception or higher level cognition tasks are used to generate a data
set which can be analyzed to study 'pure' searches. Eye movements, in the form
of the 'jump' lengths constituting the entire search process, are studied to
detect the presence of statistical distributions whose parameters can be estimated.
Animal ecology studies have reported the presence of a Lévy flight/power
law model, which explains animal foraging patterns in a few species. We
consider a Lévy flight model to explain visual foraging. Results from data
analysis, while not ruling out the presence of a power law entirely, point
strongly towards a mixture distribution that faithfully explains visual
foraging. This mixture distribution is made up of gamma distributions.
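
A rough sketch of how competing distributional hypotheses could be compared on
saccade ("jump") lengths, here simplified to a single gamma component against a
power law rather than the full mixture described above; the data are synthetic
and all values are illustrative:

```python
import numpy as np
from scipy import stats

# Hypothetical saccade ("jump") lengths in degrees of visual angle
rng = np.random.default_rng(1)
jumps = rng.gamma(shape=2.0, scale=1.5, size=2000)

# Candidate 1: gamma distribution (location fixed at 0)
g_shape, g_loc, g_scale = stats.gamma.fit(jumps, floc=0)
ll_gamma = np.sum(stats.gamma.logpdf(jumps, g_shape, loc=g_loc, scale=g_scale))

# Candidate 2: power law (Pareto) with xmin at the smallest observed jump
xmin = jumps.min()
b_hat = len(jumps) / np.sum(np.log(jumps / xmin))   # maximum-likelihood exponent
ll_pareto = np.sum(stats.pareto.logpdf(jumps, b_hat, loc=0, scale=xmin))

# Compare with AIC (2 free parameters each); lower is better
print("AIC gamma :", 2 * 2 - 2 * ll_gamma)
print("AIC pareto:", 2 * 2 - 2 * ll_pareto)
```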

An eye-tracking study assessing the comprehension of C++ and Python source
code

A study to assess the effect of programming language on student
comprehension of source code is presented, comparing the languages of C++ and
Python in two task categories: overview and find bug tasks. Eye gazes are
tracked while thirty-eight students complete tasks and answer questions.
Results indicate no significant difference in accuracy or time; however, there
is a significant difference in the rate at which students look at
buggy lines of code. These results start to provide some direction as to the
effect programming language might have in introductory programming classes.

Attentional processes in natural reading: the effect of margin annotations
on reading behaviour and comprehension

We present an eye tracking study to investigate how natural reading behavior
and reading comprehension are influenced by in-context annotations. In a lab
experiment, three groups of participants were asked to read a text and answer
comprehension questions: a control group reading without taking annotations, a
second group reading and taking annotations, and a third group reading a
peer-annotated version of the same text. A custom-built head-mounted eye
tracking system was specifically designed for this experiment, in order to
study how learners read and quickly re-read annotated paper texts in low
constrained experimental conditions. In the analysis, we measured the phenomenon of
annotation-induced overt attention shifts in reading, and found that: (1) the
reader's attention shifts toward a margin annotation more often when the
annotation lies in the early peripheral vision, and (2) the number of attention
shifts, between two different types of information units, is positively related
to comprehension performance in quick re-reading. These results can be
translated into potential criteria for knowledge assessment systems.

We present a framework for collaborative image analysis where gaze
information is shared across all users. A server gathers and broadcasts
fixation data from/to all clients and the clients visualize this information.
Several visualization options are provided. The system can run in real-time or
gaze information can be recorded and shared the next time an image is accessed.
Our framework is scalable to large numbers of clients with different eye
tracking devices. To evaluate our system we used it within the context of a
spot-the-differences game. Subjects were presented with 10 image pairs each
containing 5 differences. They were given one minute to detect the differences
in each image. Our study was divided into three sessions. In session 1,
subjects completed the task individually, in session 2, pairs of subjects
completed the task without gaze sharing, and in session 3, pairs of subjects
completed the task with gaze sharing. We measured accuracy, time-to-completion
and visual coverage over each image to evaluate the performance of subjects in
each session. We found that visualizing shared gaze information by graying out
previously scrutinized regions of an image significantly increases the dwell
time in the areas of the images that are relevant to the task (i.e. the regions
where differences actually occurred). Furthermore, accuracy and
time-to-completion also improved over collaboration without gaze sharing,
though the effects were not significant. Our framework is useful for a wide range of
image analysis applications which can benefit from a collaborative approach.

Design issues of remote eye tracking systems with large range of movement

One of the goals of the eye tracking community is to build systems that
allow users to move freely. In general, there is a trade-off between the field
of view of an eye tracking system and the gaze estimation accuracy. We aim to
study how much the field of view of an eye tracking system can be increased,
while maintaining acceptable accuracy. In this paper, we investigate the issues
concerning remote eye tracking systems with a large range of movement in a
simulated environment, and we give some guidelines that can facilitate the
process of designing an eye tracker. Given a desired range of movement and a
working distance, we can calculate the camera focal length and sensor size; or,
given a certain camera, we can determine the user's range of movement. The
robustness against large head movements of two gaze estimation methods based on
infrared light is analyzed: an interpolation-based and a geometrical method. We
relate the accuracy of the gaze estimation methods with the image resolution
around the eye area for a certain feature detector's accuracy and provide
possible combinations of pixel size and focal length for different gaze
estimation accuracies. Finally, we give the gaze estimation accuracy as a
function of a newly defined eye error, which is independent of any design
parameters.

Development of an untethered, mobile, low-cost head-mounted eye tracker

Head-mounted eye-tracking systems allow us to observe participants' gaze
behaviors in largely unconstrained, real-world settings. We have developed
novel, untethered, mobile, low-cost, lightweight, easily-assembled head-mounted
eye-tracking devices, composed entirely of off-the-shelf components, including
untethered, point-of-view sports cameras. In total, the parts we have used
cost $153, and we suggest untested alternative components that reduce the cost
of parts to $31. Our device can be easily assembled using hobbyist skills and
techniques. We have developed hardware, software, and methodological techniques
to perform point-of-regard estimation, and to temporally align scene and eye
videos in the face of variable frame rate, which plagues low-cost, lightweight,
untethered cameras. We describe an innovative technique for synchronizing eye
and scene videos using synchronized flashing lights. Our hardware, software,
and calibration designs will be made publicly available, and we describe them
in detail here, to facilitate replication of our system. We also describe a
novel smooth-pursuit-based calibration methodology, which affords rich sampling of
calibration data while compensating for lack of information regarding the
extent of visibility on participants' scene recordings. Validation experiments
indicate accuracy within 0.752 degrees of visual angle on average.
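
One simple way to realize the flash-based synchronization described above is
to extract a per-frame brightness signal from each video and align the two
signals by cross-correlation. This sketch assumes both signals have already
been resampled to a common rate and is purely illustrative of the idea, not
the authors' implementation:

```python
import numpy as np

def flash_offset(eye_brightness, scene_brightness):
    """Lag k (in frames) such that eye_brightness[t + k] best matches scene_brightness[t].

    Both inputs are 1D arrays of mean frame brightness; the synchronized
    flashes appear as shared peaks in both signals.
    """
    a = eye_brightness - np.mean(eye_brightness)
    b = scene_brightness - np.mean(scene_brightness)
    corr = np.correlate(a, b, mode="full")
    # Index of maximum correlation gives the lag between the two videos
    return np.argmax(corr) - (len(b) - 1)

# Example: a shared flash at frame 100 in the eye video and frame 112 in the scene video
eye = np.zeros(600);   eye[100:103] = 1.0
scene = np.zeros(600); scene[112:115] = 1.0
print(flash_offset(eye, scene))   # -> -12: the flash appears 12 frames later in the scene video
```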

Recently, eye trackers have been developed as daily-use devices. However,
when an eye tracker is used daily, the problem of calibration arises. Even when
the calibration for computing the relationship between the scene and eye
cameras is conducted in advance, the relationship is not maintained during
prolonged use. Therefore, we propose a method for preserving the relationship
between the scene and eye cameras during the execution of an eye-tracking program. The
texture information of the corneal surface image is used to estimate the
point-of-regard. We confirm the feasibility of the proposed method through
preliminary experiments.

EYEDIAP: a database for the development and evaluation of gaze estimation
algorithms from RGB and RGB-D cameras

The lack of a common benchmark for the evaluation of the gaze estimation
task from RGB and RGB-D data is a serious limitation for distinguishing the
advantages and disadvantages of the many proposed algorithms found in the
literature. This paper intends to overcome this limitation by introducing a
novel database along with a common framework for the training and evaluation of
gaze estimation approaches. In particular, we have designed this database to
enable the evaluation of the robustness of algorithms with respect to the main
challenges associated with this task: i) head pose variations; ii) person
variation; iii) changes in ambient and sensing conditions; and iv) types of
target: screen or 3D object.

Eye tracking research in disciplines such as cognitive psychology requires
specific software packages designed for experiments supporting reaction time
measurement, blocking and mixing of conditions, and item randomisation.
Although recording raw eye movement data is possible, visualising it with
respect to the experimental design is difficult. Currently used eye tracking
software is often built as an all-in-one program that can only visualise the
eye tracking data recorded by itself. Therefore, in this paper a software tool
is presented that visualises nearly any recorded eye tracking gaze data on the
corresponding video, independently of the specific software that runs the experiment. Summarised
visualisations over randomised item presentations according to experimental
conditions can be created. In addition to basic visualisation functionalities,
further features such as simple object detection, repetitive pattern
exploration and subset selection of subjects are provided.

Print interpreting is a form of communication that allows deaf and hard of
hearing people to get access to speech. We carried out an eye tracking
experiment where twenty participants read print interpreted text presented
dynamically on a computer screen. We compared regression landing points on
reread words between two dynamic text presentation formats: letter-by-letter
and word-by-word. Then we investigated the gaze behaviour from a linguistic
point of view in order to discover whether the dynamic presentation has an
effect on linguistic factors. In particular, we have examined the parts of
speech of the first and the second landing points of regressions. The findings
suggest a significant difference between the presentation formats. There is
also a relationship between the gaze behaviour and the linguistic processing of
dynamic text. Being conscious of this lexical hierarchy may help to develop
supporting print interpreting tools and consequently may also help print
interpreters to improve the presentation of dynamic text to the user.

The cross-ratio approach has recently attracted increasing attention in
eye-gaze tracking due to its simplicity in setting up a tracking system. Its
accuracy, however, is lower than that of the model-based approach, and
substantial efforts have been devoted to improving its accuracy. Binocular
fixation is essential for humans to have good depth perception, and this paper
presents a technique leveraging this constraint. The constraint is used in two
ways: first, in jointly estimating the homography matrices for both eyes, and
second, in estimating the eye gaze itself. Experimental results with both synthetic and
real data show that the proposed approach produces significantly better results
than using a single eye and also better than averaging the independent results
from the two eyes.

Influence of stimulus and viewing task types on a learning-based visual
saliency model

Learning-based approaches using actual human gaze data have proven to be an
efficient way to acquire accurate visual saliency models and have attracted
much interest in recent years. However, it remains to be answered how
different types of stimuli (e.g., fractal images, and natural images with or
without human faces) and viewing tasks (e.g., free viewing or a preference
rating task) affect learned visual saliency models. In this study, we
quantitatively investigate how learned saliency models differ when using
datasets collected in different settings (image contextual level and viewing
task) and discuss the importance of choosing appropriate experimental settings.

Traditional content-based image retrieval techniques, which primarily rely
on image content at the pixel level, are not effective in accessing images at
the semantic level. Defining approaches to incorporate experts' perceptual and
conceptual capabilities of image understanding in their domain of expertise
into the retrieval processes promises to help bridge this semantic gap. Towards
accomplishing this, we design and implement a novel multimodal interactive
system for image retrieval. To incorporate human expertise, the system stores
expert-derived information extracted from two human sensor modalities that
intuitively relate to image search: eye movements and verbal descriptions, both
generated by medical experts. Experimental evaluation of the system shows that
by transferring experts' perceptual expertise and domain knowledge into
image-based computational procedures, our system can take advantage of the
different human-centered modalities' respective strengths and improve the
retrieval performance over just using image-based features.

Machine-extracted eye gaze features: how well do they correlate to
sight-reading abilities of piano players?

Skilled piano players are able to decipher and play a musical piece they have
never seen before (a skill known as sight-reading). For a sample of 23 piano
players of various abilities, we consider the correlation between
machine-extracted gaze path features and the overall human rating. We find that
correlation values (between machine-extracted gaze features and overall human
ratings) are statistically similar to correlation values between
human-extracted task-related ratings (e.g., note accuracy, error rate) and
overall human ratings. These high correlation values suggest that an eye
tracking-enabled computer could help students assess their sight-reading
abilities, and could possibly advise students on how to improve. The approach
could be extended to any musical instrument. For keyboard players, a MIDI
keyboard with the appropriate software to provide information about note
accuracy and timing could complement feedback from an eye tracker to enable
more detailed analysis and advice.

Relevance is a fundamental concept in information retrieval. We consider
relevance from the user's perspective and ask if the degree of relevance can be
inferred from eye-tracking data and if it is related to the cognitive effort
involved in relevance judgments. To this end, we conducted a study in which
participants were asked to find information in screen-long text documents
containing news stories. Each participant responded to fourteen trials
consisting of an information question followed by three documents each at a
different level of relevance (irrelevant, partially relevant, and relevant).
The results indicate that relevant documents tended to be continuously read,
while irrelevant documents tended to be scanned. In most cases, cognitive
effort inferred from eye-tracking data was highest for partially relevant
documents and lowest for irrelevant documents.

Since Yarbus's seminal work in 1965, vision scientists have argued that
people's eye movement patterns differ depending upon their task. This suggests
that we may be able to infer a person's task (or mental state) from their eye
movements alone. Recently, this was attempted by Greene et al. [2012] in a
Yarbus-like replication study; however, they were unable to successfully
predict the task given to their observer. We reanalyze their data, and show
that by using more powerful algorithms it is possible to predict the observer's
task. We also used our algorithms to infer the image being viewed by an
observer and their identity. More generally, we show how off-the-shelf
algorithms from machine learning can be used to make inferences from an
observer's eye movements, using an approach we call Multi-Fixation Pattern
Analysis (MFPA).

The accuracy of gaze point estimation is one of the main limiting factors in
developing applications that utilize gaze input. The existing gaze point
correction methods either do not support real-time interaction or imply
restrictions on gaze-controlled tasks and object screen locations. We
hypothesize that when gaze points can be reliably correlated with object screen
locations, it is possible to gather and leverage this information for improving
the accuracy of gaze pointing. We propose an algorithm that uses a growing pool
of such collected correlations between gaze points and objects for real-time
hidden gaze point correction. We tested this algorithm assuming that any point
inside of a rectangular object has equal probability to be hit by gaze. We
collected real data in a user study to simulate pointing at targets of small
(<30px), medium (~50px) and large (>80px) size. The results showed that
our algorithm can significantly improve the hit rate, especially when pointing
at medium-sized targets. The proposed method is real-time, person- and
task-independent, and is applicable to arbitrarily located objects.
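
A much-simplified sketch of the idea of pooling gaze-object correlations for
hidden correction: whenever a gaze point can be confidently attributed to an
object, store the offset between the two, and correct subsequent gaze points
by a locally averaged offset. The pooling and weighting scheme here is an
illustrative assumption, not the paper's algorithm:

```python
import numpy as np

class HiddenGazeCorrector:
    """Accumulate gaze-to-object offsets and use them to correct new gaze points."""

    def __init__(self):
        self.gaze_points = []   # raw gaze positions at the time of attribution
        self.offsets = []       # object_center - gaze at those positions

    def add_observation(self, gaze, object_center):
        """Record one confident gaze-object correlation (e.g., on object selection)."""
        self.gaze_points.append(np.asarray(gaze, float))
        self.offsets.append(np.asarray(object_center, float) - np.asarray(gaze, float))

    def correct(self, gaze, bandwidth=200.0):
        """Shift a raw gaze point by a distance-weighted average of nearby offsets."""
        gaze = np.asarray(gaze, float)
        if not self.offsets:
            return gaze
        pts = np.stack(self.gaze_points)
        offs = np.stack(self.offsets)
        w = np.exp(-np.sum((pts - gaze) ** 2, axis=1) / (2 * bandwidth**2))
        return gaze + (w[:, None] * offs).sum(axis=0) / w.sum()

corrector = HiddenGazeCorrector()
corrector.add_observation(gaze=(410, 300), object_center=(400, 310))
corrector.add_observation(gaze=(615, 295), object_center=(600, 305))
print(corrector.correct((500, 298)))
```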

In this paper, a novel approach for real-time heatmap generation and
visualization of 3D gaze data is presented. By projecting the gaze into the
scene and considering occlusions from the observer's view, a correct
visualization of the actual scene perception in 3D environments is provided
for, to our knowledge, the first time. Based on a graphics-centric approach utilizing the
graphics pipeline, shaders and several optimization techniques, heatmap
rendering is fast enough for an interactive online and offline gaze analysis of
thousands of gaze samples.

Recognition of translator expertise using sequences of fixations and
keystrokes

Professional human translation is necessary to meet high quality standards
in industry and governmental agencies. Translators engage in multiple
activities during their task, and there is a need to model their behavior, with
the objective of understanding and optimizing the translation process. In
recent years, user interfaces have enabled us to record user events such as eye
movements or keystrokes. Although there have been insightful descriptive
analyses of the translation process, there are multiple advantages in enabling
quantitative inference. We present methods to classify sequences of fixations
and keystrokes into activities and model translation sessions with the
objective of recognizing translator expertise. We show significant error reductions in the task of
recognizing certified translators and their years of experience, and analyze
the characterizing patterns.

Understanding and characterizing perceptual expertise is a major bottleneck
in developing intelligent systems. In knowledge-rich domains such as
dermatology, perceptual expertise influences the diagnostic inferences made
based on the visual input. This study uses eye movement data from 12
dermatology experts and 12 undergraduate novices while they inspected 34
dermatological images. This work investigates the differences in global and
local temporal fixation patterns between the two groups using recurrence
quantification analysis (RQA). The RQA measures reveal significant differences
in both global and local temporal patterns between the two groups. Results show
that experts tended to refixate previously inspected areas less often than did
novices, and their refixations were more widely separated in time. Experts were
also less likely to follow extended scan paths repeatedly than were novices.
These results suggest the potential value of RQA measures in characterizing
perceptual expertise. We also discuss potential use of the RQA method in
understanding the interactions between experts' visual and linguistic behavior.

Visualization by heat maps is a powerful technique for showing frequently
visited areas in displayed stimuli. However, by aggregating the spatio-temporal
data, heat maps lose the information about the transitions between fixations,
i.e., the saccades. In gaze plots, instead, trajectories are shown as
overplotted polylines, leading to much visual clutter, which makes those
diagrams difficult to read. In this paper, we introduce Saccade Plots as a
novel technique that combines the benefits of both approaches: it shows the
gaze frequencies as a heat map and the saccades in the form of color-coded
triangular matrices that surround the heat map. We illustrate the usefulness of
our technique by applying it to a representative example from a previously
conducted eye tracking study.

To create input videos for testing pupil detection algorithms for outdoor
eye tracking, we develop a simulation of the eye with front-surface reflections
off the cornea and internal refraction at the air/cornea and cornea/aqueous
boundaries. The scene and iris are simulated
using texture mapping and are alpha-blended to produce the final image of the
eye with reflections and refractions. The simulation of refraction is important
in order to observe the elliptical shape that the pupil takes on as it goes off
axis, and to take into consideration the difference between true pupil position
and apparent (entrance) pupil position. Sequences of images are combined to
produce input videos for testing the next generation of pupil detection and
tracking algorithms, which must sort the pupil out of distracting edges and
reflected objects.

Starting to get bored: an outdoor eye tracking study of tourists exploring a
city panorama

Predicting the moment when a visual explorer of a place loses interest and
starts to get bored is of considerable importance to the design of touristic
information services. This paper investigates factors affecting the duration of
the visual exploration of a city panorama. We report on an empirical outdoor
eye tracking study in the real world with tourists following a free exploration
paradigm without a time limit. As the main result, the number of areas of
interest revisited within a short period was found to be a good predictor of
the total exploration duration.

SubsMatch: scanpath similarity in dynamic scenes based on subsequence
frequencies

The analysis of visual scanpaths, i.e., series of fixations and saccades, in
complex dynamic scenarios is highly challenging and usually performed manually.
We propose SubsMatch, a scanpath comparison algorithm for dynamic, interactive
scenarios based on the frequency of repeated gaze patterns. Instead of
measuring the gaze duration towards a semantic target object (which would be
hard to label in dynamic scenes), we examine the frequency of attention shifts
and exploratory eye movements. SubsMatch was evaluated on highly dynamic data
from a driving experiment to identify differences between scanpaths of subjects
who failed a driving test and subjects who passed.
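
The SubsMatch reference implementation is not given in the abstract; the
subsequence-frequency idea can be illustrated with a small hypothetical sketch
that symbolizes scanpaths as strings (e.g., grid-cell labels) and compares
their n-gram frequency distributions:

```python
from collections import Counter

def subsequence_frequencies(scanpath, n=3):
    """Relative frequencies of length-n subsequences of a symbolised scanpath."""
    grams = [scanpath[i:i + n] for i in range(len(scanpath) - n + 1)]
    counts = Counter(grams)
    total = sum(counts.values())
    return {g: c / total for g, c in counts.items()}

def subsmatch_distance(path_a, path_b, n=3):
    """Compare two scanpaths by the summed difference of their subsequence
    frequency distributions (a hypothetical reimplementation of the idea,
    not the authors' reference code)."""
    fa = subsequence_frequencies(path_a, n)
    fb = subsequence_frequencies(path_b, n)
    keys = set(fa) | set(fb)
    return sum(abs(fa.get(k, 0.0) - fb.get(k, 0.0)) for k in keys)
```

For example, subsmatch_distance("ABAB", "ABBA", n=2) is non-zero because the
two paths favour different transitions, without any reference to what the
symbols semantically denote.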

The applicability of probabilistic methods to the online recognition of
fixations and saccades in dynamic scenes

In many applications involving scanpath analysis, especially when dynamic
scenes are viewed, consecutive fixations and saccades have to be identified
and extracted from raw eye-tracking data in an online fashion. Since
probabilistic methods can adapt not only to the individual viewing behavior
but also to changes in the scene, they are best suited for such tasks.
In this paper we analyze the applicability of two types of mainstream
probabilistic models to the identification of fixations and saccades in dynamic
scenes: (1) Hidden Markov Models and (2) Bayesian Online Mixture Models. We
analyze and compare the classification performance of the models on
eye-tracking data collected during real-world driving experiments.
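
The models evaluated in the paper are not specified beyond their names; as a
hedged stand-in, a batch two-component Gaussian mixture over point-to-point
velocities illustrates the mixture idea (the low-velocity component is read as
fixation), without the online updating or HMM temporal smoothing the paper
examines:

```python
import numpy as np

def classify_fixations_mixture(velocities, iters=50):
    """Label gaze samples as fixation (0) or saccade (1) using a simple
    two-component Gaussian mixture over point-to-point velocities (batch EM).
    """
    v = np.asarray(velocities, dtype=float)
    mu = np.array([np.percentile(v, 25), np.percentile(v, 90)])
    var = np.array([v.var(), v.var()]) + 1e-9
    pi = np.array([0.8, 0.2])
    for _ in range(iters):
        # E-step: responsibilities of each component for each sample.
        pdf = np.exp(-(v[:, None] - mu) ** 2 / (2 * var)) / np.sqrt(2 * np.pi * var)
        resp = pi * pdf
        resp /= resp.sum(axis=1, keepdims=True)
        # M-step: update mixture parameters.
        nk = resp.sum(axis=0)
        mu = (resp * v[:, None]).sum(axis=0) / nk
        var = (resp * (v[:, None] - mu) ** 2).sum(axis=0) / nk + 1e-9
        pi = nk / len(v)
    slow = np.argmin(mu)                 # low-velocity component = fixation
    return (resp.argmax(axis=1) != slow).astype(int)
```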

Consistent measurement and reporting of gaze data quality is important in
research that involves eye trackers. We have developed TraQuMe, a generic
system for evaluating gaze data quality. The quality measurement is fast, and
the interpretation of the results is aided by graphical output. Numeric data is
saved for reporting aggregate metrics for the whole experiment. We tested
TraQuMe in the context of a novel hidden calibration procedure that we
developed to aid experiments in which participants should not know that their
gaze is being tracked. The quality of tracking data after the hidden
calibration procedure was very close to that obtained with the Tobii T60
tracker's built-in 2-point, 5-point, and 9-point calibrations.
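
TraQuMe's own metrics are not listed in the abstract; a minimal sketch of two
widely used gaze quality measures, accuracy (mean offset from a known fixation
target) and precision (sample-to-sample RMS), assuming samples have already
been converted to the desired unit, might look like this:

```python
import numpy as np

def gaze_quality(samples, target):
    """Accuracy and precision of gaze samples around a known target.

    samples : (N, 2) gaze positions recorded while the participant fixates
              `target`; units are whatever the caller provides (e.g. degrees
              of visual angle after conversion).
    """
    samples = np.asarray(samples, dtype=float)
    offsets = np.linalg.norm(samples - np.asarray(target, dtype=float), axis=1)
    accuracy = offsets.mean()                       # mean offset from target
    step = np.linalg.norm(np.diff(samples, axis=0), axis=1)
    rms_precision = np.sqrt((step ** 2).mean())     # sample-to-sample RMS
    return accuracy, rms_precision
```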

Novices were trained to perform a unimanual peg transport task in a
laparoscopic training box with an illuminated interior displayed on a monitor.
Subjects were divided into two groups: one group was verbally instructed to
direct their gaze at distant targets, while the other group had their gaze
behaviour implicitly manipulated through illumination of the distant target.
Both groups achieved similar task completion times post-training and developed
peripheral vision strategies, delaying foveation on targets until the
instrument was closer to its destination, although the ability to focus on
targets earlier during manual movements, as an expert surgeon does, was quickly
regained post-training by the verbally instructed group. This suggests that
care should be taken when employing visual attention cuing methods such as
target highlighting for training eye-hand coordination skills, as simple verbal
instruction may be sufficient to help trainees adopt more expert-like gaze
behaviours.

What influences dwell time during source code reading?: analysis of element
type and frequency as factors

While knowledge about reading behavior in natural-language text is abundant,
little is known about the distribution of visual attention when reading the
source code of computer programs. Yet this knowledge is important for teaching
programming skills as well as for designing IDEs and programming languages. We
conducted a study in which 15 programmers of varying expertise read short
source code while their eye movements were recorded. To study the distribution
of attention over code elements, we introduced the following procedure: first,
we (pre-)processed the eye movement data using a log-transformation; then,
taking word lengths into account, we analyzed the time spent on different
lexical elements. The results show that most attention is directed towards
understanding identifiers, operators, keywords, and literals, while relatively
little reading time is spent on separators. We further inspected the attention
on keywords and provide a description of the gaze on these primary building
blocks of any formal language. The analysis indicates that approaches from
research on natural-language text reading can be applied to source code as
well, although not without revision.
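
The exact preprocessing is only outlined in the abstract; the following sketch
illustrates one plausible reading of it, with log-transformed fixation
durations normalized by element length and aggregated per lexical element
class (the column names are assumptions, not the study's actual schema):

```python
import numpy as np
import pandas as pd

def dwell_per_element(gaze_df):
    """Length-corrected, log-transformed dwell time per lexical element class.

    gaze_df is assumed to hold one row per fixation with the columns
    'element_type' (identifier, operator, keyword, literal, separator),
    'duration_ms', and 'element_length' (number of characters).
    """
    df = gaze_df.copy()
    # Log-transform skewed fixation durations, then normalise by element length.
    df["log_dwell"] = np.log1p(df["duration_ms"])
    df["dwell_per_char"] = df["log_dwell"] / df["element_length"].clip(lower=1)
    return (df.groupby("element_type")["dwell_per_char"]
              .mean()
              .sort_values(ascending=False))
```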

Demo/video session

Several algorithms, approaches, and implementations have been developed to
support the comparison of scan paths and the discovery of interesting scan path
structures. In this work we contribute a visual approach to support scan path
comparison. A key feature of this approach is the combination of a clustering
algorithm using Levenshtein distance with the parallel scan path visualization
technique. Combining computational methods with an interactive visualization
allows us to use both the power of pattern-finding algorithms and the human
ability to visually recognize patterns. To put the concept into practice, we
implemented the approach in a prototype and show its application in two scan
path analysis scenarios, from automobile usability testing and from
visualization research.
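
The clustering relies on the Levenshtein distance between AOI-coded scanpath
strings; a compact implementation that could serve as the distance function
for such a clustering (not the authors' code) is:

```python
def levenshtein(a, b):
    """Edit distance between two AOI-coded scanpath strings."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        cur = [i]
        for j, cb in enumerate(b, start=1):
            cur.append(min(prev[j] + 1,                 # deletion
                           cur[j - 1] + 1,              # insertion
                           prev[j - 1] + (ca != cb)))   # substitution
        prev = cur
    return prev[-1]
```

For instance, levenshtein("ABAC", "ABC") returns 1, the single deletion needed
to align the two paths.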

We present in this paper a novel study aimed at identifying the differences
in visual search patterns between physicians of diverse levels of expertise
during the screening of colonoscopy videos. Physicians were clustered into two
groups -- experts and novices -- according to the number of procedures
performed, and fixations were captured by an eye-tracking device during the
task of polyp search in different video sequences. These fixations were
integrated into heat maps, one for each cluster. The obtained maps were
validated against a ground truth consisting of a mask of the polyp, and the
comparison between experts and novices was performed using metrics such as
reaction time, dwelling time, and energy concentration ratio. Experimental
results show a statistically significant difference between experts and
novices, and the obtained maps prove to be a useful tool for characterising the
behaviour of each group.
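
The abstract does not define the energy concentration ratio formally; one
plausible formalization, the share of heat map mass that falls inside the polyp
ground-truth mask, can be written as:

```python
import numpy as np

def energy_concentration_ratio(heatmap, mask):
    """Share of heat map energy falling inside the polyp ground-truth mask.

    heatmap : (H, W) non-negative fixation density
    mask    : (H, W) boolean polyp mask
    """
    total = heatmap.sum()
    return float(heatmap[mask.astype(bool)].sum() / total) if total > 0 else 0.0
```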

We introduce a new design for the visual analysis of eye tracking data
recorded from dynamic stimuli such as video. ISeeCube includes multiple
coordinated views to support different aspects of various analysis tasks. It
combines methods for the spatiotemporal analysis of gaze data recorded from
unlabeled videos as well as the possibility to annotate and investigate dynamic
Areas of Interest (AOIs). A static overview of the complete data set is
provided by a space-time cube visualization that shows gaze points with
density-based color mapping and spatiotemporal clustering of the data. A
timeline visualization supports the analysis of dynamic AOIs and the viewers'
attention on them. AOI-based scanpaths of different viewers can be clustered by
their Levenshtein distance, an attention map, or the transitions between AOIs.
With the provided visual analytics techniques, the exploration of eye tracking
data recorded from several viewers is supported for a wide range of analysis
tasks.

Continuous, real-time tracking of eye gaze is valuable in a variety of
scenarios including hands-free interaction with the physical world, detection
of unsafe behaviors, leveraging visual context for advertising, life logging,
and others. While eye tracking is commonly used in clinical trials and user
studies, it has not bridged the gap to everyday consumer use. The challenge is
that a real-time eye tracker is a power-hungry and computation-intensive device
which requires continuous sensing of the eye using an imager running at many
tens of frames per second, and continuous processing of the image stream using
sophisticated gaze estimation algorithms. Our key contribution is the design of
an eye tracker that dramatically reduces the sensing and computation needs for
eye tracking, thereby achieving orders of magnitude reductions in power
consumption and form-factor. The key idea is that eye images are extremely
redundant, therefore we can estimate gaze by using a small subset of carefully
chosen pixels per frame. We use a sparse pixel-based gaze estimation algorithm
that is a multi-layer neural network learned using a state-of-the-art
sparsity-inducing regularization function which minimizes the gaze prediction
error while simultaneously minimizing the number of pixels used. Our results
show that we can operate at roughly 70mW of power, while continuously
estimating eye gaze at the rate of 30 Hz with errors of roughly 4 degrees.
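
The paper's multi-layer network and its specific sparsity-inducing regularizer
are not reproduced here; as a hedged illustration of coupling pixel selection
with gaze regression, a linear multi-output model with a row-sparsity penalty
(scikit-learn's MultiTaskLasso) zeroes out whole pixels across both gaze
coordinates:

```python
import numpy as np
from sklearn.linear_model import MultiTaskLasso

def select_gaze_pixels(eye_images, gaze_xy, alpha=0.05):
    """Jointly learn a gaze regressor and a sparse pixel subset.

    eye_images : (N, H, W) grayscale eye images
    gaze_xy    : (N, 2) ground-truth gaze coordinates
    The row-sparsity of MultiTaskLasso removes whole pixels from both
    outputs at once, mimicking pixel-subset selection with a linear model
    instead of the multi-layer network described in the paper.
    """
    n, h, w = eye_images.shape
    X = eye_images.reshape(n, h * w).astype(float) / 255.0
    model = MultiTaskLasso(alpha=alpha, max_iter=5000).fit(X, gaze_xy)
    used = np.flatnonzero(np.abs(model.coef_).sum(axis=0) > 0)  # active pixels
    return model, used
```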

To solve complex tasks cooperatively in close interaction with humans,
robots need to understand natural human communication. To achieve this, robots
could benefit from a deeper understanding of the processes that humans use for
successful communication. Such skills can be studied by investigating human
face-to-face interactions in complex tasks. In our work the focus lies on
shared-space interactions in a path planning task, and thus 3D gaze directions
and hand movements are of particular interest.
However, the analysis of gaze and gestures is a time-consuming task: usually,
the eye tracker's scene camera video has to be annotated manually, frame by
frame. To tackle this issue, an automatic approach for annotating interactions,
based on the EyeSee3D method, is presented: a combination of geometric modeling
and 3D marker tracking serves to align real-world stimuli with virtual proxies.
This is done based on the scene camera images of the mobile eye tracker alone.
In addition to the EyeSee3D approach, face detection is used to automatically
detect fixations on the interlocutor. For the acquisition of gestures, an
optical marker tracking system is integrated and its output fused into the
multimodal representation of the communicative situation.

In this work we describe a method of pupil detection for subsequent gaze
tracking, when specular reflection is present in the image. Gaze tracking
commonly uses the spatial relationship between the pupil and corneal
reflection, but is not robust when the user is wearing eyeglasses, since light
reflected from the surroundings changes the appearance of the pupil. In this
research we propose and evaluate a pupil detection method that can perform
robustly even in the presence of such reflection.

This document describes the software framework of an ocular biometric
system. The framework encompasses several interconnected components that allow
an end-user to perform biometric enrollment, verification, and identification
with most common eye tracking devices. The framework, written in C#, includes
multiple state-of-the-art biometric algorithms and information fusion
techniques, and can be easily extended to utilize new biometric techniques and
eye tracking devices.

To validly analyze human visual attention, it is often necessary to move
from computer-based desktop set-ups to more natural real-world settings.
However, the resulting loss of control has to be counterbalanced by increasing
the participant and/or item count. Together with the effort required to
manually annotate the gaze-cursor videos recorded with mobile eye trackers,
this renders many studies infeasible.
We tackle this issue by minimizing the need for manual annotation of mobile
gaze data. Our approach combines geometric modelling with inexpensive 3D marker
tracking to align virtual proxies with real-world objects. This allows us to
classify fixations on objects of interest automatically while supporting a
completely freely moving participant.
The paper presents the EyeSee3D method as well as a comparison of an expensive
outside-in (external cameras) and a low-cost inside-out (scene camera) approach
to tracking the eye tracker's position. The EyeSee3D approach is evaluated by
comparing the results of automatic and manual classification of fixation
targets, which raises old problems of annotation validity in a modern context.
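
EyeSee3D's geometric proxies are only described at a high level; a simplified
sketch of the automatic fixation classification step, intersecting the 3D gaze
ray with hypothetical bounding spheres placed in the marker-derived world
frame, could look like this:

```python
import numpy as np

def classify_fixation(gaze_origin, gaze_dir, proxies):
    """Assign a 3D gaze ray to the nearest intersected proxy object.

    proxies : dict mapping object name to (centre, radius) of a bounding
    sphere in the same world coordinate system as the gaze ray.
    Returns the name of the hit object, or None if nothing is hit.
    """
    d = np.asarray(gaze_dir, dtype=float)
    d /= np.linalg.norm(d)
    o = np.asarray(gaze_origin, dtype=float)
    best, best_t = None, np.inf
    for name, (centre, radius) in proxies.items():
        oc = o - np.asarray(centre, dtype=float)
        b = np.dot(oc, d)
        disc = b * b - (np.dot(oc, oc) - radius ** 2)
        if disc < 0:
            continue                         # ray misses this sphere
        t = -b - np.sqrt(disc)               # nearest intersection distance
        if 0 < t < best_t:
            best, best_t = name, t
    return best
```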

Doctoral symposium extended abstracts

Many different eye-tracking calibration techniques have been developed [e.g.
see Talmi and Liu 1999; Zhu and Ji 2007]. A community standard is a sparse
9-point calibration that relies on the sequential presentation of known scene
targets. However, fixating different points has been described as tedious, dull,
and tiring for the eyes [Bulling, Gellersen, Pfeuffer, Turner and Vidal 2013].

Assessment of the improvement of signal recorded in infant EEG by using eye
tracking algorithms

Recording event-related potentials (ERPs) elicited by visual stimuli consists
of showing the same stimuli to the subject dozens of times while recording the
electrical brain activity, and afterwards averaging the EEG signal of the valid
trials to remove the general brain activity and keep the response generated by
the stimuli. ERPs are a common methodology among cognitive developmental
scientists for investigating how infants develop, because responses to external
events can be observed in an ERP without specific behavioral requirements for
the infants. However, applying this technique to infants has some disadvantages
that are not found with adult participants, mainly the limited attention span
and the difficulty of obtaining enough artifact-free trials due to movement
artifacts and lack of attention to the stimuli. These limitations are the main
reason for the current attrition rates in infant ERP studies, which are
expected to lie between 50% and 75% [DeBoer et al., 2007; Stets et al., 2012].

The project is rooted in the concepts of cognitive psychopathology, which
state that clinical disorders stem from dysfunctional cognitive mechanisms. I
hope that the project will help validate whether an attentional bias towards
negative stimuli is an underlying cause of depressive disorders.

Eye tracking techniques have moved from the laboratory into everyday life;
examples include input interfaces for the severely handicapped and
object-of-interest selection in a camera viewfinder. They will bring great
benefits once they can be used easily in everyday life. However, the current
major tracking devices are not widely accepted because they place cameras in
front of the user's face, which plays several extremely important roles in
everyday life. Desktop devices or special personal devices can be used, but
they impose their own limitations.

Visual perception is perhaps the most important sensory input. During
driving, about 90% of the relevant information is related to the visual input
[Taylor 1982]. However, the quality of visual perception decreases with age,
mainly due to a reduction in visual acuity or as a consequence of diseases
affecting the visual system. Amongst the most severe types of visual
impairments are visual field defects (areas of reduced perception in the visual
field), which occur as a consequence of diseases affecting the brain, e.g.,
stroke, brain injury, or trauma, or diseases affecting the optic nerve, e.g.,
glaucoma. Due to demographic aging, the number of people with such visual
impairments is expected to rise [Kasneci 2013]. Since persons suffering from
visual impairments may overlook hazardous objects, they are prohibited from
driving. This, however, leads to a decrease in quality of life, mobility, and
participation in social life. Several studies have shown that some patients
exhibit safe driving behavior despite their visual impairment by performing
effective visual exploration, i.e., adequate eye and head movements (e.g.,
towards their visual field defect [Kasneci et al. 2014b]). Thus, a better
understanding of visual perception mechanisms, i.e., of why and how we attend
to certain parts of our environment while "ignoring" others, is key to helping
visually impaired persons in complex, real-life tasks, such as driving a car.

The role of processing fluency in online consumer behavior: evaluating
fluency by tracking eye movements

The Internet enables people to research products and services extensively,
and also to compare prices between offers easily [e.g. Baker et al. 2001].
Given the amount of information available on the Internet, acquiring new
information can be difficult, especially when one wants to make a purchase
decision. Therefore, the ability to process relevant information fluently
enables a user to have a better experience and to become more efficient in
gathering information related to the purpose of the visit. This ability may be
connected to whether the cognitive task at hand is effortless or effortful,
which may lead to a metacognitive experience of either fluency or disfluency
[Alter and Oppenheimer 2009]. Nevertheless, some e-commerce websites are
preferred over others, and this preference varies between individuals. This
variation can be influenced by the user's prior experience and cognitive
resources, but also by the graphics or information architecture of the web
page. The presented project aims at applying the fluency concept to consumer
behavior in an online environment by studying eye movements, and at promoting
eye tracking as an objective measure.

The European Landscape Convention defines landscape as "an area, as
perceived by people, whose character is the result of the action and
interaction of natural and/or human factors" [Council of Europe 2000]. This
definition puts people at the core of the landscape and makes them part of it
while they observe it. In addition, the Convention emphasizes that landscape is
an important public interest that determines part of the quality of life for
people everywhere. Consequently, an active participation of the public in
landscape planning and management is strongly encouraged [Council of Europe
2000]. In light of these statements, it would be beneficial to gain insights
into how people observe and perceive landscapes in order to use this knowledge
for landscape planning and management. So far, different landscape perception
paradigms have been formulated [Scott and Benson 2002] and analyzed using
questionnaires and in-depth interviews. The stimuli most frequently used in
these empirical studies are photographs or in situ observations [e.g. Ode et
al. 2008; Palmer 2004; Tveit 2009]. Eye tracking in combination with landscape
photographs, however, offers an objective means of measuring how people observe
landscapes.

Recording eye movement data can help to understand where and at what
participants look. However, analyzing eye movement data is a time-consuming
task. Using visualization techniques in the analysis process can help to
uncover concealed relationships within the data and can therefore be seen as
one means of analyzing eye movement data. The best-known visualization
techniques in eye tracking are heat maps and scanpaths. In recent years, more
visualization techniques have been developed, for example scaled traces
[Goldberg and Helfman 2010], eyePatterns [West et al. 2006], and eSeeTrack
[Tsang et al. 2010].