Touchscreen interaction suffers from occlusion problems, as fingers can cover small targets, which makes interacting with such targets challenging. To improve touchscreen interaction accuracy, and consequently the selection of small or hidden objects, we introduce a back-of-device force feedback system for smartphones. The system uses force feedback on the back of the device to enhance touch input on the front screen. The interface includes three actuated pins at the back of a smartphone. All three pins are driven by micro servos and can be actuated at a frequency of up to 50 Hz and with a maximum amplitude of 5 mm. In a first psychophysical user study, we explored the limits of the system. Thereafter, we demonstrate through a performance study that the proposed interface can enhance touchscreen interaction precision compared to state-of-the-art methods. In particular, the selection of small targets performed remarkably well with force feedback. The study additionally shows that users subjectively felt significantly more accurate with force feedback. Based on the results, we discuss back-to-front feedback design issues and demonstrate potential applications through several prototypical concepts to illustrate where back-of-device force feedback could be beneficial.

In the presence of conflicting or ambiguous visual cues in complex scenes, performing 3D selection and manipulation tasks can be challenging. To improve motor planning and coordination, we explore audio-tactile cues to inform the user about the presence of objects in hand proximity, e.g., to avoid unwanted object penetrations. We do so through a novel glove-based tactile interface, enhanced by audio cues. Through two user studies, we illustrate that proximity guidance cues improve spatial awareness, hand motions, and collision avoidance behaviors, and show how proximity cues in combination with collision and friction cues can significantly improve performance.

We present a novel forearm-and-glove tactile interface that can enhance 3D interaction by guiding hand motor planning and coordination. In particular, we aim to improve hand motion and pose actions related to selection and manipulation tasks. Through our user studies, we illustrate how tactile patterns can guide the user, by triggering hand pose and motion changes, for example to grasp (select) and manipulate (move) an object. We discuss the potential and limitations of the interface, and outline future work.

In recent years, a variety of methods have been introduced to exploit the decrease in visual acuity of peripheral vision, known as foveated rendering. As more and more computationally involved shading is requested and display resolutions increase, maintaining low latencies is challenging when rendering in a virtual reality context. Here, foveated rendering is a promising approach for reducing the number of shaded samples. However, besides the reduction of the visual acuity, the eye is an optical system, filtering radiance through lenses. The lenses create depth-of-field (DoF) effects when accommodated to objects at varying distances. The central idea of this article is to exploit these effects as a filtering method to conceal rendering artifacts. To showcase the potential of such filters, we present a foveated rendering system, tightly integrated with a gaze-contingent DoF filter. Besides presenting benchmarks of the DoF and rendering pipeline, we carried out a perceptual study, showing that rendering quality is rated almost on par with full rendering when using DoF in our foveated mode, while shaded samples are reduced by more than 69%.
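
As a rough illustration of how foveation reduces the number of shaded samples, the following sketch computes an eccentricity-dependent sampling probability and the resulting fraction of shaded samples for one frame. It is not the system described above: the linear falloff, the radii `inner_deg` and `outer_deg`, the floor `p_min`, and the constant `deg_per_pixel` mapping are all assumptions chosen for illustration.

```python
import math


def sampling_probability(ecc_deg, inner_deg=5.0, outer_deg=20.0, p_min=0.1):
    """Probability of shading a sample at a given eccentricity (degrees).

    Assumed model: full density inside inner_deg, a floor of p_min beyond
    outer_deg, and a linear blend in between.
    """
    if ecc_deg <= inner_deg:
        return 1.0
    if ecc_deg >= outer_deg:
        return p_min
    t = (ecc_deg - inner_deg) / (outer_deg - inner_deg)
    return 1.0 + t * (p_min - 1.0)


def expected_sample_fraction(width, height, gaze_x, gaze_y, deg_per_pixel):
    """Expected fraction of pixels shaded for one frame.

    Eccentricity is approximated as pixel distance to the gaze point times a
    constant deg_per_pixel (a simplification that ignores lens distortion).
    """
    total = 0.0
    for y in range(height):
        for x in range(width):
            ecc = math.hypot(x - gaze_x, y - gaze_y) * deg_per_pixel
            total += sampling_probability(ecc)
    return total / (width * height)


if __name__ == "__main__":
    fraction = expected_sample_fraction(64, 64, 32, 32, deg_per_pixel=1.0)
    print(f"fraction of samples shaded: {fraction:.2f}")
```

The fraction depends entirely on the assumed falloff parameters and display geometry, so it should not be read as reproducing the reduction reported above.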

In this research demo, we show different examples for the integration of multisensory cues to enhance user engagement and trigger emotional responses in immersive environments. Our primary focus is to use this modular-designed system for the supportive treatment of different anxiety disorders.

Head-mounted displays (HMDs) with integrated eye trackers have opened up a new realm for gaze-contingent rendering. The accurate estimation of gaze depth is essential when modeling the optical capabilities of the eye. Most recently, multifocal displays have been gaining importance, requiring focus estimates to control displays or lenses. Deriving the gaze depth solely by sampling the scene's depth at the point of regard fails for complex or thin objects, as eye tracking suffers from inaccuracies. Gaze depth measures that rely on the eye's vergence provide an accurate depth estimate only for the first meter. In this work, we combine vergence measures and multiple depth measures into feature sets. These data are used to train a regression model to deliver improved estimates. We present a study showing that using multiple features allows for an accurate estimation of the focused depth (MSE < 0.1 m) over a wide range (first 6 m).
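
To illustrate why vergence alone becomes unreliable beyond short distances, the following sketch uses the standard symmetric-fixation geometry, in which the fixation depth d and the vergence angle θ are related by d = (IPD / 2) / tan(θ / 2). It is not the regression model described above; the interpupillary distance of 63 mm and the 0.5 degree tracking error are assumed values for illustration.

```python
import math


def vergence_angle(depth_m, ipd_m=0.063):
    """Vergence angle (radians) for a point fixated straight ahead."""
    return 2.0 * math.atan((ipd_m / 2.0) / depth_m)


def depth_from_vergence(angle_rad, ipd_m=0.063):
    """Fixation depth (meters) recovered from a vergence angle."""
    return (ipd_m / 2.0) / math.tan(angle_rad / 2.0)


def depth_with_error(true_depth_m, error_deg, ipd_m=0.063):
    """Depth estimate when the vergence angle is underestimated by error_deg."""
    angle = vergence_angle(true_depth_m, ipd_m) - math.radians(error_deg)
    if angle <= 0.0:
        return math.inf  # lines of sight are parallel or diverging
    return depth_from_vergence(angle, ipd_m)


if __name__ == "__main__":
    for depth in (0.5, 1.0, 3.0, 6.0):
        estimate = depth_with_error(depth, error_deg=0.5)
        print(f"true {depth:.1f} m -> estimated {estimate:.2f} m")
```

Because the vergence angle shrinks roughly in inverse proportion to depth, the same angular error translates into a much larger depth error at 6 m than at 1 m, which is the motivation for adding further depth features.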

Large, high-resolution displays are highly suitable for creating digital environments for co-located collaborative task solving. Yet, placing multiple users in a shared environment may increase the risk of interference, thus causing mental discomfort and decreasing the efficiency of the team. To mitigate interference, coordination strategies and techniques have been introduced. However, in mixed-focus collaboration scenarios, users switch now and again between loose and tight collaboration; therefore, different coordination techniques might be required depending on the current collaboration state of the team members. To this end, systems have to be able to recognize collaboration states as well as transitions between them to ensure a proper adjustment of the coordination strategy. Previous studies on group behavior during collaboration in front of large displays investigated only collaborative coupling states, not the transitions between them. To address this gap, we conducted a study with 12 participant dyads in front of a tiled display and let them solve two tasks in two different conditions (focus and overview). We looked into group dynamics and categorized transitions by means of changes in proximity, verbal communication, visual attention, visual interface, and gestures. The findings can be valuable for user interface design and the development of group behavior models.

In western societies, a large percentage of the population suffers from some kind of back pain at least once in their life. There are several approaches addressing back pain by postural modifications. Postural training and activity can be tracked by various wearable devices, most of which are based on accelerometers. We present research on the accuracy of accelerometer-based posture measurements. To this end, we took simultaneous recordings using an optical motion capture system and a system consisting of five accelerometers in three different settings: on a test robot, in a template, and on actual human backs. We compare the accelerometer-based spine curve reconstruction against the motion capture data. Results show that tilt values from the accelerometers are captured highly accurately and that the spine curve reconstruction works well.
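
The idea of deriving tilt from accelerometers and chaining the tilts into a curve can be sketched as follows. This is a simplified illustration and not the reconstruction used in the study: it assumes static readings dominated by gravity, a 2D sagittal plane, a particular axis and sign convention, and equal, hypothetical segment lengths between sensors.

```python
import math


def sagittal_tilt(ax, az):
    """Signed tilt (radians) from vertical in the x-z plane.

    Assumes the sensor is at rest, so (ax, az) are components of gravity,
    and that az points along the spine when the back is upright.
    """
    return math.atan2(ax, az)


def reconstruct_curve(tilts, segment_length=0.1):
    """Chain equal-length segments, each oriented by one tilt angle.

    Returns a list of (x, y) points starting at the origin, where y is
    vertical and x is the forward direction.
    """
    points = [(0.0, 0.0)]
    x = y = 0.0
    for tilt in tilts:
        x += segment_length * math.sin(tilt)
        y += segment_length * math.cos(tilt)
        points.append((x, y))
    return points


if __name__ == "__main__":
    # Five hypothetical gravity readings (ax, az), one per sensor.
    readings = [(0.00, 1.00), (0.10, 0.99), (0.20, 0.98), (0.10, 0.99), (0.00, 1.00)]
    tilts = [sagittal_tilt(ax, az) for ax, az in readings]
    for px, py in reconstruct_curve(tilts):
        print(f"{px:.3f} {py:.3f}")
```

A real setup would additionally have to handle sensor calibration, motion-induced acceleration, and the true spacing between sensors.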

Large, high-resolution displays have demonstrated their effectiveness in lab settings for cognitively demanding tasks in single-user and collaborative scenarios. The effectiveness is mostly reached through inherent display properties - large display real estate and high resolution - that allow for the visualization of complex datasets and support group work and embodied interaction. To raise users' efficiency, however, more sophisticated user support in the form of advanced user interfaces might be needed. For that, we need a profound understanding of how large, tiled displays impact users' work and behavior. We need to extract behavioral patterns for different tasks and data types. This paper reports on study results of how users, while working collaboratively, process spatially fixed items on large, tiled displays. The results revealed a recurrent pattern showing that users prefer to process documents column-wise rather than row-wise or erratically.

Motion capture, often abbreviated mocap, generally aims at recording any kind of motion, be it from a person or an object, and transforming it into a computer-readable format. In particular, the data recorded from (professional and non-professional) human actors are typically used for analysis in, e.g., medicine, sport sciences, or biomechanics to evaluate human motion across various factors. Motion capture is also widely used in the entertainment industry: in video games and films, realistic motion sequences and animations are generated through data-driven motion synthesis based on recorded motion (capture) data.
Although the amount of publicly available full-body motion capture data is growing, the research community still lacks a comparable corpus of specialty motion data such as prehensile movements for everyday actions. On the one hand, such data can be used to enrich (hand-over animation) full-body motion capture data, which are usually captured without hand motion data due to the drastic dimensional difference in articulation detail. On the other hand, it provides a means to classify and analyse prehensile movements with or without respect to the concrete object manipulated and to transfer the acquired knowledge to other fields of research (e.g. from 'pure' motion analysis to robotics or biomechanics).
Therefore, the objective of this motion capture database is to provide well-documented, free motion capture data for research purposes.
The presented database, GraspDB14, contains in total over 2000 prehensile movements of ten different non-professional actors interacting with 15 different objects. Each grasp was realised five times by each actor. The motions are named systematically, with each name containing an (anonymous) identifier for the actor as well as one for the object grasped or interacted with.
The data were recorded as joint angles (and raw 8-bit sensor data) which can be transformed into positional 3D data (3D trajectories of each joint).
In this document, we provide a detailed description of the GraspDB14 database as well as of its creation (for reproducibility).
Chapter 2 gives a brief overview of motion capture techniques and of freely available motion capture databases for both full-body motions and hand motions, followed by a short section on how such data is made useful and re-used. Chapter 3 describes the database recording process and details the recording setup and the recorded scenarios. It includes a list of objects and performed types of interaction. Chapter 4 covers the file formats used, their contents, and naming patterns. We provide various tools for parsing, conversion, and visualisation of the recorded motion sequences and document their usage in chapter 5.

Development and rapid prototyping for large interactive environments like tiled-display walls pose many challenges. One is the heterogeneity of the various applications and libraries. A visual application tailored for a single monitor setup with a certain software environment is difficult to port and distribute to a multi-display, multi-PC setup. As a solution to this problem, we explore the potential of lightweight containerization techniques for distributed interactive applications. In particular, we present how the necessary runtime and build environments including libraries and drivers can be abstracted using the Docker framework. We demonstrate the packing of an existing single-machine GPU-enabled ray tracer inside a container to be used on tiled display walls. The performance measurements reveal that the containerization has a negligible impact on the system’s performance but allows for easy setup, integration, and distribution of complex applications.

This work presents the analysis of data recorded by an eye tracking device in the course of evaluating a foveated rendering approach for head-mounted displays (HMDs). Foveated rendering methods adapt the image synthesis process to the user’s gaze and exploit the human visual system’s limitations to increase rendering performance. In particular, foveated rendering has great potential when certain requirements have to be fulfilled, like low-latency rendering to cope with high display refresh rates. This is crucial for virtual reality (VR), as a high level of immersion, which can only be achieved with high rendering performance and also helps to reduce nausea, is an important factor in this field. We put things in context by first providing basic information about our rendering system, followed by a description of the user study and the collected data. This data stems from fixation tasks that subjects had to perform while being shown fly-through sequences of virtual scenes on an HMD. These fixation tasks consisted of a combination of various scenes and fixation modes. Besides static fixation targets, moving targets on randomized paths as well as a free focus mode were tested. Using this data, we estimate the precision of the utilized eye tracker and analyze the participants’ accuracy in focusing on the displayed fixation targets. Here, we also take a look at eccentricity-dependent quality ratings. Comparing this information with the users’ quality ratings given for the displayed sequences then reveals an interesting connection between fixation modes, fixation accuracy and quality ratings.

Despite the ever-increasing performance of computer hardware, single machines are not always capable of processing complex tasks in reasonable time, if at all. At the same time, the amount of data to be generated or processed by computer systems increases at a similar rate. Generating images using algorithms from computer graphics is a good example of such a scenario. Because visual analysis of data can greatly benefit from the usage of large, tiled display systems, and techniques to render images become more and more complex in order to generate visually pleasing results, using a single machine is often not sufficient to enable reasonable workflows. In this paper, we introduce CRaP, a framework for cluster-based rendering and postprocessing. Distributing the generation and postprocessing of images to a cluster of machines allows these tasks to be sped up. In particular, we aim at rendering images at high resolutions, e.g. for large display walls. To our knowledge, no other framework for distributed rendering is capable of postprocessing images in a distributed fashion without being limited to frame tiles. We present a software design that provides a simple interface for creating render sessions, enabling arbitrary applications to delegate the generation of images to a cluster. In addition to a simple configuration process, we intend to support platform interoperability. We focus on the design of a flexible and modular architecture in order to make the framework independent of rendering techniques and scheduling algorithms. Thus, these components should be interchangeable to enable a variety of use cases. The same applies to the integration of postprocessing methods.

Advances in computer graphics enable us to create digital images of astonishing complexity and realism. However, processing resources are still a limiting factor. Hence, many costly but desirable aspects of realism are often not accounted for, especially in real-time rendering, including global illumination, accurate depth of field and motion blur, and spectral effects. At the same time, there is a strong trend towards more pixels per display due to larger displays, higher pixel densities or larger fields of view. Further observable trends in current display technology include more bits per pixel (high dynamic range, wider color gamut/fidelity), increasing refresh rates (better motion depiction), and an increasing number of displayed views per pixel (stereo, multi-view, all the way to holographic or lightfield displays). These developments cause significant unsolved technical challenges due to aspects such as limited compute power and bandwidth. Fortunately, the human visual system has certain limitations, which mean that providing the highest possible visual quality is not always necessary. In this report, we present the key research and models that exploit the limitations of perception to tackle visual quality and workload alike. Moreover, we present the open problems and promising future research targeting the question of how we can minimize the effort to compute and display only the necessary pixels while still offering the user a full visual experience.

Enhancing touch screen interfaces through non-visual cues has been shown to improve performance. In this paper, we report on a novel system that explores the usage of a tablet interface enhanced with a force-sensitive motion platform to improve multi-modal interaction based on visuo-haptic instead of tactile feedback. Extending mobile touch screens with force-sensitive haptic feedback has the potential to enhance performance when interacting with GUIs and to improve the perception and understanding of relations. A user study was performed to determine the perceived recognition of different 3D shapes and the perception of different heights. Furthermore, two application scenarios are proposed to explore our visuo-haptic system. The studies show a positive stance towards the feedback, as well as limitations related to the perception of feedback.

In this paper, we report on novel zooming interface methods that deploy a small handheld projector. Using mobile projections to visualize object/environment related information on real objects introduces new aspects for zooming interfaces. Different approaches are investigated that focus on maintaining a level of context while exploring detail in information. Doing so, we propose methods that provide alternative contextual cues within a single projector, and deploy the potential of zoom lenses to support a multi-level zooming approach. Furthermore, we look into the correlation between pixel density, distance to target and projection size. Alongside these techniques, we report on multiple user studies in which we quantified the projection limitations and validated various interactive visualization approaches. Thereby, we focused on solving issues related to pixel density, brightness and contrast that affect the design of more effective, legible zooming interfaces for handheld projectors.
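
The relation between pixel density, distance to target and projection size follows from basic projection geometry and can be sketched as below. The throw angle and resolution are hypothetical values, not the specifications of the projector used in the studies, and the model assumes a projection perpendicular to a flat surface.

```python
import math


def projection_width(distance_m, throw_fov_deg):
    """Width (meters) of the projected image at a given distance.

    Assumes the projector faces a flat surface head-on and has a fixed
    horizontal throw angle.
    """
    return 2.0 * distance_m * math.tan(math.radians(throw_fov_deg) / 2.0)


def pixel_density(distance_m, throw_fov_deg, horizontal_pixels):
    """Projected pixels per meter along the horizontal axis."""
    return horizontal_pixels / projection_width(distance_m, throw_fov_deg)


if __name__ == "__main__":
    # Hypothetical handheld projector: 854 horizontal pixels, 30 degree throw.
    for distance in (0.5, 1.0, 2.0):
        width = projection_width(distance, 30.0)
        density = pixel_density(distance, 30.0, 854)
        print(f"{distance:.1f} m: width {width:.2f} m, {density:.0f} px/m")
```

Under these assumptions the projected width grows linearly with distance, so pixel density falls in inverse proportion to it; doubling the distance halves the number of pixels per meter.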

We present an analysis of eye tracking data produced during a quality-focused user study of our own foveated ray tracing method. Generally, foveated rendering serves the purpose of adapting actual rendering methods to a user’s gaze. This leads to performance improvements which also allow for the use of methods like ray tracing, which would be computationally too expensive otherwise, in fields like virtual reality (VR), where high rendering performance is important to achieve immersion, or fields like scientific and information visualization, where large amounts of data may hinder real-time rendering capabilities. We provide an overview of our rendering system itself as well as information about the data we collected during the user study, based on fixation tasks to be fulfilled during flights through virtual scenes displayed on a head-mounted display (HMD). We analyze the tracking data regarding its precision and take a closer look at the accuracy achieved by participants when focusing the fixation targets. This information is then put into context with the quality ratings given by the users, leading to a surprising relation between fixation accuracy and quality ratings.

Owing to their large size and high resolution, display walls are well suited to different collaboration types. However, in order to foster rather than impede collaboration processes, interaction techniques need to be carefully designed, taking into account the possibilities and limitations of the display size and their effects on human perception and performance. In this paper, we investigate the impact of visual distractors (which, for instance, might be caused by other collaborators' input) in peripheral vision on short-term memory and attention. Such distractors occur frequently when multiple users collaborate in large wall display systems and may draw attention away from the main task, potentially affecting performance and cognitive load. Yet, the effect of these distractors is poorly understood. Gaining a better understanding may thus provide valuable input for designing more effective user interfaces. In this article, we report on two interrelated studies that investigated the effect of distractors. Depending on when the distractor is inserted in the task performance sequence, as well as on the location of the distractor, user performance can be disturbed: we show that distractors may not affect short-term memory, but do have an effect on attention. We look closely into the effects and identify future directions for designing more effective interfaces.

Head-mounted displays with dense pixel arrays used for virtual reality applications require high frame rates and low-latency rendering. This forms a challenging use case for any rendering approach. In addition to its ability to generate realistic images, ray tracing offers a number of distinct advantages, but has been held back mainly by its performance. In this paper, we present an approach that significantly improves the image generation performance of ray tracing. This is done by combining foveated rendering based on eye tracking with reprojection rendering using previous frames in order to drastically reduce the number of new image samples per frame. To reproject samples, a coarse geometry is reconstructed from a G-Buffer. Possible errors introduced by this reprojection, as well as parts that are critical to perception, are scheduled for resampling. Additionally, a coarse color buffer is used to provide an initial image, which is refined smoothly by more samples where needed. Evaluations and user tests show that our method achieves real-time frame rates, while visual differences compared to fully rendered images are hardly perceivable. As a result, we can ray trace non-trivial static scenes for the Oculus DK2 HMD at 1182 × 1464 per eye within the VSync limits without perceived visual differences.

The work at hand outlines a recording setup for capturing hand and finger movements of musicians. The focus is on a series of baseline experiments on the detectability of coloured markers under different lighting conditions. With the goal of capturing and recording hand and finger movements of musicians in mind, requirements for such a system and existing approaches are analysed and compared. The results of the experiments and the analysis of related work show that the envisioned setup is suited for the expected scenario.

Human beings spend much time under the influence of artificial lighting. Often, it is beneficial to adapt lighting to the task, as well as the user’s mental and physical constitution and well-being. This formulates new requirements for lighting - human-centric lighting - and drives a need for new light control methods in interior spaces. In this paper we present a holistic system that provides a novel approach to human-centric lighting by introducing simulation methods into interactive light control, to adapt the lighting based on the user's needs. We look at a simulation and evaluation platform that uses interactive stochastic spectral rendering methods to simulate light sources, allowing for their interactive adjustment and adaption.

In this article, we report on challenges and potential methodologies to support the design and validation of multisensory techniques. Such techniques can be used for enhancing engagement in immersive systems. Yet, designing effective techniques requires careful analysis of the effect of different cues on user engagement. The level of engagement spans the general level of presence in an environment, as well as the specific emotional response to a set trigger. Yet, measuring and analyzing the actual effect of cues is hard as it spans numerous interconnected issues. In this article, we identify the different challenges and potential validation methodologies that affect the analysis of multisensory cues on user engagement. In doing so, we provide an overview of issues and potential validation directions as an entry point for further research. The various challenges are supported by lessons learned from a pilot study, which focused on reflecting the initial validation methodology by analyzing the effect of different stimuli on user engagement.