Face parsing is a segmentation task over facial components and is essential for many facial augmented reality applications. We present a demonstration of face parsing for mobile platforms such as iPhone and Android. We design an efficient fully convolutional neural network (CNN) in an hourglass form that is adapted to live face parsing. The CNN is implemented on the iPhone with the CoreML framework. To visualize the segmentation results, we superimpose a mask with false colors so that the user has an instant AR experience.
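
The false-color overlay step is simple to reproduce; the following Python/NumPy sketch blends a per-pixel class map produced by a parsing network into the camera frame. The palette and blending weight are illustrative assumptions, not the values used in the demo.

import numpy as np

# Illustrative palette: one color per facial component class (assumption).
PALETTE = np.array([
    [0, 0, 0],        # 0: background
    [0, 255, 0],      # 1: skin
    [255, 0, 0],      # 2: eyes
    [0, 0, 255],      # 3: lips
    [0, 255, 255],    # 4: hair
], dtype=np.uint8)

def overlay_parsing(frame, class_map, alpha=0.5):
    """Blend a false-color segmentation mask over a camera frame.

    frame:     HxWx3 uint8 camera image
    class_map: HxW   integer class labels from the parsing network
    """
    mask = PALETTE[class_map]                      # HxWx3 false-color mask
    blended = (1 - alpha) * frame + alpha * mask   # simple alpha blend
    return blended.astype(np.uint8)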

The demonstration at hand is a facility management interface for handheld devices. It combines VTT's ALVAR point cloud tracking, for accurate coupling of indoor location with information related to points of interest, and Google's ARCore, for enhanced mobility and a less restricted experience. The application can connect to a database of the Finnish commercial facility management operator Granlund Oy, allowing the user to instantly access and modify information related to a selected part of a Building Information Model (BIM). Such information includes, for example, the temperature and flow rate of the air conditioning.

In this paper, we propose a system that can reproduce the material appearance of real objects using mobile augmented reality (AR). Our proposed system allows a user to manipulate, with their own hand, a virtual object whose model is generated from the shape and reflectance of a real object. The shape of the real object is reconstructed by integrating depth images of the object captured with an RGB-D camera from different directions. The reflectance of the object is obtained by estimating the parameters of a reflectance model from the reconstructed shape and color images, assuming that a single light source is attached to the camera. We measured the shape and reflectance of several real objects and presented their material appearance using mobile AR. It was confirmed that users were able to perceive the materials from changes in gloss and sheen as they rotated the objects with their own hands.
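
The abstract does not specify the reflectance model used; as a minimal sketch of the estimation step, the snippet below fits a purely Lambertian albedo per color channel by least squares, exploiting the stated assumption that the single light source is attached to the camera (so light and view directions coincide).

import numpy as np

def fit_lambertian_albedo(colors, normals, view_dirs, light_intensity=1.0):
    """Least-squares fit of a diffuse albedo per color channel.

    colors:    Nx3 observed pixel colors of one surface point over several views
    normals:   Nx3 unit surface normals (from the reconstructed shape)
    view_dirs: Nx3 unit directions from the surface point towards the camera;
               with the light attached to the camera these also serve as
               light directions.
    """
    # Lambertian shading: c = albedo * intensity * max(n . l, 0)
    shading = np.clip(np.sum(normals * view_dirs, axis=1), 0.0, None) * light_intensity
    valid = shading > 1e-3                      # ignore grazing / unlit samples
    s = shading[valid]
    albedo = (s[:, None] * colors[valid]).sum(axis=0) / (s @ s)
    return albedo                               # 3-vector, one value per channel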

Scratching is a DJ (disk jockey) technique for producing rhythmic sounds by moving a record back and forth. It requires a high level of manual dexterity, and its difficulty has been likened to playing an instrument. Our goals are, first, ease of use, so that novice users can start scratching immediately, and second, to enable users to scratch while walking. We employ millimeter-wave radar sensing technology to replace record movements with mid-air gestures captured at 1 kHz. We developed our system in several iterations with professional DJs. The accompanying movie shows that we achieved our goals.

In this demo, we demonstrate the mobile Vibro Motors Wearable device and explain how it can be used to enhance Mixed Reality applications with tactile human-machine hand interactions. The presented device provides tactile feedback while interacting with virtual objects, e.g., when elements of a 3D user interface are touched. It drives vibro motors mounted on the user's fingers while they interact with synthetic objects. To control the vibration of a particular vibro motor wirelessly, we have developed an HTTP-based API. The API can be used to develop mobile-tactile Mixed Reality applications that enhance the user experience, for example by giving an impression of fade-in/fade-out effects when interacting with 3D objects. To show the device's capabilities, a 3D demo application has been developed in which the user can experience tactile feedback effects while interacting with virtual objects.
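
As an illustration of how such an HTTP-based API might be driven from an application, the Python sketch below sends one vibration request per pulse; the endpoint URL, payload fields, and fade-in loop are hypothetical, since the abstract does not document the actual API.

import requests

# Hypothetical endpoint and payload layout; the wearable's real API
# is not documented in the abstract.
DEVICE_URL = "http://vibro-wearable.local:8080/api/motor"

def pulse_finger(finger_id, intensity, duration_ms):
    """Ask the wearable to run one vibro motor for a short pulse."""
    payload = {
        "finger": finger_id,        # e.g. 0 = thumb ... 4 = little finger
        "intensity": intensity,     # 0.0 .. 1.0, mapped to motor strength
        "duration_ms": duration_ms,
    }
    response = requests.post(DEVICE_URL, json=payload, timeout=1.0)
    response.raise_for_status()

# Example: fade-in effect while a virtual object is approached.
for step in range(1, 6):
    pulse_finger(finger_id=1, intensity=step / 5.0, duration_ms=80)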

HoloRoyale is the first large-scale, multiplayer Augmented Reality game with high-fidelity interactions between virtual elements and the physical environment. We demonstrate the unique design approaches we applied to handle first-person AR interactions at campus scale, with dozens of players running around in the game.

Cinematic Virtual Reality (CVR) has been increasing in popularity over the last years. In CVR, the viewer watches omnidirectional movies using a head-mounted display or other VR devices. The viewer is thus positioned inside the scene and can freely choose the viewing direction. Accordingly, the viewer determines the visible section of the movie, so details that are important for the story may be missed. During our research on user attention in CVR, we encountered many analytic demands and documented potentially useful features. This led us to develop an analysis tool for omnidirectional movies: the CVR-Analyzer.

The work presented in this paper explores enhancing the processes of data collection and data annotation via gamification. The need for structured data is vast, as intelligent systems are highly dependent on machine-readable information. Many existing systems address this issue by relying on users to both provide and annotate the data they need. A major flaw in this process, however, is the lack of motivation, which in turn lowers the volume of collected data. Gamification can potentially alleviate this issue, as it is a known method for providing motivation through entertainment. On the other hand, the amusement of such a setting could lower the quality of the collected information. For the use case of Hand Tracking (HT) and Gesture Recognition (GR), we therefore conducted an experiment to measure the effectiveness of gamification on this task. The relevant data are binary images whose white pixels mark the area in which the hand is detected, together with the classification of the gesture being performed. The tool used to generate this information is a computer vision solution, operating on a single RGB input, provided by ManoMotion AB. Our hypothesis was that a gamified version of the task would provide lower-quality data than a non-gamified one. We conducted an experiment with 10 participants who used both methods to collect the annotated data. A total of 1288 images were collected from both applications and later evaluated by external observers. The analysis revealed that with the gamified version both the quality of the collected images and the accuracy of the classification were higher. The hypothesis was thus not confirmed, which in turn points towards a potential benefit of gamification for this task. Based on this, the study suggests that a gamified version of a task is more likely to provide high-quality information.

@inproceedings{ismar2018_demo_08,
author = "Ekneling, Sanna and Sonestedt, Tilian and Georgiadis, Abraham and Yousefi, Shahrouz and Chana, Julio",
title = "Magestro: Gamification of the Data Collection Process for Development of the Hand Gesture Recognition Technology",
year = "2018",
booktitle = "Adjunct Proceedings of the IEEE International Symposium for Mixed and Augmented Reality 2018 (To appear)"
}

The ability to localize a device or user precisely within a known space would enable many use cases in the context of location-based augmented reality. We propose a localization service based on sparse visual information using ARCore [4], a state-of-the-art augmented reality platform for mobile devices. Our service consists of two components: a front-end and a back-end. On the front-end, a binary keypoint descriptor such as ORB [6] or FREAK [1] is computed at the feature points of the point cloud generated by ARCore to describe the place. On the back-end, this binary descriptor is searched in a map using the bags of binary words technique [3], and the service responds with the position of the recognized place.
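
A minimal front-end sketch in Python/OpenCV is given below: ORB descriptors are computed at the projected ARCore feature points, and a naive Hamming-distance lookup stands in for the bags-of-binary-words back-end [3]. The database layout and the choice of ORB rather than FREAK are assumptions for illustration only.

import cv2
import numpy as np

orb = cv2.ORB_create()

def describe_place(gray_image, projected_points):
    """Compute binary descriptors at the image locations of ARCore feature points.

    gray_image:       grayscale camera frame
    projected_points: list of (x, y) pixel positions of the ARCore point cloud,
                      assumed already projected with the camera intrinsics.
    """
    keypoints = [cv2.KeyPoint(float(x), float(y), 31) for x, y in projected_points]
    keypoints, descriptors = orb.compute(gray_image, keypoints)
    return descriptors                      # Nx32 uint8, 256-bit ORB descriptors

def query_place(descriptors, database):
    """Naive stand-in for the bag-of-binary-words back-end: return the stored
    place whose descriptors have the smallest mean Hamming distance.
    database: dict mapping place_id -> (stored descriptors, place pose)."""
    matcher = cv2.BFMatcher(cv2.NORM_HAMMING, crossCheck=True)
    best_place, best_score = None, float("inf")
    for place_id, (stored_desc, pose) in database.items():
        matches = matcher.match(descriptors, stored_desc)
        if not matches:
            continue
        score = np.mean([m.distance for m in matches])
        if score < best_score:
            best_place, best_score = (place_id, pose), score
    return best_place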

With Augmented Reality (AR) on optical see-through head-mounted displays (OST-HMDs), users can observe the real world and computer graphics at the same time. In this work, we present TutAR, a pipeline that semi-automatically creates AR tutorials out of 2D RGB videos. TutAR extracts the relevant 3D hand motion from the input video. The derived motion is displayed on an OST-HMD as an animated 3D hand, positioned relative to the human body and played synchronously with the motion in the video.

Previous research confirmed that speech-balloon captions shown in three-dimensional space (such as in Augmented Reality) are better at ensuring access to information for people with hearing impairments than captions shown on a two-dimensional screen. In this research, we reconfirmed that multi-line captions contribute more to the user's comprehension of the speech-balloon contents than single-line captions. We also found that enabling the user to look back at previous parts of the conversation improves the user experience. In addition, when the amount of text increases, captions presented not as a log but as multiple speech balloons, as in comics, are more natural and do not obstruct the user's view.

In this demonstration, we will show a prototype system that uses a sensor fusion approach to robustly track the 6 degrees of freedom of hand movement and to support intuitive hand gesture interaction and 3D object manipulation for Mixed Reality head-mounted displays. Robust tracking of the hand and fingers with an egocentric camera remains a challenging problem, especially under self-occlusion, for example when the user tries to grab a virtual object in midair by closing the palm. Our approach leverages a common smartwatch worn on the wrist to provide more reliable palm and wrist orientation data, fusing them with the camera to achieve robust hand motion and orientation for interaction.

In this work, we consider the challenge of achieving a coherent blending between real and virtual worlds in the context of a Mixed Reality (MR) scenario. Specifically, we have designed and implemented an interactive demonstrator that shows a realistic MR application without using any light probe. The proposed system takes as input the RGB stream of the real scene and uses these data to recover both the position and the intensity of the light sources. The lighting can be static or dynamic, and the geometry of the scene can be partially altered. Our system is robust in the presence of specular effects and handles both uniform and textured surfaces.

Hand gestures are widely used in human-computer and human-robot interfaces. Head-mounted devices use gestures for communication, as is evident on HoloLens, Meta, and ARCore/ARKit-enabled smartphones. However, these devices are expensive, mainly due to powerful onboard processors and sensors such as multiple cameras and depth and IR sensors used to process hand gestures. To enable mass-market reach via inexpensive MR headsets without built-in depth or IR sensors, we propose a real-time, in-air gestural framework that works on monocular RGB input alone. We use the fingertip for writing in the air, analogous to a pen on paper. The major challenge in training egocentric gesture recognition models is obtaining sufficient labeled data for end-to-end learning. Thus, we design a cascade of networks consisting of a CNN with a differentiable spatial to numerical transform (DSNT) layer for fingertip regression, followed by a Bidirectional Long Short-Term Memory (Bi-LSTM) network for real-time pointing hand gesture classification. The framework takes 1.73 s to run end-to-end and has a low memory footprint of 14 MB, facilitating easy portability to a smartphone while achieving an accuracy of 88.0% on an egocentric video dataset.
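
A minimal PyTorch sketch of such a cascade is shown below: a soft-argmax implementation of the DSNT layer on top of a toy CNN for fingertip regression, followed by a Bi-LSTM over the fingertip trajectory for gesture classification. The backbone, hidden sizes, and number of classes are illustrative assumptions, not the paper's architecture.

import torch
import torch.nn as nn
import torch.nn.functional as F

def dsnt(heatmap):
    """Differentiable spatial-to-numerical transform (soft-argmax).
    heatmap: BxHxW unnormalised scores -> Bx2 normalised (x, y) coordinates."""
    b, h, w = heatmap.shape
    prob = F.softmax(heatmap.view(b, -1), dim=1).view(b, h, w)
    xs = torch.linspace(-1, 1, w, device=heatmap.device)
    ys = torch.linspace(-1, 1, h, device=heatmap.device)
    x = (prob.sum(dim=1) * xs).sum(dim=1)     # expectation over columns
    y = (prob.sum(dim=2) * ys).sum(dim=1)     # expectation over rows
    return torch.stack([x, y], dim=1)

class FingertipNet(nn.Module):
    """Toy CNN producing a single fingertip heatmap (backbone is an assumption)."""
    def __init__(self):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(3, 16, 3, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(16, 32, 3, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(32, 1, 1),                       # 1-channel heatmap
        )
    def forward(self, frames):                         # Bx3xHxW
        return dsnt(self.features(frames).squeeze(1))  # Bx2 fingertip (x, y)

class GestureClassifier(nn.Module):
    """Bi-LSTM over the fingertip trajectory of a pointing gesture."""
    def __init__(self, num_classes=10):
        super().__init__()
        self.lstm = nn.LSTM(input_size=2, hidden_size=64,
                            batch_first=True, bidirectional=True)
        self.head = nn.Linear(2 * 64, num_classes)
    def forward(self, trajectory):                     # BxTx2 fingertip points
        features, _ = self.lstm(trajectory)
        return self.head(features[:, -1])              # class logits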

For our demonstration, we present a prototype system for sharing and augmenting facial expression in cooperative social Virtual Reality (VR) games. We created two social VR games, "Bomb Defusal" and "Island Survivor", to demonstrate our system for capturing and sharing facial expression between VR players through their avatars.

We present a real-time system for collaboratively reconstructing dense volumetric models of large 3D scenes. Capturing large scenes can take time, and the risks of tracking failure and of transient changes to the scene increase with capture time and scale. To avoid these problems, we use multiple mobile agents, each equipped with visual-inertial camera tracking, to capture smaller, overlapping sub-scenes in parallel, and then join them into a complete scene on a central server using online RGB-D camera relocalisation. Using our system, an entire building can be reconstructed in under half an hour and at a far lower cost than was previously possible.

We present hybrid user interfaces that facilitate interaction with music content in 3D, using a combination of 2D and 3D input and display devices. Participants will explore an online music library, some wearing AR or VR head-worn displays used alone or in conjunction with touch screens, and others using only touch screens. They will select genres, artists, albums, and songs, interacting through a combination of 3D hand-tracking and 2D multi-touch technologies.

@inproceedings{ismar2018_demo_17,
author = "Elvezio, Carmine and Amelot, Pierre and Boyle, Robert and Wes, Catherine Ilona and Feiner, Steven",
title = "Hybrid UIs for Music Exploration in AR and VR",
year = "2018",
booktitle = "Adjunct Proceedings of the IEEE International Symposium for Mixed and Augmented Reality 2018 (To appear)"
}

Handheld Perspective Corrected Displays (HPCDs) are physical objects covered by virtual images of a 3D scene. The images are computed so as to create the illusion that the scene is contained in the device. HPCDs can offer direct interaction that is isomorphic to the manipulation of physical objects. We demonstrate a spherical HPCD using external projection. The system improves over previous work by offering a lightweight, wireless, seamless display with head-coupled stereo, robust tracking, and low latency. This contributes to creating one of the most convincing illusions of the presence of a virtual scene.

Streaming high-quality rendering for virtual reality applications requires minimizing perceived latency. We introduce Shading Atlas Streaming (SAS), a novel object-space rendering framework suitable for streaming 3D content in virtual/augmented reality. SAS decouples server-side shading from client-side rendering, allowing the client to perform framerate upsampling and latency compensation autonomously for short periods of time. The shading information created by the server in object space is temporally coherent and can be efficiently compressed using standard MPEG encoding. Our results show that SAS compares favorably to previous methods for remote image-based rendering in terms of image quality and network bandwidth efficiency. SAS allows highly efficient parallel allocation in a virtualized-texture-like memory hierarchy, solving a common efficiency problem of object-space shading. With SAS, untethered virtual reality headsets can benefit from high-quality rendering without paying for it in increased latency. Visitors will be able to try SAS by roaming the exhibit area wearing a Snapdragon 845 based headset.

We present an Augmented Reality remote collaboration system leveraging dense scene reconstruction for intuitive remote guidance. A local worker can use our system to automatically generate a 3D mesh of their surroundings and stream it to a remote expert. The remote expert can use the reconstruction to explore the scene independently of the local worker, place world-stabilized annotations, and draw strokes that are intelligently anchored to surfaces in the world. In addition, the remote expert can segment objects from the reconstructed mesh in order to quickly create animations for conveying precise instructions to the local worker.

The Microsoft HoloLens is one of the latest headsets that facilitate mixed and augmented reality (AR) applications. It has a high potential to leverage AR applications in many domains. However, the first version also comes with several limitations, such as the limited battery lifetime and the small field of view. An important one is the lack of high-fidelity depth data, which would facilitate object detection and tracking research, a capability imperative for many applications. To mitigate this limitation, we integrated the HoloLens into a point cloud-based tracking system. Our system uses several Kinect range cameras to obtain a point cloud and to detect and track real assets of interest in this point cloud. The pose data for all objects are forwarded to the HoloLens, which can then render 3D models from the correct perspective. However, this system is not free of tolerances and tracking errors, which mandates calibration. This poster explains how the system was set up and verifies the feasibility of such a system for 3D model registration using the HoloLens as a display device.

We present a diminished reality application running live on consumer mobile devices. In our pre-observation-based approach, the clean 3D scene, free of undesired objects, is scanned beforehand and reconstructed as a high-resolution textured 3D model. At runtime, objects added to a region of interest are efficiently removed by projecting the previously captured background. Differences in illumination conditions between scan time and runtime are compensated to obtain seamless results. The proposed approach requires no segmentation or manual input other than the definition of the 3D region of interest to be diminished, and it is not based on any particular assumption about the background geometry. We show the potential of our approach by processing a variety of challenging unknown 3D scenes, including textured backgrounds, dynamic illumination conditions, and foreground objects partially occluding the diminished region. We provide details of our compute shader implementation to make reimplementation by the community as easy as possible.
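
As a rough illustration of the illumination-compensation step (the paper's actual method and its compute shader implementation are not reproduced here), the following Python/NumPy sketch estimates a per-channel gain from pixels just outside the region of interest and applies it to the projected background before pasting it into the live frame.

import numpy as np

def compensate_illumination(projected_bg, live_frame, roi_mask, ring_mask):
    """Match the projected clean background to the live frame's lighting.

    projected_bg: HxWx3 rendering of the pre-scanned background model
    live_frame:   HxWx3 current camera frame
    roi_mask:     HxW bool, the region to be diminished (replaced)
    ring_mask:    HxW bool, pixels just outside the ROI, visible in both images,
                  used to estimate the lighting change as a simple per-channel
                  gain (the paper's compensation may be more elaborate).
    """
    eps = 1e-3
    gain = (live_frame[ring_mask].mean(axis=0) + eps) / \
           (projected_bg[ring_mask].mean(axis=0) + eps)    # 3 per-channel gains
    result = live_frame.astype(np.float32)
    result[roi_mask] = projected_bg[roi_mask] * gain        # relight and paste
    return np.clip(result, 0, 255).astype(np.uint8)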

With data growing at a huge rate, there arises a need for advanced data visualization techniques. Visualizing these data sets in Mixed Reality (MR) provides an immersive experience to the user in the context of real-world applications. Most existing work can only be used with inordinately priced devices, such as the Microsoft HoloLens and Meta Glass, which use proprietary hardware for data visualization and user interaction through hand gestures. In this paper, we demonstrate a cost-effective solution for data visualization in MR using frugal devices such as Google Cardboard and VR Box. However, these devices still employ only primitive modes of interaction, such as the magnetic trigger and conductive lever, and have limited user-input capability. To interact with visualizations and facilitate a rich user experience, we propose an intuitive fingertip-pointing gestural interface in the user's field of view (FoV). The proposed pointing hand gesture recognition framework is driven by a cascade of deep learning models: a state-of-the-art Faster R-CNN for localizing the hand, followed by a proposed regression CNN for fingertip localization. We conducted both objective and subjective evaluations to demonstrate the performance of our method. The objective metrics are fingertip recognition accuracy and computation time; the subjective evaluation covers user comfort and the effectiveness of the proposed fingertip interaction.
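
The cascade can be wired together roughly as in the following PyTorch sketch, where a Faster R-CNN (assumed to be fine-tuned elsewhere for a single hand class) provides a hand box and a small regression CNN maps the crop to a fingertip position; the layer sizes and crop resolution are illustrative assumptions.

import torch
import torch.nn as nn
import torchvision
from torchvision.transforms.functional import resized_crop

# Detector assumed to be fine-tuned for a single "hand" class elsewhere;
# here we only sketch how the cascade is wired together.
detector = torchvision.models.detection.fasterrcnn_resnet50_fpn(num_classes=2)
detector.eval()

class FingertipRegressor(nn.Module):
    """Small regression CNN mapping a hand crop to a normalised (x, y)
    fingertip position inside the crop (layer sizes are assumptions)."""
    def __init__(self):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv2d(3, 16, 3, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(16, 32, 3, stride=2, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1), nn.Flatten(), nn.Linear(32, 2),
        )
    def forward(self, crop):
        return torch.sigmoid(self.net(crop))   # (x, y) in [0, 1] crop coordinates

regressor = FingertipRegressor().eval()

@torch.no_grad()
def locate_fingertip(frame):
    """frame: 3xHxW float tensor in [0, 1]. Returns fingertip in pixel coords."""
    detections = detector([frame])[0]
    if len(detections["boxes"]) == 0:
        return None
    x1, y1, x2, y2 = detections["boxes"][0].tolist()     # best-scoring hand box
    crop = resized_crop(frame, int(y1), int(x1),
                        int(y2 - y1), int(x2 - x1), [128, 128])
    fx, fy = regressor(crop.unsqueeze(0))[0].tolist()
    return x1 + fx * (x2 - x1), y1 + fy * (y2 - y1)      # back to frame coords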

The transition from high school to university is an exciting time for students that includes many new challenges. Particularly in the fields of science, technology, engineering, and mathematics, the university dropout rate may reach up to 40%. The study of physics relies on many abstract concepts and quantities that are not directly visible, such as energy or heat. We developed a mixed reality application for education that augments the thermal conduction of a metal object by overlaying a false-color visualization of its temperature directly onto the object. This real-time augmentation avoids attention splitting and overcomes the perception gap by extending what the human eye can see. Augmented and Virtual Reality environments allow students to perform experiments that would otherwise be impossible to conduct for safety or financial reasons. With the application, we try to foster a deeper understanding of the learning material and higher engagement during the studies.
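
For reference, the quantity being visualized can be sketched with a simple explicit finite-difference solution of one-dimensional heat conduction, mapped to false colors; the parameters and the blue-to-red palette below are illustrative assumptions, not those of the application.

import numpy as np

def simulate_rod_temperature(n=100, steps=500, alpha=0.1, hot_end=100.0, cold_end=20.0):
    """Explicit finite-difference solution of 1D heat conduction in a metal rod,
    the quantity that the application visualises as false colors on the object."""
    temperature = np.full(n, cold_end)
    temperature[0] = hot_end                     # heated end of the rod
    for _ in range(steps):
        laplacian = temperature[:-2] - 2 * temperature[1:-1] + temperature[2:]
        temperature[1:-1] += alpha * laplacian   # stable for alpha <= 0.5
        temperature[0], temperature[-1] = hot_end, cold_end   # fixed boundaries
    return temperature

def false_color(temperature, t_min=20.0, t_max=100.0):
    """Map temperatures to a simple blue-to-red palette for the AR overlay."""
    t = np.clip((temperature - t_min) / (t_max - t_min), 0.0, 1.0)
    return np.stack([t, np.zeros_like(t), 1.0 - t], axis=-1)   # RGB in [0, 1]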

Museums and exhibitions often involve physical artefacts that may have rich histories or deep meanings associated with them. This additional content is often installed physically as informational panels on a wall. However, such panels can be challenging to deploy due to space constraints. To address this challenge, we introduced the use of mixed reality, which offers an immersive and interactive experience through head-mounted displays and in-air gestures. Visitors can discover additional content virtually, without changing the physical space. For a small-scale exhibition at a café, we developed a Microsoft HoloLens application to create an interactive experience on top of a collection of historic physical items. Through public experiences at the café, we received positive feedback on our system. In this paper, we discuss the design and implications of our system, survey results, and the challenges encountered in deploying our mixed reality experience in a public setting.

@inproceedings{ismar2018_demo_30,
author = "Cheng, Kelvin",
title = "The deployment of a mixed reality experience for a small-scale exhibition in the wild",
year = "2018",
booktitle = "Adjunct Proceedings of the IEEE International Symposium for Mixed and Augmented Reality 2018 (To appear)"
}