Abstract

We propose a robust pointing-detection method that uses a virtual shadow representation for interacting with a public display. Using a depth camera, the system generates the user's shadow with an angled virtual sun light and detects the nearest point of the shadow as the pointer. The shadow's position on the display rises as the user walks closer, which conveys the correct distance for controlling the pointer and gives access to higher areas of the display.
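
As a rough illustration of the geometry involved (not the paper's implementation), the sketch below projects depth-camera points onto the display plane along an assumed virtual light direction and takes the topmost shadow point as the pointer; the point set, axes, and light direction are all hypothetical.

```python
# Rough illustration (not the paper's implementation): cast a virtual shadow of
# depth-camera points onto the display plane (z = 0, y up) and pick a pointer.
import numpy as np

def shadow_pointer(points, light_dir):
    """points: (N, 3) body points in display coordinates; light_dir: (3,) direction
    of the angled virtual sun light. Returns the (x, y) shadow point, chosen here
    as the topmost shadow point (one possible choice of pointer)."""
    d = light_dir / np.linalg.norm(light_dir)
    t = -points[:, 2] / d[2]                 # ray p + t*d meets the plane z = 0
    shadow = points[:, :2] + t[:, None] * d[:2]
    return shadow[np.argmax(shadow[:, 1])]   # highest shadow point as the pointer

# With the light angled downward toward the display, the shadow rises as the
# user steps closer (smaller z), matching the interaction described above.
pts = np.array([[0.0, 1.6, 0.8], [0.1, 1.2, 0.9]])       # hypothetical body points
print(shadow_pointer(pts, light_dir=np.array([0.0, -1.0, -1.0])))
```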

Abstract

Two related challenges with current teleoperated robotic systems are a lack of peripheral vision and awareness, and the difficulty or tedium of navigating through remote spaces. We address these challenges by providing an interface with a focus plus context (F+C) view of the robot's location, in which the user can navigate simply by looking where they want to go and clicking or drawing a path on the view to indicate the desired trajectory or destination. The F+C view provides an undistorted, perspectively correct central region surrounded by a wide field of view peripheral portion, and avoids the need for separate views. The navigation method is direct and intuitive in comparison to keyboard- or joystick-based navigation, which requires the user to be in a control loop as the robot moves. Both the F+C views and the direct click navigation were evaluated in a preliminary user study.
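
To illustrate the click-to-navigate idea, the following sketch converts a click in the undistorted focus region into a goal point on a flat floor using assumed pinhole intrinsics; the camera convention, intrinsics, and function name are illustrative, not the system's actual code.

```python
# Illustrative sketch (not the system's code): map a click in the focus region
# to a navigation goal on a flat floor, assuming pinhole intrinsics and a camera
# looking along +z with the y axis pointing down.
import numpy as np

def click_to_ground_target(u, v, K, cam_height):
    """u, v: clicked pixel; K: 3x3 intrinsics; cam_height: camera height (m).
    Returns the (x, z) goal on the floor in camera coordinates."""
    ray = np.linalg.inv(K) @ np.array([u, v, 1.0])   # back-project the pixel to a ray
    if ray[1] <= 0:
        raise ValueError("click is above the horizon; no floor intersection")
    t = cam_height / ray[1]                          # scale the ray to the floor plane
    point = t * ray
    return point[0], point[2]

K = np.array([[600.0, 0, 320], [0, 600.0, 240], [0, 0, 1]])   # assumed intrinsics
print(click_to_ground_target(400, 300, K, cam_height=1.2))    # goal ahead of the robot
```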

Abstract

The confluence of technologies such as telepresence, immersive imaging, model-based virtual mirror worlds, and mobile live streaming gives rise to a capability for people anywhere to view and connect with present or past events nearly anywhere on earth. This capability properly belongs to a public commons, available as a birthright of all humans, and can be seen as part of an evolutionary transition supporting a global collective mind. We describe examples and elements of this capability, and suggest how they can be better integrated through a tool we call TeleViewer and a framework called WorldViews, which supports easy sharing of views as well as connecting providers and consumers of views all around the world.

Abstract

In this paper we report findings from two user studies that explore the problem of establishing common viewpoint in the context of a wearable telepresence system. In our first study, we assessed the ability of a local person (the guide) to identify the view orientation of the remote person by looking at the physical pose of the telepresence device. In the follow-up study, we explored visual feedback methods for communicating the relative viewpoints of the remote user and the guide via a head-mounted display. Our results show that actively observing the pose of the device is useful for viewpoint estimation. However, in the case of telepresence devices without physical directional affordances, a live video feed may yield comparable results. Lastly, more abstract visualizations lead to significantly longer recognition times, but may be necessary in more complex environments.

Abstract

As video-mediated communication reaches broad adoption, improving immersion and social interaction are important areas of focus in the design of tools for exploration and work-based communication. Here we present three threads of research focused on developing new ways of enabling exploration of a remote environment and interacting with the people and artifacts therein.

Abstract

Our research focuses on improving the effectiveness and usability of driving mobile telepresence robots by increasing the user's sense of immersion during the navigation task. To this end we developed a robot platform that allows immersive navigation using head-tracked stereoscopic video and an HMD. We present the results of an initial user study that compares System Usability Scale (SUS) ratings of a robot teleoperation task using head-tracked stereo vision with a baseline fixed video feed, and the effect of a low or high placement of the camera(s). Our results show significantly higher ratings for the fixed video condition and no effect of camera placement. Future work will focus on examining the reasons for the lower ratings of stereo video and on exploring further visual navigation interfaces.

Abstract

Telepresence systems usually lack mobility. Polly, a wearable telepresence device, allows users to explore remote locations or experience events remotely by means of a person who serves as a mobile "guide". We built a series of hardware prototypes, and our current, most promising embodiment consists of a smartphone mounted on a wearable, stabilized gimbal. The gimbal enables remote control of the viewing angle and provides active image stabilization while the guide is walking. We present qualitative findings from a series of 8 field tests using either Polly or only a mobile phone. We found that guides felt more physical comfort when using Polly vs. a phone and that Polly was accepted by other persons at the remote location. Remote participants appreciated the stabilized video and the ability to control the camera view. Connection and bandwidth issues appear to be the most challenging issues for Polly-like systems.

Abstract

Polly is an inexpensive, portable telepresence device based on the metaphor of a parrot riding a guide's shoulder and acting as a proxy for remote participants. Although remote users may be anyone with a desire for 'tele-visits', we focus on limited-mobility users. We present a series of prototypes and field tests that informed design iterations. Our current implementations utilize a smartphone on a stabilized, remotely controlled gimbal that can be hand held, placed on perches, or carried by a wearable frame. We describe findings from trials at campus, museum, and faire tours with remote users, including quadriplegics. We found guides were more comfortable using Polly than a phone and that Polly was accepted by other people. Remote participants appreciated stabilized video and having control of the camera. One challenge is negotiation of movement and view control. Our tests suggest Polly is an effective alternative to telepresence robots, phones, or fixed cameras.

Abstract

We describe Explorer, a system utilizing mirror worlds: dynamic 3D virtual models of physical spaces that reflect the structure and activities of those spaces to help support navigation, context awareness, and tasks such as planning and recollection of events. A rich sensor network dynamically updates the models, determining the position of people, the status of rooms, or updating textures to reflect displays or bulletin boards. Through views on web pages, portable devices, or on 'magic window' displays located in the physical space, remote people may 'look in' to the space, while people within the space are provided with augmented views showing information not physically apparent. For example, by looking at a mirror display, people can learn how long others have been present, or where they have been. People in one part of a building can get a sense of activities in the rest of the building, know who is present in their office, and look in to presentations in other rooms. A spatial graph derived from the 3D models is used both for computing navigational paths and for fusing the acoustic, WiFi, motion, and image sensors used for positioning. We describe usage scenarios for the system as deployed in two research labs and a conference venue.
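
One way such a spatial graph can support navigation is simple weighted shortest-path search; the sketch below, using networkx with made-up rooms and distances, is only an illustration of that idea, not Explorer's implementation.

```python
# Illustrative sketch (assumed structure, not Explorer's code): a spatial graph
# of rooms and corridors derived from the building model, used to compute paths.
import networkx as nx

g = nx.Graph()
# Nodes are hypothetical locations; edge weights are walking distances in meters.
g.add_weighted_edges_from([
    ("lobby", "corridor_1", 8.0),
    ("corridor_1", "room_101", 3.5),
    ("corridor_1", "corridor_2", 12.0),
    ("corridor_2", "conference_room", 4.0),
])

path = nx.shortest_path(g, "lobby", "conference_room", weight="weight")
print(path)  # ['lobby', 'corridor_1', 'corridor_2', 'conference_room']
```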

Abstract

Audio-based receiver localization in indoor environments has multiple applications, including indoor navigation, location tagging, and tracking. Public places like shopping malls and consumer stores often have loudspeakers installed to play music for public entertainment. Similarly, office spaces may have sound-conditioning speakers installed to soften other environmental noises. We discuss an approach that leverages this infrastructure to perform audio-based localization of devices requesting localization in such environments, by playing barely audible controlled sounds from multiple speakers at known positions. Our approach can be used to localize devices such as smartphones, tablets, and laptops to sub-meter accuracy. The user does not need to carry any specialized hardware. Unlike acoustic approaches that use high-energy ultrasound waves, the use of barely audible (low-energy) signals in our approach poses very different challenges. We discuss these challenges, how we addressed them, and experimental results on two prototypical implementations: a request-play-record localizer and a continuous tracker. We evaluated our approach in a real-world meeting room and report promising initial results, with localization accuracy within half a meter 94% of the time. The system has been deployed in multiple zones of our office building and is now part of a location service in constant operation in our lab.
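
The abstract does not spell out the estimation step; one common formulation is least-squares multilateration from ranges derived from the measured sound travel times, sketched below with hypothetical speaker positions and ranges.

```python
# Illustrative sketch (not necessarily the paper's algorithm): least-squares
# multilateration of a device from estimated ranges to speakers at known positions.
import numpy as np
from scipy.optimize import least_squares

speakers = np.array([[0.0, 0.0], [5.0, 0.0], [5.0, 4.0], [0.0, 4.0]])  # known positions (m)
ranges = np.array([2.9, 3.1, 3.4, 3.2])   # ranges from measured sound travel times (m)

def residuals(p):
    # Difference between predicted and measured speaker-to-device distances.
    return np.linalg.norm(speakers - p, axis=1) - ranges

est = least_squares(residuals, x0=np.array([2.5, 2.0])).x
print(est)  # estimated device position in the room (m)
```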

Abstract

We describe a system for supporting mirror worlds - 3D virtual models of physical spaces that reflect the structure and activities of those spaces to help support context awareness and tasks such as planning and recollection of events. Through views on web pages, portable devices, or on 'magic window' displays located in the physical space, remote people may 'look in' to the space, while people within the space are provided information not apparent through unaided perception. For example, by looking at a mirror display, people can learn how long others have been present, or where they have been. People in one part of a building can get a sense of activities in the rest of the building, know who is present in their office, and look in to presentations in other rooms. The system can be used to bridge across sites and help provide different parts of an organization with a shared awareness of each other's space and activities. We describe deployments of our mirror world system at several locations.

Abstract

We describe a system that mirrors a public physical space into cyberspace to provide people with augmented awareness of that space. Through views on web pages, portable devices, or on 'Magic Window' displays located in the physical space, remote people may 'look in' to the space, while people within the space are provided information not apparent through unaided perception. For example, by looking at a mirror display, people can learn how long others have been present, where they have been, etc. People in one part of a building can get a sense of the activities in the rest of the building, who is present in their office, look in to a talk in another room, etc. We describe a prototype for such a system developed in our research lab and office space.

Abstract

We will exhibit several aspects of a complex mixed reality system that we have built and deployed in a real-world factory setting. In our system, virtual worlds, augmented realities, and mobile applications are all fed from the same infrastructure. In collaboration with TCHO, a chocolate maker in San Francisco, we built a virtual “mirror” world of a real-world chocolate factory and its processes. Sensor data is imported into the multi-user 3D environment from hundreds of sensors on the factory floor. The resulting virtual factory is used for simulation, visualization, and collaboration, using a set of interlinked, real-time layers of information. Another part of our infrastructure is designed to support appropriate industrial uses for mobile devices such as cell phones and tablet computers. We deployed this system at the real-world factory in 2009, and it is now in daily use there. By simultaneously developing mobile, virtual, and web-based display and collaboration environments, we aimed to create an infrastructure that did not skew toward one type of application but that could serve many at once, interchangeably. Through this mixture of mobile, social, mixed and virtual technologies, we hope to create systems for enhanced collaboration in industrial settings between physically remote people and places, such as factories in China with managers in the US.

Abstract

We propose an Augmented Reality (AR) system that helps users take a picture from a designated pose, such as the position and camera angle of an earlier photo. Repeat photography is frequently used to observe and document changes in an object. Our system uses AR technology to estimate camera poses in real time. When a user takes a photo, the camera pose is saved as a 'view bookmark.' To support a user in taking a repeat photo, two simple graphics are rendered in an AR viewer on the camera's screen to guide the user to this bookmarked view. The system then uses image adjustment techniques to create an image based on the user's repeat photo that is even closer to the original.
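
As a rough sketch of the guidance computation (our assumption, not necessarily the system's), the following code compares the current AR camera pose to the saved bookmark and reports translation and rotation errors that the two on-screen graphics could visualize.

```python
# Minimal sketch (assumed representation, not the paper's code): comparing the
# current AR camera pose to a saved 'view bookmark' to guide the user back.
import numpy as np

def pose_offset(R_cur, t_cur, R_ref, t_ref):
    """R_*: 3x3 rotation matrices; t_*: 3-vector positions (world frame).
    Returns translation error (m) and rotation error (degrees)."""
    dt = np.linalg.norm(t_ref - t_cur)
    R_delta = R_ref @ R_cur.T
    angle = np.degrees(np.arccos(np.clip((np.trace(R_delta) - 1) / 2, -1.0, 1.0)))
    return dt, angle

# The AR viewer could render one graphic for the translation offset and another
# for the rotation offset until both errors fall below a threshold.
dt, angle = pose_offset(np.eye(3), np.array([0.1, 0.0, 0.0]), np.eye(3), np.zeros(3))
print(dt, angle)  # 0.1 0.0
```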

Abstract

Virtual, mobile, and mixed reality systems have diverse uses for data visualization and remote collaboration in industrial settings, especially factories. We report our experiences in designing complex mixed-reality collaboration, control, and display systems for a real-world factory, for delivering real-time factory information to multiple users. In collaboration with (blank for review), a chocolate maker in San Francisco, our research group is building a virtual “mirror” world of a real-world chocolate factory and its processes. Real-world sensor data (such as temperature and machine state) is imported into the 3D environment from hundreds of sensors on the factory floor. Multi-camera imagery from the factory is also available in the multi-user 3D factory environment. The resulting "virtual factory" is designed for simulation, visualization, and collaboration, using a set of interlinked, real-time 3D and 2D layers of information about the factory and its processes. We are also looking at appropriate industrial uses for mobile devices such as cell phones and tablet computers, and how they intersect with virtual worlds and mixed realities. For example, an experimental iPhone web app provides mobile laboratory monitoring and control. The app allows a real-time view into the lab via steerable camera and remote control of lab machines. The mobile system is integrated with the database underlying the virtual factory world. These systems were deployed at the real-world factory and lab in 2009, and are now in beta development. Through this mashup of mobile, social, mixed and virtual technologies, we hope to create industrial systems for enhanced collaboration between physically remote people and places – for example, factories in China with managers in Japan or the US.

Abstract

Creating virtual models of real spaces and objects is cumbersome and time consuming. This paper focuses on the problem of geometric reconstruction from sparse data obtained from certain image-based modeling approaches. A number of elegant and simple-to-state problems arise concerning when the geometry can be reconstructed. We describe results and counterexamples, and list open problems.

Abstract

This project investigates practical uses of virtual, mobile, and mixed reality systems in industrial settings, in particular control and collaboration applications for factories. In collaboration with TCHO, a chocolate maker start-up in San Francisco, we have built virtual mirror-world representations of a real-world chocolate factory and are importing its data and modeling its processes. The system integrates mobile devices such as cell phones and tablet computers. The resulting "virtual factory" is a cross-reality environment designed for simulation, visualization, and collaboration, using a set of interlinked, real-time 3D and 2D layers of information about the factory and its processes.

Abstract

The Pantheia system enables users to create virtual models by 'marking up' the real world with pre-printed markers. The markers have predefined meanings that guide the system as it creates models. Pantheia takes as input user-captured images or video of the marked-up space. This video illustrates the workings of the system and shows it being used to create three models: one of a cabinet, one of a lab, and one of a conference room. As part of the Pantheia system, we also developed a 3D viewer that spatially integrates a model with images of the model.

Abstract

FXPAL's Pantheia system enables users to create virtual models by 'marking up' a physical space with pre-printed visual markers. The meanings associated with the markers come from a markup language that enables the system to create models from a relatively sparse set of markers. This paper describes extensions to our markup language and system that support the creation of interactive virtual objects. Users place markers to define components such as doors and drawers with which an end user of the model can interact. Other interactive elements, such as controls for color changes or lighting choices, are also supported. Pantheia produced a model of a room with hinged doors, a cabinet with drawers, doors, and color options, and a railroad track.

Abstract

This paper presents a tool and a novel Fast Invariant Transform (FIT) algorithm for language independent e-documents access. The tool enables a person to access an e-document through an informal camera capture of a document hardcopy. It can save people from remembering/exploring numerous directories and file names, or even going through many pages/paragraphs in one document. It can also facilitate people’s manipulation of a document or people’s interactions through documents. Additionally, the algorithm is useful for binding multimedia data to language independent paper documents. Our document recognition algorithm is inspired by the widely known SIFT descriptor [4] but can be computed much more efficiently for both descriptor construction and search. It also uses much less storage space than the SIFT approach. By testing our algorithm with randomly scaled and rotated document pages, we can achieve a 99.73% page recognition rate on the 2188-page ICME06 proceedings and 99.9% page recognition rate on a 504-page Japanese math book.
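
FIT's internals are not reproduced here; as a generic stand-in for camera-capture-to-page matching, the sketch below uses OpenCV's ORB features (explicitly not FIT) to rank stored page images against a query photo; file names are hypothetical.

```python
# Illustrative stand-in (OpenCV ORB, not the FIT descriptor described above):
# matching a camera capture of a printed page against stored page images.
import cv2

orb = cv2.ORB_create()
matcher = cv2.BFMatcher(cv2.NORM_HAMMING, crossCheck=True)

def best_page(query_path, page_paths):
    q = cv2.imread(query_path, cv2.IMREAD_GRAYSCALE)
    _, q_desc = orb.detectAndCompute(q, None)
    scores = []
    for path in page_paths:
        page = cv2.imread(path, cv2.IMREAD_GRAYSCALE)
        _, desc = orb.detectAndCompute(page, None)
        matches = matcher.match(q_desc, desc)
        scores.append((len(matches), path))   # more matches ~ better page candidate
    return max(scores)[1]

# Hypothetical usage:
# print(best_page("capture.jpg", ["page_001.png", "page_002.png"]))
```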

Abstract

In this paper, we describe an automatic lighting adjustment method for browsing object images. From a set of images of an object, taken under different lighting conditions, we generate two types of illuminated images: a textural image, which eliminates unwanted specular reflections of the object, and a highlight image, in which specularities of the object are highly preserved. Our user interface allows viewers to digitally zoom into any region of the image, and the lighting-adjusted images are automatically generated for the selected region and displayed. Switching between the textural and the highlight images helps viewers to understand characteristics of the object surface.
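
The abstract does not specify how the two images are composed; one simple, plausible illustration (not necessarily the paper's method) is a per-pixel statistic over the image stack, with the median suppressing sparse specular spikes and the maximum preserving them.

```python
# Illustrative sketch (one plausible composition, not necessarily the paper's):
# combine an image stack taken under different lights into a 'textural' image
# that suppresses specular highlights and a 'highlight' image that keeps them.
import numpy as np

def compose(stack):
    """stack: (N, H, W) grayscale images of the same object under N lights."""
    textural = np.median(stack, axis=0)   # median suppresses sparse specular spikes
    highlight = stack.max(axis=0)         # per-pixel max preserves specularities
    return textural, highlight

# Hypothetical data standing in for captured images of a zoomed region.
rng = np.random.default_rng(0)
textural, highlight = compose(rng.random((8, 4, 4)))
```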

Abstract

We describe Pantheia, a system that constructs virtual models of real spaces from collections of images, through the use of visual markers that guide and constrain model construction. To create a model users simply 'mark up' the real world scene by placing pre-printed markers that describe scene elements or impose semantic constraints. Users then collect still images or video of the scene. From this input, Pantheia automatically and quickly produces a model. The Pantheia system was used to produce models of two rooms that demonstrate the effectiveness of the approach.
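
As a stand-in for marker-driven reconstruction (Pantheia's own markers and markup language are not reproduced here), the sketch below uses OpenCV's ArUco module (from the opencv-contrib build) to detect printed markers in a captured image and map their ids to hypothetical scene meanings.

```python
# Illustrative stand-in (OpenCV ArUco, not Pantheia's marker system): detect
# printed markers; each detected id would map to a scene element or constraint.
import cv2

MARKER_MEANINGS = {0: "wall-corner", 1: "door", 2: "cabinet-face"}  # hypothetical ids

def detect_scene_markers(image_path):
    gray = cv2.imread(image_path, cv2.IMREAD_GRAYSCALE)
    dictionary = cv2.aruco.getPredefinedDictionary(cv2.aruco.DICT_4X4_50)
    corners, ids, _ = cv2.aruco.detectMarkers(gray, dictionary)
    if ids is None:
        return []
    return [(MARKER_MEANINGS.get(int(i), "unknown"), c.reshape(4, 2))
            for i, c in zip(ids.flatten(), corners)]

# Each (meaning, corner quadrilateral) pair constrains where the corresponding
# element is placed in the reconstructed model.
```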

Abstract

As the use of rich media in mobile devices and smart environments becomes more sophisticated, so must the design of the everyday objects used as controllers and interfaces. Many new interfaces simply tack electronic systems onto existing forms. However, an original physical design for a smart artefact, that integrates new systems as part of the form of the device, can enhance the end-use experience. The Convertible Podium is an experiment in the design of a smart artefact with complex integrated systems for the use of rich media in meeting rooms. It combines the highly designed look and feel of a modern lectern with systems that allow it to serve as a central control station for rich media manipulation. The interface emphasizes tangibility and ease of use in controlling multiple screens, multiple media sources (including mobile devices) and multiple distribution channels, and managing both data and personal representation in remote telepresence.

Abstract

This video shows the Virtual Physics Circus, a kind of playground for experimenting with simple physical models. The system makes it easy to create worlds with common physical objects such as swings, vehicles, ramps, and walls, and to interactively play with those worlds. The system can be used as a creative art medium as well as to gain understanding and intuition about physical systems. The system can be controlled by a number of UI devices such as mouse, keyboard, joystick, and tags that are tracked in six degrees of freedom.
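
The Circus's engine is not described here; the tiny pymunk example below (an assumed stand-in) conveys the idea of assembling a world from simple objects and stepping the simulation.

```python
# Tiny illustrative sketch (pymunk, not the Circus's own engine): build a world
# with a ramp and a ball, then step the simulation.
import pymunk

space = pymunk.Space()
space.gravity = (0.0, -981.0)            # gravity pointing down, y up

ramp = pymunk.Segment(space.static_body, (0, 50), (300, 0), radius=2)
ramp.friction = 0.8
space.add(ramp)

ball_body = pymunk.Body(mass=1.0, moment=pymunk.moment_for_circle(1.0, 0, 10))
ball_body.position = (20, 120)
ball = pymunk.Circle(ball_body, radius=10)
ball.friction = 0.8
space.add(ball_body, ball)

for _ in range(180):                     # ~3 seconds at 60 Hz
    space.step(1 / 60.0)
print(ball_body.position)                # the ball has dropped onto and rolled down the ramp
```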

Abstract

Current approaches to pose estimation and tracking can be classified into two categories: generative and discriminative. While generative approaches can accurately determine human pose from image observations, they are computationally intractable due to search in the high-dimensional human pose space. On the other hand, discriminative approaches do not generalize well, but are computationally efficient. We present a hybrid model that combines the strengths of the two in an integrated learning and inference framework. We extend the Gaussian process latent variable model (GPLVM) to include an embedding from observation space (the space of image features) to the latent space. GPLVM is a generative model, but the inclusion of this mapping provides a discriminative component, making the model observation driven. Observation-Driven GPLVM (OD-GPLVM) not only provides a faster inference approach, but also more accurate estimates (compared to GPLVM) in cases where dynamics are not sufficient for the initialization of search in the latent space. We also extend OD-GPLVM to learn and estimate poses from parameterized actions/gestures. Parameterized gestures are actions which exhibit large systematic variation in joint-angle space for different instances due to differences in contextual variables. For example, the joint angles in a forehand tennis shot are a function of the height of the ball (Figure 2). We learn these systematic variations as a function of the contextual variables. We then present an approach to use information from the scene/object to provide context for human pose estimation for such parameterized actions.
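
OD-GPLVM itself is not reproduced here; the minimal sketch below only illustrates the idea of a learned back-mapping from image features to latent coordinates, using scikit-learn GP regression on made-up data to produce an observation-driven initialization for latent-space search.

```python
# Minimal illustration (scikit-learn GP regression, not the OD-GPLVM itself):
# learn a mapping from image-feature space to latent pose coordinates so that
# inference can start from an observation-driven initialization.
import numpy as np
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import RBF

rng = np.random.default_rng(0)
features = rng.normal(size=(200, 50))          # stand-in image features
latent = rng.normal(size=(200, 2))             # stand-in 2-D latent pose coordinates

back_map = GaussianProcessRegressor(kernel=RBF(length_scale=5.0)).fit(features, latent)

new_obs = rng.normal(size=(1, 50))
latent_init = back_map.predict(new_obs)        # observation-driven starting point
print(latent_init.shape)                       # (1, 2): seed for latent-space search
```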