
Research

The Institute of Computer Graphics carries out research in the modern field known as "visual computing". Our core disciplines cover the imaging, processing, visualization, and display of visual data. These are enabled by new fields and technologies, such as light fields, projector-camera systems, responsive optics, mobile computing, and visual analytics.

Please select one of the following topics or years for detailed information.

Imaging

A classification sensor based on compressed optical Radon transform

We present a thin-film sensor that optically measures the Radon transform of an image focused onto it. Measuring and classifying directly in Radon space, rather than in image space, is fast and yields robust and high classification rates. We explain how the number of integral measurements required for a given classification task can be reduced by several orders of magnitude. Our experiments achieve classification rates of 98%–99% for complex hand gesture and motion detection tasks with as few as 10 photosensors. Our findings have the potential to stimulate further research towards a new generation of application-oriented classification sensors for use in areas such as biometrics, security, diagnostics, surface inspection, and human-computer interfaces.
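The core idea, classifying directly in a space of a few line integrals rather than in image space, can be sketched as follows. This is a minimal illustration, not the sensor's optical model: three projection directions (row, column, and diagonal sums) pooled into a few "photosensor" bins serve as the feature vector, and a nearest-neighbor rule classifies. All names and the toy "gestures" are illustrative.

```python
import numpy as np

def radon_features(img, n_bins=4):
    """A handful of line integrals of the image: row sums (0 deg),
    column sums (90 deg), and diagonal sums (45 deg), each pooled into
    a few 'photosensor' bins -- a crude compressed Radon transform."""
    h, w = img.shape
    rows = img.sum(axis=1)
    cols = img.sum(axis=0)
    diag = np.array([np.trace(img, offset=o) for o in range(-h + 1, w)])
    pool = lambda v: np.array([c.sum() for c in np.array_split(v, n_bins)])
    return np.concatenate([pool(rows), pool(cols), pool(diag)])

def classify(feat, train_feats, train_labels):
    """Nearest neighbor directly in the measurement (Radon) space."""
    d = np.linalg.norm(train_feats - feat, axis=1)
    return int(train_labels[np.argmin(d)])

# two toy 'gestures': a horizontal bar and a vertical bar
horiz, vert = np.zeros((8, 8)), np.zeros((8, 8))
horiz[3, :], vert[:, 3] = 1.0, 1.0
train = np.stack([radon_features(horiz), radon_features(vert)])
labels = np.array([0, 1])
```

Because the features are sums over many pixels, small per-pixel noise barely moves the feature vector, which hints at why so few integral measurements can suffice.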

LumiConSense, a transparent, flexible, scalable, and disposable thin-film image sensor, has the potential to lead to new human-computer interfaces that are unconstrained in shape and sensing-distance. In this article we make four new contributions: (1) A new real-time image reconstruction method that results in a significant enhancement of image quality compared to previous approaches; (2) the efficient combination of image reconstruction and shift-invariant linear image processing operations; (3) various hardware and software prototypes which realize the above contributions, demonstrating the current potential of our sensor for real-time applications; and finally, (4) a further, higher-quality offline reconstruction algorithm.

We present a fully transparent and flexible light-sensing film that, based on a single thin-film luminescent concentrator layer, supports simultaneous multi-focal image reconstruction and depth estimation without additional optics. Together with the sampling of two-dimensional light fields propagated inside the film layer under various focal conditions, it allows entire focal image stacks to be computed after only one recording that can be used for depth estimation. The transparency and flexibility of our sensor unlock the potential of lensless multilayer imaging and depth sensing with arbitrary sensor shapes – enabling novel human-computer interfaces.

Video 1
This video shows a laser pointer shining on the LC surface and the captured light field.

Video 2
This video visualizes the scanning range of all apertures.

Video 3
This video presents real-time reconstructions of shadows cast onto the LC surface. To achieve real-time rates we limit measurements to short exposure and low dynamic range. Compared to high-dynamic-range measurements, this leads to image reconstructions with a lower S/N.

Video 4
This video sweeps through a reconstructed focal stack and shows two simultaneous targets in focus at different depths.

Video 5
This video presents interactive depth reconstructions from shadows cast by a moving target. The 216 × 216 mm sensor prototype was used in this case. The lateral reconstruction resolution was 16 × 16 pixels; the axial reconstruction resolution was 20 σ-levels (0.3–0.9). Note that the low S/N made high-dynamic-range measurements necessary, which limited the reconstruction speed to 0.5 frames per second. Note also that the shortest target-sensor distance is constrained by the lateral reconstruction resolution: if the target is too close, its defocus cannot be resolved precisely with only 16 × 16 pixels.
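The depth estimates described above come from focal stacks. A generic depth-from-focus baseline (not the sensor's actual reconstruction algorithm) illustrates the principle: for each pixel, pick the focal slice in which it is sharpest, measured here by the magnitude of a discrete Laplacian. The synthetic stack and all names are illustrative.

```python
import numpy as np

def laplacian(img):
    """4-neighbor discrete Laplacian with edge-replicated borders."""
    p = np.pad(img, 1, mode="edge")
    return (p[:-2, 1:-1] + p[2:, 1:-1] + p[1:-1, :-2] + p[1:-1, 2:]
            - 4.0 * p[1:-1, 1:-1])

def depth_from_focus(stack):
    """stack: (n_slices, H, W). For each pixel, return the index of the
    slice in which it is sharpest (largest Laplacian magnitude)."""
    sharpness = np.abs(np.array([laplacian(s) for s in stack]))
    return sharpness.argmax(axis=0)

# synthetic focal stack: a step edge, sharp in slice 0, blurred below
step = np.zeros((9, 9)); step[:, 4:] = 1.0
blur = lambda im: (np.roll(im, 1, axis=1) + im + np.roll(im, -1, axis=1)) / 3.0
stack = np.array([step, blur(step), blur(blur(step))])
depth = depth_from_focus(stack)
```

At the edge pixels the unblurred slice wins, so the recovered index encodes the focal setting, and hence the depth, of the target.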

We present a novel approach to recording and computing panorama light fields. In contrast to previous methods that estimate panorama light fields from focal stacks or naive multi-perspective image stitching, our approach is the first that processes ray entries directly and does not require depth reconstruction or matching of image features. Arbitrarily complex scenes can therefore be captured while preserving correct occlusion boundaries, anisotropic reflections, refractions, and other light effects that go beyond diffuse reflections of Lambertian surfaces.

Capturing exposure sequences to compute high dynamic range (HDR) images causes motion blur in cases of camera movement. This also applies to light-field cameras: frames rendered from multiple blurred HDR light-field perspectives are also blurred. While the recording times of exposure sequences cannot be reduced for a single-sensor camera, we demonstrate how this can be achieved for a camera array. Thus, we decrease capturing time and reduce motion blur for HDR light-field video recording. Applying a spatio-temporal exposure pattern while capturing frames with a camera array reduces the overall recording time and enables the estimation of camera movement within one light-field video frame. By estimating depth maps and local point spread functions (PSFs) from multiple perspectives with the same exposure, regional motion deblurring can be supported. Missing exposures at various perspectives are then interpolated.

Capturing exposure sequences for computing HDR images is prone to motion blur, which also affects HDR light-field recording.
We record four exposures encoded at varying camera perspectives and deblur long-exposure recordings by tracking features in low-exposure recordings.
This reduces motion blur and leads to shorter recording intervals.
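The per-pixel merge that underlies exposure-sequence HDR can be sketched as a weighted average of radiance estimates; this is a generic baseline, not the paper's spatio-temporal camera-array scheme, and the hat-shaped weighting is one common, assumed choice.

```python
import numpy as np

def merge_hdr(frames, times):
    """Merge differently exposed frames into radiance: each frame's
    pixel/exposure-time estimate is averaged with a 'hat' weight that
    suppresses under- and over-exposed pixels."""
    num = np.zeros_like(frames[0])
    den = np.zeros_like(frames[0])
    for f, t in zip(frames, times):
        w = 1.0 - np.abs(2.0 * f - 1.0)   # 0 at black/white, 1 at mid-gray
        num += w * f / t
        den += w
    return num / np.maximum(den, 1e-8)

# a constant scene of radiance 0.3 captured at three exposure times;
# the longest exposure saturates and is weighted out entirely
times = [1.0, 2.0, 4.0]
frames = [np.clip(0.3 * t, 0.0, 1.0) * np.ones((2, 2)) for t in times]
radiance = merge_hdr(frames, times)
```

Distributing the exposures in such a sequence over the cameras of an array, as the abstract describes, is what shortens the overall recording time.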

Most image sensors are planar, opaque, and inflexible. We present a novel image sensor that is based on a luminescent concentrator (LC) film which absorbs light from a specific portion of the spectrum. The absorbed light is re-emitted at a lower frequency and transported to the edges of the LC by total internal reflection. The light transport is measured at the border of the film by line scan cameras. With these measurements, images that are focused onto the LC surface can be reconstructed. Thus, our image sensor is fully transparent, flexible, scalable and, due to its low cost, potentially disposable.
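Reconstructing an image from the edge measurements amounts to inverting a linear light-transport model. A toy version with a random stand-in for the calibrated transport matrix (the real matrix would be measured per sensor) shows the regularized least-squares step; all sizes and names are illustrative.

```python
import numpy as np

rng = np.random.default_rng(0)
n_pix, n_meas = 16, 64            # a 4x4 image, 64 edge readings
A = rng.random((n_meas, n_pix))   # stand-in for the calibrated transport matrix

def reconstruct(y, A, lam=1e-3):
    """Tikhonov-regularized least squares: x = (A^T A + lam I)^-1 A^T y."""
    AtA = A.T @ A + lam * np.eye(A.shape[1])
    return np.linalg.solve(AtA, A.T @ y)

x_true = rng.random(n_pix)        # the unknown image focused onto the film
y = A @ x_true                    # simulated edge measurements
x_rec = reconstruct(y, A)
```

With more measurements than pixels and mild regularization, the reconstruction recovers the focused image closely; the real sensor additionally has to cope with noise and with light transport that varies when the film is bent.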

We present a caching framework with a novel probability-based prefetching and eviction strategy applied to atomic cache units that enables interactive rendering of gigaray light fields. Further, we describe two new use cases that are supported by our framework: panoramic light fields, including a robust imaging technique and an appropriate parameterization scheme for real-time rendering and caching; and light-field-cached volume rendering, which supports interactive exploration of large volumetric datasets using light-field rendering. We consider applications such as light-field photography and the visualization of large image stacks from modern scanning microscopes.

With increasing resolution of imaging sensors, light-field photography is now becoming increasingly practical, and the first light-field cameras are already commercially available (e.g., Lytro, Raytrix, and others). Applying common digital image processing techniques to light fields, however, is in many cases not straightforward. The reason is that the outcome must not only be spatially consistent, but also directionally consistent. Otherwise, refocusing and perspective changes will cause strong image artifacts. Panorama imaging techniques, for example, are an integral part of digital photography – often being supported by camera hardware today. We present a first approach towards the construction of panorama light fields (i.e., large field-of-view light fields computed from overlapping sub-light-field recordings).
By converting overlapping sub-light-fields into individual focal stacks, computing a panorama focal stack from them, and converting the panorama focal stack back into a panorama light field, we avoid the demand for a precise reconstruction of scene depth.

We show how the intrinsically performed JPEG compression of many digital still cameras leaves margin for deriving and applying image-adapted coded apertures that support retention of the most important frequencies after compression. These coded apertures, together with subsequently applied image processing, enable a higher light throughput than corresponding circular apertures, while preserving adjusted focus, depth of field, and bokeh. Higher light throughput leads to proportionally higher signal-to-noise ratios and reduced compression noise, or, alternatively, to shorter shutter times. We explain how adaptive coded apertures can be computed quickly, how they can be applied in lenses by using binary spatial light modulators, and how a resulting coded bokeh can be transformed into a common radial one.

With the increasing computational capacity of camera-equipped mobile phones, object recognition on such devices is shifting away from centralized client-server approaches, in which the phones act only as input/output front-ends, to local on-device classification systems. The advantages of such a decentralization are shorter response times, scalability with respect to a large number of simultaneous users, and reduced network traffic costs. Mobile image classification can support applications that rely on device localization, such as museum or city guidance, by supplementing existing positional information retrieved, for instance, from GPS or
GSM cells. The challenge for mobile image classification, however, is to become as robust as possible, even when applied in large, highly dynamic, and uncontrollable public environments: Hundreds to thousands of objects must be recognized from different perspectives, from varying distances, and under changing lighting conditions, while recognition rates must remain usable. The key to solving this problem may be automatic adaptation to dynamic changes in the environment and to the most common user behavior.
This paper summarizes the various components of our mobile museum guidance system PhoneGuide.

The combination of advanced software algorithms and optics opens up new possibilities for display, imaging, and lighting. It makes possible responsive optical systems that adapt to particular situations automatically and dynamically. Visual computing is a relatively young research field that provides a foundation for many of these approaches. It represents a tight coupling between image synthesis, image analysis, and visual perception. While optics is all about image formation, visual computing deals with the general processing of images. This paper summarizes several examples that illustrate how graphics, vision, perception, and optics are combined to realize smart projectors, smart cameras, and smart light sources.

We present a multi-image classification technique for mobile phones that is supported by relational reasoning. Users capture a sequence of images employing a simple near-far camera movement. After classifying distinct keyframes using a nearest-neighbor approach, the corresponding database images are only considered for a majority voting if they exhibit near-far inter-image relations similar to those of the captured keyframes. In the context of PhoneGuide, our adaptive mobile museum guidance system, a user study revealed that our multi-image classification technique leads to significantly higher classification rates than single image classification. Furthermore, when using near-far image relations, fewer keyframes are sufficient for classification. This increases the overall classification speed of our approach by up to 35%.
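The gist of relation-filtered majority voting can be sketched in a few lines; the exhibit labels and the coarse "near"/"far" tags below are hypothetical, and the real system derives the relations from the camera movement rather than from explicit tags.

```python
from collections import Counter

def vote(candidates, capture_relations):
    """candidates: per captured keyframe, a list of (label, relation)
    nearest-neighbor matches from the database; capture_relations: the
    'near'/'far' tag of each captured keyframe. Only database matches
    whose stored relation agrees with the capture enter the vote."""
    votes = []
    for matches, rel in zip(candidates, capture_relations):
        votes.extend(label for label, r in matches if r == rel)
    winner, _count = Counter(votes).most_common(1)[0]
    return winner

# two keyframes, each with two nearest-neighbor candidates
candidates = [[("vase", "near"), ("bust", "far")],
              [("vase", "far"), ("bust", "far")]]
winner = vote(candidates, ["near", "far"])
```

Filtering out candidates whose relation disagrees with the capture prunes wrong votes before the majority decision, which is why fewer keyframes suffice.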

We present an unobtrusive technique for supporting and improving object recognition approaches on mobile phones. To accomplish this we determine the present and future locations of museum visitors by evaluating user-generated spatio-temporal pathway data. In the context of our adaptive mobile museum guidance system called PhoneGuide we show that this improves the classification performance significantly and can achieve recognition rates comparable to those of traditional location-based image classification approaches. Over a period of four months, we collected the pathway data of 132 regular museum visitors at the Natural History Museum of Erfurt, Germany.

We show that optical inverse tone-mapping (OITM) in light microscopy can improve the visibility of specimens, both when observed directly through the oculars and when imaged with a camera. In contrast to previous microscopy techniques, we pre-modulate the illumination based on the local modulation properties of the specimen itself. We explain how the modulation of uniform white light by a specimen can be estimated in real-time, even though the specimen is continuously but not uniformly illuminated. This information is processed and back-projected constantly, allowing the illumination to be adjusted on the fly if the specimen is moved or the focus or magnification of the microscope is changed. The contrast of the specimen's optical image can be enhanced, and high-intensity highlights can be suppressed. A formal pilot study with users indicates that this optimizes the visibility of spatial structures when observed through the oculars. We also demonstrate that the signal-to-noise (S/N) ratio in digital images of the specimen is higher if captured under an optimized rather than a uniform illumination. In contrast to advanced scanning techniques that maximize the S/N ratio using multiple measurements, our approach is fast because it requires only two images. This can be beneficial for image analysis in digital microscopy applications with real-time capturing demands.

CAMShift is a well-established and fundamental algorithm for kernel-based visual object tracking. While it performs well with objects that have a simple and constant appearance, it is not robust in more complex cases. As it relies solely on back-projected probabilities, it can fail when the object's appearance changes (e.g., due to object or camera movement, or due to lighting changes), when similarly colored objects have to be re-detected, or when their trajectories cross.
We propose low-cost extensions to CAMShift that address and resolve all of these problems. They allow the accumulation of multiple histograms to model more complex object appearances and the continuous monitoring of object identities to handle ambiguous cases of partial or full occlusion. Most steps of our method are carried out on the GPU for achieving real-time tracking of multiple targets simultaneously. We explain efficient GPU implementations of histogram generation, probability back projection, computation of image moments, and histogram intersection. All of these techniques make full use of a GPU’s high parallelization capabilities.
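Two of the building blocks named above, histogram back-projection and the mean-shift window update, can be sketched on the CPU as follows. This is a minimal single-target, single-histogram illustration of the standard algorithm, not the paper's GPU extensions; the hue quantization and window parameters are assumptions.

```python
import numpy as np

def backproject(hue_img, hist):
    """Per-pixel probability of belonging to the target: look up each
    pixel's hue bin (hue in [0,1]) in the target's normalized histogram."""
    bins = np.clip((hue_img * len(hist)).astype(int), 0, len(hist) - 1)
    return hist[bins]

def mean_shift(prob, window, n_iter=10):
    """Shift an (x, y, w, h) window to the centroid of the back-projected
    probability mass inside it; CAMShift additionally rescales the window."""
    x, y, w, h = window
    for _ in range(n_iter):
        roi = prob[y:y + h, x:x + w]
        m = roi.sum()
        if m == 0:
            break
        ys, xs = np.mgrid[0:h, 0:w]
        cx, cy = (roi * xs).sum() / m, (roi * ys).sum() / m
        x = int(round(x + cx - w / 2))
        y = int(round(y + cy - h / 2))
    return x, y, w, h

# a 4x4 blob of probability mass; the window walks onto its centroid
prob = np.zeros((20, 20)); prob[10:14, 12:16] = 1.0
tracked = mean_shift(prob, (8, 6, 8, 8))
```

The paper's extensions replace the single histogram with an accumulated set and add identity monitoring; the per-frame inner loop, however, is still this centroid climb on a probability map.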

We show how temporal backdrops that alternately change their color rapidly at recording rate can aid chroma keying by transforming color spill into a neutral background illumination. Since the chosen colors sum up to white, the chromatic (color) spill component is neutralized when integrating over both backdrop states. Being able to separate both states, however, additionally allows high-quality alpha mattes to be computed. Besides neutralizing color spill, our method is invariant to foreground colors and supports applications with real-time demands. In this article, we explain different realizations of temporal backdrops and describe how keying and color spill neutralization are carried out, how artifacts resulting from rapid motion can be reduced, and how our approach can be implemented to be compatible with common real-time post-production pipelines.
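Why two complementary backdrop states yield an alpha matte can be seen from a simple compositing model: with premultiplied foreground F and backdrops B1 + B2 = white, the two observations are C_i = F + (1 - alpha) * B_i, so their difference isolates alpha. The sketch below assumes exactly this idealized model, ignoring motion and sensor noise.

```python
import numpy as np

def matte(C1, C2, B1, B2):
    """C1, C2: frames of the same (premultiplied) foreground over the two
    complementary backdrop colors B1, B2 (B1 + B2 = white). Returns the
    per-pixel alpha and the premultiplied foreground color."""
    diff = B1 - B2
    ch = int(np.argmax(np.abs(diff)))          # channel with the largest contrast
    alpha = 1.0 - (C1[..., ch] - C2[..., ch]) / diff[ch]
    alpha = np.clip(alpha, 0.0, 1.0)
    F = C1 - (1.0 - alpha)[..., None] * B1     # remove the backdrop term
    return alpha, F

# simulate one pixel: alpha 0.7, premultiplied foreground color Fpm
B1, B2 = np.array([1.0, 0.0, 0.0]), np.array([0.0, 1.0, 1.0])
Fpm = np.array([[[0.14, 0.35, 0.28]]])
C1, C2 = Fpm + 0.3 * B1, Fpm + 0.3 * B2
alpha, F = matte(C1, C2, B1, B2)
```

Averaging C1 and C2 in the same model gives F + (1 - alpha) * white / 2, i.e., the spill term becomes neutral gray, which is exactly the spill-neutralization property the abstract describes.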

We present a novel image classification technique for detecting multiple objects (called subobjects) in a single image. In addition to image classifiers, we apply spatial relationships among the subobjects to verify and to predict locations of detected and undetected subobjects, respectively. By continuously refining the spatial relationships throughout the detection process, even locations of completely occluded exhibits can be determined. Finally, all detected subobjects are labeled and the user can select the object of interest for retrieving corresponding multimedia information. This approach is applied in the context of PhoneGuide, an adaptive museum guidance system for camera-equipped mobile phones.
We show that the recognition of subobjects using spatial relationships is up to 68% faster than in related approaches without spatial relationships. Results of a field experiment in a local museum illustrate that inexperienced users reach an average subobject recognition rate of 85.6% under realistic conditions.

We present a novel technique for adapting local image classifiers that are applied for object recognition on mobile phones through ad-hoc network communication between the devices. By continuously accumulating and exchanging collected user feedback among devices that are located within signal range, we show that our approach improves the overall classification rate and adapts to dynamic changes quickly. This technique is applied in the context of our PhoneGuide system -- a mobile phone based museum guidance framework that combines pervasive tracking and local object recognition for identifying a large number of objects in uncontrolled museum environments. We explain a technique that distributes the user feedback information during runtime through ad-hoc network connections between local devices. By doing so, we enforce cooperative classification improvements during the actual stay of the visitors. The general functionality of our technique has been tested with a small number of real devices in a museum. To prove its scalability, however, we have developed a simulator that evaluates our method for many hundreds of devices under several conditions. The simulation parameters have all been gathered in a museum, and are therefore realistic. We show that ad-hoc phone-to-phone synchronization not only leads to higher overall classification rates, but also to quicker adaptations to dynamic changes during runtime.

We synchronize film cameras and LED lighting with off-the-shelf video projectors. Radiometric compensation allows displaying keying patterns and other spatial codes on arbitrary real world surfaces. A fast temporal multiplexing of coded projection and flash illumination enables professional keying, environment matting, displaying moderator information, scene reconstruction, and camera tracking for non-studio film sets without being limited to the constraints of a virtual studio. The reconstruction of the scene geometry allows special composition effects, such as shadow casts, occlusions and reflections. This makes digital video composition more flexible, since static studio equipment, such as blue screens, teleprompters, or tracking devices, is not required. Authentic film locations can be supported with our portable system without causing a lot of installation effort. We propose a concept that combines all of these techniques into one single compact system that is fully compatible with common digital video composition pipelines, and offers an immediate plug-and-play applicability.

We present an enhancement towards adaptive video training for PhoneGuide, a digital museum guidance system for ordinary camera–equipped mobile phones. It enables museum visitors to identify exhibits by capturing photos of them. In this article, a combined solution of object recognition and pervasive tracking is extended to a client–server–system for improving data acquisition and for supporting scale–invariant object recognition. A static as well as a dynamic training technique are presented that preprocess the collected object data differently and apply two types of neural networks for classification. Furthermore, the system enables a temporal adaptation for ensuring a continuous data acquisition to improve the recognition rate over time. A formal field experiment reveals current recognition rates and indicates the practicability of both methods under realistic conditions in a museum.

We present a new adaptive classification system for museum guidance tasks. It uses camera-equipped mobile phones for on-device object recognition in ad-hoc sensor networks and provides location and object aware multimedia content to museum visitors. Our approach is invariant against perspective, distance and illumination. It supports the scalable identification of single objects and multiple sub-objects, pervasive tracking, phone-to-sensor and phone-to-phone communication. It adapts to user behaviour and environmental conditions over time and achieves high recognition rates under realistic conditions. Our decentralized classification approach makes the system highly scalable to an arbitrarily large number of users since the heavy-weight training process is carried out off-line on the server while the lower-weight classification task is performed individually and in parallel by each mobile phone.

We present a novel technique for optical data transfer between public displays and mobile devices based on unsynchronized 4D barcodes. We assume that no direct (electromagnetic or other) connection between the devices can exist. Time-multiplexed 2D color barcodes are displayed on screens and recorded with camera-equipped mobile phones. This allows information to be transmitted optically between the two devices. Our approach maximizes the data throughput and the robustness of the barcode recognition, while no immediate synchronization exists. Although the transfer rate is much lower than what can be achieved with electromagnetic techniques (e.g., Bluetooth or WiFi), we envision applying such a technique wherever no direct connection is available. 4D barcodes can, for instance, be integrated into public web pages, movie sequences, advertisement presentations, or information displays, and they encode and transmit more information than is possible with single 2D or 3D barcodes.
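A stripped-down version of time-multiplexed barcodes, binary instead of color, and with one reserved cell acting as a frame clock so an unsynchronized camera can detect frame changes, might look like this. The grid size, the clock-cell convention, and all names are assumptions, not the paper's actual code layout.

```python
import numpy as np

def encode_frames(data, grid=4):
    """Pack payload bits into a sequence of 2-D binary barcode grids;
    one corner cell alternates every frame so an unsynchronized camera
    can tell consecutive frames apart."""
    bits = np.unpackbits(np.frombuffer(data, dtype=np.uint8))
    cells = grid * grid - 1                    # one cell reserved for the clock
    pad = (-len(bits)) % cells
    bits = np.concatenate([bits, np.zeros(pad, dtype=np.uint8)])
    frames = []
    for i, chunk in enumerate(bits.reshape(-1, cells)):
        f = np.zeros((grid, grid), dtype=np.uint8)
        f.flat[1:] = chunk
        f.flat[0] = i % 2                      # alternating clock cell
        frames.append(f)
    return frames

def decode_frames(frames):
    """Keep one copy per clock state (the camera may oversample each
    displayed frame) and reassemble the payload bits."""
    bits, last_clock = [], None
    for f in frames:
        clock = int(f.flat[0])
        if clock != last_clock:                # a new displayed frame
            bits.extend(f.flat[1:])
            last_clock = clock
    return np.packbits(np.array(bits, dtype=np.uint8)).tobytes()
```

Duplicated camera frames are discarded by the clock cell, which is the essence of transmitting without synchronization; the real system additionally uses color channels to raise the per-frame payload.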

Radiometric compensation techniques allow seamless projections onto complex everyday surfaces. Implemented with projector-camera systems they support the presentation of visual content in situations where projection-optimized screens are not available or not desired - as in museums, historic sites, airplane cabins, or stage performances. We propose a novel approach that employs the full light transport between projectors and a camera to account for many illumination aspects, such as interreflections, refractions, shadows, and defocus. Pre-computing the inverse light transport in combination with an efficient implementation on the GPU makes the real-time compensation of captured local and global light modulations possible.
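The inverse-light-transport idea reduces, per wavelength, to solving a linear system: if T maps projector pixels to observed camera pixels, the compensation image is p = T^-1 * desired, clipped to the projector's physical range. The tiny stand-in matrix below (strong direct term plus weak global coupling) is an assumption; a real T is measured by the projector-camera system.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 12
# stand-in light-transport matrix: identity-like direct illumination
# plus weak all-to-all coupling (interreflections, scattering)
T = np.eye(n) + 0.05 * rng.random((n, n))

def compensate(desired, T):
    """Projector input that, after transport T, yields the desired image."""
    p = np.linalg.solve(T, desired)
    return np.clip(p, 0.0, 1.0)        # physical projector limits

desired = 0.5 * np.ones(n)
p = compensate(desired, T)
observed = T @ p                       # what the camera would see
```

As long as the solution stays within the projector's gamut (no clipping), the observed image matches the desired one exactly; out-of-gamut targets are where the harder perceptual trade-offs of radiometric compensation begin.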

We present a novel adaptive imperceptible pattern projection technique that considers parameters of human visual perception. A coded image that is invisible for human observers is temporally integrated into the projected image, but can be reconstructed by a synchronized camera. The embedded code is dynamically adjusted on the fly to guarantee that it remains imperceptible and to adapt it to the current camera pose. Linked with real-time flash keying, for instance, this enables in-shot optical tracking using a dynamic multi-resolution marker technique. A sample prototype is realized that demonstrates the application of our method in the context of augmentations in television studios.
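The temporal-integration trick can be sketched in its simplest form: the projector alternates between image + delta and image - delta, so a human observer integrates to the original image while a synchronized camera subtracts the two states to recover the code. The fixed delta below is a simplification; the abstract's adaptive method modulates it per region based on perceptual thresholds.

```python
import numpy as np

def embed(image, code, delta=0.02):
    """Split one projector frame into two complementary states: code
    pixels are raised in the first state and lowered in the second, so
    the temporal average seen by a human equals the original image."""
    d = delta * (2.0 * code - 1.0)       # +delta where code==1, -delta where 0
    f1 = np.clip(image + d, 0.0, 1.0)
    f2 = np.clip(image - d, 0.0, 1.0)
    return f1, f2

def recover(f1, f2):
    """A camera synchronized to the two states recovers the binary code
    from the sign of the frame difference."""
    return (f1 - f2 > 0).astype(int)
```

Near black or white the clipping destroys the complementarity, which is one reason the embedding strength must adapt to the projected content.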

Mobile phones have the potential of becoming a future platform for personal museum guidance. They enable full multimedia presentations and – assuming that visitors use their own devices – will significantly reduce acquisition and maintenance costs for museum operators. However, several technological challenges have to be mastered before this concept can be successful. One of them is the question of how individual museum objects can be intuitively identified before presenting corresponding information. We have developed an enhanced museum guidance system called PhoneGuide that uses widespread camera-equipped mobile phones for on-device object recognition in combination with pervasive tracking. It provides additional location- and object-aware multimedia content to museum visitors, and is scalable to cover a large number of museum objects. In a field survey our system was able to identify 155 real museum exhibits from multiple perspectives with a recognition rate of 95% and a classification speed of less than one second per object. A coarse grid of only eight low-cost Bluetooth emitters distributed over two museum floors was used to achieve these results. Once an object has been recognized, related multimedia presentations such as videos, audio, text, computer graphics, and images are displayed on the phone.
Special thanks to the City Museum of Weimar and to CellIQ for their support.

We present PhoneGuide – an enhanced museum guidance approach that uses camera-equipped mobile phones and on-device object recognition. Our main technical achievement is a simple and light-weight object recognition approach that is realized with single-layer perceptron neural networks. In contrast to related systems which perform computationally intensive image processing tasks on remote servers, our intention is to carry out all computations directly on the phone. This requires little or even no network connectivity and consequently decreases costs for online time. Our laboratory experiments and field surveys have shown that photographed museum exhibits can be recognized with a probability of over 90%. We have evaluated different feature sets to optimize the recognition rate and performance. Our experiments revealed that normalized color features are most effective for our method. Choosing such a feature set allows recognizing an object in less than one second on up-to-date phones. The amount of data that is required for differentiating 50 objects from multiple perspectives is less than 6 KBytes.
Special thanks to the Senckenberg Museum of Natural History, Frankfurt, to the Museum für Ur- und Frühgeschichte, Weimar, and to CellIQ for their support.
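A single-layer perceptron classifier of the kind described above is small enough to sketch completely; the toy "normalized color" features and the two-exhibit setup are illustrative, not the system's actual feature set.

```python
import numpy as np

def train_perceptron(X, y, n_classes, epochs=50, lr=0.1):
    """Single-layer perceptron: one linear unit per object class. On a
    mistake, pull the correct class's weights toward the sample and
    push the wrongly predicted class's weights away from it."""
    Xb = np.hstack([X, np.ones((len(X), 1))])   # append a bias input
    W = np.zeros((n_classes, Xb.shape[1]))
    for _ in range(epochs):
        for xi, yi in zip(Xb, y):
            pred = int(np.argmax(W @ xi))
            if pred != yi:                      # mistake-driven update
                W[yi] += lr * xi
                W[pred] -= lr * xi
    return W

def classify(W, x):
    """The highest-scoring linear unit wins."""
    return int(np.argmax(W @ np.append(x, 1.0)))

# toy normalized-color features for two exhibits
X = np.array([[0.9, 0.1], [0.8, 0.2], [0.1, 0.9], [0.2, 0.8]])
y = np.array([0, 0, 1, 1])
W = train_perceptron(X, y, n_classes=2)
```

The model is just one small weight matrix, which is consistent with the few kilobytes per 50 objects reported above and with running the classifier entirely on the phone.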

Enabling mobile devices, such as head-mounted displays and PDAs, to support video see-through augmented reality is a popular research topic. However, such technology is not widespread outside the research community today. It has been estimated that by the end of the year 2005 approximately 50% of all cell phones will be equipped with digital cameras. Consequently, using cell phones as a platform for video see-through AR has the potential of reaching a broad group of end users and applications. Compared to high-end PDAs and HMDs together with personal computers, the implementation of video see-through AR on the current feature- and smart-phone generations is a challenging task: ultra-low video-stream resolutions, limited graphics and memory capabilities, as well as slow processors set technological limitations. We have realized a prototype solution for video see-through AR on consumer cell phones. It supports optical tracking of passive paper markers and the correct integration of 2D/3D graphics into the live video stream at interactive rates. We aim at applications such as interactive tour guiding for museums and tourism, as well as mobile games.
Special thanks to Nokia Research for their support.