In this paper, the authors conducted an experiment to evaluate the user experience (UX) in an actual outdoor environment, assuming the casual use of a monocular HMD to view video content during short walks. In the experiment, eight subjects
were asked to view news videos on a monocular HMD while walking through a large shopping mall. Two types of
monocular HMDs and a hand-held media player were used, and the
psycho-physiological responses of the subjects were
measured before, during, and after the experiment. The VSQ, SSQ and NASA-TLX were used to assess the subjective
workloads and symptoms. The objective indices were heart rate, stride, and a video recording of the environment in front of each subject's face. The results revealed differences between the two types of monocular HMDs as well as
between the monocular HMDs and the other conditions. The differences between the types of monocular HMDs may have been due to screen vibration during walking, which was considered a major factor in the UX in terms of workload. Future experiments will be conducted in other locations with higher cognitive loads in order to study performance and situation awareness with respect to both the actual and the media environments.

Since more processing power and new sensing and display technologies are already available in mobile devices, there has been increased interest in building systems that communicate via different modalities such as speech, gesture, expression, and touch. In user interfaces based on context identification, these independent modalities are combined to create new ways for users to interact with handhelds. While these are unlikely to completely replace
traditional interfaces, they will considerably enrich and improve the user experience and task performance. We
demonstrate a set of novel user interface concepts that rely on the multiple built-in sensors of modern mobile devices to recognize the context and sequences of actions. In particular, we use the camera to detect whether the user
is watching the device, for instance, to make the decision to turn on the display backlight. In our approach the
motion sensors are first employed for detecting the handling of the device. Then, based on ambient illumination
information provided by a light sensor, the cameras are turned on. The frontal camera is used for face detection,
while the back camera provides supplemental contextual information. The subsequent applications triggered by the context can be, for example, image capturing or bar code reading.
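The sensor cascade described in this abstract can be sketched as a simple decision function. All thresholds and sensor interfaces below are illustrative assumptions, not the paper's actual implementation:

```python
def decide_backlight(acceleration_g, ambient_lux, detect_face):
    """Decide whether to turn on the display backlight.

    acceleration_g -- magnitude of recent device motion (in g)
    ambient_lux    -- ambient light sensor reading
    detect_face    -- callable running frontal-camera face detection
    """
    MOTION_THRESHOLD = 0.15   # device is being handled (assumed value)
    DARKNESS_LUX = 5.0        # too dark for the camera (assumed value)

    # Step 1: motion sensors detect handling of the device.
    if acceleration_g < MOTION_THRESHOLD:
        return False          # device at rest: leave the display off

    # Step 2: the light sensor gates whether the cameras are powered up.
    if ambient_lux < DARKNESS_LUX:
        return True           # too dark to check; assume the user is looking

    # Step 3: the frontal camera checks whether the user is watching.
    return detect_face()

print(decide_backlight(0.5, 120.0, lambda: True))   # True: handled + face seen
print(decide_backlight(0.01, 120.0, lambda: True))  # False: device at rest
```

The point of the cascade is energy: cheap sensors (accelerometer, light) gate expensive ones (camera), so the camera only runs when earlier stages make a face plausible.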

We present a new mobile service for the translation of text from images taken by consumer-grade cell-phone cameras.
Such capability represents a new paradigm for users where a simple image provides the basis for a service. The ubiquity
and ease of use of cell-phone cameras enable acquisition and transmission of images anywhere and at any time a user wishes, with rapid and accurate translation delivered over the phone's MMS and SMS facilities. Target text is extracted completely
automatically, requiring no bounding box delineation or related user intervention. The service uses localization,
binarization, text deskewing, and optical character recognition (OCR) in its analysis. Once the text is translated, an SMS
message is sent to the user with the result. Further novelties include that no software installation is required on the handset,
any service provider or camera phone can be used, and the entire service is implemented on the server side.

Selecting cosmetics requires visual information and often benefits from the assessments of a cosmetics expert. In this
paper we present a unique mobile imaging application that enables women to use their cell phones to get immediate
expert advice when selecting personal cosmetic products. We derive the visual information from analysis of camera
phone images, and provide the judgment of the cosmetics specialist through use of an expert system. The result is a new
paradigm for mobile interactions: image-based information services exploiting the ubiquity of camera phones. The
application is designed to work with any handset over any cellular carrier using commonly available MMS and SMS
features. Targeted at the unsophisticated consumer, it must be quick and easy to use, not requiring download capabilities
or preplanning. Thus, all application processing occurs in the back-end system and not on the handset itself. We present
the imaging pipeline technology and a comparison of the service's accuracy with respect to human experts.

In this paper, we propose a new adaptive embedding technique which decomposes the image into various bitplanes
based on redundant number systems. This technique is driven by three separate functions: 1) Adaptive selection
of locations and number of bits per pixel to embed. 2) Adaptive selection of bit-plane decomposition for the cover
image. 3) Adaptive selection of the manner in which the information is inserted. Through the application of sensitive direction-based statistical estimation and a recorded account of the actions taken, the proposed algorithms are able to
provide the desired level of security, both visually and statistically. In comparison with other methods offering the same
level of security, the new technique is able to offer a greater embedding capacity.

In this paper we focus on: a) enhancing the performance of existing barcode systems and b) building a barcode
system for mobile applications. First we introduce a new concept of generating a parametric number representation
system by fusing a number of representation systems that use multiplication, addition, and other operations. Second
we show how one can generate a secure, reliable, and high-capacity color barcode using the fused system. The
representation, symbols, and colors may be used as encryption keys that can be encoded into barcodes, thus
eliminating the direct dependence on cryptographic techniques. To supply an extra layer of security, the fused
system also allows one to encrypt given data using different types of encryption methods. In addition, this fused
system can be used to improve image processing applications and cryptography.

Visual cryptography is a powerful method for sharing secret information, such as identification numbers, among multiple members. There have been many papers on visual cryptography based on intensity modulation. Although the use of
intensity modulation is suitable for printing, degradation of image quality is a problem. Another problem for
conventional visual cryptography is a risk of theft of physical keys. To cope with these problems, we propose a new field
of visual cryptography by use of polarization. In this study, we have implemented polarization decoding by stacking
films. Use of polarization processing improves image quality of visual cryptography. The purpose of this paper is to
construct visual cryptography based on polarization processing. Furthermore, we construct a new type of visual
cryptography that uses the stacking order as a key for decryption. The use of stacking order multiplies the complexity of the encryption. It is therefore effective in protecting the secret against theft, because a thief cannot determine the secret merely by collecting encrypted films.
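For contrast, the conventional intensity-modulation scheme that the polarization approach improves on can be sketched in a few lines. This is the classic (2,2) construction with two subpixels per secret pixel, given as a baseline illustration, not the paper's polarization method:

```python
import random

# Minimal (2,2) intensity-based visual cryptography.
# Each secret pixel (0 = white, 1 = black) expands to two subpixels.
# Stacking the two shares is a pixelwise OR: black pixels decode to
# two black subpixels, white pixels to exactly one.

def make_shares(secret, rng=random.Random(0)):
    share1, share2 = [], []
    for bit in secret:
        pattern = rng.choice([(0, 1), (1, 0)])     # random subpixel pattern
        share1.extend(pattern)
        if bit == 0:
            share2.extend(pattern)                 # same pattern: 50% black
        else:
            share2.extend(1 - p for p in pattern)  # complement: all black
    return share1, share2

def stack(a, b):
    return [x | y for x, y in zip(a, b)]           # physical stacking = OR

secret = [1, 0, 1, 1, 0]
s1, s2 = make_shares(secret)
decoded = stack(s1, s2)
# subpixel sums per secret pixel: 2 for black, 1 for white
print([sum(decoded[2 * i:2 * i + 2]) for i in range(len(secret))])  # [2, 1, 2, 2, 1]
```

A decoded white pixel keeps one black subpixel, so contrast is halved; this is exactly the image-quality degradation that the polarization-based decoding described above avoids.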

An online buyer of multimedia content does not want to reveal his identity or his choice of multimedia content
whereas the seller or owner of the content does not want the buyer to further distribute the content illegally.
To address these issues we present a new private anonymous fingerprinting protocol. It is based on superposed
sending for communication security, group signature for anonymity and traceability and single database private
information retrieval (PIR) to allow the user to get an element of the database without giving any information
about the acquired element. Under a semi-honest adversary model, the protocol is implemented using a blind, wavelet-based color image watermarking scheme. The main advantage of the proposed protocol is that both the
user identity and the acquired database element are unknown to any third party and in the case of piracy, the
pirate can be identified using the group signature scheme. The robustness of the watermarking scheme against
Additive White Gaussian Noise is also shown.

In the paper we extend an existing information fusion based audio steganalysis approach by three different kinds of
evaluations: The first evaluation addresses the so far neglected sensor-level fusion. Our results show that this fusion removes content dependency while achieving classification rates similar to those of single classifiers (especially for the considered global features) on the three exemplarily tested audio data hiding algorithms. The second evaluation extends the observations on fusion from segmental features alone to combinations of segmental and global features, reducing the computational complexity required for testing by about two orders of magnitude while maintaining the same degree of accuracy.
The third evaluation tries to build a basis for estimating the plausibility of the introduced steganalysis approach by measuring the sensitivity of the models used in supervised classification of steganographic material to typical signal modification operations such as de-noising or 128 kbit/s MP3 encoding. Our results show that for some of the tested classifiers the probability of false alarms rises dramatically after such modifications.

Multimedia forensics deals with the analysis of multimedia data to gather information on its origin and authenticity. One
therefore needs to distinguish classical criminal forensics (which today also uses multimedia data as evidence) and
multimedia forensics, where the actual case is based on a media file. One example of the latter is camera forensics, where pixel error patterns are used as fingerprints identifying a camera as the source of an image. Of course, multimedia forensics
can become a tool for criminal forensics when evidence used in a criminal investigation is likely to be manipulated. At
this point an important question arises: How reliable are these algorithms? Can a judge trust their results? How easy are
they to manipulate? In this work we show how camera forensics can be attacked and introduce a potential
countermeasure against these attacks.

Detection results obtained from an oracle can be used to reverse-engineer the underlying detector structure, or
parameters thereof. In particular, if a detector uses a common structure like correlation or normalized correlation,
detection results can be used to estimate feature space dimensionality, watermark strength, and detector threshold
values. Previous estimation techniques used a simplistic but tractable model for a watermarked image in the
detection cone of a normalized correlation detector; in particular a watermarked image is assumed to lie along the
axis of the detection cone, essentially corresponding to an image of zero magnitude. This produced useful results
for feature spaces of lower dimension, but increasingly imprecise estimates for larger feature spaces. In this paper we model the watermarked image properly as the sum of a cover vector and an approximately orthogonal watermark vector, offsetting the image within the cone, which reflects the geometry of a detector using normalized correlation. This symmetry breaking produces a far more complex model, which boils down to a quartic equation. Although it is infeasible to find its symbolic solution even with the aid of a computer, our numerical analysis results show
certain critical behavior which reveals the relationship between the attacking noise strength and the detector
parameters. The critical behavior predicted by our model extends our reverse-engineering capability to the case of
detectors with large feature space dimensions, which is not uncommon in multimedia watermarking algorithms.
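The detector geometry assumed in this analysis can be made concrete with a small sketch. The vectors, dimensionality, and threshold below are illustrative stand-ins, not parameters from the paper:

```python
import math

# An image x lies inside the detection cone of a normalized-correlation
# detector when the cosine between x and the watermark w exceeds tau.
# Adding w to a cover vector offsets the image within the cone, rather
# than placing it on the cone axis as the simplistic model assumed.

def normalized_correlation(x, w):
    dot = sum(a * b for a, b in zip(x, w))
    nx = math.sqrt(sum(a * a for a in x))
    nw = math.sqrt(sum(b * b for b in w))
    return dot / (nx * nw)

n = 1000                                                 # feature space dimensionality
w = [1.0 if i % 2 == 0 else 0.0 for i in range(n)]       # watermark vector
cover = [0.0 if i % 2 == 0 else 10.0 for i in range(n)]  # orthogonal cover
marked = [c + wi for c, wi in zip(cover, w)]             # cover + watermark

tau = 0.05
print(normalized_correlation(cover, w) > tau)   # False: outside the cone
print(normalized_correlation(marked, w) > tau)  # True: offset into the cone
```

Note how weak the marked image's correlation is (about 0.1 here): in high dimensions the cover term dominates the norm, which is why the axis-aligned approximation breaks down for large feature spaces.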

This work is motivated by the limitations of statistical quality metrics to assess the quality of images distorted
in distinct frequency ranges. Common quality metrics, which have basically been designed and tested for various kinds of global distortions such as image coding, may not be efficient for watermarking applications, where the distortions may be restricted to a very narrow portion of the frequency spectrum. We therefore propose an objective quality metric whose performance does not depend on the distortion frequency range, while remaining simple in contrast to the complex Human Visual System (HVS) based quality metrics recently made available. The proposed algorithm is generic (not designed
for a particular distortion), and exploits the contrast sensitivity function (CSF) along with an adapted Minkowski
error pooling. The results show a high correlation between the proposed objective metric and the mean opinion
score (MOS) given by observers. A comparison with relevant existing objective quality metrics is provided.
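The two building blocks named here, CSF weighting and Minkowski pooling, can be sketched as follows. The band weights and the pooling exponent are made-up illustrative values, not the paper's fitted CSF:

```python
# Sketch: per-frequency-band errors are weighted by a contrast
# sensitivity function (a stand-in band weighting here) and then
# combined with Minkowski error pooling.

def minkowski_pool(errors, p=4.0):
    """Minkowski summation: (mean of |e|^p) ** (1/p)."""
    return (sum(abs(e) ** p for e in errors) / len(errors)) ** (1.0 / p)

def csf_weighted_score(band_errors, csf_weights, p=4.0):
    weighted = [w * e for w, e in zip(csf_weights, band_errors)]
    return minkowski_pool(weighted, p)

# reference/distorted differences per band (low -> high frequency)
band_errors = [0.2, 0.8, 0.3, 0.1]
csf = [0.6, 1.0, 0.7, 0.2]   # peak sensitivity at mid frequencies (assumed)

print(round(csf_weighted_score(band_errors, csf), 4))  # 0.5664
```

A larger exponent p makes the pooled score increasingly dominated by the worst band error, which is one way a metric can stay sensitive to distortions confined to a narrow portion of the spectrum.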

The quality of the images obtained by digital cameras has improved considerably since the early days of digital photography. Unfortunately, it is not unusual in image forensics to encounter wrongly exposed pictures. This is mainly due to obsolete techniques or old technologies, but also to backlight conditions. To bring out otherwise invisible details, a stretching of the image contrast is required. Forensic rules for producing evidence require complete documentation of the processing steps, enabling the replication of the entire process. The automation of enhancement techniques is thus quite difficult and needs to be carefully documented. This work presents an automatic procedure for finding contrast enhancement settings, allowing both image correction and automatic script generation. The technique is based on a preprocessing step which extracts the features of the image and selects the correction parameters. The parameters are then saved as JavaScript code that is used in the second step of the approach to correct the image. The generated script is Adobe Photoshop compliant (a tool largely used in image forensics analysis), thus permitting the replication of the enhancement steps. Experiments on a dataset of images are also reported, showing the effectiveness of the proposed methodology.
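A minimal sketch of such a two-step pipeline, estimating stretch parameters first and applying them later, might look as follows. The percentile clipping is an assumed feature-extraction choice, and Python stands in for the JavaScript actually generated:

```python
# Step 1: derive contrast-stretch parameters from the histogram, so the
# same correction can be replayed later from a saved script.

def stretch_parameters(pixels, low_clip=0.01, high_clip=0.99):
    """Return (low, high) input levels for a linear contrast stretch."""
    values = sorted(pixels)
    low = values[int(low_clip * (len(values) - 1))]
    high = values[int(high_clip * (len(values) - 1))]
    return low, high

# Step 2: apply the recorded parameters (this is what the generated
# script would do inside the editing tool).

def apply_stretch(pixels, low, high):
    scale = 255.0 / max(1, high - low)
    return [min(255, max(0, round((p - low) * scale))) for p in pixels]

# a badly exposed image: all values crowded into [60, 120]
image = [60, 70, 80, 90, 100, 110, 115, 120]
low, high = stretch_parameters(image)
print(low, high)                        # 60 115
print(apply_stretch(image, low, high))  # values spread over [0, 255]
```

Because the parameters are computed once and stored, re-running step 2 on the original image reproduces the enhancement bit for bit, which is the documentation property forensics requires.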

Global Positioning System (GPS) products help users navigate while driving, hiking, boating, and flying. GPS uses a combination of orbiting satellites to determine position coordinates. This works well in most outdoor areas, but the
satellite signals are not strong enough to penetrate inside most indoor environments. As a result, a new strain of indoor
positioning technologies that make use of 802.11 wireless LANs (WLAN) is beginning to appear on the market. In
WLAN positioning, the system either monitors propagation delays between wireless access points and wireless device users to apply trilateration techniques, or it maintains a database of location-specific signal fingerprints which is used to identify the most likely match of incoming signal data with those previously surveyed and saved in the database. In this paper we investigate the issue of deploying WLAN positioning software on mobile platforms with typically limited computational resources. We propose a novel location estimation system based on the rank order of received signal strengths, which reduces the computational load while maintaining robust performance. The performance of the proposed system is compared to conventional approaches.
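Rank-order matching of this kind can be sketched in a few lines: only the ordering of access-point signal strengths is compared, which avoids heavier distance computations and tolerates device-specific RSS offsets. The fingerprint data and the choice of the Spearman footrule distance are illustrative assumptions:

```python
# Rank-order RSS fingerprint matching sketch.

def rss_ranks(rss):
    """Map AP name -> rank, 0 for the strongest signal."""
    order = sorted(rss, key=rss.get, reverse=True)
    return {ap: i for i, ap in enumerate(order)}

def footrule(ranks_a, ranks_b):
    """Spearman footrule: sum of absolute rank differences."""
    return sum(abs(ranks_a[ap] - ranks_b[ap]) for ap in ranks_a)

def locate(observed, fingerprints):
    obs = rss_ranks(observed)
    return min(fingerprints,
               key=lambda loc: footrule(obs, rss_ranks(fingerprints[loc])))

fingerprints = {                       # surveyed RSS values in dBm
    "lobby":  {"ap1": -40, "ap2": -70, "ap3": -60},
    "office": {"ap1": -75, "ap2": -45, "ap3": -55},
}
# the device reports offset RSS values, but the ordering matches "office"
observed = {"ap1": -80, "ap2": -52, "ap3": -60}
print(locate(observed, fingerprints))  # office
```

Because only integer ranks are compared, a constant hardware-dependent shift of every RSS reading leaves the estimate unchanged, which is the robustness argument for rank-order methods on heterogeneous handsets.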

In this paper, we propose a new method to adapt the resolution of images to the limited display resolution
of mobile devices. We use the seam carving technique to identify and remove less relevant content in images.
Seam carving achieves a high adaptation quality for landscape images and distortions caused by the removal of
seams are very low compared to other techniques like scaling or cropping. However, if an image depicts objects
with straight lines or regular patterns like buildings, the visual quality of the adapted images is much lower.
Errors caused by seam carving are especially obvious if straight lines become curved or disconnected. In order
to preserve straight lines, our algorithm applies line detection in addition to the normal energy function of seam
carving. The energy in the local neighborhood of the intersection point of a seam and a straight line is increased
to prevent other seams from removing adjacent pixels. We evaluate our improved seam carving algorithm and
compare the results with regular seam carving. In the case of landscape images with no straight lines, traditional
seam carving and our enhanced approach lead to very similar results. However, in the case of objects with
straight lines, the quality of our results is significantly better.
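The mechanism can be illustrated with a compact seam carving sketch. The `protected` mask below plays the role of the line-detection energy boost described above; the boost value and the toy image are assumptions:

```python
# Minimal vertical seam carving on a grayscale image (list of rows):
# gradient energy, dynamic programming for the lowest-energy seam,
# and seam removal. Pixels flagged in `protected` get a large energy
# boost, standing in for the neighborhood of a detected straight line.

def energy(img, protected=None, boost=1000.0):
    h, w = len(img), len(img[0])
    e = [[0.0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            dx = abs(img[y][min(x + 1, w - 1)] - img[y][max(x - 1, 0)])
            dy = abs(img[min(y + 1, h - 1)][x] - img[max(y - 1, 0)][x])
            e[y][x] = dx + dy + (boost if protected and protected[y][x] else 0.0)
    return e

def find_seam(e):
    h, w = len(e), len(e[0])
    cost = [e[0][:]] + [[0.0] * w for _ in range(h - 1)]
    for y in range(1, h):                      # accumulate path costs
        for x in range(w):
            prev = [cost[y - 1][i] for i in range(max(0, x - 1), min(w, x + 2))]
            cost[y][x] = e[y][x] + min(prev)
    seam = [min(range(w), key=lambda x: cost[h - 1][x])]
    for y in range(h - 2, -1, -1):             # trace back, lowest-cost neighbor
        x = seam[-1]
        seam.append(min(range(max(0, x - 1), min(w, x + 2)),
                        key=lambda i: cost[y][i]))
    return seam[::-1]                          # seam[y] = column removed in row y

def remove_seam(img, seam):
    return [row[:s] + row[s + 1:] for row, s in zip(img, seam)]

img = [[10, 10, 200, 10],
       [10, 10, 200, 10],
       [10, 10, 200, 10]]
print(find_seam(energy(img)))                  # flat column chosen: [0, 0, 0]
protected = [[x == 0 for x in range(4)] for _ in range(3)]   # "line" in column 0
print(find_seam(energy(img, protected)))       # seam steered away: [2, 2, 2]
```

Boosting the energy near a detected line makes every seam that would cut through it prohibitively expensive, so seams route around it and the line stays straight and connected.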

The rapid dissemination of media technologies has led to an increase in unauthorized copying and distribution
of digital media. Digital watermarking, i.e. embedding information in the multimedia signal in a robust and
imperceptible manner, can tackle this problem. Recently, there has been a huge growth in the number of
different terminals and connections that can be used to consume multimedia. To tackle the resulting distribution
challenges, scalable coding is often employed. Scalable coding allows the adaptation of a single bit-stream to
varying terminal and transmission characteristics. As a result of this evolution, watermarking techniques that
are robust against scalable compression become essential in order to control illegal copying. In this paper, a
watermarking technique resilient against scalable video compression using the state-of-the-art H.264/SVC codec
is therefore proposed and evaluated.

In anticipation of the proliferation of micro-projectors on our handheld imaging devices, we designed and tested a
camera-projector system that allows a distant user to point into a remote 3D environment with a projector. The solution
involves a means for locating a projected dot and adjusting its location to correspond to a position indicated by a remote
user viewing the scene through a camera. It was designed to operate efficiently, even in the presence of camera noise.
While many camera-projector display systems require a calibration phase, the presented approach allows calibration-free
operation. The tracking algorithm is implemented with a modified 2D gradient descent method that performs well even in the
presence of spatial discontinuities. Our prototype was constructed using a standard web-camera and network to perform
real-time tracking, navigating the projected dot across irregularly shaped and colored surfaces accurately. Our tests
included a camera-projector system and client on either side of the Atlantic Ocean with no loss of responsiveness.

This work presents a new distributed multiview coding framework, based on the H.264/AVC standard operating
with mixed resolution frames. It allows for a scalable complexity transfer from the encoder to the decoder, which
is particularly suited for low-power video applications, such as multiview surveillance systems. Higher-quality sequences are generated by exploiting the spatial and temporal correlation between views at the decoder. The
results show a good potential for objective quality improvement over simulcast coding, with no extra rate cost.

We present a novel intelligent video surveillance system with efficient detection of abandoned objects and counting of pedestrians. In the proposed algorithm, the adaptively generated background makes it possible to handle illumination changes and occlusions. After the adaptive background model is built, the counting procedure starts to accumulate the number of detected objects. Experimental results show that the proposed system outperforms existing
abandoned object detection and pedestrian counting methods.
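The adaptive background idea can be sketched with a running-average model; the learning rate, the threshold, and the one-dimensional toy frames below are illustrative assumptions, not the paper's algorithm:

```python
# Running-average background model: the background slowly absorbs
# gradual illumination changes, while a sudden change (an abandoned
# object) stays above the foreground threshold.

ALPHA = 0.05      # background learning rate (assumed value)
THRESHOLD = 30    # foreground decision threshold (assumed value)

def update(background, frame):
    """Blend the new frame into the background model."""
    return [(1 - ALPHA) * b + ALPHA * f for b, f in zip(background, frame)]

def foreground_mask(background, frame):
    return [abs(f - b) > THRESHOLD for b, f in zip(background, frame)]

# toy 1-D "frames": a static scene, then an object abandoned at pixel 2
frames = [[100, 100, 100, 100]] * 20 + [[100, 100, 200, 100]]
bg = frames[0]
for frame in frames[:-1]:
    bg = update(bg, frame)
mask = foreground_mask(bg, frames[-1])
print(mask)   # [False, False, True, False]
```

If the object remains long enough, the same update rule eventually absorbs it into the background; real systems therefore add a timer or a second, slower model to flag objects that stay static for a suspicious duration.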

In this paper we present a new approach for sharing a secret image between l users by exploiting the additive homomorphic property of the Paillier algorithm. With a traditional approach, when a dealer wants to share an image between
l players, the secret image must be sequentially encrypted l + 1 times using l + 1 keys (secret or public keys).
When the dealer and the l players want to extract the secret image, they must decrypt sequentially, keeping the
same order of the encryption step, by using l + 1 keys (secret or private). With the proposed approach, during
the encryption step, each player encrypts his own secret image using the same public key given by the dealer,
the dealer encrypts the secret image to be shared with the same key and then the l secret encrypted images plus
the encrypted image to be shared are multiplied between them to get a scrambled image. After this step, the
dealer can securely use the private key to decrypt this scrambled image to get a new scrambled image which
corresponds to the addition of the l + 1 original images because of the additive homomorphic property of the Paillier
algorithm. When the l players want to extract the secret image, they need neither the dealer nor any keys.
Indeed, with our approach, to extract the secret image, the l players need only to subtract their own secret image
from the scrambled image. In this paper we illustrate our approach with an example of a captain who wants to
share a secret treasure map between l pirates. Experimental results and security analysis show the effectiveness
of the proposed scheme.
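The additive homomorphic property the scheme relies on can be demonstrated with a toy Paillier implementation (tiny primes for illustration only; real deployments use keys of 2048 bits or more):

```python
import math, random

# Toy Paillier cryptosystem (Python 3.9+ for math.lcm).
# The key fact used by the sharing scheme: multiplying ciphertexts
# modulo n^2 adds the underlying plaintexts.

p, q = 47, 59                      # toy primes; never use sizes like this
n = p * q
n2 = n * n
lam = math.lcm(p - 1, q - 1)       # lambda(n)
g = n + 1                          # standard generator choice

def L(u):
    return (u - 1) // n

mu = pow(L(pow(g, lam, n2)), -1, n)   # precomputed decryption factor

def encrypt(m, rng=random.Random(0)):
    r = rng.randrange(1, n)
    while math.gcd(r, n) != 1:
        r = rng.randrange(1, n)
    return (pow(g, m, n2) * pow(r, n, n2)) % n2

def decrypt(c):
    return (L(pow(c, lam, n2)) * mu) % n

# homomorphic addition: the product of encryptions decrypts to the sum
a, b = 123, 456
c = (encrypt(a) * encrypt(b)) % n2
print(decrypt(c))          # 579
```

Because Enc(a) * Enc(b) mod n^2 decrypts to a + b, multiplying the l + 1 encrypted images yields an encryption of their pixel-wise sum, which is what later lets each player remove their own image by simple subtraction.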

A continuously growing amount of today's information not only exists in digital form but was actually born digital. This information needs to be preserved because it is part of our cultural and scientific heritage or because of legal requirements. Since born-digital information has no analog origin, it cannot be preserved by traditional means without losing its original representation. Digital long-term preservation thus becomes increasingly important and is tackled by several international and national projects such as the US National Digital Information
Infrastructure and Preservation Program [1], the German NESTOR project [2] and the EU FP7 SHAMAN Integrated
Project [3].
In digital long-term preservation, the integrity and authenticity of the preserved information are of great importance, and ensuring them is a challenging task considering the requirement to enforce both security aspects over a long time, often assumed to be at least 100 years. Therefore, in a previous work [4] we showed the general feasibility of the Clark-Wilson security model
[5] for digital long-term preservation in combination with a syntactic and semantic verification approach [6] to tackle
these issues. In this work we carry out a more detailed investigation and show by example the influence of applying such a security model on the use cases and roles of a digital long-term preservation environment. Our goal is a scalable security model - i.e., one with no fixed limitations on usable operations, users, and objects - that mainly preserves the integrity of objects but also ensures their authenticity.

This paper proposes a fast statistical approach to recover lost motion vectors in the H.264 video coding standard. Unlike other video coding standards, the motion vectors in H.264 cover a smaller area of the video frame being encoded. This leads to a strong correlation between neighboring motion vectors, making the H.264 standard amenable to statistical analysis for recovering lost motion vectors. This paper proposes a matching algorithm based on the Pearson correlation coefficient that speeds up the recovery of lost motion vectors with very little compromise in the visual quality of the recovered video. To the best of our knowledge, this is the first attempt to employ the correlation coefficient for motion vector recovery. Experimental results obtained by applying the proposed algorithm to standard benchmark video sequences show that it yields comparable quality of recovered video with significantly less computation than the best methods reported in the literature, making it suitable for real-time applications.
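The core matching step can be sketched as follows: each candidate motion vector, borrowed from a neighboring block, is scored by the Pearson correlation between the pixels just above the lost block and the corresponding boundary row fetched from the reference frame. The toy frames, the block size, and the candidate set are illustrative assumptions, not the paper's exact procedure:

```python
import math

def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy) if sx and sy else 0.0

def recover_mv(ref, cur, r, c, b, candidates):
    """Pick the candidate (dx, dy) whose reference-frame boundary row
    best correlates with the row just above the lost b-pixel block."""
    above = cur[r - 1][c:c + b]
    return max(candidates,
               key=lambda mv: pearson(above,
                                      ref[r - 1 + mv[1]][c + mv[0]:c + mv[0] + b]))

# toy frames: cur is ref shifted one pixel left, so the true MV is (1, 0)
ref = [[(3 * i + j * j) % 17 for j in range(12)] for i in range(8)]
cur = [row[1:] + [0] for row in ref]
mv = recover_mv(ref, cur, r=4, c=4, b=4, candidates=[(1, 0), (0, 1), (-1, -1)])
print(mv)   # (1, 0)
```

Scoring only a thin boundary strip per candidate, rather than full-block pixel matching, is where the speed-up over heavier concealment schemes comes from.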

Down-sampling coding, which sub-samples the image and encodes the smaller-sized images, is one solution for raising image quality at insufficiently high rates. In this work, we propose Adaptive Down-Sampling (ADS) coding for H.264/AVC. The overall system distortion can be analyzed as the sum of the down-sampling distortion and the coding distortion. The down-sampling distortion is mainly the loss of high-frequency components, which is highly dependent on the spatial difference. The coding distortion can be derived from classical rate-distortion theory. For a given rate and video sequence, the optimum down-sampling resolution ratio can be derived by applying optimization theory to minimize the system distortion based on the models of the two distortions. This optimal resolution ratio is used in both the down-sampling and up-sampling processes of the ADS coding scheme. As a result, the rate-distortion performance of ADS coding is consistently higher than that of fixed-ratio coding or H.264/AVC by 2 to 4 dB at low to medium rates.

The choice of the right coding method is a critical factor in the development process of mobile 3D television and video. Several coding methods are available, and each of these is based on a different approach. These differences lead to method-specific artefacts; content and bit rate are likewise important parameters for the performance. In our study, we evaluated Simulcast, Multi View Coding, Mixed Resolution Stereo Coding, and Video + Depth Coding. Each method was optimized at a high and a low bit rate using parameters typical for mobile devices. The goal of the study was to gain knowledge about the optimum coding method for mobile 3DTV, but also about the underlying rationale of quality perception. We used Open Profiling of Quality (OPQ) for the comparison. OPQ combines quantitative rating and sensory profiling of the content. This allowed us to obtain a preference order of the coding methods as well as additional individual quality factors that were formed into a quality model. The results show that MVC and V+D outperform the other two approaches, but content itself is still an important factor.

We investigate the effect of camera de-calibration on the quality of depth estimation. A dense depth map is a format particularly suitable for mobile 3D capture (scalable and screen independent). However, in real-world scenarios cameras might move from their designated positions (due to vibrations or temperature-induced bending). For the experiments, we create a test framework, described in the paper. We investigate how such mechanical changes affect four different stereo-matching algorithms. We also assess how different geometric corrections (none, motion-compensation-like, full rectification) affect the estimation quality, i.e., how much offset can still be compensated with a "crop" over a larger CCD. Finally, we show how the estimated camera pose change (E) relates to stereo matching, which can be used as a "rectification quality" measure.

Creating an imperceptible watermark which can be read by a broad range of cell phone cameras is a difficult problem.
The problems are caused by the inherently low resolution and noise levels of typical cell phone cameras. The quality
limitations of these devices compared to a typical digital camera are caused by the small size of the cell phone and cost
trade-offs made by the manufacturer.
To make the watermark readable despite these limitations, a low-resolution watermark is required which can be resolved by a typical cell phone camera.
The visibility of a traditional luminance watermark was too great at this lower resolution, so a chrominance watermark
was developed. The chrominance watermark takes advantage of the relatively low sensitivity of the human visual system
to chrominance changes. This enables a chrominance watermark to be inserted into an image which is imperceptible to
the human eye but can be read using a typical cell phone camera.
Sample images will be presented showing watermarks with very low visibility which can nevertheless be easily read by a typical cell phone camera.
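The embedding idea can be sketched for a single pixel: convert to YCbCr, shift the Cb channel by a small amount, and convert back, leaving the luminance untouched. The BT.601/JFIF conversion constants are standard; the strength `delta` and the reference-based detector are illustrative simplifications (a camera-side reader would not have the original image):

```python
# Chrominance watermarking sketch: modulate Cb, preserve luminance Y.

def rgb_to_ycbcr(r, g, b):
    y  =  0.299 * r + 0.587 * g + 0.114 * b
    cb = -0.168736 * r - 0.331264 * g + 0.5 * b + 128
    cr =  0.5 * r - 0.418688 * g - 0.081312 * b + 128
    return y, cb, cr

def ycbcr_to_rgb(y, cb, cr):
    r = y + 1.402 * (cr - 128)
    g = y - 0.344136 * (cb - 128) - 0.714136 * (cr - 128)
    b = y + 1.772 * (cb - 128)
    return r, g, b

def embed_bit(rgb, bit, delta=4.0):
    y, cb, cr = rgb_to_ycbcr(*rgb)
    cb += delta if bit else -delta        # modulate chrominance only
    return ycbcr_to_rgb(y, cb, cr)

def read_bit(original_rgb, marked_rgb):
    _, cb0, _ = rgb_to_ycbcr(*original_rgb)
    _, cb1, _ = rgb_to_ycbcr(*marked_rgb)
    return cb1 > cb0

pixel = (120, 80, 60)
marked = embed_bit(pixel, bit=1)
print(read_bit(pixel, marked))   # True: the bit survives the round trip
# luminance is essentially unchanged, which is why visibility stays low
print(abs(rgb_to_ycbcr(*pixel)[0] - rgb_to_ycbcr(*marked)[0]) < 1e-3)  # True
```

Since the human visual system is far less sensitive to chrominance than to luminance changes, the same `delta` that would be visible in Y passes unnoticed in Cb while remaining large enough for a low-resolution camera sensor to measure.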

Subjective quality evaluation experiments are conducted for optimizing critical system components during the process of
system development. Conventionally, such experiments take place under controlled viewing conditions, even though the target application is meant to be used in heterogeneous mobile settings. The goal of the paper is twofold. Firstly,
we present a hybrid User-Centered Quality of Experience (UC-QoE) evaluation method for measuring quality in the
context of use. The method combines quantitative preference ratings, qualitative descriptions of quality and context,
characterization of context at the macro and micro levels, and measures of effort. Secondly, we present the results of two experiments using this method in different field settings, compared to the laboratory setting. We conducted the
experiments with a relatively low quality range for current and future data rates for mobile (3D) television by varying
encoding parameters for simulcast stereo video. The study was conducted on a portable device with parallax barrier
display technology. The results show significant differences between the different field conditions and between field and
laboratory measures.