Micro- and nanoresolution applications are an important part of functional material research, where imaging and
observation of material interactions may go down to the molecular or even atomic level. Much of the nanometer-range
movement of scanning and manipulation instruments is made possible by piezoelectric actuation systems.
This paper presents a software-based controller implementation utilizing neural networks for high-precision positioning
of a piezoelectric actuator. The developed controller can be used for controlling nanopositioning piezo actuators when
sufficiently accurate feedback information is available.
Piezo actuators exhibit complex hysteresis dynamics that need to be taken into account when designing an accurate
control system. For inverse modelling of the hysteresis-related phenomena, a static hysteresis operator and a
newly developed dynamic creep operator are presented, to be used in conjunction with a feed-forward neural
network. The controller utilizing the neural-network inverse hybrid model is implemented as a software component for
the existing Scalable Modular Control (SMC) framework. Using the SMC framework and off-the-shelf components, a
measurement and control system for the nanopositioning actuator is constructed and tested using two different
capacitive sensors operating on the y- and z-axes of the actuator.
Using the developed controller, the piezo actuator's hysteresis phenomena were successfully reduced, making
nanometer-range positioning of the actuator axes possible. The effect of using a lower-accuracy position sensor
with more noise on control accuracy is also briefly discussed.

The observation and monitoring of traffic with smart vision systems for the purpose of improving traffic safety holds great potential. Today the automated analysis of traffic situations is still in its infancy: the patterns of vehicle motion and pedestrian flow in an urban environment are too complex to be fully captured and interpreted by a vision system. In this work we present steps towards a visual monitoring system which is designed to detect potentially dangerous traffic situations around a pedestrian crossing at a street intersection. The camera system is specifically designed to detect incidents in which the interaction of pedestrians and vehicles might develop into safety-critical encounters. The proposed system has been field-tested at a real pedestrian crossing in the City of Vienna for the duration of one year. It consists of a cluster of 3 smart cameras, each of which is built from a very compact PC hardware system in a weatherproof housing. Two cameras run vehicle detection and tracking software; one camera runs a pedestrian detection and tracking module based on the HOG detection principle. All 3 cameras use sparse optical flow computation in a low-resolution video stream in order to estimate the motion path and speed of objects. Geometric calibration of the cameras allows us to estimate the real-world co-ordinates of detected objects and to link the cameras together into one common reference system. This work describes the foundation for all the different object detection modalities (pedestrians, vehicles), and explains the system setup, its design, and the evaluation results which we have achieved so far.

The Intelligent Ground Vehicle Competition (IGVC) is one of four unmanned-systems student competitions
founded by the Association for Unmanned Vehicle Systems International (AUVSI). The IGVC is a multidisciplinary
exercise in product realization that challenges college engineering student teams to integrate advanced control theory,
machine vision, vehicular electronics and mobile platform fundamentals to design and build an unmanned system.
Teams from around the world focus on developing a suite of dual-use technologies to equip ground vehicles of the future
with intelligent driving capabilities. Over the past 19 years, the competition has challenged undergraduate, graduate and
Ph.D. students with real world applications in intelligent transportation systems, the military and manufacturing
automation. To date, teams from almost 80 universities and colleges have participated. This paper describes some of the
applications of the technologies required by this competition and discusses the educational benefits. The primary goal of
the IGVC is to advance engineering education in intelligent vehicles and related technologies. The employment and
professional networking opportunities created for students and industrial sponsors through a series of technical events
over the four-day competition are highlighted. Finally, an assessment of the competition based on participation is
presented.

This paper improves the authors' conventional method for reconstructing the 3D structure of moving and still objects
that are tracked in the video and/or depth image sequences acquired by moving cameras and/or range finder. The authors
proposed a Temporal Modified-RANSAC (TMR) based method [1] that (1) can discriminate each moving object from the still
background in color image and depth image sequences acquired by moving stereo cameras or moving range finder, (2)
can compute the stereo cameras' egomotion, (3) can compute the motion of each moving object, and (4) can reconstruct
the 3D structure of each moving object and the background. However, the TMR based method has the following two
problems concerning the 3D reconstruction: lack of accuracy of segmenting into each object's region and sparse 3D
reconstructed points in each object's region. To solve these problems of our conventional method, this paper proposes a
new 3D segmentation method that utilizes Graph-cut, which is frequently used for segmentation tasks. First, the
proposed method tracks feature points in the color and depth image sequences so that 3D optical flows of the feature
points over every N frames are obtained. Then, TMR classifies all the obtained 3D optical flows into regions (3D flow sets)
for the background and each moving object; simultaneously, the rotation matrix and the translation vector for each 3D
flow set are computed. Next, Graph-Cut using the energy function that consists of color probability, structure probability
and a priori probability is performed so that pixels in each frame are segmented into object regions and the background
region. Finally, 3D point clouds are obtained from the segmentation result image and depth image, and then the point
clouds are merged using the rotation and translation from the N-th frame prior to the current frame so that 3D models for
the background and each moving object are constructed with dense 3D point data.
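The final merging step above, in which point clouds from earlier frames are brought into the current frame using the rigid motion estimated for each flow set, can be sketched as follows. This is our own minimal illustration, assuming the per-object rotation R and translation t are already available from TMR; the function and variable names are hypothetical:

```python
import numpy as np

def merge_clouds(cloud_prev, cloud_curr, R, t):
    """Bring the previous frame's 3D points into the current frame using
    the rigid motion (R, t) estimated for this flow set, then concatenate
    them with the current points into one denser cloud."""
    transformed = cloud_prev @ R.T + t   # apply x' = R x + t row-wise
    return np.vstack([transformed, cloud_curr])
```

Repeating this over successive N-frame windows accumulates the dense 3D model for the background and for each moving object.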

In this paper we propose an innovative method for the automatic detection and tracking of road traffic signs using an onboard
stereo camera. It involves a combination of monocular and stereo analysis strategies to increase the reliability of
the detections such that it can boost the performance of any traffic sign recognition scheme. First, an adaptive color-
and appearance-based detection is applied at the single-camera level to generate a set of traffic sign hypotheses. In turn,
stereo information allows for sparse 3D reconstruction of potential traffic signs through a SURF-based matching
strategy. Specifically, the plane that best fits the cloud of 3D points traced back from feature matches is estimated using a
RANSAC-based approach to improve robustness to outliers. Temporal consistency of the 3D information is ensured
through a Kalman-based tracking stage. This also allows for the generation of a predicted 3D traffic sign model, which is
in turn used to enhance the previously mentioned color-based detector through a feedback loop, thus improving detection
accuracy. The proposed solution has been tested with real sequences under several illumination conditions and in both
urban areas and highways, achieving very high detection rates in challenging environments, including rapid motion and
significant perspective distortion.
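The RANSAC plane-fitting step described above can be sketched as follows. This is a generic illustration of robust plane estimation from a 3D point cloud, not the authors' exact implementation; all names and parameter values are ours:

```python
import numpy as np

def ransac_plane(points, iters=200, thresh=0.02, rng=None):
    """Fit a plane to a 3D point cloud with RANSAC, tolerating outliers.
    Returns a unit normal n and offset d such that n . x = d for inliers."""
    rng = np.random.default_rng(rng)
    best_inliers = None
    for _ in range(iters):
        sample = points[rng.choice(len(points), 3, replace=False)]
        n = np.cross(sample[1] - sample[0], sample[2] - sample[0])
        norm = np.linalg.norm(n)
        if norm < 1e-12:                 # degenerate (collinear) sample
            continue
        n = n / norm
        d = n @ sample[0]
        inliers = np.abs(points @ n - d) < thresh
        if best_inliers is None or inliers.sum() > best_inliers.sum():
            best_inliers = inliers
    # refit on the inlier set by least squares (SVD on centered points)
    pts = points[best_inliers]
    centroid = pts.mean(axis=0)
    _, _, vt = np.linalg.svd(pts - centroid)
    n = vt[-1]
    return n, n @ centroid
```

The inlier threshold and iteration count would in practice be tuned to the depth noise of the stereo rig.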

In geometric calibration of stereoscopic cameras, the objective is to determine a set of parameters which describe the
mapping from 3D reference coordinates to 2D image coordinates and indicate the geometric relationships between the
cameras. While various methods for stereo cameras with ordinary lenses can be found in the literature, stereoscopic
vision with extremely wide-angle lenses has been much less discussed. Spherical stereoscopic vision is increasingly
attractive in computer vision applications. However, its use for 3D measurement purposes is limited by the lack of an
accurate, general, and easy-to-use calibration procedure. Hence, we present a geometric model for spherical stereoscopic
vision equipped with extremely wide-angle lenses. A corresponding generic mathematical model is then built, and a
method for calibrating the parameters of the mathematical model is proposed. This paper shows practical results from the
calibration of two high-quality panomorph lenses mounted on cameras with 2048x1536 resolution. Here, the stereoscopic
vision system is flexible: the position and orientation of the cameras can be adjusted freely. The calibration results include
interior orientation, exterior orientation and the geometric relationships between the two cameras. The achieved level of
calibration accuracy is very satisfactory.

A mobile robot equipped with a stereo camera can measure both the video image of a scene and the visual
disparity in the scene. The disparity image can be used to generate a collection of points, each representing
the location of a surface in the visual scene as a 3D point with respect to the location of the stereo camera:
a point cloud. If the stereo camera is moving, e.g., mounted on a moving robot, aligning these scans becomes
a difficult and computationally expensive problem. Many finely tuned versions of the iterative closest point
algorithm (ICP) have been used throughout robotics for registration of these sets of scans. However, ICP relies
on theoretical convergence to the nearest local minimum of the dynamical system: there is no guarantee that
ICP will accurately align the scans. In order to address two problems with ICP, convergence time and accuracy
of convergence, we have developed an improvement by using salient keypoints from successive video images to
calculate an affine transformation estimate of the camera location. This transformation, when applied to the
target point cloud, provides ICP an initial guess to reduce the computational time required for point cloud
registration and improve the quality of registration. We report ICP convergence times with and without image
information for a set of stereo data point clouds to demonstrate the effectiveness of the approach.
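The registration idea above can be sketched as follows, assuming a rigid initial guess (e.g. one derived from matched image keypoints) is available. This is a generic point-to-point ICP with brute-force nearest neighbours, not the authors' tuned implementation:

```python
import numpy as np

def icp(source, target, R0=np.eye(3), t0=np.zeros(3), iters=20):
    """Point-to-point ICP. The initial rigid guess (R0, t0) is applied
    first so ICP starts near the solution; each iteration pairs every
    source point with its nearest target point and solves the best rigid
    motion in closed form (Kabsch / SVD)."""
    R, t = R0, t0
    src = source @ R.T + t
    for _ in range(iters):
        # brute-force nearest neighbours (fine for small clouds)
        d2 = ((src[:, None, :] - target[None, :, :]) ** 2).sum(-1)
        matched = target[d2.argmin(axis=1)]
        # closed-form rigid alignment of src onto matched
        mu_s, mu_m = src.mean(0), matched.mean(0)
        H = (src - mu_s).T @ (matched - mu_m)
        U, _, Vt = np.linalg.svd(H)
        D = np.diag([1, 1, np.sign(np.linalg.det(Vt.T @ U.T))])
        Ri = Vt.T @ D @ U.T
        ti = mu_m - Ri @ mu_s
        src = src @ Ri.T + ti
        R, t = Ri @ R, Ri @ t + ti
    return R, t
```

With a good keypoint-derived initial guess, the nearest-neighbour correspondences are correct from the start and the loop converges in very few iterations, which is exactly the speed-up the paper measures.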

The transportation of hazardous goods on public street networks can pose severe safety threats in case of accidents.
One solution to these problems is the automatic detection and registration of vehicles which are marked
with dangerous goods signs. We present a prototype system which can detect a trained set of signs in high-resolution
images under real-world conditions. This paper compares two different methods for the detection: a
bag-of-visual-words (BoW) procedure and our approach, pairs of visual words with Hough voting.
The results of an extended series of experiments are provided in this paper. The experiments show that the
size of visual vocabulary is crucial and can significantly affect the recognition success rate. Different code-book
sizes have been evaluated for this detection task. The best result of the first method (BoW) was 67% of hazardous
signs successfully recognized, whereas the second method proposed in this paper, pairs of visual words with Hough
voting, reached 94% correctly detected signs. The experiments are designed to verify the usability of the two
proposed approaches in a real-world scenario.
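The bag-of-visual-words quantization whose vocabulary size the experiments vary can be sketched as follows. This is a generic illustration of the BoW baseline, not the authors' code; the descriptor and codebook contents here are placeholders:

```python
import numpy as np

def bow_histogram(descriptors, codebook):
    """Quantize local descriptors against a visual vocabulary and return
    the normalized bag-of-words histogram used for sign classification."""
    d2 = ((descriptors[:, None, :] - codebook[None, :, :]) ** 2).sum(-1)
    words = d2.argmin(axis=1)                     # nearest visual word
    hist = np.bincount(words, minlength=len(codebook)).astype(float)
    return hist / hist.sum()
```

The codebook size (the number of rows of `codebook`) is precisely the parameter the paper finds to be crucial for recognition rate; the pairs-of-words variant additionally keeps the spatial relation between word pairs for Hough voting.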

Mobile systems exploring planetary surfaces will in the future require more autonomy than today. The EU FP7-SPACE
project PRoViScout (2010-2012) establishes the building blocks of such autonomous exploration systems in terms of
robotic vision through a decision-based combination of navigation and scientific target selection, and integrates them into
a framework ready for, and exposed to, field demonstration.
The PRoViScout on-board system consists of mission management components such as an Executive, a Mars Mission
On-Board Planner and Scheduler, a Science Assessment Module, and Navigation & Vision Processing modules. The
platform hardware consists of the rover with the sensors and pointing devices.
We report on the major building blocks and their functions & interfaces, emphasizing the computer vision parts such
as image acquisition (using a novel zoomed 3D Time-of-Flight & RGB camera), mapping from 3D-TOF data,
panoramic image & stereo reconstruction, hazard and slope maps, visual odometry, and the recognition of potentially
scientifically interesting targets.

The observation and monitoring of traffic with smart vision systems for the purpose of improving traffic safety holds great potential. Embedded loop sensors can detect and count passing vehicles, radar can measure the speed and presence of vehicles, and embedded vision systems or stationary camera systems can count vehicles and estimate the state of traffic along the road. This work presents a vision system which is targeted at detecting and reporting incidents at unsecured railway crossings. These crossings, even when guarded by automated barriers, pose a threat to drivers day and night. Our system is designed to detect and record vehicles which pass over the railway crossing by means of real-time motion analysis after the red light has been activated. We implement sparse optical flow in conjunction with motion clustering in order to detect critical events. We describe some modifications of the original Lucas-Kanade optical flow method which make our implementation faster and more robust compared to the original concept. In addition, the results of our optical flow method are compared with a HOG-based vehicle detector which has been implemented and tested as an alternative methodology. The embedded system used for detection consists of a smart camera which observes one street lane as well as the red light at the crossing. The camera is triggered by an electrical signal from the railway; as soon as a vehicle moves over this line, image sequences are recorded and stored onboard the device.
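The original Lucas-Kanade step that the described modifications start from can be sketched as follows. The authors' modifications themselves are not specified in the abstract, so this is the textbook least-squares formulation for a single point; all names are ours:

```python
import numpy as np

def lucas_kanade(img0, img1, y, x, win=7):
    """Estimate the optical flow (vy, vx) at pixel (y, x) by solving the
    Lucas-Kanade least-squares system grad(I) . v = -It over a win x win
    window around the point."""
    h = win // 2
    Iy, Ix = np.gradient(img0.astype(float))      # spatial gradients
    It = img1.astype(float) - img0.astype(float)  # temporal difference
    sl = (slice(y - h, y + h + 1), slice(x - h, x + h + 1))
    A = np.column_stack([Iy[sl].ravel(), Ix[sl].ravel()])
    b = -It[sl].ravel()
    v, *_ = np.linalg.lstsq(A, b, rcond=None)
    return v   # (vy, vx)
```

In practice this is run pyramidally and only at well-textured corners, since the system is degenerate in regions where the gradient varies in only one direction (the aperture problem).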

Vehicle tracking is of great importance for tunnel safety. To detect incidents or disturbances in traffic flow it is
necessary to reliably track vehicles in real-time. The tracking is a challenging task due to poor lighting conditions
in tunnels and frequent light reflections from tunnel walls, the road and the vehicles themselves. In this paper we
propose a multi-clue tracking approach combining foreground blobs, optical flow of Shi-Tomasi features and image
projection profiles in a Kalman filter with a constant velocity model. The main novelty of our approach lies in
using vertical and horizontal image projection profiles (so-called vehicle signatures) as additional measurements
to overcome the problems of inconsistent foreground and optical flow clues in cases of severe lighting changes.
These signatures consist of Radon-transform like projections along each image column and row. We compare the
signatures from two successive video frames to align them and to correct the predicted vehicle position and size.
We tested our approach on a real tunnel video sequence. The results show an improvement in the accuracy of the
tracker and fewer target losses when image projection clues are used. Furthermore, the calculation and comparison
of image projections is computationally efficient, so the tracker keeps its real-time performance (25 fps on a single
1.86 GHz processor).
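The constant-velocity Kalman filter at the core of the tracker can be sketched as follows, with position-only measurements standing in for the fused foreground, optical-flow and signature clues. This is a simplified single-measurement illustration, not the authors' multi-clue filter; the noise parameters are placeholders:

```python
import numpy as np

class ConstantVelocityKF:
    """2D constant-velocity Kalman filter: state [x, y, vx, vy],
    position-only measurements, unit time step."""
    def __init__(self, q=1e-2, r=1.0):
        dt = 1.0
        self.F = np.array([[1, 0, dt, 0],
                           [0, 1, 0, dt],
                           [0, 0, 1, 0],
                           [0, 0, 0, 1]], float)      # motion model
        self.H = np.array([[1, 0, 0, 0],
                           [0, 1, 0, 0]], float)      # measure position
        self.Q = q * np.eye(4)                        # process noise
        self.R = r * np.eye(2)                        # measurement noise
        self.x = np.zeros(4)
        self.P = np.eye(4) * 100.0                    # large initial uncertainty

    def step(self, z):
        # predict
        self.x = self.F @ self.x
        self.P = self.F @ self.P @ self.F.T + self.Q
        # update with measurement z = (x, y)
        S = self.H @ self.P @ self.H.T + self.R
        K = self.P @ self.H.T @ np.linalg.inv(S)
        self.x = self.x + K @ (z - self.H @ self.x)
        self.P = (np.eye(4) - K @ self.H) @ self.P
        return self.x.copy()
```

In the paper's setting, the vehicle-signature alignment would contribute an additional correction of the predicted position and size before the update step.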

Real-time tracking of people has many applications in computer vision, for instance surveillance, domotics,
elderly care and video conferencing, and typically requires multiple cameras. However, this problem is very
challenging because of the need to deal with frequent occlusions and environmental changes. Another challenge
is to develop solutions which scale well with the size of the camera network. Such solutions need to carefully
restrict overall communication in the network and often involve distributed processing. In this paper we present a
distributed person tracker, addressing the aforementioned issues. Real-time processing is achieved by distributing
tasks between the cameras and a fusion node. The latter fuses only high level data based on low-bandwidth
input streams from the cameras. This is achieved by performing tracking first on the image plane of each camera
followed by sending only metadata to a local fusion node. We designed the proposed system with respect to a
low communication load and towards robustness of the system. We evaluate the performance of the tracker in
meeting scenarios where persons are often occluded by other persons and/or furniture. We present experimental
results which show that our tracking approach is accurate even in cases of severe occlusions in some of the
views.

With the rapid increase in the number of vehicles on roads it is necessary to maintain close monitoring of traffic. For this
purpose many surveillance cameras are placed along roads and at crossroads, creating a huge communication load
between the cameras and the monitoring center. Therefore, the data needs to be processed on site and transferred to the
monitoring centers in the form of metadata or as a set of selected images. To do so it is necessary to detect events of
interest already on the camera side, which implies using smart cameras as visual sensors. In this paper we propose a
method for tracking of vehicles and analysis of vehicle trajectories to detect different traffic events. Kalman filtering is
used for tracking, combining foreground and optical flow measurements. Obtained vehicle trajectories are used to detect
different traffic events. Every new trajectory is compared with a collection of normal routes and clustered accordingly. If
the observed trajectory differs from all normal routes by more than a predefined threshold, it is marked as abnormal and
an alarm is raised. The system was developed and tested on a Texas Instruments OMAP platform. Testing was done at four
different locations, two locations in the city and two locations on the open road.
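The thresholded comparison of a new trajectory against the collection of normal routes can be sketched as follows. The distance measure is our own plausible choice (mean nearest-point distance), since the paper's exact metric is not given in the abstract:

```python
import numpy as np

def is_abnormal(traj, normal_routes, threshold):
    """Flag a trajectory as abnormal when its distance to every normal
    route exceeds the threshold. Trajectories and routes are (N, 2)
    arrays of image- or world-plane points."""
    def dist(a, b):
        # mean distance from each point of a to its nearest point on b
        d = np.linalg.norm(a[:, None, :] - b[None, :, :], axis=-1)
        return d.min(axis=1).mean()
    return all(dist(traj, route) > threshold for route in normal_routes)
```

A trajectory that matches some route within the threshold would instead be assigned to that route's cluster, refining the collection of normal routes over time.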

This paper applies object detection in a microscopic traffic model calibration process and analyses the outcome. To cover
a large and versatile amount of real-world data for calibration and validation processes, this paper proposes
semi-automated data acquisition by video analysis. This work concentrates mainly on aspects of an automatic annotation
tool applied to create trajectories of traffic participants over space and time.
The acquired data is analyzed with a view towards calibrating vehicle models, which navigate over a road surface
and interact with the environment. The applied vehicle tracking algorithms for automated data extraction provide many
trajectories not applicable for model calibration. Therefore, we applied an additional automated processing step to filter
out faulty trajectories. With this process chain, the trajectory data can be extracted from videos automatically in a quality
sufficient for the model calibration of speeds, the lateral positioning and vehicle interactions in a mixed traffic
environment.

In this article we present a security threat analysis of the quadrocopter AR.Drone, a toy
for augmented reality (AR) games. The technical properties of the drone can be misused for attacks which
may affect security and/or privacy. Our aim is to raise awareness of possible misuse and to motivate the
implementation of improved security mechanisms for the quadrocopter. We focus primarily
on obvious security vulnerabilities (e.g. communication over unencrypted WLAN, usage of UDP, live video
streaming via unencrypted WLAN to the control device) of this quadrocopter. We were able to verify in
three exemplary scenarios that these can be misused by unauthorized persons for several attacks: hijacking of
the drone, eavesdropping on the AR.Drone's unprotected video streams, and the tracking of persons. Amongst
other aspects, our current research focuses on the realization of the attack of tracking persons and objects with
the drone. Besides the realization of attacks, we want to evaluate the potential of this particular drone for a
"safe-landing" function, as well as potential security enhancements. Additionally, in the future we plan to investigate
automatic tracking of persons or objects without the need for human interaction.

An aerial multiple-camera tracking paradigm needs not only to spot and track unknown targets, but also to handle
target reacquisition as well as target handoff to other cameras in the operating theater. Here we
discuss such a system which is designed to spot unknown targets, track them, segment the useful features and then create
a signature fingerprint for the object so that it can be reacquired or handed off to another camera. The tracking system
spots unknown objects by subtracting background motion from observed motion allowing it to find targets in motion,
even if the camera platform itself is moving. The area of motion is then matched to segmented regions returned by the
EDISON mean shift segmentation tool. Whole segments which have common motion and which are contiguous to each
other are grouped into a master object. Once master objects are formed, we have a tight bound on which to extract
features for the purpose of forming a fingerprint. This is done using color and simple entropy features. These can be
placed into a myriad of different fingerprints. To keep data transmission and storage size low for camera handoff of
targets, we try several different simple techniques. These include Histogram, Spatiogram and Single Gaussian Model.
These are tested by simulating a very large number of target losses in six videos over an interval of 1000 frames each
from the DARPA VIVID video set. Since the fingerprints are very simple, they are not expected to be valid for long
periods of time. As such, we test the shelf life of fingerprints. This is how long a fingerprint is good for when stored
away between target appearances. Shelf life gives us a second metric of goodness and tells us if a fingerprint method
has better accuracy over longer periods. In videos which contain multiple vehicle occlusions and vehicles of highly
similar appearance we obtain a reacquisition rate for automobiles of over 80% using the simple single Gaussian model
compared with the null hypothesis of <20%. Additionally, the performance for fingerprints stays well above the null
hypothesis for as much as 800 frames. Thus, a simple and highly compact single Gaussian model is useful for target
reacquisition. Since the model is agnostic to view point and object size, it is expected to perform as well on a test of
target handoff. Since some of the performance degradation is due to problems with the initial target acquisition and
tracking, the simple Gaussian model may perform even better with an improved initial acquisition technique. Also, since
the model makes no assumption about the object to be tracked, it should be possible to use it to fingerprint a multitude of
objects, not just cars. Further accuracy may be obtained by creating manifolds of objects from multiple samples.

Super-resolution techniques are commonly used to enhance images and video. These techniques have previously been
applied to the enhancement of map data via enhancing aerial imagery. This paper proposes the use of super-resolution
techniques for enhancing topographic data directly. Specifically, a database-driven super-resolution algorithm that is
trained with domain-specific patterns is used to enhance topographic digital elevation model (DEM) data from the
NASA/NGA SRTM. This enhancement process is evaluated using an elevation-difference evaluation technique in which
downscaled and enhanced DEM data is compared to the original higher-resolution data. It is also evaluated with a
threshold-based elevation-difference metric and visually. The benefits it might have for flight path planning in a
UAV application are discussed. The challenges of using super-resolution-style techniques on non-visual data are also
reviewed.
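The database-driven enhancement idea can be sketched as follows: a dictionary pairs low-resolution patches with their high-resolution counterparts from a training DEM, and enhancement replaces each low-resolution patch with the best-matching entry. This is a deliberately simplified, non-overlapping version of such an algorithm, not the paper's method; all names and sizes are ours:

```python
import numpy as np

def build_dictionary(dem_hr, scale=2, patch=2):
    """Pair every low-res patch with its high-res counterpart from a
    training DEM; the low-res version is obtained by block averaging."""
    lr = dem_hr.reshape(dem_hr.shape[0] // scale, scale,
                        dem_hr.shape[1] // scale, scale).mean(axis=(1, 3))
    keys, values = [], []
    for y in range(lr.shape[0] - patch + 1):
        for x in range(lr.shape[1] - patch + 1):
            keys.append(lr[y:y + patch, x:x + patch].ravel())
            values.append(dem_hr[y * scale:(y + patch) * scale,
                                 x * scale:(x + patch) * scale])
    return np.array(keys), values

def enhance(dem_lr, keys, values, scale=2, patch=2):
    """Enhance a low-res DEM: each low-res patch is replaced by the
    high-res patch whose low-res key is nearest in the dictionary
    (non-overlapping tiling for simplicity)."""
    out = np.zeros((dem_lr.shape[0] * scale, dem_lr.shape[1] * scale))
    for y in range(0, dem_lr.shape[0] - patch + 1, patch):
        for x in range(0, dem_lr.shape[1] - patch + 1, patch):
            q = dem_lr[y:y + patch, x:x + patch].ravel()
            best = ((keys - q) ** 2).sum(axis=1).argmin()
            out[y * scale:(y + patch) * scale,
                x * scale:(x + patch) * scale] = values[best]
    return out
```

A practical variant would use overlapping patches with blending and mean-removed keys so that the lookup generalizes beyond the training terrain.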

The objective of this paper is to establish a technique that levitates and conveys a hand, a kind of micro-robot, by
applying magnetic forces; the hand is assumed to have the function of holding and detaching objects. The equipment to
be used in our experiments consists of four pole-pieces of electromagnets and is expected to work as a 4-DOF drive unit
within some restricted range of 3D space: three DOF correspond to 3D positional control and the remaining
one DOF to rotational oscillation damping control. Using the same equipment, Khamesee et al. manipulated the
impressed voltages on the four electromagnets with a PID controller, using the feedback signal of the hand's 3D
position, the controlled variable. However, in this system some problems remained: in the horizontal
direction, when the hand was translated out of the restricted region, positional control performance was suddenly
degraded. The authors propose a method to apply adaptive control to the horizontal directional control. It is expected
that the technique presented in this paper contributes not only to the improvement of the response characteristic but also
to widening the applicable range of the horizontal directional control.

This paper describes the design of a gesture-based Human Robot Interface (HRI) for an autonomous mobile robot
entered in the 2010 Intelligent Ground Vehicle Competition (IGVC). While the robot is meant to operate autonomously
in the various Challenges of the competition, an HRI is useful in moving the robot to the starting position and after run
termination. In this paper, a user-friendly gesture-based embedded system called the Magic Glove is developed for
remote control of a robot. The system consists of a microcontroller and sensors worn by the operator as a glove,
and is capable of recognizing hand signals. These are then transmitted through wireless communication to the robot. The
design of the Magic Glove included contributions on two fronts: hardware configuration and algorithm development. A
triple axis accelerometer used to detect hand orientation passes the information to a microcontroller, which interprets the
corresponding vehicle control command. A Bluetooth device interfaced to the microcontroller then transmits the
information to the vehicle, which acts accordingly.
The user-friendly Magic Glove was successfully demonstrated first in a Player/Stage simulation environment. The
gesture-based functionality was then also successfully verified on an actual robot and demonstrated to judges at the 2010
IGVC.

Unmanned ground vehicles (UGVs) allow people to remotely access and perform tasks in dangerous or inconvenient
locations more effectively. They have been successfully used for practical applications such as mine
detection, sample retrieval, and exploration and mapping. One of the fundamental requirements for the autonomous
operation of any vehicle is the capability to traverse its environment safely. To accomplish this, UGVs
rely on the data from their on-board sensors to solve the problems of localization, mapping, path planning, and
controls. This paper proposes a combined mapping, path planning, and controls solution that will allow a skid-steer
UGV to navigate safely through unknown environments and reach a goal location. The mapping algorithm
generates 2D maps of the traversable environment, the path planner uses these maps to find kinodynamically
feasible paths to the goal, and the tracking controller ensures that the vehicle stays on the generated path during
traversal. All of the algorithms are computationally efficient enough to run onboard the robot in real-time, and
the proposed solution has been experimentally verified on a custom built skid-steer vehicle allowing it to navigate
to desired GPS waypoints through a variety of unknown environments.

In order to maximize the use of a robotic probe during its limited lifetime, scientists immediately have to be provided the
best achievable visual quality of 3D data products. The EU FP7-SPACE Project PRoVisG (2008-2012) develops
technology for the rapid processing and effective representation of visual data by improving ground processing facilities.
In September 2011 PRoVisG held a field trials campaign in the Caldera of Tenerife to verify the implemented 3D
vision processing mechanisms and to collect various sets of reference data in a representative environment. The campaign
was strongly supported by the Astrium UK rover Bridget as a representative platform which allows simultaneous onboard
mounting and powering of various vision sensors such as the Aberystwyth ExoMars PanCam Emulator (AUPE).
The paper covers the preparation work for such a campaign and highlights the experiments, which include standard
operations- and science-related components as well as data capture to verify specific processing functions.
We give an overview of the captured data and the compiled and envisaged processing results, as well as a summary of
the test sites, logistics and test assets utilized during the campaign.

Loop closing is a fundamental part of 3D simultaneous localization and mapping (SLAM) that can greatly enhance
the quality of long-term mapping. It is essential for the creation of globally consistent maps. Conceptually, loop
closing is divided into detection and optimization. Recent approaches depend on a single sensor to recognize
previously visited places in the loop detection stage. In this study, we combine data of multiple sensors such as
GPS, vision, and laser range data to enhance detection results in repetitively changing environments that are
not sufficiently explained by a single sensor. We present a fast and robust hierarchical loop detection algorithm
for outdoor robots to achieve a reliable environment representation even if one or more sensors fail.

Linear Dimensionality Reduction (LDR) techniques have been increasingly important in computer vision and
pattern recognition since they permit a relatively simple mapping of data onto a lower dimensional subspace,
leading to simple and computationally efficient classification strategies. Recently, many linear discriminant methods
have been developed in order to reduce the dimensionality of visual data and to enhance the discrimination
between different groups or classes. Although many linear discriminant analysis methods have been proposed in
the literature, they suffer from at least one of the following shortcomings: i) they require the setting of many
parameters (e.g., the neighborhood sizes for homogeneous and heterogeneous samples), ii) they suffer from the
Small Sample Size problem that often occurs when dealing with visual data sets for which the number of samples
is less than the dimension of the sample, and iii) most of the traditional subspace learning methods have to
determine the dimension of the projected space by either cross-validation or exhaustive search. In this paper, we
propose a novel margin-based linear embedding method that exploits the nearest hit and the nearest miss samples
only. Our proposed method tackles all the above shortcomings. It finds the projection directions such that
the sum of local margins is maximized. Our proposed approach has been applied to the problem of appearance-based
face recognition. Experimental results performed on four public face databases show that the proposed
approach can give better generalization performance than the competing methods. These competing methods
used for performance comparison were: Principal Component Analysis (PCA), Locality Preserving Projections
(LPP), Average Neighborhood Margin Maximization (ANMM), and Maximally Collapsing Metric Learning algorithm
(MCML). The proposed approach could also be applied to other category of objects characterized by
large variations in their appearance.
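The margin-maximizing projection described above can be sketched as follows; the function name, the scatter-matrix formulation, and the eigen-decomposition step are our illustration under the abstract's stated objective, not the paper's exact algorithm:

```python
import numpy as np

def margin_embedding(X, y, d):
    """Sketch of a nearest-hit/nearest-miss margin embedding.

    Accumulates the scatter of nearest-miss pairs minus the scatter of
    nearest-hit pairs; maximizing its trace in the projected subspace
    maximizes the sum of local margins."""
    n, p = X.shape
    D = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=2)
    np.fill_diagonal(D, np.inf)          # a sample is not its own neighbor
    S = np.zeros((p, p))
    for i in range(n):
        same = (y == y[i]) & (np.arange(n) != i)
        hit = np.argmin(np.where(same, D[i], np.inf))    # nearest same-class
        miss = np.argmin(np.where(y != y[i], D[i], np.inf))  # nearest other-class
        dm, dh = X[i] - X[miss], X[i] - X[hit]
        S += np.outer(dm, dm) - np.outer(dh, dh)
    # Top-d eigenvectors of the symmetric margin scatter matrix.
    vals, vecs = np.linalg.eigh(S)
    return vecs[:, np.argsort(vals)[::-1][:d]]           # p x d projection
```

Note that no neighborhood-size parameter is needed: only the single nearest hit and nearest miss per sample enter the objective, which is what lets the method avoid shortcoming (i).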

In this paper we discuss foreground detection and human body silhouette extraction and tracking in monocular video
systems designed for human motion analysis applications. Vision algorithms face many challenges when it comes to
analyzing human activities in non-controlled environments. For instance, issues like illumination changes, shadows,
camouflage and occlusions make the detection and the tracking of a moving person a hard task to accomplish. Hence,
advanced solutions are required to analyze the content of video sequences.
We propose a real-time, two-level foreground detection method, enhanced by body-part tracking, designed to efficiently
extract the person's silhouette and body parts for monocular video-based human motion analysis systems, addressing the
non-controlled environment challenges described above. On the first level, we propose an enhanced Mixture of Gaussians,
built on both chrominance-luminance and chrominance-only spaces, which handles global illumination changes. On the
second level, we improve
segmentation results, in interesting areas, by using statistical foreground models updated by a high-level tracking of body
parts. Each body part is represented by a set of templates, each characterized by a feature vector built in an initialization
phase. Then, high level tracking is done by finding blob-template correspondences via distance minimization in feature
space. Correspondences are then used to update foreground models, and a graph cut algorithm, which minimizes a
Markov random field energy function containing these models, is used to refine segmentation. We were able to extract a
refined silhouette in the presence of light changes, noise and camouflage. Moreover, the tracking approach allowed us to
infer information about the presence and the location of body parts even in the case of partial occlusion.

The adversary in current threat situations can no longer be identified by what they are, but by what they are doing. This
has led to a large increase in the use of video surveillance systems for security and defense applications. With the
quantity of video surveillance footage at the disposal of organizations responsible for protecting military and civilian lives
come issues regarding the storage and screening of the data for events and activities of interest.
Activity recognition from video for such applications seeks to develop automated screening of video based upon the
recognition of activities of interest rather than merely the presence of specific persons or vehicle classes developed for
the Cold War problem of "Find the T72 Tank". This paper explores numerous approaches to activity recognition, all of
which examine heuristic, semantic, and syntactic methods based upon tokens derived from the video.
The proposed architecture discussed herein uses a multi-level approach that divides the problem into three or more tiers
of recognition, each employing the techniques best suited to that tier: heuristics, syntactic recognition, and HMMs over
token strings to form higher-level interpretations.

Performing efficient view frustum culling is a fundamental problem in computer graphics. In general, an octree is used
for view frustum culling. The culling checks the intersection of each octree node (cube) against the planes of the view
frustum. However, this involves many calculations. We propose a method for fast detection of the intersection of a plane
and a cube in an octree structure. When checking which children of an octree node intersect a plane, we compare the
corner coordinates of the node against the plane. Exploiting the octree, we compute the vertices of a child node from
the vertices of its parent. To find points within a convex region, a visibility test is performed by ANDing the results of
three or more plane tests. In experiments, we tested the method on the problem of searching for points visible to a
camera. The method was two times faster than the conventional method, which detects a visible octree node by using the
inner product of the plane and each corner of the node.
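The conventional corner test described above can be reduced to checking only two extreme corners of the cube, a standard n/p-vertex trick; the following sketch is our reconstruction, not the paper's code:

```python
def plane_aabb_side(normal, d, box_min, box_max):
    """Classify an axis-aligned cube against the plane n.x + d = 0
    using only the two extreme corners instead of all eight."""
    # p-vertex: the corner of the box furthest along the plane normal.
    p = [box_max[i] if normal[i] >= 0 else box_min[i] for i in range(3)]
    # n-vertex: the corner furthest against the normal.
    n = [box_min[i] if normal[i] >= 0 else box_max[i] for i in range(3)]
    dot = lambda a, b: sum(x * y for x, y in zip(a, b))
    if dot(normal, p) + d < 0:
        return -1          # whole cube on the negative side (cull)
    if dot(normal, n) + d > 0:
        return +1          # whole cube on the positive side
    return 0               # the plane intersects the cube

def in_frustum(planes, box_min, box_max):
    # AND over the frustum planes: the cube is visible only if no
    # plane places it entirely on the negative (outside) half-space.
    return all(plane_aabb_side(n, d, box_min, box_max) >= 0
               for n, d in planes)
```

Each plane test costs one dot product instead of eight, which is where the reported speedup plausibly comes from.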

The Lucas-Kanade algorithm and its variants have been successfully used for numerous works in computer vision,
which include image registration as a component in the process. In this paper, we propose a Lucas-Kanade based
image registration method using camera parameters. We decompose a homography into camera intrinsic and
extrinsic parameters, and assume that the intrinsic parameters are given, e.g., from the EXIF information of
a photograph. We then estimate only the extrinsic parameters for image registration, considering two types of
camera motions, 3D rotations and full 3D motions with translations and rotations. As the known information
about the camera is fully utilized, the proposed method can perform image registration more reliably. In addition,
as the number of extrinsic parameters is smaller than the number of homography elements, our method runs
faster than the Lucas-Kanade based registration method that estimates a homography itself.
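For the rotation-only motion model, the homography induced by a pure camera rotation factors as H = K R K⁻¹, so once K is known only three rotation angles remain to be estimated instead of eight homography elements. A minimal sketch (the Euler-angle convention is our assumption):

```python
import numpy as np

def rotation_homography(K, angles):
    """Homography induced by a pure camera rotation: H = K R K^-1.

    K is the 3x3 intrinsic matrix (e.g. recoverable from EXIF focal
    length); angles = (ax, ay, az) are rotations about x, y, z."""
    ax, ay, az = angles
    Rx = np.array([[1, 0, 0],
                   [0, np.cos(ax), -np.sin(ax)],
                   [0, np.sin(ax),  np.cos(ax)]])
    Ry = np.array([[ np.cos(ay), 0, np.sin(ay)],
                   [0, 1, 0],
                   [-np.sin(ay), 0, np.cos(ay)]])
    Rz = np.array([[np.cos(az), -np.sin(az), 0],
                   [np.sin(az),  np.cos(az), 0],
                   [0, 0, 1]])
    return K @ (Rz @ Ry @ Rx) @ np.linalg.inv(K)
```

In a Lucas-Kanade setting the warp Jacobian would then be taken with respect to the three angles rather than the homography entries, which is what reduces the per-iteration cost.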

Scenarios for a manned mission to the Moon or Mars call for astronaut teams to be accompanied by semiautonomous
robots. A prerequisite for human-robot interaction is the capability of successfully tracking humans
and objects in the environment. In this paper we present a system for real-time visual object tracking in 2D
images for mobile robotic systems. The proposed algorithm is able to specialize to individual objects and to adapt
to substantial changes in illumination and object appearance during tracking. The algorithm is composed of two
main blocks: a detector based on Histogram of Oriented Gradient (HOG) descriptors and linear Support Vector
Machines (SVM), and a tracker which is implemented by an adaptive Rao-Blackwellised particle filter (RBPF).
The SVM is re-trained online on new samples taken from previous predicted positions. We use the effective
sample size to decide when the classifier needs to be re-trained. Position hypotheses for the tracked object are
the result of a clustering procedure applied on the set of particles. The algorithm has been tested on challenging
video sequences presenting strong changes in object appearance, illumination, and occlusion. Experimental tests
show that the presented method is able to achieve near real-time performance with a precision of about 7 pixels
on standard video sequences of dimensions 320 × 240.
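The effective-sample-size criterion used to trigger re-training can be sketched as follows; the threshold ratio and helper names are our assumptions:

```python
import numpy as np

def effective_sample_size(weights):
    """ESS = 1 / sum(w_i^2) for normalized particle weights.

    ESS equals N for uniform weights and drops toward 1 as the
    weights degenerate onto a few particles."""
    w = np.asarray(weights, dtype=float)
    w = w / w.sum()
    return 1.0 / np.sum(w ** 2)

def needs_retraining(weights, threshold_ratio=0.5):
    # Re-train the online SVM when ESS falls below a fraction of N,
    # i.e. when the current model no longer explains the particle set.
    w = np.asarray(weights, dtype=float)
    return bool(effective_sample_size(w) < threshold_ratio * len(w))
```

The same quantity is the standard resampling trigger in particle filters, so one computation can serve both decisions.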

Robotic vision is nowadays one of the most challenging branches of robotics. In the case of a humanoid robot, a robust
vision system has to provide an accurate representation of the surrounding world and to cope with all the constraints
imposed by the hardware architecture and the locomotion of the robot. Usually humanoid robots have low computational
capabilities that limit the complexity of the developed algorithms. Moreover, their vision system should perform in real
time, therefore a compromise between complexity and processing times has to be found. This paper presents a reliable
implementation of a modular vision system for a humanoid robot to be used in color-coded environments. From image
acquisition, to camera calibration and object detection, the system that we propose integrates all the functionalities needed
for a humanoid robot to accurately perform given tasks in color-coded environments. The main contributions of this paper
are the implementation details that allow the use of the vision system in real-time, even with low processing capabilities,
the innovative self-calibration algorithm for the most important parameters of the camera and its modularity that allows its
use with different robotic platforms. Experimental results have been obtained with a NAO robot produced by Aldebaran,
which is currently the robotic platform used in the RoboCup Standard Platform League, as well as with a humanoid built
using the Bioloid Expert Kit from Robotis. As practical examples, our vision system can be efficiently used in real time
for the detection of the objects of interest for a soccer playing robot (ball, field lines and goals) as well as for navigating
through a maze with the help of color-coded clues. In the worst-case scenario, all the objects of interest in a soccer game
are detected in less than 30 ms on a NAO robot with a single-core 500 MHz processor. Our vision system also includes
an algorithm for self-calibration of the camera parameters as well as two support applications that can run on an external
computer for color calibration and debugging purposes. These applications are built based on a typical client-server model,
in which the main vision pipe runs as a server, allowing clients to connect and remotely monitor its performance without
interfering with its efficiency. The experimental results obtained prove the efficiency of our approach both in terms
of accuracy and processing time. Despite having been developed for the NAO robot, the modular design of the proposed
vision system allows it to be easily integrated into other humanoid robots with a minimum number of changes, mostly in
the acquisition module.

In order to achieve highly accurate motion control and path planning for a mobile robot, an obstacle avoidance
algorithm that provides a desired instantaneous turning radius and velocity was developed. This type of obstacle
avoidance algorithm, which has been implemented in California State University Northridge's Intelligent Ground
Vehicle (IGV), is known as Radial Polar Histogram (RPH). The RPH algorithm utilizes raw data in the form of a polar
histogram that is read from a Laser Range Finder (LRF) and a camera. A desired open block is determined from the raw
data utilizing a navigational heading and an elliptical approximation. The leftmost and rightmost radii are determined from
the calculated edges of the open block and provide the range of possible radial paths the IGV can travel through. In
addition, the calculated obstacle edge positions allow the IGV to recognize complex obstacle arrangements and to slow
down accordingly. A radial path optimization function calculates the best radial path between the leftmost and rightmost
radii and sends it to motion control for speed determination. Overall, the RPH algorithm allows the IGV to autonomously
travel at average speeds of 3 mph while avoiding all obstacles, with a processing time of approximately 10 ms.

This paper describes our attempt to optimize a robot control program for the Intelligent Ground Vehicle Competition
(IGVC) by running computationally intensive portions of the system on a commodity graphics processing
unit (GPU). The IGVC Autonomous Challenge requires a control program that performs a number of different
computationally intensive tasks ranging from computer vision to path planning. For the 2011 competition our
Robot Operating System (ROS) based control system would not run comfortably on the multicore CPU on our
custom robot platform. The process of profiling the ROS control program and selecting appropriate modules
for porting to run on a GPU is described. A GPU-targeting compiler, Bacon, is used to speed up development
and help optimize the ported modules. The impact of the ported modules on overall performance is discussed.
We conclude that GPU optimization can free a significant amount of CPU resources with minimal effort for
expensive user-written code, but that replacing heavily-optimized library functions is more difficult, and a much
less efficient use of time.

In this work, we address a situation presented as a new requirement for the Autonomous Challenge portion of the
2011 Intelligent Ground Vehicle Competition (IGVC). This new requirement is to navigate between red and green
colored flags placed within the normal white painted lane lines. The regular vision algorithms had to be enhanced to
reliably identify and localize the colored flags, while the Navigation algorithms had to be modified to satisfy the
constraints placed on the robot while transiting through the flag region. The challenge in finding a solution was the size
of the flags, the possibility of losing them against the background, and their movement in the wind. The attendant
possibility of false positives and negatives also needed to be addressed to increase the reliability of detection.
Preliminary tests on the robot have produced positive results.

The IGVC Navigation Challenge course configuration has evolved in complexity to a point where use of a simple
reactive local navigation algorithm presents problems in course completion. A commonly used local navigation
algorithm, the Vector Field Histogram (VFH), is relatively fast and thus suitable when computational capabilities on a
robot are limited. One of the attendant disadvantages of this algorithm is that a robot can get trapped when attempting to
get past a concave obstacle structure. The Navigation Challenge course now has several such structures, including some
that partially surround waypoints. Elaborate heuristics are needed to make VFH viable in such a situation and their
tuning is arduous.
An alternate approach that avoids the use of heuristics is to combine a dynamic path planning algorithm with VFH. In
this work, the D*Lite path planning algorithm is used to provide VFH with intermediate goals, which the latter then uses
as stepping stones to its final destination. Results from simulation studies as well as field deployment are used to
illustrate the benefits of using the local navigator in conjunction with a path planner.
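One simple way to pair the planner with the local navigator is to hand VFH the farthest planned-path point within a lookahead radius as its intermediate goal; the selection rule below is our own sketch (D*Lite supplies `path`), and the paper may use a different pairing:

```python
import math

def intermediate_goal(path, robot_xy, lookahead):
    """Pick the farthest point along the planned path that still lies
    within the lookahead radius of the robot; VFH then steers toward
    it as a stepping stone instead of the distant final goal."""
    best = path[0]
    for p in path:
        if math.dist(p, robot_xy) <= lookahead:
            best = p        # keep advancing along the path
        else:
            break           # path points are ordered from robot to goal
    return best
```

Because the intermediate goal always lies on a globally feasible path, VFH is never lured into the concave pockets that trap the purely reactive version.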

Hybrid optical-digital systems based on diffractive correlators are being actively developed. To correctly estimate the
application capabilities of different camera types in optical-digital correlation systems, knowledge of the modulation
transfer function (MTF) and of the light-dependent temporal and spatial noise is required. A method for measuring the 2D
MTF is presented. The method is based on the random target method, but instead of a random target a specially created
target with a flat power spectrum is used. This makes it possible to measure the MTF without averaging 1D Fourier spectra
over rows or columns, as in the random target method, and to obtain all values of the 2D MTF instead of just two
orthogonal cross-sections. A simple method for measuring the dependence of camera temporal noise on the light signal
value by shooting a single scene is described. Measurement results for the light and dark spatial and temporal noise of
several cameras are presented, along with a procedure for obtaining a camera's light spatial noise portrait (the array of
PRNU values for all photosensor pixels). Results of MTF, temporal noise, and spatial noise measurements for a consumer
photo camera, a machine vision camera, and a video-surveillance camera are presented.

Nowadays, the widespread use of computer vision algorithms in surveillance systems and autonomous robots has increased the demand for video enhancement algorithms. In this paper, we propose an algorithm based on phase congruency features to detect and remove rain and thus improve the quality of video. We make use of the following characteristics of rain streaks in video in order to detect them: (1) rain streaks do not occlude the scene at all instances, (2) all the rain streaks in a frame are oriented in a single direction, and (3) the presence of a rain streak at a particular pixel causes a positive change in intensity. Combining these properties, we are able to detect rain streaks in a particular frame using phase congruency features. The pixels in a frame identified as rain streaks are then replaced using the pixel information of their spatial and temporal neighbors that are not affected by rain. When this method is used in conjunction with phase correlation, we are able to remove rain of medium density from videos even when complex camera movement is involved.
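Properties (1) and (3) alone already suggest a toy detector and replacement rule. The sketch below is ours and omits the paper's phase congruency features, orientation constraint (2), and phase-correlation motion compensation; it simply flags pixels that are transiently brighter than the same pixel in both adjacent frames and replaces them with the temporal mean of those unaffected neighbors:

```python
import numpy as np

def remove_rain(frames, delta=3.0):
    """Toy rain removal on a stack of grayscale frames.

    A pixel is flagged as rain when it is brighter than its temporal
    neighbors by more than `delta` (an arbitrary threshold), reflecting
    the transient positive intensity change caused by a streak."""
    frames = np.asarray(frames, dtype=float)
    out = frames.copy()
    for t in range(1, len(frames) - 1):
        prev_, next_ = frames[t - 1], frames[t + 1]
        rain = (frames[t] > prev_ + delta) & (frames[t] > next_ + delta)
        out[t][rain] = 0.5 * (prev_[rain] + next_[rain])
    return out
```

A real implementation would add the single-orientation check to reject moving bright objects that happen to brighten a pixel for one frame.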

This paper will discuss the approach to autonomous navigation used by "Q," an unmanned ground vehicle designed by
the Trinity College Robot Study Team to participate in the Intelligent Ground Vehicle Competition (IGVC). For the
2011 competition, Q's intelligence was upgraded in several different areas, resulting in a more robust decision-making
process and a more reliable system. In 2010-2011, the software of Q was modified to operate in a modular parallel
manner, with all subtasks (including motor control, data acquisition from sensors, image processing, and intelligence)
running simultaneously in separate software processes using the National Instruments (NI) LabVIEW programming
language. This eliminated processor bottlenecks and increased flexibility in the software architecture. Though overall
throughput was increased, the long runtime of the image processing process (150 ms) reduced the precision of Q's
real-time decisions. Q had slow reaction times to obstacles detected only by its cameras, such as white lines, and was limited
to slow speeds on the course. To address this issue, the image processing software was simplified and also pipelined to
increase the image processing throughput and minimize the robot's reaction times. The vision software was also
modified to detect differences in the texture of the ground, so that specific surfaces (such as ramps and sand pits) could
be identified. While previous iterations of Q failed to detect white lines that were not on a grassy surface, this new
software allowed Q to dynamically alter its image processing state so that appropriate thresholds could be applied to
detect white lines in changing conditions. In order to maintain an acceptable target heading, a path history algorithm was
used to deal with local obstacle fields and GPS waypoints were added to provide a global target heading. These
modifications resulted in Q placing 5th in the autonomous challenge and 4th in the navigation challenge at IGVC.

A novel algorithm for hierarchical multi-level image mosaicing for autonomous navigation of UAV is proposed.
The main contribution of the proposed system is blocking the error accumulation propagated along the frames by
incrementally building, on the fly, a long-duration mosaic that is hierarchically composed of short-duration
mosaics. The proposed algorithm fulfills the real-time processing requirements in autonomous
navigation as follows. 1) Causality: the current output of the mosaicing system depends only on the current
and/or previous input frames, contrary to existing offline mosaic algorithms that depend on future input frames as
well. 2) Learnability: the algorithm autonomously analyzes/learns the scene characteristics. 3) Adaptability: the
system automatically adapts itself to the scene change and chooses the proper methods for feature selection (i.e.,
the fast but unreliable LKT vs. the slow but robust SIFT). The evaluation of our algorithm with the extensive
field test data involving several thousand airborne images shows the significant improvement in processing time,
robustness and accuracy of the proposed algorithm.
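The hierarchical composition can be sketched with homography chaining: frame-to-frame transforms are composed only within a short-duration mosaic, and each short mosaic enters the long mosaic through a single anchor transform, so per-frame registration drift accumulates only within a short segment (function names and the normalization step are ours):

```python
import numpy as np

def chain(Hs):
    """Compose frame-to-frame homographies H_{k-1 -> k} into one
    frame-0 -> frame-n mapping inside a short-duration mosaic."""
    H = np.eye(3)
    for Hk in Hs:
        H = Hk @ H
        H /= H[2, 2]    # keep the projective scale normalized
    return H

def to_long_mosaic(anchor_H, short_chain_H):
    # A frame maps into the long mosaic through its short mosaic's
    # single anchor transform, so long chains of noisy per-frame
    # estimates never touch the long-duration mosaic directly.
    H = anchor_H @ short_chain_H
    return H / H[2, 2]
```

This also preserves causality: both compositions use only transforms estimated from current and past frames.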

We present an electrically-actuated adaptive fluidic lens having a 10-mm aperture, 4-diopter range and center-thickness
less than 1 mm. The lens employs dual deflectable glass membranes encasing an optical fluid. A piezoelectric ring
bender actuator draws less than 1 mW and is built into the 25-mm-diameter lens housing. The adaptive lens
demonstrates resolution comparable to commercial precision glass singlet lenses of similar format over a wide range of
field angles and focal powers. Focal power vs. voltage, resolution, modulation transfer function (MTF), life testing and
dynamic response are examined and show that the lens is suitable for numerous adaptive lens applications demanding
high optical quality.
