We consider the problem of object classification from image data. Significant challenges are presented when objects
can be imaged from different view angles and have different distortions. For example, a vehicle will appear completely
different depending on the viewing angle of the sensor but must still be classified as the same vehicle. In face
recognition, a person may have a variety of facial expressions, and a pattern recognition algorithm must account
for these distortions. Traditional algorithms such as PCA filters are linear in nature and cannot account for the underlying
non-linear structure which characterizes an object. We examine nonlinear manifold techniques applied to the pattern
recognition problem. One mathematical construct receiving significant research attention is diffusion maps, whereby the
underlying training data are remapped so that Euclidean distance in the mapped data is equivalent to the manifold
distance of the original dataset. This technique has been used successfully for applications such as data organization,
noise filtering, and anomaly detection, but with only limited experimentation in object classification. For very large datasets
(of size N), pattern classification with diffusion maps becomes onerous because it requires the eigenvectors
of an N×N matrix. We characterize the performance of a 40-person facial recognition problem with a standard K-NN
classifier, a diffusion distance classifier, and standard PCA. We then develop a local subspace projection algorithm
which approximates the diffusion distance without the prohibitive computations and shows comparable classification
performance.
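
To make the construction concrete, below is a minimal sketch of the diffusion-map embedding described above, in Python with numpy; the Gaussian kernel and its bandwidth eps are illustrative choices, not necessarily the exact kernel used in the paper:

```python
import numpy as np

def diffusion_map(X, eps=1.0, n_coords=2, t=1):
    # Pairwise squared Euclidean distances between samples.
    D2 = ((X[:, None, :] - X[None, :, :]) ** 2).sum(-1)
    K = np.exp(-D2 / eps)                      # Gaussian affinity kernel
    P = K / K.sum(axis=1, keepdims=True)       # row-stochastic Markov matrix
    vals, vecs = np.linalg.eig(P)
    order = np.argsort(-vals.real)
    vals, vecs = vals.real[order], vecs.real[:, order]
    # Drop the trivial constant eigenvector (eigenvalue 1). Euclidean
    # distance in the returned coordinates approximates diffusion distance.
    return (vals[1:n_coords + 1] ** t) * vecs[:, 1:n_coords + 1]
```

For N training images this requires the eigendecomposition of an N×N Markov matrix, which is precisely the cost that motivates the local subspace projection approximation.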

This paper describes a system for multiple-object recognition and segmentation that (1) correctly identifies objects in a
natural scene and provides a boundary for each object, and (2) can identify multiple occurrences of the same object (e.g., two
identical objects, side by side) in the scene from different training views. The algorithm is novel in that it employs
statistical modeling to efficiently prune the features of an identified object from the scene without disturbing similar
features elsewhere in the scene. This originality allows one to analyze complex scenes that occur in
nature and contain multiple instances of the same object.

This paper describes a fast and robust bio-inspired method for change detection in high-resolution visual imagery. It is
based on the computation of surprise, a dynamic analogue of visual saliency or attention, which requires very little processing
beyond the initial computation of saliency. This differs from prior surprise algorithms, which employ
complex statistical models to describe the scene and detect anomalies. The algorithm can detect changes in a busy scene
(e.g., a person crawling in bushes or a vehicle moving in a desert) in real time at typical video frame rates and can be
used as a front-end to a larger system that includes object recognition and scene understanding modules that operate on
the detected surprising regions.

The usefulness of ground penetrating radar to detect landmines has been limited because of low signal-to-clutter ratios
which result in high false alarm rates. We describe a method using polarimetric radar to measure the polarizability angle,
the relative phase, and the target magnitude. These three independent quantities are directly related to target shape and
dimensions and are invariant with respect to rotation about the sensor-to-target axis. We built a forward-looking
polarimetric ground penetrating radar and used it to collect data on an automobile disk brake rotor on the surface of dry
sand and buried 1 in under the surface of the sand. Measurements were made over a frequency range of 1.35-2.14 GHz.
We also performed a computer simulation using the Method of Moments of a target roughly shaped like the rotor. For
the simulation and the measured data, the target magnitude exhibited interference patterns from scattering centers at
the edges. The computer simulation revealed that a target has characteristic frequencies marking transitions from
reflection being dominated by one polarization state to reflection being dominated by the orthogonal polarization state.
For the rotor in uneven ground the characteristic frequencies were found at the maxima of the polarizability angle. At
these particular frequencies, the relative phase changes sign. The characteristic frequencies may be useful as a target
signature.

Informative representations are representations that do more than reconstruct the data; they carry information
embedded implicitly in them and are compressive enough for use in real-time Automatic Target Recognition. In this
paper we create methods for embedding information in subspace bases through sparsity and information theoretic
measures. We present a theory of informative bases and demonstrate some practical examples of basis learning
using infrared imagery. We employ sparsity and entropy measures to drive the learning process toward the
most informative representation and draw connections between informative representations and the quadratic
correlation filter.

We calculate the raw and central moments for a noise model that is the sum of elementary signals.
In addition, we obtain expressions for the scintillation index. The cases where the elementary signals
are real and where they are complex are both considered, and the relationships between the two are derived.
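
For the complex case, the scintillation index is the normalized variance of intensity, SI = &lt;I^2&gt;/&lt;I&gt;^2 - 1. A Monte Carlo sketch, where unit-amplitude phasors with uniform random phases are an illustrative choice of elementary signal:

```python
import numpy as np

def scintillation_index(I):
    # SI = <I^2> / <I>^2 - 1, from the raw moments of the intensity.
    return np.mean(I ** 2) / np.mean(I) ** 2 - 1.0

# Intensity of a sum of N elementary signals (unit phasors, random phases).
# For large N the intensity is nearly negative-exponential and SI -> 1.
rng = np.random.default_rng(1)
N, trials = 50, 20000
phases = rng.uniform(0.0, 2.0 * np.pi, size=(trials, N))
I = np.abs(np.exp(1j * phases).sum(axis=1)) ** 2
si = scintillation_index(I)
```

For N unit-amplitude phasors the exact value is 1 - 1/N, so the estimate should fall close to 0.98 here.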

We extend a recent method by Kay that maximizes the probability of detecting an elastic object in the presence
of Gaussian reverberation and additive Gaussian interference. Kay's solution specifies the spectral magnitude
for the optimal transmit waveform, and hence there is an unlimited number of "optimal" waveforms that can
be transmitted, all with the same spectral magnitude but differing in terms of time domain characteristics
such as duration and peak power. We extend Kay's approach in order to obtain a unique optimal waveform
by incorporating time-domain constraints, via two optimization problem formulations. One approach yields a
waveform that preserves the optimal spectral magnitude while achieving the minimum temporal duration. The
second complementary approach considers temporal concentration rather than duration, and yields a waveform
that, depending on the degree of concentration imposed, achieves the optimal spectral magnitude to varying
degrees.
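
One way to realize the first, minimum-duration formulation numerically is by alternating projections between the two constraint sets. The sketch below is a Gerchberg-Saxton-style illustration of our own, not the optimization actually derived in the paper:

```python
import numpy as np

def shape_waveform(spec_mag, support, n_iter=100, seed=0):
    # Alternating projections: enforce the prescribed spectral magnitude,
    # then truncate to the allowed time support, and repeat until the two
    # constraints (approximately) agree.
    rng = np.random.default_rng(seed)
    s = np.fft.ifft(spec_mag * np.exp(2j * np.pi * rng.random(spec_mag.size)))
    for _ in range(n_iter):
        S = np.fft.fft(s)
        S = spec_mag * np.exp(1j * np.angle(S))  # project onto magnitude set
        s = np.fft.ifft(S)
        s = np.where(support, s, 0.0)            # project onto time support
    return s
```

As a sanity check, a flat spectral magnitude with a single-sample support admits an exact solution (an impulse), which the iteration recovers.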

We consider the probability distribution for intensity for a reverberation model that is the sum of
elementary signals. The random aspect is that the initial spatial means of the elementary signals
are chosen randomly. The intensity is calculated at fixed positions and times after the pulses
evolve. A number of cases are treated analytically; otherwise we study the probability distribution
by simulation.

Propagation effects, such as dispersion, absorption and multi-path, can adversely
impact classification of underwater objects from their sonar backscatter. One approach to handling
this problem is to extract features from the wave that are minimally affected by propagation effects, if
possible. In previous work, a signal processing and feature extraction method was developed to obtain
moment-like features that are invariant to dispersion and absorption. The method was developed based
on linear wave propagation in range-independent environments. However, most ocean environments,
especially littoral environments, exhibit range dependence. Deriving propagation invariant features for
such environments remains an especially challenging task. In this paper, we explore the classification
utility of the previously developed range-independent features in a range-dependent environment, via
simulation of the propagation of the backscatter from two different cylinders in an ideal wedge. Our
simulation results show that, while performance does drop off with increasing distance in a range-dependent
environment, the previously developed invariant moment features do provide better classification
performance than ordinary temporal moments.

This research investigates the automatic detection of a dismounted human from a single image as a function of range.
The histogram of oriented gradients (HOG) method provides the feature vector and a support vector machine performs
the classification. This work presents, for the first time, an understanding of how HOG for human detection holds up as
range increases. The results indicate that HOG remains effective even at long distances; for example, the average miss
rate and false alarm rate were both kept to 5% for humans only 12 pixels high and 4-5 pixels wide. The impact of the
amount and type of training data needed to achieve this long-range performance is examined.
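
A minimal sketch of the HOG feature computation follows (per-cell orientation histograms only; the study presumably used a full implementation with block normalization, so the cell size and bin count here are illustrative):

```python
import numpy as np

def hog_descriptor(img, cell=8, nbins=9):
    # Minimal HOG: magnitude-weighted orientation histograms over
    # non-overlapping cells (no block normalization, unlike full HOG).
    gy, gx = np.gradient(img.astype(float))
    mag = np.hypot(gx, gy)
    ang = np.rad2deg(np.arctan2(gy, gx)) % 180.0        # unsigned gradients
    feats = []
    for i in range(0, img.shape[0] - cell + 1, cell):
        for j in range(0, img.shape[1] - cell + 1, cell):
            w = mag[i:i + cell, j:j + cell].ravel()
            a = ang[i:i + cell, j:j + cell].ravel()
            hist, _ = np.histogram(a, bins=nbins, range=(0.0, 180.0), weights=w)
            feats.append(hist / (np.linalg.norm(hist) + 1e-9))
    return np.concatenate(feats)
```

The resulting vector is what would be fed to a linear support vector machine for the detection decision.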

A new model-based human body tracking framework using learned motion models is proposed in this paper. The
framework introduces a likely-model-set variable-structure multiple-model (LMS-VSMM) scheme to track articulated human
motion in monocular image sequences. Key joint points are selected as image features; they are detected
automatically, and undetected points are estimated with particle filters. Multiple motion models are learned from the
CMU motion capture database with a ridge regression method to direct the tracking. During tracking, the motion model currently in
effect switches from one to another in order to match the present human motion mode. Motion models are activated
according to the change in the projection angle of the kinematic chain and to the topological and compatibility relationships among
them, and are terminated according to their model probabilities. The likely-model-set scheme of VSMM is used to estimate
the quaternion vectors of joint rotations. Experiments on two videos demonstrate that this tracking framework is effective
with respect to both 3D pose and 2D projection.

A descriptor based on the Curvelet transform and a method integrating heterogeneous features for human detection are proposed in this
paper. The descriptor, based on the second-generation Curvelet transform (CTD), concatenates
edge and texture feature vectors. To capture edge features, statistical measures such as energy, entropy, standard
deviation, maximum value, and contrast are computed from blocks partitioned from the sub-bands at all scales and
concatenated. To obtain texture features, the lowest-frequency sub-band coefficients are partitioned into overlapping blocks;
four co-occurrence matrices are computed for each block, and descriptors such as angular second moment,
contrast, correlation, sum of variance, sum of average, and entropy are computed from each co-occurrence matrix and
concatenated into the texture feature vector. A method integrating three feature extraction methods, namely
Histogram of Oriented Gradients (HOG), Granularity-tunable Gradients Partition descriptors (GGP), and CTD, is
then proposed for human detection, in which a computational-cost-normalized classification margin determines the order in which
features are evaluated. Experimental results on the INRIA and MIT human databases show that the CTD
and the integrated heterogeneous-feature method increase detection accuracy compared to HOG and GGP.
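
The texture branch can be illustrated with a co-occurrence matrix over quantized gray levels and three of the descriptors named above; the displacement direction and quantization are illustrative choices:

```python
import numpy as np

def cooccurrence(q, levels):
    # Normalized co-occurrence matrix of horizontally adjacent quantized
    # gray levels (one of the four displacement directions mentioned).
    M = np.zeros((levels, levels))
    for a, b in zip(q[:, :-1].ravel(), q[:, 1:].ravel()):
        M[a, b] += 1
    return M / M.sum()

def texture_features(M):
    # Three of the descriptors named in the text.
    i, j = np.indices(M.shape)
    asm = (M ** 2).sum()                         # angular second moment
    contrast = (M * (i - j) ** 2).sum()
    entropy = -(M[M > 0] * np.log2(M[M > 0])).sum()
    return asm, contrast, entropy
```

A uniform block yields maximal angular second moment and zero contrast and entropy, while a checkerboard block yields maximal contrast, matching the intuition behind these measures.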

In most of today's surveillance tasks, people's actions are the focus of attention. A prerequisite for action interpretation
is a stable tracking of people to build meaningful trajectories. Specifically in surveillance applications,
not only trajectories on agent level are of interest, but also interpretation on the level of limbs provides important
information when it comes to more sophisticated action recognition tasks. In this paper, we present an
integrated approach to detect and track people and their body parts in thermal imagery. For that, we introduce
a generic detection and tracking strategy that employs only local image features and thus works independently
of underlying video data specifics like color information, making it applicable to both visible and infrared data.
In addition, we show how this approach serves to detect a person's body parts and extract trajectories which
can be input for further interpretation purposes.

In addition to detecting and tracking persons via video surveillance in public spaces like airports and train
stations, another important aspect of a situation analysis is the appearance of objects in the periphery of a
person. Not only from a military perspective, an unidentified armed person can, in certain environments, be
an indicator of a potential threat. To become aware of an unidentified armed person and to initiate
counteractive measures, the ability to identify persons carrying weapons is needed. In this paper we present a
classification approach that fits into an Implicit Shape Model (ISM) based person detection and is capable
of differentiating between unarmed persons and persons in an aiming body posture. The approach relies on
SIFT features and thus is completely independent of sensor-specific features which might only be perceivable
in the visible spectrum. For person representation and detection, a generalized appearance codebook is used.
Compared to a stand-alone person detection strategy with ISM, an additional training step is introduced that
allows interpretation of a person hypothesis delivered by the ISM. During training, the codebook activations and
positions of the participating features are stored for the desired classes, in this case persons in an aiming posture
and unarmed persons. With the stored information, one is able to calculate weight factors for every feature
participating in a person hypothesis in order to derive a specific classification model. The introduced model is
validated using an infrared dataset which shows persons in aiming and non-aiming body postures from different
angles.

Multi-frame correlation filters have been recently reported in the literature for the detection of
moving objects. Introduced by Kerekes and Kumar [5], this technique uses a motion model to
accumulate evidence over time in a Bayesian framework to improve the receiver operating
characteristic (ROC) curve. In this paper, we generalize the approach to not only detect objects,
but also their activities by using separate motion models to represent each activity. We also
discuss results of preliminary simulations using a publicly released aerial data set to illustrate the
concept.
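
The accumulation step can be sketched as a recursive Bayes filter over a position grid, here in one dimension with a constant-shift motion model standing in for the activity-specific models; all parameters are illustrative:

```python
import numpy as np

def accumulate_evidence(corr_frames, shift=1):
    # Recursive Bayesian accumulation of correlation outputs over a 1-D
    # position grid. The constant-shift motion model is our illustrative
    # stand-in for the activity-specific models described in the text.
    n = corr_frames[0].size
    post = np.full(n, 1.0 / n)               # uniform prior over positions
    for c in corr_frames:
        pred = np.roll(post, shift)          # predict with the motion model
        like = np.exp(c - c.max())           # correlation plane -> likelihood
        post = pred * like
        post /= post.sum()                   # normalize (Bayes update)
    return post
```

Running one filter per candidate motion model and comparing their accumulated evidence is the essence of using separate models to represent separate activities.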

Modern multi- and hyper-dimensional processing problems, such as those encountered in many applications
involving image processing, adaptive beamforming, hyperspectral IR detection, medical imaging, STAP,
Volterra calibration, etc., are numerically very demanding due to the vast amounts of data involved. Further
compounding the situation is the fact that many such applications require estimating a set of parameters of
interest that may be so large that the data available, despite its massiveness, may not be enough to properly
calculate the pertinent statistics.
The approach presented here addresses such problems by projecting the available data, both modeled and
measured, into a reduced-dimensionality domain where the estimation process is then performed. This
strategy is extremely useful when the parameter set is not the final objective per se, but rather just a means to
an end (e.g., a classification decision, detecting a signal of interest, etc.). In particular, we will concentrate on
the case of finding the optimal projector for a given problem of interest where a priori information may be
available. This means that the reduced-dimensionality domain must be selected as one incorporating and
preserving that knowledge. We explore the use of Krylov Subspaces to achieve this end, as they inherently
allow the inclusion of such data.
In order to maintain a semblance of practicality, we have chosen to present our developments from the
perspective of the adaptive processing (filtering) problem, as this enables our presentation to be applicable to
the endless expanse of optimization problems that can be addressed via a Least Squares formulation.
Regularization issues, as well as extensions to non-linear filters (Taylor/Volterra/polynomial), will also be
presented so as to provide additional ideas regarding the usefulness and malleability of our methods.

It has been shown that 3D ladar imagery has strong potential for automatic target detection (ATD) and
automatic target recognition (ATR); ladars enhance target information, which may then be exploited to yield
higher recognition rates and lower false alarms. Although numerous techniques have been proposed for both 3D
ATD and 3D ATR, no single approach has proven capable of systematically outperforming all other techniques
for every possible scenario. In this context, this paper describes a set of fast 3D ATD/ATR algorithms designed
to process cooperative targets in airborne 3D ladar imagery. This algorithmic chain consists of four modules:
detection, segmentation, classification and recognition. In each module, fast algorithms were implemented, some
of which stem from open literature while others were designed in-house. The purpose of this algorithmic chain
is to provide a baseline approach for efficient processing of simple scenarios. The ultimate goal of this work is
to characterize and compare algorithms with respect to increasingly complex scenarios, in hopes of progressing
towards an adaptive processing pipeline for context-driven 3D ATD/ATR. In this paper, the four modules of
the baseline processing pipeline are first described. Preliminary test results obtained with real airborne ladar
imagery are then presented, in which fast and accurate 3D ATD/ATR is performed with a library of 20 scanned
vehicles. Finally, a demonstration is presented to illustrate how this baseline approach may be expanded to
tackle more complex scenarios, such as non-cooperative targets concealed under vegetation.

A new generation of high-resolution surveillance cameras makes it possible to apply video processing and recognition
techniques on live video feeds for the purpose of automatically detecting and identifying objects and events of interest.
This paper addresses a particular application of detecting and identifying vehicles passing through a checkpoint. This
application is of interest to border services agencies and is also related to many other applications. Although many
commercial automated License Plate Recognition (LPR) systems are available on the market, some of them as
plug-ins for surveillance systems, this application still poses many unresolved technological challenges, the main two of
which are: i) multiple and often noisy license plate readings generated for the same vehicle, and ii) failure to detect a
vehicle or license plate altogether when the license plate is occluded or not visible. This paper presents a solution to both
of these problems. A data fusion technique based on the Levenshtein distance is used to resolve the first problem. An
integration of a commercial LPR system with the in-house built Video Analytic Platform is used to solve the latter. The
developed solution has been tested in field environments and has been shown to yield a substantial improvement over
standard off-the-shelf LPR systems.
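
The first problem reduces to comparing noisy plate strings. Below is a sketch of the Levenshtein distance together with one simple fusion rule, choosing the medoid reading; the fusion rule is our illustration, not necessarily the one implemented:

```python
def levenshtein(a, b):
    # Classic dynamic-programming edit distance between two strings.
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1,                 # deletion
                           cur[-1] + 1,                 # insertion
                           prev[j - 1] + (ca != cb)))   # substitution
        prev = cur
    return prev[-1]

def fuse_readings(readings):
    # Keep the reading with the smallest total edit distance to the others
    # (the medoid), so occasional misreads are outvoted.
    return min(readings, key=lambda r: sum(levenshtein(r, s) for s in readings))
```

With several readings of the same plate, single-character OCR errors raise the total distance of the bad readings, so the consensus string wins.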

In hyperspectral imaging applications, the background generally exhibits a clearly non-Gaussian impulsive behavior,
where valuable information stays in the tail. In this paper, we propose a new technique in which the background is
modeled using the stable distribution for robust detection of outliers. The outliers of the distribution can be considered
potential anomalies or regions of interest (ROIs). We effectively utilize the stable model for detecting targets in
impulsive hyperspectral data. To decrease the false alarm rate, it is necessary to compare each ROI with the known
reference using a suitable technique, such as the Euclidean distance. Modeling data with the stable distribution compensates
for a drawback of the Gaussian model, which is not well suited to describing signals with impulsive behavior. In addition,
thresholding is considered to avoid misclassification of targets. Test results using real life hyperspectral image datasets
are presented to verify the effectiveness of the proposed technique.

Current ISR (Intelligence, Surveillance, and Reconnaissance) systems require an analyst to observe each video stream,
which will result in analyst overload as systems such as ARGUS or Gorgon Stare come into use with many video
streams generated by those sensor platforms. Full exploitation of these new sensors is not possible using today's one
video stream per analyst paradigm. The Contextual Visual Dataspace (CVD) is a compact representation of real-time
updating of dynamic objects from multiple video streams in a global (geo-registered/annotated) view that combines
automated 3D modeling and semantic labeling of a scene. CVD provides a single integrated view of multiple
automatically-selected video windows with 3D context. For a proof of concept, a CVD demonstration system performing
detection, localization, and tracking of dynamic objects (e.g., vehicles and pedestrians) in multiple infrastructure camera
views was developed using a combination of known computer vision methods, including foreground detection by
background subtraction, ground-plane homography mapping, and appearance model-based tracking. Automated labeling
of fixed and moving objects enables intelligent context-aware tracking and behavior analysis and will greatly improve
ISR capabilities.
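
Of the computer vision methods listed, foreground detection by background subtraction is the simplest to sketch; the running-average background model and its parameters below are illustrative choices:

```python
import numpy as np

def foreground_masks(frames, alpha=0.05, thresh=0.1):
    # Foreground detection by background subtraction with a running-average
    # background model (one classic choice among several).
    bg = frames[0].astype(float)
    masks = []
    for f in frames:
        masks.append(np.abs(f - bg) > thresh)   # pixels far from background
        bg = (1 - alpha) * bg + alpha * f       # slowly adapt the model
    return masks
```

The resulting masks are what the downstream homography mapping and appearance-model tracker would consume.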

This paper presents a further step of a research toward the development of a quick and accurate weapons identification
methodology and system. A basic stage of this methodology is the automatic acquisition and updating of weapons
ontology as a source of deriving high level weapons information. The present paper outlines the main ideas used to
approach the goal. In the next stage, a clustering approach is suggested on the basis of a hierarchy of concepts. An inherent
slot of every node of the proposed ontology is a low-level feature vector (LLFV), which facilitates searching through
the ontology. Part of the LLFV is information about the object's parts. To partition an object, a new approach is
presented that is capable of identifying the object's concavities, which mark the end points of weapon parts, considered as
convexities. Further, an existing matching approach is optimized to determine whether an ontological object matches the
objects in an input image. Objects from derived ontological clusters will be considered for the matching process.
Image resizing is studied and applied to decrease the runtime of the matching approach and to investigate its rotational and
scaling invariance. A set of experiments is performed to validate the theoretical concepts.

We present a mathematical technique for estimating new perspective views of an object
from a single image. Unlike traditional graphics or ray tracing methods, our approach
treats the view-morphing problem as a 2-D linear prediction process. We first estimate
the prediction parameters in a reduced dimensional space using features extracted from
"training" images of the object. Given an arbitrary view of the object, the features of the
new view are linearly predicted from which the morphed image of the object is
reconstructed. The proposed approach can be used for rapidly incorporating new objects
in the knowledge base of a computer vision system and may have advantages in low-contrast
situations where it is difficult to establish correspondence between sample views.

Present descriptors for Automatic Target Recognition (ATR) performance are inadequate for use in comparing
algorithms that are purported to be a solution to the problem. The use of receiver operating characteristic (ROC) curves
is a de facto standard, but they do not communicate several key performance measures, including (i) the intrinsic separation
between classes in the input space, (ii) the efficacy of the mapping induced by the algorithm, (iii) the complexity of the
algorithmic mapping, and (iv) a measure of the generalization of the proposed solution. Previous work by Sims et al.2,5
has addressed the distortion of the evaluation sets to indicate an algorithm's capability (or lack thereof) for generalization
and handling of unspecified cases. This paper addresses the rethinking of the summary statistics used for understanding
the performance of a solution. We propose new approaches for solution characterization, allowing algorithm
performance comparison in an equitable and insightful manner. This paper proffers some examples and suggests
directions for new work from the community in this field.

Synthetic aperture radar (SAR) exploitation algorithms typically rely on the use of derived features to represent
the target. These features are chosen to discriminate between target classes while exhibiting robustness to
noise and calibration artifacts. One of the challenges in working with such features is understanding when this
assumption of robustness is no longer valid. In this paper, we focus on characterizing the performance of the
gray scale quantization feature in the presence of additive noise. We derive an approximation for the variance
of the intraclass distance by treating the additive noise as an independently identically distributed (iid) process.
The analytic model is contrasted with empirical results for a two class problem.

The most challenging problem of Automatic Target Recognition (ATR) is the extraction of robust and independent target
features which describe the target unambiguously. These features have to be robust and invariant in different senses: in
time, between aspect views (azimuth and elevation angle), between target motion (translation and rotation) and between
different target variants. Especially for ground moving targets in military applications an irregular target motion is
typical, so that a strong variation of the backscattered radar signal with azimuth and elevation angle makes the extraction
of stable and robust features most difficult. For ATR based on High Range Resolution (HRR) profiles and / or Inverse
Synthetic Aperture Radar (ISAR) images, it is crucial that the reference dataset consist of stable and robust features,
which depend on, among other factors, the target aspect and depression angle. Here it is important to find
an adequate data grid for efficient data coverage in the reference dataset for ATR.
In this paper the variability of the backscattered radar signals of target scattering centers is analyzed for different HRR
profiles and ISAR images from measured turntable datasets of ground targets under controlled conditions. In particular, the
dependency of the features on the elevation angle is analyzed with regard to the ATR of large-strip SAR data with a large
range of depression angles, using available (I)SAR datasets as reference. In this work the robustness of these
scattering centers is analyzed by extracting their amplitude, phase, and position. To this end, turntable measurements under
controlled conditions were performed on an artificial military reference object called STANDCAM. Measures
referring to variability, similarity, robustness and separability regarding the scattering centers are defined. The
dependency of the scattering behaviour with respect to azimuth and elevation variations is analyzed.
Additionally, generic types of features (geometrical, statistical), which can be derived especially from (I)SAR images, are
applied to the ATR task. Subsequently, the dependence of individual feature values, as well as of the feature
statistics, on aspect (i.e., azimuth and elevation) is presented. The Kolmogorov-Smirnov distance is used to show
how the feature statistics are influenced by varying elevation angles. Finally, confusion matrices are computed between
the eleven elevation angles of the STANDCAM target. This helps to assess the robustness of ATR performance under the
influence of aspect angle deviations between training set and test set.
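
The Kolmogorov-Smirnov distance used here is the maximum gap between the empirical distribution functions of two feature samples; a compact numpy sketch:

```python
import numpy as np

def ks_distance(x, y):
    # Two-sample Kolmogorov-Smirnov distance: the maximum vertical gap
    # between the two empirical cumulative distribution functions.
    x, y = np.sort(np.asarray(x)), np.sort(np.asarray(y))
    grid = np.concatenate([x, y])      # evaluate both ECDFs at all samples
    Fx = np.searchsorted(x, grid, side="right") / x.size
    Fy = np.searchsorted(y, grid, side="right") / y.size
    return np.abs(Fx - Fy).max()
```

Applied to the feature samples collected at two elevation angles, a small distance indicates the feature statistics are stable across that elevation change.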

Based on measurements of a ship at 17 GHz and on several simulated ships at 35 GHz, it is demonstrated how multipath
changes the range profiles that form the basis for the construction of ATR features for ship classification. The fluctuation
of range profiles leads to a corresponding fluctuation of feature values that makes it difficult to define stable test feature
vectors and reliable feature references in the training stage.

The ability to accurately classify targets is critical to the performance of automated/assisted target recognition (ATR)
algorithms. Supervised machine learning methods have been shown to be able to classify data in a variety of disciplines
with a high level of accuracy. The performance of machine learning techniques in classifying ground targets in
two-dimensional radar imagery was compared. Three machine learning models were compared to determine which model
classifies targets with the highest accuracy: decision tree, Bayes, and support vector machine. X-band signature
data acquired in scale-model compact ranges were used. ISAR images were compared using several techniques
including two-dimensional cross-correlation and pixel by pixel comparison of the image against a reference image. The
highly controlled nature of the collected imagery was ideally suited for the inter-comparison of the machine learning
models. The resulting data from the image comparisons were used as the feature space for testing the accuracy of the
three types of classifiers. Classifier accuracy was determined using N-fold cross-validation.
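
The evaluation protocol can be sketched as follows; a nearest-centroid classifier stands in for the decision tree, Bayes, and support vector machine models actually compared:

```python
import numpy as np

def n_fold_accuracy(X, y, classify, n_folds=5, seed=0):
    # N-fold cross-validation: every sample is held out exactly once.
    idx = np.random.default_rng(seed).permutation(len(y))
    folds = np.array_split(idx, n_folds)
    correct = 0
    for k in range(n_folds):
        test = folds[k]
        train = np.concatenate([folds[i] for i in range(n_folds) if i != k])
        correct += (classify(X[train], y[train], X[test]) == y[test]).sum()
    return correct / len(y)

def nearest_centroid(X_train, y_train, X_test):
    # Stand-in classifier; a decision tree, Bayes, or SVM slots in here.
    labels = np.unique(y_train)
    centroids = np.stack([X_train[y_train == c].mean(axis=0) for c in labels])
    dists = np.linalg.norm(X_test[:, None, :] - centroids[None, :, :], axis=2)
    return labels[np.argmin(dists, axis=1)]
```

Running the same folds through each of the three classifiers gives the like-for-like accuracy comparison described above.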