The construction of a symbolic description from a digitized image is a difficult problem. It is our position that effective use of the uncertain and unreliable information contained in an image requires the application of knowledge-based constraints obtained from world models and expectations. In this paper, we first examine some of the research issues involved in constructing knowledge-based vision systems. We then briefly describe the VISIONS image understanding system and show sample results on two outdoor images. We conclude with an examination of the issues blocking the widespread practical application of knowledge-based vision. A key factor has been the absence of commercially available shells to reduce the burden of systems development on the vision researcher. The KBVision™ system from Amerinex Artificial Intelligence may help to overcome this problem.

The development of autonomous land vehicles (ALVs) capable of operating in an unconstrained environment has proven to be a formidable research effort. The unpredictability of events in such an environment calls for a robust perceptual system, yet programming a system based on expectations of future, unconstrained events is an impossible task. Hence the need for a "general purpose" machine vision system capable of perceiving and understanding images in an unconstrained environment in real time. The research undertaken at the UCLA Machine Perception Laboratory addresses this need by focusing on two specific issues: 1) the long-term goals for machine vision research as a joint effort between the neurosciences and computer science; and 2) a framework for evaluating progress in machine vision. In the past, vision research has been carried out independently within different fields, including neuroscience, psychology, computer science, and electrical engineering. Our interdisciplinary approach to vision research is based on the rigorous combination of computational neuroscience, as derived from neurophysiology and neuropsychology, with computer science and electrical engineering. The primary motivation behind our approach is that the human visual system is the only existing example of a "general purpose" vision system and that, using a neurally based computing substrate, it completes all necessary visual tasks in real time.

This paper describes a constraint-based image understanding system (COBIUS) for aerial imagery interpretation. COBIUS consists of knowledge bases for domain objects, spatial and temporal constraints, control strategies, blackboard areas to instantiate hypotheses of objects and constraints, and an image feature database to fuse results from multiple image segmentation modules. Unlike previous efforts on knowledge-based image understanding where domain-specific rules are encoded for image interpretation, in COBIUS, domain-specific knowledge is represented in terms of generic temporal and spatial constraints. A set of generic constraint manipulation and hypothesis formation rules are used to perform formation, evaluation, propagation, and satisfaction of constraints and hypotheses. More importantly, constraints are represented hierarchically and are no different from the representation of domain objects. This representation avoids time-consuming graph matching processes for image interpretation and provides greater representation modularity.

The recognition of terrain features differs from the recognition of mobile targets in important ways. Vehicles can be described by CAD-like models; terrain feature models are less predictive and contain more qualifying information. This demands a different strategy for recognizing terrain features. This paper describes a recognition strategy based on these differences.

The paper to be presented discusses research on a computer vision system controlled by a neural network capable of learning through classical (Pavlovian) conditioning. Through the use of unconditioned stimuli (reward and punishment), the system develops the scan patterns of eye saccades necessary to differentiate and recognize members of an input set. By foveating only those portions of the input image that it has found necessary for recognition, the system avoids the computational explosion that otherwise occurs as the input image grows. The model incorporates many features found in animal vision systems and is governed by understandable and modifiable behavior patterns similar to those reported by Pavlov in his classic study. These behavioral patterns result from a neuronal model, used in the network, explicitly designed to reproduce this behavior.

In 1987, I was privileged to have the opportunity to relate the activities and accomplishments of the DARPA-sponsored research community conducting computer vision research under the auspices of the Strategic Computing (SC) Program in a similar paper for the SPIE. In this follow-up paper, I relate some of the subsequent activities and accomplishments of this critically important national research program.

This paper presents the current status of the DARPA-sponsored SCORPIUS program. The SCORPIUS program is an applied research effort whose goal is to combine technologies from DARPA's Image Understanding and Computer Architecture research areas in a real-world application: automated exploitation of aerial imagery. Both the vision system under development and the parallel processing testbed being used to host it are discussed.

Previous work in automatic photointerpretation has performed feature extraction and identification as one-pass sequential processes, at a single global level of detail. However, human photointerpreters perform these actions simultaneously while focusing on local areas of interest or uncertainty, at an appropriate level of detail. This paper describes our system for a piece-wise approximation to human-like photointerpretation, called Multi-Pass Multi-Resolution (MPR) image interpretation. MPR uses multiple passes to approximate simultaneity, and multiple resolutions together with recursive segmentation to achieve varying levels of local detail.

The Shape Intersection Technique uses image processing methods to locate a radar transmitter, receiver, or both from radar signals collected passively with an omnidirectional antenna. The method searches over a set of transmitter positions, receiver positions, and radar parameters until the predicted locations of bistatic reflectors optimally match a feature reference database of known locations. Reflector locations are obtained by converting measured pulse times of arrival to two-dimensional locations based on a geometric model of the bistatic radar geometry. The feature reference database can be extracted from digital terrain elevation data, cultural feature data, or directly from registered radar data. The technique has been tested on synthetic data, data from cooperative radars, and data from non-cooperative radars. Geolocation can be very accurate when the map features include adequate patterns.
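The geometric model underlying this conversion is standard bistatic geometry: a reflector whose echo arrives a delay Δt after the direct pulse lies on an ellipse whose foci are the transmitter and receiver, with total path length c·Δt plus the baseline. A minimal two-dimensional numpy sketch of that locus (the function name and flat-ground simplification are ours, not the paper's):

```python
import numpy as np

C = 3.0e8  # speed of light, m/s

def bistatic_ellipse(tx, rx, delay_s, n=360):
    """Locus of reflectors whose Tx->reflector->Rx path corresponds to a
    pulse arriving `delay_s` seconds after the direct Tx->Rx pulse.
    Returns n points on the ellipse with foci at tx and rx (2-D sketch)."""
    tx, rx = np.asarray(tx, float), np.asarray(rx, float)
    baseline = np.linalg.norm(rx - tx)
    total_range = C * delay_s + baseline        # |P-Tx| + |P-Rx| = 2a
    a = total_range / 2.0                       # semi-major axis
    b = np.sqrt(a**2 - (baseline / 2.0)**2)     # semi-minor axis
    center = (tx + rx) / 2.0
    ux = (rx - tx) / baseline                   # major-axis direction
    uy = np.array([-ux[1], ux[0]])              # perpendicular direction
    t = np.linspace(0.0, 2.0 * np.pi, n, endpoint=False)
    return center + np.outer(a * np.cos(t), ux) + np.outer(b * np.sin(t), uy)
```

Intersecting such ellipses from several pulses, and matching the intersections against the feature reference database, is the essence of the search the abstract describes.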

Two methods for building delineation are presented. The first takes as input the result of a semi-automated delineation step and creates an enhanced version based on model and edge knowledge. The second utilizes a combination of edge detection, region growing, and a heuristic to find buildings from their shadows.

Automation of major portions of the imagery exploitation process is becoming a necessity for meeting current and future imagery exploitation needs. In this paper we describe a prototype Automated Exploitation System (AES) which addresses requirements for monitoring objects of interest and assessing situations in large geographic areas. The purpose of AES is to help the image analyst perform routine, commonplace tasks more effectively. AES consists of four main subsystems: Cue Extractor (CE), Knowledge-Based Exploitation (KBE), Interactive Work-Station (IWS), and a database subsystem. The CE processes raw image data and identifies objects and target cues based on pixel- and object-model data. Cues and image registration coefficients are passed to KBE for screening and verification, situation assessment, and planning. KBE combines the cues with ground-truth and doctrinal knowledge in screening the cues to determine their importance. KBE generates reports on image analysis, which are passed on to the IWS, from which an image analyst can monitor, observe, and evaluate system functionality as well as respond to critical items identified by KBE. The database subsystem stores and shares reference imagery, collateral information, and digital terrain data to support both automated and interactive processing. This partitioning of functions into subsystems facilitates hierarchical application of knowledge in image interpretation. The current AES prototype helps in the identification, capture, representation, and refinement of knowledge. The KBE subsystem, which is the primary focus of the present paper, runs on a Symbolics 3675 computer; its software is written in the ART expert system shell and in LISP.

It is difficult to segment an image according to object, because the geometry of lighting and viewing of three-dimensional objects induces spatial inhomogeneities (highlights, shading, and cast shadows) in the image. However, the bands of a multispectral image can be used to perform the segmentation. We start by assuming that the image field for a uniformly colored object is the sum of a small number of terms, each term being the product of a spatial part and a spectral part. The physics of the spatial part is intricate, but the spatial part can be factored out to produce several space-invariant fields of numbers within reflectance boundaries. For an image field arising either from two light sources on a matte surface or from a single light source on a dielectric surface with highlights, the space-invariant quantities characterizing the object are the components of a particular unit vector in color space. We discuss the possibility of an algorithm for estimating the relative spectral reflectance of an object based on its space-invariant image fields.
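In the simplest single-term case (one light source, matte surface), every pixel of a uniformly colored region is a scalar spatial factor times one fixed spectral vector, so normalizing each pixel factors out the spatial part and leaves the object's unit vector in color space. A minimal numpy sketch of that idea, under this single-product-term assumption (the function name is ours):

```python
import numpy as np

def spectral_direction(region_pixels):
    """Estimate a region's space-invariant unit color vector, assuming
    each pixel equals (spatial factor) x (fixed spectral vector).
    Normalizing each pixel removes the spatial factor; averaging the
    normalized pixels gives a robust estimate of the spectral direction."""
    p = np.asarray(region_pixels, float)
    unit = p / np.linalg.norm(p, axis=1, keepdims=True)  # factor out spatial part
    mean = unit.mean(axis=0)
    return mean / np.linalg.norm(mean)
```

In this model, two pixels belong to the same reflectance region exactly when their normalized color vectors agree, regardless of shading.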

In this paper, we present an application of vision to quality assurance in the talc mineral industry. Using image processing and computer vision, the proposed real-time whiteness-sensing system is intended: to inspect the whiteness of the ground product, and to manage the mixing of primary talcs before grinding, in order to obtain a final product with a predetermined whiteness. The system uses the robotic CCD microcamera MICAM (designed by our laboratory and presently manufactured), a microcomputer system based on the Motorola 68020, and real-time image processing boards. It has the following industrial specifications: high reliability, and whiteness determined with 0.3% precision on a scale of 25 levels. Because of the required precision, we had to study carefully the lighting system, the type of image sensor, and the associated electronics. The first software modules developed are able to measure the whiteness of talcum powder; we then devised original algorithms to measure the whiteness of rough talc, taking texture and shadows into account. The processing times of these algorithms are fully compatible with industrial rates. The system can be applied to other domains where a high-precision reflectance sensor is needed, such as the paper and paint industries.

We discuss some basic problems encountered when we assemble the results of image analysis on architectures with coarse grain parallelism. Emphasis is placed on strategies that minimize distortions during the recombination of independently processed image tiles. A seaming algorithm is presented which merges the results of mean difference or max-min region based split-and-merge segmentations of individual tiles. Benchmarks obtained on a Sequent B/21 multiprocessor are given to illustrate the performance of the algorithm.

Match points between stereo image pairs can be used to derive terrain elevation data from aerial photography and to determine obstacle and target ranges for camera-guided vehicles. These measurements are made possible by the relative image offsets, or parallax, produced when objects at different ranges are imaged from different angles. In other image comparison applications, such as change detection, parallax may constitute a significant nuisance, producing undesired relative image distortions. Parallax removal through pixel-by-pixel image matching is then necessary before image comparison can be performed. In this study, fractional-pixel parallax determination at each pixel location is attempted using image cross-correlation calculations input to a neural network. Correlation values are obtained between image windows in the left view and a succession of overlay window positions in the right view. High-resolution correlation requires small image windows. Correlation peak locations for the small windows are often unreliable match points due to noise and relative parallax distortion. Further processing of the correlation data is necessary to reduce match point errors. A section of correlation data is input to a neural network, which outputs the parallax offset value for the pixel centered on the section. The network was trained using simulated stereo imagery so that the exact parallax offset at each pixel was known. A network with two "hidden layers", and with symmetries imposed on the cell connection weights, was trained using the back-propagation method. Network results on simulated test sets show a distinct improvement over results using correlation smoothing methods. The trained network was then used on a real image pair for elevation extraction and change detection. The resulting elevation surface and change detection difference image are illustrated and evaluated.
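The raw input to such a network is a vector of correlation values, one per candidate disparity, computed for a small window around each pixel. A minimal numpy sketch of that front end (normalized cross-correlation over horizontal shifts; the function name and parameters are ours, not the paper's):

```python
import numpy as np

def correlation_profile(left, right, row, col, win=3, max_disp=5):
    """Normalized cross-correlation of a small left-image window against a
    succession of horizontally shifted right-image windows.  Returns one
    value per candidate disparity in [-max_disp, +max_disp]; a vector like
    this is the kind of 'section of correlation data' fed to the network."""
    half = win // 2
    ref = left[row-half:row+half+1, col-half:col+half+1].astype(float)
    ref = ref - ref.mean()
    out = []
    for d in range(-max_disp, max_disp + 1):
        cand = right[row-half:row+half+1, col-half+d:col+half+1+d].astype(float)
        cand = cand - cand.mean()
        denom = np.sqrt((ref**2).sum() * (cand**2).sum())
        out.append((ref * cand).sum() / denom if denom > 0 else 0.0)
    return np.array(out)
```

As the abstract notes, simply taking the argmax of this profile is unreliable for small windows; the network learns to map the whole profile (and its spatial neighbors) to a fractional-pixel offset instead.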

Recovering the three-dimensional structure of objects viewed by two eyes or cameras is an ill-posed inverse visual problem. Instead of computing disparities at several spatial resolutions by stereo matching and then regularizing the disparities, the authors propose a direct method for recovering depth based on formulating the task as a problem in the calculus of variations. This method makes use of brightness gradients of the textured surfaces in the scenes. Occlusion cues are also used in arriving at a final depth estimate. In its present form, the method works for nonconvergent (parallel optical axes) stereo images. Surfaces in the scenes are assumed to have a visual texture. The optical flow constraint equation is used. Depth is assumed to vary continuously almost everywhere (i.e., except at depth discontinuities). Standard regularization theory is applied to make the problem well-posed. This leads to a quadratic energy function. Standard regularization theory cannot handle discontinuities in the solution space. Line processes are used to recover discontinuous depth fields. A deterministic sequential update procedure is used for estimating the state of the line processes. The solution obtained from standard regularization theory maps directly onto an analog resistive network. The nonlinear solution with line processes is mapped onto a hybrid analog-digital resistive network. The line process update is carried out using a digital computer while the local computation of depth values and smoothing of the solution is done by a resistive network.
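The energy-plus-line-process machinery can be illustrated in one dimension. Our simplified energy (not the paper's full variational formulation) is E(d, l) = Σ(dᵢ − obsᵢ)² + λ Σ(dᵢ₊₁ − dᵢ)²(1 − lᵢ) + α Σ lᵢ, where the binary line variable lᵢ switches off the smoothness penalty across a discontinuity at the cost α. A minimal sketch of the alternating deterministic update (parameter values and function name are ours):

```python
import numpy as np

def smooth_with_lines(obs, lam=4.0, alpha=1.0, iters=200, step=0.1):
    """1-D regularization with line processes (simplified illustration):
    alternate a gradient step on the depths d with a deterministic update
    of the binary line variables l, turning a line on wherever the local
    smoothness penalty lam*(d_{i+1}-d_i)^2 would exceed its cost alpha."""
    d = obs.astype(float).copy()
    l = np.zeros(len(obs) - 1)
    for _ in range(iters):
        l = (lam * np.diff(d)**2 > alpha).astype(float)  # line-process update
        grad = 2.0 * (d - obs)                           # data term
        diff = np.diff(d) * (1.0 - l)                    # smoothness, gated by lines
        grad[:-1] -= 2.0 * lam * diff
        grad[1:] += 2.0 * lam * diff
        d -= step * grad                                 # gradient step on depths
    return d, l
```

The gated smoothness term is exactly what the resistive network computes in analog; the thresholded line update is the part the abstract assigns to the digital computer.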

A large part of the information content of an image is conveyed by the location of the pixels that determine the boundaries between different segments. In the absence of prior knowledge about objects in the scene, it is impossible to predict where the edges will occur. The gray levels at locations away from boundaries are often highly correlated; they can be estimated from the values at nearby locations using assumptions based on our knowledge of the physical laws governing reflection from surfaces. Boundaries occur at points of essentially infinite variation. At these points the gray levels result from summing energy reflected by two different regions and cannot easily be estimated from the values at nearby pixels. In this sense, gray level values at discontinuities are less predictable.

It has been well established that the AFATL (Air Force Armament Technical Laboratory) Image Algebra is capable of expressing all image-to-image transformations [1,2] and that it is ideally suited for parallel image transformations [3,4]. In this paper we show how the algebra can also be applied to compactly express image-to-feature transforms, including such sequential image-to-feature transforms as chain coding.
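Chain coding, the paper's running example of a sequential image-to-feature transform, reduces a boundary to a string of direction symbols. A minimal sketch of the standard 8-connected Freeman chain code (this is the conventional procedural form, not the paper's image-algebra expression of it):

```python
# 8-connected Freeman directions in (row, col) steps:
# 0 = east, 2 = north, 4 = west, 6 = south, odd codes are diagonals.
DIRS = [(0, 1), (-1, 1), (-1, 0), (-1, -1),
        (0, -1), (1, -1), (1, 0), (1, 1)]

def chain_code(boundary):
    """Freeman chain code of an ordered list of (row, col) boundary
    pixels: one direction symbol (0..7) per step to the next pixel."""
    return [DIRS.index((r1 - r0, c1 - c0))
            for (r0, c0), (r1, c1) in zip(boundary, boundary[1:])]
```

The transform is inherently sequential (each symbol depends on the previous pixel), which is what makes its compact algebraic expression a nontrivial result.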

This paper describes computational techniques for utilizing the relationship between shadows and man-made structures to aid in the automatic extraction of man-made structures from aerial imagery. Four methods are described that perform the prediction of structure shape, grouping of related structures, verification of individual structures, and structure height estimation. In each method the relationship between structure and cast shadows is exploited in a unique fashion. Key issues involve the accurate localization of the structure/shadow boundary and the shadow edge, and attribution of shadow segments to structure hypotheses. We present several examples that show how each method is used within the task of building detection, delineation, and height estimation.
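The height-estimation step rests on a textbook relation: on flat ground, a structure's height is the length of its cast shadow times the tangent of the sun's elevation angle. A minimal sketch of that relation only (the paper's actual method handles boundary localization and shadow attribution, which this ignores):

```python
import math

def height_from_shadow(shadow_len_m, sun_elev_deg):
    """Structure height from cast-shadow length, assuming flat ground and
    a known sun elevation angle (flat-earth simplification, ours)."""
    return shadow_len_m * math.tan(math.radians(sun_elev_deg))
```

The hard part in practice, as the abstract stresses, is localizing the structure/shadow boundary accurately enough for the shadow length itself to be trustworthy.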

Autometric, Inc. is currently developing a new Geographic Information System (GIS) to be used in the exploitation of multi-source/multi-sensor information. Digital raster imagery data (optical, radar, multispectral, infrared, etc.), weather data, MC&G data (terrain, vegetation, cultural features and transportation features), and text or intelligence information can be interactively overlaid and geographically fused to aid exploitation analysis. This system, called MILGIS, is tailored for military information and applications. This paper describes the architecture of MILGIS and presents a discussion of the role of GIS technology in support of exploitation analysis.

The derivation of the traditional control panel is addressed, including the origins and impacts of requirements for smooth-and-continuous response. The discussion highlights the link between specific MMI techniques and the inherent response expectations which these techniques create in the user. The control panel technique is compared with a point-and-click man-machine interface (MMI) which can be tailored to achieve high production efficiency while reducing workstation cost, including the cost of image processor support.

ASPIPE is an interactive graphical development environment used to program Aspex Incorporated's PIPE Model 1 System. It allows the PIPE programmer to think visually when mapping complex multi-dimensional vision algorithms onto parallel video-rate hardware.

The work reported in this paper was carried out as part of a contract with MoD (PE) UK. It considers the problems associated with realistic modelling of a passive infrared system in an operational environment. Ideally, all aspects of the system and environment should be integrated into a complete end-to-end simulation, but in the past limited computing power has prevented this. Recent developments in workstation technology and the increasing availability of parallel processing techniques make end-to-end simulation possible. However, the complexity and speed of such simulations create difficulties for the operator in controlling the software and understanding the results. These difficulties can be greatly reduced by providing an extremely user-friendly interface and a very flexible, high-power, high-resolution colour graphics capability. Most system modelling is based on separate software simulation of the individual components of the system itself and its environment. These component models may have their own characteristic built-in assumptions and approximations, may be written in the language favoured by the originator, and may have a wide variety of input and output conventions and requirements. The models and their limitations need to be matched to the range of conditions appropriate to the operational scenario. A comprehensive set of databases needs to be generated by the component models, and these databases must be made readily available to the investigator. Performance measures need to be defined and displayed in some convenient graphical form. Some options are presented for combining available hardware and software to create an environment within which the models can be integrated, and which provides the required man-machine interface, graphics, and computing power. The impact of massively parallel processing and artificial intelligence is also discussed. Parallel processing will make real-time end-to-end simulation possible and will greatly improve the graphical visualisation of the model output data. Artificial intelligence should help to enhance the man-machine interface.
