Extraction of lines and curves from images is one of the most important and fundamental tasks in machine inspection and computer vision in general. Among the techniques for detecting lines and curves, the Hough Transform (HT) method is unique in its ability to cope effectively with noise, gaps in outlines and even partial occlusion. In spite of this ability, the HT method is still not widely used in real-time applications because it is computationally intensive. One solution to this problem is to find an architecture for parallel processing. Recently, some approaches using parallel architectures have been reported. In real-time applications, the overall time required using these approaches is of the order of hundreds of milliseconds for a typical image of 256 X 256 resolution. Clearly, this speed is not good enough for most image processing and machine vision tasks, where line detection is only part of the work. Since these architectures use commercial components and are not highly parallel, there is ample opportunity to improve the speed by using neural-like analog circuitry. An original method of higher order curve (HOC) detection using the Hough Transform is presented. This method is computationally very efficient and may lend itself to hardware implementation, thus making it possible to use the Hough Transform in fast real-time applications.
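
For orientation, the voting step that makes the Hough Transform robust to gaps and noise can be sketched as follows. This is the textbook (rho, theta) transform for straight lines, not the HOC method of the abstract; the function names, the 1-degree angular resolution and the integer rho bins are assumptions made for illustration.

```python
import math

def hough_lines(points, width, height, n_theta=180):
    """Let every edge point vote for all (rho, theta) lines passing through it."""
    max_rho = int(math.ceil(math.hypot(width, height)))
    # Rows are rho bins shifted by max_rho so that negative rho values fit.
    acc = [[0] * n_theta for _ in range(2 * max_rho + 1)]
    for x, y in points:
        for t in range(n_theta):
            theta = math.pi * t / n_theta
            rho = x * math.cos(theta) + y * math.sin(theta)
            acc[int(round(rho)) + max_rho][t] += 1
    return acc, max_rho

def strongest_line(acc, max_rho):
    """Return (votes, rho, theta) of the accumulator cell with the most votes."""
    n_theta = len(acc[0])
    votes, rho, t = max(
        (acc[r][t], r - max_rho, t)
        for r in range(len(acc))
        for t in range(n_theta)
    )
    return votes, rho, math.pi * t / n_theta

# A horizontal line y = 5 with a gap in the middle, plus two stray points.
pts = [(x, 5) for x in range(20) if not 8 <= x <= 11] + [(3, 12), (15, 1)]
acc, max_rho = hough_lines(pts, 20, 20)
votes, rho, theta = strongest_line(acc, max_rho)
```

The gap and the stray points do not prevent the line from collecting the most votes, which is the property the abstract refers to. The nested loop over points and angles is also where the computational cost comes from.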

Artificial neural networks capable of learning and recalling stochastic associations between non-deterministic quantities have received relatively little attention to date. One potential application of such stochastic associative networks is the generation of sensory 'expectations' based on arbitrary subsets of sensor inputs to support anticipatory and investigative behavior in sensor-based robots. Another application of this type of associative memory is the prediction of how a scene will look in one spectral band, including noise, based upon its appearance in several other wavebands. This paper describes a semi-supervised neural network architecture composed of self-organizing maps associated through stochastic inter-layer connections. This 'Stochastic Associative Memory' (SAM) can learn and recall non-deterministic associations between multi-dimensional probability density functions. The stochastic nature of the network also enables it to represent noise distributions that are inherent in any true sensing process. The SAM architecture, training process, and initial application to sensor image prediction are described. Relationships to Fuzzy Associative Memory (FAM) are discussed.

We discuss the use of Radial Basis Functions in neural networks for hand-printed character recognition. The results are expected to apply to other applications of neural networks for classifying input patterns.

In this paper we present a new network architecture based on a Multi-Link Neural Network (MLNN) model. An MLNN has a structure similar to that of a feedforward network except that each connection between a pair of nodes in the hidden layer and the output layer is made of multiple links with possibly different weight values. This results in an aggregation of a combinatorial number of subnets that themselves can be viewed as ordinary feedforward networks. For an MLNN with M hidden nodes, N output nodes, and K links for each connection, its overall connection status is represented as an MN-dimensional region with K^(MN) sampling points in the weight space. Thus the multi-link structure defines a sampling grid over this region. A region-based search algorithm has been developed that, for problems of a complex nature, offers a better chance of locating a global minimum than the traditional backpropagation algorithm.
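
The size of the sampling grid can be checked with a small sketch, reading the count as K raised to the power MN: one of the K links is chosen independently for each of the MN hidden-to-output connections. The names and toy sizes below are hypothetical and not taken from the paper.

```python
from itertools import product

def enumerate_subnets(link_weights):
    """link_weights[c] lists the K candidate weights of connection c.

    Each element produced is one subnet: a tuple holding one chosen
    weight per connection.
    """
    return product(*link_weights)

M, N, K = 2, 1, 3                      # toy sizes, assumed for illustration
links = [[-1.0, 0.0, 1.0] for _ in range(M * N)]
subnets = list(enumerate_subnets(links))
```

With these toy sizes the list holds 3^2 = 9 weight vectors, one grid point per subnet in the 2-dimensional hidden-to-output weight space.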

Two new classes of Boltzmann machines will be introduced in this paper. They are called Probabilistic Boltzmann Machine (PBM) and Automata Boltzmann Machine (ABM) by the author. The ABM is further divided into union ABM, intersection ABM, and mixed ABM. The motivation of this development is to enhance the capability of the Boltzmann machine family. These two new classes of machines are derived by combining the automata neural network developed by the author, the stochastic neural network developed by the author, and the Boltzmann machine.

This paper presents a special sequential neural network for temporal pattern recognition using one memory element per state. The network generates a single response representing a sequence of events by utilizing the process of temporal integration. That is, the response is generated in small increments at each time step by summing in time the recognition result of each event. Models for motion detection and speech recognition based on the proposed network were implemented. Simulation results show that the network is tolerant to noise, and can recognize partial sequences.

Optimization of large neural networks is essential in improving the network speed and generalization power, while at the same time reducing the training error and the network complexity. Boltzmann methods have been used as a statistical method for combinatorial optimization and for the design of learning algorithms. In the networks studied here, the Adaptive Resonance Theory (ART) serves as a connection creation operator and the Boltzmann method serves as a competitive connection annihilation operator. By combining these two methods it is possible to generate small networks that have similar testing and training accuracy and good generalization from small training sets. Our findings demonstrate that for a character recognition problem the number of weights in a fully connected network can be reduced by over 80%. We have applied the Boltzmann criteria to differential pruning of the connections, which is based on the weight contents rather than on the number of connections.

Although neural networks are gaining wide acceptance as a vehicle for intelligent software, their use will be limited unless good procedures for their evaluation, including verification and validation, can be developed. Since neural networks are created using a different methodology from conventional software, extensive changes in evaluation concepts and techniques are necessary. This paper proposes some clarifying concepts related to evaluation, and a process for developing neural networks in which the role of evaluation is emphasized. Some ideas about how the various evaluation activities may be performed are also described.

This report describes a GA (Genetic Algorithm) method that evolves multi-layered feedforward neural network architectures for specific mappings. The network is represented as a genotype that has six kinds of genes: a learning rate, a slope of the sigmoid function, a coefficient of the momentum term, an initial weight range, the number of layers, and the number of units in each layer. Genetic operators act on populations of these genotypes to produce adaptive networks with higher fitness values. We define three kinds of fitness functions that evaluate networks generated by the GA method. The fitness of each generated network is assessed, after training with the BP (Back Propagation) algorithm, from several measures of network performance. In our experiments, we train the networks for the XOR mapping. They are designed systematically and easily using the GA method. The generated networks require fewer training cycles than the networks used until now, and the rate of convergence is improved.
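
A genotype with the six kinds of genes listed above might be represented as in the following sketch. The field names, the value ranges and the two operators are assumptions made for illustration; the report's own encoding, operators and fitness functions are not given in the abstract.

```python
import random
from dataclasses import dataclass, replace

@dataclass(frozen=True)
class Genotype:
    learning_rate: float
    sigmoid_slope: float
    momentum: float
    init_weight_range: float
    n_layers: int               # number of hidden layers (assumed meaning)
    units_per_layer: tuple      # one entry per hidden layer

def random_genotype(rng):
    """Draw every gene from a hypothetical range."""
    n_layers = rng.randint(1, 3)
    return Genotype(
        learning_rate=rng.uniform(0.01, 1.0),
        sigmoid_slope=rng.uniform(0.5, 2.0),
        momentum=rng.uniform(0.0, 0.9),
        init_weight_range=rng.uniform(0.1, 1.0),
        n_layers=n_layers,
        units_per_layer=tuple(rng.randint(1, 8) for _ in range(n_layers)),
    )

def crossover(a, b, rng):
    """Uniform crossover; the two layer genes are inherited as one unit."""
    pick = lambda x, y: x if rng.random() < 0.5 else y
    layers_from = pick(a, b)
    return Genotype(
        learning_rate=pick(a.learning_rate, b.learning_rate),
        sigmoid_slope=pick(a.sigmoid_slope, b.sigmoid_slope),
        momentum=pick(a.momentum, b.momentum),
        init_weight_range=pick(a.init_weight_range, b.init_weight_range),
        n_layers=layers_from.n_layers,
        units_per_layer=layers_from.units_per_layer,
    )

def mutate(g, rng):
    """Re-sample one randomly chosen gene (layer genes change together)."""
    fresh = random_genotype(rng)
    gene = rng.choice(["learning_rate", "sigmoid_slope", "momentum",
                       "init_weight_range", "layers"])
    if gene == "layers":
        return replace(g, n_layers=fresh.n_layers,
                       units_per_layer=fresh.units_per_layer)
    return replace(g, **{gene: getattr(fresh, gene)})

rng = random.Random(0)
parent_a, parent_b = random_genotype(rng), random_genotype(rng)
child = mutate(crossover(parent_a, parent_b, rng), rng)
```

Keeping the layer count and the per-layer unit counts together avoids producing a genotype whose two structural genes disagree. In a full GA loop each genotype would be decoded into a network, trained with backpropagation on the XOR mapping, and scored by a fitness function.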

Hybrid Location-Content Addressable Memory (HyLCAM), which has recently been proposed by the present authors, is a new class of neural networks that guarantees learning with a fast learning speed. In this paper, we present the sensitivity analysis of the HyLCAM to weight errors and input errors. The weight sensitivity specifies implementation requirements of the network; the input sensitivity characterizes the rejection capability of the HyLCAM. The rejection capability for unknown patterns is one of the most distinctive features of the HyLCAM, and thus it is important to study the probability of rejection. For illustration, we solve a simple character recognition problem and examine the performance of the HyLCAM in terms of learning and sensitivity.

The 'rules' by which information is temporally encoded within biological and artificial neural networks remain poorly understood. To better understand the 'problem' we tested the temporal encoding and extrapolation (generalization) of artificial neural networks on the spiking patterns of 55 simultaneously recorded neurons. The results indicated that to optimize performance a 4-layer network works best for both a feedforward and a recurrent architecture. Further, the results indicate that both hidden layers should have roughly 2n units and that the learning rate between the input and first hidden layer should be roughly an order of magnitude larger than the learning rates for subsequent layers. Interestingly, the number of hidden units in the final network is consistent with the known histology and function of the neural tissue from which the recordings were obtained. The results suggest that the fan-out architectures typically found in this neural tissue may be crucial to the ability of this organism to function well under novel conditions. These results also indicate that temporal data may be encoded best in a highly distributed fashion across large numbers of hidden units, where the weighted interconnections are largest (more deterministic) in the early layers and smallest (more statistical) in the final layers of the network. Further, the same architectural guidelines developed herein are shown to apply directly to other temporal problems where the information is encoded in a spatially distributed fashion.

As we reported previously, a deterministic hard-limited perceptron is a novel learning system in which the learning mechanism is different from that of most conventional learning systems. It is a noniterative, one-step learning system and yet it achieves most of the goals of conventional learning. If an optimum algorithm is adopted in the design of this learning system, the learning is very fast and the recognition is very robust. This article reports the experimental results of several computer-implemented schemes of this novel learning system. It is seen that the system takes only one to two minutes to train several given patterns and the recognition of the untrained patterns is more than 80% successful. In one scheme, the recognition rate is almost 100% and the recognition is independent of the pattern size, pattern orientation, and pattern location.

Backpropagation is a supervised learning algorithm for training multi-layer neural networks for function approximation and pattern classification by minimizing a suitably defined error metric (e.g., the mean square error between the desired and actual outputs of the network for a training set) using gradient descent. It does this by calculating the partial derivative of the overall error with respect to each weight and changing each weight by a small amount (determined by the learning rate) in a direction that is expected to reduce the error. Despite its success on a number of real-world problems, backpropagation can be very slow: it requires hundreds of passes (epochs) through the training set. Also, its performance is extremely sensitive to the choice of parameters such as the learning rate. The mathematical considerations that go into the derivation of backpropagation require that the learning rate be as small as possible. On the other hand, in order to reduce the number of training epochs required to learn the desired input-output mapping, it is desirable to make the weight changes as large as possible without causing the error to increase. It is also desirable to have an algorithm that can change the learning rate dynamically so that it is close to optimal (in terms of reducing the error as much as possible given the local gradient information) at each epoch, thereby eliminating the reliance on guesswork in the choice of the learning rate. Several authors have proposed methods for adaptively adjusting the learning rate based on certain assumptions about the shape of the error surface (e.g., quadratic) and/or estimation of higher order derivatives of the error with respect to the weights at a given point in the weight space. The primary disadvantage of such methods is that the estimation of second order derivatives is not only computationally expensive but also prone to inaccuracy due to the approximation of derivatives by discrete differences.
In this paper we propose and evaluate a heuristically motivated method for adaptive modification of the learning rate in backpropagation that does not require the estimation of higher order derivatives. We present a modified version of the backpropagation learning algorithm which uses a simple heuristic to arrive at a learning parameter value at each epoch. We present numerous simulations on real-world data sets using our modified algorithm. We compare these results with those obtained with the standard backpropagation learning algorithm, and also with various modifications of the standard backpropagation algorithm (e.g., flat-spot elimination methods) that have been discussed in the literature. Our simulation results suggest that the adaptive learning rate modification substantially speeds up the convergence of the backpropagation algorithm. Furthermore, it makes the initial choice of the learning rate fairly unimportant, as our method allows the learning rate to change and settle at a reasonable value for the specific problem. As a standard against which to compare our results, we computed the quasi-optimal value of the learning parameter at each epoch. Simulation results indicate that our heuristic modification matches the performance of backpropagation with the quasi-optimal learning rate. The computational complexity of this algorithm is analysed and compared with that of standard backpropagation.
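
The abstract does not state the heuristic itself. As a stand-in, the sketch below uses a well-known first-order rule of the same flavour: grow the learning rate after an epoch that reduced the error, and reject the step and shrink the rate otherwise. The growth and shrink factors, the function names and the toy quadratic error surface are all assumptions.

```python
def adaptive_gradient_descent(grad, loss, w, lr=0.01, up=1.1, down=0.5,
                              epochs=500):
    """Gradient descent whose learning rate is adapted once per epoch.

    No second derivatives are estimated; only the change in error decides
    whether the rate grows (step accepted) or shrinks (step rejected).
    """
    err = loss(w)
    for _ in range(epochs):
        g = grad(w)
        trial = [wi - lr * gi for wi, gi in zip(w, g)]
        trial_err = loss(trial)
        if trial_err < err:          # error went down: keep step, be bolder
            w, err = trial, trial_err
            lr *= up
        else:                        # error did not go down: undo, be careful
            lr *= down
    return w, err, lr

# Toy error surface with very different curvature along the two axes.
loss = lambda w: w[0] ** 2 + 10.0 * w[1] ** 2
grad = lambda w: [2.0 * w[0], 20.0 * w[1]]

results = [adaptive_gradient_descent(grad, loss, [3.0, -2.0], lr=lr0)
           for lr0 in (1e-3, 5.0)]
```

Starting from a rate that is far too small or far too large ends at a similarly small error in this toy case, which illustrates the claim that the initial choice of learning rate becomes fairly unimportant once the rate is adapted.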

Usually, to discriminate among particle tracks in high energy physics a set of discriminating parameters is used. To cope with the different particle behaviors these parameters are connected by the human observer with Boolean operators. We successfully tested an automatic method for particle recognition using a stochastic method to pre-process the input to a back propagation algorithm. The test was made using raw experimental data of electrons and negative pions taken at CERN laboratories (Geneva). From the theoretical standpoint, the stochastic pre-processing of a back propagation algorithm can be interpreted as finding the optimal fuzzy membership function notwithstanding highly fluctuating (noisy) input data.

Fractal image compression is achieved by Fractal transformation of images. To complete a Fractal transformation, one has to solve a Fractal equation. In this paper, we will generalize the Fractal theory. We will first present Fractal equations and algorithms for solving Fractal equations. Then we will present several generalizations of Fractal theory.

The Backpropagation technique for supervised learning of internal representations in multi-layer artificial neural networks is an effective approach for solution of the gradient descent problem. However, as a primarily deterministic solution, it will attempt to take the best path to the nearest minimum, whether global or local. If a local minimum is reached, the network will fail to learn or will learn a poor approximation of the solution. This paper describes a novel approach to the Backpropagation model based on Simulated Annealing. This modified learning model is designed to provide an effective means of escape from local minima. The system is shown to converge more reliably and much faster than traditional noise insertion techniques. Due to the characteristics of the cooling schedule, the system also demonstrates a more consistent training profile.
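
How the paper couples annealing to backpropagation is not described in the abstract. The two generic ingredients of Simulated Annealing that it relies on, an acceptance rule for error-increasing moves and a cooling schedule, can be sketched as follows. The Metropolis rule and the geometric schedule shown here are the textbook choices and are assumed, not taken from the paper.

```python
import math
import random

def acceptance_probability(delta_error, temperature):
    """Metropolis rule: always accept improvements; accept a worsening of
    size delta_error with probability exp(-delta_error / temperature)."""
    if delta_error <= 0:
        return 1.0
    return math.exp(-delta_error / temperature)

def accept(delta_error, temperature, rng):
    """Random accept/reject decision for one proposed weight change."""
    return rng.random() < acceptance_probability(delta_error, temperature)

def geometric_cooling(t0, alpha, steps):
    """Temperatures t0, t0*alpha, t0*alpha^2, ... with 0 < alpha < 1."""
    return [t0 * alpha ** k for k in range(steps)]

schedule = geometric_cooling(1.0, 0.5, 3)
```

At high temperature uphill moves are accepted often, which is what allows escape from a local minimum. As the schedule cools, the rule approaches plain descent.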

Back-propagation is a very popular training algorithm for neural nets. One of the problems with this learning algorithm is its training speed. The selection of a good learning rate is a very important factor in achieving a satisfactory learning time. However, it is very difficult to determine an optimal learning rate, since this parameter depends on many variables such as the size of the network, the number of examples in the training sets... A new method is proposed to compute a near-optimal learning rate for a three-layer (one hidden layer) back-propagation network.

This paper presents function approximation based on nonparametric estimation. As an estimation model of function approximation, a three layered network composed of input, hidden and output layers is considered. The input and output layers have linear activation units while the hidden layer has nonlinear activation units or kernel functions which have the characteristics of bounds and locality. Using this type of network, a many-to-one function is synthesized over the domain of the input space by a number of kernel functions. In this network, we have to estimate the necessary number of kernel functions as well as the parameters associated with kernel functions. For this purpose, a new method of parameter estimation in which linear learning rule is applied between hidden and output layers while nonlinear (piecewise-linear) learning rule is applied between input and hidden layers, is considered. The linear learning rule updates the output weights between hidden and output layers based on the Linear Minimization of Mean Square Error (LMMSE) sense in the space of kernel functions while the nonlinear learning rule updates the parameters of kernel functions based on the gradient of mean square error with respect to the parameters (especially, the shape) of kernel functions. This approach of parameter adaptation provides near optimal values of the parameters associated with kernel functions in the sense of minimizing mean square error. As a result, the suggested nonparametric estimation provides an efficient way of function approximation from the view point of the number of kernel functions as well as learning speed.
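
The linear half of this scheme can be sketched in a few lines: with the kernels held fixed, the output weights are adjusted by gradient descent on the mean square error. Gaussian kernels, a shared width, batch updates and the toy target are assumptions; the adaptation of kernel shape and the estimation of the number of kernels described in the abstract are not shown.

```python
import math

def gaussian_kernel(x, center, width):
    """Bounded, local hidden-unit response."""
    return math.exp(-((x - center) ** 2) / (2.0 * width ** 2))

def predict(x, centers, width, weights):
    """Linear output unit on top of the kernel layer."""
    return sum(w * gaussian_kernel(x, c, width)
               for w, c in zip(weights, centers))

def mse(xs, ys, centers, width, weights):
    return sum((predict(x, centers, width, weights) - y) ** 2
               for x, y in zip(xs, ys)) / len(xs)

def train_output_weights(xs, ys, centers, width, lr=0.1, epochs=200):
    """Batch gradient descent on the MSE with respect to output weights only."""
    weights = [0.0] * len(centers)
    for _ in range(epochs):
        grads = [0.0] * len(centers)
        for x, y in zip(xs, ys):
            err = predict(x, centers, width, weights) - y
            for k, c in enumerate(centers):
                grads[k] += 2.0 * err * gaussian_kernel(x, c, width) / len(xs)
        weights = [w - lr * g for w, g in zip(weights, grads)]
    return weights

xs = [i / 10 for i in range(11)]
ys = [math.sin(2 * math.pi * x) for x in xs]
centers = [0.0, 0.25, 0.5, 0.75, 1.0]
before = mse(xs, ys, centers, 0.2, [0.0] * len(centers))
after = mse(xs, ys, centers, 0.2, train_output_weights(xs, ys, centers, 0.2))
```

Because the output is linear in the weights, this sub-problem is a convex least-squares fit, which is why a linear rule suffices between the hidden and output layers.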

The performance of Back Propagation methods strongly depends on the following two choices: (1) use of an off-line or on-line algorithm; (2) level of redundancy of the training set of data. Past investigations studied, respectively, off-line algorithms with a low degree of redundancy and on-line algorithms with a high degree of redundancy. In this paper we complete the framework by considering on-line algorithms with a low level of information and off-line algorithms using training sets with 'redundancy of target data'.

Based on a mathematical interpretation of fundamental principles of human visual perception of a contour as a general geometrical figure, a procedure is proposed for class-prototype reconstruction from a limited number of contour samples belonging to the class. The procedure comprises ordering the given contour samples into a sequence according to the criterion and value of their mutual distortion, defining an adequate mathematical characteristic of the distortion trend within the sequence, and then reconstructing an approximate class-prototype shape, which is refined afterward by taking into account the assumed property of prototype symmetry. Finally, the global structure of an artificial neural system that could carry out the described procedure is presented.

The paper expands the available theoretical framework that establishes a link between an adaptive feedforward layered linear-output network used as a mean-square classifier and discriminant analysis. We prove that, under reasonable assumptions, minimizing the mean-square error at the network output is equivalent to minimizing the following: (1) the difference between the optimum value of a familiar discriminant criterion and the value of this criterion evaluated in the space spanned by the outputs of the final hidden layer, and (2) the difference between the values of the same discriminant criterion evaluated in desired-output and actual-output subspaces. We also illustrate, under specific constraints, how to solve the following problem: given a feature extraction criterion, how the target coding scheme can be selected such that this criterion is maximized at the output of the network's final hidden layer.

An important consideration in designing an RBF network is the choice of the number of hidden units required for the network to generalize optimally. A new method, which is called canonical subspace analysis, is proposed for the selection of the number of hidden units. The numerical results show that with the number of the hidden units determined using the proposed method, minimum prediction errors are obtained.

This work investigates the application of evolutionary programming, a multi-agent stochastic search technique, to the generation of recurrent perceptrons (nonlinear IIR filters) for time-series prediction tasks. The evolutionary programming paradigm is discussed and analogies are made to classical stochastic optimization methods. A hybrid optimization scheme is proposed based on multi-agent and single-agent random optimization techniques. This method is then used to determine both the model order and weight coefficients of linear, nonlinear, and parallel linear-nonlinear next-step predictors. The AIC is used as the cost function to score each candidate solution.
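
For reference, a commonly used least-squares form of the AIC (Akaike Information Criterion) is sketched below. It assumes Gaussian residuals and drops additive constants; whether the paper uses exactly this form is not stated in the abstract, and the function name is hypothetical.

```python
import math

def aic_least_squares(rss, n_samples, n_params):
    """AIC = n * ln(RSS / n) + 2k for a model fitted by least squares.

    rss       residual sum of squares of the predictor on the data
    n_samples number of predicted samples (n)
    n_params  number of free coefficients, i.e. the model order term (k)
    """
    return n_samples * math.log(rss / n_samples) + 2 * n_params

small_model = aic_least_squares(10.0, 100, 3)
large_model = aic_least_squares(10.0, 100, 5)
```

Lower is better: two candidates with the same residual error are separated by the 2k penalty, which is how a single score can select both the model order and the weights.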

It is commonly accepted that the modification of the weights during training of an Artificial Neural Network can be augmented by addition of a random element chosen from various distributions. This technique, referred to as Noise Injection, allows the training process to stochastically traverse a larger subset of the sample space, as well as escape from local minima. This paper examines the effect of noise injection on the training cycle of feedforward neural networks. Emphasis is placed on the gradient descent weight modification technique of the backpropagation model. Statistical examination is made of the distribution of the effect within the topology of the weight space, upon the inputs to individual units, upon training time, and on the total error of the network. Since the weights of the network can be considered together as an n-tuple, injection of noise can be statistically examined within that n-dimensional space. It is shown that, for stochastically independent random distributions, the effect on this weight space and on the inputs to individual units is dependent upon the number of weights in the network. The multivariate distribution of the vector modification during training becomes increasingly distorted as the network size increases, such that noise injection has a more significant, and less stable, effect. Problems with traditional approaches are examined and an alternative noise injection method based on network size is presented.
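
The dependence on network size can be seen from a standard fact: if each of n weights receives independent zero-mean noise with standard deviation sigma, the root-mean-square length of the whole noise vector is sqrt(n) times sigma. The sketch below computes that quantity and one possible size-based correction. The correction is an assumption made for illustration and is not claimed to be the paper's method.

```python
import math

def rms_noise_norm(n_weights, sigma):
    """RMS Euclidean length of n i.i.d. zero-mean noise terms with std sigma."""
    return math.sqrt(n_weights) * sigma

def size_scaled_sigma(target_norm, n_weights):
    """Per-weight std that keeps the RMS length of the noise vector fixed."""
    return target_norm / math.sqrt(n_weights)

small_net = rms_noise_norm(100, 0.1)
large_net = rms_noise_norm(10_000, 0.1)
```

With the same per-weight sigma, the larger network is displaced ten times further in weight space at each injection. Dividing sigma by sqrt(n) removes that growth.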

A significant problem in the design and construction of an artificial neural network for function approximation is limiting the magnitude and variance of errors when the network is used in the field. Network errors can occur when the training data does not faithfully represent the required function due to noise or low sampling rates, when the network's flexibility does not match the variability of the data, or when the input data to the resultant network is noisy. This paper reports on several experiments whose purpose was to rank the relative significance of these error sources and thereby find neural network design principles for limiting the magnitude and variance of network errors.

In this paper, it is shown that supervised learning can be posed as an optimization problem in which inequality constraints are used to encode the information contained in the training patterns and to specify the degree of accuracy expected from the neural network. Starting from this point, a technique for evaluating the learning capability and optimizing the feature space of a class of higher-order neural networks is developed. The technique gives significant insight into the problem of task learning. It permits establishing whether the structure of the network can effectively learn the training patterns. Should the structure not be appropriate for learning, it indicates which patterns form the minimum set of patterns which cannot be learned with the desired accuracy. Otherwise, it provides a connectivity which produces satisfactory network performance. Furthermore, it identifies those features which can be suppressed from the definition of the feature space without deteriorating network performance. Several examples are presented and results are discussed.

The experience gained in many experiments with neural networks has shown that many challenging problems are still hard to solve, since the learning process becomes very slow, often leading to sub-optimal solutions. In this paper we analyze this problem for the case of two-layered networks by discussing the joint behavior of algorithm convergence and generalization to new data. We suggest two scores for generalization and optimal convergence that behave like conjugate variables in Quantum Mechanics. As a result, the requirement of increasing the generalization is likely to affect the optimal convergence. This suggests that 'difficult' problems are better faced with biased models, somewhat tuned to the task to be solved.

Generalized Deterministic Annealing (GDA) is a useful new tool for computing fast multi-state combinatorial optimization of difficult non-convex problems. By estimating the stationary distribution of simulated annealing (SA), GDA yields equivalent solutions to practical SA algorithms while providing a significant speed improvement. Using the standard GDA, the computational time of SA may be reduced by an order of magnitude, and, with a new implementation improvement, Windowed GDA, the time improvements reach two orders of magnitude with a trivial compromise in solution quality. The fast optimization of GDA has enabled expeditious computation of complex nonlinear image enhancement paradigms, such as the Piecewise Constant (PICO) regression examples used in this paper. To validate our analytical results, we apply GDA to the PICO regression problem and compare the results to other optimization methods. Several full image examples are provided that show successful PICO image enhancement using GDA in the presence of both Laplacian and Gaussian additive noise.

Adaptive infinite impulse response filters require the exploration of a multimodal error surface. We describe the use of genetic algorithms for this particular problem and demonstrate that our approach is capable of discovering the global minimum of the performance surface of a multimodal adaptive filter example in a reasonable length of time compared with other methods. We also compare the performance of the genetic algorithm with the recently proposed method of very fast simulated reannealing.

Based on optimization with a neural net, a new algorithm for construction of an interconnection weight matrix (IWM) of binary Hopfield-type dynamic associative memories is proposed. Computer simulation shows that it has higher storage capacity and better error-correction ability than the Hopfield model.
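
For comparison, the baseline Hopfield construction that the proposed IWM is measured against can be sketched as follows: the weight matrix is the sum of outer products of the stored bipolar patterns with a zero diagonal, and recall repeatedly thresholds the weighted sums. This is the standard model only; the new construction algorithm is not described in the abstract. Synchronous updating and the toy patterns are assumptions.

```python
def hebbian_weights(patterns):
    """Outer-product (Hebbian) weight matrix with zero diagonal."""
    n = len(patterns[0])
    w = [[0] * n for _ in range(n)]
    for p in patterns:
        for i in range(n):
            for j in range(n):
                if i != j:
                    w[i][j] += p[i] * p[j]
    return w

def recall(w, state, max_iters=10):
    """Synchronous sign updates until the state stops changing."""
    state = list(state)
    for _ in range(max_iters):
        new_state = []
        for i, row in enumerate(w):
            h = sum(wij * sj for wij, sj in zip(row, state))
            # A zero net input leaves the unit unchanged.
            new_state.append(state[i] if h == 0 else (1 if h > 0 else -1))
        if new_state == state:
            break
        state = new_state
    return state

p1 = [1, 1, 1, 1, -1, -1, -1, -1]
p2 = [1, -1, 1, -1, 1, -1, 1, -1]
weights = hebbian_weights([p1, p2])
probe = [-1] + p1[1:]                 # p1 with its first element flipped
restored = recall(weights, probe)
```

Here the single flipped element is corrected in one update. Storage capacity and error-correction ability, the two quantities compared in the abstract, describe how many patterns and how many flipped elements such a memory tolerates.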

Kohonen has proposed a physiologically plausible method of cooperative and competitive organization for artificial neural networks that allows them to self-organize around a set of input vectors. The now famous approach of Kohonen's Self-Organizing Topological Feature Maps has been applied extensively to pattern classification problems. For many problems, however, it is not enough to say which class the input falls into, but rather what (real-valued) output is appropriate for the class to which the input vector belongs. Several methods have been proposed for extending Kohonen Networks so that they may learn appropriate output responses. Unfortunately, these supervised methods require having a "teacher" that knows the correct output responses. While in all problems the input vectors are "correct" (that is why the network is to learn them), in many problems the correct output responses are not available (which, contrapositively, may be the reason we wish to train the network). To partially fill this gap I propose a Self-Organizing Neural Network using Eligibility Traces (SONNET). SONNET is appropriate for those problems in which the correct output responses are not known, but a feedback mechanism that allows for an overall evaluation of system performance (success and/or failure signals) is available and for which system performance is temporally based on network responses. Such is the case with controllers for many physical systems (including some robotics applications) as well as chemical and biological systems and is even the case for some object recognition and other computer vision problems. SONNET works by combining the self-organizing capabilities of Kohonen Networks with the temporal sensitivity of eligibility traces. The concept of the eligibility trace comes from observations of human and animal brains.
It has been noted that many neurons become more amenable to change when they fire. This plasticity reduces with time, but produces a trace of eligibility for adaptation. Using these traces, SONNET adapts the output responses to a greater or lesser degree depending on their eligibility for adaptation at the time when the failure and/or success signals are received. The use of SONNET is demonstrated on a simulation of the well-known (toy) physical system control problem known as the pole-balancing problem. Comparisons are made between SONNET and other neural network and nonconnectionist control-learning systems. SONNET is seen to be powerful and adaptable. It is capable of learning both a useful partitioning of the input space and, without supervision, an appropriate output response for each class.
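
The eligibility-trace mechanism described above can be sketched generically: a unit's trace is set when it fires and decays afterwards, and a later success or failure signal changes each unit's output in proportion to its current trace. The decay constant, learning rate, update form and function names are assumptions; this is not the SONNET algorithm itself.

```python
def step_traces(traces, fired, decay=0.9):
    """Decay every trace, then reset the trace of each unit that just fired."""
    return [1.0 if i in fired else decay * e for i, e in enumerate(traces)]

def apply_reinforcement(outputs, traces, signal, lr=0.5):
    """Adjust each output response in proportion to its eligibility.

    signal is positive for a success signal and negative for a failure signal.
    """
    return [o + lr * signal * e for o, e in zip(outputs, traces)]

traces = [0.0, 0.0, 0.0]
traces = step_traces(traces, {0})      # unit 0 wins at time step 1
traces = step_traces(traces, {1})      # unit 1 wins at time step 2
traces = step_traces(traces, set())    # nothing fires at time step 3
outputs = apply_reinforcement([0.0, 0.0, 0.0], traces, signal=-1.0)
```

When the failure signal arrives, the unit that fired most recently is changed most, the earlier one less, and the unit that never fired not at all. This is how credit is assigned over time without a teacher supplying the correct outputs.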

ARMA (autoregressive-moving average) time series methods have been found to be effective methods of forecasting and prediction. Using AR (autoregression) methods, predictions rely on regressing previous time series input values, while in MA (moving average) methods, predictions are calculated by regressing previous forecasting errors. We can improve ARMA-type forecasts with backpropagation by nonlinear regression of both the inputs and the previous forecasting errors. The new predictions are calculated by adding a feedforward neural network that accepts the previous forecast and previously generated forecast errors as inputs and produces new forecasts having smaller prediction errors. The accuracy of these forecasts can exceed that of ARMA or backpropagation forecasts alone. The improved predictions of AR and backpropagation network forecasts are shown using the Mackey-Glass chaotic time series.
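
The roles of the AR and MA terms can be made concrete with a first-order sketch: each one-step forecast combines the previous observation (AR part) with the previous forecast error (MA part). The zero-mean ARMA(1,1) form, the fixed coefficients and the function name are assumptions; the neural-network correction stage described in the abstract is not shown.

```python
def arma11_forecasts(series, phi, theta):
    """One-step-ahead forecasts y_hat[t] = phi * y[t-1] + theta * e[t-1],
    where e[t] = y[t] - y_hat[t] and the initial error is taken as 0."""
    forecasts, errors = [], []
    prev_error = 0.0
    for t in range(1, len(series)):
        y_hat = phi * series[t - 1] + theta * prev_error
        prev_error = series[t] - y_hat
        forecasts.append(y_hat)
        errors.append(prev_error)
    return forecasts, errors

forecasts, errors = arma11_forecasts([1.0, 2.0, 3.0], phi=0.5, theta=0.5)
```

The `forecasts` and `errors` sequences are the kind of quantities the abstract feeds to the feedforward network, which replaces this fixed linear combination with a learned nonlinear one.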

The Auto-Associative Recurrent Network (AARN), a modified version of the Simple Recurrent Network (SRN), can be trained to behave as a recognizer of a language generated by a regular grammar. The network is trained successfully on an unbounded number of sequences of the language, generated randomly from the Finite State Automaton (FSA) of the language. But the training algorithm fails when training is restricted to a fixed finite set of examples. Here, we present a new algorithm for training the AARN from a finite set of language examples. A tree is constructed by preprocessing the training data. The AARN is trained with sequences generated randomly from the tree. The results of the simulation experiments are discussed.
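
One plausible reading of the preprocessing step is a prefix tree built from the finite example set, from which training sequences are then drawn by random walks. The abstract does not say what kind of tree is used, so the prefix tree, the end marker and the uniform choice among branches are all assumptions.

```python
import random

END = "$"   # hypothetical end-of-sequence marker; must not occur in the data

def build_prefix_tree(sequences):
    """Nested dicts: each key is a symbol, END marks a complete example."""
    root = {}
    for seq in sequences:
        node = root
        for symbol in seq:
            node = node.setdefault(symbol, {})
        node[END] = {}
    return root

def sample_sequence(tree, rng):
    """Random walk from the root, choosing uniformly among the branches
    until an END marker is chosen."""
    node, out = tree, []
    while True:
        symbol = rng.choice(sorted(node))
        if symbol == END:
            return "".join(out)
        out.append(symbol)
        node = node[symbol]

examples = ["ab", "abb", "ba"]
tree = build_prefix_tree(examples)
rng = random.Random(0)
samples = [sample_sequence(tree, rng) for _ in range(50)]
```

Every sampled string is one of the original examples, but the branching structure exposes shared prefixes at each step, much as an FSA would when sequences are generated from it.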

Organizational principles of software for simulating arbitrary neural networks are formulated, and a practical software system realizing these principles is described. The software automates the synthesis of the simulated network's structure as far as possible while preserving the flexibility needed to describe arbitrarily organized neurocomputers. Extensive facilities are available for controlling simulation modes and for displaying the states of the simulated neural networks.

User recognition is examined using neural and conventional techniques for processing speech and face images. This article attempts, for the first time, to overcome the significant problem of distortions inherently captured over multiple sessions (days). Speaker recognition uses both Linear Predictive Coding (LPC) cepstral and auditory neural model representations with speaker-dependent codebook designs. For facial imagery, recognition is based on a single-hidden-layer multilayer perceptron trained by backpropagation, using as inputs either the raw data or principal components of the raw data computed with the Karhunen-Loeve Transform. The data consists of 10 subjects; each subject recorded utterances and had images collected for 10 days. The utterances collected were 400 phonetically rich sentences (4 sec), 200 subject name recordings (3 sec), and 100 imposter name recordings (3 sec). Face data consists of over 2000 32 X 32 pixel, 8-bit gray-scale images of the 10 subjects. Each subsystem individually attains over 90% verification accuracy using test data gathered on the day following the training data.

Linear and nonlinear adaptive filtering algorithms are described, along with applications to signal processing and control problems such as prediction, modeling, inverse modeling, equalization, echo cancelling, noise cancelling, and inverse control.
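
The abstract does not name a specific algorithm. As one common example of a linear adaptive filter, the sketch below uses the least-mean-squares (LMS) update in the modeling (system identification) configuration; the tap count, step size, and function name are assumptions made for illustration.

```python
import random


def lms_filter(x, d, taps=4, mu=0.05):
    """Adapt FIR weights w so that the filter output tracks the desired signal d.

    x: input samples, d: desired samples (same length), mu: step size.
    Returns the final weights, the filter outputs, and the error signal.
    """
    w = [0.0] * taps
    outputs, errors = [], []
    for n in range(len(x)):
        # Most recent `taps` inputs, newest first, zero-padded at the start.
        window = [x[n - k] if n - k >= 0 else 0.0 for k in range(taps)]
        y = sum(wk * xk for wk, xk in zip(w, window))
        e = d[n] - y
        w = [wk + mu * e * xk for wk, xk in zip(w, window)]  # LMS weight update
        outputs.append(y)
        errors.append(e)
    return w, outputs, errors


if __name__ == "__main__":
    rng = random.Random(0)
    x = [rng.uniform(-1.0, 1.0) for _ in range(2000)]
    # Unknown system to be modeled: a two-tap FIR filter (assumed for the demo).
    d = [0.5 * x[n] - 0.3 * (x[n - 1] if n > 0 else 0.0) for n in range(len(x))]
    w, _, _ = lms_filter(x, d, taps=2, mu=0.1)
    print(w)
```

The same loop covers the other listed configurations by changing what is supplied as `x` and `d`: for noise cancelling, for instance, `x` would be a noise reference and the error signal would be the cleaned output.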

In the last decade, much effort has been directed towards understanding the role of chaos in the brain. Work with rabbits reveals that in the resting state the electrical activity on the surface of the olfactory bulb is chaotic. But, when the animal is involved in a recognition task, the activity shifts to a specific pattern corresponding to the odor that is being recognized. Unstable, quasiperiodic behavior can be found in a class of conservative, deterministic physical systems called Hamiltonian systems. In this paper, we formulate a complex version of Hopfield's network of real parameters and show that a variation on this model is a conservative system. Conditions under which the complex network can be used as a Content Addressable Memory are studied. We also examine the effect of singularities of the complex sigmoid function on the network dynamics. The network exhibits unpredictable behavior at the singularities due to the failure of a uniqueness condition for the solution of the dynamic equations. On incorporating a weight adaptation rule, the structure of the resulting complex network equations is shown to have an interesting similarity with Kosko's Adaptive Bidirectional Associative Memory.

In this paper the application and performance of Artificial Neural Networks (ANN) on the problem of sensor data fusion are reported for an experimental system, Tracker. The task of sensor data fusion involves integrating numerous data streams, originating from disparate sensors, into a consistent model that represents the pertinent higher level features of the environment as well as presenting an assessment of their significance. In the case of the modern naval environment, the problem central to many tactical data fusion systems is the need for rapid acquisition and interpretation of the information. In a potentially hostile situation the time available to perform such an assessment is severely limited and a rapid and accurate response is vital. This paper describes the application of ANN to tactical sensor data fusion and the automated processing of the radar behaviors for various vehicle types. In particular the tasks of target and behavioral identification for both automated surveillance and support tasks are highlighted as important in the modern naval environment. The experimental research program divided the analysis of the radar tracks into three distinct categories. These were (1) target identification, (2) behavioral analysis (target task identification), and (3) threat assessment.

In this paper we discuss the means by which recurrent connections are used in neural control system architectures. We first consider the state feedback approach to control and the role of recurrent neural networks for plant modeling and control. In this context, we provide an explicit formulation for the computation of dynamic derivatives in recurrent neural network architectures as required for training by the dynamic gradient method. For illustration, we apply dynamic gradient methods to train recurrent neural network controllers for a series of cart-pole problems with the simultaneous objectives of pole balancing and cart centering.

We describe a new theory of differential learning by which a broad family of pattern classifiers (including many well-known neural network paradigms) can learn stochastic concepts efficiently. We describe the relationship between a classifier's ability to generalize well to unseen test examples and the efficiency of the strategy by which it learns. We list a series of proofs that differential learning is efficient in its information and computational resource requirements, whereas traditional probabilistic learning strategies are not. The proofs are illustrated by a simple example that lends itself to closed-form analysis. We conclude with an optical character recognition task for which three different types of differentially generated classifiers generalize significantly better than their probabilistically generated counterparts.

Two types of artificial neural networks are introduced for the robust classification of spatio-temporal sequences. The first network is the Adaptive Spatio-Temporal Recognizer (ASTER), which adaptively estimates the confidence that a (variable length) signal of a known class is present by continuously monitoring a sequence of feature vectors. If the confidence for any class exceeds a threshold value at some moment, the signal is considered to be detected and classified. The nonlinear behavior of ASTER provides more robust performance than the related dynamic time warping algorithm. ASTER is compared with a more common approach wherein a self-organizing feature map is first used to map a sequence of extracted feature vectors onto a lower dimensional trajectory, which is then identified using a variant of the feedforward time delay neural network. The performance of these two networks is compared using artificial sonograms as well as feature vector strings obtained from short-duration oceanic signals.

We investigate the use of a Differential Vector Quantizer (DVQ) architecture for the coding of digital images. An Artificial Neural Network (ANN) is used to develop entropy-biased codebooks which yield substantial data compression while retaining insensitivity to transmission channel errors. Two methods are presented for variable bit-rate coding using the described DVQ algorithm. In the first method, both the encoder and the decoder have multiple codebooks of different sizes. In the second, variable bit-rates are achieved by encoding using subsets of one fixed codebook. We compare the performance of these approaches under conditions of error-free and error-prone channels.
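
The second variable-rate method can be sketched as follows. The abstract does not give the DVQ details, so this example assumes that each block is predicted by the previous reconstructed block, that the residual is quantized to its nearest codeword, and that a rate of `bits` bits per block is obtained by searching only the first `2 ** bits` entries of one fixed codebook. All function names are hypothetical.

```python
def nearest(codebook, vector):
    """Index of the codeword with the smallest squared Euclidean distance."""
    return min(
        range(len(codebook)),
        key=lambda i: sum((c - v) ** 2 for c, v in zip(codebook[i], vector)),
    )


def dvq_encode(vectors, codebook, bits):
    """Encode residuals against the previous reconstruction using a codebook subset."""
    active = codebook[: 2 ** bits]  # assumed subset rule: a prefix of the codebook
    prediction = [0.0] * len(vectors[0])
    indices = []
    for vector in vectors:
        residual = [v - p for v, p in zip(vector, prediction)]
        index = nearest(active, residual)
        indices.append(index)
        # Track the decoder's reconstruction so encoder and decoder stay in step.
        prediction = [p + c for p, c in zip(prediction, active[index])]
    return indices


def dvq_decode(indices, codebook, dim):
    """Rebuild the vectors by accumulating the selected codewords."""
    prediction = [0.0] * dim
    decoded = []
    for index in indices:
        prediction = [p + c for p, c in zip(prediction, codebook[index])]
        decoded.append(prediction)
    return decoded
```

Because the active subset is a prefix of the full codebook, the decoder can use the full codebook unchanged at every rate; only the number of bits spent per index varies.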