A two-dimensional hexagonal cellular automaton using second-order dynamics, in which cell states at times t-1 and t together determine cell states at time t+1, is shown to be an effective motion detector. Simulation results indicate that this cellular automaton is capable of simultaneously tracking multiple objects in the presence of a randomly fluctuating background (clutter). Motion can be detected in fluctuating backgrounds of 30% or more (that is, 30% of the input cells, or focal-plane pixels, are randomly chosen to be active in every frame). The automaton design lends itself well to concurrent focal-plane processing, including optical implementations, and has a variety of applications, including linear motion filtering and feature extraction from time-varying images. It may also serve as a basic model for retinal neuron interactions in biological vision.

This paper is concerned with the development of algorithms for solving optimization problems with a network of artificial neurons. Although there is no notion of step-by-step sequencing in a neural network, it is possible to develop tools and techniques for interconnecting a network of neurons so that it will achieve stable states corresponding to possible solutions of a problem. An approach to implementing an important class of constraints in a network of artificial neurons is presented and illustrated by developing a solution to a resource allocation problem.
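One common way to realize such constraints, sketched below under illustrative assumptions (a small n-by-n assignment problem; unit penalty weights A and B not taken from the paper), is to encode each constraint as a quadratic penalty term in a Hopfield-style energy and let the network settle by asynchronous single-unit updates that never increase the energy:

```python
# Hedged sketch of the penalty-function approach for "assign each task
# exactly one resource, each resource exactly one task".  The weights
# A and B are illustrative choices, not from the paper.

A = B = 1.0  # penalty weights for the row and column constraints

def energy(x):
    n = len(x)
    row = sum((sum(x[i]) - 1) ** 2 for i in range(n))
    col = sum((sum(x[i][j] for i in range(n)) - 1) ** 2 for j in range(n))
    return A * row + B * col

def settle(x):
    """Asynchronous descent: flip any unit whose flip lowers the energy,
    until the state is a local minimum (a stable state of the network)."""
    n = len(x)
    improved = True
    while improved:
        improved = False
        for i in range(n):
            for j in range(n):
                e0 = energy(x)
                x[i][j] ^= 1
                if energy(x) < e0:
                    improved = True      # keep the flip
                else:
                    x[i][j] ^= 1         # undo it
    return x
```

Stable states of this descent are exactly the network's attractors; with well-chosen penalty weights they correspond to feasible allocations.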

Teledyne Brown Engineering has developed techniques for mapping sequential algorithms onto a neural model of a parallel distributed processor. Any sequential algorithm (including NP algorithms) can be mapped onto a neural network. This paper discusses some practical considerations for implementation of sequential to parallel mappings (SPM). It is feasible to generalize these techniques and implement a neural compiler for sequential algorithms. Neural networks (the interconnection matrix) generated by the neural compiler will be implemented in a fixed, holographic, optical computer.

The method presented in this paper outlines a kind of free-space optical network built from a mixture of regular optical interconnection links and dynamic electrical switching control elements. Free-space optics can provide a large number of regular connections, which we exploit together with an electrical switching element. Recent work on optical connections provides design techniques for using regular free-space interconnects within processing nodes and between nodes of a hypercube computer. This electro-optical adaptive switching network takes advantage of the best features of two technologies by using optics for holographic interconnections and electronics for controlling the space-invariant interconnections. As a result, a device whose reflectance can be varied spatially and temporally, such as a spatial light modulator (SLM), is not needed to dynamically perform a large number of interconnections; instead, a small electrical switching element is associated with each optical link. Here, available optical paths need not be specified by a source node and communicated to an SLM for path setup. Instead, the message routing header can be modified in response to congestion as an optical path is established. This architecture takes advantage of the best features of both technologies as they currently exist. Future devices, aimed at reducing the optics-to-electronics conversion penalty, will make such architectures even more attractive.

A new class of algorithms based upon a generalized singular value decomposition (SVD) is considered for system identification, statistical model order determination, model order reduction, and predictive control. Currently available algorithms for system identification and control are not completely reliable for automatic implementation on microprocessors in real time. In the generalized SVD approach, the algorithms are computationally stable and numerically accurate, and can be implemented on systolic array processors using recently developed algorithms, resulting in a considerable speedup. The method is based upon a recent generalized canonical variate analysis (CVA) method for determining the optimal state of restricted order in system identification, reduced-order stochastic filtering, and model predictive control. This permits a unified approach to the solution of these problems from the viewpoint of a prediction problem as well as an approximation problem. Algorithms for online computation in identification, filtering, and control of high-order linear multivariable systems are developed. Implementation of these algorithms on systolic array processors is discussed.

A statistical measure of the performance of an adaptive beamforming technique for a multisensor linear array has been developed. The sampled complex vector signal process Z(t_k), observed at sample times {t_k} ∈ [0, T], is used to calculate a positive-definite maximum-likelihood estimate (MLE) R̂ of the signal process covariance matrix R. An estimated measure of the performance of the array, in terms of its ability to suppress an interfering signal, is then developed from an a priori specified process matrix E(R), its sample MLE R̂, and a form of the Wishart distribution. In particular, such an estimated measure of performance in suppressing an interfering signal with an angle of arrival (AOA) θ₂ relative to a beam AOA θ₁ is the statistic given by the estimated L₂ norm ‖ρ̂⁻¹‖₂, where ρ̂⁻¹ = V*(θ₁)R̂⁻¹V(θ₁) and ρ⁻¹ = V*(θ₁)R⁻¹V(θ₁). Both R and R̂ are functions of V(θ₂) and V*(θ₂).

This paper describes an approach to two-dimensional object recognition. Complex-log conformal mapping is combined with a distributed associative memory to create a system which recognizes objects regardless of changes in rotation or scale. Recalled information from the memorized database is used to classify an object, reconstruct the memorized version of the object, and estimate the magnitude of changes in scale or rotation. The system response is resistant to moderate amounts of noise and occlusion. Several experiments, using real, gray-scale images, are presented to show the feasibility of our approach.
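The key property being exploited can be stated in a few lines: under the complex-log map w = ln z, rotation and scaling of the image plane become pure translations of the mapped plane, which a shift-tolerant associative memory can then absorb. A minimal sketch:

```python
import cmath
import math

def complex_log_map(z):
    """Map image coordinate z = x + iy (z != 0) to w = ln|z| + i*arg(z)."""
    return cmath.log(z)

# Rotation by phi and scaling by s act on the image plane as
#   z -> s * e^{i*phi} * z,
# which in the mapped domain is the pure translation
#   w -> w + ln(s) + i*phi.
```

So a scale change shifts the mapped image along the real (log-radius) axis and a rotation shifts it along the imaginary (angle) axis, turning rotation/scale invariance into shift invariance.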

A knowledge-based simulation environment requires the capture, organization, retrieval, and use of knowledge from many sources. Sources of knowledge include, but are not limited to, simulation knowledge, domain knowledge in the problem area, and the user's knowledge of the specific problem. Once this knowledge is available, a model is constructed to allow experimentation with the system. The architecture of a knowledge-based simulation environment, the collection of the model specification, and the synthesis of the model are described.

Recent applied and theoretical results have demonstrated that certain classes of neural networks possess adaptive characteristics. These adaptive networks modify their response to inputs as a result of "experience," e.g., they are trained. One of the most obvious applications of this type of trainable network is to real-time signal processing problems. In particular, for many applications, the detection and classification of specific "target" signatures buried in noisy, clutter-rich signals often proves to be an extremely difficult problem. In addition, this problem is nearly always worsened by the high bandwidth associated with many modern sensor systems. Nearly all conventional signal processing and neural network techniques employ special-purpose feature extraction hardware as an interface between the sensor and the detection/classification mechanism (whether neural net or conventional). However, given typical system requirements for minimum size, weight and power, a considerable advantage would be gained if this interface were simplified or eliminated. Trainable neural networks appear to offer exceptional promise as simultaneous feature extraction and pattern recognition mechanisms. This paper presents the results of preliminary experimental investigations of the performance of various trainable (back-propagation) neural networks applied to the processing of various types of sensor signals.

Neural network models that mimic some functions of the brain use logic elements as neurons and connections of varying strength as synapses. Inspired by these models, we have developed a high-density programmable read-only memory (PROM) that promises high speed and fault tolerance. This memory is based on an innovative thin-film implementation of binary synapses, represented by a connection matrix with electronic amplifiers as input drivers and output sensors. A binary word is stored along a row of this connection matrix of mutually perpendicular wires, where the passive, two-terminal, write-once synaptic interconnects, consisting of a-Si:H microswitches, are programmed to take a value of '0' (synapse OFF) or '1' (synapse ON). Information is retrieved either by addressing the matrix row by row or by content addressing in an associative recall. Thin-film fabrication technology for this memory and the design options leading to very high density (up to a gigabit/cm²) PROMs are described. Test circuits containing 40 × 40 binary synaptic arrays have been fabricated to study the a-Si:H memory switching properties for the 'write' operation. Results of the read operation and projections for larger memory blocks are discussed to highlight the potential for a read data rate exceeding 10 megabits/sec.

This paper presents a neural-network approach to the problem of multi-target tracking. The problem is formulated analytically in terms of desired optima and constraints that make it suitable for solution using the neural-network formalism of Hopfield-Tank. The results of computer simulations of a network designed to solve the problem are presented.

In this paper, we present a new approach to designing optical digital arithmetic systems. We present new arithmetic algorithms based on a modified signed-digit number representation, and new signed-digit symbolic substitution rules are introduced to implement them. These signed-digit arithmetic algorithms are well suited to optical implementation because carry propagation is confined to adjacent digits. We present an optical architecture for such an arithmetic processor. The proposed optical arithmetic processor can potentially achieve O(10²) to O(10³) improvement in speed compared with conventional electronic arithmetic processors.

The possibility of using a neural network to perform super-resolution is discussed. A method is presented for mapping a regularized super-resolution cost function onto a neural network of the Hopfield type. The required computational effort is compared to conventional matrix-based approaches to super-resolution.
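A minimal sketch of the idea, with an illustrative blur matrix H and regularization weight lam that are not from the paper: since the regularized cost is quadratic, it can serve as the Lyapunov energy of a continuous Hopfield-type network, and simulating the network amounts to gradient descent on the cost:

```python
# Hedged sketch (not the paper's construction details): the regularized
# super-resolution cost  F(x) = ||y - Hx||^2 + lam*||x||^2  is quadratic,
# so a continuous Hopfield-type network whose state x follows the
# negative gradient  -(2*H^T(Hx - y) + 2*lam*x)  minimizes it.

def matvec(M, v):
    return [sum(M[i][j] * v[j] for j in range(len(v))) for i in range(len(M))]

def transpose(M):
    return [list(col) for col in zip(*M)]

def cost(H, y, x, lam):
    r = [yi - hi for yi, hi in zip(y, matvec(H, x))]
    return sum(ri * ri for ri in r) + lam * sum(xi * xi for xi in x)

def grad(H, Ht, y, x, lam):
    r = [hi - yi for hi, yi in zip(matvec(H, x), y)]
    g = matvec(Ht, r)
    return [2 * gi + 2 * lam * xi for gi, xi in zip(g, x)]

def settle(H, y, lam, steps=500, eta=0.2):
    """Discrete-time emulation of the network's gradient flow."""
    Ht = transpose(H)
    x = [0.0] * len(H[0])
    for _ in range(steps):
        g = grad(H, Ht, y, x, lam)
        x = [xi - eta * gi for xi, gi in zip(x, g)]
    return x
```

The network's fixed point satisfies (HᵀH + lam·I)x = Hᵀy, i.e. the same normal equations a matrix-based solver would form explicitly.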

For many application-specific and mission-oriented multiple-processor systems, interprocessor communication is deterministic and can be specified at system inception. This specification can be mapped automatically onto a physical system using a network traffic scheduler. An iterative network traffic scheduler is presented which, given an arbitrary communication network topology, translates the deterministic communication into a network traffic routing pattern. Previous work has shown the existence of a network traffic scheduling algorithm, based on a fluid-flow model, that converges to an optimal solution. However, this algorithm assumes an external host that performs centralized scheduling. This paper presents a parallel version of the algorithm which can be executed by the network switching nodes themselves without requiring an external host. Hence, using such an algorithm, a communication network can perform self-scheduling of interprocessor traffic. Furthermore, with this self-scheduling capability, a network can perform traffic routing and scheduling concurrently with on-line network reconfiguration.

The paper describes results of recent research on the design, testing, and evaluation of high-performance, high-reliability multiprocessor systems at the University of Michigan. This work is aimed at the fundamental problem of realizing systems which perform extremely well and, moreover, are able to sustain satisfactory performance in the presence of faults. In our view, such work, if it is to be successful, must deal simultaneously with a variety of related issues. The presentation supports this position and thus covers a relatively broad range of topics. More detailed descriptions of these results are available in most instances and are referenced as part of the discussion.

A systolic organization is presented for the computation of a complex-valued frequency response matrix G(jω) = C(jωI − A)⁻¹B. By 'systolic organization' we mean an algorithm intended for software implementation on a programmable systolic/wavefront computer system. Typically, the real-valued state-space model matrices A, B, and C are given, and the calculation of G must be performed for a very large number of values of the scalar "frequency" parameter ω. This calculation, and closely related ones, arise naturally in the analysis and design of control systems. The algorithm chosen for systolic implementation is an orthogonal version of an algorithm appearing previously in the literature. The matrix A is reduced initially to upper Hessenberg form, and this form is preserved as ω varies subsequently in the matrix jωI − A. A systolic QR factorization of this latter matrix, jωI − A = QR, is then implemented for effecting the linear system solution (inversion). The critical computational component is CR⁻¹. This component's process dependency graph is embedded optimally in space and time through the use of a nonlinear space-time transformation. The computational period of the algorithm is O(n), where n is the order of the matrix A.
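As a sequential reference for what each frequency point must produce, the following sketch evaluates G(jω) by a dense complex Gaussian elimination per frequency; the systolic algorithm instead keeps A in Hessenberg form so each per-frequency solve is much cheaper. The matrices in the test are illustrative:

```python
# Sequential reference for the quantity the systolic array computes:
# G(jw) = C (jwI - A)^{-1} B, evaluated here by solving (jwI - A) x = B
# with dense complex Gaussian elimination for each frequency w.

def solve(M, b):
    """Gaussian elimination with partial pivoting on complex entries."""
    n = len(M)
    M = [row[:] + [b[i]] for i, row in enumerate(M)]   # augmented matrix
    for k in range(n):
        p = max(range(k, n), key=lambda r: abs(M[r][k]))
        M[k], M[p] = M[p], M[k]
        for r in range(k + 1, n):
            f = M[r][k] / M[k][k]
            for c in range(k, n + 1):
                M[r][c] -= f * M[k][c]
    x = [0j] * n
    for k in range(n - 1, -1, -1):
        x[k] = (M[k][n] - sum(M[k][c] * x[c] for c in range(k + 1, n))) / M[k][k]
    return x

def freq_response(A, B, C, w):
    """Scalar G(jw) for a single-input single-output system:
    A is n x n, B and C are length-n vectors."""
    n = len(A)
    M = [[(1j * w if i == j else 0) - A[i][j] for j in range(n)]
         for i in range(n)]
    x = solve(M, B)                      # x = (jwI - A)^{-1} B
    return sum(ci * xi for ci, xi in zip(C, x))
```

Each solve here costs O(n³); exploiting the Hessenberg structure, as the systolic formulation does, reduces the per-frequency work to O(n²).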

Because processor arrays have only limited connections between neighboring processors, fault-tolerance schemes may require additional interconnect, switching, and control hardware in order to allow for reconfiguration when faults occur. In general, the larger the reconfiguration capability, the greater is the probability that a processor array can survive a given distribution of faults. In other words, the coverage of the reconfiguration procedure increases directly with the amount of extra hardware required to support it. However, this is true only if the added hardware does not itself fail. For this reason, and depending on, among other factors, the size of the processor array and the size of each processor, distinct reconfiguration schemes may be best suited to different arrays. Also, in general, previously proposed schemes may still result in unacceptably low reliabilities in very large processor arrays. This paper proposes a class of reconfiguration schemes which have a hierarchical nature. According to this approach, a processor array is logically partitioned into smaller subarrays and, once faults occur, reconfiguration takes place within each faulty subarray if possible; otherwise, the full subarray is replaced by a spare subarray. Arrays of this type are referred to as bi-level fault-tolerant processor arrays and, by allowing several levels of reconfiguration, multi-level arrays can be defined similarly. While several levels of reconfiguration are possible, the case of two levels is emphasized in this paper. Also, the reconfiguration schemes used at each level are not necessarily identical. This class of hierarchical reconfiguration schemes provides much higher reliability than previously proposed ones, particularly in the case of very large arrays.
To design a hierarchical reconfiguration scheme for a given processor array, it is necessary to choose the size of the subarrays at every level in the hierarchy as well as the reconfiguration scheme at that level. A design methodology is provided which solves these problems mathematically, i.e., it enables the choice of the subarray sizes and the reconfiguration scheme to be used at each level so as to obtain a processor array with optimal reliability.

We present preliminary results on the VLSI design and implementation of a novel and promising algorithm for accurate high-speed Fourier analysis and synthesis. The Arithmetic Fourier Transform (AFT) is based on the number-theoretic method of Möbius inversion. Its computations proceed in parallel and the individual operations are very simple. Except for a small number of scalings in one stage of the computation, only multiplications by 0, +1, and -1 are required. If the input samples were not quantized and ideal real-number operations were used internally, the results would be exact. The accuracy of the computation is limited only by the input A/D conversion process, any constraints on the word lengths of internal accumulating registers, and the implementation of the few scaling operations. Motivated by the goal of an efficient, effective, high-speed realization of the algorithm in an integrated circuit, we introduce further simplifications by using delta modulation to represent the input function in digital form. The result is that only binary (or, preferably, ternary) sequences need to be processed in the parallel computations, and the required accumulations can be replaced by up/down counters. The dynamic range of the resulting transformation can be increased by the use of adaptive delta modulation (ADM).
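The number-theoretic core is the Möbius inversion formula: if g(n) is the sum of f(d) over the divisors d of n, then f can be recovered as f(n) = Σ_{d|n} μ(n/d)·g(d), where μ is the Möbius function and contributes only the factors 0, +1, and -1. A small self-contained illustration of that identity (the AFT applies it to alternating averages of the input signal):

```python
# Moebius inversion: if g(n) = sum of f(d) over divisors d of n, then
# f(n) = sum over divisors d of n of mu(n/d) * g(d), where mu(n) is
# 0 if n has a squared prime factor and (-1)^k for a product of k
# distinct primes.

def mobius(n):
    mu, m, p = 1, n, 2
    while p * p <= m:
        if m % p == 0:
            m //= p
            if m % p == 0:
                return 0          # squared prime factor
            mu = -mu
        p += 1
    if m > 1:                     # one remaining prime factor
        mu = -mu
    return mu

def divisors(n):
    return [d for d in range(1, n + 1) if n % d == 0]

def invert(g, n):
    """Recover f(n) from the divisor-sum function g by Moebius inversion."""
    return sum(mobius(n // d) * g(d) for d in divisors(n))
```

In the AFT, the role of g is played by easily computed averages of signal samples, so the Fourier coefficients fall out of sums weighted only by 0 and ±1.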

In this work, we study the design of computationally efficient order-recursive algorithms for computing the predictor polynomial and the reflection coefficients associated with a real, symmetric, positive-definite Toeplitz matrix T, and for solving the linear system Tx = b. New algorithms are derived which lead to significant improvements in the computational complexity as compared to the previously known order-recursive algorithms. They also provide further insight into the mathematical properties of the structurally rich Toeplitz matrices.
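As a point of reference for the order-recursive structure being improved upon, the classical Levinson-Durbin recursion computes both quantities in O(n²) operations (a textbook baseline, not the paper's new algorithm):

```python
# Hedged sketch: the classical Levinson-Durbin order recursion.  Given
# autocorrelations r[0..n] (the first column of a real, symmetric,
# positive-definite Toeplitz matrix T), it returns the order-n predictor
# coefficients a, satisfying T a = -[r[1], ..., r[n]], and the
# reflection coefficients k, in O(n^2) work.

def durbin(r):
    n = len(r) - 1
    a, k, err = [], [], r[0]
    for m in range(1, n + 1):
        # inner product of the current predictor with the autocorrelations
        acc = r[m] + sum(a[i] * r[m - 1 - i] for i in range(m - 1))
        km = -acc / err                 # new reflection coefficient
        # order-update the predictor coefficients
        a = [a[i] + km * a[m - 2 - i] for i in range(m - 1)] + [km]
        k.append(km)
        err *= 1.0 - km * km            # prediction-error energy update
    return a, k
```

Each order update reuses the previous order's solution, which is the recursive structure the paper's faster algorithms refine.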

We are interested in implementing direct search methods on parallel computers to solve the unconstrained minimization problem: given a function f: ℝⁿ → ℝ, find an x ∈ ℝⁿ that minimizes f(x). Our preliminary work has focused on the Nelder-Mead simplex algorithm. The origin of the algorithm can be found in a 1962 paper by Spendley, Hext and Himsworth [1]; Nelder and Mead [2] proposed an adaptive version which proved to be much more robust in practice. Dennis and Woods [3] give a clear presentation of the standard Nelder-Mead simplex algorithm; Woods [4] includes a more complete discussion of implementation details as well as some preliminary convergence results. Since descriptions of the standard Nelder-Mead simplex algorithm appear in Nelder and Mead [2], Dennis and Woods [3], and Woods [4], we limit our introductory discussion to the advantages and disadvantages of the algorithm, as well as some of the features which make it so popular. We then outline the approaches we have taken and discuss our preliminary results. We conclude with a discussion of future research and some observations about our findings.
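For readers without the cited descriptions at hand, a compact serial version of the algorithm, with the standard reflection, expansion, contraction, and shrink steps (coefficients 1, 2, 0.5, 0.5), is sketched below; parallel approaches distribute these trial-point evaluations across processors:

```python
# A compact serial Nelder-Mead simplex sketch (standard coefficients;
# a simplified but faithful variant, not a production implementation).

def nelder_mead(f, x0, step=0.5, iters=200):
    n = len(x0)
    # initial simplex: x0 plus one point offset along each coordinate axis
    simplex = [list(x0)] + [
        [x0[j] + (step if j == i else 0.0) for j in range(n)] for i in range(n)
    ]
    for _ in range(iters):
        simplex.sort(key=f)
        best, worst = simplex[0], simplex[-1]
        centroid = [sum(p[j] for p in simplex[:-1]) / n for j in range(n)]
        refl = [2 * centroid[j] - worst[j] for j in range(n)]    # reflection
        if f(refl) < f(best):
            exp = [3 * centroid[j] - 2 * worst[j] for j in range(n)]  # expansion
            simplex[-1] = exp if f(exp) < f(refl) else refl
        elif f(refl) < f(simplex[-2]):
            simplex[-1] = refl
        else:
            better = worst if f(worst) <= f(refl) else refl
            cont = [0.5 * (centroid[j] + better[j]) for j in range(n)]  # contraction
            if f(cont) < f(better):
                simplex[-1] = cont
            else:
                # shrink the whole simplex toward the best vertex
                simplex = [best] + [
                    [0.5 * (best[j] + p[j]) for j in range(n)]
                    for p in simplex[1:]
                ]
    simplex.sort(key=f)
    return simplex[0]
```

Only function values are used, never derivatives, which is what makes the method attractive for noisy or non-smooth objectives and easy to parallelize over trial points.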

Recent advances in high-speed microprocessor technology and in methods of coupling large numbers of such processors into concurrent structures offer cost-effective means of obtaining supercomputing performance. There is much interest in applying these machines, and in evaluating their actual performance, on large, computationally intensive problems. Of particular interest is concurrent performance on large-scale electromagnetic scattering problems. Two electromagnetic codes with differing underlying algorithms have been converted to run on the Mark III Hypercube. One is a time-domain finite-difference solution of Maxwell's equations for scattered fields, and the other is a frequency-domain moment-method solution. Important measures for demonstrating the utility of the parallel architecture are the size of the problem that can be solved and the efficiency with which parallelization increases the speed of execution.

Four matrix operations: matrix multiplication, the QR decomposition, the singular value decomposition, and the generalized singular value decomposition, form the basic tools for modern signal processing. This paper discusses their implementation on the 65,536-processor Connection Machine, and presents results showing that for n × n matrices with n < 256, almost-linear-time performance is obtained. Our other major result is a novel method for computing the generalized singular value decomposition.

In this paper the state of the art in designing optimisation algorithms for parallel processing computers is reviewed. The availability of parallel processing hardware implies the redesign of optimisation software. Since, on a sequential computer, most of the computing time is spent calculating the function and its derivatives and/or the constraints and their derivatives, it is natural to consider first ways of evaluating these in parallel. It will be shown that automatic differentiation provides an excellent tool for this purpose. A library of codes based on parallel, structured automatic differentiation will then be described; these codes also use parallel processing in calculating their direction of search. In particular, parallel versions of the Newton-Raphson, variable metric, conjugate gradient, and truncated Newton algorithms will be described, followed by two codes for constrained optimisation and one for global optimisation. Finally, our experience in using the ICL-DAP processor to solve finite element optimisation problems will be described.
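The automatic-differentiation building block can be sketched with forward-mode dual numbers (an illustrative minimal version, not the library's code): each arithmetic operation propagates an exact derivative alongside the value, and the per-direction passes are independent of one another, hence naturally parallel:

```python
# Forward-mode automatic differentiation with dual numbers.  Each
# operation carries (value, derivative); derivatives are exact, with
# no finite-difference truncation error.

class Dual:
    def __init__(self, val, dot=0.0):
        self.val, self.dot = val, dot

    def __add__(self, o):
        o = o if isinstance(o, Dual) else Dual(o)
        return Dual(self.val + o.val, self.dot + o.dot)
    __radd__ = __add__

    def __sub__(self, o):
        o = o if isinstance(o, Dual) else Dual(o)
        return Dual(self.val - o.val, self.dot - o.dot)

    def __mul__(self, o):
        o = o if isinstance(o, Dual) else Dual(o)
        # product rule: (uv)' = u'v + uv'
        return Dual(self.val * o.val, self.dot * o.val + self.val * o.dot)
    __rmul__ = __mul__

def gradient(f, x):
    """One independent dual-number pass per coordinate direction; the
    passes share nothing and could run on separate processors."""
    g = []
    for i in range(len(x)):
        args = [Dual(xj, 1.0 if j == i else 0.0) for j, xj in enumerate(x)]
        g.append(f(args).dot)
    return g
```

The same pattern extends to constraints and their derivatives, which is how a library can feed parallel derivative evaluations to Newton-type and conjugate-gradient searches.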

A multilevel, multiresolution method for image processing tasks, and for computer vision in general, is presented. The method is based on a combination of probabilistic models, Monte Carlo-type algorithms, and renormalization group ideas. It is suitable for implementation on massively parallel computers. It also yields a new global optimization algorithm that is potentially applicable to any cost function but especially efficient for problems governed by local spatial relations.

Systolic array processors, featuring local communications, high throughput, and VLSI compatibility, provide the most attractive medium for meeting the computational requirements of on-line nonlinear signal processing. The present study considers the computational requirements attendant to the Wigner distribution, and to quadratic functions in general.
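As a concrete point of reference, a discrete pseudo-Wigner slice is a quadratic function of the signal: the lag products x(n+k)·x*(n−k) are formed and then Fourier transformed over the lag k, and it is these regular multiply-accumulate patterns that map well onto systolic arrays. A minimal sketch (window length and test signal below are illustrative):

```python
import cmath

def pseudo_wigner(x, n, L):
    """Discrete pseudo-Wigner slice at time n with a (2L+1)-point window:
    W(n, m) = | sum_k x[n+k] * conj(x[n-k]) * exp(-2*pi*i*m*k / (2L+1)) |.
    The lag products make this quadratic in the signal x."""
    N = 2 * L + 1
    r = [x[n + k] * x[n - k].conjugate() for k in range(-L, L + 1)]
    return [abs(sum(rk * cmath.exp(-2j * cmath.pi * m * k / N)
                    for rk, k in zip(r, range(-L, L + 1))))
            for m in range(N)]
```

For a pure complex exponential the lag-product sequence is itself a complex exponential at twice the signal's frequency, so the slice concentrates at the bin corresponding to the instantaneous frequency.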