Classifications

G05B—CONTROL OR REGULATING SYSTEMS IN GENERAL; FUNCTIONAL ELEMENTS OF SUCH SYSTEMS; MONITORING OR TESTING ARRANGEMENTS FOR SUCH SYSTEMS OR ELEMENTS

G05B13/00—Adaptive control systems, i.e. systems automatically adjusting themselves to have a performance which is optimum according to some preassigned criterion

G05B13/02—Adaptive control systems, i.e. systems automatically adjusting themselves to have a performance which is optimum according to some preassigned criterion electric

G05B13/0265—Adaptive control systems, i.e. systems automatically adjusting themselves to have a performance which is optimum according to some preassigned criterion electric the criterion being a learning criterion

G05B13/027—Adaptive control systems, i.e. systems automatically adjusting themselves to have a performance which is optimum according to some preassigned criterion electric the criterion being a learning criterion using neural networks only

Y—GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS

Description

TECHNICAL FIELD OF THE INVENTION

The present invention pertains in general to neural networks, and more
particularly to a method and apparatus for improving performance and accuracy in
neural networks by utilizing the residual activation in subnetworks.

BACKGROUND OF THE INVENTION

Neural networks are generally utilized to predict, control and optimize a
process. The neural network is generally operable to learn a non-linear model of a
system and store the representation of that non-linear model. Therefore, the neural
network must first learn the non-linear model in order to optimize/control that
system with that non-linear model. In the first stage of building the model, the
neural network performs a prediction or forecast function. For example, a neural
network could be utilized to predict future behavior of a chemical plant from the
past historical data of the process variables. Initially, the network has no knowledge
of the model type that is applicable to the chemical plant. However, the neural
network "learns" the non-linear model by training the network on historical data of
the chemical plant. This training is effected by a number of classic training
techniques, such as back propagation, radial basis functions with clustering, non-radial
basis functions, nearest-neighbor approximations, etc. After the network is
finished learning on the input data set, some of the historical data of the plant that
was purposefully deleted from the training data is then input into the network to
determine how accurately it predicts on this new data. If the prediction is accurate,
then the network is said to have "generalized" on the data. If the generalization
level is high, then a high degree of confidence exists that the prediction network has
captured useful properties of the plant dynamics.

In order to train the network, historical data is typically provided as a
training set, which is a set of patterns that is taken from a time series in the form of
a vector, x(t) representing the various input vectors and a vector, y(t) representing
the actual outputs as a function of time for t=1, 2, 3 ... M, where
M is the number of training patterns. These inputs could be temperatures,
pressures, flow-rates, etc., and the outputs could be yield, impurity levels, variance,
etc. The overall goal is to learn this training data and then generalize to new
patterns.

With the training set of inputs and outputs, it is then possible to construct a
function that is imbedded in the neural network as follows:
o(t) = f(x(t), P)     (1)
where o(t) is an output vector and P is a vector of parameters ("weights") that are
variable during the learning stage. The goal is to minimize the Total-Sum-Square-Error
function:
E = \sum_{t=1}^{M} (y(t) - o(t))^2     (2)

The Total-Sum-Square-Error function is minimized by changing the parameters P of
the function f. This is done by the back propagation or gradient descent method in
the preferred embodiment. This is described in numerous articles, and is well
known. Therefore, the neural network is essentially a parameter fitting scheme that
can be viewed as a class of statistical algorithms for fitting probability distributions.
Alternatively, the neural network can be viewed as a functional approximator that
fits the input-output data with a high-dimensional surface. The neural network
utilizes a very simple, almost trivial function (typically sigmoids), in a multi-layer
nested structure. The general advantages provided by neural networks over other
functional approximation techniques are that the associated neural network algorithm
accommodates many different systems; neural networks provide a non-linear
dependence on parameters, i.e., they generate a non-linear model; they utilize the
computer to perform most of the learning; and neural networks perform much better
than traditional rule-based expert systems, since rules are generally difficult to
discern, or the number of rules or the combination of rules can be overwhelming.
However, neural networks do have some disadvantages: it is somewhat
difficult to incorporate constraints or other knowledge about the system into the
neural networks, such as thermodynamic pressure/temperature relations, and
neural networks do not yield a simple explanation of how they actually solve
problems.
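
The parameter-fitting view described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function `f` below is a hypothetical three-parameter stand-in for the network, the targets are synthetic, and the gradient is estimated by central differences rather than by back propagation.

```python
import numpy as np

def f(x, P):
    """Hypothetical stand-in for the network function o(t) = f(x(t), P)."""
    return P[0] * np.tanh(P[1] * x) + P[2]

def tsse(P, x, y):
    """Total-Sum-Square-Error summed over all M training patterns."""
    return float(np.sum((y - f(x, P)) ** 2))

def numerical_gradient(P, x, y, eps=1e-6):
    """Central-difference estimate of dE/dP (assumption: used here only
    for brevity; back propagation would compute this analytically)."""
    grad = np.zeros_like(P)
    for i in range(P.size):
        step = np.zeros_like(P)
        step[i] = eps
        grad[i] = (tsse(P + step, x, y) - tsse(P - step, x, y)) / (2 * eps)
    return grad

def fit(P0, x, y, lr=0.005, steps=2000):
    """Plain gradient descent on the parameters P."""
    P = P0.astype(float).copy()
    for _ in range(steps):
        P -= lr * numerical_gradient(P, x, y)
    return P

x = np.linspace(-2.0, 2.0, 21)           # M = 21 synthetic input patterns
y = 1.5 * np.tanh(0.8 * x) + 0.2         # synthetic targets
P_init = np.array([1.0, 1.0, 0.0])
P_fit = fit(P_init, x, y)
print(tsse(P_init, x, y), tsse(P_fit, x, y))
```

The learning rate and step count are illustrative choices for this small data set and would need tuning for any real problem.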

In practice, the general disadvantages associated with neural networks are
seldom important. When a neural network is used in part for optimizing a
system, it is typically done under supervision. In this type of optimization, the
neural network as the optimizer makes suggestions on how to change the
operating parameters. The operator then makes the final decision of how to
change these parameters. Therefore, this type of system usually requires an
"expert" at each plant that knows how to change control parameters to make
the plant run smoothly. However, this expert often has trouble giving a good
reason why he is changing the parameters and the method that he chooses.
This kind of expertise is very difficult to incorporate into classical models for
rule-based systems, but it is readily learned from historical data by a neural
network.

US 5,113,483 discloses a neural network which includes an input layer
comprising a plurality of input units interconnected to a hidden layer with a
plurality of hidden units disposed therein through an interconnection matrix.
Each of the hidden units has a single output that is connected to output units in
an output layer through an interconnection matrix. Each of the interconnections
between one of the hidden units and one of the outputs has a weight
associated therewith. The network is arranged to learn by back propagation
methods to vary the output weights and in this way provides an output
dependent on known inputs. However, this network takes no account of
dependency between the inputs or of any unmeasurable inputs.

US 5,111,531 discloses a control system and a method for a continuous
process in which a trained neural network predicts the value of an indirectly
controlled process variable and the values of directly controlled process
variables are changed to cause the predicted value to approach a desired value.
In this way the network monitors what is known as state variables and controls
what is known as control variables in order to enhance the operation of the
device which the control system controls. However, the control system takes
no account of possible dependencies between the control variables and state
variables nor of any unmeasurable variables.

IEE Proceedings Volume 138 No. 5 September 19, 1991, pages 431 to
438 discloses a novel technique, directly using artificial neural networks, that is
proposed for the adaptive control of non-linear systems. The ability of neural
networks to model arbitrary non-linear functions and their inverses is exploited.
The use of non-linear function inverses raises questions of the existence of the
inverse operators. This is a very early research paper regarding neural networks
and again takes no account of interconnections between inputs and outputs of the
network nor of unmeasurable variables.

The general problem in developing an accurate prediction is the problem
of developing an accurate model. In prediction files, there often exist variables
that contain very different frequency components, or have a modulation on top
of a slow drift. For example, in electronics, one may have a signal on top of a
slowly varying wave of a much lower frequency. As another example in
economics, there is often an underlying slow upward drift accompanied by very
fast fluctuating dynamics. In manufacturing, sensors often drift slowly, but the
sensory values can change quite quickly. This results in an error in the
prediction process. Although this error could be predicted given a sophisticated
enough neural network and a sufficient amount of training data on which the
model can be built, these are seldom practical neural network systems. As
such, this error is typically discarded. This error is generally the type of error
that is predictable and should be distinguished from random "noise" that is
generally impossible to predict. This predictable error that is discarded in
conventional systems is referred to as a "residual".

In addition to the loss of the residual prediction from the actual
prediction, another aspect of the use of a neural network is that of providing
optimization/control. Once a prediction has been made, it is then desirable to
manipulate the input variables referred to as the control variables, these being
independent variables, in order to move the control input parameters to a specific
set point. For example, valve positions, tank level-controllers,
the accelerator pedal on a car, etc., are all control variables. In
contrast, another set of variables, referred to as state variables, are measured
rather than manipulated, and come from sensors such as thermometers, flow meters,
pressure gauges, speedometers, etc. For example, a control valve on a furnace
would constitute the control variable, whereas a thermometer reading would
constitute a state variable. If a prediction neural network were built to model a
plant process based on these input variables, the same prediction accuracy
would be obtained based on either the control variable or the state variable, or a
combination of both.

Whenever the network is trained on input patterns, a problem occurs due
to the relationship between the control valve and the thermometer reading. The
reason for this is that the network will typically learn to pay attention to the
temperature or the control or both. If it only pays attention to the temperature,
the network's control answer is of the form "make the temperature higher" or,
"make the temperature lower". As the thermometer is not a variable that can
be manipulated directly, this information has to be related back to information
as to how to change the controller. If the relationship between the valve and
the temperature reading were a direct relationship, this might be a simple
problem. However, the situations that exist in practice are typically more
complex in that the dependencies of the state variables on the control variables are
not obvious to discern; they may be multivariant non-linear functions of the controls.
For control with no human in the loop, it is necessary for the network to account for the
relationship between the control variables and the state variables.

According to a first aspect of the present invention there is provided a
control system for controlling a plant, the system having plant control inputs for
receiving plant control variables, measurable state variables of the plant and
desired plant outputs, the measurable state variables having dependencies on
the plant control variables and unmeasurable external influences on the plant,
the system comprising:

a control network input for receiving as network inputs the current
plant control variables, the measurable state variables and desired plant outputs;

a control network output for outputting predicted plant control
variables necessary to achieve the desired plant outputs;

a processing system for processing the received plant control
variables through an inverse representation of the plant that represents the
dependencies of the plant output on the plant control variables and the
measurable state variables parameterized by an estimation of the unmeasurable
external influences to provide the predicted plant control variables to achieve
the desired plant outputs; and

an interface device for inputting the predicted plant control
variables that are output by said control network output to the plant as plant
control variables to achieve the desired plant outputs.

According to a second aspect of the present invention there is provided a
method of controlling a plant having measurable state variables and plant
control inputs for receiving plant control variables and desired plant outputs, the
measurable state variables being a function of the plant control variables and
unmeasurable external influences on the plant, the method comprising the steps
of:

receiving the current plant control variables and desired plant
outputs;

processing the received plant control variables through an inverse
representation of the plant that represents the dependencies of the plant output
on the plant control variables and the measurable state variables parameterized
by an estimation of the unmeasurable external influences to provide the
predicted plant control variables necessary to achieve the desired plant outputs;

outputting as an output the predicted plant control variables
necessary to achieve the desired plant outputs; and

controlling the plant with the predicted plant control variables.

BRIEF DESCRIPTION OF THE DRAWINGS

For a more complete understanding of the present invention and the
advantages thereof, reference is now made to the following description taken in
conjunction with the accompanying Drawings in which:

FIGURE 1 illustrates a general diagram of the neural network model of a
plant;

FIGURE 2 illustrates a schematic view of a neural network representing a
single hidden layer;

FIGURE 3 illustrates a time-series output representing the first level of
prediction;

FIGURE 4 illustrates the first residual from the first prediction with the
second prediction of the residual;

FIGURE 5 illustrates a diagrammatic view of the neural network for
generating the prediction utilizing residuals;

FIGURE 6 illustrates the residual activation networks utilized for predicting
the time series y(t);

FIGURES 7a and 7b illustrate a block diagram of a control system for
optimization/control of a plant's operation;

FIGURE 7c illustrates a control network utilized to generate the new control
variables;

FIGURE 8 illustrates a block diagram of a simplified plant that is operable to
estimate the value and give proper control signals to keep the output at the desired
state;

FIGURE 9 illustrates a straightforward neural network having three input
nodes, each for receiving the input vectors;

FIGURE 10 illustrates the first step of building the neural network;

FIGURE 11 illustrates the next step in building the residual activation
network;

FIGURE 12 illustrates the next step in building the network, wherein the
overall residual network is built;

FIGURE 13 illustrates a block diagram of a chaotic plant;

FIGURE 14 illustrates a block diagram of the residual activation network for
controlling the plant of FIGURE 13; and

Referring now to FIGURE 1, there is illustrated a diagrammatic view of a
predicted model 10 of a plant 12. The plant 12 is any type of physical, chemical,
biological, electronic or economic process with inputs and outputs. The predicted
model is a neural network which is generally comprised of an input layer comprising
a plurality of input nodes 14, a hidden layer comprised of a plurality of hidden
nodes 16, and an output layer comprised of a plurality of output nodes 18. The
input nodes 14 are connected to the hidden nodes 16 through an interconnection
scheme that provides a non-linear interconnection. Similarly, the hidden nodes 16
are connected to the output nodes 18 through a similar interconnection scheme that
is also non-linear. The input of the model 10 is comprised of an input vector 20 of
known plant inputs, which inputs comprise in part manipulated variables referred to
as "control" variables, and in part measured or non-manipulated variables referred to
as "state" variables. The control variables are the input to the plant 12. When the
inputs are applied to the plant 12, an actual output results. By comparison, the
output of the model 10 is a predicted output. To the extent that the model 10 is an
accurate model, the actual output and the predicted output will be essentially
identical. However, whenever the actual output is to be varied to a set point, the
plant control inputs must be varied. This is effected through a control block 22 that
is controlled by a control/optimizer block 24. The control/optimizer block receives
the outputs from the predicted model 10 in addition to a desired output signal and
changes the plant inputs. This allows the actual output to be moved to the setpoint
without utilizing the actual output of the plant 12 itself.

In addition to the control inputs, the plant 12 also has some unmeasured
unknown plant inputs, referred to as "external disturbances", which represent
unknown relationships, etc. that may exist in any given plant such as humidity, feedstock
variations, etc. in a manufacturing plant. These unknown plant inputs or
external disturbances result in some minor errors or variations in the actual output as
compared to the predicted output, which errors are part of the residual. This will
result in an error between the predicted output and the actual output.

Referring now to FIGURE 2, there is illustrated a detailed diagram of a
conventional neural network comprised of the input nodes 14, the hidden nodes 16
and the output nodes 18. The input nodes 14 are comprised of N nodes labelled x1,
x2, ... xN, which are operable to receive an input vector x(t) comprised of a plurality
of inputs, INP1(t), INP2(t), ... INPN(t). Similarly, the output nodes 18 are labelled
o1, o2, ... oK, which are operable to generate an output vector o(t), which is
comprised of the output OUT1(t), OUT2(t), ... OUTK(t). The input nodes 14 are
interconnected with the hidden nodes 16, hidden nodes 16 being labelled a1, a2, ...
an, through an interconnection network where each input node 14 is interconnected
with each of the hidden nodes 16. However, some interconnection schemes do not
require full interconnect. Each of the interconnects has a weight Wij1. Each of the
hidden nodes 16 has an output aj with a function g, the output of each of the hidden
nodes defined as follows:
a_j = g\left( \sum_{i=1}^{N} W^{1}_{ij} x_i + b^{1}_{j} \right)     (3)

Similarly, the output of each of the hidden nodes 16 is interconnected with
substantially all of the output nodes 18 through an interconnect network, each of the
interconnects having a weight Wik2 associated therewith. The output of each of the
output nodes is defined as follows:
o_k = g\left( \sum_{i=1}^{n} W^{2}_{ik} a_i + b^{2}_{k} \right)     (4)

This neural network is then trained to learn the function f( ) in Equation 1 from the
input space to the output space as examples or input patterns are presented to it, and
the Total-Sum-Square-Error function in Equation 2 is minimized through use of a
gradient descent on the parameters Wik2, Wij1,b1j, b2k.
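
A minimal sketch of such a single-hidden-layer network and its training by gradient descent is given below. Several details are assumptions made only for illustration: the function g is taken to be a sigmoid at both the hidden and output nodes, the data are synthetic, and the array names, layer sizes and learning rate are hypothetical.

```python
import numpy as np

def g(z):
    """Sigmoid, assumed here as the node function g."""
    return 1.0 / (1.0 + np.exp(-z))

def forward(X, W1, b1, W2, b2):
    """X has shape (M, N); returns hidden activations A and outputs O."""
    A = g(X @ W1 + b1)      # hidden nodes a_1 ... a_n
    O = g(A @ W2 + b2)      # output nodes o_1 ... o_K
    return A, O

def tsse(X, Y, params):
    """Total-Sum-Square-Error over all patterns."""
    _, O = forward(X, *params)
    return float(np.sum((Y - O) ** 2))

def gradients(X, Y, W1, b1, W2, b2):
    """Back propagation of the error to every weight and bias."""
    A, O = forward(X, W1, b1, W2, b2)
    d_out = -2.0 * (Y - O) * O * (1.0 - O)     # error signal at the output nodes
    d_hid = (d_out @ W2.T) * A * (1.0 - A)     # error signal at the hidden nodes
    return X.T @ d_hid, d_hid.sum(axis=0), A.T @ d_out, d_out.sum(axis=0)

def train(X, Y, n_hidden=4, lr=0.01, epochs=2000, seed=0):
    """Gradient descent on W1, b1, W2, b2 from a random starting point."""
    rng = np.random.default_rng(seed)
    params = [rng.normal(0.0, 0.5, (X.shape[1], n_hidden)), np.zeros(n_hidden),
              rng.normal(0.0, 0.5, (n_hidden, Y.shape[1])), np.zeros(Y.shape[1])]
    start_error = tsse(X, Y, params)
    for _ in range(epochs):
        grads = gradients(X, Y, *params)
        params = [p - lr * dp for p, dp in zip(params, grads)]
    return params, start_error

rng = np.random.default_rng(1)
X = rng.uniform(-1.0, 1.0, size=(50, 2))               # 50 patterns, N = 2 inputs
Y = 0.5 + 0.4 * np.sin(2.0 * X[:, :1]) * X[:, 1:]      # K = 1 output, kept inside (0, 1)
params, start_error = train(X, Y)
final_error = tsse(X, Y, params)
print(start_error, final_error)
```

The targets are scaled into the interval (0, 1) because a sigmoid output node cannot produce values outside that range.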

The neural network described above is just one example. Other types of
neural networks that may be utilized are those using multiple hidden layers, radial
basis functions, gaussian bars (as described in U.S. Patent No. 5,113,483, issued
May 12, 1992, which is incorporated herein by reference), and any other type of
general neural network. In the preferred embodiment, the neural network utilized is
of the type referred to as a multi-layer perceptron.

Prediction with Residual Activation Network

Referring now to FIGURE 3, there is illustrated an example of a time series
that is composed of underlying signals with several different frequencies. Often, it
is difficult to discern what frequencies are important, or what scales are important
when a problem is encountered. But, for this time series, there is a semi-linear
component, a sine-wave component, and a high-frequency component. The time
series is represented by a solid line with the x-axis representing samples over a
period of time, and the y-axis representing magnitude. The time series represents
the actual output of a plant, which is referred to as y(t). As will be described in
more detail hereinbelow, a first network is provided for making a first prediction,
and then the difference between that prediction and the actual output y(t) is then
determined to define a second time series representing the residual. In FIGURE 3,
the first prediction is represented by a dashed line.

Referring now to FIGURE 4, there is illustrated a plot of the residual of the
time series of FIGURE 3, with the first prediction subtracted from y(t). As will
also be described hereinbelow, a second separate neural network is provided, which
network contains a representation of the residual after the first prediction is
subtracted from y(t). By adding the prediction of this second neural network with
the prediction output by the neural network of FIGURE 3, a more accurate overall
prediction can be made. The residual in FIGURE 4 is illustrated with a solid line,
whereas the prediction of the residual network is represented in a dashed line.

Referring now to FIGURE 5, there is illustrated a diagrammatic view of the
overall network representing the various levels of the residual activation network.
As described above, each level of the network contains a representation of a portion
of the prediction, with a first network NET 1 providing the primary prediction and a
plurality of residual activation networks, NET 2 - NET K, that each represent a
successively finer portion of the prediction. The output of each of these networks is
added together. FIGURE 5 illustrates K of these networks, with each network being
comprised of an input layer, one or more hidden layers, and an output layer.
Each of the output layers is summed together in a single output layer 52 with a
linear interconnect pattern.

The input layer of all of the networks NET 1 - NET K is represented by a
single input layer 30 that receives the input vector x(t). Multiple input layers could
be utilized, one for each network. However, since the same input variables are
utilized, the number of input nodes is constant. It is only the weights in the
interconnect layers that will vary. Each network has the representation of the model
stored in the associated hidden layers and the associated weights connecting the
hidden layer to the input layer and the output layer. The primary network NET 1 is
represented by a hidden layer 32, which represents the gross prediction. The hidden
layer 32 is interconnected to an output layer 34 representing the output vector o1(t).
An interconnect layer 36 interconnects the input layer 30 to the hidden layer 32 with
an interconnect layer 38 connecting the hidden layer 32 to the output layer 34. The
interconnection 36, hidden layer 32 and the interconnect 38 provide the non-linear
mapping function from the input space defined by the input layer 30 to the output
space defined by the output layer 34. This mapping function provides the non-linear
model of the system at the gross prediction level, as will be described hereinbelow.

There are K-1 remaining residual networks, each having a hidden layer 40
with output layers 42 representing output vectors o2(t) through oK(t). The input
layer 30 is connected to each of the hidden layers 40 through a separate set of
interconnects 46 and the output layers 42 are each connected to the respective hidden
layer 40 through a separate set of interconnects 50. Each of the hidden layers 40
and their associated interconnects 46 and 50 provide a non-linear representation or
model of the residual as compared to the preceding prediction. For example, the
first residual network, labelled "NET 2", represents the residual of the predicted
output o1(t) in layer 34 as compared to the actual output y(t). In a similar manner,
each successive residual network represents the residue of the prediction from the
output layer prediction of the previous layers subtracted from y(t). Each of the
models represented by the networks between the input layer 30 and each of the
output layers 34 and 42 provides a non-linear mapping function. Each of the output
layers 34 and 42 is then mapped into a single output layer 52, representing the
predicted output oP(t), which is a linear mapping function, such that each output
node in each of the output layers 34 and 42 is directly mapped into a corresponding
node in layer 52 with a weight of "+1". This is a simple summing function.
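
The summing function can be written out directly. The sketch below assumes each of the K networks has already produced its output vector; the numerical values and the helper name `combined_prediction` are purely illustrative.

```python
import numpy as np

def combined_prediction(subnet_outputs):
    """Map each sub-network output node into the matching node of the
    final layer with a fixed weight of +1, i.e. sum the K output vectors."""
    stacked = np.stack(subnet_outputs)        # shape (K, number of output nodes)
    plus_one = np.ones(stacked.shape[0])      # the fixed "+1" weights
    return plus_one @ stacked

o1 = np.array([10.0, 2.0])      # gross prediction from NET 1 (illustrative values)
o2 = np.array([0.5, -0.3])      # prediction of the first residual from NET 2
o3 = np.array([-0.05, 0.02])    # prediction of the second residual from NET 3
o_p = combined_prediction([o1, o2, o3])
print(o_p)
```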

Referring now to FIGURE 6, there is illustrated a block diagram of the
procedure for training the networks and storing a representation in the respective
hidden layers and associated interconnection networks. Initially, the pattern y(t) is
provided as a time series output of a plant for a time series input x(t). The first
network, labelled "NET 1" is trained on the pattern y(t) as target values and then the
weights therein fixed. This pattern is represented in a layer 60 with an arrow
directed toward the hidden layer 32, representing that the hidden layer 32 is trained
on this pattern as the target. Once trained, the weights in hidden layer 32 and
associated interconnect layers 36 and 38 are frozen. The first network NET 1 is run
by exercising the network with the time series x(t) to generate a predicted output
o1(t). The output layer 34 is interconnected to a first residual layer 62 through a
linear interconnect layer 64 having fixed weights of "-1". Similarly, the block 60
represents an input layer to the residual output layer 62 with an interconnect layer 66
providing interconnection and having a fixed weight of "+1". Of course, any other
fixed weights could be utilized. Therefore, the residual output layer 62 represents
the first residue output r1(t) that constitutes the difference between the predicted
output o1(t) of the first network NET 1 and the target output y(t) or:
r_1(t) = y(t) - o_1(t)     (5)
which could be stated in general as:
r_k(t) = r_{k-1}(t) - o_k(t), where r_0(t) \equiv y(t)     (6)

Equations 5 and 6 represent the residual error. The residual of the kth
network is used to train the (k+1)th network; thus the first residual r1(t) is utilized
to train the second network, labelled "NET 2". In the training procedure, the value of r1(t) is
utilized as a target value with the input exercised with x(t). Once trained, the
weights in the hidden layer 40 and associated interconnect layers 46 and 50 are
frozen and then the network exercised with x(t) to provide a predicted output o2(t).
This training continues with the next residual network being trained on the residual
of the previous network as a target value. In this example, a residual r2(t) would
first be determined in a second residual layer 64, which has as its inputs the values
in the residual layer 62 interconnected to the second residual layer 64 through an
interconnect layer 68, having fixed weights of "+1" and also the output of the
output layer 42 interconnected through an interconnection layer 70, having fixed
weights of "-1". The residual r2(t) would be defined as follows:
r_2(t) = r_1(t) - o_2(t)     (7)
This residual in the second residual layer 64, would then be utilized to train the next
network illustrated in FIGURE 5. This would continue until sufficient resolution
had been obtained. Once the networks are trained, they are interconnected in
accordance with the structure of FIGURE 5, wherein the predicted output of all of
the networks would be added together in the layer 52.
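
The stage-by-stage procedure (train, freeze, compute the residual, train the next stage on that residual) can be sketched as follows. To keep the example short and deterministic, a least-squares fit on fixed features stands in for each trained neural network; that substitution, the `Stage` class, the feature choices and the synthetic time series are all assumptions made for illustration, not the networks of the preferred embodiment.

```python
import numpy as np

class Stage:
    """One level of the cascade. A least-squares fit on fixed features is a
    stand-in (assumption) for a network trained by back propagation."""
    def __init__(self, feature_fn):
        self.feature_fn = feature_fn
        self.coef = None

    def train(self, x, target):
        phi = self.feature_fn(x)
        self.coef, *_ = np.linalg.lstsq(phi, target, rcond=None)

    def predict(self, x):
        return self.feature_fn(x) @ self.coef

def train_cascade(stages, x, y):
    """Train each stage on the residual left by the stages before it."""
    residual = y.copy()                           # the first target is y(t) itself
    errors = []
    for stage in stages:
        stage.train(x, residual)                  # target is the previous residual
        residual = residual - stage.predict(x)    # e.g. r2(t) = r1(t) - o2(t)
        errors.append(float(np.sum(residual ** 2)))
    return errors

def predict_cascade(stages, x):
    """Overall prediction: the sum of all stage outputs."""
    return sum(stage.predict(x) for stage in stages)

# Synthetic series: slow drift, a sine wave and a high-frequency component.
t = np.arange(1000, dtype=float)
y = 0.01 * t + 2.0 * np.sin(2 * np.pi * t / 250) + 0.3 * np.sin(2 * np.pi * t / 10)

stages = [
    Stage(lambda u: np.column_stack([np.ones_like(u), u])),
    Stage(lambda u: np.column_stack([np.sin(2 * np.pi * u / 250), np.cos(2 * np.pi * u / 250)])),
    Stage(lambda u: np.column_stack([np.sin(2 * np.pi * u / 10), np.cos(2 * np.pi * u / 10)])),
]
errors = train_cascade(stages, t, y)
print(errors)
```

Each stage is fitted once and never revisited, which mirrors the freezing of weights before the next residual is computed.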

During training, typically, only a limited set of patterns is available. The
network is trained on only a portion of those patterns, with the remainder utilized
for generalization of the network. By way of example, assume that 1000
input/output patterns are available for training. During training of the first network,
only patterns representing time samples from 1 to 800 are utilized in the training
procedure, with patterns from 801 through 1000 utilized to test generalization of the
network to determine how accurate the prediction is. Whether or not the available
set of patterns is limited to reserve some for the purpose of generalization, patterns
not in the set are used to determine how accurate the prediction is. Table 1
illustrates the training procedure wherein the network labelled NET 1 is trained on
the actual output y(t). From this network, a predicted output can then be obtained
after the weights are fixed and then a residual calculated.
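
The 800/200 division of the available patterns can be illustrated as below. The data are synthetic, the helper names are hypothetical, and a least-squares linear fit is used as a stand-in for the trained network so that the example stays short.

```python
import numpy as np

def split_patterns(x, y, n_train=800):
    """Patterns 1..n_train are used for training; the rest test generalization."""
    return (x[:n_train], y[:n_train]), (x[n_train:], y[n_train:])

def sum_square_error(y, o):
    return float(np.sum((y - o) ** 2))

rng = np.random.default_rng(0)
x = rng.uniform(-1.0, 1.0, size=(1000, 3))       # 1000 patterns, 3 inputs (illustrative)
y = x @ np.array([0.5, -1.0, 2.0]) + 0.1         # synthetic target values
(train_x, train_y), (test_x, test_y) = split_patterns(x, y)

# Stand-in model (assumption): linear least squares fitted on the training patterns only.
phi_train = np.column_stack([train_x, np.ones(len(train_x))])
coef, *_ = np.linalg.lstsq(phi_train, train_y, rcond=None)

# Generalization check on patterns 801 through 1000, which were held out of training.
phi_test = np.column_stack([test_x, np.ones(len(test_x))])
test_error = sum_square_error(test_y, phi_test @ coef)
print(len(train_x), len(test_x), test_error)
```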

TABLE 1

| TIME | INPUT x(t)     | TARGET y(t)    | PREDICTED OUTPUT o(t) | RESIDUAL y(t) - o(t) = r1(t) |
|------|----------------|----------------|-----------------------|------------------------------|
| 1    | x1, x2, ... xn | y1, y2, ... ym | o11, o12, ... o1m     | r11, r12, ... r1m            |
| 2    | x1, x2, ... xn | y1, y2, ... ym | o11, o12, ... o1m     | r11, r12, ... r1m            |
| 3    | x1, x2, ... xn | y1, y2, ... ym | o11, o12, ... o1m     | r11, r12, ... r1m            |
| 4    | x1, x2, ... xn | y1, y2, ... ym | o11, o12, ... o1m     | r11, r12, ... r1m            |
| ...  | ...            | ...            | ...                   | ...                          |
| 800  | x1, x2, ... xn | y1, y2, ... ym | o11, o12, ... o1m     | r11, r12, ... r1m            |
| ...  | ...            | ...            | ...                   | ...                          |
| 1000 | x1, x2, ... xn | y1, y2, ... ym | o11, o12, ... o1m     | r11, r12, ... r1m            |

Table 2 illustrates the second step for training the network labelled NET 2,
representing the network trained on the first residual layer r1(t). This will result in
the predicted output o2(t). The residual of this network will be r2(t), which is
calculated by the difference between the predicted output and the target output.

TABLE 2

| TIME | INPUT          | TARGET r(t)       | PREDICTED OUTPUT  | RESIDUAL r(t) - o(t) = r2(t) |
|------|----------------|-------------------|-------------------|------------------------------|
| 1    | x1, x2, ... xn | r11, r12, ... r1m | o21, o22, ... o2m | r21, r22, ... r2m            |
| 2    | x1, x2, ... xn | r11, r12, ... r1m | o21, o22, ... o2m | r21, r22, ... r2m            |
| 3    | x1, x2, ... xn | r11, r12, ... r1m | o21, o22, ... o2m | r21, r22, ... r2m            |
| 4    | x1, x2, ... xn | r11, r12, ... r1m | o21, o22, ... o2m | r21, r22, ... r2m            |
| ...  | ...            | ...               | ...               | ...                          |
| 800  | x1, x2, ... xn | r11, r12, ... r1m | o21, o22, ... o2m | r21, r22, ... r2m            |
| ...  | ...            | ...               | ...               | ...                          |
| 1000 | x1, x2, ... xn | r11, r12, ... r1m | o21, o22, ... o2m | r21, r22, ... r2m            |

Plant Optimization/Control Using a
Residual-Activation Network

Referring now to FIGURE 7a, there is illustrated a block diagram of a
control system for optimization/control of a plant's operation in accordance with the
weights of the present invention. A plant is generally shown as a block 72 having
an input for receiving the control inputs c(t) and an output for providing the actual
output y(t) with the internal state variables s(t) being associated therewith. As will
be described hereinbelow, a plant predictive model 74 is developed with a neural
network to accurately model the plant in accordance with the function f(c(t),s(t)) to
provide an output op(t), which represents the predicted output of plant predictive
model 74. The inputs to the plant model 74 are the control inputs c(t) and the state
variables s(t). For purposes of optimization/control, the plant model 74 is deemed
to be a relatively accurate model of the operation of the plant 72. In an
optimization/control procedure, an operator independently generates a desired output
value od(t) for input to an operation block 78 that also receives the predicted output
op(t). An error is generated between the desired and the predicted outputs and input
to an inverse plant model 76 which is identical to the neural network representing
the plant predictive model 74, with the exception that it is operated by back
propagating the error through the original plant model with the weights of the
predictive model frozen. This back propagation of the error through the network is
similar to an inversion of the network with the output of the plant model 76
representing a Δc(t+1) utilized in a gradient descent operation illustrated by an
iterate block 77. In operation, the value Δc(t+1) is added initially to the input value
c(t) and this sum then processed through plant predictive model 74 to provide a new
predicted output op(t) and a new error. This iteration continues until the error is
reduced below a predetermined value. The final value is then output as the new
predicted control variables c(t+1).

This new c(t+1) value comprises the control inputs that are required to
achieve the desired actual output from the plant 72. This is input to a control
system 73, wherein a new value is presented to the system for input as the control
variables c(t). The control system 73 is operable to receive a generalized control
input which can be varied by the distributed control system 73. As will be described
in more detail hereinbelow, the original plant model 74 receives the variables s(t)
and the control input c(t), but the inverse plant model for back propagating the error
to determine the control variable determines these control variables independent of
the state variables, since the state variables cannot be manipulated. The general
terminology for the back propagation of error for control purposes is "Back
Propagation-to-Activation" (BPA).

In the preferred embodiment, the method utilized to back propagate the error
through the plant model 76 is to utilize a local gradient descent through the network
from the output to the input with the weights frozen. The first step is to apply the
present inputs for both the control variables c(t) and the state variables s(t) into the
plant model 74 to generate the predicted output op(t). A local gradient descent is
then performed on the neural network from the output to the input with the weights
frozen by inputting the error between the desired output od(t) and the predicted
output op(t) in accordance with the following equation:
$$\Delta c(t) = c(t+1) - c(t) = -\eta \, \frac{\partial \left(o^{d}(t) - o^{p}(t)\right)^{2}}{\partial c(t)} \qquad (7)$$
where η is an adjustable "step size" parameter. The output is then regenerated from
the new c(t), and the gradient descent procedure is iterated.
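
The iteration described above can be sketched in a few lines of Python. This is only an illustrative sketch under stated assumptions: the tiny network (two controls, one state, three tanh hidden units), its weight values, the step size and the stopping tolerance are all hypothetical and are not taken from the specification. The weights stay frozen, the state input is never changed, and only c(t) is moved along the negative gradient of the squared error.

```python
import math

# Hypothetical frozen weights of a tiny plant model:
# 2 controls + 1 state -> 3 tanh hidden units -> 1 output.
W_C = [[0.8, -0.5], [0.3, 0.9], [-0.6, 0.4]]  # hidden x control
W_S = [0.5, -0.2, 0.7]                         # hidden x state
B_H = [0.1, -0.1, 0.0]
W_O = [1.2, -0.7, 0.9]
B_O = 0.05


def predict(c, s):
    """Forward pass: returns the predicted output and the hidden activations."""
    h = [math.tanh(sum(w * ci for w, ci in zip(W_C[j], c)) + W_S[j] * s + B_H[j])
         for j in range(3)]
    return sum(w * hj for w, hj in zip(W_O, h)) + B_O, h


def grad_wrt_controls(c, s, o_desired):
    """Back-propagate the squared error to the control inputs only."""
    o_pred, h = predict(c, s)
    d_out = -2.0 * (o_desired - o_pred)             # d(od - op)^2 / d(op)
    grad = [0.0] * len(c)
    for j in range(3):
        d_pre = d_out * W_O[j] * (1.0 - h[j] ** 2)  # back through tanh
        for i in range(len(c)):
            grad[i] += d_pre * W_C[j][i]
    return grad, (o_desired - o_pred) ** 2


def iterate_controls(c, s, o_desired, eta=0.1, tol=1e-8, max_iter=5000):
    """Repeat c <- c - eta * gradient until the squared error falls below tol."""
    c = list(c)
    for _ in range(max_iter):
        grad, err = grad_wrt_controls(c, s, o_desired)
        if err < tol:
            break
        c = [ci - eta * gi for ci, gi in zip(c, grad)]
    return c


c_new = iterate_controls([0.0, 0.0], s=0.3, o_desired=0.5)
```

In this sketch the returned c_new plays the role of c(t+1); feeding it back through predict with the same state value gives an output close to the desired one.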

As will be described hereinbelow, the inverse plant model 76 utilizes a
residual activation network for the purposes of projecting out the dependencies of the
control variables on the state variables. In this manner, the network 76 will pay
appropriate attention to the control variables and control the plant in
the proper fashion.

Referring now to FIGURE 7c, there is illustrated an alternate embodiment of
the control system illustrated in FIGURES 7a and 7b. In FIGURE 7a, the control
operation is a dynamic one; that is, the control network will receive as input the
control variables and the state variables and also a desired output, and will output the
control variables needed to achieve that desired output. In the illustration of FIGURE 7c, a
conventional control network 83 is utilized that is trained on a given desired input
for receiving the state variables and control variables and generating the control
variables that are necessary to provide the desired outputs. The distinction between
the control network scheme of FIGURE 7b and the control network scheme of
FIGURE 7a is that the weights in the control network 83 of FIGURE 7b are frozen
and were learned by training the control network 83 on a given desired output. A
desired output is provided as one input for selecting between sets of weights. Each
internal set of weights is learned through training with a residual activation network
similar to that described above with respect to FIGURE 7a, with the desired output
utilized to select between the prestored and learned weights. The general operation
of control nets is described in W.T. Miller, III, R.S. Sutton and P.J. Werbos,
"Neural Networks for Control", The MIT Press, 1990, which reference is
incorporated herein by reference.

Another standard method of optimization involves a random search through
the various control inputs to minimize the square of the difference between the
predicted outputs and the desired outputs. This is often referred to as a Monte Carlo
search. This search works by making random changes to the control inputs and
feeding these modified control inputs into the model to get the predicted output. We
then compare the predicted output to the desired output and keep track of the best
set of control inputs over the entire random search. Given enough random trials, we
will come up with a set of control variables that produces a predicted output that
closely matches the desired output. For reference on this technique and associated,
more sophisticated random optimization techniques, see the paper by S. Kirkpatrick,
C.D. Gelatt, M.P. Vecchi, "Optimization by Simulated Annealing", Science, vol.
220, 671-680 (1983), which reference is incorporated herein by reference.
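
A minimal sketch of such a random search is given below. The plant_model function is a hypothetical stand-in for the trained predictive model, and the number of trials, the perturbation size and the choice to perturb around the best control set found so far are illustrative assumptions rather than details taken from the text.

```python
import random


def plant_model(c):
    """Hypothetical stand-in for the trained model: controls -> predicted output."""
    return c[0] ** 2 + 0.5 * c[1]


def monte_carlo_search(model, c_start, o_desired, trials=20000, step=0.5, seed=0):
    """Randomly perturb the controls and keep the best set seen so far."""
    rng = random.Random(seed)
    best_c = list(c_start)
    best_err = (o_desired - model(best_c)) ** 2
    for _ in range(trials):
        # Random change to the control inputs, drawn around the current best set.
        candidate = [ci + rng.uniform(-step, step) for ci in best_c]
        err = (o_desired - model(candidate)) ** 2
        if err < best_err:
            best_c, best_err = candidate, err
    return best_c, best_err


best_c, best_err = monte_carlo_search(plant_model, [0.0, 0.0], o_desired=2.0)
```

No gradient is needed here, so the same loop works with any model that can be evaluated, at the cost of many more model evaluations than gradient descent.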

Referring now to FIGURE 8, there is illustrated a block diagram of a
simplified plant that is operable to estimate the output y(t) = x(t+1) and give
proper control signals at time t to the c(t) input to keep the output y(t) at the desired
state, even though there is an external perturbation E(t). The network has available
to it information regarding s(t), c(t) and y(t). y(t) is related to the control vector
c(t) and the state variable vector s(t) by an equation f( ). This is defined as follows:
y(t) = f(c(t), s(t))
(In these equations, we ignore time delays for simplicity.)
This will be a relatively straightforward system to design by utilizing the neural
network to embody the non-linear function f( ). However, the state variable s(t) is
related to the control variable vector c(t) by another function fs as follows:
s(t) = fs(c(t))    (10)
As such, if this functional dependency is not taken into account, the network will not
possess the information to completely isolate the control input from the state variable
input during training, as sufficient isolation is not inherently present in the neural
network by the nature of the design of the neural network itself.

Referring now to FIGURE 9, there is illustrated a straightforward neural
network having three input nodes, each for receiving the input vectors y(t), s(t) and
c(t) and outputting y(t+1). The three input nodes are a node 86 associated with
y(t), a node 88 associated with s(t) and a node 90 associated with c(t). It should be
understood that each of the nodes 86-90 could represent multiple nodes for receiving
multiple inputs associated with each of the vectors input thereto. A single hidden
layer is shown having an interconnection matrix between the input nodes 86-90 and
a hidden layer with an output layer interconnected to the hidden layer. The output
layer provides the output vector y(t+1).

During training of the network of FIGURE 9, no provision is made for the
interdependence between s(t) and c(t) in accordance with the function fs( ), which is
illustrated in a block 91 external to the network. As such, during training through
such techniques as back propagation, problems can result. The reason for this is
that the inversion of the input/output function fs( ) is singular for correlated
variables. In this training, the network is initialized with random weights, and then
it randomly learns on an input pattern and a target output pattern, but this learning
requires it to pay attention to either the state variables or the control variables or
both. If it only pays attention to the state variable input, the network's control
answer is of the form "vary the state variable". However, the state variable is not a
variable that can be manipulated directly. It has to be related back to how to change
the controller. If this is a simple function, as defined by the function fs( ), it may be
a relatively easy task to accomplish. However, if it is a more complex dependency
that is not obvious to discern, there may be multi-variate non-linear functions of
these control inputs. In performing on-line control (where there is no human in the
loop), it is desirable to have the state information translated automatically to control
information.

According to the present invention, the neural network is configured such
that the interdependence between the control variables c(t) and the state variables
s(t) is properly modeled, with the neural network forced to pay attention to the
control variables during the learning stage. This is illustrated in FIGURE 9a,
wherein a network 89 is illustrated as having the state variables and control variables
isolated. Once isolated, the BPA operation will pay maximal attention to the control
variables. This is achieved by projecting out the dependencies of the control
variables on the state variables.

Referring now to FIGURE 10, the first step of building the neural network is
to model the function fs( ) as defined in Equation 10. A neural network is formed
having an input layer 96, a hidden layer 98 and an output layer 100. The input
layer receives as inputs the controls c(t) in the form of inputs c1, c2 ... cn, with the
output layer representing the predicted state variables sp(t), comprising the outputs
s1p, s2p, ... smp. The neural network of FIGURE 10 is trained by utilizing the state
variables as the target outputs with the control input c(t) and, with back propagation,
fixing the weights in the network to provide a representation of the function fs( ) of
Equation 10. This, therefore, represents a model of the state variables from the
control variables, which constitutes dependent or measured variables versus
independent or manipulated variables. This model captures any dependencies,
linear, non-linear or multivariate, of the state variables on the control variables. As
will be described hereinbelow, this is an intermediate stage of the network.
Although only a single hidden layer was shown, it should be understood that
multiple hidden layers could be utilized.
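
The training step for this state-prediction network can be sketched as follows. This is a minimal illustration only: the single control, single state, four tanh hidden units, learning rate, epoch count and synthetic data are all assumptions made for the example and do not come from the specification.

```python
import math
import random


def train_state_net(controls, states, hidden=4, eta=0.05, epochs=500, seed=0):
    """Fit s ~ fs(c) with a one-hidden-layer tanh net using back propagation."""
    rng = random.Random(seed)
    w1 = [rng.uniform(-0.5, 0.5) for _ in range(hidden)]  # input -> hidden
    b1 = [0.0] * hidden
    w2 = [rng.uniform(-0.5, 0.5) for _ in range(hidden)]  # hidden -> output
    b2 = 0.0
    for _ in range(epochs):
        for c, s in zip(controls, states):
            h = [math.tanh(w1[j] * c + b1[j]) for j in range(hidden)]
            s_pred = sum(w2[j] * h[j] for j in range(hidden)) + b2
            d_out = s_pred - s  # gradient of 0.5 * (s_pred - s)**2
            for j in range(hidden):
                d_pre = d_out * w2[j] * (1.0 - h[j] ** 2)  # back through tanh
                w2[j] -= eta * d_out * h[j]
                w1[j] -= eta * d_pre * c
                b1[j] -= eta * d_pre
            b2 -= eta * d_out

    def predict(c):
        # Weights are frozen from here on; this is the learned stand-in for fs( ).
        return sum(w2[j] * math.tanh(w1[j] * c + b1[j]) for j in range(hidden)) + b2

    return predict


# Synthetic, purely illustrative data: the state depends on the control plus a
# small unmeasured disturbance.
_rng = random.Random(1)
controls = [i / 10.0 - 2.0 for i in range(41)]
states = [1.5 * math.tanh(c) + _rng.gauss(0.0, 0.05) for c in controls]
state_net = train_state_net(controls, states)
```

Once training is finished, the returned predict function is used with its weights fixed, which corresponds to the frozen state-prediction model referred to in the text.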

Referring now to FIGURE 11, there is illustrated the next step in building
the residual activation network. A residual output layer 102 is provided for
generating the residual states sr(t). The residual states in layer 102 are derived by a
linear mapping function of the predicted states sp(t) into the residual state layer 102
with fixed weights of "-1", and also linearly mapping the input state variables s(t)
from an input layer 104 into the residual layer 102, with the states in the layer 104
being termed the actual states sa(t). The linear mapping function has fixed weights
of "+1". Therefore, the residual state layer would have the following relationship:
sr(t) = sa(t) - sp(t)

The residual states sr(t) in layer 102 are calculated after the weights in the
network labelled NET 1 are frozen. This network is referred to as the "state
prediction" net. The values in the residual layer 102 are referred to as the "residual
activation" of the state variables. These residuals represent a good estimation of the
external variables that affect the plant operation. This is important additional
information for the network as a whole, and it is somewhat analogous to noise
estimation in Wiener and Kalman filtering, wherein the external perturbations can
be viewed as noise and the residuals are the optimal (non-linear) estimate of this
noise. However, the Kalman filters are the optimal linear estimators of noise, as
compared to the present system which provides a non-linear estimator of external
influences.
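
As a small illustration of the residual layer, the fixed "+1" and "-1" linear mappings amount to an element-wise difference between the actual and predicted states. The numeric values below are hypothetical and serve only to show the computation.

```python
def residual_layer(s_actual, s_predicted):
    """Residual activation: fixed weight +1 on actual states, -1 on predicted states."""
    w_actual, w_predicted = 1.0, -1.0
    return [w_actual * sa + w_predicted * sp
            for sa, sp in zip(s_actual, s_predicted)]


s_a = [350.0, 2.10]  # hypothetical measured states sa(t)
s_p = [347.5, 2.25]  # hypothetical state-prediction net outputs sp(t)
s_r = residual_layer(s_a, s_p)
```

Because both weights are fixed, nothing in this layer is learned; it simply passes on whatever part of the measured state the state-prediction net could not explain from the controls.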

Referring now to FIGURE 12, there is illustrated the next step in building
the network, wherein the overall residual network is built. The output of the
residual layer 102 sr(t) represents f(E(t)), where E(t) comprises the extraneous
inputs that cannot be measured. Such extraneous inputs could be feed stock
variations of chemical processes, etc. The overall residual network is comprised of
a network wherein the inputs are the control inputs c(t) and the residual sr(t).
Therefore, the input layer 96 and the input layer 104 are mapped into an output
layer 106, with a hidden layer 108. The hidden layer 108 is interconnected to
the residual layer 102 through an interconnection network 110 and interconnected to
the input layer 96 through an interconnection network 112. The hidden layer 108
could also be mapped to the output layer, although not shown in this embodiment.
Layer 108 is mapped into output 106 through interconnection network 114.
Therefore, the mapping of both the control input layer 96 and the residual layer 102
to the output layer 106 provides a non-linear representation, with this non-linear
representation trained on a desired output pattern with the input comprising the
control input pattern c(t) and the residual states sr(t). An important aspect of the
present invention is that, during back propagation of the error through BPA, in
accordance with the optimization/control configuration illustrated in FIGURE 7a, the
network effectively ignores the state variables and only provides the c(t+1)
calculation via model inversion (BPA). Since the residuals are functions that do not
change when the control changes, i.e., they are external parameters, these should
not change during the prediction operation. Therefore, when the prediction of the
control changes is made, the residual states are effectively frozen with a latch 113
that is controlled by a LATCH signal. The procedure for doing this is to initially
input the control c(t) and state variables s(t) into the input layer 96 and input layer
104, respectively, to generate the predicted output op(t). During this operation, the
values in the residual layer 102 sr(t) are calculated. The latch is set and these values
are then clamped for the next operation, wherein the desired output od(t) is
generated and the error between the desired output and the predicted output is then
propagated back through the network in accordance with Equation 7. The back
propagation of this error is then directed only toward the controls. The controls are
then changed according to gradient descent, control nets, or one of the other
methods described hereinabove with reference to FIGURE 7a, completing one cycle
in the BPA process. These cycles continue with the sr(t) now latched, until the
output reaches a desired output or until a given number of BPA iterations has been
achieved. This procedure must be effected for each and every input pattern and the
desired output pattern.
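
The latched BPA cycle can be sketched as follows for a single control and a single state. Both networks are replaced here by hypothetical closed-form functions with frozen parameters, and the step size, tolerance and iteration limit are likewise assumptions for illustration. The point of the sketch is that the residual is computed once, held fixed, and only the control is updated.

```python
import math


def state_net(c):
    """Hypothetical frozen state-prediction net: predicts the state from the control."""
    return 2.0 * c + 1.0


def main_net(c, s_r):
    """Hypothetical frozen main network: output from the control and the residual."""
    return 3.0 * math.tanh(0.6 * c) + 0.8 * s_r


def d_main_d_control(c):
    """Partial derivative of main_net with respect to c, residual held fixed."""
    return 3.0 * 0.6 * (1.0 - math.tanh(0.6 * c) ** 2)


def bpa_with_latch(c, s, o_desired, eta=0.05, tol=1e-10, max_iter=10000):
    """Iterate the control by gradient descent while the residual stays latched."""
    latched = s - state_net(c)  # residual computed once, then latched
    for _ in range(max_iter):
        err = o_desired - main_net(c, latched)
        if err * err < tol:
            break
        grad = -2.0 * err * d_main_d_control(c)  # d(err^2)/dc
        c -= eta * grad                          # only the control is changed
    return c, latched


c_next, s_r = bpa_with_latch(c=0.2, s=1.7, o_desired=1.5)
```

Had the residual been recomputed inside the loop from the changing control, the error would again be routed partly through the state path, which is what the latch is intended to prevent.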

By freezing the values in the residual state sr(t), the dependencies of the
controls on the state variables have been projected out of the BPA operation.
Therefore, the residual-activation network architecture will be assured of directing
the appropriate attention to the controls during the BPA operation to generate the
appropriate control values that can help provide an input to the distributed control
system that controls the plant.

By way of example, if one of the controls is a furnace valve, and one of the
states is a temperature, it will be appreciated that these are highly correlated
variables, such that the prediction of the temperature from the control in NET
1, represented by input layer 96, hidden layer 98 and output layer 100, would be
quite accurate. Hence, when the predicted temperature is subtracted from the
actual temperature of a state variable 1, the residual is quite small. Thus, any
control signal will go directly to the control and not to the state, constituting a
significant benefit of the present invention. Additionally, the residual is, in fact,
that part of the temperature that is not directly dependent on the controls, e.g. due to
the ambient air temperature, humidity, or other external influences. When the
prediction network is built, the outputs will now be a direct function of the controls
and possibly these external variations, with the residual activation network of the
present invention compensating for external perturbations, via a non-linear
estimation of these perturbations.

Referring now to FIGURE 13, there is illustrated a block diagram of a
chaotic plant. In this example, the task is to estimate y(t+1) and give the proper
control signal at time t to c(t) to keep the output x(t) at the desired state, even
though there is an external perturbation E(t). However, it should be understood that
the neural network model does not directly receive information about E(t). The
residual activation network that receives the inputs c(t), s(t) and y(t) and outputs the
predicted value y(t+1) while receiving the desired output, with the error propagated
back through the network to generate the full values is illustrated in FIGURE 14.
The output variables y(t) are functions of the control variables c(t), the measured
state variables s(t) and the external influences E(t), which can be stated as follows:
y(t) = f(c(t),s(t),E(t)).
The function f( ) is assumed to be some uncomplicated non-linear unknown function
which is to be modeled by the network. The task is to obtain the best approximation
of this function f( ) by learning from measured data. The assumption is made that
the measured state variables s(t) are some other unknown function of the controls
c(t) and the external perturbations E(t) which would have the following relationship:
s(t) = fs(c(t), E(t)).
The function fs( ) represents the non-linear unknown function of the dependency of
the state variables s(t) on both the control variables c(t) and the external
perturbations E(t). Without loss of generality, this function can be expanded in the
following form:
fs(c(t), E(t)) = fc(c(t)) + fE(E(t)) + fcE(c(t), E(t)) + ...
where fc( ) depends only on c(t) and fE( ) depends only on E(t).

It is assumed that the magnitude of fc( ) and fE( ) are large compared to the
higher order terms, fcE( ) + ...; most of the dependencies of the states on the
controls can be projected out by learning the states from the controls. The state-variables
prediction can be written as a function of the controls, sp(c(t)) = fps(c(t)).
It is also assumed that the external variations in the controls are not highly
correlated, hence the learned function fps(c(t)) will be very close to fc(c(t)), since
this is assumed to be the dominant term in the equation. Thus, the following
approximate equality will exist:
fps(c(t)) = fc(c(t)) + ε(c(t), E(t)) ≈ fc(c(t))
where the error ε is small compared to fE(E(t)).

Since the predicted model is fps(c(t)), the residuals can then be calculated as
follows:
r(E(t), c(t)) = s(t) - sp(t)
Substituting, the following is obtained:
r(E(t), c(t)) = fc(c(t)) + fE(E(t)) + fcE(c(t), E(t)) + ... - fc(c(t)) - ε(c(t), E(t))
Reducing this, the following relationship will be obtained:
r(E(t), c(t)) = fE(E(t)) + fcE(c(t), E(t)) + ... - ε(c(t), E(t))
The c(t) and E(t) dependencies are then grouped into a single term η(c(t), E(t)) as
follows:
r(E(t), c(t)) = fE(E(t)) + η(c(t), E(t))
where, by the above assumptions, η(c(t), E(t)) is expected to be smaller in
magnitude as compared to fE(E(t)).
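
The derivation above can be checked numerically on synthetic data. In the sketch below every function and constant is a hypothetical choice made for illustration: fc(c) = 2c, fE(E) = 1.5E, a small cross term 0.05cE, and an ordinary least-squares line standing in for the state-prediction network. The residual is computed from the controls and measured states only, and is then compared with the disturbance term that the model never saw.

```python
import random


def pearson(a, b):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    va = sum((x - ma) ** 2 for x in a)
    vb = sum((y - mb) ** 2 for y in b)
    return cov / (va * vb) ** 0.5


rng = random.Random(0)
n = 2000
c = [rng.uniform(-1.0, 1.0) for _ in range(n)]  # controls
E = [rng.uniform(-1.0, 1.0) for _ in range(n)]  # unmeasured external perturbation
f_E = [1.5 * e for e in E]                      # hypothetical fE(E)
# s = fc(c) + fE(E) + small cross term fcE(c, E)
s = [2.0 * ci + fe + 0.05 * ci * e for ci, fe, e in zip(c, f_E, E)]

# Stand-in for the state-prediction net: least-squares line sp(c) = a*c + b,
# fitted from (c, s) pairs only.
mc, ms = sum(c) / n, sum(s) / n
a = sum((ci - mc) * (si - ms) for ci, si in zip(c, s)) / sum((ci - mc) ** 2 for ci in c)
b = ms - a * mc
residual = [si - (a * ci + b) for ci, si in zip(c, s)]

corr = pearson(residual, f_E)
```

Under these assumptions the residual is expected to track fE(E) closely, since the cross term and the fitting error are small compared with fE(E).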

In the above manner, the majority of the dependencies of the state variables
on the controls have been projected out of the network operations, but the useful
information that is captured by the measured state variables, and that implicitly
contains the external disturbances, is not discarded. Note that since the neural
network learning state variable predictions can learn non-linear functions, this is a
fully general non-linear projection to f(c(t)). Furthermore, by calculating the
residuals, an excellent estimation of the external variations has been provided.

The residuals in the above described example were calculated via a simple
subtraction. However, multiplicative and higher-order terms could exist in the
expansion and, as such, another projection operator would be required to capture
these terms. To achieve this, we would examine the term η(c(t), E(t)) in a manner
totally analogous to the previous term. That is, whereas the first-order dependencies
of the control variables were subtracted, the same methodology can be applied to
capture the higher-order terms. As an example, consider the term η(c(t),E(t)) which
has no first-order dependencies on c(t) and E(t), such that the next highest order is
second-order. The function can be written in the following form:
$$\eta(c,E) = A\,\eta_{c}(c)\,\eta_{E}(E) + B\,[c^{3};\ c^{2}E;\ cE^{2};\ E^{3}] + \dots$$
Whereas these dependencies cannot be separated term-by-term as described above,
the higher-order information can be provided, for example, by dividing η(c(t), E(t))
by the actual states. This, together with the subtraction (above), will provide two
independent estimates of the external perturbation, and the neural network can build
a better model from the combination of these estimates. An example of this
architecture is illustrated in FIGURE 15. The same higher-order generalizations can
be applied for the prediction residual activation networks, namely taking divisions,
etc., of the activations before further modeling.

In summary, there has been provided a residual activation network that
allows dependencies of the controls on the state variables to be projected out. Once
projected out, Back Propagation-to-Activation control can be utilized to achieve
control and be assured that the network pays appropriate attention to the controls.
The network is comprised of two networks, a first network for modeling the
dependencies of the state variables on the controls and developing a residual value.
The control inputs and residual values are then input to a second network to provide
a predicted output for the plant. A desired output is then determined and combined
with the predicted output for a given set of input control variables in order to
generate an error. This error is back propagated through the control network with
the predicted model therein frozen. Further, this back propagation of error is
performed with the residual values frozen, such that only the control inputs are
varied. This procedure is iterative. The resulting control inputs are then input to
the plant control system to effect changes in the input to the plant to achieve the
desired output.

Although the preferred embodiment has been described in detail, it should be
understood that various changes, substitutions and alterations can be made therein
without departing from the spirit and scope of the invention as defined by the
appended claims. For example, instead of BPA, the residual net can be inverted via
control nets as described in FIGURE 7a or via a Monte-Carlo Search through the
space of control inputs until the desired output is achieved, or through simulated
annealing of the inputs, or any combination thereof.

Claims (17)

A control system for controlling a plant (72), the system
having plant control inputs for receiving plant control variables (c(t)),
measurable state variables (s(t)) of the plant and desired plant outputs
(OD(t)), the measurable state variables (s(t)) having dependencies on the
plant control variables (c(t)) and unmeasurable external influences on the
plant, the system comprising:

a control network input (83) for receiving as network inputs
the current plant control variables (c(t)), the measurable state variables
(s(t)) and desired plant outputs (OD(t));

a control network output (82), for outputting predicted
plant control variables (c(t+1)) necessary to achieve the desired plant
outputs;

a processing system (74, 76, 77 and 78) for processing the
received plant control variables (c(t)) through an inverse representation
(76) of the plant (72) that represents the dependencies of the plant
output on the plant control variables (c(t)) and the measurable state
variables (s(t)) parameterized by an estimation of the unmeasurable
external influences to provide the predicted plant control variables
(c(t+1)) to achieve the desired plant outputs (OD(t)); and

an interface device (73) for inputting the predicted plant
control variables that are output by said control network output to the
plant as plant control variables (c(t+1)) to achieve the desired plant
outputs.

A control system as claimed in Claim 1, wherein said
processing system comprises:

an estimation network (76, 78) for estimating the
unmeasurable external influences on the plant (72) and output estimated
external influences; and

means for parameterizing the inverse representation of the
plant with the estimated influences.

A control system as claimed in Claim 1 or Claim 2, wherein
the inverse representation of said processing system is a general non-linear
inverse representation.

A control system as claimed in any preceding claim,
wherein the control variables are variables that can be manipulated.

A control network system as claimed in any of Claims 2 to
4, wherein said processing system comprises:

a first intermediate output (102) for providing a predicted
plant output;

a first intermediate processing system (76) for receiving the
plant control variables (c(t)) from said control network input and the
estimated external influences from said estimation network for
processing through a predictive model of the plant to generate the
predicted plant outputs for output from said intermediate output;

an error generation device (78) for comparing the
predicted plant outputs to the desired plant outputs and generating an
error representing the difference therebetween;

a second intermediate processing system (76) for
processing the error through the inverse representation of the plant that
represents the dependencies of the plant output on the plant control
variables (c(t)) and the measurable state variable (s(t)) parameterized by
the estimated unmeasurable external influences to output predicted
control variable change values; and

a control system (73) for inputting said predicted control
variable change values to the input of said first intermediate processing
system for summing with the control variable input to provide a summed
control variable value, and processing the summed control variable
through said first processing system to minimize said error and output
the summed control variable value (c(t+1)) as the predicted control
variables.

A control network system as claimed in Claim 5, wherein
said second intermediate processing system (76) comprises:

a neural network having an input layer (96) for receiving
said error;

an output layer (100) for providing the predicted output of
the plant;

a hidden layer (98) for mapping said input layer to said
output layer through an inverse representation of the plant that
represents the dependencies of the plant output on the plant control
variables and the measurable state variables (s(t)) parameterized by the
unmeasurable external influences to provide as an output from the
output layer the control variable change values.

A control system as claimed in Claim 5 or Claim 6, wherein
said control system utilizes a gradient descent procedure to minimize
said error.

A control system as claimed in Claim 5 or Claim 6, wherein
said control system utilizes a Monte Carlo technique to minimize said
error.

A control system as claimed in any of Claims 5 to 8,
wherein said second intermediate processing system and said estimation
network comprise:

a residual activation neural network having:

a residual neural network for receiving at inputs in an input
layer (96, 104) the plant control variables (c(t)) and non-manipulatable
plant state variables (s(t)) dependent on the plant control variables, and
mapping the received plant control variables through a hidden layer
(108) to an output layer (106), the hidden layer (108) having a
representation of the dependencies of the measurable plant state
variables (s(t)) on the plant control variables (c(t)) to provide as an
output from said output layer predicted state variables,

a residual layer (102) for determining as a residual the
difference between the plant state variables and the predicted state
variables as an estimation of the external influences on the plant, and

a latch (113) for latching said residual determined in said
residual layer (102) after determination thereof; and
a main neural network having:

an input layer (96) for receiving the plant control variables
(c(t)) and said latched residual,

an output layer (100) for outputting a predicted plant
output, and

a hidden layer (98) for mapping said input layer to said
output layer through a representation of the plant as a function of the
plant control variable inputs and said latched residual, said main neural
network operating in an inverse mode to receive at the output layer said
error and back propagate said error through said hidden layer to said
input layer with said residual latched in said latch to output from said
input layer said predicted control variable change values.

A method of controlling a plant (72) having measurable
state variables (s(t)) and plant control inputs for receiving plant control
variables (c(t)) and desired plant outputs (OD(t)), the measurable state
variables (s(t)) being a function of the plant control variables (c(t)) and
unmeasurable external influences on the plant,

the method comprising the steps of:

receiving the current plant control variables (c(t)) and
desired plant outputs (OD(t));

processing the received plant control variables through an
inverse representation (76) of the plant (72) that represents the
dependencies of the plant output on the plant control variables (c(t)) and
the measurable state variables (s(t)) parameterized by an estimation of
the unmeasurable external influences to provide the predicted plant
control variables necessary to achieve the desired plant outputs (OD(t));

outputting as an output the predicted plant control variables
necessary to achieve the desired plant outputs (OD(t)); and

controlling the plant with the predicted plant control
variables.

A method as claimed in Claim 10, wherein the inverse
representation of the processing system is a general non-linear inverse
representation.

A method as claimed in Claim 10 or 11, wherein the control
variables are variables that can be manipulated.

A method as claimed in any of Claims 11 to 12, comprising:

estimating the unmeasurable external influences on the
plant as estimated external influences; and

parameterizing the inverse representation of the plant with
the estimated external influences.

A method as claimed in Claim 13, wherein the step of
processing comprises:

processing in a first intermediate processing step the plant
control variables and the unmeasurable estimated external influences
through a predictive model of the plant to generate the predicted plant
outputs for output from an intermediate output (102);

comparing the predicted plant outputs to the desired plant
outputs (OD(t)) and generating an error representing the difference
therebetween; and

processing in a second intermediate processing step the
error through the inverse representation of the plant that represents the
dependencies of the plant output on the plant control variables and the
measurable state variables (s(t)) parameterized by the estimated
unmeasurable external influences to output predicted control variable
change values; and

changing the input control variables to the first intermediate
step by the control variable change values to provide the predicted plant
control variables.

A method as claimed in Claim 14, wherein the second
intermediate processing step comprises:

receiving on an input layer (104) of a neural network the error;

mapping the neural network input layer to a neural network
output layer (106) through a neural network hidden layer (108) having
stored therein a local representation of the plant; and

operating the neural network in an inverse relationship
wherein the error is received as an input in the output layer and
propagated through the hidden layer having a local inverse
representation of the plant that represents the dependencies of the plant
output on the plant control variables and the measurable state variables
(s(t)) parameterized by the estimate of unmeasurable external influences
to provide as an output from the neural network input layer the
predicted plant control variable change values, wherein the error is back
propagated through the neural network hidden layer to the neural
network input layer.

A method as claimed in Claim 14 or Claim 15, wherein the
first intermediate processing step includes the step of estimating and
comprises:

receiving the plant control variables (c(t)) on an input layer
to a residual neural network and mapping the received plant control
variables (c(t)) to a residual neural network output layer (106) through a
hidden layer (108), the hidden layer (108) having a representation of the
dependencies of non-manipulatable plant state variables (s(t)) on the
plant control variables (c(t)) to provide from the output layer predicted
state variables as a function of the plant control variables, the residual
comprising the estimation of the external influences;

determining as a residual the difference between the plant
state variables (s(t)) and the predicted state variables;

latching the determined residual after determination thereof;

receiving the plant control variables and the latched residual
on an input layer (96) of a main neural network; and

mapping the input layer (96) of the main neural network to
an output layer (100) of the main neural network through a main neural
network hidden layer (98) having stored therein a representation of the
plant as a function of the plant control variable inputs and the residual,
to output from the output layer the predicted plant outputs.

A method as claimed in Claim 16, wherein the step of
changing the input control variables comprises iteratively changing the
input control variables by summing with the predicted control variable
change values to minimize the error in accordance with a gradient
descent technique.