
Neural networks and fuzzy systems have some things in common. Both can be used to solve a problem (e.g., pattern recognition, regression or density estimation) when no mathematical model of the problem exists. Each has its own advantages and disadvantages, and the disadvantages almost completely disappear when the two concepts are combined.

Neural networks can only come into play if the problem is expressed by a sufficient number of observed examples. These observations are used to train the black box. On the one hand, no prior knowledge about the problem needs to be given. On the other hand, it is not straightforward to extract comprehensible rules from the neural network's structure.

In contrast, a fuzzy system demands linguistic rules as prior knowledge instead of learning examples. Furthermore, the input and output variables have to be described linguistically. If the knowledge is incomplete, wrong or contradictory, then the fuzzy system must be tuned. Since there is no formal approach for this, the tuning is performed heuristically, which is usually very time-consuming and error-prone.

It is desirable for fuzzy systems to have an automatic adaptation procedure comparable to that of neural networks. As can be seen in Table 1, combining both approaches should unite their advantages and exclude their disadvantages.

Characteristics

Compared to a common neural network, the connection weights and the propagation and activation functions of fuzzy neural networks differ considerably. Although there are many different approaches to modeling a fuzzy neural network (Buckley and Hayashi, 1994, 1995; Nauck and Kruse, 1996), most of them agree on certain characteristics such as the following:

Figure 1: The architecture of a neuro-fuzzy system

A neuro-fuzzy system based on an underlying fuzzy system is trained by means of a data-driven learning method derived from neural network theory. This heuristic only takes into account local information to cause local changes in the fundamental fuzzy system.

It can be represented as a set of fuzzy rules at any time of the learning process, i.e., before, during and after.

Thus the system might be initialized with or without prior knowledge in terms of fuzzy rules.

The learning procedure is constrained to ensure the semantic properties of the underlying fuzzy system.

A neuro-fuzzy system approximates an n-dimensional unknown function which is partly represented by training examples.

Fuzzy rules can thus be interpreted as vague prototypes of the training data.

A neuro-fuzzy system is represented as a special three-layer feedforward neural network, as shown in Figure 1.

The first layer corresponds to the input variables.

The second layer symbolizes the fuzzy rules.

The third layer represents the output variables.

The fuzzy sets are encoded as (fuzzy) connection weights.

Some approaches also use five layers where the fuzzy sets are encoded in the units of the second and fourth layer, respectively. However, these models can be transformed into a three-layer architecture.
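As a rough illustration of the three-layer view, the following Python sketch propagates crisp inputs through a rule layer whose incoming connections carry fuzzy sets. It is only a minimal sketch under several assumptions that are not taken from the text: triangular membership functions, the minimum as conjunction, crisp (singleton) rule conclusions and a weighted average as output. The names `triangular`, `forward`, `low` and `high` are hypothetical.

```python
def triangular(left, peak, right):
    """Return a triangular membership function (assumes left < peak < right)."""
    def mu(x):
        if x <= left or x >= right:
            return 0.0
        if x <= peak:
            return (x - left) / (peak - left)
        return (right - x) / (right - peak)
    return mu


def forward(inputs, rules):
    """Propagate crisp inputs through the rule layer to a single output.

    inputs: dict mapping variable name -> crisp value (first layer)
    rules:  list of (antecedents, conclusion) pairs (second layer), where
            antecedents maps variable name -> membership function, i.e. the
            fuzzy "weight" on the connection from that input to the rule
    """
    numerator = 0.0
    denominator = 0.0
    for antecedents, conclusion in rules:
        # Rule activation: minimum of the membership degrees on its inputs.
        activation = min(mu(inputs[name]) for name, mu in antecedents.items())
        numerator += activation * conclusion
        denominator += activation
    # Third layer: weighted average of the conclusions (assumed defuzzifier).
    # None signals that no rule fired for this input.
    return numerator / denominator if denominator > 0.0 else None


low = triangular(-1.0, 0.0, 1.0)
high = triangular(0.0, 1.0, 2.0)
rules = [
    ({"x": low, "y": low}, 0.0),    # if x is low and y is low then output 0
    ({"x": high, "y": high}, 1.0),  # if x is high and y is high then output 1
]
output = forward({"x": 0.25, "y": 0.5}, rules)
```

Because the rule layer is stored as a plain list of antecedent sets and conclusions, the network can be read back as a set of fuzzy rules at any point, which is the property the characteristics above emphasize.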

One can basically distinguish between three different kinds of fuzzy neural networks, i.e., cooperative, concurrent and hybrid FNNs (Nauck et al., 1997).

Cooperative Fuzzy Neural Network

Figure 2: Different cooperative fuzzy neural networks

In cooperative neuro-fuzzy systems, the artificial neural network (ANN) and the fuzzy system work independently of each other. The ANN tries to learn the parameters of the fuzzy system. This can be performed either offline or online while the fuzzy system is applied. Figure 2 depicts four different kinds of cooperative fuzzy neural networks.

The upper left fuzzy neural network learns fuzzy sets from given training data. This is usually performed by fitting membership functions with a neural network. The fuzzy sets are thus determined offline. They are then used to form the fuzzy system together with fuzzy rules that are also given rather than learned.

The upper right neuro-fuzzy system determines fuzzy rules from training data by means of a neural network. Here as well, the neural network learns offline before the fuzzy system is initialized. The rule learning is usually done by clustering on self-organizing feature maps (Bezdek et al., 1992; Vuorimaa, 1994). It is also possible to apply fuzzy clustering methods to obtain rules.

In the lower left neuro-fuzzy model, the system learns all membership function parameters online, i.e., while the fuzzy system is applied. Thus the fuzzy rules and membership functions must be defined beforehand. Moreover, the error has to be measured in order to improve and guide the learning step.

The lower right one determines rule weights for all fuzzy rules by means of a neural network. This can be done either online or offline. A rule weight is interpreted as the influence of a rule (Kosko, 1992) and is multiplied with the rule output. Nauck et al. (1997) argue that the semantics of rule weights are not clearly defined. The weights could be replaced by modified membership functions. However, this could destroy the interpretation of the fuzzy sets. Moreover, identical linguistic values might be represented differently in different rules.
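The concern about rule weights can be made concrete with a small Python sketch. It assumes a triangular fuzzy set for a hypothetical linguistic value "warm" that is used in two rules with weights 1.0 and 0.5; the helper `absorb_weight` is an illustrative name that folds a rule weight into the membership function. None of these names or numbers come from the cited works.

```python
def triangular(left, peak, right):
    """Return a triangular membership function (assumes left < peak < right)."""
    def mu(x):
        if x <= left or x >= right:
            return 0.0
        if x <= peak:
            return (x - left) / (peak - left)
        return (right - x) / (right - peak)
    return mu


def absorb_weight(mu, weight):
    """Fold a rule weight into a membership function by scaling it."""
    return lambda x: weight * mu(x)


warm = triangular(0.0, 1.0, 2.0)

# The same linguistic value used in two rules with different rule weights.
warm_in_rule_1 = absorb_weight(warm, 1.0)
warm_in_rule_2 = absorb_weight(warm, 0.5)

# Height (maximum membership degree) of each version on a coarse grid.
grid = [i / 10 for i in range(21)]
heights = [max(f(x) for x in grid) for f in (warm_in_rule_1, warm_in_rule_2)]
```

In this sketch the second version of "warm" only reaches a height of 0.5, so it is no longer a normal fuzzy set, and the two rules now disagree about what "warm" means.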

Hybrid Fuzzy Neural Network

Figure 3: A hybrid fuzzy neural network

Hybrid neuro-fuzzy systems (NFS) are homogeneous and usually resemble neural networks. Here, the fuzzy system is interpreted as a special kind of neural network. The advantage of such a hybrid NFS is its architecture, since the fuzzy system and the neural network no longer have to communicate with each other. They are one fully fused entity. These systems can learn online and offline. Figure 3 shows such a hybrid FNN.

The rule base of a fuzzy system is interpreted as a neural network. Fuzzy sets can be regarded as weights whereas the input and output variables and the rules are modeled as neurons. Neurons can be included or deleted in the learning step. Finally, the neurons of the network represent the fuzzy knowledge base. Obviously, the major drawbacks of both underlying systems are thus overcome.

In order to build a fuzzy controller, membership functions which express the linguistic terms of the inference rules have to be defined. Fuzzy set theory offers no formal approach for defining these functions. Any shape (e.g., triangular, Gaussian) with an arbitrary set of parameters can be considered as a membership function. Thus optimizing these functions so that they generalize the data is very important for fuzzy systems. Neural networks can be used to solve this problem.

Once a distinct shape of the membership functions is fixed, say triangular, the neural network must optimize their parameters by gradient descent (Nomura et al., 1992). Thus, aside from information about the shape of the membership functions, training data must be available as well.
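The following Python sketch shows what gradient descent on triangular membership functions can look like. It is not the procedure of the cited work but a minimal sketch under assumptions made here: one input variable, symmetric triangles described by a center and a width, crisp rule conclusions, a weighted-average output and a squared-error loss. All function names, the toy data and the learning rate are hypothetical.

```python
def tri(x, center, width):
    """Symmetric triangular membership: 1 at the center, 0 beyond width / 2."""
    return max(0.0, 1.0 - 2.0 * abs(x - center) / width)


def predict(x, centers, widths, consequents):
    """Weighted average of crisp conclusions; also returns intermediate values."""
    mus = [tri(x, a, b) for a, b in zip(centers, widths)]
    total = sum(mus)
    if total == 0.0:
        return 0.0, mus, total  # no rule fires
    y = sum(m * w for m, w in zip(mus, consequents)) / total
    return y, mus, total


def mse(data, centers, widths, consequents):
    errors = [(predict(x, centers, widths, consequents)[0] - t) ** 2 for x, t in data]
    return sum(errors) / len(errors)


def train_epoch(data, centers, widths, consequents, lr=0.1):
    """One pass of per-example gradient descent on 0.5 * (y - t) ** 2."""
    for x, t in data:
        y, mus, total = predict(x, centers, widths, consequents)
        if total == 0.0:
            continue
        err = y - t
        for i, mu in enumerate(mus):
            dy_dw = mu / total                     # d y / d consequent_i
            dy_dmu = (consequents[i] - y) / total  # d y / d mu_i
            if mu > 0.0:
                sign = (x > centers[i]) - (x < centers[i])
                dmu_da = 2.0 * sign / widths[i]
                dmu_db = 2.0 * abs(x - centers[i]) / widths[i] ** 2
            else:
                dmu_da = dmu_db = 0.0              # flat region of the triangle
            consequents[i] -= lr * err * dy_dw
            centers[i] -= lr * err * dy_dmu * dmu_da
            # Keep the width positive so the triangle stays well defined.
            widths[i] = max(1e-3, widths[i] - lr * err * dy_dmu * dmu_db)


# Toy data sampled from the target function t = x on [0, 1].
data = [(k / 10, k / 10) for k in range(11)]
centers, widths, consequents = [0.0, 1.0], [2.0, 2.0], [0.0, 0.5]

initial_loss = mse(data, centers, widths, consequents)
for _ in range(50):
    train_epoch(data, centers, widths, consequents)
final_loss = mse(data, centers, widths, consequents)
```

The update only needs the fixed shape (here the formula in `tri`) and the training pairs, which mirrors the two requirements named above.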

To guarantee the characteristics of a fuzzy system, the learning algorithm must enforce the following mandatory constraints:

Fuzzy sets must stay normal and convex.

Fuzzy sets must not exchange their relative positions (they must not pass each other).

Fuzzy sets must always overlap.

Additionally, there are some optional constraints such as the following:

Fuzzy sets must stay symmetric.

The membership degrees must sum up to 1.
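One simple way to respect the mandatory constraints during learning is to check each proposed parameter update and reject it if a constraint would be violated. The Python sketch below assumes triangular fuzzy sets stored as `(left, peak, right)` tuples ordered from left to right; with this parameterization a set is normal and convex whenever `left < peak < right`. The function names and the reject-on-violation strategy are assumptions made for illustration, not a method taken from the cited literature.

```python
def satisfies_constraints(sets):
    """Check the mandatory constraints for a left-to-right list of triangles."""
    for left, peak, right in sets:
        # A proper triangle is normal (height 1 at the peak) and convex.
        if not left < peak < right:
            return False
    for (_, peak_1, right_1), (left_2, peak_2, _) in zip(sets, sets[1:]):
        if not peak_1 < peak_2:   # sets must not pass each other
            return False
        if not left_2 < right_1:  # neighboring sets must overlap
            return False
    return True


def constrained_update(sets, deltas):
    """Apply the parameter changes only if all constraints still hold."""
    candidate = [
        tuple(value + delta for value, delta in zip(fuzzy_set, change))
        for fuzzy_set, change in zip(sets, deltas)
    ]
    return candidate if satisfies_constraints(candidate) else sets


sets = [(0.0, 1.0, 2.0), (1.0, 2.0, 3.0)]

# A small shift of the first peak is accepted.
accepted = constrained_update(sets, [(0.0, 0.1, 0.0), (0.0, 0.0, 0.0)])

# A large shift would move the first set past the second and is rejected.
rejected = constrained_update(sets, [(0.0, 1.5, 1.5), (0.0, 0.0, 0.0)])
```

Optional constraints such as symmetry could be added to `satisfies_constraints` in the same style.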

An important hybrid fuzzy neural network was introduced by Berenji (1992). ARIC (approximate reasoning-based intelligent control) is presented as a neural network in which a previously defined rule base is tuned by updating the network's prediction. Thus the advantages of fuzzy systems and neural networks are combined as presented in Table 1.

ARIC is represented by two feed-forward neural networks, the action-state evaluation network (AEN) and the action selection network (ASN). The ASN is a multilayer neural network representation of a fuzzy system. It in turn consists of two separate parts. The first one represents the fuzzy inference and the second one computes a confidence measure based on the current and next system state. Both parts are eventually combined to form the ASN's output.

As shown in Figure 1, the first layer represents the rule antecedents, the second layer corresponds to the implemented fuzzy rules and the third layer symbolizes the system action. The network flow is as follows. In the first layer the system variables are fuzzified. In the next step these membership values are multiplied by the weights attached to the connections between the first and second layer. In the second layer, every rule's input corresponds to the minimum of its input connections.

A rule's conclusion is installed as a membership function. This function maps the rule input value to its inverse image. Its output value is then multiplied by the weights of the connections between the second and third layer. The final output value is eventually computed as the weighted average of all rules' conclusions.
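The network flow described above can be traced in a short Python sketch. It covers only the fuzzy inference part of the ASN, not the confidence part or the AEN, and it rests on assumptions that go beyond the text: the conclusion membership functions are monotonic so that an inverse exists, and the weighted average uses the product of output weight and rule input as the weight of each conclusion. This is one possible reading of the description, and all names, membership functions and numbers are hypothetical.

```python
def clamp01(value):
    return max(0.0, min(1.0, value))


def positive(x):
    """Assumed antecedent set: membership grows linearly from 0 to 1."""
    return clamp01(x)


def near_zero(x):
    """Assumed antecedent set: membership 1 at 0, falling to 0 at +/- 1."""
    return clamp01(1.0 - abs(x))


def asn_inference(state, rules):
    numerator = 0.0
    denominator = 0.0
    for rule in rules:
        # Layer 1 -> 2: fuzzify, multiply by the connection weight, take the min.
        rule_input = min(
            weight * mu(state[var]) for var, mu, weight in rule["antecedents"]
        )
        # Conclusion: inverse of a monotonic membership function at rule_input.
        conclusion = rule["inverse_conclusion"](rule_input)
        # Layer 2 -> 3: weight the conclusion (assumed form of the average).
        numerator += rule["out_weight"] * rule_input * conclusion
        denominator += rule["out_weight"] * rule_input
    return numerator / denominator if denominator > 0.0 else 0.0


rules = [
    {   # if angle is positive and velocity is positive then force is large
        "antecedents": [("angle", positive, 1.0), ("velocity", positive, 1.0)],
        "inverse_conclusion": lambda r: 10.0 * r,          # increasing on [0, 10]
        "out_weight": 1.0,
    },
    {   # if angle is near zero and velocity is near zero then force is small
        "antecedents": [("angle", near_zero, 1.0), ("velocity", near_zero, 1.0)],
        "inverse_conclusion": lambda r: 10.0 * (1.0 - r),  # decreasing on [0, 10]
        "out_weight": 1.0,
    },
]

force = asn_inference({"angle": 0.5, "velocity": 0.25}, rules)
```

For this state the first rule receives an input of 0.25 and the second an input of 0.5, so the result lies between their conclusions 2.5 and 5.0.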

The AEN (which is a three-layer feed-forward neural network as well) aims to forecast the system behavior. The hidden layer receives as input both the system state and an error signal from the underlying system. The output of the network represents the prediction of the next reinforcement, which depends on the weights and the system state. The weights are changed by a reinforcement procedure which takes into consideration the outputs of both networks, ASN and AEN. ARIC was successfully applied to the cart-pole balancing problem.

Whereas the ARIC model can easily be interpreted as a set of fuzzy if-then rules, the way the ASN network adjusts the weights is rather difficult to understand. It is a working neural network architecture that utilizes aspects of fuzzy systems. However, a semantic interpretation of some learning steps is not possible.

Berenji and Khedkar (1992) introduced an improvement of their former approach named GARIC (generalized ARIC). By refraining from weighted connections in the ASN, this approach no longer suffers from different interpretations of the linguistic values. Instead, the fuzzy sets are represented as nodes in the network. Moreover, the learning procedure changes parameters of these nodes and thus the shape of the membership functions. GARIC is also able to use any kind of membership function in the conclusion since a different defuzzifier and a differentiable soft-minimum function are used.
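A differentiable soft minimum can be written in a few lines of Python. The sketch below uses an exponentially weighted average, a commonly used form of the soft minimum; whether it matches the cited model in every detail is an assumption here, and the function name `soft_min` and the hardness parameter `k` are chosen for illustration.

```python
import math


def soft_min(values, k=10.0):
    """Smooth approximation of min(values); larger k gives a sharper minimum."""
    weights = [math.exp(-k * v) for v in values]
    return sum(v * w for v, w in zip(values, weights)) / sum(weights)


degrees = [0.2, 0.8]
average_like = soft_min(degrees, k=0.0)   # k = 0 reduces to the arithmetic mean
almost_min = soft_min(degrees, k=50.0)    # large k approaches the true minimum
```

Unlike the ordinary minimum, this function has a nonzero derivative with respect to every membership degree, so gradient-based learning can adjust all antecedent sets of a rule and not only the one with the smallest degree.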

Note that the ANFIS model (Jang, 1993) also implements a Sugeno-like fuzzy system in a network structure. Here a mixture of plain backpropagation and a least mean squares procedure is used to train the system. Neither the ANFIS nor the GARIC model is as easy to interpret as, e.g., Mamdani-type fuzzy systems. Therefore models like NEFCON (Nauck, 1994), NEFCLASS (Nauck and Kruse, 1996) and NEFPROX (Nauck and Kruse, 1997) have been developed for neuro-fuzzy control, classification and regression, respectively. They all implement Mamdani-type fuzzy systems and thus use special learning algorithms.