Activation Functions

Most neural networks pass the output of their layers through activation functions. These activation functions scale each layer's output into a proper range. The neural network program in the last section used the sigmoid activation function. The sigmoid activation function is the default choice for the FeedforwardLayer class, but it is possible to use others. For example, to use the hyperbolic tangent activation function, the following lines of code would be used to create the layers.
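Based on the description above, the layer-creation code might look like the following sketch. The FeedforwardNetwork and FeedforwardLayer constructor signatures, and the neuron counts, are assumed from context rather than taken from the original listing:

```java
// Sketch, assuming a FeedforwardLayer(ActivationFunction, neuronCount)
// constructor; the layer sizes shown here are illustrative.
FeedforwardNetwork network = new FeedforwardNetwork();
network.addLayer(new FeedforwardLayer(new ActivationTANH(), 2));
network.addLayer(new FeedforwardLayer(new ActivationTANH(), 3));
network.addLayer(new FeedforwardLayer(new ActivationTANH(), 1));
network.reset();
```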

As you can see from the above code, a new instance of ActivationTANH is created and passed to each layer of the network. This specifies that the hyperbolic tangent should be used, rather than the sigmoid function.

You may notice that it would be possible to use a different activation function for each layer of the neural network. While technically there is nothing stopping you from doing this, such practice would be unusual.

There are a total of three activation functions provided:

Hyperbolic Tangent

Sigmoid

Linear

It is also possible to create your own activation function. There is an interface named ActivationFunction. Any class that implements the ActivationFunction interface can serve as an activation function. The three activation functions provided will be discussed in the following sections.

Using a Sigmoid Activation Function

A sigmoid activation function uses the sigmoid function to determine its activation. The sigmoid function is defined as follows:

Equation 5.1: The Sigmoid Function
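In standard form, the sigmoid function is:

```latex
f(x) = \frac{1}{1 + e^{-x}}
```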

The term sigmoid means curved in two directions, like the letter “S.” You can see the sigmoid function in Figure 5.2.

Figure 5.2: The Sigmoid function.

One important thing to note about the sigmoid activation function is that it only returns positive values. If you need the neural network to return negative numbers, the sigmoid function will be unsuitable. The sigmoid activation function is implemented in the ActivationFunctionSigmoid class. This class is shown in Listing 5.2.
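A sketch of what Listing 5.2 contains, reconstructed from the text's description, is shown below. The ActivationFunction interface is included so the example is self-contained; any details beyond the activationFunction and derivativeFunction methods named in the text are assumptions.

```java
// Sketch of the ActivationFunction interface and the sigmoid implementation,
// reconstructed from the surrounding description.
interface ActivationFunction {
    double activationFunction(double d);
    double derivativeFunction(double d);
}

class ActivationFunctionSigmoid implements ActivationFunction {

    // f(x) = 1 / (1 + e^-x); output is always in the range (0, 1)
    public double activationFunction(double d) {
        return 1.0 / (1.0 + Math.exp(-d));
    }

    // Derivative expressed in terms of the function's output:
    // f'(x) = f(x) * (1 - f(x)), so d here is the sigmoid's output value
    public double derivativeFunction(double d) {
        return d * (1.0 - d);
    }
}
```

Note that the derivative is written in terms of the sigmoid's output rather than its input, a common optimization since the output has already been computed during the forward pass.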

As you can see, the sigmoid function is defined inside the activationFunction method. This method was defined by the ActivationFunction interface. If you would like to create your own activation function, it is as simple as creating a class that implements the ActivationFunction interface and providing an activationFunction method.

The ActivationFunction interface also defines a method named derivativeFunction that implements the derivative of the main activation function. Certain training methods require the derivative of the activation function. Backpropagation is one such method. Backpropagation cannot be used on a neural network that uses an activation function that does not have a derivative. However, a genetic algorithm or simulated annealing could still be used. These two techniques will be covered in the next two chapters.

Using a Hyperbolic Tangent Activation Function

As previously mentioned, the sigmoid activation function does not return values less than zero. However, the sigmoid curve can be shifted and rescaled so that it also produces negative values; this is exactly what the hyperbolic tangent function does. The equation for the hyperbolic tangent activation function is shown in Equation 5.2.

Equation 5.2: The TANH Function
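One standard form of the hyperbolic tangent, convenient because it needs only a single exponential, is:

```latex
\tanh(x) = \frac{e^{2x} - 1}{e^{2x} + 1}
```

This is algebraically equivalent to the more familiar form $(e^{x} - e^{-x})/(e^{x} + e^{-x})$, and its output ranges from -1 to 1.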

Although this looks considerably more complex than the sigmoid function, you can safely think of it as a positive and negative compatible version of the sigmoid function. The graph for the hyperbolic tangent function is provided in Figure 5.3.

Figure 5.3: The hyperbolic tangent function.

One important thing to note about the hyperbolic tangent activation function is that it returns both positive and negative values. If you need the neural network to return negative numbers, this is the activation function to use. The hyperbolic tangent activation function is implemented in the ActivationFunctionTanH class. This class is shown in Listing 5.3.
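A sketch of what Listing 5.3 contains, reconstructed from the text's description, follows. As before, the ActivationFunction interface is repeated so the example stands alone, and anything beyond the two named methods is an assumption:

```java
// Sketch of the hyperbolic tangent activation function, reconstructed
// from the surrounding description.
interface ActivationFunction {
    double activationFunction(double d);
    double derivativeFunction(double d);
}

class ActivationFunctionTanH implements ActivationFunction {

    // tanh(x) = (e^(2x) - 1) / (e^(2x) + 1); output is in the range (-1, 1)
    public double activationFunction(double d) {
        return (Math.exp(2.0 * d) - 1.0) / (Math.exp(2.0 * d) + 1.0);
    }

    // Derivative expressed in terms of the function's output:
    // tanh'(x) = 1 - tanh(x)^2, so d here is the tanh output value
    public double derivativeFunction(double d) {
        return 1.0 - d * d;
    }
}
```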

As you can see, the hyperbolic tangent function is defined inside the activationFunction method. This method was defined by the ActivationFunction interface. The derivativeFunction is also defined to return the result of the derivative of the hyperbolic tangent function.

Using a Linear Activation Function

The linear activation function is essentially no activation function at all. It is probably the least commonly used of the activation functions. The linear activation function does not modify a pattern before outputting it. The function for the linear layer is given in Equation 5.3.

Equation 5.3: A Linear Function
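The linear function simply returns its input unchanged:

```latex
f(x) = x
```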

The linear activation function might be useful in situations when you need the entire range of numbers to be output. Usually, you will want to think of your neurons as active or non-active. Because the hyperbolic tangent and sigmoid activation functions both have established upper and lower bounds, they tend to be used more for Boolean (on or off) type operations. The linear activation function is useful for presenting a range. A graph of the linear activation function is provided in Figure 5.4.

Figure 5.4: The linear activation function.

The implementation for the linear activation function is fairly simple. It is shown in Listing 5.4.
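A sketch of what Listing 5.4 contains, reconstructed from the text's description, is shown below. The exact exception type thrown by derivativeFunction is an assumption; the text states only that an exception is thrown:

```java
// Sketch of the linear activation function, reconstructed from the
// surrounding description.
interface ActivationFunction {
    double activationFunction(double d);
    double derivativeFunction(double d);
}

class ActivationFunctionLinear implements ActivationFunction {

    // f(x) = x: the value passes through unchanged
    public double activationFunction(double d) {
        return d;
    }

    // The constant derivative of 1 is not useful for training, so the
    // book's implementation throws rather than returning it. The specific
    // exception class used here is a placeholder.
    public double derivativeFunction(double d) {
        throw new UnsupportedOperationException(
            "The linear activation function cannot be used with "
            + "derivative-based training such as backpropagation.");
    }
}
```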

As you can see, the linear activation function does no more than return the value it is given. The derivative of the linear function is the constant 1, which provides no useful gradient information for training; the derivativeFunction of the linear activation function therefore throws an exception. As a result, you cannot use backpropagation to train a neural network that makes use of the linear activation function.