derivation of common activation functions

Jun 18, 2017
• Aidan Rocke

In this blog post I’d like to show how commonly used activation functions can be derived from the sigmoid
activation function. In doing so, we can show that these functions share a mathematical lineage with
the sigmoid.

sigmoid:

$$\sigma(x) = \frac{1}{1+e^{-x}}$$

hyperbolic tangent:

$$\tanh(x) = \frac{e^{x}-e^{-x}}{e^{x}+e^{-x}}$$

Now, we note that:

$$2\sigma(2x) - 1 = \frac{2}{1+e^{-2x}} - 1 = \frac{1-e^{-2x}}{1+e^{-2x}} = \frac{e^{x}-e^{-x}}{e^{x}+e^{-x}}$$

From this it follows that we have:

$$\tanh(x) = 2\sigma(2x) - 1$$
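The identity above is easy to check numerically. Here is a minimal sketch using NumPy (the helper name `sigmoid` is my own, not from the post):

```python
import numpy as np

def sigmoid(x):
    # sigmoid: 1 / (1 + e^(-x))
    return 1.0 / (1.0 + np.exp(-x))

x = np.linspace(-5.0, 5.0, 101)
# tanh expressed through the sigmoid: tanh(x) = 2*sigmoid(2x) - 1
assert np.allclose(np.tanh(x), 2.0 * sigmoid(2.0 * x) - 1.0)
```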

softplus:

$$f(x) = \ln(1+e^{x})$$

Now, if we compute the integral of the sigmoid:

$$\int \sigma(x)\, dx = \int \frac{e^{x}}{1+e^{x}}\, dx = \ln(1+e^{x}) + C$$

where $C$ is an arbitrary constant. Taking $C = 0$ recovers the softplus.
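Since softplus is an antiderivative of the sigmoid, the derivative of softplus should equal the sigmoid. A quick finite-difference check (a sketch with NumPy; helper names are my own):

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def softplus(x):
    # ln(1 + e^x); log1p is used for numerical accuracy near zero
    return np.log1p(np.exp(x))

x = np.linspace(-5.0, 5.0, 1001)
h = 1e-5
# central-difference derivative of softplus should match the sigmoid
numeric = (softplus(x + h) - softplus(x - h)) / (2.0 * h)
assert np.allclose(numeric, sigmoid(x), atol=1e-6)
```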

ReLU:

Note that when $x \gg 0$,

$$\ln(1+e^{x}) \approx x$$

and when $x \ll 0$, $\ln(1+e^{x}) \approx 0$. From this we may deduce the much more computationally efficient ReLU activation:

$$\text{ReLU}(x) = \max(0, x)$$
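The approximation is visible even at moderate inputs: away from the origin, softplus and ReLU are nearly indistinguishable. A small sketch (helper names are my own):

```python
import numpy as np

def softplus(x):
    return np.log1p(np.exp(x))

def relu(x):
    return np.maximum(0.0, x)

x = np.array([-10.0, -5.0, 5.0, 10.0])
# far from the origin, softplus approaches ReLU
assert np.allclose(softplus(x), relu(x), atol=1e-2)
```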

What I find very interesting is that although these activation functions can all be derived
from the sigmoid, they have very different properties from it. I’m not sure we can
derive all the emergent properties of a neural network that uses a particular activation
function with the tools of real analysis, but this is an interesting question that I shall
certainly revisit in the near future.