What is special about rectifier neural units used in NN learning?

The softplus function, f(x) = ln(1 + e^x), can be approximated by the max function (or hard max), i.e. f(x) ≈ max(0, x). The max function is commonly known as the Rectified Linear function (ReL).
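As a quick sanity check, here is a minimal sketch (plain Python, no trained model) comparing softplus and its hard-max approximation; the function names are mine, not from the original post:

```python
import math

def softplus(x):
    # softplus: ln(1 + e^x), a smooth approximation of the rectifier
    return math.log1p(math.exp(x))

def relu(x):
    # hard max ("Rectified Linear" function): max(0, x)
    return max(0.0, x)

# The two agree closely away from 0 and differ most at x = 0,
# where softplus(0) = ln 2 ≈ 0.693 while relu(0) = 0.
for x in (-5.0, 0.0, 5.0):
    print(x, softplus(x), relu(x))
```

Away from the origin the gap shrinks exponentially, which is why the hard max is a usable stand-in for softplus in practice.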

In the figure below, several activation functions are plotted.

The major differences between the sigmoid and ReL functions are:

The sigmoid function has range [0, 1], whereas the ReL function has range [0, ∞). Because of its range, the sigmoid can model a probability, so it is commonly used for probability estimation in the last layer even when ReL is used for the previous layers. NERD NOTE: the view of the softplus function as an approximation of stepped sigmoid units relates to binomial hidden units, as discussed in http://machinelearning.wustl.edu...
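The layer arrangement described above can be sketched as follows. This is a hypothetical tiny untrained network (the weights are random placeholders of my choosing, not part of the original post): ReL in the hidden layer, sigmoid at the output so the result can be read as a probability.

```python
import math
import random

def relu(x):
    return max(0.0, x)

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

# Random placeholder weights: 4 hidden units, 3 inputs.
random.seed(0)
w_hidden = [[random.uniform(-1, 1) for _ in range(3)] for _ in range(4)]
w_out = [random.uniform(-1, 1) for _ in range(4)]

def forward(x):
    # ReL activations in the hidden layer ...
    hidden = [relu(sum(w * xi for w, xi in zip(row, x))) for row in w_hidden]
    # ... and a sigmoid at the output, squashing into [0, 1].
    return sigmoid(sum(w * h for w, h in zip(w_out, hidden)))

p = forward([0.5, -1.2, 3.0])
print(p)  # always in [0, 1], so it can be read as a probability
```

Whatever the hidden activations produce, the final sigmoid guarantees an output in [0, 1], which is the point made above.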

The gradient of the sigmoid function vanishes as x moves away from 0; the function is said to be "saturated" in those regions. The ReL function is free of this problem: its positive part is linear and unbounded, so its gradient there is a constant 1.
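The saturation claim is easy to verify numerically. A small sketch (helper names are mine) comparing the two gradients, using the standard identities σ'(x) = σ(x)(1 − σ(x)) and ReL'(x) = 1 for x > 0:

```python
import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def sigmoid_grad(x):
    # derivative of the sigmoid: sigma(x) * (1 - sigma(x)); peaks at 0.25 when x = 0
    s = sigmoid(x)
    return s * (1.0 - s)

def relu_grad(x):
    # derivative of ReL: 0 for x < 0, 1 for x > 0 (undefined exactly at 0)
    return 1.0 if x > 0 else 0.0

# The sigmoid gradient shrinks rapidly away from 0 ("saturation"),
# while the ReL gradient stays at 1 on the whole positive half-line.
for x in (0.0, 5.0, 10.0):
    print(x, sigmoid_grad(x), relu_grad(x))
```

At x = 10 the sigmoid gradient is already below 10⁻⁴, while the ReL gradient is still exactly 1; this is the saturation difference in concrete numbers.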

The advantages of using Rectified Linear Units in neural networks are: