MLP Activation Function - train vs predict

The docs say that training is done using the sigmoid activation function. However, predict uses y = 1.7159*tanh(2/3 * x).

Can I have a little clarification, please? I don't understand why training and prediction use two different equations. I am genuinely curious why they differ, but I also need the responses to be in the range [0,1]. I could normalize the responses, but a better solution is to ensure that the values in my feature vector are positive (which loses no data in my case), since this ensures the sigmoid responses are in [0,1]. This works on the training side, but not on the testing side: because a different equation is used there, the responses will now be in [0,1.7]. I would rather avoid another normalization, if possible.

OK, I'm not a native English speaker, but I understand this to be a conditional:

"If you are using the default cvANN_MLP::SIGMOID_SYM activation function with the default parameter values fparam1=0 and fparam2=0 then the function used is y = 1.7159*tanh(2/3 * x), so the output will range from [-1.7159, 1.7159], instead of [0,1]."

In the source code (OpenCV 3.3), the activation function is defined here, and predict is here. It is the same function.

1 answer

Okay, so tanh is itself a sigmoid function: tanh(x) = (1 - e^(-2x)) / (1 + e^(-2x)), so the two activation functions have the same form. The sigmoid function for training is defined as (beta) * (1 - e^(-(alpha)x)) / (1 + e^(-(alpha)x)). The predict function is defined as 1.7159*tanh(2/3 * x).

LBerger noticed in the source code that this can change with fparam1 and fparam2, which, after some more digging, turn out to be alpha and beta, respectively. By default they are set to zero, but the source code then changes a value to 2/3 or 1.7159 if it is less than FLT_EPSILON, which is just a very small floating-point number. So a default value of zero really means default values of 2/3 and 1.7159.
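As a rough sketch of that fallback (not the actual OpenCV code), the logic described above could look like the following. The helper name resolve_sigmoid_params is hypothetical, and comparing the absolute value against FLT_EPSILON is an assumption about how "less than FLT_EPSILON" is tested.

```python
# Value of C's FLT_EPSILON for 32-bit floats.
FLT_EPSILON = 1.1920929e-07


def resolve_sigmoid_params(fparam1=0.0, fparam2=0.0):
    """Hypothetical helper: replace (near-)zero parameters with the defaults.

    Assumption: a parameter counts as "unset" when its absolute value is
    below FLT_EPSILON, as described in the text above.
    """
    alpha = fparam1 if abs(fparam1) >= FLT_EPSILON else 2.0 / 3.0
    beta = fparam2 if abs(fparam2) >= FLT_EPSILON else 1.7159
    return alpha, beta


default_params = resolve_sigmoid_params()          # both parameters left at 0
custom_params = resolve_sigmoid_params(1.0, 1.0)   # explicit values are kept
```

Under these assumptions, leaving both parameters at zero yields (2/3, 1.7159), while explicitly passed non-zero values are kept unchanged.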

I'm not sure why they chose these numbers, but if I want my responses scaled to [0,1], then I need to ensure that all passed-in values are positive and that beta is set to 1.
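To illustrate the range claim, here is a minimal sketch that evaluates the symmetric sigmoid beta * (1 - e^(-alpha x)) / (1 + e^(-alpha x)) with beta = 1 on a few non-negative arguments. The function name symmetric_sigmoid and the choice alpha = 1 are just for illustration, and the check only covers the function's own argument (the value fed into the activation), not what a trained network does with its weights.

```python
import math


def symmetric_sigmoid(x, alpha, beta):
    """Symmetric sigmoid: beta * (1 - e^(-alpha*x)) / (1 + e^(-alpha*x))."""
    t = math.exp(-alpha * x)
    return beta * (1.0 - t) / (1.0 + t)


# Non-negative arguments only, with beta = 1 (alpha = 1 is an arbitrary choice).
sample_inputs = (0.0, 0.5, 2.0, 10.0)
outputs = [symmetric_sigmoid(x, alpha=1.0, beta=1.0) for x in sample_inputs]

# With beta = 1 and x >= 0, every output should fall inside [0, 1].
in_unit_range = all(0.0 <= y <= 1.0 for y in outputs)
```

For negative arguments the same function returns negative values, which is why the inputs need to be positive for this to hold.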

I would like to set alpha to 2, since that gives the original tanh(x) identity. However, I am not sure how alpha is derived from fparam1 (2/3 by default): to keep the identity tanh(x) = (1 - e^(-2x)) / (1 + e^(-2x)), where alpha = 2, the fparam1 value for tanh should be 1.

So at some point I think there should be something like alpha = 2 * fparam1, but I do not see this anywhere in the source code. It seems fparam1 is used directly as alpha, which is not the relationship between the hyperbolic tangent and the sigmoid function. Is there a reason why?
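The factor of 2 can be checked numerically. The sketch below compares beta * (1 - e^(-alpha x)) / (1 + e^(-alpha x)) against both beta * tanh(alpha * x / 2) and beta * tanh(alpha * x) for the default values. The helper names are hypothetical, and this only tests the algebra of the formula as written; it does not show what OpenCV's implementation computes internally.

```python
import math


def symmetric_sigmoid(x, alpha, beta):
    """Symmetric sigmoid: beta * (1 - e^(-alpha*x)) / (1 + e^(-alpha*x))."""
    t = math.exp(-alpha * x)
    return beta * (1.0 - t) / (1.0 + t)


def max_gap(alpha, beta, scale):
    """Largest absolute difference to beta * tanh(scale * x) on [-5, 5]."""
    xs = [i / 10.0 for i in range(-50, 51)]
    return max(
        abs(symmetric_sigmoid(x, alpha, beta) - beta * math.tanh(scale * x))
        for x in xs
    )


alpha, beta = 2.0 / 3.0, 1.7159

gap_half = max_gap(alpha, beta, scale=alpha / 2.0)  # compare with tanh(alpha*x/2)
gap_full = max_gap(alpha, beta, scale=alpha)        # compare with tanh(alpha*x)
```

If the formula is evaluated exactly as written, gap_half should be at floating-point rounding level while gap_full is clearly non-zero, which would mean fparam1 = 2/3 corresponds to tanh(x/3) rather than tanh(2/3 * x).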