I have a feed-forward neural network with six inputs, one hidden layer, and two output nodes (1; 0). This NN is trained on 0/1 values.
When I apply the model, two variables are created, confidence(0) and confidence(1), and the sum of these two numbers is 1 for each row.
My question is: what do these two numbers (confidence(0) and confidence(1)) exactly mean? Are these two numbers probabilities?

2 Answers

The confidence values (or scores, as they are called in other programs) represent a measure of how, well, confident the model is that the presented example belongs to a certain class. They depend heavily on the general strategy and the properties of the algorithm.

Examples

The easiest example to illustrate is the majority classifier, which just assigns the same score to all observations, based on the class proportions in the original training set.
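To make this concrete, here is a minimal, hypothetical sketch of a majority classifier's scores (function name and structure are my own, not RapidMiner's code): every observation receives the same confidence values, namely the class proportions seen during training.

```python
from collections import Counter

def majority_confidences(train_labels):
    """Return the same score for every future observation:
    the proportion of each class in the training labels."""
    counts = Counter(train_labels)
    total = len(train_labels)
    return {cls: n / total for cls, n in counts.items()}

# Three out of four training examples are class 0, so every
# observation gets confidence(0) = 0.75 and confidence(1) = 0.25.
conf = majority_confidences([0, 0, 0, 1])
```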

Another example is the k-nearest-neighbor classifier, where the score for a class i is calculated by averaging the distance to those examples that both belong to the k nearest neighbors and have class i. The scores are then sum-normalized across all classes.
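Taking that description literally, a sketch could look like this (an illustrative assumption on my part, not any library's actual implementation):

```python
def knn_class_scores(neighbors):
    """neighbors: list of (distance, label) pairs, assumed to be the
    k nearest already. Per the description above: the raw score of
    class i is the mean distance to the neighbors of class i; the raw
    scores are then sum-normalized across classes."""
    by_class = {}
    for dist, label in neighbors:
        by_class.setdefault(label, []).append(dist)
    raw = {cls: sum(ds) / len(ds) for cls, ds in by_class.items()}
    total = sum(raw.values())
    return {cls: s / total for cls, s in raw.items()}

# Two neighbors of class 0 at mean distance 1.5, one of class 1 at 3.0:
# raw = {0: 1.5, 1: 3.0}, normalized to {0: 1/3, 1: 2/3}.
scores = knn_class_scores([(1.0, 0), (2.0, 0), (3.0, 1)])
```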

For neural networks specifically, I do not know how the confidences are calculated without checking the code. My guess is that each one is just the value of the corresponding output node, sum-normalized across both classes.
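That guess, sketched out (again, an assumption about the implementation, not verified against the code):

```python
def sum_normalize(out0, out1):
    """Turn the two raw output-node activations into
    confidence(0) and confidence(1) that sum to 1."""
    total = out0 + out1
    return out0 / total, out1 / total

# Raw activations 0.2 and 0.6 become 0.25 and 0.75.
c0, c1 = sum_normalize(0.2, 0.6)
```

This would explain why the two columns always sum to exactly 1 per row.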

Do the confidences represent probabilities?

In general, no. To illustrate what probabilities mean in this context: if an example has probability 0.3 for class "1", then 30% of all examples with similar feature/variable values should belong to class "1" and 70% should not.

As far as I know, this task is called "calibration". Some general methods exist for this purpose (e.g. binning the scores and mapping each bin to the class fraction of the corresponding bin), as well as some classifier-dependent ones (e.g. Platt scaling, which was invented for SVMs). A good place to start is:
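The binning method mentioned above can be sketched as follows (function names, bin count, and data are illustrative assumptions): fit bins on held-out scores, then map a new score to the observed fraction of class "1" in its bin.

```python
def fit_binning_calibration(scores, labels, n_bins=5):
    """Group held-out scores into equal-width bins over [0, 1] and
    record the fraction of class-"1" examples in each bin."""
    bins = [[] for _ in range(n_bins)]
    for s, y in zip(scores, labels):
        idx = min(int(s * n_bins), n_bins - 1)
        bins[idx].append(y)
    # Empty bins get None; a real implementation would interpolate.
    return [sum(b) / len(b) if b else None for b in bins]

def calibrate(score, bin_probs):
    """Map a raw score to the calibrated probability of its bin."""
    idx = min(int(score * len(bin_probs)), len(bin_probs) - 1)
    return bin_probs[idx]

# Low scores are mostly class 0, high scores mostly class 1:
probs = fit_binning_calibration(
    scores=[0.1, 0.15, 0.4, 0.45, 0.9, 0.95],
    labels=[0,   0,    0,   1,    1,   1])
```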

@FranckDernoncourt, I think I was misunderstood, because I was not writing about the confidence of the whole set. In my output set I have a predicted variable (0 or 1) and the confidence(0) and confidence(1) columns. These numbers are different for each row/observation of the output.
–
julob Aug 8 '12 at 12:40

@FranckDernoncourt please use "@", otherwise I do not see the reply. "In general" means "in general, for all classification algorithms implemented in RapidMiner". I may have misinterpreted your answer, but as it stands, it is wrong. Note that the confidence is calculated per row, whereas your confidence is fixed for all rows. Your answer hence assumes that every classifier works like a majority classifier.
–
steffen Oct 9 '12 at 15:15