I haven't done any experiments with non-deterministic
payoffs--and it is important to do some, certainly.
My feeling--"guess"--is that in the presence of noise,
different classifiers would still come down with different
degrees of error. Their error would be a combination of
their "intrinsic" (i.e., noiseless) error and the
added noise. Then the exponential accuracy curve would
still be able to separate them.

Note that the important thing is to give classifiers having
different errors different accuracies, and thus fitnesses.
Then those with the least error will win out. It is not
important that the classifiers be "accurate" in the sense
of having errors less than epsilon0. My use of the term
"accurate" was merely technical, to denote classifiers with
errors less than the threshold.
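As a sketch of that point -- that what matters is classifiers with different errors getting different accuracies, and thus fitnesses -- the following assumes an XCS-style update in which each classifier's fitness moves towards its accuracy relative to the others. The learning rate beta and the accuracy figures are illustrative assumptions, not values from the discussion.

```python
# Sketch: fitness tracks relative accuracy, so the lower-error
# (higher-accuracy) classifier wins out even if neither is "accurate"
# in the technical sense of being below the threshold.
beta = 0.2  # learning rate (assumed)

def update_fitness(classifiers):
    """classifiers: list of dicts with 'accuracy' and 'fitness' keys."""
    total = sum(c['accuracy'] for c in classifiers)
    for c in classifiers:
        relative = c['accuracy'] / total        # accuracy relative to the set
        c['fitness'] += beta * (relative - c['fitness'])

pop = [{'accuracy': 0.9, 'fitness': 0.5},
       {'accuracy': 0.3, 'fitness': 0.5}]
update_fitness(pop)
# the higher-accuracy classifier's fitness rises, the other's falls
```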

(In the following I use p to represent the classifier's prediction,
Eo to represent epsilon sub 0 - the minimum prediction error
divided by the maximum payoff, Ej to represent epsilon sub j - the
classifier's current prediction error normalised in the same way,
and F to represent the current fitness.)

I have been playing about with your updates of p, Ej, and F. It
would appear that, if you are in an environment where the payoffs
received are even slightly non-deterministic with respect to
a classifier's condition, it is difficult to obtain an accuracy
value of 1.0.

An accuracy of 1.0 is returned only when the cut-off accuracy
function receives a rule prediction error (Ej) below the
(user-specified) minimum error. However, with a slightly oscillating
reward sequence, such as 9.0, 9.5, 9.3 (out of a maximum reward of 10),
Ej does not fall far enough for the cut-off accuracy
calculation to produce 1.0 when Eo is set to 0.01.
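The figures above can be checked with a small simulation. The Widrow-Hoff-style updates and the learning rate beta = 0.2 are my assumptions; the reward sequence, maximum payoff, and Eo follow the example.

```python
# Sketch: normalised prediction error under a slightly oscillating
# reward sequence never falls below Eo = 0.01, so the cut-off never fires.
beta = 0.2           # learning rate (assumed)
eps0 = 0.01          # user-specified minimum error, normalised by max payoff
max_payoff = 10.0

p, ej = 9.0, 0.0     # prediction and normalised error estimates
for reward in [9.0, 9.5, 9.3] * 50:          # cycle the oscillating payoffs
    ej += beta * (abs(reward - p) / max_payoff - ej)   # error uses old p
    p  += beta * (reward - p)                          # then update p

print(ej)   # settles well above eps0, despite the prediction being close
```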

At the same time, the steep logarithmic curve of the accuracy
calculation leaves the computed accuracy values (in this example)
hovering around 0.1 - clearly a relatively poor accuracy result, even
though the classifier predicts the reward fairly well. Setting alpha to
a higher figure (in this example it was 0.1) will, of course, improve
things, since the logarithmic curve will be less steep and the
accuracy values returned will rapidly move towards this higher alpha,
although the profile of the curve then provides less distinction between
classifiers which have not yet reached the accuracy cut-off point.
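For reference, here is a sketch of the cut-off accuracy calculation as I understand it. The exponential form alpha ** ((Ej - Eo) / Eo) above the threshold is an assumption reconstructed from the behaviour described (accuracy 1.0 below Eo, values hovering near alpha when Ej sits at about twice the threshold), not a quote of the actual update.

```python
# Sketch of the cut-off accuracy function (form assumed, see above).
def accuracy(ej, eps0=0.01, alpha=0.1):
    if ej < eps0:
        return 1.0                             # "accurate": full accuracy
    return alpha ** ((ej - eps0) / eps0)       # steep fall-off governed by alpha

print(accuracy(0.005))   # below the threshold: 1.0
print(accuracy(0.02))    # twice the threshold: about alpha, i.e. 0.1
```

Raising alpha flattens the curve here, which matches the observation above: accuracies move towards the higher alpha, at the cost of discriminating less between the not-yet-accurate classifiers.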

Variable payoffs could easily occur in a robotics control problem
where the classifier inputs are reduced from the actual inputs
that the robot receives, or where the robot's sensors are not
accurate enough to distinguish all environmental states as
separate states, in spite of the fact that the payoffs are close.

My simulation of the calculations could, of course, be in error,
and the use of the discounted Max as the payoff would tend to
smooth such instances somewhat, but I remain concerned that the
steep logarithmic curve might cause some classifiers to be
classed as inaccurate when in fact they are predicting a slightly
non-deterministic environmental signal well.