We're looking at a task, presented to monkeys over 10 years ago, in which two images were shown and the monkeys had to associate a leftward or rightward saccade with each.

The association between saccade direction and image was periodically reversed. Unlike humans, who could probably switch the association very quickly, the monkeys required on the order of 30 trials to learn the new one.

Interestingly, whenever the monkeys made a mistake, they effectively forgot the previous pairings. That is, after an error, the monkeys were as likely to make another error as to choose correctly, independent of the number of correct trials preceding the error. Strange!

The synaptic weights are updated based on the presynaptic and postsynaptic activity, minus the probability of joint presynaptic and postsynaptic activity. This subtractive baseline seems similar in spirit to TD learning?
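A minimal sketch of what such a covariance-style Hebbian rule might look like, assuming the update is co-activity minus its expected (baseline) value; the function name, learning rate, and baseline value here are hypothetical, not from the paper:

```python
import numpy as np

rng = np.random.default_rng(0)

def covariance_hebbian_update(w, pre, post, mean_coactivity, lr=0.05):
    """Covariance-style Hebbian rule: the update is driven by pre/post
    co-activity minus its expected value, so chance-level correlations
    produce no net weight change on average."""
    return w + lr * (pre * post - mean_coactivity)

# Toy example: two presynaptic units, one postsynaptic unit, all binary
# and independent with P(active) = 0.5, so P(joint activity) = 0.25.
w = np.zeros(2)
mean_coactivity = 0.25  # assumed baseline probability of joint activity
for _ in range(100):
    pre = (rng.random(2) < 0.5).astype(float)
    post = float(rng.random() < 0.5)
    w = covariance_hebbian_update(w, pre, post, mean_coactivity)
```

With activity at chance level the subtracted baseline cancels the Hebbian term in expectation, so the weights drift around zero rather than growing without bound.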

The synaptic weights are soft-bounded, so updates shrink as a weight approaches its bounds.
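Soft bounding is usually implemented by scaling the update with the distance to the nearest bound; a sketch under that assumption (the function and bound values are illustrative, not the paper's):

```python
def soft_bounded_update(w, dw, w_min=0.0, w_max=1.0):
    """Scale potentiation by (w_max - w) and depression by (w - w_min),
    so the weight approaches its bounds asymptotically and never crosses them."""
    if dw >= 0:
        return w + dw * (w_max - w)
    return w + dw * (w - w_min)

# Repeated potentiation saturates just below w_max instead of diverging.
w = 0.5
for _ in range(50):
    w = soft_bounded_update(w, 0.2)
```

The asymptotic approach means strongly potentiated synapses change little on further reward, which is one common motivation for soft rather than hard bounds.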

There is a stop-learning criterion: the weights are not positively updated when the total neuron activity is strongly positive or strongly negative. This lets the network eventually reach perfect performance (at some point the weights are no longer changed upon reward), and it explains some of the asymmetry between reward and punishment.
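A toy version of such a gate, with hypothetical thresholds on total activity (the specific values and function name are assumptions for illustration):

```python
def gated_update(w, dw, total_activity, lo=-1.0, hi=1.0):
    """Suppress positive (reward-driven) updates when total activity is
    strongly positive or strongly negative, i.e. the decision is already
    made confidently; negative updates are left untouched."""
    if dw > 0 and (total_activity >= hi or total_activity <= lo):
        return w  # stop-learning: skip the positive update
    return w + dw

# Confident trial: reward no longer changes the weight.
w_confident = gated_update(0.5, 0.1, total_activity=1.5)   # stays 0.5
# Uncertain trial: the positive update goes through.
w_uncertain = gated_update(0.5, 0.1, total_activity=0.2)   # becomes 0.6
```

Gating only the positive branch is what gives the reward/punishment asymmetry: errors keep driving learning even after rewards have stopped doing so.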

Their model perhaps does not scale well to large or very complicated tasks, given the presence of only a single reward signal, and the lack of attention / recall. Still, it fits the experimental data quite well.

They also note that for all the problems they study, adding more layers to the network does not significantly affect learning - neither the rate nor the eventual performance.