While I don't know enough about this topic to be an expert, Bayesian
methods are supposed to automatically prevent overfitting - or rather,
overfitting results from violation of Bayesian first principles. On the
other hand, true Bayesian methods might be too expensive. My point is
that noise is not the only possible way of preventing overfitting; there
are deeper, more powerful, mathematically more elegant ways of preventing
overfitting, to which the injection of noise is only an approximation. A
cheap, good approximation? Perhaps. But it isn't magic. Nor is the
relation between noise and preventing overfitting as deep as it seems.
According to Bishop94 above, for example, training with input noise amounts
to adding an extra regularization term to the loss function, and it is
possible to get the same benefit, without some of the dangers, by modifying
the loss directly instead of training with noise.
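A minimal sketch of that equivalence for the simplest (linear) case — the model, numbers, and noise level here are illustrative assumptions, not taken from Bishop94: averaging the squared loss over Gaussian input noise works out, in expectation, to the original loss plus a sigma^2-scaled weight penalty, so the Monte-Carlo estimate and the analytic penalized loss should agree closely.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy linear-regression setup (all values hypothetical).
n, d = 200, 5
X = rng.normal(size=(n, d))
w = rng.normal(size=d)              # a fixed weight vector to evaluate
t = X @ rng.normal(size=d)          # targets from some other linear map

sigma = 0.3                          # std of the injected input noise

def loss(Xm):
    """Mean squared error of the linear model on inputs Xm."""
    return np.mean((Xm @ w - t) ** 2)

# Monte-Carlo estimate of the expected loss under input noise.
noisy = np.mean([loss(X + sigma * rng.normal(size=X.shape))
                 for _ in range(2000)])

# Linear case of the Bishop94 result:
#   E[loss with input noise] = loss + sigma^2 * ||w||^2,
# i.e. training with input noise == adding a weight-decay-like penalty.
analytic = loss(X) + sigma**2 * np.sum(w ** 2)

print(noisy, analytic)
```

The cross term between the residual and the noise vanishes in expectation, which is why the only surviving extra term is the sigma^2 * ||w||^2 penalty — exactly the "extra term in the loss function" that can be added directly, with no noise in sight.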

One of these papers describes how adding a noise term to a weak input
signal to a cricket *neuron* resulted in more information in the spike
train emerging *from the cell*; that doesn't mean adding noise to a
weak signal actually adds information to it! Furthermore, it seems fairly
obvious how adding noise to a weak input signal could result in more
information from the spike train, given the cell's *lossy* processing
properties; if the noise boosts the weak signal into the steepest part of
the slope of the cell's response threshold, the resulting spike train may
contain more information or better temporal information (if previously
weak signals needed to accumulate before the cell fired). Perhaps the
cricket cells evolved to work in the presence of noise, or in the presence
of sound at a particular threshold value, but that does not mean the noise
"boosts processing efficiency"; an ab initio algorithm could extract more
data from the signal without the noise.
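The threshold mechanism described above is easy to demonstrate with a toy model — a hypothetical hard-threshold unit, not a model of the actual cricket cell, with all amplitudes chosen for illustration: a sub-threshold sine wave alone never makes the unit fire, but adding noise pushes it over threshold preferentially near the signal's peaks, so the spike train suddenly correlates with (carries information about) the signal.

```python
import numpy as np

rng = np.random.default_rng(1)

# Hypothetical hard-threshold "neuron": emits a spike when input > theta.
theta = 1.0
t = np.arange(2000)
signal = 0.6 * np.sin(2 * np.pi * t / 100)   # peak 0.6 < theta: sub-threshold

def spikes(x):
    return (x > theta).astype(float)

quiet = spikes(signal)                        # no noise: never crosses theta
noisy = spikes(signal + 0.4 * rng.normal(size=t.size))

# Threshold crossings cluster where the signal is near its peaks, so the
# noisy spike train correlates with the hidden sub-threshold signal.
corr = np.corrcoef(noisy, signal)[0, 1]
print(quiet.sum(), corr)
```

Note what this does and does not show: the noise makes the *lossy* thresholding stage lose less of the signal; it adds nothing to the signal itself. An algorithm reading the raw input directly would recover strictly more information without any noise at all — which is the point of the paragraph above.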

Noise is not magic, and for noise to result in any fundamental algorithmic
improvement would, I maintain, violate the second law of thermodynamics.