The two obviously have different mathematical formulas, but to my (untrained) eye they both seem to model similar curves, perhaps even curves that either function could reproduce exactly given the right parameters.

If we're talking about fitting (which we are...), then presumably the person choosing between one or the other is interested in the values of the chosen function's parameters (or some other feature of the curve, like its height or FWHM) once the fitting has been done. Why? Presumably, the function is chosen because its mathematical formula intrinsically relates to some part of the system that produced the data, but what has the Lorentzian got that the Gaussian doesn't, or vice versa? (Or any other curve, for that matter?)

Ok, I really do apologise - I know this is a poor (and poorly worded) question. Hopefully somebody understands what I'm trying to get at and can point me in the right direction.

Where to start? The Cauchy distribution has no finite mean, whereas the normal one does. The Cauchy distribution has fat tails, whereas the normal one does not... these are simply two completely different worlds. Besides, I do not think the question is well suited for MO.
– Peter Sarkoci Apr 18 '12 at 20:59

3 Answers

Short answer: robustness. The Gaussian distribution effectively assumes there are no outliers. If that assumption is wrong, it can give misleading results! But, on the other hand, using the Cauchy distribution might be too extreme. A starting point for you could be:
http://en.wikipedia.org/wiki/Robust_statistics
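To illustrate the robustness point, here is a minimal sketch (synthetic data; the seed, sample size, and outlier value are arbitrary choices of mine) comparing the maximum-likelihood location estimate under a Gaussian model with the one under a Cauchy model:

```python
import numpy as np
from scipy.optimize import minimize_scalar

rng = np.random.default_rng(0)
data = np.append(rng.normal(0.0, 1.0, size=20), 100.0)  # one gross outlier

# Under a Gaussian model the MLE of location is the sample mean,
# which the single outlier drags far from the true centre (0).
gauss_loc = data.mean()

# Under a Cauchy model the location MLE has no closed form;
# minimise the negative log-likelihood numerically.
def cauchy_nll(m):
    return np.sum(np.log1p((data - m) ** 2))

cauchy_loc = minimize_scalar(cauchy_nll, bounds=(-10.0, 10.0), method="bounded").x

print(f"Gaussian location: {gauss_loc:.2f}, Cauchy location: {cauchy_loc:.2f}")
```

The Cauchy estimate stays near the bulk of the data while the Gaussian one is pulled several units toward the outlier, which is the robustness trade-off in miniature.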

This depends on what you are trying to model. In my work (vibrational spectroscopy) the Lorentz lineshape is used to model 'pure' vibrational modes, which only undergo homogeneous line-broadening. The Gaussian lineshape is used to model those curves which have additional broadening terms from instrumental effects.

Here is a citation to a paper on this: Robert Meier, Vibrational Spectroscopy 39 (2005) 266–269
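For concreteness, here is a sketch of the two area-normalized lineshapes parametrized directly by FWHM (the helper names are mine, not from the paper):

```python
import numpy as np

def gaussian(x, x0=0.0, fwhm=1.0, area=1.0):
    # Gaussian lineshape; FWHM = 2*sqrt(2*ln 2)*sigma.
    sigma = fwhm / (2.0 * np.sqrt(2.0 * np.log(2.0)))
    return area * np.exp(-0.5 * ((x - x0) / sigma) ** 2) / (sigma * np.sqrt(2.0 * np.pi))

def lorentzian(x, x0=0.0, fwhm=1.0, area=1.0):
    # Lorentzian (Cauchy) lineshape; FWHM = 2 * HWHM.
    hwhm = fwhm / 2.0
    return area * hwhm / (np.pi * ((x - x0) ** 2 + hwhm ** 2))

# Both drop to half the peak height at x0 +/- FWHM/2,
# but the Lorentzian keeps far more weight in its wings:
print(gaussian(0.5) / gaussian(0.0), lorentzian(0.5) / lorentzian(0.0))
print(gaussian(5.0), lorentzian(5.0))
```

When both broadening mechanisms are present, a convolution of the two (a Voigt profile) is often fitted instead.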

You'll have to do some research in your specific field to know for sure which you should be using. It is a very subtle science...

One side of the question is whether to use a mixture of Cauchy distributions or a mixture of Gaussians to fit some given multimodal function. Sometimes the data is better suited to one or the other.

Another aspect is how fitting a mixture of Cauchy distributions to data which is well approximated by a mixture of Cauchy distributions differs from fitting a mixture of Gaussians to a mixture of Gaussians. Because Cauchy distributions are less localized and have some nice algebraic properties, it can be much easier to fit a mixture of Cauchy distributions.

Suppose you can estimate the probability density function of a distribution. This is common in practice, and you expect that your measurements are not perfect. There are algebraic methods for identifying a mixture of exponential distributions: use the Laplace transform (numerically) to convert a sum of exponentials into a rational function you can identify with Padé approximation. Similar techniques work on mixtures of Cauchy distributions, as I found when I helped some material scientists with a spectroscopy problem. The Hilbert transform of a Cauchy distribution is another rational function: $H(\frac{1}{1+t^2})(s) = \frac{s}{1+s^2}$. Just as with the well-used Laplace-Padé method, you can identify the number of Cauchy components. Of course, you could try to use Padé approximation directly to identify the Cauchy mixture, but that wouldn't use all of your data, and it would be very vulnerable to noise. Applying the Hilbert transform first lets you average out the noise and extract useful information from data points far from the peaks of the components.
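The Hilbert-transform identity above is easy to check numerically; a sketch with scipy's FFT-based `hilbert` (the grid size and tolerance are arbitrary, and the comparison stays near the centre to avoid edge effects from the implicit periodicity of the FFT):

```python
import numpy as np
from scipy.signal import hilbert

# Sample the (unnormalised) Cauchy density on a wide, fine grid.
t = np.linspace(-500.0, 500.0, 200001)
x = 1.0 / (1.0 + t ** 2)

# scipy.signal.hilbert returns the analytic signal x + i*H(x);
# its imaginary part approximates the Hilbert transform of x.
Hx = np.imag(hilbert(x))

expected = t / (1.0 + t ** 2)   # the rational function from the text
centre = np.abs(t) < 10.0       # stay away from the grid edges
err = np.max(np.abs(Hx[centre] - expected[centre]))
print(f"max error near the centre: {err:.4f}")
```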

By contrast, densities collected several FWHMs or standard deviations from the peak of a Gaussian component carry little information about that component: you need exponentially large amounts of data or the signal will be hidden by the noise. It is difficult to identify the correct number of components in a Gaussian mixture model. The best numerical fits may ignore components of significant amplitude that lie far from the data, in favor of splitting components or adding very localized peaks to fit outliers. These problems can be ameliorated with regularization, but the lack of signal far from the peaks of the components can't be fixed by massaging the data.
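The tail contrast behind this is stark; a quick check with scipy.stats, with distance from the peak measured in standard widths:

```python
from scipy.stats import cauchy, norm

# Gaussian tails decay like exp(-k^2/2); Cauchy tails only like 1/k^2.
for k in (2, 5, 10):
    print(f"k={k:2d}  normal pdf={norm.pdf(k):.3e}  cauchy pdf={cauchy.pdf(k):.3e}")
```

At ten widths out the Cauchy density is still above $10^{-3}$ while the Gaussian is below $10^{-22}$, so samples out there carry essentially no information about a Gaussian component but plenty about a Cauchy one.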