I want to elaborate on a statement I read in Acton's "Numerical Methods That Work", in the section "Exponential fitting", page 252.

Computationally we are being asked to fit only the parameters $A$ and $B$ in the equation $y = A\exp (-at) + B\exp (-bt)$
when we have observed a sample at several times to produce
a set of $(t_i, y_i)$ pairs. It is a simple least-squares fit that
generally requires only a desk calculator.

OK, I get this part. But then the author goes on:

Unfortunately there is a companion problem that looks
only slightly more complicated — until you try it! We again
have $(t_i, y_i)$ readings from a radioactive sample, but the
decaying materials are not known, hence the decay rates $a$
and $b$ must also be fitted. The answer to this problem lies
in the chemical rather than the computer laboratory, and the
sooner the hopeful innocent can be sent there and away
from the computer room, the better off everyone will be.
For it is well known that an exponential equation of this
type in which all four parameters are to be fitted is extremely
ill conditioned.

(Emphasis is mine.)

I didn't understand this statement, so I started to think about how I would proceed if faced with the same problem. Considering the simplicity of this function

$$ y = A\exp (-at) + B\exp (-bt) $$

seen as a function of $(A,a,B,b)$, I would go straight to some sort of gradient-based method, because I can easily compute the partial derivatives.

If I call $E$ my error function, the problem reduces to minimizing the quantity

$$ E(A,a,B,b) = \sum_{i=1}^{N} \left( y(t_i) - y_i \right)^2 $$

Again, the partial derivatives with respect to $A$, $a$, $B$, or $b$ are easy to compute, and I don't see where the problem would come from.
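For concreteness, here is what those derivatives look like in code (my own sketch using NumPy; the function names are mine):

```python
import numpy as np

def model(p, t):
    """y = A*exp(-a*t) + B*exp(-b*t), with parameters p = (A, a, B, b)."""
    A, a, B, b = p
    return A * np.exp(-a * t) + B * np.exp(-b * t)

def grad_E(p, t, y):
    """Analytic gradient of E(p) = sum_i (model(p, t_i) - y_i)^2."""
    A, a, B, b = p
    r = model(p, t) - y                     # residuals
    ea, eb = np.exp(-a * t), np.exp(-b * t)
    return 2.0 * np.array([
        np.sum(r * ea),                     # dE/dA
        np.sum(r * (-A * t * ea)),          # dE/da
        np.sum(r * eb),                     # dE/dB
        np.sum(r * (-B * t * eb)),          # dE/db
    ])
```

So the gradient really is easy to write down; as the answers explain, the difficulty lies elsewhere.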

$\begingroup$The need to fit $A, B, a, b$ makes it a nonlinear least squares problem, because the function is nonlinear in $a$ and $b$. There are plenty of algorithms out there, some gradient based, as you have observed. IMO, the measurement of the decay data has to be done in the chemical lab. The choice of an exponential or other function can be done by someone who knows basic functions and their trends. The decay rate can be guessed by a good numerical mind and/or further refined by a numerical program.$\endgroup$
– NameRakes Apr 28 '16 at 15:17

$\begingroup$@NameRakes Actually, exponential fitting is a linear (albeit infinite-dimensional) problem that can be solved in finite dimensions with error control using variants of Prony's method.$\endgroup$
– Richard Zhang Apr 28 '16 at 16:53

$\begingroup$OP, do you mean to ask why the problem of exponential fitting itself is fundamentally ill-posed (regardless of method), as stated in the original quote, or are you more interested in understanding why the nonlinear least-squares approach would not be effective?$\endgroup$
– Richard Zhang Apr 28 '16 at 16:58

$\begingroup$@RichardZhang Both, but the former if I have to choose!$\endgroup$
– bela83 Apr 28 '16 at 17:04


$\begingroup$@NameRakes Instead, suppose that we wish to fit a single sinusoid to periodic data. We could attempt to formulate the problem as the nonlinear curve fitting problem of picking $A, \omega, \phi$ over $y(t) = A\cos(\omega t + \phi)$. But this is highly nonconvex, computationally intensive, and we're likely to get stuck on a bad solution. Instead, we would perform a (finite-dimensional) Fourier transform and read the information off the spectrum. Prony's method is the analog of this procedure for decaying exponentials.$\endgroup$
– Richard Zhang Apr 28 '16 at 20:32
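For readers unfamiliar with Prony's method, here is a minimal sketch of the classical two-term version for noiseless, uniformly sampled data (my own illustration; real, noisy data would require the error-controlled variants alluded to in the comment):

```python
import numpy as np

def prony_two_exp(y, dt):
    """Recover (A, a, B, b) from uniform samples y[k] = A e^{-a k dt} + B e^{-b k dt}.

    Classical two-term Prony method: fit a linear recurrence, take the
    roots of its characteristic polynomial, then solve a *linear* least
    squares problem for the amplitudes. No noise handling.
    """
    # 1. The samples satisfy y[k] = c1*y[k-1] + c2*y[k-2]; fit c1, c2.
    M = np.column_stack([y[1:-1], y[:-2]])
    c, *_ = np.linalg.lstsq(M, y[2:], rcond=None)
    # 2. Roots z_j of z^2 - c1*z - c2 = 0 are z_j = exp(-rate_j * dt).
    z = np.real_if_close(np.roots([1.0, -c[0], -c[1]]))
    rates = -np.log(z) / dt
    # 3. Amplitudes from a linear (Vandermonde) least squares fit.
    k = np.arange(len(y))
    V = np.column_stack([z[0] ** k, z[1] ** k])
    amps, *_ = np.linalg.lstsq(V, y, rcond=None)
    return amps[0], rates[0], amps[1], rates[1]
```

Note that steps 1 and 3 are linear least squares problems and step 2 is quadratic root-finding: the nonlinear search over $a, b$ has been eliminated.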

3 Answers

The problem is ill-conditioned in the sense that very small changes to the $(y_{i},t_{i})$ data can lead to much larger relative changes in the best fitting parameters. If you do the arithmetic in limited precision (and even IEEE double precision can be inadequate in practice), then the precision of your floating point arithmetic may not be adequate to obtain a reasonably accurate solution to the nonlinear least squares problem.
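One way to see this numerically (my own illustration, not part of the original answer) is to compute the condition number of the Jacobian of the model $y = A e^{-at} + B e^{-bt}$: when the two decay rates are close, the columns of the Jacobian become nearly linearly dependent:

```python
import numpy as np

def jacobian(p, t):
    """Jacobian of y = A e^{-a t} + B e^{-b t} w.r.t. (A, a, B, b)."""
    A, a, B, b = p
    ea, eb = np.exp(-a * t), np.exp(-b * t)
    return np.column_stack([ea, -A * t * ea, eb, -B * t * eb])

t = np.linspace(0.0, 5.0, 50)
close = np.linalg.cond(jacobian((1.0, 1.0, 1.0, 1.05), t))  # rates 1.0 vs 1.05
far   = np.linalg.cond(jacobian((1.0, 1.0, 1.0, 5.0), t))   # rates 1.0 vs 5.0
```

With nearby decay rates the condition number is dramatically larger than with well-separated ones, and it is this factor that amplifies data perturbations into parameter perturbations.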

$\begingroup$Yes, I understand that, but I don't know how to demonstrate it. Would it be the same, for instance, if one had only one exponential with two parameters, $A \exp(-at)$?$\endgroup$
– bela83 Apr 29 '16 at 8:54

In addition to being ill-conditioned (as discussed by @BrianBorchers), the problem is also difficult to solve because it is not convex. The original least squares problem is convex because, for given $a,b$, the objective is just a sum of squares in the unknowns $A,B$. But if all four parameters are unknowns, then the exchange $A,a \leftrightarrow B,b$ in four-dimensional space leaves the objective function untouched. In other words, the problem has (at least) two minima and consequently cannot be convex, unless the optimum satisfies $A=B$, $a=b$; however, in that case it is easy to see that there are infinitely many optima satisfying $A'+B'=A+B$, so the problem is at least not strictly convex.

So, since the problem is not (strictly) convex, you will run into all sorts of numerical difficulties when trying to solve your least-squares problem.
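The swap symmetry is trivial to check numerically (a small sketch of mine; the parameter values are arbitrary):

```python
import numpy as np

def E(p, t, y):
    """Least squares objective for the model y = A e^{-a t} + B e^{-b t}."""
    A, a, B, b = p
    r = A * np.exp(-a * t) + B * np.exp(-b * t) - y
    return float(np.sum(r ** 2))

t = np.linspace(0.0, 3.0, 30)
y = 2.0 * np.exp(-1.0 * t) + 1.0 * np.exp(-3.0 * t)   # synthetic data
p      = (2.5, 0.8, 0.7, 2.5)
p_swap = (0.7, 2.5, 2.5, 0.8)   # exchange (A, a) <-> (B, b)
# E(p) == E(p_swap): minimizers away from the A=B, a=b "diagonal"
# therefore come in (at least) mirror-image pairs.
```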

I can report a personal experience of trying to fit fluorescence decays with a sum of two exponentials. We had a number of repeated measurements under the same conditions, and several different conditions. Non-convexity, ill-conditioning, and non-uniqueness of the solutions have already been dealt with in the other answers, so I will focus here on practical "data" aspects.

Despite my best efforts (back in 2010), in one of the cleanest cases the fit was fine enough, except at the beginning.
Some caveats follow.

Offset: in measurements, you often have an initial dead time (before some time $t_0$) with a flat signal, whose amplitude may inform you about a data offset. That offset is troublesome, because it can be fitted by many slowly decaying exponentials. Very often, perhaps because of the limited length of the data frame, the signal does not settle back down to the initial offset level, and I suspect that the final level might not be the same. That is one more parameter to estimate.
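To make the offset problem concrete (my own sketch): on a finite acquisition window, a constant offset is nearly indistinguishable from a slowly decaying exponential, so the two can trade off against each other in the fit:

```python
import numpy as np

t = np.linspace(0.0, 5.0, 100)        # finite acquisition window
offset = 0.3                          # a true constant offset
slow = 0.3 * np.exp(-1e-3 * t)        # a slowly decaying exponential
gap = np.max(np.abs(offset - slow))   # worst-case difference on the window
# gap is a fraction of a percent of the offset: within realistic noise
# levels, the data cannot distinguish the two hypotheses.
```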

Initial time: in principle, the model is insensitive to time shifts. But a measured signal has a beginning, so we have to choose a starting time $t_0$. Not too late; but if chosen too early, the sharp initial peak may be distorted by the instrumental response (convolution), and the bi-exponential model is no longer valid. Yet this part of the signal is highly energetic and carries a lot of weight in the optimization. In our data, the difference between the fit and the data showed a peculiar, not-quite-exponential residue.
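The time-shift insensitivity is a one-line identity: $A e^{-a(t - t_0)} = \left(A e^{a t_0}\right) e^{-a t}$, so any shift $t_0$ is silently absorbed into the amplitudes and cannot be recovered from the model alone (a small sketch of mine):

```python
import numpy as np

t = np.linspace(0.0, 4.0, 40)
A, a, B, b, t0 = 2.0, 1.0, 1.0, 3.0, 0.5
shifted = A * np.exp(-a * (t - t0)) + B * np.exp(-b * (t - t0))
# The same curve with the shift folded into the amplitudes:
absorbed = (A * np.exp(a * t0)) * np.exp(-a * t) \
         + (B * np.exp(b * t0)) * np.exp(-b * t)
```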

Location: past the initial time, you have to choose a fit window where the model is most valid or stable. I have observed that when I slide the fit window, the fitted parameters tend to vary, before becoming invalid toward the right end. Indeed, one can observe in the $\log$ domain that the signal is not always stationary.

Sampling: often, when running experiments, you choose a fixed sampling rate. It affects slowly decaying signals and sharply decaying ones differently, both in the choice of an initial time and in the choice of the fit window.

Noise: of course, signals fluctuate, rarely uniformly, and you sometimes get spurious random spikes that affect the fit. Should one filter the data first? I am not sure: filtering can introduce bias. I have tested some combinations of nonlinear geometric filters.

Finally, across $100$ signals acquired under the same conditions, I usually saw large variations in estimated parameters that should have been the same, e.g. from $20\%$ up to $2$-fold variation.