Let us consider a probability distribution $(g_n)_{n \in \mathbb{N}}$ which we want to approximate by a mixture of $(f_n(\lambda))_{n \in \mathbb{N}}$ where $\lambda \in \mathbb{R}$ is a parameter.

Are there known techniques that allow one to find the mixture minimizing the $L^1$ norm:
\begin{equation}
\min_{p} \sum_{n=0}^{\infty} \left|g_n - \int \mathrm{d}\lambda \; p(\lambda) f_n(\lambda) \right|
\end{equation}
where $p(\lambda)$ is a normalized probability distribution?

The motivation for this problem comes from experimental physics: ideally, one would like to generate an experimental process characterized by the probability distribution $g$, but this is not practical. What is easy, however, is to generate an experimental process with the distribution $f(\lambda)$, where $\lambda$ is a tunable parameter.
Therefore, the goal is to approximate $g$ as closely as possible with such a mixture of $f(\lambda)$, where the distance between the two distributions is measured in the $L^1$ norm; that is, I want to minimize the total variation distance between the two distributions.

In the specific problem I consider, $f(\lambda)$ is a Poisson distribution with parameter $\lambda \geq 0$, but I am really interested in a general method for approaching this problem.

Any pointer to the relevant literature would be greatly appreciated.
Thanks a lot!

Can you motivate your problem a bit more (why do you need this, in two lines; is it some sort of least favorable prior for simultaneous testing)? Is it a probability over $\mathbb{R}$? The norm you use in your sum is the $L^1$ norm between distributions, right? Note that $p$ should have integral $= 1$.
– robin girard, Sep 1 '10 at 11:02

Maybe it's better to bound the $L^1$ distance by the $L^2$ distance; then Fourier-analytic techniques can be used. That is also the strategy used in length minimization via energy minimization, well known to differential geometers.
– John Jiang, Sep 1 '10 at 20:58

Unfortunately, it is really the $L^1$ distance that is relevant in my problem, so I cannot switch from the $L^1$ to the $L^2$ distance. Furthermore, as the distributions are defined over $\mathbb{N}$, I cannot see how a bound on the $L^2$ distance could give any information about the $L^1$ distance.
– Anthony Leverrier, Sep 2 '10 at 16:05

The iterative algorithm called expectation-maximization is often suitable for approximations with mixtures, though it might not actually converge to the minimum you asked for.
– Zsbán Ambrus, Sep 16 '10 at 14:39

1 Answer

perhaps for starters you could take $p$ to be supported on a finite number of points. then the constraints on $p$ become simple linear inequalities and you have a [convex - perhaps even linear] programming problem.
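as a minimal sketch of this suggestion (not from the answer itself; the grid of $\lambda$ values, the truncation point $N$, and the target $g$ below are all illustrative choices), the finite-support version for the Poisson case from the question can be cast as a linear program: introduce auxiliary variables $t_n \ge |g_n - \sum_j p_j f_n(\lambda_j)|$ and minimize $\sum_n t_n$.

```python
import numpy as np
from scipy.optimize import linprog
from scipy.stats import poisson

# Illustrative setup: approximate a target g over n = 0..N-1 by a
# mixture of Poisson(lambda_j) pmfs supported on a finite grid.
N = 40                                   # truncation of the support in n
lambdas = np.linspace(0.0, 10.0, 51)     # finite grid for the mixing measure
J = len(lambdas)
F = poisson.pmf(np.arange(N)[:, None], lambdas[None, :])  # F[n, j] = f_n(lambda_j)

# Target: itself a two-atom Poisson mixture, so the optimum should be ~0.
g = 0.5 * poisson.pmf(np.arange(N), 1.0) + 0.5 * poisson.pmf(np.arange(N), 5.0)

# LP variables x = (p_1..p_J, t_0..t_{N-1}); minimize sum_n t_n subject to
# -t_n <= g_n - (F p)_n <= t_n,  p >= 0,  sum_j p_j = 1.
c = np.concatenate([np.zeros(J), np.ones(N)])
A_ub = np.block([[F, -np.eye(N)], [-F, -np.eye(N)]])
b_ub = np.concatenate([g, -g])
A_eq = np.concatenate([np.ones(J), np.zeros(N)])[None, :]
res = linprog(c, A_ub=A_ub, b_ub=b_ub, A_eq=A_eq, b_eq=[1.0],
              bounds=[(0, None)] * (J + N), method="highs")
p = res.x[:J]          # discrete mixing weights on the lambda grid
l1_error = res.fun     # achieved L^1 distance on the truncated support
```

since the true atoms $\lambda = 1$ and $\lambda = 5$ lie on the grid here, the solver recovers an essentially exact fit; for a target that is not a Poisson mixture, `l1_error` gives the best achievable total variation distance on the chosen grid.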

there has been attention in the statistical literature to fitting models using least absolute deviations, rather than least squares. [in the simplest case, the minimizer of

$$\sum_{i=1}^n |x_i - a|$$

is the sample median - rather than the sample mean one gets for $a$ using least squares.]
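a quick numerical check of this median fact (the skewed sample below is arbitrary):

```python
import numpy as np

# Verify numerically that the least-absolute-deviations fit of a
# constant is the sample median (odd sample size, so it is unique).
rng = np.random.default_rng(0)
x = rng.exponential(size=101)          # skewed data: median != mean

grid = np.linspace(x.min(), x.max(), 2001)
lad = np.abs(x[:, None] - grid[None, :]).sum(axis=0)  # sum_i |x_i - a|
a_star = grid[lad.argmin()]            # grid minimizer of the LAD objective
# a_star agrees with np.median(x), up to the grid spacing
```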

you could see if references in the monograph by yadolah dodge [L$_1$ statistical procedures and related topics, ims lecture notes - monograph series vol 31 1997] give anything useful.

Thanks for the reference! Yes, I indeed started by taking $p$ supported on a finite number of points, and it works pretty well in practice. Still, it's a bit frustrating as a continuous density would be much more natural.
– Anthony Leverrier, Sep 16 '10 at 16:12