Marco Cuturi - Sinkhorn Scaling for Optimal Transport

Description

We propose in this work a way to approximate Optimal Transport (OT) distances (a.k.a Wasserstein or Earth Mover's) using a regularized Optimal Transport (OT) problem. That problem can be solved with Sinkhorn's algorithm.

The accuracy of the approximation is parameterized by a regularization parameter . Computing this regularized OT problem results in two quantities: an upper bound on the actual OT distance, which we call the dual-Sinkhorn divergence, as well as a lower bound, which can be used for nearest neighbor search under the OT metric.

If you would like to use the EMD in your applications, but gave up doing so because of its computational cost, we suggest that you replace the EMD evaluations in your applications by the dual-Sinkhorn divergence, and benefit from its faster speed and nice computational features. To do so, you will need to tune the regularization parameter.

As gets larger, the dual-Sinkhorn divergence converges to the OT distance. For large values, the computation of our bounds become more computationally expensive and numerically unstable as grows. Small values of have been shown to yield very good approximations (in the sense that our approximations perform better than the real OT distance).

We suggest thus that you choose by cross-validation and start with a small value. A small value for means that the product of with (where is the ground metric) should not be too large. You need to make sure that which would be the threshold when numerical problems appear.
Numerical instabilities arise when the elementwise exponential matrix has elements that are numerically because of insufficient numerical precision caused by a value that is too big. In that case Sinkhorn's algorithm can blow up.

Parallelization

A nice feature of these distances is that their computation is vectorized: the computation of a distances, whether from one histogram to many, or many to many, can be carried out simultaneously using elementary linear algebra operations. As such, these upper and lower bounds can be computed trivially on GPGPU's.

The implementation provided in Matlab below should also run on your GPU if you have the PCT toolbox. Simply instantiate all input variables on the GPU using the gpuArray command first.

Implementation in Matlab [V0.2b, 21/04/15]

The sinkhornTransport zip file contains an implementation that outputs the upper and lower bounds, as well as the smoothed optimal transport(s) if required.

the dual-Sinkhorn divergences and lower bounds to the EMD of a histogram to a family of histograms , as well as their optimal transport maps, with different stopping criteria. We call this the 1-vs-N execution mode.

the dual-Sinkhorn divergences and lower bounds of a family of pairs of histogram , as well as their optimal transport maps, with different stopping criteria. We call this the N x 1-vs-1 execution mode.

Display Vector of Distances and Lower Bounds on EMDDisplay (smoothed) optimal transport from a to b_29, which has been chosen randomly.Deviation of T from marginals: 8.5463e-05 1.6837e-17 (should be close to zero)

Display Vector of Distances and Lower Bounds on EMDDisplay (smoothed) optimal transport from a_18 to b_18, which has been chosen randomly.Deviation of T from marginals: 1.0924e-05 2.3773e-17 (should be close to zero)