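set couplings of $X$ and $Y$, i.e. subsets $R\subset X\times Y$ such that for all $x\in X$ there is $y\in Y$ such that $(x,y)\in R$ and for all $y\in Y$ there is $x\in X$ such that $(x,y)\in R$,

metric couplings of $d_X$ and $d_Y$, i.e. metrics $d$ on the disjoint union $X\sqcup Y$ of $X$ and $Y$ such that $d(x,x') = d_X(x,x')$ if $x$ and $x'$ are in $X$ and $d(y,y') = d_Y(y,y')$ if $y$ and $y'$ are in $Y$. In fact, one could also work with semi-metrics $d$, i.e. they do not need to be positive definite, and

measure couplings of $\mu_X$ and $\mu_Y$, i.e. measures $\nu$ on $X\times Y$ such that $\nu(A\times Y) = \mu_X(A)$ and $\nu(X\times B) = \mu_Y(B)$ for all $\mu_X$-/$\mu_Y$-measurable sets $A$ and $B$, respectively.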
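For finite spaces, the set-coupling and measure-coupling conditions can be checked mechanically. The following is only a small sketch: the function names and the dictionary representation of measures are my own choices for illustration, and on a finite space it is enough to test the marginal conditions on single points.

```python
def is_set_coupling(R, X, Y):
    """Check that R is a subset of X x Y that covers every x and every y."""
    R = set(R)
    inside = all(x in X and y in Y for x, y in R)
    covers_X = {x for x, _ in R} == set(X)
    covers_Y = {y for _, y in R} == set(Y)
    return inside and covers_X and covers_Y


def is_measure_coupling(nu, mu_X, mu_Y, tol=1e-12):
    """Check that nu (a dict {(x, y): mass}) has marginals mu_X and mu_Y."""
    if any(mass < -tol for mass in nu.values()):
        return False
    if any(x not in mu_X or y not in mu_Y for x, y in nu):
        return False
    first = {x: 0.0 for x in mu_X}
    second = {y: 0.0 for y in mu_Y}
    for (x, y), mass in nu.items():
        first[x] += mass
        second[y] += mass
    ok_X = all(abs(first[x] - mu_X[x]) <= tol for x in mu_X)
    ok_Y = all(abs(second[y] - mu_Y[y]) <= tol for y in mu_Y)
    return ok_X and ok_Y


# Hypothetical toy data: a two-point space and a three-point space.
X, Y = ["a", "b"], [0, 1, 2]
mu_X = {"a": 0.5, "b": 0.5}
mu_Y = {0: 0.25, 1: 0.25, 2: 0.5}

# The product measure is always a measure coupling.
product_coupling = {(x, y): mu_X[x] * mu_Y[y] for x in X for y in Y}
print(is_measure_coupling(product_coupling, mu_X, mu_Y))
print(is_set_coupling([("a", 0), ("a", 1), ("b", 2)], X, Y))
```

Note that the support of a measure coupling need not be the full product: any coupling with smaller support still has to reproduce both marginals.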

Now we make the next (and final) step and compare metric spaces $(X,d_X)$ and $(Y,d_Y)$ which are both equipped with a measure. These objects are known as metric measure spaces or mm-spaces and are formally defined as follows:

Definition 1 (mm-space) A metric measure space is a triple $(X,d_X,\mu_X)$ consisting of a compact metric space $(X,d_X)$ and a Borel probability measure $\mu_X$ on $X$.

Note that sometimes the definition of an mm-space also requires that $\mu_X$ has full support (i.e., support equal to $X$), but it seems that not everybody does it like that.

1. Comparing mm-spaces: Gromov-Wasserstein

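Our question is: How do we compare two mm-spaces $(X,d_X,\mu_X)$ and $(Y,d_Y,\mu_Y)$? The plan is simply to augment the two versions of the Gromov-Hausdorff distance defined in my previous post by something which takes the measures on the respective metric spaces into account. We recall both formulations of the Gromov-Hausdorff distance: The first is

\begin{equation*} d_{GH}(X,Y) = \inf_{R,d}\,\sup_{(x,y)\in R} d(x,y) \tag{1} \end{equation*}

where the infimum is taken over all set couplings $R$ of $X$ and $Y$ and all metric couplings $d$ of $d_X$ and $d_Y$, and the second is

\begin{equation*} d_{GH}(X,Y) = \tfrac12\inf_{R}\,\sup_{(x_1,y_1),(x_2,y_2)\in R} |d_X(x_1,x_2) - d_Y(y_1,y_2)| \tag{2} \end{equation*}

where the infimum is also taken over all set couplings $R$ of $X$ and $Y$.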

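Basically, we have already seen what we should do to build a metric between mm-spaces. The idea is: if there were some “natural measure” $\mu_R$ for any set coupling $R$ of $X$ and $Y$, then we would simply define the distance between $X$ and $Y$ as

\begin{equation*} \inf_{R,d}\Big(\int d(x,y)^p\,d\mu_R(x,y)\Big)^{1/p} \end{equation*}

(as a generalization of (1)) and

\begin{equation*} \tfrac12\inf_{R}\Big(\int_{R}\int_{R} |d_X(x_1,x_2) - d_Y(y_1,y_2)|^p\,d\mu_R(x_1,y_1)\,d\mu_R(x_2,y_2)\Big)^{1/p} \end{equation*}

(as a generalization of (2)). In both cases we can have $1\leq p<\infty$. Note that the obvious modification for $p=\infty$ leads to something very similar to the Gromov-Hausdorff metrics.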

But indeed there are “natural measures” on the set couplings of $X$ and $Y$: At least for the full coupling $R = X\times Y$ there are the measure couplings $\nu$ of $\mu_X$ and $\mu_Y$! (One does not need to consider smaller set couplings since this can be taken into account by the measure couplings; they do not need to have full support anyway.)

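Applying this idea to the version (1) of the Gromov-Hausdorff metric we arrive at the following expression, which can be called Gromov-Wasserstein metric,

\begin{equation*} GW_p^1(X,Y) = \inf_{\nu,d}\Big(\int_{X\times Y} d(x,y)^p\,d\nu(x,y)\Big)^{1/p} \tag{3} \end{equation*}

where the infimum is taken over all measure couplings $\nu$ of $\mu_X$ and $\mu_Y$ and all metric couplings $d$ of $d_X$ and $d_Y$.

Starting from the version (2) of the Gromov-Hausdorff metric we arrive at another formulation:

\begin{equation*} GW_p^2(X,Y) = \tfrac12\inf_{\nu}\Big(\int_{X\times Y}\int_{X\times Y} |d_X(x_1,x_2) - d_Y(y_1,y_2)|^p\,d\nu(x_1,y_1)\,d\nu(x_2,y_2)\Big)^{1/p} \tag{4} \end{equation*}

where the infimum is taken over all measure couplings $\nu$ of $\mu_X$ and $\mu_Y$.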
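For finite mm-spaces the double integral in the second formulation becomes a double sum, so the objective can be evaluated for any given measure coupling. The sketch below does only that; it does not minimize over couplings, so each value is merely an upper bound on the distance. The function name `gw2_objective` and the toy spaces are assumptions made for illustration.

```python
def gw2_objective(nu, d_X, d_Y, p=1):
    """Evaluate 1/2 * (sum |d_X(x1,x2) - d_Y(y1,y2)|^p nu(x1,y1) nu(x2,y2))^(1/p).

    nu is a dict {(x, y): mass}; d_X and d_Y are functions of two points.
    """
    total = 0.0
    for (x1, y1), m1 in nu.items():
        for (x2, y2), m2 in nu.items():
            total += abs(d_X(x1, x2) - d_Y(y1, y2)) ** p * m1 * m2
    return 0.5 * total ** (1.0 / p)


def dist(a, b):
    """Euclidean distance on the real line."""
    return abs(a - b)


# Hypothetical example: X = {0, 1} and Y = {0, 2}, both with uniform measure.
diagonal = {(0, 0): 0.5, (1, 2): 0.5}
product = {(x, y): 0.25 for x in (0, 1) for y in (0, 2)}

print(gw2_objective(diagonal, dist, dist))  # objective for the "slim" coupling
print(gw2_objective(product, dist, dist))   # objective for the product coupling
```

In this toy case the coupling concentrated on the pairs $(0,0)$ and $(1,2)$ gives a smaller value than the product coupling, which illustrates why the infimum over couplings matters.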

While both versions of the Gromov-Hausdorff metric for compact metric spaces were equal, the same is not true for the two generalizations to mm-spaces: In his paper Memoli proves that $GW_p^2\leq GW_p^1$ and gives an example (right after Remark 5.14) where strict inequality holds.

2. Comparing mm-spaces: Gromov-Prokhorov

Instead of starting from the Gromov-Hausdorff distance between metric spaces and augmenting its definition with something that takes the measures into account, we could also start from the Prokhorov metric between probability measures and augment the definition with something that takes the metric into account. In fact there are also two possibilities to do so: In the appendix of his paper, Memoli quotes this version (from this paper by Greven, Pfaffelhuber and Winter) of a metric between mm-spaces which we call Gromov-Prokhorov metric

\begin{equation*} GP^1(X,Y) = \inf_{\nu,d}\,\inf\big\{\epsilon>0\ :\ \nu\big(\{(x,y)\ :\ d(x,y)\geq\epsilon\}\big)\leq\epsilon\big\} \tag{5} \end{equation*}

where the infimum is taken over all measure couplings $\nu$ of $\mu_X$ and $\mu_Y$ and all metric couplings $d$ of $d_X$ and $d_Y$.

The next version (also from the same paper by Greven, Pfaffelhuber and Winter where it was called Eurandom metric) is

\begin{equation*} GP^2(X,Y) = \inf_{\nu}\,\inf\big\{\epsilon>0\ :\ \nu\otimes\nu\big(\{(x_1,y_1,x_2,y_2)\ :\ |d_X(x_1,x_2) - d_Y(y_1,y_2)|\geq\epsilon\}\big)\leq\epsilon\big\} \tag{6} \end{equation*}

where the infimum is taken over all measure couplings $\nu$ only.

3. A very simple example

The calculation of the proposed metrics by hand can be quite cumbersome. Let’s look at the simplest example.

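We consider metric spaces $X = Y = \mathbb{R}^n$ (with the euclidean metric) accompanied with the measures $\mu_X = \delta_{x_0}$ and $\mu_Y = \delta_{y_0}$ for some points $x_0$ and $y_0$. In this case there is only one measure coupling of $\mu_X$ and $\mu_Y$, namely

\begin{equation*} \nu = \delta_{(x_0,y_0)}. \end{equation*}

Now it is easy to calculate the variant (4) of the Gromov-Wasserstein metric:

\begin{equation*} GW_p^2(X,Y) = \tfrac12\Big(\int\int |d_X(x_1,x_2) - d_Y(y_1,y_2)|^p\,d\delta_{(x_0,y_0)}(x_1,y_1)\,d\delta_{(x_0,y_0)}(x_2,y_2)\Big)^{1/p} = \tfrac12 |d_X(x_0,x_0) - d_Y(y_0,y_0)| = 0. \end{equation*}

Let’s have a look at the variant (3): Since there is only one measure coupling, the metric is

\begin{equation*} GW_p^1(X,Y) = \inf_{d}\Big(\int d(x,y)^p\,d\delta_{(x_0,y_0)}(x,y)\Big)^{1/p} = \inf_d d(x_0,y_0). \end{equation*}

As we have learned in Example 4 in the previous post, we can find a metric coupling of $X$ and $Y$ that brings the points $x_0$ in $X$ and $y_0$ in $Y$ arbitrarily close together (by embedding both $X$ and $Y$ into some $\mathbb{R}^m$ such that these points are only $\epsilon$-far away from each other). Hence, we see that we have

\begin{equation*} GW_p^1(X,Y) = 0 \end{equation*}

similarly to $GW_p^2$.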

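Now let’s look at the Gromov-Prokhorov metric from (5). Again we only have one measure coupling and we get

\begin{equation*} GP^1(X,Y) = \inf_{d}\,\inf\big\{\epsilon>0\ :\ \delta_{(x_0,y_0)}\big(\{(x,y)\ :\ d(x,y)\geq\epsilon\}\big)\leq\epsilon\big\}. \end{equation*}

Since the measure coupling is a Dirac, we can evaluate

\begin{equation*} \delta_{(x_0,y_0)}\big(\{(x,y)\ :\ d(x,y)\geq\epsilon\}\big) = \begin{cases} 1, & d(x_0,y_0)\geq\epsilon,\\ 0, & \text{else.}\end{cases} \end{equation*}

As observed previously, there are metric couplings which bring the points $x_0$ and $y_0$ arbitrarily close together, and hence for any $\epsilon>0$ there is a metric coupling $d$ such that $\delta_{(x_0,y_0)}(\{(x,y)\ :\ d(x,y)\geq\epsilon\}) = 0$ which shows that

\begin{equation*} GP^1(X,Y) = 0. \end{equation*}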

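Finally, consider the second variant of the Gromov-Prokhorov metric from (6). Since we only have the one measure coupling $\nu = \delta_{(x_0,y_0)}$ we have

\begin{equation*} GP^2(X,Y) = \inf\big\{\epsilon>0\ :\ \delta_{(x_0,y_0)}\otimes\delta_{(x_0,y_0)}\big(\{(x_1,y_1,x_2,y_2)\ :\ |d_X(x_1,x_2) - d_Y(y_1,y_2)|\geq\epsilon\}\big)\leq\epsilon\big\}. \end{equation*}

Evaluating the tensored Dirac delta is easy: We have that $\delta_{(x_0,y_0)}\otimes\delta_{(x_0,y_0)}(\{(x_1,y_1,x_2,y_2)\ :\ |d_X(x_1,x_2) - d_Y(y_1,y_2)|\geq\epsilon\})$ is either one or zero and it is one if and only if the point $(x_0,y_0,x_0,y_0)$ is in the set that is measured. However, we have that $|d_X(x_0,x_0) - d_Y(y_0,y_0)|$ is, of course, always zero (and never larger than any $\epsilon>0$). Hence, the measure is always zero and hence, this version of the Gromov-Prokhorov distance also gives

\begin{equation*} GP^2(X,Y) = 0. \end{equation*}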

Note that all four metrics cannot see that the two Diracs are at different points. The reason seems to be that one can “deform the metric space outside of the support of the measures” arbitrarily, in some sense.

It seems that the computation of $GW_p^2$ and $GP^2$ was easier, since one only needs to know the measure couplings and no metric couplings are involved.

4. Next difficult example: Compare some mm-space to a point

Ok, we have seen that mm-spaces which carry their measure in just a single point all look alike in both versions of the Gromov-Wasserstein metric and also in both versions of the Gromov-Prokhorov metric. Let’s look at a slightly more difficult example: We consider some mm-space $(X,d_X,\mu_X)$ and want to calculate its distance to a single point, i.e. the mm-space $Y = \{0\}$ with the only possible metric and measure. This should somehow measure the “size” or “spread” of the mm-space $X$.

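First, we need to know all measure couplings and all metric couplings between these spaces. The measure couplings are very easy: There is just one, namely

\begin{equation*} \nu(A\times\{0\}) = \mu_X(A) \end{equation*}

(i.e. all subsets of $X\times\{0\}$ are treated as if they were subsets of $X$). Concerning metric couplings, there are a few more. We allow semi-metrics $d$ on the disjoint union $X\sqcup\{0\}$: Since $d$ should respect the metric $d_X$ on $X$ we see that all metric couplings are parametrized by the points $x_0$ in $X$ by identifying $0$ (the element in $Y$) with this point $x_0$, i.e. all metric couplings are of the form $d_{x_0}$ defined by

\begin{equation*} d_{x_0}(x,y) = d_X(x,y)\ \text{ for } x,y\in X,\qquad d_{x_0}(x,0) = d_{x_0}(0,x) = d_X(x,x_0)\ \text{ for } x\in X,\qquad d_{x_0}(0,0) = 0. \end{equation*}

(This only gives semi-metrics since we have $d_{x_0}(x_0,0) = 0$ although, formally, $x_0\neq 0$.)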

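Let’s calculate the first Gromov-Wasserstein metric: There is only one measure coupling and we use the parametrization of the metric couplings to deduce

\begin{equation*} GW_p^1(X,\{0\}) = \inf_{x_0\in X}\Big(\int_X d_X(x,x_0)^p\,d\mu_X(x)\Big)^{1/p}. \end{equation*}

The second variant of the Gromov-Wasserstein metric is (remember, there is only one measure coupling)

\begin{equation*} GW_p^2(X,\{0\}) = \tfrac12\Big(\int_X\int_X d_X(x_1,x_2)^p\,d\mu_X(x_1)\,d\mu_X(x_2)\Big)^{1/p}. \end{equation*}

This quantity (without the factor $1/2$) is called the “$p$-diameter” of $X$.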
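Both quantities are easy to evaluate for a finite mm-space. The following sketch assumes the parametrization of the metric couplings by points $x_0\in X$ described above and restricts the infimum to the listed points; the function names are made up for this illustration.

```python
def gw1_to_point(points, weights, dist, p=1):
    """Smallest p-th moment of the distance to a base point x0 among the given points."""
    return min(
        sum(w * dist(x, x0) ** p for x, w in zip(points, weights)) ** (1.0 / p)
        for x0 in points
    )


def gw2_to_point(points, weights, dist, p=1):
    """Half of the p-diameter of the finite mm-space."""
    total = sum(
        w1 * w2 * dist(x1, x2) ** p
        for x1, w1 in zip(points, weights)
        for x2, w2 in zip(points, weights)
    )
    return 0.5 * total ** (1.0 / p)


def dist(a, b):
    """Euclidean distance on the real line."""
    return abs(a - b)


# Hypothetical example: two points at distance one with equal mass.
points, weights = [0.0, 1.0], [0.5, 0.5]
print(gw1_to_point(points, weights, dist))  # 0.5
print(gw2_to_point(points, weights, dist))  # 0.25
```

For a single point with mass one both functions return zero, in agreement with the previous example.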

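Let’s turn to the Gromov-Prokhorov metrics. The first one is (remember that the metric couplings are parametrized by the points in $X$)

\begin{equation*} GP^1(X,\{0\}) = \inf_{x_0\in X}\,\inf\big\{\epsilon>0\ :\ \mu_X\big(\{x\ :\ d_X(x,x_0)\geq\epsilon\}\big)\leq\epsilon\big\}. \end{equation*}

If this looks familiar, then you may have encountered the Ky Fan metric already. The Ky Fan metric is a metric between random variables $\xi$ and $\eta$ defined on the same probability space $(\Omega,\mathbb{P})$ with values in a metric space with metric $d$. It reads as

\begin{equation*} d_{KF}(\xi,\eta) = \inf\big\{\epsilon>0\ :\ \mathbb{P}\big(\{\omega\ :\ d(\xi(\omega),\eta(\omega))\geq\epsilon\}\big)\leq\epsilon\big\}. \end{equation*}

Hence, the first version of the Gromov-Prokhorov metric is

\begin{equation*} GP^1(X,\{0\}) = \inf_{x_0\in X} d_{KF}(\mathrm{Id},x_0), \end{equation*}

i.e., the minimal Ky Fan metric between the identity mapping and the constant mappings on the probability space $(X,\mu_X)$. (In other words, it measures how far the identity is from the constant mappings in the sense of Ky Fan.)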
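On a finite space the Ky Fan type infimum can be computed exactly, because the tail mass $\epsilon\mapsto\mu(\{d\geq\epsilon\})$ is a non-increasing step function. Here is a small sketch under that assumption; the helper names `ky_fan_value` and `gp1_to_point` are hypothetical, and the infimum over $x_0$ is again restricted to the listed points.

```python
def ky_fan_value(values, weights):
    """Return inf{eps > 0 : (total weight of entries with value >= eps) <= eps}."""
    pairs = sorted((v, w) for v, w in zip(values, weights) if v > 0 and w > 0)
    tail = sum(w for _, w in pairs)  # mass of all strictly positive values
    prev = 0.0                       # largest value already removed from the tail
    i = 0
    while i < len(pairs):
        t = pairs[i][0]
        # On the interval (prev, t] the tail mass is constant and equals `tail`.
        if tail <= t:
            return max(tail, prev)
        while i < len(pairs) and pairs[i][0] == t:
            tail -= pairs[i][1]
            i += 1
        prev = t
    # Beyond the largest value the tail mass is zero.
    return prev


def gp1_to_point(points, weights, dist):
    """Minimal Ky Fan value between the identity and the constant maps x -> x0."""
    return min(
        ky_fan_value([dist(x, x0) for x in points], weights) for x0 in points
    )


def dist(a, b):
    """Euclidean distance on the real line."""
    return abs(a - b)


print(gp1_to_point([0.0, 1.0], [0.5, 0.5], dist))  # 0.5
print(gp1_to_point([0.0, 0.2], [0.5, 0.5], dist))  # 0.2
```

The two outputs show the two regimes: for widely spread points the value is limited by the mass that sits far away, for close points it is limited by the distance itself.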

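The second variant of the Gromov-Prokhorov metric is (remember, the only measure coupling is $\nu(A\times\{0\}) = \mu_X(A)$)

\begin{equation*} GP^2(X,\{0\}) = \inf\big\{\epsilon>0\ :\ \mu_X\otimes\mu_X\big(\{(x_1,x_2)\ :\ d_X(x_1,x_2)\geq\epsilon\}\big)\leq\epsilon\big\}. \end{equation*}

I do not have a neat name or a good intuition for this metric yet (although it also looks like it measures “size” or “non-localization” of $X$ in some sense). If you have one, let me know!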

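There are different notions in mathematics to compare two objects, think of the size of real numbers, the cardinality of sets or the length of the difference of two vectors. Here we will deal not only with comparison of objects but with “measures of similarity”. Two fundamental notions for this are norms in vector spaces and metrics. The norm is the stronger concept in that it uses more structure than a metric and also, every norm induces a metric but not the other way round. There are occasions in which both a norm and a metric are available but lead to different concepts of similarity. One of these instances occurs in sparse recovery, especially in the continuous formulation, e.g. as described in a previous post. Consider the unit interval $I = [0,1]$ and two Radon measures $\mu_1$ and $\mu_2$ on $I$ ($I$ could also be an arbitrary metric space). On the space of Radon measures $\mathfrak{M}(I)$ there is the variation norm

\begin{equation*} \|\mu\|_{\mathfrak{M}} = \sup_{\Pi}\sum_{A\in\Pi}|\mu(A)| \end{equation*}

where the supremum is taken over all partitions $\Pi$ of $I$ into a finite number of measurable sets. Moreover, there are different metrics one can put on the space of Radon measures, e.g. the Prokhorov metric which is defined for two probability measures (i.e. non-negative ones with unit total mass)

\begin{equation*} d_P(\mu_1,\mu_2) = \inf\big\{\epsilon>0\ :\ \mu_1(A)\leq\mu_2(A^\epsilon)+\epsilon\ \text{ and }\ \mu_2(A)\leq\mu_1(A^\epsilon)+\epsilon\ \text{ for all measurable } A\big\} \end{equation*}

where $A^\epsilon$ denotes the $\epsilon$-neighborhood of $A$. Another family of metrics are the Wasserstein metrics: For $p\geq 1$ define

\begin{equation*} d_{W,p}(\mu_1,\mu_2) = \Big(\inf_{\nu}\int_{I\times I}|x-y|^p\,d\nu(x,y)\Big)^{1/p} \tag{1} \end{equation*}

where the infimum is taken over all measure couplings of $\mu_1$ and $\mu_2$, that is, all measures $\nu$ on $I\times I$ such that for measurable $A$ it holds that

\begin{equation*} \nu(A\times I) = \mu_1(A)\quad\text{and}\quad\nu(I\times A) = \mu_2(A). \end{equation*}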

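Example 1 We compare two Dirac measures $\mu_1 = \delta_{x_1}$ and $\mu_2 = \delta_{x_2}$ located at distinct points $x_1\neq x_2$ in $I$ as seen here:

The variation norm measures their distance as

\begin{equation*} \|\mu_1-\mu_2\|_{\mathfrak{M}} = \sup_{\Pi}\sum_{A\in\Pi}|\delta_{x_1}(A) - \delta_{x_2}(A)| = 2 \end{equation*}

(choose $\Pi$ such that it contains $A_1$ and $A_2$ small enough that $x_1\in A_1$, $x_2\in A_2$ but $x_1\notin A_2$ and $x_2\notin A_1$). To calculate the Prokhorov metric note that you only need to consider $A$’s which contain only one of the points $x_1$, $x_2$ and hence, it evaluates to

\begin{equation*} d_P(\mu_1,\mu_2) = |x_1-x_2|. \end{equation*}

For the Wasserstein metric we observe that there is only one possible measure coupling of $\delta_{x_1}$ and $\delta_{x_2}$, namely the measure $\nu = \delta_{(x_1,x_2)}$. Hence, we have

\begin{equation*} d_{W,p}(\mu_1,\mu_2) = \Big(\int_{I\times I}|x-y|^p\,d\delta_{(x_1,x_2)}(x,y)\Big)^{1/p} = |x_1-x_2|. \end{equation*}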

The variation norm distinguishes the two Diracs but is not able to grasp the distance of their supports. On the other hand, both metrics return the geometric distance of the supports in the underlying space as distance of the Diracs. Put in pictures: The variation norm of the difference measures the size of this object

while both metrics capture the distance of the measures like here

It should also be noted that convergence in both the Prokhorov metric and the Wasserstein metrics is exactly the weak convergence of probability measures.
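The contrast between the variation norm and the Wasserstein metric for two Diracs can be reproduced numerically for finitely supported measures on the real line. This is only a sketch: the helper names are made up, and for the Wasserstein metric it uses $p=1$ together with the one-dimensional formula $\int|F_1-F_2|\,dx$ in terms of the distribution functions instead of minimizing over couplings.

```python
def difference(mu1, mu2):
    """Signed measure mu1 - mu2 for measures given as dicts {point: mass}."""
    support = set(mu1) | set(mu2)
    return {x: mu1.get(x, 0.0) - mu2.get(x, 0.0) for x in support}


def variation_norm(mu):
    """Variation norm of a finitely supported signed measure."""
    return sum(abs(mass) for mass in mu.values())


def wasserstein1(mu1, mu2):
    """1-Wasserstein distance on the real line via the distribution functions."""
    diff = difference(mu1, mu2)
    xs = sorted(diff)
    total, cdf_gap = 0.0, 0.0
    for left, right in zip(xs, xs[1:]):
        cdf_gap += diff[left]  # F_1 - F_2 on the interval [left, right)
        total += abs(cdf_gap) * (right - left)
    return total


# Two Diracs at the (arbitrarily chosen) points 0.2 and 0.7.
mu1, mu2 = {0.2: 1.0}, {0.7: 1.0}
print(variation_norm(difference(mu1, mu2)))  # 2.0, independent of the positions
print(wasserstein1(mu1, mu2))                # approximately 0.5, the distance of the supports
```

Moving the second Dirac closer to the first one changes the second number but not the first.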

The above example provides a motivation to study metric structures on spaces, even if they are also equipped with a norm. Another reason to shift attention from normed spaces to metric spaces is the fact that there has emerged a body of work to build a theory of analysis in metric spaces (see, e.g. this answer on mathoverflow or the book Gradient Flows: In Metric Spaces And In The Space Of Probability Measures by Ambrosio, Gigli and Savaré (which puts special emphasis on the space of probability measures)). Yet another motivation for the study of metrics in this way is the problem of comparing shapes (without being precisely defined yet): Which of these shapes look most alike?

(Note that shapes need not be two-dimensional figures, you may also think of more complex objects like surfaces in three dimensions or Riemannian manifolds.)

One may also ask the question how to compare different images defined on different shapes, i.e. different “distributions of colour” on two different shapes.

2. Comparing shapes: Metric spaces

Up to now we tried to compare different measures, defined on the same set. At least to me it seems that both the Prokhorov and the Wasserstein metrics are suited to measure the similarity of measures and in fact, they do so in a somewhat finer way than the usual norm does.

Let’s try to go one step further and ask ourselves how we could compare two measures $\mu_1$ and $\mu_2$ which are defined on two different sets. While thinking about an answer one needs to balance several things:

The setup should be general enough to allow for the comparison of a wide range of objects.

It should include enough structure to allow meaningful statements.

It should lead to a measure which is easy enough to handle both analytically and computationally.

For the first and second bullet: We are going to work with measures not on arbitrary sets but on metric spaces. This will allow us to measure distances between points in the sets and, as you probably know, does not pose a severe restriction. Although metric spaces are much more specific than topological spaces, we still aim at quantitative measures which are not provided by topologies. With respect to the last bullet: Note that both the Prokhorov and the Wasserstein metric are defined as infima over fairly large and not too well structured sets (for the Prokhorov metric one needs to consider all measurable sets and their $\epsilon$-neighborhoods, for the Wasserstein metric, one needs to consider all measure couplings). While they can be handled quite well theoretically, their computational realization can be cumbersome.

In a spirit similar to Facundo Memoli’s paper, we work our way up from comparing subsets of metric spaces to comparing two different metric spaces with two measures defined on them.

2.1. Comparing compact subsets of a metric space: Hausdorff

Let $(X,d)$ be a compact metric space. Almost a hundred years ago Hausdorff introduced a metric on the family of all non-empty compact subsets of a metric space as follows: The Hausdorff metric of two compact subsets $A$ and $B$ of $X$ is defined as

\begin{equation*} d_H(A,B) = \inf\big\{\epsilon>0\ :\ A\subset B^\epsilon\ \text{ and }\ B\subset A^\epsilon\big\} \end{equation*}

(again, using the notion of $\epsilon$-neighborhood). This definition seems to be much in the spirit of the Prokhorov metric.

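Proposition 2.1 in Facundo Memoli’s paper shows that the Hausdorff metric has an equivalent description as

\begin{equation*} d_H(A,B) = \inf_{R}\,\sup_{(a,b)\in R} d(a,b) \end{equation*}

where the infimum is taken over all correspondences $R$ of $A$ and $B$, i.e., all subsets $R\subset A\times B$ such that for all $a\in A$ there is $b\in B$ such that $(a,b)\in R$ and for all $b\in B$ there is $a\in A$ such that $(a,b)\in R$. One may also say set coupling of $A$ and $B$ instead of correspondence.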
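For finite sets both descriptions of the Hausdorff metric can be compared directly. The sketch below computes the distance via nearest points (which is what the $\epsilon$-neighborhood definition amounts to for finite sets) and builds one particular correspondence, pairing every point with a nearest point of the other set, whose largest coupled distance gives the same value. The function names are chosen for illustration only.

```python
def hausdorff(A, B, dist):
    """Hausdorff distance of two finite sets via directed nearest-point distances."""
    a_to_b = max(min(dist(a, b) for b in B) for a in A)
    b_to_a = max(min(dist(a, b) for a in A) for b in B)
    return max(a_to_b, b_to_a)


def nearest_point_coupling(A, B, dist):
    """A correspondence that pairs each point with a nearest point of the other set."""
    R = {(a, min(B, key=lambda b: dist(a, b))) for a in A}
    R |= {(min(A, key=lambda a: dist(a, b)), b) for b in B}
    return R


def dist(a, b):
    """Euclidean distance on the real line."""
    return abs(a - b)


# Hypothetical example sets on the real line.
A, B = [0, 1], [0, 3]
R = nearest_point_coupling(A, B, dist)
print(hausdorff(A, B, dist))           # 2
print(max(dist(a, b) for a, b in R))   # 2, attained by the coupled pair (1, 3)
```

Every other correspondence has to couple the point $3$ with some point of $A$, so no correspondence can do better than the value found here.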

Example 2 There is always the full coupling $R = A\times B$. Three different set couplings of two subsets $A$ and $B$ of the unit interval are shown here:

the “full one” in green and two “slim” ones in red and orange. Other “slim” couplings can be obtained from surjective mappings $f:A\to B$ by $R = \{(a,f(a))\ :\ a\in A\}$ (or with the roles of $A$ and $B$ swapped): If you couple a set $A$ with itself, there is also the trivial coupling

\begin{equation*} R = \{(a,a)\ :\ a\in A\} \end{equation*}

which is just the diagonal of $A\times A$.

Note that the alternative definition of the Hausdorff metric is more in the spirit of the Wasserstein metric: It does not use enlarged objects (by $\epsilon$-neighborhoods) but couplings.

The Hausdorff metric is indeed a metric on the set $\mathfrak{C}(X)$ of all non-empty compact subsets of a metric space $X$ and if $X$ itself is compact it even holds that $(\mathfrak{C}(X),d_H)$ is a compact metric space (a result known as the Blaschke Selection Theorem).

One may say that we went up an abstraction ladder one step by moving from $(X,d)$ to $(\mathfrak{C}(X),d_H)$.

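The next step is to compare two compact metric spaces $(X,d_X)$ and $(Y,d_Y)$ which are not subsets of a common space. The Gromov-Hausdorff metric does this via

\begin{equation*} d_{GH}(X,Y) = \inf_{Z,f,g} d_H(f(X),g(Y)) \tag{2} \end{equation*}

where the infimum is taken over all metric spaces $Z$ and all isometric embeddings $f$ and $g$ of $X$ and $Y$ into $Z$, respectively.

In words: To compute the Gromov-Hausdorff metric, you try to embed both $X$ and $Y$ into a common larger space isometrically such that they are as close as possible according to the Hausdorff metric in that space.

Strictly speaking, the above definition is not well stated as one cannot form an infimum over all metric spaces since this collection does not form a set according to the rules of set theory. More precisely one should write that $d_{GH}(X,Y)$ is the infimum over all $\epsilon>0$ such that there exists a metric space $Z$ and isometric embeddings $f$ and $g$ of $X$ and $Y$, respectively, such that $d_H(f(X),g(Y))<\epsilon$.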

As the Hausdorff metric could be reformulated with set couplings there is a reformulation of the Gromov-Hausdorff metric based on metric couplings: A metric coupling of two metric spaces $(X,d_X)$ and $(Y,d_Y)$ is a metric $d$ on the disjoint union $X\sqcup Y$ of $X$ and $Y$ such that for all $x,x'\in X$ and $y,y'\in Y$ it holds that $d(x,x') = d_X(x,x')$ and $d(y,y') = d_Y(y,y')$.

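Example 3 We couple a metric space $(X,d)$ with itself. We denote with $(X',d')$ an identical copy of $(X,d)$ and look for a metric $D$ on $X\sqcup X'$ that respects the metrics $d$ and $d'$ in the way a metric coupling has to.

To distinguish elements from $X$ and $X'$ we put a $'$ on all quantities from $X'$. Moreover, for $x\in X$ we denote by $x'$ its identical copy in $X'$ (and similarly for $x'\in X'$, $x$ is its identical twin). Then, for any $\epsilon>0$ we can define $D_\epsilon(x,x') = D_\epsilon(x',x) = \epsilon$ (i.e. the distance between any two identical twins is $\epsilon$). By the triangle inequality we get for $x\in X$ and $y'\in X'$ that $D_\epsilon(x,y')$ should fulfill

\begin{equation*} D_\epsilon(x',y') - D_\epsilon(x',x)\leq D_\epsilon(x,y')\leq D_\epsilon(x,y) + D_\epsilon(y,y') \end{equation*}

and hence

\begin{equation*} d(x,y) - \epsilon\leq D_\epsilon(x,y')\leq d(x,y) + \epsilon. \end{equation*}

Indeed we can choose $D_\epsilon(x,y') = d(x,y) + \epsilon$ if $x\in X$ and $y'\in X'$, leading to one specific metric coupling for any $\epsilon$. These couplings allow us to distinguish identical twins and behave as a metric on the whole disjoint union. In the limiting case $\epsilon\to 0$ we do not obtain a metric but a semi-metric or pseudo-metric which is just the same as a metric but without the assumption that $d(x,y) = 0$ implies that $x = y$.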
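The claim that the choice "original distance plus $\epsilon$" across the two copies gives a metric on the disjoint union can be checked by brute force on a small finite space. In this sketch the disjoint union is modelled by tagging each point with $0$ (original) or $1$ (copy); this encoding and the function names are assumptions made for illustration.

```python
from itertools import product


def twin_coupling(d, eps):
    """Metric coupling of a space with its copy: d within a copy, d + eps across copies."""
    def D(u, v):
        (x, side_x), (y, side_y) = u, v
        if side_x == side_y:
            return d(x, y)
        return d(x, y) + eps
    return D


def satisfies_triangle_inequality(elements, D, tol=1e-12):
    """Brute-force check of D(u, w) <= D(u, v) + D(v, w) for all triples."""
    return all(
        D(u, w) <= D(u, v) + D(v, w) + tol
        for u, v, w in product(elements, repeat=3)
    )


def d(a, b):
    """Euclidean distance on the real line."""
    return abs(a - b)


# Hypothetical three-point space and its tagged copy.
points = [0, 1, 3]
elements = [(x, 0) for x in points] + [(x, 1) for x in points]
D = twin_coupling(d, eps=0.1)

print(D((1, 0), (1, 1)))                           # 0.1, the distance of identical twins
print(satisfies_triangle_inequality(elements, D))  # True
```

With `eps=0` the same construction still passes the triangle check but assigns distance zero to identical twins, which is exactly the semi-metric of the limiting case.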

Example 4 The above example of a metric coupling of a metric space with itself was somehow “reproducing” the given metric as accurately as possible. There are also other couplings that assign very different distances to points and there is also a way to visualize metric couplings: When building the disjoint union of two metric spaces $X$ and $Y$, you can imagine this as isometrically embedding both in a larger metric space $Z$ in a non-overlapping way and obtain the metric coupling $d$ as the restriction of the metric on $Z$ to $X\sqcup Y$. For $X = Y = [0,1]$ you can embed both into $Z = \mathbb{R}^2$. A metric coupling which is similar (but not equal) to the coupling of the previous example is obtained by putting $X$ and $Y$ side by side at distance $\epsilon$ as here (one space in green, the other in blue).

A quite different coupling is obtained by putting $X$ and $Y$ side by side, but in a reversed way as here:

You may even embed them in a more weird way as here:

but remember that the embeddings have to be isometric, hence, distortions like here are not allowed.

This example illustrates that the idea of metric coupling is in a similar spirit as “embedding two spaces in a common larger one”.

With the notion of metric coupling, the Gromov-Hausdorff metric can be written as

\begin{equation*} d_{GH}(X,Y) = \inf_{R,d}\,\sup_{(x,y)\in R} d(x,y) \tag{3} \end{equation*}

where the infimum is taken over all set couplings $R$ of $X$ and $Y$ and all metric couplings $d$ of $d_X$ and $d_Y$.

In words: To compute the Gromov-Hausdorff metric this way, you look for a set coupling of the base sets $X$ and $Y$ and a metric coupling of the metrics $d_X$ and $d_Y$ such that the maximal distance of two coupled points $x$ and $y$ is as small as possible. While this may look more complicated than the original definition from (2), note that the original definition uses all metric spaces $Z$ in which you can embed $X$ and $Y$ isometrically, which seems nearly impossible to realize. Granted, the new definition also considers a lot of quantities.

Also note that this definition is in the spirit of the Wasserstein metric from (1): If there were natural measures $\mu_R$ on the set couplings $R$ we could write \begin{equation*} d_{GH}(X,Y) = \inf_{R,d} \Big(\int d(x,y)^pd\mu_R\Big)^{1/p} \end{equation*} and in the limit $p\to\infty$ we would recover definition (3).

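Example 5 The Gromov-Hausdorff distance of a metric space $(X,d)$ to itself is easily seen to be zero: Consider the trivial coupling $R = \{(x,x)\ :\ x\in X\}$ from Example 2 and the family $D_\epsilon$ of metric couplings from Example 3. Then we have $d_{GH}(X,X)\leq\epsilon$ for any $\epsilon>0$, showing $d_{GH}(X,X) = 0$. Let’s take one of the next-complicated examples and compute the distance of $X = [0,1]$ and $Y = [0,2]$, both equipped with the euclidean metric. We couple the sets $X$ and $Y$ by $R = \{(x,2x)\ :\ x\in X\}$ and the respective metrics by embedding $X$ and $Y$ into $\mathbb{R}^2$ as follows: Put $Y$ at the line from $(0,0)$ to $(2,0)$ and $X$ at the line from $(\tfrac12,\epsilon)$ to $(\tfrac32,\epsilon)$:

This shows that $d_{GH}(X,Y)\leq\tfrac12$ and actually, we have equality here.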

There is another reformulation of the Gromov-Hausdorff metric, the equivalence of which is shown in Theorem 7.3.25 in the book “A Course in Metric Geometry” by Dmitri Burago, Yuri Burago and Sergei Ivanov:

\begin{equation*} d_{GH}(X,Y) = \tfrac12\inf_{R}\,\sup_{(x_1,y_1),(x_2,y_2)\in R} |d_X(x_1,x_2) - d_Y(y_1,y_2)| \tag{4} \end{equation*}

where the infimum is taken over all set couplings $R$ of $X$ and $Y$.

In words: Look for a set coupling such that any two coupled pairs $(x_1,y_1)$ and $(x_2,y_2)$ have the “most equal” distance.
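For very small finite metric spaces this formulation can be evaluated by brute force, since only set couplings and the two given metrics are involved. The following sketch enumerates all subsets of the product and keeps those that are set couplings; the number of subsets grows like $2^{|X||Y|}$, so this is meant as an illustration only, and all names are made up.

```python
from itertools import product


def all_set_couplings(X, Y):
    """Yield every subset of X x Y that covers all of X and all of Y."""
    pairs = list(product(X, Y))
    for mask in range(1, 2 ** len(pairs)):
        R = [pairs[i] for i in range(len(pairs)) if (mask >> i) & 1]
        if {x for x, _ in R} == set(X) and {y for _, y in R} == set(Y):
            yield R


def distortion(R, d_X, d_Y):
    """Largest mismatch of distances between two coupled pairs."""
    return max(
        abs(d_X(x1, x2) - d_Y(y1, y2)) for (x1, y1) in R for (x2, y2) in R
    )


def gromov_hausdorff(X, Y, d_X, d_Y):
    """Half of the smallest distortion over all set couplings (brute force)."""
    return 0.5 * min(distortion(R, d_X, d_Y) for R in all_set_couplings(X, Y))


def dist(a, b):
    """Euclidean distance on the real line."""
    return abs(a - b)


# Hypothetical examples: two-point spaces and a one-point space.
print(gromov_hausdorff([0, 1], [0, 2], dist, dist))  # 0.5
print(gromov_hausdorff([0, 1], [0], dist, dist))     # 0.5, half the diameter
print(gromov_hausdorff([0, 1], [5, 6], dist, dist))  # 0.0, the spaces are isometric
```

The last call shows that the distance only sees the metric structure and not where the points sit on the real line.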

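This reformulation may have the advantage over the form (3) in that it only considers the set couplings and the given metrics $d_X$ and $d_Y$ and no metric coupling is needed.

Note that, as the previous reformulation (3), it is also in the spirit of the Wasserstein metric: If there were natural measures $\mu_R$ on the set couplings $R$, we could write

\begin{equation*} d_{GH}(X,Y) = \tfrac12\inf_{R}\Big(\int_{R}\int_{R} |d_X(x_1,x_2) - d_Y(y_1,y_2)|^p\,d\mu_R(x_1,y_1)\,d\mu_R(x_2,y_2)\Big)^{1/p} \end{equation*}

and recover the formulation above in the limit $p\to\infty$.

One may say that we went up an abstraction ladder one step further by moving from $(X,d)$ to $(\mathfrak{C}(X),d_H)$ to $(\text{all compact metric spaces},d_{GH})$.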

Since this post has grown pretty long already, I decided to do the next step (which is the already announced metric on metric spaces which additionally carry some measure on them, so-called metric measure spaces) in a later post.