Problem: Given points in Banach space , how many dimensions are needed to represent pairwise distances to within ?

In other words, given a finite subset , we want a map such that for all , ,

This works well in , this is the Johnson-Lindenstrauss theorem: for an -point set, dimensions suffice. If , are sufficient, and often necessary.

2. Old and new results

In , exact reduction (i.e. ) requires dimensions.

For reduction, a result of Schechtman (later improved by Talagrand), states that dimensions suffice. Newman and Rabinovich improved this to dimensions suffice. An unpublished result of Andoni, Naor and Newman says that dimensions suffice for large .

Lower bounds. With Brinkman, I proved that dimensions are necessary for distorsion , [BC]. A simpler proof has been given since by James Lee and Assaf Naor, [LN].

Today, I focus on close to . Previously, I proved that are needed. Alexandru Andoni, Ofer Neiman and Nguyen and I show

Theorem 1 For reduction, dimensions are needed.

3. The examples

The lower bound is based on a recursive -cycle graph. This means one starts with a cycle with two marked antipodal vertices and . Then we replace each edge with a copy of the cycle, and so on. For , this is called the diamond graph (basis of the lower bound). The diamond graph is isometrically embeddable in dimensions, so cannot give a sharp lower bound.

The -th graph has diameter , edges, roughly vertices. It embeds isometrically in dimensions. Note that , fitting with the Andoni, Naor and Newman upper bound.

4. From distorsion to stretch

We first show that the -cycle needs dimensions for distorsion., if . To prove this, think of an embedding as a probabilistic line embedding. I.e.

Say a line embedding has stretch if

Say a distribution over line embeddings has stretch if above inequality holds for all in the support.

Our key estimate is the following: a (low) -dimensional embedding has stretch . So it suffices to estimate stretch of line embeddings from below.

5. Analysis of embeddings of the -cycle

The canonical embedding of the -cycle uses dimensions. For instance, if , 0000, 0001, 0011, 0111, 1111, 1110, 1100, 1000. When we view it as a distribution over line embedding, we rescale by .

Let be the sum of antipodals, be the sum of edges. In the exact embedding case, , . In the -distorsion case, and , so that

Now we show that low () stretch implies . When the cycle is mapped to a line, every two antipodal pairs and are mapped to non nested intervals, so

This implies that nearby antipodal pairs are mapped to comparable lengths. This propagates to not that nearby antipodal pairs, and so on… This leads to

where are edge lengths on the line. Think of as . Then

Assume that every edge has stretch at most . Then and other . This gives if stretch is .

6. General case

Now we turn to graph . It has roughly vertices. We show that for distorsion , one needs stretch .

The canonical embedding of is built recursively, adding dimensions at each step. Edges have length , antipodal pairs at level are mapped to distance . There are such pairs.

Use a Poincaré inequality. Let be the sums of antipodals at level and the sum of edges at level . Now consider

One shows that low () distorsion implies , and low () stretch implies . For this, one adds up estimates for for all cycles in the construction.

7. Comments and questions

This style of analysis can be used to give a simple proof of our previous result. It was discovered using LP methods, including experiments.

Regev: connection with functional analysis ? Answer: this does not go to spaces. Dimension reduction in is widely open.

Question: how tight is the result ? role of aspect ratio ? Answer: there is room for slight improvements.

Linial: do things generically embed well ? these examples look extreme. Answer: there are other graphs for which we care about embeddings. The recursive nature of examples suggests is there a parallel repetition phenomenon.