Limits On Matrix Multiplication

Josh Alman is a graduate student at a technical school in the Boston area. He is working on matrix multiplication, among other problems in complexity theory, and is advised by Ryan Williams and Virginia Vassilevska Williams.

Today we comment on his recent paper with Virginia on the limits of approaches to matrix multiplication.

This paper is titled, “Limits on All Known (And Some Unknown) Approaches to Matrix Multiplication,” and appears in the 2018 FOCS. An earlier paper by them was presented at the 2018 Innovations in Theoretical Computer Science conference.

Since Volker Strassen’s famous result that matrix multiplication (MM) can be computed in less than cubic time, there has been great interest in discovering the best possible exponent. More precisely, what is the smallest so that MM can be done in time ? A widely discussed conjecture is that can be any number greater than . Of course the best currently is higher—much higher?—and is equal to

This is due to François Le Gall—see his paper entitled “Powers of Tensors and Fast Matrix Multiplication.”

Does this suggest any pattern to you? Can any number below 2.37 and above 2 be “the” number?

Limits on Limits

Since Strassen there have been several advances in reducing the value of . We discussed the history here following our two–part presentation of the advances by Andrew Stothers and Virginia. There was a time when it was thought by many that might be a barrier. It is not. Note the cluster of points near 1980 in this graphic on Wikipedia’s matrix multiplication algorithms page:

It is poetic that after Don Coppersmith and Shmuel Winograd (CW) broke ever so slightly, Strassen himself made a larger break in 1986. Then CW in 1991 produced a running time of whose first three digits have stood all the past 27 years despite massive efforts. It seems is stuck at around . This clearly suggests that maybe there is a reason that we cannot make further progress. So an idea is:

Are there limits to what we can prove about ?

This is precisely the focus of Josh and Virginia’s joint paper. They divide the known methods of proving better upper bounds on MM into categories according to techniques employed:

Laser Method: This is due to Strassen.

Group Method: This is due to Henry Cohn and Christopher Umans.

Solar Method: Josh and Virginia created this and next one too.

Galactic Method: This is a common generalization of the previously known methods.

They show that the above methods cannot prove that . Namely, they show:

Theorem 1 There is an absolute constant so that any proof that uses the above methods can only prove for a value above .

Their result is not the first result on the limitations of MM approaches, however. The first “limitation” result was by Noga Alon, Amir Shpilka and Umans in a paper titled, “On Sunflowers and Matrix Multiplication.” That paper had the sobering message that several “hopeful” conjectures working toward are incompatible with versions of the SunflowerConjecture by Paul Erdős and Richard Rado that are widely believed. Here is a graphic from that paper on the implications it proves:

The New Paper

Rather then show how Josh and Virginia prove their limit results perhaps it is better to explain what they have proved. As usual we will refer to their paper to get the full details.

The point, as we see it, is this. Deciding how many operations are needed to compute the matrix product boils down to determining the rank of a certain tensor. A tensor has a rank like matrices. However, the rank is hard to compute and thus the direct calculation of the rank seems to be impossible. So the above methods starting with the Laser method are methods that try to control the difficulty of determining the rank of a tensor.

An analogy is this: Imagine you are searching some large space for an optimal value of a function. This can be complex both computationally and also from a proof point of view. That is, it may be hard to even prove that particular instances have a claimed value.

A natural approach that avoids these issues is to restrict the search space in some way. Of course whenever you restrict a search space you may lose some optimal values. The paper in question shows that the tradeoff for using restricted search spaces is indeed you may lose some optimal values. But you gain because the search space is easier to understand. It may, however, be too easy: you may not be able to show that .

The strangest aspect to our eyes is that they prove the existence of but do not give an estimate for it. Here the earlier paper is highly informative. It observes that all previous advances could be made by analyzing the tri-linear forms (tensors)

where the last subscript is modulo for some fixed . In the simplest case where is prime (or a prime power), it shows that any achieved via similar analysis of is bounded below by

and where is the unique real solution in to These numbers are explicit—for instance, is about But they give as so this didn’t rule out getting by these means.

The new paper shows that getting upper bounds that are similarly asymptotic to by these (and other) methods is impossible. The proof feels like working by contradiction from this supposition. Josh and Virginia tell us that in the case of only the Galactic method applied to extended CW tensors, extracting estimates without trying to optimize them gives a lower limit of about .

Even an optimized estimate, however, would not rule out the “true” being much higher. The full paper when it is out will shed further light. For now, purely as a joke, we did a quadratic regression on the Wikipedia figure’s data points for since 1968 and got a minimum of —which the regression says would have come in the year 2004. Oh well.

Open Problems

What is the best value for ? Do these limit results really help us determine a value and understand why a particular value—greater than 2—might be the barrier?

We hinted at that opinion in the post. After the work by Cohn and Umans (and others) we might have agreed. But there may be some number akin to Euler’s constant lurking. Ryan skirts this in his recent “Likelihoods” piece, which we intend to cover at some point.