Counting sums and differences

Take a set of integers, say {0, 2, 5, 8, 11}, and write down all the numbers that can be represented as sums of two elements drawn from this set. For our example the answer is {0, 2, 4, 5, 7, 8, 10, 11, 13, 16, 19, 22}. Now construct the corresponding set of pairwise differences: {–11, –9, –8, –6, –5, –3, –2, 0, 2, 3, 5, 6, 8, 9, 11}. Note that there are only 12 distinct sums but 15 differences.
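For readers who would rather let a computer do the bookkeeping, here is a minimal Python sketch of the two constructions. The function names `sumset` and `diffset` are my own labels, not standard terminology from the papers discussed below.

```python
def sumset(a):
    """All distinct values x + y with x and y drawn from a (x == y is allowed)."""
    return {x + y for x in a for y in a}


def diffset(a):
    """All distinct values x - y with x and y drawn from a."""
    return {x - y for x in a for y in a}


A = {0, 2, 5, 8, 11}
print(sorted(sumset(A)))
print(sorted(diffset(A)))
print(len(sumset(A)), "sums,", len(diffset(A)), "differences")
```

Using Python sets means duplicates collapse automatically, which is exactly the counting rule applied in the text.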

Try a second set, {5, 8, 17, 26, 41}. Again the differences outnumber the sums, this time by a margin of 19 to 14.

It’s not hard to see why differences tend to be more numerous: Addition is commutative but subtraction isn’t. Thus the sums 5+8 and 8+5 both yield the single result 13, whereas 5–8 and 8–5 produce two distinct differences, –3 and +3. It is rumored that someone other than John Horton Conway once conjectured that the number of sums formed in this way never exceeds the number of differences. But the conjecture is false. A counterexample is the set {0, 2, 3, 4, 7, 11, 12, 14}, which has 26 distinct pairwise sums but only 25 differences.

Melvyn B. Nathanson of Lehman College—the Bronx campus of the City University of New York—has lately called attention to such anomalous sets of integers, which he identifies by the abbreviation MSTD (more sums than differences). He has lots of questions. Why do such sets exist? Where are they found? How many are there? What is their structure?

This past April Nathanson discussed MSTD sets in a talk titled “Problems in Additive Number Theory” at the University of Montreal; the talk is available on the arXiv as math.NT/0604340. In June Nathanson delivered a follow-up talk, “Sets with More Sums than Differences,” at the SIAM Conference on Discrete Mathematics in Victoria, British Columbia; that paper was released last week on the arXiv as math.NT/0608148. Meanwhile Kevin O’Bryant of the College of Staten Island (another CUNY unit) has addressed somewhat different aspects of the MSTD problem in a paper titled “Many Sets Have More Sums than Differences” (math.NT/0608131).

The appeal of a problem like this one is that it seems to get a lot of mileage out of the simplest mathematics: adding, subtracting and counting, operations that most of us know how to do. As the papers of Nathanson and O’Bryant show, the math is not all so trivial, and yet an amateur like me can still hope to have some fun with these questions. I’ve been toying with MSTD sets for the past week or so.

First, a few preliminaries. A set, as defined for this discussion, is a collection of items without duplicates. For example, {1, 3} is a two-element set. There are four ways to add these elements in pairs—1+1, 1+3, 3+1 and 3+3—but two of those summations yield the same result, and so the “sumset” has just three elements, {2, 4, 6}. The order of the elements in a set has no significance, but for convenience I’ll always list them in ascending sequence. All the sets discussed here are finite.

For a clearer understanding of set sums and differences, it helps to write down an example in matrix format:
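The tables can be regenerated with a short Python sketch. I am assuming the example is the four-element set {0, 2, 3, 4}, which is the set the following paragraphs analyze; the helper names `table` and `show` are hypothetical, chosen only for this illustration.

```python
def table(a, op):
    """Rows of op(x, y), with x running down the side and y across the top."""
    elems = sorted(a)
    return [[op(x, y) for y in elems] for x in elems]


def show(a, op, symbol):
    """Print the table with a header row and a header column."""
    elems = sorted(a)
    print(f"{symbol:>3} |" + "".join(f"{y:>4}" for y in elems))
    print("----+" + "-" * (4 * len(elems)))
    for x, row in zip(elems, table(a, op)):
        print(f"{x:>3} |" + "".join(f"{v:>4}" for v in row))
    print()


A = {0, 2, 3, 4}
show(A, lambda x, y: x + y, "+")   # symmetric about the main diagonal
show(A, lambda x, y: x - y, "-")   # antisymmetric, zeros on the diagonal
```

Each row is labeled by the first operand and each column by the second, so the entry in row 3, column 2 of the difference table is 3 – 2 = 1.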

A set of n elements has n² pairwise sums and differences, but they are not all distinct. In the case of the sums, the matrix is symmetric, and so everything above the main diagonal is duplicated below it. Thus the maximum possible number of unique sums is given by counting the entries along the diagonal plus those in either the upper or the lower triangle, but not both. This number is equal to n(n+1)/2; for the example given here, where n=4, the maximum size of the sumset is 10. However, for the specific set shown here the maximum is not attained, because of a few “coincidences”: 4 appears as both 0+4 and 2+2, and 6 arises as both 2+4 and 3+3. Thus the sumset has only eight elements: {0, 2, 3, 4, 5, 6, 7, 8}.

The difference matrix is antisymmetric, and so the elements of both the upper and the lower triangles need to be counted. On the other hand, the diagonal of the difference matrix is all zeros. The maximum number of distinct differences is n(n–1)+1. Again, though, the maximum is not reached in this example; coincidences reduce the size of the “diffset” from 13 to 9. Still, the differences outnumber the sums, and so {0, 2, 3, 4} is not an MSTD set.

The smallest possible sumset or diffset has 2n–1 elements. (You might want to work out why.) It’s easy to construct a set that attains this minimum: Just choose elements in an arithmetic progression. For example, the set {0, 2, 4, 6} has the sumset {0, 2, 4, 6, 8, 10, 12} and the diffset {–6, –4, –2, 0, 2, 4, 6}, both of size 7. It’s also straightforward to build a set that generates the largest possible sumset and diffset; the trick is to make each element more than twice as large as the next smaller element, as in the set {0, 1, 3, 7}. This structure eliminates all coincidental duplicates in both the sums and the differences.
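Both extremes are easy to check numerically. The sketch below is only a spot check for small n, not a proof; `progression` and `spread` are hypothetical helper names, and `spread` uses the rule "twice the previous element plus one," which is one convenient way of satisfying the more-than-twice condition.

```python
def counts(a):
    """Return (number of distinct sums, number of distinct differences)."""
    sums = {x + y for x in a for y in a}
    diffs = {x - y for x in a for y in a}
    return len(sums), len(diffs)


def progression(n, step=2):
    """Arithmetic progression 0, step, 2*step, ... with n terms."""
    return [step * k for k in range(n)]


def spread(n):
    """0, 1, 3, 7, ...: each element is twice the previous one plus one."""
    out, x = [], 0
    for _ in range(n):
        out.append(x)
        x = 2 * x + 1
    return out


for n in range(2, 9):
    print(n, counts(progression(n)), counts(spread(n)))
```

For each n in this range the first pair should come out as (2n–1, 2n–1) and the second as (n(n+1)/2, n(n–1)+1).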

In the search for MSTD sets we don’t have to look at all possible sets of integers. It turns out that both the number of sums and the number of differences generated by a set remain unchanged if you add a constant to each member of the set. Likewise, multiplying each element by a nonzero constant also leaves the number of sums and differences invariant. In other words, you can transform each element x into ax+b with a ≠ 0 (an affine transformation) without altering the size of the sumset or the diffset. This property is important because it means we can represent any MSTD set in a canonical form. We can shift it along the number line until its smallest element is 0, and we can shrink it down to its smallest possible span of integers by dividing out any factors that are common to all the nonzero elements. For example, the set {5, 8, 17, 26, 41} mentioned above has the canonical form {0, 1, 4, 7, 12}. Both of these sets have 19 differences and 14 sums.
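The reduction to canonical form takes only a few lines of Python. This is a sketch under one simplifying assumption: it shifts and divides but does not try to pick between a set and its mirror image (the transformation with a = –1), so a set and its reflection can still have different canonical forms. The name `canonical` is mine.

```python
from functools import reduce
from math import gcd


def canonical(a):
    """Shift the smallest element to 0, then divide out the common factor."""
    lo = min(a)
    shifted = [x - lo for x in a]
    g = reduce(gcd, shifted, 0)   # gcd of all shifted elements
    if g == 0:                    # only happens for a one-element set
        return (0,)
    return tuple(sorted(x // g for x in shifted))


def counts(a):
    """Return (number of distinct sums, number of distinct differences)."""
    return (len({x + y for x in a for y in a}),
            len({x - y for x in a for y in a}))


A = {5, 8, 17, 26, 41}
print(canonical(A), counts(A), counts(canonical(A)))
```

Starting the `reduce` at 0 works because gcd(0, x) = x, so the zero element contributes nothing to the common factor.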

Now for some questions.

What is the smallest MSTD set? The smallest known set is the example {0, 2, 3, 4, 7, 11, 12, 14} that I have already introduced. It has eight elements, and in the canonical representation the largest element is 14. There is one other known eight-element example, {0, 2, 3, 7, 10, 11, 12, 14}. A few seconds of computing is all it takes to show that there is no smaller eight-element MSTD set. But is there an example with fewer than eight elements? I don’t think so, but I can’t prove it. A brute-force search rules out any MSTD set with seven or fewer elements where the largest element is less than 81. Imre J. Ruzsa of the Mathematical Institute of the Hungarian Academy of Sciences claims that any MSTD set must have at least seven elements.
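A brute-force search of this kind can be sketched as follows. I don't claim this is the fastest way to do it; it simply tries every n-element set whose smallest element is 0 and whose largest is m. As a simplification it does not filter out sets whose elements share a common factor, which makes no difference in the small range searched here. The names `is_mstd` and `mstd_sets` are hypothetical.

```python
from itertools import combinations


def is_mstd(a):
    """True if the set has more distinct sums than distinct differences."""
    sums = {x + y for x in a for y in a}
    diffs = {x - y for x in a for y in a}
    return len(sums) > len(diffs)


def mstd_sets(n, m):
    """All n-element MSTD sets with smallest element 0 and largest element m."""
    found = []
    # The endpoints 0 and m are fixed; choose the n - 2 interior elements.
    for middle in combinations(range(1, m), n - 2):
        candidate = (0, *middle, m)
        if is_mstd(candidate):
            found.append(candidate)
    return found


for m in range(7, 15):
    print(m, mstd_sets(8, m))
```

For n = 8 the loop should come up empty until m = 14, where it recovers the two sets quoted above. Pushing n = 7 out to larger m uses the same function but gets slow quickly, since the number of candidates grows as a binomial coefficient.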

How many MSTD sets are there? In one sense, this is a very easy question. Given the affine invariance of MSTD sets, if just one such set exists, then we can generate infinitely many of them by translation and dilation. But most people would agree that these are all just copies of the same set in disguise. What we really want to know is the number of MSTD sets when they are all reduced to canonical form. Nathanson has shown that in this scheme of reckoning, too, the number of sets is infinite. He gives a formula for generating infinite families of MSTD sets. Starting with the example {0, 2, 3, 4, 7, 11, 12, 14}, the formula yields a sequence of progressively larger sets that Nathanson proves must all have more sums than differences: {0, 2, 3, 4, 7, 11, 15, 16, 18}, then {0, 2, 3, 4, 7, 11, 15, 19, 20, 22}, then {0, 2, 3, 4, 7, 11, 15, 19, 23, 24, 26}, and so on. The question then arises, are all MSTD sets members of such infinite families, or are there also “sporadic” MSTD sets?
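The four sets listed above follow a visible pattern: {0, 2}, then a ladder 3, 7, 11, …, 4k–1, then 4, 4k and 4k+2. The Python sketch below generates sets from that pattern and counts their sums and differences. I should stress that the pattern is my reading of the listed examples, not Nathanson's formula as he states it, and that checking a few cases by computer is no substitute for his proof. The name `family_member` is hypothetical.

```python
def family_member(k):
    """{0, 2} plus the ladder 3, 7, ..., 4k-1 plus {4, 4k, 4k+2}."""
    ladder = range(3, 4 * k, 4)
    return sorted({0, 2, 4, 4 * k, 4 * k + 2, *ladder})


def counts(a):
    """Return (number of distinct sums, number of distinct differences)."""
    return (len({x + y for x in a for y in a}),
            len({x - y for x in a for y in a}))


for k in range(3, 7):
    a = family_member(k)
    print(k, a, counts(a))
```

With k = 3 this reproduces the eight-element starting set, and k = 4, 5 and 6 give the three larger sets listed in the text.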

How rare are MSTD sets? Having already established that there are infinitely many MSTD sets, it might seem that they can’t be very rare, but that’s not necessarily true. The primes are also infinite, yet they are vanishingly rare. Take the ratio of the number of primes less than N to the number of integers less than N; as N goes to infinity, the ratio goes to zero. MSTD sets could be rare in a similar sense. O’Bryant shows that within a certain infinite series of integer sets, the probability of finding an MSTD set is greater than zero. Does that result hold also for integer sets in general?

By how much can the number of sums exceed the number of differences? Let’s define the discrepancy Δ of a set as the number of differences minus the number of sums. For all “ordinary” sets, Δ ≥ 0. For MSTD sets, Δ is negative. In all the MSTD examples I’ve shown so far, Δ = –1, or in other words the number of sums is just 1 greater than the number of differences. I’ve been able to find lots of sets with Δ = –2; the smallest, with 11 elements, is {0, 1, 2, 4, 5, 9, 12, 13, 14, 16, 17}, which has 35 sums and only 33 differences. I’ve also stumbled upon a few sets with Δ = –3, such as the 16-element set {0, 1, 2, 4, 5, 9, 12, 13, 14, 16, 17, 21, 24, 25, 26, 28}, which has 56 sums and 53 differences. I’m guessing there is no lower bound on Δ. (Update: As I was smoothing out the last details of this report, I discovered a 1973 paper by Sherman K. Stein. He proves that the ratio of the number of differences to the number of sums can be made arbitrarily large or small.)
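The discrepancy is a one-line computation. Here is a small Python sketch that evaluates Δ for the sets mentioned in this paragraph; the function name `discrepancy` is my own.

```python
def discrepancy(a):
    """Number of distinct differences minus number of distinct sums."""
    sums = {x + y for x in a for y in a}
    diffs = {x - y for x in a for y in a}
    return len(diffs) - len(sums)


examples = [
    [0, 2, 3, 4, 7, 11, 12, 14],
    [0, 1, 2, 4, 5, 9, 12, 13, 14, 16, 17],
    [0, 1, 2, 4, 5, 9, 12, 13, 14, 16, 17, 21, 24, 25, 26, 28],
]
for a in examples:
    print(len(a), "elements, discrepancy", discrepancy(a))
```

A negative result marks an MSTD set; the three sets above should give –1, –2 and –3 respectively.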

How are MSTD sets distributed among all integer sets? The trouble with even asking a question like this is that we can’t look at all integer sets. If we try to answer the question statistically, by choosing a representative sample of sets, then we have to wade into the messy business of deciding what sort of sample is representative. For lack of a better idea, I have tried the following approach. Assuming that all sets are in canonical form, I classify them by two parameters: n, the number of elements, and m, the largest element. Then for any given values of n and m I can either examine all sets (if n and m are small enough) or generate a random sample. Note that m cannot be less than n–1. If m = n–1, then there is only one possible set, namely the counting sequence 0, 1, 2,…, m. As m increases for a fixed value of n, so does the number of possible sets and the average gap between the elements. Now we can ask how Δ varies as a function of m and n. In the case of m = n–1, the answer can be given unequivocally: Δ = 0, because the set is an arithmetic progression, and the number of sums and the number of differences are both 2n–1. When m is much greater than n, Δ is almost surely positive. The reason is that the elements of the set are widely dispersed, and a coincidence in which different pairs of elements yield the same sum or the same difference is unlikely. In almost all sets with m >> n, the number of sums takes its maximum value n(n+1)/2 and the number of differences is also at its maximum, n(n–1)+1. If you do the subtraction, you’ll find that Δ = n(n–1)+1 – n(n+1)/2 = (n–1)(n–2)/2, which increases in proportion to n².
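The random-sampling half of this approach might look like the Python sketch below. It rests on assumptions of my own: every set with smallest element 0 and largest element m is treated as equally likely, and sets whose elements share a common factor are not filtered out, so the sample is not strictly limited to canonical forms. The function names are hypothetical.

```python
import random
from collections import Counter


def discrepancy(a):
    """Number of distinct differences minus number of distinct sums."""
    sums = {x + y for x in a for y in a}
    diffs = {x - y for x in a for y in a}
    return len(diffs) - len(sums)


def random_set(n, m, rng):
    """Random n-element set with smallest element 0 and largest element m."""
    middle = rng.sample(range(1, m), n - 2)   # interior elements, no repeats
    return sorted([0, *middle, m])


def sample_discrepancies(n, m, trials, seed=0):
    """Tally the discrepancy over a number of random sets."""
    rng = random.Random(seed)                 # seeded for repeatability
    return Counter(discrepancy(random_set(n, m, rng)) for _ in range(trials))


print(sample_discrepancies(10, 9, 5))         # m = n - 1: only one possible set
print(sample_discrepancies(10, 20, 1000))     # the middle ground
print(sample_discrepancies(5, 10**9, 20))     # m >> n: widely dispersed elements
```

The first call can only ever report Δ = 0, and the last one illustrates the widely dispersed case, where Δ sits at (n–1)(n–2)/2.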

The interesting region would appear to be the middle ground, where m is not too much larger than n. Here is a graph showing the frequency of Δ values for n = 10 and all values of m between 9 and 27.

As m increases, the distribution grows wider and shifts to the right—toward more-positive values of Δ. (Note that the graph is based on a complete enumeration of all sets, not on a statistical sampling.)
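A complete enumeration of this kind can be sketched in Python as follows. The conventions are my assumptions, not necessarily those behind the graph: each set has smallest element 0 and largest element exactly m, and sets with a common factor are not removed. The example uses n = 6 to keep the running time trivial; n = 10 with m up to 27 works the same way but means visiting well over a million sets at the top of the range.

```python
from collections import Counter
from itertools import combinations


def discrepancy(a):
    """Number of distinct differences minus number of distinct sums."""
    sums = {x + y for x in a for y in a}
    diffs = {x - y for x in a for y in a}
    return len(diffs) - len(sums)


def discrepancy_histogram(n, m):
    """Count how many n-element sets {0, ..., m} have each value of the discrepancy."""
    hist = Counter()
    for middle in combinations(range(1, m), n - 2):
        hist[discrepancy((0, *middle, m))] += 1
    return hist


for m in range(5, 10):
    hist = discrepancy_histogram(6, m)
    print(m, sorted(hist.items()))
```

Dividing each count by the total for that m turns the tallies into frequencies.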

MSTD sets are so rare that in the graph above their frequency is indistinguishable from zero. Below we look exclusively at the frequency of MSTD sets as a function of m for three values of n.

It appears that MSTD sets are most common (or maybe one should say least rare) at the smallest value of m where such sets first appear. But this impression is somewhat misleading. As the graph below reveals, the absolute number of MSTD sets increases as a function of m (or, in the case of n = 8, remains constant). Although it’s true that the proportion of MSTD sets declines, that’s only because the total number of integer sets grows exponentially with m.

Thus if you are wandering aimlessly in the universe of integer sets, the number of sets with more sums than differences increases with m, but the probability that a randomly encountered member of the population has the MSTD property falls to zero as m increases.

Finally, where did this problem come from? The locus classicus appears to be an unpublished 1967 edition of a list of unsolved problems compiled by Hallard T. Croft of the University of Cambridge. Other authors, citing the Croft list, attribute the conjecture that differences always outnumber sums to John Horton Conway. Nathanson writes, however: “I asked Conway about this at the Logic Conference in Memory of Stanley Tennenbaum at the CUNY Graduate Center on April 7, 2006. He said that he had actually found a counterexample to the conjecture, and that this is recorded in unpublished notes of Croft.” The mention of Croft refers to the same 1967 list that others cite in attributing the conjecture to Conway. I have not seen this document; if anyone can send me a copy, I would be most grateful. Here are a few more slightly less obscure references:

Marica, John. 1969. On a conjecture of Conway. Canadian Mathematical Bulletin 12:233–234.