When Pierre de Fermat complained that he didn't have space to write the proof of his famous “Fermat's Last Theorem”, he had only run out of room in the margin of a book. Now, a pair of mathematicians at the University of Liverpool in the UK have produced a 13GB proof that's sparked a debate about how to test it.
The …

COMMENTS

Test? No problem

The entire proof is reduced to 1 nm depressions on the head of a pin, which is dipped in hemlock and inserted into each author of the proof. If they live, the proof is true; if not, well, you get the picture.

Re: Test? No problem

The head of a pin is approx 1mm x 1mm. Assuming each bit is 1nm wide, one can fit 10^12 bits on the head of a pin, so theoretically you can fit about 116 GB of data there. Each bit would be about 7 atoms across.
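For what it's worth, the arithmetic checks out; here's the same estimate redone in a few lines of Python (my own sketch, obviously nothing to do with the proof itself):

```python
# Area of a 1 mm x 1 mm pin head, measured in 1 nm x 1 nm bit cells.
mm_in_nm = 1_000_000          # 1 mm = 10^6 nm
bits = mm_in_nm * mm_in_nm    # 10^12 one-bit depressions

# Convert 10^12 bits to gibibytes (2^30 bytes):
gib = bits / 8 / 2**30
print(round(gib, 1))          # 116.4
```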

Re: Test? No problem

Yes indeedy

“Imagine a random, infinite sequence of numbers containing nothing but +1s and -1s. Erdős was fascinated by the extent to which such sequences contain internal patterns. One way to measure that is to cut the infinite sequence off at a certain point, and then create finite sub-sequences within that part of the sequence, such as considering only every third number or every fourth. Adding up the numbers in a sub-sequence gives a figure called the discrepancy, which acts as a measure of the structure of the sub-sequence and in turn the infinite sequence, as compared with a uniform ideal.”

Re: Yes indeedy

I'm a mathematician, but this is based on a quick reading of this article alone so may be complete rubbish:

Imagine an endless list of random pluses and minuses.

Take any section of that list, or say every other symbol, or whatever kind of pattern you like from it. This gives you another list of pluses or minuses that you've plucked from the original list.

Work out whether you have more pluses than minuses in that list, or the other way around (or an equal number of both). The difference is called the "discrepancy". A discrepancy of zero means there's the same number of pluses and minuses.
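For the programmers reading: that count is a one-liner. A toy sketch of my own (the name `discrepancy` is just my label for it):

```python
def discrepancy(symbols):
    """Number of pluses minus number of minuses in a list of +1/-1 values."""
    return sum(symbols)

print(discrepancy([+1, +1, -1, +1, +1]))  # 3
print(discrepancy([+1, -1]))              # 0
```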

Using your (carefully-chosen) shorter list, and the discrepancy, you could then tell whether, for example, most of the pluses are in the beginning of your original list, or whether your list alternates between pluses and minuses, or whether it has a long run of pluses followed by a short run of minuses or whatever pattern you're looking for, just by looking at your short list extracted by a certain clever pattern. You can tell things about the infinite list just by carefully choosing the rule you use to extract the shorter list.

To translate the sentence: "For any sequence, Paul Erdős believed, you could find a finite sub-sequence that summed to a number bigger than any number you could choose – but he couldn't prove it."

What I think he's saying is, you can always find a smaller list inside that infinite list that - if you choose it carefully - has a discrepancy (i.e. an excess of pluses or minuses) bigger than that of the original infinite list. So you could always "fudge" the numbers by misrepresenting the larger list with a carefully-chosen pattern.

But, to be honest, it's not entirely clear and probably a LOT more complicated than even the article makes out.

And I'd be hard-pushed to come up with something practical out of it (though I'm sure there would be - this is the sort of maths that sits behind things like coding theory and, thus, sending messages, compression, error-correction, RAID, etc.)

Re: Yes indeedy

>>"My understanding is that you can't pick any old pattern. It has to be every number, or every second number, or every third, etc. But that's just from Wikipedia."

I think that's the classic version of the problem, and it's good because it is illustrative. But in theory the problem should apply to ANY pattern you come up with, so long as that pattern does not depend on foreknowledge of the numbers. I mean, however complex your criteria for selection, in an infinite sequence there ought to be a subset with discrepancy greater than any value you name. No?

Re: Yes indeedy

I think what's in Wikipedia is more likely to be correct, as it makes enough sense that I can understand exactly what it means. And it seems such a deceptively simple problem that it's surprising that a proof - not of the full theorem, but only that the discrepancy must be bigger than 2, rather than any number you care to name - takes 13 gigabytes.

Re: Yes indeedy

Perhaps I don't understand it either.

Given some finite subsequence, there is an upper bound on its sum. That is, if the sequence selected happened to be a run of all '+', the sum will be equal to the length of that subsequence. It can't be bigger.

Now, given the original infinite sequence, its sum is determined by probability. The most probable sum is zero, but there is some probability that it is greater (or less). And given most probability distributions (that I am familiar with), they approach but never reach a probability of 0 at a sum of +/- infinity. So there is a non-zero probability that there is a sequence sum larger than any bounded value that I can think of.

Or there's something about infinite series and probabilities that I'm missing.

Re: Yes indeedy

Reading the Wikipedia definition of the problem seems to limit the scope a bit. The idea is that for any sequence and any value C, there should be some values d and k such that the sum of the elements x_d, x_2d, ..., x_kd yields a number (discrepancy) greater than C.

Neither of the definitions provides any limits on the type of infinite +1/-1 sequence, which would make the problem trivially false: with a sequence of all -1s, it would never be possible to create a positive discrepancy at all. Alternatively, an infinite sequence of "+1, -1" would never yield a discrepancy greater than 1.

As such, it's likely that the problem is more concerned with random sequences, or at least sequences with an infinite number of +1s and -1s. Then the problem starts to look like a simple application of the statistical law of large numbers, from which we would conclude that, yes, for every sequence containing an infinite number of +1s, it must be possible to select every dth element from the first k such that any given value C could be exceeded. Moreover, it must also be possible to find such a string given a fixed d, because any randomly chosen subsequence of a random sequence is itself a random sequence. Then all that's left is to find a k (> d*C) that sums to C+1, which would be a simple linear search through the infinite sequence. Of course, having invoked the law of large numbers, this can't be considered a formal proof. Trying to come up with a mathematical proof feels like looking at the proofs for infinite series in general: they're interesting, but all kinda weird.
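The "simple linear search" at the end is easy to sketch (my own toy code, with an arbitrary cap of a million terms standing in for the infinite sequence; this illustrates the heuristic, it proves nothing):

```python
import random

def terms_until_exceeds(terms, C):
    """Add terms one at a time until the running sum leaves [-C, +C].
    Returns how many terms that took, or None if the stream runs out."""
    total = 0
    for k, x in enumerate(terms, start=1):
        total += x
        if abs(total) > C:
            return k
    return None

# Every dth element of an i.i.d. +/-1 stream is itself an i.i.d. +/-1
# stream, so we can draw the sampled subsequence (d = 3, say) directly.
rng = random.Random(0)
sampled = (rng.choice((-1, +1)) for _ in range(10**6))

k = terms_until_exceeds(sampled, C=5)
print(k)  # some modest k: a +/-1 random walk almost surely escapes [-5, +5]
```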

Alternatively, Wikipedia and the New Scientist may just suck at explaining all the details of the problem.

Re: Yes indeedy

>>"Neither of the definitions provide any limits on the type of infinite +1/-1 sequence, which would make the problem trivially false: with a sequence of all -1s, it would never be possible to create a positive discrepancy at all."

That's not an arbitrary pattern. That's a case of looking at the data first and then constructing a pattern to fit. The pattern must be independent of the random sequence. So you choose, e.g. every third number, and then apply it to a random string. Not study the string and construct a pattern.

Re: Yes indeedy

And I'd be hard-pushed to come up with something practical out of it (though I'm sure there would be - this is the sort of maths that sits behind things like coding theory and, thus, sending messages, compression, error-correction, RAID, etc.)

My understanding is that discrepancy theory has applications similar to Ramsey theory - it tells you how much structure will inevitably arise in a random distribution (well, it really puts a lower bound on the amount of such "accidental" structure), so it's useful for things like deciding what threshold constitutes a probable signal (rather than noise that looks like a signal) and that sort of thing.

I've seen Ramsey theory applied to some machine-learning problems to help filter out false correlations, for example.

The answer isn't 42?

For every infinite string it is possible to define a finite substring with a discrepancy of any size.

If the discrepancy is 42, you know you have found a substring which fits the problems of the Adams universe.

I'd suggest, if it isn't, and you still have a problem, that you try a substring of different size if the discrepancy is positive, and use a different pattern if it goes in the wrong direction or is near-zero.

Of course the answer is 42!

Been an issue ever since the 4 colour theorem proof came out.

Computer generated so too big to test by hand.

Yes, as a puzzle it sounds a bit pants, but "satisfiability" and +1/-1 are the sort of phrases that crop up in minimal digital logic design and custom logic layout for minimum surface area (and hence cost) on a chip.

As the complexity of proposed mathematical proofs increases, the likelihood of humans being able to fully understand them falls, and the chance of mistakes in the proof that are not picked up by the reviewers increases.

Re: maths

Proof by induction

This is effectively 'proof by induction', where you say (sort of): if it's true for this, it must be true for that; and I've just shown it for this, so it must be true for all of that. It works if you dig deeper into it. Proof by induction can get messy, because you have to show (sometimes laboriously) that it is true for a certain case or cases.

Re: wikipedia download too big?

"lots of +1 and -1 sounds like it should zip (or rar, or bzip, or etc.) fairly well" - really not sure where you got that idea; if you think of +1/-1 in terms of good old binary 0/1, you should be well aware that while some such sequences zip quite well, others (like JPEGs, MP4s etc.) don't compress practically at all. It really depends on how random your sequence already is...
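You can see the difference with Python's zlib on two toy sequences stored as '0'/'1' text (my own demo; exact sizes will vary with the compressor, but the gap won't):

```python
import random
import zlib

n = 10_000
# A highly structured +/-1 sequence (strict alternation), stored as '0'/'1' text:
structured = ('01' * (n // 2)).encode()
# A pseudo-random sequence of the same length:
rng = random.Random(42)
noisy = ''.join(rng.choice('01') for _ in range(n)).encode()

# The structured stream collapses to almost nothing; the noisy one can't
# shrink much below 1 bit per symbol (about n/8 bytes):
print(len(zlib.compress(structured)) < len(zlib.compress(noisy)))  # True
```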

Re: Where did they get the numbers?

Empirical testing? As in run a test on a random selection of data and assume if it gives a decent p value then the hypothesis can be considered correct?

That sort of thing doesn't cut it as mathematical proof.

It might be good enough for sciences less pure than maths (e.g. sociology, physics, all other sciences), but even a statistical proof - "the probability of this being wrong is 0" - isn't enough for a mathematician.

Re: Where did they get the numbers?

>>"Empirical testing? As in run a test on a random selection of data and assume if it gives a decent p value then the hypothesis can be considered correct? That sort of thing doesn't cut it as mathematical proof."

Well, I agree. But look up what a SAT attack is. Unless I've misunderstood the article, they ran a SAT solver on a sufficiently large data set and "proved" it statistically. Hence my question about the randomness of the source data. Even if one accepts a proof by demonstration rather than induction (which you do not, and I am reluctant to), that remains a potentially serious flaw in the proof.

Re: Where did they get the numbers?

That's not that serious a question, unless you haven't read the article. The theorem isn't about one particular sequence of -1s and +1s; it's about any and all possible such sequences. No matter how hard you try, you shouldn't be able to construct a series so cannily arranged that - choosing among the partial sums of the series itself, of every 2nd element, every 3rd, every 4th, and so on - you can't make that result as large as you wish.

I smell snake oil .....

When the lottery started, I knew a guy who was convinced he could predict the numbers after a few draws. He created a spreadsheet with a list of draws, lowest ball to highest. He then worked out the average for each column. He then worked out the deviation from the average for each ball, and then worked out an average of that. He then used those derived averages to generate a list of numbers which he then played.

Not a new problem...

Surely this is hardly new. When Kenneth Appel and Wolfgang Haken proved the four colour theorem in 1976, it was originally not accepted by mathematicians, as the computer-assisted proof was far too large to be proof-read by human beings. It gained a sort of grudging acceptance.

However, I always thought the objection was flawed in principle. Firstly, it's possible to consider a computer program (and the hardware on which it runs) as a mathematical entity which is itself subject to inspection and proof-reading. There have even been attempts to produce mathematically provable programs (although that is only possible in principle for some sorts of programs) and even processors - anybody remember VIPER? Of course, any proof-reading of a program, especially one which cannot be mathematically verified, is open to human error. However, that is also true of human proof-reading of mathematical proofs. The possibility of a human error is always there, especially for very complex proofs.

It's therefore true, in principle, that the body of mathematics is open to human errors, albeit not on the scale of complex computer programs. It is therefore a problem of scale, and not principle.

And somewhere in that sequence is every other picture you could possibly imagine and, indeed, every other finite sequence you could imagine, including arbitrarily long runs of 1's and 0's, the complete text of Wikipedia, the answer to the ultimate problem of life, the universe and everything, etc. However, I wonder whether you could specify a sub-sequence in the sense of the proof by specifying its content or properties rather than its position? Such content addressability is an everyday aspect of computing; I wonder how this would affect the proof and/or how such a strategy would be excluded?

And somewhere in that sequence is every other picture you could possibly imagine and, indeed, every other finite sequence you could imagine including arbitrarily long sequences of 1's and 0's

This is a popular claim, but it's incorrect, or more precisely the conditions you've specified are not sufficient to make it correct.

Consider the infinite sequence defined by the following:

1. A random finite number > 1 of +1s

2. A -1

3. Go to 1

This is infinite and random: if you are looking at one element of the sequence and it is a +1, you cannot predict the next element; and if it's a -1, the next two elements must be +1s, but you cannot predict the one after those. Yet it never contains the subsequence {-1,-1} (or any longer run of -1s).
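If anyone wants to sanity-check the construction, here's a quick sketch of a finite prefix of it (my own toy code, using runs of 2 to 10 +1s):

```python
import random

def prefix_of_construction(num_runs, rng):
    """A finite prefix of the sequence above: a random run of at least
    two +1s ("a random finite number > 1"), then a single -1, repeated."""
    seq = []
    for _ in range(num_runs):
        seq.extend([+1] * rng.randint(2, 10))
        seq.append(-1)
    return seq

prefix = prefix_of_construction(1000, random.Random(1))
# No two -1s are ever adjacent:
print(any(a == b == -1 for a, b in zip(prefix, prefix[1:])))  # False
```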

"Random and infinite sequence" does not imply "contains every possible subsequence".

Now, insofar as this sequence has some (non-accidental) structure - subsequences of -1s are restricted in length - it is not purely random. But no one in this thread said "purely random". Further, no finite subsequence of this not-purely-random sequence can be proven not to be a subsequence of a purely random infinite sequence: you can't tell by inspection that it has non-accidental structure, only by definition.

Re: This is the way it's going to go, isn't it?

Actually, in this instance they are not confused. The mathematicians ran a SAT solver, essentially "proving" it statistically. This is not a proof by induction. That's why the proof is so large.

Re: You got it wrong, vultures

I didn't see anything in the article about specific values of C. Did I miss something, or was the article updated?

Technically, the way the Erdős Discrepancy Problem is phrased, at least according to the 'pedia, this work constitutes a proof for C=2.

The problem is: for any ±1 sequence and any C, there exist k and d such that |x_d + x_2d + ... + x_kd| > C. So contrary to the informal explanations in previous posts, it's "take a subsequence starting from the beginning of the sequence, then take every dth element of that, add them up, take the absolute value, and compare it to C".

(Apologies for the formatting. Reg, can we have LaTeX? MathML? Probably some time after we get proper support for the HTML PRE element, I guess.)

The shorter version: you start at the beginning and add every Nth element, and eventually you'll get arbitrarily far from zero, if you pick N correctly.

In any case, though, it appears this latest work is a proof for C=2, since the problem says "greater than C".
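To make the formal statement concrete, here's a toy sketch of my own (`homogeneous_discrepancy` is just my name for the quantity |x_d + x_2d + ... + x_kd|):

```python
def homogeneous_discrepancy(x, d, k):
    """|x_d + x_2d + ... + x_kd| for a 1-indexed sequence given as a function."""
    return abs(sum(x(i * d) for i in range(1, k + 1)))

# The alternating sequence +1, -1, +1, -1, ...: x_n = +1 if n is odd, else -1.
alt = lambda n: +1 if n % 2 else -1

# Every even-indexed element is -1, so stride d = 2 just piles them up:
print(homogeneous_discrepancy(alt, d=2, k=50))  # 50
print(homogeneous_discrepancy(alt, d=1, k=50))  # 0 (terms cancel in pairs)
```

Which, incidentally, shows why the tame "+1, -1, +1, -1, ..." sequence is no counterexample: pick the stride d = 2 and its discrepancy grows without bound.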

But...

Computers and Proofs

I think it was the English mathematician Keith Devlin (formerly of Lancaster Uni and now, I think, of Stanford) who pointed out that there is as likely to be a mistake in a 2-million-line computer program as there is in a 2-million-line (human-created) mathematical proof.

The computer-based proof of the 4 colour map theorem works because it exhaustively demonstrated that all maps with certain features could be coloured using at most four colours, and that the list of those certain features was complete for planar maps.

The assertion of a proof for the Erdős discrepancy problem seems to require something different, as clearly not all (or indeed any) infinite random binary sequences can have been fully analysed. Seems like I'm going to have to read the original papers. Although if Erdős couldn't prove it, it is going to be tricky.

Re: Computers and Proofs

I think it was the English mathematician Keith Devlin (formerly of Lancaster Uni and now, I think, of Stanford) who pointed out that there is as likely to be a mistake in a 2-million-line computer program as there is in a 2-million-line (human-created) mathematical proof.

Does he have a proof of this? Is "as likely" meant to be a precise statement of probability? Because that looks unlikely to me, for three reasons:

1. Programming languages are heavily constrained by their syntactic and semantic rules - more so even than the formal prose of non-symbolic lines of human-written mathematical proofs. That means lower entropy per line, and thus a lower probability of an error (in a program that is successfully interpreted by the machine). This is equivalent to saying that a machine-readable proof will be longer than a human-readable proof of the same information entropy. (We probably could use Kolmogorov complexity theory and Chaitin's algorithmic information theory to set some bounds on that, but I'm only posing it as an objection.)

2. Along similar lines: the machine implicitly checks a vast number of correlated values in the process of executing the proving program (by not wandering into undefined behavior and failing to produce a coherent result). This redundancy reduces the information entropy of the transformation of the static program into its dynamic execution, and so further reduces the probability of error.

3. Human-constructed proofs are rarely entirely symbolic (the programs of Whitehead & Russell, Hilbert, etc notwithstanding), and so ask the interpretant to engage in rational deliberation of claims. That is the source par excellence of mathematical innovation, but it is also fraught with error; indeed, that is its strength.

So I suspect an N-line[1] computer program, if it executes successfully, in fact has a probability of error that is only a fairly small fraction of the probability of an N-line human-written proof, once N gets sufficiently large. Of course, that's still plenty of room for error. But also of course, the probability of error can be further reduced by many other mechanical techniques, and it's unclear (and per AIT probably unprovable) how far down that process can go.

[1] It might further be objected that source-lines-of-code is a meaningless metric for the complexity of a program anyway. But by Kolmogorov equivalence, N lines in any one Turing-complete programming language is equivalent to N*C lines in any other, where C is a constant and usually relatively small. So SLOC is good enough for this line of argument.

rephrased?

If you look at the +1 and -1 values as steps in a random walk over the number line, then there must be some subsequence of steps that moves you arbitrarily far from the origin. It seems intuitively obvious: on the one hand, it's kind of like an infinite-monkey argument - or, say, that if you take all the digits of pi in some number base and squint at them in the right way (cherry-picking some subsequence), you'll find some pattern that looks meaningful. Actually, the film Pi uses this as its premise, now that I think of it.
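The random-walk picture is a couple of lines of Python (my own sketch):

```python
import random
from itertools import accumulate

rng = random.Random(7)
steps = [rng.choice((-1, +1)) for _ in range(10_000)]

# The walker's position after each step:
walk = list(accumulate(steps))
print(max(abs(p) for p in walk))  # how far it strays; typically of order sqrt(n)
```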

The other way it seems intuitively obvious is that if the sequence is actually random and you start slicing it up, you're bound to be able to find some sequence that has lower entropy than expected. It shouldn't matter, though, because the entropy of the entire sequence is fixed (so using Kolmogorov complexity to define the entropy, the fact that there are some signs of structure doesn't mean that this negentropy is useful for compressing the entire string). That's the way I see it, anyway. I just don't know what the point is of looking for structure in unstructured (random) strings.

Re: rephrased?

It's not about whether structure exists. It's whether, for any arbitrary pattern you can come up with (i.e. one not based on the numbers themselves), a subset exists with discrepancy greater than any bound you name. We know this intuitively, but how can we prove it?

Bah!

Two things:

a) Fermat was pulling the old "hence it is obvious from the diagram" stunt that was roundly rejected as sound reasoning or excuse-making by my Applied Maths teacher in '73.

2) It is *NEVER* appropriate to point to a Wikipedia page to dodge explaining something mathematical to the non-hard-sums-numpties out there. If ever there was a field of study in which Wikipedia is solidly demonstrated to be an encyclopedia for anyone BUT the people who need it, it is mathematics. Opening paragraphs reject the Britannica's technique of beginning with a simple explanation and then working up to a full-blown exposition in sumsspeek, in favour of a wall of sumsblither with links here, there and everywhere (often running in big, unhelpful circles) which only compound the gibberish and waste an afternoon to no good effect.

The math wonks who put together the Wikipedia math pages should take an afternoon to follow the linguistics pages and, chastened by yet another fine example of blither over helpfulness, return and recompose every damned page.

Re: Bah!

a) Fermat was pulling the old "hence it is obvious from the diagram" stunt that was roundly rejected as sound reasoning or excuse-making by my Applied Maths teacher in '73.

Oh, I don't know. I think that Pythagoras' theorem is self-evident from this diagram

2 [sic]) It is *NEVER* appropriate to point to a Wikipedia page to dodge explaining something mathematical to the non-hard-sums-numpties out there

Oops. I appear to have linked to a wikipedia page. Oh noes!

In all fairness, though, I do agree with you that Wikipedia pages on maths are generally bad. They're definitely failing in their attempts to explain most concepts to the general reader, and often even to those who aren't frightened by maths and/or have a modest level of maths ability.

What's Random Doc?

I'd be interested to know how they generated random numbers, because the random number generators used by computers are not really random at all; it would be better to call them "sequence generators", as with the same starting configuration they will generate the same numbers.

The only truly random events are at the quantum level, such as radioactive decay, which can be tricky to incorporate into a PC. Radio noise (cosmic background radiation) can be utilised to provide an adequate source of random numbers, but the pseudo-random number generators in normal PCs are not random enough for this sort of thing.
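The "sequence generator" point is easy to demonstrate (my own sketch, using Python's built-in Mersenne Twister, which is exactly this kind of deterministic generator):

```python
import random

def pseudo_stream(seed, n):
    """The first n +/-1 outputs of Python's Mersenne Twister for a given seed."""
    rng = random.Random(seed)
    return [rng.choice((-1, +1)) for _ in range(n)]

# Identical starting configuration => identical "random" numbers:
print(pseudo_stream(123, 20) == pseudo_stream(123, 20))  # True
```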

Proof

Err... "random sequence of 1 and -1 must have an offset" is the layman's simplification.

Empirical evidence is looking at a badly wired Ethernet link (i.e. one that is truly floating). Here the +1 and -1 charges should cancel if they are equal in number. (A day of general Ethernet traffic could be considered a random sequence.) But you will find that the charge will drift in one direction or the other over time, indicating a bias for +1 or -1 in the signal.

Re: Proof

Err... "random sequence of 1 and -1 must have an offset" is the layman's simplification.

Too simple, as that's neither a restatement of the problem, nor true. But it's looking in the right direction.

In the EDC:

1. The sequence must be infinite, and need not be random.

2. The sequence as a whole need not have a bias from zero.

3. The EDC hypothesizes that you pick a subsequence starting from the beginning and sum every Nth element in that range; it says that if you pick N correctly and take a long-enough subset, you can get arbitrarily far from zero.

So the EDC conjecture doesn't just say there's a bias; it says that if you take a large enough subsequence (starting at the beginning) and choose the correct stride, there's an arbitrarily large bias.

This latest work claims to prove that this is always true for a bias of up to 3. No one's proven yet whether it's true for larger values, much less proven the general case.

So glad I'm not a mathematician

I've taken Calculus through Calc 2 plus Linear Algebra and Theory of Computation, and I have to tell you that I'm glad that someone else likes to do nothing but sit around all day and imagine new problems to solve in mathematics, because I personally never want to punish myself with exposure to it again. You have to be a special kind of crazy to like it.

You could read the actual paper

???

Using a fixed 'pattern' to sample the bulk with seems counter-intuitive as a method of proving this, as any proof that the bulk data is truly random is next to impossible in the first place. Rather, does this exercise attempt to extract a 'proof of randomness' of the original bulk stream via this method?

Surveys can prove anything

In my opinion, this proves that surveys containing yes (+1) and no (-1) type questions for a finite number of people (which is always the case) CAN PROVE ANYTHING. So, conclusion: DON'T TRUST SURVEYS.