Posted
by
Soulskill
on Tuesday February 18, 2014 @02:41PM
from the which-this-margin-is-too-narrow-to-contain dept.

mikejuk writes "Mathematicians have generally gotten over their unease with computer-assisted proofs. But in the case of a new proof from researchers at the University of Liverpool, we may have crossed a line. The proof is currently contained within a 13 GB file — more space than is required to hold the entirety of Wikipedia. Its size makes it unlikely that humans will be able to check and confirm the proof. The theorem that has been proved is connected with a long-running conjecture posed by Paul Erdős in 1930. Discrepancy theory is about how evenly it is possible to distribute something. It occurs in lots of different forms and even has a connection with cryptography. In 1993 it was proved that an infinite series cannot have a discrepancy of 1 or less; this proved the theorem for C=1. The recent progress, which pushes C up to 2, was made possible by the clever idea of using a SAT solver — a program that finds variable assignments that make an expression true. Things went well up to length 1160, which was proved to have discrepancy 2, but at length 1161 the SAT solver returned the result that there was no satisfying assignment. The negative result generated an unsatisfiability certificate: the proof that a sequence of length 1161 has no subsequence with discrepancy 2 requires over 13 gigabytes of data. As the authors of the paper write: '[it]...is probably one of longest proofs of a non-trivial mathematical result ever produced. ... one may have doubts about to which degree this can be accepted as a proof of a mathematical statement.' Does this matter? Probably not — as long as other programs can check the result, and the program itself has to be considered part of the proof."
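The "discrepancy" in the summary is easy to state concretely: take a finite sequence of +1s and -1s, and along every homogeneous arithmetic progression d, 2d, 3d, ... look at the running sums x_d + x_2d + ... + x_kd; the discrepancy is the largest absolute value any such sum reaches. A quick Python sketch of that definition (a toy illustration, not the researchers' code):

```python
# Toy sketch: compute the discrepancy of a finite +/-1 sequence over
# homogeneous arithmetic progressions (d, 2d, 3d, ...), the quantity
# bounded in the Erdos discrepancy problem.
def discrepancy(x):
    """x is a list of +1/-1 values; returns the maximum over all d and k
    of |x[d-1] + x[2d-1] + ... + x[kd-1]| (1-based indexing in math)."""
    n = len(x)
    worst = 0
    for d in range(1, n + 1):
        s = 0
        for k in range(d, n + 1, d):  # indices d, 2d, 3d, ... up to n
            s += x[k - 1]
            worst = max(worst, abs(s))
    return worst

# An alternating sequence keeps the d=1 sums tiny, but d=2 picks out
# every second term, and those all agree:
print(discrepancy([1, -1, 1, -1, 1, -1]))  # 3
```

The result discussed here says that no infinite +/-1 sequence can keep this quantity at 2 or below forever; the SAT certificate shows it already fails by length 1161.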

Funny thing about this. They've checked it. Actually, their "check" of this proof is many orders of magnitude more rigorous than when, for example, a reviewer "checks" a math paper for errors before firing off a positive review. Nondisclaimer: I'm a mathematician.

My understanding is that checking an output of a proof assistant/generator is a trivial matter (i.e., a trained monkey should be able to do it). It's just that it's a lot of data even for the most patient humans in this case.

Mathematicians are supposed to be able to think at a higher level of abstraction than most other folks. Any mathematician who claims that 'this is too much for a human to check' is an idiot. It's not too much. We understand how computers work. They're way less error-prone than humans.

1) Verify the proof that the verification algorithm works.
2) Obtain several independent simple, portable implementations of said verification.
3) Run said implementations on the proof certificate on a variety of hardware.

Trust the math, and where it comes to the hardware and software, trust but verify. Too long to check without aid of a computer? Sure, I'll buy that. But you'd have to be an idiot to want to check this proof without a computer. Why is this news? (actually, the result in discrepancy theory is wonderful, and I'm very happy to see it here on Slashdot... but massive computer proofs are truly nothing new)

But you'd have to be an idiot to want to check this proof without a computer.

Historically, mathematicians have resisted computer-only proofs. They want eyeballs end-to-end. Your ideas (independent implementations, hardware, etc.) are sound/feasible from a software engineering standpoint, but unsatisfying to mathematicians. (Not being a mathematician myself, I'm ill-suited to testify as to why that's the case, but so it is.)

Maybe one day that mindset will be abandoned, but what's more likely to happen is that mathematics will bifurcate: there will be the set of mathematics that re

Not exactly. Machine-checkable outputs tend to come in one of two varieties: certificates (of which this is an example, since it's an UNSAT certificate) and proofs proper.

Proofs (which are the sort of things you'd feed to Coq or Isabelle) tend to rely heavily on built-in tactics. There are some theories (classical logic, intuitionistic logic, Presburger arithmetic, Tarski arithmetic, etc) which are known to be decidable, but the decision procedures are beyond most humans, let alone trained monkeys. For exam

You're the imaginative one, aren't you? Distributed proofs tend to duplicate work -- at least k contributors prove every claim, and each contributor should have some portion (maybe log(# contributions) or so, IIRC) of their work double-checked by an expert. Sure, it takes k times longer, but you can design it to take advantage of statistics to certify the proof more trustworthy than the average math paper.

I wish modern mathematicians believed the math that they prove day after day for undergrads. If th

Even if we cannot prove it formally, systems like this can produce results with a very high probability of being correct if we simply test those results. If you can get multiple automated proof systems to claim it is impossible, and you trust each automated proof system with a moderate degree of certainty, you can trust the result with at least that same certainty.

For many problems, having something "statistically proven" is good enough.
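The parent's point about trusting multiple systems can be made quantitative. A back-of-envelope sketch, assuming the checkers fail independently (a big if, since separately written programs can still share bugs or input-format assumptions; the error rate below is a made-up number for illustration):

```python
# If each of k independent checkers wrongly accepts a bad proof with
# probability p, the chance that all k wrongly agree is p**k.
# Independence is the loudly-labeled assumption here.
p = 1e-3  # hypothetical per-checker error rate
for k in (1, 2, 3):
    print(k, p ** k)  # error probability shrinks geometrically with k
```

So three moderately trustworthy checkers agreeing already beats the reliability of almost any hand-refereed proof, which is the "statistically proven" standard the parent is pointing at.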

Coq and Isabelle are far better choices, being better supported and open source. And if you need their credentials, there is a Coq-checked proof of the four colour theorem, and an Isabelle proof that the seL4 kernel is secure.

I wonder about the usefulness of a unit of measure that constantly changes. Perhaps we should also consider storing the Library of Congress inside of a temperature controlled, airless chamber. We could then store this unit in the Library of Congress for further refere-

What you say would be true, if the word "that" had appeared in the original summary; however, it didn't. Here's the original wording: "The proof is currently contained within a 13 GB file — less space than is required to hold the entirety of Wikipedia." Nothing to stress over or interpret there, although I will grant you that the inappropriate use of the em dash does confuse matters a bit.

I suppose it's all moot now anyway, since the summary has been edited to protect the guilty and to correct the mistak

Science requires results that are testable, provable, and repeatable. If a human cannot understand the proof, then he cannot participate in the science. This is likely to be referred to as an "early" example of machine-exclusive science.

I'd hesitate to call one big for loop "AI." The interesting part of the proof is the reduction to SAT, and that's easily understood by mathematicians. The computer part is a straightforward and dull brute force search.

As someone nearing the completion of his Ph.D. in a subject close to SAT, I can say that SAT solving does not resemble "one big for loop", not a bit. A modern SAT solver can solve problems with millions of variables and hundreds of thousands of clauses. In contrast, a brute force for loop would require O(2^N) iterations where N is in the millions, which is like eternity. As an exercise, please try to write a trivial solver that can handle even 100 variables.
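To make the scale of that exercise concrete, here is the "trivial solver" in question: exhaustive search over all assignments (a toy sketch, nothing resembling a real CDCL solver):

```python
from itertools import product

# Brute-force SAT: try every one of the 2^n assignments. Fine at
# n = 20 or so; hopeless at n = 100 (2^100 iterations), which is
# exactly the parent's point.
def brute_force_sat(n_vars, clauses):
    """clauses: list of lists of DIMACS-style literals, e.g. [[1, 2], [-1]]
    means (x1 OR x2) AND (NOT x1). Returns a satisfying assignment as a
    dict {var: bool}, or None if the formula is unsatisfiable."""
    for bits in product([False, True], repeat=n_vars):
        assign = {v + 1: bits[v] for v in range(n_vars)}
        if all(any(assign[abs(lit)] == (lit > 0) for lit in clause)
               for clause in clauses):
            return assign
    return None

# (x1 or x2) and (not x1) is satisfiable with x1=False, x2=True:
print(brute_force_sat(2, [[1, 2], [-1]]))  # {1: False, 2: True}
# (x1) and (not x1) is not:
print(brute_force_sat(1, [[1], [-1]]))     # None
```

Real solvers avoid this blowup by learning new clauses as they go, which is also what makes their UNSAT answers checkable after the fact.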

Also, unlike what you may think, a SAT proof is not a list of "I tried a=1 and it did not work out, and this is the proof that a=0". A standard SAT proof [wikipedia.org] deduces new clauses from the original problem by applying the resolution rule [wikipedia.org] repeatedly. The newly deduced clauses reduce the search space and, if the problem is unsatisfiable, the solver ends up with the empty clause, which is always FALSE. The proof is a collection of resolution steps that lead to FALSE.
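The resolution refutation described above can be checked mechanically in a few lines. This is a toy checker for a plain resolution proof; real certificates use formats like DRUP/DRAT and are considerably more involved, and the clause/step encoding here is my own invention for illustration:

```python
# Toy checker for a resolution refutation. Clauses are frozensets of
# integer literals (negative = negated). Each proof step resolves two
# earlier clauses on a pivot variable; the refutation is accepted only
# if the final derived clause is the empty clause (always FALSE).
def resolve(c1, c2, pivot):
    assert pivot in c1 and -pivot in c2, "invalid resolution step"
    return (c1 - {pivot}) | (c2 - {-pivot})

def check_refutation(clauses, steps):
    """clauses: initial clauses; steps: (i, j, pivot) triples indexing
    into the growing list of clauses. Returns True iff the last derived
    clause is empty, i.e. the certificate proves UNSAT."""
    derived = [frozenset(c) for c in clauses]
    for i, j, pivot in steps:
        derived.append(resolve(derived[i], derived[j], pivot))
    return len(derived[-1]) == 0

# (x1 or x2), (not x1 or x2), (not x2): resolve the first two on x1 to
# get (x2), then resolve with (not x2) to reach the empty clause.
print(check_refutation([{1, 2}, {-1, 2}, {-2}], [(0, 1, 1), (3, 2, 2)]))  # True
```

Note that the checker never searches for anything; it only replays the steps the solver recorded, which is why checking a certificate is so much easier than producing it.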

SAT solvers are AI at least since:

1. They employ search (not unlike a chess engine).

2. They have non-trivial heuristics (not unlike a chess engine).

3. The heuristics evolve and improve over the course of a run.

4. They are able to deduce new clauses from the original problem.

5. Many solvers employ a lot of smarts to simplify the problem even before starting search.

SAT is clearly NP complete, and clearly the existence of good SAT solvers is not a proof that P=NP. This means that there will be relatively small problems that SAT solvers won't be able to solve. On the other hand, most real-world problems have a hidden structure which SAT solvers are able to find and use to their advantage.

Yeah, I'm familiar with SAT solvers and the fact that they aren't REALLY full brute force; I oversimplified it a bit for the Slashdot crowd. Might have gone a little too far on the "lies to children [wikipedia.org]" scale, mea culpa.

My point was that anyone with high school level math experience can understand the basic problem of boolean satisfiability; I was trying to draw a distinction between problems that are beyond human comprehension and those that are merely beyond human time and ability, with huge SAT instances falling into the latter category. Shouldn't have glossed over the details quite as badly as I did.

You are right that the basic concepts of SAT solving can be understood by a smart person with high school math experience. But I don't agree that it's that simple when it comes to the fine details of modern SAT solvers. Some solving steps are non-trivial and sometimes look weird, especially the steps the solver takes to simplify the problem in the middle of the run.

Today, SAT is not only about binary resolution. Some of the stuff is difficult even for Grad students. There are hundreds of non-trivial pape

I was trying to draw a distinction between problems that are beyond human comprehension and those that are merely beyond human time and ability, with huge SAT instances falling into the latter category.

Exactly, the use of SAT unsatisfiability certificates is just a proof strategy... Just like reductions used to show complexity classes, nobody comprehends all of them...

SAT is clearly NP complete, and clearly the existence of good SAT solvers is not a proof that P=NP. This means that there will be relatively small problems that SAT solvers won't be able to solve.

Enjoyed your post, but have to correct a small quibble.

From a mathematical standpoint at least, being NP complete doesn't imply that there are some problems that are unsolvable; merely that they won't be solvable in any reasonable amount of computing time. If you have a few hundred billion years of compute time available, a SAT solver might be able to solve even those small problems you mention. Of course, from a practical perspective, none of us are going to be here to get the result in those situations, making them unsolvable from a practical standpoint.

(On the other hand, once the billions of aeons roll by and the machine goes 'ding' and spits out an answer, we do know that we can verify it in poly time. Huzzah!)

While all of this may seem ultra-pedantic, there is enough confusion about NP out there that someone reading your post may get the idea that things that are NP-complete are unsolvable. They're not unsolvable -- we can typically fashion algorithms to solve them; it's simply that the best known such algorithms take super-polynomial (typically exponential) time, and thus may have runtimes exceeding the expected lifetime of the solar system, even with every cycle of compute time ever invented pushed at it.

...unless, of course, someone comes up with a proof that P = NP, in which case all those NP-complete problems can be transformed into P problems. Sure, they might still take a few hundred billion years to get a solution, but at least we'd know how many hundreds of billions of years would be needed to get a solution!

It's far from the beginning. That would be in 1976 with the computer proof of the four color theorem [wikipedia.org], which was the original controversy over proofs only checkable by a computer. The 13GB proof is certainly huge, but proofs that a human can't check aren't new. I don't know how it is in mathematics, but in programming languages research, any proof that isn't computer-checked is suspect because humans are just really bad at being completely consistent at long repetitive processes like checking proofs.

Agree in principle, but I'm not sure this fails that standard to the extent that it's relevant for science to work. Sure, a human may not directly understand the entire proof. However, like with the Four Color Theorem, they can verify:

- A proof checker would catch errors if there were any, and has failed to.
- The thing it purports to prove is in fact (a representation of) the theorem the submitter claims to have proven.
- The proof generator generates only valid steps.

Actually it is worse. When somebody finally got around to looking at the 9.5 GB proof, it started like this:

All work and no play makes HAL a dull program. All work and no play makes HAL a dull program. All work and no play makes HAL a dull program. All work and no play makes HAL a dull program. All work and no play makes HAL a dull program.

In the results there is the following statement: "As any idiot can plainly see"

They wouldn't say that. Now, if you wrote "...obviously we have..." or "clearly it follows"... I remember the first time I said that to a student: Nope, it isn't obvious. Explain. Deer-in-headlights look. Whaaaaaa?

I think this is a failure of editing. Wikipedia's database is 9 GB compressed and 44 GB uncompressed (source: http://en.wikipedia.org/wiki/W... [wikipedia.org]). The statement that it is less than Wikipedia's database is a useless comparison.

I don't see why you need to go through the fuss of the 13 GB file. What was the algorithm used to make the file? Prove that the algorithm works. That's your proof. (Run the program a few times, so the probability of errors in the output is close to zero. Remember that the probability of the computer making a mistake (cosmic rays, transistor noise, etc) is smaller than the probability of a human mathematician making a mistake.)

Run the program a few times, so the probability of errors in the output is close to zero.

No. If it's indeed a proof the probability of errors must be 0, not just close to it.

He's referring to errors during runtime (electrical noise, bit flips, not enough spiders in the case, etc.), not errors in the logic. If the generator's logic is provably correct, then the things it generates are as well, as long as your hardware is working properly. There is no way to rigorously prove hardware works correctly for all input strings, for all time, for all environmental conditions, across all variations due to manufacturing, etc.

There's not really any such thing as "provably correct logic" to begin with. At some point you just have to decide that the chance of errors across the process is low enough to go on with. I think of this as the "certainty noise floor": it's not important whether the chance of error is 0, but that the chance is really quite small, because that's the best we ever get.

" Prove that the algorithm works. That's your proof. (Run the program a few times, so the probability of errors in the output is close to zero"

"probably true" is NOT a prove.

This isn't a probabilistic 'proof' - it is straight-up deterministic: the SAT result proves it true. Period.

The poster above is alluding to the fact that a random software error could occur that gives the same result erroneously. Thus running the program is used to show that this isn't the case at all.

To assert that a lengthy, complex mathematical proof entirely written by a human is absolutely true requires you to believe the human is incapable of error (Wiles's proof of FLT ran 150 pages and this is n

Gödel [wikipedia.org] and Turing [wikipedia.org] make strong cases that proving the algorithm works for some inputs that are correct proofs doesn't count as proof it will work for all correct proof inputs. So no, even if you "prove the algorithm works" it is not the same as a rigorous mathematical proof.

Forgot to mention those guys showed that such an algorithm that "works" for all valid proofs is not just difficult but mathematically impossible.

No, that's not actually what they proved; it is perfectly possible to prove that a given algorithm works for all possible inputs, and even that a proof checker works for all valid proofs. There are certainly things that they proved impossible (e.g. writing a program that can provide a proof for any true mathematical statement, or one that can determine whether two arbitrary programs are equivalent), but those don't apply here.

Gödel [wikipedia.org] and Turing [wikipedia.org] make strong cases that proving the algorithm works for some inputs that are correct proofs doesn't count as proof it will work for all correct proof inputs. So no, even if you "prove the algorithm works" it is not the same as a rigorous mathematical proof.

You're comparing apples to oranges (and lemons.)

If the algorithm can be proved correct (within whatever axiomatic system you're using) then it's correct. The End.

Gödel's incompleteness theorem shows that certain statements about axiomatic systems can be true but cannot be proved. That doesn't mean you can't be certain of something that is in fact proved (subject of course to the axioms.)

Turing's halting problem is a statement about limitations in the ability of algorithms to examine other algorithms. Again, it doesn't mean you can't prove that an algorithm is correct.

I think the issue here stems from the concept of "correct" and how knowable that value is.

Turing's halting problem is a statement about limitations in the ability of algorithms to examine other algorithms. Again, it doesn't mean you can't prove that an algorithm is correct, no matter how "correct" the algorithm appears.

That's kind of my point. Given this proof, it would show that the algorithm is incorrect if the proof is shown to be invalid, yet the proof is too long to be verified by anything but another

I think the issue here stems from the concept of "correct" and how knowable that value is.

Turing's halting problem is a statement about limitations in the ability of algorithms to examine other algorithms. Again, it doesn't mean you can't prove that an algorithm is correct, no matter how "correct" the algorithm appears.

Um, excuse me. If you're going to quote me and change what I said, then indicate your edits. I have done so above, in bold. Not that I can make sense of them.

That's kind of my point. Given this proof, it would show that the algorithm is incorrect if the proof is shown to be invalid

Wha...? That's just plain wrong. I can think up all kinds of invalid proofs of the Pythagorean Theorem. But showing that a proof is invalid does not mean the theorem is incorrect. It just means your proof is.

yet the proof is too long to be verified by anything but another algorithm, so the halting problem is definitely relevant in a discussion about algorithm-generated proofs which can't be verified by humans.

Again, Turing's halting problem illustrates limitations on the ability of algorithms to decide certain propositions. It does not mean that algorithms can't decide anything. You seem to think that it does.

Yes, some proofs can be generated by algorithms and others can be checked by algorithms, but a mathematician is necessary at some point in the process since no non-trivial generating algorithm can be shown to create only correct proofs and no universal checking algorithm can be created which generates no false positives or negatives.

Your fallacy is that one cannot trust specific algorithms to prove things because no such universal algorithm can be created.

Um, excuse me. If you're going to quote me and change what I said, then indicate your edits. I have done so above, in bold. Not that I can make sense of them.

Sorry, I realized at one point that I was editing the quote rather than my own text, and I thought I had fixed it but apparently missed one of my edits, so there's no reason it should make sense there. Entirely my fault.

Wha...? That's just plain wrong. I can think up all kinds of invalid proofs of the Pythagorean Theorem. But showing that a proof is invalid does

Note also the poster I responded to didn't say prove the algorithm is correct and error-free, but rather that it "works," which means generates a correct proof, the checking of which (probably) invokes the halting problem. I say probably because it's likely no easier to write an algorithm that is designed specifically to check this "proof" for correctness than it is for a mathematician to verify it manually, and therefore it would be verified by a general one designed to verify proofs. The halting problem

Both are also general-case statements. There are many, many programs which can be proven to halt, or to loop indefinitely, quite easily. The trick is that you can't create a program which, when given ANY program as input, determines whether it will halt. Your program may well work for billions of programs, but there is at least one program for which it must fail. (Actually, there are infinitely many programs for which it must fail, but they're sparse in the space of all possible programs...)
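The "at least one program for which it must fail" is constructed by the classic diagonal trick, which fits in a few lines. A sketch, where `halts` is a hypothetical oracle of my own naming, not a real function:

```python
# Suppose halts(f) could correctly decide whether calling f() eventually
# halts. Then we can build a function that does the opposite of whatever
# halts() predicts about it, so no total, correct halts() can exist.
def paradox(halts):
    def troll():
        if halts(troll):   # oracle claims "troll halts"...
            while True:    # ...so troll loops forever,
                pass
        # ...otherwise troll returns immediately. Either way the
        # oracle's prediction about troll is wrong.
    return troll

# Any concrete guess the "oracle" makes is defeated. Here an oracle
# that always answers "loops forever" is shown a troll that halts:
troll = paradox(lambda f: False)
troll()  # returns immediately, contradicting the oracle
```

This is exactly why the parent's distinction matters: the impossibility is about one universal decider, not about proving halting for any particular program you care about.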

Gödel [wikipedia.org] and Turing [wikipedia.org] make strong cases that proving the algorithm works for some inputs that are correct proofs doesn't count as proof it will work for all correct proof inputs. So no, even if you "prove the algorithm works" it is not the same as a rigorous mathematical proof.

Not true - proving the algorithm works is the same as a rigorous mathematical proof, if you prove mathematically and rigorously that the algorithm works. (The comment about running the algorithm a number of times was simply to guard against the proven-correct algorithm producing a wrong result due to a machine malfunction.)

Proof is absolute, within the confines of the accepted axioms. Within the larger scope of things, we accept proof probabilistically, and this includes the entire works of every mathematician ever to live. Bayesian stats attempts to capture this idea that knowledge is never absolute, but merely held with probabilistic certainty, and all things are based on axioms (inherently unprovable, but assumed to be useful) ultimately. I only gripe (and boy is it a really fine, pedantic gripe), because your comment comm

No, not really. Or perhaps I should say: one can never be absolutely certain that a proof is correct. Practically the flaws in the model (when the model is just math) are so small compared to likely flaws in the modeling that it's best to ignore them, but even in the abstract there is no "absolute proof".

The problem with this is that "axiomatic system" is an inadequate caveat. You also have to blindly assume that some specific system of deduction works. In practice, specific axioms are usually chosen based on the assumption that induction works, the basic unprovable assumption underlying all of science. But it's worse than that: you can't even prove that deduction works! (It's obvious in hindsight, really.)

Any logical system simply asserts rules of deduction. Why use some particular rules of deduction

So, you're saying that the rules of production are axioms too. Still doesn't change what I said. But, you do arrive at the same endpoint that I arrived at, which is that if you don't accept some of them at some point, you'll wind up back at my original point.

otherwise you'd be arguing with solipsists over every detail, no matter how blindingly "obvious"

So, you're saying that the rules of production are axioms too. Still doesn't change what I said.

Not that it changes your conclusion, but it's an important difference in kind. Axioms are just the assumptions of the model, and you can reject certain axioms without being a hardcore skeptic who doubts logic. The latter assumption - that deduction works - is an assumption often made by solipsists who won't grant the assumption that sense data is accurate, but never think to question their other assumptions. It's great fun to poke at the solipsist position that way!

SAT solving is easy when there is a solution. When there is not, it gets very hard, as basically the solver enumerates all ways it could have found a proof and shows for each that it did not work. Still faster than a full exhaustive search (which is infeasible from, say 80 bits or so of problem space size). On the other hand, SAT solvers are not that complicated if you ignore implementation details. So the solver itself, together with the 1-bit answer "no" could be used as proof instead of the 13GB. My gues

My only questions are: is it possible to simplify the proof? And how hard would that be? If we have a testable proof, then it should be possible to throw another algorithm at it to simplify and optimize it... Only after that step should it be considered ready for others to inspect and test.

Please tell me the browser cache is screwing with me. Please tell me that my wife wants to have sex more often ( ok that isn't going to happen, I have a 12 and 15 year old) Do we really have Slashdot.org back?

There are other theorems with computer-assisted proofs that are too complex to verify by hand, going back decades. The four colour map theorem and the classification of finite simple groups are two examples.

Okay, so the proof in the article may be too lengthy to accurately check, so why not work with something simpler.

For that, I present the proof that Pi R Square is incorrect.

1: Write down on a piece of paper the commonly used 3-digit shortened approximation of Pi: 3.14
2: Hold that paper up in front of a mirror.
3: Note that in reverse, the characters 3.14 now look like PIE.
4: Pie is typically round, therefore since Pie is round, PI.E is 3.14 backwards, and 3.14 is the common shortened value of Pi, then P

I was trying to classify the normal subgroups of PSL(n,q) where n,q may be nonstandard elements of a nonstandard model of arithmetic. I pointed out that if Ariah Lev's work formalised correctly, then a few steps would yield the result I wanted, but that this was beyond checking. Once the PhD was done, I did further investigation, and scribbled a thought in a moment of insight, and left it to tidy itself up. I believe I put an entry on either chalisque.wordpress.com or deardiary.chalisque.org, but forget

You presume the proof has unique steps at every point. It doesn't: if something couldn't be found in a sequence of 1161 numbers, then it couldn't be found in an infinite sequence (my apologies for paraphrasing; go read the article). So they used a computer to check the 1161 numbers. So they essentially had a for loop. The code for the for loop was finite. The loop was finite. A few invariants and a bit of Floyd-Hoare logic and voilà, the proof is checked, just not in the way you'd usually expect.