Currently in my undergraduate courses I am being taught how to set up various machinery using slick, short proofs and then how to apply that machinery. What I am not being taught, largely, is what came before these slick, short proofs. What did mathematicians do before so-and-so proved such-and-such lemma? Where, in other words, are the tedious, long proofs that we can look to as examples of the horrible mess we are escaping? What insights helped mathematicians escape those messes?

Right now I am particularly interested in examples from measure theory. What did people do before, for example, Dynkin's lemma or Caratheodory's extension theorem? Or were these tools available from near the start?

An answer should include both some indication of how tedious and long the old approach was and how much slicker and shorter the modern approach is. Ideally, it should also discuss how the transition between the two happened.

(If you prefer the old approach to the modern approach, for example for pedagogical reasons, that would also be interesting to hear about.)

Nice question. But I would not call it "Extremely BAD proofs". Sometimes messy and cumbersome proofs were the only ones possible in their time. I'm thinking in particular of the "proofs" given by classical algebraic geometers before commutative algebra, sheaves and cohomology were developed. Or of the tedious, long and case-by-case "method of exhaustion" used by Archimedes in order to compute areas and volumes. I would not call that bad...
–
Francesco Polizzi, Oct 27 '10 at 16:11

The first things coming to my mind: the Four-Color Theorem, the classification of finite simple groups, Capelli's identities. Now I don't know whether they have nice and slick proofs by now, but I don't see why they shouldn't.
–
darij grinberg, Oct 27 '10 at 16:36

+1. I'm not a huge fan of big-list questions, but this one is well thought out. Equally importantly, I strongly believe that big-list questions should include something yours includes, but many do not: a paragraph near the end explaining what a good answer should include. Unfortunately, I don't know any good examples with which to actually answer the question.
–
Theo Johnson-Freyd, Oct 28 '10 at 5:20

26 Answers

Not from measure theory, alas, but the example that jumps to my mind is Gauss's first proof of Quadratic Reciprocity. It appears in the Disquisitiones Arithmeticae. The proof occupies arts. 135 through 144 (five and a half pages in the English edition published by Springer); the proof is by strong induction on $q$ (when $p\lt q$). I don't recall who, but someone once called it a proof by "mathematical revulsion."
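For readers who want the statement in front of them (standard Legendre-symbol notation, not part of the original post): for distinct odd primes $p$ and $q$,

```latex
% Quadratic Reciprocity, in Legendre-symbol notation:
\left(\frac{p}{q}\right)\left(\frac{q}{p}\right)
  \;=\; (-1)^{\frac{p-1}{2}\cdot\frac{q-1}{2}}.
```

So the two symbols agree unless $p \equiv q \equiv 3 \pmod 4$, which is exactly why Gauss's case analysis below splits on the residues of $p$ and $q$ modulo $4$.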

The proof is quite messy. Gauss argues by cases, considering the congruence classes of $p$ and $q$ modulo $4$, and whether $p$ is or is not a quadratic residue modulo $q$. He actually casts his proof as if it were a proof by minimal counterexample, so he further assumes in some instances that the result does not hold (e.g., for $p\equiv q\equiv 1 \pmod{4}$, either $p$ is a quadratic residue modulo $q$ and $q$ is not one modulo $p$; or $p$ is not a quadratic residue modulo $q$ and $q$ is a quadratic residue modulo $p$). They fall into eight cases, though some of those cases themselves break into subcases. For example, Gauss looks at the case when $p$ and $q$ are both congruent to $1$ modulo $4$, and $\pm p$ is not a residue modulo $q$; then he takes a prime $\ell\neq p$ less than $q$ for which $q$ is not a quadratic residue, and considers the cases in which $\ell\equiv 1 \pmod{4}$ or $\ell\equiv 3 \pmod{4}$ separately; the first subcase itself breaks into four separate sub-subcases: since $p\ell$ is a quadratic residue modulo $q$, it is the square of some even $e$; then he considers the case when $e$ is not divisible by either $p$ or $\ell$, when it is divisible by $p$ but not $\ell$; when it is divisible by $\ell$ but not $p$; and when it is divisible by $\ell$ and $p$. And so on. By the time Gauss finally gets to the eighth and final case, he is clearly somewhat exhausted, writing merely "The demonstration is the same as in the preceding case."

On the one hand, the proof is pretty much the first proof that one might think to try when encountering the problem. But the different cases are just way too messy, and one quickly loses sight of the forest because one is so intently staring at the beetles in the bark of the tree directly in front.

Plenty of other proofs would follow (including five more by Gauss), ranging from the clever to the almost magical (do this, do that, and oops, quadratic reciprocity falls out).

@David: Depends on which ones, I think; in some, Quadratic Reciprocity seems to sneak up on you while you're not looking. Others might be summarizable as per that question...
–
Arturo Magidin, Oct 28 '10 at 3:38

@Franz: I taught it once or twice in a number theory course. It has some things to recommend it, I think (it is very hands on, you really see what is going on with the congruences, etc). But there is no denying its messiness...
–
Arturo Magidin, Oct 28 '10 at 17:27

Interesting. So this is why we call it a Theorem, although it's a lemma.
–
Martin Brandenburg, Mar 30 '11 at 21:59


@Martin: the stature of a statement is measured not by the complexity of its proof but by its importance (Yoneda's lemma being a lemma notwithstanding...), nor does it diminish when those standing on the shoulders of the statement's author turn it into a triviality :)
–
Mariano Suárez-Alvarez♦, Aug 11 '11 at 3:15

@anon: indeed, there is a lot more in that article than what we now know as the Hilbert Basis Thm: the Hilbert polynomial, the Hilbert Syzygy Thm, the application to rings of invariants via averaging over SU(n) done by way of Cayley's omega operator...
–
Abdelmalek Abdesselam, Mar 4 '13 at 20:51

One nice example (from topology) is Tychonoff's Theorem (that a product of compact spaces is compact). No matter how many times I see it, I find the classic proof based on the Alexander Subbase Lemma difficult and opaque. On the other hand, if one first develops the theory of nets (aka Moore-Smith convergence), not only is that a powerful tool for all sorts of other purposes, but its development is a natural and intuitive generalization of sequences, and the place where Zorn's Lemma enters (the proof that every net has a universal subnet) is much clearer than in the proof of the Subbase Lemma. And of course once one has universal nets, the proof of Tychonoff is the obvious generalization of the trivial proof that a finite product of sequentially compact spaces is sequentially compact.

+1: I completely agree that the naturality and ease of the net-theoretic (or filter-theoretic) proofs of Tychonoff makes the other, more ad hoc approaches look kind of silly. (And, as you say, developing the theory of nets -- or filters, or both -- is natural and important for its own sake.) I just wanted to point out that the "classic" Subbase Lemma proof is not actually Tychonoff's original proof: rather, he uses the concept of a complete accumulation point, which is all but forgotten nowadays.
–
Pete L. Clark, Dec 29 '10 at 6:39


The proof based on ultrafilters is even more magical. It is a mystery why one would want to introduce filters in the first place, but once that step is taken, the proof is more or less a collection of simple facts about filters which are both easy to guess and easy to prove.
–
Andrea Ferretti, Jan 14 '11 at 17:40


@Andrea: well, nowadays there are at least two natural reasons to invent ultrafilters: either you are thinking about logic or you are thinking about prime ideals in Boolean rings (e.g. see qchu.wordpress.com/2010/11/22/…).
–
Qiaochu Yuan, Jan 21 '11 at 10:01

The traditional way of proving Grothendieck duality is to first establish it for proper maps and for open immersions, which is already quite a labour. Then one uses the fact that any morphism of Noetherian schemes factors into such maps and pastes the partial results together. This requires an awful lot of non-trivial checking that certain diagrams commute. The extension of the result to non-Noetherian schemes then requires yet more work.

In contrast, Neeman's proof of Grothendieck duality via Brown representability is slick, short (30 pages) and conceptual and a pure pleasure to read.

(but: the first approach gives you more insight into what the functors from Grothendieck duality actually do, so it is by no means worthless)

Well, it's a paper of 30 pages. The relevant definitions and the actual proof are given on pages 4-19, which contain spacious diagrams, not too much text, and additional examples on D-modules which can be skipped. Also, it is so nicely written that you would rather wish it lasted longer :-)
–
Peter Arndt, Jul 17 '14 at 22:24

This is not about measure theory or Dynkin's lemma or Caratheodory's extension theorem, but it is hard for me to resist sharing one of my favorite examples of improving proofs with modern machinery: the Intermediate Value Theorem. This theorem is so intuitively obvious, but the proof using classical analysis involves taking a supremum of the set of $a \leq x \leq b$ such that $f(x) \leq y$ (where $y$ is the desired output) and then showing using continuity that this supremum $c$ satisfies $f(c) = y$. There are lots of $\delta$'s and $\epsilon$'s and the proof feels uninspiring and technical at best.

Enter topology. The proof that the image of a connected set is connected for a continuous function is simple and intuitive, as is the notion of a connected set. Once this is established, the Intermediate Value Theorem is essentially just the statement that an interval is a connected set, so the image must be connected. This proof captures, in my opinion, the intuition of the Intermediate Value Theorem in a precise way.
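A minimal sketch of the topological argument described above (my wording, standard notation):

```latex
\begin{proof}[Sketch]
Let $f\colon [a,b]\to\mathbb{R}$ be continuous and suppose $f(a) < y < f(b)$.
The interval $[a,b]$ is connected, and the continuous image of a connected
set is connected, so $f([a,b])$ is a connected subset of $\mathbb{R}$,
i.e.\ an interval. Since $f(a)$ and $f(b)$ lie in $f([a,b])$ and
$f(a) < y < f(b)$, the interval $f([a,b])$ contains $y$; hence
$f(c) = y$ for some $c \in [a,b]$.
\end{proof}
```

All the $\epsilon$-$\delta$ work is absorbed into the two quoted lemmas, which is exactly the separation of concerns discussed in the comments.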

How do you distinguish a different proof from the same proof in different language? My point is that the concepts simplify the language of the proof and put it in an intuitive context.
–
Jeremy West, Oct 27 '10 at 16:26


I would suggest that, at least for this theorem, what separates the messy proof from the slick one (even if they are morally the same) is that the latter practices a clear separation between independent concepts, and the former just rolls everything up into one big argument. In other words, it uses lemmas (just as the question asks).
–
Ryan Reich, Oct 27 '10 at 16:42


Actually, once you realize that continuous functions have the "permanence of sign" property - which is definitely trivial - your proof becomes about two lines. All this without ever mentioning connectedness. You simply observe (let me assume that $y = 0$) that $f(c)$ can be neither negative nor positive, since otherwise the nearby values would have the same sign and $c$ could not be that supremum.
–
Andrea Ferretti, Jan 14 '11 at 17:44

You may want to look into the history of the de Branges proof of the Bieberbach conjecture. Reader's Digest version: his original proof was over 100 pages, but others studying his proof got it down to about a dozen.

+1: This one came to my mind as well. Not necessarily the situation described in the OP, since the original "proof" had a number of holes, and the shorter one almost fell out from the results used to fill the gaps. It does show how math can be drastically simplified by the right approach, which captures the spirit of the question, if not the letter.
–
Logan Maingi, Mar 16 '11 at 6:24

Nash's original proof of his famous isometric embedding theorem was extremely complicated. I'm under the impression that very few people ever read or understood the details. The hard step is a generalized implicit function theorem. His proof was simplified considerably by Moser and others (I learned the proof from a paper by Sergeraert). However, the proof of the isometric embedding theorem was dramatically simplified by Matthias Gunther, who found a way to use the standard contraction mapping argument and eliminate completely the need for the so-called Nash-Moser implicit function theorem.

However, Gunther's proof, unlike the other examples and the intent of the question, is for me just as mysterious and miraculous as Nash's original proof.

From an interview (ams.org/notices/201003/rtx100300391p.pdf) with Gromov: Raussen and Skau: This means that you read Nash’s work and were impressed by it very early? Gromov: Yes, I read it very carefully. And I still believe I am the only person who read his papers from the beginning to the end. By judging what people have written about it afterwards, I do not think they have read it.
–
user5831, Jun 9 '11 at 19:58


Bora, thanks! I would add that although I believe Gromov really did read and understand Nash's original paper, Gromov himself used Moser's simplified version in his book Partial Differential Relations. The only place I have seen anything resembling Nash's original proof is a paper by Hormander and even there Hormander does not reproduce the full strength of Nash's original theorem and proof.
–
Deane Yang, Jun 10 '11 at 10:22


Let me add that the inverse function theorem Nash developed for the purpose is quite amazing and is probably the best part of those results.
–
Piero D'Ancona, May 3 '12 at 12:36

Sometimes an alternative to proofs being messy because they were written before advanced methods were available is proofs being numerous because they were written before advanced methods were available. An example is the proofs by Archimedes in his Mechanical Method. They are stunningly beautiful, but you need a new brilliant trick for each elementary integral.

This isn't a proof, but I always liked Brownian motion as an example of a continuous, nowhere differentiable function. As opposed to putting a lot of effort into artificially creating a series which converges and proving that it is continuous and nowhere differentiable, you take a natural, physical process and say "Ah ha! It works." The proof that it's nowhere differentiable is pretty straightforward, though constructing a Brownian motion can be a pain...

I bet that the people who originally sought an explicit example of a continuous but nowhere differentiable function never suspected that the most natural example would come from a physical process, though.
–
Thierry Zell, Aug 11 '11 at 11:35

Lambert actually showed that tan x is irrational whenever x is a nonzero rational number. This is much stronger than just the irrationality of $\pi$. In addition, the proof is quite readable and well motivated, though not as polished as Niven's.
–
Franz Lemmermeyer, Oct 28 '10 at 8:50


By the way, is there an accessible version of Lambert's proof? (If I remember rightly, his works are mostly in Latin and not easily available.)
–
John Stillwell, Oct 28 '10 at 16:39


According to Laczkovich, On Lambert's proof of the irrationality of $\pi$, Amer Math Monthly 104 (1997) 439-443, "The last monograph that gives Lambert's argument in detail seems to be Chrystal's Algebra."
–
Gerry Myerson, Oct 29 '10 at 1:10


We should stop crediting "the nice proof" to Niven. It rightfully belongs to Hermite. Niven's proof is a slight "variation" of Hermite's proof. The integral Niven used (a simple change of variable from Hermite's integral) is closely related to Lambert's continued fraction. Read my preprint and the references therein.
–
Li Zhou, Mar 16 '11 at 3:01

But weren't homology and cohomology only loosely defined in Poincaré's time? That cannot help with getting a correct proof!
–
Thierry Zell, Aug 11 '11 at 11:37


Actually the proof via dual triangulations (found by Poincare shortly after his original "proof") can be made totally rigorous and is arguably more intuitive than the modern proofs once you have a triangulation. It has some disadvantages though: 1) Not every topological manifold has a triangulation. 2) Even if it has one, that may be hard to show. 3) Naturality of Poincare duality is less than clear.
–
Lennart Meier, Dec 4 '13 at 21:40

What about Wiener's proof that the reciprocal of a nowhere-zero function on the circle with absolutely convergent Fourier series also has this property? Gelfand basically created the theory of Banach algebras in order to give a short, clean proof.

I think Halmos said something to the effect that Wiener didn't really understand his own theorem, because he didn't find Gelfand's proof.

@Nik: That application may not have been the main motivation for Gelfand's work on Banach algebras. See comments by Jonas Meyer to my answer on the page mathoverflow.net/questions/20268/….
–
KConrad, May 3 '12 at 15:30

An example from measure theory that might qualify is in the construction
of non-measurable sets. The first example is well known, Vitali's 1905
construction obtained by choosing a member from each coset modulo the
additive group of rationals. Less well known is the construction given by
van Vleck, in the 1908 Transactions of the AMS, pp. 237--244. (Here,
if you have JSTOR.)

Van Vleck's construction is somewhat messy, but it becomes clearer when
one realizes that it is essentially based on an ultrafilter on $\omega$,
an idea that was made explicit in the construction of Sierpinski 1938.

Wait, this seems backwards relative to the question that was asked: short proofs coming after messy ones. From what you write, Vitali's simple idea came before the other one. Does van Vleck's idea lead to anything besides another example of a non-measurable set?
–
KConrad, Oct 28 '10 at 8:27


@KConrad: I agree that van Vleck's proof is messier than Vitali's, so it seems like a backward step. But my point is that van Vleck's proof contains the germ of an important new idea -- ultrafilters -- and his proof became a lot clearer after this idea was recognized.
–
John Stillwell, Oct 28 '10 at 16:21

And his proof also shows that there is no need of full AC to show the existence of non-measurable sets; the ultrafilter lemma (or Boolean prime ideal theorem) suffices.
–
Ostap Chervak, Jun 13 '11 at 15:17

How about Gauss' Lemma? Gauss proved it in Disquisitiones Arithmeticae, the same work in which he developed modular arithmetic, but didn't use modular arithmetic to prove it. I once wrote a blog post about this, including his original proof and a modern one for comparison. (The blog post was intended for an audience of teachers so attempts to assume no abstract algebra background.)

The structure of (complete) discretely valued fields. Today this is usually done with Teichmüller representatives and Witt vector machinery (e.g. Serre, Local Fields, ch. II §§4-6). But the main results had been proved with more complicated methods by H. Hasse and F. K. Schmidt (Crelle 170 (1934): 60 pages) before these tools were developed. Also, what is today called Artin-Schreier-Witt theory was developed by H. L. Schmid with very messy computations: These inspired Witt to invent his vectors, which then gave it a much more conceptual background. Cf. the "history" part of my answer here.

David, I may be wrong but I feel Riemann deduces (generic) RRT clearly and briefly from the existence of $g$ holom. forms and a merom. form with simple poles at any two pts. His analytic argument is complicated and incomplete, but he remarks that for plane curves one can write the forms down by adjoint polynomials. Roch's paper, called "incomprehensible" by Gray [citing Clebsch], is a simple residue calculation computing the rank of a period integral in Riemann's less precise formula. [This argument is in Griffiths-Harris.] I am not aware the analytic proof has ever been rendered easy or short.
–
roy smith, Mar 23 '13 at 19:22

In short, I suggest that Riemann and Roch's original exposition of the proof is the clearest one in existence. At least I myself felt I first understood how simple it is in principle after reading them. This acknowledges the gap in Riemann's proof of existence of one forms of 1st, 2nd, and 3rd kinds, in the abstract case of complex one manifolds. Note that Griffiths and Harris cite the Kodaira vanishing theorem for this existence, which itself has no trivial proof to my knowledge.
–
roy smith, Mar 24 '13 at 2:23

The first proof I ever read was the analytic one; I read it in Farkas and Kra's Riemann surfaces book, and there are no gaps (or appeals to harder theorems) there; however, it was long, and felt to me very un-natural. My highly anecdotal evidence for which proof is "the most natural" is that now - being mostly retired from alg-geom for over five years - I only remember the sheaf-theoretic proof; moreover, I remember it so vividly that I was able to outline it to someone during a 7km run!
–
David Lehavi, Mar 24 '13 at 17:11

Arhangel'skii's positive solution to Alexandroff's problem (does every compact Hausdorff space with a countable local base at every point have cardinality at most the continuum?) was quite clever and complicated. Later proofs of Arhangel'skii's Theorem by Pol and Shapirovskii are not only simpler, but they also provide a common framework for many cardinal inequalities which before had to be proved in ad hoc ways (ramification arguments, infinitary combinatorics...). The model-theoretic version of the Pol-Shapirovskii method makes that framework even more transparent and user-friendly.

I will mention another more recent example from Set-theoretic topology to remember a great mathematician who passed away just a few days ago. Mary Ellen Rudin's proof of Nikiel's Conjecture is almost 30 pages long! Rudin's theorem states that every compact monotonically normal space is the continuous image of a compact linearly ordered space. (Monotonically normal is an elegant common generalization of linearly ordered and metric spaces). I recall from his talk at one of the Spring Topology Conferences that Todd Eisworth had a project of using elementary submodels to clarify her proof and has already succeeded to do that for the separable case. But I don't know if his proof has already been published (he has a related paper on his website, but it does not contain that proof http://www.ohio.edu/people/eisworth/research/Preprints%20and%20Reprints/finalMN.pdf).

The modern proof of Birkhoff's pointwise ergodic theorem uses Hopf's maximal ergodic lemma, which makes it much shorter than the classical proof. See, for example, these notes (the proof given there is quite detailed, but not as complex as in the classical approach; I have seen shorter proofs of the exact same statement, also using Hopf's lemma). It is possible to prove it even more briefly, for instance in this text, where Keane and Petersen prove a strengthened maximal ergodic lemma.

The original theorem stated by Birkhoff in 1931 can be found here, for example. So you can see 'what mathematicians did before E. Hopf proved the maximal ergodic lemma'. I wouldn't call this extremely messy, but it's definitely more complicated.

I cannot give any background as to how Hopf came to prove his lemma, but it must have appeared in his book on ergodic theory, published in 1937. So I conjecture it was inspired directly by Birkhoff's work. (I'd be happy to see comments or corrections concerning this.)

I nominate the homology version of Cauchy's theorem: a sufficient condition for a curve in a (multiply) connected region to have $0$ contour integral for every function which is analytic in that region is that the winding number of the curve with respect to any point outside the region is $0$.

E. Artin's proof is pretty messy: he used grids to approximate the contour, and then one has to verify that the integral around the grid equals the integral around the contour, as well as that they have the same topology.

J. Dixon's proof is very elementary (a little less intuitive than the approximation proof, though only a little) and straightforward. The key step is no more than interchanging the order of integration.

An example from algebra is Albert's paper "On the Wedderburn norm condition for cyclic algebras" relating a 6-dimensional quadratic form to every biquaternion algebra (which is now known as the "Albert quadratic form"). His original paper is essentially one long (but of course extremely clever) computation. By now, we have much more conceptual proofs and we understand the situation much better (see for instance the Book of Involutions). I'm sure that there are many more examples of this style.

Hindman's theorem states that if we finitely colour the naturals, there exists an infinite set $S$ such that all the finite sums of distinct elements of $S$ have the same colour.
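In the standard notation (not part of the original post), the monochromatic object is the set of finite sums

```latex
% FS(S): all sums over finite non-empty subsets of S
\mathrm{FS}(S) \;=\;
  \Bigl\{\, \sum_{n \in F} n \;:\; F \subseteq S,\ F \text{ finite and non-empty} \,\Bigr\},
```

and the theorem says that some colour class contains $\mathrm{FS}(S)$ for an infinite $S$.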

Hindman's original combinatorial proof of his theorem was very long and protracted. Baumgartner found a shorter combinatorial proof, which was still rather long, before Galvin and Glazer blew it out of the water with a beautiful proof involving defining a topology on the space $\beta \mathbb{N}$ of ultrafilters and proving various things about it.

The Galvin-Glazer idea of idempotent ultrafilters has been applied to many other problems, only some of which are related to Ramsey theory.

Interestingly, the longer proofs actually operate in weaker systems of mathematics: the original proof works in ACA_0, the second proof in a stronger second-order arithmetic, and the third proof requires full ZFC (without choice, one cannot prove the existence of any non-trivial ultrafilters, never mind an idempotent one). See http://arxiv.org/pdf/0906.3885.pdf for more details.