I do not know exactly how to characterize the class of proofs that interests me, so let me give some examples and say why I would be interested in more. Perhaps what the examples have in common is that a powerful and unexpected technique is introduced that comes to seem very natural once you are used to it.

Example 1. Euler's proof that there are infinitely many primes.

If you haven't seen anything like it before, the idea that you could use analysis to prove that there are infinitely many primes is completely unexpected. Once you've seen how it works, that's a different matter, and you are ready to contemplate trying to do all sorts of other things by developing the method.

Example 2. The use of complex analysis to establish the prime number theorem.

Even when you've seen Euler's argument, it still takes a leap to look at the complex numbers. (I'm not saying it can't be made to seem natural: with the help of Fourier analysis it can. Nevertheless, it is a good example of the introduction of a whole new way of thinking about certain questions.)

Example 3. Variational methods.

You can pick your favourite problem here: one good one is determining the shape of a heavy chain in equilibrium.

Example 4. Erdős's lower bound for Ramsey numbers.

One of the very first results (Shannon's bound for the size of a separated subset of the discrete cube being another very early one) in probabilistic combinatorics.

Example 5. Roth's proof that a dense set of integers contains an arithmetic progression of length 3.

Historically this was by no means the first use of Fourier analysis in number theory. But it was the first application of Fourier analysis to number theory that I personally properly understood, and that completely changed my outlook on mathematics. So I count it as an example (because there exists a plausible fictional history of mathematics where it was the first use of Fourier analysis in number theory).

Example 6. Use of homotopy/homology to prove fixed-point theorems.

Once again, if you mount a direct attack on, say, the Brouwer fixed point theorem, you probably won't invent homology or homotopy (though you might do if you then spent a long time reflecting on your proof).

The reason these proofs interest me is that they are the kinds of arguments where it is tempting to say that human intelligence was necessary for them to have been discovered. It would probably be possible in principle, if technically difficult, to teach a computer how to apply standard techniques, the familiar argument goes, but it takes a human to invent those techniques in the first place.

Now I don't buy that argument. I think that it is possible in principle, though technically difficult, for a computer to come up with radically new techniques. Indeed, I think I can give reasonably good Just So Stories for some of the examples above. So I'm looking for more examples. The best examples would be ones where a technique just seems to spring from nowhere -- ones where you're tempted to say, "A computer could never have come up with that."

Edit: I agree with the first two comments below, and was slightly worried about that when I posted the question. Let me have a go at it though. The difficulty with, say, proving Fermat's last theorem was of course partly that a new insight was needed. But that wasn't the only difficulty at all. Indeed, in that case a succession of new insights was needed, and not just that but a knowledge of all the different already existing ingredients that had to be put together. So I suppose what I'm after is problems where essentially the only difficulty is the need for the clever and unexpected idea. I.e., I'm looking for problems that are very good challenge problems for working out how a computer might do mathematics. In particular, I want the main difficulty to be fundamental (coming up with a new idea) and not technical (having to know a lot, having to do difficult but not radically new calculations, etc.). Also, it's not quite fair to say that the solution of an arbitrary hard problem fits the bill. For example, my impression (which could be wrong, but that doesn't affect the general point I'm making) is that the recent breakthrough by Nets Katz and Larry Guth in which they solved the Erdős distinct distances problem was a very clever realization that techniques that were already out there could be combined to solve the problem. One could imagine a computer finding the proof by being patient enough to look at lots of different combinations of techniques until it found one that worked. Now their realization itself was amazing and probably opens up new possibilities, but there is a sense in which their breakthrough was not a good example of what I am asking for.

While I'm at it, here's another attempt to make the question more precise. Many many new proofs are variants of old proofs. These variants are often hard to come by, but at least one starts out with the feeling that there is something out there that's worth searching for. So that doesn't really constitute an entirely new way of thinking. (An example close to my heart: the Polymath proof of the density Hales-Jewett theorem was a bit like that. It was a new and surprising argument, but one could see exactly how it was found since it was modelled on a proof of a related theorem. So that is a counterexample to Kevin's assertion that any solution of a hard problem fits the bill.) I am looking for proofs that seem to come out of nowhere and seem not to be modelled on anything.

Further edit. I'm not so keen on random massive breakthroughs. So perhaps I should narrow it down further -- to proofs that are easy to understand and remember once seen, but seemingly hard to come up with in the first place.

Perhaps you could make the requirements a bit more precise. The most obvious examples that come to mind from number theory are proofs that are ingenious but also very involved, arising from a rather elaborate tradition, like Wiles' proof of Fermat's last theorem, Faltings' proof of the Mordell conjecture, or Ngo's proof of the fundamental lemma. But somehow, I'm guessing that such complicated replies are not what you have in mind.
–
Minhyong KimDec 9 '10 at 15:18

9

Of course, there was apparently a surprising and simple insight involved in the proof of FLT, namely Frey's idea that a solution triple would give rise to a rather exotic elliptic curve. It seems to have been this insight that brought a previously eccentric seeming problem at least potentially within the reach of the powerful and elaborate tradition referred to. So perhaps that was a new way of thinking at least about what ideas were involved in FLT.
–
roy smithDec 9 '10 at 16:21

10

Never mind the application of Fourier analysis to number theory -- how about the invention of Fourier analysis itself, to study the heat equation! More recently, if you count the application of complex analysis to prove the prime number theorem, then you might also count the application of model theory to prove results in arithmetic geometry (e.g. Hrushovski's proof of Mordell-Lang for function fields).
–
D. SavittDec 9 '10 at 16:42

7

I agree that they are difficult, but in a sense what I am looking for is problems that isolate as well as possible whatever it is that humans are supposedly better at than computers. Those big problems are too large and multifaceted to serve that purpose. You could say that I am looking for "first non-trivial examples" rather than just massively hard examples.
–
gowersDec 9 '10 at 18:04

4

It seems to me that this question has been around a long time and is unlikely garner new answers of high quality. It also seems unlikely most would even read new answers. Furthermore, nowadays I imagine a question like this would be closed as too broad, and if we close this then we'll discourage questions like it in the future. So I'm voting to close.
–
David WhiteOct 13 '13 at 18:52

64 Answers
64

Grothendieck's insight how to deal with the problem that whatever topology you define on varieties over finite fields, you never seem to get enough open sets. You simply have to re-define what is meant by a topology, allowing open sets not to be subsets of your space but to be covers.

I think this fits the bill of "seem very natural once you are used to it", but it was an amazing insight, and totally fundamental in the proof of the Weil conjectures.

Obviously, I agree that this was fundamental. But since we're speaking only about Grothendieck topologies and not the eventual proof of the Weil conjectures, there could be a curious sense in which this idea might be particularly natural to computers. Imagine encoding a category as objects and morphisms, which I'm told is quite a reasonable procedure in computer science. You'll recall then that it's somewhat hard to define a subobject.
–
Minhyong KimDec 10 '10 at 1:20

19

That is, it's easier to refer directly to arrows $A\rightarrow B$ between two of the symbols rather than equivalence classes of them. In this framework, a computer might easily ask itself why any reasonable collection of arrows might not do for a topology. Grothendieck topologies seem to embody exactly the kind of combinatorial and symbolic thinking about open sets that's natural to computers, but hard for humans. We are quite attached to the internal 'physical' characteristics of the open sets, good for some insights, bad for others.
–
Minhyong KimDec 10 '10 at 1:26

1

According to Grothendieck-Serre correspondence, I think it is more appropriate to say it is an insight due to both of them.
–
tempJun 3 '12 at 19:49

Although this has already been said elsewhere on MathOverflow, I think it's worth repeating that Gromov is someone who has arguably introduced more radical thoughts into mathematics than anyone else. Examples involving groups with polynomial growth and holomorphic curves have already been cited in other answers to this question. I have two other obvious ones but there are many more.

I don't remember where I first learned about convergence of Riemannian manifolds, but I had to laugh because there's no way I would have ever conceived of a notion. To be fair, all of the groundwork for this was laid out in Cheeger's thesis, but it was Gromov who reformulated everything as a convergence theorem and recognized its power.

Another time Gromov made me laugh was when I was reading what little I could understand of his book Partial Differential Relations. This book is probably full of radical ideas that I don't understand. The one I did was his approach to solving the linearized isometric embedding equation. His radical, absurd, but elementary idea was that if the system is sufficiently underdetermined, then the linear partial differential operator could be inverted by another linear partial differential operator. Both the statement and proof are for me the funniest in mathematics. Most of us view solving PDE's as something that requires hard work, involving analysis and estimates, and Gromov manages to do it using only elementary linear algebra. This then allows him to establish the existence of isometric embedding of Riemannian manifolds in a wide variety of settings.

Partial Differential Relations? I don't believe the h-principle requires much analysis. In the sections on isometric embeddings, he does state and give a complete analytic proof of the version of the Nash-Moser implicit function he needs.
–
Deane YangDec 9 '13 at 21:59

I vote yes. Look how hard Liouville had to work to find the first examples of transcendental numbers, and how easy Cantor made it to show that there are scads of them.
–
Gerry MyersonDec 9 '10 at 22:07

4

While Cantor's argument is amazing, and certainly produces scads, Liouville didn't have to work that hard; his approach is also very natural, and doesn't rely on much more than the pigeon-hole principle.
–
EmertonDec 10 '10 at 3:15

7

Cantor's whole idea of casting mathematics in the language of set theory is now so pervasive we don't even think about it. It dominated our subject until the category point of view. So to me these are the two most basic insights, viewing mathematics in terms of sets, and then in terms of maps. Etale topologies are just one example of viewing maps as the basic concept.
–
roy smithDec 13 '10 at 17:55

Generating functions seem old hat to those who have worked with them, but I think their early use could be another example. If you did not have that tool handy, could you create it?

Similarly, any technique that has been developed and is now widely used is made to look natural after years of refining and changing the collected perspective, but might it not have seemed quite revolutionary when first introduced? Perhaps the question should also be about such techniques.

I'd like to add to generating functions the idea that you can use singularity analysis to determine the coefficient growth. But I don't know how unexpected this was when first used...
–
Martin RubeyDec 9 '10 at 17:24

2

"any technique that has been developed and is now widely used is made to look natural after years of refining and changing the collected perspective, but might it not have seemed quite revolutionary when first introduced?" It surely was, and it is exactly why it is widely used now: it allowed a lot of things that were impossible previously and we are still trying to figure out how much is "a lot". Also note, that shaping an idea and recognizing its power is a long process, so "unexpected" means that 20 years ago nobody would have thought of that, not that it shocked everyone on one day.
–
fedjaDec 17 '10 at 12:53

My favorite example from algebraic topology is Rene Thom's work on cobordism theory. The problem of classifying manifolds up to cobordism looks totally intractable at first glance. In low dimensions ($0,1,2$), it is easy, because manifolds of these dimensions are completely known. With hard manual labor, one can maybe treat dimensions 3 and 4. But in higher dimensions, there is no chance to proceed by geometric methods.

Thom came up with a geometric construction (generalizing earlier work by Pontrjagin), which is at the same time easy to understand and ingenious. Embed the manifold into a sphere, collapse everything outside a tubular neighborhood to a point and use the Gauss map of the normal bundle... What this construction does is to translate the geometric problem into a homotopy problem, which looks totally unrelated at first sight.

The homotopy problem is still difficult, but thanks to work by Serre, Cartan, Steenrod, Borel, Eilenberg and others, Thom had enough heavy guns at hand to get fairly complete results.

Thom's work led to an explosion of differential topology, leading to Hirzebruch's signature theorem, the Hirzebruch-Riemann-Roch theorem, Atiyah-Singer, Milnor-Kervaire classification of exotic spheres.....until Madsen-Weiss' work on mapping class groups.

The method of forcing certainly fits here. Before, set theorists expected that independence results would be obtained by building non-standard, ill-founded models, and model theoretic methods would be key to achieve this. Cohen's method begins with a transitive model and builds another transitive one, and the construction is very different from all the techniques being tried before.

This was completely unexpected. Of course, in hindsight, we see that there are similar approaches in recursion theory and elsewhere happening before or at the same time.

But it was the fact that nobody could imagine you would be able to obtain transitive models that mostly had us stuck.

I took the last set theory course that Cohen taught, and this isn't how he presented his insight at all (though his book takes this approach). The central problem is "how do I prove that non-constructible [sub]sets [of N] are possible without access to one?", and his solution is "don't use a set; use an adaptive oracle". Once that idea is present, the general method falls right into place. The oracle's set of states can be any partial order, generic filters fall right out, names are clearly necessary, everything else is technical. The hardest part is believing it will actually work.
–
Chad GroftDec 14 '10 at 2:07

1

@Chad : Very interesting! Curious that his description is so "recursion-theoretic." Do you remember when was this course?
–
Andres CaicedoDec 14 '10 at 2:48

I don't know who deserves credit for this, but I was stunned by the concept of view complicated objects like functions simply as points in a vector space. With that view one solves and analyzes PDEs or integral equations in Lebesgue or Sobolev spaces.

Technically, the following are not proofs, or even theorems, but I think they count as insights that have the quality that it's hard to imagine computers coming up with them. First, there's:

Mathematics can be formalized.

Along the same lines, there's:

Computability can be formalized.

If you insist on examples of proofs then maybe I'd be forced to cite the proof of Goedel's incompleteness theorem or of the undecidability of the halting problem, but to me the most difficult step in these achievements was the initial daring idea that one could even formulate a mathematically satisfactory definition of something as amorphous as "mathematics" or "computability." For example, one might argue that the key step in Turing's proof was diagonalization, but in fact diagonalization was a major reason that Goedel thought one couldn't come up with an "absolute" definition of computability.

Nowadays we are so used to thinking of mathematics as something that can be put on a uniform axiomatic foundation, and of computers as a part of the landscape, that we can forget how radical these insights were. In fact, I might argue that your entire question presupposes them. Would computers have come up with these insights if humans had not imagined that computers were possible and built them in the first place? Less facetiously, the idea that mathematics is a formally defined space in which a machine can search systematically clearly presupposes that mathematics can be formalized.

More generally, I'm wondering if you should expand your question to include concepts (or definitions) and not just proofs?

Edit. Just in case it wasn't clear, I believe that the above insights have fundamentally changed mathematicians' conception of what mathematics is, and as such I would argue that they are stronger examples of what you asked for than any specific proof of a specific theorem can be.

There's something amusing about the idea of a computer coming up with the idea that computability can be formalized.
–
ndkrempelDec 10 '10 at 13:58

3

This is an example that has bothered me in the past, and I have to admit that I don't have a good answer to it. The ability to introspect seems to be very important to mathematicians, and it's far from clear how a computer would do it. One could perhaps imagine a separate part of the program that looks at what the main part does, but it too would need to introspect. Perhaps this infinite regress is necessary for Godelian reasons but perhaps in practice mathematicians just use a bounded number of levels of navel contemplation.
–
gowersDec 10 '10 at 14:29

8

Conversely, this type of introspection and formalisation is much less effective outside of mathematics (Weinberg has called this the "unreasonable ineffectiveness of philosophy".) Attempts to axiomatise science, the humanities, etc., for instance, usually end up collapsing under the weight of their own artificiality (with some key exceptions in physics, notably relativity and quantum mechanics). The fact that mathematics is almost the sole discipline that actually benefits from formalisation is indeed an interesting insight in my opinion.
–
Terry TaoDec 11 '10 at 16:59

5

But if you axiomatize some portion of some other science, doesn't that axiomatization constitute mathematics? So it seems almost tautologous to say that only mathematics "benefits" from formalization.
–
Michael HardyDec 11 '10 at 17:11

Not sure whether to credit Abel or Galois with the "fundamental new way of thinking" here, but the proof that certain polynomial equations are not solvable in radicals required quite the reformulation of thinking. (I'm leaning towards crediting Galois with the brain rewiring reward.)

P.S. Is it really the case that no one else posted this, or is my "find" bar not working properly?

It seems that certain problems seem to induce this sort of new thinking (cf. my article "What is good mathematics?"). You mentioned the Fourier-analytic proof of Roth's theorem; but in fact many of the proofs of Roths' theorem (or Szemeredi's theorem) seem to qualify, starting with Furstenberg's amazing realisation that this problem in combinatorial number theory was equivalent to one in ergodic theory, and that the structural theory of the latter could then be used to attack the former. Or the Ruzsa-Szemeredi observation (made somewhat implicitly at the time) that Roth's theorem follows from a result in graph theory (the triangle removal lemma) which, in some ways, was "easier" to prove than the result that it implied despite (or perhaps, because of) the fact that it "forgot" most of the structure of the problem. And in this regard, I can't resist mentioning Ben Green's brilliant observation (inspired, I believe, by some earlier work of Ramare and Ruzsa) that for the purposes of finding arithmetic progressions, that the primes should not be studied directly, but instead should be viewed primarily [pun not intended] as a generic dense subset of a larger set of almost primes, for which much more is known, thanks to sieve theory...

Another problem that seems to generate radically new thinking every few years is the Kakeya problem. Originally a problem in geometric measure theory, the work of Bourgain and Wolff in the early 90s showed that the combinatorial incidence geometry viewpoint could lead to substantial progress. When this stalled, Bourgain (inspired by your own work) introduced the additive combinatorics viewpoint, re-interpreting line segments as arithmetic progressions. Meanwhile, Wolff created the finite field model of the Kakeya problem, which among other things lead to the sum-product theorem and many further developments that would not have been possible without this viewpoint. In particular, this finite field version enabled Dvir to introduce the polynomial method which had been applied to some other combinatorial problems, but whose application to the finite field Kakeya problem was hugely shocking. (Actually, Dvir's argument is a great example of "new thinking" being the key stumbling block. Five years earlier, Gerd Mockenhaupt and I managed to stumble upon half of Dvir's argument, showing that a Kakeya set in finite fields could not be contained in a low-degree algebraic variety. If we had known enough about the polynomial method to make the realisation that the exact same argument also showed that a Kakeya set could not have been contained in a high-degree algebraic variety either, we would have come extremely close to recovering Dvir's result; but our thinking was not primed in this direction.) Meanwhile, Carbery, Bennet, and I discovered that heat flow methods, of all things, could be applied to solve a variant of the Euclidean Kakeya problem (though this method did appear in literature on other analytic problems, and we viewed it as the continuous version of the discrete induction-on-scales strategy of Bourgain and Wolff.) Most recently is the work of Guth, who broke through the conventional wisdom that Dvir's polynomial argument was not generalisable to the Euclidean case by making the crucial observation that algebraic topology (such as the ham sandwich theorem) served as the continuous generalisation of the discrete polynomial method, leading among other things to the recent result of Guth and Katz you mentioned earlier.

EDIT: Another example is the recent establishment of universality for eigenvalue spacings for Wigner matrices. Prior to this work, most of the rigorous literature on eigenvalue spacings relied crucially on explicit formulae for the joint eigenvalue distribution, which were only tractable in the case of highly invariant ensembles such as GUE, although there was a key paper of Johansson extending this analysis to a significantly wider class of ensembles, namely the sum of GUE with an arbitrary independent random (or deterministic) matrix. To make progress, one had to go beyond the explicit formula paradigm and find some way to compare the distribution of a general ensemble with that of a special ensemble such as GUE. We now have two basic ways to do this, the local relaxation flow method of Erdos, Schlein, Yau, and the four moment theorem method of Van Vu and myself, both based on deforming a general ensemble into a special ensemble and controlling the effect on the spectral statistics via this deformation (though the two deformations we use are very different, and in fact complement each other nicely). Again, both arguments have precedents in earlier literature (for instance, our argument was heavily inspired by Lindeberg's classic proof of the central limit theorem) but as far as I know it had not been thought to apply them to the universality problem before.

But, Terry, are the adjectives "radical" or "fundamentally new" really justified in the description of any of these examples? and of our business as a whole?
–
Gil KalaiDec 10 '10 at 13:30

3

In other words, his contribution was that D:A=C:B (algebraic topology is to continuous incidence geometry as algebraic geometry is to discrete incidence geometry), which was definitely a very different way of thinking about these four concepts that was totally absent in previous work. (After Guth's work, it is now "obvious" in retrospect, of course.)
–
Terry TaoDec 11 '10 at 16:48

2

Perhaps what this example shows is that a computer trying to generate mathematical progress has to look at more than just the 1-skeleton of mathematics (B is solved by C; A is close to B; hence A might be solved by C) but also at the 2-skeleton (B is solved by C; D is to A as C is to B; hence A might be solved by D) or possibly even higher order skeletons. It seems unlikely though that these possibilities can be searched through systematically in polynomial time, without the speedups afforded by human insight...
–
Terry TaoDec 11 '10 at 17:12

I think that Eichler and Shimura's proof of the Ramanujan--Petersson conjecture for weight two modular forms provides an example. Recall that this conjecture is a purely analytic statement: namely that if $f$ is a weight two cuspform on some congruence subgroup of $SL_2(\mathbb Z)$, which is an eigenform for the Hecke operator $T_p$ ($p$ a prime not dividing the level of the congruence subgroup in question) with eigenvalue $\lambda_p$, then $| \lambda_p | \leq 2 p^{1/2}.$ Unfortunately, no purely analytic proof of this result is known. (Indeed, if one shifts one's focus from holomorphic modular forms to Maass forms, then the corresponding conjecture remains open.)

What Eichler and Shimura realized is that, somewhat miraculously, $\lambda_p$ admits an alternative characterization in terms of counting solutions to certain congruences modulo $p$,
and that estimates there due to Hasse and Weil (generalizing earlier estimates of Gauss and
others) can be applied to show the desired inequality.

This argument was pushed much further by Deligne, who handled the general case of weight $k$ modular forms (for which the analogous inequality is $| \lambda_p | \leq 2 p^{(k-1)/2}$),
using etale cohomology of varieties in characteristic $p$ (which is something of a subtle and more technically refined analogue of the notion of a congruence mod $p$). (Ramanujan's original conjecture was for the unique cuspform of weight 12 and level 1.)

The idea that there are relationships (some known, others conjectural) between automorphic forms and algebraic geometry over finite fields and number fields has now become part of the received wisdom of algebraic number theorists, and lies at the heart of the Langlands program. (And, of course, at the heart of the proof of FLT.) Thus the striking idea of Eichler and Shimura has now become a basic tenet of a whole field of mathematics.

Note: Tim in his question, and in some comments, has said that he wants "first non-trivial instances" rather than difficult arguments that involve a whole range of ideas and techniques. In his comment to Terry Tao's answer regarding Perelman, he notes that long, difficult proofs might well include within them instances of such examples. Thus I am offering this example as perhaps a "first non-trivial instance" of the kind of insights that are involved in proving results like Sato--Tate, FLT, and so on.

I'm a little surprised no one has cited Thurston's impact on low-dimensional topology and geometry. I'm far from an expert, so I'm reluctant to say much about this. But I have the impression that Thurston revolutionized the whole enterprise by taking known results and expressing them from a completely new perspective that led naturally both new theorems and a lot of new conjectures. Perhaps Thurston himself or someone else could say something, preferably in a separate answer so I can delete mine.

Gromov's use of J-holomorphic curves in symplectic topology (he reinterpreted holomorphic functions in the sense of Vekua) as well as the invention of Floer homology (in order to deal with the Arnol'd conjecture).

Donaldson's idea of using global analysis to get more insight about the topology of manifolds.
Nowadays it is clear to us that (non-linear) moduli spaces give something new, and more than linear (abelian) Hodge theory, for example, but I think at that time this was really new.

I fully agree with this. I was a graduate student at Harvard, when Atiyah came and described Donaldson's thesis (Donaldson got his Ph.D. the same year as me). Before that, we all thought we were trying to understand Yang-Mills, because it connected geometric analysis to physics and not because we thought it would prove topological theorems. As I recall it, Atiyah said that when Donaldson first proposed what he wanted to do, Atiyah was skeptical and tried to convince Donaldson to work on something less risky.
–
Deane YangDec 11 '10 at 5:10

Gromov's proof that finitely generated groups with polynomial growth are virtually nilpotent. The ingenious step is to consider a scaling limit of the usual metric on the Cayley graph of the finitely generated group.

Of course the details are messy and to get the final conclusion one has to rely on a lot of deep results on the structure of topological groups. However, already the initial idea is breathtaking.

Quillen's construction of the cotangent complex used homotopical algebra to find the correct higher-categorical object without explicitly building a higher category. This may sound newfangled and modern, but if you read Grothendieck's book on the cotangent complex, his explicit higher-categorical construction was only able to build a cotangent complex that had its (co)homology truncated to degree 2. Strangely enough, by the time Grothendieck's book was published, it was already obsolete, as he notes in the preface (he says something about how new work of Quillen (and independently André) had made his construction (which is substantially more complicated) essentially obsolete).

I don't know how good an example this is. The Lefschetz fixed point theorem tells you that you can count (appropriately weighted) fixed points of a continuous function $f : X \to X$ from a compact triangulable space to itself by looking at the traces of the induced action of $f$ on cohomology. This is a powerful tool (for example it more-or-less has the Poincare-Hopf theorem as a special case).

Weil noticed that the number of points of a variety $V$ over $\mathbb{F}_{q^n}$ is the number of fixed points of the $n^{th}$ power of the Frobenius map $f$ acting on the points of $V$ over $\overline{\mathbb{F}_q}$ and, consequently, that it might be possible to describe the local zeta function of $V$ if one could write down the induced action of $f$ on some cohomology theory for varieties over finite fields. This led to the Weil conjectures, the discovery of $\ell$-adic cohomology, etc. I think this is a pretty good candidate for a powerful but unexpected technique.

Emil Artin's solution of Hilbert's 17th problem which asked whether every positive polynomial in any number of variables is a sum of squares of rational functions.

Artin's proof goes roughly as follows. If $p \in \mathbb R[x_1,\dots,x_n]$ it not a sum of squares of rational functions, then there is some real-algebraically closed extension $L$ of the field of rational functions in which $p$ is negative with respect to some total ordering (compatible with the field operations), i.e. there exists a $L$-point of $R[x_1,\dots,x_n]$ at which $p$ is negative. However, using a model theoretic argument, since $\mathbb R$ is also a real-closed field with a total ordering, there also has to be a real point such that $p<0$, i.e. there exists $x \in \mathbb R^n$ such that $p(x)< 0$. Hence, if $p$ is everywhere positive, then it is a sum of squares of rational functions.

The ingenius part is the use of a model theoretic argument and the bravery to consider a totally ordered real-algebraic closed extension of the field of rational functions.

I'd generalize this answer to include the observation that transfinite induction (or the axiom of choice) can simplify proofs of statements that don't actually require them. This is similar to how probabilistic arguments can sometimes be simpler than constructions. Here's an example statement for which all three kinds of proof exists: there exists a set $ A \subseteq [0,1]^2 $ that is dense everywhere on the unit square [0,1]<sup>2</sup>, but for every x, A contains only finitely many points of form (x, y) or (y, x).
–
Zsbán AmbrusDec 19 '10 at 16:57

Heegner's solution to the Gauss class number 1 problem for imaginary quadratic fields, by
noting that when the class number is 1 then a certain elliptic curve is defined over Q and
certain modular functions take integer values at certain quadratic irrationalities, and then
finding all the solutions to Diophantine equations that result, seems to me equally beautiful
and unexpected. Maybe its unexpectedness kept people from believing it for a long time.

Sometimes mathematics is not only about the methods of the proof, it is about the statement of the proof. E.g., it is hard to imagine an theorem-searching algorithm ever finding a proof of the results in Shannon's 1948 Mathematical Theory of Communication, without that algorithm first "imagining" (by some unspecified process) that there could BE a theory of communication.

Even so celebrated a mathematician as J. L. Doob at first had trouble grasping that Shannon's reasoning was mathematical in nature, writing in his AMS review (MR0026286):

[Shannon's] discussion is suggestive throughout, rather than mathematical, and it is not always clear that the author's mathematical intentions are honorable.

The decision of which mathematical intentions are to be accepted as "honorable" (in Doob's phrase) is perhaps very difficult to formalize.

[added reference]

One finds this same idea expressed in von Neumann's 1948 essay The Mathematician:

Some of the best inspirations of modern mathematics (I believe, the best ones) clearly originated in the natural sciences. ... As any mathematical discipline travels far from its empirical source, or still more, if it is a second or third generation only indirectly inspired by ideas coming from "reality", it is beset by very grave dangers. It becomes more and more purely aestheticizing, more and more l`art pour le art. ... Whenever this stage is reached, the only remedy seems to me to be the rejuvenating return to the source: the reinjection of more or less directly empirical ideas.

One encounters this theme of inspiration from reality over-and-over in von Neumann's own work. How could a computer conceive theorems in game theory ... without having empirically played games? How could a computer conceive the theory of shock waves ... without having empirically encountered the intimate union of dynamics and thermodynamics that makes shock wave theory possible? How could a computer conceive theorems relating to computational complexity ... without having empirically grappled with complex computations?

The point is straight from Wittgenstein and E. O. Wilson: in order to conceive mathematical theorems that are interesting to humans, a computer would have to live a life similar to an ordinary human life, as a source of inspiration.

I find Shannon's use of random codes to understand channel capacity very striking. It seems to be very difficult to explicitly construct a code which achieves the channel capacity - but picking one at random works very well, provided one chooses the right underlying measure. Furthermore, this technique works very well for many related problems. I don't know the details of your Example 4 (Erdos and Ramsey numbers), but I expect this is probably closely related.

It would be a better example if the proof were easier to understand ...
–
gowersDec 9 '10 at 16:38

18

I think there are at least two aspects of the Perelman-Hamilton theory that fit the bill. One is Hamilton's original realisation that Ricci flow could be used to at least partially resolve the Poincare conjecture (in the case of 3-manifolds that admit a metric with positive Ricci curvature). There was some precedent for using PDE flow methods to attack geometric problems, but I think this was the first serious attempt to attack the manifestly topological Poincare conjecture in that fashion, and was somewhat contrary to the conventional wisdom towards Poincare at the time. [cont.]
–
Terry TaoDec 9 '10 at 17:30

21

The other example is when Perelman needed a monotone quantity in order to analyse singularities of the Ricci flow. Here he had this amazing idea to interpret the parabolic Ricci flow as an infinite-dimensional limit of the elliptic Einstein equation, so that monotone quantities from the elliptic theory (specifically, the Bishop-Gromov inequality) could be transported to the parabolic setting. This is a profoundly different perspective on Ricci flow (though there was some precedent in the earlier work of Chow) and it seems unlikely that this quantity would have been discovered otherwise.
–
Terry TaoDec 9 '10 at 17:32

14

Terry's answer illustrates a principle relevant to this question: even if a proof as a whole is too complex to count as a good example, there are quite likely to be steps of the proof that are excellent examples.
–
gowersDec 9 '10 at 20:54

Shigefumi Mori's proof of Hartshorne's conjecture (the projective spaces are the only smooth projective varieties with ample tangent bundles). In his proof, Mori developed many new techniques (e.g. the bend-and-break lemma), which later became fundamental in birational geometry.

Lobachevsky and Bolyai certainly introduced a fundamentally new way of thinking, though I'm not sure it fits the criterion of being a proof of something - perhaps a proof that a lot of effort had been wasted in trying to prove the parallel postulate.

Morse theory is another good example. Indeed it is the inspiration for Floer theory, which has already been mentioned.

Atiyah-Bott's paper "Yang-Mills equations on a Riemann surface" and Hitchin's paper "Self-duality equations on a Riemann surface" both contain rather striking applications of Morse theory. The former paper contains for example many computations about cohomology rings of moduli spaces of holomorphic vector bundles over Riemann surfaces; the latter paper proves for instance that moduli spaces of Higgs bundles over Riemann surfaces are hyperkähler.

Note that these moduli spaces are algebraic varieties and can be (and are) studied purely from the viewpoint of algebraic geometry. But if we look at things from an analytic point of view, and we realize these moduli spaces as quotients of infinite dimensional spaces by infinite dimensional groups, and we use the tools of analysis and Morse theory, as well as ideas from physics(!!!), then we can discover perhaps more about these spaces than if we viewed them just algebraically, as simply being algebraic varieties.