Serre's A Course in Arithmetic gives essentially the following proof of the three-squares theorem, which says that an integer a is the sum of three squares if and only if it is not of the form 4^m (8n + 7). First one shows that the condition is necessary, which is straightforward. To show it is sufficient, a lemma of Davenport and Cassels, using Hasse-Minkowski, shows that a is the sum of three rational squares. Then something magical happens:

Let C denote the sphere x^2 + y^2 + z^2 = a. We are given a rational point p on this sphere. Round the coordinates of p to the closest integer point q, then draw the line through p and q, which intersects C at a second rational point p'. Round the coordinates of p' to the closest integer point q', and repeat this process. A straightforward calculation shows that the least common multiples of the denominators of the successive points p', p'', ... are strictly decreasing, so this process terminates at an integer point on C.
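The descent is easy to experiment with. Here is a minimal sketch in Python (my own illustration, not from Serre's book): `descend` takes a rational point on the sphere as a tuple of `Fraction`s, rounds to a nearest lattice point, passes to the second intersection of the joining line with the sphere, and records the lcm of the denominators at each step.

```python
from fractions import Fraction
from math import lcm

def descend(a, p):
    """Davenport-Cassels descent on the sphere x^2 + y^2 + z^2 = a.

    p: tuple of Fractions whose squares sum to a. Termination relies on
    the theorem itself (a not of the form 4^m (8n + 7)).
    Returns an integral point on the sphere and the list of
    lcm-denominators seen along the way (strictly decreasing).
    """
    den = lambda pt: lcm(*(c.denominator for c in pt))
    assert sum(c * c for c in p) == a
    dens = [den(p)]
    while dens[-1] > 1:
        q = tuple(Fraction(round(c)) for c in p)    # nearest lattice point
        d = tuple(pc - qc for pc, qc in zip(p, q))  # error vector p - q
        A = sum(c * c for c in d)                   # |p - q|^2; 0 < A <= 3/4 < 1
        # Parametrize the line as q + t(p - q): the quadratic |q + t d|^2 = a
        # has roots t = 1 (giving p) and t = (|q|^2 - a)/A (the new point).
        t = (sum(c * c for c in q) - a) / A
        p = tuple(qc + t * dc for qc, dc in zip(q, d))
        dens.append(den(p))
    return p, dens
```

For instance, starting from 6 = (1/3)^2 + (2/3)^2 + (7/3)^2, a single step lands on the integral solution (-1, 2, 1).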

Bjorn Poonen, after presenting this proof in class, remarked that he had no intuition for why this should work. Does anyone have a reply?

Edit: Let me suggest a possible reformulation of the question as follows. Complete the analogy: Hensel's lemma is to Newton's method as this technique is to ___________.

I don't have any intuition for why this works (other than a vague idea that one should look at other classes of algorithm with similar analysis), but I do just want to say that this is a fantastic proof.
– Harrison Brown, Oct 29 '09 at 17:35


The lemma that Serre attributed to Davenport and Cassels was actually the magical one, not the criterion for being a sum of 3 rational squares. In any case, as explained in my answer below, Serre later discovered that it was known long before Davenport and Cassels...
– Bjorn Poonen, Dec 31 '09 at 23:13

6 Answers

The intuition for this method of passing from a rational solution to an integral solution seems pretty simple to me: passing from a rational solution to a nearby integral point (not necessarily a solution) is passing to a point whose denominators are 1. So when you intersect the line through your rational solution and the nearby integral point with whatever curve or surface contains your solutions, you can anticipate that the second intersection point on that line will have denominators that have moved closer to 1. That is, connecting a rational solution with some integral point will spit out a new solution whose denominators are somewhere between the denominators of your solution and the denominators of the integral point you used to produce the line.

Of course intuition is one thing and checking the details is another: you choose the integral point nearby and the math has to work out to show the denominators really get smaller in the second solution you produce. For instance, this method of proving the 3-square theorem goes through without a problem for a similar 2-square theorem (if an integer is a sum of two rational squares then it's a sum of two integral squares by the same method, replacing the sphere x^2 + y^2 + z^2 = a with the circle x^2 + y^2 = a). But this intuitive way of creating an integral solution from a rational solution breaks down if you apply it to the 4-square theorem: the inequalities in the proof just barely fail to work (sort of like doing division with remainder and finding the remainder is as big as the divisor instead of smaller).

The intuition also breaks down if you slightly change the expression x^2 + y^2 (sticking to two variables). Consider x^2 + 82y^2 = 2 and the rational solution (4/7,1/7). Its nearest integral point in the plane is (1,0), and the line through these intersects the ellipse in (16/13,-1/13), so the denominator has gone up. There actually are no integral solutions to x^2 + 82y^2 = 2. Or if we take x^3 + y^3 = 13 and the rational solution (2/3,7/3), its nearest integral point in the plane is (1,2), the line through these meets the curve again in (7/3,2/3), whose nearest integral point in the plane is (2,1), the line through them meets the curve in (2/3,7/3),...
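The second-intersection step in these examples is mechanical for any quadratic form, so the failures are easy to check by machine. A sketch (my own notation, not from the thread: `M` is the symmetric Gram matrix of the form, `a` the target value):

```python
from fractions import Fraction

def second_intersection(M, a, p):
    """One 'descent' step for the quadratic form Q(v) = v^T M v (M symmetric).

    Round the rational point p (assumed to satisfy Q(p) = a) to a nearest
    lattice point q, and return the other intersection of the line pq with
    Q = a (or q itself if the rounding-error vector is isotropic).
    """
    B = lambda u, v: sum(M[i][j] * u[i] * v[j]
                         for i in range(len(u)) for j in range(len(v)))
    q = tuple(Fraction(round(c)) for c in p)
    d = tuple(pc - qc for pc, qc in zip(p, q))
    if B(d, d) == 0:
        return q
    # Roots of B(q + t*d, q + t*d) = a are t = 1 (giving p) and t below.
    t = (B(q, q) - a) / B(d, d)
    return tuple(qc + t * dc for qc, dc in zip(q, d))
```

For x^2 + y^2 = 5 this takes (11/5, 2/5) straight to the integral point (1, -2), while for x^2 + 82y^2 = 2 it takes (4/7, 1/7) to (16/13, -1/13): the denominator grows, as in the example above.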

A few years ago when I was giving some lectures on the method of descent, I worked out some examples of this geometric "three-square" theorem (start with an equation a = x^2 + y^2 + z^2 where a is an integer and x, y, and z are rational and produce in a few steps an equation where x, y, and z are integral) and I noticed in my initial examples that the denominators in each new step did not merely drop, but dropped as factors, e.g., if the common denominator at first was 15 then at the next step it was 5 and then 1. Maybe the denominators always decrease through factors like this? Nope, eventually I found a case where they don't: if you start with

13 = (18/11)^2 + (15/11)^2 + (32/11)^2

then the integral point nearest (18/11,15/11,32/11) is (2,1,3) and the line through these two points meets the sphere 13 = x^2 + y^2 + z^2 in the new point (2/3,7/3,8/3), so the denominator has fallen from 11 to 3, which is not a factor. (At the next step you will terminate in the integral solution (0,3,2).)

Hi KConrad; yes, the lcm of the denominators gets closer to 1 at each step (as Qiaochu said already in the question statement). The unanswered question is whether there is a conceptual proof of this that is any clearer than the proofs given so far. For example, can one make it conceptually clear why the relevant hypothesis on the quadratic form is that the absolute value of its value at the "error vector" in the lattice point approximation should be between 0 and 1?
– Bjorn Poonen, Jan 16 '10 at 2:00

The point I was trying to make was that the denominators drop "because" you're connecting (by a line) a point with denominators greater than 1 to a point whose denominators all equal 1, so the new rational point that comes out of the process has denominator in between the two. I think the fact that the denominators of the nearby integral point's coordinates are all 1 is the intuition for why this process makes the denominators go down from one rational solution to the next. The question asked for intuition about the process, not a conceptual proof of the method. I don't have a conceptual proof. :(
– KConrad, Jan 16 '10 at 3:51


When I gave this proof in a lecture (to students at the Ross Program at Ohio State in 2003), Markus Rost was in the audience. He brought to my attention that instead of speaking about the second point of intersection of a line with a circle, you could speak about reflections (for circles, not ellipses!). This led me to write up the details of the argument using the language of reflections, and although it only works up to a sum of 3 squares in Z, it goes through for n squares in F[T]. Look here: math.uconn.edu/~kconrad/blurbs/linmultialg/sumsquareQF(T).pdf Is that helpful?
– KConrad, Jan 16 '10 at 4:11


OK, thank you. I hadn't thought about the function field case before - it's interesting that you get a stronger result in that case!
– Bjorn Poonen, Jan 16 '10 at 5:40


It's stronger only because of the non-archimedean behavior of the degree on F(T), and that removes the surprise for me. What happens is that in Q, a sum of k numbers that are each at most 1/4 is less than 1 only for k = 1, 2, and 3, but in F(T) any finite sum of rational functions with degree below some bound also has degree below that bound.
– KConrad, Jan 16 '10 at 6:26

A few days ago Serre told me about some modest improvements to the proof, based on Weil's book Number theory: an approach through history from Hammurapi to Legendre and on a 1998 letter from Deligne to Serre; I will paraphrase these below.

According to Weil (p. 292), the "magical" argument is due to an amateur mathematician: L. Aubry, Sphinx-Oedipe 7 (1912), 81--84. Here is a generalization that allows for a clearer proof.

Lemma: Let $f = f_2+f_1+f_0 \in \mathbf{Z}[x_1,\ldots,x_n]$, where $f_i$ is homogeneous of degree $i$. Suppose that for every $x \in \mathbf{Q}^n-\mathbf{Z}^n$, there exists $y \in \mathbf{Z}^n$ such that $0<|f_2(x-y)|<1$. If $f$ has a zero in $\mathbf{Q}^n$, then it has a zero in $\mathbf{Z}^n$.

Proof: If $x=(x_1,\ldots,x_n) \in \mathbf{Q}^n$, let $\operatorname{den}(x)$ denote the lcm of the denominators of the $x_i$. By iteration, the following claim suffices: If $x \in \mathbf{Q}^n - \mathbf{Z}^n$ and $y \in \mathbf{Z}^n$ satisfy $0<|f_2(x-y)|<1$, and the line $L$ through $x$ and $y$ intersects $f=0$ in $x,x'$, then $\operatorname{den}(x')<\operatorname{den}(x)$.

By restricting to $L$ and choosing a coordinate $t$ on it taking the value $0$ at $y$ and integer values exactly on $L \cap \mathbf{Z}^n$, we reduce to proving the following: given $f(t)=At^2+Bt+C \in \mathbf{Z}[t]$ with zeros $x,x' \in \mathbf{Q}$ such that $0<|Ax^2|<1$, we have $\operatorname{den}(x')<\operatorname{den}(x)$. Proof: $0<|Ax^2|<1$ implies $0<|A|<\operatorname{den}(x)^2$, and we have $xx'=C/A$, so $\operatorname{den}(x) \operatorname{den}(x') \le |A| < \operatorname{den}(x)^2$ (the first inequality holds because $\operatorname{den}(x)\operatorname{den}(x')$ divides $A$, by Gauss's lemma applied to $f=A(t-x)(t-x')$), so $\operatorname{den}(x')<\operatorname{den}(x)$.
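To see the reduction to one variable in action, here is the computation for the example 13 = (18/11)^2 + (15/11)^2 + (32/11)^2 appearing elsewhere in this thread (my own worked instance, not part of the original argument). Take $y = (2,1,3)$ and parametrize $L$ by the primitive direction $(-4,4,-1)$:

```latex
\[
\begin{aligned}
&L:\ (2,1,3) + t(-4,4,-1), \quad\text{so } L \cap \mathbf{Z}^3 \text{ is exactly } t \in \mathbf{Z},\\
&f|_L(t) = (2-4t)^2 + (1+4t)^2 + (3-t)^2 - 13 = 33t^2 - 14t + 1 = 33\bigl(t-\tfrac{1}{11}\bigr)\bigl(t-\tfrac{1}{3}\bigr),\\
&x = \tfrac{1}{11},\quad x' = \tfrac{1}{3},\quad |Ax^2| = \tfrac{33}{121} = \tfrac{3}{11} \in (0,1),\\
&\operatorname{den}(x)\operatorname{den}(x') = 33 \le |A| = 33 < 121 = \operatorname{den}(x)^2,\quad\text{so } \operatorname{den}(x') = 3 < 11 = \operatorname{den}(x).
\end{aligned}
\]
```

Note that here the bound $\operatorname{den}(x)\operatorname{den}(x') \le |A|$ is attained with equality.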

Here's a strange application: let p ≡ 1 (mod 4) be prime, and let f_2 be the norm form of the quadratic number field K with discriminant p. If K is norm-Euclidean, then the negative Pell equation f_2 = -1 is solvable.
– Franz Lemmermeyer, May 23 '10 at 14:44

I have never understood this proof either. What makes it very odd in my mind is that a very similar argument can sometimes be used to prove, in some sense, the exact opposite: that certain equations have solutions with denominators as big as you like. For example, the proof in Cassels' "elliptic curves" book that x^3 + y^3 = 9 has infinitely many rational solutions goes like this: find one point (e.g. (2,1)). Now draw the line tangent to the curve through that point, giving a new point (the third point of intersection of the line and the curve); compute the denominators of the new point and observe that they are bigger than those of the old point. Hence repeating this procedure produces infinitely many points. You're doing the "new points from old" trick, but this time the denominators are getting provably worse rather than provably better.
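The "denominators provably grow" direction is just as mechanical to play with. Here is a sketch of the tangent-line step on x^3 + y^3 = c (my own illustration of the construction described above):

```python
from fractions import Fraction

def tangent_step(c, p):
    """Third intersection of the tangent line at p = (x, y) with x^3 + y^3 = c.

    Assumes y != 0 and tangent slope s != -1 (tangent not parallel to x + y = 0).
    """
    x, y = p
    s = -x * x / (y * y)          # implicit differentiation: 3x^2 + 3y^2 y' = 0
    b = y - s * x                 # tangent line: Y = s*t + b
    # Substitute into the curve: t^3 + (s*t + b)^3 = c, i.e.
    #   (1 + s^3) t^3 + 3 s^2 b t^2 + 3 s b^2 t + (b^3 - c) = 0.
    # The three roots sum to -3 s^2 b / (1 + s^3), and x is a double root
    # (tangency), so the third root is the sum minus 2x:
    x2 = -3 * s * s * b / (1 + s ** 3) - 2 * x
    return (x2, s * x2 + b)
```

Starting from (2, 1) on x^3 + y^3 = 9, one step gives (20/7, -17/7); the next gives a point with denominator 90391, and the denominators keep climbing.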

This is not an answer per se, but: I have recently been thinking about the Davenport-Cassels Theorem and found a generalization to the context of "normed rings" which also recovers the Cassels-Pfister Theorem as a special case. (This also makes precise KConrad's assertion in the comments to his answer that the latter result is a function field analogue of the former one.) See

for a preliminary writeup. I think those who are interested in this question may also be interested in this paper and have useful input to give: comments are most welcome. (Note also that, if this answer is an advertisement for my work on MO, the favor is returned in the paper: Bjorn's answer is cited there.)

Okay, this is a crazy line of thought even for me, but there are a lot of number-theoretic or combinatorial algorithms where the proof of correctness proceeds along essentially the same lines. A solution is shown to be equivalent to some parameter equalling 1 (or 0). The parameter can usually take only integer values at any point in the set of potential solutions. We then apply an iterative algorithm and show that at each iteration, if we don't already have a solution, the parameter strictly decreases. Then the algorithm must eventually terminate at a solution.

I don't really know enough in this area to speak with any kind of certainty, but you're looking for a real-valued analogue of something like this -- I suspect it's probably linear programming, or some generalization thereof. Maybe interior point methods?

Right, but I don't have an intuition for any of those algorithms either, except the ones that boil down to the Euclidean algorithm.
– Qiaochu Yuan, Oct 29 '09 at 18:01

I mean, a lot of the time what you do at each step is improve the solution locally in a way so obvious as to be almost stupid -- the miracle that occurs/meta-algorithm to consider is that the small local improvements don't eventually start cancelling each other out. I don't see an easy way to fit the three-squares theorem into this paradigm, though.
– Harrison Brown, Oct 29 '09 at 18:20

Perhaps not what you're looking for, but superficially this looks like a minimization procedure under a constraint: rounding to the closest integer point means minimizing the Euclidean distance to the integer lattice, and the straight line back to C means projecting onto the constraint manifold. (I'll try to work out which procedure in particular; it could simply be gradient descent.)