One of the many articles on the Tricki that were planned but have never been written was about making it easier to solve a problem by generalizing it (which initially seems paradoxical, because if you generalize something then you are trying to prove a stronger statement). I know that I've run into this phenomenon many times, and sometimes it has been extremely striking just how much simpler the generalized problem is. But now that I try to remember any of those examples I find that I can't. It has recently occurred to me that MO could be an ideal help to the Tricki: if you want to write a Tricki article but lack a supply of good examples, then you can ask for them on MO.

I want to see whether this works by actually doing it, and this article is one that I'd particularly like to write. So if you have a good example up your sleeve (ideally, "good" means both that it illustrates the phenomenon well and that it is reasonably easy for others to understand) and are happy to share it, then I'd be grateful to hear it. I will then base an article on those examples, and I will also put a link from that article to this MO page so that if you think of a superb example then you will get the credit for it there as well as here.

Added later: In the light of Jonas's comment below (I looked, but not hard enough), perhaps the appropriate thing to do if you come up with a good example is to add it as an answer to the earlier question rather than this one. But I'd also like to leave this question here because I'm interested in the general idea of some kind of symbiosis between the Tricki and MO (even if it's mainly the Tricki benefiting from MO rather than the other way round).

38 Answers

Great question. Maybe the phenomenon is less surprising if one thinks that there are $\infty$ ways to generalize a question, but just a few of them make some progress possible. I think it is reasonable to say that successful generalizations must embed, consciously or not, a very deep understanding of the problem at hand. They operate through the same mechanism at work in good abstraction, by helping you forget insignificant details and focus on the heart of the matter.

An example, which is probably too grandiose to qualify as an answer, since your question seems very specific, is Fredholm theory. At the beginning of the last century, integral equations were a hot topic and a ubiquitous tool for solving many concrete PDE problems. The theory of linear operators on Banach and Hilbert spaces is an outgrowth of this circle of problems. Now, generalizing from
$$ u(x) + \lambda \int _a ^b k(x,s) u(s) ds = f(x) $$
to
$$ (I+ \lambda K) u = f $$
makes the problem trivial, and we do it without thinking. But it must have been quite a shock in 1900.

Consider the sequence $x_n = \{ n \alpha \}$ of fractional parts of the multiples of an irrational number $\alpha$. The fact that this sequence is dense in $[0,1]$ is a simple application of the pigeonhole principle, but the fact that it is equidistributed seems a lot harder to prove at first. However, this is very easy to establish if one generalizes the definition of equidistribution (or more precisely, the objects appearing in it). Namely, by very easy arguments one can show that $x_n$ is equidistributed in $[0,1]$ if and only if $\frac{1}{N} \sum_{n=1}^{N} f(x_n) \to \int_{0}^{1}f(x)dx$ for every continuous (or Riemann integrable) function $f$ on $[0,1]$. Now one can't resist the temptation to look for functions $f$ for which this is easy to show. The complex exponentials $f_m (x) = e^{2 \pi i m x}$, where $m \in \mathbb{Z}$, are such functions, and by classical Fourier analysis (or the Stone-Weierstrass theorem) it is enough to show the above convergence for such functions. One is thus led to the Weyl criterion, which reduces the problem of showing that the sequence $\{ n \alpha \}$ is equidistributed in $[0,1]$ to a simple computation involving the sum of a geometric sequence. (Of course, Weyl's criterion is useful in studying other, more involved sequences as well.)

In essence, we started with a problem which concerns indicator functions of subintervals of $[0,1]$, generalized it to a problem involving a much bigger class of functions $\mathcal{F}$ and then found a nice, special subclass with which we can "capture" the whole class $\mathcal{F}$ (and in particular the functions we started with). This procedure of "generalize and then specialize" seems quite common in analysis. In this case it also has some relation to the Tricki article "Turn sets into functions".
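
As a quick sanity check of the Weyl criterion for $x_n = \{n\alpha\}$, one can compute the exponential-sum averages numerically. A minimal Python sketch of my own (the choice $\alpha = \sqrt{2}$ is just an example, not part of the argument above):

```python
import numpy as np

# Weyl criterion for x_n = {n*alpha}: for every nonzero integer m, the
# averages (1/N) * sum_{n<=N} exp(2*pi*i*m*n*alpha) should tend to 0.
# Summing the geometric series bounds each average by 1/(N*|sin(pi*m*alpha)|).
alpha = np.sqrt(2)
N = 10**6
n = np.arange(1, N + 1)
for m in (1, 2, 3):
    avg = np.exp(2j * np.pi * m * n * alpha).mean()
    bound = 1 / (N * abs(np.sin(np.pi * m * alpha)))
    print(m, abs(avg), bound)   # |avg| is tiny, and within the geometric-series bound
```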

The canonical example used to introduce this idea early to students is the Fundamental Theorem of Calculus. In order to figure out one area $\int_{a}^b f(t)dt$, you must come to grips with the generalized problem $x \mapsto \int_{a}^x f(t)dt$.

Some of the power series/Taylor series also fall into this category; they are probably easier to evaluate as power series than at a single point. In both situations one can see easily why the more general problem is sometimes easier: studying the (signed) area as a function, or a series as a power series, allows one to use one extra tool, differentiation (which is actually the key for this problem).
–
Nick S, Sep 28 '10 at 19:22

There are many examples where introducing one or more extra parameters into an integral that you want to evaluate or an identity that you want to prove makes things easier. For example, Feynman was fond of evaluating integrals by differentiating under the integral sign, and in his advanced determinant calculus, Christian Krattenthaler explicitly urges you to introduce more parameters into any determinant you are having trouble evaluating.

For the Tricki, maybe one of the simplest examples would be the evaluation of $\int_0^\infty {\sin x \over x}dx$ by considering $\int_0^\infty e^{-xt} \bigl({\sin x \over x}\bigr) dx$ and differentiating with respect to $t$.
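
To make this concrete for the eventual article, here is the computation carried out symbolically. A sketch of my own using SymPy (the intermediate names are mine):

```python
import sympy as sp

x, t, s = sp.symbols('x t s', positive=True)

# I(t) = integral of e^{-xt} sin(x)/x over x in (0, oo).  Differentiating
# under the integral sign in t turns the integrand into -e^{-xt} sin(x),
# killing the awkward 1/x factor:
dI = sp.integrate(-sp.exp(-x*t) * sp.sin(x), (x, 0, sp.oo))
print(dI)                                   # -1/(t**2 + 1)

# Since I(t) -> 0 as t -> oo, integrate back: I(t) = integral of 1/(1+s^2) over (t, oo).
I_t = sp.integrate(-dI.subs(t, s), (s, t, sp.oo))
print(sp.simplify(I_t))                     # pi/2 - atan(t)
print(sp.limit(I_t, t, 0))                  # pi/2, the value of the original integral
```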

Great examples with integrals! One that I remember reading in one of Pólya's books way back when is $$\int_{-\infty}^{\infty}\frac{dx}{(x^2+1)^2},$$ which can be evaluated by replacing $1$ with $a$ and differentiating $\int_{-\infty}^{\infty}\frac{dx}{x^2+a}$ under the integral sign.
–
Victor Protsak, Sep 29 '10 at 6:34

The following was a conjecture for several years. Put a light bulb and a switch at every vertex of an $m\times n$ grid ($mn$ vertices in all). Each bulb can be on or off. Each switch changes the state of the bulb at its vertex and all its neighbors. (A neighbor is a vertically or horizontally adjacent vertex.) Then whatever the initial set of bulbs that are lit, it is possible to turn all the lights off by some set of switches. Sutner showed in 1989 that the corresponding result is true for any graph. Caro gave a simpler proof in 1996. The proof is an elegant application of linear algebra mod 2. Looking at grids adds an extra layer of complexity that obscures the underlying theory. One reference is http://mathworld.wolfram.com/LightsOutPuzzle.html.

I thought that the proof for arbitrary graphs assumes that the initial state has all lights on? Whereas for grid graphs it does not matter what the initial state is. For example if I have a triangle with one light on, I don't think I can turn all the lights off.
–
Tony Huynh, Sep 27 '10 at 0:27

Tony is right: what the link actually says is that it is possible to go from "all lights off" to "all lights on" for any square grid. Another counterexample to the "any state" problem on arbitrary graphs is any trivalent graph. (Or pentavalent, or so on...)
–
drvitek, Sep 27 '10 at 1:19

I guess the smallest counterexample would be a single edge with one light on, which also happens to be a 1x2 grid.
–
Tony Huynh, Sep 27 '10 at 1:58
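
To make the linear-algebra-mod-2 viewpoint (and the counterexamples in the comments above) concrete, here is a small sketch of my own in Python; `lights_out_solvable` is a made-up helper name:

```python
import itertools

import numpy as np

def lights_out_solvable(adj, state):
    # Pressing switch j toggles bulb j and its neighbours, so pressing the
    # 0/1 vector of switches x changes the state by (adj + I) x over GF(2);
    # the state is solvable iff (adj + I) x = state (mod 2) has a solution.
    n = len(state)
    M = np.concatenate([(np.array(adj) + np.eye(n, dtype=int)) % 2,
                        np.array(state).reshape(-1, 1)], axis=1)
    row = 0
    for col in range(n):                         # Gaussian elimination mod 2
        pivot = next((r for r in range(row, n) if M[r, col]), None)
        if pivot is None:
            continue
        M[[row, pivot]] = M[[pivot, row]]
        for r in range(n):
            if r != row and M[r, col]:
                M[r] = (M[r] + M[row]) % 2
        row += 1
    # inconsistent iff some row is zero on the left but 1 on the right
    return not any(not M[r, :n].any() and M[r, n] for r in range(n))

# Sutner's theorem: the all-on state is solvable on *every* graph;
# check this on all graphs with 4 vertices.
for bits in itertools.product([0, 1], repeat=6):
    adj = np.zeros((4, 4), dtype=int)
    for (i, j), b in zip(itertools.combinations(range(4), 2), bits):
        adj[i][j] = adj[j][i] = b
    assert lights_out_solvable(adj, [1, 1, 1, 1])

# ...but an arbitrary state need not be: one lit bulb on a single edge,
# the smallest counterexample from the comments above.
print(lights_out_solvable([[0, 1], [1, 0]], [1, 0]))   # False
```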

Sometimes an appropriate response to a difficult mathematical problem is to pose a much harder problem. Consider the problem of finding the minimal nonzero length of a vector in a lattice $C$. Here we find the minimal nonzero length intractable, and thus ask for *all* the lengths of vectors of $C$ and their multiplicities. Equivalently, we ask for the following generating function of all the squared lengths, called the *theta function* (or *theta series*) of $C$:
$$\Theta_C(z) = \sum_{x \in C} z^{\langle x, x \rangle} = 1 + \sum_{m=1}^{\infty} N_m(C) z^m$$
where $N_m(C)$ is the number of lattice vectors of length $\sqrt{m}$.

It is hard to consider particular lengths but easier to consider the entire theta function because you give the problem more structure, and then you have access to stronger tools like Poisson summation.
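
As a toy illustration of the information $\Theta_C$ packages, here are the first coefficients $N_m$ for $C = \mathbb{Z}^2$, where $N_m$ counts representations of $m$ as a sum of two squares (a quick enumeration of my own):

```python
from collections import Counter

# Coefficients N_m of the theta series of Z^2: the number of lattice points
# (x, y) with x^2 + y^2 = m.  A box of radius 20 is plenty for m < 10.
B = 20
counts = Counter(x*x + y*y for x in range(-B, B + 1) for y in range(-B, B + 1))
print([counts[m] for m in range(10)])
# -> [1, 4, 4, 0, 4, 8, 0, 0, 4, 4]; e.g. N_1 = 4 from (±1,0) and (0,±1)
```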

This may be a little trivial, but there are a number of identities for the Fibonacci numbers that are most easily proved by generalizing them. For example, proving

$f_{2n-1}=f_{n+1}f_{n}-f_{n-1}f_{n-2}$

requires a rather convoluted process involving a couple lemmas unless one realizes that it is far easier to prove

$f_{m+n}=f_{m}f_{n+2}-f_{m-2}f_{n}$

and then substitute $m=n$, $n=n-1$.
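
A quick check of both identities (an illustrative Python snippet, not part of the proof):

```python
from functools import lru_cache

@lru_cache(maxsize=None)
def fib(k):
    return k if k < 2 else fib(k - 1) + fib(k - 2)

# The generalized identity f_{m+n} = f_m f_{n+2} - f_{m-2} f_n ...
for m in range(2, 40):
    for n in range(2, 40):
        assert fib(m + n) == fib(m) * fib(n + 2) - fib(m - 2) * fib(n)

# ... and its specialization m -> n, n -> n-1, i.e.
# f_{2n-1} = f_{n+1} f_n - f_{n-1} f_{n-2}:
for n in range(3, 40):
    assert fib(2*n - 1) == fib(n + 1) * fib(n) - fib(n - 1) * fib(n - 2)
```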

I think a classic example of generalizing in order to prove a simple result is Galois Theory. Ruffini's attempted proofs of the unsolvability of the quintic were enormously long and tremendously complicated. However, once the machinery of Galois Theory is developed, which is rather easy, it is almost trivial to demonstrate that there exist quintic equations that are not solvable by radicals.

I would argue that turning the question about the quintic into the question "which polynomials can be solved by radicals" is not a simplification at all. To develop Galois theory is easy now, in retrospect, but back then it was a tremendous achievement and involved amazingly deep insights and innovative thinking.
–
Alex B., Sep 27 '10 at 1:21

The funny thing about Galois theory, as suggested in the recent Notices review of *Duel at Dawn* (ams.org/notices/201010/rtx101001297p.pdf), is that Galois was not snubbed because his ideas were perceived as incorrect, but rather because they were perceived as useless. Well, that was then and this is now, I guess...
–
Thierry Zell, Nov 10 '10 at 15:55

An extreme case of the phenomenon you are asking about happens frequently enough to be worth mentioning (and I think that it even partially "explains" it); sometimes you can generalize a problem and then realize that the more general problem is already solved! The point is that there are usually many ways to look at or formulate the same problem, and when the problem is looked at from some points of view it seems novel, but when correctly reformulated you can recognize it as a special case of a solved problem, perhaps even a classic one. Here is an example that I admit is a bit forced, but it illustrates the idea. Show that if a triangle $C$ can be divided by a line from a vertex to the opposite side into two triangles $A$ and $B$ that are similar to the original triangle, and if corresponding sides of the three triangles are $c$, $a$, and $b$, then $a^2 + b^2 = c^2$. This is of course completely trivial, since the areas of similar triangles are obviously proportional to the squares of corresponding sides, while the areas of $A$ and $B$ clearly add up to the area of $C$. On the other hand, a special case of this, dropping the perpendicular from the right-angle vertex of a right triangle onto the hypotenuse, gives a proof of the Pythagorean Theorem. (As you may know, this is how Einstein was supposed to have found a proof of the Pythagorean Theorem for himself.)
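Spelled out, the general argument is a single line: similarity gives one constant $\kappa$ with
$$ \operatorname{Area}(A) = \kappa a^2, \qquad \operatorname{Area}(B) = \kappa b^2, \qquad \operatorname{Area}(C) = \kappa c^2, $$
and $\operatorname{Area}(A) + \operatorname{Area}(B) = \operatorname{Area}(C)$ then forces $\kappa a^2 + \kappa b^2 = \kappa c^2$, i.e. $a^2 + b^2 = c^2$.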

I like the matroid intersection theorem. Basically, this hammer often renders a problem trivial once you generalize to matroids. Also, it's nice that the hammer itself is not hard to prove. First the statement of the theorem.

Matroid Intersection Theorem. Let $M_1$ and $M_2$ be matroids on the same ground set $E$, with rank functions $r_1$ and $r_2$ respectively. Then the size of a maximum common independent set of $M_1$ and $M_2$ is

$$
\min_{A \subset E} \ r_1(A)+r_2(E-A).
$$

I won't include a proof, but you can find one in say Volume B of Combinatorial Optimization by Schrijver. It turns out that this theorem simultaneously proves many theorems such as König's theorem and Nash-Williams' theorem on covering graphs with $k$ edge-disjoint spanning trees. Here's another application.

Rainbow Spanning Trees. Let $G$ be a graph with a (not necessarily proper) $k$-colouring of $E(G)$. Suppose you are trying to decide if $G$ contains a rainbow spanning tree; that is, a spanning tree such that no two edges of the tree are the same colour. One obvious necessary condition is as follows: if I choose $t$ colours and remove all the edges with these colours, then the resulting graph had better have at most $t+1$ components. Is this also sufficient? It seems that this could be rather tricky to prove. However, there is an easy proof using matroids.

Let $M_2$ be the matroid whose independent sets are those subsets of edges which contain at most 1 edge of each colour. Let $M_1$ be the cycle matroid of $G$. Then, $G$ has a rainbow spanning tree if and only if $M_1$ and $M_2$ have a common independent set of size $|V(G)|-1$.

By the matroid intersection theorem, it suffices to show that

$$
\min_{A \subset E} \ r_1(A)+r_2(E-A) \geq |V(G)|-1.
$$

Note that $r_1(A)$ is simply $|V|-c(A)$, where $c(A)$ is the number of components of the subgraph $(V,A)$. On the other hand, $r_2(E-A)$ is just the number of colours appearing among the edges of $E-A$. Rearranging, we need at least $c(A)-1$ colours to appear in $E-A$. And indeed, removing the colours of $E-A$ leaves a subgraph of $(V,A)$, which therefore has at least $c(A)$ components; by the necessary condition it has at most one more component than the number of colours removed, and we are done.
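
A brute-force sanity check of the resulting characterization on small graphs (my own Python sketch; `has_rainbow_spanning_tree` and `colour_condition` are made-up names):

```python
import itertools
import random

def components(n, edges):
    parent = list(range(n))          # union-find over the vertex set
    def find(u):
        while parent[u] != u:
            parent[u] = parent[parent[u]]
            u = parent[u]
        return u
    for u, v in edges:
        parent[find(u)] = find(v)
    return len({find(u) for u in range(n)})

def has_rainbow_spanning_tree(n, edges):
    # edges: list of (u, v, colour); try all (n-1)-subsets of edges
    for T in itertools.combinations(edges, n - 1):
        if (len({c for _, _, c in T}) == n - 1
                and components(n, [(u, v) for u, v, _ in T]) == 1):
            return True
    return False

def colour_condition(n, edges):
    # removing any t colour classes must leave at most t+1 components
    colours = {c for _, _, c in edges}
    for t in range(len(colours) + 1):
        for removed in itertools.combinations(colours, t):
            kept = [(u, v) for u, v, c in edges if c not in removed]
            if components(n, kept) > t + 1:
                return False
    return True

random.seed(0)
for _ in range(200):
    n = 5
    edges = [(u, v, random.randrange(4))
             for u, v in itertools.combinations(range(n), 2)
             if random.random() < 0.5]
    assert has_rainbow_spanning_tree(n, edges) == colour_condition(n, edges)
print("characterization confirmed on 200 random coloured graphs")
```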

I can't resist mentioning the following problem (and requesting that nobody gives away the solution any more than it is already given away by my mentioning it here).

Call a real number repetitive if for every $k$ you can find a string of $k$ digits that appears more than once in its decimal expansion. The problem is to prove that if a real number is repetitive then so is its square.

Doron Zeilberger wrote a very nice expository article entitled, "The method of undetermined generalization and specialization illustrated with Fred Galvin's amazing proof of the Dinitz conjecture," in which he discusses how repeated generalization and specialization can lead one to the solution of a difficult problem.

Sometime around 25 years ago, Dr. Jeffrey Vaaler at UT Austin gave me the following problem. He needed the result as a lemma for a paper he was working on.

Let $n$ be a square-free integer with $k$ distinct prime factors, and thus $d(n) = 2^k$ divisors. Split the divisors into two sets of equal size: the small divisors $S$ and the large divisors $T$. The statement he was trying to prove was:
$$\text{There exists a bijection } f: S \rightarrow T \text{ such that } d \mid f(d) \text{ for every } d \in S.$$
I was an undergraduate and highly motivated to demonstrate my usefulness, but I didn't really have many ideas about how to go about it. The obvious approach is by induction on $k$, but I never really got anywhere despite spending many hours on the problem.

A year later I ran into Dr. Vaaler in the hall and asked if he ever solved it. Of course, he had, by induction on $k$. He went on to explain the "trick" to making the induction work. He proved a more general result. Introduce a parameter $0 \le r \le \frac{1}{2}$ and consider $S_r$ and $T_r$, the smallest $\lfloor r \cdot 2^k \rfloor$ divisors and the largest $\lfloor r \cdot 2^k \rfloor$ divisors respectively, and instead prove the above statement with $S_r$ and $T_r$ in place of $S$ and $T$.

The lemma is then the special case with $r = \frac{1}{2}$.

This example stuck with me. How could it be easier to prove something more general? Though I understand the concept better today, it still surprises me.
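
For small $k$ one can verify the lemma by brute force (an illustrative Python sketch; the helper names are mine, and for larger $k$ one would use a bipartite-matching algorithm instead of trying all bijections):

```python
import itertools
from math import prod

def divisors_of_squarefree(primes):
    # all products of subsets of the given distinct primes, sorted
    return sorted(prod(c) for r in range(len(primes) + 1)
                  for c in itertools.combinations(primes, r))

def has_divisibility_bijection(primes):
    divs = divisors_of_squarefree(primes)
    half = len(divs) // 2
    S, T = divs[:half], divs[half:]
    # try all bijections S -> T; fine for k <= 4 (8! = 40320 permutations)
    return any(all(d % s == 0 for s, d in zip(S, perm))
               for perm in itertools.permutations(T))

print(has_divisibility_bijection([2, 3, 5]))      # True (k = 3)
print(has_divisibility_bijection([2, 3, 5, 7]))   # True (k = 4)
```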

By the way, the generalization may be stated as follows: let $A$ be any collection of subsets of $\{1,2,\dots,n\}$ s.t. if $U\in A$ and $V\subset U$, then $V\in A$. Then there exists a bijection $\pi:A\rightarrow A$ such that $V\cap \pi(V)=\emptyset$ for any $V\in A$.
–
Fedor Petrov, Nov 20 '10 at 6:55

A very simple example of this phenomenon is also given by the following problem: prove that the maximum determinant of $n\times n$ matrices that have entries in the set $\{-1,+1\}$ is divisible by $2^{n-1}$. In fact, it is much easier to prove by induction that all matrices with coefficients in $\{-1,+1\}$ have determinant divisible by $2^{n-1}$, because you can use induction and just say that changing a $-1$ to a $+1$ will change the determinant by an amount $2 \cdot 2^{n-2} k$, by row expansion, and you can do so until you get the matrix with all entries $1$, which has determinant $0$ when $n>1$. Note that it is not clear how you could use induction to prove the weaker statement about the maximum determinant.

Actually when a statement can be proved by induction it happens quite often that the correct statement that "makes induction work" is a somewhat generalized version of the result to be proved.
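
For small $n$ the stronger statement is easy to confirm by brute force (a quick sketch of my own, checking all $2^9$ sign matrices for $n=3$):

```python
import itertools

import numpy as np

# Every 3 x 3 matrix with entries in {-1, +1} has determinant divisible
# by 2^{3-1} = 4.
n = 3
for entries in itertools.product([-1, 1], repeat=n * n):
    M = np.array(entries).reshape(n, n)
    d = round(np.linalg.det(M))      # determinant is an integer; round the float
    assert d % 2 ** (n - 1) == 0
```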

I think this leads to an alternative question of finding "simpler" proofs, where in this case "simpler" may mean "not using an induction principle". The proof involving multiplication by $-1$ on columns to get a matrix with one row all ones, and then subtracting this row from the others to get an $(n-1)\times(n-1)$ minor which is a $\{0,-2\}$ matrix, seems to me simpler in this respect, and gives more information. I see a study of proof refinement based on several metrics of "to refine" that could start with your example. Gerhard "Ask Me About System Design" Paseman, 2010.09.26
–
Gerhard Paseman, Sep 26 '10 at 18:55

I am particularly interested in the example you cite. It was last Sunday that I began studying the collection of all matrices with such entries. I started enumerating all matrices of size $2\times 2$ and as I progressed to the matrices of size $3\times 3$, there were too many. Before I study them from scratch, is there any book you can point me to with regard to such matrices? One thing I noticed was that "the product of all the matrices of even size gives the zero matrix". What do you think? Thanks.
–
Unknown, Sep 28 '10 at 17:18

@Gerhard: interesting comment, I didn't know your proof, that is even more synthetic. @Elohemahab: enumeration of such matrices is something that, if required, should be performed in an automated way, since for $3\times 3$ matrices there are already $2^9 = 512$ of them. I don't know of any book, but they have some interest because of the Hadamard conjecture; you could have a look at the references on the page en.wikipedia.org/wiki/Hadamard_matrix (Hadamard matrices are the $\{1,-1\}$ matrices that maximize the absolute value of the determinant). What do you mean by product, given that it is not commutative?
–
Maurizio Monge, Sep 28 '10 at 19:15

The Bohr-Mollerup theorem characterizes $\log\Gamma$ as the unique convex solution of its functional equation; it generalizes as follows. If $\delta:[0,\infty)\to\mathbb{R}$ is such that $\delta^{(n)}$ decreases monotonically to zero, then there is a unique $f$ such that $f(x+1) = f(x) + \delta(x)$, $f(0) = 0$, and $f^{(n)}$ is monotonically increasing.

(To recover the original statement, take $n=1$ and $\delta(x) = \log(x+1)$.)

The benefit of this formulation is that it can be proved by induction on $n$. The inductive step is pretty routine and involves applying the $(n-1)$-case to $f'$ and $\delta'$. The basis case for $n=0$ still requires a nontrivial argument, but to me it feels much simpler and more intuitive than the case $n=1$ (for the original theorem), and it's actually easy to remember (in contrast to the direct proof of the Bohr-Mollerup theorem, which I find hard to remember). All you have to do is apply the functional equation $k$ times to get $f(x+k)-f(k) = f(x) + \sum_{j=0}^{k-1}[\delta(x+j)-\delta(j)]$ and take the limit as $k$ approaches infinity; you end up with an increasing function of $x$ that is zero whenever $x$ is an integer and is thus zero for all $x$. Thus $f(x) = \sum_{j=0}^{\infty}[\delta(j) - \delta(x+j)]$ is the unique solution. (Since $\delta$ is decreasing, we do have an increasing function, and the sum converges since $\delta(x)\to 0$ as $x\to +\infty$.)
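
One can watch the base-case formula work numerically. A sketch of my own: I take $\delta(x) = 1/(x+1)$, for which the series happens to sum to $\psi(x+1) + \gamma$ (digamma plus the Euler-Mascheroni constant), though the code below only checks the properties stated above:

```python
import numpy as np

# Base case n = 0: with delta decreasing to 0, the series
#   f(x) = sum_{j >= 0} [delta(j) - delta(x + j)]
# solves f(x+1) = f(x) + delta(x) with f(0) = 0 and f increasing.
delta = lambda x: 1.0 / (x + 1.0)
j = np.arange(10**6)                 # truncate the series at 10^6 terms

def f(x):
    return float(np.sum(delta(j) - delta(x + j)))

xs = np.linspace(0.0, 3.0, 31)
vals = [f(x) for x in xs]
assert all(a < b for a, b in zip(vals, vals[1:]))        # f is increasing
assert abs(f(0.0)) < 1e-12                               # f(0) = 0
for x in (0.3, 1.7, 2.5):
    assert abs(f(x + 1) - f(x) - delta(x)) < 1e-4        # functional equation
```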

Perhaps the example of the calculation of fundamental groups works in this situation. Calculating, for instance, the fundamental group of the circle uses hard ideas from complex analysis (at least in many of the traditional approaches), but if one generalises to calculating the fundamental groupoid of the circle based at two points, and one uses the groupoid van Kampen theorem (not the group vKT), then the proof reduces to quite simple algebra. (The importance of that is really that it links the original complex analytic machinery into the geometric machinery in a neat way, helping one to see the 'unity' or close symbiotic relationship of the two areas, which of course have a common heritage in Poincaré.)

Don't you then want to prove that any path in U(1) lifts uniquely (given the starting point) to a path in iR? It's not hugely hard, but it's the irreducible difficulty I was referring to when asking whether the groupoid approach could avoid it.
–
gowers, Sep 26 '10 at 12:07

Proving the groupoid vKT is not so different from proving the group vKT. As I see it, there are two main ways of computing a fundamental group: (1) Use vKT, writing a space as union of spaces with already-known fund gps, or (2) find a simply-connected covering space and use the tie between covering spaces of X and fund gp of X. For example, these can give a student two complementary insights into the fund gp of a Klein bottle. For the circle, method (1) is unavailable; but it becomes available when you pass to groupoids.
–
Tom Goodwillie, Sep 26 '10 at 12:43

I'm not sure that I agree with this example, because the simplicity comes at the expense of extra baggage (groupoids instead of groups). I wish I had a better example (i.e. one I can agree with) of the fundamental groupoid making things genuinely easier.
–
Daniel Moskovich, Sep 26 '10 at 15:33

The vKT for groupoids is substantially more natural than the vKT for groups.
–
Harry Gindi, Sep 26 '10 at 16:03

I would disagree that groupoids are 'extra baggage' as they are extremely useful in proving results in group theory without resort to topological concepts. For instance that a subgroup of a free group is free and related results can be done just with groupoids, without the very complicated machinery originally used and without explicit use of covering spaces etc. Perhaps you could formulate your wish as a question in some way and see what the replies throw up. I also suggest looking at Ronnie Brown's Topology and Groupoids book as he has some examples that might be of interest.
–
Tim Porter, Sep 26 '10 at 18:54

It may be a matter of taste, or of being used to certain ways of thinking, but I find that the proof that the opposite of a category of finitary algebras isn't itself a category of finitary algebras is somewhat involved, compared to the proof of the statement that the opposite of a locally presentable category is not again locally presentable (if it's not a poset) - the latter has both a weaker hypothesis and a stronger conclusion.

(the original statement follows because a category of finitary algebras is locally finitely presentable, see here)

Here's an example in planar euclidean geometry. Consider an equilateral triangle of side $a$ and a general point in the plane distant $b$, $c$, and $d$ from the respective vertices. Then

$3(a^4 + b^4 + c^4 + d^4) = (a^2 + b^2 + c^2 + d^2)^2$.

This is an awful slog to get by planar trigonometry. Even harder to do by trig in three dimensions is the corresponding result for the regular tetrahedron. However, it's easy to get the $(n - 1)$-dimensional result for a regular $(n - 1)$-dimensional simplex of side $d_0$, with vertex distances $d_1, \ldots, d_n$:

$n(d_0^4 + ... + d_n^4) = (d_0^2 + ... + d_n^2)^2$.

You can do this by embedding the euclidean $(n - 1)$-dimensional space as the hyperplane of points $(x_1 ,..., x_n)$ in euclidean $n$-space such that $x_1 + ... +x_n = d_0/\sqrt2$. The vertices of the simplex can then be represented as the points $(d_0/\sqrt2)(1, 0 ,..., 0)$, ... , $(d_0/\sqrt2)(0 ,..., 0, 1)$ in the hyperplane, and the result drops out in a few lines.
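
For the write-up, a quick numerical confirmation of the general identity (my own Python sketch, following the embedding just described):

```python
import numpy as np

# Vertices of a regular (n-1)-simplex of side d0 at (d0/sqrt(2)) * e_i,
# and a random point p in the hyperplane x_1 + ... + x_n = d0/sqrt(2).
rng = np.random.default_rng(1)
n, d0 = 5, 2.0
c = d0 / np.sqrt(2)
verts = c * np.eye(n)
p = rng.normal(size=n)
p += (c - p.sum()) / n                        # project p onto the hyperplane
d = np.array([d0] + [np.linalg.norm(p - v) for v in verts])
assert np.isclose(n * np.sum(d**4), np.sum(d**2)**2)
print(n * np.sum(d**4), np.sum(d**2)**2)      # equal up to rounding
```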

It seems people have kept on posting answers here rather than in the older thread, so I'll put this one here as well.

I remember thinking about the phenomenon you described when I first came across the tensor power trick. I can't summarise the idea any better than the quick description given in that link; the point relevant here is that, even though one might want to prove something for a single object or a single family of objects, if one can prove it for a family that includes the one of interest and that is also closed under taking 'tensor products', then that might be easier.

Here is a quick example from the book by Tao and Vu. If $A$ and $B$ are finite non-empty subsets of an abelian group $G$, then a natural argument gives the sumset inequality

$$ |2B-2B| \leq 16 \frac{|A+B|^4 |A-A|}{|A|^4}.$$

Now, it is possible to get rid of the factor of 16, but if we had only proved the inequality for $G = \mathbf{Z}/p\mathbf{Z}$, say, then we might very well have trouble doing so. Given the more general statement, one can get rid of the factor by applying the inequality to the group $G\oplus G \oplus \cdots \oplus G$ (say with $M$ copies of $G$) with sets $A \oplus \cdots \oplus A$ and $B \oplus \cdots \oplus B$ and taking $M$th roots.
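In symbols, the trick is one line (spelling it out for the eventual article): all the cardinalities multiply when passing to the $M$-fold direct sum, so the inequality applied there reads
$$ |2B-2B|^M \leq 16 \, \frac{|A+B|^{4M} |A-A|^M}{|A|^{4M}}, $$
and taking $M$th roots and letting $M \to \infty$ kills the constant:
$$ |2B-2B| \leq 16^{1/M} \, \frac{|A+B|^{4} |A-A|}{|A|^{4}} \longrightarrow \frac{|A+B|^{4} |A-A|}{|A|^{4}}. $$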

Another scenario that springs to mind is when one wants to prove a statement involving several instances of a single object $X$: it can sometimes be easier to prove the statement if one replaces some of the instances by possibly distinct objects $X_i$. For example, it seems to be easier to prove the Cauchy-Davenport inequality $|A+B| \geq \min(|A|+|B|-1,p)$ for sets $A,B \subset \mathbf{Z}/p\mathbf{Z}$ rather than its corollary $|A+A| \geq \min(2|A|-1,p)$, since one can induct on the size of $B$ (say) separately from the size of $A$. (For the particular induction proof I have in mind I guess this can be seen as an example of the 'strengthening the induction hypothesis' idea as well.)

Quantum topology is much easier for knotted tori in $\mathbb{R}^4$ than for knots in $\mathbb{R}^3$. The former is a generalization of the latter, because for any knot you can take the boundary of a tubular neighbourhood of the inclusion of the knot into $\mathbb{R}^4$, which is a knotted torus.
The reason that the more general problem is easier is that the projectivization (homomorphic expansion) of the space of knotted tori in $\mathbb{R}^4$ gives rise to a space of diagrams $\mathcal{A}$ containing oriented chords. The homomorphic expansion of the space of knots, on the other hand, gives rise to a space of diagrams in which the chords are not oriented. Oriented, based trees are much simpler combinatorial objects than unoriented, unbased trees. In particular, the Drinfeld associator, which is the most painful aspect of quantum topology of knots, vanishes in $\mathcal{A}$.
The upshot of the generalization is that the universal finite-type invariant for knotted tori in $\mathbb{R}^4$ is the Alexander polynomial, which is a homological invariant, and which is immeasurably simpler than the universal finite-type invariant for knots, the Kontsevich invariant.
In fact, a further generalization, allowing "trivalent vertices" in the knotted tori (where two tubes fuse into one) simplifies the algebra yet further and allows the proof of theorems relating the value of the Alexander polynomial of such an object to its cablings. Again, this is motivated by the algebra: we expand the class of topological objects under consideration in order to create an associated graded space which looks as much as possible like a quantized Lie (bi)algebra, from where our invariants are going to come, and which we are supposed to know how to handle.
Dror Bar-Natan discussed this, and related ideas, in a series of talks in Montpellier.

Added: This doesn't obviously solve a problem. It's a non-obvious example of a problem being easier for a more general class of objects; and you can hope to use insights gained from the easier problem to attack the harder, less general problem.

Bruce Schneier has an online paper called "A Self-Study Course in Block-Cipher Cryptanalysis": http://www.schneier.com/paper-self-study.pdf containing an extensive list of algorithms to cryptanalyze as exercises. By far the easiest exercise is this one:

[Cryptanalyze] a generic cipher that is "closed" (i.e., encrypting with key A and then key B is the same as encrypting with key C, for all keys).

The solution to this exercise would be a lot less obvious had Schneier instead pointed to some particular block cipher that has this property. But because the reader is told nothing about the cipher except that it is closed, he immediately knows exactly what to attack.
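
To see why closure is fatal, here is a toy sketch, entirely my own illustration: the cipher is plain XOR, which is trivially closed since encrypting with $A$ and then $B$ equals encrypting with $A \oplus B$. The same meet-in-the-middle/birthday idea applies to any closed cipher, finding an equivalent key in roughly $2^{k/2}$ work instead of $2^k$:

```python
import os

# Collide E_a(P) against D_b(C): a collision means E_b(E_a(P)) = C, and by
# closure "a then b" is itself a single key equivalent to the secret one.
enc = lambda key, x: x ^ key
dec = lambda key, x: x ^ key
compose = lambda a, b: a ^ b      # the key c with E_c = E_b . E_a (XOR case)

secret = int.from_bytes(os.urandom(2), 'big')    # 16-bit key
P = 0x1234
C = enc(secret, P)

forward = {}
while True:
    a = int.from_bytes(os.urandom(2), 'big')
    forward[enc(a, P)] = a
    b = int.from_bytes(os.urandom(2), 'big')
    if dec(b, C) in forward:      # birthday collision after ~2^8 tries
        equivalent = compose(forward[dec(b, C)], b)
        break

assert enc(equivalent, P) == C    # an equivalent key for this pair
print(hex(equivalent), hex(secret))
```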

The family of examples that leaps to my mind is the sub-trick of "strengthening the inductive hypothesis," which I thought I wrote a Tricki page on but now see I abandoned after doing epsilon of editing. (I may still have a rough draft, or at least notes, somewhere; I'll try to dig it up in the next week or so.) My all-time favorite example of this is Carsten Thomassen's proof that planar graphs are 5-choosable, which in fact proves the following:

Let $G$ be a planar graph whose interior is triangulated; let $v_1, v_2$ be adjacent vertices lying on the infinite face of $G$; and let $\{L_v\}$ be a family of lists associated to the vertices $v \in V(G)$, $v \neq v_1, v_2$ such that $|L_v| = 3$ if $v$ lies on the infinite face, and $|L_v| = 5$ otherwise. Furthermore, fix the colors of $v_1, v_2$ (ensuring that they are not colored the same). Then $G$ is $L$-choosable.

The above is proved by a fairly straightforward induction; the 5-choosability theorem follows immediately as a corollary.

The free cocompletion. A lot of adjoint pairs of functors are just particular cases of this construction, and often it is easier to use the general theorem than to work out a particular case by hand (for example for $i_{!}$ and $i^{!}$).

Both methods are obviously useful, and I may have underappreciated one of them, but on the relative value of generalization versus specialization in problem solving, I offer the following opinion from Hilbert:

"If we do not succeed in solving a mathematical problem, the reason frequently consists in our failure to recognize the more general standpoint from which the problem before us appears only as a single link in a chain of related problems. After finding this standpoint, not only is this problem frequently more accessible to our investigation, but at the same time we come into possession of a method which is applicable also to related problems. The introduction of complex paths of integration by Cauchy and of the notion of the ideals in number theory by Kummer may serve as examples. This way for finding general methods is certainly the most practicable and the most certain; for he who seeks for methods without having a definite problem in mind seeks for the most part in vain.

"In dealing with mathematical problems, specialization plays, as I believe, a still more important part than generalization. Perhaps in most cases where we seek in vain the answer to a question, the cause of the failure lies in the fact that problems simpler and easier than the one in hand have been either not at all or incompletely solved. All depends, then, on finding out these easier problems, and on solving them by means of devices as perfect as possible and of concepts capable of generalization. This rule is one of the most important levers for overcoming mathematical difficulties and it seems to me that it is used almost always, though perhaps unconsciously."

In the case of proofs by induction, the reason it may be easier to prove a stronger result can be simply that one can use a stronger induction hypothesis.

(I think of the example of proving Łoś's theorem in model theory. It says something about formulas that may have free variables, and it's proved by induction on the formation of first-order formulas. Imagine trying to prove it only in the case of sentences, without free variables: a weaker statement. The proof by induction doesn't work for that case.)

I've already posted an answer on this thread, but I found another example I'd like to describe separately. Let $r > 0$ and consider the following problem, coming from compound interest or as one definition of $e^r$:

Show that $f(n) = (1 + \frac{r}{n})^n$ increases with $n$.

One "generalize the problem" strategy is to allow $n$ to be a continuous variable (probably this trick could have its own article). Now, see if you can prove that $f(n)$ still increases. If you take this mindset, it's natural to use the definition of $n$th power for $n \in {\mathbb R}$ and write

$f(n) = e^{n \log(1 + \frac{r}{n})}$

And the problem has reduced to showing that
$x \log(1 + \frac{r}{x}) = \int_0^1 \frac{r}{(1 + \frac{sr}{x})} ds$
increases with $x$, which it clearly does. (Here we've used the integral definition of the logarithm, but written in a way typically helpful for analyzing such products.)
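
A quick numerical look at the generalized, continuous problem (an illustrative sketch of my own; $r = 0.7$ is arbitrary):

```python
import numpy as np

# g(x) = x*log(1 + r/x) is increasing in x, so (1 + r/n)^n = exp(g(n))
# increases along the integers in particular.  The integral form above makes
# this clear: the integrand r/(1 + s*r/x) increases with x for every s.
r = 0.7
x = np.linspace(0.5, 50, 1000)
g = x * np.log1p(r / x)
assert np.all(np.diff(g) > 0)      # numerically increasing toward the limit r
print(g[0], "->", g[-1], "->", r)
```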

Another problem that can be solved through allowing a discrete parameter to be continuous is to prove Stirling's approximation for $n!$ (although to make that proof very clean you can also use other labor saving tricks like Taylor expansion by integration by parts and the dominated convergence theorem).

If you ran into this problem from compound interest, or you were hoping for something more elementary which did not use such a heavy understanding of the exponential function, then you probably want to find a different proof. But finding a different proof still seems to require "generalizing the problem", but in a different way.

Another proof goes as follows. Imagine that interest at a rate $r$ works like this: once an amount of money is invested, the value of each unit after a time $t$ is given by $(1 + tr)$. That is, the value of the money grows linearly. Now imagine you had the opportunity to withdraw and immediately reinvest your money at a time of your choice. Having this ability would allow you to raise more money, because it would allow you to accrue interest on the interest you've already earned (hence the name "compound interest"). With this interpretation, the number $(1 + \frac{r}{n})^n$ is the value of each unit of money after time $1$ and $n$ regularly spaced compoundings.

The proof now goes as follows: if you had a choice of when these compoundings would occur, then the more compoundings the better, and the best way to allocate $n$ compoundings is to have them occur at $n$ regularly spaced time intervals. That is, we interpret
$(1 + \frac{r}{n})^n = \max \prod_{i=1}^n (1 + a_i r)$
under the constraint that $0 \leq a_i \leq 1$ with $\sum a_i = 1$.

For example, it is better to have one compounding than to have none at all, because after withdrawing and reinvesting the money, now not only does the initial investment grow linearly, but also the interest you earned before the withdrawal grows linearly. For the same reason, given $a_1, \ldots, a_n$, the opportunity to compound once more during, say, $0 < t < a_1$, would allow you to increase the amount of money at all later times.

The fact that the best choice of $(a_1, \ldots, a_n)$ is to have $a_1 = a_2 = \ldots = a_n = \frac{1}{n}$ is the principle that the largest product you can obtain when the sum of positive numbers is fixed is to have all the terms equal. This is easy to check with two variables: you can either find the largest rectangle to fit inside an isosceles triangle, or otherwise just note that if $a_1 \neq a_2$, then changing to $a_1' = \frac{(a_1 + a_2)}{2} = a_2'$ gives an improvement for $(1 + a_1 r)(1+ a_2 r) < (1 + a_1' r) (1 + a_2' r)$. The case of $n$ variables actually follows from this observation.
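
One can also sanity-check the max-product interpretation numerically (a sketch of my own; random partitions of $1$ never beat the even split $a_i = \frac{1}{n}$):

```python
import numpy as np

# Compare prod(1 + a_i * r) over random points (a_1, ..., a_n) of the simplex
# {a_i >= 0, sum a_i = 1} against the even partition a_i = 1/n.
rng = np.random.default_rng(0)
r, n = 0.7, 5
even = (1 + r / n) ** n
for _ in range(10_000):
    a = rng.dirichlet(np.ones(n))           # uniform-ish point of the simplex
    assert np.prod(1 + a * r) <= even + 1e-12
print(even)
```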

So if you really wanted some elementary solution to the problem, this one would do. It's an interesting example because you can see that either solution involves some kind of generalization, but the two generalizations are unrelated to each other. The first one does not need to / is unable to consider these non-even partitions. The second does not need to / is unable to consider fractional $n$.

By the way, does anyone know how to prove in an elementary way (i.e. expanding) that $\prod_{i=1}^n (1 + a_ir)$ tends to $e^r = \sum \frac{r^k}{k!}$ as $\max |a_i| \to 0$ with $0 \leq a_i \leq 1$ and $\sum a_i = 1$? An easy solution goes by writing the product with the exponential function so that you get the exponential of $\sum \log(1 + a_i r) = \sum \int_0^1 \frac{a_i r}{(1 + s a_i r)} ds$.

You can then integrate by parts (i.e. Taylor expand) to obtain
$\sum a_i r - \sum \int_0^1 (1-s) \frac{(a_i r)^2}{(1 + s a_i r)^2} ds$. Now, $\sum a_i r = r$ is the main term. After you take $\max |a_i|$ to be less than $.5 / |r|$, the error term is bounded in absolute value by $C \sum (a_i r)^2 \leq \max \{ |a_i| \} \cdot \sum a_i |r|^2$. I can, of course, move this question to a different thread.

EDIT: I realized later on that there is a completely elementary proof, and it is also completely obvious even though I didn't think of it. Namely, you expand $(1 + \frac{r}{n})^n$ into powers of $r$, and it is easy to see after a little algebra that each coefficient increases with $n$. I still find the other solutions interesting, but this turns out not to be a good demonstration of how generalizing can make a problem easier. By the way, the last question I had asked was answered in this thread.

The LLL algorithm to factor polynomials with integer coefficients. Previously people had been fussing with Hensel lifting and tons of other methods that (imo) were far too complicated. (For a good reference on LLL and factoring polynomials, also see Yap's excellent book and his chapter on lattice reduction.)

LLL solved the more general problem of finding short (or "short enough") vectors in integer lattices in higher-dimensional spaces. This was then used to encode the problem of factoring polynomials with integer coefficients. As an added bonus, the lattice reduction techniques presented also solved the simultaneous Diophantine approximation problem, but that somehow doesn't seem as striking as integer polynomial factorization.
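
As a toy illustration of the short-vector problem (not LLL itself, but its two-dimensional ancestor): Lagrange-Gauss reduction finds a shortest nonzero vector in a rank-2 lattice, and LLL can be viewed, roughly, as its higher-dimensional, approximate analogue. A minimal Python sketch of my own (the starting basis is arbitrary):

```python
def gauss_reduce(u, v):
    # Lagrange-Gauss reduction of a 2D lattice basis: repeatedly subtract the
    # nearest-integer multiple of the shorter vector from the longer one.
    dot = lambda a, b: a[0]*b[0] + a[1]*b[1]
    if dot(u, u) > dot(v, v):
        u, v = v, u
    while True:
        m = round(dot(u, v) / dot(u, u))     # integer Gram-Schmidt coefficient
        v = (v[0] - m*u[0], v[1] - m*u[1])
        if dot(v, v) >= dot(u, u):
            return u, v                      # u is a shortest lattice vector
        u, v = v, u

print(gauss_reduce((31, 59), (37, 70)))      # -> ((3, -1), (1, 4)), same lattice
```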

Semialgebraic sets are very nice: they are closed under Boolean operations (obvious) and projections (not so obvious, but an old result of Tarski and Seidenberg). Semianalytic sets are not so nice, because they're not closed under projections.

What is one to do if one wants to study them nonetheless? Shift the focus to projections of semianalytics instead, a.k.a. subanalytic sets. Those sets are closed under projection by construction, but all of a sudden, the Boolean algebra property is not so clear. But that's where Gabrielov's theorem of the complement comes in: the complement of a subanalytic set is again subanalytic. We now have a nice structure in which reside all the natural geometric operations we may want to do.