First, let me make it clear that I do not mean jokes of the
"abelian grape" variety. I take my cue from the following
passage in A Mathematician's Miscellany by J.E. Littlewood
(Methuen 1953, p. 79):

I remembered the Euler formula $\sum n^{-s}=\prod (1-p^{-s})^{-1}$;
it was introduced to us at school, as a joke (rightly enough, and
in excellent taste).

Without trying to define Littlewood's concept of joke, I venture
to guess that another in the same category is the formula

$1+2+3+4+\cdots=-1/12$,

which Ramanujan sent to Hardy in 1913, admitting "If I tell you this
you will at once point out to me the lunatic asylum as my goal."

Moving beyond zeta function jokes, I would suggest that the empty
set in ZF set theory is another joke in excellent taste. Not only
does ZF take the empty set seriously, it uses it to build the whole
universe of sets.

Is there an interesting concept lurking here -- a class of
mathematical ideas that look like jokes to the outsider, but which
turn out to be important? If so, let me know which ones appeal to
you.


In regard to the empty set being a joke, Frank Harary and Ronald Read wrote a 1974 paper entitled "Is the null graph a pointless concept?".
– Richard Stanley, Sep 15 '10 at 22:29


It's amazing to see how many such jokes involve geometric series.
– Thierry Zell, Sep 16 '10 at 12:16


I just noticed that there are two puns in "abelian grape variety," as variety can associate with abelian or with grape. Too bad this is the kind of joke you don't mean.
– Gerry Myerson, Sep 26 '10 at 6:34


The two most recent answers have been more-or-less duplicates of previous answers. Time to close?
– Gerry Myerson, Oct 3 '10 at 12:25

I was going to give this example! It is from a classic book by W. W. Sawyer, Prelude to Mathematics. What makes this example interesting is that Sawyer described precisely the phenomenon we are discussing here: to a mathematics student it appears to be a joke, but in fact it is a standard technique for solving integral equations. Just to appreciate how good a joke it is: I quoted the passage in Sawyer's book from memory, having read it over 20 years ago!
– Victor Protsak, Sep 16 '10 at 3:07

@Victor. I read some of those W.W. Sawyer books too, and they may have planted the seed of this question in my mind. Alas, it was 50+ years ago, and I no longer remember what I read in them.
– John Stillwell, Sep 16 '10 at 19:20

@Aaron: This equation holds in the Grothendieck ring of varieties (sometimes called the "ring of motives of varieties" or similar). Whenever you have a Zariski locally trivial fibration $X \to Y$ with every fiber isomorphic to $F$, then $[X] = [F] \cdot [Y]$ in the Grothendieck ring.
– Arend Bayer, Sep 17 '10 at 14:57

@Vivek: that doesn't mean the equation doesn't have content, e.g. it is meaningful to know that in the Grothendieck ring the terms don't all collapse to zero. (At least, I assume that they don't.)
– Qiaochu Yuan, Sep 28 '10 at 16:51

There are some funny proofs of this, too. Here's one that works over any field: First notice that the theorem holds for diagonalizable matrices. Then adjoin $n^2$ indeterminates to our field and take the algebraic closure. But the $n \times n$ matrix whose entries are those indeterminates is now diagonalizable! Thus we've proved the Cayley-Hamilton theorem as a polynomial identity over our original field.
– Gene S. Kopp, Sep 25 '10 at 17:19
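The identity being proved can at least be spot-checked numerically; here is a minimal sketch for the $2\times 2$ case, where the characteristic polynomial is $\lambda^2-\operatorname{tr}(A)\lambda+\det(A)$ (a sanity check of the statement, not of the generic-matrix argument itself).

```python
import random

# Spot check of the Cayley-Hamilton identity for 2x2 matrices:
#   A^2 - tr(A)*A + det(A)*I = 0.

def mat_mul(A, B):
    return [[sum(A[i][k] * B[k][j] for k in range(2)) for j in range(2)]
            for i in range(2)]

def ch_residual(A):
    tr = A[0][0] + A[1][1]
    det = A[0][0] * A[1][1] - A[0][1] * A[1][0]
    A2 = mat_mul(A, A)
    I = [[1.0, 0.0], [0.0, 1.0]]
    return [[A2[i][j] - tr * A[i][j] + det * I[i][j] for j in range(2)]
            for i in range(2)]

random.seed(0)
A = [[random.uniform(-5.0, 5.0) for _ in range(2)] for _ in range(2)]
max_entry = max(abs(e) for row in ch_residual(A) for e in row)
```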

In the same vein as the "Freshman's dream"
$$(a + b)^p = a^p + b^p,$$
which is true in characteristic $p$, there is also the "Sophomore's dream", which is the identity
$$\int_{0}^{1}{x^{-x} \: dx} = \sum_{n = 1}^{\infty}{n^{-n}}.$$
Surprisingly enough, this identity is actually correct.
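Both sides of the Sophomore's dream can be approximated numerically; a quick sketch (midpoint rule for the integral, a short partial sum for the rapidly converging series):

```python
# Numeric check of the Sophomore's dream:
#   integral_0^1 x^(-x) dx = sum_{n>=1} n^(-n)

N = 200_000
h = 1.0 / N
# midpoint rule; x^(-x) extends continuously to 1 at x = 0
integral = h * sum(((k + 0.5) * h) ** (-(k + 0.5) * h) for k in range(N))

# the series converges extremely fast; 30 terms are far more than enough
series = sum(n ** -n for n in range(1, 30))
```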

Nice! This joke has a long pedigree, going back to a formula of Johann Bernoulli (1697): $\int^{1}_0 x^x\,dx=\sum^{\infty}_{n=1}(-1)^{n+1} n^{-n}$. But it makes a much better joke with the sign change.
– John Stillwell, Sep 17 '10 at 14:48

If you think of products of a's and b's as regular expressions, with $(1-x)^{-1}$ playing the role of $x^{\ast}$, the result is fairly obvious. This type of reasoning is, to some extent, captured by Kleene algebras.
– Dan Piponi, Oct 4 '10 at 17:31

Tim Gowers mentioned infinities that may sound like jokes, especially to outsiders. Here is one specific example: you are standing in a room; at every tick of the clock, someone throws in a pair of numbered ping-pong balls: 1 & 2, then 3 & 4, etc., and you only have enough time to throw out one of them before the next tick. If you always throw out the ball with the largest number, then after $\omega$ ticks of the clock you are in the room with all the odd-numbered balls, whereas if you always throw out the ball with the smallest number, you are rid of them all!

And what if the balls are not numbered? A good way to get non-mathematicians thinking about infinity.
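For finitely many ticks the two strategies can be simulated directly; a small sketch (the function names are invented for the illustration) shows the two trends: discarding the largest ball leaves all the odds, while discarding the smallest makes every fixed ball eventually disappear.

```python
# Finite simulation of the ping-pong-ball supertask. At tick t, balls
# 2t-1 and 2t arrive and one ball in the room is discarded. No finite
# run reaches omega, but the trend is visible.

def simulate(ticks, choose):
    room = set()
    for t in range(1, ticks + 1):
        room.update({2 * t - 1, 2 * t})
        room.remove(choose(room))  # discard the ball selected by the strategy
    return room

largest = simulate(1000, max)   # odds survive forever
smallest = simulate(1000, min)  # the smallest survivor marches off to infinity
```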

A similar example is The Gnome and the Pearl of Wisdom: A Fable by Richard Willmott (animations at komal.hu/cikkek/egyeb/torpe/torpe.h.shtml). This is about a sequence of numbered boxes and numbered marbles. The boxes start out empty; then, in step $t$, the gnome puts marble number $t$ into box $0$ and resolves the conflict of two marbles being in the same box by repeatedly moving the marble with the higher number (in the first variation; the lower number in the second) to the next box. The question is where the marbles are after $\omega$ steps.
– Zsbán Ambrus, Sep 16 '10 at 13:53

As Torkel Franzén pointed out in his wonderful book Gödel's Theorem: An Incomplete Guide to Its Use and Abuse, if you harbored serious doubts about the consistency of your axioms, why would you seek a consistency proof in that same setting?
– Thierry Zell, Sep 16 '10 at 16:33


But would a naive person be upset to find a proof of consistency in a supposedly rock-solid system? Probably not---but the joke here is that, nevertheless, they should be upset, as it reveals inconsistency.
– Joel David Hamkins, Sep 16 '10 at 21:55

Mazur's proof that knots do not have inverses under addition of knots:
If $A+B=0$, then $$A = A + (B+A)+(B+A)+\cdots=(A+B)+(A+B)+\cdots=0.$$
This is like the traditional joke proof that $1=0$ with $A=1$, $B=-1$; the difference is that the proof with knots is valid because the infinite sums of knots are meaningful: make the knots smaller and smaller.

This is not a joke: you CAN cancel those $du$'s, if they are understood properly. Namely, if we have the graph of $f(x)$ and a tangent vector $v$ at the point $(c, f(c))$, then $dy = dy(v)$ is the projection of $v$ onto the $y$-coordinate and $dx = dx(v)$ is the projection of $v$ onto the $x$-coordinate. Their quotient $dy(v)/dx(v)$ is equal to $f'(c)$.
– mathreader, Sep 16 '10 at 9:20


mathreader: indeed you can do that, but it's still a joke in the sense this question asks.
– Zsbán Ambrus, Sep 16 '10 at 13:39


Well, if you have two differentials which are dependent in the sense that $dx \wedge dy = 0$ then there is of course some function such that $dy$ = $f dx$. What could $f$ be called but $dy / dx$? Since $\mathbb{R}$ is one-dimensional, the chain rule joke always works.
– Matt Noonan, Sep 16 '10 at 14:28

I've met this one in a presentation where I was told that the Gaussian binomial coefficient $ \genfrac{[}{]}{0pt}{}{n}{k}_q $ is the number of $k$-dimensional subspaces of an $n$-dimensional vector space over the field $\mathrm{GF}(q)$.
– Zsbán Ambrus, Sep 16 '10 at 14:13

Another example from intro calculus: I once put a question of the form "$y=f(x)^{g(x)}$, find $y'$" on an exam. One student reasoned, if the exponent were a constant, the answer would be $g(x)f(x)^{g(x)-1}f'(x)$, but that's not right; if the base were a constant, the answer would be $g'(x)f(x)^{g(x)}\log f(x)$, but that's not right either; so I'll put them together to get $g(x)f(x)^{g(x)-1}f'(x)+g'(x)f(x)^{g(x)}\log f(x)$. This joke was on me, since that turns out to be correct.
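The student's answer can be checked numerically against a central difference quotient; a quick sketch with arbitrarily chosen smooth test functions $f(x)=x^2+1$ and $g(x)=\sin x$:

```python
import math

# Check that the "two wrong rules add up to the right one" formula
#   (f^g)' = g*f^(g-1)*f' + g'*f^g*log(f)
# agrees with a numeric derivative, for sample functions f(x) = x^2 + 1 > 0
# and g(x) = sin(x).

def f(x):  return x * x + 1.0
def fp(x): return 2.0 * x          # f'
def g(x):  return math.sin(x)
def gp(x): return math.cos(x)      # g'

def student_formula(x):
    return (g(x) * f(x) ** (g(x) - 1) * fp(x)
            + gp(x) * f(x) ** g(x) * math.log(f(x)))

def central_difference(x, h=1e-6):
    return (f(x + h) ** g(x + h) - f(x - h) ** g(x - h)) / (2 * h)

err = abs(student_formula(0.7) - central_difference(0.7))
```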

In graduate school I graded one of the questions on a multiple section exam in beginning calculus. The correct answer was 4, and many many students got that, but few by any correct route. My conclusion, as the derivative is a limit process, was that the set consisting of the single number 4 is dense in the real line.
– Will Jagy, Sep 16 '10 at 0:52


Of course, this is just the multivariate chain rule $\frac{d}{dx}f(u(x),v(x)) = \frac{\partial f}{\partial u}\frac{du}{dx}+\frac{\partial f}{\partial v}\frac{dv}{dx}$. Perhaps your student was motivated by the product rule $(fg)'=f'g+fg'$, which works the same way.
– Ricky Liu, Sep 16 '10 at 17:33


It has a natural physical sense: if $x$ occurs several times in our expression, we may consider the small perturbations (from the definition of the derivative) of each occurrence of $x$ as independent and then sum up.
– Fedor Petrov, Sep 19 '10 at 20:09


I noticed this myself a long time ago (for the special case $y=x^x$), but always thought it was a cute coincidence -- I never realized that it was the multivariate chain rule!
– Harrison Brown, Oct 29 '10 at 19:02

A good joke about infinity is the following. A hotel has rooms $1,2,\dots$, and every room is full when a new guest arrives. The clerk moves the occupant of room $n$ to room $n+1$, for every $n$, to make room for the new guest in room 1. An hour later another guest arrives and the clerk repeats the process; 30 minutes later a third guest arrives and the process is repeated, then after 15 minutes, 7.5 minutes, etc., so that two hours after the first new guest infinitely many guests have arrived and been accommodated. The clerk is very pleased with himself for dealing with these infinitely many guests, when he notices to his horror that all the rooms are empty! All the guests have mysteriously disappeared!

I want to evaluate $f(x+t)$. This is a function of two variables, but let's consider it a function $F(t)$ whose value is a function of $x$, i.e., $F(t)(x) = f(x+t)$. Note that $F(0) = f$, and in general $F$ satisfies the differential equation
$$F'(t) = D_x(F(t))$$ (both sides being the function $x\mapsto f'(x+t)$). But $D_x$ is just a linear operator, so this is just a homogeneous linear ODE with constant coefficients. The solution is thus $$f(x+t) = F(t)(x) = (e^{tD_x}F(0))(x) = (e^{tD_x}f)(x)
= \sum_{n=0}^\infty \frac{((tD_x)^nf)(x)}{n!}
= \sum_{n=0}^\infty \frac{t^n f^{(n)}(x)}{n!}.$$ Voilà, Taylor series!
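For $f=\sin$ the derivatives cycle with period four, so the partial sums of $e^{tD_x}f$ can be computed directly and compared with $\sin(x+t)$; a small numeric sketch:

```python
import math

# Sum the series exp(t*D) f at a point for f = sin, whose n-th
# derivative cycles through sin, cos, -sin, -cos, and compare with
# the honest value sin(x + t).

def exp_tD_sin(x, t, terms=30):
    derivs = [math.sin(x), math.cos(x), -math.sin(x), -math.cos(x)]
    return sum(t ** n / math.factorial(n) * derivs[n % 4]
               for n in range(terms))

err = abs(exp_tD_sin(0.3, 1.1) - math.sin(0.3 + 1.1))
```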

We owe Paul Dirac two excellent mathematical jokes. I have amended them with a few lesser known variations.

A. Square root of the Laplacian: we want $\Delta$ to be $D^2$ for some first order differential operator (for example, because it is easier to solve first order partial differential equations than second order PDEs). Writing it out, with $D=\sum_{i=1}^n\gamma_i\frac{\partial}{\partial x_i}$, the requirement $D^2=\Delta$ amounts to the relations
$$\gamma_i\gamma_j+\gamma_j\gamma_i=2\delta_{ij}.$$

It remains to come up with the right $\gamma_i$'s. Dirac realized how to accomplish it with $4\times 4$ matrices when $n=4$; but a neat follow-up joke is to simply define them to be the elements $\gamma_1,\ldots,\gamma_n$ of the abstract algebra (nowadays called the Clifford algebra) generated by these relations.

Using symmetry considerations, it is easy to conclude that the commutator of the $n$-dimensional Laplace operator $\Delta$ and the multiplication by $r^2=x_1^2+\cdots+x_n^2$ is equal to $aE+b$, where
$$E=x_1\frac{\partial}{\partial x_1}+\cdots+x_n\frac{\partial}{\partial x_n}$$ is the Euler vector field. A boring way to confirm this and to determine the coefficients $a$ and $b$ is to expand $[\Delta,r^2]$ and simplify using the commutation relations between the $x$'s and the $\partial$'s. A more exciting way is to act on $x_1^\lambda$, where $\lambda$ is a formal variable:

$$[\Delta,r^2]\,x_1^\lambda=\bigl((\lambda+2)(\lambda+1)-\lambda(\lambda-1)+2(n-1)\bigr)x_1^\lambda=(4\lambda+2n)\,x_1^\lambda.$$

Since $x_1^{\lambda}$ is an eigenvector of the Euler operator $E$ with eigenvalue $\lambda$, we conclude that

$$[\Delta,r^2]=4E+2n.$$
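The identity $[\Delta,r^2]=4E+2n$ can also be verified mechanically on polynomials; here is a minimal hand-rolled sketch (a dict of exponent tuples, not a real computer algebra system) that checks it in dimension $n=3$ on an arbitrary test polynomial:

```python
from collections import defaultdict

# A polynomial is a dict {exponent-tuple: coefficient}.

n = 3  # dimension, an arbitrary choice for the test

def diff(p, i):
    out = defaultdict(int)
    for mono, c in p.items():
        if mono[i] > 0:
            m = list(mono); m[i] -= 1
            out[tuple(m)] += c * mono[i]
    return dict(out)

def add(p, q, scale=1):
    out = defaultdict(int, p)
    for mono, c in q.items():
        out[mono] += scale * c
    return {m: c for m, c in out.items() if c != 0}

def laplacian(p):
    out = {}
    for i in range(n):
        out = add(out, diff(diff(p, i), i))
    return out

def r2(p):  # multiply by x_1^2 + ... + x_n^2
    out = {}
    for i in range(n):
        shifted = {}
        for mono, c in p.items():
            m = list(mono); m[i] += 2
            shifted[tuple(m)] = shifted.get(tuple(m), 0) + c
        out = add(out, shifted)
    return out

def euler(p):  # E multiplies each monomial by its total degree
    return {m: c * sum(m) for m, c in p.items()}

p = {(2, 1, 0): 5, (0, 3, 4): -2, (1, 1, 1): 7}      # arbitrary test polynomial
lhs = add(laplacian(r2(p)), r2(laplacian(p)), scale=-1)   # [Delta, r^2] p
rhs = add({m: 4 * c for m, c in euler(p).items()},
          {m: 2 * n * c for m, c in p.items()})           # (4E + 2n) p
```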

B. Dirac delta function: if we can write

$$g(x)=\int g(y)\delta(x-y)dy$$

then instead of solving an inhomogeneous linear differential equation $Lf=g$ for each $g$, we can solve the equations $Lf=\delta(x-y)$ for each real $y$, where a linear differential operator $L$ acts on the variable $x,$ and combine the answers with different $y$ weighted by $g(y)$. Clearly, there are fewer real numbers than functions, and if $L$ has constant coefficients, using translation invariance the set of right hand sides is further reduced to just one, $\delta(x)$. In this form, the joke goes back to Laplace and Poisson.

What happens if instead of the ordinary geometric series we consider a doubly infinite one? Since

$$\sum_{n\geq 0}x^n=\frac{1}{1-x}\qquad\text{and}\qquad\sum_{n\leq -1}x^n=\frac{x^{-1}}{1-x^{-1}}=-\frac{1}{1-x},$$

this time the sum of the doubly infinite geometric series is zero! On the other hand, the two closed half-lines contribute $\frac{1}{1-x}$ and $\frac{1}{1-x^{-1}}$, which add up to $1=x^0$.
Thus the point $0\in\mathbb{Z}$ is the sum of all lattice points on the non-positive half-line and all lattice points on the non-negative half-line:

$$0=[\ldots,-2,-1,0] + [0,1,2,\ldots] $$

A vast generalization is given by Brion's formula for the generating function of the lattice points in a convex lattice polytope $\Delta\subset\mathbb{R}^N$ with vertices $v\in{\mathbb{Z}}^N$ and closed inner vertex cones $C_v\subset\mathbb{R}^N$:

$$\sum_{m\in\Delta\cap\mathbb{Z}^N}x^m=\sum_{v}\;\sum_{m\in C_v\cap\mathbb{Z}^N}x^m,$$

where each inner sum is interpreted as a rational function of $x$.

An expansion on Timothy Chow's example of Grandi's series $1 - 1 + 1 - 1 \pm \cdots = \frac{1}{2}$. It is possible to interpret the left hand side as computing the Euler characteristic of infinite real projective space $\mathbb{R}P^{\infty}$, which is a $K(\mathbb{Z}/2\mathbb{Z}, 1)$ and therefore rightfully has orbifold Euler characteristic $\frac{1}{2}$! I think I learned this example from somewhere on Wikipedia.

There are other divergent series that fit the bill, such as $1-1+1-1+ \cdots = 1/2$. Here's one from formal language theory: Suppose we define a language $L$ recursively by the rule $L = 1 | aL$, meaning that the empty string $1$ is in $L$, and the letter $a$ followed by any element in $L$ is also in $L$. Jokingly, we note that $|$ is akin to addition and concatenation is akin to multiplication, so we can solve for $L$: $1 = L - aL = L (1-a)$, so
$$L = {1\over 1-a} = 1|a|aa|aaa|aaaa \ldots,$$
which is the right answer.
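The joke can be made honest by computing $L$ as a least fixpoint of the map $L\mapsto 1\mid aL$; a small sketch, truncated at a maximum string length so the iteration terminates:

```python
# The recursion L = 1 | aL as a least fixpoint, truncated at strings of
# length <= MAX. The fixpoint is exactly {"", "a", "aa", ...}, matching
# the geometric-series "solution" 1/(1-a).

MAX = 6

def step(L):
    # one application of L -> {""} U {"a" + w : w in L}, truncated
    return {""} | {"a" + w for w in L if len(w) < MAX}

L = set()
while True:
    nxt = step(L)
    if nxt == L:
        break
    L = nxt

expected = {"a" * i for i in range(MAX + 1)}
```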

Nice example. I like the fact that the very name "umbral calculus" admits its shady nature.
– John Stillwell, Sep 15 '10 at 20:51


Dear Mariano, I think that "umbral calculus" was coined by Sylvester. Looking on MathWorld confirms this, but (according to MathWorld) the shadows being alluded to are the combinatorial identities obtained, which "shadow" more obvious polynomial or Taylor series identities.
– Emerton, Sep 16 '10 at 0:10


@Mariano: The term "umbra" (shadow) is attributed to Sylvester, and its first occurrence seems to be the following, from 1851 (in his Collected Mathematical Papers vol. 1, p.242): "Each quantity is now represented by two letters; the letters themselves, taken separately, being symbols neither of quantity nor of operation, but mere umbrae or ideal elements of quantitative symbols."
– John Stillwell, Sep 16 '10 at 0:40


Dear John, This is more appealing and romantic than the Mathworld explanation, and presumably more accurate, since you are quoting Sylvester directly. I wonder if the Mathworld explanation is just based on speculation rather than primary sources?
– Emerton, Sep 16 '10 at 1:52

The classical Stokes formula $\int_{\partial\Omega}\omega=\int_\Omega d\omega$ is certainly a Littlewood-type joke. That is especially true if you learn it after you've spent a few months covering vector calculus and have learned about curl, divergence, and path and surface integrals of two kinds, etc., which is the standard route to follow.

In a probability oral exam, a student is asked to compute the probability that a random number chosen from the interval $[0,1]$ is larger than $2/3$. The student answers $1/3$. The teacher asks him to explain his argument, and he says: well, there are three possibilities: the number is either less than, bigger than, or equal to $2/3$, so the probability is $1/3$!

I know you say "moving beyond zeta function jokes", but I'd say the following two zeta-regularizations deserve to be alongside your Ramanujan example:
$$\infty!= \sqrt{2\pi}\qquad\qquad\mbox{and }\qquad\qquad
\prod_{\mbox{$p$ prime}}p =4\pi^2.$$
One can also entertain beginning calculus students with $\frac{1}{2}!=\frac{1}{2}\sqrt{\pi}$ as a way of introducing the Gamma function.
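The last identity is just $\Gamma(3/2)=\frac{1}{2}\sqrt{\pi}$, which any standard math library will confirm:

```python
import math

# (1/2)! read as Gamma(1/2 + 1) = Gamma(3/2), which equals sqrt(pi)/2.
half_factorial = math.gamma(1.5)
```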

This is a souped-up version of the freshman's dream: as Jon Borwein pointed out to me: if $a_n=(-1)^n/(2n+1)$, then $$\left(\sum_{n=-\infty}^{\infty}a_n\right)^2=\sum_{n=-\infty}^{\infty}a_n^2$$ as they are both $\pi^2/4$.
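Both sides can be approximated by symmetric partial sums over $|n|\le N$; a quick numeric sketch (convergence is slow, so the tolerance is generous):

```python
import math

# Borwein's observation: with a_n = (-1)^n/(2n+1), both
# (sum over Z of a_n)^2 and sum over Z of a_n^2 equal pi^2/4.

N = 200_000
s = sum((-1) ** abs(n) / (2 * n + 1) for n in range(-N, N + 1))
s_sq = sum(1.0 / (2 * n + 1) ** 2 for n in range(-N, N + 1))
target = math.pi ** 2 / 4
```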

Well, there is one set of cardinality 0; one set of cardinality 1; one set of cardinality 2, but since its automorphism group has order 2, we only count it with multiplicity $1/2!$; there is one set of cardinality 3, counted with multiplicity $1/3!$; ...
So the number of sets is
$$
1 + 1 + \frac{1}{2!} + \frac{1}{3!} + \cdots = e
$$
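The groupoid count is just the partial sums of $\sum_k 1/k!$; a one-line numeric check:

```python
import math

# one isomorphism class of sets in each cardinality k, weighted by
# 1/|Aut| = 1/k!; the weighted count of all finite sets is e
count = sum(1.0 / math.factorial(k) for k in range(20))
```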

This isn't a particularly interesting example, but the existence of different sizes of infinity fits your criterion of being something that makes outsiders laugh (as I know from experience) and that is also very important to mathematicians.

The familiar argument that says that if you want an explicit example of $a^b=c$ with $a$ and $b$ irrational and $c$ rational, then one of $a=b=\sqrt{2}$ or $a=\sqrt{2}^{\sqrt{2}}$ and $b=\sqrt{2}$ will work is certainly an argument that makes people laugh. Though the result itself is not very important, the phenomenon it illustrates is quite important.

Added two minutes later: I've just had a look at Scott Aaronson's post and seen that Erik, one of the earlier commenters, chose precisely the same two examples. It was a coincidence -- honest.

Given a function $f$ on the real line, let's compute the function
$\Sigma f$, taking $n \mapsto f(1) +\ldots + f(n)$. Well, $\Sigma = 1/\Delta$,
where $\Delta$ is the differencing operator $\mathrm{shift} - 1$. And the shift
operator is the exponential of the differentiation operator (this being,
essentially, Taylor's theorem). Hence
$$ \Sigma = \frac{1}{e^D - 1} = \frac{1}{D} \frac{D}{e^D-1} $$
Using L'Hopital's rule on the latter as $D\to 0$,
whatever THAT means, we see the limit is $1$. So expand in a power series:
$$ \frac{1}{D} \frac{D}{e^D-1} = \frac{1}{D} (1 + \text{power series in $D$}) $$
The first term is $1/D$, which is of course $\int$.

No surprise:
$\Sigma = \int + $ correction terms. What the above suggests is that those
correction terms come from the Taylor expansion of $\frac{D}{e^D - 1}$.
This leads to the Euler summation formula (and eventually, to
Hirzebruch-Riemann-Roch).

I learned this from "Concrete Mathematics", where I recall this joke
being attributed to Laguerre. Part of why it is a joke is that the Euler
summation formula has an error term, that can't be neglected for most
functions, e.g. $\ln(x)$ which one wants to sum up to compute $\ln(n!)$.
It can be neglected for polynomials times exponentials.
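For a polynomial such as $f(x)=x^2$ the Taylor expansion of $D/(e^D-1)$ terminates ($B_2=1/6$, and the higher corrections vanish), so Euler summation is exact; a quick sketch:

```python
# Euler summation for f(x) = x^2: the correction series terminates, so
#   sum_{k=1}^{n} k^2 = int_0^n x^2 dx + (f(n) - f(0))/2 + (B_2/2!)(f'(n) - f'(0))
# with B_2 = 1/6 is exact.

n = 10
direct = sum(k * k for k in range(1, n + 1))               # 385
formula = n ** 3 / 3 + n ** 2 / 2 + (1 / 6) / 2 * (2 * n)  # integral + corrections
```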

This was once presented to me as a kind of proof, though I think it works better as a kind of joke:

To compute ${\partial^n\over\partial x^n}(fg)$, split ${\partial\over\partial x}$ into the sum of a piece $D$ that just acts on $f$ (acting as the identity on $g$) and a piece $E$ that just acts on $g$ (acting as the identity on $f$) and write
${\partial^n\over\partial x^n}(fg) = (D+E)^n(fg) = \sum_{i=0}^n \binom{n}{i} D^i E^{n-i}(fg) = \sum_{i=0}^n \binom{n}{i} {\partial^i f\over\partial x^i}{\partial^{n-i} g\over\partial x^{n-i}}$.
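For $f=e^{2x}$, $g=e^{3x}$ (an arbitrary choice of test functions) the joke literally becomes the binomial theorem: $(fg)^{(n)}=5^n e^{5x}$, while the Leibniz sum is $\sum_i\binom{n}{i}2^i 3^{n-i}e^{5x}$; a quick numeric check:

```python
import math

# Leibniz rule via (D+E)^n, checked on f = exp(2x), g = exp(3x):
# the n-th derivative of f*g is 5^n exp(5x), and the Leibniz sum is
# sum_i C(n,i) 2^i 3^(n-i) exp(5x) = (2+3)^n exp(5x).

n, x = 4, 0.3
leibniz = math.exp(5 * x) * sum(math.comb(n, i) * 2 ** i * 3 ** (n - i)
                                for i in range(n + 1))
direct = 5 ** n * math.exp(5 * x)
```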