It's a common theme in mathematics that, if there's no canonical choice (of basis, for example), then we shouldn't make a choice at all. This helps us focus on the heart of the matter without giving ourselves arbitrary stuff to drag around.

However, in this question, I'm looking for examples of problems solved by a specific type of "not making a choice" - namely, making all available choices, and looking at all the end results together as a whole. We can't necessarily discern any individual piece, but the average behavior, or some other information about the big picture, provides (or at least points towards) a solution.

I really wish I had an example of this phenomenon to provide, but even one escapes me at the moment, which is what spurred me to ask this question. I imagine combinatorics is full of examples; unfortunately I haven't really studied that field in any depth yet.

Something close to what I'm after is Burnside's (a.k.a. not-Burnside's) Lemma. There's no good way of directly counting orbits, i.e. choosing to look at a particular orbit one at a time, so we just look at the average number of fixed points of elements of $G$ (I'm reluctant to call this an example of the kind of result I'm looking for because I'm not entirely clear on why the fixed points of an element should be thought of as substitutes for orbits. Perhaps that's a separate question).
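For what it's worth, the equality in the lemma is at least easy to test by computer; here is a minimal Python sketch (my own toy check, with the cyclic group $C_n$ acting on $2$-colorings of a necklace) counting orbits directly and comparing with the average number of fixed points:

```python
from itertools import product

n = 6
colorings = list(product([0, 1], repeat=n))

def rotate(c, k):
    """Rotate a coloring (tuple of bead colors) by k positions."""
    return c[k:] + c[:k]

# Direct count: partition all 2-colorings of an n-bead necklace into
# orbits of the cyclic group C_n.
seen, orbits = set(), 0
for c in colorings:
    if c not in seen:
        orbits += 1
        seen.update(rotate(c, k) for k in range(n))

# Burnside: the average, over group elements, of the number of fixed colorings.
avg_fixed = sum(
    sum(1 for c in colorings if rotate(c, k) == c) for k in range(n)
) / n

print(orbits, avg_fixed)  # both are 14 for n = 6
```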

Your question seems too vague. For example, isn't the problem of computing the average of $\sin(x)$ on $[0,2\pi]$ an example of what you're looking for?
–
Ryan Budney Dec 21 '10 at 4:13

And of course, the probabilistic method is really a special kind of counting.
–
Derrick Stolee Dec 21 '10 at 4:21

I think Zev means something like the following: suppose one has an algebraic object G to which one can attach a collection of related objects G_a, but to do so one has to make an arbitrary choice of a. So instead one considers something like the direct product of all the G_a.
–
Qiaochu Yuan Dec 21 '10 at 5:08

Also, I would be interested if you asked the Burnside's lemma question. I think the lemma can be categorified and it would be quite interesting to have an opportunity to try to work this out.
–
Qiaochu Yuan Dec 21 '10 at 5:14

A neat formal definition of the positive integer $n$ is to take the set of all sets of size $n$. Hmm ... actually perhaps not.
–
gowers Dec 21 '10 at 15:36

27 Answers

Another classical example of "looking at all choices instead of one" is the idea of the fundamental groupoid of a topological space. Instead of choosing one base point and letting all loops begin and end at this point, one considers all paths between all points (modulo homotopy). This notion makes theorems like the Seifert–van Kampen theorem much more natural: one no longer has to add technical conditions that certain intersections contain the base point and are path-connected.

I would like to mention Witten's idea (marvelously explained by Freed) of obtaining topological invariants of manifolds by "simply" integrating the Chern-Simons form over all possible connections. It doesn't solve a particular problem in the sense of being a key step in some proof, but it does give rise to new mathematics.

A great example is found in the famous 1976 Annals (Vol. 103, No. 1) paper of Deligne-Lusztig, "Representations of Reductive Groups over Finite Fields".

This paper begins (in Section 1) with the following:

Suppose that in some category we are given a family $(X_i)$ ($i \in I$) of objects and a compatible system of isomorphisms $\phi_{ji}: X_i \rightarrow X_j$. This is as good as giving a single object $X$, the 'common value' or 'projective limit' of the family. This projective limit is provided with isomorphisms $\sigma_i: X \rightarrow X_i$ such that $\phi_{ji} \sigma_i = \sigma_j$.

Where this is applied by Deligne-Lusztig, and how it relates to the question, is the following: In a connected reductive algebraic group $G$ over a field $k$, there is no canonical choice of maximal torus $T$. (A maximal torus in $GL_n$ corresponds to a choice of basis vectors up to scaling, so this is very similar to the original question.) So Deligne-Lusztig consider the indexing set $I$ consisting of all pairs $(T,B)$ where $T$ is a maximal torus in $G$ and $B$ is a Borel subgroup containing $T$. For each $i = (T,B) \in I$, let $T_i = T$—the first entry in the ordered pair. For each pair $i,j \in I$, there is a unique isomorphism from $T_i$ to $T_j$ given by $\operatorname{Int}(g_{ij})$ (conjugation by $g_{ij}$) for some $g_{ij}$ satisfying $g_{ij} B_i g_{ij}^{-1} = B_j$. (Note that $g_{ij}$ and $\operatorname{Int}(g_{ij})$ are not unique, but the induced isomorphism from $T_i$ to $T_j$ is uniquely determined since $N_G(T) \cap B = T$ when $T \subset B$.)

This indexing set $I$ makes all possible choices and the extra data of the Borel subgroup allows for the definition of "THE" maximal torus $T$, the projective limit of the system $(T_i)$.

I think this provides a good example to answer the original question. It also demonstrates how it is possible to make choices with extra data (in this case Borel subgroups in addition to tori), to "rigidify" and eventually define a universal choice.
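To make the quoted construction concrete: the projective limit of a compatible system can be realized as the set of compatible families. Here is a tiny Python sketch of that realization (entirely my own toy data, nothing from the paper):

```python
from itertools import product

# Toy "compatible system" (hypothetical data): three sets and compatible
# bijections phi[(j, i)] : X[i] -> X[j].
X = {0: ['a', 'b'], 1: ['p', 'q'], 2: ['u', 'v']}
phi = {
    (0, 1): {'p': 'a', 'q': 'b'},
    (1, 2): {'u': 'p', 'v': 'q'},
    (0, 2): {'u': 'a', 'v': 'b'},  # = phi[(0,1)] composed with phi[(1,2)]
}
for i in X:                                   # add identities
    phi[(i, i)] = {x: x for x in X[i]}
for (j, i), m in list(phi.items()):           # add inverses
    phi[(i, j)] = {v: k for k, v in m.items()}

# The projective limit: families (x_0, x_1, x_2) compatible with every phi.
limit = [
    fam for fam in product(*(X[i] for i in sorted(X)))
    if all(phi[(j, i)][fam[i]] == fam[j] for i in X for j in X)
]
print(limit)  # [('a', 'p', 'u'), ('b', 'q', 'v')]: projection to any X_i is a bijection
```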

For any graph, there exists a 2-coloring for which at least half the edges have vertices of both colors.

The problem here is to choose the right coloring. The way to prove it exists is to consider all colorings. Pick an edge $e$. Then there is a bijection between colorings in which the endpoints of $e$ are colored the same and colorings in which they are colored differently. (Define the bijection by changing the color of one endpoint.) This is true for each edge separately. Therefore, in the average coloring, half of all edges are bicolored. Therefore, in at least one coloring, at least half of all edges are bicolored.
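Since this is a pure averaging statement, it can be checked by brute force; here is a quick Python sketch on a small hypothetical graph (my own illustration):

```python
# Enumerate all 2-colorings of a small graph and verify that the average
# number of bicolored edges is exactly |E|/2, so some coloring attains >= |E|/2.
from itertools import product

edges = [(0, 1), (0, 2), (1, 2), (2, 3), (3, 0)]  # a hypothetical 4-vertex graph
n = 4

counts = [
    sum(1 for u, v in edges if col[u] != col[v])
    for col in product([0, 1], repeat=n)
]
print(sum(counts) / len(counts), max(counts))  # average 2.5, maximum 4
```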

To each coloring, we associate a number --- the number of edges that are bicolored. The average of these numbers is at least equal to E/2, where E is the number of edges in the graph. If the average of a set of numbers is at least E/2, then at least one of those numbers is at least E/2.
–
Steven Landsburg Dec 21 '10 at 4:58

Correction: The average of those numbers is exactly equal to E/2. If the average of a set of numbers is exactly E/2, then at least one of those numbers is at least E/2.
–
Steven Landsburg Dec 21 '10 at 4:59

Zev, it seems to me the point of the probabilistic method is not that one is trying to avoid making non-canonical choices but that one is trying to avoid making choices that are too simple: at least in many applications, the search space is very complicated and it is easier to search it randomly than deterministically.
–
Qiaochu Yuan Dec 21 '10 at 5:20

I'm not sure this argument quite fits the bill because there are other proofs. For example, add one vertex at a time and choose its colour to be different from the already-added neighbours at least as often as it is the same. Thus, in this case averaging is somehow not essential, even if it happens to work very neatly.
–
gowers Dec 21 '10 at 9:36

I would have to agree with gowers. There is a canonical choice. Namely, just take a colouring with the most bicoloured edges. Then it is easy to prove that at least half the edges are bicoloured. If not, then some white vertex v has more white neighbours than black neighbours. Switch the colour of v to obtain a contradiction. For the historical record, this problem is known as the affirmative action problem. See this applet if you want to experiment with Tim's idea
–
Tony Huynh Dec 21 '10 at 13:45

In crystalline cohomology, you want to lift your variety in prime characteristic $p$ to a $p$-adic variety and then take its de Rham cohomology. But there's no natural lift. If you're Dwork, you just pick a lift and compute. But if you're Grothendieck, you look at the category of all lifts, and then things work by magic.

In fact, here the failure of uniqueness is more extreme, because you might not even have existence. In other words, your variety might not lift at all. But it does always lift locally, so you need to do everything for all sufficiently small open subvarieties. This is a common theme in geometry, where the failure of a local construction to be unique can prevent the global construction from existing at all.

It sounds like the probabilistic method fits your description. Consider the following old Putnam problem (a detailed solution appears in the book "The Probabilistic Method" by Alon and Spencer):

You have an $n\times n$ matrix of lights, with a given initial configuration of some lights being on and others being off. Each row and column has a switch which switches the lights from on to off (and off to on) in that particular row or column (respectively). The question is: by pressing the switches in some fashion, how big can the number of lights on minus the number of lights off be made? One can show that REGARDLESS of the initial configuration, you can guarantee a surplus of at least $(\sqrt{2/\pi}+o(1))n^{3/2}$.

The proof seems daunting at first, especially if one gets bogged down in the myriad of possible initial configurations. The key idea is to go to each column switch, flip a coin, and press the column switch if it comes up heads. By the magic of probability, you can work out that each row is now UNIFORMLY distributed. Apply the Central Limit Theorem, and you're pretty much done (see the above reference for the gory details).

The moral here is clear: look at all initial configurations at the same time!
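Here is a rough Python simulation of that randomized strategy (my own sketch, not code from Alon and Spencer): flip each column with probability 1/2, then press each row switch exactly when doing so increases the number of lights on.

```python
import math
import random

def surplus(board):
    """One run of the strategy; returns (lights on) - (lights off)."""
    n = len(board)
    # Randomize the columns: flip column j with probability 1/2.
    flips = [random.randrange(2) for _ in range(n)]
    board = [[cell ^ flips[j] for j, cell in enumerate(row)] for row in board]
    # For each row, press its switch if that increases the number of ON lights.
    total = 0
    for row in board:
        on = sum(row)
        total += abs(on - (n - on))  # best of pressing / not pressing
    return total

n = 50
board = [[random.randrange(2) for _ in range(n)] for _ in range(n)]  # any start
avg = sum(surplus(board) for _ in range(200)) / 200
print(avg, math.sqrt(2 / math.pi) * n ** 1.5)  # comparable in size
```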

Sard's theorem provides such an example. Given an arbitrary smooth map between two manifolds (let's say compact and of the same dimension), there is no canonical way of constructing a regular value. But Sard's theorem looks at the entire set of regular values and proves it is of full measure (and hence there exists at least one regular value). This enables us to prove many further results (the transversality theorem, for instance).

I don't know whether you would count this, but the proof of the existence of quotient groups seems to fit your description. Let $H$ be a normal subgroup of $G$. Some people define the product of the cosets $gH$ and $g'H$ to be the coset $gg'H$, and then go on to prove that this is well-defined. Another approach is to define two elements $g$ and $h$ of $G$ to be equivalent, written $g\sim h$, if $gh^{-1}\in H$, to define cosets to be equivalence classes, and to define the product of two cosets to be ... their product. That is, if $A$ and $B$ are cosets then their product is $AB=\{ab:a\in A, b\in B\}$. Of course, we have to prove that that is a coset. But if $ac^{-1}\in H$ and $bd^{-1}\in H$, then $cbd^{-1}c^{-1}\in H$ by normality, so $ab(cd)^{-1}=(ac^{-1})(cbd^{-1}c^{-1})\in H$. Thus, if $a\sim c$ and $b\sim d$, then $ab\sim cd$.

I can't quite decide whether this is a real example. I did have to take elements, but I wasn't exactly choosing them so much as proving something about all of them. The point of the example is that the product of cosets is defined by multiplying everything by everything -- that is the sense in which it does what the OP asks for.
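The "multiply everything by everything" product can be checked by brute force on a small example; here is a Python sketch (my own toy check with $S_3$ and its normal subgroup $A_3$):

```python
from itertools import permutations

def compose(p, q):
    """(p * q)(i) = p(q(i)) for permutations given as tuples."""
    return tuple(p[q[i]] for i in range(len(q)))

def sign(p):
    """Sign of a permutation via its inversion count."""
    inv = sum(1 for i in range(len(p)) for j in range(i + 1, len(p)) if p[i] > p[j])
    return (-1) ** inv

G = list(permutations(range(3)))
H = frozenset(p for p in G if sign(p) == 1)    # A_3, a normal subgroup of S_3

cosets = {frozenset(compose(g, h) for h in H) for g in G}
for A in cosets:
    for B in cosets:
        AB = frozenset(compose(a, b) for a in A for b in B)
        assert AB in cosets                    # the setwise product is a coset
print(len(cosets), "cosets; every setwise product AB is again a coset")
```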

Let $G$ be a finite bipartite graph with vertex bipartition $(V,W)$. Suppose we want to
find a matching $f\colon V\to W$, i.e., an injective function such that for all $v\in V$,
the vertices $v$ and $f(v)$ are adjacent. We may not be sure what to choose for $f(v)$, so
we choose all possible vertices at once as follows. Let $K$ be a field and, for a finite set $S$, let $KS$ denote the vector space with basis $S$. Define a linear transformation $\varphi\colon KV\to KW$ by letting
$\varphi(v)$ be the sum of all vertices in $W$ adjacent to $v$ (or more generally, some
linear combination of such vertices). It is easy to show that if $\varphi$ is injective,
then a matching $f\colon V\to W$ exists. We can think of an injective $\varphi$ as a
"quantum matching" (since it is in all possible "states" $f(v)$ at once). Choosing a
nonzero term in a certain determinant "collapses" the wave function $\varphi$ to the
matching $f$. For further details and some nice applications, see Sections 4-6 of
http://math.mit.edu/~rstan/algcomb.pdf.
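The injectivity criterion is easy to test numerically; here is a Python sketch (my own toy bipartite graph, not an example from the notes):

```python
import numpy as np

# Hypothetical bipartite graph: neighbours in W of each vertex v in V.
V, W = [0, 1, 2], [0, 1, 2, 3]
adj = {0: [0, 1], 1: [1, 2], 2: [2, 3]}

# Column v of M is phi(v), the sum of the basis vectors of the neighbours of v.
M = np.zeros((len(W), len(V)))
for v, nbrs in adj.items():
    for w in nbrs:
        M[w, v] = 1.0

# If phi is injective (full column rank), some |V| x |V| minor is nonzero,
# and a nonzero term in its determinant expansion picks out a matching.
if np.linalg.matrix_rank(M) == len(V):
    print("phi is injective, so a matching f: V -> W exists")
```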

This is more properly a comment on gowers' quotient group example, but a little long for that.

For me the canonical example of not making a choice by making all the choices is the quotient of a finite-dimensional vector space $V$ by a subspace $X$. Naively (think back to taking linear algebra for the first time) we might want this to again be a subspace of $V$. This is especially tempting when one has not yet been broken of the habit of thinking immediately of $V = \mathbb{R}^n$ with its extra inner product structure. With an inner product in the picture, the orthogonal complement of $X$ is a perfectly reasonable and natural quotient object.

Ignoring this extra structure, any complement $Y$ (subspace such that $X+Y=V$, $X\cap Y = 0$) can play the role of the quotient $V/X$. Such a complement comes equipped with a natural map $V\to Y$ obeying the universal mapping property. So existence is not a problem for this "definition" of quotient like it would be with groups. Every quotient map of finite dimensional vector spaces splits.

But without some extra structure lying around, like an inner product on $V$, we have no natural way of choosing such a $Y$. So we instead define $V/X$ to be a set of cosets, each of which contains exactly one point of each possible $Y$. The resulting object is no longer a subspace of $V$, but it has the advantage of avoiding making the choice by making all the possible choices. This naturality makes the "real" definition extend to infinite dimensions, for example, without using choice.
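Here is a small numpy sketch of this picture (my own toy example with $V=\mathbb{R}^3$ and two hypothetical complements of a line $X$): projecting onto either complement along $X$ gives two "pictures" of the same coset, and the canonical identification between the complements (again projection along $X$) matches them up.

```python
import numpy as np

def proj_along(X, Y, v):
    """Project v onto the column span of Y along the column span of X."""
    B = np.hstack([Y, X])                     # basis of V adapted to Y + X
    c = np.linalg.solve(B, v)                 # coordinates of v in that basis
    return Y @ c[: Y.shape[1]]                # keep only the Y-part

X = np.array([[0.0], [0.0], [1.0]])                  # subspace to quotient by
Y1 = np.array([[1.0, 0.0], [0.0, 1.0], [0.0, 0.0]])  # one complement of X
Y2 = np.array([[1.0, 0.0], [0.0, 1.0], [1.0, 0.0]])  # another complement of X

v = np.array([3.0, -2.0, 5.0])
p1, p2 = proj_along(X, Y1, v), proj_along(X, Y2, v)
# p1 and p2 represent the same coset v + X; identifying Y1 with Y2
# by projecting along X carries p1 to p2:
print(np.allclose(proj_along(X, Y2, p1), p2))        # True
```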

Background: In contrast to Riemannian manifolds, which carry a unique torsion-free metric connection, a conformal manifold has many torsion-free connections preserving the conformal structure (they are in 1-1 correspondence with connections on the weight bundle, and thus form an affine space directed by the space of 1-forms). Any such connection is called a Weyl structure.

Application: Using Weyl structures one can define the so-called Gauduchon gauge on conformal Hermitian manifolds. This is the unique metric $g$ in the conformal class whose Lee form is $g$-co-closed. Gauduchon gauges have important applications in geometry (e.g. the Kobayashi-Hitchin correspondence; see http://www.cmi.univ-mrs.fr/~teleman/documents/universal-05.pdf).

This is not exactly what you wanted, but in algebraic geometry it is often easier to prove something for a particular object by considering the moduli space parametrizing such objects. The example I have in mind is the following: Suppose you picked some random elliptic curve over $\mathbb{Q}$ and were wondering if it has a rational point of order $11$. It is possible to answer this for any particular curve with some computational facility, but we don't have to. We have Mazur's theorem, which says that the answer is 'no, it doesn't'. Mazur does this essentially by showing that the corresponding moduli space of elliptic curves with a choice of $11$-torsion point (which is a nice modular curve) has no rational points other than cusps: so you can never have an elliptic curve over $\mathbb{Q}$ with an $11$-torsion point.

This may be a little tangential to the original question, but it avoids Choice, and 'works with all choices at once'. In Makkai's theory of anafunctors, one recognizes that 'the' functor $C^J \to C$ giving a limit of a (small) diagram $J \to C$ is only really defined via universal properties, and so requires Choice. However, there is a unique anafunctor $C^J$ ⇸ $C$ - which is a span $C^J \leftarrow D \to C$ where the left-pointing leg is fully faithful and surjective on objects - expressing the limit. The category $D$ is defined to consist of limit cones and maps between them. The functor to $C^J$ forgets the vertex of the cone, and the functor to $C$ forgets the diagram and keeps the vertex. The universal properties take care of functoriality, and if one can choose a limit for each diagram, or there are canonical constructions of limits, then this can be converted into an ordinary functor. The cost of working with anafunctors rather than functors is that one gets a bona fide bicategory of categories, rather than a 2-category, but otherwise the whole theory of categories goes through.

I like Donald Newman's proof of the non-vanishing of Dirichlet $L$-series on the line $\Re(s) = 1$: for a given modulus $m$ one should not consider each series $\displaystyle \sum_{n=1}^\infty \frac{\chi(n)}{n^s} = \prod_p \left(\frac{1}{1 - \chi(p) p^{-s}} \right)$ separately, but instead consider the product over all characters $\displaystyle Z(s) = \prod_\chi \prod_{p} \left(\frac{1}{1 - \chi(p)p^{-s}} \right)$. Then the non-vanishing of the $L$-series follows almost immediately: supposing the existence of a zero on the line $\Re(s) = 1$, say at $1 + i\sigma$, one considers the product $Z(s)^2 Z(s + i \sigma)Z(s - i \sigma)$. This product is entire (by hypothesis, the zeros at $1 \pm i\sigma$ cancel the double pole at $s = 1$) and its Dirichlet series has nonnegative coefficients. This implies that the series in fact converges everywhere, which is plainly absurd.
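The positivity that drives the proof is easy to observe numerically; a quick Python check (my own toy computation for the modulus $m = 4$, multiplying the two $L$-series by Dirichlet convolution):

```python
# Coefficients of Z(s) = prod_chi L(s, chi) for m = 4: the Dirichlet
# convolution of the two character sequences has nonnegative coefficients.
N = 1000
chi0 = lambda n: 1 if n % 2 else 0               # principal character mod 4
chi1 = lambda n: {1: 1, 3: -1}.get(n % 4, 0)     # nontrivial character mod 4

def convolve(a, b):
    """Dirichlet convolution: c[n] = sum over d*e = n of a[d] * b[e]."""
    c = [0] * (N + 1)
    for d in range(1, N + 1):
        for e in range(1, N // d + 1):
            c[d * e] += a[d] * b[e]
    return c

a = [0] + [chi0(n) for n in range(1, N + 1)]
b = [0] + [chi1(n) for n in range(1, N + 1)]
Z = convolve(a, b)
print(all(c >= 0 for c in Z))  # True: the coefficients of Z are nonnegative
```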

Here's a basic but important example. In the modern approach to things like algebra we study objects like groups and rings in two stages: first we study their abstract structure, then we study their representations. Generally there is no way to pick a "canonical" representation. So instead we study all of them: the entire category of representations. And that gives us a lot of extra structure to work with and allows us to solve many, many problems.

I think a better point along these lines is this: when $A$ is a finite-dimensional (probably Artinian is enough) semisimple $k$-algebra, then $A$ decomposes into a direct sum of simple $A$-left modules. But these are not canonical (although, up to isomorphism, they are). Same for $A$-right modules. But if we switch to $\left(A,A\right)$-bimodules (which sort of are several $A$-left modules welded together and several $A$-right modules welded together), they become canonical (up to order).
–
darij grinberg Dec 21 '10 at 10:07

Usually $L_p$-spaces of a measurable space $Z$ are defined with respect to some faithful measure $\mu$ on $Z$. More precisely, if $p$ is a complex number with nonnegative real part, then by definition $L_p(Z,\mu)$ consists of all functions $f$ on $Z$ such that $\mu(|f|^{1/\Re p})$ is finite if $\Re p>0$. If $\Re p=0$, then $L_p(Z,\mu)$ consists of all bounded functions on $Z$. (Here I use the algebraic convention for $L_p$-spaces, namely $L_p=L^{1/p}$; in particular, $L_0=L^\infty$.)

Even though $p$ is assumed to be complex, $L_p(Z,\mu)$ depends only on the real part of $p$. This will be fixed later.

It turns out that the spaces $L_p(Z,\mu)$ for different choices of a faithful measure $\mu$ are all canonically isomorphic to each other. Suppose $\mu$ and $\nu$ are two faithful measures on $Z$ and $(D\mu:D\nu)$ is the Radon-Nikodym derivative of $\mu$ with respect to $\nu$, i.e., $\mu=(D\mu:D\nu)\nu$. Then the map $f\in L_p(Z,\mu)\mapsto f\,(D\mu:D\nu)^p\in L_p(Z,\nu)$ is an isomorphism. Observe that these isomorphisms are compatible with each other (i.e., passing from $\lambda$ to $\mu$ and then from $\mu$ to $\nu$ is the same as passing from $\lambda$ to $\nu$, and passing from $\mu$ to itself is the identity). Hence we have a compatible system of isomorphisms as described in Marty's answer, and we can therefore denote its limit (or colimit) by $L_p(Z)$. Thus we no longer need to choose a measure to define $L_p$-spaces.

The individual spaces $L_p(Z,\mu)$ depend only on the real part of $p$, but the isomorphisms between them also depend on the imaginary part of $p$. Therefore, if $p-q$ is real, then $L_p(Z)$ is isomorphic to $L_q(Z)$, but non-canonically: a canonical isomorphism would give us a canonical measure on $Z$.

Thus we get rid of the dependence on the choice of a measure and obtain a meaningful definition of $L_p$-spaces for complex values of $p$.

The spaces $L_0(Z)$ and $L_1(Z)$ can be defined canonically without this procedure: $L_0(Z)$ is the space of all bounded functions on $Z$, and $L_1(Z)$ is the space of all finite complex-valued measures on $Z$. However, all constructions of $L_p(Z)$ for $p\notin\{0,1\}$ known to me involve some kind of limit/colimit over all measures.
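On a finite set the key identity behind these isomorphisms can be checked directly; below is a small numpy sketch (my own toy model, with measures as positive weight vectors and an arbitrary complex $p$) verifying that $f\mapsto f\,(D\mu:D\nu)^p$ preserves the defining quantity $\mu(|f|^{1/\Re p})$:

```python
import numpy as np

rng = np.random.default_rng(0)
mu, nu = rng.random(5) + 0.1, rng.random(5) + 0.1   # two faithful measures on Z = {0,...,4}
f = rng.random(5) * np.exp(1j * rng.random(5))      # a "function" on Z
p = 0.5 + 2j                                        # complex p (algebraic convention)

d = mu / nu                                         # Radon-Nikodym derivative (Dmu:Dnu)
g = f * d ** p                                      # image of f in L_p(Z, nu)

lhs = np.sum(mu * np.abs(f) ** (1 / p.real))        # mu(|f|^(1/Re p))
rhs = np.sum(nu * np.abs(g) ** (1 / p.real))        # nu(|g|^(1/Re p))
print(np.allclose(lhs, rhs))                        # True
```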

You could say that treating all models of a first-order theory is a way of avoiding the arbitrary selection of a particular completion of that theory. There are other situations where it may be best to treat all completions - http://ncatlab.org/nlab/show/completion#nonunique.

The point of view of making all choices at once instead of canonical choices is part of Grothendieck's philosophy. In SGA1, VI, 12, after considering the notions of fibered categories and cleavages, there is a remark at the end which essentially qualifies the "canonical choice" approach as an old form of thinking, stressing the preference for the "making all choices at once" point of view. This is particularly well illustrated by the concept of cleavages of Grothendieck fibrations. One of the first examples of fibered categories (probably the one Grothendieck had in mind when coming up with this concept) is the one associated to the slice categories with pullback functors between them. Since in general there is no canonical choice of pullbacks, the existence of pullback functors usually appeals to the axiom of choice. This amounts to specifying a cleavage in the fibration considered, but Grothendieck wanted to avoid, when possible, working with cleavages.

A case where the above is relevant is the following. In 1978 Joyal gave a series of lectures in Montréal presenting a categorical proof of the completeness theorem for many kinds of logic (including classical logic). In the case of non-classical logics it was essential for Joyal's proof not to rely on the axiom of choice, and hence Grothendieck fibrations without cleavages were the solution for making the proof constructive. This has, though, the disadvantage of making the proof less intuitive, and related expositions (as can be seen, for example, in Johnstone's "Sketches of an Elephant", D1.5) generally prefer the "canonical choice" approach, which is a bit less obscure.

It is easy to define the homology of a simplicial complex. But in general, even a nice space $X$ may have myriads of triangulations (and not-so-nice spaces may have none at all). The idea is now to obtain a canonical "triangulation" by considering all triangulations at once. This only works if one makes the notion of a triangulation much weaker, though. Every simplex in a triangulation defines a map from the standard simplex to the space $X$. One now builds a simplicial set (a generalization of a simplicial complex) whose simplices are indexed by all maps from standard simplices to $X$. This (or rather its geometric realization) is no longer homeomorphic to $X$, but still weakly equivalent, which turns out to be sufficient. The singular homology of $X$ is then the simplicial homology of this simplicial set.

There are many more examples in topology, maybe most of which can be encoded in operads. For example, we can look at the loop space $\Omega X$ of a space $X$ with base point $x\in X$, i.e. the space of paths $f: [0,1] \to X$ with $f(0) = f(1) = x$. One can now define the product of two paths $f$ and $g$ in $\Omega X$ by taking $f$ with double speed on $[0,1/2]$ and $g$ with double speed on $[1/2,1]$. But this choice is arbitrary. So, one looks at the space $D_1(2)$ of all pairs of disjoint subintervals of $[0,1]$ and gets a parametrized multiplication map
$$D_1(2) \times \Omega X\times \Omega X \to \Omega X.$$
This encodes, in a way, all possible choices of multiplication maps $\Omega X \times \Omega X \to \Omega X$. More generally, if one takes the space $D_n(k)$ of $k$ disjoint subdisks of $D^n$ (defined in a suitable way), one gets a parametrized multiplication map
$$D_n(k) \times (\Omega^n X)^{\times k} \to \Omega^n X$$
for the $n$-fold loop space $\Omega^n X$ (consisting of maps $f: D^n \to X$ with $f(\partial D^n) = x$). For fixed $n$, the spaces $D_n(k)$ form an operad. This point of view is very fruitful for the study of the cohomology of $\Omega^nX$ and of cohomology operations in general.

In defining the tangent space of a smooth manifold at a point $p$, one can consider a local chart $\phi: U \to \phi(U)$ with $p \in U$ and define the tangent space at $p$ to be the tangent space to $\phi(U)$ (an open subset of $\mathbb{R}^n$) at $\phi(p)$. However, this definition would depend on the choice of $\phi$, which is undesirable. Now, the idea is to use the natural isomorphisms between the tangent spaces thus defined (the derivatives of the change-of-coordinate maps) to identify these spaces.
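Concretely, if $\phi$ and $\psi$ are two charts around $p$, a vector $v_\phi \in \mathbb{R}^n$ seen through $\phi$ is identified with
$$v_\psi = D(\psi\circ\phi^{-1})_{\phi(p)}\, v_\phi,$$
and the chain rule gives the cocycle condition
$$D(\chi\circ\phi^{-1})_{\phi(p)} = D(\chi\circ\psi^{-1})_{\psi(p)}\, D(\psi\circ\phi^{-1})_{\phi(p)},$$
so the identifications are mutually compatible. The tangent space at $p$ can then be taken to be the set of compatible families $(v_\phi)_\phi$, one vector for every chart at once, exactly as in Marty's projective-limit formulation.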

I don’t know if this is what you are looking for, but in linear algebra, if you have a finite-dimensional vector space $V$ and $f\in\mathrm{End}(V)$, then in order to define the trace of $f$, you choose a basis $(e_1,\dots,e_n)$ of $V$ and define $\mathrm{Tr}(f)=\mathrm{Tr}(M_{(e_1,\dots,e_n)}(f))$.

You have to make a choice, but this definition does not depend on the basis.
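A quick numerical confirmation (my own toy numpy sketch, using that the matrix of $f$ in a new basis is $P^{-1}MP$):

```python
import numpy as np

rng = np.random.default_rng(1)
M = rng.random((4, 4))            # matrix of f in one basis
P = rng.random((4, 4))            # change of basis (invertible with probability 1)
M2 = np.linalg.inv(P) @ M @ P     # matrix of the same f in another basis
print(np.allclose(np.trace(M), np.trace(M2)))  # True
```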

If you want a basis-free definition, then one way of doing it is to define the trace of $A$ to be the derivative, at $\delta = 0$, of the determinant of $I+\delta A$. More generally, one can define it to be the coefficient of the linear term of that polynomial in $\delta$, or equivalently (up to sign) the coefficient of the degree-$(n-1)$ term of the characteristic polynomial. Of course, one has to use a basis-free definition of determinant for this.
–
gowers Dec 21 '10 at 15:28

Or you use the isomorphism of End(V) with $V\otimes V^*$ and then take the map given by $v\otimes f\mapsto f(v)$.
–
Harry Altman Dec 21 '10 at 19:03

Harry: this is once again an example of "looking at all choices at once". We are given ONE endomorphism $f$, but we have to work with the whole $\mathrm{End}V$.
–
darij grinberg Dec 24 '10 at 22:08

There is an example which seriously stunned me two years ago when I first learned about Hopf algebras. In hindsight it is not that surprising...

Hopf algebras are usually sold as generalizations of groups. Now, in a group, the notion of "inverse" is something that has to do with one element only: If $g$ is an element of a group $G$, then the inverse of $g$ is defined to be a $g^{-1}\in G$ satisfying $gg^{-1}=g^{-1}g=e$. In contrast, in a Hopf algebra, it is hard to tell whether one given element equals the antipode of another one just by looking at these elements: The axiom for the antipode is

$S\left(x_{(1)}\right)x_{(2)}=x_{(1)}S\left(x_{(2)}\right)=\varepsilon\left(x\right)1$ for all $x$ (in Sweedler notation),

and checking this can only be done by checking it for all $x$ simultaneously. When I was a beginner with Hopf algebras, this fact foiled all my attempts at proving elementary properties of the antipode (such as: it is unique, it is an anti-algebra homomorphism, any finite-dimensional sub-bialgebra of a Hopf algebra has an antipode, etc.), until I started considering the elements of a Hopf algebra as one big hive rather than a collection of detached things.
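By contrast, for a group algebra $kG$ the check really is local: on a basis element $g$ one has $\Delta(g)=g\otimes g$ and $S(g)=g^{-1}$, so the axiom reduces to $g^{-1}g=gg^{-1}=e$ for each $g$ separately. A toy Python verification (my own sketch, with $G=S_3$ as tuples):

```python
# Antipode axiom in a group algebra kG: on a basis element g we have
# Delta(g) = g (x) g, so S(g_(1)) g_(2) = g^{-1} g = e = eps(g) * 1.
from itertools import permutations

def compose(p, q):
    return tuple(p[q[i]] for i in range(len(q)))

def inverse(p):
    q = [0] * len(p)
    for i, pi in enumerate(p):
        q[pi] = i
    return tuple(q)

e = (0, 1, 2)
G = list(permutations(range(3)))
assert all(compose(inverse(g), g) == e == compose(g, inverse(g)) for g in G)
print("antipode axiom holds on every basis element of kS_3")
```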

If $A$ is a von Neumann algebra, there is something we can call its $L^2$ representation: it is the GNS representation attached to any faithful normal state.

The point is that the $L^2$ representation does not depend (up to canonical isomorphism) on the choice of the faithful normal state. To obtain a construction of the $L^2$ representation which clearly does not depend on any choice, we use this technique (for example; there are many possible variations): take the set of all pairs $(\eta,h)$ where $\eta$ is a faithful normal state and $h$ is a vector in the representation attached to $\eta$, and put a scalar product on this set (using the canonical identification of any two GNS representations); this generates the $L^2$ representation.
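In finite dimensions the GNS construction attached to a single faithful state is easy to write down explicitly; here is a rough numpy sketch (my own toy example with $A = M_2(\mathbb{C})$; the choice-free version above then glues such spaces over all faithful normal states):

```python
# GNS sketch (hypothetical toy example): A = M_2(C) with the faithful state
# eta(a) = tr(rho a), rho positive definite of trace 1. The GNS Hilbert
# space is A itself with <a, b> = eta(a* b), and A acts by left multiplication.
import numpy as np

rho = np.diag([0.7, 0.3])                       # a faithful state on M_2(C)
eta = lambda a: np.trace(rho @ a)
inner = lambda a, b: eta(a.conj().T @ b)        # GNS inner product on A

a = np.array([[1, 2], [3, 4]], dtype=complex)
b = np.array([[0, 1j], [1, 0]], dtype=complex)
# *-compatibility of the left-multiplication action: <a b, a b> = <b, a* a b>.
print(np.isclose(inner(a @ b, a @ b), inner(b, a.conj().T @ a @ b)))  # True
```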