To conclude the proof of Wallace’s version of the Deutsch-Wallace theorem, we shall add the Equivalence theorem from the previous post to a pretty weak decision theory, and show that if you are rational and live in a universe described by the Many-Worlds interpretation, you must bet according to the Born rule.

The first rationality assumption we need is pretty minimal: we only demand that Amir's preferences between games be, in a precise sense, coherent. They must be transitive, in the sense that if he would rather vote for Strache than for Kern, and would prefer to vote for Kern over Kurz, then he must choose to vote for Strache over Kurz. He must also have definite preferences about any pair of games: either he thinks that Strache is better than Kurz, or that Kurz is better than Strache, or he is indifferent between them. He is not allowed to say that they are not comparable. Note that we are not judging whether his preferences are politically coherent, or whether voting for Strache is at all a good idea. The axiom is then:

Ordering: Amir’s preferences between games, written as $G \succeq G’$, define a total order in the set of games: if $G \succeq G’$, and $G’ \succeq G”$, then $G \succeq G”$. Moreover, for any two games $G$ and $G’$, either $G \succ G’$, or $G \sim G’$, or $G \prec G’$.

This means that $\succeq$ behaves like the usual $\ge$ relation between real numbers[1]And unlike the positive semi-definiteness relation between matrices, which defines only a partial order..

The second and last rationality assumption we shall use is rather stronger, but I think still pretty well-justified. We demand that Amir’s preferences between games must remain consistent while he plays: if he prefers game $G$ to game $G’$, he cannot change his mind if $G$ and $G’$ are offered as rewards inside another game:

Consistency: Let $\alpha \neq 0$, and consider the games \[\ket{F} = \alpha\ket{M_0}\ket{G} + \beta\ket{M_1}\ket{z}\] and \[\ket{F'} = \alpha\ket{M_0}\ket{G'} + \beta\ket{M_1}\ket{z},\] which differ only in the game given as a reward when the measurement result is $M_0$. Then $F \succeq F'$ iff $G \succeq G'$.

These assumptions, together with Indifference and Substitution, are enough to imply the

Born rule: Suppose you are rational, and consider the games
\[\ket{G} = \sum_i \alpha_i\ket{M_i}\ket{z_i}\quad\text{and}\quad\ket{G’} = \sum_i \beta_i\ket{D_i}\ket{w_i}.\] Then there exists a function $u$ such that
\[ u(G) = \sum_i |\alpha_i|^2 u(z_i)\] and \[G \succ G’ \iff u(G) > u(G’) \] Moreover, $u$ is unique modulo the choice of a zero and a unity.

This theorem says that you are free to decide your preferences between the rewards: these will define their utility. Your freedom ends here, however: the probabilities that you assign to obtaining said rewards must be given by the Born rule, on pain of irrationality.

A comment is also in order about the uniqueness: the choice of a zero and a unity is analogous to the one made in a temperature scale. In the Celsius scale, for example, the zero is chosen as the freezing point of water, and the unity as $1/100$ of the difference between the freezing point and the boiling point. In the Fahrenheit scale, the zero is chosen as the coldest temperature in Gdańsk's winter, and the unity as $1/96$ of the difference between the temperature of Gdańsk's winter and the temperature of the blood of a healthy male. In any case, the choice of these two values defines the temperature scale uniquely, and the same is true for utility, as implied by the following theorem:

Uniqueness: If $u$ is a utility, then $\mathcal F(u)$ is a utility if and only if $\mathcal F(u) = au+b$ for some real numbers $a,b$ such that $a>0$.

The proof of the ‘if’ direction is easy: just note that \[\mathcal F(u(G)) = a\sum_i |\alpha_i|^2 u(z_i) + b = au(G)+b,\] and that such positive affine transformations preserve the ordering of real numbers. The proof of the ‘only if’ direction is not particularly hard, but it is a bit longer and I shall skip it[2]You can find it on page 74 of “The Foundations of Statistics” by Savage, or on page 20 of “The Foundations of Expected Utility” by Fishburn. Since the choice of a value for the utility at two rewards $x$ and $y$ is enough to fix $a$ and $b$, the claim follows.
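Since the 'if' direction is just arithmetic, it is easy to check numerically. Below is a quick sanity check in Python, with made-up weights and rewards, that a positive affine map preserves both the expected-utility form and the ordering:

```python
import random

# Sanity check with made-up numbers: a positive affine map F(u) = a*u + b
# preserves both the expected-utility form and the ordering of games.
random.seed(0)
a, b = 2.5, -7.0  # a > 0, otherwise the ordering would flip

rewards = [random.uniform(-10, 10) for _ in range(4)]
weights = [0.1, 0.2, 0.3, 0.4]  # Born-rule weights |alpha_i|^2, summing to 1

def utility(u_rewards):
    # expected utility of a game with the fixed weights above
    return sum(w * u for w, u in zip(weights, u_rewards))

u_G = utility(rewards)
u_G_transformed = utility([a * u + b for u in rewards])

# F(u(G)) = a*u(G) + b: the transformed utility is still an expectation
assert abs(u_G_transformed - (a * u_G + b)) < 1e-12

# and since a > 0, the ordering of any two utilities is preserved
u1, u2 = 3.0, 5.0
assert (u1 > u2) == (a * u1 + b > a * u2 + b)
```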

But enough foreplay, now we need to start proving the Born rule theorem in earnest. We’ll build it out of two lemmas: Slider, that says that the weights of a game with rewards $x$ and $y$ behave like a tuner for the preferences, and Closure, that says that as we move this slider we are bound to hit any reward between $x$ and $y$.

Slider: Let $x \succ y$, and consider the games \[\ket{G} = \sqrt{p}\ket{M_0}\ket{x} + \sqrt{1-p}\ket{M_1}\ket{y}\quad\text{and}\quad\ket{G'} = \sqrt{q}\ket{M_0}\ket{x} + \sqrt{1-q}\ket{M_1}\ket{y}.\] Then $G \succ G'$ iff $p > q$.

Proof: suppose $p > q$. Then we can define the games
\[ \ket{F} = \sqrt{q}\ket{M_0}\ket{x} + \sqrt{p-q}\ket{M_1}\ket{x} + \sqrt{1-p}\ket{M_2}\ket{y}\] and
\[ \ket{F’} = \sqrt{q}\ket{M_0}\ket{x} + \sqrt{p-q}\ket{M_1}\ket{y} + \sqrt{1-p}\ket{M_2}\ket{y}.\]
Note that the weights of rewards $x$ and $y$ in the game $F$ are $p$ and $1-p$, and in the game $F’$ they are $q$ and $1-q$, so by Equivalence we have that $F \sim G$ and $F’ \sim G’$. Since Consistency implies that $F \succ F’$, transitivity gives us $G \succ G’$. To prove the other direction, note that $p = q$ implies directly that $G \sim G’$, and $p < q$ implies $G \prec G'$ by the flipped argument.

Closure: For every reward $z$ such that $x \succeq z \succeq y$, there exists a $p_z \in [0,1]$ such that $z \sim \ket{G_{p_z}}$, where $\ket{G_p} = \sqrt{p}\ket{M_0}\ket{x} + \sqrt{1-p}\ket{M_1}\ket{y}$.

Proof: since $\succeq$ is a total order, for all $p$ it must be the case that either \[z\succ \ket{G_p}, \quad z \sim \ket{G_p},\quad\text{or}\quad z \prec \ket{G_p}.\]Moreover, Slider tells us that there exists a critical $p_z$ such that
\begin{align*}
p > p_z \quad &\Rightarrow \quad \ket{G_p} \succ z \\
p < p_z \quad &\Rightarrow \quad \ket{G_p} \prec z
\end{align*}
A continuity argument then concludes that $z \sim \ket{G_{p_z}}$.

Now for the main proof: Let $x$ and $y$ be fixed rewards such that $x \succ y$. Set $u(x)$ and $u(y)$ to be any real numbers such that $u(x) > u(y)$, defining the unity and the zero of the utility function[3]If it makes you happy, set $u(x)=1$ and $u(y)=0$. This makes for less cumbersome notation, but I think it makes the proof less conceptually clear.. Now because of Closure for every reward $z$ such that $x \succeq z \succeq y$ there will be a unique number $p_z$ such that
\[ z \sim \sqrt{p_z}\ket{M_0}\ket{x} + \sqrt{1-p_z}\ket{M_1}\ket{y}.\] Define then
\[ u(z) = p_z u(x) + (1-p_z) u(y). \] We want to show that the utilities so defined do represent the preferences between any two rewards $z$ and $w$ in the sense that $z \succ w$ iff $u(z) > u(w)$. Suppose that $u(z) > u(w)$. This is the case iff $p_z > p_w$, which by Slider is equivalent to \[\sqrt{p_z}\ket{M_0}\ket{x} +\sqrt{1-p_z}\ket{M_1}\ket{y} \succ \sqrt{p_w}\ket{M_0}\ket{x} + \sqrt{1-p_w}\ket{M_1}\ket{y},\] which is equivalent to $z \succ w$.
Now we want to show that for any game \[\ket{G} = \sqrt{q}\ket{M_0}\ket{z} + \sqrt{1-q}\ket{M_1}\ket{w}\] its utility is given by \[ u(G) = q u(z) + (1-q) u(w),\] as advertised. By Consistency, we can replace $z$ and $w$ in $G$ by their equivalent games, and we have that
\begin{multline}
\ket{G} \sim \sqrt{q p_z}\ket{M_0}\ket{M_0}\ket{x} + \sqrt{q(1-p_z)}\ket{M_0}\ket{M_1}\ket{y} + \\ \sqrt{(1-q)p_w}\ket{M_1}\ket{M_0}\ket{x} +\sqrt{(1-q)(1-p_w)}\ket{M_1}\ket{M_1}\ket{y}. \end{multline} By Equivalence,
\[\ket{G} \sim \sqrt{q p_z + (1-q)p_w}\ket{M_0}\ket{x} + \sqrt{q(1-p_z)+(1-q)(1-p_w)}\ket{M_1}\ket{y},\]
and since $x \succeq G \succeq y$, its utility is given by the above formula, so
\begin{align*}
u(G) &= (q p_z + (1-q)p_w)u(x) + (q(1-p_z)+(1-q)(1-p_w))u(y)\\
&= q u(z) + (1-q) u(w),
\end{align*} as we wanted to show.
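The algebra of this last step is easy to check numerically. Here is a small sketch, with hypothetical values for $q$, $p_z$ and $p_w$, verifying that composing the equivalent games and applying Equivalence gives the same answer as the formula $u(G) = q\,u(z) + (1-q)\,u(w)$:

```python
from fractions import Fraction as F

# Hypothetical values, chosen for the check; any values in [0,1] would do.
u_x, u_y = F(1), F(0)          # our choice of unity and zero
p_z, p_w = F(3, 4), F(1, 3)    # critical weights of z and w against x and y
q = F(2, 5)                    # weight of z in the game G

u_z = p_z * u_x + (1 - p_z) * u_y
u_w = p_w * u_x + (1 - p_w) * u_y

# total weight of x in the composed game, as given by Equivalence
weight_x = q * p_z + (1 - q) * p_w
u_G_composed = weight_x * u_x + (1 - weight_x) * u_y
u_G_direct = q * u_z + (1 - q) * u_w
assert u_G_composed == u_G_direct
print(u_G_direct)  # → 1/2
```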

With this argument we have proved the Born rule theorem for any game inside the interval $[y,x] = \{z: x\succeq z \succeq y\}$. This would be enough if we were to assume that the set of rewards was something so lame, but since we want to deal with more interesting reward sets – like $\mathbb R$ – we cannot stop now. It is fortunately not hard to complete the proof: consider a sequence of intervals $[y_i,x_i]$ such that all of them contain $[y,x]$ and such that their union equals the set of rewards. By the above proof, in each such interval there exists a utility function $f_i$ that satisfies the requirements. We want to show that these functions agree with each other, and as such define a unique utility over the whole set of rewards. For that, consider a reward $z$ in $[y_i,x_i]\cap [y_j,x_j]$ for some $i,j$. Then it must be the case that either $x\succeq z \succeq y$, or $x\succ y \succ z$, or $z \succ x \succ y$. By Closure, there exist unique $p_z$, $p_y$, and $p_x$ such that
\begin{align*}
z &\sim \sqrt{p_z}\ket{M_0}\ket{x} + \sqrt{1-p_z}\ket{M_1}\ket{y}, \\
y &\sim \sqrt{p_y}\ket{M_0}\ket{x} + \sqrt{1-p_y}\ket{M_1}\ket{z}, \\
x &\sim \sqrt{p_x}\ket{M_0}\ket{z} + \sqrt{1-p_x}\ket{M_1}\ket{y}. \\
\end{align*}
Since $f_i$ and $f_j$ are utilities over this interval, we must have that for $k=i,j$
\begin{align*}
f_k(z) &= p_zf_k(x) + (1-p_z)f_k(y), \\
f_k(y) &= p_yf_k(x) + (1-p_y)f_k(z), \\
f_k(x) &= p_xf_k(z) + (1-p_x)f_k(y). \\
\end{align*}
Now, we use our freedom to set the zero and the unity of the utilities to choose $f_k(y) = u(y)$ and $f_k(x) = u(x)$, taking these equations to
\begin{align*}
f_k(z) &= p_zu(x) + (1-p_z)u(y), \\
u(y) &= p_yu(x) + (1-p_y)f_k(z), \\
u(x) &= p_xf_k(z) + (1-p_x)u(y), \\
\end{align*}
which uniquely define $f_k(z)$ in all three situations, implying that $f_i(z)=f_j(z)$. Setting $u(z)$ to be this common value, we have defined a unique utility function over the whole set of rewards, and we’re done.

One might still be worried about Deutsch's Additivity. What if it is actually necessary to prove the Born rule? In this case one wouldn't be able to use the Born rule in the Many-Worlds interpretation without committing oneself to stupid decisions, such as giving away all one's money to take part in St. Petersburg's lottery. Should one give up on the Many-Worlds interpretation then? Or start betting against the Born rule? If these thoughts are keeping you awake at night, then you need Wallace's version of the Deutsch-Wallace theorem, which replaces Deutsch's simplistic decision theory with a proper one that allows for bounded utilities.

Wallace's insight was to realise that the principles of Indifference and Substitution do all the real work in Deutsch's argument: they are already enough to imply the mod-squared amplitude part of the Born rule. The connection of those mod-squared amplitudes with probabilities then follows from the other, decision-theoretical principles, but those are incidental, and can be replaced wholesale with a proper decision theory.

More precisely, Wallace used Indifference and Substitution to prove[4]Actually he completely changed the principles, using new ones called Erasure, Branching indifference, Microstate indifference, and Diachronic consistency, but they amount to the same thing. a theorem called Equivalence, which states that Amir must be indifferent between games that assign equal Born-rule weights to the same rewards.

It was not at all obvious to me why this should be a strong result. After all, if I say that the games \[ \ket{G} = \alpha\ket{M_0}\ket{r_0}+\beta\ket{M_1}\ket{r_1}\] and
\[ \ket{G’} = \gamma\ket{D_0}\ket{r_0}+\delta\ket{D_1}\ket{r_1}\] are equivalent if $|\alpha|^2=|\gamma|^2$ and $|\beta|^2=|\delta|^2$, it will also be true that they are equivalent if $|\alpha|=|\gamma|$ and $|\beta|=|\delta|$[2]In fact for any bijective function of the modulus of the amplitudes, so we haven’t actually learned anything about the “square” part of the Born rule, we have only learned that the phases of the amplitudes are irrelevant. Or have we?

Actually, Equivalence shows its power only when we consider sums of mod-squared amplitudes. It says, for example, that the game (taken unnormalised for clarity) \[ \ket{G} = 2\ket{M_0}\ket{r_0}+\ket{M_1}\ket{r_1}\] is equivalent to the game
\[ \ket{G’} = \ket{M_0}\ket{r_0}+\ket{M_1}\ket{r_0}+\ket{M_2}\ket{r_0}+\ket{M_3}\ket{r_0}+\ket{M_4}\ket{r_1},\] as they both assign weight $4$ to reward $r_0$ and weight $1$ to reward $r_1$. Some alternative version of Equivalence that summed the modulus of the amplitudes instead, as it would be appropriate in classical probability theory, would claim that $G$ was actually equivalent to
\[ \ket{G’^\prime} = \ket{M_0}\ket{r_0}+\ket{M_1}\ket{r_0}+\ket{M_2}\ket{r_1},\] as they both would assign weight $2$ to reward $r_0$ and weight $1$ to reward $r_1$, a decidedly non-quantum result.
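This weight bookkeeping can be made concrete in a few lines of Python. The sketch below (with the games written as hypothetical lists of branches) computes the Born-rule weight of each reward by summing mod-squared amplitudes, confirming that $G$ and $G'$ assign the same weights while the three-branch alternative does not:

```python
from collections import defaultdict

# The Born-rule weight of a reward is the sum of |amplitude|^2 over all
# measurement results that yield it.
def weights(game):  # game: list of (amplitude, reward) branches
    w = defaultdict(float)
    for amp, reward in game:
        w[reward] += abs(amp) ** 2
    return dict(w)

G = [(2, "r0"), (1, "r1")]                                    # unnormalised
G2 = [(1, "r0"), (1, "r0"), (1, "r0"), (1, "r0"), (1, "r1")]  # five branches

assert weights(G) == weights(G2) == {"r0": 4.0, "r1": 1.0}

# The mod-amplitude alternative would instead match G with the 3-branch game,
# assigning "weight" 2 to r0 and 1 to r1 -- the non-quantum result.
```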

Having hopefully convinced you that Equivalence is actually worthwhile, let’s proceed to prove it. The proof is actually very similar to the one presented in the previous post, so if you think it is obvious how to adapt it you can safely skip to the next post, where we’ll do the decision-theory part of the proof. Below I’ll write down the proof of Equivalence anyway just for shits and giggles.

First let’s state it properly:

Equivalence: Consider two games \[ \ket{G} = \sum_{ij}\alpha_{ij}\ket{M_i}\ket{r_j}\quad\text{and}\quad \ket{G’} = \sum_{ij}\beta_{ij}\ket{D_i}\ket{r_j}.\] If all rewards $r_j$ have the same Born-rule weight, that is, if \[\forall j\quad \sum_i|\alpha_{ij}|^2 = \sum_i|\beta_{ij}|^2,\] then $G \sim G’$.

Note that unlike in the previous post we’re not stating that these games have the same value, but rather that Amir is indifferent between them, which we represent with the $\sim$ relation. We do this because we want to eventually prove that Amir’s preferences can be represented by such a value function, so it feels a bit inelegant to start with the assumption that it exists.

Now, let’s recall Indifference and Substitution from the previous post, slightly reworded to remove reference to the values of the games:

Indifference: If two games $G$ and $G'$ differ only by the labels of the measurements, then $G \sim G'$.

Substitution: Amir is indifferent between a game and the game obtained from it by replacing the reward in any branch with a game that gives that same reward with certainty. That is, for any $\ket{G'}$ that yields reward $r_0$ with certainty, \[\alpha\ket{M_0}\ket{r_0} + \beta\ket{M_1}\ket{r_1} \sim \alpha\ket{M_0}\ket{G'} + \beta\ket{M_1}\ket{r_1}.\]

And to the proof. First we show that any complex phases are irrelevant. For that, consider the game
\[ \ket{G} = \alpha e^{i\phi}\ket{M_0}\ket{r_0}+\beta e^{i\varphi}\ket{M_1}\ket{r_1}.\] By Substitution, we can replace the rewards $r_0$ and $r_1$ with the degenerate games $e^{-i\phi}\ket{D_0}\ket{r_0}$ and $e^{-i\varphi}\ket{D_1}\ket{r_1}$, and Amir must be indifferent between $G$ and the game \[ \ket{G'} = \alpha \ket{M_0}\ket{D_0}\ket{r_0}+\beta\ket{M_1}\ket{D_1}\ket{r_1}.\]Since $G'$ can be obtained from a third game \[\ket{G''} = \alpha \ket{M_0}\ket{r_0}+\beta \ket{M_1}\ket{r_1}\] via Substitution, this accumulation of measurements does not matter either, and we conclude that Amir must be indifferent to any phases.

This allows us to restrict our attention to positive amplitudes. It does not, however, allow us to restrict our attention to amplitudes which are square roots of rational numbers, but we shall do it anyway because the argument for all real numbers is boring. Consider then two games \[ \ket{G} = \sum_{ij}\sqrt{\frac{p_{ij}}{q_{ij}}}\ket{M^j_i}\ket{r_j}\quad\text{and}\quad \ket{G’} = \sum_{ij}\sqrt{\frac{a_{ij}}{b_{ij}}}\ket{D^j_i}\ket{r_j}\] for which
\[\forall j\quad \sum_i\frac{p_{ij}}{q_{ij}} = \sum_i\frac{a_{ij}}{b_{ij}}.\] We shall show that $G \sim G’$. First focus on the reward $r_0$. We can rewrite the amplitudes of the measurement results that give $r_0$ so that they have the same denominator in both games by defining $d_0 = \prod_i q_{i0}b_{i0},$ and the integers $p_{i0}’ = d_0 p_{i0}/q_{i0}$ and $a_{i0}’ = d_0 a_{i0}/b_{i0}$, so that
\[ \frac{p’_{i0}}{d_{0}} = \frac{p_{i0}}{q_{i0}}\quad\text{and}\quad\frac{a’_{i0}}{d_{0}} = \frac{a_{i0}}{b_{i0}}.\]The parts of the games associated to reward $r_0$ are then
\[ \frac1{\sqrt{d_0}}\sum_i \sqrt{p_{i0}’}\ket{M^0_i}\ket{r_0} \quad\text{and}\quad \frac1{\sqrt{d_0}}\sum_i \sqrt{a_{i0}’}\ket{D^0_i}\ket{r_0}, \] and using again Substitution we replace the reward given for measurement results $M^0_i$ and $D^0_i$ with the trivial games \[\frac1{\sqrt{p_{i0}’}}\sum_{k=1}^{p_{i0}’}\ket{P_k}\ket{r_0}\quad\text{and}\quad \frac1{\sqrt{a_{i0}’}}\sum_{k=1}^{a_{i0}’}\ket{P_k}\ket{r_0}, \] obtaining
\[ \frac1{\sqrt{d_0}}\sum_i\sum_{k=1}^{p_{i0}’}\ket{M^0_i}\ket{P_k}\ket{r_0} \quad\text{and}\quad \frac1{\sqrt{d_0}}\sum_i \sum_{k=1}^{a_{i0}’}\ket{D^0_i}\ket{P_k}\ket{r_0}, \] which are just uniform superpositions, with $\sum_ip_{i0}’$ terms on the left hand side and $\sum_ia_{i0}’$ on the right hand side. Judicious use of Indifference and Substitution can as before erase the differences in the piles of measurements, taking them to
\[ \frac1{\sqrt{d_0}}\sum_{l=1}^{\sum_ip_{i0}’}\ket{C_l}\ket{r_0} \quad\text{and}\quad \frac1{\sqrt{d_0}}\sum_{l=1}^{\sum_ia_{i0}’}\ket{C_l}\ket{r_0}. \] Now by assumption we have that \[\sum_i\frac{p’_{i0}}{d_{0}} = \sum_i\frac{p_{i0}}{q_{i0}} = \sum_i\frac{a_{i0}}{b_{i0}} = \sum_i\frac{a’_{i0}}{d_{0}},\]so the number of terms on both sides are the same, so the $r_0$ parts of the games are equivalent. Since this same argument can be repeated for all other $r_j$, Equivalence is proven.
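The bookkeeping of this proof is easy to mirror in code. The sketch below, with made-up rational weights, rewrites the weights of $r_0$ over a common denominator and checks that both games split into the same number of uniform branches:

```python
from fractions import Fraction
from math import prod

# Hypothetical rational Born-rule weights of r_0 in the two games.
p = [Fraction(1, 2), Fraction(1, 6)]   # weights p_i/q_i of r_0 in G
a = [Fraction(1, 3), Fraction(1, 3)]   # weights a_i/b_i of r_0 in G'
assert sum(p) == sum(a)                # same total Born-rule weight for r_0

d = prod(f.denominator for f in p + a)  # common denominator d_0
p_int = [f * d for f in p]              # integers p'_i = d_0 * p_i/q_i
a_int = [f * d for f in a]
assert all(f.denominator == 1 for f in p_int + a_int)

# After splitting, both games have this many branches with reward r_0,
# each with amplitude 1/sqrt(d_0): identical uniform superpositions.
assert sum(p_int) == sum(a_int)
print(f"both sides: {sum(p_int)} branches out of denominator {d}")
```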

With the decision theory from the previous post already giving us probabilities, all that is left to do is add the Many-Worlds interpretation and show that these probabilities must actually be given by the Born rule. Sounds easy, no?

But we don’t actually need the whole Many-Worlds interpretation, just some stylized part of it that deals with simple measurement scenarios. We only need to say that when someone makes a measurement on (e.g) a qubit in the state \[ \alpha\ket{0} + \beta\ket{1},\] what happens is not a collapse into $\ket{0}$ or $\ket{1}$, but rather a unitary [3]Applied on the state together with the measurement apparatus and the environment, or an isometry applied only on the state. But let’s not be pedantic. evolution into the state \[\alpha\ket{0}\ket{M_0}+\beta\ket{1}\ket{M_1},\] which represents a macroscopic superposition of the measurement device showing result $M_0$ when the qubit is in the state $\ket{0}$ with the device showing result $M_1$ when the qubit is in the state $\ket{1}$.

We want to use these measurements to play the decision-theoretical games we were talking about in the previous post. To do that, we just say that Amir will get a reward depending on the measurement result: reward $r_0$ if the result is $M_0$, and reward $r_1$ if the result is $M_1$. We can represent this simply by appending the reward into the correct branch of the above macroscopic superposition, taking it to \[\alpha\ket{0}\ket{M_0}\ket{r_0}+\beta\ket{1}\ket{M_1}\ket{r_1}.\] Since this state has all the information we need to define the game – the amplitudes, the measurement results, and the rewards – we can use it as the representation of the game. So when we need to write down a game $G$, we shall do this by using the state[2]For brevity we are omitting the qubit.
\[ \ket{G} = \alpha\ket{M_0}\ket{r_0} + \beta\ket{M_1}\ket{r_1}.\] And this is pretty much all we need from quantum mechanics.

Now we need to state two further rationality axioms, and then we can proceed to the proof. The first one is that Amir must not care about what we call the measurement results: whether $0$ and $1$, or $\uparrow$ and $\downarrow$, or $H$ and $V$, it doesn't matter. If two games are the same but for the labels of the measurement results, Amir must value these games equally:

Indifference: If two games $G$ and $G’$ differ only by the labels of the measurements, then $V(G) = V(G’)$.

The other axiom says that Amir must be indifferent between receiving reward $r_0$ or playing a game that gives reward $r_0$ independently of the measurement result, even when this reward was part of a previous game:

Substitution: If $\ket{G'}$ is a game with value $V(G') = r_0$, then \[V(\alpha\ket{M_0}\ket{G'} + \beta\ket{M_1}\ket{r_1}) = V(\alpha\ket{M_0}\ket{r_0} + \beta\ket{M_1}\ket{r_1}).\]

Now, to the proof. Consider the games
\begin{align*}
\ket{G} &= \frac1{\sqrt2}(\ket{M_0}\ket{r_0} + \ket{M_1}\ket{r_1}) \\
\ket{G’} &= \frac1{\sqrt2}(\ket{M_0}\ket{r_1} + \ket{M_1}\ket{r_0}) \\
\ket{G’^\prime} &= \frac1{\sqrt2}(\ket{M_0}\ket{r_0+r_1} + \ket{M_1}\ket{r_0+r_1}) \\
\end{align*}
By Additivity, from the previous post, we have that
\[V(G’^\prime) = V(G) + V(G’),\] and from Constancy that $V(G’^\prime) = r_0 + r_1$, so we already know that $V(G) + V(G’) = r_0 + r_1$. But the games $G$ and $G’$ are just relabellings of each other, so by Indifference we must have $V(G) = V(G’)$, so we can conclude that \[V(G) = \frac12(r_0+r_1)\] or, in other words, that quantum states with amplitude $1/\sqrt{2}$ click with probability $1/2$.

We can easily extend this argument to show that games involving a uniform superposition of $n$ states \[ \ket{G} = \frac{1}{\sqrt{n}}\sum_{i=0}^{n-1}\ket{M_i}\ket{r_i}\] must have value[3]If it doesn’t sound that easy, just consider the $n$ cyclic permutations \[ \ket{G_k} = \frac{1}{\sqrt{n}}\sum_{i=0}^{n-1}\ket{M_i}\ket{r_{i\oplus k}},\]use Additivity and Constancy to show that \[ \sum_{k=0}^{n-1}V(G_k) = \sum_{i=0}^{n-1}r_i,\]and invoke Indifference to show that \[V(G_k) = V(G).\] \[ V(G) = \frac1n\sum_{i=0}^{n-1}r_i.\] Now we need to deal with non-uniform superpositions. Consider the games
\[ \ket{G} = \sqrt{\frac{2}{3}}\ket{M_0}\ket{r_0} + \frac{1}{\sqrt3}\ket{M_1}\ket{r_1}\] and
\[ \ket{G’} = \frac1{\sqrt2}(\ket{D_0}\ket{r_0} + \ket{D_1}\ket{r_0}). \] By Constancy the value of $G’$ is $r_0$, and by Substitution the value of $G$ must be equal to the value of
\begin{align*}
\ket{G’^\prime} &= \sqrt{\frac{2}{3}}\ket{M_0}\ket{G’} + \frac{1}{\sqrt3}\ket{M_1}\ket{r_1} \\
&= \frac{1}{\sqrt3}\ket{M_0}\ket{D_0}\ket{r_0} + \frac{1}{\sqrt3}\ket{M_0}\ket{D_1}\ket{r_0} + \frac{1}{\sqrt3}\ket{M_1}\ket{r_1}.
\end{align*}
But $G’^\prime$ is just a uniform superposition, so from the previous argument we know that
\[ V(G’^\prime) = \frac13(r_0+r_0+r_1),\] and therefore that
\[ V(G) = \frac23r_0+\frac13r_1.\] Using analogous applications of Substitution we can show that for any positive integers $n$ and $m$ the value of the game
\[ \ket{G} = \sqrt{\frac{n}{n+m}}\ket{M_0}\ket{r_0} + \sqrt{\frac{m}{n+m}}\ket{M_1}\ket{r_1}\] is
\[ V(G) = \frac{n}{n+m}r_0+\frac{m}{n+m}r_1,\] and we are pretty much done. To extend the argument to any positive real amplitudes one only needs a continuity assumption[4]Which I shall not state or work out explicitly, and to extend it to arbitrary complex amplitudes we can do a little trick: consider the game with a single outcome \[\ket{G} = e^{i\phi}\ket{M_0}\ket{r_0}.\] By Constancy the value of this game is $r_0$, independently of the phase $e^{i\phi}$. Now consider the game
\[ \ket{G} = \sqrt{\frac{n}{n+m}}e^{i\phi}\ket{M_0}\ket{r_0} + \sqrt{\frac{m}{n+m}}e^{i\varphi}\ket{M_1}\ket{r_1}.\] By Substitution we can replace the rewards $\ket{r_0}$ and $\ket{r_1}$ with the single outcome games $e^{-i\phi}\ket{D_0}\ket{r_0}$ and $e^{-i\varphi}\ket{D_1}\ket{r_1}$ without changing its value, so the phases play no role in determining the value of the game.

To summarize, we have shown that the value of a game \[\ket{G} = \alpha\ket{M_0}\ket{r_0} + \beta\ket{M_1}\ket{r_1}\] must be given by \[ V(G) = |\alpha|^2r_0 + |\beta|^2r_1,\]which is just the Born rule.
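For rational weights, this value can be reproduced purely by the branch-splitting used in the proof: split the game into uniform branches and average the rewards, as Constancy and Indifference demand. A minimal sketch, with hypothetical numerical rewards:

```python
from fractions import Fraction

# Value of a two-outcome game with rational weights n/(n+m) and m/(n+m),
# computed by branch-splitting: n+m uniform branches, n giving r0 and m
# giving r1, averaged (Constancy + Indifference).
def game_value(n, m, r0, r1):
    branches = [r0] * n + [r1] * m   # the uniform superposition
    return Fraction(sum(branches), len(branches))

# the worked example above: weights 2/3 and 1/3, with made-up rewards
r0, r1 = 30, 3
assert game_value(2, 1, r0, r1) == Fraction(2, 3) * r0 + Fraction(1, 3) * r1
```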

There exists a problem in the world that is even more pressing than the position of the cheese in the cheeseburger emoji: namely, that nobody™ understands the Deutsch-Wallace theorem. I've talked to a lot of people about it, and the usual reaction I get is that they have heard of it, are vaguely interested in how one can prove the Born rule, but have no idea how Deutsch and Wallace actually did it.

It’s hard to blame them. The original paper by Deutsch is notoriously idiosyncratic: he even neglected to mention that one of his assumptions was the Many-Worlds interpretation[5]I guess that for him this was as pointless as writing in an astronomy paper that the author is assuming that the Earth rotates around the Sun.! Several people wrote papers trying to understand it: Barnum et al. mistakenly concluded that Deutsch was simply wrong, Gill made a valiant effort but gave up without a conclusion, and Wallace finally succeeded, formalising Deutsch’s proof and putting it in context.

Wallace was not successful, however, in popularising the theorem. I think this is because his paper is a 27-page mess. It did not help, either, that Wallace quickly moved on to improving and formalising Deutsch’s theorem, providing an even more complicated proof from weaker assumptions, leaving the community with no easy entry point into this confusing literature.

To fill this hole, then, I’m writing two “public service” blog posts. The first (this one) is to explain how to derive probabilities from decision theory, and the second is to show how this decision-theoretical argument, together with the Many-Worlds interpretation, yields the Born rule.

Unlike Deutsch, I'm going to use a standard decision theory, taken from the excellent "The Foundations of Causal Decision Theory" by James Joyce. We're going to consider a simple betting scenario, where an agent – called Amir – decides how much to pay to take part in a game where he receives $a$ euros if event $E$ happens, and $b$ euros if event $\lnot E$ happens[2]We'll consider games with only two events simply to avoid weighing the argument down with annoying notation. The generalisation to $n$ outcomes is trivial. The game is then defined by the vector $(a,b)$, and Amir wants to decide its value $V(a,b)$.

The first rationality axiom we demand is that if the game is certain to pay him back $c$ euros, he must assign value $c$ to the game. This means that Amir is indifferent to betting per se: he doesn't demand extra compensation to go through the effort of betting, nor does he accept losing money just to experience the thrill of betting (unlike real gambling addicts, I must say). The axiom is then

Constancy: $V(c,c) = c$.

The second axiom we demand is that if for a pair of games $(a,b)$ and $(c,d)$ it happens that $a \ge c$ and $b \ge d$ – that is, if both when $E$ happens and when $\lnot E$ happens the first game pays a reward that is larger than or equal to that of the second game – then Amir must value the first game no less than the second game. The axiom is then

Dominance: if $(a,b) \ge (c,d)$ then $V(a,b) \ge V(c,d)$.

The third and last axiom we need sounds very innocent: if Amir is willing to pay $V(a,b)$ to play the game with rewards $(a,b)$, and thinks that playing for rewards $(c,d)$ is worth $V(c,d)$, then the price he should pay for getting the rewards $(a+c,b+d)$ must be $V(a,b) + V(c,d)$. In other words: it shouldn’t matter if tickets for the game with rewards $(a+c,b+d)$ are sold at once, or broken down into first a ticket for rewards $(a,b)$ followed by a ticket for rewards $(c,d)$. The axiom is then

Additivity: $V(a+c,b+d) = V(a,b) + V(c,d)$.

One problem with Additivity is that real agents don’t behave like this. People usually assign values such that $V(a+c,b+d) < V(a,b) + V(c,d)$, because if you have nothing then 10€ might be the difference between life and death, whereas if you already have 10,000€ then 10€ is just a nice gift. Besides not matching reality, this linear utility function implied by Additivity causes pathological decisions such as the St. Petersburg paradox or Pascal’s Wager. But these problems do not appear if the amounts at stake are small compared to Amir’s wealth, which we can assume to be the case, and Additivity makes for a rather simple and elegant decision theory, so we’ll use it anyway[3]Deutsch assumed instead a weaker version of additivity, but that also implied linear utilities, so I don’t think there is any point in using his version.. After all, I’m not writing for the people whose objection to the Deutsch-Wallace theorem is that Deutsch’s decision theory implies linear utilities, but rather for those whose objection is “What the hell is going on?”.

Now, to work. First we shall show how Additivity allows us to write the value of any game as a function of the value of the elementary games $(1,0)$ and $(0,1)$. Additivity immediately implies that
\[V(a,b) = V(a,0) + V(0,b),\]and that for any positive integer $n$
\[V(na,0) = nV(a,0).\]Taking now $a=1/n$, the previous equation gives us that \[V(1,0) = nV(1/n,0),\] or that \[V(1/n,0) = \frac1n V(1,0).\] Considering $m$ such games, we have now that \[V(m/n,0) = \frac{m}{n} V(1,0)\] for any positive rational $m/n$. We can extend this to all rationals if we remember that by Constancy $V(0,0) = 0$ and that by Additivity
\[V(0,0) = V(m/n,0) + V(-m/n,0).\]Now one could extend this argument to all reals by taking some continuity assumption, but I don’t think it is interesting to do so. I’d rather assume that one can only have rational amounts of euros[4]Dollars, on the other hand, can come in irrational amounts.. Anyway, now we have shown that for all rational $a$ and $b$ we have that
\[V(a,b) = aV(1,0) + bV(0,1).\]What is left to see is that the values of the elementary games $(1,0)$ and $(0,1)$ behave like probabilities. If we consider Constancy with $c=1$ we have that \[V(1,0) + V(0,1) = 1,\] so these “probabilities” are normalised. If we now consider Dominance, we get that
\[ V(1,0) \ge V(0,0) = 0,\]so the "probabilities" are positive. Is there anything left to show? Well, if you are a Bayesian, no. The probability of an event $E$ is defined as the price a rational agent would pay for a lottery ticket that gives them 1€ if $E$ happens and nothing otherwise. Bayesians have the obligation to show that these probabilities obey the usual Kolmogorov axioms, but on the interpretational side there is nothing left to explain.
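As a toy illustration of how the axioms play together, here is an agent defined by $V(a,b) = a\,p + b\,(1-p)$ for a made-up elementary value $p = V(1,0)$, checked against Constancy, Additivity, normalisation and positivity:

```python
from fractions import Fraction

# A made-up "probability" of the event E; any value in [0,1] works.
p = Fraction(1, 3)  # hypothetical V(1,0)

def V(a, b):
    # the decomposition derived above: V(a,b) = a*V(1,0) + b*V(0,1)
    return a * p + b * (1 - p)

# Constancy: V(c,c) = c
assert V(5, 5) == 5
# Additivity: V(a+c, b+d) = V(a,b) + V(c,d)
assert V(Fraction(1, 2) + 2, Fraction(3, 4) + 1) == \
       V(Fraction(1, 2), Fraction(3, 4)) + V(2, 1)
# Normalisation and positivity of the elementary games
assert V(1, 0) + V(0, 1) == 1 and V(1, 0) >= 0
```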

Consider the entirely hypothetical situation where you are in a physics conference with really bad wifi. Either because the router has a hard limit on the number of devices that can connect simultaneously, or the bandwidth is too small to handle everyone's OwnClouds trying to sync, or it is a D-Link. The usual approach is just to be pissed off and complain to the organizers, to no avail (while ignoring the talks and trying to reconnect like crazy). Here I'd like to describe a different approach, which, if not morally commendable, at least leads to more results: blackholing.

To blackhole, what you do is create a hotspot with your phone with the same name, encryption type, and password as the conference wifi. You then disable the data connection of your phone, and turn on the hotspot. What happens is that the devices of the people close to you will automatically disconnect from the conference router and connect to your hotspot instead, since they will think that your hotspot is a repeater with a stronger signal. But since you disabled your data connection, they are connecting to a sterile hotspot, so you are creating a kind of wifi black hole. To the people far from you, however, this is a gift from the gods, as they stay connected to the conference router, and can use the bandwidth that was freed up by the poor souls that fell into your black hole.

The question is, is it moral to do this? Obviously the people who did fall in your black hole are not going to like it, but one thing to notice is that this technique is intrinsically altruistic, as you cannot use wifi either, since you are in the middle of the black hole (and as far as I know it is not possible to defend oneself against it). It is even more altruistic if you like to sit close to your friends, who will then sacrifice their wifi in favour of a more distant acquaintance. It does become immoral if you arrange with a friend to sit close to the conference router, and you blackhole some random people far from it with the specific intent of giving your friend wifi, without caring about the other people who will also get it.

But let’s consider that you don’t have such tribalistic morals, and consider everyone’s welfare equally. Then the question is whether the utility of $n$ people with bad wifi is smaller than the utility of $k$ people with no wifi and $n-k$ people with good wifi, that is, whether
\[ n\, U(\text{bad wifi}) \le k\,U(\text{no wifi}) + (n-k)\,U(\text{good wifi}). \]Now, assuming that the utility is a function only of the bandwidth available, this simplifies to
\[ n\,U(B/n) \le k\,U(0) + (n-k)\,U(B/(n-k)),\]where $B$ is the total bandwidth of the conference router. Therefore, to determine whether blackholing is moral or not we need to find out how people’s happiness scales as a function of the available bandwidth.

One immediately sees that if the happiness scales linearly with the bandwidth, it is indifferent whether to blackhole or not. But to make relevant moral judgements, we need to find out what the actual utility functions are. By asking people around, I empirically determined that
\[ u(x) = \frac{1}{1+\left(\frac{B_0}{x}\right)^2}, \]where $B_0$ is the critical bandwidth that allows people to do basic surfing. Substituting in the previous inequality, we see that blackholing is moral iff
\[ k \le \frac{n^2 - \left(\frac{B}{B_0}\right)^2}{n}, \]which is better understood if we write $\frac{B}{B_0} = fn$, where $f$ is the fraction of people that can do basic surfing with the given bandwidth. We have then
\[ k \le (1-f^2)n, \]which shows that if $f = 1$ it is never moral to blackhole, whereas if $f \approx 0$ it always is. In a hypothetical conference held in Paraty with $n=100$ and $\frac{B}{B_0} = 50$, it is moral to blackhole up to $k=75$ people.
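To make the arithmetic concrete, here is a small Python sketch of the argument above. The function names and the convention $B_0 = 1$ are mine, not from the post:

```python
# A sketch of the welfare arithmetic above. Function names and the
# convention B0 = 1 are mine, not from the post.

def utility(x, B0=1.0):
    """The empirical utility u(x) = 1 / (1 + (B0/x)^2), with u(0) = 0."""
    if x == 0:
        return 0.0
    return 1.0 / (1.0 + (B0 / x) ** 2)

def moral_threshold(n, B, B0=1.0):
    """Largest k for which blackholing is moral: k <= (1 - f^2) n."""
    f = B / (B0 * n)
    return int((1 - f ** 2) * n)

def blackholing_is_moral(n, k, B, B0=1.0):
    """Direct check of n u(B/n) <= k u(0) + (n-k) u(B/(n-k))."""
    return n * utility(B / n, B0) <= k * utility(0, B0) + (n - k) * utility(B / (n - k), B0)

# The hypothetical Paraty conference: n = 100 and B/B0 = 50, i.e. f = 1/2.
print(moral_threshold(100, 50))           # 75
print(blackholing_is_moral(100, 76, 50))  # False: one person too many
```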

Last week two curious papers appeared on the arXiv, one by Marletto and Vedral, and the other by Bose et al., proposing to test whether the gravitational field must be quantized. I think they have a nice idea there, which is a bit obscured by all the details they put in the papers, so I hope the authors will forgive me for butchering their argument down to the barest of bones.

The starting point is a worryingly common idea that maybe the reason why a quantum theory of gravity is so damn difficult to make is because gravity is not actually quantum. While concrete models of “non-quantum gravity” tend to be pathological or show spectacular disagreement with experiment, there is still a lingering hope that somehow a non-quantum theory of gravity will be made to work, or that at least a semi-classical model like QFT in a curved spacetime will be enough to explain all the experimental results we’ll ever get. Marletto and Bose’s answer? Kill it with fire.

Their idea is to put two massive particles (like neutrons) side by side in two Mach-Zehnder interferometers, in such a way that their gravitational interaction is non-negligible in only one combination of arms, and measure the resulting entanglement as proof of the quantumness of the interaction.

More precisely, the particles start in the state \[ \ket{L}\ket{L}, \] which after the first beam splitter in each of the interferometers gets mapped to \[ \frac{\ket{L} + \ket{R}}{\sqrt2}\frac{\ket{L} + \ket{R}}{\sqrt2} = \frac12(\ket{LL} + \ket{LR} + \ket{RL} + \ket{RR}), \] which is where the magic happens: we can put these interferometers together in such a way that the right arm of the first interferometer is very close to the left arm of the second interferometer, and all the other arms are far away from each other. If the basic rules of quantum mechanics apply to gravitational interactions, this should give a phase shift corresponding to the gravitational potential energy to the $\ket{RL}$ member of the superposition, resulting in the state
\[ \frac12(\ket{LL} + \ket{LR} + e^{i\phi}\ket{RL} + \ket{RR}), \] which can even be made maximally entangled if we manage to make $\phi = \pi$. Bose promises that he can get us $\phi \approx 10^{-4}$, which would be a tiny but detectable amount of entanglement. If we now complete the interferometers with a second beam splitter, we can do complete tomography of this state, and in particular measure its entanglement.
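For the curious, the entanglement generated can be quantified directly: for a pure two-qubit state with amplitudes $a_{ij}$ the concurrence is $2|a_{00}a_{11} - a_{01}a_{10}|$, which for the state above works out to $|\sin(\phi/2)|$. A little numpy check of mine, not from the papers:

```python
# How much entanglement does the phase phi generate? For a pure two-qubit
# state with amplitudes a_ij the concurrence is 2|a00 a11 - a01 a10|,
# which here gives |sin(phi/2)|. (My check, not from the papers.)

import numpy as np

def concurrence(phi):
    # Amplitudes of (|LL> + |LR> + e^{i phi}|RL> + |RR>)/2,
    # in the order LL, LR, RL, RR:
    a = np.array([1, 1, np.exp(1j * phi), 1]) / 2
    return 2 * abs(a[0] * a[3] - a[1] * a[2])

print(concurrence(np.pi))   # 1.0: maximally entangled when phi = pi
print(concurrence(0.0))     # 0.0: no interaction, product state
print(concurrence(1e-4))    # ~5e-5: tiny but nonzero, as Bose promises
```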

Now I’m not sure about what “non-quantum gravity” can do, but if it can allow superpositions of masses to get entangled via gravitational interactions, the “non-quantum” part of its name is as appropriate as the “Democratic” in Democratic People’s Republic of Korea.

EDIT: Philip Ball has updated his article on Nature News, correcting the most serious of its errors. While everyone makes mistakes, few actually admit to them, so I think this action is rather praiseworthy. Correspondingly, I’m removing criticism of that mistake in my post.

Recently I have read an excellent essay by Philip Ball on the measurement problem: clear, precise, non-technical, free of bullshit and mysticism. I was impressed: a journalist managed to dispel confusion about a theme that even physicists themselves are confused about. It might be worth checking out what this guy writes in the future.

I was not so impressed, however, when I saw his article about quantum teleportation, reporting on Jian-Wei Pan’s group’s amazing feat of teleporting a quantum state from a ground station to a satellite. While Philip was careful to note that nothing observable is going on faster than light, he still claims that something unobservable is going on faster than light, and that there is some kind of conspiracy by Nature to cover that up. This is not only absurd on its face, but also needs the discredited notion of wavefunction collapse to make sense, which Philip himself noted was replaced by decoherence as a model of how measurements happen. For these reasons, very few physicists still take this description of the teleportation protocol seriously. It would be nice if the media reported on the current understanding of the community instead of repeating misconceptions from the 90s.

But enough ranting. I think the best way to counter the spreading of misinformation about quantum mechanics is not to just criticize people who get it wrong, but instead to give the correct explanation about the phenomena. I’m going to explain it twice, first in a non-technical way in the hope of helping interested laypeople, and then in a technical way, for people who do know quantum mechanics. So, without further ado, here’s how quantum teleportation actually works (this is essentially Deutsch and Hayden‘s description):

Alice has a quantum bit, which she wants to transmit to Bob. Quantum bits are a bit like classical bits as they can be in the states 0 or 1 (and therefore used to store information like blogs or photos[5]But using quantum bits to store text is a monumental waste, like building Belo Monte dam to power a single lightbulb.), and entirely unlike classical bits as they can also be in a superposition of 0 and 1. Now if Alice had a classical bit, it would be trivial to transmit it to Bob: she would just use the internet. But the internet cannot handle superpositions between 0 and 1: if you tried to send a qubit via the internet you would lose this superposition information (the Dutch are working on this, though). To preserve this superposition information Alice would need an expensive direct optical fibre connection to Bob’s place, that we assume she doesn’t have.

What can she do? She can try to measure this superposition information, record it in classical bits, and transmit those via the internet. But superposition information is incredibly finicky: if Alice has only one copy of the qubit, she cannot obtain it. She can only get a good approximation to it if she measures several copies of the qubit. Which she might not have; and even if she does, it will be only an approximation to her qubit, not the real deal.

So again, what can she do? That’s where quantum teleportation comes in. If Alice and Bob share a Bell state (a kind of entangled state), they can use it to transmit this fragile superposition information perfectly. Alice needs to do a special kind of measurement — called a Bell basis measurement — on the qubit she wants to transmit together with her part of the Bell state. Now, this is where everyone’s brains melt and all the faster-than-light nonsense comes from. It appears that after Alice does her measurement the part of the Bell state that belongs to Bob instantaneously becomes the qubit Alice wanted to send, just with some error that depends on her measurement result. In order to correct the error, Bob then needs to know Alice’s measurement result, which he can only find out after a light signal has had time to propagate from her lab to his. So it is as if Nature did send the qubit faster than light, but cleverly concealed this fact with this error, just so that we wouldn’t see any violation of relativity. Come on. Trying to put ourselves back in the centre of the universe, are we?

Anyway, this narrative only makes sense if you believe in some thoroughly discredited interpretations of quantum mechanics[2]old-school Copenhagen or collapse models. If you haven’t kept your head buried in the sand for the last few decades, you know that measurements work through decoherence: Alice’s measurement is not changing the state of Bob in any way. She is just entangling her qubit with the Bell state and herself and anything else that comes in the way. And this entanglement spreads just through normal interactions: photons going around, molecules colliding with each other. Everything very decent and proper, nothing faster than light.

Now, in this precious moment after she has done her measurement and before this cloud of decoherence has had time to spread to Bob’s place, we can compare the silly story told in the previous paragraph with reality. We can compute the information about Alice’s qubit that is available in Bob’s place, and see that it is precisely zero. Nature is not trying to conceal anything from us, it is just a physical fact that the real quantum state that describes Alice and Bob’s systems is a complicated entangled state that contains no information about Alice’s qubit in Bob’s end. But the cool thing about quantum teleportation is that if Bob knows the measurement result he is able to sculpt Alice’s qubit out of this complicated entangled state. But he doesn’t, because the measurement result cannot get to him faster than light.

Now, if we wait a couple of nanoseconds more, the cloud of decoherence hits Bob, and then we are actually in the situation where Bob’s part of the Bell state has become Alice’s qubit, modulo some easily correctable error. But now there is no mystery to it: the information got there via decoherence, no faster than light.

Now, for the technical version: Alice has a qubit $\ket{\Gamma} = \alpha\ket{0} + \beta\ket{1}$, which she wishes to transmit to Bob, but she does not have a good noiseless quantum transmission channel that she can use, just a classical one (aka the Internet). So what can they do? Luckily they have a maximally entangled state $\ket{\phi^+} = \frac1{\sqrt2}(\ket{00}+\ket{11})$ saved from the time when they did have a good quantum channel, so they can just teleport $\ket{\Gamma}$.

To do that, note that the initial state they have, written in the order Alice’s state, Alice’s part of $\ket{\phi^+}$, and Bob’s part of $\ket{\phi^+}$, is
\[ \ket{\Gamma}\ket{\phi^+} = \frac{1}{\sqrt2}( \alpha\ket{000}+\alpha\ket{011} + \beta\ket{100} + \beta\ket{111}), \] and if we rewrite the first two subsystems in the Bell basis we obtain
\[ \ket{\Gamma}\ket{\phi^+} = \frac{1}{2}( \ket{\phi^+}\ket{\Gamma} + \ket{\phi^-}Z\ket{\Gamma} + \ket{\psi^+}X\ket{\Gamma} + \ket{\psi^-}XZ\ket{\Gamma}),\] so we see that conditioned on Alice’s state being a Bell state, Bob’s state is just a simple function of $\ket{\Gamma}$. Note that at this point nothing was done to the quantum system, so Bob’s state did not change in any way. If we calculate the reduced density matrix at his lab, we see that it is the maximally mixed state, which contains no information about $\ket{\Gamma}$ whatsoever.

Now, clearly we want Alice to measure her subsystems in the Bell basis to make progress. She does that by first applying an entangling operation to map the Bell states to the computational basis, and then making the measurement in the computational basis.[3]This is how any measurement is done in quantum mechanics, nothing special about teleportation here. After the entangling operation, the state is
\[ \frac{1}{2}( \ket{00}\ket{\Gamma} + \ket{01}Z\ket{\Gamma} + \ket{10}X\ket{\Gamma} + \ket{11}XZ\ket{\Gamma}),\] and making a measurement in the computational basis — for now modelled in a coherent way — and storing the result in two extra qubits results in the state
\[ \frac{1}{2}( \ket{00}\ket{00}\ket{\Gamma} + \ket{01}\ket{01}Z\ket{\Gamma} + \ket{10}\ket{10}X\ket{\Gamma} + \ket{11}\ket{11}XZ\ket{\Gamma}).\] Now something was done to this state, but still there is no information at Bob’s: his reduced density matrix is still the maximally mixed state. Looking at this entangled state, though, we see that if Bob applies the operations $\mathbb{I}$, $Z$, $X$, or $ZX$ to his qubit conditioned on the measurement result he will extract $\ket{\Gamma}$ from it. So Alice simply sends the qubits with the measurement result to Bob, who uses them to get $\ket{\Gamma}$ on his side, the teleportation protocol is over, and Alice and Bob live happily ever after. Nothing faster than light happened, and the information from Alice to Bob clearly travelled through the qubits with the measurement results. The interesting thing we saw is that by expending one $\ket{\phi^+}$ and sending two classical bits we can transmit one quantum bit. Everything ok?
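The coherent version of the protocol is easy to simulate. Below is a minimal numpy sketch of mine; note that which Pauli corrects which outcome depends on the convention chosen for the entangling operation, and here I use the standard CNOT-then-Hadamard circuit:

```python
# Minimal simulation (mine) of the coherent teleportation circuit:
# Bell measurement via CNOT + Hadamard, postselection on the outcome,
# then Bob's Pauli correction.

import numpy as np

I2 = np.eye(2)
X = np.array([[0., 1.], [1., 0.]])
Z = np.array([[1., 0.], [0., -1.]])
H = np.array([[1., 1.], [1., -1.]]) / np.sqrt(2)
CNOT = np.array([[1., 0, 0, 0], [0, 1, 0, 0], [0, 0, 0, 1], [0, 0, 1, 0]])

def teleport(gamma, outcome):
    """Run the circuit, postselect Alice's outcome (two bits), and
    return Bob's qubit after the Pauli correction."""
    phi_plus = np.array([1., 0, 0, 1]) / np.sqrt(2)
    state = np.kron(gamma, phi_plus)               # qubit order: A, A', B
    state = np.kron(CNOT, I2) @ state              # map the Bell basis to
    state = np.kron(np.kron(H, I2), I2) @ state    # the computational basis
    a, b = outcome
    bob = state[4 * a + 2 * b: 4 * a + 2 * b + 2]  # Bob's branch
    bob = bob / np.linalg.norm(bob)
    # Outcome-to-Pauli table for this CNOT + Hadamard convention:
    correction = {(0, 0): I2, (0, 1): X, (1, 0): Z, (1, 1): Z @ X}
    return correction[outcome] @ bob

gamma = np.array([0.6, 0.8j])                      # some alpha|0> + beta|1>
for outcome in [(0, 0), (0, 1), (1, 0), (1, 1)]:
    print(np.allclose(teleport(gamma, outcome), gamma))  # True, four times
```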

No, no, no, no, no!, you complain. What was this deal about modelling a measurement coherently? This makes no sense, measurements must by definition cause lots of decoherence! Indeed, we’re getting there. Now with decoherence, the state after the measurement in the computational basis is \[ \frac{1}{2}( \ket{E_{00}}\ket{00}\ket{00}\ket{\Gamma} + \ket{E_{01}}\ket{01}\ket{01}Z\ket{\Gamma} + \ket{E_{10}}\ket{10}\ket{10}X\ket{\Gamma} + \ket{E_{11}}\ket{11}\ket{11}XZ\ket{\Gamma}),\] where $\ket{E_{ij}}$ is the state of the environment, labelled according to the result of the measurement. You see that there is no collapse of the wavefunction[4]Contrary to popular misconception, most physicists actually agree that there is no collapse, except for the few ones that believe in collapse models. The consensus is that the apparent collapse is just an agent updating their knowledge about the quantum state, and this is mostly correct.: in particular Bob’s state is in the same entangled superposition as before, and his reduced density matrix is still the maximally mixed state. Moreover, like any physical process, decoherence spreads at most as fast as the speed of light, so even after Alice has been engulfed by the decoherence and has obtained a definite measurement result, Bob will still for some time remain unaffected by it, with the state still being adequately described by the above superposition. Only after a relativity-respecting time interval will he become engulfed as well, coherence will be killed, and the state relative to him and Alice will be adequately described by (e.g.) \[ \ket{E_{10}}\ket{10}\ket{10}X\ket{\Gamma}.\] Now we are in the situation people usually describe: his qubit is in a definite state, and he merely does not know which it is. Alice then sends him the measurement result — 10 — via the Internet, from which he deduces that he needs to apply the operation $X$ to recover $\ket{\Gamma}$, and now the teleportation protocol is truly over.

But what if one does not want general quantum operations, but wants to single out pure quantum operations? Can one have such an axiomatic description, a derivation from intuitive[5]For physicist values of intuitive. If you try to explain to normal people that quantum operations should be linear you’ll be met with a blank stare. assumptions?

Well, the usual argument one sees in textbooks to show that the evolution of quantum states must be given by a unitary assumes that the evolution

1. Is linear.
2. Maps pure quantum states to pure quantum states.

From this, you get that a quantum state $\ket{\psi}$ is mapped to a quantum state $U\ket\psi$ for a linear operator $U$, and furthermore since by definition quantum states have 2-norm equal to 1, we need the inner product $\bra\psi U^\dagger U \ket\psi$ to be 1 for all $\ket\psi$, which implies that $U$ must be a unitary matrix.

The only problem with this argument is that it is false, as the map
\[ \mathcal E(\rho) = \ket\psi\bra\psi \operatorname{tr} \rho, \]which simply discards the input $\rho$ and prepares the fixed state $\ket\psi$ instead, is linear, maps pure states to pure states, and is not unitary. The textbooks are fine, as they usually go through this argument before density matrices are introduced, and either implicitly or explicitly state that the evolution takes state vectors to state vectors. But this is not good enough for us, as this restriction to state vectors is both unjustified, and does not satisfy our requirement of being an “intuitive assumption”.
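A quick numpy illustration of mine: the counterexample, while pure-to-pure on state vectors, produces a mixed state when applied to part of an entangled system, since $\mathcal I \otimes \mathcal E$ maps $\ket{\phi^+}\bra{\phi^+}$ to $\frac{\mathbb I}{2} \otimes \ket\psi\bra\psi$.

```python
# The counterexample in numbers (my sketch): E is linear and pure-to-pure,
# but I x E applied to |phi+><phi+| yields a mixed state.

import numpy as np

d = 2
psi = np.array([1.0, 0.0])
P_psi = np.outer(psi, psi)                       # |psi><psi|

def E(rho):
    """The discard-and-prepare map E(rho) = |psi><psi| tr(rho)."""
    return P_psi * np.trace(rho)

print(np.allclose(E(np.outer([0., 1.], [0., 1.])), P_psi))  # pure in, pure out

phi_plus = np.array([1.0, 0, 0, 1]) / np.sqrt(2)
rho_AB = np.outer(phi_plus, phi_plus)            # |phi+><phi+|, a pure state

# I x E discards subsystem B and prepares |psi> there, so the output
# is rho_A x |psi><psi|, with rho_A the reduced state of A:
rho_A = rho_AB.reshape(d, d, d, d).trace(axis1=1, axis2=3)
out = np.kron(rho_A, P_psi)

purity = np.trace(out @ out).real
print(purity)   # 0.5 < 1: the output is mixed
```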

Luckily, the fix is easy: we just need to add the analogue of the third assumption used in the derivation of general quantum operations. If we assume that a pure quantum operation

1. Is linear.
2. Maps pure quantum states to pure quantum states.
3. Still maps pure quantum states to pure quantum states when applied to a part of a quantum system.

then we can prove that pure quantum operations are just unitaries[2]Or actually isometries, to be more precise. But the distinction between isometries and unitaries is not interesting in this case, as one can simply increase the dimension of the input Hilbert space and complete the isometry into a unitary.. Since the proof is simple, I’m going to show it in full.

Let $\mathcal F$ be the pure quantum operation we are interested in. If we apply it to the second subsystem of a maximally entangled state, $\ket{\phi^+} = \frac1{\sqrt d}\sum_{i=1}^d \ket{ii}$, by assumption 3 the result will be a pure state, which we call $\ket{\varphi}$. In symbols, we have
\[ \mathcal I \otimes \mathcal F (\ket{\phi^+}\bra{\phi^+}) = \ket{\varphi}\bra{\varphi}, \]where $\mathcal I$ represents doing nothing to the first subsystem. Now the beautiful thing about the maximally entangled state is that if $\mathcal F$ is a linear map then $\mathcal I \otimes \mathcal F (\ket{\phi^+}\bra{\phi^+})$ contains all the information about $\mathcal F$. In fact, if we know $\mathcal I \otimes \mathcal F (\ket{\phi^+}\bra{\phi^+})$ we can know how $\mathcal F$ acts on any matrix $\rho$ via the identity
\[ \mathcal F (\rho) = d\, \operatorname{tr}_\text{in} [(\rho^T \otimes \mathbb I)\, \mathcal I \otimes \mathcal F (\ket{\phi^+}\bra{\phi^+})], \]where the factor $d$ compensates for the normalization of $\ket{\phi^+}$.
This is the famous Choi-Jamiołkowski isomorphism[3]It might be interesting to check the identity if you don’t already know it. It helps to use the representation $\mathcal F(\rho) = \sum_i A_i \rho B_i$, which is valid for any linear map.. Now let’s use the fact that the result $\ket{\varphi}\bra{\varphi}$ is a pure state. If we write it down in the computational basis
\[\ket\varphi = \sum_{i,j=1}^d \varphi_{ij} \ket{i j}, \]we see that if we define a matrix $\Phi$ with elements $\Phi_{ij} = \varphi_{ji} \sqrt d$ then $\ket\varphi = \mathbb I \otimes \Phi \ket{\phi^+}$[4]This is sometimes known as the pure Choi-Jamiołkowski isomorphism, so
\[ \mathcal I \otimes \mathcal F (\ket{\phi^+}\bra{\phi^+}) = (\mathbb I \otimes \Phi) \ket{\phi^+}\bra{\phi^+} (\mathbb I \otimes \Phi^\dagger).\]
Using the identity above we have that
\[ \mathcal F(\rho) = \Phi \rho \Phi^\dagger, \]and since $\operatorname{tr}(\mathcal F(\rho)) = 1$ for every $\rho$ we have that $\Phi^\dagger\Phi = \mathbb I$, so $\Phi$ is an isometry. If in addition we demand that $\mathcal F(\rho)$ has the same dimension as $\rho$, then $\Phi$ must be a square matrix, and therefore has a right inverse which is equal to its left inverse, so $\Phi$ is a unitary.
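Since the proof is constructive, it is easy to check numerically. Here is a numpy sketch of mine, taking $\mathcal F(\rho) = U\rho U^\dagger$ for a random unitary $U$; the factor $d$ in the inversion formula again comes from the normalization of $\ket{\phi^+}$:

```python
# Numerical check of the proof (my sketch), with F(rho) = U rho U^dagger
# for a random unitary U.

import numpy as np

d = 3
rng = np.random.default_rng(0)
U, _ = np.linalg.qr(rng.normal(size=(d, d)) + 1j * rng.normal(size=(d, d)))

phi_plus = np.eye(d).reshape(d * d) / np.sqrt(d)   # (1/sqrt d) sum_i |ii>
varphi = np.kron(np.eye(d), U) @ phi_plus          # (I x F)|phi+> is pure
choi = np.outer(varphi, varphi.conj())             # I x F (|phi+><phi+|)

def F(rho):
    """F(rho) = d tr_in[(rho^T x I) (I x F)(|phi+><phi+|)]."""
    M = np.kron(rho.T, np.eye(d)) @ choi
    return d * M.reshape(d, d, d, d).trace(axis1=0, axis2=2)

# Phi_ij = sqrt(d) varphi_ji recovers the isometry (here: U itself):
Phi = np.sqrt(d) * varphi.reshape(d, d).T

A = rng.normal(size=(d, d)) + 1j * rng.normal(size=(d, d))
rho = A @ A.conj().T
rho = rho / np.trace(rho).real                     # a random density matrix

print(np.allclose(F(rho), U @ rho @ U.conj().T))   # True
print(np.allclose(Phi.conj().T @ Phi, np.eye(d)))  # True: Phi is an isometry
```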

This result is so amazing, so difficult, and so ground-breaking that the referees allowed me to include it as a footnote in my most recent paper without bothering to ask for a proof or a reference. But joking aside, I’d be curious to know if somebody has already written this down, as a quick search through the textbooks revealed nothing.

But how about Wigner’s theorem, I hear you screaming. Well, Wigner was not concerned with deriving what were the quantum operations, but what were the symmetry transformations one could apply to quantum states. Because of this he did not assume linearity, which was not relevant to him (and in fact would make his theorem wrong, as one can have perfectly good anti-linear symmetries, such as time reversal). Also, he assumed that symmetry transformations preserve inner products, which is too technical for my purposes.

But clearly this is the wrong way to formulate the question, as there are interesting things to be said about the probabilities of infinite sequences of coin tosses. The situation is analogous to uniformly sampling real numbers from the $[0,1]$ interval: the probability of obtaining any specific number is just 0. The solution, however, is simple: we ask instead what is the probability of obtaining a real number in a given subinterval. The analogous solution works for the case of coin tosses: instead of asking the probability of a single infinite sequence, one can ask the probability of obtaining an infinite sequence that starts with a given finite sequence.

To be more concrete, let’s say that the probability of obtaining Heads in a single coin toss is $p$, and for brevity let’s denote the outcome Heads by 1 and Tails by 0. Then the probability of obtaining the sequence 010 is $p(1-p)^2$, which is the same as the probability of obtaining the sequence 0100 or the sequence 0101, which is the same as the probability of obtaining a sequence in the set {01000, 01001, 01010, 01011}, which is the same as the probability of obtaining an infinite sequence that starts with 010.

There is nothing better to do with infinite sequences of zeroes and ones than mapping them into a real number in the interval $[0,1]$, so we shall do that. The set of infinite sequences that start with 010 are then very conveniently represented by the interval $[0.010,0.010\bar1]$, also known as $[0.010,0.011]$ for those who do not like infinite strings of ones, or $[0.25,0.375]$ for those who do not like binary. Saying then that the probability of obtaining a sequence in $[0.010,0.010\bar{1}]$ is $p(1-p)^2$ is assigning a measure to this interval, which we write as
\[ \rho([0.010,0.010\bar{1}]) = p(1-p)^2 \]
Now if we can assign a sensible probability to every interval contained in $[0,1]$ we can actually extend it into a proper probability measure over the set of infinite sequences of coin tosses using standard measure-theoretical arguments. For me this is the right answer to the question posed in the title of this post.

So, how do we go about assigning a sensible probability to every interval contained in $[0,1]$? Well, the argument of the previous paragraph can clearly be extended to any interval of the form $[k/2^n, (k+1)/2^n]$. We just need to write $k$ in the binary basis, padded with zeroes on the left until it reaches $n$ binary digits, and count the number of 0s and 1s. In symbols:
\[ \rho\left(\left[\frac{k}{2^n}, \frac{k+1}{2^n}\right]\right) = p^{n_1(k,n)}(1-p)^{n_0(k,n)} \]
The extension to any interval where the extremities are binary fractions is straightforward. We just break them down into intervals where the numerators differ by one and apply the previous rule. In symbols:
\[ \rho\left(\left[\frac{k}{2^n}, \frac{l+1}{2^n}\right]\right) = \sum_{i=k}^{l} p^{n_1(i,n)}(1-p)^{n_0(i,n)} \]
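In Python the dyadic rule is a short loop over bit counts (the function name is mine):

```python
# The dyadic rule in code (function name mine): count ones and zeroes
# in the n-bit expansions of k, ..., l.

def rho_dyadic(k, l, n, p):
    """Measure of [k/2^n, (l+1)/2^n] under the coin-toss measure."""
    total = 0.0
    for i in range(k, l + 1):
        ones = bin(i).count("1")             # n1(i, n)
        total += p ** ones * (1 - p) ** (n - ones)
    return total

p = 0.3
print(rho_dyadic(2, 2, 3, p))           # p (1-p)^2, the sequence 010
print(rho_dyadic(0, 2 ** 6 - 1, 6, p))  # the whole of [0,1]: measure 1
```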
We are essentially done, since we can approximate any real number as well as we want by using binary fractions[5]That is, we can write real numbers in the binary basis. But life is more than just binary fractions, so I’ll show explicitly how to deal with the interval
\[[0,1/3] = [0,0.\bar{01}] \]

The key thing is to choose a nice sequence of binary fractions $a_n$ that converges to $1/3$. It is convenient to use a monotonically increasing sequence, because then we don’t need to worry about minus signs. If furthermore the sequence starts with $0$, then \[ [0,1/3] = \bigcup_{n\in \mathbb N} [a_n,a_{n+1}] \] and
\[ \rho([0,1/3]) = \sum_{n\in \mathbb N} \rho([a_n,a_{n+1}]) \] An easy sequence that does the job is $(0,0.01,0.0101,0.010101,\ldots)$. It lets us write the interval as
\[ [0,1/3] = [0.00, 0.00\bar{1}] \cup [0.0100, 0.0100\bar{1}] \cup [0.010100, 0.010100\bar{1}] \cup \ldots \] which gives us a simple interpretation of $\rho([0,1/3])$: it is the probability of obtaining a sequence of outcomes starting with 00, or 0100, or 010100, etc. The formula for the measure of $[a_n,a_{n+1}]$ is also particularly simple:
\[ \rho([a_n,a_{n+1}]) = p^{n-1}(1-p)^{n+1} \] so the measure of the whole interval is just a geometric series:
\[ \rho([0,1/3]) = (1-p)^2\sum_{n\in\mathbb N} \big(p(1-p)\big)^{n-1} = \frac{(1-p)^2}{1-p(1-p)} \]
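A quick numerical sanity check of mine for the geometric series; for $p=1/2$ we should recover the length of the interval, $1/3$:

```python
# Sanity check (mine) of the geometric series for rho([0, 1/3]).

def rho_one_third(p, terms=200):
    return sum(p ** (n - 1) * (1 - p) ** (n + 1) for n in range(1, terms + 1))

p = 0.25
print(rho_one_third(p))                   # matches the closed form:
print((1 - p) ** 2 / (1 - p * (1 - p)))   # ~0.6923
print(rho_one_third(0.5))                 # ~0.3333: Lebesgue length of [0,1/3]
```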

It might feel like something is missing because we haven’t examined irrational numbers. Well, not really, because the technique used to do $1/3$ clearly applies to them, as we only need a binary expansion of the desired irrational. But still, this is not quite satisfactory, because the irrationals that we know and love like $1/e$ or $\frac{2+\sqrt2}4$ have a rather complicated and as far as I know patternless binary expansion, so we will not be able to get any nice formula for them. On the other hand, one can construct some silly irrationals like the binary Liouville constant
\[ \ell = \sum_{n\in\mathbb N} 2^{-n!} \approx 0.110001000000000000000001\]whose binary expansion is indeed very simple: every $n!$th binary digit is a one, and the rest are zeroes. The measure of the $[0,\ell]$ interval is then
\[ \rho([0,\ell]) = \sum_{n\in \mathbb N} \left(\frac{p}{1-p}\right)^{n-1} (1-p)^{n!}, \]which I have no idea how to sum (except for the case $p=1/2$ ;)
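Even without a closed form we can sum the series numerically. As a check of mine, for $p=1/2$ the measure is the Lebesgue measure, so the sum must reproduce $\ell$ itself:

```python
# Summing the Liouville series numerically (my check). For p = 1/2 the
# measure is Lebesgue measure, so rho([0, ell]) must equal ell itself.

from math import factorial

def ell(terms=8):
    return sum(2.0 ** -factorial(n) for n in range(1, terms + 1))

def rho_liouville(p, terms=8):
    return sum((p / (1 - p)) ** (n - 1) * (1 - p) ** factorial(n)
               for n in range(1, terms + 1))

print(ell())                 # ~0.76562506
print(rho_liouville(0.5))    # the same number
print(rho_liouville(0.25))   # the sum the post asks about, for p = 1/4
```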

But I feel that something different is still missing. We have constructed a probability measure over the set of coin tosses, but what I’m used to thinking of as “the probability” for uncountable sets is the probability density, and likewise I’m used to visualizing a probability measure by making a plot of its density. Maybe one can “derive” the measure $\rho$ to obtain a probability density over the set of coin tosses? After all, the density is a simple derivative for well-behaved measures, or the Radon-Nikodym derivative for more naughty ones. As it turns out, $\rho$ is too nasty for that. The only condition that a probability measure needs to satisfy in order to have a probability density is that it attributes measure zero to every set of Lebesgue measure zero, and $\rho$ fails this condition. To show that, we shall construct a set $E$ such that its Lebesgue measure $\lambda(E)$ is zero, but $\rho(E)=1$.

Let $E_n$ be the set of infinite sequences that start with an $n$-bit sequence that contains at most $k$ ones[2]So for example if $k=1$ then $E_3$ is the set of sequences that start with 000, 001, 010, or 100. Then
\[ \rho(E_n) = \sum_{i=0}^k \binom ni p^i (1-p)^{n-i} \] and
\[ \lambda(E_n) = 2^{-n} \sum_{i=0}^k \binom ni \] These formulas might look nasty if you haven’t fiddled with entropies for some time, but they actually have rather convenient bounds, which are valid for $p < k/n < 1/2$:
\[ \rho(E_n) \ge 1 - 2^{-n D\left( \frac kn || p\right)} \] and
\[ \lambda(E_n) \le 2^{-n D\left( \frac kn || \frac 12\right)} \] where $D(p||q)$ is the relative entropy of $p$ with respect to $q$. They show that if $k/n$ is smaller than $1/2$ then $\lambda(E_n)$ is rather small (loosely speaking, the number of sequences whose fraction of ones is strictly less than $1/2$ is rather small), and that if $k/n$ is larger than $p$ then $\rho(E_n)$ is rather close to one (so again loosely speaking, what this measure does is weight the counting of sequences towards $p$ instead of $1/2$: if $k/n$ were smaller than $p$ then $\rho(E_n)$ would also be rather small).
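These bounds are easy to corroborate with exact sums (my computation, using the standard library’s `math.comb`):

```python
# Exact values of rho(E_n) and lambda(E_n) (my computation) showing the
# squeeze: with p < k/n < 1/2 fixed, rho(E_n) -> 1 and lambda(E_n) -> 0.

from math import comb

def rho_En(n, k, p):
    return sum(comb(n, i) * p ** i * (1 - p) ** (n - i) for i in range(k + 1))

def lam_En(n, k):
    return sum(comb(n, i) for i in range(k + 1)) / 2 ** n

p = 0.25
for n in (10, 100, 1000):
    k = int(n * (p + 0.5) / 2)    # k/n ~ 0.375, between p and 1/2
    print(n, rho_En(n, k, p), lam_En(n, k))
```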

If we now fix $k/n$ in this sweet range (e.g. by setting $k = \lfloor n(p + 0.5)/2\rfloor$)[3]I know, $k/n$ is not actually fixed, it is just a sequence that converges to $(p + 0.5)/2$, but come on. then
\[ E = \bigcap_{i \in \mathbb N} \bigcup_{n \ge i} E_n,\]
is the set we want, some weird kind of limit of the $E_n$ (its limit superior). Then I claim, skipping the boring proof, that
\[ \rho(E) = 1 \]and
\[ \lambda(E) = 0 \]

But don’t panic. Even without a probability density, we can still visualize a probability measure by plotting its cumulative distribution function
\[ f(x) = \rho([0,x]) \]which for $p = 1/4$ is this cloud-like fractal:
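Computing $f(x)$ is simple if you walk down the binary expansion of $x$: every time a 1-bit appears, the sequences that agree with the preceding bits but have a 0 in its place lie entirely below $x$. A sketch of mine:

```python
# f(x) = rho([0, x]) via the binary expansion of x (my algorithm): each
# 1-bit of x contributes the measure of the matching prefix with that
# bit flipped to 0.

def cdf(x, p, bits=60):
    """The cumulative distribution function f(x) = rho([0, x])."""
    total = 0.0
    weight = 1.0              # measure of the current prefix
    for _ in range(bits):
        x *= 2
        if x >= 1:            # this bit of x is 1
            x -= 1
            total += weight * (1 - p)   # prefix + 0 lies entirely below x
            weight *= p
        else:                 # this bit of x is 0
            weight *= 1 - p
    return total

p = 0.25
print(cdf(0.5, p))     # 1 - p = 0.75
print(cdf(1 / 3, p))   # matches (1-p)^2 / (1 - p(1-p)) from before
```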

Often people ask me why I’m not more open-minded about ideas that defy the scientific consensus. Maybe global warming is just a conspiracy? Maybe Bell’s theorem is in fact wrong? Maybe the EmDrive does provide thrust without using propellant? Maybe the E-Cat can do cold fusion? I mean, it is not logically impossible for some outsider to be correct while the entire scientific community is wrong. Wasn’t Galileo burned at the stake (sic) for defying the scientific consensus? Why should I then dismiss this nonsense outright, without reading it through and considering it carefully?

Well, for starters the scientific method has advanced a lot since the time of Galileo. Instead of asserting dogma we are busy looking at every tiny way experiment can deviate from theory. And if you do prove the theory wrong, you do not get burned at the stake (sic), but get a Nobel Prize (like the prize given for the discovery of neutrino oscillations in 2015). So I’m naturally very suspicious of outsiders claiming to have found glaring mistakes in the theory.

But the real problem is the sheer number of would-be Galileos incessantly spamming researchers with their revolutionary theories (despite not being exactly famous, I get to join the fun because they usually write to every academic email address they find online. I can only wonder what Stephen Hawking’s inbox looks like). It is already a lot of work to keep up to date with the serious papers in my field. Imagine if I also had to read every email that proves Einstein wrong!

Without further ado, I’d like to illustrate this point by showing here the most entertaining crackpots that have spammed me:

Probably the most well-known is Gabor Fekete, who has a truly amazing website to expound his theories (don’t forget to press Ctrl or click with the right button of the mouse while you’re there!). Apparently he doesn’t like the square root in the Lorentz factor, and has a nice animation showing it being erased. If you do that I guess you’ll be able to explain all of physics with eight digits of accuracy. He has recently taken to spoofing his emails to make it look like they were sent by Nobel laureates, probably thinking that his theories would be accepted if they came from a famous source. While the forgery itself was well-made (one needs to look carefully at the source code of the email to detect it), the content of the email kind of gives it away. Maybe if he had spent his time studying physics instead of the SMTP protocol…

Another persistent spammer is Sorin Cosofret, who started a newsletter about his theories to unwilling subscribers. They are about classical electromagnetism, relativity, quantum mechanics, planetary dynamics, cosmology, chemistry… apparently everything is wrong, but he knows how to correct it. He also has a website that, if not as flashy as Gabor Fekete’s, is at least available in Romanian, English, French, German, and Spanish.

A more aggressive one is stefan:sattler, who has a problem with the known laws of planetary mechanics, and wants the scientific community to help in publicising his “Sattler’s Law of planetary mechanics”. After sending 5 emails in one month he lost his patience, and gave us 48 hours to do it, threatening to publish all our names and email addresses if we don’t (you know, the names and email addresses that are publicly available). He told us

Go now and REPENT – go now and try to offer redemption for the guilt and responsibility you all have loaded upon your shoulders.

Time is ticking – you have 48 hours – the JUDGEMENTS ARE BEING WRITTEN RIGHT NOW…..

I haven’t heard from him since.

More recently, I got an email from an anonymous crackpot who maintains a prolific YouTube channel in Croatian dedicated to showing that the Earth is flat. It was entertaining to see that the crackpot sent me emails to both my University of Vienna address and to my University of Cologne address, each signed as a different person pretending to be interested in whether the videos were correct.

If you want to defy the scientific consensus, first study it for a few years. Then publish a peer-reviewed paper (reputable journals do accept some pretty outlandish stuff). Then I’ll listen to you.