1. By the theorem of total probability, if Qi is the proposition that the chance of p is xi, then C(p) = ∑i C(Qi)C(p|Qi). Suppose that one has arrived at one's current credence C by conditionalising a reasonable initial function on admissible evidence; then if the PP is true (and the NP is approximately true), it follows that one's credence C(p) is equal to ∑i C(Qi)xi. In other words, one's credence in p is simply one's subjective expectation of the chance of p (always assuming one has no inadmissible evidence).
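Arithmetically, the note's claim is just a weighted average. A minimal sketch, with made-up credences and chance hypotheses:

```python
# Three hypothetical chance hypotheses Q_i about p, with credences C(Q_i),
# and the chance x_i that each hypothesis assigns to p.
credences = [0.2, 0.5, 0.3]   # C(Q_i)
chances   = [0.1, 0.5, 0.9]   # x_i

# By the PP, C(p | Q_i) = x_i, so total probability gives
# C(p) = sum_i C(Q_i) * x_i: the subjective expectation of the chance.
c_p = sum(c * x for c, x in zip(credences, chances))
print(round(c_p, 2))  # 0.2*0.1 + 0.5*0.5 + 0.3*0.9 = 0.54
```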

2.
The proposition expressed
by the relevant utterance of the sentence, ‘The coin lands
heads’ does exist even when the coin doesn't land heads. Can it
be the bearer of the chance? It seems implausible, since the
proposition exists necessarily, and is intrinsically qualitatively
identical, even when the chance varies; so the chance cannot supervene
on properties of this proposition.

3.
This conclusion, that
chanciness is a feature of a generating process, is resisted by those
philosophers, like von Mises, who reject single-case chance. If von
Mises' frequentism were the only theoretically adequate account of
chance, there would be some force to his contention, but that view is
widely believed to be
inadequate as an account of chance—see the discussion at
supplement
A.3.
Another argument
offered against single-case chance is Milne's generalisation of
Humphreys 1985, ‘directed against any realist single-case
interpretations of probability’ (Milne 1985: 130). His argument
basically is that single-case conditional chances only exist when the
conditioned event is not determined at the time of the conditioning
event, but that this situation is very rare and would make almost all
the conditional chances we use illegitimate. Milne's argument
rests in part on making a close connection between chance and
determinism, a controversial issue we will return to below
(§7).
But a more immediate possible line of objection to his argument
is that Milne takes a feature of one theory of single-case
chance—the causal tendency view of Giere (1973), which requires
that conditional chances be understood as the chance of the conditioned
event, given that the conditioning event has occurred—and
applies it to theories which make no such assumption. In
particular, he applies it to views according to which chances vary
with time, even though Milne makes no explicit reference to any
temporal parameter on the chance functions he uses in his argument.
These possible responses to Milne's argument make it reasonable to conclude
that single-case chance is consistent and an integral part of our
ordinary beliefs about chance.

4.
This does not apply to Bohmian mechanics, which is a different
theory from orthodox elementary quantum mechanics, though it makes the
same experimental predictions (Albert 1994, chapter ). This theory has
a determinate prior state for every quantity; it reproduces the
observed predictions of quantum mechanics by permitting non-local
interactions between arbitrarily separated parts of entangled
systems.

5.
This inference
shouldn't be taken too far; even fundamental theories which do
not mention chances may nevertheless be true in chancy
worlds—just as there are people despite the fact that our
fundamental physics doesn't mention people (Eagle 2011:
§2).

6.
More formally, a sequence is Borel normal if the frequency of every finite string σ in the first n digits of the sequence approaches 1/2^|σ| as n → ∞. This obviously entails the strong law.
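The definition is easy to check empirically on an initial segment. A sketch, using pseudorandom bits as a stand-in for a normal sequence (an assumption of the demo, not part of the note):

```python
import random

def block_frequency(bits, sigma, n):
    """Overlapping frequency of the string sigma among the first n digits of bits."""
    prefix = bits[:n]
    k = len(sigma)
    hits = sum(1 for i in range(n - k + 1) if prefix[i:i + k] == sigma)
    return hits / (n - k + 1)

random.seed(0)
bits = "".join(random.choice("01") for _ in range(100000))

# For sigma = '01' we have |sigma| = 2, so the frequency should be near 1/2^2 = 0.25.
f = block_frequency(bits, "01", 100000)
print(round(f, 2))
```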

7.
Von Mises himself gives
a more general characterisation, as he is concerned to define the
probability of an arbitrary type of outcome in an arbitrary sequence of
outcomes, so he insists only that each type of outcome should have a
well defined limit frequency in the overall sequence, and that
frequency should remain constant in all admissibly selected
subsequences, whether or not that frequency is ½.

8.
We must restrict this proposal to include only those computable
functions which take the value 1 an infinite number of times, and thus
select only infinite subsequences of the original sequence. The
problem with finite selection rules (those that take the value 1 only
finitely often) is that every such rule will select outcomes drawn
from some finite initial segment of the sequence. But, by the law of
iterated logarithm, which we return to below in connection with the
law of symmetric oscillation, infinitely many initial segments of a
random sequence will have more 0s than 1s (and infinitely many will
have more 1s than 0s). A selection rule that draws from just the
first n outcomes in a sequence, for infinitely
many n, will be selecting from a subsequence with the wrong
frequency, and will in turn infinitely often yield a finite
subsequence with the wrong frequency.
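The contrast between infinite and finite selection rules can be sketched as follows; taking every third outcome stands in for a computable selection rule that takes the value 1 infinitely often (the specific rules and numbers here are illustrative assumptions):

```python
import random

random.seed(1)
seq = [random.randint(0, 1) for _ in range(90000)]

# An admissible selection rule taking the value 1 infinitely often:
# select every third outcome. The subsequence is infinite, and its
# frequency of 1s matches that of the whole sequence.
sub = seq[::3]
f_sub = sum(sub) / len(sub)

# A 'finite' selection rule: select only the first 20 outcomes. It draws
# from an initial segment, whose frequency can be far from 1/2.
fin = seq[:20]
f_fin = sum(fin) / len(fin)

print(round(f_sub, 2), f_fin)
```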

10.
The law of symmetric oscillation follows immediately from the law of the iterated logarithm, a celebrated result on the asymptotic behaviour of Sn, which states that almost all sequences have Sn spread around the mean n/2 to an asymptotic limit of (2 log log n)^(1/2)σn, where σn is the standard deviation √n/2. The result holds for much more general kinds of random variables (Kolmogorov 1929; Feller 1945).
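The envelope in the statement above is easy to compute; the comparison against one simulated coin sequence below is purely illustrative (the LIL is an asymptotic, almost-sure claim about the limsup, not a bound at any single n):

```python
import math
import random

def lil_envelope(n):
    """The LIL envelope (2 log log n)^(1/2) * sigma_n, with sigma_n = sqrt(n)/2."""
    sigma_n = math.sqrt(n) / 2
    return math.sqrt(2 * math.log(math.log(n))) * sigma_n

# Simulate S_n for one fair-coin sequence and compare its deviation from
# the mean n/2 against (a multiple of) the envelope.
random.seed(2)
n = 100000
s_n = sum(random.randint(0, 1) for _ in range(n))
deviation = abs(s_n - n / 2)
print(deviation <= 2 * lil_envelope(n))
```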

11.
As an interesting aside,
Kolmogorov randomness provides a good explication of
Lewis' notion of a
‘quasi-miracle’, crucial to his
treatment of chancy counterfactuals. Lewis says that a quasi-miracle is
a low-probability event that is also remarkable:

What makes a quasi-miracle is not improbability per se but
rather the remarkable way in which the chance outcomes seem to conspire
to produce a pattern. (Lewis, 1979a: 60)

The classic case is that of a monkey typing Hamlet by randomly
striking the keyboard; that is an event remarkable but no more
improbable than any other sequence of characters the monkey might have
produced. The probability of producing a given sequence of length
n is 1/2^n; this is equal for any
sequence of the given length, orderly or not. So the probability does
not account for the fact that different sequences of the same length
can differ with respect to their remarkableness. The orderliness of a
sequence σ may be defined as
1/2^C(σ); orderly sequences are
those that exhibit patterns, and for such a patterned sequence
C(σ) will be low, and
1/2^C(σ) correspondingly
higher. We can then define the remarkableness of σ as
2^|σ|/2^C(σ)—i.e.,
the orderliness of σ divided by its probability. If
σ is both orderly and low-probability, it will be highly
remarkable. So we might say that the occurrence of a remarkable event
is a quasi-miracle. (This suggestion dovetails nicely with the revised
Lewis-style approach to counterfactuals, using what amounts to a
tweaked notion of quasi-miraculousness, in Williams 2008:
§3.)
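Since C is uncomputable, any executable illustration has to substitute a computable proxy. The sketch below uses zlib's compressed size as a crude stand-in for C (an assumption of the demo: zlib only upper-bounds Kolmogorov complexity), and works in log space, where remarkableness 2^|σ|/2^C(σ) becomes |σ| − C(σ) bits:

```python
import os
import zlib

def approx_C(s: bytes) -> int:
    """Crude computable stand-in for Kolmogorov complexity C(s): compressed size
    in bits. (C itself is uncomputable; zlib merely upper-bounds it.)"""
    return 8 * len(zlib.compress(s, 9))

def log2_remarkableness(s: bytes) -> int:
    """log_2 of (orderliness / probability) = |s| - C(s) in bits: high for
    patterned outcomes, near or below zero for incompressible ones."""
    return 8 * len(s) - approx_C(s)

hamlet_like = b"to be or not to be " * 200   # highly patterned 'monkey output'
noise = os.urandom(3800)                     # incompressible bytes, same length

print(log2_remarkableness(hamlet_like) > log2_remarkableness(noise))  # True
```

Both byte strings have the same probability of being typed at random, but only the patterned one scores as remarkable.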

12. To see why, suppose
it were not prefix-free. Then the code of one string σ is a
proper part of the code of another string, τ. Since both
codes start with 1^|σ|0 (for how else could the
code of σ be an initial part of the code of τ?), it follows that σ
and τ must be the same length. But then the code
of τ will be the same length as the code of σ,
contrary to the assumption that the latter code is a proper part of
the former code.
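The encoding and the prefix-freeness check can be sketched directly (the helper name `encode` is mine, not from the text):

```python
def encode(s: str) -> str:
    """Encode a binary string s as 1^|s| 0 s: |s| ones, a 0 marker, then s itself."""
    return "1" * len(s) + "0" + s

# No code is a proper prefix of another: the run of 1s fixes the length of the
# encoded string, so two comparable codes must encode strings of the same
# length, hence have the same total length.
codes = [encode(s) for s in ["0", "1", "00", "01", "10", "110"]]
clash = any(a != b and b.startswith(a) for a in codes for b in codes)
print(clash)  # False
```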

13.
Note the presence in
the upper bound for K(σ) of
C(σ); since C is not a recursive
function, this is not a computable upper bound on K (though since
|σ| is an upper bound on C, substituting it does yield a
computable upper bound). There is a somewhat less
well-behaved computable upper bound on K (Downey and
Hirschfeldt, 2010: §2.12).

14.
Rather than solve the
reference class problem, von Mises proposed to sidestep it entirely,
and to deny that there is any such thing as the single-case chance of a
particular event:

A probability of death is attached to this class of men or
to another class that can be defined in a similar way. We can say
nothing about the probability of death of an individual even if we know
his condition of life and health in detail. The phrase
‘probability of death’, when it
refers to a single person, has no meaning at all for us. (von Mises,
1957: 11)

15.
This is contrary to what some have argued: for example, Earman (1986:
143–4) argues that there is no natural way to extend Kolmogorov
randomness to biased sequences, because biased sequences are
Kolmogorov compressible with respect to the Lebesgue measure. But that
seems to hold ‘compressibility’ to a double standard that we aren't
holding ML-randomness to; the natural generalisation in the main text
brings the two approaches back into parity. I'm indebted to
Christopher Porter for bringing this problem for Earman to my
attention, and for helpful discussion of this generalisation of
Kolmogorov randomness to the case of biased sequences.

16.
The Champernowne
number fails the law of the iterated logarithm, for example (Dasgupta,
forthcoming: §3.4).
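One can see how patterned the binary Champernowne sequence is at finite stages, even though it is normal in the limit: every concatenated block begins with a 1, so initial segments are biased toward 1. A quick sketch (the generator is my own illustrative implementation):

```python
from itertools import count

def champernowne_binary(n_digits):
    """First n_digits of the binary Champernowne sequence: 1 10 11 100 101 ..."""
    parts, total = [], 0
    for i in count(1):
        b = format(i, "b")
        parts.append(b)
        total += len(b)
        if total >= n_digits:
            return "".join(parts)[:n_digits]

s = champernowne_binary(100000)
f = s.count("1") / len(s)
print(round(f, 2))  # noticeably above 1/2 at this n
```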

17.
If we represented the
system by a single bi-infinite sequence, this symbol shift
characterisation would be more obvious.

18.
The pseudorandom
sequences we considered in
§4.5
lacked randomness because the finite seed
entailed repetition in the sequence after some finite period. In the
baker's transformation, the initial seed is a real
number, and the set of reals with an infinite random binary
expansion has measure one (Downey and Hirschfeldt, 2010: Part II). If the initial seed
is a random real in this sense, then the product will be a random
sequence, even though the process is akin to the algorithms which
produce merely pseudorandom sequences (of course since no algorithm can
perfectly represent arbitrary reals, it cannot be the same
algorithm).
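The digit-extracting half of the baker's transformation is just the doubling map x → 2x mod 1, which reads off the seed's binary expansion. The sketch below (my own illustration) shows both the exact behaviour on a representable seed and how a machine float, being a dyadic rational, kills the 'randomness' after finitely many steps, which is the note's point that no algorithm can perfectly represent arbitrary reals:

```python
from fractions import Fraction

def doubling_digits(x, n):
    """Read off n binary digits of x in [0,1) via the doubling map x -> 2x mod 1
    (the expanding, digit-shifting half of the baker's transformation)."""
    digits = []
    for _ in range(n):
        x *= 2
        d = int(x)        # 1 iff x crossed into [1, 2)
        digits.append(d)
        x -= d
    return digits

# For an exactly represented seed, the digits are just its binary expansion:
print(doubling_digits(Fraction(5, 8), 4))   # [1, 0, 1, 0]  since 5/8 = 0.101(2)

# A float seed is a dyadic rational, so the output is eventually all 0s:
print(doubling_digits(0.1, 60)[55:])        # [0, 0, 0, 0, 0]
```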

19.
Consider some small
bundle of initial states S, and some state
s0 ∈ S. Then, for some systems,

∃ε>0 ∀δ>0 ∃s0′∈S ∃t>0 (|s0 − s0′| < δ ∧ |st − st′| > ε).

In fact, for many chaotic systems, all neighbouring trajectories within
the bundle of states diverge exponentially fast.
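The quantified condition can be illustrated with the chaotic logistic map (my choice of example system; the seeds and the tolerance ε = 0.1 are illustrative assumptions):

```python
def logistic_orbit(x, t):
    """Iterate the chaotic logistic map x -> 4x(1-x) for t steps; return the orbit."""
    orbit = [x]
    for _ in range(t):
        x = 4 * x * (1 - x)
        orbit.append(x)
    return orbit

a = logistic_orbit(0.3, 60)
b = logistic_orbit(0.3 + 1e-10, 60)   # a seed within delta = 1e-9 of the first
max_gap = max(abs(p - q) for p, q in zip(a, b))
print(max_gap > 0.1)  # True: some t has |s_t - s'_t| > epsilon despite the tiny delta
```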

20.
Not all chaotic
systems are dissipative; Werndl (2009) argues that this feature, the
display of mixing behaviour, is precisely what chaos more generally
amounts to.

21.
This does not,
despite first appearances, violate the first law of
inertia—for at every instant, the law (in the form,
bodies with no net force acting on them are unaccelerated) is
true. At t = t*, the body
is at rest at the only place on the dome where the forces balance; and
at every t > t*, the
body is accelerating because at every such time the body is at a point
where net force is being exerted. (This does suggest, however, that the
law of inertia in the form bodies continue in uniform motion
(including rest) if not subject to force yields a misleading
dynamical reading that would rule out the dome example.)

22. Norton's footnote: Since all excitation
times T would have to be equally probable, the probability
that the time is in each of the infinitely many time intervals, (0, 1),
(1, 2), (2, 3), (3, 4),… would have to be the same,
so that zero probability must be assigned to each of these intervals.
Summing over all intervals, this distribution entails a zero
probability of excitation ever happening.
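The impossibility Norton gestures at is a failure of countable additivity. Writing c for the common probability of each unit interval and T for the excitation time:

```latex
\[
1 = P\bigl(T \in (0,\infty)\bigr)
  = \sum_{n=0}^{\infty} P\bigl(T \in (n, n+1)\bigr)
  = \sum_{n=0}^{\infty} c
  = \begin{cases}
      0 & \text{if } c = 0,\\
      \infty & \text{if } c > 0,
    \end{cases}
\]
```

so no uniform, countably additive distribution over the excitation time exists.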

23. Eagle (2005:
§4–5) suggests that a system is predictable iff,
conditional on what we know about the past states of the system, and
knowing the laws, we may have a posterior credence in future states
that is closer to the truth than our prior credence (where closeness
to the truth is characterised by having a more inaccuracy-minimising
credence, as in Joyce 1998). Conditioning on what we know about a system,
rather than everything that is true, makes unpredictability
appropriately relative to us, especially given the existence of
unknown truths. It follows that a system will be unpredictable to the
extent that information we can obtain about the past states of the
system is, or is very close to, probabilistically independent of
future states, holding fixed our knowledge of the laws. Werndl (2009:
§5) endorses a similar characterisation when she proposes that
approximate probabilistic irrelevance is the hallmark of
unpredictability, and proves that mixing systems exhibit such
unpredictability. If randomness is unpredictability, then, sequences
produced by random processes will be at least approximately Bernoulli,
which does explain the appeal of KML-randomness. Berkowitz et
al. 2006 argue that in fact an epistemic conception of randomness
as unpredictability is the only way to understand the ergodic
hierarchy of ergodicity, mixing, and Bernoulli properties of
systems.

24.
The existence of such
a ‘prediction algorithm’ should
not be taken to mean that the system really is predictable. We could
never know that the algorithm made correct predictions until after they
had come true, so they would hardly have the epistemic status that a
reliable prediction should have for us, namely, to guide our future
expectations.

25.
An even more radical
denial of universalism comes from the debate over free will.
Libertarians believe that our free will is not determined by the past
states of the universe, but also that our exercises of will are not
purely by chance—rather, we make them happen (through the
determinations of the will, not the determination of past history). If
this view is coherent, it may provide a case where there is
indeterminism but no chance.

26.
Admittedly, showing
that a sequence is vM-random is not yet to show that it is KML-random.
Yet the role of effective tests in
Martin-Löf's construction ensures that
the non-random sequences will be effectively determined to satisfy some
measure zero property. Such effective tests are specified by properties
low down on the arithmetical hierarchy (ML-randomness is also known as
Σ^0_1-randomness because passing an ML-randomness test can be specified
as not being in the intersection of a sequence of Σ^0_1 classes), so it
is plausible that some specification higher up the hierarchy (as
Humphreys' is, at least Π^0_2) will define a deterministic sequence
which does not violate any effective measure one property.

Notes to Supplement A. Basic Principles About Chance

1.
One recent
objection to PP worth noting arises from apparent direct
counterexamples to the PP, derived from the existence of contingent a
priori truths, which illustrate the possibility of a formal mismatch
between chance functions and credence functions (Hawthorne and
Lasonen-Aarnio, 2009; Williamson, 2006). This example gives the flavour
of both arguments: consider the sentence ‘actually A
iff
A’. This sentence is a priori true, and so should get
credence 1. Yet the sentence is contingent. Suppose that A is
actually true; then according to the standard logic of the
‘actually’ operator (Davies and Humberstone, 1980),
‘Actually A iff A’ is true in exactly
the same possibilities as A (this is because
‘Actually A’ is necessarily true if true). So
if A is a statement with a non-trivial chance,
‘actually A iff A’ also has a
non-trivial chance, differing from the credence one should have in it,
which will pose an obvious problem for the PP. One response to this
kind of argument is to take the entities in the domain of the chance
and credence functions to be propositions rather than sentences (as
Lewis originally did in formulating the PP), and suggest that the
problem does not arise because the contingent a priori only emerges at
the level of sentences. To have different credences in A and
‘Actually A iff A’, even though they
express the same proposition when A is actually true, is to
violate some widely accepted (though not uncontroversial) norms on
rational credence. A sentential account of the a priori (such as the
two-dimensionalist account offered in Stalnaker 1978) can accept this
norm, accept the PP, and nevertheless offer some kind of explanation
of the apparent counterexample. Whether this sort of response
succeeds remains a matter for debate.

It is also worth noting that de Finetti (1974), for one, did deny that
there really are chances, hoping to do everything with exchangeable
credence alone; this may not be right, but perhaps Lewis is too strong
in calling this program ‘silly’.

2.
Others have argued
that the original puzzle only arises because even the original PP
inappropriately conditionalises the credence on evidence E
which includes information about the chances. Ismael (forthcoming)
argues that the real principle to adopt is the following, where
Ht is just the history up to t:

(UPP) C(p | Ht) = Cht(p)

This principle also is not susceptible to undermining, because one
never conditionalises on the theory of chance (assuming that the past
history itself does not fix the chances). One won't ever in
general know the right hand side of this equation; but by the theorem
of total probability and general principles about current estimates of
unknown quantities, it can be estimated as the weighted sum of the
chances assigned by various future histories, weighted by your
credences in those histories. Ismael's final recommendation is
‘that you should adjust credence in A to your best
estimate of the chances’.

3.
Schaffer endorses
a strengthened version of the BCP, which he calls the Realization
Principle (RP): this is the claim that if the present chance of
A is greater than zero, there is a world where A is
true which matches ours in history and natural laws (not just laws of
chance, but all laws).

4.
There is, however, an argument that something
like the BCP can be defended on elementary grounds. It is an axiom of
probability theory that tautologies should receive probability one;
Hájek suggests that necessary truths also should receive
probability one. This is perhaps debatable in the case of subjective
probability, for perhaps it is sometimes possible to rationally not be
fully confident in a necessary truth (for example, if identity
statements are necessary if true, it may yet be rational on occasion to
doubt a true identity claim). But in the case of objective chance, it
is hard to dispute that if p is necessary, then the chance of
p should be 1. By substitution, necessarily ¬p
entails Ch(¬p) = 1. Equivalently, necessarily: if it is
not possible that p, then the chance of
p is zero. By contraposition, we get that if the chance of
p is non-zero, p is possible. While
Hájek's argument seems sound, its
conclusion is notably weaker than the BCP as defended by Bigelow et
al., Mellor, Schaffer, or Eagle. For
Hájek's claim is just that something
with a non-zero chance is true at some possible world—it
says nothing about whether that possibility should be one which is
systematically similar to actuality with respect to the facts that
ground the actual chance. But all the other formulations do entail that
the possibilities where the outcome does happen should be similar with
respect to the actual past, or to the actual properties of the chance
device. The BCP proper is thus stronger than the relatively trivial
conclusion of Hájek's argument, and
correspondingly more vulnerable to objections.
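Hájek's derivation can be set out in four steps, in the note's own terms:

```latex
\begin{align*}
(1)\;& \Box p \rightarrow \mathrm{Ch}(p) = 1
  && \text{necessary truths receive chance 1}\\
(2)\;& \Box\neg p \rightarrow \mathrm{Ch}(\neg p) = 1
  && \text{substituting $\neg p$ for $p$ in (1)}\\
(3)\;& \neg\Diamond p \rightarrow \mathrm{Ch}(p) = 0
  && \text{from (2), since $\Box\neg p \leftrightarrow \neg\Diamond p$ and $\mathrm{Ch}(\neg p) = 1 - \mathrm{Ch}(p)$}\\
(4)\;& \mathrm{Ch}(p) > 0 \rightarrow \Diamond p
  && \text{contraposing (3)}
\end{align*}
```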

5.
The downside is
that his theory of ‘L*-chance’ is subject to the objections
raised by Arntzenius and Hall (2003) that we discussed above.

6.
Though Strevens
(1999) raises an epistemological objection to the effect that, even if
it is a truth about chance, the PP cannot be justified at all, on any
conception of chance. His basic concern is analogous to Hume's
worries about traditional principles of induction—that setting
one's credences equal to the chances is rational only if one is
already convinced that the high objective probability of PP leading to
epistemic success justifies confidence in the PP, i.e., is justified
only if one already accepts the PP. (Hall 2004 develops a more limited
version of this conclusion, arguing that reductionists about chance
can't justify the PP.) Hoefer (2007) responds using
his particular account of chance. Others may well adapt their own
preferred response to the Humean problem of induction, or simply deny
that the justification of the PP matters as much to the theory of
objective chance as its truth—and its truth will do to
secure the frequency-chance connection needed here.

7.
Popper (1959: 34)
took himself to be objecting to frequentism with his example of a
sequence of mixed tosses of differently biased dice, arguing that
‘the frequency theorist is forced to introduce a modification of
his theory… He will now say that an admissible
sequence of events (a reference sequence, a ‘collective’)
must always be a sequence of repeated experiments.’ But von Mises
(1957: 14) had already imposed this requirement, and indeed had already
made progress towards identifying the basis of a collective with a
physical property of the chance setup: ‘The probability of a 6 is
a physical property of a given die and is a property analogous to its
mass, specific heat, or electrical resistance. Similarly, for a given
pair of dice (including of course the total setup) the
probability of a “double 6” is a
characteristic property, a physical constant belonging to the
experiment as a whole and comparable with all its other physical
properties.’
