Epistemic Utility Arguments for Probabilism

First published Fri Sep 23, 2011

Our beliefs come in degrees; we believe some more strongly than
others. We call the strengths of our beliefs our credences.
Suppose I know that a die is to be rolled, and I believe that it will
land on six more strongly than I believe that it will land on an even
number. In this case, we would say that there is something wrong with
my credences, for if it lands on six, it has landed on an even number.
I ought not to believe a proposition more strongly than I believe any
of its logical consequences. This is a consequence of a popular
doctrine in epistemology called Probabilism, which
says that our credences at a given time ought to satisfy the axioms of
the probability calculus (given in detail below). Since this says
something about how our credences ought to be rather than how
they in fact are, we call this an epistemic norm.

Suppose next that I satisfy Probabilism before the
die is rolled, and that I divide my credences equally over the
possible outcomes of the roll: that is, I assign a credence of 1/6 to
each. The die is then rolled and I learn that it landed either on one
or on two. If my credence that the die landed on one becomes 1/3,
whilst my credence that it landed on two becomes 2/3, we would again
say that there is something wrong with my credences. This time it is
not Probabilism that I have violated,
but Conditionalization, an epistemic norm that
governs how we should update our credences when we learn new
information: in the example, the information is about how the die
landed. Roughly speaking, Conditionalization says
that I should remove all my credence from the outcomes that I have
learned did not happen and divide this amongst the remaining outcomes
in proportion to the initial credence I had in each. We state
Conditionalization precisely below.

In this entry, we explore a particular strategy that we might
deploy when we wish to establish an epistemic norm such
as Probabilism
or Conditionalization. It is called epistemic
utility theory, or sometimes cognitive decision
theory. I will use the former. Epistemic utility theory is
inspired by traditional utility theory, so let's begin with a quick
summary of that.

Traditional utility theory (also known as decision theory) explores a
particular strategy for establishing the norms that govern which
actions it is rational for us to perform in a given situation. The
framework for the theory includes states of the world,
actions, and, for each agent, a utility function,
which takes a state of the world and an action and returns a measure
of the extent to which the agent values the outcome of performing that
action at that world. We call this measure the utility of the
outcome at the world. For example, there might be just two relevant
states of the world: one in which it rains and one in which it does
not. And there might be just two relevant actions from which to
choose: take an umbrella when one leaves the house or don't. Then my
utility function will measure how much I value the outcomes of each
action at each state of the world: that is, it will give the value of
being in the rain without an umbrella, being in the rain with an
umbrella, being with an umbrella when there is no rain, and being
without an umbrella when there is no rain. Sometimes, we also have,
for each agent, a credence function, which takes a state of
the world and returns a measure of the agent's credence that the world
is in that state. In our example, the credence function would give my
credence that it will rain and my credence that it will not. With this
framework in hand, we can state certain very general norms of action
in terms of it. For instance, we might say that an agent ought to
perform a particular action if, for every possible state of the world,
that action has the highest utility at that state of the world amongst
all possible actions. This norm is
called Dominance. Or we might say that an agent ought
to perform an action that has maximal expected utility, where the
expected utility of an action is obtained by weighting its utility at
each state of the world by the credence assigned to that state of the
world, and summing. This norm is called
Maximize Expected Utility. Thus, for instance, if I believe that it will rain exactly as strongly as I believe that it won't, then it would be rational for me to take my umbrella if the amount by which I prefer having an umbrella to lacking one when it rains exceeds the amount by which I prefer lacking one to having one when it doesn't rain.
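To make the two norms concrete, here is a minimal Python sketch of the umbrella example. The utility numbers are hypothetical, chosen only so that an umbrella is worth more in the rain than its absence is worth in dry weather; the function names are illustrative and not part of the theory.

```python
# Hypothetical utilities for the umbrella example (illustrative numbers only).
STATES = ("rain", "dry")
ACTIONS = ("take", "leave")
UTILITY = {
    ("take", "rain"): 5,     # in the rain, with an umbrella
    ("leave", "rain"): -10,  # in the rain, without an umbrella
    ("take", "dry"): -1,     # carrying an umbrella needlessly
    ("leave", "dry"): 2,     # unencumbered, and no rain
}


def dominant_actions(utility, actions, states):
    """Actions with the highest utility at every state (Dominance)."""
    return [
        a for a in actions
        if all(utility[(a, s)] >= max(utility[(b, s)] for b in actions)
               for s in states)
    ]


def expected_utility(utility, action, credence):
    """Weight the action's utility at each state by the credence in that state, and sum."""
    return sum(credence[s] * utility[(action, s)] for s in credence)


def meu_actions(utility, actions, credence):
    """Actions with maximal expected utility (Maximize Expected Utility)."""
    eus = {a: expected_utility(utility, a, credence) for a in actions}
    best = max(eus.values())
    return [a for a, eu in eus.items() if eu == best]


credence = {"rain": 0.5, "dry": 0.5}
print(dominant_actions(UTILITY, ACTIONS, STATES))  # []
print(meu_actions(UTILITY, ACTIONS, credence))     # ['take']
```

With these numbers neither action dominates, since taking the umbrella is worse when it is dry, but taking it maximizes expected utility (2.0 against −4.0), because the 15-point benefit of the umbrella in the rain outweighs the 3-point cost of carrying it when dry.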

In epistemic utility theory, the states of the world remain the
same, but the possible actions an agent might perform are replaced by the possible epistemic states she might adopt, and
the utility function is replaced, for each agent, by an epistemic
utility function, which takes a state of the world and a possible epistemic state and returns a measure of the purely epistemic value
that the agent would attach to being in that epistemic state at that
state of the world. So, in epistemic utility theory, we can appeal to epistemic utility to ask which of a range of possible epistemic states it is rational to adopt, just as in traditional utility theory we appeal to utility to ask which of a range of possible actions it is rational to perform.

Again, certain very general norms may be stated, such as the obvious
analogues of Dominance and Maximize Expected Utility from above.
Thus, before the die is rolled, we might ask whether I should adopt an
epistemic state in which I believe that the die will land on six more
strongly than I believe that it will land on an even number. And we
might be able to show that I shouldn't because there is some other
epistemic state I could adopt instead that will have greater epistemic
utility however the world turns out. In this case, we appeal to the
epistemic version of Dominance to show what is wrong with my
credences. This is an example of how epistemic utility theory might
come to justify Probabilism. As we will see,
arguments just like this have indeed been given. And arguments that
appeal to an epistemic version of Maximize Expected Utility have been
given to justify Probabilism
and Conditionalization. In this entry, we explore
these arguments.
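The following Python sketch shows how an epistemic version of Dominance could rule out the die credences described above. It rests on an assumption that this passage does not make: that epistemic disutility is the squared distance between the credences in Six and Even and their truth values (1 for true, 0 for false), summed over just these two propositions. Under that assumption, the hypothetical alternative c = (0.4, 0.4), which is the closest point to b at which the two credences are equal, does strictly better than b = (0.5, 0.3) however the die lands.

```python
# ASSUMPTION: disutility = squared Euclidean distance from the truth values
# of (Six, Even). This is one illustrative measure, not one fixed by the text.

# Truth values of (Six, Even) at each outcome of the roll.
WORLDS = {
    outcome: (1 if outcome == 6 else 0, 1 if outcome % 2 == 0 else 0)
    for outcome in range(1, 7)
}


def disutility(credences, truth_values):
    """Squared distance between a credence vector and a truth-value vector."""
    return sum((c - t) ** 2 for c, t in zip(credences, truth_values))


def dominates(c, b, worlds):
    """True if c has strictly lower disutility than b at every world."""
    return all(disutility(c, w) < disutility(b, w) for w in worlds.values())


b = (0.5, 0.3)  # credence in Six exceeds credence in Even
c = (0.4, 0.4)  # hypothetical alternative with b(Six) <= b(Even)
print(dominates(c, b, WORLDS))  # True
```

Only three truth-value patterns occur, (1, 1), (0, 1) and (0, 0), because the die cannot land six without landing even; that logical fact is what lets c dominate b.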

In formal epistemology, epistemic states are modelled in many
different ways. Given an epistemic agent and a time t, we
might model her epistemic state at t using any of the
following: the set of propositions she believes at t; the set
of propositions she believes at t together with an
entrenchment ordering, which specifies the order in which she
is prepared to abandon her beliefs in the light of conflicting
evidence; her credence function at t; the set of credence
functions, each of which is a precisification of her otherwise vague
credences at t; her upper and lower probability functions at
t. Epistemic utility theory may be applied to any one of these
ways of modelling epistemic states. Whichever we choose, we define an
epistemic utility function to be a function that takes an epistemic
state modelled in this way, together with a state of the world, to a
real number that measures the epistemic utility of having that
epistemic state at that world.

However, the vast majority of work carried out so far in epistemic
utility theory has taken an agent's epistemic state at time t
to be modelled by her credence function at t. And, in any
case, the Bayesian norms that interest us here govern agents modelled
in this way. Thus, we focus on this case.

So, henceforth, we model an agent's epistemic state at t by
her credence function at t. We assume that the set of
propositions about which an agent has an opinion forms an algebra
F: that is, it contains a contradictory
proposition (⊥) and a tautologous proposition (⊤), and it is
closed under disjunction, conjunction, and negation. Thus, if the agent has an opinion only about whether or not it will rain, then the algebra contains the contradictory proposition,
the tautologous proposition, and the propositions It will rain and It will not rain. We then assume that our agent's credence
in a proposition in F can be measured by a
real number. Then her credence function at t is a function
b from F to the real numbers ℜ.
If A is in F, then
b(A) is our agent's credence in A at
t. Throughout, we denote by $\mathcal{B}_\mathcal{F}$ the
set of possible credence functions b :
F → ℜ.

Much of what follows depends on certain assumptions about the
algebra F. For instance, many of the
arguments presented will assume that F is
finite. That is, our credence functions model agents who have opinions
only about a finite set of propositions. They will also often assume
that the propositions that form the algebra are non-indexical
or non-self-locating. By this, we mean that they say something
only about the world; they do not also say something about where or
when in the world the agent is located. Thus, the proposition It
rains in Bristol at noon on 1st January 2011 is
non-indexical, while the proposition It rains here now is
indexical. I will indicate when these assumptions can be
relaxed.

So, an epistemic utility function EU takes a credence
function b, together with a model of the way the world might
be, to a measure of the epistemic utility of having that credence
function if the world were that way. In fact, we call such an epistemic
utility function a global epistemic utility function, since it
applies to the whole epistemic state. A local epistemic
utility function applies only to a certain sort of proper part of an
epistemic state. For instance, a local epistemic utility function might
take a particular credence in a particular proposition, together with a
model of the way the world might be, to a measure of the epistemic
utility of having that credence in that proposition if the world were
that way.

Although it is not an essential feature of epistemic utility theory,
in this article, we model a way the world might be as a classically
consistent assignment of truth values to the propositions in
F, that is, as a classical valuation
function. We let $\mathcal{V}_\mathcal{F}$
denote the set of such consistent truth assignments or valuations
v : F → {0, 1}. Note that
$\mathcal{V}_\mathcal{F} \subseteq \mathcal{B}_\mathcal{F}$. That
is, each truth assignment to F is a credence
function on F; indeed, it is the credence
function of a maximally opinionated agent. Note also that, in algebras
that contain atomic propositions, there is a natural one-one
correspondence between the atomic propositions of the algebra and the
consistent assignments of truth values to the propositions of the
algebra. After all, each atom of F is a strongest consistent
proposition in F: for every proposition in F, it entails either that
proposition or its negation. Thus, given an atomic proposition,
there is exactly one valuation that makes it true; and given a
valuation, there is exactly one atomic proposition that it makes true.
We will often abuse notation and use the same notation for a truth
assignment and its corresponding atomic proposition (when
F has atoms). Since we typically assume that
the propositions in F are non-indexical, the
worlds represented by the valuation functions v in
$\mathcal{V}_\mathcal{F}$ are
ordinary possible worlds; they are not so-called centred
worlds, which we might think of as pairs consisting of a world and a
spatiotemporal position in it (Quine 1969).
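The correspondence between atoms and valuations can be seen in a small model. The Python sketch below assumes, for illustration, the standard set-theoretic representation of a finite algebra: propositions are the subsets of a finite set of atoms (the empty set is ⊥, the full set is ⊤, and disjunction, conjunction, and negation are union, intersection, and complement). The helper names `algebra`, `valuation`, and `atoms_made_true` are hypothetical.

```python
from itertools import combinations

ATOMS = ["rain", "no_rain"]


def algebra(atoms):
    """All propositions: every subset of the atoms, from the empty set (⊥) to all atoms (⊤)."""
    atoms = list(atoms)
    return [frozenset(s) for r in range(len(atoms) + 1)
            for s in combinations(atoms, r)]


def valuation(atom):
    """The truth assignment that makes a proposition true iff it contains the atom."""
    return lambda proposition: 1 if atom in proposition else 0


def atoms_made_true(v, props):
    """Atomic propositions (singleton sets) that the valuation v makes true."""
    return [p for p in props if len(p) == 1 and v(p) == 1]


F = algebra(ATOMS)
for atom in ATOMS:
    v = valuation(atom)
    # v is classically consistent: negation (complement) flips every truth value.
    assert all(v(frozenset(ATOMS) - p) == 1 - v(p) for p in F)
    print(atom, atoms_made_true(v, F))
```

Each atom yields a consistent valuation, and that valuation makes exactly one atomic proposition true, namely the atom it came from.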

In epistemic utility theory, we attempt to justify an epistemic norm
N using the following two ingredients:

Q. A norm of standard utility
theory/decision theory, which is to be applied to epistemic utility
functions.

E. A set of conditions that an
epistemic utility function must satisfy.

Typically, the inference from Q and
E to N goes via a
mathematical theorem, which shows that, applied to any epistemic
utility function that satisfies the conditions E, the
norm Q entails the norm
N.

Given that the extant arguments of epistemic utility theory share this
common form, we might organize these arguments by the norms they
attempt to justify, or by the norms of standard utility theory they
employ, or by the set of rational constraints on epistemic utility
functions they impose. We will take the latter course in this survey.
In the next section, we state Probabilism,
Countable Additivity, and Conditionalization precisely, so that
we can refer back to them later. Then, in the three sections that
follow, we consider different rational constraints on epistemic
utility functions and we present the arguments for the norms that they
have been used to give, as well as the objections that have been
raised against these arguments.

The most famous putative norms for credence functions are those that
comprise orthodox Bayesianism: they are Probabilism,
Countable Additivity, and
Conditionalization. Probabilism and
Countable Additivity are synchronic norms:
that is, each states a condition of rationality for an individual credence
function, which represents a single agent's epistemic state at a given
time. Conditionalization, on the other hand, is a
diachronic norm: that is, it states a condition of rationality
for pairs of credence functions, which represent a single agent's epistemic
states at two different times. These are the norms on which we will
focus below, though we will consider others in passing. We state each
here so that we may refer back to them later. Those familiar with these
norms may wish to skip to the next section.

Probabilism is a coherence constraint on credence functions.
It is often likened to the consistency constraint on sets of full
beliefs.

Probabilism. An agent's credence function b
at a given time ought to be a probability function on the algebra
F:

b(⊥) = 0 and b(⊤) = 1;

0 ≤ b(A) ≤ 1, for all A in
F;

b(A ∨ B) = b(A) +
b(B), for all mutually exclusive A and
B in F.

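Henceforth, we denote the set of such functions
$\mathcal{P}_\mathcal{F}$.
Clearly,
$\mathcal{V}_\mathcal{F} \subseteq \mathcal{P}_\mathcal{F} \subseteq \mathcal{B}_\mathcal{F}$.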
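The three Probabilism conditions can be checked mechanically on a finite algebra. The Python sketch below assumes the same set-of-atoms representation as above (mutually exclusive propositions are disjoint sets, and A ∨ B is their union) and a credence function given as a dictionary from propositions to numbers; `is_probability` and the example credences are hypothetical.

```python
from itertools import combinations

ATOMS = ["rain", "no_rain"]


def algebra(atoms):
    """Every subset of the atoms; each subset is a proposition."""
    atoms = list(atoms)
    return [frozenset(s) for r in range(len(atoms) + 1)
            for s in combinations(atoms, r)]


def is_probability(b, atoms, tol=1e-9):
    """Check the three Probabilism conditions for b (a dict keyed by propositions)."""
    F = algebra(atoms)
    bottom, top = frozenset(), frozenset(atoms)
    # b(⊥) = 0 and b(⊤) = 1
    if abs(b[bottom]) > tol or abs(b[top] - 1) > tol:
        return False
    # 0 <= b(A) <= 1 for all A in F
    if any(b[A] < -tol or b[A] > 1 + tol for A in F):
        return False
    # b(A ∨ B) = b(A) + b(B) whenever A and B are mutually exclusive
    return all(abs(b[A | B] - (b[A] + b[B])) <= tol
               for A in F for B in F if not (A & B))


R, N = frozenset({"rain"}), frozenset({"no_rain"})
good = {frozenset(): 0.0, R: 0.6, N: 0.4, R | N: 1.0}
bad = {frozenset(): 0.0, R: 0.6, N: 0.6, R | N: 1.0}
print(is_probability(good, ATOMS))  # True
print(is_probability(bad, ATOMS))   # False: 0.6 + 0.6 != 1

# Every valuation, viewed as a credence function, passes the check.
for atom in ATOMS:
    v = {A: (1.0 if atom in A else 0.0) for A in algebra(ATOMS)}
    print(atom, is_probability(v, ATOMS))
```

The final loop illustrates the inclusion of the valuations among the probability functions; the tolerance only guards against floating-point rounding.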

Note that any agent who satisfies Probabilism must
be logically omniscient: that is, she must be certain of every
tautology.

Countable additivity is a further property that a credence
function may or may not have when the algebra
F on which it is defined is a
σ-algebra: that is, F is closed under
countable disjunction. The norm Countable Additivity
says that an agent's credence function at any time ought to have this
property:

Countable Additivity. An agent's credence function
b at a given time ought to be a countably additive probability
function on the σ-algebra F: b
ought to be a probability function satisfying

$$b\Big(\bigvee_{i \in I} A_i\Big) = \sum_{i \in I} b(A_i)$$

for all countable sets $\{A_i : i \in I\}$ of mutually exclusive propositions in
F. (A set is countable if it is finite or can be
put into one-one correspondence with the set of natural numbers {1, 2,
… }.)

Thus, for instance, suppose that there is a lottery with a countable
infinity of tickets t1, t2,
… . And suppose that the algebra F
contains the proposition Ticket tn will win for
every n = 1, 2, … . Then it is a consequence of
Countable Additivity that, if an agent's credence
function is defined on F, then her credence
in the proposition Some ticket will win ought to be equal to
the infinite sum of her credences in each of the individual
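propositions Ticket tn will win. It follows from
this that no agent who satisfies this norm and is certain that some
ticket will win can assign to each proposition Ticket tn will win
the same credence: a common credence of 0 would make the infinite sum
0 rather than 1, while any common positive credence would make it
diverge.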
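The arithmetic behind the lottery claim can be checked with exact fractions. This Python sketch assumes, hypothetically, a common credence of 1/1000 per ticket, and for contrast a non-uniform assignment of credence 1/2^n to ticket tn; neither assignment comes from the text, and the function names are illustrative.

```python
from fractions import Fraction


def first_partial_sum_exceeding_one(c):
    """Smallest number of tickets whose common credence c sums to more than 1."""
    n = 0
    total = Fraction(0)
    while total <= 1:
        n += 1
        total += c
    return n


def geometric_partial_sum(n):
    """Sum of the hypothetical credences 1/2**k for tickets t1, ..., tn."""
    return sum(Fraction(1, 2 ** k) for k in range(1, n + 1))


print(first_partial_sum_exceeding_one(Fraction(1, 1000)))  # 1001
print(geometric_partial_sum(10))                           # 1023/1024
```

Any positive common credence overshoots 1 after finitely many tickets, whereas the geometric assignment stays below 1 at every stage and approaches 1 in the limit, as Countable Additivity requires when the agent is certain that some ticket will win.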

Conditionalization is an updating rule. That is, it
describes a way of updating one's credence function in the light of a
piece of evidence that comes in the form of a proposition learned with
certainty. The norm Conditionalization says that an
agent ought to update by following that rule:

Conditionalization. Suppose that, between times
t and t′, our agent learns proposition
E with certainty, and nothing more. And suppose that
b and b′ are the agent's credence functions at
t and t′ respectively. Then, if
b(E) > 0, it ought to be the case that
b′(•) = b(• | E), where
b(A | E) = b(A ∧ E)/b(E) for each A in F.

Intuitively, Conditionalization says that, when we
learn E with certainty, we remove all credence that we
originally assigned to the worlds in which E is false and
distribute it over the worlds in which E is true in proportion
to our original credence in them and in such a way that the resulting
credence function obeys Probabilism. It follows that
we multiply our credence in each of these worlds by a factor of
1/b(E).
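The die example from the opening can be run through Conditionalization directly. This Python sketch assumes that the credence function is given by its values on the six outcomes (the atoms), so that b(E) is the sum of the credences of the outcomes in E; the name `conditionalize` is illustrative.

```python
from fractions import Fraction


def conditionalize(credence, evidence):
    """Update credences over worlds on evidence E (a set of worlds), given b(E) > 0."""
    b_E = sum(credence[w] for w in evidence)
    if b_E <= 0:
        raise ValueError("Conditionalization applies only when b(E) > 0")
    # Worlds outside E get 0; worlds inside E are multiplied by 1/b(E).
    return {w: (credence[w] / b_E if w in evidence else Fraction(0))
            for w in credence}


prior = {face: Fraction(1, 6) for face in range(1, 7)}
posterior = conditionalize(prior, {1, 2})
print(posterior[1], posterior[2], posterior[3])  # 1/2 1/2 0
```

Learning that the die landed on one or two multiplies each remaining credence by 1/b(E) = 3, giving 1/2 to each outcome rather than the 1/3 and 2/3 of the opening example.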

In the next three sections, we describe the arguments that have been
given for these three putative epistemic norms using the strategy of
epistemic utility theory. As mentioned above, we group them according
to the rational constraints they place on epistemic utility functions.
In fact, in epistemic utility theory, it turns out to be much easier to
deal with epistemic disutility functions, rather than
epistemic utility functions. The two are interchangeable. If
EU is a utility function, then −EU is a
disutility function, and vice versa. In sections 5 and 6, we identify a
specific epistemic goal and treat epistemic disutility
functions as measures of the distance of an epistemic state from that
goal in a given situation; we lay down conditions that it is claimed
all such measures must satisfy. In section 7, we take an alternative
route: we lay down putative general conditions on any epistemic
disutility function, which it is claimed such a function must
satisfy regardless of whether or not it is a measure of distance from a
specified epistemic goal.

In this section, we consider the conditions imposed on an epistemic
disutility function when we treat it as a measure of the distance of an
epistemic state from the goal of being actually or
hypothetically calibrated (Shimony 1988), (van Fraassen 1983),
(Lange 1999). We say that a credence function is actually calibrated at
a particular possible world if the credence it assigns to a proposition
matches the relative frequencies with which propositions of that kind
are true at that world. Thus, credence 0.2 in proposition A is
actually calibrated if one-fifth of propositions like A are
actually true. And we say that it is hypothetically calibrated if the
credence it assigns to a proposition matches the limiting relative
frequency with which propositions of that kind would be true
were there more propositions of that kind. Thus, credence 0.2
in proposition A is hypothetically calibrated if, as we move
to worlds with more and more propositions like A, the
proportion of such propositions that are true approaches 0.2 in the
limit. According to the calibration arguments, matching the relative
frequencies or limiting relative frequencies is an epistemic goal. And
they attempt to justify Probabilism and
Conditionalization by appealing to this goal and
measures of distance from it.

First, we must make precise what we mean by actual and hypothetical
calibration; then we can say which functions will count as measuring
distance from these putative goals. We treat actual calibration first.
Since we are talking of relative frequencies, we will need to assign to
each proposition in F its reference
class: that is, the set of relevantly similar propositions. Thus,
we require an equivalence relation ∼ on
F, where A ∼ B iff
A and B are relevantly similar. For instance, if our
algebra of propositions contains Heads on first toss of coin,
Heads on second toss of coin, and Six on first roll of
die, we might plausibly say that the first two are relevantly
similar, but neither first nor second is relevantly similar to the
third. Proponents of calibration arguments do not claim to give an
account of how the equivalence relation is determined. Nor do they
claim that there is a single, objectively correct equivalence relation
on a given algebra of propositions: this is the notorious problem
of the reference class that haunts frequentist interpretations of
objective probability. Rather they treat the equivalence relation as a
component of the agent's epistemic state, along with her credence
function. Indeed, for van Fraassen, it is determined entirely by the
credence function together with the form of the propositions in
F (van Fraassen 1983, p. 299). However, they
do impose some rational constraints on ∼ in order to establish
their conclusion. We will not discuss these conditions in any detail.
Rather we denote them C(∼), and keep in mind that this is
a placeholder for a full account of conditions on ∼. Detailed
accounts of these conditions have been given by (van Fraassen 1983) and
(Shimony 1988). Whether a credence function b, together
with an equivalence relation ∼, is perfectly calibrated is
relative to a way the world might be, which we model by a consistent
truth assignment v in $\mathcal{V}_\mathcal{F}$. We
are now ready to give our first definitions; but we preface these with
an example.

Suppose a coin is to be flipped 1000 times. And suppose that
A is the proposition Heads on toss 1. And suppose
that the propositions that are relevantly similar to A in
algebra F are: Heads on toss 1,
…, Heads on toss 1000. Finally, suppose that v
is a consistent truth assignment that represents a possible world. Then
the relative frequency of A at v (written
Freq(F, A, ∼, v)) is
the proportion of the propositions relevantly similar to A
that are true at v: that is, the frequency of heads amongst
the 1000 coin tosses at that world. For instance, if every second toss
lands heads, then Freq(F, A, ∼,
v) = ½.

Now we give the definition in full generality. Suppose ∼ is an
equivalence relation on F, and v is
in $\mathcal{V}_\mathcal{F}$.
Then:

For each A in F, the relative
frequency of truths amongst propositions like A is

$$\mathrm{Freq}(\mathcal{F}, A, \sim, v) := \frac{|\{X \in \mathcal{F} : X \sim A \text{ and } v(X) = 1\}|}{|\{X \in \mathcal{F} : X \sim A\}|}$$

where |S| is the cardinality of a set S.

Relative to ∼, the credence r in proposition
A is actually calibrated at v if r
= Freq(F, A, ∼,
v).
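The coin example and the definition can be put together in a few lines of Python. The sketch assumes, for illustration, that A's reference class is given directly as a list of proposition names and that a world is a dictionary from those names to truth values; `freq` and `actually_calibrated` are hypothetical helpers, and exact fractions are used so that the frequencies stay rational.

```python
from fractions import Fraction


def freq(reference_class, world):
    """Proportion of the propositions in A's reference class that are true at the world."""
    true_count = sum(world[x] for x in reference_class)
    return Fraction(true_count, len(reference_class))


def actually_calibrated(r, reference_class, world):
    """Credence r in A is actually calibrated iff it equals that frequency."""
    return r == freq(reference_class, world)


tosses = [f"Heads on toss {i}" for i in range(1, 1001)]
# Hypothetical world in which every second toss lands heads.
world = {name: (1 if i % 2 == 0 else 0) for i, name in enumerate(tosses, start=1)}
print(freq(tosses, world))                                 # 1/2
print(actually_calibrated(Fraction(1, 2), tosses, world))  # True
```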

The idea is that, if ∼ satisfies constraints C(∼),
then the function Freq(F, •, ∼,
v) is always a probability function on
F.

It is clear from this definition that the calibration arguments will
work only for finite algebras F. For an
infinite algebra, the definition just given will often make no sense,
since the cardinalities of the two sets involved in the ratio will
often be infinite.

Next, we treat hypothetical calibration. For this, we need the
notion of the limiting relative frequency of truths amongst
propositions of a certain sort. The idea is that, for each proposition
A in F, there is not just a fact of
the matter about what the frequency of truths amongst propositions like
A actually is; there is also a fact of the matter about what
the frequency of truths amongst propositions like A would be,
if there were more propositions like A. For instance, there is
not just a fact of the matter about how many actual tosses of a given
coin will land heads; there is also a fact of the matter about the
frequency of heads amongst hypothetical further tosses of the same
coin. In general, suppose we have a consistent truth assignment
v in $\mathcal{V}_\mathcal{F}$
(representing a possible world), an extension
F′ of F
(containing new propositions like A), and an extension
∼′ of ∼ to cover the new propositions in
F′. Then, on this view, there is a unique
number Freq(F′, A,
∼′, v) that gives what the relative frequency of
truths amongst propositions like A would be were there all the
propositions in F′ and were the
relation of similarity amongst them given by ∼′, where this
counterfactual is evaluated at the world represented by v.
Again, let us illustrate this using our example of the coin toss from
above.

Suppose again that A is the proposition Heads on toss
1 and that the propositions in F that
are relevantly similar to A according to ∼ are Heads
on toss 1, …, Heads on toss 1000. Now suppose that
$\mathcal{F}_1$ extends
F by introducing a new proposition about a
further hypothetical toss of the coin (as well as perhaps other
propositions). That is, it introduces Heads on toss 1001 (and
closes out under negation, disjunction, and conjunction). And suppose
that $\sim_1$ extends ∼, so that the new proposition
Heads on toss 1001 is considered relevantly similar to each
Heads on toss 1, …, Heads on toss 1000. Then
those who appeal to hypothetical limiting frequencies must claim that
there is a unique number that gives what the frequency of heads would
be, were the coin tossed 1001 times. They denote this number
$\mathrm{Freq}(\mathcal{F}_1, A, \sim_1, v)$. Now suppose that
$\mathcal{F}_2$ extends
$\mathcal{F}_1$ by adding the new proposition
Heads on toss 1002 and $\sim_2$ extends
$\sim_1$, so that the new proposition Heads on toss
1002 is considered relevantly similar to each Heads on toss
1, …, Heads on toss 1001. And so on. Then the
limiting relative frequency of A at v (written
LimFreq(F, A, ∼, v)) is
the number towards which the following sequence tends:

$$\mathrm{Freq}(\mathcal{F}, A, \sim, v),\ \mathrm{Freq}(\mathcal{F}_1, A, \sim_1, v),\ \mathrm{Freq}(\mathcal{F}_2, A, \sim_2, v),\ \ldots$$

In general, for each algebra F and
equivalence relation ∼, there is an infinite sequence

$$(\mathcal{F}, \sim) = (\mathcal{F}_0, \sim_0),\ (\mathcal{F}_1, \sim_1),\ (\mathcal{F}_2, \sim_2),\ \ldots$$

of algebras and equivalence relations such that each
$\mathcal{F}_{i+1}$ is an extension of
$\mathcal{F}_i$ and each
$\sim_{i+1}$ is an extension of
$\sim_i$ and, for all i,
$C(\sim_i)$. Using this, we can define the
notion of limiting relative frequency and the associated notion of
hypothetical calibration in full generality. Suppose ∼ is an
equivalence relation on F and v in
$\mathcal{V}_\mathcal{F}$. And
suppose $(\mathcal{F}, \sim) = (\mathcal{F}_0, \sim_0), (\mathcal{F}_1, \sim_1), (\mathcal{F}_2, \sim_2), \ldots$
is the sequence just mentioned.

For each A in F, the limiting
relative frequency of truths amongst propositions like A is

$$\mathrm{LimFreq}(\mathcal{F}, A, \sim, v) = \lim_{n \to \infty} \mathrm{Freq}(\mathcal{F}_n, A, \sim_n, v)$$

That is, the limiting relative frequency of A is the number
approached arbitrarily closely by the hypothetical relative frequencies
of truths as we extend the algebra F to
include more and more propositions like A.

Relative to ∼, the credence r in proposition A is
hypothetically calibrated at v if r =
LimFreq(F, A, ∼,
v).

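According to some calibration arguments, actual calibration is an
epistemic goal; according to others, hypothetical calibration is the
goal. Whichever it is, a local epistemic disutility function ought to
measure the distance of an epistemic state from this epistemic goal in
a given situation. Call such a function a calibration measure (Cal)
when the goal is actual calibration, and a hypothetical calibration
measure (LimCal) when it is hypothetical calibration; in either case,
the measure takes the value 0 exactly when the credence it assesses is
calibrated in the relevant sense. The calibration arguments apply the
following norm of decision theory to these measures: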

Possibility of vindication. An agent ought to act in
such a way that she does not thereby preclude the possibility of
attaining minimal disutility, when such a minimum exists.

That is: Suppose U is a disutility function,
$\mathcal{W}$ is the set of possible worlds, and
$\mathcal{A}$ the set of possible actions. Then an agent
ought to choose an action $a_0$ in
$\mathcal{A}$ such that there is a possible world
$w_0$ in $\mathcal{W}$ such that

$$U(a_0, w_0) = \min\{U(a, w) : a \in \mathcal{A} \text{ and } w \in \mathcal{W}\}$$

when such a minimum exists.
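For a finite decision problem, Possibility of vindication can be applied by inspecting a table of disutilities. The Python sketch below assumes a hypothetical table U[action][world]; the action and world labels and the numbers are invented for illustration. With finitely many actions and worlds, the minimum always exists.

```python
# ASSUMPTION: a hypothetical finite decision problem, U[action][world] = disutility.
U = {
    "a1": {"w1": 0, "w2": 9},
    "a2": {"w1": 3, "w2": 3},
    "a3": {"w1": 5, "w2": 0},
}


def vindicable_actions(U):
    """Actions that attain the overall minimum disutility at some world."""
    global_min = min(u for row in U.values() for u in row.values())
    return sorted(a for a, row in U.items() if global_min in row.values())


print(vindicable_actions(U))  # ['a1', 'a3']
```

The norm permits a1 and a3 and rules out a2, even though a2 is the only action whose disutility never exceeds 3; this is the kind of case raised against the norm in the objections below.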

It can be shown that, together with the characterization of measures
of actual calibration given above, suitable constraints
C(∼) on the equivalence relation ∼, and the assumption
that actual calibration is the sole epistemic goal, this norm entails
something stronger than Probabilism. It entails:

Rational-valued Probabilism. At any time t,
an agent's credence function b ought to be a probability
function on the algebra F that takes only
values in ℚ (where ℚ is the set of rational
numbers).

This is a consequence of the following theorem:

Theorem 1. Suppose Cal is a calibration
measure and suppose C(∼). Then the following are
equivalent:

b is a probability function on F
that takes only values in ℚ.

There is a world at which b is actually calibrated. That
is, there is a world v in
$\mathcal{V}_\mathcal{F}$ such
that, for all A in F,
Cal(b(A), A, F,
∼, v) = 0.

Different versions of this theorem result from different constraints
C(∼) on the equivalence relation ∼ (Shimony 1988),
(van Fraassen 1983), but the result is not surprising. An agent will
satisfy Possibility of vindication just in case her
credences match the relative frequencies at some world. And those
relative frequencies will satisfy the probability axioms if
C(∼) holds and if we have specified that condition correctly.
That they will be rational numbers follows from the definition of the
relative frequency of a proposition at a world.
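The last point, that actual frequencies are rational, can be illustrated with exact arithmetic. In this Python sketch (a hypothetical helper, not part of the theorem), a reference class with n members yields only the frequencies k/n; the spot-check confirms that, for reference classes of up to 99 members, none of these values is the irrational number 1/√2, since no fraction k/n squares to exactly 1/2.

```python
from fractions import Fraction


def calibrated_values(n):
    """Credences that can match A's actual relative frequency when A's
    reference class has n members: k/n for k = 0, ..., n."""
    return {Fraction(k, n) for k in range(n + 1)}


print(sorted(calibrated_values(4)))

# Spot-check for n < 100: no achievable frequency equals 1/sqrt(2).
print(any(x * x == Fraction(1, 2)
          for n in range(1, 100) for x in calibrated_values(n)))
```

The sketch only illustrates why calibrated credences are rational; it does not reproduce the constraints C(∼) on which Theorem 1 depends.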

Most proponents of the calibration argument are reluctant to accept
a norm that rules out all credences given by irrational numbers. To
establish the weaker norm of Probabilism, there are
two strategies they might adopt. The first is to appeal to the
epistemic goal of hypothetical calibration instead of actual
calibration. This, together with Possibility of
vindication gives us Probabilism via the
following theorem:

Theorem 2. Suppose LimCal is a hypothetical
calibration measure, and suppose C(∼). Then the following
are equivalent:

b is a probability function on
F.

There is a world at which b is hypothetically calibrated.
That is, there is a world v in
$\mathcal{V}_\mathcal{F}$ such
that, for all A in F,
LimCal(b(A), A, F,
∼, v) = 0.

The reason is that, while relative frequencies are always rational
numbers, the limit of an infinite sequence of rational numbers may be
an irrational number. And in fact, for any irrational number, there is
a sequence of rational numbers that approaches it in the limit (indeed,
there are infinitely many such sequences).
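A concrete sequence of rational frequencies with an irrational limit can be generated from a hypothetical infinite run of tosses. This Python sketch assumes that toss k lands heads iff ⌊kα⌋ > ⌊(k − 1)α⌋ for α = 1/√2; then the number of heads in the first n tosses is ⌊nα⌋, each relative frequency is rational, and the frequencies are within 1/n of α. The float `ALPHA` is only a stand-in for the irrational number.

```python
import math
from fractions import Fraction

ALPHA = 1 / math.sqrt(2)  # floating-point stand-in for an irrational limit


def heads_count(n, alpha=ALPHA):
    """Heads among the first n hypothetical tosses, where toss k lands heads
    iff floor(k * alpha) > floor((k - 1) * alpha)."""
    return sum(1 for k in range(1, n + 1)
               if math.floor(k * alpha) > math.floor((k - 1) * alpha))


for n in (10, 1000, 100000):
    f = Fraction(heads_count(n), n)  # a rational relative frequency
    print(n, f, abs(float(f) - ALPHA))
```

The counts telescope to ⌊nα⌋, so the gap between each frequency and α shrinks at least as fast as 1/n.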

An alternative route to Probabilism changes the
decision-theoretic norm to which we appeal, rather than the sort of
calibration from which we wish our epistemic disutility function to
measure distance. The alternative norm is:

Possibility of arbitrary closeness to vindication.
An agent ought to act in such a way that there are possible worlds in
which her disutility is arbitrarily close to being minimal.

That is: Suppose U is a disutility function,
$\mathcal{W}$ is the set of possible worlds, and
$\mathcal{A}$ the set of possible actions. Then an agent
ought to choose an action $a_0$ in
$\mathcal{A}$ such that, for any ε > 0, there
is a possible world $w_\varepsilon$ in
$\mathcal{W}$ such that

$$\left|U(a_0, w_\varepsilon) - \min\{U(a, w) : a \in \mathcal{A} \text{ and } w \in \mathcal{W}\}\right| < \varepsilon$$

when this minimum exists.

Together with the characterization of calibration measures given
above, suitable constraints C(∼) on the equivalence
relation ∼, and two extra assumptions, this norm does establish
Probabilism (van Fraassen 1983, Shimony 1988). The
extra assumptions are these: First, if our agent has a credence
function b in
$\mathcal{B}_\mathcal{F}$, the
possible worlds that we are considering include not only all
(consistent) truth assignments to F, but also
any (consistent) truth assignments to any (finite) algebra
F′ that extends
F. And, second, given any such
F′, the equivalence relation ∼ can
be extended in any possible way, provided that the extension ∼′
of ∼ satisfies C(∼′).

Theorem 3. Suppose Cal is a calibration
measure and C(∼). Then the following are equivalent:

b is a probability function on
F.

For all ε > 0, there is a finite extension
F′ of F,
an extension ∼′ of ∼ that satisfies
C(∼′), and a possible world v′ in
$\mathcal{V}_{\mathcal{F}'}$ such that, for all A in
F, Cal(b(A), A,
F′, ∼′, v′) <
ε.

Thus, if our agent satisfies Probabilism, then
however close she would like to be to actual calibration, there is some
possible world at which she is that close. And conversely.
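The idea behind Theorem 3 can be illustrated for a single credence. This Python sketch assumes, for illustration, that we may extend A's reference class to n relevantly similar propositions and choose a world in which k of them are true, and it measures calibration distance by |b(A) − k/n|; the helper `close_world` is hypothetical. Choosing n larger than 1/ε and k as the nearest whole number to b(A)·n brings the frequency within 1/(2n) of b(A), even when b(A) is irrational.

```python
import math


def close_world(credence, epsilon):
    """Return (n, k) with |credence - k/n| < epsilon, for a credence in [0, 1]."""
    n = math.floor(1 / epsilon) + 1  # large enough that 1/(2n) < epsilon
    k = round(credence * n)          # nearest achievable frequency k/n
    return n, k


b_A = 1 / math.sqrt(2)
for eps in (0.1, 0.001, 1e-6):
    n, k = close_world(b_A, eps)
    print(eps, n, k, abs(b_A - k / n) < eps)
```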

These are the calibration arguments for
Probabilism. In the next section, we consider
objections that may be raised against them.

Calibration is not an epistemic goal.
It may be objected that neither actual nor hypothetical
calibration measures are truth-directed epistemic
disutility functions, where this is taken to be a necessary
condition on such a function (Joyce 1998), (Seidenfeld 1985). We
say that a local disutility function is truth-directed if it
assigns a higher disutility to one credence in a proposition than
another credence in that proposition exactly when the first is
further from the truth value than the second. Calibration
measures do not necessarily do this. Let us return to our toy
example: the propositions Heads on toss 1, …,
Heads on toss 1000 are in F and
they are all relevantly similar according to ∼. Now suppose
that the first coin toss lands heads, but all the others land
tails. Then credence 0.001 in Heads on toss 1 is further
from the truth than, say, credence 0.5, but closer to calibration;
indeed, it is actually calibrated, since exactly one out of one
thousand relevantly similar propositions is true. However, this objection seems
rather question-begging. Proponents of the calibration argument
will simply reject the claim that an epistemic disutility function
must be truth-directed.
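The toy example in this objection can be checked numerically. The Python sketch assumes, for illustration only, that both distance from the truth and distance from calibration are measured by absolute difference; the helper names are hypothetical.

```python
# Toy example: 1000 relevantly similar propositions, of which only
# "Heads on toss 1" is true. ASSUMPTION: both distances are absolute differences.
N_TOSSES = 1000
TRUTH_VALUE = 1        # Heads on toss 1 is true
FREQ = 1 / N_TOSSES    # one of the 1000 similar propositions is true


def distance_from_truth(r):
    return abs(TRUTH_VALUE - r)


def distance_from_calibration(r):
    return abs(FREQ - r)


for r in (0.001, 0.5):
    print(r, distance_from_truth(r), distance_from_calibration(r))
```

Credence 0.001 is further from the truth than credence 0.5, yet it is perfectly calibrated while 0.5 is not, which is why a calibration measure fails to be truth-directed.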

Limiting relative frequencies are not well-defined
To define the limiting relative frequency of A at a world
v, we require that there is a unique sequence of
extensions of the algebra that contain more and more propositions
that are relevantly similar to A, and a corresponding
sequence of relative frequencies of truths amongst the
propositions like A in the corresponding algebra. But the
assumption of such a unique sequence is extremely controversial
and the problems to which it gives rise have haunted hypothetical
frequentism about objective probability (Hájek 2009).

Neither Possibility of vindication nor Possibility of arbitrary closeness to vindication is a norm
It might be that the only actions that give rise to the possibility
of vindication or of arbitrary closeness to vindication also give
rise to the possibility of maximal distance from vindication. And it
might be that there are actions that do not give rise to the
possibility of vindication or of arbitrary closeness to vindication,
but do limit the distance from vindication that is risked by
choosing that action. In such cases, it is not at all clear that an
agent is rationally required to risk maximal
distance from vindication in order to leave open the possibility of
vindication or of arbitrary closeness to vindication.

The constraints on ∼ are ill-motivated
This objection will vary with the constraints C(∼) that
are imposed on ∼. One uncontroversial constraint is this: If
A ∼ B, then b(A) =
b(B). The further constraints imposed in (van
Fraassen 1983) and (Shimony 1988) are more controversial (Joyce
1998). Moreover, they limit the application of the result, since
they involve assumptions about the form of the propositions in
F. Thus, the calibration arguments do not
show in general, of any finite algebra F,
that a credence function on F ought to be
a probability function, since not every such algebra will contain
propositions with the form required by the constraints C(∼).

In this section, we identify the norm of decision theory that is
deployed in conjunction with the above characterization of hypothetical
calibration measures to derive Conditionalization. And
we give the argument for Conditionalization.

The decision theoretic norm is familiar and uncontroversial:

Minimize Disutility. If there is an action that has
minimal disutility in all worlds that are not ruled out by the agent's
epistemic state, then the agent ought to perform such an action.

That is: Suppose U is a disutility function,
W is the set of epistemically possible
worlds, and A the set of possible actions.
And suppose there is an action a in
A such that, for all a′ in
A and all w in
W, we have U(a, w)
≤ U(a′, w). Then the agent ought to
perform an action with that property.
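The condition in Minimize Disutility can be checked mechanically when there are finitely many actions and worlds. The sketch below assumes the disutility function is given as a table U[action][world]; the labels and numbers are purely hypothetical. The norm is silent when no action is minimal at every world, which is why the function may return an empty list.

```python
from typing import Dict, List

Table = Dict[str, Dict[str, float]]


def minimal_everywhere(U: Table, worlds: List[str]) -> List[str]:
    """Actions a with U(a, w) <= U(a', w) for every action a' and every world w."""
    return [
        a for a in U
        if all(U[a][w] <= U[other][w] for other in U for w in worlds)
    ]


# Hypothetical disutilities for three actions at two epistemically possible worlds.
U = {
    "a1": {"w1": 0.0, "w2": 1.0},
    "a2": {"w1": 0.5, "w2": 1.0},
    "a3": {"w1": 0.0, "w2": 2.0},
}
print(minimal_everywhere(U, ["w1", "w2"]))  # ['a1']

# Here no action is minimal at both worlds, so the norm gives no verdict.
V = {"b1": {"w1": 0.0, "w2": 1.0}, "b2": {"w1": 1.0, "w2": 0.0}}
print(minimal_everywhere(V, ["w1", "w2"]))  # []
```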

Suppose that, between t and t′, an agent
learns proposition E with certainty and nothing more. And
suppose that b and b′ are her credence
functions at t and t′ respectively. Then, in
choosing the epistemic state she will adopt at time t′,
she can only appeal to her epistemic state at time t and the
new evidence E. How can this prior epistemic state and new
evidence guide her? According to Lange, there are two ways (Lange
1999).

First, if ∼ and ∼′ are the equivalence relations
relative to which relative frequencies are calculated at t and
at t′ respectively, then there is a way in which ∼
together with E must guide ∼′. Lange claims that two
propositions A and B are relevantly similar at
t (that is, A ∼ B) just in case the
evidence the agent has for A at t is the same as the
evidence that the agent has for B at t. Since
learning E adds new evidence to the agent's stock,
∼′ is a more fine-grained relation. Lange also imposes a
number of further constraints on each ∼ and on the relations between
them. We denote these constraints D(∼).

Second, there is a way in which her credence function
b, together with E and
Minimize Disutility, can guide her. The idea is
adapted from van Fraassen: According to Lange, to be in an epistemic
state is to be committed to acting as if that epistemic state is
hypothetically calibrated. That is, an agent with a credence function
b should act as if b is hypothetically calibrated.
Thus, in conjunction with Minimize Disutility, an
agent who learns E and nothing more between t and
t′ ought to choose a credence function
b′ at t′ with the
following property: at all worlds v at which
b is hypothetically calibrated relative
to ∼ and at which E is true, her epistemic state at
t′ is hypothetically calibrated relative to
∼′.

Together these entail Conditionalization via the
following theorem (Lange 1999):

Theorem 4. Suppose D(∼) and
D(∼′). And suppose that, for all v in
V_F, if
b is hypothetically calibrated at v relative to ∼
and v(E) = 1, then b′ is
hypothetically calibrated at v relative to ∼′. Then
b′(·) = b(· | E), provided
b(E) > 0.
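The conclusion of Theorem 4, b′(·) = b(· | E), can be illustrated with the die example from the introduction. The sketch below only computes the conditional credence function; it does not model Lange's constraints D(∼) or hypothetical calibration. It assumes b is a probability function specified by its values on the possible outcomes, and the function name conditionalize is hypothetical.

```python
from fractions import Fraction


def conditionalize(b, E):
    """Return b(. | E); b maps outcomes to probabilities, E is a set of outcomes."""
    b_E = sum(b[w] for w in E)
    if b_E <= 0:
        raise ValueError("Conditionalization requires b(E) > 0")
    return {w: (b[w] / b_E if w in E else Fraction(0)) for w in b}


prior = {face: Fraction(1, 6) for face in range(1, 7)}
posterior = conditionalize(prior, {1, 2})
print(posterior[1], posterior[2], posterior[3])  # 1/2 1/2 0
```

The agent in the introduction who moves to credences 1/3 and 2/3 in the die landing on one and on two departs from this posterior, which assigns 1/2 to each.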

This argument inherits the objections to the calibration arguments
for Probabilism: see section 5.3.

In this section, we consider the rational constraints imposed on an
epistemic disutility function when we treat it as a measure of the
accuracy of an epistemic state at a possible world; that is,
as a measure of the distance of an epistemic state from the epistemic
goal of being true or being correct (Joyce 1998),
(Leitgeb and Pettigrew 2010a), (Leitgeb and Pettigrew 2010b). We say
that a credence function on algebra F is
true or correct at a possible world if it assigns
credence 1 to all propositions that are true at that world and credence
0 to all propositions that are false. Thus, representing possible
worlds using valuation functions v in
V_F, we
say that a credence function b in
B_F is
true or correct at a possible world v in
V_F just
in case v(A) = b(A) for all
A in F. According to accuracy
arguments, matching the truth values is an epistemic goal. And they
attempt to justify Probabilism, Countable
Additivity, and Conditionalization, along
with some other putative epistemic norms, by appealing to this goal and
measures of distance from it.

Henceforth, we drop the subscript on
V_F,
B_F, and
P_F. That
is, we will keep our algebra F fixed
throughout.

In this section, we consider three different ways in which we might
characterize those epistemic disutility functions that measure
inaccuracy. In the first, some of the functions that satisfy the
characterizing conditions are known, but it is an open question whether
any others do. In the second and third attempts, the conditions imposed
are strict enough to narrow the class of inaccuracy measures to a
single familiar function known as the Brier score (or a
closely related function), which is defined as

$$\alpha_2(b, v) = \sum_{A \in \mathcal{F}} |b(A) - v(A)|^2$$
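As a concrete illustration, the sketch below computes the Brier score for the four-element algebra generated by a single proposition A, namely {⊤, A, ¬A, ⊥}. Representing credence functions and valuations as dictionaries keyed by proposition labels is an assumption made for the example, as are the particular credences.

```python
from fractions import Fraction


def brier(b, v):
    """Brier score: sum over propositions A in F of |b(A) - v(A)|^2."""
    return sum((b[A] - v[A]) ** 2 for A in b)


# Valuation for the world at which A is true.
v_A = {"Top": 1, "A": 1, "notA": 0, "Bot": 0}

# A probabilistic credence function with b(A) = 0.6.
b = {"Top": Fraction(1), "A": Fraction(3, 5), "notA": Fraction(2, 5), "Bot": Fraction(0)}

print(brier(b, v_A))    # 8/25
print(brier(v_A, v_A))  # 0: the correct credence function has zero inaccuracy
```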

6.1.1 Learning from the Brier score

The Brier score is a plausible measure of the inaccuracy of a
credence function at a world. Indeed, it is the measure often used by
meteorologists in order to measure the inaccuracy of their
probabilistic weather forecasts (Brier 1950). But it is not clear that
it is the only plausible measure. Our first attempt to characterize the
class of (global) epistemic disutility functions G :
B × V →
ℜ that measure the inaccuracy of a credence function at a world
tries to extract the properties of the Brier score
α_2 that we would like any measure of inaccuracy to
share with that function (Joyce 1998).

We consider such properties now (though the properties I list below
differ from those listed in (Joyce 1998) in certain respects). In each
case, I will state the property formally, and then give an informal
gloss.

Strong Non-Triviality. If b ≠
v, then G(v, v) <
G(b, v).

This says that the true or correct credence
function v in V that is maximally
certain of the truth of all truths and the falsity of all falsehoods at
v is the only minimally inaccurate credence function at
v.

Proposition-wise continuity. For all v in
V, G(•, v) is
proposition-wise continuous on B. That is,
for all b in B and all ε
> 0, there is δ > 0 such that, for all b′ in
B, if |b(A) −
b′(A)| < δ for all A in
F, then |G(b, v)
− G(b′, v)| < ε.

This says that the inaccuracy of a credence function should not be
able to ‘jump’ without any corresponding ‘jump’
in the credences it assigns to propositions. Some have argued that
there might be reason for allowing inaccuracy measures that violate
this condition (Schervish et al. 2009).

Unboundedness. For any A in
F, G(b, v) →
∞ as b(A) → ∞ or
b(A) → − ∞.

This says that inaccuracy has no upper bound and it increases
without bound as the credence increases or decreases without bound.

Truth-directedness. If |b(A)
− v(A)| ≤ |b′(A)
− v′(A)|, for all A in
F, then G(b, v)
≤ G(b′, v′).

It might be argued that this is part of what it means for an
epistemic disutility function to measure inaccuracy. If the credences
of b are all at least as close to truth values at v
as the credences of b′ are at v′, then
b is at most as inaccurate as b′.

Given two credence functions b, b′ in
B and given 0 ≤ λ ≤ 1, the
credence function λb + (1 − λ)b′
is defined pointwise: that is, (λb +
(1 − λ)b′)(A) = λb(A) +
(1 − λ)b′(A). It is plausible to think of
λb + (1 − λ)b′ as a compromise
between b and b′. The compromise is biased
towards b if ½ < λ ≤ 1 and biased towards
b′ if 0 ≤ λ < ½. It is unbiased if
λ = ½.

Strong Convexity. If b ≠ b′ and G(b, v) =
G(b′, v), then G(½b + ½b′, v) < G(b, v).

That is, Strong Convexity says that,
if two distinct credence functions are equally inaccurate, the unbiased
compromise between them is more accurate than both.

This is the first truly contentious condition. Various arguments
have been proposed in its favour (Joyce 1998), (Joyce 2009). It should
be noted that the so-called absolute value measure, defined
as

$$\alpha_1(b, v) = \sum_{A \in \mathcal{F}} |b(A) - v(A)|,$$

violates this condition, and yet seems initially to be a plausible
measure of inaccuracy (Maher 2002). Thus, the proponent of
Strong Convexity will have to say what is wrong with
the absolute value measure.
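The failure of Strong Convexity for the absolute value measure can be seen in a small case. The sketch restricts attention to the two propositions A and ¬A (a simplifying assumption: credences in ⊤ and ⊥ are held fixed, so they add the same amount to every score), represents credence functions as vectors (b(A), b(¬A)), and takes A to be true. The two credence functions are hypothetical choices that are equally inaccurate under both measures.

```python
def alpha1(b, v):
    # Absolute value measure.
    return sum(abs(x - y) for x, y in zip(b, v))


def alpha2(b, v):
    # Brier score.
    return sum((x - y) ** 2 for x, y in zip(b, v))


def mix(lam, b, c):
    # The compromise lam*b + (1 - lam)*c, defined pointwise.
    return tuple(lam * x + (1 - lam) * y for x, y in zip(b, c))


v = (1.0, 0.0)        # A true, not-A false
b = (1.0, 1.0)
b_prime = (0.0, 0.0)
mid = mix(0.5, b, b_prime)   # the unbiased compromise (0.5, 0.5)

for name, G in (("alpha1", alpha1), ("alpha2", alpha2)):
    print(name, G(b, v), G(b_prime, v), G(mid, v))
# alpha1 1.0 1.0 1.0  (compromise no more accurate: Strong Convexity fails)
# alpha2 1.0 1.0 0.5  (compromise strictly more accurate)
```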

Symmetry. If G(b, v) = G(b′, v), then, for all
0 ≤ λ ≤ 1, G(λb + (1 − λ)b′, v) =
G((1 − λ)b + λb′, v).

If a putative inaccuracy measure were to violate
Symmetry, then it would be biased towards one part of
the space of possible credence functions over another. After all, there
would be two equally inaccurate credence functions between which there
is a compromise biased towards one by a certain amount that is more
accurate than the compromise biased towards the other by the same
amount (Joyce 1998). Interestingly, the absolute value measure also
violates Symmetry. However, in this case, it seems
that the condition might be more plausible than the claim that the
absolute value score is a good measure of inaccuracy.
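A small case shows the absolute value measure violating Symmetry. As before, the sketch restricts attention to A and ¬A, writes credence functions as vectors (b(A), b(¬A)), takes A to be true, and uses hypothetical credence functions. Under the absolute value measure, (0, 0) and (1.5, 0.5) are equally inaccurate, yet the compromise biased 3:1 towards one is less accurate than the compromise biased 3:1 towards the other; under the Brier score, mirrored compromises between an equally inaccurate pair agree.

```python
def alpha1(b, v):
    # Absolute value measure.
    return sum(abs(x - y) for x, y in zip(b, v))


def alpha2(b, v):
    # Brier score.
    return sum((x - y) ** 2 for x, y in zip(b, v))


def mix(lam, b, c):
    # The compromise lam*b + (1 - lam)*c, defined pointwise.
    return tuple(lam * x + (1 - lam) * y for x, y in zip(b, c))


v = (1.0, 0.0)  # A true, not-A false
lam = 0.75

# Equally inaccurate under alpha1 (both score 1.0), but mirrored compromises differ.
b, b_prime = (0.0, 0.0), (1.5, 0.5)
print(alpha1(mix(lam, b, b_prime), v), alpha1(mix(1 - lam, b, b_prime), v))  # 0.75 0.5

# Equally inaccurate under alpha2 (both score 1.0); mirrored compromises agree.
c, c_prime = (1.0, 1.0), (0.0, 0.0)
print(alpha2(mix(lam, c, c_prime), v), alpha2(mix(1 - lam, c, c_prime), v))  # 0.625 0.625
```

For the Brier score the agreement is not an accident of this example: the inaccuracy of λb + (1 − λ)b′ equals λG(b, v) + (1 − λ)G(b′, v) − λ(1 − λ)‖b − b′‖², and when G(b, v) = G(b′, v) this is unchanged by swapping λ and 1 − λ.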

The final condition also concerns the inaccuracy of compromises
(there is no analogue of this condition in (Joyce 1998)).

Dominating compromise. Suppose b,
b′, c, c′ in
B. Then, if

(i) G(b, v) ≤ G(c,
v) and G(b′, v) ≤
G(c′, v), and

(ii) for all A in F, |b(A) − b′(A)| = |c(A) − c′(A)|,

then, for all 0 ≤ λ ≤ 1, we have

G(λb + (1 − λ)b′,
v) ≤ G(λc +
(1 − λ)c′, v).

This says that, if (i) b is at most as inaccurate as
c and b′ is at most as inaccurate as
c′ and (ii) b is ‘as far’ from
b′ as c is from c′, then any
compromise between b and b′ is at most as
inaccurate as the corresponding compromise between c and
c′. Initially, we might think that a stronger condition
should be imposed, which results from removing condition (ii) from the
antecedent. However, this is too strong. Indeed, it is inconsistent
with Truth-Directed and Strongly
Non-Trivial. Thus, we restrict ourselves to those cases in
which b is ‘as far’ from b′ as
c is from c′.

It is not immediately clear that these conditions are consistent
with one another. We can see that they are by showing that the Brier
score satisfies them, as do other epistemic disutility functions that
are obtained from the Brier score by weighting the summands. However,
it is not known whether any further functions satisfy the
conditions.

It is clear that none of the conditions listed in this section
depends for its statement or for its plausibility on the finitude of
F nor on the propositions of
F being non-indexical. Thus, they apply
equally to inaccuracy measures on sets of credence functions on a
countable algebra, if they apply at all; and similarly for an algebra
that includes indexical propositions.

In section 6.2.1, we will see how these conditions might be put to
use to give an argument for Probabilism. First,
however, we consider two alternative sets of conditions that we may
wish to impose on epistemic disutility functions conceived as measures
of inaccuracy. These both serve to narrow the field to the Brier score
alone.

6.1.2 Inaccuracy and urgency

In the previous section, we considered a particular global epistemic
disutility function called the Brier score and we tried to extract the
features we would like any inaccuracy measure to share with it. In this
section, we consider both local and global epistemic disutility
functions that measure the inaccuracy of individual credences and whole
credence functions, respectively. And we consider two conditions on how
these two sorts of functions are related. Suppose L :
F × V ×
ℜ → ℜ is a local inaccuracy measure, and G :
B × V → ℜ is a global
inaccuracy measure. Then the first condition demands that L
and G take a particular form. And the second demands that
they interact in a certain way: in particular, it demands that they
both give rise to a measure of the urgency with which an agent ought to
change her credence in the light of the inaccuracy she can expect it to
have, and that these local and global measures of urgency agree in all
situations (Leitgeb and Pettigrew 2010a).

Local and Global Comparability. There is a strictly
increasing function f : ℜ → ℜ such that
f(0) = 0 and

$$L(A, v, r) = f(|v(A) - r|) \quad \text{and} \quad G(b, v) = f(\|\mathbf{b} - \mathbf{v}\|)$$

where ‖b − v‖ is the Euclidean distance between the two
vectors b = (b(v_1), …, b(v_n)) and
v = (v(v_1), …, v(v_n)), where
v_1, …, v_n are
the atoms of F.

This says that local inaccuracy supervenes in a certain way on the
difference between credence and truth value; and global inaccuracy
supervenes in the same way on the Euclidean distance between credence
function as applied to atoms of F and truth
assignment to those atoms. (Note that this condition can only be
imposed if F has atoms. The second condition
assumes, in addition, that F is finite, and every
finite algebra has atoms, so this requirement is met.)

Why might we wish to impose this condition on our local and global
epistemic utility functions when we conceive of them as measures of
inaccuracy? One possible answer is this: We might hold that, although
we are not sure that the difference between credence and truth value is
the correct measure of the inaccuracy of the credence, we are at least
sure that it gives the correct ordering of credence-truth value pairs
by their accuracy. That is, credence r in A is at
least as accurate at v as r′ in
A′ is at v′ if, and only if, |r
− v(A)| ≤ |r′ −
v′(A′)|. If this is the case, we will
want our local inaccuracy measure to be a strictly increasing function
of such differences. And similarly, perhaps, for the global epistemic
utility function. You may think that the Euclidean distance at least
gets it right with respect to the ordering of credence function-world
pairs by their accuracy. But this leaves a question unanswered: Why
should the same strictly increasing function take us from differences
to local epistemic utility functions and from Euclidean distances to
global epistemic utility functions? Here, we might say that accuracy is
a dimensionless quantity. That is, it is applicable in any space of
credence functions, regardless of dimension. Thus, once the appropriate
distance measure is imposed on the space of credence functions, we
apply the same function to turn it into an inaccuracy measure (Leitgeb
and Pettigrew 2010a).

To state the second condition considered in this section, we must
explain how a local or global inaccuracy measure can give rise to a
measure of the urgency with which an agent ought to alter her credence
in a proposition due to the inaccuracy she expects it to have. First,
we need to define what we mean by the expected local and global
inaccuracy of an individual credence or a credence function,
respectively. Suppose an agent uses local inaccuracy measure
L and global inaccuracy measure G; suppose she has
credence function b; and suppose her total evidence is that
proposition E is true. Then we can define the local inaccuracy
she expects credence r in proposition A to have as
follows (where Σ_E denotes the sum over
valuation functions v in V that make
E true, and where we abuse notation and let v denote
both a valuation and the corresponding atomic proposition):

$$\mathrm{LExp}_{L,E}(r, A \mid b) = \sum_E b(v)\, L(A, v, r)$$

And the global inaccuracy she expects the credence function
b′ to have is:

$$\mathrm{GExp}_{G,E}(b' \mid b) = \sum_E b(v)\, G(b', v)$$

Thus, the local inaccuracy she expects credence r in A to have is
given by the sum of the local inaccuracies of r at each world in which
E is true, weighted by her credence in that world. And
similarly for the global inaccuracy she expects b′ to have. It is
important to note that, if we were to allow indexical propositions into
our algebra F, so that the valuation
functions v in
V_F
represent centred worlds rather than ordinary uncentred possible
worlds, then it would no longer be clear that this is the correct
definition of expected local and global inaccuracies (Kierland and
Monton 2005). So it would no longer be clear that this argument would
work. Thus, our assumption that the algebra contains only non-indexical
propositions is crucial here.
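The two expectations can be computed directly for a small algebra. The sketch below assumes three atoms (worlds), a uniform credence function, E equal to the tautology, a local measure given by squared difference from the truth value, and a global measure given by the Brier score over the atoms; these choices, and all function names, are illustrative assumptions. The sums follow the definitions exactly as stated, weighting by b(v) without renormalizing by b(E).

```python
from fractions import Fraction

atoms = ["v1", "v2", "v3"]


def truth(world, A):
    # v(A): 1 if the world (identified with an atom) lies in A, a set of atoms.
    return 1 if world in A else 0


def L(A, world, r):
    # Assumed local inaccuracy measure: squared difference from the truth value.
    return (truth(world, A) - r) ** 2


def G(b_prime, world):
    # Assumed global inaccuracy measure: Brier score over the atoms.
    return sum((truth(world, {a}) - b_prime[a]) ** 2 for a in atoms)


def LExp(r, A, b, E):
    # Sum over worlds where E is true of b(v) * L(A, v, r).
    return sum(b[w] * L(A, w, r) for w in atoms if w in E)


def GExp(b_prime, b, E):
    # Sum over worlds where E is true of b(v) * G(b', v).
    return sum(b[w] * G(b_prime, w) for w in atoms if w in E)


b = {a: Fraction(1, 3) for a in atoms}  # uniform credence over the three worlds
E = set(atoms)                           # E is the tautology: every world is possible

print(LExp(Fraction(1, 3), {"v1"}, b, E))  # 2/9
print(GExp(b, b, E))                       # 2/3
```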

Second, we need to use these expectation values to define the local
and global measures of the urgency with which an agent ought to change
her credence. The local directed urgency with which our agent ought to
change credence x in proposition v_i is
given by:

$$\mathrm{LUrg}_{L,E}(x, v_i \mid b) = \frac{\partial}{\partial x}\, \mathrm{LExp}_{L,E}(x, v_i \mid b)$$

where it exists. That is, it is the rate at which the expected local
inaccuracy of her credence changes as that credence changes. And the
global directed urgency
with which an agent ought to change credence in proposition
v_i whilst having credences given by credence
function b′ for other propositions is given by:

$$\mathrm{GUrg}_{G,E}(b', x, v_i \mid b) = \frac{\partial}{\partial x}\, \mathrm{GExp}_{G,E}(b'(x / v_i) \mid b)$$

where b′(x / v_i) agrees
with b′ on all propositions except
v_i, where it takes x. That is, it is the
rate at which the expected global inaccuracy of her credence function
changes as her credence in v_i changes.

With these definitions in hand, we state our condition on local and
global inaccuracy measures:

Agreement on urgency. For all
v_i, all b′, and all x in
ℜ, LUrg_{L,E}(x, v_i | b) and
GUrg_{G,E}(b′, x, v_i | b) exist, they are
continuous at x, and

$$\mathrm{LUrg}_{L,E}(x, v_i \mid b) = \mathrm{GUrg}_{G,E}(b', x, v_i \mid b)$$

That is, the local and global inaccuracy measures must give rise to
measures of local and global urgency that are defined everywhere, are
continuous, and agree everywhere.

The idea is that a pair of local and global inaccuracy measures
should not give rise to a dilemma for an agent who uses them. But, if
they disagree on the urgency with which an agent must change her
credence in the light of her expected inaccuracy, she will face such a
dilemma. Hence, Agreement on urgency.

From these two conditions, we can infer that local and global
inaccuracy measures must be closely related to the Brier score. Indeed,
given the local inaccuracy measure characterized by these two
conditions, the Brier score is obtained by taking the local inaccuracy
of the credence in each proposition and summing. And the global
inaccuracy measure characterized by these two conditions is obtained by
taking the local inaccuracy of the credence in each atomic
proposition and summing (Leitgeb and Pettigrew 2010a).

Theorem 5. Local and Global Comparability
and Agreement on Urgency entail that, for some λ > 0,

$$L(A, v, r) = \lambda |v(A) - r|^2$$

$$G(b, v) = \lambda \sum_{i=1}^{n} |v(v_i) - b(v_i)|^2$$

where again v_1, …, v_n are the atoms
of F.
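A rough numerical check that the forms in Theorem 5 do satisfy Agreement on urgency may help. The sketch assumes three atoms, λ = 2, hypothetical credences b and b′ (b′ need not be probabilistic), E equal to the tautology, and approximates the partial derivatives by central differences (an assumption; the theorem concerns exact derivatives). It checks only this direction, not the theorem's claim that no other measures qualify.

```python
lam = 2.0
atoms = [0, 1, 2]
b = [0.2, 0.3, 0.5]   # her credences in the atoms (hypothetical)
E = [0, 1, 2]         # worlds compatible with her evidence (here: all of them)


def truth(world, i):
    # v(v_i): 1 if atom i is the one true at this world.
    return 1.0 if world == i else 0.0


def L(i, world, r):
    return lam * (truth(world, i) - r) ** 2


def G(c, world):
    return lam * sum((truth(world, i) - c[i]) ** 2 for i in atoms)


def LExp(r, i):
    return sum(b[w] * L(i, w, r) for w in E)


def GExp(c):
    return sum(b[w] * G(c, w) for w in E)


def derivative(f, x, h=1e-5):
    # Central-difference approximation to f'(x).
    return (f(x + h) - f(x - h)) / (2 * h)


def LUrg(x, i):
    return derivative(lambda r: LExp(r, i), x)


def GUrg(b_prime, x, i):
    def expected_with(r):
        c = list(b_prime)
        c[i] = r  # the credence function b'(r / v_i)
        return GExp(c)
    return derivative(expected_with, x)


b_prime = [0.9, 0.05, 0.6]  # hypothetical; need not be probabilistic
max_gap = max(
    abs(LUrg(x, i) - GUrg(b_prime, x, i))
    for i in atoms
    for x in (-1.0, 0.0, 0.25, 0.7, 2.0)
)
print(max_gap)  # tiny: only floating-point rounding error
```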

6.1.3 Inaccuracy and distance

We turn now to the final attempt to characterize the inaccuracy
measures. Note, however, that its ambition is less than that of
previous characterizations. This characterization applies only to
functions that measure the inaccuracy of probabilistic credence
functions p in P. It says nothing
about more general inaccuracy measures that also measure the inaccuracy
of non-probabilistic credence functions b in
B − P. The
characterization is based on the following idea (Selten 1984). The
inaccuracy of a credence function at a world is something like a
measure of distance of that credence function from the truth at that
world. There is a natural way in which to extend this distance measure
from a measure of the distance of a credence function from a world to a
measure of the distance of one credence function from another. Given a
global inaccuracy measure G, let the distance of
p′ from p be given by
GExp_G(p′ | p) − GExp_G(p | p). That is, it is the
expected inaccuracy of p′ by the lights of p,
corrected so that the distance of p from itself is zero. (Note
that GExp_G(p | v) − GExp_G(v | v) =
G(p, v) if G(v,
v) = 0. That is, on plausible assumptions, this new measure of
distance genuinely extends the old measure G.) However, if
this is going to provide a distance measure, it must be symmetric. That
is, the distance of p from p′ must always be
the same as the distance of p′ from p. This is
the first condition considered in this section:

Perspective Indifference. For all p,
p′ in P,

$$\mathrm{GExp}_G(p' \mid p) - \mathrm{GExp}_G(p \mid p) = \mathrm{GExp}_G(p \mid p') - \mathrm{GExp}_G(p' \mid p')$$

It turns out that this, together with the two weak conditions on
G listed below, characterizes the same global inaccuracy
measures as were characterized by the conditions considered in the
previous section, when they are restricted to measuring the inaccuracy
of credence functions in P.

World Indifference. If v,
v′, w, w′ in
V then

If v ≠ v′ and w ≠
w′, then G(v, v′) =
G(w, w′)

If v = v′ and w =
w′, then G(v, v′) =
G(w, w′)

Thus, all worlds are equally inaccurate relative to one another. And
each world is equally inaccurate relative to itself.

Weakly Non-Trivial. If v ≠ v′
in V, then G(v, v)
< G(v, v′).

Thus, a world is more accurate relative to itself than relative to
another world.

This result characterizes the legitimate global inaccuracy measures
restricted to P. Thus, it cannot be used to
justify Probabilism. However, it can be used to
justify Conditionalization and other diachronic norms,
as well as more restrictive synchronic norms.
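To see that the Brier score satisfies Perspective Indifference on P, the sketch below compares the two expected-inaccuracy differences for randomly generated probability functions over four atoms. It assumes G is the Brier score over the atoms (with λ = 1) and identifies each world with the index of its true atom; the function names are hypothetical. It is a spot check, not a proof. Algebraically, because Σ_w p(w) = 1 and Σ_w p(w)v_w(v_i) = p(v_i), both differences reduce to the squared Euclidean distance Σ_i (p(v_i) − p′(v_i))², which is symmetric.

```python
import random


def brier(p, world):
    # Brier score over the atoms; `world` is the index of the true atom.
    return sum(((1.0 if i == world else 0.0) - p[i]) ** 2 for i in range(len(p)))


def gexp(q, p):
    # GExp_G(q | p): expected Brier inaccuracy of q by the lights of p.
    return sum(p[w] * brier(q, w) for w in range(len(p)))


def distance(q, p):
    # Distance of q from p: GExp_G(q | p) - GExp_G(p | p).
    return gexp(q, p) - gexp(p, p)


def random_probability(n, rng):
    xs = [rng.random() for _ in range(n)]
    total = sum(xs)
    return [x / total for x in xs]


rng = random.Random(0)
max_asym = 0.0  # largest |distance(q, p) - distance(p, q)| observed
max_gap = 0.0   # largest |distance(q, p) - squared Euclidean distance| observed
for _ in range(1000):
    p, q = random_probability(4, rng), random_probability(4, rng)
    d_qp, d_pq = distance(q, p), distance(p, q)
    sq_euclid = sum((x - y) ** 2 for x, y in zip(p, q))
    max_asym = max(max_asym, abs(d_qp - d_pq))
    max_gap = max(max_gap, abs(d_qp - sq_euclid))
print(max_asym, max_gap)  # both at the level of floating-point rounding
```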

What is notable from the last two sections is that quite different
conditions motivated by quite different philosophical considerations
characterize roughly the same global epistemic disutility functions as
the correct measures of inaccuracy. That is, they both characterize the
Brier score (or some slight variant). We seem to be triangulating
towards that measure of inaccuracy.

In the previous section, we saw three different sets of conditions
that we might impose upon a global epistemic disutility function to
ensure that it is a measure of inaccuracy. In this section, we put
these characterizations to work justifying the synchronic norm
Probabilism. We consider two arguments, which are
distinguished by the norms of standard decision theory that they
employ.

6.2.1 The Accuracy Dominance Argument for Probabilism

The first argument for Probabilism appeals to the
following norm of standard decision theory. First, some terminology. If
A and A′ are actions, we say that

A weakly dominates A′ if
A is at least as good as A′ at all worlds and
better than A′ at some world.

A strongly dominates A′ if A is better than
A′ at all worlds.
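The two dominance relations can be written as simple predicates. The sketch assumes finitely many worlds and reads "better" as "has lower disutility", matching the use of disutility functions throughout; the table U and its labels are hypothetical.

```python
from typing import Dict, List

Table = Dict[str, Dict[str, float]]


def weakly_dominates(U: Table, a: str, b: str, worlds: List[str]) -> bool:
    # At least as good at every world, strictly better at some world.
    return (all(U[a][w] <= U[b][w] for w in worlds)
            and any(U[a][w] < U[b][w] for w in worlds))


def strongly_dominates(U: Table, a: str, b: str, worlds: List[str]) -> bool:
    # Strictly better at every world.
    return all(U[a][w] < U[b][w] for w in worlds)


U = {
    "a": {"w1": 0.0, "w2": 1.0},
    "b": {"w1": 0.0, "w2": 2.0},
    "c": {"w1": 1.0, "w2": 3.0},
}
W = ["w1", "w2"]
print(weakly_dominates(U, "a", "b", W), strongly_dominates(U, "a", "b", W))  # True False
print(strongly_dominates(U, "a", "c", W))  # True
```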

Now, the norm:

Weak Act-Type Dominance. Suppose an agent is
choosing between two sorts of action. Suppose further that the
following hold:

For every action of the first sort, there is an action of the
second sort that strongly dominates it.

For any action of the second sort, there is no other action of
either sort that even weakly dominates it.

In this situation, an agent ought to choose an action of the second
sort.

In section 7.2, we will consider a stronger version of this norm,
but the weaker version will suffice for present purposes.

Now suppose that we characterize the inaccuracy measures as we did
in Section 6.1.1. Then Weak Act-Type Dominance
together with the following theorem gives an argument for
Probabilism (and Countable
Additivity, when it is applied to countable algebras). We call
it the Accuracy Dominance Argument for Probabilism
(Joyce 1998).

Theorem 7. Suppose G
satisfies:

Strongly Non-Trivial,

Proposition-Wise Continuity,

Unbounded,

Truth-Directed,

Strong Convexity,

Symmetry, and

Dominating Compromise.

Then:

For every non-probabilistic b in B
− P, there is a probabilistic p in
P that strongly dominates it.

For every probabilistic p in P,
there is no credence function b in B
that weakly dominates it.

Thus, if our global epistemic utility function satisfies the
conditions listed in Section 6.1.1, then two things follow. First: For
any non-probabilistic credence function b, there is a
probabilistic credence function c that is more accurate than
b however the world turns out; that is, c strongly
dominates b. Now, it might seem that this alone ought to be
enough to establish Probabilism. However, for all that
has been said so far, it might be that, for every probabilistic
credence function c, there is another credence function
d that strongly dominates c. If this were the case,
then we couldn't conclude Probabilism, since the
probabilistic credence functions would suffer from the same epistemic
vice as the non-probabilistic ones: that is, they would also be
strongly dominated. The second part of Theorem 7 shows that this isn't
so. No probabilistic credence function is even weakly dominated. From
this, we conclude Probabilism.
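The first part of Theorem 7 can be illustrated with the Brier score, which satisfies the conditions listed in Section 6.1.1. The sketch restricts attention to A and ¬A, writes credence functions as vectors (b(A), b(¬A)), and finds a dominating probabilistic credence function by projecting b onto the probabilistic credence functions in Euclidean distance. That construction is an assumption chosen for the example, not the proof given in (Joyce 1998), and the sketch does not check the second part of the theorem.

```python
from fractions import Fraction


def brier(b, v):
    return sum((x - y) ** 2 for x, y in zip(b, v))


def project_to_probabilities(b):
    """Closest point (t, 1 - t), 0 <= t <= 1, to b = (b(A), b(not-A)) in Euclidean distance."""
    t = (1 + b[0] - b[1]) / 2
    t = min(max(t, Fraction(0)), Fraction(1))
    return (t, 1 - t)


worlds = [(1, 0), (0, 1)]  # A true; A false

b = (Fraction(3, 5), Fraction(3, 5))  # non-probabilistic: b(A) + b(not-A) = 6/5
p = project_to_probabilities(b)       # (1/2, 1/2)

for v in worlds:
    print(v, brier(b, v), brier(p, v))  # 13/25 against 1/2 at both worlds
```

Projecting onto a convex set that contains both worlds' valuations moves b strictly closer to each of them, which is why the projection strongly dominates here.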

6.2.2 Objections to the Accuracy Dominance Argument for Probabilism

The conditions on inaccuracy measures are too strong

The first objection raises problems with the conditions imposed on global
inaccuracy measures. As noted above, Strong Convexity
is the most controversial of the conditions required by the Accuracy
Dominance Argument for Probabilism. In particular, it
is not clear why unbiased compromises between equally inaccurate credence
functions must be more accurate rather than just as
accurate. However, it is not possible to weaken the condition in this
way and retain the conclusion of Theorem 7 (Maher 2002). Of
course, it is difficult to adjudicate disputes over the veracity of
what might be taken to be bedrock claims about our concept of
accuracy.

Accuracy isn't the only virtue. The second sort of objection
to the Accuracy Dominance Argument questions the assumption that an
agent's epistemic utility function ought to measure only the accuracy
of her credences. After all, there are other features of credences
that we value. For instance, the simplicity of our epistemic state is
typically taken to be a virtue, as is its informativeness, its
explanatory power, and its verisimilitude, to name just a few. Surely
our epistemic utility function ought also to measure the degree to
which our epistemic states have these virtues. And perhaps once the
epistemic utility function has been altered to reflect this variety of
epistemic virtues, we will no longer be able to use it to argue
for Probabilism. The proponent of the Accuracy
Dominance Argument typically responds to this charge in one of three
ways. First, she might argue that some of these apparent epistemic
virtues are really pragmatic virtues: thus, she might say that
explanatory power and simplicity are really pragmatic virtues because
we value them for their usefulness in drawing inferences from our
epistemic state and deciding how to act quickly on the basis of our
epistemic state. Second, she might argue that those other virtues that
are genuinely epistemic and not pragmatic are to be understood in
terms of the virtue of accuracy. See, for instance, the discussion of
verisimilitude in (Joyce 1998), where Joyce argues that an epistemic
state that enjoys greater verisimilitude will typically also enjoy
greater accuracy. It is hoped that a similar story can be told about
the other putative epistemic virtues. And finally, third, she might
argue that, while these other virtues are genuine, they are always
trumped by considerations of accuracy (Leitgeb and Pettigrew
2010a).

No credence function that dominates on every measure. The third objection to the Accuracy Dominance Argument concerns the
normative force of the argument. While the decision-theoretic norm
Weak Act-Type Dominance seems compelling, the
normative force of the Accuracy Dominance Argument can still be
questioned. Aaron Bronfman originally raised the following problem in
an unpublished manuscript entitled “A Gap in Joyce's Proof of
Probabilism”; it has since been discussed by Hájek (2008) and
Pettigrew (2010). The conditions on a global inaccuracy measure on
which this argument is based don't characterize a single function;
they characterize a family of functions. But, for all the theorem
tells us, it may well be that, for a given non-probabilistic credence
function
b, different functions in this family of global inaccuracy
measures will give different probabilistic credence functions that
strongly dominate b. Thus, an agent with a non-probabilistic
credence function b might be faced with a range of probability
functions, each of which strongly dominates b relative to a
different global inaccuracy measure. Moreover, it may be that any
probability function that strongly dominates b relative to
G does not strongly dominate b relative to
G′ and indeed risks very high inaccuracy at some world
relative to G′, and vice versa. In this situation, it
is plausible that the agent ought not to move from her
non-probabilistic credence function to any probabilistic credence
function.
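The worry can be made concrete with weighted Brier scores, which, as noted in Section 6.1.1, also satisfy the conditions. The sketch again restricts attention to A and ¬A; the weights (1, 1) and (1, 9) and the credence function b = (0.6, 0.6) are hypothetical choices, and the candidate dominating function for each measure is found by projecting b onto the probabilistic credence functions in that measure's weighted distance (an assumed construction). Each candidate strongly dominates b relative to its own measure but not relative to the other.

```python
from fractions import Fraction


def weighted_brier(weights, b, v):
    return sum(w * (x - y) ** 2 for w, x, y in zip(weights, b, v))


def project(weights, b):
    """Minimize weighted_brier(weights, (t, 1 - t), b) over 0 <= t <= 1."""
    w1, w2 = weights
    t = (w1 * b[0] + w2 * (1 - b[1])) / (w1 + w2)
    t = min(max(t, Fraction(0)), Fraction(1))
    return (t, 1 - t)


def strongly_dominates(weights, p, b, worlds):
    return all(weighted_brier(weights, p, v) < weighted_brier(weights, b, v) for v in worlds)


worlds = [(1, 0), (0, 1)]            # A true; A false
b = (Fraction(3, 5), Fraction(3, 5))  # non-probabilistic
w_G, w_G_prime = (1, 1), (1, 9)      # two hypothetical weightings

p = project(w_G, b)                  # (1/2, 1/2)
p_prime = project(w_G_prime, b)      # (21/50, 29/50)

print(strongly_dominates(w_G, p, b, worlds), strongly_dominates(w_G_prime, p, b, worlds))  # True False
print(strongly_dominates(w_G_prime, p_prime, b, worlds), strongly_dominates(w_G, p_prime, b, worlds))  # True False
```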

There are two replies to this objection. According to the first, the
objection relies on a false meta-normative claim; according to the
second, it misunderstands the purpose of Joyce's conditions.

The meta-normative claim on which the objection relies is the
following: For a norm to hold, there must be specific advice available
to those who violate that norm concerning how to improve their
behaviour. Bronfman's objection begins with the observation that, for
any specific advice that one might give to a non-probabilistic agent
concerning which credence function she should adopt in favour of her
own, there will be inaccuracy measures that satisfy Joyce's conditions,
but don't sanction this advice; indeed, there will be inaccuracy
measures relative to which that advice is very bad. Thus, the Accuracy
Dominance Argument fails to satisfy the meta-normative claim. But, the reply
submits, the meta-normative claim is false: for a norm to hold, it is
sufficient that there is a serious defect suffered by those who violate
the norm that is not shared by those who satisfy the norm; it is not
also required that there should be advice on which specific action an
agent should perform to improve her behaviour. And Joyce's argument
satisfies this sufficient condition. One ought to satisfy
Probabilism because non-probabilistic credence
functions suffer from a serious epistemic defect (namely, accuracy
domination) that does not beset probabilistic ones. And this fact is
‘supertrue’, so to speak: that is, it is true on any precisification of
the notion of accuracy that obeys Joyce's minimal conditions on an
inaccuracy measure.

The second reply to this objection does not take issue with the
meta-normative claim mentioned above; indeed, on the understanding of
the Accuracy Dominance Argument it proposes, the argument satisfies
the necessary condition imposed by that claim. That is, according to
this reply, the Accuracy Dominance Argument, properly understood, does
in fact provide specific advice to non-probabilistic agents. The idea
is this: There are (at least) three ways to understand the purpose of
Joyce's conditions on inaccuracy measures. First, we might think that
the notion of inaccuracy is vague; and we might say that any inaccuracy
measure that satisfies the conditions is a legitimate precisification
of it. This is a supervaluationist approach. On this approach,
there is no specific advice available to non-probabilistic agents that
is sanctioned by all precisifications. Second, we might think that the
notion of inaccuracy is precise, but that we have only limited
knowledge about it, and that the sum total of our knowledge is embodied
in the conditions. This is an epistemicist approach. On this
approach, there is specific advice, but it is not available to us.
Third, we might think that there is no objectively correct inaccuracy
measure; rather, any inaccuracy measure that satisfies the conditions
is rationally permissible. But nonetheless, any particular agent has
only one such measure. This is a subjectivist approach. On
this understanding, there is specific advice for any non-probabilistic
agent. Any such agent uses an inaccuracy measure that satisfies Joyce's
conditions. And this gives, for any non-probabilistic credence
function, a probabilistic credence function that strongly dominates it.
So the specific advice is this: adopt one of the probabilistic credence
functions that strongly dominates your non-probabilistic credence
function relative to your favoured measure of inaccuracy. This
gives us Probabilism and does so without violating the
meta-normative claim on which Bronfman's objection relies.
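Both Bronfman's worry and the subjectivist reply can be illustrated numerically. The sketch below rests on assumptions of our own: two worlds (A true, A false), credences over the pair (A, ¬A), and two hypothetical global inaccuracy measures, the unweighted quadratic (Brier) sum and a weighted variant that counts errors on ¬A three times as heavily. Purely for the sake of the example, we treat both as legitimate measures. A grid search then lists the probabilistic credence functions (q, 1 − q) that strongly dominate the non-probabilistic b = (0.6, 0.6) under each measure.

```python
# Two worlds: A is true at w1 and false at w2. Credences are pairs (b(A), b(not-A)).
TRUTH = {"w1": (1.0, 0.0), "w2": (0.0, 1.0)}


def weighted_quadratic(weights):
    """Hypothetical global inaccuracy measure: weighted sum of squared local errors."""
    def inaccuracy(cred, world):
        return sum(k * (v - c) ** 2 for k, v, c in zip(weights, TRUTH[world], cred))
    return inaccuracy


def strongly_dominates(inacc, p, b):
    """True if p is strictly less inaccurate than b at every world."""
    return all(inacc(p, w) < inacc(b, w) for w in TRUTH)


def dominating_probabilities(inacc, b, steps=1000):
    """Grid search over probabilistic (q, 1 - q) for those that strongly dominate b."""
    candidates = (i / steps for i in range(steps + 1))
    return [q for q in candidates if strongly_dominates(inacc, (q, 1 - q), b)]


b = (0.6, 0.6)  # non-probabilistic: b(A) + b(not-A) = 1.2
brier = dominating_probabilities(weighted_quadratic((1.0, 1.0)), b)
skewed = dominating_probabilities(weighted_quadratic((1.0, 3.0)), b)
print(min(brier), max(brier))    # 0.491 0.509
print(min(skewed), max(skewed))  # 0.444 0.458
```

Each measure yields specific advice, a nonempty set of dominating probabilistic functions, as the subjectivist approach requires. The two sets are disjoint, however: (0.5, 0.5) is sanctioned by the unweighted measure but not by the weighted one. This is exactly the kind of divergence that Bronfman's objection exploits.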

6.2.3 The Expected Inaccuracy Argument for Probabilism

In the Accuracy Domination Argument, we consider the epistemic norms
that we can derive by imposing conditions on global inaccuracy
measures. In the arguments considered in this section, we consider the
effect of imposing conditions on local inaccuracy measures instead. In
particular, we consider the effect of imposing conditions that narrow
the class of legitimate local inaccuracy measures to the following
quadratic family:

$$L(A, v, r) = \lambda\,|v(A) - r|^2$$

where λ > 0. And we exploit the following familiar norm
from standard decision theory:

Minimize expected disutility. An agent ought to
choose an act that has minimal expected disutility relative to her
current credence function and in the light of the total evidence she
currently possesses.

That is: If an agent has the credence function b and the
strongest proposition she knows is E, then she ought to choose
an action $a_0$ such that

$$\sum_{E} b(w)\,U(a_0, w) \leq \sum_{E} b(w)\,U(a, w)$$

for all a in A, where again we
write $\sum_E$ to denote the sum over all worlds
w in which E is true.
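The norm can be read as a short procedure. In the sketch below, the die, the uniform credences, and the three hypothetical bets (disutility 0 if the bet wins, 1 if it loses) are illustrative choices of our own. Following the formula above, expected disutility is an unnormalized sum over the worlds in E.

```python
from fractions import Fraction


def expected_disutility(b, U, act, E):
    """Sum over worlds w in E of b(w) * U(act, w), as in Minimize expected disutility."""
    return sum(b[w] * U[act][w] for w in E)


def permissible_acts(b, U, E):
    """All acts whose expected disutility, relative to b and given E, is minimal."""
    scores = {act: expected_disutility(b, U, act, E) for act in U}
    best = min(scores.values())
    return [act for act, score in scores.items() if score == best]


worlds = range(1, 7)                     # outcomes of a die roll
b = {w: Fraction(1, 6) for w in worlds}  # uniform credences
# Hypothetical acts: a bet has disutility 0 if it wins and 1 if it loses.
U = {
    "bet_on_one": {w: 0 if w == 1 else 1 for w in worlds},
    "bet_on_even": {w: 0 if w % 2 == 0 else 1 for w in worlds},
    "bet_on_six": {w: 0 if w == 6 else 1 for w in worlds},
}
E = {1, 2}                               # the agent learns: one or two
print(permissible_acts(b, U, E))         # ['bet_on_one', 'bet_on_even']
```

The first two bets tie at 1/6, so the norm permits either act. It does not always single out a unique choice.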

Now, in standard decision theory, this norm imposes no constraints
on the credence function that an agent may have, although credence
functions are typically assumed to lie in P. The norm says
how credence functions together with utility functions should inform
action; it says nothing about what should inform credence functions or
utility functions. However, in epistemic utility theory, the acts are
the epistemic states. So there may be credence functions that it is
irrational for an agent to have, because they do not minimize expected
epistemic disutility relative to themselves in the light of certain
evidence (Gibbard 2007). That
is, by adopting them, an agent immediately violates Minimize
expected disutility. As the following theorem shows, if we
measure local inaccuracy by one of the quadratic local inaccuracy
measures, then all and only the probabilistic credence functions
p in P such that
p(E) = 1 minimize expected inaccuracy relative to
themselves and in the light of evidence E (Leitgeb and
Pettigrew 2010b).

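Theorem 8. Suppose L(A, v, r) = λ|v(A) − r|². Suppose b in
B and E in F. Then the following are equivalent:

b minimizes expected inaccuracy relative to itself and in the light of
E; that is, by the lights of b and given E, no credence function in
B has lower expected inaccuracy than b.

b is in P and b(E) = 1.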

Thus, any agent whose credence function is not a probability
function, or does not assign credence 1 to E, will expect some
other credence function to be better, epistemically speaking, than she
expects her own to be. From this, we might argue for
Probabilism. We call this argument the Expected
Inaccuracy Argument.
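The equivalence in Theorem 8 can be checked in a simplified setting. The sketch below makes two assumptions that go beyond the text. First, only the singleton propositions {u} are scored, so a credence function is simply a map from worlds to non-negative numbers. Second, λ = 1, since the value of λ does not affect which function minimizes. Under these assumptions, a short calculation shows that the expected inaccuracy of c by the lights of b given E is minimized by setting c(u) = b(u)/b(E) for u in E and c(u) = 0 otherwise. So b recommends itself exactly when its credences sum to 1 and b(E) = 1. The helper names are hypothetical.

```python
from fractions import Fraction as F


def expected_inaccuracy(c, b, E, lam=1):
    """Expected quadratic inaccuracy of c, by the lights of b and given E.
    Simplifying assumption: only singleton propositions {u} are scored,
    so c and b map each world u to a credence."""
    return sum(b[w] * sum(lam * ((1 if u == w else 0) - c[u]) ** 2 for u in c)
               for w in E)


def best_by_lights_of(b, E):
    """Closed-form minimizer of expected_inaccuracy(., b, E), assuming b >= 0
    and b(E) > 0: c(u) = b(u) / b(E) inside E and 0 outside."""
    bE = sum(b[w] for w in E)
    return {u: (b[u] / bE if u in E else F(0)) for u in b}


def self_recommending(b, E):
    """Does b minimize expected inaccuracy by its own lights, given E?"""
    return best_by_lights_of(b, E) == b


E_all, E12 = {1, 2, 3}, {1, 2}
half_half = {1: F(1, 2), 2: F(1, 2), 3: F(0)}   # probabilistic, b(E12) = 1
non_prob = {1: F(1, 2), 2: F(1, 2), 3: F(1, 2)}  # credences sum to 3/2
uniform = {1: F(1, 3), 2: F(1, 3), 3: F(1, 3)}   # probabilistic, b(E12) = 2/3
print(self_recommending(half_half, E12))  # True
print(self_recommending(non_prob, E_all))  # False
print(self_recommending(uniform, E12))  # False
```

Only the first function, which is probabilistic and certain of its evidence, is recommended by its own lights. The second violates Probabilism, and the third fails to assign credence 1 to E.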

Again, we might object to the Expected Inaccuracy Argument by
objecting to the characterization of the legitimate local inaccuracy
measures. For instance, we might object to the heavy use of geometric
assumptions in the characterization of L from Local
and Global Comparability and Agreement on Directed
Urgency. In that characterization, it is assumed that the
global inaccuracy of a credence function at a world supervenes on the
Euclidean distance between two closely-related vectors: Why the
Euclidean metric? Why not some other metric? (Leitgeb and Pettigrew
2010a). Another objection is that the definition of expected values of
random variables as weighted sums is legitimate only for probability
functions and thus cannot be used in an argument for
Probabilism (Joyce 1998).

In this section, we focus on arguments for
Conditionalization that appeal to local measures of
inaccuracy. Again, we appeal to Minimize expected
disutility. When an agent receives new evidence in the form of
a proposition learned with certainty, she ought to choose her new
credence function so as to minimize its expected inaccuracy relative to
her old credence function and in the light of her total evidence, which
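incorporates her new evidence. If we measure local inaccuracy by one of
the quadratic local inaccuracy measures, and if we assume Probabilism,
this leads to Conditionalization via the following theorem (Leitgeb
and Pettigrew 2010b):

Theorem 9. Suppose L(A, v, r) = λ|v(A) − r|². Suppose b in
P and E in F with b(E) > 0. Then the credence function that
minimizes expected inaccuracy relative to b and in the light of E is
b(· | E), the credence function obtained from b by conditionalizing
on E.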

That is, if our agent has credence function b prior to
learning E, then the credence function she will expect to be
best, epistemically speaking, in the light of evidence E is
the credence function obtained from b by conditionalizing on
E. The objections to this argument are the same as to the
Expected Inaccuracy Argument for Probabilism.
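The result just described can be illustrated with the die from the opening example. The sketch is hedged in two ways: it scores only singleton propositions with λ = 1, and its helper names are hypothetical. With a uniform prior and evidence E = {one, two}, conditionalizing assigns credence 1/2 to each remaining outcome. Its expected inaccuracy is lower than that of the lopsided update (1/3, 2/3), which the opening example counts as a mistake.

```python
from fractions import Fraction as F


def expected_inaccuracy(c, b, E):
    """Expected quadratic (lambda = 1) inaccuracy of c by the lights of b, given E,
    scoring only the singleton propositions {u} (a simplifying assumption)."""
    return sum(b[w] * sum(((1 if u == w else 0) - c[u]) ** 2 for u in c) for w in E)


def conditionalize(b, E):
    """b(. | E): drop credence in worlds outside E, rescale the rest by 1 / b(E)."""
    bE = sum(b[w] for w in E)
    return {u: (b[u] / bE if u in E else F(0)) for u in b}


prior = {w: F(1, 6) for w in range(1, 7)}         # a fair die
E = {1, 2}                                         # learn: it landed on one or two
posterior = conditionalize(prior, E)               # 1/2 each on one and two
lopsided = {**posterior, 1: F(1, 3), 2: F(2, 3)}   # the non-conditionalizing update
print(expected_inaccuracy(posterior, prior, E))    # 1/6
print(expected_inaccuracy(lopsided, prior, E))     # 5/27
```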

Note that this argument for Conditionalization
appeals crucially to the standard definition of expected utility. As
noted above, this is only legitimate because the propositions in the
algebra F are non-indexical, and so the
worlds represented by the valuation functions are not centred worlds.
If such worlds were allowed, then the definition of expected utility
might change, and thus the correct update rule might be different. This
is as it should be, since there are numerous concerns about
Conditionalization in the presence of indexical
propositions (Arntzenius 2003).

Accuracy Arguments have also been given for a handful of further
epistemic norms, including an alternative to Richard Jeffrey's
generalization of Conditionalization (Leitgeb and
Pettigrew 2010b) and both Halfer and Thirder solutions to the Sleeping
Beauty puzzle (Kierland and Monton 2005).

In the previous two sections, we considered particular apparent
epistemic vices, namely, distance from calibration and distance from
truth. And we explored what conditions an epistemic disutility function
must satisfy in order to count as a measure of these vices, and which
norms could be justified by appealing to these functions. In this
section, we consider conditions that we might impose on any epistemic
disutility function, regardless of which epistemic vice or collection
of epistemic vices it is intended to measure.

There are two sorts of conditions that we might require any
epistemic disutility function to satisfy. We treat them in this
section.

7.1.1 Propriety, Strict Propriety, and Admissibility

The first sort of general condition on an epistemic disutility
function stems from the following idea: There are some credence
functions that we know it is rationally permissible to have in the
presence of certain evidence. For instance, we might hold that, in the
absence of any evidence, it is at least rationally permissible to have
the probabilistic credence function that assigns the same credence to
each possible world, even if this is not rationally required.
Therefore, no legitimate epistemic disutility function should rule out
these credence functions as irrational in the presence of that
evidence. Depending on the class of credence functions to be preserved
as rationally permissible, this condition can narrow the class of
legitimate disutility functions enough to allow us to argue for
Probabilism or
Conditionalization.

Suppose, for instance, that, in the absence of any evidence, any
credence function in a given set C ⊆
B is rationally permissible. Then we might
wish to impose one of the following three conditions on an epistemic
disutility function:

Propriety for C. For all p in
C and b in
B, if b ≠ p, then, prior
to any evidence, p expects itself to have at most as great
epistemic disutility relative to U as it expects b
to have.

That is, for all p in C and
b in B, if b ≠
p, then $\mathrm{GExp}_{U,\top}(p \mid p) \leq \mathrm{GExp}_{U,\top}(b \mid p)$.

If this isn't the case, there is a credence function p in
C that expects itself to be epistemically
worse than it expects another credence function b to be.
Together with Minimize Expected Disutility, this gives
that p is not rationally permitted.
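Propriety is a substantive constraint, and simple-looking measures can fail it. The sketch below compares two global disutility functions on the two-world algebra {A, ¬A}. One is the quadratic (Brier) measure. The other, our own illustrative choice, is the absolute-value measure that sums |v(A) − b(A)|. We assume that p's prior expectation of b's disutility is Σ_w p(w)U(b, w), with p assigning p(A) to the world where A is true and p(¬A) to the other. Exact fractions avoid rounding.

```python
from fractions import Fraction as F

TRUTH = {"w1": (1, 0), "w2": (0, 1)}  # truth values of (A, not-A) at each world


def brier(cred, w):
    """Quadratic global disutility (lambda = 1)."""
    return sum((v - c) ** 2 for v, c in zip(TRUTH[w], cred))


def absolute(cred, w):
    """Absolute-value global disutility (illustrative, not from the text)."""
    return sum(abs(v - c) for v, c in zip(TRUTH[w], cred))


def gexp(U, b, p):
    """p's expectation of b's disutility prior to any evidence; a probabilistic
    p = (p(A), p(not-A)) gives weight p[0] to w1 and p[1] to w2."""
    return p[0] * U(b, "w1") + p[1] * U(b, "w2")


p = (F(7, 10), F(3, 10))
b = (F(1), F(0))  # the extreme credence function
print(gexp(absolute, p, p), gexp(absolute, b, p))  # 21/25 3/5
print(gexp(brier, p, p), gexp(brier, b, p))        # 21/50 3/5
```

Under the absolute-value measure, p = (0.7, 0.3) expects the extreme function (1, 0) to have disutility 3/5, less than the 21/25 it expects for itself, so Propriety for P fails. Under the quadratic measure, p expects itself to do better (21/50 against 3/5).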

Strict Propriety for C. For all p in
C and b in
B, if b ≠ p, then, prior
to any evidence, p expects itself to have less epistemic
disutility relative to U than it expects b to
have.

That is, for all p in C and
b in B, if b ≠
p, then $\mathrm{GExp}_{U,\top}(p \mid p) < \mathrm{GExp}_{U,\top}(b \mid p)$.

If this isn't the case, there is a credence function p in
C that expects itself to be at most as
epistemically good as it expects another credence function b
to be. On its own, Minimize Expected Disutility does
not declare p irrational on this basis. According to that
norm, it is rationally permissible for an agent to choose an action
that she expects to be at most as good as another. However, we might
justify Strict Propriety nonetheless, if we are
prepared to argue for a claim that we might call
Conservatism, which says that, if an agent is in a
rationally permissible epistemic state, then it is never rational for
her to shift epistemic state in the absence of any new evidence (Oddie
1997). If Strict Propriety fails, then p
expects b to be at least as good as it expects itself to be. Thus,
by Minimize Expected Disutility, it is rationally
permissible for our agent to shift from p to b
without any new evidence. It follows from this and
Conservatism that p cannot be rationally
permissible in the first place, which contradicts our assumption that
all credence functions in C are rationally
permissible.
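For the quadratic measure, Strict Propriety for P can be verified directly. The derivation below assumes a finite algebra F, the global measure U(b, v) = Σ_{A∈F} (v(A) − b(A))² (that is, λ = 1), and GExp_{U,⊤}(b | p) = Σ_{v∈V} p(v)U(b, v), where p(v) is p's credence in the world represented by v. Since each v(A) is 0 or 1, v(A)² = v(A). Since p is in P, Σ_v p(v) = 1 and Σ_v p(v)v(A) = p(A).

```latex
\begin{aligned}
\mathrm{GExp}_{U,\top}(b \mid p)
  &= \sum_{v \in V} p(v) \sum_{A \in F} \big(v(A) - b(A)\big)^2 \\
  &= \sum_{A \in F} \Big( \sum_{v} p(v)\, v(A) - 2\, b(A) \sum_{v} p(v)\, v(A) + b(A)^2 \sum_{v} p(v) \Big) \\
  &= \sum_{A \in F} \big( p(A) - 2\, b(A)\, p(A) + b(A)^2 \big) \\
  &= \sum_{A \in F} p(A)\big(1 - p(A)\big) + \sum_{A \in F} \big(b(A) - p(A)\big)^2 .
\end{aligned}
```

So GExp_{U,⊤}(b | p) − GExp_{U,⊤}(p | p) = Σ_{A∈F} (b(A) − p(A))², which is positive whenever b ≠ p. No condition on b is needed: b may be any credence function in B.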

Admissibility for C. For all p in
C and b in
B, if b ≠ p, then there
is v in V such that
U(p, v) < U(b, v).

If this isn't the case, there is p in
C that is epistemically at least as bad as
another credence function b at all worlds. Again, since this
would permit a shift from p to b without any new
evidence, Conservatism entails that p is not
rationally permitted, which contradicts our assumption.
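Admissibility can also be checked by brute force. The sketch below uses the quadratic (Brier) measure on {A, ¬A} and, for contrast, an absolute-value measure (our own illustrative choice). A hypothetical helper searches a grid of credence functions b ≠ p for any that are at least as good as p at both worlds. Each such b is a counterexample to Admissibility for the probabilistic p = (0.7, 0.3).

```python
from fractions import Fraction as F

TRUTH = {"w1": (1, 0), "w2": (0, 1)}  # truth values of (A, not-A) at each world


def brier(cred, w):
    """Quadratic global disutility (lambda = 1)."""
    return sum((v - c) ** 2 for v, c in zip(TRUTH[w], cred))


def absolute(cred, w):
    """Absolute-value global disutility (illustrative, not from the text)."""
    return sum(abs(v - c) for v, c in zip(TRUTH[w], cred))


def admissibility_violators(U, p, steps=10):
    """Grid points b != p with U(p, w) >= U(b, w) at every world w: each one
    is a counterexample to Admissibility for p."""
    grid = [F(i, steps) for i in range(steps + 1)]
    return [(x, y) for x in grid for y in grid
            if (x, y) != p and all(U(p, w) >= U((x, y), w) for w in TRUTH)]


p = (F(7, 10), F(3, 10))
print(len(admissibility_violators(brier, p)))     # 0
print(len(admissibility_violators(absolute, p)))  # 6
```

The quadratic measure has no counterexamples on this grid. The absolute-value measure has six, and each of them ties p exactly at both worlds, because that measure depends only on b(A) − b(¬A) here. None is strictly better than p anywhere, yet each still violates Admissibility as stated.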

7.1.2 Separability

The second sort of general condition we might impose on any
epistemic disutility function is this:

Separability. Suppose that
F′ is a subset of the algebra
F and that b, b′,
c, c′ in B are such that

b(A) = b′(A) and
c(A) = c′(A) for all
A in F′

b(A) = c(A) and
b′(A) = c′(A) for all
A in F −
F′.

Then, for all v in V, U(b, v) >
U(c, v) if, and only if,
U(b′, v) >
U(c′, v).

This requirement militates against certain sorts of holism about the
epistemic disutility of credence functions. It rules out, for instance,
an epistemic disutility function that takes into account only the maximum
local epistemic disutility when determining the global epistemic
disutility, or one that accords weight to the variance amongst the credences
assigned to propositions. It is not clear whether this is a problem for
Separability, for it might be that, whenever we seem
to value a global or holistic feature of a credence function, we really
value some set of complex local facts.
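The contrast between additive measures and holistic ones can be made concrete. The sketch below fixes the world at which A is true and takes F′ = {A} inside the algebra {A, ¬A}. It then builds a single quadruple b, b′, c, c′ (written b, b2, c, c2) that satisfies both agreement conditions of Separability. It compares the sum of local quadratic errors with a hypothetical measure that counts only the largest local error.

```python
from fractions import Fraction as F

# Propositions scored: (A, not-A). We evaluate at the world where A is true.
TRUTH_W1 = (1, 0)


def local_errors(cred):
    """Squared local error for each proposition at the fixed world."""
    return [(v - c) ** 2 for v, c in zip(TRUTH_W1, cred)]


def additive(cred):
    """Sum of local quadratic errors (Brier-style)."""
    return sum(local_errors(cred))


def max_local(cred):
    """Hypothetical holistic measure: only the worst local error counts."""
    return max(local_errors(cred))


def separability_respected(U, b, b2, c, c2):
    """Separability's biconditional for one quadruple: U(b) > U(c) iff U(b2) > U(c2)."""
    return (U(b) > U(c)) == (U(b2) > U(c2))


# b, b2 agree on A; c, c2 agree on A; b, c agree on not-A; b2, c2 agree on not-A.
b, c = (F(6, 10), F(1, 2)), (F(9, 10), F(1, 2))
b2, c2 = (F(6, 10), F(0)), (F(9, 10), F(0))
print(separability_respected(additive, b, b2, c, c2))   # True
print(separability_respected(max_local, b, b2, c, c2))  # False
```

For the additive measure the biconditional holds. For the max-based measure it fails: b and c tie on their worst error (1/4 each, on ¬A), whereas b′ and c′ do not.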

There is a series of theorems that might underwrite arguments for
Probabilism from the three conditions
Propriety for C, Strict Propriety for
C, and Admissibility for C. In each case, the
set C of credence functions that we must
accept as rationally permitted prior to any evidence is the set
P of probabilistic credence functions. We
consider this assumption below. But first we must state the stronger
version of Weak Act-Type Dominance.

Strong Act-Type Dominance. Suppose an agent is
choosing between two sorts of action. Suppose further that the
following hold:

For every action of the first sort, there is an action of the
second sort that weakly dominates it.

For any action of the second sort, there is no other action of
either sort that even weakly dominates it.

In this situation, an agent ought to choose an action of the second
sort.
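For finitely many acts, the conditions of Strong Act-Type Dominance, and the corresponding conditions of its weak counterpart, can be checked mechanically. In this illustrative sketch, each act is a tuple of disutilities, one per world. We assume the usual definitions: an act weakly dominates another if it is at least as good at every world and strictly better at some world, and strongly dominates it if it is strictly better at every world. The acts and their numbers are hypothetical.

```python
def weakly_dominates(u, v):
    """u weakly dominates v: disutility at most v's at every world, less at some world."""
    return all(a <= b for a, b in zip(u, v)) and any(a < b for a, b in zip(u, v))


def strongly_dominates(u, v):
    """u strongly dominates v: strictly lower disutility at every world."""
    return all(a < b for a, b in zip(u, v))


def act_type_dominance_applies(first, second, strong):
    """Check the conditions of Strong (strong=True) or Weak (strong=False)
    Act-Type Dominance. `first` and `second` map act names (assumed distinct)
    to tuples of disutilities, one entry per world."""
    # Condition 1: every first-sort act is dominated by some second-sort act.
    dominates = weakly_dominates if strong else strongly_dominates
    cond1 = all(any(dominates(s, f) for s in second.values())
                for f in first.values())
    # Condition 2: no second-sort act is even weakly dominated by any other act.
    everything = {**first, **second}
    cond2 = all(not weakly_dominates(o, s)
                for name, s in second.items()
                for other, o in everything.items() if other != name)
    return cond1 and cond2


first = {"x": (2, 2)}                # hypothetical acts: disutility at (w1, w2)
second = {"y": (1, 2), "z": (2, 1)}
print(act_type_dominance_applies(first, second, strong=True))   # True
print(act_type_dominance_applies(first, second, strong=False))  # False
```

Here x is weakly but not strongly dominated by y. So Strong Act-Type Dominance applies and directs the agent to the second sort, while Weak Act-Type Dominance is silent.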

The difference between Weak and Strong
Act-Type Dominance lies in the first condition, which is
weaker in the stronger version of the norm. We turn now to the theorems
that underpin the Propriety Arguments for Probabilism.

For every non-probabilistic b in B
− P, there is a probabilistic p in
P that strongly dominates it.

For every probabilistic p in P,
there is no credence function b in B
that weakly dominates it.

Together with Weak Act-Type Dominance, this entails
Probabilism (Predd, et al. 2009). Thus, by imposing
the stronger condition of Strict Propriety for P on
our epistemic disutility functions, we need only appeal to the weaker norm of
Weak Act-Type Dominance to establish
Probabilism.
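For the quadratic measure, both clauses can be checked numerically in the simplest case: the algebra {A, ¬A} with two worlds. The sketch is an illustration under our own assumptions, not the proof given by Predd et al. It searches a grid of credence functions in steps of 1/10. As the dominating function for each non-probabilistic b, it takes the Euclidean projection of b onto the line b(A) + b(¬A) = 1.

```python
from fractions import Fraction as F

TRUTH = {"w1": (1, 0), "w2": (0, 1)}  # truth values of (A, not-A)


def brier(cred, w):
    """Quadratic global disutility (lambda = 1)."""
    return sum((v - c) ** 2 for v, c in zip(TRUTH[w], cred))


def strongly_dominates(d, b):
    """d is strictly less inaccurate than b at every world."""
    return all(brier(d, w) < brier(b, w) for w in TRUTH)


def weakly_dominates(d, b):
    """d is at most as inaccurate as b everywhere and strictly less somewhere."""
    return (all(brier(d, w) <= brier(b, w) for w in TRUTH)
            and any(brier(d, w) < brier(b, w) for w in TRUTH))


def project(b):
    """Euclidean projection of (b(A), b(not-A)) onto the line x + y = 1."""
    shift = (1 - b[0] - b[1]) / 2
    return (b[0] + shift, b[1] + shift)


grid = [F(i, 10) for i in range(11)]
points = [(x, y) for x in grid for y in grid]
non_prob = [b for b in points if b[0] + b[1] != 1]
prob = [p for p in points if p[0] + p[1] == 1]

clause_1 = all(strongly_dominates(project(b), b) for b in non_prob)
clause_2 = not any(weakly_dominates(d, p) for p in prob for d in points)
print(clause_1, clause_2)  # True True
```

For the quadratic measure, the projection is a convenient choice, because it strictly lowers inaccuracy at both worlds whenever b(A) + b(¬A) ≠ 1. Other dominating functions generally exist as well.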

The following is a conjecture rather than a theorem:

Conjecture 1. Admissibility for
P, Truth-Directedness, and
Proposition-Wise Continuity entail that

For every non-probabilistic b in B
− P, there is a probabilistic p in
P that strongly dominates it.

For every probabilistic p in P,
there is no credence function b in B
that weakly dominates it.

Together with Weak Act-Type Dominance, this would
entail Probabilism (if it were true). The conjecture
has been proved for cases in which F is not
an algebra of propositions, but rather a finite set of mutually
exclusive and exhaustive propositions (Joyce 2010). If it could be
proved, we would no longer need to use expected disutilities in our
argument for Probabilism, though it is not clear
whether they are really problematic as they occur in propriety
arguments, since they are always applied to probabilistic credence
functions.

Of course, these arguments require a premise that seems rather
strong at first sight. In order to establish
Probabilism, which says that it is a
necessary condition on rationality to have a credence function
in P, they must assume that the credence
functions in P are all rationally permissible
prior to any evidence, and thus that it is a sufficient
condition on rationality to have a credence function in
P, at least prior to obtaining any evidence.
But is this assumption justified? The problem is that many philosophers
wish to claim that Probabilism is not the strongest
necessary condition on rationality in the absence of evidence. For
instance, we might say that rationality requires further that our agent
satisfies David Lewis' Principal Principle, which says
that her credence in proposition A conditional on the
proposition that the objective chances are given by the function
ch ought to be ch(A) (Hájek 2008). Or
we might say that an agent's credence function must obey Strict
Coherence, which says that it must assign credence 0 only to
necessary falsehoods and credence 1 only to necessary truths. Or we
might say that our credence function ought to encode minimal
information when that is measured by Shannon's entropy function: this
norm is called Maximize Entropy. And so on. The point
is that, if these more restrictive requirements for rationality hold,
then it is not true that every probabilistic credence function is
rationally permitted prior to any evidence. After all, the
probabilistic credence functions that violate these further norms are
not rationally permitted at all! And if this is true, the arguments
above will fail.

However, the theorems above do show that, even if some more
restrictive norm than Probabilism is true, there can
be no argument from, for instance, Strict Propriety for
C (where C
is a proper subset of P) via Strong
Act-Type Dominance to the conclusion that an agent ought to
have a credence function in C. After all, any
epistemic disutility function that satisfies Strict Propriety
for P will satisfy Strict
Propriety for C. And, for these
functions, we have that all credence functions in
B − P are
strongly dominated while no credence function in
P is even weakly dominated. That is, they
don't militate in favour of credence functions in
C particularly.

That is, if our epistemic disutility function satisfies
Strict Propriety for P, conditionalizing on a piece of
evidence E minimizes expected disutility by the lights of the
agent's original credence function b and in the presence of
E. This is a generalization of Theorem 9.
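The step from Strict Propriety for P to Conditionalization can be sketched in one line. We assume that b is in P with b(E) > 0, and that expected disutility in the light of E is the unnormalized sum over worlds in E used in Minimize expected disutility. We also take GExp_{U,⊤}(c | p) = Σ_w p(w)U(c, w), where w ranges over worlds. Then, for any credence function c:

```latex
\sum_{E} b(w)\,U(c, w)
  \;=\; b(E) \sum_{w} b(w \mid E)\,U(c, w)
  \;=\; b(E)\,\mathrm{GExp}_{U,\top}\big(c \mid b(\cdot \mid E)\big),
```

since b(w | E) = b(w)/b(E) for w in E and 0 otherwise. Because b(· | E) is in P and b(E) > 0, Strict Propriety for P implies that the right-hand side is smallest, and uniquely so, when c = b(· | E).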

Of course, the same objections apply to Strict Propriety for
P as we saw in the previous section.

Epistemic utility theory has already proved itself a powerful tool
in formal epistemology. In this survey, we have focussed only on the
arguments for the core Bayesian norms. But there are many areas in
which it hasn't yet been exploited. In this concluding section, we
suggest some of the many questions that it might be used to answer:

As is reflected in this entry, epistemic utility theory has been
employed mainly in the study of credence functions and their norms.
But, as noted at the beginning, there are many other ways in which we
may wish to model epistemic states. For instance, we might model them
as sets of credence functions. And if we do, there are well-known
problems concerning how we should update in the light of new evidence
(Seidenfeld and Wasserman 1993). Perhaps epistemic utility theory can
shed light on this. Of course, in order to employ epistemic utility
theory to explore norms governing epistemic states modelled in a
particular way, we require a standard decision theory based on that
sort of model. And, in the case of sets of credence functions, this is
still controversial (White 2010; Elga 2010).

Above, we mentioned briefly that epistemic utility theory has been
employed a little in the theory of self-locating beliefs (Kierland and
Monton 2005). But the account to which it gives rise is not fully
general. Again, in order to make it fully general, we require a
standard decision theory in the self-locating framework, and again this
is controversial (Piccione and Rubinstein 1997).

We have focussed only on the norms of Probabilism,
Countable Additivity, and
Conditionalization. There are many other norms, both
synchronic and diachronic, that we might try to justify using epistemic
utility theory: Strict Coherence (or
Regularity), Principal Principle (or
New Principle), and Maximize Entropy
to name a few.

Epistemic utility theory has been oddly absent from the discussion
of the norms that govern epistemic states modelled as sets of full
beliefs. When we say that an agent ought to have classically consistent
full beliefs, we tend to justify this by pointing out that if she does
not, there is no possible way the world might be on which her beliefs
are all true. That is, if we take truth to be the correct notion of
vindication for full beliefs, we appeal implicitly to
Possibility of vindication. However, as we saw, there
are concerns about this norm. Do the same epistemic norms for full
beliefs follow if we consider different norms from decision theory,
such as Weak or Strong Act-Type
Dominance?
