Sunday, 10 September 2017

Aggregating abstaining experts

In a series of posts a few months ago (here, here, and here), I explored a particular method by which we might aggregate expert credences when those credences are incoherent. The result was this paper, which is now forthcoming in Synthese. The method in question was called the coherent approximation principle (CAP), and it was introduced by Daniel Osherson and Moshe Vardi in this 2006 paper. CAP is based on what we might call the principle of minimal mutilation. We begin with a collection of credence functions, $c_1$, ..., $c_n$, one for each expert, some of which might be incoherent. What we want at the end is a single coherent credence function $c$ that is the aggregate of $c_1$, ..., $c_n$. The principle of minimal mutilation says that $c$ should be as close as possible to the $c_i$s -- when aggregating a collection of credence functions, you should change them as little as possible to obtain your aggregate.

We can spell this out more precisely by introducing a divergence $\mathfrak{D}$. We might think of this as a measure of how far one credence function lies from another. Thus, $\mathfrak{D}(c, c')$ measures the distance from $c$ to $c'$. We call these measures divergences rather than distances or metrics, since they lack some of the features that mathematicians usually assume of a metric: we assume $\mathfrak{D}(c, c') \geq 0$, for any $c, c'$, and $\mathfrak{D}(c, c') = 0$ iff $c = c'$, but we do not assume that $\mathfrak{D}$ is symmetric or that it satisfies the triangle inequality. More specifically, we assume that $\mathfrak{D}$ is an additive Bregman divergence. The standard example of an additive Bregman divergence is squared Euclidean distance: if $c$, $c'$ are both defined on the set of propositions $F$, then
$$
\mathrm{SED}(c, c') = \sum_{X \in F} |c(X) - c'(X)|^2
$$
In fact, $\mathrm{SED}$ is symmetric, but it does not satisfy the triangle inequality. The details of this family of divergences needn't detain us here (but see here and here for more). Indeed, we will simply use $\mathrm{SED}$ throughout. But a more general treatment would look at other additive Bregman divergences, and I hope to do this soon.
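To make the definition concrete, here is a minimal Python sketch of $\mathrm{SED}$. It represents a credence function as a dict from proposition labels to credences, which is an illustrative choice rather than anything fixed by the definition. Three credence functions on a single proposition are enough to show symmetry holding and the triangle inequality failing.

```python
def sed(c, c_prime):
    """Squared Euclidean distance between two credence functions on the same propositions."""
    assert c.keys() == c_prime.keys(), "both functions must be defined on the same set F"
    return sum((c[x] - c_prime[x]) ** 2 for x in c)


# Three credence functions on a single proposition X.
c0 = {"X": 0.0}
c_half = {"X": 0.5}
c1 = {"X": 1.0}

# Symmetry holds.
print(sed(c0, c1) == sed(c1, c0))  # True
# The triangle inequality fails: going direct (1.0) is "further" than the detour via c_half (0.5).
print(sed(c0, c1), sed(c0, c_half) + sed(c_half, c1))  # 1.0 0.5
```

Squaring is what breaks the triangle inequality: halving a difference quarters its contribution.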

Now, suppose $c_1$, ..., $c_n$ is a set of expert credence functions, with $c_i$ defined on the set of propositions $F_i$, and let $F = \bigcup^n_{i=1} F_i$. And suppose that $\mathfrak{D}$ is an additive Bregman divergence -- you might take it to be $\mathrm{SED}$. Then how do we define the aggregate $c$ that is obtained from $c_1$, ..., $c_n$ by a minimal mutilation? We let $c$ be the coherent credence function on $F$ such that the sum of the divergences from $c$ to the $c_i$s is minimal, where $\mathfrak{D}(c, c_i)$ compares $c$ with $c_i$ only on the propositions in $F_i$. That is,
$$
\mathrm{CAP}_{\mathfrak{D}}(c_1, \ldots, c_n) = \mathrm{arg\ min}_{c \in P_F} \sum^n_{i=1} \mathfrak{D}(c, c_i)
$$
where $P_F$ is the set of coherent credence functions over $F$.
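For $\mathrm{SED}$, when every $F_i$ is a set of cells from a single partition $X_1$, ..., $X_m$ (as in the examples in this post), the minimization can be computed directly. The sketch below assumes that setting, and assumes each $\mathfrak{D}(c, c_i)$ is measured only over $F_i$; the function name `cap_sed` and the dict representation are hypothetical. Each cell $X_j$ contributes $w_j(c(X_j) - m_j)^2$ plus a constant, where $w_j$ is the number of experts with a credence in $X_j$ and $m_j$ is their mean credence, so the problem is a weighted projection onto the probability simplex. It is solved here by bisecting on the Lagrange multiplier.

```python
def cap_sed(experts, cells, tol=1e-12):
    """CAP with squared Euclidean distance, for experts whose agendas are subsets of
    one partition `cells`. Each expert is a dict {cell: credence}. Minimises
    sum_i sum_{X in F_i} (c(X) - c_i(X))**2 over probability functions c on `cells`."""
    # Collapse the objective cell by cell: w[x] experts speak about x, with mean credence m[x].
    w = {x: sum(1 for e in experts if x in e) for x in cells}
    if any(w[x] == 0 for x in cells):
        raise ValueError("every cell must be in at least one expert's agenda")
    m = {x: sum(e[x] for e in experts if x in e) / w[x] for x in cells}

    # KKT conditions: c(x) = max(0, m[x] + lam / w[x]) for a multiplier lam chosen
    # so that the credences sum to 1. The sum is increasing in lam, so bisect on it.
    def total(lam):
        return sum(max(0.0, m[x] + lam / w[x]) for x in cells)

    lo, hi = -1.0, 1.0
    while total(lo) > 1:
        lo *= 2
    while total(hi) < 1:
        hi *= 2
    while hi - lo > tol:
        mid = (lo + hi) / 2
        if total(mid) < 1:
            lo = mid
        else:
            hi = mid
    lam = (lo + hi) / 2
    return {x: max(0.0, m[x] + lam / w[x]) for x in cells}


cells = ["X1", "X2", "X3"]
c1 = {"X1": 0.2, "X2": 0.5, "X3": 0.3}
c2 = {"X1": 0.4, "X2": 0.4, "X3": 0.2}
# Shared agenda: up to rounding, the result is the linear pool 0.3, 0.45, 0.25.
print(cap_sed([c1, c2], cells))
```

When all experts share the same agenda, every $w_j = n$ and the means already sum to 1, so the multiplier is zero and the aggregate is the straight linear pool.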

As I show in the paper linked above, if all of the credence functions are defined over the same set of propositions -- that is, if $F_i = F_j$, for all $1 \leq i, j \leq n$ -- then:

if $\mathfrak{D}$ is squared Euclidean distance, then this aggregate is the straight linear pool of the original credences; if $c$ is defined on the partition $X_1$, ..., $X_m$, then the straight linear pool of $c_1$, ..., $c_n$ is this:$$c(X_j) = \frac{1}{n}c_1(X_j) + ... + \frac{1}{n}c_n(X_j)$$

if $\mathfrak{D}$ is the generalized Kullback-Leibler divergence, then the aggregate is the straight geometric pool of the originals; if $c$ is defined on the partition $X_1$, ..., $X_m$, then the straight geometric pool of $c_1$, ..., $c_n$ is this: $$c(X_j) = \frac{1}{K}\left(c_1(X_j)^{\frac{1}{n}} \times ... \times c_n(X_j)^{\frac{1}{n}}\right)$$where $K$ is the normalizing factor that makes these credences sum to 1.
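Both pools are straightforward to compute. The following is a minimal sketch, assuming credence functions on a partition are given as lists indexed by cell; the function names are illustrative. It also checks numerically that, with the aggregate in the first argument of the generalized Kullback-Leibler divergence as in the definition of CAP, the geometric pool gives a total divergence no larger than the linear pool's.

```python
import math


def linear_pool(credences):
    """Straight linear pool: the unweighted average, cell by cell."""
    n = len(credences)
    return [sum(c[j] for c in credences) / n for j in range(len(credences[0]))]


def geometric_pool(credences):
    """Straight geometric pool: the product of c_i(X_j)**(1/n), normalised by K."""
    n = len(credences)
    unnormalised = [math.prod(c[j] ** (1 / n) for c in credences)
                    for j in range(len(credences[0]))]
    k = sum(unnormalised)  # the normalising factor K
    return [u / k for u in unnormalised]


def gkl(c, c_prime):
    """Generalized Kullback-Leibler divergence from c to c_prime (all entries positive)."""
    return sum(x * math.log(x / y) - x + y for x, y in zip(c, c_prime))


c1 = [0.2, 0.5, 0.3]
c2 = [0.4, 0.4, 0.2]
lin = linear_pool([c1, c2])
geo = geometric_pool([c1, c2])
total_lin = sum(gkl(lin, c) for c in [c1, c2])
total_geo = sum(gkl(geo, c) for c in [c1, c2])
print(total_geo <= total_lin)  # True
```

Setting the derivative of $\sum_i \mathrm{GKL}(c, c_i)$ in $c(X_j)$ equal to a common multiplier gives $n \log c(X_j) - \sum_i \log c_i(X_j) = \text{const}$, which is exactly the normalised geometric pool.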

In this post, I'm interested in cases where our agents have credences in different sets of propositions. For instance, the first agent has credences concerning the rainfall in Bristol tomorrow and the rainfall in Bath, but the second has credences concerning the rainfall in Bristol and the rainfall in Birmingham.

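I want to begin by pointing to a shortcoming of CAP when it is applied to such cases. It fails to satisfy what we might think of as a basic desideratum of such procedures. To illustrate this desideratum, let's suppose that the three propositions $X_1$, $X_2$, and $X_3$ form a partition. And suppose that Amira has credences in $X_1$, $X_2$, and $X_3$, while Benito has credences only in $X_1$ and $X_2$. In particular:
$$c_A(X_1) = 0.3, \quad c_A(X_2) = 0.6, \quad c_A(X_3) = 0.1$$
$$c_B(X_1) = 0.2, \quad c_B(X_2) = 0.6$$

Since $X_1$, $X_2$, and $X_3$ form a partition, Benito's credences in $X_1$ and $X_2$ commit him to a credence of $1 - 0.2 - 0.6 = 0.2$ in $X_3$: there is exactly one coherent credence function on $\{X_1, X_2, X_3\}$ that extends $c_B$. Call it Benito's extended credence function $c^*_B$, so that $c^*_B(X_1) = 0.2$, $c^*_B(X_2) = 0.6$, and $c^*_B(X_3) = 0.2$.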

Thus, we might expect our aggregation procedure to give the same result whether we aggregate Amira's credence function with Benito's or with Benito's extended credence function. That is, we might expect the same result whether we aggregate $c_A$ with $c_B$ or with $c^*_B$. After all, $c^*_B$ is in some sense implicit in $c_B$. An agent with credence function $c_B$ is committed to the credences assigned by credence function $c^*_B$.

However, CAP does not do this. As mentioned above, if you aggregate $c_A$ and $c^*_B$ using $\mathrm{SED}$, then the result is their linear pool: $\frac{1}{2}c_A + \frac{1}{2}c^*_B$. Thus, the aggregate credence in $X_1$ is $0.25$; in $X_2$ it is $0.6$; and in $X_3$ it is $0.15$. The result is different if you aggregate $c_A$ and $c_B$ using $\mathrm{SED}$: the aggregate credence in $X_1$ is $0.2625$; in $X_2$ it is $0.6125$; in $X_3$ it is $0.125$.
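The second set of figures can be checked by hand. This derivation assumes Amira's credences are $c_A(X_1) = 0.3$, $c_A(X_2) = 0.6$, $c_A(X_3) = 0.1$ and Benito's are $c_B(X_1) = 0.2$, $c_B(X_2) = 0.6$; these are the values that reproduce the aggregates reported here. With $\mathrm{SED}$, the aggregate $(x_1, x_2, x_3)$ minimizes the first line below subject to $x_1 + x_2 + x_3 = 1$, and setting each partial derivative equal to a common Lagrange multiplier $\lambda$ gives the rest:
$$
\begin{aligned}
&(x_1 - 0.3)^2 + (x_2 - 0.6)^2 + (x_3 - 0.1)^2 + (x_1 - 0.2)^2 + (x_2 - 0.6)^2 \\
&x_1 = 0.25 + \tfrac{\lambda}{4}, \quad x_2 = 0.6 + \tfrac{\lambda}{4}, \quad x_3 = 0.1 + \tfrac{\lambda}{2} \\
&x_1 + x_2 + x_3 = 0.95 + \lambda = 1 \;\Rightarrow\; \lambda = 0.05
\end{aligned}
$$
So $x_1 = 0.2625$, $x_2 = 0.6125$, and $x_3 = 0.125$. Only Amira's term involves $x_3$, which is why the aggregate credence in $X_3$ stays close to hers.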

Now, it is natural to think that the problem arises here because Amira's credences get too much say in how far a potential aggregate lies from the agents, since she has credences in three propositions, while Benito has credences in only two. And, sure enough, $\mathrm{CAP}_{\mathrm{SED}}(c_A, c_B)$ lies closer to $c_A$ than to $c_B$, and it lies closer to $c_A$ than the aggregate of $c_A$ and $c^*_B$ does. And it is equally natural to try to correct this potential bias in favour of the agent with more credences by normalising. That is, we might define a new version of CAP:
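The distance comparisons can be verified with a few lines of arithmetic. This sketch assumes the credences $c_A = (0.3, 0.6, 0.1)$ and $c_B(X_1) = 0.2$, $c_B(X_2) = 0.6$, reconstructed from the aggregates reported in this post, and measures each divergence only over the propositions the agent has credences in.

```python
def sed(c, c_prime, agenda):
    """SED between c and c_prime, measured only over the propositions in `agenda`."""
    return sum((c[x] - c_prime[x]) ** 2 for x in agenda)


# Assumed credences, reconstructed from the reported aggregates.
c_A = {"X1": 0.3, "X2": 0.6, "X3": 0.1}
c_B = {"X1": 0.2, "X2": 0.6}
cap = {"X1": 0.2625, "X2": 0.6125, "X3": 0.125}  # CAP_SED(c_A, c_B)
pool = {"X1": 0.25, "X2": 0.6, "X3": 0.15}       # linear pool of c_A and c*_B

d_cap_A = sed(cap, c_A, c_A.keys())    # about 0.0021875
d_cap_B = sed(cap, c_B, c_B.keys())    # about 0.0040625
d_pool_A = sed(pool, c_A, c_A.keys())  # about 0.005
print(d_cap_A < d_cap_B, d_cap_A < d_pool_A)  # True True
```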
$$
\mathrm{CAP}^+_{\mathfrak{D}}(c_1, \ldots, c_n) = \mathrm{arg\ min}_{c \in P_F} \sum^n_{i=1} \frac{1}{|F_i|}\mathfrak{D}(c, c_i)
$$
where $F = \bigcup^n_{i=1} F_i$ and $P_F$ is the set of coherent credence functions over $F$. However, this doesn't help. Using this definition, the aggregate of Amira's credence function $c_A$ and Benito's extended credence function $c^*_B$ remains the same; but the aggregate of Amira's credence function and Benito's original credence function changes -- the aggregate credence in $X_1$ is $0.25333$; in $X_2$, it is $0.61333$; in $X_3$, it is $0.13333$. Again, the two ways of aggregating disagree.
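These figures follow in the same way as before. Assuming again $c_A = (0.3, 0.6, 0.1)$ and $c_B(X_1) = 0.2$, $c_B(X_2) = 0.6$ (values reconstructed from the reported aggregates), Amira's divergence now carries weight $\frac{1}{3}$ and Benito's $\frac{1}{2}$. The aggregate minimizes the first line below subject to $x_1 + x_2 + x_3 = 1$; writing $\kappa$ for the Lagrange multiplier after rescaling by $\frac{3}{5}$:
$$
\begin{aligned}
&\tfrac{1}{3}\left[(x_1 - 0.3)^2 + (x_2 - 0.6)^2 + (x_3 - 0.1)^2\right] + \tfrac{1}{2}\left[(x_1 - 0.2)^2 + (x_2 - 0.6)^2\right] \\
&x_1 = 0.24 + \kappa, \quad x_2 = 0.6 + \kappa, \quad x_3 = 0.1 + \tfrac{5}{2}\kappa \\
&x_1 + x_2 + x_3 = 0.94 + \tfrac{9}{2}\kappa = 1 \;\Rightarrow\; \kappa = \tfrac{1}{75}
\end{aligned}
$$
So $x_1 = 0.2533\ldots$, $x_2 = 0.6133\ldots$, and $x_3 = 0.1333\ldots$. Amira's term is still the only one involving $x_3$, so normalising shrinks her influence without removing the asymmetry.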

So here is our desideratum in general:

Agreement with Coherent Commitments (ACC) Suppose $c_1$, ..., $c_n$ are coherent credence functions, with $c_i$ defined on $F_i$, for each $1 \leq i \leq n$. And let $F = \bigcup^n_{i=1} F_i$. Now suppose that, for each $c_i$ defined on $F_i$, there is a unique coherent credence function $c^*_i$ defined on $F$ that extends $c_i$ -- that is, $c_i(X) = c^*_i(X)$ for all $X$ in $F_i$. Then the aggregate of $c_1$, ..., $c_n$ should be the same as the aggregate of $c^*_1$, ..., $c^*_n$.
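ACC only applies when each $c_i$ has a unique coherent extension. When every $F_i$ is a set of cells from one partition, a hypothetical helper `unique_extension` can decide this: if exactly one cell is missing, it must take the leftover credence; if two or more are missing, the extension is unique only when the known credences already sum to 1, so the missing cells all get 0. This sketch covers only that partition case.

```python
def unique_extension(credences, cells, tol=1e-9):
    """Return the unique coherent extension of `credences` (a dict on some of the
    cells of the partition `cells`) to all of `cells`, or None if there is no
    coherent extension or more than one."""
    missing = [x for x in cells if x not in credences]
    s = sum(credences.values())
    if any(v < 0 for v in credences.values()) or s > 1 + tol:
        return None  # no coherent extension exists
    if not missing:
        return dict(credences) if abs(s - 1) <= tol else None
    if len(missing) == 1:
        return {**credences, missing[0]: 1 - s}  # the remaining cell takes what is left
    if abs(s - 1) <= tol:
        return {**credences, **{x: 0.0 for x in missing}}  # nothing left to share out
    return None  # the leftover 1 - s can be split among the missing cells in many ways


cells = ["X1", "X2", "X3"]
print(unique_extension({"X1": 0.2, "X2": 0.6}, cells))  # X3 gets (about) 0.2
print(unique_extension({"X1": 0.5}, cells))             # None: many extensions
```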

CAP does not satisfy ACC. Is there a natural aggregation rule that does? Here's a suggestion. Suppose you wish to aggregate a set of credence functions $c_1$, ..., $c_n$, where $c_i$ is defined on $F_i$, as above. Then we proceed as follows.

First, let $F = \bigcup^n_{i=1} F_i$.

Second, for each $1 \leq i \leq n$, let
$$
c^*_i = \{c : c \text{ is coherent, } c \text{ is defined on } F \text{, and } c(X) = c_i(X) \text{ for all } X \in F_i\}
$$
That is, while $c_i$ represents a precise credal state defined on $F_i$, $c^*_i$ represents an imprecise credal state defined on $F$: it is the set of coherent credence functions on $F$ that extend $c_i$, that is, that agree with $c_i$ on the propositions in $F_i$. Thus, if, like Benito, your coherent credences on $F_i$ uniquely determine your coherent credences on $F$, then $c^*_i$ is just the singleton that contains that unique extension. But if your credences over $F_i$ do not uniquely determine your coherent credences over $F$, then $c^*_i$ will contain more than one coherent credence function.

Finally, we take the aggregate of $c_1$, ..., $c_n$ to be the coherent credence function $c$ on $F$ that minimizes the total distance from $c$ to the $c^*_i$s. The problem is that there isn't a single natural definition of the distance from a point to a set of points, even when you have a definition of the distance between individual points. I adopt a very particular measure of such distances here; but it would be interesting to explore the alternative options in greater detail elsewhere. Suppose $c$ is a credence function and $C$ is a set of credence functions. Then
$$
\mathfrak{D}(c, C) = \frac{\mathrm{min}_{c' \in C}\mathfrak{D}(c, c') + \mathrm{max}_{c' \in C}\mathfrak{D}(c, c')}{2}
$$
That is, the distance from $c$ to $C$ is the average of the distance from $c$ to the nearest member of $C$ and the distance from $c$ to the furthest. With this in hand, we can finally give our aggregation procedure:
$$
\mathrm{CAP}^*_{\mathfrak{D}}(c_1, \ldots, c_n) = \mathrm{arg\ min}_{c \in P_F} \sum^n_{i=1} \mathfrak{D}(c, c^*_i)
$$
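Because $\mathfrak{D}(c, C)$ mixes a minimum and a maximum, $\mathrm{CAP}^*$ rarely has a tidy closed form. Here is a brute-force sketch for a three-cell partition, assuming $\mathrm{SED}$, a grid of step $0.01$ over coherent credence functions, and each imprecise state $c^*_i$ approximated by its members on that grid; all names are hypothetical, and results are only accurate to the grid step. The given credences must be multiples of the step, or the approximated $c^*_i$ comes out empty.

```python
STEPS = 100  # grid resolution: every credence is a multiple of 1/STEPS


def simplex_grid(steps=STEPS):
    """Coherent credence functions on three cells (X1, X2, X3 as indices 0, 1, 2)
    whose values are multiples of 1/steps."""
    return [(i / steps, j / steps, (steps - i - j) / steps)
            for i in range(steps + 1) for j in range(steps + 1 - i)]


def sed(c, c_prime):
    """Squared Euclidean distance between two credence functions on all three cells."""
    return sum((x - y) ** 2 for x, y in zip(c, c_prime))


def extensions(partial, steps=STEPS):
    """Grid approximation of c*_i: the coherent credence functions on the three
    cells that agree with `partial`, a dict {cell index: credence}."""
    return [c for c in simplex_grid(steps)
            if all(abs(c[k] - v) < 1e-9 for k, v in partial.items())]


def set_divergence(c, C):
    """D(c, C): the average of the nearest and furthest divergences from c to members of C."""
    ds = [sed(c, c_prime) for c_prime in C]
    return (min(ds) + max(ds)) / 2


def cap_star_sed(partials, steps=STEPS):
    """Grid search for the coherent c minimising the sum of D(c, c*_i)."""
    sets = [extensions(p, steps) for p in partials]
    return min(simplex_grid(steps),
               key=lambda c: sum(set_divergence(c, C) for C in sets))


# A single agent with a credence of 0.5 in X1 only, filled in over X2 and X3.
print(cap_star_sed([{0: 0.5}]))  # (0.5, 0.25, 0.25)
```

A grid search is chosen here for transparency rather than speed; a finer grid or a proper non-smooth optimizer would be needed for anything beyond toy cases.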

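The first thing to note about CAP$^*$ is that, unlike the original CAP, or CAP$^+$, it automatically satisfies ACC. To see why, suppose each $c_i$ has a unique coherent extension to $F$. Then the set $c^*_i$ generated from $c_i$ is the singleton containing that extension, which is exactly the set generated from the extension itself; and for a singleton, $\mathfrak{D}(c, \{c'\}) = \mathfrak{D}(c, c')$. So CAP$^*$ minimizes the same quantity whether it is given the $c_i$s or their extensions.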

Let's now see CAP$^*$ in action.

Since CAP$^*$ satisfies ACC, the aggregate for $c_A$ and $c_B$ is the same as the aggregate for $c_A$ and $c^*_B$, which is just their straight linear pool.

One interesting feature of CAP$^*$ is that, unlike CAP, we can apply it to individual agents. Thus, for instance, suppose we wish to take Cleo's single credence of $0.5$ in $X_1$ and 'fill in' her credences in $X_2$ and $X_3$. Then we can use CAP$^*$ to do this, taking $F = \{X_1, X_2, X_3\}$. Her new credence function will be
$$
c'_C = \mathrm{CAP}^*_{\mathrm{SED}}(c_C) = \mathrm{arg\ min}_{c' \in P_F} \mathrm{SED}(c', c^*_C)
$$
That is, $c'_C(X_1) = 0.5$, $c'_C(X_2) = 0.25$, $c'_C(X_3) = 0.25$. Rather unsurprisingly, $c'_C$ is the midpoint of the line segment formed by the imprecise probabilities in $c^*_C$. Now, notice: the aggregate of Amira and Cleo given above is just the straight linear pool of Amira's credence function $c_A$ and Cleo's 'filled in' credence function $c'_C$. I would conjecture that this is generally true: filling in credences using CAP$^*_{\mathrm{SED}}$ and then aggregating using straight linear pooling always agrees with aggregating using CAP$^*_{\mathrm{SED}}$. And perhaps this generalises beyond SED.
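The midpoint result can be checked by hand. With Cleo's credence of $\tfrac{1}{2}$ in $X_1$, her imprecise state is $c^*_C = \{(\tfrac{1}{2}, t, \tfrac{1}{2} - t) : 0 \leq t \leq \tfrac{1}{2}\}$. For a coherent $c = (x_1, x_2, x_3)$ with $|x_2 - x_3| \leq \tfrac{1}{2}$, the divergence $\mathrm{SED}(c, \cdot)$ along this segment is a quadratic in $t$, minimized at $t^* = \tfrac{1}{4} + \tfrac{x_2 - x_3}{2}$ and maximized at whichever endpoint lies further from $t^*$. Averaging the minimum and the maximum gives:
$$
\mathrm{SED}(c, c^*_C) = \tfrac{3}{2}\left(x_1 - \tfrac{1}{2}\right)^2 + \left(\tfrac{1}{4} + \tfrac{|x_2 - x_3|}{2}\right)^2
$$
Both terms are minimized together at $x_1 = \tfrac{1}{2}$ and $x_2 = x_3 = \tfrac{1}{4}$, where the value is $\tfrac{1}{16}$. When $|x_2 - x_3| > \tfrac{1}{2}$, the maximum alone is at least $\tfrac{1}{2}$, so the average is at least $\tfrac{1}{4}$ and those points cannot do better.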

1 comment:

Posted this on Facebook, then figured it might be more appropriate to post here.

Interesting considerations. I wonder if there is anything useful to be said when there isn't a unique extension of the credences to the whole space, but they merely impose a restriction on the remaining values on which they are undefined.

For instance, suppose Alice assigns probabilities to $X_1$, $X_2$ and $X_3$ (which form a partition) and Bob assigns $P(X_1) = .99$. This highly restricts his assignments to $X_2$ and $X_3$, but I suspect that, for an appropriate credence function for Alice, merely aggregating these credences on $X_1$ will yield a result that couldn't be achieved by aggregating Alice's credences with any coherent extension of Bob's credences to the whole space. However, since there is no unique extension, one needs to do something more complex than merely use the unique extension. Perhaps some kind of average over the aggregates of Alice's credences with all possible extensions of Bob's credence function would meet this more demanding criterion.

There might be an interesting theorem in here ... I'll have to think about it.