5. A Specific Conditioning Inductive Logic

The illustrations of the last section have all employed the
most familiar of inductive logics, one formulated in terms of the probability
calculus. There are many more in the class of deductively definable logics.
In general, one defines a new inductive logic by
selecting any suitably well-behaved function fN in (S). Very many
such functions are possible, so there are many alternative inductive logics
definable. Most of these logics will differ in some important property from
the probability calculus.

In work elsewhere, I have followed a tradition that seeks
alternatives to the additivity of the probability
calculus. According to that additivity, if we have two mutually exclusive
propositions A and B, we find the probability of their disjunction by adding
their individual probabilities:

P(A or B | Ω) = P(A | Ω) + P(B | Ω)

This sort of additivity makes sense if low values of
probability are to represent disbelief. (For
then a low probability for A forces a high probability for not-A, so that we
strongly believe not-A, which means we strongly disbelieve A.) It is
inappropriate if low values in the inductive strengths are to reflect ignorance. (For
then we might not want a low support for A to force high support for not-A. If we
are largely ignorant about A, we are typically similarly ignorant about
not-A, so we would want low support for both A and not-A, which the
probability calculus does not allow.) Then a non-additive measure may
be more appropriate. For complete ignorance, I have argued elsewhere, we
should seek a measure that allows

[A or B | Ω] = [A | Ω] = [B | Ω]

where all three strengths are set to some "ignorance"
value.

Since I have elaborated on these possibilities elsewhere,
here I want to pursue another way that inductive logics may differ from a
probabilistic logic. The logic to be pursued below differs from the Bayesian
system in employing a different dynamics of conditionalization.

Narrowness

One of the distinctive properties of conditional
probabilities is a property I have elsewhere called "narrowness." It asserts
that P(A|B) = P(A&B|B). That means that the conditional probability
P(A|B) takes no notice of the second of the two
disjunctive parts of A = (A&B) or (A&~B).

This property can be generalized to other inductive logics
and is expressed as:

Narrowness
[A|B] = [A&B|B]

The property is so familiar as generally to pass without
comment. It is, however, a rather odd aspect of
Bayesian confirmation theory. Imagine for example that we are trying
to identify some unknown animal. Let us say that we learn it is a bird. In a
narrow logic, that evidence gives the same support to the animal being a
canary or to it being a (canary or whale), even though the evidence precludes
it being a whale:

[ canary | bird ] = [ canary or whale | bird ]

That is, Narrowness lets quite inert disjuncts pass in a
proposition without penalty, even though they play no role beyond background
noise (or worse).

In one sense, Narrowness is admissible. In the population
of birds, entities that are canaries arise exactly as often as
entities that are canaries or whales. However that statistical fact rarely
exhausts our interest in the case. The evidence that the animal is a bird
points more specifically to it being a canary
than it does to the animal being a canary or a whale. While we might assign
equal support to both possibilities in a narrow logic, we would then
generally add a step to our analysis. We would dismiss the proposition
(canary or whale) as encumbered with nuisance noise in comparison with the
proposition (canary), perhaps remarking that the evidence (bird) points
specifically to the second.

That we need to add an extra step by hand in cases like
these suggests that our inductive logic is incomplete. Shouldn't our logic
tell us directly that the two propositions are not really equally favored by
the evidence?

Defining a specific conditioning logic

We pick out an inductive logic that addresses these
problems and contradicts Narrowness by selecting the following function
fN in (S):

(SC) [A|B]SC = fN(#A&B, #A&~B, #~A&B) = (#A&B/#B).(#A&B/#A)

The basic idea behind the logic can be seen by
looking at its two terms.

The first term (#A&B/#B) is the same
as is found in the definition of probability (P). This much of the
logic is the same as a probabilistic logic.

The second term (#A&B/#A) will only
fail to be unity when A extends beyond B. Only then will the logic
differ from a probabilistic logic. This second term is a penalty paid
whenever A extends beyond B. Proposition A extends beyond B whenever
#A exceeds #A&B; so the penalty factor is their ratio.

One should read the formula in (SC) loosely as saying "a probability
with a penalty for extending disjunctively beyond the evidence."

It is this second factor (#A&B/#A) that penalizes A=(canary or
whale) when we are conditionalizing on B=bird.
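The two factors of (SC) can be sketched in a few lines of Python. This is only an illustration: propositions are represented as sets of equally weighted atoms, the function name sc_support is hypothetical, and the three bird atoms are assumed purely for the sake of the example.

```python
from fractions import Fraction

def sc_support(a, b):
    """[A|B]SC = (#A&B/#B) * (#A&B/#A), with A and B given as sets of atoms."""
    a, b = frozenset(a), frozenset(b)
    if not a or not b:
        raise ValueError("A and B must each contain at least one atom")
    overlap = len(a & b)                 # #A&B
    probability_term = Fraction(overlap, len(b))   # first factor, as in (P)
    penalty_term = Fraction(overlap, len(a))       # second factor, < 1 when A extends beyond B
    return probability_term * penalty_term

# Assumed atoms for the animal example: three kinds of bird (illustrative only).
bird = {"canary", "sparrow", "robin"}
print(sc_support({"canary"}, bird))            # 1/3
print(sc_support({"canary", "whale"}, bird))   # 1/6
```

Under these assumed atoms, the inert disjunct (whale) halves the support, whereas a narrow logic would assign both propositions the same value.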

It is easy to see that this formula will produce
an asymptotically stable logic. The
inductively adapted partitions could all be produced from one initial
partition by uniform refinements. These are refinements that replace
each atom by the same number of atoms. Then the value of
[A|B]SC will remain the same in all refinements, for the
strengths depend only on the ratios of the atom counts #A&B, #A
and #B and not on N. These ratios remain the same under the uniform
disjunctive refinements to new adapted partitions.

Here's how it works in a simple case. Let us say we
start out in some partition in which #A&B=1, #A=2 and #B=2, so
that [A|B]=1/4.

A uniform refinement will replace every single atom by the same
number of new atoms. Suppose that new number is 10. Then, in the new
partition, we will have #A&B=10, #A=20 and #B=20, so that
[A|B]=1/4 as before.
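The invariance under uniform refinement can be checked numerically. The sketch below works directly with the atom counts (#A&B, #A, #B); the helper names sc_from_counts and refine are hypothetical.

```python
from fractions import Fraction

def sc_from_counts(n_ab, n_a, n_b):
    """[A|B]SC computed from the atom counts #A&B, #A and #B."""
    return Fraction(n_ab, n_b) * Fraction(n_ab, n_a)

def refine(counts, k):
    """Uniform refinement: every atom is replaced by k new atoms."""
    return tuple(k * c for c in counts)

start = (1, 2, 2)   # #A&B = 1, #A = 2, #B = 2, as in the simple case above
for k in (1, 10, 1000):
    print(k, sc_from_counts(*refine(start, k)))   # the strength is 1/4 for every k
```

The factor k cancels in each ratio, which is why the strength does not depend on how finely the partition is refined.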

Selectivity

We can illustrate the logic's
signature property, its selectivity under conditioning, with a simple
example. A die is tossed and our evidence is that the outcome is

LOW = ⚀ or ⚁ or ⚂.

The probabilistic degrees of support and the corresponding specific
conditioning degrees of support for four outcomes are:

P( ⚀ | ⚀ or ⚁ or ⚂ ) = 1/3

P( ⚀ or ⚃ | ⚀ or ⚁ or ⚂ ) = 1/3

P( ⚀ or ⚃ or ⚄ | ⚀ or ⚁ or ⚂ ) = 1/3

P( ⚀ or ⚃ or ⚄ or ⚅ | ⚀ or ⚁ or ⚂ ) = 1/3

[ ⚀ | ⚀ or ⚁ or ⚂ ]SC = 1/3 . 1/1 = 1/3

[ ⚀ or ⚃ | ⚀ or ⚁ or ⚂ ]SC = 1/3 . 1/2 = 1/6

[ ⚀ or ⚃ or ⚄ | ⚀ or ⚁ or ⚂ ]SC = 1/3 . 1/3 = 1/9

[ ⚀ or ⚃ or ⚄ or ⚅ | ⚀ or ⚁ or ⚂ ]SC = 1/3 . 1/4 = 1/12

When conditionalizing on LOW, a probability measure does
not penalize the outcome for inert atoms that contradict the evidence LOW.
The SC logic does penalize, so that the support offered each outcome
diminishes according to the number of inert atoms
it contains.
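The values listed above can be reproduced with a short calculation. The sketch assumes that LOW comprises the faces 1, 2 and 3 and that each outcome consists of one LOW face together with zero to three inert high faces; the particular faces chosen are an assumption of the illustration, as are the function names.

```python
from fractions import Fraction

def prob(a, b):
    """P(A|B) = #A&B / #B for sets of equally probable atoms."""
    return Fraction(len(a & b), len(b))

def sc(a, b):
    """[A|B]SC = (#A&B/#B) * (#A&B/#A)."""
    overlap = len(a & b)
    return Fraction(overlap, len(b)) * Fraction(overlap, len(a))

LOW = {1, 2, 3}                                  # assumed evidence
outcomes = [{1}, {1, 4}, {1, 4, 5}, {1, 4, 5, 6}]  # one LOW face plus inert faces

for a in outcomes:
    print(sorted(a), prob(a, LOW), sc(a, LOW))
# [1] 1/3 1/3
# [1, 4] 1/3 1/6
# [1, 4, 5] 1/3 1/9
# [1, 4, 5, 6] 1/3 1/12
```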

Properties

A few properties of the specific conditioning logic are
noteworthy. If we conditionalize on the background Ω, then the logic
reverts to an additive measure. For no
proposition A can extend beyond the background Ω. A quick calculation
confirms that this is the case. If there are N atoms in Ω, then

[ A | Ω ]SC =
(#A&Ω/#Ω).(#A&Ω/#A) =
(#A/N).(#A/#A) = #A/N

However the result of conditionalizing is rather different
from what happens when we conditionalize in probability theory.

In probability theory,
conditionalization produces a new measure P(.|B), which is also an additive
measure. Because of Narrowness, we need no longer consider those parts of
the original space Ω that contradict B.

The corresponding quantity in the specific conditioning logic [.|B]SC is not
in general an additive measure. Since Narrowness is violated, we must
continue to consider the disjunctive parts of outcomes that contradict the
evidence. Indeed the essential novelty lies in outcomes being penalized for
just such parts.
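Both claims can be illustrated with a small check, again assuming a six-faced die as the background Ω and LOW = {1, 2, 3} as the evidence. The names are hypothetical and the code is a sketch, not a general proof.

```python
from fractions import Fraction

def sc(a, b):
    """[A|B]SC for propositions given as non-empty sets of equally weighted atoms."""
    overlap = len(a & b)
    return Fraction(overlap, len(b)) * Fraction(overlap, len(a))

OMEGA = frozenset(range(1, 7))   # assumed background: the six faces of a die
LOW = frozenset({1, 2, 3})       # assumed evidence

# Conditioning on the background: strengths add over disjoint propositions.
print(sc({1}, OMEGA) + sc({4}, OMEGA) == sc({1, 4}, OMEGA))   # True

# Conditioning on LOW: additivity fails once a disjunct contradicts the evidence.
print(sc({1}, LOW), sc({4}, LOW), sc({1, 4}, LOW))            # 1/3 0 1/6
```

Here [1|LOW]SC + [4|LOW]SC = 1/3, but [1 or 4|LOW]SC = 1/6; the difference is the penalty exacted for the inert atom 4.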

In the Bayesian system, we can usefully compute the
conditionalized probabilities by means of Bayes' theorem. The analog of Bayes' theorem in a specific conditioning logic is
astonishingly simple and can be read directly from the symmetry of A and B in
the definition (SC) of [A|B]SC:

[A|B]SC = [B|A]SC

This simple formula shows that degrees of support coincide
with what is called, in the Bayesian context, likelihoods. That is much less
significant than it would be in the Bayesian context, for likelihoods are much more variable in the new logic. If
we have some hypothesis H that entails evidence E, the probabilistic
likelihood would be P(E|H)=1. In a specific conditioning logic, we are no
longer assured of a simple value for the likelihood [E|H]SC when H
entails E. If E has many atoms that extend beyond H, this likelihood can be
very much less than one.
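A small numerical case may make the contrast concrete. In the sketch below, H is assumed to be a single atom and E an assumed five-atom proposition that H entails; the sets and function names are illustrative only.

```python
from fractions import Fraction

def prob(a, b):
    """P(A|B) = #A&B / #B for sets of equally probable atoms."""
    return Fraction(len(a & b), len(b))

def sc(a, b):
    """[A|B]SC = (#A&B/#B) * (#A&B/#A)."""
    overlap = len(a & b)
    return Fraction(overlap, len(b)) * Fraction(overlap, len(a))

H = {1}                  # assumed hypothesis: a single atom
E = {1, 2, 3, 4, 5}      # assumed evidence entailed by H (H is a subset of E)

print(sc(H, E) == sc(E, H))   # True: the symmetry [A|B]SC = [B|A]SC
print(prob(E, H))             # 1: the probabilistic likelihood when H entails E
print(sc(E, H))               # 1/5: E extends four atoms beyond H
```

When H entails E, the first factor of [E|H]SC is unity and the value reduces to #H/#E, which shrinks as E acquires atoms outside H.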

The better way to read this striking symmetry
[A|B]SC = [B|A]SC is as follows. In a specific
conditioning logic, when we form [A|B]SC, A is penalized for
falling short of B; and it is penalized for extending beyond B. The symmetry
in the formula merely expresses the fact that both penalties are exacted in equal measure.

One consequence is that the maximum value of [A|B]SC = 1 can only arise
when the finitely many atoms of A and the finitely many atoms of B coincide.
In the probability calculus, when P(A|B) = 1, B deductively entails A. (Or at least that is the case when we have finitely
many equiprobable atoms, so that there are no "measure zero"
outcomes.) This limiting case of deduction does not arise in a
specific conditioning logic. In it, [A|B]SC = 1 means that A is
B.
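The claim about the maximum value can be spelled out in a short derivation. It is a sketch that assumes, as throughout, that A and B are non-empty propositions built from finitely many atoms.

```latex
\[
[A|B]_{SC}
  \;=\; \frac{\#(A \& B)}{\#B}\cdot\frac{\#(A \& B)}{\#A}
  \;=\; \frac{\bigl(\#(A \& B)\bigr)^{2}}{\#A \cdot \#B}.
\]
Since $\#(A \& B) \le \#A$ and $\#(A \& B) \le \#B$, each factor is at most $1$.
The product therefore equals $1$ only if both factors equal $1$:
\[
[A|B]_{SC} = 1
  \iff \#(A \& B) = \#A = \#B
  \iff A = B .
\]
```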

When should a specific conditioning logic be used?

According to the material theory of induction, there is no
One True and Universal inductive logic. The specific conditioning logic is
definitely not offered here as
that one true and universal logic. Rather, all that is suggested is that it
is another logic that may have application in this or that domain.

Whether a particular inductive logic can be applied in a
particular domain is, according to the material theory of induction, decided by the material facts obtaining in the domain.
That means that the applicability of the logic must be decided on a case by
case basis. There is no general principle that can decide it in advance.

We can display one case in
which the material facts will support the logic. It turns out to be a case
that also supports a probabilistic logic. What decides between them is what
we would like our degrees of support to express. In forming [ hypothesis |
evidence ], do we want the evidence to support the hypothesis while ignoring
inert possibilities in the hypothesis, or while penalizing the hypothesis for
them?

This case arises when we have a process for whose outcomes
we have physical chances, in one form or another.
Such processes include the familiar coin tosses, die throws, timing of
radioactive decay, weather on particular days of the year, and so on. The
essential point is that the physical chances give us some purchase on the
relative frequency of certain outcomes.

For example, if the process is the repeated throw of a die, then we expect in the long run that we
will throw a ⚀ roughly
with frequency 1/6. We can also say that among the LOW (= ⚀ or ⚁ or ⚂) outcomes thrown, we expect the ⚀ to appear roughly with frequency 1/3.

These material facts can license a probabilistic
logic of induction, if we require that the inductive strength for
some outcome is to match, near enough, the long term frequency of
the outcome. That is, we match inductive
strength with frequency of long run truth, near enough.

We assign a strength 1/3 to a LOW die throw being a ⚀, since that outcome will
obtain roughly 1/3 of the time among LOW throws.

Or we might assign inductive strength 0.999 to the proposition that
there is at least one head in each run of ten coin tosses, for in the
long run, that will be the case a fraction 1023/1024 (roughly 0.999) of the time.
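The figure for the coin tosses follows from a one-line calculation, sketched here on the assumption of a fair coin with independent tosses; the function name is hypothetical.

```python
from fractions import Fraction

def chance_at_least_one_head(n_tosses):
    """1 minus the chance that all n tosses of a fair coin come up tails."""
    return 1 - Fraction(1, 2) ** n_tosses

p = chance_at_least_one_head(10)
print(p, float(p))   # 1023/1024 0.9990234375
```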

This is NOT another
ill-fated attempt to deliver an "interpretation of
probability." It is assumed that we already have physical chances or
physical probabilities for the outcomes and all that comes with them.
In particular, we have some version of the law of large numbers that
assures us that relative frequencies over many trials will, with
arbitrarily good chance, match the physical chances.

What is also assumed is that we match the atoms of the inductively
adapted partitions of our inductive logic with outcomes that have
equal chances. That is sufficient to enable probabilistic degrees of
belief to match long term frequencies, "near enough"--which means
within the usual qualifications of a law of large numbers.

Demands for an "interpretation of probability" have had a woeful
effect. For they demand an explicit definition, even an operational
or behaviorist definition, of a central term of a theory. We have
long since abandoned this demand for other central terms of our
physical theories, since it is generally unsatisfiable. One product
of this demand, the subjective interpretation, has proven especially
dangerous. It allows its proponents to dismiss failures of Bayesian
confirmation theory, such as the theory's failure adequately to treat
ignorance, as mere aberrations of opinions.

Let us write this a little more formally. A and B will be
outcomes each comprised of one or more atomic outcomes, where these atomic
outcomes have equal physical chances. The relative
frequency that an atomic outcome in A obtains whenever B obtains among
n trials is

Relative frequency_n(A, B)

If we set strengths of support to the probability

(P) P(A|B) = #A&B/#B

then these strengths will match the long term relative
frequencies, near enough.

Now let us turn to specific
conditioning. The problem with merely matching inductive strengths
with frequency is that we have no defense against inert atoms. In the case of
the die throw, the outcome ⚀ or ⚃
will obtain among LOW throws just as often as the simple outcome ⚀. The atom ⚃ is inert in the sense that it never
arises among LOW throws and makes no contribution to the relative frequency.
Thus it makes no contribution to the inductive strength assigned.

To arrive at probabilities, we used relative frequency by
itself as our guide in forming inductive strengths. The simple way to defend
against these inert atoms is to apply a penalty factor to these relative
frequencies. The
penalty for outcome A among outcomes of type B in trials can be defined as

Penalty(A, B) = (number of atoms in A that can be realized in the trials of type B) / (number of atoms in A)

We would then use the penalized relative frequency

Penalized relative frequency_n(A, B) = Relative frequency_n(A, B) × Penalty(A, B)

in place of the simple relative
frequency as our guide to forming inductive strengths. The two terms of the
formula (SC) match the two terms of this formula for penalized relative
frequency. The first term, #A&B/#B, will match the long term relative
frequency of A among B, as before. The second term, #A&B/#A, equals the
penalty.

Therefore, if we form our inductive strengths according to
(SC) in these cases, then, in the long run of very large n, our inductive
strengths will match up near enough with the
penalized relative frequency just defined.
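The match can be illustrated by simulation. The sketch below assumes a fair six-faced die, takes the penalized relative frequency to be the relative frequency of A among B-trials multiplied by the penalty #A&B/#A, and uses a hypothetical function name; the seed and the number of trials are arbitrary choices.

```python
import random

def penalized_relative_frequency(a, b, n_trials, rng):
    """Relative frequency of A among B-trials, multiplied by the penalty #A&B/#A."""
    b_trials = 0
    a_hits = 0
    for _ in range(n_trials):
        face = rng.randint(1, 6)        # one throw of an assumed fair die
        if face in b:
            b_trials += 1
            if face in a:
                a_hits += 1
    if b_trials == 0:
        raise ValueError("no trials of type B occurred")
    relative_frequency = a_hits / b_trials
    penalty = len(a & b) / len(a)       # atoms of A realizable among B-trials / atoms of A
    return relative_frequency * penalty

LOW = {1, 2, 3}
A = {1, 4}                              # one LOW face plus one inert face
rng = random.Random(0)                  # fixed seed so the run is repeatable
estimate = penalized_relative_frequency(A, LOW, 200_000, rng)
print(estimate)                         # close to [A|LOW]SC = 1/6
```

With many trials the estimate settles near 1/6, the value that (SC) assigns to [A|LOW]SC, within the usual qualifications of a law of large numbers.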