Tuesday, June 25, 2013

There’s been an idea
circulating for some time that retributive justice is morally and logically
founded upon the fact that we possess a thing called free will - some weird, assumed
mechanism that disconnects human behaviour from the normal
cause-and-effect based evolution of nature. If, after all, human actions were
really ‘just’ the result of mechanistic microscopic processes, then whatever we
do would be entirely determined by the laws of physics and the configuration of
our environment. And if this were really so, then whatever somebody does is a
consequence of the fact that they could not have willfully done otherwise, in which case
there is no sense in which a person can be blamed for doing wrong. And if
culpability cannot be established, then doesn't the validity of punishment
look suspect? So prevalent is this idea that it forms a major part of
contemporary legal philosophy.

Not only is this idea of
free will completely nonsensical, but the connection between it and the justice
of retribution is totally unfounded. Vengeance, after all, is really just an
expression of anger. Is anger rational? Is it a reliable, systematic producer of well-judged behaviour? Or is it merely a crude and ancient heuristic moderator
of human interaction that in a modern, enlightened era, we could do with much
less of?

There is simply no
logical link between culpability and the righteousness of retributive punishment, which somehow ‘repays a debt to society.’ Try to derive this principle
logically, and you will find it impossible without directly assuming the
desired outcome among the required premises.

What we must see instead
is that, in line with more agreeable consequentialist moral philosophies, the
only appropriate consideration when assigning juridical interventions is: what
actions will lead to a better society for us, and for our children to grow up
in? In this case, the problem justifying enforced treatment (e.g. imprisonment)
upon somebody who ‘couldn't have acted any other way’ disappears
completely. The enforced treatment is only indirectly determined by the
person’s actions, and is wholly derived from what we would like the world to
look like in the future. The relevance of past behaviour is limited to the
extent to which it serves as a predictor of future behaviour. What are
traditionally viewed as punishments - justice administered for the satisfaction
of the victims - become more properly viewed as treatments, designed to
minimize the cost for society of a person’s demonstrated antisocial tendencies.

The desire for revenge against a person who has committed wrongs against us is likely to be at least partly genetic in origin, naturally selected for its self-preserving value (it is advantageous for me to create an environment in which another's bad behaviour toward me makes
life uncomfortable for them), but the idea linking this concept of justice to
free will seems to be far more memetic than genetic: it is a matter of culture.

The concept that free will
is necessary and sufficient to justify the punishment of
moral failing seems to date back to Aristotle, in the Nicomachean
Ethics. I'm no scholar of Aristotle, but to me it's not clear
whether for him the appropriateness of blame has a consequentialist or an
absolutist foundation - are praise and blame desirable because they make
certain modes of future behaviour more likely, or because they try to balance
what has happened in the past?

If I had to speculate on
the reason for the cultural success of the notion specifically linking
retribution to free will, I’d guess that it was found to come in very useful
when dictators wrestled with the seemingly contradictory goals of being loved,
yet being utterly feared.

How can you be brutally
violent against your enemies, while remaining admired by the rest of the
population? One way would seem to be to claim that violence
against certain people is morally just, even necessary. “It made me cry to do
that to him, but his crimes left me no choice.” Such pious adherence to absolute moral
principle, even when it demands the most unpleasant actions, might even elevate
a thug to saintly status, bringing joyous tears to the eyes of his devoted followers.

In the course of time,
it may be that neuroscience, experimental psychology, and the social sciences
will come to the conclusion that a better society is generally one in which
people’s innate desire for vengeance is somewhat fulfilled (I doubt this, as
I’ll explain shortly), but this would not undermine the principle that
treatment of criminals should be determined on purely consequentialist grounds.
If it happened to be that this desire was so strong, and so innate that no amount
of cultural evolution could remove it, and that the frustration of unplacated
victims of crime was so intense as to threaten civil unrest, then a retributive
element may need to be restored, but the ultimate reasoning would be the
rational evaluation of different courses of action, and selection in favour of
those strategies determined to be in society’s best interests.

The debate between
absolutist and consequentialist moral philosophies has been going on for a long time: consequentialism goes at least as far back as Machiavelli, around 500
years ago. Absolutism goes much further back, and persists still. This is
really quite surprising - it's not a difficult problem to solve. All morality is
manifestly consequentialist, no matter what we might profess. Wait a moment,
‘thou shalt not kill.’ It doesn't get much more absolutist than that, does it?
No, it doesn't. But just how absolutist is that, exactly?

For starters, no society
implements principles like this in the strict absolutist way. Christians
believe that this basic rule, ‘thou shalt not kill’ was handed to them by their personal deity: thou shalt not kill means that killing is
absolutely wrong, under all circumstances - no exceptions allowed. It's never
stopped Christian nations going to war when they felt like it. It never
prevented Christian inquisitors burning people at the stake when the winter
nights were dark and cold. All assumed absolutist principles have always been
tacitly appended with a host of additional clauses beginning with the word
‘Unless...’ This is pure consequentialism.

Well, maybe those people
adding their arbitrary ‘unless’ clauses were simply bad moralists. Thou shalt
not kill is a good rule after all, right? Yes, typically. But what if the
person who you are invited to consider killing has a strong ambition to kill
you at the earliest convenient moment? Or alternatively, what if that person
suffers intolerably, with no hope of improvement, ever? Killing cannot be said
to be categorically wrong under all circumstances - it all depends on the
consequences.

Finally, absolutist
versions of morality, in the sense that the content of the principle, “X is
wrong,” takes precedence over the actual likely outcomes of performing X, are
actually demonstrably incoherent. Lay aside the problem of what could possibly
be the source of any absolute moral principle. Suppose for a moment that such
principles really are set by some divine entity. What then? These
moral laws are obviously not physical laws, since we have the capacity to
systematically deviate (if we didn’t, they wouldn’t be called moral laws in the
first place). Thus, somewhere in the process of our minds, decisions are made
about whether or not to follow a particular moral principle at a particular
time. If we believe that Godzilla will roast us alive for eternity if we fail
to follow the rules, then those predicted consequences are what guide our
behaviour. Moral decisions are always the result of a consequentialist
evaluation of the options.

Going a little beyond
the standard terminology, then, morality is absolute, but with only one
rule: “whatever actions are revealed by a rational analysis to be most likely
to bring me closer to achieving my goals are the actions I should implement.”
This is exactly as I demonstrated in an earlier article on scientific morality. Furthermore, it illustrates that the founding principles of that
argument, (1) goodness does not exist outside minds and (2) morality is doing
what is good, are both properly basic: they are necessarily correct, and our
knowledge of them is not contingent upon empirical observations.

Let's get back to the
potential role of retribution in an advanced consequentialist morality. The
extent to which the will to see wrongdoers punished is genetically innate, as
opposed to culturally transmitted, is certainly an interesting question, and
one whose investigation would no doubt require some ingenious experimental
protocols. But I strongly suspect that the innateness of these feelings is
limited to an extent that can easily be overruled by rationality, allowing
vengeance to be effectively eliminated from all consideration in the problem of dealing with criminals. There are several
reasons for this suspicion.

Firstly, if we look at the portion of the population most commonly found expressing anger, I'm fairly sure it'll be small children. Anger is, we all recognize, a childish emotion. We grow out of it. We learn (with great relief to most, I presume) to control it, and when as adults we occasionally succumb to emotional outbursts, we typically feel silly afterwards. As advanced society has developed, we have continually learned, oh so painfully slowly, that anger and resentment typically achieve little except the propagation of more anger and resentment.

Secondly, there seems to be considerable evidence showing that the traditional practices of retributive justice have failed miserably. This paper, for example, argues strongly that imprisonment is ineffective at reducing the frequency and intensity of crime, and that alternative treatments such as education achieve greater reductions of recidivism. Another article summarizes some of its findings: "Research into specific deterrence shows that imprisonment has, at best, no effect on the rate of reoffending and often results in greater rates of recidivism." The utilitarian advantages of a more rational approach seem to be there for the taking.

Thirdly, whatever
memetic components there are, supporting any in-built tendency to desire
vengeance, they can, by definition, be overcome by changing our culture.

Fourthly, religious
leaders throughout history seem to have made artful use of the philosophy of free will in order to
bolster acceptance of their reign of terror (hell doesn’t seem very fair, if
all your actions are fixed by the way God set up the boundary conditions, and
so damnation only gains a veneer of coherence if we have free will - a notion
that evidently has to extend to the mortal plane, in order to justify certain
historical hobbies of the major religions). This suggests that the hard-wired
machinery of anger was, stripped of any socially conditioned props, insufficient to
sustain the required levels of violence in our increasingly sophisticated culture.

When it comes to figuring out how to deal with crime, therefore, it is irrational to decide based on a shortsighted lust to see a criminal's debt repaid through suffering. Instead, we must look to scientific data to decide what courses of action minimize the costs to society. We must seek to understand what treatments will cost-effectively turn today's rule breakers into tomorrow's contributors to society, and what measures will economically eliminate the desire and the opportunity to commit crimes in the first place.

Monday, June 17, 2013

There is a popular folk theorem among some Bayesians, to the effect that it is unacceptable for a probability to be 0 or 1. There's a simple motivation for this principle: as rationalists, we demand the opportunity for nature to educate us by blessing us with novel observations. No matter how confident we become in some proposition, it should always be possible for us to change our minds when strong enough evidence accumulates in favour of some alternative. As Karl Popper rightly observed, after all, a theory that is invulnerable to falsification is not much of a theory.

But what happens if P(H | I) becomes zero? How is the probability for the hypothesis, H, to be updated by new evidence? If P(H | I) is 0 then the numerator in Bayes' theorem, prior times likelihood,

P(H | I) × P(D | HI),

is also 0, regardless of how convincing the data, D, may be. No matter what happens, the outcome is unchanged: a nice round posterior.

Similarly, if P(H | I) is 1, then for the converse hypothesis, P(~H | I) is necessarily 0. Now, the denominator in Bayes' theorem is

P(H | I) × P(D | HI) + P(~H | I) × P(D | ~HI)

and when the second term (everything after the plus sign) is zero, both numerator and denominator in Bayes' theorem are the same, producing the ratio 1, for all eternity.
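The arithmetic above can be made concrete with a minimal sketch of a single Bayesian update (the function name and numbers here are hypothetical, purely for illustration). A prior of 0 or 1 is frozen in place, while an intermediate prior responds to the evidence as it should:

```python
def bayes_update(prior, lik_h, lik_not_h):
    """Posterior P(H | DI) from the prior P(H | I) and the two
    likelihoods P(D | HI) and P(D | ~HI)."""
    numerator = prior * lik_h
    denominator = prior * lik_h + (1 - prior) * lik_not_h
    return numerator / denominator

# Extreme priors are immune to any evidence:
print(bayes_update(0.0, 0.99, 0.01))  # 0.0, however strong the data
print(bayes_update(1.0, 0.01, 0.99))  # 1.0, for all eternity
# An intermediate prior moves with the evidence:
print(bayes_update(0.5, 0.9, 0.1))    # 0.9
```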

I have sympathy with this motivation, therefore, but as a general rule, it is utter nonsense, resulting from forgetting one of the most basic facts about how inference works. The mathematics I have just described is all correct, but there are other ways for us to change our minds, and retain our rationality.

A recent, brief discussion at another website drew my attention to an article by Eliezer Yudkowsky, in which he also argues that 0 and 1 are not probabilities. The argument is a little different: the amount of evidence (the likelihood ratio expressed in log-odds form) needed to update an intermediate probability to 0 or 1 is infinite. Such infinite certainty, he claims, is an absurdity that cannot be represented with real numbers, and so 0 and 1 aren't probabilities.
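The log-odds point is easy to see numerically: as a probability approaches 1, its log-odds representation runs off to infinity, so no finite quantity of evidence completes the journey. A small illustration, using base-10 log-odds (the function name is my own, hypothetical):

```python
from math import log10

def log10_odds(p):
    """Probability p expressed as base-10 log-odds; each unit
    corresponds to a tenfold strengthening of the odds."""
    return log10(p / (1 - p))

for p in (0.9, 0.99, 0.999, 0.999999):
    print(p, round(log10_odds(p), 3))
# The log-odds grow without bound as p -> 1; p = 1 itself
# demands division by zero, i.e. infinitely strong evidence.
```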

Yudkowsky, as many readers will know, is a well-regarded thinker and writer on the topic of applied rationality, and I can recommend his writing most highly. The overlap between his broad philosophy and mine is, I would say, very large, with the main difference that in cases where I lack mastery of the theoretical apparatus, he very often does not. Yudkowsky knows and understands the mind-projection fallacy better than the vast majority (see for example his article of the same name, and this followup), but in this instance, he seems to have forgotten it. It is essentially the same error made by all who claim that probabilities equal to zero or one should not enter one's calculations.

A little thought experiment, then, before resolving the paradox. Let H be the hypothesis that in some five-day interval, at some location on the Earth, the sun will rise on each of the five mornings. Let D represent the observation of the sun rising on the first of the mornings in question. What is P(D | HI)? I humbly submit that it is 1. Is H, therefore, not an appropriate, well-formed hypothesis? Is D not a valid observation? Evidently, if probability theory is to have any power at all, it must be capable of supporting hypotheses such as H, and data as trivial as D. It is not conceivable to have such things automatically ruled out under our epistemology.

In general, it is perfectly legal for P(D | HI) (or, for that matter, a posterior, like P(H | DI)) to be 0 or 1, but here's that basic fact about probability that we have to keep in mind: a probability can not be divorced from the model within which it is calculated. A model may imply infinite certainty, without any person ever achieving that state (which would be impossible to encode in their brain, anyway). Our notation says something very important: P(D | HI), no matter what it is, is necessarily contingent upon the conjunction HI, which obviously depends on the truth of I. This is something we can never be absolutely certain of.

The all-important "I" that forms the foundation for every Bayesian calculation is usually said to stand for 'information' - all the relevant prior knowledge we have. Unfortunately, this creates a little trap that too many fall into, which is to forget that there is another component besides information needed before "I" is fully populated. "I" could just as easily stand for 'imagination.' To get Bayes' theorem to do any useful work for us, we have to specify a theoretical framework. We have to make certain assumptions, including specification of a full set of hypotheses against which H is to compete. To arrive at a candidate set of hypotheses, we must make a leap of the imagination. There is no possible criterion for judging whether or not all our assumptions are correct, and no way to know in advance whether we have chosen the 'correct' set of hypotheses. To think otherwise is just wishful thinking.

To think that the infinite confidence implied under some "I" represents the actual infinite confidence of some physical rational agent is the mind-projection fallacy. Instead, a probability is a model of the confidence a rational agent would have if "I" was known to be true. That this confidence might need to be modelled using a non-numeric concept such as infinity is merely an uncomfortable (though often highly convenient) mathematical fact.

And now we can see how it is that we can continue to accrue knowledge under the threat of the apparent epistemological cul-de-sac that is P = 1 or P = 0. To liberate ourselves from the straitjacket of "I", we simply need to recognize that what we now call "I" is itself merely a hypothesis in some broader hierarchical model. This is how model checking (wielding the analytical blade of model comparison) works, which, as I pointed out before, seems philosophically unpalatable to many, yet is in fact an essential ingredient in our inferential machinery. This is how we can come to look again at our theoretical framework and say 'hold on, I should be working with a different hypothesis space.' Novel theories and scientific revolutions would be impossible without this flexibility.
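A toy illustration of this kind of model comparison (all names and numbers are hypothetical): take a coin and two frameworks, I1, which asserts with certainty that the coin is fair, and I2, which allows a range of biases. Comparing their marginal likelihoods, P(D | I1) and P(D | I2), lets the data tell us when the narrower framework should be abandoned:

```python
from math import comb

def binom_lik(k, n, p):
    """P(k heads in n tosses | bias p)."""
    return comb(n, k) * p**k * (1 - p)**(n - k)

k, n = 9, 10  # observed: 9 heads in 10 tosses

# Framework I1: the coin is fair, with certainty.
evidence_i1 = binom_lik(k, n, 0.5)

# Framework I2: bias unknown, uniform over a grid of values.
grid = [i / 100 for i in range(1, 100)]
evidence_i2 = sum(binom_lik(k, n, p) for p in grid) / len(grid)

# With equal prior odds on the two frameworks, the posterior
# odds favour the broader model I2:
print(evidence_i2 / evidence_i1)
```

Within I1 alone, P(fair | I1) = 1 forever; only by stepping up a level in the hierarchy can the data dislodge it.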

Some see this need in Bayesian epistemology to make assumptions in "I" that can't be established with certainty as a severe weakness, but it isn't - at least not one that can be avoided (no matter how many black belts we hold in the ancient art of self deception). We can always extend the scope of our hypothesis space so that some of our assumptions become themselves random variables in a wider inferential context, but to have all of them take on the role of hypotheses under test would require an infinitely deep hierarchy of models. In the example above, where H was a hypothesis about the sun rising, one might argue that a more sophisticated model would account for the possibility, however small, that my sensation of the sun rising was mistaken. Indeed, this is correct, and would prevent the likelihood function going to 1. Sooner or later, though, I'm going to have to introduce a definitive statement - one that supposes something to be definitely true - in order to avoid the intractable quagmire of infinite complexity.

The early frequentists (and some still, in private communication with me), claimed that this subjectivity of Bayesian probability is its downfall, but in reality, it is impossible to learn in a vacuum. No kind of inference is possible without assumptions. Part of the beauty of Bayesian learning is that we make our assumptions explicit. The frequentists, of course, also make assumptions (see Yudkowsky, for example), but by refusing to acknowledge them, like the fabled ostrich sticking its head in the sand, they eliminate the possibility of examining whether or not they are reasonable, of understanding their consequences, or of correcting them when they are manifestly wrong.

About Me

I'm behind the grasshopper. I'm a physicist at the University of Houston. I work on radiation monitoring, using pixelated particle detectors, for NASA's astronauts. Previously, I worked in x-ray imaging and, before that, in semiconductor physics. (I don't know if the grasshopper has his own blog.)