The credit risk world likes to work with ‘odds’ and related quantities so these are covered today.

You could just do everything in terms of probability, i.e. PD, which is unambiguous. PD lies in [0,1], and a small number (like 0.002) indicates a better
customer than a bigger number (like 0.013). In typical modelling
situations (in Australia, in the good times...), a lot of PDs would have
one, two or even three leading zeroes, and these numbers are not handy
for transcription or for quickly conveying which zones they lie in.

It goes without saying that it is often more palatable to format a PD as a percentage, e.g. PD = 0.013 as PD = 1.3%.

‘Odds’ have a special status because they are intimately linked with
logistic regression, the main PD-modelling statistical tool. Odds
can be worked out from the PD, and vice versa, as follows:

odds = (1 - PD) / PD

PD = 1 / (1 + odds)

Odds are generally taken to be the Good:Bad odds; thus a bigger number for odds is a better
situation. I have seen analysts using odds the other way up, i.e. the
Bad:Good odds. You can come out alive but it will confuse your
colleagues; +/- changes of sign will cascade through and
graphs will tilt the opposite way.
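
As a minimal sketch of the PD-to-odds conversion under the Good:Bad convention (the function names pd_to_odds and odds_to_pd are made up for illustration, not taken from any library):

```python
def pd_to_odds(prob_default: float) -> float:
    """Good:Bad odds for a given probability of default."""
    if not 0.0 < prob_default <= 1.0:
        # PD = 0 would mean infinite odds, so it is rejected here.
        raise ValueError("PD must lie in (0, 1]")
    return (1.0 - prob_default) / prob_default


def odds_to_pd(odds: float) -> float:
    """Probability of default for given Good:Bad odds."""
    if odds < 0.0:
        raise ValueError("odds must be non-negative")
    return 1.0 / (1.0 + odds)


# A smaller PD gives bigger Good:Bad odds, i.e. a better customer.
for prob_default in (0.5, 0.013, 0.002):
    print(prob_default, pd_to_odds(prob_default))
```

Swapping to Bad:Good odds would amount to taking the reciprocal of pd_to_odds, which is where the sign flips further down the chain come from.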

One step closer to the logistic zone is to transform to “log_odds”.

log_odds = ln(odds)

odds = exp(log_odds)

‘ln’ means natural logs, i.e. to the base ‘e’. Actually, mathematicians always
mean natural logs when they say log and as a matter of pride would
never mention the base, or contemplate a base other than ‘e’ unless it
was a neat way to summarise a problem that had structure particular to
integral bases. Ambiguity can arise: computer systems that are
tech-oriented, like SAS or MATLAB, assume ‘log’ means ln, whereas those
that are business-oriented, like MS/Excel, assume that ‘log’ means
log_to_base_10. It also doesn’t help that ‘ln’ is not comfortable
in speech.

By ‘log’ I always mean natural log, and I use log10 or
log2 to mean logs to base 10 or 2. For the time being,
the terminology ‘log_odds’ will be used, which is easy in speech,
but if anyone can suggest better nomenclature they are welcome to
put it forward.

If we’ve made the right choices so far, a bigger number for log_odds is a better situation. Note that log_odds can be negative (when odds < 1, which is when PD > 0.5).
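
A rough Python sketch of the log_odds step follows. In Python, math.log with a single argument is the natural log, which matches the convention used here; the function names are hypothetical.

```python
import math


def pd_to_log_odds(prob_default: float) -> float:
    """Natural log of the Good:Bad odds; requires 0 < PD < 1."""
    if not 0.0 < prob_default < 1.0:
        raise ValueError("PD must lie strictly between 0 and 1")
    odds = (1.0 - prob_default) / prob_default
    return math.log(odds)  # natural log, i.e. ln


def log_odds_to_pd(log_odds: float) -> float:
    """Invert the transformation: odds = exp(log_odds), PD = 1 / (1 + odds)."""
    odds = math.exp(log_odds)
    return 1.0 / (1.0 + odds)


# log_odds is zero at PD = 0.5 and goes negative once PD exceeds 0.5.
for prob_default in (0.002, 0.5, 0.8):
    print(prob_default, pd_to_log_odds(prob_default))
```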

To make the numbers more convenient to handle, it is common practice
to convert the log_odds to a ‘score’ on a user-friendly scale that
doesn’t involve negatives or decimal places. For the first time in this
chain of transformations, arbitrary scaling constants are involved:
one for location and one for scale (spread). A typical approach is illustrated below:

for location: bang a stake in the ground at the point that will
represent odds of 1 (== log_odds of zero == PD of 0.5): so, for example,
choose a score of 500 to represent this point (which BTW would be a lousy customer)

for scale: this is normally done by specifying how many points it takes to double the odds
(PDO). A comfortable choice would be PDO=20, which says that a score of
520 <=> odds=2, 540 <=> odds=4, 560 <=> odds=8 etc.

Because log_odds is a logarithmic scale, the above choices work out
and amount to a linear transformation of log_odds to score. The two
scaling parameters, and hence the transformations from log_odds to score
and back, will depend on these fairly arbitrary choices.
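
Spelled out, with the example choices above (a score of 500 at odds of 1, and PDO=20), the linear transformation is score = 500 + (PDO / ln 2) * log_odds. A minimal sketch, assuming those example constants and using made-up function names:

```python
import math

BASE_SCORE = 500.0  # example choice: score that represents odds of 1
PDO = 20.0          # example choice: points to double the odds


def log_odds_to_score(log_odds: float,
                      base_score: float = BASE_SCORE,
                      pdo: float = PDO) -> float:
    """Linear map from log_odds to score; slope is PDO / ln(2)."""
    return base_score + (pdo / math.log(2.0)) * log_odds


def score_to_log_odds(score: float,
                      base_score: float = BASE_SCORE,
                      pdo: float = PDO) -> float:
    """Inverse of log_odds_to_score."""
    return (score - base_score) * math.log(2.0) / pdo


# Each doubling of the odds adds PDO points to the score.
for odds in (1, 2, 4, 8):
    print(odds, log_odds_to_score(math.log(odds)))
```

The slope PDO / ln 2 is about 28.85 when PDO=20, so one unit of log_odds is worth roughly 29 points on this particular scale. Different choices of base score or PDO give a different scale but the same ordering.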

PDO=20 gives a nice granularity to the scores, which will mostly land
in the 500-800 zone and you won’t feel the need to use decimal points
i.e. whole-number scores suffice. As long as PDO is chosen to be
positive, it will still be the case that a bigger score is a better situation.
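
To see the whole chain end to end, here is a hedged sketch that goes from PD straight to a whole-number score and back. It assumes the example scale above (500 at odds of 1, PDO=20); the function names are hypothetical.

```python
import math


def pd_to_score(prob_default: float,
                base_score: float = 500.0,
                pdo: float = 20.0) -> int:
    """PD -> Good:Bad odds -> score, rounded to a whole number."""
    if not 0.0 < prob_default < 1.0:
        raise ValueError("PD must lie strictly between 0 and 1")
    odds = (1.0 - prob_default) / prob_default
    return round(base_score + pdo * math.log2(odds))


def score_to_pd(score: float,
                base_score: float = 500.0,
                pdo: float = 20.0) -> float:
    """Score -> odds -> PD (exact only up to the rounding of the score)."""
    odds = 2.0 ** ((score - base_score) / pdo)
    return 1.0 / (1.0 + odds)


# With a positive PDO, a lower PD always gives a higher score.
for prob_default in (0.5, 0.013, 0.002):
    print(prob_default, pd_to_score(prob_default))
```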

All the above transformations are absolute arithmetic ones that
always apply, irrespective of context such as outcome window, default
definition, calibration, closed goods in/out, etc. If you find you
disagree with someone via these calcs, it means you started from
different contexts and therein lies the entire explanation for your
disagreement.