Monday, 2 January 2017

Having recently put up here a couple of posts about mistakes, I feel I should perhaps say why I find (some) mistakes fascinating.

As you would expect, there are a number of reasons.

Perhaps a mistake casts interesting light on someone's thought processes, revealing the way a mathematician was approaching a problem or what was in their head as they tackled it.

Perhaps it is simply a cautionary example: seeing how someone else has erred helps one avoid making the same mistake.

Yet another reason is not exactly schadenfreude (though that may be part of it) but that seeing better people than me make errors is encouraging: I make lots of mistakes and it's helpful to realise that most other people do too!

Now, I worked for many years as a software engineer on safety-critical systems, and I have taught software engineering.

Errors occur too often in software, and the better we understand how we make errors, the more likely we are to be able to reduce their frequency.

When I was writing software, I felt that there were lessons to be learnt from railway accidents, particularly those where a remarkable combination of circumstances defeated what had seemed to be an infallible system.

You might have had considerable faith in the Tyer electric tablet system, which for many years after its introduction prevented the dreadful collisions that had occurred on single-track lines when two trains travelling in opposite directions entered the same section: but at Abermule in 1921 a combination of many tiny lapses by several individuals subverted what had appeared to be an infallible system.

Equally unlucky was the Hull Paragon accident in 1927, when two apparently independent slips by signalmen interacted in an extremely unlikely way to subvert the signalling system which protected the trains (the photo above comes from www.railwaysarchive.co.uk).

As a software engineer, I felt there was a lot to be learnt from thinking about such system failures: could I be confident that my own system could not fail in some unlikely combination of circumstances, when the accidents at Abermule and Hull Paragon show how even the most apparently secure systems can fail?