HSBC UK managed to get its beleaguered e-payments system back online yesterday evening after the second extended outage in less than a fortnight.
Retailers using the system were unable to process transactions from about 5pm (BST) on Monday until Tuesday evening.
Similar problems left merchants unable to take payments for two …

COMMENTS

Here we go...

Queue lots of people who know nothing about banking IT making comments about them not having any DR, saying this is what you get for systems running VB code, that this sort of problem should be avoidable (which sort of problem?), etc. etc. etc.

Banking IT

IT in banking is extremely complex and has been made more so by the different platforms which so many different banks use. Any payment system is going to be inherently unstable, because it has to interact with a number of other disparate systems, and that's where VB hacks come in to glue the whole thing together.

Add to that the fact that if a complex and interdependent system goes down, it is often very difficult to restart quickly: some parts will have to be taken down in order that others may start, and the dependencies can be the death of you.
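A minimal sketch of that restart-ordering headache: model the estate as a dependency graph and topologically sort it, so each part only starts after everything it depends on. The service names here are entirely hypothetical, not anything from HSBC's actual estate.

```python
from graphlib import TopologicalSorter  # stdlib, Python 3.9+

# Hypothetical dependency map: each service maps to the set of
# services that must already be up before it can start.
deps = {
    "merchant_gateway": {"auth_broker", "database"},
    "auth_broker": {"database", "mainframe_link"},
    "mainframe_link": set(),
    "database": set(),
}

# static_order() yields a safe start-up sequence, dependencies first;
# reverse it to get a safe shutdown sequence.
start_order = list(TopologicalSorter(deps).static_order())
shutdown_order = list(reversed(start_order))
```

The pain the comment describes is exactly when this graph is undocumented, or worse, cyclic — `TopologicalSorter` raises `CycleError` in that case, and in real life you're restarting things by trial and error at 3am.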

Who's betting...

...a small but critical part of this was running on a box under someone's desk, it got 'accidentally' turned off, and the delay was in working out exactly *which* desk it was under and finding someone who could remember the admin password to fire it all back up?

poor sods

Pedant, moi?

@Fraser

It's actually "Cue", as in a signal, such as a word or action, used to prompt another event in a performance, such as an actor's speech or entrance, a change in lighting, or a sound effect; not a line of waiting people or vehicles, more commonly found in any British "fast food" outlet where the concept of FAST is above them...

Ok...

I wasn't suggesting that this shouldn't have happened, or that it isn't a massive screw up, rather that banking systems are far more complex than generally given credit for.

Consider: a production server fails, and you fail over to the DR server with its copy of the disk at a remote site. Easy, probably even automated.

However: a database corruption occurs. This corruption would have been instantly transferred to the DR disk, so that DR server is totally useless. You (probably) have snapshots from start of day, or end of previous day (pre-batch). Did the batch corrupt the database? Do you want to recover from pre-batch, or post-batch? If the batch corrupted the database, how? Do you need to re-run the batch, and can it be re-run the next night? How long does it take to run? Did another server cause the corruption? At a guess, a dial-in system like merchant handsets would be using a bigass Unix box, or a Tandem, almost certainly talking to a back-end mainframe, probably via some sort of broker... etc. etc. etc.
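The core of that scenario, synchronous replication faithfully mirroring the corruption while only a point-in-time snapshot stays clean, can be shown with a toy model (all names and values here are made up for illustration):

```python
import copy

# Toy "database": a dict of account balances. The DR copy is kept in
# sync by replicating every state change verbatim.
primary = {"acct_1": 100, "acct_2": 250}

snapshots = []
snapshots.append(copy.deepcopy(primary))   # start-of-day (pre-batch) snapshot

# Overnight batch introduces a corruption...
primary["acct_1"] = None                   # bad write

# ...which synchronous replication dutifully mirrors to the DR site.
dr_copy = copy.deepcopy(primary)

# Failing over to DR gets you the same broken data; only the snapshot
# taken before the batch is clean, at the cost of losing the batch's work.
restored = copy.deepcopy(snapshots[-1])
```

Which is the point: the failover path is automated and fast, but deciding *which* point in time to roll back to, and what to re-run afterwards, is a human judgment call, and that's where the hours go.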

This is just one of many scenarios that could have happened, and a rather simplified one at that, but it illustrates how DR can be rendered pretty useless. It doesn't even consider the requirement to recover from tape.