Buses will replace trains between Clifton Hill and Heidelberg due to level crossing equipment damaged by a motor vehicle at Fairfield. Buses have been ordered but will take time to move into position. Please listen for announcements or speak to Metro Customer Service staff for more information.
- Somebody

Contact Carnot

Delays to over 60 minutes due to a computer failure affecting the metropolitan area. Trains will hold at available platforms until further notice. Please listen for announcements.
Passengers are advised to please defer travel where possible, or seek alternative transport.

Whatever happened to redundancies? Without knowing more, I'm just speculating, but (1) if this was a hardware failure, there should be redundancies to take over (this should extend as far as a complete secondary control centre that can take the load in the event the primary control centre goes offline) and (2) if this was a software failure, the problem should have been caught in testing.

The computer failure has shut down the entire network, according to the map on the Metro website.
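
To make the redundancy point concrete, here is a minimal Python sketch of a primary/secondary failover. It is only an illustration of the general idea: the `ControlCentre` class, its `healthy` flag, and `route_command` are hypothetical names made up for this example, and nothing here reflects how Metro's actual control system works.

```python
from dataclasses import dataclass


class AllCentresDown(Exception):
    """Raised when no control centre can accept a command."""


@dataclass
class ControlCentre:
    # Hypothetical stand-in for a real control centre endpoint.
    name: str
    healthy: bool = True

    def send(self, command: str) -> str:
        if not self.healthy:
            raise ConnectionError(f"{self.name} is offline")
        return f"{self.name} handled {command}"


def route_command(command: str, centres: list[ControlCentre]) -> str:
    """Try each centre in priority order and return the first success."""
    for centre in centres:
        try:
            return centre.send(command)
        except ConnectionError:
            continue  # This centre is down; fall through to the next one.
    raise AllCentresDown(f"no centre could handle {command!r}")
```

With a list like `[primary, secondary]`, commands go to the primary while it is healthy and move to the secondary only when the primary raises `ConnectionError`. If both are offline the caller gets an explicit `AllCentresDown` rather than a silent failure.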

Contact TheMeddlingMonk

Glad I am in Thailand on holidays.

Take a look at this mess for the Hurstbridge line.

I take it Metrol hasn't been replaced like it was supposed to be?

Posted: 13 Jul 2017 17:24

Contact railblogger

I don't know what happened, but as a general rule it's impossible in software to test for every single thing that could ever go wrong, and there is no such thing as a bug-free program. There will always be something out of the blue that no one ever thought of. Programmers have a saying: "Murphy was an optimist". Having said that, there could perhaps have been better handling of an unknown error.

Contact Lad_Porter

Of course - I totally agree. It's impossible to check for every single possibility. However, one is supposed to design software to degrade or fail "gracefully"; if you don't implement proper exception-handling, then your program can just completely crash (and possibly corrupt useful data in the process) if something unexpected happens.
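
As a rough sketch of what failing "gracefully" can look like, the Python below catches an unexpected error during an update, keeps the last known good data, and flags that it is running in a degraded state instead of crashing. The `DepartureBoard` class and its `refresh` method are hypothetical, invented purely for illustration.

```python
import logging

logger = logging.getLogger("departure_board")


class DepartureBoard:
    """Hypothetical display that keeps its last good data if an update fails."""

    def __init__(self):
        self.departures = []   # Last known good data.
        self.degraded = False  # True while showing stale data.

    def refresh(self, fetch):
        try:
            # Build the new state completely before touching the old one.
            new_data = list(fetch())
        except Exception:
            # Unknown failure: log it, keep the old data, flag degraded mode.
            logger.exception("refresh failed; keeping last known departures")
            self.degraded = True
            return
        # Swap in the new data only after the fetch fully succeeded.
        self.departures = new_data
        self.degraded = False
```

The design choice is that the old data is never modified until the new data has been fetched in full, so a failure partway through cannot leave the board holding half-written state.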

The fact that this computer failure brought down the whole network would suggest that there were no redundancies (or they all failed as well) or that the system or software has not been designed with quick recovery in mind if something does cause a catastrophic error.

Edit: The other point to add is that testing should try to cover all the common scenarios in which a catastrophic failure can occur and test the system's ability to handle and recover from them.
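
For what testing failure scenarios might look like in practice, here is a small sketch using Python's standard `unittest` and `unittest.mock` modules to inject faults deliberately. The `current_timetable` function and the scenarios chosen are assumptions made for the example, not a description of any real rail system.

```python
import unittest
from unittest import mock


def current_timetable(live_feed, cached_timetable):
    """Return live data when available, otherwise the cached timetable."""
    try:
        return live_feed()
    except (ConnectionError, TimeoutError):
        return cached_timetable


class FailureScenarioTests(unittest.TestCase):
    CACHED = ["cached timetable"]

    def test_normal_operation(self):
        feed = mock.Mock(return_value=["live timetable"])
        self.assertEqual(current_timetable(feed, self.CACHED), ["live timetable"])

    def test_feed_offline(self):
        # Inject a fault: the feed raises as if the server were unreachable.
        feed = mock.Mock(side_effect=ConnectionError("feed offline"))
        self.assertEqual(current_timetable(feed, self.CACHED), self.CACHED)

    def test_feed_times_out(self):
        # Inject a different fault: the feed is too slow to respond.
        feed = mock.Mock(side_effect=TimeoutError("feed too slow"))
        self.assertEqual(current_timetable(feed, self.CACHED), self.CACHED)
```

Each test simulates one anticipated failure and checks that the system recovers to a usable state. Failures nobody anticipated would still slip through, which is where a general fallback for unknown errors comes in.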

Posted: 13 Jul 2017 17:51

Last edited by TheMeddlingMonk on 13 Jul 2017 17:53; edited 1 time in total