Posted by CmdrTaco on Tuesday November 30, 2010 @05:22PM
from the debugging-is-a-little-harder dept.

Normally I avoid slide show type articles, but this one is actually pretty interesting. It starts "This NASA lab contains a recreation of the computer systems found onboard the International Space Station. It is the place where the final bug testing takes place before software is uploaded to the station and where software engineers recreate bugs that occur onboard the station in an attempt to help fix them."

Wouldn't "The Last Stop For Space Station-bound Software" be the space station, and not the testing environment?

I thought more about this and I think it's correct. There are two stops the software makes: testing, and the space station. At the first stop, it's bound for the space station. At the second stop, it's not bound for the station anymore since it's already there. Thus, the first stop is the last stop for the space station-bound software. And the last stop is the first stop for the software that's a

I disagree. When the software is in the process of making the stop on the space station, it is still space-station-bound. It doesn't stop being bound for the space station until after the stop, so the stop occurs while it's still bound for the space station.

Don't forget... HAL 9000 had a ground-based testbed "twin", SAL 9000. Too bad they didn't try lying to it about its critical and super-secret mission before doing so to HAL. They only tried that afterward, as a debugging replay.

Well, I'm not sure about that... but there certainly have been some interesting times with the computers on board. There are some computers which basically are like HAL and run most of the station... they are known as C&C computers, and since they are so critical... there are 3 of them. And they are constantly in hot standby.

Interestingly enough, the crap hit the fan during stage 6A, and being here, I can tell you that everyone was looking to blame someone else's subsystem. The Canadian robot arm was being installed, so people were very suspicious it had something to do with it, but it basically turned out to be hard drive problems on all 3 C&C computers.

The cool story here is that the idea that you could have a triple-redundant system fail seemed so far off that it was almost thought impossible. Even still, some engineer had this idea to write this little program which would jump into action if it ever saw all 3 C&C computers offline. The program was called "Mighty Mouse".

During the episode, things went really bad. Lights were out. Comm was out. At one point people were trying to confirm whether commands were coming up and down by looking out the windows and seeing lights get turned on and off. But... Mighty Mouse saved the day. It kept cycling power to the 3 C&C computers until it saw a healthy one, and for a while C&C2 came back online and let the ground controllers get some data and start fixing some issues.
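For anyone curious what "keep cycling power until one comes back healthy" looks like as logic, here is a minimal Python sketch. To be clear, this is not the actual Mighty Mouse code, which I have never seen. The function name `recover_first_healthy`, the `power_cycle` and `is_healthy` callbacks, the `max_rounds` limit, and the fake hardware below are all made up for illustration.

```python
from typing import Callable, Optional, Sequence


def recover_first_healthy(
    units: Sequence[str],
    power_cycle: Callable[[str], None],
    is_healthy: Callable[[str], bool],
    max_rounds: int = 10,
) -> Optional[str]:
    """Power-cycle each unit in turn until one reports healthy.

    Returns the name of the first healthy unit, or None if every unit
    is still down after max_rounds passes over the whole set.
    """
    for _ in range(max_rounds):
        for unit in units:
            power_cycle(unit)
            if is_healthy(unit):
                return unit
    return None


# Fake hardware for the demo (hypothetical): only C&C2 recovers,
# and only after it has been power-cycled twice.
cycles = {"C&C1": 0, "C&C2": 0, "C&C3": 0}


def fake_power_cycle(unit: str) -> None:
    cycles[unit] += 1


def fake_is_healthy(unit: str) -> bool:
    return unit == "C&C2" and cycles[unit] >= 2


recovered = recover_first_healthy(
    ["C&C1", "C&C2", "C&C3"], fake_power_cycle, fake_is_healthy
)
print(recovered, cycles)
```

The point of the design is that the recovery loop is tiny and depends on nothing the three main computers provide, so it can still run when all of them are down.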

All is fine now. I believe they have replaced the hard drives with space-hardened solid state drives... but it was one of those interesting periods during early space station construction where software was an integral part. You can read all about this here:

The cool story here is that the idea that you could have a triple-redundant system fail seemed so far off that it was almost thought impossible.

Heh, well it would be, if the estimated probability of failure for each was truly an independent random variable. The excrement usually hits the fan when it isn't. Like, say, a hard drive with an unknown defect where a certain access pattern can make it fail. 3 machines doing the same work means they could all fail for the same reason at about the same time.
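To put rough numbers on that, here is a small Python sketch comparing the fully independent case with a simplified beta-factor common-cause model, where a fraction `beta` of each unit's failure probability comes from a shared cause that takes out every unit at once. The values of `p` and `beta` are invented for illustration and have nothing to do with real ISS hardware.

```python
def p_all_fail_independent(p: float, n: int = 3) -> float:
    """Probability that all n units fail if failures are independent."""
    return p ** n


def p_all_fail_common_cause(p: float, beta: float, n: int = 3) -> float:
    """Simplified beta-factor model.

    A fraction beta of each unit's failure probability is attributed to
    a shared cause that fails every unit together; the remainder is
    treated as independent per unit.
    """
    p_common = beta * p
    p_indep = (1.0 - beta) * p
    # Either the shared cause strikes, or it does not and all n units
    # happen to fail on their own.
    return p_common + (1.0 - p_common) * p_indep ** n


p = 0.01     # hypothetical per-unit failure probability
beta = 0.1   # hypothetical share of failures due to a common cause

independent = p_all_fail_independent(p)
correlated = p_all_fail_common_cause(p, beta)
print(independent, correlated, correlated / independent)
```

With these made-up inputs the common-cause term dominates completely: the chance of losing all three is set almost entirely by `beta * p`, and adding more identical units barely changes it.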

Obviously a country wouldn't sabotage its own space station. (Unless you want to get into this whole conspiracy theory thing where a government kills its own people in order to make people afraid and seize more power in the ensuing chaos.) But terrorist organizations or countries that don't have anything on board the ISS may conceivably want to bring it down. The point of doing something like Stuxnet, as opposed to simply shooting it with a missile, is that it's hard to find and hard to trace back to who