Posted by Hemos on Monday February 26, 2007 @11:19AM
from the like-windows-for-workgroups-with-guns dept.

mattaw writes "The Register is carrying the sanest and most balanced article on Windows deployment in UK warships that I have read to date in the public domain.
As an ex-naval bod myself we have long considered that this is potentially a REAL problem. The main issues are the huge amount of unrelated code that is imported with the kernel and the need for incredibly fast response times."

...this is probably a positive step, in many ways. As the article shows, the previous software was terrible already. Military research and development may seem high tech and modern, but they are one of the most inefficient organizations imaginable -- tons of ancient embedded programs trying to integrate with one another. I can't imagine being a "new" programmer in the military and trying to comprehend what decades of previous programmers were trying to do, let alone keep it working.

Sure, there are many options out there -- Linux, continuing to use a proprietary OS, Windows, whatever. Yet with technology changing as fast as it does (even military hardware), it does make sense to use an operating system that has some base support for almost everything. In this case, it is Microsoft.

Does Windows crash often? For many users, I think the answer is yes. But in my experience, you can tailor a Windows installation to just the most basic requirements and it runs fairly well. I highly doubt that warships would be connecting to the public Internet with the users downloading any number of buggy apps to conflict with mission-critical applications. Since that is the case, I have familiarity with a number of long-term installations running Win2K (and some WinXP) that have run flawlessly for years for my client base. None of these installations are on a public IP, none of them allow end-user application installation, and all of them have been extremely rock solid AND easy to maintain when necessary. As the article shows, their main connection is a unidirectional 300 baud ship-to-shore link.

We're not talking about a machine running everything, just specific software for a specific purpose. Anything is a step in the right direction when you consider what a Luddite the military can be in terms of support applications versus the modern hardware they're running. Training new users on an ancient system is very inefficient and dangerous (read the article on their ancient interface hardware!), and giving them an interface they recognize makes sense from many angles, including safety. The interface to enable weapons firing won't rely just on Windows to approve or disapprove a launch -- there are always old-fashioned hard key-based turn-locks that override whatever the software does. If they want to launch a missile, the physical keys must be turned, and THEN the software must be approved. If there's a glitch after this hard approval is given, it can't be in grave error.

The bottom line is that I liked Win2K towards the end of its supported life. I had many customers who were unhappy about moving to Windows XP, and we still support numerous servers running Windows 2000 for mission critical (not THIS critical, though) applications that are running strong and haven't had to be restarted in a year or longer (one customer hasn't rebooted their Win2K installation in 3 years). The software works, the API is known by most modern programmers, the user interface is comfortable for almost everyone, and as long as you don't connect it to the public Internet or try to install a variety of conflicting/buggy applications, you're in good shape.

I think this option is better than Linux or F/OSS operating systems that would possibly require MORE training for their programmers and users to learn. My biggest frustration with F/OSS operating systems is that the user interface is counter-intuitive for a lot of Windows-friendly users, and even worse, trying to find an "old but stable" operating system is a mess as the F/OSS operating system support-base seems to be more focused on the latest stable builds rather than what mission-critical users would want: older software that has a longer history of running well for a given situation.

This article is infantile puffery, something that's obvious from the style.

Take non sequiturs such as "Windows may be unreliable, but it's hard to imagine it being as failure-prone as the kit which is out there already." This logic may suffice for a lightweight Register article but it's no way to justify picking the worst available consumer grade O/S over proven systems such as Solaris, OpenVMS, or other far more reliable alternatives.

What makes me Zzzzzz is that every time I read a Slashdot article, I come across comments trying to predict what the other commenters will say (by implication, those others are less intelligent than the poster; pointing out stupidity in others is an attempt to make him look smart by association).
I prefer to read posts about the subject on Slashdot, rather than posts about Slashdot, especially when they have the irritating smug tone of "Oh, look at all the losers and their oh so predictable posts. I'm glad I'm far more intelligent than the unwashed masses!"
And, yes, I am aware that I don't have to read any posts here, and that I have not only read one of these pointless posts but replied to it.

I think this option is better than Linux or F/OSS operating systems that would possibly require MORE training for their programmers and users to learn.

You must not be a resident of the United Kingdom. I find it interesting that any country's government or military would rely on a foreign proprietary piece of software to reach mission critical goals.

Who cares about training when you're now dependent on a company in another nation? What happens when there's another nutcase in the White House who orders Microsoft to cut off updates or support to foreign military customers?

I believe prior to BAE's sole recommendation that AMS, a company that specializes in Combat Management Systems, recommended Unix [theregister.co.uk]. There was also criticism [theregister.co.uk] of a lack of third party external review for this decision (not sure if that's linked in the original article or not). If it's the case that BAE up and said "We're going with Win2K" and the government said "ok," I would be a bit concerned.

I do not think the United States Navy would willingly rely on any foreign proprietary software.

Regarding the crashing, though: I found that on my Windows systems, most crashes can be attributed to either

(A) Bad hardware
(B) Bad drivers - usually the graphics driver.

The more complicated 3D stuff an app does, especially a game, the more problematic it is in terms of stability, though this is not always the case - many professional apps put a lot more time into getting around these bugs.

On one machine I had, regardless of the OS, if I had high network IO with either high CPU use or high 3D use, it crashed. Changed the mobo, problem went away.

On another, it had not only one of the worst SATA chips out there, but probably one of the worst implementations of said chip. Linux and FreeBSD solved the stability issue by not installing on anything except IDE drives, Windows on the other hand installed, but had issues. A new SATA controller card fixed that.

Yes, Windows has issues. But on my old Windows 2000 box (a Tyan Trinity S1598 board, K6-III 450, and 512MB of memory), I was regularly getting multi-month uptimes. And I even gamed a bit, though not much.

The point is, as you stated, you *can* make Windows stable; it just takes a bit of effort because

(1) Driver quality is more relevant - I don't know the details, but in my experience a bad driver is less likely to crash the whole system on FreeBSD or Linux.
(2) Windows is more likely to load up on bad hardware. It's also more vulnerable to issues related to bad hardware.

Note: this is just for 2000 and later (really, in my experience XP is a downgrade in stability, and I can't say much about Vista, though mileage may vary). The 9x variants of Windows were crash monsters.

I doubt very much that this is the Win2K that you may have bought for your desktop. Many companies make products for consumers that differ greatly from those made for the military, police, and other services. My suspicion is that this is a highly customized install that will be considerably more limited and specialized. And yes, far more stable. The details of the customization will, no doubt, not be available to the press or public (nor should they be).

As for the article's description of some of the systems being used by the militaries of the world: it's pretty accurate. I had a VIC-20 that had more power than some of the systems still out there.

"I think this option is better than Linux or F/OSS operating systems that would possibly require MORE training for their programmers and users to learn. My biggest frustration with F/OSS operating systems is that the user interface is counter-intuitive for a lot of Windows-friendly users"

Okay, we are talking about embedded systems! The user interface to an advanced missile defence system will not be the same as Word! Also, I pray to God that they don't hire your typical Windows VB programmers for these jobs, so the extra-training argument for programmers is bunk.

The simple truth is that no "off the shelf" software should be run on these systems. You are not going to run Word or the latest version of Photoshop on your command and control systems. You can put a great-looking user interface on any OS if you want to, so the user friendliness of Windows doesn't really matter. The other issue with going with W2K is that you are stuck using x86. Unless they want to move to Vista, they are stuck using 32-bit code.

Seems like a bad plan to be stuck with one type of CPU and a near end of life OS.

Solaris, QNX, OpenBSD, VMS, Linux, or any number of secure, actively developed, and/or real-time capable OSes seem like better choices.

No Internet access is irrelevant. The fact that a system like that is vulnerable AT ALL to common viruses is a recipe for disaster. Consider someone who doesn't like the ship's current direction bringing in his USB pen drive and launching a virus across the ship, taking control of it or just disabling it. While this could potentially happen with a custom-designed OS, without the specs, interface calls, and knowledge of the system and how to compile for it, you aren't going to be writing many viruses for it at all. Even the potential for ACCIDENTAL infection makes it highly undesirable to have a common OS at the core of your battleship.

Obviously, this was caused by the fact that the Yorktown's control software was of a really bad design.

You are mistaken. Safeguards were intentionally disabled.

The truth is that a server app corrupted its data, a client app tried to use that bad data, and the client app failed to control equipment. That can happen with any OS. Add to this the fact that the ship was a test platform, not an operational ship, and they were trying to break things.

"Others insist that NT was not the culprit. According to Lieutenant Commander Roderick Fraser, who was the chief engineer on board the ship at the time of the incident, the fault was with certain applications that were developed by CAE Electronics in Leesburg, Va. As Harvey McKelvey, former director of navy programs for CAE, admits, "If you want to put a stick in anybody's eye, it should be in ours." But McKelvey adds that the crash would not have happened if the navy had been using a production version of the CAE software, which he asserts has safeguards to prevent the type of failure that occurred."

"McKelvey writes that the failure, "was not the result of any system software or design deficiency but rather a decision to allow the ship to manipulate the software to stimulate [sic] machinery casualties for training purposes and the 'tuning' of propulsion machinery operating parameters. In the usual shipboard installation, this capability is not allowed.""

In my experience, library hell is worse on Linux, even with Ubuntu, than on Windows.

Okay, but now explain HOW it is "worse".

Under Ubuntu, if the library isn't in the repository, that single app won't install, so you know it won't work.

With Windows, installing a new app causes one or more existing (and previously working) apps to stop working.

As for (B) and stress tests: the trick isn't so much to put a high load on all the time, but to trigger the wrong event in the wrong state; stress tests can easily miss this.
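To illustrate the point (a hypothetical sketch, not any real test harness): a load test hammers a system with lots of traffic but may never hit a particular bad (state, event) pair, while a simple exhaustive sweep over states and events finds it immediately. The `Controller` class and its states here are invented for the example.

```python
import itertools

# Hypothetical two-state controller: the latent bug only fires when a
# 'reset' event arrives while the machine is in the 'running' state.
class Controller:
    def __init__(self):
        self.state = "idle"

    def handle(self, event):
        if self.state == "idle" and event == "start":
            self.state = "running"
        elif self.state == "running" and event == "stop":
            self.state = "idle"
        elif self.state == "running" and event == "reset":
            # the bug: reset during operation leaves the machine invalid
            raise RuntimeError("invalid transition: reset while running")
        # all other (state, event) pairs are silently ignored

def sweep():
    """Exhaustively try every event in every state; collect failures."""
    failures = []
    for state, event in itertools.product(["idle", "running"],
                                          ["start", "stop", "reset"]):
        c = Controller()
        c.state = state
        try:
            c.handle(event)
        except RuntimeError as exc:
            failures.append((state, event, str(exc)))
    return failures

print(sweep())  # the sweep finds the single bad (state, event) pair
```

A random high-load test that only ever sends 'start' and 'stop' would report the system as rock solid; the one bad pair is found only by walking the state space.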

Which would indicate that it was a software bug if that behaviour was documented or known.

It would be a hardware bug if it was not documented and behaved differently in different pieces of hardware... or if it was documented but not correctly implemented in any of those pieces. Either way, it should be very easy to troubleshoot. And with Linux, it is very easy to troubleshoot that.

You'd know that Win2k, however bad, is far better than what they have now.

I find this hard to believe. This sounds like something that you'd hear from someone who had already decided to upgrade.

Their current system works; therefore, it is inherently superior to any new, unproven system. There should be a huge barrier to upgrading with anything, because you're replacing a devil you know with a devil you don't. The new system should have demonstrated credentials in other similar situations, proving that it's at least as capable as what it's replacing. Things like ease of use and training should all come second to the system's core purpose.

I've seen companies replace "legacy" systems because some manager walked out onto the production floor / cube-pit and was horrified to see green-screen terminals sitting around. To them, terminals = old, old = bad, end of discussion. So they would come up with reasons to upgrade, and say things like 'well, it couldn't be worse than what we have!' with complete neglect for the fact that the old systems, by virtue of having been there for a long time, clearly did their job.

And, bottom line, it's a lot easier to train someone on a complicated green-screen system that always works than on an unpredictable new system with a ton of gotchas and error modes. Generally, once you get everything worked out, and people know what things they just can't do because it'll crash the system, you haven't really simplified anything. I have personally seen tens of millions of dollars wasted on 'upgrades' like this, where the result was so much worse than the beginning that it immediately rolled into a new cycle of upgrades -- the executives believing, like deranged poker players, that as long as they had tossed that many millions into the pot, they would surely solve it with a few million more.

This sounds like the same thing is happening; someone freaked because the equipment and software are old, but didn't realize that there's no logical reason why something that's old is necessarily bad, if it's still doing its job. "Anything is better than this" is always false if what you have right now gets you through the day and does its job. Unless the system you're implementing has a strong track record of doing the same job elsewhere, you have nothing besides a salesman's promise that it's going to be better. And remember: at the end of the job, that salesman is going to disappear, and you're going to be stuck using whatever is left.

You do realize there are sites full of nothing but pictures of BSODs and other errors on closed systems with a dedicated purpose, no Internet access, and a single application? The last such system I saw was at Miami International Airport about two weeks ago. Just as you approach security, you look up and there is a monitor with a blue background and a Windows fatal error popped up on the screen.

A competent Windows admin can harden Windows; he can harden it more than an incompetent *nix admin can harden *nix. But Windows simply can't be hardened to the degree that *nix can. With a *nix system you can remove everything that is not necessary, right down to unused kernel components. You will never be able to say that of Windows; it will always have tens of thousands of lines of code with bug potential running that have nothing to do with your application.

The interface is also fairly irrelevant when you are running a single application fullscreen. These aren't desktops.

Given that Britain exports a lot of defence technology, use of foreign machinery is not that big a problem to many nations.

Buying machinery is one thing; software is quite another. With a machine, even a fairly complicated one, you can, with enough effort, understand what's going on inside it.

Say you have an Apache helicopter. When you buy that helicopter, you also buy training. Not only do you send the pilots in for training, but you also send all of the maintenance people, pad crews, etc. They learn how to service it, tear down the engines, etc. So what you get back is far from just the machine; you get a machine, and a crew who (ought to) basically understand it. And if you really want to understand it, if you're any country worth discussing, you ought to have at least a few engineers who could spend a few weeks figuring out key parts.

But with software, you're buying a true black box. You're being handed something (which, if every line of code was the size of a watch-gear, would probably be as big as a trailer truck) that you cannot have any significant insight into the workings of. You have no idea how it really works, or what it's truly programmed to do.

With a machine, you can tear the thing apart on receipt and make sure there's nothing suspect in there; no bombs or homing beacons, etc. You really can't do that with a large piece of precompiled software. You are totally at the mercy of the people who built it; you're taking them at their word that they haven't backdoored it.

And for what it's worth, if I were the CIA in the U.S., you can bet I'd be leaning on Microsoft to seriously backdoor every piece of software that it sold for military purposes abroad. To them, it's a perfect way to prevent resale to folks that we don't like (or later decide we don't like). Sure, we're friends with the British, but what if the British in 10 years sell a destroyer to the South Africans, who sell it to the Egyptians, who sell it to the Iranians? Suddenly, a way of making it go dead in the water would come in handy. You'd better bet that the folks in Langley, who are paid to be paranoid, have thought about this, too.

Software is inherently different from physical machinery, because while physical devices can be taken apart and investigated, and follow basically well-understood rules (physics, chemistry, etc.), software cannot. A large binary blob is as close to indecipherable as a functional object can get, and there's really no way to secure it. It is an inherent risk, and one that I'm not sure many established militaries are putting enough thought into.

I worked as an intern for a big company in the power protection and control field (i.e. power substation automation). It's not warship control, and if something fails, probably no one is going to be killed, but things will break and money will be lost.

They had some in-house software to program the protection and control devices. That software could also be run under Windows for testing and debugging purposes. I worked on a prototype of an extension of said testing and debugging environment, so I have a bit of experience with this kind of embedded-ish real-time Windows programming, and I must say that Windows is definitely not the way to go for anything like that. It just lacks the flexibility of operating systems made for this sort of task.

Later I found out that what they actually wanted to do was replace the special-purpose systems with the simulation and debugging environment, all running on Windows, because it was supposedly much easier to use and whatnot. They're going to use my prototype to do so. :-(

I have the impression that Windows is often chosen for this sort of task because management knows it and has the feeling that "Microsoft is the real thing", that it is easier to find experienced developers for Windows than for any other platform, and that the development tools are better and/or more user friendly. While I agree on the last two points, I'd like to point out that "experienced Windows developer" does not mean experienced real-time, high-reliability, or embedded developer, and that the development tools are mostly focused on GUI/network service programming, which is what Windows is mainly used for.

I'm sure there are lots of people out there with way more experience in this field than me, but if I were to decide for an OS on a warship it would definitely not be Windows, Unix or any other general purpose OS, but something which can be customized and is built for this kind of task - VxWorks or something similar.

Read your article again: "After a crew member mistakenly entered a zero into the data field of an application, the computer system proceeded to divide another quantity by that zero. The operation caused a buffer overflow, in which data leak from a temporary storage space in memory, and the error eventually brought down the ship's propulsion system. The result: the Yorktown was dead in the water for more than two hours."

Safeguards disabled or not, that is not an acceptable outcome. These machines kill people. The error should have stopped at the divide by zero. But it didn't. It resulted in a buffer overflow. Which resulted in a memory leak. Which resulted in the eventual crash of the entire network.

All that Mr. McKelvey is saying is that they didn't have the checks in place that would have prevented such values from being entered. The fact still remains that a single bug took down every subsystem in the ship. That is unacceptable, as situations may arise where invalid data either passes the checks by accident, or is unexpectedly created from inside the system. (e.g. Sensors sometimes give values that are unexpected.) Proper design would have taken into account that this could happen, and protected each system against crashes in other systems.
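The two safeguards described above can be sketched in a few lines (a minimal illustration, not the Yorktown's actual software; the function names and values are invented): validate inputs at the boundary instead of dividing by them, and give each subsystem its own fault boundary so one failure cannot cascade into the others.

```python
def fuel_rate(distance, tank_level):
    """Reject the bad value at the boundary instead of dividing by it."""
    if tank_level <= 0:
        raise ValueError("tank_level must be positive")
    return distance / tank_level

def poll_subsystems(subsystems, inputs):
    """Run each subsystem inside its own fault boundary."""
    results = {}
    for name, fn in subsystems.items():
        try:
            results[name] = fn(*inputs[name])
        except (ValueError, ZeroDivisionError) as exc:
            # log and degrade this one subsystem; the rest keep running
            results[name] = f"FAULT: {exc}"
    return results

systems = {"fuel": fuel_rate, "speed": lambda dist, time: dist / time}
out = poll_subsystems(systems, {"fuel": (100.0, 0), "speed": (100.0, 4.0)})
print(out)  # fuel reports a fault; speed still returns 25.0
```

The point is the second function: even when a zero slips past the checks, the failure is contained to one entry of the results rather than taking down everything that polls it.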

In any case, all the Navy was attempting to do was drive machinery outside of its specified ranges. Allowing those ranges to be manually overridden is not an excuse for total failure. The Yorktown was a warship, which means that she may have been called upon to operate outside of safe limits in a variety of combat situations. Would it be acceptable for the ship to crash because the crew was trying to compensate for battle damage? And if the ship's systems are so vulnerable without these checks, what happens when damage from enemy fire starts causing power spikes and drops? Does every subsystem cascade into failure just because a different networked subsystem failed?

If the USS Yorktown (CV-5) had been equipped with these systems, we would have lost the Pacific theater in WWII. Rather than continuing to fight after taking torpedo after torpedo after torpedo, her systems would have crashed or been corrupted, and that would have been the end of her fighting ability.

Never mind the reality that the Yorktown carrier had continued operations at the Battle of the Coral Sea after receiving a bomb through the deck that penetrated the hull and exploded below decks. The damage was estimated to take 3 months back in port to repair. Never mind that she was hastily patched up in only three days and sent straight back out to the Battle of Midway. Never mind that she took 3 bombs from enemy planes before the boilers were taken offline for repairs. Never mind that she was back up and giving 20 knots only one hour later. Never mind that in her heavily damaged, beaten, and bruised state, she still managed to evade two torpedoes through wild maneuvering before enemy torpedoes finally tore into her hull. Two torpedoes ripped into her and jammed her rudder. Her powerplants went offline and she began to list. The ship was abandoned, but wasn't lost until the next day, when another two torpedoes contacted her hull during (amazingly successful) salvage operations.

THAT is the type of hell that these computer systems will need to go through. They must fight to the last minute to make sure that the ship remains operational. The lives of those on board, and those back home may depend on it some day. Having systems crash at the slightest sign of bad data is not acceptable. Bad data is a guarantee in these systems. When the ship starts taking damage, she WILL experience failures. There's no question about it. But one failure should never, ever, ever lead to another one. If it does, people die and wars are lost.

But having source lets you begin to replace components. If it's a black box you never can. If it's an open source system you can get around problems. Maybe the C compiler is hacked, maybe it'll even hack all future C compilers, but will it recognize a Ruby interpreter? Will it successfully hack Ruby such that a debugger written in Ruby will fail to display the vulnerability in C programs?

Look at Debian on BSD. They're swapping out the Linux kernel while keeping the GNU tools and Debian packaging. You could swap in another kernel, or emulate three or four kernels in a VM and make sure they all agree. You could skip the GNU tools and use others, etc.

How do you avoid the potentially bugged parts of Windows? Let's say the MMU and the encryption routines. Swap in other components and see if they work identically.
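The "run several implementations and make sure they all agree" idea can be sketched as a simple majority voter (a hypothetical illustration; the checksum functions here are toy stand-ins for independently built components, with one deliberately "backdoored"):

```python
from collections import Counter

# Two independently written implementations of the same checksum...
def checksum_a(data):
    return sum(data) % 256

def checksum_b(data):
    total = 0
    for b in data:
        total = (total + b) % 256
    return total

# ...and a tampered variant that silently skews the result.
def checksum_compromised(data):
    return (sum(data) + 1) % 256

def vote(implementations, data):
    """Accept only the answer a strict majority agrees on."""
    answers = Counter(impl(data) for impl in implementations)
    answer, count = answers.most_common(1)[0]
    if count <= len(implementations) // 2:
        raise RuntimeError("no majority: implementations disagree")
    if len(answers) > 1:
        print("warning: at least one implementation disagrees")
    return answer

data = bytes([1, 2, 3, 250])
print(vote([checksum_a, checksum_b, checksum_compromised], data))
```

With diverse, independently sourced components, a single backdoored one both gets outvoted and reveals itself by disagreeing; with a single opaque binary, you have neither protection.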