Posted
by
kdawson
on Tuesday April 27, 2010 @05:19AM
from the seemed-like-a-good-idea-at-the-time dept.

An anonymous reader tips a PC Authority review of some of the biggest technical goofs of all time. "As any computer programmer will tell you, some of the most confusing and complex issues can stem from the simplest of errors. This article looking back at history's big technical mistakes includes some interesting trivia, such as NASA's failure to convert measurements to metric, resulting in the Mars Climate Orbiter being torn apart by the Martian atmosphere. Then there is the infamous Intel Pentium floating point fiasco, which cost the company $450m in direct costs, a battering on the world's stock exchanges, and a huge black mark on its reputation. Also on the list is Iridium, the global satellite phone network that promised to make phones work anywhere on the planet, but required 77 satellites to be launched into space."

I still have a Pentium 60 with the FDIV bug (with motherboard). Never bothered to exchange it, and never found myself in a situation where the bug surfaced (except while trying out some of the test calculations). It was the last main processor I had that could operate with a simple heat sink (no fan). Great times...

OEM versions of Windows ME ended up on countless PCs, essentially all of which needed to be upgraded to XP. No "mistake" there if you ask me. Frankly I think the Vista/Windows 7 upgrade path was no mistake either.

There was no technical flaw in Iridium. It was stated what it would do. It did it. Someone screwed up the business plan, but there was no technical mistake. They knew it took 77 satellites for what they wanted. And they launched them all and they worked flawlessly. Now, if only they had sales to match the business plan, they'd be billionaires. But again, unrelated to any technical issue.

I read somewhere that some 13 years passed between the initial idea and implementation. Motorola first started the project in 1985, if I'm not mistaken. While it would have been a good idea when it was conceived, cell phones and their towers became more practical in the intervening years. No one, however, thought about revising the plan when the times changed.

Now, if only they had sales to match the business plan, they'd be billionaires.

They had a great sales plan. Make your primary customer the US Military, build a massive satellite network, declare bankruptcy after it's built, reform Iridium LLC, and continue operations through today offering satellite phone service at a price comparable to US international roaming prices.

Satellite will always have limitations until we can get congress to raise the speed on light (stupid greenies worried about photon pollution), can get rid of the line-of-sight issue, and can build the very strong radio

Is that even a problem? There are apparently 30 GPS satellites, plans to upgrade the system, and Europe wanted to launch its own alternative system too. I'm not sure if the better military GPS is using different sats currently. We've also invested in a ton of cell phone masts, satellite phones, etc. Taking an uninformed guess, might not Iridium have worked out cheaper, when the final bill was added up?

I'm not sure if the better military GPS is using different sats currently

The "military GPS" uses the same satellites. The P signals are transmitted with 10 times the resolution and on two frequencies; the civilian C/A signal is transmitted at 1/10 the resolution of the military one and on only one frequency.
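The factor-of-ten difference in chipping rate translates directly into ranging resolution. A rough sketch, using the textbook chip rates and simple divide-by-chip-rate arithmetic (illustrative only; real receivers resolve to a fraction of a chip):

```python
C = 299_792_458.0               # speed of light, m/s

ca_chip_rate = 1.023e6          # civilian C/A code, chips per second
p_chip_rate = 10.23e6           # military P code: ten times the chipping rate

# One chip of the ranging code corresponds to this many metres of range:
print(round(C / ca_chip_rate))  # ~293 m per C/A chip
print(round(C / p_chip_rate))   # ~29 m per P chip
```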

But latency through multi-hop LEO is potentially as bad as geostationary. Absolute distance may be less, but add per-hop packet store-and-forward times.

In my (admittedly limited) first-hand experience, the US military tends to use Iridium for data comm. Stuff which, 20 years ago, would have been landlines with modems. Except you can't really string landline to some mountain in Upickastan, can you?

You can get close to the flight-time delay of geo (250ms or so) if you include enough instances of per-node transmission time in your back-of-the-envelope calculations. Each node requires a non-zero amount of time to transmit its packet of data to the next hop. For a geostationary, that's a single fixed chunk of latency. (Only one hop). For Iridium et al, that's once per lateral hop. And Inmarsat BGAN has pretty good throughput: 64kbps. Iridi
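The back-of-the-envelope above can be sketched in a few lines. All numbers here are illustrative assumptions (GEO at ~35,786 km, an Iridium-style LEO at ~780 km, a made-up lateral hop distance and per-node forwarding delay):

```python
C = 299_792.458                     # speed of light, km/s

def geo_one_way_ms():
    """Up to a geostationary satellite and back down: one fixed hop."""
    return 2 * 35_786 / C * 1000

def leo_one_way_ms(hops, hop_km=4_000, alt_km=780, node_ms=5):
    """Up, several lateral hops, down; every node adds store-and-forward time."""
    up_down = 2 * alt_km / C * 1000
    lateral = hops * hop_km / C * 1000
    return up_down + lateral + hops * node_ms

print(round(geo_one_way_ms()))      # ~239 ms of pure flight time
print(round(leo_one_way_ms(6)))     # much shorter path, but per-hop delay adds up
```

With enough hops (or a slow enough per-node delay), the LEO path's total latency catches up with GEO's flight time, which is the point being made above.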

I have an Iridium phone (the original Motorola 9500). Not only does it work flawlessly (as long as you're outdoors...), it only uses 66 active LEOs. They vastly underestimated the number of people who want/need one, but it's the only (handheld) phone system in the world that works *everywhere* in the world: North pole, south pole, everywhere.

The only "flaw" (besides the multi-billion-dollar goof in estimating the market size), was the name: They knew they really only needed 66 satellites, but who's goi

1. By the time it was operational, mobile/cell phones could be carried in your pocket. An Iridium handset was a brick. It was HUGE. The flaw here is that they did not factor in current/future accepted form factors. A blatant missing requirement. AKA a technical flaw.

2. They don't work indoors. Yep, you heard me right: the system does not work indoors. Again, someone didn't bother with that requirement. A fairly major one, as it turns out.

3. Very poor ope

As well as not working indoors, not working well in cities, and having a huge handset (mostly because of the huge antenna), there is also the issue that the satellites need to have a very low and hence unstable orbit. Hence, they burn up on a regular basis, and need to be replaced regularly. This is enormously expensive.

>Um. Iridium didn't actually work that well at all.
Perhaps you missed my post. It works flawlessly. It was never going to compete with cell phones, nor was it designed to. It works where cell phones *don't*, not where they already do. Tall buildings? Why would you need a satellite phone if you're near a tall building?
Your cell phone doesn't work in the middle of the desert (technical flaw?). Nor in the middle of the Sargasso Sea. Nor in most of the places in the Pacific Ocean. My Iridium phon

There was no technical flaw in Iridium. It was stated what it would do. It did it. Someone screwed up the business plan, but there was no technical mistake. They knew it took 77 satellites for what they wanted. And they launched them all and they worked flawlessly. Now, if only they had sales to match the business plan, they'd be billionaires. But again, unrelated to any technical issue.

They launched 66 satellites, not 77 (which was the original plan), as they came up with a cheaper orbital configuration. The cool-sounding name "Iridium" was taken from element 77, since the 77 satellites reminded people of its 77 electrons. When they reconfigured the constellation to 66 I was disappointed that they did not rename it "Dysprosium".

Actually, the first cellular mobile phones were as big as a brick as well; I wouldn't say that this was a "technical error". Again, it's a failure of marketing to recognize that they wouldn't sell. And even the phone wasn't the biggest problem; the problem was the huge cost to make a phone call... it was simply prohibitive. Had it been reasonably cheap, I'm sure there would've been plenty of uses (if only for enabling people in isolated places, adventurers, ship & oil platform crews etc. to communicate).

The really early cell phones were the size of briefcases, so heavy that you needed a separate handset part -- I guess calling them "mobile" would be a bit too much. See the (Nokia) Mobira Talkman 450 [about-nokia.com] in all its beauty...

I remember my dad buying one and us being pretty damn impressed when it actually worked at the summer cottage in the middle of the forest. We had to lug the damn thing to the roof to get a signal, but it did work.

Had it been reasonably cheap, I'm sure there would've been plenty of uses (if only for enabling people in isolated places, adventurers, ship & oil platform crews etc. to communicate).

Most adventurers I know buy one sword once, and then get all of their equipment updates from loot and drops. I guess the people in isolated places would have to buy double to replace the phones adventurers took, though, so maybe it balances out.

Though, at the same size, an Iridium phone is destined to be worse than a typical terrestrial cellular one, provided the latter has the range (which, in a large part of the developed world, the part of the world that spends the most, is a given... I don't remember ever noticing "loss of range" in my 6 years of using mobile phones, except in one pretty serious "cellar" in a castle)

With a comparable size of phone & battery, the satellite one will have a notably shorter talk time. And works only outside buildin

Yeah, I would immediately classify any error that caused deaths to be more important.

Another interesting case was the Patriot Missile failure [umn.edu]. The system clock counted in 1/10th second increments. However, it added 0.1 to a floating point number. Unfortunately, 0.1 in binary is a repeating number, similar to 1/3rd in binary being 0.333333333...

So, ten times every second the time drifted just the tiniest bit. The missile that missed had been running for days, so its clock was one third of a second off, and a Scud travels a long way during that time.

Let that be a lesson to all of you: use an integer counter, and divide by 10 to get the time in seconds.
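The drift described above is easy to reproduce on paper. The Patriot actually kept time in a 24-bit fixed-point register rather than in floats, so 0.1 was stored as 209715/2^21, a hair short of a true tenth (the figures below come from the commonly cited analysis of the failure linked above):

```python
# 0.1 has no exact binary representation; truncated to 24 bits it was stored as:
stored_tenth = 209715 / 2**21            # = 0.0999999046..., slightly too small
tick_error = 0.1 - stored_tenth          # ~9.5e-8 seconds lost on every tick

ticks = 100 * 60 * 60 * 10               # 100 hours of uptime at 10 ticks/second
drift = ticks * tick_error
print(round(drift, 2))                   # ~0.34 s of accumulated clock drift

# The integer-counter fix: count ticks exactly, divide by 10 once at the end.
exact_seconds = ticks // 10              # no accumulation, no drift
```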

The article is right about FDIV. The chance of hitting it was infinitesimal, and it wasn't really any worse than other bugs in contemporary CPUs. A bug in Excel is a much bigger issue for most folks, and I for one never bothered to have my P60 replaced.

The problem Intel had with the FDIV bug was one of PR. The Pentium range was the first CPU family to be directly marketed to the general public in a big way.

While anyone with knowledge of chip design and production processes understood that such bugs are not particularly uncommon (many much simpler chips have well-documented errata and workarounds for unintentional behaviour, like the 286's "gate A20" bug that actually turned out to be useful), the general public and the popular press had no such understanding, so they were very surprised. They assumed that all CPUs were (or should be) completely 100% perfect, and therefore took issue with what they saw as being sold defective goods.

Before the first-generation Pentium FDIV issue, such relatively minor problems were dealt with quietly: the error, including any extra side effects and possible workarounds, would be documented; those errata would be sent to the chipmaker's customers and relevant software developers; and things would get patched up without the general public ever being aware there was an issue in the first place. The exception might be a small number of users who by sheer chance were noticeably affected by the one-in-a-few-billion problem before their software was patched (those people would be given replacement chips and/or other recompense). A costly replacement program simply wouldn't have been needed in this case.

Though wasn't the issue in the case of the Pentium FDIV bug specifically that Intel didn't publish the errata, or any other information, after Intel researchers discovered the error? It took an independent researcher, to whom Intel didn't even respond initially...


I could be remembering wrongly, as much time has passed and I'm not in a position to spend time double checking right now, but I have the impression that the delay in acknowledging the problem was mainly due to being very slow to verify and analyse it and not wanting to acknowledge it until the analysis was complete. While failing to acknowledge the issue in a timely manner was bad, it was more due to slowness/stupidity than actively trying to cover it up. That is part of it being a PR issue as much as anyt

At that time, I was programming a network game about trucks, and when replaying a demo on the network, the players desynchronized after a few minutes.

I spent a lot of time looking into the logs, and discovered that there was a floating point error that desynchronized the trucks. I still believe that the FDIV bug was much more frequent than publicized, and it had more impact than what Intel originally described.

Intel released a software patch to the Watcom C++ library, but the patch was terrible: the FDIV was replaced with a lot of instructions just to detect the cases where the bug might appear, and shifts were used instead of FDIV.

I think that the bug got so much publicity because it was the beginning of the Internet, where a lot of new information went unfiltered, and Intel completely missed on their communication about this bug discovered by Thomas Nicely. Here is the whole story behind this bug: http://www.trnicely.net/pentbug/pentbug.html [trnicely.net]
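Nicely's reproduction case is tiny and still makes a nice smoke test. On a flawed Pentium the residue below famously came out at roughly 256 instead of essentially zero:

```python
# Thomas Nicely's classic FDIV check: divide, multiply back, compare.
x, y = 4195835.0, 3145727.0
residue = x - (x / y) * y
print(abs(residue) < 1e-6)   # True on a correct FPU; False on a flawed Pentium
```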

Way to have half a piece of information. The French had a plan - advance into Belgium and meet the Germans head-on. Who could have guessed that the Germans would pass through impassable terrain and precisely hit the single weak point between the strong Maginot Line and the first-string armies in Belgium?

Reading that article, it sounds like the technical mistake wasn't really a mistake, but the reality of the Germans hitting the most well defended spot with a creative attack that effectively countered the defense design. That's more a failure to guess what the future would bring. The line was effective against what it was built for.

Maginot was built to fight WWI technology and tactics. In the interim, mechanized infantry and tanks had advanced so that the blitzkrieg could actually be accomplished. In the history of warfare, haven't a lot of changes in tactics been decided by advances in technology that the loser did not forecast or plan for?

Maginot was built to fight WWI technology and tactics.

...and today, we spend hundreds of billions of dollars on a navy which is ideally suited to win World War Two. A carrier can be sunk with missiles that cost vastly less than even one of its fighter planes.

Anti-missile systems, of course. We have those, and we're working on better ones.

But what you fail to realize is that carriers are for a lot more than just planes landing and taking off at sea. Carriers are the modern US military's pack mules: if something is going from A to B and not by C-130 or similarly large aircraft, it's going by ship. If it's going by ship, that probably means it's on a carrier.

Food, water, ammunition, gear, and various other supplies travel on carriers. The things have weeks if not months of supplies for their fleet in reserve, as well as excess for things like emergencies (see: hurricane/tsunami relief). They are self-contained international emergency response units and, aside from wielding immense military power, are the biggest thing keeping the teeth in the US military's international and sea presence.

A city can be destroyed by a missile that costs less than a single city skyscraper, but that doesn't (necessarily) make cities obsolete.

I had some of those growing up and it wasn't really an engineering failure, it was a mentality failure. IBM didn't build PCs, they built tanks. Their keyboards are infamous and still equally usable today 20 years later as when they were new.

That was equally much the case with the rest of their PCs: very high quality equipment, operated under less than ideal random home/office conditions, running consumer software of consumer quality, not server quality. In short, it made no sense.

The result was that IBM priced themselves way out of the market of cheaper clones. It was cheaper and better to buy a clone, throw it out if it failed and buy another. You just don't do that with big iron or servers, but with desktops hell yeah.

Like the article said, it wasn't that much of a failure, seeing that PS/2 ports became the dominant keyboard/mouse connector. If there was ever a silly move by IBM there, it was giving away the software market to Microsoft, but the average desktop market was doomed long before the PS/2.

Have to disagree to a point. The PS/2 range sold big time in the business/corporate and education worlds (at least in the UK until RM/Viglen got their toe in the door). Built like tanks, yes - but they were very reliable in my experience.

The biggest failing within the PS/2 world was the licencing arrangements for the MCA (Micro Channel Architecture) bus, which made it expensive for other manufacturers to use, and so few did. MCA was technically great, but the way IBM brought it to market ended up with is getti

Their keyboards are infamous and still equally usable today 20 years later as when they were new.

In other words, they're famous. I'm typing this on an IBM 1391411 (Swedish version of the PS/2 1391401) - best keyboard I've ever had. I got it about 3 years ago after many many "modern" keyboards of different kinds and I'll never go back to some low-profile, "high tech" (=useless mediakeys) keyboard.

VGA is still one of the most persistent standards of desktop computing. Many popular (read: cheap) LCDs still use it exclusively, however little sense this makes in terms of manufacturing cost (though not in terms of artificial product segmentation). Plus you can almost count on VGA in laptops; other connectors are hit & miss.

The technical error here was that there was no test on the real thing. The company that made a part of the telescope had only a separate testbed that was built to specifications. Alas, those specifications were misunderstood by exactly one inch, so the result was a part that was, with incredible accuracy, one inch out of position.

Those weren't mistakes per se. Rome's lead problem was due to a lack of knowledge about the effects of lead. You can't blame people who don't consider information that is not known to mankind at the time.

Galloping Gertie was an unfortunate situation, but since there were no tools to do dynamic modeling at the time, it wasn't quite a mistake.

Therac-25 was a mistake. The dangers were known, the problem was well defined. All the information was there to make the right choices and we knew how to make appropriately safe software at the time.

"Orbiting this at a distance of roughly ninety-eight million miles is an utterly insignificant little blue-green planet whose ape-descended life forms are so amazingly primitive that they still think digital watches are a pretty neat idea." - Douglas Adams.

There's that, and there's also the whole "the world is flat" and "disease is caused by imbalances in the four humours of the body" ideas. The article's examples seem pretty trivial in comparison.

I have a great digital watch. The band is integral with the body of the watch so I can wear it in bed and it won't catch on anything. It has up and down timers, world clock and multiple alarms. It cost 30 bucks on line.

I wear it when travelling. I use the stopwatch to time my medication and the world clock to schedule calls home. It does things which no mechanical watch can do.

The virus is thought to have been developed in 1986 by two brothers in Pakistan named Basit and Amjad Farooq Alvi, who were looking to protect some medical software they had written from disc copying. They had found some suitable code on an internet bulletin board site and adapted it so that if someone used the software then the malware would be installed.

I'm guessing "Iain Thomson" is not a day over 25, not very versed on the history of the Internet, and too busy to look up the meaning of "BBS". Am I right?

Sigh. Even if he's 16, if you're writing a piece on tech mistakes you oughta suspect that they couldn't possibly have used an "Internet Bulletin Board Site" in 1986, so maybe you got the acronym wrong.

A man I worked for many years ago, one of my engineering mentors, told me about a mistake made during World War Two, where a large number of very large castings were discarded because the specification called for a much smaller tolerance on the location of an exhaust port than was actually necessary. As I recall, the spec allowed it to be 1/4" away from its nominal location, but it actually was connected to a flexible hose and could have been a couple of inches off in any direction without causing any problem. This mistake wasn't discovered until several million dollars' worth of tank bodies had been scrapped and melted down unnecessarily.

It turned out that while most of the programming and mission planning had been done in units of measurement from the Imperial system used in the US, the software to control the orbiter's thrusters had been written with units of measurement from the metric system.

And that is WRONG! It was the software that used the archaic units; the rest of the spacecraft was built with international units.

The biggest failure to date which didn't get mentioned is Unix. If we had Multics, with its B2 security rating, we might have actually had secure operating systems in the hands of the public at this point in time. We wouldn't be dealing with spam, or virii.

But no..... it was soooooo complicated.... K&R had to stick us with a piece of insecure crap... and everyone else was stupid enough to copy it.

Let's not forget Apple's "Lisa". I know the Apple III was in the list, but the Lisa cost more to develop and probably sold fewer units. I know a lot of the Mac UI came from Lisa underpinnings, but the "Epic Fail" tag is deserved.

Speaking of technical flaws and Lisa... You could plop the boot drive into the Dumpster, and it would format it. The tech savvy devs who designed the "drag-to-trash = format" function never imagined that users would be stupid enough to do something like that! Little did they know about how giving someone a mouse transforms them from someone who can use a line based editor to set up printer drivers and networking into the horror that is a modern user.

The 8086, Intel's first 16-bit processor, was possibly much worse than any of those mentioned, because it affected all of us. Intel chose to continue the quirkiness of the 8008 rather than abandon it.

Just before the time of the introduction of the 8086 I knew a chief of technology of a high-tech company who was waiting for the 8086 as though it were a combination of Christmas, his birthday, and the birth of his child. He would start every conversation by telling everyone Intel's release date for the 8086.

The day of its release, he was miserably unhappy. Intel chose to continue an architecture that made assembly language programming and debugging of high-level languages more difficult.

Wikipedia says about the 8086 [wikipedia.org]: "Marketed as source compatible, the 8086 was designed so that assembly language for the 8008, 8080, or 8085 could be automatically converted into equivalent (sub-optimal) 8086 source code, with little or no hand-editing. The programming model and instruction set was (loosely) based on the 8080 in order to make this possible. However, the 8086 design was expanded to support full 16-bit processing, instead of the fairly basic 16-bit capabilities of the 8080/8085."

The problem was that the quirkiness has been extended to the 32-bit processors of today. The Wikipedia article says, "The legacy of the 8086 is enduring in the basic instruction set of today's personal computers and servers..."

And, "Programming over 64 KB boundaries involved adjusting segment registers... and was therefore fairly awkward (and remained so until the 80386)."

Everyone on the planet who used or was affected by computers then suffered, because debugging was much more complicated than if Intel had chosen to make the operation of the 8086 simpler.

"Such relatively simple and low-power 8086-compatible processors in CMOS are still used in embedded systems."
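The segment:offset arithmetic that made those 64 KB boundaries so awkward is simple to sketch (real-mode physical address = segment × 16 + offset, truncated to the 20-bit address bus):

```python
def phys(seg, off):
    """8086 real-mode physical address from a 16-bit segment and 16-bit offset."""
    return ((seg << 4) + off) & 0xFFFFF   # only 20 address lines, so it wraps

print(hex(phys(0x1000, 0xFFFF)))  # 0x1ffff: last byte reachable from segment 0x1000
print(hex(phys(0x2000, 0x0000)))  # 0x20000: the very next byte needs a new segment
print(hex(phys(0x1FFF, 0x0010)))  # 0x20000 again: many seg:off pairs alias one byte
```

Any code walking a buffer past offset 0xFFFF had to bump the segment register by hand, which is exactly the adjusting the quote above complains about.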

The 8086 and the MSDOS legacy made more 680x0 fanboys than Motorola marketing ever did.

Well, that and the 68000 just being a really good chip in its own right. Motorola were smart enough to stick to flat memory architectures, and it had a really nice, obvious instruction set, and was powerful to boot.

Maybe NASA wouldn't have made that mistake, but the sub-contractor could. OTOH, maybe the sub-contractor's navigational controls had a button to switch between Imperial and Metric units, but maybe NASA didn't RTFM, and that may have caused the mistake.

Yeah, my reading is that the contractor combined an older routine (in English units) with a newer routine (in metric units) without double-checking the interface spec. It's (sort of) like the operators ordered a metric speedometer but received one calibrated in English units. Since the units weren't marked on the dial, they assumed it was metric, as per the specification.

The MCO [wikipedia.org] Investigation Board report [nasa.gov] is a quick read and an interesting case study.

You've got this right on a number of levels. Most obviously because the probe was a JPL project, not NASA. Despite their close ties, they are separate entities.

Secondly, it was not a JPL mistake either. JPL is a pure metric shop. This pervades everything they do; if you walk in the front door and ask the receptionist where the toilet is, he'll tell you that it is "Thirty meters down the hall and to your left"

So what happened? How was this mistake made? Politics. When the mission was funded, some congre

From the article: "The Mars Climate Orbiter, and the Mars Polar Lander it contained, would have advanced our knowledge of the Red Planet immensely...."

Ouch. Mars Climate Orbiter did not "contain" Mars Polar Lander. They were two separate missions.

Saying it was a "simple" mistake is a little simple. The mistake could also be stated as the error of using heritage software in an embedded system, without examining it and testing its validity.

Strider wrote:

When the mission was funded, some congressman saw that it was an opportunity to give some pork to his district and put in some language essentially requiring JPL to hire Rockwell (as I recall, though it might have been Boeing) as the prime contractor.

The metric/imperial mix-up that destroyed the craft was caused by a human error in the software development, back on Earth. The thrusters on the spacecraft, which were intended to control its rate of rotation, were controlled by a computer that underestimated their effect by a factor of 4.45. This is the ratio between a pound force (the standard unit of force in the imperial system) and a newton (the standard unit in the metric system): 1 pound force equals approximately 4.45 newtons. The software was working in pounds force, while the spacecraft expected figures in newtons.

The software had been adapted from the earlier Mars Global Surveyor mission, and was not adequately tested before launch. The navigation data provided by this software was also not cross-checked while in flight. The Mars Climate Orbiter thus drifted off course during its voyage and entered a much lower orbit than planned, and was destroyed by atmospheric friction.
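The factor-of-4.45 underestimate can be illustrated in a couple of lines. The burn value here is made up; only the conversion factor is real:

```python
LBF_TO_N = 4.448222                  # newtons per pound-force

reported = 100.0                     # impulse computed in lbf*s (hypothetical burn)
assumed = reported                   # read downstream as N*s: no conversion applied
actual = reported * LBF_TO_N         # what the thrusters really delivered, in N*s

print(round(actual / assumed, 2))    # 4.45: every burn's effect underestimated
```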