
Taxing Inferior Products

I recently had a medical appointment cancelled due to a “computer crash”. Apparently the reception computer crashed and lost all bookings for a day, so they just made new bookings for whoever called, and anyone who had a previous booking just missed out. I’ll probably never know whether they really had a computer problem or just used computer problems as an excuse when they made a mistake. But even if it wasn’t a real computer problem, the fact that computers are so unreliable overall that “computer crash” is an acceptable excuse indicates a problem with the industry.

The problem of unreliable computers is a cost to everyone; it’s effectively a tax on all business and social interactions that involve computers. While I spent the extra money on a server with ECC RAM for my home file storage, I have no control over the computers purchased by all the companies I deal with, which are mostly the cheapest available. I also have no option to buy a laptop with ECC RAM because companies like Lenovo have decided not to manufacture them.

It seems to me that the easiest way of increasing the overall reliability of computers would be to use ECC RAM everywhere. In the early 90’s all IBM-compatible PCs had parity RAM, which meant that for each byte there was one extra bit, allowing the system to detect 100% of single-bit errors and 50% of errors that involved random memory corruption. Then manufacturers decided to save a tiny amount of money on memory by using 8/9 the number of chips in desktop and laptop systems, and probably to make more money selling servers with ECC RAM. If the government were to impose a 20% tax on computers that lack ECC RAM, manufacturers would immediately start using it everywhere. The end result would be no overall price increase, as it’s cheaper to design desktop systems and servers with the same motherboards. Apparently some desktop systems have motherboard support for ECC RAM but don’t ship with suitable RAM or advertise support for it.
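To make the parity figures concrete, here is a minimal Python sketch of one parity bit per byte. It assumes even parity (real hardware may use either convention), and the function names are purely illustrative; it is not how any particular memory controller is implemented.

```python
def parity_bit(byte: int) -> int:
    """Even parity: the extra bit makes the total number of 1 bits even."""
    return bin(byte & 0xFF).count("1") % 2


def parity_ok(byte: int, stored_parity: int) -> bool:
    """True if the byte still matches the parity bit stored alongside it."""
    return parity_bit(byte) == stored_parity


data = 0b1011_0010
p = parity_bit(data)  # four 1 bits, so the parity bit is 0

# A single flipped bit changes the count of 1 bits by one, so it is always caught.
single = data ^ 0b0000_1000
# Two flipped bits leave the parity unchanged, so the error goes unnoticed.
double = data ^ 0b0000_1100

print(parity_ok(data, p))    # True
print(parity_ok(single, p))  # False
print(parity_ok(double, p))  # True (undetected)

# If the whole 9-bit cell (8 data bits + parity bit) is overwritten with random
# junk, the check passes for exactly half of all possible values.
passes = sum(parity_ok(b, q) for b in range(256) for q in (0, 1))
print(passes / 512)  # 0.5
```

Parity only detects an error; it cannot say which bit is wrong, so a PC with parity RAM would typically halt on a parity error rather than carry on with bad data.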

This principle applies to many other products too. One obvious example is cars: a car manufacturer can sell cheap cars with few safety features, and then when occupants of those cars and other road users are injured the government ends up paying for medical expenses and disability pensions. If there were a tax on every car that has a poor crash test rating, and a tax on every car company that performs badly in real-world use, then car companies would have some incentive to manufacture safer vehicles.

Now there are situations where design considerations preclude such features. For example, implementing ECC RAM in mobile phones might involve technical difficulties (particularly for 32-bit phones), and making some trucks and farm equipment safer might be difficult. But when a company produces multiple similar products that differ significantly in quality, such as PCs with and without ECC RAM or cars with and without air-bags, there would be no difficulty in making all of them higher quality.

I don’t think that we will have a government that implements such ideas any time soon; it seems that our government is more interested in giving money to corporations than taxing them. But one thing that could be done is to adopt a policy of only giving money to companies that produce high-quality products. If a car company is to be given hundreds of millions of dollars for not closing a factory, then that factory should produce cars with all possible safety features. If a computer company is going to be given significant tax breaks for doing R&D, then it should be developing products that won’t crash.

15 comments to Taxing Inferior Products

Someone described to me how their car’s on-board computer ‘crashed’ and left them rolling down a hill without being able to activate the handbrake. After managing to stop safely, they got out, then the doors locked and they couldn’t get back in. The car dealer later ‘serviced’ it by cold-booting the computer and said “well that’s really strange, but it seems to be working now”.

It terrifies me that crap consumer components and software, which work wonderfully for leisurely computer and phone use and do some really cool stuff, are finding their way more easily into life-endangering situations. And I suppose they are relied on for emergency responders’ logistics and communications too.

I too thought of mirrored L1 CPU cache, ECC, RAID with checksumming filesystems, redundant PSUs, and everything that goes into designing even a simple fileserver to be reliable over a working life of just a few years. And if systems are worth a substantial amount of money to a corporation, the vendor had damn well better have an explanation if something goes wrong, and make sure it doesn’t happen again.

And what was it Linus once said? “Controlling a laser with Linux is crazy…”

To a first approximation, all crashes are due to software (or sometimes hardware) bugs and cannot be avoided by error correction. ECC will make almost no difference at all, except when the computer is being used in a plane or other environment with higher levels of radiation.

Anyway, don’t blame Lenovo for not supporting ECC; this is a feature of the memory controller, which is now part of the CPU. Intel doesn’t support ECC in ‘client’ processors, and AMD made some critical mistakes that mean their CPUs are no longer competitive with Intel in most areas.

Steven: What model of car is that? snopes.com has no reference to it, but it seems like a good candidate for them.

Most cars have a handbrake that isn’t controlled by the computer, not that it matters, as a handbrake won’t stop a car (try driving a car with the handbrake on; it won’t slow you down much).

There have been a number of serious crashes involving Lexus cars where the car gets locked into full acceleration. The last time I read about it, there was some debate about whether the problem was related to the engine or the accelerator pedal. When a typical petrol car has the accelerator fully depressed, the brake pedal won’t work much (at best you get one good press) due to the lack of vacuum.

A friend of mine worked on a project with an extremely high-powered LASER controlled by Linux. It’s simple: you keep it in a locked room with emergency cut-offs, so the worst it could do is damage the machine it operates in and the raw material used by the factory.

Ben: I’ve twice had a BTRFS filesystem corrupted, to the point that it could only ever be mounted read-only, due to a defective non-ECC DIMM. I’ve had an HP server report memory errors while running (the DIMM was replaced with no data loss AFAIK). I’ve seen many discussions on Linux mailing lists suggesting things like doing repeated kernel builds to test for memory errors. ECC does make a difference even without radiation problems.
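Unlike parity, ECC can correct a single-bit error rather than just detect it. A minimal Python sketch of the principle, assuming the textbook Hamming(7,4) code (4 data bits, 3 check bits) purely for illustration; real ECC DIMMs use a wider code, and the function names here are hypothetical.

```python
from __future__ import annotations


def hamming74_encode(d: list[int]) -> list[int]:
    """Encode 4 data bits as a 7-bit codeword laid out as p1 p2 d1 p3 d2 d3 d4."""
    d1, d2, d3, d4 = d
    p1 = d1 ^ d2 ^ d4  # covers positions 1, 3, 5, 7
    p2 = d1 ^ d3 ^ d4  # covers positions 2, 3, 6, 7
    p3 = d2 ^ d3 ^ d4  # covers positions 4, 5, 6, 7
    return [p1, p2, d1, p3, d2, d3, d4]


def hamming74_decode(code: list[int]) -> tuple[list[int], int]:
    """Return (data bits, 1-based position of the corrected bit, or 0 if none)."""
    c = list(code)  # work on a copy so the caller's list is untouched
    # Re-check each parity group; the three results spell out the bad position in binary.
    s1 = c[0] ^ c[2] ^ c[4] ^ c[6]
    s2 = c[1] ^ c[2] ^ c[5] ^ c[6]
    s3 = c[3] ^ c[4] ^ c[5] ^ c[6]
    pos = s1 + 2 * s2 + 4 * s3
    if pos:
        c[pos - 1] ^= 1  # flip the faulty bit back
    return [c[2], c[4], c[5], c[6]], pos


data = [1, 0, 1, 1]
code = hamming74_encode(data)
corrupted = code.copy()
corrupted[4] ^= 1  # simulate a single-bit memory error at position 5

print(code)                         # [0, 1, 1, 0, 0, 1, 1]
print(hamming74_decode(corrupted))  # ([1, 0, 1, 1], 5)
```

ECC DIMMs typically store 8 check bits for every 64 data bits (the same 9-chips-per-8 ratio as parity RAM) and use a SECDED code, which adds an overall parity bit so that double-bit errors are detected. This toy code has no such protection and would "correct" a double-bit error to the wrong value.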

If Lenovo wanted ECC support in ThinkPads and was willing to pay anything less than 20% of the price of a ThinkPad for it, then Intel would build the chips. If not, then AMD would.

Unfortunately I didn’t catch the car’s make or model because I only heard half a conversation, but I heard it first-hand from the owner of the car, who was driving and was fortunately able to bring the car to a stop somewhere safe (and flat). Even if the footbrake still works, an unexpected failure like this could cause a person to panic before reacting appropriately, and rolling through a stop light may not take very long. And if such a car were already parked on a hill, I would hope a failing computer can’t disengage the parking brake. Engine braking requires being in gear, but gearboxes might be automatic as well. Locking him out of the car isn’t great either, supposing it was very cold and he’d left his coat inside.

Re: Ben, I also think the quality of software is what concerns me most, especially when things also do wireless communication.

Steven: I’m not aware of any car that requires being in gear for braking. The only requirement I’m aware of is that vacuum be available; see the above URL for how power brakes work. For a car with a typical petrol engine the vacuum is provided by the idling engine, so barring a serious mechanical failure, a lack of vacuum would mean the engine isn’t idling. If the accelerator is stuck down then you have an entirely different problem from the one you describe, and the car won’t roll to a stop.

It’s possible for someone to manufacture a car with only electronic control over both park brake and regular brake, but that would be a remarkably stupid thing to do. For the manufacturer it would add to the regulatory requirements and customers who understand such things would be unlikely to want to drive it.

When a car is parked it should be left in P if it has an automatic transmission or 1st gear or reverse if manual. That should stop it moving on anything other than the steepest hill. The more paranoid drivers angle the front wheels to point towards the kerb.

Intel is not putting ECC in its non-server chips no matter the price offered by anyone. They believe feature differentiation (and not just clockspeed) is necessary to justify the prices of their higher-end CPUs. ECC is one of many features treated in this fashion. If Intel is willing to make people buy Xeon workstations to get complete virtualization features, then you should believe they’re willing to do the same to make you use ECC memory.

Besides, lack of ECC RAM also wasn’t the cause of the schedule getting deleted. Medical software is not good, and it’s just as likely that a disgruntled (former) employee deleted the entire schedule.

I can tell you that a mains connection (the cable to the wall) with just two wires and no ground wire (I’m Swedish and don’t know the English word for it) can cause lots of trouble with crashes and freezes.

My mom has three desktop computers that used to be mine. They all worked rock solid in my home, but when she got them they started to freeze often. All were running Debian Sid.
The first was a Mac G4. I thought that the user base had become so small that there weren’t enough bug reports anymore, and therefore bugs had started to appear.
The second was an i386, and I thought the cause was the same as with the Mac G4.
The third was an amd64. There was no excuse for the freezes anymore. I thought about it, and the only difference between her use of the computer and mine was that I had grounded cables. I told her to run a cable from the kitchen, where she has grounded wall sockets, to the computer. Since then her computer has been stable.

But I suspect that there’s a difference between laptops and desktop computers here: my laptop runs fine without a grounded cable.

Adam: Intel engages in a lot of anti-competitive behavior to preserve their market position, which makes it difficult for manufacturers to use AMD products even when they are better. But if there was a significant profit incentive then the major manufacturers would buy AMD CPUs and chipsets, and then Intel would be forced to compete.

I’ll never know the exact cause of the schedule being deleted. But I’m sure that the lack of ECC RAM contributes to a general opinion that computers are unreliable and can’t be expected to work correctly.

Magnus: I think that the above article might explain your problem. If you have a poorly earthed PC connected to an ADSL modem, Ethernet switch, or other device then you might end up having unexpected current flowing through that cable which can cause problems.

Laptops are designed to run from battery without any earth connection. This would probably require some protection against AC on the earth lines of whatever devices they connect to.

Even if AMD were somehow competitive, the notion they’ll somehow force Intel to include ECC in all their memory controllers is a pipe dream. AMD doesn’t do it either today (nor will they in the future), and you’re perhaps the only person on Earth who wants ECC in consumer grade computers. I’d much rather have universal SSD hard drives with reliable failure detection, which would actually improve reliability. The margins are too low for OEMs even if the option was available to them. Besides, if you believe memory is the main hardware source of issues, you want more than ECC these days anyway.

As for Magnus’ grounding issue, it can’t be caused by Ethernet, since Ethernet connections are isolated by specification. The main point of the earth connection is to protect humans from being shocked by the metal case if it becomes energized somehow. However, without that connection the case voltage is unknown, and components making contact with the case may cause a static discharge resulting in problems, including crashes and hardware damage. Additionally, many power supplies filter noise partly into the earth connection, so without it there is interference. This interference can occasionally be heard through sound cards and can affect analog modems.

Laptops are less sensitive since they have external power supplies from which they receive DC, so the components are less sensitive to the power supply lacking an earth connection (if it even requires one). They also have additional filtering at the DC connection point on top of that.

Adam: SSDs are very common, and systems that don’t have them can usually be upgraded to them for little cost. Failure detection in the SSD isn’t anything different from what we already have; hard drives have had ECC for decades. However, there’s still a 0.5% incidence of consumer hard drives returning bad data and claiming it to be good in 24*7 use (according to NetApp research). Presumably the data corruption problems are greater given the unexpected reboots that happen in typical home use (where “unexpected” to the OS includes reboots done because an application stopped responding to the user).
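Drive-level ECC can’t catch a drive that returns bad data while reporting success, which is where the RAID and checksumming filesystems mentioned earlier in the thread come in. A minimal Python sketch of the idea, assuming a hypothetical two-way mirror and a checksum stored separately from the data at write time. It uses zlib’s CRC-32 for convenience, whereas BTRFS defaults to CRC-32C, and it does not reflect how any real filesystem lays out its data.

```python
import zlib
from typing import List


def checksum(block: bytes) -> int:
    # zlib's CRC-32; BTRFS defaults to CRC-32C, which Python's stdlib lacks.
    return zlib.crc32(block)


def read_mirrored(copies: List[bytes], expected: int) -> bytes:
    """Return the first mirror copy whose checksum matches; raise if none do."""
    for copy in copies:
        if checksum(copy) == expected:
            return copy
    raise OSError("all copies failed checksum verification")


block = b"example data block " * 200
expected = checksum(block)  # recorded in (hypothetical) metadata at write time

# Simulate one drive silently returning the block with a single flipped bit.
bad = bytearray(block)
bad[10] ^= 0x01
bad = bytes(bad)

print(read_mirrored([bad, block], expected) == block)  # True: the good mirror is used
try:
    read_mirrored([bad], expected)
except OSError as err:
    print(err)  # all copies failed checksum verification
```

With a single copy the checksum can only turn silent corruption into a visible error; with a mirror the filesystem can also return the good copy.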

Intel wanted to use the IA64 architecture as a replacement for i386. AMD introduced the AMD64 architecture, which was a better match for users’ needs (better compatibility with i386), and beat Intel in the market. Now you hardly ever hear of IA64 (is it even possible to buy an IA64 laptop?), and the majority of Intel chips support AMD64.

Regarding the issue of whether Ethernet can trigger problems, “isolated by specification” isn’t an argument I find convincing when discussing typical PC hardware.

Intel never made a serious push for IA64 to replace the commodity market, despite some marketing speak at the time. There’s no reason to believe they seriously wanted it to replace x86, and even if they did, they never did anything about it.

SSDs aren’t that common in commodity equipment. They’re a common option, but they’re only standard in very small and high-end machines. If they were ubiquitous today, you’d see a marked reliability improvement over mechanical HDDs. If they had better failure handling too, the improvement over the current status quo would be even greater.

I don’t know how that incident rate was derived, or even what it means, but it sounds somewhat suspicious, and 24*7 isn’t a reasonable measurement case for home equipment.

As for Ethernet, you’re free not to believe me, but all twisted-pair 100MBit+ Ethernet PHYs contain an isolation transformer and are required to do so by specification. This is necessary to avoid ground loops and for noise reduction.

Adam: It’s difficult to believe that Intel would be so silly as to not want IA64 consumer systems. The only reason why we had i386 class processors in servers was because of the success of that architecture in consumer systems. MIPS, PPC, SPARC, and Alpha all were better designs in many ways, but without the design money that comes from a huge sales base they couldn’t compete.

I’ve had SSDs in several systems I run for 2 years now. It was 2 years ago that the price of Intel SSDs approached $100 each. If you buy a desktop PC with a monitor, then an extra $120 for an Intel SSD isn’t going to significantly increase the price but will significantly improve performance. Also note that the prices of the smallest SSDs and hard drives are very close, so if you only need 100G of storage and you are buying a PC that doesn’t come bundled with a hard drive you don’t need, then the price difference would probably be about $20.

If Intel wanted consumer IA-64 machines so badly, then where were they? They never existed, hence Intel and cohorts never were serious about an IA-64 takeover. It wasn’t even a success as a workstation system.

As for SSDs, they lack price parity at the same size, and size is a common differentiation metric among low and mid-end computers (especially laptops). They’re not going to be ubiquitous anytime soon. But that doesn’t matter, my point still remains: you’ll likely get more value in consumer PCs improving HDD reliability than RAM reliability.

I know all about silent read errors. It’s still not clear to me what the referenced incidence rate is, or why I should care about it. Assuming it’s just the number of drives where they saw the issue, then it is meaningless.