We as enthusiasts have for too long suffered without some basic necessities that enterprise customers have been getting in even their low-end parts for years. I say it's time we band together and demand a few basic concessions for consumers. The two most basic requirements that I think every system, from the bargain bin up to the high end, should support are SAS and ECC. Do you have any other strong candidates? If so, share them; let's rally together and demand more from our manufacturers!

SAS is a critical technology for connecting hard drives and SSDs to your motherboard or host bus adapter. SAS provides a few technical improvements over SATA, including more resilient data correction, a small set of error-recovery commands, and a slightly richer instruction set based on SCSI instead of ATA. The real reason I believe SAS must become standard, not just for new motherboards but for all drives going forward, is that SAS doesn't actually cost anything extra to implement. There are no extra chips on a SAS hard drive, and a SAS controller card only COSTS more to buy because it's sold primarily to high-end enterprises. In fact, hard drive and controller card manufacturers are spending MORE on R&D and production to maintain separate SAS and SATA product lines, when it makes far more sense to eliminate SATA altogether and switch to SAS. Furthermore, existing SATA drives are forward compatible with SAS HBAs and controllers, so completely eliminating SATA ports from motherboards and SATA cards from stores would not stop anyone from using their existing SATA products.

People may complain that SAS drives are more expensive. That is presently true, for two reasons. First, they're targeted primarily at enterprise users, who in general think nothing of paying $400 for a drive when an equal capacity is available for $150. Second, the more expensive SAS drives are rigorously tested and held to a higher standard of manufacture and verification before leaving the factory. We don't need that. I'm not saying there aren't grounds for expensive enterprise hard drives; there certainly are. What I'm saying is that ALL hard drives should be sold with SAS interfaces, and SOME of them should be held to enterprise quality standards and marketed and priced accordingly.

The second feature we must demand is ECC. ECC is a feature that works across RAM, motherboards, and processors to protect the data stored in memory from silent corruption. Most AMD processors support ECC; only about a third of Intel's do. I believe it's critical that all processors support it. ECC is mostly used in data centers and server farms to ensure that critical data is always correct. I say that's not good enough. How would you feel if, due to a bad bit of RAM, you lost the only picture you had of your grandmother to data corruption on your desktop? It is a CRITICAL feature for every system. Many AMD motherboards currently support ECC, from the low end all the way through to the high end. Most Intel boards do not. Currently, you have to spend DOUBLE per gigabyte to buy ECC RAM. That is not unreasonable. What's unreasonable is that Intel and motherboard manufacturers deem it acceptable to deny end users access to ECC RAM and the data reliability it brings.

Yes, it's true that on low-end systems ECC can add noticeable overhead, and yes, on a high-end system it can add measurable latency. Therefore, I don't believe all systems should be forced to run with ECC enabled until RAM performance improves enough that we can enable ECC and still see generational gains over previous RAM models. However, I believe that over the next several years we must gradually enable ECC support on all motherboards, and all new RAM should be ECC capable, so that you have the OPTION to turn it on if you so desire. The price delta for ECC memory is understandable, given the extra R&D that goes into it. But as more people invest in ECC RAM for their desktops, and dare I say, gaming computers, the cost overhead to manufacturers will vanish.
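
To make "silent corruption" concrete, here's a toy sketch (my own illustration, not how any real memory controller is wired) of what a single flipped bit does to stored data, and how even one parity bit, the simplest ancestor of the codes ECC builds on, at least notices it:

```python
# A byte of "RAM" holding one character of a filename
stored = ord("G")                    # 'G' = 0b01000111

# A stray charge flips a single bit -- nothing reports it
corrupted = stored ^ (1 << 5)        # now 0b01100111 = 'g'
print(chr(corrupted))                # prints "g", silently wrong

# ECC memory stores extra check bits alongside the data.
# Even a single parity bit is enough to *detect* the flip:
parity = bin(stored).count("1") % 2          # computed on write
assert bin(corrupted).count("1") % 2 != parity   # mismatch on read
```

Real ECC DIMMs go further than bare parity, using a Hamming-style code that can correct the flipped bit rather than merely flag it.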

You have every reason to expect your computer to function as intended. Silent errors and inferior products shouldn't corrupt your data, and one low-end piece of hardware shouldn't condemn everything else to being low end either. Do you not demand the most out of your systems? Should you not also expect the best? My expectations are not unreasonable, and my demands are fair. All I want is a computer that works, and works right every time, because it hasn't been nerfed by design.

Except that 99.9999999999% of the time, the computer does work as intended. It's just that enterprise computers run applications that need an extra 4 or 5 significant digits of comfort, especially servers (where data requests happen far more often than on the average client) and scientific mainframes (where we'd rather not rerun the experiment because a bit error propagated down the line and trashed a lengthy simulation).

Other than that, a lot of applications used in non-professional fields are fault tolerant, either because the user is tolerant (do I care that a polygon in my game was shifted 2-3 pixels because of a math error from data corruption? No) or because the application is softly tolerant of it (does the game care that data corruption shifted a polygon 2-3 pixels? Not really). And besides that, if you're worried about faults in your data, a CRC check will do.
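
On that last point, a CRC really is cheap insurance for data you care about. A minimal sketch using Python's standard zlib (the payload and the flipped byte are made up for illustration):

```python
import zlib

payload = b"savegame checkpoint data"
checksum = zlib.crc32(payload)       # stored alongside the file

# Later, one bit gets flipped in storage or in transit
damaged = bytearray(payload)
damaged[3] ^= 0x10

assert zlib.crc32(payload) == checksum           # intact copy passes
assert zlib.crc32(bytes(damaged)) != checksum    # corruption is caught
```

Note the limit, though: a CRC can tell you the data is bad, but unlike ECC it carries no redundancy to repair it.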

So while yes, sometimes physical phenomena, like noise or spikes or cosmic rays, can screw up a signal, it happens so infrequently on a well designed system that for all intents and purposes, you're really just nitpicking.

Oh, don't get me wrong, I agree that a lot of applications (game clients especially) don't need ECC RAM, or redundant storage, or any of the fancy niceties I would LIKE to see in the average desktop. But Grandma and Uncle both need those niceties, because when their e-mail gets corrupted, or they can't see their picture of the dearly deceased, very bad feels happen. It's not for the gamer, or even for the Facebooker, that these things matter, but for the less informed users, those who may not know why their application isn't behaving, or why their file is destroyed. It's also for the power users who DO care whether the program they're trying to debug suffered a cosmic ray, or whether they just lost a terabyte of data on disk.

Believe you me, when the time comes that every system has ECC RAM available, more than half of mine will still run with ECC disabled. I have the ability to use RAIDZ on most of my systems in some form or another, but I still choose to go with single-disk setups. The truth of the matter is that it just doesn't matter if I lose a copy of Windows, because *I* know how to fix that. There are a LOT more features I'd like to see available in every computer, like VT-d/IOMMU, and by extension, ZFS (there's a logic there; I'll explain it if you insist, but it's off topic). But I think that scope is well beyond what most people would know how to make use of, and beyond reasonable as a demand to place upon manufacturers.

Well, the idea is to include it gradually until volume makes the price drop. The problem, though, is that again, these errors happen so infrequently that I'm pretty sure an extinction-level event will happen before enough people are affected by the errors ECC RAM corrects to warrant it being a standard.

Or who knows? Maybe it'll be a built in standard in DDR4. Whenever that comes out.

I find the elimination of SATA more important to the end user anyway. The R&D for two separate sets of chips with two different protocols is a significant cost overhead. The SAS protocol is ever so slightly more robust, and I further believe that all hard drives should be running the same firmware with different default settings. There have been a lot of firmware-related issues, especially on lower-end drives, that simply would not have happened had every product line shipped the same firmware. The firmware should be consolidated into a single unified code base with greater resources put into its testing and quality assurance. Had that been the case, we wouldn't have had the TLER timeout errors on low-end WD drives forcing the entire bus to be reset. Such problems are, simply put, inexcusable. I know a lot of people weren't affected by them, but a lot were. SATA is the CAUSE of these problems, and the bait tempting hard drive manufacturers to lower the bar again and again.

As for ECC RAM, I just personally believe that on any PRODUCTIVITY system, going without it is a gamble that's not worth the risk. I've had 3 sticks of bad RAM in my lifetime, all found by memtest86+, all discovered only after mysterious errors that couldn't be attributed to anything specific, and all of which ECC memory would have found and flagged in short order. I run ECC memory in any motherboard that will take it, but I'm an Intel fanboy, so that's not a lot of motherboards or a lot of systems.

I don't think you have the complete knowledge of what's going on here to make such a demand.

First, SCSI hard drives are enterprise drives by default, and as such they're designed to run 24 hours a day, 7 days a week, all year round. Plenty of engineering decisions are at work that have to be factored in for an MTBF of over 1 million hours. So a combination of factors is why SCSI drives are much more robust and reliable, not simply a "better" firmware. I'm pretty sure that even if you put SCSI logic in a regular ATA drive, you wouldn't really see an improvement; chances are, that drive is going to fail catastrophically at the same rate anyway. http://pages.cs.wisc.edu/~remzi/Classes ... si-ata.pdf is a good paper to read about this.

Secondly, ECC is not a catch-all for every error. An ECC code can only correct as many flipped bits as its redundancy allows, and to maximize density and keep costs down, ECC on RAM typically corrects only a single bit error per word (and merely detects a second). And those errors you've encountered could simply have been a defect that developed over time; electronics wear out like everything else (see electromigration).
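
That one-error-per-word limit comes from the SECDED codes (single-error-correct, double-error-detect, typically extended Hamming) that ECC memory commonly uses. A toy sketch of my own on a 4-bit word, rather than the 64-bit words real modules protect:

```python
def secded_encode(data):
    """Encode 4 data bits into an 8-bit SECDED codeword:
    Hamming(7,4) plus one overall parity bit."""
    d1, d2, d3, d4 = data
    p1 = d1 ^ d2 ^ d4
    p2 = d1 ^ d3 ^ d4
    p3 = d2 ^ d3 ^ d4
    word = [p1, p2, d1, p3, d2, d3, d4]
    word.append(word[0] ^ word[1] ^ word[2] ^ word[3]
                ^ word[4] ^ word[5] ^ word[6])   # overall parity
    return word

def secded_decode(word):
    """Return the 4 data bits, correcting one flipped bit;
    return None if two flips are detected (uncorrectable)."""
    w = list(word)
    syndrome = ((w[0] ^ w[2] ^ w[4] ^ w[6])          # checks 1,3,5,7
                | (w[1] ^ w[2] ^ w[5] ^ w[6]) << 1   # checks 2,3,6,7
                | (w[3] ^ w[4] ^ w[5] ^ w[6]) << 2)  # checks 4,5,6,7
    parity_ok = (w[0] ^ w[1] ^ w[2] ^ w[3]
                 ^ w[4] ^ w[5] ^ w[6] ^ w[7]) == 0
    if syndrome and parity_ok:
        return None                # two bits flipped: detected only
    if syndrome:
        w[syndrome - 1] ^= 1       # one bit flipped: fix it in place
    return [w[2], w[4], w[5], w[6]]

word = secded_encode([1, 0, 1, 1])
word[4] ^= 1                       # one flip: corrected transparently
print(secded_decode(word))         # [1, 0, 1, 1]
word[1] ^= 1                       # a second flip: detection is all you get
print(secded_decode(word))         # None
```

So a machine-check exception on a double-bit error isn't ECC failing; it's the code hitting exactly the boundary it was designed to have.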

Lastly, no matter how hard you try, your computer isn't going to work correctly 100% of the time. You should drop that expectation right now, because you'll never achieve it.

Even if you want to get as close as possible to 100%, for economic reasons it sometimes isn't even worth it. Those parts consume more material to build, which means less product overall, and think of the amount of testing that has to be done to ensure they work as expected, among other things. There's no such thing as a free lunch here.


It isn't as if SAS, SCSI, and ECC haven't been around for a good number of years. If you want the tech, build a server; that's what these were developed for anyway. SCSI logic was put into ATA drives long ago; that's where the Raptors came from: SCSI drives with SATA controllers.

I'm not asserting that all drives need to be enterprise quality, or that higher QA on the drives themselves would have eliminated the UNDERLYING problem with these drives. My assertion, though I may have been unclear about it, is that a single unified SAS bus and firmware would have allowed those drives to properly communicate the underlying failures to the host and have the issues resolved quickly and easily.

As for ECC, I'm aware that it can't repair every error. I don't expect it to catch every error. I simply expect an exception to be raised and a log entry to be made so that I can replace the faulty parts. Is that too much to ask? I don't expect my spinning rust to spontaneously recover the data from a bad sector, either. Just to report it.

I understand that some malfunctions are to be expected, and that some can be identified and remedied easily. I only demand that the mechanisms which make failures easy to identify and remedy not be artificially limited by such small things as a few lines of code disabled behind a configure flag, or such obtuse things as maintaining two separate lines of controllers and firmware. It's not a lot to ask, really.

If you're asserting that SCSI and SATA should unify, I think you'd be better off unifying other standards whose fragmentation makes no sense. For instance, there's a standard for rendering web pages, but there are half a dozen engines that implement it. This is a nightmare for web developers, because they have to test against each of those rendering engines and, in the worst case, bloat their code 5-6 fold. At least SCSI and SATA, incompatible as they are, have both been around for a very long time (SATA is just an extension of ATA, after all). And they're hard standards rather than the soft ones web protocols specify, so pretty much every SATA controller is going to look the same.

Also, this is a pretty tall claim:

Quote:

I know a lot of people weren't affected by these problems, but a lot were as well. SATA is the CAUSE of these problems, and the bait tempting hard-drive manufacturers to lower the bar again and again.

Do you care to provide some hard evidence rather than talking out of your rear? And how do you know I can't make the SAS interface poop out with the right combination of commands? POSIX operating systems are more robust than Windows NT-based ones by default, but you can easily break a POSIX OS with the right commands, just as you can a Windows NT OS. Can we blame OCZ's SSD issues on SATA? No, it's because OCZ wrote crappy firmware that controlled the bare-metal communication with the storage medium.

In any case, the end point is that a lot of the errors enterprise and scientific computing worry about happen so infrequently to the average customer that guarding against them isn't even worthwhile; it's paying for an extra one-in-a-million improvement on a one-in-a-billion failure rate. And as pointed out, do you want a system with all the trimmings? Get a workstation board. Most of us are content with what we have. And if I'm not going to see an appreciable difference from the "insurance", then I'm not going to be happy paying out more money for a part that will eventually "lower in cost".
