Just to clarify the main question regarding the purpose of a "Trim Check Tool": there is no need for an application that verifies, at any time of day or in special situations while working on the computer, whether TRIM is currently active within the RAID0 array (developing such a tool may even be impossible).
What CyberShadow is trying to develop is a tool that can verify once and for all whether TRIM is active at all within the RAID0 array. The tool should find out whether the TRIM command passes through the Intel SATA RAID Controller into the RAIDed SSDs.
In my experience, the chances of getting a positive TRIM test result approach 100% when the release of TRIM commands by the OS (Win7/Win8) has been triggered just before running the test. This can easily be done by running the "Optimizer" (formerly named the Defrag Tool) of Windows 8, but I do not know of a similar way to trigger TRIM commands while running Windows 7.

Quote:

The program generates a random block of data each run, so COW should have no role here.

I am not sure what "generating a random block of data" would have to do with copy on write. If your app generates data and the empty OS level clusters/blocks/sectors are in a NAND block that is not fully allocated, the controller will copy on write the data from the previous NAND block.

Say you wrote 32MB of data. There is a chance, while rare, that with the standard Windows 4KB cluster size, each of the 8192 4KB pieces ends up sharing an 8, 16, or 32KB NAND page with other data. If that were the case, a TRIM for that file could result in 8192 TRIM commands to the disk. The garbage collection may not do anything in that case, because copy-on-write to spare pages would produce less wear on the NAND. The TRIM commands arrived at the disk, but GC can choose to do nothing with them.
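To put rough numbers on that scenario (sizes are illustrative; real NAND page sizes vary by drive):

```python
# Back-of-envelope numbers for the 32MB scenario above (hypothetical sizes).
FILE_SIZE = 32 * 1024 * 1024   # 32 MB test file
CLUSTER = 4 * 1024             # default NTFS cluster size
NAND_PAGE = 16 * 1024          # example NAND page size

clusters = FILE_SIZE // CLUSTER
print(clusters)                # 8192 host clusters in the file

# Worst case: every cluster sits in a NAND page shared with unrelated live
# data, so deleting the file can emit one TRIM range per cluster:
print(clusters)                # up to 8192 TRIM ranges

# Best case: the file is contiguous and page-aligned, so whole NAND pages
# are covered and a single TRIM range describes everything:
print(FILE_SIZE // NAND_PAGE)  # 2048 fully covered NAND pages
```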

This is an edge case, but it is a valid approach, and I suspect it is more common on enterprise drives that don't bother with TRIM, because spare-area writes are faster and those drives have a lot more spare area.

Quote:

The program currently recognizes 00 and FF as unallocated. I can add more patterns if anyone reports them. Regardless, the program prints the first 16 bytes of whatever it reads.

I agree with everything else in your post. I hope you'll agree that this tool is still much better than nothing (or having to run a performance benchmark).

Sorry, I'm having trouble following your post, imagoon. "Copy-on-write" is a term used most frequently in the context of data deduplication, which is how I interpreted your first mention of it. In that meaning, I wanted to clarify that the data written should be unique, thus the reference count of the allocation unit containing the data should always be 1 at the time of deletion. It doesn't look like any current SSDs actually use deduplication yet, though.

I can't find anything relevant to the usage of the term in the context of SSDs. As expected, a Google search only returns results in the context of deduplicating file systems.

Would you mind clarifying the terminology you used? References to established sources which discuss the points of your argument would also be appreciated.

SSD copy-on-write refers to a technique newer SSDs use to write data to NAND pages. Say the SSD has an 8KB NAND page but Windows uses 4KB clusters. When Windows deletes one of the 4KB clusters and sends the TRIM command to the disk, rather than reading the page, erasing the 8KB page, rewriting the surviving 4KB to the NAND, and leaving the other 4KB erased, the drive leaves the page as is until that 4KB chunk is used again. It then copies the rest of the page, combines it with the new data in RAM, and writes it to an entirely new, already-erased page. Often this page is in the SSD spare area.
Completely random example:
SSD page ID: AF
Windows cluster 0E, 0F
Windows: Delete 0F -> TRIM "0F is empty and free for use"
SSD: Ok. (Do nothing)
Windows: Write 0F "some random file"
SSD: Ok. -> SSD: Copy page first 4KB of AF to new page FE, write new 4KB to FE. Queue AF for garbage collection.
The end result is that the data is there, but the new data and the old data on the page were merged and written somewhere else on the NAND, and the old page is marked for GC.
The disk can still use the TRIM commands, and would mark page AF for GC if Windows had actually said "Delete 0E->0F".
The main issue is that, until the COW happens, 0F may still contain the old data in that cluster.
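The sequence above can be sketched as pure bookkeeping. This is a toy model, not any vendor's actual flash translation layer; the page and cluster IDs follow the example:

```python
# Toy walk-through of the COW sequence: an 8 KB NAND page "AF" holding
# host clusters 0E and 0F. IDs and sizes are made up for illustration.
pages = {"AF": {"0E": "data-0E", "0F": "data-0F"}}   # physical NAND pages
l2p = {"0E": ("AF", "0E"), "0F": ("AF", "0F")}       # logical -> physical map
gc_queue = []

# Windows: "Delete 0F" -> TRIM. SSD: "Ok." (do nothing physically).
del l2p["0F"]                                # mapping dropped...
assert pages["AF"]["0F"] == "data-0F"        # ...but stale data stays in NAND

# Windows: write "some random file" to 0F. The SSD cannot rewrite half of
# page AF in place, so it merges old 0E + new 0F in RAM and writes the
# result to a fresh, already-erased page FE (often in the spare area).
pages["FE"] = {"0E": pages["AF"]["0E"], "0F": "some random file"}
l2p = {"0E": ("FE", "0E"), "0F": ("FE", "0F")}
gc_queue.append("AF")                        # old page queued for GC

print(l2p["0F"])     # ('FE', '0F')
print(gc_queue)      # ['AF']
```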

Thanks, I understand what you meant now. TrimCheck uses a 64MB file, with the tested data being in the middle, but I could see how something like that could occur due to fragmentation. Do you happen to know how likely it is in practice for SSD TRIM-able pages to mismatch the OS sector size?

Edit: I could improve TrimCheck to avoid the fragmentation issue, but I'd like some confirmation that this problem can occur in practice first.
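For what it's worth, the test principle under discussion can be sketched roughly like this. This is not TrimCheck's actual code; the raw device is simulated with a bytearray, and the 00/FF patterns are the ones mentioned earlier in the thread:

```python
# Sketch of the TRIM-test principle: write a unique block, remember its
# location, delete the file, then read the location raw and see whether a
# "trimmed" pattern (all-00 or all-FF) came back. A bytearray stands in
# for the raw device here.
import os

disk = bytearray(os.urandom(1 << 20))        # stand-in for the raw device
marker = os.urandom(4096)                    # unique random test block
offset = 512 * 1024                          # middle of the test region

disk[offset:offset + 4096] = marker          # "write the file"
# ... the file is deleted; if TRIM worked, the drive now returns a blank
# pattern for that LBA range. Simulate a drive that returns zeros:
disk[offset:offset + 4096] = b"\x00" * 4096

readback = bytes(disk[offset:offset + 4096])
if readback == marker:
    print("data still there: TRIM appears NOT to have worked")
elif all(b in (0x00, 0xFF) for b in set(readback)):
    print("blank pattern read back: TRIM appears to have worked")
else:
    print("unrecognized pattern:", readback[:16].hex())
```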

It is fairly common on the Windows side. The default cluster size in Windows is 4KB, while most SSDs use NAND pages of 8KB or larger (with erase blocks running up to 512KB).

Do you have a source? All I could find were discussions of how it could improve things in theory, but considering that deduplication can already be implemented at the FS level, I can see how adding it to the controller would not be a priority.

Also, as we've established, that was not the meaning of COW that imagoon meant.

My system more or less crashed shortly after running this tool. It didn't really crash per se, but something bad happened. I could not kill a process that seemed corrupt. I couldn't even use Windows Explorer: it would open, but after clicking on one or two random directories it just quit responding. Yet other programs work fine.

I already gave you one. I guess you didn't read the article.
Here, I'll quote it for you. This applies to Sandforce controllers specifically (they call it Durawrite), but I'm sure other vendors have their own procedures/implementations for doing the same.

Quote:

a block-level deduplication scheme where incoming chunks of data to a storage medium are examined to see if they are different from the data already on the storage medium, and only the unique portions are sent on to be held in a small buffer and then written out to the flash. This is the essence of a "pre-process" or "ingest" block-level deduplication scheme, which deals not in files (which, remember, are constructs of the operating system, not something the drive knows anything about) but in the actual data structures on the drive.

Quote:

Originally Posted by CyberShadow

I am the creator of TrimCheck. I would appreciate fewer assumptions of ignorance about newcomers.

Okay, but let's face it, so far every assumption you have made about SSD operation has been wrong.

If testing for TRIM was as easy as you make it out to be someone else would have done it already. After all, how many years has it been? About 6 or so?

"Copy-on-write" is a term used most frequently in the context of data deduplication

What imagoon probably means is write remapping. This is very similar to how copy-on-write (COW) operations work in modern filesystems like ZFS. When ZFS overwrites existing data, it does not actually overwrite that data in place like the NTFS filesystem would, but rather writes the data to a different spot. This improves the resilience of the filesystem in case of corruption or missed recent writes, such as buffers lost after a sudden power loss.

Deduplication in SSDs is only implemented by the Sandforce controller. Its only purpose is to cheat the IOmeter benchmarks, just like compression was used to cheat the ATTO benchmark.

As for a 'TRIM check tool'... that does not really exist. There is no mechanism for the computer to know what the SSD did with a TRIM request. The SSD could just skip the requests altogether, like hard drives do. Usually the best way to test TRIM is to look at the storage driver, using AS SSD. Not many Windows drivers are TRIM compatible.

One last remark. Please understand we are talking about Windows-specific issues here. TRIM over RAID and all that is only a problem under Windows. The same hardware can use TRIM just fine in any RAID configuration you want, because the hardware itself is nothing more than a storage controller; it just sends the commands it receives. The controller itself doesn't play any part in the whole TRIM story. The TRIM story is about whether the filesystem-generated TRIM requests survive all the way down to the transport to the SSD. Many (RAID) drivers simply do not process TRIM requests. As far as memory serves me, only Intel, AMD and Microsoft drivers implement TRIM under Windows.

But what about RAID configurations other than RAID 1, where data is striped or distributed in ways where merely passing along a TRIM command wouldn't make sense?

I generally disagree here. First, this is not a Windows-specific issue, it is a RAID-specific issue. The RAID controller is basically acting like a huge translation device (among other things, like doing XOR etc.). Any OS sending "TRIM LBA 000000AF" would have no clue where that data actually sits on the physical disks. The controller would need to do translation, which is more complex in anything above RAID 0, 1, or 10, since the LBA for the file might sit on a stripe that is divided among 3 to 50 drives.

Quote:

But what about RAID configurations other than RAID 1, where data is striped or distributed in ways where merely passing along a TRIM command wouldn't make sense?

Interesting question. Basically, you are thinking: a single SSD is easy to send TRIM to, we know the LBA address and it won't change. But with SSDs in RAID, we have to split the TRIM and do other tricks with it, so it is a little harder here. That is true.

But what you think of being difficult, is the prime function of a RAID engine. RAID basically is nothing more than a disk multiplexer. It converts the LBA ranges from multiple disks to one virtual LBA range. So the 'trick' to support TRIM in RAID is very simple. It boils down to the core of the RAID driver functionality: splitting and combining I/O to translate virtual to physical LBA.

If a TRIM request arrives at the RAID engine layer, the RAID layer must split this command just like a READ request or even write request. In most implementations, TRIM is implemented using the same I/O path as other requests, meaning that the same piece of code is used to split a READ request as it is used to split a TRIM request to two or more SSDs.

So what you think of as difficult is actually the prime function of a RAID engine: translating requests from virtual LBA to physical LBA.

Likewise, a filesystem has the prime function of translating character-based storage ("files") to block-based storage ("harddrives"). All these are translations from one domain to the other. Each 'node' only has to know stuff applicable to its own layer only. For example, a RAID engine strictly knows nothing about files or filesystems; it only cares about block storage referenced by LBA.
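To illustrate the point, here is a minimal sketch of how a RAID0 engine might split one virtual LBA range into per-member requests. The stripe size, disk count, and request format are invented for illustration; the key point is that the same splitter serves READ and TRIM alike:

```python
# Minimal RAID0 virtual->physical LBA translation sketch. The same code
# path routes a READ and a TRIM; only the opcode differs.
STRIPE = 128          # sectors per stripe unit (example value)
DISKS = 2             # member disks (example value)

def split(op, start, length):
    """Split a virtual LBA range into per-disk (op, disk, lba, len) requests."""
    out = []
    lba, remaining = start, length
    while remaining:
        stripe_unit = lba // STRIPE
        disk = stripe_unit % DISKS                       # which member disk
        phys = (stripe_unit // DISKS) * STRIPE + lba % STRIPE
        chunk = min(remaining, STRIPE - lba % STRIPE)    # stay inside the unit
        out.append((op, disk, phys, chunk))
        lba += chunk
        remaining -= chunk
    return out

print(split("READ", 100, 200))
# [('READ', 0, 100, 28), ('READ', 1, 0, 128), ('READ', 0, 128, 44)]
print(split("TRIM", 100, 200))   # identical routing, different opcode
```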

Quote:

Originally Posted by imagoon

I generally disagree here. First, this is not a Windows-specific issue, it is a RAID-specific issue. The RAID controller is basically acting like a huge translation device (among other things, like doing XOR etc.). Any OS sending "TRIM LBA 000000AF" would have no clue where that data actually sits on the physical disks. The controller would need to do translation, which is more complex in anything above RAID 0, 1, or 10, since the LBA for the file might sit on a stripe that is divided among 3 to 50 drives.

Same difficulty exists for READ requests, doesn't it? ;-)

I do understand your confusion, however; TRIM and RAID are subjects that have a huge amount of misinformation associated with them. Even worse, the information that is correct only applies to one operating system: Windows.

For your information, TRIM on RAID works just fine on RAID0, RAID1, RAID0+1, RAID1+0, RAID5, RAID6 even 'RAID7' in the form of RAID-Z3 triple-parity ZFS 'RAID'. There is nothing that would make TRIM on RAID excessively difficult. In many cases, no change to the RAID layer itself is required. It just translates ATA requests.

In Windows, the operating system uses two interfaces to communicate with device drivers: SCSI and ATA. The latter supports TRIM because TRIM is an ATA ACS-2 command. But SCSI does not support TRIM; SCSI supports UNMAP, which is equivalent to TRIM. As far as I know, however, Windows does not support UNMAP, only TRIM.

So what happens on Windows is that NTFS will generate TRIM requests, but these simply die when communicating with the device driver over a SCSI interface. This is a design limitation of the Windows operating system. FreeBSD, by contrast, uses a unified ATA/SCSI interface. FreeBSD has the superior GEOM framework, which allows I/O to travel down the chain as 'BIO_DELETE' operations. Only at the level of the device driver is this command translated to either:
- TRIM, if on ATA interface
- UNMAP, if on SCSI/SAS interface
- CF ERASE, if used on compactflash storage

In other words, the TRIM requests are not TRIM requests until the lowest level physical layer in software. This is the proper way to implement TRIM. This also means that automatically, all GEOM modules support TRIM. The RAID modules, the encryption modules, the compression modules, the virtualisation modules, etc.
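That dispatch idea can be condensed into a sketch (the function name and transport labels are invented; the command names follow the list above):

```python
# Sketch of transport-agnostic delete dispatch: upper layers (filesystem,
# RAID, encryption) pass a generic BIO_DELETE-style request down unchanged;
# only the bottom driver layer picks the concrete command.

def lower_delete(transport, lba, count):
    """Translate a generic delete request at the device-driver layer."""
    if transport == "ata":
        return ("DATA SET MANAGEMENT / TRIM", lba, count)
    if transport in ("scsi", "sas"):
        return ("UNMAP", lba, count)
    if transport == "compactflash":
        return ("CF ERASE", lba, count)
    raise ValueError(f"no delete translation for transport {transport!r}")

print(lower_delete("ata", 4096, 8))    # TRIM on an ATA SSD
print(lower_delete("scsi", 4096, 8))   # UNMAP on SCSI/SAS
```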

As an example, look at the OCZ RevoDrive. This product comes in three versions (1 and 2, both based on Silicon Image, and version 3 based on LSI SAS). None of the three supports TRIM on Windows 7, because the driver communicates as SCSI, not as ATA. Bye-bye TRIM!

Likewise, if you use the same hardware under a non-Windows OS, TRIM will work just fine on all three RevoDrives. The first two are Silicon Image, which only offers RAID through Windows-only drivers; that is why it is called 'FakeRAID': the actual hardware is just a normal SATA controller with bootstrapping, and the drivers are what provide the RAID functionality, and only under Windows. If you boot Linux with a RevoDrive, you actually see two or four separate SSDs on a Silicon Image controller, instead of one big RAIDed PCIe storage device. In other words: you see the true hardware, not some mere driver which pretends to be something else.

In general, Windows is very outdated in storage technology. Other operating systems like Linux and BSD generally are far ahead in multiple areas. TRIM-support being one of them.

Your comparison is a false argument. ZFS addresses the raw devices and removes the controllers from the mix. Windows RAID, which yes is "fake RAID", is fake RAID in the same way ZFS is: it is all done in software, and that software has chosen to implement TRIM.

Quote:

So what happens on Windows is that NTFS will generate TRIM requests, but these simply die when communicating with the device driver over SCSI interface. This is a design limitation of the Windows operating system. FreeBSD, by contrast, uses unified ATA/SCSI interface. FreeBSD has the superior GEOM framework which allows I/O to travel down the chain as 'BIO_DELETE' operations.

This is also a false argument. Windows [NTFS] can send the equivalent of TRIM to SCSI devices. It requires device (hardware and software) support of the command. This is exactly the same as FreeBSD: if the kernel drivers have no way to handle the TRIM commands, or the hardware has no support for them, they die at the driver/hardware. 'BIO_DELETE' has to be translated to the hardware API at some point. There is nothing 'magical' about the FreeBSD implementation that makes it work on hardware that simply doesn't support it.

The Revodrive that appears as SCSI could also TRIM on Windows assuming there was a proper TRIM supported SCSI driver written for it.

Quote:

There is nothing that would make TRIM on RAID excessively difficult. In many cases, no change to the RAID layer itself is required. It just translates ATA requests.

Sending the commands is easy. Making sure that all devices respond the same way and resolve with proper checksums etc. is the hard part. TRIM as currently defined has no defined value that should be read back. Lots of SSDs report either a block of FF or a block of 00, and FF ⊕ FF = 00. So if you TRIM 3 RAID5 LBAs and the raw disks report back FF, FF and FF, the parity check gives FF ⊕ FF = 00 ≠ FF: the array just went inconsistent, and you can't honestly know at this point whether the trimmed data is correct or whether the file that was written there was, say, a TIFF with megabytes of whitespace.

What happens if those trimmed blocks are part of a checksummed stripe? Hardware RAID generally expects that a) devices report errors and b) the drives are consistent. The current big implementations do not expect the drives to be making decisions on a whim. This would require the drives to actually report back to the RAID engine that a sector was cleared during garbage collection, so it could handle these cases. It would also require the drives to have some way to at least partly synchronize garbage collection. If one drive GCs block 08h at time 0 and the second and third drives do it at time +15 minutes and time +3 days, when that sector is read, which is correct? Will the RAID card be expected to track all the requests someplace? What if the clearing on drive one was actually a drive fault that caused GC to clear the sector, but the drive properly reports that it cleared it? The current TRIM specs simply do not require a drive to do anything with the request at any given time, or at all.
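The parity problem can be shown numerically, with a hypothetical 3-drive RAID5 stripe where every member returns 0xFF after a TRIM:

```python
# Numeric version of the parity problem: two data members and the parity
# member of a RAID5 stripe all read back 0xFF after a TRIM. Illustrative.
data_a, data_b, parity = 0xFF, 0xFF, 0xFF   # what the drives read back

expected_parity = data_a ^ data_b            # XOR parity rule
print(hex(expected_parity))                  # 0x0 -- but the drive says 0xFF

consistent = expected_parity == parity
print(consistent)                            # False: the stripe is inconsistent
```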

ZFS gets around this because it is both a filesystem and a disk subsystem. It can note in the filesystem that TRIM was requested against those sectors, and during reads it knows that an apparent inconsistency there (e.g. data FF, FF with parity FF, where FF ⊕ FF = 00) is "ok."

Until the "TRIM" for SCSI / SAS is locked down and defined and TRIM for ATA is redefined, there will be no real "TRIM" support on most big RAID arrays. I expect that the Enterprise will continue to use SSD spare area, copy on write techniques which reduce the value of TRIM to near zero anyway.

I value your comments, imagoon. However, I feel you have not yet fully mastered the fine details of storage backends. Please allow me to elaborate.

ZFS accesses devices just like other software engines would. Software RAID is also not 'FakeRAID'. That term is reserved for onboard RAID, which uses option-ROM firmware to allow bootstrapping from RAID arrays that are operated by Windows-only drivers. The actual hardware is just an ordinary SATA controller, but it has firmware to allow the creation of RAID arrays and make bootstrapping possible until the RAID drivers take over. At that point, it becomes driver RAID without any 'acceleration' from the hardware. Because many people have been led to believe that onboard RAID is also hardware RAID, the term FakeRAID applies, since it is not a true hardware RAID implementation. In fact, it is extremely close to software RAID except for the booting part.

The hardware also does not need explicit TRIM support: it already has it, because the hardware supports ATA and TRIM is a valid ATA8 command. Hence, TRIM also works on controllers designed before TRIM even existed, assuming the SSD supports it and the operating system does as well.

There are only three things required to make TRIM work:
- SSD support (this one is easy)
- operating system support (this one is harder)
- driver chain that can pass TRIM (this one is hardest)

The latter is where TRIM fails on Windows. What you say about 'assuming there was a proper TRIM supported SCSI driver written for it' cannot be true, simply because SCSI has no ATA TRIM command; it only has the UNMAP command, amongst others. Thus, if your driver interfaces with SCSI on the Windows 7 operating system, it cannot receive TRIM commands, because those simply do not go over a SCSI driver interface.

Windows could probably support TRIM properly if the operating system converted a 'BIO_DELETE' equivalent to UNMAP and then sent it to the SCSI-interfaced driver. That would work. But it can't: it only supports TRIM in the NTFS driver, not in other filesystems like FAT32 or third-party filesystems. Windows 7, anyway.

FreeBSD on the other hand, has the GEOM I/O framework, the most advanced storage backend on the planet. It can create chains of all kinds of driver modules, including various types of RAID, and supports TRIM/UNMAP *and* CFERASE all in one beat. TRIM is not implemented on the filesystem level like on Windows, instead an agnostic 'BIO_DELETE' command is used which only gets translated to ATA TRIM if it reaches an ATA driver at the end of its chain. If it's a SCSI driver it will be translated to SCSI UNMAP instead.

In other words, FreeBSD has a superior implementation that allows TRIM in all its various forms, while Windows 7 (not sure about 8) only supports a very limited, hardcoded method of TRIM that only works with ATA drivers. This doesn't just apply to ZFS, but also to the UFS filesystem, which had TRIM support even sooner. So it's not like ZFS can implement TRIM more easily; in fact, it was much harder to implement TRIM on ZFS than on the usual UFS + software RAID.

Quote:

TRIM as currently defined has no defined value that should be read back.

This is a very clever remark, and it pleases me that you bring up this argument, because it is a very good one. You must be thinking: couldn't we get into trouble if, say, SSD X decides to return garbage data, or 1's instead of 0's, when we read a TRIMed LBA? The answer is both yes and no.

Yes, we do get in trouble, because if we TRIM erase those spots and rebuild the RAID5 afterwards, we will not have consistent parity, and thus the parity will get overwritten whenever a rebuild occurs. For this reason a RAID5 driver may opt not to send the BIO_DELETE to the disk containing the parity for the current full stripe block. But that isn't necessary; you can TRIM erase the parity as well and just have a partially consistent RAID. It does not matter that the spot has inconsistent parity, because it is not used. Once it starts being used, even partially, the parity for the (partial) data block will be consistent. On partial writes the whole block will be inconsistent, but the partial parity will match the partially written data.
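The 'skip the parity member' option can be sketched like this (a hypothetical policy with an invented parity rotation; real drivers differ):

```python
# Sketch of a RAID5 TRIM policy that skips the member holding parity for
# each stripe, so rebuilds stay consistent. 3 disks, simple rotation;
# the request format and rotation scheme are invented for illustration.
DISKS = 3

def trim_stripe(stripe_no):
    parity_disk = stripe_no % DISKS          # rotating parity placement
    requests = []
    for disk in range(DISKS):
        if disk == parity_disk:
            continue                          # leave the parity chunk intact
        requests.append(("TRIM", disk, stripe_no))
    return requests

print(trim_stripe(0))   # [('TRIM', 1, 0), ('TRIM', 2, 0)]
print(trim_stripe(1))   # [('TRIM', 0, 1), ('TRIM', 2, 1)]
```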

So the actual answer is: it doesn't matter. If a spot is TRIMed, it is not going to be used. It will not be read. Instead, it will be overwritten with new data once the filesystem wants to use that LBA range again, or only a fraction of it.

Concluding: there is no theoretical limitation on RAID5 and TRIM support. If an implementation doesn't support it, that is a limitation of that specific RAID subsystem and/or operating system, not a fault of the RAID scheme itself.

ZFS, of course, is vastly superior to all software or hardware RAID and legacy filesystems. It's a giant leap in protection offered versus the legacy combination of RAID + filesystem on a virtual disk. Especially in relation to bad sectors. I would be happy to explain this as well, but I fear I must decline because this is considered off-topic. However, nothing prevents any of you from creating your own thread. ;-)

Quote:

The latter is where TRIM fails on Windows. What you say about 'assuming there was a proper TRIM supported SCSI driver written for it' cannot be true, simply because SCSI has no ATA TRIM command, it has only UNMAP command amongst others. Thus, if your driver interfaces with SCSI on Windows 7 operating system, it cannot receive TRIM commands because those simply do not go over a SCSI driver-interface.

To my knowledge, it is the SCSI filter driver named iaStorF.sys which switches the TRIM command to an UNMAP command, which is able to pass through the Intel SATA RAID Controller.
Contrary to Win7, Win8 supports the UNMAP command by itself. That is why the SCSI filter driver will not be installed, and does not run at all, if any of the Intel RST(e) drivers v11.5.x.x.xxxx or higher is installed onto a Win8 system.

Quote:

I value your comments, imagoon. However, I feel you have not yet fully mastered the fine details of storage backends. Please allow me to elaborate.

So far you have not convinced me that you understand it.

Quote:

ZFS accesses devices just like other software engines would. Software RAID is also not 'FakeRAID'. That term is reserved for onboard RAID which uses option ROM firmware to allow bootstrapping from RAID arrays that are operated by Windows-only drivers. The actual hardware is just an ordinary SATA controller, but it has firmware to allow creation of RAID firmware and make bootstrap possible until the RAID drivers take over. At that point, it becomes driver RAID without any 'acceleration' from the hardware. Because many people have led to believe that onboard RAID is also hardware RAID, the term FakeRAID applies since it is not a true hardware RAID implementation. In fact, it is extremely close to software RAID except for the booting part.

By that same definition, ZFS is fake RAID:
ZFS can run on top of an ordinary SATA controller. It uses software to handle RAID functions.
The entire "Windows-only" part is a misnomer, because some of the manufacturers of these driver-based RAID controllers have released BSD/Linux kernel drivers doing the same thing.
ZFS, because it is software, will have no controller-based acceleration.
ZFS is not a true hardware RAID implementation.

Quote:

The hardware also does not need to support TRIM, it already does because the hardware supports ATA and TRIM is a valid ATA8 command. Hence, TRIM also works on controllers before TRIM even existed, assuming the SSD itself supports it and the operating system as well.

There are only three things required to make TRIM work:
- SSD support (this one is easy)
- operating system support (this one is harder)
- driver chain that can pass TRIM (this one is hardest)

A hardware controller, such as a server-class board, would definitely need to support TRIM. It would be required to do the translation work to get it to the end devices.

Quote:

The latter is where TRIM fails on Windows. What you say about 'assuming there was a proper TRIM supported SCSI driver written for it' cannot be true, simply because SCSI has no ATA TRIM command, it has only UNMAP command amongst others. Thus, if your driver interfaces with SCSI on Windows 7 operating system, it cannot receive TRIM commands because those simply do not go over a SCSI driver-interface.

Windows could properly support TRIM probably, if the operating system would have converted a 'BIO_DELETE' equivalent to UNMAP and then send it to the SCSI-interfaced driver. That would work. But it can't, it only supports TRIM in NTFS driver, not other filesystems like FAT32 or third party filesystems. Windows 7 anyway.

NTFS itself is divided into two layers. The top layer generates delete notifications that are more generic than TRIM / UNMAP. The next layer of the API handles the translation into the hardware calls. Intel already hooks this to build UNMAP commands for their SCSI devices. This is also the same way ReFS on Windows Server 2012 works, except that MS made the hook and call set public. The driver could also just receive the raw TRIM command and translate it into SCSI UNMAP.

Quote:

FreeBSD on the other hand, has the GEOM I/O framework, the most advanced storage backend on the planet. It can create chains of all kinds of driver modules, including various types of RAID, and supports TRIM/UNMAP *and* CFERASE all in one beat. TRIM is not implemented on the filesystem level like on Windows, instead an agnostic 'BIO_DELETE' command is used which only gets translated to ATA TRIM if it reaches an ATA driver at the end of its chain. If it's a SCSI driver it will be translated to SCSI UNMAP instead.

The same thing can be accomplished on the Windows side; Intel is doing it now, and Windows Server 2012 improved on this by making the interface public.
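The late-translation idea described above can be shown with a toy dispatch: the filesystem and RAID layers only see a generic delete request, and the transport driver at the end of the chain picks the wire command. All names here are invented for illustration; this is not FreeBSD's actual GEOM API:

```python
from dataclasses import dataclass

@dataclass
class DeleteRequest:
    lba: int      # starting logical block address
    blocks: int   # number of blocks to release

def translate(request: DeleteRequest, transport: str) -> str:
    """Pick the wire-level erase command based on the final driver type.

    Upper layers (filesystem, RAID) never see this decision; they only
    issue the generic DeleteRequest, as with BIO_DELETE in GEOM.
    """
    if transport == "ata":
        return f"ATA DATA SET MANAGEMENT (TRIM) lba={request.lba} count={request.blocks}"
    if transport == "scsi":
        return f"SCSI UNMAP lba={request.lba} count={request.blocks}"
    if transport == "mmc":
        return f"CF ERASE lba={request.lba} count={request.blocks}"
    raise ValueError(f"unknown transport: {transport}")

req = DeleteRequest(lba=2048, blocks=256)
print(translate(req, "ata"))
print(translate(req, "scsi"))
```

The design point both sides seem to agree on: whoever owns the bottom of the chain owns the translation, and a public hook there means no filesystem needs to hardcode ATA specifics.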

Quote:

In other words, FreeBSD has a superior implementation that allows TRIM in all various forms, while Windows 7 (not sure about 8) only supports a very limited and simply hardcoded method of TRIM that only works on ATA drivers. It doesn't just apply to ZFS, but also to the UFS filesystem, which had TRIM support even sooner. So it's not like ZFS can implement TRIM more easily; in fact, it was much harder to implement TRIM on ZFS than on the usual UFS + software RAID.

This is a very clever remark and it pleases me that you bring up this argument, because it is a very good one. Because you must be thinking: couldn't we get into trouble if say SSD X decides to return garbage data or 1's instead of 0's when we read TRIMed LBA? The answer is both yes and no.

I am not here to please you. Please stop using condescending language like that.

Quote:

Yes, we do get in trouble, because if we TRIM ERASE the spots and we rebuild the RAID5 afterwards, we will not have consistent parity and thus the parity will get overwritten whenever a rebuild occurs. For this reason a RAID5 driver may opt not to send the BIO_DELETE to the disk containing the parity for the current full stripe block. But that isn't necessary; you can TRIM erase the parity as well and just have a partially consistent RAID. It does not matter that the spot has inconsistent parity, because it is not used. Once it starts being used - even if partially - then the parity for the (partial) data block will be consistent. On partial writes the whole block will be inconsistent, but partial parity will match the partially written data.

So the actual answer is: it doesn't matter. If a spot is TRIMed, it is not going to be used. It will not be read. Instead, it will be overwritten with new data once the filesystem wants to use that LBA range again, or only a fraction of it.

You make several very large assumptions here. First, you are assuming that the controllers are not a) doing read verification, b) periodic scrubbing, or c) using other methods to check the stripe. If you are doing brainless XOR, you are correct: it will not cause issues. If you are using an array controller that scrubs on a regular basis, you will see consistency alarms when you run across these garbage-collected sectors. If you are doing any type of read verify or stripe checksums, you will also get alarms.
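Both positions can be seen in a toy XOR model. This is an illustration of the disagreement, not any particular controller's scrub logic:

```python
# Toy XOR-parity stripe: after a TRIM, a drive may return zeroes (or
# garbage) for the trimmed member, so a scrub that re-checks parity flags
# the stripe even though no real data was lost.

def xor_blocks(*blocks: bytes) -> bytes:
    out = bytearray(len(blocks[0]))
    for block in blocks:
        for i, b in enumerate(block):
            out[i] ^= b
    return bytes(out)

data0 = bytes(range(16))            # data chunk on disk 0
data1 = bytes(range(16, 32))        # data chunk on disk 1
parity = xor_blocks(data0, data1)   # RAID5-style parity on disk 2

print(xor_blocks(data0, data1) == parity)  # True: scrub passes before TRIM

# TRIM disk 1's chunk; the SSD now returns zeroes for that LBA range.
data1 = bytes(16)

# Scrub after TRIM: the controller sees a parity mismatch and raises an
# alarm, even though the filesystem will never read the trimmed range.
print(xor_blocks(data0, data1) == parity)  # False
```

A "brainless XOR" driver never re-reads the trimmed range, so nothing fires; a scrubbing controller does re-read it, and that is exactly where the alarms come from.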

Quote:

Concluding, there is no theoretical limitation on RAID5 and TRIM support. If an implementation doesn't support it, that is a limitation of that specific RAID subsystem and/or operating system, not the fault of RAID scheme itself.

Only in the limited picture you have painted, yes.

Quote:

ZFS, of course, is vastly superior to all software or hardware RAID and legacy filesystems. It's a giant leap in protection offered versus the legacy combination of RAID + filesystem on a virtual disk. Especially in relation to bad sectors. I would be happy to explain this as well, but I fear I must decline because this is considered off-topic. However, nothing prevents any of you from creating your own thread. ;-)

I am not going to argue that point because it is off topic. It is, however, very debatable depending on what you want to accomplish.

I would also note that UNMAP will do a lot more than TRIM. It will be used to manage variable-sized volumes, thin-provisioning cleanup, and the like. This is important in the enterprise SSD arena, because SSDs in these huge arrays are rarely dedicated to one task. It can be used to unmap space to allow another LUN to expand into that space, more than to clear the NAND (as much). TRIM mostly matters in the home space; once you get up to business, you are using larger spare areas and more aggressive garbage collection anyway. These same benefits enhance spinning rust as well, so I expect its usage will become more widespread.
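For the curious, the structure carrying those UNMAP extents is a small descriptor list. A sketch of packing it per my reading of SBC-3 (verify field offsets against the spec before relying on this): an 8-byte big-endian header followed by 16-byte block descriptors, each holding a 64-bit LBA and a 32-bit block count.

```python
import struct

def build_unmap_parameter_list(extents):
    """Pack (lba, num_blocks) extents into a SCSI UNMAP parameter list.

    Layout per SBC-3: 8-byte header, then 16-byte block descriptors
    (8-byte LBA, 4-byte block count, 4 reserved bytes), all big-endian.
    """
    descriptors = b"".join(
        struct.pack(">QL4x", lba, num_blocks) for lba, num_blocks in extents
    )
    # UNMAP DATA LENGTH counts the bytes that follow the length field itself.
    header = struct.pack(">HH4x", len(descriptors) + 6, len(descriptors))
    return header + descriptors

payload = build_unmap_parameter_list([(4096, 256), (1_000_000, 2048)])
print(len(payload))  # 8-byte header + 2 * 16-byte descriptors = 40
```

Note how naturally this fits the SAN use case above: one command can release many scattered LBA ranges back to the pool, which ATA TRIM's range list also does, but without the thin-provisioning semantics layered on top.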

The proof is in the pudding: the program produces the correct result in the great majority of cases. I welcome you to try it for yourself, rather than make conclusions from theoretical conjectures.

You mean it produces the result you expect, which doesn't mean it produces the correct result. Even if it did produce the "correct" result, correlation <> causation.

Again, as has been stated by quite a few others here: Without access to the low level information about the drive - which you cannot get from Windows - you have no way of telling if TRIM is working or not.

Sorry, but your tool doesn't work because it CAN'T work.

I know it's difficult to be shown to be incorrect when you have put a lot of effort into something. It's probably best to treat it as a learning experience.

ZFS is not FakeRAID, and neither is software RAID. FakeRAID specifically applies to driver RAID combined with Option ROM firmware, commonly referred to as 'onboard RAID': AMD, Intel, VIA, nVidia, Silicon Image, Marvell, Promise, ASMedia, JMicron, etcetera. Most are onboard chips integrated on the motherboard, but the cheap discrete 'RAID' cards are often just FakeRAID in the sense that Windows-only drivers do the actual trick, not the controller itself.

Quote:

ZFS is not a true hardware RAID implementation.

That is right. In fact, ZFS is not even a RAID implementation. No RAID engine could mimic the way ZFS implements disk multiplexing, aka interleaving. RAID schemes rely on static stripesizes, for example, whereas ZFS utilises dynamic stripesizes to prevent all two-phase writes (read-modify-write).
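The read-modify-write point can be made concrete with a toy model. In a static-stripe RAID5, any write that does not cover a whole aligned stripe must first read old data or old parity before it can recompute parity; the numbers below are made up for illustration:

```python
# Toy model of why static stripes force read-modify-write (RMW).
STRIPE_UNIT = 64 * 1024       # bytes per disk per stripe (illustrative)
DATA_DISKS = 3                # RAID5 over 4 disks -> 3 carry data

FULL_STRIPE = STRIPE_UNIT * DATA_DISKS

def needs_rmw(offset: int, length: int) -> bool:
    """A write avoids RMW only if it is aligned to and covers whole stripes;
    otherwise old data/parity must be read back to recompute parity."""
    return offset % FULL_STRIPE != 0 or length % FULL_STRIPE != 0

print(needs_rmw(0, FULL_STRIPE))  # False: full-stripe write, parity from new data only
print(needs_rmw(0, 4096))         # True: partial write, must read before writing
```

ZFS sidesteps the `True` case by sizing the stripe to the write (variable stripe width), so every write is effectively a full-stripe write.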

Strictly speaking, hardware RAID is also software (firmware). Executing logic in software is generally superior to a hardware implementation because of its many advantages, like flexibility.

Quote:

The entire Windows-only part is a misnomer, because some of the manufacturers of these driver-based RAID controllers have released BSD / Linux kernel drivers doing the same thing.

Can you point me to where? You are aware that both Linux and BSD can simply recognise onboard RAID arrays and implement their own software RAID engine? There is no vendor driver involved. The only exception is Highpoint, which uses a binary blob (proprietary) to implement RAID functionality on the RocketRAID 2000 series.

In fact, virtually all onboard RAID arrays work on Linux and BSD because these systems recognise the RAID metadata on the last sector of each disk. This metadata contains the stripesize, RAID level, disk order, and other variables required to implement the RAID using the software RAID capabilities of BSD itself. This is called PseudoRAID by FreeBSD, and is implemented in the geom_raid module; which, of course, also can pass TRIM requests. ;-)
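The principle is easy to sketch. Real on-disk formats (Intel, Promise, NVIDIA, and others) each differ and are documented in FreeBSD's geom_raid sources; the layout below is entirely invented, just to show the idea of recovering array parameters from the last sector:

```python
import struct

def parse_fake_metadata(sector: bytes) -> dict:
    """Parse an invented 'magic, raid_level, stripe_kib, disk_index' layout
    from a 512-byte metadata sector. Hypothetical format, for illustration."""
    magic, raid_level, stripe_kib, disk_index = struct.unpack_from("<4sBHB", sector, 0)
    if magic != b"RAID":
        raise ValueError("no RAID metadata signature found")
    return {"level": raid_level, "stripe_kib": stripe_kib, "disk_index": disk_index}

# Build a synthetic last sector for one member of a RAID5 array.
sector = bytearray(512)
struct.pack_into("<4sBHB", sector, 0, b"RAID", 5, 64, 0)
print(parse_fake_metadata(bytes(sector)))
```

With the level, stripesize, and disk order recovered this way, the OS can assemble the array with its own software RAID engine, no vendor driver needed, which is exactly the PseudoRAID approach described above.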

Quote:

A hardware RAID controller, such as one on a server-class board, would definitely need to support TRIM: it would have to do the translation work to get the command to the end devices.

A hardware controller actually consists of two parts:
- a SATA/SAS controller like Marvell SAS PHY chip
- a RAID SoC or 'RAID chip' which is usually an ARM core capable of executing firmware that provides RAID functionality

So in your case of a hardware RAID controller, the SAS PHY chip does not need to support TRIM or UNMAP. The RAID SoC with its firmware, however, does need to implement TRIM, because it is part of the driver chain between the operating system and the physical storage device. (edited with strikethrough and bold)

Quote:

The same thing can be accomplished on the Windows side; Intel is doing it now, and Windows Server 2012 improved on this by making the interface public.

I'm afraid I cannot agree. NTFS generates TRIM at the filesystem side, thus at the beginning of the driver chain, while FreeBSD generates TRIM at the storage driver side, thus at the end of the driver chain. So it is implemented in exactly the opposite way. Filesystems should not need to know about ATA commands; that domain is reserved for the storage driver. Windows NTFS is just hardcoded, messy, and unsophisticated; basically, its TRIM is a hack. In FreeBSD, the GEOM framework allows TRIM/UNMAP/CFERASE to flourish in all combinations and possibilities without requiring filesystems or RAID drivers to know about TRIM at all.

Quote:

You make several very large assumptions here. First, you are assuming that the controllers are not a) doing read verification, b) periodic scrubbing, or c) using other methods to check the stripe.

As I stated earlier, even if that happens, the worst that can happen is that the earlier TRIMed spot is rewritten with new parity data. Besides, most generic RAID5 implementations in FreeBSD and Linux behave without read verification or periodic scrubbing out of the box. The read verification is generally not an issue, because it will only happen for written data, not TRIMed LBA space.

Concluding, there is no theoretical limitation to implementing TRIM on RAID in whatever combination you want. BSD does it, and Linux probably as well, although I prefer to stick to what I know best, and that is BSD. It has been demonstrated in both legacy filesystems and RAID schemes and the filesystem hybrid: ZFS. In any fair court, this ought to be accepted as evidence. ;-)

PS: There is no need to feel offended. You will only offend yourself if you do not allow yourself the benefit of the wealth of knowledge and insights that other people bring to the table. In fact, that is the very essence of a forum, is it not?

Cheers,
- sub.mesa

Last edited by sub.mesa; 02-20-2013 at 05:24 PM.
Reason: edited section about RAID SOC implementing TRIM, rather than RAID

Ok, I read that, and I think we are going in circles or misunderstanding each other.

I think you are coming from the Linux direction, whereas I am coming from the NetApp / Compellent / VNX direction. I also think we are nitpicking. Those devices do more for data resilience and are more unaware of what is actually stored on them. These devices do perform disk checks during idle times and the like, so they are more likely to have issues with an array of SSDs clearing or returning garbage at a whim. The controller would need to maintain some sort of table so it would know that these errors are ok and should be ignored. The issue with having them recalculate parity is that now you have written to the wiped pages and basically undone what TRIM was trying to accomplish. Also, how does the array know what to recalculate? The failure would look very much like a multi-disk failure, etc. The whole point of UNMAP is to help fix that issue. You can tell that T10 has learned from the... 'mess' that TRIM is at the moment. UNMAP will be allowed to truly unmap LBA segments and return them to the SAN pool. That way the SAN can manage zeroing space / letting SSD devices do garbage collection.

Saying 'NTFS' as a whole is like saying 'ZFS' as a whole: there are far more parts and layers to it than that.

NTFS at its top layer builds the file-delete data and passes it down to its lower layers, which generate the actual TRIM command that is passed to ATA. I get that BSD might pass that top-level delete data farther down toward the disk. Fine. The part I was making is that NTFS can and will pass raw data to other parts if asked. One of the big things I know MS has to deal with is compatibility, and deciding to just change out the interface for the storage drivers would need vendor buy-in, so I get why they do it the way they do. However, you should go read up on ReFS to get a decent understanding of how the NTFS layers actually work. ReFS is MS's answer to ZFS. Other than ZFS having more polish, they accomplish nearly the same thing. You will see in the tech data that they removed the NTFS bottom layers but left the NTFS top layer there to handle compatibility and general access. At that point ReFS is handling the clearing requests such as TRIM, UNMAP, etc. My point was that MS typically does build with the intent to expand, and as you can see, Intel even gets data from NTFS for clears before it gets down to the bottom of the pile.

This really isn't all that different from ZFS, except that ZFS is closer to ReFS than to NTFS. It is a merging of the disk systems with the filesystems. NTFS was much more independent in that regard. Because of that, NTFS needed to tell the disks what to clear, and TRIM was the specified way, so that is what MS did. I would agree that it is very much a bolt-on and could have been done better, but you can see they cleaned it up in Win8/2012. I also suspect they did it this way because they would not be able to go back and tell the storage companies to completely redesign their drivers to handle it.

I will also state that ZFS does TRIM in RAID because ZFS wants direct access to the disks via basic ATA or SCSI. This is part of that merger of the filesystem and disk systems. ZFS's implementation of RAID is more like Microsoft software RAID than hardware RAID. This, of course, has its major advantages and major disadvantages.

Promise was going to be my example of the Linux version of those RAID chips.