Pure Storage

So we've got all the small stuff moved . . . here are some stats from the moves and backups overnight.

We were basically at the mercy of our previous storage for Storage vMotion throughput. And now we are at the mercy of our backup storage being slow, but that will be rectified as soon as our previous storage is empty. We'll be converting our NetApp to a backup target . . . 2x 15K shelves as the target, 1x SATA shelf for archive/long-term, and 1x SSD shelf for CommVault's databases (SQL, dedupe, etc.). Should be blazing at that point.

Read latencies are kind of crap, better than the 7200RPM drives would do by themselves but not really any better than 15k drives. I'm also surprised that write latencies spike up so much under what should be very low load for the SSD and ram cache.

Umm, so .5ms read latency at 15k IOPS is bad? The read latency hasn't exceeded 1ms for any workload yet. Write latency hasn't gone over 5ms except for a spike once at 1AM to 8ms . . . average is less than 2.5ms with 7-8k IOPS.

15K can't really touch this without a massive number of disks.

snakethejake wrote:

Write latency hasn't gone over 5ms except for a spike once at 1AM to 8ms

Did/do you plan to poke Pure over that spike? All the other graph lines have a reasonable amount of correlation so it sticks out a little bit. I'd be vaguely curious if it's just a measurement anomaly or if you somehow ran into a checkpoint/GC routine on the array at the wrong time with your write load.

The write latency is OK. Higher than a more traditional array that isn't exhausting its NVRAM, but it'll probably scale more elegantly under IO pressure. The read latencies, however, are *very* nice.

I'm keeping an eye on the write latency spikes. That's not the first one to happen, but they're so infrequent that I'm just waiting to see if anything comes of them. Haven't seen any ill effects from them yet, but if they continue with any regularity I'll report them just to see what the cause is.

The read latency is so consistent whether it's large or small block IO. The write latency only seems to come up when there are a lot of large block writes, which obviously the storage vMotions are causing. Can't wait to get the data warehouse on it next week to see how our sync jobs are handled, as well as read response for reports. Should be fun for our developers to pound on.

OK, so we've run into some issues with our iSCSI setup that we are working through with Pure and VMware. The write latency spikes seem to be a hiccup in the iSCSI communication, so we're working with both vendors to track down the issue. Pure Storage support and engineering have been great in helping us work on the issue and have been in constant contact, literally day and night.

The issue they are seeing, as described to me, is that occasionally a write request comes in, but the data for the request gets delayed before reaching the array for some reason. The delays are causing the latency spikes and in some cases even causing the array to detect a failure and attempt to rectify it by failing over the controllers. It had gotten progressively worse as time went on, so we ended up migrating our servers back to our previous storage until we get everything sorted out.

Hope to have the issue resolved next week and begin testing again.

As far as validation of the performance results, we use Quest Spotlight on SQL Enterprise to monitor our Windows Servers and SQL Servers as well as vCenter. Once we get things moving back again I can pull some stats to show from the host and server side for comparison to the array's charts.
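
If anyone wants to do the same host-side comparison, here's a rough sketch (Python) of summarizing an exported vCenter latency CSV so the numbers can be lined up against the array charts. The file name and column names are just placeholders for whatever your export actually contains:

```python
# Rough sketch: summarize read/write latency from an exported vCenter performance
# CSV so the numbers can be compared against the array's own charts.
# The file name and column names are placeholders -- adjust to your export.
import csv
import statistics

def summarize(path):
    reads, writes = [], []
    with open(path, newline="") as f:
        for row in csv.DictReader(f):
            reads.append(float(row["ReadLatencyMs"]))
            writes.append(float(row["WriteLatencyMs"]))
    for name, vals in (("read", reads), ("write", writes)):
        vals.sort()
        p95 = vals[int(0.95 * (len(vals) - 1))]
        print(f"{name}: avg={statistics.mean(vals):.2f}ms  "
              f"p95={p95:.2f}ms  max={vals[-1]:.2f}ms")

if __name__ == "__main__":
    summarize("vcenter_datastore_latency.csv")
```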

I met with some of the guys from Pure a couple of weeks back. Really amazing product. Really glad pricing has nothing to do with what I thought it did on the previous page. They're super-competitive with the "big guys" and provide way better performance. I was really blown away by the product. Seriously.

So we've migrated most everything back to the Pure. Worked out the iSCSI and latency issues with Pure support and everything is running smooth again. Here's another quick look at the performance numbers we're seeing on the storage side:

The issue was resolved in a recent code update to the controllers. We had some write latency spikes that occasionally caused a controller failover, but all the issues have been resolved. We have migrated about 150 servers and 50 virtual desktops to it in a couple of days and have been running full tilt on it for about 2 weeks now. They will have you on the latest software release, so you won't see any of the issues we did.

Read and write latencies on the array don't exceed 2ms, even with a 20k IOPS backup load pulling from it each night. On the VMware side we are seeing similar numbers through our vCenter performance stats.

We just got off the phone with them and I am really impressed. My boss isn't so hot on how small/new of a company they are, but we will likely do a PoC later this year, and maybe get a shelf + HA controllers around Q1 of next year.

Was doing some theoretical mind flexing with some co-workers yesterday about VDI. And somehow, when I mentioned Pure Storage using SSD-only storage with dedup and compression, they were impressed but apprehensive about using something like this in actual production environments. Guess most of today's storage admins don't have good experience with dedup and compression in production. They're also skeptical about MLC SSD endurance for enterprise use.

They did like the low latency and performance, though. Maybe in a few years, when the product is (or at least feels) more proven...

Are there any updates on the OP's Pure system? I'd like to investigate something like this for my primary customer based on what I've read so far but am looking for any revised opinions now that we are past the first 30 days or so.

Yep, we've got our production workload on it now and will be moving our last database this Tuesday morning. Haven't had any more issues with it at all. Some interesting observations with regards to thin provisioning and inline data reduction:

* Under heavy write loads (300MB/s+) the data reduction will actually lag behind: you'll see usable capacity drop while the writes are happening, then a few minutes later it will jump back up once the reduction catches up.

* When you go past 80% space utilization, garbage collection is given a higher priority in the I/O stream as it begins to prune stale data more rapidly to make space available for the writes coming in. This can cause write latency to rise if you are doing continuous large block writes. Letting up on the writes for a time will allow the array to catch up and reduce the data back down.

* I've found that our older servers that have had a lot of writes and deletes on their volumes may not dedupe as well, since data is never really removed from a Windows volume. I've begun a process of zeroing out volumes on larger VMs to free up space (a rough sketch of the idea is below). It forces you to be much more aware of what's going on in the volumes. Eagerly waiting for VMware to integrate UNMAP functions into VMware Tools to allow background UNMAP of deleted data.
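
For reference, the zeroing is basically the old "fill the free space with zeros, then delete the file" trick (sdelete -z does the same thing on Windows). A minimal sketch of the idea, run inside the guest -- the target path is just an example, and be aware it briefly fills the volume:

```python
# Minimal sketch: fill a volume's free space with zeros, then delete the file,
# so the array's inline dedup/compression can collapse the dead blocks.
# Run inside the guest; the target path below is just an example.
# Note: this briefly fills the volume almost completely, so don't run it
# against anything that can't tolerate that for a few minutes.
import os

def zero_free_space(target_dir, chunk_mb=64):
    path = os.path.join(target_dir, "zerofill.tmp")
    chunk = b"\x00" * (chunk_mb * 1024 * 1024)
    try:
        with open(path, "wb") as f:
            while True:
                f.write(chunk)          # keep writing zeros until the volume fills
                f.flush()
                os.fsync(f.fileno())
    except OSError:
        pass                            # disk full -- expected, we're done
    finally:
        if os.path.exists(path):
            os.remove(path)             # drop the file; the zeroed blocks remain

if __name__ == "__main__":
    zero_free_space("D:\\")             # example volume
```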

snakethejake wrote:

* When you go past 80% space utilization, garbage collection is given a higher priority in the I/O stream as it begins to prune stale data more rapidly to make space available for the writes coming in. This can cause write latency to rise if you are doing continuous large block writes. Letting up on the writes for a time will allow the array to catch up and reduce the data back down.

For what it's worth, the sales guys said that the current I/O limits are CPU bound on the head ends, and the storage is capable of much more.

Not that surprising I guess, when you look at the inline dedup and compression.

Also might explain this:

snakethejake wrote:

* When you go past 80% space utilization, garbage collection is given a higher priority in the I/O stream as it begins to prune stale data more rapidly to make space available for the writes coming in. This can cause write latency to rise if you are doing continuous large block writes. Letting up on the writes for a time will allow the array to catch up and reduce the data back down.

It got progressively higher the further we pushed it past 80%. I think at one point we were at 90% and still writing about 300MB/s, and the write latency had gotten up above 10ms. Not surprisingly, read latency was still relatively normal.

* When you go past 80% space utilization, garbage collection is given a higher priority in the I/O stream as it begins to prune stale data more rapidly to make space available for the writes coming in. This can cause write latency to rise if you are doing continuous large block writes. Letting up on the writes for a time will allow the array to catch up and reduce the data back down.

Our primary database is now on the storage. As we expected, it barely put a dent in it. About 1TB worth of databases written and around 300GB of space utilization, and it's still slowly moving down. I'll post some additional charts after we let it run a while. It can hammer the storage with its nightly processes, so we'll see how it does.
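
Quick back-of-the-envelope on those numbers, for anyone following along:

```python
# Back-of-the-envelope data reduction from the figures above (approximate).
written_gb = 1024   # ~1TB of database data written
stored_gb = 300     # ~300GB actually consumed on the array
print(f"reduction ~ {written_gb / stored_gb:.1f}:1")   # ~3.4:1, and still dropping
```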

Depends on the array architecture, but it's not a general assumption I would use.

If the array uses a write pattern similar to ONTAP, there is a free space crossover point where your write workload may not be able to use full-stripe writes anymore, which then has an overall impact on performance. With thin provisioned arrays you could end up with a "perfect storm" where a number of hot workloads land on the same set of spindles, because the "smart" allocation algorithms couldn't find the free space anywhere else.

Both scenarios have mitigation methods using various reallocation strategies (realloc in ONTAP, System Tuner on 3PAR, etc.), but their effectiveness is almost entirely dependent on your workload.

Actually, considering the semantics of flash, I would consider slowing down as it fills up a safe assumption. Certainly the activity level of the array and its garbage collection algorithm will play a big part. That is, unless there is sufficient spare area / scratch space that allows the array to stay ahead of the changes.

It depends on the vendor. Some use SLC flash with custom firmware on the drives themselves. Pure uses commodity MLC drives with stock firmware, and the magic all happens at the (pure) controller level.

It would also be very interesting if they could publish some figures about SSD endurance and some reason as to why their drives die. And maybe even combine that with some numbers on MLC endurance and how it ACTUALLY responds to dying NAND.

It depends on the vendor. Some use SLC flash with custom firmware on the drives themselves. Pure uses commodity MLC drives with stock firmware, and the magic all happens at the (pure) controller level.

Neither firmware nor SLC has solved the limitations of using NAND memory (and we're only talking about the performance limitations, not endurance). Blocks still need to be erased before they are written to. SLC just makes everything happen faster and lasts longer. Almost every vendor is moving away from SLC. I can't think of any that aren't, actually.

I suppose your point is that SLC holds off performance degradation longer (by having lower latencies and higher IOPS).

Actually, considering the semantics of flash, I would consider slowing down as it fills up a safe assumption. Certainly the activity level of the array and its garbage collection algorithm will play a big part. That is unless there is sufficient spare area / scratch space that allows the array to stay ahead of the changes.

Unless it's a commonality among vendors to run drives without any spare area whatsoever, which I highly doubt, the performance degradation would have far more to do with your write rate and patterns than with the free space on the array. And the same degradation could occur with the array only 10% full if you somehow manage to exhaust the supply of clean blocks before the GC mechanisms can kick in and consolidate pages into single blocks and issue erases.

So no, I don't think it's a safe assumption with flash arrays either.
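
To make the clean-block point a little more concrete, here's a toy model (all numbers invented, nothing vendor-specific): host writes consume clean pages, GC frees blocks by relocating whatever valid pages they still hold and then erasing, and if the write rate outruns GC you stall on clean pages regardless of how full the array looks logically.

```python
# Toy model of flash garbage collection -- purely illustrative, numbers invented.
# Host writes consume clean (erased) pages; GC reclaims blocks by relocating the
# still-valid pages they hold and erasing them. If writes outrun GC, clean pages
# run out no matter how "empty" the array looks at the logical level.
import random

PAGES_PER_BLOCK = 128
BLOCKS = 1000

def simulate(write_pages_per_tick, gc_blocks_per_tick, ticks=1000):
    clean = BLOCKS * PAGES_PER_BLOCK        # clean pages available for writes
    stalls = 0
    for _ in range(ticks):
        # Host writes: NAND can't overwrite in place, so each write needs a clean page.
        if clean < write_pages_per_tick:
            stalls += 1                     # writes must wait on GC -> latency spike
        clean -= min(clean, write_pages_per_tick)
        # GC: move the still-valid pages out of victim blocks, then erase them.
        for _ in range(gc_blocks_per_tick):
            still_valid = random.randint(0, PAGES_PER_BLOCK // 2)
            if clean >= still_valid:        # need clean pages to relocate into
                clean += PAGES_PER_BLOCK - still_valid
    return stalls

print("gentle writes, stall ticks:", simulate(write_pages_per_tick=200, gc_blocks_per_tick=3))
print("heavy writes,  stall ticks:", simulate(write_pages_per_tick=800, gc_blocks_per_tick=3))
```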

Pure's drives have spare area (Samsung disks, they used to use 830s I think) and according to Jake's observations he is seeing a performance impact. Large rusty-disk arrays perform largely like single disks and have similar limitations. Or more clearly: given certain conditions, the underlying storage medium's characteristics can be (and usually are) exposed. *Broadly* speaking, given sufficient load, big arrays feel the impact of rotational latency and have difficulty with random writes. Layered technology on top of the storage medium (like ONTAP) obfuscates those characteristics, but only for so long, which is why we see a fall-off when we approach the limitations of the array. It stands to reason that most NAND arrays will have a similar relationship to single SSDs.

Only with more reviews and full testing will we know, but Pure already shows an impact. That impact seems to be in line with the nature of NAND (and the obvious ways to deal with it, like prioritizing GC), so for anyone else using that medium it seems safe, at least to me, to assume a similar impact.

Everyone can certainly be correct, but until I see benchmarks with similar conditions that stress the nature of NAND or hear of some technology (with details) that can sufficiently deal with them, I am going to stick to having doubts.

Either way flash is a nice step up from HDD arrays (I'd kill for a Pure array, or the budget to get one :-)). I am just not going to expect perfection.

The question was, does an array slow down as it gets full. The answer is a distinct "Maybe". A more accurate question is, "Under write workload X, does the array slow down?", to which the answer is likely a binary "Yes"/"No".

The "Yes" may or may not be idempotent of array space utilisation. It depends quite heavily on the design of the array, where the starvation might occur(such as weak controllers, rather than backend disks), the array allocation methods, and the nature of the workload. Overall space utilisation might have nothing to do with it.

When we talked with support while we were seeing the higher latency above 80% utilization, they specifically stated that the array was doing garbage collection in order to release space for the incoming writes we were doing. The array begins prioritizing the garbage collection when running past 80%, since you are more likely to run into a situation where blocks may not be available without it. The controllers then allow the garbage collection higher priority in the I/O scheduler, which causes the latency increase.

This would seem to be inevitable in any array that spreads data across all disks like the Pure does. It's purposely using all of the disks in order to balance writes across the cells and reduce wear.
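
A crude way to picture the prioritization support described -- the weighting below is completely made up, it's just meant to show the shape of "GC gets a bigger slice of the I/O stream the further past the threshold you go":

```python
# Illustrative only: a made-up weighting for how GC might get a growing share of
# the I/O scheduler once utilization passes a threshold. Not Pure's actual
# algorithm, just the general shape of what support described.
GC_THRESHOLD = 0.80   # utilization where GC starts getting priority
GC_MAX_SHARE = 0.50   # cap on how much of the I/O stream GC can take
IDLE_SHARE = 0.05     # background housekeeping below the threshold

def gc_share(utilization):
    if utilization <= GC_THRESHOLD:
        return IDLE_SHARE
    ramp = (utilization - GC_THRESHOLD) / (1.0 - GC_THRESHOLD)  # 0.0 -> 1.0
    return min(GC_MAX_SHARE, IDLE_SHARE + ramp * (GC_MAX_SHARE - IDLE_SHARE))

for u in (0.50, 0.80, 0.85, 0.90, 0.95):
    print(f"{u:.0%} full -> GC gets ~{gc_share(u):.0%} of the I/O stream")
```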

Jake

Sorry for the thread resurrection, but it looks like I have the opportunity to get additional storage. A few months later, any complaints with Pure? I'm just curious if time has changed any opinions.

No change. The array is working flawlessly. We're eagerly awaiting the next software update to allow us to leverage snapshots and xcopy to begin cloning VMs for our test lab. It should allow instant clone creation through hardware offloading of the copy process to the array, which snapshots the VM and creates the new one based on pointers. This way it's still fully deduped and transparent to the VMware cloning process.
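
The way I picture the pointer-based clone working (toy illustration only, not their actual metadata layout): the clone gets a copy of the block map, not the blocks, so creation is instant and everything stays deduped until a block actually diverges.

```python
# Toy illustration of pointer-based cloning: a clone copies only the map of
# pointers to shared blocks, not the data itself, so it is created instantly
# and stays fully deduped until blocks diverge on write (copy-on-write).
# Not Pure's actual implementation -- just the idea.

class Volume:
    def __init__(self, block_map):
        # logical block number -> key of a block in the shared block store
        self.block_map = dict(block_map)

    def clone(self):
        # "instant" clone: duplicate the (small) pointer map, not the data
        return Volume(self.block_map)

    def write(self, lbn, new_block_key):
        # copy-on-write: only the overwritten block gets a new entry
        self.block_map[lbn] = new_block_key

base = Volume({0: "blk-A", 1: "blk-B", 2: "blk-C"})
test = base.clone()          # no data moved; 100% shared with base
test.write(1, "blk-D")       # only block 1 diverges
print(base.block_map)        # {0: 'blk-A', 1: 'blk-B', 2: 'blk-C'}
print(test.block_map)        # {0: 'blk-A', 1: 'blk-D', 2: 'blk-C'}
```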

snakethejake wrote:

The write latency spikes seem to be a hiccup in the iSCSI communication

A few weeks ago we had our Nimble rep come in and set up one of their units alongside our 6-month-old EMC VNX units, and after running IOPS comparisons the Nimble was utterly wrecking the EMC units, especially in write latency. The EMC was also running higher-end disks as well. After he had his usual fun making our EMC techs red in the face, he admitted to me that the problem was just the way the EMCs were handling iSCSI.