Pure Storage

Anybody use them? Looking for feedback on any issues with the system, support, etc. We've done the POC and it's got all the performance they say it does, and it is bulletproof from my testing (cascading failures from drives, to NVRAM, to SAS, to InfiniBand, to fiber, then an entire controller).

Looking for opinions, preferably from an owner of one, but the usual second-hand input is welcome as well. We are looking to replace our production storage with an FA-320 with two shelves, roughly 20k IOPS and 12TB of data.

Are you talking about Pure Storage, the all-flash array? The IO profile you are quoting at 20k IOPS is pretty small potatoes; nearly every major storage player provides some form of mid-tier platform (EMC VNX, HP 7400, NetApp FAS3200, etc.) that could run that particular workload, and probably at half the cost of an all solid state array. IIRC Pure doesn't even have array-based replication currently either.

Don't get me wrong, I really dig what the flash array players are doing, but it's only a matter of time before the big boys simply crush them or acquire them.

I haven't actually used it, but based on what I know, there are some things that I like and dislike.

Like:
- Built for SSD: wear leveling, etc. all built in.
- Deduplication and compression are expected to be used while maintaining awesome performance.
- Can do lots and lots of IOPS at sub-ms latency.
- No crazy SSD price gouging, unlike major SAN vendors (where SSD = $$$).
- Smart guys and cool technology.

Dislike:
- Their marketing department must've taken cues from the Avamar playbook:
  - Capacity Licensing - gotta pay to "license" the capacity even if you physically have it.
  - Sold on "expected capacity" after dedupe/compression.
- Can't have it as your only platform:
  - No slow drives (to host file services, for example).
  - No NFS/CIFS.
  - No replication and other "expected" SAN features.

Also, as gchapman said, it will probably be acquired or stomped on by a large manufacturer if the pure-SSD storage platform takes off.

Well, we've decided to give them a go. We are coming from a FAS3240 and we are outgrowing it, as our latency is starting to slip during peak times. We were slated to add additional 15k shelves and software (SnapVault, FlexClone, etc.) and a FAS2240 to replicate to, in order to relieve some pressure. But having done the POC, we talked to Pure and they are giving us a very competitive deal on a 2 controller + 2 shelf unit.

Not sure where you got the capacity licensing thing; you get what you buy hardware-wise. I believe they have one system at their bottom end that you can get that is half "populated". Basically it's a full shelf but it only presents half the storage, at a discount. Why would you want slow drives for your files? If you can have it on flash, assuming space is available, why not? We will basically be hosting everything on it: databases, app servers, file servers, and a 50-seat starter VDI deployment.

We'll be in it for 3 years with support so we can re-evaluate at that point and see where the company is at.

AngelZero wrote:

Quote:

Capacity Licensing - gotta pay to "license" the capacity even if you physically have it.

Like I said, this is not the case. You do not pay for capacity as a license. You get to use everything that you buy. The exception is on a single bottom end system where they license half the shelf at a lower cost in order to lower the barrier of entry. Hardly a capacity licensing scheme since it is only on one entry level system.

My understanding was that you can buy in full-shelf (5.5TB) or half-shelf capacities, but the physical shelf is always full.

Anyhow, so 2 shelves = 11TB raw. Granted, if you get their advertised average data reduction, that's something like 60TB effective. If you don't, it's 11TB raw. If you don't get the advertised data reduction and have a lot of data, you need to buy more storage, and the $/TB is pretty high. Still, I think pure SSD storage is cool, and I was just stating my opinion about some of the downsides, not trying to dissuade you. If it fits your environment, great!
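
In rough numbers (a quick Python sketch just to make the arithmetic explicit; the 5.5:1 is only the ratio implied by the "like 60TB" claim, not a promise):

Code:

def effective_tb(raw_tb, reduction_ratio):
    # effective (logical) capacity = raw capacity x data reduction ratio
    return raw_tb * reduction_ratio

raw = 11.0  # two shelves
for ratio in (1.0, 3.0, 5.5):
    print(f"{ratio}:1 -> {effective_tb(raw, ratio):.1f} TB effective")
# 1.0:1 -> 11.0 TB, 3.0:1 -> 33.0 TB, 5.5:1 -> 60.5 TB (the "like 60TB" case)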

Liked it more than RAMSAN, but it does something a little different. Pure is less about raw IOPS and more about features. I like Pure quite a bit in VDI or VMware situations. I like it more than Nimble as well.

If you need max IOPS and minimum latency, Pure is not your play. If you need NAS protocols it is also currently not your play. But if you need block devices and IOPS, can tolerate a little latency, and want some cool features like replication and dedupe, it is pretty solid.

Their data reduction rates remind me of Data Domain - you don't always see the advertised rate. I'd plan on something a little higher than the usable minimum but not a lot higher.

I wonder how long they'll be around - I imagine they'll be acquired or marginalized by the major manufacturers coming out with a similar product.

We saw 3.8:1 reduction on a 700GB SQL Server database, and we were easily seeing 5:1 or more on VM application servers. VDI is even more ridiculous, upwards of 10-20:1 or more. So the dedupe rates are there. We should have the unit onsite in the next couple weeks and I'll report back as we start adding our production workloads to it. If anyone wants to see any specific IO tests, let me know, as I will be baselining the system before we begin migration.

10GbE iSCSI with two ports per controller to start. Comes with four per controller but I doubt we'll ever need that much throughput. We tested the FC version in our POC and it performed very well, even on 4Gb FC.

Jake

More ports doesn't just get you more bandwidth, it also gets you more queues. We don't come anywhere near touching the 4Gbps bandwidth per port on our EVA, but we frequently run into issues where, even with round-robin load balancing, we're pushing the queue depth quite high despite having 8 ports available.

Is this the queue depth measured on the storage-side HBAs, at the switch, or on the HBAs on the server side? We are now looking at additional switching infrastructure as well and will have the ports to plug all 8 in, but I'm wondering if it would be necessary to expand on the server side as well, to 4 ports instead of just 2?

That's measured at the host-side ports on the array. I've never seen a major problem on the host side with two ports, but if you had a big enough VM it might be a concern; in our case it's aggregate load. It's really strange to see very high guest disk latency, then look at the array and see physical disk latency at 5-8ms.
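
To put rough numbers on it (a little Python sketch using Little's Law; the IOPS and latency figures are hypothetical examples, not measurements from our array):

Code:

# Little's Law: average IOs in flight = IOPS x average latency.
# The numbers below are hypothetical, not measurements from our EVA.
iops = 20_000       # roughly the workload size quoted earlier in the thread
latency_s = 0.005   # 5 ms average service time

in_flight = iops * latency_s
print(f"~{in_flight:.0f} IOs outstanding on average")

for paths in (2, 4, 8):
    # round-robin multipathing spreads those outstanding IOs across active paths
    print(f"{paths} paths -> ~{in_flight / paths:.0f} outstanding per path")
# with a typical host-side per-LUN queue depth of 32-64, two paths can already be tight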

In round numbers, what does a single- or two-tray system of this stuff cost, and when they say 5.5TB per tray, is that a gross marketing-type number (probably) or a net usable amount?

I'm just wondering if this is $5k/TB usable or $50k...

I can think of a couple of specific applications where under 10TB of really fast shared storage might have some value that I could turn into a project budget, but the numbers would need to look right.

Their cost structure is based around their ability to dedupe and compress the data. A single shelf nets roughly 5TB usable, which with a 5:1 data reduction (very attainable depending on the dataset) puts you at 25TB effective. Obviously there's some variation depending on the type of data. Comparing their cost based on the usable capacity without data reduction would not yield a very cost-effective system, but that's not really what they are targeting anyway. List pricing, I believe, is around $7 per GB usable after data reduction (apply appropriate negotiation techniques and this can come down, as with any vendor).

What kind of datasets are you looking at? Virtualization (5-10:1 reduction) and VDI (5-30:1 reduction) is where you will see the most cost-effectiveness. Databases as well, when compared to the equivalent 15K spindles needed to get the IOPS and latency you get with this system.
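
If it helps, here's a quick Python sketch of how I think about cost per effective GB (the system price here is a made-up placeholder, not a quote or Pure's list pricing):

Code:

# $/effective-GB = system price / (usable capacity x expected data reduction)
def dollars_per_effective_gb(system_price, usable_tb, reduction):
    return system_price / (usable_tb * 1000 * reduction)

price = 150_000.0  # placeholder figure only, not a quote
usable_tb = 5.0    # roughly one shelf usable, per above
for reduction in (1.0, 3.8, 5.0, 10.0):
    print(f"{reduction}:1 -> ${dollars_per_effective_gb(price, usable_tb, reduction):.2f}/GB effective")
# 1.0:1 -> $30.00, 3.8:1 -> $7.89, 5.0:1 -> $6.00, 10.0:1 -> $3.00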

I'm thinking more of DBs, Oracle and 'other'; and I'm not thinking of VDI or other very highly 'de-dupable' applications.

One specific would probably be our data-warehouse; it isn't really that massive, but it gets pounded with poorly-structured ad-hocs. You know RDBs: "Ask a stupid question, it will get back to you in a day or two."

Our data warehouse is one of the primary systems that will be benefiting. We saw a 3.8:1 data reduction on our production database of 700GB. This being one of the databases we use in our data warehouse, it will essentially be 100% deduped as it will be on the same storage. We have a couple dozen databases that will be copied over in the same way but still reside on the same storage. When I had the POC system in house, we ran a few test queries from the dataset and saw massive savings in query times, from minutes to seconds. The reporting and analytics teams will be shocked once we get things moved over.

I saw them at a storage conference in SF 2 months ago, and they were quoting 4 to 5 dollars per gig. I thought they would be a good solution IF I needed the IOPS, but as others pointed out, a bit narrow in their focus.

What is more interesting are your dedupe ratio numbers. How do they compare to NetApp? Particularly the VDI numbers?

We use Pure. We have two controllers and two shelves for 11TB total. We connect over 8Gb FC. We're very satisfied with performance. Read latency is amazing and write latency is much better than on our old HDS AMS. Bandwidth has been over 1GB/s. (Yes, big B as in bytes.) Most of the space is used for some big MSSQL boxes, but the structure of our data doesn't allow for much dedupe. However, we have about 50 regular Windows VMs in a couple of datastores that dedupe between 10:1 and 15:1.

Their UI is stupid easy to use, and everything but LUN sizes and host groups is managed for you. That was a nice change from HDS and their ever-evolving DP/RG best-practice nonsense. Alerting/monitoring could be better, but it's something they are working on.

All of the people from Pure who I've worked with have been very cool. From sales to engineering to support, their people are great.

Are any of you concerned that, for redundancy at least, it's a midrange/workgroup architecture? Active/active controllers? It looks like a really cool array, but it's also the type of array you'd be putting really important stuff on, and I'd want to see better than that.

What are firmware upgrades like?

How many systems are really better than active/active? VMAX, VSP, 3Par, SVC - I believe that's it. Even when you do have better than active/active, you still only have 2 paths to a given disk if it's a SAS backend (which everyone is going to), so if you lose a controller pair you're going to have issues unless you have massive spare performance (which would mean your $/GB is laughably bad).

*edit* And according to their founder and CTO, the Pure system is designed for 2N meshing, which means they might get to the same place as some of the other players if there's enough demand.

Yes those, and XIV as well (which doesn't have the 2-paths-to-a-disk problem). Isilon too, I suppose, if we're reaching beyond FC. Anyway, my point is that if you need the speed Pure offers, it's probably something that would normally be on one of those array types mentioned - something that important. So what was the thinking of the people who went with Pure: extra speed was worth the tradeoff in redundancy? Wasn't a concern in the first place? I'm just curious, because I see these new flash vendors who are obviously targeting the highest performance scenarios, but they are all active/active twin-controller designs as far as I have seen (so far).

afidel wrote:

And according to their founder and CTO the Pure system is designed for 2N meshing which means they might get to the same place as some of the other players if there's enough demand.

Neat.

Still curious on the firmware updates too, if it can upgrade the firmware without losing any redundancy during the process.

The number of companies that need beyond six 9's of storage availability (roughly what you can get with an active/active pair, if you don't count firmware updates) is in the low hundreds; the number that can use a fast, low-cost array is probably in the tens of thousands.
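
For reference, here's what those nines work out to in plain arithmetic (a trivial Python sketch, nothing product-specific):

Code:

SECONDS_PER_YEAR = 365 * 24 * 3600

for label, availability in [("three nines", 0.999), ("four nines", 0.9999),
                            ("five nines", 0.99999), ("six nines", 0.999999)]:
    downtime_min = (1 - availability) * SECONDS_PER_YEAR / 60
    print(f"{label}: ~{downtime_min:.1f} minutes of downtime per year")
# three nines ~525.6, four nines ~52.6, five nines ~5.3, six nines ~0.5
# (about 32 seconds) - so a couple of minutes on one controller during a
# firmware update already blows a six-nines budget for the year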

I take that to mean that you're effectively down to one controller during a firmware update, while it upgrades the other one and has it offline?

Yes, the firmware upgrade takes one controller offline at a time. The process is pretty quick, only a couple mins.

The primary concern there is single-array environments, where the whole "being on a single controller for a few minutes" thing still freaks a lot of people out. I'm really hoping for some faster-boot tech to hit the controller space in the next couple of years, so updates flash and reload in seconds instead of minutes, mitigating the stress factor.

I suppose so. Around here the SAN is viewed like a utility - it's always on with all paths up, pretty much no exception. Getting that reliability in the past was expensive but these days not so much.

It'll be interesting to see this technology mature; it's one of the more inventive arrays I've seen lately.

AngelZero - the technology is already here; it's just really tricky to implement if it's not a design goal of the firmware from day one. Just ask Isilon; they are still working on it.

Pure controllers are active/passive, not A/A. All IO is processed by a single controller at all times. IO sent to the passive controller (via mpath) is forwarded over to the active one. Their NDU guarantees performance because it's always a single controller doing all the work.

The array was racked and stacked yesterday. Waiting on a pair of Cisco Nexus 5548UP's for the top of rack connectivity. Should be here in a couple of days along with fiber cables. Should have everything setup and ready for baselining and failover testing middle of next week. I'll post the results as I get them.

This flash array looks like something we could really use; however, we have zero Fibre Channel. We're 100% 10-gig iSCSI. Anyone know what impact (if any) that has on array performance, latency, or IO? I've always heard FC typically has lower latency than iSCSI... but I don't really know, as I've never touched FC.

FC does provide lower latency and generally better performance in several respects vs generalized iSCSI, though it will depend on how your network is configured and whether you have a dedicated set of pipes for your storage traffic or are sharing storage and regular traffic over the same links. Keep in mind that Ethernet is best-effort delivery, therefore unpredictable, and can experience delays. If you want to take iSCSI to the next step, investigate iSCSI over DCB, which utilizes PFC and DCBX. PFC, in effect, creates prioritized lanes in a single connection and enables lossless Ethernet by pausing lower-priority packets so they will not interfere with higher-priority traffic. DCBX does conflict resolution. This is a simplification, but for the most part that's what we are looking at.

Honestly, if I were going to be deploying an all-flash array, I'd rather it be connected via FC. Disclosure: I work for a company that makes HBAs/CNAs, and the newer generation of PCIe 3.0 FC adapters running at 8 and 16GFC speeds simply has more horsepower from a transactional standpoint than a NIC. In the future, when 40GbE becomes readily available, this will change, as you will be able to take advantage of some newer technologies that will compete above what FC can do and even be on par with InfiniBand; but for now, even if the storage array only supports 8GFC, you will not be able to outperform the current crop of 16GFC adapters, even running with 8GFC optics.
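
To give a feel for the raw wire-time side of it (a small Python sketch; the usable rates are approximations and ignore protocol stack overhead, which is where most of the real difference lives):

Code:

# Approximate usable data rates: 4GFC ~400MB/s and 8GFC ~800MB/s (8b/10b),
# 16GFC ~1600MB/s (64b/66b), 10GbE ~1250MB/s before TCP/IP/iSCSI overhead.
links_mb_s = {"4Gb FC": 400, "8Gb FC": 800, "10GbE": 1250, "16Gb FC": 1600}

io_bytes = 64 * 1024  # one 64KiB transfer
for link, mb_s in links_mb_s.items():
    usec = io_bytes / (mb_s * 1e6) * 1e6
    print(f"{link}: ~{usec:.0f} us on the wire per 64KiB IO")
# ~164us at 4GFC, ~82us at 8GFC, ~52us at 10GbE, ~41us at 16GFC - tens of
# microseconds either way, so the protocol stack and congestion dominate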

Bump.

Any word on how the array turned out? I'm curious to find out how well it performs.

Quick update: We've got it cabled up to the Nexus switches with 8x 10GbE split between both controllers and switches. Hosts have dual 10GbE split between the switches and carry all VM and storage traffic, with NIOC and LBT managing the links. All I can say is holy crap, this setup is blazing. A quick round of max-IOPS runs from the VMware IO Analyzer yielded 120k IOPS at sub-ms latency. Not the best test, really, but it was quick and at least shows the heads are up to it. Throughput matches my previous tests: SQL Server backups and restores reached 800MB/s without issue and showed no signs of slowing down even as I maxed out my VM's processors.

I'll be moving our 1TB data warehouse over to it next week, which will give much better real-world numbers as we run our queries against it. Should also have our VDI migrated by the end of next week, with roughly 60 VMs. All the while we'll be moving the rest of our couple dozen database servers and 100 or so app servers. This will build up to the final move of our 1TB production SQL Server, which handles the most IO of our apps.

The biggest test will be when we run our backups against the full data set. I'll post some stats of that and our data warehouse queries once we're rolling.
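
For anyone wanting to sanity-check the numbers above, here's the rough arithmetic (Python; the block sizes are assumptions for illustration - only the 120k IOPS and 800MB/s figures come from my testing):

Code:

def throughput_mib_s(iops, block_kib):
    # throughput = IOPS x block size
    return iops * block_kib / 1024.0

print(f"120k IOPS @ 4KiB -> ~{throughput_mib_s(120_000, 4):.0f} MiB/s")  # ~469
print(f"120k IOPS @ 8KiB -> ~{throughput_mib_s(120_000, 8):.0f} MiB/s")  # ~938

# a single 10GbE port is ~1250MB/s line rate before protocol overhead, so the
# 800MB/s backup/restore runs fit on one port; the extra ports are mostly
# about queueing and redundancy rather than raw bandwidth
print(f"10GbE line rate -> ~{10e9 / 8 / 1e6:.0f} MB/s")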