NetApp posts SPC-1 Top Ten Performance results for its high end systems – Tier 1 meets high functionality and high performance

It’s been a while since our last SPC-1 benchmark submission with high-end systems in 2012. Since then we launched all-new systems, and went from ONTAP 8.1 to ONTAP 8.3 – big jumps in both hardware and software.

In 2012 we posted an SPC-1 result with a 6-node FAS6240 cluster – not our biggest system at the time but we felt it was more representative of a realistic solution and used a hybrid configuration (spinning disks boosted by flash caching technology). It still got the best overall balance of low latency (Average Response Time or ART in SPC-1 parlance, to be used from now on), high SPC-1 IOPS, price, scalability, data resiliency and functionality compared to all other spinning disk systems at the time.

Today (April 22, 2015) we published SPC-1 results with an 8-node all-flash high-end FAS8080 cluster to illustrate the performance of the largest current NetApp FAS systems in this industry-standard benchmark.

The result: #5 in absolute SPC-1 IOPS in the Top Ten Performance list – and #3 if you look at performance at load points around 1ms Average Response Time (ART).

The NetApp system uses RAID-DP, similar to RAID-6, whereas the other entries use RAID-10 (typically, RAID-6 is considered slower than RAID-10).

In addition, the FAS8080 shows the best storage efficiency, by far, of any Top Ten SPC-1 submission (and without using compression or deduplication).

The FAS8080 offers far more functionality than any other system in the list.

We also recently posted results with the NetApp EF560 – the other major hardware platform NetApp offers. See my post here and the official results here. Different value proposition for that platform – less features but very low ART and great cost effectiveness are the key themes for the EF560.

In this post I want to explain the current Clustered Data ONTAP results and why they are important.

Flash performance without compromise

Solid state storage technologies are becoming increasingly popular.

The challenge with flash offerings from most vendors is that customers typically either have to give up a lot in order to get the high performance of flash, or have to combine 4-5 different products into a complex “solution” in order to satisfy different requirements.

For instance, dedicated all-flash offerings may not be able to natively replicate to less expensive, spinning-drive solutions.

Or, a flash system may offer high performance but not the functionality, scalability, reliability and data integrity of more mature solutions.

But what if you could have it all? Performance and reliability and functionality and scalability and maturity? That’s exactly what Clustered Data ONTAP 8.3 provides.

Here are some Clustered Data ONTAP 8.3 running on FAS8080 highlights:

All the NetApp signature ultra-tight application integration and automation for replication, SnapShots, Clones

Over 460TB (yes, TeraBytes) of usable cache after all overheads are accounted for (and without accounting for cache amplification through deduplication and clones) in an 8-node cluster. Makes competitor maximum cache amounts seem like rounding errors – indeed, the actual figure might be 465TB or more, but it’s OK… 🙂 (and 3x that number in a 24-node cluster, over 1.3PB cache!)

The ability to virtualize other storage arrays behind it

The ability to have a cluster with dissimilar size and type nodes – no need to keep all engines the same (unlike monolithic offerings). Why pay the same for all nodes when some nodes may not need all the performance? Why be forced to keep all nodes in the same hardware family? What if you don’t want to buy all at once? Maybe you want to upgrade part of the cluster with a newer-gen system? 🙂

The ability to evacuate part of a cluster and build that part as a different cluster elsewhere

The ability to have multiple disk types in a cluster and, indeed, dedicate nodes to functions (for instance, have a few nodes all-flash, some nodes with flash-accelerated SAS and a couple with very dense yet flash-accelerated NL-SAS, with full online data mobility between nodes)

That last bullet deserves a picture:

“SVM” stands for Storage Virtual Machine – a logical storage partition that can span one or more cluster nodes, with parts of the underlying capacity (performance and space) available to it, and with its own users, capacity and performance limits, etc.

In essence, Clustered Data ONTAP offers the best combination of performance, scalability, reliability, maturity and features of any storage system extant as of this writing. Indeed – look at some of the capabilities like maximum cache and number of LUNs. This is designed to be the cornerstone of a datacenter.

It makes most other systems seem like toys in comparison…

FUD buster

Another reason we wanted to show this result was FUD from competitors struggling to find an angle to fight NetApp. It goes a bit like this: “NetApp FAS systems aren’t real SAN, it’s all simulated and performance will be slow!”

Right…

Well – for a “simulated” SAN (whatever that means), the performance is pretty amazing given the level of protection used (RAID6-equivalent – far more resilient and capacity-efficient for large pooled deployments than the RAID10 the other submissions use) and all the insane scalability, reliability and functionality on tap 🙂

Another piece of FUD has been that ONTAP isn’t “flash-optimized” since it’s a very mature storage OS and wasn’t written “from the ground up for flash”. We’ll let the numbers speak for themselves. It’s worth noting that we have been incorporating a lot of flash-related innovations into FAS systems well before any other competitor did so, something conveniently ignored by the FUD-mongers. In addition, ONTAP 8.3 has a plethora of flash optimizations and path length improvements that helped with the excellent response time results. And lots more is coming.

The final piece of FUD we made sure was addressed was system fullness – last time we ran the test we didn’t fill up as much as we could have, which prompted the FUD-mongers to say that FAS systems need gigantic amounts of free space to perform. Let’s see what they’ll come up with this time 😉

On to the numbers!

As a refresher, you may want to read past SPC-1 posts here and here, and my performance primer here.

Important note: SPC-1 is a 100% block-based benchmark with its own I/O blend and, as such, the results from any vendor SPC-1 submission should not be compared to marketing IOPS numbers of all reads or metadata-heavy NAS benchmarks like SPEC SFS (which are far easier on systems than the 60% write blend of the SPC-1 workload). Indeed, the tested configuration might perform in the millions of “marketing” IOPS – but that’s decidedly not the point of this benchmark.

The SPC-1 Result links if you want the detail are here (summary) and here (full disclosure). In addition, here’s the link to the “Top 10 Performance” systems page so you can compare other submissions that are in the upper performance echelon (unfortunately, SPC-1 results are normally just alphabetically listed, making it time-consuming to compare systems unless you’re looking at the already sorted Top 10 list).

I recommend you look beyond the initial table in each submission showing the performance and $/SPC-1 IOPS and at least go to the price table to see the detail. The submissions calculate $/SPC-1 IOPS based on submitted price but not all vendors use discounted pricing. You may want to do your own price/performance calculations.
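To make the "do your own calculations" point concrete, here's a minimal sketch of the kind of recalculation you can do from a submission's price table. All the dollar and IOPS figures below are made-up placeholders for illustration, not numbers from any actual SPC-1 report:

```python
# Recompute $/SPC-1 IOPS from a submission's price table.
# All numbers below are hypothetical placeholders, not from any real SPC-1 report.

def price_per_iops(total_submitted_price, spc1_iops):
    """$/SPC-1 IOPS as it appears in the summary table."""
    return total_submitted_price / spc1_iops

# Hypothetical system: submitted at $2,000,000 and 500,000 SPC-1 IOPS
submitted = price_per_iops(2_000_000, 500_000)   # $4.00 per SPC-1 IOPS

# If the submitted price already includes a discount, you may want to
# normalize back to list price before comparing across vendors:
discount = 0.30                                  # hypothetical 30% discount
list_price = 2_000_000 / (1 - discount)
normalized = price_per_iops(list_price, 500_000)
print(f"submitted: ${submitted:.2f}/IOPS, at list: ${normalized:.2f}/IOPS")
```

The point isn't the specific numbers – it's that two submissions with identical summary-table $/SPC-1 IOPS can reflect very different discounting assumptions, so the price detail pages are worth the read.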

The things to look for in SPC-1 submissions

Typically you’re looking for the following things to make sense of an SPC-1 submission:

ART vs IOPS – many submissions will show high IOPS at huge ART, which would be rather useless when it comes to Flash storage

Sustainability – was performance even or are there constant huge spikes?

RAID level – most submissions use RAID10 for speed, what would happen with RAID6?

Application Utilization. This one is important yet glossed over. It signifies how much capacity the benchmark consumed vs the overall raw capacity of the system, before RAID, spares etc.

Let’s go over these one by one.

ART vs IOPS

Our ART was 1.23ms at 685,281.71 SPC-1 IOPS, and pretty flat over time during the test:

Sustainability

The SPC-1 rules state the minimum runtime should be 8 hours. We ran the test for 18 hours to observe if there would be variation in the performance. There was no significant variation:

RAID level

RAID-DP was used for all FAS8080EX testing. This is mathematically analogous in protection to RAID-6. Given that these systems are typically deployed in very large pooled configurations, we elected long ago to not recommend single parity RAID since it’s simply not safe enough. RAID-10 is fast and fine for smaller capacity SSD systems but, at scale, it gets too expensive for anything but a lab queen (a system that nobody in their right mind will ever buy but which benchmarks well).

Application Utilization

Our Application Utilization was a very high 61.92% – unheard of by other vendors posting SPC-1 results since they use RAID10 which, by definition, wastes half the capacity (plus spares and other overheads to worry about on top of that).

Some vendors using RAID10 will fill up the resulting space after RAID, spares etc. to a very high degree, and call out the “Protected Application Utilization” as being the key thing to focus on.

This could not be further from the truth – Application Utilization is the only metric that really shows how much of the total possible raw capacity the benchmark actually used, and it signifies how space-efficient the storage was.

Otherwise, someone could do quadruple mirroring of 100TB, fill up the resulting 25TB to 100%, and call that 100% efficient… when in fact it only consumed 25% 🙂
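The quadruple-mirroring example above can be checked with a few lines of arithmetic – this is just the math from the paragraph above, not any vendor's actual configuration:

```python
# Application Utilization = capacity the benchmark used / total raw capacity.
# The quadruple-mirroring example from the text: 100TB raw, 4 copies of data.

def application_utilization(asu_capacity_tb, physical_capacity_tb):
    return asu_capacity_tb / physical_capacity_tb

raw_tb = 100.0
mirror_copies = 4
usable_tb = raw_tb / mirror_copies      # 25TB left after quadruple mirroring
asu_tb = usable_tb                      # fill the usable space to 100%

protected_util = asu_tb / usable_tb     # 1.0 -> looks "100% efficient"
app_util = application_utilization(asu_tb, raw_tb)  # 0.25 -> the real story
print(f"Protected: {protected_util:.0%}, Application Utilization: {app_util:.0%}")
```

Protected Application Utilization comes out at a flattering 100% while the system only ever used a quarter of the raw capacity you paid for – which is exactly why Application Utilization is the number to watch.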

It is important to note there was no compression or deduplication enabled by any vendor since it is not allowed by the current version of the benchmark.

Compared to other vendors

I wanted to show a comparison between the Top Ten Performance results both in absolute terms and also normalized around 1ms ART.

Here are the Top Ten highest performing systems as of April 22, 2015, with vendor results links if you want to look at things in detail:

FYI, the HP XP 9500 and the Hitachi system above it in the list are the exact same system; HP resells the HDS array as its high-end offering.

I will show columns that explain the results of each vendor around 1ms. Why 1ms and not more or less? Because in the Top Ten SPC-1 performance list, most results show fairly low ART, but some have very high ART, and it’s useful to show performance at that lower ART load point, which is becoming the ART standard for All-Flash systems. 1ms seems to be a good point for multi-function SSD systems (vs simpler, smaller but more speed-optimized architectures like the NetApp EF560).

The way you determine the 1ms ART load point is by looking at the table that shows ART vs SPC-1 IOPS. Let’s pick IBM’s 780 since it has a very interesting curve so you learn what to look for.

IBM’s submitted SPC-1 IOPS are high but at a huge ART number for an all-SSD solution (18.90ms). Not very useful for customers picking an all-SSD system. Even the next load point, with an average ART of 6.41ms, is high for an all-flash solution.

To more accurately compare this to the rest of the vendors with decent ART, you need to look at the table to find the closest load point around 1ms (which, in this case, is the 10% load point at 0.71ms – the next one up is much higher at 2.65ms).

You can do a similar exercise for the rest, it’s worth a look – I don’t want to paste all these tables and graphs since this post will get too big. But it’s interesting to see how SPC-1 IOPS vs ART are related and translate that to your business requirements for application latency.
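If you want to repeat the exercise yourself, the selection logic is trivial to script. The sketch below uses only the IBM Power 780 ART values quoted above (the per-load-point IOPS figures are omitted rather than guessed):

```python
# Pick the load point closest to 1ms ART from a submission's
# "Response Time / Throughput" table. Only the ART values quoted in
# the text are filled in; IOPS values are intentionally left out.

def load_point_near(points, target_ms=1.0):
    """points: list of (art_ms, spc1_iops); returns the row nearest target_ms."""
    return min(points, key=lambda p: abs(p[0] - target_ms))

ibm_780_points = [
    (0.71, None),    # 10% load point, 0.71ms
    (2.65, None),
    (6.41, None),
    (18.90, None),   # 100% load point, the headline SPC-1 IOPS number
]

art, iops = load_point_near(ibm_780_points)
print(f"Nearest to 1ms: {art}ms")   # -> 0.71ms, the 10% load point
```

Run against each Top Ten submission's load-point table, this gives you the "performance near 1ms" column without any eyeballing.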

Here’s the table with the current Top Ten SPC-1 Performance results as of 4/22/2015. Click on it for a clearer picture, there’s a lot going on.

Key for the chart (the non-obvious parts anyway):

The “SPC-1 Load Level near 1ms” is the load point in each SPC-1 Report that corresponds to the SPC-1 IOPS achieved near 1ms. This is not how busy each array was (I see this misinterpreted all the time).

The “Total ASU Capacity” is the amount of capacity the test consumed.

The “Physical Storage Capacity” is the total amount of capacity in the array before RAID etc.

What do the results show?

Predictably, all-flash systems trump disk-based and hybrid systems for performance and can offer very nice $/SPC-1 IOPS numbers. That is the major allure of flash – high performance density.

Some takeaways from the comparison:

Based on SPC-1 IOPs around 1ms Average Response Time load points, the FAS8080 EX shifts from 5th place to 3rd

The other vendors used RAID10 – NetApp used RAID-DP (similar to RAID6 in protection). What would happen to their results if they switched to RAID6 to provide a similar level of protection and efficiency?

Aside from the NetApp FAS result, the rest of the Top Ten Performance submissions offer vastly lower Application Utilization – about half! Which means that NetApp is able to use 2x the capacity vs raw compared to the other submissions. And that’s before starting to count the possible storage efficiencies we can turn on like dedupe and compression.

How does one pick a flash array?

It depends. What are you trying to do? Solve a tactical problem? Just need a lot of extra speed and far lower latency for some workloads? No need for the array to have a ton of functionality? A lot of the data management happens in the application? Need something cost-effective, simple yet reliable? Then an all-flash system like the NetApp EF560 is a solid answer, and it can still be front-ended by a Clustered Data ONTAP system to provide more functionality if the need arises in the future (we are firm believers in hardware reuse and investment protection – you see, some companies talk about Software Defined Storage, we do Software Defined Storage).

On the other hand, if you would prefer an Enterprise architecture that can serve as the cornerstone of your datacenter for almost any workload and protocol, offers rich data management functionality and tight application integration, insane scalability, non-disruptive everything and offers the most features (reliably) compared to any other platform – then the FAS line running Clustered Data ONTAP is the only possible answer.

In summary – the all-flash FAS8080EX gets a pretty amazing performance and efficiency SPC-1 result, especially given the extensive portfolio of functionality it offers. In my opinion, no competitor system offers the sheer functionality the FAS8080 does – not even close. Additionally, I believe that certain competitors have very questionable viability and/or tiny market penetration, making them a risky proposition for a high end system purchase.

15 Replies to “NetApp posts SPC-1 Top Ten Performance results for its high end systems – Tier 1 meets high functionality and high performance”

As you know, I’m in HP Storage, and love to look at these results – for the same reasons as you point out – they provide very good insights into what is going on.

At first blush, I looked at the full report and thought “hhmmm, looks like a nice result”. However, upon loading all the SPC-1 data points into my tracker – the true picture became clear.

Congratulations NetApp for quite possibly the *worst* all flash array SPC-1 result on record to-date.

The SPC-1 tracker I maintain has a few key data points and simple calculations to give insights into the general population and then the outliers. The outliers generally have something exceptional regarding their architecture or technology – good or bad – which sets them apart from the general population.

In this case – For the SPC-1 results for all flash arrays using standard SAS interface SSD – the average SPC-1 IOPS per SSD tends to average around about 8,000 IOPS per drive. The range is approx. 6000 – 10000. In fact, if I look at the overall total average of all 62 SPC-1 results in my tracker at present – the average SPC-1 IOPS per drive is 5,266.

In this result for the FAS8080 EX AFF – the average SPC-1 IOPS per drive is 1,785. At 1,785 IOPS per drive – this is the worst result for ALL all flash arrays on record !!! Well done! It is even worse than a circa 2012 HP EVA P6500 all flash result at 2,500 SPC-1 IOPS per drive!

Which begs the question NetApp – What is going on in that all flash optimised code path – such that you cannot get within the industry average IOPS per drive for all flash results?

As an FYI, the NetApp EF560 result obtained a very credible 10,209 SPC-1 IOPS per drive which is at the upper end of my 6,000 to 10,000 range for AFA using standard SAS SSD….

First, I want to thank you for being instrumental in our improved SPC-1 results. Your past comments proved invaluable.

For example, you had complained in the past about how our system should be more full. So we listened! We filled it up more than any other array has ever been filled up in SPC-1, especially compared to the other Top Ten systems. The Application Utilization number is now unheard of and about 2x of the other systems in the Top Ten.

But do we get congrats?

No…

Instead, we get this new line of attack about how each SSD wasn’t doing a lot of IOPS. Which means we must be doing something right since you weren’t able to find any other weakness. My sincere thanks, once more. We will try to address this for the next time 🙂 At which point you may complain about the color scheme of the cabinets (it is a bit drab). You never know, you may become instrumental in us having more colorful gear! 🙂

So, here’s a bit about how the sausage was made:

SPC-1 is about not just performance but also price/performance. The 200GB SSDs are less expensive than the bigger ones. We also wanted to have a reasonable amount of capacity.

We got great performance even with 12x 200GB SSD per controller (which would have quadrupled your metric to over 7,000 SPC-1 IOPS per SSD), but the capacity would have been laughable and the working set would have fit entirely in memory, making it a very unrealistic result (which didn’t stop HDS from doing the same thing, but that’s their choice).

So we upped the drive count to keep capacities and working sets vs RAM realistic.

The value isn’t diluted – we still have the most functional system, nice low latency, a solid SPC-1 IOPS number, the best capacity efficiency, and great value for money for a Tier 1 system.

BTW… the more unused space there is, the more expensive each system gets. HDS got a high SPC-1 IOPS result but if you do the math you’re paying a lot for their unused capacity.

But no, you decided to ignore all that.

How about HP post a Top Ten number with a system that’s not a rebadged HDS? Focus on that instead.

Maybe even do it with RAID6, I’ll be very curious to see the performance with RAID6 on 3Par and a working set at least 3x the RAM of the box and over 60% Application Utilization.

Here’s where I come in on this: a top performing array with a superior $/IOP number. So, if I’m a customer and I hit enter and my system responds faster than your system, I win, right? Now, what would that performance cost me?

I love all of the breathless hype around flash. It injects some geek excitement into a boring commodity industry. But, it could be Campbell Soup cans and kite string on the back end and my model still holds. If I hit and my system of Campbell Soup cans responds first, I still win and I’m willing to bet my costs are lower! We can drop our pants and compare shadows on the sidewalk all we want. I’m not sure how that benefits the customer but customers are interested in this stuff. Why? Just like shadows on the sidewalk it is a valid way to compare as long as everyone agrees to do it at the same time of the day, facing the same direction, etc. – control for all the variables. You can try to shift perceptions all you want but “the Shadow knows!” The numbers don’t lie. These are great numbers and it didn’t cost much to get them.

Ironically, we could have gotten an insanely good $/IOPS number by going with 1/4th the SSDs and capacity and fitting everything in RAM. But it would open the door to the accusation of too little data vs RAM. There’s no way to satisfy every possible angle.

We went for the more realistic configuration instead. Something customers would actually buy and use.

I think either your perf team only had 200GB SSD lying around their lab to use and needed a lot of them to get the capacity desired; OR they needed a high drive count for some reason and made them small ones to keep the price down… Either way – the very low (for AFA’s) IOPS / drive kind of indicates the ONTAP OS isn’t scaling too well in performance as the drive count increases.

Glad my opinions have helped you improve the test configs and get that capacity utilisation up ! 🙂

Oh, and on the color bit – yeah I agree with you – your frames could do with a splash of Yellow on them….. 🙂

cheers,

Paul. (@ HP Storage)

P.S. – We’ll be back in the top 10 with a 3PAR system, but can’t give away any secrets here!

The systems “above” it have only a tiny fraction of the FAS8080 functionality, and two of them come either from a company that might just go away any time now or from a company that has no foothold in most of the world for various reasons.

Only HDS has a decent offering and they barely passed the unused capacity limit! Probably to get the working set size small enough to fit in cache as much as possible. Why not fill it up more? Even with RAID10? The capacity was there after all and we’ve seen other SPC-1 results with RAID10 and SSD like the EF560 have very little unused space.

So, our submission:

– used a lot more capacity vs raw (2x the competitors’) – no waste

– used RAID-6 equivalent protection where everyone else uses RAID10. Still managed to get top 3 in performance…

– used a realistic working set size and didn’t fit it all in RAM

– has way more functionality than any other system in the list yet is in the top 3 performance-wise at 1ms

– is close to the top regarding performance yet used a realistic configuration people would actually deploy

Those are the facts. And they look pretty good to me and to customers.

IOPS per SSD – like I said that wasn’t our target. I’m sure you’d like it if ONTAP wasn’t efficient there but it is – we used 200GB SSDs for other reasons and got it to over 7,000 SPC-1 IOPS/SSD using 12 SSDs per controller.

But like I said you’re giving us nice ideas, maybe we should have other submissions that prove out different things. We could show the least number of SSDs per head (we’ll call it the “Paul Special”).

And prepare for dissection once you submit your Top Ten result. Come on – try it with a nicely filled RAID-6… 🙂 (we know you won’t dare – nobody does).

I think you are crossing the blurry line by creating too many subjective and derived interpretations on SPC-1 results. As your engineering team knows, and I would hope you would know, the SPC bylaws & policies have strict rules around fair-use and creation of derived data points. (http://www.storageperformance.org/about)

I don’t think normalising results subjectively and speculatively to 1ms fits within the fair-use provisions. Straight line divisions, multiplications etc. are fine so long as the inputs & formula are disclosed.

And on pricing – the submitted price *IS* the street price – individual company practices and behaviors around list price strategies and discounts are immaterial – so your commentary on uplifting the pricing back up to the list price is completely unfair and out of line.

“once you make everyone list price at 1ms to keep a level playing field” <<— the level playing field is the submitted pricing – not Dimitris's adjusted pricing!!

I’m merely using the data points already in the submission reports (specifically the different load points). Open a few of the results and take a look at the load point table and my table. You’ll see what I mean. I’m just copying the values from the official submissions and taking the performance value closest to 1ms (and also disclosing the actual latency at that load point). This does not violate any rules.

Submitted pricing has nothing to do with street pricing. Many vendors (including NetApp) report zero discounting. Does anyone in their right mind think that’s street price? No – that’s just what each vendor decided to submit.

There are various reasons we and many other vendors don’t submit discounted pricing but in the end I’m really trying to teach people how to read these reports and not just look at the little summary table at the beginning of each report. It doesn’t tell the whole story.

The full disclosure material is there for a reason after all. It’s just that most people don’t bother analyzing it. Which certainly helps some vendors more than others.

While I’m not great defender of Netapp, I will point out HP doesn’t really have a lot of room to talk on street price vs. submitted price. I’ve seen pricing on 3PAR vary by 4X depending on the opportunity, negotiated discounts etc or being unlucky enough to be a customer in the UK (Do your disti’s there make 100% margin or something?).

Interesting read overall. I’ve always felt like all flash submissions should be normalized on 1ms. If you’re deploying ALL flash instead of hybrid and paying the premium, I don’t see 10ms being acceptable for most shops.

What I do find more interesting is the argument that features/functionality always matter – I’m running into more and more customer cases where replication at the application level, data reduction at the app level (to optimize cache in app) etc. is a more attractive solution.

Good call to edit the original table of top ten results and remove the “List price $/SPC-1 IOPS with adjusted latency ~1ms” comparisons. Much fairer to compare pricing based on the actual submitted SPC-1 price performance and the actual total price of the TSC as submitted. This is fairest, as the submitted TSC price we know is the “Guaranteed not to exceed price” which customers would pay for that tested config.

Thanks for taking the time to highlight the immense details and nuggets of information within the SPC-1 FDR’s. I always like to draw attention to Appendix C of the FDR and encourage customers to take a look at how the actual tested storage config was configured and built – and compare the complexity of some configs versus others.

Why such a dismissive attitude towards Kaminario? By my count they held the #1 SPC-1 and SPC-2 results with a config (the same config for both actually) submitted in 2013 and only got dethroned in early 2015. Slinging FUD like “systems from a company that might just go away any time now” seems like a dangerous and complacent attitude to have. I bet somebody at Auspex said something similar before NetApp came along and started eating their lunch. In fact, I’ve noticed some other resemblances to Auspex these last 9 or so quarters.

Dismissive because they simply haven’t managed to capture a decent account base over all these years. Performance is one thing they do well but so do many other vendors. Performance alone is actually easy.

The point of my post is that performance isn’t the only thing that matters. Sure, we got a high SPC-1 IOPS number and low latency with ONTAP but the point is that’s not the only thing that matters.

Flexibility, maturity, reliability, manageability, automation – all are extremely important for data that matters and non-trivial scales of deployment.

If someone needs 5TB for non-critical data that needs to go super-fast then there’s all kinds of choice.

Once the requirements expand though, the choice of safe vendors shrinks.