SSD

July 29, 2010

It's getting very hard to keep up with all the crazy social media stunts coming out of Hopkinton, but they seem to have outdone themselves again. First there was the questionable spamming for viewers so they could claim they had a viral video; then today they "leaked" a 3PAR sales "kill sheet" - and also apparently set up a "secret" site at the URL Notapp.com, where they compared their own guarantee program to NetApp's. According to Simon Sharwood at Search Storage Australia, the site has since been removed and accessing the URL now directs browsers to EMC's site.

Perhaps it is all part of a new marketing strategy by newcomer Jeremy Burton, who joined EMC as Chief Marketing Officer back in March. As best I can tell, Burton's new marketing strategy for the company is that people will believe anything. Maybe he doesn't think there are enough new products coming out of EMC - or maybe the delays in getting their ballyhooed FAST out the door are too embarrassing - but instead of trying to promote EMC on its own merits, it looks like he is doing his utmost to mud wrestle. Is that what EMC is paying him the big bucks for?

EMC is suddenly taking a bigger interest in 3PAR. That's good. Search Storage Australia just published parts of a competitive document that EMC was circulating to its partners about 3PAR. It certainly wasn't a surprise because we'd seen it previously, but I was sorry to see it published because it made EMC look ridiculous - which was working pretty well for us. But now that it's been outed, here is what we have to say about it (in the guise of Inception's lead character, the CRO).

The messaging is not built in, but our zero detection technology for optimizing capacity is. The host SW commands to do this are short and do not require "careful coordination". Veritas, Oracle, Windows Server and Linux software all work with minimal operator effort. For instance, this document from Oracle describes the whole process, with the sole operator command being: # bash ASRU LDATA.

Can EMC provide online reclamation of zeroed space without risking capacity overruns and with tolerable performance? 3PAR can. Does EMC have these capabilities in both mid-range and enterprise storage arrays? 3PAR does.

3PAR has both Flash and 1 TB SATA drives. We also have Adaptive Optimization software that uses Flash SSDs for storage tiering. EMC still doesn't have it after they made such a big deal about it last year. They like to tell customers that their size gives them development advantages, but their track record doesn't support their claim.

3PAR arrays allow users to create many tiers, but without the need for disk pools. Tiers are constructed from the combination of drive type plus RAID level. For instance, you can have separate tiers for SATA, FC and Flash SSD drives with the RAID level you select. Our Dynamic Optimization software allows admins to move data from one tier to another. You can "dial in" the performance and protection you want.
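
As a rough illustration of that tier model, here is a minimal Python sketch in which a tier is just a drive type paired with a RAID level, and a DO-style operation converts a volume from one combination to another while it stays online. The type and function names are invented for illustration; they are not 3PAR CLI objects or commands.

    from dataclasses import dataclass

    @dataclass(frozen=True)
    class Tier:
        drive_type: str   # "SSD", "FC" or "SATA"
        raid_level: str   # "RAID10", "RAID5", "RAID6"

    @dataclass
    class Volume:
        name: str
        tier: Tier

    def dynamic_optimize(volume: Volume, target: Tier) -> None:
        # Model of a DO-style conversion: re-lay the volume out on the target
        # drive class and RAID level while it remains online.
        print(f"{volume.name}: {volume.tier} -> {target}")
        volume.tier = target

    vol = Volume("erp_db", Tier("FC", "RAID10"))
    dynamic_optimize(vol, Tier("SATA", "RAID6"))   # "dial in" cheaper protection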

All systems have a peak output; ours just happens to have a lot more throughput than theirs - and at higher disk utilizations. We have published benchmarks that show how our systems perform. They don't. Adding disk drives to a system and putting those drives to use is far easier with a 3PAR system than with either VMAX or CLARiiON, where you have to wrestle with placing drives in the pools you want to use them for.

There are no disk pools in 3PAR storage. Pools trap resources so you can't use them. Work isolation in pools leads to hot spots and storage admin nightmares. Wide striping does not mean you can't have tiers. That is an idiotic statement.

VMAX can configure large pools - and all the drives in them have to be at the same RAID level, meaning you can't create multiple tiers within those pools. If you want multiple tiers, you need multiple pools and all the headaches that involves. Change management in an environment with multiple pools is complicated. You also need to consider the pools needed for snapshots and remote replication. Are those easy to provision and change on EMC storage? Most would say "no".

3PAR uses all disk spindles all the time for delivering IOPS and pro-active sparing is done using reserved space on those drives. Rebuilds do proceed quickly. Would EMC have you believe they never have to perform drive rebuilds? Really?

The RAID6 thing really makes me sad. They look so stupid when they say it. We're all sorry to say goodbye to that piece of FUD.

Our front end architecture was designed for large-scale parallel connectivity to match the massive bandwidth capabilities of our wide-striped back end. Our benchmarks and the cost per IOPS in those benchmarks speak for themselves. Our customers also tend to run 3PAR systems at much higher disk utilizations than they run other vendors' arrays.

We support a huge number of ports on our systems, with full active/active data access across all controllers. All controller nodes can be used to access all data volumes. We have a number of customers that run fairly sizable SANs without switches because they have enough ports on their arrays that they don't need to consolidate access through switches.

Five 9s? We're there. Our systems get pounded on every day in some of the largest private and public data centers in the world. They are designed with complete redundancy in all components and have advanced capabilities such as Persistent Cache to maintain high levels of performance even after the loss of a controller.

The delays in bringing their FAST tiering software - a product they were hyping in April of 2009 - to market have shown that size doesn't matter much when it comes to delivering technology on time. I'm not saying 3PAR always delivers on time, but EMC is far from immune to these problems. In fact, the need for them to coordinate across multiple product lines creates certain disadvantages for them.

As for their comments on our support: they are pure FUD and grasping at straws. We would not be able to keep the customers we have if it were not for our efforts at supporting them.

* * * * * *

The following content was added on July 30th by Rusty Walther, 3PAR's Vice President of Customer Services & Support.

Stating that 3PAR “outsources support” is just plain silly, especially coming from a company that keeps most of the world’s largest offshore outsourcing companies in business. Like EMC, 3PAR uses Third Party Maintenance suppliers (TPMs) for break-fix field activities. In some geographies, EMC and 3PAR even use the “same” TPM. But EMC also outsources most of their volume call center and Level-1 Technical Support to offshore suppliers. Not so at 3PAR. Everyone who touches a 3PAR support case is a 3PAR-badged employee. I challenge EMC to identify a single outsourcing company that handles 3PAR technical support. EMC’s outsourced technical support sub-contractors could be listed alphabetically, by geography, or by technology category … but you’d need a couple of sheets of paper to do it.

June 30, 2010

I've been going slightly nuts since yesterday after Cisco announced the CIUS. It looks like the perfect tablet for the sorts of things I really want a personal screen device for - communicating with other people. This review by Erik Parker of InfoWorld is a pretty good read and it summarizes key advantages and disadvantages of CIUS. If it can make the technology of video conferencing transparent to end users, it will be a big deal.

But the hidden story to this is that Cisco is also making a play to get into the corporate desktop/laptop business with the CIUS. The idea that companies could deploy these with VDI is definitely part of Cisco's grand plan for world domination. Whether or not the CIUS could replace laptop or desktop computers remains to be seen, but there are reasons to think they could eventually if the stars align.

The arguments for VDI are strong, but there are still a lot of hurdles to overcome, such as back end storage performance to support boot storms. By the way, people looking at large VDI implementations might want to look at 3PAR's wide striping storage systems to get the sort of affordable IOPS needed to support large VDI environments. My previous post illustrates our design for massive throughput, which supports a huge number of IOPS without needing SSDs or requiring storage administrators to create special disk pools to isolate the VDI workload from other applications running in the same storage array.

10. You have to buy it before you can determine if it will work for you.

3PAR's sub-volume tiering, Adaptive Optimization (AO), comes with a lot of features that make it much better than competitors would have you believe. It doesn't require SSDs and it's not for everybody, but if you need it, it will do the job.

There are two aspects of the discussion that I think are fascinating: first, the role of social media as a means to include customers, vendors and others in an open discussion that is typically conducted privately by a vendor preparing to release a new product or feature; second, the challenge of defining a storage capability with sufficient focus and vendor independence that it is meaningful. There has been some skepticism about this effort, suggesting that we are predestined to end up with ambiguous terms that can be interpreted (spun) by anybody (any vendor) to mean anything (our product does it). I'm hopeful the results can be better than that, but subjectivity is the only constant of social media and it's not likely that everyone will like whatever happens.

In general, the language of digital storage has many overlapping meanings, which causes a fair amount of confusion in our industry. I've been dealing with this problem for many years, going back to when I wrote Building Storage Networks and had the challenge of trying to invent generic terms for functions that had been coined by vendors and tied to specific products. My interest in defining Storage Federation goes beyond my role as an employee for 3PAR.

The notion of federated storage has been around for several years, but it recently came to light when Pat Gelsinger of EMC referred to it during a press briefing on March 11. EMC blogger Chuck Hollis wrote about it afterward and there was some chatter, culminating in a blog post by @Stuiesav on April 2, which proposed that the discussion about Storage Federation was just marketing hype attempting to rebrand storage virtualization. EMC's Barry Burke (@Storageanarchy) and I (@3parfarley) both agreed that this was not the case this time around, and then in the last couple of days the discussion fired up again. The problem with Twitter is the limit of 140 characters per tweet. It's surprising what can be done with so few characters, but the format only stretches so far.

Skeptics Allowed

This weekend, a post by @ianhf on his Grumpy Storage blog echoed @Stuiesav's skepticism and laid out his perspective (as a customer) on what he would like to see when new technologies - in this case Storage Federation - are introduced. Here are some of the items from his list (some of them don't fit our definition exercise because they assume a new product is being introduced, which is not the case here):

The use cases this feature / function applies to, and those that it doesn't

Why & how this feature is different to that vendor's own previous method for solving this problem

Provide clarity over the non-functional impacts of the feature before, during & after its use - i.e. impact on resilience, impact on performance, concurrency of usage etc (including providing up-front details of constraints)

Naturally you'll also expect me to require TCO & ROI of the feature, and any changes to the models as a result of this feature

So let's get started (already!):

The definition of Storage Federation that was kicked around on Twitter is something like: "the transparent, dynamic and non-disruptive distribution of storage resources across self-governing, discrete, peer storage systems." (And yes, I did elaborate on this a bit while writing, based on bits and pieces of comments I read and further thoughts of my own.)

The idea is to have multiple storage systems cooperating as a team (as opposed to operating under the direction of an external entity) to place data in the aggregated storage resources (LUNs or volumes) of all participating members. An example of Storage Federation is how Dell/EqualLogic arrays distribute their volumes over multiple systems. When a new EqualLogic array is added to an iSCSI SAN, the administrator is asked whether the array should be placed in the same group as the other arrays in the SAN. If this is done, the arrays begin distributing their volumes (and workloads) across all members of the group.
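
To make the peer-group idea concrete, here is a small Python sketch of self-governing arrays that join a group and redistribute volumes among themselves. The class names and the naive rebalancing rule are invented for illustration; this is not Dell/EqualLogic's actual algorithm.

    from dataclasses import dataclass, field

    @dataclass
    class Volume:
        name: str
        size_gb: int

    @dataclass
    class Array:
        name: str
        capacity_gb: int
        volumes: list = field(default_factory=list)

        def used_gb(self):
            return sum(v.size_gb for v in self.volumes)

    class FederationGroup:
        def __init__(self):
            self.members = []

        def join(self, array):
            # Add a peer array to the group, then rebalance volumes across members.
            self.members.append(array)
            self.rebalance()

        def rebalance(self):
            # Naive rebalancing: repeatedly move a volume from the most-loaded
            # member to the least-loaded one while it reduces the spread.
            moved = True
            while moved:
                moved = False
                src = max(self.members, key=Array.used_gb)
                dst = min(self.members, key=Array.used_gb)
                for vol in sorted(src.volumes, key=lambda v: v.size_gb):
                    if src.used_gb() - dst.used_gb() > 2 * vol.size_gb:
                        src.volumes.remove(vol)   # in a real federation this is a
                        dst.volumes.append(vol)   # transparent, online migration
                        moved = True
                        break

    group = FederationGroup()
    group.join(Array("array-1", 10_000, [Volume("vmail", 3_000), Volume("vdb", 4_000)]))
    group.join(Array("array-2", 10_000))   # a new array joins; volumes spread out
    for m in group.members:
        print(m.name, m.used_gb(), "GB in use")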

Five examples of storage federation capabilities are:

Storage expansion: You want to increase the storage capacity of an existing storage system that cannot accommodate the total amount of capacity desired. Storage Federation allows you to add additional storage capacity by adding a whole new system.

Storage migration: You want to migrate from an aging storage system to a new one. Storage Federation allows the two systems to be joined and the storage resources on the first to be evacuated onto the second, after which the first system is removed.

Safe system upgrades: System upgrades can be problematic for a number of reasons. Storage Federation allows a system to be removed from the federation and re-inserted after the successful completion of the upgrade.

Load balancing: Similar to storage expansion, but on the performance axis, you might want to add additional storage systems to a Storage Federation in order to spread the workload across multiple systems.

Storage tiering: In a similar light, storage systems in a Storage Federation could have different capacity/performance ratios that you could use for tiering data. This is similar to the idea of dynamically re-striping data across the disk drives within a single storage system, such as with 3PAR's Dynamic Optimization software, but extends the concept to cross storage system boundaries.

These are all examples of Storage Federation that Dell/EqualLogic storage systems are capable of today. This is not intended as an endorsement of their distributed volume manager; I am simply using it as an example of Storage Federation. That said, if you look for "Storage Federation" on the Dell/EqualLogic web site, you probably won't find it today because they don't describe it that way, but that doesn't mean the capability is not in the product.

That's not it

People also want help understanding what the definition of Storage Federation should not include. Here are my thoughts:

In-band storage virtualization systems like IBM's SVC can provide all the capabilities listed above for Storage Federation. However, in-band virtualization products govern the behavior of the other storage systems networked to it and Storage Federation involves self-governing, peer systems. Another way of saying it is that Storage Federation does not have functionality in the network between host systems and storage.

File, object and data distribution technologies like EMC's Atmos certainly provide a type of federation insofar as multiple computer systems can access the same data objects from multiple locations that may be separated geographically. However, this capability primarily migrates files (data objects) - and is functionally orthogonal to Storage Federation, which works on volumes. A term like Data Federation is probably more precise than Storage Federation for this sort of capability.

Clustered storage systems, like 3PAR's InServ storage systems or clustered NAS systems are not examples of Storage Federation because they function as single, scalable storage systems, not as discrete storage systems that are networked together. Clustered storage systems can work together in Storage Federations with other clustered or non-clustered storage systems. In that case you could have a Storage Federation of clustered storage systems.

How is this different?

I thought I'd make a little effort to respond to @ianhf's requests.

The main problem Storage Federation addresses is the set of limitations presented by a single storage system, such as capacity, performance and maintenance availability.

Currently, when a new storage system is added to an existing environment, there are server administration tasks that need to be done in order to migrate volumes from an existing storage system to the new one. In addition, there is usually some amount of downtime and/or performance degradation associated with the migration and configuration of new data paths. By contrast, with Storage Federation, a new storage system could be added without having to reconfigure host data paths and the migration would be processed transparently and dynamically, without any downtime or loss of path redundancy while the migration is in process.

Increasing performance for a particular volume in a Storage Federation is somewhat less obvious because the other federated storage systems might not have spare performance resources available to boost performance. For instance, if a new storage system inserted in a Storage Federation does not have more disk drives (spindles) or more flash SSD capacity, there is no guarantee that any volumes moved to it would perform faster. There is some likelihood that performance could be improved by moving a volume to a storage system running at lower utilization levels, but maintaining lower utilization levels is not realistic or very cost-effective. All this said, it is possible for a Storage Federation to use the aggregate resources of multiple storage systems to increase the performance of a single volume - for example, by spanning all the disk drives in the federation and by using the cache of all participating storage systems. The Dell/EqualLogic storage arrays are an example of products that provide this capability today.

The availability of storage systems needing maintenance could be greatly improved by Storage Federation if an existing storage system can be removed from the federation after having its volumes evacuated (think vMotion, but for storage systems) to other storage systems in the federation. After being removed from the federation, the storage system could be taken down for any sort of maintenance operation without risking the availability of any of the volumes that had previously been on it. Obviously, all of this would take time, planning and care to ensure the Storage Federation is not overloaded with data and workloads, but there are probably many customers that would prefer to do maintenance on storage systems when they are offline.

One of the possible exposures of Storage Federation is the increased exposure to system failures. For instance, a Storage Federation that distributes a single volume over three separate storage systems is roughly three times more likely to have a failure affect that volume than a Storage Federation that does not allow volumes to span storage system boundaries. FWIW, this is the main weakness of the Dell/EqualLogic implementation of Storage Federation.
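
The arithmetic behind that exposure is simple. Assuming independent failures with some per-array probability p over an interval (the figure below is made up), a volume spanning N arrays is unavailable if any one of them fails:

    p = 0.001                      # assumed chance a single array fails in the period
    for n in (1, 3):
        p_vol = 1 - (1 - p) ** n   # the volume is down if ANY of the n arrays fails
        print(n, round(p_vol, 6))  # 0.001 vs. 0.002997 - roughly 3x for small p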

That's all for now

It's late and there is plenty of material here for the grinder. Please feel free to comment in any way: agree, disagree, correct mistakes, ask questions. Thanks for reading.

April 01, 2010

I wrote a post yesterday that showed IOPS calculations for a few different native wide striping configurations and I thought I'd add storage tiering to the mix today. Native wide striping places data from all volumes across all drives in the array (or of a certain drive class if you have mixed drives in your array) and randomizes workloads across all resources. The biggest advantages of native wide striping over traditional array designs that rely on multiple pools and workload isolation are:

Avoiding poor storage utilization associated with workload isolation

Avoiding IOPS and capacity limits associated with workload isolation

Avoiding hotspots associated with legacy RAID striping

Sustaining dependable, high performance for mixed workloads

Improving both VM and storage densities

Although native wide striping can handle complex, mixed workloads of transaction and sequential data access, applications that are either latency sensitive or single threaded can significantly increase their storage performance through the use of SSDs and storage tiering.

The 3PAR tiering solution uses STEC MACHIOPS SSDs with a sustainable I/O rate of 10,000 IOPS. These devices have 50GB capacities and are installed as sets of eight SSDs across all mesh-active controllers in a 3PAR InServ array to balance the high-IOPS workload over all controllers as well as drives.

IOPS calculations

Below are a few calculations of maximum sustainable IOPS for InServ arrays that use both SATA drives and SSDs with AO (the arithmetic behind them is sketched out after the array listings). I used 5,000 IOPS as the metric for calculating SSD performance, which is a conservative estimate for STEC MACHIOPS performance, but actual performance from an AO-enabled array could be lower due to a number of variables, including the I/O activity levels that can be sustained by applications and servers, policy settings made by storage administrators, the accuracy of the algorithms that select data for tiering, and the copy operations that populate and de-populate the SSDs.

Storage tiering is still in its early stages and the industry is going to learn a great deal about this technology over the next several years. Performance models will certainly evolve as key variables are identified, which will almost certainly include server and application components.

Array 1: 160 SATA disk drives; 80% reads, no SSDs

Total IOPS of all drives in the array: 12,800

IOPS delivered to all servers w/RAID 5: 8,000

IOPS delivered to all servers w/RAID 10: 10,667

IOPS delivered to all servers w/RAID 6: 5,998

Array 2: 160 SATA disk drives; 80% reads - 8 SSDs; 50% reads

Total IOPS of all drives in the array: 52,800

IOPS delivered to all servers w/RAID6 (SATA) & RAID5 (SSD): 21,998

Array 3: 160 SATA disk drives; 80% reads - 32 SSDs; 50% reads

Total IOPS of all drives in the array: 172,800

IOPS delivered to all servers w/RAID6 (SATA) & RAID5 (SSD): 69,998

Array 4: 480 SATA disk drives; 80% reads - 32 SSDs; 50% reads

Total IOPS of all drives in the array: 198,400

IOPS delivered to all servers w/RAID6 (SATA) & RAID5 (SSD): 81,994
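
For readers who want to check the arithmetic, here is a short Python sketch of the common model behind these estimates: host-visible IOPS are the raw back-end IOPS divided by (read fraction + write penalty x write fraction), with write penalties of roughly 2 for RAID 10, 4 for RAID 5 and 6 for RAID 6. The per-drive figures are the ones used in this post; note that the post's RAID 6 numbers imply a slightly higher effective penalty than 6, so the RAID 6 results below come out a few percent above the figures shown.

    SATA_IOPS = 80          # assumed back-end IOPS per SATA spindle (12,800 / 160)
    SSD_IOPS = 5_000        # the conservative per-SSD figure used in this post

    PENALTY = {"RAID10": 2, "RAID5": 4, "RAID6": 6}   # back-end I/Os per host write

    def delivered(raw_iops, read_fraction, raid_level):
        # Host-visible IOPS once the RAID write penalty is applied.
        write_fraction = 1.0 - read_fraction
        return raw_iops / (read_fraction + PENALTY[raid_level] * write_fraction)

    # Array 1: 160 SATA drives, 80% reads
    sata_raw = 160 * SATA_IOPS                          # 12,800 raw IOPS
    print(round(delivered(sata_raw, 0.8, "RAID5")))     # 8,000
    print(round(delivered(sata_raw, 0.8, "RAID10")))    # 10,667
    print(round(delivered(sata_raw, 0.8, "RAID6")))     # 6,400 (the post shows 5,998)

    # Array 2: 160 SATA at 80% reads on RAID 6, plus 8 SSDs at 50% reads on RAID 5
    ssd_raw = 8 * SSD_IOPS                              # 40,000 raw IOPS
    total = delivered(sata_raw, 0.8, "RAID6") + delivered(ssd_raw, 0.5, "RAID5")
    print(round(total))                                 # ~22,400 (the post shows 21,998)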

Conclusions

Even a relatively small amount of SSD storage can boost performance roughly four times, as shown by Arrays 1 and 2 above, where eight SSDs totaling 400GB were added to an all-SATA configuration. It's also interesting to note the performance difference between Arrays 3 and 4. Although the number of SATA drives tripled, the IOPS performance increased only about 17%.

Single threaded applications that are idle while storage I/Os complete

People ask about Microsoft Exchange and I tell them it benefits a great deal from big, wide striping, but not much from tiering because Exchange performance is mostly a matter of providing adequate throughput.

An app that people run daily but is seldom associated with transaction processing is backup.

March 11, 2010

About 10 years ago a small team that I was a part of looked at starting a company that would do something similar to what IBM's SVC does. The idea was to create a SAN front end controller with a lot of cache memory that would virtualize "downstream" storage and provide performance boosts through various techniques such as caching, striping, and multi-way mirroring. We gave up on the idea when it became apparent to us that the project was quite a bit larger than we initially thought and it was unclear when we would ever have sufficient resources to get a competitive product to market. I think we could have sold the idea to venture capital investors who were throwing money at storage startups, but we couldn't sell it to ourselves. For those of you that wonder why I tend to think SVC is an important product, that's why - I know some of the things IBM did to make it work and I admire their ability to bring it to market.

Anyway, one of the hurdles we couldn't get past was how to deal with mixed workloads originating from multiple SAN attached servers simultaneously streaming data from data warehouses and IOPS from transaction databases and all sorts of other bursty, unpredictable applications competing for memory and disk resources. As a front-end appliance, you can control your own cache resources, but you can't do much about the back end disks because downstream arrays mask them from the appliance. In fact, there was no way to predict the performance of the downstream arrays for any given workload. We consulted with experts from the industry and research universities and they were all discouraging about the ability to significantly improve the performance of mixed workloads in a shared SAN appliance.

There were two issues: the fact that downstream arrays already had their own caches, and the dynamics of sharing cache resources in an appliance. If these fundamental problems could be solved, the rest of the work was the not-so-simple matter of making it work as a cluster with mirrored cache for HA.

Uncoordinated multi-level caches have the problem of redundant data. For example, pre-fetching data into the appliance's cache will likely end up loading the same data into the downstream array's cache. With duplicated cache data, a cache hit on the appliance won't be that much faster than a cache hit in the downstream array - and a cache miss in both will be much slower. It's difficult to prove the value of THAT. It's certainly possible to tune the two caches differently, but this turns out to be easier said than done and tends to undercut the value of an "easy to use" appliance.
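
A toy experiment makes the point. The Python sketch below compares a single array cache, an appliance cache stacked uncoordinated in front of an equal-sized array cache, and one coordinated cache with the combined capacity, all seeing the same skewed access stream. The cache sizes and the workload are invented for illustration only.

    import random
    from collections import OrderedDict

    class LRUCache:
        def __init__(self, size):
            self.size, self.data = size, OrderedDict()
        def access(self, block):
            hit = block in self.data
            if hit:
                self.data.move_to_end(block)
            else:
                self.data[block] = True              # fill on miss
                if len(self.data) > self.size:
                    self.data.popitem(last=False)    # evict least recently used
            return hit

    def run(caches, accesses):
        # Send each access down the cache stack; count a hit if any level hits.
        hits = 0
        for block in accesses:
            for cache in caches:
                if cache.access(block):
                    hits += 1
                    break
        return hits

    random.seed(1)
    stream = [random.randint(0, 999) if random.random() < 0.8
              else random.randint(1000, 99_999) for _ in range(100_000)]

    print("array cache alone      :", run([LRUCache(1000)], stream))
    print("appliance + array cache:", run([LRUCache(1000), LRUCache(1000)], stream))
    print("one pooled 2000 cache  :", run([LRUCache(2000)], stream))
    # Typical result: stacking a second, uncoordinated cache in front buys far
    # less than pooling the same memory into one cache, because both levels
    # keep filling with copies of the same blocks on every appliance miss.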

So, the way to overcome this is to make a HUGE cache in the appliance. Increasing a cache's size can do a lot for performance, and as Chuck Hollis wrote:

The exception, of course, is if you've got the bucks to create a ginormous read cache, and pull almost all the significant data into memory. Don't snicker -- there are a few use cases where this sort of approach makes sense.

In other words, create a RAM disk in cache. This technique is normally reserved for a single, high profile application and as Chuck wrote, there are use cases where this makes sense. But it doesn't address the requirements of mixed workloads where there are a large number of applications that do not merit dedicated memory, but still need good performance. It might be possible to micro-manage cache for some number of applications by dedicating cache for each of them, but that requires a great deal of work that is likely going to be a temporary solution lasting a couple of months at most. It's probably a great way to drive storage admins crazy.

A more palatable approach is to use a global cache that shares cache resources among multiple servers and their applications. In some cases, it's possible to predict workload demands (end-of-month processing, for example), but in many cases the instantaneous performance requirements cannot be predicted because they are driven by spontaneous events. As many people know all too well, spikes in Internet activity and the corresponding bottlenecks in back-end storage clearly illustrate the challenges of mixed workloads. Global caches that can accommodate large Internet traffic spikes are expensive and do not provide noticeable performance advantages most of the time - they are overkill.

The proliferation of virtualized servers has significantly increased the breadth of the mixture that a storage array has to deal with. In general, there is more overall I/O activity that is, for the most part, less predictable, and therefore more difficult to deal with in cache.

The question is, is storage tiering with SSDs any better? Possibly. If it is going to be more effective than caching, it needs to provide more control points and intelligence than cache typically does. For instance, the ability to prioritize and schedule applications for movement into SSD tiers could be an important difference. 3PAR's QoS Gradient concept in Adaptive Optimization is an example of a simple prioritization scheme. The counterpart to QoS Gradients is internal monitoring of I/O levels, which helps determine which applications are promoted.

That's not to say caching can't have some of the same controls, but traditionally caching has been more reactive than proactive. To be clear, tiering is also reactive, but within the context of intelligent preparation and business-driven policies.

Still, considering the cost of SSDs, you have to ask whether tiering is overkill too. It might be. If the array can keep up most of the time, how much SSD capacity should be purchased? These are the sorts of things that will be determined over the next couple of years.

The best technology to date for dealing with mixed workloads continues to be wide striping. If you throw out latency-sensitive applications from the mix - those are the same apps Chuck Hollis referred to when he talked about the applicability of "ginormous read caches" - then the array just has to provide adequate throughput at reasonable service times (latencies).

Wide striping does this by spreading the workload mix over as many disk drives as possible. Not the number of drives that fit in a shelf or can be added to a RAID group, but all the drives in an array at best, or all the drives of a particular class, say SATA or FC. Striping very thin layers of data across hundreds of drives means that hundreds of servers can access data simultaneously with a minimal amount of contention. The result is that all the drives are kept busy at the same rate and none of them bears an unfair burden. The overall sustainable throughput is very high, scales by adding more drives and, in general, fits the profile of mixed workloads better than any affordable caching scheme does.
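
Here is a small Python sketch of that layout idea: volumes cut into small chunklets and placed round-robin across every drive, so each drive carries a thin slice of every volume. The chunklet size, drive count and starting offsets are invented for illustration; this is not 3PAR's actual layout algorithm.

    from collections import Counter

    CHUNKLET_MB = 256                          # assumed chunklet size
    drives = [f"pd{i}" for i in range(200)]    # 200 physical drives of one class

    def wide_stripe(volume_name, size_gb, start_offset):
        # Map a volume's chunklets onto all drives, round-robin from an offset.
        n_chunklets = size_gb * 1024 // CHUNKLET_MB
        layout = {}
        for c in range(n_chunklets):
            drive = drives[(start_offset + c) % len(drives)]
            layout.setdefault(drive, []).append((volume_name, c))
        return layout

    # Lay out three very different volumes; stagger the starting drive so the
    # first drives don't always carry the first chunklet of every volume.
    load = Counter()
    for i, (name, size) in enumerate([("oltp_db", 500), ("mail", 2000), ("vdi_gold", 50)]):
        for drive, chunklets in wide_stripe(name, size, start_offset=i * 37).items():
            load[drive] += len(chunklets)

    # Every drive ends up with essentially the same number of chunklets, so a
    # burst against any one volume is spread across all 200 spindles.
    print(min(load.values()), max(load.values()))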

We've become accustomed to thinking that more memory is the answer to all storage performance problems, but that doesn't exhaust the potential of massed disk drives.

March 10, 2010

There have been some interesting discussions lately about storage tiering. And even though 3PAR beat most everybody else to the punch this week with our AO announcement, I think it's important to keep things in perspective - storage tiering does not solve everybody's problems. Here is my top-10 list of reasons some people don't want to embrace tiering:

1) Tiering works by copying data from lower-cost, low-IOPS storage to high-IOPS storage - and back again. Storage tiering has been associated with ILM, which assumed data is initially located on more expensive, high-IOPS storage and, as it ages and is accessed less frequently, is moved to lower-cost, low-IOPS storage. The perception that tiering implies fast-to-slow data migration was reinforced by Compellent with its early-entrant storage tiering technology, Data Progression.

The economic benefits of tiering are much more compelling if data is originally located on low-IOPS storage and then moved to high-IOPS storage when it becomes useful to do so. This reduces the amount of high-IOPS storage that needs to be purchased and reserves high-IOPS storage for the applications that need it the most. This model of promoting data to high-IOPS storage will replace the old model of data "trickling downhill to cheap storage."

2) Sub-volume tiering means high-IOPS storage can be reserved for high-IOPS work and effectively shared by the applications that need it the most. AO copies data in 128 MB sub-volume regions that contain specific RAIDed volume slices. Many physical and virtual servers can have their volume's most active regions located in high-IOPS storage capacity at the same time.

Data redundancy is accomplished when AO reads data from its source region and restripes it into a region on the target tier - using the RAID level of the target. AO allows data to be protected by whatever RAID level is appropriate for the tier and the data. 3PAR's chunklet architecture is maintained for SSDs, which means the SSDs in an InServ array can serve several different RAID levels simultaneously. Every vendor's sub-volume tiering technology will be different, including the number of ways devices can be combined in RAID and how wide striping can be applied.

3) Tiering does not mean you have to buy SSDs to make it pay off. Tiering is a cost-reduction technology. One of the most obvious ways to reduce the cost of storage is to buy cheaper disks with higher capacity, such as SATA drives.

The regions used by AO are the same on-disk structures that 3PAR uses for its Dynamic Optimization (DO) software, which re-levels volumes across the disk drives in an InServ array. A customer with all FC drives in an InServ array could take advantage of both AO and DO by increasing the capacity of the array with SATA drives, using Dynamic Optimization to redistribute their volumes across the SATA drives, and then using the FC drives as their high-IOPS AO tier. This way, they can continue to get the I/O rates they expect, but reduce the cost of incremental capacity as they upgrade their system.

4) The system determines what to move and how to move it. I/O density rate is a term that refers to how much data access occurs in a region over a given amount of time. AO recognizes region candidates for tiering by their I/O density rates.

Administrators control AO participation for each volume by assigning it to an AO Profile and a QoS Gradient. The profile is a short stack of device/RAID levels, such as SATA RAID 6, FC RAID 5 (7+1) and SSD RAID 5 (3+1). AO allows either two or three device/RAID levels in the profile's stack.
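
Here is a simplified Python sketch of how region-based tiering along these lines can work: count I/O per 128 MB region over an interval, turn the counts into I/O densities, and promote the densest regions up the profile's stack while near-idle regions drift down to SATA. The thresholds, counters and names are invented for illustration; this is not 3PAR's AO implementation.

    REGION_MB = 128
    INTERVAL_SEC = 3600

    # An AO-style profile: an ordered stack of device/RAID tiers, cheapest first.
    profile = ["SATA RAID6", "FC RAID5 (7+1)", "SSD RAID5 (3+1)"]

    # Hypothetical per-region I/O counters gathered during the interval,
    # keyed by (volume, region_index).
    io_counts = {
        ("oracle_redo", 0): 540_000,
        ("oracle_data", 17): 96_000,
        ("file_share", 3): 1_200,
        ("file_share", 9): 40,
    }

    def plan_moves(io_counts, current_tier, ssd_capacity_regions=2):
        # Promote the densest regions to the top tier, demote the coldest.
        density = {r: n / INTERVAL_SEC for r, n in io_counts.items()}   # IOPS per region
        ranked = sorted(density, key=density.get, reverse=True)
        moves = []
        for rank, region in enumerate(ranked):
            if rank < ssd_capacity_regions and density[region] > 10:
                target = profile[-1]            # hottest regions go to the SSD tier
            elif density[region] < 0.1:
                target = profile[0]             # nearly idle regions go to SATA
            else:
                target = profile[1]             # everything else stays on FC
            if target != current_tier.get(region):
                moves.append((region, current_tier.get(region), target))
        return moves

    current_tier = {r: "FC RAID5 (7+1)" for r in io_counts}
    for region, src, dst in plan_moves(io_counts, current_tier):
        print(region, src, "->", dst)   # the array would then restripe the 128 MB
                                        # region using the target tier's RAID level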

Another discussion was around using policies to automate the process. One group was a bit concerned about automating this process but realized that, again, with PBs of data being stored, the only way to effectively implement intelligent tiered storage is via automation. Additionally, it is not an all-or-nothing proposition. You can select certain volumes and applications to implement and gain a comfort level before deploying more widely. One of the key tenets of technology is to automate otherwise manually cumbersome processes. We just need to get over that hurdle, but we need to do so in a planned, considered and reasoned way.

By applying measured I/O density rates, AO profiles and QoS Gradients, 3PAR has taken the first major steps to automating storage tiering and removing the burden from administrators.

....it spreads a small amount of SSD amongst the 3PAR engines so the IO’s aren’t all going to a single drive and sucking up a lot of bandwidth – it’s nicely balanced. Traditional implementations will use larger drives with more IO’s going to that drive. The part of the array with that drive will get more activity.

In practice we don’t think this will matter all that much because, for example, EMC’s V-Max has more bandwidth to play with than 3PAR and EMC uses its cache to transfer data between tiers to avoid bottlenecks. Nonetheless, on paper, the 3PAR implementation looks to be more efficient which means (in theory) it can do more with less flash. But nobody really knows yet.

3PAR storage arrays avoid I/O bottlenecks by incorporating tiny virtual storage elements (chunklets) and spreading the workload over as many devices and controllers as possible. This approach differs from other vendors', where smaller groups of resources are created and then combined into larger constructs that are more cumbersome to manage and tune than a single widely distributed storage span. The same concepts apply to SSD integration, where InServ arrays accommodate many smaller-sized SSDs for scaling out high-IOPS tiers for those customers that may want to expand their use of AO in the future.

December 09, 2009

Chris Evans, the Storage Architect, had an unexpected analysis of EMC's FAST announcement today on his blog. The point he makes is that with FAST only being available on Intel-based array architectures - and not the DMX product line - EMC has put themselves at a peer level with the rest of the industry. As he says:

So what’s my point? Well, simply this; EMC have legitimised the enterprise modular architecture characterised by V-Max. This accepts that the future is commodity-based hardware with differentiation in software. However, EMC are no longer the leaders in this field and are having to play catch up.

It's an interesting angle to be sure and it will be interesting to see how EMC's new platform works out after all the bluster and FUD over FAST is long forgotten. EMC is going to be busy in the years to come as they try to convert their DMX customers to V-Max customers. All vendors go through this sort of thing when they introduce new architectures, and even their most loyal customers are more or less forced to consider alternatives. There is a lot of pressure on EMC all of a sudden to bring software to market that justifies the switch from DMX to V-Max. FAST has failed to do that so far.

EMC's pre-announcement of FAST earlier this year was presumably done to shore up their disappointing SSD sales - a tactic that apparently did not help their enterprise business much. But having played the FAST hand so early, they had no choice but to release FAST version 0.5 before the end of the year. Will this help answer the questions about EMC's ability to develop enterprise-level software for V-Max? The answer has been loud and clear - let's wait for FAST 2.0.

EMC was way too early with SSDs and they just repeated the same mistake with FAST. Instead of creating an advantage for themselves, they allowed their competitors to observe what they are doing without applying any pressure. Instead, unexpectedly, the pressure is now on EMC to do better.

Considering that 2009 was a year of backpedaling for EMC, I want to offer lessons in the art with this video:

December 08, 2009

3PAR has been designing and marketing its agile storage for years. Recently, Hitachi started using the word "agile" in their messaging. Imitation is a form of flattery, but at least they should link to our Cloud Agile partner page when they plagiarize like that.

“Over the past two years, EMC has optimized all of its major storage architectures to support new levels of automation required in virtual data center environments and for the transition to private clouds. These dynamic and agile environments require the capabilities that EMC’s FAST technology provides for effective management and scaling. This is an area of significant investment for EMC and we are the only one offering this technology across high-end, mid-tier and unified platforms.” - Pat Gelsinger, EMC President and Chief Operating Officer, Information Infrastructure Products

Online data movement is a big deal for storage and FAST puts it squarely in the mainstream of storage software. Gelsinger is right about virtual environments being dynamic and agile, but the problem with FAST is that the underlying storage does not measure up and FAST shines a spotlight on those shortcomings. If you are going to have online data movement, you need to have the ability to make the results of those data movement operations productive - without turning your storage utilization goals upside down. FAST undoubtedly will work great where EMC storage is under-utilized and very expensive, but how will it work when you are trying to reduce the cost of storage by increasing storage utilization?

It also looks like EMC is backing away slightly from their bullish position on SSDs. I was wondering how they were going to deal with their disappointing SSD sales when they finally announced FAST.

Speaking of bullish, I found this little video that I thought I'd share with you of some truly agile guys that really know their bull!

December 07, 2009

While some have wondered whether or not EMC was going to be able to get their FAST product announced this year, others have been keeping their heads down getting work done. For instance, Symantec, who today announced their Dynamic Storage Tiering feature for SSDs in their Storage Foundation products.

Of course, many probably wonder why the heck I'm writing about this, seeing how 3PAR doesn't have support for SSDs in its products yet. That's easy. A V-Max with FAST is going to be more of a science project than a production system that actually gets work done. I'm more concerned about customers buying somebody else's SSD-enabled array and using it with Symantec's Dynamic Storage Tiering than I am about FAST on V-Max.

But I'm not that concerned, because the SSD market has been slow to develop - even at EMC, the company that has been leading the charge (and the inventory buildup). And even though they like to point to the check boxes in the feature checklist, big wide striping (the kind that flattens disk contention problems) with a V-Max is still an exercise in storage contortionism, unlike on a 3PAR, where it's the default behavior.

But all this will change over time, and the cost of deploying SSDs is going to come down as enabling software for SSDs becomes available and is refined. Even EMC's software will slowly get better. Despite the performance of the SSD devices - and EMC's wishes - the race for best-of-breed SSD functionality is going to be much more of a marathon than a sprint.

So where's 3PAR? We've been shipping DO (Dynamic Optimization) for 4 years and today we announced the policy advisor for automating most of the tasks that used to take admins time to work through. Changing service levels for volumes just got a lot easier. Drive speed, RAID levels, additional drives - if you want to change them, you can - and it doesn't take weeks or months and piles of spreadsheets to figure it out.