Tag Archives: Storage Center

Scale-out architecture and why it is important when architecting a storage solution.

I had an interesting discussion with an architectural firm the other day. Most of the discussion was around scaling for the future. In our discussion we talked about the linear scalability of the ISE technology and he pointed out that while that made a ton of sense for his block-access requirements he was a little concerned around the unstructured data, as well as some plans utilizing NFS for some of his server and desktop virtualization needs. The last thing he wanted to worry about was changing his architecture in 12 to 24 months due to growth or technology changes. So we started working on architecting a solution utilizing our new “scale-out” ISE-NAS solution.

You’ve probably heard a lot about scale-out type architectures. 3PAR sort of led the way with their ability to scale out their storage controllers (at least to eight) in front of their fixed-backend, backplane-attached disk drives, and it offers up a pretty unique solution (at least in a block storage architecture). 3PAR’s problem is they don’t really have an answer for the same scalability around unstructured data (NAS). Don’t get me wrong, they list 5 NAS companies on their website, but 1 is out of business and the other 4 have either been acquired by their competitors or are straight-up competitors. This scale-out architecture seems to have caught on in the emerging NAS gateway devices like Symantec FileStore and Isilon. Clearly FileStore and Isilon take very different approaches to scale-out architecture. More below.

So first things first, let’s describe what a “scale-out” architecture means, at least to me that is. When architecting solutions, it’s always important to put a solution together that can grow with the business. In other words, they know what they need today, and they have an idea what they might need in 12 months, but 24 to 48 months is a complete crapshoot. They could be 5X the size, or just 2X the size, but the architecture needs to be in place to support either direction. What is sometimes not discussed is what happens when you run out of front-side processing power, backend IOPS or usable capacity. Most storage solutions give you 1 or 2 clustered controllers, and a fixed number of disk drives they can scale to, dependent on the specific controller you purchase. On the front end, most NAS solutions only scale to 2 nodes as well. If you need more processing power, more backend IOPS or capacity, you buy a second storage solution or you spend money to upgrade storage controllers that are not even remotely close to being amortized off the CFO’s books. If you look at the drawing above, you can clearly see what scale-out architecture should look like. You need more front-side processing, no problem. You need more backend IOPS or capacity, no problem. They scale independently of each other. It is no longer a case of “You love your first [insert storage/NAS solution of choice] and you hate your third, fourth, etc.” Isilon is probably a great example of that. They tout their “scale-out” architecture but it clearly has some caveats. For example, if you need more processing power, buy another Isilon, you need more capacity buy another Isilon, you need more backside IOPS…well you get the idea 🙂 It’s not a very efficient “scale-out” architecture. It’s closer to a scale-up!!

Let’s also not lose sight of the fact that this is a solution that will need to be in place for about 4 to 5 years, or the amount of time in which your company will amortize it. The last thing you want to have to worry about is a controller upgrade, or a net-new purchase because you didn’t size correctly or you under/over guessed your growth or, even worse, years 4 and 5 hardware maintenance. This is especially true if the vendor “end of life’d” their product before it was written off the books!!! Cha-CHING.

So this company I was working with fluctuates with employees depending on what jobs they are working on. It could go from 50 people to 500 people at a moment’s notice and while they would LOVE to size for 500, most of the time they were around 50 to 100. So as I mentioned above, we started architecting a solution that incorporated our ISE-NAS solution based on Symantec’s FileStore product. When coupled with our Emprise 5000 (ISE), it gives them the perfect scale-out solution. They can start with 2 nodes and grow to 16 by simply adding NAS engines (x86) to the front end. If they need more capacity, or backend IOPS, we can scale in any direction independent of the rest of the solution. Coupled with our predictable performance, we gave them the ultimate ability to size for today, and know exactly what they can scale to in the future.

In the world of “Unified Storage”, cloud computing and 3 to 5 year project plans, it’s important to consider architecture when designing a solution to plan for the future. Scale-out architecture just makes a lot of sense. BUT, do your homework. Just because two vendors both say “scale-out” doesn’t mean their architectures are the same. Dual-clustered controllers, or even eight-way, will eventually become the bottleneck and the last thing you want to worry about is having to do a wholesale swap-out/upgrade of your controller nodes to remove the bottleneck or, worse, having to buy a second (or third) storage solution to manage!!

If you are a VMware admin, or a hypervisor admin in general, Xiotech’s “Virtual View” is the final piece to the very large server virtualization puzzle you’ve been working on. In my role, I talk to a lot of server virtualization admins and their biggest heartburn is adding capacity, or a LUN, to an existing server cluster. With Xiotech’s Virtual View it’s as easy as 1, 2, 3. Virtual View utilizes CorteX (RESTful API) to communicate, in the case of VMware, with the Virtual Center appliance to provision the storage to the various servers in the cluster. From a high level, here is how you would do it today.

I like to refer to the picture below as the “Rinse and Repeat” part of the process. Particularly the part in the middle that describes the process of going to each node of the server cluster to do various admin tasks.

VMware Rinse and Repeat process

With Virtual View the steps would look more like the following. Notice it’s “wizard” driven with a lot of the steps processed for you. But it also gives you an incredible amount of “knob turning” if you want it.

Virtual View Wizard Steps

And for those that need to see it to believe it, below is a quick YouTube video Demonstration.

If you run a VMware Specific Cluster (For H.A purposes maybe) of 3 servers or more, then you should be most interested in Virtual View !!!

I’ll be adding some future Virtual View specific blog posts over the next few weeks, so make sure you subscribe to my blog on the right-hand side of this window!!

If you have any questions, feel free to leave them in the comments section below.

By the way, if by chance 10,000 is just not enough users for you, don’t worry: add a second ISE and DOUBLE IT TO 20,000. Need 30,000? Then add a THIRD ISE. That’s 100,000 users in 10 ISE, or 30U of rack space. Sniff Sniff….I love it !!!!!!!!!!!!

By the way – Check out what others are doing:

Pillar Data = 8,500 Exchange Users with 24GB of Cache !!! I should say, our ISE comes with 1GB. It’s not the size that counts, it’s HOW YOU USE IT !! 🙂

What does the Pacer, Yugo and Arbitrated Loop have in common? You are probably running one of them in your datacenter.

George Crump recently blogged over at InfoWorld and asked, “do we really need Tier 1 storage?” It struck me as an interesting topic and while I disagreed with his reasons on where he put our solution, I tend to agree that the others mentioned are right where they should be. In his article he specifically mentions some of the reasons both the monolithic array manufacturers as well as the “modular guys” have “issues” and he zeroed in on performance and scalability. Now his article was speaking about the front end controllers, but I think he missed out on pointing to the backend architectures as well. I thought this would make a great blog posting 🙂 As you recall, in my “Performance Starved Applications” blog and my “Why running your hotel, like you run your Storage array can put you out of business” blog I said that if you lined up the various storage vendors next to each other, about the only difference is the logo and the software loaded on the controllers.

Did you also know that if you looked behind those solutions you would see a large hub architecture, also known as our dear old friend “Mr. Arbitrated Loop”? This is like running your enterprise-wide Ethernet infrastructure on Ethernet hubs. Can you imagine having to deal with them today? For all the same reasons we dropped Ethernet hubs like a bad habit, you should be doing the same thing with your storage array manufacturer if they are using arbitrated loops in their backend storage. Talk about a huge bottleneck to both capacity and performance at scale!! So what’s wrong with Fibre Channel Arbitrated Loop (FCAL) on the backend? Well for starters it doesn’t scale well at all. Essentially you can only address 126 components (for example a disk drive) per loop. Most storage arrays support dual loops, which is why you typically see a lot of 224 drive solutions on the market today, with 112 drives per loop. That approaches the limit and creates a very long arbitration time. Now, for those that offer more, it’s usually because they are running more loops (typically by putting more HBAs in their controller heads) on the backend. The more loops on the backend, the more you have to rely on your controllers to manage this added complexity. When you are done reading my blog post, go and check out Rob Peglar’s blog post around storage controller “Feature Creep” called “Jack of All Trades, Master of NONE!” At the end of the day the limitations of FCAL on the backend are nothing new.
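To put some rough numbers on the loop math, here is a minimal Python sketch. It only uses the two figures quoted above (126 addressable components per loop and 112 drives per loop); the function names are made up for illustration, and it ignores anything else that might take up addresses on a real loop.

```python
import math

LOOP_DEVICE_LIMIT = 126  # addressable components per arbitrated loop (figure from the text)


def loops_needed(total_drives, drives_per_loop=112):
    """Number of backend loops required to attach total_drives."""
    if drives_per_loop > LOOP_DEVICE_LIMIT:
        raise ValueError("a single loop cannot address that many drives")
    return math.ceil(total_drives / drives_per_loop)


def loop_fill(drives_per_loop=112):
    """Fraction of a loop's address space consumed by drives."""
    return drives_per_loop / LOOP_DEVICE_LIMIT


if __name__ == "__main__":
    for drives in (224, 448, 1000):
        print(f"{drives} drives -> {loops_needed(drives)} loops")
    print(f"112 drives fill {loop_fill():.0%} of one loop")
```

Every extra loop in that output is one more thing the controller heads have to manage, which is the complexity problem described above.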

About 4 years ago we at Xiotech became tired of dealing with all of these issues. We rolled out a full Fabric backend on our Magnitude 3D 3000 (and 4000) solution. We deployed this in a number of accounts. Mostly it was used for our GeoRAID/DCI configuration where we split our controllers and bays between physical sites up to 10 km apart. Essentially each bay was a loop all to itself, directly plugged into a fabric switch. Fast forward to our Emprise product family and we’ve completely moved away from FCAL on our backend. We are 100% FULL, Non Blocking, Sweet and as pure as your mama’s homemade apple pie Fabric with all of the benefits that it offers!!

My opinion (are you scooting towards the front of your chair in anticipation?) is that unless you just enjoy running things on hubs, I would STRONGLY advise that if you are looking at a new purchase of a Storage Array you should make sure they are not using 15-year-old architecture on their backend!! If you are contemplating architecting a private cloud, you should first go read my blog post on “Building resilient, scalable storage clouds” and apply the points I’ve made to that endeavor. Also, if you really are trying to make a decision around what solution to pick I would also suggest you check out Roger Kelley (@storage_wonk) over at http://www.storagewonk.com/. He talked about comparing storage arrays “Apples to Apples” and brought up other great differences. Not to mention, Pete Selin (@pjselin) over at his blog talked about “honesty in the Storage biz” which was an interesting take on “Apples vs Apples” relative to configurations and pricing. Each of these blog posts will give you a better understanding of how we differentiate ourselves in the market.

Recently I wrote about why “Cost per raw TB” wasn’t a very good metric for comparing storage arrays. In fact, my good friend Roger Kelley over at StorageWonk.com wrote a nice blog specifically on comparing storage arrays “apples to apples”. We don’t say this as a means to simply ignore some of the features and functions that some of the other vendors offer. It’s just our helpful reminder that there is no “free storage lunch”.

So let me take you on a different type of journey around “cost per raw TB” and “cost per useable TB” and apply it to something outside of technology. Hopefully this will make sense!!

Let’s assume you are in the market for a 100 room hotel. You entertain all sorts of realtors that tell you why their hotel is better than the others. You’ve decided that you want to spend about $100,000 for a 100 room hotel, which averages about $1000 per room. So, at a high level all the hotels offer that same cost per room. Let’s call this “Cost per raw occupancy”. It’s the easy way to figure out costs and it looks fair.

You narrow down your list of hotels to three choices. We’ll call them hotel C, hotel N and hotel X. Hotel C and N have the same architecture, same basic building design, essentially they look the same other than names and colors of the buildings. Hotel X is unique in the fact that it’s brand new and created by a group that has been building hotel rooms for 30+ years with each hotel getting better and better. They are so confident in their building that it comes with 5 years of free building maintenance.

So, you ask the vendors to give you their “best practice, not to exceed hotel occupancy rate”. Hotel C tells you they have some overhead associated with some of their special features so their number is about 60 rooms that could be rented out at any given time. The reservation system will let you book an unlimited number of rooms, but once you get over 60 things just stop working well and guests complain. Hotel N says they can do about 70 rooms before they have issues. Hotel X says they have tested at 96 rooms of occupancy without any issues at all.

So, while at a high level hotels C, N and X were $1000 a room, after further review hotel C is about $1,667 a room, hotel N is about $1,429 a room and hotel X is about $1,042 a room. Big difference!! Let’s assume each of these vendors could “right size” their hotel to meet your 100 room request but the room cost will stay the same. So, hotel C would now cost you about $167,000, hotel N about $143,000 and hotel X about $104,000. So that, my friend, is what I like to call “Cost per useable occupancy”!!

Another way to do this is to right size hotel X down to the useable occupancy of hotels C and N. If the $100,000 is the most important number and you understand that you will only get to rent out 60 or 70 rooms from the other hotels, then you could save money with hotel X by purchasing just enough rooms to rent out 60. At 96% occupancy that works out to about $62,500, a nice savings of roughly $37,500!! The net-net is you get at least 60 rentable rooms from any of the 3 hotels but 1 offers you a HUGE savings.
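If you want to run the hotel numbers yourself, here is a minimal Python sketch. It uses only the figures from the example ($1000 per raw room, 100 rooms, and best practice occupancy of 60, 70 and 96 rooms). The function names are mine, and it allows fractional rooms, so treat the totals as estimates rather than a quote.

```python
RAW_ROOMS = 100
PRICE_PER_RAW_ROOM = 1_000  # $100,000 / 100 rooms, from the example

# Best practice "not to exceed" occupancy quoted by each hotel
USABLE_ROOMS = {"C": 60, "N": 70, "X": 96}


def cost_per_usable_room(usable, raw_rooms=RAW_ROOMS, price=PRICE_PER_RAW_ROOM):
    """Purchase price divided by the rooms you can actually rent out."""
    return raw_rooms * price / usable


def cost_for_usable_rooms(target, usable):
    """Price of a right-sized hotel that delivers `target` rentable rooms."""
    return target * cost_per_usable_room(usable)


if __name__ == "__main__":
    for name, usable in USABLE_ROOMS.items():
        print(
            f"Hotel {name}: ${cost_per_usable_room(usable):,.0f} per usable room, "
            f"${cost_for_usable_rooms(100, usable):,.0f} for 100 usable rooms, "
            f"${cost_for_usable_rooms(60, usable):,.0f} for 60 usable rooms"
        )
```

Swap rooms for TB and occupancy for your array’s best practice utilization and the same two functions give you cost per usable TB.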

At the end of the day, as the owner of that hotel you want as many rooms rented out as possible. The last thing you want to see happen is your 100 room hotel only capable of 60% or 70% occupancy.

So, if you are in the market for a 100 room hotel, or a Storage Array, you might want to spend a little more time trying to figure out what their best practice occupancy rate is !! It’ll save you money and heartburn in the end.

I’ll leave you with this – based on the array you have today, what do you think your occupancy rating would be for your 100 room hotel? Feel free to leave the vendor name out (or not) 🙂

How to build resilient, scalable storage clouds and turn your IT department into a profit center!!

If you’ve been living under a rock for the last year the topic of Cloud based computing might be new to you. Don’t worry about it at this point, there are CLEARLY more questions than answers on the subject. I get asked at just about every meeting what my interpretation of “cloud” is. I will normally describe it as an elastic, utility based environment that when properly architected, can grow and shrink as resources are provisioned and de-provisioned. It’s a move away from “silo based” infrastructure and into a more flexible and scalable, utility based solution. From a 30,000 foot view, I think that’s probably the best way to describe it. Then the conversation usually rolls to “so, how do you compare your solution to others” relative to cloud. Here is what I normally talk about.

First and foremost we have sold solutions that are constructed just like everyone else’s. Our Magnitude 3D 4000 product line is built with pretty much the exact same pieces and parts as Compellent, NetApp FAS, EMC Clariion, HP EVA, etc.: Intel-based controller motherboards, Qlogic HBAs, and Xyratex or other SBOD drive bays connected via arbitrated loops. Like I’ve said in prior posts, just line each of these up, remove the “branding” and you wouldn’t be able to tell the difference. They all use the same commodity parts. Why is this important? Because none of those solutions would work well in a “Cloud” based architecture. Why? Because of all the reasons I’ve pointed out in my “Performance Starved Application” post, as well as my “Cost per TB” post. THEY DON’T SCALE WELL and they have horrible utilization rates. If you really want to build a storage cloud you have to zero in on the most important aspects of it, or what I like to refer to as “The Fundamentals”.

First you MUST start with a SOLID foundation. That foundation must not require a lot of “care and feeding” and it must be self healing. With traditional storage arrays, you could end up with 100, 200 or even 1000 spinning disks. Do you really want to spend the time (or the HUGE maintenance dollars) swapping out, and dealing with bad disks? Look don’t get me wrong, I get more than a few eye rolls when I bring this up. At the end of the day, if you’ve never had to restore data because of a failed drive, or any other issue related to failed disks then this is probably not something high on your list of worries. For that reason, I’ll simply say why not go with a solution that guarantees that you won’t have to touch the disks for 5 years and backs it up with FREE HARDWARE MAINTENANCE (24/7/365/4hr)!! Talk about putting your money where your mouth is. From a financial point of view, who cares if you’ve never had to mess with a failed drive, it’s freaking FREE HARDWARE MAINTENANCE for 5 years!!

Secondly, it MUST have industry leading performance. Not just “bench-marketing” type performance, I mean real audited, independent, third party, validated performance numbers. The benchmarks from the Storage Performance Council are a great example of a third party solution. You can’t just slap SSD into an array and say “I have the fastest thing in the world”. Here is a great example; if you are looking at designing a Virtual Desktop Infrastructure then performance should be at the top of your design criteria (boot storms). Go check out my blog topic on the subject. It’s called “VDI and why performance matters”

Finally, you need the glue that holds all of this together from a management and a reporting point of view. WebServices is that glue. It’s the ubiquitous “open standard” tool on which many, many application solutions have been built. We are the only company that builds its storage management and reporting on WebServices, and we have a complete WSDL to prove it. No company epitomizes the value of WebServices better than Microsoft. Just Google “SANMAN XIOTECH” and you’ll see that the folks out in Redmond have developed their own user interface to our solution (our WSDL) to enable automated storage provisioning. HOW AWESOME IS THAT!! Not to mention, WebServices also gives you the ability to do things like develop “chargeback” options, which turn the information technology department into a profit center. We have a GREAT customer reference in Florida that has done this very thing. They’ve turned their IT department into a profit center and have used those funds to refresh just about everything in their datacenter.

So those are the fundamentals. In my opinion, those are the top 3 things that you need to address before you move any further into the design phase. Once your foundation is set, then you can zero in on some of the value added attributes you would like to be able to offer as a service in the cloud. Things like CDP, CAS, De-Duplication, Replication, NAS etc.

Is Storage Performance Predictability important when building VMware Virtual Desktop (VDI) Storage Clouds? This can also apply to Citrix and Microsoft Windows Hyper-V Virtual Desktop Systems.

Here is yet another great example of why I just love my job. Last week at our Xiotech National Sales Meeting we heard from a net-new educational customer out in the western US. They recently piloted a VDI project with great success. One of the biggest hurdles they were running into, and I would bet other storage cloud (or VDI specific) providers are as well, is performance predictability. This predictability is very important. Too often we see customers focus on the capacity side of the house and forget that performance can be extremely important (VDI boot storm anyone?). Rob Peglar wrote a great blog post called “Performance Still Matters” over at the Xiotech.com blog site. When you are done reading this blog, head over to it and check it out 🙂

So, VDI cloud architects should make sure that the solution they design today will meet the requirements of the project over the next 12 months, 24 months and beyond. To make matters worse, they need to consider what happens if the cloud is 20% utilized or if/when it becomes wildly successful and utilization is closer to 90% to 95%. The last thing you want to do is have to add more spindles ($$$) or turn to expensive SSD ($$$$$$$$$) to solve an issue that should have never happened in the first place.

So, let’s assume you already read my riveting, game changing piece on “Performance Starved Applications” (PSA). VDI is ONE OF THOSE PSAs!!! Why is this important? If you are looking at traditional storage arrays (Clariion, EVA, Compellent Storage Center, Xiotech Mag3D, NetApp FAS) it’s important to know that once you get to about 75% utilization, performance drops like my bank account did last week while I was in Vegas. Like a freaking hammer!! That’s just HORRIBLE (utilization and my bank account). Again you might ask why that’s important? Well I have three kids and a wife who went back to college, so funds are not where they should be…..oh wait (ADD moment) I’m sure you meant horrible about performance dropping and not my bank account. So, what does performance predictability really mean? How important would it be to know that every time you added an intelligent storage element (Xiotech Emprise 5000 – 3U) with certain DataPacs you could support 225 to 250 simultaneous VDI instances (just as an example) including boot storms? This would give you an incredible ability to zero in on the costs associated with the storage part of your VDI deployment. This is especially true when moving from a pilot program into a full production roll out. For instance, if you pilot 250 VDI instances, but you know that you will eventually need support for 1000, you can start off with one Emprise 5000 and grow it to a total of four elements. Down the road, if you grow further than 1000 you fully understand the storage costs associated with that growth, because it is PREDICTABLE.
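Here is what that kind of predictable sizing looks like as a minimal Python sketch. The 225 to 250 instances per element and the 3U per element are the example figures from above, not a spec sheet, and the function names are just for illustration.

```python
import math

RACK_UNITS_PER_ELEMENT = 3  # Emprise 5000 is a 3U element (from the text)


def elements_needed(vdi_instances, instances_per_element=250):
    """Storage elements required to host the given number of VDI instances."""
    return math.ceil(vdi_instances / instances_per_element)


def rack_units(vdi_instances, instances_per_element=250):
    """Rack space consumed by those elements."""
    return elements_needed(vdi_instances, instances_per_element) * RACK_UNITS_PER_ELEMENT


if __name__ == "__main__":
    for per_element in (225, 250):
        for target in (250, 1000, 2500):
            print(
                f"{target} instances at {per_element} per element: "
                f"{elements_needed(target, per_element)} elements, "
                f"{rack_units(target, per_element)}U"
            )
```

Notice that the four elements for 1000 instances assumes the top of the range (250 per element). If you size at the conservative 225, the same math says five, which is exactly the kind of thing you want to know before the pilot turns into production.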

What could this mean to your environment? It means if you are looking at traditional arrays, be prepared to pay for capacity that you will probably never be able to use without a severe hit to performance. What could that mean for the average end user? That means their desktop boots slowly, their applications slow down and your helpdesk phone rings off the hook!! So, performance predictability is crucial when designing scalable VDI solutions, and cost management (financial performance predictability) is every bit as critical.

So if you are looking at VDI or even building a VDI Storage Cloud then performance predictability would be a great foundation on which to build those solutions. The best storage solution to build your application on is the Xiotech Emprise 5000.