Building your first storage area network (SAN) may seem like the biggest hurdle to realizing the benefits of storage networking, but if you don't build it correctly, you may find that scaling your SAN infrastructure is even more difficult.

That's what we learned at Intuit Inc. when we decided to implement a SAN three years ago. With business requirements and storage doubling - sometimes tripling - every year, the advantages of achieving greater storage resource utilization through centralization, consolidation and availability were incentive enough to go ahead and be one of the early adopters of SAN technology. As our SANs grew from around 20TB, 128 ports and 60 DLT tape drives to approximately 200TB, 900 switch ports and 140 DLT drives, we encountered unforeseen problems that can plague you if you're not prepared.

One of the challenges was sharing SAN resources and achieving 100% utilization while trying to avoid both high costs and a large team to manage the SAN. We also had to figure out how to protect our initial investment while expanding - you don't want to have to throw out the infrastructure you built when you were relatively small in order to expand.

You can avoid these landmines by not boxing yourself in with a SAN design that can't scale effectively. Understanding what that means concretely, however, is far from obvious.

The right stuff

SAN veterans can frequently be heard muttering "need more tools." You may think that only concerns those with large, complex environments, but bringing in the right tools early can yield immediate benefits in several areas.

Interoperability. When adding new hardware or software to the SAN, or upgrading what's already there, it's important to know all the firmware and driver versions for the switches, storage and HBAs deployed so you can verify that everything is supported. Going to each server to manually check HBA firmware and driver versions in a large SAN, spread across multiple networks with varying security requirements, becomes tedious. We needed a tool that could give us an accurate report; without this information, the network was potentially at risk whenever changes were made to the environment.
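The audit itself is simple once the inventory data is collected in one place. Here's a minimal sketch of the idea, assuming you can export each server's HBA model, firmware and driver versions; the model names, version strings and support matrix below are all hypothetical:

```python
# Hypothetical sketch: check collected HBA inventory against a vendor
# support matrix and flag servers whose stack isn't supported.

SUPPORT_MATRIX = {
    # (hba_model, firmware, driver) combinations known to be supported
    ("LP9002", "3.90a7", "5.2.1"),
    ("LP9002", "3.92a2", "5.2.1"),
}

def audit(inventory):
    """Return servers whose HBA model/firmware/driver combo is unsupported."""
    return [
        host for host, combo in inventory.items()
        if combo not in SUPPORT_MATRIX
    ]

inventory = {
    "db01":  ("LP9002", "3.90a7", "5.2.1"),  # supported
    "web03": ("LP9002", "3.81a1", "4.9.0"),  # stale firmware and driver
}

print(audit(inventory))  # -> ['web03']
```

The hard part in practice is the collection step, not the comparison - which is exactly why a tool that gathers this data automatically pays for itself.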

Planning. When planning scheduled maintenance on your SAN, you need to know the downstream dependencies - either to schedule downtime, or to identify the paths that must be failed over in each fabric, one fabric at a time, to avoid downtime altogether. The critical issue is identifying which applications will be disrupted by breaks in the fabric. A good tool spares you the manual task of determining the paths for each server and which storage arrays they're served from.
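The underlying question - "which applications lose their last path if this component goes down?" - is straightforward to answer once path data exists in a structured form. A toy sketch, with entirely made-up server, switch and array names:

```python
# Illustrative downstream-dependency check: given the switch each path
# traverses, list applications that lose ALL paths when a switch is
# taken down for maintenance. All names are hypothetical.

paths = {
    # server: list of (fabric, switch, array) paths
    "erp01":  [("A", "core-a1", "symm-01"), ("B", "core-b1", "symm-01")],
    "mail02": [("A", "edge-a3", "clar-02")],  # single-pathed!
}
apps = {"erp01": "ERP", "mail02": "Mail"}

def disrupted_by(switch):
    """Applications with no surviving path if `switch` goes down."""
    out = []
    for server, plist in paths.items():
        remaining = [p for p in plist if p[1] != switch]
        if not remaining:
            out.append(apps[server])
    return out

print(disrupted_by("edge-a3"))  # -> ['Mail']
print(disrupted_by("core-a1"))  # -> [] (erp01 fails over to fabric B)
```

Multipathed servers ride out single-fabric maintenance; the single-pathed ones are the applications you must schedule downtime for.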

Scaling. When implementing a SAN or redesigning your current one it's helpful to have a visual diagram of your environment. Without a visual network node diagram, it can be hard to scale or redesign the current architecture effectively.

Reclamation. As projects are added and removed, and applications are modified, storage can fall out of use or be retired without ever being reclaimed. Without a reporting tool to help manage the allocation and usage of the disks in the array, big dollars can go to waste.

There's also money to be saved on storage that's allocated and in use for a particular project: look at the size of the application and calculate the percentage used relative to the overall allocation. This is especially true for databases.
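The math behind a reclamation report is trivial; the value is in running it regularly. A sketch, assuming you can export allocated-vs-used figures per host from your array tools (the hosts, numbers and 25% threshold below are made up):

```python
# Back-of-the-envelope reclamation report: flag hosts using less than a
# threshold of their allocated storage. Data is illustrative only.

def reclaim_candidates(allocs, threshold_pct=25):
    """Hosts using less than threshold_pct of their allocation."""
    return [
        host for host, (alloc_gb, used_gb) in allocs.items()
        if 100 * used_gb / alloc_gb < threshold_pct
    ]

allocations_gb = {
    "hr-db": (500, 120),    # 24% used
    "dw-db": (2000, 1850),  # 92.5% used
    "qa-01": (300, 0),      # allocated, never used
}

print(reclaim_candidates(allocations_gb))  # -> ['hr-db', 'qa-01']
```

Each flagged host is a candidate for shrinking its allocation or reclaiming the disks outright.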

Avoid false economies

Our SAN implementation began as a few isolated monolithic and modular storage arrays with redundant fabrics made up of a few switches. Unix servers running a mix of operating systems connected mostly to the monolithic arrays, while Intel servers connected to the modular arrays. The modular arrays were less expensive, but at the time they lacked the availability, caching and multiple-mirror-copy capabilities of the monolithic arrays, so they were mostly used for smaller applications such as databases running on Intel platforms. Although this changed over time as the software for the modular arrays became competitive with the features of the monolithic arrays, we continued to use the monolithic arrays for the most critical applications.

Ultimately, decisions at a higher level forced us to adopt a method for replicating data, which meant moving more apps to monolithic storage and scrapping some of the modular systems. Always try to anticipate your future needs when you choose your primary storage (see "Scaling backup").

Some of our initial SAN implementations were performed by adding Fibre Channel host bus adapters (HBAs) to the servers, then migrating the direct-attached servers from maxed-out arrays to new ones with Fibre Channel switches placed in between. These isolated SAN islands were designed and laid out in a simple fashion. Management was manual, but relatively easy: a few Excel spreadsheets recorded the switch and disk configurations for each server. The infrastructure was nothing more than several strands of fiber laid throughout the data center under the floor in the network trays. Switches were racked centrally between the servers and storage. Backups ran daily. Soft and hard zones were configured, and our SAN implementations were a success.

This initial configuration worked well while things were relatively small and isolated. However, some of the benefits of a SAN weren't being fully utilized in this design. New servers were added to these SAN islands, but once we grew beyond the capacity of the switches or arrays in the initial design, the various components of the SAN started to become obstacles. One by one, each component needed to be addressed.

Design your SAN with a topology that scales regardless of how small you initially start out. Spending money up front will save you both soft and hard costs down the road.

The soft costs saved include the time it takes to manage, redesign and then implement a core-edge topology later down the road. The hard costs saved come from longer investment protection for the initial hardware purchase.

You can protect your initial investment if you correctly anticipate faster hardware speeds for tape, switch, servers and storage. With speeds increasing and the ability to create trunks between your switches, you'll have more flexibility if you've designed an architecture that lets you move the older and slower technology out to the edge and implement the new faster hardware at the core. You'll also reduce the amount of downtime you experience in the future. With our SAN islands, we had to bring down the fabric in order to merge islands - and perhaps the servers as well - to bring firmware levels in sync, or upgrade them to the latest version to support more or newer drives.

We found that it was better to schedule downtime when making most major changes to a SAN. Because SAN technology was in its infancy at the time, interoperability problems, along with older versions of software, firmware and drivers, could - and did - result in unplanned outages. Ensuring data integrity and uptime for our customers was the main objective, so scheduling downtime for maintenance was sometimes necessary.

SAN maturity and issues with interoperability have improved, so you may not need to bring everything down to make changes now.

But without the right architecture, you may be forced into awkward configurations just to utilize all the available resources across multiple islands. In our case, we would sometimes end up with small switches daisy-chained to each other through a single inter-switch link (ISL) to achieve this. As a result, our SAN was vulnerable to single points of failure.

I don't recommend the daisy-chain approach in general. Spend the money, buy more switches and design an architecture that provides the availability, performance and flexibility you'll need when SANs become larger and must scale. For us, that meant trunking ISLs when possible, as well as building out core-edge topologies that scaled. This also let us take advantage of storage resources by merging fabrics when necessary, without having to schedule downtime. And it allowed us to build on our earlier investment in smaller switches: we introduced newer, faster, larger switches at the core and pushed the older, smaller ones to the edge or, in some cases, to development environments.

Scaling backup

Backups are key to ensuring data integrity, and yet one of the most unrewarding jobs. Don't get boxed in with a tape technology, tape library or network that won't allow you to scale. Avoid practices that won't scale either, such as extensive use of homegrown scripts.

Tapes. Choose a tape technology that lets you protect your current investment in media. That may not be entirely possible, but think in terms of flexibility and business requirements. Media can be a huge cost for the company: when you have more than $1 million invested in DLT tapes, it's not easy to justify switching to a new tape technology that isn't backward compatible with your existing media - as we found out now that we want to graduate from DLT.

If you're using a specific medium for backup and recovery, make sure a plan is in place to migrate to faster media as drive speeds increase. Switching to faster media also lets you grow your SAN without adding more tape drives or tape libraries to accommodate the increased storage - which can be a big expense to purchase and maintain.
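The trade-off comes down to backup-window arithmetic: how many drives does it take to move your data through a fixed window at a given drive speed? A rough sketch, using illustrative round-number speeds rather than any vendor's actual specs:

```python
# Backup-window math: drives needed to back up a given amount of data
# in a fixed window. Drive speeds below are illustrative round numbers.
import math

def drives_needed(data_tb, window_hr, drive_mb_per_s):
    """Minimum drive count to move data_tb within window_hr hours."""
    tb_per_drive = drive_mb_per_s * 3600 * window_hr / 1_000_000
    return math.ceil(data_tb / tb_per_drive)

# 20 TB of nightly backups in an 8-hour window:
print(drives_needed(20, 8, 5))    # ~5 MB/s drive  -> 139 drives
print(drives_needed(20, 8, 30))   # ~30 MB/s drive -> 24 drives
```

The same data volume needs a fraction of the drives (and library slots) once faster drives and media are in place - which is the savings the paragraph above describes.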

Libraries. Select a tape library that lets you take advantage of new tape drives and media. Your library should be flexible enough to hold more than one type of drive and media. It's more cost-effective to buy a tape library with the capacity to hold more drives than you originally purchase than to order one fully populated with drives. In the latter scenario, you'd be forced either to upgrade the drives to a faster technology - which may mean abandoning existing media at a huge loss for the company - or to make another tape library purchase. Purchasing a larger library that's not fully utilized may cost a little more up front, but the ability to add more drives, the same or faster, and to mix media within the library could save thousands of dollars down the road.

You might have to add an additional library to the SAN, which may or may not be possible depending on your SAN design. Adding more tape libraries also increases the management complexity by requiring you to balance the backups over several tape libraries.

Design. How you design your backup and recovery depends on the methodology you choose. We chose client-free backups, which to us means using third mirror copies that can be split off and made visible to the backup server, which is connected to the external media device.
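The split-mirror sequence behind client-free backup can be summarized as a fixed ordering of steps. In this sketch the step functions are hypothetical stand-ins for the real array and backup-software commands; here they just record the order of operations:

```python
# Sketch of the client-free (split-mirror) backup sequence. Each step is
# a placeholder for a real array/backup command; we only record ordering.

steps = []

def run(step):
    """Stand-in for issuing the real command."""
    steps.append(step)

def client_free_backup():
    run("quiesce app")                   # hold writes for a consistent image
    run("split third mirror")            # detach the extra mirror copy
    run("resume app")                    # production I/O continues immediately
    run("present copy to backup server") # copy becomes visible to backup host
    run("back up copy to tape")          # data moves without touching the client
    run("resync mirror")                 # re-attach and resynchronize

client_free_backup()
print(steps)
```

The key property is that the application is only held for the split itself; the slow tape transfer happens off the production host against the detached copy.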

There are several ways to design and configure a backup architecture, but whatever you decide should be in line with the company's three-year corporate strategy. By aligning technical designs with the business strategy, you can avoid several future costs: the people costs of data migrations and management, and the capital expenditures for new network, license and hardware purchases.

While it is possible to build out a core-edge topology using smaller switches, it's more cost-effective to use bladed directors with a large port count if high availability and increased flexibility are the goal. However, whether using smaller switches or large director switches to build your core-edge topology, the key is to pick an architecture that will scale for your environment.

While architecture is crucial, you can also avoid future costs by building out the right SAN infrastructure - the physical plant that connects servers to storage resources.

It's fine to start out small and run fiber under the floor for a few hosts. But as you grow and have to move, add or retire servers, switches, tape libraries and storage arrays, maintaining the fiber under the floor can become cost prohibitive. Troubleshooting a connection problem is difficult with so many strands running over each other in spaghetti fashion, and labels become inaccurate or fall off altogether from the cables being handled and moved so many times.

If you don't have an infrastructure in place, you could end up with a lot of wasted fiber under the floor that requires downtime to pull out. Build the storage network like any other network: include patch panels, with bundled fiber running between and distributed throughout the data center(s), in a design where the anticipated cable length from any server, storage array, switch or tape library can be calculated and preordered.

Understand the soft costs

One of the biggest soft costs of implementing and administering a SAN is people. Because companies such as Intuit have been telling vendors that better management tools are needed, a number of vendors are working on ways to easily administer the SAN from a centralized location and eliminate the manual bookkeeping tasks.

As our SAN grew from 20TB to 50TB to 200TB and over 900 switch ports, the old ways of managing were no longer practical. The spreadsheets we started with, detailing how zones and disks were configured to the hosts, wouldn't scale effectively. They worked initially, but keeping the documents updated always seemed to take a back seat to keeping the trains running.

Choose a SAN management tool early in your SAN deployment to cope with future growth. Good tools have widespread benefits in the area of interoperability, planning, scaling, and space reclamation (see "The right stuff").

If there's one thing I've learned from our whole experience, it's that basic technologies change rapidly. Absorbing them while running a production environment means you must have good policies, procedures, design and management tools - so don't wait too long to put them in place.

Now that we've arrived at a scalable architecture and infrastructure and know what we need in management tools, implementation is our biggest challenge. We're currently focusing on three main pain points: implementing an enterprise resource management tool; better, more efficient data archiving; and remote data facility replication to ensure higher levels of data integrity and usability in the event of a disaster.

Change is still the order of the day. But I believe that with the current state of the art and the lessons that pioneers such as Intuit have learned, many companies can start at a reasonable level and grow into the many terabyte range while preserving most of their initial investment.
