Book Excerpt: Building SANs with Brocade Fabric Switches

How Much Downtime Is Acceptable to Production Components During Implementation?

It will likely be necessary to shut down some existing production devices
during implementation to ensure a safe transition onto the SAN. For example,
you might have to shut down a host to install an HBA. Determine how much
downtime is acceptable for each host, and at what times it can occur. As a
rule, schedule more downtime than you think you need, so that any unforeseen
issues that arise during the implementation can be handled within the downtime
window.
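The advice to over-schedule can be turned into a simple per-host check. The sketch below assumes a hypothetical 50% contingency factor and made-up task estimates; the `HostOutage` structure and the host names are illustrative, not part of any Brocade tool. It pads each host's summed task estimates and flags hosts whose padded window exceeds what the owner will accept.

```python
import math
from dataclasses import dataclass

# Hypothetical contingency factor: pad each estimate by 50% so that
# unforeseen problems can still be handled inside the approved window.
CONTINGENCY = 0.5


@dataclass
class HostOutage:
    host: str
    task_minutes: list[int]  # estimated minutes per task, e.g. install HBA, reboot
    allowed_minutes: int     # longest outage the host's owner will accept


def padded_window(outage: HostOutage, contingency: float = CONTINGENCY) -> int:
    """Minutes to request: the summed task estimates plus contingency, rounded up."""
    return math.ceil(sum(outage.task_minutes) * (1 + contingency))


def fits_window(outage: HostOutage, contingency: float = CONTINGENCY) -> bool:
    """True if the padded window is within what the owner will accept."""
    return padded_window(outage, contingency) <= outage.allowed_minutes


if __name__ == "__main__":
    plan = [
        HostOutage("db01", [30, 15, 15], allowed_minutes=120),
        HostOutage("web03", [30, 15], allowed_minutes=60),
    ]
    for outage in plan:
        status = "ok" if fits_window(outage) else "renegotiate window"
        print(f"{outage.host}: request {padded_window(outage)} min ({status})")
```

Here `db01` needs a 90-minute window, which fits, while `web03` needs 68 minutes against a 60-minute limit, so its window must be renegotiated before implementation starts.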

How Much Downtime Is Acceptable for Routine Maintenance? How Much Downtime Is Acceptable for Upgrades and Architectural Changes?

These two questions are intimately related because, to an end user, there is
really no difference between downtime to a production system for maintenance
and downtime for an upgrade. Once systems are in production, you will want to
keep them running as much as possible.

Many upgrades can be accomplished with zero downtime by using a double- or
triple-redundant fabric architecture. However, no matter how well you plan the
upgrade and maintenance processes beforehand, you will occasionally need to
shut down specific hosts. For example, upgrading an HBA driver typically
requires a reboot.

Note: Wherever possible, use a redundant fabric architecture. It provides the
best performance and reliability, and it simplifies maintenance tasks. In a
redundant fabric architecture, every host has at least two paths to every
storage device it connects to, and these paths traverse two completely
unconnected fabrics. Although this might appear more expensive on the surface,
if hosts are to be dual-attached anyway, it is actually less expensive to
attach them to two separate fabrics than to one larger fabric or a
director-class switch. That comparison does not even include the
return-on-investment calculation for avoided downtime, which in
high-availability environments will usually overshadow the entire cost of the
SAN. More details about redundant and resilient fabrics are provided in
Chapter 7.
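The redundancy rule in the note (at least two paths per host/storage pair, through separate fabrics) can be audited from a path inventory. This is a minimal sketch assuming a hypothetical list of `(host, storage, fabric)` tuples; the device and fabric names are made up. Fabric names are treated only as labels, so the script cannot confirm that the fabrics are physically unconnected; that still has to be verified on the switches themselves.

```python
from collections import defaultdict

# Hypothetical inventory: one tuple per path as (host, storage array, fabric).
PATHS = [
    ("db01", "array1", "fabricA"),
    ("db01", "array1", "fabricB"),
    ("web03", "array1", "fabricA"),
]


def fabrics_per_pair(paths):
    """Group paths into {(host, storage): set of fabrics carrying a path}."""
    pairs = defaultdict(set)
    for host, storage, fabric in paths:
        pairs[(host, storage)].add(fabric)
    return pairs


def non_redundant_pairs(paths):
    """Host/storage pairs whose paths do not span at least two separate fabrics."""
    return sorted(pair for pair, fabrics in fabrics_per_pair(paths).items()
                  if len(fabrics) < 2)


def stranded_by_outage(paths, fabric):
    """Pairs that would lose all connectivity if `fabric` were taken down."""
    return sorted(pair for pair, fabrics in fabrics_per_pair(paths).items()
                  if fabrics <= {fabric})


if __name__ == "__main__":
    print("Not redundant:", non_redundant_pairs(PATHS))
    for fabric in ("fabricA", "fabricB"):
        print(f"Down for maintenance: {fabric} -> stranded:",
              stranded_by_outage(PATHS, fabric))
```

Counting distinct fabrics rather than raw paths is deliberate: two paths through the same fabric do not allow that fabric to be taken down for an upgrade. In this example `web03` would be stranded whenever `fabricA` is down.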

You should therefore determine in advance when you
will be able to schedule downtime for every host and storage array, and for the
fabric itself. You might not have to use every scheduled outage, but having
them available to you when you do need them is essential.

One way to do this is to make a list of applications
and services provided by the hosts on the SAN, and determine an owner for each.
Take your list of SAN devices and map these devices to the applications and
services they affect. This gives you a mapping from application and service
owners, who are usually responsible for scheduling downtime, to the devices
that require it. Have each owner approve the downtime calendar for each device
that affects his or her service.
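The owner-to-device mapping described above is a simple inversion of two tables. The sketch below uses hypothetical application names, owners, and devices; it produces, for each SAN device, the list of owners who must sign off on that device's downtime calendar.

```python
from collections import defaultdict

# Hypothetical example data; substitute your own inventory.
APP_OWNERS = {
    "customer-db": "alice",
    "email": "bob",
}
# SAN devices that each application or service depends on.
APP_DEVICES = {
    "customer-db": ["host-db01", "array1", "switch-a1"],
    "email": ["host-mail01", "array1", "switch-a1"],
}


def approvers_by_device(app_owners, app_devices):
    """Map each device to the sorted list of owners who must approve its downtime."""
    approvers = defaultdict(set)
    for app, devices in app_devices.items():
        for device in devices:
            approvers[device].add(app_owners[app])
    return {device: sorted(owners) for device, owners in sorted(approvers.items())}


if __name__ == "__main__":
    for device, owners in approvers_by_device(APP_OWNERS, APP_DEVICES).items():
        print(f"{device}: approval needed from {', '.join(owners)}")
```

Shared devices such as `array1` and `switch-a1` need sign-off from several owners, which is where scheduling conflicts usually surface. Rerunning the script whenever personnel, applications, or SAN infrastructure change keeps the mapping current.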

The mapping of owners to devices should be kept up to
date as changes in personnel, applications, and/or SAN infrastructure occur.

When Do You Need Each Piece of the Solution to Be Complete?

Once you have a table detailing which initiators communicate with which
targets, you can begin to create a timeline for the project. Other members of
the core team will tell you something like, "the customer database application
must be online by mid-June." It is your task to define which SAN components you
need to accomplish this, and to develop a timeline for adding these components
that meets their requirements.
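One way to derive the component timeline is to work backward from each application's go-live date. This sketch assumes hypothetical dates, component names, and a 14-day testing lead time; each component is due by the earliest date any dependent application needs it, minus that lead time.

```python
from datetime import date, timedelta

# Hypothetical go-live dates supplied by the core team.
APP_DEADLINES = {
    "customer-db": date(2025, 6, 15),
    "backup": date(2025, 7, 1),
}
# SAN components each application needs (derived from the initiator/target table).
APP_COMPONENTS = {
    "customer-db": ["switch-a1", "switch-b1", "hba-db01"],
    "backup": ["switch-a1", "tape-lib1"],
}
# Assumed time reserved for testing before an application goes live.
TEST_LEAD = timedelta(days=14)


def component_deadlines(deadlines, components, lead=TEST_LEAD):
    """Each component is due `lead` before the earliest go-live date that depends on it."""
    due = {}
    for app, parts in components.items():
        needed_by = deadlines[app] - lead
        for part in parts:
            if part not in due or needed_by < due[part]:
                due[part] = needed_by
    # Order the timeline by due date, then by name for stable output.
    return dict(sorted(due.items(), key=lambda item: (item[1], item[0])))


if __name__ == "__main__":
    for part, when in component_deadlines(APP_DEADLINES, APP_COMPONENTS).items():
        print(f"{when.isoformat()}  {part}")
```

Because `switch-a1` serves both applications, it inherits the earlier customer-database deadline rather than the later backup deadline.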

This is a high-level list of some of the questions
that should appear on a SAN design interview form:

What overall business problem are you trying to solve?
What are the business requirements of the solution?
What is known about the nodes that will attach to the SAN?
Which SAN-enabled application do you have in mind?
Which components of the solution already exist?
Which components are already in production?
Which elements of the solution need to be prototyped and tested?
What equipment will be available for testing?
How and when are backups to be done?
What will the traffic patterns in the solution be?
What do we know about current performance characteristics?
What do we know about future performance characteristics?
How much downtime is acceptable to production components during implementation?
How much downtime is acceptable for routine maintenance?
How much downtime is acceptable for upgrades and architectural changes?
When do you need each piece of the solution to be complete?
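To keep interviews from ending with questions still open, the form can be tracked programmatically. This minimal sketch stores the questions from the list above and reports those with missing or blank answers; the `unanswered` helper and the sample answer are hypothetical.

```python
# The interview questions listed above, kept in order.
QUESTIONS = [
    "What overall business problem are you trying to solve?",
    "What are the business requirements of the solution?",
    "What is known about the nodes that will attach to the SAN?",
    "Which SAN-enabled application do you have in mind?",
    "Which components of the solution already exist?",
    "Which components are already in production?",
    "Which elements of the solution need to be prototyped and tested?",
    "What equipment will be available for testing?",
    "How and when are backups to be done?",
    "What will the traffic patterns in the solution be?",
    "What do we know about current performance characteristics?",
    "What do we know about future performance characteristics?",
    "How much downtime is acceptable to production components during implementation?",
    "How much downtime is acceptable for routine maintenance?",
    "How much downtime is acceptable for upgrades and architectural changes?",
    "When do you need each piece of the solution to be complete?",
]


def unanswered(answers: dict[str, str]) -> list[str]:
    """Return the questions that have no answer yet (missing or blank)."""
    return [q for q in QUESTIONS if not answers.get(q, "").strip()]


if __name__ == "__main__":
    # Hypothetical partial interview: one real answer, one blank answer.
    answers = {QUESTIONS[0]: "Consolidate direct-attached storage", QUESTIONS[1]: "  "}
    for question in unanswered(answers):
        print("OPEN:", question)
```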

Conduct a Physical Assessment

You should now know the location of every piece of hardware that currently
exists. In addition, you should know where each piece of hardware in the
eventual SAN will be located.

Look at each piece of hardware. Make sure that it exists and has all the parts
it needs to function. These could include power cords, keyboard, mouse,
monitor, Ethernet card, Ethernet cable, HBAs, and Fibre Channel cables. Note
the physical dimensions of the hardware and its power and cooling
requirements. Does it rack mount? Does it have a network interface? How many
Fibre Channel interfaces does it have? How much does it weigh? You should
already have this information from the interview process, but you should
verify that the information you were given is correct.
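The verification step amounts to comparing what the interview recorded with what you find on site. The sketch below assumes hypothetical attribute names and values for one host; it lists missing parts and any attribute whose recorded and observed values disagree (a value of `None` means the attribute was not captured on one side).

```python
def missing_parts(required: set[str], present: set[str]) -> list[str]:
    """Parts the device needs but that were not found during the walk-through."""
    return sorted(required - present)


def discrepancies(recorded: dict, observed: dict) -> dict:
    """{attribute: (recorded, observed)} for every attribute that differs or is missing."""
    keys = sorted(set(recorded) | set(observed))
    return {k: (recorded.get(k), observed.get(k))
            for k in keys if recorded.get(k) != observed.get(k)}


if __name__ == "__main__":
    # Hypothetical host: what the interview said versus what was found on site.
    recorded = {"rack_mount": True, "fc_ports": 2, "weight_kg": 30, "network_interface": True}
    observed = {"rack_mount": True, "fc_ports": 1, "weight_kg": 30, "network_interface": True}
    required = {"power cord", "HBA", "Fibre Channel cable", "Ethernet cable"}
    present = {"power cord", "HBA"}
    print("Missing parts:", missing_parts(required, present))
    print("Discrepancies:", discrepancies(recorded, observed))
```

In this example the host has one Fibre Channel port instead of the two recorded, which would affect whether it can be dual-attached to a redundant fabric.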

Go to each location where SAN equipment or nodes will be installed, and again
verify that your information is correct. Note how the equipment will fit into
the available space and how it will be brought into the building. You should
also meet with the person in charge of the facility to discuss power, cooling,
and equipment locations.
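Before meeting with the facility manager, it helps to total the load each location will carry. This sketch assumes hypothetical per-device wattages and rack heights, and hypothetical power, cooling, and rack-space budgets for one room; it uses the standard conversion of about 3.412 BTU/hr of heat per watt. Wattages taken from specification sheets may differ from actual draw, so treat the totals as planning figures.

```python
from dataclasses import dataclass

# 1 watt of electrical load is dissipated as roughly 3.412 BTU/hr of heat.
BTU_PER_WATT = 3.412


@dataclass
class Device:
    name: str
    watts: float     # power draw taken from the device's specification (assumed)
    rack_units: int  # 0 for equipment that does not rack mount


def site_totals(devices):
    """Total power, heat load, and rack space for one installation location."""
    watts = sum(d.watts for d in devices)
    return {
        "watts": watts,
        "btu_per_hr": round(watts * BTU_PER_WATT),
        "rack_units": sum(d.rack_units for d in devices),
    }


def shortfalls(totals, power_budget_watts, cooling_budget_btu, free_rack_units):
    """List the resources at this location that the planned equipment would exceed."""
    problems = []
    if totals["watts"] > power_budget_watts:
        problems.append("power")
    if totals["btu_per_hr"] > cooling_budget_btu:
        problems.append("cooling")
    if totals["rack_units"] > free_rack_units:
        problems.append("rack space")
    return problems


if __name__ == "__main__":
    # Hypothetical equipment for one computer-room location.
    room = [Device("switch-a1", 150, 1), Device("host-db01", 450, 2), Device("array1", 900, 3)]
    totals = site_totals(room)
    print(totals)
    print("Shortfalls:", shortfalls(totals, power_budget_watts=2000,
                                    cooling_budget_btu=5000, free_rack_units=8))
```

Here the room has enough power and rack space, but the 1500 W load produces about 5118 BTU/hr, exceeding the assumed 5000 BTU/hr cooling budget; that is exactly the kind of issue to raise in the facility meeting.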

Authors

Josh Judd is a Senior SAN Architect with Brocade Communications Systems, Inc. In addition to writing technical literature, he provides senior-level strategic support for major OEMs and end users of Brocade storage network products worldwide.

Chris Beauchamp (Brocade CFP, CSD) is a Senior SAN Architect for Brocade Communications Systems, Inc. Chris focuses on SAN design and architecture, with an emphasis on scalability and troubleshooting.

Alex Neefus is the Lead Interoperability Test Engineer at Lamprey Networks, Inc. Alex has worked on developing testing tools for the SANmark program hosted by the FCIA.

Benjamin F. Kuo is a Software Development Manager at TROIKA Networks. Headquartered in Westlake Village, CA, TROIKA Networks is a developer of Fibre Channel Host Bus Adapters, dynamic multipathing, and management software for Storage Area Networks.