If you haven’t guessed from the blatant pop culture reference in the title of my blog, I spent the first week of June at the IBM Edge storage conference (and I promise, if you keep reading, that I’ll refrain from making any puns on the Edge theme, despite the temptation to bring up a favorite Irish rock band hero). Anyway, it would hardly be appropriate to mention another band when Foreigner did such an awesome job rockin’ the conference. Who knew when I was growing up that the ’80s would produce the greatest rock ballads of all time?

It’s been a great week at IBM Edge, hearing about all the latest advances in storage technology. In case you missed the talk on SVC Stretch Clusters as an example of the ODIN reference architecture, let me say a few words about it here. This will get a bit technical, but don’t worry…we’re not going to have a quiz at the end.

The problem we’re trying to solve is VM mobility over extended distance, and multi-site workload deployment across data centers. VM mobility not only improves availability of your applications, it’s also a more efficient way to use limited storage resources. The most common reason for using this approach is some form of business continuity or disaster avoidance/recovery solution, including planned events such as migrating from one data center to another or avoiding downtime during scheduled maintenance. But given an increasingly global workforce, there are other good reasons to explore VM mobility. Many clients are realizing that this approach provides load balancing and better user performance across multiple time zones (the so-called “follow the sun” approach). Others are realizing that by moving workloads over distance, it’s possible to optimize the cost of power to run the data center; since the lowest cost electricity is available at night, this strategy is known as “follow the moon”.

IBM has announced a software bundle featuring the SAN Volume Controller (SVC), which includes Stretch Clustering over long distance. This provides read/write access to storage volumes located far apart from each other, enabling data replication across multiple data centers. SVC works in concert with Tivoli Storage Productivity Center (TPC) to manage your storage, and integration with VMware products such as vMotion and vCenter enables transparent migration of virtual machines and their corresponding data and applications.

Let’s consider two data centers separated by up to 300 km (supported in SVC 6.3) and interconnected by a traditional IP network such as the Internet, or by dark optical fiber. This solution requires many of the features of an Open Datacenter with an Interoperable Network (ODIN), including lossless Ethernet fabrics, automated port profile migration, Layer 2 VLANs in each location, and an intersite Layer 2 VLAN supporting MPLS/VPLS (preferably with a 10G or 100G Ethernet line speed between sites, since the SAN infrastructure is likely running either 8G or 16G Fibre Channel). An SVC split cluster uses industry standard Fibre Channel links for both node-to-node communication and for host access to SVC nodes, so your production sites must be connected by Fibre Channel links or FC-IP.

Generally a business continuity solution will define one physical location as a failure domain, though this can vary depending on what you’re trying to protect against; a failure domain could also be a group of floors in a single building, or just different power domains in the same data center. In order for SVC to decide which storage nodes survive if we lose a failure domain, the solution uses a quorum disk (a managed disk that contains a reserved area used exclusively for system management). At a minimum, you should have one active quorum disk on a separate power grid in one of your failure domains; up to three quorum disks can be configured with SVC, though only one is active at any given time. Metro mirroring is recommended for this type of solution; a maximum round trip delay of 80 ms is supported (note that routing is required, since the fabrics at each location are not merged).
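To make the tie-break idea concrete, here’s a little Python sketch (purely conceptual, not how SVC is actually implemented, and the domain names are made up) showing how access to the active quorum disk decides which failure domain keeps serving I/O when the sites lose contact with each other:

```python
# Conceptual sketch only -- not the SVC implementation. It illustrates the
# tie-break idea: when the link between failure domains is lost, the half of
# the cluster that can still reach the active quorum disk keeps running.

def surviving_domains(domains_reaching_quorum, domains_alive):
    """Return the failure domains that continue serving I/O.

    domains_reaching_quorum: set of domains that can still access the
        active quorum disk after the failure.
    domains_alive: set of domains whose nodes are still powered and healthy.
    """
    survivors = domains_alive & domains_reaching_quorum
    # If nobody can reach the quorum disk, the cluster cannot arbitrate
    # safely, so I/O is suspended rather than risking a split-brain.
    return survivors if survivors else set()

# Example: domain "A" loses its intersite links, but domain "B" still sees
# the quorum disk, so B's nodes win the tie-break and keep serving I/O.
print(surviving_domains({"B"}, {"A", "B"}))   # -> {'B'}
```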

Connectivity between sites may take several forms. First, if the regular Internet provides sufficient quality of service (QoS) and meets your business objectives for recovery time, recovery point, and so on, the IBM SVC uses industry standard protocols (FC-IP) in conjunction with a Brocade switch infrastructure to transport storage traffic over distance. This is typically a low cost option, though you might require multiple circuits with load balancing (a so-called virtual trunk). Second, it’s possible to run a Brocade inter-switch link (ISL) between SVC nodes (with SVC 6.3.0 or higher). Brocade switches provide ISL options including consolidation of up to four ISLs at 4 Gbit/s each (creating a 16G trunk), or up to eight ISLs at 16 Gbit/s each (creating a 128G trunk). Buffer credit support for distances up to 250 km (approaching the SVC limit) is available. SVC supports SAN routing (including FC-IP links) for intercluster storage connections. Finally, note that you can connect multiple locations with optical fiber and use a variety of protocol-agnostic wavelength division multiplexing (WDM) products in this solution. This may provide better QoS or dedicated bandwidth for large applications. A 10G passive WDM option is available on some Brocade switches (with options such as in-flight compression and encryption), or a stand-alone WDM product can be employed (IBM has qualified many such solutions, including those from ODIN participants Adva, Ciena, and Huawei). Your local service provider may also offer a variety of managed service backup options using a combination of these features. In this case, each SVC node is typically attached to both the local and remote SAN switches (without ISLs). Both the ISL and non-ISL approaches are known as split I/O groups.
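If you’re wondering why those long ISLs need so many buffer credits, here’s a back-of-the-envelope Python sketch (a rule-of-thumb estimate only, assuming full-size Fibre Channel frames and roughly 5 microseconds of propagation per km of fiber; use your switch vendor’s sizing tools for real designs):

```python
# Rough estimate of the buffer credits an ISL needs to keep a long link full.
# Assumptions: full-size FC frames (~2148 bytes) and ~5 us/km of one-way
# propagation delay in fiber. Not a vendor sizing tool.

FRAME_BITS = 2148 * 8          # full-size Fibre Channel frame, in bits
PROP_DELAY_US_PER_KM = 5.0     # one-way propagation delay in fiber

def buffer_credits(distance_km, line_rate_gbps):
    frame_time_us = FRAME_BITS / (line_rate_gbps * 1000)   # time to serialize one frame
    round_trip_us = 2 * distance_km * PROP_DELAY_US_PER_KM
    # One credit is held per frame in flight until its R_RDY acknowledgement returns.
    return round(round_trip_us / frame_time_us)

print(buffer_credits(250, 16))   # ~2,300 credits for a 16 Gbit/s ISL over 250 km
print(buffer_credits(10, 16))    # ~90 credits for the same link over 10 km
```

The takeaway is that credit requirements grow linearly with both distance and line rate, which is why extended-distance ISLs and WDM gear advertise large buffer credit pools.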

The IBM SVC storage manager works in concert with vCenter through API plug-ins. This includes VADP (which provides data protection for snapshot backups at the VMware level rather than the LUN level, allowing you to concentrate on the value of the VM rather than the physical location of the associated data). Performance improvements can be achieved by offloading some functions to the storage hypervisor as well. The storage hypervisor includes a virtualization platform, controller, and management (TPC supports application-aware snapshots of your data through FlashCopy Manager). At the management level, IBM also allows the storage hypervisor components to be managed as plug-ins for vCenter. VM location can be managed through vCenter with a Global Server Load Balancer (GSLB), which works in concert with a Brocade API plug-in. Further, vCenter is integrated with Brocade Application Resource Broker (ARB), which can report VM status back to a Brocade ADX switch. vCenter and GSLB manage both VM and IP profiles, performing intelligent load balancing to redirect traffic to the VM’s new location.

With this combination of ODIN best practices, IBM SVC, and Brocade SAN/FC-IP connectivity, your data can rest easy, wherever it happens to be (and so can you).

For those of you who don’t know what SDN and OpenFlow mean, beyond being some of the hottest buzzwords in the networking industry right now, you can check out the appropriate volume of the Open Datacenter Interoperable Network (ODIN) reference architecture for a detailed introduction to this topic and the problems it addresses. For those who just need a quick refresher, software-defined networking is an approach that allows the basic data flows through a switch to be manipulated by an external controller. It’s an industry standard approach led by the Open Networking Foundation (ONF), a consortium run by some of the world’s largest network users (Google, Facebook, Verizon, and more). OpenFlow is a relatively new industry standard that separates the data plane and control plane of a switch, creating flow table abstractions (in other words, you can match data flows based on the content of their packets and perform actions associated with each flow match; traffic that doesn’t match any flow can be blocked or filtered using this technique). Optimal paths through the network are defined by the OpenFlow controller, rather than by proprietary software within the switch.
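To illustrate the flow table abstraction (and nothing more; this toy Python sketch isn’t a controller and doesn’t speak the OpenFlow wire protocol, and the field names and port labels are made up), here’s how a match/action table decides what happens to a packet, with unmatched traffic falling through to a default action:

```python
# Toy model of the OpenFlow match/action idea described above. A flow table
# is an ordered list of (match, action) rules; packets that match no rule
# fall through to a default action (here: drop), which is how unassigned
# traffic ends up blocked or filtered.

flow_table = [
    ({"eth_type": 0x0800, "ip_dst": "10.0.1.5"}, "output:port2"),       # IPv4 to one host
    ({"eth_type": 0x0806},                       "output:controller"),  # punt ARP to controller
]

def apply_flow_table(packet, table, default_action="drop"):
    for match, action in table:
        if all(packet.get(field) == value for field, value in match.items()):
            return action
    return default_action

print(apply_flow_table({"eth_type": 0x0800, "ip_dst": "10.0.1.5"}, flow_table))  # output:port2
print(apply_flow_table({"eth_type": 0x0800, "ip_dst": "10.9.9.9"}, flow_table))  # drop
```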

One of the potential benefits of OpenFlow is that it allows you to innovate at Internet speeds, by just changing the software rather than replacing or reconfiguring the switch hardware. There are still open questions about just how large an OpenFlow controller can scale, how many controllers we need, and so on. Marist College has created an SDN lab which will contribute to the OpenFlow community, support research around SDN, and possibly support compliance testing in the near future. They are engaged with some large OpenFlow switch providers (including IBM) and some interested OpenFlow adopters (to be named later) to investigate use cases and performance limitations of the current OpenFlow protocol. Their current lab environment includes four IBM G8264 OpenFlow-enabled 10/40G switches in a spine-leaf configuration, running under an open source FloodLight controller. These switches interconnect a server farm based on IBM x86, Power, and System z enterprise servers. Many of the x86 servers run the VMware hypervisor and the IBM 5000V virtual switch. The servers are connected via a separate Fibre Channel SAN to various enterprise storage devices.

One of Marist’s early contributions has been to create an open source FloodLight administrative control panel (FACP) that can be used for network administration. The FACP eliminates the need to write Python scripts to control the switches, thereby reducing management complexity. FACP provides an abstraction of the network, and a configuration application can be built against this abstraction. At the conference, Marist held a demo showing how this controller can provision quality of service and routing of Layer 2 and Layer 3 VLANs in the network. Manipulation of firewall ACLs is also possible, and future extensions may include MPLS and other WAN-related protocols. Ongoing work in this area is focused on creating a static flow pusher, which will provide a programmable interface for writing scripts that push flow table entries across the network using the FloodLight REST API.
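As a rough idea of what driving a controller over REST can look like, here’s a hedged Python sketch of pushing one static flow entry to a FloodLight controller; the endpoint path and JSON field names have varied across FloodLight releases, and the controller address and switch DPID below are purely illustrative, so treat all of them as assumptions to check against the version running in your own lab:

```python
# Hedged sketch of pushing one static flow entry via a FloodLight controller's
# REST interface. The URL path and field names below are assumptions that
# differ between FloodLight releases; the controller hostname and switch DPID
# are made up for illustration.
import json
import urllib.request

CONTROLLER = "http://floodlight.example.com:8080"      # hypothetical controller address
URL = CONTROLLER + "/wm/staticflowentrypusher/json"    # path varies by release

flow_entry = {
    "switch": "00:00:00:00:00:00:00:01",   # DPID of the target switch (example value)
    "name": "demo-flow-1",
    "priority": "32768",
    "ingress-port": "1",
    "active": "true",
    "actions": "output=2",                 # forward matching traffic out port 2
}

req = urllib.request.Request(URL, data=json.dumps(flow_entry).encode(),
                             headers={"Content-Type": "application/json"})
print(urllib.request.urlopen(req).read().decode())
```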

Further investigation will include such topics as demonstrating multi-vendor interoperability under a common FloodLight controller, and exploring the limits of scalability and security associated with OpenFlow networking. Keep up with their latest work and see their presentation from the NSF conference.

Want to suggest another TLA (three-letter acronym) for my list? Comment on this blog entry below, or drop me a line on my Twitter feed.