Nexus 1000V with FCoE CNA and VMWare ESX 4.0 deployment diagram

What does the virtual data center environment look like when you have a CNA (Converged Network Adapter) installed in an ESX 4.0 server running the Cisco Nexus 1000V virtual switch? I decided to draw it out in a diagram and came up with this:

(Click the diagram to view a larger image in a new browser window)

Some key observations of importance:

The version of ESX running here is ESX 4.0 (not yet released)

The Nexus 1000V software on the physical server acts like a line card of a modular switch, described as a VEM (virtual ethernet module)

The Nexus 1000V VEM is a direct replacement of the VMWare vSwitch function

The Nexus 1000V VSM (virtual supervisor module) acts like the supervisor engine of a modular switch

One Nexus 1000V VSM instance manages a single ESX cluster of up to 64 physical servers

The form factor of Nexus 1000V VSM can be a physical appliance or a virtual machine

The network administrator manages the Cisco Nexus 1000V (from the VSM) as a single distributed virtual switch for the entire ESX cluster

Each virtual machine connects to its own Virtual Ethernet (vEthernet) port on the Nexus 1000V providing the network administrator traffic visibility and policy control on a per virtual machine basis. Virtual machines can now be managed like physical servers in terms of their network connectivity.

In this diagram VM1 connects to interface VEth1 on the Nexus 1000v and keeps the same VEth1 interface even when it is VMotioned to, or powered up on, a different physical server.

The VMKernel vmknic interface also connects to the Nexus 1000V on a Virtual Ethernet port, 'interface vEth 2' for example (not shown here). The same goes for the Service Console vswif interface; it also connects to the Nexus 1000V on a Virtual Ethernet port (not shown here).

The network administrator defines Port Profiles which are a collection of network configuration settings such as the VLAN, any access lists, QoS policies, or traffic monitoring such as NetFlow or SPAN.

In this diagram, Port Profile BLUE could define access to VLAN 10 and enable NetFlow traffic monitoring, for example.
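
As a rough sketch, a Port Profile like BLUE might be defined on the VSM along these lines (the profile name, VLAN number, and flow-monitor name are illustrative, and the exact syntax varies by Nexus 1000V release):

```
! Hypothetical VSM configuration sketch -- names and numbers are illustrative
port-profile BLUE
  vmware port-group                  ! publish to Virtual Center as a Port Group
  switchport mode access
  switchport access vlan 10          ! access to VLAN 10
  ip flow monitor VM-FLOWS input     ! NetFlow monitoring (monitor name assumed)
  no shutdown
  state enabled
```

Any VM the VMWare administrator attaches to the BLUE Port Group then inherits all of these settings on its dynamically assigned vEthernet interface.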

Once enabled, Port Profiles are dynamically pushed to VMWare Virtual Center and show up as Port Groups to be selected by the VMWare administrator

The VMWare administrator creates virtual machines and assigns them to Port Groups as he/she has always done. By selecting VM1 to connect to Port Group BLUE in Virtual Center, the VMWare administrator has effectively connected VM1 to VLAN 10, along with any other security, monitoring, or QoS policies defined in Port Profile BLUE by the network administrator

The network administrator does not configure the virtual machine interfaces directly. Rather, all configuration settings for virtual machines are made with Port Profiles (configured globally), and it's the VMWare administrator who picks which virtual machines are attached to which Port Profile. Once this happens the virtual machine is dynamically assigned a unique Virtual Ethernet interface (e.g. 'int vEth 10') and inherits the configuration settings from the chosen Port Profile.

The VMWare administrator no longer needs to manage multiple vSwitch configurations, and no longer needs to associate physical NICs to a vSwitch.

The Nexus 1000v VSM is for control plane functions only and does not participate in forwarding traffic.

If the Nexus 1000v VSM goes down it does not disrupt traffic between physical servers and virtual machines.

If an ESX host reboots or is added to the network, the Nexus 1000v VSM must be accessible.

The Nexus 1000v VSM can be deployed redundantly, with a standby VSM ready to take over in case of failure.

The ESX server has a 10GE connection to a physical lossless Ethernet switch that supports Data Center Ethernet (DCE) and Fibre Channel over Ethernet (FCoE), such as the Cisco Nexus 5000.

The Cisco Nexus 5000 provides lossless Ethernet services for the FCoE traffic received from the CNA. If the Nexus 5000 buffers reach a high threshold, an 802.3x pause signal for the FCoE CoS is sent to the CNA. This per-CoS pause tells the CNA to pause only the FCoE traffic, not the other TCP/IP traffic that is tolerant to loss. The default CoS setting for FCoE is CoS 3. When the Nexus 5000 buffers drop to a low threshold, a similar un-pause signal is sent to the CNA. The 802.3x per-CoS pause provides the same functionality as FC buffer credits, controlling throughput based on the network's ability to carry the traffic reliably.

The CNA and Cisco Nexus 5000 also support 802.1Qaz CoS based bandwidth management which allows the network administrator to provide bandwidth guarantees to different types of traffic. For example, the VMotion vmkernel traffic could be given a minimum guaranteed bandwidth of 10% (1GE), and so on.
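
A hedged sketch of what such a bandwidth guarantee could look like on the Nexus 5000 (class names and percentages are assumptions, and the queuing syntax differs between NX-OS releases):

```
! Conceptual sketch only -- class names and values are assumptions,
! and the classification of VMotion traffic into its queuing class is omitted
policy-map type queuing UPLINK-BW
  class type queuing class-fcoe
    bandwidth percent 40      ! guarantee FCoE (CoS 3) roughly 4Gbps of the 10GE link
  class type queuing VMOTION-Q
    bandwidth percent 10      ! guarantee the VMotion class roughly 1Gbps (1GE)
```

The key point is that these are minimum guarantees, not hard caps: when a class is idle, its bandwidth is available to the other classes.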

Fibre Channel HBAs are not needed in the physical server, as the Fibre Channel connectivity is supplied by a Fibre Channel chip on the CNA from either Emulex or Qlogic (your choice).

Individual Ethernet NICs are not needed in the physical server, as the Ethernet connectivity is supplied by an Ethernet chip on the CNA from either Intel or Broadcom.

The single CNA appears to the ESX Hypervisor as two separate I/O cards: one Ethernet NIC and one Fibre Channel HBA.

The ESX Hypervisor uses a standard Emulex or Qlogic driver to operate what it sees as the Fibre Channel HBA.

The ESX Hypervisor uses a standard Intel ethernet driver to operate what it sees as the Ethernet NIC.

ESX 4.0 is not required to use CNAs and FCoE. FCoE can be deployed today with ESX 3.5 Update 2.

ESX 4.0 is required for Nexus 1000V.

CNAs are not required for Nexus 1000V.

The Nexus 1000v has no knowledge of FCoE and does not need to support FCoE, because FCoE is of no concern to the Nexus 1000V deployment. To see this, consider that the Nexus 1000v operates no differently in a traditional server that has individual FC HBAs and individual Ethernet NICs. The Nexus 1000V uses the services of the Ethernet chip on the CNA and is unaware that the CNA is also providing FC services to the ESX host. Additionally, the virtual machine has no knowledge of FCoE.

The Menlo ASIC on the CNA guarantees 4Gbps of bandwidth to the FC chip. If the FC chip is not using the bandwidth, all 10GE bandwidth will be available to the Ethernet chip.

More notes to come…. please check back again later for updates. (Updated diagram and notes 1/7/09)

About Brad Hedlund

Brad Hedlund is a member of the technical staff at Amazon Web Services. Brad’s background in data center networking begins in the mid-1990s with a variety of experience in roles such as IT customer, systems integrator, architecture and technical strategy roles at Cisco, Dell, and VMware, and speaker at industry conferences. CCIE Emeritus #5530.

This is the most understandable description I have seen so far, and it has some articulation points that I haven't been able to put together until now. I fully understand now why you need the 5000 (in redundant fashion), what can be controlled through the VSM, and why it is a benefit in terms of ease of management (no longer managing multiple vSwitch configs for each host).

A couple of things I am still trying to piece together concern the CNA, and then this may tie it all in for me. Is that a SW driver for current Broadcom and Intel server NICs? Are there certain NICs that have the Menlo ASIC?

What about blade server switches? Can we still do all of this if the blade server switch is an HP 10GB? I guess I am not sure how blade enclosures allocate Ethernet resources virtually to the blade servers themselves. Is it directly correlated to the blade center switch chipset, or separate? Can we get all the same features/functions if the blade server switch is Cisco or HP?

I clearly see the need for 10G because of the FCoE Emulex or Qlogic virtual separation and the 4Gbps guarantee for FC.

Ryan,
There are currently two hardware based CNAs on the market, one from Emulex, the other from Qlogic, both of which are PCIe cards that install into rack mount servers.
The Emulex and Qlogic CNAs have an FC chip, an Ethernet chip, and the Menlo ASIC (developed by Cisco).
The third option is Intel's Oplin 10GE NIC, which has an Ethernet chip (of course), but no FC chip and no Menlo ASIC. The Intel Oplin NIC performs FCoE in software.

There are no blade server switches on the market right now (either Cisco or HP) that can support FCoE traffic because there is no blade server switch on the market that has a lossless Ethernet architecture and support for DCE. Furthermore, there is no 10GE pass-through module available for the HP c-class. Additionally, there are no CNA mezzanine cards available for HP c-class.

The solution shown in my diagram above is available today with rack mount servers, not blades. Currently rack mount servers are more flexible in terms of your options for networking architectures.

As for HP blade switches, I would be very surprised to see any lossless Ethernet FCoE and DCE capable blade switches from HP any time soon given that HP has no background and no expertise in Fibre Channel, and they are just now starting to get serious about Ethernet. It does not appear that FCoE and unified fabrics are at all part of HP’s data center strategy (no surprise there).

Cisco can deliver blade switches that are FCoE capable. How soon or even if those switches are available for HP blade systems in the future is something that I am sure the two companies are negotiating as we speak.

All I can say is that 2009 will be a very exciting year for data center virtualization and unified fabrics. It will be fun to watch, and even more fun to participate.

FCoE still amuses me; I fail to see or understand why anyone would want, short of existing investment, to use FCoE (or FC) in a VMWare implementation. While FCoE with VMFS is obviously less complicated than traditional FC, it still seems to me like the two combined are just "recreating" NFS. A shared file system over an IP network sounds like NFS to me. When you bring 10GE into the mix, why waste your time with the complexity of the so-far-unproven FCoE when 10GE and NFS work so well with VMWare?

Sure there are features in the current VMWare product suite that don’t “officially” support NFS (Storage VMotion for one), but that’s just a matter of catch-up. I fully believe that HP blades with their new FLEX-10 Virtual Connect using NFS on Netapp is a better, less costly and easier to implement VMWare solution.

I’d love to be convinced otherwise. That said, the Cisco 1000v is pure genius, it will be nice to have a “real” switch for VMs. I just wonder if adding a “real” switch will lead to more “real switch” problems…

Joseph,
Infiniband has nice technical specs but didn't take off as a unified fabric solution, mainly because it's a whole different network technology that most people do not understand. Most people know Ethernet, and some know Fibre Channel. So why throw a third, unknown network technology into the mix when Ethernet (Data Center Ethernet) does the job? Now that lossless Ethernet switches (Nexus 5000) and Data Center Ethernet enhancements are available to the market, Infiniband will be reserved for applications that require microseconds of latency, such as grid computing. And even in the grid computing space, 10GE and Data Center Ethernet are starting to gain interest.

Paul,
The problem with HP Virtual Connect FLEX-10 becomes apparent precisely in a VMWare deployment with Nexus 1000V. The Nexus 1000V is a perfect example of how the data center network has moved inside the physical server. Virtual Connect, by contrast, tries to push the network not just out of the physical server but out of the entire blade enclosure, going squarely against the trends of data center virtualization.

For example, who manages the Nexus 1000V? The network team.
Who manages the Virtual Connect? The server team.
Who manages the physical network? The network team.

So now you have a Cisco-HP-Cisco sandwich, and a NetworkTeam-ServerTeam-NetworkTeam sandwich in terms of the data center networking operations, configuration, and troubleshooting.

Are you saying that there are more people who really know 10GbE and DCE/Unified Ethernet than know InfiniBand? There really is no unified fabric that has 'taken off' yet, as we are in the early stages.

As you mentioned, IB provides for very small latencies. This could (and in some cases already does) provide for a more ‘live’ and ‘gridlike’ evolution of the Virtualization platform.

VMotion can already be configured to take advantage of this low latency for lightning-fast migrations. I wouldn't be surprised if VMware, Citrix (Xen), or Oracle (Xen) comes out with a way to leverage IB for creating larger 'multi-host, single image' virtualization platforms. I have heard that some larger hosting companies are already doing this with VMware.

Also, FCoIB is gaining some momentum as a solution for high-IO environments (aka DB, Virtualization). This is by some big names in storage.

Brad, I get the need to simplify things and the “Network guys” “Server Guys” battle has raged for far too long. That said, Flex-10 can be used like a giant 10GE passthru giving the 1000v control on the inside of the server. I’m just not sure if having MORE Cisco switches is the way to make things easier.

Joseph,
Yes, I am saying that more people know 10GE than Infiniband. 10GE and DCE is largely the same good old Ethernet everybody knows and loves, only with a few enhancements that are easy to learn.
Infiniband as a unified fabric solution is going nowhere, fast. Cisco, a major Infiniband player, has shifted all focus to Data Center Ethernet, while other Infiniband vendors are facing financial trouble.
If I was a gambling man I wouldn’t have my money in Infiniband right now.

Brad, Virtual Connect can be used as a switch, but it doesn't have to be. You can tell it to just take VLANs on the 1000v (or VMWare Virtual Switch) side and pass them through to the switches in your infrastructure, be that a standard 6500 or VSS or whatever.

I don’t think, 1000v included, that anyone has created the best solution yet.

Paul,
Passing through VLANs is still a switching operation. Virtual Connect receives the frame, stores it in a buffer, and makes a forwarding decision. By definition that is a switch and with it comes switching responsibilities such as QoS, high availability, network security, network configuration, and network troubleshooting.

How does Virtual Connect handle congestion? How does Virtual Connect prioritize important traffic classified by Nexus 1000V? How does Virtual Connect affect traffic when its software is upgraded? What other configuration changes to Virtual Connect might affect network connectivity for Nexus 1000V and VMWare?
How does Virtual Connect secure traffic from one server to the other? How are Virtual Connect’s network settings configured?

These are all network issues that the server team will be responsible for, and will be questioned about by the network operations team.

One of the key value propositions behind Nexus 1000V is to provide clear continuity and delineation of operational roles and responsibilities (Server and Network) of VM creation and network connectivity.

However in a Nexus 1000V deployment, Virtual Connect becomes a dumb bump-in-the-wire, a point of obscurity, a question mark, an unknown, a black box, a man in the middle… a SWITCH, and not a very good one at that. You would be much better off with a real pass-through module or a real Cisco switch.

You indicate that the VSM needs to be accessible when an ESX host reboots – this is perfectly understandable. However, what happens if you deploy the VSM as a VM (which is supported), and for whatever reason you have to shutdown all of the hosts in a cluster? Does the first host to come up not get any VEM configuration, even if the VSM is one of the first VMs to power on? What does this then do to the remaining ESX hosts as they come back online (no VEM on the first host ostensibly means no communication either)? Does this mean that for a truly fault-tolerant deployment you would need to have a physical host for the VSM (possibly as a ‘standby’, with the primary VSM on a VM)?

Jody,
With the current beta release of Nexus 1000V, having your VSM as a VM is supported; however, having the VSM VM reside within the same ESX cluster that the VSM is managing is not supported, yet. This also means that the ESX host where the VSM resides is using the traditional VMWare vSwitch.

I believe the Nexus 1000V team at Cisco intends to eventually support a configuration where the VSM VM resides within the same ESX cluster it is also managing, however I cannot comment with certainty on when, if, and how that will happen.

Do you know if having a secondary VSM running as a VM in the managed cluster will be supported at all? This would be to cover for a failure or maintenance of the primary VSM on physical hardware, with the caveat that the ESX cluster would need to remain on-line until the primary VSM is restored to service.

I know that we will get this question from clients, so that is why I ask here.

So for the HP C-Class enclosure using the BL495c G5 blades that have dual 10G Ethernet NICs, what interconnect module would you recommend if it's not Virtual Connect? I'd like to take advantage of 10G uplinks to Nexus switches.

Dwight,
At this point you have a conundrum. The only thing you can insert into the HP C-Class that will provide a 10G server connection today is the HP Virtual Connect Flex-10. So if you absolutely need 10G to the individual blades today you have no choice in the matter.

HP claims to be a company of open standards and choice; however, they are certainly not giving you a choice in this matter, and Flex-10 is anything but standards based. Flex-10 is all HP proprietary technology.

So, you can make the decision to install Flex-10 today to provide a proprietary 10G link to your BL495c blade servers right now. But while you will have your 10G server today, you will also have locked yourself out of much better (non-proprietary) choices available in the coming months.

One of these choices coming soon is a Converged Network Adapter mezzanine card for HP C-Class. This will provide a standards based 10GE connection to the BL495c that provides both Ethernet and FC connectivity over a single cable connected to an external Nexus switch. This approach will result in less cabling, fewer switches, and lower power consumption than anything HP will be able to offer with Virtual Connect.

When Nexus 1000V is widely deployed at a later date, a Virtual Connect decision today can later become a source of regret as I have already articulated in comments above.

With HP Virtual Connect you will always need separate Ethernet and FC switches in the blade enclosure along with separate Ethernet and FC cabling infrastructures, resulting in more power consumption and more cabling. Furthermore, you will lose the ability to provide a clear delineation of operational roles and responsibilities with respect to VM creation and networking.

If I were you, I would stall my server purchase for now until I had more choices available in the next 3 to 6 months.

Thanks for the insight, and you are right, I’m in a bit of a pickle. Now if the HP CNA becomes a reality, won’t they need to introduce another rear connectivity module to support it? A pass through module would require 16 to 32 cables, but we also need cable consolidation as we don’t need (or want) 16x or 32x 10G bandwidth to our servers. Plus buying many times more 10G switch port line cards than we need would be a major expense. So some type of ‘transparent’ multiplexor would be ideal that provided a 4:1 port consolidation ratio.

Dwight,
You are right that an HP CNA mezzanine would need to be accompanied by a 10G pass-through or a 10G switch device that supports FCoE forwarding. For the pass-through approach keep in mind that by using the CNA you have already provided some cable consolidation. The 16 or 32 pass-through cables you mention would be supporting all network connectivity for the blades, both FC and Ethernet. Not much different than having both Ethernet and FC blade switches each with 8 uplinks, resulting in 16 cables. You could also link your CNA pass-through connections to a top of rack Nexus 5000, keeping most of the cabling within the rack and lowering your 10G server access port costs. The price per 10G port on Nexus 5000 ($875) is much lower than the current price per 10G port on Nexus 7000 linecards ($2100).

Your thought of a “transparent multiplexor” inside the blade enclosure is right on the money. Cisco refers to this type of device as a Fabric Extender. A Fabric Extender links to an upstream switch device (such as a Nexus 5000) and acts like a remote extension of that upstream switch, thus appearing like a “transparent multiplexor”.

Wouldn’t it be cool if the HP CNA mezzanine linked to a Fabric Extender device inside that blade enclosure? No more need for both Ethernet and FC switches, just the Fabric Extender, and you get the benefit of additional cable consolidation. 😉

This is something Cisco is capable of delivering. It’s just a matter of Cisco and HP getting the proper agreements in place to make it happen.

If a physical server has two CNAs or two NICs, does the Nexus 1000V provide any means for teaming/bonding of the two (or more) physical interfaces and presenting it as a single virtual NIC to the guest OS? For high availability we always use the HP teaming agent on physical servers and configure port bonding on the Cisco switch side.

Dwight,
The traffic between the VSM and VEM’s is light control plane traffic only, so having a 10G connection for your VSM would certainly work but 1G should be fine as well.

The NIC teaming/bonding is provided by the Nexus 1000V, so you will not need to fuss with HP teaming agents. Remember, the Nexus 1000V is a switch just like a real physical Cisco switch; it therefore views the server NICs as its uplinks, and the network administrator configures the bonding and redundancy just like he/she would on a physical switch.

The VMWare administrator decides how many virtual NICs the guest OS has when the VM is created (up to 4). This is no different than how things are done today, prior to Nexus 1000V.
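
A minimal sketch of an uplink Port Profile with NIC bonding on the Nexus 1000V might look like this (the profile name, VLAN list, and channel-group mode are assumptions, and the syntax varies by release):

```
! Hypothetical uplink profile on the VSM -- illustrative only
port-profile SYSTEM-UPLINK
  capability uplink                  ! marks this profile for physical server NICs
  switchport mode trunk
  switchport trunk allowed vlan 10,20,30
  channel-group auto mode on         ! bond the server NICs assigned to this profile
  no shutdown
  state enabled
```

The network administrator defines this once, and every physical NIC assigned to the profile is bonded and trunked the same way, with no per-host teaming agent to configure.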

Dwight,
HP made a critical error when trying to compare Nexus 1000V to Virtual Connect. Why? That document clearly showed the market that HP does not understand what the Nexus 1000V is and does. HP wrote that document under the false pretense that Nexus 1000V and Virtual Connect are somehow interchangeable and mutually exclusive, as if the customer needs to decide between Virtual Connect or Nexus 1000V. The reality is that Nexus 1000V and Virtual Connect satisfy two completely different functions. Virtual Connect represents the physical, ASIC based connectivity between physical blade servers within the enclosure; it's a physical SWITCH. Nexus 1000V represents the virtual switching function on the ESX hypervisor and provides the software based connectivity between virtual machines within the same physical server; it's a virtual machine switch.
Nexus 1000V is interchangeable and mutually exclusive with the VMWare vSwitch – not a physical switch such as HP Virtual Connect.

Wouldn't software switching in the 1000v have the possibility of higher latency than an ASIC based solution? What happens to the 1000v performance if the physical processors become heavily utilized?

Software switching between virtual machines on the same server is what is widely deployed and accepted today with the VMWare vSwitch. When Nexus 1000V replaces the vSwitch the performance will be no worse than what you have today with the vSwitch, and all the same ESX kernel level protections from high CPU will still apply.

Is the VNTag Cisco proprietary, or is it being included in an IEEE standard?

Both Cisco and VMWare jointly authored and proposed VNTag to the IEEE standards body. So yes, VNTag is on its way to being a standard. Keep in mind that Nexus 1000V has nothing to do with VNTag; don't let HP confuse you into thinking otherwise. VNTag represents a completely different solution for virtual machine networking that is not yet available: an external Nexus 5000 connecting directly to each virtual machine with VNTags. The Nexus 5000 hardware is ready, however the software and joint testing with VMWare is still in development.

For VM to VM traffic on the same physical server with a 1000v, are packets routed to a 5000v then back into the blade or can the switch send packets internally between VMs?

The Nexus 1000V, like the VMWare vSwitch, sends packets internally between virtual machines on the same server. There is no requirement of any external physical switch. This is another myth resulting in HP’s misunderstanding of how Nexus 1000V works.

VC is being pushed by HP like candy at a kindergarten…
There is however a way around the funny stuff happening inside the VC "world".
Expansion blades: http://h18004.www1.hp.com/products/blades/expansion/index.html
This would enable you to hook up a couple (or 4) blades in a simple PoC lab to try out various CNAs as they become available.
Then when HP chooses to ship a proper mezz card you may even be lucky enough to have already tested that particular CNA chip combo and the corresponding drivers on each OS (ESX, Xen, Hyper-V, etc.)
Not to mention you get a head start on performance tuning for bonding/tcp/nfs in the kernels/drivers of those setups…
Remember – we are going from 0.1 ms to 0.005 ms ping times
(and Windows still only shows a 1 ms ping minimum)

My question is – what CNAs have proved to work (best) with the various features of Cisco DCE ?

My question #2 is – can the 5020/5010s act as edge aggregation switches for old style FC switches (in the rack) and mesh up multiple pods over a DC Ethernet Nx10Gb backbone to typical SAN pods?
As in, you keep your old HBAs and the FC switch they are connected to – and as you migrate to CNAs you transport FCoE from edge FC fabric switches to your disk/controller racks over 5020/5010 FCoE-DCE distribution/core pipes?

So would you see any virtue in combining Flex-10 with 1000v switches? I want to keep the architecture as simple as possible, and given the lack of 10G pass through or CNA/Extenders for the C-Class and 495 servers, my options are few. I can use the FC-10 or the HP 10G 'generic' switch.

The last thing I want to do is require configuring the FC-10 AND 1000v for network changes, VLANs, etc. But the ability to use the FC-10 and break each 10G NIC into 4 bandwidth-tuned virtual NICs is appealing.

Dwight,
It's good news for HP and bad news for you … you don't have a choice. For the next 3 to 6 months (at least) only an HP switch can provide 10G to your BL495c servers. HP loves this, and they are using this window of opportunity to push Virtual Connect like "candy at a kindergarten" as Spidern said above.

Having said that, there is no technical reason I am aware of yet why Nexus 1000V would not work with an HP switch such as Virtual Connect Flex-10. Nexus 1000V can work with any upstream switch, even something non-Cisco.

If I were you I would buy the cheaper simple HP 10G switch, keeping my investment low and the configuration simple, while I make plans to move to the CNA Nexus approach when those options become available for the HP C-class.

Another approach is to buy rack optimized HP servers, rather than Blades. What are you getting from Blades that VMWare and CNAs with rack mount servers don't get you? A controversial question perhaps, and just some food for thought…

But with a CNA, you are only provided one NIC to VMware – two if you have two CNAs (obviously). This seems to be a major detriment to the deployment of CNAs, as VMware admins lose all insight into the traffic patterns of hosts communicating with the outside world.

We (everyone) have vSwitches that perform different functions – VM connectivity, Management, VMotion, Storage, etc. With CNAs, we would see only a pair of NICs with no insight into the tagged VLAN traffic.

How will Cisco/VMware address this?
Also, how will some security-related configurations be achievable when you need separate physical connectivity (i.e. management and vmotion to isolated network, VMs to DMZ environments)?

Joe,
Interesting you should ask as I was just contemplating drafting a follow up article on how to migrate from existing multi-1GE designs to 2xCNA designs for VMWare.
Grab my RSS feed and watch for such an article in the next few weeks: http://www.bradhedlund.com/feed/

Essentially with 2 x CNAs you currently have the option of using 802.1Q VLAN tagging to logically separate the VM traffic, Service Console, VMKernel, etc.
Later in the year you will have the option of using Network Interface Virtualization (NIV). This is when the CNA is able to be logically segmented into multiple virtual adapters using the PCI SR-IOV specification. The single CNA will appear as multiple NICs to the VMWare ESX host. The traffic from each virtual adapter is distinguished at the Nexus 5000 with VNTags, and the Nexus 5000 would have unique vEth interfaces corresponding to each virtual adapter. This would allow you to port your existing multi-1GE NIC design to a server with 2 x CNAs or 2 x 10GE NICs without any change to your existing design.
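
For the 802.1Q approach, a sketch of the logical separation on the VSM might look like the following (VLAN numbers and names are made up for illustration):

```
! Illustrative VLAN separation for a 2 x CNA design -- numbers are assumptions
vlan 10
  name VM-DATA
vlan 20
  name SERVICE-CONSOLE
vlan 30
  name VMKERNEL-VMOTION

port-profile VMOTION
  vmware port-group
  switchport mode access
  switchport access vlan 30          ! VMKernel traffic isolated on its own VLAN
  no shutdown
  state enabled
```

Each traffic class gets its own VLAN and its own Port Profile, all carried over the same pair of 10GE links.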

Also, how will some security-related configurations be achievable when you need separate physical connectivity (i.e. management and vmotion to isolated network, VMs to DMZ environments)?

This is quite interesting, because what you are saying is that you are OK with DMZ virtual machines and non-DMZ virtual machines residing on the same hypervisor, but at the same time you don't trust VLAN/VRF isolation of those VMs on the network. If physical separation of DMZ and non-DMZ virtual machines is so important, why do you have them running on the same server? Seems to me like driving 80 mph on the highway with no seatbelt but wearing a helmet.

In your diagram you show one "int Eth 1/1" and one "int vEth 1". If the physical server had 2 or more NICs, say 4 or 8, is there a one to one relationship between "int Eth" and "int vEth"? Or are "int vEth" only the ESX virtual instance of a NIC presented to a VM?

I would echo Joe's concern about separating security boundaries, such as a DMZ and internal networks, only by VLANs. I'm not an expert, but I recall hearing in security conferences that it's quite trivial to VLAN hop given the right knowledge, and that VLANs are not as strong a security boundary as physical cables are. I *will* safely assume VMware has properly isolated the guest VMs at the CPU and memory level, so I'm not worried about "VM hopping" when running a DMZ VM on the same server as an internal VM.

But our security folks would not rely on simple VLAN tags to provide ‘enterprise’ grade security isolation between security domains that physical cables provide. Just not happening!

What you describe regarding the virtual NICs for the CNAs sounds like what HP is trying to do today with their Flex-10, which presents one to four PCI-E NIC devices to the ESX hypervisor depending on what you want to do. Of course they aren’t doing CNA yet, so that doesn’t directly apply to the original question. But it sounds like a similar approach to chopping a 10G connection into smaller pipes and presenting them to the hypervisor.

One of our network engineers was commenting on how the Nexus 1000v should help a lot with spanning tree issues within and among blade enclosures. Can you comment at all on that topic?

but I recall hearing in security conferences that its quite trivial to VLAN hop given the right knowledge

This is possible in a physical network where no network security best practices have been applied. VLAN hopping is not possible in a network that has been properly configured.
Furthermore, it is not possible for a virtual machine to VLAN hop when connected to the Nexus 1000V, again if properly configured. Ultimately it is human error that exposes such vulnerabilities, which are later exploited. The same human error potential also applies to the ESX configuration, where a VM could be mistakenly connected to the wrong segment, just like a physical patch cable could be mistakenly connected to the wrong switch port. This is why I still find it fascinating when people trust the security of server virtualization more than network virtualization. Both are equally exposed to human error, and it is the server that is more often the target of attack, rather than the network.

Or are “int vEth” only the ESX virtual instance of a NIC presented to a VM?

Correct. An "int vEth" is created for each virtual NIC assigned to a VM, and "int vEth" interfaces are also provided for the Service Console and VMKernel interfaces. If you installed 4 physical NICs in a server and associated them to the Nexus 1000V, you would have, for example, int Eth1/1, int Eth1/2, int Eth1/3, and int Eth1/4, which would be treated as uplink connections for the Nexus 1000V. In server #2 it would be int Eth2/1, int Eth2/2, and so on.
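As a rough sketch of how that numbering would look on the VSM (a hypothetical configuration fragment; the interface descriptions and numbers are illustrative, not output from a real switch):

```
! Hypothetical Nexus 1000V VSM view.
! Module number = server (VEM), port number = physical NIC in that server.
interface Ethernet1/1
  description server-1, pNIC 1 (uplink)
interface Ethernet1/4
  description server-1, pNIC 4 (uplink)
interface Ethernet2/1
  description server-2, pNIC 1 (uplink)
interface Vethernet1
  description VM1 virtual NIC (stays vEth1 across VMotion)
```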

One of our network engineers was commenting on how the Nexus 1000v should help a lot with spanning tree issues within and among blade enclosures. Can you comment at all on that topic?

Like the VMWare vSwitch, the Nexus 1000V does not have any effect on Spanning Tree (positive or negative), as it employs the same protections as the vSwitch, such as not forwarding traffic out of one NIC that was received on another. Spanning Tree is a consideration with the physical blade switch inside the enclosure, and that applies to both Cisco and HP blade switches. HP claims that Virtual Connect doesn't affect spanning tree; however, I recently heard of a network meltdown story where the server administrator mistakenly picked the wrong port to mirror traffic on a Virtual Connect module and thus created a spanning tree loop in the data center. In all fairness, the same mistakes can be made with a Cisco switch. That whole human error thing again…

Great article on the 1000v and I'm excited to see how it turns out. However, I'm a Virtual Connect user, and in all of your above statements about Virtual Connect, you are focused on Ethernet (I'm also a CCNP and a fluent user of Cisco IOS and love it). Ethernet is only half of the Virtual Connect benefit. The other half is the SAN simplification. I never need to go back and rezone my switches or re-assign my LUN security when I put VC in an enclosure. This is a HUGE benefit in an enterprise data center. The SAN team has better things to do than re-zone because I moved something.

Jeff,
True that Virtual Connect allows the server admin to maintain the pWWN of a blade when it moves or fails. The same can be done with the Cisco FC switch as well; it's called Flex Attach.

At any rate, my larger point with Virtual Connect was that you will always need separate Ethernet and FC modules in the blade enclosure. This results in more points of management, more cabling, more power, and more cooling. These are the issues facing data centers today. When compared to a unified fabric design (a single blade switch that carries both FC and Ethernet over 10GE links), I'm not sure the Virtual Connect strategy is best in the long term. Of course, that is my biased opinion.

Margaret,
Thanks, I am aware of that switch. Allow me to clarify…
What I meant when I told Dwight he doesn’t have a choice is that he doesn’t have a choice in vendor options. It’s HP either way. Good news for HP, bad news for Dwight.

“Virtual Connect allows the server admin to maintain the pWWN of a blade when it moves or fails. Same can be done with the Cisco FC switch as well, its called Flex Attach. ”

Flex Attach uses NAT to maintain the same pWWN on the SAN at all times. I should say vWWN instead of pWWN, because it's not truly the server's pWWN; it's one generated by the Cisco FC switch. So you have 1 physical server and 2 WWNs to know it by: one on the SAN and one on the server. All server tools will show the real WWN on the server (Qlogic or Emulex), but the SAN admin sees a different WWN for that same server (the one generated by the switch). Yes, I can rip and replace a blade and have Flex Attach seamlessly handle the event, but I cannot cure the "disconnect" that now exists between my server team and my SAN team. The SAN team doesn't know or care anything about server slots or bays; they only care about WWNs. And the server team is not given access to the FC switch to tell the SAN admins that WWN AAA is really WWN XYZ on the SAN side. This would be a nightmare to maintain in a large enterprise.

VC handles this very seamlessly since only one pWWN is seen everywhere. It just seems like a better solution in my opinion.

Jeff,
On the flip side, with Virtual Connect you don't have F_port trunking and F_port channeling, two things which provide more FC bandwidth and redundancy and allow VMs within the same ESX host to reside on separate VSANs. You are also lacking ISSU with Virtual Connect.

At the end of the day though, these things are really IT-specific and tactical in nature, and not what the business really cares about. Data centers today are inefficient energy hog money pits. When no more power is available, a business must face the enormous cost of building and moving into a new facility. The status quo of multiple Ethernet and FC infrastructures must change. If you can reduce the power budget of a data center by 30% by deploying a unified fabric, that's 30% more power available for servers. And when your server power requirements increase by 15% a year, that's 2 years you have added to the life of that facility. That's 2 more years the business can depreciate assets on that facility. That's 2 more years the business can defer capital expenditures for a new facility, along with a 2-year deferral in the cost of obtaining that capital.
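That back-of-the-envelope claim checks out. A minimal sketch of the arithmetic (the 30% power savings and 15% annual growth figures are the assumptions stated above, not measured data):

```python
import math

def years_of_headroom(extra_power: float, annual_growth: float) -> float:
    """Years until compounding server power growth consumes the
    extra power freed up by a unified fabric. Illustrative only."""
    # (1 + growth)^years = (1 + extra power)  =>  solve for years
    return math.log(1.0 + extra_power) / math.log(1.0 + annual_growth)

years = years_of_headroom(0.30, 0.15)
print(round(years, 1))  # prints 1.9, i.e. roughly the "2 years" claimed above
```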

So, yeah, its nice to be a hero to the Server team because you saved them a phone call to the Storage team about a WWN.

For me though, I would prefer to be a hero to the CFO because I saved him a phone call to the bank. That’s when network and systems architecture has a real business impact.

I’m with you Brad – a converged network is a great solution, and I intend to move to it when it’s a reality for blade customers. Today that is not the case. We bought blades because they also are providing huge cost savings in power and cooling – and the life of my existing data center. So I’m already a hero with the CFO – I guess when I can put a converged network on the blades, they’ll have to make me CFO. I can’t wait!

a converged network is a great solution, and I intend to move to it when it’s a reality for blade customers. Today that is not the case.

Jeff,
I have been biting my lip this whole thread. There is so much I want to say right now, but I can't, not yet. Please keep your eyes out for a major Cisco public announcement sometime in March. This is the biggest and most exciting thing Cisco has done in decades.

I agree with Jeff that a converged fabric does have its advantages. And in 2010, when ratified standards-based products are widely available and ready for blade enclosures, we will look at it very closely. I'm glad Cisco is a leader in this area, but the various moving pieces and standards haven't all aligned to let blade customers deploy the technology *today*.

On the technical side, I have another question. If I have a blade enclosure (non-CNA) hosting several 1000v switches and external Nexus 5000 switches, what complications would there be if the external-facing blade chassis interconnect was a standard non-Cisco Ethernet switch (say, Nortel or brand X)?

1000v <-> brand X switch <-> Nexus 5000.

Would that sandwich cause any feature loss or additional complications that would hinder operation of *any* current or planned NX-OS features in a pure Ethernet environment?

From a functional perspective, it should work. The Nexus 1000V does not care what upstream switches exist in the network. However, having some non-Cisco switch in the middle is obviously not the ideal scenario from a troubleshooting and operations perspective.

You stated:
This is quite interesting because what you are saying that you are OK with DMZ virtual machines and non-DMZ virtual machines residing on the same hypervisor, but at the same time you don’t trust VLAN/VRF isolation of those VM’s on the network. If physical separation of DMZ and non-DMZ virtual machines is so important, why do you have them running on the same server? Seems to me like driving 80 mph on the highway with no seatbelt but wearing a helmet.

I understand your comment. We should have been more clear. We do not use VLAN/VRF for Internet-Facing DMZs, but rather, internal ‘security zones’ for various tiers of an application set (Web, DB, Application, Web Services, Security, Management, etc.).

VLAN/VRF separation is OK for Internal; Physical separation is required for Internet facing.

In any case, your information on the evolving capability for virtual NICs (NIV through PCI SR-IOV) is a move in the right direction, but one I can already achieve today through Xsigo and HP Virtual Connect (16 NICs and 4 NICs per HCA/VC-NIC, respectively), and probably with other technologies, too.

It is all a matter of time-to-market and maturity. I think the 1000V (when available) will be very helpful to the network team and should support almost any VMware network infrastructure. At present, though, in a blade or even rack-mount environment, CNAs and (Cisco) DCE carry a significant risk of having to rip and replace many server cards as the solutions evolve over the next 6-18 months.

Anyway, it is a very exciting time, but also very frustrating for those of us who have to buy now and need to consolidate I/O now (like tomorrow).

VLAN/VRF separation is OK for Internal; Physical separation is required for Internet facing.

Totally understand. And with such a policy I would expect internet-facing VMs to exist on physically isolated internet-facing servers. The manner in which a switch provides isolation at the VLAN/VRF level is no less secure than how a hypervisor provides isolation between VMs. That's why I chuckle when folks mix these VMs on one server but at the same time insist on separate NICs and switches for them.

your information on the evolving capability for virtual NICs (NIV through PCI SR-IOV) is a move in the right direction, but I can already achieve this today through Xsigo and HP Virtual Connect

NIV is not an apples-to-apples comparison with Virtual Connect. NIV will take what HP currently does with Virtual Connect a step further by being able to provide hundreds of virtual NICs per physical NIC. This will allow a VM to connect directly to its own virtual NIC on the physical NIC, bypassing the hypervisor altogether for network I/O, resulting in I/O performance rivaling bare-metal servers.

CNAs and (Cisco) DCE is a significant risk of having to rip-and-replace many server cards as the solutions evolve over the next 6-18 months.

I'm not suggesting customers rip and replace their existing infrastructure. A unified fabric is something you build from scratch in a new greenfield data center, or gradually evolve into at an existing facility as servers are life-cycled.

I totally understand your frustration with trying to piece all these various parts together.
Wouldn’t it be great if you could buy one complete computing system with the virtualization and unified fabric components built in from day one? 😉

If Cisco gets into the blade server business, people also have to remember the entire management and deployment infrastructure that goes with servers. HP, IBM and Dell all have had a decade or longer to build up their management and deployment tools.

Dwight,
Cisco doesn't enter a new market to provide a me-too product that's just more of the same. If Cisco does enter the blade server market, expect to see a whole new approach to systems management that completely changes the current paradigm that has existed for over a decade, as you pointed out.

"NIV is not an apples-to-apples comparison with Virtual Connect. NIV will take what HP currently does with Virtual Connect a step further by being able to provide hundreds of virtual NICs per physical NIC. This will allow a VM to connect directly to its own virtual NIC on the physical NIC, bypassing the hypervisor altogether for network I/O, resulting in I/O performance rivaling bare-metal servers."

Margaret,
With Flex-10, a virtual machine cannot connect directly to Flex-10; you still need a hypervisor switch. Flex-10 does not provide visibility and policy control for each individual virtual machine.
The advantage of NIV is that each individual virtual machine will be able to connect directly to a hardware-based switch (Nexus 5000), relieving the hypervisor CPU of processing network I/O, thereby allowing the hypervisor to host more virtual machines and thus gain better consolidation ratios. Not to mention better network I/O performance.

Hypothetical configuration: Small branch office, four ESX hosts (typical rackmount servers) with iSCSI storage, Nexus 1000v on each host. Vmotion and DRS is in use.

Does using a Nexus 5000 as the next upstream switch buy me anything in terms of ESX network port management? I realize that if I were in a CNA environment, having a Nexus 5000 upstream would be important.

Given that Nexus 5000 switches are relatively expensive for a branch office and I don't need 10Gb Ethernet, I was hoping no VM networking functionality would be lost by using a 'typical' workgroup Cisco Ethernet switch instead.

Gary,
That would be fine. No VM networking functionality is lost by connecting your ESX hosts with Nexus 1000V to a "typical" Cisco Ethernet switch. Just keep in mind you will need a Nexus 1000V VSM (either physical or virtual) at each small branch office. Some day it may be possible for a central VSM to manage remote VEMs across a WAN link, but for now the VSM and its VEMs are assumed to exist within the same local network.

Spidern,
Keep in mind the N1KV has nothing to do with a unified fabric (no dependencies there).
All that is needed for the linux community to adopt unified fabric is the appropriate device drivers for the CNA (which already exist).

I have no specific knowledge of the status, but I suspect negotiations are underway with mixed results between the three. Somehow I get the feeling the agreements with HP are a little more challenging than the others (unfortunately) …

It will certainly be interesting this year to see if (and how) HP adopts a unified fabric offering for their customer base.

I suspect HP will likely dismiss unified fabric as not really important for the data center …

I will thoroughly enjoy watching how this transpires in the coming months…

Are you kidding? All of the techno-babble in the world doesn’t hide very significant facts:

1 – VNTAG = 0 backward compatibility. Kidding right?
Without VNTAGs enabled = The frame will simply be discarded. Great for IT shops!
2 – VEPA is backward compatible! You don't mention it? Why?
In fact not a single person mentions it on this blog. Sounds like a resounding blast of ignorance!
3 – Enabling VNTAGS = All other switches are broken unless they are Cisco with VNTags enabled.
4 – This is Cisco's way of trying to regain their huge losses in the edge switch market. Simpleton response by a networking company. Don't make it better. Rip it all out and put new switches everywhere; and since Cisco decided to push the VNTAG standard, nobody else will immediately implement it as a standard.
5 – Cisco continues to make 60 points on every switch sold. OUCH!
Moore's law down the tubes!
6 – This is a Network only view. What about the existing investments in Storage, Servers, WAN accelerators and the like?
7 – FCoE – Even with NEW standards to support lossless Enet, it is still slower than Fibre Channel.
8 – Cisco promises a 2-to-1 cable reduction! Are you kidding? I am already at a 4-to-1 reduction with my solution! I will go backwards with FCoE!

Your Network only view is pitiful. But I am sure the Cisco Kool-Aide drinkers will lap it up….

Brad, you are obviously a Cisco Stockholder. All this really adds up to is mo’ money for Cisco and no real progress for IT.

Wrong. A switch that supports VNTag (such as Nexus) works perfectly fine with switches or NICs that do not have VNTag support.

2 – VEPA is backward compatible! You don't mention it? Why?

A switch with VNTag support is backward compatible too, so what’s your point?

3 – Enabling VNTAGS = All other switches are broken unless they are Cisco with VNTags enabled.

Ouch, you're 0 for 3. Wrong again! VNTags are link-by-link specific (NIC-to-Switch, or Fabric_Extender-to-Switch) and are not forwarded upstream to other switches. So, enabling VNTag between a server NIC and a switch means other switches continue to work just fine.

4 – This is Cisco's way of trying to regain their huge losses in the edge switch market. Simpleton response by a networking company. Don't make it better. Rip it all out and put new switches everywhere; and since Cisco decided to push the VNTAG standard, nobody else will immediately implement it as a standard.

As stated above, your "Rip it all out" comment is derived from being uninformed – no need to go there again. As for "Cisco decided to push the VNTag standard" — one point of clarification: VMWare is also pushing VNTag as a standard.

6 – This is a Network only view. What about the existing investments in Storage, Servers, WAN accelerators and the like.

Where does it say that everything in the Data Center needs to be ripped out? The suggestion that implementing Cisco Nexus switches requires trashing all other investments in the Data Center is either rooted in ignorance, the intent to mislead, or both.

7 – FCoE – Even with NEW standards to support lossless Enet is still slower than Fibre Channel.

Wow, wrong again. Gen2 CNAs from Emulex/Qlogic, and even Cisco's adapter in UCS, can all forward Fibre Channel at 10Gbps.

8 – Cisco promises a 2-to-1 cable reduction! Are you kidding? I am already at a 4-to-1 reduction with my solution! I will go backwards with FCoE!

Today a server can have a single adapter, linked to single switch, with a single cable providing 10Gbps Eth and FC. Can your solution do that? Yeah, didn’t think so.

Brad, you are obviously a Cisco Stockholder

You’re right. Even better, I’m also a Cisco employee, and I clearly state that for every reader to see in the “About the Author” section at the top of every page.

Your Network only view is pitiful. But I am sure the Cisco Kool-Aide drinkers will lap it up….

Note to readers: This juvenile post was made by an employee of Hewlett-Packard.

I should point out that some other comments in this article were made by HP employees in a mature discussion of facts and philosophical debate. Something I expect from a large and successful company like HP. This poster, unfortunately, did not represent HP very well.

I always welcome a spirited debate with adults … let me know when you are ready for that.

You mention that an FCoE initiator is available for the Intel Oplin 10GE NICs. I searched for that quite a lot but could not find anything useful. There is the open-sourced code on open-fcoe.org, but I need a driver for Windows Server 2008 or VMWare ESX. I've seen that Microsoft has launched an FCoE logo program, and they will issue the first certificates only in 2010. VMWare lists only the first-generation CNAs on its HCL, which have standalone HBA and 10GE chipsets on them.

Do I have an Intel FCoE option today, or are the only candidates the hybrid adapters from Qlogic or Emulex? Are their second-generation CNAs already shipping?

Do I have an Intel FCoE option today, or are the only candidates the hybrid adapters from Qlogic or Emulex? Are their second-generation CNAs already shipping?

The Intel Oplin 10GE adapter has software FCoE capabilities, and broad operating system support is forthcoming. As of right now, the Qlogic and Emulex CNAs are your certified and supported options for FCoE. AFAIK, the Gen2 CNAs from Emulex and Qlogic should be shipping this summer (very soon).

I'm in InfoSec (financial services) and have been a long-time Cisco user.

Just purchased the 5010 lab bundle and wanted to point out some gotchas to anyone reading.

Gotcha #1: Make sure you order the FCoE bundle. You need the FCoE card AND the FCoE license (which costs $12K). If you order the Ethernet bundle, you just bought a pretty white switch.

Gotcha #2: If, like me, you are running a non-Cisco-UCS blade center, MAKE sure that your vendor makes a CNA mezzanine card to replace the Ethernet mezzanine card.

Let's just say that gotcha #2 somehow never made it into the initial architecture discussions, and I now have really pretty $39K boat anchors (yes, they are being returned unless Cisco and my blade center vendor both say that a mezzanine card is shipping in the next month and I can have them).

So for all of you out there who were really hyped on 10GigE FCoE back to the SAN and blade center, ask the fine-print questions, because as I have found out, the reps barely know at this point.

They are still working out the kinks, I guess, with a new product (and I like my Cisco reps).

NOW, on a side note: I did go to the labs to play with the N7K and N5K, and it is pretty cool. So I will most likely revisit it again once I can actually use them in the manner they were designed for.

Do I have to use the Nexus 1000V in order to see VM-to-VM traffic inside an ESX host? Can I get away with buying the 5010 and making my intra-ESX traffic go through it, without the 1000V, for sniffing traffic? I don't want to pay for the Enterprise Plus license…

How would you force all VM-to-VM traffic on the same ESX host through a Nexus 5000? The reality is that any VMs on the same ESX host, on the same VLAN, will communicate directly via the vSwitch, and you will not see that traffic at the Nexus 5000.

[…] at the InternetExpert.org blog there is a post by Brad Hedlund revealing a bit more about the Nexus 1000V and FCoE in a VMware ESX 4.0 environment. As I mentioned in a previous post I’m not in a position to […]

[…] Combined with the ever growing capacity of CPU and RAM in servers, this will result in VM host monsters. But how are all these new technologies going to integrate with one another? Thankfully Brad Hedlund, a Consulting System Engineer with Cisco and CCIE, has written an article to explain this in detail. You can read about it here. […]