
In vCloud Director 8.10 there is a massive improvement in the deployment (and configuration) speed of Edge Gateways. This is especially noticeable in use cases where a large number of routed vApps must be provisioned in as short a time as possible – for example, nightly builds for testing, or labs for training purposes. But it is also important for customer onboarding – the time-to-login-to-a-cloud-VM-from-the-swipe-of-the-credit-card SLA.

Theory

How is the speed improvement achieved? It is actually not really a vCloud Director accomplishment. The deployment and configuration of Edge Gateways have always been done by vShield Manager or NSX Manager. However, there is a big difference in how vShield Manager and NSX Manager communicate with the Edge Gateway to push its configuration (IP addresses, NAT, firewall and other network services configurations).

As the Edge Gateway can be deployed to any network, which can be completely isolated from any external traffic, its configuration cannot be done over the network; instead, an out-of-band communication channel must be used. vShield Manager has always used the VIX API (Guest Operations API), which involves communication with vCenter Server, the hostd process on the ESXi host running the Edge Gateway VM, and finally VMware Tools running in the Edge Gateway VM (see this older post for more detail).

NSX Manager uses a different mechanism. As long as the ESXi host is properly prepared for NSX, a message bus connection is established between NSX Manager and the vsfwd user space process on the ESXi host. From there, the configuration is pushed to the Edge Gateway VM over a VMCI channel.

Prerequisites

There are necessary prerequisites for using the faster message bus communication as opposed to the VIX API. If any of them is not fulfilled, the communication mechanism falls back to the VIX API.

The host running the Edge Gateway must be prepared for NSX. So if in vCloud Director you use solely VLAN-backed (or even VCDNI-backed) network pools and you skipped the NSX preparation of the underlying clusters, message bus communication cannot be used, as the host is missing the NSX VIBs and the vsfwd process.

The Edge Gateway must be version 6.x. It cannot be a legacy Edge version 5.5 deployed by older vCloud Director releases (8.0, 5.6, etc.). vCloud Director 8.10 deploys Edge Gateway version 6.x; however, existing Edges deployed before the upgrade to 8.10 must be redeployed in vCloud Director or upgraded in NSX (read this whitepaper for a script that does it all at once).

Obviously, NSX Manager must be used (as opposed to vShield Manager) – vCloud Networking and Security is no longer supported with vCloud Director 8.10 anyway.

Performance Testing

I have done quick proof-of-concept testing to see the relative improvement between the older and the newer deployment mechanism.

I used three different combinations of the same environment (upgrading from one combination to the next):

vCloud Director 5.6.5 + vCloud Networking and Security 5.5.4

vCloud Director 8.0.1 + NSX 6.2.3 (uses legacy Edges)

vCloud Director 8.10 + NSX 6.2.3 (uses NSX Edges)

All three combinations used the same hardware and the same vSphere 5.5 environment with nested ESXi hosts, so the point is to look at the relative differences rather than the absolute deployment times.

Using PowerCLI, I measured the sequential deployment speed of 10 vApps with one isolated network and 10 vApps with one routed network, with multiple runs to calculate the average per vApp. The first scenario measures the differences in provisioning speed of VXLAN logical switches, to see the impact of the controller-based control plane mode. The second adds the provisioning of an Edge Gateway on top of the logical switch. The vApps were otherwise empty (no VMs).
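The measurement was done in PowerCLI and the script is not reproduced here; a minimal Python sketch of the equivalent timing harness looks like this (the `deploy_vapp` callable is a hypothetical stand-in for the actual vApp instantiation call):

```python
import time

def average_deployment_time(deploy_vapp, count=10, runs=3):
    """Deploy `count` vApps sequentially per run, repeat `runs` times,
    and return the average wall-clock seconds per vApp.

    `deploy_vapp` is a hypothetical callable standing in for the real
    vCloud API instantiation call; it receives the vApp index.
    """
    samples = []
    for _ in range(runs):
        start = time.monotonic()
        for i in range(count):
            deploy_vapp(i)  # stand-in for the real deployment call
        samples.append((time.monotonic() - start) / count)
    return sum(samples) / len(samples)
```

Averaging over several runs smooths out one-off effects such as OVF cache warm-up on the first deployment.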

Note: If you want to run a similar test in your environment, I captured the two empty vApps, containing only the routed or the isolated network, to a catalog with the vCloud API (PowerCLI), as this cannot be done from the vCloud UI.

Here are the average deployment times of each vApp.

vCloud Director 5.6.5 + vCloud Networking and Security 5.5.4

Isolated: 5-5.5 s

Routed: 2:17 min

vCloud Director 8.0.1 + NSX 6.2.3

Isolated: approx. 6.8 s (Multicast), 7.5 s (Unicast)

Routed: 2:20 min

vCloud Director 8.10 + NSX 6.2.3

Isolated: 7.7 s (Multicast), 8.1 s (Unicast)

Routed: 1:35 min

While the speed of logical switch provisioning goes down a little with NSX and with the Unicast control plane mode, Edge Gateway deployment gets a massive boost with NSX and vCloud Director 8.10. Although the OVF deployment of the NSX Edge takes a little longer (30 s instead of 20 s), the configuration more than makes up for it (down from well over a minute to about 30 s).

Just for comparison, here are the tasks performed during the deployment of each routed vApp, as reported in the vSphere Client Recent Tasks window.


In April I wrote a whitepaper describing all the considerations that need to be taken into account when upgrading vCloud Networking and Security to NSX in vCloud Director environments. I have updated the whitepaper to include additional information related to new releases:

updates related to the vCloud Director 8.10 release

updates related to the VMware NSX 6.2.3 release

updates related to the vCenter Chargeback Manager 2.7.1 release

NSX Edge Gateway upgrade script example

extended upgrade scenario to include vCloud Director 8.10

The whitepaper will be posted later this month on the vCloud Architecture Toolkit for Service Providers website; until then, it can be downloaded from the link below.

VMware vCloud Director® relies on VMware vCloud® Networking and Security or VMware NSX® for vSphere® to provide abstraction of the networking services. Until now, both platforms could be used interchangeably because they both provide the same APIs that vCloud Director uses to provide networks and networking services.
The vCloud Networking and Security platform end-of-support (EOS) date is 19 September 2016. Only NSX for vSphere will be supported with vCloud Director after the vCloud Networking and Security end-of-support date.
To secure the highest level of support and compatibility going forward, all service providers should migrate from vCloud Networking and Security to NSX for vSphere. This document provides guidance and considerations to simplify the process and to understand the impact of changes to the environment.
NSX for vSphere provides a smooth, in-place upgrade from vCloud Networking and Security. The upgrade process is documented in the corresponding VMware NSX Upgrade Guides (versions 6.0, 6.1, and 6.2). This document is not meant to replace these guides. Instead, it augments them with specific information that applies to the usage of vCloud Director in service provider environments.


vCloud Director based clouds support non-disruptive maintenance of the underlying physical hosts. They can be patched, upgraded or completely exchanged without any impact on the customer workloads, all thanks to vMotion and DRS Maintenance Mode, which can evacuate all running, suspended or powered-off workloads from an ESXi host.

Many service providers are going to be upgrading their networking platform from vCloud Networking and Security (vCNS) to NSX. Besides upgrading the Manager and deploying new NSX Controllers, this upgrade requires upgrading all hosts with new NSX VIBs, which results in the need to reboot every host in the service provider environment.

Depending on the number of hosts, their size and the vMotion network throughput, evacuating each host can take 5-10 minutes, and the reboot can add another 5 minutes. So, for example, a sequential reboot of 200 hosts could result in a full-weekend-long maintenance window. However, as I mentioned, these reboots can be done non-disruptively without any impact on customers – so no maintenance window is necessary and no SLA is breached.

So how do you properly reboot all hosts in a vCloud Director environment?

While vSphere maintenance mode helps, it is important to coordinate it properly with vCloud Director.

Before a host is put into vSphere maintenance mode, it should be disabled in vCloud Director, which makes sure vCloud Director does not try to communicate with the host, for example for image uploads.

All workloads (not just running VMs) must be evacuated during the maintenance mode. Otherwise, a customer who decides to power on or clone a VM that is registered to a rebooting (and temporarily unavailable) host would be impacted.

So here is the correct process (omitting the parts that actually lead to the need to reboot the hosts):

Make sure that the cluster has enough capacity to temporarily run without one host (it is very common to have at least N+1 HA redundancy).

Disable the host in vCloud Director.

Put the host into vSphere maintenance mode, evacuating all (running, suspended and powered-off) workloads.

Reboot the host and wait for it to come back up.

Exit maintenance mode, re-enable the host in vCloud Director and move on to the next host.

As a quick proof of concept, I am attaching a PowerCLI script that automates this. It needs to talk to both vCloud Director and vCenter Server, therefore replace the Connect strings at the beginning to match your environment.
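The script itself is attached to the original post; the ordering it has to enforce can be sketched in Python with the vCloud Director and vCenter calls stubbed behind simple client objects (all method names below are hypothetical placeholders, not real SDK calls):

```python
def reboot_host(vcd, vc, host_name):
    """Reboot one ESXi host without impacting vCloud Director tenants.

    `vcd` and `vc` are hypothetical client objects wrapping the vCloud
    API and the vSphere API; only the ordering of the calls matters.
    """
    vcd.disable_host(host_name)        # stop VCD talking to the host first
    vc.enter_maintenance_mode(         # evacuate ALL workloads,
        host_name, evacuate_all=True)  # not just running VMs
    vc.reboot(host_name)
    vc.wait_until_up(host_name)        # host must be back before exiting
    vc.exit_maintenance_mode(host_name)
    vcd.enable_host(host_name)         # hand the host back to VCD
```

Running this sequentially over all hosts in a cluster, one host at a time, keeps the N+1 capacity assumption intact.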

Michael Belohaubek sent me his modified script, which is more user friendly, takes into account VMware Tools installations that could potentially block host maintenance mode, and also handles HA Admission Control set to failover hosts.


In order to access external network resources from internal Org VDC networks, a Source NAT (SNAT) rule must be created on the Edge Gateway, which translates an internal IP address to a sub-allocated IP address of a particular external interface.

The internal source IP address can be entered in these formats:

Single IP address

Range of IP addresses

CIDR format

As you can see, it is not possible to enter 'Any' as it is with the firewall rules configuration.
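For illustration, the three accepted formats can be distinguished with a few lines of Python using the standard ipaddress module (the classification logic here is mine, not the actual vCloud UI validation):

```python
import ipaddress

def parse_snat_source(value):
    """Classify an SNAT internal-source entry as one of the three
    accepted formats: single IP, IP range, or CIDR."""
    if "-" in value:  # range, e.g. 10.0.0.1-10.0.0.10
        lo, hi = (ipaddress.ip_address(p.strip()) for p in value.split("-"))
        return ("range", lo, hi)
    if "/" in value:  # CIDR, e.g. 10.0.0.0/24
        return ("cidr", ipaddress.ip_network(value, strict=False))
    return ("single", ipaddress.ip_address(value))
```

Note that a /0 CIDR parses to a network covering every IPv4 address, which is what makes it usable as an 'Any' substitute where it is accepted.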

After investigating which option would be the easiest to use, this is what I found out:

In case the Edge Gateway is deployed by NSX Manager, it is possible to use the following CIDR entry: 0.0.0.0/0.

Unfortunately, this does not work with an Edge Gateway deployed by vShield Manager (vCNS), where the Edge configuration fails with the following error: