vSphere

In this blog post, we go into the trenches of the (Distributed) vSwitch with a focus on the vSphere ESXi network IOChain. It is important to understand the core constructs of the vSphere networking layers, for example when troubleshooting connectivity issues. In a second blog post on this topic, we will look closer at virtual network troubleshooting tooling.

IOChain

The vSphere ESXi network IOChain is a framework that provides the capability to insert functions into the network data path, regardless of whether a vSphere Standard Switch (VSS) or a vSphere Distributed Switch (VDS) is used. The IOChain is a group of functions that provides connectivity between ports and the vSwitch. Each port has two IOChains, one per traffic direction: an input IOChain and an output IOChain. This allows for a modular approach, because an IOChain only includes the optional elements that the user has configured.

Examples of optional elements in an IOChain are VLAN support, NIC teaming, and traffic shaping. Looking at the high-level components in an ESXi network IOChain, we differentiate between the port group, the vSwitch (VSS or VDS) and the uplink level.
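To make the modular idea a bit more tangible, here is a minimal Python sketch of a chain that only contains the elements a user has configured. This is purely a conceptual model and not how the VMkernel implements it: the names `Packet`, `vlan_tagger`, `size_limiter`, `build_chain` and `run_chain` are all hypothetical, and `size_limiter` is only a crude stand-in for a real traffic shaper.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Optional


@dataclass
class Packet:
    payload: bytes
    vlan_id: Optional[int] = None
    trace: List[str] = field(default_factory=list)  # elements that touched the packet


# A chain element takes a packet and returns it (possibly modified),
# or returns None to drop it.
Element = Callable[[Packet], Optional[Packet]]


def vlan_tagger(vlan_id: int) -> Element:
    """Hypothetical element that tags every packet with a VLAN ID."""
    def tag(pkt: Packet) -> Optional[Packet]:
        pkt.vlan_id = vlan_id
        pkt.trace.append(f"vlan:{vlan_id}")
        return pkt
    return tag


def size_limiter(max_bytes: int) -> Element:
    """Hypothetical stand-in for a shaper: drops oversized payloads."""
    def limit(pkt: Packet) -> Optional[Packet]:
        if len(pkt.payload) > max_bytes:
            return None
        pkt.trace.append(f"limit:{max_bytes}")
        return pkt
    return limit


def build_chain(vlan_id: Optional[int] = None,
                max_bytes: Optional[int] = None) -> List[Element]:
    """Only configured options end up as elements in the chain."""
    chain: List[Element] = []
    if vlan_id is not None:
        chain.append(vlan_tagger(vlan_id))
    if max_bytes is not None:
        chain.append(size_limiter(max_bytes))
    return chain


def run_chain(chain: List[Element], pkt: Packet) -> Optional[Packet]:
    """Pass the packet through each element in order; stop if it is dropped."""
    current: Optional[Packet] = pkt
    for element in chain:
        current = element(current)
        if current is None:
            return None
    return current
```

A port without any optional configuration simply gets an empty chain, which is the point of the modular design: nothing is processed that was not asked for.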

Port group level

This is where an optionally configured VLAN is interpreted by the VLAN filter, allowing for VLAN dot1q tags for your port group. The security settings Promiscuous mode, MAC address changes, and Forged transmits are also set at the port group level. The user can also optionally configure traffic shaping: egress only when using a VSS, or bi-directional when using a VDS.

vSwitch (VSS or VDS) level

Incoming packets at the vSwitch level are forwarded to their destination using the forwarding engine. The forwarding engine contains port information paired with MAC address information. Its job is to send the traffic to its proper destination. That can be either a VM residing on the same ESXi host or an external host.
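The forwarding decision described above can be pictured as a lookup in a table that pairs MAC addresses with ports. The following is a minimal sketch under that assumption; the `ForwardingTable` class, the port names and the `UPLINK` constant are hypothetical, and details like broadcast or multicast handling are left out on purpose.

```python
from typing import Dict

UPLINK = "uplink"  # hypothetical label for "send it out to the physical network"


class ForwardingTable:
    """Conceptual model of port information paired with MAC addresses."""

    def __init__(self) -> None:
        self._mac_to_port: Dict[str, str] = {}

    def register(self, mac: str, port: str) -> None:
        """Record which local port a MAC address is connected to."""
        self._mac_to_port[mac.lower()] = port

    def lookup(self, dst_mac: str) -> str:
        """Return the local port for a known MAC, otherwise the uplink."""
        return self._mac_to_port.get(dst_mac.lower(), UPLINK)
```

A destination MAC that belongs to a VM on the same host resolves to a local port, so the packet never has to leave the ESXi host. Anything else is handed to the uplink.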

The teaming engine is responsible for balancing network packets over the uplink interfaces. The way it does so depends on the teaming configuration chosen by the user. The traffic shaper module is added to the IOChain if enabled at the port group level.
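As an illustration of what a teaming decision could look like, here is a small sketch of a port-ID-based policy. It assumes the often-used simplification that the uplink is picked as the virtual port ID modulo the number of active uplinks. That is an assumption for the sake of the example, not a statement about the exact ESXi selection logic, and the function name `select_uplink` is hypothetical.

```python
from typing import List


def select_uplink(port_id: int, active_uplinks: List[str]) -> str:
    """Pick an uplink for a virtual port (simplified, assumed algorithm)."""
    if not active_uplinks:
        raise ValueError("no active uplinks available")
    # Every packet from the same virtual port lands on the same uplink.
    return active_uplinks[port_id % len(active_uplinks)]
```

With two uplinks `["vmnic0", "vmnic1"]`, port 4 maps to `vmnic0` and port 5 to `vmnic1`. The traffic of a single port is not spread over multiple uplinks in this model.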

Uplink level

At this level, traffic sent from the vSwitch to an external host finds its way to the driver module. This is where all hardware offloading takes place. Which hardware offloading features are supported depends strongly on the physical NIC in combination with a specific driver module. Hardware offloading functions that are typically supported by NICs are TCP Segmentation Offload (TSO), Large Receive Offload (LRO), and Checksum Offload (CSO). Offloading for network overlay protocols like VXLAN and Geneve, as used in NSX-v and NSX-T respectively, is widely supported on modern NICs.

Next to hardware offloading, the buffer mechanisms come into play at the uplink level. For example, ring buffers are used when processing a burst of network packets. Finally, the bits are handed to the DMA controller, to be handled by the CPU and the physical NIC onwards to the Ethernet fabric.
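For readers who are not familiar with ring buffers, the sketch below shows the general idea: a fixed number of slots that is reused in a circular fashion, where a burst that outruns the consumer results in drops. This is a generic textbook ring buffer with hypothetical names, not the actual data structure of any NIC driver.

```python
from typing import Any, List, Optional


class RingBuffer:
    """Fixed-size circular buffer that drops new packets when full."""

    def __init__(self, size: int) -> None:
        if size <= 0:
            raise ValueError("size must be positive")
        self._slots: List[Any] = [None] * size
        self._head = 0    # index of the next packet to read
        self._count = 0   # number of packets currently queued
        self.dropped = 0  # packets rejected because the ring was full

    def push(self, pkt: Any) -> bool:
        """Queue a packet; return False (and count a drop) if the ring is full."""
        if self._count == len(self._slots):
            self.dropped += 1
            return False
        tail = (self._head + self._count) % len(self._slots)
        self._slots[tail] = pkt
        self._count += 1
        return True

    def pop(self) -> Optional[Any]:
        """Take the oldest packet from the ring, or None if it is empty."""
        if self._count == 0:
            return None
        pkt = self._slots[self._head]
        self._slots[self._head] = None
        self._head = (self._head + 1) % len(self._slots)
        self._count -= 1
        return pkt
```

A larger ring absorbs a bigger burst before packets are dropped, at the cost of more memory.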

Standard vSwitch

The following diagram puts all components together to form the IO chain for vSphere networking using a standard vSwitch:

To enforce bandwidth availability, it is possible to reserve a portion of the available uplink bandwidth using Network I/O Control (NIOC). It may be necessary to configure bandwidth reservations to meet business requirements with regard to network resource availability. You can configure reservations in the system traffic overview, under the resource allocation option in the Distributed vSwitch settings. Reservations are set per system traffic type or per VM.

Depending on your IT architecture, it could make sense to reserve bandwidth for specific business-critical workloads, the vSAN network, or an IP storage network backend. However, be aware that network bandwidth allocated in a reservation cannot be consumed by other network traffic types. Even when a reservation is not used to the fullest, NIOC does not redistribute the capacity to the bandwidth pool that is accessible to different network traffic types or network resource pools.

Since you cannot overcommit bandwidth reservations by default, you should be careful when applying reservations to ensure no bandwidth goes to waste. Think through the minimum reservation that you are required to guarantee for each network traffic type.

For NIOC to be able to guarantee bandwidth for all system traffic types, you can only reserve up to 75% of the bandwidth relative to the minimum link speed of the uplink interfaces.

A reservation guarantees network bandwidth for a network traffic type or VM. It is the minimum amount of bandwidth that is accessible. Unlike a limit, a reservation does not state a maximum consumable amount of bandwidth, so a network resource can burst beyond the configured value of its bandwidth reservation.

You cannot exceed the maximum reservation allowed. NIOC always keeps aside 25% of the bandwidth per physical uplink to ensure basic ESXi network necessities like management traffic. As seen in the screenshot above, a 10GbE network adapter can only be configured with reservations up to 7.5 Gbit/s.
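The 75% rule is easy to express as a small calculation. The sketch below only models the arithmetic described above: the reservable bandwidth is 75% of the slowest uplink, and the sum of all reservations has to fit within it. The function names are hypothetical and the code does not talk to vCenter in any way.

```python
from typing import Iterable

RESERVABLE_FRACTION = 0.75  # 25% is always kept aside per physical uplink


def max_reservation_gbps(uplink_speeds_gbps: Iterable[float]) -> float:
    """Maximum total reservation, relative to the minimum uplink speed."""
    return RESERVABLE_FRACTION * min(uplink_speeds_gbps)


def can_reserve(requested_gbps: Iterable[float],
                uplink_speeds_gbps: Iterable[float]) -> bool:
    """Check whether a set of reservations fits within the reservable bandwidth."""
    return sum(requested_gbps) <= max_reservation_gbps(uplink_speeds_gbps)
```

For a host with only 10GbE uplinks, `max_reservation_gbps([10, 10])` gives 7.5, matching the example above. Mixing in a 1GbE uplink brings the value down to 0.75, because the minimum link speed is leading.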

Bandwidth Reservation Example

vSphere network quality control features like Network I/O Control (NIOC) are focused on the virtual networking layer within a VMware virtual data center. But what about the physical network layer, and how can the two cooperate?

In converged infrastructures or enterprise networking environments, Quality of Service (QoS) is commonly configured in the physical network layers. QoS is the ability to provide different priorities to network flows, or to guarantee a certain level of performance to a network flow by using tags. In vSphere 6.7, you have the ability to create flow-based traffic marking policies to mark network flows for QoS.

Quality of Service

vSphere 6.7 supports Class of Service (CoS) and Differentiated Services Code Point (DSCP). Both are QoS mechanisms used to differentiate traffic types to allow for policing network traffic flows.

As related to network technology, CoS is a 3-bit field that is present in an Ethernet frame header when 802.1Q VLAN tagging is present. The field specifies a priority value between 0 and 7, more commonly known as CS0 through CS7, that can be used by quality of service (QoS) disciplines to differentiate and shape/police network traffic. Source: https://en.wikipedia.org/wiki/Class_of_service
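To show where the 3-bit CoS value lives, the following sketch splits the 16-bit Tag Control Information (TCI) field of an 802.1Q tag into its three parts: the priority (PCP, 3 bits), the drop eligible indicator (DEI, 1 bit) and the VLAN ID (12 bits). The function names are our own, and the example works on a plain integer rather than on a captured frame.

```python
from typing import Tuple


def parse_tci(tci: int) -> Tuple[int, int, int]:
    """Split a 16-bit 802.1Q TCI value into (priority, dei, vlan_id)."""
    if not 0 <= tci <= 0xFFFF:
        raise ValueError("TCI must fit in 16 bits")
    priority = (tci >> 13) & 0x7   # upper 3 bits: the CoS value, 0 through 7
    dei = (tci >> 12) & 0x1        # 1 bit: drop eligible indicator
    vlan_id = tci & 0x0FFF         # lower 12 bits: VLAN ID
    return priority, dei, vlan_id


def build_tci(priority: int, vlan_id: int, dei: int = 0) -> int:
    """Combine priority, DEI and VLAN ID into a 16-bit TCI value."""
    if not 0 <= priority <= 7:
        raise ValueError("priority must be between 0 and 7")
    if not 0 <= vlan_id <= 0x0FFF:
        raise ValueError("VLAN ID must fit in 12 bits")
    if dei not in (0, 1):
        raise ValueError("DEI must be 0 or 1")
    return (priority << 13) | (dei << 12) | vlan_id
```

Because the priority bits are part of the VLAN tag, a CoS value can only travel with frames that are actually tagged.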

One of the main differentiators is that CoS operates at the data link layer (layer 2) in an Ethernet-based network, while DSCP operates at the IP network layer (layer 3).

Differentiated services or DiffServ is a computer networking architecture that specifies a simple and scalable mechanism for classifying and managing network traffic and providing quality of service (QoS) on modern IP networks. DiffServ uses a 6-bit differentiated services code point (DSCP) in the 8-bit differentiated services field (DS field) in the IP header for packet classification purposes. Source: https://en.wikipedia.org/wiki/Differentiated_services
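The relation between the 6-bit DSCP value and the 8-bit DS field can be shown with a bit of shifting. In the sketch below, the upper six bits hold the DSCP value and the remaining two bits are the ECN bits. The function names are hypothetical, and the code operates on a plain integer instead of a real IP header.

```python
from typing import Tuple


def split_ds_field(ds_field: int) -> Tuple[int, int]:
    """Split the 8-bit DS field into (dscp, ecn)."""
    if not 0 <= ds_field <= 0xFF:
        raise ValueError("DS field must fit in 8 bits")
    return ds_field >> 2, ds_field & 0x3


def make_ds_field(dscp: int, ecn: int = 0) -> int:
    """Combine a 6-bit DSCP value and 2 ECN bits into the 8-bit DS field."""
    if not 0 <= dscp <= 63:
        raise ValueError("DSCP must be between 0 and 63")
    if not 0 <= ecn <= 3:
        raise ValueError("ECN must be between 0 and 3")
    return (dscp << 2) | ecn
```

As an example, DSCP 46 (Expedited Forwarding) shows up as `0xB8` when you look at the full DS byte in a packet capture.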

When a traffic marking policy is configured for CoS or DSCP, its value is advertised towards the physical layer to create an end-to-end QoS path.

Traffic marking policies are configurable on Distributed port groups or on the DvUplinks. To match certain traffic flows, a traffic qualifier needs to be set. This can be done by matching very specific traffic flows, using specific IP addresses and TCP/UDP ports, or by using a selected traffic type. The qualifier options are extensive.
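Conceptually, a traffic marking policy is a list of rules where each rule combines a qualifier with the tag to apply. The sketch below models that idea with a qualifier on destination network, protocol and destination port. All class and field names, as well as the IP addresses and tag values in the usage note, are hypothetical and do not reflect the vSphere API.

```python
from dataclasses import dataclass
from ipaddress import ip_address, ip_network
from typing import Dict, List, Optional


@dataclass(frozen=True)
class Flow:
    src_ip: str
    dst_ip: str
    protocol: str   # "tcp" or "udp"
    dst_port: int


@dataclass(frozen=True)
class MarkingRule:
    dst_network: str            # qualifier: destination network in CIDR notation
    protocol: str               # qualifier: "tcp" or "udp"
    dst_port: int               # qualifier: destination port
    dscp: Optional[int] = None  # tag to apply at layer 3
    cos: Optional[int] = None   # tag to apply at layer 2

    def matches(self, flow: Flow) -> bool:
        """A flow qualifies only if all three qualifiers match."""
        return (ip_address(flow.dst_ip) in ip_network(self.dst_network)
                and flow.protocol == self.protocol
                and flow.dst_port == self.dst_port)


def mark(flow: Flow, rules: List[MarkingRule]) -> Dict[str, Optional[int]]:
    """Return the tags of the first matching rule, or no tags at all."""
    for rule in rules:
        if rule.matches(flow):
            return {"dscp": rule.dscp, "cos": rule.cos}
    return {"dscp": None, "cos": None}
```

A rule such as `MarkingRule("10.0.50.0/24", "tcp", 3260, dscp=26, cos=4)` would tag iSCSI traffic towards that example storage network, while all other flows stay unmarked.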

In my previous post, I mentioned part 3 of the Host Deep Dive session at VMworld 2018. The ‘3’ is because we ran part 1 and part 2 at VMworld 2016 and 2017. We had the chance to try a new way of discussing Host Resources settings by creating levels of performance tuning. The feedback we received while test-driving this session at the London and Indianapolis VMUG was really positive.

As we always stated, the out-of-the-box experience of VMware vSphere is good enough for 80-90% of common virtual infrastructures. We like to show how you can gradually increase performance and reduce latency with advanced vSphere tuning. That’s why we came up with the Pyramid of Performance Optimization. Delivering the content this way allows for better understanding on when to apply certain optimizations. We will start with the basics and work our way up to settings to squeeze out the maximum performance of vSphere ESXi hosts.

Due to session time constraints, we will focus on compute (CPU and memory) and virtual networking. The following pyramids contain the subjects we will discuss in our session.

Pyramid of Performance Optimization – Compute:

Pyramid of Performance Optimization – Networking:

We will go through all these levels in detail. We very much look forward to VMworld and hope to see you there! Be sure to reserve your seat for this session!

Countless hours have gone into it: researching, writing content, updating content, discussing a lot, creating the cover designs, creating a logo, having fun!
It was my second time as a co-author after releasing the Host Deep Dive book last year with Frank. I am humbled that I got to work with two of the most highly regarded individuals in our vCommunity. What these guys did to make daily life easier for everybody working with VMware solutions is incredible.

Also, a big thank you to all our reviewers, the people who helped realize this release, and Chris Wahl for writing an inspiring foreword.

Clustering Deep Dive

I am a big fan of the previous releases of the Clustering Deep Dive series. When thinking about that, back in 2010-11, the first release helped me a lot to fully understand all clustering constructs. One might say it helped to fuel my enthusiasm for working with VMware vSphere.

Fast forward a couple of years and here I am, working together with Duncan and Frank on the latest release. A big thank you to them for letting me get on board and be part of this amazing series! As there have been a lot of changes since the 5.1 release, we hope this book can help you get a thorough understanding of all VMware vSphere 6.7 clustering features. The new version of the Clustering Deep Dive covers vSphere HA, DRS, Storage DRS, Storage I/O Control and Network I/O Control. In the last chapter of the book, we bring all the theory together and apply it to create a stretched cluster configuration.

Where Duncan worked on the HA parts and Frank on the DRS parts, I primarily focused on the Quality Control parts. I feel that these features are often enabled or disabled without really understanding how they can help you manage and enforce quality control. At least, that is my experience with them. While you may know at a high level what NIOC and SIOC are all about, a deeper understanding can lead to new insights on their impact and how to use them. We feel that this addition to the book helps to gain these insights.

Logo & VMworld sessions

The idea is to provide you with a vSphere Resource kit to fully understand all features from the hardware components and everything involved all the way up to the vSphere clustering service on top of that.

Now, the Host Resources Deep Dive, Part 3 might be a slightly confusing title. It is part 3 because we already did Part 1 at VMworld 2016 and Part 2 in 2017. We will bring a new awesome way of delivering host resources knowledge in that session. More on that later.

With the arrival of the new Clustering Deep Dive book, we came up with a new logo to accompany the Host Deep Dive logo. There will be a limited number of shirts + stickers (pushing for on-time delivery) that we will bring with us to VMworld. We will give some away in our sessions so make sure to attend!

TCP Segmentation Offload (TSO) is the equivalent to TCP/IP Offload Engine (TOE) but more modeled to virtual environments, where TOE is the actual NIC vendor hardware enhancement. It is also known as Large Segment Offload (LSO). But what does it do?

When an ESXi host or a VM needs to transmit a large data packet to the network, the packet must be broken down into smaller segments that can pass all the physical switches and possible routers in the network along the way to the packet’s destination. TSO allows a TCP/IP stack to emit larger frames, even up to 64 KB, when the Maximum Transmission Unit (MTU) of the interface is configured for smaller frames. The NIC then divides the large frame into MTU-sized frames and prepends an adjusted copy of the initial TCP/IP headers. This process is referred to as segmentation.
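To get a feeling for the amount of work that segmentation involves, the sketch below cuts a payload into MTU-sized pieces in software, which is roughly the work that the NIC takes over when TSO is used. It assumes 40 bytes of headers per segment (20 bytes IPv4 plus 20 bytes TCP, without options), which is an assumption for this example. The function name `segment` is hypothetical and no actual headers are built.

```python
from typing import List


def segment(payload: bytes, mtu: int = 1500, header_bytes: int = 40) -> List[bytes]:
    """Split a payload into segments that fit the MTU after adding headers."""
    mss = mtu - header_bytes  # payload bytes that fit in one frame
    if mss <= 0:
        raise ValueError("MTU must be larger than the header size")
    return [payload[i:i + mss] for i in range(0, len(payload), mss)]
```

With these assumptions, a 64 KB (65536 bytes) payload on a 1500 MTU interface results in 45 segments: 44 segments of 1460 bytes and a final one of 1296 bytes. Without TSO, each of those would be created by the host CPU.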

When the NIC supports TSO, it will handle the segmentation instead of the host OS itself. The advantage is that the CPU can present up to 64 KB of data to the NIC in a single transmit request, resulting in fewer cycles being burned on segmenting the network packet using the host CPU. To fully benefit from the performance enhancement, you must enable TSO along the complete data path on an ESXi host. If TSO is supported on the NIC, it is enabled by default.

The same goes for TSO in the VMkernel layer and for the VMXNET3 VM adapter, but not per se for the TSO configuration within the guest OS. To verify that your pNIC supports TSO and whether it is enabled on your ESXi host, use the following command: esxcli network nic tso get. The output will look similar to the following screenshot, where TSO is enabled for all available pNICs or vmnics.
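If you want to check the result for a larger number of hosts, you could parse the command output in a script. The sketch below assumes a simple two-column layout with a NIC column and a Value column that reads on or off. That layout and the `SAMPLE` text are assumptions for illustration and not captured output, so verify them against what your own ESXi version returns. The function name `parse_tso_output` is hypothetical.

```python
from typing import Dict

# Illustrative text only, written for this example (assumed layout).
SAMPLE = """NIC     Value
------  -----
vmnic0  on
vmnic1  off
"""


def parse_tso_output(text: str) -> Dict[str, bool]:
    """Map each NIC name to True (TSO on) or False, skipping header lines."""
    status: Dict[str, bool] = {}
    for line in text.splitlines():
        parts = line.split()
        if len(parts) != 2:
            continue  # ignore blank or unexpected lines
        nic, value = parts
        if nic == "NIC" or set(nic) == {"-"}:
            continue  # ignore the column header and the separator line
        status[nic] = value.lower() == "on"
    return status
```

For the sample text above, the function reports `vmnic0` as enabled and `vmnic1` as disabled.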