WNV Deep Dive Part 4 – Looking at LBFO and Hyper-V traffic

We’re going to look at the two other basic types of WNV traffic in part 4: LBFO (NIC teaming) and Hyper-V. I’ll be skipping over Hyper-V Network Virtualization and Software Defined Networks. The new Switch Embedded Teaming technology will also not be covered. Those are topics for a different day.

NBLs are still in use as “packets inside of Windows”, but there are a couple of small variations in capturing and looking at the traffic.

Network LBFO

Network LBFO (Load Balancing and Failover), more commonly known as NIC teaming, is currently the only network virtualization technology in Windows that does not use a vmSwitch. The basic traffic flow for LBFO looks something like this.

There are some notes I want to point out about LBFO. These are some common answers, misconceptions, and issues I see people running into when working with LBFO.

You can create an LBFO team with a single physical NIC. There will be no load balancing or failover with a “single NIC LBFO adapter,” but you can do it.

Why would you ever use a single NIC LBFO adapter? Testing and VLANing without installing additional software. More on VLANing coming up.
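As a rough sketch, creating a single-NIC team from PowerShell looks something like this. The adapter name "Ethernet" and team name "Team1" are examples; substitute your own.

```powershell
# Create a team with a single member. There is no load balancing or
# failover here, but it enables VLAN sub-interfaces without installing
# third-party software. "Ethernet" and "Team1" are example names.
New-NetLbfoTeam -Name "Team1" -TeamMembers "Ethernet" `
    -TeamingMode SwitchIndependent -LoadBalancingAlgorithm Dynamic

# Verify the team and its member state
Get-NetLbfoTeam -Name "Team1"
```

Remember that this only works on Windows Server SKUs, as discussed below.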

There are two parts to an LBFO team setup: switch type and load balancing mode. This is a topic unto itself. I will refer you to a couple of official guides if you want to learn more.

You can create multiple sub-interfaces, one for each VLAN, off a single tNIC. This is similar to VLAN tagging in the Linux world.

The primary tNIC must always use the trunk or default VLAN. Use sub-interfaces for other VLANs.
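A VLAN sub-interface is added with Add-NetLbfoTeamNic. The team and interface names here are examples; the VLAN ID is whatever your network uses.

```powershell
# Add a VLAN sub-interface (a secondary tNIC) to the team for VLAN 10.
# The primary tNIC stays on the trunk/default VLAN.
Add-NetLbfoTeamNic -Team "Team1" -VlanID 10 -Name "Team1 - VLAN 10"

# List all tNICs, primary and sub-interfaces, for the team
Get-NetLbfoTeamNic -Team "Team1"
```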

The actual VLAN header is always added to the packet by the physical NIC in Windows.

The VLAN information is passed to the pNIC driver via the NBL OOB (Out Of Band) blob.

Never attach a VLAN to a pNIC or tNIC, including LBFO sub-interfaces, when it is attached to a Hyper-V vmSwitch.

Use multiple host management adapters (vmNICs) if the host system needs to use VLANs, and the tNIC is attached to a vmSwitch.
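A sketch of adding a second host management vmNIC and tagging it, assuming an existing vmSwitch named "External" (all names here are examples):

```powershell
# Add a second host management vmNIC to an existing vmSwitch,
# then tag that vmNIC for VLAN 20. The VLAN lives on the vmNIC,
# not on the tNIC or pNIC underneath the vmSwitch.
Add-VMNetworkAdapter -ManagementOS -Name "Mgmt-VLAN20" -SwitchName "External"
Set-VMNetworkAdapterVlan -ManagementOS -VMNetworkAdapterName "Mgmt-VLAN20" `
    -Access -VlanId 20
```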

LBFO is only available in Windows Server SKUs. The LBFO PowerShell cmdlets did not throw errors in older versions of Windows 10, but LBFO never actually worked there. There are a couple of third-party NIC OEMs who make and support teaming software that works in Windows client SKUs. There is no Microsoft-supported Windows client NIC teaming.

LBFO does not use a vmSwitch to pass NBLs within Windows. The LBFO system driver handles the movement of LBFO NBLs within Windows, and these events are currently not publicly viewable. So how do you know whether the traffic works?

Packet capture tools can collect data from all network adapters on the system. In the case of LBFO that means the packet can be captured at the physical NIC (pNIC) and then at the LBFO teamed adapter (tNIC). Seeing packets at the pNIC shows that the data arrived from the wire, and was not discarded somewhere on the network. The packet arriving at the tNIC shows that the NBL made it to the LBFO virtual adapter.
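This is not the exact script from part 2, but a minimal capture using the built-in NetEventPacketCapture module looks something like the sketch below. The session name and output path are examples; capturing without binding to a specific adapter collects from all adapters, which is what lets you see the packet at both the pNIC and the tNIC.

```powershell
# Create an ETW capture session that records packets from all adapters,
# so the same frame can be observed at both the pNIC and the tNIC.
New-NetEventSession -Name "LbfoTrace" -LocalFilePath "C:\Traces\lbfo.etl"
Add-NetEventPacketCaptureProvider -SessionName "LbfoTrace"
Start-NetEventSession -Name "LbfoTrace"

# ... reproduce the traffic here (e.g. run psping) ...

Stop-NetEventSession -Name "LbfoTrace"
Remove-NetEventSession -Name "LbfoTrace"
```

The resulting ETL file can be opened in Message Analyzer for review.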

This is what a psping on TCP port 445 looks like with LBFO configured.

The TCP ports and IP addresses are the same, showing it’s the same TCP/IP connection, but there are two of every packet: two SYNs, two SYN-ACKs, two ACKs, two FIN-ACKs, two RSTs, and so on. This is not a glitch. The data was simply captured at both the pNIC and the tNIC. This can throw off a packet analysis if you are not expecting it: most packet analysis tools will flag it as a transmission problem with massive packet retransmission, when there is actually none.

Hyper-V Traffic

Hyper-V traffic is a lot like Container traffic. Or, more accurately, Container traffic is a lot like Hyper-V traffic, since Hyper-V came first. Hyper-V traffic is just NBLs being passed around between NICs, both physical and virtual, by vmSwitches.

When LBFO is involved, the LBFO adapter sits between the physical NIC and the vmSwitch. Unless, of course, you are using Windows Server 2016 and Switch Embedded Teaming (SET). But that’s a discussion for a different blog.

The immediate security concern is that when capturing data on the Hyper-V host you can see all the traffic going to every VM. There may also be duplication in the trace when LBFO is used, and when packets are picked up on the vmSwitch. On Server 2012 R2, duplication can be removed by capturing on a single adapter. Windows 10 and Server 2016 seem to have a mechanism whereby duplicates are sometimes not captured during a basic capture. I haven’t quite worked out how the anti-duplication bits are triggered; I just know that sometimes there is packet duplication and sometimes there isn’t.

The following is an example of how the traffic looks on Windows 10 Anniversary Update (version 1607) with Hyper-V installed. Bing.com was pinged from the host system, while ARIN.net was pinged from a guest VM. The PowerShell code from part 2 was used to capture the data.

Here is what the Bing.com ping looks like from the host, filtered down to a blog-friendly size.

The final ping (ICMP Echo Request) is the packet as it is sent to the physical NIC before it hits the wire.

This data is different from what you’d see on a Windows Server or via a wired connection, because my Windows 10 host was connected via a wireless network at the time of the trace. This adds an additional hop for the data to traverse versus a traditional wired LAN connection.

Long story short, the Multiplexor device is a bridge that links the host vmNIC to the physical wireless NIC. This is a special step needed for Hyper-V to work with wireless networks that would not be present with a wired NIC. The data flow is similar to what you see when a vmSwitch is attached to an LBFO tNIC, since LBFO NICs are also called Multiplexor adapters, but there is no LBFO native to Windows 10.

The ping to ARIN.net takes a shorter path to the physical network than the host’s ping did.

The shorter path is because the guest operating system doesn’t have to traverse a network bridge before hitting the physical network. Instead it goes straight to the multiplexor adapter attached to the vmSwitch and out to the physical network.

In case you are curious about how long all this intra-OS routing takes, it took 0.0355 ms (milliseconds), or 35.5 µs (microseconds), for the ICMP frame to go from the guest vmNIC to the physical adapter. The Bing ping took slightly longer at 0.054 ms, or 54 µs. The network latency through Windows Hyper-V will vary, depending on hardware speed and system resource constraints.

Under 50 µs, each way, is common for guest to physical NIC traversal. If an application absolutely, positively needs every microsecond shaved off the network latency, look at SR-IOV and SET with RDMA as lower latency Hyper-V networking technologies.

Common things that go wrong

Most of the time you won’t experience any issues once Windows gets the packet. The vmNICs and vmSwitches in Windows are, by default, simple virtual devices. Think of the vmSwitches as simple L2 switches. vmNICs are basic synthetic NICs with standard feature sets. There are a few things that can cause headaches though. The three most common causes will be discussed.

VLANs

When using VLANs with Windows Network Virtualization, it is best practice to leave the physical switchport, physical NIC, and Hyper-V attached LBFO adapters in trunk mode. VLANs should be attached to vmNICs only, whether that’s a host management vmNIC or a vmNIC attached to a guest VM.

Use multiple vmNICs when multiple VLANs are needed. One vmNIC per VLAN. The VLAN can be set via the VM settings for guest vmNICs, and via PowerShell’s Set-VMNetworkAdapterVlan cmdlet.
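Setting an access-mode VLAN on a guest vmNIC is a one-liner; the VM name and VLAN ID below are examples.

```powershell
# Tag the guest VM's vmNIC for VLAN 10 (access mode).
# The vmSwitch strips/adds the tag; the guest never sees it.
Set-VMNetworkAdapterVlan -VMName "VM01" -Access -VlanId 10

# Check the current VLAN settings on the VM's vmNICs
Get-VMNetworkAdapterVlan -VMName "VM01"
```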

Secondary LBFO adapters can be added to provide VLAN support, but *never* add secondary LBFO adapters to an LBFO adapter attached to a Hyper-V vmSwitch. This will cause all kinds of routing problems in Hyper-V. Use additional host management adapters (vmNICs) if multiple host VLANs are needed.

VLAN errors will appear with the following message in Message Analyzer (filtered for readability).

NBL destined to Nic (Friendly Name: wireless) was dropped in switch (Friendly Name: wireless), Reason Packet does not match the packet filter set on the destination NIC.

The default native VLAN ID is 0. Windows will assume that all untagged traffic belongs to that VLAN.

The default VLAN state for a vmNIC is untagged, meaning any tagged traffic will automatically be dropped by the vmSwitch.

To change the native VLAN the vmNIC must be in trunk mode, a list of allowed VLANs should be set, and the new native VLAN ID needs to be set. Changing the native VLAN will cause Hyper-V to assume that all untagged traffic belongs to that VLAN, and VLAN 0 traffic must now be tagged.
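Those three steps can be done in a single Set-VMNetworkAdapterVlan call. The VM name and VLAN IDs below are examples.

```powershell
# Put the vmNIC in trunk mode, allow VLANs 10 and 20, and make
# VLAN 10 the new native VLAN. Untagged traffic is now assumed to
# belong to VLAN 10, and VLAN 0 traffic must be tagged.
Set-VMNetworkAdapterVlan -VMName "VM01" -Trunk `
    -AllowedVlanIdList "10,20" -NativeVlanId 10
```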

ACLs

While there isn’t a lot of configuration on a vmSwitch, there are some ACL capabilities. ACLs and Extended ACLs are set via the VM Network Adapter PowerShell cmdlet, but… the ACL is actually applied on the switchport of the vmSwitch.

These are regular old ACLs, processed in an unordered fashion. This means Hyper-V ACLs don’t have numbered priorities like many other ACL systems do. Instead, Hyper-V picks the winning rule based on other criteria.

Specific matches, such as MAC address or subnet, win over a wildcard (ANY) match

MAC beats Subnet

An IP address is just a subnet with a /32 prefix

The appropriate action, Allow or Deny, is applied based on the winning match

Port ACLs are bi-directional, meaning rules must be created for both inbound and outbound traffic
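Applying the rules above in PowerShell looks something like this sketch. The VM name and subnet are examples; note how the specific subnet match wins over the ANY-style (0.0.0.0/0) match, per the precedence rules just described.

```powershell
# Deny all traffic to and from the VM, then allow one subnet.
# The more specific subnet match beats the 0.0.0.0/0 wildcard.
Add-VMNetworkAdapterAcl -VMName "VM01" -RemoteIPAddress "0.0.0.0/0" `
    -Direction Both -Action Deny
Add-VMNetworkAdapterAcl -VMName "VM01" -RemoteIPAddress "192.168.1.0/24" `
    -Direction Both -Action Allow

# List the ACLs applied to the VM's switchport
Get-VMNetworkAdapterAcl -VMName "VM01"
```

Direction Both creates the rule for inbound and outbound traffic at once; you can also create separate Inbound and Outbound rules.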

ACLs are the second most common cause of issues with WNV. Usually because not everyone knows about them. One admin applies an ACL, then a different admin wonders why they can’t reach a VM and, because they aren’t aware of ACLs in Hyper-V, they don’t know to look for them.

The best way to look for ACLs is via System Center Virtual Machine Manager, if that is used to manage Hyper-V or HNV/SDN, or with our good friend, PowerShell. The first link below is an article on how to manage port ACLs with SCVMM 2016. The other links point to the help pages for the PowerShell get port ACL cmdlets.

Anti-virus

Ah, anti-virus. The applications everyone both loves and hates. The security teams love the peace of mind, but the server admins hate the resource usage. The official Microsoft recommendation is to not run anti-virus on Hyper-V hosts, run it on the VM guests instead. Problem solved, right? Not so fast, says the security team! All systems must run anti-virus.

This is why Microsoft posted a fancy article detailing the recommended anti-virus configuration for Hyper-V hosts. If you must run anti-virus on a Hyper-V host, please make sure the appropriate exceptions are in place.

Part 5 of this series gets started on Container network types. The default NAT network type gets to go first. I’ll discuss why using NAT can be fun for testing but should not be used on a production system.