Open Source Network Virtualization

This post is the third in a series intended to familiarize users with MidoNet’s overlay virtual networking models. In Part 1 we discussed MN’s Provider Router, and in Part 2 we discussed Tenant Routers and Networks. In this article we explain how the MN Agent simulates the overlay topology. The walkthrough assumes the following setup:

The Provider Router port P2 has been linked to the Tenant Router port P1.

The Provider Router has a route that sends all traffic for 20.20.0.20 to the Tenant Router via P2.

The Tenant Router has a default route to the Provider Router via P1.

The Tenant Router uses the 20.20.0.20 address for port masquerading.

The Tenant Router has a port on the 10.10.0.0/24 subnet with address 10.10.0.254.

A VM instance launched on Compute Host 5 has its vNIC connected to tap123.

tap123 has been inserted into the OVS datapath as port #10.

The VM is on the 10.10.0.0/24 subnet and has learned its IP, 10.10.0.1, and its gateway IP, 10.10.0.254, via DHCP.

The MidoNet virtual bridge that implements the Neutron Network has been seeded with MAC-table and ARP-table entries for both the Router’s and VM’s MAC and IP.

What happens when the VM sends an ARP request to resolve the gateway IP, 10.10.0.254?

The ARP packet misses in the datapath (OVS/kernel or DPDK-based flow switch) and is kicked up (via Netlink) to the MN Agent on Host 5. This is exactly like the Open vSwitch in-kernel datapath kicking a packet up to the vSwitch user-space daemon.

The MN Agent checks its in-process copy of the datapath flow table for a match (this part is the same as with the Open vSwitch daemon). A match can occur here when a second or subsequent packet in a flow races with the user-space agent’s request to the datapath (via Netlink) to install a flow rule. If there’s a flow with no actions, the packet is dropped. If there’s a flow with a non-empty list of actions, the MN Agent sends a request to the datapath to apply those actions to the packet (the actions may change header fields and emit the packet from datapath ports).
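The lookup described in this step can be sketched as follows. This is a minimal illustration rather than MidoNet’s actual code: `FlowKey`, `DatapathStub`, and `handle_upcall` are hypothetical names, the key carries only a few header fields, and the stub records requests instead of speaking Netlink.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class FlowKey:
    """Hypothetical exact-match key; a real key has many more fields."""
    in_port: int
    eth_src: str
    eth_dst: str
    eth_type: int


class DatapathStub:
    """Stand-in for the datapath: records requests instead of using Netlink."""

    def __init__(self):
        self.executed = []

    def execute(self, packet, actions):
        self.executed.append((packet, tuple(actions)))


def handle_upcall(key, packet, flow_table_copy, datapath):
    """Check the agent's in-process copy of the datapath flow table."""
    if key not in flow_table_copy:
        return "miss"            # continue to the Wildcard flow table
    actions = flow_table_copy[key]
    if not actions:
        return "dropped"         # a flow with no actions means drop
    datapath.execute(packet, actions)
    return "executed"
```

A caller would only move on to the Wildcard flow table when this function returns `"miss"`.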

If there was no match in the MN Agent’s copy of the datapath flow table, then a user-space Wildcard flow table is checked. Note that even when the datapath supports wildcarding features, the MN Agent may not fully use them (e.g. it prefers more granular flows for statistics/counting purposes). If there’s a match in the Wildcard flow table, the MN Agent copies the actions and sends two requests to the datapath: 1) to install a flow that matches the packet’s header and applies those actions (even if the actions are an empty list, meaning “drop”), and 2) if the actions are a non-empty list, to apply the actions to the packet that missed in the datapath (this is necessary because the datapath does not buffer packets it kicks up to user-space).
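To make the two requests concrete, here is a small sketch of a wildcard lookup. It assumes a wildcard match is a tuple of header fields where `None` means “any value”; `Match`, `matches`, and `requests_for` are hypothetical names, and the request tuples are placeholders for real Netlink messages.

```python
from typing import NamedTuple, Optional


class Match(NamedTuple):
    """Hypothetical header/wildcard match; None in a wildcard means 'any'."""
    in_port: Optional[int] = None
    eth_type: Optional[int] = None
    ip_dst: Optional[str] = None


def matches(wildcard, header):
    """True if every non-wildcarded field equals the packet's field."""
    return all(w is None or w == h for w, h in zip(wildcard, header))


def requests_for(header, packet, wildcard_table):
    """Return the datapath requests for a wildcard hit, or None on a miss."""
    for wildcard, actions in wildcard_table:
        if matches(wildcard, header):
            # 1) always install an exact flow, even for an empty ("drop") list
            requests = [("flow_install", header, actions)]
            # 2) the datapath did not buffer the packet, so replay it
            if actions:
                requests.append(("packet_execute", packet, actions))
            return requests
    return None  # no match: the packet proceeds to Simulation
```

Note that the installed flow uses the packet’s full header, not the wildcard, which mirrors the preference for granular datapath flows mentioned above.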

If there was no match in the Wildcard flow table, then the packet should go to the Simulation stage of the MN Agent. Two things have to happen first:

There may already be a packet with an identical header (if the sender is sending fast enough) in the Simulation phase. A prep phase performs de-duplication: if an identical Simulation is already in progress, this packet gets buffered and will receive the same treatment determined by the in-progress Simulation. There is a limit to the number of packets the MN Agent will buffer this way.
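The de-duplication step might look like the sketch below. The class name, the per-header limit, and the choice to silently discard packets beyond the limit are all assumptions made for illustration; the article only states that a limit exists.

```python
class SimulationDeduplicator:
    """Tracks in-progress Simulations by header and buffers duplicates."""

    def __init__(self, max_buffered=4):
        # Assumption: the limit applies per header; the real limit may differ.
        self.max_buffered = max_buffered
        self.pending = {}  # header -> packets waiting on that Simulation

    def admit(self, header, packet):
        """Return True if the caller should start a new Simulation."""
        if header not in self.pending:
            self.pending[header] = []
            return True
        waiting = self.pending[header]
        if len(waiting) < self.max_buffered:
            waiting.append(packet)
        # Assumption: packets over the limit are simply discarded.
        return False

    def complete(self, header):
        """Simulation finished: hand back the packets that were waiting."""
        return self.pending.pop(header, [])
```

When a Simulation finishes, the caller would apply its result to every packet returned by `complete`.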

The Simulation only understands overlay topology elements, so the prep stage also translates physical-layer concepts into overlay/virtual-layer concepts. For example, the packet ingressed at datapath port #10, which translates to ingressing the overlay topology at virtual bridge port P5. Now the packet is ready for Simulation.
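As a rough sketch, the physical-to-overlay translation can be thought of as a lookup keyed by datapath port number. `PortBindings` is a hypothetical helper, not a MidoNet class; the only binding shown is the one from this example (datapath port #10 to virtual port P5).

```python
class PortBindings:
    """Hypothetical map between datapath port numbers and virtual ports."""

    def __init__(self):
        self._dp_to_vport = {}

    def bind(self, dp_port, vport):
        self._dp_to_vport[dp_port] = vport

    def to_virtual(self, dp_port):
        """Translate the ingress datapath port to an overlay port, if bound."""
        return self._dp_to_vport.get(dp_port)


bindings = PortBindings()
bindings.bind(10, "P5")  # tap123 is datapath port #10 and backs bridge port P5
ingress_vport = bindings.to_virtual(10)
```

A packet arriving on an unbound datapath port yields `None` here; how the agent treats such packets is outside the scope of this sketch.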

The Simulation fetches a local representation of port P5. If there’s a port-level firewall, it would be a filter inside the overlay port object. This is where MidoNet implements Neutron Security Groups and Anti-spoofing. That’s a topic for Part 4, so for now let’s assume there’s no port-level firewall. The Simulation knows the packet traverses the port in-bound (from the perspective of the virtual bridge) and therefore the packet enters the virtual bridge.

The Simulation fetches an internal representation of the virtual bridge, along with its MAC table and ARP table.

MAC learning is performed on the source address. If the MAC table does not already contain an entry mapping the packet’s source MAC to P5, then the Simulation registers a device state change to add such an entry (possibly replacing an older entry for that MAC). The change is processed after the Simulation completes, and propagates to any other host that is caching this virtual bridge.
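The deferred nature of MAC learning can be illustrated with a short sketch. The function names and the tuple format for a state change are hypothetical, and propagation to other hosts is not modeled; the point is only that the table is not mutated while the Simulation is running.

```python
def learn_source_mac(mac_table, src_mac, ingress_port, pending_changes):
    """Register (but do not yet apply) a MAC-table update if one is needed."""
    if mac_table.get(src_mac) != ingress_port:
        pending_changes.append(("mac_table_put", src_mac, ingress_port))


def apply_changes(mac_table, pending_changes):
    """Run after the Simulation completes; replaces any older entry."""
    for op, mac, port in pending_changes:
        if op == "mac_table_put":
            mac_table[mac] = port
    pending_changes.clear()
```

If the table already maps the source MAC to the ingress port, no change is registered at all.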

The bridge’s pre-forwarding filters are evaluated/simulated. We’ll assume no pre-forwarding filters are set.

The bridge recognizes this is an ARP request and checks its ARP table to see if there’s an entry for 10.10.0.254. There is, and it resolves to P4’s MAC address. This entry was added when the Router’s Neutron port was created. At that time, the MidoNet virtual router’s P4 port was created, as was the virtual bridge’s P3 port, and the two were linked (MidoNet stores a “peer” port ID in each of P3’s and P4’s configuration to represent the port it is linked to).

The bridge generates the corresponding ARP reply packet and queues it for emission from port P5. This will be handled in a separate simulation.

Since the bridge is able to answer the ARP it consumes the packet.
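The bridge’s ARP handling in the last few steps can be sketched as below. The `Arp` record and `bridge_handle_arp` are hypothetical, the MAC addresses used with it are made up for illustration, and the case where the ARP table has no entry is deliberately left unmodeled.

```python
from dataclasses import dataclass

ARP_REQUEST, ARP_REPLY = 1, 2  # ARP opcode values


@dataclass(frozen=True)
class Arp:
    op: int
    sha: str  # sender hardware (MAC) address
    spa: str  # sender protocol (IP) address
    tha: str  # target hardware (MAC) address
    tpa: str  # target protocol (IP) address


def bridge_handle_arp(request, ingress_port, arp_table, emit_queue):
    """Answer an ARP request from the bridge's ARP table when possible."""
    if request.op != ARP_REQUEST:
        return "not_handled"
    mac = arp_table.get(request.tpa)
    if mac is None:
        return "not_handled"  # unknown target: not modeled in this sketch
    reply = Arp(ARP_REPLY, sha=mac, spa=request.tpa,
                tha=request.sha, tpa=request.spa)
    # The reply is queued for its own Simulation, emitted from the ingress port.
    emit_queue.append((ingress_port, reply))
    return "consumed"
```

In the scenario above, the request arrives on P5 asking for 10.10.0.254, so the queued reply carries P4’s MAC as the sender hardware address and is addressed back to the VM.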

The Simulation terminates, indicating that the packet was consumed. The packet+results are transferred back to the prep stage to reverse the overlay/underlay translation and then to apply the same result to any buffered packets with the same header. In this case, since the packet was “consumed”, we would just discard those packets (as a form of DoS protection).

Normally packet+results are passed back to the Wildcard Flow Table stage. In this case the Simulation did not install any flow, so handling for this packet is already complete.

This diagram illustrates how the VM, datapath, and MN Agent are connected.

This illustrates the path of the ARP request from VM through datapath to MN Agent and through its packet processing stages.

Let’s briefly describe what happens to that ARP reply packet that was queued for Simulation:

This packet was generated by the virtual topology and starts its life in the Simulation stage. Its context indicates that the virtual bridge is emitting it from P5.

The Simulation fetches the local representation of P5. If there were an egress/out-bound filter (from the perspective of the port’s owner, the bridge), the Simulation would evaluate it. There isn’t one, so the Simulation determines that the packet should be emitted from P5.

The Simulation realizes that P5 is an exterior port, in the sense that it’s at the boundary of the overlay and physical worlds. The Simulation therefore terminates with the result being an action “Forward(P5)”.

Control passes to the Simulation prep phase where reverse translation of overlay-to-physical concepts occurs. The result action becomes “Forward(datapath-port #10)”. Since this is a topology-generated packet, there are no buffered packets with identical headers.
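The reverse translation of the Simulation result can be sketched as a rewrite of the action list. The tuple representation of actions and the name `to_datapath_actions` are assumptions for illustration; only the `Forward` action from this example is translated.

```python
def to_datapath_actions(virtual_actions, vport_to_dp_port):
    """Rewrite overlay actions into datapath-compatible ones."""
    translated = []
    for kind, arg in virtual_actions:
        if kind == "Forward":
            # Forward(virtual port) becomes an output on the bound datapath port
            translated.append(("output", vport_to_dp_port[arg]))
        else:
            translated.append((kind, arg))  # passed through unchanged
    return translated


result = to_datapath_actions([("Forward", "P5")], {"P5": 10})
```

Here `result` holds a single output action on datapath port #10, which corresponds to “Forward(datapath-port #10)” in the text.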

This stage also realizes there are no flows to install. It enqueues a Netlink request that sends the packet and the now datapath-compatible action to the datapath.

When the datapath receives the Netlink request, it will apply the Forward action to the packet included in the request. As a result the VM will receive the ARP reply.

This diagram illustrates how the ARP reply generated by the virtual bridge starts its life in the Simulation Stage, traverses MN’s packet-processing stages, and is forwarded to the datapath for emission towards the VM.

If the VM were to send a Ping request to the gateway IP, 10.10.0.254, the description would be very similar. The difference is that the Simulation would have the packet actually reach the Router, and the Router would consume the packet and emit a Ping reply.

We haven’t illustrated a case where a packet results in a flow being installed to drop packets, or to forward packets to another local VM, or to forward packets to a remote VM via a tunnel. We’ll leave that for a future post. Our goal in this article was to give the reader an understanding of MN Agent’s packet processing workflow and what we mean by topology “Simulation”.