OVERLAY NETWORKS WITH MY SDN CONTROLLER

2018-01-06

Intro

The objective with this project is to be able to create virtual networks
that span across several switches. For example, I have 5 VMs that run in
a 10.0.0.0/24 subnet but those VMs are on different hypervisors. I want my controller
to create an overlay network for those, and only those, VMs. It is possible
to have many overlays with the same addressing space because the switches
will only forward traffic between hosts part of the same overlay

For such a topology, OVS bridges end up looking like this:

The controller

On the controller, I configure 4 types of objects: Bridges, Networks, Routers and Hosts.
A host is a VM or a container (or anything using a TAP interface in the OVS bridge).
Each host is part of a bridge. In general, you'd setup one bridge per hypervisor.
So because I define where bridges are (on which hypervisor they run), the controller
knows that 2 hosts run on different hypervisors and they should communicate through a tunnel.
Each host is part of a network. Those are virtual network that I define in the controller.
The controller will allow communication between hosts that are on the same virtual network.
A virtual network can span across many bridges. From the controller's point of view, all
the configuration is static. For each host, I define the bridge it is on, the port number
it is using, it's IP address and mac address. The mac,port,bridge could be given to
the controller by some kind of agent that would run on the HV but in my case, I have
just hardcoded those values. The IP address of the VM can be set to anything.

Flows

DHCP

Since I pre-configure the controller with the list of hosts that will attach to it with their MAC, IP
and networkID, gateway IP etc., the controller has all the information it needs to act as a DHCP server.
So the controller adds a flow, 1 for every host that is local to the hypervisor, to match the source MAC/port
and UDP port 67 with an action to send the packet to the controller. The controller then crafts a DHCP reply
packet and sends it back on the ingress port.

ARP

Again, since the controller has all information about the hosts, it is also capable of responding to ARP queries.
Flows are installed to intercept any ARP query and the controller will craft a response. If a host makes
an ARP query, that query will never be flooded (not even on the vxlan tunnel) and will never be seen by
anyone else than the controller. Technically, no hosts should ever receive ARP requests.

Table 0

Table 0 contains the flows to match ARP and DHCP requests. If a packet does not match any flow in Table 0,
a table-miss flow will send the packet to Table 1.

Table 1

Table 1 is about matching the source of a packet. For each host on the current bridge, a flow
that matches the in_port,src_mac,src_ip will be installed. The actions are to set the
virtual network ID in the tunnel_id metadata (will be used in Table 3) and then goto Table 2
Another flow that matches the tunnel in_port is installed with an action to goto Table 3.
That tunnel flow doesn't set the tunnel_id metadata because it was already set when
the packet came through the tunnel. The tunnel_id of a packet coming in from a VXLAN
tunnel is the VNI.

Table 2

Table 2 has flows that match dst_mac set to gateway addresses as defined in Router objects.
These flows are used for routing between virtual networks. If the dst_mac is not for
a virtual router, then a table-miss flow sends the packet to table 3.

Table 3

Table 3 is about forwarding to the correct host. When a packet enters Table 3, it is
guaranteed to have a tunnel_id (set as the virtual network ID).
Table 3 contains 1 flow for each host known across all virtual networks.
All flows match the tunnel_id,dst_mac,dst_ip. The action depends on if the host
is local or resides on another hypervisor. If it is local, then the action
is simply to forward to the port of that host. If it is remote, then 2 actions
are executed: Set-Field for remote tunnel endpoint IP and output to the tunnel
port. Note that since the tunnel_id is set (in table 1), the VXLAN packet created
by OVS will have that ID as the VNI. So when entering the other HV, a flow will
match that to the virtual network ID of the host.

So since the controller knows about all VMs and virtual networks, it is able to install
those flows that will make traffic between hosts part of the same virtual network possible,
even across hypervisors, thanks to VXLAN tunnels.

Example flows

Here is an example of the flows created on Hypervisor 1 from the above example setup

First, we see that in table 0, we intercept all ARP queries from all local hosts and send them to the controller.
The controller will respond if the target is a local or a remote host

Now in table 1, we match the source of all known local hosts and set the tun_id field (VNI), which is a way for us to tag the packet with its
network ID. The last flow matches the in_port 65279, which is the tunnel port. It will make us jump to table 2. The tun_id is already
set for all packets comming in from the tunnel

These last flows, in Table 3 again, match the destination of all remote hosts and forwards the packet out of the tunnel port
after having set the tunnel destination address (address of the HV running that host)

VXLAN tunnels

I had a hard understanding how to create the tunnels at first.
I know that with OVS, we can create VXLAN tunnel ports and we just need
to forward traffic on those ports. But this requires configuring (N-1) tunnel ports
on each hypervisors if we have a total of N hypervisors. So instead I found that
there is an openflow extension (so not part of the official spec) that allows
to set the destination of the tunnel dynamically with a flow. This is done with
a "set-field" action, with a OXM class of 0x001 and field ID 32 (this is a Nicira extension).
Then you only need to create 1 tunnel port per hypervisor like this:

Using "flow" as the remote_ip allows us to set the destination IP using flows. And setting key=flow allows us to set the VNI with flows.

Routing

Routing is a very simple process. When a host sends a packet to another host,
on another network, it simply uses the mac address of the gateway as the destination
mac. So from the switch's point of view, it's just a matter of detecting that "special"
mac address and allowing to forward on a port that is on a different virtual network.

In my controller, this done by configuring
a Router object in the controller and adding networks it. The router
will allow communication between networks associated to it. When a host
needs to send traffic to another network, it needs to go through a gateway.
in the configuration shown above, we can see that an IP has been assigned as
the gateway ip in each network. The controller will respond to arp queries
for those gw IP with the mac adress of the virtual router that those networks are
associated to. So hosts will be sending their inter-network traffic using the
virtual router's mac as the destination mac.

So the controller can install flows to match traffic with that mac and
the dst_ip (with a mask) to change the network id on that packet. the packet
can then be sent to table 3 for proper port forwarding.

Example flows

Here is a representation of table 1,2,3. In each scenario, the yellow flows are the ones
that would be matched

Configuration

Like I've mentionned above, right now, the configuration is very static. We have
to pre-configure the controller and tell it exactly where (on which HV) each VMs runs,
the port number in the OVS they are attached to and their MAC address. Although the controller
could deduct some of that information, I prefer to tell it exactly what I am expecting so
I can exercise better control over my network.

So right now, if I create a VM, I need to tell the controller about that new VM and provide the following information:

MAC address

Network ID (that will be used as the VNI)

Hypervisor where the VM runs

IP Address I want assigned for the VM

Port number on the OVS bridge

Note that the chosen IP address can be configured manually on the VM, but if
choosing DHCP, the controller will intercept the request and assign that IP.
So I would need a tool to do this for me instead of doing that step manually
after creating a VM. So what could be done next is to create an orchestrator that allows me to spawn VMs on any
of the HV (transparently maybe?). At the time of creation, I would need to specify in
which network I want my VM to be part of. Then the orchestrator would choose a HV,
choose a and IP from the network that was selected and spawn the VM. After spawning
the VM, the orchestrator would know the MAC address and the OVS port number that
the VM was started with. At that point, the orchestrator has all the information
needed to tell the controller how to establish proper networking for that new VM.

What next?

I guess that a cool feature I could do next would be to have a way for those virtual networks to access an external network such as the internet. This would involve NAT and a way route to the underlay. I'll have to think about that one.