Open vSwitch Advanced Features Tutorial

Many tutorials cover the basics of OpenFlow. This is not such a
tutorial. Rather, a knowledge of the basics of OpenFlow is a
prerequisite. If you do not already understand how an OpenFlow flow
table works, please go read a basic tutorial and then continue reading
here afterward.

It is also important to understand the basics of Open vSwitch before
you begin. If you have never used ovs-vsctl or ovs-ofctl before,
you should learn a little about them before proceeding.

Most of the features covered in this tutorial are Open vSwitch
extensions to OpenFlow. Also, most of the features in this tutorial
are specific to the software Open vSwitch implementation. If you are
using an Open vSwitch port to an ASIC-based hardware switch, this
tutorial will not help you.

This tutorial does not cover every aspect of the features that it
mentions. You can find the details elsewhere in the Open vSwitch
documentation, especially ovs-ofctl(8) and the comments in the
include/openflow/nicira-ext.h header file.

In this tutorial, paragraphs set off like this designate notes
with additional information that readers may wish to skip on a
first read.

Getting Started

This is a hands-on tutorial. To get the most out of it, you will need
Open vSwitch binaries. You do not, on the other hand, need any
physical networking hardware or even supervisor privilege on your
system. Instead, we will use a script called ovs-sandbox, which
accompanies the tutorial, that constructs a software simulated network
environment based on Open vSwitch.

You can use ovs-sandbox three ways:

If you have already installed Open vSwitch on your system, then
you should be able to just run ovs-sandbox from this directory
without any options.

If you have not installed Open vSwitch (and you do not want to
install it), then you can build Open vSwitch according to the
instructions in [INSTALL.md], without installing it. Then run
./ovs-sandbox -b DIRECTORY from this directory, substituting
the Open vSwitch build directory for DIRECTORY.

As a slight variant on the latter, you can run make sandbox
from an Open vSwitch build directory.

When you run ovs-sandbox, it does the following:

CAUTION: Deletes any subdirectory of the current directory
named "sandbox" and any files in that directory.

Creates a new directory "sandbox" in the current directory.

Sets up special environment variables that ensure that Open
vSwitch programs will look inside the "sandbox" directory
instead of in the Open vSwitch installation directory.

If you are using a built but not installed Open vSwitch,
installs the Open vSwitch manpages in a subdirectory of
"sandbox" and adjusts the MANPATH environment variable to point
to this directory. This means that you can use, for example,
man ovs-vsctl to see a manpage for the ovs-vsctl program that
you built.

Creates an empty Open vSwitch configuration database under
"sandbox".

Starts ovsdb-server running under "sandbox".

Starts ovs-vswitchd running under "sandbox", passing special
options that enable a special "dummy" mode for testing.

Starts a nested interactive shell inside "sandbox".

At this point, you can run all the usual Open vSwitch utilities from
the nested shell environment. You can, for example, use ovs-vsctl
to create a bridge:

ovs-vsctl add-br br0

From Open vSwitch's perspective, the bridge that you create this way
is as real as any other. You can, for example, connect it to an
OpenFlow controller or use ovs-ofctl to examine and modify it and
its OpenFlow flow table. On the other hand, the bridge is not visible
to the operating system's network stack, so ifconfig or ip cannot
see it or affect it, which means that utilities like ping and
tcpdump will not work either. (That has its good side, too: you
can't screw up your computer's network stack by manipulating a
sandboxed OVS.)

When you're done using OVS from the sandbox, exit the nested shell (by
entering the "exit" shell command or pressing Control+D). This will
kill the daemons that ovs-sandbox started, but it leaves the "sandbox"
directory and its contents in place.

The sandbox directory contains log files for the Open vSwitch dameons.
You can examine them while you're running in the sandboxed environment
or after you exit.

Using GDB

GDB support is not required to go through the tutorial. It is added in case
user wants to explore the internals of OVS programs.

GDB can already be used to debug any running process, with the usual
'gdb ' command.

'ovs-sandbox' also has a '-g' option for launching ovs-vswitchd under GDB.
This option can be handy for setting break points before ovs-vswitchd runs,
or for catching early segfaults. Similarly, a '-d' option can be used to
run ovsdb-server under GDB. Both options can be specified at the same time.

In addition, a '-e' option also launches ovs-vswitchd under GDB. However,
instead of displaying a 'gdb>' prompt and waiting for user input, ovs-vswitchd
will start to execute immediately. '-r' option is the corresponding option
for running ovsdb-server under gdb with immediate execution.

To avoid GDB mangling with the sandbox sub shell terminal, 'ovs-sandbox'
starts a new xterm to run each GDB session. For systems that do not support
X windows, GDB support is effectively disabled.

When launching sandbox through the build tree's make file, the '-g' option
can be passed via the 'SANDBOXFLAGS' environment variable.
'make sandbox SANDBOXFLAGS=-g' will start the sandbox with ovs-vswitchd
running under GDB in its own xterm if X is available.

Motivation

The goal of this tutorial is to demonstrate the power of Open vSwitch
flow tables. The tutorial works through the implementation of a
MAC-learning switch with VLAN trunk and access ports. Outside of the
Open vSwitch features that we will discuss, OpenFlow provides at least
two ways to implement such a switch:

An OpenFlow controller to implement MAC learning in a
"reactive" fashion. Whenever a new MAC appears on the switch,
or a MAC moves from one switch port to another, the controller
adjusts the OpenFlow flow table to match.

The "normal" action. OpenFlow defines this action to submit a
packet to "the traditional non-OpenFlow pipeline of the
switch". That is, if a flow uses this action, then the packets
in the flow go through the switch in the same way that they
would if OpenFlow was not configured on the switch.

Each of these approaches has unfortunate pitfalls. In the first
approach, using an OpenFlow controller to implement MAC learning, has
a significant cost in terms of network bandwidth and latency. It also
makes the controller more difficult to scale to large numbers of
switches, which is especially important in environments with thousands
of hypervisors (each of which contains a virtual OpenFlow switch).
MAC learning at an OpenFlow controller also behaves poorly if the
OpenFlow controller fails, slows down, or becomes unavailable due to
network problems.

The second approach, using the "normal" action, has different
problems. First, little about the "normal" action is standardized, so
it behaves differently on switches from different vendors, and the
available features and how those features are configured (usually not
through OpenFlow) varies widely. Second, "normal" does not work well
with other OpenFlow actions. It is "all-or-nothing", with little
potential to adjust its behavior slightly or to compose it with other
features.

Scenario

We will construct Open vSwitch flow tables for a VLAN-capable,
MAC-learning switch that has four ports:

p1, a trunk port that carries all VLANs, on OpenFlow port 1.

p2, an access port for VLAN 20, on OpenFlow port 2.

p3 and p4, both access ports for VLAN 30, on OpenFlow ports 3
and 4, respectively.

The ports' names are not significant. You could call them eth1
through eth4, or any other names you like.

An OpenFlow switch always has a "local" port as well. This
scenario won't use the local port.

Our switch design will consist of five main flow tables, each of which
implements one stage in the switch pipeline:

Table 0: Admission control.

Table 1: VLAN input processing.

Table 2: Learn source MAC and VLAN for ingress port.

Table 3: Look up learned port for destination MAC and VLAN.

Table 4: Output processing.

The section below describes how to set up the scenario, followed by a
section for each OpenFlow table.

You can cut and paste the ovs-vsctl and ovs-ofctl commands in each
of the sections below into your ovs-sandbox shell. They are also
available as shell scripts in this directory, named t-setup, t-stage0,
t-stage1, ..., t-stage4. The ovs-appctl test commands are intended
for cutting and pasting and are not supplied separately.

Setup

To get started, start ovs-sandbox. Inside the interactive shell
that it starts, run this command:

ovs-vsctl add-br br0 -- set Bridge br0 fail-mode=secure

This command creates a new bridge "br0" and puts "br0" into so-called
"fail-secure" mode. For our purpose, this just means that the
OpenFlow flow table starts out empty.

If we did not do this, then the flow table would start out with a
single flow that executes the "normal" action. We could use that
feature to yield a switch that behaves the same as the switch we
are currently building, but with the caveats described under
"Motivation" above.)

The new bridge has only one port on it so far, the "local port" br0.
We need to add p1, p2, p3, and p4. A shell "for" loop is one way to
do it:

In addition to adding a port, the ovs-vsctl command above sets its
"ofport_request" column to ensure that port p1 is assigned OpenFlow
port 1, p2 is assigned OpenFlow port 2, and so on.

We could omit setting the ofport_request and let Open vSwitch
choose port numbers for us, but it's convenient for the purposes
of this tutorial because we can talk about OpenFlow port 1 and
know that it corresponds to p1.

The ovs-ofctl command above brings up the simulated interfaces, which
are down initially, using an OpenFlow request. The effect is similar
to ifconfig up, but the sandbox's interfaces are not visible to the
operating system and therefore ifconfig would not affect them.

We have not configured anything related to VLANs or MAC learning.
That's because we're going to implement those features in the flow
table.

To see what we've done so far to set up the scenario, you can run a
command like ovs-vsctl show or ovs-ofctl show br0.

Implementing Table 0: Admission control

Table 0 is where packets enter the switch. We use this stage to
discard packets that for one reason or another are invalid. For
example, packets with a multicast source address are not valid, so we
can add a flow to drop them at ingress to the switch with:

We could add flows to drop other protocols, but these demonstrate the
pattern.

We need one more flow, with a priority lower than the default, so that
flows that don't match either of the "drop" flows we added above go on
to pipeline stage 1 in OpenFlow table 1:

ovs-ofctl add-flow br0 "table=0, priority=0, actions=resubmit(,1)"

(The "resubmit" action is an Open vSwitch extension to OpenFlow.)

Testing Table 0

If we were using Open vSwitch to set up a physical or a virtual
switch, then we would naturally test it by sending packets through it
one way or another, perhaps with common network testing tools like
ping and tcpdump or more specialized tools like Scapy. That's
difficult with our simulated switch, since it's not visible to the
operating system.

But our simulated switch has a few specialized testing tools. The
most powerful of these tools is ofproto/trace. Given a switch and
the specification of a flow, ofproto/trace shows, step-by-step, how
such a flow would be treated as it goes through the switch.

The first block of lines describes an OpenFlow table lookup. The
first line shows the fields used for the table lookup (which is mostly
zeros because that's the default if we don't specify everything). The
second line gives the OpenFlow flow that the fields matched (called a
"rule" because that is the name used inside Open vSwitch for an
OpenFlow flow). In this case, we see that this packet that has a
reserved multicast destination address matches the rule that drops
those packets. The third line gives the rule's OpenFlow actions.

The second block of lines summarizes the results, which are not very
interesting here.

This time the flow we handed to ofproto/trace doesn't match any of
our "drop" rules, so it falls through to the low-priority "resubmit"
rule, which we see in the rule and the actions selected in the first
block. The "resubmit" causes a second lookup in OpenFlow table 1,
described by the additional block of indented text in the output. We
haven't yet added any flows to OpenFlow table 1, so no flow actually
matches in the second lookup. Therefore, the packet is still actually
dropped, which means that the externally observable results would be
identical to our first example.

Implementing Table 1: VLAN Input Processing

A packet that enters table 1 has already passed basic validation in
table 0. The purpose of table 1 is validate the packet's VLAN, based
on the VLAN configuration of the switch port through which the packet
entered the switch. We will also use it to attach a VLAN header to
packets that arrive on an access port, which allows later processing
stages to rely on the packet's VLAN always being part of the VLAN
header, reducing special cases.

Let's start by adding a low-priority flow that drops all packets,
before we add flows that pass through acceptable packets. You can
think of this as a "default drop" rule:

ovs-ofctl add-flow br0 "table=1, priority=0, actions=drop"

Our trunk port p1, on OpenFlow port 1, is an easy case. p1 accepts
any packet regardless of whether it has a VLAN header or what the VLAN
was, so we can add a flow that resubmits everything on input port 1 to
the next table:

We don't write any rules that match packets with 802.1Q that enter
this stage on any of the access ports, so the "default drop" rule we
added earlier causes them to be dropped, which is ordinarily what we
want for access ports.

Implementing Table 2: MAC+VLAN Learning for Ingress Port

This table allows the switch we're implementing to learn that the
packet's source MAC is located on the packet's ingress port in the
packet's VLAN.

This table is a good example why table 1 added a VLAN tag to
packets that entered the switch through an access port. We want
to associate a MAC+VLAN with a port regardless of whether the VLAN
in question was originally part of the packet or whether it was an
assumed VLAN associated with an access port.

It only takes a single flow to do this. The following command adds
it:

The "learn" action (an Open vSwitch extension to OpenFlow) modifies a
flow table based on the content of the flow currently being processed.
Here's how you can interpret each part of the "learn" action above:

table=10
Modify flow table 10. This will be the MAC learning table.
NXM_OF_VLAN_TCI[0..11]
Make the flow that we add to flow table 10 match the same VLAN
ID that the packet we're currently processing contains. This
effectively scopes the MAC learning entry to a single VLAN,
which is the ordinary behavior for a VLAN-aware switch.
NXM_OF_ETH_DST[]=NXM_OF_ETH_SRC[]
Make the flow that we add to flow table 10 match, as Ethernet
destination, the Ethernet source address of the packet we're
currently processing.
load:NXM_OF_IN_PORT[]->NXM_NX_REG0[0..15]
Whereas the preceding parts specify fields for the new flow to
match, this specifies an action for the flow to take when it
matches. The action is for the flow to load the ingress port
number of the current packet into register 0 (a special field
that is an Open vSwitch extension to OpenFlow).

A real use of "learn" for MAC learning would probably involve two
additional elements. First, the "learn" action would specify a
hardtimeout for the new flow, to enable a learned MAC to
eventually expire if no new packets were seen from a given source
within a reasonable interval. Second, one would usually want to
limit resource consumption by using the FlowTable table in the
Open vSwitch configuration database to specify a maximum number of
flows in table 10.

Testing Table 2

EXAMPLE 1

The output shows that "learn" was executed, but it isn't otherwise
informative, so we won't include it here.

The -generate keyword is new. Ordinarily, ofproto/trace has no
side effects: "output" actions do not actually output packets, "learn"
actions do not actually modify the flow table, and so on. With
-generate, though, ofproto/trace does execute "learn" actions.
That's important now, because we want to see the effect of the "learn"
action on table 10. You can see that by running:

ovs-ofctl dump-flows br0 table=10

which (omitting the "duration" and "idle_age" fields, which will vary
based on how soon you ran this command after the previous one, as well
as some other uninteresting fields) prints something like:

You can see that the packet coming in on VLAN 20 with source MAC
50:00:00:00:00:01 became a flow that matches VLAN 20 (written in
hexadecimal) and destination MAC 50:00:00:00:00:01. The flow loads
port number 1, the input port for the flow we tested, into register 0.

EXAMPLE 2

The flow that this command tests has the same source MAC and VLAN as
example 1, although the VLAN comes from an access port VLAN rather
than an 802.1Q header. If we again dump the flows for table 10 with:

ovs-ofctl dump-flows br0 table=10

then we see that the flow we saw previously has changed to indicate
that the learned port is port 2, as we would expect:

Implementing Table 3: Look Up Destination Port

This table figures out what port we should send the packet to based on
the destination MAC and VLAN. That is, if we've learned the location
of the destination (from table 2 processing some previous packet with
that destination as its source), then we want to send the packet
there.

The flow's first action resubmits to table 10, the table that the
"learn" action modifies. As you saw previously, the learned flows in
this table write the learned port into register 0. If the destination
for our packet hasn't been learned, then there will be no matching
flow, and so the "resubmit" turns into a no-op. Because registers are
initialized to 0, we can use a register 0 value of 0 in our next
pipeline stage as a signal to flood the packet.

The second action resubmits to table 4, continuing to the next
pipeline stage.

We can add another flow to skip the learning table lookup for
multicast and broadcast packets, since those should always be flooded:

We don't strictly need to add this flow, because multicast
addresses will never show up in our learning table. (In turn,
that's because we put a flow into table 0 to drop packets that
have a multicast source address.)

Testing Table 3

EXAMPLE

Here's a command that should cause OVS to learn that f0:00:00:00:00:01
is on p1 in VLAN 20:

If you tried the examples for the previous step, or if you did
some of your own experiments, then you might see additional flows
there. These additional flows are harmless. If they bother you,
then you can remove them with ovs-ofctl del-flows br0 table=10.

The other way is to inject a packet to take advantage of the learning
entry. For example, we can inject a packet on p2 whose destination is
the MAC address that we just learned on p1:

Here's an interesting excerpt from that command's output. This group
of lines traces the "resubmit(,10)", showing that the packet matched
the learned flow for the first MAC we used, loading the OpenFlow port
number for the learned port p1 into register 0:

If you read the commands above carefully, then you might have noticed
that they simply have the Ethernet source and destination addresses
exchanged. That means that if we now rerun the first ovs-appctl
command above, e.g.:

Implementing Table 4: Output Processing

At entry to stage 4, we know that register 0 contains either the
desired output port or is zero if the packet should be flooded. We
also know that the packet's VLAN is in its 802.1Q header, even if the
VLAN was implicit because the packet came in on an access port.

The job of the final pipeline stage is to actually output packets.
The job is trivial for output to our trunk port p1:

ovs-ofctl add-flow br0 "table=4 reg0=1 actions=1"

For output to the access ports, we just have to strip the VLAN header
before outputting the packet:

The only slightly tricky part is flooding multicast and broadcast
packets and unicast packets with unlearned destinations. For those,
we need to make sure that we only output the packets to the ports that
carry our packet's VLAN, and that we include the 802.1Q header in the
copy output to the trunk port but not in copies output to access
ports:

Our rules rely on the standard OpenFlow behavior that an output
action will not forward a packet back out the port it came in on.
That is, if a packet comes in on p1, and we've learned that the
packet's destination MAC is also on p1, so that we end up with
"actions=1" as our actions, the switch will not forward the packet
back out its input port. The multicast/broadcast/unknown
destination cases above also rely on this behavior.