This post is all about OTV (Overlay Transport Virtualization) on the CSR1000v.
I wanted to write this post because there are a lot of acronyms and terminology involved.

A secondary objective was to have a “real” multicast network in the middle, as the examples I have seen around the web have used a direct P2P network for the DCI.
Instead, I wanted to have full multicast running in the SP core in order to gain a full understanding of the packet forwarding and encapsulation.

First off, let's talk about the topology I will be using:

Datacenters:
————
We have 2 Datacenters, one represented by Site 1 and the other by Site 2.
In the middle, we have what is in all respects a service provider (SP) network. In your environment, this may or may not be your own transport network.

In site 1, CSR-1 is our “server”; basically all that's configured on it is an IP address (192.168.100.1/24) on its G1 interface.
SW-9 is our L2 switch, which is configured with two VLANs (VLAN 100 (SERVER-VLAN) and VLAN 900 (SITE-VLAN)). The port (e0/0) going to CSR-1 is configured as an access port in VLAN 100.

The ports going to CSR-2 and CSR-3 (e0/1 and e0/2) are trunk ports.

In site 1, the CSR-2 and CSR-3 routers are our OTV Edge devices, which is basically just the naming convention for your OTV encapsulation/decapsulation devices. In site 1 we are running two of these in order to show how the redundancy works.

Site 2 is very similar, although here I have selected to only have 1 OTV edge device (CSR-7).

In all sites, our OTV edge devices use their G2 interface as what's called their “Join Interface”. All that really means is that this is the L3 interface going towards the DCI “cloud”.

Also on all OTV edge devices, the G1 port is an L2-only interface, connecting to the internals of the respective site.

Transport Network:
——————
We have OSPF running as our IGP between all devices, providing full unicast reachability between our routers.
Inside the transport network, we are running Any Source Multicast (ASM), with the Rendezvous Point (RP) being the loopback0 interface of CSR-5 (5.5.5.5/32). All other routers (CSR-4 and CSR-6) have this RP statically configured.

It's important to note that no PIM adjacency exists between CSR-2 and CSR-4, nor between CSR-3 and CSR-4. Likewise, no PIM adjacency exists between CSR-6 and CSR-7.

The only thing that's required is that we enable PIM on the relevant links of CSR-4 and CSR-6. The reason is that in IOS configuration, enabling PIM is what also enables IGMP, which is what we are really after in this solution. So you can think of CSR-2, CSR-3 and CSR-7 as “clients” of the multicast network, sending IGMP joins (technically, membership reports) to the transport network, which then handles the real multicast forwarding.
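As a sketch of what that could look like on the transport side (interface names and the exact interface numbering here are assumptions, not taken from the lab configs), CSR-4 would need something along these lines:

```
! Hypothetical sketch for CSR-4 -- interface names are assumed
ip multicast-routing distributed
!
ip pim rp-address 5.5.5.5       ! static RP: CSR-5 Loopback0
!
interface GigabitEthernet1
 description Towards CSR-2 (OTV edge)
 ip pim sparse-mode             ! enabling PIM here also enables IGMP on the link
```

The key point is the last line: sparse-mode on the edge-facing link is what lets CSR-4 process the IGMP reports coming from the OTV edge devices.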

Terminology:
————
We have already covered quite a bit of terminology in the introduction, but let me reiterate a few terms here:

– Join interface = Simply the L3 interface on the edge device which faces the transport network.
– Edge Device = Just an OTV router. Sits at the boundary between the L2 network you want transported and the L3 transport network.
– AED = Authoritative Edge Device. This is the “active” router doing the transport of a certain VLAN. There is only one AED for each VLAN on a site.
– Site VLAN = The edge devices need to elect an AED for each VLAN that needs transporting. This election happens over the Site VLAN.
– Internal Interface = An L2 interface going towards the internal datacenter site. This is where we receive the frames we need to extend across OTV.
– Overlay Interface = The logical interface within Cisco IOS that ties all the pieces together.

Verification:
————-
Enough theory, let's see this beast in action on the command line.
First off are our “servers”, CSR-1 and CSR-8:

First off, we set our Site VLAN to be 900, which again is what is used for AED election, locally to the site. This VLAN should never be extended over the OTV tunnel.
Then I set the identifier for our site. This is used for loop prevention, so it's very important that it is unique per site!
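On the CSR, those two statements could look like this (the site-identifier value is just an example; use your own unique value per site):

```
otv site bridge-domain 900           ! Site VLAN 900, used only for local AED election
otv site-identifier 0000.0000.0001   ! example value -- must be unique per site
```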

In the overlay interface configuration, I define a few things. The first is the multicast configuration, where we use group address 239.1.1.1 for our control traffic and 232.1.1.0/24 for data traffic. Next, I specify that our Join interface is GigabitEthernet2. Finally, I configure VLAN 100 to be extended through a service instance configuration snippet.
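Put together, the overlay interface configuration described above would look roughly like this:

```
interface Overlay1
 otv control-group 239.1.1.1          ! ASM group for the OTV control plane
 otv data-group 232.1.1.0/24          ! SSM range for data traffic
 otv join-interface GigabitEthernet2
 service instance 100 ethernet        ! extend VLAN 100 across the overlay
  encapsulation dot1q 100
  bridge-domain 100
```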

Toward the L2 site, we have GigabitEthernet1, where I have configured an L2 configuration using two VLANs. We want the router “listening” to both our Site VLAN (900) and our Data (or Server) VLAN (100), which is the one we want to extend across our DCI.
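The internal interface configuration could then be sketched like this, with one service instance per VLAN the router should listen to:

```
interface GigabitEthernet1
 service instance 100 ethernet        ! Server VLAN, extended over OTV
  encapsulation dot1q 100
  bridge-domain 100
 !
 service instance 900 ethernet        ! Site VLAN, local AED election only
  encapsulation dot1q 900
  bridge-domain 900
```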

Last, but not least, we have GigabitEthernet2, which is our Join interface. This is a standard L3 interface configuration with two important statements. The first is “ip pim passive”, which makes the interface run multicast without establishing any PIM adjacency, and the other is “ip igmp version 3”, which in effect makes the interface able to utilize SSM.
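A minimal sketch of the Join interface (the IP addressing is an assumption on my part):

```
interface GigabitEthernet2
 ip address 10.0.24.2 255.255.255.0   ! example addressing towards the transport
 ip pim passive                       ! run multicast without forming a PIM adjacency
 ip igmp version 3                    ! IGMPv3 allows (S,G) joins for the SSM data-group
```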

On CSR-3, the exact same configuration is present, with the exception of a different Join interface IP address:

To summarize, what I have gone through in this post, is how to use the CSR1K platform to provide for DCI (Data Center Interconnect) using OTV. We have gone through the configuration of the individual devices, as well as having provided a “real” multicast transport network. We then verified the control-plane information and lastly we tested our dataplane connectivity.

I recently decided that I would like to utilize Observium as well as RANCID for configuration backups on my home network. To that end, the following links really helped me get it all set up correctly:

I really like this paragraph, because almost everyone wants to imitate Google. Why? Well, the answer to that question seems to be what everyone is missing!

Google’s solutions were built for scale that basically doesn’t exist outside of a maybe a handful of companies with a trillion dollar valuation. It’s foolish to assume that their solutions are better. They’re just more scalable. But they are actually very feature-poor. There’s a tradeoff there. We should not be imitating what Google did without thinking about why they did it. Sometimes the “whys” will apply to us, sometimes they won’t.

The quote comes from Cloud Field Day 4, from Ben Sigelman of LightStep.

It highlights an interesting point that I think is very relevant for the networking world: the difference between something that is complicated and something that is complex.

There is a distinct difference in that something complicated can be broken down into its building blocks and analysed with a high degree of certainty. Think of a car engine, for example. It is a very complicated piece of machinery for sure, but it is not complex, since you can divide its functionality into components. On the other hand, think of something like a virus and how it evolves. This is a complex organism, and you can't be certain it will evolve in a predetermined fashion.

So I'm wondering: the way we build networks today, are we building them to be “just” complicated, or are they really complex in nature? The answer to this question determines how we need to manage our infrastructure!

Also, since the VNI is a 24-bit identifier (about 16.7 million values), you have a lot more flexibility than the regular 4096 definable VLANs (12-bit 802.1Q tags).

Each endpoint that does the encapsulation/decapsulation is called a VTEP (VxLAN Tunnel EndPoint). In our example these would be CSR3 and CSR5.

The VxLAN header and the original frame are then encapsulated into a UDP packet and forwarded across the network. This is a great solution, as it doesn't impose any technical restrictions on the core of the network. Only the VTEPs need to understand VxLAN (and preferably have hardware acceleration for it as well).
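To visualize the encapsulation order, a VxLAN-encapsulated frame on the wire stacks up like this:

```
Outer Ethernet
 └─ Outer IP (source VTEP -> destination VTEP)
     └─ UDP (destination port 4789)
         └─ VxLAN header (carries the 24-bit VNI)
             └─ Original L2 frame
```

Everything from the outer IP header inward is all the core routers ever need to look at, which is why they stay VxLAN-unaware.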

Since we won't be using BGP EVPN, we will rely solely on multicast in the network to establish which VTEPs serve the traffic in question. The only supported mode is BiDir PIM, which is an optimization of the control plane (not the data plane), since it only keeps (*,G) entries in its multicast routing tables.

Let's take a look at the topology I will be using for the example:

I have used a regular IOS-based device in Site 1 and Site 2 to represent our L2 devices. These could be servers or end clients for that matter. What I want to accomplish is to run EIGRP between R1 and R2 over the “fabric”, using VxLAN as the tunneling mechanism.

CSR3 is the VTEP for Site 1 and CSR5 is the VTEP for Site 2.

In the “fabric” we have CSR4, along with its loopback0 (4.4.4.4/32), which is the BiDir RP. It announces this using BSR, so that CSR3 and CSR5 know the RP information (along with the BiDir functionality). We are using OSPF as the IGP in the “fabric” to establish routing between the loopback interfaces, which will be the VTEP sources for CSR3 and CSR5 respectively.
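A sketch of what the RP/BSR side of that could look like on CSR4 (an outline based on the description above, not necessarily the exact lab config):

```
ip multicast-routing distributed
ip pim bidir-enable                  ! required for BiDir PIM
!
interface Loopback0
 ip address 4.4.4.4 255.255.255.255
 ip pim sparse-mode
!
ip pim bsr-candidate Loopback0
ip pim rp-candidate Loopback0 bidir  ! announce Loopback0 as a BiDir RP via BSR
```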

Let's verify that routing between the loopbacks is working and our RIB is correct:

We can see from this output that we are running PIM on all the relevant interfaces and that BiDir is enabled. We have also verified that we are indeed running BSR to announce Loopback0 as the RP.

What's important here is that we will source our VTEP from loopback0 (3.3.3.3/32) and use multicast group 239.1.1.1 for VNI 1000100. This number can be whatever you choose; I have just chosen a very large number that encodes which VLAN this VNI is used for (VLAN 100).
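Expressed as configuration on CSR3, that would look roughly like this:

```
interface nve1
 source-interface Loopback0                   ! VTEP sourced from 3.3.3.3
 member vni 1000100 mcast-group 239.1.1.1     ! VNI 1000100 mapped to group 239.1.1.1
```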

Here we have a bridge domain configuration with two members: the local interface G1 on its service instance 100, and our VNI/VTEP. This is basically the glue that ties the bridge domain together end to end.
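The bridge-domain glue described above could be sketched as:

```
interface GigabitEthernet1
 service instance 100 ethernet
  encapsulation dot1q 100
!
bridge-domain 100
 member vni 1000100                            ! the VxLAN side (NVE1)
 member GigabitEthernet1 service-instance 100  ! the local EFP towards R1
```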

This command will show the MAC addresses learned in this particular bridge domain. On our EFP on G1 we have dynamically learned the MAC address of R1's interface, and through the NVE1 interface, using VNI 1000100, we have learned the MAC address of R2. Pay attention to the fact that we now know which VTEP endpoints to send the traffic to. This means that further communication between these two end-hosts (R1 and R2) is done solely using unicast between 3.3.3.3 and 5.5.5.5, with VxLAN as the tunneling mechanism.

This command shows the status of our NVE interface. From this we can see that it's in an Up/Up state. The VxLAN port is the standard destination port (4789), and we have some packets going back and forth.

Now that everything checks out okay in the control plane, let's see if the data plane is working by issuing an ICMP ping from R1 to R2 (they are obviously on the same subnet, 192.168.100.0/24):

Finally, I want to show how the ICMP ping works in the data plane by doing a capture on CSR4's G2 interface:

Here we can see a ping I issued from R1's loopback interface towards R2's loopback interface.
I have expanded the view so you can see the encapsulation, with the VxLAN header riding atop the UDP packet.
The outer IP header has the VTEP endpoints (3.3.3.3 and 5.5.5.5) as the source and destination.

The VNI is the one we selected and is used for differentiation on the VTEP.
Finally we have our L2 packet in its entirety.

That's all I wanted to show for now. Next time I will extend this a bit and involve BGP as the control plane.
Thanks for reading!

In this post I would like to highlight a couple of “features” of ISIS.
More specifically the authentication mechanism used and how it looks in the data plane.

I will do this by configuring a couple of routers with the two authentication types available. I will then look at packet captures taken from the link between them and illustrate how each is used by the ISIS process.

The two types of authentication are link-level authentication of the Hello messages used to establish an adjacency, and authentication of the LSPs (Link State Packets) themselves.

First off, here is the extremely simple topology; it's all that's required for this purpose:

Simple, right? Two routers with one link between them on Gig1. They are both running in ISIS level-2-only mode, which means they will only try to establish a L2 adjacency with their neighbors. Each router has a loopback interface, which is also advertised into ISIS.

First off, let's look at the relevant configuration of CSR-02 for the link-level authentication:
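For reference, a sketch of what the two authentication types look like in IOS (the key-chain name, key string and NET are placeholders). The interface-level commands authenticate the Hellos; the router-process commands authenticate the LSPs:

```
key chain ISIS-AUTH
 key 1
  key-string SECRET-KEY          ! placeholder key
!
interface GigabitEthernet1
 ip router isis
 isis authentication mode md5             ! authenticates Hello PDUs on this link
 isis authentication key-chain ISIS-AUTH
!
router isis
 net 49.0001.0000.0000.0002.00            ! example NET
 is-type level-2-only
 authentication mode md5                  ! authenticates the LSPs themselves
 authentication key-chain ISIS-AUTH
```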

I'm currently going through the INE DC videos and learning a lot about fabrics and how they work, along with a fair bit of UCS information on top of that!

I'm spending an average of 2.5 hours on weekdays studying, and a bit more on weekends when time permits.

I still have no firm commitment to the CCIE DC track, but at some point I need to commit to it and really get behind it. One of these days 😉

I mentioned it to the wife-to-be a couple of days ago, and while she didn't applaud the idea, at least she wasn't firmly against it, which is always something I guess! It's very important for me to have my family behind me in these endeavours!

I'm still a bit concerned about the lack of rack rentals for DCv2 from INE, which is something I need to have in place before I order a bootcamp or more training materials from them. As people know by now, I really do my best learning in front of the “system”, trying out what works and what doesn't.

Now to spin up a few N9Ks in the lab and play around with NX-OS unicast and multicast routing!

So I just completed a purchase off eBay for a new server for my lab purposes.

For a while now I’ve been limited to 32Gb of memory on my old ESXi server, which is really more like 20Gb when my regular servers have had their share. Running a combination of different types of devices, each taking at least 4Gb of memory, doesn’t leave much room for larger labs.

I decided to go with a “real” server this time around. So I got an older Cisco UCS C200 M2 server with 2 x Xeon 5570 processors and an additional 96 Gb of RAM (on top of the 24 it came with). That still leaves room for future memory upgrades (it supports a total of 192Gb); I had a budget on this one, so I couldn't go crazy.

Work:

Work has been crazy lately. Two of my team members just resigned, so a lot of workload has to be shifted until we find suitable replacements. That means I've been working 65+ hour work weeks for a while now, something I don't find even remotely amusing, to be honest. But I've been reassured that everything is being done to interview candidates, so I'm hopeful it will work out after the summer holidays.

We have a lot of interesting projects coming up, e.g. our first production environment running Cisco ACI. This also included some very good training. It's really a different ball game compared to the old way of doing datacenters.

Also on my plate are some IWAN solutions. Pretty interesting all in all.

Study:

I'm still reading my way through the Cisco Intelligent WAN (IWAN) book. It's still on my list to take the exam I mentioned earlier, but keeping the work network running takes priority. Also, I can't help feeling the pull of another CCIE when time permits, but it's still just a thought (we all know how that usually goes, right? 🙂 )

Personal:

September 16, my long-time girlfriend and I are getting married! Yes... married. Scary, but still something I look forward to. We've been together for an amazing 11 years on that date, so it's about time (she keeps telling me). As you may know, I proposed when I went to Las Vegas for Cisco Live last year, so it's very memorable 🙂