Archive

When tuning your IGP of choice, the first thing people look at is usually the hello
and dead interval. This is a flawed logic, it is true that it can help in certain
cases but convergence consists of much more than just hello timers.

Why tune timers?

Detecting that the other side of the link is down is an important part of converging.
That’s why your design should avoid putting any bump in the wires such as converters
or a L2 cloud between the L3 endpoints. If you avoid such things when one end of the
link goes down the other end will as well which provides fast detection of failure.

In rare cases you can have the link being up but traffic is not passing over it. For
such cases or for those cases where there was no chance of avoiding a converter or
L2 cloud, tuning the hello timers can help with failure detection. The answer is almost
always BFD though, if the platform supports it.

Topologies where tuning timers is bad

When using a topology where VSS is involved such as Catalyst 6500 or Catalyst 4500,
tuning the timers is very bad. A common topology might look like this:

The L3 switches are dually connected to the VSS. These L3 switches might be in the
distribution layer and the VSS is part of the core. The distribution switches run
LACP towards the VSS which acts as one device from an outside perspective.

The VSS runs Stateful Switchover (SSO) which syncs configuration, boots the standby
supervisor with the software and has the line cards ready to go in case of failure
of the primary chassis. Hardware forwarding tables are also synchronized, SSO
switchover takes somewhere up to 10 seconds.

The active VSS chassis runs the control plane. Routing protocols such as OSPF are not
HA aware, meaning that the state of the routing protocols is not synchronized between
the chassis.

When using fast timers and a switchover occurs, what happens is that OSPF detects that the
neighbor is not replying and tears down the adjacency. The secondary chassis then has to go
bring the adjacency back up by sending out hello packets, exchanging LSAs and updating
RIB/FIB. This may take as long as 20 seconds with the time included from the switchover.

Non Stop Forwarding (NSF)

NSF combined with graceful restart is a technology used to forward packets when
a switchover has occured. The goal of NSF is to delay the failure detection which
may sound strange from a convergence perspective. Remember though that the VSS acts
as one device.

With NSF the forwarding is done according to the last known FIB entries. After a
switchover the secondary VSS will use graceful restart to inform its neighbors that
it has restarted and needs to synchronize its LSDB. This is done by sending hello packets
with a special bit set and the synchronization is done Out Of Band (OOB) to not tear
down the existing adjacency. The neighbors exchange LSAs and run SPF as normal. The
RIB and FIB can then be updated and and normal forwarding ensues.

This process is dependant on that the neighbors are also NSF aware otherwise they
would tear down the adjacency when the secondary VSS is restarting its routing
processes. So the key here is that the adjacency must stay up and that’s why timers
should be left at default if running VSS. This goes for both the VSS and any routers
that are neighbors to the VSS.

Conclusion

When using VSS always leave IGP timers at the default. Fast timers ruins the NSF
process and will lead to much higher convergence times than leaving them at the
default.

The Catalyst 2960 is a very common switch in any environment that has
Cisco devices. A couple of years ago the 2960 got stacking via the
2960-S model. It also got the ability to do static routes which
was a nice feature. I used it in some deployments to do routing
locally in 2960 and then add a default route towards WAN provider.
That way I didn’t have to go through a slow CPE to route my local
VLANs.

The 2960-X and -XR are available in 24 or 48 port configurations.
Uplinks are either 2x 10 Gbit SPF+ or 4x 1 Gbit SFP. The PoE models
can support 370W or 740W of power.

The 2960-X provides up to 80 Gbps of stack bandwidth which is 2x more
compared to the 2960-S. It is now also possible to stack up to 8 switches
compared to the earlier maximum of 4. The 2960-S model uses FlexStack while
the newer -X and -XR models uses FlexStack-Plus. FlexStack-Plus supports
detecting stack port operational state in hardware and change the forwarding
according to it. This takes 100 ms or less. The older model does it in CPU
which can take 1 or 2 seconds.

Here are some notable differences between 2960-X and -XR compared to 2960-S.

Dual core CPU @ 600 MHz. 2960-S has single core

2960-XR has support for dual power supplies

256 MB of flash for -XR, 128 MB for -X. The S model has 64 MB

512 MB of DRAM compared to 256 for 2960-S

1k active VLANs compared to 255 for 2960-S

48 Etherchannel groups for -XR, 24 for -X and 6 for -S

4 MB of egress buffers instead of 2 MB

4 SPAN sessions instead of 2

32k MACs for -XR, 16k for -X and 8k for -S

24k unicast routes for -XR, 16 static routes for -X and -S

The newer models also support Netflow lite, hibernation mode and EEE.

The 2960-XR does support dynamic routing. It has support for RIP, OSPF stub,
OSPFv3 stub, EIGRP stub, HSRP, VRRP and PIM.

Here are some performance numbers:

2960-X Lan Lite has 100 Gbps of switching bandwidth and 64 active VLANS.
2960-X Lan Base has 216 Gbps of switching bandwidth and 1023 active VLANs.
The same holds true for 2960-XR with IP Lite feature set. The 2960-S had
a maximum of 255 VLANs and 176 Gbps switching bandwidth. Depending on
model the 2960-X tops out at 130.9 Mpps compared to 101.2 for 2960-S.

Cisco is releasing a new Catalyst switch and it will be called 3850. Some of
you might already have heard about it. I’ve seen some presentations about it
but now it’s offical. Lets look at some of the highlights of this new switch.
If you want the full info check out BRKCRS-2887 from Cisco Live in London.

Integrated wireless controller

Clean Air

Radio Resource Management (RRM)

802.11ac ready

Stacking, Stackpower

Flexible Netflow

Granular QoS

Energywise

It terminates CAPWAP and DTLS in hardware. One switch/stack can support up to
50 APs and 2000 clients. Wireless capacity is 40G/switch. Supports IPv4 and
IPv6 client mobility. IP base license level is required to use wireless capabilities.

The stack supports 480Gbps and the fans and power supplies are field replacable.
It also has support for stackpower and linerate on all ports. The switch supports
SFP and SFP+ modules. Different network modules can be inserted with different
capabilities. WS-C3850-NM4-1G has 4x 1G ports (SFP). WS-C3850-NM-2-10G has 2x 10G
OR 4x 1G OR 2x 1G AND 1x 10G. WS-C3850-NM-4-10G is autosensing and supports all
combinations up to 4x 10G, only supported on WS-C3850-48.

Power modules are available at 350W, 715W and 1100W and are called PWR-C1-350WAC,PWR-C1-715WAC, PWR-C1-1100WAC.

Here is a comparison to the 3750-X.

As you can see it’s a pretty good improvement compared to the 3750.

Some more features:

Cavium 6230 800 MHz 4-core CPU

IOS XE

2GB flash, 4GB DRAM

84Mpps per ASIC

Line rate for 64-byte packets

8 queues per port (wired), 4 queues per port (AP ports)

Flexible Netflow

So one major thing here is that it is actually running IOS XE and it has an
IOS daemon supporting IOS. This enables support for the multicore CPU. It
allows for hosted applications like Wireshark. Here is a look under the hood
of the switch.

Finally the Catalyst 3850 uses MQC and not MLS QoS which is nice to see. This
means the QoS features will be more comparable to those of a router. A nice
features is that you can apply different QoS settings depending on the SSID.

All in all this looks like a very intesting switch for the enterprise that has
both wired and wireless needs.

I’ve done a post earlier on Catalyst QoS. That described how to
configure the QoS features on the Catalyst but I didn’t describe
in detail how the buffers work on the Catalyst platform. In this
post I will go into more detail about the buffers and thresholds
that are used.

By default, QoS is disabled. When we enable QoS all ports
will be assigned to queue-set 1. We can configure up to two
different queue-sets.

These are the default settings. Every port on the Catalyst has
4 egress queues (TX). When a port is experiencing congestion
it needs to place the packet into a buffer. If a packet gets
dropped it is because there were not enough buffers to store it.

So by default each queue gets 25% of the buffers. The value is
in percent to make it usable across different versions of the Catalyst
since they may have different size of buffers. The ASIC will have
buffers of some size, maybe a couple of megs but this size is not known
to us so we have to use the percentages.

Of the buffers we assign to a queue we can make the buffers reserved.
This means that no other queue can borrow from these buffers. If we
compare it to CBWFQ it would be the same as the bandwidth percent command
because that guarantees X percent of the bandwidth but it may use more
if there is bandwidth available. The buffers work the same way. There is
a common pool of buffers. The buffers that are not reserved go into the
common pool. By default 50% of the buffers are reserved and the rest go
into the common pool.

There is a maximum how much buffers the queue may use and by default this
is set to 400% This means that the queue may use up to 4x more buffers than
it has allocated (25%).

To differentiate between packets assigned to the same queue the thresholds
can be used. You can configure two thresholds and then there is an implicit
threshold that is not configurable (threshold3). It is always set to the maximum the queue
can support. If a threshold is set to 100% that means it can use 100% of
the buffers allocated to a queue. It is not recommended to put a low value
for the thresholds. IOS enforces a limit of at least 16 buffers assigned
to a queue. Every buffer is 256 bytes which means that 4096 bytes are
reserved.

This table explains how the buffers works. Lets say that this port
on the ASIC has been assigned 200 buffers. Every queue gets 25% of the
buffers which is 50 buffers. However out of these 50 buffers only 50%
are reserved which means 25 buffers. The rest of the buffers go to the
common pool. The thresholds are set to 100% which means they can use 100%
of the allocated buffers to the queue which was 50 buffers. For packets
that go to threshold3 400% of the buffers can be used which means 200 buffers.
This means that a single queue can use up all the non reserved buffers
if the other queues are not using them.

To see which queue packets are getting queued to we can use the show
platform port-asic stats enqueue command.

In this output we have the four queues with three thresholds. Note that queue 0
here is actually queue 1. Queue 1 is queue 2 and so on. Weight 0 is
threshold1, weight 1 is threshold2 and weight 3 is the maximum threshold.

We can also list which frames are being dropped. To do this we use the
show platform port-asic stats drop command.

The queues are displayed in the same way here where queue 0 = queue 1.
This command can be good to find out if you are having packet loss for important
traffic like IPTV traffic or such that is dropping in a certain queue.

The documentation for Catalyst QoS can be a bit shady and by this post I
hope that you know have a better understanding how the egress queueing works.

I’m back studying and I have already booked a new lab date.
I won’t make an announcement until I get back.
This is to keep a bit of the pressure off from taking the lab.

The last couple of days I have been studying Catalyst QoS. It can get a bit messy.

There are no Catalyst 3550’s in the lab any longer, only 3560. So when practicing
forget about 3550. If you have a 3750 that is fine since it is basically the same
switch as 3560 but with stacking.

The Catalyst has 2 ingress queues, one can be used for priority. The switch also
has 4 egress queues where one can be used for priority. It is more likely to end
up with congestion on egress than ingress but we have the option of configuring
both.

Lets assume that we have a switch with a Cisco IP phone connected to it.
The switchport will have a general configuration like the one below.

Even though the port is configured as access this is actually a form of trunk since
voice and data are using different VLANs. By default QoS is disabled. This means that
the switch will be transparent to the QoS markings. Whatever the phone, computers are
setting will be transparently forwarded through the switch. As soon as we turn on QoS
with the mls qos command this behaviour is no longer true.
If we just enable QoS and do nothing more than all markings will be set to BE (0).
This is true both for CoS and DSCP. To check if QoS is enabled use show mls qos.

If we trust the device connecting to a port, most likely a phone then we setup
a trust boundary. We can trust CoS, IP precedence or DSCP. CoS or DSCP will be
more common than precedence.
The CoS is a layer 2 marking, sometimes also called 802.1p priority bits.
CoS is only available in tagged frames like on a 802.1Q trunk. There is a risk
of loosing the marking when the frame gets forwarded through different media from
Ethernet to frame relay or PPP or whatever your links are running. Because of this
it makes much sense to either trust DSCP or use the CoS value to map to a DSCP value.
If we want to configure trusting of CoS on a port we configure it like this.

interface FastEthernet0/10
mls qos trust cos

This means that the CoS marking coming in to the port is trusted. Untagged frames
will receive BE treatment since there is no marking of those packets. If we want to
mark the untagged frames we use the following configuration.

interface FastEthernet0/10
mls qos trust cos
mls qos cos 3

All untagged frames will get a CoS marking of 3. What if we want the port to
mark all the packets the same no matter comes in to the port?
We can use the override command for this.

interface FastEthernet0/10
mls qos cos 1
mls qos cos override

This will effectively set the CoS value to 1 for all frames entering the port.
We can also use the switchport priority command to instruct the Cisco phone to set a
CoS marking on packets from the computer (data) entering the IP phone.

interface FastEthernet0/10
switchhport priority extend cos 1

This will set all frames from the computer entering the phone to have a
marking of 1 no matter what the computer tries to set them to.

It is important to know that the Catalyst switch uses a concept of an
internal QoS label. This is a DSCP value which is used internally and
will define into which queues the traffic ends up.
If you type show mls qos map you will see a lot of different maps that
the Catalyst uses. The CoS to DSCP map is used by the switch so if we trust
CoS then a DSCP value will be derived from that and when the frame is exiting
the switch to another switch then the CoS value will be set according
to the DSCP to QoS mapping table. This effectively keeps the QoS labels synchronized.

Now lets take a look at the ingress queues. We have two of them.
By default queue 2 will be the priority queue. To see the default settings
use the show mls qos input-queue command. We can manipulate which queue becomes
the priority queue and this is done with the
mls qos srr-queue input priority-queue bandwidth command.
If you want to use queue 1 as the priority queue then enter a 1 in the command.
The weight defines how much bandwidth the priority queue can use. By default it uses 10%
You can set this value from 0 to 40 so that it does not starve all of the bandwidth.

The switch uses buffers if there is a need to queue packets. Remember the basic
function of QoS that without congestion there is no queueuing to begin with, only forwarding.
Unfortunately Cisco does not tell us a lot about how much buffers are available
in the Catalyst platforms. To tune the buffers we use mls qos srr-queue input buffers.
We should not assign too much of the buffers to the priority queue.
Finding optimal values depends a lot on your network and takes a lot of testing.
The safest bet might be to use Auto QoS and look what Cisco is using.
These values have been researched by Cisco and should be safe to use.
Lets temporarily enable Auto QoS and look which values we get.

With Auto QoS configured the priority queue gets 10% of bandwidth
and 33% of the buffers. The thresholds for the non priority queue are significantly
lower than the default settings. So the buffers assigns buffer space to the
queues but it does not say how much bandwidth is available to each queue.
We control this with mls qos srr-queue input bandwidth.
It is important to note here that the priority queue gets served first and then a
Shared Round Robin (SRR) algorithm is used to divide the traffic between the
two queues according to the weights. These are just weights and not necessarily
percentages although you could configure it to be.

If we look at show mls qos map cos-input-q and show mls qos dscp-input-q
we can see the maps that are used to define which queue the traffic ends up in.
We can of course set these values according to our needs.

To read this table start by reading from the left column and combining
that number with a number to a row on the right. Almost everything is mapped
to queue 1 except for DSCP 40-47 which is mapped to queue 2.

Up until now we have only discussed queues. The catalyst switch also uses
a congestion avoidance mechanism that is called Weighted Tail Drop (WTD).
The switch has three thresholds for every queue where the third threshold
is not configurable, it is always set to 100%
We can set the other two thresholds to values of our liking. Now we will
map CoS 6 to queue 2, threshold 3. We don’t want this traffic to get dropped unless
there is no other option.

The egress queueing is a bit more flexible. With the SRR algorithm we
can do some port specific bandwidth control. When it comes to egress queues
we can either shape or share a queue. A shaped queue is guaranteed an amount
of bandwidth but is also policed to that value. Even if there is no
congestion that queue still can’t use more bandwidth then it has been assigned.
The shaped value is calculated against the physical interface speed.
Look at the following command.

Rack18SW1(config-if)#srr-queue bandwidth shape 25 0 0 0

How much bandwidth did we just assign to queue 1? 25 Mbit?
We assigned 4 Mbit since (1/25)*100 = 4. When we set the other queues to 0
this means that they are operating in shared mode instead of shaped.
Now we configure the three other queues.

Rack18SW1(config-if)#srr-queue bandwidth share 33 33 33 33

How much does every queue get? Notice I put a 33 for queue 1 but
that will do nothing since it is operating in shaped mode. That leaves
us with the other three queues. To calculate their share we use
(33/33+33+33)*96 = 32 Mbit. So these values are just a weight
and we have to subtract the value from the shaped queue when calculating
how much the other queues get. When operating in shared mode if one
queue is not using all of its bandwidth the other queues may cut into this.
This is different compared to the shaped mode.

Assigning values to the egress queue depending on CoS or DSCP works
the same way as for ingress.

The buffer values can be a bit confusing. First we define how big a share
the queue gets of the buffers in percent. The thresholds will define when
traffic will be dropped. For queue 1 we start dropping traffic in threshold 1 at 50%
Then we drop traffic in threshold 2 at 200%
How can a queue get to 200%?! The secret here is that a queue can outgrow
the buffers we assign if there are buffers available in the common pool.
This is where the reserved values comes into play.
Every queue gets assigned buffers but we can define that only 50% of
these buffers are strictly reserved for the queue.
The other 50% goes into a common pool and can be used by the other queues as well.
We then set a maximum value for the queue which says that it can grow
up to 400% but no more than that.

Early in this post I talked about the priority queue for egress queues.
This how we enable it.

Rack18SW1(config)#int f0/10
Rack18SW1(config-if)#priority-queue out

This will always be queue 1 and is not configurable.

Now lets move on to some other things we can do with QoS. Lets assume that we
have a customer connecting to switch and internally they are using totally different
DSCP values than we want. We can use a DSCP mutation map for that.

We also have the option of using policy maps just like on routers.
And we can even police traffic. This policy-map will match all ICMP and police
it to 128k with a marking of EF, any exceeding traffic will be remarked
to DSCP 0.

This is a huge table showing how much traffic with different markings
are coming in and going out. At the very end you see the Policer which
shows how much is in profile and out of profile.

So that is one way of configuring policy-maps. When using the
catalyst switches they can use QoS in either VLAN based mode or port based mode.
If we use VLAN based mode we will apply the policy to a SVI instead.
This might be more scalable depending on your setup.
The caveat with using a policy-map on SVI is that you can’t police in the parent map.
You need a child map for that. Lets look at an example using a parent and child map.
Any IP traffic from trunks may use 256k (Fa0/13 – 21) and traffic from Fa0/6 will
be restricted to 56k. This will all be configured for VLAN 146.

And before I leave you there is this final thing I want to show.
That is how to limit the egress traffic on an interface with the SRR command.
Let’s say that We have a 100 Mbit interface but customer only pays for 4 Mbit.
We can use this command.

The lowest we can set is 10 though. If we have a port running at
100 Mbit that will leave us with 10 Mbit. We can set the port to 10 Mbit
and configure this to 40% Then we will achieve the value that was requested.

This post turned out to be very long but I hope it has been informative
and I know for sure it helped me to solidify the concepts.
I hope it will help a lot of other people as well.