Link State protocols SPF trigger and delay algorithm impact on IGP micro-loops

Orange Business Service
stephane.litkowski@orange.com

Orange
bruno.decraene@orange.com

Deutsche Telekom
martin.horneffer@telekom.de

Routing Area Working Group

A micro-loop is a packet forwarding loop that may occur transiently
among two or more routers in a hop-by-hop packet forwarding paradigm.
This document analyzes the impact of using different Link State IGP implementations in a single network with regard to micro-loops.
The analysis focuses on the SPF triggers and the SPF delay algorithm.

The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT",
"SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in this
document are to be interpreted as described in [RFC2119].
Link State IGP protocols are based on a topology database on which an SPF (Shortest Path First) algorithm such as Dijkstra's is run to find the optimal routing paths.
Specifications like IS-IS ([RFC1195]) propose some optimizations of the route computation (see Appendix C.1), but not all implementations follow these non-mandatory optimizations.

We will call "SPF trigger" the events that lead to a new SPF computation based on the topology.

Link State IGP protocols, like OSPF ([RFC2328]) and IS-IS ([RFC1195]), use many timers to control router behavior in case of churn: SPF delay, PRC delay, LSP generation delay, LSP flooding delay, LSP retransmission interval, etc.
Some of these timers are standardized in the protocol specifications; others are not, especially the timers related to SPF computation.

For non-standardized timers, implementations are free to implement them in any way.
For some standardized timers, rather than using static configurable values, implementations may offer dynamically adjusted timers to help control the churn.

We will call "SPF delay" the timer, present in most implementations, that specifies the delay before running an SPF computation after an SPF trigger is received.

A micro-loop is a packet forwarding loop that may occur transiently
among two or more routers in a hop-by-hop packet forwarding paradigm. We can observe that these micro-loops are formed when two routers do not update their Forwarding Information Base (FIB) for a certain prefix at the same time.
The micro-loop phenomenon is described in .
Some micro-loop mitigation techniques have been defined by the IETF (e.g. , ), but they are either not implemented due to their complexity or do not provide complete mitigation.
In multi-vendor networks, using different implementations of a link state protocol may favor the creation of micro-loops during the convergence process because of timer discrepancies.
Service providers already know that using similar timers across the whole network is a best practice, but this is sometimes not possible due to implementation limitations.
This document explains why it is important for service providers to have consistent implementations of Link State protocols across vendors. In particular, it analyzes the impact of using different Link State IGP implementations in a single network with regard to micro-loops.
The analysis focuses on the SPF triggers and the SPF delay algorithm.
This document only states the problem and defines some work items; it is not intended to provide a solution.

     A ---- B
     |      |
  10 |      | 10
     |      |
     C ---- D
     |  2   |
     Px     Px

     Figure 1

In the figure above, A primarily uses the AC link to reach C. When the AC link fails, IGP convergence occurs. If A converges before B, A forwards the traffic to C through B, but as B has not converged yet, B loops the traffic back to A, leading to a micro-loop.
The micro-loop appears because nodes in a network converge asynchronously when an event occurs.
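
The transient loop described above can be reproduced with a small shortest-path sketch. This is an illustration only: the cost of the A-B link is not given in Figure 1 and is assumed to be 1 here, and the helper name `next_hops` is hypothetical. The sketch computes A's next hop toward C after the failure and compares it with the next hop B still uses before it converges.

```python
import heapq

def next_hops(graph, source):
    """Run Dijkstra from source; return {destination: first hop}."""
    dist = {source: 0}
    first_hop = {}
    heap = [(0, source, None)]
    while heap:
        d, node, hop = heapq.heappop(heap)
        if d > dist[node]:
            continue  # stale queue entry
        for nbr, cost in graph[node].items():
            nd = d + cost
            if nd < dist.get(nbr, float("inf")):
                dist[nbr] = nd
                # Neighbors of the source are their own first hop.
                first_hop[nbr] = nbr if node == source else hop
                heapq.heappush(heap, (nd, nbr, first_hop[nbr]))
    return first_hop

# Figure 1 topology; the A-B cost of 1 is an assumption.
before = {
    "A": {"B": 1, "C": 10},
    "B": {"A": 1, "D": 10},
    "C": {"A": 10, "D": 2},
    "D": {"B": 10, "C": 2},
}
# Same topology after the A-C link failure.
after = {
    "A": {"B": 1},
    "B": {"A": 1, "D": 10},
    "C": {"D": 2},
    "D": {"B": 10, "C": 2},
}

a_new = next_hops(after, "A")["C"]   # A has already converged
b_old = next_hops(before, "B")["C"]  # B has not converged yet
b_new = next_hops(after, "B")["C"]   # B once it has converged
micro_loop = (a_new == "B" and b_old == "A")
print(a_new, b_old, b_new, micro_loop)
```

Under these assumed costs, A sends traffic for C to B while B still sends it to A, and the loop only disappears once B installs its new next hop.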
Multiple factors (and combinations of these factors) may increase the probability that a micro-loop appears:

o  The delay of failure notification: the later B is notified of the failure compared to A, the more likely a micro-loop is to appear.

o  The SPF delay: most implementations support a delay before the SPF computation in order to catch as many events as possible. If A uses an SPF delay timer of x msec and B uses an SPF delay timer of y msec, with x < y, B starts converging after A, leading to a potential micro-loop.

o  The SPF computation time: mostly a matter of CPU power and of optimizations like incremental SPF. If A computes its SPF faster than B, a micro-loop may appear. Today's CPUs are fast enough for the SPF computation time to be considered negligible (on the order of milliseconds in a large network).

o  The RIB and FIB prefix insertion speed or ordering: highly implementation dependent.

This document focuses on the analysis of the SPF delay (and associated triggers).
Depending on the change advertised in the LSP/LSA, the topology may or may not be affected.
An implementation may avoid running the SPF computation (and may only run the IP reachability computation instead) if the advertised change does not affect the topology.
Different strategies exist to trigger the SPF computation:

o  An implementation may always run a full SPF, whatever the change to process.

o  An implementation may run a full SPF only when required: e.g., if a link fails, the local node runs an SPF for its local LSP update. If the LSP from the neighbor (describing the same failure) is received after the SPF has started, the local node can decide that a new full SPF is not required because the topology has not changed.

o  If the topology does not change, an implementation may only recompute the IP reachability.

As pointed out in , SPF optimizations are not mandatory in specifications, which leads to multiple strategies being implemented.
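
The difference between these trigger strategies can be sketched as a small decision function. This is a hypothetical illustration, not the behavior of any particular implementation: the `Change` categories and the `computation_to_run` name are assumptions introduced here, and "PRC" stands for the IP reachability computation mentioned above.

```python
from enum import Enum

class Change(Enum):
    """Hypothetical classification of a received LSP/LSA change."""
    TOPOLOGY = "topology"        # e.g. a link or node goes down
    PREFIX_ONLY = "prefix_only"  # IP reachability changes, topology does not
    NO_CHANGE = "no_change"      # e.g. a second LSP describing a known failure

def computation_to_run(change, optimized):
    """Return which computation a trigger leads to under each strategy."""
    if not optimized:
        # Strategy 1: always run a full SPF, whatever the change.
        return "full SPF"
    if change is Change.TOPOLOGY:
        return "full SPF"
    if change is Change.PREFIX_ONLY:
        # Topology unchanged: only recompute IP reachability.
        return "PRC"
    return "none"

for change in Change:
    print(change.value,
          computation_to_run(change, optimized=False),
          computation_to_run(change, optimized=True))
```

Two routers that classify the same event differently start different computations, possibly governed by different timers, which is one source of the misalignment discussed in this document.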
Implementations of link state routing protocols use different strategies to delay the SPF computation. The following are the most common:

o  Two-step delay.

o  Exponential backoff delay.

These behaviors are explained in the next sections.
The SPF delay is managed by four parameters:

o  Rapid delay: amount of time to wait before running SPF.

o  Rapid runs: number of consecutive SPF runs that can use the rapid delay. When this number is exceeded, the delay moves to the slow delay value.

o  Slow delay: amount of time to wait before running SPF.

o  Wait time: amount of time to wait without events before going back to the rapid delay.

Example: Rapid delay = 50msec, Rapid runs = 3, Slow delay = 1sec, Wait time = 2sec

   SPF delay time
       ^
       |
       |
SD-    |             x xx x
       |
       |
       |
RD-    |   x  x   x                    x
       |
       +---------------------------------> Events
           |  |   |  | || |            |
                          < wait time >

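
A minimal sketch of the two-step delay follows, using the four parameters described above. It is a simplification under stated assumptions: each trigger is assumed to result in exactly one SPF run (triggers arriving while an SPF is already pending are not batched), and the class and method names are hypothetical.

```python
class TwoStepDelay:
    """Two-step SPF delay: a few rapid runs, then a slow delay until quiet."""

    def __init__(self, rapid_ms, rapid_runs, slow_ms, wait_ms):
        self.rapid_ms = rapid_ms
        self.rapid_runs = rapid_runs
        self.slow_ms = slow_ms
        self.wait_ms = wait_ms
        self.runs = 0              # runs scheduled since the last quiet period
        self.last_event_ms = None  # time of the previous trigger

    def delay_for(self, now_ms):
        """Return the SPF delay (ms) applied to a trigger received at now_ms."""
        quiet = (self.last_event_ms is not None
                 and now_ms - self.last_event_ms >= self.wait_ms)
        if quiet:
            self.runs = 0  # no events during the wait time: back to rapid delay
        self.last_event_ms = now_ms
        self.runs += 1
        if self.runs <= self.rapid_runs:
            return self.rapid_ms
        return self.slow_ms

# Parameters from the example above; the event times are illustrative.
timer = TwoStepDelay(rapid_ms=50, rapid_runs=3, slow_ms=1000, wait_ms=2000)
events_ms = [0, 200, 400, 1000, 1500, 4000]
print([timer.delay_for(t) for t in events_ms])
```

With these illustrative event times, the first three triggers use the rapid delay, the next two use the slow delay, and the last one, arriving after more than 2 seconds of silence, uses the rapid delay again.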
The algorithm has two modes: the fast mode and the backoff mode. In the backoff mode, the SPF delay increases exponentially at each run.
The SPF delay is managed by four parameters:

o  First delay: amount of time to wait before running SPF. This delay is used only when SPF is in fast mode.

o  Incremental delay: amount of time to wait before running SPF. This delay is used only when SPF is in backoff mode and increases exponentially at each SPF run.

o  Maximum delay: maximum amount of time to wait before running SPF.

o  Wait time: amount of time to wait without events before going back to the fast mode.

Example: First delay = 50msec, Incremental delay = 50msec, Maximum delay = 1sec, Wait time = 2sec

   SPF delay time
       ^
MD-    |               xx x
       |
       |
       |
       |
       |
       |             x
       |
       |
       |
       |          x
       |
FD-    |   x  x                        x
ID     |
       +---------------------------------> Events
           |  |   |  | || |            |
                          < wait time >
       FM->BM -------------------->FM

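
A minimal sketch of the exponential backoff delay follows. It rests on assumptions that the text does not spell out: the first run in backoff mode is assumed to use the incremental delay as is, the delay then doubles at each run up to the maximum delay, and each trigger is assumed to result in one SPF run. The class and method names are hypothetical.

```python
class ExponentialBackoffDelay:
    """Exponential backoff SPF delay with a fast mode and a backoff mode."""

    def __init__(self, first_ms, incremental_ms, max_ms, wait_ms):
        self.first_ms = first_ms
        self.incremental_ms = incremental_ms
        self.max_ms = max_ms
        self.wait_ms = wait_ms
        self.backoff_runs = None   # None means fast mode
        self.last_event_ms = None  # time of the previous trigger

    def delay_for(self, now_ms):
        """Return the SPF delay (ms) applied to a trigger received at now_ms."""
        quiet = (self.last_event_ms is not None
                 and now_ms - self.last_event_ms >= self.wait_ms)
        if quiet:
            self.backoff_runs = None  # quiet period: back to fast mode
        self.last_event_ms = now_ms
        if self.backoff_runs is None:
            self.backoff_runs = 0     # the next trigger runs in backoff mode
            return self.first_ms
        # Assumed growth: incremental delay doubled at each backoff run.
        delay = min(self.incremental_ms * 2 ** self.backoff_runs, self.max_ms)
        self.backoff_runs += 1
        return delay

# Parameters from the example above; the event times are illustrative.
timer = ExponentialBackoffDelay(first_ms=50, incremental_ms=50,
                                max_ms=1000, wait_ms=2000)
events_ms = [0, 100, 200, 300, 400, 500, 600, 700, 3000]
print([timer.delay_for(t) for t in events_ms])
```

Under these assumptions the delay grows from the first delay up to the maximum delay during the burst and returns to the first delay after the wait time has elapsed without events.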

     S ---- E
     |      |
  10 |      | 10
     |      |
     D ---- A
     |  2
     Px

     Figure 2

In the diagram above, we consider a flow of packets from S to D. We consider that S uses optimized SPF triggering (a full SPF is triggered only when necessary) and a two-step SPF delay (rapid=150ms, rapid-runs=3, slow=1s). As the implementation of S is optimized, Partial Reachability Computation (PRC) is available.
We consider that PRC is delayed with the same timers as SPF.
We consider that E uses an SPF trigger strategy that always computes a full SPF, and an exponential backoff strategy for the SPF delay (start=150ms, inc=150ms, max=1s).
We also consider the following sequence of events (note: the time scale is not intended to represent a real router time scale, where jitter is introduced on all timers):
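
o  t0=0 ms: a prefix is declared down in the network. We consider this event to happen at time=0.

o  200ms: the prefix is declared as up.

o  400ms: a prefix is declared down in the network.

o  1000ms: S-D link fails.

+--------+--------------------------------+-------------------------+-------------------------+
| Time   | Network Event                  | Router S events         | Router E events         |
+--------+--------------------------------+-------------------------+-------------------------+
| t0=0   | Prefix DOWN                    |                         |                         |
| 10ms   |                                | Schedule PRC (in 150ms) | Schedule SPF (in 150ms) |
| 160ms  |                                | PRC starts              | SPF starts              |
| 161ms  |                                | PRC ends                |                         |
| 162ms  |                                | RIB/FIB starts          |                         |
| 163ms  |                                |                         | SPF ends                |
| 164ms  |                                |                         | RIB/FIB starts          |
| 175ms  |                                | RIB/FIB ends            |                         |
| 178ms  |                                |                         | RIB/FIB ends            |
| 200ms  | Prefix UP                      |                         |                         |
| 212ms  |                                | Schedule PRC (in 150ms) |                         |
| 214ms  |                                |                         | Schedule SPF (in 150ms) |
| 370ms  |                                | PRC starts              |                         |
| 372ms  |                                | PRC ends                |                         |
| 373ms  |                                |                         | SPF starts              |
| 373ms  |                                | RIB/FIB starts          |                         |
| 375ms  |                                |                         | SPF ends                |
| 376ms  |                                |                         | RIB/FIB starts          |
| 383ms  |                                | RIB/FIB ends            |                         |
| 385ms  |                                |                         | RIB/FIB ends            |
| 400ms  | Prefix DOWN                    |                         |                         |
| 410ms  |                                | Schedule PRC (in 300ms) | Schedule SPF (in 300ms) |
| 710ms  |                                | PRC starts              | SPF starts              |
| 711ms  |                                | PRC ends                |                         |
| 712ms  |                                | RIB/FIB starts          |                         |
| 713ms  |                                |                         | SPF ends                |
| 714ms  |                                |                         | RIB/FIB starts          |
| 716ms  |                                | RIB/FIB ends            | RIB/FIB ends            |
| 1000ms | S-D link DOWN                  |                         |                         |
| 1010ms |                                | Schedule SPF (in 150ms) | Schedule SPF (in 600ms) |
| 1160ms |                                | SPF starts              |                         |
| 1161ms |                                | SPF ends                |                         |
| 1162ms | Micro-loop may start from here | RIB/FIB starts          |                         |
| 1175ms |                                | RIB/FIB ends            |                         |
| 1612ms |                                |                         | SPF starts              |
| 1615ms |                                |                         | SPF ends                |
| 1616ms |                                |                         | RIB/FIB starts          |
| 1626ms | Micro-loop ends                |                         | RIB/FIB ends            |
+--------+--------------------------------+-------------------------+-------------------------+
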
In the table above, we can see that, due to discrepancies in SPF management, after multiple events (of different types) the SPF delay values are completely misaligned between nodes, leading to long-lasting micro-loops.
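The same issue can also appear with only a single type of event, as shown below:

+--------+--------------------------------+-------------------------+-------------------------+
| Time   | Network Event                  | Router S events         | Router E events         |
+--------+--------------------------------+-------------------------+-------------------------+
| t0=0   | Link DOWN                      |                         |                         |
| 10ms   |                                | Schedule SPF (in 150ms) | Schedule SPF (in 150ms) |
| 160ms  |                                | SPF starts              | SPF starts              |
| 161ms  |                                | SPF ends                |                         |
| 162ms  |                                | RIB/FIB starts          |                         |
| 163ms  |                                |                         | SPF ends                |
| 164ms  |                                |                         | RIB/FIB starts          |
| 175ms  |                                | RIB/FIB ends            |                         |
| 178ms  |                                |                         | RIB/FIB ends            |
| 200ms  | Link DOWN                      |                         |                         |
| 212ms  |                                | Schedule SPF (in 150ms) |                         |
| 214ms  |                                |                         | Schedule SPF (in 150ms) |
| 370ms  |                                | SPF starts              |                         |
| 372ms  |                                | SPF ends                |                         |
| 373ms  |                                |                         | SPF starts              |
| 373ms  |                                | RIB/FIB starts          |                         |
| 375ms  |                                |                         | SPF ends                |
| 376ms  |                                |                         | RIB/FIB starts          |
| 383ms  |                                | RIB/FIB ends            |                         |
| 385ms  |                                |                         | RIB/FIB ends            |
| 400ms  | Link DOWN                      |                         |                         |
| 410ms  |                                | Schedule SPF (in 150ms) | Schedule SPF (in 300ms) |
| 560ms  |                                | SPF starts              |                         |
| 561ms  |                                | SPF ends                |                         |
| 562ms  | Micro-loop may start from here | RIB/FIB starts          |                         |
| 568ms  |                                | RIB/FIB ends            |                         |
| 710ms  |                                |                         | SPF starts              |
| 713ms  |                                |                         | SPF ends                |
| 714ms  |                                |                         | RIB/FIB starts          |
| 716ms  | Micro-loop ends                |                         | RIB/FIB ends            |
| 1000ms | Link DOWN                      |                         |                         |
| 1010ms |                                | Schedule SPF (in 1s)    | Schedule SPF (in 600ms) |
| 1612ms |                                |                         | SPF starts              |
| 1615ms |                                |                         | SPF ends                |
| 1616ms | Micro-loop may start from here |                         | RIB/FIB starts          |
| 1626ms |                                |                         | RIB/FIB ends            |
| 2012ms |                                | SPF starts              |                         |
| 2014ms |                                | SPF ends                |                         |
| 2015ms |                                | RIB/FIB starts          |                         |
| 2025ms | Micro-loop ends                |                         | RIB/FIB ends            |
+--------+--------------------------------+-------------------------+-------------------------+
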
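
The delay values scheduled by S and E for the four link events above can be derived with a short sketch. It assumes that the four events are close enough that neither router returns to its initial state in between (the wait time is not stated for this example), and that the backoff doubles the incremental delay at each run. The function names are hypothetical.

```python
def two_step_delays(n_events, rapid_ms, rapid_runs, slow_ms):
    """Delays for n consecutive events under a two-step delay (no quiet period)."""
    return [rapid_ms if i < rapid_runs else slow_ms for i in range(n_events)]

def backoff_delays(n_events, first_ms, incremental_ms, max_ms):
    """Delays for n consecutive events under exponential backoff (no quiet period)."""
    delays = []
    for i in range(n_events):
        if i == 0:
            delays.append(first_ms)  # fast mode
        else:
            # Backoff mode: assumed to double at each run, capped at max_ms.
            delays.append(min(incremental_ms * 2 ** (i - 1), max_ms))
    return delays

# Router S: two-step delay (rapid=150ms, rapid-runs=3, slow=1s).
s_delays = two_step_delays(4, rapid_ms=150, rapid_runs=3, slow_ms=1000)
# Router E: exponential backoff (start=150ms, inc=150ms, max=1s).
e_delays = backoff_delays(4, first_ms=150, incremental_ms=150, max_ms=1000)
# Difference between the two SPF delays for each event.
gaps = [abs(s - e) for s, e in zip(s_delays, e_delays)]

print(s_delays)  # [150, 150, 150, 1000]
print(e_delays)  # [150, 150, 300, 600]
print(gaps)      # [0, 0, 150, 400]
```

The gaps of roughly 150 ms and 400 ms for the third and fourth events correspond to the two micro-loop windows in the table: in the first S converges before E, in the second E converges before S.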
In order to enhance the current Link State IGP behavior, the authors encourage work on
the standardization of some behaviors.
The authors propose the following work items:
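
o  Standardize the SPF trigger strategy.

o  Standardize the computation timer scope: a single timer for all computation operations, separate timers, etc.

o  Standardize the "slowdown" timer algorithm, including its association with a particular timer: the authors of this document do not presume that the same algorithm must be used for all timers.

Using the same event sequence as in figure 2, we may expect fewer and/or shorter micro-loops with standardized implementations.

+--------+--------------------------------+-------------------------+-------------------------+
| Time   | Network Event                  | Router S events         | Router E events         |
+--------+--------------------------------+-------------------------+-------------------------+
| t0=0   | Prefix DOWN                    |                         |                         |
| 10ms   |                                | Schedule PRC (in 150ms) | Schedule PRC (in 150ms) |
| 160ms  |                                | PRC starts              | PRC starts              |
| 161ms  |                                | PRC ends                |                         |
| 162ms  |                                | RIB/FIB starts          | PRC ends                |
| 163ms  |                                |                         | RIB/FIB starts          |
| 175ms  |                                | RIB/FIB ends            |                         |
| 176ms  |                                |                         | RIB/FIB ends            |
| 200ms  | Prefix UP                      |                         |                         |
| 212ms  |                                | Schedule PRC (in 150ms) |                         |
| 213ms  |                                |                         | Schedule PRC (in 150ms) |
| 370ms  |                                | PRC starts              | PRC starts              |
| 372ms  |                                | PRC ends                |                         |
| 373ms  |                                | RIB/FIB starts          | PRC ends                |
| 374ms  |                                |                         | RIB/FIB starts          |
| 383ms  |                                | RIB/FIB ends            |                         |
| 384ms  |                                |                         | RIB/FIB ends            |
| 400ms  | Prefix DOWN                    |                         |                         |
| 410ms  |                                | Schedule PRC (in 300ms) | Schedule PRC (in 300ms) |
| 710ms  |                                | PRC starts              | PRC starts              |
| 711ms  |                                | PRC ends                | PRC ends                |
| 712ms  |                                | RIB/FIB starts          |                         |
| 713ms  |                                |                         | RIB/FIB starts          |
| 716ms  |                                | RIB/FIB ends            | RIB/FIB ends            |
| 1000ms | S-D link DOWN                  |                         |                         |
| 1010ms |                                | Schedule SPF (in 150ms) | Schedule SPF (in 150ms) |
| 1160ms |                                | SPF starts              |                         |
| 1161ms |                                | SPF ends                | SPF starts              |
| 1162ms | Micro-loop may start from here | RIB/FIB starts          | SPF ends                |
| 1163ms |                                |                         | RIB/FIB starts          |
| 1175ms |                                | RIB/FIB ends            |                         |
| 1177ms | Micro-loop ends                |                         | RIB/FIB ends            |
+--------+--------------------------------+-------------------------+-------------------------+
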
As displayed above, other parameters, such as router computation power or flooding timers, may also influence micro-loops.
In figure 5, we consider E to be a bit slower than S, which leads to a micro-loop. Despite this, we expect that by aligning implementations
at least on SPF trigger and SPF delay, service providers may reduce the number and the duration of micro-loops.
This document does not introduce any security considerations.

The authors would like to thank Mike Shand for his useful comments.

This document has no actions for IANA.

[RFC2119]
[RFC2328]
[RFC1195]
[RFC6976]
[ULOOP-DELAY]
[ULOOP]