A micro-loop is a packet forwarding loop that may occur transiently among two or more routers in a hop-by-hop packet forwarding paradigm.

This document analyzes the impact of using different Link State IGP implementations in a single network with regard to micro-loops. The analysis focuses on SPF triggers and the SPF delay algorithm.

This Internet-Draft is submitted in full conformance with the provisions of BCP 78 and BCP 79.

Internet-Drafts are working documents of the Internet Engineering Task Force (IETF). Note that other groups may also distribute working documents as Internet-Drafts. The list of current Internet-Drafts is at http://datatracker.ietf.org/drafts/current/.

Internet-Drafts are draft documents valid for a maximum of six months and may be updated, replaced, or obsoleted by other documents at any time. It is inappropriate to use Internet-Drafts as reference material or to cite them other than as "work in progress."

Copyright (c) 2015 IETF Trust and the persons identified as the document authors. All rights reserved.

This document is subject to BCP 78 and the IETF Trust's Legal Provisions Relating to IETF Documents (http://trustee.ietf.org/license-info) in effect on the date of publication of this document. Please review these documents carefully, as they describe your rights and restrictions with respect to this document. Code Components extracted from this document must include Simplified BSD License text as described in Section 4.e of the Trust Legal Provisions and are provided without warranty as described in the Simplified BSD License.

Some of these timers are standardized in protocol specifications; others, especially the timers related to SPF computation, are not.

Implementations are free to implement non-standardized timers in any way. Even for some standardized timers, implementations may offer dynamically adjusted timers, rather than static configurable values, to help control churn.

In this document, "SPF delay" refers to the timer, present in most implementations, that specifies the delay before running an SPF computation after an SPF trigger is received.
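As an illustration of this definition, the sketch below models an SPF delay timer with a single fixed value (a simplifying assumption; implementations usually adapt the delay over time): the first trigger arms the timer, and any trigger received before the timer fires is absorbed into the same SPF run. The function coalesce_triggers is hypothetical and not taken from any implementation.

```python
def coalesce_triggers(trigger_times_ms, delay_ms):
    """Return the start times (ms) of SPF runs for sorted trigger times.

    Hypothetical fixed-delay model: the first trigger arms a timer of
    delay_ms; every trigger received before that timer fires is handled
    by the same SPF run instead of scheduling a new one.
    """
    spf_runs = []
    timer_expiry = None
    for t in trigger_times_ms:
        if timer_expiry is not None and t < timer_expiry:
            continue  # absorbed by the SPF run that is already scheduled
        timer_expiry = t + delay_ms
        spf_runs.append(timer_expiry)
    return spf_runs


# Three LSPs describing the same failure within 20 ms: a single SPF run.
print(coalesce_triggers([10, 15, 30], delay_ms=150))  # [160]
# Two triggers further apart than the delay: two SPF runs.
print(coalesce_triggers([10, 200], delay_ms=150))  # [160, 350]
```

The longer the delay, the more events a single SPF run can absorb, but the later the router converges; this trade-off is why the delay value matters for micro-loops.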

A micro-loop is a packet forwarding loop that may occur transiently among two or more routers in a hop-by-hop packet forwarding paradigm. Micro-loops form when two routers do not update their Forwarding Information Base (FIB) for a certain prefix at the same time. The micro-loop phenomenon is described in [I-D.ietf-rtgwg-microloop-analysis].
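The following minimal sketch illustrates how unsynchronized FIB updates produce a loop. Each router's FIB entry for one destination is reduced to a next hop, and packets are followed hop by hop until they reach the destination or revisit a router. The topology is hypothetical: before the S-D link fails, E reaches D through S; after the failure, S reaches D through E.

```python
def forwarding_path(fib, source, destination):
    """Follow next hops from source toward destination.

    fib maps a router name to its next hop for the destination prefix.
    Returns (path, looped); looped is True if a router is visited twice.
    """
    path = [source]
    node = source
    while node != destination:
        node = fib[node]
        looped = node in path
        path.append(node)
        if looped:
            return path, True  # packets come back to a router: micro-loop
    return path, False


# Hypothetical FIB states for destination D around the S-D link failure.
before = {"S": "D", "E": "S"}
s_updated_only = {"S": "E", "E": "S"}  # S has updated its FIB, E has not yet
both_updated = {"S": "E", "E": "D"}

print(forwarding_path(before, "S", "D"))          # (['S', 'D'], False)
print(forwarding_path(s_updated_only, "S", "D"))  # (['S', 'E', 'S'], True)
print(forwarding_path(both_updated, "S", "D"))    # (['S', 'E', 'D'], False)
```

The loop exists only in the intermediate state, which lasts as long as the gap between the two routers' FIB updates.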

In multi-vendor networks, using different implementations of a link state protocol may favor micro-loop creation during convergence because of timer discrepancies. Service providers already know that using similar timers across the whole network is a best practice, but this is sometimes not possible because of implementation limitations.

This document explains why it is important for service providers to have consistent implementations of Link State protocols across vendors. In particular, it analyzes the impact of using different Link State IGP implementations in a single network with regard to micro-loops. As a first step, the analysis focuses on SPF triggers and the SPF delay algorithm.

This document only states the problem and defines some work items; it is not intended to provide a solution.

Micro-loops appear because nodes in a network converge asynchronously when an event occurs.

Multiple factors (and combinations of these factors) may increase the probability of a micro-loop appearing:

Delay of failure notification: the later B is notified of the failure relative to A, the more likely a micro-loop is to appear.

SPF delay: most implementations support a delay for the SPF computation in order to catch as many events as possible. If A uses an SPF delay of x msec and B uses an SPF delay of y msec, with x < y, B starts converging after A, potentially leading to a micro-loop.

SPF computation time: mostly a matter of CPU power and of optimizations such as incremental SPF. If A computes SPF faster than B, a micro-loop may appear. Today's CPUs are fast enough that SPF computation time can be considered negligible (on the order of milliseconds in a large network).
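To see how these factors combine, here is a rough sketch assuming a purely additive model: a router's FIB is updated at notification delay + SPF delay + SPF computation time + FIB update time after the event, and the potential micro-loop window between two routers is the gap between their update times. The model, the class ConvergenceTiming, and the example values are hypothetical; real implementations overlap these phases and add jitter, and whether a loop actually forms also depends on the topology.

```python
from dataclasses import dataclass


@dataclass
class ConvergenceTiming:
    """Per-router delays (ms) after a network event (hypothetical model)."""
    notification_ms: float  # time until the router learns of the failure
    spf_delay_ms: float     # SPF delay applied before running SPF
    spf_compute_ms: float   # SPF computation time
    fib_update_ms: float    # time to install the new route in the FIB

    def fib_updated_at(self) -> float:
        # Additive assumption: each phase starts when the previous one ends.
        return (self.notification_ms + self.spf_delay_ms
                + self.spf_compute_ms + self.fib_update_ms)


def microloop_window_ms(a: ConvergenceTiming, b: ConvergenceTiming) -> float:
    """Time during which one router has converged and the other has not."""
    return abs(a.fib_updated_at() - b.fib_updated_at())


# A and B differ only by their SPF delay (x = 150 ms < y = 600 ms).
a = ConvergenceTiming(10, 150, 1, 13)
b = ConvergenceTiming(10, 600, 1, 13)
print(microloop_window_ms(a, b))  # 450
```

With negligible SPF computation time, the difference in SPF delay dominates the window in this model, which is why the rest of this section concentrates on SPF delay and SPF triggers.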

Depending on the change advertised in an LSP/LSA, the topology may or may not be affected. An implementation may decide not to run SPF (and to run only the IP reachability computation) if the advertised change does not affect the topology.

Different strategies exist to trigger SPF:

Always run a full SPF, whatever the change to process.

Run a full SPF only when required: e.g., if a link fails, a local node runs an SPF for its local LSP update. If the LSP from the neighbor (describing the same failure) is received after the SPF has started, the local node can decide that a new full SPF is not required, as the topology has not changed.

If the topology does not change, only recompute reachability.
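The strategies above can be contrasted with a small classification sketch. Here an LSP/LSA change is reduced to three hypothetical flags: whether it modifies the topology, whether it modifies prefix reachability, and whether the same topology change is already covered by a running SPF. The enum and function names are illustrative, not drawn from any protocol specification or implementation.

```python
from enum import Enum


class Computation(Enum):
    NONE = "no computation"
    PRC = "partial reachability computation"
    FULL_SPF = "full SPF"


def always_full_spf(topology_changed, prefixes_changed, already_covered):
    """First strategy: run a full SPF whatever the change to process.

    The arguments are ignored; they only keep a common signature.
    """
    return Computation.FULL_SPF


def optimized_trigger(topology_changed, prefixes_changed, already_covered):
    """Second and third strategies: a full SPF only for a topology change
    not already handled by a running SPF; otherwise recompute only
    reachability if prefixes changed."""
    if topology_changed and not already_covered:
        return Computation.FULL_SPF
    if prefixes_changed:
        return Computation.PRC
    return Computation.NONE


# A prefix withdrawal: the two kinds of implementation diverge.
print(always_full_spf(False, True, False).name)    # FULL_SPF
print(optimized_trigger(False, True, False).name)  # PRC
# Neighbor LSP describing a link failure already covered by a running SPF.
print(optimized_trigger(True, False, True).name)   # NONE
```

Because the two implementations run different computations for the same event, they may also feed different delay timers, which is the source of the misalignment analyzed in this section.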

As pointed out in Section 1, SPF optimizations are not mandated by the specifications, which has led to multiple strategies being implemented.

In the diagram above, we consider a flow of packets from S to D. S uses optimized SPF triggering (a full SPF is triggered only when necessary) and a two-step SPF delay (rapid=150ms, rapid-runs=3, slow=1s). As the implementation of S is optimized, Partial Reachability Computation (PRC) is available; PRC is delayed using the same timers as SPF. E uses an SPF trigger strategy that always computes a full SPF, and an exponential backoff strategy for SPF delay (start=150ms, inc=150ms, max=1s).

We also consider the following sequence of events (note: the timescale is not intended to represent a real router timescale, in which jitter is applied to all timers):

t0=0ms: a prefix is declared down in the network (this event defines time 0).

200ms: the prefix is declared up.

400ms: a prefix is declared down in the network.

1000ms: the S-D link fails.
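The two SPF delay algorithms described above can be sketched as functions returning the delay applied to the n-th trigger of a series of back-to-back triggers. The exact semantics are our interpretation and should be read as assumptions: for the two-step algorithm, the first rapid-runs computations use the rapid delay and later ones the slow delay; for the exponential backoff, the first computation uses start, the second uses inc, and each subsequent delay doubles up to max. With E's parameters, this interpretation gives the delays E schedules in the timeline that follows (150, 150, 300 and 600 ms). Returning to the initial delay after a quiet period is not modeled.

```python
def two_step_delay_ms(n, rapid=150, rapid_runs=3, slow=1000):
    """Delay before the n-th (1-based) computation of a back-to-back series.

    Assumed semantics: the first `rapid_runs` computations wait `rapid`,
    later ones wait `slow`.
    """
    return rapid if n <= rapid_runs else slow


def exponential_delay_ms(n, start=150, inc=150, max_delay=1000):
    """Delay before the n-th (1-based) computation of a back-to-back series.

    Assumed semantics: `start` for the first computation, `inc` for the
    second, then the delay doubles each time, capped at `max_delay`.
    """
    if n == 1:
        return min(start, max_delay)
    return min(inc * 2 ** (n - 2), max_delay)


print([two_step_delay_ms(n) for n in range(1, 6)])     # [150, 150, 150, 1000, 1000]
print([exponential_delay_ms(n) for n in range(1, 6)])  # [150, 150, 300, 600, 1000]
```

Even with identical initial values (150 ms), the two algorithms diverge after a few events, so two routers that start aligned can drift apart as events accumulate.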

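Route computation event time scale

+--------+--------------------------------+-------------------------+-------------------------+
| Time   | Network Event                  | Router S events         | Router E events         |
+--------+--------------------------------+-------------------------+-------------------------+
| t0=0   | Prefix DOWN                    |                         |                         |
| 10ms   |                                | Schedule PRC (in 150ms) | Schedule SPF (in 150ms) |
| 160ms  |                                | PRC starts              | SPF starts              |
| 161ms  |                                | PRC ends                |                         |
| 162ms  |                                | RIB/FIB starts          |                         |
| 163ms  |                                |                         | SPF ends                |
| 164ms  |                                |                         | RIB/FIB starts          |
| 175ms  |                                | RIB/FIB ends            |                         |
| 178ms  |                                |                         | RIB/FIB ends            |
| 200ms  | Prefix UP                      |                         |                         |
| 212ms  |                                | Schedule PRC (in 150ms) |                         |
| 214ms  |                                |                         | Schedule SPF (in 150ms) |
| 370ms  |                                | PRC starts              |                         |
| 372ms  |                                | PRC ends                |                         |
| 373ms  |                                | RIB/FIB starts          | SPF starts              |
| 375ms  |                                |                         | SPF ends                |
| 376ms  |                                |                         | RIB/FIB starts          |
| 383ms  |                                | RIB/FIB ends            |                         |
| 385ms  |                                |                         | RIB/FIB ends            |
| 400ms  | Prefix DOWN                    |                         |                         |
| 410ms  |                                | Schedule PRC (in 300ms) | Schedule SPF (in 300ms) |
| 710ms  |                                | PRC starts              | SPF starts              |
| 711ms  |                                | PRC ends                |                         |
| 712ms  |                                | RIB/FIB starts          |                         |
| 713ms  |                                |                         | SPF ends                |
| 714ms  |                                |                         | RIB/FIB starts          |
| 716ms  |                                | RIB/FIB ends            | RIB/FIB ends            |
| 1000ms | S-D link DOWN                  |                         |                         |
| 1010ms |                                | Schedule SPF (in 150ms) | Schedule SPF (in 600ms) |
| 1160ms |                                | SPF starts              |                         |
| 1161ms |                                | SPF ends                |                         |
| 1162ms | Micro-loop may start from here | RIB/FIB starts          |                         |
| 1175ms |                                | RIB/FIB ends            |                         |
| 1612ms |                                |                         | SPF starts              |
| 1615ms |                                |                         | SPF ends                |
| 1616ms |                                |                         | RIB/FIB starts          |
| 1626ms | Micro-loop ends                |                         | RIB/FIB ends            |
+--------+--------------------------------+-------------------------+-------------------------+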

The table above shows that, because of discrepancies in SPF management, after multiple events of different types the SPF delays become completely misaligned between nodes, leading to a long micro-loop (from 1162ms to 1626ms in this example).
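The duration of the potential micro-loop follows directly from the timeline: the loop may start when the first router starts updating its RIB/FIB and lasts at most until the last router has finished. The hypothetical helper below computes this upper bound from per-router RIB/FIB (start, end) times; the times are copied from the timelines in this section.

```python
def potential_microloop_ms(fib_updates):
    """fib_updates maps a router to its (start, end) RIB/FIB update times (ms).

    Returns (first start, last end, duration): an upper bound on the period
    during which routers disagree on the next hop for the prefix.
    """
    first_start = min(start for start, _ in fib_updates.values())
    last_end = max(end for _, end in fib_updates.values())
    return first_start, last_end, last_end - first_start


# S-D link failure, timeline with misaligned SPF delays.
print(potential_microloop_ms({"S": (1162, 1175), "E": (1616, 1626)}))
# (1162, 1626, 464)

# Same failure, timeline with standardized implementations.
print(potential_microloop_ms({"S": (1162, 1175), "E": (1163, 1177)}))
# (1162, 1177, 15)
```

In this example, aligning SPF triggers and SPF delays shrinks the upper bound from 464 ms to 15 ms; the residual 15 ms comes from differences in computation and FIB update speed.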

The same issue can also appear with only a single type of event, as displayed below:

Standardize the "slowdown" timer algorithm, including its association with a particular timer. The authors of this document do not presume that the same algorithm must be used for all timers.

Using the same event sequence as in Figure 2, fewer and/or shorter micro-loops may be expected with standardized implementations.

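Route computation event time scale

+--------+--------------------------------+-------------------------+-------------------------+
| Time   | Network Event                  | Router S events         | Router E events         |
+--------+--------------------------------+-------------------------+-------------------------+
| t0=0   | Prefix DOWN                    |                         |                         |
| 10ms   |                                | Schedule PRC (in 150ms) | Schedule PRC (in 150ms) |
| 160ms  |                                | PRC starts              | PRC starts              |
| 161ms  |                                | PRC ends                |                         |
| 162ms  |                                | RIB/FIB starts          | PRC ends                |
| 163ms  |                                |                         | RIB/FIB starts          |
| 175ms  |                                | RIB/FIB ends            |                         |
| 176ms  |                                |                         | RIB/FIB ends            |
| 200ms  | Prefix UP                      |                         |                         |
| 212ms  |                                | Schedule PRC (in 150ms) |                         |
| 213ms  |                                |                         | Schedule PRC (in 150ms) |
| 370ms  |                                | PRC starts              | PRC starts              |
| 372ms  |                                | PRC ends                |                         |
| 373ms  |                                | RIB/FIB starts          | PRC ends                |
| 374ms  |                                |                         | RIB/FIB starts          |
| 383ms  |                                | RIB/FIB ends            |                         |
| 384ms  |                                |                         | RIB/FIB ends            |
| 400ms  | Prefix DOWN                    |                         |                         |
| 410ms  |                                | Schedule PRC (in 300ms) | Schedule PRC (in 300ms) |
| 710ms  |                                | PRC starts              | PRC starts              |
| 711ms  |                                | PRC ends                | PRC ends                |
| 712ms  |                                | RIB/FIB starts          |                         |
| 713ms  |                                |                         | RIB/FIB starts          |
| 716ms  |                                | RIB/FIB ends            | RIB/FIB ends            |
| 1000ms | S-D link DOWN                  |                         |                         |
| 1010ms |                                | Schedule SPF (in 150ms) | Schedule SPF (in 150ms) |
| 1160ms |                                | SPF starts              |                         |
| 1161ms |                                | SPF ends                | SPF starts              |
| 1162ms | Micro-loop may start from here | RIB/FIB starts          | SPF ends                |
| 1163ms |                                |                         | RIB/FIB starts          |
| 1175ms |                                | RIB/FIB ends            |                         |
| 1177ms | Micro-loop ends                |                         | RIB/FIB ends            |
+--------+--------------------------------+-------------------------+-------------------------+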

As displayed above, other parameters, such as router computation power and flooding timers, may also influence micro-loops. In Figure 5, E is considered to be slightly slower than S, which still leads to a micro-loop. Despite this, we expect that by aligning implementations at least on SPF triggering and SPF delay, service providers may reduce the number or duration of micro-loops.
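The alignment that standardization would bring can be sketched by running one common delay policy over the event sequence on every router. The sketch below assumes, hypothetically, that prefix changes trigger a PRC and topology changes a full SPF, that PRC and SPF keep separate back-to-back counters, and that the delay follows an exponential backoff interpreted as start, then inc, then doubling up to max; none of this is specified by this document, and no reset after a quiet period is modeled. With S's trigger times from the standardized timeline, it yields the delays scheduled there.

```python
def exponential_delay_ms(n, start=150, inc=150, max_delay=1000):
    """Assumed backoff: `start`, then `inc`, then doubling up to `max_delay`."""
    if n == 1:
        return min(start, max_delay)
    return min(inc * 2 ** (n - 2), max_delay)


def schedule(events):
    """Return (trigger time, computation, delay) for each (time, kind) event.

    Hypothetical model: "topology" events trigger a full SPF, any other
    kind triggers a PRC, and each computation type has its own counter.
    """
    counters = {"PRC": 0, "SPF": 0}
    result = []
    for t, kind in events:
        computation = "SPF" if kind == "topology" else "PRC"
        counters[computation] += 1
        result.append((t, computation, exponential_delay_ms(counters[computation])))
    return result


# Router S trigger times from the standardized timeline.
events = [(10, "prefix"), (212, "prefix"), (410, "prefix"), (1010, "topology")]
for row in schedule(events):
    print(row)
# (10, 'PRC', 150)
# (212, 'PRC', 150)
# (410, 'PRC', 300)
# (1010, 'SPF', 150)
```

Because every router would compute the same schedule, the SPF for the S-D link failure is delayed by 150 ms everywhere, and the remaining window comes only from E being slightly slower. For contrast, counting all four triggers with a single counter, as E does in the first timeline where every event triggers a full SPF, gives exponential_delay_ms(4) == 600, the 600 ms delay behind the long micro-loop there.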