Clue started an interesting discussion on the NANOG mailing list. He’s inherited a network that extended its internal OSPF to its multihomed customers and wondered whether he should leave the network as it is, change OSPF to IS-IS or deploy BGP. Here are a few thoughts from my reply.

Please remember that we were discussing running global OSPF with the customer routers. Running OSPF in a VRF is a different story, as the customer cannot impact another customer’s routing (they can only burn your CPU cycles).

Do not ever run an SPF routing protocol (OSPF or IS-IS) with your customers. They can insert anything they want into it, whether through a configuration mistake, malicious intent or third-party hijacking, and your whole network (or at least your other customers) will be affected.

Just to give you a few examples:

They could hijack the host route to your DNS server and spoof every other customer of yours that uses your DNS (I haven’t seen this one yet, but it’s feasible).

They could hijack the host route to your POP3 server and collect the usernames and passwords of your residential users (I’ve seen this in a production network, but the attack vector was not OSPF but another routing protocol).

Company A could hijack the host route to the web server of company B.

They could insert a better default route than you do and at least some of your routers will listen to them (I’ve seen this done with OSPF).

If they ever make a total mess and start flapping their LSAs, your whole network will be affected and all your routers will burn CPU running the SPF algorithm.
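The hijacking examples above all rely on longest-prefix matching: a more specific route wins regardless of who advertised it. Here is a minimal Python sketch of that selection rule. The addresses come from the documentation ranges and the next-hop names are made up for illustration; this is not how any particular router implements its forwarding table.

```python
import ipaddress

def best_route(routing_table, destination):
    """Return the (network, next_hop) entry with the longest matching prefix."""
    dest = ipaddress.ip_address(destination)
    matches = [
        (ipaddress.ip_network(prefix), next_hop)
        for prefix, next_hop in routing_table
        if dest in ipaddress.ip_network(prefix)
    ]
    if not matches:
        return None
    # The most specific (longest) prefix wins, no matter who advertised it.
    return max(matches, key=lambda entry: entry[0].prefixlen)

# Hypothetical provider routes: the DNS server 192.0.2.53 lives in 192.0.2.0/24.
provider_routes = [
    ("192.0.2.0/24", "core-router"),
    ("0.0.0.0/0", "upstream"),
]
# A customer injects a host route for the DNS server into the shared IGP.
hijacked_routes = provider_routes + [("192.0.2.53/32", "customer-cpe")]

before = best_route(provider_routes, "192.0.2.53")
after = best_route(hijacked_routes, "192.0.2.53")
print(before[1], "->", after[1])
```

Once the /32 is in the table, traffic for the DNS server follows the customer's route even though the legitimate /24 is still there.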

If you absolutely insist on not using BGP (even though BGP is the only currently available routing protocol designed for scenarios where the two parties don't necessarily trust each other), use RIP. It's safer than OSPF; at least you can filter the incoming updates.

I’ve also seen a Service Provider running RIP with their customer … but they were not using any filters when redistributing RIP routes into their IGP.
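To make the idea of filtering incoming updates concrete, here is a small Python sketch of the logic such a filter applies: accept a customer's advertisement only if it falls inside the address block assigned to that customer. The function name, the prefixes and the assigned block are assumptions for illustration; on a real router you would express this with the platform's own filtering features.

```python
import ipaddress

def filter_updates(received_prefixes, assigned_block):
    """Keep only the prefixes that fall inside the customer's assigned block."""
    allowed = ipaddress.ip_network(assigned_block)
    accepted = []
    for prefix in received_prefixes:
        net = ipaddress.ip_network(prefix)
        # subnet_of() raises TypeError on mixed IPv4/IPv6, so check version first.
        if net.version == allowed.version and net.subnet_of(allowed):
            accepted.append(prefix)
    return accepted

# Hypothetical updates received from a customer assigned 198.51.100.0/24.
updates = [
    "198.51.100.0/25",   # legitimate: inside the assigned block
    "0.0.0.0/0",         # bogus default route
    "192.0.2.53/32",     # host route to somebody else's server
]
accepted = filter_updates(updates, "198.51.100.0/24")
print(accepted)
```

Only the first prefix survives; the default route and the foreign host route are dropped before they can reach the routing table or be redistributed anywhere.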

6 comments:

Another solution is to run a brand new OSPF/ISIS process and redistribute it into the legacy IGP. The customers won't see the change (except a brief connection loss) and you will be able to filter the updates in order to protect the "inner" network.

Many of the small ISP networks I've seen are either running an IGP with customers, or redistributing customer static routes into their IGP (instead of BGP). Then they redistribute the IGP into BGP (usually with no filters) and hope for the best. I've had numerous conversations trying to explain the nightmare waiting to happen, but as far as I know none of them has ever changed the practice.

This is absolutely better than the "original" idea, but it still has a few drawbacks. Unless you deploy an OSPF process per customer, other customers in the same OSPF process could be impacted (things could get a bit better if you run each customer in a separate area).

Additionally, you have to use "distribute-list in" in customer OSPF processes on edge routers to prevent invalid OSPF routes from entering the IP routing table.

The best option (with IS-IS) is of course to flood all LSPs with all bits set in the sequence number and with invalid information that makes you the closest node, in terms of metric, to everyone; you'd break the entire network. Reloading boxes wouldn't help a bit: as long as some box is up, it will reflood the broken data.

A few ways to recover:
1) reload all boxes at the same time
2) wait for the LSPs to time out (many networks have the LSP lifetime maxed out at 18h)
3) change the NET address of each box

My own horror story. At one point we had OSPF happily running in the core and were asked (for the first time) to deliver an active/active dual link to a customer site in quick order. We took the easy way out and simply extended the core OSPF out to the CPE. Unfortunately the method stuck for a limited number of such connections before a more robust solution was used. Wind forward 2 years and we start seeing horrendous churn in the core OSPF. No route shows up as being stable for more than 30 minutes. After a considerable amount of debugging, mostly in the small hours when other changes were minimal, we identified that all OSPF routes were being flushed milliseconds before they should have been refreshed. The flush instantly triggered a refresh, but every route was disappearing for a few seconds every 30 minutes.

The flush was coming from one of the CPEs deployed above... whose clock was running almost exactly twice as fast as normal. So it saw the routes hit the 60-minute age limit (with no refresh) and flushed them. A few cycles slower and we'd probably never have had a problem. It took me several minutes of staring at the errant CPE's CLI (a 1720, I recall!) once I had found it, with a scripted "sh clock" checking the router time over a fixed period. Disabled OSPF on that CPE and... the whole network returned to graceful stability.
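The arithmetic behind this story can be sketched in a few lines of Python. OSPF originators refresh their LSAs every 1800 seconds (LSRefreshTime) and an LSA is flushed when its age reaches 3600 seconds (MaxAge); both constants come from RFC 2328. The sketch assumes, as a simplification, that the faulty router ages every LSA purely by its own clock starting from age zero; the function names are made up for illustration.

```python
LS_REFRESH_TIME = 1800  # seconds between refreshes by the originator (RFC 2328)
MAX_AGE = 3600          # seconds after which an LSA is flushed (RFC 2328)

def real_seconds_until_maxage(clock_speed_factor):
    """Wall-clock seconds until a router with a skewed clock sees MaxAge."""
    return MAX_AGE / clock_speed_factor

def flushes_before_refresh(clock_speed_factor):
    """True if the skewed router ages the LSA out before the refresh arrives."""
    return real_seconds_until_maxage(clock_speed_factor) < LS_REFRESH_TIME

# Assumed clock speeds: normal, slightly under 2x, slightly over 2x.
for factor in (1.0, 1.9, 2.001):
    print(factor, real_seconds_until_maxage(factor), flushes_before_refresh(factor))
```

Under these assumptions a clock running just over twice the normal speed reaches MaxAge slightly before the 1800-second refresh, which matches the flush arriving moments before the refresh; anything slower than 2x never gets there in time.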

The author

Ivan Pepelnjak (CCIE#1354 Emeritus), Independent Network Architect at ipSpace.net, has been designing and implementing large-scale data communications networks as well as teaching and writing books about advanced internetworking technologies since 1990.