The Modular Network-in-a-Box: What Could Happen If SDN Thinks Big

Defining An Engineer’s Value

As a network engineer, how would you define the value you bring to the organization you work for? Is it that you know how to type commands that no one else knows? That you speak CLI? That you can interpret non-contiguous wildcard masks? That you know the difference between shutting down an interface and shutting down a BGP neighbor? In other words, do you think your value is defined by all the networking-specific intellectual esoterica that’s floating around the 3 pounds of meat sitting between your ears?

There is value in those things, to be sure. Routing protocol timers matter. Being able to configure a virtual port-channel is useful. Rolling out a new VLAN is necessary at times. Tweaking a QoS profile keeps executive phone calls sounding clear. Updating a NAT statement could be the difference between an Internet-facing application working and not.

However, being able to provision a network using a CLI is a mechanical task. We provision in a specific way because we have to. Because a need was defined, and we’re the ones who know how to type the required commands to meet the need. I argue that there’s little value in typing commands, from a certain point of view. The real value is in making the network DO something, not the means by which we bent the network to our will.

Bear with me, and let’s make an analogy to an automobile. What is the primary value of an automobile? An automobile provides personal transportation. That’s it. Each trip an automobile might take is a little different, and so it’s up to you – the operator – to steer, provide brake and accelerator input, not hit other cars, obey local traffic laws, and park when the trip has reached an end. But there wasn’t a great deal of value in the specific mechanics you applied to steer. You used a steering wheel most likely, but that’s incidental, as is the type of power steering pump that helped you steer. To accelerate, you probably used a floor-mounted accelerator pedal, but you might have used a button on your steering wheel tied to the cruise-control system. No matter whether you depressed a pedal or pushed a button to accelerate, the result is the same. You got where you were going, and thus the automobile fulfilled its chief objective. Therefore, if a network transports traffic (which is its job), the manner in which the network was programmed is irrelevant – it’s not the “manner” that’s important. It’s the end result.

You could argue that you’re more of an automobile mechanic than an operator – that you do the networking equivalent of changing spark plugs, installing new tires, swapping the occasional engine, bolting on a turbocharger, etc. For purposes of my analogy, that’s not the case. The vast majority of us, even when we’re doing a forklift upgrade of a core switch in a big data center, are really operators. Networking vendors are the mechanics – we don’t really get much of a chance to go under the hood, down deep into the software and the ASICs, where the REAL nuts-and-bolts of networking happens.

Sometimes, “What” Is More Important Than “How”

I’ve gone a long way to make a point, and the point is this: the way we operate networks today is ridiculous. Truly, fundamentally, egregiously…ridiculous. Now, I’m a pretty nerdy guy. I like command line interfaces. I get a little jolt of satisfaction laying out long scripts of IOS code in Notepad++ to fulfill some request or design requirement. But in reality? Most of that code I write is so routine, so repeatable, so tedious that it’s just silly to have to write it. Ever. What’s even sillier is this: for simple tasks like switchport provisioning, why does a human being have to touch the network at all? Again, this is ridiculous, and I believe it’s time to move beyond being excited about some mechanical process we’ve mastered over time (the CLI) that might give us a sense of self-worth.

Software defined networking (SDN) is still, itself, being defined. It means different things to different vendors, and vendors are understandably trying to shape the SDN message to best match their own agenda of what SDN should look like. And that’s fair. Vendors have business models to protect, and SDN has the potential to upset the sales model of folks who want to grind out new hardware and sell it to you every 3-7 years. If you think of SDN as a way to push the physical network infrastructure a layer away from the operator and manage the network holistically, then I think that’s a common ground most would agree on. So…if you back away from thinking about firewalls, switches, and routers as individual entities, and instead think of the network as a unified entity, and one that is managed centrally – then that should get your mind spinning. Instead of focusing on the door of the car, we’re thinking about the car. Instead of peering at the car’s inscrutable engine, again, we’re thinking about the car. What can the car do? Ergo, what can the network do?

A Good Place To Start: What Should a Modern Network Be Able To Do?

Provide access. At its base, a network allows hosts to connect. Hosts need to be able to access the network. This is the physical layer, and mostly means Ethernet, at least in the LAN. Oh, and that shared medium wireless thing some people get all excited about.

Connect hosts to other hosts. A network should allow hosts to connect to other hosts, or else there wasn’t a lot of point in connecting to the network to begin with. Functions like routing, bridging, network address translation, and wide-area network services fit in here.

Separate hosts when required. Yes, we want hosts to be able to connect to other hosts, but not always. Therefore, we might need to prevent some hosts from talking to other hosts. Firewalls, application layer inspection, network admission control, and VLANs would map to this function.

Remain available. When the network is not available, then elements of the three functions above are potentially compromised. Therefore, the network should have resiliency built-in that will reduce or eliminate single points of failure. I put functions like L2 multipathing, ECMP, MLAG, virtual switching systems (the super chassis), load-balancing, and automated disaster recovery into this category.

Treat traffic uniquely or preferentially. On a data network, some traffic is more important than other traffic, or merits a unique forwarding path for whatever reason. Traffic must be able to be classified and forwarded according to a hierarchical scheme or other system of preference. Think of QoS, MPLS TE, PBR, and PfR as examples.

Be programmatically agnostic. By this, I mean that a network should not insist that it be programmed by a vendor-supplied command-line shell, Java GUI, or other very specific means of programming it. Yes, surely there must be SOME sort of predictable way to program the network – I agree. But we should not be locked exclusively into an IOS-like shell to get the job done…not that all vendors do this, but certainly a CLI is the standard approach. I believe the long-term answer lies in APIs that allow a user to program the network using his tool of choice, and that tool need not be vendor supplied. Think about vCenter automating network tasks in conjunction with a virtualized server function.

Conceptual Modules Providing Modern Network Functions

Let’s say I’ve covered most of the bases in my network function list above. What are the building blocks that could make all of this happen? Forget what you know about the pieces that do this already. Yes, we know – firewalls, switches, routers, etc. And yes, those things are necessary, but let’s broaden the scope a bit and think in terms of modularity.

Access layer. Hosts have to plug into stuff, and for the most part that means an Ethernet switch. There are hundreds to choose from, with varying characteristics. This could also mean some sort of wireless layer, but fundamentally, this is how a host gets plumbed.

Services engine. In my conceptual SDN network model, there needs to be a device that can do rich packet processing. That might be a stateful firewall. That could be an application layer inspection device. That could be a load-balancer…er, sorry, Application Delivery Controller. That could be a VPN termination device. The idea is that you’ve got some generic device that you spin up rich services on. Done. Embrane’s Heleos falls squarely into this space.

Gateways. These are devices that get you from your network to other, non-Ethernet, networks. Today, we call them WAN routers. They have Ethernet on one side, and T1/E1, T3/E3, SONET, etc. interfaces on the other. If you don’t have an Ethernet handoff from your carrier, then you need a device that can interface with whatever you are being given.

Controller & Applications. You need a device that tells the access-layer devices, gateways, and service engines how to forward traffic. The controller does this. The controller is also the piece that prevents network operators from having to touch each individual device to program it. Instead, the operator defines behavior by policy creation, and uses the controller to install, monitor, and administrate that policy. You might also think of the controller (with the applications running on it) as the orchestrator. The key here is that the controller need not be bound by traffic forwarding paradigms such as TRILL, OSPF, or whatever your favorite standard is that we know and work with today. The software on the controller can run its own application to forward traffic. Do we need to throw out all standards? I’m not ready to go there yet, but I’m also unwilling to say that we have to stick to standards forever and ever. SDN allows us to rethink how we do networking, and the applications running on the controller are the unicorns that could make brand new networking paradigms happen.
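To make the controller idea a little more concrete, here is a minimal Python sketch of "define behavior by policy, let the controller install it." Everything in it is hypothetical: the Policy, Device, and Controller classes, the role names, and the settings are invented for illustration, and Device.apply is only a stand-in for whatever southbound mechanism (a vendor API, OpenFlow, NETCONF) a real controller would use.

```python
from dataclasses import dataclass, field


@dataclass
class Policy:
    """Operator intent: a named behavior and the device roles it targets."""
    name: str
    roles: set
    settings: dict


@dataclass
class Device:
    """A managed network element (hypothetical model, not a vendor object)."""
    name: str
    role: str
    installed: dict = field(default_factory=dict)

    def apply(self, policy):
        # Stand-in for a real southbound call to the device.
        self.installed[policy.name] = dict(policy.settings)


class Controller:
    """Single point where the operator installs policy for the whole network."""

    def __init__(self):
        self.devices = []

    def register(self, device):
        self.devices.append(device)

    def install(self, policy):
        """Push one policy to every device whose role it targets."""
        touched = []
        for device in self.devices:
            if device.role in policy.roles:
                device.apply(policy)
                touched.append(device.name)
        return touched


controller = Controller()
for name, role in [("acc-1", "access"), ("acc-2", "access"),
                   ("wan-1", "gateway"), ("fw-1", "services")]:
    controller.register(Device(name, role))

voice = Policy("voice-priority", {"access", "gateway"},
               {"dscp": 46, "queue": "priority"})
print(controller.install(voice))  # ['acc-1', 'acc-2', 'wan-1']
```

The operator wrote one policy and never touched a device. The services engine was left alone because the policy did not target its role.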

Console. The network operator needs a way to tell the controller what’s what. This is where non-network vendors might have the opportunity to shine. The console a network operator uses to create policies (i.e. program the network) should be rich, intuitive, flexible, and powerful. A well-designed 3D GUI will win over some (probably me). A controller API that lets people write their own scripts, third-parties to interact with the controller, or a vendor-agnostic community-based language to be developed would win over others. A vendor-specific CLI is NOT the answer here, not as far as I’m concerned. This isn’t the space where a networking vendor should need to excel. Leave this piece to companies that understand how to create rich user interfaces. Historically, networking vendors have been consistently poor at this.

Alerting & reporting engine. A network needs to be able to react to events, alert operators, and report on network happenings. Therefore, a central management point that can monitor the network elements as well as the controller in real-time is a necessity. Don’t confuse this piece with the controller itself. This is a component that stands apart from the controller. That said, I can see both the controller and alerting/reporting engine populating information fields in the operator’s console.

So, What Have We Got?

Taken all together, what do we have? A modular network-in-a-box – a network that could scale from the smallest enterprise to the largest data center or service provider. Think of this approach as a framework, the key to which is APIs. Do you want to mix and match hardware vendors? Go ahead. Just make sure that (1) your chosen controller supports a given vendor’s API, and that (2) the vendor’s API supports the sorts of policies you want to write. Do you want to have a controller from vendor X, and a console from vendor Y? Why not? The only requirement is that your console application can tell the controller what it needs to be told. Want to buy a “netBlock”, like you can a vBlock? There’d be nothing in the world preventing such a thing from coming on the market.
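The two mix-and-match conditions are easy to express as a check. The sketch below is only an illustration under assumed names: compatibility_gaps, the API labels such as "vendor-a-rest", and the policy type names are all made up, and a real product would need a far richer capability description.

```python
def compatibility_gaps(controller_apis, vendor_api, vendor_policy_types, wanted_policies):
    """Return human-readable reasons why a controller/vendor pairing falls short.

    controller_apis:     set of southbound APIs the controller can speak
    vendor_api:          the API the hardware vendor exposes
    vendor_policy_types: set of policy types that API can express
    wanted_policies:     policy types the operator intends to write
    """
    gaps = []
    # Condition (1): the controller must support the vendor's API.
    if vendor_api not in controller_apis:
        gaps.append(f"controller does not speak {vendor_api}")
    # Condition (2): the vendor's API must cover the policies we want.
    for policy in sorted(set(wanted_policies) - set(vendor_policy_types)):
        gaps.append(f"{vendor_api} cannot express policy type '{policy}'")
    return gaps


controller_apis = {"vendor-a-rest", "openflow"}
gaps = compatibility_gaps(controller_apis, "vendor-a-rest",
                          {"vlan", "acl"}, ["vlan", "qos"])
print(gaps)  # ["vendor-a-rest cannot express policy type 'qos'"]
```

An empty list means the pairing satisfies both conditions for the policies in question.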

As network operators, we get to manage the network as an entire entity. An entire system. I’ve worked on networks with hundreds of nodes, and I’ve worked on networks with many thousands of nodes. To operate the network without having to do tedious CLI on a bunch of devices to get any single task done would be a dream. Let’s take a personal example. To add a new VLAN and properly advertise it throughout the LAN/WAN, there are many steps required:

Create the VLAN on all switches that need it.

Add the VLAN to appropriate trunks.

Define the spanning-tree values for that VLAN.

Build a first-hop gateway for the VLAN.

Provision an FHRP for the new VLAN.

Update the local IGP to advertise the new VLAN.

Update the local IGP to not run on the new VLAN if inappropriate.

Update the local firewall policy to allow the VLAN to pass if required. That could involve NATs, access-rules, and VPN policies.

Update the WAN router to facilitate redistribution of the new VLAN into the carrier’s BGP cloud.

Yes, some of those steps might be obviated with route summarization, protocols like VTP, or MSTP mapping. However, all of that still needs to happen, and that’s just to get a new VLAN built and talking to everything that it should be talking to. To accomplish this task, I need to touch core switches, access switches, firewalls, and WAN routers. How, in this modern world, does that make any sort of sense? Doesn’t it make far better sense to write a policy that describes to a controller where the new VLAN should be created and the rules that should be followed in its creation? And then save that policy as a template that I can use the next time that I need to do such a thing? Or better yet, let me create that policy template, and then have a junior team member use the template to build a new VLAN, perhaps with me as a workflow checkpoint? As it is in my real world today, I have no one I can delegate the task of VLAN creation to, because what should be a routine task is too darn complicated and mistake-prone.
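Here is a rough Python sketch of what such a reusable VLAN template might look like. It is a thought experiment, not any product's format: VlanRequest, VLAN_TEMPLATE, the role names, and the step descriptions are all hypothetical, and the "actions" are plain descriptions rather than real device commands. The nine template entries mirror the nine steps listed above.

```python
from dataclasses import dataclass, asdict


@dataclass(frozen=True)
class VlanRequest:
    """The handful of values a junior team member would fill in."""
    vlan_id: int
    name: str
    subnet: str
    routed: bool = True        # advertise the subnet into the IGP and WAN
    igp_passive: bool = True   # keep the IGP from running on the new VLAN
    firewalled: bool = False   # does traffic need firewall policy to pass?


def always(request):
    return True


# (roles that receive the step, step description, predicate: does it apply?)
VLAN_TEMPLATE = [
    ({"core", "access"}, "create VLAN {vlan_id} ({name})", always),
    ({"core", "access"}, "allow VLAN {vlan_id} on trunks", always),
    ({"core"}, "set spanning-tree values for VLAN {vlan_id}", always),
    ({"core"}, "build first-hop gateway for {subnet}", always),
    ({"core"}, "provision FHRP for VLAN {vlan_id}", always),
    ({"core"}, "advertise {subnet} in the IGP", lambda r: r.routed),
    ({"core"}, "make VLAN {vlan_id} passive in the IGP", lambda r: r.igp_passive),
    ({"firewall"}, "permit {subnet} (NAT, access rules, VPN)", lambda r: r.firewalled),
    ({"wan"}, "redistribute {subnet} toward the carrier", lambda r: r.routed),
]


def expand(request, inventory):
    """Turn one request into an ordered list of (device, action) tasks."""
    values = asdict(request)
    tasks = []
    for roles, action, applies in VLAN_TEMPLATE:
        if not applies(request):
            continue
        for device, role in inventory:
            if role in roles:
                tasks.append((device, action.format(**values)))
    return tasks


inventory = [("core-1", "core"), ("acc-1", "access"),
             ("fw-1", "firewall"), ("wan-1", "wan")]
request = VlanRequest(vlan_id=210, name="voice", subnet="10.20.10.0/24")

for device, action in expand(request, inventory):
    print(f"{device}: {action}")
```

The senior engineer owns the template; the person filling in VlanRequest only supplies an ID, a name, a subnet, and a few yes/no answers. A workflow checkpoint could review the expanded task list before anything is pushed.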

What about OpenFlow, the elephant in the room I’ve purposely not mentioned until just now? OpenFlow is certainly one way to go about programming flow tables in network hardware in a vendor-agnostic way. The specification continues to grow, as the ONF is up to version 1.3. But OpenFlow is just one way to go about SDN. Why stop there and peer myopically into the OpenFlow void? Well-defined vendor APIs allow a vendor’s unique product capabilities to be programmatically accessible to any tool that can write to them.

What about NETCONF and YANG? Indeed, there’s an IETF approach here that fits into this scheme, at least as far as the controller-to-network hardware communication goes. But can NETCONF cover the granularity of a specific vendor’s offering, especially as that vendor adds capability? NETCONF is only part of the picture.

Objections

This is all pie-in-the-sky, Banks. Yes, at the moment I’m thinking about “what if”, and not “what is”. However, all of the components exist or are being discussed by vendors, standards bodies, and end users. I’m throwing my own concept out there of where I’d like SDN to take us. Owning and operating a network could be so much better than it is today. More powerful in capability. Simpler to program. Easier to monitor. Allowing configuration tools to become best of breed. Friendly to delegation.

Interoperability is a fail, because you want vendors to write their own APIs, not standards-based ones. This is a point worth arguing. On the one hand, a proprietary vendor API lets that vendor expose every single capability of their product. On the other hand, not every device can do everything another might be able to do. So should vendors write to a common base API described by a standard, and then extend the standard with API functions that go beyond? While that would make it easier for third-party tool writers, my take is that a vendor should be able to write their own APIs free of standards, because that provides market differentiation for them. If a vendor wishes to support standards like NETCONF or OpenFlow in addition, that’s great. I’m not eagerly anticipating the networking market dumbed down to merchant silicon Ethernet switches that all do the same thing. Let’s let vendors innovate and do amazing things. Think about F5 with their iControl API – it exposes an innovative, rich load balancing platform to a host of programming tools. Bravo.
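For the sake of argument, the "common base API, extended by the vendor" option raised in that question could look something like the Python sketch below. This illustrates the compromise, not my preferred outcome, and every name in it is invented: BaseSwitchAPI, PlainSwitch, FancySwitch, and the mirror_flow extension do not correspond to any real vendor's API.

```python
from abc import ABC, abstractmethod


class BaseSwitchAPI(ABC):
    """Hypothetical standards-defined baseline every vendor would implement."""

    @abstractmethod
    def create_vlan(self, vlan_id, name):
        """Create a VLAN and return a description of what was done."""

    def extensions(self):
        """Names of vendor-specific calls offered beyond the baseline."""
        return set()


class PlainSwitch(BaseSwitchAPI):
    """A vendor that implements only the baseline."""

    def create_vlan(self, vlan_id, name):
        return f"plain: vlan {vlan_id} {name}"


class FancySwitch(BaseSwitchAPI):
    """A vendor that differentiates with an extra capability."""

    def create_vlan(self, vlan_id, name):
        return f"fancy: vlan {vlan_id} {name}"

    def extensions(self):
        return {"mirror_flow"}

    def mirror_flow(self, flow, collector):
        return f"fancy: mirror {flow} -> {collector}"


def provision(switch, vlan_id, name, mirror_to=None):
    """Third-party tool: use the baseline always, extensions when present."""
    results = [switch.create_vlan(vlan_id, name)]
    if mirror_to and "mirror_flow" in switch.extensions():
        results.append(switch.mirror_flow(f"vlan{vlan_id}", mirror_to))
    return results


print(provision(PlainSwitch(), 210, "voice", mirror_to="10.0.0.5"))
print(provision(FancySwitch(), 210, "voice", mirror_to="10.0.0.5"))
```

The tool writer gets a predictable floor, and the vendor keeps room to differentiate. The cost is that the tool has to ask each device what else it can do, which is exactly the sort of capability discovery a purely proprietary API sidesteps.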

Doesn’t SNMP meet most of your requirements? Certainly SNMP could play a role in my scheme. MIBs and OIDs form an API of sorts. But SNMP doesn’t lend itself to bidirectional communication especially well (think northbound AND southbound APIs between a controller and network device). SNMP can set OID values, but what about getting information back from the network devices? Traps or informs? Traps are simply not designed to provide a central controller with a stream of data needed to maintain a holistic view of the network. Traps are a point-in-time notification of a specific event, delivered best-effort. Informs are acknowledged, but are still a point-in-time reflection of state.

Won’t virtualized switches in hypervisors change everything? Yes. They already have, in fact. Much of the innovation in data networking is happening in soft switches. But soft switches don’t change the SDN model I’ve imagined here. In fact, soft switches play right into the overall scheme – just treat them like any other network forwarding device. We’ll leave the issue of limited forwarding capacity in soft switches to the side for right now. I’m just thinking of soft switches as another module.

What about controller scalability? One of the recurring objections to a centralized controller is that it won’t scale large enough to handle large environments. For first-to-market commercial implementations of OpenFlow switches and controllers, this seems to be the case. Forget about that for a moment and back up just a step. If you have an application that doesn’t scale as large as you need today, what do you do about it? You throw more and/or better hardware at it. You make the software processes more efficient. You replicate & maintain synchronous data sets to let you move your application closer to the consumer. You spread application functions across diverse hardware. Why wouldn’t we do exactly this with SDN controllers? The issue of scaling is one we know how to solve, and to keep raising the issue of scale is to tilt at windmills.
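As one example of "spread application functions across diverse hardware," devices could be partitioned across several controller instances. The Python sketch below uses rendezvous (highest-random-weight) hashing, which is my own choice for illustration and not something any particular SDN controller is known to use. The controller and device names are made up.

```python
import hashlib


def owner(device, controllers):
    """Rendezvous hashing: the controller with the highest score owns the device."""
    def score(controller):
        digest = hashlib.sha256(f"{controller}|{device}".encode()).hexdigest()
        return int(digest, 16)
    return max(controllers, key=score)


def assign(devices, controllers):
    """Map each controller to the list of devices it is responsible for."""
    shards = {controller: [] for controller in controllers}
    for device in devices:
        shards[owner(device, controllers)].append(device)
    return shards


devices = [f"sw-{i}" for i in range(1000)]
three = ["ctl-a", "ctl-b", "ctl-c"]

before = assign(devices, three)
print({controller: len(members) for controller, members in before.items()})

# Lose one controller: only the devices it owned need a new home.
after = assign(devices, ["ctl-a", "ctl-b"])
moved = [d for d in devices if owner(d, three) != owner(d, ["ctl-a", "ctl-b"])]
print(len(moved) == len(before["ctl-c"]))
```

Each device's owner is computed independently, so adding or removing a controller instance reshuffles only the devices tied to that instance. State synchronization between controllers is a separate problem this sketch does not address.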

Vendors will want nothing to do with this. I completely disagree. Vendors already have APIs; Cisco, Juniper, and F5 come immediately to mind. The API-based “modular network-in-a-box” scheme plays well into their business model…certainly better than a world where every switch runs OpenFlow. Insofar as SDN is moving networking forward, I think vendors will be on board with a model that doesn’t require them to throw out the way they’ve been doing business and making money. APIs can do this.

Yes, But Does Networking Need To Go In This Direction?

Oh my, yes. Businesses need to get to the place where it’s less complicated and risky to make network changes, add network capacity, or enhance network functionality. Today, making changes to a network infrastructure any more complex than provisioning an access-layer switch requires an enormous amount of planning, review, and process. And even then, changes don’t always go as expected. We need to get out of the error-prone business of operating network elements, and move instead into operating the network. I believe SDN opens the door to get us there.

Comments

Spot on Ethan… especially regarding interoperability… standards often lead to least common denominator APIs and should probably be secondary to making rich APIs within each building block that attempt to make that block’s key resources accessible and reportable. And instead of value being defined by knowing what commands to type in what order, it could be defined by the architect and builder functions of layering the building blocks together, programmatically connecting them via APIs to match the business needs. Great job!!

Excellent article Ethan, couldn’t agree more, except that there is a solution on the market at the moment 😉 Just over a year ago, a few colleagues and I established a spin-off from a large financial institution that had developed exactly what you described: a multi-vendor/multi-tenant networking platform in a box. We use a modeling technique to build networks in the platform based on the customer’s architecture and design, and then generate configurations and changes according to the designs by using templates that contain the vendor specifics. We use the CLI as the API, but can also use YANG or hand over the config to other systems if needed. If you’re interested, let’s have a chat. Regards, Wim

This vision fits with what I am imagining for the future. What I am mostly worried about is the increased risk of this new layer of abstraction. From buggy API to buggy controller, you add one or more layers of bugs in the process. Perhaps I am a little too paranoid (after all virtualization hasn’t made server services go to hell) but considering the amount of bugs I experience on a weekly basis on networking equipment, I have reasons to worry I would think….

There will be bugs. Oh boy. There will be bugs. Wait until your SDN magic uses open source packages… and an update kills your network. Totally plausible. Anyone that has spent time with Linux or FreeBSD has run into these types of package issues.

They absolutely should not go away, and fortunately we have well defined ways today to internetwork. I think in the context of this discussion, there’s been a lot of focus on controller to switch interoperability – which isn’t necessarily internetworking (we’ve just subdivided a formerly atomic process.) Maybe a better use of time would be internetworking support across SDNs without having to transit a conventional network, versus intra-domain controller to switch interoperability. And within a homogenous domain (controller -> switch), APIs can help us externally stitch things together operationally at least.

Thanks, Ethan – I tend to agree that we need to stop focusing on how, and start focusing on why, or what, we are doing. There’s no way we’re going to do this so long as we’re stuck in the weeds of configuring BGP neighbors one at a time, or managing SPF exponential backoff…

The question is – will the networking industry accept this vision, and what it means in terms of relearning our world, or will we cling to what’s here and now?

“Do we need to throw out all standards? I’m not ready to go there yet, but I’m also unwilling to say that we have to stick to standards forever and ever.”
It all sounds great except for the comment above. While standards bodies are slow and painful at best, I have a hard time believing we can build robust, resilient networks without standards. It is hard enough to get interoperability _with_ standards let alone if everyone was doing what they wanted.

I believe we need to change the way we view networking, but at some point shouldn’t we still have standards for how routing is done? If each shop ran a different algorithm, or was doing things in a different way – can you imagine the ramp-up time it would take someone new to the shop?