Days in the life of a professional packet shepherd.

Those Slow-Poke Network Engineers

This year (and especially in the past few months) there have been a lot of new solutions announced in the network virtualization and network overlay platform arenas. These solutions hold great potential, but in my opinion the vendors of these solutions need to get on board with a team approach to IT and avoid marketing to server engineers by throwing the networking team under the bus.

I’m glad to see that detailed technical articles, blogs, and conversations acknowledge the need for a robust and reliable underlay network, but I’m concerned about the broader marketing message that several of these vendors seem to be pushing. To paraphrase and add a little snark, it goes something like this:

“Hey all you smart, handsome server engineers! Stop waiting on those slow, inefficient network guys! They take two months to create a VLAN and get in the way of your application deployment! Just use SpiffyCo Network Overlay Technology and do it yourself! Say ‘sayonara’ to waiting on the network! Tell your CIO you want SpiffyCo Overlay Networks because you care much more about getting work done than those lazy, weird-smelling network goons! Order today and receive a free vial of genuine unicorn tears (just pay separate handling and processing)!”

The target audience of these messages is pretty obvious. So, why is the network team perceived to be so slow? It’s not all FUD. Sometimes the network team is slow. But it’s not because it takes two months to type “vlan 1440” a dozen times or mount and connect a new switch. Here are a few likely reasons:

The network guys usually have to do design, engineering, capacity planning, and install activity. A new VLAN might require a change to multiple switches that may require one or more maintenance windows (depending on the organization’s tolerance for change). Or adding that new application may demand larger links from the access layer up to the core to handle load from the new application servers talking to database or storage systems. While turning up a server is pretty unlikely to impact other servers around it, modifying the network on which everything rides usually makes management folks a bit nervous and that means waiting for some maintenance window to create the VLAN, add it to trunks, etc.

The lack of automation and tools forces very manual workflows compared to virtual server technologies that basically instantiate a new server in seconds. This is a legitimate gripe considering the impact of server virtualization in terms of new application provisioning. It’s absolutely fair to say that it may take somewhat longer to get a new VLAN, routed interface, firewall zone, and load-balancer context created end-to-end than it does to start booting the new virtual servers that will go in it.

While server engineers are usually focused in the data center, the network engineer often wears many hats. The person or team doing data center networking is often also expanding the LAN for the newly built floor of the building, migrating the WAN to reduce costs, generating reports for management on employee Internet usage, and troubleshooting the application issues that are all blamed on “the network”. While some very large or very well organized enterprises differentiate between data center and enterprise networking teams, many of the medium-size enterprises I work with really do not.

The network team is often simply understaffed. Some light research turned up little in the way of recommended guidelines for network ports or devices per admin (it’s highly variable, as you can imagine), but in my experience it’s not uncommon to see a network “team” of one or two engineers doing the vast majority of networking for enterprises that may consist of hundreds of network devices and thousands or even tens of thousands of ports. With very small teams of network admins/engineers handling infrastructures of that size, with all the duties mentioned above, it’s no wonder requests sit in a queue for days, weeks, or months.

Network virtualization has the potential to improve things a lot. It can improve the responsiveness of data center network admins to project and M/A/C (move/add/change) requests by separating the design and scaling of the underlay from the complexity and constant change of the overlay. It can help compartmentalize that change, and thereby reduce its risk. In theory the data center underlay fabric should remain very stable and require little operational change, and making necessary changes to portions of the overlay (even adding a new overlay) should be treated as lower-risk than modifying the foundational underlay network. The automation capabilities of network virtualization platforms will also offer a means to speed up deployment: a new VLAN or VRF, possibly along with firewall and application delivery policy, can be defined in a centralized control plane and immediately propagated to all relevant forwarding-plane elements in the network.
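
To make the "define once, propagate everywhere" idea concrete, here is a minimal sketch in Python. It is not any vendor's API: the `Controller` and `ForwardingElement` classes, the segment ID 1440, and the names `app-tier` and `vswitch-N` are all hypothetical, and a real platform would program devices rather than update a dictionary.

```python
from dataclasses import dataclass, field


@dataclass
class ForwardingElement:
    """Hypothetical stand-in for a switch or hypervisor vSwitch."""
    name: str
    segments: dict = field(default_factory=dict)

    def apply(self, segment_id: int, segment_name: str) -> None:
        # A real platform would program the device here; this only records state.
        self.segments[segment_id] = segment_name


@dataclass
class Controller:
    """Hypothetical centralized control plane holding the overlay definitions."""
    elements: list = field(default_factory=list)
    segments: dict = field(default_factory=dict)

    def register(self, element: ForwardingElement) -> None:
        self.elements.append(element)
        # Bring a newly registered element up to date with existing segments.
        for segment_id, segment_name in self.segments.items():
            element.apply(segment_id, segment_name)

    def define_segment(self, segment_id: int, segment_name: str) -> None:
        # One definition in one place, pushed to every registered element.
        self.segments[segment_id] = segment_name
        for element in self.elements:
            element.apply(segment_id, segment_name)


controller = Controller()
edge_switches = [ForwardingElement(f"vswitch-{n}") for n in range(1, 4)]
for switch in edge_switches:
    controller.register(switch)

# A single change request replaces typing the same definition on every device.
controller.define_segment(1440, "app-tier")
for switch in edge_switches:
    print(switch.name, switch.segments)
```

The point of the sketch is only the shape of the workflow: the overlay change touches one definition, while the underlay (here, the list of registered elements) does not change at all.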

On the other hand, unless organizations begin to make a more explicit delineation between data center and enterprise networking roles, that same network admin still has to chase down the network loop on the 16th floor, troubleshoot the CEO’s remote access VPN, prep temporary network access for the trade show booth, and prove that it’s “not the network.”

My biggest hope is that as the network virtualization market solidifies, the message will shift from enticing server engineers to end-run around the network team, to a message of improving real teamwork across functions in IT and leveraging various skill-sets collaboratively to create visible business results faster and less disruptively.

6 thoughts on “Those Slow-Poke Network Engineers”

As I read through the discussions about the perceived inability of network staff to be “sufficiently” agile, I wonder how much is due to the artisan versus industrial model.

Most network design add/change operations are like the custom shoe maker crafting a custom pair of shoes rather than mass production. Or, worse, someone copying a legacy, working config to a new device without understanding the parameters, then whacking it into shape as needed.

Years ago software application programmers started consolidating their designs and implementations around patterns, loosely based on Alexander’s A Pattern Language. That thinking never penetrated very far in networking, though the networking thought leaders clearly operate that way, judging by their podcasts and postings.

And, worse, we’ve all had it relatively easy for years since networking models have been essentially stagnant, just amping up speeds. Virtualization and SDN discussions are changing what’s in the mix, but thinking is generally still stuck in legacy land.

IT leadership is really to blame here. Instead of focusing on the application service chain, that is all the services needed to make an app work end to end, we have the fatal combination of teams in silos implementing apps with tightly coupled support services. So when things break, they often break spectacularly, instead of in ways that can recover without heroic, manual effort.

It’s not necessarily that networking needs zillions more headcount, it’s that IT leadership needs to get busy engineering out legacy thinking and legacy designs, including generally matching resources to function. Of course, I could have written this same thing 5, 10, 15… years ago.

This isn’t specific to networking really. System admins are getting the same story. “We’re going to deploy a cloud so that we don’t have to wait for the poky sysadmins to deploy new servers.” Of course that cloud is often what is driving the adoption of the network virtualization system.

This is a general perception problem for operations, that we’re slow to handle requests and so we should provide an API that does instant provisioning.

Of course to some extent this is a good thing. Users shouldn’t have to wait for us in most cases. It’s a good thing for us and them if they can provision resources quickly. It’s no fun to manually PXE boot a server or provision a VLAN.

The problem will be, as you point out, when engineering and management realize that proper scaling and architecture didn’t magically get solved by the cloud and network virtualization.

Thanks for the insight into the systems side of things, Jason! I think it will be a while yet before the reality of network virtualization crystallizes and people start to understand (and accept!) what problems it does and doesn’t solve. Maybe I’m just being an oversensitive network goon, but I’d like to see the marketing message focus more on the tangible benefits of network virtualization from a security, flexibility, and stability standpoint rather than on a “those guys are too slow, don’t wait on them!” attitude.

It’s not that guys like us don’t see through that; it’s that the marketing message on the big presos and glossy pages is what CIOs latch on to, and I don’t like that they’ll be latching onto “our network team is too slow. This product will make that problem go away!”

Great post and good perspective! Glad to see someone else turned off by (and looking beyond) the divisive marketing. Two additional points that I think are often overlooked or ignored:

1) As Jason indicated in his comments, the challenge is pervasive across infrastructure teams. I lived this for four years at a Fortune 150. The server, SAN, network, etc. teams all shared in the roadblock blame-game. Worse yet, each team would occasionally attempt to pin fault for it amongst one another. Ex. “I can’t finish this because I don’t have this from team X…” and it would cascade from there.

2) How much of the pokiness or lack of agility is due to process? Some enterprises are so bogged down by certain process frameworks that process becomes a significant drag as well. Ex. ITIL change management, which in some cases may include limited change windows and/or change management meetings. Risk mitigation certainly has a place, but that does not make it exempt from scrutiny.