While everyone deeply involved with OpenFlow agrees it’s just a low-level tool that can’t solve problems we couldn’t solve in the past (just like replacing Tcl with C++ won’t help you prove P = NP), occasionally you stumble across mind-boggling ideas that are so simple you have to ask yourself: “were we really that stupid?” One of them, which obviously impressed James Hamilton, is a solution to load balancing that requires no load balancers.

Before reading further, watch this video and try to figure out what the solution is and why we’re not using it in large-scale networks.

The proposal is truly simple: it uses anycast with per-flow forwarding. All servers have the same IP address, and the OpenFlow controller establishes a path from each client to one of the servers. In its simplest implementation, a flow entry is installed in every device along the path every time a client establishes a session with a server (you could easily improve it by using MPLS LSPs or any other virtual circuit/tunneling mechanism in the core).
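To make the per-flow state visible, here’s a minimal simulation of the simplest implementation described above. It’s a sketch under stated assumptions, not the code from the video: the controller class, the round-robin server choice, the switch names and all addresses are hypothetical. Every new session from a client triggers one exact-match flow entry in every switch on the path to the selected server.

```python
from dataclasses import dataclass, field

ANYCAST_VIP = "192.0.2.10"  # hypothetical address shared by every server

@dataclass
class Switch:
    name: str
    # Exact-match flow table: (client_ip, client_port, vip) -> output port
    flows: dict = field(default_factory=dict)

class AnycastController:
    """Hypothetical reactive controller: on the first packet of every session it
    picks a server and installs one flow entry in each switch on the path."""

    def __init__(self, paths):
        self.paths = paths            # server -> [(switch, out_port), ...]
        self.servers = list(paths)
        self.next_server = 0

    def packet_in(self, client_ip, client_port):
        # Round-robin server selection; the policy itself is not the point here.
        server = self.servers[self.next_server % len(self.servers)]
        self.next_server += 1
        for switch, out_port in self.paths[server]:
            switch.flows[(client_ip, client_port, ANYCAST_VIP)] = out_port
        return server

edge, core, agg_a, agg_b = (Switch(n) for n in ("edge", "core", "agg-a", "agg-b"))
controller = AnycastController({
    "server-1": [(edge, 1), (core, 2), (agg_a, 3)],
    "server-2": [(edge, 1), (core, 4), (agg_b, 5)],
})
for port in range(1000, 1010):  # one client opening ten TCP sessions
    controller.packet_in("198.51.100.7", port)

print(sum(len(s.flows) for s in (edge, core, agg_a, agg_b)))  # 30
```

Ten sessions from a single client already cost 30 flow entries, because the state is multiplied by the number of hops in the path.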

Now ask yourself: will this ever scale? Of course it won’t. It might be a good solution for long-lived sessions (after all, that’s how voice networks handle 800-numbers), but not for the data world where a single client could establish tens of TCP sessions per second.
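A back-of-the-envelope calculation shows why. The inputs below are hypothetical (they are not measurements from any real network); the sketch simply applies Little’s law (entries resident = arrival rate × time each entry stays installed) to the per-session, per-hop model.

```python
def per_flow_state(clients, sessions_per_client_per_s, entry_lifetime_s, hops):
    """Back-of-the-envelope cost of installing one flow entry per session per hop."""
    new_flows_per_s = clients * sessions_per_client_per_s
    # Little's law: entries resident = arrival rate x time each entry stays installed.
    # A switch that every flow crosses (e.g. in the core) holds all of them.
    entries_in_busiest_switch = new_flows_per_s * entry_lifetime_s
    # The controller has to push one flow-mod per hop for every new session.
    flow_mods_per_s = new_flows_per_s * hops
    return new_flows_per_s, entries_in_busiest_switch, flow_mods_per_s

# Hypothetical inputs: 10,000 clients, 10 sessions/s each, entries removed 30 s
# after the session starts, 4 switches per path.
print(per_flow_state(10_000, 10, 30, 4))  # (100000, 3000000, 400000)
```

Three million entries in a single switch and 400,000 flow-mods per second from the controller, for a modest client population.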

A quick look back confirms that hunch: all technologies that required per-session state in every network device have failed. IntServ (with RSVP) never really took off on a global scale, and ATM-to-the-desktop failed miserably. The only two exceptions are global X.25 networks (they were so expensive that nobody ever established more than a few sessions) and voice networks (where sessions usually last for minutes ... or hours if teenagers get involved).

Load balancers work as well as they do because a single device in the whole path (load balancer) keeps the per-session state, and because you can scale them out – if they become overloaded, you just add another pair of redundant devices with new IP addresses to the load balancing pool (and use DNS-based load balancing on top of them).
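Here’s a toy model of that scale-out architecture, assuming a plain round-robin DNS in front of several load balancer pairs (all class names, VIPs and backend names are hypothetical). Each load balancer keeps per-session state only for the sessions it handles, and adding capacity means adding one more VIP to DNS; no other device learns about individual sessions.

```python
import itertools

class LoadBalancer:
    """One (redundant pair of) load balancer(s): the only device holding per-session state."""

    def __init__(self, vip, backends):
        self.vip = vip
        self.sessions = {}                     # (client_ip, client_port) -> backend
        self._next = itertools.cycle(backends)

    def forward(self, client_ip, client_port):
        key = (client_ip, client_port)
        if key not in self.sessions:           # new session: pick a backend once
            self.sessions[key] = next(self._next)
        return self.sessions[key]

class RoundRobinDNS:
    """Returns the VIPs of all load balancers in the pool in rotation."""

    def __init__(self, lbs):
        self.lbs = {}
        for lb in lbs:
            self.add(lb)

    def add(self, lb):                         # scaling out = one more VIP in DNS
        self.lbs[lb.vip] = lb
        self._next = itertools.cycle(list(self.lbs))

    def resolve(self):
        return next(self._next)

dns = RoundRobinDNS([LoadBalancer("203.0.113.1", ["web1", "web2"]),
                     LoadBalancer("203.0.113.2", ["web3", "web4"])])
dns.add(LoadBalancer("203.0.113.3", ["web5", "web6"]))  # pool got overloaded
for port in range(1000, 1300):
    dns.lbs[dns.resolve()].forward("198.51.100.7", port)
print([len(lb.sessions) for lb in dns.lbs.values()])  # [100, 100, 100]
```

The model ignores DNS caching in real clients, which makes the spread less even in practice, but the key property holds: session state lives in exactly one place per session.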

Some researchers have quickly figured out the scaling problem and there’s work being done to make the OpenFlow-based load balancing scale better, but one has to wonder: after they’re done and their solution scales, will it be any better than what we have today, or will it just be different?

Moral of the story – every time you hear about an incredible solution to a well-known problem, ask yourself: why weren’t we using it in the past? Were we really that stupid or are there some inherent limitations that are not immediately visible? Will it scale? Is it resilient? Will it survive device or link failures? And don’t forget: history is a great teacher.

You don't need to keep any state at all in the load balancing tier. The simplest way to do this is to use a consistent hashing scheme (or even outright identifiers) instead of recording per-flow state. This is of course easiest at layer 7 with HTTP cookies, but can also be done at layers 3-4. We've been running our SaaS application like this for years, with identically configured load balancers using the session cookies to send users to the same back-ends. Nginx is an awesome bit of code.
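A minimal consistent-hashing sketch of the cookie-based approach described in this comment. It is not how nginx implements it: the ring with virtual nodes, the MD5-based key hash and the backend names are assumptions for illustration. Two independently built rings with the same configuration agree on every lookup without sharing any state, and removing a backend only moves the sessions that were mapped to it.

```python
import bisect
import hashlib

def _h(key):
    # Stable hash: Python's built-in hash() is randomized per process, so two
    # load balancers would disagree. MD5 is used for spreading keys, not security.
    return int.from_bytes(hashlib.md5(key.encode()).digest()[:8], "big")

class ConsistentHashRing:
    def __init__(self, backends, vnodes=100):
        # Each backend appears at `vnodes` pseudo-random points on the ring.
        self._ring = sorted((_h(f"{b}#{i}"), b) for b in backends for i in range(vnodes))
        self._keys = [k for k, _ in self._ring]

    def lookup(self, session_id):
        # First ring point clockwise from the key's hash, wrapping around.
        i = bisect.bisect(self._keys, _h(session_id)) % len(self._ring)
        return self._ring[i][1]

backends = ["app1", "app2", "app3", "app4"]
lb_a, lb_b = ConsistentHashRing(backends), ConsistentHashRing(backends)
cookies = [f"session-{n}" for n in range(1000)]
assert all(lb_a.lookup(c) == lb_b.lookup(c) for c in cookies)  # no shared state needed

smaller = ConsistentHashRing(backends[:-1])  # app4 fails
moved = [c for c in cookies if lb_a.lookup(c) != smaller.lookup(c)]
assert all(lb_a.lookup(c) == "app4" for c in moved)  # only app4's users move
```

Virtual nodes smooth out the distribution; with only one point per backend, the share of sessions each backend receives would vary widely.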

Using anycast with layer-3 ECMP and source IP addresses as the hash key would probably be adequate for most services (although you have to watch out for clients that change addresses and make sure your backend application can handle them properly). Again, keeping as much state as possible (or at least a session identifier) on the client at Layer 7 can make this scale. No OpenFlow needed. But using anycast seems unnecessarily complicated - multiple stateless scale-out load balancers with DNS round-robin is a simpler and time-proven architecture.
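To illustrate source-IP-hashed ECMP toward an anycast address, here’s a sketch. Real routers use their own, vendor-specific hash functions (usually including ports as well); the SHA-256-based hash, the server names and the addresses here are assumptions. The path is a pure function of the source address, so no per-flow state exists anywhere.

```python
import hashlib
import ipaddress

def ecmp_next_hop(src_ip, next_hops):
    """Choose one of several equal-cost next hops toward the anycast address,
    hashing only the source IP (an illustrative hash, not any router's)."""
    key = ipaddress.ip_address(src_ip).packed
    digest = int.from_bytes(hashlib.sha256(key).digest()[:4], "big")
    return next_hops[digest % len(next_hops)]

servers = ["srv-a", "srv-b", "srv-c"]   # hypothetical anycast instances
first = ecmp_next_hop("198.51.100.7", servers)
# Every later session from the same address lands on the same server, with no
# per-flow state anywhere in the network.
assert all(ecmp_next_hop("198.51.100.7", servers) == first for _ in range(100))
# A client whose address changes (e.g. a new NAT binding) may be hashed to a
# different server, which is why the backend must cope with it.
print(first, ecmp_next_hop("198.51.100.8", servers))
```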

Ryan, there's the stickiness state (which you can push to the client if you can insert cookies in the HTTP session and the client accepts them) and the session state (for active TCP sessions).

Cookies can't help you with the session state; the only way to get away from the session state is to load balance based on source IP address hash, but that does seem a bit risky. Are you aware of any load balancers that are truly stateless (i.e. no active TCP session state)?
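One reason source-IP hashing without session state is risky: when the server pool changes, the hash result changes for many clients, and nothing remembers where their active TCP sessions were going. The sketch below (hypothetical client addresses, an illustrative SHA-256 modulo hash) compares a stateless modulo hash with a load balancer that keeps a session table when a fifth server joins a pool of four. With modulo hashing about four out of five active sessions move; consistent hashing would reduce that to roughly one in five, but not to zero.

```python
import hashlib

def source_hash(client_ip, n_servers):
    # Stateless choice: hash of the source address, modulo the pool size.
    return int.from_bytes(hashlib.sha256(client_ip.encode()).digest()[:8], "big") % n_servers

clients = [f"10.0.{i // 256}.{i % 256}" for i in range(10_000)]
before = {c: source_hash(c, 4) for c in clients}   # four servers

# A stateful load balancer keeps the server chosen for each active session.
session_table = dict(before)
def stateful_pick(client, n_servers):
    return session_table.setdefault(client, source_hash(client, n_servers))

# A fifth server is added while all 10,000 sessions are still active.
broken_stateless = sum(source_hash(c, 5) != before[c] for c in clients)
broken_stateful = sum(stateful_pick(c, 5) != before[c] for c in clients)
print(broken_stateless / len(clients), broken_stateful)  # roughly 0.8, and 0
```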

As for anycast, it's one of those "you love them or you hate them" architectures. Some people have got it working, love it and deride any other (DNS-based) solution.

I see perfect use cases at the network edge (virtualized networking, access points), not sure about the network core ... that's one of the reasons I'm so excited to be part of the OpenFlow symposium on Wednesday.

Ivan, I agree with the insights; peeling back some of the onion layers on OF, I just don't see it as this great panacea. It has its uses, and the "wireless controller/split MAC" analogy fits at times. The goal of providing the intelligence to "commodity" hardware, controlling it all from one virtual place, and slicing and dicing your network according to your needs is ambitious. The protocols and the state involved in centralizing everything are daunting enough, not to mention the FSMs at the ASIC level. Bringing all that up to be manipulated in a NOC is SNA-like. Besides, I am still in the "intelligence in the packet/PHB" camp.

Just as you can design bad applications, OpenFlow opens the door to bad control plane designs too. In the past control plane applications have been reviewed by IETF or IEEE before substantial deployment. I imagine we'll see an enterprise network burned by a poor OpenFlow application in much the same way as we see enterprises burned today by poor IT applications.

You can see it now: "FooApp 2.24 requires FooFlow 1.0 to be installed on the enterprise's routers". FooFlow will do something trivial, like compensate for FooApp's poor content caching, and just like the rest of FooApp, the OpenFlow component will be so badly written it runs wild every now and again.

I do look forward to good OpenFlow applications. For example, integrating a SIP Session Border Controller into an ISP network is a bit of a nightmare at the moment. An architecture with an SBC making the policy choices and tweaking flow admission control at the network edge seems obvious.

Ivan, it's important to understand that for the big cloud guys like James Hamilton, it's all about commoditizing the hardware infrastructure to lower costs and increase flexibility. The load balancer is just another expensive, proprietary "vertically integrated stack" to dismantle. Insightful, thought-provoking post, as always.

One of my observations from attending the Open Networking Summit last week was along similar lines: a lot of the use cases, and even academic studies, around SDN/OpenFlow are less a 'killer app' (or even headed that way) than, in most cases, simply re-implementing (and hopefully improving) existing ways of solving/configuring network problems. Several companies claim to have gotten their hands around the state 'explosion' problem, but I suspect their focus on narrow use cases makes this a much simpler problem to solve than the general case. Hoping I'm wrong.

I do see the promise of SDN, especially in the movement from 'configuring' your network to 'programming' your network. One of the most oft-repeated questions I hear is whether we are ready for, or need, a DevOps movement in networking. A potentially exciting and simultaneously scary thought: we can count on some number of lazy/inept OpenFlow apps making it into production.

Yeah, well, TCP state isn't exactly what I was talking about; it's the application state that truly matters (and should be decoupled from the transport-layer session in a good application design).

Anyway, I do believe there are some truly "stateless" load balancers, that is, they maintain no per-flow map of source-to-destination. The widely used and open-source HAProxy has a stateless layer-3 source hashing mode. I believe there are multiple commercial load balancing appliances that have HAProxy at their core. There are also, I believe, other commercial solutions with "direct server return" that operate at layer 3 in a hash-based mode.

I agree that load balancers operating at layer 4 or higher must maintain TCP session state for at least the sessions they are currently handling. But that is fine in a scale-out scenario, as the state isn't shared (and likely remains in CPU cache).

Replicating TCP session state to another HA peer device so you can "heal" TCP sessions on failure will simply never scale, as you mentioned. That's why sane application-layer protocols and user agents have sensible disconnect/retry/backoff behavior. HTTP and browsers make this easy with round-robin DNS, but even Microsoft SQL Server's TDS protocol has automatic disconnect/retry available against a pool of IPs.
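A minimal sketch of the client-side alternative: try every address in the pool, and if a full pass fails, back off exponentially with random jitter before trying again. This is generic Python using `socket.create_connection`, not the actual TDS or browser implementation; the function name, retry limits, port and addresses are hypothetical.

```python
import random
import socket
import time

def connect_with_failover(addresses, port, attempts=5, base_delay=0.1,
                          max_delay=2.0, connect=socket.create_connection):
    """Try each server address in turn; after a full pass fails, back off
    exponentially (with jitter) before the next pass."""
    last_error = None
    for attempt in range(attempts):
        for addr in random.sample(addresses, len(addresses)):  # spread clients over the pool
            try:
                return connect((addr, port), timeout=2)
            except OSError as err:   # refused, unreachable, timed out, ...
                last_error = err
        delay = min(max_delay, base_delay * 2 ** attempt)
        time.sleep(random.uniform(0, delay))  # "full jitter" avoids synchronized retries
    raise ConnectionError(f"all {len(addresses)} servers unreachable") from last_error

# Hypothetical usage against a pool of database servers:
# sock = connect_with_failover(["192.0.2.11", "192.0.2.12"], 1433)
```

The `connect` parameter exists only so the logic can be exercised without a network; the jitter keeps thousands of clients from hammering the surviving servers in lockstep after a failure.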

If you just can't deal with failover at the application layer, F5 is glad to sell you a pair of $3k servers for $100k so you can do layer-4 load balancing with HA. Just make sure you test all the failure modes thoroughly in concert with your applications and hardware first.

+1 on your conclusions on scalability and resilience: these are my main concerns about SDN in general, although it looks like there are some smart guys out there working on this class of problems (eg. http://www.usenix.org/event/osdi10/tech/full_papers/Koponen.pdf - Onix, unrelated to loadbalancing).

I’m having a hard time believing in the OpenFlow deployment model with the controller programming the network per flow. It must introduce pretty bad setup latencies, not suitable for multi-gigabit data center networks (controller-to-switch RTT, switching ASIC update time, controller performance, etc.). I’ve yet to be convinced that it can be done in real time.
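The components listed above add up for the first packet of every new flow. A rough model, assuming the first packet waits for a packet-in/flow-mod round trip to the controller, the controller’s path computation, and rule installation (in parallel or one switch after another); the numbers in the example are hypothetical placeholders, not measurements of any controller or ASIC.

```python
def first_packet_delay_ms(ctrl_rtt_ms, ctrl_compute_ms, rule_install_ms, hops,
                          parallel_install=True):
    """Rough extra delay for the first packet of a new flow in a reactive
    OpenFlow design: packet-in/flow-mod round trip to the controller, path
    computation, and rule installation in the switches on the path."""
    install_ms = rule_install_ms if parallel_install else rule_install_ms * hops
    return ctrl_rtt_ms + ctrl_compute_ms + install_ms

# Hypothetical figures, not measurements: 0.5 ms controller RTT, 1 ms of
# controller work, 5 ms to program a rule into the switching ASIC, 4 hops.
print(first_packet_delay_ms(0.5, 1, 5, 4))                          # 6.5
print(first_packet_delay_ms(0.5, 1, 5, 4, parallel_install=False))  # 21.5
```

Whatever the real figures are, the delay is paid by every new flow, so it multiplies with the flow setup rate rather than being a one-time cost.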

The demo itself is rather unconvincing to anyone serious about load balancing: not only did they run a test with a very low flow setup rate (~1 flow/s compared to thousands), they also reduced the problem to layer-4 load balancing, while in reality you often need L7 with session tracking, SSL offload, etc. A non-naive implementation of that with OF (whatever future revision) is close to impossible, although it could be an interesting (and theoretical) brain exercise.

Ivan Pepelnjak, CCIE#1354 Emeritus, is an independent network architect at ipSpace.net. He has been designing and implementing large-scale data communications networks as well as teaching and writing books about advanced internetworking technologies since 1990. Follow @ioshints on Twitter.