Category Archives: SD-WAN

Software, cloud computing and IOT are rapidly transforming networks in a way, and at a rate, never seen before. With software-as-a-service (SaaS) models, enterprises are moving more and more of their critical applications and data to public and hybrid clouds. Enterprise traffic, that never left the corporate network, is now shifting to the Internet, reaching out to different data centers across the globe. Streaming Video (Netflix, Youtube, Hulu, Amazon) accounts for an absurdly high percentage of traffic in the Internet and content providers have built out vast content distribution networks (CDNs) that overlay the Internet backbone. Higher resolutions (HD and UHD) will increase the traffic further, and by some accounts, will be over 80% of the total network traffic by 2020. More and more businesses are being created that reach their customers exclusively over the Internet (Spotify, Amazon, Safari, Zomato, etc). Real-time voice and video communications are moving to cloud-based delivery and network operators are challenged to deliver these services without impacting user quality of experience. And if this was’nt enough, with the advances being made in IOT, we have more devices than ever, lively communicating and chatting in real time over the Internet.

Security becomes a prime concern as more business critical applications migrate to the cloud. The number of DDOS attacks are only increasing and IOT devices can be compromised by hackers to launch some very lively and innovative attacks. A large scale cyber attack in 2016 used a botnet consisting of a multitude of IoT devices such as printers, camera, web cams, residential network gateways, and even baby monitors causing a major outage that brought down a big chunk of the Internet.

All this traffic goes over service provider networks that were built and designed using devices, protocols and management software from the Jurassic age. The spectacular growth and variability of traffic that is experienced today was not anticipated when these networks were built. There is a dire need to cope with changing traffic patterns and to optimize the use of available network resources at all levels (IP, MPLS and Optical) — we’ll talk about the multi-layer SDN controller that optimizes the IP-Optical layers some other time.

Given these challenges, its imperative that service providers work towards gaining realtime visibility into the network behavior and extract actionable insights needed to react immediately to network anomalies, changing traffic patterns and security threats and alarms.

And this is where big data analytics, like a knight in the shining armour, comes in.

Given the data rates that we are dealing with, and the rate at which traffic volumes and speeds are growing, deep packet inspection at line rate gets ruled out in most parts of the network. There is only so much that one can do with hardware’s brute force approach. Additionally, with most traffic being encrypted, DPI offers limited — no, zero — insight into whats happening in the network.

What can help at the scale that networks run today is streaming telemetry combined with big data analytics. Instead of constantly polling the devices in the network and then reacting to what is learnt, the new age mantra is for these devices to periodically push the relevant statistics to the data collectors, which can analyse this data and act based on that. One can argue that streaming network telemetry may not even require an IETF standard in order to be useful. A standard format like JSON could be used, and it’s up to the collector to parse and interpret the incoming barrage of data. This allows network operators to quickly write dev-ops tools that they can use to closely monitor their network and services. This opens up room for hyper innovation where new-age startups can quickly come up with products that can smartly mine the data from the network and draw rich insights into whats happening that can help the service providers in running their networks smarter and hotter.

Big data analytics entails ingesting, processing and storing exabytes worth of network data over a period of time that can be analysed later for actionable insights. With advances made in streaming analytics, this analysis can also happen in real time as the data comes piping hot from the network. New age scalable stream processors make it possible to fuse data streams to answer more sophisticated queries about the network in real-time.

By correlating data from sources beyond traditional routing and networking equipment (IX router-server views, DNS and CDN logs, firewall logs, billing and call detail records) it is possible for the analytics engine to identify patterns or behaviors that can not be identified by merely sifting through the device logs (collected traditionally using SNMP, syslogs, netflow, sflow, IPFIX, etc). The ability to correlate telemetry data from the network with applications such as Netflix or Youtube or SaaS applications such as an iOS upgrade can provide insights that can never be found with traditional traffic engineering approaches.

I claim that we now have the smarts to avoid the famous iOS7 meltdown that happened when iOS7 was released. Let’s see how:

The analytics engine feeding the controller can identify and correlate iOS updates to a new spike — an anomaly — in the network utilization inside an enterprise. The SDN controller can install more specific flows that will steer all iOS update traffic on a different path in the network. This way the controller can automatically adjust the enterprise customer flows to either (i) provide an improved iOS update experience OR (ii) prevent other enterprise traffic to get affected with the iOS update tsunami. Advanced IP controllers (and those are being demo’ed to several service providers currently) can steer such traffic across multiple ASes as well.

We recently demo’ed a hierarchical SDN controller to a very big customer in Europe. The SDN controller was used to set up inter-domain IP/MPLS services and it used telemetry feeds to determine the realtime link utilization of the inter-domain links. We used that information to place the inter-domain IP services across multiple ASes — the new services were placed on the least utilized inter-domain link at that instance. The services could be moved around as the link utilization changed. This is very different from how its done today, where the BW utilization is reserved and services are placed based on the hard reservations. IMO, the concept of hard reservations will get obsoleted very soon. Why assume that a VPLS service on a link will take up 1Gbps, when the traffic that it “historically” sends never exceeds 100 Mbps?

The figure below shows the different sources feeding into a typical big data analytics cluster that feeds the output to the SDN controller.

Flow telemetry and network telemetry will help in monitoring the traffic flowing inside the service provider networks. We could use this to gain a deep understanding of what a network looks like during normal operations and how it looks like when an anomaly is present in the network.

If one understands the “normal”, the abnormal can become apparent. What comprises abnormal may vary from network to network and from attack to attack. It could include large traffic spikes from a single source in the network, higher-than-typical traffic “bursts” from several or many devices in the network, or traffic types detected that are not normally sent from a known device type. Once the abnormal has been identified, the attacks can be controlled and eliminated.

Network telemetry will also help in peering analytics to select the most cost-effective peering and transit connections based on current and historic traffic trends. Correlating this data with BGP feeds from route servers can help in visualizing how the traffic flows/shifts from one AS to the other.

Data collected from different sources is fed to a scalable publish/subscribe pipeline that feeds this to the big data analytics platform. Some of this can be fed to a real time streaming analytics platform for deriving rich real time insights from the network. This can then be fed to a machine learning cluster for predictive analytics.

The data is stored in a scalable data lake which can be optimized for complex, multi-dimensional queries that becomes the building block for the SDN controller to do something useful. This data can be coupled with the other data that is being learnt off different sources (syslog records, DNS and CDN logs, IX views, etc) and all this can be processed and transformed into actionable intelligence. For example, this can help service providers understand the amount of Facebook, Netflix, Youtube and Amazon Prime Video traffic thats flowing in their networks. It can help them construct a “heat map” of the most active sources and the sinks. Combine this with anonymized subscriber demographics, and the big data analytics framework can provide high fidelity insights into how the subscribers, applications and the network are correlated.

This level of insight cannot be derived by merely observing the telemetry feeds alone since it is not straightforward to correlate flows with specific applications, services and subscriber end points. The ability to mine data from a panoply of sources (as shown on the left side of the figure above — DNS servers, repositories that can identify servers and end points by owner, geo-location, type and purpose) and being able to correlate them is what differentiates the new age intelligent networks from the ones that exist today.

This level of sophistication can not be achieved without a solid big data analytics framework supporting the SDN controller. The limitless potential of what can be achieved will only unfold as more real deployments start happening in the next few years. We’re living in very interesting times, and I’m waiting with bated breath to see what the future holds and how the networking industry becomes “great again”!

Lets admit that most of us in the networking domain know as much about SD-WAN as an average 6th grader on sex — which is to say pretty much nothing. We take it as something much grander and exotic than what it really is and are obviously surrounded by friends and well-wishers who wink conspiratorially that they “know it all” and consider themselves on an intellectual high ground to educate us on matters of this rich and riveting biological social interaction. Like most others in that tender and impressionable age, i did get swayed by what i heard and its only later that i was able to sort things out in my head, till it all became somewhat clear.

The proverbial clock’s wound backwards and i experience that feeling of deja-vu each time i read an article on SD-WAN that either extols its virtues or vilifies it as something that has always existed and is being speciously served on a platter dressed up as something that it is not. And like the big boys then, there are men who-know-it-all, who have already written SD-WAN off as something that has always existed and really presents nothing new here. Clearly, i disagree with that view.

I presume, perhaps a trifle rashly, that you are already aware of basic concepts of SDN and NFV (and this) and hence wouldnt waste any more oxygen explaining those.

So what really is the SD-WAN technology and the precise problem that its trying to solve?

SD-WAN is a way of architecting, designing and deploying enterprise WANs using commodity Internet connections in a manner that makes those “magically” appear as a private “MPLS-like” connection. Its the claim that it can appear “MPLS-like” that really peeves the regular-big-mpls-vendors-and-consultants. I will delve into the “MPLS-like” aspect a little later, so please hold on to your sabers till then. What makes the “magic” work is the control plane that implements and enforces the network access policies (VOIP is high priority/low latency/low jitter, big data sync medium priority and all else low priority, no VOIP via Afghanistan, etc) and the data plane that weaves an L2/L3 overlay on top of the existing consumer-grade Internet links (broadband links and in a few cases the LTE/4G connections).

The SD-WAN evangelists want to wean enterprises off their dedicated prohibitively priced private WAN connections (read MPLS circuits) with commodity enterprise broadband links. Philosophically, adding a new branch should just mean shipping a CPE device (perhaps in a virtualized form-factor) that auto-magically dials into a central controller when brought to life. Once thats done and the credentials verified, the branch should just come online (viola!) and should be visible to all the geo-separated branches. Contrast this with the provisioning time (can go as high as a year in some remote locations) and the complexity it takes to get a remote branch online today with MPLS and you will understand why most IT folks have ulcers and are perennially on anti-anxiety/depressant medicines. And btw we’ve not even begun talking about the expenses and long term contracts with the MPLS connections here!

Typically SD-WAN solutions have a central SDN controller which is really a cluster of x86 devices (servers, VMs, containers, take your pick) and hence has computing and analytical horsepower much more than a dedicated HW network device. The controller has complete visibility right from the source all the way till the destination and can constantly analyze traffic and can carve out optimal network paths for applications and individual flows based on the user and application policies. In the first mile the Internet links are either coalesced to form a fatter pipe or are used separately as dictated by the customer policies. The customer traffic is continuously finger-printed and is routed dynamically based on the real time network conditions.

Where most people go wrong is when they believe that SD-WAN solutions lose control over the traffic once it leaves the customer premises or the SD-WAN edge node. Bear in mind that there is nothing in the SD-WAN technology that prevents further control over how the traffic is routed and this could perhaps be one aspect differentiating one SD-WAN offering from the other. Since SD-WAN is an overlay technology you will not have control over each physical hop, but you can surely do something more nuanced given the application and end-to-end network visibility that exists with the controller.

Its “MPLS-like” in the sense that you can, in most cases, guarantee the available bandwidth and network up time. The central controller can monitor each overlay circuit for loss/jitter/delay and can take corrective actions when routing traffic. Patently enterprise broadband connections in certain geographies dont come with the same level of reliability as MPLS and it behooves upon us to ask ourselves if we need that level of reliability (given the cost that we pay for such connections). An enterprise can always hedge its risks by commissioning a few backup enterprise broadband connections for those rainy days when the primary is out cold. Alternatively, enterprises can go in for a hybrid approach where they maintain a low bandwidth MPLS connection for their mission-critical traffic and use the SD-WAN solution for everything else OR can implement a policy to revert to the MPLS connection when the Internet connections are not working satisfactorily. This can also provide a plausible transition strategy to the enterprises who may not be comfortable switching to SD-WANs given that the technology is still relatively new.

And do note that even MPLS connections go down, so its really not fair to say that SD-WAN solutions stand on tenuous grounds with regard to the reliability. Yes i concede that there are SLAs given with MPLS that just dont exist with regular Internet pipes. However, one could argue that you can get some bit of extra reliability by throwing in an additional Internet link (with a different provider?) thats only there as a standby. Also note that with service providers now giving fiber connections, the size and the quality of Internet links is only going to improve with time. A large site for instance can aggregate a 1Gbps Google Fiber and a 1Gbps Verizon FIOS connection and can retain a small MPLS connection as the standby. If the enterprise discovers that its MPLS connection is underutilized it can negotiate on pricing or can go with lower MPLS pipe and thereby save on its costs.

I recently read a blog which argued that enterprise broadband promising 350Mbps would mostly give only around 320Mbps on an average. Sure this might be true in a few geographies, but seriously, who cares? Given the cost difference between a broadband connection and an MPLS circuit i will gladly assume that i only had a 300Mbps connection and derive utmost pleasure any time it gives me anything more than that!

The central controller in the SD-WAN technologies amongst other things (analyzing traffic, links) can also continually learn about the customer network conditions and can predict when link qualities will deteriorate and can preemptively reroute traffic before the links start acting up. Given that the controller is monitoring paths end-to-end and is also monitoring and analyzing the traffic emanating from the branch sites there are insights that enterprises can draw that they could have never imagined when using traditional WAN architectures since in that world all connections are really only “dumb pipes”. SD-WAN changes all that — it changes how the enterprise connections and the applications running there are viewed. The WAN architecture is aligned to the application service requirements and its management is greatly simplified. You can implement complex network policies and let the SD-WAN infrastructure sweat on your behalf (HINT: intent driven networking).

So watch out before you disdainfully write off SD-WAN as a technology thats merely replacing your dumb MPLS pipes with the regular Internet connections, since i argue, it can really do a lot more than that. Perhaps a topic worth discussing some other day.