Category Archives: DNS

Software, cloud computing and IOT are rapidly transforming networks in a way, and at a rate, never seen before. With software-as-a-service (SaaS) models, enterprises are moving more and more of their critical applications and data to public and hybrid clouds. Enterprise traffic, that never left the corporate network, is now shifting to the Internet, reaching out to different data centers across the globe. Streaming Video (Netflix, Youtube, Hulu, Amazon) accounts for an absurdly high percentage of traffic in the Internet and content providers have built out vast content distribution networks (CDNs) that overlay the Internet backbone. Higher resolutions (HD and UHD) will increase the traffic further, and by some accounts, will be over 80% of the total network traffic by 2020. More and more businesses are being created that reach their customers exclusively over the Internet (Spotify, Amazon, Safari, Zomato, etc). Real-time voice and video communications are moving to cloud-based delivery and network operators are challenged to deliver these services without impacting user quality of experience. And if this was’nt enough, with the advances being made in IOT, we have more devices than ever, lively communicating and chatting in real time over the Internet.

Security becomes a prime concern as more business critical applications migrate to the cloud. The number of DDOS attacks are only increasing and IOT devices can be compromised by hackers to launch some very lively and innovative attacks. A large scale cyber attack in 2016 used a botnet consisting of a multitude of IoT devices such as printers, camera, web cams, residential network gateways, and even baby monitors causing a major outage that brought down a big chunk of the Internet.

All this traffic goes over service provider networks that were built and designed using devices, protocols and management software from the Jurassic age. The spectacular growth and variability of traffic that is experienced today was not anticipated when these networks were built. There is a dire need to cope with changing traffic patterns and to optimize the use of available network resources at all levels (IP, MPLS and Optical) — we’ll talk about the multi-layer SDN controller that optimizes the IP-Optical layers some other time.

Given these challenges, its imperative that service providers work towards gaining realtime visibility into the network behavior and extract actionable insights needed to react immediately to network anomalies, changing traffic patterns and security threats and alarms.

And this is where big data analytics, like a knight in the shining armour, comes in.

Given the data rates that we are dealing with, and the rate at which traffic volumes and speeds are growing, deep packet inspection at line rate gets ruled out in most parts of the network. There is only so much that one can do with hardware’s brute force approach. Additionally, with most traffic being encrypted, DPI offers limited — no, zero — insight into whats happening in the network.

What can help at the scale that networks run today is streaming telemetry combined with big data analytics. Instead of constantly polling the devices in the network and then reacting to what is learnt, the new age mantra is for these devices to periodically push the relevant statistics to the data collectors, which can analyse this data and act based on that. One can argue that streaming network telemetry may not even require an IETF standard in order to be useful. A standard format like JSON could be used, and it’s up to the collector to parse and interpret the incoming barrage of data. This allows network operators to quickly write dev-ops tools that they can use to closely monitor their network and services. This opens up room for hyper innovation where new-age startups can quickly come up with products that can smartly mine the data from the network and draw rich insights into whats happening that can help the service providers in running their networks smarter and hotter.

Big data analytics entails ingesting, processing and storing exabytes worth of network data over a period of time that can be analysed later for actionable insights. With advances made in streaming analytics, this analysis can also happen in real time as the data comes piping hot from the network. New age scalable stream processors make it possible to fuse data streams to answer more sophisticated queries about the network in real-time.

By correlating data from sources beyond traditional routing and networking equipment (IX router-server views, DNS and CDN logs, firewall logs, billing and call detail records) it is possible for the analytics engine to identify patterns or behaviors that can not be identified by merely sifting through the device logs (collected traditionally using SNMP, syslogs, netflow, sflow, IPFIX, etc). The ability to correlate telemetry data from the network with applications such as Netflix or Youtube or SaaS applications such as an iOS upgrade can provide insights that can never be found with traditional traffic engineering approaches.

I claim that we now have the smarts to avoid the famous iOS7 meltdown that happened when iOS7 was released. Let’s see how:

The analytics engine feeding the controller can identify and correlate iOS updates to a new spike — an anomaly — in the network utilization inside an enterprise. The SDN controller can install more specific flows that will steer all iOS update traffic on a different path in the network. This way the controller can automatically adjust the enterprise customer flows to either (i) provide an improved iOS update experience OR (ii) prevent other enterprise traffic to get affected with the iOS update tsunami. Advanced IP controllers (and those are being demo’ed to several service providers currently) can steer such traffic across multiple ASes as well.

We recently demo’ed a hierarchical SDN controller to a very big customer in Europe. The SDN controller was used to set up inter-domain IP/MPLS services and it used telemetry feeds to determine the realtime link utilization of the inter-domain links. We used that information to place the inter-domain IP services across multiple ASes — the new services were placed on the least utilized inter-domain link at that instance. The services could be moved around as the link utilization changed. This is very different from how its done today, where the BW utilization is reserved and services are placed based on the hard reservations. IMO, the concept of hard reservations will get obsoleted very soon. Why assume that a VPLS service on a link will take up 1Gbps, when the traffic that it “historically” sends never exceeds 100 Mbps?

The figure below shows the different sources feeding into a typical big data analytics cluster that feeds the output to the SDN controller.

Flow telemetry and network telemetry will help in monitoring the traffic flowing inside the service provider networks. We could use this to gain a deep understanding of what a network looks like during normal operations and how it looks like when an anomaly is present in the network.

If one understands the “normal”, the abnormal can become apparent. What comprises abnormal may vary from network to network and from attack to attack. It could include large traffic spikes from a single source in the network, higher-than-typical traffic “bursts” from several or many devices in the network, or traffic types detected that are not normally sent from a known device type. Once the abnormal has been identified, the attacks can be controlled and eliminated.

Network telemetry will also help in peering analytics to select the most cost-effective peering and transit connections based on current and historic traffic trends. Correlating this data with BGP feeds from route servers can help in visualizing how the traffic flows/shifts from one AS to the other.

Data collected from different sources is fed to a scalable publish/subscribe pipeline that feeds this to the big data analytics platform. Some of this can be fed to a real time streaming analytics platform for deriving rich real time insights from the network. This can then be fed to a machine learning cluster for predictive analytics.

The data is stored in a scalable data lake which can be optimized for complex, multi-dimensional queries that becomes the building block for the SDN controller to do something useful. This data can be coupled with the other data that is being learnt off different sources (syslog records, DNS and CDN logs, IX views, etc) and all this can be processed and transformed into actionable intelligence. For example, this can help service providers understand the amount of Facebook, Netflix, Youtube and Amazon Prime Video traffic thats flowing in their networks. It can help them construct a “heat map” of the most active sources and the sinks. Combine this with anonymized subscriber demographics, and the big data analytics framework can provide high fidelity insights into how the subscribers, applications and the network are correlated.

This level of insight cannot be derived by merely observing the telemetry feeds alone since it is not straightforward to correlate flows with specific applications, services and subscriber end points. The ability to mine data from a panoply of sources (as shown on the left side of the figure above — DNS servers, repositories that can identify servers and end points by owner, geo-location, type and purpose) and being able to correlate them is what differentiates the new age intelligent networks from the ones that exist today.

This level of sophistication can not be achieved without a solid big data analytics framework supporting the SDN controller. The limitless potential of what can be achieved will only unfold as more real deployments start happening in the next few years. We’re living in very interesting times, and I’m waiting with bated breath to see what the future holds and how the networking industry becomes “great again”!

If you were unable to access LinkedIn for almost the entire day earlier this week, then you can take solace in the fact that you were not the only one, not able to. Almost half the world shared your misery where all attempts to access LinkedIn (and several other websites) went awry. This purportedly happened because a bunch of hackers decided to poison the DNS entries for LinkedIn and some other well known websites (fidelity.com being another).

Before we delve into the sordid details of this particular incident lets quickly take a look at how DNS works.

Whenever we access linkedin.com, our computer must resolve this human-readable address “linkedin.com” into a computer-readable IP address like “216.52.242.86″ thats hosting this website. It does this by requesting a DNS server to return an IP address that can be used. The DNS server responds with one or more IP addresses with which you can reach linkedin.com. Your computer then connects to that IP address.

So where is this DNS server located that i just spoke about?

This DNS server lies with your Internet service provider, which caches information from other DNS servers. The router that we have at home also functions as a DNS server, which caches information from the ISP’s DNS servers — this is done so that we dont have to perform a DNS lookup each time we have to access a website for which we have already resolved the IP address.

Now that we know the basics, lets see what DNS poisoning is?

A DNS cache is said to be poisoned if it contains an invalid entry. For example, if an attack “somehow” gains control of a DNS server and changes some of the information on it — it could for instance say that citibank.com actually points to an IP address the attacker owns — that DNS server whenever requested to resolve citibank.com would tell its users to look for citibank.com at the wrong address. The attacker’s address could potentially contain some sort of malicious phishing website, which could resemble the original citibank.com or could simply be used to drop all traffic. The latter is done when ISPs want to block all access to a particular website. China typically does it for lot of websites — its called the Great Firewall of China. There are multiple techniques which China employs to implement their censorship and one of them is DNS poisoning (more here).

DNS poisoning spreads like wild fire because of how it works. Clearly Internet service providers cannot hold information about all websites in their DNS caches – they get their DNS information from other DNS servers. Now assume, that they are getting their DNS information from a compromised server. The poisoned DNS entry or entries will spread to the Internet service providers and get cached there. It will then spread to other ISPs that get information from this DNS server. And it wouldnt stop at this, it would spread to routers at campuses, homes and the DNS caches on individual user computers. So everybody who requests for the DNS resolution of the hijacked website will receive an incorrect response and will forward traffic to the address specified by the attacker.

LinkedIn.com and a number of other organizations have registered their domain names with Network Solutions. For some inexplicable reason their DNS nameservers were replaced with nameservers at ztomy.com. The nameservers at ztomy.com were configured to reply to DNS requests for the affected domains with IP addresses in the range 204.11.56.0/24. This address range is belongs to confluence networks, so all traffic bound to LinkedIn was re-routed to a networks hosted by confluence networks.

But what caused the name servers to be replaced?

According to Network Solutions (NS), they were hit by a distributed denial-of-service (DDOS) attack on night of 19/06. This is certainly is plausible since Network Solutions, being the original registrar for .com, .net, and .org domain names, is certainly an attractive target for attackers. Most of you would remember the (in)famous August 2009 NS server breach which allegedly led to the exposure of names, addresses, and credit card numbers of more than 500,000 people who made purchases on web sites hosted by the NS.

A spokesperson from Network Solutions had the following to say regarding the DNS poisoning issue:

“In the process of resolving a Distributed Denial of Service (DDoS) incident on Wednesday night, the websites of a small number of Network Solutions customers were inadvertently affected for up to several hours.”

They have reassured customers that no confidential data has been compromised as a result of the incident.

The jury meanwhile, is still out on whether this was a configuration error or a coordinated DNS attack on Network Solutions.

Regardless of what it was, the fact is that enormous amount of LinkedIn traffic was redirected to some other network. This is should make all of us very nervous since LinkedIn does not use Secure Socket Layer (SSL), which means that all communication between you and LinkedIn goes in plaintext — leaving you vulnerable to eavesdropping and man-in-the-middle attacks. If an attacker is able to intercept all data being sent between a browser and a web server they can see and use that information. In this event all traffic bound to LinkedIn was diverted to IP addresses owned by Confluence Networks.

Not only is this a clear violation of their user’s privacy (which is a different discussion btw) but is also extremely dangerous if this data transfer is not being done securely, as this would leave LinkedIn users very vulnerable to eavesdropping attacks.

So when the DNS entry for LinkedIn was poisoned we know that all our confidential information was diverted to unknown servers that can mine that data in whatever manner they find most amusing. I just hope that you didnt have any confidential data plugged into your LinkedIn iOS app, as somebody somewhere may just be reading all that as you read this blog post.