Technical Article

F5 cloud uptime

Cloud computing has been hailed as the future of IT in the enterprise. It is cheaper, quicker and more flexible and agile than an on-premises IT infrastructure. Instead of bearing the cost (in time and money) of installing your own IT kit in your own data centre or corner of the office, you can let someone else worry about that. You just use your browser to connect to whichever service you need at the time, and you pay only for exactly what you use.

But while the benefits may be clear, there are also negatives to contend with, particularly in terms of loss of control, which becomes most evident when things go wrong. A string of outages over recent years has shown how fallible cloud computing architectures can be.

Cloud providers such as Rackspace, Microsoft and Amazon have all suffered downtime due to a variety of reasons, including failed software upgrades, lightning strikes, unspecified “network errors” and even bugs caused by the leap year.

When cloud services go down, websites and online services of all shapes and sizes are affected. The UK government’s CloudStore, Reddit, location-based social network Foursquare, streaming service Netflix and question and answer service Quora were all knocked offline when their respective cloud providers suffered outages, and there was very little they could do about it.

Many cloud providers offer five nines availability (99.999% uptime, or barely five minutes of downtime a year) and offer compensation for any unscheduled disruption to services. But is that enough? Businesses of all sizes rely on cloud computing to operate: not just their websites, but also the software they use, email and other forms of communication, productivity suites, HR tools and much more are cloud-based. If that goes down, so does the business.

But why? Why is it that these problems result in services collapsing completely, often for hours or days on end? One of the fundamentals of building mission-critical systems is failover, automatically switching to a redundant system to keep operations running smoothly. Usually this happens without the end-user noticing at all.
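The failover idea described above can be sketched in a few lines. This is an illustrative active/passive pair with a health check; all the names here are made up for the example, not any vendor's actual API.

```python
# Hypothetical active/passive failover pair. In a real deployment the
# health check would probe the node (TCP connect, HTTP GET, heartbeat);
# here each node simply reports its own state.
class FailoverPair:
    def __init__(self, primary, secondary):
        self.primary = primary
        self.secondary = secondary
        self.active = primary

    def health_check(self, node):
        return node.get("healthy", False)

    def tick(self):
        # If the active node fails its health check, promote the peer.
        if not self.health_check(self.active):
            self.active = (self.secondary if self.active is self.primary
                           else self.primary)
        return self.active["name"]

pair = FailoverPair({"name": "dc1", "healthy": True},
                    {"name": "dc2", "healthy": True})
print(pair.tick())              # dc1 stays active
pair.primary["healthy"] = False
print(pair.tick())              # traffic fails over to dc2
```

The end-user never sees the switch: requests simply keep being answered, now by the secondary system.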

With cloud vendors being understandably secretive about how their systems are set up, it is difficult to know why some system failures have been so catastrophic.

But what these incidents show is that perhaps cloud computing is not ready for mission-critical operations, and that at least for now traditional architectures may be more suitable.

As my colleague Lori MacVittie points out (https://devcentral.f5.com/blogs/us/the-limits-of-cloud-gratuitous-arp-and-failover), cloud environments don’t take kindly to something called gratuitous ARP requests, which are the basis for load balancing failover (you can read Lori’s blog for a more technical breakdown of gratuitous ARPs and how they work).

This is because failover of this kind works by sharing an IP address between two devices: if the secondary detects an issue with the primary, it takes over the address and broadcasts a gratuitous ARP to announce the change. However, in cloud environments the network is often shared between many tenants, meaning gratuitous ARP requests could complicate matters.
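To make the mechanism concrete: a gratuitous ARP is simply an ARP message in which the sender and target IP are the same, announcing "this IP now lives at this MAC address". The sketch below builds the 28-byte ARP payload by hand (field values follow the standard ARP layout; the IP and MAC are invented for illustration).

```python
import socket
import struct

def gratuitous_arp(ip: str, mac: bytes) -> bytes:
    """Build the 28-byte payload of a gratuitous ARP reply: the sender
    announces that `ip` is now reachable at `mac`. Sender and target IP
    are deliberately identical -- that is what makes it gratuitous."""
    return struct.pack(
        "!HHBBH6s4s6s4s",
        1,                                   # hardware type: Ethernet
        0x0800,                              # protocol type: IPv4
        6, 4,                                # hardware/protocol address lengths
        2,                                   # opcode: 2 = ARP reply
        mac, socket.inet_aton(ip),           # sender MAC + IP (the new owner)
        b"\xff" * 6, socket.inet_aton(ip),   # target: broadcast MAC, same IP
    )

pkt = gratuitous_arp("10.0.0.1", b"\x02\x00\x00\x00\x00\x01")
assert len(pkt) == 28
assert pkt[14:18] == pkt[24:28]  # sender IP == target IP: gratuitous
```

On a dedicated network, broadcasting this frame makes every switch and host redirect traffic for the shared IP to the standby device almost instantly. On a shared, virtualised cloud network, such broadcast announcements may not propagate at all, which is why this style of failover runs into trouble.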

As a result, cloud computing environments are not as good at handling failover as traditional networking architectures, so a problem with one element has the potential to disrupt the entire operation.

One potential solution is cloud balancing. In essence, this automatically selects the best location from which to serve an application, whether that is the primary or secondary data centre or an on-premises location. It does this by weighing up a number of different factors, such as the physical location of the customer, the current capacity of the data centre or server being used, and the application response time.

It can also use more cloud-related criteria, such as where the application is hosted and any compliance or regulatory issues that raises, the cost of using certain locations, and any stipulations made in the customer’s contract.
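The site-selection logic described above can be sketched as a simple scoring function. The weights, field names and sites below are invented for illustration; they are not an F5 algorithm, just one way the listed criteria could be combined.

```python
# Illustrative cloud-balancing site selection: score each candidate
# site on proximity, spare capacity, response time, cost and
# compliance, then serve the application from the best one.
def pick_site(sites, client_region):
    def score(site):
        if not site["compliant"]:          # regulatory issues: hard veto
            return float("-inf")
        s = 0.0
        s += 3.0 if site["region"] == client_region else 0.0  # proximity
        s += 2.0 * (1.0 - site["load"])    # spare capacity (load in 0..1)
        s -= site["response_ms"] / 100.0   # penalise slow responses
        s -= site["cost_per_hour"]         # prefer cheaper locations
        return s
    return max(sites, key=score)["name"]

sites = [
    {"name": "primary-dc", "region": "eu", "load": 0.9,
     "response_ms": 40, "cost_per_hour": 1.0, "compliant": True},
    {"name": "cloud-east", "region": "us", "load": 0.2,
     "response_ms": 80, "cost_per_hour": 0.5, "compliant": True},
    {"name": "cloud-south", "region": "eu", "load": 0.3,
     "response_ms": 60, "cost_per_hour": 0.6, "compliant": False},
]
print(pick_site(sites, "eu"))  # prints "primary-dc"
```

Note how the non-compliant site is excluded outright, while the remaining criteria trade off against each other: a nearby but heavily loaded data centre can still beat a distant, idle one.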

This results in a more reliable cloud computing environment, one that both providers and customers can embrace with confidence. It provides a cost-effective way of ensuring mission-critical applications are available when you need them, and that all necessary legal and regulatory requirements are met.

For cloud balancing to be a success, the balancer must have full visibility into the network and the applications running across it. One way to achieve this is with the F5 BIG-IP Global Traffic Manager, an application delivery controller (ADC) that provides load balancing across data centre and cloud platforms.

There are undoubtedly benefits to cloud computing, but it is also true that there are still issues to address before mission-critical workloads can be entrusted to it. Load balancing and failover are one such problem, but there are certainly solutions out there to combat those issues and help cloud computing live up to the hype.