Author Archive

A highly available website is one that is fault-tolerant and reliable. A best practice is to put your web servers behind a load balancer, not only to distribute load but also to reduce the risk of an end user reaching a failing web server. However, traditional load balancing funnels traffic into a single-tenant environment, which is itself a single point of failure. A better practice is to use a distributed load balancer that takes advantage of the features of the cloud and makes the load balancer itself more fault-tolerant. GoGrid’s Dynamic Load Balancer service is designed around a software-defined networking (SDN) architecture that turns the data center into one big load balancer.

GoGrid’s Dynamic Load Balancer offers many features, but one of its core features is high availability (HA). It is HA in two ways.

First, on the real server side, deploying multiple clones of your real servers is a standard load-balancing practice. That way, if one of your servers goes down, the load balancer will use the remaining servers in the pool to continue to serve up content. In addition, each GoGrid cloud server that you deploy as a web server (in the real server pool) is most likely on a different physical node. This setup provides additional protection in the case of hardware failure.
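As a rough illustration of the pooling behavior described above, here is a minimal Python sketch of a round-robin pool that skips servers marked as down. The `ServerPool` class and the server names are hypothetical and are not part of GoGrid's API; a real load balancer would also run health checks to decide when a server is down.

```python
class ServerPool:
    """Hypothetical round-robin pool that skips servers marked unhealthy."""

    def __init__(self, servers):
        self.servers = list(servers)
        self.healthy = set(self.servers)
        self._next = 0  # index of the next server to try

    def mark_down(self, server):
        # In practice a failed health check would trigger this.
        self.healthy.discard(server)

    def mark_up(self, server):
        if server in self.servers:
            self.healthy.add(server)

    def pick(self):
        # Try each server at most once, starting where the last pick stopped.
        for _ in range(len(self.servers)):
            server = self.servers[self._next]
            self._next = (self._next + 1) % len(self.servers)
            if server in self.healthy:
                return server
        raise RuntimeError("no healthy servers in pool")


pool = ServerPool(["web1", "web2", "web3"])
pool.mark_down("web2")  # simulate one failed web server
picks = [pool.pick() for _ in range(4)]
```

With `web2` marked down, requests keep rotating between `web1` and `web3`, so the content stays available.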

Second, on the Dynamic Load Balancer side, the load balancers are designed to be self-healing. In case of a hardware failure, Dynamic Load Balancing is designed to immediately recover to a functioning node. The Virtual IP address of the Dynamic Load Balancer (the VIP) is maintained as well as all the configurations, with all the changes happening on the back end. This approach ensures the Dynamic Load Balancer will continue to function with minimal interruption, preventing the Dynamic Load Balancer from being a single point of failure. Because the load balancer is the public-facing side of a web server, whenever it goes down the website goes down. Having a self-healing load balancer therefore makes the web application more resilient.
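The idea that the VIP and configuration survive a node failure can be sketched in a few lines of Python. This is only a conceptual model under assumed names (`VirtualLoadBalancer`, `node-a`, and so on are made up for illustration); it does not describe how GoGrid actually implements recovery.

```python
from dataclasses import dataclass, field


@dataclass
class VirtualLoadBalancer:
    """Hypothetical model: the VIP and config stay fixed, only the node changes."""

    vip: str
    config: dict
    active_node: str
    standby_nodes: list = field(default_factory=list)

    def fail_over(self):
        # Promote the next standby node; the VIP and config are untouched.
        if not self.standby_nodes:
            raise RuntimeError("no standby node available")
        failed = self.active_node
        self.active_node = self.standby_nodes.pop(0)
        return failed


lb = VirtualLoadBalancer(
    vip="203.0.113.10",  # documentation-range address, purely illustrative
    config={"algorithm": "round_robin", "port": 80},
    active_node="node-a",
    standby_nodes=["node-b", "node-c"],
)
failed_node = lb.fail_over()  # simulate a hardware failure on node-a
```

After the failover, clients still connect to the same VIP with the same settings; only the back-end node serving it has changed.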

Users with websites or applications that must always be available would benefit from including GoGrid’s Dynamic Load Balancing in their infrastructure. The load balancer is important for ensuring the public side of a service is always available; however, easily scalable cloud servers, the ability to store images of those servers in persistent storage, and the option to replicate infrastructure between data centers with CloudLink are also important elements of a successful HA setup.

GoGrid has recently released some new features that improve the customer experience on our private network. Private Network Automation (PNA) is currently available in all our data centers. As of this most recent release, these new features will be exposed if you enable PNA by contacting support:

All servers will have a private IP assigned upon creation (both virtual and dedicated)

Any private IPs that are used will be marked as assigned on the portal

The assignment of private IPs happens automatically at the time a new server is deployed. GoGrid has enabled this for all new customers. If you are an existing customer, this feature IS NOT enabled in data centers where you have servers deployed. You will need to file a support ticket to request this feature. Note that once enabled, this will be active for new servers only; existing servers will keep their existing settings.

As you can see from the screenshot below, once you create the server, you will have a public IP and a private IP assigned. Note that this feature is enabled for both virtual and dedicated servers.

This is also visible in the Networking tab so that you can monitor private IPs that have been assigned from your block.
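To make the assignment behavior concrete, here is a small Python sketch of handing out the next free address from a private block and tracking which addresses are marked as assigned. The `PrivateIPBlock` class and the `10.0.0.0/29` block are assumptions for illustration only; this is not GoGrid's implementation or API.

```python
import ipaddress


class PrivateIPBlock:
    """Hypothetical tracker for a customer's private IP block."""

    def __init__(self, cidr):
        self.network = ipaddress.ip_network(cidr)
        self.assigned = {}  # IPv4Address -> server name

    def assign(self, server_name):
        # Hand out the first usable host address that is not yet assigned.
        for ip in self.network.hosts():
            if ip not in self.assigned:
                self.assigned[ip] = server_name
                return ip
        raise RuntimeError("private block exhausted")

    def assigned_ips(self):
        # Roughly what a portal view of assigned addresses might list.
        return {str(ip): name for ip, name in self.assigned.items()}


block = PrivateIPBlock("10.0.0.0/29")
web_ip = block.assign("web1")  # assigned at server creation time
db_ip = block.assign("db1")
```

Each new server receives the next unused address, and `assigned_ips()` gives a view of what has been taken from the block.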

Basho is a GoGrid partner and is responsible for the open-source Riak project. If you are not familiar with Riak, it is a well-regarded open-source distributed database. It was built on the Dynamo concept, so it is often compared to Cassandra and Amazon DynamoDB.

Riak is used as a fast, fault-tolerant distributed database. Companies like Mozilla use it for storing and analyzing beta testing results. Mozilla needed a solution that would help improve the user experience and allow it to store large amounts of data very quickly. Another example is Bump, which uses Riak to scale and manage massive amounts of data sent between its millions of users. Riak is used to store elements of past user conversations so that communication history is readily accessible to users.

Basho Riak version 1.1.4 is now available as a GoGrid Community Server Image (CGSI). You can find it when you launch a virtual machine and search for “Riak”. This image is available in all our data centers. This CGSI contains the open-source version, so support is only available via the community site and the image does not have all the features present in the Enterprise version. However, you can use this image either to run a proof of concept (POC) of Riak to see if it will meet your needs or to run a small cluster. These will run on GoGrid’s high-performance VMs, which have been shown to have significant performance advantages over other cloud implementations.

I recently attended Under the Radar 2012, as GoGrid was a sponsor of this event. As there were several tracks, Michael Sheehan and I split them, and I covered Infrastructure, Database Scalability, and Big Data. Michael covered Mobile Access, Infrastructure, Performance Monitoring, and PaaS in Part 1. Overall, the presenting companies have some compelling ideas, and they give an indication of the new thinking happening in Silicon Valley. The trends that I noticed were a continued interest in private clouds, increasing adoption of OpenStack, and the prevalence of Big Data integration.

If you have never attended Under the Radar, the format is that four startups, each of which already has a real product, present for 6 minutes and are then judged by a panel of experienced executives from more established companies. The presenters had to be actual startups with a unique value proposition and a real product that they are able to monetize. Alumni or companies that are already more established can also present as “Grad Circle” members, but they are not included in the awards presented at the end of the show. And like American Idol, the audience also votes on its favorites for each category. I included the Judges’ choice and Audience choice for each category but also added my own choice, which reflects my own opinion and not that of GoGrid.

Infrastructure

This category focused on companies that are delivering infrastructure or infrastructure management products. So this would include services that could offer up infrastructure components (like compute, network, and storage) or even tools for managing configurations and deployments. Not surprisingly, nearly all of them focus on the cloud as the operating model of choice.

Cloudscaling – This company focuses on delivering an Amazon-like cloud using OpenStack. Their solution comprises Open Cloud OS, which is a production-grade version of OpenStack; Cloudblocks, a comprehensive architecture for cloud services; and Hardware Blueprints, which are templates for physical hardware. Customers can leverage this solution to deploy a public or private cloud in their own DC.

But What is Big Data?

The problem with using the term “Big Data” is that it’s used in a lot of different ways. One definition is that Big Data is any data set that is too large for on-hand data management tools. According to Martin Wattenberg, a scientist at IBM, “The real yardstick … is how it [Big Data] compares with a natural human limit, like the sum total of all the words that you’ll hear in your lifetime.” Collecting that data is a solvable problem, but making sense of it, particularly in real time, is the challenge that technology tries to solve. This new type of technology is often listed under the title of “NoSQL” and includes distributed databases that are a departure from relational databases like Oracle and MySQL. These are systems that are specifically designed to parallelize compute, distribute data, and create fault tolerance on a large cluster of servers. Some examples of NoSQL projects and software are Hadoop, Cassandra, MongoDB, Riak, and Membase.
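One common way such systems distribute data and tolerate node failures is consistent hashing with replication, the approach popularized by Dynamo-style databases. The Python sketch below is a simplified illustration of that general technique, not the actual implementation of any product named above; the `HashRing` class, node names, and parameter defaults are assumptions.

```python
import bisect
import hashlib


def _hash(key: str) -> int:
    # MD5 is used here only to spread keys evenly, not for security.
    return int(hashlib.md5(key.encode("utf-8")).hexdigest(), 16)


class HashRing:
    """Simplified consistent-hash ring with virtual nodes."""

    def __init__(self, nodes, vnodes=64):
        # Each physical node appears at `vnodes` points on the ring.
        self._ring = sorted(
            (_hash(f"{node}#{i}"), node) for node in nodes for i in range(vnodes)
        )
        self._points = [point for point, _ in self._ring]

    def nodes_for(self, key, replicas=3):
        """Return up to `replicas` distinct nodes, walking clockwise from the key."""
        start = bisect.bisect(self._points, _hash(key))
        owners = []
        for offset in range(len(self._ring)):
            node = self._ring[(start + offset) % len(self._ring)][1]
            if node not in owners:
                owners.append(node)
                if len(owners) == replicas:
                    break
        return owners


ring = HashRing(["node1", "node2", "node3", "node4"])
owners = ring.nodes_for("user:42")  # three distinct nodes hold copies of this key
```

Because every key is stored on several distinct nodes, losing one server does not lose the data, and adding servers spreads the keys over a larger cluster.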

The techniques vary, but there is a definite distinction between SQL relational databases and their NoSQL brethren. Most notably, NoSQL systems share the following characteristics:

Do not use SQL as their primary query language

May not require fixed table schemas

May not give full ACID guarantees (Atomicity, Consistency, Isolation, Durability)

Scale horizontally

Because of the lack of full ACID guarantees, NoSQL is used when performance and real-time results are more important than consistency. For example, if a company wants to update its website in real time based on an analysis of a particular user’s interactions with the site, it will most likely turn to NoSQL to solve this use case.
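The trade-off between speed and consistency can be shown with a toy Python model in which a write is acknowledged after reaching one replica and copied to the others later. The `Cluster` and `Replica` classes are hypothetical and far simpler than any real NoSQL system; they only illustrate why a read may briefly return stale data.

```python
class Replica:
    """One copy of the data; keeps the highest version seen per key."""

    def __init__(self):
        self.data = {}  # key -> (version, value)

    def put(self, key, version, value):
        current = self.data.get(key)
        if current is None or version > current[0]:
            self.data[key] = (version, value)

    def get(self, key):
        return self.data.get(key)


class Cluster:
    """Toy cluster: fast writes to one replica, delayed copies to the rest."""

    def __init__(self, n=3):
        self.replicas = [Replica() for _ in range(n)]
        self.pending = []  # writes not yet copied to the other replicas
        self.version = 0

    def write(self, key, value):
        # Acknowledge after a single replica has the write (fast, not consistent).
        self.version += 1
        self.replicas[0].put(key, self.version, value)
        for replica in self.replicas[1:]:
            self.pending.append((replica, key, self.version, value))

    def read(self, key, replica_index):
        entry = self.replicas[replica_index].get(key)
        return None if entry is None else entry[1]

    def sync(self):
        # Background replication catches the other replicas up.
        for replica, key, version, value in self.pending:
            replica.put(key, version, value)
        self.pending.clear()


cluster = Cluster()
cluster.write("banner", "spring-sale")
fresh = cluster.read("banner", 0)         # replica that took the write
stale = cluster.read("banner", 2)         # replica that has not caught up yet
cluster.sync()
after_sync = cluster.read("banner", 2)    # now consistent
```

The write returns quickly, but until `sync()` runs a reader on another replica sees no value; that window of inconsistency is the price paid for speed.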

However, this does not mean that relational databases are going away. In fact, it is likely that in larger implementations, NoSQL and SQL will function together. Just as NoSQL was designed to solve a particular use case, relational databases solve theirs. Relational databases excel at organizing structured data and are the standard for serving up ad-hoc analytics and business intelligence reporting. In fact, Apache Hadoop even has a separate project called Sqoop that is designed to link Hadoop with structured data stores. Most likely, those who implement NoSQL will maintain their relational databases for legacy systems and for reporting off of their NoSQL clusters.