Growing And Scaling With Big Data On The Hybrid Cloud

This is a guest post written and contributed by Mike Prince, Founder at Corporation Wiki[1], a Rackspace Hybrid Cloud[2] customer that archives historical corporate data to become the go-to spot for information on corporations and executives.

At Corporation Wiki, we store lots of data – it’s the nature of our business. We archive historical corporate data and have become a go-to resource for people to discover and understand connections between corporations and executives. We maintain a directory of more than 90 million corporate officers and executives to map and connect these companies and individuals. We serve more than 300 million page views per month to visitors and search engine web crawlers. To get a handle on all of this data, we rely heavily on a hybrid cloud environment – using RackConnect[3] to bridge our dedicated Rackspace servers to the Rackspace Cloud[4]. This helps us serve our visitors quickly and at scale with no hiccups.

When we started, we were on a single dedicated server maintained and managed by Rackspace. As our traffic grew, we added another server to cluster the Microsoft SQL Server database, moved our front-end web servers to the cloud and scaled them using a dedicated F5 BIG-IP load balancer in front of the cloud servers. This let us run our database on a high-capacity, highly available server cluster while scaling the front-end servers through a high-performance dedicated load balancer.

Our hybrid cloud environment is key to helping us process big data in a timely manner. We process data from sources all over the country, much of it from public records. A hybrid cloud and the ability to quickly spin up dozens of cloud servers help us overcome these big data processing challenges in hours instead of weeks. To clean, standardize and unify data in a reasonable amount of time, we break large jobs into smaller jobs and run them across dozens of cloud servers. These low-cost instances are spun up as needed, and the import and processing jobs complete 100 times faster than when we used dedicated hardware alone.
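The split-and-fan-out pattern described above can be sketched in a few lines of Python. This is only an illustration, not our production pipeline: a local thread pool stands in for the cloud servers, and the function names, the chunk size and the cleaning rule (trim, collapse whitespace, uppercase) are all assumptions made for the example.

```python
from concurrent.futures import ThreadPoolExecutor


def split_into_chunks(records, chunk_size):
    """Break one large job into smaller jobs of at most chunk_size records."""
    return [records[i:i + chunk_size] for i in range(0, len(records), chunk_size)]


def clean_record(name):
    """Hypothetical standardization rule: trim, collapse whitespace, uppercase."""
    return " ".join(name.split()).upper()


def process_chunk(chunk):
    """Work done by one worker (standing in for one cloud server)."""
    return {clean_record(name) for name in chunk}


def run_job(records, chunk_size=2, workers=4):
    """Fan the chunks out to workers, then unify the partial results."""
    chunks = split_into_chunks(records, chunk_size)
    with ThreadPoolExecutor(max_workers=workers) as pool:
        partial_results = list(pool.map(process_chunk, chunks))
    unified = set()
    for partial in partial_results:
        unified |= partial  # merging the sets removes duplicates across chunks
    return sorted(unified)


if __name__ == "__main__":
    raw = ["Acme  Corp", "acme corp ", "Globex Inc", " globex  inc", "Initech LLC"]
    print(run_job(raw))
```

Because each chunk is processed independently, adding workers shortens the run without changing the result. In a real deployment each chunk would be handed to a separate cloud server rather than a local thread.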

The ability to manage and connect our dedicated servers to demand-based cloud instances is unlike anything else out there.

With a hybrid cloud, we have the flexibility to grow and scale. And with RackConnect as the cornerstone, we’re able to build a hybrid cloud solution that arms us with the ability to keep growing. It’s an optimal compute environment that gives us new freedom and unifies our original managed dedicated server, clustered Microsoft SQL Servers, Windows cloud servers and a load balancer.

And we’ll continue adding new data sets as we build our historical archive. Within our hybrid cloud deployment we also use Rackspace Cloud Files[5], which gives us a scalable data store for images. Its nearly limitless storage capacity and scalable performance are much more cost-effective for us than trying to handle this much data on dedicated hardware.

We wouldn’t be where we are today without our Rackspace-powered hybrid cloud infrastructure and Rackspace’s help in scaling, supporting and maintaining our high-performance environment. And because of it, we can focus on building our business and the programming required to bring such a large site online while we continue to grow and scale into the future.

To read more about how Corporation Wiki leverages the Rackspace Hybrid Cloud, check out this case study[6].

Endnotes:

[1] Corporation Wiki: http://www.corporationwiki.com/

[2] Rackspace Hybrid Cloud: http://www.rackspace.com/cloud/hybrid/

[3] RackConnect: http://www.rackspace.com/cloud/hybrid/rackconnect/

[4] Rackspace Cloud: http://www.rackspace.com/cloud/

[5] Rackspace Cloud Files: http://www.rackspace.com/cloud/files/

[6] Case study: http://www.rackspace.com/knowledge_center/case-study/hybrid-hosting-powers-big-data-processing-for-corporation-wiki