How Amazon saved Zynga’s butt—and why Zynga built a cloud of its own

Zynga's cloud runs just like Amazon's, but it's all in-house.

Five years ago, the social gaming company Zynga was cruising along with a fairly standard IT infrastructure. Servers were racked and stacked in a retail data center where Zynga rented space. Customer demand for games like Zynga Poker, launched in 2007, was being met.

Then along came FarmVille. After the game's 2009 release, 10 million users were hitting FarmVille servers within six weeks, and 25 million within five months.

"We couldn’t get power fast enough. We couldn’t get servers fast enough. We just couldn’t scale our infrastructure to match the needs of FarmVille," said Allan Leinwand, Zynga’s CTO of infrastructure.

Zynga upended its whole IT model, shifting most of its infrastructure to the Amazon Elastic Compute Cloud, which lets businesses buy virtual servers and storage, scaling capacity up and down as needed. "They clearly saved us. They clearly were helping us scale throughout 2010 and 2009. They did an amazing job," Leinwand said. But eventually, Zynga "realized we could actually do it on our own and we could scale it in a way that worked better for our business."

The private cloud Zynga would build is in some ways better and more efficient because it's entirely under Zynga’s control. Moving nearly all of its own servers away from retail, co-location data centers, Zynga started building its own data centers on both US coasts. By early 2011, about 20 percent of Zynga game users at any given time were logged onto servers in Zynga’s own data centers, while the other 80 percent were playing in the Amazon cloud.

By the end of 2011, those proportions had flipped. Now, 80 percent of Zynga usage is accommodated by Zynga-built data centers, and the other 20 percent is fueled by Amazon. Zynga doesn’t plan to ditch Amazon entirely, as it’s still a great "shock absorber" for sudden increases in interest, like when Alec Baldwin was kicked off a plane last year for refusing to stop playing Words With Friends.

Building a better Amazon

But in key respects, Amazon’s cloud can’t match what Zynga can achieve on its own, because unlike Amazon, Zynga doesn't have to meet the needs of thousands of businesses. It just has to build what's best for Zynga. Leinwand discussed the basics of Zynga’s infrastructure in a keynote address Tuesday at the Interop Las Vegas conference, and then met with Ars to describe the project in a bit more detail. Lovingly called "zCloud," Zynga’s internal infrastructure is managed by CloudStack software and was designed with the company RightScale, which specializes in managing cloud deployments on Amazon and in private data centers.

One result is that both Zynga’s internal infrastructure and the resources deployed on Amazon can be managed from a single console.

"When you go to provision virtual machines on Zynga infrastructure there's a little drag-down box that says ‘is this going to be Amazon or is this going to be zCloud?’ and then you hit the apply button," Leinwand said.

Leinwand said Zynga doesn’t reveal exactly what types of servers it uses, or even how many data centers it has built—except that it runs "multiple" facilities on both the East and West coasts. But generally speaking, not being tied to the instance types offered by Amazon lets Zynga customize its hardware and software to meet the specific needs of FarmVille, Words With Friends, and all of its other games.

Zynga’s infrastructure is big. Leinwand reports having 24.5 trillion rows of data in Zynga’s database system, for 1.4 petabytes in total. Zynga has to push massive amounts of data in and out of memory as players make changes to their in-game worlds. Load balancers distribute the traffic across Web servers, while data moves from servers to an in-memory cache and on to a high-capacity, highly available system for longer-term storage. In addition to serving millions of casual gamers, Zynga also provides a platform to help third-party developers build social games.
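
The data path described here (Web server, then in-memory cache, then longer-term storage) can be sketched roughly as follows. This is only an illustration, not Zynga's design: the class name `TieredStore`, the dictionaries standing in for the cache and the database, the write-through policy, and the least-recently-used eviction are all assumptions.

```python
class TieredStore:
    """Toy two-tier store: a bounded in-memory cache in front of a durable store.

    Hypothetical sketch; both tiers are plain dicts for illustration only.
    """

    def __init__(self, cache_capacity):
        if cache_capacity < 1:
            raise ValueError("cache_capacity must be at least 1")
        self.cache_capacity = cache_capacity
        self.cache = {}         # stands in for the in-memory cache tier
        self.durable = {}       # stands in for the longer-term storage system
        self.durable_reads = 0  # counts reads that missed the cache

    def _remember(self, key, value):
        # Re-insert the key so it becomes the most recently used entry
        # (Python dicts preserve insertion order).
        self.cache.pop(key, None)
        self.cache[key] = value
        if len(self.cache) > self.cache_capacity:
            oldest = next(iter(self.cache))  # least recently used key
            del self.cache[oldest]

    def write(self, key, value):
        # Write-through (an assumed policy): persist first, then cache.
        self.durable[key] = value
        self._remember(key, value)

    def read(self, key):
        if key in self.cache:
            value = self.cache[key]
            self._remember(key, value)
            return value
        # Cache miss: fall through to the durable tier and repopulate the cache.
        self.durable_reads += 1
        value = self.durable[key]
        self._remember(key, value)
        return value
```

With a capacity of two, writing three keys evicts the first from the cache, so a later read of that key falls through to the durable tier while recently touched keys are served from memory.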

While Zynga doesn’t build its own servers the way Facebook and Google do, it does buy servers from hardware vendors who can customize them for Zynga's infrastructure, with a configuration that's similar to Facebook’s Open Compute technology.

Leinwand says a Zynga workload that takes three servers on Amazon can run on just one server in-house. The gain doesn’t come from a giant server: Zynga’s machines are about the same size as the instances it gets from Amazon. The secret is making dozens of tweaks that add up to huge gains in efficiency. Deep analysis of server performance, network traffic trends, and how applications use resources is a time-consuming but crucial process.

"We dug into our applications. We wrote tools that got into the memory heap. We wrote tools that look at profiling certain processes. We got into the Linux kernel. We use CentOS, and we got into the CentOS kernel and figured out where those bottlenecks are."

For all Amazon’s scalability, the offerings can be a bit rigid. For example, you can rent an Amazon instance with a certain amount of storage and compute power, but adding a few gigabytes of memory or another processor might require buying a whole separate instance, which may have more resources than you really need.
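
A small calculation shows how fixed instance sizes can leave resources idle. The `InstanceType` class and the example sizes below are hypothetical and do not correspond to any real Amazon offering; the sketch only assumes that capacity is bought in whole instances of one type.

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class InstanceType:
    """A fixed bundle of resources (hypothetical, not a real instance type)."""
    name: str
    cores: int
    memory_gb: int


def instances_needed(itype, cores_needed, memory_needed_gb):
    """Whole instances of one type required to cover both requirements."""
    by_cores = math.ceil(cores_needed / itype.cores)
    by_memory = math.ceil(memory_needed_gb / itype.memory_gb)
    return max(by_cores, by_memory, 1)


def idle_resources(itype, cores_needed, memory_needed_gb):
    """Return (idle cores, idle GB of memory) left over after the purchase."""
    n = instances_needed(itype, cores_needed, memory_needed_gb)
    return n * itype.cores - cores_needed, n * itype.memory_gb - memory_needed_gb


# Example: a memory-heavy workload on a made-up 4-core, 32GB instance type.
example = InstanceType("example", cores=4, memory_gb=32)
count = instances_needed(example, cores_needed=4, memory_needed_gb=96)
idle_cores, idle_gb = idle_resources(example, cores_needed=4, memory_needed_gb=96)
```

In this made-up case the workload needs only 4 cores but 96GB of memory, so it takes three instances and leaves 8 of the 12 purchased cores unused.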

"You can't go to the public cloud and say I want another 64GB of memory here. They look at you and say ‘buy another instance of this type,’" Leinwand said.

Leinwand said the Amazon instance model leads to over-provisioning, meaning you end up buying more storage than necessary. Internally, Zynga uses direct-attached storage striped across multiple servers, providing a big I/O performance boost and more efficient utilization, he said.
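
Striping in general places consecutive chunks of a volume on different devices in round-robin order, so that large reads and writes are spread across all of them. The sketch below shows that address mapping only; the function name `locate`, the stripe size, and the server names are assumptions, and Zynga has not described its actual layout.

```python
def locate(offset, stripe_size, servers):
    """Map a byte offset in a striped volume to (server, offset on that server).

    Chunks of `stripe_size` bytes are assigned to servers round-robin.
    Illustrative only; not a description of any specific storage product.
    """
    if offset < 0 or stripe_size <= 0 or not servers:
        raise ValueError("need offset >= 0, stripe_size > 0, and at least one server")
    stripe_index = offset // stripe_size           # which chunk of the volume
    server = servers[stripe_index % len(servers)]  # round-robin placement
    local_stripe = stripe_index // len(servers)    # chunk position on that server
    local_offset = local_stripe * stripe_size + offset % stripe_size
    return server, local_offset
```

Because neighboring chunks land on different servers, a sequential read touching several chunks can be served by several machines at once, which is the usual source of the I/O gain from striping.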

Every performance and efficiency improvement is important when you’re growing as fast as Zynga. Leinwand says that for every server the company had two years ago, it now has 100. For the foreseeable future, a percentage of those servers will remain on the Amazon cloud, even as Zynga’s own data centers ramp up. Besides providing always-available capacity, Amazon is still good for certain workloads that aren’t memory-intensive or have smaller needs for CPU cores and storage, Leinwand said.
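
The "shock absorber" role can be pictured as a simple overflow rule: fill in-house capacity first and send only the excess to the public cloud. The function below is a deliberately simplified sketch under that assumption. Zynga says it chooses placement workload by workload, so the fill-first policy, the function name `place_load`, and the abstract capacity units are all hypothetical.

```python
def place_load(demand, private_capacity):
    """Split demand between in-house capacity and public-cloud overflow.

    `demand` and `private_capacity` are in the same arbitrary units
    (for example, servers' worth of traffic). Hypothetical policy.
    """
    if demand < 0 or private_capacity < 0:
        raise ValueError("demand and private_capacity must be non-negative")
    private = min(demand, private_capacity)  # use owned capacity first
    public = demand - private                # burst the remainder
    return {"zcloud": private, "public": public}
```

Under this rule, demand of 80 units against 100 units of in-house capacity stays entirely in-house, while a spike to 130 units sends 30 units to the public cloud.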

"People ask me a lot, ‘will you ever get off that public cloud?’" Leinwand notes. "I don’t think that's where we want to go. We want to have the public cloud there to be that shock absorber. We spend a lot of time looking at workloads and deciding which is best for which service. That’s the science of what we do."

UPDATE: One reader asked a great question: what hypervisor does Zynga use? As most of you know, Amazon's cloud uses a customized version of the Xen hypervisor. Leinwand tells us Zynga has gone a similar route internally. "On zCloud, we use the Citrix CloudStack orchestration software and that runs on XenServer," he said.