Going viral without going down June 20, 2008

This guest post was written by Saran Chari, CTO and co-founder of Flixster. Flixster is a Lightspeed portfolio company.
_________________________________________________

As the social web evolves and platforms like Facebook and MySpace open up to applications, many companies and developers are rushing to get distribution to their millions of users by “going viral”. For the successful applications, this can often present a problem (a high-quality one for sure) – how do you actually scale your deployment to handle that growth?

At Flixster, we’ve been riding this growth curve for 2 years now – first with our destination site itself (www.flixster.com), and subsequently on our embedded applications on Facebook and MySpace. Across our properties, we now have over 1 million users logging in each day and we are approaching our 2 billionth movie rating. Like many others, we started out with just a single virtual server in a shared hosting environment. So how did we scale to where we are today?

The Holy Grail for scaling is “pure horizontal scaling” – just add more boxes to service more users. This tends to be relatively easy at the application layer – there are a multitude of cheap and simple clustering and load balancing technologies. The data layer is typically much more difficult to scale, and is where a lot of web startups fall down. High-volume applications simply generate too much traffic for any reasonably-priced database (I’ll assume you’re probably running MySQL as we are). So what are your options?

Buy yourself some time

The overriding mantra to everything we’ve done to scale our database has been: “avoid going to disk at all costs”. Going to disk to retrieve data can be orders of magnitude worse than accessing memory. You should apply this principle at every layer of your application.

Given that, the first thing to do is to throw in a good caching layer. The easiest way to scale your database is to not access it. Caching can give you a ton of mileage, and we still spend a lot of effort optimizing our caching layers.

If you can afford it, you can also buy a bigger box (RAM being the most important thing to upgrade). “Scaling up” like this can be effective to a point, but only buys you so much time because after all, it’s still a single database.

A replication setup can also buy you some time if you have a read-intensive workload and can afford to send some queries to a slave database. This has its problems though, the biggest of which is replication lag (slaves fall behind). Ultimately, replication can also buy you some time, but for most web application workloads, replication is a tool much better suited to solving high-availability problems than it is to solving scalability ones.

It’s time to break up

Eventually, you’re going to have to find a way to “scale out” at your database layer. Split up your data into chunks. Put the chunks on separate databases. This strategy is often called “sharding” or more generally “data partitioning” (I use the two interchangeably). It works because it reduces the workload (and price tag) for each server. It’s not trivial, but it is very doable.

There is a lot of literature out there on the technical details and challenges of sharding (see the resources section). At Flixster, we’ve followed many of the strategies described by LiveJournal, Flickr and others. One of the critical things for any startup however is figuring out when to do things.

Our primary trigger for deciding to shard a given piece of data is the size of the “active” or “working” set. It all comes back to the principle of never going to disk. All of our database servers have 32GB of memory, which we give almost entirely to the MySQL process. We try to fit most, if not all, of our active data on a given server into that space.

The ratio of active / total data will vary tremendously by application (for us it seems to be in the 10-20% range). One way to figure out if your active data is saturating your available memory is to just look at cycles spent waiting for I/O on your server. This stat more than anything else we monitor drives our partitioning decisions.

The other thing we look at for a given table is the raw table size. If a table becomes too big (in terms of # of rows or total data volume) to administer – i.e. we can’t make schema changes easily – we partition it. There’s no magic threshold that fits all applications, but for us we typically decide to shard a table if we expect it to reach 30-40 million rows.

It’s certainly easier to start off with a fully sharded architecture, but most applications do not (we certainly didn’t). In fact, I’d say that if you are spending a lot of time figuring out partitioning strategies before you even have any users, you’re probably wasting development resources. So how do you actually rip the engine out of the car while it’s running? Piece by piece and very, very carefully…

Crawl, walk, run

There are a variety of partitioning strategies, which we’ve employed incrementally as we’ve grown. Here are some of the things we’ve done (in ascending order of difficulty).

Table Partitioning

If you have a large table with a relatively small “hot spot”, consider putting the active data into a separate table. You will have some additional complexity managing the flow of data from the “active” table to the “archive” table, but at least you have split the problem a bit. This is the strategy we used early on for our movie ratings table, after realizing that 90% of the queries we were writing against it were looking for data from the last 30 days.
Vertical (or feature-based) Partitioning

Your application may have features that are relatively independent. If so, you can put each feature on a separate database. Since the features are independent, separating them shouldn’t violate too many assumptions in your application.

We did this pretty early on, and have had a lot of success with this approach. For example, movie ratings are a core feature that didn’t overlap too much (data-wise) with the rest of the database. Comments are another one. We’ve followed the same strategy for several other “features” and now have six separate feature databases.

This was a major step forward for us as it split our big problems into several smaller ones. You might not need to go any further…vertical partitioning may be sufficient. But, then again, you want to grow forever, right?

One of the challenges of horizontal partitioning is in rewriting your data access code to figure out which database to use. With vertical partitions it’s relatively straightforward – which feature am I coding? With user-based partitioning, the logic can get much more complex. Another challenge in horizontal partitioning is the transition from your single data source into your partitions. The data migration can be painful. Extra hardware eases much of the pain, especially coupled with replication.

Following movie ratings, we have now horizontally partitioned a handful of other tables. We’ve also doubled the size of the partition cluster itself, going from four to eight master-slave pairs. We still use our vertically-partitioned feature databases, but they are under much less stress given the load absorbed by the horizontal partitions. And we continue to partition our high-volume tables on an as-needed basis.

Finally, some tips

• Start small, and bite things off in pieces that are manageable. Massive, several-month-long re-architectures rarely work well.
• Get some advice. We spent a good amount of time gleaning wisdom from the success of others (which they were kind enough to put online for everyone!). See the Resources section.
• Pick the best approach for your specific problems (but you have to know where your problems are – monitor EVERYTHING).
• You’ll never get there if you don’t start.
Bonus tip – come work @ Flixster!

If you’re a DBA and interested in working on these kinds of problems at a company that is already operating at scale, please send us a resume: jobs – at – flixster.com. We’re also hiring Java developers.

Excellent, well written article that describes nicely some options for DB scaling. A quick question about Horizontal Scaling. What methods do you use to decide where to partition the data? Is it simply alphabetical? Based on usage frequency?

@Tin Pig – Figuring out what “dimension” along which to partition your data depends on your access patterns. When analyzing this for a new table we’re partitioning, we ask ourselves – “what are the MOST important questions we need to ask the database for this entity?” For ratings, it turned out we needed to query by user (give me all the ratings for this user) and by movie (give me all the ratings for this movie). Everything else we needed was a subset of these queries. So we ended up partitioning our ratings table by BOTH users AND movies. Another piece of advice – partition by some immutable value – e.g. primary key, row creation date, etc. Partitioning by something that changes is going to give you lots of headaches as you’ll need to move data around a lot.

RE: referential integrity – just wave goodbye 🙂 In a lot of cases, if you really examine your application’s requirements, integrity is a nice to have, not a need to have. However, if you feel like you MUST have it, there are 2 options. One – ensure integrity at the point of data creation by using distributed transactions (although I haven’t heard great things about MySQL’s XA transactions). Two – build scripts/processes to check the data after the fact and clean up errors. We actually do take this approach in one or two cases.

This is a fabulous article. I have worked with a few people on technology and I can relate to the concerns of scalability and uptime all the time. Your article explains in very simple words the concept of scaling that is what fascinated me the most.

About...

Lightspeed is a leading global venture capital firm with over $2 billion of committed capital under management. Our investment professionals and advisors are located in Silicon Valley, China, India and Israel.

Email Subscription

Enter your email address to subscribe to this blog and receive notifications of new posts by email.

Subscribe by email or RSS reader

Legal Disclaimer

Information set forth on this site should not be construed as investment advice or an offering of any particular investment. Any opinions or views expressed on this site are those of the author and do not necessarily represent the opinion or view of Lightspeed Venture Partners.