Why Can't Twitter Scale? Blaine Cook Tries To Explain

Today, Blaine Cook, formerly Twitter’s chief architect, writes his first blog post since leaving Twitter. It certainly seems like he has some stuff to get off his chest. (Not true, says Blaine - see comment below).

The gist of the piece is that languages don’t scale, architectures do. It seems clear why Blaine might want to say something about scaling, since it is clearly the number one issue at Twitter. And he did take some heat for the problems there. And in some corners of the Web, Ruby, the language that Twitter runs on, has taken even more heat.

Blaine's argument: Scaling is fundamentally about a system's ability to support many servers. A system is scalable if you can start with one server and grow to 100, 1,000, or 10,000 servers, getting a performance improvement commensurate with the increase in resources.

Talking about whether a language "scales" is therefore beside the point, because it is really the architecture that determines scalability. One language may be slower than another, but that will not affect the system's ability to add more servers.

One language might typically be two, three, or even ten times slower than another. But in a highly scalable system, all that would mean is that you need two, three, or ten times the number of servers to handle a given load. Servers aren't free (just ask Facebook), but a well-capitalized company can certainly afford them.

The problem comes when your architecture is such that you can't just add more servers. While Blaine does not discuss this, the primary reason systems don't scale is the cost of data access. Databases are almost always your bottleneck, because all your data typically needs to be stored in some central repository.

So how you architect your data storage and access will determine your scalability. For example, do you use RAM-based caching like memcached to improve performance and limit the need to read the database? If so, is your caching architecture good enough to absorb most reads, or only a few? These are the kinds of architectural decisions that will determine system performance.
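To make the idea concrete, here is a minimal cache-aside sketch in Python. A plain dict stands in for memcached, and `fetch_user_from_db` is a hypothetical database call; the point is simply that a cache hit avoids touching the central database at all.

```python
cache = {}

def fetch_user_from_db(user_id):
    # Stand-in for a real (and expensive) database query.
    return {"id": user_id, "name": f"user-{user_id}"}

def get_user(user_id):
    key = f"user:{user_id}"
    if key in cache:                       # cache hit: no database read
        return cache[key]
    row = fetch_user_from_db(user_id)      # cache miss: read the database
    cache[key] = row                       # populate the cache for next time
    return row
```

How many reads this pattern actually keeps away from the database (the hit rate) is exactly the architectural question above.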

In Twitter's case, there is zero chance that the problems there are in any way related to their language. It is likely that there are architectural challenges which come from the fact that it is very hard to cache a Twitter data request, since no two people ever get the same data. And even for a given user, the data requests change quickly, since users are always receiving tweets. This is a hard, though not unsolvable, problem that requires a very specialized caching architecture. Eran Hammer-Lahav has done some interesting work in this area and talks about it in an extensive blog post.
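A toy sketch makes it clear why this is hard. Every user's timeline is a different merge of their followees' tweets, so a single new tweet touches many cached timelines at once. One (hypothetical) specialized approach is fan-out on write: push each tweet into every follower's cached timeline at post time, rather than recomputing timelines on every read. This is an illustration of the problem, not a description of Twitter's actual architecture.

```python
from collections import defaultdict

followers = defaultdict(set)   # author -> set of follower ids
timelines = defaultdict(list)  # user -> cached timeline, newest first

def follow(follower, author):
    followers[author].add(follower)

def post_tweet(author, text):
    # Fan-out on write: one tweet updates the cached timeline of the
    # author and every follower. With millions of followers, this single
    # write becomes expensive -- which is exactly the scaling challenge.
    for user in followers[author] | {author}:
        timelines[user].insert(0, (author, text))

def read_timeline(user):
    # Reads are now cheap: just return the precomputed cache entry.
    return timelines[user]
```

The trade-off is the crux: fan-out on write makes reads trivial but writes costly, while recomputing on read does the reverse.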

The bottom line is languages don’t kill scaling, programmers do. As such, Blaine's piece, while sounding a bit defiant, might really be read more like a mea culpa. Though, to be fair, despite all the chatter and criticism, scaling Twitter is indeed a non-trivial problem.