Archive for April 10th, 2012

The sprites server has smashed through the 250k barrier with 250,001 concurrent, active connections. In addition to the stuff talked about in my previous two blog entries, it took a few more optimizations.

1) Latest tagged revision of V8, which appears to perform a little better

The 250k limit is right on the fringe of what this server can pull off without violating the 1.4GB heap limitation in V8. It’s not clear to me why this limitation hasn’t bubbled up in priority enough to be taken care of yet. Just look at those free CPU cycles and unused memory! V8 is a complete bad-ass, and it’s just this one limitation that is holding it back from some really extreme capabilities.

I had tried 250k a few times, and couldn’t get it stable until after upgrading the /deps/v8/ directory of Node.js with the latest version tagged in SVN. I was really hoping the 1.4GB limit had been removed, but alas. Instead, just had to settle with some significant improvements to garbage collection and performance in general.

2) Workers via “cluster” module

I figured the onslaught of HTTP GET requests the 100 EC2 servers were unleashing at a rate of 100,000 JSON packets per second was contributing both to CPU and memory consumption, which was agitating the garbage collector enough to resist the 250k mark.

So, to reduce the overhead of these transient requests, which are in addition to the 250k concurrent connections (which really leaves us on average at about 350k connections at any given second, 100k of which are transients), I decided to leverage the cluster module.

The master performs the exact same tasks it did before. Except, now there are workers spawned, one for each CPU on the system, and they listen on a separate port from the master process. This port is used by the clients for these transient requests, and as they arrive they are parsed and forwarded to the master using the Node.js send() function.

The only reason this was a critical adjustment is, again, the 1.4GB heap limitation in V8. Keeping all the resources associated with those requests off the master process saves just enough memory to get past the 250k milestone.

TL;DR – V8, you’re so good, but it’s time for you to support more memory!