Saturday, November 12, 2011

Strategies for Scaling Real-Time Web Apps

Using Polling
Don't set up any caching; just run a normal server. Every user polls for whatever data they need, whenever they need it. This puts load on both your app and your database server. You can probably do a couple of thousand updates per minute before you get into trouble with a standard node.js / mongodb setup. On our Rackspace 1gb (of ram) cloud servers, we can handle a max of around 2000 requests/second per box. Even a hefty Mongo setup won't sustain that kind of load for long: the DB connections start piling up and your servers melt after a few minutes. With our setup of regular short polling every second (which is crazy), you'll want to start scaling out at around 200 simultaneous users.

Using Polling and Varnish
The strategy here is to poll against fixed windows of time: "get me everything that happened in this 10-second chunk" rather than "everything since my last request". Set up Varnish to cache those calls, so every client gets the same response. This shifts the main responsibility for handling the load onto the caching layer, which is built to handle exactly that. Your database only has to answer one query every 10 seconds, which it can handle easily. If there is some initial data you need to load to start your app that may get out of date, you can cache that bigger query for a minute or so and again your database will be happy as a clam. You can use Varnish as a load balancer here as well, though you'll probably have to scale the load balancer before you need to scale up your app servers. A hefty Varnish box (say 8gb of ram) should be able to handle thousands of hits per second of cached data.
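The trick that makes this cacheable is aligning every client to the same window boundaries, so they all request the same URL. A sketch of that alignment (function names are my own):

```javascript
// Align a timestamp to its enclosing fixed window, e.g. 10-second buckets.
function bucketBounds(tsMs, bucketMs) {
  const start = Math.floor(tsMs / bucketMs) * bucketMs;
  return { start, end: start + bucketMs };
}

// Every client in the same window asks for the same URL, so Varnish lets
// only one request per window through to the app and the database.
function updatesUrl(tsMs, bucketMs) {
  const { start } = bucketBounds(tsMs, bucketMs);
  return '/updates/' + start;
}
```

On the Varnish side you'd cache these responses for at least the bucket length (a TTL of 10 seconds or more), and the database does one query per window no matter how many thousands of clients are polling.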

Using Socket.IO for polling

If your app server can safely handle ~2000 requests per second but your database cannot, you may want to try a setup like this. Let your app server "cache" the data from your database and push updates over web-sockets/long polling with socket.io. When a user first connects to your page, you send down the cached data (or even a buffer of, say, the last 1000 messages). As the database gets updated via normal POSTs, the frequent polling for updates happens at the app server (say, one 5-second poll per app server), which then pushes the changes to all the users it knows about. Obviously this depends on how much data you're pushing through (and how much ram you have), but for small amounts of stuff you can do the caching at the app level instead of on a special cache server, sparing both a new server and your database. Theoretically this setup scales horizontally as well, with multiple app servers behind a load balancer or some other round-robin approach.
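A sketch of the app-level cache described above: a rolling buffer of recent messages held in the app server's memory, with the Socket.IO wiring shown as comments since it needs a running server. The `MessageBuffer` name and the database helper in the comments are assumptions for illustration; the 1000-message cap comes from the text.

```javascript
class MessageBuffer {
  constructor(capacity) {
    this.capacity = capacity;
    this.items = [];
  }

  push(msg) {
    this.items.push(msg);
    // Drop the oldest entries so memory stays bounded.
    if (this.items.length > this.capacity) {
      this.items.splice(0, this.items.length - this.capacity);
    }
  }

  recent() {
    return this.items.slice();
  }
}

const buffer = new MessageBuffer(1000);

// With socket.io, the wiring would look roughly like this (db.messagesSince
// is a hypothetical query helper):
//
//   io.on('connection', (socket) => socket.emit('backlog', buffer.recent()));
//   setInterval(async () => {
//     const fresh = await db.messagesSince(lastSeenTs); // one DB poll per app server
//     fresh.forEach((m) => buffer.push(m));
//     if (fresh.length) io.emit('updates', fresh);
//   }, 5000);
```

The key property is that database load is now proportional to the number of app servers (one poll each per interval), not the number of connected users.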