EventMachine is a gem that provides an easy interface to an event reactor capable of handling thousands of concurrent clients, all in a single thread. The library has convenience classes for TCP/UDP connections, timers, signals, and file I/O that can be used in a concurrent and safe way.

Here is the most basic example, which starts the reactor and prints the time every second:

require 'eventmachine'

EM.run do
  EM::PeriodicTimer.new(1) do
    puts "The time is %s" % Time.now
  end
end

Of course, actual usage is never that simple. Here is a TCP server that receives data, prints it, and does some mock processing:

You can see the draw of EventMachine: this code is incredibly easy to read and hides many of the complexities of concurrent network programming.

A Server Is Only As Fast As The Slowest Action

Event-driven has become synonymous with fast, but frameworks like EventMachine can be bottlenecked by poorly performing callback code. Such code causes events to queue up and be processed well after they are received. You can usually alleviate these delayed events by using evented database drivers or background jobs for heavy processing, but sometimes slow callbacks are unavoidable.

It’s pretty easy to see that our example server will not be able to process more than 10 messages per second, since each event takes at least 0.1 seconds to process. You can guess what happens when we receive a burst of 100 messages: the last message will be stuck in the queue for 9.9 seconds while the messages ahead of it are processed.

This period where EventMachine is processing pending data and is unable to respond to new data is one of the better indicators for a service that is nearing capacity limits. EventMachine doesn’t have any hooks to inspect this data, but with clever use of the PeriodicTimer we can get access to something very close.

Just A Second

Using our server example from above, we’ll add a PeriodicTimer set to fire every second and track how close to 1 second it actually fires. If things are good, it will be near zero. If things are backing up, this time will start climbing.

Finally, we’ll add one line of code to instrument this with the Instrumental agent and then we can easily visualize the data.

Observing sudden spikes can alert you to erratic performance in your own actions, a sudden burst of data from a specific client, or server specific conditions. Sustained increases in a latency measurement like this one can be a good indicator for the need to increase your capacity, either by improving single server resources or purchasing more servers.

We’ve been using this technique at Instrumental for a while now to track when we need to increase our collection capacity; the graph is great for a quick glance of overall system health, and quite pretty too :).