The technology team behind Think Through Math

Optimizing Long-running Queries With New Relic

Feb 27th, 2013

Summary: For single-threaded Rails web apps, a good strategy to improve concurrency is to ensure a low and consistent transaction time across all of your controller actions. New Relic provides excellent tools for identifying and prioritizing the worst offenders.

At Think Through Math, we’re always happy when we can use math to solve a real-world problem. When it comes to modeling operational efficiency in Rails, math once again provides a great framework. Many Rails apps are single-threaded; as of this writing, ours is no different. Let’s assume the average controller method in our app takes 250ms. That means in a perfectly ordered environment we can process four requests per second, per web server queue. We’re hosted on Heroku, using the Unicorn web server with four worker processes per dyno. Since each worker is its own queue, we can process on average 16 requests per second per dyno, or 960 requests per minute. To generalize this we can write an equation: (1000 / avg_response_time) * queues_per_dyno * 60 = dyno_throughput.
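The formula can be sketched as a small Ruby helper (the method name is ours, purely for illustration):

```ruby
# Requests per minute a single dyno can serve, given the average
# controller response time (ms) and the number of worker queues per dyno.
def dyno_throughput(avg_response_ms, queues_per_dyno)
  (1000.0 / avg_response_ms) * queues_per_dyno * 60
end

dyno_throughput(250, 4) # => 960.0
```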

This is where New Relic comes in. New Relic always shows your average app server response time, both in real time and as a graph over the time range you have selected.

To start using New Relic to optimize queries, click the Web transactions tab under Monitoring. Click App server if it isn’t already selected, then sort by Slowest average response time. This will give you a list of controller methods, sorted by highest average transaction time. Heroku’s recommendation is that all transactions have an average response time below 500ms, so that’s a good place to start. What we do is make the time window reflect at least three hours of peak usage, then export the list as CSV (there’s an option in New Relic). We then look at the combination of average response time, max response time, standard deviation, and call count to prioritize our efforts.

Implementing the refactor to reduce the average is going to be specific to your application, but we generally end up with one of four different strategies:

Use the transaction tracing features in New Relic to identify the slowest parts of the transaction and optimize that code

Database optimization; add indexes, denormalize, add counter caches

Switch elements of the page to use an ajax callback strategy

If it’s not time-sensitive, move the transaction to a background job (we use and love the awesome sidekiq for this)

We typically dedicate a portion of development effort each iteration to this activity. Rather than assign it to a specific developer, we like to hold optimization parties: divvy the list up and have people pair on solutions. It gives people a chance to work on parts of the code they don’t normally interact with, and it’s an excellent way to add technical breadth to the team.