Common Application Problems and How to Fix Them: The Select N + 1 Problem

At AppDynamics we get the opportunity to see the inner workings of a lot of applications. While these applications all seem pretty different to their end users, what’s under the hood usually doesn’t vary that much (sorry, your app isn’t a unique snowflake after all). They all have similar service oriented architectures using a variety of databases, caches, and queues. This holds true for the performance issues these apps experience and the antipatterns that cause them. The same problems show up in high-speed trading applications, e-commerce sites, mobile apps and online games – so we thought we’d put together some of the most common performance problems in a blog series to show you how to find, fix and prevent them. In this blog post we’ll take a look at a pretty common problem that can be tricky to detect in a large application: the Select N + 1 problem.

What is it?

The N + 1 problem is a performance anti-pattern in which an application makes N + 1 database calls (where N is the number of objects fetched). Like most antipatterns, this isn’t necessarily a problem in itself, but under certain circumstances (where N is large, for example) it will cause performance to degrade by making hundreds or even thousands of database calls for a single business transaction.

In plain English: You’re spamming your database with really small, fast queries instead of using one or two more complex ones.

Here’s how it usually goes down: You have two database tables with a parent/child relationship (like blogs and posts, or products and line items), and you want to iterate through all of them. So you do this:

SELECT id FROM Parent

and then executing a query for each record:

SELECT * FROM Child WHERE parent_id = ?

This isn’t necessarily a bad way of doing it, especially if there aren’t many parents or children. But what if you’re a giant e-commerce company with thousands of products and line items? Suddenly each transaction is calling the database thousands of times. Even if each database call is super fast, the cumulative response time of that transaction will be seconds (if not longer). Which is not an ideal situation. The problem only becomes worse with increase in traffic.

How to find it with AppDynamics

Like I said, this isn’t always a problem, but when it is it can be hard to find. The database will probably look normal to the DBA – they won’t see any long-running queries, and CPU may look fine under normal load. The best way to find a problem like this is using a performance monitoring solution that allows you to drill into a particular Business Transaction (user request) and see what code is executed and how it is accessing the database.

Here’s an example from an AppDynamics customer:

The best part about AppDynamics is its request snapshots. They not only allow us to troubleshoot performance problems faster – they also allow us to perfect his code. “AppDynamics allows us to see what is going on and identify the issues that could be refactored and made faster,” he said. “It lets us bridge the gap between anecdotes from users and actual, actionable information.” – Cornell University

How to fix it

This problem happens most often when you’re using a persistence engine or an object/relational mapper (ORM) like Hibernate and you’re using lazy loading. Be sure to understand the defaults for any Object-relational mapper before you begin using them.

If you’re just writing raw SQL, you may want to fetch all your data at once and then join the two sets of records, like this:

Optimizing database access

Many times how an application access the database will be a focus of optimization. Stay tuned for our next blog post in the series we will discuss the importance of caching database access. If you enjoyed this blog post, checkout our ebook on java performance problems.

I came across your ebook on top 10 java performance problems and found it very informative. I have a few questions about the last section (chapter 10) that details tools that could be used to detect performance problems

1. The ebook mentions about taking frequent thread dumps – every 50-100 ms. Would this be a recommended practice when troubleshooting high cpu usage issues? Since the cpu is already high, taking thread dumps so frequently would make the server non-responsive?

2. You also mention about modifying code to associate a business transaction to a thread identifier. Could you detail on a bit on what type code modification are you referring to?

It would be appreciated if you could share insights on how your solution would help me to troubleshoot high cpu usage issues in production considering it to be a distributed java based enterprise application. Thanks.