Meta

How important is it to merge queries together?

Reading Ernie’s post today inspired me to write about something I’ve been wanting to write about for a while: how much of a performance impact you should expect from network latency.

Hypothetical Numbers

I’m going to throw out some web-application ‘back of the envelope’ numbers which I will then use for examples:

The goal is to generate a page within 200-500ms.

Network latency between the application server and DB server is 0.5-1ms.

There will be use-cases which have goals far more agressive than this, for example advertising server goals are closer to 50ms. I have also seen network latency well below 1ms. I encourage commenters to correct me where they disagree 😉

Analysis

So if we say that a typical web page requires 5-10 queries to render, you can see that the amount of time spent in network latency could actually be very low:

However, what I want to do is illustrate the effects of the N+1 query pattern. That is, some applications will naively repeat a query inside a looping structure. So using the numbers we have established, lets take a look at the difference between:

SELECT * FROM Country – 239 rows in 1 query.

SELECT * FROM Country WHERE code = '?' – 239 queries with one row each.

Using performance_schema + ps_helper, I can show the time that it takes to execute on the server:

Because the SELECT * FROM Country retrieves every row, it is going to be a table scan (which is quite efficient in this case). We can see that the query time is 1.14ms to retrieve all 239 rows versus 0.239ms to retrieve just a single row.

This example is largely the “worst-case” situation, which I am using to illustrate an example. 2.14ms versus 296ms isn’t even comparable 🙂 To apply context – this difference will basically double my page generation goal of 200-500ms.

Merging Queries

My next example to run through, is what happens when we merge queries together. In this case retrieving multiple rows via primary key at once:

SELECT * FROM Country WHERE code IN ('AUS', 'USA', 'CAN');

instead of:

SELECT * FROM Country WHERE code = 'AUS';
SELECT * FROM Country WHERE code = 'USA';
SELECT * FROM Country WHERE code = 'CAN';

We have the time for the single row query above. Here is the timing information from ps_helper for the IN list query:

I’ve also encountered a setup on which network latency was an issue. Using prepared statements will make this much worse. Some connectors have client site prepared statements, which will save you a roundtrip.