Tuning ORDER and GROUP BY

GROUP BY, ORDER BY, and certain group functions (MAX, MIN, etc.) may require that data be sorted before being returned to the user. You can detect that a sort is required from the Using filesort tag in the ExTRa column of the EXPLAIN statement output, as shown in Example 21-19.

If there is sufficient memory, the sort can be performed without having to write intermediate results to disk. However, without sufficient memory, the overhead of the disk-based sort will often dominate the overall performance of the query.

There are two ways to avoid a disk-based sort:

Create an index on the columns to be sorted. MySQL can then use the index to retrieve the rows in sorted order.

Allocate more memory to the sort.

These approaches are described in the following sections.

21.4.1. Creating an Index to Avoid a Sort

If an index exists on the columns to be sorted, MySQL can use the index to avoid a sort. For instance, suppose that the following index exists:

MYSQL can use that index to avoid the sort operation shown in Example 21-19. Example 21-20 shows the output when the index exists; note the absence of the Using filesort tag and that the i_customer_name index is used, even though there are no WHERE clause conditions that would suggest that the index was necessary.

When MySQL performs a sort, it first sorts rows within an area of memory defined by the parameter SORT_BUFFER_SIZE. If the memory is exhausted, it writes the contents of the buffer to disk and reads more data into the buffer. This process is continued until all the rows are processed; then, the contents of the disk files are merged and the sorted results are returned to the query. The larger the size of the sort buffer, the fewer the disk files that need to be created and then merged. If the sort buffer is large enough, then the sort can complete entirely in memory.

You can allocate more memory to the sort by issuing a SET SORT_BUFFER_SIZE statement. For instance, the following allocates 10,485,760 bytes (10M) to the sort:

SET SORT_BUFFER_SIZE=10485760;

You can determine the current value of SORT_BUFFER_SIZE by issuing the following statement:

SHOW VARIABLES LIKE 'sort_buffer_size';

As you allocate more memory to the sort, performance will initially improve up to the point at which the sort can complete within a single "merge run." After that point, adding more memory appears to have no effect, until the point at which the sort can complete entirely in memory. After this point, adding more memory will not further improve sort performance. Figure 21-9 shows where these two plateaus of improvement occurred for the example above. It also shows the effect of creating an index to avoid the sort altogether.

To find out how many sort merge runs were required to process our SQL, we can examine the value for the status variable SORT_MERGE_PASSES from the SHOW STATUS statement before and after our SQL executes.

Figure 21-9. Optimizing ORDER BY through increasing sort buffer size or creating an index

To optimize SQL that must perform a sort (ORDER BY, GROUP BY), consider increasing the value of SORT_BUFFER_SIZE or create an index on the columns being sorted.