From: Stephen Tu
Date: July 11 2012 3:22pm
Subject: Re: why does "select * from table oder by indexed_field" not use key?
List-Archive: http://lists.mysql.com/mysql/227805
Message-Id:
MIME-Version: 1.0
Content-Type: multipart/alternative; boundary=e89a8f3bafdb0a92cd04c48f69ce
--e89a8f3bafdb0a92cd04c48f69ce
Content-Type: text/plain; charset=ISO-8859-1
On Wed, Jul 11, 2012 at 5:46 AM, Reindl Harald wrote:
> > mysql> show profiles;
> >
> +----------+------------+------------------------------------------------------------------------------------------+
> > | Query_ID | Duration | Query
> > |
> >
> +----------+------------+------------------------------------------------------------------------------------------+
> > | 1 | 0.32261700 | SELECT SQL_NO_CACHE * FROM
> > cms1_quickbar_groups force key (qbq_key) ORDER BY qg_sort ASC |
> > | 2 | 0.24592100 | SELECT SQL_NO_CACHE * FROM
> > cms1_quickbar_groups ORDER BY qg_sort ASC |
> >
> +----------+------------+------------------------------------------------------------------------------------------+
> > 2 rows in set (0.00 sec)
>
> this may be true for small data where it does not matter at all
> but if this would be a large table it would cause a lot of I/O
>
>
While I agree with you that Ewen's microbenchmark is not the most
convincing, I do agree with his reasoning, and I encourage you to try
benchmarking both options on your dataset. Remember to flush both the OS
disk cache and the mysql buffer pool between runs, so that your benchmarks
are actually reflecting cold runs instead of partially warmed up runs.
So why do I believe no index is faster for your particular query? Well, a
secondary index (qbq_key in your case) is usually key/value pairs of the
form [index key, page ID pointing to tuple]. So if we answered your query
with qbq_key, we don't need to do a sort, *but* we'll need to do roughly
one disk seek for each key in the index (I'm assuming here that qg_sort
values don't have any strong correlation with qg_id, and that the pages for
the table aren't already in the buffer pool/OS disk cache).
Compare that with a case where we don't use an index, so we must do a
filesort. Since mysql has an optimization (
https://dev.mysql.com/doc/refman/5.5/en/order-by-optimization.html - read
the "modified filesort algorithm" section) which stores the entire tuple
(if under a threshold size controlled by max_length_for_sort_data, which is
1024 bytes by default which your schema seems to fall under) when sorting
instead of just the [sort key, page ID], the filesort can actually avoid
much of random disk seeks (since mergesort itself is a very sequential IO
heavy algorithm). So for this plan, you end up basically doing several
sequential IOs over the entire table (whereas the previous plan just had to
read the entire table once, albeit randomly). Most likely the mysql
optimizer has calculated that several sequential scans over the table are
much faster than a bunch of random disk seeks which reads the table at
once. In fact, I believe the general rule of thumb for DB optimizers is
that if you need to read more than 10% of a table from an index, you are
better off using a sequential scan.
Like I said, I encourage you to measure the performance to convince
yourself that mysql is actually doing the right thing.
--e89a8f3bafdb0a92cd04c48f69ce--