Database Administrators Stack Exchange is a question and answer site for database professionals who wish to improve their database skills and learn from others in the community. It's 100% free, no registration required.

I created a clustered index on a table expecting it to make the queries with ranges perform better, but, different values in the where clause can produce differente query plans, one uses the clustered index and one does not.

My question is: What can I do to make the DBMS use the better query plan? Or better yet, should I change my schema to something better?

Details:

I'm using Amazon's RDS (Server version: 5.5.31-log)

I executed optimize table on each table (expecting it to "rebuild" the clustered index and reset the statistics), sometimes it does not change anything, sometimes the DBMS will use worse query plans, sometimes makes it faster because it will use the clustered index.

explain extended followed by a show warnings did not produce anyting interesting/useful

Can you have many price ranges per product? (or just 2, one for buy and one for sell)? If you have just these two, the primary index could be defined as (type, product_id)
–
ypercubeAug 20 '13 at 23:40

1

And how do you expect anyone to help when the description and the EXPLAIN plan use different names for the tables and different query?
–
ypercubeAug 20 '13 at 23:47

In your slow example, type:index combined with Extra:using index means a full index scan; the only thing worse than this is a full table scan. Unfortunately, without some understanding of your WHERE clause, table structures, and join conditions, it's impossible to provide much more, though I expect your clustered index strategy is more of a hindrance than a help, especially when compared to a small primary key and multiple secondary indexes with different column orders for the optimizer to choose from... but that's impossible to say without more detail.
–
Michael - sqlbotAug 21 '13 at 0:52

@ypercube Sorry, didn't want to just throw my stuff on the question so I came up with a similar example. Added the real schema and query
–
hack.augustoAug 21 '13 at 3:25

2 Answers
2

I got some performance improvements by mapping all the ranges into a series of integers. Ranges that includes all 0-9 numbers are encoded with the letter a, the resulting number is used in base 11, so, the range 100-199 is encoded as 1aa (which in base 10 is 241). With the ranges in place, a search for 150 does a query with IN (150, 15a, 1aa, aaa). I didn't manage to remove the range join, but because now it is a single integer column, it has a smaller index that performs better. An alternative that I did not experiment was using a R-tree index, it might be another improvement.

I also got a working hack to add the index hint into Django's QuerySets, not the best way to solve my problem, but it made a huge difference.

def get_compiler(qs, using=None, connection=None):
'''
Method that gets the QuerySet's sql compiler. It monkey patches the
QuerySet so that the compiler state can be changed. Keep in mind
that most of the QuerySet's methods returns a new one (which is
not gonna be monkey patched)
QuerySet's methods that return a new instance:
- https://docs.djangoproject.com/en/1.5/ref/models/querysets/#methods-that-return-new-querysets
'''
if using is None and connection is None:
compiler = qs.query.get_compiler(using=qs.db)
else:
compiler = qs.query.get_compiler(using=using, connection=connection)
qs.query.get_compiler = lambda *args, **kwargs: compiler
return compiler
def primary_index_hint(table, qs, index=-1):
'''
Method that adds index hinting. Table is the name in the database
(not the name used one the ORM)
index hint: https://dev.mysql.com/doc/refman/5.7/en/index-hints.html
Django 1.5 - 21/08/13
'''
query, compiler = qs.query, get_compiler(qs)
alias = query.table_map[table][index]
join = list(query.alias_map[alias])
join[1] = '{} use index (primary)'.format(alias)
query.alias_map[alias] = JoinInfo(*join)
compiler.quote_cache[join[1]] = alias
return qs

MySQL changes plan depends on it's assumption of the index efficiency. InnoDB perform several index lookups right after opening the table and pass these values to MySQL optimizer which is creating actual plan for every query. Sometimes these lookups may not be enough to build correct plan.

I could recommend to check if this is the case by increasing innodb_stats_sample_pages from 8 to say 12 or 16 which will give MySQL 50% or 100% more information about the index selectivity and would help building correct execution plan. After the change you will need to run

ANALYZE TABLE <table_name>

for each table involved in these queries to refresh index statistics. Please note, that this command will lock table and put it in read only mode while statistics will be collected so be careful. Usually it takes several seconds depends on the data size but it's RDS so you'll never know.

I will give it a try next week and let you know. Right now the mapping from ranges to integers is performing well (around 50ms). Either way, next year I'm gonna move the data away from de database and use a R-tree index that gives me a consitent performance for stabbing queries (I should use a interval-tree, I'm going for R-tree it is easier to find good libraries for).
–
hack.augustoDec 13 '14 at 16:57