More on table_cache

In my previous post I looked into how a large table_cache can actually decrease performance. The “miss” path gets more expensive very quickly as the table cache grows, so if you’re going to have a high miss ratio anyway, you’re better off with a small table cache.

What I have not checked, though, is how the table_cache (table_open_cache in newer versions) size affects the hit path.

I started with the same test as last time – I created 100000 tables and read all of them once to make sure all table cache entries were populated. Then I repeatedly read 1000 empty tables with a table_cache of 20000 and of 2000. With a table cache of 2000 I got about 16400 selects/sec; with a table cache of 20000, 13500 selects/sec. So there is some slowdown in the hit path as well, though it is not as large as in the miss path.

I also tested with a very small table cache of 5, which causes opening 1000 tables in a loop to generate all misses. This way I got 8000 selects/sec – note this is basically the best-case scenario for opening tables, as the MyISAM table headers, which are modified on open, were fully cached in this case.

What does this tell us? If you can fit your tables in the table cache and eliminate (or almost eliminate) table cache misses, it is best to size table_cache large enough – even in the best-case scenario for misses, a hit from a large table_cache is faster.
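To put the throughput numbers above in perspective, it helps to convert them to per-query cost. A quick back-of-envelope calculation (using only the numbers from this post):

```python
# Convert the measured selects/sec into microseconds per query to
# compare the per-query cost of each configuration.
rates = {
    "hit, table_cache=2000": 16400,
    "hit, table_cache=20000": 13500,
    "miss, table_cache=5": 8000,
}
for label, qps in rates.items():
    print(f"{label}: {1e6 / qps:.0f} us/query")
```

So the oversized cache adds roughly 13 microseconds per query on the hit path (61 vs 74 us), while a miss roughly doubles the per-query cost even in this best case.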

There are also possible optimizations for the “hit” path with a large table cache, though these should only matter for someone who runs a really large number of simple queries – that is when opening tables can be a significant contributor to total query time.

Author

Peter managed the High Performance Group within MySQL until 2006, when he founded Percona. Peter has a Master's Degree in Computer Science and is an expert in database kernels, computer hardware, and application scaling.

Comments (7)

Has anybody looked at the internals of the table_cache data structure? I’d naively assume that you could get close to linear performance with something as simple as a hash table; it’s sort of weird that there’d be any kind of dependency on the number of entries (though good to know).

I would assume this strange behaviour is due to the ratio of entries to buckets in the hash table (depending on which hash function is used, of course, but as a theory).

Similar behaviour exists in the TCP/IP stack of the Linux kernel, where (from a performance point of view) the best option is to set the number of buckets equal to the number of entries you will have in the hash (so each bucket contains at most 1 entry). It’s not the most optimal regarding memory usage (slightly more memory will be used), but it’s the best regarding performance. The difference is that with 10 entries per bucket it will still be somewhat fast, but if you fill each bucket with 100 or more entries you will start to notice a penalty.
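The commenter's point about bucket sizing can be illustrated with a toy simulation. This is not MySQL's (or the kernel's) actual hash code – just a sketch showing how average chain length grows as entries outnumber buckets:

```python
import random

def avg_chain_length(n_entries, n_buckets, seed=42):
    """Distribute entries over buckets at random (a stand-in for a
    hash function) and return the mean chain length of the non-empty
    buckets, i.e. how many entries a lookup must scan on average."""
    random.seed(seed)
    buckets = [0] * n_buckets
    for _ in range(n_entries):
        buckets[random.randrange(n_buckets)] += 1
    used = [b for b in buckets if b]
    return sum(used) / len(used)

entries = 10000
for n_buckets in (entries, entries // 10, entries // 100):
    print(n_buckets, "buckets:", round(avg_chain_length(entries, n_buckets), 2))
```

With buckets equal to entries the chains stay short; with 100 entries per bucket every lookup walks a long chain, which is the penalty the commenter describes.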

Regarding the TCP/IP stack of the Linux kernel, the hint is to set “ip_conntrack.hashsize=xxx” (where xxx is the same value you set in /proc/sys/net/ipv4/ip_conntrack_max) as a boot option for the kernel.

Yes, this seems to be related to the dynamic hash table code. Yasufumi has taken a look at it, and the insert/delete code seems not overly optimal. MySQL at this point does not have any tuning options for it. Indeed, for things like table_cache it should be possible to simply allocate a large enough hash table so that there are almost no collisions.

After reading the table cache structure code, it looks like our test of repeatedly selecting from 1000 tables will keep putting instances of each table into the table cache chain until there is no more cache available, and then MySQL will delete those in unused_link. One thing I don’t really understand: does every select you issue create a table instance and put it into the chain (in the same session)? Shouldn’t MySQL reuse these ‘used’ table cache entries? If we have a large cache and chain up a lot of instances, traversing the long linked list will introduce a penalty if MySQL doesn’t reuse the ‘used’ instances in the list.
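The behaviour the commenter describes – cached instances being reused on a hit, and the unused chain being trimmed on overflow – can be sketched with a toy LRU cache. This is an illustrative model, not the server's actual table cache code:

```python
from collections import OrderedDict

class ToyTableCache:
    """Toy LRU cache of open-table instances: a hit moves the entry
    to the MRU end and reuses it; on overflow the least recently used
    entry (the head of the 'unused' chain, in the commenter's terms)
    is evicted to make room."""

    def __init__(self, capacity):
        self.capacity = capacity
        self.cache = OrderedDict()  # table name -> cached instance
        self.opens = 0              # counts real table opens (misses)

    def get(self, name):
        if name in self.cache:
            self.cache.move_to_end(name)    # hit: reuse cached instance
            return self.cache[name]
        self.opens += 1                     # miss: open the table
        if len(self.cache) >= self.capacity:
            self.cache.popitem(last=False)  # evict the LRU/unused entry
        self.cache[name] = f"instance-of-{name}"
        return self.cache[name]

cache = ToyTableCache(capacity=5)
for _ in range(3):                  # three rounds of selects on 3 tables
    for t in ("t1", "t2", "t3"):
        cache.get(t)
print(cache.opens)  # 3: only the first round opens; later rounds reuse
```

In this model a cache large enough to hold the working set pays the open cost only once per table, while an undersized cache (e.g. capacity 2 for 3 tables) thrashes and reopens on every access – which matches the hit/miss behaviour measured in the post.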