All About InnoDB

Thursday Dec 12, 2013

As you probably already know, in MySQL
5.7.3 release, InnoDB Memcached reached a record of over 1 million
QPS on a read only load. The overview of the benchmark and testing
results can be seen in an earlier blog by Dimitri.
In this blog, I will spend sometime on the detail changes we
have made to achieve this number.

First thanks to Facebook's
Yoshinori with his bug#70172 that brought our attention to this single commit read only load test.
We have been focussing on operation with large batch size. This bug
prompted us to do a series of optimization on single commit read only
queries and these optimizations eliminate almost all major
bottlenecks from the InnoDB Memcached plugin itself.

If you are just getting familiar with
InnoDB Memcached, there are some earlier blog on the topics to get you started. In a short word, InnoDB Memcached allows a fast path to retrieve key value data stored in the InnoDB table, with Memcached protocol.

The Benchmark:

Now,
Let's discuss the testing scenario. The InnoDB Memcached plugin
configurations are all by default in this benchmark, which means, the
daemon_memcached_r_batch_size was also set to be 1, and the read
operation would do a begin and commit transaction for each query. It
is equivalent to auto-commit single selects through SQL interface.
The innodb_api_trx_level is by default set to 0 (read uncommitted),
however, changing it to 2 (repeatable read) gave the same benchmark
result.

Another good news in 5.7.3 is that we start to support
integer key column mapping, as it is common to use integer as primary
key for a table. And the table used in this benchmark comes with
integer as the key column. The mapping table contains only key and
value columns. So we set the corresponding `flags`, `cas_column` and
`expire_time_column` column in the config containers table all to
NULL, this avoids overhead columns to support Memcached "extra" options. The
table itself containers 1 million rows, each with a short integer key
and a text value.

Here is the detail table definition

mysql> desc test.memc_test;

Field

Type

Null

Key

Default

Extra

id

int(11)

NO

PRI

NULL

value3

text

YES

NULL

To make InnoDB Memcached recognize this InnoDB table, insert following row into innodb_memcache/containers table

The memcached client for inserting
rows and querying is a simple libmemcached program provided by
Yoshinori.Each query would do a key lookup and fetches corresponding value.

We made some
adjustment so that there are multiple client processes, each with
multiple sessions. This was used to alleviate bottlenecks in the
client itself.

As a note, there are many memcached
clients out there, and Memcached clients can play important roles
in the performance result itself. For example, we observed at least
3 times difference on result with Perl client Cache::Memcached::Fast
when comparing to its slower version Cache::Memcached. And as far as
we can see, libmemcached is one of the most efficient clients
available, even though eventually it becomes bottleneck itself as the
test progresses, especially requests through the network.

The
test result can be seen in Dimitri's blog, so I will not repeat them
here. The summary is that we got close to 1.2 million QPS at the
best. The next bottleneck now seems lying at the adaptive hash index's global latch -
"btr_search_latch". The libmemcached client overhead is also
significant.

Read Only QPS through
InnoDB Memcached

The improvement:

There
are several changes in both InnoDB Memcached code and Memcached
Native code to achieve the record benchmarks.

1. The first
is of course to address the issue brought by bug #70712. With
daemon_memcached_r_batch_size set to 1, the transaction is being
repeatedly started and committed for each query. It is better to
cache the trx object itself, to avoid repeated create and destroy the trx
object. Otherwise, the "trx_sys mutex" will kill the concurrency.

After the change,
the trx object is cached with private memcached connection data. Each
connection gets its own trx object, and it is used to handle
transactions through this particular connection.

2. The next
thing we did is to take advantage of the read only optimization
recently made in the InnoDB code. This scenario (single read trx) is
perfect to use the optimization. Whenever the read batch size is set
to 1, InnoDB Memcached will treat incoming queries as auto-commit
read only query. It will automatically hook up to the "fast
path" of read-only operation in InnoDB.

3. After these
two transaction related changes, we found the bottleneck comes from
Memcached native Code itself. As a note, we embedded the Memcached code itself in our InnoDB Memcached plugin, so any bottleneck in Memcached will affect us.

The original Memcached memory allocation is
protected by a global Cache Lock (engine->cache_lock), and it
quickly rises in prominence in the profiling result.

Even though the data is stored in InnoDB, we happened to
still use some of Memcached's own memory allocation to store and deliver the
result back to the front end. To fix this bottleneck, we stopped using Memcached Memory altogether. Instead a connection private
memory buffer is used to store and deliver the result. This also
saves a memcpy as we move the data to memcached memory as
before.

This change makes InnoDB Memcached plugin as thin as
possible, and only relies on the InnoDB buffer pool and Adaptive Hash
Index (AHI) as the backing store for the data. This provides better
scaling and memory handling than Memcached itself.

4. Another
bottleneck in Memcached is its statistics mutex
("thread_stats->mutex"). This also becomes significant as testing goes.
So to remove it, we switched to using atomic operations whenever the
platform supports (most modern platforms do).
With these changes, we can now well scale the plugin to over 100
connections without degradation as the number of connections are
ramped up.

5. In addition to removing those major bottlenecks,
we also streamline the code to remove some overhead work. For
example, we start to cached the "search tuple", so that
there is no need to allocate the search tuple for each query. This is
to keep the InnoDB Memcached as lean as possible.

With these
changes, we have eliminated all the major InnoDB Memcached Plugin
bottlenecks. The bottlenecks now comes from clients themselves and to
a lesser degree from the Adaptive Hash Index search latch.

Future
work:

Now the Memcached read goes more than twice as fast
as those from SQL end. By using the InnoDB buffer pool as the
in-memory store, and with InnoDB AHI, InnoDB Memcached can probably provide an
efficient and more scalable store than Memcached itself.

There
is still more to be done.

1. We will continue to remove some
bottlenecks in InnoDB (such as btr_search_latch), as well as make
InnoDB memcached leaner/faster.

2. We will add support to
"mgets" command, which allows Memcached to fetch multiple
results (corresponding to multiple keys) in one query attempts. This
would again give us another big jump in terms of QPS.

3. We
will start to focus more on insertion/updates operations.

4.
We are considering extending the functionality of the memcached
interface to support range queries etc. So to make it a more versatile
key value store.

In summary, with these enhancements, the
InnoDB Memcached becomes more and more attractive as as quick
key value store through the MySQL server.

Your feedback and
comments are important to us as we evolve and improve this plugin.