Memaslap manages network connections with libevent, as memcached does. Each memaslap thread is bound to a CPU core, the threads do not communicate with each other, and each thread holds several socket connections. Each connection maintains its own key size distribution, value size distribution, and command distribution.

You can specify servers via the memslap --servers option or via the environment variable MEMCACHED_SERVERS.

For both TCP and UDP, memaslap uses non-blocking network I/O. All network events are managed by libevent, as in memcached, and the network module of memaslap is similar to that of memcached. Libevent lets memaslap handle network traffic very efficiently.

Memaslap implements multi-threading in a way similar to memcached. It creates one or more self-governed threads; each thread is bound to one CPU core if the system supports setting CPU core affinity.

In addition, each thread has its own libevent instance to manage network events; each thread has one or more self-governed concurrencies; and each concurrency has one or more socket connections. Concurrencies do not communicate with each other, even when they are in the same thread.

Memaslap can create thousands of socket connections, and each concurrency can have tens of socket connections. Each concurrency randomly or sequentially selects one socket connection from its connection pool to run, so each concurrency handles only one socket connection at any given time. Users can specify the number of concurrencies and the number of socket connections per concurrency according to their expected workload.

To save time and memory, memaslap creates a table of 10M random characters. All key suffixes and values are generated from this table.

Memaslap identifies a string by its offset in the character table and its length, which saves a lot of memory. Each key contains two parts, a prefix and a suffix. The prefix is a uint64_t, 8 bytes. To verify data that was set earlier, memaslap needs each key to be unique, so it uses the prefix to identify a key. The prefix cannot include illegal characters, such as ‘\r’, ‘\n’, ‘\0’ and ‘ ’, and memaslap has an algorithm to ensure that.
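The offset-and-length scheme can be illustrated with a minimal Python sketch. Everything named here (`CHAR_TABLE`, `make_prefix`, `build_key`) is hypothetical, the table is shrunk for illustration, and the prefix scheme (a counter encoded over an alphabet that excludes the illegal characters) is only an assumption about how uniqueness and legality could be achieved, not memaslap's actual algorithm.

```python
import random
import string

TABLE_SIZE = 10_000   # memaslap uses 10M characters; shrunk here for illustration
PREFIX_LEN = 8        # same width as a uint64_t
ILLEGAL = set("\r\n\0 ")

# Alphabet that contains none of the illegal characters.
ALPHABET = string.ascii_letters + string.digits
_rng = random.Random(0)
CHAR_TABLE = "".join(_rng.choice(ALPHABET) for _ in range(TABLE_SIZE))


def make_prefix(counter: int) -> str:
    """Encode a counter as 8 legal characters (hypothetical scheme)."""
    base = len(ALPHABET)
    if not 0 <= counter < base ** PREFIX_LEN:
        raise ValueError("counter out of range")
    chars = []
    for _ in range(PREFIX_LEN):
        counter, digit = divmod(counter, base)
        chars.append(ALPHABET[digit])
    return "".join(chars)


def build_key(prefix: str, offset: int, key_len: int) -> str:
    """Materialise a key from its prefix plus an (offset, length) slice of the table."""
    suffix_len = key_len - len(prefix)
    if suffix_len < 0 or offset < 0 or offset + suffix_len > TABLE_SIZE:
        raise ValueError("key does not fit in the character table")
    return prefix + CHAR_TABLE[offset:offset + suffix_len]
```

Only the prefix, the offset, and the length need to be stored per object; the full key is rebuilt on demand, for example with `build_key(make_prefix(5), 100, 64)`.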

Memaslap does not generate all the objects (key-value pairs) at the beginning. It only generates enough objects to fill the task window (default 10K objects) of each concurrency. Each object carries the following basic information: key prefix, key suffix offset in the character table, key length, value offset in the character table, and value length.

While running, each concurrency sequentially or randomly selects an object from the window for a set or get operation. At the same time, each concurrency kicks objects out of its window and adds new objects to it.

Memaslap uses libevent to schedule all the concurrencies of its threads, and each concurrency schedules tasks based on its local task window. Memaslap assumes that if each concurrency keeps the same key distribution, value distribution, and command distribution, then memaslap as a whole keeps those distributions when viewed from outside. Each task window holds many objects, and each object stores its basic information, such as key, value, and expire time. At any time, the objects in the window keep the same fixed key and value distribution. If an object is overwritten, its value is updated. Memaslap verifies data or expire time according to the object information stored in the task window.

Libevent selects which concurrency to handle based on a specific network event. The concurrency then selects which command (get or set) to run based on the command distribution. If it needs to kick out an old object and add a new one, the new object must have the same key length and value length as the old one, so that the key and value distributions stay the same.
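The replacement rule can be sketched as follows. This is a hedged illustration, not memaslap's code: the `WindowObject` fields mirror the per-object information listed earlier, while `replace_object` and its parameters are hypothetical names.

```python
import random
from dataclasses import dataclass


@dataclass
class WindowObject:
    key_prefix: int     # unique identifier of the key
    key_offset: int     # key suffix offset in the character table
    key_len: int
    value_offset: int   # value offset in the character table
    value_len: int


def replace_object(window, index, next_prefix, table_size, rng):
    """Kick out window[index] and add a new object with the same sizes.

    Reusing the old key and value lengths keeps the size distributions
    of the window unchanged.
    """
    old = window[index]
    new = WindowObject(
        key_prefix=next_prefix,
        key_offset=rng.randrange(table_size - old.key_len),
        key_len=old.key_len,
        value_offset=rng.randrange(table_size - old.value_len),
        value_len=old.value_len,
    )
    window[index] = new
    return new
```

Because only the prefix and the offsets change, the histogram of key and value sizes in the window is identical before and after the swap.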

If the memcached server has two cache layers (memory and SSD), running memaslap with different window sizes yields different cache miss rates. If memaslap adds enough objects to the windows at the beginning and the memcached cache cannot store all of the initialized objects, memaslap will get some objects from the second cache layer, which means misses in the first cache layer. The user can therefore choose the window size to get the expected miss rate of the first cache layer.

Because each thread is self-governed, memaslap can assign different threads to handle different memcached servers. This is one of the ways in which memaslap tests multiple servers. The only limitation is that the number of servers cannot be greater than the number of threads. The other way to test multiple servers is the replication test, in which each concurrency has one socket connection to each memcached server. In that mode, memaslap can set some objects to one memcached server and get these objects from the other servers.

By default, memaslap does single gets. If the user specifies the multi-get option, memaslap collects enough get commands, packs them, and sends them together.

Memaslap supports both the ASCII protocol and the binary protocol, but it runs on the ASCII protocol by default. Memaslap runs on TCP by default, but it also supports UDP. Because UDP is unreliable, dropped packets and out-of-order packets may occur. Memaslap creates a memory buffer to handle these problems: it tries to read all the response data of one command from the server and reorders the response data. If some packets are lost, the timeout mechanism ensures that incomplete responses are discarded and the next command is sent.
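A reorder buffer of this kind can be sketched in Python. The sketch assumes the 8-byte memcached UDP frame header (request ID, sequence number, total datagram count, and a reserved field, each 16 bits in network byte order); the `Reassembler` class is a hypothetical illustration rather than memaslap's implementation, and the timeout handling described above is left out.

```python
import struct

# memcached UDP frame header: request id, sequence number,
# total datagrams in this message, reserved (all 16-bit, big-endian).
HEADER = struct.Struct("!HHHH")


class Reassembler:
    """Collects the datagrams of one response and returns them in order."""

    def __init__(self, request_id):
        self.request_id = request_id
        self.parts = {}

    def add(self, datagram):
        """Store one datagram; return the full payload once complete, else None."""
        request_id, seq, total, _reserved = HEADER.unpack_from(datagram)
        if request_id != self.request_id or seq >= total:
            return None  # stale or malformed datagram: ignore it
        self.parts[seq] = datagram[HEADER.size:]
        if len(self.parts) < total:
            return None  # still waiting for missing datagrams
        return b"".join(self.parts[i] for i in range(total))
```

A caller would create one `Reassembler` per outstanding command and drop it when the command times out, so that a partially received response is discarded.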

All the distributions are read from the configuration file specified by the user with the “--cfg_cmd” option. If the user does not specify a configuration file, memaslap runs with the default distribution (key size = 64, value size = 1024, get/set = 9:1). For information on how to edit the configuration file, refer to the “Configuration File” section.

The minimum key size is 16 bytes; the maximum key size is 250 bytes. The precision of a proportion is 0.001: each proportion in the distribution is rounded to 3 decimal places.

The minimum value size is 1 byte; the maximum value size is 1M bytes. The precision of a proportion is 0.001: each proportion in the distribution is rounded to 3 decimal places.

Currently, memaslap only supports set and get commands. It supports 100% set and 100% get. For 100% get, it presets some objects to the server.

The high performance of memaslap benefits from its special scheduling of threads and concurrencies, so it is important to specify a proper number of each. The default number of threads is 1; the default number of concurrencies is 16. The user can use “--threads” and “--concurrency” to specify these values.

If the system supports setting CPU affinity and the number of threads specified by the user is greater than 1, memaslap tries to bind each thread to a different CPU core. To get the best performance from memaslap, it is therefore better to set the number of threads equal to the number of CPU cores, although it can also be less or greater than the number of CPU cores. Because of a limitation of the implementation, the number of concurrencies should be a multiple of the number of threads.
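The relation between the two settings can be written as a small check. This is a sketch under stated assumptions: the function name is hypothetical, and rejecting a non-multiple with an error is an assumption about how such a limitation could be enforced, not a statement of what memaslap does.

```python
def concurrency_per_thread(threads: int, concurrency: int) -> int:
    """Split the total concurrency evenly across threads.

    Raises ValueError when the total is not a multiple of the thread
    count (assumed handling; the real tool may behave differently).
    """
    if threads < 1 or concurrency < 1:
        raise ValueError("threads and concurrency must be positive")
    if concurrency % threads != 0:
        raise ValueError("concurrency must be a multiple of threads")
    return concurrency // threads
```

With the defaults mentioned above (1 thread, 16 concurrencies), the single thread runs all 16 concurrencies.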

Memaslap performs very well when used to test the performance of memcached servers. Most of the time, the bottleneck is the network or the server. If for some reason the user wants to limit the performance of memaslap, there are two ways to do this:

Decrease the number of threads and concurrencies.

Use the “--tps” option that memaslap provides to limit the throughput. This option lets the user get the expected throughput. For example, if the maximum throughput is 50 kops/s for a specific configuration, you can use “--tps” to specify a throughput equal to or less than that maximum.

Most of the time, the user does not need to specify the window size. The default window size is 10k. For Schooner Memcached, the user can specify different window sizes to get different cache miss rates based on the test case. Memaslap supports cache miss rates between 0% and 100%. If you use this utility to test the performance of Schooner Memcached, you can specify a proper window size to get the expected cache miss rate. The formula for calculating window size is as follows:

Assume that the key size is 128 bytes, and the value size is 2048 bytes, and concurrency=128.

Memaslap supports both data verification and expire-time verification. The user can use "--verify=" or "-v" to specify the proportion of data verification. In theory, it supports 100% data verification. The user can use "--exp_verify=" or "-e" to specify the proportion of expire-time verification. In theory, it supports 100% expire-time verification. Specify the "--verbose" option to get more detailed error information.

For example, --exp_verify=0.01 --verify=0.1 means that 1% of the objects are set with an expire time and 10% of the objects gotten are verified. When the objects are gotten, memaslap verifies the expire time and the value.

Memaslap supports multiple servers based on self-governed threads. The limitation is that the number of servers cannot be greater than the number of threads, because memaslap assigns at least one thread to handle each server. The user can use the "--servers=" or "-s" option to specify multiple servers.

For example, when three servers are specified and memaslap runs with 6 threads and 6 concurrencies per thread, threads 0 and 3 handle server 0 (10.1.1.1); threads 1 and 4 handle server 1 (10.1.1.2); and threads 2 and 5 handle server 2 (10.1.1.3).
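The thread-to-server mapping in this example is consistent with a simple round-robin assignment. The following Python sketch reproduces it; the function name is hypothetical, and round-robin is an inference from the example, so the real assignment may differ for other inputs.

```python
def assign_servers(servers, threads):
    """Map each thread index to a server in round-robin order."""
    if not servers:
        raise ValueError("at least one server is required")
    if len(servers) > threads:
        raise ValueError("number of servers cannot exceed number of threads")
    return {t: servers[t % len(servers)] for t in range(threads)}


mapping = assign_servers(["10.1.1.1", "10.1.1.2", "10.1.1.3"], threads=6)
```

Here `mapping[0]` and `mapping[3]` both refer to 10.1.1.1, matching the description of threads 0 and 3 handling server 0.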

All the threads and concurrencies in memaslap are self-governed, and so is memaslap itself. The user can start several memaslap instances, and can run memaslap on different client machines to communicate with the same memcached server at the same time. It is recommended that the user start the memaslap instances on different machines with the same configuration.

By default, memaslap runs in time mode, with a default run time of 10 minutes. When the time is up, memaslap exits. Do not specify both execute number mode and time mode at the same time; specify only one of them.

The user can use "--division=" or "-d" to specify the number of keys per multi-get. By default, memaslap does single gets over TCP. Memaslap also supports data verification and expire-time verification for multi-get.

Memaslap supports multi-get with both TCP and UDP. Because the ASCII protocol and the binary protocol are implemented differently, multi-get behaves differently in each. For the ASCII protocol, memaslap sends one “multi-get” command to the server. For the binary protocol, memaslap sends several single get commands together as a “multi-get” to the server.
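The difference between the two encodings can be sketched in Python. The ASCII form uses the standard `get <key> <key> ...\r\n` command, and the binary form concatenates one 24-byte-header request per key. The function names are hypothetical, and using the plain Get opcode (0x00) with the key index as the opaque field is an assumption; memaslap may use a different opcode or opaque value.

```python
import struct

# memcached binary request header (24 bytes): magic, opcode, key length,
# extras length, data type, reserved, total body length, opaque, CAS.
REQ_HEADER = struct.Struct("!BBHBBHIIQ")
REQUEST_MAGIC = 0x80
OPCODE_GET = 0x00  # assumed opcode for this sketch


def ascii_multi_get(keys):
    """One ASCII command that carries every key."""
    return b"get " + b" ".join(keys) + b"\r\n"


def binary_multi_get(keys):
    """Several single binary get requests packed back to back."""
    packets = []
    for opaque, key in enumerate(keys):
        header = REQ_HEADER.pack(
            REQUEST_MAGIC, OPCODE_GET, len(key), 0, 0, 0, len(key), opaque, 0
        )
        packets.append(header + key)
    return b"".join(packets)
```

For two keys `b"a"` and `b"bb"`, the ASCII form is a single line, while the binary form is two requests written in one buffer.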

Memaslap supports both UDP and TCP. For TCP, memaslap by default does not reconnect to the memcached server if socket connections are lost. If all the socket connections are lost or the memcached server crashes, memaslap exits. If the user specifies the “--reconnect” option, memaslap reconnects the socket connections when they are lost.

The user can use “--udp” to enable the UDP feature, but UDP comes with some limitations:

UDP cannot set data more than 1400 bytes.

UDP is not supported by the binary protocol because the binary protocol of memcached does not support it.

For example, with 2 replicated memcached servers, memaslap sets objects to both server 0 and server 1, gets from server 1 the objects previously set to server 0, and gets from server 0 the objects previously set to server 1. If server 0 crashes, memaslap only gets objects from server 1. If server 0 comes back to life, memaslap reconnects to it. If both server 0 and server 1 crash, memaslap exits.

Start memaslap with "--conn_sock=" or "-n" to give each concurrency multiple socket connections. Make sure that your system can support opening thousands of files and creating thousands of sockets. However, this feature does not support reconnection if sockets disconnect.

For example, if memaslap starts 8 threads, each thread has 16 concurrencies, and each concurrency has 128 TCP socket connections, then there are 8 * 16 = 128 concurrencies and the total number of TCP socket connections is 128 * 128 = 16384.
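The arithmetic behind this total can be checked with a short Python sketch; the function name is hypothetical and the numbers are the ones from the example.

```python
def socket_totals(threads, concurrency_per_thread, conn_sock):
    """Return (total concurrencies, total socket connections)."""
    concurrency = threads * concurrency_per_thread
    return concurrency, concurrency * conn_sock


concurrency, sockets = socket_totals(threads=8, concurrency_per_thread=16, conn_sock=128)
```

Since each socket consumes a file descriptor, a total like this is a useful figure to compare against the open-file limit of the system before starting a run.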

Since memcached 1.3.3 doesn't implement the binary UDP protocol, memaslap does not support UDP with the binary protocol. In addition, memcached 1.3.3 does not support multi-get for the binary protocol. If you specify the "--division=50" option, memaslap just sends 50 get commands together as a “multi-get” to the server.