Saturday, May 19, 2012

Inter Thread Communication: Socketpairs vs. In-Memory Buffers

In multi-threaded applications, efficient communication between threads is often a challenge. A good scheme for doing this should have the following characteristics:

it should be lock-free (threads should not have to block while communicating)

it should respect message boundaries when multiple threads write to a single reader

Socketpairs are often used for this. A socketpair is a pair of connected, bi-directional sockets, with a file descriptor for each end. Data written to one end can be read from the other. Give one end to one thread and the other end to another, and you have an inter-thread communication pipe with no explicit locking. Or do you?

Socketpairs are a facility provided by the kernel, so every read and write involves an expensive trip across the user/kernel boundary (a system call). When a process writes to a socketpair and then reads from it, it switches to kernel mode and copies part of its memory into a kernel buffer, switches back to user mode, switches into the kernel again to do a 'select' or equivalent and back, and then switches to kernel mode once more to copy the data from the kernel buffer back into user memory, and back to user mode. Phew! That piece of memory was right there in the same user space, damn it! Doing inter-thread communication with socketpairs is like snaking your arm around your head to touch your nose.

Is there a better way?

People usually avoid in-memory message buffers because they require locking: the writer must lock the buffer while writing, so that the reader does not see a half-written message and no other writer clobbers it.

In earlier days, when semaphores were the only locking primitive available, locking was an expensive mechanism. Semaphores are meant to do much more than a mutex: they can be used across processes, they can be locked and unlocked by different threads or processes, and they can maintain a count. They interact with the scheduler much more deeply and are hence considered 'heavy'.

With mutex-based locks now available on most systems, locking can be much lighter. A mutex is lightweight because it is simpler: it is limited to a single process, it can be unlocked only by the same thread that locked it, and it is binary (count 0 or 1). Socketpairs did have an advantage a few years back, when these lighter locking primitives were not available. But not any more.

Below I've pasted a piece of code comparing the two mechanisms discussed above. I've implemented a simple queue with two locks: a head lock for the reader and a tail lock for the writer. While the queue has data, it can be written to and read from without any lock contention. Only when the reader finds no data does it take the tail lock, to flush any cached data and check whether the queue really is empty. There is a chance of contention at that point, but it will be very infrequent.

Compile and run the code, and it will print out the command line arguments required for the two modes. Here's what I got on my laptop:

For the in-memory queue:
real    0m4.005s
user    0m4.113s
sys     0m0.197s

For the socketpair:
real    0m22.875s
user    0m5.770s
sys     0m39.505s

In my run, socketpairs took about 10 times more CPU (most of it system time) and were over 5 times slower in wall-clock time than the in-memory queue.