Asynchronous Programming: Back to the Future

Asynchronous programming… Hearing these words, programmers’ eyes begin to shine, their breathing becomes shallow, their hands shake, and their brains start drawing multiple levels of abstraction… Managers’ eyes go wide, their sounds become inarticulate, their fists clench, and their voices switch to overtones. The only thing that unites these two groups of people is a rapid pulse, though for different reasons. While programmers are eager for the fight, managers are trying to look into the crystal ball, grasp the risks, and frantically come up with reasons to extend the deadlines as much as they can. Later, when most of the code has already been written, programmers begin to experience the bitterness of asynchronous programming, spending endless nights with a debugger, desperately trying to understand what is actually going on…

That’s exactly the picture my inflamed imagination paints when it hears “asynchronous programming”. Of course, all of this is overly emotional and not always true. Right? Various options are possible. Some people might say that “everything will work well with the right approach”. But you could say that on every possible occasion, and it doesn’t make anything better: bugs don’t get fixed, and insomnia doesn’t go away.

So, what is asynchronous programming? Why is it so attractive, and, most importantly, what’s wrong with it?

Introduction

Asynchronous programming is quite a popular subject nowadays; it’s enough to look around to see that. You will come across reviews of various libraries, the Go language, all sorts of asynchronous frameworks in JS, and many other things.

As a rule, asynchronous programming is used for network programming: various sockets-shmockets, readers-writers, and other acceptors. Interesting things happen elsewhere too, especially in UI. In this article, I am going to talk about network programming only. However, as we shall see in the next article, the approach can be expanded and deepened further than you might expect.

To be more specific, we are going to write a simple HTTP server that sends a standard response to any standard request. We are not going to write a parser, since parsing has about as much to do with asynchronous programming as the position of the stars has to do with a person’s character (see astrology).

Synchronous Single-Threaded Server

Hmm. Synchronous? Reading an article on asynchronous programming, a careful reader may point out that there is nothing asynchronous here at all. Well, first of all, we have to begin with something simple. Secondly, I am the author here, so it’s going to be this way. Later, you’ll find out what it’s for.

In order not to write low-level platform-dependent code, I’m going to use the powerful asynchronous library boost.asio for all our purposes. Fortunately, there are lots of articles written about it, so we can stay in the know.

For more clarity and «production-readiness» of the code, I am going to create wrappers around some of the boost.asio library’s functions. Some people certainly like things like boost::asio::ip::tcp::socket or boost::asio::ip::udp::resolver::iterator, but they result in less clear and less readable code.
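
Here is a rough sketch of what such wrappers might look like (the names service, Socket, Acceptor, and readUntil follow the article; the exact buffer handling and member layout are my assumptions):

#include <boost/asio.hpp>
#include <string>

// Hidden singleton: the user never has to know that an io_service exists.
boost::asio::io_service& service()
{
    static boost::asio::io_service instance;
    return instance;
}

struct Socket
{
    Socket() : socket(service()) {}

    // Read until the buffer ends with the given terminator ("\r\n\r\n" for HTTP
    // headers). We don't know the size beforehand, hence the resizing.
    void readUntil(std::string& buffer, const std::string& until)
    {
        std::size_t filled = buffer.size();
        while (buffer.size() < until.size() ||
               buffer.compare(buffer.size() - until.size(), until.size(), until) != 0)
        {
            buffer.resize(filled + 4096);
            filled += socket.read_some(
                boost::asio::buffer(&buffer[filled], buffer.size() - filled));
            buffer.resize(filled);
        }
    }

    void write(const std::string& buffer)
    {
        boost::asio::write(socket, boost::asio::buffer(buffer));
    }

    boost::asio::ip::tcp::socket socket;
};

struct Acceptor
{
    explicit Acceptor(int port)
        : acceptor(service(),
                   boost::asio::ip::tcp::endpoint(boost::asio::ip::tcp::v4(), port)) {}

    void accept(Socket& socket) { acceptor.accept(socket.socket); }

    boost::asio::ip::tcp::acceptor acceptor;
};

A single-threaded server then simply accepts a socket in a loop, reads until "\r\n\r\n", writes the canned response, and closes the connection.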

Here I’ve used a singleton for io_service, so as not to pass it around explicitly all the time. How would a user know that some io_service has to exist? That’s why I’ve hidden it away. I guess the rest is quite clear, except for the readUntil function. Its purpose is simple: read bytes until the desired terminator arrives. That’s exactly what we need for HTTP, since we can’t know the size beforehand, which is why we have to resize the buffer as we go.

Synchronous Multi-Threaded Server

Downsides of the previous server are obvious:

It is unable to handle several connections simultaneously.

The client could reuse the connection for more efficient interaction, yet we always close it.

That’s why I decided to process each connection in a separate thread while continuing to accept further connections. For this purpose, we are going to need a function that creates a new thread. I will name this function go:
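
A sketch of such a go function (using C++11 std::thread here; boost::thread would do just as well):

#include <functional>
#include <thread>

typedef std::function<void()> Handler;

// Run the handler in a freshly created, detached thread.
void go(Handler handler)
{
    std::thread([handler] {
        try
        {
            handler();
        }
        catch (const std::exception&)
        {
            // an error in one connection must not bring down the whole server
        }
    }).detach();
}

The accept loop then wraps the handling of every freshly accepted socket into go and keeps accepting.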

One would think that everything is all right, but it’s not the case. On high load, in real-world scenarios, it will go down quite fast. That’s why the smart guys have thought for a while and decided to go asynchronous.

Asynchronous Server

What’s wrong with the previous approach? The thing is that most of the time threads are waiting for events from the network, gobbling up resources instead of performing actual work. We would like to put threads to better use and have them do useful work.

Therefore, I am going to implement similar functions, but asynchronously, using the proactor pattern. What does this mean? It means that for every operation we call a function and pass it a callback that will be automagically invoked upon the completion of the operation. In other words, they will call us as soon as the operation completes. This differs from the reactor pattern, where we have to call the necessary processors ourselves, monitoring the state of operations. Typical examples of a reactor are epoll, kqueue, and the various selects. An example of a proactor is IOCP on Windows. I am going to use boost.asio, a cross-platform proactor.

Error handling changes significantly now. With the synchronous approach, we have two options: returning an error code or throwing an exception (the latter has been used since the beginning of the article). With an asynchronous call, there is only one way: passing the error into the handler. That is, not even through the result, but as an input parameter of the handler. Like it or not, you will have to handle errors as in the good old days when there were no exceptions, checking every little thing by hand. But that’s not even the most interesting part. The interesting part is when an error occurs inside the handler and we have to process it there. Restoring the context is a favorite pastime of asynchronous programming!
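
Here is a rough sketch of the asynchronous wrappers, built on top of the synchronous Socket and Acceptor above (IoHandler is the article’s name; its exact definition and the shape of the methods are my assumptions):

#include <boost/asio.hpp>
#include <functional>
#include <string>

boost::asio::io_service& service(); // the hidden singleton from the synchronous sketch

// The error is no longer thrown: it arrives as an input parameter of the handler.
typedef std::function<void(const boost::system::error_code&)> IoHandler;

struct Socket
{
    Socket() : socket(service()) {}

    // The caller must keep the buffer alive until the handler fires.
    void write(const std::string& buffer, IoHandler handler)
    {
        boost::asio::async_write(socket, boost::asio::buffer(buffer),
            [handler](const boost::system::error_code& error, std::size_t) {
                handler(error);
            });
    }

    boost::asio::ip::tcp::socket socket;
};

struct Acceptor
{
    explicit Acceptor(int port)
        : acceptor(service(),
                   boost::asio::ip::tcp::endpoint(boost::asio::ip::tcp::v4(), port)) {}

    void accept(Socket& socket, IoHandler handler)
    {
        acceptor.async_accept(socket.socket,
            [handler](const boost::system::error_code& error) {
                handler(error);
            });
    }

    boost::asio::ip::tcp::acceptor acceptor;
};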

I’ve used IoHandler as a one-size-fits-all handler type, which makes the code simpler and more generic.

Looking closer, the only difference from the synchronous functions is that the asynchronous ones take an additional handler as an input parameter.

Everything should be clear, except for the readUntil method. In order to call asynchronous reads on a socket several times, we need to save the state between calls. There is a special UntilHandler class for this purpose: it stores the current state of the asynchronous operation. A similar implementation can be found in boost.asio for various functions (like boost::asio::read) that require multiple calls of simpler (but no less asynchronous) operations.

In addition, we should write analogues of go and dispatch:

void go(Handler);
void dispatch(int threadCount = 0);

Here, we have two things: (a) a handler that will be run asynchronously in the thread pool, and (b) the creation of a thread pool with subsequent event dispatching.
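
A minimal sketch of these two functions on top of the same io_service singleton (defaulting threadCount to the number of cores is my assumption):

#include <boost/asio.hpp>
#include <functional>
#include <thread>
#include <vector>

boost::asio::io_service& service(); // the hidden singleton from before

typedef std::function<void()> Handler;

// Post the handler into the io_service: one of the pool threads will execute it.
void go(Handler handler)
{
    service().post(std::move(handler));
}

// Create the thread pool and process posted handlers and completions
// until there is no work left.
void dispatch(int threadCount)
{
    int count = threadCount > 0
        ? threadCount
        : static_cast<int>(std::thread::hardware_concurrency());
    std::vector<std::thread> threads;
    for (int i = 1; i < count; ++i)
        threads.emplace_back([] { service().run(); });
    service().run();
    for (std::thread& thread : threads)
        thread.join();
}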

Well, that’s it. The nesting of lambdas grows with each new call. Normally, no one writes such things with lambdas, as there are difficulties with recursion: a lambda would have to be passed to itself so that it could call itself. With named callbacks instead of lambdas, though, the readability stays pretty much the same, meaning equally bad compared to the synchronous code.
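
To give a feel for the nesting, the connection-handling part ends up looking roughly like this (a sketch on top of the asynchronous wrappers above; httpResponse() and the exact readUntil signature are assumptions made for illustration):

#include <boost/system/error_code.hpp>
#include <memory>
#include <string>

std::string httpResponse(); // hypothetical helper producing the canned reply

void handleConnection(std::shared_ptr<Socket> socket,
                      std::shared_ptr<std::string> buffer)
{
    socket->readUntil(*buffer, "\r\n\r\n",
        [socket, buffer](const boost::system::error_code& error) {
            if (error)
                return;
            // the response is held by a shared_ptr so it outlives the asynchronous write
            auto response = std::make_shared<std::string>(httpResponse());
            socket->write(*response,
                [socket, buffer, response](const boost::system::error_code& error) {
                    if (error)
                        return;
                    // to keep the connection alive we would have to read again here,
                    // and this is exactly where a lambda needs to call itself
                });
        });
}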

Let’s discuss pros and cons of the asynchronous approach:

The undoubted advantage (actually, the reason we’re doing all this) is performance. It is not just somewhat better; it is massively higher.

Now, it’s time to talk about disadvantages. Actually, there is just one — complex and confusing code, which is also more difficult to debug.

It’s good if you’ve written everything correctly and it all works without bugs. But what if it doesn’t? As they say, good luck with the debugging. Moreover, I have considered quite a simple example, in which we can still track the sequence of calls. Complicate the processing even slightly (say, simultaneous reads and writes on the same socket) and the complexity of the code shoots up, while the number of bugs starts to grow almost exponentially.

So, is it really worth it? Should we deal with asynchronous things? Actually, there is a solution, and it is called coroutines.

Coroutines

What do we all want? Health, happiness, and money? No, we want a simpler thing: to get the advantages of the asynchronous and synchronous approaches at the same time. That is, performance like the asynchronous approach, with code as simple as in the synchronous one.

Sounds great on paper but is it possible? To answer this question, we are going to need a brief introduction to coroutines.

What are regular procedures? We are at some place of execution, and then, *boom*, some procedure gets called. To call the procedure, we preserve the current point of return and then invoke it. It runs, completes, and returns control to the place it was called from. A coroutine is the same thing, only different. It also returns control to the place it was called from, but it does not complete: it stops at some point, from which it will continue during the next run. Thus, we get a sort of ping-pong. The caller throws the ball, the coroutine catches it, runs to another place, and throws it back. The caller also does something (runs to a different place) and then throws the ball back to where the coroutine last stopped. It goes on like this until the coroutine completes. In general, we could say that a procedure is a special case of a coroutine.

How can we use this for our asynchronous tasks? Keep in mind that a coroutine preserves its execution context, which is extremely important for asynchrony. That’s exactly what I am going to exploit. If a coroutine needs to perform an asynchronous operation, it will simply start the asynchronous call and exit the coroutine. Upon completion of the asynchronous operation, the handler will resume our coroutine from the point where the asynchronous operation was started. This means that all the dirty work of saving the context falls on the shoulders of the coroutine implementation.

That’s when the problems begin. The thing is that support for coroutines on the side of languages and processors is ancient history. To switch execution contexts nowadays, we have to perform quite a few operations: save the register state, switch the stack, and fill in a few bookkeeping slots so that the runtime environment keeps working correctly (exceptions, TLS, etc.). Moreover, the implementation depends not only on the processor architecture but also on the compiler and the operating system. Sounds like the final nail in the coffin…

Fortunately, there is boost.context, which implements everything necessary for each supported platform. Everything is written in assembler, in the best traditions. We could certainly use boost.coroutine, but why, when there’s boost.context? We need more hard rock!

The Implementation of Coroutines

So, we are going to write coroutines for our purposes. The interface is going to be like this:
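
A sketch of what such an interface could look like (resume, yield, and starter0 are the names used in the article; everything else here is my assumption):

#include <exception>
#include <functional>

namespace coro {

typedef std::function<void()> Handler;

// Are we currently executing inside a coroutine?
bool isInsideCoro();

// Suspend the current coroutine and return control to whoever called resume().
void yield();

struct Coro
{
    // Create the coroutine and run the handler until its first yield() or completion.
    explicit Coro(Handler handler);
    ~Coro();

    // Continue execution from the point of the last yield().
    void resume();

    // Has the handler run to the end?
    bool isComplete() const;

private:
    void starter0();              // trampoline that invokes the handler
    void yield0();                // raw context switch back to the caller

    Handler handler;              // what the coroutine executes
    bool started = false;         // is the handler currently between start and finish?
    std::exception_ptr exception; // exception to rethrow on the caller's side
    // the stack and the boost.context state are omitted here
};

} // namespace coro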

Inside resume we replace the old TLS value of t_coro with a new one (we need this for recursive switching between several coroutines), then set various flags and switch the context using boost::context::jump_fcontext. At the end, we restore the old values and rethrow the exception, if any.

Now, let’s have a look at the private starter0 method that invokes the necessary handler:
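
A sketch of what it could look like (the actual context switch via boost::context::jump_fcontext is hidden behind yield0 here, and the member names come from the sketch above):

void coro::Coro::starter0()
{
    started = true;
    try
    {
        // Take a private copy of the handler onto the coroutine stack before
        // invoking it: the original may be destroyed while we are suspended.
        Handler localHandler = std::move(handler);
        localHandler();
    }
    catch (...)
    {
        // Remember the exception: it will be rethrown on the caller's side
        // after we have switched back to the caller's context.
        exception = std::current_exception();
    }
    started = false;
    yield0(); // final switch back to the caller: the coroutine is complete
}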

I’d like to highlight one interesting thing: if we do not save the handler inside the coroutine (before calling it), the program can crash on a later return into the coroutine. This is because the handler stores some state that can be destroyed at some point.

Synca: async vice versa

It’s time to implement asynchrony on top of coroutines. A trivial implementation option is shown in the following diagram:

We create a coroutine; it starts an asynchronous operation and suspends itself using the yield() function. Upon completion of the operation, the coroutine continues its execution via the resume() method.

Everything would be fine if it weren’t for the notorious multithreading. As always, it brings some turbulence. That’s why the approach above will not work properly, which is clearly illustrated by the following diagram:

Right after the operation has been scheduled, the completion handler may fire and try to continue the execution before we have even exited the coroutine. This certainly wasn’t in our plans. Therefore, we have to complicate the sequence:

The difference is that we perform the scheduling not inside the coroutine, but outside of it, which eliminates the scenario described above. At the same time, the coroutine may be continued in another thread, which is quite normal behavior: coroutines are designed so that we can move them back and forth while preserving the execution context.

Small Remark

Surprisingly enough, boost.asio already has support for coroutines. To solve the problem mentioned above, io_service::strand is used, but that’s another story. It’s always interesting to write something of your own. Besides, the result obtained in this article is more convenient to use.
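
The central building block is a defer function. Here is a sketch of it and of its surroundings (t_error, handleError, onCoroComplete, and coro::yield are the article’s names; the rest is my assumption):

#include <boost/system/system_error.hpp>
#include <functional>

namespace coro { void yield(); } // from the coroutine interface above

namespace synca {

typedef std::function<void()> Handler;

// Thread-local state: the handler whose execution we postpone until the
// coroutine has been left, and the error of the last asynchronous operation.
thread_local Handler t_deferHandler;
thread_local boost::system::error_code t_error;

// Turn a stored error code back into an exception inside the coroutine.
void handleError()
{
    if (t_error)
        throw boost::system::system_error(t_error);
}

// Remember the scheduling handler, leave the coroutine and, once the
// coroutine has been resumed, convert a possible error into an exception.
void defer(Handler handler)
{
    t_deferHandler = std::move(handler);
    coro::yield();
    handleError();
}

// Called right after the coroutine has yielded (or completed):
// this is where the deferred scheduling actually happens.
void onCoroComplete()
{
    if (t_deferHandler)
    {
        Handler handler = std::move(t_deferHandler);
        t_deferHandler = nullptr;
        handler();
    }
}

} // namespace synca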

This function is always called inside the coroutine. We pass it a handler that will take care of scheduling the operation, that is, of starting the asynchronous call. This handler is preserved so that we can trigger it after exiting the coroutine (coro::yield). Right after we leave the coroutine, onCoroComplete is invoked, and it triggers our «deferred» handler. Here’s the usage of the defer function, taking Socket::accept as an example:
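
A sketch of how this could look (currentCoro() is a hypothetical accessor for the coroutine we are currently running in; I have placed the operation on the Acceptor wrapper, since that is where the underlying boost acceptor lives):

// The completion callback: store the error for handleError() and resume the coroutine.
IoHandler onCompleteHandler()
{
    coro::Coro* coro = currentCoro();
    return [coro](const boost::system::error_code& error) {
        t_error = error;  // remembered; turned into an exception by handleError()
        coro->resume();   // continue right after yield() inside defer()
    };
}

// To the coroutine, accept looks like a plain blocking call, but under the hood
// it defers an async_accept and suspends until the completion callback fires.
void Acceptor::accept(Socket& socket)
{
    IoHandler onComplete = onCompleteHandler();
    defer([this, &socket, onComplete] {
        acceptor.async_accept(socket.socket, onComplete);
    });
}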

onCompleteHandler returns the asynchronous handler that processes the completion of the asynchronous operation. Inside that handler, the error is stored in t_error, so that we can later throw an exception inside our coroutine (see handleError inside defer). Then coro->resume() is called and the coroutine continues, meaning we return to the defer method right after the yield() call. The diagram below shows the sequence of calls and the interaction between the various entities:

There’s just one difference here. In the synchronous implementation, socket acceptance takes place in the main thread, which is why there is no dispatch. However, we could make the two approaches absolutely identical: the synchronous implementation would also accept sockets in a separate thread started via go, while the dispatch function would wait for the completion of all threads.

But the difference in implementation is fundamentally important. The resulting code uses asynchronous network interaction, which makes it a much more efficient implementation. That’s where our goal is achieved: we wanted a symbiosis of the asynchronous and synchronous approaches, taking the best from both worlds (the simplicity of the synchronous approach and the performance of the asynchronous one).

Enhancement

Let’s have a look at how to enhance the process of accepting sockets. After an accept, there are usually two execution routes: the acceptor continues to accept, while the new socket is handled in a separate execution context. Therefore, we are going to create a new goAccept method:
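
A sketch of goAccept (the signature with a connection handler, and the assumption that the synca version of go starts the handler in a new coroutine, are mine):

#include <functional>
#include <memory>

typedef std::function<void(Socket&)> ConnectionHandler; // assumed signature

void Acceptor::goAccept(ConnectionHandler handler)
{
    auto socket = std::make_shared<Socket>();
    accept(*socket);            // suspends this coroutine until a client connects
    go([socket, handler] {      // a brand-new coroutine takes over the accepted socket
        handler(*socket);
    });
    // returning here lets the caller loop and immediately accept the next connection
}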

At first, I wanted to test the limit loads, but it turned out that a Gigabit network (rather than the CPU) gets fully saturated by one (!!!) thread. Therefore, I carried out the following test instead:

The server runs under the constant load of 30K RPS (30K requests per second).

CPU load is measured for both async and synca.

The results are provided below:

Method   Requests per Second   Number of Threads   Workload of the CPU Core
async    30000                 1                     75±5%
synca    30000                 1                     80±5%

It should be noted that the spread in the obtained values comes from fluctuations over the course of a single test run. Most likely, this is due to unevenness in channel load and processing.

Nevertheless, we can see that despite the additional context switches, as well as throwing exceptions instead of using return codes (an exception is generated each time a socket closes, meaning on every new request), the overhead is negligible. What if we added code that honestly parses the HTTP message, and code that no less honestly processes requests and does something important and necessary? It’s safe to say there would be no measurable difference in performance at all.

Question No.2. Okay, maybe. But is it possible to solve more complex asynchronous tasks this way?

Theorem. Any asynchronous task can be solved with the help of coroutines.

Proof.

First, let’s take a function that uses asynchronous calls. Any function can be converted into a coroutine, since a function is a special case of a coroutine. Then, let’s take any asynchronous call inside the converted coroutine. We can represent this call in the following form:

// code before the call
async(..., handler);
// code after the call

Let’s consider the case when there’s no code after the call:

// code before the call
async(..., handler);

In terms of the coroutine, such code is equivalent to the following:

// code before the call
synca(...);
handler();

Meaning that inside synca we invoke the corresponding asynchronous async function, which returns control to the coroutine upon completion of the operation, after which handler() is called explicitly. The result is exactly the same.

Now, let’s consider the more general case, where there is code after the asynchronous call. Such code is equivalent to:
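
One plausible way to write this equivalence down (a sketch, assuming that go from the coroutine world starts the wrapped code in a new coroutine):

// code before the call
go([] {
    synca(...);
    handler();
});
// code after the call

Inside go, the call has no code after it, which is exactly the case handled above, while the code after the call keeps running concurrently with the operation, just as in the asynchronous original.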

This means that we have one asynchronous call less. Applying this transformation to every asynchronous call in the function, and to every function, we can rewrite the entire code with coroutines. QED.

Summary

Asynchronous programming is bursting into programmers’ lives, and the complications that arise while writing such code can drive even the most experienced of them crazy. However, we should not forget about good old synchronous code. In skilled hands, asynchrony turns into elegant coroutines.

In the next article, we’re going to review a much more complex example that will reveal all the power and potential of coroutines!
