The problem is that db.read_object is a blocking call. The thread cannot continue until the call completes, because the method has to return the database object. And remember, we are still on the single thread running the message loop.

Therefore, if a thousand clients arrive at the same time, they are served one by one, with the last one waiting for the other 999 database calls to complete. In other words, the throughput of our server is terribly low.
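To make the cost concrete, here is a minimal Python sketch of that situation. It is not taken from any real toolkit: BlockingDb, serve_all and timed_run are hypothetical names, and the database round trip is simulated with time.sleep. The only point is that the total time grows linearly with the number of clients.

```python
import time


class BlockingDb:
    """Hypothetical stand-in for a blocking client library."""

    def __init__(self, latency=0.01):
        self.latency = latency  # simulated round trip to the database

    def read_object(self, obj_id):
        time.sleep(self.latency)  # the calling thread is stuck here
        return {"id": obj_id}


def serve_all(db, request_ids):
    """Single-threaded 'message loop': handles one request at a time."""
    responses = []
    for obj_id in request_ids:
        obj = db.read_object(obj_id)  # nothing else can run meanwhile
        responses.append(obj)         # "send the response to the client"
    return responses


def timed_run(n_clients=20, latency=0.01):
    """Serve n_clients requests and return (responses, elapsed seconds)."""
    db = BlockingDb(latency)
    start = time.perf_counter()
    responses = serve_all(db, range(n_clients))
    return responses, time.perf_counter() - start


if __name__ == "__main__":
    responses, elapsed = timed_run()
    print(f"served {len(responses)} clients in {elapsed:.2f}s")
```

With these assumed numbers, the elapsed time is roughly n_clients times latency, because every request waits for all the requests ahead of it.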

The whole difference is that the db library is itself non-blocking. db.read_object puts the passed callback function into some data structure and returns immediately, so our (single!) main thread can happily continue accepting requests. The db object runs its own message loop internally (on its own thread, so our server has two threads now). In that internal loop, the db object polls for responses from the external database and calls our function back when one arrives.
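The mechanism described above can be sketched in Python roughly as follows. This is an illustration under stated assumptions, not the API of any real database library: NonBlockingDb and run_demo are hypothetical names, and the external database is simulated by a fixed latency per request, so that all requests are "in flight" at the same time.

```python
import queue
import threading
import time


class NonBlockingDb:
    """Hypothetical non-blocking client: read_object() only registers a callback."""

    def __init__(self, latency=0.05):
        self.latency = latency          # simulated database response time
        self._pending = queue.Queue()   # (due_time, obj_id, callback)
        # The library's own internal loop, running on its own thread.
        self._thread = threading.Thread(target=self._loop, daemon=True)
        self._thread.start()

    def read_object(self, obj_id, callback):
        # Remember the request and return immediately.
        due = time.monotonic() + self.latency
        self._pending.put((due, obj_id, callback))

    def _loop(self):
        while True:
            due, obj_id, callback = self._pending.get()
            delay = due - time.monotonic()
            if delay > 0:
                time.sleep(delay)       # wait until the "response" is ready
            callback({"id": obj_id})    # call back into the server code


def run_demo(n_clients=1000, latency=0.05):
    """Fire n_clients requests from one thread; return (count, dispatch, total)."""
    db = NonBlockingDb(latency)
    responses = []
    all_done = threading.Event()

    def on_object(obj):
        responses.append(obj)           # "send the response to the client"
        if len(responses) == n_clients:
            all_done.set()

    start = time.perf_counter()
    for obj_id in range(n_clients):
        db.read_object(obj_id, on_object)   # does not block the main thread
    dispatch_time = time.perf_counter() - start
    all_done.wait(timeout=5)
    total_time = time.perf_counter() - start
    return len(responses), dispatch_time, total_time


if __name__ == "__main__":
    count, dispatch, total = run_demo()
    print(f"dispatched in {dispatch:.3f}s, {count} responses after {total:.3f}s")
```

In this sketch the total time stays close to a single round trip instead of growing with the number of clients, because the main thread never waits for the database.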

Now, if a thousand clients come at the same time, a thousand requests to the database will be fired almost instantly and remembered by the db object, and a response will be sent back to each client as soon as the database returns the object that client requested.

Now this is the awesome non-blocking IO that everyone is talking about. The server really is handling a thousand clients "in parallel" using only two threads.

Of course, there will still be one thousand open sockets but we managed to handle them all using only two threads.

Appendix - what to do when you only have a blocking client library:

If the only API your client library gives you is 'obj = db.read_object(id)', you will essentially need to do the following:
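A minimal sketch of one common way to do this in Python, assuming the standard concurrent.futures.ThreadPoolExecutor as the threadpool: the IO thread hands each blocking call to a pool thread and returns to its loop right away. BlockingDb, serve_all and timed_run are hypothetical names, and the database latency is simulated with time.sleep.

```python
import threading
import time
from concurrent.futures import ThreadPoolExecutor


class BlockingDb:
    """Hypothetical stand-in for a client library that only offers a blocking call."""

    def __init__(self, latency=0.05):
        self.latency = latency

    def read_object(self, obj_id):
        time.sleep(self.latency)  # blocks whichever thread calls it
        return {"id": obj_id}


def serve_all(db, request_ids, max_workers=4):
    """Hand each blocking call to a pool thread so the IO thread stays free."""
    responses = []
    lock = threading.Lock()

    def handle(obj_id):
        obj = db.read_object(obj_id)   # a pool thread sits idle here
        with lock:
            responses.append(obj)      # "send the response to the client"

    # Leaving the with-block waits for all submitted work to finish.
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        for obj_id in request_ids:
            pool.submit(handle, obj_id)  # returns immediately; queues if pool is busy
    return responses


def timed_run(n_clients=8, latency=0.05, max_workers=4):
    """Serve n_clients requests and return (responses, elapsed seconds)."""
    db = BlockingDb(latency)
    start = time.perf_counter()
    responses = serve_all(db, range(n_clients), max_workers)
    return responses, time.perf_counter() - start


if __name__ == "__main__":
    responses, elapsed = timed_run()
    print(f"served {len(responses)} clients in {elapsed:.2f}s")
```

With 8 requests and 4 pool threads, the work needs at least two rounds of blocking calls, so the pool size caps how many requests can be in flight at once.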

This way you free the IO thread (the one running the message loop) to accept more incoming requests, but your server is now blocking, since it uses one thread per client (each of those threads simply sits idle, waiting for the response from the database). If many requests come in at once, the threadpool will start new threads. Once the maximum number of threads is reached, further calls are simply queued until a thread finishes and becomes available, which seriously limits throughput.

The takeaway from this article is: in order for your request handling code to be non-blocking, it has to be composed entirely of non-blocking API calls. A "non-blocking" server toolkit by itself does not guarantee high concurrency/throughput.