Improve Server Performance with Asynchronous WebAPI

Introduction

In this article I want to show how you can improve server performance dramatically by using Microsoft's asynchronous WebAPI. The text shows how a Microsoft WebAPI service backend can be changed to handle requests asynchronously and thereby increase the number of concurrent clients the server can handle. Asynchronous WebAPI services can also boost the user experience of client applications, because the user interface stays more responsive. This document does not describe the client side in depth; the focus is on the performance of the server side. In the following sections, I write a very simple web service using Microsoft WebAPI, measure the performance of concurrent requests to that service and compare the results to a second scenario in which the service is made asynchronous.

Environment

My test machine was a Windows 8.1 PC with an Intel Core™ i7 CPU with 6 cores at 3.24 GHz and 12 GB RAM. The WebAPI is hosted in the local IIS 8. The simulated clients also run on this machine. One can argue that this is not a realistic scenario for simulating concurrent users and measuring server performance, but it is enough to demonstrate some problems and how they can be solved with asynchronous WebAPI services.

The Server

Service Implementation

The server code is minimalistic. The code snippet below shows the implementation we need. There is only one WebAPI service that supports the HTTP GET method, in a WebAPI controller named LongOperationController. The service simulates a long running operation by calling Thread.Sleep(): it waits for 2 seconds before it returns and a response message is generated. So a client waits at least 2 seconds when calling the service.
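The original listing did not survive extraction; a minimal sketch of such a controller, based on the names mentioned in the text (route and response message are illustrative), could look like this:

```csharp
using System.Net;
using System.Net.Http;
using System.Threading;
using System.Web.Http;

public class LongOperationController : ApiController
{
    // Simulates a long running operation by blocking the current
    // worker thread for 2 seconds before the response is created.
    [HttpGet]
    public HttpResponseMessage DoLongRunningOperation()
    {
        Thread.Sleep(2000);
        return Request.CreateResponse(HttpStatusCode.OK, "Operation completed.");
    }
}
```

Note that Thread.Sleep() keeps the worker thread occupied for the full 2 seconds, which is exactly what this test scenario is meant to provoke.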

Notice: the actual service response time is a little longer than 2 seconds. Serialization/deserialization and transport of the HTTP messages over the wire take time, so the service response time also depends on network latency.

Thread Pool Configuration

There is another variable in this scenario that has to be considered: the thread pool size. As an ASP.NET WebAPI service hosted in IIS, the application has its own thread pool with a limited number of available threads. The default number of threads in the pool is calculated based on the number of CPU cores to find an optimal value, but it can be changed! There are two types of threads in a thread pool:

Worker threads are used for active work, e.g. when a work item is pushed into the thread pool. Client requests are handled this way: each request is handled by its own thread from the thread pool (as long as there are enough worker threads available in the pool).

Completion port threads are used to wait for asynchronous I/O operations to finish (e.g. when accessing storage/disk or receiving data from the network, not least when using the await keyword in a WebAPI service implementation). Completion port threads do not do a lot: they simply wait and block until they receive a signal.

The number of available worker threads in the thread pool affects the performance of our WebAPI services. If there are too many concurrent client requests, all worker threads are busy and new requests must be queued. As soon as a worker thread becomes available again, the next request is taken from the queue and processed. Queuing is expensive, so performance problems can occur.

The maximum number of threads in the thread pool can be increased in order to avoid those performance issues. The problem with this solution is that each thread takes about 1 MB of RAM, so you may have to scale up the machine's memory: 1000 extra threads mean 1 GB of extra memory! That is not the way to go here.

For our test scenario, we set the maximum number of threads in the pool to a small value: 200. This lets us simulate massive concurrent client requests and provoke performance problems at the server (due to the small number of worker threads, not all client requests can be handled). While the simulation runs, we can observe what happens to the server's performance and response times.

To query or set the number of threads, the code below can be used. The ThreadPool class provides static methods to query and set the minimum, maximum and available number of worker and completion port threads.

// Variables used to store the available worker and completion port thread counts.
int workerThreads, completionPortThreads;

// Variables used to store the min. and max. worker thread counts.
int minWorkerThreads, maxWorkerThreads;

// Variables used to store the min. and max. completion port thread counts.
int minCompletionPortThreads, maxCompletionPortThreads;

// Query the number of available threads in the app pool.
ThreadPool.GetAvailableThreads(out workerThreads, out completionPortThreads);

// Query the minimum number of threads in the app pool.
ThreadPool.GetMinThreads(out minWorkerThreads, out minCompletionPortThreads);

// Query the maximum number of threads that can be in the app pool.
ThreadPool.GetMaxThreads(out maxWorkerThreads, out maxCompletionPortThreads);

// Set the maximum number of threads available in the app pool.
ThreadPool.SetMaxThreads(workerThreads, completionPortThreads);
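For the test scenario described above, the pool is capped at 200 worker threads. A minimal sketch of that configuration (capping the completion port threads at 200 as well is my assumption, not stated in the text) could be:

```csharp
// Cap the pool at 200 worker and 200 completion port threads to
// provoke contention under heavy concurrent load in the test scenario.
// SetMaxThreads returns false if the values are below the core count.
bool ok = System.Threading.ThreadPool.SetMaxThreads(200, 200);
```

This call would typically be placed in Application_Start of the WebAPI host so the limit is in effect before the first request arrives.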

With the first service implementation and the thread pool configuration in place, the server side is done and we can implement the client code.

The Client

The code below shows the implementation of the client. The method creates a bunch of threads; the number of threads is given by the parameter requests, in order to simulate different numbers of concurrent client requests. Each thread calls the WebAPI service shown in the previous chapter and waits for the response. After all threads have been initialized in the for-loop, they are started at (more or less) the same time, and the method waits until all threads have completed, i.e. until every thread has called the WebAPI service and received a response (successful or not).
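The client listing is also missing from the extracted text; a sketch of the method described above could look like the following (the service URL and the use of WebClient are assumptions):

```csharp
using System.Net;
using System.Threading;

public static class LoadClient
{
    // Simulates the given number of concurrent requests against the service.
    public static void RunConcurrentRequests(int requests)
    {
        var threads = new Thread[requests];

        // Create one thread per simulated client request.
        for (int i = 0; i < requests; i++)
        {
            threads[i] = new Thread(() =>
            {
                using (var client = new WebClient())
                {
                    try
                    {
                        // Call the WebAPI service and block until the response arrives.
                        client.DownloadString("http://localhost/api/LongOperation");
                    }
                    catch (WebException)
                    {
                        // Failed requests show up in the performance counters.
                    }
                }
            });
        }

        // Start all threads at (more or less) the same time...
        foreach (var thread in threads)
            thread.Start();

        // ...and wait until every request has completed.
        foreach (var thread in threads)
            thread.Join();
    }
}
```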

The first scenario should show the problems with synchronous services. Therefore, the method above is called with different parameters: 1, 10, 20, 30, 40, 50, 100, 200, 300, 400, 500, 1000, 2000 and 3000. For the value 1000, for example, this means that 1000 requests are sent to the DoLongRunningOperation WebAPI service at the same time.

Performance

To measure the number of requests per second, the number of failed requests per second and the total number of failed requests handled by the WebAPI service, the appropriate performance counters have been configured in the performance monitor tool. The image below shows the measurement results while the client code was running. The red box marks the timeframe in which the client code from above was executed. The green line shows how many requests are handled per second. The server handles the requests at a nearly constant rate for 1-500 concurrent requests.

Problems occur with 1000, 2000 and 3000 concurrent requests. The blue line shows the number of failed requests per second: 1000 and more requests cannot be handled by the server in this scenario. As described above, there are only 200 worker threads at the server and the DoLongRunningOperation service takes 2 seconds to execute. So if 200 or more request operations are running at the same time, the worker threads are exhausted and requests are queued. Requests that cannot be queued fail. The blue line shows that a lot of requests failed in the scenarios with 1000 and more concurrent requests, and the red line shows how the total number of unhandled requests increases.

The next chapter shows how this situation can be improved by using asynchronous WebAPI services.

Figure 1: Performance of synchronous WebAPI services

The asynchronous Service Implementation

To improve the performance of the server in this test scenario, the WebAPI service DoLongRunningOperation is made asynchronous. The code below shows how to change the service implementation. First, "Async" is appended to the method name, following the naming conventions of the .NET framework.

The operation is simply marked with the async keyword and the return type is changed to a Task. That is almost all it takes to make the service asynchronous, but there is a little more to it in this case: since nothing asynchronous happens in the method, the code won't compile. A task has to be returned or an awaitable operation has to be awaited. So the Thread.Sleep() method cannot be used here anymore; instead, the awaitable Task.Delay() method is used.
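Since the async listing is also missing from the extracted text, here is a sketch of the changed controller as the paragraph describes it (the Task<HttpResponseMessage> return type and the response message are assumptions):

```csharp
using System.Net;
using System.Net.Http;
using System.Threading.Tasks;
using System.Web.Http;

public class LongOperationController : ApiController
{
    // Simulates a long running operation without blocking a worker
    // thread: Task.Delay is awaited instead of calling Thread.Sleep.
    [HttpGet]
    public async Task<HttpResponseMessage> DoLongRunningOperationAsync()
    {
        await Task.Delay(2000);
        return Request.CreateResponse(HttpStatusCode.OK, "Operation completed.");
    }
}
```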

Awaiting Task.Delay() means that DoLongRunningOperationAsync returns immediately when Task.Delay() is called, so the thread that handles the current request becomes available in the thread pool again and can handle other requests. Task.Delay() does not block a thread while it waits; it is based on a timer. When the delay has elapsed, the method is resumed on a thread pool thread, the response is created and returned to the client.

An asynchronous WebAPI service does not act asynchronously from the client's point of view (e.g. via a callback). The client still has to wait until the operation has finished, i.e. in this scenario until the 2-second delay is over and the response is returned; the client thread is blocked until then. "Asynchronous WebAPI services" means that those services do not block worker threads in the server's thread pool, so the server can handle more requests.

Changing the server code as described results in much better server performance. The next chapter explains the results for the same test scenario (the sequence of concurrent requests) as above.

Performance

The image below shows the performance measurement results when sending 1, 10, 20, 30, 40, 50, 100, 200, 300, 400, 500, 1000, 2000 and 3000 concurrent requests to the WebAPI service. Compared to the graph of the first measurements above, the plot below looks a lot smoother. Several points have been improved with the asynchronous WebAPI service:

There are no failed requests. As mentioned above, asynchronous calls in the WebAPI service are done in other background threads. So worker threads can be released earlier and they are available in the thread pool again to handle other requests. 200 worker threads in the thread pool seem to be enough to handle all requests in the test sequence from 1 to 3000 concurrent requests. In the first scenario the limit has been reached with 500 concurrent requests.

More requests per second can be handled by the WebAPI service. The image below shows that about 50 requests per second are handled during the test period. The results of the first test scenario show that only 4-5 requests per second could be handled (until 1000 concurrent requests were sent, where requests began to fail).

The test sequence is handled faster. In the scenario above it took about 40 minutes to run the test sequence from 1 to 500 concurrent requests. The graph below shows that the sequence from 1 to 400 concurrent requests is handled within 4 minutes; after 8 minutes the server had handled the sequence from 1 to 1000.

Figure 2: Performance of asynchronous WebAPI services

Conclusion

The test scenarios show that server performance can be improved dramatically by making WebAPI services asynchronous using async and await. Requests are handled faster, more requests can be handled and so fewer requests fail. Making WebAPI services asynchronous is very easy with the async and await language features of C#.

If you have any questions or if you need assistance with your code, do not hesitate to contact me! I would be happy to hear your comments, critique or opinions on this post. Please let me know!

Yes, the client works synchronously. But you can use e.g. the WebClient.DownloadStringTaskAsync method with the await keyword to make the call asynchronous. I didn't use an async method at the client in order to keep the test scenario clean: I want to show what happens when using async web services, and using async methods at the client would affect the performance measurements and make the results incomparable for my scenario. Otherwise you are correct.

As the architect says: it depends! 🙂 It depends, e.g., on which database you are using; on how the application is deployed (e.g. everything on one server, or separate servers for application and database); on whether there is load balancing or not; on the configuration of the servers and the application (e.g. the max request limit); on the server hardware; and so on.

Be careful: if you allow 20,000 concurrent requests (max request limit) it could take the server down very fast! There will be an open connection for each concurrent request. I would not recommend such a big limit; it looks too high from my point of view.

The question is: what do you really want to do? Maybe you can question the requirements and/or the current solution and solve your problem at another level. Do you really need 20,000 requests per second? Or can the problem be solved in another way?

Hi, I didn't reproduce your case, but I suspect that since your async task only awaits a delay, the await starts a background timer and returns immediately. So in your async version the request rate is much faster simply because sleeping does not block your worker thread.
I've done a similar experiment, but I did a CPU-bound job in the task, and the worker thread needs to wait for the task to be done in order to get the result. This time, the sync and async versions showed no difference.

You mentioned the difference between “Worker Threads” and “Completion Port Threads” and the issue that a new thread needs extra memory.

So freeing up a worker thread via async/await looks great from a memory point of view. But if I call await, doesn't that create a completion port thread? If so, do I really get a benefit in the number of running threads and in memory consumption?

There's one challenge I've encountered in the app I work on. The server runs a Web API for our client apps to use as a data resource. During the lifetime of any one service call, I want to retain certain contextual info, such as the user ID, the product ID and version, and other vital data (normally passed in the header of every API call). But of course, I don't want to have to pass this data through every single method call throughout the chain of operations. Should this be carried along as a property on Thread.CurrentPrincipal? Or is there a better way?

Hi David! Thanks a lot! 🙂 I am happy that the post is helpful for you.

What is your "chain of operations"? Do you call one service after another from the client side? Or do you call a service from the client once, and then call other operations within the service on the server side?

In case one, you have to send the contextual information as parameters, since WebApi services are RESTful and therefore stateless. Client context is not stored between service calls on the server side. Furthermore, the CurrentPrincipal on the server side is created anew with every WebApi service call.

In case two you can put the context into the CurrentPrincipal or pass it as method parameters to the next operation. Another way would be to store the context in an instance variable, but that is neither a nice nor a clean way.
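One way to sketch the CurrentPrincipal approach for case two is a WebAPI message handler that reads the header values once per request and stores them as claims (the header names and claim types here are purely illustrative, not from the original discussion):

```csharp
using System.Collections.Generic;
using System.Linq;
using System.Net.Http;
using System.Security.Claims;
using System.Threading;
using System.Threading.Tasks;

// Reads contextual headers once per request and stores them as claims
// on the current principal, so server-side code can read them without
// passing the values through every method call.
public class ContextHandler : DelegatingHandler
{
    protected override Task<HttpResponseMessage> SendAsync(
        HttpRequestMessage request, CancellationToken cancellationToken)
    {
        var identity = new ClaimsIdentity("Custom");

        IEnumerable<string> values;
        // "X-UserId" and "X-ProductId" are assumed header names.
        if (request.Headers.TryGetValues("X-UserId", out values))
            identity.AddClaim(new Claim("UserId", values.First()));
        if (request.Headers.TryGetValues("X-ProductId", out values))
            identity.AddClaim(new Claim("ProductId", values.First()));

        Thread.CurrentPrincipal = new ClaimsPrincipal(identity);
        return base.SendAsync(request, cancellationToken);
    }
}
```

The handler would be registered in HttpConfiguration.MessageHandlers. Note that with async/await involved, the principal should be read before the first await, since Thread.CurrentPrincipal does not reliably flow across thread switches.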

While it definitely shows the ability of asynchronous solutions to consume less resources, results might be different in a real production environment running Windows Server.

I am yet to try exactly your code in such an environment, but I performed similar tests using Web API self-hosting option (which is not prone to concurrent requests limitations), using Apache Benchmark as a client.

For a small number of concurrent requests (100), synchronous and asynchronous results were pretty close, with 47/48 requests per second and 2065/2027 ms median latency. The difference was more drastic for 1000 concurrent requests, with sync attaining 65 req/s and 10507 ms median latency, and async attaining 98.86 req/s and 10080 ms, with significantly lower latency deviation (1506 ms vs 8000 ms). I have not seen any failed requests at all.

At the moment I am not able to test more than 1000 requests, for an unknown reason, and my machine is a humble Pentium E6800 with 6 GB of memory running Windows 7.

While the numbers suggest that the asynchronous version is indeed more performant, the difference is not as dramatic as the tests on a desktop Windows IIS hint. I hope to return to this topic with more data.

I have tried a 7 second delay while continuously pumping 15 requests per second for 10 minutes. After 7-8 minutes my response time gradually increased from 7 seconds to 15 and more, so after running for hours the server system would go down.
When I tried the same scenario but replaced the 7 second delay with real code that takes about the same time (about 7 seconds initially), where the processing involves some API calls, loops and so on (time and resource consuming), I got a normal response time of 7 seconds only for the first 2-3 minutes.