Why Asynchronous Servlets ?

Not Asynchronous IO

The concept of Asynchronous Servlets is often confused with Asynchronous IO or the use of NIO. However, Asynchronous Servlets are not primarily motivated by asynchronous IO, since:

HTTP Requests are mostly small and arrive in a single packet. Servlets rarely block on requests.

Many responses are small and fit within the server buffers, so servlets often do not block writing responses.

Even if we could expose asynchronous IO in a servlet, it is a hard paradigm to program. For example what would an application do if it read 2 bytes of a 3 byte UTF-8 character? It would have to buffer and wait for more bytes. This is best done by the container rather than the application.

Asynchronous Waiting

The main use-case for asynchronous servlets is waiting for non-IO events or resources. Many web applications need to wait at some stage during the processing of a HTTP request, for example:

waiting for a resource to be available before processing the request (e.g., thread, JDBC Connection)

waiting for a response from a remote service (e.g., RESTful or SOAP call to a web service).

The servlet API (<=2.5) supports only a synchronous call style, so that any waiting that a servlet needs to do must be with blocking. Unfortunately this means that the thread allocated to the request must be held during that wait along with all its resources: kernel thread, stack memory and often pooled buffers, character converters, EE authentication context, etc. It is wasteful of system resources to hold these resources while waiting.

Significantly better scalability and quality of service can be achieved if waiting is done asynchronously.

Asynchronous Servlet Examples

AJAX Comet Server Push

Web 2.0 applications can use the comet technique (aka AJAX Push, Server Push, Long Polling) to dynamically update a web page without refreshing the entire page.

Consider a stock portfolio web application. Each browser will send a long poll request to the server asking for any of the user's stock prices that have changed. The server will receive the long poll requests from all its clients, but will not immediately respond. Instead the server waits until a stock price changes, at which time it will send a response to each of the clients with that stock in their portfolio. The clients that receive the long poll response will immediately send another long poll request so they may obtain future price changes.

Thus the server will typically hold a long poll request for every connected user, so if the servlet is not asynchronous, there would need more than 1000 threads available to handle 1000 simultaneous users. 1000 threads can consume over 256MB of memory; that would be better used for the application rather than idly waiting for a price to change.

If the servlet is asynchronous, then the number of threads needed is governed by the time to generate each response and the frequency of price changes. If every user receives a price every 10 seconds and the response takes 10ms to generate, then 1000 users can be serviced with just 1 thread, and the 256MB of stack be freed for other purposes.

For more on comet see the cometd project that works asynchronously with Jetty

For an example of Jetty's solution, see the Cometd (aka Bayeux).

Asynchronous RESTful Web Service

Consider a web application that accesses a remote web service (e.g., SOAP service or RESTful service). Typically a remote web service can take hundreds of milliseconds to produce a response -- eBay's RESTful web service frequently takes 350ms to respond with a list of auctions matching a given keyword -- while only a few 10s of milliseconds of CPU time are needed to locally process a request and generate a response.

To handle 1000 requests per second, which each perform a 200ms web service call, a webapp would needs 1000*(200+20)/1000 = 220 threads and
110MB of stack memory. It would also be vulnerable to thread starvation if bursts occurred or the web service became slower.

If handled asynchronously, the web application would not need to hold a thread while waiting for web service response. Even if the asynchronous mechanism cost 10ms (which it doesn't), then this webapp would need 1000*(20+10)/1000 = 30 threads and 15MB of stack memory. This is a 86% reduction in the resources required and 95MB more memory would be available for the application.

Furthermore, if multiple web services request are required, the asynchronous approach allows these to be made in parallel rather than serially, without allocating additional threads.

Quality of Service (e.g., JDBC Connection Pool)

Consider a web application handling on average 400 requests per second, with each request interacting with the database for 50ms. To handle this load, 400*50/1000 = 20 JDBC connections are need on average. However, requests do not come at an even rate and there are often bursts and pauses. To protect a database from bursts, often a JDBC connection pool is applied to limit the simultaneous requests made on the database. So for this application, it would be reasonable to apply a JDBC pool of 30 connections, to provide for a 50% margin.

If momentarily the request rate doubled, then the 30 connections would only be able to handle 600 requests per second, and 200 requests per second would join those waiting on the JDBC Connection pool. Then if the servlet container had a thread pool with 200 threads, that would be entirely consumed by threads waiting for JDBC connections in 1 second of this request rate. After 1s, the web application would be unable to process any requests at all because no threads would be available. Even requests that do not use the database would be blocked due to thread starvation. To double the thread pool would require an additional 100MB of stack memory and would only give the application another 1s of grace under load!

This thread starvation situation can also occur if the database runs slowly or is momentarily unavailable. Thread starvation is a very frequently reported problem, and causes the entire web service to lock up and become unresponsive.

If the web container was able to threadlessly suspend the requests waiting for a JDBC connection, then thread starvation would not occur, as only 30 threads would be consumed by requests accessing the database and the other 470 threads would be available to process the request that do not access the database.

Servlet Threading Model

The scalability issues of Java servlets are caused mainly by the server threading model:

Thread per connection

The traditional IO model of Java associated a thread with every TCP/IP connection. If you have a few very active threads, this model can scale to a very high number of requests per second.

However, the traffic profile typical of many web applications is many persistent HTTP connections that are mostly idle while users read pages or search for the next link to click. With such profiles, the thread-per-connection model can have problems scaling to the thousands of threads required to support thousands of users on large scale deployments.

Thread per request

The Java NIO libraries support asynchronous IO, so that threads no longer need to be allocated to every connection. When the connection is idle (between requests), then the connection is added to an NIO select set, which allows one thread to scan many connections for activity. Only when IO is detected on a connection is a thread allocated to it. However, the servlet 2.5 API model still requires a thread to be allocated for the duration of the request handling.

This thread-per-request model allows much greater scaling of connections (users) at the expense of a small reduction to maximum requests per second due to extra scheduling latency.

Asynchronous Request handling

The Jetty Continuation (and the servlet 3.0 asynchronous) API introduce a change in the servlet API that allows a request to be dispatched multiple times to a servlet. If the servlet does not have the resources required on a dispatch, then the request is suspended (or put into asynchronous mode), so that the servlet may return from the dispatch without a response being sent. When the waited-for resources become available, the request is re-dispatched to the servlet, with a new thread, and a response is generated.

Feature

Jetty 6 Continuations

Asynchronous servlets were originally introduced with Jetty-6 Continuations, which were a Jetty-specific mechanism.

Jetty Continuations

From Jetty 7 onwards, the Continuations API has been extended to be a general purpose API that will work asynchronously on any servlet-3.0 container, as well as on Jetty 6, 7, or 8. Continuations will also work in blocking mode with any servlet 2.5 container. Continuations should be considered an application abstraction and portability layer on top of the implementation detail of asynchronous servlets.

Using Continuations

Obtaining a Continuation

The ContinuationSupport factory class can be used to obtain a continuation instance associated with a request:

The lifecycle of the request will be extended beyond the return to the container from the Servlet.service(...) method and Filter.doFilter(...) calls. When these dispatch methods return, the suspended request will not yet be committed and a response will not yet be sent to the HTTP client.

Once the request has been suspended, the continuation should be registered with an asynchronous service so that it may be used by an asynchronous callback when the waited-for event happens.

The request will be suspended until either continuation.resume() or continuation.complete() is called. If neither is called then the continuation will timeout. The timeout should be set before the suspend, by a call to continuation.setTimeout(long); if no timeout is set, then the default period is used. If no timeout listeners resume or complete the continuation, then the continuation is resumed with continuation.isExpired() true.

There is a variation of suspend for use with request wrappers and the complete lifecycle (see below):

continuation.suspend(response);

Suspension is analogous to the servlet 3.0 request.startAsync() method. Unlike jetty-6 continuations, an exception is not thrown by suspend and the method should return normally. This allows the registration of the continuation to occur after suspension and avoids the need for a mutex. If an exception is desirable (to bypass code that is unaware of continuations and may try to commit the response), then continuation.undispatch() may be called to exit the current thread from the current dispatch by throwing a ContinuationThrowable.

Resuming a Request

Once an asynchronous event has occurred, the continuation can be resumed:

When a continuation is resumed, the request is redispatched to the servlet container, almost as if the request had been received again. However during the redispatch, the continuation.isInitial() method returns false and any attributes set by the asynchronous handler are available.

Completing a Request

As an alternative to resuming a request, an asynchronous handler may write the response itself. After writing the response, the handler must indicate the request handling is complete by calling the complete method:

Continuation Patterns

Suspend Resume Pattern

The suspend/resume style is used when a servlet and/or filter is used to generate the response after an asynchronous wait that is terminated by an asynchronous handler. Typically a request attribute is used to pass results and to indicate if the request has already been suspended.

void doGet(HttpServletRequest request, HttpServletResponse response){// if we need to get asynchronous resultsObject results = request.getAttribute("results");if(results==null){final Continuation continuation = ContinuationSupport.getContinuation(request);// if this is not a timeoutif(continuation.isExpired()){
sendMyTimeoutResponse(response);return;}// suspend the request
continuation.suspend();// always suspend before registration// register with async service. The code here will depend on the// the service used (see Jetty HttpClient for example)
myAsyncHandler.register(new MyHandler(){publicvoid onMyEvent(Object result){
continuation.setAttribute("results",results);
continuation.resume();}});return;// or continuation.undispatch();}// Send the results
sendMyResultResponse(response,results);}

This style is very good when the response needs the facilities of the servlet container (e.g., it uses a web framework) or if one event may resume many requests so the container's thread pool can be used to handle each of them.

Suspend Continue Pattern

The suspend/complete style is used when an asynchronous handler is used to generate the response:

void doGet(HttpServletRequest request, HttpServletResponse response){final Continuation continuation = ContinuationSupport.getContinuation(request);// if this is not a timeoutif(continuation.isExpired()){
sendMyTimeoutResponse(request,response);return;}// suspend the request
continuation.suspend(response);// response may be wrapped.// register with async service. The code here will depend on the// the service used (see Jetty HttpClient for example)
myAsyncHandler.register(new MyHandler(){publicvoid onMyEvent(Object result){
sendMyResultResponse(continuation.getServletResponse(),results);
continuation.complete();}});}

This style is very good when the response does not need the facilities of the servlet container (e.g., it does not use a web framework) and if an event will resume only one continuation. If many responses are to be sent (e.g., a chat room), then writing one response may block and cause a DOS on the other responses.

Continuation Examples

Chat Servlet

The ChatServlet example shows how the suspend/resume style can be used to directly code a chat room. The same principles are applied to frameworks like cometd.org which provide an richer environment for such applications, based on Continuations.

Quality of Service Filter

The QoSFilter(javadoc), uses suspend/resume style to limit the number of requests simultaneously within the filter. This can be used to protect a JDBC connection pool or other limited resource from too many simultaneous requests.

If too many requests are received, the extra requests wait for a short time on a semaphore, before being suspended. As requests within the filter return, they use a priority queue to resume the suspended requests. This allows your authenticated or priority users to get a better share of your server's resources when the machine is under load.

Denial of Service Filter

The DosFilter(javadoc) is similar to the QoSFilter, but protects a web application from a denial of service attack, as much as is possible from within a web application.

If too many requests are detected coming from one source, then those requests are suspended and a warning generated. This works on the assumption that the attacker may be written in simple blocking style, so by suspending you are hopefully consuming their resources. True protection from DOS can only be achieved by network devices (or eugenics :)).

Proxy Servlet

Gzip Filter

The Jetty GzipFilter is a filter that implements dynamic compression by wrapping the response objects. This filter has been enhanced to understand continuations, so that if a request is suspended in suspend/complete style and the wrapped response is passed to the asynchronous handler, then a ContinuationListener is used to finish the wrapped response. This allows the GzipFilter to work with the asynchronous ProxyServlet and to compress the proxied responses.