Please look carefully! getWithRetry() does not block. It returns CompletableFuture immediately and invokes given function asynchronously. You can listen for that Future or even for multiple futures at once and do other work in the meantime. So what this code does is: trying to connect to localhost:8080 and if it fails with SocketException it will retry after 500 milliseconds (with some random jitter), doubling delay after each retry, but not above 10 seconds.

thenAcceptBoth() callback is executed asynchronously when both slow and unreliable systems finally reply without any failure. Similarly (using CompletableFuture.acceptEither()) you can call two or more unreliable servers asynchronously at the same time and be notified when the first one succeeds after some number of retries.

I can’t emphasize this enough – retries are executed asynchronously and effectively use thread pool, rather than sleeping blindly.

Rationale

Often we are forced to retry given piece of code because it failed and we must try again, typically with a small delay to spare CPU. This requirement is quite common and there are few ready-made generic implementations with retry support in Spring Batch through RetryTemplate class being best known. But there are few other, quite similar approaches ([1], [2]). All of these attempts (and I bet many of you implemented similar tool yourself!) suffer the same issue – they are blocking, thus wasting a lot of resources and not scaling well.

This is not bad per se because it makes programming model much simpler – the library takes care of retrying and you simply have to wait for return value longer than usual. But not only it creates leaky abstraction (method that is typically very fast suddenly becomes slow due to retries and delay), but also wastes valuable threads since such facility will spend most of the time sleeping between retries. ThereforeAsync-Retry utility was created, targeting Java 8 (with Java 7 backport existing) and addressing issues above.

Don’t worry about RetryRunnable and RetryCallable – they allow checked exceptions for your convenience and most of the time we will use lambda expressions anyway.

Please note that it returns CompletableFuture. We no longer pretend that calling faulty method is fast. If the library encounters an exception it will retry our block of code with preconfigured backoff delays. The invocation time will sky-rocket from milliseconds to several seconds. CompletableFuture clearly indicates that. Moreover it’s not a dumb java.util.concurrent.Future we all know – CompletableFuture in Java 8 is very powerful and most importantly – non-blocking by default.

If you need blocking result after all, just call .get() on Future object.

Basic API

The API is very simple. You provide a block of code and the library will run it multiple times until it returns normally rather than throwing an exception. It may also put configurable delays between retries:

Returned CompletableFuture<Socket> will be resolved once connecting to localhost:8080 succeeds. Optionally we can consume RetryContext to get extra context like which retry is currently being executed:

This code is more clever than it looks. During first execution ctx.getRetryCount() returns 0, therefore we try to connect to localhost:8080. If this fails, next retry will try localhost:8081 (8080 + 1) and so on. And if you realize that all of this happens asynchronously you can scan ports of several machines and be notified about first responding port on each host:

Passing asyncHttp() to getWithRetry() will yield CompletableFuture<CompletableFuture<V>>. Not only it’s awkward to work with, but also broken. The library will barely call asyncHttp() and retry only if it fails, but not if returnedCompletableFuture<String> fails. The solution is simple:

In this case RetryExecutor will understand that whatever was returned from asyncHttp() is the actually just a Future and will (asynchronously) wait for result or failure. This library is much more powerful, so let’s dive into:

Configuration options

In general there are two important factors you can configure: RetryPolicy that controls whether next retry attempt should be made and Backoff – that optionally adds delay between subsequent retry attempts.

By default RetryExecutor repeats user task infinitely on every Throwable and adds 1 second delay between retry attempts.

Creating an instance of RetryExecutor

Default implementation of RetryExecutor is AsyncRetryExecutor which you can create directly:

The only required dependency is standard ScheduledExecutorService from JDK. One thread is enough in many cases but if you want to concurrently handle retries of hundreds or more tasks, consider increasing the pool size.

Notice that the AsyncRetryExecutor does not take care of shutting down the ScheduledExecutorService. This is a conscious design decision which will be explained later.

AsyncRetryExecutor has few other constructors but most of the time altering the behaviour of this class is most convenient with calling chained with*() methods. You will see plenty of examples written this way. Later on we will simply use executor reference without defining it. Assume it’s of RetryExecutor type.

Retrying policy

Exceptions

By default every Throwable (except special AbortRetryException) thrown from user task causes retry. Obviously this is configurable. For example in JPA you may want to retry a transaction that failed due to OptimisticLockException – but every other exception should fail immediately:

Where dao.optimistic() may throw OptimisticLockException. In that case you probably don’t want any delay between retries, more on that later. If you don’t like the default of retrying on every Throwable, just restrict that using retryOn():

executor.retryOn(Exception.class)

Of course the opposite might also be desired – to abort retrying and fail immediately in case of certain exception being thrown rather than retrying. It’s that simple:

Clearly you don’t want to retry NullPointerException or IllegalArgumentException as they indicate programming bug rather than transient failure. And finally you can combine both retry and abort policies. User code will retry in case of any retryOn() exception (or subclass) unless it should abortOn() specified exception. For example we want to retry every IOException or SQLException but abort if FileNotFoundException or java.sql.DataTruncation is encountered (order is irrelevant):

Max number of retries

Another way of interrupting retrying “loop” (remember that this process is asynchronous, there is no blocking loop) is by specifying maximum number of retries:

executor.withMaxRetries(5)

In rare cases you may want to disable retries and barely take advantage from asynchronous execution. In that case try:

executor.dontRetry()

Delays between retries (backoff)

Retrying immediately after failure is sometimes desired (see OptimisticLockException example) but in most cases it’s a bad idea. If you can’t connect to external system, waiting a little bit before next attempt sounds reasonably. You save CPU, bandwidth and other server’s resources. But there are quite a few options to consider:

should we add random “jitter” to delay times to spread retries of many tasks in time?

This library answers all these questions.

Fixed interval between retries

By default each retry is preceded by 1 second waiting time. So if initial attempt fails, first retry will be executed after 1 second. Of course we can change that default, e.g. to 200 milliseconds:

executor.withFixedBackoff(200)

If we are already here, by default backoff is applied after executing user task. If user task itself consumes some time, retries will be less frequent. For example with retry delay of 200ms and average time it takes before user task fails at about 50ms RetryExecutor will retry about 4 times per second (50ms + 200ms). However if you want to keep retry frequency at more predictable level you can use fixedRate flag:

executor.
withFixedBackoff(200).
withFixedRate()

This is similar to “fixed rate” vs. “fixed delay” approaches in ScheduledExecutorService. BTW don’t expect RetryExecutor to be very precise, it does it’s best but it heavily depends on aforementioned ScheduledExecutorService accuracy.

Exponentially growing intervals between retries

It’s probably an active research subject, but in general you may wish to expand retry delay over time, assuming that if the user task fails several times we should try less frequently. For example let’s say we start with 100ms delay until first retry attempt is made but if that one fails as well, we should wait two times more (200ms). And later 400ms, 800ms… You get the idea:

executor.withExponentialBackoff(100, 2)

This is an exponential function that can grow very fast. Thus it’s useful to set maximum backoff time at some reasonable level, e.g. 10 seconds:

Random jitter

One phenomena often observed during major outages is that systems tend to synchronize. Imagine a busy system that suddenly stops responding. Hundreds or thousands of requests fail and are retried. It depends on your backoff but by default all these requests will retry exactly after one second producing huge wave of traffic at one point in time. Finally such failures are propagated to other systems that, in turn, synchronize as well.

To avoid this problem it’s useful to spread retries over time, flattening the load. A simple solution is to add random jitter to delay time so that not all request are scheduled for retry at the exact same time. You have choice between uniform jitter (random value from -100ms to 100ms):

executor.withUniformJitter(100) //ms

…and proportional jitter, multiplying delay time by random factor, by default between 0.9 and 1.1 (10%):

executor.withProportionalJitter(0.1) //10%

You may also put hard lower limit on delay time to avoid to short retry times being scheduled:

executor.withMinDelay(50) //ms

Implementation details

This library was built with Java 8 in mind to take advantage of lambdas and new CompletableFuture abstraction (but Java 7 port with Guava dependency exists). It uses ScheduledExecutorService underneath to run tasks and schedule retries in the future – which allows best thread utilization.

But what is really interesting is that the whole library is fully immutable, there is no single mutable field, at all. This might be counter-intuitive at first, take for example this trivial code sample:

It might seem that all with*() methods or retryOn()/ abortOn() mutate existing executor. But that’s not the case, each configuration change creates new instance, leaving the old one untouched. So for example while first executor will retry on FileNotFoundException, the second and third won’t. However they all share the same scheduler. This is the reason why AsyncRetryExecutor does not shut down ScheduledExecutorService (it doesn’t even have any close() method). Since we have no idea how many copies of AsyncRetryExecutor exist pointing to the same scheduler, we don’t even try to manage its lifecycle. However this is typically not a problem (see Spring integration below).

You might be wondering, why such an awkward design decision? There are three reasons:

when writing a concurrent code immutability can greatly reduce risk of multi-threading bugs. For example RetryContext holds number of retries. But instead of mutating it we simply create new instance (copy) with incremented but final counter. No race condition or visibility can ever occur.

if you are given an existing RetryExecutor which is almost exactly what you want but you need one minor tweak, you simply call executor.with...() and get a fresh copy. You don’t have to worry about other places where the same executor was used (see: Spring integration for further examples)

functional programming and immutable data structures are sexy these days.

N.B.: AsyncRetryExecutor is not marked final, does you can break immutability by subclassing it and adding mutable state. Please don’t do this, subclassing is only permitted to alter behaviour.

Dependencies

Spring integration

If you are just about to use RetryExecutor in Spring – feel free, but the configuration API might not work for you. Spring promotes (or used to promote) the convention of mutable services with plenty of setters. In XML you define bean and invoke setters (via <property name="..."/>) on it. This convention assumes the existence of mutating setters. But I found this approach error-prone and counter-intuitive under some circumstances.

final int oldTimeout = template.getTimeout();
template.setTimeout(10_000);
//do the work
template.setTimeout(oldTimeout);

This code is wrong on so many levels! First of all if something fails we never restore oldTimeout. OK, finally to the rescue. But also notice how we changed global, shared TransactionTemplate instance. Who knows how many other beans and threads are just about to use it, unaware of changed configuration?

And even if you do want to globally change the transaction timeout, fair enough, but it’s still wrong way to do this. private timeout field is not volatile and thus changes made to it may or may not be visible to other threads. What a mess! The same problem appears with many other classes like JmsTemplate.

You see where I’m going? Just create one, immutable service class and safely adjust it by creating copies whenever you need it. And using such services is equally simple these days:

As you can see integrating modern, immutable services with Spring is just as simple. BTW if you are not prepared for such a big change when designing your own services, at least consider constructor injection.

Newsletter

Join them now to gain exclusive access to the latest news in the Java world, as well as insights about Android, Scala, Groovy and other related technologies.

Email address:

Recent Jobs

No job listings found.

Join Us

With 1,240,600 monthly unique visitors and over 500 authors we are placed among the top Java related sites around. Constantly being on the lookout for partners; we encourage you to join us. So If you have a blog with unique and interesting content then you should check out our JCG partners program. You can also be a guest writer for Java Code Geeks and hone your writing skills!

Disclaimer

All trademarks and registered trademarks appearing on Java Code Geeks are the property of their respective owners. Java is a trademark or registered trademark of Oracle Corporation in the United States and other countries. Examples Java Code Geeks is not connected to Oracle Corporation and is not sponsored by Oracle Corporation.