Forked version of Atmos

A Scala library for describing retry-on-failure behavior using a concise, literate embedded DSL.

There are places in most modern software where small, intermittent errors can occur and disrupt the normal flow of execution. Traditionally, a retry loop is used in these situations, with constraints on the number of attempts that can be made and how certain results and errors are handled. A naive retry loop that would make up to three attempts, waiting 100 milliseconds between each attempt and accepting only non-empty results could look like this:
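For illustration, a minimal sketch of such a loop, with a deterministic stub standing in for the real `doSomethingThatMightFail()` operation:

```scala
object NaiveRetry {
  // Hypothetical flaky operation: returns an empty result twice, then succeeds.
  private var calls = 0
  def doSomethingThatMightFail(): Seq[String] = {
    calls += 1
    if (calls < 3) Seq.empty else Seq("data")
  }

  // Up to 3 attempts, 100 milliseconds apart, accepting only non-empty results.
  def doSomethingWithRetries(): Seq[String] = {
    var attempts = 0
    while (true) {
      attempts += 1
      try {
        val result = doSomethingThatMightFail()
        if (result.nonEmpty) return result
        if (attempts >= 3) sys.error(s"Empty result after $attempts attempts")
      } catch {
        case e: Exception if attempts >= 3 => throw e
        case _: Exception => // swallow the error and try again
      }
      Thread.sleep(100)
    }
    sys.error("unreachable")
  }
}
```

Even this small example mixes the retry bookkeeping with the work being attempted.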

Naive retry loops like this have a number of problems:

They obscure the actual work that the program is trying to do (the lone call to doSomethingThatMightFail() above).


They are convoluted and tend to contain mutable state, making them hard to reason about and resistant to change.

They are difficult and tedious to test, possibly leading to undiscovered bugs in the code base.

With the atmos library, a retry policy can be described using a minimalistic DSL, replacing unreadable retry loops with concise, descriptive text. The above example can be described using atmos like so:
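A sketch of that declaration, using only combinators documented in the sections below (`retrying`, which defaults to 3 attempts, and `constantBackoff`):

```scala
import scala.concurrent.duration._
import atmos.dsl._

// At most 3 attempts (the default), pausing 100 milliseconds between them.
implicit val retryPolicy = retrying using constantBackoff { 100 millis }

// The actual work stays front and center (doSomethingThatMightFail() is the
// hypothetical operation from the example above).
val result = retry() { doSomethingThatMightFail() }
```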

Atmos decomposes the traditional retry loop into these five independent strategies and allows you to easily recombine them in whatever fashion you see fit. A fully configured retry policy is encapsulated in the atmos.RetryPolicy class.

Termination Policies

Termination policies determine when a retry operation will make no further attempts. Any type that implements the atmos.TerminationPolicy trait can be used in a retry policy, but the DSL exposes factory methods for creating the most common implementations. DSL methods that define termination policies return a RetryPolicy configured with that termination policy and with default values for its other properties.

A default retry policy that limits an operation to 3 attempts can be created with retrying:
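For example:

```scala
import atmos.dsl._

// Terminate after at most 3 attempts.
implicit val retryPolicy = retrying
```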

Finally, a retry policy that immediately terminates can be created with neverRetry and a retry policy that never terminates (unless directed to by an error classifier) can be created with retryForever:
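For example:

```scala
import atmos.dsl._

// Make only a single attempt and never retry.
implicit val retryPolicy = neverRetry

// Keep retrying until an attempt succeeds or a fatal error occurs.
val otherRetryPolicy = retryForever
```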

Backoff Policies

Backoff policies specify the delay before subsequent retry attempts and are configured by calling using on an existing retry policy. Any type that implements the atmos.BackoffPolicy trait can be used in a retry policy, but the DSL exposes factory methods for creating the most common implementations.

There are four basic backoff policies provided by this library:

```scala
import scala.concurrent.duration._
import atmos.dsl._

// Wait 5 milliseconds between each attempt.
implicit val retryPolicy = retryForever using constantBackoff { 5 millis }

// Wait 5 seconds after the first attempt, then 10 seconds, then 15 seconds and so on.
val otherRetryPolicy = retryForever using linearBackoff { 5 seconds }

// Wait 5 minutes after the first attempt, then 10 minutes, then 20 minutes and so on.
val anotherRetryPolicy = retryForever using exponentialBackoff { 5 minutes }

// Wait 5 hours after the first attempt, then repeatedly multiply by the golden ratio after subsequent attempts.
val yetAnotherRetryPolicy = retryForever using fibonacciBackoff { 5 hours }
```

For each of the above policy declarations, the parameter list may be omitted and the default backoff duration of 100 milliseconds will be used:

```scala
import atmos.dsl._

// Wait the default 100 milliseconds between each attempt.
implicit val retryPolicy = retryForever using constantBackoff
```

Additionally, you can select the type of backoff to use based on the exception that caused the most recent attempt to fail:
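A sketch of exception-based backoff selection; the `selectedBackoff` factory name and its selector signature are assumptions about the DSL, and `SlowResourceException` is a hypothetical exception type:

```scala
import scala.concurrent.duration._
import atmos.dsl._

// Back off gently for slow resources, but only briefly for anything else.
implicit val retryPolicy = retryForever using selectedBackoff {
  case e: SlowResourceException => linearBackoff { 5 seconds }
  case _ => constantBackoff { 5 millis }
}
```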

Event Monitors

Event monitors are notified when retry attempts fail and are configured on a retry policy using monitorWith and alsoMonitorWith. Any type that implements the atmos.EventMonitor trait can be used in a retry policy, but the DSL exposes factory methods for creating the most common implementations.

Event monitors handle three distinct types of events:

Retrying events occur when an attempt has failed but another attempt is going to be made.

Interrupted events occur when an attempt has failed with a fatal error.

Aborted events occur when too many attempts have been made and failed.

This library supports using instances of Java's PrintStream and PrintWriter as targets for logging retry events. The specifics of what is printed can be customized for each type of event:
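For example, a sketch of customizing what is printed for each event type; the `onRetrying`/`onInterrupted`/`onAborted` modifiers and the print actions shown are assumptions about the monitor-customization vocabulary:

```scala
import atmos.dsl._

// Print nothing while retrying, a message when interrupted, and a message with
// a stack trace when the operation aborts.
implicit val retryPolicy = retryForever monitorWith {
  System.err onRetrying printNothing onInterrupted printMessage onAborted printMessageAndStackTrace
}
```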

Finally, multiple event monitors can be chained together and each monitor will be notified of every event:

```scala
import java.util.logging.Logger
import atmos.dsl._

// Submit information about failed attempts to stderr as well as an instance of java.util.logging.Logger.
implicit val retryPolicy = retryForever monitorWith System.err alsoMonitorWith Logger.getLogger("MyLoggerName")
```

Result Classifiers

Results that occur during a retry attempt can be classified as Acceptable or Unacceptable. Acceptable results will be immediately returned by the retry operation. Unacceptable results will be logged or suppressed, according to the status associated with the result, so that the retry operation can continue. Result classifications are defined in atmos.ResultClassification.

Result classifiers are simply implementations of PartialFunction that map instances of Any to the desired result classification. In situations where a classifier is not defined for a particular result, the result is considered Acceptable. The appropriate partial function type is defined as atmos.ResultClassifier and includes a factory in the companion object.

Result classifiers are configured by calling onResult to replace the classifier on an existing retry policy, or by using orOnResult to chain a result classifier to the one that a retry policy already contains:
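A sketch of both forms; `Unacceptable` is the classification described above, but its exact constructor arguments may differ between versions:

```scala
import atmos.dsl._
import atmos.ResultClassification._

// Treat empty collections as unacceptable so that the operation is retried.
implicit val retryPolicy = retryForever onResult {
  case s: Seq[_] if s.isEmpty => Unacceptable()
}

// Additionally treat negative numbers as unacceptable.
val otherRetryPolicy = retryPolicy orOnResult {
  case n: Int if n < 0 => Unacceptable()
}
```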

Error Classifiers

Errors that occur during a retry attempt can be classified as Fatal, Recoverable or SilentlyRecoverable. Fatal errors will interrupt a retry operation and cause it to immediately fail. Recoverable errors will be logged or suppressed so that the retry operation can continue. SilentlyRecoverable errors will be suppressed without being logged so that the retry operation can continue. Error classifications are defined in atmos.ErrorClassification.

Error classifiers are simply implementations of PartialFunction that map instances of Throwable to the desired error classification. In situations where a classifier is not defined for a particular error, scala.util.control.NonFatal is used to classify errors as Fatal or Recoverable. The appropriate partial function type is defined as atmos.ErrorClassifier and includes a factory in the companion object.

Error classifiers are configured by calling onError to replace the classifier on an existing retry policy, or by using orOnError to chain an error classifier to the one that a retry policy already contains:
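A sketch of both forms, assuming the DSL's `stopRetrying` and `keepRetryingSilently` aliases for the Fatal and SilentlyRecoverable classifications:

```scala
import java.io.IOException
import atmos.dsl._

// Give up immediately when a security violation occurs.
implicit val retryPolicy = retryForever onError {
  case _: SecurityException => stopRetrying
}

// Additionally, retry after I/O failures without logging them.
val otherRetryPolicy = retryPolicy orOnError {
  case _: IOException => keepRetryingSilently
}
```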

Retrying Synchronously

To retry synchronously you pass a block of code to the retry() method and that block is repeatedly executed until it completes successfully or ultimately fails in accordance with your policy. If a block completes successfully then the value the block evaluates to becomes the return value of retry(). If a block fails to complete successfully, meaning it was interrupted by a fatal error or had to abort after too many attempts, then the most recently thrown exception is thrown from retry().

Typically, a retry policy is declared as an implicit variable and the retry() method from the DSL is used to execute a synchronous retry operation. However, if you have multiple policies in the same scope (or if you want to avoid using implicit parameters) you can also call retry() directly on the policy object:
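For example (doSomethingThatMightFail() is a hypothetical operation):

```scala
import atmos.dsl._

// With an implicit policy in scope, use the DSL's retry() method.
implicit val retryPolicy = retrying
val result = retry() { doSomethingThatMightFail() }

// With multiple policies, call retry() on a specific policy object instead.
val patientPolicy = retryForever
val otherResult = patientPolicy.retry() { doSomethingThatMightFail() }
```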

When calling retry() you have the option of giving the operation a name that is included in any log messages:

```scala
import atmos.dsl._

implicit val policy = retryForever

// The following two statements will have a custom operation name in log messages:
retry("Doing something") { doSomething() }
val result1 = retry(Some("Getting something")) { getSomething() }

// The following two statements will have a generic operation name in log messages:
policy.retry() { doSomethingMysterious() }
val result2 = policy.retry(None) { getSomethingMysterious() }
```

You may optionally define an implicit rummage.Clock, from the rummage project, at the point the retry operation is invoked. This is the component responsible for blocking the calling thread until a backoff duration expires. By default, timing is controlled by a single, global daemon thread. It is unlikely that you will need to provide a custom clock outside of testing.

It is important to note that synchronous retry operations will block the calling thread while waiting for a backoff duration to expire. Use synchronous retries carefully in situations where you do not control the calling thread.

Retrying Asynchronously

Atmos supports asynchronous retries using Scala Futures. Asynchronous retries are much more involved than their single-threaded cousins, so care must be taken to understand the retry execution model.

To retry asynchronously you call the retryAsync() method and pass it a block of code that evaluates to a scala.concurrent.Future. A single asynchronous attempt consists of executing the block and evaluating the outcome of the resulting future. If either the block or the future fails then the attempt fails and normal retry behavior takes over. The retryAsync() method returns a future that tracks the entire retry operation regardless of how many attempts are made. If any attempt succeeds then the returned future succeeds with the same value; if the operation fails then the returned future fails with the last reported exception.

When retrying asynchronously, certain additional dependencies must be specified:

There must be an implicit scala.concurrent.ExecutionContext available at the point the retry operation is invoked. This execution context is where the block provided to retryAsync() will be executed during subsequent retries. This can typically be the same context used to execute your futures (if applicable).

You may optionally define an implicit rummage.Clock, from the rummage project, at the point the retry operation is invoked. This is the component responsible for providing non-blocking, asynchronous callbacks based on when a backoff duration expires. By default, timing is controlled by a single, global daemon thread. It is unlikely that you will need to provide a custom clock outside of testing unless you are working with actors.

The atmos DSL provides support for limiting the amount of time that a future is allowed to complete by providing a withDeadline method on scala.concurrent.Future.
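A sketch of combining the two (fetchSomethingAsync() is a hypothetical Future-returning operation):

```scala
import scala.concurrent.ExecutionContext.Implicits.global
import scala.concurrent.duration._
import atmos.dsl._

implicit val retryPolicy = retryForever

// Retry an asynchronous operation, failing any single attempt whose future
// does not complete within 10 seconds.
val result = retryAsync() {
  fetchSomethingAsync() withDeadline 10.seconds
}
```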

Asynchronous retries support the same operations as the synchronous form: you may optionally provide an operation name and you can either call this method via the DSL with an implicit retry policy or directly on the retry policy itself.
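For example (loadDataAsync() is a hypothetical Future-returning operation):

```scala
import scala.concurrent.ExecutionContext.Implicits.global
import atmos.dsl._

implicit val retryPolicy = retryForever

// Named asynchronous retry via the DSL and the implicit policy.
val result1 = retryAsync("Loading data") { loadDataAsync() }

// Direct invocation on a specific policy object.
val result2 = retryPolicy.retryAsync() { loadDataAsync() }
```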

Retrying with Actors

The atmos library has built-in support for Akka, specifically for retrying asynchronously when using the ask pattern. To use this library with actors there are only a couple extra steps involved beyond what is described in Retrying Asynchronously above.

First, you will want to make sure you have an implicit instance of rummage.Clock from the rummage project in scope; this ensures that your actor system is the one responsible for scheduling asynchronous backoff timers. This can be accomplished by importing rummage.AkkaClocks._ inside an actor or by creating an implicit rummage.AkkaClock yourself. Second, be sure to use Akka's logging support so that your entire retry operation remains non-blocking.
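A sketch of the pieces fitting together; the actor, its messages, and the monitorWith-on-Akka-logging usage are illustrative assumptions:

```scala
import scala.concurrent.duration._
import akka.actor._
import akka.event.Logging
import akka.pattern.{ask, pipe}
import akka.util.Timeout
import atmos.dsl._
import rummage.AkkaClocks._ // schedule backoff timers on the enclosing actor system

// Hypothetical actor that retries the ask pattern against a target actor.
class Querent(target: ActorRef) extends Actor {
  import context.dispatcher
  implicit val timeout: Timeout = Timeout(5.seconds)
  // Monitor with Akka's non-blocking logging to keep the retry operation asynchronous.
  implicit val retryPolicy = retryForever monitorWith Logging(context.system, this)

  def receive = {
    case msg => retryAsync("asking target") { target ? msg } pipeTo sender()
  }
}
```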