When our service depends on a third-party service’s endpoint for a response, our own service can fail for reasons ranging from a bad connection to an internal server error on the third party’s side. A circuit breaker is employed to keep such failures from cascading through our services. And if you are here, you are probably looking to customize the circuit breaker within your own service. So, shall we?

First off, when working with Lagom in Java, the circuit breaker functionality is on even if you don’t do anything about it. But it is always better to customize things to suit our own needs, right? So let’s first get the concept of a circuit breaker clear in our heads. A circuit breaker can be in one of the following three states at any instant:

Closed,

Open, and

Half-Open

Also, there are three parameters that govern the state of our circuit breaker, which we can specify in the application.conf file in the /resources folder:

max-failures

reset-timeout

call-timeout

Consider two services, S1 (client service) and S2 (supplier service), and imagine a circuit between them. A circuit breaker is like a switch that exists between these two services. Initially, this switch is closed, and the flow of request/response between the two services goes smoothly. Also, assume that we have set the values of our three parameters as below:

max-failures = 10

reset-timeout = 15 seconds

call-timeout = 10 seconds

Now, if for whatever reason S2 fails to deliver a response to S1 within the call-timeout set in the configuration (10 seconds here), it is counted as a failure. Once the number of failures reaches max-failures as specified in application.conf (10 failures in our case), the circuit breaker goes to the open state and stays there for the duration of reset-timeout (15 seconds in this example). During this state, if S1 tries to reach S2, the circuit breaker fails the request immediately with a CircuitBreakerOpenException instead of calling S2. After the reset-timeout period is over, the circuit breaker temporarily enters the half-open state and lets the next request from S1 through to S2 as a trial. If that request fails as well, the circuit breaker goes back to the open state for another reset-timeout period. However, if S1 receives a successful response from S2 (for example, with status code 200), the circuit breaker goes back to the closed state.
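To make these transitions concrete, here is a minimal sketch of the state machine in plain Java. It is only an illustrative model, not Lagom's or Akka's actual implementation: the class SimpleCircuitBreaker, its methods, and the choice to pass the current time in as a parameter are all assumptions made for this example. It also assumes that failures are counted consecutively, i.e. that a success resets the counter.

```java
import java.time.Duration;

// Illustrative model of the three states; NOT the Lagom/Akka implementation.
class SimpleCircuitBreaker {
    enum State { CLOSED, OPEN, HALF_OPEN }

    // Hypothetical stand-in for CircuitBreakerOpenException.
    static class OpenException extends RuntimeException {
        OpenException() { super("circuit breaker is open"); }
    }

    private final int maxFailures;
    private final long resetTimeoutMillis;
    private final long callTimeoutMillis;
    private State state = State.CLOSED;
    private int failures = 0;
    private long openedAtMillis = 0L;

    SimpleCircuitBreaker(int maxFailures, Duration resetTimeout, Duration callTimeout) {
        this.maxFailures = maxFailures;
        this.resetTimeoutMillis = resetTimeout.toMillis();
        this.callTimeoutMillis = callTimeout.toMillis();
    }

    // Call before sending a request. The clock is a parameter so the model is deterministic.
    void beforeCall(long nowMillis) {
        if (state == State.OPEN) {
            if (nowMillis - openedAtMillis >= resetTimeoutMillis) {
                state = State.HALF_OPEN; // reset-timeout elapsed: allow one trial request
            } else {
                throw new OpenException(); // fail fast without calling the supplier
            }
        }
    }

    // Call after the request finishes. A slow response counts as a failure too.
    void afterCall(long nowMillis, long elapsedMillis, boolean succeeded) {
        if (succeeded && elapsedMillis <= callTimeoutMillis) {
            failures = 0;            // assumption: a success resets the failure count
            state = State.CLOSED;
        } else {
            failures++;
            if (state == State.HALF_OPEN || failures >= maxFailures) {
                state = State.OPEN;  // trip (or re-trip) the breaker
                openedAtMillis = nowMillis;
            }
        }
    }

    State state() { return state; }
}
```

With the values used above (max-failures = 10, reset-timeout = 15 seconds, call-timeout = 10 seconds), ten failed calls in a row move this model to OPEN, a call attempted 5 seconds later is rejected, and a call attempted 15 seconds later is let through as the half-open trial.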

So let’s get to the configuration part in our implementation.

In the /resources/application.conf within your service-impl, add the following configuration for now.

# Default circuit breaker configuration.
lagom.circuit-breaker {
  # Default configuration that is used if a configuration section
  # with the circuit breaker identifier is not defined.
  default {
    # Enable/Disable circuit breaker.
    enabled = on
    # Number of failures before opening the circuit.
    max-failures = 10
    max-failures = ${?CIRCUIT_BREAKER_MAX_FAILURES}
    # Duration of time in open state after which to attempt to close
    # the circuit, by first entering the half-open state.
    reset-timeout = 15s
    reset-timeout = ${?CIRCUIT_BREAKER_RESET_TIMEOUT}
    # Duration of time after which to consider a call a failure.
    call-timeout = 10s
    call-timeout = ${?CIRCUIT_BREAKER_CALL_TIMEOUT}
  }
}

Once you’ve done that, you have specified a default configuration for your circuit breaker. You can make modifications in these configurations to suit your own requirements, but is it really this simple?

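Actually, yes! You can customize the configuration further, right down to individual calls. For the rest of this post, I will assume that your Lagom service’s descriptor is named demo and exposes two calls, /demopath1 and /demopath2, and that /demopath2 is explicitly assigned the circuit breaker identified by demo2.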

If application.conf has no section matching a circuit breaker’s identifier, that circuit breaker uses the default parameters we specified previously. However, if you now replace the earlier configuration in application.conf with the one below...

lagom.circuit-breaker {
  # Default configuration that is used if a configuration section
  # with the circuit breaker identifier is not defined.
  default {
    # Enable/Disable circuit breaker.
    enabled = on
    # Number of failures before opening the circuit.
    max-failures = 10
    max-failures = ${?CIRCUIT_BREAKER_MAX_FAILURES}
    # Duration of time in open state after which to attempt to close
    # the circuit, by first entering the half-open state.
    reset-timeout = 15s
    reset-timeout = ${?CIRCUIT_BREAKER_RESET_TIMEOUT}
    # Duration of time after which to consider a call a failure.
    call-timeout = 10s
    call-timeout = ${?CIRCUIT_BREAKER_CALL_TIMEOUT}
  }
  demo {
    # Enable/Disable circuit breaker.
    enabled = off
    # Number of failures before opening the circuit.
    max-failures = 2
    max-failures = ${?CIRCUIT_BREAKER_MAX_FAILURES}
    # Duration of time in open state after which to attempt to close
    # the circuit, by first entering the half-open state.
    reset-timeout = 15s
    reset-timeout = ${?CIRCUIT_BREAKER_RESET_TIMEOUT}
    # Duration of time after which to consider a call a failure.
    call-timeout = 10s
    call-timeout = ${?CIRCUIT_BREAKER_CALL_TIMEOUT}
  }
  demo2 {
    # Enable/Disable circuit breaker.
    enabled = on
    # Number of failures before opening the circuit.
    max-failures = 3
    max-failures = ${?CIRCUIT_BREAKER_MAX_FAILURES}
    # Duration of time in open state after which to attempt to close
    # the circuit, by first entering the half-open state.
    reset-timeout = 15s
    reset-timeout = ${?CIRCUIT_BREAKER_RESET_TIMEOUT}
    # Duration of time after which to consider a call a failure.
    call-timeout = 10s
    call-timeout = ${?CIRCUIT_BREAKER_CALL_TIMEOUT}
  }
}

... you will observe that the configuration specified for demo overrides the default configuration for all the API calls within the descriptor named demo (i.e., for /demopath1 and /demopath2). On top of that, since we have explicitly specified that /demopath2 should use the circuit breaker configuration identified by demo2, the configuration for that call is overridden once more by the demo2 section.
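The lookup order described here can be sketched in a few lines of plain Java. This is a simplified model of my understanding, not Lagom's code: the class BreakerConfigResolver and its resolve method are hypothetical, the configuration is represented as nested maps rather than a real Config object, and the per-key fallback to the default section is an assumption.

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.Map;

// Hypothetical helper that mimics how a breaker's section is chosen; not a Lagom API.
class BreakerConfigResolver {

    // sections: the contents of lagom.circuit-breaker, keyed by section name.
    // explicitBreakerId: the id set on an individual call, or null if none was set.
    static Map<String, String> resolve(Map<String, Map<String, String>> sections,
                                       String serviceName,
                                       String explicitBreakerId) {
        // An explicit per-call id wins; otherwise the descriptor name is the id.
        String id = explicitBreakerId != null ? explicitBreakerId : serviceName;

        // Start from the default section...
        Map<String, String> resolved =
            new HashMap<>(sections.getOrDefault("default", Collections.emptyMap()));
        // ...then let the matching section, if any, override individual keys.
        resolved.putAll(sections.getOrDefault(id, Collections.emptyMap()));
        return resolved;
    }
}
```

Under this model, resolve(sections, "demo", null) yields max-failures = 2 for /demopath1, resolve(sections, "demo", "demo2") yields max-failures = 3 for /demopath2, and a service with no section of its own ends up with the default of 10.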

Please note that if CIRCUIT_BREAKER_MAX_FAILURES, CIRCUIT_BREAKER_RESET_TIMEOUT, and CIRCUIT_BREAKER_CALL_TIMEOUT are set as environment variables, their values override the ones hard-coded in application.conf in the example above. Because all three sections reference the same variables, setting one of them changes that parameter for default, demo, and demo2 alike.
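The effect of a pair such as max-failures = 10 followed by max-failures = ${?CIRCUIT_BREAKER_MAX_FAILURES} can be pictured with a small plain-Java sketch. The class EnvOverride and its method are hypothetical names used only for illustration; the real substitution is done by the configuration library when it loads application.conf.

```java
import java.util.Map;

// Hypothetical illustration of an optional environment override; not a library API.
class EnvOverride {

    // Returns the environment value when the variable is set,
    // otherwise the value hard-coded in the configuration file.
    static String effectiveValue(String hardCoded, Map<String, String> env, String variable) {
        String fromEnv = env.get(variable);
        return fromEnv != null ? fromEnv : hardCoded;
    }
}
```

For example, EnvOverride.effectiveValue("10", System.getenv(), "CIRCUIT_BREAKER_MAX_FAILURES") returns "10" unless that variable is set in the environment.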

Also, while implementing my own circuit breaker, I observed that when I ran the integration test cases on my service, I got the CircuitBreakerOpenException after max-failures failures had occurred, but I did not get the same exception when I sent multiple requests using curl or my browser. The reason is that the circuit breaker does not come into the picture when end users hit a single, independent service directly using a browser or an HTTP client. It sits on the client side of a call from one service to another. The role of a circuit breaker is clearly defined in the documentation as: “A circuit breaker is used to provide stability and prevent cascading failures in distributed systems. These should be used in conjunction with judicious timeouts at the interfaces between services to prevent the failure of a single service from bringing down other services.”

I hope this was helpful. I’m still looking further into the circuit breaker implementation in Lagom, so if you have any questions about it, please feel free to drop a comment below. And if you found this helpful at any level, please like and share this blog. We are all here to learn! Cheers.
