Circuit Breaker
The circuit breaker is a resilience pattern that prevents a failing downstream service from taking down your entire system. Named after the electrical component, it trips open when failures exceed a threshold — stopping requests from even reaching the broken service — then cautiously tests recovery before closing again.
The Problem
In a distributed system, services call each other over the network. When a downstream service becomes slow or unavailable, callers typically block waiting for a response. With enough concurrent callers, threads exhaust, connection pools fill up, and the failure cascades upstream — a single slow service can bring down the entire call chain.
- Cascading failures — Service A calls Service B which calls Service C. C slows down, B's thread pool fills waiting on C, A's thread pool fills waiting on B. The entire system grinds to a halt even though only C is the root cause.
- Wasted resources — requests that are guaranteed to fail still consume threads, memory, and connection pool slots while they wait for a timeout.
- Slow recovery — when the failing service recovers, it is immediately hit by the backlog of queued requests, potentially overloading it again before it stabilises.
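The resource exhaustion described above can be reproduced in miniature. The sketch below is illustrative only: the two-thread pool, the latch that stands in for a hung dependency, and the class name `PoolExhaustionDemo` are all assumptions made for the demo, not part of any real service.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;

// Hypothetical demo: a caller with a 2-thread pool in front of a stalled dependency.
class PoolExhaustionDemo {

    /** Returns true if a request that needs no dependency was starved of a worker thread. */
    static boolean healthyRequestStarved() {
        ExecutorService pool = Executors.newFixedThreadPool(2);
        CountDownLatch dependencyRecovers = new CountDownLatch(1);
        CountDownLatch workersBusy = new CountDownLatch(2);
        try {
            // Two requests block on the stalled dependency and occupy every worker.
            for (int i = 0; i < 2; i++) {
                pool.submit(() -> {
                    workersBusy.countDown();
                    dependencyRecovers.await(); // stands in for a call that never returns
                    return "slow";
                });
            }
            workersBusy.await();

            // A third request is queued behind them and cannot start.
            Future<String> healthy = pool.submit(() -> "ok");
            try {
                healthy.get(200, TimeUnit.MILLISECONDS);
                return false;
            } catch (TimeoutException starved) {
                return true;
            }
        } catch (InterruptedException | ExecutionException e) {
            throw new IllegalStateException(e);
        } finally {
            dependencyRecovers.countDown(); // release the workers so the pool can shut down
            pool.shutdown();
        }
    }

    public static void main(String[] args) {
        System.out.println("healthy request starved: " + healthyRequestStarved());
    }
}
```

The third request does no remote work at all, yet it waits because every worker is parked on the slow dependency. Failing fast is what frees those workers.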
How It Works
A circuit breaker wraps calls to a remote service and tracks their outcomes. It operates as a state machine with three states:
- Closed — normal operation. Requests pass through. Failures are counted. When the failure rate exceeds a configured threshold (e.g. 50% of the last 10 calls), the circuit trips to Open.
- Open — the circuit is tripped. All requests fail immediately with a fallback response — no network call is made. After a configured wait duration (e.g. 30 seconds), the circuit moves to Half-Open to test recovery.
- Half-Open — a limited number of probe requests are allowed through. If they succeed, the circuit closes and normal traffic resumes. If they fail, the circuit opens again and the wait timer resets.
| State | Requests | Transitions to |
|---|---|---|
| Closed | Pass through normally | Open — when failure rate exceeds threshold |
| Open | Fail fast — no network call | Half-Open — after wait duration expires |
| Half-Open | Limited probe requests allowed | Closed on success · Open on failure |
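The transition table above can be written as a pure function, which makes the state machine easy to test in isolation. This is a minimal sketch; the `Event` names are assumptions chosen for this example and do not come from any library.

```java
// Hypothetical encoding of the transition table as a pure function.
class BreakerTransitions {

    enum State { CLOSED, OPEN, HALF_OPEN }

    // Event names are illustrative; they mirror the "Transitions to" column of the table.
    enum Event { FAILURE_THRESHOLD_EXCEEDED, WAIT_DURATION_ELAPSED, PROBE_SUCCEEDED, PROBE_FAILED }

    /** Returns the next state, or the current state if the event does not apply to it. */
    static State next(State current, Event event) {
        switch (current) {
            case CLOSED:
                return event == Event.FAILURE_THRESHOLD_EXCEEDED ? State.OPEN : current;
            case OPEN:
                return event == Event.WAIT_DURATION_ELAPSED ? State.HALF_OPEN : current;
            case HALF_OPEN:
                if (event == Event.PROBE_SUCCEEDED) return State.CLOSED;
                if (event == Event.PROBE_FAILED) return State.OPEN;
                return current;
            default:
                throw new IllegalStateException("unknown state " + current);
        }
    }

    /** Only the Open state rejects requests; Half-Open admits a limited number of probes. */
    static boolean admitsRequests(State state) {
        return state != State.OPEN;
    }

    public static void main(String[] args) {
        State s = State.CLOSED;
        s = next(s, Event.FAILURE_THRESHOLD_EXCEEDED); // trips to OPEN
        s = next(s, Event.WAIT_DURATION_ELAPSED);      // moves to HALF_OPEN
        s = next(s, Event.PROBE_SUCCEEDED);            // back to CLOSED
        System.out.println(s);
    }
}
```

Events that do not apply to the current state are ignored, so a late probe result arriving after the circuit has already closed leaves the state unchanged.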
Java Implementation
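The following minimal circuit breaker, written from scratch, shows the state machine clearly. To stay short it counts consecutive failures rather than a failure rate over a sliding window, and it is not thread-safe: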
import java.util.function.Supplier;

// Illustrative only: not thread-safe, and it counts consecutive failures
// rather than a failure rate over a sliding window.
public class CircuitBreaker {

    public enum State { CLOSED, OPEN, HALF_OPEN }

    private final int failureThreshold; // consecutive failures before opening
    private final int probeThreshold;   // consecutive successes needed to close from half-open
    private final long waitDurationMs;  // time to stay open before probing

    private State state = State.CLOSED;
    private int failureCount = 0;
    private int successCount = 0;
    private long openedAt = 0;

    public CircuitBreaker(int failureThreshold, int probeThreshold, long waitDurationMs) {
        this.failureThreshold = failureThreshold;
        this.probeThreshold = probeThreshold;
        this.waitDurationMs = waitDurationMs;
    }

    public <T> T call(Supplier<T> action, Supplier<T> fallback) {
        if (state == State.OPEN) {
            if (System.currentTimeMillis() - openedAt >= waitDurationMs) {
                transitionTo(State.HALF_OPEN); // wait is over: let this call through as a probe
            } else {
                return fallback.get(); // fail fast, no network call
            }
        }
        try {
            T result = action.get();
            onSuccess();
            return result;
        } catch (Exception e) {
            onFailure();
            return fallback.get();
        }
    }

    private void onSuccess() {
        failureCount = 0; // a success breaks the run of consecutive failures
        if (state == State.HALF_OPEN) {
            successCount++;
            if (successCount >= probeThreshold) {
                transitionTo(State.CLOSED);
            }
        }
    }

    private void onFailure() {
        successCount = 0;
        if (state == State.HALF_OPEN) {
            transitionTo(State.OPEN); // a single failed probe reopens the circuit
            return;
        }
        failureCount++;
        if (failureCount >= failureThreshold) {
            transitionTo(State.OPEN);
        }
    }

    private void transitionTo(State next) {
        state = next;
        failureCount = 0;
        successCount = 0;
        if (next == State.OPEN) openedAt = System.currentTimeMillis(); // restart the wait timer
    }

    public State getState() { return state; }
}
Usage:
CircuitBreaker cb = new CircuitBreaker(
    5,      // open after 5 consecutive failures
    2,      // close after 2 consecutive successes in half-open
    30_000  // wait 30 seconds before probing
);

String result = cb.call(
    () -> paymentService.charge(order),  // action
    () -> "Payment service unavailable"  // fallback
);
Resilience4j
Resilience4j is the de facto standard circuit breaker library for Java and Spring Boot. It succeeded Netflix Hystrix (now in maintenance mode) and provides count-based and time-based sliding windows, thread-safe state transitions, metrics integration, and Spring annotations.
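To see why thread-safe state transitions matter, consider many callers noticing at the same moment that the wait duration has expired. The sketch below shows one common way to let exactly one of them perform the Open to Half-Open transition, using `AtomicReference.compareAndSet`. It is an illustration under stated assumptions, not Resilience4j's actual implementation; the class and method names are hypothetical.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.concurrent.atomic.AtomicReference;

// Hypothetical sketch of an atomically updated breaker state.
class AtomicBreakerState {

    enum State { CLOSED, OPEN, HALF_OPEN }

    // Starts OPEN so the demo below can race on the OPEN -> HALF_OPEN transition.
    private final AtomicReference<State> state = new AtomicReference<>(State.OPEN);

    /** Attempts from -> to; returns true only for the one thread whose compare-and-set succeeds. */
    boolean tryTransition(State from, State to) {
        return state.compareAndSet(from, to);
    }

    State current() {
        return state.get();
    }

    /** Races the given number of threads on OPEN -> HALF_OPEN and returns how many of them won. */
    static int raceToHalfOpen(int threads) {
        AtomicBreakerState breaker = new AtomicBreakerState();
        AtomicInteger winners = new AtomicInteger();
        CountDownLatch start = new CountDownLatch(1);
        CountDownLatch done = new CountDownLatch(threads);
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        for (int i = 0; i < threads; i++) {
            pool.execute(() -> {
                try {
                    start.await(); // line all threads up before racing
                    if (breaker.tryTransition(State.OPEN, State.HALF_OPEN)) {
                        winners.incrementAndGet();
                    }
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                } finally {
                    done.countDown();
                }
            });
        }
        start.countDown();
        try {
            done.await();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        pool.shutdown();
        return winners.get();
    }

    public static void main(String[] args) {
        System.out.println("threads that performed the transition: " + raceToHalfOpen(8));
    }
}
```

With a plain field, as in the from-scratch class earlier, several threads could each believe they moved the circuit to Half-Open and each reset the counters.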
Dependency (Gradle)
implementation 'io.github.resilience4j:resilience4j-spring-boot3:2.2.0'
Configuration (application.yml)
resilience4j:
  circuitbreaker:
    instances:
      paymentService:
        sliding-window-type: COUNT_BASED
        sliding-window-size: 10                          # last 10 calls
        failure-rate-threshold: 50                       # open if ≥ 50% fail
        wait-duration-in-open-state: 30s
        permitted-number-of-calls-in-half-open-state: 3
        slow-call-duration-threshold: 2s                 # calls slower than 2s are recorded as slow
        slow-call-rate-threshold: 80                     # open if ≥ 80% of calls are slow
Annotation-based usage
@Service
public class PaymentService {

    // @CircuitBreaker here is io.github.resilience4j.circuitbreaker.annotation.CircuitBreaker
    @CircuitBreaker(name = "paymentService", fallbackMethod = "chargeFallback")
    public String charge(Order order) {
        return externalPaymentGateway.process(order);
    }

    // The fallback takes the same parameters plus a trailing Throwable
    // and must return the same type as the protected method.
    private String chargeFallback(Order order, Throwable ex) {
        log.warn("Payment gateway unavailable: {}", ex.getMessage());
        return "Payment queued for retry";
    }
}
Programmatic usage
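// CircuitBreaker here is io.github.resilience4j.circuitbreaker.CircuitBreaker
CircuitBreakerRegistry registry = CircuitBreakerRegistry.ofDefaults();
CircuitBreaker cb = registry.circuitBreaker("paymentService");

Supplier<String> decorated = CircuitBreaker
    .decorateSupplier(cb, () -> externalPaymentGateway.process(order));

// Plain try/catch: Resilience4j 2.x no longer bundles Vavr, so Try is only
// available if io.vavr is added as a separate dependency.
String result;
try {
    result = decorated.get();
} catch (Exception e) {
    // Covers gateway failures and CallNotPermittedException (thrown while the circuit is open)
    result = "Payment service unavailable";
}
Count-based vs time-based window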
| | COUNT_BASED | TIME_BASED |
|---|---|---|
| Window | Last N calls | Calls in last N seconds |
| Best for | Consistent request rate | Variable or bursty traffic |
| Behaviour under low traffic | Window fills slowly — slower to detect failures | Reflects real time — more responsive |
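A count-based window is essentially a ring buffer of the last N outcomes. The sketch below shows how a failure rate over such a window could be tracked. It is not Resilience4j's implementation, and the class name, the method names, and the choice to report no rate until the window is full are assumptions made for this example.

```java
// Hypothetical count-based sliding window over the last N call outcomes.
class CountBasedWindow {

    private final boolean[] failed; // ring buffer of outcomes, true = failure
    private int next = 0;           // slot that the next outcome overwrites
    private int recorded = 0;       // number of outcomes stored so far, capped at the window size
    private int failures = 0;       // failures currently inside the window

    CountBasedWindow(int size) {
        if (size <= 0) throw new IllegalArgumentException("size must be positive");
        this.failed = new boolean[size];
    }

    /** Records one call outcome, evicting the oldest outcome once the window is full. */
    void record(boolean failure) {
        if (recorded == failed.length) {
            if (failed[next]) failures--; // the evicted outcome was a failure
        } else {
            recorded++;
        }
        failed[next] = failure;
        if (failure) failures++;
        next = (next + 1) % failed.length;
    }

    /** Failure rate in percent, or -1 while the window is not yet full (an assumption of this sketch). */
    double failureRatePercent() {
        if (recorded < failed.length) return -1.0;
        return 100.0 * failures / failed.length;
    }

    /** True when the window is full and the failure rate is at or above the threshold. */
    boolean shouldOpen(double thresholdPercent) {
        double rate = failureRatePercent();
        return rate >= 0 && rate >= thresholdPercent;
    }

    public static void main(String[] args) {
        CountBasedWindow window = new CountBasedWindow(10);
        for (int i = 0; i < 5; i++) window.record(false);
        for (int i = 0; i < 5; i++) window.record(true);
        System.out.println("failure rate: " + window.failureRatePercent());
        System.out.println("should open at 50%: " + window.shouldOpen(50));
    }
}
```

Under low traffic the buffer takes a long time to fill and old outcomes linger in it, which is the behaviour the table attributes to COUNT_BASED. A time-based variant would instead keep counters per time bucket and discard buckets older than N seconds.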