Circuit Breaker

The circuit breaker is a resilience pattern that prevents a failing downstream service from taking down your entire system. Named after the electrical component, it trips open when failures exceed a threshold — stopping requests from even reaching the broken service — then cautiously tests recovery before closing again.

The Problem

In a distributed system, services call each other over the network. When a downstream service becomes slow or unavailable, callers typically block while waiting for a response. With enough concurrent callers, thread pools are exhausted, connection pools fill up, and the failure cascades upstream: a single slow service can bring down the entire call chain.

  1. Cascading failures — Service A calls Service B which calls Service C. C slows down, B's thread pool fills waiting on C, A's thread pool fills waiting on B. The entire system grinds to a halt even though only C is the root cause.
  2. Wasted resources — requests that are guaranteed to fail still consume threads, memory, and connection pool slots while they wait for a timeout.
  3. Slow recovery — when the failing service recovers, it is immediately hit by the backlog of queued requests, potentially overloading it again before it stabilises.
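
To put rough numbers on the second point, Little's law says the average number of in-flight requests equals the arrival rate multiplied by the time each request is held. The sketch below is illustrative arithmetic only: the class name, method names, and the example figures (100 requests per second, a 50 ms healthy latency, a 5 s timeout, a pool of 200 threads) are assumptions, not measurements from any real system.

```java
// Illustrative arithmetic only; the class and method names are hypothetical.
final class BlockedThreadEstimate {

    // Little's law: threads held = arrival rate x time each request holds a thread.
    // Integer arithmetic, rounded up to whole threads.
    static long threadsHeld(long requestsPerSecond, long millisHeldPerRequest) {
        return (requestsPerSecond * millisHeldPerRequest + 999) / 1000;
    }

    // True when the waiting calls would occupy the whole pool.
    static boolean poolExhausted(long poolSize, long requestsPerSecond, long millisHeldPerRequest) {
        return threadsHeld(requestsPerSecond, millisHeldPerRequest) >= poolSize;
    }
}
```

Under these assumed figures, a healthy downstream service ties up about 5 threads (100 x 0.05 s), while one that makes every call wait for the 5 s timeout ties up about 500, which is more than the 200-thread pool can supply.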

How It Works

A circuit breaker wraps calls to a remote service and tracks their outcomes. It operates as a state machine with three states:

  1. Closed — normal operation. Requests pass through. Failures are counted. When the failure rate exceeds a configured threshold (e.g. 50% of the last 10 calls), the circuit trips to Open.
  2. Open — the circuit is tripped. All requests fail immediately with a fallback response — no network call is made. After a configured wait duration (e.g. 30 seconds), the circuit moves to Half-Open to test recovery.
  3. Half-Open — a limited number of probe requests are allowed through. If they succeed, the circuit closes and normal traffic resumes. If they fail, the circuit opens again and the wait timer resets.

| State | Requests | Transitions to |
|---|---|---|
| Closed | Pass through normally | Open, when the failure rate exceeds the threshold |
| Open | Fail fast, no network call | Half-Open, after the wait duration expires |
| Half-Open | Limited probe requests allowed | Closed on success, Open on failure |
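
The failure-rate check in the Closed state ("50% of the last 10 calls") could be tracked with a small ring buffer. This is a minimal sketch under stated assumptions, not a library API: `CountBasedWindow` and its methods are hypothetical names, the window only votes once it is full, and a rate equal to the threshold is treated as tripping.

```java
// Hypothetical helper, not part of any library: tracks the outcome of the last N calls.
final class CountBasedWindow {
    private final boolean[] failed;   // ring buffer: true means that call failed
    private int next = 0;             // slot the next outcome overwrites
    private int recorded = 0;         // outcomes stored so far, capped at the window size
    private int failures = 0;         // failures currently inside the window

    CountBasedWindow(int size) {
        if (size <= 0) throw new IllegalArgumentException("size must be positive");
        this.failed = new boolean[size];
    }

    void record(boolean callFailed) {
        if (recorded == failed.length) {
            if (failed[next]) failures--;   // the oldest outcome drops out of the window
        } else {
            recorded++;
        }
        failed[next] = callFailed;
        if (callFailed) failures++;
        next = (next + 1) % failed.length;
    }

    // Failure rate in percent over the calls currently in the window.
    double failureRatePercent() {
        return recorded == 0 ? 0.0 : 100.0 * failures / recorded;
    }

    // Assumption: only trip once the window is full, so one early failure cannot open the circuit.
    boolean shouldOpen(double thresholdPercent) {
        return recorded == failed.length && failureRatePercent() >= thresholdPercent;
    }
}
```

A breaker would call `record` after every request while Closed and trip to Open as soon as `shouldOpen(50)` returns true.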

Java Implementation

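A minimal circuit breaker written from scratch to show the state machine clearly. To keep it short, it counts consecutive failures instead of computing a failure rate over a sliding window: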

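import java.util.function.Supplier;

// Minimal illustration; not thread-safe, so guard it externally if it is shared across threads.
public class CircuitBreaker {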

    public enum State { CLOSED, OPEN, HALF_OPEN }

    private final int  failureThreshold;  // consecutive failures before opening
    private final int  probeThreshold;    // consecutive successes needed to close from half-open
    private final long waitDurationMs;    // time to stay open before probing

    private State state           = State.CLOSED;
    private int   failureCount    = 0;
    private int   successCount    = 0;
    private long  openedAt        = 0;

    public CircuitBreaker(int failureThreshold, int probeThreshold, long waitDurationMs) {
        this.failureThreshold = failureThreshold;
        this.probeThreshold   = probeThreshold;
        this.waitDurationMs   = waitDurationMs;
    }

    public <T> T call(Supplier<T> action, Supplier<T> fallback) {
        if (state == State.OPEN) {
            if (System.currentTimeMillis() - openedAt >= waitDurationMs) {
                transitionTo(State.HALF_OPEN);
            } else {
                return fallback.get();   // fail fast
            }
        }

        try {
            T result = action.get();
            onSuccess();
            return result;
        } catch (Exception e) {
            onFailure();
            return fallback.get();
        }
    }

    private void onSuccess() {
        failureCount = 0;
        if (state == State.HALF_OPEN) {
            successCount++;
            if (successCount >= probeThreshold) {
                transitionTo(State.CLOSED);
            }
        }
    }

    private void onFailure() {
        successCount = 0;
        if (state == State.HALF_OPEN) {
            transitionTo(State.OPEN);
            return;
        }
        failureCount++;
        if (failureCount >= failureThreshold) {
            transitionTo(State.OPEN);
        }
    }

    private void transitionTo(State next) {
        state        = next;
        failureCount = 0;
        successCount = 0;
        if (next == State.OPEN) openedAt = System.currentTimeMillis();
    }

    public State getState() { return state; }
}

Usage:

CircuitBreaker cb = new CircuitBreaker(
    5,      // open after 5 consecutive failures
    2,      // close after 2 consecutive successes in half-open
    30_000  // wait 30 seconds before probing
);

String result = cb.call(
    () -> paymentService.charge(order),   // action
    () -> "Payment service unavailable"   // fallback
);
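
The class above lets every caller through once it is in HALF_OPEN, whereas the state description allows only a limited number of probes. One possible way to cap them is a small permit counter. `ProbeGate` is a hypothetical helper introduced here for illustration; it is not part of the class above or of any library.

```java
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical helper: hands out a fixed number of probe permits per half-open period.
final class ProbeGate {
    private final int maxProbes;
    private final AtomicInteger remaining;

    ProbeGate(int maxProbes) {
        this.maxProbes = maxProbes;
        this.remaining = new AtomicInteger(maxProbes);
    }

    // Returns true if the caller may send a probe; false means fail fast with the fallback.
    boolean tryAcquire() {
        while (true) {
            int current = remaining.get();
            if (current <= 0) return false;
            // Compare-and-set so two threads can never take the same permit.
            if (remaining.compareAndSet(current, current - 1)) return true;
        }
    }

    // Call when the breaker enters HALF_OPEN so a fresh batch of probes is allowed.
    void reset() {
        remaining.set(maxProbes);
    }
}
```

In `call`, a request arriving while the state is HALF_OPEN would return the fallback whenever `tryAcquire()` is false, and `transitionTo(State.HALF_OPEN)` would call `reset()`.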

Resilience4j

Resilience4j is a widely used circuit breaker library for Java and Spring Boot, and the usual replacement for Netflix Hystrix (now in maintenance mode). It provides count-based and time-based sliding windows, thread-safe state transitions, metrics integration, and Spring annotations.

Dependency (Gradle)

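implementation 'io.github.resilience4j:resilience4j-spring-boot3:2.2.0'
implementation 'org.springframework.boot:spring-boot-starter-aop'  // required for the @CircuitBreaker annotation to take effect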

Configuration (application.yml)

resilience4j:
  circuitbreaker:
    instances:
      paymentService:
        sliding-window-type: COUNT_BASED
        sliding-window-size: 10          # last 10 calls
        failure-rate-threshold: 50       # open if ≥ 50% fail
        wait-duration-in-open-state: 30s
        permitted-number-of-calls-in-half-open-state: 3
        slow-call-duration-threshold: 2s # calls taking longer than 2s are recorded as slow
        slow-call-rate-threshold: 80     # open if ≥ 80% of calls are slow

Annotation-based usage

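import io.github.resilience4j.circuitbreaker.annotation.CircuitBreaker;
import org.slf4j.Logger;
import org.slf4j.LoggerFactory;
import org.springframework.stereotype.Service;

@Service
public class PaymentService {

    private static final Logger log = LoggerFactory.getLogger(PaymentService.class);

    // ExternalPaymentGateway is a placeholder for your own gateway client type
    private final ExternalPaymentGateway externalPaymentGateway;

    public PaymentService(ExternalPaymentGateway externalPaymentGateway) {
        this.externalPaymentGateway = externalPaymentGateway;
    }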

    @CircuitBreaker(name = "paymentService", fallbackMethod = "chargeFallback")
    public String charge(Order order) {
        return externalPaymentGateway.process(order);
    }

    // fallback must have the same return type and parameters as charge, plus a trailing Throwable parameter
    private String chargeFallback(Order order, Throwable ex) {
        log.warn("Payment gateway unavailable: {}", ex.getMessage());
        return "Payment queued for retry";
    }
}

Programmatic usage

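// ofDefaults() uses the library defaults, not application.yml;
// in Spring Boot, inject the auto-configured CircuitBreakerRegistry instead
CircuitBreakerRegistry registry = CircuitBreakerRegistry.ofDefaults();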
CircuitBreaker cb = registry.circuitBreaker("paymentService");

Supplier<String> decorated = CircuitBreaker
    .decorateSupplier(cb, () -> externalPaymentGateway.process(order));

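// Resilience4j 2.x no longer ships Vavr, so use a plain try/catch instead of Try.ofSupplier
String result;
try {
    result = decorated.get();
} catch (Exception ex) {
    // also covers CallNotPermittedException, thrown while the circuit is open
    result = "Payment service unavailable";
}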

Count-based vs time-based window

| | COUNT_BASED | TIME_BASED |
|---|---|---|
| Window | Last N calls | Calls in last N seconds |
| Best for | Consistent request rate | Variable or bursty traffic |
| Behaviour under low traffic | Window fills slowly, so failures are detected more slowly | Reflects real time, so it is more responsive |
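
To make the time-based column concrete, here is a minimal sketch of a window that keeps only the outcomes recorded in the last N milliseconds. `TimeBasedWindow` is a hypothetical helper, not how any particular library implements its window. The current time is passed in as a parameter, an assumption made so the behaviour can be checked without sleeping.

```java
import java.util.ArrayDeque;
import java.util.Deque;

// Hypothetical helper, not a library API: keeps outcomes from the last windowMillis only.
final class TimeBasedWindow {
    private static final class Outcome {
        final long atMillis;
        final boolean failed;
        Outcome(long atMillis, boolean failed) { this.atMillis = atMillis; this.failed = failed; }
    }

    private final long windowMillis;
    private final Deque<Outcome> outcomes = new ArrayDeque<>();
    private int failures = 0;   // failures currently inside the window

    TimeBasedWindow(long windowMillis) { this.windowMillis = windowMillis; }

    // Timestamps are expected to be non-decreasing across calls.
    void record(long nowMillis, boolean callFailed) {
        evictExpired(nowMillis);
        outcomes.addLast(new Outcome(nowMillis, callFailed));
        if (callFailed) failures++;
    }

    // Failure rate in percent over the calls still inside the window.
    double failureRatePercent(long nowMillis) {
        evictExpired(nowMillis);
        return outcomes.isEmpty() ? 0.0 : 100.0 * failures / outcomes.size();
    }

    int callsInWindow(long nowMillis) {
        evictExpired(nowMillis);
        return outcomes.size();
    }

    // Drop outcomes that are windowMillis old or older.
    private void evictExpired(long nowMillis) {
        while (!outcomes.isEmpty() && nowMillis - outcomes.peekFirst().atMillis >= windowMillis) {
            if (outcomes.removeFirst().failed) failures--;
        }
    }
}
```

Unlike a count-based window, old outcomes expire on their own here, so a burst of failures stops influencing the rate once it is older than the window, even if no new calls arrive.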