This page documents a previous version of KrakenD Community Edition (v2.0).

The Circuit Breaker

Document updated on May 2, 2021

The Circuit Breaker is a straightforward state machine that sits between the request and the response and monitors all your backend failures. When the failures reach a configured threshold, the circuit breaker stops sending traffic to the failing backend.

When KrakenD demands more throughput than your API stack can actually deliver, the Circuit Breaker detects the failures and avoids stressing your servers by not sending requests that are likely to fail. It also helps with network and other communication problems, because it keeps too many requests from failing due to timeouts and similar errors.

A must-have configuration
The Circuit Breaker is an automatic protection measure for your API stack. It avoids cascading failures and keeps your API responsive and resilient. It consumes few resources, so try to enable it whenever you can.

Circuit breaker configuration

The Circuit Breaker is available in the namespace qos/circuit-breaker inside the extra_config key. The following configuration is an example of how to add circuit breaker capabilities to a backend:

{
    "endpoints": [
        {
            "endpoint": "/myendpoint",
            "method": "GET",
            "backend": [
                {
                    "host": [
                        "http://127.0.0.1:8080"
                    ],
                    "url_pattern": "/mybackend-endpoint",
                    "extra_config": {
                        "qos/circuit-breaker": {
                            "interval": 60,
                            "timeout": 10,
                            "max_errors": 1,
                            "name": "cb-myendpoint-1",
                            "log_status_change": true
                        }
                    }
                }
            ]
        }
    ]
}

The attributes available for the configuration are:

  • interval: (integer) Time window in which errors are counted, in seconds.
  • timeout: (integer) How long the circuit breaker waits, in seconds, before testing again whether the backend is healthy.
  • max_errors: (integer) The number of consecutive errors within the interval window after which the backend is considered unhealthy.
  • name: (string) A friendly name to identify this circuit breaker’s activity in the logs.
  • log_status_change: (boolean) Whether to log the changes of state of this circuit breaker or not.
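
As an illustration of the attribute types listed above, the following Python sketch walks a configuration shaped like the example and checks each circuit breaker entry. The helper names (circuit_breakers, type_problems) are hypothetical and not part of KrakenD. The sketch only checks the types of keys that are present, because this page does not say which attributes are required.

```python
NAMESPACE = "qos/circuit-breaker"

# Types taken from the attribute list above.
EXPECTED_TYPES = {
    "interval": int,
    "timeout": int,
    "max_errors": int,
    "name": str,
    "log_status_change": bool,
}


def circuit_breakers(config):
    """Yield (endpoint, settings) for every backend that declares a circuit breaker."""
    for endpoint in config.get("endpoints", []):
        for backend in endpoint.get("backend", []):
            settings = backend.get("extra_config", {}).get(NAMESPACE)
            if settings is not None:
                yield endpoint.get("endpoint"), settings


def type_problems(settings):
    """Return a list of human-readable type mismatches for the keys that are present."""
    problems = []
    for key, expected in EXPECTED_TYPES.items():
        if key not in settings:
            continue  # required vs. optional is not stated in the source, so skip
        value = settings[key]
        # bool is a subclass of int in Python, so reject it explicitly for integer fields.
        is_bool_as_int = expected is int and isinstance(value, bool)
        if not isinstance(value, expected) or is_bool_as_int:
            problems.append(
                f"{key}: expected {expected.__name__}, got {type(value).__name__}"
            )
    return problems
```

A typical use would be to load the configuration file with json.load and call type_problems on each entry yielded by circuit_breakers before deploying.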

How it works

The Circuit Breaker tracks the state of the connections to your backend(s) over a series of requests. When it sees more than the configured number of consecutive failures (max_errors) within a given time window (interval), it stops all interaction with the backend for the next N seconds (the timeout). After this waiting period, the system allows a single connection through to test the backend again. If that connection fails, the circuit breaker waits another N seconds. If it succeeds, the circuit breaker returns to the normal state and the backend is considered healthy.

The circuit breaker works with three different internal states, and the easiest way to picture them is as an electrical circuit:

  • CLOSED: This is the normal state. When the circuit is closed, the electricity flows uninterrupted, and the connection to the backend is allowed.
  • OPEN: No connection to the backend is allowed when the circuit is open.
  • HALF-OPEN: After the system has seen repeated problems, only the single connection needed to test the backend is permitted.

And this is the way the states change:

  • CLOSED: In the initial state, the system is healthy and sending connections to the backend.
  • OPEN: When the number of consecutive errors from the backend exceeds max_errors, the system changes to OPEN, and no further connections are sent to the backend. The system stays in the OPEN state for N seconds (the timeout).
  • HALF-OPEN: After the timeout, it changes to this state and allows one connection to pass. If the connection succeeds, the state changes to CLOSED, and the backend is considered to be healthy again. But if it fails, it switches back to OPEN for another timeout.
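
The transitions above can be modeled as a small state machine. The following Python sketch is only an illustration and is not KrakenD's implementation: the class and method names are hypothetical, time is passed in explicitly as seconds so the behavior is easy to follow, and the way the interval window clears the error count is an assumption, since this page does not specify it.

```python
from dataclasses import dataclass

CLOSED, OPEN, HALF_OPEN = "CLOSED", "OPEN", "HALF-OPEN"


@dataclass
class CircuitBreaker:
    """Illustrative model of the three states described above."""

    interval: float    # seconds; window in which errors are counted
    timeout: float     # seconds to stay OPEN before allowing one trial
    max_errors: int    # consecutive errors tolerated before opening
    state: str = CLOSED
    errors: int = 0
    window_start: float = 0.0
    opened_at: float = 0.0

    def allow(self, now: float) -> bool:
        """Return True if a request may be sent to the backend at time `now`."""
        if self.state == OPEN:
            if now - self.opened_at >= self.timeout:
                self.state = HALF_OPEN  # timeout elapsed: let one trial through
                return True
            return False
        if self.state == HALF_OPEN:
            return False  # the single trial request is already in flight
        # CLOSED: assumed behavior, start a fresh count once the window has passed.
        if now - self.window_start >= self.interval:
            self.errors = 0
            self.window_start = now
        return True

    def report(self, success: bool, now: float) -> None:
        """Record the outcome of a request that `allow` let through."""
        if self.state == HALF_OPEN:
            if success:
                self.state, self.errors, self.window_start = CLOSED, 0, now
            else:
                self.state, self.opened_at = OPEN, now  # wait another timeout
        elif self.state == CLOSED:
            if success:
                self.errors = 0  # only consecutive errors count
                return
            self.errors += 1
            if self.errors > self.max_errors:
                self.state, self.opened_at = OPEN, now
```

With the example values (interval 60, timeout 10, max_errors 1), this model opens the circuit on the second consecutive failure, rejects requests for 10 seconds, and then lets exactly one trial request decide whether it returns to CLOSED or goes back to OPEN.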