Document updated on Dec 5, 2022
Namespace | qos/circuit-breaker |
---|---|
Log prefix | [BACKEND: /foo][CB] |
Scope | backend |
Source | krakend/krakend-circuitbreaker |
The Circuit Breaker is a straightforward state machine in the middle of the request and response that monitors all your backend failures. When they reach a configured threshold, the circuit breaker stops sending traffic to the failing backend, relieving pressure on it under challenging conditions.
When KrakenD demands more throughput than your actual API stack can deliver properly, the Circuit Breaker mechanism will detect the failures and prevent stressing your servers by not sending requests that are likely to fail. It is also helpful for dealing with network and other communication problems by preventing too many requests from dying due to timeouts, etc.
It is important to note that the maximum number of errors refers to consecutive errors, not to the total number of errors in the period. This approach works better when your traffic is variable, as it’s based on a probabilistic pattern and is not affected by the volume you might have.
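The difference between consecutive and total errors can be illustrated with a small sketch. The function name `max_consecutive_errors` and the sample sequences are hypothetical and only serve the illustration; the sketch also ignores the time window for simplicity.

```python
def max_consecutive_errors(outcomes):
    """Return the longest run of consecutive failures (False values)."""
    longest = current = 0
    for ok in outcomes:
        # A success resets the running count; a failure extends it.
        current = 0 if ok else current + 1
        longest = max(longest, current)
    return longest


# True = successful call, False = failed call.
interleaved = [False, True, False, True, False, True]  # 3 errors, never 2 in a row
burst = [True, True, False, False, False, True]        # 3 errors, all in a row

max_errors = 2  # illustrative threshold
print(max_consecutive_errors(interleaved) > max_errors)  # False
print(max_consecutive_errors(burst) > max_errors)        # True
```

Both sequences contain the same total number of errors, but only the burst exceeds the consecutive threshold.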
The Circuit Breaker is available in the namespace `qos/circuit-breaker` inside the `extra_config` key. The following configuration is an example of how to add circuit breaker capabilities to a backend:
{
  "endpoints": [
    {
      "endpoint": "/myendpoint",
      "method": "GET",
      "backend": [
        {
          "host": [
            "http://127.0.0.1:8080"
          ],
          "url_pattern": "/mybackend-endpoint",
          "extra_config": {
            "qos/circuit-breaker": {
              "interval": 60,
              "timeout": 10,
              "max_errors": 1,
              "name": "cb-myendpoint-1",
              "log_status_change": true
            }
          }
        }
      ]
    }
  ]
}
The attributes available for the configuration are:

Attribute | Description |
---|---|
`interval` | Time window where the errors count, in seconds. |
`log_status_change` | Whether to log the changes of state of this circuit breaker or not. Defaults to `false`. |
`max_errors` | The consecutive number of errors within the `interval` window to consider the backend unhealthy. An error is any response without a success (20x) status code or no response. |
`name` | A friendly name to follow this circuit breaker’s activity in the logs. Example: `"cb-backend-1"` |
`timeout` | For how many seconds the circuit breaker will wait before testing again if the backend is healthy. |
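If the configuration is generated by a script, a helper along these lines could assemble the settings. This is a minimal sketch: the function `circuit_breaker_config` is hypothetical, and the check that the numeric values are positive integers is this sketch's own assumption, not a rule stated here.

```python
import json


def circuit_breaker_config(interval, timeout, max_errors,
                           name=None, log_status_change=False):
    """Build the extra_config entry for one backend (hypothetical helper)."""
    numeric = (("interval", interval), ("timeout", timeout),
               ("max_errors", max_errors))
    for key, value in numeric:
        # Assumption of this sketch: numeric settings are positive integers.
        if isinstance(value, bool) or not isinstance(value, int) or value < 1:
            raise ValueError(f"{key} must be a positive integer, got {value!r}")
    settings = {
        "interval": interval,
        "timeout": timeout,
        "max_errors": max_errors,
        "log_status_change": log_status_change,
    }
    if name is not None:
        settings["name"] = name
    return {"qos/circuit-breaker": settings}


extra_config = circuit_breaker_config(
    60, 10, 1, name="cb-myendpoint-1", log_status_change=True)
print(json.dumps(extra_config, indent=2))
```

The resulting dictionary is meant to be placed under the `extra_config` key of a backend.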
It’s easy to picture the state of the circuit breaker as an electrical component, where an open circuit means no flow of electricity between the ends, and a closed one normal flow:
The Circuit Breaker starts in the `CLOSED` state, meaning the electricity can flow to the backends as they are considered healthy (innocent until proven guilty).
Then the component watches the state of the connections with your backend(s), with a tolerance to consecutive failures (`max_errors`) during a time interval (`interval`). When the failures exceed that tolerance, it stops all interaction with the backend for the next N seconds (the `timeout`). We call this state `OPEN`.
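After waiting for this time window, the state changes to `HALF-OPEN` and allows a single connection to pass and test the system again. At this stage, a successful connection returns the state to `CLOSED`, while a failed one sends it back to `OPEN` for another timeout.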
This is the way the states change (Circuit Breaker transitions):

- `CLOSED`: In the initial state, the system is healthy and sending connections to the backend.
- `OPEN`: When the number of consecutive errors tolerated from the backend (`max_errors`) is exceeded, the system changes to `OPEN`, and no further connections are sent to the backend. The system will stay in the `OPEN` state for N seconds (the `timeout`).
- `HALF-OPEN`: After the timeout, it changes to this state and allows one connection to pass. If the connection succeeds, the state changes to `CLOSED`, and the backend is considered to be healthy again. But if it fails, it switches back to `OPEN` for another timeout.

A failure that counts for the circuit breaker could be anything that prevents having a successful connection with the service. There is a small difference in behavior when you use the circuit breaker with `no-op` encoding vs. the rest of the encodings.
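The transitions described above can be modeled roughly as follows. This is an illustrative sketch and not KrakenD's implementation: the class `CircuitBreakerSketch`, its methods, and the injected fake clock are all hypothetical. It also assumes that "exceeded" means strictly more than `max_errors` consecutive failures and that the count restarts when a new `interval` begins.

```python
import time

CLOSED, OPEN, HALF_OPEN = "CLOSED", "OPEN", "HALF-OPEN"


class CircuitBreakerSketch:
    """Illustrative model of the three states; not KrakenD's implementation."""

    def __init__(self, interval, timeout, max_errors, clock=time.monotonic):
        self.interval = interval      # seconds in which consecutive errors count
        self.timeout = timeout        # seconds to stay OPEN before probing
        self.max_errors = max_errors  # tolerated consecutive errors
        self.clock = clock            # injectable so the demo needs no sleeping
        self.state = CLOSED
        self.consecutive_errors = 0
        self.window_start = clock()
        self.opened_at = None
        self.probe_in_flight = False

    def allow(self):
        """Return True if a request may be sent to the backend right now."""
        now = self.clock()
        if self.state == OPEN:
            if now - self.opened_at < self.timeout:
                return False
            self.state = HALF_OPEN    # timeout elapsed: permit a single probe
            self.probe_in_flight = False
        if self.state == HALF_OPEN:
            if self.probe_in_flight:
                return False
            self.probe_in_flight = True
            return True
        if now - self.window_start >= self.interval:
            # Assumption: the error count restarts when a new interval begins.
            self.window_start = now
            self.consecutive_errors = 0
        return True

    def record(self, success):
        """Report the outcome of a request that allow() let through."""
        if self.state == HALF_OPEN:
            if success:
                self._close()
            else:
                self._open()
        elif self.state == CLOSED:
            if success:
                self.consecutive_errors = 0
            else:
                self.consecutive_errors += 1
                # Assumption: "exceeded" means strictly greater than max_errors.
                if self.consecutive_errors > self.max_errors:
                    self._open()

    def _open(self):
        self.state = OPEN
        self.opened_at = self.clock()

    def _close(self):
        self.state = CLOSED
        self.consecutive_errors = 0
        self.window_start = self.clock()


now = [0.0]  # fake clock, advanced by hand
cb = CircuitBreakerSketch(interval=60, timeout=10, max_errors=1,
                          clock=lambda: now[0])
history = []
for _ in range(2):
    cb.allow()
    cb.record(False)
    history.append(cb.state)   # CLOSED after one error, OPEN after two
now[0] = 5.0
history.append(cb.allow())     # still inside the timeout: request blocked
now[0] = 10.0
history.append(cb.allow())     # timeout elapsed: the single probe passes
history.append(cb.state)
cb.record(True)                # the probe succeeded
history.append(cb.state)
print(history)
# ['CLOSED', 'OPEN', False, True, 'HALF-OPEN', 'CLOSED']
```

Injecting the clock keeps the example deterministic; with the default `time.monotonic`, the same class works against real time.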
Regardless of the encoding, the Circuit Breaker will react to failures involving these components:

- `qos/ratelimit/proxy`
- `modifier/lua-backend`
- `validation/cel`
- `backend/lambda`

In addition, when you work with `json`, or any encoding other than `no-op`, the gateway also checks the HTTP responses back from the backend and marks as failures those with a status code other than `200` or `201` (including client credentials).
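The encoding-dependent behavior can be summarized in a short sketch. The function `counts_as_failure` and its parameters are hypothetical; it models only what the text states, namely that any error or missing response counts as a failure, and that status codes are checked only when the encoding is not `no-op`.

```python
def counts_as_failure(encoding, status_code=None, error=None):
    """Decide whether a backend call counts as a failure in this sketch."""
    # Connection problems, component errors, or no response at all
    # count regardless of the encoding.
    if error is not None or status_code is None:
        return True
    # With no-op, the response status is not inspected in this model.
    if encoding == "no-op":
        return False
    # With json and other encodings, only 200 and 201 count as success.
    return status_code not in (200, 201)


print(counts_as_failure("json", status_code=500))    # True
print(counts_as_failure("no-op", status_code=500))   # False
print(counts_as_failure("no-op", error="timeout"))   # True
```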