Caching Strategies
Caching allows you to store backend responses in memory so that repeated calls do not reach the origin, reducing network traffic and alleviating the pressure on your services.
KrakenD caches responses following rules similar to a CDN's defaults: its implementation mostly adheres to RFC 7234 (HTTP/1.1 Caching), and its internal decisions are based on that RFC.
The caching component is a capability of the default KrakenD HTTP client that connects to your upstream services. It can store content in memory so that the next request within a valid expiration window is returned right away without using the network, alleviating pressure on the backend and improving response times.
KrakenD does not cache the final content delivered to the end-user (the endpoint output), but the response of the backend (close, but not the same). Its approach is to store individual backend responses rather than aggregated content. In an endpoint with aggregation, you need to add the cache component individually to every backend that requires caching. Any manipulation of the content after retrieving it is not cached either.
What is cached and for how long?
You can only cache your backend's GET and HEAD methods. If you connect to a backend using POST, DELETE, PUT, or any other unsafe method, the cache is skipped and nothing is stored. The method that counts for caching is the one used to connect to the backend, not the one used in the endpoint, which might be different. For instance, you could still have an endpoint offering a POST method to the end-user, but if the backend uses a GET you could cache its response.
When KrakenD receives the response from the backend, it checks its headers. The Cache-Control header sets how long this content is stored in the cache; for instance, a response with Cache-Control: max-age=300 is kept for five minutes (see the different headers and values that affect caching below). The cache entry is stored under a key that contains the final URL sent to the backend plus the combination of any existing Vary headers.
As responses are returned to users, the memory fills up according to the header directives, and stale content is automatically replaced by fresh content when needed.
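For instance, in the following configuration sketch (the host and paths are illustrative), every distinct {id} produces a different final backend URL and, therefore, a separate cache entry:
{
    "endpoint": "/users/{id}",
    "backend": [{
        "url_pattern": "/users/{id}",
        "host": ["http://host-to-cache.example.com"],
        "extra_config": {
            "qos/http-cache": {
                "@comment": "Calls to /users/1 and /users/2 are stored under different cache keys"
            }
        }
    }]
}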
Cache configuration
To enable the caching of the backend services you only need to add in the backend
section of your krakend.json
the following:
{
"backend": [{
"url_pattern": "/url-to-cache",
"host": ["http://host-to-cache.example.com"],
"extra_config": {
"qos/http-cache": {}
}
}]
}
When you don’t set any additional parameters, you are creating an individual cache bucket for this backend, with uncapped memory.
A safer option that caps the amount of memory assigned to a backend would be:
{
"backend": [{
"url_pattern": "/url-to-cache",
"host": ["http://host-to-cache.example.com"],
"extra_config": {
"qos/http-cache": {
"@comment": "Allow up to 100 cache entries or ~128MB (bytes not set exactly)",
"shared": false,
"max_items": 100,
"max_size": 128000000
}
}
}]
}
The configuration options are as follows:
Cache options
- qos/http-cache (object): Enables in-memory caching of backend responses for as long as their Cache-Control header permits. Only safe methods are cached (GET or HEAD). The final backend URL and the Vary headers are used to create the caching key. Requires either both max_size and max_items, or shared, or none of them.
- max_items (integer): Enterprise only, to be released on open source in v2.10. The maximum number of items the LRU cache will store for this cache bucket before starting to do evictions. When max_items is declared, you must declare max_size as well.
- max_size (integer): Enterprise only, to be released on open source in v2.10. The maximum number of bytes you allow the LRU cache to store for this cache bucket before starting to do evictions. This is not the total cache you allow to the system, but the number of bytes you reserve for this backend (or its shared neighbours). When max_size is declared, you must declare max_items as well. Examples: 104857600, 1048576.
- shared (boolean): Allows different backend definitions with this flag set to true to reuse the store between them when the request is the same. Otherwise, each backend uses a private cache context that is not accessible by other endpoints. The cache entry is unique for every backend URL + Vary header combination.
Notice that max_size and max_items must coexist: either you declare none, or you declare both.
Cache types, cache size and LRU
The caching component allows you to declare how the cache buckets work and their relationship with the hardware. There are two important things to keep in mind:
- Capped or uncapped memory decides how the cache can use the resources of the host
- Individual or shared cache decides whether content is reused across different endpoints
Capped vs Uncapped memory
When the memory is capped (you set max_items and max_size), you allow every cache bucket to grow up to a defined point. You establish limits both on the number of entries and the number of bytes, and enable the LRU (Least Recently Used) algorithm. The two parameters are required simultaneously (otherwise the cache falls back to uncapped memory). In this model, when the maximum capacity is exhausted, new cacheable content replaces the content that has been least recently used. This is the safest option.
When the memory is uncapped (you don't set the attributes max_items and max_size), the cache has no restriction on stored content and can use and exhaust all the memory of the system. This option requires you, the developer, to plan how the cache is going to be used and do the math: for instance, a backend returning ~1MB responses with 1,000 distinct cacheable URLs can grow its bucket to roughly 1GB. Surpassing the available memory might make the system unstable or even crash it.
Individual vs Shared cache
When the cache is individual (the shared property is false or missing), every backend definition uses its own cache bucket, and the contents are restricted to the associated backend. Other backends or endpoints can't fetch its contents.
When the cache is shared, instead of keeping the cache bucket isolated in the backend, you use a bucket globally available to all endpoints, where other backends that need the same cache key can fetch the contents without asking the origin again.
Consider, for instance, four backend definitions that could be anywhere across different endpoints. Backends B and C query their upstream services and store the content in cache buckets constrained to their private access. On the other hand, backends A and D use a shared cache bucket, so if one of them saves a result, the other can fetch the same cache key without needing to go to the origin.
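A minimal sketch of the shared scenario could be the following (the endpoints, host, and path are illustrative); both backends declare shared and build the same cache key, so either one can reuse what the other stored:
{
    "endpoints": [
        {
            "endpoint": "/a",
            "backend": [{
                "url_pattern": "/common-resource",
                "host": ["http://host-to-cache.example.com"],
                "extra_config": {
                    "qos/http-cache": {
                        "@comment": "Backend A: reads and writes the globally shared bucket",
                        "shared": true
                    }
                }
            }]
        },
        {
            "endpoint": "/d",
            "backend": [{
                "url_pattern": "/common-resource",
                "host": ["http://host-to-cache.example.com"],
                "extra_config": {
                    "qos/http-cache": {
                        "@comment": "Backend D: same URL and Vary combination as A, so it can fetch A's entries",
                        "shared": true
                    }
                }
            }]
        }
    ]
}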
Cache TTL, size, expiration, purge
When you enable the caching module, your backends control the expiration time of the cache by setting the Cache-Control header. If your backends do not set the header, or it is set to zero, KrakenD won't store any content in its internal cache.
The response's content size directly impacts KrakenD's memory consumption, as KrakenD does not set any hard limit unless you cap the memory (see max_items and max_size above). Therefore, you must be aware of the response sizes, caching times, and the hit rate of the calls.
Finally, KrakenD does not offer any interface to purge the cache. The cache cleanses itself as defined by the Cache-Control header, and only a restart of the service wipes the store entirely. Nevertheless, you can make a few tweaks, as described below.
Headers affecting the cache
As we said, the only methods that can be cached are GET and HEAD, but this is only true as long as there isn't a Range header in the request (partial content downloads).
The cache takes the Vary header into account too. When present, it won't return cached content unless all the Vary headers match the cached ones.
If a recently generated response is already saved in the cache, it is retrieved without requiring a connection to the server. However, if the saved response is outdated or stale, any validators included in the new request are used to give the server an opportunity to respond with a 304 Not Modified status. If the server provides this status, then the cached response is returned.
When storing the content in the cache, KrakenD takes into account the headers Date, Etag, and Last-Modified, and will send the If-None-Match and If-Modified-Since headers accordingly.
If a response includes both an Expires header and a max-age directive, the max-age directive overrides the Expires header, even if the Expires header is more restrictive.
The Cache-Control header honors the time settings and the properties no-store, only-if-cached, and no-cache.
Is my content cached?
KrakenD does not provide an explicit mechanism for the developer to keep track of hits or misses, or to actively manage cache entries. For users with advanced caching needs who do not want automatic management of entries, we recommend an advanced caching system such as Varnish Cache. Also keep in mind that one endpoint might result from several cached components, so the final content might be a sum of caches.
Nevertheless, a simple rule exists to identify whether a response is cached: look at the response time in the access logs.
When the content is generated fresh, you’ll see in the access log that a response takes a few milliseconds or even seconds, depending on your backend load and performance. But when the content is cached, you’ll see times that are at least one order of magnitude smaller (like a few microseconds or even milliseconds). You’ll have no doubt when comparing a fresh call to a cached one.
In addition, there are other ways to check if backend responses are served from the cache, like:
- Checking the traces to notice missing backend calls
- Comparing the latency of the requests in your metrics
- Checking your backend logs
Custom directives (Stale cache, freshness)
You can allow your API consumers to pass custom directives (instructions) in their Cache-Control header when they make the request.
KrakenD can honor the directives max-stale, max-age, stale-if-error, and min-fresh present in the request when you add Cache-Control to the input_headers of the endpoint configuration:
{
"endpoint": "/cached-or-not",
"input_headers": ["Cache-Control"]
}
If you don’t allow the input headers to pass, KrakenD ignores the consumer’s request and returns the cached content according to the backend response.
But if you do, and clients are willing to accept stale content by passing the max-stale directive (or similarly stale-if-error) in their request, then a response that has exceeded its expiration time by no more than the number of seconds specified in the directive is served from the stale cache.
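Putting the pieces together, a sketch combining the forwarded header with a cached backend could look like this (the host and paths are illustrative):
{
    "endpoint": "/cached-or-not",
    "input_headers": ["Cache-Control"],
    "backend": [{
        "url_pattern": "/url-to-cache",
        "host": ["http://host-to-cache.example.com"],
        "extra_config": {
            "qos/http-cache": {
                "@comment": "Consumers can now send, e.g., 'Cache-Control: max-stale=60' to accept content expired by up to a minute"
            }
        }
    }]
}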
Overriding the expiration time
As you have seen, the caching module does not accept any parameter to control the cache expiration because it relies on the headers it finds when the response returns. But as KrakenD can transform data in many ways, you can modify the Cache-Control header right before the cache module picks it up.
The Martian module is the component that can transform the headers from the backend, as long as you don't use the no-op encoding (as it does not allow manipulation). Let's illustrate how you can do this in the following example.
{
"version": 3,
"$schema": "https://www.krakend.io/schema/v2.9/krakend.json",
"endpoints": [
{
"endpoint": "/cached",
"backend": [
{
"host": ["http://worldtimeapi.org/"],
"url_pattern": "/api/timezone/Europe/Madrid",
"extra_config": {
"qos/http-cache": {
"@comment": "This API returns a cache-control: max-age=0 so KrakenD won't cache this unless changed"
},
"modifier/martian": {
"header.Modifier": {
"scope": ["response"],
"name": "Cache-Control",
"value": "max-age=60, public",
"@comment": "We will change the max-age policy before KrakenD checks the content for caching. Now content is cached 60 seconds."
}
}
}
}
]
}
]
}
Notice in the configuration above how KrakenD consumed a backend without an actionable Cache-Control header, but set a cache of one minute anyway. This technique might be handy when you have little control over the cache headers in the response.