Telemetry and Monitoring through OpenTelemetry

Document updated on Jan 29, 2024

OpenTelemetry (OTEL for short) offers a comprehensive, unified, and vendor-neutral approach to collecting and managing telemetry data, providing enhanced observability and deeper insights into application performance and behavior. It’s particularly beneficial in complex, distributed, and cloud-native environments.

OpenTelemetry captures detailed, contextual information about the operation of your applications. This includes not only metrics but also tracing data that shows the full lifecycle of requests as they flow through your systems, providing insights into performance bottlenecks, latency issues, and error diagnostics.

It supports auto-instrumentation and can be integrated seamlessly into cloud-native deployments, making it easier to monitor these dynamic environments.

Stability note on OpenTelemetry

KrakenD has traditionally offered part of its telemetry through the OpenCensus integration, which has provided a reliable service for over six years. We are transitioning to the more modern and robust OpenTelemetry framework, and the OpenCensus integration no longer receives updates.

Although the underlying protocol specification of OpenTelemetry is stable, its components are at different stages of their lifecycle and have mixed stability statuses. We cannot predict what changes there will be as the technology evolves, but KrakenD will always do its best to maintain compatibility between versions. More information about the underlying exporter can be found here.

Collecting metrics and traces

The telemetry/opentelemetry component in KrakenD collects the activity generated for the enabled layers and pushes or exposes the data for pulling. There are two ways of publishing metrics:

  • OpenTelemetry protocol (OTLP) - push
  • Prometheus - pull

You can use both simultaneously if needed, and even multiple instances of each.
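As a minimal sketch of combining exporters, the following configuration declares two OTLP exporters and one Prometheus exporter at the same time. The exporter names and hosts (`self_hosted_collector`, `collector.internal.example`, `cloud_collector`, `otlp.cloud.example`) are hypothetical placeholders, not values from a real deployment; the field names are the ones documented in the configuration section of this page.

```json
{
    "version": 3,
    "extra_config": {
        "telemetry/opentelemetry": {
            "exporters": {
                "otlp": [
                    {
                        "name": "self_hosted_collector",
                        "host": "collector.internal.example",
                        "port": 4317
                    },
                    {
                        "name": "cloud_collector",
                        "host": "otlp.cloud.example",
                        "port": 4317
                    }
                ],
                "prometheus": [
                    {
                        "name": "local_prometheus",
                        "port": 9090
                    }
                ]
            }
        }
    }
}
```

Each exporter needs its own unique `name`, which is what lets several instances of the same family coexist.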

When you add OpenTelemetry in the configuration, you will have different metrics available.

Prometheus exporter (pull)

Choose the prometheus exporter when you want KrakenD to expose a new port offering a /metrics endpoint. An external Prometheus job can then connect to a URL like http://krakend:9090/metrics and retrieve all the data.

Prometheus connecting to KrakenD and fetching metrics

See how to configure Prometheus
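A minimal sketch of the pull setup described above, assuming you only want a single Prometheus exporter on the default port. The exporter name `my_prometheus` is an arbitrary label chosen for illustration.

```json
{
    "version": 3,
    "extra_config": {
        "telemetry/opentelemetry": {
            "exporters": {
                "prometheus": [
                    {
                        "name": "my_prometheus",
                        "port": 9090,
                        "process_metrics": true,
                        "go_metrics": true
                    }
                ]
            }
        }
    }
}
```

With this in place, and assuming KrakenD is reachable under the hostname `krakend`, the metrics would be served at http://krakend:9090/metrics for a Prometheus job to scrape.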

OTLP exporter (push)

Choose the otlp exporter when you want to push the metrics to a local or remote collector or directly to a SaaS or storage system that supports native OTLP (there is a large number of supported providers). The following diagram represents this idea:

KrakenD to collector, collector to backend

The host where your collector lives can also point to an external load balancer between KrakenD and multiple collectors if needed:

KrakenD to load balanced collectors, collectors to backend
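A minimal sketch of the push setup, assuming a collector (or a load balancer in front of several collectors) reachable at the hypothetical hostname `otel-collector.example.internal` and accepting OTLP over gRPC on the default port:

```json
{
    "version": 3,
    "extra_config": {
        "telemetry/opentelemetry": {
            "exporters": {
                "otlp": [
                    {
                        "name": "remote_collector",
                        "host": "otel-collector.example.internal",
                        "port": 4317,
                        "use_http": false
                    }
                ]
            }
        }
    }
}
```

From KrakenD's point of view a single collector and a load-balanced pool look the same: only the `host` value changes.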

Enterprise users can push directly to external storage, passing auth credentials through the telemetry/opentelemetry-security component, so a collector is no longer needed.

This strategy removes the need to deploy and maintain a collector, which simplifies the KrakenD setup.

OpenTelemetry Configuration

To enable OpenTelemetry, you need a Prometheus server or an OTEL Collector (or both), and you must add the telemetry/opentelemetry namespace at the top level of your configuration.

The configuration of the telemetry/opentelemetry namespace is very extensive, but the two key entries are:

  • exporters, defining the different technologies you will use
  • layers, the amount of data you want to report
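To make the two key entries concrete, here is a hedged skeleton showing where `exporters` and `layers` sit inside the namespace. The exporter name `example_collector` and the specific flag values are illustrative assumptions; in this sketch the proxy layer stops reporting metrics while keeping its traces.

```json
{
    "version": 3,
    "extra_config": {
        "telemetry/opentelemetry": {
            "exporters": {
                "otlp": [
                    {
                        "name": "example_collector",
                        "host": "localhost"
                    }
                ]
            },
            "layers": {
                "global": {
                    "disable_metrics": false,
                    "disable_traces": false
                },
                "proxy": {
                    "disable_metrics": true,
                    "disable_traces": false
                },
                "backend": {
                    "metrics": {
                        "disable_stage": false,
                        "round_trip": true
                    },
                    "traces": {
                        "disable_stage": false,
                        "round_trip": true
                    }
                }
            }
        }
    }
}
```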

The entire configuration is as follows:

Fields of OpenTelemetry
* required fields
exporters  *

object
The places where you will send telemetry data. You can declare multiple exporters, even of the same type. For instance, when migrating from a self-hosted Grafana to its cloud version, you can report to both and compare the results during the transition. There are two families of exporters: otlp or prometheus.
otlp

array
The list of OTLP exporters you want to use. Set at least one object to push metrics and traces to an external collector using OTLP. Each item is an object with the following properties:
disable_metrics

boolean
Disable metrics in this exporter (leaving only traces if any). It won’t report any metrics when the flag is true.
Defaults to false
disable_traces

boolean
Disable traces in this exporter (leaving only metrics if any). It won’t report any traces when the flag is true.
Defaults to false
host  *

string
The host where you want to push the data. It can be a sidecar or a remote collector.
name  *

string
A unique name to identify this exporter.
Examples: "local_prometheus" , "remote_grafana"
port

integer
A custom port to send the data. The port defaults to 4317 for gRPC, or to 4318 when you enable use_http.
Defaults to 4317
use_http

boolean
Whether this exporter uses HTTP instead of gRPC.
prometheus

array
Set here at least the settings for one Prometheus exporter. Each exporter will start a local port that offers metrics to be pulled from KrakenD. Each item is an object with the following properties:
disable_metrics

boolean
Leave this exporter declared but disabled (useful in development). It won’t report any metrics when the flag is true.
Defaults to false
go_metrics

boolean
Whether you want fine-grained details of Go language metrics or not.
listen_ip

string
The IPv4 or IPv6 address that KrakenD listens on. You can, for instance, expose the Prometheus metrics only on a private IP address. An empty string or no declaration means listening on all interfaces. The :: notation is IPv6 address syntax only (it is not a port separator). Examples of valid addresses are 192.0.2.1 (IPv4) and 2001:db8::68 (IPv6). The values :: and 0.0.0.0 listen on all addresses, and are valid for IPv4 and IPv6 simultaneously.
Examples: "172.12.1.1" , "::1"
Defaults to "0.0.0.0"
name  *

string
A unique name to identify this exporter.
Examples: "local_prometheus" , "remote_grafana"
port

integer
The port in KrakenD where Prometheus will connect to.
Defaults to 9090
process_metrics

boolean
Whether this exporter shows detailed metrics about the running process like CPU or memory usage or not.
layers

object
A request and response flow passes through three different layers. This attribute lets you specify what data you want to export in each layer. All layers are enabled by default unless you declare this section.
backend

object
Reports the activity between KrakenD and each of your backend services. This is the most granular layer.
metrics

object
detailed_connection

boolean
Whether you want to enable detailed metrics for the HTTP connection phase or not. Includes times to connect, DNS querying, and the TLS handshake.
Defaults to false
disable_stage

boolean
Whether to turn off the metrics or not. Setting this to true means stop reporting any data.
Defaults to false
read_payload

boolean
Whether you want to enable metrics for the response reading payload or not (HTTP connection not taken into account).
Defaults to false
round_trip

boolean
Whether you want to enable metrics for the actual HTTP request for the backend or not (manipulation not taken into account). This is the time your backend needs to produce a result.
Defaults to false
static_attributes

array
A list of tags or labels you want to associate with these metrics.
Example: [{"key":"my_metric_attr","value":"my_metric_val"}] Each item is an object with the following properties:
key  *

string
The key, tag, or label name you want to use.
value  *

string
The static value you want to assign to this key.
traces

object
detailed_connection

boolean
Whether you want to add detailed trace attributes for the HTTP connection phase or not. Includes times to connect, DNS querying, and the TLS handshake.
Defaults to false
disable_stage

boolean
Whether to turn off the traces or not. Setting this to true means stop reporting any data.
Defaults to false
read_payload

boolean
Whether you want to add trace attributes for the response reading payload or not (HTTP connection not taken into account).
Defaults to false
report_headers

boolean
Whether you want to report the final headers that reached the backend.
Defaults to false
round_trip

boolean
Whether you want to add trace attributes for the actual HTTP request for the backend or not (manipulation not taken into account). This is the time your backend needs to produce a result.
Defaults to false
static_attributes

array
A list of tags or labels you want to associate with these traces.
Example: [{"key":"my_trace_attr","value":"my_trace_val"}] Each item is an object with the following properties:
key  *

string
The key, tag, or label name you want to use.
value  *

string
The static value you want to assign to this key.
global

object
Reports the activity between end-users and KrakenD.
disable_metrics

boolean
Whether you want to disable all metrics happening in the global layer or not.
Defaults to false
disable_propagation

boolean
Whether you want to ignore previous propagation headers to KrakenD. When the flag is set to true, spans from a previous layer will never be linked to the KrakenD trace.
Defaults to false
disable_traces

boolean
Whether you want to disable all traces happening in the global layer or not.
Defaults to false
metrics_static_attributes

array
Static attributes you want to pass for metrics. Each item is an object with the following properties:
key

string
The key of the static attribute you want to send
value

string
The value of the static attribute you want to send
report_headers

boolean
Whether you want to send all headers that the consumer passed in the request or not.
Defaults to false
traces_static_attributes

array
Static attributes you want to pass for traces. Each item is an object with the following properties:
key

string
The key of the static attribute you want to send
value

string
The value of the static attribute you want to send
proxy

object
Reports the activity at the beginning of the proxy layer, including spawning the required requests to multiple backends, merging, endpoint transformation, and any other internals of the proxy between the request processing and the backend communication.
disable_metrics

boolean
Whether you want to disable all metrics happening in the proxy layer or not.
Defaults to false
disable_traces

boolean
Whether you want to disable all traces happening in the proxy layer or not.
Defaults to false
metrics_static_attributes

array
Static attributes you want to pass for metrics. Each item is an object with the following properties:
key

string
The key of the static attribute you want to send
value

string
The value of the static attribute you want to send
report_headers

boolean
Whether you want to report all headers that passed from the request to the proxy layer (input_headers policy in the endpoint plus KrakenD’s headers).
Defaults to false
traces_static_attributes

array
Static attributes you want to pass for traces. Each item is an object with the following properties:
key

string
The key of the static attribute you want to send
value

string
The value of the static attribute you want to send
metric_reporting_period

integer
How often you want to report and flush the metrics in seconds. This setting is only used by otlp exporters.
Defaults to 30
service_name

string
A friendly name identifying metrics reported by this installation. When unset, it uses the name attribute in the root level of the configuration.
service_version

string
The version you are deploying. This can be useful for deployment tracking.
skip_paths

array
The paths you don’t want to report. Use the literal value used in the endpoint definition, including any {placeholders}. In the global layer, this attribute works only on metrics, because traces are initiated before there is an endpoint to match against. If you do not want any path skipped, just add an array with an empty string [""].
Example: "/foo/{bar}"
Defaults to ["/__health","/__debug/","/__echo/","/__stats/"]
trace_sample_rate

number
The sample rate for traces defines the fraction of traces that are reported. This option is key to reducing the amount of data generated (and resource usage) while still letting you debug and troubleshoot issues. For instance, a value of 0.25 reports 25% of the traces seen in the system.
Example: 0.25
Defaults to 1
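The top-level tuning fields can be combined as in the following sketch. It reports OTLP metrics every 60 seconds, samples a quarter of the traces, and skips the default internal paths plus one extra endpoint. The service name, version string, exporter name, and the `/internal/{id}` path are hypothetical, and the sketch repeats the default paths on the assumption that a custom `skip_paths` list replaces the defaults rather than extending them.

```json
{
    "version": 3,
    "extra_config": {
        "telemetry/opentelemetry": {
            "service_name": "example_gateway",
            "service_version": "1.0.0",
            "metric_reporting_period": 60,
            "trace_sample_rate": 0.25,
            "skip_paths": [
                "/__health",
                "/__debug/",
                "/__echo/",
                "/__stats/",
                "/internal/{id}"
            ],
            "exporters": {
                "otlp": [
                    {
                        "name": "example_collector",
                        "host": "localhost"
                    }
                ]
            }
        }
    }
}
```

Lowering `trace_sample_rate` is usually the first lever to pull when trace volume becomes a cost or performance concern, since metrics are unaffected by it.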

Here’s an example with Grafana Tempo and Prometheus:

{
    "version": 3,
    "$schema": "https://www.krakend.io/schema/v2.7/krakend.json",
    "extra_config": {
        "telemetry/opentelemetry": {
            "service_name": "krakend_middle_service",
            "service_version": "commit-sha-ACBDE1234",
            "exporters": {
                "prometheus": [
                    {
                        "name": "my_prometheus",
                        "port": 9092,
                        "listen_ip": "::1",
                        "process_metrics": false,
                        "go_metrics": false
                    }
                ],
                "otlp": [
                    {
                        "name": "local_tempo",
                        "host": "localhost",
                        "port": 4317,
                        "use_http": false
                    }
                ]
            },
            "layers": {
                "global": {
                    "disable_metrics": false,
                    "disable_traces": false,
                    "disable_propagation": false
                },
                "proxy": {
                    "disable_metrics": false,
                    "disable_traces": false
                },
                "backend": {
                    "metrics": {
                        "disable_stage": false,
                        "round_trip": true,
                        "read_payload": true,
                        "detailed_connection": true,
                        "static_attributes": [
                            {
                                "key": "my_metric_attr",
                                "value": "my_middle_metric"
                            }
                        ]
                    },
                    "traces": {
                        "disable_stage": false,
                        "round_trip": true,
                        "read_payload": true,
                        "detailed_connection": true,
                        "static_attributes": [
                            {
                                "key": "my_trace_attr",
                                "value": "my_middle_trace"
                            }
                        ]
                    }
                }
            },
            "skip_paths": [
                "/foo/{bar}"
            ]
        }
    }
}
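As a variation on the example above, the following sketch splits responsibilities between exporters: an OTLP exporter over HTTP that sends traces only, and a Prometheus exporter bound to a private address that serves the metrics. The hostnames, IP address, exporter names, and the `environment` attribute are hypothetical values chosen for illustration.

```json
{
    "version": 3,
    "extra_config": {
        "telemetry/opentelemetry": {
            "exporters": {
                "otlp": [
                    {
                        "name": "traces_over_http",
                        "host": "collector.example.internal",
                        "port": 4318,
                        "use_http": true,
                        "disable_metrics": true
                    }
                ],
                "prometheus": [
                    {
                        "name": "private_prometheus",
                        "port": 9090,
                        "listen_ip": "192.0.2.1"
                    }
                ]
            },
            "layers": {
                "global": {
                    "report_headers": false,
                    "traces_static_attributes": [
                        {
                            "key": "environment",
                            "value": "staging"
                        }
                    ]
                }
            }
        }
    }
}
```

Setting `disable_metrics` on the OTLP exporter avoids reporting the same metrics twice when Prometheus already collects them.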

Examples of integrations
