Document updated on Jan 29, 2024

Telemetry and Monitoring through OpenTelemetry

OpenTelemetry (OTEL for short) offers a comprehensive, unified, and vendor-neutral approach to collecting and managing telemetry data, providing enhanced observability and deeper insights into application performance and behavior. It’s particularly beneficial in complex, distributed, and cloud-native environments.

OpenTelemetry captures detailed, contextual information about the operation of your applications. This includes not only metrics but also tracing data that shows the full lifecycle of requests as they flow through your systems, providing insights into performance bottlenecks, latency issues, and error diagnostics.

It supports auto-instrumentation and can be integrated seamlessly into cloud-native deployments, making it easier to monitor these dynamic environments.

Stability note on OpenTelemetry

KrakenD has traditionally offered part of its telemetry integration through the OpenCensus integration, which has provided a reliable service for over six years. We are transitioning to the more modern and robust OpenTelemetry framework, and the OpenCensus integration no longer receives updates.

While the underlying protocol specification of OpenTelemetry is stable, its components are at mixed stability levels across their lifecycle. We cannot predict what changes will come as the technology evolves, but KrakenD will always do its best to maintain compatibility between versions. More information about the underlying exporter can be found here.

Collecting metrics and traces

The telemetry/opentelemetry component in KrakenD collects the activity generated for the enabled layers and pushes or exposes the data for pulling. There are two ways of publishing metrics:

OpenTelemetry protocol (OTLP) - push
Prometheus - pull

You can use both simultaneously if needed, and even multiple instances of each.

When you add OpenTelemetry to the configuration, the metrics you have available depend on the layers you enable.

Prometheus exporter (pull)

Choose the prometheus exporter when you want KrakenD to expose a new port offering a /metrics endpoint. An external Prometheus job can then connect to a URL like http://krakend:9090/metrics and retrieve all the data.
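As a minimal sketch of the pull model, the following configuration declares a single Prometheus exporter on port 9090, which is what would make a URL like http://krakend:9090/metrics available. The exporter name local_prometheus is only an illustrative label.

```json
{
    "version": 3,
    "extra_config": {
        "telemetry/opentelemetry": {
            "exporters": {
                "prometheus": [
                    {
                        "name": "local_prometheus",
                        "port": 9090
                    }
                ]
            }
        }
    }
}
```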

Diagram: Prometheus connecting to KrakenD and fetching metrics

See how to configure Prometheus

OTLP exporter (push)

Choose the otlp exporter when you want to push the metrics to a local or remote collector or directly to a SaaS or storage system that supports native OTLP (there is a large number of supported providers). The following diagram represents this idea:

Diagram: KrakenD to collector, collector to backend

The host where your collector lives can also point to an external load balancer between KrakenD and multiple collectors if needed:

Diagram: KrakenD to load balanced collectors, collectors to backend
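A minimal sketch of the push model could look like the following. The host otel-collector.example.internal and the exporter name are hypothetical placeholders; the host could equally be a sidecar, a remote collector, or the load balancer in front of several collectors.

```json
{
    "version": 3,
    "extra_config": {
        "telemetry/opentelemetry": {
            "exporters": {
                "otlp": [
                    {
                        "name": "remote_collector",
                        "host": "otel-collector.example.internal",
                        "port": 4317
                    }
                ]
            }
        }
    }
}
```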

Enterprise users can push directly to external storage passing auth credentials using the telemetry/opentelemetry-security component, so the collector is not needed anymore:

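Diagram: KrakenD pushing directly to external storage with authentication (opentelemetry-otlp-auth.mmd)

Skipping the collector leaves one less component to deploy and maintain, which saves a lot of time during the setup of KrakenD.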

OpenTelemetry Configuration

To enable OpenTelemetry, you need a Prometheus server or an OTEL Collector (or both), and you must add the telemetry/opentelemetry namespace at the top level of your configuration.

The configuration of the telemetry/opentelemetry namespace is very extensive, but the two key entries are:

exporters, defining the different technologies you will use
layers, the amount of data you want to report

The entire configuration is as follows:

Fields of OpenTelemetry

* required fields

`deploy_env` string

The environment you are deploying to, which can be useful for deployment tracking. The string can have any value that helps you identify the running environment.

Examples: "development" , "testing" , "staging" , "production"

Defaults to ""

`exporters` * object

The places where you will send telemetry data. You can declare multiple exporters, even of the same type. For instance, if you have a self-hosted Grafana and want to migrate to its cloud version, you can report to both during the transition and compare the results. There are two families of exporters: otlp and prometheus.
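To illustrate double reporting during a migration, this sketch declares two exporters of the same type, each with its own unique name. Both host values are hypothetical placeholders for the old and the new destination.

```json
{
    "version": 3,
    "extra_config": {
        "telemetry/opentelemetry": {
            "exporters": {
                "otlp": [
                    {
                        "name": "current_collector",
                        "host": "collector-old.example.internal"
                    },
                    {
                        "name": "new_collector",
                        "host": "collector-new.example.internal"
                    }
                ]
            }
        }
    }
}
```

While both exporters are declared, the same telemetry is pushed to both destinations, so you can compare them before removing the old entry.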

`otlp` array

The list of OTLP exporters you want to use. Set at least one object to push metrics and traces to an external collector using OTLP.

Each item of otlp accepts the following properties:

custom_metric_reporting_period integer: Overrides the global metric_reporting_period attribute for this exporter. Value in seconds. When the attribute is missing or set to 0, the exporter uses the value of metric_reporting_period set at the global level.
Defaults to 0
disable_metrics boolean: Disable metrics in this exporter (leaving only traces, if any). It won’t report any metrics when the flag is true.
Defaults to false
disable_traces boolean: Disable traces in this exporter (leaving only metrics, if any). It won’t report any traces when the flag is true.
Defaults to false
host * string: The host where you want to push the data. It can be a sidecar or a remote collector.
name * string: A unique name to identify this exporter.
Examples: "local_prometheus" , "remote_grafana"
port integer: A custom port to send the data. The port defaults to 4317 for gRPC; when you enable use_http, the default is 4318.
Defaults to 4317
use_http boolean: Whether this exporter uses HTTP instead of gRPC.
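The following sketch combines several of these properties: an exporter that pushes over HTTP instead of gRPC and reports traces only. The host and name are hypothetical, and the port is written explicitly even though 4318 is the documented default when use_http is enabled.

```json
{
    "version": 3,
    "extra_config": {
        "telemetry/opentelemetry": {
            "exporters": {
                "otlp": [
                    {
                        "name": "traces_over_http",
                        "host": "otel-collector.example.internal",
                        "port": 4318,
                        "use_http": true,
                        "disable_metrics": true,
                        "disable_traces": false
                    }
                ]
            }
        }
    }
}
```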

`prometheus` array

Set here the settings for at least one Prometheus exporter. Each exporter opens a local port that offers metrics to be pulled from KrakenD.

Each item of prometheus accepts the following properties:

custom_metric_reporting_period integer: Overrides the global metric_reporting_period attribute for this exporter. Value in seconds. When the attribute is missing or set to 0, the exporter uses the value of metric_reporting_period set at the global level.
Defaults to 0
disable_metrics boolean: Leave this exporter declared but disabled (useful in development). It won’t report any metrics when the flag is true.
Defaults to false
go_metrics boolean: Whether you want fine-grained details of Go language metrics or not.
listen_ip string: The IPv4 or IPv6 address that KrakenD listens on. You can, for instance, expose the Prometheus metrics only on a private IP address. An empty string or no declaration means listening on all interfaces. The :: notation belongs to the IPv6 address format (it is not a port separator). Examples of valid addresses are 192.0.2.1 (IPv4) and 2001:db8::68 (IPv6). The values :: and 0.0.0.0 listen on all addresses, for both IPv4 and IPv6 simultaneously.
Examples: "172.12.1.1" , "::1"
Defaults to "0.0.0.0"
name * string: A unique name to identify this exporter.
Examples: "local_prometheus" , "remote_grafana"
port integer: The port in KrakenD that Prometheus connects to.
Defaults to 9090
process_metrics boolean: Whether this exporter shows detailed metrics about the running process like CPU or memory usage or not.
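As an illustration, the exporter below listens on a single address only and enables the extra Go and process metrics. The address 192.0.2.1 is the documentation placeholder used above, and the name and port are arbitrary choices.

```json
{
    "version": 3,
    "extra_config": {
        "telemetry/opentelemetry": {
            "exporters": {
                "prometheus": [
                    {
                        "name": "private_prometheus",
                        "listen_ip": "192.0.2.1",
                        "port": 9091,
                        "go_metrics": true,
                        "process_metrics": true
                    }
                ]
            }
        }
    }
}
```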

`histograms` object

Use a histogram bucket configuration different from the defaults to define the detail of histogram metrics (decreasing or increasing their size). You don’t need to set this attribute unless you want full control of the histogram definition.

`size_buckets` array

The size of the buckets in bytes you want to use for the histograms.

Defaults to [128,256,512,1024,4096,8192,16384,32768,65536,262144,524288,1048576,4194304,16777216,67108864]

`time_buckets` array

The duration of buckets in seconds you want to use for the histograms.

Defaults to [0.01,0.02,0.05,0.075,0.1,0.125,0.15,0.175,0.2,0.25,0.3,0.35,0.5,0.75,1,1.5,2,3.5,5,10]
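A sketch of a coarser histogram definition might look like this, assuming histograms sits at the top level of the namespace next to exporters, as the field list suggests. The bucket values are arbitrary examples (seconds for time_buckets, bytes for size_buckets), and the exporter is only there to satisfy the required exporters entry.

```json
{
    "version": 3,
    "extra_config": {
        "telemetry/opentelemetry": {
            "exporters": {
                "prometheus": [
                    {
                        "name": "local_prometheus"
                    }
                ]
            },
            "histograms": {
                "time_buckets": [0.05, 0.1, 0.25, 0.5, 1, 2.5, 5],
                "size_buckets": [1024, 65536, 1048576, 16777216]
            }
        }
    }
}
```

Fewer buckets mean fewer series per histogram, at the cost of less precise latency and size distributions.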

`layers` object

A request and response flow passes through three different layers. This attribute lets you specify what data you want to export in each layer. All layers are enabled by default unless you declare this section.

`backend` object

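Reports the activity between KrakenD and each of your backend services. This is the most granular layer.

`metrics` object

Accepts the properties detailed_connection, disable_stage, read_payload, and round_trip (all boolean), and static_attributes (an array of objects with the required strings key and value).

`traces` object

Accepts the same properties as metrics, plus report_headers (boolean).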

`global` object

Reports the activity between end-users and KrakenD.

`disable_metrics` boolean

Whether you want to disable all metrics happening in the global layer or not.

Defaults to false

`disable_propagation` boolean

Whether you want to ignore previous propagation headers to KrakenD. When the flag is set to true, spans from a previous layer will never be linked to the KrakenD trace.

Defaults to false

`disable_traces` boolean

Whether you want to disable all traces happening in the global layer or not.

Defaults to false

`metrics_static_attributes` array

Static attributes you want to pass for metrics.

`report_headers` boolean

Whether you want to send all headers that the consumer passed in the request or not.

Defaults to false

`semantic_convention`

The semantic convention naming you want to use. The default is an empty string, which uses the original naming convention prior to 1.27. For the semantic convention of 1.27 and higher, use "1.27".

Possible values are: "" , "1.27"

`traces_static_attributes` array

Static attributes you want to pass for traces.

`proxy` object

Reports the activity at the beginning of the proxy layer, including spawning the required requests to multiple backends, merging, endpoint transformation, and any other internals of the proxy between the request processing and the backend communication.

`disable_metrics` boolean

Whether you want to disable all metrics happening in the proxy layer or not.

Defaults to false

`disable_traces` boolean

Whether you want to disable all traces happening in the proxy layer or not.

Defaults to false

`metrics_static_attributes` array

Static attributes you want to pass for metrics.

`report_headers` boolean

Whether you want to report all headers that passed from the request to the proxy layer (input_headers policy in the endpoint plus KrakenD’s headers).

Defaults to false

`traces_static_attributes` array

Static attributes you want to pass for traces.
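The layer settings above can be sketched as follows. The attribute key and value (region, eu-west) are made-up examples, the exporter is a placeholder for the required exporters entry, and the backend values simply mirror flags documented for that layer.

```json
{
    "version": 3,
    "extra_config": {
        "telemetry/opentelemetry": {
            "exporters": {
                "prometheus": [
                    {
                        "name": "local_prometheus"
                    }
                ]
            },
            "layers": {
                "global": {
                    "disable_propagation": true,
                    "report_headers": false,
                    "semantic_convention": "1.27",
                    "metrics_static_attributes": [
                        {
                            "key": "region",
                            "value": "eu-west"
                        }
                    ]
                },
                "proxy": {
                    "disable_metrics": true,
                    "disable_traces": false
                },
                "backend": {
                    "metrics": {
                        "disable_stage": false
                    },
                    "traces": {
                        "disable_stage": false
                    }
                }
            }
        }
    }
}
```

In this sketch the global layer ignores incoming propagation headers, and the proxy layer reports traces but no metrics.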

`metric_reporting_period` integer

How often you want to report and flush the metrics in seconds. This setting is only used by otlp exporters.

Defaults to 30
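For example, the sketch below flushes metrics every 60 seconds by default, while one exporter overrides that with its own 10-second period through custom_metric_reporting_period. The exporter names and hosts are hypothetical.

```json
{
    "version": 3,
    "extra_config": {
        "telemetry/opentelemetry": {
            "metric_reporting_period": 60,
            "exporters": {
                "otlp": [
                    {
                        "name": "default_period_collector",
                        "host": "collector-a.example.internal"
                    },
                    {
                        "name": "fast_period_collector",
                        "host": "collector-b.example.internal",
                        "custom_metric_reporting_period": 10
                    }
                ]
            }
        }
    }
}
```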

`service_name` string

A friendly name identifying metrics reported by this installation. When unset, it uses the name attribute in the root level of the configuration.

`service_version` string

The version you are deploying, which can be useful for deployment tracking.

`skip_paths` array

The paths you don’t want to report. Use the literal value used in the endpoint definition, including any {placeholders}. In the global layer, this attribute works only on metrics, because traces are initiated before there is an endpoint to match against. If you do not want any path skipped, just add an array with an empty string [""].

Example: "/foo/{bar}"

Defaults to ["/__health","/__debug/","/__echo/","/__stats/"]
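As a sketch, the configuration below skips one custom endpoint in addition to the internal ones. It assumes that declaring skip_paths replaces the default list rather than extending it, which is why the defaults are repeated; verify this against your version. The exporter is a placeholder for the required exporters entry.

```json
{
    "version": 3,
    "extra_config": {
        "telemetry/opentelemetry": {
            "exporters": {
                "prometheus": [
                    {
                        "name": "local_prometheus"
                    }
                ]
            },
            "skip_paths": [
                "/__health",
                "/__debug/",
                "/__echo/",
                "/__stats/",
                "/foo/{bar}"
            ]
        }
    }
}
```

To report every path instead, the list would be [""], as described above.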

`trace_sample_rate` number

The sample rate for traces defines the percentage of reported traces. This option is key to reducing the amount of data generated (and resource usage) while still letting you debug and troubleshoot issues. For instance, a value of 0.25 reports 25% of the traces seen in the system.

Example: 0.25

Defaults to 1
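A minimal sketch of sampling, with a hypothetical OTLP exporter: with a rate of 0.25, roughly one in four traces is reported, so about 250 out of every 1,000.

```json
{
    "version": 3,
    "extra_config": {
        "telemetry/opentelemetry": {
            "trace_sample_rate": 0.25,
            "exporters": {
                "otlp": [
                    {
                        "name": "remote_collector",
                        "host": "otel-collector.example.internal"
                    }
                ]
            }
        }
    }
}
```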

Schema: https://www.krakend.io/schema/v2.10/telemetry/opentelemetry.json

Here’s an example that pushes to a Grafana Tempo over OTLP and exposes a Prometheus endpoint.

{
    "version": 3,
    "$schema": "https://www.krakend.io/schema/v2.10/krakend.json",
    "extra_config": {
        "telemetry/opentelemetry": {
            "service_name": "krakend_middle_service",
            "service_version": "commit-sha-ACBDE1234",
            "deploy_env": "production",
            "exporters": {
                "prometheus": [
                    {
                        "name": "my_prometheus",
                        "port": 9092,
                        "listen_ip": "::1",
                        "process_metrics": false,
                        "go_metrics": false
                    }
                ],
                "otlp": [
                    {
                        "name": "local_tempo",
                        "host": "localhost",
                        "port": 4317,
                        "use_http": false
                    }
                ]
            },
            "layers": {
                "global": {
                    "disable_metrics": false,
                    "disable_traces": false,
                    "disable_propagation": false
                },
                "proxy": {
                    "disable_metrics": false,
                    "disable_traces": false
                },
                "backend": {
                    "metrics": {
                        "disable_stage": false,
                        "round_trip": true,
                        "read_payload": true,
                        "detailed_connection": true,
                        "static_attributes": [
                            {
                                "key": "my_metric_attr",
                                "value": "my_middle_metric"
                            }
                        ]
                    },
                    "traces": {
                        "disable_stage": false,
                        "round_trip": true,
                        "read_payload": true,
                        "detailed_connection": true,
                        "static_attributes": [
                            {
                                "key": "my_metric_attr",
                                "value": "my_middle_metric"
                            }
                        ]
                    }
                }
            },
            "skip_paths": [
                "/foo/{bar}"
            ]
        }
    }
}

Examples of integrations

Push metrics to InfluxDB
Pull metrics from Prometheus
Push metrics to Datadog
Push metrics to Zipkin
Push metrics to Jaeger
Push metrics to Azure Monitor
