You are viewing a previous version of KrakenD Enterprise Edition (v2.3), go to the latest version

Document updated on Aug 22, 2022

WebSockets

KrakenD Enterprise supports communications using the WebSocket Protocol (RFC-6455) to enable two-way communication between a client to a backend host through the API gateway. This technology aims to provide a mechanism for browser-based applications that need two-way communication with servers that do not rely on opening multiple HTTP connections.

KrakenD has the capability of multiplexing. Each individual end client (e.g., Desktop, Mobile device) establishes a connection with the gateway directly, and KrakenD opens a single channel with the backend host to handle all its connected clients.

The backend decides whether to send the responses to a specific set of client(s) or to broadcast to everyone. This decision depends on the message format

Websockets configuration

The configuration to enable WebSockets is straightforward; the only requirement is to include the websocket namespace at the endpoint level, and that you declare at least one backend host using the ws:// or wss:// schemas.

For each endpoint, KrakenD will open a single connection against one of the hosts. The hosts are load balanced in a Round-Robin fashion but the session once is established is kept permamently.

The flag "disable_host_sanitize": true is also necessary for the backend.

Here there is an example:

{
    "endpoint": "/ws/{room}",
    "input_query_strings": ["*"],
    "input_headers": ["*"],
    "backend": [
        {
            "url_pattern": "/ws",
            "disable_host_sanitize": true,
            "host": [
                "ws://localhost:8081",
                "ws://localhost:8082",
            ]
        }
    ],
    "extra_config": {
        "websocket": {
            "input_headers": [
                "Cookie",
                "Authorization"
            ],
            "connect_event": true,
            "disconnect_event": true,
            "read_buffer_size": 4096,
            "write_buffer_size": 4096,
            "message_buffer_size": 4096,
            "max_message_size": 3200000,
            "write_wait": "10s",
            "pong_wait": "60s",
            "ping_period": "54s",
            "max_retries": 0,
            "backoff_strategy": "exponential"
        }
    }
}

All the fields inside websocket are optional, allowing you to declare an empty object "websocket": {}. The additional options are:

input_headers (list - optional): Defines which input headers are allowed to pass to the backend. Notice that you must also declare the input_headers at the endpoint level.
connect_event (boolean - optional): Notifies in the log when there is the client connect event
disconnect_event (boolean - optional): Notifies in the log when there is a client disconnect event
read_buffer_size (integer - optional): Connections buffer network input and output to reduce the number of system calls when reading messages. You can set the maximum buffer size for reading in bytes. Defaults to 1024 bytes.
write_buffer_size (integer - optional): Connections buffer network input and output to reduce the number of system calls when writing messages. You can set the maximum buffer size for writing in bytes. Defaults to 1024 bytes.
message_buffer_size (integer - optional): Sets the maximum buffer size for messages (in bytes). Defaults to 256.
max_message_size (integer - optional): Sets the maximum size of client messages (in bytes). Defaults to 512.
write_wait (duration - optional): Sets the maximum time KrakenD will wait until the write times out. Defaults to 10s.
pong_wait (duration - optional): Sets the maximum time KrakenD will wait until the pong times out . Defaults to60s.
ping_period (duration - optional): Sets the time between pings to check the system’s health. Defaults to 54s.
max_retries (integer - optional): The maximum number of times you will allow KrakenD to retry reconnecting to a broken system. The absence of parameter, 0, or a negative value means unlimited retries. Older versions used -1 for unlimited.
return_error_details (boolean - optional): Provides an error {"error":"reason here"} to the client when KrakenD was unable to send the message to the backend.
backoff_strategy (string - optional): When the connection to your event source gets interrupted for whatever reason, KrakenD keeps trying to reconnect until it succeeds or until it reaches the max_retries. The backoff strategy defines the delay in seconds in between consecutive failed retries. Defaults to fallback:
- linear: The delay time (d) grows linearly after each failed retry (r) using the formula d = r. E.g., 1st failure retries in 1s, 2nd failure in 2s, 3rd in 3s, and so on.
- linear-jitter: Similar to linear but adds or subtracts a random number: d = r ± random. The randomness prevents all agents connected to a mutual service from retrying simultaneously as all have a slightly different delay. The random number never exceeds ±r*0.33
- exponential: Multiplicatively increase the time between retries using d = 2^r. E.g: 2s, 4s, 8s, 16s…
- exponential-jitter: Same as exponential, but adds or subtracts a random number up to 33% of the value using d = 2^r ± random. This is the preferred strategy when you want to protect the system you are consuming.
- Fallback: When the strategy is missing or none of the above (e.g.:fallback) then it will use constant backoff strategy d=1. Will retry after one second every time.

Broadcast vs. directed communication.

No matter how many clients connect to KrakenD through WebSockets, KrakenD keeps one connection with the backend server per endpoint (as depicted above). So, for instance, you might have 1000 concurrent users in a chat room (endpoint /chat) with 1000 sockets opened against KrakenD, but KrakenD still communicates with your backend using one channel.

Depending on your business type, you might have two different needs to communicate with the users connected to the gateway:

Send a message to all your clients (broadcast)
Send a message only to some user(s) (directed communication)

By default, unrecognized messages received by KrakenD do broadcast to all clients. However, the message format allows you to specify how you would like KrakenD to process your backend responses (broadcast to all connected users or just to a specific session).

Message format

The message format is the mechanism that KrakenD uses to determine if a message needs broadcasting or addressing a specific set of clients.

The recognized message format for bidirectional communication is a JSON object with the following fields:

body: Mandatory. A base64 representation of the content.
session: The session is a filter to determine who is the sender or the receiver of a message.
url: The affected KrakenD endpoint.

KrakenD message for the backend

When the client interacts with KrakenD, the gateway sends messages to the backend with a structure like the following:

{
    "url": "/chat/krakend",
    "session": {
        "uuid": "0b251b07-5611-49e5-b69f-cf2cb8d339d6",
        "Room":"krakend",
    },
    "body": "SGVsbG8gV29ybGQh"
}

The client sent what you can see in the body (“Hello World!” base64 encoded).

The rest of the fields are contextual metadata for your convenience, so the backend can determine who is doing the call and from which originating endpoint.

Essential observations on the example above are:

The body is base64 encoded.
The session contains information about the client making the request. At least you always receive an uuid randomly assigned by KrakenD to the client when a new session starts.
If your endpoint contains placeholders (e.g., as in /chat/{room}), the placeholder parameters are available under session, but using the first letter uppercased. In this example, Room holds the value of {room}).

Backend message for KrakenD

The return from your WS backend needs to use the expected format if you don’t want to broadcast it.

The response body is mandatory, and KrakenD forwards to all connected clients matching the filters you pass. We are talking about broadcasting if you don’t give any filters. The filters are a combination of the session and the url.

If, for instance, you only want to communicate with a specific user, you would have an answer like this one:

{
    "session": { "uuid": "0b251b07-5611-49e5-b69f-cf2cb8d339d6"},
    "body": "SGVsbG8gV29ybGQh"
}

If you want to communicate with all users connected to an endpoint, then you could use:

{
    "url": "/chat/krakend",
    "body": "SGVsbG8gV29ybGQh"
}

And a broadcast:

{
    "body": "SGVsbG8gV29ybGQh"
}

Notice that when the JSON fields url or session exist, the body is sent to the specific subgroup instead of being broadcasted.

Handshaking

The WS protocol consists of an opening handshake followed by basic message framing layered over TCP.

The handshake process is straightforward but necessary for KrakenD to determine if the backend server is alive. KrakenD manages it, which consists of nothing but opening a WebSocket against the backend server and trying to find a response (like a ping/pong). We can simplify it as follows:

krakend => {"msg":"KrakenD WS proxy starting"}
backend => OK

After the successful handshake, KrakenD is ready to start serving and communicating with WS.

Replying with an ‘OK’ is mandatory

The WebSocket server must reply with an OK string. KrakenD requires this string to make sure you are aware that a multiplexed connection requires you to treat an envelope from now on.

Understanding WebSockets logs

The nature of WebSocket connections is that they have kind of a “state” and use a lasting connection. Therefore, there are implications to be aware of when connectivity issues or downtimes arise.

Generally speaking, you can read the different levels of errors as:

WARNING: There are connectivity issues with the backend
ERROR: There are problems renegotiating the connection
CRITICAL: The WebSocket connection is lost for good

WS Issues during startup

The most visible problem of all. If for whatever reason, the WebSocket on the backend server is not available during KrakenD startup, KrakenD won’t start. This event shows in the console with a CRITICAL message like this one:

▶ CRITICAL [SERVICE: Websocket] websocket.Dial ws://localhost:8081/ws: dial tcp [::1]:8081: connect: connection refused

If KrakenD relies on WebSocket communication, ensure that the backends are alive when KrakenD starts. Or otherwise, the gateway won’t.

WS Issues during operation

If, on the other hand, the handshake succeeded, but at a given point in time, the backend server or the network connection with the WS dies, the affected endpoint becomes non-operational.

All clients connected to KrakenD during the downtime of your backend’s WebSocket keep their connection with KrakenD, even though KrakenD cannot pass any data from/to the backend server.

KrakenD will keep retrying a broken connection as defined through max_retries and backoff_strategy. While KrakenD is disconnected from the backend, the log will show WARNING messages when clients demand information from it, for instance:

▶ KRAKEND WARNING: [SERVICE: Websocket][Client] Reading from the connection: EOF

Following the backoff_strategy, KrakenD will keep trying to fix this problem, but for each failed retry, KrakenD will show an ERROR in the log:

▶ KRAKEND ERROR: [SERVICE: Websocket][Client] Unable to renew the connection: websocket.Dial ws://localhost:8888/ws: dial tcp 127.0.0.1:8888: connect: connection refused

While the connection with the backend is retrying, all writes remain in queue.

If you have set a limited number of max_retries (greater than 0), when KrakenD has exhausted all the retries, KrakenD will stop trying, and KrakenD will forget the WebSocket connection. You can see this state with a CRITICAL in the logs.

 ▶ KRAKEND CRITICAL: [SERVICE: Websocket][Client] Unable to reconnect to the backend: websocket.Dial ws://localhost:8888/ws: dial tcp 127.0.0.1:8888: connect: connection refused

In addition, all remaining queued messages will show an error after the critical, as well as new ones:

▶ KRAKEND ERROR: [SERVICE: Websocket] Writing request: empty connection

The client will receive an error too:

{"error":"empty connection"}

At this point, KrakenD stops trying, and you must restart the service. Of course, you can always set max_retries to 0 to keep trying indefinitely.

Example of failing websocket with `max_retries=1`

The following is an example log of a websocket that failed and couldn’t reconnect on the single retry we allowed in the configuration (max_retries=1)

▶ KRAKEND WARNING: [SERVICE: Websocket][Client] Reading from the connection: EOF
▶ KRAKEND ERROR: [SERVICE: Websocket][Client] Unable to renew the connection: websocket.Dial ws://localhost:8888/ws: dial tcp 127.0.0.1:8888: connect: connection refused
▶ KRAKEND CRITICAL: [SERVICE: Websocket][Client] Unable to reconnect to the backend: websocket.Dial ws://localhost:8888/ws: dial tcp 127.0.0.1:8888: connect: connection refused
▶ KRAKEND ERROR: [SERVICE: Websocket] Writing request: empty connection

Enterprise Documentation

WebSockets

Websockets configuration

Broadcast vs. directed communication.

Message format

KrakenD message for the backend

Backend message for KrakenD

Handshaking

Understanding WebSockets logs

WS Issues during startup

WS Issues during operation

Example of failing websocket with `max_retries=1`

Unresolved issues?

Enterprise Documentation

WebSockets

Websockets configuration

Broadcast vs. directed communication.

Message format

KrakenD message for the backend

Backend message for KrakenD

Handshaking

Understanding WebSockets logs

WS Issues during startup

WS Issues during operation

Example of failing websocket with max_retries=1

Unresolved issues?

Example of failing websocket with `max_retries=1`