Document updated on Aug 22, 2022
WebSockets Integration
KrakenD Enterprise supports communications using the WebSocket Protocol (RFC-6455) to enable two-way communication between a client and a backend host through the API gateway. This technology aims to provide a mechanism for browser-based applications that need two-way communication with servers, one that does not rely on opening multiple HTTP connections.
KrakenD has the capability of multiplexing. Each individual end client (e.g., Desktop, Mobile device) establishes a connection with the gateway directly, and KrakenD opens a single channel with the backend host to handle all its connected clients.
The backend decides whether to send the responses to a specific set of clients or to broadcast to everyone. This decision depends on the message format.
Websockets configuration
The configuration to enable WebSockets is straightforward; the only requirements are that you include the websocket namespace at the endpoint level, and that you declare at least one backend host using the ws:// or wss:// schemes.
For each endpoint, KrakenD will open a single connection against one of the hosts. The hosts are load balanced in a Round-Robin fashion, but once the session is established it is kept permanently.
The flag "disable_host_sanitize": true is also necessary for the backend.
Here is an example:
{
  "endpoint": "/ws/{room}",
  "input_query_strings": ["*"],
  "input_headers": ["*"],
  "backend": [
    {
      "url_pattern": "/ws",
      "disable_host_sanitize": true,
      "host": [
        "ws://localhost:8081",
        "ws://localhost:8082"
      ]
    }
  ],
  "extra_config": {
    "websocket": {
      "input_headers": [
        "Cookie",
        "Authorization"
      ],
      "connect_event": true,
      "disconnect_event": true,
      "read_buffer_size": 4096,
      "write_buffer_size": 4096,
      "message_buffer_size": 4096,
      "max_message_size": 3200000,
      "write_wait": "10s",
      "pong_wait": "60s",
      "ping_period": "54s",
      "max_retries": 0,
      "backoff_strategy": "exponential"
    }
  }
}
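The requirements listed above can be checked before deploying a configuration. The following is a minimal sketch, not part of KrakenD: the function name check_ws_endpoint and its messages are hypothetical, and it only verifies the three points this page states (the websocket namespace exists, disable_host_sanitize is true, and at least one host uses ws:// or wss://).

```python
from urllib.parse import urlparse


def check_ws_endpoint(endpoint: dict) -> list:
    """Return a list of problems found in one endpoint definition (hypothetical helper)."""
    problems = []

    # The websocket namespace must exist at the endpoint level; an empty object is enough.
    if "websocket" not in endpoint.get("extra_config", {}):
        problems.append("missing 'websocket' namespace in extra_config")

    hosts = []
    for index, backend in enumerate(endpoint.get("backend", [])):
        # The page states this flag is also necessary for the backend.
        if backend.get("disable_host_sanitize") is not True:
            problems.append(f"backend {index}: disable_host_sanitize must be true")
        hosts.extend(backend.get("host", []))

    # At least one backend host has to use the ws:// or wss:// scheme.
    if not any(urlparse(host).scheme in ("ws", "wss") for host in hosts):
        problems.append("no backend host uses ws:// or wss://")

    return problems
```

Calling check_ws_endpoint with the endpoint object from the example returns an empty list; any returned string describes a requirement that is not met.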
All the fields inside websocket are optional, allowing you to declare an empty object "websocket": {}. The additional options are:
Fields of Schema definition for Websockets
backoff_strategy
When the connection to your event source gets interrupted for whatever reason, KrakenD keeps trying to reconnect until it succeeds or until it reaches the max_retries. The backoff strategy defines the delay in seconds between consecutive failed retries. Possible values are: "linear", "linear-jitter", "exponential", "exponential-jitter", "fallback". Defaults to "fallback".
connect_event
boolean - Whether to send notification events to the backend or not when a user establishes a new Websockets connection. Defaults to false.
disconnect_event
boolean - Whether to send notification events to the backend or not when users disconnect from their Websockets connection. Defaults to false.
input_headers
array - Defines which input headers are allowed to pass to the backend. Notice that you need to declare the input_headers at the endpoint level too. Defaults to [].
max_message_size
integer - Sets the maximum size of messages in bytes sent by or returned to the client. Messages larger than this value are discarded by KrakenD and the client is disconnected. Defaults to 512.
max_retries
integer - The maximum number of times you will allow KrakenD to retry reconnecting to a broken messaging system. Use a value <= 0 for unlimited retries. Defaults to 0.
message_buffer_size
integer - Sets the maximum number of messages each client can have in the buffer waiting to be processed. As this is a per-client setting, you must forecast how many consumers of KrakenD websockets you will have. The default value may be too high (memory consumption) if you expect thousands of clients consuming simultaneously. Defaults to 256.
ping_period
Sets the time between pings checking the health of the system. Specify units using ns (nanoseconds), us or µs (microseconds), ms (milliseconds), s (seconds), m (minutes), or h (hours). Defaults to "54s".
pong_wait
Sets the maximum time KrakenD will wait until the pong times out. Specify units using ns (nanoseconds), us or µs (microseconds), ms (milliseconds), s (seconds), m (minutes), or h (hours). Defaults to "60s".
read_buffer_size
integer - Connections buffer network input and output to reduce the number of system calls when reading messages. You can set the maximum buffer size for reading in bytes. Defaults to 1024.
return_error_details
boolean - Provides an error {'error':'reason here'} to the client when KrakenD is unable to send the message to the backend. Defaults to false.
write_buffer_size
integer - Connections buffer network input and output to reduce the number of system calls when writing messages. You can set the maximum buffer size for writing in bytes. Defaults to 1024.
write_wait
Sets the maximum time KrakenD will wait until the write times out. Specify units using ns (nanoseconds), us or µs (microseconds), ms (milliseconds), s (seconds), m (minutes), or h (hours). Defaults to "10s".
Broadcast vs. directed communication.
No matter how many clients connect to KrakenD through WebSockets, KrakenD keeps one connection with the backend server per endpoint (as depicted above). So, for instance, you might have 1000 concurrent users in a chat room (endpoint /chat) with 1000 sockets opened against KrakenD, but KrakenD still communicates with your backend using one channel.
Depending on your business type, you might have two different needs to communicate with the users connected to the gateway:
- Send a message to all your clients (broadcast)
- Send a message only to some user(s) (directed communication)
By default, KrakenD broadcasts unrecognized messages it receives to all clients. However, the message format allows you to specify how you would like KrakenD to process your backend responses (broadcast to all connected users or just to a specific session).
Message format
The message format is the mechanism that KrakenD uses to determine if a message needs broadcasting or addressing a specific set of clients.
The recognized message format for bidirectional communication is a JSON object with the following fields:
- body: Mandatory. A base64 representation of the content.
- session: The session is a filter to determine who is the sender or the receiver of a message.
- url: The affected KrakenD endpoint.
KrakenD message for the backend
When the client interacts with KrakenD, the gateway sends messages to the backend with a structure like the following:
{
  "url": "/chat/krakend",
  "session": {
    "uuid": "0b251b07-5611-49e5-b69f-cf2cb8d339d6",
    "Room": "krakend"
  },
  "body": "SGVsbG8gV29ybGQh"
}
The client sent what you can see in the body (“Hello World!” base64 encoded).
The rest of the fields are contextual metadata for your convenience, so the backend can determine who is doing the call and from which originating endpoint.
Essential observations on the example above are:
- The body is base64 encoded.
- The session contains information about the client making the request. At a minimum, you always receive a uuid randomly assigned by KrakenD to the client when a new session starts.
- If your endpoint contains placeholders (e.g., as in /chat/{room}), the placeholder parameters are available under session, but with the first letter uppercased. In this example, Room holds the value of {room}.
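On the backend side, unpacking such an envelope could look like the sketch below. It assumes the backend receives the message as a JSON text frame; the function name decode_envelope and the shape of its return value are hypothetical choices for illustration, not something KrakenD prescribes.

```python
import base64
import json


def decode_envelope(raw: str) -> dict:
    """Split a KrakenD envelope into its metadata and the decoded client payload."""
    envelope = json.loads(raw)
    session = envelope.get("session", {})
    return {
        # Endpoint the client is connected to, e.g. "/chat/krakend".
        "url": envelope.get("url"),
        # Identifier KrakenD assigned to the client session.
        "uuid": session.get("uuid"),
        # Remaining session keys, such as placeholder values (e.g. "Room").
        "params": {key: value for key, value in session.items() if key != "uuid"},
        # The body is mandatory and base64 encoded.
        "body": base64.b64decode(envelope["body"]),
    }
```

With the example message above, the decoded body is the bytes of “Hello World!” and params contains the Room placeholder.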
Backend message for KrakenD
The return from your WS backend needs to use the expected format if you don’t want to broadcast it.
The response body is mandatory, and KrakenD forwards it to all connected clients matching the filters you pass. If you don’t give any filters, the message is broadcast. The filters are a combination of the session and the url.
If, for instance, you only want to communicate with a specific user, you would have an answer like this one:
{
"session": { "uuid": "0b251b07-5611-49e5-b69f-cf2cb8d339d6"},
"body": "SGVsbG8gV29ybGQh"
}
If you want to communicate with all users connected to an endpoint, then you could use:
{
"url": "/chat/krakend",
"body": "SGVsbG8gV29ybGQh"
}
And a broadcast:
{
"body": "SGVsbG8gV29ybGQh"
}
Notice that when the JSON fields url or session exist, the body is sent to the specific subgroup instead of being broadcast.
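The three reply shapes differ only in which filters are present, so a backend can build them with one helper. This is a minimal sketch; build_reply and its parameters are hypothetical names, and it only covers the uuid session filter used in the examples on this page.

```python
import base64
import json
from typing import Optional


def build_reply(payload: bytes, uuid: Optional[str] = None, url: Optional[str] = None) -> str:
    """Build a backend message for KrakenD; omitting both filters yields a broadcast."""
    # The body is mandatory and must be base64 encoded.
    message = {"body": base64.b64encode(payload).decode("ascii")}
    if uuid is not None:
        # Address a single client session.
        message["session"] = {"uuid": uuid}
    if url is not None:
        # Address every client connected to one endpoint.
        message["url"] = url
    return json.dumps(message)


if __name__ == "__main__":
    print(build_reply(b"Hello World!", uuid="0b251b07-5611-49e5-b69f-cf2cb8d339d6"))
    print(build_reply(b"Hello World!", url="/chat/krakend"))
    print(build_reply(b"Hello World!"))
```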
Handshaking
The WS protocol consists of an opening handshake followed by basic message framing layered over TCP.
The handshake process is straightforward but necessary for KrakenD to determine if the backend server is alive. KrakenD manages it, which consists of nothing but opening a WebSocket against the backend server and trying to find a response (like a ping/pong). We can simplify it as follows:
krakend => {"msg":"KrakenD WS proxy starting"}
backend => OK
After the successful handshake, KrakenD is ready to start serving and communicating with WS.
The backend must reply with the OK string. KrakenD requires this string to make sure you are aware that a multiplexed connection requires you to handle an envelope from now on.
Backoff strategies
The backoff_strategy setting defines how KrakenD keeps trying to reconnect to the backend until it succeeds or until it reaches the max_retries. The backoff strategy defines the delay in seconds between consecutive failed retries. Defaults to fallback:
- linear: The delay time (d) grows linearly after each failed retry (r) using the formula d = r. E.g., 1st failure retries in 1s, 2nd failure in 2s, 3rd in 3s, and so on.
- linear-jitter: Similar to linear but adds or subtracts a random number: d = r ± random. The randomness prevents all agents connected to a mutual service from retrying simultaneously, as all have a slightly different delay. The random number never exceeds ±r*0.33.
- exponential: Multiplicatively increases the time between retries using d = 2^r. E.g.: 2s, 4s, 8s, 16s…
- exponential-jitter: Same as exponential, but adds or subtracts a random number up to 33% of the value using d = 2^r ± random. This is the preferred strategy when you want to protect the system you are consuming.
- fallback: When the strategy is missing or is none of the above (e.g., fallback), KrakenD uses a constant backoff strategy, d = 1, and retries after one second every time.
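The formulas above can be compared side by side with a small calculation. The sketch below is an illustration, not KrakenD's implementation: it assumes the retry counter r starts at 1 and that the jitter is drawn uniformly within ±33% of the base delay, which this page does not specify. The name backoff_delay is hypothetical.

```python
import random
from typing import Optional

JITTER_RATIO = 0.33  # the random part never exceeds 33% of the base delay


def backoff_delay(strategy: str, retry: int, rng: Optional[random.Random] = None) -> float:
    """Return the delay in seconds before the given retry (1-based) for a strategy."""
    rng = rng or random.Random()
    if strategy in ("linear", "linear-jitter"):
        base = float(retry)        # d = r
    elif strategy in ("exponential", "exponential-jitter"):
        base = float(2 ** retry)   # d = 2^r
    else:
        return 1.0                 # fallback: constant one-second delay
    if strategy.endswith("-jitter"):
        # Assumption: uniform jitter inside the documented +/-33% bound.
        base += rng.uniform(-JITTER_RATIO * base, JITTER_RATIO * base)
    return base


if __name__ == "__main__":
    for name in ("linear", "linear-jitter", "exponential", "exponential-jitter", "fallback"):
        print(name, [round(backoff_delay(name, r), 2) for r in range(1, 5)])
```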
Understanding WebSockets logs
The nature of WebSocket connections is that they have kind of a “state” and use a lasting connection. Therefore, there are implications to be aware of when connectivity issues or downtimes arise.
Generally speaking, you can read the different levels of errors as:
- WARNING: There are connectivity issues with the backend
- ERROR: There are problems renegotiating the connection
- CRITICAL: The WebSocket connection is lost for good
WS Issues during startup
The most visible problem of all. If, for whatever reason, the WebSocket on the backend server is not available during KrakenD startup, KrakenD won’t start. This event shows in the console with a CRITICAL message like this one:
▶ CRITICAL [SERVICE: Websocket] websocket.Dial ws://localhost:8081/ws: dial tcp [::1]:8081: connect: connection refused
If KrakenD relies on WebSocket communication, ensure that the backends are alive when KrakenD starts. Otherwise, the gateway won’t start.
WS Issues during operation
If, on the other hand, the handshake succeeded, but at a given point in time, the backend server or the network connection with the WS dies, the affected endpoint becomes non-operational.
All clients connected to KrakenD during the downtime of your backend’s WebSocket keep their connection with KrakenD, even though KrakenD cannot pass any data from/to the backend server.
KrakenD will keep retrying a broken connection as defined through max_retries and backoff_strategy. While KrakenD is disconnected from the backend, the log will show WARNING messages when clients demand information from it, for instance:
▶ KRAKEND WARNING: [SERVICE: Websocket][Client] Reading from the connection: EOF
Following the backoff_strategy, KrakenD will keep trying to fix this problem, but for each failed retry, KrakenD will show an ERROR in the log:
▶ KRAKEND ERROR: [SERVICE: Websocket][Client] Unable to renew the connection: websocket.Dial ws://localhost:8888/ws: dial tcp 127.0.0.1:8888: connect: connection refused
While KrakenD is retrying the connection with the backend, all writes remain in the queue.
If you have set a limited number of max_retries (greater than 0), KrakenD stops trying once it has exhausted all the retries and forgets the WebSocket connection. You can see this state with a CRITICAL in the logs:
▶ KRAKEND CRITICAL: [SERVICE: Websocket][Client] Unable to reconnect to the backend: websocket.Dial ws://localhost:8888/ws: dial tcp 127.0.0.1:8888: connect: connection refused
In addition, all remaining queued messages will show an error after the critical, as well as new ones:
▶ KRAKEND ERROR: [SERVICE: Websocket] Writing request: empty connection
The client will receive an error too:
{"error":"empty connection"}
At this point, KrakenD stops trying, and you must restart the service. Of course, you can always set max_retries to 0 to keep trying indefinitely.
Another log you can see is:
KRAKEND WARNING: [SERVICE: Websocket][Client] Reading from the connection: websocket: read limit exceeded
When you see the log above, it is because the client or the backend sent a message larger than permitted by the configuration. The offender will receive a close 1009 (message too big) followed by a disconnect.
Example of failing websocket with max_retries=1
The following is an example log of a websocket that failed and couldn’t reconnect on the single retry we allowed in the configuration (max_retries=1):
▶ KRAKEND WARNING: [SERVICE: Websocket][Client] Reading from the connection: EOF
▶ KRAKEND ERROR: [SERVICE: Websocket][Client] Unable to renew the connection: websocket.Dial ws://localhost:8888/ws: dial tcp 127.0.0.1:8888: connect: connection refused
▶ KRAKEND CRITICAL: [SERVICE: Websocket][Client] Unable to reconnect to the backend: websocket.Dial ws://localhost:8888/ws: dial tcp 127.0.0.1:8888: connect: connection refused
▶ KRAKEND ERROR: [SERVICE: Websocket] Writing request: empty connection
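When monitoring these logs, it can help to reduce a batch of lines to the most severe WebSocket state they report. The sketch below is one hypothetical way to do it with the three levels described in this section; the names worst_level and MEANING are illustrative, and the matching relies only on the line shapes shown in the examples on this page.

```python
import re

# Levels in increasing severity, with the meaning given in this section.
MEANING = {
    "WARNING": "connectivity issues with the backend",
    "ERROR": "problems renegotiating the connection",
    "CRITICAL": "the WebSocket connection is lost for good",
}
ORDER = ["WARNING", "ERROR", "CRITICAL"]
LEVEL_RE = re.compile(r"\b(WARNING|ERROR|CRITICAL)\b")


def worst_level(lines):
    """Return the most severe level found in WebSocket log lines, or None."""
    worst = None
    for line in lines:
        # Only consider lines emitted by the WebSocket service.
        if "[SERVICE: Websocket]" not in line:
            continue
        match = LEVEL_RE.search(line)
        if match and (worst is None or ORDER.index(match.group(1)) > ORDER.index(worst)):
            worst = match.group(1)
    return worst
```

Applied to the four lines of the example above, worst_level returns CRITICAL, which means the connection will not recover without a restart.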