KrakenD EE v2.10: AI Gateway & Quota management

by Albert Lombarte

KrakenD AI Gateway

With the release of KrakenD Enterprise Edition v2.10, we’re thrilled to unveil a game-changing innovation that marks a new era in API management: AI Gateway functionality. This release includes several powerful enhancements to logging, quota management, observability, and developer tooling to better adopt AI in your company. Let’s dive into the highlights.

The KrakenD AI Gateway Vision

As AI adoption explodes, organizations face pressure to integrate intelligent systems into their architectures without compromising control, cost, or compliance. KrakenD’s AI Gateway addresses this challenge head-on by turning your existing API infrastructure into a powerful, secure, and governable AI delivery platform.

Unlike black-box SaaS offerings, KrakenD offers an on-premise-first, zero-trust-ready solution that puts your engineering team in full control of data, routing, authorization, and observability.

KrakenD AI Gateway does not require an extra license to manage and does not add vendor lock-in. The AI Gateway functionality is natively embedded into KrakenD Enterprise, enabling you to:

Securely route and transform calls to multiple LLMs, such as OpenAI, Claude, Gemini, Mistral, or your own models.
Govern interactions at scale with isolated authorization, rate-limiting, and policy-based enforcement.
Control costs proactively via token quota enforcement, budget tiers, and intelligent vendor routing.
Unify LLM interfaces through configuration and provide vendor abstraction

With KrakenD, you don’t bolt AI onto your system as an afterthought, you extend your API with AI-native capabilities, purpose-built for modern, latency-sensitive, and compliance-heavy environments.

Note: KrakenD does not use AI in its engine, but facilitates the governance of such systems.

Four Pillars of KrakenD’s AI Gateway

Security: Enforce zero-trust AI architectures with isolated authorization flows, hidden API keys, request/response data masking, and outbound exfiltration prevention.
Cost Control: Prevent runaway bills by setting persistent token quotas and enforcing consumption tiers, all baked into the gateway.
Governance: Apply fine-grained control over how users interact with models through prompt validation, multi-LLM routing strategies, reusable prompt templates, and policy enforcement.
Unified LLM Interface (Developer Abstraction): Standardize development across LLMs using vendor-agnostic endpoints and body generators that adapt user payloads to the specific format each model expects, from GPT-4 to Claude or Cohere.

Discover AI Gateway

Stateful Quota Management: Granular API Governance

Another standout capability introduced in KrakenD EEv2.10 is the new Quota component, designed for teams seeking persistent, tier-aware traffic governance. It’s more than just a rate-limiter, this is API monetization and API-as-a-Product strategy.

Quota vs Rate Limit

The purpose of a rate limit is to prevent abuse because it monitors a short period (like a second or minute). In contrast, the purpose of a quota is more closely related to usage control as it monitors a longer period (a day, month, etc.), allowing you to productize your APIs (or control your AI expense)

KrakenD’s quota engine operates with long-term stateful tracking, using Redis to persist counters across nodes and deployments. It enables:

Differentiated service plans (e.g., Free, Gold, Enterprise).
Token-based usage caps for LLMs or metered APIs.
Ingress and egress governance, ensuring external consumption and internal costs stay controlled.

The Quota key features are:

Multi-dimensional enforcement: Apply simultaneous quotas by hour, day, month, and year.
Tier-based policies: Distinct limits for users based on identity, membership, or subscription.
Weighted accounting: Track usage based on dynamic metrics like LLM token cost or API response payloads.
Deployment-survivable state: Unlike in-memory rate limits, quotas survive restarts and are shared across your KrakenD cluster.

Quotas documentation

Enhanced Developer Experience and Operational Tooling

Beyond AI and Quota features, KrakenD EEv2.10 delivers practical improvements for day-to-day operations, debugging, and deployment automation. These updates enhance flexibility, observability, and integration workflows across the board.

Key highlights include:

A new type of plugin, the Middleware plugin, the equivalent to forking the source code to add unexisting components.
A revamped logger for more precise and customizable backend request logging.
Improved startup diagnostics with new flags for the e2e command.
Expanded OpenTelemetry support, including environment tagging and per-exporter metric tuning.
Cleaner OpenAPI spec generation, with reduced noise and better server declarations.
Usability refinements in Lua scripting, CORS logging, and error visibility across components.

Together, these changes reinforce KrakenD’s commitment to being a robust, extensible gateway for modern, distributed systems that grow with your architecture without adding complexity.

🚀 Summary of changes for EEv2.10

AI Gateway and Quotas!

Introducing AI Gateway functionality.
New component for Quota management that allows you to set a stateful counter to manage quotas that survive deployments.
Added a new backend logger providing detailed and customizable logging per backend request.
Added a new type of plugin, the Middleware plugin, the most powerful of all.
The logging component accepts now specifying layouts and formats of the printed fields.
The e2e command has added the flags --ready-url string, --ready-url-wait, and --startup-wait duration to enhance the startup.
Log the original status code from the backend in the application logger when there is an invalid status code.
The audit command has added a new rule recommending limiting the cached content.
Added a new attribute deploy_env to the OTEL component to identify the environment of the deployment (e.g., staging, production, etc.)
When getting errors from a backend, the status code is now logged as a warning in the console with a syntax like WARNING [BACKEND: GET /endpoint/foo -> POST /backend/bar][Client] Status: 403.
Added support for x5t#s256 key identification strategy in JWT Validation
Add an attribute custom_metric_reporting_period to OTEL allowing to set a custom reporting period for a specific exporter, overriding the global one.
OpenAPI generator updated to declare multiple servers at the service level.
Revoker server enhanced to warn about the usage of untracked token_keys.
OpenAPI specs generation now filters component_schemas that are not used in a specific audience, reducing the output of the final output and remove unnecessary components.
Expanded the functions headers() and params() in Lua to work without arguments and return a luaTable.
The body generator includes a new variable {{.req_method}}.
Upgraded Go and dependencies for security CVEs that do not directly impact KrakenD.
Lua - Added missing advanced helpers and error logger decorator to modifier/lua-endpoint.
CORS component: Fix log pump to remove padding and size limit for the debug messages
Fixed override of attributes in the OTEL component taking into account method collision.
Removed the log line shown before registering the logging component containing Parsing configuration file: krakend.json
OTEL client.address now honors the router trusted_proxies setting

Upgrading to the latest version is always advised.

Categories: Product Updates Security

Blog categories

Recent entries