Document updated on May 21, 2025

AI Token Cost Control & Quotas

AI workloads can quickly generate unpredictable and excessive costs. KrakenD’s AI Gateway provides granular token usage monitoring and enforcement to keep your AI expenses transparent and within budget. Features like token quotas, budget alerts, prompt caching, and intelligent routing enable you to optimize requests and avoid surprise bills while maintaining performance and scalability.

Token Quota and Budget Enforcement

KrakenD Enterprise includes a persistent quota system that is well suited to managing token-based usage quotas in LLM applications. It is designed for controlling cost, enforcing subscription tiers, and preventing overuse.

The quota system allows you to limit usage per user, client, or endpoint to prevent runaway costs.

See the Quota component for full details.

Here’s a sample of the configuration (see the documentation for all necessary blocks):

{
  "governance/quota": {
    "quota_name": "public_plans",
    "on_unmatched_tier_allow": false,
    "weight_key": "credits_consumed",
    "weight_strategy": "body",
    "tier_key": "X-Level",
    "disable_quota_headers": false,
    "tiers": [
      {
        "rule_name": "rule_gold",
        "tier_value": "gold",
        "tier_value_as": "literal",
        "strategy": "header",
        "key": "X-User-Id"
      },
      {
        "comment": "Special case * that catches any requests not falling into one of the tiers above",
        "rule_name": "rule_bronze",
        "tier_value_as": "*",
        "strategy": "ip"
      }
    ]
  }
}
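To make the idea of a weighted quota more concrete, here is a minimal in-memory sketch in Python. It is not KrakenD's implementation and does not use any KrakenD API. It assumes only that each response reports its cost in a body field named `credits_consumed` (the `weight_key` in the sample above) and that this value, rather than a fixed unit, is added to a per-user counter. The class name `WeightedQuota`, the tier limit of 1000, the user identifier, and the fallback weight of 1 are hypothetical choices made for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class WeightedQuota:
    """Toy in-memory counter: one budget per (tier, user) pair."""

    limits: dict[str, int]  # hypothetical per-tier budgets, e.g. {"gold": 1000}
    used: dict[tuple[str, str], int] = field(default_factory=dict)

    def allow(self, tier: str, user: str) -> bool:
        """Admit a request only while the user's counter is below the tier limit."""
        limit = self.limits.get(tier)
        if limit is None:
            # Unknown tier: reject, in the spirit of "on_unmatched_tier_allow": false.
            return False
        return self.used.get((tier, user), 0) < limit

    def record(self, tier: str, user: str, response_body: dict,
               weight_key: str = "credits_consumed") -> int:
        """Add the weight found in the response body and return the remaining budget."""
        # Assumption: a missing field counts as one unit.
        weight = int(response_body.get(weight_key, 1))
        key = (tier, user)
        self.used[key] = self.used.get(key, 0) + weight
        return max(self.limits[tier] - self.used[key], 0)


# Usage: a gold user spends 350 credits out of a hypothetical budget of 1000.
quota = WeightedQuota(limits={"gold": 1000})
if quota.allow("gold", "user-42"):
    remaining = quota.record("gold", "user-42", {"credits_consumed": 350})
    print(f"remaining credits: {remaining}")
```

Because the cost of an LLM call is only known once the response arrives, this sketch checks the budget before the call and charges it afterwards, so a single request can overshoot the limit before later requests are rejected.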

AI Metrics and Reporting

Through OpenTelemetry you can follow all gateway activity, including connections to LLMs. If you want to track internals such as the models and providers in use, we recommend adding tags to your telemetry so that you have a complete picture of what is going on.

In addition, while there is no API available to generate reports yet, you can follow real-time token consumption by connecting to the internal Redis database that tracks usage.
