
LLM Gateway for Spring Boot: Multi-tenant API Keys, Quotas, and Cost Controls

A production-grade LLM proxy that enforces per-tenant API keys, rate limits, token budgets, caching, and audit logging.

v1.0.0 · Red Hat 8/9 / Ubuntu / macOS / Windows (Docker) · Java 17 · Spring Boot 3.x · Spring Security · PostgreSQL · Redis · OpenTelemetry · Docker Compose

Problem Statement

Teams want to ship LLM features quickly. Operations need control:

  • keys must be scoped and rotated
  • usage must be isolated by tenant and project
  • spend must be predictable with hard caps
  • abuse must be blocked (prompt injection / jailbreak attempts / large payloads)
  • auditability must exist (for incident response + billing + compliance)

When each client calls the provider SDK directly, these controls end up scattered and inconsistent. A gateway centralizes them.

This solution delivers a Spring Boot LLM Gateway that sits between clients and an OpenAI-compatible provider.


Requirements

Runtime

  • Java 17 + Spring Boot 3.x
  • PostgreSQL (policy + audit + usage aggregation)
  • Redis (optional; recommended for high-QPS counters + caching)
  • Docker Compose for local runnable stack

Supported Provider

  • OpenAI-compatible API (configurable base URL + API key)

Architecture

High-Level Flow

Client → Gateway (AuthN/AuthZ) → Policy Engine → Quota/Rate Limit → (Cache?) → Provider → Response Filter → Audit → Client

Core Components

  1. Tenant & Key Management

    • tenants table (id, name, status)
    • api_keys table (hashed key, tenant_id, scopes, created_at, last_used_at, rotated_at)
    • key rotation workflow (new key active, old key expires after grace period)
  2. Policy Engine

    • input limits: max prompt bytes, max tokens, allowed models
    • output clamp: max completion tokens
    • deny rules (optional): basic patterns, disallowed content flags, models outside the allowlist
  3. Quota & Cost Controls

    • rate limits: requests/sec, concurrent in-flight
    • budgets: daily/monthly spend caps per tenant/project
    • token budgets: daily/monthly token caps
    • enforcement modes:
      • HARD: reject once limit reached
      • SOFT: degrade (fallback model, lower max tokens)
      • OBSERVE: record but do not block (for rollout)
  4. Caching (Optional)

    • safe caching only for:
      • deterministic temperature=0 calls
      • explicitly cacheable routes
    • key includes tenant_id + model + canonicalized prompt + relevant params
    • TTL by route/policy
    • cache hit/miss recorded
  5. Audit & Evidence Store

    • per-request record: tenant_id, key_id, model, tokens_in/out, latency_ms, policy_decision, outcome
    • redactable prompt/response storage (policy-driven)
    • trace_id for correlation

Data Model (Recommended)

Tables (minimum)

tenants

  • id (uuid)
  • name
  • status (ACTIVE/SUSPENDED)
  • created_at

api_keys

  • id
  • tenant_id
  • key_hash (never store raw key)
  • scopes (json)
  • status
  • created_at
  • last_used_at

policies

  • tenant_id
  • allowed_models (json)
  • max_prompt_bytes
  • max_input_tokens
  • max_output_tokens
  • rate_limit_rps
  • max_inflight
  • daily_budget_usd
  • monthly_budget_usd
  • daily_token_cap
  • monthly_token_cap
  • enforcement_mode (HARD/SOFT/OBSERVE)
  • redact_mode (NONE/BASIC/STRICT)

usage_rollup_daily

  • tenant_id
  • date
  • requests
  • tokens_in
  • tokens_out
  • cost_usd_est
  • blocked_requests

audit_log

  • request_id
  • tenant_id
  • key_id
  • model
  • request_ts
  • latency_ms
  • tokens_in/out
  • cost_usd_est
  • decision (ALLOW/BLOCK/DEGRADE)
  • reason_code
  • trace_id
  • prompt_redacted (optional)
  • response_redacted (optional)

Request Pipeline

1) Authentication

  • accept Authorization: Bearer <tenant_api_key>
  • hash the presented key and look up the stored hash (constant-time compare)
  • resolve tenant + scopes
  • reject if tenant/key is suspended
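
The authentication steps above can be sketched in plain Java. This is a minimal illustration, not the packaged implementation: `ApiKeyRow` and `ApiKeyAuthenticator` are hypothetical names, the in-memory map stands in for the api_keys table, and SHA-256 is an assumed hash choice. `MessageDigest.isEqual` provides the constant-time comparison.

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.HashMap;
import java.util.HexFormat;
import java.util.List;
import java.util.Map;
import java.util.Optional;

// Hypothetical row shape: only the hash is stored, never the raw key.
record ApiKeyRow(String keyHash, String tenantId, boolean active) {}

final class ApiKeyAuthenticator {
    // In-memory stand-in for an indexed lookup on api_keys.key_hash.
    private final Map<String, ApiKeyRow> rowsByHash = new HashMap<>();

    ApiKeyAuthenticator(List<ApiKeyRow> rows) {
        for (ApiKeyRow row : rows) {
            rowsByHash.put(row.keyHash(), row);
        }
    }

    // Assumption: SHA-256 hex digest of the raw key.
    static String sha256Hex(String rawKey) {
        try {
            MessageDigest md = MessageDigest.getInstance("SHA-256");
            return HexFormat.of().formatHex(md.digest(rawKey.getBytes(StandardCharsets.UTF_8)));
        } catch (NoSuchAlgorithmException e) {
            throw new IllegalStateException("SHA-256 not available", e);
        }
    }

    // Returns the tenant id for a valid, active key; empty otherwise.
    Optional<String> authenticate(String authorizationHeader) {
        if (authorizationHeader == null || !authorizationHeader.startsWith("Bearer ")) {
            return Optional.empty();
        }
        String presentedHash = sha256Hex(authorizationHeader.substring("Bearer ".length()).trim());
        ApiKeyRow row = rowsByHash.get(presentedHash);
        if (row == null || !row.active()) {
            return Optional.empty(); // unknown or suspended key
        }
        // Constant-time comparison of stored vs presented hash.
        boolean matches = MessageDigest.isEqual(
                row.keyHash().getBytes(StandardCharsets.UTF_8),
                presentedHash.getBytes(StandardCharsets.UTF_8));
        return matches ? Optional.of(row.tenantId()) : Optional.empty();
    }
}
```

In a Spring Security setup this logic would typically sit behind a filter or authentication provider; that wiring is omitted here.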

2) Input Normalization

  • canonicalize request:
    • model
    • temperature/top_p
    • messages/prompt
    • max_tokens (requested)
  • compute:
    • prompt_bytes
    • estimated tokens_in (an approximation is acceptable; final counts come from the provider response)
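
A minimal sketch of the normalization step, under stated assumptions: `ChatMessage`, `NormalizedRequest`, and `RequestNormalizer` are hypothetical names, and the 4-bytes-per-token estimate is a rough heuristic chosen for illustration, not a figure from the provider.

```java
import java.nio.charset.StandardCharsets;
import java.util.List;
import java.util.Locale;

record ChatMessage(String role, String content) {}

record NormalizedRequest(String model, double temperature, List<ChatMessage> messages,
                         int maxTokens, int promptBytes, int estimatedTokensIn) {}

final class RequestNormalizer {
    // Assumption: roughly 4 bytes per token. A tokenizer library gives better numbers.
    static final int ASSUMED_BYTES_PER_TOKEN = 4;

    static NormalizedRequest normalize(String model, double temperature,
                                       List<ChatMessage> messages, int maxTokens) {
        // Canonical form: lower-case roles, content stripped of surrounding whitespace.
        List<ChatMessage> canonical = messages.stream()
                .map(m -> new ChatMessage(m.role().trim().toLowerCase(Locale.ROOT), m.content().strip()))
                .toList();
        // prompt_bytes is the UTF-8 size of all message contents.
        int promptBytes = canonical.stream()
                .mapToInt(m -> m.content().getBytes(StandardCharsets.UTF_8).length)
                .sum();
        // Round up so a short prompt never estimates to zero tokens.
        int estimatedTokensIn = (promptBytes + ASSUMED_BYTES_PER_TOKEN - 1) / ASSUMED_BYTES_PER_TOKEN;
        return new NormalizedRequest(model.trim(), temperature, canonical,
                maxTokens, promptBytes, estimatedTokensIn);
    }
}
```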

3) Policy Decision

  • verify allowed model
  • clamp max_tokens to policy max_output_tokens
  • enforce max_prompt_bytes
  • optional: enforce max_input_tokens (approx or via tokenizer lib)

Decision outcomes:

  • ALLOW
  • DEGRADE (rewrite request: cheaper model, lower max_tokens, stricter params)
  • BLOCK (reject with reason code)
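
The decision logic could look roughly like the sketch below. `Policy`, `PolicyResult`, `PolicyEngine`, and the reason codes are hypothetical. The sketch also assumes that clamping max_tokens is reported as DEGRADE; the write-up does not say whether a clamp counts as a degrade, so treat that as a design choice to confirm.

```java
import java.util.Set;

enum Decision { ALLOW, DEGRADE, BLOCK }

// Hypothetical subset of the policies table.
record Policy(Set<String> allowedModels, int maxPromptBytes, int maxOutputTokens) {}

// Outcome plus the (possibly rewritten) max_tokens to send upstream.
record PolicyResult(Decision decision, String reasonCode, int effectiveMaxTokens) {}

final class PolicyEngine {
    static PolicyResult evaluate(Policy policy, String model, int promptBytes, int requestedMaxTokens) {
        if (!policy.allowedModels().contains(model)) {
            return new PolicyResult(Decision.BLOCK, "model_not_allowed", 0);
        }
        if (promptBytes > policy.maxPromptBytes()) {
            return new PolicyResult(Decision.BLOCK, "prompt_too_large", 0);
        }
        // Clamp the requested output size to the policy ceiling.
        int clamped = Math.min(requestedMaxTokens, policy.maxOutputTokens());
        if (clamped < requestedMaxTokens) {
            // Assumption: a clamp is surfaced as DEGRADE so it shows up in the audit log.
            return new PolicyResult(Decision.DEGRADE, "max_tokens_clamped", clamped);
        }
        return new PolicyResult(Decision.ALLOW, "ok", clamped);
    }
}
```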

4) Quota Enforcement

Rate limit

  • sliding window / token bucket (Redis recommended)
  • per-tenant and optionally per-key
  • enforce max in-flight with semaphore counter
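
For illustration, a single-instance token bucket with an in-flight semaphore is sketched below. It is an in-memory stand-in: with several gateway instances the counters would need to live in Redis as recommended above. `TenantLimiter` and its parameters are hypothetical, and the clock is injected so the behavior can be tested deterministically.

```java
import java.util.concurrent.Semaphore;
import java.util.function.LongSupplier;

final class TenantLimiter {
    private final double ratePerSecond;   // refill rate (rate_limit_rps)
    private final double capacity;        // burst size
    private final LongSupplier nanoClock; // injected clock, e.g. System::nanoTime
    private final Semaphore inflight;     // max_inflight permits
    private double tokens;
    private long lastRefillNanos;

    TenantLimiter(double ratePerSecond, int burst, int maxInflight, LongSupplier nanoClock) {
        this.ratePerSecond = ratePerSecond;
        this.capacity = burst;
        this.nanoClock = nanoClock;
        this.inflight = new Semaphore(maxInflight);
        this.tokens = burst; // start with a full bucket
        this.lastRefillNanos = nanoClock.getAsLong();
    }

    // Token bucket: refill by elapsed time, then spend one token if available.
    synchronized boolean tryAcquireRate() {
        long now = nanoClock.getAsLong();
        double elapsedSeconds = (now - lastRefillNanos) / 1_000_000_000.0;
        tokens = Math.min(capacity, tokens + elapsedSeconds * ratePerSecond);
        lastRefillNanos = now;
        if (tokens >= 1.0) {
            tokens -= 1.0;
            return true;
        }
        return false;
    }

    // Concurrency cap: call tryEnter() before the provider call, exit() in a finally block.
    boolean tryEnter() {
        return inflight.tryAcquire();
    }

    void exit() {
        inflight.release();
    }
}
```

A caller would reject with a rate-limit error when either `tryAcquireRate()` or `tryEnter()` returns false.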

Budget

  • compute estimated cost for the request using configured price table:
    • input_tokens * price_in + output_tokens * price_out
  • check daily/monthly rollups
  • if limit reached:
    • HARD: reject
    • SOFT: degrade (cheaper model + max_tokens down)
    • OBSERVE: allow, but flag in audit
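
The cost formula and the three enforcement modes can be sketched as follows. All names (`PriceEntry`, `BudgetGuard`, `BudgetResult`) are hypothetical, prices are assumed to be per token, and "limit reached" is interpreted here as "current spend plus the estimated cost of this request exceeds the cap", which is one possible reading.

```java
import java.math.BigDecimal;

enum EnforcementMode { HARD, SOFT, OBSERVE }

enum BudgetDecision { ALLOW, DEGRADE, BLOCK }

// Assumption: prices are stored per single token, in USD.
record PriceEntry(BigDecimal pricePerInputToken, BigDecimal pricePerOutputToken) {}

record BudgetResult(BudgetDecision decision, boolean overBudget, BigDecimal estimatedCostUsd) {}

final class BudgetGuard {
    // input_tokens * price_in + output_tokens * price_out
    static BigDecimal estimateCost(PriceEntry price, long inputTokens, long outputTokens) {
        return price.pricePerInputToken().multiply(BigDecimal.valueOf(inputTokens))
                .add(price.pricePerOutputToken().multiply(BigDecimal.valueOf(outputTokens)));
    }

    static BudgetResult check(EnforcementMode mode, BigDecimal spentSoFar,
                              BigDecimal budgetCap, BigDecimal estimatedCost) {
        boolean over = spentSoFar.add(estimatedCost).compareTo(budgetCap) > 0;
        if (!over) {
            return new BudgetResult(BudgetDecision.ALLOW, false, estimatedCost);
        }
        BudgetDecision decision = switch (mode) {
            case HARD -> BudgetDecision.BLOCK;     // reject
            case SOFT -> BudgetDecision.DEGRADE;   // caller rewrites to cheaper model / fewer tokens
            case OBSERVE -> BudgetDecision.ALLOW;  // allow, but overBudget=true is flagged in audit
        };
        return new BudgetResult(decision, true, estimatedCost);
    }
}
```

`BigDecimal` is used rather than `double` so that rollup totals do not drift from accumulated rounding error.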

5) Cache (Optional)

  • only if policy allows caching
  • only if temperature=0 (or explicit allow)
  • record cache hit in audit
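
One way to derive a cache key from tenant_id, model, canonicalized prompt, and relevant params is sketched below. `CacheKeys` is a hypothetical helper, SHA-256 is an assumed digest, and max_tokens is used as the only "relevant param" for brevity. Fields are length-prefixed so that different field splits cannot produce the same key.

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.HexFormat;
import java.util.Optional;

final class CacheKeys {
    // Returns a key only for cacheable requests: policy allows it and temperature is 0.
    static Optional<String> cacheKey(boolean policyAllowsCaching, String tenantId, String model,
                                     double temperature, int maxTokens, String canonicalPrompt) {
        if (!policyAllowsCaching || temperature != 0.0) {
            return Optional.empty();
        }
        StringBuilder material = new StringBuilder();
        for (String part : new String[] {tenantId, model, Integer.toString(maxTokens), canonicalPrompt}) {
            // Length prefix keeps ("ab", "c") distinct from ("a", "bc").
            material.append(part.length()).append(':').append(part).append('|');
        }
        try {
            MessageDigest md = MessageDigest.getInstance("SHA-256");
            byte[] digest = md.digest(material.toString().getBytes(StandardCharsets.UTF_8));
            return Optional.of(HexFormat.of().formatHex(digest));
        } catch (NoSuchAlgorithmException e) {
            throw new IllegalStateException("SHA-256 not available", e);
        }
    }
}
```

Including tenant_id in the key keeps one tenant from being served another tenant's cached response.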

6) Provider Call

  • bounded timeout + retry only on safe transient failures
  • propagate trace_id
  • parse provider usage to get actual tokens/cost
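
A bounded retry that only reacts to transient failures might look like this sketch. `TransientProviderException` and `BoundedRetry` are hypothetical; the caller is assumed to map retryable conditions (for example a 503 or a connect timeout) to that exception and to set a request timeout on the HTTP client separately.

```java
import java.time.Duration;
import java.util.function.Supplier;

// Hypothetical marker for failures that are safe to retry.
final class TransientProviderException extends RuntimeException {
    TransientProviderException(String message) {
        super(message);
    }
}

final class BoundedRetry {
    // Runs the action once, then retries at most maxRetries times on transient failures.
    // Any other exception propagates immediately without a retry.
    static <T> T call(Supplier<T> action, int maxRetries, Duration backoff) {
        int retriesUsed = 0;
        while (true) {
            try {
                return action.get();
            } catch (TransientProviderException e) {
                if (retriesUsed >= maxRetries) {
                    throw e; // retry budget exhausted
                }
                retriesUsed++;
                sleep(backoff);
            }
        }
    }

    private static void sleep(Duration backoff) {
        try {
            Thread.sleep(backoff.toMillis());
        } catch (InterruptedException ie) {
            Thread.currentThread().interrupt(); // preserve the interrupt flag
            throw new IllegalStateException("interrupted during retry backoff", ie);
        }
    }
}
```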

7) Response Filtering

  • ensure no provider metadata leaks secrets
  • optionally redact sensitive fields
  • return to client

8) Persist Audit + Update Usage Rollups

  • write audit_log
  • update usage_rollup_daily (transaction or async with idempotency)
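
Idempotency per request_id is what makes an async or retried rollup update safe. The sketch below shows the idea with in-memory maps; `UsageEvent`, `DailyRollup`, and `UsageRollups` are hypothetical, and in PostgreSQL the same effect would usually come from a unique constraint on audit_log.request_id applied in the same transaction as the rollup update.

```java
import java.time.LocalDate;
import java.util.Map;
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;

record UsageEvent(String requestId, String tenantId, LocalDate date,
                  long tokensIn, long tokensOut, boolean blocked) {}

record RollupKey(String tenantId, LocalDate date) {}

record DailyRollup(long requests, long tokensIn, long tokensOut, long blockedRequests) {
    static final DailyRollup EMPTY = new DailyRollup(0, 0, 0, 0);

    DailyRollup plus(UsageEvent e) {
        return new DailyRollup(requests + 1, tokensIn + e.tokensIn(),
                tokensOut + e.tokensOut(), blockedRequests + (e.blocked() ? 1 : 0));
    }
}

final class UsageRollups {
    private final Set<String> seenRequestIds = ConcurrentHashMap.newKeySet();
    private final Map<RollupKey, DailyRollup> daily = new ConcurrentHashMap<>();

    // Returns false (and changes nothing) when the request_id was already applied.
    boolean apply(UsageEvent e) {
        if (!seenRequestIds.add(e.requestId())) {
            return false; // duplicate delivery, e.g. an async worker retry
        }
        daily.merge(new RollupKey(e.tenantId(), e.date()),
                DailyRollup.EMPTY.plus(e),
                (existing, ignored) -> existing.plus(e));
        return true;
    }

    DailyRollup get(String tenantId, LocalDate date) {
        return daily.getOrDefault(new RollupKey(tenantId, date), DailyRollup.EMPTY);
    }
}
```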

Failure Modes & Safeguards

Failure Mode: Provider Outage / 5xx

Mitigation

  • short bounded retries (max 1–2) for transient errors
  • circuit breaker to shed load
  • fallback strategy (optional):
    • alternative provider endpoint
    • lower-capability model
    • cached responses if allowed
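
To illustrate how a circuit breaker sheds load, here is a deliberately small sketch: it opens after a number of consecutive failures and lets a trial request through after a cool-down. `SimpleCircuitBreaker` and its thresholds are hypothetical; a production gateway would more likely use an established resilience library than hand-rolled code.

```java
import java.util.function.LongSupplier;

final class SimpleCircuitBreaker {
    private final int failureThreshold;     // consecutive failures before opening
    private final long openMillis;          // cool-down before a trial request
    private final LongSupplier clockMillis; // injected clock, e.g. System::currentTimeMillis
    private int consecutiveFailures = 0;
    private long openedAtMillis = -1;       // -1 means the breaker is closed

    SimpleCircuitBreaker(int failureThreshold, long openMillis, LongSupplier clockMillis) {
        this.failureThreshold = failureThreshold;
        this.openMillis = openMillis;
        this.clockMillis = clockMillis;
    }

    // False while open: the caller should fail fast or use a fallback.
    synchronized boolean allowRequest() {
        if (openedAtMillis < 0) {
            return true;
        }
        // After the cool-down, allow trial traffic (half-open behavior).
        return clockMillis.getAsLong() - openedAtMillis >= openMillis;
    }

    synchronized void recordSuccess() {
        consecutiveFailures = 0;
        openedAtMillis = -1; // close the breaker
    }

    synchronized void recordFailure() {
        consecutiveFailures++;
        if (consecutiveFailures >= failureThreshold) {
            openedAtMillis = clockMillis.getAsLong(); // open, or re-open after a failed trial
        }
    }
}
```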

Failure Mode: Redis Down (if used)

Mitigation

  • policy setting:
    • HARD FAIL: block requests (strict)
    • SOFT FAIL: allow but log “quota_unavailable” (risky)
  • recommended: if the quota store is unavailable, reduce allowed throughput rather than failing fully open or fully closed
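
The HARD FAIL / SOFT FAIL choice can be expressed as a thin wrapper around the quota check. In this sketch `QuotaStoreGuard`, `QuotaFailMode`, and `QuotaCheck` are hypothetical, and a thrown `RuntimeException` stands in for a Redis connection failure.

```java
import java.util.function.BooleanSupplier;

enum QuotaFailMode { HARD_FAIL, SOFT_FAIL }

record QuotaCheck(boolean allowed, String reasonCode) {}

final class QuotaStoreGuard {
    // quotaStoreCheck returns true when the tenant is within quota and
    // throws when the quota store cannot be reached.
    static QuotaCheck check(BooleanSupplier quotaStoreCheck, QuotaFailMode failMode) {
        try {
            return quotaStoreCheck.getAsBoolean()
                    ? new QuotaCheck(true, "ok")
                    : new QuotaCheck(false, "quota_exceeded");
        } catch (RuntimeException storeUnavailable) {
            // HARD_FAIL blocks (strict); SOFT_FAIL allows but records the reason (risky).
            boolean allow = failMode == QuotaFailMode.SOFT_FAIL;
            return new QuotaCheck(allow, "quota_unavailable");
        }
    }
}
```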

Failure Mode: Budget Calculation Mismatch

Mitigation

  • always store provider-reported tokens in audit when available
  • rollups based on actual usage
  • price table versioning

Failure Mode: Key Leakage

Mitigation

  • keys are hashed at rest
  • rotate keys quickly + revoke old key
  • per-key rate limits and anomaly alerting (optional)

Security & Compliance

Prompt/Response Storage

Storage modes:

  • NONE: do not store prompts/responses, only metadata
  • BASIC: store truncated + redacted
  • STRICT: store hashes only, plus small safe excerpts

Controls

  • never log Authorization header
  • redact secret patterns (API keys, JWTs, emails) before persistence
  • configurable retention (e.g., audit logs 7–30 days, rollups 90+ days)
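
Pattern-based redaction before persistence could be sketched like this. The regular expressions are illustrative assumptions only: the `sk-` prefix assumes OpenAI-style keys, and the JWT and email patterns are simplified, so real deployments should tune them to the secrets they actually expect. `SecretRedactor` is a hypothetical name.

```java
import java.util.regex.Pattern;

final class SecretRedactor {
    // Three base64url segments starting with "eyJ" (the encoding of '{"').
    private static final Pattern JWT =
            Pattern.compile("\\beyJ[A-Za-z0-9_-]+\\.[A-Za-z0-9_-]+\\.[A-Za-z0-9_-]+");
    // Assumption: keys look like "sk-" followed by at least 16 key characters.
    private static final Pattern API_KEY =
            Pattern.compile("\\bsk-[A-Za-z0-9_-]{16,}");
    // Simplified email pattern, not a full RFC 5322 matcher.
    private static final Pattern EMAIL =
            Pattern.compile("[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\\.[A-Za-z]{2,}");

    // Apply before writing prompt_redacted / response_redacted.
    static String redact(String text) {
        String out = JWT.matcher(text).replaceAll("[REDACTED_JWT]");
        out = API_KEY.matcher(out).replaceAll("[REDACTED_KEY]");
        out = EMAIL.matcher(out).replaceAll("[REDACTED_EMAIL]");
        return out;
    }
}
```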

Cost & Scaling Notes

Scaling Strategy

  • stateless gateway instances behind load balancer
  • Redis for fast counters + caching
  • Postgres for durable audit + rollups
  • async audit write is possible if you include idempotency per request_id

Hot Paths

  • auth lookup (cache key_id in Redis)
  • quota counters (Redis)
  • audit writes (batch insert or async worker if needed)

Cost Controls Best Practices

  • per-tenant monthly cap + per-key RPS cap
  • max output tokens clamp prevents “runaway responses”
  • degrade mode reduces bill shock during traffic spikes

Verification Checklist (Evidence to Publish)

Publish these as evidence artifacts for buyers:

  1. Quota block

    • show tenant hitting daily budget
    • gateway returns an HTTP 429 or 402 style error with a reason code
  2. Degrade mode

    • show request rewritten to cheaper model + max_tokens reduced
  3. Audit record

    • screenshot/log of audit_log entry with tokens, cost estimate, decision, trace_id
  4. Rate limit

    • load test script showing 429 after threshold
  5. Cache hit

    • demonstrate repeated request served from cache + audit marks cache_hit=true

Run Instructions

  1. docker compose up -d
  2. create tenant + API key (bootstrap script)
  3. call gateway endpoint with Bearer key
  4. observe:
    • audit_log updates
    • usage_rollup_daily increments
  5. test enforcement:
    • set low daily_budget_usd
    • run request loop to trigger block/degrade

Configuration Reference

  • Provider:

    • LLM_BASE_URL
    • LLM_API_KEY
  • Gateway:

    • GATEWAY_PORT
    • AUTH_HEADER=Authorization
  • Policy defaults:

    • DEFAULT_ALLOWED_MODELS
    • DEFAULT_MAX_PROMPT_BYTES
    • DEFAULT_MAX_OUTPUT_TOKENS
    • DEFAULT_RATE_LIMIT_RPS
    • DEFAULT_MONTHLY_BUDGET_USD
    • DEFAULT_ENFORCEMENT_MODE (HARD/SOFT/OBSERVE)
  • Redis:

    • REDIS_URL (optional but recommended)
  • Postgres:

    • SPRING_DATASOURCE_URL
    • SPRING_DATASOURCE_USERNAME
    • SPRING_DATASOURCE_PASSWORD

Changelog Guidance

Record changes that affect:

  • enforcement behavior (HARD/SOFT/OBSERVE)
  • pricing tables / cost estimation
  • schema changes (audit and rollups)
  • caching behavior
  • security/redaction modes