1. Overview

This solution implements a SQL Assistant for Spring Boot that converts natural-language questions into SQL with explicit schema grounding, enforces read-only execution, applies query budgeting, and records a full audit trail covering prompt inputs, generated SQL, policy decisions, execution metadata, and results summaries.

In production, “NL → SQL” approaches fail for predictable reasons:

Ungrounded generation: the model hallucinates tables/columns or misunderstands joins and cardinality without an explicit schema contract.
Unsafe SQL: generated statements may include writes (INSERT/UPDATE/DELETE), DDL, dangerous functions, or multi-statement payloads.
Inadequate enforcement: many systems “ask the model to be safe” instead of enforcing safety at the database boundary.
No traceability: teams cannot answer “who ran what, against which schema, under which policy, and what happened,” which blocks production adoption.
Unbounded costs and load: unconstrained queries can scan large tables or execute expensive joins, harming production systems.

This implementation is production-ready because it:

Uses a two-phase contract: schema grounding + SQL generation with strict output expectations.
Applies deterministic guardrails: SQL parsing + allowlist policy + multi-statement rejection + parameter validation.
Enforces read-only at the database boundary using a dedicated DB role with read-only privileges and transaction-level read-only settings.
Adds budget controls: row limits, statement timeouts, and execution classification (cheap vs expensive) before execution.
Persists an audit-grade execution record including input, normalized SQL fingerprint, policy outcome, runtime, row counts, and error codes.

2. Architecture

System flow:

Client → SQL Assistant API (POST /api/sql/query)
SQL Assistant → Schema Introspector (PostgreSQL catalogs or curated schema table)
SQL Assistant → LLM Provider (OpenAI-compatible API) for SQL generation
SQL Assistant → Guardrails Engine (SQL parser + policy evaluator)
SQL Assistant → PostgreSQL (read-only connection) for query execution
Audit events → PostgreSQL audit tables (query runs, decisions, errors)

Key components:

Query Controller: request validation, auth, idempotency handling.
Schema Service: produces a stable schema “contract” used for grounding (tables, columns, PK/FK hints, row count estimates where available).
LLM SQL Generator: requests SQL with explicit constraints (single statement, SELECT-only, required LIMIT).
Guardrails Engine:
- parse SQL into AST
- enforce single-statement SELECT-only
- denylist risky constructs
- enforce required predicates/limits where configured
- normalize SQL and compute fingerprint
Execution Engine:
- binds parameters safely (no string concatenation)
- sets statement timeout
- sets transaction read-only
- enforces max rows and result truncation for response
Audit Logger: persists full run record and decision trace.

External dependencies:

PostgreSQL (schema + audit + optional sample dataset)
OpenAI-compatible API (SQL generation)
Docker Compose (local runtime)

Trust boundaries:

Client → Service: authenticated boundary; requests are attributed to an actor.
Service → LLM Provider: untrusted output boundary; all generated SQL is treated as untrusted input.
Service → PostgreSQL: privileged boundary; enforced by separate read-only role and transaction policy.

3. Key Design Decisions

Technology stack:

Spring Boot 3.x + Java 17: stable runtime and ecosystem for secure API + DB access.
Spring Security: role enforcement and attribution for audit logs.
PostgreSQL: both the target query engine and the durable audit store.
Docker Compose: reproducible local evidence runs.
SQL parser library (e.g., JSqlParser): required to enforce deterministically at the syntax/AST level rather than regex-based checks.

Data storage model:

Audit is stored in append-only tables with explicit foreign keys where appropriate.
SQL is stored in multiple forms:
- raw generated SQL
- normalized SQL
- a fingerprint (hash) for deduplication and trend analysis
Result bodies are not fully persisted by default; store summaries and optional sampled rows to minimize data exposure.

Synchrony vs asynchrony:

Queries run synchronously for interactive usage. The service enforces timeouts and row caps to keep latency bounded.
An optional asynchronous mode can be added later (job-based execution) for expensive queries.

Error handling and retries:

LLM calls: bounded retries for transient network/5xx errors only.
SQL execution: no retries by default (queries may be non-idempotent at the system level due to load); surface a stable error code and require explicit client retry if desired.
All failures produce an audit record with an error classification.

Idempotency strategy:

Optional Idempotency-Key on the query endpoint:
- if the same actor submits the same key, return the prior run result (or run id) without re-executing.
- idempotency is scoped to actor + key, and optionally includes a request hash to detect mismatched reuse.

4. Data Model

Core tables and purpose:

sql_assistant_runs
- Purpose: primary record of each NL→SQL request and execution attempt.
- Key columns:
  - id (uuid)
  - actor_user_id, actor_email
  - question_text
  - schema_version (or schema_snapshot_id)
  - generated_sql_raw
  - generated_sql_normalized
  - sql_fingerprint (hash)
  - policy_decision (ALLOW, DENY)
  - deny_reason_code (nullable)
  - executed (boolean)
  - duration_ms
  - row_count
  - result_sample_json (nullable, truncated)
  - error_code / error_message_truncated
  - created_at
- Indexing:
  - (actor_user_id, created_at desc)
  - (sql_fingerprint, created_at desc)
  - (policy_decision, created_at desc)
sql_assistant_decisions
- Purpose: detailed policy trace for explainability (which rule fired).
- Key columns:
  - id, run_id
  - rule_name
  - outcome (PASS, FAIL)
  - details_json
  - created_at
- Indexing:
  - (run_id)
schema_snapshots (optional but recommended)
- Purpose: stable grounding input for the LLM and reproducible audits.
- Key columns:
  - id, version, snapshot_json, created_at
- Indexing:
  - (version) unique

Design intent:

Every request yields a run row even if denied before execution.
Decisions are traceable and reproducible, enabling audits and policy tuning without ambiguity.

5. API Surface

Public/internal endpoints:

POST /api/sql/query — Generate SQL from natural language, enforce guardrails, and execute if allowed (ROLE_USER)
- Optional: Idempotency-Key header
GET /api/sql/runs/{id} — Fetch a prior run (question, SQL, decision, execution metadata, sampled results) (ROLE_USER, owner-only)

Admin/ops (optional):

GET /admin/sql/runs?decision=DENY&limit=... — Search runs and denials (ROLE_ADMIN)
GET /admin/sql/rules — List active guardrail rules and configuration (ROLE_ADMIN)

6. Security Model

Authentication:

Spring Security authentication (local Basic auth for runnable mode; replace with platform auth in production).

Authorization (roles):

ROLE_USER can create runs and view their own runs.
ROLE_ADMIN can view global audit and policy outcomes.

Read-only enforcement:

Database uses a dedicated sql_ro role with:
- SELECT privileges on allowed schemas/tables
- no write privileges
Each execution uses:
- SET TRANSACTION READ ONLY
- statement timeout (e.g., SET LOCAL statement_timeout = '3s')
- optional SET LOCAL idle_in_transaction_session_timeout

CSRF considerations:

If cookie/session-based auth is used, keep CSRF enabled on POST endpoints.
If stateless tokens are used, CSRF is not required.

Data isolation guarantees:

Run records are scoped to the authenticated actor (owner-only read endpoints).
Result sampling is truncated and optionally column-filtered to avoid exposing sensitive fields by default.
LLM output is not trusted; it is treated as untrusted input and must pass deterministic enforcement.

7. Operational Behavior

Startup behavior:

Docker Compose starts PostgreSQL and the application.
Flyway applies migrations for audit tables and any sample dataset.
On startup, the schema snapshot can be generated or loaded (configurable).

Failure modes:

LLM unavailable: run is recorded with error_code=LLM_UNAVAILABLE and no execution occurs.
Guardrails deny: run is recorded with policy_decision=DENY and deny_reason_code, no execution occurs.
Query timeout: run is recorded with error_code=QUERY_TIMEOUT.
SQL parse failure: run is recorded with error_code=SQL_PARSE_FAILED.
DB errors (permissions, missing objects): run is recorded with error_code=DB_ERROR.

Retry and timeout behavior:

LLM call retries are bounded and only for transient failures.
SQL execution uses strict statement timeout; retries are not automatic.

Observability hooks:

Structured logs include runId, actor, sql_fingerprint, decision, duration_ms.
Metrics can be added for:
- denials by reason
- execution latency distributions
- timeouts and DB errors
Tracing (optional) can wrap:
- schema snapshot fetch
- LLM generation
- guardrail evaluation
- DB execution

8. Local Execution

Prerequisites:

Docker + Docker Compose v2

Environment variables:

LLM_BASE_URL — OpenAI-compatible base URL
LLM_API_KEY — provider key
LLM_MODEL — model name
SQL_STATEMENT_TIMEOUT_MS — execution timeout
SQL_MAX_ROWS — max rows returned (hard cap)
Local auth credentials (for runnable mode)

Run:

docker compose up -d --build

Health:

curl http://localhost:8080/health

Generate and execute a query:

curl -u user:password -X POST http://localhost:8080/api/sql/query \
  -H 'Content-Type: application/json' \
  -d '{"question":"Top 10 customers by total spend last month"}'

Fetch run details:

RUN_ID="<uuid>"
curl -u user:password http://localhost:8080/api/sql/runs/${RUN_ID}

Verification steps:

Confirm denied writes are blocked:

curl -u user:password -X POST http://localhost:8080/api/sql/query \
  -H 'Content-Type: application/json' \
  -d '{"question":"Delete all orders older than 2020"}'

Confirm audit records exist in PostgreSQL:

docker exec -it <postgres-container> psql -U <user> -d <db> -c "select policy_decision, deny_reason_code, created_at from sql_assistant_runs order by created_at desc limit 10;"

9. Evidence Pack (MANDATORY)

[ ] Service startup logs showing Flyway migrations applied and readiness
[ ] Health endpoint response
[ ] Successful query run:
- request/response logs for POST /api/sql/query
- returned runId
- policy_decision=ALLOW
- row_count and duration_ms populated
[ ] Guardrail denial run:
- request/response logs for a write-intent question
- policy_decision=DENY with deny_reason_code
[ ] SQL parsing enforcement evidence:
- request that induces multi-statement output
- denial with deny_reason_code=MULTI_STATEMENT
[ ] Read-only DB enforcement evidence:
- attempt to execute a write statement (even if bypass attempted) fails at DB permission boundary
- error recorded with stable error_code=DB_READONLY_VIOLATION (or equivalent)
[ ] Query budgeting evidence:
- a deliberately expensive query triggers statement timeout or is denied by policy
- recorded error_code=QUERY_TIMEOUT or deny_reason_code=EXPENSIVE_QUERY
[ ] Database records after execution:
- sql_assistant_runs row for each run
- sql_assistant_decisions rows showing policy trace
[ ] Idempotency evidence (if enabled):
- same Idempotency-Key returns same runId without re-executing

10. Known Limitations

Does not guarantee semantic correctness of generated SQL; correctness is bounded by schema grounding and validation rules.
Does not implement full data classification or column-level masking by default; result sampling must be configured for sensitive datasets.
Single-node deployment model; does not include distributed rate limiting or queue-based execution for heavy workloads.
Advanced query planning safeguards (cost-based planner checks, EXPLAIN gating) are optional extensions and not mandatory in the baseline.

11. Extension Points

Add EXPLAIN-based gating:
- run EXPLAIN (FORMAT JSON) and deny if estimated cost/rows exceed policy thresholds.
Introduce column-level policies:
- deny or mask sensitive columns by schema metadata rules.
Add asynchronous execution:
- queue runs and allow polling for results for long queries.
Add tenant isolation:
- per-tenant schema snapshots, policies, and budgets.
Integrate OpenTelemetry:
- traces for LLM generation, guardrail evaluation, and DB execution.
Add evaluation harness:
- curated NL→SQL test suite with pass/fail and regression tracking for policy tuning.

SQL Assistant for Spring Boot: Guardrails, Read-only Enforcement, and Audit Logs

Business Fit

Enterprise Readiness

Delivery Package

Implementation Notes