SQL Assistant for Spring Boot: Guardrails, Read-only Enforcement, and Audit Logs
A runnable assistant that converts natural language questions into SQL with schema grounding, read-only enforcement, and full audit logging.
1. Overview
This solution implements a SQL Assistant for Spring Boot that converts natural-language questions into SQL with explicit schema grounding, enforces read-only execution, applies query budgeting, and records a full audit trail covering prompt inputs, generated SQL, policy decisions, execution metadata, and results summaries.
In production, “NL → SQL” approaches fail for predictable reasons:
- Ungrounded generation: the model hallucinates tables/columns or misunderstands joins and cardinality without an explicit schema contract.
- Unsafe SQL: generated statements may include writes (INSERT/UPDATE/DELETE), DDL, dangerous functions, or multi-statement payloads.
- Inadequate enforcement: many systems “ask the model to be safe” instead of enforcing safety at the database boundary.
- No traceability: teams cannot answer “who ran what, against which schema, under which policy, and what happened,” which blocks production adoption.
- Unbounded costs and load: unconstrained queries can scan large tables or execute expensive joins, harming production systems.
This implementation is production-ready because it:
- Uses a two-phase contract: schema grounding + SQL generation with strict output expectations.
- Applies deterministic guardrails: SQL parsing + allowlist policy + multi-statement rejection + parameter validation.
- Enforces read-only at the database boundary using a dedicated DB role with read-only privileges and transaction-level read-only settings.
- Adds budget controls: row limits, statement timeouts, and execution classification (cheap vs expensive) before execution.
- Persists an audit-grade execution record including input, normalized SQL fingerprint, policy outcome, runtime, row counts, and error codes.
2. Architecture
System flow:
- Client → SQL Assistant API (
POST /api/sql/query) - SQL Assistant → Schema Introspector (PostgreSQL catalogs or curated schema table)
- SQL Assistant → LLM Provider (OpenAI-compatible API) for SQL generation
- SQL Assistant → Guardrails Engine (SQL parser + policy evaluator)
- SQL Assistant → PostgreSQL (read-only connection) for query execution
- Audit events → PostgreSQL audit tables (query runs, decisions, errors)
Key components:
-
Query Controller: request validation, auth, idempotency handling.
-
Schema Service: produces a stable schema “contract” used for grounding (tables, columns, PK/FK hints, row count estimates where available).
-
LLM SQL Generator: requests SQL with explicit constraints (single statement, SELECT-only, required LIMIT).
-
Guardrails Engine:
- parse SQL into AST
- enforce single-statement SELECT-only
- denylist risky constructs
- enforce required predicates/limits where configured
- normalize SQL and compute fingerprint
-
Execution Engine:
- binds parameters safely (no string concatenation)
- sets statement timeout
- sets transaction read-only
- enforces max rows and result truncation for response
-
Audit Logger: persists full run record and decision trace.
External dependencies:
- PostgreSQL (schema + audit + optional sample dataset)
- OpenAI-compatible API (SQL generation)
- Docker Compose (local runtime)
Trust boundaries:
- Client → Service: authenticated boundary; requests are attributed to an actor.
- Service → LLM Provider: untrusted output boundary; all generated SQL is treated as untrusted input.
- Service → PostgreSQL: privileged boundary; enforced by separate read-only role and transaction policy.
3. Key Design Decisions
Technology stack:
- Spring Boot 3.x + Java 17: stable runtime and ecosystem for secure API + DB access.
- Spring Security: role enforcement and attribution for audit logs.
- PostgreSQL: both the target query engine and the durable audit store.
- Docker Compose: reproducible local evidence runs.
- SQL parser library (e.g., JSqlParser): required to enforce deterministically at the syntax/AST level rather than regex-based checks.
Data storage model:
-
Audit is stored in append-only tables with explicit foreign keys where appropriate.
-
SQL is stored in multiple forms:
- raw generated SQL
- normalized SQL
- a fingerprint (hash) for deduplication and trend analysis
-
Result bodies are not fully persisted by default; store summaries and optional sampled rows to minimize data exposure.
Synchrony vs asynchrony:
- Queries run synchronously for interactive usage. The service enforces timeouts and row caps to keep latency bounded.
- An optional asynchronous mode can be added later (job-based execution) for expensive queries.
Error handling and retries:
- LLM calls: bounded retries for transient network/5xx errors only.
- SQL execution: no retries by default (queries may be non-idempotent at the system level due to load); surface a stable error code and require explicit client retry if desired.
- All failures produce an audit record with an error classification.
Idempotency strategy:
-
Optional
Idempotency-Keyon the query endpoint:- if the same actor submits the same key, return the prior run result (or run id) without re-executing.
- idempotency is scoped to actor + key, and optionally includes a request hash to detect mismatched reuse.
4. Data Model
Core tables and purpose:
-
sql_assistant_runs-
Purpose: primary record of each NL→SQL request and execution attempt.
-
Key columns:
id (uuid)actor_user_id,actor_emailquestion_textschema_version(orschema_snapshot_id)generated_sql_rawgenerated_sql_normalizedsql_fingerprint(hash)policy_decision(ALLOW,DENY)deny_reason_code(nullable)executed(boolean)duration_msrow_countresult_sample_json(nullable, truncated)error_code/error_message_truncatedcreated_at
-
Indexing:
(actor_user_id, created_at desc)(sql_fingerprint, created_at desc)(policy_decision, created_at desc)
-
-
sql_assistant_decisions-
Purpose: detailed policy trace for explainability (which rule fired).
-
Key columns:
id,run_idrule_nameoutcome(PASS,FAIL)details_jsoncreated_at
-
Indexing:
(run_id)
-
-
schema_snapshots(optional but recommended)-
Purpose: stable grounding input for the LLM and reproducible audits.
-
Key columns:
id,version,snapshot_json,created_at
-
Indexing:
(version)unique
-
Design intent:
- Every request yields a
runrow even if denied before execution. - Decisions are traceable and reproducible, enabling audits and policy tuning without ambiguity.
5. API Surface
Public/internal endpoints:
-
POST
/api/sql/query— Generate SQL from natural language, enforce guardrails, and execute if allowed (ROLE_USER)- Optional:
Idempotency-Keyheader
- Optional:
-
GET
/api/sql/runs/{id}— Fetch a prior run (question, SQL, decision, execution metadata, sampled results) (ROLE_USER, owner-only)
Admin/ops (optional):
- GET
/admin/sql/runs?decision=DENY&limit=...— Search runs and denials (ROLE_ADMIN) - GET
/admin/sql/rules— List active guardrail rules and configuration (ROLE_ADMIN)
6. Security Model
Authentication:
- Spring Security authentication (local Basic auth for runnable mode; replace with platform auth in production).
Authorization (roles):
ROLE_USERcan create runs and view their own runs.ROLE_ADMINcan view global audit and policy outcomes.
Read-only enforcement:
-
Database uses a dedicated
sql_rorole with:- SELECT privileges on allowed schemas/tables
- no write privileges
-
Each execution uses:
SET TRANSACTION READ ONLY- statement timeout (e.g.,
SET LOCAL statement_timeout = '3s') - optional
SET LOCAL idle_in_transaction_session_timeout
CSRF considerations:
- If cookie/session-based auth is used, keep CSRF enabled on POST endpoints.
- If stateless tokens are used, CSRF is not required.
Data isolation guarantees:
- Run records are scoped to the authenticated actor (owner-only read endpoints).
- Result sampling is truncated and optionally column-filtered to avoid exposing sensitive fields by default.
- LLM output is not trusted; it is treated as untrusted input and must pass deterministic enforcement.
7. Operational Behavior
Startup behavior:
- Docker Compose starts PostgreSQL and the application.
- Flyway applies migrations for audit tables and any sample dataset.
- On startup, the schema snapshot can be generated or loaded (configurable).
Failure modes:
- LLM unavailable: run is recorded with
error_code=LLM_UNAVAILABLEand no execution occurs. - Guardrails deny: run is recorded with
policy_decision=DENYanddeny_reason_code, no execution occurs. - Query timeout: run is recorded with
error_code=QUERY_TIMEOUT. - SQL parse failure: run is recorded with
error_code=SQL_PARSE_FAILED. - DB errors (permissions, missing objects): run is recorded with
error_code=DB_ERROR.
Retry and timeout behavior:
- LLM call retries are bounded and only for transient failures.
- SQL execution uses strict statement timeout; retries are not automatic.
Observability hooks:
-
Structured logs include
runId,actor,sql_fingerprint,decision,duration_ms. -
Metrics can be added for:
- denials by reason
- execution latency distributions
- timeouts and DB errors
-
Tracing (optional) can wrap:
- schema snapshot fetch
- LLM generation
- guardrail evaluation
- DB execution
8. Local Execution
Prerequisites:
- Docker + Docker Compose v2
Environment variables:
LLM_BASE_URL— OpenAI-compatible base URLLLM_API_KEY— provider keyLLM_MODEL— model nameSQL_STATEMENT_TIMEOUT_MS— execution timeoutSQL_MAX_ROWS— max rows returned (hard cap)- Local auth credentials (for runnable mode)
Run:
docker compose up -d --build
Health:
curl http://localhost:8080/health
Generate and execute a query:
curl -u user:password -X POST http://localhost:8080/api/sql/query \
-H 'Content-Type: application/json' \
-d '{"question":"Top 10 customers by total spend last month"}'
Fetch run details:
RUN_ID="<uuid>"
curl -u user:password http://localhost:8080/api/sql/runs/${RUN_ID}
Verification steps:
- Confirm denied writes are blocked:
curl -u user:password -X POST http://localhost:8080/api/sql/query \
-H 'Content-Type: application/json' \
-d '{"question":"Delete all orders older than 2020"}'
- Confirm audit records exist in PostgreSQL:
docker exec -it <postgres-container> psql -U <user> -d <db> -c "select policy_decision, deny_reason_code, created_at from sql_assistant_runs order by created_at desc limit 10;"
9. Evidence Pack (MANDATORY)
-
[ ] Service startup logs showing Flyway migrations applied and readiness
-
[ ] Health endpoint response
-
[ ] Successful query run:
- request/response logs for
POST /api/sql/query - returned
runId policy_decision=ALLOWrow_countandduration_mspopulated
- request/response logs for
-
[ ] Guardrail denial run:
- request/response logs for a write-intent question
policy_decision=DENYwithdeny_reason_code
-
[ ] SQL parsing enforcement evidence:
- request that induces multi-statement output
- denial with
deny_reason_code=MULTI_STATEMENT
-
[ ] Read-only DB enforcement evidence:
- attempt to execute a write statement (even if bypass attempted) fails at DB permission boundary
- error recorded with stable
error_code=DB_READONLY_VIOLATION(or equivalent)
-
[ ] Query budgeting evidence:
- a deliberately expensive query triggers statement timeout or is denied by policy
- recorded
error_code=QUERY_TIMEOUTordeny_reason_code=EXPENSIVE_QUERY
-
[ ] Database records after execution:
sql_assistant_runsrow for each runsql_assistant_decisionsrows showing policy trace
-
[ ] Idempotency evidence (if enabled):
- same
Idempotency-Keyreturns samerunIdwithout re-executing
- same
10. Known Limitations
- Does not guarantee semantic correctness of generated SQL; correctness is bounded by schema grounding and validation rules.
- Does not implement full data classification or column-level masking by default; result sampling must be configured for sensitive datasets.
- Single-node deployment model; does not include distributed rate limiting or queue-based execution for heavy workloads.
- Advanced query planning safeguards (cost-based planner checks, EXPLAIN gating) are optional extensions and not mandatory in the baseline.
11. Extension Points
-
Add EXPLAIN-based gating:
- run
EXPLAIN (FORMAT JSON)and deny if estimated cost/rows exceed policy thresholds.
- run
-
Introduce column-level policies:
- deny or mask sensitive columns by schema metadata rules.
-
Add asynchronous execution:
- queue runs and allow polling for results for long queries.
-
Add tenant isolation:
- per-tenant schema snapshots, policies, and budgets.
-
Integrate OpenTelemetry:
- traces for LLM generation, guardrail evaluation, and DB execution.
-
Add evaluation harness:
- curated NL→SQL test suite with pass/fail and regression tracking for policy tuning.
1.1.0
- Solution write-up + runnable implementation
- Evidence images (when published)
- Code bundle downloads (when enabled)