Agentic Workflows in Spring Boot: Tool Calling, Idempotency, and Durable Runs

A runnable workflow engine for LLM tool-calling with durable run state, retries, idempotency keys, and human-in-the-loop checkpoints.

Version 1.1.0, verified on Red Hat 8/9, Ubuntu, macOS, and Windows (Docker). Stack: Java 17 · Spring Boot 3.x · PostgreSQL · Spring Scheduling · OpenTelemetry · Docker Compose

1. Overview

This solution implements a runnable “agentic workflow” execution service in Spring Boot that can orchestrate LLM tool-calling while guaranteeing production-grade properties: durable run state, bounded retries with backoff, idempotency keys to prevent duplicate side effects, and human-in-the-loop checkpoints for sensitive actions.

The problem it solves is straightforward: once an LLM is allowed to call tools (HTTP, database writes, internal service calls), naive implementations become operationally unsafe. Typical failure patterns include:

  • Duplicate tool execution when clients retry requests or when the LLM repeats calls after partial failures.
  • Lost or inconsistent workflow state when the process restarts mid-run.
  • Unbounded loops or runaway costs when the model keeps calling tools without constraints.
  • Non-replayable incidents because intermediate decisions, tool inputs/outputs, and retry history are not persisted.
  • “Human approval” steps implemented as ad-hoc blocking logic that breaks under restarts and timeouts.

Existing approaches often fail in production because they treat tool-calling as a synchronous chat completion loop with in-memory state. When the node restarts, state is lost; when a request times out, clients retry and side effects duplicate; when a tool fails transiently, there is no consistent retry semantics or visibility into what happened.

This implementation is production-ready because it treats workflows as durable runs:

  • Every run is persisted together with its step-level state transitions, and the run itself moves through QUEUED → RUNNING → WAITING_FOR_APPROVAL → COMPLETED/FAILED.
  • Tool calls are recorded as first-class events with idempotency protection at the tool boundary.
  • Retries are bounded and observable, with attempt counters, next-attempt timestamps (next_attempt_at), and structured error capture.
  • Human-in-the-loop is modeled as a durable wait state, not a process-level block.
  • End-to-end tracing (OpenTelemetry) correlates API requests, run execution, tool calls, and retries.
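The run lifecycle in the first bullet can be enforced as an explicit transition table, so an illegal jump (for example COMPLETED back to RUNNING) is rejected before it is persisted. This is a minimal plain Java 17 sketch; RunStatus, RunTransitions, and the exact set of allowed transitions are assumptions inferred from this page (approval resumes the run to RUNNING, rejection ends it in FAILED).

```java
import java.util.EnumSet;
import java.util.Map;
import java.util.Set;

// Hypothetical run-level status enum mirroring the lifecycle described above.
enum RunStatus { QUEUED, RUNNING, WAITING_FOR_APPROVAL, COMPLETED, FAILED }

final class RunTransitions {
    // Assumed legal transitions; COMPLETED and FAILED are terminal.
    private static final Map<RunStatus, Set<RunStatus>> ALLOWED = Map.of(
        RunStatus.QUEUED, EnumSet.of(RunStatus.RUNNING, RunStatus.FAILED),
        RunStatus.RUNNING, EnumSet.of(RunStatus.WAITING_FOR_APPROVAL, RunStatus.COMPLETED, RunStatus.FAILED),
        RunStatus.WAITING_FOR_APPROVAL, EnumSet.of(RunStatus.RUNNING, RunStatus.FAILED),
        RunStatus.COMPLETED, EnumSet.noneOf(RunStatus.class),
        RunStatus.FAILED, EnumSet.noneOf(RunStatus.class));

    static boolean isAllowed(RunStatus from, RunStatus to) {
        return ALLOWED.get(from).contains(to);
    }

    // Throws instead of silently persisting an impossible state.
    static RunStatus transition(RunStatus from, RunStatus to) {
        if (!isAllowed(from, to)) {
            throw new IllegalStateException("Illegal run transition " + from + " -> " + to);
        }
        return to;
    }
}
```

In a repository layer, the same check would naturally run inside the transaction that updates workflow_run.status.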

2. Architecture

Request flow and dependencies:

  • Client → Spring Boot REST API
  • REST API → PostgreSQL (durable run + step state, outbox events, idempotency records)
  • Workflow Engine (Planner Adapter) → LLM Provider (OpenAI-compatible API) for “next action” planning and tool selection
  • Workflow Engine (in-process) → Tool Registry (internal interface) → External Systems (HTTP tools, DB tools, mock tools)
  • Scheduler (Spring Scheduling) → Run Dispatcher → Workflow Engine (process pending work, retries, timeouts)
  • Core Service → OpenTelemetry SDK → OTLP Collector (optional) → Tracing backend (e.g., Jaeger/Tempo/Langfuse-compatible collector)

Key components:

  • Run API: creates runs, queries run status, lists history, handles approval.
  • Workflow Engine: executes a deterministic state machine over persisted steps.
  • Planner Adapter: wraps LLM calls and normalizes “tool call” directives and constraints.
  • Tool Registry: maps tool names to strongly-typed implementations; enforces idempotency and timeouts.
  • Run Dispatcher: picks eligible runs/steps from DB (using leases) and executes them.
  • Idempotency Service: stores and checks idempotency keys per tool execution boundary.
  • Outbox/Event Log: optional append-only event records for audits and replay.
  • Observability: structured logs + OTel traces (request/run/step/tool spans).
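A Tool Registry that maps names to implementations and enforces per-tool timeouts could look roughly like the following. This is a sketch under stated assumptions: the Tool interface, its JSON-string signature, and the use of CompletableFuture.get with a timeout are illustrative choices, not this solution's actual API.

```java
import java.time.Duration;
import java.util.Map;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.CompletionException;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;

// Hypothetical tool contract: a unique name, a per-tool timeout, and a JSON-in/JSON-out call.
interface Tool {
    String name();
    Duration timeout();
    String execute(String requestJson) throws Exception;
}

final class ToolRegistry {
    private final Map<String, Tool> tools;

    ToolRegistry(Map<String, Tool> tools) {
        this.tools = Map.copyOf(tools); // frozen after startup validation
    }

    // Unknown names are rejected rather than trusting the planner's choice.
    Tool lookup(String name) {
        Tool tool = name == null ? null : tools.get(name);
        if (tool == null) {
            throw new IllegalArgumentException("Tool not registered: " + name);
        }
        return tool;
    }

    // Waits at most the tool's timeout; a TimeoutException can be treated as a transient failure.
    // The timed-out task is not interrupted, so real tools should also set client-level timeouts.
    String invoke(String name, String requestJson)
            throws TimeoutException, ExecutionException, InterruptedException {
        Tool tool = lookup(name);
        CompletableFuture<String> call = CompletableFuture.supplyAsync(() -> {
            try {
                return tool.execute(requestJson);
            } catch (Exception e) {
                throw new CompletionException(e); // surfaces as ExecutionException from get()
            }
        });
        return call.get(tool.timeout().toMillis(), TimeUnit.MILLISECONDS);
    }
}
```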

Trust boundaries:

  • Inbound boundary: client input to REST API (authz + validation).
  • LLM boundary: planner output is untrusted and must be constrained (allowed tools, max steps, schema validation).
  • Tool boundary: all side effects must be guarded (idempotency + authorization + allowlists).
  • Data boundary: run history contains sensitive data; access controlled by roles and tenant scoping.
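Because planner output crosses the LLM trust boundary, each tool-call directive can be checked against an allowlist and the run's step budget before it becomes a TOOL step. The sketch below assumes a hypothetical ToolCallDirective record produced by the Planner Adapter; maxSteps would plausibly come from WORKFLOWS_MAX_STEPS, and per-tool schema validation is left as a stub.

```java
import java.util.Map;
import java.util.Optional;
import java.util.Set;

// Hypothetical normalized form of one LLM tool-call directive (after parsing provider output).
record ToolCallDirective(String toolName, Map<String, Object> arguments) {}

final class PlannerGuard {
    private final Set<String> allowedTools;
    private final int maxSteps;

    PlannerGuard(Set<String> allowedTools, int maxSteps) {
        this.allowedTools = Set.copyOf(allowedTools);
        this.maxSteps = maxSteps;
    }

    // Returns a rejection reason, or empty if the directive may become a TOOL step.
    Optional<String> rejectionReason(ToolCallDirective directive, int stepsSoFar) {
        if (stepsSoFar >= maxSteps) {
            return Optional.of("MAX_STEPS_EXCEEDED");
        }
        String tool = directive.toolName();
        // Null check first: immutable sets throw NullPointerException on contains(null).
        if (tool == null || !allowedTools.contains(tool)) {
            return Optional.of("TOOL_NOT_ALLOWED");
        }
        if (directive.arguments() == null) {
            return Optional.of("MISSING_ARGUMENTS");
        }
        // Per-tool JSON schema validation of the arguments would go here (not shown).
        return Optional.empty();
    }
}
```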

3. Key Design Decisions

Technology stack

  • Spring Boot 3.x / Java 17: stable operational model, mature observability and security integrations, straightforward packaging for Docker Compose.
  • PostgreSQL: required for durable run state, idempotency ledger, leases, and replayable history; transactional consistency is central to correctness.
  • Spring Scheduling: lightweight dispatcher mechanism for a single-node Compose deployment; avoids introducing a message broker while still enabling asynchronous progress.
  • OpenTelemetry: vendor-neutral tracing, consistent correlation across HTTP, DB, LLM, and tool calls.

Data storage model

  • The system uses a run/step/event model:

    • A run is the durable container (workflow instance).
    • A step is the unit of work (LLM plan, tool execution, approval wait, completion).
    • An event log (optional but included) captures append-only transitions for audit and replay.
  • This model supports:

    • Restart-safe continuation.
    • Precise retry semantics at the step boundary.
    • Deterministic replay and debugging using persisted tool inputs/outputs.
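Restart-safe continuation follows from the step model: after a restart, an active run resumes at the lowest-seq step that has not reached a terminal status. A minimal sketch, assuming COMPLETED and FAILED are the terminal step statuses (FAILED is named on this page; COMPLETED as a step status is an assumption) and using a hypothetical StepRow projection of workflow_step.

```java
import java.util.Comparator;
import java.util.List;
import java.util.Optional;
import java.util.Set;

// Hypothetical projection of a workflow_step row (only the columns needed here).
record StepRow(int seq, String type, String status) {}

final class ResumePoint {
    // Assumed terminal step statuses; anything else still needs work after a restart.
    private static final Set<String> TERMINAL = Set.of("COMPLETED", "FAILED");

    // The run continues from the lowest-seq step that is not terminal; empty means nothing left.
    static Optional<StepRow> next(List<StepRow> persistedSteps) {
        return persistedSteps.stream()
            .filter(s -> !TERMINAL.contains(s.status()))
            .min(Comparator.comparingInt(StepRow::seq));
    }
}
```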

Synchrony vs asynchrony

  • Run creation is synchronous (returns run id immediately).

  • Execution is asynchronous via the scheduler-driven dispatcher:

    • Prevents client timeouts from forcing duplicate work.
    • Allows durable waiting (approvals, backoffs) without tying up threads.
  • Certain endpoints optionally support “wait-for” semantics (polling) but do not drive execution in-process on the request thread by default.

Error handling and retries

  • Tool failures are categorized:

    • Transient (timeouts, 5xx): retry with bounded exponential backoff and jitter.
    • Permanent (4xx, validation, policy): mark step failed without retry.
  • Retries are step-local, with columns tracking attempt count, last error, and next eligible execution time.

  • The dispatcher uses leases to avoid double execution if multiple instances are ever introduced.
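The classification and backoff rules above can be expressed as small pure functions, which keeps retry behavior unit-testable. This sketch uses the "full jitter" variant of exponential backoff (a uniform delay up to the exponential ceiling); RetryPolicy, the base and cap values, and the attempt numbering are assumptions, not this solution's configuration.

```java
import java.time.Duration;
import java.time.Instant;
import java.util.Random;

final class RetryPolicy {
    enum Category { TRANSIENT, PERMANENT }

    // Follows the rule above for error statuses: 5xx is transient, 4xx is permanent.
    // Timeouts are classified by the caller as TRANSIENT; treating 429 as transient is a policy choice.
    static Category classify(int httpStatus) {
        return httpStatus >= 500 ? Category.TRANSIENT : Category.PERMANENT;
    }

    // Full jitter: a uniform delay in [0, min(cap, base * 2^(attempt - 1))).
    // attempt is the number of failed attempts so far (1 after the first failure).
    static Duration backoff(int attempt, Duration base, Duration cap, Random random) {
        int shift = Math.min(Math.max(attempt - 1, 0), 30); // bounded so 1L << shift stays small
        long exp;
        try {
            exp = Math.multiplyExact(base.toMillis(), 1L << shift);
        } catch (ArithmeticException overflow) {
            exp = Long.MAX_VALUE;
        }
        long ceiling = Math.min(cap.toMillis(), exp);
        return Duration.ofMillis((long) (random.nextDouble() * ceiling));
    }

    // The value to persist in workflow_step.next_attempt_at, so a restart keeps the schedule.
    static Instant nextAttemptAt(Instant now, int attempt, Duration base, Duration cap, Random random) {
        return now.plus(backoff(attempt, base, cap, random));
    }
}
```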

Idempotency strategy

  • Idempotency is enforced at two levels:

    1. Run creation idempotency (client-supplied idempotency key): prevents duplicate runs on client retry.
    2. Tool execution idempotency (derived key): prevents duplicate side effects for the same run/step/tool input.
  • The idempotency ledger stores:

    • Key, scope (tenant), tool name, request hash, status, response snapshot, timestamps.
  • Tool calls must check and persist idempotency results in the same transaction that advances step state (or with a strict ordering guaranteeing at-most-once semantics per key).
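One way to derive the tool execution key is to hash the tuple that defines "the same call": tenant, run, step, tool name, and the canonicalized tool input. The sketch below assumes SHA-256 and a NUL separator; ToolIdempotencyKeys and the "tool:" prefix are hypothetical, and canonical JSON serialization (sorted keys, no insignificant whitespace) is assumed to happen before the call.

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.HexFormat;

final class ToolIdempotencyKeys {
    // NUL separator keeps ("ab", "c") and ("a", "bc") from producing the same material.
    private static final String SEP = "\0";

    // Same tenant/run/step/tool/input gives the same key, so a retried step hits the ledger
    // instead of repeating the side effect.
    static String derive(String tenantId, String runId, String stepId, String toolName,
                         String canonicalRequestJson) {
        String material = String.join(SEP, tenantId, runId, stepId, toolName, canonicalRequestJson);
        return "tool:" + sha256Hex(material);
    }

    // Also usable for the request_hash column (hash of the request JSON alone).
    static String sha256Hex(String s) {
        try {
            MessageDigest md = MessageDigest.getInstance("SHA-256");
            return HexFormat.of().formatHex(md.digest(s.getBytes(StandardCharsets.UTF_8)));
        } catch (NoSuchAlgorithmException e) {
            throw new IllegalStateException("SHA-256 is required on every Java platform", e);
        }
    }
}
```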

4. Data Model

Core tables and purpose:

  • workflow_run

    • Purpose: top-level workflow instance and lifecycle.
    • Key columns: id (uuid), tenant_id, status, created_at, updated_at, input_json, max_steps, deadline_at, idempotency_key.
  • workflow_step

    • Purpose: durable unit of execution inside a run.
    • Key columns: id, run_id, seq, type (PLAN|TOOL|APPROVAL|FINAL), status, attempt, next_attempt_at, lease_owner, lease_expires_at, request_json, result_json, error_code, error_detail, started_at, finished_at.
  • tool_execution

    • Purpose: canonical record of tool calls (inputs/outputs) for replay and audit.
    • Key columns: id, run_id, step_id, tool_name, idempotency_key, request_hash, request_json, response_json, status, duration_ms, created_at.
  • idempotency_record

    • Purpose: global ledger to dedupe side effects (run creation and tool calls).
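    • Key columns: idempotency_key, tenant_id, scope (RUN_CREATE|TOOL_CALL), status, request_hash, response_json, created_at, updated_at. Keys are unique per (tenant_id, scope, idempotency_key), or globally if idempotency_key alone is the primary key.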
  • approval_checkpoint

    • Purpose: human-in-the-loop durable wait state.
    • Key columns: id, run_id, step_id, status (PENDING|APPROVED|REJECTED), requested_by, decided_by, reason, created_at, decided_at.
  • run_event (append-only)

    • Purpose: replayable history and audit trail of transitions.
    • Key columns: id, run_id, step_id, event_type, payload_json, created_at.
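The semantics behind idempotency_record can be modeled in memory to make the cases explicit: first use proceeds, a repeat with the same request hash replays the stored response, a repeat while the first call is unfinished reports IN_PROGRESS, and the same key with a different request hash is a conflict. This is an illustrative sketch only; in PostgreSQL the claim would more plausibly be an INSERT ... ON CONFLICT DO NOTHING against the unique key, and mapping outcomes to HTTP responses is left to the API layer.

```java
import java.util.Map;
import java.util.Optional;
import java.util.concurrent.ConcurrentHashMap;

// In-memory stand-in for the idempotency_record table (hypothetical, for illustration).
final class IdempotencyLedger {
    enum Outcome { PROCEED, REPLAY, IN_PROGRESS, CONFLICT }

    record Decision(Outcome outcome, Optional<String> storedResponse) {}

    // responseJson stays null until the guarded operation completes.
    private record Entry(String requestHash, String responseJson) {}

    private final Map<String, Entry> records = new ConcurrentHashMap<>();

    // Hypothetical composite id mirroring the unique key (tenant_id, scope, idempotency_key).
    private static String id(String tenantId, String scope, String key) {
        return tenantId + "|" + scope + "|" + key;
    }

    Decision begin(String tenantId, String scope, String key, String requestHash) {
        Entry existing = records.putIfAbsent(id(tenantId, scope, key), new Entry(requestHash, null));
        if (existing == null) {
            return new Decision(Outcome.PROCEED, Optional.empty());   // caller performs the side effect
        }
        if (!existing.requestHash().equals(requestHash)) {
            return new Decision(Outcome.CONFLICT, Optional.empty());  // same key, different payload
        }
        if (existing.responseJson() == null) {
            return new Decision(Outcome.IN_PROGRESS, Optional.empty());
        }
        return new Decision(Outcome.REPLAY, Optional.of(existing.responseJson()));
    }

    // Stores the response snapshot so later repeats can be replayed.
    void complete(String tenantId, String scope, String key, String responseJson) {
        records.computeIfPresent(id(tenantId, scope, key),
            (k, e) -> new Entry(e.requestHash(), responseJson));
    }
}
```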

Indexing strategy:

  • workflow_run(tenant_id, created_at desc) for listing runs per tenant.

  • workflow_run(idempotency_key, tenant_id) unique constraint to dedupe run creation.

  • workflow_step(run_id, seq) unique to preserve deterministic ordering.

  • Dispatcher hot path indexes:

    • workflow_step(status, next_attempt_at) partial index for status in (QUEUED, RETRY_PENDING) ordered by time.
    • workflow_step(lease_expires_at) to reclaim stuck leases.
  • Idempotency:

    • idempotency_record(tenant_id, scope, idempotency_key) unique / primary key (depending on chosen schema).
    • tool_execution(idempotency_key) unique for tool-level at-most-once.
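The dispatcher hot path can be served by a single claim statement. The query below is an assumption, not the shipped one: it uses PostgreSQL's FOR UPDATE SKIP LOCKED so concurrent dispatchers never claim the same row, sets a lease, and also reclaims RUNNING steps whose lease has expired. It presumes newly queued steps have next_attempt_at set (for example to their creation time). The isClaimable method mirrors the WHERE clause so the rule can be unit-tested without a database.

```java
import java.time.Instant;
import java.util.Set;

final class LeaseClaim {
    // Hypothetical claim query. Parameters: lease owner, lease seconds, batch size.
    static final String CLAIM_SQL = """
        UPDATE workflow_step
           SET status = 'RUNNING',
               lease_owner = ?,
               lease_expires_at = now() + (? * interval '1 second')
         WHERE id IN (
               SELECT id FROM workflow_step
                WHERE (status IN ('QUEUED', 'RETRY_PENDING') AND next_attempt_at <= now())
                   OR (status = 'RUNNING' AND lease_expires_at < now())
                ORDER BY next_attempt_at
                LIMIT ?
                FOR UPDATE SKIP LOCKED)
        RETURNING id, run_id, seq, type
        """;

    private static final Set<String> READY = Set.of("QUEUED", "RETRY_PENDING");

    // Same eligibility rule as the WHERE clause above.
    static boolean isClaimable(String status, Instant nextAttemptAt, Instant leaseExpiresAt, Instant now) {
        if (READY.contains(status)) {
            return nextAttemptAt != null && !nextAttemptAt.isAfter(now);
        }
        // A RUNNING step with an expired lease belongs to a crashed or stalled worker.
        return "RUNNING".equals(status) && leaseExpiresAt != null && leaseExpiresAt.isBefore(now);
    }
}
```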

5. API Surface

  • POST /api/runs – Create a new workflow run; returns run id (ROLE_USER)

    • Supports Idempotency-Key header for run creation dedupe.
  • GET /api/runs/{id} – Get current run state summary (ROLE_USER, tenant-scoped)

  • GET /api/runs/{id}/history – Get step-by-step history including tool inputs/outputs (ROLE_USER, tenant-scoped; redaction applied)

  • POST /api/runs/{id}/approve – Approve a pending checkpoint and resume execution (ROLE_APPROVER)

  • POST /api/runs/{id}/reject – Reject a pending checkpoint and fail/abort run (ROLE_APPROVER)

  • POST /internal/dispatcher/tick – Trigger a dispatcher tick (ROLE_ADMIN; used for deterministic testing)

  • GET /actuator/health – Health check (public or ROLE_ADMIN depending on deployment)

  • GET /admin/runs – List runs across tenants (ROLE_ADMIN)

  • GET /admin/runs/{id} – Admin view including raw events and lease state (ROLE_ADMIN)
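Approve and reject are safest as a compare-and-set on the checkpoint status, so a duplicate or late decision cannot resume a run twice. A minimal sketch with a hypothetical Checkpoint record and ApprovalService; the SQL guard in the comment is an assumption about how the update could be written.

```java
import java.time.Instant;

// Hypothetical in-memory view of an approval_checkpoint row.
record Checkpoint(String id, String status, String decidedBy, String reason, Instant decidedAt) {}

final class ApprovalService {
    // Assumed SQL equivalent:
    //   UPDATE approval_checkpoint SET status = ?, decided_by = ?, reason = ?, decided_at = now()
    //    WHERE id = ? AND status = 'PENDING'
    // Zero updated rows means the checkpoint was already decided.
    static Checkpoint decide(Checkpoint cp, boolean approve, String approver, String reason, Instant now) {
        if (!"PENDING".equals(cp.status())) {
            // Maps naturally to 409 Conflict: a second decision must not re-trigger execution.
            throw new IllegalStateException("Checkpoint " + cp.id() + " already " + cp.status());
        }
        return new Checkpoint(cp.id(), approve ? "APPROVED" : "REJECTED", approver, reason, now);
    }
}
```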

6. Security Model

Authentication

  • Spring Security with stateless authentication (e.g., JWT bearer tokens) for API endpoints.
  • For local Compose, a dev profile can enable a fixed test token issuer or basic auth for simplicity while keeping the security model intact.

Authorization (roles)

  • ROLE_USER: create runs, view own tenant runs, read history (with redaction).
  • ROLE_APPROVER: approve/reject checkpoints for the tenant.
  • ROLE_ADMIN: cross-tenant admin endpoints, dispatcher tick, operational views.

Paid access enforcement (if applicable)

  • Enforced at the API layer via:

    • Tenant subscription status stored in DB or resolved via a billing adapter.
    • A request filter that blocks run creation and tool execution when subscription is inactive.
  • The enforcement point must be before run creation and before tool execution (to prevent side effects).

CSRF considerations

  • APIs are stateless and designed for token-based auth; CSRF is disabled for non-browser clients.
  • If browser-based sessions are enabled, restrict cookie-based auth to admin UI endpoints and enable CSRF there only.

Data isolation guarantees

  • Every run is tagged with tenant_id.
  • All queries include tenant predicates except ROLE_ADMIN endpoints.
  • History endpoints apply output redaction policies for tool inputs/outputs (secrets, tokens, PII markers) before returning to ROLE_USER.
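Key-based redaction of history payloads could be as simple as walking the parsed JSON tree and masking values whose field names appear in a per-tool policy. This sketch assumes payloads are already parsed into maps, lists, and scalars (for example by a JSON library); HistoryRedactor and the [REDACTED] mask are hypothetical, and name-based rules will not catch every sensitive value.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.Set;
import java.util.stream.Collectors;

final class HistoryRedactor {
    static final String MASK = "[REDACTED]";
    private final Set<String> sensitiveKeys; // lower-cased field names from the tool's policy

    HistoryRedactor(Set<String> sensitiveKeys) {
        this.sensitiveKeys = sensitiveKeys.stream()
            .map(k -> k.toLowerCase(Locale.ROOT))
            .collect(Collectors.toUnmodifiableSet());
    }

    // Returns a redacted copy; maps and lists are walked recursively, scalars pass through.
    Object redact(Object node) {
        if (node instanceof Map<?, ?> map) {
            Map<String, Object> out = new LinkedHashMap<>();
            for (Map.Entry<?, ?> e : map.entrySet()) {
                String key = String.valueOf(e.getKey());
                boolean sensitive = sensitiveKeys.contains(key.toLowerCase(Locale.ROOT));
                out.put(key, sensitive ? MASK : redact(e.getValue()));
            }
            return out;
        }
        if (node instanceof List<?> list) {
            return list.stream().map(this::redact).toList();
        }
        return node;
    }
}
```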

7. Operational Behavior

Startup behavior

  • On startup, the service:

    • Runs DB migrations (Flyway/Liquibase).
    • Initializes tool registry and validates allowed tool schemas.
    • Starts the dispatcher scheduler (unless disabled via profile).
    • Emits a startup log line including version, active profiles, DB connectivity, and OTel exporter mode.

Failure modes

  • DB unavailable: fail fast on startup; health becomes unhealthy; no runs executed.
  • LLM provider unavailable: PLAN steps fail transiently and retry until max attempts; run transitions to FAILED when exhausted.
  • Tool failure: step transitions to RETRY_PENDING (transient) or FAILED (permanent).
  • Process restart during execution: leases expire; dispatcher reclaims and resumes from the last durable state.

Retry and timeout behavior

  • Planner (LLM) timeout is bounded; failures are retried with backoff up to planner.maxAttempts.
  • Tool calls have per-tool timeouts and max attempts.
  • Backoff is stored in next_attempt_at to ensure restart-safe scheduling.
  • A run-level deadline_at prevents indefinite execution; once exceeded, remaining steps fail with TIMEOUT.
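The retry and deadline rules combine into one decision taken each time an attempt fails. A sketch under assumptions: StepFailureHandler and the error codes PERMANENT_ERROR and MAX_ATTEMPTS_EXCEEDED are hypothetical (TIMEOUT is the code named above), and the backoff delay is computed elsewhere and passed in.

```java
import java.time.Duration;
import java.time.Instant;

final class StepFailureHandler {
    // Hypothetical result: new step status, when to retry (if at all), and the recorded error code.
    record Outcome(String status, Instant nextAttemptAt, String errorCode) {}

    // attempt = attempts made so far, including the one that just failed.
    static Outcome onFailure(Instant now, Instant deadlineAt, int attempt, int maxAttempts,
                             boolean transientError, Duration backoff) {
        if (deadlineAt != null && now.isAfter(deadlineAt)) {
            return new Outcome("FAILED", null, "TIMEOUT");               // run-level deadline wins
        }
        if (!transientError) {
            return new Outcome("FAILED", null, "PERMANENT_ERROR");       // 4xx, validation, policy
        }
        if (attempt >= maxAttempts) {
            return new Outcome("FAILED", null, "MAX_ATTEMPTS_EXCEEDED");
        }
        // Persisting next_attempt_at makes the schedule survive a restart.
        return new Outcome("RETRY_PENDING", now.plus(backoff), null);
    }
}
```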

Observability hooks

  • Structured logs:

    • run_id, step_id, tenant_id, tool_name, attempt, lease_owner, idempotency_key.
  • OpenTelemetry traces:

    • HTTP span for inbound requests.
    • Run execution span (per step) linked via trace/span attributes.
    • Nested spans for LLM calls and tool calls with result status and latency.
  • Metrics (via Micrometer + OTel bridge if desired):

    • Runs created/completed/failed.
    • Step retries, tool error rates, dispatcher lag.
    • Idempotency hits vs misses.

8. Local Execution

Prerequisites

  • Docker Desktop (or Docker Engine) with Compose v2
  • JDK 17 (for running tests locally; container build uses JDK image)
  • Available ports: 8080 (app), 5432 (postgres), 4317 (optional OTLP)

Environment variables

  • SPRING_PROFILES_ACTIVE=local
  • DB_URL=jdbc:postgresql://localhost:5432/workflows
  • DB_USER=workflows
  • DB_PASS=workflows
  • LLM_BASE_URL=http://mock-llm:8081 (or your provider)
  • LLM_API_KEY=... (if using real provider)
  • OTEL_EXPORTER_OTLP_ENDPOINT=http://otel-collector:4317 (optional)
  • WORKFLOWS_MAX_STEPS=25
  • WORKFLOWS_DEFAULT_DEADLINE_SECONDS=120

Docker Compose usage

docker compose up -d --build

Verification steps

  1. Health:
curl -s http://localhost:8080/actuator/health
  2. Create a run (with idempotency):
curl -s -X POST http://localhost:8080/api/runs \
  -H "Authorization: Bearer <TOKEN>" \
  -H "Idempotency-Key: run-123" \
  -H "Content-Type: application/json" \
  -d '{
    "tenantId": "t-001",
    "input": { "goal": "Create a ticket via tool, requires approval" }
  }'
  3. Poll status:
curl -s http://localhost:8080/api/runs/<RUN_ID> \
  -H "Authorization: Bearer <TOKEN>"
  4. If the run is waiting for approval:
curl -s -X POST http://localhost:8080/api/runs/<RUN_ID>/approve \
  -H "Authorization: Bearer <APPROVER_TOKEN>" \
  -H "Content-Type: application/json" \
  -d '{ "reason": "Approved for demo" }'
  5. Confirm history is replayable:
curl -s http://localhost:8080/api/runs/<RUN_ID>/history \
  -H "Authorization: Bearer <TOKEN>"
  6. Idempotency validation (repeat the create request with the same key; it should return the same run id or a conflict-safe response):
curl -i -X POST http://localhost:8080/api/runs \
  -H "Authorization: Bearer <TOKEN>" \
  -H "Idempotency-Key: run-123" \
  -H "Content-Type: application/json" \
  -d '{ "tenantId": "t-001", "input": { "goal": "Create a ticket via tool, requires approval" } }'
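The idempotency check in the last step can also be scripted from Java using only java.net.http. RunClient is hypothetical; the URL, headers, and body mirror the curl examples. The point of the sketch is that a retry resends the identical request, including the Idempotency-Key, so the server can return the original run instead of creating a second one.

```java
import java.io.IOException;
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;
import java.time.Duration;

final class RunClient {
    // Builds the same request the curl example sends; reusing it on retry keeps the key stable.
    static HttpRequest createRunRequest(String baseUrl, String token, String idempotencyKey, String bodyJson) {
        return HttpRequest.newBuilder(URI.create(baseUrl + "/api/runs"))
            .timeout(Duration.ofSeconds(10))
            .header("Authorization", "Bearer " + token)
            .header("Idempotency-Key", idempotencyKey)
            .header("Content-Type", "application/json")
            .POST(HttpRequest.BodyPublishers.ofString(bodyJson))
            .build();
    }

    // Retries only on I/O failures (including HttpTimeoutException); the server-side ledger
    // is what prevents a second run from being created.
    static HttpResponse<String> createWithRetry(HttpClient client, HttpRequest request, int maxAttempts)
            throws IOException, InterruptedException {
        if (maxAttempts < 1) {
            throw new IllegalArgumentException("maxAttempts must be >= 1");
        }
        IOException last = null;
        for (int attempt = 1; attempt <= maxAttempts; attempt++) {
            try {
                return client.send(request, HttpResponse.BodyHandlers.ofString());
            } catch (IOException e) {
                last = e;
                if (attempt < maxAttempts) {
                    Thread.sleep(200L * attempt); // short linear pause; tune for real use
                }
            }
        }
        throw last;
    }
}
```

Usage would be along the lines of RunClient.createWithRetry(HttpClient.newHttpClient(), request, 3), then comparing the run id in each response body.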

9. Evidence Pack

Checklist of included evidence artifacts proving execution and correctness:

  • Service startup logs showing DB migration completion and dispatcher start

  • Successful POST /api/runs invocation logs including returned run id and Idempotency-Key handling

  • Database records after run creation:

    • workflow_run row
    • initial workflow_step rows
  • Replayable run history output from GET /api/runs/{id}/history demonstrating persisted step transitions

  • Idempotency proof:

    • repeated POST /api/runs with same idempotency key returning same run reference
    • idempotency_record row showing stored response snapshot
  • Retry behavior demonstration:

    • forced transient tool failure logs showing attempts incrementing and next_attempt_at scheduling
    • run eventually completes after retry or fails after max attempts
  • Human-in-the-loop checkpoint proof:

    • run enters WAITING_FOR_APPROVAL
    • approval action transitions run back to runnable state and continues execution
    • approval_checkpoint row showing approver and timestamp
  • OpenTelemetry trace export proof:

    • trace showing correlated spans: inbound request → run execution → planner call → tool call → retry span (if triggered)
  • Test evidence:

    • idempotency unit/integration test output
    • deterministic replay test verifying identical step history from persisted events

10. Known Limitations

  • Single-node dispatcher design by default; horizontal scaling requires distributed locking/leases across multiple instances and careful tuning of lease durations and concurrency.
  • Tool result redaction is policy-based and requires explicit configuration per tool schema; it does not automatically detect all sensitive data.
  • This solution does not provide a full UI for approvals; it exposes API endpoints and an optional minimal admin view only.
  • Exactly-once semantics are scoped to idempotency keys; if external tools are non-idempotent and keys are not enforced at the boundary, side effects can still duplicate.
  • No built-in long-term artifact storage for large tool outputs; payloads are stored as JSON and should be capped or externalized for large binaries.

11. Extension Points

  • Replace the scheduler-based dispatcher with:

    • a queue-driven model (Kafka/RabbitMQ) for higher throughput, while preserving the same run/step persistence and idempotency ledger.
  • Add a “tool gateway” service:

    • isolate high-risk tools behind an internal API with separate authorization and auditing.
  • Introduce multi-tenant scaling:

    • per-tenant concurrency limits, per-tenant rate limiting for LLM calls, and quota enforcement integrated with subscription status.
  • Add stronger determinism and replay:

    • treat planner outputs as immutable events and support “replay with fixed plan” without re-calling the LLM.
  • Production hardening changes:

    • run/step partitioning and retention policies
    • outbox forwarding to an external audit system
    • dedicated tracing backend and sampling strategy
    • secret management (Vault/KMS) and per-tool credential isolation
Evidence images (7): code structure, build success, health status UP, create a run, poll, approve, history.
Code downloads (2 ZIP bundles): spring-boot-agentic-workflows-idempotency-durable-runs_v1.1.zip, spring-boot-agentic-workflows-idempotency-durable-runs.zip