
Agentic Workflows in Spring Boot: Tool Calling, Idempotency, and Durable Runs

A runnable workflow engine for LLM tool-calling with durable run state, retries, idempotency keys, and human-in-the-loop checkpoints.

v1.1.0 · RHEL 8/9 / Ubuntu / macOS / Windows (Docker) · Java 17 · Spring Boot 3.x · PostgreSQL · Spring Scheduling · OpenTelemetry · Docker Compose
Estimated local run: 15 minutes

Included in this product:

  • Full source code package
  • Docker Compose runnable stack
  • Verification evidence screenshots
  • Production implementation notes

Best for: Spring Boot teams building production AI features.

Verified evidence (execution artifacts included with the product package, 7 items):

  • code-structure-0.png
  • build-success-1.png
  • health-status-up-3.png
  • Create a run-4.png
  • Poll-5.png
  • Approve-6.png
  • History-7.png

1. Overview

This solution implements a runnable “agentic workflow” execution service in Spring Boot that can orchestrate LLM tool-calling while guaranteeing production-grade properties: durable run state, bounded retries with backoff, idempotency keys to prevent duplicate side effects, and human-in-the-loop checkpoints for sensitive actions.

The problem it solves is straightforward: once an LLM is allowed to call tools (HTTP, database writes, internal service calls), naive implementations become operationally unsafe. Typical failure patterns include:

  • Duplicate tool execution when clients retry requests or when the LLM repeats calls after partial failures.
  • Lost or inconsistent workflow state when the process restarts mid-run.
  • Unbounded loops or runaway costs when the model keeps calling tools without constraints.
  • Non-replayable incidents because intermediate decisions, tool inputs/outputs, and retry history are not persisted.
  • “Human approval” steps implemented as ad-hoc blocking logic that breaks under restarts and timeouts.

Existing approaches often fail in production because they treat tool-calling as a synchronous chat completion loop with in-memory state. When the node restarts, state is lost; when a request times out, clients retry and side effects duplicate; when a tool fails transiently, there is no consistent retry semantics or visibility into what happened.

This implementation is production-ready because it treats workflows as durable runs:

  • Every run is persisted with step-level state transitions (QUEUED → RUNNING → WAITING_FOR_APPROVAL → COMPLETED/FAILED, with RETRY_PENDING for steps awaiting a retry).
  • Tool calls are recorded as first-class events with idempotency protection at the tool boundary.
  • Retries are deterministic and observable, with attempt counters, next_attempt_at timestamps, and structured error capture.
  • Human-in-the-loop is modeled as a durable wait state, not a process-level block.
  • End-to-end tracing (OpenTelemetry) correlates API requests, run execution, tool calls, and retries.
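The lifecycle above behaves like a small state machine. The sketch below encodes it as a Java enum so that illegal transitions fail fast. It is a minimal illustration, assuming one status set shared by steps and runs and a transition table inferred from this page (for example, an approval sends the step back to QUEUED); `StepStatus` and `Transitions` are hypothetical names, not the packaged source.

```java
import java.util.EnumSet;
import java.util.Map;
import java.util.Set;

// Hypothetical status enum; the values are the statuses named on this page.
enum StepStatus {
    QUEUED, RUNNING, RETRY_PENDING, WAITING_FOR_APPROVAL, COMPLETED, FAILED;

    // Assumed legal transitions; terminal states map to an empty set.
    private static final Map<StepStatus, Set<StepStatus>> ALLOWED = Map.of(
        QUEUED, EnumSet.of(RUNNING, FAILED),
        RUNNING, EnumSet.of(COMPLETED, FAILED, RETRY_PENDING, WAITING_FOR_APPROVAL),
        RETRY_PENDING, EnumSet.of(RUNNING, FAILED),
        WAITING_FOR_APPROVAL, EnumSet.of(QUEUED, FAILED), // approve -> QUEUED, reject -> FAILED
        COMPLETED, EnumSet.noneOf(StepStatus.class),
        FAILED, EnumSet.noneOf(StepStatus.class));

    boolean canTransitionTo(StepStatus next) {
        return ALLOWED.get(this).contains(next);
    }

    boolean isTerminal() {
        return ALLOWED.get(this).isEmpty();
    }
}

final class Transitions {
    private Transitions() {}

    // Returns the new status, or throws if the move is not in the table above.
    static StepStatus apply(StepStatus from, StepStatus to) {
        if (!from.canTransitionTo(to)) {
            throw new IllegalStateException("Illegal transition " + from + " -> " + to);
        }
        return to;
    }
}
```

In a PostgreSQL-backed version, the same check is usually paired with a conditional update (`... WHERE id = ? AND status = ?`) so that two writers cannot both apply a transition from the same starting state.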

2. Architecture

Request flow and dependencies:

  • Client → Spring Boot REST API
  • REST API → PostgreSQL (durable run + step state, outbox events, idempotency records)
  • Workflow Engine (via Planner Adapter) → LLM Provider (OpenAI-compatible API) for “next action” planning and tool selection
  • Workflow Engine (in-process) → Tool Registry (internal interface) → External Systems (HTTP tools, DB tools, mock tools)
  • Scheduler (Spring Scheduling) → Run Dispatcher → Workflow Engine (process pending work, retries, timeouts)
  • Core Service → OpenTelemetry SDK → OTLP Collector (optional) → Tracing backend (e.g., Jaeger/Tempo/Langfuse-compatible collector)

Key components:

  • Run API: creates runs, queries run status, lists history, handles approval.
  • Workflow Engine: executes a deterministic state machine over persisted steps.
  • Planner Adapter: wraps LLM calls and normalizes “tool call” directives and constraints.
  • Tool Registry: maps tool names to strongly-typed implementations; enforces idempotency and timeouts.
  • Run Dispatcher: picks eligible runs/steps from DB (using leases) and executes them.
  • Idempotency Service: stores and checks idempotency keys per tool execution boundary.
  • Outbox/Event Log: optional append-only event records for audits and replay.
  • Observability: structured logs + OTel traces (request/run/step/tool spans).
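Two of the Tool Registry's duties, allowlisting and per-tool timeouts, can be sketched with plain `java.util.concurrent`. This is an assumption-laden illustration: the `Tool` interface, its `Map`-based arguments, and `ToolRegistry.invoke` are hypothetical, and idempotency is deliberately left out here.

```java
import java.time.Duration;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;

// Hypothetical tool contract: JSON-like arguments in, JSON-like result out.
interface Tool {
    String name();
    Duration timeout();
    Map<String, Object> execute(Map<String, Object> args) throws Exception;
}

final class ToolRegistry implements AutoCloseable {
    private final Map<String, Tool> tools = new ConcurrentHashMap<>();
    private final ExecutorService executor = Executors.newCachedThreadPool();

    void register(Tool tool) {
        if (tools.putIfAbsent(tool.name(), tool) != null) {
            throw new IllegalArgumentException("Duplicate tool: " + tool.name());
        }
    }

    // Only registered tools run (the allowlist); each call is bounded by its own timeout.
    Map<String, Object> invoke(String name, Map<String, Object> args) throws Exception {
        Tool tool = tools.get(name);
        if (tool == null) {
            throw new IllegalArgumentException("Tool not allowed: " + name);
        }
        Future<Map<String, Object>> future = executor.submit(() -> tool.execute(args));
        try {
            return future.get(tool.timeout().toMillis(), TimeUnit.MILLISECONDS);
        } catch (TimeoutException e) {
            future.cancel(true); // interrupt the worker; callers treat this as a transient failure
            throw e;
        } catch (ExecutionException e) {
            // Unwrap so callers can classify the tool's own exception.
            Throwable cause = e.getCause();
            if (cause instanceof Exception ex) {
                throw ex;
            }
            throw e;
        }
    }

    @Override
    public void close() {
        executor.shutdownNow();
    }
}
```

Cancelling the future only interrupts the worker thread; a tool blocked in non-interruptible I/O keeps running, so tools still need their own client-level timeouts.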

Trust boundaries:

  • Inbound boundary: client input to REST API (authz + validation).
  • LLM boundary: planner output is untrusted and must be constrained (allowed tools, max steps, schema validation).
  • Tool boundary: all side effects must be guarded (idempotency + authorization + allowlists).
  • Data boundary: run history contains sensitive data; access controlled by roles and tenant scoping.
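Because planner output crosses the LLM trust boundary, each proposed tool call can be checked before anything executes. The sketch assumes the Planner Adapter has already parsed the model's response into a `ToolCallDirective`; the record names, the flat required/optional argument schema, the `create_ticket` tool, and the error strings are hypothetical simplifications of full JSON Schema validation.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.stream.Collectors;

// Hypothetical normalized directive produced by the Planner Adapter.
record ToolCallDirective(String toolName, Map<String, Object> arguments) {
    ToolCallDirective {
        arguments = arguments == null ? Map.of() : Map.copyOf(arguments);
    }
}

// Hypothetical flat schema: argument names only, no type checks.
record ToolSchema(String name, Set<String> requiredArgs, Set<String> optionalArgs) {}

final class PlannerOutputValidator {
    private final Map<String, ToolSchema> allowed;

    PlannerOutputValidator(List<ToolSchema> schemas) {
        this.allowed = schemas.stream().collect(Collectors.toMap(ToolSchema::name, s -> s));
    }

    // Returns the violations; an empty list means the directive may be executed.
    List<String> validate(ToolCallDirective d) {
        List<String> errors = new ArrayList<>();
        ToolSchema schema = allowed.get(d.toolName());
        if (schema == null) {
            errors.add("tool not allowed: " + d.toolName());
            return errors;
        }
        for (String arg : schema.requiredArgs()) {
            if (!d.arguments().containsKey(arg)) {
                errors.add("missing argument: " + arg);
            }
        }
        for (String arg : d.arguments().keySet()) {
            if (!schema.requiredArgs().contains(arg) && !schema.optionalArgs().contains(arg)) {
                errors.add("unexpected argument: " + arg);
            }
        }
        return errors;
    }
}
```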

3. Key Design Decisions

Technology stack

  • Spring Boot 3.x / Java 17: stable operational model, mature observability and security integrations, straightforward packaging for Docker Compose.
  • PostgreSQL: required for durable run state, idempotency ledger, leases, and replayable history; transactional consistency is central to correctness.
  • Spring Scheduling: lightweight dispatcher mechanism for a single-node Compose deployment; avoids introducing a message broker while still enabling asynchronous progress.
  • OpenTelemetry: vendor-neutral tracing, consistent correlation across HTTP, DB, LLM, and tool calls.

Data storage model

  • The system uses a run/step/event model:

    • A run is the durable container (workflow instance).
    • A step is the unit of work (LLM plan, tool execution, approval wait, completion).
    • An event log (optional but included) captures append-only transitions for audit and replay.
  • This model supports:

    • Restart-safe continuation.
    • Precise retry semantics at the step boundary.
    • Deterministic replay and debugging using persisted tool inputs/outputs.
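The run/step/event model supports replay because step state can be recomputed from the append-only log. A minimal in-memory sketch, assuming events mirror the `run_event` columns listed in the Data Model section and that the latest event per step determines its state; `RunEventLog` and the event type strings are hypothetical.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical event mirroring run_event(id, run_id, step_id, event_type, payload_json).
record RunEvent(long id, String runId, String stepId, String eventType, String payloadJson) {}

final class RunEventLog {
    private final List<RunEvent> events = new ArrayList<>();
    private long nextId = 1;

    // Append-only: no update or delete operations are exposed.
    RunEvent append(String runId, String stepId, String eventType, String payloadJson) {
        RunEvent e = new RunEvent(nextId++, runId, stepId, eventType, payloadJson);
        events.add(e);
        return e;
    }

    // Replays one run's events in id order; the last event seen for a step wins.
    Map<String, String> replayStepStates(String runId) {
        Map<String, String> state = new LinkedHashMap<>();
        for (RunEvent e : events) {
            if (e.runId().equals(runId)) {
                state.put(e.stepId(), e.eventType());
            }
        }
        return state;
    }
}
```

Replaying twice over the same log yields the same map, which is the property a deterministic replay test would assert.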

Synchrony vs asynchrony

  • Run creation is synchronous (returns run id immediately).

  • Execution is asynchronous via the scheduler-driven dispatcher:

    • Prevents client timeouts from forcing duplicate work.
    • Allows durable waiting (approvals, backoffs) without tying up threads.
  • Certain endpoints optionally support “wait-for” semantics (polling) but do not drive execution in-process on the request thread by default.

Error handling and retries

  • Tool failures are categorized:

    • Transient (timeouts, 5xx): retry with bounded exponential backoff and jitter.
    • Permanent (4xx, validation, policy): mark step failed without retry.
  • Retries are step-local, with columns tracking attempt count, last error, and next eligible execution time.

  • The dispatcher uses leases to avoid double execution if multiple instances are ever introduced.
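The retry rules above (transient vs permanent, bounded exponential backoff with jitter, a persisted next eligible time) fit in one small class. This sketch assumes a "full jitter" strategy, where the delay is drawn uniformly between zero and the exponential ceiling; the packaged code may use a different formula, and `RetryPolicy` and `FailureKind` are hypothetical names.

```java
import java.time.Duration;
import java.time.Instant;
import java.util.Random;

// Hypothetical failure classification.
enum FailureKind { TRANSIENT, PERMANENT }

final class RetryPolicy {
    private final int maxAttempts;
    private final Duration baseDelay;
    private final Duration maxDelay;
    private final Random random;

    RetryPolicy(int maxAttempts, Duration baseDelay, Duration maxDelay, Random random) {
        this.maxAttempts = maxAttempts;
        this.baseDelay = baseDelay;
        this.maxDelay = maxDelay;
        this.random = random;
    }

    // Timeouts and 5xx are transient; 4xx (validation, policy) are permanent.
    static FailureKind classify(boolean timedOut, int httpStatus) {
        return (timedOut || httpStatus >= 500) ? FailureKind.TRANSIENT : FailureKind.PERMANENT;
    }

    // Value for next_attempt_at, or null when the step should be marked FAILED.
    Instant nextAttemptAt(int attemptJustFailed, FailureKind kind, Instant now) {
        if (kind == FailureKind.PERMANENT || attemptJustFailed >= maxAttempts) {
            return null;
        }
        // Exponential ceiling base * 2^(attempt - 1), capped at maxDelay.
        int exponent = Math.max(0, Math.min(attemptJustFailed - 1, 30));
        long ceiling = Math.min(maxDelay.toMillis(), baseDelay.toMillis() * (1L << exponent));
        // Full jitter: uniform delay in [0, ceiling] milliseconds.
        long delay = (long) (random.nextDouble() * (ceiling + 1));
        return now.plusMillis(delay);
    }
}
```

Injecting `Random` keeps tests deterministic, and because the result is stored in `next_attempt_at`, a restart does not lose the backoff schedule.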

Idempotency strategy

  • Idempotency is enforced at two levels:

    1. Run creation idempotency (client-supplied idempotency key): prevents duplicate runs on client retry.
    2. Tool execution idempotency (derived key): prevents duplicate side effects for the same run/step/tool input.
  • The idempotency ledger stores:

    • Key, scope (tenant), tool name, request hash, status, response snapshot, timestamps.
  • Tool calls must check and persist idempotency results in the same transaction that advances step state (or with a strict ordering guaranteeing at-most-once semantics per key).
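A derived tool idempotency key can be computed by hashing the tenant, run, step, tool name, and arguments. The sketch below uses SHA-256 over a sorted rendering of string arguments plus an in-memory ledger; both are assumptions for illustration, since a durable version would hash canonical JSON and rely on the unique constraint on `tool_execution(idempotency_key)` inside the step's transaction. `ToolIdempotency` is a hypothetical name.

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.HexFormat;
import java.util.Map;
import java.util.TreeMap;
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.Supplier;

final class ToolIdempotency {
    // key -> response snapshot (stands in for the idempotency ledger table)
    private final Map<String, String> ledger = new ConcurrentHashMap<>();

    // Same tenant/run/step/tool/arguments always yields the same 64-char hex key.
    static String deriveKey(String tenantId, String runId, int stepSeq, String toolName,
                            Map<String, String> args) {
        // TreeMap sorts arguments so that map iteration order cannot change the hash.
        String canonical = tenantId + "|" + runId + "|" + stepSeq + "|" + toolName + "|"
            + new TreeMap<>(args);
        try {
            byte[] digest = MessageDigest.getInstance("SHA-256")
                .digest(canonical.getBytes(StandardCharsets.UTF_8));
            return HexFormat.of().formatHex(digest);
        } catch (NoSuchAlgorithmException e) {
            throw new IllegalStateException("SHA-256 unavailable", e);
        }
    }

    // Runs the side effect at most once per key; repeats return the stored snapshot.
    String executeOnce(String key, Supplier<String> sideEffect) {
        return ledger.computeIfAbsent(key, k -> sideEffect.get());
    }
}
```

`ConcurrentHashMap.computeIfAbsent` applies the function at most once per absent key and records nothing if it throws, so a failed call can be retried. Other threads updating the map may block while the side effect runs, which is acceptable only in a sketch.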

4. Data Model

Core tables and purpose:

  • workflow_run

    • Purpose: top-level workflow instance and lifecycle.
    • Key columns: id (uuid), tenant_id, status, created_at, updated_at, input_json, max_steps, deadline_at, idempotency_key.
  • workflow_step

    • Purpose: durable unit of execution inside a run.
    • Key columns: id, run_id, seq, type (PLAN|TOOL|APPROVAL|FINAL), status, attempt, next_attempt_at, lease_owner, lease_expires_at, request_json, result_json, error_code, error_detail, started_at, finished_at.
  • tool_execution

    • Purpose: canonical record of tool calls (inputs/outputs) for replay and audit.
    • Key columns: id, run_id, step_id, tool_name, idempotency_key, request_hash, request_json, response_json, status, duration_ms, created_at.
  • idempotency_record

    • Purpose: global ledger to dedupe side effects (run creation and tool calls).
    • Key columns: idempotency_key (pk), tenant_id, scope (RUN_CREATE|TOOL_CALL), status, request_hash, response_json, created_at, updated_at.
  • approval_checkpoint

    • Purpose: human-in-the-loop durable wait state.
    • Key columns: id, run_id, step_id, status (PENDING|APPROVED|REJECTED), requested_by, decided_by, reason, created_at, decided_at.
  • run_event (append-only)

    • Purpose: replayable history and audit trail of transitions.
    • Key columns: id, run_id, step_id, event_type, payload_json, created_at.
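The approval checkpoint is a durable wait state whose decision must be recorded exactly once. Below is a minimal in-memory sketch, assuming the `approval_checkpoint` columns above; `ApprovalCheckpoint.decide` is hypothetical, and `synchronized` stands in for what the database would enforce with a conditional update.

```java
import java.time.Instant;
import java.util.Objects;

enum ApprovalStatus { PENDING, APPROVED, REJECTED }

// Hypothetical in-memory mirror of one approval_checkpoint row.
final class ApprovalCheckpoint {
    private final String runId;
    private final String stepId;
    private ApprovalStatus status = ApprovalStatus.PENDING;
    private String decidedBy;
    private String reason;
    private Instant decidedAt;

    ApprovalCheckpoint(String runId, String stepId) {
        this.runId = Objects.requireNonNull(runId);
        this.stepId = Objects.requireNonNull(stepId);
    }

    // Records one decision; a second approve/reject is refused.
    synchronized void decide(boolean approve, String approver, String reason, Instant now) {
        if (status != ApprovalStatus.PENDING) {
            throw new IllegalStateException("Checkpoint already " + status);
        }
        this.status = approve ? ApprovalStatus.APPROVED : ApprovalStatus.REJECTED;
        this.decidedBy = Objects.requireNonNull(approver);
        this.reason = reason;
        this.decidedAt = now;
    }

    String runId() { return runId; }
    String stepId() { return stepId; }
    synchronized ApprovalStatus status() { return status; }
    synchronized String decidedBy() { return decidedBy; }
    synchronized String reason() { return reason; }
    synchronized Instant decidedAt() { return decidedAt; }
}
```

A persisted version would typically run `UPDATE approval_checkpoint SET status = ?, decided_by = ?, decided_at = ? WHERE id = ? AND status = 'PENDING'`, treat an update count of zero as "already decided", and re-queue the step in the same transaction.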

Indexing strategy:

  • workflow_run(tenant_id, created_at desc) for listing runs per tenant.

  • workflow_run(idempotency_key, tenant_id) unique constraint to dedupe run creation.

  • workflow_step(run_id, seq) unique to preserve deterministic ordering.

  • Dispatcher hot path indexes:

    • workflow_step(status, next_attempt_at) partial index for status in (QUEUED, RETRY_PENDING) ordered by time.
    • workflow_step(lease_expires_at) to reclaim stuck leases.
  • Idempotency:

    • idempotency_record(tenant_id, scope, idempotency_key) unique / primary key (depending on chosen schema).
    • tool_execution(idempotency_key) unique for tool-level at-most-once.
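The dispatcher hot path combines the two index-backed predicates above: runnable steps that are due, and RUNNING steps whose lease has expired. The in-memory sketch below assumes string statuses and a single `synchronized` claim method standing in for database row locking; in PostgreSQL, one common way to get the same effect across instances is `SELECT ... FOR UPDATE SKIP LOCKED` followed by an update of `lease_owner` and `lease_expires_at`. `StepRow` and `LeaseDispatcher` are hypothetical.

```java
import java.time.Duration;
import java.time.Instant;
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.Optional;

// Hypothetical mutable mirror of the workflow_step columns the dispatcher reads.
final class StepRow {
    final String id;
    String status; // QUEUED, RETRY_PENDING, RUNNING, ...
    Instant nextAttemptAt;
    String leaseOwner;
    Instant leaseExpiresAt;

    StepRow(String id, String status, Instant nextAttemptAt) {
        this.id = id;
        this.status = status;
        this.nextAttemptAt = nextAttemptAt;
    }
}

final class LeaseDispatcher {
    private final List<StepRow> steps = new ArrayList<>();
    private final Duration leaseDuration;

    LeaseDispatcher(Duration leaseDuration) {
        this.leaseDuration = leaseDuration;
    }

    synchronized void add(StepRow row) {
        steps.add(row);
    }

    // Due runnable steps, or RUNNING steps whose lease expired (e.g. after a crash).
    private static boolean eligible(StepRow s, Instant now) {
        boolean due = (s.status.equals("QUEUED") || s.status.equals("RETRY_PENDING"))
            && !s.nextAttemptAt.isAfter(now);
        boolean stuck = s.status.equals("RUNNING") && s.leaseExpiresAt != null
            && s.leaseExpiresAt.isBefore(now);
        return due || stuck;
    }

    // Claims the earliest eligible step and stamps a fresh lease on it.
    synchronized Optional<StepRow> claim(String workerId, Instant now) {
        Optional<StepRow> next = steps.stream()
            .filter(s -> eligible(s, now))
            .min(Comparator.comparing((StepRow s) -> s.nextAttemptAt));
        next.ifPresent(s -> {
            s.status = "RUNNING";
            s.leaseOwner = workerId;
            s.leaseExpiresAt = now.plus(leaseDuration);
        });
        return next;
    }
}
```

A worker that finishes after its lease was reclaimed should not overwrite the new owner's progress, so completion updates are usually guarded with `WHERE lease_owner = ?`.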

5. API Surface

  • POST /api/runs – Create a new workflow run; returns run id (ROLE_USER)

    • Supports Idempotency-Key header for run creation dedupe.
  • GET /api/runs/{id} – Get current run state summary (ROLE_USER, tenant-scoped)

  • GET /api/runs/{id}/history – Get step-by-step history including tool inputs/outputs (ROLE_USER, tenant-scoped; redaction applied)

  • POST /api/runs/{id}/approve – Approve a pending checkpoint and resume execution (ROLE_APPROVER)

  • POST /api/runs/{id}/reject – Reject a pending checkpoint and fail/abort run (ROLE_APPROVER)

  • POST /internal/dispatcher/tick – Trigger a dispatcher tick (ROLE_ADMIN; used for deterministic testing)

  • GET /actuator/health – Health check (public or ROLE_ADMIN depending on deployment)

  • GET /admin/runs – List runs across tenants (ROLE_ADMIN)

  • GET /admin/runs/{id} – Admin view including raw events and lease state (ROLE_ADMIN)
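Run-creation idempotency for POST /api/runs, keyed by tenant and `Idempotency-Key`, can be sketched as an atomic insert-if-absent. This assumes one reasonable reading of "conflict-safe response" in the verification steps: reusing a key with a different body is rejected rather than silently returning the original run, and the HTTP status it maps to (for example 409 or 422) is a deployment choice. `RunCreationService` and `CreateRunResult` are hypothetical.

```java
import java.util.Map;
import java.util.UUID;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical result: the run id plus whether this response is a replay.
record CreateRunResult(UUID runId, boolean replayed) {}

final class RunCreationService {
    // Mirrors the unique constraint on (tenant_id, idempotency_key).
    private record Key(String tenantId, String idempotencyKey) {}
    private record Entry(String requestBody, UUID runId) {}

    private final Map<Key, Entry> ledger = new ConcurrentHashMap<>();

    CreateRunResult create(String tenantId, String idempotencyKey, String requestBody) {
        Entry fresh = new Entry(requestBody, UUID.randomUUID());
        // putIfAbsent is atomic, like an INSERT guarded by the unique constraint.
        Entry existing = ledger.putIfAbsent(new Key(tenantId, idempotencyKey), fresh);
        if (existing == null) {
            return new CreateRunResult(fresh.runId(), false);
        }
        if (!existing.requestBody().equals(requestBody)) {
            // Same key, different payload: refuse instead of returning the old run.
            throw new IllegalStateException("Idempotency-Key reused with a different request body");
        }
        return new CreateRunResult(existing.runId(), true);
    }
}
```

Comparing raw bodies treats whitespace-only differences as conflicts; storing a `request_hash` of canonicalized JSON, as the `idempotency_record` table suggests, avoids that.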

6. Security Model

Authentication

  • Spring Security with stateless authentication (e.g., JWT bearer tokens) for API endpoints.
  • For local Compose, a dev profile can enable a fixed test token issuer or basic auth for simplicity while keeping the security model intact.

Authorization (roles)

  • ROLE_USER: create runs, view own tenant runs, read history (with redaction).
  • ROLE_APPROVER: approve/reject checkpoints for the tenant.
  • ROLE_ADMIN: cross-tenant admin endpoints, dispatcher tick, operational views.

Paid access enforcement (if applicable)

  • Enforced at the API layer via:

    • Tenant subscription status stored in DB or resolved via a billing adapter.
    • A request filter that blocks run creation and tool execution when subscription is inactive.
  • The enforcement point must be before run creation and before tool execution (to prevent side effects).

CSRF considerations

  • APIs are stateless and designed for token-based auth; CSRF is disabled for non-browser clients.
  • If browser-based sessions are enabled, restrict cookie-based auth to admin UI endpoints and enable CSRF there only.

Data isolation guarantees

  • Every run is tagged with tenant_id.
  • All queries include tenant predicates except ROLE_ADMIN endpoints.
  • History endpoints apply output redaction policies for tool inputs/outputs (secrets, tokens, PII markers) before returning to ROLE_USER.
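Key-based redaction of history payloads can be expressed as a recursive copy over JSON-like maps and lists. The sketch assumes payloads are already parsed into `Map`/`List` values and that redaction is driven by a configured set of sensitive key names, matching the policy-based approach described under Known Limitations; `HistoryRedactor` and the `[REDACTED]` marker are hypothetical.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.stream.Collectors;

final class HistoryRedactor {
    private final Set<String> sensitiveKeys; // stored lowercase, matched case-insensitively

    HistoryRedactor(Set<String> sensitiveKeys) {
        this.sensitiveKeys = sensitiveKeys.stream()
            .map(String::toLowerCase)
            .collect(Collectors.toUnmodifiableSet());
    }

    // Returns a redacted deep copy; nested maps and lists are walked recursively.
    Object redact(Object value) {
        if (value instanceof Map<?, ?> map) {
            Map<String, Object> out = new LinkedHashMap<>();
            for (Map.Entry<?, ?> e : map.entrySet()) {
                String key = String.valueOf(e.getKey());
                out.put(key, sensitiveKeys.contains(key.toLowerCase())
                    ? "[REDACTED]"
                    : redact(e.getValue()));
            }
            return out;
        }
        if (value instanceof List<?> list) {
            return list.stream().map(this::redact).toList();
        }
        return value; // scalars (and null) pass through unchanged
    }
}
```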

7. Operational Behavior

Startup behavior

  • On startup, the service:

    • Runs DB migrations (Flyway/Liquibase).
    • Initializes tool registry and validates allowed tool schemas.
    • Starts the dispatcher scheduler (unless disabled via profile).
    • Emits a startup log line including version, active profiles, DB connectivity, and OTel exporter mode.

Failure modes

  • DB unavailable: fail fast on startup; the health endpoint reports DOWN; no runs are executed.
  • LLM provider unavailable: PLAN steps fail transiently and retry until max attempts; run transitions to FAILED when exhausted.
  • Tool failure: step transitions to RETRY_PENDING (transient) or FAILED (permanent).
  • Process restart during execution: leases expire; dispatcher reclaims and resumes from the last durable state.

Retry and timeout behavior

  • Planner (LLM) timeout is bounded; failures are retried with backoff up to planner.maxAttempts.
  • Tool calls have per-tool timeouts and max attempts.
  • Backoff is stored in next_attempt_at to ensure restart-safe scheduling.
  • A run-level deadline_at prevents indefinite execution; once exceeded, remaining steps fail with TIMEOUT.
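Because `deadline_at` and `max_steps` are persisted on `workflow_run`, the dispatcher can re-check both limits before every step, including after a restart. A minimal sketch, assuming the check runs before dispatch and that reaching the deadline maps to the TIMEOUT outcome named above; `RunLimits` and `StopReason` are hypothetical.

```java
import java.time.Instant;
import java.util.Optional;

// Hypothetical reasons a run must stop before its next step.
enum StopReason { TIMEOUT, MAX_STEPS }

record RunLimits(Instant deadlineAt, int maxSteps) {
    // Evaluated before each dispatch; both inputs come from persisted run columns.
    Optional<StopReason> check(Instant now, int stepsExecuted) {
        if (!now.isBefore(deadlineAt)) {
            return Optional.of(StopReason.TIMEOUT);
        }
        if (stepsExecuted >= maxSteps) {
            return Optional.of(StopReason.MAX_STEPS);
        }
        return Optional.empty();
    }

    static RunLimits forNewRun(Instant createdAt, long deadlineSeconds, int maxSteps) {
        return new RunLimits(createdAt.plusSeconds(deadlineSeconds), maxSteps);
    }
}
```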

Observability hooks

  • Structured logs:

    • run_id, step_id, tenant_id, tool_name, attempt, lease_owner, idempotency_key.
  • OpenTelemetry traces:

    • HTTP span for inbound requests.
    • Run execution span (per step) linked via trace/span attributes.
    • Nested spans for LLM calls and tool calls with result status and latency.
  • Metrics (via Micrometer + OTel bridge if desired):

    • Runs created/completed/failed.
    • Step retries, tool error rates, dispatcher lag.
    • Idempotency hits vs misses.

8. Local Execution

Prerequisites

  • Docker Desktop (or Docker Engine) with Compose v2
  • JDK 17 (for running tests locally; container build uses JDK image)
  • Available ports: 8080 (app), 5432 (postgres), 4317 (optional OTLP)

Environment variables

  • SPRING_PROFILES_ACTIVE=local
  • DB_URL=jdbc:postgresql://localhost:5432/workflows (when the app itself runs inside Docker Compose, replace localhost with the database service name)
  • DB_USER=workflows
  • DB_PASS=workflows
  • LLM_BASE_URL=http://mock-llm:8081 (or your provider)
  • LLM_API_KEY=... (if using real provider)
  • OTEL_EXPORTER_OTLP_ENDPOINT=http://otel-collector:4317 (optional)
  • WORKFLOWS_MAX_STEPS=25
  • WORKFLOWS_DEFAULT_DEADLINE_SECONDS=120
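The two workflow limits above can be parsed with validation and fallbacks. In a Spring Boot application they would more typically be referenced from `application.yml` through placeholders such as `${WORKFLOWS_MAX_STEPS:25}`; the plain-Java sketch below only shows the intended parsing rules, assuming the listed values (25 steps, 120 seconds) as defaults. `WorkflowSettings` is a hypothetical name.

```java
import java.util.Map;

// Hypothetical holder for the workflow limits read from the environment.
record WorkflowSettings(int maxSteps, long defaultDeadlineSeconds) {
    static WorkflowSettings fromEnv(Map<String, String> env) {
        int maxSteps = parsePositive(env, "WORKFLOWS_MAX_STEPS", 25);
        long deadline = parsePositive(env, "WORKFLOWS_DEFAULT_DEADLINE_SECONDS", 120);
        return new WorkflowSettings(maxSteps, deadline);
    }

    // Missing or blank -> fallback; non-numeric or non-positive -> fail fast at startup.
    private static int parsePositive(Map<String, String> env, String name, int fallback) {
        String raw = env.get(name);
        if (raw == null || raw.isBlank()) {
            return fallback;
        }
        int value = Integer.parseInt(raw.trim());
        if (value <= 0) {
            throw new IllegalArgumentException(name + " must be > 0, got " + value);
        }
        return value;
    }
}
```

`WorkflowSettings.fromEnv(System.getenv())` would read the real process environment; accepting a `Map` keeps the parsing testable.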

Docker Compose usage

docker compose up -d --build

Verification steps

  1. Health:
curl -s http://localhost:8080/actuator/health
  2. Create a run (with idempotency):
curl -s -X POST http://localhost:8080/api/runs \
  -H "Authorization: Bearer <TOKEN>" \
  -H "Idempotency-Key: run-123" \
  -H "Content-Type: application/json" \
  -d '{
    "tenantId": "t-001",
    "input": { "goal": "Create a ticket via tool, requires approval" }
  }'
  3. Poll status:
curl -s http://localhost:8080/api/runs/<RUN_ID> \
  -H "Authorization: Bearer <TOKEN>"
  4. If the run is waiting for approval:
curl -s -X POST http://localhost:8080/api/runs/<RUN_ID>/approve \
  -H "Authorization: Bearer <APPROVER_TOKEN>" \
  -H "Content-Type: application/json" \
  -d '{ "reason": "Approved for demo" }'
  5. Confirm history is replayable:
curl -s http://localhost:8080/api/runs/<RUN_ID>/history \
  -H "Authorization: Bearer <TOKEN>"
  6. Idempotency validation (repeat the create request with the same key; it should return the same run id or a conflict-safe response):
curl -i -X POST http://localhost:8080/api/runs \
  -H "Authorization: Bearer <TOKEN>" \
  -H "Idempotency-Key: run-123" \
  -H "Content-Type: application/json" \
  -d '{ "tenantId": "t-001", "input": { "goal": "Create a ticket via tool, requires approval" } }'

9. Evidence Pack

Checklist of included evidence artifacts proving execution and correctness:

  • Service startup logs showing DB migration completion and dispatcher start

  • Successful POST /api/runs invocation logs including returned run id and Idempotency-Key handling

  • Database records after run creation:

    • workflow_run row
    • initial workflow_step rows
  • Replayable run history output from GET /api/runs/{id}/history demonstrating persisted step transitions

  • Idempotency proof:

    • repeated POST /api/runs with same idempotency key returning same run reference
    • idempotency_record row showing stored response snapshot
  • Retry behavior demonstration:

    • forced transient tool failure logs showing attempts incrementing and next_attempt_at scheduling
    • run eventually completes after retry or fails after max attempts
  • Human-in-the-loop checkpoint proof:

    • run enters WAITING_FOR_APPROVAL
    • approval action transitions run back to runnable state and continues execution
    • approval_checkpoint row showing approver and timestamp
  • OpenTelemetry trace export proof:

    • trace showing correlated spans: inbound request → run execution → planner call → tool call → retry span (if triggered)
  • Test evidence:

    • idempotency unit/integration test output
    • deterministic replay test verifying identical step history from persisted events

10. Known Limitations

  • Single-node dispatcher design by default; horizontal scaling requires distributed locking/leases across multiple instances and careful tuning of lease durations and concurrency.
  • Tool result redaction is policy-based and requires explicit configuration per tool schema; it does not automatically detect all sensitive data.
  • This solution does not provide a full UI for approvals; it exposes API endpoints and an optional minimal admin view only.
  • Exactly-once semantics are scoped to idempotency keys; if external tools are non-idempotent and keys are not enforced at the boundary, side effects can still duplicate.
  • No built-in long-term artifact storage for large tool outputs; payloads are stored as JSON and should be capped or externalized for large binaries.

11. Extension Points

  • Replace the scheduler-based dispatcher with:

    • a queue-driven model (Kafka/RabbitMQ) for higher throughput, while preserving the same run/step persistence and idempotency ledger.
  • Add a “tool gateway” service:

    • isolate high-risk tools behind an internal API with separate authorization and auditing.
  • Introduce multi-tenant scaling:

    • per-tenant concurrency limits, per-tenant rate limiting for LLM calls, and quota enforcement integrated with subscription status.
  • Add stronger determinism and replay:

    • treat planner outputs as immutable events and support “replay with fixed plan” without re-calling the LLM.
  • Production hardening changes:

    • run/step partitioning and retention policies
    • outbox forwarding to an external audit system
    • dedicated tracing backend and sampling strategy
    • secret management (Vault/KMS) and per-tool credential isolation

Changelog

  • 1.1.0

Code downloads (2 ZIP bundles):

  • spring-boot-agentic-workflows-idempotency-durable-runs_v1.1.zip
  • spring-boot-agentic-workflows-idempotency-durable-runs.zip