1. Overview

This solution implements an explainable recommendation service that combines vector similarity (embeddings) with deterministic business rules to produce stable, production-suitable rankings. It addresses the common gap between “embedding-only” recommendations (high recall but hard to control) and “rules-only” personalization (predictable but low relevance and hard to generalize).

Pure vector similarity approaches often fail in production because they are difficult to constrain: they can surface disallowed content, violate inventory or eligibility constraints, and produce unstable ordering when similarity scores are close. They also tend to be non-explainable to stakeholders beyond “the model said so,” which complicates debugging, auditing, and governance. Pure business-rule ranking, conversely, becomes brittle as catalogs grow and user intent becomes multi-dimensional.

This implementation is production-ready in the sense that it:

Separates eligibility filtering (hard rules) from ranking (soft scoring) and produces a structured explanation of the final ordering.
Uses PostgreSQL + pgvector for low-latency similarity search with predictable operational characteristics and transactional data consistency.
Implements timeouts, retries (where safe), and explicit failure behavior for the embedding provider and the database.
Produces traceable ranking breakdowns (score components, applied rules, and tie-breakers) for debugging and auditability.
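
The separation between hard eligibility rules and soft scoring can be illustrated with a minimal sketch. All type names, rule labels, and weights below are hypothetical and chosen for illustration only; they are not taken from the service's code.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

// Hypothetical types for illustration; none of these names come from the service itself.
record Item(String itemId, String region, boolean inStock, double similarity, double businessBoost) {}

record RankedItem(String itemId, double finalScore, Map<String, Double> components) {}

record Explained(List<RankedItem> ranking, List<String> rejected) {}

final class FilterThenRank {
    // Assumed weights; a real deployment would likely load these from a versioned rule set.
    static final double SIMILARITY_WEIGHT = 0.8;
    static final double BOOST_WEIGHT = 0.2;

    static Explained recommend(List<Item> candidates, String region) {
        List<RankedItem> ranked = new ArrayList<>();
        List<String> rejected = new ArrayList<>();
        for (Item item : candidates) {
            // Hard rules: an ineligible item is dropped and never scored.
            if (!item.region().equals(region)) {
                rejected.add(item.itemId() + ":REGION_MISMATCH");
                continue;
            }
            if (!item.inStock()) {
                rejected.add(item.itemId() + ":OUT_OF_STOCK");
                continue;
            }
            // Soft scoring: a weighted sum whose components are kept for the explanation.
            double sim = SIMILARITY_WEIGHT * item.similarity();
            double boost = BOOST_WEIGHT * item.businessBoost();
            ranked.add(new RankedItem(item.itemId(), sim + boost,
                    Map.of("similarity", sim, "businessBoost", boost)));
        }
        // Score descending, then item id ascending as a stable tie-breaker.
        ranked.sort((a, b) -> {
            int byScore = Double.compare(b.finalScore(), a.finalScore());
            return byScore != 0 ? byScore : a.itemId().compareTo(b.itemId());
        });
        return new Explained(ranked, rejected);
    }
}
```

Because rejected items carry the name of the rule that excluded them, the explanation can answer both "why is this item ranked here" and "why is that item missing".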

2. Architecture

Request flow and components:

Client → Recommendation API (Spring Boot REST)
Recommendation API → AuthN/AuthZ layer (Spring Security)
Recommendation API → Eligibility Filter (business rules engine/module)
Recommendation API → Embedding Client (OpenAI-compatible embedding API)
Recommendation API → PostgreSQL (pgvector) for similarity search and metadata lookup
Recommendation API → Ranker (score aggregation + deterministic tie-breakers)
Recommendation API → Audit/Explain Log (structured logs; optional DB persistence)

External dependencies:

PostgreSQL with pgvector extension enabled (vector index + transactional catalog state).
Embedding API (OpenAI-compatible) for generating embeddings for query/user/context vectors.
Docker Compose for local single-node deployment.

Trust boundaries:

Untrusted: client inputs (user identifiers, query text, context signals).
Semi-trusted: embedding provider responses (validated for shape and length; guarded by timeouts).
Trusted: internal service components and database on the private Docker network.
Boundary enforcement: authentication at the API; authorization at endpoint and rule scope; least-privilege DB credentials.
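
Treating the embedding provider as semi-trusted means checking each returned vector before it reaches the database. A minimal sketch of such a check, assuming the response has already been parsed into a list of numbers and that the expected dimension is known from configuration (the class and method names are hypothetical):

```java
import java.util.List;

final class EmbeddingValidator {
    /**
     * Converts a parsed provider response into a float[] or throws if the shape is wrong.
     * expectedDimension is assumed to match the d in the VECTOR(d) column.
     */
    static float[] validate(List<? extends Number> values, int expectedDimension) {
        if (values == null || values.size() != expectedDimension) {
            throw new IllegalArgumentException("expected " + expectedDimension
                    + " dimensions, got " + (values == null ? "null" : values.size()));
        }
        float[] out = new float[expectedDimension];
        for (int i = 0; i < expectedDimension; i++) {
            Number n = values.get(i);
            if (n == null) {
                throw new IllegalArgumentException("null component at index " + i);
            }
            float f = n.floatValue();
            // NaN or infinite components would corrupt distance calculations.
            if (!Float.isFinite(f)) {
                throw new IllegalArgumentException("non-finite component at index " + i);
            }
            out[i] = f;
        }
        return out;
    }
}
```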

3. Key Design Decisions

Technology stack

Java 17 + Spring Boot 3.x: standard enterprise runtime, predictable performance, mature ecosystem for security, configuration, and observability.
PostgreSQL + pgvector: avoids introducing a separate vector database for the single-node deployment model. Keeps item metadata and embeddings in one transactional store, simplifying consistency and operational overhead.
Docker Compose: aligns with single-node reproducible execution and platform packaging.

Data storage model

Store item embeddings alongside item metadata and eligibility fields in PostgreSQL.
Store “explainability outputs” (ranking breakdowns) in structured logs by default; optionally persist to a table for audit/compliance use-cases (configurable).
Use immutable recommendation result identifiers to correlate requests, logs, and persisted explanations.

Synchrony vs asynchrony

The primary recommendation request is synchronous to meet typical online serving requirements (low latency, immediate response).
Asynchronous paths are optional and not required for a local run: item embedding precomputation and periodic refresh jobs are accommodated by the design but are not part of the baseline. For the baseline runnable system, item embeddings are assumed to be preloaded or ingested via an admin endpoint.

Error handling and retries

Embedding calls use strict timeouts and bounded retries with jittered backoff. Retries apply only to transient failures (network timeouts, 5xx responses); non-retryable failures (4xx responses, invalid payloads) fail fast.
Database queries rely on connection pool timeouts and fail with explicit error mapping; no blind query retries that could amplify load.
If embedding generation fails, the service returns a controlled error response (e.g., 503) or a configured fallback behavior (e.g., “rules-only recommendations”) depending on runtime configuration.
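
The retry policy for embedding calls can be sketched as a small helper. This is an illustration under assumptions, not the service's actual client: ProviderException is a hypothetical exception type carrying the HTTP status, and the backoff uses "full jitter" (a random sleep up to an exponentially growing cap) as one reasonable choice.

```java
import java.io.IOException;
import java.util.concurrent.Callable;
import java.util.concurrent.ThreadLocalRandom;

final class BoundedRetry {
    /** Hypothetical exception carrying the provider's HTTP status. */
    static final class ProviderException extends Exception {
        final int status;

        ProviderException(int status) {
            super("provider returned HTTP " + status);
            this.status = status;
        }
    }

    /** Transient: 5xx responses and I/O failures (socket and HTTP timeouts are IOExceptions). */
    static boolean isTransient(Exception e) {
        if (e instanceof ProviderException pe) {
            return pe.status >= 500;
        }
        return e instanceof IOException;
    }

    static <T> T call(Callable<T> action, int maxAttempts, long baseDelayMillis) throws Exception {
        if (maxAttempts < 1 || baseDelayMillis < 0) {
            throw new IllegalArgumentException("maxAttempts must be >= 1 and baseDelayMillis >= 0");
        }
        for (int attempt = 1; ; attempt++) {
            try {
                return action.call();
            } catch (Exception e) {
                // Fail fast on non-retryable errors or when the attempt budget is spent.
                if (!isTransient(e) || attempt >= maxAttempts) {
                    throw e;
                }
                // Full jitter: sleep a random time between 0 and base * 2^(attempt - 1).
                long cap = baseDelayMillis << (attempt - 1);
                Thread.sleep(ThreadLocalRandom.current().nextLong(cap + 1));
            }
        }
    }
}
```

A call such as BoundedRetry.call(() -> client.embed(text), 3, 100) would make at most three attempts; client.embed is a placeholder for whatever method performs the HTTP request.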

Idempotency strategy

For read-only recommendation endpoints, idempotency is inherent.
For ingestion endpoints (e.g., upserting items/embeddings), use an idempotency key or deterministic item identifiers. Upserts are executed as single-statement transactional operations keyed by item_id to prevent duplicates.
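
A single-statement upsert keyed by item_id might look like the following JDBC sketch. The column names follow the items table described in the Data Model section; the class name, method signature, and parameter types are assumptions for illustration.

```java
import java.math.BigDecimal;
import java.sql.Connection;
import java.sql.PreparedStatement;
import java.sql.SQLException;

final class ItemUpsert {
    // One statement, so the insert-or-update is atomic and safe to repeat.
    static final String UPSERT_SQL = """
            INSERT INTO items
                (item_id, status, category, region, is_active, inventory_state, price, created_at, updated_at)
            VALUES (?, ?, ?, ?, ?, ?, ?, now(), now())
            ON CONFLICT (item_id) DO UPDATE SET
                status = EXCLUDED.status,
                category = EXCLUDED.category,
                region = EXCLUDED.region,
                is_active = EXCLUDED.is_active,
                inventory_state = EXCLUDED.inventory_state,
                price = EXCLUDED.price,
                updated_at = now()
            """;

    /** Returns the number of affected rows (1 for both the insert and the update path). */
    static int upsert(Connection conn, String itemId, String status, String category, String region,
                      boolean isActive, String inventoryState, BigDecimal price) throws SQLException {
        try (PreparedStatement ps = conn.prepareStatement(UPSERT_SQL)) {
            ps.setString(1, itemId);
            ps.setString(2, status);
            ps.setString(3, category);
            ps.setString(4, region);
            ps.setBoolean(5, isActive);
            ps.setString(6, inventoryState);
            ps.setBigDecimal(7, price);
            return ps.executeUpdate();
        }
    }
}
```

Replaying the same request leaves exactly one row per item_id; created_at is set only on first insert, while updated_at changes on every replay.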

4. Data Model

Core tables (design intent):

items
- Purpose: canonical catalog records and business-rule eligibility fields.
- Key columns: item_id (PK), status, category, region, is_active, inventory_state, price, created_at, updated_at.
- Indexing: B-tree indexes on status, category, region, and composite indexes aligned to common eligibility filters (e.g., (region, is_active)).
item_embeddings
- Purpose: vector representation of items for similarity search.
- Key columns: item_id (PK/FK), embedding VECTOR(d), embedding_model, embedded_at.
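- Indexing: pgvector index on embedding using the chosen operator class (typically vector_cosine_ops for cosine distance or vector_ip_ops for inner product, depending on normalization strategy). Ensure the index operator class matches the distance operator used in queries (<=> for cosine distance, <#> for negative inner product); otherwise the planner will not use the index.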
recommendation_explanations (optional persistence; otherwise logs-only)
- Purpose: durable audit trail and debugging.
- Key columns: request_id (PK), principal_id, input_factors (JSONB), applied_rules (JSONB), candidates (JSONB), final_ranking (JSONB), created_at.
- Indexing: B-tree on created_at, optionally principal_id for support workflows.
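
As an illustration of how the items and item_embeddings tables combine in a query, the sketch below holds a filtered top-K cosine-distance statement and a helper that formats a float array in pgvector's text form. The table and column names come from the tables above; the class name, the specific filter predicates, and the HNSW index mentioned in the comment are assumptions.

```java
final class SimilarityQuery {
    // <=> is pgvector's cosine-distance operator. It is assumed here that the index was created
    // with the matching operator class, for example:
    //   CREATE INDEX ON item_embeddings USING hnsw (embedding vector_cosine_ops);
    // Parameters: 1 = query vector literal, 2 = region, 3 = query vector literal, 4 = topK.
    static final String TOP_K_SQL = """
            SELECT i.item_id, e.embedding <=> CAST(? AS vector) AS distance
            FROM item_embeddings e
            JOIN items i ON i.item_id = e.item_id
            WHERE i.region = ? AND i.is_active
            ORDER BY e.embedding <=> CAST(? AS vector)
            LIMIT ?
            """;

    /** Formats a vector in pgvector's text representation, e.g. [0.25,-1.0,3.5]. */
    static String toVectorLiteral(float[] v) {
        StringBuilder sb = new StringBuilder("[");
        for (int i = 0; i < v.length; i++) {
            if (i > 0) {
                sb.append(',');
            }
            sb.append(v[i]);
        }
        return sb.append(']').toString();
    }
}
```

The ORDER BY clause contains only the distance expression, since that is the form an approximate index scan can serve; business tie-breakers are applied afterwards in application code.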

Indexing strategy

Eligibility filters should reduce candidate sets before vector ranking where possible (especially for large catalogs).
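pgvector's index types (HNSW and IVFFlat) perform approximate nearest-neighbor search; exact search is a sequential scan without a vector index. Choose between the two depending on recall and latency needs; in a single-node baseline, start with an index-backed ANN configuration and a bounded topK candidate retrieval.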
Ensure deterministic tie-breaking by adding stable secondary ordering (e.g., business priority descending, then item_id ascending).
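
The tie-breaking order described above (score, then business priority descending, then item_id ascending) can be expressed as a comparator applied to the retrieved candidates. The Scored record and its field names are hypothetical.

```java
import java.util.Comparator;

// Hypothetical candidate shape; businessPriority is assumed to be an integer where higher wins.
record Scored(String itemId, double score, int businessPriority) {}

final class TieBreak {
    static final Comparator<Scored> ORDER = (a, b) -> {
        // Primary key: final score, highest first.
        int byScore = Double.compare(b.score(), a.score());
        if (byScore != 0) {
            return byScore;
        }
        // First tie-breaker: business priority, highest first.
        int byPriority = Integer.compare(b.businessPriority(), a.businessPriority());
        if (byPriority != 0) {
            return byPriority;
        }
        // Final tie-breaker: item id ascending, so the order is total as long as ids are unique.
        return a.itemId().compareTo(b.itemId());
    };
}
```

Since item_id is unique, no two candidates ever compare as equal, and repeated requests over the same candidates return the same order regardless of how the database returned them.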

5. API Surface

Public (serving):

POST /api/recommendations – Generate ranked recommendations for a user/context payload with explainability in response (ROLE_USER)
GET /api/recommendations/{requestId} – Retrieve a previously generated explainability record if persistence is enabled (ROLE_USER with ownership enforced; ROLE_ADMIN may read any record)

Internal/Admin (catalog + operations):

POST /admin/items:upsert – Upsert item metadata used for eligibility filtering (ROLE_ADMIN)
POST /admin/items/{itemId}/embedding:upsert – Upsert or recompute an item embedding (ROLE_ADMIN)
POST /admin/reindex – Rebuild or verify pgvector indexes (ROLE_ADMIN)
GET /actuator/health – Health status for service and downstream checks (no auth or ROLE_MONITOR, depending on deployment)
GET /actuator/metrics – Metrics endpoint (ROLE_MONITOR)
GET /actuator/prometheus – Prometheus scrape endpoint if enabled (ROLE_MONITOR)

6. Security Model

Authentication

Spring Security with either:
- Local development: HTTP Basic or static API key (recommended for Compose).
- Production alignment: JWT bearer tokens (configured but not required for local run).

Authorization (roles)

ROLE_USER: can request recommendations and read only their own persisted explanations (if enabled).
ROLE_ADMIN: can manage items, embeddings, and maintenance operations.
ROLE_MONITOR: can access operational endpoints (health/metrics) where restricted.

Paid access enforcement (if applicable)

Enforced at the API boundary by mapping the authenticated principal to a “subscription/entitlement” claim (JWT claim or internal lookup).
Requests lacking entitlement are rejected with 403 and are logged with an audit marker.
No “soft gating” in the ranker; gating happens before candidate retrieval to avoid leaking catalog signals.

CSRF considerations

For stateless APIs using bearer tokens, CSRF protection is typically disabled; endpoints are not intended for browser cookie auth.
If enabling cookie-based auth for an admin UI, CSRF must be enabled for state-changing admin endpoints.

Data isolation guarantees

Principal scoping is enforced when accessing persisted explanations: requestId fetch requires ownership check (principal_id == authenticated principal), unless ROLE_ADMIN.
Catalog data is shared by design; if multi-tenant isolation is required, add tenant_id to core tables and apply row-level security or explicit tenant predicates on every query.
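
The ownership rule for persisted explanations reduces to a small predicate. A minimal sketch, assuming role names are available as strings on the authenticated principal (the class and method names are hypothetical):

```java
import java.util.Set;

final class ExplanationAccess {
    /**
     * Returns true when the caller may read an explanation record:
     * admins may read any record, everyone else only their own.
     */
    static boolean canRead(String authenticatedPrincipal, Set<String> roles, String recordPrincipalId) {
        if (roles.contains("ROLE_ADMIN")) {
            return true;
        }
        return authenticatedPrincipal != null && authenticatedPrincipal.equals(recordPrincipalId);
    }
}
```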

7. Operational Behavior

Startup behavior

On startup, the service validates:
- Database connectivity and pgvector extension availability.
- Required schema migrations (via Flyway/Liquibase).
- Embedding provider configuration (presence of base URL/model; does not block startup unless configured as “strict”).
The service exposes readiness only after DB is reachable and migrations are complete.

Failure modes

Embedding provider unavailable: request fails with a controlled 503 (or configured rules-only fallback).
Database unavailable: readiness fails; serving endpoints return 503 with clear error codes.
pgvector index missing/misaligned: startup can fail fast in strict mode, or emit an operational alert and run with degraded performance in permissive mode.
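
The choice between a controlled 503 and the rules-only fallback when the embedding provider is unavailable can be sketched as follows. The mode names, the Outcome shape, and the ordering used in fallback mode (item id ascending) are assumptions for illustration, not the service's actual configuration.

```java
import java.util.List;

final class EmbeddingFailurePolicy {
    // Hypothetical configuration values for the failure behavior.
    enum Mode { FAIL_FAST, RULES_ONLY }

    // degraded marks responses that were produced without similarity scores.
    record Outcome(int httpStatus, List<String> itemIds, boolean degraded) {}

    static Outcome onEmbeddingFailure(Mode mode, List<String> eligibleByRules, int topK) {
        if (mode == Mode.FAIL_FAST) {
            // Controlled error: no partial results are returned.
            return new Outcome(503, List.of(), false);
        }
        // Rules-only fallback: eligible items in a deterministic order, flagged as degraded.
        List<String> items = eligibleByRules.stream().sorted().limit(topK).toList();
        return new Outcome(200, items, true);
    }
}
```

Carrying the degraded flag into the explanation and the structured log keeps fallback responses distinguishable from normal ones during later audits.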

Retry and timeout behavior

Embedding client:
- Connect/read timeouts set explicitly.
- Bounded retries for transient errors only; no retries on validation failures.
Database:
- Connection pool timeouts enforced.
- No automatic query retries in the request path; operators should address underlying DB issues rather than amplify load.

Observability hooks (logs, metrics, traces)

Structured logs include request_id, principal_id, rule_set_version, candidate_count, and top score components.
Metrics include request latency, embedding latency, DB query latency, failure counts by dependency, and cache hit ratios if caching is enabled.
Traces propagate request correlation across the controller, embedding client, and DB query spans (OpenTelemetry-compatible), enabling investigation of tail latency and dependency failures.

8. Local Execution

Prerequisites

Docker Desktop (or Docker Engine) with Docker Compose v2
Java 17 (for running tests locally; containerized run does not require local Java)
An OpenAI-compatible embedding endpoint and API key (can be a local emulator or a remote provider)

Environment variables

Set these in your shell or a .env file used by Docker Compose:

EMBEDDING_BASE_URL – Base URL of the embedding API
EMBEDDING_API_KEY – API key
EMBEDDING_MODEL – Embedding model identifier
DB_HOST=postgres (for container network)
DB_PORT=5432
DB_NAME=recs
DB_USER=recs
DB_PASSWORD=recs
RECS_AUTH_MODE=basic|apikey|jwt (local default: basic)
RECS_ADMIN_USER / RECS_ADMIN_PASS (if basic auth)
RECS_USER_USER / RECS_USER_PASS (if basic auth)
EXPLANATIONS_PERSISTED=true|false (default false; logs-only)

Docker Compose usage

Start dependencies and the service:

docker compose up --build

Verify health:

curl -s http://localhost:8080/actuator/health

Expected: status UP and database readiness indicators.

Seed or upsert a few items (admin-auth protected). Example (basic auth):

curl -u "$RECS_ADMIN_USER:$RECS_ADMIN_PASS" \
  -H "Content-Type: application/json" \
  -X POST http://localhost:8080/admin/items:upsert \
  -d '{
    "itemId":"item-1",
    "status":"ACTIVE",
    "category":"books",
    "region":"UK",
    "isActive":true,
    "inventoryState":"IN_STOCK",
    "price":19.99,
    "content":"Distributed systems design patterns"
  }'

Upsert embeddings for items (admin-auth). If the solution supports deriving embeddings from content, call:

curl -u "$RECS_ADMIN_USER:$RECS_ADMIN_PASS" \
  -H "Content-Type: application/json" \
  -X POST http://localhost:8080/admin/items/item-1/embedding:upsert \
  -d '{"mode":"FROM_CONTENT"}'

Request recommendations (user-auth):

curl -u "$RECS_USER_USER:$RECS_USER_PASS" \
  -H "Content-Type: application/json" \
  -X POST http://localhost:8080/api/recommendations \
  -d '{
    "principalId":"user-123",
    "query":"reliable distributed systems",
    "region":"UK",
    "constraints":{
      "categories":["books"],
      "maxPrice":50.00,
      "excludeOutOfStock":true
    },
    "topK":10,
    "explain":true
  }'

Verification steps

Response returns:
- A requestId
- A ranked list of items with finalScore
- An explanation object containing applied filters and score components

Confirm database state (optional):

docker exec -it <postgres-container-name> psql -U recs -d recs -c \
  "select item_id, embedded_at from item_embeddings order by embedded_at desc limit 5;"

9. Evidence Pack (MANDATORY)

Included evidence artifacts (checklist):

Service startup logs showing:
- Successful database connection
- Schema migrations applied
- pgvector extension check passed
Successful admin upsert logs for:
- Item metadata upsert
- Item embedding upsert/recompute
Successful API invocation evidence:
- Request/response logs for POST /api/recommendations including request_id
- Response payload containing explanation fields and deterministic tie-break ordering
Database records after execution:
- items row present for seeded items
- item_embeddings row present with embedded_at updated
- (If enabled) recommendation_explanations row present for request_id
Failure-mode demonstration:
- Embedding provider timeout scenario producing a controlled error (or rules-only fallback) with explicit log markers
- Authorization failure example for admin endpoints (403) with audit log entry
Observability proof:
- Structured log samples showing score component breakdown
- Trace/span export evidence (if tracing enabled) for embedding call and DB similarity query spans

10. Known Limitations

Does not implement full multi-tenant isolation by default; shared catalog is assumed. Tenant isolation requires schema changes and enforced predicates or RLS.
Does not provide an offline evaluation harness (e.g., NDCG/CTR simulations) as part of the baseline runtime; the focus is online serving correctness and explainability.
Embedding drift management (model upgrades, backfills) is not fully automated in the baseline; embedding refresh is supported via admin operations but not scheduled orchestration.
Complex user modeling (session-based embeddings, long-term profiles, feature stores) is out of scope; the service accepts a query/context and ranks item candidates accordingly.

11. Extension Points

Replace the embedding provider:
- Swap the OpenAI-compatible client configuration to a different vendor or self-hosted embedding service; maintain vector dimensionality consistency and operator choice.
Scale candidate generation:
- Add pre-filter caches, category-specific indexes, or a two-stage retrieval pipeline (coarse ANN → re-rank).
- Introduce asynchronous embedding jobs for bulk updates and scheduled refresh.
Enhance business rules:
- Externalize rules into a versioned ruleset (e.g., JSON/YAML or a rules engine) with controlled rollout and audit logs.
- Add per-segment constraints (geo, compliance, inventory contracts) and expose rule versions in explanations.
Production hardening changes:
- Add a gateway/WAF in front of the API for rate limiting and request shaping.
- Move secrets to a proper secret manager; rotate embedding API keys.
- Introduce multi-node deployment with a dedicated Postgres instance and tuned pgvector index parameters; optionally split vectors into a specialized vector store if operational needs exceed Postgres constraints.
Explainability persistence and tooling:
- Persist explanations by default and build an internal admin viewer for debugging ranking disputes and auditing rule applications.
- Add trace correlation IDs to explanation records for end-to-end incident investigation.

Vector-based Recommendations in Spring Boot: pgvector Similarity + Business Rules
