1. Overview
This solution implements an explainable recommendation service that combines vector similarity (embeddings) with deterministic business rules to produce stable, production-suitable rankings. It addresses the common gap between “embedding-only” recommendations (high recall but hard to control) and “rules-only” personalization (predictable but low relevance and hard to generalize).
Pure vector similarity approaches often fail in production because they are difficult to constrain: they can surface disallowed content, violate inventory or eligibility constraints, and produce unstable ordering when similarity scores are close. They also tend to be non-explainable to stakeholders beyond “the model said so,” which complicates debugging, auditing, and governance. Pure business-rule ranking, conversely, becomes brittle as catalogs grow and user intent becomes multi-dimensional.
This implementation is production-ready in the sense that it:
- Separates eligibility filtering (hard rules) from ranking (soft scoring) and produces a structured explanation of the final ordering.
- Uses PostgreSQL + pgvector for low-latency similarity search with predictable operational characteristics and transactional data consistency.
- Implements timeouts, retries (where safe), and explicit failure behavior for the embedding provider and the database.
- Produces traceable ranking breakdowns (score components, applied rules, and tie-breakers) for debugging and auditability.
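The split between hard eligibility rules and soft scoring can be sketched in a few lines of Java. This is a minimal illustration rather than the service's actual code: the class `FilterThenRank`, the records `Item` and `Scored`, the `ruleBoost` field, and the weights `wSim` and `wRule` are all hypothetical names introduced here.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.Locale;

// Hypothetical types for illustration; not the service's actual classes.
final class FilterThenRank {
    record Item(String itemId, String region, boolean active,
                double similarity, double ruleBoost) {}
    record Scored(String itemId, double finalScore, String explanation) {}

    // Hard rule: an item either passes or is dropped; no score is involved.
    static boolean eligible(Item item, String region) {
        return item.active() && item.region().equals(region);
    }

    // Soft scoring: weighted sum of similarity and a business-rule boost,
    // with the components recorded so the ordering can be explained.
    static List<Scored> recommend(List<Item> catalog, String region,
                                  double wSim, double wRule) {
        List<Scored> out = new ArrayList<>();
        for (Item item : catalog) {
            if (!eligible(item, region)) {
                continue;
            }
            double score = wSim * item.similarity() + wRule * item.ruleBoost();
            String why = String.format(Locale.ROOT,
                    "similarity=%.2f (w=%.2f), ruleBoost=%.2f (w=%.2f)",
                    item.similarity(), wSim, item.ruleBoost(), wRule);
            out.add(new Scored(item.itemId(), score, why));
        }
        // Highest score first; item_id ascending breaks exact ties.
        out.sort(Comparator.comparingDouble(Scored::finalScore).reversed()
                .thenComparing(Scored::itemId));
        return out;
    }
}
```

An ineligible item never reaches the scoring step, so a high similarity score cannot override a hard rule.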
2. Architecture
Request flow and components:
- Client → Recommendation API (Spring Boot REST)
- Recommendation API → AuthN/AuthZ layer (Spring Security)
- Recommendation API → Eligibility Filter (business rules engine/module)
- Recommendation API → Embedding Client (OpenAI-compatible embedding API)
- Recommendation API → PostgreSQL (pgvector) for similarity search and metadata lookup
- Recommendation API → Ranker (score aggregation + deterministic tie-breakers)
- Recommendation API → Audit/Explain Log (structured logs; optional DB persistence)
External dependencies:
- PostgreSQL with pgvector extension enabled (vector index + transactional catalog state).
- Embedding API (OpenAI-compatible) for generating embeddings for query/user/context vectors.
- Docker Compose for local single-node deployment.
Trust boundaries:
- Untrusted: client inputs (user identifiers, query text, context signals).
- Semi-trusted: embedding provider responses (validated for shape and length; guarded by timeouts).
- Trusted: internal service components and database on the private Docker network.
- Boundary enforcement: authentication at the API; authorization at endpoint and rule scope; least-privilege DB credentials.
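Treating the embedding provider as semi-trusted means checking each response before it reaches the database. The sketch below shows one way to validate shape and length. The class name `EmbeddingValidator` and the `expectedDim` parameter are assumptions for illustration, and the check for non-finite values goes slightly beyond the shape and length checks named above.

```java
import java.util.List;

// Hypothetical helper; the expected dimension would come from configuration.
final class EmbeddingValidator {
    static float[] validate(List<Double> raw, int expectedDim) {
        if (raw == null) {
            throw new IllegalArgumentException("embedding is missing");
        }
        if (raw.size() != expectedDim) {
            throw new IllegalArgumentException(
                    "expected " + expectedDim + " dimensions, got " + raw.size());
        }
        float[] vector = new float[expectedDim];
        for (int i = 0; i < expectedDim; i++) {
            Double value = raw.get(i);
            // Reject nulls, NaN, and infinities rather than storing them.
            if (value == null || !Double.isFinite(value)) {
                throw new IllegalArgumentException("invalid component at index " + i);
            }
            vector[i] = value.floatValue();
        }
        return vector;
    }
}
```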
3. Key Design Decisions
Technology stack
- Java 17 + Spring Boot 3.x: standard enterprise runtime, predictable performance, mature ecosystem for security, configuration, and observability.
- PostgreSQL + pgvector: avoids introducing a separate vector database for the single-node deployment model. Keeps item metadata and embeddings in one transactional store, simplifying consistency and operational overhead.
- Docker Compose: aligns with single-node reproducible execution and platform packaging.
Data storage model
- Store item embeddings alongside item metadata and eligibility fields in PostgreSQL.
- Store “explainability outputs” (ranking breakdowns) in structured logs by default; optionally persist to a table for audit/compliance use-cases (configurable).
- Use immutable recommendation result identifiers to correlate requests, logs, and persisted explanations.
Synchrony vs asynchrony
- The primary recommendation request is synchronous to meet typical online serving requirements (low latency, immediate response).
- Asynchronous paths (embedding precomputation for items and periodic refresh jobs) are optional and are not required for a local run. For the baseline runnable system, item embeddings are assumed to be preloaded or ingested via an admin endpoint.
Error handling and retries
- Embedding calls use strict timeouts and bounded retries with jitter only on transient failures (network timeouts, 5xx). Non-retryable failures (4xx, invalid payloads) fail fast.
- Database queries rely on connection pool timeouts and fail with explicit error mapping; no blind query retries that could amplify load.
- If embedding generation fails, the service returns a controlled error response (e.g., 503) or a configured fallback behavior (e.g., “rules-only recommendations”) depending on runtime configuration.
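The retry policy for embedding calls can be sketched as a bounded loop with jittered backoff. This is an illustration under assumptions, not the service's client: `BoundedRetry`, `TransientFailure`, `maxAttempts`, and `baseDelayMs` are hypothetical names, and "full jitter" (a random sleep between zero and an exponentially growing cap) is one common jitter scheme chosen here for the example.

```java
import java.util.concurrent.ThreadLocalRandom;
import java.util.function.Supplier;

final class BoundedRetry {
    // Hypothetical marker for retryable failures (network timeouts, 5xx).
    static final class TransientFailure extends RuntimeException {
        TransientFailure(String message) {
            super(message);
        }
    }

    static <T> T call(Supplier<T> op, int maxAttempts, long baseDelayMs) {
        if (maxAttempts < 1 || baseDelayMs < 0) {
            throw new IllegalArgumentException(
                    "maxAttempts must be >= 1 and baseDelayMs must be >= 0");
        }
        for (int attempt = 1; ; attempt++) {
            try {
                return op.get();
            } catch (TransientFailure e) {
                if (attempt >= maxAttempts) {
                    throw e; // retry budget exhausted
                }
                // Full jitter: sleep a random time in [0, base * 2^(attempt-1)].
                long cap = baseDelayMs << (attempt - 1);
                sleep(ThreadLocalRandom.current().nextLong(cap + 1));
            }
            // Any other exception (for example a 4xx mapped to
            // IllegalArgumentException) is not caught and fails fast.
        }
    }

    private static void sleep(long millis) {
        try {
            Thread.sleep(millis);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted during backoff", e);
        }
    }
}
```

Only `TransientFailure` triggers another attempt, so a validation error surfaces after a single call and the provider is not hit again.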
Idempotency strategy
- For read-only recommendation endpoints, idempotency is inherent.
- For ingestion endpoints (e.g., upserting items/embeddings), use an idempotency key or deterministic item identifiers. Upserts are executed as single-statement transactional operations keyed by item_id to prevent duplicates.
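A single-statement upsert keyed by item_id could look like the sketch below, held as a Java constant for use with a prepared statement. The column names come from the item_embeddings table described in the Data Model section. The class `EmbeddingUpsertSql`, the helper `toVectorLiteral`, and binding the vector as a text parameter cast to `vector` are assumptions made for illustration; the actual driver binding may differ.

```java
// Hypothetical holder for the upsert statement; not the service's actual DAO.
final class EmbeddingUpsertSql {
    // Re-running this statement with the same item_id updates the existing
    // row instead of inserting a duplicate.
    static final String UPSERT = """
            INSERT INTO item_embeddings (item_id, embedding, embedding_model, embedded_at)
            VALUES (?, CAST(? AS vector), ?, now())
            ON CONFLICT (item_id) DO UPDATE
               SET embedding = EXCLUDED.embedding,
                   embedding_model = EXCLUDED.embedding_model,
                   embedded_at = EXCLUDED.embedded_at
            """;

    // Formats a vector in pgvector's text form, e.g. "[0.5,1.0,-2.0]".
    static String toVectorLiteral(float[] v) {
        StringBuilder sb = new StringBuilder("[");
        for (int i = 0; i < v.length; i++) {
            if (i > 0) {
                sb.append(',');
            }
            sb.append(v[i]);
        }
        return sb.append(']').toString();
    }
}
```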
4. Data Model
Core tables (design intent):
- items
  - Purpose: canonical catalog records and business-rule eligibility fields.
  - Key columns: item_id (PK), status, category, region, is_active, inventory_state, price, created_at, updated_at.
  - Indexing: B-tree indexes on status, category, region, and composite indexes aligned to common eligibility filters (e.g., (region, is_active)).
- item_embeddings
  - Purpose: vector representation of items for similarity search.
  - Key columns: item_id (PK/FK), embedding VECTOR(d), embedding_model, embedded_at.
  - Indexing: pgvector index on embedding using the chosen operator class (typically cosine or inner-product based, depending on normalization strategy). Ensure the index matches the similarity operator used in queries.
- recommendation_explanations (optional persistence; otherwise logs-only)
  - Purpose: durable audit trail and debugging.
  - Key columns: request_id (PK), principal_id, input_factors (JSONB), applied_rules (JSONB), candidates (JSONB), final_ranking (JSONB), created_at.
  - Indexing: B-tree on created_at, optionally principal_id for support workflows.
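The link between normalization strategy and operator choice can be checked numerically. In pgvector, `<=>` returns cosine distance and `<#>` returns the negative inner product, with operator classes `vector_cosine_ops` and `vector_ip_ops` respectively. The sketch below, with a hypothetical class name `SimilarityOperators`, shows that the two differ only by a constant offset of 1 when both vectors are unit-normalized, so they produce the same ordering in that case and only in that case.

```java
// Illustrative math only; pgvector computes these inside PostgreSQL.
final class SimilarityOperators {
    static double dot(double[] a, double[] b) {
        double sum = 0.0;
        for (int i = 0; i < a.length; i++) {
            sum += a[i] * b[i];
        }
        return sum;
    }

    static double[] normalize(double[] a) {
        double norm = Math.sqrt(dot(a, a));
        double[] out = new double[a.length];
        for (int i = 0; i < a.length; i++) {
            out[i] = a[i] / norm;
        }
        return out;
    }

    // Mirrors the meaning of pgvector's <=> operator.
    static double cosineDistance(double[] a, double[] b) {
        return 1.0 - dot(a, b) / (Math.sqrt(dot(a, a)) * Math.sqrt(dot(b, b)));
    }

    // Mirrors the meaning of pgvector's <#> operator.
    static double negativeInnerProduct(double[] a, double[] b) {
        return -dot(a, b);
    }
}
```

For raw vectors (3, 4) and (4, 3), the cosine distance is 1 - 24/25 while the negative inner product is -24, so an inner-product index over unnormalized embeddings would rank by magnitude as well as direction.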
Indexing strategy
- Eligibility filters should reduce candidate sets before vector ranking where possible (especially for large catalogs).
- Use pgvector index for approximate or exact nearest neighbor queries depending on performance needs; in a single-node baseline, start with an index-backed ANN configuration and a bounded topK candidate retrieval.
- Ensure deterministic tie-breaking by adding stable secondary ordering (e.g., business priority descending, then item_id ascending).
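The tie-breaking order described above maps directly onto a chained comparator. The sketch below assumes a hypothetical `Candidate` record with a numeric `businessPriority` field; the real ranker's types may differ. Because `item_id` is unique, the comparator defines a total order and the result does not depend on the order in which candidates arrive from the database.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

final class TieBreak {
    // Hypothetical candidate shape for illustration.
    record Candidate(String itemId, double score, int businessPriority) {}

    // Score descending, then business priority descending, then item_id ascending.
    static final Comparator<Candidate> ORDER =
            Comparator.comparingDouble(Candidate::score).reversed()
                    .thenComparing(Comparator.comparingInt(Candidate::businessPriority).reversed())
                    .thenComparing(Candidate::itemId);

    static List<String> rank(List<Candidate> candidates) {
        List<Candidate> copy = new ArrayList<>(candidates);
        copy.sort(ORDER);
        return copy.stream().map(Candidate::itemId).toList();
    }
}
```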
5. API Surface
Public (serving):
- POST /api/recommendations – Generate ranked recommendations for a user/context payload with explainability in response (ROLE_USER)
- GET /api/recommendations/{requestId} – Retrieve a previously generated explainability record if persistence is enabled (ROLE_USER, ownership enforced)
Internal/Admin (catalog + operations):
- POST /admin/items:upsert – Upsert item metadata used for eligibility filtering (ROLE_ADMIN)
- POST /admin/items/{itemId}/embedding:upsert – Upsert or recompute an item embedding (ROLE_ADMIN)
- POST /admin/reindex – Rebuild or verify pgvector indexes (ROLE_ADMIN)
- GET /actuator/health – Health status for service and downstream checks (no auth or ROLE_MONITOR, depending on deployment)
- GET /actuator/metrics – Metrics endpoint (ROLE_MONITOR)
- GET /actuator/prometheus – Prometheus scrape endpoint if enabled (ROLE_MONITOR)
6. Security Model
Authentication
- Spring Security with either:
  - Local development: HTTP Basic or static API key (recommended for Compose).
  - Production alignment: JWT bearer tokens (configured but not required for local run).
Authorization (roles)
- ROLE_USER: can request recommendations and read only their own persisted explanations (if enabled).
- ROLE_ADMIN: can manage items, embeddings, and maintenance operations.
- ROLE_MONITOR: can access operational endpoints (health/metrics) where restricted.
Paid access enforcement (if applicable)
- Enforced at the API boundary by mapping the authenticated principal to a “subscription/entitlement” claim (JWT claim or internal lookup).
- Requests lacking entitlement are rejected with 403 and are logged with an audit marker.
- No “soft gating” in the ranker; gating happens before candidate retrieval to avoid leaking catalog signals.
CSRF considerations
- For stateless APIs using bearer tokens, CSRF protection is typically disabled; endpoints are not intended for browser cookie auth.
- If enabling cookie-based auth for an admin UI, CSRF must be enabled for state-changing admin endpoints.
Data isolation guarantees
- Principal scoping is enforced when accessing persisted explanations: requestId fetch requires ownership check (principal_id == authenticated principal), unless ROLE_ADMIN.
- Catalog data is shared by design; if multi-tenant isolation is required, add tenant_id to core tables and apply row-level security or explicit tenant predicates on every query.
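The ownership rule for persisted explanations reduces to a small predicate. The sketch below is a plain-Java illustration with a hypothetical class `ExplanationAccess`; in the service this check would sit behind Spring Security, and how roles and principals are obtained is not shown here.

```java
import java.util.Set;

final class ExplanationAccess {
    // Admins may read any record; everyone else only their own.
    static boolean canRead(String recordPrincipalId,
                           String callerPrincipalId,
                           Set<String> callerRoles) {
        if (callerRoles.contains("ROLE_ADMIN")) {
            return true;
        }
        return recordPrincipalId != null
                && recordPrincipalId.equals(callerPrincipalId);
    }
}
```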
7. Operational Behavior
Startup behavior
- On startup, the service validates:
  - Database connectivity and pgvector extension availability.
  - Required schema migrations (via Flyway/Liquibase).
  - Embedding provider configuration (presence of base URL/model; does not block startup unless configured as “strict”).
- The service exposes readiness only after DB is reachable and migrations are complete.
Failure modes
- Embedding provider unavailable: request fails with a controlled 503 (or configured rules-only fallback).
- Database unavailable: readiness fails; serving endpoints return 503 with clear error codes.
- pgvector index missing/misaligned: startup can fail fast in strict mode, or emit an operational alert and run with degraded performance in permissive mode.
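The choice between a controlled 503 and a rules-only fallback when the embedding provider is unavailable can be sketched as a single decision point. Everything named here (`EmbeddingFallback`, `Mode`, `Outcome`, and the strategy labels) is hypothetical and stands in for whatever configuration switch the service actually exposes.

```java
import java.util.function.Supplier;

final class EmbeddingFallback {
    enum Mode { FAIL_WITH_503, RULES_ONLY }

    record Outcome(int httpStatus, String strategy) {}

    static Outcome serve(Supplier<float[]> embed, Mode mode) {
        try {
            embed.get();
            // Embedding available: similarity plus rules.
            return new Outcome(200, "hybrid");
        } catch (RuntimeException e) {
            // Embedding failed: degrade or fail explicitly, as configured.
            return mode == Mode.RULES_ONLY
                    ? new Outcome(200, "rules-only")
                    : new Outcome(503, "unavailable");
        }
    }
}
```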
Retry and timeout behavior
- Embedding client:
  - Connect/read timeouts set explicitly.
  - Bounded retries for transient errors only; no retries on validation failures.
- Database:
  - Connection pool timeouts enforced.
  - No automatic query retries in the request path; operators should address underlying DB issues rather than amplify load.
Observability hooks (logs, metrics, traces)
- Structured logs include request_id, principal_id, rule_set_version, candidate_count, and top score components.
- Metrics include request latency, embedding latency, DB query latency, failure counts by dependency, and cache hit ratios if caching is enabled.
- Traces propagate request correlation across the controller, embedding client, and DB query spans (OpenTelemetry-compatible), enabling investigation of tail latency and dependency failures.
8. Local Execution
Prerequisites
- Docker Desktop (or Docker Engine) with Docker Compose v2
- Java 17 (for running tests locally; containerized run does not require local Java)
- An OpenAI-compatible embedding endpoint and API key (can be a local emulator or a remote provider)
Environment variables
Set these in your shell or a .env file used by Docker Compose:
- EMBEDDING_BASE_URL – Base URL of the embedding API
- EMBEDDING_API_KEY – API key
- EMBEDDING_MODEL – Embedding model identifier
- DB_HOST=postgres (for container network)
- DB_PORT=5432
- DB_NAME=recs
- DB_USER=recs
- DB_PASSWORD=recs
- RECS_AUTH_MODE=basic|apikey|jwt (local default: basic)
- RECS_ADMIN_USER / RECS_ADMIN_PASS (if basic auth)
- RECS_USER_USER / RECS_USER_PASS (if basic auth)
- EXPLANATIONS_PERSISTED=true|false (default false; logs-only)
Docker Compose usage
- Start dependencies and the service:
docker compose up --build
- Verify health:
curl -s http://localhost:8080/actuator/health
Expected: status UP and database readiness indicators.
- Seed or upsert a few items (admin-auth protected). Example (basic auth):
curl -u "$RECS_ADMIN_USER:$RECS_ADMIN_PASS" \
-H "Content-Type: application/json" \
-X POST http://localhost:8080/admin/items:upsert \
-d '{
"itemId":"item-1",
"status":"ACTIVE",
"category":"books",
"region":"UK",
"isActive":true,
"inventoryState":"IN_STOCK",
"price":19.99,
"content":"Distributed systems design patterns"
}'
- Upsert embeddings for items (admin-auth). If the solution supports deriving embeddings from content, call:
curl -u "$RECS_ADMIN_USER:$RECS_ADMIN_PASS" \
-H "Content-Type: application/json" \
-X POST http://localhost:8080/admin/items/item-1/embedding:upsert \
-d '{"mode":"FROM_CONTENT"}'
- Request recommendations (user-auth):
curl -u "$RECS_USER_USER:$RECS_USER_PASS" \
-H "Content-Type: application/json" \
-X POST http://localhost:8080/api/recommendations \
-d '{
"principalId":"user-123",
"query":"reliable distributed systems",
"region":"UK",
"constraints":{
"categories":["books"],
"maxPrice":50.00,
"excludeOutOfStock":true
},
"topK":10,
"explain":true
}'
Verification steps
- Response returns:
  - A requestId
  - A ranked list of items with finalScore
  - An explanation object containing applied filters and score components
- Confirm database state (optional):
docker exec -it <postgres-container-name> psql -U recs -d recs -c \
"select item_id, embedded_at from item_embeddings order by embedded_at desc limit 5;"
9. Evidence Pack (MANDATORY)
Included evidence artifacts (checklist):
- Service startup logs showing:
  - Successful database connection
  - Schema migrations applied
  - pgvector extension check passed
- Successful admin upsert logs for:
  - Item metadata upsert
  - Item embedding upsert/recompute
- Successful API invocation evidence:
  - Request/response logs for POST /api/recommendations including request_id
  - Response payload containing explanation fields and deterministic tie-break ordering
- Database records after execution:
  - items row present for seeded items
  - item_embeddings row present with embedded_at updated
  - (If enabled) recommendation_explanations row present for request_id
- Failure-mode demonstration:
  - Embedding provider timeout scenario producing a controlled error (or rules-only fallback) with explicit log markers
  - Authorization failure example for admin endpoints (403) with audit log entry
- Observability proof:
  - Structured log samples showing score component breakdown
  - Trace/span export evidence (if tracing enabled) for embedding call and DB similarity query spans
10. Known Limitations
- Does not implement full multi-tenant isolation by default; shared catalog is assumed. Tenant isolation requires schema changes and enforced predicates or RLS.
- Does not provide an offline evaluation harness (e.g., NDCG/CTR simulations) as part of the baseline runtime; the focus is online serving correctness and explainability.
- Embedding drift management (model upgrades, backfills) is not fully automated in the baseline; embedding refresh is available through admin operations, but scheduled orchestration is not included.
- Complex user modeling (session-based embeddings, long-term profiles, feature stores) is out of scope; the service accepts a query/context and ranks item candidates accordingly.
11. Extension Points
- Replace the embedding provider:
  - Swap the OpenAI-compatible client configuration to a different vendor or self-hosted embedding service; maintain vector dimensionality consistency and operator choice.
- Scale candidate generation:
  - Add pre-filter caches, category-specific indexes, or a two-stage retrieval pipeline (coarse ANN → re-rank).
  - Introduce asynchronous embedding jobs for bulk updates and scheduled refresh.
- Enhance business rules:
  - Externalize rules into a versioned ruleset (e.g., JSON/YAML or a rules engine) with controlled rollout and audit logs.
  - Add per-segment constraints (geo, compliance, inventory contracts) and expose rule versions in explanations.
- Production hardening changes:
  - Add a gateway/WAF in front of the API for rate limiting and request shaping.
  - Move secrets to a proper secret manager; rotate embedding API keys.
  - Introduce multi-node deployment with a dedicated Postgres instance and tuned pgvector index parameters; optionally split vectors into a specialized vector store if operational needs exceed Postgres constraints.
- Explainability persistence and tooling:
  - Persist explanations by default and build an internal admin viewer for debugging ranking disputes and auditing rule applications.
  - Add trace correlation IDs to explanation records for end-to-end incident investigation.