Problem
Most LLM integrations are built as thin wrappers around model APIs — no cost control, no observability, no transactional guarantees. When these systems fail or overspend, there's no structured way to understand why.
Solution
A production-oriented LLM backend designed around correctness and operational clarity. The platform treats /chat as the single authoritative write-path, enforces transactional integrity on every request, and tracks cost, latency, and provider behavior through structured observability.
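As an illustration of the structured-observability idea, the sketch below emits one JSON log line per lifecycle event, keyed by a request ID for correlation. The function name `log_event` and its fields are illustrative, not the project's actual API.

```python
import json
import logging
import time

logger = logging.getLogger("chat")


def log_event(event: str, request_id: str, **fields) -> str:
    """Emit a single structured JSON log line.

    Every line carries the same request_id, so all provider and
    persistence events for one /chat call can be correlated offline.
    """
    record = {"event": event, "request_id": request_id, "ts": time.time(), **fields}
    line = json.dumps(record, sort_keys=True)
    logger.info(line)
    return line
```

Downstream cost analytics can then aggregate these lines (e.g. summing a `cost_usd` field per provider) without any coupling to the billing system.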
Key capabilities:
- Provider-agnostic architecture supporting OpenAI and AWS Bedrock through a unified port
- ML-aware routing between providers based on request cost and complexity
- Exponential backoff retry and single-hop fallback at the provider boundary
- Structured JSON logging with request correlation and provider lifecycle events
- Offline cost analytics over usage events — no billing coupling
- Redis response cache for non-streaming requests
- Read-only conversation inspection endpoints
- CI baseline with deterministic pytest suite and Docker build validation
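The retry and fallback behavior at the provider boundary can be sketched as follows. This is a minimal illustration, not the project's code: `CompletionPort`, `complete_with_retry`, and the jitter scheme are all assumed names.

```python
import random
import time
from typing import Protocol


class CompletionPort(Protocol):
    """Port contract every provider adapter (OpenAI, Bedrock, ...) satisfies."""

    def complete(self, prompt: str) -> str: ...


def complete_with_retry(
    primary: CompletionPort,
    fallback: CompletionPort,
    prompt: str,
    max_attempts: int = 3,
    base_delay: float = 0.5,
) -> str:
    """Exponential backoff on the primary, then a single hop to the fallback."""
    for attempt in range(max_attempts):
        try:
            return primary.complete(prompt)
        except Exception:
            if attempt < max_attempts - 1:
                # Full-jitter exponential backoff: base * 2^attempt, scaled by
                # a random factor so retries from many requests don't align.
                time.sleep(base_delay * (2 ** attempt) * random.random())
    # Single-hop fallback: one attempt on the secondary, no further cascading.
    return fallback.complete(prompt)
```

Keeping the fallback to a single hop bounds worst-case latency and avoids retry storms across providers.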
Architecture
The platform follows a strict layered design: the API layer has no DB access, ChatService has no HTTP semantics, and provider logic is fully isolated behind a port contract. This separation ensures that provider failures never corrupt business data and that observability is non-invasive to the write-path.
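The layering described above can be sketched with two ports and a service that depends only on them. All names here (`ConversationRepo`, `LLMPort`, `ChatService`) are illustrative assumptions about the design, not the actual classes.

```python
from dataclasses import dataclass
from typing import Protocol


class ConversationRepo(Protocol):
    """Persistence port: the only component that touches the database."""

    def append(self, conversation_id: str, role: str, text: str) -> None: ...


class LLMPort(Protocol):
    """Provider port: isolates OpenAI/Bedrock specifics from business logic."""

    def complete(self, prompt: str) -> str: ...


@dataclass
class ChatService:
    """Business logic: no HTTP semantics, no SQL, no provider SDK imports."""

    repo: ConversationRepo
    llm: LLMPort

    def handle(self, conversation_id: str, prompt: str) -> str:
        reply = self.llm.complete(prompt)
        # Persist both messages only after the provider call succeeds, so a
        # provider failure leaves no partial conversation state behind.
        self.repo.append(conversation_id, "user", prompt)
        self.repo.append(conversation_id, "assistant", reply)
        return reply
```

The API layer constructs a `ChatService` with concrete adapters and forwards requests to it, which is what keeps provider failures out of the persistence path.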
Result
A reference backend that demonstrates how LLM workloads should be integrated in production environments — with explicit operational boundaries, cost awareness, and architectural discipline.