Problem
Most LLM integrations are built as thin wrappers around model APIs — no cost control, no observability, no transactional guarantees. When these systems fail or overspend, there's no structured way to understand why.
Solution
A production-oriented LLM backend designed around correctness and operational clarity. The platform treats /chat as the single authoritative write-path, enforces transactional integrity on every request, and tracks cost, latency, and provider behavior through structured observability.
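As an illustration of the structured-observability idea, the sketch below emits one JSON log line per lifecycle event, keyed by a request ID for correlation. The function name `log_event` and its fields are illustrative, not the project's actual API.

```python
import json
import logging
import time

logger = logging.getLogger("chat")


def log_event(event: str, request_id: str, **fields) -> str:
    """Emit a single structured JSON log line.

    Every line carries the same request_id, so all provider and
    persistence events for one /chat call can be correlated offline.
    """
    record = {"event": event, "request_id": request_id, "ts": time.time(), **fields}
    line = json.dumps(record, sort_keys=True)
    logger.info(line)
    return line
```

Downstream cost analytics can then aggregate these lines (e.g. summing a `cost_usd` field per provider) without any coupling to the billing system.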
Key capabilities:
- Provider-agnostic architecture supporting OpenAI and AWS Bedrock through a unified port
- ML-aware routing between providers based on request cost and complexity
- Exponential backoff retry and single-hop fallback at the provider boundary
- Structured JSON logging with request correlation and provider lifecycle events
- Offline cost analytics over usage events — no billing coupling
- Redis response cache for non-streaming requests
- Read-only conversation inspection endpoints
- CI baseline with deterministic pytest suite and Docker build validation
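The retry and fallback behavior at the provider boundary can be sketched as follows. This is a minimal illustration, not the project's code: `CompletionPort`, `complete_with_retry`, and the jitter scheme are all assumed names.

```python
import random
import time
from typing import Protocol


class CompletionPort(Protocol):
    """Port contract every provider adapter (OpenAI, Bedrock, ...) satisfies."""

    def complete(self, prompt: str) -> str: ...


def complete_with_retry(
    primary: CompletionPort,
    fallback: CompletionPort,
    prompt: str,
    max_attempts: int = 3,
    base_delay: float = 0.5,
) -> str:
    """Exponential backoff on the primary, then a single hop to the fallback."""
    for attempt in range(max_attempts):
        try:
            return primary.complete(prompt)
        except Exception:
            if attempt < max_attempts - 1:
                # Full-jitter exponential backoff: base * 2^attempt, scaled by
                # a random factor so retries from many requests don't align.
                time.sleep(base_delay * (2 ** attempt) * random.random())
    # Single-hop fallback: one attempt on the secondary, no further cascading.
    return fallback.complete(prompt)
```

Keeping the fallback to a single hop bounds worst-case latency and avoids retry storms across providers.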
Architecture
The platform follows a strict layered design: the API layer has no DB access, ChatService has no HTTP semantics, and provider logic is fully isolated behind a port contract. This separation ensures that provider failures never corrupt business data and that observability is non-invasive to the write-path.
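The layering described above can be sketched with two ports and a service that depends only on them. All names here (`ConversationRepo`, `LLMPort`, `ChatService`) are illustrative assumptions about the design, not the actual classes.

```python
from dataclasses import dataclass
from typing import Protocol


class ConversationRepo(Protocol):
    """Persistence port: the only component that touches the database."""

    def append(self, conversation_id: str, role: str, text: str) -> None: ...


class LLMPort(Protocol):
    """Provider port: isolates OpenAI/Bedrock specifics from business logic."""

    def complete(self, prompt: str) -> str: ...


@dataclass
class ChatService:
    """Business logic: no HTTP semantics, no SQL, no provider SDK imports."""

    repo: ConversationRepo
    llm: LLMPort

    def handle(self, conversation_id: str, prompt: str) -> str:
        reply = self.llm.complete(prompt)
        # Persist both messages only after the provider call succeeds, so a
        # provider failure leaves no partial conversation state behind.
        self.repo.append(conversation_id, "user", prompt)
        self.repo.append(conversation_id, "assistant", reply)
        return reply
```

The API layer constructs a `ChatService` with concrete adapters and forwards requests to it, which is what keeps provider failures out of the persistence path.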
Result
A reference backend that demonstrates how LLM workloads should be integrated in production environments — with explicit operational boundaries, cost awareness, and architectural discipline.