2025Lead engineer~1M chunks indexed, <500ms time-to-first-token

Retrieval-Augmented Knowledge Assistant

A production RAG system over product documentation and support data.

Built an assistant that answers product and support questions grounded in an internal knowledge base. Chunking is semantic and aware of doc structure; retrieval combines pgvector ANN and BM25 reranking; the LLM layer is streamed with citations and a safety rail.

Highlights

Hybrid retrieval (dense + lexical) with learned reranker
Citations every answer, with provenance UI
Evals in CI against a golden set; regressions block deploys

Stack

Next.jsVercel AI SDKGeminipgvectorPostgreSQLTypeScript