Confidential — Large B2B SaaS Engineering Org
Building an Internal-Documentation MCP Server for a SaaS Engineering Team
- Tools exposed: search · get · owners · changes · runbooks
- Latency (p95): sub-second
- Sources indexed: Notion · GitHub · Backstage · more
- Status: live in production, 6+ months
Project details
The Challenge
Our client is a US-headquartered B2B SaaS company with several hundred engineers, a long-lived codebase, and the institutional-knowledge problem that comes with both. Engineering documentation lived across Notion (thousands of pages), GitHub READMEs, an internal Backstage TechDocs deployment, Confluence (a leftover from an earlier acquisition), Slack threads, and the heads of a handful of principal engineers who were starting to think about retirement. New hires were taking the better part of two months to become independently productive.
The team had previously tried two RAG-based "internal Stack Overflow" projects, both abandoned within six months. The failure mode was the same in both cases: the systems answered confidently but inaccurately, citing documents that contradicted each other or were several major releases out of date, and engineers stopped trusting them. The brief to us was sharp: build something engineers would actually use, with measurable answer faithfulness, that integrates with the IDE and chat tools they already live in.
Our Approach
Rather than build another standalone chatbot, we built an internal Model Context Protocol (MCP) server that any LLM client — Claude Desktop, Cursor, Zed, the team's own internal Claude API tooling — could connect to. The MCP server is the contract; the model surface is whatever the engineer prefers. That decision alone was the difference between adoption and another shelved project.
The server exposes a focused set of tools rather than one monolithic search_docs primitive:
- search_docs(query, source_filter?) — hybrid BM25 + dense retrieval across the indexed sources.
- get_doc(path) — fetch a specific document with provenance metadata.
- find_owner(file_path) — return the CODEOWNERS entry and on-call rotation for a code path.
- recent_changes(component, days) — pull merged PRs touching a component, with author and reviewer context.
- runbook(alert_name) — fetch the operational runbook for a named alert, including recent post-mortems for that alert.
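As a minimal sketch, the tool surface above might be declared and validated like this. The tool names and parameters come from the list; the schema shape, the `?` optional-parameter convention, and the validation helper are illustrative assumptions, not the client's actual server code.

```python
# Sketch of the MCP tool surface described above. Tool names and parameters
# follow the case study; the schema format itself is an assumption.
TOOLS = {
    "search_docs": {
        "description": "Hybrid BM25 + dense retrieval across indexed sources.",
        "params": {"query": "string", "source_filter": "string?"},
    },
    "get_doc": {
        "description": "Fetch a specific document with provenance metadata.",
        "params": {"path": "string"},
    },
    "find_owner": {
        "description": "CODEOWNERS entry and on-call rotation for a code path.",
        "params": {"file_path": "string"},
    },
    "recent_changes": {
        "description": "Merged PRs touching a component in the last N days.",
        "params": {"component": "string", "days": "int"},
    },
    "runbook": {
        "description": "Runbook for a named alert, with recent post-mortems.",
        "params": {"alert_name": "string"},
    },
}

def validate_call(tool: str, args: dict) -> bool:
    """Reject unknown tools or calls missing a required parameter.

    Parameters whose type ends in '?' are treated as optional.
    """
    spec = TOOLS.get(tool)
    if spec is None:
        return False
    required = {k for k, t in spec["params"].items() if not t.endswith("?")}
    return required <= set(args)
```

Keeping the tools narrow like this lets the client model pick the right primitive (ownership vs. search vs. runbooks) instead of forcing every question through one retrieval call.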
The indexing pipeline addresses the freshness problem head-on. We hash every source document on ingest and store doc_hash → [chunk_ids] in a side table; only documents whose hash has changed are re-embedded on the nightly run. Documents inherit a freshness score based on the last-edited timestamp, the number of PRs that have referenced them in recent months, and whether their author still works at the company. Stale documents are deprioritised in retrieval rather than removed — the model sees them but is told to flag uncertainty when relying on them.
Answer quality is measured continuously. A nightly Airflow DAG runs RAGAS faithfulness, answer-relevancy, and context-precision scoring against a golden set maintained by the platform-engineering team. Any deployment that regresses faithfulness beyond a small threshold blocks the release.
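The release gate reduces to a comparison of the candidate's scores against the current baseline. The metric name follows RAGAS; the 0.02 threshold and the score-plumbing shape are assumptions for illustration.

```python
def release_allowed(baseline: dict[str, float],
                    candidate: dict[str, float],
                    max_faithfulness_drop: float = 0.02) -> bool:
    """Block the deployment if faithfulness on the golden set regresses
    beyond the allowed threshold. Threshold value is an assumption."""
    drop = baseline["faithfulness"] - candidate["faithfulness"]
    return drop <= max_faithfulness_drop
```

Wiring this as the last task in the nightly DAG means a bad prompt or retrieval change is caught against the golden set before it reaches engineers.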
The embedding model is text-embedding-3-large, truncated to 1,536 dimensions. The vector store is pgvector on the team's existing Postgres cluster — they already had backup, monitoring, and PII review processes for it, and we didn't want to introduce a new datastore into the security boundary. Re-ranking uses Cohere Rerank over the top candidates.
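A retrieval query against that store might look like the sketch below. The `<=>` cosine-distance operator is standard pgvector; the table and column names, and the freshness-penalty expression that deprioritises stale documents without removing them, are assumptions for illustration.

```python
# Hypothetical pgvector query for the setup described above. The freshness
# divisor inflates the distance score of stale documents so they rank lower
# but still remain retrievable, matching the deprioritise-don't-remove policy.
def knn_sql(table: str = "doc_chunks", k: int = 50) -> str:
    """Build a top-k nearest-chunk query by cosine distance, with stale
    documents penalised via their freshness score (floored to avoid
    division by zero). Placeholder %(query_vec)s is bound by the caller."""
    return (
        f"SELECT chunk_id, doc_id, "
        f"(embedding <=> %(query_vec)s::vector) / GREATEST(freshness, 0.1) AS score "
        f"FROM {table} "
        f"ORDER BY score ASC LIMIT {k}"
    )
```

The top candidates from this query are then passed to Cohere Rerank before anything reaches the model context.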
The Outcome
The MCP server has been in production for over six months. It serves tens of thousands of queries per day at sub-second p95 latency, with answer faithfulness consistently held above the platform team's RAGAS-measured floor. The major internal sources are indexed, with Notion, GitHub, and the Backstage TechDocs surface contributing the bulk of retrieved chunks.
The metric the engineering org actually cares about is time-to-productivity. New-hire onboarding ramp, measured by independent-commit milestones, has roughly halved. Internal Slack volume on the engineering help channel has dropped noticeably, and the staff engineers who used to be the de facto search engine have visibly reclaimed their weeks. The platform team has since opened the MCP server to other tools — a Slack bot and an incident-response agent both now use the same toolset.