47HQ
All Services

04 · AI Implementation

RAG & Embedding

Production-grade retrieval — measured against your golden set before it ships.

rag-index · prod
LIVE
query > duplicate charge auto refund window
handbook/refund-policy.md0.94
refund within 4 minutes
ledger/transactions.sql0.88
auto-reversal trigger
runbooks/billing.md0.81
duplicate retry guard
Recall @5
0.96
↑ 0.18
MRR
0.71
↑ 22%
Chunk size
512
tuned
Retrieval you can measure.
47hq
Duration
3–5 weeks
Team
1 principal + 1 engineer
Starts in
Kick-off within 1–2 weeks of SOW
Investment
Fixed fee · $45k–$85k

Overview

What you get

Chunking, hybrid retrieval, reranking, citation grounding, and CI eval gates so your RAG system improves with every deploy instead of silently regressing.

The problem

Why teams call us

  • Hallucination rate is unknown and getting worse with each prompt tweak.
  • Retrieval precision regresses silently — nobody finds out until a customer does.
  • Cost-per-query is creeping up and there's no budget instrumentation.

Approach

How we work

  • Start with the eval set, not the prompt. Measure first, ship second.
  • Three chunking strategies benchmarked head-to-head against your data.
  • Hybrid (BM25 + dense) retrieval with a reranker tuned to your domain.
  • CI gates so regressions on hallucination or precision block the deploy.

Process

Week by week.

  1. 01 · Week 1

    Eval set

    Build 500+ golden query/answer set with LLM-as-judge rubrics.

  2. 02 · Week 2

    Retrieval rebuild

    Chunking benchmark, hybrid retrieval, reranker.

  3. 03 · Week 3–4

    Grounding

    Citation system, refusal logic, before/after measurement.

  4. 04 · Week 5

    CI + handoff

    Eval CI gates, dashboards, your-team-can-run-this runbook.

You're a fit if
  • Live RAG system shipped to paying customers
  • Willingness to instrument production with our eval harness
  • Single decision-maker on your side
Probably not a fit if
  • Pre-product 'we want to do RAG' exploration
  • No tolerance for instrumenting production with eval traces
  • Looking for a chatbot wrapper, not a measured retrieval system

Deliverables

Everything we ship

  • 01Chunking redesign with three strategies benchmarked
  • 02Hybrid BM25 + dense retrieval with reranker
  • 03Citation system with source-grounding checks
  • 04Golden eval dataset of 500+ queries with LLM-as-judge scoring
  • 05CI gates so regressions block deploys
  • 06Cost-per-query + latency dashboards

Outcomes

What you walk away with.

−60%
hallucination rate vs baseline
+25%
retrieval precision @5
−35%
P95 latency and cost-per-query

Named before/after metrics on hallucination rate, retrieval precision @5, P95 latency, and cost-per-query — methodology your team can re-run forever.

Tooling

Stack we ship against

Model- and infra-agnostic. We adapt to your stack, not the other way around.

OpenAIAnthropicCohere RerankPineconepgvectorTurbopufferLangSmithLangfuse

FAQ

Real questions, technically answered.

Will this work with our existing vector DB?
Yes. We are infrastructure-agnostic and ship against Pinecone, Weaviate, pgvector, Turbopuffer, and Mongo Atlas.
Do we have to switch LLM providers?
No. We benchmark across providers and recommend, but the decision (and migration) is yours.
What if our golden set is too small?
We co-author it during week 1 using real production traces. 500 queries is the floor, not the ceiling.

Next step

Ready to scope RAG & Embedding?

Book a discovery call. We'll confirm fit, sequence the engagement, and have a Statement of Work in your inbox within a week.

Refundable if we're not a fitWritten diagnostic in 48 hoursSession run by a founder, not a sales rep