47HQ

06 · AI Implementation

AI Agents

Multi-step agents that finish the job — with traces, guardrails, and rollback.

agent-runtime · prod
LIVE
goal · Resolve duplicate-charge ticket end-to-end
plan · decompose ticket
tool · search · stripe.refunds
tool · lookup · customer.tier
act · issue refund · $128.40
verify · ledger reconciled
Steps / task: 5.2 (median)
Auto-resolve: 73% (↑ 41pt)
Escalate rate: 8% (↓ 19pt)
Multi-step tasks, resolved end-to-end.
Duration
6–12 weeks
Team
1 principal + 2 engineers
Starts in
Kick-off within 2 weeks of SOW
Investment
Fixed fee · $120k–$240k

Overview

What you get

Tool + API orchestration with typed schemas, per-step evals, replayable traces, and deterministic rollback paths for every side effect.
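As a rough illustration of what a typed schema and a rollback path for a side effect can look like, here is a minimal Python sketch. Every name in it (`RefundArgs`, `SideEffectTool`, `run_with_rollback`, the in-memory `ledger`) is hypothetical and stands in for whatever your stack provides; it is not the engagement's actual code.

```python
from dataclasses import dataclass
from typing import Callable, Generic, TypeVar

Args = TypeVar("Args")
Result = TypeVar("Result")


@dataclass(frozen=True)
class RefundArgs:
    """Typed input schema for a hypothetical refund tool."""
    charge_id: str
    amount_cents: int

    def __post_init__(self) -> None:
        # Reject bad input before any side effect can happen.
        if self.amount_cents <= 0:
            raise ValueError("amount_cents must be positive")


@dataclass(frozen=True)
class SideEffectTool(Generic[Args, Result]):
    """A tool that changes external state, paired with its compensating action."""
    name: str
    execute: Callable[[Args], Result]
    rollback: Callable[[Args, Result], None]


def run_with_rollback(steps):
    """Run (tool, args) pairs in order; on failure, undo completed steps in reverse."""
    done = []
    try:
        for tool, args in steps:
            result = tool.execute(args)
            done.append((tool, args, result))
    except Exception:
        for tool, args, result in reversed(done):
            tool.rollback(args, result)
        raise
    return [result for _, _, result in done]


ledger: dict[str, int] = {}  # in-memory stand-in for a payments ledger


def issue_refund(args: RefundArgs) -> str:
    ledger[args.charge_id] = ledger.get(args.charge_id, 0) + args.amount_cents
    return f"refund-{args.charge_id}"


def void_refund(args: RefundArgs, refund_id: str) -> None:
    ledger[args.charge_id] -= args.amount_cents


def failing_notify(args: RefundArgs) -> None:
    raise RuntimeError("notification service down")


refund_tool = SideEffectTool("issue_refund", issue_refund, void_refund)
notify_tool = SideEffectTool("notify", failing_notify, lambda args, result: None)
```

If `notify_tool` fails after `refund_tool` has run, `run_with_rollback` calls `void_refund` before re-raising, so the ledger returns to its prior state.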

The problem

Why teams call us

  • Agents work in demos and explode in production on the third tool call.
  • Traces are noisy or missing — nobody can answer 'why did it do that?'.
  • Rollback for side-effect tools is bolted on later, not designed in.

Approach

How we work

  • Tools first, prompts second. Typed schemas before any reasoning loop.
  • Per-step eval coverage, not just end-to-end smoke tests.
  • Every side-effect tool ships with an explicit rollback path.
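To make "per-step" concrete, the sketch below scores each step of a run separately instead of only checking the final outcome. The step names and the predicates in `STEP_CHECKS` are invented for illustration; real checks would come from your own tools and golden data.

```python
from dataclasses import dataclass
from typing import Any, Callable


@dataclass(frozen=True)
class Step:
    name: str    # e.g. "lookup_customer_tier"
    output: Any  # what the tool or model returned at this step


# Hypothetical per-step checks: one predicate per step name.
STEP_CHECKS: dict[str, Callable[[Any], bool]] = {
    "search_refunds": lambda out: isinstance(out, list),
    "lookup_customer_tier": lambda out: out in {"free", "pro", "enterprise"},
    "issue_refund": lambda out: isinstance(out, dict) and out.get("amount_cents", 0) > 0,
}


def evaluate_steps(trace: list[Step]) -> dict[str, bool]:
    """Score every step; a step with no registered check counts as a failure."""
    results = {}
    for step in trace:
        check = STEP_CHECKS.get(step.name)
        results[step.name] = bool(check and check(step.output))
    return results


trace = [
    Step("search_refunds", []),
    Step("lookup_customer_tier", "gold"),  # not a known tier
    Step("issue_refund", {"amount_cents": 12840}),
]
report = evaluate_steps(trace)
```

Here the refund was issued, so an end-to-end smoke test would pass, while `report` flags `lookup_customer_tier` as the step that went wrong.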

Process

Week by week.

  01 · Week 1–2 · Tooling map
    Define tools, typed schemas, side effects, and rollback paths.

  02 · Week 3–6 · Build agent
    Orchestration graph, step-level traces, replay infra.

  03 · Week 7–9 · Evaluate
    Per-step evals, golden traces, red-team pass.

  04 · Week 10–12 · Ship & handoff
    Production rollout, runbooks, rotation guide.
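The step-level traces and replay infra mentioned in the build phase can be pictured roughly as follows. This is a simplified sketch under stated assumptions: `TraceRecorder`, `TraceReplayer`, and `resolve_ticket` are hypothetical names, and a production setup would more likely emit spans to a tracing backend than keep a list in memory.

```python
import json
from dataclasses import dataclass, field
from typing import Any, Callable


@dataclass
class TraceRecorder:
    """Wraps live tool calls and appends each one to an in-memory trace."""
    tools: dict[str, Callable[..., Any]]
    events: list[dict] = field(default_factory=list)

    def call(self, name: str, **kwargs: Any) -> Any:
        result = self.tools[name](**kwargs)
        self.events.append({"tool": name, "args": kwargs, "result": result})
        return result


@dataclass
class TraceReplayer:
    """Serves recorded results in order instead of touching real systems."""
    events: list[dict]
    cursor: int = 0

    def call(self, name: str, **kwargs: Any) -> Any:
        if self.cursor >= len(self.events):
            raise RuntimeError("replay diverged: more calls than recorded")
        event = self.events[self.cursor]
        if event["tool"] != name or event["args"] != kwargs:
            raise RuntimeError(f"replay diverged at step {self.cursor}: {name}")
        self.cursor += 1
        return event["result"]


def resolve_ticket(runtime, customer_id: str) -> str:
    """Toy agent logic; `runtime` is either a recorder or a replayer."""
    tier = runtime.call("lookup_tier", customer_id=customer_id)
    limit = 20000 if tier == "pro" else 5000
    return runtime.call("issue_refund", customer_id=customer_id, max_cents=limit)


recorder = TraceRecorder({
    "lookup_tier": lambda customer_id: "pro",
    "issue_refund": lambda customer_id, max_cents: f"refunded up to {max_cents}",
})
live = resolve_ticket(recorder, "cus_42")

# Round-trip through JSON to mimic loading a stored production trace.
stored = json.dumps(recorder.events)
replayed = resolve_ticket(TraceReplayer(json.loads(stored)), "cus_42")
```

Because the replayer raises as soon as the agent asks for a different tool or different arguments than were recorded, a replay doubles as a regression check on the agent's decisions.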

You're a fit if
  • Workflow with clear tools, APIs, and side effects to orchestrate
  • Engineering org ready for trace + replay infra
  • Guardrails treated as first-class, not an afterthought
Probably not a fit if
  • Single-turn Q&A — use a copilot instead
  • Workflows where 'best effort' is fine and audit doesn't matter
  • Orgs unwilling to define side-effect rollback semantics

Deliverables

Everything we ship

  • 01 · Tool + API orchestration with typed schemas
  • 02 · Step-level trace + replay infrastructure
  • 03 · Per-step eval coverage, not just end-to-end
  • 04 · Rollback paths for every side-effect tool
  • 05 · Handoff package: runbooks, eval suite, rotation guide

Outcomes

What you walk away with.

≥90%
task completion on golden traces
100%
side-effect tools with a rollback path
Replay
any production run end-to-end

An agent that finishes the job — with traces you can replay, guardrails that fire when they should, and rollback when they don't.

Tooling

Stack we ship against

Model- and infra-agnostic. We adapt to your stack, not the other way around.

LangGraph · LlamaIndex · Inngest · Temporal · OpenAI · Anthropic · LangSmith · OpenTelemetry

FAQ

Real questions, technically answered.

Multi-agent vs single-agent?
We default to a single planner with typed tools. Multi-agent only when latency and decomposition genuinely demand it.
How do you prevent runaway loops?
Step limits, budget ceilings, and per-step evals — enforced in the orchestration layer, not the prompt.
Can the agent call our internal APIs?
Yes. We treat your APIs as first-class tools with typed schemas and auth scoping.
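For the runaway-loop answer above, enforcing limits in the orchestration layer could look something like this sketch. `RunLimits`, `LimitExceeded`, and `run_agent` are made-up names, and `next_step` is a stand-in for one plan/act cycle of a real agent.

```python
from dataclasses import dataclass
from typing import Callable, Optional


class LimitExceeded(Exception):
    """Raised by the orchestrator, not the model, when a ceiling is hit."""


@dataclass
class RunLimits:
    max_steps: int
    max_cost_cents: int


def run_agent(next_step: Callable[[int], Optional[int]], limits: RunLimits) -> int:
    """Drive the loop until the agent finishes or a limit trips.

    `next_step(i)` returns the cost of step i in cents, or None when the
    agent reports it is done. Returns the total spend on success.
    """
    spent = 0
    for i in range(limits.max_steps):
        cost = next_step(i)
        if cost is None:
            return spent
        spent += cost
        if spent > limits.max_cost_cents:
            raise LimitExceeded(f"budget ceiling hit after step {i}: {spent} cents")
    raise LimitExceeded(f"step limit of {limits.max_steps} reached")


limits = RunLimits(max_steps=10, max_cost_cents=100)

# A well-behaved run: four steps at 3 cents each, then done.
total = run_agent(lambda i: 3 if i < 4 else None, limits)
```

The loop stops no matter what the model says, because the counters live outside the prompt: an agent that never returns `None` hits the step limit, and one that spends too fast hits the budget ceiling first.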

Next step

Ready to scope AI Agents?

Book a discovery call. We'll confirm fit, sequence the engagement, and have a Statement of Work in your inbox within a week.

  • Refundable if we're not a fit
  • Written diagnostic in 48 hours
  • Session run by a founder, not a sales rep