theauditor, mcp, integration

TheAuditor + Warden: stop guessing, start querying

Warden ships the audit trail on token economics. TheAuditor replaces read-and-guess with a database query. The pairing isn't a coincidence.

Most AI coding agents lock you into two things: a single LLM provider, and a single view of your codebase. Warden breaks the first lock. TheAuditor breaks the second. This post is from Warden’s side — the agent-side integration mechanics, what we ship today, and why pairing with TheAuditor is the rare case where two pre-launch tools were independently designed with each other in mind.

Warden, in one paragraph

Warden is a lean, multi-provider, terminal-native LLM coding agent — single-binary CLI in Python 3.14, MIT licensed, 42,800 LOC under a strict 13-tier import DAG enforced by import-linter on every PR. 21 builtin tools, full MCP client and server mode, 22 hook event types, 6 permission modes, a 4-tier WARDEN.md memory walker. Three providers fully wired today (Anthropic, OpenAI Platform + Responses subscription, Gemini); three scaffolded (Bedrock, Vertex, Foundry) raising explicit NotImplementedError on construction so you can’t accidentally use a half-finished adapter. 24 model entries across the seven providers.

That’s the surface area. The interesting bit is the token economics underneath it.

Where Warden’s tokens go (and where they don’t)

Token economics is one of those things every agent claims to do and almost none ship the audit trail for. We do.

Prompt caching on the wire end-to-end. cache_control markers flow from the composition root through the typed system-prompt blocks, into the Anthropic adapter’s _system_to_wire, and out onto the SSE request verbatim. This came out of a recent bug-fix wave (changelog entry W2.2): a typing drift had been silently disabling Anthropic prompt caching because ApiRequestParams.system was passing list[str] instead of list[TextBlock], so the wire layer emitted plain {type: text, text} dicts with no cache_control. The types are now threaded end-to-end, and 16 new tests pin the contract. OpenAI and Gemini join system blocks into a single provider-native shape and drop cache_control with a per-provider DEBUG log so you know exactly what each adapter is doing.
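To make the typed-block contract concrete, here is a minimal sketch of the idea. The TextBlock dataclass, its cache field, and both function names are illustrative stand-ins, not Warden's actual classes; the only detail taken from the provider side is the {"type": "ephemeral"} cache_control shape that Anthropic's Messages API accepts on content blocks.

```python
from dataclasses import dataclass
from typing import Any


@dataclass(frozen=True)
class TextBlock:
    """Hypothetical typed system-prompt block (not Warden's real class)."""
    text: str
    cache: bool = False  # True marks this block as a cache breakpoint


def system_to_wire(blocks: list[TextBlock]) -> list[dict[str, Any]]:
    """Convert typed blocks into Anthropic-style system content dicts."""
    wire: list[dict[str, Any]] = []
    for block in blocks:
        item: dict[str, Any] = {"type": "text", "text": block.text}
        if block.cache:
            # The marker survives only because the block is typed;
            # a bare str has nowhere to carry it.
            item["cache_control"] = {"type": "ephemeral"}
        wire.append(item)
    return wire


def join_system(blocks: list[TextBlock]) -> str:
    """For providers without per-block markers: one joined string, markers dropped."""
    return "\n\n".join(block.text for block in blocks)
```

With list[str] in place of list[TextBlock], the cache flag has no field to live in, which is the shape of the drift described above.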

Auto-compaction at ~187K estimated tokens. Four surgical strategies fire in order: system-reminder dedup, image age strip, tool-result age decay, and mandatory orphan cleanup (the last one reconciles tool-use / tool-result pairing that earlier truncation broke). Manual /compact available any time. The compactor is not a summarizer — it’s surgical message-array editing that preserves thinking-block signatures so the next turn still cache-hits.
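The orphan-cleanup step can be sketched as follows. This is a simplified illustration, not Warden's compactor: it assumes every message carries a list of Anthropic-style content blocks (tool_use blocks with an id, tool_result blocks with a tool_use_id), and the function names are made up for the example.

```python
from typing import Any

Message = dict[str, Any]


def _keep(block: dict[str, Any], paired: set[str]) -> bool:
    """Keep non-tool blocks, plus tool blocks whose counterpart still exists."""
    if block["type"] == "tool_use":
        return block["id"] in paired
    if block["type"] == "tool_result":
        return block["tool_use_id"] in paired
    return True


def drop_orphans(messages: list[Message]) -> list[Message]:
    """Remove tool_use blocks with no result and tool_result blocks with no call."""
    use_ids: set[str] = set()
    result_ids: set[str] = set()
    for msg in messages:
        for block in msg["content"]:
            if block["type"] == "tool_use":
                use_ids.add(block["id"])
            elif block["type"] == "tool_result":
                result_ids.add(block["tool_use_id"])
    paired = use_ids & result_ids

    cleaned: list[Message] = []
    for msg in messages:
        kept = [b for b in msg["content"] if _keep(b, paired)]
        if kept:  # skip messages that the cleanup emptied
            cleaned.append({**msg, "content": kept})
    return cleaned
```

A real compactor would also have to exempt a tool call that is still awaiting its result and repair role alternation after dropping a message; both are left out here to keep the pairing logic visible.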

Hard cost governance. 12-model price table with Decimal accounting (no float drift across long sessions), configurable per-session USD cap, 80% / 100% threshold warnings. Per-turn cost lands in the REPL renderer’s status footer. The dedicated token_budget/ subsystem ships an estimator, persistor, analyzer, cache-break telemetry, and a budget tracker. The cost ledger persists per-session to JSONL so warden --resume <uuid> continues against the original cap — no double billing on resumed work.
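A rough sketch of what Decimal accounting with a session cap looks like. The price table, model name, class name, and return values below are all illustrative assumptions rather than Warden's real ledger; only the Decimal arithmetic and the 80% / 100% thresholds come from the description above.

```python
from decimal import Decimal

# Illustrative prices in USD per million tokens; not Warden's real table.
PRICES = {"example-model": {"input": Decimal("3.00"), "output": Decimal("15.00")}}
MILLION = Decimal(1_000_000)


class BudgetTracker:
    """Hypothetical per-session ledger with a hard USD cap."""

    def __init__(self, cap_usd: Decimal) -> None:
        self.cap = cap_usd
        self.spent = Decimal("0")

    def record(self, model: str, input_tokens: int, output_tokens: int) -> str:
        """Add one turn's cost and report which threshold, if any, was crossed."""
        price = PRICES[model]
        cost = (price["input"] * input_tokens + price["output"] * output_tokens) / MILLION
        self.spent += cost
        if self.spent >= self.cap:
            return "cap_reached"
        if self.spent >= self.cap * Decimal("0.8"):
            return "warn_80"
        return "ok"
```

Because every amount stays a Decimal, the running total after thousands of turns is exactly the sum of the per-turn costs, which is what makes resuming against a persisted cap safe.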

Net effect: an honest per-turn token bill, observable cache behavior, and a hard ceiling on cost. That’s the cake. What TheAuditor adds is the icing — and it’s the part that matters when the model would otherwise re-read the same 2,000 lines for the third time in a debugging session.

Where TheAuditor comes in

Every LLM coding agent suffers from the same failure mode: read 2,000 lines of code, infer a call graph from indentation and comments, hallucinate three of the relationships, write a “fix” that quietly breaks two other files. TheAuditor replaces that read-and-guess loop with a sub-millisecond database query.

Their integration spec (which lives in our repo as architecture/24-theauditor-integration.md) puts the savings at 85–95% token reduction on common investigation flows. The per-call comparison isn’t always token-negative — calling aud_explain on a file can cost more tokens than naively Read-ing the file once. The win is the eliminated re-reads, mis-edits, “let me just check this neighbouring file too” rabbit holes, and refactors that don’t match how the codebase actually calls into the symbol you’re touching.

The honest framing: little or no token reduction per call, huge hallucination reduction. And once hallucinations stop, the multi-thousand-token recovery loops they cause stop with them. That’s where the 85–95% number comes from.

Concrete numbers from TheAuditor’s MCP-tool token-optimization audit (per-call wire bytes):

Target                             JSON before   JSON after   Δ
TS file (985 properties)           28,628        18,071       -36.9%
Python file (469 symbols)          17,072        9,085        -46.8%
Class symbol (15 callers, dups)    8,334         3,630        -56.4%
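The Δ column is plain relative change, (after - before) / before. A small check that recomputes it from the before and after figures quoted above (the function name is ours, chosen for the example):

```python
from decimal import Decimal, ROUND_HALF_UP


def reduction_pct(before: int, after: int) -> Decimal:
    """Relative change from before to after, as a percentage with one decimal."""
    change = Decimal(after - before) / Decimal(before) * 100
    return change.quantize(Decimal("0.1"), rounding=ROUND_HALF_UP)


rows = {
    "TS file (985 properties)": (28_628, 18_071),
    "Python file (469 symbols)": (17_072, 9_085),
    "Class symbol (15 callers, dups)": (8_334, 3_630),
}
for target, (before, after) in rows.items():
    print(f"{target}: {reduction_pct(before, after)}%")
```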

Coverage is honest too: 100% True Positive Rate at 0% False Positive Rate on OWASP Java (11/11), OWASP Python, and OWASP Juice Shop (31/31). No risk scores, no subjective ratings — facts. Twelve languages with parity across indexing / taint / CFG / call graph / rules.

The integration mechanics (Warden-side)

The pairing surface is small because the architecture is right. Warden was built MCP-first from the start; TheAuditor exposes its facts via a standard MCP stdio server. No special-casing required.

warden install --with-code-intel writes a .mcp.json snippet pointing at TheAuditor’s aud-mcp stdio server, plus a SessionStart hook that runs aud full --offline --fast in the background. The model gets fresh database state on every connect. The 8 MCP tools — aud_explain, aud_query, aud_findings, aud_impact, aud_blueprint, aud_session, aud_reindex, aud_analytics — surface in Warden’s tool pool under the same permission grammar as builtins. No custom permission rules to write.
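For orientation, a sketch of the kind of snippet the installer might write. The mcpServers layout is the convention several MCP clients use for stdio servers; whether Warden's .mcp.json matches it field for field is an assumption here, and the "theauditor" server key and empty args list are placeholders. Only the aud-mcp command name comes from the text above.

```python
import json


def code_intel_config(command: str = "aud-mcp") -> dict:
    """Build a stdio MCP server entry (assumed schema, hypothetical server key)."""
    return {
        "mcpServers": {
            "theauditor": {
                "command": command,  # stdio server binary named in the post
                "args": [],          # placeholder; real arguments are not documented here
            }
        }
    }


print(json.dumps(code_intel_config(), indent=2))
```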

MCP prompts → Warden skills. TheAuditor’s /theauditor:planning, /theauditor:security, /theauditor:impact slash commands surface in Warden as /theauditor:* skills via our existing fetch_mcp_skills_for_connection bridge. Zero code on either side — Warden’s MCP-prompt-to-skill converter already does the work. List them with /help.

Context Gate (optional). If you want to be aggressive about it, a PreToolUse hook can hard-block Edit / Write until the model has called aud_explain on the target file in the current session. The hook is six lines of bash + a JSON decision; subprocess transport, exit code = decision. The policy is yours — Warden enforces, you author.
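The gate logic is small enough to sketch. The real hook is described as bash; this version is in Python for readability, and every payload field (tool_name, tool_input.file_path, explained_files), the decision JSON shape, and the exit codes 0 and 2 are assumptions made for illustration, not Warden's documented hook contract.

```python
import json
import sys

GATED_TOOLS = {"Edit", "Write"}


def decide(event: dict, explained_files: set[str]) -> tuple[int, dict]:
    """Return (exit_code, decision) for one PreToolUse event."""
    tool = event.get("tool_name")
    path = event.get("tool_input", {}).get("file_path")
    if tool in GATED_TOOLS and path not in explained_files:
        # Assumed convention: a non-zero exit code blocks the tool call.
        return 2, {"decision": "block", "reason": f"call aud_explain on {path} first"}
    return 0, {"decision": "allow"}


def main() -> int:
    """Read one event from stdin, print the decision, return the exit code."""
    event = json.load(sys.stdin)
    # Hypothetical: the event lists files already explained in this session.
    explained = set(event.get("explained_files", []))
    code, decision = decide(event, explained)
    print(json.dumps(decision))
    return code
```

A standalone hook script would end with sys.exit(main()). Keeping decide() free of I/O means the policy can be unit-tested without spawning a subprocess.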

Cost ledger continuity. TheAuditor’s MCP tool calls cost tokens like any other tool. They land in the same per-model Decimal ledger, the same /cost output, the same per-session USD cap. Compaction skips tool_use / tool_result pairs that are still being acted on; older aud_explain results decay per the standard tool-result age strategy.

Set it up in three commands

pip install warden theauditor                 # both are pip-installable, Python 3.14+

cd your-project
aud full --offline                            # index — 30s for small projects, 10 min for 100K+ LOC
warden install --with-code-intel              # writes .mcp.json + SessionStart hook

Open a Warden session. The model gets TheAuditor’s 8 MCP tools on first invocation. Type a prompt. Watch the model call aud_explain instead of Read on the file it’s about to edit, and watch your token bill drop. The first turn after aud full --offline is the cache-warming turn; subsequent turns hit the prompt cache and the database both.

Honest disclaimers

Both projects are pre-launch. We don’t hide that — the README opens with the disclosures.

  • Warden is Pre-Alpha (v0.1.0). APIs and on-disk layouts may shift between audit-coherence waves. Three providers fully wired; three scaffolded. WebSearch is Anthropic-only today. The --permission-mode CLI flag is currently accepted and then discarded inside bootstrap.py via _ = permission_mode (the comment block at lines 408-414 explains the not-yet-wired override path). Until that is wired, set the mode via the runtime /plan command or via permissions.default_mode in settings.json. Telemetry is off by default at the FULL tier, env-killable via WARDEN_TELEMETRY=off, schema-allowlisted server-side. License activation hits a small first-party endpoint at api.wardenclient.com; the full data contract is documented in docs/telemetry.md.
  • TheAuditor binary hasn’t shipped publicly yet. Python source is being packaged via Nuitka with SQLCipher-encrypted analysis databases. Validation against OWASP Java / Python and Juice Shop benchmarks is complete (100% TPR / 0% FPR on all three). Public binary lands when adversarial-string-scan checks on the compiled artifact all pass.

What ships, ships. No vaporware promises.

Read the other side

TheAuditor wrote the complementary post from their angle — “Pair Warden with TheAuditor” — focused on the database side: what facts are pre-computed, where the 85–95% number actually comes from, and the OWASP TPR/FPR methodology.

Subscribe via the signup form on the main site for launch notifications. One email when v0.1.0 ships and you can try it. No marketing fluff.