TheAuditor + Warden: stop guessing, start querying
Warden ships the audit trail on token economics. TheAuditor replaces read-and-guess with a database query. The pairing isn't a coincidence.
Most AI coding agents lock you into two things: a single LLM provider, and a single view of your codebase. Warden breaks the first lock. TheAuditor breaks the second. This post is from Warden’s side — the agent-side integration mechanics, what we ship today, and why pairing with TheAuditor is the rare case where two pre-launch tools were independently designed with each other in mind.
Warden, in one paragraph
Warden is a lean, multi-provider, terminal-native LLM coding agent — single-binary
CLI in Python 3.14, MIT licensed, 42,800 LOC under a strict 13-tier import DAG
enforced by import-linter on every PR. 21 builtin tools, full MCP client and
server mode, 22 hook event types, 6 permission modes, a 4-tier WARDEN.md memory
walker. Three providers fully wired today (Anthropic, OpenAI Platform + Responses
subscription, Gemini); three scaffolded (Bedrock, Vertex, Foundry) raising explicit
NotImplementedError on construction so you can’t accidentally use a half-finished
adapter. 24 model entries across the seven providers.
That’s the surface area. The interesting bit is the token economics underneath it.
Where Warden’s tokens go (and where they don’t)
Token economics is one of those things every agent claims to do and almost none ship the audit trail for. We do.
Prompt caching on the wire end-to-end. cache_control markers flow from the
composition root through the typed system-prompt blocks, into the Anthropic
adapter’s _system_to_wire, and out onto the SSE request verbatim. This was a
recent bug-fix wave (changelog entry W2.2): a typing drift had been silently
disabling Anthropic prompt cache because ApiRequestParams.system was passing
list[str] instead of list[TextBlock], and the wire layer was emitting plain
{type: text, text} dicts with no cache_control. Threaded typed end-to-end now;
+16 tests pin the contract. OpenAI and Gemini join system blocks into a single
provider-native shape and drop cache_control with a per-provider DEBUG log so
you know exactly what each adapter is doing.
Auto-compaction at ~187K estimated tokens. Four surgical strategies fire in
order: system-reminder dedup, image age strip, tool-result age decay, and
mandatory orphan cleanup (the last one reconciles tool-use / tool-result pairing
that earlier truncation broke). Manual /compact available any time. The compactor
is not a summarizer — it’s surgical message-array editing that preserves
thinking-block signatures so the next turn still cache-hits.
Hard cost governance. 12-model price table with Decimal accounting (no
float drift across long sessions), configurable per-session USD cap, 80% / 100%
threshold warnings. Per-turn cost lands in the REPL renderer’s status footer.
The dedicated token_budget/ subsystem ships an estimator, persistor, analyzer,
cache-break telemetry, and a budget tracker. The cost ledger persists per-session
to JSONL so warden --resume <uuid> continues against the original cap — no
double billing on resumed work.
Net effect: an honest per-turn token bill, observable cache behavior, and a hard ceiling on cost. That’s the cake. What TheAuditor adds is the icing — and it’s the part that matters when the model would otherwise re-read the same 2,000 lines for the third time in a debugging session.
Where TheAuditor comes in
Every LLM coding agent suffers from the same failure mode: read 2,000 lines of code, infer a call graph from indentation and comments, hallucinate three of the relationships, write a “fix” that quietly breaks two other files. TheAuditor replaces that read-and-guess loop with a sub-millisecond database query.
Their integration spec (which lives in our repo as
architecture/24-theauditor-integration.md) puts the savings at
85–95% token reduction on common investigation flows. The per-call comparison
isn’t always token-negative — calling aud_explain on a file can cost more
tokens than naively Read-ing the file once. The win is the eliminated re-reads,
mis-edits, “let me just check this neighbouring file too” rabbit holes, and
refactors that don’t match how the codebase actually calls into the symbol you’re
touching.
The honest framing: tiny token reduction per call, huge hallucination reduction. And once hallucinations stop, the multi-thousand-token recovery loops they cause stop with them. That’s where the 85–95% number comes from.
Concrete numbers from TheAuditor’s MCP-tool token-optimization audit (per-call wire bytes):
| Target | JSON before | JSON after | Δ |
|---|---|---|---|
| TS file (985 properties) | 28,628 | 18,071 | -36.9% |
| Python file (469 symbols) | 17,072 | 9,085 | -46.8% |
| Class symbol (15 callers, dups) | 8,334 | 3,630 | -56.4% |
Coverage is honest too: 100% True Positive Rate at 0% False Positive Rate on OWASP Java (11/11), OWASP Python, and OWASP Juice Shop (31/31). No risk scores, no subjective ratings — facts. Twelve languages with parity across indexing / taint / CFG / call graph / rules.
The integration mechanics (Warden-side)
The pairing surface is small because the architecture is right. Warden was built MCP-first from the start; TheAuditor exposes its facts via a standard MCP stdio server. No special-casing required.
warden install --with-code-intel writes a .mcp.json snippet pointing at
TheAuditor’s aud-mcp stdio server, plus a SessionStart hook that runs
aud full --offline --fast in the background. The model gets fresh database
state on every connect. The 8 MCP tools — aud_explain, aud_query,
aud_findings, aud_impact, aud_blueprint, aud_session, aud_reindex,
aud_analytics — surface in Warden’s tool pool under the same permission grammar
as builtins. No custom permission rules to write.
MCP prompts → Warden skills. TheAuditor’s /theauditor:planning,
/theauditor:security, /theauditor:impact slash commands surface in Warden as
/theauditor:* skills via our existing fetch_mcp_skills_for_connection bridge.
Zero code on either side — Warden’s MCP-prompt-to-skill converter already does
the work. List them with /help.
Context Gate (optional). If you want to be aggressive about it, a
PreToolUse hook can hard-block Edit / Write until the model has called
aud_explain on the target file in the current session. The hook is six lines
of bash + a JSON decision; subprocess transport, exit code = decision. The policy
is yours — Warden enforces, you author.
Cost ledger continuity. TheAuditor’s MCP tool calls cost tokens like any
other tool. They land in the same per-model Decimal ledger, the same
/cost output, the same per-session USD cap. Compaction skips tool_use /
tool_result pairs that are still being acted on; older aud_explain results
decay per the standard tool-result age strategy.
Set it up in three commands
pip install warden theauditor # both are pip-installable, Python 3.14+
cd your-project
aud full --offline # index — 30s for small projects, 10 min for 100K+ LOC
warden install --with-code-intel # writes .mcp.json + SessionStart hook
Open a Warden session. The model gets TheAuditor’s 8 MCP tools on first invocation.
Type a prompt. Watch the model call aud_explain instead of Read on the file
it’s about to edit, and watch your token bill drop. The first turn after
aud full --offline is the cache-warming turn; subsequent turns hit the prompt
cache and the database both.
Honest disclaimers
Both projects are pre-launch. We don’t hide that — the README opens with the disclosures.
- Warden is Pre-Alpha (v0.1.0). APIs and on-disk layouts may shift between
audit-coherence waves. Three providers fully wired; three scaffolded. WebSearch
is Anthropic-only today. The
--permission-modeCLI flag is currently dropped on the floor insidebootstrap.py(the parameter is accepted then discarded via_ = permission_mode— see the comment block at lines 408-414 explaining the not-yet-wired override path) — set the mode via the runtime/plancommand instead, or viapermissions.default_modeinsettings.json. Telemetry is off by default at the FULL tier, env-killable viaWARDEN_TELEMETRY=off, schema-allowlisted server-side. License activation hits a small first-party endpoint atapi.wardenclient.com; the full data contract is documented indocs/telemetry.md. - TheAuditor binary hasn’t shipped publicly yet. Python source is being packaged via Nuitka with SQLCipher-encrypted analysis databases. Validation against OWASP Java / Python and Juice Shop benchmarks is complete (100% TPR / 0% FPR on all three). Public binary lands when adversarial-string-scan checks on the compiled artifact all pass.
What ships, ships. No vaporware promises.
Read the other side
TheAuditor wrote the complementary post from their angle — “Pair Warden with TheAuditor” — focused on the database side: what facts are pre-computed, where the 85–95% number actually comes from, and the OWASP TPR/FPR methodology.
Subscribe via the signup form on the main site for launch notifications. One email when v0.1.0 ships and you can try it. No marketing fluff.