TheAuditor + Warden: stop guessing, start querying

Most AI coding agents lock you into two things: a single LLM provider, and a single guessed view of your codebase. Warden gives you provider freedom. TheAuditor gives your agent facts. This post is from Warden’s side: the agent-side integration, what we ship today, and why pairing with TheAuditor is the rare case where two pre-launch tools were designed, independently, with each other in mind.

Warden, in one paragraph

Warden is a lean, multi-provider, terminal-native LLM coding agent: a single-binary CLI, MIT licensed. It ships builtin tools, a full MCP client and server, session persistence, hooks, permission modes, and a memory walker. Anthropic, OpenAI, and Gemini are wired today, with more scaffolded behind explicit guards so you can’t reach for a half-finished adapter by accident. That is the surface area. The interesting part is the token economics underneath.

Where Warden’s tokens go, and where they don’t

Every agent claims to do token economics. Almost none show you the bill. We do.

Prompt caching runs end to end, so stable context is not re-billed every turn, and each provider’s cache behavior is logged so you can see exactly what happened.

Auto-compaction keeps a long session inside its window without summarizing away the parts that matter, and it preserves what the next turn needs to keep hitting the cache. You can also compact on demand.
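
To make the cache-preserving idea concrete, here is a minimal sketch of one possible compaction policy. It is not Warden's implementation. The `Message` type, the `compact` function, the `keep_head` and `keep_tail` parameters, and the stubbed summary are all hypothetical. The assumption is that prompt caches match on a stable prefix, so the head of the history is left untouched while the middle is collapsed.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Message:
    role: str
    text: str
    tokens: int


def compact(history, budget, keep_head=1, keep_tail=2):
    """Collapse the middle of `history` when it exceeds `budget` tokens.

    Hypothetical policy: keep the first `keep_head` messages unchanged (the
    cacheable prefix), keep the last `keep_tail` messages (what the next turn
    needs), and replace everything in between with one summary message.
    """
    total = sum(m.tokens for m in history)
    if total <= budget or len(history) <= keep_head + keep_tail:
        return list(history)

    split = len(history) - keep_tail
    head = history[:keep_head]
    middle = history[keep_head:split]
    tail = history[split:]

    # A real agent would ask a model for the summary; a stub keeps this runnable.
    summary = Message(
        role="system",
        text=f"[summary of {len(middle)} earlier messages]",
        tokens=max(1, sum(m.tokens for m in middle) // 10),
    )
    return head + [summary] + tail
```

Because the head is returned unchanged, a provider that caches by prefix can still match it on the next turn; only the summarized middle has to be billed fresh.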

Cost governance is hard, not decorative: exact per-model accounting with no drift across long sessions, a configurable per-session spending cap, and warnings as you approach it, with the running bill shown after every turn. The ledger persists, so a resumed session continues against the original cap and never double-bills work you already paid for.
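
The shape of such a ledger can be sketched in a few lines. This is an illustration under stated assumptions, not Warden's code. The `Ledger` class, the `CapExceeded` error, the 80% warning threshold, the JSON layout, and the prices in the usage below are all hypothetical. It uses `Decimal` so that repeated additions do not drift the way floats can, and it refuses a charge that would cross the cap.

```python
import json
from decimal import Decimal
from pathlib import Path


class CapExceeded(RuntimeError):
    """Raised when a charge would push the session past its cap."""


class Ledger:
    def __init__(self, cap_usd, warn_at="0.8"):
        # Pass money and rates as strings so Decimal stays exact.
        self.cap = Decimal(cap_usd)
        self.warn_at = Decimal(warn_at)
        self.spent = {}  # model name -> Decimal dollars

    @property
    def total(self):
        return sum(self.spent.values(), Decimal("0"))

    def charge(self, model, tokens, usd_per_million):
        cost = Decimal(tokens) * Decimal(usd_per_million) / Decimal(1_000_000)
        if self.total + cost > self.cap:
            raise CapExceeded(f"charge of {cost} would exceed cap {self.cap}")
        self.spent[model] = self.spent.get(model, Decimal("0")) + cost
        return "warn" if self.total >= self.cap * self.warn_at else "ok"

    def save(self, path):
        data = {
            "cap": str(self.cap),
            "warn_at": str(self.warn_at),
            "spent": {m: str(v) for m, v in self.spent.items()},
        }
        Path(path).write_text(json.dumps(data))

    @classmethod
    def load(cls, path):
        data = json.loads(Path(path).read_text())
        ledger = cls(data["cap"], data["warn_at"])
        ledger.spent = {m: Decimal(v) for m, v in data["spent"].items()}
        return ledger
```

A resumed session would call `Ledger.load` and keep charging against the same cap, which is the property that prevents double-billing. A real agent only learns the exact cost after the provider responds, so checking the cap before recording the charge is a simplification here.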

Net effect: an honest per-turn token bill, observable cache behavior, and a hard ceiling on cost. That is the cake. What TheAuditor adds is the part that matters when the model would otherwise re-read the same two thousand lines for the third time in a debugging session.

Where TheAuditor comes in

Every LLM coding agent has the same failure mode: read a couple thousand lines, infer a call graph from indentation and comments, hallucinate a few of the relationships, and write a “fix” that quietly breaks two other files. TheAuditor replaces that read-and-guess loop with a query that returns in well under a millisecond.

Asking TheAuditor about a file is not always token-negative on its own. The win is everything you stop paying for: the re-reads, the mis-edits, the “let me just check this neighbouring file too” detours, and the refactors that don’t match how the codebase actually calls into the symbol you are touching. Once the hallucinations stop, the multi-thousand-token recovery loops they cause stop with them.

Per call, the agent receives facts instead of a source dump it has to re-parse, and payloads come back 35 to 55% smaller than an unoptimized baseline. Across a full investigation flow, with eliminated re-reads and cache hits stacked on top, TheAuditor models the realistic aggregate at an 85 to 95% token reduction. Treat the upper bound as a model output, not a guarantee.
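
The way a per-call saving and eliminated re-reads combine is simple arithmetic, and it can be sketched so you can plug in your own numbers. Everything here is hypothetical: the function name, its parameters, and especially the figures in the usage note, which are placeholders chosen to show the arithmetic and not measurements from TheAuditor or Warden.

```python
def aggregate_reduction(reads, tokens_per_read, queries,
                        raw_payload_tokens, per_call_saving):
    """Fractional token reduction of a query-based flow versus a read-based one.

    baseline  = reads * tokens_per_read              (re-reading source)
    optimized = queries * raw_payload_tokens * (1 - per_call_saving)
    A negative result means the query flow cost more than the reads did.
    """
    baseline = reads * tokens_per_read
    optimized = queries * raw_payload_tokens * (1 - per_call_saving)
    return 1 - optimized / baseline


# Placeholder scenario: three full reads of a large file versus five queries.
many_rereads = aggregate_reduction(
    reads=3, tokens_per_read=20_000,
    queries=5, raw_payload_tokens=2_000, per_call_saving=0.45,
)

# Placeholder scenario: one small read versus one query. The query loses.
single_small_read = aggregate_reduction(
    reads=1, tokens_per_read=1_000,
    queries=1, raw_payload_tokens=2_000, per_call_saving=0.45,
)
```

With these placeholder inputs the first scenario lands at roughly a 91% reduction and the second comes out negative. The second case is the "not always token-negative on its own" situation, and the aggregate figure depends almost entirely on how many re-reads the queries actually replace.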

Coverage is honest too: a 100% true positive rate at a 0% false positive rate on OWASP Java (11 of 11), OWASP Python, and OWASP Juice Shop (31 of 31). Twelve languages are covered, with parity on indexing, taint, call graph, and rules. No risk scores, no subjective ratings. Facts.

The integration mechanics, Warden-side

The pairing surface is small because both sides were built MCP-first. TheAuditor exposes its facts over a standard MCP server, and no special-casing is required.

warden install --with-code-intel points Warden at TheAuditor’s MCP server and adds a session hook that refreshes the index in the background, so the model gets current facts on every connect. Its tools surface in Warden’s pool under the same permission grammar as builtins, with no custom rules to write.
Slash commands carry over. TheAuditor’s /theauditor:planning, /theauditor:security, and /theauditor:impact show up in Warden with zero code on either side. List them with /help.
An optional context gate. A pre-edit hook can block Edit and Write until the model has actually asked TheAuditor about the file in the current session. Warden enforces; you author the policy.
One cost ledger. TheAuditor’s tool calls cost tokens like any other tool, and they land in the same per-model ledger, the same /cost output, and the same per-session cap.
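
The context gate is the easiest of these to picture as code. Below is a minimal sketch of the policy logic only, assuming a hook receives a tool name and a file path. That event shape, the `ContextGate` class, and its method names are hypothetical, not Warden's hook API. `Edit`, `Write`, `Read`, and `aud_explain` are the tool names used in this post.

```python
import os


class ContextGate:
    """Block Edit and Write on a file until it has been queried this session."""

    GATED_TOOLS = {"Edit", "Write"}

    def __init__(self, query_tools=("aud_explain",)):
        self.query_tools = set(query_tools)
        self.queried = set()  # normalized paths queried in the current session

    def on_tool_call(self, tool, path):
        """Return (allowed, reason) for a single tool call."""
        normalized = os.path.normpath(path)
        if tool in self.query_tools:
            self.queried.add(normalized)
            return True, "recorded"
        if tool in self.GATED_TOOLS and normalized not in self.queried:
            return False, f"blocked: ask TheAuditor about {normalized} first"
        return True, "allowed"


gate = ContextGate()
first_edit = gate.on_tool_call("Edit", "src/app.py")      # blocked, not queried yet
query = gate.on_tool_call("aud_explain", "./src/app.py")  # records the file
second_edit = gate.on_tool_call("Edit", "src/app.py")     # now allowed
```

The set of queried paths lives only as long as the gate object, which matches the "in the current session" rule: a new session starts with an empty set and has to ask again.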

Set it up in three commands

pip install warden theauditor                 # both are pip-installable, Python 3.14+

cd your-project
aud full --offline                            # index: 30s for small projects, 10 min for 100K+ LOC
warden install --with-code-intel              # writes .mcp.json + SessionStart hook

Open a Warden session. The model gets TheAuditor’s MCP tools on first use. Type a prompt, watch it call aud_explain instead of Read on the file it is about to edit, and watch your token bill drop. The first turn after aud full --offline warms the cache; later turns hit both the prompt cache and the database.

Honest disclaimers

Both projects are pre-launch. We don’t hide it.

Warden is pre-alpha (v0.1.0). APIs and on-disk layouts may shift between waves. Three providers are fully wired and three are scaffolded. WebSearch is Anthropic-only today. The --permission-mode command-line flag is not wired yet, so set the mode with the /plan command or in settings. Telemetry is off by default at the full tier and can be killed with an environment variable.
TheAuditor’s binary hasn’t shipped publicly yet. Validation against the OWASP Java, Python, and Juice Shop benchmarks is complete in our internal runs. The public binary lands when the hardening checks on the compiled artifact all pass.

What ships, ships. No vaporware promises.

Read the other side

TheAuditor wrote the complementary post from their angle, “Pair Warden with TheAuditor”, focused on the database side: what facts are pre-computed and where the numbers come from.

Subscribe via the signup form on the main site for launch notifications. One email when v0.1.0 ships. No marketing fluff.