Cost Control for Agent Fleets

Why Agent Costs Spiral

Unlike single-call LLM usage, agents can iterate many times per task, spawn sub-agents, and run in parallel. A poorly bounded agent can consume thousands of tokens on what should be a 100-token task.

Cost control for agents requires three things: budgets, model selection logic, and circuit breakers.

Budget Architecture

Set budgets at multiple levels:

Per task — maximum tokens per task invocation
Per session — maximum cost per conversation
Per day — daily spending limit with automatic cutoff
Per agent — some agents cost more than others; budget them separately

In clawd, the limits are $10/day and $50/week. A session that exceeds $0.50 triggers an alert; one that exceeds $2.00 is killed.
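As a rough illustration of how these levels might combine, here is a minimal sketch. The dollar figures come from the text; the names `BudgetLimits`, `Action`, and `check_budget` are made up for the example, and the rule that hitting the daily or weekly cap kills the session is an assumption about how "automatic cutoff" behaves.

```python
from dataclasses import dataclass
from enum import Enum


class Action(Enum):
    CONTINUE = "continue"
    ALERT = "alert"
    KILL = "kill"


@dataclass(frozen=True)
class BudgetLimits:
    # Figures from the text; field names are illustrative.
    session_alert_usd: float = 0.50
    session_kill_usd: float = 2.00
    daily_usd: float = 10.00
    weekly_usd: float = 50.00


def check_budget(session_usd: float, day_usd: float, week_usd: float,
                 limits: BudgetLimits = BudgetLimits()) -> Action:
    """Return the action to take given spend so far at each level."""
    # Assumption: reaching the daily or weekly cap stops everything.
    if day_usd >= limits.daily_usd or week_usd >= limits.weekly_usd:
        return Action.KILL
    # Session thresholds: kill takes priority over alert.
    if session_usd > limits.session_kill_usd:
        return Action.KILL
    if session_usd > limits.session_alert_usd:
        return Action.ALERT
    return Action.CONTINUE
```

Calling `check_budget` after every model call, rather than once per task, is what lets the hard limits actually interrupt a runaway session.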

Model Routing

Not every task needs the most capable (most expensive) model. Route intelligently:

| Task Type | Model | Cost |
|-----------|-------|------|
| Bulk classification | Haiku | $ |
| Code generation | Sonnet | $$ |
| Architecture review | Opus | $$$ |
| Factual Q&A | Groq/Llama | Free |

The free fleet (Groq, Ollama, Cloudflare Workers AI) handles a surprising fraction of tasks — try free first.
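One way to express "try free first" is an ordered candidate list with fallback. The sketch below is only an illustration: the model identifiers are placeholder labels rather than real API model names, `call_model` is a hypothetical callable supplied by the caller, and the convention that it returns `None` for an unacceptable result is an assumption.

```python
from typing import Callable, Optional

# Mapping taken from the routing table; keys and values are illustrative labels.
ROUTES = {
    "bulk_classification": "haiku",
    "code_generation": "sonnet",
    "architecture_review": "opus",
    "factual_qa": "groq-llama",
}

FREE_MODELS = ["groq-llama"]  # hypothetical identifier for the free fleet


def candidates(task_type: str) -> list[str]:
    """Free models first, then the paid model assigned to this task type."""
    paid = ROUTES.get(task_type, "sonnet")  # assumption: mid-tier default
    # dict.fromkeys drops duplicates while keeping order.
    return list(dict.fromkeys(FREE_MODELS + [paid]))


def run(task_type: str, prompt: str,
        call_model: Callable[[str, str], Optional[str]]) -> tuple[str, str]:
    """Try each candidate in order; return (model, result) for the first success."""
    for model in candidates(task_type):
        result = call_model(model, prompt)
        if result is not None:  # None means "not good enough, escalate"
            return model, result
    raise RuntimeError(f"no model handled task type {task_type!r}")
```

The quality check hidden inside `call_model` is the hard part in practice; without one, the router never escalates past the free tier.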

Caching Agent Outputs

Identical or near-identical agent invocations should return cached results. Cache at the task level, not just the LLM call level. A research agent querying the same topic twice within 24 hours should return the cached result.
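A task-level cache with a 24-hour window could look roughly like the following. This is a sketch under assumptions: `TaskCache` and `run_cached` are invented names, storage is an in-memory dict, and "near-identical" is approximated by lowercasing and collapsing whitespace, which a real system would likely need to improve on.

```python
import hashlib
import time
from typing import Callable, Optional

TTL_SECONDS = 24 * 60 * 60  # the 24-hour window from the text


class TaskCache:
    def __init__(self, ttl: float = TTL_SECONDS,
                 clock: Callable[[], float] = time.time):
        self._ttl = ttl
        self._clock = clock  # injectable so expiry can be tested
        self._store: dict[str, tuple[float, str]] = {}

    @staticmethod
    def key(agent: str, task: str) -> str:
        # Assumption: case and whitespace differences do not change the task.
        normalized = " ".join(task.lower().split())
        return hashlib.sha256(f"{agent}\n{normalized}".encode()).hexdigest()

    def get(self, agent: str, task: str) -> Optional[str]:
        entry = self._store.get(self.key(agent, task))
        if entry is None:
            return None
        stored_at, result = entry
        if self._clock() - stored_at >= self._ttl:
            return None  # expired
        return result

    def put(self, agent: str, task: str, result: str) -> None:
        self._store[self.key(agent, task)] = (self._clock(), result)


def run_cached(cache: TaskCache, agent: str, task: str,
               run_agent: Callable[[str], str]) -> str:
    """Return a cached task result if fresh, otherwise run the agent and store it."""
    cached = cache.get(agent, task)
    if cached is not None:
        return cached
    result = run_agent(task)
    cache.put(agent, task, result)
    return result
```

Because the key covers the whole task rather than a single prompt, a cache hit skips every LLM call the agent would have made, not just the first one.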

Kill Switches

Every long-running agent needs a kill switch: a mechanism to stop execution gracefully. Test it before deploying. An agent you can't stop is an agent you shouldn't deploy.
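A graceful kill switch can be as simple as a flag checked between steps. The sketch below uses `threading.Event` from the Python standard library; `run_agent_loop`, the `step` callable, and the `max_steps` cap are hypothetical names chosen for the example.

```python
import threading
from typing import Callable


def run_agent_loop(step: Callable[[int], bool], stop: threading.Event,
                   max_steps: int = 100) -> str:
    """Run step(i) until it reports completion, stop is set, or max_steps is hit."""
    for i in range(max_steps):
        # Check between steps so a stop request never interrupts a step halfway.
        if stop.is_set():
            return "stopped"
        if step(i):  # step returns True when the task is finished
            return "done"
    return "step_limit"
```

Testing the switch then means setting the event from another thread (or from inside a step, as a unit test might) and confirming the loop returns `"stopped"` instead of running on.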

Monitor cost in real time: alert at 50%, 80%, and 100% of budget, and cut off automatically at 110%.
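The threshold scheme can be sketched as a small monitor that reports each alert once as spend accumulates. The percentages come from the text; the class name `BudgetMonitor`, the event strings, and the choice to fire each alert only once are assumptions made for illustration.

```python
ALERT_PERCENTS = (50, 80, 100)
CUTOFF_PERCENT = 110


class BudgetMonitor:
    def __init__(self, budget_usd: float):
        self.budget_usd = budget_usd
        self.spent_usd = 0.0
        self._fired: set[int] = set()

    def record(self, cost_usd: float) -> list[str]:
        """Add a cost and return the events newly triggered by it."""
        self.spent_usd += cost_usd
        events = []
        for pct in ALERT_PERCENTS:
            # Compare spent/budget >= pct/100 without dividing.
            crossed = self.spent_usd * 100 >= pct * self.budget_usd
            if crossed and pct not in self._fired:
                self._fired.add(pct)  # each alert fires only once
                events.append(f"alert:{pct}%")
        if self.spent_usd * 100 >= CUTOFF_PERCENT * self.budget_usd:
            events.append("cutoff")  # caller should trigger the kill switch
        return events
```

The gap between the 100% alert and the 110% cutoff leaves room for an in-flight call to finish before the agent is stopped.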
