Operational failure
Cost Runaway
AI systems consuming far more resources than expected through retries, loops, long context, or excessive tool calls.
What failed
Cost runaway occurs when an AI system consumes far more resources than expected. In LLM and agent systems, this can happen through excessive token usage, repeated retries, large context windows, unnecessary tool calls, inefficient retrieval, long-running agent loops, or cascading workflows in which one step triggers many further model or tool calls.
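A simple cost model shows why long-running loops and growing context windows compound each other. This is a sketch under assumptions that do not come from the source: every model call re-sends the full accumulated context, each step appends a fixed number of tokens, and input tokens are billed at an illustrative flat price that is not any provider's actual rate. Under those assumptions, a loop of n steps with an S-token starting context and T tokens added per step bills n*S + T*n*(n-1)/2 input tokens. Once the T term dominates, doubling the number of steps roughly quadruples the input cost.

```python
def cumulative_input_tokens(system_tokens: int, tokens_per_step: int, steps: int) -> int:
    """Total input tokens billed across an agent loop that re-sends its full history.

    Hypothetical model: each step appends `tokens_per_step` tokens (tool output,
    reasoning) to the context, and every call re-sends the whole context.
    """
    total = 0
    context = system_tokens
    for _ in range(steps):
        total += context            # this call sends everything accumulated so far
        context += tokens_per_step  # the step's output is appended for the next call
    return total


def estimate_cost_usd(input_tokens: int, price_per_million: float) -> float:
    """Convert tokens to dollars at an illustrative per-million-token price."""
    return input_tokens * price_per_million / 1_000_000


# Compare loop lengths; the starting context, growth per step, and price are assumptions.
for steps in (5, 10, 20, 40):
    tokens = cumulative_input_tokens(system_tokens=2_000, tokens_per_step=1_500, steps=steps)
    print(steps, tokens, estimate_cost_usd(tokens, price_per_million=3.0))
```

Output tokens and provider features such as prompt caching are ignored here; either would change the absolute numbers, but not the quadratic shape caused by re-sending an ever-growing history.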
Architecture context
Autonomous agents, research agents, coding assistants, RAG systems, customer support bots, batch summarization pipelines, and workflows using expensive models or tools.
Impact
Cost runaway can make an AI product economically unsustainable. It can also signal reliability problems: the system may be confused, looping, retrieving irrelevant context, or retrying failed tools. In production, cost spikes can erode margins, exhaust budgets, slow responses, and reduce availability when rate limits or spending caps are reached.
Symptoms
- Token usage rises sharply without better outcomes.
- Agents call many tools for simple requests.
- The system retries the same failing operation.
- Context windows grow with irrelevant history.
- Costs increase after prompt, model, or routing changes.
- A small group of workflows drives disproportionate spend.
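The repeated-retry symptom can be checked mechanically from an agent trace. This minimal sketch assumes a hypothetical trace format (`ToolCall`, with a tool name, canonically serialized arguments, and a success flag) and reports the longest run of consecutive, identical, failing calls. The default threshold of 3 in `looks_stuck` is an arbitrary example value, not a recommendation from the source.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass(frozen=True)
class ToolCall:
    """One step of an agent trace (hypothetical record shape)."""
    tool: str
    args: str        # arguments serialized canonically, e.g. JSON with sorted keys
    succeeded: bool


def longest_failing_repeat(trace: List[ToolCall]) -> int:
    """Length of the longest run of consecutive, identical, failing tool calls."""
    longest = 0
    run = 0
    previous: Optional[Tuple[str, str]] = None
    for call in trace:
        key = (call.tool, call.args)
        if call.succeeded:
            run, previous = 0, None   # a success breaks any run
        elif key == previous:
            run += 1                  # the same call failed again
        else:
            run, previous = 1, key    # a different failing call starts a new run
        longest = max(longest, run)
    return longest


def looks_stuck(trace: List[ToolCall], threshold: int = 3) -> bool:
    """Flag traces that repeat the same failing call `threshold` or more times."""
    return longest_failing_repeat(trace) >= threshold
```

Comparing canonicalized arguments, rather than raw strings, avoids treating the same request with reordered keys as a different call.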
Detection signals
- Cost per task.
- Tokens per successful completion.
- Tool calls per task.
- Retry counts.
- Loop length (steps per agent run).
- Model routing frequency (share of requests sent to each model, especially expensive ones).
- Cost spikes by workflow, tenant, user, or model version.
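Several of these signals can be derived from per-task usage records. This is a minimal sketch, assuming a hypothetical `TaskRecord` log schema and a stored baseline of per-workflow metrics. It computes cost per task, tokens per successful completion, tool calls per task, and retries per task for each workflow, then flags workflows whose chosen metric has grown by at least a given ratio. Grouping by tenant, user, or model version works the same way with a different key.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class TaskRecord:
    """One finished or abandoned task from usage logs (hypothetical schema)."""
    workflow: str
    cost_usd: float
    tokens: int
    tool_calls: int
    retries: int
    succeeded: bool


def summarize(records: List[TaskRecord]) -> Dict[str, Dict[str, float]]:
    """Per-workflow averages of several cost-related detection signals."""
    groups: Dict[str, List[TaskRecord]] = defaultdict(list)
    for record in records:
        groups[record.workflow].append(record)

    summary = {}
    for workflow, rs in groups.items():
        n = len(rs)
        successes = sum(r.succeeded for r in rs)
        total_tokens = sum(r.tokens for r in rs)
        summary[workflow] = {
            "cost_per_task": sum(r.cost_usd for r in rs) / n,
            # Tokens spent on failed tasks are charged to the successful ones,
            # so this rises when tokens are burned without better outcomes.
            "tokens_per_success": total_tokens / successes if successes else float("inf"),
            "tool_calls_per_task": sum(r.tool_calls for r in rs) / n,
            "retries_per_task": sum(r.retries for r in rs) / n,
        }
    return summary


def flag_spikes(current, baseline, metric="cost_per_task", ratio=2.0):
    """Workflows whose metric reached at least `ratio` times its baseline value.

    Workflows with a missing or zero baseline are skipped in this sketch.
    """
    flagged = []
    for workflow, stats in current.items():
        base = baseline.get(workflow, {}).get(metric)
        if base and stats[metric] >= ratio * base:
            flagged.append(workflow)
    return sorted(flagged)
```

A ratio against a per-workflow baseline is used here rather than a global threshold because, as the symptoms note, a small group of workflows often drives most of the spend.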
Mitigations
- Set token and tool-call budgets.
- Add loop and retry limits.
- Route simple tasks to cheaper models.
- Compress or summarize context.
- Deduplicate retrieved content.
- Stop work when confidence is sufficient.
- Require approval for expensive workflows.
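Several of these mitigations can be enforced by one per-task budget object that the agent loop consults before each step, model call, tool call, or retry. The sketch below uses hypothetical names (`TaskBudget`, `BudgetExceeded`, `call_with_retries`) and illustrative default limits; suitable values depend on the workflow and on model pricing. It covers token and tool-call budgets, loop and retry limits, and an approval check for expensive work, but not model routing, context compression, or deduplication.

```python
from dataclasses import dataclass
from typing import Any, Callable, Optional


class BudgetExceeded(RuntimeError):
    """Raised when a task would exceed one of its configured limits."""


@dataclass
class TaskBudget:
    """Per-task limits and counters. Default limits are illustrative assumptions."""
    max_tokens: int = 50_000
    max_tool_calls: int = 20
    max_steps: int = 15
    max_retries_per_call: int = 2
    approval_threshold_usd: float = 1.00
    tokens_used: int = 0
    tool_calls_used: int = 0
    steps_used: int = 0

    def charge_tokens(self, n: int) -> None:
        # Check before recording, so a refused request does not consume budget.
        if self.tokens_used + n > self.max_tokens:
            raise BudgetExceeded(f"token budget of {self.max_tokens} exceeded")
        self.tokens_used += n

    def charge_tool_call(self) -> None:
        if self.tool_calls_used + 1 > self.max_tool_calls:
            raise BudgetExceeded(f"tool-call budget of {self.max_tool_calls} exceeded")
        self.tool_calls_used += 1

    def next_step(self) -> None:
        # Call once per agent-loop iteration to cap loop length.
        if self.steps_used + 1 > self.max_steps:
            raise BudgetExceeded(f"loop limit of {self.max_steps} steps reached")
        self.steps_used += 1

    def needs_approval(self, estimated_cost_usd: float) -> bool:
        # Expensive work waits for a human decision instead of running automatically.
        return estimated_cost_usd > self.approval_threshold_usd


def call_with_retries(fn: Callable[..., Any], budget: TaskBudget, *args: Any) -> Any:
    """Call a tool, retrying at most max_retries_per_call times; every attempt is charged."""
    last_error: Optional[Exception] = None
    for _ in range(budget.max_retries_per_call + 1):
        budget.charge_tool_call()
        try:
            return fn(*args)
        except Exception as exc:  # production code should retry only transient errors
            last_error = exc
    raise BudgetExceeded(
        f"giving up after {budget.max_retries_per_call} retries"
    ) from last_error
```

Raising an exception instead of silently truncating makes budget exhaustion visible in logs, where it can feed the retry-count and loop-length signals.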