Operational failure
Cost Runaway
AI systems consuming far more resources than expected through retries, loops, long context, or excessive tool calls.
Definition
Cost runaway occurs when an AI system consumes far more resources than expected. In LLM and agent systems, this can happen through excessive token usage, repeated retries, large context windows, unnecessary tool calls, inefficient retrieval, long-running agent loops, or cascading workflows.
Why it matters
Cost runaway can make an AI product economically unsustainable. It can also indicate reliability problems: the system may be confused, looping, retrieving irrelevant context, or retrying failed tools. In production, cost spikes can affect margins, budgets, user experience, and system availability.
Where it appears
Autonomous agents, research agents, coding assistants, RAG systems, customer support bots, batch summarization pipelines, and workflows using expensive models or tools.
Symptoms
- Token usage rises sharply without better outcomes.
- Agents call many tools for simple requests.
- The system retries the same failing operation.
- Context windows grow with irrelevant history.
- Costs increase after prompt, model, or routing changes.
- A small group of workflows drives disproportionate spend.
Detection signals
- Cost per task.
- Tokens per successful completion.
- Tool calls per task.
- Retry counts.
- Loop length (iterations per agent run).
- Share of requests routed to each model, especially expensive ones.
- Cost spikes by workflow, tenant, user, or model version.
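As a rough illustration, several of the signals above can be computed from per-task trace records. This is a minimal sketch: the `TaskTrace` record and its field names are hypothetical and would need to be mapped onto whatever a real tracing system exports.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass
class TaskTrace:
    """One finished task. All field names here are illustrative assumptions."""
    workflow: str
    tokens: int
    tool_calls: int
    retries: int
    cost_usd: float
    succeeded: bool


def summarize(traces):
    """Aggregate per-workflow detection signals from a list of TaskTrace."""
    groups = defaultdict(list)
    for trace in traces:
        groups[trace.workflow].append(trace)

    report = {}
    for workflow, items in groups.items():
        n = len(items)
        successes = sum(1 for t in items if t.succeeded)
        total_tokens = sum(t.tokens for t in items)
        report[workflow] = {
            "cost_per_task": sum(t.cost_usd for t in items) / n,
            # Tokens burned on failed tasks are charged to the successes,
            # so wasted work shows up in this number.
            "tokens_per_success": total_tokens / successes if successes else None,
            "tool_calls_per_task": sum(t.tool_calls for t in items) / n,
            "retries_per_task": sum(t.retries for t in items) / n,
        }
    return report
```

Grouping by tenant, user, or model version instead of workflow would follow the same pattern with a different key.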
Example scenario
A research agent is asked to summarize a company. It repeatedly searches, retrieves overlapping documents, sends long contexts to an expensive model, and retries after minor formatting errors. The final answer is acceptable, but the cost is many times the expected budget.
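To see how these factors compound, here is a back-of-the-envelope calculation. Every number in it is hypothetical (the prices, call counts, and token counts are chosen only for illustration and are not taken from any real provider or trace).

```python
# All values are hypothetical and exist only to show how the factors multiply.
PRICE_PER_1K_INPUT_USD = 0.01   # assumed price, not a real quote
PRICE_PER_1K_OUTPUT_USD = 0.03  # assumed price, not a real quote


def call_cost(input_tokens, output_tokens):
    """Cost of a single model call under the assumed prices."""
    return (input_tokens / 1000 * PRICE_PER_1K_INPUT_USD
            + output_tokens / 1000 * PRICE_PER_1K_OUTPUT_USD)


def run_cost(model_calls, input_tokens, output_tokens, attempts=1):
    """Cost of a whole run: attempts x calls per attempt x cost per call."""
    return attempts * model_calls * call_cost(input_tokens, output_tokens)


# Planned run: a few calls with modest context, no retries.
expected = run_cost(model_calls=3, input_tokens=4_000, output_tokens=500)

# Runaway run: more searches, much longer contexts, and two full retries.
actual = run_cost(model_calls=12, input_tokens=30_000, output_tokens=500,
                  attempts=3)

print(f"expected ${expected:.2f}, actual ${actual:.2f}, "
      f"ratio {actual / expected:.0f}x")
```

Under these assumptions, four times as many calls, roughly seven times the context per call, and three attempts multiply into a run that costs dozens of times the planned amount, even though no single factor looks alarming on its own.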
Severity scoring
- Low: Isolated inefficient trace.
- Medium: Recurring cost increase in a non-critical workflow.
- High: Cost runaway affects product margins, budgets, or user limits.
- Critical: Runaway costs degrade service, trigger outages, or create material financial exposure.
Eval strategy
Evaluate cost efficiency alongside quality. Test maximum tool calls, token budgets, retry policies, routing decisions, and completion thresholds. Include adversarial cases that tempt the agent to over-search or over-reason.
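One way to put this into practice is to give each eval case a resource budget next to its quality threshold, so a run fails if it is either wrong or wasteful. The sketch below assumes hypothetical `EvalCase` and `RunResult` records; how quality is scored and how usage is collected depends on the eval harness in use.

```python
from dataclasses import dataclass


@dataclass
class EvalCase:
    """Budgets and quality bar for one test case (illustrative fields)."""
    name: str
    max_tokens: int
    max_tool_calls: int
    max_retries: int
    min_quality: float  # threshold on whatever quality score the team uses


@dataclass
class RunResult:
    """What one agent run actually used and scored (illustrative fields)."""
    tokens: int
    tool_calls: int
    retries: int
    quality: float


def check(case, result):
    """Return a list of violations; an empty list means the case passed."""
    violations = []
    if result.quality < case.min_quality:
        violations.append("quality below threshold")
    if result.tokens > case.max_tokens:
        violations.append("token budget exceeded")
    if result.tool_calls > case.max_tool_calls:
        violations.append("too many tool calls")
    if result.retries > case.max_retries:
        violations.append("too many retries")
    return violations
```

An adversarial case might be a simple lookup phrased to invite exhaustive research, paired with a deliberately tight tool-call budget: an acceptable answer that took twenty searches would still be reported as a failure.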
Runtime monitoring strategy
Monitor cost per task, token usage, tool-call counts, retry loops, context growth, and model-routing patterns. Alert when cost deviates from expected bands for a workflow.
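An "expected band" can be defined in many ways. The following sketch uses one simple, robust choice: the median of a recent baseline window plus or minus a multiple of the median absolute deviation (MAD). The function names, the multiplier `k`, and the `min_spread` floor are assumptions made for illustration, not a prescribed method.

```python
import statistics


def cost_band(baseline_costs, k=3.0, min_spread=0.0):
    """Return (low, high) as median +/- k * MAD over a baseline window.

    min_spread puts a floor under the MAD so that a perfectly flat
    baseline does not produce a zero-width band.
    """
    median = statistics.median(baseline_costs)
    deviations = [abs(c - median) for c in baseline_costs]
    spread = max(statistics.median(deviations), min_spread)
    return max(0.0, median - k * spread), median + k * spread


def is_cost_spike(cost, baseline_costs, k=3.0, min_spread=0.0):
    """True when a task's cost is above the upper edge of the band.

    Only the upper edge is checked here, since runaway means overspending;
    an unusually cheap task may deserve a separate quality check.
    """
    _low, high = cost_band(baseline_costs, k, min_spread)
    return cost > high
```

In use, the baseline would be kept per workflow (and possibly per tenant or model version), so that a naturally expensive workflow does not mask a spike in a cheap one.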
Mitigation strategies
- Set token and tool-call budgets.
- Add loop and retry limits.
- Route simple tasks to cheaper models.
- Compress or summarize context.
- Deduplicate retrieved content.
- Stop once confidence is sufficient instead of continuing to search or reason.
- Require approval for expensive workflows.
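The first two mitigations (budgets plus loop and retry limits) can be enforced by a small guard object that every model and tool call passes through. This is a minimal sketch under assumed names (`TaskBudget`, `BudgetExceeded`); it is not tied to any particular agent framework.

```python
class BudgetExceeded(RuntimeError):
    """Raised when a task would exceed one of its limits."""


class TaskBudget:
    """Tracks token and tool-call usage for one task and enforces caps."""

    def __init__(self, max_tokens, max_tool_calls, max_retries):
        self.max_tokens = max_tokens
        self.max_tool_calls = max_tool_calls
        self.max_retries = max_retries
        self.tokens = 0
        self.tool_calls = 0

    def charge_tokens(self, n):
        """Record n tokens, refusing if that would pass the token cap."""
        if self.tokens + n > self.max_tokens:
            raise BudgetExceeded(f"token budget of {self.max_tokens} exceeded")
        self.tokens += n

    def call_tool(self, tool, *args, **kwargs):
        """Run a tool with bounded retries; every attempt counts as a call."""
        last_error = None
        for _attempt in range(self.max_retries + 1):
            if self.tool_calls >= self.max_tool_calls:
                raise BudgetExceeded(
                    f"tool-call budget of {self.max_tool_calls} exceeded")
            self.tool_calls += 1
            try:
                return tool(*args, **kwargs)
            except Exception as exc:  # a real system would catch narrower errors
                last_error = exc
        raise BudgetExceeded(
            f"gave up after {self.max_retries} retries") from last_error
```

Counting retries against the same tool-call budget is a deliberate choice in this sketch: a tool that keeps failing drains the task's allowance instead of being retried for free. The caller decides what to do on `BudgetExceeded`, for example returning a partial answer or escalating for approval.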
Where FailureModes.ai fits
FailureModes.ai helps teams connect cost spikes to underlying failure modes, identify inefficient traces, and add monitors that catch runaway behavior before it becomes a production incident.
Related
- Infinite Loop: When an agent repeats reasoning, tool calls, or retries without making meaningful progress.
- Tool Misuse: When agents pick the wrong tool, pass bad arguments, ignore tool output, or act without required confirmation.
- Retrieval Failure: When an AI system retrieves stale, irrelevant, incomplete, conflicting, or poorly ranked context, often the root cause of bad RAG answers.
- Planning Failure: When an AI agent decomposes a task incorrectly, picks a wrong strategy, skips required steps, or fails to adapt to new information.
- Cascading Agent Failure: When one local error in an agent workflow propagates into a larger workflow failure across tools, memory, or systems.