Operational failure

Infinite Loop

When an agent repeats reasoning, tool calls, or retries without making meaningful progress.

Definition

An infinite loop occurs when an AI agent repeats reasoning steps, tool calls, retrieval attempts, or retries without making meaningful progress. Some loops are literal; others are bounded by system limits but still waste time, money, and user trust.

Why it matters

Loops can create cost runaway, latency spikes, tool-rate-limit problems, bad user experiences, and downstream workflow delays. A loop also indicates that the agent may lack proper stopping conditions, uncertainty handling, or recovery behavior.

Where it appears

Research agents, coding agents, browsing agents, IT automation, customer support agents, retrieval workflows, and systems that automatically retry failed tool calls.

Symptoms

  • Repeated calls to the same tool with similar arguments.
  • Repeated retrieval of similar documents.
  • The agent keeps revising without improving.
  • The agent retries after the same error.
  • Long traces with no state progress.
  • User waits while the agent continues unnecessary work.
  • Repeated action patterns.

Detection signals

  • High retry counts.
  • No meaningful state transition.
  • Similar tool inputs across steps.
  • Cost or latency spikes.
  • Agent reaches maximum step limit.

Example scenario

A coding agent encounters a test failure. It repeatedly makes small code changes, reruns the same test, receives the same error, and continues without diagnosing the underlying dependency mismatch.

Severity scoring

Low

Short loop stopped automatically.

Medium

Loop increases latency or cost.

High

Loop blocks workflow or consumes significant resources.

Critical

Loop triggers external actions, rate limits, outages, or material cost exposure.

Eval strategy

Test ambiguous, impossible, or tool-error scenarios. Evaluate whether the agent stops, asks for clarification, changes strategy, or escalates instead of repeating the same behavior.

Runtime monitoring strategy

Monitor repeated tool calls, retries, state progress, trace length, cost, and latency. Alert when workflows exceed expected step counts or repeat patterns without new evidence.

Mitigation strategies

  • Add maximum step and retry limits.
  • Detect repeated actions.
  • Require strategy change after repeated failure.
  • Add explicit stop conditions.
  • Escalate when progress stalls.
  • Budget tokens, tool calls, and elapsed time.

Where FailureModes.ai fits

FailureModes.ai helps teams detect looping behavior in agent traces, distinguish normal iteration from failure, and add monitors that prevent repeated actions from becoming production incidents.

See how your AI systems will fail — before your users do.

Book a diagnostic →