Policy failure

Unsafe Escalation

When an agent acts, approves, or escalates without the right review, policy check, or human handoff — or fails to escalate when it should.

What failed

Unsafe escalation occurs when an AI system takes, recommends, or escalates an action without the required review, approval, policy check, or human handoff. It can also occur when the system fails to escalate a high-risk case that should not be handled autonomously.

Architecture context

Support agents, IT agents, finance workflows, HR assistants, security copilots, sales operations, procurement systems, and any workflow with approval boundaries.

Impact

Enterprise agents often operate near sensitive workflows: customer accounts, refunds, access control, HR, security, finance, legal, and operations. Incorrect escalation behavior can create compliance, security, financial, or reputational risk.

Symptoms

  • The agent performs a sensitive action without confirmation.
  • It fails to route a high-risk issue to a human.
  • It escalates to the wrong team.
  • It treats a policy exception as routine.
  • It continues acting after uncertainty rises.

Detection signals

  • High-impact actions without approval.
  • Missed escalation triggers.
  • Escalation rates changing after model or prompt updates.
  • Sensitive-tool usage in low-confidence contexts.
  • User complaints or manual reversals after agent actions.

Mitigations

  • Define clear approval boundaries.
  • Gate sensitive tools.
  • Require confirmation for high-impact actions.
  • Add escalation policies by workflow.
  • Monitor low-confidence actions.
  • Fail closed when policy context is missing.

Contribute what failed. Unlock how others fixed it.