Security failure
Data Leakage
When an AI system exposes sensitive, confidential, regulated, or unauthorized information through outputs, retrieval, memory, or tool use.
Definition
Data leakage occurs when an AI system exposes sensitive, confidential, private, regulated, or unauthorized information. Leakage can happen through model output, retrieved context, tool responses, logs, or memory, and it can be triggered by prompt injection or by state that carries over between users.
Why it matters
Enterprise AI systems often interact with customer data, employee data, internal documents, credentials, business strategy, legal material, and regulated records. Data leakage can create legal, compliance, security, reputational, and customer-trust risk.
Where it appears
Data leakage appears in customer support assistants, internal knowledge copilots, email agents, HR assistants, sales tools, legal workflows, RAG systems, memory-enabled assistants, and agents with broad tool permissions.
Symptoms
- The system reveals data the user should not access.
- It includes confidential source text in an output.
- It mixes information across users, tenants, or projects.
- It exposes hidden prompts, credentials, or system metadata.
- It sends sensitive information to an external tool or URL.
Detection signals
- Sensitive data patterns in outputs.
- Access-control mismatches between the requesting user's entitlements and the permissions on retrieved content.
- Prompt-injection attempts aimed at extracting data.
- Cross-tenant or cross-user context references.
- Tool calls containing sensitive payloads.
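The first signal, sensitive data patterns in outputs, can be approximated with a simple pattern scan. The following is a minimal sketch, not a complete detector: the pattern set, the `SENSITIVE_PATTERNS` name, and the `find_sensitive` function are assumptions made for illustration, and a real deployment would tune the patterns to its own data policy and expect both false positives and misses.

```python
import re

# Hypothetical pattern set for illustration only. Regexes like these catch
# obvious formats and will miss anything paraphrased or reformatted.
SENSITIVE_PATTERNS = {
    "email": re.compile(r"\b[\w.+-]+@[\w-]+(?:\.[\w-]+)+\b"),
    "us_ssn": re.compile(r"\b\d{3}-\d{2}-\d{4}\b"),
    "aws_access_key_id": re.compile(r"\bAKIA[0-9A-Z]{16}\b"),
}


def find_sensitive(text: str) -> dict[str, list[str]]:
    """Return the matches found for each pattern name.

    An empty dict means no pattern matched, which is not proof
    that the text is free of sensitive data.
    """
    hits: dict[str, list[str]] = {}
    for name, pattern in SENSITIVE_PATTERNS.items():
        matches = pattern.findall(text)
        if matches:
            hits[name] = matches
    return hits
```

The same function could be applied to tool-call payloads before they leave the system, which covers the last signal in the list as well.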
Example scenario
An internal knowledge assistant retrieves a confidential acquisition planning document and summarizes it for an employee who does not have permission to access that source.
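One way to prevent this scenario is to filter retrieved documents against the requesting user's entitlements before any text reaches the model. The sketch below assumes a simple group-based permission model; the `Document` type, its `allowed_groups` field, and the `filter_authorized` function are hypothetical names, not part of any specific retrieval library.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Document:
    doc_id: str
    text: str
    allowed_groups: frozenset[str]  # assumption: access is granted per group


def filter_authorized(docs: list[Document], user_groups: set[str]) -> list[Document]:
    """Keep only documents that share at least one group with the user."""
    return [d for d in docs if d.allowed_groups & user_groups]


# Illustrative data: the acquisition plan is restricted to one team.
retrieved = [
    Document("handbook", "PTO policy ...", frozenset({"all-staff"})),
    Document("acq-plan", "Acquisition planning ...", frozenset({"corp-dev"})),
]
visible = filter_authorized(retrieved, {"all-staff"})
```

Under these assumptions, an employee who belongs only to `all-staff` never has the acquisition document placed in the model's context, so the model cannot summarize it.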
Severity scoring
Low
Low-sensitivity internal detail disclosed to an authorized user.
Medium
Internal information exposed to a broader audience than intended.
High
Confidential, customer, employee, legal, financial, or security data exposed.
Critical
Regulated data, credentials, trade secrets, or cross-tenant data leaked.
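The four levels above can be encoded as a small scoring function so that incidents are triaged consistently. This is a sketch under assumptions: the data-class labels (such as `"trade_secret"` or `"cross_tenant"`) and the `score_leak` function are hypothetical, and they presume some upstream classifier has already tagged the exposed data.

```python
from enum import IntEnum


class Severity(IntEnum):
    LOW = 1
    MEDIUM = 2
    HIGH = 3
    CRITICAL = 4


# Hypothetical labels that mirror the categories named in each level.
CRITICAL_CLASSES = {"regulated", "credentials", "trade_secret", "cross_tenant"}
HIGH_CLASSES = {"confidential", "customer", "employee", "legal", "financial", "security"}


def score_leak(data_classes: set[str], broader_audience: bool) -> Severity:
    """Map exposed data classes and audience scope to a severity level."""
    if data_classes & CRITICAL_CLASSES:
        return Severity.CRITICAL
    if data_classes & HIGH_CLASSES:
        return Severity.HIGH
    if broader_audience:
        # Internal information reached more people than intended.
        return Severity.MEDIUM
    return Severity.LOW
```

The checks run from most to least severe, so an incident that involves several data classes is scored by its worst one.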
Eval strategy
Test access-control boundaries, sensitive-data handling, prompt-injection attempts, memory isolation, and retrieval permissions. Include users with different roles and entitlements.
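A role-based leak eval can be sketched as a loop over test cases, each pairing a role with a query and a set of strings that must never appear in the answer. One common approach, assumed here, is to plant unique canary strings in restricted documents. The `EvalCase` type, the `run_leak_evals` function, and the `assistant(role, query)` call signature are all hypothetical.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass(frozen=True)
class EvalCase:
    role: str
    query: str
    forbidden_markers: tuple[str, ...]  # canary strings planted in restricted data


def run_leak_evals(
    assistant: Callable[[str, str], str],
    cases: list[EvalCase],
) -> list[tuple[EvalCase, list[str]]]:
    """Return every case whose answer contains a forbidden marker."""
    failures = []
    for case in cases:
        answer = assistant(case.role, case.query).lower()
        leaked = [m for m in case.forbidden_markers if m.lower() in answer]
        if leaked:
            failures.append((case, leaked))
    return failures
```

Exact-match canaries only catch verbatim leakage. A summary that paraphrases restricted content would pass this check, so it complements rather than replaces manual review of sampled answers.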
Runtime monitoring strategy
Monitor outputs, retrieved sources, memory reads, tool calls, and logs for sensitive-data patterns and authorization mismatches. Track leakage risk by workflow and data source.
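Tracking leakage risk by workflow and data source can start with a counter keyed on that pair. The sketch below is an in-memory illustration only; the `LeakageMonitor` class and its methods are hypothetical, and a production system would more likely emit these events to an existing metrics or logging pipeline.

```python
from collections import Counter


class LeakageMonitor:
    """Counts flagged events per (workflow, data_source) pair."""

    def __init__(self) -> None:
        self.counts: Counter[tuple[str, str]] = Counter()

    def record(self, workflow: str, data_source: str, flagged: bool) -> None:
        # Only flagged events are counted; clean events are ignored here.
        if flagged:
            self.counts[(workflow, data_source)] += 1

    def top(self, n: int = 5) -> list[tuple[tuple[str, str], int]]:
        """Return the n pairs with the most flagged events."""
        return self.counts.most_common(n)
```

The `flagged` value would come from whatever check the team already runs, such as a sensitive-data pattern scan or an authorization mismatch between the user and a retrieved source.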
Mitigation strategies
- Enforce permissions before retrieval and tool use.
- Redact sensitive data where appropriate.
- Limit tool access by role and workflow.
- Prevent cross-tenant memory contamination.
- Add prompt-injection defenses.
- Log and audit sensitive-data access.
- Escalate high-risk outputs for review.
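As one illustration of preventing cross-tenant memory contamination, memory can be keyed by tenant and user so that a read for one scope cannot return entries written under another. This is a minimal in-memory sketch; the `ScopedMemory` class is hypothetical, and it assumes the `tenant_id` and `user_id` values come from an authenticated session rather than from model output or user-supplied text.

```python
from collections import defaultdict


class ScopedMemory:
    """Stores notes under a (tenant_id, user_id) key."""

    def __init__(self) -> None:
        self._store: dict[tuple[str, str], list[str]] = defaultdict(list)

    def remember(self, tenant_id: str, user_id: str, note: str) -> None:
        self._store[(tenant_id, user_id)].append(note)

    def recall(self, tenant_id: str, user_id: str) -> list[str]:
        # .get avoids creating an empty entry for unknown scopes, and
        # list(...) returns a copy so callers cannot mutate stored notes.
        return list(self._store.get((tenant_id, user_id), []))
```

The design choice is that there is no method to read across scopes at all, so isolation does not depend on every caller remembering to apply a filter.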
Where FailureModes.ai fits
FailureModes.ai helps teams detect data-leakage patterns, connect incidents to retrieval, memory, or tool causes, and monitor sensitive workflows for recurring exposure risk.
Related
- Prompt Injection: Malicious or unintended instructions embedded in user input, retrieved content, or tool output that override system behavior.
- Retrieval Failure: When an AI system retrieves stale, irrelevant, incomplete, conflicting, or poorly ranked context, often the root cause of bad RAG answers.
- Memory Drift: When AI systems rely on memory that is stale, incorrect, irrelevant, or misapplied across sessions and workflows.
- Unsafe Escalation: When an agent acts, approves, or escalates without the right review, policy check, or human handoff, or fails to escalate when it should.
- Hallucination: False, unsupported, fabricated, or ungrounded information produced confidently by an AI system.