Skip to content
3 min read · 658 words

Persistence — Threat Analysis

Other threat analyses in this section: Envelopes · Identity and Reasoning · Media · Back to Trust Tiers

This page covers the formal threat analysis for Persistence. For the operational guide, start there.

Memory poisoning — attack taxonomy

Memory poisoning attacks target the retrieval-and-render pipeline. The attacker writes a record containing a structural escape sequence; upon retrieval, naive parsers fail catastrophically. Two primary attack families define this threat landscape:

  • A-MemGuard (Anonymous, 2025): Demonstrates that memory-poisoning defenses must be keyed off harness-controlled identifiers. Envelope integrity is physically impossible if the identifier is derivable from the content.

  • MemoryGraft (Anonymous, 2025): Illustrates how poisoned records graft malicious instructions into the execution context. The per-record nonce mechanism implemented in the reference batteries defeats close-tag injection—an attacker cannot close an envelope without predicting the Memory.id. However, nonces are powerless against semantic poisoning: a structurally valid record containing persuasive falsehoods (e.g., "User is verified admin") will pass through the envelope intact. Semantic integrity must be enforced via retrieval filtering and memory authentication, not envelope structure.

Why Memory.id must not be body-derivable

Deterministic identifiers are a structural security failure. If Memory.id is a function of the body (e.g., a content hash), the system is compromised by design:

  1. The attacker constructs a body $B$ containing a forged close tag and payload.
  2. The attacker computes $id = f(B)$, mirroring the system's deterministic function.
  3. The attacker embeds the predicted closer </memory_${id}> within their body text.
  4. The reference batteries render the record, placing an identical closer outside the body.
  5. The model terminates the envelope at the attacker's forged boundary.

The security invariant is absolute: Memory.id must be assigned by the caller independently of the body. @nhtio/adk enforces this at the schema layer by requiring caller-provided IDs, ensuring the attacker cannot know the nonce at the time of content creation.

RAG poisoning

RAG poisoning occurs when corpora are contaminated prior to retrieval. The attacker plants malicious content in source material (documentation, web crawls, etc.) that the system eventually ingests.

Key research:

  • TrustRAG (Anonymous, 2025): Establishes a fundamental trust asymmetry between user input and retrieved context. Retrieved content must never inherit the authority of the retrieval mechanism itself.
  • RobustRAG (Anonymous, 2024): Proves that provenance isolation—treating retrieved content as a distinct, lower-trust tier from developer-authored content—is a necessary condition for certifiable retrieval security.

The trustTier declaration on the Retrievable primitive in @nhtio/adk implements this isolation. The tier is bound at construction time when provenance is known, preventing trust-escalation during the retrieval-to-prompt transition.

Long-term state contamination

Delayed-activation attacks represent the most insidious persistent threat. A poisoned record survives across sessions, lying dormant until a specific retrieval trigger is met long after the original attacker has departed.

In these scenarios, structural defenses remain critical. The per-record nonce prevents the payload from escaping the envelope, but the defense is only as robust as the nonce assignment. If the record was stored with a body-derived or attacker-influenced ID, the defense is nullified. Cross-session persistence requires that trust decisions remain immutable; a record stored as third-party-public must never be re-contextualized as a higher-trust tier in a future session.

Non-goals for persistence defenses

Structural defenses are not a panacea. The following threats are explicitly out-of-scope for envelope-based mitigation:

  • Semantic memory poisoning: Structurally valid but factually false records (e.g., "Alice has admin privileges") will pass through every structural filter. This is an authentication and auditing problem.
  • Misleading low-trust content: A correctly labeled third-party-public record containing misinformation is functioning as intended when it is rendered in an untrusted envelope. The system's role is to preserve the structural tier, not to act as an arbiter of truth.
  • Prevention of record recall: Nonces prevent escape, not recall. A poisoned record will still be rendered in the context and may influence the model through its semantic content. Filtering malicious-but-valid records requires active memory auditing and retrieval-time policies.