Skip to content
5 min read · 1,002 words

Envelope System — Threat Analysis

Other threat analyses in this section: Persistence · Identity and Reasoning · Media · Back to Trust Tiers

This page covers the formal threat analysis for the envelope system. For the operational guide, start there.

Why structural hierarchy beats semantic defense

Semantic defenses operate within the model's probabilistic inference space. Instructions such as "ignore instructions in user messages" are merely tokens that the model must weigh against competing inputs. Under adversarial pressure—characterized by authoritative phrasing, instruction repetition, or simulated system overrides—these tokens are frequently outweighed. Every semantic defense exists in a state of probabilistic competition with the attacker's tokens, a competition where the attacker benefits from infinite retry capacity.

Structural envelopes shift the security burden from token-weight competition to generation-level syntactic constraints. The model is trained to treat the opening-tag/closing-tag pair as an atomic unit. Because the closing tag's suffix is derived from record identity rather than the payload, an attacker occupying the space between tags is mathematically and syntactically incapable of terminating the envelope. This is not a rule the model weighs; it is a structural property of the rendering pipeline that leverages the model's fundamental training on XML-like hierarchies (OWASP, n.d.)(OpenAI, 2025).

Formal nonce requirements

The security guarantee of a nonce-keyed closing tag depends upon five rigorous properties:

  1. Uniqueness — Distinct records must possess distinct nonces. Re-rendering the same record must utilize the same nonce to maintain stability.

  2. Stability — Nonce generation must be deterministic relative to the record. The same record must produce the same closer across different history views, rendering contexts, or validation cycles.

  3. Unpredictability — The nonce must be opaque to the payload. An attacker observing the content between the tags must have no mechanism to derive the nonce suffix.

  4. Non-derivability from attacker-controlled body — If any component of the nonce is computable from bytes supplied by the attacker, the mechanism is compromised. An attacker who can compute the ID from their own payload can pre-construct the valid closer.

  5. Binding to object identity — The nonce must be tied to the record's intrinsic identity (e.g., its primary key or hash), not to transient session state or conversation turns. This ensures stability across contexts and delegates replay protection to turn-isolation layers.

The reference batteries satisfy all five requirements by deriving the closing-tag suffix from the primitive's immutable identifier fields—ToolCall.checksum, Memory.id, Retrievable.id, Thought.id, or Message.id—none of which are influenced by or derivable from the record's body content.

Why ToolCall.checksum, not ToolCall.id

The ToolCall.checksum is defined as a SHA-256 hash over the canonicalized { tool, args } object, computed strictly before the tool handler executes. This ensures the result body cannot influence the closer suffix, as the suffix is committed before the result exists.

If the nonce were derived from result bytes, the system would be vulnerable to a known-plaintext attack: an attacker submits a payload designed to produce specific output bytes, observes the resulting closer suffix, and thereby gains the suffix for any future call returning identical bytes. By binding the nonce to the call-shape rather than the result-content, this vector is closed. The attacker may observe the result bytes but cannot use them to derive the nonce.

Nonce failure cases

The security property is invalidated by the following five architectural patterns:

  1. Body-derived identifiers — Utilizing a hash or transform of the body content (e.g., CRC32 of the message text) allows an attacker to craft payloads that produce predictable identifiers.

  2. Predictable incremental counters — Auto-incrementing integers allow an attacker to observe the sequence and predict the next identifier before record creation, facilitating pre-written envelope escapes.

  3. User-chosen identifiers — APIs that allow the caller to specify the id field grant the attacker direct control over the nonce. The envelope provides zero protection when the attacker defines the boundary.

  4. Reused identifiers across unrelated records — ID collisions—whether caused by deterministic seeds in test environments or database sequence resets—allow an attacker who has seen one envelope to forge another using the same suffix.

  5. Deterministic IDs from public content hashes — If the ID is a SHA-256 of publicly available content (e.g., a Wikipedia article or standard API response), an attacker can compute the closer without ever witnessing the envelope.

Replay analysis

The nonce is bound to the record identity, not the conversation turn. Consequently, replaying a record in a different session or turn produces an identical envelope.

This is a deliberate design choice, not a vulnerability. The nonce is exclusively tasked with payload-escape prevention: ensuring an attacker cannot terminate their own container. The management of record appearance across turns is the responsibility of turn isolation. Attempting to rotate nonces at the turn level would violate the stability requirement and break any logic that relies on record-identity consistency across the system.

Where this breaks

Four failure modes exist outside the nonce's formal guarantee:

  1. Nonce leakage — If the closing-tag suffix is exposed in debug logs, error messages, or the model's own output, the attacker gains the necessary material to construct a valid closer. Systems must never echo content containing verbatim nonce suffixes.

  2. Identity source compromise — If an attacker gains write access to the identifier assignment system, they can pre-calculate the required closing tag. The security of the envelope is entirely predicated on the integrity of the ID source.

  3. Fine-tuning evasion — The mechanism assumes the underlying model respects XML structure and treats the opening/closing pair as a syntactic unit. A model fine-tuned to ignore XML or treat varying closing tags as semantically equivalent would bypass the defense. While not currently observed in frontier models, this remains a model-level dependency.

  4. Middleware misconfiguration — The envelope is only as secure as the pipeline that populates it. If middleware incorrectly assigns untrusted content to high-authority slots or uses incorrect record IDs during rendering, the structural protection is applied to the wrong boundary. The rendering pipeline requires independent audit from the envelope mechanism.