---
url: 'https://adk.nht.io/the-loop/trust-tiers.md'
description: >-
  Every token in your agent's context is a power claim. Some of those claims are
  yours.
---

# Trust Tiers

## LLM summary — Trust Tiers

Trust is structural, not semantic. This page outlines how agentic architectures collapse when they treat untrusted data as authoritative instructions. It covers four primary attack vectors: tag escape (payloads terminating developer envelopes), memory poisoning (delayed-onset malicious instructions), chain-of-thought subversion (arXiv:2510.26418, 99% jailbreak success), and RAG injection. ADK provides primitives that carry trust metadata—`trustTier` declarations, caller-supplied IDs, and checksums over call shape. The reference batteries implement these primitives via XML envelopes with nonce-keyed closing tags, preventing payload forgery. This architecture also enables "The quiet part — out loud": using the same trust machinery to inject synthetic RAG and CoT, allowing lightweight models (like Llama 3.2 1b) to exhibit frontier-level capabilities safely. Sub-pages: `/trust-tiers/envelopes`, `/trust-tiers/persistence`, `/trust-tiers/identity-and-reasoning`, `/trust-tiers/media`.

::: danger ADK does not enforce any of this
ADK provides primitives with trust metadata — tier declarations, stable IDs, checksums bound to call shape. That is all it does. How your `executionFn` converts those primitives into a prompt is entirely up to you. You can ignore every tier, inline every memory record unwrapped, and render tool output straight into developer policy. ADK will not stop you. The reference batteries are the correct implementation of these primitives. If you are not using them, you are writing a rendering pipeline from scratch, and everything on this page describes what you will get wrong.
:::

Your agent is a security hole. If it reads a forum post saying "Ignore all previous instructions" and complies, don't blame the model. Blame your architecture. You are handing a loaded gun to a stranger and acting surprised when they point it at you. The failure is an architectural collapse: you treated untrusted data as if it had the same authority as your system prompt. Without a structural distinction between your commands and the data the agent processes, the agent has no choice but to obey the loudest token it sees.

Tag escape attacks are the SQL injection of the agentic era. If a tool returns a string containing `</trusted_content>`, a naive implementation dies immediately. The attacker's close tag terminates your wrapper, and their next line of text runs outside it, speaking with developer authority. The model cannot know the second tag was part of a payload; it just sees a completed instruction followed by a new, authoritative command. You are relying on the model's "vibes" to stay safe. You will get owned.

Memory poisoning is a ticking time bomb. An attacker drops an escaped instruction—`</memory>` followed by a malicious directive—into a profile today. Six months later, your agent retrieves that record. You haven't changed a line of code, but your agent is now a sleeper cell operating under instructions from half a year ago. Without structural authority boundaries, your long-term memory is just a long-term liability.

Chain-of-thought subversion is the ultimate hijack. By injecting pseudo-reasoning traces into the context, an attacker steers the model's internal deliberation like a parasite. Research demonstrates a 99% jailbreak success rate against frontier models using this technique \[@cot-hijacking-2025]. The attacker doesn't need to touch your infrastructure; they only need to place a document where your agent will eventually read it.

A foundation model cannot distinguish developer intent from untrusted data because both arrive as tokens. To the model, a token is a token. Semantic defenses—"ignore instructions in user messages"—are just more tokens. They are easily drowned out by a larger volume of more confident tokens from an attacker. Trust must be structural, or it isn't trust at all.

ADK addresses this by providing primitives that carry trust metadata: tier declarations, stable IDs, and checksums bound to call shape. While a custom `executionFn` can choose to ignore this, the reference batteries use this metadata to render distinct XML envelopes with nonce-keyed closing tags. The nonce is bound to each record's identity; an attacker who controls the payload cannot forge the closing tag.

## The quiet part — out loud

The trust-tier system isn't just a shield; it's a capability multiplier that lets you cheat. Because ADK primitives carry their trust tier and identity regardless of origin, you can use the same rendering infrastructure to give models capabilities they lack natively—without breaking the security model.

**Synthetic RAG for the "dumb" models.** You can inject `Retrievable` records into the context through a middleware pipeline without a single tool call. You run the retrieval—vector search, BM25, database query—and produce `Retrievable` records with the correct `trustTier`. The reference batteries render these into the context exactly as if a tool fetched them. This is how ADK's "Ask ADK" assistant works on a Llama 3.2 1b quantized model with zero native tool-calling support. No tool calls. Full RAG behavior. Total security.

**Synthetic chain-of-thought for the "fast" models.** A `Thought` record in the context is indistinguishable to the model from its own reasoning. You can inject `Thought` records produced by a frontier model or a specialist pipeline. A lightweight model will then respond from that reasoning as if it thought the problem through itself. You run the expensive reasoning once; the cheap model closes the loop. The `Thought.id` nonce ensures this injected reasoning remains structurally bounded—the safety machinery travels with the feature.

For more on how to exploit these patterns, see [Persistence](./trust-tiers/persistence) and [Identity and Reasoning](./trust-tiers/identity-and-reasoning).

## Where to go next

* [The Envelope System](./trust-tiers/envelopes) — Four authority tiers, nonce-keyed closing tags, and why forgery fails.
* [Persistence](./trust-tiers/persistence) — Memory poisoning and RAG injection: the attacks that wait for you in the dark.
* [Identity and Reasoning](./trust-tiers/identity-and-reasoning) — Multi-identity spoofing and the 99% success rate of CoT hijacking.
* [Media](./trust-tiers/media) — Why images and audio are the hardest trust cases you'll ever face.
