Skip to content
4 min read · 844 words

Memory

Primitives covers the eight-primitive overview.

If a Message is short-term memory — what's been said in this conversation, sitting in front of the model right now — a Memory is long-term memory: what's been said, learned, or decided in previous conversations that should still inform this one. It outlives the context window, outlives the turn, outlives the session. That you prefer metric units. That your team's stack is Rust, not Go. That the last time the agent suggested git push --force, you asked it not to do that again. The model didn't carry any of that across; the retrieval middleware did, by pulling the relevant Memory records back in at the start of the next turn.

The point isn't trivia retention. It's that an agent which can see what you've already established will produce outputs that align with your intentions, your preferences, and the constraints you've already had to spell out. Without that, every conversation starts from zero and converges back to the same misunderstandings on the same predictable schedule.

The ADK does not interpret memories and does not render them into anything the model sees. Like every other primitive, a Memory is held on the TurnContext; whether and how it reaches the model is the job of the DispatchExecutorFn. The shape of the primitive carries the information an executor needs to translate it correctly — provenance, confidence, importance, the durable identity of the record — so an executor that opts in to industry-standard rendering (its own envelope, a nonce on the close tag, a per-record trust framing) has every field it needs. An executor that does none of that, and just stringifies the content, is also free to do so. The opinions live in the data, not in the wiring.

A Memory carries two required scores in [0, 1]: Memory.confidence (how likely this memory is to actually be relevant to the current turn, judged by the retrieval middleware against the current context) and Memory.importance (how much weight the memory should carry if it does get used). Both scores do double duty. The executor reads them as a signal to the model — lean on this one, weigh that one lightly — so the model can resolve conflicts between memories the way the retrieval layer thinks they should be resolved. They are also the standard signal for context-budget middleware — if I have to shed something to fit the window, which memory hurts the least to drop? — so a low-confidence, low-importance memory is the first thing cut and a high-confidence, high-importance one is the last. The runner itself enforces neither: budget-aware shedding lives in middleware or the LLM battery (e.g. the OpenAI Chat Completions adapter throwing E_OPENAI_CHAT_COMPLETIONS_CONTEXT_OVERFLOW), not in TurnRunner. Both scores are required without defaults because both decisions get made on every turn, and silently defaulting either score to 1 collapses both decisions at once: the model treats stale recalls as authoritative, and any budget logic on top refuses to shed anything until it has no choice.

Scores belong to retrieval, not storage

confidence and importance are required on the Memory instance because the ADK will not consume an unscored memory — but the entity responsible for setting them is the retrieval middleware that loads memories into the turn, not the storage layer that persists them.

The cleanest way to see why is to picture the storage you most likely already use: a vector database. In a vectordb, what you persist is the memory's content and an embedding of it; nothing in that row says "this memory matters 0.7 worth." When the next turn starts, the retrieval middleware embeds the current context, queries the vectordb, and gets back rows ranked by similarity to that context — a similarity score that is a property of the query result, not of the stored record. That similarity (often combined with metadata, recency, an importance column, decay, whatever your domain calls for) is what the retrieval middleware turns into confidence and importance for this turn. The same row, queried against a different conversation, scores differently — and should.

The ADK sits on the receiving end of that decision. It won't pick the numbers because it can't: it doesn't know what you embedded against, what your scoring function looks like, or what your domain thinks "important" means. It will, however, refuse to consume the memory without the scores set — because silently defaulting either one to 1 is the same as telling the model and the budget logic that every recalled memory is maximally relevant and maximally worth keeping, and that is a hallucination machine.

The required scores annoy people the first time

They stop annoying people the first time a barely-relevant memory crowds out the one that actually matched, or the first time the context-window-shedding logic refuses to drop anything because every memory was tagged as critical. Both bugs land in the same branch of the retrieval middleware — the one where someone defaulted both scores to 1 to get past the type checker.