---
url: 'https://adk.nht.io/the-loop/primitives/media.md'
description: >-
  Typed handle to a binary asset — image, audio, video, document — that rides on
  Message.attachments and ToolCall.results.
---

# Media

[Primitives](../primitives) covers the eight-primitive overview.

A [`Media`](https://adk.nht.io/api/@nhtio/adk/common/classes/Media) is a typed handle to a binary asset — an image, an audio clip, a video, a document — that the loop can hand to a tool, a message, or a provider without ever having to inline the bytes into a string. Every other primitive on this page carries text; `Media` is the one that carries bytes. It rides on two surfaces: [`Message.attachments`](./message) (the dialogue surface just introduced — a human drops in a screenshot, a model returns generated audio) and [`ToolCall.results`](./toolcall) (the action surface, introduced later in the page — a tool returns an image or a PDF the provider can render natively). It is the primitive every modern provider's native image/audio/document content block is asking for, and it is the alternative to the two unhappy paths that exist without it: base64-encoding bytes into a [`Tokenizable`](./tokenizable) and lying to the model about what is in the buffer, or wrapping bytes in a [`SpooledArtifact`](../artifacts) subclass and surfacing handle tools — which works fine for documents the model wants to query, and is wasteful for an image the provider can render inline.

::: tip Media vs. SpooledArtifact — pick by what the model is doing with it
Use `Media` when the provider can render it natively (image/audio/document content blocks). Use [`SpooledArtifact`](../artifacts) when the model needs to *work with* the content through handle tools — grep a log, page through a JSON tree, query a Markdown document by heading. `Media` is not a strict upgrade over artifacts; it's a different silo for a different job.
:::

`Media` is dual-peer on purpose. It is *silo-peer* to [`Tokenizable`](./tokenizable): it sits in the [`ToolCall.results`](./toolcall) slot alongside `Tokenizable` and `SpooledArtifact` as one of the three shapes a result can take, and the executor renders it through its own provider-specific content block (an OpenAI Chat Completions `image_url`, an `input_audio` block, a `file` block; other providers use their own shapes). It is also *handle-peer* to [`SpooledArtifact`](../artifacts): the bytes are not held on the primitive itself but reached through a [`MediaReader`](https://adk.nht.io/api/@nhtio/adk/common/interfaces/MediaReader) contract — the framework owns the contract, the implementor owns the storage backend (in-memory buffer, OPFS file, S3 object, signed URL, whatever the case demands). Same posture, tuned for opaque binary streaming rather than line-indexed text. The two reader contracts are deliberately disjoint: there is no useful notion of "the third line of a JPEG" and no useful notion of "the byte-stream of a Markdown grep result," so the framework refuses to overload either reader with the other shape's surface.

Bytes are lazy. A `Media` instance passed through middleware, persisted via a storage hook, or serialised onto a telemetry event never materialises its bytes unless someone calls [`Media.stream`](https://adk.nht.io/api/@nhtio/adk/common/classes/Media#stream), [`Media.asBytes`](https://adk.nht.io/api/@nhtio/adk/common/classes/Media#asbytes), or [`Media.asBase64`](https://adk.nht.io/api/@nhtio/adk/common/classes/Media#asbase64). [`Media.toJSON`](https://adk.nht.io/api/@nhtio/adk/common/classes/Media#tojson) emits a metadata-only record (id, kind, mimeType, filename, source, trustTier, modalityHazard, stash) so naive event log serialisation does the safe thing by default. Render code that needs the buffer drains the stream once at the wrap site; render code that can forward the stream pipes it through without buffering; logging code never reads bytes at all.
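
The laziness described above can be sketched in a few lines. This is a minimal illustration, not the ADK's implementation: `SketchReader`, `InMemoryReader`, and `SketchMedia` are hypothetical stand-ins for the real `MediaReader` and `Media` types, and the metadata record is cut down to two fields.

```typescript
// Illustrative stand-ins only: SketchReader and SketchMedia are hypothetical,
// not the real @nhtio/adk MediaReader / Media types.
interface SketchReader {
  // Returns a fresh pass over the bytes; nothing is read until iteration starts.
  stream(): AsyncIterable<Uint8Array>;
}

class InMemoryReader implements SketchReader {
  opened = 0; // how many times the bytes were actually pulled
  private readonly bytes: Uint8Array;

  constructor(bytes: Uint8Array) {
    this.bytes = bytes;
  }

  async *stream(): AsyncGenerator<Uint8Array> {
    this.opened += 1;
    yield this.bytes;
  }
}

class SketchMedia {
  readonly id: string;
  readonly mimeType: string;
  private readonly reader: SketchReader;

  constructor(id: string, mimeType: string, reader: SketchReader) {
    this.id = id;
    this.mimeType = mimeType;
    this.reader = reader;
  }

  // Forwarding path: hand the stream on without buffering it.
  stream(): AsyncIterable<Uint8Array> {
    return this.reader.stream();
  }

  // Buffering path: drain the stream once into a single array.
  async asBytes(): Promise<Uint8Array> {
    const chunks: Uint8Array[] = [];
    let total = 0;
    for await (const chunk of this.reader.stream()) {
      chunks.push(chunk);
      total += chunk.length;
    }
    const out = new Uint8Array(total);
    let offset = 0;
    for (const chunk of chunks) {
      out.set(chunk, offset);
      offset += chunk.length;
    }
    return out;
  }

  // Metadata only: serialisation never touches the reader.
  toJSON(): { id: string; mimeType: string } {
    return { id: this.id, mimeType: this.mimeType };
  }
}

const reader = new InMemoryReader(new Uint8Array([0xff, 0xd8, 0xff]));
const media = new SketchMedia("m-1", "image/jpeg", reader);

// JSON.stringify calls toJSON, so the event log gets metadata and no bytes.
const logged = JSON.stringify({ event: "tool.result", media });
```

In this sketch `reader.opened` stays at `0` after the `JSON.stringify` call and only increments once a consumer awaits `media.asBytes()` or iterates `media.stream()`.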

The construction contract is opinionated about two fields that the framework refuses to default. [`Media.trustTier`](https://adk.nht.io/api/@nhtio/adk/common/classes/Media#property-trusttier) ([`Media.MediaTrustTier`](https://adk.nht.io/api/@nhtio/adk/common/classes/Media#property-mediatrusttier): `'first-party'` / `'third-party-public'` / `'third-party-private'`) mirrors [`Retrievable.trustTier`](https://adk.nht.io/api/@nhtio/adk/common/classes/Retrievable#property-trusttier): same vocabulary, same question, same answer. Where did these bytes come from, and how much authority should the model grant them? [`Media.modalityHazard`](https://adk.nht.io/api/@nhtio/adk/common/classes/Media#property-modalityhazard) ([`Media.MediaModalityHazard`](https://adk.nht.io/api/@nhtio/adk/common/classes/Media#property-mediamodalityhazard): `'inert'` / `'extractable-instructions'` / `'opaque-perceptual'`) is the second axis, and it is *new*: there is no text-side equivalent, because text has one extraction path (read the string) and media has many (OCR, ASR transcription, frame analysis, embedded-text extraction, pixel-level vision encoding). A `third-party-public` JPEG is materially more dangerous than a `third-party-public` paragraph of text because the model itself extracts instructions during perceptual decoding, and no string-level filter can see them. Both fields are required at construction; the bare constructor refuses to guess, and the ergonomic factories ([`Media.userAttachment`](https://adk.nht.io/api/@nhtio/adk/common/classes/Media#userattachment), [`Media.toolGenerated`](https://adk.nht.io/api/@nhtio/adk/common/classes/Media#toolgenerated), [`Media.retrievedPublic`](https://adk.nht.io/api/@nhtio/adk/common/classes/Media#retrievedpublic), [`Media.retrievedPrivate`](https://adk.nht.io/api/@nhtio/adk/common/classes/Media#retrievedprivate)) force the labelling decision at the call site without becoming defaults on the constructor itself.
See [Trust tiers → Media](../trust-tiers/media) for how the two axes compose at render time.
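
A rough sketch of what "required, never defaulted" can look like in practice. The two union types use the values listed above; everything else is an assumption: `LabelledMedia` and `LabelledInit` are hypothetical names, and the tier that the sketch's `retrievedPublic` factory assigns is a guess from its name, not something taken from the real factory signatures.

```typescript
// Value sets taken from the text above; type and class names are hypothetical.
type TrustTier = "first-party" | "third-party-public" | "third-party-private";
type ModalityHazard = "inert" | "extractable-instructions" | "opaque-perceptual";

const TRUST_TIERS: readonly string[] = ["first-party", "third-party-public", "third-party-private"];
const HAZARDS: readonly string[] = ["inert", "extractable-instructions", "opaque-perceptual"];

interface LabelledInit {
  mimeType: string;
  trustTier: TrustTier; // no `?`: the type system rejects an unlabelled call
  modalityHazard: ModalityHazard; // likewise required
}

class LabelledMedia {
  readonly mimeType: string;
  readonly trustTier: TrustTier;
  readonly modalityHazard: ModalityHazard;

  constructor(init: LabelledInit) {
    // Runtime guard for untyped callers: refuse to guess instead of defaulting.
    if (!TRUST_TIERS.includes(init.trustTier)) {
      throw new TypeError("trustTier is required and must be a known tier");
    }
    if (!HAZARDS.includes(init.modalityHazard)) {
      throw new TypeError("modalityHazard is required and must be a known hazard");
    }
    this.mimeType = init.mimeType;
    this.trustTier = init.trustTier;
    this.modalityHazard = init.modalityHazard;
  }

  // Assumed behaviour: the factory pins the tier its name implies, and the
  // caller still has to state the modality hazard explicitly.
  static retrievedPublic(mimeType: string, modalityHazard: ModalityHazard): LabelledMedia {
    return new LabelledMedia({ mimeType, trustTier: "third-party-public", modalityHazard });
  }
}
```

The point of the shape is that the label is decided where the bytes enter the system: a factory name records one axis at the call site, and the constructor itself never supplies a fallback value for either.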

[`Media.stash`](https://adk.nht.io/api/@nhtio/adk/common/classes/Media#property-stash) is the register middleware writes into when it wants to leave a *text fallback* on the media for consumers that cannot decode the bytes natively: a logger summarising tool output, a battery that does not natively support the modality, a downstream agent running against a text-only model. Each entry stores a `{ value, trustTier, derivedFromMedia? }` triple, so derived text (a caption, an OCR pass, a transcript) carries its own trust tier — routed through its own envelope at render time, independent of the parent media's tier. The framework reserves **no keys** on `stash`; which keys a battery looks up for its fallback path is documented in the battery itself, not in the primitive.

The fallback lives in `stash` rather than a typed `description?: string` field on `Media`, and the reason is the second-order question *who writes it*. A typed field forces an answer at construction, and every answer is wrong. The handler? Then every handler returning a `Media` captions on the synchronous path — latency and tokens spent on a fallback no downstream consumer may ever read. The framework? Then the ADK is in the image-captioning business, which is not a contract it should own. The renderer? Then the fallback is recomputed once per render, on the hot path, with nowhere to cache or share it. `stash` dissolves the question by moving it out of band: an output middleware over [`ToolCall.results`](./toolcall) detects `Media`, runs whatever captioner you use (OCR, vision-caption, ASR), and writes the result — once, only when a consumer needs it, in the same pipeline that already owns authorisation, redaction, and telemetry. The "describe the asset" policy belongs there, not baked into the primitive.
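
As an illustration of that out-of-band step, here is a minimal sketch of an output middleware that captions media results. All names are hypothetical: `StashMedia`, `captionMediaResults`, and the `"caption"` key are inventions for this example (the framework reserves no keys), the stash is modelled as a plain `Map`, and `derivedFromMedia` is assumed to be a boolean flag.

```typescript
type DerivedTier = "first-party" | "third-party-public" | "third-party-private";

// Entry shape follows the { value, trustTier, derivedFromMedia? } triple above.
interface StashEntry {
  value: string;
  trustTier: DerivedTier;
  derivedFromMedia?: boolean; // assumed to be a flag; the real type may differ
}

// Hypothetical stand-in for Media, reduced to an id and a stash register.
class StashMedia {
  readonly id: string;
  readonly stash = new Map<string, StashEntry>();

  constructor(id: string) {
    this.id = id;
  }
}

type ToolResult = string | StashMedia;
type Captioner = (media: StashMedia) => Promise<string>;

// Walks tool results, captions each media item once, and records the text
// fallback with its own trust tier, chosen by the middleware author.
async function captionMediaResults(
  results: ToolResult[],
  caption: Captioner,
  tier: DerivedTier,
): Promise<void> {
  for (const result of results) {
    if (!(result instanceof StashMedia)) continue; // text results pass through
    if (result.stash.has("caption")) continue; // already captioned: do not redo
    const value = await caption(result);
    result.stash.set("caption", { value, trustTier: tier, derivedFromMedia: true });
  }
}
```

Because the captioner is injected, the same middleware shape works for OCR, vision captioning, or ASR, and running it twice over the same results does not call the captioner again.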

::: danger Trust is content, not code-path
`Media.trustTier` is the source of truth for the trust envelope. [`Tool.trusted`](../tools#trust-on-the-tool-not-on-the-battery) does not override it, does not propagate to it, and is **not** consulted when a battery renders a `Media` result — the same principle that already governs `Retrievable.trustTier` inside a trusted tool's output. Trust is a property of where the content came from, not who fetched it. A trusted tool returning a `third-party-public` image renders that image in the untrusted envelope, every time.
:::
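
The rule can be stated as a function that accepts the tool but never reads it. This is a sketch under assumptions: `envelopeFor`, `MediaLike`, and `ToolLike` are hypothetical names, and the mapping from tier to envelope (only `'first-party'` counts as trusted) is chosen for the example. See the trust tiers page for the real composition.

```typescript
type Tier = "first-party" | "third-party-public" | "third-party-private";
type Envelope = "trusted" | "untrusted";

interface MediaLike {
  trustTier: Tier;
}

interface ToolLike {
  name: string;
  trusted: boolean;
}

// The tool is passed in only to make the point: its `trusted` flag is never
// consulted. The envelope depends on where the content came from.
function envelopeFor(media: MediaLike, _tool: ToolLike): Envelope {
  // Assumed mapping for this sketch: only first-party content is trusted.
  return media.trustTier === "first-party" ? "trusted" : "untrusted";
}

const trustedFetcher: ToolLike = { name: "fetch_image", trusted: true };
const publicImage: MediaLike = { trustTier: "third-party-public" };

// A trusted tool returning public content still yields the untrusted envelope.
const envelope = envelopeFor(publicImage, trustedFetcher);
```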

A `Message` carries `Media` through its `attachments` field — both `user` and `assistant` roles, as the previous section described. A tool returns `Media` (or `Media[]`) directly from its handler when it has the bytes in hand; the ADK writes the value into [`ToolCall.results`](./toolcall) without wrapping it in a `SpooledArtifact` — the shape `ToolCall` covers below. Either way, the renderer is what reaches into the asset: [`MediaReader.stream`](https://adk.nht.io/api/@nhtio/adk/common/interfaces/MediaReader#stream) for upload paths that can forward the stream; [`Media.asBytes`](https://adk.nht.io/api/@nhtio/adk/common/classes/Media#asbytes) / [`Media.asBase64`](https://adk.nht.io/api/@nhtio/adk/common/classes/Media#asbase64) for paths that need the inline buffer.
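
For the inline-buffer path, a renderer targeting OpenAI Chat Completions might wrap an image roughly as follows. The `image_url` content part does accept a base64 data URL; everything on the ADK side is an assumption here, including the `InlineableImage` interface and the `toImageUrlBlock` helper, which are hypothetical and reduce the media handle to the two members this sketch needs.

```typescript
// Hypothetical minimal view of a media handle for the inline path.
interface InlineableImage {
  mimeType: string;
  asBase64(): Promise<string>;
}

// Shape of an OpenAI Chat Completions image content part.
interface ImageUrlBlock {
  type: "image_url";
  image_url: { url: string };
}

// Drains the asset once, at the wrap site, and inlines it as a data URL.
async function toImageUrlBlock(media: InlineableImage): Promise<ImageUrlBlock> {
  if (!media.mimeType.startsWith("image/")) {
    throw new TypeError(`cannot render ${media.mimeType} as an image block`);
  }
  const base64 = await media.asBase64();
  return {
    type: "image_url",
    image_url: { url: `data:${media.mimeType};base64,${base64}` },
  };
}
```

Upload-style paths that can forward a stream would skip `asBase64` entirely and pipe the reader's stream to the provider instead, so the bytes are never buffered in the renderer.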

::: tip Out of scope: byte hygiene
DLP, antivirus scanning, and media moderation are production responsibilities. `Media` does not do them for you. There is no scanning hook on `MediaReader`, no `clean`/`dirty` flag on `Media`, no quarantine state. Wire byte hygiene into your tool implementations, storage adapters, middleware pipeline, or ingress layer — but wire it somewhere if untrusted bytes enter your system. The framework defines contracts; the implementor owns policy.
:::