Embeddings batteries

ADK has no embeddings primitive. The harness only ever sees a Retrievable and its score — never a vector. So these batteries are not executors and they do not plug into a callback slot. They are embedders: you call them from your own retrieval middleware to turn text into vectors, rank your corpus, and inject the winners as Retrievable records. See Bring your own retrieval for where that middleware lives.

ADK ships four embedders, and they are deliberately the same battery in every respect except the engine:

OpenAIEmbeddingsAdapter — POSTs to an OpenAI-compatible /v1/embeddings endpoint over raw fetch. No SDK, so it runs unchanged in Node, the browser, edge runtimes, and workers. Point baseURL at OpenAI, Azure-behind-a-proxy, vLLM, Together, or a local gateway.
WebLLMEmbeddingsAdapter — embeds in-process on WebGPU via @mlc-ai/web-llm's engine.embeddings.create(). No network, no API key. Browser-only.
TransformersJsEmbeddingsAdapter — embeds on-device via transformers.js's feature-extraction pipeline. Environment-neutral: native ONNX in Node, WASM/WebGPU in the browser. No network, no API key, no GPU requirement.
OllamaEmbeddingsAdapter — POSTs to a native Ollama /api/embed endpoint over raw fetch. No SDK, runs unchanged everywhere fetch exists. Point baseURL at your local or remote Ollama instance.

All four expose embed / embedMany / dimensions / preload / reset / isAvailable with the same signatures, return plain number[], and handle query/document prefixes identically. You can swap one for another by changing the constructor and nothing else.

Vectors are runtime-specific — one backend per corpus

Embeddings from different runtimes are not interchangeable, even for the same model: transformers.js vs WebLLM/MLC vectors for the same model can sit at only ~0.77 cosine, and even native-ONNX (Node) vs WASM-ONNX (browser) can differ subtly. A vector index must be built and queried by one backend — don't embed documents with one battery and queries with another.
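One way to enforce the one-backend rule is to store a small fingerprint next to the index and compare it before every query. The sketch below is illustrative only: `IndexFingerprint`, `assertSameBackend`, and the backend label strings are hypothetical names, not part of ADK.

```typescript
// Hypothetical metadata saved alongside a vector index (not an ADK type).
interface IndexFingerprint {
  backend: string // free-form label you choose, e.g. 'transformers_js-node' or 'webllm'
  model: string
  dimensions: number
}

// Throws when the embedder used for querying differs from the one that built the index.
function assertSameBackend(index: IndexFingerprint, query: IndexFingerprint): void {
  const keys: (keyof IndexFingerprint)[] = ['backend', 'model', 'dimensions']
  const mismatched = keys.filter((key) => index[key] !== query[key])
  if (mismatched.length > 0) {
    throw new Error(`Embedding backend mismatch on: ${mismatched.join(', ')}`)
  }
}
```

Calling this once at startup, with the fingerprint read from the index and the one describing the live embedder, turns a silent drop in ranking quality into an immediate error.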

A minimal on-device (no API key, no network) example with the transformers.js battery:

typescript

import { TransformersJsEmbeddingsAdapter } from '@nhtio/adk/batteries/embeddings/transformers_js'

const embedder = new TransformersJsEmbeddingsAdapter({
  model: 'onnx-community/all-MiniLM-L6-v2-ONNX',
  dtype: 'fp32',          // 'auto' | 'fp32' | 'fp16' | 'q8' | 'q4' | … (must exist for the model)
})
const [vec] = await embedder.embedMany(['hello world'])  // number[] — pooled + L2-normalised

One shape, four engines

typescript

import { OpenAIEmbeddingsAdapter } from '@nhtio/adk/batteries/embeddings/openai'

const embedder = new OpenAIEmbeddingsAdapter({
  model: 'text-embedding-3-small',
  apiKey: process.env.OPENAI_API_KEY,
})

const qv = await embedder.embed('how do trust tiers work?', { kind: 'query' })
const docs = await embedder.embedMany(corpusTexts) // kind defaults to 'document'

The WebLLM battery is the same call shape — only the constructor differs:

typescript

import { WebLLMEmbeddingsAdapter } from '@nhtio/adk/batteries/embeddings/webllm'

const embedder = new WebLLMEmbeddingsAdapter({
  model: 'snowflake-arctic-embed-m-q0f32-MLC',
  // Arctic is asymmetric: prefix queries, leave documents bare.
  queryPrefix: 'Represent this sentence for searching relevant passages: ',
  onInitProgress: (r) => console.log(r.text),
})

if (!embedder.isAvailable()) throw new Error('WebGPU required for the WebLLM embedder')
await embedder.preload() // warm the engine before the first query

const qv = await embedder.embed('how do trust tiers work?', { kind: 'query' })

Return type: `number[]`, not `Float32Array`

embed returns number[] and embedMany returns number[][] — the native shape of the OpenAI /v1/embeddings response (encoding_format: 'float'), Ollama's response, and WebLLM's Embedding.embedding. If you pack vectors into a contiguous typed-array buffer for fast cosine math, coerce at your boundary:

typescript

const vec = new Float32Array(await embedder.embed(text, { kind: 'query' }))
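For a whole corpus, the same coercion can be done once into a single contiguous buffer. The following is a minimal sketch, assuming every vector has the same length; `packVectors` and `cosineAt` are hypothetical helpers written for illustration, not ADK exports.

```typescript
// Pack equal-length vectors row by row into one contiguous Float32Array.
function packVectors(vectors: number[][]): { data: Float32Array; dims: number } {
  const dims = vectors[0]?.length ?? 0
  const data = new Float32Array(vectors.length * dims)
  vectors.forEach((vec, row) => {
    if (vec.length !== dims) {
      throw new Error(`Row ${row} has ${vec.length} dims, expected ${dims}`)
    }
    data.set(vec, row * dims)
  })
  return { data, dims }
}

// Cosine similarity between a query vector and one row of the packed buffer.
function cosineAt(data: Float32Array, dims: number, row: number, query: Float32Array): number {
  const offset = row * dims
  let dot = 0
  let normA = 0
  let normB = 0
  for (let i = 0; i < dims; i++) {
    const a = data[offset + i]
    const b = query[i]
    dot += a * b
    normA += a * a
    normB += b * b
  }
  const denom = Math.sqrt(normA) * Math.sqrt(normB)
  return denom === 0 ? 0 : dot / denom // zero vectors score 0 instead of NaN
}
```

If the vectors are already L2-normalised, the norm terms are redundant and the dot product alone gives the cosine.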

`model` is required — no default

No battery defaults the model. The right embedding model is a deployment decision (dimensionality, language, cost, latency), so you must name it. A missing or empty model throws at construction:

typescript

new OpenAIEmbeddingsAdapter({}) // throws E_INVALID_OPENAI_EMBEDDINGS_OPTIONS

Query vs document prefixes

Asymmetric embedding models expect an instruction prefix on the query side and none on the document side. The kind option drives this from one shared code path:

kind: 'query' → prepend queryPrefix (if set).
kind: 'document' → prepend documentPrefix (if set). Default when kind is omitted.

Set the prefixes once on the constructor; the battery applies them per call. Symmetric models (e.g. OpenAI text-embedding-3-*) need no prefix — leave both unset.
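The rule above can be sketched as a single function. This is an illustration of the described behaviour, not the battery's actual source; `applyPrefix`, `EmbedKind`, and `PrefixOptions` are hypothetical names, while `queryPrefix`, `documentPrefix`, and `kind` mirror the options named in the text.

```typescript
type EmbedKind = 'query' | 'document'

interface PrefixOptions {
  queryPrefix?: string
  documentPrefix?: string
}

// Pick the prefix for the given kind; kind defaults to 'document' when omitted.
function applyPrefix(text: string, prefixes: PrefixOptions, kind: EmbedKind = 'document'): string {
  const prefix = kind === 'query' ? prefixes.queryPrefix : prefixes.documentPrefix
  return prefix ? prefix + text : text // unset or empty prefix leaves the text untouched
}
```

With only `queryPrefix` set, as in the Arctic example, queries gain the instruction and documents pass through bare.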

Wiring an embedder into retrieval

Embedders produce vectors; you own the vector store and the ranking. The pattern is: embed the query in turnInputPipeline, search your store, and inject the hits as Retrievable records with the similarity in score. The executor renders those records inside trust-tier envelopes — see Bring your own retrieval.

typescript

import { Retrievable } from '@nhtio/adk/common'
import { OpenAIEmbeddingsAdapter } from '@nhtio/adk/batteries/embeddings/openai'

const embedder = new OpenAIEmbeddingsAdapter({
  model: 'text-embedding-3-small',
  apiKey: process.env.OPENAI_API_KEY,
})

const retrievalMiddleware = async (ctx, next) => {
  const query = [...ctx.turnMessages].at(-1)?.content.toString() ?? ''
  if (query) {
    const qv = await embedder.embed(query, { kind: 'query' })
    const hits = await myVectorStore.search(qv, { topK: 5 }) // your store, your search

    for (const hit of hits) {
      const now = new Date()
      ctx.turnRetrievables.add(
        new Retrievable({
          id: hit.id,
          content: hit.text,
          trustTier: 'first-party',
          source: hit.url,
          score: hit.similarity, // the embedding-derived rank lands here
          createdAt: now,
          updatedAt: now,
        })
      )
    }
  }
  return next()
}

Embed your corpus the same way at ingest time: call embedMany(texts) (kind defaults to 'document'), store the vectors, and you are done. ADK plays no part in ingestion; it only consumes the Retrievable records your middleware produces.
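For small corpora, the store side can be as simple as a brute-force scan. The sketch below is one possible shape, assuming hits carry `id`, `text`, `url`, and `similarity` as the middleware above expects; `InMemoryVectorStore`, `ingest`, and `cosine` are hypothetical names, and `embedMany` is passed in as a plain function so the sketch does not depend on any particular battery.

```typescript
interface StoredDoc { id: string; text: string; url: string; vector: number[] }
interface Hit { id: string; text: string; url: string; similarity: number }

function cosine(a: number[], b: number[]): number {
  if (a.length !== b.length) throw new Error('Vector length mismatch')
  let dot = 0
  let normA = 0
  let normB = 0
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i]
    normA += a[i] * a[i]
    normB += b[i] * b[i]
  }
  const denom = Math.sqrt(normA) * Math.sqrt(normB)
  return denom === 0 ? 0 : dot / denom
}

class InMemoryVectorStore {
  private docs: StoredDoc[] = []

  add(doc: StoredDoc): void {
    this.docs.push(doc)
  }

  // Brute-force scan: score every document, sort descending, keep the top K.
  search(query: number[], opts: { topK: number }): Hit[] {
    return this.docs
      .map(({ vector, ...rest }) => ({ ...rest, similarity: cosine(query, vector) }))
      .sort((a, b) => b.similarity - a.similarity)
      .slice(0, opts.topK)
  }
}

// Ingest: embed all texts in one batch, then store each vector with its document.
async function ingest(
  store: InMemoryVectorStore,
  docs: { id: string; text: string; url: string }[],
  embedMany: (texts: string[]) => Promise<number[][]>
): Promise<void> {
  const vectors = await embedMany(docs.map((d) => d.text))
  docs.forEach((doc, i) => store.add({ ...doc, vector: vectors[i] }))
}
```

Here `search` is synchronous, which still works with the `await myVectorStore.search(...)` call in the middleware because `await` accepts plain values. A real deployment would likely swap this for an approximate-nearest-neighbour index once the corpus grows.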

Optional peer dependencies

The WebLLM battery imports @mlc-ai/web-llm lazily, and the transformers.js battery imports @huggingface/transformers lazily — both declared as optional peer dependencies. Consumers who only use the OpenAI or Ollama batteries (or no embeddings at all) install nothing extra and pay nothing in their bundle. Install the peer for the on-device battery you want:

bash

pnpm add @huggingface/transformers   # for TransformersJsEmbeddingsAdapter
pnpm add @mlc-ai/web-llm             # for WebLLMEmbeddingsAdapter

Note that the transformers.js Node backend (onnxruntime-node) is a compiled native addon, which makes it a platform-specific consideration for Lambda, Alpine, and ARM deployments.
