Embeddings batteries

ADK has no embeddings primitive. The harness only ever sees a Retrievable and its score — never a vector. So these batteries are not executors and they do not plug into a callback slot. They are embedders: you call them from your own retrieval middleware to turn text into vectors, rank your corpus, and inject the winners as Retrievable records. See Bring your own retrieval for where that middleware lives.

ADK ships two embedders, and they are deliberately the same battery in every respect except the engine:

  • OpenAIEmbeddingsAdapter: POSTs to any OpenAI-compatible /v1/embeddings endpoint over raw fetch. No SDK, so it runs unchanged in Node, the browser, edge runtimes, and workers. Point baseURL at OpenAI, Azure-behind-a-proxy, vLLM, Together, or a local gateway.
  • WebLLMEmbeddingsAdapter: embeds in-process on WebGPU via @mlc-ai/web-llm's engine.embeddings.create(). No network, no API key. Browser-only.

Both expose embed / embedMany / dimensions / preload / reset / isAvailable with the same signatures, return plain number[], and handle query/document prefixes identically. You can swap one for the other by changing the constructor and nothing else.

One shape, two engines

typescript
import { OpenAIEmbeddingsAdapter } from '@nhtio/adk/batteries/embeddings/openai'

const embedder = new OpenAIEmbeddingsAdapter({
  model: 'text-embedding-3-small',
  apiKey: process.env.OPENAI_API_KEY,
})

const qv = await embedder.embed('how do trust tiers work?', { kind: 'query' })
const docs = await embedder.embedMany(corpusTexts) // kind defaults to 'document'

The WebLLM battery is the same call shape — only the constructor differs:

typescript
import { WebLLMEmbeddingsAdapter } from '@nhtio/adk/batteries/embeddings/webllm'

const embedder = new WebLLMEmbeddingsAdapter({
  model: 'snowflake-arctic-embed-m-q0f32-MLC',
  // Arctic is asymmetric: prefix queries, leave documents bare.
  queryPrefix: 'Represent this sentence for searching relevant passages: ',
  onInitProgress: (r) => console.log(r.text),
})

if (!embedder.isAvailable()) throw new Error('WebGPU required for the WebLLM embedder')
await embedder.preload() // warm the engine before the first query

const qv = await embedder.embed('how do trust tiers work?', { kind: 'query' })

Return type: number[], not Float32Array

embed returns number[] and embedMany returns number[][] — the native shape of both the OpenAI /v1/embeddings response (encoding_format: 'float') and WebLLM's Embedding.embedding. If you pack vectors into a contiguous typed-array buffer for fast cosine math, coerce at your boundary:

typescript
const vec = new Float32Array(await embedder.embed(text, { kind: 'query' }))
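Once vectors are in typed arrays, the cosine math itself is a short loop. The following is a minimal sketch, not part of ADK: cosineSimilarity is a hypothetical helper name, and it assumes both vectors come from the same model and therefore have the same length.

```typescript
// Hypothetical helper (not an ADK export): cosine similarity of two
// equal-length vectors. Returns 0 if either vector has zero magnitude,
// so the caller never sees NaN from a division by zero.
function cosineSimilarity(a: Float32Array, b: Float32Array): number {
  if (a.length !== b.length) {
    throw new Error(`dimension mismatch: ${a.length} vs ${b.length}`)
  }
  let dot = 0
  let normA = 0
  let normB = 0
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i]
    normA += a[i] * a[i]
    normB += b[i] * b[i]
  }
  const denom = Math.sqrt(normA) * Math.sqrt(normB)
  return denom === 0 ? 0 : dot / denom
}
```

If you normalize every vector to unit length at ingest time, the same ranking falls out of a plain dot product, which is a common optimization for larger corpora.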

model is required — no default

Neither battery supplies a default model. The right embedding model is a deployment decision (dimensionality, language, cost, latency), so you must name it. A missing or empty model throws at construction:

typescript
new OpenAIEmbeddingsAdapter({}) // throws E_INVALID_OPENAI_EMBEDDINGS_OPTIONS

Query vs document prefixes

Asymmetric embedding models expect an instruction prefix on the query side and none on the document side. The kind option drives this from one shared code path:

  • kind: 'query' → prepend queryPrefix (if set).
  • kind: 'document' → prepend documentPrefix (if set). Default when kind is omitted.

Set the prefixes once on the constructor; the battery applies them per call. Symmetric models (e.g. OpenAI text-embedding-3-*) need no prefix — leave both unset.
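The rule above is small enough to sketch in a few lines. This is an illustration of the described behavior only, not the batteries' actual source; applyPrefix, EmbedKind, and PrefixOptions are hypothetical names introduced here.

```typescript
// Illustrative only: mirrors the prefix rule described above.
// These names are hypothetical and are not ADK exports.
type EmbedKind = 'query' | 'document'

interface PrefixOptions {
  queryPrefix?: string
  documentPrefix?: string
}

function applyPrefix(
  text: string,
  kind: EmbedKind = 'document', // documents are the default when kind is omitted
  opts: PrefixOptions = {}
): string {
  const prefix = kind === 'query' ? opts.queryPrefix : opts.documentPrefix
  // An unset (or empty) prefix leaves the text untouched.
  return prefix ? prefix + text : text
}

// Asymmetric setup: prefix queries, leave documents bare.
const arcticPrefixes: PrefixOptions = {
  queryPrefix: 'Represent this sentence for searching relevant passages: ',
}
const prefixedQuery = applyPrefix('how do trust tiers work?', 'query', arcticPrefixes)
const bareDocument = applyPrefix('Trust tiers classify retrieved content.', 'document', arcticPrefixes)
```

Because the prefix becomes part of the embedded text, changing a prefix after ingestion means the stored document vectors no longer match what the model would produce, so treat the prefixes as part of the index configuration.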

Wiring an embedder into retrieval

Embedders produce vectors; you own the vector store and the ranking. The pattern is: embed the query in turnInputPipeline, search your store, and inject the hits as Retrievable records with the similarity in score. The executor renders those records inside trust-tier envelopes — see Bring your own retrieval.

typescript
import { Retrievable } from '@nhtio/adk/common'
import { OpenAIEmbeddingsAdapter } from '@nhtio/adk/batteries/embeddings/openai'

const embedder = new OpenAIEmbeddingsAdapter({
  model: 'text-embedding-3-small',
  apiKey: process.env.OPENAI_API_KEY,
})

const retrievalMiddleware = async (ctx, next) => {
  const query = [...ctx.turnMessages].at(-1)?.content.toString() ?? ''
  if (query) {
    const qv = await embedder.embed(query, { kind: 'query' })
    const hits = await myVectorStore.search(qv, { topK: 5 }) // your store, your search

    for (const hit of hits) {
      const now = new Date()
      ctx.turnRetrievables.add(
        new Retrievable({
          id: hit.id,
          content: hit.text,
          trustTier: 'first-party',
          source: hit.url,
          score: hit.similarity, // the embedding-derived rank lands here
          createdAt: now,
          updatedAt: now,
        })
      )
    }
  }
  return next()
}

Embed your corpus the same way at ingest time: call embedMany(texts), which defaults to kind: 'document', store the vectors, and you are done. ADK plays no part in ingestion; it only consumes the Retrievable records your middleware produces.
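For small corpora, the store behind myVectorStore can be a brute-force in-memory index. The sketch below is one possible shape under stated assumptions, not something ADK ships: InMemoryVectorStore, DocumentEmbedder, SourceDoc, Hit, and ingest are all hypothetical names, and the hit fields (id, text, url, similarity) are chosen only to match what the middleware above reads.

```typescript
// Hypothetical in-memory store; none of these names are ADK exports.
interface SourceDoc { id: string; text: string; url: string }
interface Hit extends SourceDoc { similarity: number }

// Minimal structural type: anything with an embedMany like the batteries'.
interface DocumentEmbedder {
  embedMany(texts: string[]): Promise<number[][]>
}

// Cosine similarity; returns 0 when either vector has zero magnitude.
function cosine(a: Float32Array, b: Float32Array): number {
  let dot = 0
  let normA = 0
  let normB = 0
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i]
    normA += a[i] * a[i]
    normB += b[i] * b[i]
  }
  const denom = Math.sqrt(normA) * Math.sqrt(normB)
  return denom === 0 ? 0 : dot / denom
}

class InMemoryVectorStore {
  private entries: { doc: SourceDoc; vector: Float32Array }[] = []

  // Coerce number[] to Float32Array once, at the storage boundary.
  add(doc: SourceDoc, vector: number[]): void {
    this.entries.push({ doc, vector: new Float32Array(vector) })
  }

  // Brute-force scan: score every entry, sort descending, keep the top K.
  search(queryVector: number[], opts: { topK: number }): Hit[] {
    const q = new Float32Array(queryVector)
    return this.entries
      .map(({ doc, vector }) => ({ ...doc, similarity: cosine(q, vector) }))
      .sort((x, y) => y.similarity - x.similarity)
      .slice(0, opts.topK)
  }
}

// Ingest: one embedMany call for the whole batch, then store each vector.
async function ingest(
  embedder: DocumentEmbedder,
  store: InMemoryVectorStore,
  docs: SourceDoc[]
): Promise<void> {
  const vectors = await embedder.embedMany(docs.map((d) => d.text))
  docs.forEach((doc, i) => store.add(doc, vectors[i]))
}
```

A linear scan like this is fine for a few thousand documents; beyond that you would likely swap in an approximate-nearest-neighbor index or a hosted vector database behind the same search call.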

Optional peer dependency

The WebLLM battery imports @mlc-ai/web-llm lazily, and the package is declared as an optional peer dependency. Consumers who only use the OpenAI battery (or no embeddings at all) install nothing extra and pay nothing in their bundle.