Encoders: vectors or text

The store's job is to hold number[] vectors and rank them. It does not embed text, and it does not depend on anything that does. The moment a battery reaches for "the" embedding model, it has made an infrastructure decision that isn't its to make, so the vector battery doesn't. Text-to-vector conversion is a callback you supply, with the same shape no matter where your embeddings come from.

The contract

typescript

type VectorEncoderFn = (texts: string[], kind: 'query' | 'document') => Promise<number[][]>

The signature is batch-shaped on purpose: one round-trip for many texts, which is how most embedding endpoints expect to be called. To encode a single text, call encode([t]) and take the first result. The kind parameter distinguishes a query from a document, because many modern retrieval encoders are asymmetric: they encode the thing you're searching for differently from the things you're searching through. The base passes 'query' for .nearText() and 'document' for text upserts, so your encoder can prefix or route accordingly.
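As a minimal sketch of an encoder that uses kind, the snippet below prefixes each text before a single batched call. The prefixes, prefixFor, embedBatch, and encodeOne are illustrative assumptions, not part of the battery; embedBatch is a toy stand-in that returns 2-dimensional vectors so the sketch runs without a model.

```typescript
// Local copy of the contract so the sketch is self-contained.
type VectorEncoderFn = (texts: string[], kind: 'query' | 'document') => Promise<number[][]>

// Hypothetical prefixes; check your model's documentation for the real ones.
const prefixFor = (kind: 'query' | 'document'): string =>
  kind === 'query' ? 'query: ' : 'passage: '

// Hypothetical stand-in for a real embedding call: returns a toy
// 2-dimensional vector per input (length, and a flag for the query prefix).
const embedBatch = async (inputs: string[]): Promise<number[][]> =>
  inputs.map((s) => [s.length, s.startsWith('query: ') ? 1 : 0])

// The encoder: apply the kind-specific prefix, then make one batched call.
const prefixingEncoder: VectorEncoderFn = (texts, kind) =>
  embedBatch(texts.map((t) => prefixFor(kind) + t))

// Single-text convenience: send a batch of one and take the first result.
const encodeOne = async (text: string, kind: 'query' | 'document'): Promise<number[]> => {
  const [vector] = await prefixingEncoder([text], kind)
  return vector
}
```

An encoder that routes to two different endpoints by kind would have the same outer shape; only the body changes.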

Two sources, checked in order

An adapter gets a vector for text from exactly one of two places:

1. Built-in server-side encoding. Some backends embed text themselves, for example Pinecone integrated inference or Weaviate's text2vec-* modules. These declare capabilities.builtInEncoding === true; the adapter ships raw text to the backend and no callback is involved.
2. A supplied encoder. Every backend without built-in encoding takes your VectorEncoderFn in options.
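The order of the two checks can be sketched roughly as follows. AdapterInfo and pickSource are hypothetical names for illustration, not the battery's real types, and the thrown Error is an assumption: only the E_VECTOR_STORE_ENCODER_REQUIRED code name comes from the documented behavior.

```typescript
type Kind = 'query' | 'document'
type VectorEncoderFn = (texts: string[], kind: Kind) => Promise<number[][]>

// Hypothetical, simplified view of an adapter; not the library's real types.
interface AdapterInfo {
  name: string
  capabilities: { builtInEncoding: boolean }
  encoder?: VectorEncoderFn
}

// Decide where a vector for text would come from, in the documented order.
function pickSource(adapter: AdapterInfo): 'backend' | 'encoder' {
  // 1. Built-in encoding: raw text goes to the backend, no callback involved.
  if (adapter.capabilities.builtInEncoding) return 'backend'
  // 2. Otherwise the supplied encoder is the only option.
  if (adapter.encoder) return 'encoder'
  // Neither source: fail, naming the adapter. The error shape is assumed.
  throw new Error(
    `E_VECTOR_STORE_ENCODER_REQUIRED: ${adapter.name} has no encoder; supply one or use .nearVector()`
  )
}

// Stub encoder returning fixed 3-dimensional vectors.
const stubEncoder: VectorEncoderFn = async (texts) => texts.map(() => [0, 0, 0])

const withBuiltIn: AdapterInfo = { name: 'with-built-in', capabilities: { builtInEncoding: true } }
const withEncoder: AdapterInfo = {
  name: 'with-encoder',
  capabilities: { builtInEncoding: false },
  encoder: stubEncoder,
}
const bare: AdapterInfo = { name: 'bare', capabilities: { builtInEncoding: false } }
```

Here pickSource(withBuiltIn) selects the backend, pickSource(withEncoder) selects the callback, and pickSource(bare) throws.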

typescript

const vs = await createVectorStore({
  client: () => import('@nhtio/adk/batteries/vector/qdrant').then((m) => m.QdrantVectorStore),
  options: {
    metric: 'cosine',
    dimensions: 384,
    encoder: myEncoder, // a VectorEncoderFn
    connection: { url: 'http://localhost:6333' },
  },
})

await vs('docs').nearText('how do I register a tool').select('id').limit(5)
// the base calls myEncoder(['how do I register a tool'], 'query'), then searches with the vector

No encoder, no text

If an adapter has neither built-in encoding nor an encoder, then .nearText() throws E_VECTOR_STORE_ENCODER_REQUIRED, and so does a text-only .upsert() (a record with a document but no vector). The error names the adapter and tells you to supply an encoder or use .nearVector(). .nearVector() and precomputed vector fields always work; only the path that needs embedding throws. The failure surfaces at query/execute time, because passing text is a runtime decision.

There is deliberately no serverSideEmbedding option. Whether a backend embeds server-side is a fixed fact about that backend (capabilities.builtInEncoding), not a switch you flip — flipping it wouldn't make a backend that can't embed start embedding. Read the capability; don't try to toggle it.

The one seam to the Embeddings battery

The vector battery never imports the Embeddings battery. But if you're already using it, you don't hand-write an adapter function — encoderFromEmbeddings() does it:

typescript

import { createVectorStore, encoderFromEmbeddings } from '@nhtio/adk/batteries/vector'
import { OpenAIEmbeddingsAdapter } from '@nhtio/adk/batteries/embeddings/openai'

const vs = await createVectorStore({
  client: () => import('@nhtio/adk/batteries/vector/qdrant').then((m) => m.QdrantVectorStore),
  options: {
    dimensions: 1536,
    encoder: encoderFromEmbeddings(new OpenAIEmbeddingsAdapter({ model: 'text-embedding-3-small' })),
    connection: { url: 'http://localhost:6333' },
  },
})

encoderFromEmbeddings(adapter) wraps anything with the embeddings-battery shape — an object exposing embedMany(texts, { kind }) — into a VectorEncoderFn. It is the only coupling point between the two batteries, it is opt-in, and it lives in the vector battery, so importing the vector core pulls nothing from embeddings. A hand-written VectorEncoderFn (a fetch to your own embedding service, a local model, a stub) is exactly as first-class — the helper is a convenience, never a privileged path.
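To show how thin such a wrapper can be, here is an illustrative re-implementation, not the library's source. The EmbedsMany interface, its Promise<number[][]> return type, and the name wrapEmbeddings are assumptions made for the sketch; only the embedMany(texts, { kind }) call shape comes from the description above.

```typescript
type Kind = 'query' | 'document'
type VectorEncoderFn = (texts: string[], kind: Kind) => Promise<number[][]>

// Assumed shape of an embeddings adapter; the return type is a guess.
interface EmbedsMany {
  embedMany(texts: string[], options: { kind: Kind }): Promise<number[][]>
}

// Adapt the object-with-method shape to the plain-function contract.
const wrapEmbeddings = (adapter: EmbedsMany): VectorEncoderFn =>
  (texts, kind) => adapter.embedMany(texts, { kind })

// A stub adapter that records how it was called and returns fixed vectors.
const calls: Array<{ texts: string[]; kind: Kind }> = []
const stubAdapter: EmbedsMany = {
  async embedMany(texts, { kind }) {
    calls.push({ texts, kind })
    return texts.map(() => [0, 0, 0])
  },
}

const wrapped = wrapEmbeddings(stubAdapter)
```

Because the wrapper only forwards arguments, a hand-written encoder and a wrapped adapter are indistinguishable to the store.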

typescript

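// equally valid: any function of the right shape
const encoder: VectorEncoderFn = async (texts, kind) => {
  const res = await fetch('https://my-embeddings.internal/embed', {
    method: 'POST',
    headers: { 'Content-Type': 'application/json' },
    body: JSON.stringify({ texts, kind }),
  })
  // fail loudly instead of parsing an error page as vectors
  if (!res.ok) throw new Error(`embedding request failed: ${res.status}`)
  return (await res.json()).vectors // number[][]
}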

Keep one embedding runtime per corpus

Vectors are only comparable if they came from the same embedding model and runtime. Mixing encoders within one collection produces vectors that don't share a space, and similarity scores between them are noise. Pick one encoder per collection and use it for both indexing and querying. (This is the lesson behind the Ask ADK showcase keeping a single runtime for both index-build and query.)
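One cheap guard, sketched here as an assumption rather than anything the battery provides, is to wrap the encoder so that every returned vector is checked against the collection's configured dimensions. withDimensionCheck and assertDimensions are hypothetical helpers. The check catches only encoders with different output sizes; two different models that happen to share a dimension pass silently, so it complements the rule above and does not replace it.

```typescript
type Kind = 'query' | 'document'
type VectorEncoderFn = (texts: string[], kind: Kind) => Promise<number[][]>

// Throw if any vector's length differs from the expected dimension count.
function assertDimensions(vectors: number[][], dimensions: number): void {
  vectors.forEach((v, i) => {
    if (v.length !== dimensions) {
      throw new Error(`vector ${i} has ${v.length} dimensions, expected ${dimensions}`)
    }
  })
}

// Wrap an encoder so its output is validated before it reaches the store.
const withDimensionCheck = (encoder: VectorEncoderFn, dimensions: number): VectorEncoderFn =>
  async (texts, kind) => {
    const vectors = await encoder(texts, kind)
    assertDimensions(vectors, dimensions)
    return vectors
  }
```

The wrapped function is still a VectorEncoderFn, so it can be passed as the encoder option unchanged.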

Where to go next

The Query Builder & Filters — .nearText() vs .nearVector() vs .nearId().
Embeddings batteries — the bundled embedders encoderFromEmbeddings() wraps.
Retrievable Glue — the encoder lives on the store; the glue passes text through and the store resolves it.
