Encoders: vectors or text
The store's job is to hold number[] and rank them. It does not embed text, and it does not depend on anything that does. The moment a battery starts reaching for "the" embedding model, it has made an infrastructure decision that isn't its to make — so the vector battery doesn't. Text-to-vector is a callback you supply, with the same shape no matter where your embeddings come from.
The contract
type VectorEncoderFn = (texts: string[], kind: 'query' | 'document') => Promise<number[][]>Batch-shaped on purpose — one round-trip for many texts, because that's how every real embedding endpoint wants to be called. A single encode is encode([t]) and take the first result. The kind parameter distinguishes a query from a document, because modern retrieval encoders are asymmetric: they encode the thing you're searching for differently from the things you're searching through. The base threads 'query' for .nearText() and 'document' for text upserts, so your encoder can prefix or route accordingly.
Two sources, checked in order
An adapter gets a vector for text from exactly one of two places:
- Built-in server-side encoding. Some backends embed text themselves — Pinecone integrated inference, Weaviate's
text2vec-*modules. These declarecapabilities.builtInEncoding === true; the adapter ships raw text to the backend and no callback is involved. - A supplied
encoder. Every backend without built-in encoding takes yourVectorEncoderFnin options.
const vs = await createVectorStore({
client: () => import('@nhtio/adk/batteries/vector/qdrant').then((m) => m.QdrantVectorStore),
options: {
metric: 'cosine',
dimensions: 384,
encoder: myEncoder, // a VectorEncoderFn
connection: { url: 'http://localhost:6333' },
},
})
await vs('docs').nearText('how do I register a tool').select('id').limit(5)
// the base calls myEncoder(['how do I register a tool'], 'query'), then searches with the vectorNo encoder, no text
If an adapter has neither built-in encoding nor an encoder, then .nearText() — and a text-only .upsert() (a record with a document but no vector) — throws E_VECTOR_STORE_ENCODER_REQUIRED, naming the adapter and telling you to supply an encoder or use .nearVector(). .nearVector() and precomputed vector fields always work; the throw is only on the path that needs embedding. The failure is at query/execute time, because passing text is a runtime decision.
There is deliberately no serverSideEmbedding option. Whether a backend embeds server-side is a fixed fact about that backend (capabilities.builtInEncoding), not a switch you flip — flipping it wouldn't make a backend that can't embed start embedding. Read the capability; don't try to toggle it.
The one seam to the Embeddings battery
The vector battery never imports the Embeddings battery. But if you're already using it, you don't hand-write an adapter function — encoderFromEmbeddings() does it:
import { createVectorStore, encoderFromEmbeddings } from '@nhtio/adk/batteries/vector'
import { OpenAIEmbeddingsAdapter } from '@nhtio/adk/batteries/embeddings/openai'
const vs = await createVectorStore({
client: () => import('@nhtio/adk/batteries/vector/qdrant').then((m) => m.QdrantVectorStore),
options: {
dimensions: 1536,
encoder: encoderFromEmbeddings(new OpenAIEmbeddingsAdapter({ model: 'text-embedding-3-small' })),
connection: { url: 'http://localhost:6333' },
},
})encoderFromEmbeddings(adapter) wraps anything with the embeddings-battery shape — an object exposing embedMany(texts, { kind }) — into a VectorEncoderFn. It is the only coupling point between the two batteries, it is opt-in, and it lives in the vector battery, so importing the vector core pulls nothing from embeddings. A hand-written VectorEncoderFn (a fetch to your own embedding service, a local model, a stub) is exactly as first-class — the helper is a convenience, never a privileged path.
// equally valid: any function of the right shape
const encoder: VectorEncoderFn = async (texts, kind) => {
const res = await fetch('https://my-embeddings.internal/embed', {
method: 'POST',
body: JSON.stringify({ texts, kind }),
})
return (await res.json()).vectors // number[][]
}Keep one embedding runtime per corpus
Vectors are only comparable if they came from the same embedding model and runtime. Mixing encoders within one collection produces vectors that don't share a space, and similarity scores between them are noise. Pick one encoder per collection and index + query with it. (This is the lesson behind the Ask ADK showcase keeping a single runtime for both index-build and query.)
Where to go next
- The Query Builder & Filters —
.nearText()vs.nearVector()vs.nearId(). - Embeddings batteries — the bundled embedders
encoderFromEmbeddings()wraps. - Retrievable Glue — the encoder lives on the store; the glue passes text through and the store resolves it.