---
url: 'https://adk.nht.io/batteries/vector/encoders.md'
description: >-
  The BYO VectorEncoderFn contract, built-in server-side encoding, and
  encoderFromEmbeddings() — the seam to the Embeddings battery.
---

# Encoders: vectors or text

## LLM summary — Encoders

* The store holds `number[]`. Turning text into a vector is a **BYO contractually-enforced callback**, NOT a battery dependency: `VectorEncoderFn = (texts: string[], kind: 'query' | 'document') => Promise<number[][]>`. Batch-shaped — one round-trip for many texts.
* An adapter resolves text from exactly one of two sources, checked in order: (1) **built-in server-side encoding** — `capabilities.builtInEncoding === true` (Pinecone integrated inference, Weaviate `text2vec-*`); the adapter sends raw text, no callback needed. (2) a supplied `encoder` option (a `VectorEncoderFn`).
* With neither, `.nearText()` (and a text-only `.upsert()` with `document` but no `vector`) throws `E_VECTOR_STORE_ENCODER_REQUIRED`, naming the adapter. `.nearVector()` / precomputed `vector` always work regardless.
* The base resolves: array input passes through; query text → `encoder(texts, 'query')`; document text on upsert → `encoder(texts, 'document')` (batched); built-in-encoding adapters route raw text to the backend instead.
* `encoderFromEmbeddings(adapter)` (exported from the core barrel) wraps any object with an embeddings-battery shape (`embedMany(texts, { kind })`) into a `VectorEncoderFn`. It is the ONLY coupling point to the embeddings battery, it is opt-in, and it lives in the vector battery so nothing is imported transitively. A hand-written `VectorEncoderFn` is equally first-class.
* `serverSideEmbedding` is NOT an option — server-side encoding is a fixed adapter capability (`builtInEncoding`), not a user toggle. The encoder is validated as a function at construction.
* `kind` distinguishes query vs document encoding so asymmetric embedding models (most modern retrieval encoders) can prefix/encode the two differently.

The store's job is to hold `number[]` and rank them. It does not embed text, and it does not depend on anything that does. The moment a battery starts reaching for "the" embedding model, it has made an infrastructure decision that isn't its to make — so the vector battery doesn't. Text-to-vector is a callback you supply, with the same shape no matter where your embeddings come from.

## The contract

```typescript
type VectorEncoderFn = (texts: string[], kind: 'query' | 'document') => Promise<number[][]>
```

Batch-shaped on purpose — one round-trip for many texts, because that's how every real embedding endpoint wants to be called. A single encode is `encode([t])` and take the first result. The `kind` parameter distinguishes a **query** from a **document**, because modern retrieval encoders are asymmetric: they encode the thing you're searching for differently from the things you're searching through. The base threads `'query'` for `.nearText()` and `'document'` for text upserts, so your encoder can prefix or route accordingly.

## Two sources, checked in order

An adapter gets a vector for text from exactly one of two places:

1. **Built-in server-side encoding.** Some backends embed text themselves — Pinecone integrated inference, Weaviate's `text2vec-*` modules. These declare `capabilities.builtInEncoding === true`; the adapter ships raw text to the backend and no callback is involved.
2. **A supplied `encoder`.** Every backend without built-in encoding takes your `VectorEncoderFn` in options.

```typescript
const vs = await createVectorStore({
  client: () => import('@nhtio/adk/batteries/vector/qdrant').then((m) => m.QdrantVectorStore),
  options: {
    metric: 'cosine',
    dimensions: 384,
    encoder: myEncoder, // a VectorEncoderFn
    connection: { url: 'http://localhost:6333' },
  },
})

await vs('docs').nearText('how do I register a tool').select('id').limit(5)
// the base calls myEncoder(['how do I register a tool'], 'query'), then searches with the vector
```

::: danger No encoder, no text
If an adapter has neither built-in encoding nor an `encoder`, then `.nearText()` — and a text-only `.upsert()` (a record with a `document` but no `vector`) — throws `E_VECTOR_STORE_ENCODER_REQUIRED`, naming the adapter and telling you to supply an `encoder` or use `.nearVector()`. `.nearVector()` and precomputed `vector` fields always work; the throw is only on the path that needs embedding. The failure is at query/execute time, because passing text is a runtime decision.
:::

There is deliberately **no `serverSideEmbedding` option**. Whether a backend embeds server-side is a fixed fact about that backend (`capabilities.builtInEncoding`), not a switch you flip — flipping it wouldn't make a backend that can't embed start embedding. Read the capability; don't try to toggle it.

## The one seam to the Embeddings battery

The vector battery never imports the [Embeddings battery](../../assembly/batteries-embeddings). But if you're already using it, you don't hand-write an adapter function — `encoderFromEmbeddings()` does it:

```typescript
import { createVectorStore, encoderFromEmbeddings } from '@nhtio/adk/batteries/vector'
import { OpenAIEmbeddingsAdapter } from '@nhtio/adk/batteries/embeddings/openai'

const vs = await createVectorStore({
  client: () => import('@nhtio/adk/batteries/vector/qdrant').then((m) => m.QdrantVectorStore),
  options: {
    dimensions: 1536,
    encoder: encoderFromEmbeddings(new OpenAIEmbeddingsAdapter({ model: 'text-embedding-3-small' })),
    connection: { url: 'http://localhost:6333' },
  },
})
```

`encoderFromEmbeddings(adapter)` wraps anything with the embeddings-battery shape — an object exposing `embedMany(texts, { kind })` — into a `VectorEncoderFn`. It is the **only** coupling point between the two batteries, it is **opt-in**, and it lives in the vector battery, so importing the vector core pulls nothing from embeddings. A hand-written `VectorEncoderFn` (a `fetch` to your own embedding service, a local model, a stub) is exactly as first-class — the helper is a convenience, never a privileged path.

```typescript
// equally valid: any function of the right shape
const encoder: VectorEncoderFn = async (texts, kind) => {
  const res = await fetch('https://my-embeddings.internal/embed', {
    method: 'POST',
    body: JSON.stringify({ texts, kind }),
  })
  return (await res.json()).vectors // number[][]
}
```

::: tip Keep one embedding runtime per corpus
Vectors are only comparable if they came from the same embedding model and runtime. Mixing encoders within one collection produces vectors that don't share a space, and similarity scores between them are noise. Pick one encoder per collection and index + query with it. (This is the lesson behind the [Ask ADK showcase](../../showcase/ask-adk) keeping a single runtime for both index-build and query.)
:::

## Where to go next

* [The Query Builder & Filters](./query-builder) — `.nearText()` vs `.nearVector()` vs `.nearId()`.
* [Embeddings batteries](../../assembly/batteries-embeddings) — the bundled embedders `encoderFromEmbeddings()` wraps.
* [Retrievable Glue](./retrievable) — the encoder lives on the store; the glue passes text through and the store resolves it.
