---
url: 'https://adk.nht.io/batteries/vector/query-builder.md'
description: >-
  The knex-style chainable builder, the backend-neutral filter tree, .whereRaw()
  with safe bindings, and the required .select() projection contract.
---

# The Query Builder & Filters

## LLM summary — The Query Builder & Filters

* `vs(collection)` returns a fresh `VectorQueryBuilder`. The builder is **thenable** (`implements PromiseLike<VectorMatch[]>`) — awaiting it runs the query via an internal `#run()`. There is no `.execute()`.
* A chain WITH a `.near*()` clause is a similarity search; the SAME chain WITHOUT one is a metadata filter-scan. Both compile to a `SearchPlan`.
* Filter methods: `.where(field, value)` | `.where(field, op, value)` | `.where({ field: value, … })` | `.where(cb)` (grouping callback); `.andWhere` (alias of where, incl. `cb`); `.orWhere(field, value | op, value)` | `.orWhere(cb)`; `.whereNot(f,v)` (→ `ne`) | `.whereNot(cb)` (→ `{ not: <group> }`); `.orWhereNot(f,v)` | `.orWhereNot(cb)`; `.whereIn(f, values)` (→ `in`); `.whereNotIn(f, values)` (→ `nin`); `.whereNull(f)` (→ `exists:false`); `.whereExists(f)` (→ `exists:true`); `.whereRaw(sql, bindings?)` or `.whereRaw({ $dialect, $raw, $bindings })`. The callback form receives a `FilterBuilder` (exported) — filter methods only — and compiles to one nested `VectorFilter`; groups nest to any depth.
* Operator aliases normalize to the neutral set: `= == === → eq`; `!= <> !== → ne`; `> → gt`; `>= → gte`; `< → lt`; `<= → lte`; plus literal `eq/ne/gt/gte/lt/lte/in/nin/exists/contains`. An unknown operator throws `E_VECTOR_STORE_UNSUPPORTED_FILTER_OPERATOR`.
* Chained `.where()` is AND. `.orWhere(field, …)` opens an alternative branch; the accumulated AND-list becomes the first OR-group (so `where(A).where(B).orWhere(C)` → `(A AND B) OR C`). Compiles to a `FilterGroup` (`{and:[…]}` or `{or:[{and:[…]},…]}`).
* **Grouping callbacks** mix AND and OR to any depth: `.where(cb)` / `.andWhere(cb)` push a nested group into the current AND-list; `.orWhere(cb)` opens an OR branch holding the group; `.whereNot(cb)` / `.orWhereNot(cb)` push a negated group (`{ not: <group> }`). The callback receives a `FilterBuilder` (filter methods only — no `near*`/`select`/`limit`/terminals) and mutates it in place (knex-style); its conditions compile to one nested `VectorFilter`. `.whereNot(field, value)` (scalar) is still `→ ne`. The 6 native filter translators (pgvector, qdrant, milvus, opensearch, elasticsearch, redis) recurse over the nested tree; the over-fetch adapters JS-evaluate it. Exception: **Chroma rejects a `not` group** with `E_VECTOR_STORE_UNSUPPORTED_FILTER_OPERATOR` (same as its `exists`/`contains` limits).
* Similarity clauses are mutually exclusive: a second `.nearVector()`/`.nearText()`/`.nearId()` throws `E_VECTOR_STORE_QUERY_CONFLICT`. `.nearText()` needs an encoder or built-in encoding (see Encoders) else `E_VECTOR_STORE_ENCODER_REQUIRED`; `.nearId(id)` searches by an existing record's vector.
* `.select(...)` is REQUIRED on reads — a read with no `.select()` throws `E_VECTOR_STORE_PROJECTION_REQUIRED` when awaited. Four opt-in columns: `id`, `vector`, `document`, `metadata` (id is opt-in too). Forms: bare strings `.select('id','document')`; tuples `.select(['vector', { name }])`; object `.select({ metadata: { fields: […] } })`; `.select('*')` selects all four. Per-column config: `vector → { name? }` (named vectors), `metadata → { fields? }` (payload subset), `document → { field? }`.
* `.limit(n)` sets top-k (default 10). `.offset(n)` where the backend supports it. `.consistency(mode)` is a per-op read-after-write override (see Consistency).
* Terminals: `.upsert(records)` (returns `Promise<void>`, no projection needed) and `.delete()` (returns `Promise<void>`). `.whereIn('id', ids).delete()` is recognized as a by-id delete (extracted into `DeletePlan.ids`); any other filter compiles to `DeletePlan.filter`.
* `raw(sql, bindings?)` (exported) makes a tagged `RawExpr` (`{ __raw, bindings }`) usable in value positions. `.whereRaw()` string form is sugar for a top-level `RawFilter` with `$dialect:'sql'`. Bindings are NEVER interpolated; placeholder (`?`) count must equal bindings length or `E_VECTOR_STORE_RAW_BINDING_MISMATCH` throws before dispatch.
* The neutral filter tree (`filters.ts`): `FilterCondition { field, op, value }`, `FilterGroup { and?, or?, not? }`, `RawFilter { $dialect, $raw, $bindings? }`. `evaluateFilter(filter, metadata)` is the JS reference evaluator that defines cross-adapter semantics; most adapters over-fetch then JS-filter for exact parity, raw filters are not JS-evaluable and throw.

The builder is the whole public surface. You will spend more time here than anywhere else in the battery, so it is built to be the part you never have to think about: it is knex, pointed at vectors. If you have ever written `knex('users').where('active', true).limit(10)`, you already know this API and can skip to [Filters](#filters).

## One chain, two query kinds

`vs(collection)` opens a fresh builder. What you hang off it decides what kind of query you get — and the difference is exactly one method call:

```typescript
// SIMILARITY SEARCH — has a .near*() clause. "Order by similarity" is implicit; .limit() is top-k.
await vs('docs')
  .where('kind', 'policy')
  .nearText('how do I register a tool')
  .select('id', 'document', 'score')
  .limit(10)

// FILTER-SCAN — the SAME chain, no .near*(). No similarity, no score, just rows that match.
await vs('docs')
  .where('kind', 'policy')
  .select('id', 'document')
  .limit(50)
```

A `.near*()` clause is what turns a scan into a search. Drop it and you have a metadata query; add it and you have a ranked nearest-neighbour search over the same filter. There is no separate "search" vs "query" method to choose between — the shape of the chain is the choice. `VectorMatch.score` shows up only on the similarity path, normalized to `[0,1]` (higher is more similar); on a filter-scan it is simply absent, because there is nothing to score against.

::: tip Awaiting is executing
The builder `implements PromiseLike`, so `await`ing it runs the query. There is no `.execute()`, no `.then()` you call yourself, no `.toSQL()`-style intermediate. A chain you don't `await` (or `.then()`) never runs — it's an inert plan until something resolves it.
:::

## The `.near*()` clauses

Three ways to say "near", mutually exclusive — a second one throws `E_VECTOR_STORE_QUERY_CONFLICT`, because "near A and also near B" is not a query, it's a bug:

| Method | Means | Needs |
| --- | --- | --- |
| `.nearVector(number[])` | Nearest to this vector | Nothing — you brought the vector |
| `.nearText(string)` | Nearest to this text | An `encoder` or built-in encoding (see [Encoders](./encoders)), else `E_VECTOR_STORE_ENCODER_REQUIRED` |
| `.nearId(string)` | Nearest to an existing record's vector | The record to exist in the collection |

`.nearText()` is the one that does work behind your back: the base resolves the text to a vector (via your `VectorEncoderFn`) or hands it to a backend that embeds server-side, then the adapter searches. `.nearId()` is "more like this one" — find the row, use its vector, search.

## Filters

`.where()` carries the full knex overload set, and every operator normalizes to one neutral vocabulary so the adapter underneath doesn't care which spelling you used:

```typescript
vs('docs')
  .where('source', '/assembly/byo-llm')   // (field, value) — defaults to eq
  .where('year', '>=', 2024)              // (field, op, value)
  .where({ kind: 'policy', lang: 'en' })  // object — each key is an eq, all ANDed
  .whereIn('tag', ['a', 'b'])             // → in
  .whereNotIn('tag', ['x'])               // → nin
  .whereNot('status', 'archived')         // → ne
  .whereExists('summary')                 // → exists:true
  .whereNull('deletedAt')                 // → exists:false
```

The operator you pass — `'>='`, `'gte'`, `'>'`, `'gt'` — is normalized to the neutral set (`eq ne gt gte lt lte in nin exists contains`). An operator outside that set throws `E_VECTOR_STORE_UNSUPPORTED_FILTER_OPERATOR` at chain time, not three layers deep in a driver. The set is deliberately the **intersection every backend can express cleanly** — there is no operator here that some adapter has to fake.

### AND, OR, and what knex taught everyone

Chained `.where()` calls AND together. `.orWhere()` opens an alternative branch; everything accumulated so far becomes the first arm of the OR:

```typescript
// (kind = 'policy' AND year >= 2024) OR (kind = 'reference')
vs('docs')
  .where('kind', 'policy')
  .where('year', '>=', 2024)
  .orWhere('kind', 'reference')
```

This compiles to the neutral `FilterGroup` tree — `{ and: [...] }` for the simple case, `{ or: [{ and: [...] }, ...] }` once an OR branch exists. You never see the tree unless you want to; it's the compile target adapters consume.

### Grouping: mixing AND and OR

Flat chaining gets you AND-lists and a top-level OR, but the moment you need `A AND (B OR C)` — an OR *nested inside* an AND — you need a parenthesis. knex spells that with a callback, and so does this builder: pass a function instead of a field, and it receives a fresh filter builder whose conditions become a single parenthesized group.

```typescript
// kind = 'policy' AND (year >= 2024 OR pinned = true)
vs('docs')
  .where('kind', 'policy')
  .andWhere((qb) => qb.where('year', '>=', 2024).orWhere('pinned', true))

// (status = 'active' OR status = 'trial') AND plan = 'pro'
vs('docs')
  .where((qb) => qb.where('status', 'active').orWhere('status', 'trial'))
  .where('plan', 'pro')
```

The callback gets a `FilterBuilder` — the filter half of the chain only. There's no `.nearText()`, `.select()`, `.limit()`, or terminal inside a group, because a group is a *predicate*, not a query. Groups nest to any depth: a callback can itself call `.andWhere(cb)`.

Negation is a group too. `.whereNot(cb)` (and `.orWhereNot(cb)`) wrap the built group in a `not`:

```typescript
// kind = 'policy' AND NOT (archived = true OR hidden = true)
vs('docs')
  .where('kind', 'policy')
  .whereNot((qb) => qb.where('archived', true).orWhere('hidden', true))
```

Note `.whereNot('field', value)` — the scalar two-argument form — is unchanged: it's still a single `ne` condition, not a group. Only the callback form produces a `{ not: … }` node.

::: warning Chroma does not support `not`
A `not` group reaches the backend on 28 of the 29 adapters — the six native translators (pgvector, Qdrant, Milvus, OpenSearch, Elasticsearch, Redis) emit it directly, and the over-fetch adapters evaluate it in JS. **Chroma is the exception:** its native `where` dialect has no negation, so a `not` group throws `E_VECTOR_STORE_UNSUPPORTED_FILTER_OPERATOR` — the same honest refusal it gives for `exists` and `contains` (see [The Adapter Matrix](./adapters)). Nested AND/OR works everywhere, Chroma included.
:::

## Projection is `.select()`, and it's required

There is **no default projection**. A read with no `.select()` throws `E_VECTOR_STORE_PROJECTION_REQUIRED` the moment you await it. This is deliberate and it is the one place the builder is stricter than knex: the kit makes no assumption about what you want back. Vectors are the large, heavy column — a silent `SELECT *` default would quietly ship megabytes you didn't ask for — but more fundamentally, **even `id` is opt-in**. You declare the result shape; the battery returns exactly that and nothing else.

Four columns, three input forms (mix freely, multiple `.select()` calls union):

```typescript
.select('id', 'document', 'metadata')                 // bare strings
.select(['vector', { name: 'title' }])                // tuple: column + config
.select({ metadata: { fields: ['source', 'kind'] } }) // object: column → config
.select('*')                                          // all four, including id + vectors
```

The per-column config maps to real driver capabilities, which is why it isn't decoration:

| Column | Config | Effect |
| --- | --- | --- |
| `id` | — | `VectorMatch.id` is present only when selected |
| `vector` | `{ name? }` | Selects a **named vector** in multi-vector collections (Qdrant/Weaviate); omitted = default vector |
| `metadata` | `{ fields? }` | Projects a **payload subset** (Qdrant `with_payload`, a SQL column list, Weaviate `returnProperties`); omitted = all metadata |
| `document` | `{ field? }` | When the doc text lives under a non-default key |

`score` is not a column — it is the search's own output, always present on a similarity search and never on a filter-scan, so you don't need to select it to receive it.

## Mutations: `.upsert()` and `.delete()`

The two terminals return a `Promise<void>`, not a builder, and need no projection:

```typescript
await vs('docs').upsert([
  { id: 'a', vector: [/* … */], document: 'text', metadata: { kind: 'policy' } },
  { id: 'b', document: 'encode me', metadata: {} }, // vector omitted → the store encodes the document
])

await vs('docs').whereIn('id', ['a', 'b']).delete()  // by id
await vs('docs').where('kind', 'stale').delete()      // by filter
```

`.whereIn('id', ids).delete()` is special-cased: the builder extracts the ids into `DeletePlan.ids` so adapters can use a cheap native delete-by-key. Any other filter compiles to `DeletePlan.filter` and the adapter resolves matching ids first. A bare `vs('docs').delete()` (no filter) is delete-all.

::: warning `.upsert()` validates every record
Records are validated against `vectorRecordSchema` before they reach a driver: a non-string `id`, a `NaN`/`Infinity` in a vector, a vector whose length disagrees with the declared `dimensions`, or an unknown top-level key all fail loud with `E_INVALID_VECTOR_RECORD` (reporting the offending record's index). Bad data dies at the seam, not inside the database.
:::

## Raw fragments, with bindings that are never interpolated

When the neutral operators aren't enough, drop to the backend's own dialect — but the escape hatch still refuses to let you build an injection bug. `.whereRaw()` and the standalone `raw()` helper carry bindings separately from the fragment:

```typescript
import { raw } from '@nhtio/adk/batteries/vector'

// SQL backends: ? placeholders + a bindings array, rewritten to the driver's positional form
.whereRaw("metadata->>'price' > ?", [100])

// object-dialect backends (Qdrant/Pinecone/…): structured passthrough
.whereRaw({ $dialect: 'qdrant', $raw: { must: [/* … */] } })

// raw() in a value position
.where('score', '>', raw('? * ?', [0.5, 2]))
```

The binding count is checked **before the query runs**: if the `?` placeholder count doesn't match the bindings array length, you get `E_VECTOR_STORE_RAW_BINDING_MISMATCH` immediately, not a malformed query at the driver. SQL adapters rewrite each `?` to the driver's positional form (`$1, $2…` for `pg`, `?` retained for sqlite) and pass bindings as the driver's parameter array — the fragment string and the values never meet until the driver joins them safely. A string `$raw` (a SQL fragment) handed to a non-SQL adapter throws `E_VECTOR_STORE_UNSUPPORTED_OPERATION`; a `RawFilter` whose `$dialect` doesn't match the adapter throws `E_VECTOR_STORE_UNSUPPORTED_FILTER_OPERATOR`.

## The compile target

Every chain compiles to an immutable, backend-neutral plan — `SearchPlan`, `UpsertPlan`, or `DeletePlan`. Adapters consume plans; **they never see the builder.** That keeps the builder a pure, unit-testable compiler, and it means a power user can hand-build a plan and run it directly without touching the chain at all. The filter portion of a plan is the neutral tree from `filters.ts`:

```typescript
type VectorFilter = FilterCondition | FilterGroup | RawFilter
// FilterCondition { field, op, value }
// FilterGroup     { and?, or?, not? }
// RawFilter       { $dialect, $raw, $bindings? }
```

`filters.ts` also exports `evaluateFilter(filter, metadata)` — a JS predicate evaluator that **defines the reference semantics** every adapter's native translator is tested against. Most adapters take the honest route: over-fetch candidates, then JS-filter with `evaluateFilter`, guaranteeing byte-for-byte identical filter behaviour across all 29 backends. Adapters only translate to a native filter dialect where the backend makes it cheap and correct (Qdrant, Pinecone, OpenSearch/Elasticsearch have a `translate*Filter` plus a dedicated unit test). Raw filters are intentionally **not** JS-evaluable — handing one to `evaluateFilter` throws, because a raw fragment is a promise to a specific backend, not a portable predicate.

## Where to go next

* [Schema & Migrations](./schema-migrations) — create the collection these queries run against.
* [Consistency & Capabilities](./consistency) — what `.consistency(mode)` does and when read-after-write is guaranteed.
* [Encoders: vectors or text](./encoders) — what makes `.nearText()` work.
