Skip to content
7 min read · 1,472 words

The Query Builder & Filters

The builder is the whole public surface. You will spend more time here than anywhere else in the battery, so it is built to be the part you never have to think about: it is knex, pointed at vectors. If you have ever written knex('users').where('active', true).limit(10), you already know this API and can skip to Filters.

One chain, two query kinds

vs(collection) opens a fresh builder. What you hang off it decides what kind of query you get — and the difference is exactly one method call:

typescript
// SIMILARITY SEARCH — has a .near*() clause. "Order by similarity" is implicit; .limit() is top-k.
await vs('docs')
  .where('kind', 'policy')
  .nearText('how do I register a tool')
  .select('id', 'document', 'score')
  .limit(10)

// FILTER-SCAN — the SAME chain, no .near*(). No similarity, no score, just rows that match.
await vs('docs')
  .where('kind', 'policy')
  .select('id', 'document')
  .limit(50)

A .near*() clause is what turns a scan into a search. Drop it and you have a metadata query; add it and you have a ranked nearest-neighbour search over the same filter. There is no separate "search" vs "query" method to choose between — the shape of the chain is the choice. VectorMatch.score shows up only on the similarity path, normalized to [0,1] (higher is more similar); on a filter-scan it is simply absent, because there is nothing to score against.

Awaiting is executing

The builder implements PromiseLike, so awaiting it runs the query. There is no .execute(), no .then() you call yourself, no .toSQL()-style intermediate. A chain you don't await (or .then()) never runs — it's an inert plan until something resolves it.

The .near*() clauses

Three ways to say "near", mutually exclusive — a second one throws E_VECTOR_STORE_QUERY_CONFLICT, because "near A and also near B" is not a query, it's a bug:

MethodMeansNeeds
.nearVector(number[])Nearest to this vectorNothing — you brought the vector
.nearText(string)Nearest to this textAn encoder or built-in encoding (see Encoders), else E_VECTOR_STORE_ENCODER_REQUIRED
.nearId(string)Nearest to an existing record's vectorThe record to exist in the collection

.nearText() is the one that does work behind your back: the base resolves the text to a vector (via your VectorEncoderFn) or hands it to a backend that embeds server-side, then the adapter searches. .nearId() is "more like this one" — find the row, use its vector, search.

Filters

.where() carries the full knex overload set, and every operator normalizes to one neutral vocabulary so the adapter underneath doesn't care which spelling you used:

typescript
vs('docs')
  .where('source', '/assembly/byo-llm')   // (field, value) — defaults to eq
  .where('year', '>=', 2024)              // (field, op, value)
  .where({ kind: 'policy', lang: 'en' })  // object — each key is an eq, all ANDed
  .whereIn('tag', ['a', 'b'])             // → in
  .whereNotIn('tag', ['x'])               // → nin
  .whereNot('status', 'archived')         // → ne
  .whereExists('summary')                 // → exists:true
  .whereNull('deletedAt')                 // → exists:false

The operator you pass — '>=', 'gte', '>', 'gt' — is normalized to the neutral set (eq ne gt gte lt lte in nin exists contains). An operator outside that set throws E_VECTOR_STORE_UNSUPPORTED_FILTER_OPERATOR at chain time, not three layers deep in a driver. The set is deliberately the intersection every backend can express cleanly — there is no operator here that some adapter has to fake.

AND, OR, and what knex taught everyone

Chained .where() calls AND together. .orWhere() opens an alternative branch; everything accumulated so far becomes the first arm of the OR:

typescript
// (kind = 'policy' AND year >= 2024) OR (kind = 'reference')
vs('docs')
  .where('kind', 'policy')
  .where('year', '>=', 2024)
  .orWhere('kind', 'reference')

This compiles to the neutral FilterGroup tree — { and: [...] } for the simple case, { or: [{ and: [...] }, ...] } once an OR branch exists. You never see the tree unless you want to; it's the compile target adapters consume.

Grouping: mixing AND and OR

Flat chaining gets you AND-lists and a top-level OR, but the moment you need A AND (B OR C) — an OR nested inside an AND — you need a parenthesis. knex spells that with a callback, and so does this builder: pass a function instead of a field, and it receives a fresh filter builder whose conditions become a single parenthesized group.

typescript
// kind = 'policy' AND (year >= 2024 OR pinned = true)
vs('docs')
  .where('kind', 'policy')
  .andWhere((qb) => qb.where('year', '>=', 2024).orWhere('pinned', true))

// (status = 'active' OR status = 'trial') AND plan = 'pro'
vs('docs')
  .where((qb) => qb.where('status', 'active').orWhere('status', 'trial'))
  .where('plan', 'pro')

The callback gets a FilterBuilder — the filter half of the chain only. There's no .nearText(), .select(), .limit(), or terminal inside a group, because a group is a predicate, not a query. Groups nest to any depth: a callback can itself call .andWhere(cb).

Negation is a group too. .whereNot(cb) (and .orWhereNot(cb)) wrap the built group in a not:

typescript
// kind = 'policy' AND NOT (archived = true OR hidden = true)
vs('docs')
  .where('kind', 'policy')
  .whereNot((qb) => qb.where('archived', true).orWhere('hidden', true))

Note .whereNot('field', value) — the scalar two-argument form — is unchanged: it's still a single ne condition, not a group. Only the callback form produces a { not: … } node.

Chroma does not support not

A not group reaches the backend on 28 of the 29 adapters — the six native translators (pgvector, Qdrant, Milvus, OpenSearch, Elasticsearch, Redis) emit it directly, and the over-fetch adapters evaluate it in JS. Chroma is the exception: its native where dialect has no negation, so a not group throws E_VECTOR_STORE_UNSUPPORTED_FILTER_OPERATOR — the same honest refusal it gives for exists and contains (see The Adapter Matrix). Nested AND/OR works everywhere, Chroma included.

Projection is .select(), and it's required

There is no default projection. A read with no .select() throws E_VECTOR_STORE_PROJECTION_REQUIRED the moment you await it. This is deliberate and it is the one place the builder is stricter than knex: the kit makes no assumption about what you want back. Vectors are the large, heavy column — a silent SELECT * default would quietly ship megabytes you didn't ask for — but more fundamentally, even id is opt-in. You declare the result shape; the battery returns exactly that and nothing else.

Four columns, three input forms (mix freely, multiple .select() calls union):

typescript
.select('id', 'document', 'metadata')                 // bare strings
.select(['vector', { name: 'title' }])                // tuple: column + config
.select({ metadata: { fields: ['source', 'kind'] } }) // object: column → config
.select('*')                                          // all four, including id + vectors

The per-column config maps to real driver capabilities, which is why it isn't decoration:

ColumnConfigEffect
idVectorMatch.id is present only when selected
vector{ name? }Selects a named vector in multi-vector collections (Qdrant/Weaviate); omitted = default vector
metadata{ fields? }Projects a payload subset (Qdrant with_payload, a SQL column list, Weaviate returnProperties); omitted = all metadata
document{ field? }When the doc text lives under a non-default key

score is not a column — it is the search's own output, always present on a similarity search and never on a filter-scan, so you don't need to select it to receive it.

Mutations: .upsert() and .delete()

The two terminals return a Promise<void>, not a builder, and need no projection:

typescript
await vs('docs').upsert([
  { id: 'a', vector: [/* … */], document: 'text', metadata: { kind: 'policy' } },
  { id: 'b', document: 'encode me', metadata: {} }, // vector omitted → the store encodes the document
])

await vs('docs').whereIn('id', ['a', 'b']).delete()  // by id
await vs('docs').where('kind', 'stale').delete()      // by filter

.whereIn('id', ids).delete() is special-cased: the builder extracts the ids into DeletePlan.ids so adapters can use a cheap native delete-by-key. Any other filter compiles to DeletePlan.filter and the adapter resolves matching ids first. A bare vs('docs').delete() (no filter) is delete-all.

.upsert() validates every record

Records are validated against vectorRecordSchema before they reach a driver: a non-string id, a NaN/Infinity in a vector, a vector whose length disagrees with the declared dimensions, or an unknown top-level key all fail loud with E_INVALID_VECTOR_RECORD (reporting the offending record's index). Bad data dies at the seam, not inside the database.

Raw fragments, with bindings that are never interpolated

When the neutral operators aren't enough, drop to the backend's own dialect — but the escape hatch still refuses to let you build an injection bug. .whereRaw() and the standalone raw() helper carry bindings separately from the fragment:

typescript
import { raw } from '@nhtio/adk/batteries/vector'

// SQL backends: ? placeholders + a bindings array, rewritten to the driver's positional form
.whereRaw("metadata->>'price' > ?", [100])

// object-dialect backends (Qdrant/Pinecone/…): structured passthrough
.whereRaw({ $dialect: 'qdrant', $raw: { must: [/* … */] } })

// raw() in a value position
.where('score', '>', raw('? * ?', [0.5, 2]))

The binding count is checked before the query runs: if the ? placeholder count doesn't match the bindings array length, you get E_VECTOR_STORE_RAW_BINDING_MISMATCH immediately, not a malformed query at the driver. SQL adapters rewrite each ? to the driver's positional form ($1, $2… for pg, ? retained for sqlite) and pass bindings as the driver's parameter array — the fragment string and the values never meet until the driver joins them safely. A string $raw (a SQL fragment) handed to a non-SQL adapter throws E_VECTOR_STORE_UNSUPPORTED_OPERATION; a RawFilter whose $dialect doesn't match the adapter throws E_VECTOR_STORE_UNSUPPORTED_FILTER_OPERATOR.

The compile target

Every chain compiles to an immutable, backend-neutral plan — SearchPlan, UpsertPlan, or DeletePlan. Adapters consume plans; they never see the builder. That keeps the builder a pure, unit-testable compiler, and it means a power user can hand-build a plan and run it directly without touching the chain at all. The filter portion of a plan is the neutral tree from filters.ts:

typescript
type VectorFilter = FilterCondition | FilterGroup | RawFilter
// FilterCondition { field, op, value }
// FilterGroup     { and?, or?, not? }
// RawFilter       { $dialect, $raw, $bindings? }

filters.ts also exports evaluateFilter(filter, metadata) — a JS predicate evaluator that defines the reference semantics every adapter's native translator is tested against. Most adapters take the honest route: over-fetch candidates, then JS-filter with evaluateFilter, guaranteeing byte-for-byte identical filter behaviour across all 29 backends. Adapters only translate to a native filter dialect where the backend makes it cheap and correct (Qdrant, Pinecone, OpenSearch/Elasticsearch have a translate*Filter plus a dedicated unit test). Raw filters are intentionally not JS-evaluable — handing one to evaluateFilter throws, because a raw fragment is a promise to a specific backend, not a portable predicate.

Where to go next