
Schema & Migrations

The schema surface is the other half of the knex bet. You don't learn a new "create index" call per backend — you describe a collection the way you'd describe a table, and the adapter materializes it however its backend wants. The same description produces a CREATE TABLE with a vector(384) column on pgvector, a vec0 virtual table on sqlite-vec, and a recorded shape hint on a schemaless store like Qdrant.

Creating a collection

vs.schema.createCollection(name, cb) takes a callback that receives a CollectionBuilder, exactly as knex's createTable(name, tableBuilder => …) hands its callback a table builder:

typescript
await vs.schema.createCollection('docs', (c) => {
  c.vector({ dimensions: 384, metric: 'cosine' }) // the similarity field — REQUIRED
  c.string('source')
  c.integer('year').index()
  c.string('kind')
  c.boolean('published').nullable()
})

The one rule the table analogy doesn't have: a collection must declare a .vector(). It's the similarity field — the reason the collection exists — so a builder that never calls .vector() throws at build time. metric defaults to 'cosine' if you omit it.
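The required-vector rule can be pictured with a toy builder. This is a sketch, not the library's implementation: SketchCollectionBuilder, buildCollection, and the error message are hypothetical names, used only to show where a build-time check of this kind could sit and how the 'cosine' default falls out.

```typescript
// Hypothetical sketch: NOT the real CollectionBuilder, only the shape of the rule.
interface VectorSpec {
  dimensions: number
  metric: string
}

interface CollectionSpec {
  collection: string
  vector: VectorSpec
  fields: string[]
}

class SketchCollectionBuilder {
  private vectorSpec: VectorSpec | null = null
  private readonly fields: string[] = []

  // metric falls back to 'cosine' when omitted, as described above.
  vector(opts: { dimensions: number; metric?: string }): this {
    this.vectorSpec = { dimensions: opts.dimensions, metric: opts.metric ?? 'cosine' }
    return this
  }

  string(field: string): this {
    this.fields.push(field)
    return this
  }

  // Runs after the callback returns; a missing vector is rejected here.
  build(collection: string): CollectionSpec {
    if (this.vectorSpec === null) {
      throw new Error(`collection "${collection}" must declare a .vector()`)
    }
    return { collection, vector: this.vectorSpec, fields: [...this.fields] }
  }
}

function buildCollection(
  collection: string,
  cb: (c: SketchCollectionBuilder) => void
): CollectionSpec {
  const builder = new SketchCollectionBuilder()
  cb(builder)
  return builder.build(collection)
}

const docsSpec = buildCollection('docs', (c) => {
  c.vector({ dimensions: 384 }) // metric omitted, so it defaults to 'cosine'
  c.string('source')
})
```

Calling buildCollection with a callback that only declares payload fields throws from build(), which is the behaviour the rule describes.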

The payload fields use the knex column vocabulary you already know:

| Method | Declares | Chain |
| --- | --- | --- |
| .vector({ dimensions, metric? }) | The similarity field (required) | |
| .string(name) | A text payload field | .index(), .nullable() |
| .integer(name) | An integer field | .index(), .nullable() |
| .number(name) | A float field | .index(), .nullable() |
| .boolean(name) | A boolean field | .index(), .nullable() |
| .json(name) | A nested/object field | .index(), .nullable() |

Advisory on some backends, authoritative on others

Payload-field declarations mean different things depending on what's underneath, and the battery is honest about which:

  • Schemaless backends (Qdrant, Pinecone, Chroma, …) store arbitrary metadata regardless. Here the declarations are advisory: the adapter records the declared shape so it can check writes against it and emit index hints, but it will not stop you from upserting a field you didn't declare.
  • SQL backends (pgvector, sqlite-vec, …) need real columns. Here the declarations are authoritative — pgvector emits actual CREATE TABLE columns, a vector(N) column, and the correct index opclass; sqlite-vec emits the vec0 virtual-table column spec. What you declare is what exists.

This isn't a leak in the abstraction — it's the abstraction telling the truth. A schemaless store can't enforce a column it doesn't have, and pretending otherwise would be the kind of fake guarantee the kit refuses to make.
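One way to picture the split is to render the same declared shape twice: into DDL for a SQL backend, and into a mere check for a schemaless one. The sketch below is illustrative only. renderPgvector, undeclaredFields, the embedding column name, the column-type mapping, and the choice of an HNSW index are assumptions, not the shipped adapters' output. The vector(N) type and the vector_cosine_ops operator class are real pgvector names.

```typescript
type FieldType = 'string' | 'integer' | 'number' | 'boolean' | 'json'

interface FieldDecl {
  field: string
  type: FieldType
}

interface ShapeDecl {
  collection: string
  dimensions: number
  fields: FieldDecl[]
}

// Assumed Postgres column type for each payload kind.
const PG_TYPES: Record<FieldType, string> = {
  string: 'text',
  integer: 'integer',
  number: 'double precision',
  boolean: 'boolean',
  json: 'jsonb',
}

// Authoritative: every declaration becomes a real column (cosine metric assumed).
function renderPgvector(shape: ShapeDecl): string[] {
  const columns = [
    `embedding vector(${shape.dimensions})`,
    ...shape.fields.map((f) => `${f.field} ${PG_TYPES[f.type]}`),
  ]
  return [
    `CREATE TABLE ${shape.collection} (${columns.join(', ')})`,
    `CREATE INDEX ON ${shape.collection} USING hnsw (embedding vector_cosine_ops)`,
  ]
}

// Advisory: the shape can flag undeclared payload keys, but nothing blocks the write.
function undeclaredFields(shape: ShapeDecl, payload: Record<string, unknown>): string[] {
  const declared = new Set(shape.fields.map((f) => f.field))
  return Object.keys(payload).filter((key) => !declared.has(key))
}

const docsShape: ShapeDecl = {
  collection: 'docs',
  dimensions: 384,
  fields: [
    { field: 'source', type: 'string' },
    { field: 'year', type: 'integer' },
  ],
}

const docsDdl = renderPgvector(docsShape)
const extraKeys = undeclaredFields(docsShape, { source: 'wiki', author: 'ada' })
```

On the SQL side an undeclared author key has no column to land in. On the schemaless side it is only reported, and the write would still go through.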

The rest of the lifecycle

typescript
await vs.schema.hasCollection('docs')               // → boolean   (knex: hasTable)
await vs.schema.createCollectionIfNotExists('docs', cb)
await vs.schema.dropCollection('docs')              // knex: dropTable
await vs.schema.dropCollectionIfExists('docs')      // knex: dropTableIfExists
await vs.schema.renameCollection('docs', 'documents')

renameCollection is the one operation not every backend can honour. Where the backend can't rename in place (most managed and several server backends), it throws E_VECTOR_STORE_UNSUPPORTED_OPERATION rather than faking it with a copy-and-drop you didn't ask for. Check vs.capabilities.rename first if you need it portably — see Consistency & Capabilities.
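A portable call site might guard the rename as sketched here. The RenameCapableStore interface is a hypothetical, trimmed-down stand-in for the real store type, written only from the names mentioned above (vs.capabilities.rename, vs.schema.renameCollection). Treating capabilities.rename as a boolean is an assumption.

```typescript
// Hypothetical minimal view of the store; the real type has more on it.
interface RenameCapableStore {
  capabilities: { rename: boolean }
  schema: { renameCollection(from: string, to: string): Promise<void> }
}

// Resolves to true if the rename ran, false if the backend cannot rename in place.
async function renameIfSupported(
  vs: RenameCapableStore,
  from: string,
  to: string
): Promise<boolean> {
  if (!vs.capabilities.rename) {
    // The caller picks the fallback, for example an explicit copy into a new collection.
    return false
  }
  await vs.schema.renameCollection(from, to)
  return true
}
```

Returning a boolean instead of catching E_VECTOR_STORE_UNSUPPORTED_OPERATION keeps the unsupported case out of the error path, so a copy-and-drop only happens when the caller writes one.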

connect() / close() are not schema verbs

Opening the connection is await vs.connect(), closing it is await vs.close() — plain lifecycle on the store, not part of vs.schema. knex doesn't put "open a pool" in knex.schema either; neither do we. Construct the store, connect(), then build schema.

Migrations

Migrations mirror a knex migration module: a named pair of up / down functions, each handed a context whose schema is the same VectorSchemaBuilder you used above.

typescript
// migrations/0001_docs.ts
import type { VectorMigrationContext } from '@nhtio/adk/batteries/vector'

export const name = '0001_docs'

export async function up({ schema }: VectorMigrationContext) {
  await schema.createCollection('docs', (c) => {
    c.vector({ dimensions: 384, metric: 'cosine' })
    c.string('source')
    c.integer('year').index()
  })
}

export async function down({ schema }: VectorMigrationContext) {
  await schema.dropCollectionIfExists('docs')
}

The runner is vs.migrate, with the two verbs knex taught everyone:

typescript
await vs.migrate.latest()   // apply all pending migrations, in order → returns names applied this run
await vs.migrate.rollback() // run the last applied migration's down() → returns its name, or null if none

latest() reads the ledger, filters out already-applied migrations, and runs the rest in order — recording each in the ledger as it succeeds. rollback() takes the most recently applied migration, runs its down(), and removes it from the ledger.

Failure stops the line

A migration whose up() (or down()) throws surfaces E_VECTOR_STORE_MIGRATION_FAILED([name, message]), and the ledger does not advance past the failure. The migrations that ran before it stay recorded; the one that threw is not recorded; nothing after it runs. You fix the broken migration and re-run latest() — the already-applied ones are skipped, and the run resumes from the one that failed. No half-applied limbo, no "it recorded success but the collection isn't there."
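The resume behaviour can be sketched with a toy runner over an in-memory list of applied names. Everything here is hypothetical: SketchMigration, runLatest, the migration names 0002_notes and 0003_tags, and the plain Error standing in for E_VECTOR_STORE_MIGRATION_FAILED. It models only the ordering and recording rules described above, not the real vs.migrate.

```typescript
interface SketchMigration {
  id: string
  up: () => Promise<void>
}

// Applies pending migrations in order; records each only after its up() succeeds.
async function runLatest(migrations: SketchMigration[], ledger: string[]): Promise<string[]> {
  const appliedThisRun: string[] = []
  for (const migration of migrations) {
    if (ledger.includes(migration.id)) continue // already applied, skip it
    try {
      await migration.up()
    } catch (err) {
      const message = err instanceof Error ? err.message : String(err)
      // Stand-in for E_VECTOR_STORE_MIGRATION_FAILED: stop here, nothing later runs.
      throw new Error(`migration ${migration.id} failed: ${message}`)
    }
    ledger.push(migration.id)
    appliedThisRun.push(migration.id)
  }
  return appliedThisRun
}

// Fail on the second migration, "fix" it, then run again.
async function demoResume(): Promise<{ afterFailure: string[]; secondRun: string[] }> {
  const ledger: string[] = []
  let broken = true
  const migrations: SketchMigration[] = [
    { id: '0001_docs', up: async () => {} },
    {
      id: '0002_notes',
      up: async () => {
        if (broken) throw new Error('boom')
      },
    },
    { id: '0003_tags', up: async () => {} },
  ]
  await runLatest(migrations, ledger).catch(() => undefined) // first run stops at 0002_notes
  const afterFailure = [...ledger]
  broken = false
  const secondRun = await runLatest(migrations, ledger)
  return { afterFailure, secondRun }
}
```

In this sketch the first run leaves only 0001_docs in the ledger, and the second run picks up at 0002_notes and continues through 0003_tags.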

The ledger

Applied-migration state lives in a MigrationLedger the adapter owns — applied(), record(name), remove(name) — persisted in a backend-appropriate place (a _vector_migrations collection or table), exactly as knex uses a knex_migrations table. You don't implement the ledger unless you're writing an adapter; the shipped adapters provide it.
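For adapter authors, an in-memory version shows how small the ledger contract is. The three method names come from the text above; the async signatures, return types, and ordering guarantee are guesses, and SketchLedger, InMemoryLedger, and rollbackLast are hypothetical names for illustration.

```typescript
// Assumed shape: method names from the text, signatures are guesses.
interface SketchLedger {
  applied(): Promise<string[]>
  record(id: string): Promise<void>
  remove(id: string): Promise<void>
}

class InMemoryLedger implements SketchLedger {
  private readonly names: string[] = []

  // Oldest first, so the last element is the most recently applied migration.
  async applied(): Promise<string[]> {
    return [...this.names]
  }

  async record(id: string): Promise<void> {
    if (!this.names.includes(id)) this.names.push(id)
  }

  async remove(id: string): Promise<void> {
    const index = this.names.indexOf(id)
    if (index !== -1) this.names.splice(index, 1)
  }
}

// How a rollback could use the ledger: run the newest entry's down(), then forget it.
async function rollbackLast(
  ledger: SketchLedger,
  downs: Record<string, () => Promise<void>>
): Promise<string | null> {
  const names = await ledger.applied()
  if (names.length === 0) return null
  const last = names[names.length - 1]
  const down = downs[last]
  if (!down) throw new Error(`no down() registered for ${last}`)
  await down()
  await ledger.remove(last)
  return last
}
```

A persistent adapter would swap the array for reads and writes against its own _vector_migrations collection or table, keeping the same three operations.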

Where to go next