Schema & Migrations
The schema surface is the other half of the knex bet. You don't learn a new "create index" call per backend — you describe a collection the way you'd describe a table, and the adapter materializes it however its backend wants. The same description produces a CREATE TABLE with a vector(384) column on pgvector, a vec0 virtual table on sqlite-vec, and a recorded shape hint on a schemaless store like Qdrant.
Creating a collection
vs.schema.createCollection(name, cb) takes a callback that receives a CollectionBuilder, the same shape as knex's createTable(name, tableBuilder => …):
await vs.schema.createCollection('docs', (c) => {
c.vector({ dimensions: 384, metric: 'cosine' }) // the similarity field — REQUIRED
c.string('source')
c.integer('year').index()
c.string('kind')
c.boolean('published').nullable()
})
The one rule the table analogy doesn't have: a collection must declare a .vector(). It's the similarity field, the reason the collection exists, so a builder that never calls .vector() throws at build time. metric defaults to 'cosine' if you omit it.
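The build-time check can be pictured with a small stand-in builder. This is a minimal sketch, not the library's CollectionBuilder: SketchCollectionBuilder, its build() step, and the error message are hypothetical names used only to illustrate the required-vector rule and the 'cosine' default.

```typescript
// Hypothetical stand-in for the real CollectionBuilder.
// It only illustrates the ".vector() is required" rule and the metric default.
interface SketchVectorSpec {
  dimensions: number
  metric: string
}

class SketchCollectionBuilder {
  private vectorSpec: SketchVectorSpec | null = null
  private readonly fields: string[] = []

  // Declares the similarity field; metric falls back to 'cosine' when omitted.
  vector(opts: { dimensions: number; metric?: string }): this {
    this.vectorSpec = { dimensions: opts.dimensions, metric: opts.metric ?? 'cosine' }
    return this
  }

  // Declares a text payload field (the other field types would look the same).
  string(name: string): this {
    this.fields.push(name)
    return this
  }

  // Assumed finalisation step: refuses to produce a spec without a vector field.
  build(): { vector: SketchVectorSpec; fields: string[] } {
    if (this.vectorSpec === null) {
      throw new Error('collection must declare a .vector() field')
    }
    return { vector: this.vectorSpec, fields: [...this.fields] }
  }
}
```

Calling build() after only .string('source') throws here, while .vector({ dimensions: 384 }) alone yields a spec whose metric is 'cosine'.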
The payload fields use the knex column vocabulary you already know:
| Method | Declares | Chain |
|---|---|---|
| .vector({ dimensions, metric? }) | The similarity field (required) | none |
| .string(name) | A text payload field | .index(), .nullable() |
| .integer(name) | An integer field | .index(), .nullable() |
| .number(name) | A float field | .index(), .nullable() |
| .boolean(name) | A boolean field | .index(), .nullable() |
| .json(name) | A nested/object field | .index(), .nullable() |
Advisory on some backends, authoritative on others
Payload-field declarations mean different things depending on what's underneath, and the battery is honest about which:
- Schemaless backends (Qdrant, Pinecone, Chroma, …) store arbitrary metadata regardless. Here the declarations are advisory — the adapter records the declared shape so it can validate writes and drop index hints, but it will not stop you from upserting a field you didn't declare.
- SQL backends (pgvector, sqlite-vec, …) need real columns. Here the declarations are authoritative: pgvector emits actual CREATE TABLE columns, a vector(N) column, and the correct index opclass; sqlite-vec emits the vec0 virtual-table column spec. What you declare is what exists.
This isn't a leak in the abstraction — it's the abstraction telling the truth. A schemaless store can't enforce a column it doesn't have, and pretending otherwise would be the kind of fake guarantee the kit refuses to make.
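To make the split concrete, here is a sketch of one declared shape rendered both ways. Everything in it is illustrative: CollectionShape, toPgvectorSql, toShapeHint, the id and embedding column names, the SQL type mapping, the choice of an hnsw index, and the metric names other than 'cosine' are assumptions, not the adapters' actual output. The vector(N) type and the vector_cosine_ops / vector_l2_ops / vector_ip_ops operator classes are pgvector's own.

```typescript
type FieldType = 'string' | 'integer' | 'number' | 'boolean' | 'json'
type SketchMetric = 'cosine' | 'euclidean' | 'dot' // only 'cosine' is named on this page

interface FieldDecl {
  name: string
  type: FieldType
  index?: boolean
}

interface CollectionShape {
  name: string
  vector: { dimensions: number; metric: SketchMetric }
  fields: FieldDecl[]
}

// Assumed mapping from payload field types to PostgreSQL column types.
const SQL_TYPE: Record<FieldType, string> = {
  string: 'text',
  integer: 'integer',
  number: 'double precision',
  boolean: 'boolean',
  json: 'jsonb',
}

// pgvector operator classes for each distance function.
const OPCLASS: Record<SketchMetric, string> = {
  cosine: 'vector_cosine_ops',
  euclidean: 'vector_l2_ops',
  dot: 'vector_ip_ops',
}

// Authoritative rendering: every declaration becomes a real column or index.
// Identifiers are interpolated unquoted, so this assumes trusted names.
function toPgvectorSql(shape: CollectionShape): string[] {
  const columns = [
    'id text PRIMARY KEY',
    `embedding vector(${shape.vector.dimensions})`,
    ...shape.fields.map((f) => `${f.name} ${SQL_TYPE[f.type]}`),
  ]
  const statements = [
    `CREATE TABLE ${shape.name} (${columns.join(', ')})`,
    `CREATE INDEX ON ${shape.name} USING hnsw (embedding ${OPCLASS[shape.vector.metric]})`,
  ]
  for (const f of shape.fields) {
    if (f.index) statements.push(`CREATE INDEX ON ${shape.name} (${f.name})`)
  }
  return statements
}

// Advisory rendering: nothing is created; the shape is only remembered.
function toShapeHint(shape: CollectionShape): { declared: string[]; indexHints: string[] } {
  return {
    declared: shape.fields.map((f) => f.name),
    indexHints: shape.fields.filter((f) => f.index).map((f) => f.name),
  }
}
```

The same CollectionShape feeds both functions, which is the point: the description is shared, and only the rendering decides whether it is enforced.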
The rest of the lifecycle
await vs.schema.hasCollection('docs') // → boolean (knex: hasTable)
await vs.schema.createCollectionIfNotExists('docs', cb)
await vs.schema.dropCollection('docs') // knex: dropTable
await vs.schema.dropCollectionIfExists('docs') // knex: dropTableIfExists
await vs.schema.renameCollection('docs', 'documents')
renameCollection is the one operation not every backend can honour. Where the backend can't rename in place (most managed and several server backends), it throws E_VECTOR_STORE_UNSUPPORTED_OPERATION rather than faking it with a copy-and-drop you didn't ask for. Check vs.capabilities.rename first if you need it portably; see Consistency & Capabilities.
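A portable caller can branch on the flag instead of catching the error. The sketch below relies only on what this page names (vs.capabilities.rename and vs.schema.renameCollection). The RenameCapableStore interface and the renameIfSupported helper are hypothetical, written so the example type-checks on its own, and treating the flag as a plain boolean is an assumption.

```typescript
// Hypothetical minimal view of a store: just the two members this page mentions.
interface RenameCapableStore {
  capabilities: { rename: boolean } // assumed to be a plain boolean flag
  schema: { renameCollection(from: string, to: string): Promise<void> }
}

// Renames only when the backend says it can; returns whether a rename happened.
// The caller decides what to do on `false` (keep the old name, migrate data, ...).
async function renameIfSupported(
  vs: RenameCapableStore,
  from: string,
  to: string,
): Promise<boolean> {
  if (!vs.capabilities.rename) return false
  await vs.schema.renameCollection(from, to)
  return true
}
```

Returning false rather than copying data keeps the helper in line with the adapters' own stance: no copy-and-drop unless the caller writes one deliberately.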
connect() / close() are not schema verbs
Opening the connection is await vs.connect(), closing it is await vs.close() — plain lifecycle on the store, not part of vs.schema. knex doesn't put "open a pool" in knex.schema either; neither do we. Construct the store, connect(), then build schema.
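One way to keep that ordering honest is a small wrapper that connects, runs the schema work, and always closes. This is a sketch under the assumption that connect() and close() both return promises, as the await calls above suggest; ConnectableStore and withStore are hypothetical names, not part of the battery.

```typescript
// Hypothetical minimal view of a store's lifecycle surface.
interface ConnectableStore {
  connect(): Promise<void>
  close(): Promise<void>
}

// Connects, hands the store to `work`, and closes it even if `work` throws.
async function withStore<S extends ConnectableStore, T>(
  vs: S,
  work: (vs: S) => Promise<T>,
): Promise<T> {
  await vs.connect()
  try {
    return await work(vs)
  } finally {
    await vs.close()
  }
}
```

Schema building then lives inside the callback, for example a createCollection call on the store passed to work.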
Migrations
Migrations mirror a knex migration module: a named pair of up / down functions, each handed a context whose schema is the same VectorSchemaBuilder you used above.
// migrations/0001_docs.ts
import type { VectorMigrationContext } from '@nhtio/adk/batteries/vector'
export const name = '0001_docs'
export async function up({ schema }: VectorMigrationContext) {
await schema.createCollection('docs', (c) => {
c.vector({ dimensions: 384, metric: 'cosine' })
c.string('source')
c.integer('year').index()
})
}
export async function down({ schema }: VectorMigrationContext) {
await schema.dropCollectionIfExists('docs')
}
The runner is vs.migrate, with the two verbs knex taught everyone:
await vs.migrate.latest() // apply all pending migrations, in order → returns names applied this run
await vs.migrate.rollback() // run the last applied migration's down() → returns its name, or null if none
latest() reads the ledger, filters out already-applied migrations, and runs the rest in order, recording each in the ledger as it succeeds. rollback() takes the most recently applied migration, runs its down(), and removes it from the ledger.
Failure stops the line
A migration whose up() (or down()) throws surfaces E_VECTOR_STORE_MIGRATION_FAILED([name, message]), and the ledger does not advance past the failure. The migrations that ran before it stay recorded; the one that threw is not recorded; nothing after it runs. You fix the broken migration and re-run latest() — the already-applied ones are skipped, and the run resumes from the one that failed. No half-applied limbo, no "it recorded success but the collection isn't there."
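The stop-and-resume behaviour is easy to see in a toy runner. The sketch below is not the shipped runner: SketchMigration, SketchMigrationError (standing in for E_VECTOR_STORE_MIGRATION_FAILED), and the plain array used as a ledger are all hypothetical, chosen to show only the ordering guarantees described above.

```typescript
// Hypothetical migration shape: a name plus an up() step.
interface SketchMigration {
  name: string
  up(): Promise<void>
}

// Stand-in for E_VECTOR_STORE_MIGRATION_FAILED; carries the failing migration's name.
class SketchMigrationError extends Error {
  migration: string
  constructor(migration: string, cause: unknown) {
    const message = cause instanceof Error ? cause.message : String(cause)
    super(`migration ${migration} failed: ${message}`)
    this.migration = migration
  }
}

// Applies pending migrations in order. `applied` plays the role of the ledger:
// a name is pushed only after its up() resolves, so a throw leaves it unrecorded
// and nothing after it runs.
async function latest(all: SketchMigration[], applied: string[]): Promise<string[]> {
  const ranThisCall: string[] = []
  for (const migration of all) {
    if (applied.includes(migration.name)) continue // already applied: skip
    try {
      await migration.up()
    } catch (err) {
      throw new SketchMigrationError(migration.name, err)
    }
    applied.push(migration.name)
    ranThisCall.push(migration.name)
  }
  return ranThisCall
}
```

With three migrations where the second throws, the first call records only the first name and rejects; after the second is fixed, another call runs the second and third and returns just those two names.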
The ledger
Applied-migration state lives in a MigrationLedger the adapter owns — applied(), record(name), remove(name) — persisted in a backend-appropriate place (a _vector_migrations collection or table), exactly as knex uses a knex_migrations table. You don't implement the ledger unless you're writing an adapter; the shipped adapters provide it.
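For adapter authors, the three ledger methods can be pictured with an in-memory version. This is a sketch under assumptions the page does not confirm: that each method returns a promise, and that applied() yields names in the order they were recorded. SketchLedger, InMemoryLedger, and rollbackLast are hypothetical names; a real adapter would persist the names in its _vector_migrations collection or table.

```typescript
// Assumed shape of the ledger contract: the three methods named above.
interface SketchLedger {
  applied(): Promise<string[]>
  record(name: string): Promise<void>
  remove(name: string): Promise<void>
}

// In-memory ledger, useful for tests; keeps names in the order they were recorded.
class InMemoryLedger implements SketchLedger {
  private names: string[] = []

  async applied(): Promise<string[]> {
    return [...this.names] // copy so callers cannot mutate ledger state
  }

  async record(name: string): Promise<void> {
    if (!this.names.includes(name)) this.names.push(name)
  }

  async remove(name: string): Promise<void> {
    this.names = this.names.filter((n) => n !== name)
  }
}

// Rolls back the most recently recorded migration, mirroring rollback():
// run its down(), then remove it from the ledger. Returns null when nothing is applied.
async function rollbackLast(
  ledger: SketchLedger,
  downs: Record<string, () => Promise<void>>,
): Promise<string | null> {
  const applied = await ledger.applied()
  const last = applied[applied.length - 1]
  if (last === undefined) return null
  const down = downs[last]
  if (down === undefined) throw new Error(`no down() registered for ${last}`)
  await down()
  await ledger.remove(last)
  return last
}
```

Removing the name only after down() resolves mirrors the forward direction: the ledger never claims a state the backend has not reached.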
Where to go next
- The Query Builder & Filters: query the collection you just created.
- Consistency & Capabilities: vs.capabilities.rename and the rest of the static-truth flags.
- Writing an Adapter: implement createCollection / dropCollection / hasCollection / renameCollection and the migration ledger for a new backend.