---
url: 'https://adk.nht.io/batteries/vector/schema-migrations.md'
description: >-
  Knex-shaped collection lifecycle via vs.schema and the vs.migrate runner
  (latest / rollback with a ledger).
---

# Schema & Migrations

## LLM summary — Schema & Migrations

* Collection lifecycle is a knex-shaped schema builder on `vs.schema` (a `VectorSchemaBuilder`), NOT ad-hoc methods. `connect()` / `close()` remain plain lifecycle on the store (knex has no schema verb for opening a pool either).
* `vs.schema.createCollection(name, cb)` — `cb` receives a `CollectionBuilder`. `createCollectionIfNotExists(name, cb)`; `hasCollection(name) → boolean`; `dropCollection(name)`; `dropCollectionIfExists(name)`; `renameCollection(from, to)` (throws `E_VECTOR_STORE_UNSUPPORTED_OPERATION` where the backend can't).
* `CollectionBuilder` methods: `.vector({ dimensions, metric? })` (REQUIRED — the similarity field; metric defaults to `'cosine'`); plus knex-style payload fields `.string(name)` / `.integer(name)` / `.number(name)` / `.boolean(name)` / `.json(name)`, each returning a `FieldChain` with `.index()` / `.nullable()`. A builder with no `.vector()` throws at build time.
* Payload-field declarations are **advisory** for schemaless backends (Qdrant/Pinecone/Chroma store arbitrary payload; the adapter records the declared shape for validation/index hints) and **authoritative** for SQL backends (pgvector emits real columns + a `vector(N)` column + the right index opclass; sqlite-vec emits the `vec0` virtual-table spec).
* The builder compiles to a neutral `CollectionSpec { collection, vector: { dimensions, metric }, fields: CollectionFieldSpec[] }`; `createCollection` validates it then delegates to the adapter's `createCollection(spec, ifNotExists)` primitive.
* Migrations mirror a knex migration module: each exports `up(ctx)` / `down(ctx)` receiving a `VectorMigrationContext { schema }`. The runner is `vs.migrate` (a `VectorMigrator`) with `latest()` (applies pending migrations in order, records each in the ledger, returns the names applied this run) and `rollback()` (runs the most recently applied migration's `down`, removes it from the ledger, returns its name or `null` if there was nothing to roll back).
* A throwing migration surfaces `E_VECTOR_STORE_MIGRATION_FAILED([name, message])` and does NOT advance the ledger past the failure. The ledger (`MigrationLedger`: `applied()` / `record(name)` / `remove(name)`) is a backend-appropriate store the adapter owns — analogous to knex's `knex_migrations` table.

The schema surface is the other half of the knex bet. You don't learn a new "create index" call per backend — you describe a collection the way you'd describe a table, and the adapter materializes it however its backend wants. The same description produces a `CREATE TABLE` with a `vector(384)` column on pgvector, a `vec0` virtual table on sqlite-vec, and a recorded shape hint on a schemaless store like Qdrant.

## Creating a collection

`vs.schema.createCollection(name, cb)` takes a callback that receives a `CollectionBuilder` — knex's `createTable(name, tableBuilder => …)`, exactly:

```typescript
await vs.schema.createCollection('docs', (c) => {
  c.vector({ dimensions: 384, metric: 'cosine' }) // the similarity field — REQUIRED
  c.string('source')
  c.integer('year').index()
  c.string('kind')
  c.boolean('published').nullable()
})
```

The one rule the table analogy doesn't have: a collection **must** declare a `.vector()`. It's the similarity field — the reason the collection exists — so a builder that never calls `.vector()` throws at build time. `metric` defaults to `'cosine'` if you omit it.

The payload fields use the knex column vocabulary you already know:

| Method | Declares | Chain |
| --- | --- | --- |
| `.vector({ dimensions, metric? })` | The similarity field (required) | — |
| `.string(name)` | A text payload field | `.index()`, `.nullable()` |
| `.integer(name)` | An integer field | `.index()`, `.nullable()` |
| `.number(name)` | A float field | `.index()`, `.nullable()` |
| `.boolean(name)` | A boolean field | `.index()`, `.nullable()` |
| `.json(name)` | A nested/object field | `.index()`, `.nullable()` |
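
To make the builder contract concrete, here is a minimal, self-contained sketch of how a builder like this could compile to a neutral `CollectionSpec`. It is not the battery's implementation: the `Sketch*` class names, the `compileCollection` helper, and the `type` / `indexed` / `nullable` property names on each field spec are assumptions made for illustration.

```typescript
// Hypothetical, simplified types: the spec shape follows this page, the field properties are assumed.
type SketchMetric = 'cosine' // the only metric value this page names
type SketchFieldType = 'string' | 'integer' | 'number' | 'boolean' | 'json'

interface CollectionFieldSpec {
  name: string
  type: SketchFieldType
  indexed: boolean
  nullable: boolean
}

interface CollectionSpec {
  collection: string
  vector: { dimensions: number; metric: SketchMetric }
  fields: CollectionFieldSpec[]
}

// Chainable handle over one declared field; each modifier flips a flag on its spec.
class SketchFieldChain {
  private readonly field: CollectionFieldSpec
  constructor(field: CollectionFieldSpec) {
    this.field = field
  }
  index(): this {
    this.field.indexed = true
    return this
  }
  nullable(): this {
    this.field.nullable = true
    return this
  }
}

class SketchCollectionBuilder {
  private vectorSpec?: CollectionSpec['vector']
  private readonly fields: CollectionFieldSpec[] = []

  vector(opts: { dimensions: number; metric?: SketchMetric }): void {
    // The metric falls back to 'cosine' when omitted.
    this.vectorSpec = { dimensions: opts.dimensions, metric: opts.metric ?? 'cosine' }
  }
  string(name: string) { return this.add(name, 'string') }
  integer(name: string) { return this.add(name, 'integer') }
  number(name: string) { return this.add(name, 'number') }
  boolean(name: string) { return this.add(name, 'boolean') }
  json(name: string) { return this.add(name, 'json') }

  build(collection: string): CollectionSpec {
    // A builder that never called .vector() fails here, at build time.
    if (!this.vectorSpec) {
      throw new Error(`collection "${collection}" must declare a .vector()`)
    }
    return { collection, vector: this.vectorSpec, fields: this.fields }
  }

  private add(name: string, type: SketchFieldType): SketchFieldChain {
    const field: CollectionFieldSpec = { name, type, indexed: false, nullable: false }
    this.fields.push(field)
    return new SketchFieldChain(field)
  }
}

// Runs the callback against a fresh builder and returns the compiled spec.
function compileCollection(
  collection: string,
  cb: (c: SketchCollectionBuilder) => void
): CollectionSpec {
  const builder = new SketchCollectionBuilder()
  cb(builder)
  return builder.build(collection)
}

const docsSpec = compileCollection('docs', (c) => {
  c.vector({ dimensions: 384 }) // metric omitted on purpose
  c.integer('year').index()
  c.boolean('published').nullable()
})
```

In this sketch `docsSpec.vector.metric` ends up as `'cosine'`, and calling `compileCollection('empty', () => {})` throws because no `.vector()` was declared.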

### Advisory on some backends, authoritative on others

Payload-field declarations mean different things depending on what's underneath, and the battery is honest about which:

* **Schemaless backends** (Qdrant, Pinecone, Chroma, …) store arbitrary metadata regardless. Here the declarations are **advisory**: the adapter records the declared shape so it can validate writes and derive index hints, but it will not stop you from upserting a field you didn't declare.
* **SQL backends** (pgvector, sqlite-vec, …) need real columns. Here the declarations are **authoritative**: pgvector emits actual `CREATE TABLE` columns, a `vector(N)` column, and the correct index opclass; sqlite-vec emits the `vec0` virtual-table column spec. What you declare is what exists.

This isn't a leak in the abstraction — it's the abstraction telling the truth. A schemaless store can't enforce a column it doesn't have, and pretending otherwise would be the kind of fake guarantee the kit refuses to make.
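
The split can be illustrated with a rough sketch of what each kind of adapter might do with the same spec. Everything here is an assumption for illustration, not the shipped adapters' output: the `id` and `embedding` column names, the choice of an HNSW index, the SQL type mapping, treating a field without `.nullable()` as `NOT NULL`, and the helper names. Identifier quoting is omitted for brevity. The `vector(N)` column type and the `vector_cosine_ops` opclass are real pgvector features.

```typescript
// Hypothetical spec shape; property names beyond those on this page are assumptions.
type PayloadType = 'string' | 'integer' | 'number' | 'boolean' | 'json'

interface HintFieldSpec {
  name: string
  type: PayloadType
  indexed: boolean
  nullable: boolean
}

interface HintCollectionSpec {
  collection: string
  vector: { dimensions: number; metric: 'cosine' }
  fields: HintFieldSpec[]
}

// Assumed mapping from payload field types to PostgreSQL column types.
const PG_COLUMN_TYPES: Record<PayloadType, string> = {
  string: 'text',
  integer: 'integer',
  number: 'double precision',
  boolean: 'boolean',
  json: 'jsonb',
}

// Authoritative: the declarations become real columns and indexes.
function toPgvectorStatements(spec: HintCollectionSpec): string[] {
  const columns = [
    'id text PRIMARY KEY',
    `embedding vector(${spec.vector.dimensions})`,
    ...spec.fields.map(
      (f) => `${f.name} ${PG_COLUMN_TYPES[f.type]}${f.nullable ? '' : ' NOT NULL'}`
    ),
  ]
  return [
    `CREATE TABLE ${spec.collection} (${columns.join(', ')})`,
    // The cosine metric selects pgvector's cosine operator class.
    `CREATE INDEX ON ${spec.collection} USING hnsw (embedding vector_cosine_ops)`,
    ...spec.fields
      .filter((f) => f.indexed)
      .map((f) => `CREATE INDEX ON ${spec.collection} (${f.name})`),
  ]
}

// Advisory: a schemaless adapter can only report drift, not prevent it.
function undeclaredKeys(
  spec: HintCollectionSpec,
  payload: Record<string, unknown>
): string[] {
  const declared = new Set(spec.fields.map((f) => f.name))
  return Object.keys(payload).filter((key) => !declared.has(key))
}

const exampleSpec: HintCollectionSpec = {
  collection: 'docs',
  vector: { dimensions: 384, metric: 'cosine' },
  fields: [
    { name: 'source', type: 'string', indexed: false, nullable: false },
    { name: 'year', type: 'integer', indexed: true, nullable: false },
    { name: 'published', type: 'boolean', indexed: false, nullable: true },
  ],
}

const statements = toPgvectorStatements(exampleSpec)
const drift = undeclaredKeys(exampleSpec, { source: 'a', author: 'b' }) // ['author']
```

The SQL path turns every declared field into something the database enforces, while the schemaless path can at most notice that `author` was never declared.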

## The rest of the lifecycle

```typescript
await vs.schema.hasCollection('docs')               // → boolean   (knex: hasTable)
await vs.schema.createCollectionIfNotExists('docs', cb)
await vs.schema.dropCollection('docs')              // knex: dropTable
await vs.schema.dropCollectionIfExists('docs')      // knex: dropTableIfExists
await vs.schema.renameCollection('docs', 'documents')
```

`renameCollection` is the one operation not every backend can honour. Where the backend can't rename in place (most managed and several server backends), it throws `E_VECTOR_STORE_UNSUPPORTED_OPERATION` rather than faking it with a copy-and-drop you didn't ask for. Check `vs.capabilities.rename` first if you need it portably — see [Consistency & Capabilities](./consistency).
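
A portable caller might guard the rename like the sketch below. It assumes `vs.capabilities.rename` is a plain boolean flag and types the store structurally; `RenameCapableStore` and `renameIfSupported` are hypothetical names introduced here, not part of the battery.

```typescript
// Minimal structural type covering only the parts of the store this sketch touches.
interface RenameCapableStore {
  capabilities: { rename: boolean } // assumed to be a boolean flag
  schema: { renameCollection(from: string, to: string): Promise<void> }
}

// Resolves to true when the rename ran, false when the backend cannot rename in place.
async function renameIfSupported(
  vs: RenameCapableStore,
  from: string,
  to: string
): Promise<boolean> {
  if (!vs.capabilities.rename) {
    // Skip the call rather than trigger E_VECTOR_STORE_UNSUPPORTED_OPERATION.
    return false
  }
  await vs.schema.renameCollection(from, to)
  return true
}
```

When the helper resolves to `false`, the caller decides what to do instead, for example creating the new collection and re-ingesting explicitly, so any copy happens by choice.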

::: tip `connect()` / `close()` are not schema verbs
Opening the connection is `await vs.connect()`, closing it is `await vs.close()` — plain lifecycle on the store, not part of `vs.schema`. knex doesn't put "open a pool" in `knex.schema` either; neither do we. Construct the store, `connect()`, then build schema.
:::

## Migrations

Migrations mirror a knex migration module: a named pair of `up` / `down` functions, each handed a context whose `schema` is the same `VectorSchemaBuilder` you used above.

```typescript
// migrations/0001_docs.ts
import type { VectorMigrationContext } from '@nhtio/adk/batteries/vector'

export const name = '0001_docs'

export async function up({ schema }: VectorMigrationContext) {
  await schema.createCollection('docs', (c) => {
    c.vector({ dimensions: 384, metric: 'cosine' })
    c.string('source')
    c.integer('year').index()
  })
}

export async function down({ schema }: VectorMigrationContext) {
  await schema.dropCollectionIfExists('docs')
}
```

The runner is `vs.migrate`, with the two verbs knex taught everyone:

```typescript
await vs.migrate.latest()   // apply all pending migrations, in order → returns names applied this run
await vs.migrate.rollback() // run the last applied migration's down() → returns its name, or null if none
```

`latest()` reads the ledger, filters out already-applied migrations, and runs the rest in order — recording each in the ledger as it succeeds. `rollback()` takes the most recently applied migration, runs its `down()`, and removes it from the ledger.
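
The two verbs can be sketched against an in-memory ledger as follows. This is an illustration of the described behaviour, not the `VectorMigrator` source: the `Sketch*` types, `memoryLedger`, `runLatest`, and `runRollback` are hypothetical, the migration context is left out, the ledger methods are assumed to be async, and `applied()` is assumed to return names in the order they were recorded.

```typescript
interface SketchLedger {
  applied(): Promise<string[]>
  record(name: string): Promise<void>
  remove(name: string): Promise<void>
}

interface SketchMigration {
  name: string
  up(): Promise<void>
  down(): Promise<void>
}

// In-memory stand-in for the adapter-owned ledger.
function memoryLedger(): SketchLedger {
  const names: string[] = []
  return {
    applied: async () => [...names],
    record: async (name) => {
      names.push(name)
    },
    remove: async (name) => {
      const at = names.lastIndexOf(name)
      if (at !== -1) names.splice(at, 1)
    },
  }
}

// Applies every migration the ledger has not seen, in the order given.
async function runLatest(
  migrations: SketchMigration[],
  ledger: SketchLedger
): Promise<string[]> {
  const done = new Set(await ledger.applied())
  const ran: string[] = []
  for (const migration of migrations) {
    if (done.has(migration.name)) continue
    await migration.up()
    await ledger.record(migration.name) // recorded only after up() resolved
    ran.push(migration.name)
  }
  return ran
}

// Reverts the most recently applied migration, or returns null if there is none.
async function runRollback(
  migrations: SketchMigration[],
  ledger: SketchLedger
): Promise<string | null> {
  const applied = await ledger.applied()
  const last = applied[applied.length - 1]
  if (last === undefined) return null
  const migration = migrations.find((m) => m.name === last)
  if (!migration) throw new Error(`no migration module found for "${last}"`)
  await migration.down()
  await ledger.remove(last)
  return last
}
```

Running `runLatest` twice in a row returns every name the first time and an empty array the second, because the ledger already lists them all.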

### Failure stops the line

A migration whose `up()` (or `down()`) throws surfaces `E_VECTOR_STORE_MIGRATION_FAILED([name, message])`, and **the ledger does not advance past the failure**. The migrations that ran before it stay recorded; the one that threw is not recorded; nothing after it runs. You fix the broken migration and re-run `latest()` — the already-applied ones are skipped, and the run resumes from the one that failed. No half-applied limbo, no "it recorded success but the collection isn't there."
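
The stop-and-resume behaviour can be sketched in a few lines. This is a simplified model rather than the runner's code: the ledger is a plain array, `applyPending` and `SketchStep` are hypothetical names, and the error is an ordinary `Error` carrying an assumed `code` and `migration` property in place of the battery's real exception type.

```typescript
interface SketchStep {
  name: string
  up(): Promise<void>
}

// Applies pending steps in order; the first failure aborts the run.
async function applyPending(
  steps: SketchStep[],
  appliedNames: string[] // stand-in for the ledger, mutated in place
): Promise<string[]> {
  const ran: string[] = []
  for (const step of steps) {
    if (appliedNames.includes(step.name)) continue
    try {
      await step.up()
    } catch (cause) {
      const message = cause instanceof Error ? cause.message : String(cause)
      // Nothing is recorded for the failing step, and later steps never run.
      throw Object.assign(new Error(`${step.name}: ${message}`), {
        code: 'E_VECTOR_STORE_MIGRATION_FAILED',
        migration: step.name,
      })
    }
    appliedNames.push(step.name) // recorded only after up() succeeded
    ran.push(step.name)
  }
  return ran
}
```

If the second of three steps throws, the array holds only the first name. After the step is fixed, a second call skips the first step and returns the names of the remaining two.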

### The ledger

Applied-migration state lives in a `MigrationLedger` the adapter owns — `applied()`, `record(name)`, `remove(name)` — persisted in a backend-appropriate place (a `_vector_migrations` collection or table), exactly as knex uses a `knex_migrations` table. You don't implement the ledger unless you're [writing an adapter](./custom-adapter); the shipped adapters provide it.

## Where to go next

* [The Query Builder & Filters](./query-builder) — query the collection you just created.
* [Consistency & Capabilities](./consistency) — `vs.capabilities.rename` and the rest of the static-truth flags.
* [Writing an Adapter](./custom-adapter) — implement `createCollection`/`dropCollection`/`hasCollection`/`renameCollection` and the migration ledger for a new backend.
